Re: [PATCH] alias: Optimise call_may_clobber_ref_p

2021-12-06 Thread Richard Biener via Gcc-patches
On Mon, Dec 6, 2021 at 4:45 PM Jan Hubicka  wrote:
>
> > On Mon, Dec 6, 2021 at 4:03 PM Richard Biener wrote:
> > >
> > > On Mon, Dec 6, 2021 at 11:10 AM Richard Sandiford wrote:
> > > >
> > > > Richard Biener  writes:
> > > > > On Sun, Dec 5, 2021 at 10:59 PM Richard Sandiford via Gcc-patches wrote:
> > > > >>
> > > > >> When compiling an optabs.ii at -O2 with a release-checking build,
> > > > >> we spent a lot of time in call_may_clobber_ref_p.  One of the
> > > > >> first things the function tries is the get_modref_function_summary
> > > > >> approach, but that also seems to be the most expensive check.
>
> In general it should not be that expensive since the trees are generally
> small (despite the 3 nested loops making them look dangerously big) and
> most of the time we resort to simple pointer/array lookups.
>
> From lcov stats https://splichal.eu/lcov/gcc/tree-ssa-alias.c.gcov.html
> (which are quite useful when trying to understand typical behaviour of
> the alias oracle)
>
> In call_may_clobber_ref_p we get
>
> 2980 :  256141790 :   callee = gimple_call_fndecl (call);
> 2981 :            :
> 2982 :  256141790 :   if (callee != NULL_TREE && !ref->volatile_p)
> 2983 :            :     {
> 2984 :  240446150 :       struct cgraph_node *node = cgraph_node::get (callee);
> this is a direct pointer from the node
> 2985 :  240446150 :       if (node)
> 2986 :            :         {
> 2987 :  240234208 :           modref_summary *summary = get_modref_function_summary (node);
> This is a fast summary, so it is an array lookup
> 2988 :  240234208 :           if (summary)
> 2989 :            :             {
> 2990 :   19732392 :               if (!modref_may_conflict (call, summary->stores, ref, tbaa_p)
>
> And we get here in 8% of invocations
>
> And in modref_may_conflict we have:
>
>
>     21147836 : modref_may_conflict (const gcall *stmt,
> 2552 :            :                      modref_tree <alias_set_type> *tt, ao_ref *ref, bool tbaa_p)
> 2553 :            : {
> 2554 :   21147836 :   alias_set_type base_set, ref_set;
> 2555 :            :
> 2556 :   21147836 :   if (tt->every_base)
> 2557 :            :     return true;
> 2558 :            :
> 2559 :    3407876 :   if (!dbg_cnt (ipa_mod_ref))
> 2560 :            :     return true;
> 2561 :            :
> 2562 :    3407876 :   base_set = ao_ref_base_alias_set (ref);
> 2563 :            :
> 2564 :    3407876 :   ref_set = ao_ref_alias_set (ref);
> All direct lookups in most cases
> 2565 :            :
> 2566 :    3407876 :   int num_tests = 0, max_tests = param_modref_max_tests;
> 2567 :   14479077 :   for (auto base_node : tt->bases)
> 2568 :            :     {
> 2569 :    6171739 :       if (tbaa_p && flag_strict_aliasing)
> On average there are 2 bases travelled by the loop.
> 2570 :            :         {
> 2571 :    5253690 :           if (num_tests >= max_tests)
> 2572 :            :             return true;
> 2573 :    5253690 :           alias_stats.modref_tests++;
> 2574 :    5253690 :           if (!alias_sets_conflict_p (base_set, base_node->base))
> alias set checks are also reasonably cheap, and 50% of the time they avoid
> walking the rest of the tree
> 2575 :    2238398 :             continue;
> 2576 :    3015292 :           num_tests++;
> 2577 :            :         }
> 2578 :            :
> 2579 :    3933341 :       if (base_node->every_ref)
>
> We hit the loop only 0.15 times per invocation of the function
>
> 2580 :            :         return true;
> 2581 :            :
> 2582 :   14943624 :       for (auto ref_node : base_node->refs)
> 2583 :            :         {
> 2584 :            :           /* Do not repeat same test as before.  */
> 2585 :    4381325 :           if ((ref_set != base_set || base_node->base != ref_node->ref)
>
> and we usually visit only a bit more than one ref
> 2586 :    2548127 :               && tbaa_p && flag_strict_aliasing)
> 2587 :            :             {
> 2588 :    1840866 :               if (num_tests >= max_tests)
> 2589

Re: [PATCH] Fix hash_map::traverse overload

2021-12-06 Thread Richard Biener via Gcc-patches
On Tue, Dec 7, 2021 at 8:43 AM Richard Biener wrote:
>
> On Mon, Dec 6, 2021 at 11:47 AM Matthias Kretz  wrote:
> >
> > While reading the hash_map code I noticed this inconsistency.  Bootstrapped and
> > regtested on x86_64.  OK for trunk?
>
> I've inspected two users of said overload and they return true.  Did you look
> at the rest?  I assume that bootstrapping and testing with asserting that
> the callback never returns false in that overload should succeed?
>
> That said, the inconsistency is bad - but how can we be sure we're not
> relying on that?  I mean more than just bootstrapping and regtesting ;)

Btw, can you please amend the

 /* Call the call back on each pair of key and value with the passed in
 arg.  */

comment to say how the return value influences iteration?  Maybe also
note the traversal is unordered.

Note hash-set.h has the same "problem" (a callback with a bool return
but ignored result).  hash-table.h "properly" handles the return.
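For reference, a minimal standalone sketch of the semantics being discussed
(illustrative only, not the GCC hash_map code): the callback's boolean result
should terminate the traversal, and both the const and non-const overloads
should agree on that.

  #include <unordered_map>

  template <typename K, typename V, typename F>
  void traverse (std::unordered_map<K, V> &m, F callback)
  {
    for (auto &kv : m)
      if (!callback (kv.first, &kv.second))
        break;   // honour a 'false' return instead of silently ignoring it
  }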

Richard.

> Thanks,
> Richard.
>
> >
> > The hash_map::traverse overload taking a non-const Value pointer breaks
> > if the callback returns false. The other overload should behave the
> > same.
> >
> > Signed-off-by: Matthias Kretz 
> >
> > gcc/ChangeLog:
> >
> > * hash-map.h (hash_map::traverse): Let both overloads behave the
> > same.
> > ---
> >  gcc/hash-map.h | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> >
> > --
> > ──
> >  Dr. Matthias Kretz   https://mattkretz.github.io
> >  GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
> >  stdₓ::simd
> > ──


Re: [PATCH] Fix hash_map::traverse overload

2021-12-06 Thread Richard Biener via Gcc-patches
On Mon, Dec 6, 2021 at 11:47 AM Matthias Kretz  wrote:
>
> While reading the hash_map code I noticed this inconsistency. Bootstrapped and
> regtested on x86_64. OK for trunk?

I've inspected two users of said overload and they return true.  Did you look
at the rest?  I assume that bootstrapping and testing with asserting that
the callback never returns false in that overload should succeed?

That said, the inconsistency is bad - but how can we be sure we're not
relying on that?  I mean more than just bootstrapping and regtesting ;)

Thanks,
Richard.

>
> The hash_map::traverse overload taking a non-const Value pointer breaks
> if the callback returns false. The other overload should behave the
> same.
>
> Signed-off-by: Matthias Kretz 
>
> gcc/ChangeLog:
>
> * hash-map.h (hash_map::traverse): Let both overloads behave the
> same.
> ---
>  gcc/hash-map.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
>
> --
> ──
>  Dr. Matthias Kretz   https://mattkretz.github.io
>  GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
>  stdₓ::simd
> ──


Re: [patch] lto: Don't run ipa-comdats pass during LTO

2021-12-06 Thread Richard Biener via Gcc-patches
On Tue, Dec 7, 2021 at 1:01 AM Sandra Loosemore  wrote:
>
> The attached patch fixes an ICE in lto1 at lto-partition.c:215 that
> was reported by a customer.  Unfortunately I have no test case for
> this; the customer's application is a big C++ shared library with lots
> of dependencies and proprietary code under NDA.  I did try reducing it
> with cvise but couldn't end up with anything small or robust enough to
> be suitable, and likewise my attempts to write a small testcase by
> hand were not successful in reproducing the bug.  OTOH, I did track
> this down in the debugger and I think I have a pretty good
> understanding of what the problem is, namely...

I found PR65995 which looks related (well, the ICE is at the same position).
Does the customer use the linker plugin?

> The symbol lto-partition was failing on was a vtable symbol that was
> in a comdat group but not marked as DECL_COMDAT and without the right
> visibility flags for a comdat symbol.  The LTO data in the input .o
> files is correct and lto1 is creating the symbol correctly when it's
> read in, but the problem is that there are two optimization passes
> being run before lto-partition that are trying to do conflicting
> things with COMDATs.
>
> The first is the ipa-visibility pass.  When this pass is run as part
> of lto1, since it has the complete program or shared library (as in
> this case) available to analyze, it tries to break up COMDATs and turn
> them into ordinary local symbols.  But then the ipa-comdats pass,
> which is also run as part of regular IPA optimizations at compile
> time, tries to do the reverse and move local symbols only referenced
> from a COMDAT into that same COMDAT, but in the process it is failing
> to set all the flags needed by the LTO partitioner to correctly
> process the symbol.  It's possible the failure to set the flags
> properly is a bug in ipa-comdats, but looking at the bigger picture,
> it makes no sense to be running this optimization pass at link time at
> all.  It is a compile-time optimization intended to reduce code size
> by letting the linker keep only one copy of these symbols that may be
> redundantly defined in multiple objects.
>
> So the patch I've come up with disables the ipa-comdats pass in the
> same situations in which ipa-visibility localizes COMDAT symbols, to
> remove the conflict between the two passes.  This fixes the customer
> problem (in a GCC 10 build for arm-linux-gnueabi), and regression
> tests OK on mainline x86_64-linux-gnu as well.  OK for mainline, and
> to backport to older branches?  It looked to me like none of this
> code had been touched for years.

I hope Honza can review this.

Thanks,
Richard.

>
> -Sandra


Re: [PATCH 2/2] Use dominators to reduce ranger cache-filling.

2021-12-06 Thread Richard Biener via Gcc-patches
On Mon, Dec 6, 2021 at 7:39 PM Andrew MacLeod  wrote:
>
> On 12/6/21 02:27, Richard Biener wrote:
> > On Fri, Dec 3, 2021 at 9:42 PM Andrew MacLeod via Gcc-patches wrote:
> >> When a request is made for the range of an ssa_name at some location,
> >> the first thing we do is invoke range_of_stmt() to ensure we have looked
> >> at the definition and have an evaluation for the name at a global
> >> level.  I recently added a patch which dramatically reduces the call
> >> stack requirements for that call.
> >>
> >> Once we have confirmed the definition range has been set, a call is made
> >> for the range-on-entry to the block of the use.  This is performed by
> >> the cache, which proceeds to walk the CFG predecessors looking for when
> >> ranges are created (exported), existing range-on-entry cache hits, or
> >> the definition block.  Once this list of predecessors has been created,
> >> a forward walk is done, pushing the ranges through the successor edges of
> >> all the blocks that were visited in the initial walk.
> >>
> >> If the use is far from the definition, we end up filling a lot of the
> >> same value on these paths.  Also uses which are far from a
> >> range-modifying statement push the same value into a lot of blocks.
> >>
> >> This patch tries to address at least some inefficiencies.  It recognizes
> >> that
> >>
> >> First, if there is no range modifying stmt between this use and the last
> >> range we saw in a dominating block, we can just use the value from the
> >> dominating block and not fill in all the cache entries between here and
> >> there.  This is the biggest win.
> >>
> >> Second, if there is a range modifying statement at the end of some
> >> block, we will have to do the appropriate cache walk to this point, but
> >> it's possible the range-on-entry to THAT block might be able to use a
> >> dominating range, and we can prevent the walk from going any further
> >> than this block.
> >>
> >> Combined, this should prevent a lot of unnecessary ranges from being
> >> plugged into the cache.
> >>
> >> ie, to visualize:
> >>
> >> bb4:
> >> a = foo()
> >> <..>
> >> bb60:
> >>  if (a < 30)
> >> <...>
> >> bb110:
> >>   g = a + 10
> >>
> >> if the first place we ask for a is in bb110, we walk the CFG from 110
> >> all the way back to bb4, on all paths leading back, then fill all those
> >> cache entries.
> >> With this patch,
> >> a) if bb60 does not dominate bb110, the request will scan the
> >> dominators, arrive at the definition block, have seen no range modifier,
> >> and simply set the on-entry for 110 to the range of a. done.
> >> b) if bb60 does dominate 110, we have no idea which edge out of 60
> >> dominates it, so we will revert to the existing cache algorithm.  Before
> >> doing so, it checks and determines that there are no modifiers between
> >> bb60 and the def in bb4, and so sets the on-entry cache for bb60 to be
> >> the range of a.   Now when we do the cache fill walk, it only has to go
> >> back as far as bb60 instead of all the way to bb4.
> >>
> >> Otherwise we just revert to what we do now (or if dominators are not
> >> available).   I have yet to see a case where we miss something we used to
> >> get, but that does not mean there isn't one :-).
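To make the shortcut concrete, here is a toy sketch in the spirit of the
description above; it is only an illustration of the idea, not the ranger
code, and the block attributes are simplified stand-ins for "this block has
a range-modifying statement for the name" and "the cache already has a
range-on-entry for this block".

  #include <vector>

  // idom[b] is b's immediate dominator (0 = entry / none).  Returns the
  // dominating block whose cached range-on-entry can be reused, or 0 if
  // we must fall back to the normal cache-fill walk.
  int range_via_dominator (const std::vector<int> &idom,
                           const std::vector<bool> &modifies,
                           const std::vector<bool> &cached,
                           int use_bb)
  {
    for (int bb = idom[use_bb]; bb != 0; bb = idom[bb])
      {
        if (modifies[bb])
          return 0;        // a range modifier blocks the shortcut
        if (cached[bb])
          return bb;       // nothing changed in between: reuse this range
      }
    return 0;
  }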
> >>
> >> The cumulative performance impact of this, compiling a set of 390 GCC
> >> source files at -O2 (measured via callgrind) is pretty decent:  Negative
> >> numbers are a compile time decrease.  Thus -10% is 10% faster
> >> compilation time.
> >>
> >> EVRP : %change from trunk is -26.31% (!)
> >> VRP2 : %change from trunk is -9.53%
> >> thread_jumps_full   : %change from trunk is -15.8%
> >> Total compilation time  : %change from trunk is -1.06%
> >>
> >> So it's not insignificant.
> >>
> >> Risk would be very low, unless dominators are screwed up mid-pass.. but
> >> then the relation code would be screwed up too.
> >>
> >> Bootstrapped on  x86_64-pc-linux-gnu with no regressions. OK?
> > OK.
>
> Committed.
>
>
> >
> > Wow - I didn't realize we have this backwards CFG walk w/o dominators ...
> > I wonder if you can add a counter to visualize places we end up using this 
> > path.
>
> Well, it only does the fill now when there is range info located on an
> outgoing edge of the dominator.  It's still used, just on a much smaller
> portion of the graph.
>
> We could do even better if we knew whether one edge was involved. ie
>
> bb3:
> a = foo()
> if (a > 4)
> blah1;   bb4
> else
> blah2;   bb5
> bb6:
>   = a;
>
> The use of a in bb6 will look to bb3 as the dominator, but if it knew
> that both edges are used in that dominance relation (ie, neither
> outgoing edge dominates bb6),  it wouldn't have to calculate a range for
> 'a'.
>
> But as it stands, it can't really tell the above situation from:
>
> bb3:
> a = foo()
> if (a > 4)
>  = a     bb4
>
> In this case, only the one edge is used, and we DO need to calculate a
> range for a.  The edge from 3->4 does dominate bb4.
>
> In 

Re: [PATCH v2] rs6000: Fix some issues in rs6000_can_inline_p [PR102059]

2021-12-06 Thread Kewen.Lin via Gcc-patches
Hi Segher,

on 2021/12/6 9:06 PM, Segher Boessenkool wrote:
> On Fri, Dec 03, 2021 at 11:46:53AM +0800, Kewen.Lin wrote:
 This patch is to fix the inconsistent behaviors for non-LTO mode
 and LTO mode.  As Martin pointed out, currently the function
 rs6000_can_inline_p simply makes it inlinable if callee_tree is
 NULL, but it's wrong, we should use the command line options
 from target_option_default_node as default.
>>>
>>> This is not documented.
>>
>> Yeah, but according to the document for the target attribute [1],
>> "Multiple target back ends implement the target attribute to specify
>> that a function is to be compiled with different target options than
>> specified on the command line. The original target command-line options
>> are ignored. ", it seems to say the function without any target
>> attribute/pragma will be compiled with target options specified on the
>> command line.  I think it's a normal expectation for users.
> 
> The whole target_option_default_node is not documented, I meant.
> 

Ah, you meant that.  Yeah, it can be improved separately.

>> Excepting for the inconsistent behaviors between LTO and non-LTO,
>> it can also make the below case different.
>>
>> // Option: -O2 -mcpu=power8 -mhtm
>>
>> int foo(int *b) {
>>   *b += __builtin_ttest();
>>   return *b;
>> }
>>
>> __attribute__((target("no-htm"))) int bar(int *a) {
>>   *a = foo(a);
>>   return 0;
>> }
>>
>> Without this fix, bar inlines foo but gets the error below:
>>
>> In function ‘foo’,
>> inlined from ‘bar’ at test.c:8:8:
>> test.c:3:9: error: ‘__builtin_ttest’ requires the ‘-mhtm’ option
>> 3 |   *b += __builtin_ttest();
>>   | ^
>>
>> Since bar doesn't support foo's required HTM feature.
>>
>> With this fix, bar doesn't inline foo and the case gets compiled well.
> 
> And if you inline something no-htm into something that allows htm?  That
> should work fine, be allowed just fine.
> 

Thanks for helping to answer this, Peter!

 It also replaces
 rs6000_isa_flags with the one from target_option_default_node
 when caller_tree is NULL as rs6000_isa_flags could probably
 change since initialization.
>>>
>>> Is this enough then?  Are there no other places that use
>>> rs6000_isa_flags?  Is it correct for us to have that variable at all
>>> anymore?  Etc.
>>
>> Good question, I think the answer is yes.  I dug into it and found that
>> the current target option save/restore scheme keeps rs6000_isa_flags
>> the same as the one saved in target_option_default_node in the context
>> of inlining.  So I put one assertion in, as below:
>>
>> if (!caller_tree) {
>>   caller_tree = target_option_default_node;
>>   gcc_assert (rs6000_isa_flags
>>   == TREE_TARGET_OPTION (caller_tree)->x_rs6000_isa_flags);
>> }
>>
>> And I kicked off regression testing on both BE and LE; the results showed
>> only one common failure, which seems to be a latent bug.  I've filed PR103515
>> for it.
> 
> Ah cool :-)  Please send a patch to add that assert then (well, once we
> can bootstrap with it ;-) )
> 

OK.

>> This bug also indicates it's better to use target_option_default_node
>> rather than rs6000_isa_flags when the caller_tree is NULL.  Since
>> the target_option_default_node is straightforward and doesn't rely
>> on the assumption that rs6000_isa_flags would be kept as the default
>> one correctly, it doesn't suffer from bugs like PR103515.
> 
> So we should not ever use rs6000_isa_flags anymore?
> 

If the question is for the scope in function rs6000_can_inline_p, yes.

Before this patch, the only use of rs6000_isa_flags is:

  /* If the caller has option attributes, then use them.
     Otherwise, use the command line options.  */
  if (caller_tree)
    caller_isa = TREE_TARGET_OPTION (caller_tree)->x_rs6000_isa_flags;
  else
    caller_isa = rs6000_isa_flags;

This patch changes this part to be like (logically):

  if (caller_tree)
    caller_isa = TREE_TARGET_OPTION (caller_tree)->x_rs6000_isa_flags;
  else
    caller_isa
      = TREE_TARGET_OPTION (target_option_default_node)->x_rs6000_isa_flags;

If the question is about the other uses outside function rs6000_can_inline_p, no.
rs6000_isa_flags holds the flags for the current context, either the top level
or some given function decl.  As function rs6000_set_current_function shows, we
already consider target_option_default_node for the case where a decl has a
NULL DECL_FUNCTION_SPECIFIC_TARGET.

>>> The fusion ones are obvious.  The other two are not.  Please put those
>>> two more obviously separate (not somewhere in the sea of fusion
>>> options), and some comment wouldn't hurt either.  You can make it
>>> separate statements even, make it really stand out.
>>>
>>
>> Fixed by splitting it into fusion_options_mask and optimization_options_mask.
>> I'm not happy with the latter name, but couldn't come up with a better one.
>> Do you have any suggestions on 

[PATCH] pragma: Update target option node when optimization changes [PR103515]

2021-12-06 Thread Kewen.Lin via Gcc-patches
Hi,

For a function with optimize pragma, it's possible that the target
options change as optimization options change.  Now we create one
optimization option node when parsing pragma optimize, but don't
create target option node for possible target option changes.  It
makes later processing not detect the target options have actually
changed and doesn't update the target options accordingly.

This patch is to check whether target options have changed when
creating one optimization option node for pragma optimize, and
make one target option node if needed.  The associated test case
shows the difference.  Without this patch, the function foo1 will
perform unrolling which is unexpected.  The reason is that flag
unroll_only_small_loops isn't correctly set for it.  The value
is updated after parsing function foo2, but doesn't get restored
later since both decls don't have DECL_FUNCTION_SPECIFIC_TARGET
set and the hook thinks we don't need to switch.  With this patch,
there is no unrolling for foo1, which is also consistent with the
behavior when the pragma is replaced by the equivalent attribute, both
with and without this patch.

Bootstrapped and regtested on x86_64-redhat-linux, aarch64-linux-gnu
and powerpc64{,le}-linux-gnu.

Is it ok for trunk?

BR,
Kewen
---
gcc/ChangeLog:

PR target/103515
* attribs.c (decl_attributes): Check if target options change and
create one node if so.

gcc/testsuite/ChangeLog:

PR target/103515
* gcc.target/powerpc/pr103515.c: New test.

-
---
 gcc/attribs.c   | 13 -
 gcc/testsuite/gcc.target/powerpc/pr103515.c | 30 +
 2 files changed, 42 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103515.c

diff --git a/gcc/attribs.c b/gcc/attribs.c
index c252f5af07b..761d9760769 100644
--- a/gcc/attribs.c
+++ b/gcc/attribs.c
@@ -607,7 +607,18 @@ decl_attributes (tree *node, tree attributes, int flags,
   if (TREE_CODE (*node) == FUNCTION_DECL
   && optimization_current_node != optimization_default_node
   && !DECL_FUNCTION_SPECIFIC_OPTIMIZATION (*node))
-DECL_FUNCTION_SPECIFIC_OPTIMIZATION (*node) = optimization_current_node;
+{
+  DECL_FUNCTION_SPECIFIC_OPTIMIZATION (*node) = optimization_current_node;
+  tree cur_tree
+    = build_target_option_node (&global_options, &global_options_set);
+  tree old_tree = DECL_FUNCTION_SPECIFIC_TARGET (*node);
+  if (!old_tree)
+   old_tree = target_option_default_node;
+  /* The changes on optimization options can cause the changes in
+     target options, update it accordingly if it's changed.  */
+  if (old_tree != cur_tree)
+   DECL_FUNCTION_SPECIFIC_TARGET (*node) = cur_tree;
+}
 
   /* If this is a function and the user used #pragma GCC target, add the
  options to the attribute((target(...))) list.  */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr103515.c b/gcc/testsuite/gcc.target/powerpc/pr103515.c
new file mode 100644
index 000..698b9a93037
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr103515.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-loop2_unroll-optimized" } */
+
+/* The pragma specified for foo2 should not affect foo1.
+   Verify compiler won't perform unrolling for foo1.  */
+
+#define N 1024
+extern int a1[N], b1[N], c1[N];
+extern int a2[N], b2[N], c2[N];
+extern int n;
+
+void
+foo1 ()
+{
+  int i;
+  for (i = 0; i < n; i++)
+    c1[i] += a1[i] + b1[i];
+}
+
+#pragma GCC optimize("O3,unroll-loops")
+void
+foo2 ()
+{
+  int i;
+  for (i = 0; i < n; i++)
+    c2[i] += a2[i] + b2[i];
+}
+
+/* { dg-final { scan-rtl-dump-times "optimized: loop unrolled" 1 "loop2_unroll" } } */
+
-- 
2.27.0



[PATCH] [i386] Add combine splitter to transform vpcmpeqd/vpxor/vblendvps to vblendvps for ~op0

2021-12-06 Thread Haochen Jiang via Gcc-patches
This patch adds a combine splitter to transform vpcmpeqd/vpxor/vblendvps to vblendvps for ~op0.

OK for trunk?

BRs,
Haochen

gcc/ChangeLog:

PR target/100738
* config/i386/sse.md (*_blendv_not_ltint): Add new
define_insn_and_split.

gcc/testsuite/ChangeLog:

PR target/100738
* g++.target/i386/pr100738-1.C: New test.

---
 gcc/config/i386/sse.md | 28 ++
 gcc/testsuite/g++.target/i386/pr100738-1.C | 19 +++
 2 files changed, 47 insertions(+)
 create mode 100755 gcc/testsuite/g++.target/i386/pr100738-1.C

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 08bdcddc111..db3506c78d7 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -20659,6 +20659,34 @@
(set_attr "btver2_decode" "vector,vector,vector") 
(set_attr "mode" "")])
 
+;; PR target/100738: Transform vpcmpeqd + vpxor + vblendvps to vblendvps
+;; for inverted mask.
+(define_insn_and_split "*_blendv_not_ltint"
+  [(set (match_operand: 0 "register_operand")
+   (unspec:
+ [(match_operand: 1 "register_operand")
+  (match_operand: 2 "vector_operand")
+  (subreg:
+(lt:VI48_AVX
+  (subreg:VI48_AVX
+  (not:
+(match_operand: 3 "register_operand")) 0)
+  (match_operand:VI48_AVX 4 "const0_operand")) 0)]
+ UNSPEC_BLENDV))]
+  "TARGET_SSE4_1 && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (unspec:
+[(match_dup 2) (match_dup 1) (match_dup 3)] UNSPEC_BLENDV))]
+{
+  operands[0] = gen_lowpart (mode, operands[0]);
+  operands[1] = gen_lowpart (mode, operands[1]);
+  operands[2] = gen_lowpart (mode, operands[2]);
+  operands[3] = gen_lowpart (mode, operands[3]);
+  if (MEM_P (operands[2]))
+operands[2] = force_reg (mode, operands[2]);
+})
+
 (define_insn "_dp"
   [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
(unspec:VF_128_256
diff --git a/gcc/testsuite/g++.target/i386/pr100738-1.C b/gcc/testsuite/g++.target/i386/pr100738-1.C
new file mode 100755
index 000..5a04c5b031f
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr100738-1.C
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mavx2" } */
+/* { dg-final {scan-assembler-times "vblendvps\[ \\t\]" 2 } } */
+/* { dg-final {scan-assembler-not "vpcmpeqd\[ \\t\]" } } */
+/* { dg-final {scan-assembler-not "vpxor\[ \\t\]" } } */
+
+typedef int v4si __attribute__((vector_size(16)));
+typedef char v16qi __attribute__((vector_size(16)));
+v4si
+foo_1 (v16qi a, v4si b, v4si c, v4si d)
+{
+  return ((v4si)~a) < 0 ? c : d;
+}
+
+v4si
+foo_2 (v16qi a, v4si b, v4si c, v4si d)
+{
+  return ((v4si)~a) >= 0 ? c : d;
+}
-- 
2.18.1



Re: [PATCH 1/6] rs6000: Remove new_builtins_are_live and dead code it was guarding

2021-12-06 Thread Segher Boessenkool
Hi!

On Mon, Dec 06, 2021 at 02:49:03PM -0600, Bill Schmidt wrote:
> To allow for a sane switch-over from the old built-in infrastructure to the
> new, both sets of code have co-existed, with the enabled one under the control
> of the boolean variable new_builtins_are_live.  As a first step in removing the
> old code, remove this variable and the now-dead code it was guarding.

>   * config/rs6000/darwin.h (SUBTARGET_INIT_BUILTINS): Remove
>   test for new_builtins_are_live and simplify.

Please don't terminate changelog lines early.

>   * config/rs6000/rs6000-c.c (altivec_build_resolved_builtin): Remove
>   dead function.

Same here, but you may want to keep line length 79 instead of 80, some
people like that for some reason.

>   (altivec_resolve_overloaded_builtin): Remove test for
>   new_builtins_are_live and simplify.
>   * config/rs6000/rs6000-call.c (altivec_init_builtins): Remove forward
>   declaration.
>   (builtin_function_type): Likewise.
>   (rs6000_common_init_builtins): Likewise.
>   (htm_init_builtins): Likewise.
>   (mma_init_builtins): Likewise.
>   (def_builtin): Remove dead function.
>   (rs6000_expand_zeroop_builtin): Likewise.
>   (rs6000_expand_mtfsf_builtin): Likewise.
>   (rs6000_expand_mtfsb_builtin): Likewise.
>   (rs6000_expand_set_fpscr_rn_builtin): Likewise.
>   (rs6000_expand_set_fpscr_drn_builtin): Likewise.
>   (rs6000_expand_unop_builtin): Likewise.
>   (altivec_expand_abs_builtin): Likewise.
>   (rs6000_expand_binop_builtin): Likewise.
>   (altivec_expand_lxvr_builtin): Likewise.
>   (altivec_expand_lv_builtin): Likewise.
>   (altivec_expand_stxvl_builtin): Likewise.
>   (altivec_expand_stv_builtin): Likewise.
>   (mma_expand_builtin): Likewise.
>   (htm_expand_builtin): Likewise.
>   (cpu_expand_builtin): Likewise.
>   (rs6000_expand_quaternop_builtin): Likewise.
>   (rs6000_expand_ternop_builtin): Likewise.
>   (altivec_expand_dst_builtin): Likewise.
>   (altivec_expand_vec_sel_builtin): Likewise.
>   (altivec_expand_builtin): Likewise.
>   (rs6000_invalid_builtin): Likewise.
>   (rs6000_builtin_valid_without_lhs): Likewise.
>   (rs6000_gimple_fold_builtin): Remove test for new_builtins_are_live and
>   simplify.
>   (rs6000_expand_builtin): Likewise.
>   (rs6000_init_builtins): Remove tests for new_builtins_are_live and
>   simplify.
>   (rs6000_builtin_decl): Likewise.
>   (altivec_init_builtins): Remove dead function.
>   (mma_init_builtins): Likewise.
>   (htm_init_builtins): Likewise.
>   (builtin_quaternary_function_type): Likewise.
>   (builtin_function_type): Likewise.
>   (rs6000_common_init_builtins): Likewise.
>   * config/rs6000/rs6000-gen-builtins.c (write_header_file): Don't
>   declare new_builtins_are_live.
>   (write_init_bif_table): In generated code, remove test for
>   new_builtins_are_live and simplify.
>   (write_init_ovld_table): Likewise.
>   (write_init_file): Don't initialize new_builtins_are_live.
>   * config/rs6000/rs6000.c (rs6000_builtin_vectorized_function): Remove
>   test for new_builtins_are_live and simplify.
>   (rs6000_builtin_md_vectorized_function): Likewise.
>   (rs6000_builtin_reciprocal): Likewise.
>   (add_condition_to_bb): Likewise.
>   (rs6000_atomic_assign_expand_fenv): Likewise.

You could have said "Remove old builtins code." everywhere ;-)

Okay for trunk.  Thanks!


Segher


PING [PATCH v2 2/2] add -Wdangling-pointer [PR #63272]

2021-12-06 Thread Martin Sebor via Gcc-patches

Ping:
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585819.html

On 11/30/21 3:55 PM, Martin Sebor wrote:

Attached is a revision of this patch with adjustments for
the changes to the prerequisite patch 1 in the series and
a couple of minor simplifications and slightly improved
test coverage, rested on x86_64-linux.

On 11/1/21 4:18 PM, Martin Sebor wrote:

Patch 2 in this series adds support for detecting the uses of
dangling pointers: those to auto objects that have gone out of
scope.  Like patch 1, to minimize false positives this detection
is very simplistic.  However, thanks to the more deterministic
nature of the problem (all local objects go out of scope) it is able
to detect more instances of it.  The approach I used is to simply
search the IL for clobbers that dominate uses of pointers to
the clobbered objects.  If such a use is found that's not
followed by a clobber of the same object the warning triggers.
Similar to -Wuse-after-free, the new -Wdangling-pointer option
has multiple levels: level 1 to detect unconditional uses and
level 2 to flag conditional ones.  Unlike with -Wuse-after-free
there is no use case for testing dangling pointers for
equality, so there is no level 3.
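As a concrete illustration (my own minimal, intentionally buggy example, not
one taken from the patch's testsuite), this is the kind of code the new
option is meant to diagnose:

  int f (void)
  {
    int *p;
    {
      int x = 42;
      p = &x;
    }            /* x goes out of scope here; p is now dangling.  */
    return *p;   /* unconditional use of a dangling pointer (level 1)  */
  }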

Tested on x86_64-linux and  by building Glibc and Binutils/GDB.
It found no problems outside of the GCC test suite.

As with the first patch in this series, the tests contain a number
of xfails due to known limitations marked with pr??.  I'll
open bugs for them before committing the patch if I don't resolve
them first in a followup.

Martin






PING [PATCH v2 1/2] add -Wuse-after-free

2021-12-06 Thread Martin Sebor via Gcc-patches

Ping:
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585816.html

On 11/30/21 3:32 PM, Martin Sebor wrote:

Attached is a revised patch with the following changes based
on your comments:

1) Set and use statement uids to determine which statement
    precedes which in the same basic block.
2) Avoid testing flag_isolate_erroneous_paths_dereference.
3) Use post-dominance to decide whether to use the "maybe"
    phrasing vs a definite form.

David raised (and in our offline discussion today reiterated)
an objection to the default setting of the option being
the strictest.  I have not changed that in this revision.
See my rationale for this choice in my reply below:
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583176.html

Martin

On 11/23/21 2:16 PM, Martin Sebor wrote:

On 11/22/21 6:32 PM, Jeff Law wrote:



On 11/1/2021 4:17 PM, Martin Sebor via Gcc-patches wrote:

Patch 1 in the series detects a small subset of uses of pointers
made indeterminate by calls to deallocation functions like free
or C++ operator delete.  To control the conditions the warnings
are issued under the new -Wuse-after-free= option provides three
levels.  At the lowest level the warning triggers only for
unconditional uses of freed pointers and doesn't warn for uses
in equality expressions.  Level 2 warns also for some conditional
uses, and level 3 also for uses in equality expressions.

I debated whether to make level 2 or 3 the default included in
-Wall.  I decided on 3 for two reasons: 1) to raise awareness
of both the problem and GCC's new ability to detect it: using
a pointer after it's been freed, even only in principle, by
a successful call to realloc, is undefined, and 2) because
it's trivial to lower the level either globally, or locally
by suppressing the warning around such misuses.
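For illustration (my own intentionally buggy example, not one from the
patch's testsuite), the levels roughly distinguish cases like these:

  #include <stdio.h>
  #include <stdlib.h>

  void f (char *p)
  {
    char *q = (char *) realloc (p, 32);
    if (!q)
      return;
    *p = 0;        /* p was invalidated by the successful realloc; a
                      definite use like this is what level 1 targets  */
    if (p == q)    /* comparing the stale pointer for equality is what
                      only level 3 diagnoses  */
      puts ("block was not moved");
    free (q);
  }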

I've tested the patch on x86_64-linux and by building Glibc
and Binutils/GDB.  It triggers a number of times in each, all
due to comparing invalidated pointers for equality (i.e., level
3).  I have suppressed these in GCC (libiberty) by a #pragma,
and will see how the Glibc folks want to deal with theirs (I
track them in BZ #28521).

The tests contain a number of xfails due to limitations I'm
aware of.  I marked them pr?? until the patch is approved.
I will open bugs for them before committing if I don't resolve
them in a followup.

Martin

gcc-63272-1.diff

Add -Wuse-after-free.

gcc/c-family/ChangeLog

* c.opt (-Wuse-after-free): New options.

gcc/ChangeLog:

* diagnostic-spec.c (nowarn_spec_t::nowarn_spec_t): Handle
OPT_Wreturn_local_addr and OPT_Wuse_after_free_.
* diagnostic-spec.h (NW_DANGLING): New enumerator.
* doc/invoke.texi (-Wuse-after-free): Document new option.
* gimple-ssa-warn-access.cc (pass_waccess::check_call): Rename...
(pass_waccess::check_call_access): ...to this.
(pass_waccess::check): Rename...
(pass_waccess::check_block): ...to this.
(pass_waccess::check_pointer_uses): New function.
(pass_waccess::gimple_call_return_arg): New function.
(pass_waccess::warn_invalid_pointer): New function.
(pass_waccess::check_builtin): Handle free and realloc.
(gimple_use_after_inval_p): New function.
(get_realloc_lhs): New function.
(maybe_warn_mismatched_realloc): New function.
(pointers_related_p): New function.
(pass_waccess::check_call): Call check_pointer_uses.
(pass_waccess::execute): Compute and free dominance info.

libcpp/ChangeLog:

* files.c (_cpp_find_file): Substitute a valid pointer for
an invalid one to avoid -Wuse-after-free.

libiberty/ChangeLog:

* regex.c: Suppress -Wuse-after-free.

gcc/testsuite/ChangeLog:

* gcc.dg/Wmismatched-dealloc-2.c: Avoid -Wuse-after-free.
* gcc.dg/Wmismatched-dealloc-3.c: Same.
* gcc.dg/attr-alloc_size-6.c: Disable -Wuse-after-free.
* gcc.dg/attr-alloc_size-7.c: Same.
* c-c++-common/Wuse-after-free-2.c: New test.
* c-c++-common/Wuse-after-free-3.c: New test.
* c-c++-common/Wuse-after-free-4.c: New test.
* c-c++-common/Wuse-after-free-5.c: New test.
* c-c++-common/Wuse-after-free-6.c: New test.
* c-c++-common/Wuse-after-free-7.c: New test.
* c-c++-common/Wuse-after-free.c: New test.
* g++.dg/warn/Wdangling-pointer.C: New test.
* g++.dg/warn/Wmismatched-dealloc-3.C: New test.
* g++.dg/warn/Wuse-after-free.C: New test.

diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
index 63fc27a1487..2065402a2b9 100644
--- a/gcc/gimple-ssa-warn-access.cc
+++ b/gcc/gimple-ssa-warn-access.cc
@@ -3397,33 +3417,460 @@ pass_waccess::maybe_check_dealloc_call (gcall *call)
   }
   }
+/* Return true if either USE_STMT's basic block (that of a pointer's use)
+   is dominated by INVAL_STMT's (that of a pointer's invalidating statement,
+   which is either a clobber or a deallocation call), or if they're in
+   the same block, USE_STMT follows INVAL_STMT.  */
+
+static bool
+gimple_use_after_inval_p (gimple 

[patch] lto: Don't run ipa-comdats pass during LTO

2021-12-06 Thread Sandra Loosemore

The attached patch fixes an ICE in lto1 at lto-partition.c:215 that
was reported by a customer.  Unfortunately I have no test case for
this; the customer's application is a big C++ shared library with lots
of dependencies and proprietary code under NDA.  I did try reducing it
with cvise but couldn't end up with anything small or robust enough to
be suitable, and likewise my attempts to write a small testcase by
hand were not successful in reproducing the bug.  OTOH, I did track
this down in the debugger and I think I have a pretty good
understanding of what the problem is, namely...

The symbol lto-partition was failing on was a vtable symbol that was
in a comdat group but not marked as DECL_COMDAT and without the right
visibility flags for a comdat symbol.  The LTO data in the input .o
files is correct and lto1 is creating the symbol correctly when it's
read in, but the problem is that there are two optimization passes
being run before lto-partition that are trying to do conflicting
things with COMDATs.

The first is the ipa-visibility pass.  When this pass is run as part
of lto1, since it has the complete program or shared library (as in
this case) available to analyze, it tries to break up COMDATs and turn
them into ordinary local symbols.  But then the ipa-comdats pass,
which is also run as part of regular IPA optimizations at compile
time, tries to do the reverse and move local symbols only referenced
from a COMDAT into that same COMDAT, but in the process it is failing
to set all the flags needed by the LTO partitioner to correctly
process the symbol.  It's possible the failure to set the flags
properly is a bug in ipa-comdats, but looking at the bigger picture,
it makes no sense to be running this optimization pass at link time at
all.  It is a compile-time optimization intended to reduce code size
by letting the linker keep only one copy of these symbols that may be
redundantly defined in multiple objects.

So the patch I've come up with disables the ipa-comdats pass in the
same situations in which ipa-visibility localizes COMDAT symbols, to
remove the conflict between the two passes.  This fixes the customer
problem (in a GCC 10 build for arm-linux-gnueabi), and regression
tests OK on mainline x86_64-linux-gnu as well.  OK for mainline, and
to backport to older branches?  It looked to me like none of this
code had been touched for years.

-Sandra
commit 559b1911c825c8df273d6cc7c403a3d0c79d0295
Author: Sandra Loosemore 
Date:   Sun Dec 5 17:59:19 2021 -0800

lto: Don't run ipa-comdats pass during LTO

When the ipa-visibility pass is invoked during LTO, it attempts to
localize comdat symbols and destroy their comdat group.  The
ipa-comdats pass subsequently attempts to do the opposite and move
symbols into comdats, but without marking them as DECL_COMDAT or
restoring appropriate visibility flags.  This confuses the
lto-partition pass and leads to ICEs.  Running the ipa-comdats pass at
link time seems counterproductive in terms of optimization, so the
simplest fix is not to do that.

2021-12-05  Sandra Loosemore  

	gcc/
	* ipa-comdats.c (pass_ipa_comdats::gate): Skip this pass if it
	would un-do the comdat localization already performed at LTO
	time in ipa-visibility.c.

diff --git a/gcc/ipa-comdats.c b/gcc/ipa-comdats.c
index a9d2993..d178734 100644
--- a/gcc/ipa-comdats.c
+++ b/gcc/ipa-comdats.c
@@ -428,6 +428,13 @@ public:
 bool
 pass_ipa_comdats::gate (function *)
 {
+  if ((in_lto_p || flag_whole_program) && !flag_incremental_link)
+    /* This is the combination of flags cgraph_externally_visible_p in
+       ipa-visibility.c uses to enable localization of comdat symbols.
+       Do not attempt to undo that by trying to re-insert symbols into
+       comdats without their original visibility information, as it
+       causes assertion failures in lto-partition.  */
+    return false;
   return HAVE_COMDAT_GROUP;
 }
 


[committed] analyzer: fix equivalence class state purging [PR103533]

2021-12-06 Thread David Malcolm via Gcc-patches
Whilst debugging state explosions seen when enabling taint detection
with -fanalyzer (PR analyzer/103533), I noticed that constraint
manager instances could contain stray, redundant constants, such
as this instance:

constraint_manager:
  equiv classes:
ec0: {(int)0 == [m_constant]‘0’}
ec1: {(size_t)4 == [m_constant]‘4’}
  constraints:

where there are two equivalence classes, each just containing a
constant, with no constraints using them.

This patch makes constraint_manager::canonicalize more aggressive
about purging state, handling the case of purging a redundant
EC containing just a constant.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r12-5815-gc9543403c19fdc3c3b5a8db8546340de085bd14e.

gcc/analyzer/ChangeLog:
PR analyzer/103533
* constraint-manager.cc (equiv_class::contains_non_constant_p):
New.
(constraint_manager::canonicalize): Call it when determining
redundant ECs.
(selftest::test_purging): New selftest.
(selftest::run_constraint_manager_tests): Likewise.
* constraint-manager.h (equiv_class::contains_non_constant_p):
New decl.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/constraint-manager.cc | 149 -
 gcc/analyzer/constraint-manager.h  |   2 +
 2 files changed, 149 insertions(+), 2 deletions(-)

diff --git a/gcc/analyzer/constraint-manager.cc b/gcc/analyzer/constraint-manager.cc
index ea6b5dc60e0..76e44e77d0a 100644
--- a/gcc/analyzer/constraint-manager.cc
+++ b/gcc/analyzer/constraint-manager.cc
@@ -1145,6 +1145,30 @@ equiv_class::canonicalize ()
   m_vars.qsort (svalue::cmp_ptr_ptr);
 }
 
+/* Return true if this EC contains a variable, false if it merely
+   contains constants.
+   Subroutine of constraint_manager::canonicalize, for removing
+   redundant ECs.  */
+
+bool
+equiv_class::contains_non_constant_p () const
+{
+  if (m_constant)
+{
+  for (auto iter : m_vars)
+   if (iter->maybe_get_constant ())
+ continue;
+   else
+ /* We have {non-constant == constant}.  */
+ return true;
+  /* We only have constants.  */
+  return false;
+}
+  else
+/* Return true if we have {non-constant == non-constant}.  */
+return m_vars.length () > 1;
+}
+
 /* Get a debug string for C_OP.  */
 
 const char *
@@ -2718,8 +2742,7 @@ constraint_manager::canonicalize ()
   {
equiv_class *ec = m_equiv_classes[i];
if (!used_ecs.contains (ec)
-   && ((ec->m_vars.length () < 2 && ec->m_constant == NULL_TREE)
-   || (ec->m_vars.length () == 0)))
+   && !ec->contains_non_constant_p ())
  {
m_equiv_classes.unordered_remove (i);
delete ec;
@@ -3704,6 +3727,127 @@ test_many_constants ()
 }
 }
 
+/* Verify that purging state relating to a variable doesn't leave stray
+   equivalence classes (after canonicalization).  */
+
+static void
+test_purging (void)
+{
+  tree int_0 = build_int_cst (integer_type_node, 0);
+  tree a = build_global_decl ("a", integer_type_node);
+  tree b = build_global_decl ("b", integer_type_node);
+
+  /*  "a != 0".  */
+  {
+region_model_manager mgr;
+region_model model (&mgr);
+ADD_SAT_CONSTRAINT (model, a, NE_EXPR, int_0);
+ASSERT_EQ (model.get_constraints ()->m_equiv_classes.length (), 2);
+ASSERT_EQ (model.get_constraints ()->m_constraints.length (), 1);
+
+/* Purge state for "a".  */
+const svalue *sval_a = model.get_rvalue (a, NULL);
+model.purge_state_involving (sval_a, NULL);
+model.canonicalize ();
+/* We should have an empty constraint_manager.  */
+ASSERT_EQ (model.get_constraints ()->m_equiv_classes.length (), 0);
+ASSERT_EQ (model.get_constraints ()->m_constraints.length (), 0);
+  }
+
+  /*  "a != 0" && "b != 0".  */
+  {
+region_model_manager mgr;
+region_model model (&mgr);
+ADD_SAT_CONSTRAINT (model, a, NE_EXPR, int_0);
+ADD_SAT_CONSTRAINT (model, b, NE_EXPR, int_0);
+ASSERT_EQ (model.get_constraints ()->m_equiv_classes.length (), 3);
+ASSERT_EQ (model.get_constraints ()->m_constraints.length (), 2);
+
+/* Purge state for "a".  */
+const svalue *sval_a = model.get_rvalue (a, NULL);
+model.purge_state_involving (sval_a, NULL);
+model.canonicalize ();
+/* We should just have the constraint/ECs involving b != 0.  */
+ASSERT_EQ (model.get_constraints ()->m_equiv_classes.length (), 2);
+ASSERT_EQ (model.get_constraints ()->m_constraints.length (), 1);
+ASSERT_CONDITION_TRUE (model, b, NE_EXPR, int_0);
+  }
+
+  /*  "a != 0" && "b == 0".  */
+  {
+region_model_manager mgr;
+region_model model (&mgr);
+ADD_SAT_CONSTRAINT (model, a, NE_EXPR, int_0);
+ADD_SAT_CONSTRAINT (model, b, EQ_EXPR, int_0);
+ASSERT_EQ (model.get_constraints ()->m_equiv_classes.length (), 2);
+ASSERT_EQ (model.get_constraints ()->m_constraints.length (), 1);
+
+/* Purge state for "a".  */
+const svalue 

[PATCH] PR fortran/103588 - ICE: Simplification error in gfc_ref_dimen_size, at fortran/array.c:2407

2021-12-06 Thread Harald Anlauf via Gcc-patches
Dear all,

using a bad expression as stride in a subsequent array section did
lead to an internal error which was directly invoked after the
failure.  We better return a failure code to let error recovery
do its expected job.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From 1487d327b13b45acca79c0c691a748ca1a50bc04 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Mon, 6 Dec 2021 23:34:17 +0100
Subject: [PATCH] Fortran: catch failed simplification of	bad stride
 expression

gcc/fortran/ChangeLog:

	PR fortran/103588
	* array.c (gfc_ref_dimen_size): Do not generate internal error on
	failed simplification of stride expression; just return failure.

gcc/testsuite/ChangeLog:

	PR fortran/103588
	* gfortran.dg/pr103588.f90: New test.
---
 gcc/fortran/array.c| 8 +++-
 gcc/testsuite/gfortran.dg/pr103588.f90 | 8 
 2 files changed, 11 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pr103588.f90

diff --git a/gcc/fortran/array.c b/gcc/fortran/array.c
index 5762c8d92d4..5f9ed17f919 100644
--- a/gcc/fortran/array.c
+++ b/gcc/fortran/array.c
@@ -2403,11 +2403,9 @@ gfc_ref_dimen_size (gfc_array_ref *ar, int dimen, mpz_t *result, mpz_t *end)
 	{
 	  stride_expr = gfc_copy_expr(ar->stride[dimen]);

-	  if(!gfc_simplify_expr(stride_expr, 1))
-	gfc_internal_error("Simplification error");
-
-	  if (stride_expr->expr_type != EXPR_CONSTANT
-	  || mpz_cmp_ui (stride_expr->value.integer, 0) == 0)
+	  if (!gfc_simplify_expr (stride_expr, 1)
+	 || stride_expr->expr_type != EXPR_CONSTANT
+	 || mpz_cmp_ui (stride_expr->value.integer, 0) == 0)
 	{
 	  mpz_clear (stride);
 	  return false;
diff --git a/gcc/testsuite/gfortran.dg/pr103588.f90 b/gcc/testsuite/gfortran.dg/pr103588.f90
new file mode 100644
index 000..198e1766cd2
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr103588.f90
@@ -0,0 +1,8 @@
+! { dg-do compile }
+! PR fortran/103588 - ICE: Simplification error in gfc_ref_dimen_size
+! Contributed by G.Steinmetz
+
+program p
+  integer, parameter :: a(:) = [1,2] ! { dg-error "cannot be automatic or of deferred shape" }
+  integer :: b(2) = a(::a(1))! { dg-error "Invalid" }
+end
--
2.26.2



[COMMITTED] rs6000: Fix errant "vector" instead of "__vector"

2021-12-06 Thread Paul A. Clarke via Gcc-patches
Committed as trivial and obvious.

Fixes 85289ba36c2e62de84cc0232c954d9a74bda708a.

2021-12-06  Paul A. Clarke  

gcc
PR target/103545
* config/rs6000/xmmintrin.h (_mm_movemask_ps): Replace "vector" with
"__vector".
---
 gcc/config/rs6000/xmmintrin.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/xmmintrin.h b/gcc/config/rs6000/xmmintrin.h
index 4c093fd1d5ae..31d26add50b3 100644
--- a/gcc/config/rs6000/xmmintrin.h
+++ b/gcc/config/rs6000/xmmintrin.h
@@ -1353,7 +1353,7 @@ extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artifici
 _mm_movemask_ps (__m128  __A)
 {
 #ifdef _ARCH_PWR10
-  return vec_extractm ((vector unsigned int) __A);
+  return vec_extractm ((__vector unsigned int) __A);
 #else
   __vector unsigned long long result;
   static const __vector unsigned int perm_mask =
-- 
2.27.0



[PATCH] PR fortran/103591 - ICE in gfc_compare_string, at fortran/arith.c:1119

2021-12-06 Thread Harald Anlauf via Gcc-patches
Dear all,

we didn't check the type of the upper bound in a case range.
Bummer.  Simply add a corresponding check.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From b4e7aeae4f6c59d8fe950d7981832e3f9c6a8f0e Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Mon, 6 Dec 2021 23:15:11 +0100
Subject: [PATCH] Fortran: add check for type of upper bound in case range

gcc/fortran/ChangeLog:

	PR fortran/103591
	* match.c (match_case_selector): Check type of upper bound in case
	range.

gcc/testsuite/ChangeLog:

	PR fortran/103591
	* gfortran.dg/select_9.f90: New test.
---
 gcc/fortran/match.c|  9 +
 gcc/testsuite/gfortran.dg/select_9.f90 | 10 ++
 2 files changed, 19 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/select_9.f90

diff --git a/gcc/fortran/match.c b/gcc/fortran/match.c
index 2bf21434a42..52bc5af7542 100644
--- a/gcc/fortran/match.c
+++ b/gcc/fortran/match.c
@@ -6075,6 +6075,15 @@ match_case_selector (gfc_case **cp)
	  m = gfc_match_init_expr (&c->high);
 	  if (m == MATCH_ERROR)
 	goto cleanup;
+	  if (m == MATCH_YES
+	  && c->high->ts.type != BT_LOGICAL
+	  && c->high->ts.type != BT_INTEGER
+	  && c->high->ts.type != BT_CHARACTER)
+	{
+	  gfc_error ("Expression in CASE selector at %L cannot be %s",
+			 &c->high->where, gfc_typename (c->high));
+	  goto cleanup;
+	}
 	  /* MATCH_NO is fine.  It's OK if nothing is there!  */
 	}
 }
diff --git a/gcc/testsuite/gfortran.dg/select_9.f90 b/gcc/testsuite/gfortran.dg/select_9.f90
new file mode 100644
index 000..c580e8162bd
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/select_9.f90
@@ -0,0 +1,10 @@
+! { dg-do compile }
+! PR fortran/103591 - ICE in gfc_compare_string
+! Contributed by G.Steinmetz
+
+program p
+  integer :: n
+  select case (n)
+  case ('1':2.)   ! { dg-error "cannot be REAL" }
+  end select
+end
--
2.26.2



[COMMITTED] MAINTAINERS: Add myself to write after approval and DCO

2021-12-06 Thread Navid Rahimi via Gcc-patches
MAINTAINERS: Add myself to write after approval and DCO sections.

 * MAINTAINERS: Adding myself.

Best wishes,
Navid.

0001-MAINTAINERS-Add-myself-to-write-after-approval-and-D.patch
Description: 0001-MAINTAINERS-Add-myself-to-write-after-approval-and-D.patch


Re: [PATCH v2] fix PR 103143

2021-12-06 Thread Martin Sebor via Gcc-patches

On 12/6/21 1:14 PM, Jeff Law wrote:



On 12/6/2021 10:31 AM, Martin Sebor wrote:

I have broken up the patch into a series of six.  Attached is
part (1), the fix for the typo that causes PR 103143.

On 12/3/21 5:00 PM, Jeff Law wrote:



On 11/8/2021 7:34 PM, Martin Sebor via Gcc-patches wrote:

The pointer-query code that implements compute_objsize() that's
in turn used by most middle end access warnings now has a few
warts in it and (at least) one bug.  With the exception of
the bug the warts aren't behind any user-visible bugs that
I know of but they do cause problems in new code I've been
implementing on top of it.  Besides fixing the one bug (just
a typo) the attached patch cleans up these latent issues:

1) It moves the bndrng member from the access_ref class to
   access_data.  As a FIXME in the code notes, the member never
   did belong in the former and only takes up space in the cache.

2) The compute_objsize_r() function is big, unwieldy, and tedious
   to step through because of all the if statements that are better
   coded as one switch statement.  This change factors out more
   of its code into smaller handler functions as has been suggested
   and done a few times before.

3) (2) exposed a few places where I fail to pass the current
   GIMPLE statement down to ranger.  This leads to worse quality
   range info, including possible false positives and negatives.
   I just spotted these problems in code review but I haven't
   taken the time to come up with test cases.  This change fixes
   these oversights as well.

4) The handling of PHI statements is also in one big, hard-to-
   follow function.  This change moves the handling of each PHI
   argument into its own handler which merges it into the previous
   argument.  This makes the code easier to work with and opens it
   to reuse also for MIN_EXPR and MAX_EXPR.  (This is primarily
   used to print informational notes after warnings.)

5) Finally, the patch factors code to dump each access_ref
   cached by the pointer_query cache out of pointer_query::dump
   and into access_ref::dump.  This helps with debugging.

These changes should have no user-visible effect and other than
a regression test for the typo (PR 103143) come with no tests.
They've been tested on x86_64-linux.
Sigh.  You've identified 6 distinct changes above.  The 5 you've 
enumerated plus a typo fix somewhere.  There's no reason why they 
need to be a single patch and many reasons why they should be a 
series of independent patches.    Combining them into a single patch 
isn't how we do things and it hides the actual bugfix in here.


Please send a fix for the typo first since that should be able to 
trivially go forward.  Then  a patch for item #1.  That should be 
trivial to review when it's pulled out from the rest of the patch.
Beyond that, your choice on ordering, but you need to break this down.





Jeff




gcc-103413.diff

commit 9a5bb7a2b0cdb8654061d9cba543c1408fa7adc9
Author: Martin Sebor
Date:   Sat Dec 4 16:22:07 2021 -0700

 Use the recursive form of compute_objsize [PR 103143].
 
 gcc/ChangeLog:
 
 PR middle-end/103143

 * pointer-query.cc (gimple_call_return_array): Call 
compute_objsize_r.
 
 gcc/testsuite/ChangeLog:

 PR middle-end/103143
 * gcc.dg/Wstringop-overflow-83.c: New test.
Thanks.  Presumably by going through the public API, qry->depth got 
reset to 0  and/or we'd get a re-initialized snlim at each call, leading 
to the infinite recursion?


Yes, each call creates a new ssa_name_limit_t::visited bitmap.
It might be better to make compute_objsize_r() a member of
the same class as the bitmap and also rename it, to make it
harder to make this mistake again.
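To make the failure mode concrete, here is a schematic, much-simplified
sketch (the names are invented, not the actual pointer-query code) of why
re-entering through the public entry point loses the cycle-detection state:

  #include <set>

  struct limit { std::set<int> visited; };   // stand-in for ssa_name_limit_t

  static bool compute_r (int name, limit &lim);

  static bool
  compute (int name)                 // public entry point
  {
    limit lim;                       // fresh visited set on every call
    return compute_r (name, lim);
  }

  static bool
  compute_r (int name, limit &lim)   // recursive worker, shares LIM
  {
    if (!lim.visited.insert (name).second)
      return false;                  // already visited: cycle detected, stop
    /* Calling compute (other_name) here instead of compute_r (other_name, lim)
       discards LIM, so a cyclic chain of names can recurse forever.  */
    return true;
  }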

Martin



OK for the trunk.  Thanks for breaking this out.

jeff





Re: [PATCH v2] c++: Handle auto(x) in parameter-declaration-clause [PR103401]

2021-12-06 Thread Jason Merrill via Gcc-patches

On 12/3/21 19:44, Marek Polacek wrote:

On Thu, Dec 02, 2021 at 12:56:38PM -0500, Jason Merrill wrote:

On 12/2/21 10:27, Marek Polacek wrote:

On Wed, Dec 01, 2021 at 11:24:58PM -0500, Jason Merrill wrote:

On 12/1/21 10:16, Marek Polacek wrote:

In C++23, auto(x) is valid, so decltype(auto(x)) should also be valid,
so

 void f(decltype(auto(0)));

should be just as

 void f(int);

but currently, every time we see 'auto' in a parameter-declaration-clause,
we try to synthesize_implicit_template_parm for it, creating a new template
parameter list.  The code above actually has us calling s_i_t_p twice;
once from cp_parser_decltype_expr -> cp_parser_postfix_expression which
fails and then again from cp_parser_decltype_expr -> cp_parser_expression.
So it looks like we have f and we accept ill-formed code.

So we need to be more careful about synthesizing the implicit template
parameter.  cp_parser_postfix_expression looked like a sensible place.


Does this cover other uses of auto in decltype, such as

void f(decltype(new auto{0}));


Yes: the clearing of auto_is_implicit_function_template_parm_p will happen here
too.

However, I'm noticing this:

void f1(decltype(new auto{0}));
void f2(decltype(new int{0}));

void
g ()
{
  int i;
  void f3(decltype(new auto{0}));
  void f4(decltype(new int{0}));
  f1 (&i); // error: no matching function for call to f1(int*)
           // couldn't deduce template parameter auto:1
  f2 (&i);
  f3 (&i);
  f4 (&i);
}
I think the error we issue is bogus.  (My patch doesn't change this.  clang++
accepts.)  Should I file a PR (and investigate)?


That certainly suggests that auto_is_implicit_function_template_parm_p isn't
getting cleared soon enough for f1.


Exactly right.
  

?  Should we adjust this flag in cp_parser_decltype along with all the other
flags?


I think that's possible, but wouldn't cover auto in default arguments, or array
bounds.


I guess cp_parser_sizeof_operand would need the same change.

Do we currently handle auto in default arguments wrong?  Ah, I see that we
currently set auto_is_... for the whole parameter declaration clause, rather
than just for the decl-specifier-seq of parameters as the standard
specifies:

"A placeholder-type-specifier of the form type-constraint opt auto can be
used as a decl-specifier of the decl-specifier-seq of a
parameter-declaration of a function declaration or lambda-expression"


Thanks.  How about this then?  The patch gives the rationale.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
In C++23, auto(x) is valid, so decltype(auto(x)) should also be valid,
so

   void f(decltype(auto(0)));

should be just as

   void f(int);

but currently, every time we see 'auto' in a parameter-declaration-clause,
we try to synthesize_implicit_template_parm for it, creating a new template
parameter list.  The code above actually has us calling s_i_t_p twice;
once from cp_parser_decltype_expr -> cp_parser_postfix_expression which
fails and then again from cp_parser_decltype_expr -> cp_parser_expression.
So it looks like we have f and we accept ill-formed code.

This shows that we need to be more careful about synthesizing the
implicit template parameter.  [dcl.spec.auto.general] says that "A
placeholder-type-specifier of the form type-constraint opt auto can be
used as a decl-specifier of the decl-specifier-seq of a
parameter-declaration of a function declaration or lambda-expression..."
so this patch turns off auto_is_... after we've parsed the decl-specifier-seq.

That doesn't quite cut it yet though, because we also need to handle an
auto nested in the decl-specifier:

   void f(decltype(new auto{0}));

therefore the cp_parser_decltype change.

The second hunk broke lambda-generic-85713-2.C but I think the error we
issue with this patch is in fact correct, and clang++ agrees.

The r11-1913 change is OK: we need to make sure that we see '(auto)' after
decltype to go ahead with 'decltype(auto)'.

PR c++/103401

gcc/cp/ChangeLog:

* parser.c (cp_parser_decltype): Clear
auto_is_implicit_function_template_parm_p.
(cp_parser_parameter_declaration): Clear
auto_is_implicit_function_template_parm_p after parsing the
decl-specifier-seq.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/lambda-generic-85713-2.C: Add dg-error.
* g++.dg/cpp23/auto-fncast7.C: New test.
* g++.dg/cpp23/auto-fncast8.C: New test.
* g++.dg/cpp23/auto-fncast9.C: New test.
---
  gcc/cp/parser.c   | 19 +++
  .../g++.dg/cpp1y/lambda-generic-85713-2.C |  2 +-
  gcc/testsuite/g++.dg/cpp23/auto-fncast7.C |  9 +
  gcc/testsuite/g++.dg/cpp23/auto-fncast8.C | 34 +++
  gcc/testsuite/g++.dg/cpp23/auto-fncast9.C | 17 ++
  5 files changed, 80 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp23/auto-fncast7.C
  create mode 100644 

Re: [PATCH] c++: Fix for decltype and bit-fields [PR95009]

2021-12-06 Thread Jason Merrill via Gcc-patches

On 12/4/21 15:26, Marek Polacek wrote:

Here, decltype deduces the wrong type for certain expressions involving
bit-fields.  Unlike in C, in C++ bit-field width is explicitly not part
of the type, so I think decltype should never deduce to 'int:N'.  The
problem isn't that we're not calling unlowered_expr_type--we are--it's
that is_bitfield_expr_with_lowered_type only handles certain codes, but
not others.  For example, += works fine but ++ does not.
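
To make that concrete, here are a few checks of the intended behaviour
(a hypothetical sketch, not the testcase from the patch; it relies only
on the bit-field width not being part of the type and on pre-increments
being lvalues, hence T&):

  template<typename T, typename U> struct same { static const bool value = false; };
  template<typename T> struct same<T, T> { static const bool value = true; };

  struct A { int i : 31; unsigned long l : 37; } a;

  static_assert (same<decltype (a.i += 1), int&>::value, "");        // already worked
  static_assert (same<decltype (++a.i), int&>::value, "");           // fixed: lvalue
  static_assert (same<decltype (a.i++), int>::value, "");            // fixed: prvalue
  static_assert (same<decltype (-a.i), int>::value, "");             // NEGATE_EXPR
  static_assert (same<decltype (~a.i), int>::value, "");             // BIT_NOT_EXPR
  static_assert (same<decltype (a.l++), unsigned long>::value, "");  // not 'unsigned long:37'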

This also fixes decltype-bitfield2.C where we were crashing (!), but
unfortunately it does not fix 84516 or 70733 where the problem is likely
a missing call to unlowered_expr_type.  It occurs to me now that typeof
likely has had the same issue, but this patch should fix that too.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/11?


OK.


PR c++/95009

gcc/cp/ChangeLog:

* typeck.c (is_bitfield_expr_with_lowered_type) :
Handle UNARY_PLUS_EXPR, NEGATE_EXPR, NON_LVALUE_EXPR, BIT_NOT_EXPR,
*CREMENT_EXPR too.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/decltype-bitfield1.C: New test.
* g++.dg/cpp0x/decltype-bitfield2.C: New test.
---
  gcc/cp/typeck.c   | 14 +++-
  .../g++.dg/cpp0x/decltype-bitfield1.C | 65 +++
  .../g++.dg/cpp0x/decltype-bitfield2.C | 18 +
  3 files changed, 94 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/decltype-bitfield1.C
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/decltype-bitfield2.C

diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index 5ed9a5ab9ee..4e60db40c76 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -2209,9 +2209,9 @@ invalid_nonstatic_memfn_p (location_t loc, tree expr, 
tsubst_flags_t complain)
return false;
  }
  
-/* If EXP is a reference to a bitfield, and the type of EXP does not

-   match the declared type of the bitfield, return the declared type
-   of the bitfield.  Otherwise, return NULL_TREE.  */
+/* If EXP is a reference to a bit-field, and the type of EXP does not
+   match the declared type of the bit-field, return the declared type
+   of the bit-field.  Otherwise, return NULL_TREE.  */
  
  tree

  is_bitfield_expr_with_lowered_type (const_tree exp)
@@ -2230,6 +2230,14 @@ is_bitfield_expr_with_lowered_type (const_tree exp)
  
  case MODIFY_EXPR:

  case SAVE_EXPR:
+case UNARY_PLUS_EXPR:
+case PREDECREMENT_EXPR:
+case PREINCREMENT_EXPR:
+case POSTDECREMENT_EXPR:
+case POSTINCREMENT_EXPR:
+case NEGATE_EXPR:
+case NON_LVALUE_EXPR:
+case BIT_NOT_EXPR:
return is_bitfield_expr_with_lowered_type (TREE_OPERAND (exp, 0));
  
  case COMPONENT_REF:

diff --git a/gcc/testsuite/g++.dg/cpp0x/decltype-bitfield1.C 
b/gcc/testsuite/g++.dg/cpp0x/decltype-bitfield1.C
new file mode 100644
index 000..2d8d8e81bff
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/decltype-bitfield1.C
@@ -0,0 +1,65 @@
+// PR c++/95009
+// { dg-do compile { target c++11 } }
+
+struct false_type { static constexpr bool value = false; };
+struct true_type { static constexpr bool value = true; };
+template
+struct is_same : false_type {};
+template
+struct is_same : true_type {};
+
+struct A {
+  int i : 31;
+  unsigned long l : 37;
+} a;
+
+void
+g ()
+{
+  // Careful: pre{in,de}crements are lvalues -> deduce T&.  */
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  static_assert (is_same::value, "");
+  

Re: [EXTERNAL] Re: [PATCH] tree-optimization/103514 Missing XOR-EQ-AND Optimization

2021-12-06 Thread Navid Rahimi via Gcc-patches
Hi Marc, thanks for clear explanation.

Actually I have to withdraw this patch.  As you noticed, there are some problems 
with it.  I was testing it yesterday and realized I had made a mistake by 
combining different types in this pattern.

The same approach works if and only if the type of every operand is 
boolean.  If any operand is an integer, the transformation does not hold.

See [1] for an example and a link to a proof for each case.

One actual example, with concrete input values:

(a & b) ^ (a == b) -> !(a | b)

a = 2 and b = 2,
src:
(a == b) => true
(a & b) => 2
(a & b) ^ (a == b) => (2) ^ (true) => 3 (the compiler will consider true as 1) [2].

tgt:
(a | b) => 2
!(a | b) => !(2) => false.

This means the transformation is incorrect for other types.  (The same 
transformation does work for bool types, so in my next patch I will keep 
the approach but restrict it to bool types only.)
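
For completeness, the counterexample above as a tiny standalone program
(a throwaway sketch; the compiler-explorer link in [1] has the full proofs):

  #include <assert.h>

  int main (void)
  {
    int a = 2, b = 2;
    int src = (a & b) ^ (a == b);   /* 2 ^ 1 == 3 */
    int tgt = !(a | b);             /* !2 == 0    */
    assert (src != tgt);            /* so the rewrite is wrong for int */
    return 0;
  }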

1) https://compiler-explorer.com/z/h7hcohY74
2) https://eel.is/c++draft/conv.prom#6

Best wishes,
Navid.


From: Marc Glisse 
Sent: Saturday, December 4, 2021 13:22
To: gcc-patches@gcc.gnu.org
Cc: Navid Rahimi
Subject: [EXTERNAL] Re: [PATCH] tree-optimization/103514 Missing XOR-EQ-AND 
Optimization

+/* (a & b) ^ (a == b) -> !(a | b) */
+/* (a & b) == (a ^ b) -> !(a | b) */
+(for first_op (bit_xor eq)
+ second_op (eq bit_xor)
+ (simplify
+  (first_op:c (bit_and:c @0 @1) (second_op:c @0 @1))
+   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+&& types_match (TREE_TYPE (@0), TREE_TYPE (@1)))
+(convert (bit_not (bit_ior @0 @1))

I don't think you need types_match, if both are operands of bit_and, their
types must already match.

It isn't clear what the INTEGRAL_TYPE_P test is for. Your 2
transformations don't seem that similar to me. The first one requires that
a and b have the same type as the result of ==, so they are boolean-like.
The second one makes sense for more general integers, but then it looks
like it should produce (a|b)==0.
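
Indeed: (a & b) and (a ^ b) never share a set bit, so they can only
compare equal when both are zero, i.e. exactly when (a | b) == 0.  A
throwaway brute-force check of that claim (not part of any patch):

  #include <assert.h>

  int main (void)
  {
    for (unsigned a = 0; a < 256; a++)
      for (unsigned b = 0; b < 256; b++)
        assert (((a & b) == (a ^ b)) == ((a | b) == 0));
    return 0;
  }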

It doesn't look like we have a canonical representation between a^b and
a!=b for booleans :-(

(sorry for the broken thread, I was automatically unsubscribed because
mailman doesn't like greylisting)

--
Marc Glisse


[COMMITTED] bpf: mark/remove unused arguments and remove an unused function

2021-12-06 Thread Jose E. Marchesi via Gcc-patches
This patch does a little bit of cleanup by removing some unused
arguments, or marking them as unused.  It also removes the function
ctfc_debuginfo_early_finish_p and the corresponding hook macro
definition, which are not used by GCC.

gcc/
* config/bpf/bpf.c (bpf_handle_preserve_access_index_attribute):
Mark arguments `args' and flags' as unused.
(bpf_core_newdecl): Remove unused local `newdecl'.
(bpf_core_newdecl): Remove unused argument `loc'.
(ctfc_debuginfo_early_finish_p): Remove unused function.
(TARGET_CTFC_DEBUGINFO_EARLY_FINISH_P): Remove definition.
(bpf_core_walk): Do not pass a location to bpf_core_newdecl.
---
 gcc/config/bpf/bpf.c | 25 +
 1 file changed, 5 insertions(+), 20 deletions(-)

diff --git a/gcc/config/bpf/bpf.c b/gcc/config/bpf/bpf.c
index 82bb698bd91..9d2c0bb6818 100644
--- a/gcc/config/bpf/bpf.c
+++ b/gcc/config/bpf/bpf.c
@@ -129,8 +129,8 @@ bpf_handle_fndecl_attribute (tree *node, tree name,
 
 static tree
 bpf_handle_preserve_access_index_attribute (tree *node, tree name,
-   tree args,
-   int flags,
+   tree args ATTRIBUTE_UNUSED,
+   int flags ATTRIBUTE_UNUSED,
bool *no_add_attrs)
 {
   if (TREE_CODE (*node) != RECORD_TYPE && TREE_CODE (*node) != UNION_TYPE)
@@ -258,20 +258,6 @@ bpf_option_override (void)
 #undef TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE bpf_option_override
 
-/* Return FALSE iff -mcore has been specified.  */
-
-static bool
-ctfc_debuginfo_early_finish_p (void)
-{
-  if (TARGET_BPF_CORE)
-return false;
-  else
-return true;
-}
-
-#undef TARGET_CTFC_DEBUGINFO_EARLY_FINISH_P
-#define TARGET_CTFC_DEBUGINFO_EARLY_FINISH_P ctfc_debuginfo_early_finish_p
-
 /* Implement TARGET_ASM_INIT_SECTIONS.  */
 
 static void
@@ -1266,15 +1252,14 @@ bpf_core_get_index (const tree node)
   return -1;
 }
 
-/* Synthesize a new builtin function declaration at LOC with signature TYPE.
+/* Synthesize a new builtin function declaration with signature TYPE.
Used by bpf_resolve_overloaded_builtin to resolve calls to
__builtin_preserve_access_index.  */
 
 static tree
-bpf_core_newdecl (location_t loc, tree type)
+bpf_core_newdecl (tree type)
 {
   tree rettype = build_function_type_list (type, type, NULL);
-  tree newdecl = NULL_TREE;
   char name[80];
   int len = snprintf (name, sizeof (name), "%s", "__builtin_pai_");
 
@@ -1317,7 +1302,7 @@ bpf_core_walk (tree *tp, int *walk_subtrees, void *data)
 
   if (bpf_core_is_maybe_aggregate_access (*tp))
 {
-  tree newdecl = bpf_core_newdecl (loc, TREE_TYPE (*tp));
+  tree newdecl = bpf_core_newdecl (TREE_TYPE (*tp));
   tree newcall = build_call_expr_loc (loc, newdecl, 1, *tp);
   *tp = newcall;
   *walk_subtrees = 0;
-- 
2.11.0



[PATCH 6/6] rs6000: Rename arrays to remove temporary _x suffix

2021-12-06 Thread Bill Schmidt via Gcc-patches
Hi!

While we had two sets of built-in infrastructure at once, I added _x as a
suffix to two arrays to disambiguate the old and new versions.  Time to fix
that also.

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
okay for trunk?

Thanks!
Bill

2021-12-06  Bill Schmidt  

gcc/
* config/rs6000/rs6000-c.c (altivec_build_resolved_builtin): Rename
rs6000_builtin_decls_x to rs6000_builtin_decls.
(altivec_resolve_overloaded_builtin): Likewise.  Also rename
rs6000_builtin_info_x to rs6000_builtin_info.
* config/rs6000/rs6000-call.c (rs6000_invalid_builtin): Rename
rs6000_builtin_info_x to rs6000_builtin_info.
(rs6000_builtin_is_supported): Likewise.
(rs6000_gimple_fold_mma_builtin): Likewise.  Also rename
rs6000_builtin_decls_x to rs6000_builtin_decls.
(rs6000_gimple_fold_builtin): Rename rs6000_builtin_info_x to
rs6000_builtin_info.
(cpu_expand_builtin): Likewise.
(rs6000_expand_builtin): Likewise.
(rs6000_init_builtins): Likewise.  Also rename rs6000_builtin_decls_x
to rs6000_builtin_decls.
(rs6000_builtin_decl): Rename rs6000_builtin_decls_x to
rs6000_builtin_decls.
* config/rs6000/rs6000-gen-builtins.c (write_decls): In generated code,
rename rs6000_builtin_decls_x to rs6000_builtin_decls, and rename
rs6000_builtin_info_x to rs6000_builtin_info.
(write_bif_static_init): In generated code, rename
rs6000_builtin_info_x to rs6000_builtin_info.
(write_init_bif_table): In generated code, rename
rs6000_builtin_decls_x to rs6000_builtin_decls, and rename
rs6000_builtin_info_x to rs6000_builtin_info.
(write_init_ovld_table): In generated code, rename
rs6000_builtin_decls_x to rs6000_builtin_decls.
(write_init_file): Likewise.
* config/rs6000/rs6000.c (rs6000_builtin_vectorized_function):
Likewise.
(rs6000_builtin_md_vectorized_function): Likewise.
(rs6000_builtin_reciprocal): Likewise.
(add_condition_to_bb): Likewise.
(rs6000_atomic_assign_expand_fenv): Likewise.
---
 gcc/config/rs6000/rs6000-c.c| 64 -
 gcc/config/rs6000/rs6000-call.c | 46 +-
 gcc/config/rs6000/rs6000-gen-builtins.c | 27 +--
 gcc/config/rs6000/rs6000.c  | 58 +++---
 4 files changed, 96 insertions(+), 99 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index f790c72d621..e0ebdeed548 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -867,7 +867,7 @@ altivec_build_resolved_builtin (tree *args, int n, tree 
fntype, tree ret_type,
 {
   tree argtypes = TYPE_ARG_TYPES (fntype);
   tree arg_type[MAX_OVLD_ARGS];
-  tree fndecl = rs6000_builtin_decls_x[bif_id];
+  tree fndecl = rs6000_builtin_decls[bif_id];
 
   for (int i = 0; i < n; i++)
 {
@@ -1001,13 +1001,13 @@ altivec_resolve_overloaded_builtin (location_t loc, 
tree fndecl,
  case E_SFmode:
{
  /* For floats use the xvmulsp instruction directly.  */
- tree call = rs6000_builtin_decls_x[RS6000_BIF_XVMULSP];
+ tree call = rs6000_builtin_decls[RS6000_BIF_XVMULSP];
  return build_call_expr (call, 2, arg0, arg1);
}
  case E_DFmode:
{
  /* For doubles use the xvmuldp instruction directly.  */
- tree call = rs6000_builtin_decls_x[RS6000_BIF_XVMULDP];
+ tree call = rs6000_builtin_decls[RS6000_BIF_XVMULDP];
  return build_call_expr (call, 2, arg0, arg1);
}
  /* Other types are errors.  */
@@ -1066,7 +1066,7 @@ altivec_resolve_overloaded_builtin (location_t loc, tree 
fndecl,
vec_safe_push (params, arg0);
vec_safe_push (params, arg1);
tree call = altivec_resolve_overloaded_builtin
- (loc, rs6000_builtin_decls_x[RS6000_OVLD_VEC_CMPEQ],
+ (loc, rs6000_builtin_decls[RS6000_OVLD_VEC_CMPEQ],
   params);
/* Use save_expr to ensure that operands used more than once
   that may have side effects (like calls) are only evaluated
@@ -1076,7 +1076,7 @@ altivec_resolve_overloaded_builtin (location_t loc, tree 
fndecl,
vec_safe_push (params, call);
vec_safe_push (params, call);
return altivec_resolve_overloaded_builtin
- (loc, rs6000_builtin_decls_x[RS6000_OVLD_VEC_NOR], params);
+ (loc, rs6000_builtin_decls[RS6000_OVLD_VEC_NOR], params);
  }
  /* Other types are errors.  */
default:
@@ -1129,9 +1129,9 @@ altivec_resolve_overloaded_builtin (location_t loc, tree 
fndecl,
  vec_safe_push (params, arg1);
 
  if (fcode == 

[PATCH 5/6] rs6000: Rename functions with "new" in their names

2021-12-06 Thread Bill Schmidt via Gcc-patches
Hi!

While we had two sets of built-in functionality at the same time, I put "new"
in the names of quite a few functions.  Time to undo that.

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
okay for trunk?

Thanks!
Bill

2021-12-02  Bill Schmidt  

gcc/
* config/rs6000/rs6000-c.c (altivec_resolve_new_overloaded_builtin):
Remove forward declaration.
(rs6000_new_builtin_type_compatible): Rename to
rs6000_builtin_type_compatible.
(rs6000_builtin_type_compatible): Remove.
(altivec_resolve_overloaded_builtin): Remove.
(altivec_build_new_resolved_builtin): Rename to
altivec_build_resolved_builtin.
(altivec_resolve_new_overloaded_builtin): Rename to
altivec_resolve_overloaded_builtin.  Remove static keyword.  Adjust
called function names.
* config/rs6000/rs6000-call.c (rs6000_expand_new_builtin): Remove
forward declaration.
(rs6000_gimple_fold_new_builtin): Likewise.
(rs6000_invalid_new_builtin): Rename to rs6000_invalid_builtin.
(rs6000_gimple_fold_builtin): Remove.
(rs6000_new_builtin_valid_without_lhs): Rename to
rs6000_builtin_valid_without_lhs.
(rs6000_new_builtin_is_supported): Rename to
rs6000_builtin_is_supported.
(rs6000_gimple_fold_new_mma_builtin): Rename to
rs6000_gimple_fold_mma_builtin.
(rs6000_gimple_fold_new_builtin): Rename to
rs6000_gimple_fold_builtin.  Remove static keyword.  Adjust called
function names.
(rs6000_expand_builtin): Remove.
(new_cpu_expand_builtin): Rename to cpu_expand_builtin.
(new_mma_expand_builtin): Rename to mma_expand_builtin.
(new_htm_spr_num): Rename to htm_spr_num.
(new_htm_expand_builtin): Rename to htm_expand_builtin.  Change name
of called function.
(rs6000_expand_new_builtin): Rename to rs6000_expand_builtin.  Remove
static keyword.  Adjust called function names.
(rs6000_new_builtin_decl): Rename to rs6000_builtin_decl.  Remove
static keyword.
(rs6000_builtin_decl): Remove.
* config/rs6000/rs6000-gen-builtins.c (write_decls): In generated code,
rename rs6000_new_builtin_is_supported to rs6000_builtin_is_supported.
* config/rs6000/rs6000-internal.h (rs6000_invalid_new_builtin): Rename
to rs6000_invalid_builtin.
* config/rs6000/rs6000.c (rs6000_new_builtin_vectorized_function):
Rename to rs6000_builtin_vectorized_function.
(rs6000_new_builtin_md_vectorized_function): Rename to
rs6000_builtin_md_vectorized_function.
(rs6000_builtin_vectorized_function): Remove.
(rs6000_builtin_md_vectorized_function): Remove.
---
 gcc/config/rs6000/rs6000-c.c| 120 +---
 gcc/config/rs6000/rs6000-call.c |  99 ++-
 gcc/config/rs6000/rs6000-gen-builtins.c |   3 +-
 gcc/config/rs6000/rs6000-internal.h |   2 +-
 gcc/config/rs6000/rs6000.c  |  31 ++
 5 files changed, 80 insertions(+), 175 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index d44edf585aa..f790c72d621 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -37,9 +37,6 @@
 
 #include "rs6000-internal.h"
 
-static tree altivec_resolve_new_overloaded_builtin (location_t, tree, void *);
-
-
 /* Handle the machine specific pragma longcall.  Its syntax is
 
# pragma longcall ( TOGGLE )
@@ -817,7 +814,7 @@ is_float128_p (tree t)
 
 /* Return true iff ARGTYPE can be compatibly passed as PARMTYPE.  */
 static bool
-rs6000_new_builtin_type_compatible (tree parmtype, tree argtype)
+rs6000_builtin_type_compatible (tree parmtype, tree argtype)
 {
   if (parmtype == error_mark_node)
 return false;
@@ -840,23 +837,6 @@ rs6000_new_builtin_type_compatible (tree parmtype, tree 
argtype)
   return lang_hooks.types_compatible_p (parmtype, argtype);
 }
 
-static inline bool
-rs6000_builtin_type_compatible (tree t, int id)
-{
-  tree builtin_type;
-  builtin_type = rs6000_builtin_type (id);
-  if (t == error_mark_node)
-return false;
-  if (INTEGRAL_TYPE_P (t) && INTEGRAL_TYPE_P (builtin_type))
-return true;
-  else if (TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128
-  && is_float128_p (t) && is_float128_p (builtin_type))
-return true;
-  else
-return lang_hooks.types_compatible_p (t, builtin_type);
-}
-
-
 /* In addition to calling fold_convert for EXPR of type TYPE, also
call c_fully_fold to remove any C_MAYBE_CONST_EXPRs that could be
hiding there (PR47197).  */
@@ -873,16 +853,6 @@ fully_fold_convert (tree type, tree expr)
   return result;
 }
 
-/* Implementation of the resolve_overloaded_builtin target hook, to
-   support Altivec's overloaded builtins.  */
-
-tree
-altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
-   void 

[PATCH 4/6] rs6000: Remove rs6000-builtin.def and associated data and functions

2021-12-06 Thread Bill Schmidt via Gcc-patches
Hi!

The old rs6000-builtin.def file is no longer needed.  Remove it and the code
that depends on it.

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
okay for trunk?

Thanks!
Bill

2021-12-02  Bill Schmidt  

gcc/
* config/rs6000/rs6000-builtin.def: Delete.
* config/rs6000/rs6000-call.c (builtin_compatibility): Delete.
(builtin_description): Delete.
(builtin_hash_struct): Delete.
(builtin_hasher): Delete.
(builtin_hash_table): Delete.
(builtin_hasher::hash): Delete.
(builtin_hasher::equal): Delete.
(rs6000_builtin_info_type): Delete.
(rs6000_builtin_info): Delete.
(bdesc_compat): Delete.
(bdesc_3arg): Delete.
(bdesc_4arg): Delete.
(bdesc_dst): Delete.
(bdesc_2arg): Delete.
(bdesc_altivec_preds): Delete.
(bdesc_abs): Delete.
(bdesc_1arg): Delete.
(bdesc_0arg): Delete.
(bdesc_htm): Delete.
(bdesc_mma): Delete.
(rs6000_overloaded_builtin_p): Delete.
(rs6000_overloaded_builtin_name): Delete.
(htm_spr_num): Delete.
(rs6000_builtin_is_supported_p): Delete.
(rs6000_gimple_fold_mma_builtin): Delete.
(gt-rs6000-call.h): Remove include directive.
* config/rs6000/rs6000-protos.h (rs6000_overloaded_builtin_p): Delete.
(rs6000_builtin_is_supported_p): Delete.
(rs6000_overloaded_builtin_name): Delete.
* config/rs6000/rs6000.c (rs6000_builtin_decls): Delete.
(rs6000_debug_reg_global): Remove reference to RS6000_BUILTIN_COUNT.
* config/rs6000/rs6000.h (rs6000_builtins): Delete.
(altivec_builtin_types): Delete.
(rs6000_builtin_decls): Delete.
* config/rs6000/t-rs6000 (TM_H): Don't add rs6000-builtin.def.
---
 gcc/config/rs6000/rs6000-builtin.def | 3350 --
 gcc/config/rs6000/rs6000-call.c  |  712 --
 gcc/config/rs6000/rs6000-protos.h|3 -
 gcc/config/rs6000/rs6000.c   |3 -
 gcc/config/rs6000/rs6000.h   |   57 -
 gcc/config/rs6000/t-rs6000   |1 -
 6 files changed, 4126 deletions(-)
 delete mode 100644 gcc/config/rs6000/rs6000-builtin.def

diff --git a/gcc/config/rs6000/rs6000-builtin.def 
b/gcc/config/rs6000/rs6000-builtin.def
deleted file mode 100644
index 9dbf16f48c4..000
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 86054f75756..a5ee06c991f 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -89,20 +89,6 @@
 #define TARGET_NO_PROTOTYPE 0
 #endif
 
-struct builtin_compatibility
-{
-  const enum rs6000_builtins code;
-  const char *const name;
-};
-
-struct builtin_description
-{
-  const HOST_WIDE_INT mask;
-  const enum insn_code icode;
-  const char *const name;
-  const enum rs6000_builtins code;
-};
-
 /* Used by __builtin_cpu_is(), mapping from PLATFORM names to values.  */
 static const struct
 {
@@ -184,127 +170,6 @@ static const struct
 
 static rtx rs6000_expand_new_builtin (tree, rtx, rtx, machine_mode, int);
 static bool rs6000_gimple_fold_new_builtin (gimple_stmt_iterator *gsi);
-
-
-/* Hash table to keep track of the argument types for builtin functions.  */
-
-struct GTY((for_user)) builtin_hash_struct
-{
-  tree type;
-  machine_mode mode[4];/* return value + 3 arguments.  */
-  unsigned char uns_p[4];  /* and whether the types are unsigned.  */
-};
-
-struct builtin_hasher : ggc_ptr_hash
-{
-  static hashval_t hash (builtin_hash_struct *);
-  static bool equal (builtin_hash_struct *, builtin_hash_struct *);
-};
-
-static GTY (()) hash_table *builtin_hash_table;
-
-/* Hash function for builtin functions with up to 3 arguments and a return
-   type.  */
-hashval_t
-builtin_hasher::hash (builtin_hash_struct *bh)
-{
-  unsigned ret = 0;
-  int i;
-
-  for (i = 0; i < 4; i++)
-{
-  ret = (ret * (unsigned)MAX_MACHINE_MODE) + ((unsigned)bh->mode[i]);
-  ret = (ret * 2) + bh->uns_p[i];
-}
-
-  return ret;
-}
-
-/* Compare builtin hash entries H1 and H2 for equivalence.  */
-bool
-builtin_hasher::equal (builtin_hash_struct *p1, builtin_hash_struct *p2)
-{
-  return ((p1->mode[0] == p2->mode[0])
- && (p1->mode[1] == p2->mode[1])
- && (p1->mode[2] == p2->mode[2])
- && (p1->mode[3] == p2->mode[3])
- && (p1->uns_p[0] == p2->uns_p[0])
- && (p1->uns_p[1] == p2->uns_p[1])
- && (p1->uns_p[2] == p2->uns_p[2])
- && (p1->uns_p[3] == p2->uns_p[3]));
-}
-
-
-/* Table that classifies rs6000 builtin functions (pure, const, etc.).  */
-#undef RS6000_BUILTIN_0
-#undef RS6000_BUILTIN_1
-#undef RS6000_BUILTIN_2
-#undef RS6000_BUILTIN_3
-#undef RS6000_BUILTIN_4
-#undef RS6000_BUILTIN_A
-#undef RS6000_BUILTIN_D
-#undef RS6000_BUILTIN_H
-#undef RS6000_BUILTIN_M
-#undef RS6000_BUILTIN_P
-#undef RS6000_BUILTIN_X
-
-#define RS6000_BUILTIN_0(ENUM, NAME, MASK, ATTR, ICODE) \
-  { NAME, ICODE, 

[PATCH 3/6] rs6000: Rename rs6000-builtin-new.def to rs6000-builtins.def

2021-12-06 Thread Bill Schmidt via Gcc-patches
Hi!

This patch just renames a file and updates the build machinery accordingly.

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
okay for trunk?

Thanks!
Bill

2021-12-02  Bill Schmidt  

gcc/
* config/rs6000/rs6000-builtin-new.def: Rename to...
* config/rs6000/rs6000-builtins.def: ...this.
* config/rs6000/rs6000-gen-builtins.c: Adjust header commentary.
* config/rs6000/t-rs6000 (EXTRA_GTYPE_DEPS): Rename
rs6000-builtin-new.def to rs6000-builtins.def.
(rs6000-builtins.c): Likewise.
---
 .../rs6000/{rs6000-builtin-new.def => rs6000-builtins.def}  | 0
 gcc/config/rs6000/rs6000-gen-builtins.c | 4 ++--
 gcc/config/rs6000/t-rs6000  | 6 +++---
 3 files changed, 5 insertions(+), 5 deletions(-)
 rename gcc/config/rs6000/{rs6000-builtin-new.def => rs6000-builtins.def} (100%)

diff --git a/gcc/config/rs6000/rs6000-builtin-new.def 
b/gcc/config/rs6000/rs6000-builtins.def
similarity index 100%
rename from gcc/config/rs6000/rs6000-builtin-new.def
rename to gcc/config/rs6000/rs6000-builtins.def
diff --git a/gcc/config/rs6000/rs6000-gen-builtins.c 
b/gcc/config/rs6000/rs6000-gen-builtins.c
index 78b2486aafc..9c61b7d9fe6 100644
--- a/gcc/config/rs6000/rs6000-gen-builtins.c
+++ b/gcc/config/rs6000/rs6000-gen-builtins.c
@@ -22,7 +22,7 @@ along with GCC; see the file COPYING3.  If not see
recognition code for Power targets, based on text files that
describe the built-in functions and vector overloads:
 
- rs6000-builtin-new.def Table of built-in functions
+ rs6000-builtins.def    Table of built-in functions
  rs6000-overload.defTable of overload functions
 
Both files group similar functions together in "stanzas," as
@@ -125,7 +125,7 @@ along with GCC; see the file COPYING3.  If not see
 
The second line contains the  that this particular instance of
the overloaded function maps to.  It must match a token that appears in
-   rs6000-builtin-new.def.  Optionally, a second token may appear.  If only
+   rs6000-builtins.def.  Optionally, a second token may appear.  If only
one token is on the line, it is also used to build the unique identifier
for the overloaded function.  If a second token is present, the second
token is used instead for this purpose.  This is necessary in cases
diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
index d48a4b1be6c..3d3143a171d 100644
--- a/gcc/config/rs6000/t-rs6000
+++ b/gcc/config/rs6000/t-rs6000
@@ -22,7 +22,7 @@ TM_H += $(srcdir)/config/rs6000/rs6000-builtin.def
 TM_H += $(srcdir)/config/rs6000/rs6000-cpus.def
 TM_H += $(srcdir)/config/rs6000/rs6000-modes.h
 PASSES_EXTRA += $(srcdir)/config/rs6000/rs6000-passes.def
-EXTRA_GTYPE_DEPS += $(srcdir)/config/rs6000/rs6000-builtin-new.def
+EXTRA_GTYPE_DEPS += $(srcdir)/config/rs6000/rs6000-builtins.def
 
 rs6000-pcrel-opt.o: $(srcdir)/config/rs6000/rs6000-pcrel-opt.c
$(COMPILE) $<
@@ -59,10 +59,10 @@ build/rs6000-gen-builtins$(build_exeext): 
build/rs6000-gen-builtins.o \
 # For now, the header files depend on rs6000-builtins.c, which avoids
 # races because the .c file is closed last in rs6000-gen-builtins.c.
 rs6000-builtins.c: build/rs6000-gen-builtins$(build_exeext) \
-  $(srcdir)/config/rs6000/rs6000-builtin-new.def \
+  $(srcdir)/config/rs6000/rs6000-builtins.def \
   $(srcdir)/config/rs6000/rs6000-overload.def
$(RUN_GEN) ./build/rs6000-gen-builtins$(build_exeext) \
-   $(srcdir)/config/rs6000/rs6000-builtin-new.def \
+   $(srcdir)/config/rs6000/rs6000-builtins.def \
$(srcdir)/config/rs6000/rs6000-overload.def rs6000-builtins.h \
rs6000-builtins.c rs6000-vecdefines.h
 
-- 
2.27.0



[PATCH v2 0/6] Remove "old" built-in function infrastructure

2021-12-06 Thread Bill Schmidt via Gcc-patches
Hi!

Now that the new built-in function support is all upstream and enabled, it
seems safe and prudent to remove the old code to avoid confusion.  I broke this
up to the extent possible, but a couple of patches are still pretty large.

David Edelsohn found that I had broken some C++ library functions for AIX, and
his fix for that required me to re-spin the patches.  I also generated the diff
with a more efficient algorithm to reduce the patch size.  Otherwise this
series is identical to V1.

Thanks!
Bill

Bill Schmidt (6):
  rs6000: Remove new_builtins_are_live and dead code it was guarding
  rs6000: Remove altivec_overloaded_builtins array and initialization
  rs6000: Rename rs6000-builtin-new.def to rs6000-builtins.def
  rs6000: Remove rs6000-builtin.def and associated data and functions
  rs6000: Rename functions with "new" in their names
  rs6000: Rename arrays to remove temporary _x suffix

 gcc/config/rs6000/darwin.h| 8 +-
 gcc/config/rs6000/rs6000-builtin.def  |  3350 -
 ...00-builtin-new.def => rs6000-builtins.def} | 0
 gcc/config/rs6000/rs6000-c.c  |  1266 +-
 gcc/config/rs6000/rs6000-call.c   | 11964 +---
 gcc/config/rs6000/rs6000-gen-builtins.c   |   115 +-
 gcc/config/rs6000/rs6000-internal.h   | 2 +-
 gcc/config/rs6000/rs6000-protos.h | 3 -
 gcc/config/rs6000/rs6000.c|   334 +-
 gcc/config/rs6000/rs6000.h|58 -
 gcc/config/rs6000/t-rs6000| 7 +-
 11 files changed, 224 insertions(+), 16883 deletions(-)
 delete mode 100644 gcc/config/rs6000/rs6000-builtin.def
 rename gcc/config/rs6000/{rs6000-builtin-new.def => rs6000-builtins.def} (100%)

-- 
2.27.0



Re: [PATCH v2] fix PR 103143

2021-12-06 Thread Jeff Law via Gcc-patches




On 12/6/2021 10:31 AM, Martin Sebor wrote:

I have broken up the patch into a series of six.  Attached is
part (1), the fix for the typo that causes PR 103143.

On 12/3/21 5:00 PM, Jeff Law wrote:



On 11/8/2021 7:34 PM, Martin Sebor via Gcc-patches wrote:

The pointer-query code that implements compute_objsize() that's
in turn used by most middle end access warnings now has a few
warts in it and (at least) one bug.  With the exception of
the bug the warts aren't behind any user-visible bugs that
I know of but they do cause problems in new code I've been
implementing on top of it.  Besides fixing the one bug (just
a typo) the attached patch cleans up these latent issues:

1) It moves the bndrng member from the access_ref class to
   access_data.  As a FIXME in the code notes, the member never
   did belong in the former and only takes up space in the cache.

2) The compute_objsize_r() function is big, unwieldy, and tedious
   to step through because of all the if statements that are better
   coded as one switch statement.  This change factors out more
   of its code into smaller handler functions as has been suggested
   and done a few times before.

3) (2) exposed a few places where I fail to pass the current
   GIMPLE statement down to ranger.  This leads to worse quality
   range info, including possible false positives and negatives.
   I just spotted these problems in code review but I haven't
   taken the time to come up with test cases.  This change fixes
   these oversights as well.

4) The handling of PHI statements is also in one big, hard-to-
   follow function.  This change moves the handling of each PHI
   argument into its own handler which merges it into the previous
   argument.  This makes the code easier to work with and opens it
   to reuse also for MIN_EXPR and MAX_EXPR.  (This is primarily
   used to print informational notes after warnings.)

5) Finally, the patch factors code to dump each access_ref
   cached by the pointer_query cache out of pointer_query::dump
   and into access_ref::dump.  This helps with debugging.

These changes should have no user-visible effect and other than
a regression test for the typo (PR 103143) come with no tests.
They've been tested on x86_64-linux.
Sigh.  You've identified 6 distinct changes above.  The 5 you've 
enumerated plus a typo fix somewhere.  There's no reason why they 
need to be a single patch and many reasons why they should be a 
series of independent patches.  Combining them into a single patch 
isn't how we do things and it hides the actual bugfix in here.


Please send a fix for the typo first since that should be able to 
trivially go forward.  Then a patch for item #1.  That should be 
trivial to review when it's pulled out from the rest of the patch. 
Beyond that, your choice on ordering, but you need to break this down.





Jeff




gcc-103413.diff

commit 9a5bb7a2b0cdb8654061d9cba543c1408fa7adc9
Author: Martin Sebor
Date:   Sat Dec 4 16:22:07 2021 -0700

 Use the recursive form of compute_objsize [PR 103143].
 
 gcc/ChangeLog:
 
 PR middle-end/103143

 * pointer-query.cc (gimple_call_return_array): Call 
compute_objsize_r.
 
 gcc/testsuite/ChangeLog:

 PR middle-end/103143
 * gcc.dg/Wstringop-overflow-83.c: New test.
Thanks.  Presumably by going through the public API, qry->depth got 
reset to 0  and/or we'd get a re-initialized snlim at each call, leading 
to the infinite recursion?


OK for the trunk.  Thanks for breaking this out.

jeff


Re: [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries

2021-12-06 Thread Segher Boessenkool
On Mon, Dec 06, 2021 at 11:12:00AM -0700, Martin Sebor wrote:
> On 11/13/21 1:37 PM, David Malcolm via Gcc-patches wrote:
> >Approach 1: Custom Address Spaces
> >=
> >
> >GCC's C frontend supports target-specific address spaces; see:
> >   https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html
> >Quoting the N1275 draft of ISO/IEC DTR 18037:
> >   "Address space names are ordinary identifiers, sharing the same name
> >   space as variables and typedef names.  Any such names follow the same
> >   rules for scope as other ordinary identifiers (such as typedef names).
> >   An implementation may provide an implementation-defined set of
> >   intrinsic address spaces that are, in effect, predefined at the start
> >   of every translation unit.  The names of intrinsic address spaces must
> >   be reserved identifiers (beginning with an underscore and an uppercase
> >   letter or with two underscores).  An implementation may also
> >   optionally support a means for new address space names to be defined
> >   within a translation unit."
> >
> >Patch 1a in the following patch kit for GCC implements such a means to
> >define new address space names in a translation unit, via a pragma:
> >   #pragma GCC custom_address_space(NAME_OF_ADDRESS_SPACE)
> >
> >For example, the Linux kernel could perhaps write:
> >
> >   #define __kernel
> >   #pragma GCC custom_address_space(__user)
> >   #pragma GCC custom_address_space(__iomem)
> >   #pragma GCC custom_address_space(__percpu)
> >   #pragma GCC custom_address_space(__rcu)
> >
> >and thus the C frontend can complain about code that mismatches __user
> >and kernel pointers, e.g.:
> >
> >custom-address-space-1.c: In function ‘test_argpass_to_p’:
> >custom-address-space-1.c:29:14: error: passing argument 1 of 
> >‘accepts_p’
> >from pointer to non-enclosed address space
> >29 |   accepts_p (p_user);
> >   |  ^~
> >custom-address-space-1.c:21:24: note: expected ‘void *’ but argument is
> >of type ‘__user void *’
> >21 | extern void accepts_p (void *);
> >   |^~
> >custom-address-space-1.c: In function ‘test_cast_k_to_u’:
> >custom-address-space-1.c:135:12: warning: cast to ‘__user’ address 
> >space
> >pointer from disjoint generic address space pointer
> >   135 |   p_user = (void __user *)p_kernel;
> >   |^
> 
> This seems like an excellent use of named address spaces :)

It has some big problems though.

Named address spaces are completely target-specific.  Defining them with
a pragma like this does not allow you to set the pointer mode or
anything related to a custom LEGITIMATE_ADDRESS_P.  It does not allow
you to say zero pointers are invalid in some address spaces and not in
others.  You cannot provide any of the DWARF address space stuff this
way.  But most importantly, there are only four bits for the address
space field internally, and they are used however a backend wants to
use them.

None of this is unsolvable, but all of it will have to be solved.

IMO it will be best to not mix this with address spaces in the user
interface (it is of course fine to *implement* it like that, or with
big overlap at least).

> >The patch doesn't yet maintain a good distinction between implicit
> >target-specific address spaces and user-defined address spaces,

And that will have to be fixed in the user code syntax at least.

> >has at
> >least one known major bug, and has only been lightly tested.  I can
> >fix these issues, but was hoping for feedback that this approach is the
> >right direction from both the GCC and Linux development communities.

Allowing the user to define new address spaces does not jibe well with
how targets do (validly!) use them.

> >Approach 2: An "untrusted" attribute
> >
> >
> >Alternatively, patch 1b in the kit implements:
> >
> >   __attribute__((untrusted))
> >
> >which can be applied to types as a qualifier (similarly to const,
> >volatile, etc) to mark a trust boundary, hence the kernel could have:
> >
> >   #define __user __attribute__((untrusted))
> >
> >where my patched GCC treats
> >   T *
> >vs
> >   T __attribute__((untrusted)) *
> >as being different types and thus the C frontend can complain (even without
> >-fanalyzer) about e.g.:
> >
> >extern void accepts_p(void *);
> >
> >void test_argpass_to_p(void __user *p_user)
> >{
> >   accepts_p(p_user);
> >}
> >
> >untrusted-pointer-1.c: In function ‘test_argpass_to_p’:
> >untrusted-pointer-1.c:22:13: error: passing argument 1 of ‘accepts_p’
> >from pointer with different trust level
> >22 |   accepts_p(p_user);
> >   |  ^~
> >untrusted-pointer-1.c:14:23: note: expected ‘void *’ but argument is of
> >type ‘__attribute__((untrusted)) void *’
> >14 | extern void accepts_p(void *);
> >   |^~
> >
> >So you'd get enforcement of __user vs non-__user pointers as part of
> >GCC's regular 

[patch, power-ieee128, committed] First stab at the library

2021-12-06 Thread Thomas Koenig via Gcc-patches

Hi,

here is a first stab at the library side of power-ieee128,
committed to the branch.

It compiles, but probably still has a lot of issues.  It
is also not called from the compiler yet.

Regards

Thomas


2021-10-19  Thomas Koenig  

Prepare library for REAL(KIND=17).

This prepares the library side for REAL(KIND=17).  It is
not yet tested, but at least compiles cleanly on POWER 9
and x86_64.

fixincludes/ChangeLog:

* configure: Regenerate.
* fixincl.x: Regenerate.

intl/ChangeLog:

* aclocal.m4: Regenerate.
* configure: Regenerate.

libatomic/ChangeLog:

* Makefile.in: Regenerate.
* configure: Regenerate.
* testsuite/Makefile.in:

libcc1/ChangeLog:

* Makefile.in: Regenerate.
* configure: Regenerate.

libdecnumber/ChangeLog:

* configure: Regenerate.

libgcc/ChangeLog:

* configure: Regenerate.

libgfortran/ChangeLog:

* Makefile.am: Add _r17 and _c17 files.  Build them
with -mabi=ieeelongdouble on POWER.
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: New flag HAVE_REAL_17.
* kinds-override.h: (HAVE_GFC_REAL_17): New macro.
(HAVE_GFC_COMPLEX_17): New macro.
(GFC_REAL_17_HUGE): New macro.
(GFC_REAL_17_LITERAL_SUFFIX): New macro.
(GFC_REAL_17_LITERAL): New macro.
(GFC_REAL_17_DIGITS): New macro.
(GFC_REAL_17_RADIX): New macro.
* libgfortran.h (POWER_IEEE128): New macro.
(gfc_array_r17): Typedef.
(GFC_DTYPE_REAL_17): New macro.
(GFC_DTYPE_COMPLEX_17): New macro.
(__acoshieee128): Prototype.
(__acosieee128): Prototype.
(__asinhieee128): Prototype.
(__asinieee128): Prototype.
(__atan2ieee128): Prototype.
(__atanhieee128): Prototype.
(__atanieee128): Prototype.
(__coshieee128): Prototype.
(__cosieee128): Prototype.
(__erfieee128): Prototype.
(__expieee128): Prototype.
(__fabsieee128): Prototype.
(__jnieee128): Prototype.
(__log10ieee128): Prototype.
(__logieee128): Prototype.
(__powieee128): Prototype.
(__sinhieee128): Prototype.
(__sinieee128): Prototype.
(__sqrtieee128): Prototype.
(__tanhieee128): Prototype.
(__tanieee128): Prototype.
(__ynieee128): Prototype.
* m4/mtype.m4: Make a bit more readable. Add KIND=17.
* generated/_abs_c17.F90: New file.
* generated/_abs_r17.F90: New file.
* generated/_acos_r17.F90: New file.
* generated/_acosh_r17.F90: New file.
* generated/_aimag_c17.F90: New file.
* generated/_aint_r17.F90: New file.
* generated/_anint_r17.F90: New file.
* generated/_asin_r17.F90: New file.
* generated/_asinh_r17.F90: New file.
* generated/_atan2_r17.F90: New file.
* generated/_atan_r17.F90: New file.
* generated/_atanh_r17.F90: New file.
* generated/_conjg_c17.F90: New file.
* generated/_cos_c17.F90: New file.
* generated/_cos_r17.F90: New file.
* generated/_cosh_r17.F90: New file.
* generated/_dim_r17.F90: New file.
* generated/_exp_c17.F90: New file.
* generated/_exp_r17.F90: New file.
* generated/_log10_r17.F90: New file.
* generated/_log_c17.F90: New file.
* generated/_log_r17.F90: New file.
* generated/_mod_r17.F90: New file.
* generated/_sign_r17.F90: New file.
* generated/_sin_c17.F90: New file.
* generated/_sin_r17.F90: New file.
* generated/_sinh_r17.F90: New file.
* generated/_sqrt_c17.F90: New file.
* generated/_sqrt_r17.F90: New file.
* generated/_tan_r17.F90: New file.
* generated/_tanh_r17.F90: New file.
* generated/bessel_r17.c: New file.
* generated/cshift0_c17.c: New file.
* generated/cshift0_r17.c: New file.
* generated/cshift1_16_c17.c: New file.
* generated/cshift1_16_r17.c: New file.
* generated/cshift1_4_c17.c: New file.
* generated/cshift1_4_r17.c: New file.
* generated/cshift1_8_c17.c: New file.
* 

[PATCH v3 7/7] ifcvt: Run second pass if it is possible to omit a temporary.

2021-12-06 Thread Robin Dapp via Gcc-patches
If one of the to-be-converted SETs requires the original comparison
(i.e. in order to generate a min/max insn) but no other insn after it
does, we can omit creating temporaries, thus facilitating costing.
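
As a concrete shape of source this targets (a hypothetical sketch, not
taken from the patch or its tests): the last converted set below
overwrites an operand of the comparison, making it a min candidate, and
nothing after it needs the comparison, so the second pass can drop the
temporary for it.

  int f (int x, int y, int flag)
  {
    if (x > y)
      {
        flag = 1;
        x = y;   /* min candidate; overwrites x, an operand of the check */
      }
    return x + flag;
  }
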
---
 gcc/ifcvt.c | 33 +++--
 1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index d1e7db1ee27..3be5bb131df 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -3262,6 +3262,11 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
 
   need_cmov_or_rewire (then_bb, _no_cmov, _src);
 
+  int last_needs_comparison = -1;
+  bool second_try = false;
+
+restart:
+
   FOR_BB_INSNS (then_bb, insn)
 {
   /* Skip over non-insns.  */
@@ -3305,8 +3310,12 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
 Therefore we introduce a temporary every time we are about to
 overwrite a variable used in the check.  Costing of a sequence with
 these is going to be inaccurate so only use temporaries when
-needed.  */
-  if (reg_overlap_mentioned_p (target, cond))
+needed.
+
+If performing a second try, we know how many insns require a
+temporary.  For the last of these, we can omit creating one.  */
+  if (reg_overlap_mentioned_p (target, cond)
+ && (!second_try || count < last_needs_comparison))
temp = gen_reg_rtx (GET_MODE (target));
   else
temp = target;
@@ -3389,6 +3398,8 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
{
  seq = seq1;
  temp_dest = temp_dest1;
+ if (!second_try)
+   last_needs_comparison = count;
}
   else if (seq2 != NULL_RTX)
{
@@ -3412,6 +3423,24 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
   unmodified_insns.safe_push (insn);
 }
 
+/* If there are insns that overwrite part of the initial
+   comparison, we can still omit creating temporaries for
+   the last of them.
+   As the second try will always create a less expensive,
+   valid sequence, we do not need to compare and can discard
+   the first one.  */
+if (!second_try && last_needs_comparison >= 0)
+  {
+   end_sequence ();
+   start_sequence ();
+   count = 0;
+   targets.truncate (0);
+   temporaries.truncate (0);
+   unmodified_insns.truncate (0);
+   second_try = true;
+   goto restart;
+  }
+
   /* We must have seen some sort of insn to insert, otherwise we were
  given an empty BB to convert, and we can't handle that.  */
   gcc_assert (!unmodified_insns.is_empty ());
-- 
2.31.1



[PATCH v3 6/7] testsuite/s390: Add tests for noce_convert_multiple.

2021-12-06 Thread Robin Dapp via Gcc-patches
Add new s390-specific tests that check if we convert two SETs into two
loads on condition.  Remove the s390-specific target-check in
gcc.dg/ifcvt-4.c.
---
 gcc/testsuite/gcc.dg/ifcvt-4.c|  2 +-
 .../gcc.target/s390/ifcvt-two-insns-bool.c| 39 +++
 .../gcc.target/s390/ifcvt-two-insns-int.c | 39 +++
 .../gcc.target/s390/ifcvt-two-insns-long.c| 39 +++
 4 files changed, 118 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/ifcvt-two-insns-bool.c
 create mode 100644 gcc/testsuite/gcc.target/s390/ifcvt-two-insns-int.c
 create mode 100644 gcc/testsuite/gcc.target/s390/ifcvt-two-insns-long.c

diff --git a/gcc/testsuite/gcc.dg/ifcvt-4.c b/gcc/testsuite/gcc.dg/ifcvt-4.c
index ec142cfd943..871ab950756 100644
--- a/gcc/testsuite/gcc.dg/ifcvt-4.c
+++ b/gcc/testsuite/gcc.dg/ifcvt-4.c
@@ -2,7 +2,7 @@
 /* { dg-additional-options "-misel" { target { powerpc*-*-* } } } */
 /* { dg-additional-options "-march=z196" { target { s390x-*-* } } } */
 /* { dg-additional-options "-mtune-ctrl=^one_if_conv_insn" { target { i?86-*-* 
x86_64-*-* } } } */
-/* { dg-skip-if "Multiple set if-conversion not guaranteed on all subtargets" 
{ "arm*-*-* avr-*-* hppa*64*-*-* s390-*-* visium-*-*" riscv*-*-* msp430-*-* } } 
 */
+/* { dg-skip-if "Multiple set if-conversion not guaranteed on all subtargets" 
{ "arm*-*-* avr-*-* hppa*64*-*-* visium-*-*" riscv*-*-* msp430-*-* } }  */
 /* { dg-skip-if "" { "s390x-*-*" } { "-m31" } }  */
 
 typedef int word __attribute__((mode(word)));
diff --git a/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-bool.c 
b/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-bool.c
new file mode 100644
index 000..4415cb49f11
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-bool.c
@@ -0,0 +1,39 @@
+/* Check if conversion for two instructions.  */
+
+/* { dg-do run } */
+/* { dg-options "-O2 -march=z13 --save-temps" } */
+
+/* { dg-final { scan-assembler "lochinhe\t%r.?,1" } } */
+/* { dg-final { scan-assembler "locrnhe\t.*" } } */
+#include 
+#include 
+#include 
+#include 
+
+__attribute__ ((noinline))
+int foo (int *a, unsigned int n)
+{
+  int min = 99;
+  bool bla = false;
+  for (int i = 0; i < n; i++)
+{
+  if (a[i] < min)
+   {
+ min = a[i];
+ bla = true;
+   }
+}
+
+  if (bla)
+min += 1;
+  return min;
+}
+
+int main()
+{
+  int a[] = {2, 1, -13, INT_MAX, INT_MIN, 0};
+
+  int res = foo (a, sizeof (a));
+
+  assert (res == (INT_MIN + 1));
+}
diff --git a/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-int.c 
b/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-int.c
new file mode 100644
index 000..db8df84cced
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-int.c
@@ -0,0 +1,39 @@
+/* Check if conversion for two instructions.  */
+
+/* { dg-do run } */
+/* { dg-options "-O2 -march=z13 --save-temps" } */
+
+/* { dg-final { scan-assembler "lochinhe\t%r.?,1" } } */
+/* { dg-final { scan-assembler "locrnhe\t.*" } } */
+#include 
+#include 
+#include 
+#include 
+
+__attribute__ ((noinline))
+int foo (int *a, unsigned int n)
+{
+  int min = 99;
+  int bla = 0;
+  for (int i = 0; i < n; i++)
+{
+  if (a[i] < min)
+   {
+ min = a[i];
+ bla = 1;
+   }
+}
+
+  if (bla)
+min += 1;
+  return min;
+}
+
+int main()
+{
+  int a[] = {2, 1, -13, INT_MAX, INT_MIN, 0};
+
+  int res = foo (a, sizeof (a));
+
+  assert (res == (INT_MIN + 1));
+}
diff --git a/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-long.c 
b/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-long.c
new file mode 100644
index 000..bee63328729
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-long.c
@@ -0,0 +1,39 @@
+/* Check if conversion for two instructions.  */
+
+/* { dg-do run } */
+/* { dg-options "-O2 -march=z13 --save-temps" } */
+
+/* { dg-final { scan-assembler "locghinhe\t%r.?,1" } } */
+/* { dg-final { scan-assembler "locgrnhe\t.*" } } */
+#include 
+#include 
+#include 
+#include 
+
+__attribute__ ((noinline))
+long foo (long *a, unsigned long n)
+{
+  long min = 99;
+  long bla = 0;
+  for (int i = 0; i < n; i++)
+{
+  if (a[i] < min)
+   {
+ min = a[i];
+ bla = 1;
+   }
+}
+
+  if (bla)
+min += 1;
+  return min;
+}
+
+int main()
+{
+  long a[] = {2, 1, -13, LONG_MAX, LONG_MIN, 0};
+
+  long res = foo (a, sizeof (a));
+
+  assert (res == (LONG_MIN + 1));
+}
-- 
2.31.1



[PATCH v3 5/7] ifcvt: Try re-using CC for conditional moves.

2021-12-06 Thread Robin Dapp via Gcc-patches
Following up on the previous patch, this patch makes
noce_convert_multiple emit two cmov sequences: the same one as before
and a second one that tries to re-use the existing CC.  Then their costs
are compared and the cheaper one is selected.
---
 gcc/ifcvt.c | 112 +---
 1 file changed, 88 insertions(+), 24 deletions(-)

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 3e78e1bb03d..d1e7db1ee27 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -83,7 +83,7 @@ static rtx_insn *last_active_insn (basic_block, int);
 static rtx_insn *find_active_insn_before (basic_block, rtx_insn *);
 static rtx_insn *find_active_insn_after (basic_block, rtx_insn *);
 static basic_block block_fallthru (basic_block);
-static rtx cond_exec_get_condition (rtx_insn *);
+static rtx cond_exec_get_condition (rtx_insn *, bool);
 static rtx noce_get_condition (rtx_insn *, rtx_insn **, bool);
 static int noce_operand_ok (const_rtx);
 static void merge_if_block (ce_if_block *);
@@ -427,7 +427,7 @@ cond_exec_process_insns (ce_if_block *ce_info 
ATTRIBUTE_UNUSED,
 /* Return the condition for a jump.  Do not do any special processing.  */
 
 static rtx
-cond_exec_get_condition (rtx_insn *jump)
+cond_exec_get_condition (rtx_insn *jump, bool get_reversed = false)
 {
   rtx test_if, cond;
 
@@ -439,8 +439,9 @@ cond_exec_get_condition (rtx_insn *jump)
 
   /* If this branches to JUMP_LABEL when the condition is false,
  reverse the condition.  */
-  if (GET_CODE (XEXP (test_if, 2)) == LABEL_REF
-  && label_ref_label (XEXP (test_if, 2)) == JUMP_LABEL (jump))
+  if (get_reversed || (GET_CODE (XEXP (test_if, 2)) == LABEL_REF
+  && label_ref_label (XEXP (test_if, 2))
+  == JUMP_LABEL (jump)))
 {
   enum rtx_code rev = reversed_comparison_code (cond, jump);
   if (rev == UNKNOWN)
@@ -3146,6 +3147,46 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx 
cond,
   return false;
 }
 
+/* Helper function to emit a cmov sequence.  */
+
+static rtx_insn*
+try_emit_cmove_seq (struct noce_if_info *if_info, rtx temp,
+   rtx cond, rtx new_val, rtx old_val, bool need_cmov,
+   unsigned *cost, rtx *temp_dest,
+   rtx cc_cmp = NULL, rtx rev_cc_cmp = NULL)
+{
+  rtx_insn *seq = NULL;
+  *cost = 0;
+
+  rtx x = XEXP (cond, 0);
+  rtx y = XEXP (cond, 1);
+  rtx_code cond_code = GET_CODE (cond);
+
+  start_sequence ();
+
+  if (need_cmov)
+*temp_dest = noce_emit_cmove (if_info, temp, cond_code,
+ x, y, new_val, old_val, cc_cmp, rev_cc_cmp);
+  else
+{
+  *temp_dest = temp;
+  if (if_info->then_else_reversed)
+   noce_emit_move_insn (temp, old_val);
+  else
+   noce_emit_move_insn (temp, new_val);
+}
+
+  if (*temp_dest != NULL_RTX)
+{
+  seq = get_insns ();
+  *cost = seq_cost (seq, if_info->speed_p);
+}
+
+  end_sequence ();
+
+  return seq;
+}
+
 /* We have something like:
 
  if (x > y)
@@ -3203,7 +3244,9 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
   rtx cond = noce_get_condition (jump, _earliest, false);
   rtx x = XEXP (cond, 0);
   rtx y = XEXP (cond, 1);
-  rtx_code cond_code = GET_CODE (cond);
+
+  rtx cc_cmp = cond_exec_get_condition (jump);
+  rtx rev_cc_cmp = cond_exec_get_condition (jump, /* get_reversed */ true);
 
   /* The true targets for a conditional move.  */
   auto_vec targets;
@@ -3317,31 +3360,51 @@ noce_convert_multiple_sets (struct noce_if_info 
*if_info)
  old_val = lowpart_subreg (dst_mode, old_val, src_mode);
}
 
-  rtx temp_dest = NULL_RTX;
+  /* Try emitting a conditional move passing the backend the
+canonicalized comparison.  The backend is then able to
+recognize expressions like
 
-  if (need_cmov)
+  if (x > y)
+y = x;
+
+as min/max and emit an insn, accordingly.  */
+  unsigned cost1 = 0, cost2 = 0;
+  rtx_insn *seq, *seq1, *seq2;
+  rtx temp_dest = NULL_RTX, temp_dest1 = NULL_RTX, temp_dest2 = NULL_RTX;
+
+  seq1 = try_emit_cmove_seq (if_info, temp, cond,
+new_val, old_val, need_cmov,
+, _dest1);
+
+  /* Here, we try to pass the backend a non-canonicalized cc comparison
+as well.  This allows the backend to emit a cmov directly without
+creating an additional compare for each.  If successful, costing
+is easier and this sequence is usually preferred.  */
+  seq2 = try_emit_cmove_seq (if_info, target, cond,
+new_val, old_val, need_cmov,
+, _dest2, cc_cmp, rev_cc_cmp);
+
+  /* Check which version is less expensive.  */
+  if (seq1 != NULL_RTX && (cost1 <= cost2 || seq2 == NULL_RTX))
{
- /* Actually emit the conditional move.  */
- temp_dest = noce_emit_cmove (if_info, temp, cond_code,
-

[PATCH v3 4/7] ifcvt/optabs: Allow using a CC comparison for emit_conditional_move.

2021-12-06 Thread Robin Dapp via Gcc-patches
Currently we only ever call emit_conditional_move with the comparison
(as well as its comparands) we got from the jump.  Thus, backends are
going to emit a CC comparison for every conditional move that is being
generated instead of re-using the existing CC.
This, combined with emitting temporaries for each conditional move,
causes sky-high costs for conditional moves.

This patch allows re-using a CC, so the costing situation is improved a
bit.
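
The shape of source this is about (a rough sketch, not from the patch):
both conditional sets depend on the same comparison, so ideally a single
compare is emitted and both conditional moves reuse its CC result.

  long f (long a, long b, long x, long y)
  {
    long r0 = x, r1 = y;
    if (a < b)
      {
        r0 = a;   /* both sets can become cmovs ...              */
        r1 = b;   /* ... driven by the single comparison a < b.  */
      }
    return r0 + r1;
  }
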
---
 gcc/expmed.c |   8 +--
 gcc/expr.c   |  10 ++--
 gcc/ifcvt.c  |  45 ++---
 gcc/optabs.c | 135 ++-
 gcc/optabs.h |   4 +-
 gcc/rtl.h|  11 -
 6 files changed, 150 insertions(+), 63 deletions(-)

diff --git a/gcc/expmed.c b/gcc/expmed.c
index 59734d4841c..af51bad7cfc 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -4122,8 +4122,8 @@ expand_sdiv_pow2 (scalar_int_mode mode, rtx op0, 
HOST_WIDE_INT d)
   temp = force_reg (mode, temp);
 
   /* Construct "temp2 = (temp2 < 0) ? temp : temp2".  */
-  temp2 = emit_conditional_move (temp2, LT, temp2, const0_rtx,
-mode, temp, temp2, mode, 0);
+  temp2 = emit_conditional_move (temp2, { LT, temp2, const0_rtx, mode },
+temp, temp2, mode, 0);
   if (temp2)
{
  rtx_insn *seq = get_insns ();
@@ -6125,10 +6125,10 @@ emit_store_flag (rtx target, enum rtx_code code, rtx 
op0, rtx op1,
return 0;
 
   if (and_them)
-   tem = emit_conditional_move (target, code, op0, op1, mode,
+   tem = emit_conditional_move (target, { code, op0, op1, mode },
 tem, const0_rtx, GET_MODE (tem), 0);
   else
-   tem = emit_conditional_move (target, code, op0, op1, mode,
+   tem = emit_conditional_move (target, { code, op0, op1, mode },
 trueval, tem, GET_MODE (tem), 0);
 
   if (tem == 0)
diff --git a/gcc/expr.c b/gcc/expr.c
index e0bcbccd905..c5631a9dd2e 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -8840,8 +8840,9 @@ expand_cond_expr_using_cmove (tree treeop0 
ATTRIBUTE_UNUSED,
 op2 = gen_lowpart (mode, op2);
 
   /* Try to emit the conditional move.  */
-  insn = emit_conditional_move (temp, comparison_code,
-   op00, op01, comparison_mode,
+  insn = emit_conditional_move (temp,
+   { comparison_code, op00, op01,
+ comparison_mode },
op1, op2, mode,
unsignedp);
 
@@ -9732,8 +9733,9 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode 
tmode,
start_sequence ();
 
/* Try to emit the conditional move.  */
-   insn = emit_conditional_move (target, comparison_code,
- op0, cmpop1, mode,
+   insn = emit_conditional_move (target,
+ { comparison_code,
+   op0, cmpop1, mode },
  op0, op1, mode,
  unsignedp);
 
diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 7e1ae2564a3..3e78e1bb03d 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -772,7 +772,7 @@ static int noce_try_addcc (struct noce_if_info *);
 static int noce_try_store_flag_constants (struct noce_if_info *);
 static int noce_try_store_flag_mask (struct noce_if_info *);
 static rtx noce_emit_cmove (struct noce_if_info *, rtx, enum rtx_code, rtx,
-   rtx, rtx, rtx);
+   rtx, rtx, rtx, rtx = NULL, rtx = NULL);
 static int noce_try_cmove (struct noce_if_info *);
 static int noce_try_cmove_arith (struct noce_if_info *);
 static rtx noce_get_alt_condition (struct noce_if_info *, rtx, rtx_insn **);
@@ -1711,7 +1711,8 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
 
 static rtx
 noce_emit_cmove (struct noce_if_info *if_info, rtx x, enum rtx_code code,
-rtx cmp_a, rtx cmp_b, rtx vfalse, rtx vtrue)
+rtx cmp_a, rtx cmp_b, rtx vfalse, rtx vtrue, rtx cc_cmp,
+rtx rev_cc_cmp)
 {
   rtx target ATTRIBUTE_UNUSED;
   int unsignedp ATTRIBUTE_UNUSED;
@@ -1743,23 +1744,30 @@ noce_emit_cmove (struct noce_if_info *if_info, rtx x, 
enum rtx_code code,
   end_sequence ();
 }
 
-  /* Don't even try if the comparison operands are weird
- except that the target supports cbranchcc4.  */
-  if (! general_operand (cmp_a, GET_MODE (cmp_a))
-  || ! general_operand (cmp_b, GET_MODE (cmp_b)))
-{
-  if (!have_cbranchcc4
- || GET_MODE_CLASS (GET_MODE (cmp_a)) != MODE_CC
- || cmp_b != const0_rtx)
-   return NULL_RTX;
-}
-
   unsignedp = (code == LTU || code == GEU
   || code == LEU || code == GTU);
 
-  target = emit_conditional_move (x, code, cmp_a, cmp_b, VOIDmode,
- vtrue, 

[PATCH v3 2/7] ifcvt: Allow constants for noce_convert_multiple.

2021-12-06 Thread Robin Dapp via Gcc-patches
This lifts the restriction that disallowed constants for
noce_convert_multiple.  The code later checks whether a valid sequence
is produced anyway.
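
A hypothetical example of the kind of block this now allows converting,
mixing a register source with constant sources:

  int
  f (int a, int b, int x, int y)
  {
    if (a < b)
      {
        x = y;   /* register source: was already handled */
        y = 0;   /* constant source: accepted with this change */
      }
    return x * y;
  }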
---
 gcc/ifcvt.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index c98668d5646..4642176957e 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -3282,7 +3282,9 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
 we'll end up trying to emit r4:HI = cond ? (r1:SI) : (r3:HI).
 Wrap the two cmove operands into subregs if appropriate to prevent
 that.  */
-  if (GET_MODE (new_val) != GET_MODE (temp))
+
+  if (!CONSTANT_P (new_val)
+ && GET_MODE (new_val) != GET_MODE (temp))
{
  machine_mode src_mode = GET_MODE (new_val);
  machine_mode dst_mode = GET_MODE (temp);
@@ -3293,7 +3295,8 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
}
  new_val = lowpart_subreg (dst_mode, new_val, src_mode);
}
-  if (GET_MODE (old_val) != GET_MODE (temp))
+  if (!CONSTANT_P (old_val)
+ && GET_MODE (old_val) != GET_MODE (temp))
{
  machine_mode src_mode = GET_MODE (old_val);
  machine_mode dst_mode = GET_MODE (temp);
@@ -3435,9 +3438,9 @@ bb_ok_for_noce_convert_multiple_sets (basic_block test_bb)
   if (!REG_P (dest))
return false;
 
-  if (!(REG_P (src)
-  || (GET_CODE (src) == SUBREG && REG_P (SUBREG_REG (src))
-  && subreg_lowpart_p (src))))
+  if (!((REG_P (src) || CONSTANT_P (src))
+   || (GET_CODE (src) == SUBREG && REG_P (SUBREG_REG (src))
+ && subreg_lowpart_p (src))))
return false;
 
   /* Destination must be appropriate for a conditional write.  */
-- 
2.31.1



[PATCH v3 3/7] ifcvt: Improve costs handling for noce_convert_multiple.

2021-12-06 Thread Robin Dapp via Gcc-patches
When noce_convert_multiple is called the original costs are not yet
initialized.  Therefore, up to now, costs were only ever unfairly
compared against COSTS_N_INSNS (2).  This would lead to
default_noce_conversion_profitable_p () rejecting all but the most
contrived of sequences.

This patch temporarily initializes the original costs by adding up the
costs of all sets inside the then_bb.
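
As a hypothetical worked example of the costing above (the individual
insn costs are made up): suppose then_bb contains two sets whose
insn_cost is 4 each, and if_info->original_cost holds the base
COSTS_N_INSNS (2) for the compare plus the branch.  The temporary value
used while converting is then

  potential_cost = original_cost - COSTS_N_INSNS (1)  /* drop the compare */
                   + 4 + 4                            /* the two sets     */

and default_noce_conversion_profitable_p () compares the generated cmov
sequence against that, instead of against a bare COSTS_N_INSNS (2).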
---
 gcc/ifcvt.c | 31 +++
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 4642176957e..7e1ae2564a3 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -3408,14 +3408,17 @@ noce_convert_multiple_sets (struct noce_if_info 
*if_info)
(SET (REG) (REG)) insns suitable for conversion to a series
of conditional moves.  Also check that we have more than one set
(other routines can handle a single set better than we would), and
-   fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  */
+   fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  While going
+   through the insns store the sum of their potential costs in COST.  */
 
 static bool
-bb_ok_for_noce_convert_multiple_sets (basic_block test_bb)
+bb_ok_for_noce_convert_multiple_sets (basic_block test_bb, unsigned *cost)
 {
   rtx_insn *insn;
   unsigned count = 0;
   unsigned param = param_max_rtl_if_conversion_insns;
+  bool speed_p = optimize_bb_for_speed_p (test_bb);
+  unsigned potential_cost = 0;
 
   FOR_BB_INSNS (test_bb, insn)
 {
@@ -3451,9 +3454,13 @@ bb_ok_for_noce_convert_multiple_sets (basic_block 
test_bb)
   if (!can_conditionally_move_p (GET_MODE (dest)))
return false;
 
+  potential_cost += insn_cost (insn, speed_p);
+
   count++;
 }
 
+  *cost += potential_cost;
+
   /* If we would only put out one conditional move, the other strategies
  this pass tries are better optimized and will be more appropriate.
  Some targets want to strictly limit the number of conditional moves
@@ -3501,11 +3508,24 @@ noce_process_if_block (struct noce_if_info *if_info)
  to calculate a value for x.
  ??? For future expansion, further expand the "multiple X" rules.  */
 
-  /* First look for multiple SETS.  */
+  /* First look for multiple SETS.  The original costs already include
+ a base cost of COSTS_N_INSNS (2): one instruction for the compare
+ (which we will be needing either way) and one instruction for the
+ branch.  When comparing costs we want to use the branch instruction
+ cost and the sets vs. the cmovs generated here.  Therefore subtract
+ the costs of the compare before checking.
+ ??? Actually, instead of the branch instruction costs we might want
+ to use COSTS_N_INSNS (BRANCH_COST ()) as in other places.  */
+
+  unsigned potential_cost = if_info->original_cost - COSTS_N_INSNS (1);
+  unsigned old_cost = if_info->original_cost;
   if (!else_bb
   && HAVE_conditional_move
-  && bb_ok_for_noce_convert_multiple_sets (then_bb))
+  && bb_ok_for_noce_convert_multiple_sets (then_bb, &potential_cost))
 {
+  /* Temporarily set the original costs to what we estimated so
+we can determine if the transformation is worth it.  */
+  if_info->original_cost = potential_cost;
   if (noce_convert_multiple_sets (if_info))
{
  if (dump_file && if_info->transform_name)
@@ -3513,6 +3533,9 @@ noce_process_if_block (struct noce_if_info *if_info)
 if_info->transform_name);
  return TRUE;
}
+
+  /* Restore the original costs.  */
+  if_info->original_cost = old_cost;
 }
 
   bool speed_p = optimize_bb_for_speed_p (test_bb);
-- 
2.31.1



[PATCH v3 1/7] ifcvt: Check if cmovs are needed.

2021-12-06 Thread Robin Dapp via Gcc-patches
When if-converting multiple SETs and we encounter a swap-style idiom

  if (a > b)
{
  tmp = c;   // [1]
  c = d;
  d = tmp;
}

ifcvt should not generate a conditional move for the instruction at
[1].

In order to achieve that, this patch goes through all relevant SETs
and marks the instructions that do not need to be converted to
conditional moves.  This also helps to evaluate costs.

On top of that, temporaries are only generated if the current cmov is
going to overwrite one of the comparands of the initial compare.
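
A hypothetical source-level illustration of the swap idiom and of which
sets actually need to become conditional moves:

  int
  f (int a, int b, int c, int d)
  {
    int tmp;
    if (a > b)
      {
        tmp = c;   /* [1] can stay an unconditional move: tmp is only
                      used when a > b holds */
        c = d;     /* needs a cmov: c = (a > b) ? d : c   */
        d = tmp;   /* needs a cmov: d = (a > b) ? tmp : d */
      }
    return c - d;
  }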
---
 gcc/ifcvt.c | 174 
 1 file changed, 150 insertions(+), 24 deletions(-)

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 017944f4f79..c98668d5646 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -98,6 +98,8 @@ static int dead_or_predicable (basic_block, basic_block, 
basic_block,
   edge, int);
 static void noce_emit_move_insn (rtx, rtx);
 static rtx_insn *block_has_only_trap (basic_block);
+static void need_cmov_or_rewire (basic_block, hash_set<rtx_insn *> *,
+hash_map<rtx_insn *, int> *);
 
 /* Count the number of non-jump active insns in BB.  */
 
@@ -3203,6 +3205,11 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
   auto_vec<rtx_insn *> unmodified_insns;
   int count = 0;
 
+  hash_set<rtx_insn *> need_no_cmov;
+  hash_map<rtx_insn *, int> rewired_src;
+
+  need_cmov_or_rewire (then_bb, &need_no_cmov, &rewired_src);
+
   FOR_BB_INSNS (then_bb, insn)
 {
   /* Skip over non-insns.  */
@@ -3213,26 +3220,49 @@ noce_convert_multiple_sets (struct noce_if_info 
*if_info)
   gcc_checking_assert (set);
 
   rtx target = SET_DEST (set);
-  rtx temp = gen_reg_rtx (GET_MODE (target));
+  rtx temp;
+
   rtx new_val = SET_SRC (set);
+  if (int *ii = rewired_src.get (insn))
+   new_val = simplify_replace_rtx (new_val, targets[*ii],
+   temporaries[*ii]);
   rtx old_val = target;
 
-  /* If we were supposed to read from an earlier write in this block,
-we've changed the register allocation.  Rewire the read.  While
-we are looking, also try to catch a swap idiom.  */
-  for (int i = count - 1; i >= 0; --i)
-   if (reg_overlap_mentioned_p (new_val, targets[i]))
- {
-   /* Catch a "swap" style idiom.  */
-   if (find_reg_note (insn, REG_DEAD, new_val) != NULL_RTX)
- /* The write to targets[i] is only live until the read
-here.  As the condition codes match, we can propagate
-the set to here.  */
- new_val = SET_SRC (single_set (unmodified_insns[i]));
-   else
- new_val = temporaries[i];
-   break;
- }
+  /* As we are transforming
+if (x > y)
+  {
+a = b;
+c = d;
+  }
+into
+  a = (x > y) ...
+  c = (x > y) ...
+
+we potentially check x > y before every set.
+Even though the check might be removed by subsequent passes, this means
+that we cannot transform
+  if (x > y)
+{
+  x = y;
+  ...
+}
+into
+  x = (x > y) ...
+  ...
+since this would invalidate x and the following to-be-removed checks.
+Therefore we introduce a temporary every time we are about to
+overwrite a variable used in the check.  Costing of a sequence with
+these is going to be inaccurate so only use temporaries when
+needed.  */
+  if (reg_overlap_mentioned_p (target, cond))
+   temp = gen_reg_rtx (GET_MODE (target));
+  else
+   temp = target;
+
+  /* We have identified swap-style idioms before.  A normal
+set will need to be a cmov while the first instruction of a swap-style
+idiom can be a regular move.  This helps with costing.  */
+  bool need_cmov = !need_no_cmov.contains (insn);
 
   /* If we had a non-canonical conditional jump (i.e. one where
 the fallthrough is to the "else" case) we need to reverse
@@ -3275,16 +3305,29 @@ noce_convert_multiple_sets (struct noce_if_info 
*if_info)
  old_val = lowpart_subreg (dst_mode, old_val, src_mode);
}
 
-  /* Actually emit the conditional move.  */
-  rtx temp_dest = noce_emit_cmove (if_info, temp, cond_code,
+  rtx temp_dest = NULL_RTX;
+
+  if (need_cmov)
+   {
+ /* Actually emit the conditional move.  */
+ temp_dest = noce_emit_cmove (if_info, temp, cond_code,
   x, y, new_val, old_val);
 
-  /* If we failed to expand the conditional move, drop out and don't
-try to continue.  */
-  if (temp_dest == NULL_RTX)
+ /* If we failed to expand the conditional move, drop out and don't
+try to continue.  */
+ if (temp_dest == NULL_RTX)
+   {
+ end_sequence ();
+ return FALSE;
+   }
+   }
+  else
{
- end_sequence ();
-

[PATCH v3 0/7] ifcvt: Convert multiple

2021-12-06 Thread Robin Dapp via Gcc-patches
Hi Richard,

as requested, this is the whole series (without the s390-specific parts)
in total.  7/7 is "new" as it was posted separately as "8/7" before
but actually does not change too much compared to the rest.

Supposing that the adjusted 4/7 is OK after recent discussions, if I'm
not mistaken, 5/7, 6/7 (very minor) and 7/7 still need review.

Robin Dapp (7):
  ifcvt: Check if cmovs are needed.
  ifcvt: Allow constants for noce_convert_multiple.
  ifcvt: Improve costs handling for noce_convert_multiple.
  ifcvt/optabs: Allow using a CC comparison for emit_conditional_move.
  ifcvt: Try re-using CC for conditional moves.
  testsuite/s390: Add tests for noce_convert_multiple.
  ifcvt: Run second pass if it is possible to omit a temporary.

 gcc/expmed.c  |   8 +-
 gcc/expr.c|  10 +-
 gcc/ifcvt.c   | 366 +++---
 gcc/optabs.c  | 135 +--
 gcc/optabs.h  |   4 +-
 gcc/rtl.h |  11 +-
 gcc/testsuite/gcc.dg/ifcvt-4.c|   2 +-
 .../gcc.target/s390/ifcvt-two-insns-bool.c|  39 ++
 .../gcc.target/s390/ifcvt-two-insns-int.c |  39 ++
 .../gcc.target/s390/ifcvt-two-insns-long.c|  39 ++
 10 files changed, 551 insertions(+), 102 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/ifcvt-two-insns-bool.c
 create mode 100644 gcc/testsuite/gcc.target/s390/ifcvt-two-insns-int.c
 create mode 100644 gcc/testsuite/gcc.target/s390/ifcvt-two-insns-long.c

-- 
2.31.1



Re: [PATCH 2/2] Use dominators to reduce ranger cache-filling.

2021-12-06 Thread Andrew MacLeod via Gcc-patches

On 12/6/21 02:27, Richard Biener wrote:

On Fri, Dec 3, 2021 at 9:42 PM Andrew MacLeod via Gcc-patches
 wrote:

When a request is made for the range of an ssa_name at some location,
the first thing we do is invoke range_of_stmt() to ensure we have looked
at the definition and have an evaluation for the name at a global
level.  I recently added a patch which dramatically reduces the call
stack requirements for that call.

Once we have confirmed the definition range has been set, a call is made
for the range-on-entry to the block of the use.  This is performed by
the cache, which proceeds to walk the CFG predecessors looking for places
where ranges are created (exported), for existing range-on-entry cache
hits, or for the definition block.  Once this list of predecessors has
been created, a forward walk is done, pushing the ranges through the
successor edges of all the blocks that were visited in the initial walk.

If the use is far from the definition, we end up filling in the same
value along a lot of these paths.  Likewise, uses which are far from a
range-modifying statement push the same value into a lot of blocks.

This patch tries to address at least some of these inefficiencies.  It
recognizes two things.

First, if there is no range modifying stmt between this use and the last
range we saw in a dominating block, we can just use the value from the
dominating block and not fill in all the cache entries between here and
there.  This is the biggest win.

Second, if there is a range-modifying statement at the end of some
block, we will have to do the appropriate cache walk to that point, but
it's possible the range-on-entry to THAT block might be able to use a
dominating range, and we can prevent the walk from going any further
than this block.

Combined, this should prevent a lot of unnecessary ranges from being
plugged into the cache.

ie, to visualize:

bb4:
a = foo()
<..>
bb60:
 if (a < 30)
<...>
bb110:
  g = a + 10

if the first place we ask for a is in bb110, we walk the CFG from 110
all the way back to bb4, on all paths leading back, and then fill all
those cache entries.
With this patch,
a) if bb60 does not dominate bb110, the request will scan the
dominators, arrive at the definition block, have seen no range modifier,
and simply set the on-entry for 110 to the range of a. done.
b) if bb60 does dominate 110, we have no idea which edge out of 60
dominates it, so we will revert to the existing cache algorithm.  Before
doing so, it checks and determines that there are no modifiers between
bb60 and the def in bb4, and so sets the on-entry cache for bb60 to be
the range of a.   Now when we do the cache fill walk, it only has to go
back as far as bb60 instead of all the way to bb4.

Otherwise we just revert to what we do now (or if dominators are not
available).  I have yet to see a case where we miss something we used to
get, but that does not mean there isn't one :-).
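
To make the shortcut concrete, here is a minimal, self-contained toy
sketch (not the actual ranger code; the arrays below merely stand in for
the dominator tree and the per-block range information):

  #include <stdio.h>
  #include <stdbool.h>

  #define NBB 128
  static int idom[NBB];            /* immediate dominator of each block     */
  static bool exports_range[NBB];  /* block generates a range on its edges  */
  static bool modifies_range[NBB]; /* block contains a range-modifying stmt */

  /* Walk the dominator tree from USE_BB towards DEF_BB and return the
     closest dominating block that provides a range for the name.  The
     cache-filling walk only needs to go back this far, instead of all
     the way to the definition.  */
  static int
  nearest_range_source (int use_bb, int def_bb)
  {
    for (int bb = idom[use_bb]; bb != def_bb; bb = idom[bb])
      if (exports_range[bb] || modifies_range[bb])
        return bb;
    return def_bb;
  }

  int
  main (void)
  {
    /* The bb4/bb60/bb110 example: a is defined in bb4, bb60 exports a
       range for it on its outgoing edges, and the use is in bb110.  */
    idom[110] = 60;
    idom[60] = 4;
    exports_range[60] = true;   /* set to false to get case a) above */
    printf ("cache fill only needs to reach bb%d\n",
            nearest_range_source (110, 4));
    return 0;
  }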

The cumulative performance impact of this when compiling a set of 390 GCC
source files at -O2 (measured via callgrind) is pretty decent.  Negative
numbers are a compile-time decrease, so -10% means 10% faster
compilation.

EVRP : %change from trunk is -26.31% (!)
VRP2 : %change from trunk is -9.53%
thread_jumps_full   : %change from trunk is -15.8%
Total compilation time  : %change from trunk is -1.06%

So it's not insignificant.

Risk would be very low, unless dominators are screwed up mid-pass.. but
then the relation code would be screwed up too.

Bootstrapped on  x86_64-pc-linux-gnu with no regressions. OK?

OK.


Committed.




Wow - I didn't realize we have this backwards CFG walk w/o dominators ...
I wonder if you can add a counter to visualize places we end up using this path.


Well, it only does the fill now when there is range info located on an 
outgoing edge of the dominator.  It's still used, just on a much smaller 
portion of the graph.


We could do even better if we knew whether one edge was involved. ie

bb3:
a = foo()
if (a > 4)
   blah1;   bb4
else
   blah2;   bb5
bb6:
 = a;

The use of a in bb6 will look to bb3 as the dominator, but if it knew 
that both edges are used in that dominance relation (ie, neither 
outgoing edge dominates bb6),  it wouldn't have to calculate a range for 
'a'.


But as it stands, it can't really tell the above situation apart from:

bb3:
a = foo()
if (a > 4)
    = a    bb4

In this case, only the one edge is used, and we DO need to calculate a 
range for a.  The edge from 3->4 does dominate bb4.


In both cases, bb3 is the dominator, and bb3 can generate an outgoing 
range, but only in one case do we need to calculate a range.


What would be really useful would be to be able to tell if an edge 
dominates a block :-)


I have thoughts on this for next release, and may overhaul the cache... 
but I don't think there is any such facility in the dominator subsystem?


Andrew



Re: [PATCH 2/6] Add returns_zero_on_success/failure attributes

2021-12-06 Thread Martin Sebor via Gcc-patches

On 11/18/21 4:34 PM, David Malcolm via Gcc-patches wrote:

On Wed, 2021-11-17 at 22:43 +, Joseph Myers wrote:

On Wed, 17 Nov 2021, Prathamesh Kulkarni via Gcc-patches wrote:


More generally, would it be a good idea to provide attributes for
mod/ref anaylsis ?
So sth like:
void foo(void) __attribute__((modifies(errno)));
which would state that foo modifies errno, but neither reads nor
modifies any other global var.
and
void bar(void) __attribute__((reads(errno)))
which would state that bar only reads errno, and doesn't modify or
read any other global var.


Many math.h functions are const except for possibly setting errno,
possibly raising floating-point exceptions (which might have other
effects
when using alternate exception handling) and possibly reading the
rounding
mode.  To represent that, it might be useful for such attributes to
be
able to describe state (such as the floating-point environment) that
doesn't correspond to a C identifier.  (errno tends to be a macro, so
referring to it as such in an attribute may be awkward as well.)

(See also 
with
some proposals for features to describe const/pure-like properties of
functions.)



Thanks for the link.

As noted in my reply to Prathamesh, these ideas sound interesting, but
this thread seems to be entering scope creep - I don't need these ideas
to implement this patch kit (but I do need the attributes specified in
the patch, or similar).

Do the specific attributes I posted sound reasonable?  (without
necessarily going in to a full review).

If we're thinking longer term, I want the ability to express that a
function can have multiple outcomes (e.g. "success" vs "failure" or
"found" vs "not found", etc), and it might be good to have a way to
attach attributes to those outcomes.  Unfortunately the attribute
syntax is flat, but maybe there could be a two level hierarchy,
something like:

int foo (args)
   __attribute__((outcome("success")
  __attribute__((return_value(0
   __attribute__((outcome("failure")
  __attribute__((return_value_ne(0))
  __attribute__((modifies(errno);

Or given that we're enamored by Lisp-ish DSLs we could go the whole hog
and have something like:

int foo (args)
   __attribute ((semantics(
 "(def-outcomes (success (return-value (eq 0))"
 "  (failure (return-value (ne 0)"
 "modifies (errno")));

which may be over-engineering things :)


For a fully general solution, one that can express (nearly)
arbitrarily complex pre-conditions and invariants, I'd look
at the ideas in the C++ contracts papers.  I don't know if
any of the proposals (there were quite a few) made it possible
to specify postconditions involving function return values,
but I'd think that could be overcome by introducing some
special token like __retval.

Syntactically, one of the nice things about contracts that
I hope should be possible to implement in our attributes is
a way to refer to formal function arguments by name rather
than by their position in the argument list.  With that,
the expressivity goes up dramatically because it becomes
possible to use any C expression.
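
Purely as a hypothetical illustration of that direction (neither a
"postcondition" attribute nor a "__retval" token exists in GCC today;
both names are made up here), a postcondition that refers to a formal
parameter by name and to the return value might look something like:

  /* Hypothetical syntax only: on success (return value 0), *count has
     been set to a non-negative value.  */
  int parse_count (const char *s, int *count)
    __attribute__ ((postcondition (__retval != 0 || *count >= 0)));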

Martin


Going back to the patch itself, returns_zero_on_success/failure get me
what I want to express for finding trust boundaries in the Linux
kernel, have obvious meaning to a programmer (helpful even w/o compiler
support), and could interoperate with one of the more elaborate ideas in
this thread.

Hope this is constructive
Dave









Re: [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries

2021-12-06 Thread Martin Sebor via Gcc-patches

On 11/13/21 1:37 PM, David Malcolm via Gcc-patches wrote:

[Crossposting between gcc-patches@gcc.gnu.org and
linux-toolcha...@vger.kernel.org; sorry about my lack of kernel
knowledge, in case any of the following seems bogus]

I've been trying to turn my prototype from the LPC2021 session on
"Adding kernel-specific test coverage to GCC's -fanalyzer option"
( https://linuxplumbersconf.org/event/11/contributions/1076/ ) into
something that can go into GCC upstream without adding kernel-specific
special cases, or requiring a GCC plugin.  The prototype simply
specialcased "copy_from_user" and "copy_to_user" in GCC, which is
clearly not OK.

This GCC patch kit implements detection of "trust boundaries", aimed at
detection of "infoleaks" and of use of unsanitized attacker-controlled
values ("taint") in the Linux kernel.

For example, here's an infoleak diagnostic (using notes to
express what fields and padding within a struct have not been
initialized):

infoleak-CVE-2011-1078-2.c: In function ‘test_1’:
infoleak-CVE-2011-1078-2.c:28:9: warning: potential exposure of sensitive
   information by copying uninitialized data from stack across trust
   boundary [CWE-200] [-Wanalyzer-exposure-through-uninit-copy]
28 | copy_to_user(optval, &cinfo, sizeof(cinfo));
   | ^~~
   ‘test_1’: events 1-3
 |
 |   21 | struct sco_conninfo cinfo;
 |  | ^
 |  | |
 |  | (1) region created on stack here
 |  | (2) capacity: 6 bytes
 |..
 |   28 | copy_to_user(optval, &cinfo, sizeof(cinfo));
 |  | ~~~
 |  | |
 |  | (3) uninitialized data copied from stack here
 |
infoleak-CVE-2011-1078-2.c:28:9: note: 1 byte is uninitialized
28 | copy_to_user(optval, &cinfo, sizeof(cinfo));
   | ^~~
infoleak-CVE-2011-1078-2.c:14:15: note: padding after field ‘dev_class’ is 
uninitialized (1 byte)
14 | __u8  dev_class[3];
   |   ^
infoleak-CVE-2011-1078-2.c:21:29: note: suggest forcing zero-initialization by 
providing a ‘{0}’ initializer
21 | struct sco_conninfo cinfo;
   | ^
   |   = {0}

I have to come up with a way of expressing trust boundaries in a way
that will be:
- acceptable to the GCC community (not be too kernel-specific), and
- useful to the Linux kernel community.

At LPC it was pointed out that the kernel already has various
annotations e.g. "__user" for different kinds of pointers, and that it
would be best to reuse those.


Approach 1: Custom Address Spaces
=

GCC's C frontend supports target-specific address spaces; see:
   https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html
Quoting the N1275 draft of ISO/IEC DTR 18037:
   "Address space names are ordinary identifiers, sharing the same name
   space as variables and typedef names.  Any such names follow the same
   rules for scope as other ordinary identifiers (such as typedef names).
   An implementation may provide an implementation-defined set of
   intrinsic address spaces that are, in effect, predefined at the start
   of every translation unit.  The names of intrinsic address spaces must
   be reserved identifiers (beginning with an underscore and an uppercase
   letter or with two underscores).  An implementation may also
   optionally support a means for new address space names to be defined
   within a translation unit."

Patch 1a in the following patch kit for GCC implements such a means to
define new address spaces names in a translation unit, via a pragma:
   #pragma GCC custom_address_space(NAME_OF_ADDRESS_SPACE)

For example, the Linux kernel could perhaps write:

   #define __kernel
   #pragma GCC custom_address_space(__user)
   #pragma GCC custom_address_space(__iomem)
   #pragma GCC custom_address_space(__percpu)
   #pragma GCC custom_address_space(__rcu)

and thus the C frontend can complain about code that mismatches __user
and kernel pointers, e.g.:

custom-address-space-1.c: In function ‘test_argpass_to_p’:
custom-address-space-1.c:29:14: error: passing argument 1 of ‘accepts_p’
from pointer to non-enclosed address space
29 |   accepts_p (p_user);
   |  ^~
custom-address-space-1.c:21:24: note: expected ‘void *’ but argument is
of type ‘__user void *’
21 | extern void accepts_p (void *);
   |^~
custom-address-space-1.c: In function ‘test_cast_k_to_u’:
custom-address-space-1.c:135:12: warning: cast to ‘__user’ address space
pointer from disjoint generic address space pointer
   135 |   p_user = (void __user *)p_kernel;
   |^


This seems like an excellent use 

[PATCH v2 5/5] fix up compute_objsize: add a dump function

2021-12-06 Thread Martin Sebor via Gcc-patches

Attached is the subset of the patch in part (5) below: Add
a new dump function.  It applies on top of patch 4/5.

On 12/3/21 5:00 PM, Jeff Law wrote:



On 11/8/2021 7:34 PM, Martin Sebor via Gcc-patches wrote:

The pointer-query code that implements compute_objsize() that's
in turn used by most middle end access warnings now has a few
warts in it and (at least) one bug.  With the exception of
the bug the warts aren't behind any user-visible bugs that
I know of but they do cause problems in new code I've been
implementing on top of it.  Besides fixing the one bug (just
a typo) the attached patch cleans up these latent issues:

1) It moves the bndrng member from the access_ref class to
   access_data.  As a FIXME in the code notes, the member never
   did belong in the former and only takes up space in the cache.

2) The compute_objsize_r() function is big, unwieldy, and tedious
   to step through because of all the if statements that are better
   coded as one switch statement.  This change factors out more
   of its code into smaller handler functions as has been suggested
   and done a few times before.

3) (2) exposed a few places where I fail to pass the current
   GIMPLE statement down to ranger.  This leads to worse quality
   range info, including possible false positives and negatives.
   I just spotted these problems in code review but I haven't
   taken the time to come up with test cases.  This change fixes
   these oversights as well.

4) The handling of PHI statements is also in one big, hard-to-
   follow function.  This change moves the handling of each PHI
   argument into its own handler which merges it into the previous
   argument.  This makes the code easier to work with and opens it
   to reuse also for MIN_EXPR and MAX_EXPR.  (This is primarily
   used to print informational notes after warnings.)

5) Finally, the patch factors code to dump each access_ref
   cached by the pointer_query cache out of pointer_query::dump
   and into access_ref::dump.  This helps with debugging.

These changes should have no user-visible effect and other than
a regression test for the typo (PR 103143) come with no tests.
They've been tested on x86_64-linux.
Sigh.  You've identified 6 distinct changes above.  The 5 you've 
enumerated plus a typo fix somewhere.  There's no reason why they need 
to be a single patch and many reasons why they should be a series of 
independent patches.    Combining them into a single patch isn't how we 
do things and it hides the actual bugfix in here.


Please send a fix for the typo first since that should be able to 
trivially go forward.  Then  a patch for item #1.  That should be 
trivial to review when it's pulled out from the rest of the patch. 
Beyond that, your choice on ordering, but you need to break this down.





Jeff



commit 2054b01fb383560b96d51fabfe9dee6dbd611f4a
Author: Martin Sebor 
Date:   Mon Dec 6 09:52:32 2021 -0700

Add a new dump function.

gcc/ChangeLog:

* pointer-query.cc (access_ref::dump): Define new function.
(pointer_query::dump): Call it.
* pointer-query.h (access_ref::dump): Declare new function.

diff --git a/gcc/pointer-query.cc b/gcc/pointer-query.cc
index 3f583110c71..e618c4d7276 100644
--- a/gcc/pointer-query.cc
+++ b/gcc/pointer-query.cc
@@ -1240,6 +1240,63 @@ access_ref::inform_access (access_mode mode, int ostype /* = 1 */) const
 	sizestr, allocfn);
 }
 
+/* Dump *THIS to FILE.  */
+
+void
+access_ref::dump (FILE *file) const
+{
+  for (int i = deref; i < 0; ++i)
+fputc ('&', file);
+
+  for (int i = 0; i < deref; ++i)
+fputc ('*', file);
+
+  if (gphi *phi_stmt = phi ())
+{
+  fputs ("PHI <", file);
+  unsigned nargs = gimple_phi_num_args (phi_stmt);
+  for (unsigned i = 0; i != nargs; ++i)
+	{
+	  tree arg = gimple_phi_arg_def (phi_stmt, i);
+	  print_generic_expr (file, arg);
+	  if (i + 1 < nargs)
+	fputs (", ", file);
+	}
+  fputc ('>', file);
+}
+  else
+print_generic_expr (file, ref);
+
+  if (offrng[0] != offrng[1])
+fprintf (file, " + [%lli, %lli]",
+	 (long long) offrng[0].to_shwi (),
+	 (long long) offrng[1].to_shwi ());
+  else if (offrng[0] != 0)
+fprintf (file, " %c %lli",
+	 offrng[0] < 0 ? '-' : '+',
+	 (long long) offrng[0].to_shwi ());
+
+  if (base0)
+fputs (" (base0)", file);
+
+  fputs ("; size: ", file);
+  if (sizrng[0] != sizrng[1])
+{
+  offset_int maxsize = wi::to_offset (max_object_size ());
+  if (sizrng[0] == 0 && sizrng[1] >= maxsize)
+	fputs ("unknown", file);
+  else
+	fprintf (file, "[%llu, %llu]",
+		 (unsigned long long) sizrng[0].to_uhwi (),
+		 (unsigned long long) sizrng[1].to_uhwi ());
+}
+  else if (sizrng[0] != 0)
+fprintf (file, "%llu",
+	 (unsigned long long) sizrng[0].to_uhwi ());
+
+  fputc ('\n', file);
+}
+
 /* Set the access to at most MAXWRITE and MAXREAD bytes, and at least 1
when MINWRITE or MINREAD, respectively, is set.  */
 

[PATCH v2 4/5] fix up compute_objsize: refactor it into helpers

2021-12-06 Thread Martin Sebor via Gcc-patches

Attached is the subset of the patch in part (2) below: refactor
compute_objsize_r into helpers.  It applies on top of patch 3/5.

On 12/3/21 5:00 PM, Jeff Law wrote:



On 11/8/2021 7:34 PM, Martin Sebor via Gcc-patches wrote:

The pointer-query code that implements compute_objsize() that's
in turn used by most middle end access warnings now has a few
warts in it and (at least) one bug.  With the exception of
the bug the warts aren't behind any user-visible bugs that
I know of but they do cause problems in new code I've been
implementing on top of it.  Besides fixing the one bug (just
a typo) the attached patch cleans up these latent issues:

1) It moves the bndrng member from the access_ref class to
   access_data.  As a FIXME in the code notes, the member never
   did belong in the former and only takes up space in the cache.

2) The compute_objsize_r() function is big, unwieldy, and tedious
   to step through because of all the if statements that are better
   coded as one switch statement.  This change factors out more
   of its code into smaller handler functions as has been suggested
   and done a few times before.

3) (2) exposed a few places where I fail to pass the current
   GIMPLE statement down to ranger.  This leads to worse quality
   range info, including possible false positives and negatives.
   I just spotted these problems in code review but I haven't
   taken the time to come up with test cases.  This change fixes
   these oversights as well.

4) The handling of PHI statements is also in one big, hard-to-
   follow function.  This change moves the handling of each PHI
   argument into its own handler which merges it into the previous
   argument.  This makes the code easier to work with and opens it
   to reuse also for MIN_EXPR and MAX_EXPR.  (This is primarily
   used to print informational notes after warnings.)

5) Finally, the patch factors code to dump each access_ref
   cached by the pointer_query cache out of pointer_query::dump
   and into access_ref::dump.  This helps with debugging.

These changes should have no user-visible effect and other than
a regression test for the typo (PR 103143) come with no tests.
They've been tested on x86_64-linux.
Sigh.  You've identified 6 distinct changes above.  The 5 you've 
enumerated plus a typo fix somewhere.  There's no reason why they need 
to be a single patch and many reasons why they should be a series of 
independent patches.    Combining them into a single patch isn't how we 
do things and it hides the actual bugfix in here.


Please send a fix for the typo first since that should be able to 
trivially go forward.  Then  a patch for item #1.  That should be 
trivial to review when it's pulled out from the rest of the patch. 
Beyond that, your choice on ordering, but you need to break this down.





Jeff



commit 7d1d8cc18678ccaec5f274919e0c9910ccfea86e
Author: Martin Sebor 
Date:   Mon Dec 6 09:33:32 2021 -0700

Refactor compute_objsize_r into helpers.

gcc/ChangeLog:

* pointer-query.cc (compute_objsize_r): Add an argument.
(gimple_call_return_array): Pass a new argument to compute_objsize_r.
(access_ref::merge_ref): Same.
(access_ref::inform_access): Add an argument and use it.
(access_data::access_data): Initialize new member.
(handle_min_max_size): Pass a new argument to compute_objsize_r.
(handle_decl): New function.
(handle_array_ref): Pass a new argument to compute_objsize_r.
Avoid incrementing deref.
(set_component_ref_size): New function.
(handle_component_ref): New function.
(handle_mem_ref): Pass a new argument to compute_objsize_r.
Only increment deref after successfully computing object size.
(handle_ssa_name): New function.
(compute_objsize_r): Move code into helpers and call them.
(compute_objsize): Pass a new argument to compute_objsize_r.
* pointer-query.h (access_ref::inform_access): Add an argument.
(access_data::ostype): New member.

diff --git a/gcc/pointer-query.cc b/gcc/pointer-query.cc
index 24fbac84ec4..3f583110c71 100644
--- a/gcc/pointer-query.cc
+++ b/gcc/pointer-query.cc
@@ -43,7 +43,7 @@
 #include "tree-ssanames.h"
 #include "target.h"
 
-static bool compute_objsize_r (tree, gimple *, int, access_ref *,
+static bool compute_objsize_r (tree, gimple *, bool, int, access_ref *,
 			   ssa_name_limit_t &, pointer_query *);
 
 /* Wrapper around the wide_int overload of get_range that accepts
@@ -199,7 +199,7 @@ gimple_call_return_array (gimple *stmt, offset_int offrng[2], bool *past_end,
 	   of the source object.  */
 	access_ref aref;
 	tree src = gimple_call_arg (stmt, 1);
-	if (compute_objsize_r (src, stmt, 1, &aref, snlim, qry)
+	if (compute_objsize_r (src, stmt, false, 1, &aref, snlim, qry)
 		&& aref.sizrng[1] < offrng[1])
 	  offrng[1] = 

[PATCH v2 3/5] fix up compute_objsize: factor out PHI handling

2021-12-06 Thread Martin Sebor via Gcc-patches

Attached is a subset of the patch in part (4) below: factor out
PHI handling.  It applies on top of patch 2/5.

On 12/3/21 5:00 PM, Jeff Law wrote:



On 11/8/2021 7:34 PM, Martin Sebor via Gcc-patches wrote:

The pointer-query code that implements compute_objsize() that's
in turn used by most middle end access warnings now has a few
warts in it and (at least) one bug.  With the exception of
the bug the warts aren't behind any user-visible bugs that
I know of but they do cause problems in new code I've been
implementing on top of it.  Besides fixing the one bug (just
a typo) the attached patch cleans up these latent issues:

1) It moves the bndrng member from the access_ref class to
   access_data.  As a FIXME in the code notes, the member never
   did belong in the former and only takes up space in the cache.

2) The compute_objsize_r() function is big, unwieldy, and tedious
   to step through because of all the if statements that are better
   coded as one switch statement.  This change factors out more
   of its code into smaller handler functions as has been suggested
   and done a few times before.

3) (2) exposed a few places where I fail to pass the current
   GIMPLE statement down to ranger.  This leads to worse quality
   range info, including possible false positives and negatives.
   I just spotted these problems in code review but I haven't
   taken the time to come up with test cases.  This change fixes
   these oversights as well.

4) The handling of PHI statements is also in one big, hard-to-
   follow function.  This change moves the handling of each PHI
   argument into its own handler which merges it into the previous
   argument.  This makes the code easier to work with and opens it
   to reuse also for MIN_EXPR and MAX_EXPR.  (This is primarily
   used to print informational notes after warnings.)

5) Finally, the patch factors code to dump each access_ref
   cached by the pointer_query cache out of pointer_query::dump
   and into access_ref::dump.  This helps with debugging.

These changes should have no user-visible effect and other than
a regression test for the typo (PR 103143) come with no tests.
They've been tested on x86_64-linux.
Sigh.  You've identified 6 distinct changes above.  The 5 you've 
enumerated plus a typo fix somewhere.  There's no reason why they need 
to be a single patch and many reasons why they should be a series of 
independent patches.    Combining them into a single patch isn't how we 
do things and it hides the actual bugfix in here.


Please send a fix for the typo first since that should be able to 
trivially go forward.  Then  a patch for item #1.  That should be 
trivial to review when it's pulled out from the rest of the patch. 
Beyond that, your choice on ordering, but you need to break this down.





Jeff



commit 6ac1d37947ad5cf07fe133faaf8414f00e0eed13
Author: Martin Sebor 
Date:   Mon Dec 6 09:23:22 2021 -0700

Introduce access_ref::merge_ref.

gcc/ChangeLog:

* pointer-query.cc (access_ref::merge_ref): Define new function.
(access_ref::get_ref): Move code into merge_ref and call it.
* pointer-query.h (access_ref::merge_ref): Declare new function.

diff --git a/gcc/pointer-query.cc b/gcc/pointer-query.cc
index c75c4da6b60..24fbac84ec4 100644
--- a/gcc/pointer-query.cc
+++ b/gcc/pointer-query.cc
@@ -624,6 +624,97 @@ access_ref::phi () const
   return as_a <gphi *> (def_stmt);
 }
 
+/* Determine the size and offset for ARG, append it to ALL_REFS, and
+   merge the result with *THIS.  Ignore ARG if SKIP_NULL is set and
+   ARG refers to the null pointer.  Return true on success and false
+   on failure.  */
+
+bool
+access_ref::merge_ref (vec<access_ref> *all_refs, tree arg, gimple *stmt,
+		   int ostype, bool skip_null,
+		   ssa_name_limit_t &snlim, pointer_query &qry)
+{
+  access_ref aref;
+  if (!compute_objsize_r (arg, stmt, ostype, , snlim, )
+  || aref.sizrng[0] < 0)
+/* This may be a PHI with all null pointer arguments.  */
+return false;
+
+  if (all_refs)
+{
+  access_ref dummy_ref;
+  aref.get_ref (all_refs, &dummy_ref, ostype, &snlim, &qry);
+}
+
+  if (TREE_CODE (arg) == SSA_NAME)
+qry.put_ref (arg, aref, ostype);
+
+  if (all_refs)
+all_refs->safe_push (aref);
+
+  aref.deref += deref;
+
+  bool merged_parmarray = aref.parmarray;
+
+  const bool nullp = skip_null && integer_zerop (arg);
+  const offset_int maxobjsize = wi::to_offset (max_object_size ());
+  offset_int minsize = sizrng[0];
+
+  if (sizrng[0] < 0)
+{
+  /* If *THIS doesn't contain a meaningful result yet set it to AREF
+	 unless the argument is null and it's okay to ignore it.  */
+  if (!nullp)
+	*this = aref;
+
+  /* Set if the current argument refers to one or more objects of
+	 known size (or range of sizes), as opposed to referring to
+	 one or more unknown object(s).  */
+  const bool arg_known_size = (aref.sizrng[0] != 0
+   || aref.sizrng[1] != maxobjsize);
+  if (arg_known_size)
+	

[PATCH v2 2/5] fix up compute_objsize: pass GIMPLE statement to it

2021-12-06 Thread Martin Sebor via Gcc-patches

Attached is the subset of the patch in part (3) below: Pass
GIMPLE statement to compute_objsize.  It applies on top of
patch 1/5.

On 12/3/21 5:00 PM, Jeff Law wrote:



On 11/8/2021 7:34 PM, Martin Sebor via Gcc-patches wrote:

The pointer-query code that implements compute_objsize() that's
in turn used by most middle end access warnings now has a few
warts in it and (at least) one bug.  With the exception of
the bug the warts aren't behind any user-visible bugs that
I know of but they do cause problems in new code I've been
implementing on top of it.  Besides fixing the one bug (just
a typo) the attached patch cleans up these latent issues:

1) It moves the bndrng member from the access_ref class to
   access_data.  As a FIXME in the code notes, the member never
   did belong in the former and only takes up space in the cache.

2) The compute_objsize_r() function is big, unwieldy, and tedious
   to step through because of all the if statements that are better
   coded as one switch statement.  This change factors out more
   of its code into smaller handler functions as has been suggested
   and done a few times before.

3) (2) exposed a few places where I fail to pass the current
   GIMPLE statement down to ranger.  This leads to worse quality
   range info, including possible false positives and negatives.
   I just spotted these problems in code review but I haven't
   taken the time to come up with test cases.  This change fixes
   these oversights as well.

4) The handling of PHI statements is also in one big, hard-to-
   follow function.  This change moves the handling of each PHI
   argument into its own handler which merges it into the previous
   argument.  This makes the code easier to work with and opens it
   to reuse also for MIN_EXPR and MAX_EXPR.  (This is primarily
   used to print informational notes after warnings.)

5) Finally, the patch factors code to dump each access_ref
   cached by the pointer_query cache out of pointer_query::dump
   and into access_ref::dump.  This helps with debugging.

These changes should have no user-visible effect and other than
a regression test for the typo (PR 103143) come with no tests.
They've been tested on x86_64-linux.
Sigh.  You've identified 6 distinct changes above.  The 5 you've 
enumerated plus a typo fix somewhere.  There's no reason why they need 
to be a single patch and many reasons why they should be a series of 
independent patches.    Combining them into a single patch isn't how we 
do things and it hides the actual bugfix in here.


Please send a fix for the typo first since that should be able to 
trivially go forward.  Then  a patch for item #1.  That should be 
trivial to review when it's pulled out from the rest of the patch. 
Beyond that, your choice on ordering, but you need to break this down.





Jeff



commit 64d34f249fb98d53ae8fcb3b36b01c2b9b05ea4f
Author: Martin Sebor 
Date:   Sat Dec 4 16:57:48 2021 -0700

Pass GIMPLE statement to compute_objsize.

gcc/ChangeLog:

* gimple-ssa-warn-restrict.c (builtin_access::builtin_access): Pass
GIMPLE statement to compute_objsize.
* pointer-query.cc (compute_objsize): Add a statement argument.
* pointer-query.h (compute_objsize): Define a new overload.

diff --git a/gcc/gimple-ssa-warn-restrict.c b/gcc/gimple-ssa-warn-restrict.c
index d1df9ca8d4f..ca2d4c2c32e 100644
--- a/gcc/gimple-ssa-warn-restrict.c
+++ b/gcc/gimple-ssa-warn-restrict.c
@@ -777,7 +777,7 @@ builtin_access::builtin_access (range_query *query, gimple *call,
   if (!POINTER_TYPE_P (TREE_TYPE (addr)))
 	addr = build1 (ADDR_EXPR, (TREE_TYPE (addr)), addr);
 
-  if (tree dstsize = compute_objsize (addr, ostype))
+  if (tree dstsize = compute_objsize (addr, call, ostype))
 	dst.basesize = wi::to_offset (dstsize);
   else if (POINTER_TYPE_P (TREE_TYPE (addr)))
 	dst.basesize = HOST_WIDE_INT_MIN;
@@ -791,7 +791,7 @@ builtin_access::builtin_access (range_query *query, gimple *call,
   if (!POINTER_TYPE_P (TREE_TYPE (addr)))
 	addr = build1 (ADDR_EXPR, (TREE_TYPE (addr)), addr);
 
-  if (tree srcsize = compute_objsize (addr, ostype))
+  if (tree srcsize = compute_objsize (addr, call, ostype))
 	src.basesize = wi::to_offset (srcsize);
   else if (POINTER_TYPE_P (TREE_TYPE (addr)))
 	src.basesize = HOST_WIDE_INT_MIN;
diff --git a/gcc/pointer-query.cc b/gcc/pointer-query.cc
index a8c9671e3ba..c75c4da6b60 100644
--- a/gcc/pointer-query.cc
+++ b/gcc/pointer-query.cc
@@ -1645,7 +1645,7 @@ handle_array_ref (tree aref, gimple *stmt, bool addr, int ostype,
   offset_int orng[2];
   tree off = pref->eval (TREE_OPERAND (aref, 1));
   range_query *const rvals = qry ? qry->rvals : NULL;
-  if (!get_offset_range (off, NULL, orng, rvals))
+  if (!get_offset_range (off, stmt, orng, rvals))
 {
   /* Set ORNG to the maximum offset representable in ptrdiff_t.  */
   orng[1] = wi::to_offset (TYPE_MAX_VALUE (ptrdiff_type_node));
@@ -1732,7 

[PATCH v2 1/5] fix up compute_objsize: move bndrng into access_data

2021-12-06 Thread Martin Sebor via Gcc-patches

Attached is the subset of the patch in part (1) below:  Move
bndrng from access_ref to access_data.

On 12/3/21 5:00 PM, Jeff Law wrote:



On 11/8/2021 7:34 PM, Martin Sebor via Gcc-patches wrote:

The pointer-query code that implements compute_objsize() that's
in turn used by most middle end access warnings now has a few
warts in it and (at least) one bug.  With the exception of
the bug the warts aren't behind any user-visible bugs that
I know of but they do cause problems in new code I've been
implementing on top of it.  Besides fixing the one bug (just
a typo) the attached patch cleans up these latent issues:

1) It moves the bndrng member from the access_ref class to
   access_data.  As a FIXME in the code notes, the member never
   did belong in the former and only takes up space in the cache.

2) The compute_objsize_r() function is big, unwieldy, and tedious
   to step through because of all the if statements that are better
   coded as one switch statement.  This change factors out more
   of its code into smaller handler functions as has been suggested
   and done a few times before.

3) (2) exposed a few places where I fail to pass the current
   GIMPLE statement down to ranger.  This leads to worse quality
   range info, including possible false positives and negatives.
   I just spotted these problems in code review but I haven't
   taken the time to come up with test cases.  This change fixes
   these oversights as well.

4) The handling of PHI statements is also in one big, hard-to-
   follow function.  This change moves the handling of each PHI
   argument into its own handler which merges it into the previous
   argument.  This makes the code easier to work with and opens it
   to reuse also for MIN_EXPR and MAX_EXPR.  (This is primarily
   used to print informational notes after warnings.)

5) Finally, the patch factors code to dump each access_ref
   cached by the pointer_query cache out of pointer_query::dump
   and into access_ref::dump.  This helps with debugging.

These changes should have no user-visible effect and other than
a regression test for the typo (PR 103143) come with no tests.
They've been tested on x86_64-linux.
Sigh.  You've identified 6 distinct changes above.  The 5 you've 
enumerated plus a typo fix somewhere.  There's no reason why they need 
to be a single patch and many reasons why they should be a series of 
independent patches.    Combining them into a single patch isn't how we 
do things and it hides the actual bugfix in here.


Please send a fix for the typo first since that should be able to 
trivially go forward.  Then  a patch for item #1.  That should be 
trivial to review when it's pulled out from the rest of the patch. 
Beyond that, your choice on ordering, but you need to break this down.





Jeff



commit 1062c1154beeeae26e86f053946ea733ffb5d136
Author: Martin Sebor 
Date:   Sat Dec 4 16:46:17 2021 -0700

Move bndrng from access_ref to access_data.

gcc/ChangeLog:

* gimple-ssa-warn-access.cc (check_access): Adjust to member name
change.
(pass_waccess::check_strncmp): Same.
* pointer-query.cc (access_ref::access_ref): Remove arguments.
Simplify.
(access_data::access_data): Define new ctors.
(access_data::set_bound): Define new member function.
(compute_objsize_r): Remove unnecessary code.
* pointer-query.h (struct access_ref): Remove ctor arguments.
(struct access_data): Declare ctor overloads.
(access_data::dst_bndrng): New member.
(access_data::src_bndrng): New member.

diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
index 48bf8aaff50..05b8d91b71d 100644
--- a/gcc/gimple-ssa-warn-access.cc
+++ b/gcc/gimple-ssa-warn-access.cc
@@ -1337,10 +1337,10 @@ check_access (GimpleOrTree exp, tree dstwrite,
   if (!dstsize)
 dstsize = maxobjsize;
 
-  /* Set RANGE to that of DSTWRITE if non-null, bounded by PAD->DST.BNDRNG
+  /* Set RANGE to that of DSTWRITE if non-null, bounded by PAD->DST_BNDRNG
  if valid.  */
   gimple *stmt = pad ? pad->stmt : nullptr;
-  get_size_range (rvals, dstwrite, stmt, range, pad ? pad->dst.bndrng : NULL);
+  get_size_range (rvals, dstwrite, stmt, range, pad ? pad->dst_bndrng : NULL);
 
   tree func = get_callee_fndecl (exp);
   /* Read vs write access by built-ins can be determined from the const
@@ -1432,9 +1432,9 @@ check_access (GimpleOrTree exp, tree dstwrite,
  of an object.  */
   if (maxread)
 {
-  /* Set RANGE to that of MAXREAD, bounded by PAD->SRC.BNDRNG if
+  /* Set RANGE to that of MAXREAD, bounded by PAD->SRC_BNDRNG if
 	 PAD is nonnull and BNDRNG is valid.  */
-  get_size_range (rvals, maxread, stmt, range, pad ? pad->src.bndrng : NULL);
+  get_size_range (rvals, maxread, stmt, range, pad ? pad->src_bndrng : NULL);
 
   location_t loc = get_location (exp);
   tree size = dstsize;
@@ -1479,12 

[PATCH v2] fix PR 103143

2021-12-06 Thread Martin Sebor via Gcc-patches

I have broken up the patch into a series of six.  Attached is
part (1), the fix for the typo that causes PR 103143.

On 12/3/21 5:00 PM, Jeff Law wrote:



On 11/8/2021 7:34 PM, Martin Sebor via Gcc-patches wrote:

The pointer-query code that implements compute_objsize() that's
in turn used by most middle end access warnings now has a few
warts in it and (at least) one bug.  With the exception of
the bug the warts aren't behind any user-visible bugs that
I know of but they do cause problems in new code I've been
implementing on top of it.  Besides fixing the one bug (just
a typo) the attached patch cleans up these latent issues:

1) It moves the bndrng member from the access_ref class to
   access_data.  As a FIXME in the code notes, the member never
   did belong in the former and only takes up space in the cache.

2) The compute_objsize_r() function is big, unwieldy, and tedious
   to step through because of all the if statements that are better
   coded as one switch statement.  This change factors out more
   of its code into smaller handler functions as has been suggested
   and done a few times before.

3) (2) exposed a few places where I fail to pass the current
   GIMPLE statement down to ranger.  This leads to worse quality
   range info, including possible false positives and negatives.
   I just spotted these problems in code review but I haven't
   taken the time to come up with test cases.  This change fixes
   these oversights as well.

4) The handling of PHI statements is also in one big, hard-to-
   follow function.  This change moves the handling of each PHI
   argument into its own handler which merges it into the previous
   argument.  This makes the code easier to work with and opens it
   to reuse also for MIN_EXPR and MAX_EXPR.  (This is primarily
   used to print informational notes after warnings.)

5) Finally, the patch factors code to dump each access_ref
   cached by the pointer_query cache out of pointer_query::dump
   and into access_ref::dump.  This helps with debugging.

These changes should have no user-visible effect and other than
a regression test for the typo (PR 103143) come with no tests.
They've been tested on x86_64-linux.
Sigh.  You've identified 6 distinct changes above.  The 5 you've 
enumerated plus a typo fix somewhere.  There's no reason why they need 
to be a single patch and many reasons why they should be a series of 
independent patches.    Combining them into a single patch isn't how we 
do things and it hides the actual bugfix in here.


Please send a fix for the typo first since that should be able to 
trivially go forward.  Then  a patch for item #1.  That should be 
trivial to review when it's pulled out from the rest of the patch. 
Beyond that, your choice on ordering, but you need to break this down.





Jeff



commit 9a5bb7a2b0cdb8654061d9cba543c1408fa7adc9
Author: Martin Sebor 
Date:   Sat Dec 4 16:22:07 2021 -0700

Use the recursive form of compute_objsize [PR 103143].

gcc/ChangeLog:

PR middle-end/103143
* pointer-query.cc (gimple_call_return_array): Call compute_objsize_r.

gcc/testsuite/ChangeLog:
PR middle-end/103143
* gcc.dg/Wstringop-overflow-83.c: New test.

diff --git a/gcc/pointer-query.cc b/gcc/pointer-query.cc
index 2ead0271617..25ce4303849 100644
--- a/gcc/pointer-query.cc
+++ b/gcc/pointer-query.cc
@@ -199,7 +199,7 @@ gimple_call_return_array (gimple *stmt, offset_int offrng[2], bool *past_end,
 	   of the source object.  */
 	access_ref aref;
 	tree src = gimple_call_arg (stmt, 1);
-	if (compute_objsize (src, stmt, 1, &aref, qry)
+	if (compute_objsize_r (src, stmt, 1, &aref, snlim, qry)
 		&& aref.sizrng[1] < offrng[1])
 	  offrng[1] = aref.sizrng[1];
 	  }
diff --git a/gcc/testsuite/gcc.dg/Wstringop-overflow-83.c b/gcc/testsuite/gcc.dg/Wstringop-overflow-83.c
new file mode 100644
index 000..6928ee4d559
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wstringop-overflow-83.c
@@ -0,0 +1,19 @@
+/* PR  middle-end/103143 - ICE due to infinite recursion in pointer-query.cc
+   { dg-do compile }
+   { dg-options "-O2 -Wall" } */
+
+typedef __SIZE_TYPE__ size_t;
+
+void foo (size_t x)
+{
+  struct T { char buf[64]; char buf2[64]; } t;
+  char *p = &t.buf[8];
+  char *r = t.buf2;
+  size_t i;
+
+  for (i = 0; i < x; i++)
+{
+  r = __builtin_mempcpy (r, p, i);
+  p = r + 1;
+}
+}


Re: [PATCH 0/6] rs6000: Remove "old" built-in function infrastructure

2021-12-06 Thread Bill Schmidt via Gcc-patches
I had difficulty with patch 1/6 being too large, and there have been some small
upstream changes in this area, so I will resubmit this series shortly.  There
were also problems with my SMTP server for some of the CCs as well...

Sorry for the churn!
Bill

On 12/3/21 12:22 PM, Bill Schmidt wrote:
> From: Bill Schmidt 
>
> Hi!
>
> Now that the new built-in function support is all upstream and enabled, it
> seems safe and prudent to remove the old code to avoid confusion.  I broke 
> this
> up to the extent possible, but the first patch is a bit large and messy 
> because
> so many dead functions have to be removed when taking out the
> "new_builtins_are_live" variable.
>
> Bill Schmidt (6):
>   rs6000: Remove new_builtins_are_live and dead code it was guarding
>   rs6000: Remove altivec_overloaded_builtins array and initialization
>   rs6000: Rename rs6000-builtin-new.def to rs6000-builtins.def
>   rs6000: Remove rs6000-builtin.def and associated data and functions
>   rs6000: Rename functions with "new" in their names
>   rs6000: Rename arrays to remove temporary _x suffix
>
>  gcc/config/rs6000/darwin.h| 8 +-
>  gcc/config/rs6000/rs6000-builtin.def  |  3350 ---
>  ...00-builtin-new.def => rs6000-builtins.def} | 0
>  gcc/config/rs6000/rs6000-c.c  |  1342 +-
>  gcc/config/rs6000/rs6000-call.c   | 17810 +++-
>  gcc/config/rs6000/rs6000-gen-builtins.c   |   115 +-
>  gcc/config/rs6000/rs6000-internal.h   | 2 +-
>  gcc/config/rs6000/rs6000-protos.h | 3 -
>  gcc/config/rs6000/rs6000.c|   334 +-
>  gcc/config/rs6000/rs6000.h|58 -
>  gcc/config/rs6000/t-rs6000| 7 +-
>  11 files changed, 3173 insertions(+), 19856 deletions(-)
>  delete mode 100644 gcc/config/rs6000/rs6000-builtin.def
>  rename gcc/config/rs6000/{rs6000-builtin-new.def => rs6000-builtins.def} 
> (100%)
>


[patch, Fortran] IEEE support for aarch64-apple-darwin

2021-12-06 Thread FX via Gcc-patches
Hi everyone,

Since support for target aarch64-apple-darwin has been submitted for review, 
it’s time to submit the Fortran part, i.e. enabling IEEE support on that target.

The patch has been in use now for several months, in a developer branch shipped 
by some distros on macOS (including Homebrew). It was authored more than a year 
ago, but I figured it wasn’t relevant to submit until the target was actually 
close to be in trunk: 
https://github.com/iains/gcc-darwin-arm64/commit/b107973550d3d9a9ce9acc751adbbe2171d13736

Bootstrapped and tested on aarch64-apple-darwin20 (macOS Big Sur) and 
aarch64-apple-darwin21 (macOS Monterey).

OK to merge?
Can someone point me to the right way of formatting ChangeLogs and commit 
entries, nowadays?


Thanks,
FX



b107973550d3d9a9ce9acc751adbbe2171d13736.patch
Description: Binary data


Re: [RFC][WIP Patch] OpenMP map with iterator + Fortran OpenMP deep mapping / custom allocator (+ Fortran co_reduce)

2021-12-06 Thread Jakub Jelinek via Gcc-patches
On Mon, Dec 06, 2021 at 05:06:10PM +0100, Tobias Burnus wrote:
> Regarding the sorting and iterators: I think we already have this problem
> intrinsically – for depend/affinity, we create for (iterator(...) : a, b)
> a single loop - also to have a consistency with regards to the array bounds.

depend and affinity don't need to sort anything, we ignore affinity
altogether, depend is just an unordered list of (from what we care about) 
addresses
with the kinds next to them, it can contain duplicates etc. (and affinity
if we implemented it can too).
> 
> But if we want to put 'd' between 'a' and 'b' - we either need to split
> the loop - or 'd' cannot be put between 'a' and 'b'. That's a fundamental
> issue. I am not sure whether that's a real issue as all have the same map
> type, but still.
> 
> > but I'd
> > prefer not to outline complex expressions from map's clause as separate
> > function each, it can use many variables etc. from the parent function
> > and calling those as callbacks would be too ugly.
> 
> I concur that it would be useful to avoid using callbacks; it seems
> as if it can be avoided for iterators. I am not sure how well, but okay.
> 
> But I have no idea how to avoid callbacks for allocatable components in
> Fortran. For
> 
> type t
>   type(t), allocatable :: a
> end t
> type(t) :: var
> 
> (recursive type) - it is at least semi-known at compile time:
>   e = var;
>   while (e)
>{ map(e); e = e->a; }
> I am not sure how to pass this on to the middle end - but
> code for it can be generated.

I bet we'd need to add a target hook for that, but other than that,
I don't see why we'd need a callback at runtime.
Let a target hook in first phase compute how many slots in the 3 arrays
will be needed, then let's allocate the 3 arrays, fill in the static
parts in there and when filling such maps follow the target hook to
emit inline code that fills in those extra mappings.
Note, I think it might be better to do declare mapper support before
doing the recursive allocatables or Fortran polymorphism, because
it will necessarily be affected by declare mapper at each level too.

But generally, I don't see why whatever you want to do with a callback
couldn't be done by just emitting a runtime loop that does something
when filling the arrays.  After all, we'll have such runtime loops even
for simple iterator unless we optimize those as an array descriptor,
map(iterator(i=0:n), to: *foo (i)) - in some way it is inlining what the
callback would do at the GOMP_target_ext etc. caller, but it is actually
the other way around, callbacks would mean outlining what can be done in
mere runtime loops inside of the function that has all the vars etc.
accessible there.
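
To make that concrete, a rough sketch of the kind of runtime loop that
could be emitted at the call site to fill the usual address/size/kind
triple for map(iterator(i=0:n), to: *foo (i)) follows.  Everything in it
is illustrative only: the array and helper names, the element size and
the GOMP_MAP_TO value are assumptions for the sketch, not the actual
generated code.

#include <stddef.h>
#include <stdlib.h>

#define GOMP_MAP_TO 0x1			/* stand-in for the libgomp constant */

extern int *foo (size_t i);		/* hypothetical expression from the clause */

static void
fill_iterator_maps (size_t n)
{
  void **hostaddrs = malloc (n * sizeof (void *));
  size_t *sizes = malloc (n * sizeof (size_t));
  unsigned short *kinds = malloc (n * sizeof (unsigned short));

  for (size_t i = 0; i < n; i++)
    {
      hostaddrs[i] = foo (i);		/* address computed from the iterator */
      sizes[i] = sizeof (int);		/* size of the mapped object */
      kinds[i] = GOMP_MAP_TO;
    }

  /* ... hand n and the three arrays on to GOMP_target_ext, then free.  */
  free (hostaddrs);
  free (sizes);
  free (kinds);
}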

Jakub



Re: [RFC][WIP Patch] OpenMP map with iterator + Fortran OpenMP deep mapping / custom allocator (+ Fortran co_reduce)

2021-12-06 Thread Tobias Burnus

On 06.12.21 16:16, Jakub Jelinek wrote:

I think there is no reason why the 3 arrays passed to GOMP_target_ext
(etc., for target data {, enter, exit} too and because this
affects to and from clauses as well, target update as well)
need to be constant size.
We do a lot of sorting of the map clauses especially during gimplification,
one question is whether it is ok to sort the whole map clause with iterator
as one clause, or if we'd need to do the sorting at runtime.


Regarding sorting at runtime: It looks as if Julian's patches at
 [PATCH 00/16] OpenMP: lvalues in "map" clauses and struct handling rework
can do without run-time sorting.

Regarding the sorting and iterators: I think we already have this problem
intrinsically – for depend/affinity, we create for (iterator(...) : a, b)
a single loop - also to have a consistency with regards to the array bounds.

But if we want to put 'd' between 'a' and 'b' - we either need to split
the loop - or 'd' cannot be put between 'a' and 'b'. That's a fundamental
issue. I am not sure whether that's a real issue as all have the same map
type, but still.


but I'd
prefer not to outline complex expressions from map's clause as separate
function each, it can use many variables etc. from the parent function
and calling those as callbacks would be too ugly.


I concur that it would be useful to avoid using callbacks; it seems
as if it can be avoided for iterators. I am not sure how well, but okay.

But I have no idea how to avoid callbacks for allocatable components in
Fortran. For

type t
  type(t), allocatable :: a
end t
type(t) :: var

(recursive type) - it is at least semi-known at compile time:
  e = var;
  while (e)
   { map(e); e = e->a; }
I am not sure how to pass this on to the middle end - but
code for it can be generated.
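
As a plain-C sketch of the loop that could be generated for this case
(the struct layout and the map_one_element helper are made up for
illustration; only the walk itself mirrors the pseudocode above):

struct t_c { struct t_c *a; };		/* roughly: type(t), allocatable :: a */

extern void map_one_element (void *ptr, unsigned long size); /* hypothetical */

static void
map_recursive (struct t_c *var)
{
  for (struct t_c *e = var; e != 0; e = e->a)
    map_one_element (e, sizeof *e);	/* map(e) at each allocated level */
}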

But as soon as polymorphism comes into play, I do not see how
callbacks can be avoided. Like for:

  class(t) :: var2

Here, it is known at compile time that var2%a exists (recursively).
But the dynamic type might additionally have var2%b(:) which in turn
might have var2%b(:)%c.


I see two places for calling the callback: Either by passing the
Fortran callback function on to libgomp or by generating the
function call handling inside omp-low.c - to populate a nonconstant
array.

Which solution do you prefer?

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PATCH] libsanitizer: Use SSE to save and restore XMM registers

2021-12-06 Thread Jakub Jelinek via Gcc-patches
On Tue, Nov 30, 2021 at 06:20:09AM -0800, H.J. Lu via Gcc-patches wrote:
> Use SSE, instead of AVX, to save and restore XMM registers to support
> processors without AVX.  The affected codes are unused in upstream since
> 
> https://github.com/llvm/llvm-project/commit/66d4ce7e26a5
> 
> and will be removed in
> 
> https://reviews.llvm.org/D112604
> 
> This fixed
> 
> FAIL: g++.dg/tsan/pthread_cond_clockwait.C   -O0  execution test
> FAIL: g++.dg/tsan/pthread_cond_clockwait.C   -O2  execution test
> 
> on machines without AVX.
> 
>   PR sanitizer/103466
>   * tsan/tsan_rtl_amd64.S (__tsan_trace_switch_thunk): Replace
>   vmovdqu with movdqu.
>   (__tsan_report_race_thunk): Likewise.

Ok, thanks.

Jakub



Re: [PATCH] alias: Optimise call_may_clobber_ref_p

2021-12-06 Thread Jan Hubicka via Gcc-patches
> On Mon, Dec 6, 2021 at 4:03 PM Richard Biener
>  wrote:
> >
> > On Mon, Dec 6, 2021 at 11:10 AM Richard Sandiford
> >  wrote:
> > >
> > > Richard Biener  writes:
> > > > On Sun, Dec 5, 2021 at 10:59 PM Richard Sandiford via Gcc-patches
> > > >  wrote:
> > > >>
> > > >> When compiling an optabs.ii at -O2 with a release-checking build,
> > > >> we spent a lot of time in call_may_clobber_ref_p.  One of the
> > > >> first things the function tries is the get_modref_function_summary
> > > >> approach, but that also seems to be the most expensive check.

In general it should not be that expensive since the trees are generally
small (despite the 3 nested loops making them look dangerously big) and
most of time we resort to simple pointer/array lookups.

From lcov stats https://splichal.eu/lcov/gcc/tree-ssa-alias.c.gcov.html
(which are quite useful when trying to understand typical behaviour of
the alias oracle)

In call_may_clobber_ref_p we get

2980 :  256141790 :   callee = gimple_call_fndecl (call);
2981 :: 
2982 :  256141790 :   if (callee != NULL_TREE && 
!ref->volatile_p)
2983 :: {
2984 :  240446150 :   struct cgraph_node *node = 
cgraph_node::get (callee);
this is direct pointer from node
2985 :  240446150 :   if (node)
2986 :: {
2987 :  240234208 :   modref_summary *summary = 
get_modref_function_summary (node);
This is fast summary so it is array lookup
2988 :  240234208 :   if (summary)
2989 :: {
2990 :   19732392 :   if (!modref_may_conflict 
(call, summary->stores, ref, tbaa_p)

And we get here in 8% of invocations

And in modref_may_conflict we have:


   21147836 : modref_may_conflict (const gcall *stmt,
2552 ::  modref_tree 
 *tt, ao_ref *ref, bool tbaa_p)
2553 :: {
2554 :   21147836 :   alias_set_type base_set, ref_set;
2555 :: 
2556 :   21147836 :   if (tt->every_base)
2557 :: return true;
2558 :: 
2559 :3407876 :   if (!dbg_cnt (ipa_mod_ref))
2560 :: return true;
2561 :: 
2562 :3407876 :   base_set = ao_ref_base_alias_set 
(ref);
2563 :: 
2564 :3407876 :   ref_set = ao_ref_alias_set (ref);
All direct lookups in most cases
2565 :: 
2566 :3407876 :   int num_tests = 0, max_tests = 
param_modref_max_tests;
2567 :   14479077 :   for (auto base_node : tt->bases)
2568 :: {
2569 :6171739 :   if (tbaa_p && 
flag_strict_aliasing)
On average there are 2 bases travelled by the loop.
2570 :: {
2571 :5253690 :   if (num_tests >= max_tests)
2572 :: return true;
2573 :5253690 :   alias_stats.modref_tests++;
2574 :5253690 :   if (!alias_sets_conflict_p 
(base_set, base_node->base))
alias set checks are also reasonably cheap and 50% of the time they avoid
walking the rest of the tree
2575 :2238398 : continue;
2576 :3015292 :   num_tests++;
2577 :: }
2578 :: 
2579 :3933341 :   if (base_node->every_ref)

We hit the loop only 0.15 times per invocation of the function

2580 :: return true;
2581 :: 
2582 :   14943624 :   for (auto ref_node : 
base_node->refs)
2583 :: {
2584 ::   /* Do not repeat same test as 
before.  */
2585 :4381325 :   if ((ref_set != base_set || 
base_node->base != ref_node->ref)

and usually visit only a bit more than one ref
2586 :2548127 :   && tbaa_p && 
flag_strict_aliasing)
2587 :: {
2588 :1840866 :   if (num_tests >= 
max_tests)
2589 :: return true;
2590 :1809594 :   
alias_stats.modref_tests++;
2591 :1809594 :   if 
(!alias_sets_conflict_p (ref_set, ref_node->ref))

Re: [PATCH] alias: Optimise call_may_clobber_ref_p

2021-12-06 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Mon, Dec 6, 2021 at 11:10 AM Richard Sandiford
>  wrote:
>>
>> Richard Biener  writes:
>> > On Sun, Dec 5, 2021 at 10:59 PM Richard Sandiford via Gcc-patches
>> >  wrote:
>> >>
>> >> When compiling an optabs.ii at -O2 with a release-checking build,
>> >> we spent a lot of time in call_may_clobber_ref_p.  One of the
>> >> first things the function tries is the get_modref_function_summary
>> >> approach, but that also seems to be the most expensive check.
>> >> At least for the optabs.ii test, most of the things g_m_f_s could
>> >> handle could also be handled by points-to analysis, which is much
>> >> cheaper.
>> >>
>> >> This patch therefore rearranges the tests in approximate order
>> >> of cheapness.  One wart is that points-to analysis can't be
>> >> used on its own for this purpose; it has to be used in
>> >> conjunction with check_fnspec.  This is because the argument
>> >> information in the fnspec is not reflected in the points-to
>> >> information, which instead contains overly optimistic (rather
>> >> than conservatively pessimistic) information.
>> >>
>> >> Specifically, find_func_aliases_for_builtin_call short-circuits
>> >> the handle_rhs_call/handle_call_arg logic and doesn't itself add
>> >> any compensating information about the arguments.  This means that:
>> >>
>> >>   free (foo)
>> >>
>> >> does not have an points-to information for foo (it has an empty
>> >> set instead).
>> >
>> > Huh?  The gimple_call_use/clobber_sets should be conservative
>> > and stand on their own.  Can you share a testcase that breaks when
>> > returning early if independent_pt?  Does this  problem also exist
>> > on the branch?
>>
>> With the attached patch to disable get_modref_function_summary,
>> do the PT query ahead of the fnspec query, and assert if the fnspec
>> query would have found an alias, I get the following results for
>> execute.exp on x86_64-linux-gnu (full log attached):
>>
>> # of expected passes            44554
>> # of unexpected failures        1710
>> # of unresolved testcases   855
>> # of unsupported tests  62
>>
>> E.g. for execute/2703-1.c, the assertion trips for:
>>
>> 3786  if (stmt_may_clobber_ref_p_1 (def_stmt, ref, tbaa_p))
>> (gdb) call debug (def_stmt)
>> __builtin_memset (p_3(D), 0, 28);
>> (gdb) call debug (ref.base)
>> *p_3(D)
>> (gdb) call pt_solution_includes (&((gcall *) def_stmt)->call_clobbered, 
>> ref.base)
>> $3 = false
>
> huh, well - *p_3(D) isn't a DECL and pt_solution_includes just works for 
> decls.
> In fact it should eventually ICE even ;)

Oops, yes.  That's what comes of trying to be fancy and doing things
in a gdb session rather than printfs. :-)  It is of course the:

  else if ((TREE_CODE (base) == MEM_REF
|| TREE_CODE (base) == TARGET_MEM_REF)
   && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME)
{
  struct ptr_info_def *pi = SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0));
  if (pi
      && !pt_solutions_intersect (gimple_call_clobber_set (call), &pi->pt))
{

path that fails the assert.

> That said, nothing should
> really initialize
> the call_clobber set besides pt_solution_reset at GIMPLE stmt allocation time
> or IPA PTA when enabled.  pt_solution_reset sets ->anything to true so the
> function should return true ...

Yeah, it works OK in the reset state.

> It looks like Honza added computation of these sets at some point, also maybe
> forgetting to set some bits here.
>
> I prefer to investigate / fix this before refactoring the uses in this way.  
> Can
> you file a bugreport please?  I can reproduce the memcpy FAIL with
> just moving check_fnspec down.

OK, filed as PR103584 (for the list).

Thanks,
Richard

>
> Thanks,
> Richard.
>
>> Setting ASSERTS to 0 gives:
>>
>> FAIL: gcc.c-torture/execute/memcpy-1.c   -O2  execution test
>> FAIL: gcc.c-torture/execute/memcpy-1.c   -O2 -flto -fno-use-linker-plugin 
>> -flto-partition=none  execution test
>> FAIL: gcc.c-torture/execute/memcpy-1.c   -O2 -flto -fuse-linker-plugin 
>> -fno-fat-lto-objects  execution test
>> FAIL: gcc.c-torture/execute/memcpy-1.c   -O3 -fomit-frame-pointer 
>> -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
>> FAIL: gcc.c-torture/execute/memcpy-1.c   -O3 -g  execution test
>> FAIL: gcc.c-torture/execute/memcpy-1.c   -Os  execution test
>> FAIL: gcc.c-torture/execute/memset-3.c   -O3 -fomit-frame-pointer 
>> -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
>> FAIL: gcc.c-torture/execute/memset-3.c   -O3 -g  execution test
>>
>> # of expected passes            23138
>> # of unexpected failures        8
>> # of unsupported tests  24
>>
>> (same for -m32).
>>
>> Originally I saw it as a bootstrap failure building stage 2
>> build-side libcpp, due to a bogus uninitialised warning.
>>
>> But alongside the memset and free examples, there's:
>>
>>   /* All the following functions do not return pointers, do not
>>  modify the 

Re: [RFC][WIP Patch] OpenMP map with iterator + Fortran OpenMP deep mapping / custom allocator (+ Fortran co_reduce)

2021-12-06 Thread Jakub Jelinek via Gcc-patches
On Mon, Dec 06, 2021 at 03:00:30PM +0100, Tobias Burnus wrote:
> This is a RFC/WIP patch about:
> 
> (A) OpenMP (C/C++/Fortran)
>omp target map(iterator(i=n:m),to : x(i))
> 
> (B) Fortran:
> (1)   omp target map(to : dt_var, class_var)
> (2)   omp parallel allocator(my_alloc) firstprivate(class_var)
> (3)  call co_reduce(dt_coarray, my_func)
> 
> The problem with (A) is that there is not a compile-time countable
> number of iterations such that it cannot be easily added to the array
> used to call GOMP_target_ext.
> 
> The problem with (B) is that dt_var can have allocatable components
> which complicates stuff and with recursive types, the number of
> elements is not known at compile time - not with polymorphic types
> as it depends on the recursion depth and dynamic type, respectively.

I think there is no reason why the 3 arrays passed to GOMP_target_ext
(etc., for target data {, enter, exit} too and because this
affects to and from clauses as well, target update as well)
need to be constant size.
We can allocate them as VLA or from heap as well.
I guess the only complication for using __builtin_alloca_with_align
would be target data, where the construct body could be using alloca
and we wouldn't want to silently free those allocas at the end of the
construct, though I bet we already have that problem whenever we
privatize some variable length variables on constructs that don't
result in outlined body into a new function, and outlining a body
into a new function will also break alloca across the boundaries.

We do a lot of sorting of the map clauses especially during gimplification,
one question is whether it is ok to sort the whole map clause with iterator
as one clause, or if we'd need to do the sorting at runtime.
With arbitrary lvalue expressions, the clauses with iterator
don't need to be just map(iterator(i=0:n),to : x[i]) but can be e.g.
map(iterator(i=0:n), tofrom : i == 0 ? a : i == 1 ? b : c[i - 2])
etc. (at least in C++, in C I think ?: doesn't give lvalues), or
*(i == 0 ? &a : i == 1 ? &b : &c[i - 2]) otherwise, though
I hope that is ok, it isn't much different from such lvalue expressions
when i isn't an iterator but say function parameter or some other variable,
I think we only map value in that case and don't really remap the vars
etc. (but sure, for map(iterator(i=0:n), to : foo(i).a[i].b[i]) we should
follow the rules for []s and .

So, I wouldn't be really afraid of going into dynamic allocation of the
arrays if the count isn't compile time constant.

Another thing is that it would be nice to optimize some most common cases
where some mappings could be described in more compact ways, and that
wouldn't be solely about iterator clause, but also when we start properly
implementing all the mapping nastiness of 5.0 and beyond, like mapping
of references, or the declare mapper stuff etc.
So if we come up with something like array descriptors Fortran has to
describe mapping of some possibly non-contiguous multidimensional array
with strides etc. in a single map element, it will be nice, but I'd
prefer not to outline complex expressions from map's clause as separate
function each, it can use many variables etc. from the parent function
and calling those as callbacks would be too ugly.

Jakub



[PATCH] tree-optimization/103581 - fix masked gather on x86

2021-12-06 Thread Richard Biener via Gcc-patches
The recent fix to PR103527 exposed an issue with how the various
special casing for AVX512 masks in vect_build_gather_load_calls
are handled.  The following makes that more obvious, fixing the
miscompile of 403.gcc.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-12-06  Richard Biener  

PR tree-optimization/103581
* tree-vect-stmts.c (vect_build_gather_load_calls): Properly
guard all the AVX512 mask cases.

* gcc.dg/vect/pr103581.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr103581.c | 59 
 gcc/tree-vect-stmts.c|  4 +-
 2 files changed, 61 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr103581.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr103581.c 
b/gcc/testsuite/gcc.dg/vect/pr103581.c
new file mode 100644
index 000..d072748de31
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr103581.c
@@ -0,0 +1,59 @@
+/* { dg-additional-options "-mavx2 -mtune-ctrl=use_gather" { target 
avx2_runtime } } */
+
+#include "tree-vect.h"
+
+#define MASKGATHER(SUFF, TYPE1, TYPE2) \
+TYPE1 * __attribute__((noipa)) \
+maskgather ## SUFF (int n, TYPE2 *indices, TYPE1 *data) \
+{ \
+  TYPE1 *out = __builtin_malloc (sizeof (TYPE1) * n); \
+  for (int i = 0; i < n; ++i) \
+{ \
+  TYPE2 d = indices[i]; \
+  if (d > 1) \
+out[i] = data[d]; \
+} \
+  return out; \
+}
+
+MASKGATHER(udiusi, unsigned long long, unsigned int)
+MASKGATHER(usiusi, unsigned int, unsigned int)
+MASKGATHER(udiudi, unsigned long long, unsigned long long)
+MASKGATHER(usiudi, unsigned int, unsigned long long)
+
+int
+main()
+{
+  check_vect ();
+
+unsigned int idx4[32], data4[32];
+  unsigned long long idx8[32], data8[32];
+  for (int i = 0; i < 32; ++i)
+{
+  idx4[i] = i;
+  idx8[i] = i;
+  data4[i] = i;
+  data8[i] = i;
+}
+  unsigned long long *resudiusi = maskgatherudiusi (16, idx4, data8);
+  unsigned int *resusiusi = maskgatherusiusi (16, idx4, data4);
+  unsigned long long *resudiudi = maskgatherudiudi (16, idx8, data8);
+  unsigned int *resusiudi = maskgatherusiudi (16, idx8, data4);
+  for (int i = 0; i < 16; ++i)
+{
+  unsigned int d = idx4[i];
+  if (d > 1)
+{
+  if (resudiusi[i] != data4[d])
+__builtin_abort ();
+  if (resudiudi[i] != data4[d])
+__builtin_abort ();
+  if (resusiudi[i] != data4[d])
+__builtin_abort ();
+  if (resusiusi[i] != data4[d])
+__builtin_abort ();
+}
+}
+  return 0;
+}
+
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 84c6d9777db..8c427174b37 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -2785,7 +2785,7 @@ vect_build_gather_load_calls (vec_info *vinfo, 
stmt_vec_info stmt_info,
 
   ncopies *= 2;
 
-  if (mask && masktype == real_masktype)
+  if (mask && VECTOR_TYPE_P (real_masktype))
{
  for (int i = 0; i < count; ++i)
sel[i] = i | (count / 2);
@@ -2882,7 +2882,7 @@ vect_build_gather_load_calls (vec_info *vinfo, 
stmt_vec_info stmt_info,
  mask_op = var;
}
}
- if (modifier == NARROW && masktype != real_masktype)
+ if (modifier == NARROW && !VECTOR_TYPE_P (real_masktype))
{
  var = vect_get_new_ssa_name (mask_halftype, vect_simple_var);
  gassign *new_stmt
-- 
2.31.1


Re: [PATCH] alias: Optimise call_may_clobber_ref_p

2021-12-06 Thread Richard Biener via Gcc-patches
On Mon, Dec 6, 2021 at 4:03 PM Richard Biener
 wrote:
>
> On Mon, Dec 6, 2021 at 11:10 AM Richard Sandiford
>  wrote:
> >
> > Richard Biener  writes:
> > > On Sun, Dec 5, 2021 at 10:59 PM Richard Sandiford via Gcc-patches
> > >  wrote:
> > >>
> > >> When compiling an optabs.ii at -O2 with a release-checking build,
> > >> we spent a lot of time in call_may_clobber_ref_p.  One of the
> > >> first things the function tries is the get_modref_function_summary
> > >> approach, but that also seems to be the most expensive check.
> > >> At least for the optabs.ii test, most of the things g_m_f_s could
> > >> handle could also be handled by points-to analysis, which is much
> > >> cheaper.
> > >>
> > >> This patch therefore rearranges the tests in approximate order
> > >> of cheapness.  One wart is that points-to analysis can't be
> > >> used on its own for this purpose; it has to be used in
> > >> conjunction with check_fnspec.  This is because the argument
> > >> information in the fnspec is not reflected in the points-to
> > >> information, which instead contains overly optimistic (rather
> > >> than conservatively pessimistic) information.
> > >>
> > >> Specifically, find_func_aliases_for_builtin_call short-circuits
> > >> the handle_rhs_call/handle_call_arg logic and doesn't itself add
> > >> any compensating information about the arguments.  This means that:
> > >>
> > >>   free (foo)
> > >>
> > >> does not have an points-to information for foo (it has an empty
> > >> set instead).
> > >
> > > Huh?  The gimple_call_use/clobber_sets should be conservative
> > > and stand on their own.  Can you share a testcase that breaks when
> > > returning early if independent_pt?  Does this  problem also exist
> > > on the branch?
> >
> > With the attached patch to disable get_modref_function_summary,
> > do the PT query ahead of the fnspec query, and assert if the fnspec
> > query would have found an alias, I get the following results for
> > execute.exp on x86_64-linux-gnu (full log attached):
> >
> > # of expected passes            44554
> > # of unexpected failures        1710
> > # of unresolved testcases   855
> > # of unsupported tests  62
> >
> > E.g. for execute/2703-1.c, the assertion trips for:
> >
> > 3786  if (stmt_may_clobber_ref_p_1 (def_stmt, ref, tbaa_p))
> > (gdb) call debug (def_stmt)
> > __builtin_memset (p_3(D), 0, 28);
> > (gdb) call debug (ref.base)
> > *p_3(D)
> > (gdb) call pt_solution_includes (&((gcall *) def_stmt)->call_clobbered, 
> > ref.base)
> > $3 = false
>
> huh, well - *p_3(D) isn't a DECL and pt_solution_includes just works for 
> decls.
> In fact it should eventually ICE even ;)  That said, nothing should
> really initialize
> the call_clobber set besides pt_solution_reset at GIMPLE stmt allocation time
> or IPA PTA when enabled.  pt_solution_reset sets ->anything to true so the
> function should return true ...
>
> It looks like Honza added computation of these sets at some point, also maybe
> forgetting to set some bits here.
>
> I prefer to investigate / fix this before refactoring the uses in this way.  
> Can
> you file a bugreport please?  I can reproduce the memcpy FAIL with
> just moving check_fnspec down.

So the issue seems that for
# .MEM_22 = VDEF <.MEM_19>
memcpy (, , 1024);

determine_global_memory_access determines the stmt doesn't write
to global memory (well, yes!), but it fails to add the actual arguments
to the set of clobbered vars.

Honza, can you please look into this?  Seems to be caused
by g:008e7397dad971c03c08fc1b0a4a98fddccaaed8

Richard.

> Thanks,
> Richard.
>
> > Setting ASSERTS to 0 gives:
> >
> > FAIL: gcc.c-torture/execute/memcpy-1.c   -O2  execution test
> > FAIL: gcc.c-torture/execute/memcpy-1.c   -O2 -flto -fno-use-linker-plugin 
> > -flto-partition=none  execution test
> > FAIL: gcc.c-torture/execute/memcpy-1.c   -O2 -flto -fuse-linker-plugin 
> > -fno-fat-lto-objects  execution test
> > FAIL: gcc.c-torture/execute/memcpy-1.c   -O3 -fomit-frame-pointer 
> > -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> > FAIL: gcc.c-torture/execute/memcpy-1.c   -O3 -g  execution test
> > FAIL: gcc.c-torture/execute/memcpy-1.c   -Os  execution test
> > FAIL: gcc.c-torture/execute/memset-3.c   -O3 -fomit-frame-pointer 
> > -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> > FAIL: gcc.c-torture/execute/memset-3.c   -O3 -g  execution test
> >
> > # of expected passes            23138
> > # of unexpected failures        8
> > # of unsupported tests  24
> >
> > (same for -m32).
> >
> > Originally I saw it as a bootstrap failure building stage 2
> > build-side libcpp, due to a bogus uninitialised warning.
> >
> > But alongside the memset and free examples, there's:
> >
> >   /* All the following functions do not return pointers, do not
> >  modify the points-to sets of memory reachable from their
> >  arguments and do not add to the ESCAPED solution.  */
> >   

Re: [PATCH] alias: Optimise call_may_clobber_ref_p

2021-12-06 Thread Richard Biener via Gcc-patches
On Mon, Dec 6, 2021 at 11:10 AM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Sun, Dec 5, 2021 at 10:59 PM Richard Sandiford via Gcc-patches
> >  wrote:
> >>
> >> When compiling an optabs.ii at -O2 with a release-checking build,
> >> we spent a lot of time in call_may_clobber_ref_p.  One of the
> >> first things the function tries is the get_modref_function_summary
> >> approach, but that also seems to be the most expensive check.
> >> At least for the optabs.ii test, most of the things g_m_f_s could
> >> handle could also be handled by points-to analysis, which is much
> >> cheaper.
> >>
> >> This patch therefore rearranges the tests in approximate order
> >> of cheapness.  One wart is that points-to analysis can't be
> >> used on its own for this purpose; it has to be used in
> >> conjunction with check_fnspec.  This is because the argument
> >> information in the fnspec is not reflected in the points-to
> >> information, which instead contains overly optimistic (rather
> >> than conservatively pessimistic) information.
> >>
> >> Specifically, find_func_aliases_for_builtin_call short-circuits
> >> the handle_rhs_call/handle_call_arg logic and doesn't itself add
> >> any compensating information about the arguments.  This means that:
> >>
> >>   free (foo)
> >>
> >> does not have an points-to information for foo (it has an empty
> >> set instead).
> >
> > Huh?  The gimple_call_use/clobber_sets should be conservative
> > and stand on their own.  Can you share a testcase that breaks when
> > returning early if independent_pt?  Does this  problem also exist
> > on the branch?
>
> With the attached patch to disable get_modref_function_summary,
> do the PT query ahead of the fnspec query, and assert if the fnspec
> query would have found an alias, I get the following results for
> execute.exp on x86_64-linux-gnu (full log attached):
>
> # of expected passes            44554
> # of unexpected failures        1710
> # of unresolved testcases   855
> # of unsupported tests  62
>
> E.g. for execute/2703-1.c, the assertion trips for:
>
> 3786  if (stmt_may_clobber_ref_p_1 (def_stmt, ref, tbaa_p))
> (gdb) call debug (def_stmt)
> __builtin_memset (p_3(D), 0, 28);
> (gdb) call debug (ref.base)
> *p_3(D)
> (gdb) call pt_solution_includes (&((gcall *) def_stmt)->call_clobbered, 
> ref.base)
> $3 = false

huh, well - *p_3(D) isn't a DECL and pt_solution_includes just works for decls.
In fact it should eventually ICE even ;)  That said, nothing should
really initialize
the call_clobber set besides pt_solution_reset at GIMPLE stmt allocation time
or IPA PTA when enabled.  pt_solution_reset sets ->anything to true so the
function should return true ...

It looks like Honza added computation of these sets at some point, also maybe
forgetting to set some bits here.

I prefer to investigate / fix this before refactoring the uses in this way.  Can
you file a bugreport please?  I can reproduce the memcpy FAIL with
just moving check_fnspec down.

Thanks,
Richard.

> Setting ASSERTS to 0 gives:
>
> FAIL: gcc.c-torture/execute/memcpy-1.c   -O2  execution test
> FAIL: gcc.c-torture/execute/memcpy-1.c   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  execution test
> FAIL: gcc.c-torture/execute/memcpy-1.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  execution test
> FAIL: gcc.c-torture/execute/memcpy-1.c   -O3 -fomit-frame-pointer 
> -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> FAIL: gcc.c-torture/execute/memcpy-1.c   -O3 -g  execution test
> FAIL: gcc.c-torture/execute/memcpy-1.c   -Os  execution test
> FAIL: gcc.c-torture/execute/memset-3.c   -O3 -fomit-frame-pointer 
> -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> FAIL: gcc.c-torture/execute/memset-3.c   -O3 -g  execution test
>
> # of expected passes            23138
> # of unexpected failures        8
> # of unsupported tests  24
>
> (same for -m32).
>
> Originally I saw it as a bootstrap failure building stage 2
> build-side libcpp, due to a bogus uninitialised warning.
>
> But alongside the memset and free examples, there's:
>
>   /* All the following functions do not return pointers, do not
>  modify the points-to sets of memory reachable from their
>  arguments and do not add to the ESCAPED solution.  */
>   case BUILT_IN_SINCOS:
>   case BUILT_IN_SINCOSF:
>   ...
> return true;
>
> (always an empty set), whereas check_fnspec takes errno into account.
>
> > I wonder if, at this point, we should split the patch up to do the
> > trivial move of the ao_ref_base dependent and volatile checks
> > and leave this more complicated detail for a second patch?
>
> Yeah, that'd work, but in practice it was the PT part that was the
> high-value part.  IIRC the early base checks caught less than 1% of
> cases (I can rerun tonight to get exact numbers).
>
> Thanks,
> Richard
>


Re: Patch ping related to OpenMP

2021-12-06 Thread Tobias Burnus

First, thanks for the four reviews. Secondly, I missed one patch –
hence, reposted with all three pending patches:

* Re: [PATCH] [gfortran] Add support for allocate clause (OpenMP 5.0).
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584894.html

and:

On 01.12.21 17:34, Tobias Burnus wrote:

* [PATCH] C, C++, Fortran, OpenMP: Add 'has_device_addr' clause to
'target' construct
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585361.html

* [PATCH 00/16] OpenMP: lvalues in "map" clauses and struct handling
rework
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585439.html


Tobias

PS: Thanks for the reviews of the following patches:

* [PATCH, v5, OpenMP 5.0] Improve OpenMP target support for C++ [PR92120
v5]
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584602.html

* [PATCH, v2, OpenMP 5.0] Remove array section base-pointer mapping
semantics, and other front-end adjustments (mainline trunk)
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584994.html

* libgomp.texi: Update OMP_PLACES
https://gcc.gnu.org/pipermail/gcc-patches/2021-October/582121.html

* [patch] Fortran/OpenMP: Support most of 5.1 atomic extensions
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584450.html

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PATCH v2] rs6000: Fix some issues in rs6000_can_inline_p [PR102059]

2021-12-06 Thread Peter Bergner via Gcc-patches
On 12/6/21 7:06 AM, Segher Boessenkool wrote:
> On Fri, Dec 03, 2021 at 11:46:53AM +0800, Kewen.Lin wrote:
>> Without this fix, bar inlines foo but gets the error as below:
>>
>> In function ‘foo’,
>> inlined from ‘bar’ at test.c:8:8:
>> test.c:3:9: error: ‘__builtin_ttest’ requires the ‘-mhtm’ option
>> 3 |   *b += __builtin_ttest();
>>   | ^
>>
>> Since bar doesn't support foo's required HTM feature.
>>
>> With this fix, bar doesn't inline foo and the case gets compiled well.
> 
> And if you inline something no-htm into something that allows htm?  That
> should work fine, be allowed just fine.

It is only ok to inline a function to be compiled with no-htm into a
function that will be compiled with htm if the no-htm function is
being compiled with no-htm by default.  If the user explicitly used
-fno-htm on the function, then we should not inline the htm function
into it.  Ditto for the other way around.  Meaning, the caller's
option flags should be a superset of the callee's flags, but the
explicit flags used on both functions need to match exactly.
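
Roughly this check, as a sketch only (plain integer masks stand in for
the ISA flag sets and the explicitly-specified option sets; the real
rs6000_can_inline_p works on the saved target option nodes):

#include <stdbool.h>
#include <stdint.h>

static bool
flags_allow_inlining_p (uint64_t caller_isa, uint64_t callee_isa,
			uint64_t caller_explicit, uint64_t callee_explicit)
{
  /* The caller must provide every feature the callee was compiled with...  */
  if ((caller_isa & callee_isa) != callee_isa)
    return false;
  /* ...and options set explicitly on either function must agree exactly.  */
  return caller_explicit == callee_explicit;
}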

Peter



Re: [Patch 3/8, Arm, GCC] Add option -mbranch-protection. [Was RE: [Patch 2/7, Arm, GCC] Add option -mbranch-protection.]

2021-12-06 Thread Andrea Corallo via Gcc-patches
Richard Earnshaw via Gcc-patches  writes:

[...]

>> Thanks for the reviews.
>> Add -mbranch-protection option.  This option enables the
>> code-generation of
>> pointer signing and authentication instructions in function prologues and
>> epilogues.
>> 2021-10-25  Tejas Belagod  
>> gcc/ChangeLog:
>>  * config/arm/arm.c (arm_configure_build_target): Parse and
>> validate
>>  -mbranch-protection option and initialize appropriate data structures.
>>  * config/arm/arm.opt: New option -mbranch-protection.
>
> .../arm.opt (-mbranch-protection) : New option.
>
>>  * doc/invoke.texi: Document -mbranch-protection.
>
> .../invoke.texi (Arm Options): Document it.
>
>> Tested the following configurations, OK for trunk?
>> -mthumb/-march=armv8.1-m.main+pacbti/-mfloat-abi=soft
>> -marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp
>> mcmodel=small and tiny
>> aarch64-none-linux-gnu native test and bootstrap
>> 
>
> +@item
> -mbranch-protection=@var{none}|@var{standard}|@var{pac-ret}[+@var{leaf}][+@var{bti}]|@var{bti}[+@var{pac-ret}[+@var{leaf}]]
> +@opindex mbranch-protection
> +Select the branch protection features to use.
> +@samp{none} is the default and turns off all types of branch protection.
> +@samp{standard} turns on all types of branch protection features.  If
> a feature
> +has additional tuning options, then @samp{standard} sets it to its standard
> +level.
> +@samp{pac-ret[+@var{leaf}]} turns on return address signing to its standard
> +level: signing functions that save the return address to memory (non-leaf
> +functions will practically always do this).  The optional argument
> @samp{leaf}
> +can be used to extend the signing to include leaf functions.
> +@samp{bti} turns on branch target identification mechanism.
>
> This doesn't really use the right documentation style.  Closer would be:

[...]

Hi Richard,

thanks for reviewing, here is the updated patch.

Best Regards

  Andrea

>From 3383a1e265b7b695c5d6ce6115ad66eed2a4cb48 Mon Sep 17 00:00:00 2001
From: Andrea Corallo 
Date: Mon, 6 Dec 2021 11:39:03 +0100
Subject: [PATCH] Add option -mbranch-protection.

Add -mbranch-protection option.  This option enables the
code-generation of pointer signing and authentication instructions in
function prologues and epilogues.

gcc/ChangeLog:

* config/arm/arm.c (arm_configure_build_target): Parse and validate
-mbranch-protection option and initialize appropriate data structures.
* config/arm/arm.opt (-mbranch-protection): New option.
* doc/invoke.texi (Arm Options): Document it.

Co-Authored-By: Tejas Belagod  
Co-Authored-By: Richard Earnshaw 
---
 gcc/config/arm/arm.c   | 11 +++
 gcc/config/arm/arm.opt |  4 
 gcc/doc/invoke.texi| 35 +--
 3 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 96186b8ad02..ee22acddee5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3237,6 +3237,17 @@ arm_configure_build_target (struct arm_build_target 
*target,
   tune_opts = strchr (opts->x_arm_tune_string, '+');
 }
 
+  if (opts->x_arm_branch_protection_string)
+{
+  aarch_validate_mbranch_protection (opts->x_arm_branch_protection_string);
+
+  if (aarch_ra_sign_key != AARCH_KEY_A)
+   {
+ warning (0, "invalid key type for %<-mbranch-protection=%>");
+ aarch_ra_sign_key = AARCH_KEY_A;
+   }
+}
+
   if (arm_selected_arch)
 {
   arm_initialize_isa (target->isa, arm_selected_arch->common.isa_bits);
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index 5c5b4f3ae06..4f2754c3e84 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -313,6 +313,10 @@ mbranch-cost=
 Target RejectNegative Joined UInteger Var(arm_branch_cost) Init(-1)
 Cost to assume for a branch insn.
 
+mbranch-protection=
+Target RejectNegative Joined Var(arm_branch_protection_string) Save
+Use branch-protection features.
+
 mgeneral-regs-only
 Target RejectNegative Mask(GENERAL_REGS_ONLY) Save
 Generate code which uses the core registers only (r0-r14).
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 3257df5596d..a808c6bb599 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -820,7 +820,9 @@ Objective-C and Objective-C++ Dialects}.
 -mpure-code @gol
 -mcmse @gol
 -mfix-cmse-cve-2021-35465 @gol
--mfdpic}
+-mfdpic @gol
+-mbranch-protection=@var{none}|@var{standard}|@var{pac-ret}[+@var{leaf}]
+[+@var{bti}]|@var{bti}[+@var{pac-ret}[+@var{leaf}]]}
 
 @emph{AVR Options}
 @gccoptlist{-mmcu=@var{mcu}  -mabsdata  -maccumulate-args @gol
@@ -21187,7 +21189,36 @@ The opposite @option{-mno-fdpic} option is useful (and 
required) to
 build the Linux kernel using the same (@code{arm-*-uclinuxfdpiceabi})
 toolchain as the one used to build the userland programs.
 
-@end table
+@item

Re: [PATCH v2] rs6000: Fix some issues in rs6000_can_inline_p [PR102059]

2021-12-06 Thread Peter Bergner via Gcc-patches
On 12/6/21 3:35 AM, Martin Liška wrote:
> On 12/4/21 00:23, Peter Bergner wrote:
>> I thought Martin and richi mentioned that target attribute options
>> are treated as if they are appended to the end of the command line
>> options, so they can potentially override earlier options, but they
>> don't actually ignore them?
> 
> No, the described behavior is true for optimize attribute:
> 
> optimize (string, …)
> ...
> The optimize attribute arguments of a function behave as if appended
> to the command-line.
> 
> but:
> 
> target (string, …)
> ...
> The original target command-line options are ignored.

Ok, you learn something new every day.  Thanks for setting me straight!

Peter




[RFC][WIP Patch] OpenMP map with iterator + Fortran OpenMP deep mapping / custom allocator (+ Fortran co_reduce)

2021-12-06 Thread Tobias Burnus

This is a RFC/WIP patch about:

(A) OpenMP (C/C++/Fortran)
   omp target map(iterator(i=n:m),to : x(i))

(B) Fortran:
(1)   omp target map(to : dt_var, class_var)
(2)   omp parallel allocator(my_alloc) firstprivate(class_var)
(3)  call co_reduce(dt_coarray, my_func)

The problem with (A) is that there is not a compile-time countable
number of iterations such that it cannot be easily added to the array
used to call GOMP_target_ext.

The problem with (B) is that dt_var can have allocatable components
which complicates stuff and with recursive types, the number of
elements is not known at compile time - not with polymorphic types
as it depends on the recursion depth and dynamic type, respectively.


Comments/questions/remarks ... to the proposal below?

Regarding mapping, I currently have no idea how to handle
the virtual table. Thoughts?

 * * *

The idea for OpenMP mapping is a callback function - such that

integer function f() result(ires)
  implicit none
  integer :: a
  !$omp target  map(iterator(i=1:5), to: a)
  !$omp end target
  ires = 7
end

becomes

  #pragma omp target map(iterator(integer(kind=4) i=1:5:1):to:a)

and then during gimplify:

  #pragma omp target num_teams(1) thread_limit(0) 
map(map_function:f_._omp_mapfn.0 [len: 0])

with

unsigned long f_._omp_mapfn.0 (unsigned long (*) (void *) cb_fn,
   void * token, void * base, unsigned short flags)
{
...

with the loop around the cb_fn call and flag = GOMP_MAP_TO.

(Not fully working yet. ME part needs still to generate the
loop similar to depend or affinity. For C/C++, the basic
parsing is done but some more code changes are needed
in the FE.)
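
For illustration, the generated body could end up looking roughly like
the sketch below.  The callback prototype is not settled in this RFC, so
the (token, address, size, flags) signature, the element size and the
identifier spelling are all assumptions; only the shape - a loop over the
iterator range invoking cb_fn with the GOMP_MAP_TO flag - reflects the
proposal:

unsigned long
f_omp_mapfn_0 (unsigned long (*cb_fn) (void *token, void *addr,
				       unsigned long size,
				       unsigned short flags),
	       void *token, void *base, unsigned short flags)
{
  unsigned long n = 0;
  for (int i = 1; i <= 5; i++)		/* iterator(i=1:5) from the example */
    n += cb_fn (token, base, sizeof (int), flags);	/* flags = GOMP_MAP_TO */
  return n;
}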


 * * *

Fortran - with an OpenMP example:

module m
  implicit none (type, external)
  type t3
  end type t3
  type t
class(t3), allocatable :: cx
type(t3), pointer :: ptx
  end type t
end module m

use m
implicit none (type, external)
class(t), allocatable :: var

!$omp target map(to:var)
  if (allocated(var)) stop 1
!$omp end target
end


The idea is that this becomes:

  #pragma omp target map(to:var) map(map_function:var._vptr->_callback [len: 
1]) map(to:var [len: 0])

That's:
* 'var' is first normally mapped
* Then the map function is added which gets 'var' as argument


(For an array, I plan to add an internal function which calls the
callback function in a scalarization loop.)


On the Fortran side - this requires in the vtable a new entry,
(*ABI breakage*) which points to:

integer(kind=8) __callback_m_T (
   integer(kind=8) (*) (void *, void *, integer(kind=8),
  void (*) (void), integer(kind=2)) cb,
   void * token, struct t & restrict scalar, integer(kind=4) f_flags)
{
  __result___callback_m_T = 0;
  if (scalar->cx._data != 0B)
{
void * D.4384;
D.4384 = (void *) scalar->cx._data;
__result___callback_m_T = cb (token, D.4384, scalar->cx._vptr->_size, 
0B, 0)
  + __result___callback_m_T;
  __result___callback_m_T = cb (token, *scalar->cx._data, 0, 
*scalar->cx._vptr->_callback, 0)
+ __result___callback_m_T;
}
  if (scalar->ptx != 0B)
{
void * D.4386;
D.4386 = (void *) scalar->ptx;
__result___callback_m_T = cb (token, D.4386, 0, 0B, 0) + 
__result___callback_m_T;
}
  return __result___callback_m_T;
}


That is:

* For pointer, the CB is called with SIZE = 0, permitting the caller to
  remap pointer - or ignore the callback call.
* For allocatables, it passes the SIZE, permitting to map the allocatable
* If the allocatable is a CLASS or has allocatable components, cb is
  called with a callback function - so that those can be mapped as well.
  (and SIZE = 0)

(The GOMP_MAP_TO needs to be handled by libgomp, e.g. by putting it into
the void *token.)


The vtable's callback function can then also be used with
* OpenMP ALLOCATOR or for
* deep copying with CO_REDUCE.


Question: Does this way of passing make sense or not?
Comments?


Tobias


PS: The patch has a lot of pieces in place, but still lacks both
some glue code and some other bits. :-/
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
 gcc/c/c-parser.c  |  69 -
 gcc/cp/parser.c   |  70 +++--
 gcc/fortran/class.c   | 351 ++
 gcc/fortran/dump-parse-tree.c |  14 +-
 gcc/fortran/gfortran.h|   1 +
 gcc/fortran/intrinsic.c   |   2 +-
 gcc/fortran/module.c  |   9 +-
 gcc/fortran/openmp.c  |  41 -
 gcc/fortran/resolve.c |   2 +-
 gcc/fortran/trans-expr.c  |   5 +
 gcc/fortran/trans-intrinsic.c |   3 +-
 gcc/fortran/trans-openmp.c|  59 ++-
 gcc/fortran/trans.h   |   1 +
 gcc/gimplify.c

Re: [PATCH] ranger: Optimise irange_union

2021-12-06 Thread Andrew MacLeod via Gcc-patches

On 12/5/21 16:55, Richard Sandiford via Gcc-patches wrote:


This gives a ~2% compile-time speed up with the test above.

I also tried adding a path for two single-pair ranges, but it
wasn't a win.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


Thanks for these performance tweaks!



Re: [PATCH v2] rs6000: Fix some issues in rs6000_can_inline_p [PR102059]

2021-12-06 Thread Segher Boessenkool
On Fri, Dec 03, 2021 at 11:46:53AM +0800, Kewen.Lin wrote:
> >> This patch is to fix the inconsistent behaviors for non-LTO mode
> >> and LTO mode.  As Martin pointed out, currently the function
> >> rs6000_can_inline_p simply makes it inlinable if callee_tree is
> >> NULL, but it's wrong, we should use the command line options
> >> from target_option_default_node as default.
> > 
> > This is not documented.
> 
> Yeah, but according to the document for the target attribute [1],
> "Multiple target back ends implement the target attribute to specify
> that a function is to be compiled with different target options than
> specified on the command line. The original target command-line options
> are ignored. ", it seems to say the function without any target
> attribute/pragma will be compiled with target options specified on the
> command line.  I think it's a normal expectation for users.

The whole target_option_default_node is not documented, I meant.

> Except for the inconsistent behaviors between LTO and non-LTO,
> it can also make the below case different.
> 
> // Option: -O2 -mcpu=power8 -mhtm
> 
> int foo(int *b) {
>   *b += __builtin_ttest();
>   return *b;
> }
> 
> __attribute__((target("no-htm"))) int bar(int *a) {
>   *a = foo(a);
>   return 0;
> }
> 
> Without this fix, bar inlines foo but gets the error as below:
> 
> In function ‘foo’,
> inlined from ‘bar’ at test.c:8:8:
> test.c:3:9: error: ‘__builtin_ttest’ requires the ‘-mhtm’ option
> 3 |   *b += __builtin_ttest();
>   | ^
> 
> Since bar doesn't support foo's required HTM feature.
> 
> With this fix, bar doesn't inline foo and the case gets compiled well.

And if you inline something no-htm into something that allows htm?  That
should work fine, be allowed just fine.

> >> It also replaces
> >> rs6000_isa_flags with the one from target_option_default_node
> >> when caller_tree is NULL as rs6000_isa_flags could probably
> >> change since initialization.
> > 
> > Is this enough then?  Are there no other places that use
> > rs6000_isa_flags?  Is it correct for us to have that variable at all
> > anymore?  Etc.
> 
> Good question, I think the answer is yes.  I dug into it and found
> the current target option save/restore scheme would keep rs6000_isa_flags
> to be the same as the one saved in target_option_default_node for the context
> of inlining.  So I put one assertion as below:
> 
> if (!caller_tree) {
>   caller_tree = target_option_default_node;
>   gcc_assert (rs6000_isa_flags
>   == TREE_TARGET_OPTION (caller_tree)->x_rs6000_isa_flags);
> }
> 
> And kicked off regression testings on both BE and LE, the result showed
> only one same failure, which seems to be a latent bug.  I've filed PR103515
> for it.

Ah cool :-)  Please send a patch to add that assert then (well, once we
can bootstrap with it ;-) )

> This bug also indicates it's better to use target_option_default_node
> rather than rs6000_isa_flags when the caller_tree is NULL.  Since
> the target_option_default_node is straightforward and doesn't rely
> on the assumption that rs6000_isa_flags would be kept as the default
> one correctly, it doesn't suffer from bugs like PR103515.

So we should not ever use rs6000_isa_flags anymore?

> > The fusion ones are obvious.  The other two are not.  Please put those
> > two more obviously separate (not somewhere in the sea of fusion
> > options), and some comment wouldn't hurt either.  You can make it
> > separate statements even, make it really stand out.
> > 
> 
> Fixed by splitting it into: fusion_options_mask and optimization_options_mask.
> I'm not happy with the latter name, but couldn't come up with a better one.  Do you
> have any suggestions on the name and words?

You don't have to split it into variables, as you found out then you get
the usual naming problem.  But you can just split it in the code:

  if (important_condition
  || another_important_one
  /* comment explaining things */
  || bla1 || bla2 || bla3 || bla4 || bla5)

> > Why are there OPTION_MASKs for separate P10 fusion types here, as well as
> > MASK_P10_FUSION?
> 
> Mike helped to explain the history, I've updated all of them using 
> OPTION_MASK_
> to avoid potential confusion.

That is one thing, sure, but why are both needed?  Both the "main" flag,
and the "details" flags.

(The latter should soon go away btw).


Segher


Re: [PATCH] D: fix UBSAN

2021-12-06 Thread Martin Liška

On 12/6/21 13:03, Martin Liška wrote:

gcc/d/expr.cc:2596:9: runtime error: null pointer passed as argument 2, which 
is declared to never be null


I forgot to mention that it happens for gcc/testsuite/gdc.dg/attr_optimize1.d
test-case.

Cheers,
Martin


[PATCH] D: fix UBSAN

2021-12-06 Thread Martin Liška

Fixes:
gcc/d/expr.cc:2596:9: runtime error: null pointer passed as argument 2, which 
is declared to never be null
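
For context: even with a length of zero, passing a null pointer to memcpy
is undefined behaviour (the arguments are declared nonnull), which is what
UBSAN flags here.  A stand-alone C illustration of the guarded pattern the
patch uses, with made-up names:

#include <string.h>

void
copy_maybe_empty (char *dst, const char *src, size_t len)
{
  if (len > 0)		/* skip the call when there is nothing to copy */
    memcpy (dst, src, len);
  dst[len] = '\0';	/* always null-terminate, as the D front end does */
}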

Ready for master?
Thanks,
Martin

gcc/d/ChangeLog:

* expr.cc: Call memcpy only when length != 0.
---
 gcc/d/expr.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/d/expr.cc b/gcc/d/expr.cc
index 31680564bdd..22acbc07cde 100644
--- a/gcc/d/expr.cc
+++ b/gcc/d/expr.cc
@@ -2593,7 +2593,8 @@ public:
/* Copy the string contents to a null terminated string.  */
dinteger_t length = (e->len * e->sz);
char *string = XALLOCAVEC (char, length + 1);
-   memcpy (string, e->string, length);
+   if (length > 0)
+ memcpy (string, e->string, length);
string[length] = '\0';
 
 	/* String value and type includes the null terminator.  */

--
2.34.1



[PATCH] tree-optimization/103544 - SLP reduction chain as SLP reduction issue

2021-12-06 Thread Richard Biener via Gcc-patches
When SLP reduction chain vectorization support added handling of
an outer conversion in the chain, picking a failed reduction chain up
as an SLP reduction broke the invariant that the whole reduction
was forward reachable.  The following plugs that hole, noting
a future enhancement possibility.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed to trunk so far.

2021-12-06  Richard Biener  

PR tree-optimization/103544
* tree-vect-slp.c (vect_analyze_slp): Only add a SLP reduction
opportunity if the stmt in question is the reduction root.
(dot_slp_tree): Add missing check for NULL child.

* gcc.dg/vect/pr103544.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr103544.c | 24 
 gcc/tree-vect-slp.c  | 12 +---
 2 files changed, 33 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr103544.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr103544.c 
b/gcc/testsuite/gcc.dg/vect/pr103544.c
new file mode 100644
index 000..c8bdee86e77
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr103544.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+/* { dg-additional-options "-march=haswell" { target x86_64-*-* i?86-*-* } } */
+
+int crash_me(char* ptr, unsigned long size)
+{
+  short result[16] = {0};
+
+  unsigned long no_iters = 0;
+  for(unsigned long i = 0; i < size - 12; i+= 13){
+  for(unsigned long j = 0; j < 12; j++){
+ result[j] += ptr[i + j] - '0';
+  }
+  no_iters++;
+  }
+
+  int result_int = 0;
+  for(int j = 0; j < 12; j++){
+  int bit_value = result[j] > no_iters/2 ? 1 : 0;
+  result_int |= bit_value;
+  }
+
+  return result_int;
+}
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index bc22ffeed82..b912c3577df 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -2537,7 +2537,8 @@ dot_slp_tree (FILE *f, slp_tree node, hash_set 
)
 fprintf (f, "\"%p\" -> \"%p\";", (void *)node, (void *)child);
 
   for (slp_tree child : SLP_TREE_CHILDREN (node))
-dot_slp_tree (f, child, visited);
+if (child)
+  dot_slp_tree (f, child, visited);
 }
 
 DEBUG_FUNCTION void
@@ -3418,8 +3419,13 @@ vect_analyze_slp (vec_info *vinfo, unsigned 
max_tree_size)
vinfo = next;
  }
STMT_VINFO_DEF_TYPE (first_element) = vect_internal_def;
-   /* It can be still vectorized as part of an SLP reduction.  */
-   loop_vinfo->reductions.safe_push (last);
+   /* It can be still vectorized as part of an SLP reduction.
+  ???  But only if we didn't skip a conversion around the group.
+  In that case we'd have to reverse engineer that conversion
+  stmt following the chain using reduc_idx and from the PHI
+  using reduc_def.  */
+   if (STMT_VINFO_DEF_TYPE (last) == vect_reduction_def)
+ loop_vinfo->reductions.safe_push (last);
  }
 
   /* Find SLP sequences starting from groups of reductions.  */
-- 
2.31.1


Re: [committed] avr: Fix AVR build [PR71934]

2021-12-06 Thread Martin Liška

On 12/6/21 11:23, Jakub Jelinek wrote:

> This patch fixes that.


Thanks for the quick fix!

Martin


Re: [Patch 1/8, Arm, AArch64, GCC] Refactor mbranch-protection option parsing and make it common to AArch32 and AArch64 backends. [Was RE: [Patch 2/7, Arm, GCC] Add option -mbranch-protection.]

2021-12-06 Thread Andrea Corallo via Gcc-patches
Richard Earnshaw  writes:

> On 30/11/2021 11:11, Andrea Corallo via Gcc-patches wrote:
>> Tejas Belagod via Gcc-patches  writes:
>> 
>>> Ping for this series.
>>>
>>> Thanks,
>>> Tejas.
>> Hi all,
>> pinging this series.
>> BR
>>Andrea
>> 

Hi Richard,

thanks for reviewing.

> It would be easier to find the 'series' if the messages were properly
> threaded together...


Feel free to let me know if you want me to ping the other patches as
well so they all pops up.

Thanks

  Andrea


[PATCH] Fix hash_map::traverse overload

2021-12-06 Thread Matthias Kretz
While reading the hash_map code I noticed this inconsistency. Bootstrapped and 
regtested on x86_64. OK for trunk?


The hash_map::traverse overload taking a non-const Value pointer breaks
if the callback returns false. The other overload should behave the
same.

Signed-off-by: Matthias Kretz 

gcc/ChangeLog:

* hash-map.h (hash_map::traverse): Let both overloads behave the
same.
---
 gcc/hash-map.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


--
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
 stdₓ::simd
──
diff --git a/gcc/hash-map.h b/gcc/hash-map.h
index dd039f10343..6e1c7b6e071 100644
--- a/gcc/hash-map.h
+++ b/gcc/hash-map.h
@@ -225,7 +225,8 @@ public:
 {
   for (typename hash_table::iterator iter = m_table.begin ();
 	   iter != m_table.end (); ++iter)
-	f ((*iter).m_key, (*iter).m_value, a);
+	if (!f ((*iter).m_key, (*iter).m_value, a))
+	  break;
 }
 
   template

[committed] avr: Fix AVR build [PR71934]

2021-12-06 Thread Jakub Jelinek via Gcc-patches
Hi!

On Mon, Dec 06, 2021 at 11:00:30AM +0100, Martin Liška wrote:
> Jakub, I think the patch broke avr-linux target:
>
> g++  -fno-PIE -c   -g   -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE   
> -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall 
> -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-erro
> /home/marxin/Programming/gcc/gcc/config/avr/avr.c: In function ‘void 
> avr_output_data_section_asm_op(const void*)’:
> /home/marxin/Programming/gcc/gcc/config/avr/avr.c:10097:26: error: invalid 
> conversion from ‘const void*’ to ‘const char*’ [-fpermissive]

This patch fixes that.
Tested with a cross-compiler from x86_64-linux that it builds, committed
as obvious.
I didn't notice that because avr isn't creating those sections
with that hook as an argument, but is instead overriding the hooks in the
already created section structure.
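
For context, a tiny self-contained sketch of the pattern being described: the
callback is replaced on an already created section object rather than passed
to the function that created it (all names here are illustrative, not the real
avr.c/output.h ones):

/* Illustrative only; the stored callback now takes const char *, matching
   the signatures changed in the patch below.  */
typedef void (*section_callback) (const char *);

struct unnamed_section_like
{
  section_callback callback;
  const char *data;
};

static void
example_output_data_op (const char *data)
{
  (void) data;  /* A real hook would emit the section-switching directive.  */
}

/* The section object is created elsewhere ...  */
static struct unnamed_section_like example_data_section;

/* ... and only its hook is overridden here.  */
static void
override_section_callback (void)
{
  example_data_section.callback = example_output_data_op;
}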

2021-12-06  Jakub Jelinek  

PR pch/71934
* config/avr/avr.c (avr_output_data_section_asm_op,
avr_output_bss_section_asm_op): Change argument type from const void *
to const char *.

--- gcc/config/avr/avr.c.jj 2021-12-03 11:03:10.479503638 +0100
+++ gcc/config/avr/avr.c	2021-12-06 11:16:31.102208406 +0100
@@ -10089,7 +10089,7 @@ avr_asm_asm_output_aligned_bss (FILE *fi
to track need of __do_copy_data.  */
 
 static void
-avr_output_data_section_asm_op (const void *data)
+avr_output_data_section_asm_op (const char *data)
 {
   avr_need_copy_data_p = true;
 
@@ -10102,7 +10102,7 @@ avr_output_data_section_asm_op (const vo
to track need of __do_clear_bss.  */
 
 static void
-avr_output_bss_section_asm_op (const void *data)
+avr_output_bss_section_asm_op (const char *data)
 {
   avr_need_clear_bss_p = true;
 


Jakub



Re: [PATCH]middle-end cse: Make sure duplicate elements are not entered into the equivalence set [PR103404]

2021-12-06 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi,
>
> As discussed off-line this can only happen with a V1 mode, so here's a much 
> simpler patch.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> x86_64-pc-linux-gnu and no regressions.
>
>
> Ok for master?

OK, thanks.

Richard

> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> PR rtl-optimization/103404
> * cse.c (find_sets_in_insn): Don't select elements out of a V1 mode
> subreg.
>
> gcc/testsuite/ChangeLog:
>
> PR rtl-optimization/103404
> * gcc.target/i386/pr103404.c: New test.
>
> --- inline copy of patch ---
>
> diff --git a/gcc/cse.c b/gcc/cse.c
> index 
> c1c7d0ca27b73c4b944b4719f95fece74e0358d5..dc5d5aed047c7776f44b159a4286390d6499c18d
>  100644
> --- a/gcc/cse.c
> +++ b/gcc/cse.c
> @@ -4275,7 +4275,12 @@ find_sets_in_insn (rtx_insn *insn, vec 
> *psets)
>else if (GET_CODE (SET_SRC (x)) == CALL)
> ;
>else if (GET_CODE (SET_SRC (x)) == CONST_VECTOR
> -  && GET_MODE_CLASS (GET_MODE (SET_SRC (x))) != MODE_VECTOR_BOOL)
> +  && GET_MODE_CLASS (GET_MODE (SET_SRC (x))) != MODE_VECTOR_BOOL
> +  /* Prevent duplicates from being generated if the type is a V1
> + type and a subreg.  Folding this will result in the same
> + element as folding x itself.  */
> +  && !(SUBREG_P (SET_DEST (x))
> +   && known_eq (GET_MODE_NUNITS (GET_MODE (SET_SRC (x))), 
> 1)))
> {
>   /* First register the vector itself.  */
>   add_to_set (psets, x);
> diff --git a/gcc/testsuite/gcc.target/i386/pr103404.c 
> b/gcc/testsuite/gcc.target/i386/pr103404.c
> new file mode 100644
> index 
> ..66f33645301db09503fc0977fd0f061a19e56ea5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr103404.c
> @@ -0,0 +1,32 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-Og -fcse-follow-jumps -fno-dce 
> -fno-early-inlining -fgcse -fharden-conditional-branches 
> -frerun-cse-after-loop -fno-tree-ccp -mavx5124fmaps -std=c99 -w" } */
> +
> +typedef unsigned __attribute__((__vector_size__ (4))) U;
> +typedef unsigned __attribute__((__vector_size__ (16))) V;
> +typedef unsigned __attribute__((__vector_size__ (64))) W;
> +
> +int x, y;
> +
> +V v;
> +W w;
> +
> +inline
> +int bar (U a)
> +{
> +  a |= x;
> +  W k =
> +__builtin_shufflevector (v, 5 / a,
> +2, 4, 0, 2, 4, 1, 0, 1,
> +1, 2, 1, 3, 0, 4, 4, 0);
> +  w = k;
> +  y = 0;
> +}
> +
> +int
> +foo ()
> +{
> +  bar ((U){0x});
> +  for (unsigned i; i < sizeof (foo);)
> +;
> +}
> +


Re: [PATCH] pch, v2: Add support for PCH for relocatable executables

2021-12-06 Thread Martin Liška

On 11/18/21 09:04, Jakub Jelinek via Gcc-patches wrote:

On Mon, Nov 08, 2021 at 08:48:07PM +0100, Jakub Jelinek via Gcc-patches wrote:

On Mon, Nov 08, 2021 at 12:46:04PM +0100, Jakub Jelinek via Gcc-patches wrote:

So, if we want to make PCH work for PIEs, I'd say we can:
1) add a new GTY option, say callback, which would act like
skip for non-PCH and for PCH would make us skip it but
remember for address bias translation
2) drop the skip for tree_translation_unit_decl::language
3) change get_unnamed_section to have const char * as
last argument instead of const void *, change
unnamed_section::data also to const char * and update
everything related to that
4) maybe add a host hook whether it is ok to support binaries
changing addresses (the only thing I'm worried about is if
some host that uses function descriptors allocates them
dynamically instead of having them somewhere in the
executable)
5) maybe add a gengtype warning if it sees in GTY tracked
structure a function pointer without that new callback
option
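
To make 1) and 3) a little more concrete, here is a rough, self-contained
sketch of what the new option means for a GC-tracked structure (illustrative
only: the struct name is made up, and the real change is the output.h hunk
listed in the ChangeLog below):

/* GTY(()) is only meaningful to gengtype; for a normal compile it can
   expand to nothing, which keeps this sketch compilable.  */
#ifndef GTY
#define GTY(x)
#endif

typedef void (*unnamed_section_callback) (const char *);

struct GTY (()) unnamed_section_sketch
{
  /* 1): a function pointer marked "callback" is skipped by normal PCH
     streaming, but its address is recorded so it can be rebased when a
     PIE executable loads at a different address.  */
  unnamed_section_callback GTY ((callback)) callback;

  /* 3): plain string payload instead of const void *.  */
  const char *data;
};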


So, here is 1), 2), 3) implemented.  With this patch alone,
g++.dg/pch/system-2.C test ICEs.  This is because GCC 12 has added
function::x_range_query member, which is set to _ranges on
cfun creation and is:
   range_query * GTY ((skip)) x_range_query;
which means when a PIE binary writes PCH and a PIE binary loaded
at a different address loads it, cfun->x_range_query might be a garbage
pointer.  We can either apply a patch like the attached one after
this inline patch, but then probably callback is misnamed and we should
rename it to relocate_and_skip or something similar.  Or we could
e.g. during gimplification overwrite cfun->x_range_query = &global_ranges.
Other than that make check-gcc check-g++ passes RUNTESTFLAGS=pch.exp.

Note, on stdc++.h.gch/O2g.gch there are just those 10 relocations without
the second patch, with it a few more, but nothing huge.  And for non-PIEs
there isn't really any extra work on the load side except freading two scalar
values and fseek.


Here is an updated version (the previous version doesn't apply any longer
in gt_pch_restore due to the line_maps saving/restoring changes).

Bootstrapped/regtested on x86_64-linux and i686-linux with
--enable-host-shared in addition to normal configure options (and without
Ada which doesn't even with the above options build PIC libgnat.a and links
it) and a hack:
--- gcc/configure.ac.jj 2021-10-28 22:12:31.569299780 +0200
+++ gcc/configure.ac	2021-11-09 11:34:33.453776105 +0100
@@ -7566,7 +7566,7 @@ AC_CACHE_CHECK([for -no-pie option],
   [gcc_cv_no_pie=no])
 LDFLAGS="$saved_LDFLAGS"])
  if test "$gcc_cv_no_pie" = "yes"; then
-  NO_PIE_FLAG="-no-pie"
+  NO_PIE_FLAG="-pie"
  fi
  AC_SUBST([NO_PIE_FLAG])
  
to force binaries to be PIE; no regressions against a non-PIC/PIE build
without this patch.  Ok for trunk?

2021-11-18  Jakub Jelinek  

PR pch/71934
gcc/
* ggc.h (gt_pch_note_callback): Declare.
* gengtype.h (enum typekind): Add TYPE_CALLBACK.
(callback_type): Declare.
* gengtype.c (dbgprint_count_type_at): Handle TYPE_CALLBACK.
(callback_type): New variable.
(process_gc_options): Add CALLBACK argument, handle callback
option.
(set_gc_used_type): Adjust process_gc_options caller, if callback,
set type to &callback_type.
(output_mangled_typename): Handle TYPE_CALLBACK.
(walk_type): Likewise.  Handle callback option.
(write_types_process_field): Handle TYPE_CALLBACK.
(write_types_local_user_process_field): Likewise.
(write_types_local_process_field): Likewise.
(write_root): Likewise.
(dump_typekind): Likewise.
(dump_type): Likewise.
* gengtype-state.c (type_lineloc): Handle TYPE_CALLBACK.
(state_writer::write_state_callback_type): New method.
(state_writer::write_state_type): Handle TYPE_CALLBACK.
(read_state_callback_type): New function.
(read_state_type): Handle TYPE_CALLBACK.
* ggc-common.c (callback_vec): New variable.
(gt_pch_note_callback): New function.
(gt_pch_save): Stream out gt_pch_save function address and relocation
table.
(gt_pch_restore): Stream in saved gt_pch_save function address and
relocation table and apply relocations if needed.
* doc/gty.texi (callback): Document new GTY option.
* varasm.c (get_unnamed_section): Change callback argument's type and
last argument's type from const void * to const char *.
(output_section_asm_op): Change argument's type from const void *
to const char *, remove unnecessary cast.
* tree-core.h (struct tree_translation_unit_decl): Drop GTY((skip))
from language member.
* output.h (unnamed_section_callback): Change argument type from
const void * to const char *.
(struct unnamed_section): Use GTY((callback)) instead of GTY((skip))
for 

RE: [PATCH]middle-end cse: Make sure duplicate elements are not entered into the equivalence set [PR103404]

2021-12-06 Thread Tamar Christina via Gcc-patches
Hi,

As discussed off-line this can only happen with a V1 mode, so here's a much 
simpler patch.

Bootstrapped Regtested on aarch64-none-linux-gnu,
x86_64-pc-linux-gnu and no regressions.


Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR rtl-optimization/103404
* cse.c (find_sets_in_insn): Don't select elements out of a V1 mode
subreg.

gcc/testsuite/ChangeLog:

PR rtl-optimization/103404
* gcc.target/i386/pr103404.c: New test.

--- inline copy of patch ---

diff --git a/gcc/cse.c b/gcc/cse.c
index 
c1c7d0ca27b73c4b944b4719f95fece74e0358d5..dc5d5aed047c7776f44b159a4286390d6499c18d
 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -4275,7 +4275,12 @@ find_sets_in_insn (rtx_insn *insn, vec 
*psets)
   else if (GET_CODE (SET_SRC (x)) == CALL)
;
   else if (GET_CODE (SET_SRC (x)) == CONST_VECTOR
-  && GET_MODE_CLASS (GET_MODE (SET_SRC (x))) != MODE_VECTOR_BOOL)
+  && GET_MODE_CLASS (GET_MODE (SET_SRC (x))) != MODE_VECTOR_BOOL
+  /* Prevent duplicates from being generated if the type is a V1
+ type and a subreg.  Folding this will result in the same
+ element as folding x itself.  */
+  && !(SUBREG_P (SET_DEST (x))
+   && known_eq (GET_MODE_NUNITS (GET_MODE (SET_SRC (x))), 1)))
{
  /* First register the vector itself.  */
  add_to_set (psets, x);
diff --git a/gcc/testsuite/gcc.target/i386/pr103404.c 
b/gcc/testsuite/gcc.target/i386/pr103404.c
new file mode 100644
index 
..66f33645301db09503fc0977fd0f061a19e56ea5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103404.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Og -fcse-follow-jumps -fno-dce 
-fno-early-inlining -fgcse -fharden-conditional-branches -frerun-cse-after-loop 
-fno-tree-ccp -mavx5124fmaps -std=c99 -w" } */
+
+typedef unsigned __attribute__((__vector_size__ (4))) U;
+typedef unsigned __attribute__((__vector_size__ (16))) V;
+typedef unsigned __attribute__((__vector_size__ (64))) W;
+
+int x, y;
+
+V v;
+W w;
+
+inline
+int bar (U a)
+{
+  a |= x;
+  W k =
+__builtin_shufflevector (v, 5 / a,
+2, 4, 0, 2, 4, 1, 0, 1,
+1, 2, 1, 3, 0, 4, 4, 0);
+  w = k;
+  y = 0;
+}
+
+int
+foo ()
+{
+  bar ((U){0x});
+  for (unsigned i; i < sizeof (foo);)
+;
+}
+


rb15109.patch
Description: rb15109.patch


Re: [PATCH v2] rs6000: Fix some issues in rs6000_can_inline_p [PR102059]

2021-12-06 Thread Martin Liška

On 12/4/21 00:23, Peter Bergner wrote:

On 12/2/21 9:46 PM, Kewen.Lin via Gcc-patches wrote:

on 2021/11/30 12:57 AM, Segher Boessenkool wrote:

On Wed, Sep 01, 2021 at 02:55:51PM +0800, Kewen.Lin wrote:

This patch is to fix the inconsistent behaviors for non-LTO mode
and LTO mode.  As Martin pointed out, currently the function
rs6000_can_inline_p simply makes it inlinable if callee_tree is
NULL, but it's wrong, we should use the command line options
from target_option_default_node as default.


This is not documented.



Yeah, but according to the document for the target attribute [1],
"Multiple target back ends implement the target attribute to specify
that a function is to be compiled with different target options than
specified on the command line. The original target command-line options
are ignored. ", it seems to say the function without any target
attribute/pragma will be compiled with target options specified on the
command line.  I think it's a normal expectation for users.

Excepting for the inconsistent behaviors between LTO and non-LTO,
it can also make the below case different.


I thought Martin and richi mentioned that target attribute options
are treated as if they are appended to the end of the command line
options, so they can potentially override earlier options, but they
don't actually ignore them?


No, the described behavior is true for optimize attribute:

optimize (string, …)
...
The optimize attribute arguments of a function behave as if appended to 
the command-line.

but:

target (string, …)
...
The original target command-line options are ignored.

As seen here:
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes
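
A minimal illustration of the two attributes being contrasted (the comments
just restate the documentation quoted above; anything beyond that, e.g. how
the -m options combine in practice, is target-specific):

/* The optimize attribute's arguments behave as if appended to the
   command line, for this function only.  */
__attribute__ ((optimize ("no-tree-vectorize")))
void
f (double *a, double *b, int n)
{
  for (int i = 0; i < n; i++)
    a[i] += b[i];
}

/* For the target attribute, the documentation says the original target
   command-line options are ignored for this function (x86 example).  */
__attribute__ ((target ("avx2")))
void
g (double *a, double *b, int n)
{
  for (int i = 0; i < n; i++)
    a[i] += b[i];
}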

Cheers,
Martin



Peter





Re: [PATCH] RISC-V: jal cannot refer to a default visibility symbol for shared object.

2021-12-06 Thread Kito Cheng via Gcc-patches
Committed, thanks :)

On Mon, Nov 29, 2021 at 8:48 PM Nelson Chu  wrote:
>
> This is the original binutils bugzilla report,
> https://sourceware.org/bugzilla/show_bug.cgi?id=28509
>
> And this is the first version of the proposed binutils patch,
> https://sourceware.org/pipermail/binutils/2021-November/118398.html
>
> After applying the binutils patch, I get the the unexpected error when
> building libgcc,
>
> /scratch/nelsonc/riscv-gnu-toolchain/riscv-gcc/libgcc/config/riscv/div.S:42:
> /scratch/nelsonc/build-upstream/rv64gc-linux/build-install/riscv64-unknown-linux-gnu/bin/ld:
>  relocation R_RISCV_JAL against `__udivdi3' which may bind externally can not 
> be used when making a shared object; recompile with -fPIC
>
> Therefore, this patch adds an extra hidden alias symbol for __udivdi3, and
> then use HIDDEN_JUMPTARGET to target a non-preemptible symbol instead.
> The solution is similar to glibc as follows,
> https://sourceware.org/git/?p=glibc.git;a=commit;h=68389203832ab39dd0dbaabbc4059e7fff51c29b
>
> libgcc/ChangeLog:
>
> * config/riscv/div.S: Add the hidden alias symbol for __udivdi3, and
> then use HIDDEN_JUMPTARGET to target it since it is non-preemptible.
> * config/riscv/riscv-asm.h: Added new macros HIDDEN_JUMPTARGET and
> HIDDEN_DEF.
> ---
>  libgcc/config/riscv/div.S   | 15 ---
>  libgcc/config/riscv/riscv-asm.h |  6 ++
>  2 files changed, 14 insertions(+), 7 deletions(-)
>
> diff --git a/libgcc/config/riscv/div.S b/libgcc/config/riscv/div.S
> index c9bd787..723c3b8 100644
> --- a/libgcc/config/riscv/div.S
> +++ b/libgcc/config/riscv/div.S
> @@ -40,7 +40,7 @@ FUNC_BEGIN (__udivsi3)
>    sll    a0, a0, 32
>    sll    a1, a1, 32
>move   t0, ra
> -  jal    __udivdi3
> +  jal    HIDDEN_JUMPTARGET(__udivdi3)
>sext.w a0, a0
>jr t0
>  FUNC_END (__udivsi3)
> @@ -52,7 +52,7 @@ FUNC_BEGIN (__umodsi3)
>    srl    a0, a0, 32
>    srl    a1, a1, 32
>move   t0, ra
> -  jal    __udivdi3
> +  jal    HIDDEN_JUMPTARGET(__udivdi3)
>sext.w a0, a1
>jr t0
>  FUNC_END (__umodsi3)
> @@ -95,11 +95,12 @@ FUNC_BEGIN (__udivdi3)
>  .L5:
>ret
>  FUNC_END (__udivdi3)
> +HIDDEN_DEF (__udivdi3)
>
>  FUNC_BEGIN (__umoddi3)
>/* Call __udivdi3(a0, a1), then return the remainder, which is in a1.  */
>move  t0, ra
> -  jal   __udivdi3
> +  jal   HIDDEN_JUMPTARGET(__udivdi3)
>move  a0, a1
>jrt0
>  FUNC_END (__umoddi3)
> @@ -111,12 +112,12 @@ FUNC_END (__umoddi3)
>bgtz  a1, .L12 /* Compute __udivdi3(-a0, a1), then negate the result.  
> */
>
>neg   a1, a1
> -  j __udivdi3/* Compute __udivdi3(-a0, -a1).  */
> +  j HIDDEN_JUMPTARGET(__udivdi3) /* Compute __udivdi3(-a0, -a1).  */
>  .L11:/* Compute __udivdi3(a0, -a1), then negate the result.  
> */
>neg   a1, a1
>  .L12:
>move  t0, ra
> -  jal   __udivdi3
> +  jal   HIDDEN_JUMPTARGET(__udivdi3)
>neg   a0, a0
>jrt0
>  FUNC_END (__divdi3)
> @@ -126,7 +127,7 @@ FUNC_BEGIN (__moddi3)
>bltz   a1, .L31
>bltz   a0, .L32
>  .L30:
> -  jal    __udivdi3    /* The dividend is not negative.  */
> +  jal    HIDDEN_JUMPTARGET(__udivdi3)    /* The dividend is not negative.  */
>move   a0, a1
>jr t0
>  .L31:
> @@ -134,7 +135,7 @@ FUNC_BEGIN (__moddi3)
>bgez   a0, .L30
>  .L32:
>nega0, a0
> -  jal__udivdi3/* The dividend is hella negative.  */
> +  jalHIDDEN_JUMPTARGET(__udivdi3)/* The dividend is hella negative.  
> */
>nega0, a1
>jr t0
>  FUNC_END (__moddi3)
> diff --git a/libgcc/config/riscv/riscv-asm.h b/libgcc/config/riscv/riscv-asm.h
> index 8550707..96dd85b 100644
> --- a/libgcc/config/riscv/riscv-asm.h
> +++ b/libgcc/config/riscv/riscv-asm.h
> @@ -33,3 +33,9 @@ X:
>  #define FUNC_ALIAS(X,Y)\
> .globl X;   \
> X = Y
> +
> +#define CONCAT1(a, b)  CONCAT2(a, b)
> +#define CONCAT2(a, b)  a ## b
> +#define HIDDEN_JUMPTARGET(X)   CONCAT1(__hidden_, X)
> +#define HIDDEN_DEF(X)  FUNC_ALIAS(HIDDEN_JUMPTARGET(X), X); \
> +   .hidden HIDDEN_JUMPTARGET(X)
> --
> 2.7.4
>


Re: [PATCH] [i386] Prefer INT_SSE_REGS for SSE_FLOAT_MODE_P in preferred_reload_class.

2021-12-06 Thread Uros Bizjak via Gcc-patches
On Mon, Dec 6, 2021 at 4:41 AM liuhongt via Gcc-patches
 wrote:
>
> When moves between integer and sse registers are cheap.
>
> 2021-12-06  Hongtao Liu  
> Uroš Bizjak  
> gcc/ChangeLog:
>
> PR target/95740
> * config/i386/i386.c (ix86_preferred_reload_class): Allow
> integer regs when moves between register units are cheap.
> * config/i386/i386.h (INT_SSE_CLASS_P): New.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr95740.c: New test.

OK.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.c  | 12 ++--
>  gcc/config/i386/i386.h  |  2 ++
>  gcc/testsuite/gcc.target/i386/pr95740.c | 26 +
>  3 files changed, 38 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr95740.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 80fee627358..e3c2e294988 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -19194,9 +19194,17 @@ ix86_preferred_reload_class (rtx x, reg_class_t 
> regclass)
>return NO_REGS;
>  }
>
> -  /* Prefer SSE regs only, if we can use them for math.  */
> +  /* Prefer SSE if we can use them for math.  Also allow integer regs
> + when moves between register units are cheap.  */
>if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> -return SSE_CLASS_P (regclass) ? regclass : NO_REGS;
> +{
> +  if (TARGET_INTER_UNIT_MOVES_FROM_VEC
> + && TARGET_INTER_UNIT_MOVES_TO_VEC
> + && GET_MODE_SIZE (mode) <= GET_MODE_SIZE (word_mode))
> +   return INT_SSE_CLASS_P (regclass) ? regclass : NO_REGS;
> +  else
> +   return SSE_CLASS_P (regclass) ? regclass : NO_REGS;
> +}
>
>/* Generally when we see PLUS here, it's the function invariant
>   (plus soft-fp const_int).  Which can only be computed into general
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 2fda1e0686e..ec90e47904b 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -1283,6 +1283,8 @@ enum reg_class
>reg_class_subset_p ((CLASS), FLOAT_REGS)
>  #define SSE_CLASS_P(CLASS) \
>reg_class_subset_p ((CLASS), ALL_SSE_REGS)
> +#define INT_SSE_CLASS_P(CLASS) \
> +  reg_class_subset_p ((CLASS), INT_SSE_REGS)
>  #define MMX_CLASS_P(CLASS) \
>((CLASS) == MMX_REGS)
>  #define MASK_CLASS_P(CLASS) \
> diff --git a/gcc/testsuite/gcc.target/i386/pr95740.c 
> b/gcc/testsuite/gcc.target/i386/pr95740.c
> new file mode 100644
> index 000..7ecd71ba8c1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr95740.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-msse2 -O2 -mtune=generic -mtune-ctrl=use_incdec -masm=att 
> -mfpmath=sse" } */
> +/* { dg-final { scan-assembler-times {(?n)movd[\t ]*%xmm0.*%eax} 1 } } */
> +/* { dg-final { scan-assembler-times {(?n)incl[\t ]*%eax} 1 } } */
> +/* { dg-final { scan-assembler-times {(?n)movq[\t ]*%xmm0.*%rax} 1 } } */
> +/* { dg-final { scan-assembler-times {(?n)incq[\t ]*%rax} 1 } } */
> +
> +int
> +foo (float a)
> +{
> +  union{
> +int b;
> +float a;}u;
> +  u.a = a;
> +  return u.b + 1;
> +}
> +
> +long long
> +foo1 (double a)
> +{
> +  union{
> +long long b;
> +double a;}u;
> +  u.a = a;
> +  return u.b + 1;
> +}
> --
> 2.18.2
>


Re: [PATCH] gimple: Optimise inlined gimple_seq_last

2021-12-06 Thread Richard Biener via Gcc-patches
On Sun, Dec 5, 2021 at 11:02 PM Richard Sandiford via Gcc-patches
 wrote:
>
> In self-compilations, GCC doesn't realise that gimple_seq_last
> always returns nonnull when the argument is nonnull.  Although
> it's a small thing in itself, the function is used (indirectly)
> many times and the extra null checks bloat what are otherwise
> simple loops.
>
> This patch adds an optimisation hint to tell the compiler that
> s && !s->prev is impossible.
>  This gives a small (<1%) but
> measurable improvement in compile time.  The original intention
> was to make a loop small enough that it could be turned into an
> inline function.
>
> The form of assertion in the patch:
>
>   __extension__ ({ if (!(EXPR)) __builtin_unreachable (); (void) 0; })
>
> seems to work more reliably than the form used by release-checking
> gcc_asserts:
>
>   ((void)(__builtin_expect (!(EXPR), 0) ? __builtin_unreachable (), 0 : 0))
>
> For that reason, I wondered about making gcc_assert use the
> new macro too.  However, gcc_assert is currently inconsistent:
> it preserves side-effects even in release compilers when compiled
> with GCC, but it doesn't when compiled by other compilers.
> This difference dates back to 2013 (g:2df77822ee0974d84b2)
> and it looks like we now have several gcc_asserts that assume
> that side effects will be preserved.  It therefore seemed better
> to deal with that separately.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> Richard
>
>
> gcc/
> * system.h (optimization_assert): New macro.
> * gimple.h (gimple_seq_last): Use it to assert that s->prev is
> never null.
> ---
>  gcc/gimple.h |  2 ++
>  gcc/system.h | 10 ++
>  2 files changed, 12 insertions(+)
>
> diff --git a/gcc/gimple.h b/gcc/gimple.h
> index f7fdefc5362..8162c1ea507 100644
> --- a/gcc/gimple.h
> +++ b/gcc/gimple.h
> @@ -1723,6 +1723,8 @@ gimple_seq_first_stmt_as_a_bind (gimple_seq s)
>  static inline gimple_seq_node
>  gimple_seq_last (gimple_seq s)
>  {
> +  /* Helps to avoid a double branch in callers.  */
> +  optimization_assert (!s || s->prev);
>return s ? s->prev : NULL;

Does it work when you write it in a more legible way like

 if (s)
   {
  /* gimple_seq is linked cyclic in prev.  */
  gcc_assert (s->prev);
  return s->prev;
   }
return NULL;

?

>  }
>
> diff --git a/gcc/system.h b/gcc/system.h
> index 4ac656c9c3c..6e27b3270b9 100644
> --- a/gcc/system.h
> +++ b/gcc/system.h
> @@ -778,6 +778,16 @@ extern void fancy_abort (const char *, int, const char *)
>  ATTRIBUTE_NORETURN ATTRIBUTE_COLD;
>  #define abort() fancy_abort (__FILE__, __LINE__, __FUNCTION__)
>
> +/* Tell the compiler of GCC that EXPR is known to be true.  This is purely
> +   an optimization hint and EXPR must not have any side effects.  */
> +#if (GCC_VERSION >= 4005)
> +#define optimization_assert(EXPR)  \
> +  __extension__ ({ if (!(EXPR)) __builtin_unreachable (); (void) 0; })
> +#else
> +/* Include EXPR, so that unused variable warnings do not occur.  */
> +#define optimization_assert(EXPR) ((void)(0 && (EXPR)))
> +#endif
> +
>  /* Use gcc_assert(EXPR) to test invariants.  */
>  #if ENABLE_ASSERT_CHECKING
>  #define gcc_assert(EXPR)   \
> --
> 2.31.1
>
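
To make the effect of the hint concrete, here is a stand-alone sketch (not
the patch itself) of the pattern under discussion:

/* Stand-alone sketch: with the __builtin_unreachable () hint, a compiler
   that honours it may assume "s != NULL implies s->prev != NULL" and so
   drop the second null check in callers.  */
struct node { struct node *prev; };

static inline struct node *
seq_last (struct node *s)
{
  if (s && !s->prev)
    __builtin_unreachable ();
  return s ? s->prev : 0;
}

int
has_last (struct node *s)
{
  /* Without the hint this needs two tests (s and the return value);
     with it the whole check can fold down to just "s != 0".  */
  return seq_last (s) != 0;
}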