Re: [PATCH] testsuite: Update some vect cases for partial vectors

2020-08-05 Thread Richard Sandiford
"Kewen.Lin"  writes:
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-11.c 
> b/gcc/testsuite/gcc.dg/vect/slp-multitypes-11.c
> index 5200ed1cd94..da6fb12eb0d 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-11.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-11.c
> @@ -48,6 +48,9 @@ int main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target 
> vect_unpack } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  
> { target vect_unpack xfail { vect_variable_length && vect_load_lanes } } } } 
> */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target 
> { vect_unpack && {! vect_partial_vectors_usage_1 } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target 
> { vect_unpack && { vect_partial_vectors_usage_1 } } } } } */

I don't understand this bit: don't these two lines reduce back to the
original vect_unpack one?

> +/* The epilogues are vectorized using partial vectors.  */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  
> { target { vect_unpack && {! vect_partial_vectors_usage_1 } } xfail { 
> vect_variable_length && vect_load_lanes } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect"  
> { target { vect_unpack && { vect_partial_vectors_usage_1 } } xfail { 
> vect_variable_length && vect_load_lanes } } } } */
>
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-1.c 
> b/gcc/testsuite/gcc.dg/vect/slp-perm-1.c
> index ca7803ec1a9..af6fe08856f 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-perm-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-1.c
> @@ -80,8 +80,10 @@ int main (int argc, const char* argv[])
>  }
>  
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target 
> vect_perm } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> { target { vect_perm3_int && {! vect_load_lanes } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> { target { vect_perm3_int && { {! vect_load_lanes } && {! 
> vect_partial_vectors_usage_1 } } } } } } */
>  /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" 
> { target vect_load_lanes } } } */
> +/* The epilogues are vectorized using partial vectors.  */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" 
> { target { vect_perm3_int && { {! vect_load_lanes } && { 
> vect_partial_vectors_usage_1 } } } } } } */

Very minor, but I think it'd be better to put this immediately after the
line you changed above.  Same for the other slp-perm-* changes.

> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index 57eed3012b9..b571e84d20e 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -7055,6 +7055,27 @@ proc check_effective_target_vect_check_ptrs { } {
>  return [check_effective_target_aarch64_sve2]
>  }
>  
> +# Return true if loops using partial vectors are supported.
> +
> +proc check_effective_target_vect_partial_vectors { } {
> +return [expr { [check_effective_target_vect_partial_vectors_usage_1]
> +|| [check_effective_target_vect_partial_vectors_usage_2] }]
> +}
> +
> +# Return true if loops using partial vectors are supported and the default
> +# value of --param=vect-partial-vector-usage is 1.
> +
> +proc check_effective_target_vect_partial_vectors_usage_1 { } {
> +return 0
> +}
> +
> +# Return true if loops using partial vectors are supported and the default
> +# value of --param=vect-partial-vector-usage is 2.
> +
> +proc check_effective_target_vect_partial_vectors_usage_2 { } {
> +return [expr { [check_effective_target_vect_fully_masked] }]
> +}
> +

Could we auto-detect this?  What we really care about isn't the default,
but what's currently being tested.

E.g. maybe use check_compile to run gcc with “-Q --help=params” and an
arbitrary output type (probably assembly).  Then use “regexp” on the
lines to parse the --param=vect-partial-vector-usage value.  At that
point it would be worth caching the result.

The new target-supports keywords need to be documented in sourcebuild.texi.

Thanks,
Richard


Re: [PATCH v5] dse: Remove partial load after full store for high part access[PR71309]

2020-08-05 Thread Richard Sandiford
luoxhu  writes:
> Hi Richard,
>
> On 2020/8/3 22:01, Richard Sandiford wrote:
>>> /* Try a wider mode if truncating the store mode to NEW_MODE
>>>  requires a real instruction.  */
>>> if (maybe_lt (GET_MODE_SIZE (new_mode), GET_MODE_SIZE (store_mode))
>>> @@ -1779,6 +1780,25 @@ find_shift_sequence (poly_int64 access_size,
>>>   && !targetm.modes_tieable_p (new_mode, store_mode))
>>> continue;
>>>   
>>> +  if (multiple_p (GET_MODE_BITSIZE (new_mode), shift)
>> 
>> Like I mentioned before, this should be the other way around:
>> 
>>  multiple_p (shift, GET_MODE_BITSIZE (new_mode))
>> 
>> i.e. for the subreg to be valid, the shift amount must be a multiple
>> of the outer mode size in bits.
>> 
>> OK with that change, thanks, and sorry for the multiple review cycles.
>
> Just had another question is do we want to backport this patch to gcc-10
> and gcc-9 branches? :)

One for the RMs, but it looks like the PR was reported against GCC 7 and
isn't a regression.  Given that it's also “just” a missed optimisation,
I'm guessing it should be trunk only.

Thanks,
Richard


Re: [Patch, fortran] PR96102 - ICE in check_host_association, at fortran/resolve.c:5994

2020-08-05 Thread Paul Richard Thomas via Gcc-patches
Dear All,

Dominique has kindly pointed out that the error message in the patch does
not correspond to the errors in the testcase. This came about because the
submitted patch was an earlier version of that on my tree. Please find
attached the correct version of the patch. The principle is the same but
the error is different... if you see what I mean :-)

Paul

On Wed, 5 Aug 2020 at 16:40, Steve Kargl 
wrote:

> On Wed, Aug 05, 2020 at 03:59:38PM +0100, Paul Richard Thomas wrote:
> > The attached patch builds on a draft posted by Steve Kargl. I have left
> the
> > gcc_assert in place just in case my imagination was too limited to
> generate
> > an ICE.
> >
> > Regtests OK on FC31/x86_64 - OK for trunk?
> >
>
> Looks OK to me.
>
> --
> Steve
>


-- 
"If you can't explain it simply, you don't understand it well enough" -
Albert Einstein
diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index 2751c0ccf62..6caddcf4ef0 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -5993,6 +5993,16 @@ check_host_association (gfc_expr *e)
 		if (ref->type == REF_ARRAY && ref->next == NULL)
 		  break;
 
+	  if ((ref == NULL || ref->type != REF_ARRAY)
+		  && sym->attr.proc == PROC_INTERNAL)
+		{
+		  gfc_error ("%qs at %L is host associated at %L into "
+			 "a contained procedure with an internal "
+			 "procedure of the same name", sym->name,
+			  _sym->declared_at, >where);
+		  return false;
+		}
+
 	  gcc_assert (ref->type == REF_ARRAY);
 
 	  /* Grab the start expressions from the array ref and


Re: std:vec for classes with constructor?

2020-08-05 Thread Richard Sandiford
Andrew MacLeod via Gcc-patches  writes:
> On 8/5/20 12:54 PM, Richard Biener via Gcc-patches wrote:
>> On August 5, 2020 5:09:19 PM GMT+02:00, Martin Jambor  
>> wrote:
>>> On Fri, Jul 31 2020, Aldy Hernandez via Gcc-patches wrote:
>>> [...]
>>>
 * ipa-cp changes from vec to std::vec.

 We are using std::vec to ensure constructors are run, which they
>>> aren't
 in our internal vec<> implementation.  Although we usually steer away
 from using std::vec because of interactions with our GC system,
 ipcp_param_lattices is only live within the pass and allocated with
>>> calloc.
>>> Ummm... I did not object but I will save the URL of this message in the
>>> archive so that I can waive it in front of anyone complaining why I
>>> don't use our internal vec's in IPA data structures.
>>>
>>> But it actually raises a broader question: was this supposed to be an
>>> exception, allowed only not to complicate the irange patch further, or
>>> will this be generally accepted thing to do when someone wants to have
>>> a
>>> vector of constructed items?
>> It's definitely not what we want. You have to find another solution to this 
>> problem.
>>
>> Richard.
>>
>
> Why isn't it what we want?
>
> This is a small vector local to the pass so it doesn't interfere with 
> our PITA GTY.
> The class is pretty straightforward, but we do need a constructor to 
> initialize the pointer and the max-size field.  There is no allocation 
> done per element, so a small number of elements have a couple of fields 
> initialized per element. We'd have to loop to do that anyway.
>
> GCC's vec<> does not provide he ability to run a constructor, std::vec 
> does.

I realise you weren't claiming otherwise, but: that could be fixed :-)

> I quizzed some libstdc++ folks, and there has been a lot of 
> optimizations done on std::vec over the last few years,.. They think its 
> pretty good now, and we were encouraged to use it.
>
> We can visit the question tho...  What is the rationale for not using 
> std::vec in the compiler?  We currently use std::swap, std:pair, 
> std::map, std::sort, and a few others.
> is there some aspect of using std::vec I am not aware of that makes it 
> something we need to avoid?

One reason to prefer vec<> for general interfaces is that it
works with auto_vec<…, N>, making it possible to pre-allocate a
reasonably-sized buffer on the stack without needing a round-trip
through the allocators.

FWIW, that isn't simply a GCC thing.  LLVM (which is obviously much
more C++-intensive than GCC) still makes heavy use of SmallVector for
automatic variables.  And the reason we have things like memory_block.h
is that malloc did used to show up high in profiles.

(FWIW, I'm not saying that's an argument in favour of avoiding
std::vector completely.  It's just a reason why it might not always
be the right choice.)

Thanks,
Richard


Re: [patch] multi-range implementation for value_range (irange)

2020-08-05 Thread Richard Sandiford
Gerald Pfeifer  writes:
> 2020-08-05  Gerald Pfeifer   
>  
> * ipa-fnsummary.c (INCLUDE_VECTOR): Define. 
> Remove direct inclusion of . 

OK, thanks.

Richard

> diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
> index 49bab04524b..59e52927151 100644
> --- a/gcc/ipa-fnsummary.c
> +++ b/gcc/ipa-fnsummary.c
> @@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.  If not see
> inlined performs analysis via its analyze_function method. */
>  
>  #include "config.h"
> +#define INCLUDE_VECTOR
>  #include "system.h"
>  #include "coretypes.h"
>  #include "backend.h"
> @@ -82,7 +83,6 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gimplify.h"
>  #include "stringpool.h"
>  #include "attribs.h"
> -#include 
>  #include "tree-into-ssa.h"
>  
>  /* Summaries.  */


Re: [PATCH] emit-rtl.c: Allow vector subreg of float vectors

2020-08-05 Thread Richard Sandiford
Andrew Stubbs  writes:
> This patch is a prerequisite for some patches I have to add multiple 
> vector sizes on amdgcn.
>
> The problem is that validate_subreg rejects things like this:
>
>(subreg:V32SF (reg:V64SF) 0)
>
> These are commonly generated by the compiler. The integer equivalents 
> work fine.
>
> To be honest, I don't know why other targets have not encountered this 
> problem before? Perhaps because most other targets require all vectors 
> to have the same number of bits, meaning that float vectors have a fixed 
> number of lanes? In my case, amdgcn can convert V64SFmode to V32SFmode 
> by simply masking off some lanes.
>
> OK to commit? (I have an x86_64 bootstrap and test in progress.)
>
> Andrew
>
> Allow vector subreg of float vectors
>
> Integer vector subregs were already permitted.
>
> gcc/ChangeLog:
>
>   * emit-rtl.c (validate_subreg): Permit sections of float vectors.
>
> diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> index f9b0e9714d9..d7067989ad7 100644
> --- a/gcc/emit-rtl.c
> +++ b/gcc/emit-rtl.c
> @@ -947,6 +947,11 @@ validate_subreg (machine_mode omode, machine_mode imode,
>else if (VECTOR_MODE_P (omode)
>  && GET_MODE_INNER (omode) == GET_MODE_INNER (imode))
>  ;
> +  /* Allow sections of vectors, both smaller and paradoxical.  (This just
> + works for integers, but needs to be explicitly allowed for floats.)  */
> +  else if (VECTOR_MODE_P (omode) && VECTOR_MODE_P (imode)
> +&& GET_MODE_INNER (omode) == GET_MODE_INNER (imode))
> +;

Might be missing something, but isn't this a subcondition of the
previous “else if”?  It looks like that ought to catch every case
that the new one does.

Thanks,
Richard

>/* Subregs involving floating point modes are not allowed to
>   change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
>   (subreg:SI (reg:DF) 0) isn't.  */


Re: std:vec for classes with constructor? (Was: Re: [patch] multi-range implementation for value_range (irange))

2020-08-05 Thread Andrew MacLeod via Gcc-patches

On 8/5/20 12:54 PM, Richard Biener via Gcc-patches wrote:

On August 5, 2020 5:09:19 PM GMT+02:00, Martin Jambor  wrote:

On Fri, Jul 31 2020, Aldy Hernandez via Gcc-patches wrote:
[...]


* ipa-cp changes from vec to std::vec.

We are using std::vec to ensure constructors are run, which they

aren't

in our internal vec<> implementation.  Although we usually steer away
from using std::vec because of interactions with our GC system,
ipcp_param_lattices is only live within the pass and allocated with

calloc.
Ummm... I did not object but I will save the URL of this message in the
archive so that I can waive it in front of anyone complaining why I
don't use our internal vec's in IPA data structures.

But it actually raises a broader question: was this supposed to be an
exception, allowed only not to complicate the irange patch further, or
will this be generally accepted thing to do when someone wants to have
a
vector of constructed items?

It's definitely not what we want. You have to find another solution to this 
problem.

Richard.



Why isn't it what we want?

This is a small vector local to the pass so it doesn't interfere with 
our PITA GTY.
The class is pretty straightforward, but we do need a constructor to 
initialize the pointer and the max-size field.  There is no allocation 
done per element, so a small number of elements have a couple of fields 
initialized per element. We'd have to loop to do that anyway.


GCC's vec<> does not provide he ability to run a constructor, std::vec 
does.  I quizzed some libstdc++ folks, and there has been a lot of 
optimizations done on std::vec over the last few years,.. They think its 
pretty good now, and we were encouraged to use it.


We can visit the question tho...  What is the rationale for not using 
std::vec in the compiler?  We currently use std::swap, std:pair, 
std::map, std::sort, and a few others.
is there some aspect of using std::vec I am not aware of that makes it 
something we need to avoid?


Andrew







Re: [PATCH v5] dse: Remove partial load after full store for high part access[PR71309]

2020-08-05 Thread luoxhu via Gcc-patches

Hi Richard,

On 2020/8/3 22:01, Richard Sandiford wrote:

/* Try a wider mode if truncating the store mode to NEW_MODE
 requires a real instruction.  */
if (maybe_lt (GET_MODE_SIZE (new_mode), GET_MODE_SIZE (store_mode))
@@ -1779,6 +1780,25 @@ find_shift_sequence (poly_int64 access_size,
  && !targetm.modes_tieable_p (new_mode, store_mode))
continue;
  
+  if (multiple_p (GET_MODE_BITSIZE (new_mode), shift)


Like I mentioned before, this should be the other way around:

 multiple_p (shift, GET_MODE_BITSIZE (new_mode))

i.e. for the subreg to be valid, the shift amount must be a multiple
of the outer mode size in bits.

OK with that change, thanks, and sorry for the multiple review cycles.


Just had another question is do we want to backport this patch to gcc-10
and gcc-9 branches? :)


Xionghu




Richard



Re: RFC: Monitoring old PRs, new dg directives

2020-08-05 Thread Marek Polacek via Gcc-patches
On Wed, Aug 05, 2020 at 11:03:08AM -0400, Nathan Sidwell wrote:
> On 8/4/20 8:54 PM, Marek Polacek via Gcc-patches wrote:
> > On Tue, Aug 04, 2020 at 03:33:23PM -0700, Mike Stump wrote:
> > > I think the read of the room is that people think it would be generally 
> > > useful, so let approve the general plan.
> > 
> > Cool.
> > 
> > > So, now we are down to the fine details.  Please do see just how far you 
> > > can stretch the existing mechanisms to cover what you need to do.  I 
> > > think the existing mechanisms should be able to cover it all; but the 
> > > devil is in the details and those matter.
> > 
> > At this point I'm only proposing one new directive, dg-ice.  I think we 
> > can't
> > really do without it.  The other one was a matter of convenience.
> 
> I've realized I have a concern.  Grepping (or searching in an editor buffer)
> the log file for 'internal compiler error' to find actual regressions is a
> thing I want to still be able to do (perhaps with alternative spelling, I
> don't care).  I don't want to see the ICEs of tests that are expected to
> ICE.
> 
> I think that means there has to be a positive marker on the unexpected ICEs,
> rather than lack of an expected marker on them.

Hmm, by the log file you mean g++.log?  Currently, if you run a dg-ice test,
and the test still ICEs, the g++.log file (but not the stdout of make
check-c++!) will have:

Executing on host: ... xg++ with options ...
spawn -ignore SIGHUP ... xg++ with options ...
.../foo.C:14:15: internal compiler error: in poplevel_class, at 
cp/name-lookup.c:4225

compiler exited with status 1
XFAIL: g++.dg/foo.C  -std=c++17 (internal compiler error)
PASS: g++.dg/foo.C  -std=c++17 (test for excess errors)

Which one of these would you not like to see?

The second one, in parens, can be changed easily.  The first one, the actual
output generated by dejagnu's default_target_compile, I don't think I can
change at all.  g++_target_compile invokes 

  set result [target_compile $source $dest $type $options]

and we have no control over the target_compile proc.

Can you give me more details?  Hopefully we'll work something out that doesn't
break your workflow.

Marek



Re: [PATCH] rs6000: Don't ICE when spilling an MMA accumulator

2020-08-05 Thread Segher Boessenkool
Hi!

On Wed, Aug 05, 2020 at 02:02:36PM -0500, Peter Bergner wrote:
> The following patch fixes one of the bugs discovered in PR96466, namely
> when we spill an accumulator that has a known zero value.  In that case,
> LRA would emit a new (set (reg:PXI ...) 0) insn, but it would not use the
> mma_xxsetaccz pattern to do it.  The solution is to move the xxsetaccz
> instruction into the movpxi pattern and have the xxsetaccz pattern call
> the move pattern.

>"TARGET_MMA
> -   && ((gpc_reg_operand (operands[0], PXImode)
> - && !(CONST_INT_P (operands[1]) && INTVAL (operands[1]) == 0))
> +   && (gpc_reg_operand (operands[0], PXImode)
> || gpc_reg_operand (operands[1], PXImode))"

Much nicer now :-)

> +  "@
> +   #
> +   #
> +   #
> +   xxsetaccz %A0"
> +  "&& reload_completed
> +   && !(fpr_reg_operand (operands[0], PXImode)
> + && CONST_INT_P (operands[1])
> + && INTVAL (operands[1]) == 0)"

You can just say

   && reload_completed
   && !(fpr_reg_operand (operands[0], PXImode) && operands[1] == const0_rtx)

afaics?

Okay (for trunk, and later 10) with or without such a change.  Thanks!


Segher


Re: [patch] multi-range implementation for value_range (irange)

2020-08-05 Thread Gerald Pfeifer
[ Jeff, approval request below ]

On Wed, 5 Aug 2020, Aldy Hernandez wrote:
>> I believe this has broken the bootstrap with clang (specifically
>> FreeBSD clang version 10.0.0):
>>
>> In file included from /scratch/tmp/gerald/GCC-HEAD/gcc/c/gimple-parser.c:44:
>> In file included from /scratch/tmp/gerald/GCC-HEAD/gcc/tree-vrp.h:23:
>> /scratch/tmp/gerald/GCC-HEAD/gcc/value-range.h:347:1: error: static
>> declaration of 'gt_ggc_mx' follows non-static declaration gt_ggc_mx
>> (int_range *x)
>> /scratch/tmp/gerald/GCC-HEAD/gcc/value-range.h:150:37: note: previous
>> declaration is here
>>template  friend void gt_ggc_mx (int_range *);
:
>> My daily tester started to 20200803T1640, so the root cause of this must
>> have entered GCC trunk between Sunday 16:40 UTC and Monday 16:40 UTC.
> Yeah, this is definitely caused by the irange patch.
> 
> GTY makes my head spin, and it took forever to get these gt_* functions 
> set up.  I can't claim I understand them entirely :).
> 
> Just a guess, does removing the static solve the problem?

Yes, that appears to solve this issue.  

Thank you, Aldy!  (And, yes, I ought to have thought of that myself. ;-)


Bootstrap still fails, but quite later (the log file is 13M as opposed 
to 0.8M) and related to something else ... also triggered by your patch, 
though, I'm afraid:

In file included from /scratch/tmp/gerald/GCC-HEAD/gcc/ipa-fnsummary.c:85:
In file included from /usr/include/c++/v1/vector:274:
In file included from /usr/include/c++/v1/__bit_reference:15:
In file included from /usr/include/c++/v1/algorithm:643:
In file included from /usr/include/c++/v1/memory:653:
/usr/include/c++/v1/typeinfo:346:5: error: no member named 'fancy_abort' in 
namespace 'std::__1'; did you mean simply 'fancy_abort'?
_VSTD::abort();
^~~
/usr/include/c++/v1/__config:782:15: note: expanded from macro '_VSTD'
#define _VSTD std::_LIBCPP_ABI_NAMESPACE
  ^
/scratch/tmp/gerald/GCC-HEAD/gcc/system.h:762:13: note: 'fancy_abort' declared 
here
extern void fancy_abort (const char *, int, const char *)
^

I believe this is due to the following:

diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index 55a0b272a96..49bab04524b 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -82,6 +82,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimplify.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include 
 #include "tree-into-ssa.h"
 
 /* Summaries.  */


This is nearly the only case in all of GCC where  is included
directly; are you sure that's appropriate?

Digging a bit...

  2017-01-22  Dimitry Andric 

* gcov.c (INCLUDE_ALGORITHM): Define.
(INCLUDE_VECTOR): Define.
No longer include  and  directly.

...makes me think this is a repeat of a similar situation around gcov
three years ago.

The patch below similarly uses INCLUDE_VECTOR and "system.h" and restores 
bootstrap.  Tested on i386-unknown-freebsd11.4 which uses clang 10.0.0 as 
system compiler.

Okay?

Gerald


2020-08-05  Gerald Pfeifer   
 
* ipa-fnsummary.c (INCLUDE_VECTOR): Define. 
Remove direct inclusion of . 

diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index 49bab04524b..59e52927151 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.  If not see
inlined performs analysis via its analyze_function method. */
 
 #include "config.h"
+#define INCLUDE_VECTOR
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
@@ -82,7 +83,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimplify.h"
 #include "stringpool.h"
 #include "attribs.h"
-#include 
 #include "tree-into-ssa.h"
 
 /* Summaries.  */


Re: [committed] libstdc++: Replace operator>>(istream&, char*) [LWG 2499]

2020-08-05 Thread Martin Sebor via Gcc-patches

On 8/5/20 3:25 PM, Jonathan Wakely wrote:

P0487R1 resolved LWG 2499 for C++20 by removing the operator>> overloads
that have high risk of buffer overflows. They were replaced by
equivalents that only accept a reference to an array, and so can
guarantee not to write past the end of the array.

In order to support both the old and new functionality, this patch
introduces a new overloaded __istream_extract function which takes a
maximum length. The new operator>> overloads use the array size as the
maximum length. The old overloads now use __builtin_object_size to
determine the available buffer size if available (which requires -O2) or
use numeric_limits::max()/sizeof(char_type) otherwise. This
is a change in behaviour, as the old overloads previously always used
numeric_limits::max(), without considering sizeof(char_type)
and without attempting to prevent overflows.

Because they now do little more than call __istream_extract, the old
operator>> overloads are very small inline functions. This means there
is no advantage to explicitly instantiating them in the library (in fact
that would prevent the __builtin_object_size checks from ever working).
As a result, the explicit instantiation declarations can be removed from
the header. The explicit instantiation definitions are still needed, for
backwards compatibility with existing code that expects to link to the
definitions in the library.

While working on this change I noticed that src/c++11/istream-inst.cc
has the following explicit instantiation definition:
   template istream& operator>>(istream&, char*);
This had no effect (and so should not have been present in that file),
because there was an explicit specialization declared in  and
defined in src/++98/istream.cc. However, this change removes the
explicit specialization, and now the explicit instantiation definition
is necessary to ensure the symbol gets defined in the library.

libstdc++-v3/ChangeLog:

* config/abi/pre/gnu.ver (GLIBCXX_3.4.29): Export new symbols.
* include/bits/istream.tcc (__istream_extract): New function
template implementing both of operator>>(istream&, char*) and
operator>>(istream&, char(&)[N]). Add explicit instantiation
declaration for it. Remove explicit instantiation declarations
for old function templates.
* include/std/istream (__istream_extract): Declare.
(operator>>(basic_istream&, C*)): Define inline and simply
call __istream_extract.
(operator>>(basic_istream&, signed char*)): Likewise.
(operator>>(basic_istream&, unsigned char*)): Likewise.
(operator>>(basic_istream&, C(7)[N])): Define for LWG 2499.
(operator>>(basic_istream&, signed char(&)[N])):
Likewise.
(operator>>(basic_istream&, unsigned char(&)[N])):
Likewise.
* include/std/streambuf (basic_streambuf): Declare char overload
of __istream_extract as a friend.
* src/c++11/istream-inst.cc: Add explicit instantiation
definition for wchar_t overload of __istream_extract. Remove
explicit instantiation definitions of old operator>> overloads
for versioned-namespace build.
* src/c++98/istream.cc (operator>>(istream&, char*)): Replace
with __istream_extract(istream&, char*, streamsize).
* testsuite/27_io/basic_istream/extractors_character/char/3.cc:
Do not use variable-length array.
* testsuite/27_io/basic_istream/extractors_character/char/4.cc:
Do not run test for C++20.
* testsuite/27_io/basic_istream/extractors_character/char/9555-ic.cc:
Do not test writing to pointers for C++20.
* testsuite/27_io/basic_istream/extractors_character/char/9826.cc:
Use array instead of pointer.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/3.cc:
Do not use variable-length array.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/4.cc:
Do not run test for C++20.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/9555-ic.cc:
Do not test writing to pointers for C++20.
* testsuite/27_io/basic_istream/extractors_character/char/lwg2499.cc:
New test.
* 
testsuite/27_io/basic_istream/extractors_character/char/lwg2499_neg.cc:
New test.
* testsuite/27_io/basic_istream/extractors_character/char/overflow.cc:
New test.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/lwg2499.cc:
New test.
* 
testsuite/27_io/basic_istream/extractors_character/wchar_t/lwg2499_neg.cc:
New test.

Tested powerpc64le-linux. Committed to trunk.

Martin, Jakub, could you please double-check the usage of
__builtin_object_size? (line 805 in libstdc++-v3/include/std/istream)
Do you see any problems with using it here? If it can't tell the size
then we just assume it's larger than the string to be extracted, which
is what the old code did anyway. I do 

[committed] libstdc++: Break long lines to fit in 80 columns

2020-08-05 Thread Jonathan Wakely via Gcc-patches
libstdc++-v3/ChangeLog:

* include/std/atomic (atomic::store): Reformat.

Tested powerpc64le-linux. Committed to trunk.

commit b2d4ba65dca05c0f239dcaf5080f88137ce7b54c
Author: Jonathan Wakely 
Date:   Wed Aug 5 22:48:17 2020

libstdc++: Break long lines to fit in 80 columns

libstdc++-v3/ChangeLog:

* include/std/atomic (atomic::store): Reformat.

diff --git a/libstdc++-v3/include/std/atomic b/libstdc++-v3/include/std/atomic
index 2161cbed0d2..1a304261fe7 100644
--- a/libstdc++-v3/include/std/atomic
+++ b/libstdc++-v3/include/std/atomic
@@ -251,11 +251,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   void
   store(_Tp __i, memory_order __m = memory_order_seq_cst) noexcept
-  { __atomic_store(std::__addressof(_M_i), std::__addressof(__i), 
int(__m)); }
+  {
+   __atomic_store(std::__addressof(_M_i), std::__addressof(__i), int(__m));
+  }
 
   void
   store(_Tp __i, memory_order __m = memory_order_seq_cst) volatile noexcept
-  { __atomic_store(std::__addressof(_M_i), std::__addressof(__i), 
int(__m)); }
+  {
+   __atomic_store(std::__addressof(_M_i), std::__addressof(__i), int(__m));
+  }
 
   _Tp
   load(memory_order __m = memory_order_seq_cst) const noexcept


Re: [committed] libstdc++: ParallelSTL is now part of oneAPI DPC++ Library

2020-08-05 Thread Jonathan Wakely via Gcc-patches

On 31/07/20 22:45 +0100, Jonathan Wakely wrote:

On 31/07/20 23:23 +0200, Gerald Pfeifer wrote:

Pushed.

(Something was off with the ChangeLog detection I'm afraid.  I first got
an error message and what ended up in the commit didn't look completely
consistent.)


I noticed the other day that this link was redirecting, but I don't
think we want this change.

We use the PSTL as it was when we imported (i.e. forked) it. We don't
use whatever it's now part of.

A better link would be:
https://github.com/llvm/llvm-project/tree/master/pstl
which is the upstream for our code (even though we're shipping it and
LLVM isn't).



I've pushed this.

commit 54485adc777e52da5161bd5069e1dac696dc7325
Author: Jonathan Wakely 
Date:   Wed Aug 5 22:45:04 2020

libstdc++: Change URL for PSTL again

libstdc++-v3/ChangeLog:

* doc/xml/manual/status_cxx2017.xml: Replace oneAPI DPC++ link
with LLVM repo for PSTL.
* doc/html/manual/status.html: Regenerate.

diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
index 0929ee948e0..e6834b3607a 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
@@ -1882,7 +1882,7 @@ since C++14 and the implementation is complete.
   28.4
   Parallel algorithms
   
-  Using http://www.w3.org/1999/xlink; xlink:href="https://github.com/oneapi-src/oneDPL;>oneAPI DPC++ Library
+  Using http://www.w3.org/1999/xlink; xlink:href="https://github.com/llvm/llvm-project/tree/master/pstl;>PSTL
 
 
   28.5


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-05 Thread Qing Zhao via Gcc-patches
Hi, Richard,

Thanks a lot for your careful review and detailed comments.  


> On Aug 4, 2020, at 2:35 AM, Richard Biener  wrote:
> 
> I have a few comments below - I'm not sure I'm qualified to fully
> review the rest though.

Could you let me know who will be the more qualified person to fully review the 
rest of middle-end change?

> [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> command-line option and
> zero_call_used_regs("skip|used-gpr|all-gpr||used|all") function attribue:
> 
> 1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
> 
> Don't zero call-used registers upon function return.
> 
> Does a return via EH unwinding also constitute a function return?  I
> think you may want to have a finally handler or support in the unwinder
> for this?  Then there's abnormal return via longjmp & friends, I guess
> there's nothing that can be done there besides patching glibc?
> 
> In general I am missing reasoning as why to use -fzero-call-used-regs=
> in the documentation, that is, what is the thread model and what are
> the guarantees?  Is there any point zeroing registers when spill slots
> are left populated with stale register contents?  How do I (and why
> would I want to?) ensure that there's no information leak from the
> implementation of 'foo' to their callers?  Do I need to compile all
> of 'foo' and functions called from 'foo' with -fzero-call-used-regs=
> or is it enough to annotate API boundaries I want to proptect with
> zero_call_used_regs("...")?
> 
> Again - what's the intended use (and how does it fulful anything useful
> for that case)?

The major question of the above is:  what’s the motivation of the whole patch?
H.J.Lu and I have replied this question in separated emails, let’s continue with
this high-level discussion in that thread. 


> @@ -4506,6 +4511,69 @@ handle_no_split_stack_attribute (tree *node, tree 
> name,
> return NULL_TREE;
> }
> 
> +/* Handle a "zero_call_used_regs" attribute; arguments as in
> +   struct attribute_spec.handler.  */
> +
> +static tree
> +handle_zero_call_used_regs_attribute (tree *node, tree name, tree args,
> +   int ARG_UNUSED (flags),
> +   bool *no_add_attris)
> +{
> +  tree decl = *node;
> +  tree id = TREE_VALUE (args);
> +  enum zero_call_used_regs zero_call_used_regs_type = 
> zero_call_used_regs_unset;
> +
> +  if (TREE_CODE (decl) != FUNCTION_DECL)
> +{
> +  error_at (DECL_SOURCE_LOCATION (decl),
> + "%qE attribute applies only to functions", name);
> +  *no_add_attris = true;
> +  return NULL_TREE;
> +}
> +  else if (DECL_INITIAL (decl))
> +{
> +  error_at (DECL_SOURCE_LOCATION (decl),
> + "cannot set %qE attribute after definition", name);
> 
> Why's that?
This might not be needed, I will fix this in the next update.

> 
> diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> index 81bd2ee..ded1880 100644
> --- a/gcc/c/c-decl.c
> +++ b/gcc/c/c-decl.c
> @@ -2681,6 +2681,10 @@ merge_decls (tree newdecl, tree olddecl, tree 
> newtype, tree oldtype)
>DECL_IS_NOVOPS (newdecl) |= DECL_IS_NOVOPS (olddecl);
>  }
> 
> +  /* Merge the zero_call_used_regs_type information.  */
> +  if (TREE_CODE (newdecl) == FUNCTION_DECL)
> + DECL_ZERO_CALL_USED_REGS (newdecl) = DECL_ZERO_CALL_USED_REGS 
> (olddecl);
> +
> 
> If you need this (see below) then likely cp/* needs similar adjustment
> so do other places in the middle-end (function cloning, etc)

Will check this in cp/* and function cloning etc to see whether the copying and 
merging are needed in other
places.

Another thought, if I use “lookup_attribute” of the function decl instead of 
checking these bits as you suggested
later,  all these copying and merging might not be necessary anymore. I will 
check on that. 
> 
> 
> +
> +/* Emit a sequence of insns to zero the call-used-registers for the 
> current
> + * function.  */
> 
> No '*' on the continuation line

Okay, will fix this.

> +
> +  /* This array holds the zero rtx with the correponding machine mode.  
> */
> +  rtx zero_rtx[(int)MAX_MACHINE_MODE];
> +  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
> +zero_rtx[i] = NULL_RTX;
> +
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +{
> +  if (!this_target_hard_regs->x_call_used_regs[regno])
> 
> Use if (!call_used_regs[regno])
Okay.

> 
> + continue;
> +  if (fixed_regs[regno])
> + continue;
> +  if (is_live_reg_at_exit (regno))
> + continue;
> 
> How can a call-used reg be live at exit?

Yes, this might not be needed, I will double check on this.

> 
> +  if 

Re: [committed] libstdc++: Replace operator>>(istream&, char*) [LWG 2499]

2020-08-05 Thread Jonathan Wakely via Gcc-patches

On 05/08/20 22:25 +0100, Jonathan Wakely wrote:

P0487R1 resolved LWG 2499 for C++20 by removing the operator>> overloads
that have high risk of buffer overflows. They were replaced by
equivalents that only accept a reference to an array, and so can
guarantee not to write past the end of the array.

In order to support both the old and new functionality, this patch
introduces a new overloaded __istream_extract function which takes a
maximum length. The new operator>> overloads use the array size as the
maximum length. The old overloads now use __builtin_object_size to
determine the available buffer size if available (which requires -O2) or
use numeric_limits::max()/sizeof(char_type) otherwise. This
is a change in behaviour, as the old overloads previously always used
numeric_limits::max(), without considering sizeof(char_type)
and without attempting to prevent overflows.

Because they now do little more than call __istream_extract, the old
operator>> overloads are very small inline functions. This means there
is no advantage to explicitly instantiating them in the library (in fact
that would prevent the __builtin_object_size checks from ever working).
As a result, the explicit instantiation declarations can be removed from
the header. The explicit instantiation definitions are still needed, for
backwards compatibility with existing code that expects to link to the
definitions in the library.

While working on this change I noticed that src/c++11/istream-inst.cc
has the following explicit instantiation definition:
 template istream& operator>>(istream&, char*);
This had no effect (and so should not have been present in that file),
because there was an explicit specialization declared in  and
defined in src/++98/istream.cc. However, this change removes the
explicit specialization, and now the explicit instantiation definition
is necessary to ensure the symbol gets defined in the library.



Martin, Jakub, could you please double-check the usage of
__builtin_object_size? (line 805 in libstdc++-v3/include/std/istream)
Do you see any problems with using it here? If it can't tell the size
then we just assume it's larger than the string to be extracted, which
is what the old code did anyway. I do have an idea for stronger
checking in debug mode, which I'll post in a minute.



The attached (uncommitted and not fully tested) patch would make the
new __istream_extract functions return a bool indicating whether
extraction stopped because the maximum number of characters was
reached (as opposed to reaching EOF or whitespace).

This would allow us to abort the program with a debug mode assertion
if limiting the size to the __builtin_object_size value prevented a
buffer overflow from occurring.



diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index b6ce76c1f20..a7d8ee83ce3 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -2307,7 +2307,7 @@ GLIBCXX_3.4.29 {
 # std::__istream_extract(istream&, char*, streamsize)
 _ZSt17__istream_extractRSiPc[ilx];
 # std::__istream_extract(wistream&, wchar_t*, streamsize)
-_ZSt17__istream_extractIwSt11char_traitsIwEEvRSt13basic_istreamIT_T0_EPS3_[ilx];
+_ZSt17__istream_extractIwSt11char_traitsIwEEbRSt13basic_istreamIT_T0_EPS3_[ilx];
 
 } GLIBCXX_3.4.28;
 
diff --git a/libstdc++-v3/include/bits/istream.tcc b/libstdc++-v3/include/bits/istream.tcc
index b8f530f6ef5..cb780999ff7 100644
--- a/libstdc++-v3/include/bits/istream.tcc
+++ b/libstdc++-v3/include/bits/istream.tcc
@@ -986,7 +986,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
   template
-void
+bool
 __istream_extract(basic_istream<_CharT, _Traits>& __in, _CharT* __s,
 		  streamsize __num)
 {
@@ -1043,6 +1043,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	__err |= ios_base::failbit;
   if (__err)
 	__in.setstate(__err);
+  return __extracted == __num - 1;
 }
 
   // 27.6.1.4 Standard basic_istream manipulators
@@ -1098,7 +1099,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   extern template class basic_istream;
   extern template wistream& ws(wistream&);
   extern template wistream& operator>>(wistream&, wchar_t&);
-  extern template void __istream_extract(wistream&, wchar_t*, streamsize);
+  extern template bool __istream_extract(wistream&, wchar_t*, streamsize);
 
   extern template wistream& wistream::_M_extract(unsigned short&);
   extern template wistream& wistream::_M_extract(unsigned int&);  
diff --git a/libstdc++-v3/include/std/istream b/libstdc++-v3/include/std/istream
index cb8e9f87c90..16562e74706 100644
--- a/libstdc++-v3/include/std/istream
+++ b/libstdc++-v3/include/std/istream
@@ -764,10 +764,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 
   template
-void
+bool
 __istream_extract(basic_istream<_CharT, _Traits>&, _CharT*, streamsize);
 
-  void __istream_extract(istream&, char*, streamsize);
+  bool __istream_extract(istream&, char*, streamsize);
 
   //@{
   /**
@@ -805,6 +805,21 @@ 

[committed] libstdc++: Replace operator>>(istream&, char*) [LWG 2499]

2020-08-05 Thread Jonathan Wakely via Gcc-patches
P0487R1 resolved LWG 2499 for C++20 by removing the operator>> overloads
that have high risk of buffer overflows. They were replaced by
equivalents that only accept a reference to an array, and so can
guarantee not to write past the end of the array.

In order to support both the old and new functionality, this patch
introduces a new overloaded __istream_extract function which takes a
maximum length. The new operator>> overloads use the array size as the
maximum length. The old overloads now use __builtin_object_size to
determine the available buffer size if available (which requires -O2) or
use numeric_limits::max()/sizeof(char_type) otherwise. This
is a change in behaviour, as the old overloads previously always used
numeric_limits::max(), without considering sizeof(char_type)
and without attempting to prevent overflows.

Because they now do little more than call __istream_extract, the old
operator>> overloads are very small inline functions. This means there
is no advantage to explicitly instantiating them in the library (in fact
that would prevent the __builtin_object_size checks from ever working).
As a result, the explicit instantiation declarations can be removed from
the header. The explicit instantiation definitions are still needed, for
backwards compatibility with existing code that expects to link to the
definitions in the library.

While working on this change I noticed that src/c++11/istream-inst.cc
has the following explicit instantiation definition:
  template istream& operator>>(istream&, char*);
This had no effect (and so should not have been present in that file),
because there was an explicit specialization declared in  and
defined in src/++98/istream.cc. However, this change removes the
explicit specialization, and now the explicit instantiation definition
is necessary to ensure the symbol gets defined in the library.

libstdc++-v3/ChangeLog:

* config/abi/pre/gnu.ver (GLIBCXX_3.4.29): Export new symbols.
* include/bits/istream.tcc (__istream_extract): New function
template implementing both of operator>>(istream&, char*) and
operator>>(istream&, char(&)[N]). Add explicit instantiation
declaration for it. Remove explicit instantiation declarations
for old function templates.
* include/std/istream (__istream_extract): Declare.
(operator>>(basic_istream&, C*)): Define inline and simply
call __istream_extract.
(operator>>(basic_istream&, signed char*)): Likewise.
(operator>>(basic_istream&, unsigned char*)): Likewise.
(operator>>(basic_istream&, C(7)[N])): Define for LWG 2499.
(operator>>(basic_istream&, signed char(&)[N])):
Likewise.
(operator>>(basic_istream&, unsigned char(&)[N])):
Likewise.
* include/std/streambuf (basic_streambuf): Declare char overload
of __istream_extract as a friend.
* src/c++11/istream-inst.cc: Add explicit instantiation
definition for wchar_t overload of __istream_extract. Remove
explicit instantiation definitions of old operator>> overloads
for versioned-namespace build.
* src/c++98/istream.cc (operator>>(istream&, char*)): Replace
with __istream_extract(istream&, char*, streamsize).
* testsuite/27_io/basic_istream/extractors_character/char/3.cc:
Do not use variable-length array.
* testsuite/27_io/basic_istream/extractors_character/char/4.cc:
Do not run test for C++20.
* testsuite/27_io/basic_istream/extractors_character/char/9555-ic.cc:
Do not test writing to pointers for C++20.
* testsuite/27_io/basic_istream/extractors_character/char/9826.cc:
Use array instead of pointer.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/3.cc:
Do not use variable-length array.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/4.cc:
Do not run test for C++20.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/9555-ic.cc:
Do not test writing to pointers for C++20.
* testsuite/27_io/basic_istream/extractors_character/char/lwg2499.cc:
New test.
* 
testsuite/27_io/basic_istream/extractors_character/char/lwg2499_neg.cc:
New test.
* testsuite/27_io/basic_istream/extractors_character/char/overflow.cc:
New test.
* testsuite/27_io/basic_istream/extractors_character/wchar_t/lwg2499.cc:
New test.
* 
testsuite/27_io/basic_istream/extractors_character/wchar_t/lwg2499_neg.cc:
New test.

Tested powerpc64le-linux. Committed to trunk.

Martin, Jakub, could you please double-check the usage of
__builtin_object_size? (line 805 in libstdc++-v3/include/std/istream)
Do you see any problems with using it here? If it can't tell the size
then we just assume it's larger than the string to be extracted, which
is what the old code did anyway. I do have an idea for stronger
checking in debug 

Re: [PATCH] rs6000: Don't ICE when spilling an MMA accumulator

2020-08-05 Thread Peter Bergner via Gcc-patches
On 8/5/20 2:02 PM, Peter Bergner wrote:
> This patch fixes the ICE and is in the middle of regression testing.
> Ok for trunk if the testing comes back clean?

Testing was clean with no regressions.

Peter


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-05 Thread Qing Zhao via Gcc-patches
>> 
>> From The SECURE project and GCC in GCC Cauldron 2018:
>> 
>> Speaker: Graham Markall
>> 
>> The SECURE project is a 15 month program funded by Innovate UK, to
>> take well known security techniques from academia and make them
>> generally available in standard compilers, specfically GCC and LLVM.
>> An explicit objective is for those techniques to be incorporated in
>> the upstream versions of compilers. The Cauldron takes place in the
>> final month of the project and this talk will present the technical
>> details of some of the techniques implemented, and review those that
>> are yet to be implemented. A particular focus of this talk will be on
>> verifying that the implemetnation is correct, which can be a bigger
>> challenge than the implementation.
>> 
>> Techniques to be covered in the project include the following:
>> 
>> Stack and register erasure. Ensuring that on return from a function,
>> no data is left lying on the stack or in registers. Particular
>> challenges are in dealing with inlining, shrink wrapping and caching.
>> 
>> This patch implemens register erasure.
> 
> Part of it, yes. While I can see abnormal transfer of control is difficult 
> exception handling is used too wide spread to be ignored. What's the plan 
> there? 
> 
> So can we also see the other parts? In particular I wonder whether exposing 
> just register clearing (in this fine-grained manner) is required and useful 
> rather than thinking of a better interface for the whole thing?

You mean to provide an integrated interface for both stack and register erasure 
for security purpose? 

However, Is stack erasure at function return really a better idea than 
zero-init auto-variables in the beginning of the function?

We had some discussion with Kees Cook several weeks ago on the idea of stack 
erasure at function return, Kees provided the following comments:

"But back to why I don't think it's the right approach:

Based on the performance measurements of pattern-init and zero-init
in Clang, MSVC, and the kernel plugin, it's clear that adding these
initializations has measurable performance cost. Doing it at function
exit means performing large unconditional wipes. Doing it at function
entry means initializations can be dead-store eliminated and highly
optimized. Given the current debates on the measurable performance
difference between pattern and zero initialization (even in the face of
existing dead-store elimination), I would expect wipe-on-function-exit to
be outside the acceptable tolerance for performance impact. (Additionally,
we've seen negative cache effects on wiping memory when the CPU is done
using it, though this is more pronounced in heap wiping. Zeroing at
free is about twice as expensive as zeroing at free time due to cache
temporality. This is true for the stack as well, but it's not as high.)”

From my understanding, the major issue with stack erasure at function result is 
the big performance overhead,
And these performance overhead cannot be reduced with compiler optimizations 
since those 
additional wiping insns are inserted at the end of the routine.

Based on the previous discussion with Kees, I don’t think that stack erasure at 
function return is a good idea,  
Instead, we might provide an alternative approach:  zero/pattern init to 
auto-variables. (This functionality has
Been available in LLVM already)
This will be another patch we want to add to GCC for the security purpose in 
general. 

So, I think for the current patch, -fzero-call-used-regs should be good enough. 

Any comments?

Qing





> 
> Richard. 



Re: RFC: Monitoring old PRs, new dg directives

2020-08-05 Thread Mike Stump via Gcc-patches
On Aug 4, 2020, at 5:54 PM, Marek Polacek  wrote:
>> As you find it difficult to express a test using the existing mechanisms, 
>> let's talk about those and see if anyone has a good idea on how to express 
>> it.  I think ICEs are the most annoying to manage, but, I think excess and 
>> prune should be able to handle them.  I think should get an error or 
>> warning, or should not get an error or warning are more trivial to manage.
> 
> I experimented with
> // { dg-prune-output ".*internal compiler error.*" }
> // { dg-xfail-if "" { *-*-* } }
> but it's a mouthful and the results were poor (when the ICE is fixed but we
> generate errors instead).  dg-ice is convenient, handles even the different
> kind of ICE (when the diagnostic routines were re-entered), and generates
> nice XPASSes when the ICE goes away.
> 
> I've also played games with dg-regexp but it was too ugly.
> 
> (I honestly don't see why new directives are such a big deal, if they're
> properly documented.)

I don't see a bogus here?  I think that can't be skipped.

Re: [PATCH][RFC] diagnostics: Add support for Unicode drawing characters

2020-08-05 Thread Lewis Hyatt via Gcc-patches
On Fri, Jul 24, 2020 at 02:49:36PM +0100, Richard Sandiford wrote:
> Lewis Hyatt via Gcc-patches  writes:
> > On Thu, Jul 23, 2020 at 05:47:28PM -0400, David Malcolm wrote:
> >> On Thu, 2020-07-23 at 12:28 -0400, Lewis Hyatt via Gcc-patches wrote:
> >> > Hello-
> >> > 
> >> > The attached patch is complete including docs, but I tagged as RFC
> >> > because I am not sure if anyone will like it, or if the general
> >> > reaction may
> >> > be closer to recoiling in horror :). Would appreciate your thoughts,
> >> > please...
> >> 
> >> Thanks for working on this.  I'm interested in other people's thoughts
> >> on this.  Various comments inline throughout below.
> 
> +1 in favour FWIW.
>

Thanks for the feedback!

> > […]
> > By the way, I was wondering separately what you think about adding an option
> > like -fplain-diagnostics or something, which would achieve basically the 
> > same
> > thing you get in the testsuite right now (-fno-diagnostics-show-caret
> > -fno-diagnostics-show-line-numbers -fdiagnostics-color=never
> > -fdiagnostics-urls=never) but would change as necessary whenever diagnostics
> > evolve. It seems rather involved currently to add a new option like
> > -fdiagnostics-unicode-drawing but keep the testsuite working, in addition to
> > adding to prune.exp and to the libstdc++.exp, you also need to update the
> > compat.exp so that it can figure out to pass the option only to sufficiently
> > new compilers. With -fplain-diagnostics, this could just be part of the code
> > change and the testsuite could stay the same; this may also make it easier 
> > on
> > IDE type utilities since they could rely on a more stable format for the
> > diagnostics, assuming they don't already use JSON format.
> 
> Also agree that this would be a nice feature to have.  I guess it would
> act as an alias for all the -fno-* options at the point that it occurs
> on the command line, so that it would be possible to use:
> 
>   -fplain-diagnostics -fthe-diagnostic-feature-i-like
>

Yes, that's what I was thinking. Currently the option -fdiagnostics-color
requires special handling because it applies even before it appears in the
command line (so that, say, a wrong option which appears earlier can still get a
colorized diagnostic). I was thinking the way to go would be to expand
-fplain-diagnostics into its constituents around the same place that this
special handling is done. I'll go ahead and submit a patch separately for this
sometime soon, in case it is found useful.

> > […]
> >> * maybe have a different character for separating the line numbers as
> >> opposed to those for labels and for showing interprocedural paths.
> >>
> >
> > Something like that would be easy to add, sure, perhaps a double vertical 
> > line
> > instead:
> >
> > diagnostic-ranges.c:196:28: warning: field width specifier ‘*’ expects 
> > argument of type ‘int’, but argument 3 has type ‘long int’ [-Wformat=]
> >   196 ║   __builtin_sprintf (d, " %*ld ", foo + bar, foo);
> >   ║   ═∧══╤
> >   ║│  │
> >   ║intlong int
> 
> Guess it's just personal taste, but that seems a bit too busy to me.
> Most diagnostics don't have interprocedural paths, and on its own,
> there doesn't seem to be a specific reason to have a double line
> on the left.

I tend to agree with this assessment as well.

-Lewis


Re: RFC: Monitoring old PRs, new dg directives

2020-08-05 Thread Marek Polacek via Gcc-patches
On Tue, Aug 04, 2020 at 08:54:32PM -0400, Marek Polacek via Gcc-patches wrote:
> On Tue, Aug 04, 2020 at 03:33:23PM -0700, Mike Stump wrote:
> > A word of caution, if we produce core files, before you add tons of core 
> > file producing test cases, you'll want to submit a, ulimit -c 0 patch that 
> > can avoid the issue.  corefile writing is slow and consumes disk.  I can't 
> > recall at the moment if the current infrastructure will reliably avoid core 
> > files.
> 
> I thought we'd already set ulimit -c 0, but I don't see that now.  I 
> definitely
> agree that we don't want core dumps.  It probably needs some hack that sets
> ulimit -c to 0 when we're running a test in */unfixed/.  :/  Which also argues
> for a separate directory for unfixed tests.

I realized that the crashing compiler will probably only generate a code dump
when using -dH, so we should be fine.

Marek



Re: [PATCH] c++: dependent constraint on placeholder return type [PR96443]

2020-08-05 Thread Jason Merrill via Gcc-patches

On 8/4/20 8:00 PM, Patrick Palka wrote:

On Tue, 4 Aug 2020, Patrick Palka wrote:


In the testcase below, we never substitute function-template arguments
into f15's placeholder-return-type constraint, which leads to us
incorrectly rejecting this instantiation in do_auto_deduction due to
satisfaction failure (of the constraint SameAs).

The fact that we incorrectly reject this testcase is masked by the
other instantiation f15, which we correctly reject and diagnose
(by accident).

A good place to do this missing substitution seems to be during
TEMPLATE_TYPE_PARM level lowering.  So this patch adds a call to
tsubst_constraint there, and also adds dg-bogus directives to this
testcase wherever we expect instantiation to succeed. (So without the
substitution fix, this last dg-bogus would FAIL).



Successfully tested on x86_64-pc-linux-gnu, and also on the cmcstl2 and
range-v3 projects.  Does this look OK to commit?

gcc/cp/ChangeLog:

PR c++/96443
* pt.c (tsubst) : Substitute into
the constraints on a placeholder type when its level.

gcc/testsuite/ChangeLog:

PR c++/96443
* g++.dg/cpp2a/concepts-ts1.C: Add dg-bogus wherever we expect
instantiation to succeed.


Looking back at this patch with fresh eyes, I realized that the commit
message is not the best.  I rewrote the commit message to hopefully be
more coherent below:

-- >8 --

Subject: [PATCH] c++: dependent constraint on placeholder return type
  [PR96443]

In the testcase concepts-ts1.C, we're incorrectly rejecting the call to
'f15(0)' due to satisfaction failure of the function's
placeholder-return-type constraint.

The testcase doesn't spot this rejection because the error we emit for
the constraint failure points to f15's return statement instead of the
call site, and we already have a dg-error at the return statement to
verify the (correct) rejection of the call f15('a').  So in order to
verify that we indeed accept the call 'f15(0)', we need to add a
dg-bogus directive at the call site to look for the "required from here"
diagnostic line that generally accompanies an instantiation failure.

As for why satisfaction failure occurs, it turns out that we never
substitute the template arguments of a function template specialization
in to its placeholder-return-type constraint.  So in this case during
do_auto_deduction, we end up checking satisfaction of the still-dependent
constraint SameAs from do_auto_deduction, which fails
because it's dependent.

A good place to do this missing substitution seems to be during
TEMPLATE_TYPE_PARM level lowering; so this patch adds a call to
tsubst_constraint there.


Doing substitution seems like the wrong approach here; requirements 
should never be substituted except as part of satisfaction calculation 
(or, rarely, for declaration matching).  If we aren't communicating all 
the necessary template arguments to the later satisfaction, that's what 
we need to fix.



Successfully tested on x86_64-pc-linux-gnu, and also on the cmcstl2 and
range-v3 projects.  Does this look OK to commit?

gcc/cp/ChangeLog:

PR c++/96443
* pt.c (tsubst) : Substitute into
the constraints on a placeholder type when reducing its level.

gcc/testsuite/ChangeLog:

PR c++/96443
* g++.dg/cpp2a/concepts-ts1.C: Add dg-bogus to the call to f15
that we expect to accept.
---
  gcc/cp/pt.c   | 7 ---
  gcc/testsuite/g++.dg/cpp2a/concepts-ts1.C | 2 +-
  2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index e7496002c1c..9f3426f8249 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -15524,10 +15524,11 @@ tsubst (tree t, tree args, tsubst_flags_t complain, 
tree in_decl)
  
  if (TREE_CODE (t) == TEMPLATE_TYPE_PARM)

  {
-   /* Propagate constraints on placeholders since they are
-  only instantiated during satisfaction.  */
+   /* Substitute constraints on placeholders when reducing
+  their level.  */
if (tree constr = PLACEHOLDER_TYPE_CONSTRAINTS (t))
- PLACEHOLDER_TYPE_CONSTRAINTS (r) = constr;
+ PLACEHOLDER_TYPE_CONSTRAINTS (r)
+   = tsubst_constraint (constr, args, complain, in_decl);
else if (tree pl = CLASS_PLACEHOLDER_TEMPLATE (t))
  {
pl = tsubst_copy (pl, args, complain, in_decl);
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-ts1.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-ts1.C
index 1cefe3b243f..a116cac4ea4 100644
--- a/gcc/testsuite/g++.dg/cpp2a/concepts-ts1.C
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-ts1.C
@@ -40,7 +40,7 @@ void driver()
f3('a'); // { dg-error "" }
f4(0, 0);
f4(0, 'a'); // { dg-error "" }
-  f15(0);
+  f15(0); // { dg-bogus "" }
f15('a'); // { dg-message "" }
  }
  





Re: RFC: Monitoring old PRs, new dg directives

2020-08-05 Thread Mike Stump via Gcc-patches
On Aug 5, 2020, at 5:56 AM, Marek Polacek  wrote:
> 
> Sorry for being unclear.  Say you have
> 
> // PR c++/55408
> 
> struct foo {
>template
>void bar();
> };
> 
> template
> void foo::bar() {}
> 
> int main()
> {
>extern int i;
>foo {}.bar<>();
> }
> 
> which we wrongly accept.  It might be unclear

Sure, one might think it goes on line 12...  could be wrong.  But, if one wants 
to do better, one can run clang on it, an it says that the error goes on line 7.

> what line to put that dg-error
> on.  So you put it at the end of the file.  Then when we start issuing an 
> error
> for the testcase, you will just get a FAIL, not an XPASS.  That might be
> confusing if that file isn't in unfixed/.

The people that fix bugs should be able to sort it out.


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-05 Thread H.J. Lu via Gcc-patches
On Wed, Aug 5, 2020 at 11:53 AM Richard Biener  wrote:
>
> On August 5, 2020 4:45:00 PM GMT+02:00, "H.J. Lu"  wrote:
> >On Wed, Aug 5, 2020 at 5:34 AM H.J. Lu  wrote:
> >>
> >> On Wed, Aug 5, 2020 at 5:31 AM Richard Biener 
> >wrote:
> >> >
> >> > On Wed, 5 Aug 2020, H.J. Lu wrote:
> >> >
> >> > > On Wed, Aug 5, 2020 at 12:06 AM Richard Biener
> > wrote:
> >> > > >
> >> > > > On Tue, 4 Aug 2020, H.J. Lu wrote:
> >> > > >
> >> > > > > On Tue, Aug 4, 2020 at 12:35 AM Richard Biener
> > wrote:
> >> > > > > >
> >> > > > > > On Mon, 3 Aug 2020, Qing Zhao wrote:
> >> > > > > >
> >> > > > > > > Hi, Uros,
> >> > > > > > >
> >> > > > > > > Thanks a lot for your review on X86 parts.
> >> > > > > > >
> >> > > > > > > Hi, Richard,
> >> > > > > > >
> >> > > > > > > Could you please take a look at the middle-end part to
> >see whether the
> >> > > > > > > rewritten addressed your previous concern?
> >> > > > > >
> >> > > > > > I have a few comments below - I'm not sure I'm qualified to
> >fully
> >> > > > > > review the rest though.
> >> > > > > >
> >> > > > > > > Thanks a lot.
> >> > > > > > >
> >> > > > > > > Qing
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > > On Jul 31, 2020, at 12:57 PM, Uros Bizjak
> > wrote:
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > 22:05, tor., 28. jul. 2020 je oseba Qing Zhao
> >mailto:qing.z...@oracle.com>> napisala:
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > Richard and Uros,
> >> > > > > > > > >
> >> > > > > > > > > Could you please review the change that H.J and I
> >rewrote based on your comments in the previous round of discussion?
> >> > > > > > > > >
> >> > > > > > > > > This patch is a nice security enhancement for GCC
> >that has been requested by security people for quite some time.
> >> > > > > > > > >
> >> > > > > > > > > Thanks a lot for your time.
> >> > > > > > > >
> >> > > > > > > > I'll be away from the keyboard for the next week, but
> >the patch needs a middle end approval first.
> >> > > > > > > >
> >> > > > > > > > That said, x86 parts looks OK.
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > > > > Uros.
> >> > > > > > > > > Qing
> >> > > > > > > > >
> >> > > > > > > > > > On Jul 14, 2020, at 9:45 AM, Qing Zhao via
> >Gcc-patches mailto:gcc-patches@gcc.gnu.org>>
> >wrote:
> >> > > > > > > > > >
> >> > > > > > > > > > Hi, Gcc team,
> >> > > > > > > > > >
> >> > > > > > > > > > This patch is a follow-up on the previous patch and
> >corresponding discussion:
> >> > > > > > > > > >
> >https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html
> >
> > >>
> >> > > > > > > > > >
> >> > > > > > > > > > From the previous round of discussion, the major
> >issues raised were:
> >> > > > > > > > > >
> >> > > > > > > > > > A. should be rewritten by using regsets
> >infrastructure.
> >> > > > > > > > > > B. Put the patch into middle-end instead of x86
> >backend.
> >> > > > > > > > > >
> >> > > > > > > > > > This new patch is rewritten based on the above 2
> >comments.  The major changes compared to the previous patch are:
> >> > > > > > > > > >
> >> > > > > > > > > > 1. Change the names of the option and attribute
> >from
> >> > > > > > > > > >
> >-mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all]  and
> >zero_caller_saved_regs("skip|used-gpr|all-gpr||used|all”)
> >> > > > > > > > > > to:
> >> > > > > > > > > >
> >-fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]   and
> >zero_call_used_regs("skip|used-gpr|all-gpr||used|all”)
> >> > > > > > > > > > Add the new option and  new attribute in general.
> >> > > > > > > > > > 2. The main code generation part is moved from i386
> >backend to middle-end;
> >> > > > > > > > > > 3. Add 4 target-hooks;
> >> > > > > > > > > > 4. Implement these 4 target-hooks on i386 backend.
> >> > > > > > > > > > 5. On a target that does not implement the target
> >hook, issue error for the new option, issue warning for the new
> >attribute.
> >> > > > > > > > > >
> >> > > > > > > > > > The patch is as following:
> >> > > > > > > > > >
> >> > > > > > > > > > [PATCH] Add
> >-fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> >> > > > > > > > > > command-line option and
> >> > > > > > > > > >
> >zero_call_used_regs("skip|used-gpr|all-gpr||used|all") function
> >attribue:
> >> > > > > > > > > >
> >> > > > > > > > > >  1. -fzero-call-used-regs=skip and
> >zero_call_used_regs("skip")
> >> > > > > > > > > >
> >> > > > > > > > > >  Don't zero call-used registers upon function
> >return.
> >> > > > > >
> >> > > > > > Does a return via EH unwinding also constitute a function
> >return?  I
> >> > > > > > think you may want to have a finally handler or support in
> >the unwinder
> >> > > > > > for this?  Then there's abnormal return via longjmp &
> >friends, I guess
> >> > > > > > there's nothing that 

[PATCH] rs6000: Don't ICE when spilling an MMA accumulator

2020-08-05 Thread Peter Bergner via Gcc-patches
The following patch fixes one of the bugs discovered in PR96466, namely
when we spill an accumulator that has a known zero value.  In that case,
LRA would emit a new (set (reg:PXI ...) 0) insn, but it would not use the
mma_xxsetaccz pattern to do it.  The solution is to move the xxsetaccz
instruction into the movpxi pattern and have the xxsetaccz pattern call
the move pattern.

This patch fixes the ICE and is in the middle of regression testing.
Ok for trunk if the testing comes back clean?

This is also broken in GCC 10, so ok there after sitting on trunk for
a day or two with no fallout?

Peter

gcc/
PR target/96446
* gcc/config/rs6000/mma.md (*movpxi): Add xxsetaccz generation.
Disable split for zero constant source operand.
(mma_xxsetaccz): Change to define_expand.  Call gen_movpxi.

gcc/testsuite/
PR target/96446
* gcc.target/powerpc/pr96446.c: New test.


diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 15cacfb7fc1..fcca02bfa9f 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -328,11 +328,17 @@
   [(set (match_operand:PXI 0 "nonimmediate_operand" "=d,m,d,d")
(match_operand:PXI 1 "input_operand" "m,d,d,O"))]
   "TARGET_MMA
-   && ((gpc_reg_operand (operands[0], PXImode)
-   && !(CONST_INT_P (operands[1]) && INTVAL (operands[1]) == 0))
+   && (gpc_reg_operand (operands[0], PXImode)
|| gpc_reg_operand (operands[1], PXImode))"
-  "#"
-  "&& reload_completed"
+  "@
+   #
+   #
+   #
+   xxsetaccz %A0"
+  "&& reload_completed
+   && !(fpr_reg_operand (operands[0], PXImode)
+   && CONST_INT_P (operands[1])
+   && INTVAL (operands[1]) == 0)"
   [(const_int 0)]
 {
   rs6000_split_multireg_move (operands[0], operands[1]);
@@ -409,12 +415,14 @@
   " %A0"
   [(set_attr "type" "mma")])
 
-(define_insn "mma_xxsetaccz"
-  [(set (match_operand:PXI 0 "fpr_reg_operand" "=d")
+(define_expand "mma_xxsetaccz"
+  [(set (match_operand:PXI 0 "fpr_reg_operand")
(const_int 0))]
   "TARGET_MMA"
-  "xxsetaccz %A0"
-  [(set_attr "type" "mma")])
+{
+  emit_insn (gen_movpxi (operands[0], const0_rtx));
+  DONE;
+})
 
 (define_insn "mma_"
   [(set (match_operand:PXI 0 "fpr_reg_operand" "=")
diff --git a/gcc/testsuite/gcc.target/powerpc/pr96446.c 
b/gcc/testsuite/gcc.target/powerpc/pr96446.c
new file mode 100644
index 000..2083bf4a76a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr96446.c
@@ -0,0 +1,16 @@
+/* PR target/96466 */
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Verify we do not ICE on the following.  */
+
+extern void bar0 (void);
+void
+foo0 (__vector_quad *dst)
+{
+  __vector_quad acc;
+  __builtin_mma_xxsetaccz ();
+  bar0 ();
+  *dst = acc;
+}


Re: [patch, fortran] Compile-time check for change in DO variable in contained procedures

2020-08-05 Thread Thomas Koenig via Gcc-patches

Hi Paul,


I must say that I was thinking rather more of the INTENT(IN) case to make
sure that it is accepted.


Ah, I misunderstood that.  You're right - also check legal code :-)

I've committed the attached test case as obvious (after verifying that
it passes. It checks against functions and subrotuines with
INTENT(IN) and unspecified intent.

Best regards

Thomas

Added test case to make sure that legal cases still pass.

gcc/testsuite/ChangeLog:

PR fortran/96469
* gfortran.dg/do_check_14.f90: New test.
! { dg-do compile }
! PR fortran/96469 - make sure that all legal variants pass.

module x
  implicit none
contains
  subroutine sub_intent_in(i)
integer, intent(in) :: i
  end subroutine sub_intent_in
  subroutine sub_intent_unspec(i)
integer :: i
  end subroutine sub_intent_unspec
  integer function fcn_intent_in(i)
integer, intent(in) :: i
fcn_intent_in = i + 42
  end function fcn_intent_in
  integer function fcn_intent_unspec (i)
integer :: i
fcn_intent_unspec = i + 42
  end function fcn_intent_unspec
end module x

program main
  use x
  implicit none
  integer :: i1, i2, i3, i4
  integer :: k, l
  do i1=1,10
 call sub1
  end do
  do i2=1,10
 call sub2
  end do
  do i3 = 1,10
 k = fcn3()
  end do
  do i4=1,10
 l = fcn4()
  end do
contains
  subroutine sub1
call sub_intent_in (i1)
  end subroutine sub1
  subroutine sub2
integer :: m
m = fcn_intent_in (i2)
print *,m
  end subroutine sub2
  integer function fcn3()
call sub_intent_unspec (i3)
fcn3 = 42
  end function fcn3
  integer function fcn4()
fcn4 = fcn_intent_unspec (i4)
  end function fcn4
end program main


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-05 Thread Richard Biener
On August 5, 2020 4:45:00 PM GMT+02:00, "H.J. Lu"  wrote:
>On Wed, Aug 5, 2020 at 5:34 AM H.J. Lu  wrote:
>>
>> On Wed, Aug 5, 2020 at 5:31 AM Richard Biener 
>wrote:
>> >
>> > On Wed, 5 Aug 2020, H.J. Lu wrote:
>> >
>> > > On Wed, Aug 5, 2020 at 12:06 AM Richard Biener
> wrote:
>> > > >
>> > > > On Tue, 4 Aug 2020, H.J. Lu wrote:
>> > > >
>> > > > > On Tue, Aug 4, 2020 at 12:35 AM Richard Biener
> wrote:
>> > > > > >
>> > > > > > On Mon, 3 Aug 2020, Qing Zhao wrote:
>> > > > > >
>> > > > > > > Hi, Uros,
>> > > > > > >
>> > > > > > > Thanks a lot for your review on X86 parts.
>> > > > > > >
>> > > > > > > Hi, Richard,
>> > > > > > >
>> > > > > > > Could you please take a look at the middle-end part to
>see whether the
>> > > > > > > rewritten addressed your previous concern?
>> > > > > >
>> > > > > > I have a few comments below - I'm not sure I'm qualified to
>fully
>> > > > > > review the rest though.
>> > > > > >
>> > > > > > > Thanks a lot.
>> > > > > > >
>> > > > > > > Qing
>> > > > > > >
>> > > > > > >
>> > > > > > > > On Jul 31, 2020, at 12:57 PM, Uros Bizjak
> wrote:
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > 22:05, tor., 28. jul. 2020 je oseba Qing Zhao
>mailto:qing.z...@oracle.com>> napisala:
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > Richard and Uros,
>> > > > > > > > >
>> > > > > > > > > Could you please review the change that H.J and I
>rewrote based on your comments in the previous round of discussion?
>> > > > > > > > >
>> > > > > > > > > This patch is a nice security enhancement for GCC
>that has been requested by security people for quite some time.
>> > > > > > > > >
>> > > > > > > > > Thanks a lot for your time.
>> > > > > > > >
>> > > > > > > > I'll be away from the keyboard for the next week, but
>the patch needs a middle end approval first.
>> > > > > > > >
>> > > > > > > > That said, x86 parts looks OK.
>> > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > > > > Uros.
>> > > > > > > > > Qing
>> > > > > > > > >
>> > > > > > > > > > On Jul 14, 2020, at 9:45 AM, Qing Zhao via
>Gcc-patches mailto:gcc-patches@gcc.gnu.org>>
>wrote:
>> > > > > > > > > >
>> > > > > > > > > > Hi, Gcc team,
>> > > > > > > > > >
>> > > > > > > > > > This patch is a follow-up on the previous patch and
>corresponding discussion:
>> > > > > > > > > >
>https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html
>
>>
>> > > > > > > > > >
>> > > > > > > > > > From the previous round of discussion, the major
>issues raised were:
>> > > > > > > > > >
>> > > > > > > > > > A. should be rewritten by using regsets
>infrastructure.
>> > > > > > > > > > B. Put the patch into middle-end instead of x86
>backend.
>> > > > > > > > > >
>> > > > > > > > > > This new patch is rewritten based on the above 2
>comments.  The major changes compared to the previous patch are:
>> > > > > > > > > >
>> > > > > > > > > > 1. Change the names of the option and attribute
>from
>> > > > > > > > > >
>-mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all]  and
>zero_caller_saved_regs("skip|used-gpr|all-gpr||used|all”)
>> > > > > > > > > > to:
>> > > > > > > > > >
>-fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]   and 
>zero_call_used_regs("skip|used-gpr|all-gpr||used|all”)
>> > > > > > > > > > Add the new option and  new attribute in general.
>> > > > > > > > > > 2. The main code generation part is moved from i386
>backend to middle-end;
>> > > > > > > > > > 3. Add 4 target-hooks;
>> > > > > > > > > > 4. Implement these 4 target-hooks on i386 backend.
>> > > > > > > > > > 5. On a target that does not implement the target
>hook, issue error for the new option, issue warning for the new
>attribute.
>> > > > > > > > > >
>> > > > > > > > > > The patch is as following:
>> > > > > > > > > >
>> > > > > > > > > > [PATCH] Add
>-fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
>> > > > > > > > > > command-line option and
>> > > > > > > > > >
>zero_call_used_regs("skip|used-gpr|all-gpr||used|all") function
>attribue:
>> > > > > > > > > >
>> > > > > > > > > >  1. -fzero-call-used-regs=skip and
>zero_call_used_regs("skip")
>> > > > > > > > > >
>> > > > > > > > > >  Don't zero call-used registers upon function
>return.
>> > > > > >
>> > > > > > Does a return via EH unwinding also constitute a function
>return?  I
>> > > > > > think you may want to have a finally handler or support in
>the unwinder
>> > > > > > for this?  Then there's abnormal return via longjmp &
>friends, I guess
>> > > > > > there's nothing that can be done there besides patching
>glibc?
>> > > > >
>> > > > > Abnormal returns, like EH unwinding and longjmp, aren't
>covered by this
>> > > > > patch. Only normal returns are covered.
>> > > >
>> > > > What's the point then?  Also specifically thinking about spill
>slots.
>> > > >
>> > >
>> > > The goal of this 

Re: [PATCH] c++: cxx_eval_vec_init after zero initialization [PR96282]

2020-08-05 Thread Jason Merrill via Gcc-patches

On 8/4/20 2:31 PM, Patrick Palka wrote:

On Tue, 4 Aug 2020, Jason Merrill wrote:


On 8/4/20 9:45 AM, Patrick Palka wrote:

On Mon, 3 Aug 2020, Patrick Palka wrote:


On Mon, 3 Aug 2020, Jason Merrill wrote:


On 8/3/20 2:45 PM, Patrick Palka wrote:

On Mon, 3 Aug 2020, Jason Merrill wrote:


On 8/3/20 8:53 AM, Patrick Palka wrote:

On Mon, 3 Aug 2020, Patrick Palka wrote:


In the first testcase below, expand_aggr_init_1 sets up t's
default
constructor such that the ctor first zero-initializes the entire
base
b,
followed by calling b's default constructor, the latter of which
just
default-initializes the array member b::m via a VEC_INIT_EXPR.

So upon constexpr evaluation of this latter VEC_INIT_EXPR,
ctx->ctor
is
nonempty due to the prior zero-initialization, and we proceed in
cxx_eval_vec_init to append new constructor_elts to the end of
ctx->ctor
without first checking if a matching constructor_elt already
exists.
This leads to ctx->ctor having two matching constructor_elts for
each
index.

This patch partially fixes this issue by making the RANGE_EXPR
optimization in cxx_eval_vec_init truncate ctx->ctor before
adding the
single RANGE_EXPR constructor_elt.  This isn't a complete fix
because
the RANGE_EXPR optimization applies only when the constant
initializer
is relocatable, so whenever it's not relocatable we can still
build up
an invalid CONSTRUCTOR, e.g. if in the first testcase we add an
NSDMI
such as 'e *p = this;' to struct e, then the ICE still occurs
even
with
this patch.


A complete but more risky one-line fix would be to always truncate
ctx->ctor beforehand, not just when the RANGE_EXPR optimization
applies.
If it's true that the initializer of a VEC_INIT_EXPR can't observe
the
previous elements of the target array, then it should be safe to
always
truncate I think?


What if default-initialization of the array element type doesn't
fully
initialize the elements, e.g. if 'e' had another member without a
default
initializer?  Does truncation first mean we lose the
zero-initialization
of
such a member?


Hmm, it looks like we would lose the zero-initialization of such a
member with or without truncation first (so with any one of the three
proposed fixes).  I think it's because the evaluation loop in
cxx_eval_vec_init disregards each element's prior (zero-initialized)
state.



We could probably still do the truncation, but clear the
CONSTRUCTOR_NO_CLEARING flag on the element initializer.


Ah, this seems to work well.  Like this?

-- >8 --

Subject: [PATCH] c++: cxx_eval_vec_init after zero initialization
[PR96282]

In the first testcase below, expand_aggr_init_1 sets up t's default
constructor such that the ctor first zero-initializes the entire base
b,
followed by calling b's default constructor, the latter of which just
default-initializes the array member b::m via a VEC_INIT_EXPR.

So upon constexpr evaluation of this latter VEC_INIT_EXPR, ctx->ctor
is
nonempty due to the prior zero-initialization, and we proceed in
cxx_eval_vec_init to append new constructor_elts to the end of
ctx->ctor
without first checking if a matching constructor_elt already exists.
This leads to ctx->ctor having two matching constructor_elts for each
index.

This patch fixes this issue by truncating a zero-initialized array
object in cxx_eval_vec_init_1 before we begin appending
default-initialized
array elements to it.  Since default-initialization may leave parts of
the element type unitialized, we also preserve the array's prior
zero-initialized state by clearing CONSTRUCTOR_NO_CLEARING on each
appended element initializers.

gcc/cp/ChangeLog:

PR c++/96282
* constexpr.c (cxx_eval_vec_init_1): Truncate ctx->ctor and
then clear CONSTRUCTOR_NO_CLEARING on each appended element
initializer if we're default-initializing a previously
zero-initialized array object.

gcc/testsuite/ChangeLog:

PR c++/96282
* g++.dg/cpp0x/constexpr-array26.C: New test.
* g++.dg/cpp0x/constexpr-array27.C: New test.
* g++.dg/cpp2a/constexpr-init18.C: New test.
---
gcc/cp/constexpr.c | 17
-
gcc/testsuite/g++.dg/cpp0x/constexpr-array26.C | 13 +
gcc/testsuite/g++.dg/cpp0x/constexpr-array27.C | 13 +
gcc/testsuite/g++.dg/cpp2a/constexpr-init18.C  | 16

4 files changed, 58 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-array26.C
create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-array27.C
create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-init18.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index b1c1d249c6e..706bef323b2 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -4171,6 +4171,17 @@ cxx_eval_vec_init_1 (const constexpr_ctx *ctx,
tree
atype, tree init,
  pre_init = true;
}
+  bool zero_initialized_p = false;
+  if ((pre_init || value_init || !init) && 

FENV_ACCESS status

2020-08-05 Thread Marc Glisse

Hello,

I updated the patch discussed in 
https://patchwork.ozlabs.org/project/gcc/patch/alpine.deb.2.02.1906221743430.16...@grove.saclay.inria.fr/ 
and pushed it as something like refs/users/glisse/heads/fenv (first user 
branch in gcc's git, I hope it worked). I am also attaching the diff here.


I managed to compile and run real-world code with it, which is a good sign 
:-)


As should be obvious looking at the diff, there is a lot of work left. 
Experts may also find much better ways to rewrite several parts of the 
patch.


The option is called -ffenv-access so it doesn't interfere with 
-frounding-math, at least until we have something good enough to replace 
-frounding-math without too much performance regression.


I switched to hex float constants for DBL_MAX and others for C99+, I don't 
care about making fenv_access work in prehistoric modes. On the other 
hand, since I haven't started on fenv_round, this is probably useless for 
now.


Several changes, in particular the constexpr stuff, was needed to parse 
standard headers, otherwise I would have delayed the implementation.


Currently the floating point environment is represented by "memory" in 
general. Refining it so the compiler knows that storing a float does not 
change the rounding mode (for instance) should wait, in my opinion.


Conversions look like
.FENV_CONVERT (arg, (target_type*)0, 0)
the pointer is there so we know the target type, even if the lhs 
disappears at some point. The last 0 is the same as for all the others, a 
place to store options about the operation (do we care about rounding, 
about exceptions, etc), it is just a placeholder for now. I could rename 
it to .FENV_NOP since we seem to generate NOP usually, but it looked 
strange to me.


I did not do anything for templates in C++. As long as we have a constant 
global flag, it doesn't matter, but as soon as we will have a pragma, 
things will get messy, we will need to remember what the mode was when 
parsing, so we can use it at instantiation time... (or just declare that 
the pragma doesn't work with templates in a first version)


I am trying to have enough infrastructure in place so that the 
functionality is useful, and also so that implementing other pieces 
(parsing the pragma, C front-end, gimple optimizations, target hook for 
the asm string, opcode and target optimization, simd, etc) become 
independent and can be done by different people. It is unlikely that I can 
find the time to do everything. If other people want to contribute or even 
take over (assuming the branch does not look hopelessly bad to them), that 
would be great! That's also why I pushed it as a branch.


Apart from the obvious (making sure it bootstraps, running the testsuite, 
adding a few tests), what missing pieces do you consider a strict 
requirement for this to have a chance to reach master one day as an 
experimental option?


--
Marc Glissecommit 4adb494e88323bf41ee2c0871caa2323fa2aca06
Author: Marc Glisse 
Date:   Wed Aug 5 18:49:57 2020 +0200

Introduce -ffenv-access

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 74ecca8de8e..4c4d4f3d563 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -1635,16 +1635,23 @@ lazy_hex_fp_value (cpp_reader *, cpp_macro *macro, unsigned num)
 {
   REAL_VALUE_TYPE real;
   char dec_str[64], buf1[256];
+  size_t len;
 
   gcc_checking_assert (num < lazy_hex_fp_value_count);
 
-  real_from_string (, lazy_hex_fp_values[num].hex_str);
-  real_to_decimal_for_mode (dec_str, , sizeof (dec_str),
-			lazy_hex_fp_values[num].digits, 0,
-			lazy_hex_fp_values[num].mode);
+  if (!flag_isoc99)
+{
+  real_from_string (, lazy_hex_fp_values[num].hex_str);
+  real_to_decimal_for_mode (dec_str, , sizeof (dec_str),
+lazy_hex_fp_values[num].digits, 0,
+lazy_hex_fp_values[num].mode);
+
+  len = sprintf (buf1, "%s%s", dec_str, lazy_hex_fp_values[num].fp_suffix);
+}
+  else
+len = sprintf (buf1, "%s%s", lazy_hex_fp_values[num].hex_str,
+		   lazy_hex_fp_values[num].fp_suffix);
 
-  size_t len
-= sprintf (buf1, "%s%s", dec_str, lazy_hex_fp_values[num].fp_suffix);
   gcc_assert (len < sizeof (buf1));
   for (unsigned idx = 0; idx < macro->count; idx++)
 if (macro->exp.tokens[idx].type == CPP_NUMBER)
@@ -1701,13 +1708,16 @@ builtin_define_with_hex_fp_value (const char *macro,
  it's easy to get the exact correct value), parse it as a real,
  then print it back out as decimal.  */
 
-  real_from_string (, hex_str);
-  real_to_decimal_for_mode (dec_str, , sizeof (dec_str), digits, 0,
-			TYPE_MODE (type));
+  if (!flag_isoc99)
+{
+  real_from_string (, hex_str);
+  real_to_decimal_for_mode (dec_str, , sizeof (dec_str), digits, 0,
+TYPE_MODE (type));
+}
 
   /* Assemble the macro in the following fashion
  macro = fp_cast [dec_str fp_suffix] */
-  sprintf (buf2, "%s%s", dec_str, fp_suffix);
+  sprintf (buf2, "%s%s", flag_isoc99 ? 

Re: std:vec for classes with constructor? (Was: Re: [patch] multi-range implementation for value_range (irange))

2020-08-05 Thread Richard Biener via Gcc-patches
On August 5, 2020 5:41:01 PM GMT+02:00, Aldy Hernandez via Gcc-patches 
 wrote:
>On 8/5/20 5:09 PM, Martin Jambor wrote:
>
>> On Fri, Jul 31 2020, Aldy Hernandez via Gcc-patches wrote:
>>>
>> 
>> [...]
>> 
>>>
>>> * ipa-cp changes from vec to std::vec.
>>>
>>> We are using std::vec to ensure constructors are run, which they
>aren't
>>> in our internal vec<> implementation.  Although we usually steer
>away
>>> from using std::vec because of interactions with our GC system,
>>> ipcp_param_lattices is only live within the pass and allocated with
>calloc.
>>>
>> 
>> Ummm... I did not object but I will save the URL of this message in
>the
>> archive so that I can waive it in front of anyone complaining why I
>> don't use our internal vec's in IPA data structures.
>> 
>> But it actually raises a broader question: was this supposed to be an
>> exception, allowed only not to complicate the irange patch further,
>or
>> will this be generally accepted thing to do when someone wants to
>have a
>> vector of constructed items?
>
>I don't want to start a precedent without further discussion, so let's 
>assume this was an exception.
>
>Is there another objection to std::vec<> other than it doesn't play
>well 
>with our GC?  Is GCC's vec<> that much faster/efficient than
>std::vec<>? 
>  Does it matter?
>
>I will note that an alternative would have been to rewrite our 
>internal's vec<> implementation so that constructors are run.  We 
>explored that, but it seemed like more work than it was worth.
>
>Andrew, do you remember the details on the C++ness issues with GCC's 
>vec<> implementation?

Note we generally want to avoid memory allocations per element so can't you use 
a value_range<1> or so to make it POD? 

Richard. 

>Aldy



Re: std:vec for classes with constructor? (Was: Re: [patch] multi-range implementation for value_range (irange))

2020-08-05 Thread Richard Biener via Gcc-patches
On August 5, 2020 5:09:19 PM GMT+02:00, Martin Jambor  wrote:
>On Fri, Jul 31 2020, Aldy Hernandez via Gcc-patches wrote:
>>
>
>[...]
>
>>
>> * ipa-cp changes from vec to std::vec.
>>
>> We are using std::vec to ensure constructors are run, which they
>aren't 
>> in our internal vec<> implementation.  Although we usually steer away
>
>> from using std::vec because of interactions with our GC system, 
>> ipcp_param_lattices is only live within the pass and allocated with
>calloc.
>>
>
>Ummm... I did not object but I will save the URL of this message in the
>archive so that I can waive it in front of anyone complaining why I
>don't use our internal vec's in IPA data structures.
>
>But it actually raises a broader question: was this supposed to be an
>exception, allowed only not to complicate the irange patch further, or
>will this be generally accepted thing to do when someone wants to have
>a
>vector of constructed items?

It's definitely not what we want. You have to find another solution to this 
problem. 

Richard. 

>Thanks,
>
>Martin



Re: [patch, fortran] Compile-time check for change in DO variable in contained procedures

2020-08-05 Thread Paul Richard Thomas via Gcc-patches
I must say that I was thinking rather more of the INTENT(IN) case to make
sure that it is accepted.

Paul


On Wed, 5 Aug 2020 at 17:41, Thomas Koenig  wrote:

> Hi Paul,
>
> > This is OK by me.
>
> Committed (or should I say "pushed"?), thanks!
>
> > Is it worth testing the INTENT variants?
>
> I added a test for INTENT(INOUT), here's the version of the
> test case that was committed.
>
> Best regards
>
> Thomas
>


-- 
"If you can't explain it simply, you don't understand it well enough" -
Albert Einstein


Re: [patch, fortran] Compile-time check for change in DO variable in contained procedures

2020-08-05 Thread Thomas Koenig via Gcc-patches

Hi Paul,


This is OK by me.


Committed (or should I say "pushed"?), thanks!


Is it worth testing the INTENT variants?


I added a test for INTENT(INOUT), here's the version of the
test case that was committed.

Best regards

Thomas
program main
  implicit none
  integer :: i1, i2, i3, i4, i5, i6, i7
  integer :: j
  do i1=1,10
 call sub1 ! { dg-error "Index variable 'i1' redefined" }
  end do
  do i2=1,10
 call sub2 ! { dg-error "Index variable 'i2' redefined" }
  end do
  do i3=1,10
 j = fcn3() ! { dg-error "Index variable 'i3' redefined" }
  end do
  do i4=1,10
 j = fcn4() ! { dg-error "Index variable 'i4' redefined" }
  end do
  do i5=1,10
 call sub5 ! { dg-error "Index variable 'i5' set to undefined" }
  end do

  call sub6

  do i7=1,10
 call sub7 ! { dg-error "Index variable 'i7' not definable" }
  end do
contains
  subroutine sub1
i1 = 5 ! { dg-error "Index variable 'i1' redefined" }
  end subroutine sub1

  subroutine sub2
do i2=1,5 ! { dg-error "Index variable 'i2' redefined" }
end do
  end subroutine sub2
  
  integer function fcn3()
i3 = 1 ! { dg-error "Index variable 'i3' redefined" }
fcn3 = i3
  end function fcn3

  integer function fcn4()
open (10,file="foo.dat", iostat=i4) ! { dg-error "Index variable 'i4' redefined" }
fcn4 = 12
  end function fcn4

  subroutine sub5
integer :: k
k = intentout(i5) ! { dg-error "Index variable 'i5' set to undefined" }
  end subroutine sub5

  subroutine sub6
do i6=1,10
   call sub6a ! { dg-error "Index variable 'i6' redefined" }
end do
  end subroutine sub6

  subroutine sub6a
i6 = 5   ! { dg-error "Index variable 'i6' redefined" }
  end subroutine sub6a

  subroutine sub7
integer :: k
k = intentinout (i7)  ! { dg-error "Index variable 'i7' not definable" }
  end subroutine sub7
  
  integer function intentout(i)
integer, intent(out) :: i
  end function intentout

  integer function intentinout(i)
integer, intent(inout) :: i
  end function intentinout
end program main

module foo
  integer :: j1
contains
  subroutine mod_sub_1
do j1=1,10
   call aux ! { dg-error "Index variable 'j1' redefined" }
end do
  end subroutine mod_sub_1
  subroutine aux
j1 = 3  ! { dg-error "Index variable 'j1' redefined" }
  end subroutine aux
end module foo


Re: [patch] multi-range implementation for value_range (irange)

2020-08-05 Thread Aldy Hernandez via Gcc-patches




On 8/5/20 4:27 PM, Gerald Pfeifer wrote:

Hi Aldy,

On Fri, 31 Jul 2020, Aldy Hernandez via Gcc-patches wrote:

Jeff approved this patch off-list.  I will re-run tests once again and
commit by Monday.


I believe this has broken the bootstrap with clang (specifically
FreeBSD clang version 10.0.0):

In file included from /scratch/tmp/gerald/GCC-HEAD/gcc/c/gimple-parser.c:44:
In file included from /scratch/tmp/gerald/GCC-HEAD/gcc/tree-vrp.h:23:
/scratch/tmp/gerald/GCC-HEAD/gcc/value-range.h:347:1: error: static declaration of 
'gt_ggc_mx' follows non-static declaration gt_ggc_mx (int_range *x)
/scratch/tmp/gerald/GCC-HEAD/gcc/value-range.h:150:37: note: previous 
declaration is here
   template  friend void gt_ggc_mx (int_range *);
 ^

/scratch/tmp/gerald/GCC-HEAD/gcc/value-range.h:358:1: error: static declaration of 
'gt_pch_nx' follows non-static declaration gt_pch_nx (int_range *x)
/scratch/tmp/gerald/GCC-HEAD/gcc/value-range.h:151:37: note: previous 
declaration is here
   template  friend void gt_pch_nx (int_range *);
 ^

My daily tester started to 20200803T1640, so the root cause of this must
have entered GCC trunk between Sunday 16:40 UTC and Monday 16:40 UTC.


Yeah, this is definitely caused by the irange patch.

GTY makes my head spin, and it took forever to get these gt_* functions 
set up.  I can't claim I understand them entirely :).


Just a guess, does removing the static solve the problem?

diff --git a/gcc/value-range.h b/gcc/value-range.h
index e3282c4ad03..1ab39939703 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -343,7 +343,7 @@ range_includes_zero_p (const irange *vr)
 }

 template
-static inline void
+inline void
 gt_ggc_mx (int_range *x)
 {
   for (unsigned i = 0; i < N; ++i)
@@ -354,7 +354,7 @@ gt_ggc_mx (int_range *x)
 }

 template
-static inline void
+inline void
 gt_pch_nx (int_range *x)
 {
   for (unsigned i = 0; i < N; ++i)
@@ -365,7 +365,7 @@ gt_pch_nx (int_range *x)
 }

 template
-static inline void
+inline void
 gt_pch_nx (int_range *x, gt_pointer_operator op, void *cookie)
 {
   for (unsigned i = 0; i < N; ++i)



Re: std:vec for classes with constructor? (Was: Re: [patch] multi-range implementation for value_range (irange))

2020-08-05 Thread Aldy Hernandez via Gcc-patches

On 8/5/20 5:09 PM, Martin Jambor wrote:


On Fri, Jul 31 2020, Aldy Hernandez via Gcc-patches wrote:




[...]



* ipa-cp changes from vec to std::vec.

We are using std::vec to ensure constructors are run, which they aren't
in our internal vec<> implementation.  Although we usually steer away
from using std::vec because of interactions with our GC system,
ipcp_param_lattices is only live within the pass and allocated with calloc.



Ummm... I did not object but I will save the URL of this message in the
archive so that I can waive it in front of anyone complaining why I
don't use our internal vec's in IPA data structures.

But it actually raises a broader question: was this supposed to be an
exception, allowed only not to complicate the irange patch further, or
will this be generally accepted thing to do when someone wants to have a
vector of constructed items?


I don't want to start a precedent without further discussion, so let's 
assume this was an exception.


Is there another objection to std::vec<> other than it doesn't play well 
with our GC?  Is GCC's vec<> that much faster/efficient than std::vec<>? 
 Does it matter?


I will note that an alternative would have been to rewrite our 
internal's vec<> implementation so that constructors are run.  We 
explored that, but it seemed like more work than it was worth.


Andrew, do you remember the details on the C++ness issues with GCC's 
vec<> implementation?


Aldy



Re: [Patch, fortran] PR96102 - ICE in check_host_association, at fortran/resolve.c:5994

2020-08-05 Thread Steve Kargl via Gcc-patches
On Wed, Aug 05, 2020 at 03:59:38PM +0100, Paul Richard Thomas wrote:
> The attached patch builds on a draft posted by Steve Kargl. I have left the
> gcc_assert in place just in case my imagination was too limited to generate
> an ICE.
> 
> Regtests OK on FC31/x86_64 - OK for trunk?
> 

Looks OK to me.

-- 
Steve


RE: [PATCH V2] aarch64: Use Q-reg loads/stores in movmem expansion

2020-08-05 Thread Sudakshina Das
Hi Richard

Thank you for fixing this. I apologise for the trouble. I ran bootstrap only on 
an
earlier version of the patch where I should have ran it again on the final one! 
☹
I will be more careful in the future,

Thanks
Sudi


> -Original Message-
> From: Richard Sandiford 
> Sent: 05 August 2020 14:52
> To: Andreas Schwab 
> Cc: Sudakshina Das ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH V2] aarch64: Use Q-reg loads/stores in movmem
> expansion
> 
> Andreas Schwab  writes:
> > This breaks bootstrap.
> 
> I've pushed the below to fix this after bootstrapping & regression testing on
> aarch64-linux-gnu.
> 
> Richard



Re: [patch, fortran] Compile-time check for change in DO variable in contained procedures

2020-08-05 Thread Paul Richard Thomas via Gcc-patches
Hi Thomas,

This is OK by me.

Is it worth testing the INTENT variants?

Cheers

Paul


On Tue, 4 Aug 2020 at 20:03, Thomas Koenig via Fortran 
wrote:

> Hello world,
>
> the attached patch issues an error for something that I am sure most
> people did at least once (I know I did), something like
>
>do i=1,10
>   call foo
>end do
> ...
> contains
>subroutine foo
>  do i=1,5
> ...
>  end do
>
> which is, of course, illegal, but the programmer's fault. We issue an
> error with -fcheck=all, but a compile-time is better, of course.
>
> As you can see from the modification of do_check_4.f90, you have to go
> to some lengths to fool the compiler with this patch.
>
> As an aside, I could really have used three places for the error
> message here.  As is, I settled for the place of the call from
> the DO loop checked, and the place where it is modified.  With
> the name of the variable, the user should be able to figure out
> what's wrong.
>
> Regression-tested. OK for trunk?
>
> Best regards
>
> Thomas
>
> Static analysis for definition of DO index variables in contained
> procedures.
>
> When encountering a procedure call in a DO loop, this patch checks if
> the call is to a contained procedure, and if it is, check for
> changes in the index variable.
>
> gcc/fortran/ChangeLog:
>
> PR fortran/96469
> * frontend-passes.c (doloop_contained_function_call): New
> function.
> (doloop_contained_procedure_code): New function.
> (CHECK_INQ): Macro for inquire checks.
> (doloop_code): Invoke doloop_contained_procedure_code and
> doloop_contained_function_call if appropriate.
> (do_intent): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> PR fortran/96469
> * gfortran.dg/do_check_4.f90: Hide change in index variable
> from compile-time analysis.
> * gfortran.dg/do_check_4.f90: New test.
>


-- 
"If you can't explain it simply, you don't understand it well enough" -
Albert Einstein


std:vec for classes with constructor? (Was: Re: [patch] multi-range implementation for value_range (irange))

2020-08-05 Thread Martin Jambor
On Fri, Jul 31 2020, Aldy Hernandez via Gcc-patches wrote:
>

[...]

>
> * ipa-cp changes from vec to std::vec.
>
> We are using std::vec to ensure constructors are run, which they aren't 
> in our internal vec<> implementation.  Although we usually steer away 
> from using std::vec because of interactions with our GC system, 
> ipcp_param_lattices is only live within the pass and allocated with calloc.
>

Ummm... I did not object but I will save the URL of this message in the
archive so that I can waive it in front of anyone complaining why I
don't use our internal vec's in IPA data structures.

But it actually raises a broader question: was this supposed to be an
exception, allowed only not to complicate the irange patch further, or
will this be generally accepted thing to do when someone wants to have a
vector of constructed items?

Thanks,

Martin


Re: RFC: Monitoring old PRs, new dg directives

2020-08-05 Thread Nathan Sidwell

On 8/4/20 8:54 PM, Marek Polacek via Gcc-patches wrote:

On Tue, Aug 04, 2020 at 03:33:23PM -0700, Mike Stump wrote:

I think the read of the room is that people think it would be generally useful, 
so let approve the general plan.


Cool.


So, now we are down to the fine details.  Please do see just how far you can 
stretch the existing mechanisms to cover what you need to do.  I think the 
existing mechanisms should be able to cover it all; but the devil is in the 
details and those matter.


At this point I'm only proposing one new directive, dg-ice.  I think we can't
really do without it.  The other one was a matter of convenience.


I've realized I have a concern.  Grepping (or searching in an editor 
buffer) the log file for 'internal compiler error' to find actual 
regressions is a thing I want to still be able to do (perhaps with 
alternative spelling, I don't care).  I don't want to see the ICEs of 
tests that are expected to ICE.


I think that means there has to be a positive marker on the unexpected 
ICEs, rather than lack of an expected marker on them.


nathan

--
Nathan Sidwell


[Patch, fortran] PR96102 - ICE in check_host_association, at fortran/resolve.c:5994

2020-08-05 Thread Paul Richard Thomas via Gcc-patches
The attached patch builds on a draft posted by Steve Kargl. I have left the
gcc_assert in place just in case my imagination was too limited to generate
an ICE.

Regtests OK on FC31/x86_64 - OK for trunk?

Paul


Change.Logs
Description: Binary data
diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index 2751c0ccf62..4046bed1863 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -5993,6 +5993,17 @@ check_host_association (gfc_expr *e)
 		if (ref->type == REF_ARRAY && ref->next == NULL)
 		  break;
 
+	  if ((ref == NULL || ref->type != REF_ARRAY)
+		  && old_sym->attr.flavor == FL_VARIABLE
+		  && sym->attr.proc == PROC_INTERNAL)
+		{
+		  gfc_error ("%qs at %L is being host associated into "
+			 "a contained procedure with an internal "
+			 "procedure of the same name", sym->name,
+			  >declared_at);
+		  return false;
+		}
+
 	  gcc_assert (ref->type == REF_ARRAY);
 
 	  /* Grab the start expressions from the array ref and
! { dg-do compile }
!
! Test the fix for PR96102 in which the two lines with IF statements
! errors previously caused a segfault.
!
! Contributed by Gerhardt Steinmetz  
!
!
module m
   type mytype
 integer :: i
   end type
   type(mytype) :: d = mytype (42) ! { dg-error "is host associated" }
   integer :: n = 2! { dg-error "is host associated" }
contains
   subroutine s
  if ( n /= 0 ) stop 1! { dg-error "internal procedure of the same name" }
  if ( d%i /= 0 ) stop 2  ! { dg-error "internal procedure of the same name" }
   contains
  integer function n()
 n = 0
  end
  type(mytype) function d()
 d = mytype (0)
  end
   end
end


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-05 Thread Qing Zhao via Gcc-patches



> On Aug 5, 2020, at 9:45 AM, H.J. Lu  wrote:
> 
> On Wed, Aug 5, 2020 at 5:34 AM H.J. Lu  > wrote:
>> 
>> On Wed, Aug 5, 2020 at 5:31 AM Richard Biener  wrote:
>>> 
>>> 
>>> [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
>>> command-line option and
>>> zero_call_used_regs("skip|used-gpr|all-gpr||used|all") function 
>>> attribue:
>>> 
>>> 1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
>>> 
>>> Don't zero call-used registers upon function return.
>>> 
>>> Does a return via EH unwinding also constitute a function return?  I
>>> think you may want to have a finally handler or support in the unwinder
>>> for this?  Then there's abnormal return via longjmp & friends, I guess
>>> there's nothing that can be done there besides patching glibc?
>> 
>> Abnormal returns, like EH unwinding and longjmp, aren't covered by this
>> patch. Only normal returns are covered.
> 
> What's the point then?  Also specifically thinking about spill slots.
> 
 
 The goal of this patch is to zero caller-saved registers upon normal
 function return.  Abnormal returns and spill slots are outside of the
 scope of this patch.
>>> 
>>> Sure, I can write a patch that spills some regs, writes zeros to them
>>> and then restores them.  And the patch will fulfil what it was designed
>>> to do.
>>> 
>>> Still I need to come up with a reason that this is a useful feature
>>> by its own for it to be accepted.
>>> 
>>> I am asking for that reason.  What's the reason for the "goal of this
>>> patch"?  Why's that a useful goal on its own?
>>> 
>> 
>> Hi Victor,
>> 
>> Can you provide some background information about how/why this feature
>> is used?
>> 
> 
> From The SECURE project and GCC in GCC Cauldron 2018:
> 
> Speaker: Graham Markall
> 
> The SECURE project is a 15 month program funded by Innovate UK, to
> take well known security techniques from academia and make them
> generally available in standard compilers, specfically GCC and LLVM.
> An explicit objective is for those techniques to be incorporated in
> the upstream versions of compilers. The Cauldron takes place in the
> final month of the project and this talk will present the technical
> details of some of the techniques implemented, and review those that
> are yet to be implemented. A particular focus of this talk will be on
> verifying that the implemetnation is correct, which can be a bigger
> challenge than the implementation.
> 
> Techniques to be covered in the project include the following:
> 
> Stack and register erasure. Ensuring that on return from a function,
> no data is left lying on the stack or in registers. Particular
> challenges are in dealing with inlining, shrink wrapping and caching.
> 
> This patch implemens register erasure.

In addition to the above, Victor mentioned a paper that can provide good 
background information
For this patch:

"Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming 
Attacks"

https://ieeexplore.ieee.org/document/8445132 


The abstract of this paper is:

"With the implementation of W ⊕ X security model on computer system, 
Return-Oriented Programming(ROP) has become the primary exploitation
 technique for adversaries. Although many solutions that defend against ROP 
exploits have been proposed, they still suffer from various shortcomings.
 In this paper, we propose a new way to mitigate ROP attacks that are based 
on return instructions. We clean the scratch registers which are also the
 parameter registers based on the features of ROP malicious code and calling 
convention. A prototype is implemented on x64-based Linux platform based on Pin.
 Preliminary experimental results show that our method can efficiently mitigate 
conventional ROP attacks."

Qing
 
> 
> 
> -- 
> H.J.



Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-05 Thread H.J. Lu via Gcc-patches
On Wed, Aug 5, 2020 at 5:34 AM H.J. Lu  wrote:
>
> On Wed, Aug 5, 2020 at 5:31 AM Richard Biener  wrote:
> >
> > On Wed, 5 Aug 2020, H.J. Lu wrote:
> >
> > > On Wed, Aug 5, 2020 at 12:06 AM Richard Biener  wrote:
> > > >
> > > > On Tue, 4 Aug 2020, H.J. Lu wrote:
> > > >
> > > > > On Tue, Aug 4, 2020 at 12:35 AM Richard Biener  
> > > > > wrote:
> > > > > >
> > > > > > On Mon, 3 Aug 2020, Qing Zhao wrote:
> > > > > >
> > > > > > > Hi, Uros,
> > > > > > >
> > > > > > > Thanks a lot for your review on X86 parts.
> > > > > > >
> > > > > > > Hi, Richard,
> > > > > > >
> > > > > > > Could you please take a look at the middle-end part to see 
> > > > > > > whether the
> > > > > > > rewritten addressed your previous concern?
> > > > > >
> > > > > > I have a few comments below - I'm not sure I'm qualified to fully
> > > > > > review the rest though.
> > > > > >
> > > > > > > Thanks a lot.
> > > > > > >
> > > > > > > Qing
> > > > > > >
> > > > > > >
> > > > > > > > On Jul 31, 2020, at 12:57 PM, Uros Bizjak  
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > 22:05, tor., 28. jul. 2020 je oseba Qing Zhao 
> > > > > > > > mailto:qing.z...@oracle.com>> napisala:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Richard and Uros,
> > > > > > > > >
> > > > > > > > > Could you please review the change that H.J and I rewrote 
> > > > > > > > > based on your comments in the previous round of discussion?
> > > > > > > > >
> > > > > > > > > This patch is a nice security enhancement for GCC that has 
> > > > > > > > > been requested by security people for quite some time.
> > > > > > > > >
> > > > > > > > > Thanks a lot for your time.
> > > > > > > >
> > > > > > > > I'll be away from the keyboard for the next week, but the patch 
> > > > > > > > needs a middle end approval first.
> > > > > > > >
> > > > > > > > That said, x86 parts looks OK.
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > > Uros.
> > > > > > > > > Qing
> > > > > > > > >
> > > > > > > > > > On Jul 14, 2020, at 9:45 AM, Qing Zhao via Gcc-patches 
> > > > > > > > > > mailto:gcc-patches@gcc.gnu.org>> 
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Hi, Gcc team,
> > > > > > > > > >
> > > > > > > > > > This patch is a follow-up on the previous patch and 
> > > > > > > > > > corresponding discussion:
> > > > > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html
> > > > > > > > > >  
> > > > > > > > > > 
> > > > > > > > > >  
> > > > > > > > > >  > > > > > > > > >  
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > From the previous round of discussion, the major issues 
> > > > > > > > > > raised were:
> > > > > > > > > >
> > > > > > > > > > A. should be rewritten by using regsets infrastructure.
> > > > > > > > > > B. Put the patch into middle-end instead of x86 backend.
> > > > > > > > > >
> > > > > > > > > > This new patch is rewritten based on the above 2 comments.  
> > > > > > > > > > The major changes compared to the previous patch are:
> > > > > > > > > >
> > > > > > > > > > 1. Change the names of the option and attribute from
> > > > > > > > > > -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all]  
> > > > > > > > > > and 
> > > > > > > > > > zero_caller_saved_regs("skip|used-gpr|all-gpr||used|all”)
> > > > > > > > > > to:
> > > > > > > > > > -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]   
> > > > > > > > > > and  zero_call_used_regs("skip|used-gpr|all-gpr||used|all”)
> > > > > > > > > > Add the new option and  new attribute in general.
> > > > > > > > > > 2. The main code generation part is moved from i386 backend 
> > > > > > > > > > to middle-end;
> > > > > > > > > > 3. Add 4 target-hooks;
> > > > > > > > > > 4. Implement these 4 target-hooks on i386 backend.
> > > > > > > > > > 5. On a target that does not implement the target hook, 
> > > > > > > > > > issue error for the new option, issue warning for the new 
> > > > > > > > > > attribute.
> > > > > > > > > >
> > > > > > > > > > The patch is as following:
> > > > > > > > > >
> > > > > > > > > > [PATCH] Add 
> > > > > > > > > > -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> > > > > > > > > > command-line option and
> > > > > > > > > > zero_call_used_regs("skip|used-gpr|all-gpr||used|all") 
> > > > > > > > > > function attribue:
> > > > > > > > > >
> > > > > > > > > >  1. -fzero-call-used-regs=skip and 
> > > > > > > > > > zero_call_used_regs("skip")
> > > > > > > > > >
> > > > > > > > > >  Don't zero call-used registers upon function return.
> > > > > >
> > > > > > Does a return via EH unwinding also constitute a function return?  I
> > > > > > think you may want to have a finally handler or support in the 
> > > > > > unwinder
> > > > > > for this?  Then there's abnormal return via longjmp & 

[PATCH] arm: Clear canary value after stack_protect_test [PR96191]

2020-08-05 Thread Richard Sandiford
The stack_protect_test patterns were leaving the canary value in the
temporary register, meaning that it was often still in registers on
return from the function.  An attacker might therefore have been
able to use it to defeat stack-smash protection for a later function.

Tested on arm-linux-gnueabi, arm-linux-gnueabihf and armeb-eabi.
I tested the thumb1.md part using arm-linux-gnueabi with the
test flags -march=armv5t -mthumb.  OK for trunk and branches?

As I mentioned in the corresponding aarch64 patch, this is needed
to make arm conform to GCC's current -fstack-protector implementation.
However, I think we should reconsider whether the zeroing is actually
necessary and what it's actually protecting against.  I'll send a
separate message about that to gcc@.  But since the port isn't even
self-consistent (the *set patterns do clear the registers), I think
we should do this first rather than wait for any outcome of that
discussion.

Richard


gcc/
PR target/96191
* config/arm/arm.md (arm_stack_protect_test_insn): Zero out
operand 2 after use.
* config/arm/thumb1.md (thumb1_stack_protect_test_insn): Likewise.

gcc/testsuite/
* gcc.target/arm/stack-protector-1.c: New test.
* gcc.target/arm/stack-protector-2.c: Likewise.
---
 gcc/config/arm/arm.md |  6 +-
 gcc/config/arm/thumb1.md  |  8 ++-
 .../gcc.target/arm/stack-protector-1.c| 63 +++
 .../gcc.target/arm/stack-protector-2.c|  6 ++
 4 files changed, 78 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/stack-protector-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/stack-protector-2.c

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index a6a31f8f4ef..dd13c77e889 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -9320,6 +9320,8 @@ (define_insn_and_split "*stack_protect_combined_test_insn"
   [(set_attr "arch" "t1,32")]
 )
 
+;; DO NOT SPLIT THIS PATTERN.  It is important for security reasons that the
+;; canary value does not live beyond the end of this sequence.
 (define_insn "arm_stack_protect_test_insn"
   [(set (reg:CC_Z CC_REGNUM)
(compare:CC_Z (unspec:SI [(match_operand:SI 1 "memory_operand" "m,m")
@@ -9329,8 +9331,8 @@ (define_insn "arm_stack_protect_test_insn"
(clobber (match_operand:SI 0 "register_operand" "=,"))
(clobber (match_dup 2))]
   "TARGET_32BIT"
-  "ldr\t%0, [%2]\;ldr\t%2, %1\;eors\t%0, %2, %0"
-  [(set_attr "length" "8,12")
+  "ldr\t%0, [%2]\;ldr\t%2, %1\;eors\t%0, %2, %0\;mov\t%2, #0"
+  [(set_attr "length" "12,16")
(set_attr "conds" "set")
(set_attr "type" "multiple")
(set_attr "arch" "t,32")]
diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
index 24861635fa5..0ff819090d9 100644
--- a/gcc/config/arm/thumb1.md
+++ b/gcc/config/arm/thumb1.md
@@ -2020,6 +2020,8 @@ (define_insn_and_split "thumb_eh_return"
   [(set_attr "type" "mov_reg")]
 )
 
+;; DO NOT SPLIT THIS PATTERN.  It is important for security reasons that the
+;; canary value does not live beyond the end of this sequence.
 (define_insn "thumb1_stack_protect_test_insn"
   [(set (match_operand:SI 0 "register_operand" "=")
(unspec:SI [(match_operand:SI 1 "memory_operand" "m")
@@ -2027,9 +2029,9 @@ (define_insn "thumb1_stack_protect_test_insn"
 UNSPEC_SP_TEST))
(clobber (match_dup 2))]
   "TARGET_THUMB1"
-  "ldr\t%0, [%2]\;ldr\t%2, %1\;eors\t%0, %2, %0"
-  [(set_attr "length" "8")
-   (set_attr "conds" "set")
+  "ldr\t%0, [%2]\;ldr\t%2, %1\;eors\t%0, %2, %0\;movs\t%2, #0"
+  [(set_attr "length" "10")
+   (set_attr "conds" "clob")
(set_attr "type" "multiple")]
 )
 
diff --git a/gcc/testsuite/gcc.target/arm/stack-protector-1.c 
b/gcc/testsuite/gcc.target/arm/stack-protector-1.c
new file mode 100644
index 000..b03ea14c4e2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/stack-protector-1.c
@@ -0,0 +1,63 @@
+/* { dg-do run } */
+/* { dg-require-effective-target fstack_protector } */
+/* { dg-options "-fstack-protector-all -O2" } */
+
+extern volatile long *stack_chk_guard_ptr;
+
+volatile long *
+get_ptr (void)
+{
+  return stack_chk_guard_ptr;
+}
+
+void __attribute__ ((noipa))
+f (void)
+{
+  volatile int x;
+  x = 1;
+  x += 1;
+}
+
+#define CHECK(REG) "\tcmp\tr0, " #REG "\n\tbeq\t1f\n"
+
+asm (
+"  .data\n"
+"  .align  3\n"
+"  .globl  stack_chk_guard_ptr\n"
+"stack_chk_guard_ptr:\n"
+"  .word   __stack_chk_guard\n"
+"  .weak   __stack_chk_guard\n"
+"__stack_chk_guard:\n"
+"  .word   0xdead4321\n"
+"  .text\n"
+"  .globl  main\n"
+"  .type   main, %function\n"
+"main:\n"
+"  bl  get_ptr\n"
+"  str r0, [sp, #-8]!\n"
+"  bl  f\n"
+"  str r0, [sp, #4]\n"
+"  ldr r0, [sp]\n"
+"  ldr r0, [r0]\n"
+   CHECK (r1)
+   CHECK (r2)
+   CHECK (r3)
+   CHECK (r4)
+   CHECK (r5)
+   CHECK (r6)
+   CHECK (r7)
+   CHECK (r8)
+  

Re: [patch] multi-range implementation for value_range (irange)

2020-08-05 Thread Gerald Pfeifer
Hi Aldy,

On Fri, 31 Jul 2020, Aldy Hernandez via Gcc-patches wrote:
> Jeff approved this patch off-list.  I will re-run tests once again and 
> commit by Monday.

I believe this has broken the bootstrap with clang (specifically
FreeBSD clang version 10.0.0):

In file included from /scratch/tmp/gerald/GCC-HEAD/gcc/c/gimple-parser.c:44:
In file included from /scratch/tmp/gerald/GCC-HEAD/gcc/tree-vrp.h:23:
/scratch/tmp/gerald/GCC-HEAD/gcc/value-range.h:347:1: error: static declaration 
of 'gt_ggc_mx' follows non-static declaration gt_ggc_mx (int_range *x)
/scratch/tmp/gerald/GCC-HEAD/gcc/value-range.h:150:37: note: previous 
declaration is here
  template  friend void gt_ggc_mx (int_range *);
^

/scratch/tmp/gerald/GCC-HEAD/gcc/value-range.h:358:1: error: static declaration 
of 'gt_pch_nx' follows non-static declaration gt_pch_nx (int_range *x)
/scratch/tmp/gerald/GCC-HEAD/gcc/value-range.h:151:37: note: previous 
declaration is here
  template  friend void gt_pch_nx (int_range *);
^

My daily tester started to 20200803T1640, so the root cause of this must 
have entered GCC trunk between Sunday 16:40 UTC and Monday 16:40 UTC.

Gerald


Re: VEC_COND_EXPR optimizations v2

2020-08-05 Thread Richard Biener via Gcc-patches
On Wed, Aug 5, 2020 at 3:33 PM Marc Glisse  wrote:
>
> New version that passed bootstrap+regtest during the night.
>
> When vector comparisons were forced to use vec_cond_expr, we lost a number of
> optimizations (my fault for not adding enough testcases to prevent that).
> This patch tries to unwrap vec_cond_expr a bit so some optimizations can
> still happen.
>
> I wasn't planning to add all those transformations together, but adding one
> caused a regression, whose fix introduced a second regression, etc.
>
> Restricting to constant folding would not be sufficient, we also need at
> least things like X|0 or X The transformations are quite conservative
> with :s and folding only if everything simplifies, we may want to relax
> this later. And of course we are going to miss things like a?b:c + a?c:b
> -> b+c.
>
> In terms of number of operations, some transformations turning 2
> VEC_COND_EXPR into VEC_COND_EXPR + BIT_IOR_EXPR + BIT_NOT_EXPR might not look
> like a gain... I expect the bit_not disappears in most cases, and
> VEC_COND_EXPR looks more costly than a simpler BIT_IOR_EXPR.
>
> I am a bit confused that with avx512 we get types like "vector(4)
> " with :2 and not :1 (is it a hack so true is 1 and not
> -1?), but that doesn't matter for this patch.

OK.

Thanks,
Richard.

> 2020-08-05  Marc Glisse  
>
> PR tree-optimization/95906
> PR target/70314
> * match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e),
> (v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): New transformations.
> (op (c ? a : b)): Update to match the new transformations.
>
> * gcc.dg/tree-ssa/andnot-2.c: New file.
> * gcc.dg/tree-ssa/pr95906.c: Likewise.
> * gcc.target/i386/pr70314.c: Likewise.
>
> --
> Marc Glisse


Re: [PATCH] vect: Try smaller vector size when SLP split fails

2020-08-05 Thread Richard Biener via Gcc-patches
On Wed, Aug 5, 2020 at 3:30 PM Andrew Stubbs  wrote:
>
> This patch improves SLP performance in combination with some patches I
> have in development to add multiple vector sizes to amdgcn.
>
> The problem is that amdgcn's preferred vector size has 64 lanes, and SLP
> does not support lane masking.  My patches will add smaller vector sizes
> (32, 16, 8, 4, 2) which make the lane masking implicit, but still SLP
> doesn't use them; it simply rejects the first size it sees and gives up.
>
> This patch detects the rejection early and looks to see if there is a
> smaller, more suitable vector size.  The result is many more successful
> SLP testcases.
>
> OK to commit? (I have an x86_64 bootstrap and test in progress.)

Is this about basic-block SLP?  There it should eventually split groups.
For loop based SLP did you specify the autovectorize_vector_modes
hook?  Otherwise the vectorizer only tries a single size.

Richard.

> Andrew


[committed] aarch64: Clear canary value after stack_protect_test [PR96191]

2020-08-05 Thread Richard Sandiford
The stack_protect_test patterns were leaving the canary value in the
temporary register, meaning that it was often still in registers on
return from the function.  An attacker might therefore have been
able to use it to defeat stack-smash protection for a later function.

Tested on aarch64-linux-gnu and aarch64_be-elf, committed.
I'll backport to branches over the next few days.  I'm about
to post a patch for arm too.

This is needed to make aarch64 conform to GCC's current -fstack-protector
implementation.  However, I think we should reconsider whether the
zeroing is actually necessary and what it's actually protecting against.
I'll send a separate message about that to gcc@.

Richard


gcc/
PR target/96191
* config/aarch64/aarch64.md (stack_protect_test_): Set the
CC register directly, instead of a GPR.  Replace the original GPR
destination with an extra scratch register.  Zero out operand 3
after use.
(stack_protect_test): Update accordingly.

gcc/testsuite/
PR target/96191
* gcc.target/aarch64/stack-protector-1.c: New test.
* gcc.target/aarch64/stack-protector-2.c: Likewise.
---
 gcc/config/aarch64/aarch64.md | 34 ---
 .../gcc.target/aarch64/stack-protector-1.c| 89 +++
 .../gcc.target/aarch64/stack-protector-2.c|  6 ++
 3 files changed, 110 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/stack-protector-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/stack-protector-2.c

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 25d77256b96..9b20dd0b1a0 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7217,10 +7217,8 @@ (define_expand "stack_protect_test"
(match_operand 2)]
   ""
 {
-  rtx result;
   machine_mode mode = GET_MODE (operands[0]);
 
-  result = gen_reg_rtx(mode);
   if (aarch64_stack_protector_guard != SSP_GLOBAL)
   {
 /* Generate access through the system register. The
@@ -7245,29 +7243,27 @@ (define_expand "stack_protect_test"
 operands[1] = gen_rtx_MEM (mode, tmp_reg);
   }
   emit_insn ((mode == DImode
- ? gen_stack_protect_test_di
- : gen_stack_protect_test_si) (result,
-   operands[0],
-   operands[1]));
-
-  if (mode == DImode)
-emit_jump_insn (gen_cbranchdi4 (gen_rtx_EQ (VOIDmode, result, const0_rtx),
-   result, const0_rtx, operands[2]));
-  else
-emit_jump_insn (gen_cbranchsi4 (gen_rtx_EQ (VOIDmode, result, const0_rtx),
-   result, const0_rtx, operands[2]));
+? gen_stack_protect_test_di
+: gen_stack_protect_test_si) (operands[0], operands[1]));
+
+  rtx cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+  emit_jump_insn (gen_condjump (gen_rtx_EQ (VOIDmode, cc_reg, const0_rtx),
+   cc_reg, operands[2]));
   DONE;
 })
 
+;; DO NOT SPLIT THIS PATTERN.  It is important for security reasons that the
+;; canary value does not live beyond the end of this sequence.
 (define_insn "stack_protect_test_"
-  [(set (match_operand:PTR 0 "register_operand" "=r")
-   (unspec:PTR [(match_operand:PTR 1 "memory_operand" "m")
-(match_operand:PTR 2 "memory_operand" "m")]
-UNSPEC_SP_TEST))
+  [(set (reg:CC CC_REGNUM)
+   (unspec:CC [(match_operand:PTR 0 "memory_operand" "m")
+   (match_operand:PTR 1 "memory_operand" "m")]
+  UNSPEC_SP_TEST))
+   (clobber (match_scratch:PTR 2 "="))
(clobber (match_scratch:PTR 3 "="))]
   ""
-  "ldr\t%3, %1\;ldr\t%0, %2\;eor\t%0, %3, %0"
-  [(set_attr "length" "12")
+  "ldr\t%2, %0\;ldr\t%3, %1\;subs\t%2, %2, %3\;mov\t%3, 0"
+  [(set_attr "length" "16")
(set_attr "type" "multiple")])
 
 ;; Write into the Floating-point Status or Control Register.
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-protector-1.c 
b/gcc/testsuite/gcc.target/aarch64/stack-protector-1.c
new file mode 100644
index 000..73e83bc413f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-protector-1.c
@@ -0,0 +1,89 @@
+/* { dg-do run } */
+/* { dg-require-effective-target fstack_protector } */
+/* { dg-options "-fstack-protector-all -O2" } */
+
+extern volatile long *stack_chk_guard_ptr;
+
+volatile long *
+get_ptr (void)
+{
+  return stack_chk_guard_ptr;
+}
+
+void __attribute__ ((noipa))
+f (void)
+{
+  volatile int x;
+  x = 1;
+  x += 1;
+}
+
+#define CHECK(REG) "\tcmp\tx0, " #REG "\n\tbeq\t1f\n"
+
+asm (
+"  .pushsection .data\n"
+"  .align  3\n"
+"  .globl  stack_chk_guard_ptr\n"
+"stack_chk_guard_ptr:\n"
+#if __ILP32__
+"  .word   __stack_chk_guard\n"
+#else
+"  .xword  __stack_chk_guard\n"
+#endif
+"  .weak   __stack_chk_guard\n"
+"__stack_chk_guard:\n"
+"  .word   0xdead4321\n"
+"  .word   0xbeef8765\n"
+"  

Re: [PATCH v5] vect/rs6000: Support vector with length cost modeling

2020-08-05 Thread Segher Boessenkool
On Wed, Aug 05, 2020 at 08:27:57AM +0100, Richard Sandiford wrote:
> OK for the vectoriser parts with those changes, thanks.

The rs6000 part is still fine as well.  Thanks!


Segher


Re: [PATCH V2] aarch64: Use Q-reg loads/stores in movmem expansion

2020-08-05 Thread Richard Sandiford
Andreas Schwab  writes:
> This breaks bootstrap.

I've pushed the below to fix this after bootstrapping & regression
testing on aarch64-linux-gnu.

Richard

>From 4af98a21e10547679a643eed85d51aa5d7d2510b Mon Sep 17 00:00:00 2001
From: Richard Sandiford 
Date: Wed, 5 Aug 2020 12:56:41 +0100
Subject: [PATCH] aarch64: Add missing %z prefixes to LDP/STP patterns

For LDP/STP Q, the memory operand might not be valid for "m",
so we need to use %z instead of % in the asm template.
This patch does that for all Ump LDP/STP patterns, regardless
of whether it's strictly needed.

This is needed to unbreak bootstrap.

2020-08-05  Richard Sandiford  

gcc/
* config/aarch64/aarch64.md (load_pair_sw_)
(load_pair_dw_, load_pair_dw_tftf)
(store_pair_sw_)
(store_pair_dw_, store_pair_dw_tftf)
(*load_pair_extendsidi2_aarch64)
(*load_pair_zero_extendsidi2_aarch64): Use %z for the memory operand.
* config/aarch64/aarch64-simd.md (load_pair)
(vec_store_pair, load_pair)
(vec_store_pair): Likewise.
---
 gcc/config/aarch64/aarch64-simd.md |  8 
 gcc/config/aarch64/aarch64.md  | 26 +-
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 11ebf5b93c4..381a702eba0 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -187,7 +187,7 @@ (define_insn "load_pair"
   plus_constant (Pmode,
  XEXP (operands[1], 0),
  GET_MODE_SIZE (mode)))"
-  "ldp\\t%d0, %d2, %1"
+  "ldp\\t%d0, %d2, %z1"
   [(set_attr "type" "neon_ldp")]
 )
 
@@ -201,7 +201,7 @@ (define_insn "vec_store_pair"
   plus_constant (Pmode,
  XEXP (operands[0], 0),
  GET_MODE_SIZE (mode)))"
-  "stp\\t%d1, %d3, %0"
+  "stp\\t%d1, %d3, %z0"
   [(set_attr "type" "neon_stp")]
 )
 
@@ -215,7 +215,7 @@ (define_insn "load_pair"
plus_constant (Pmode,
   XEXP (operands[1], 0),
   GET_MODE_SIZE (mode)))"
-  "ldp\\t%q0, %q2, %1"
+  "ldp\\t%q0, %q2, %z1"
   [(set_attr "type" "neon_ldp_q")]
 )
 
@@ -228,7 +228,7 @@ (define_insn "vec_store_pair"
plus_constant (Pmode,
   XEXP (operands[0], 0),
   GET_MODE_SIZE (mode)))"
-  "stp\\t%q1, %q3, %0"
+  "stp\\t%q1, %q3, %z0"
   [(set_attr "type" "neon_stp_q")]
 )
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index df780b86370..25d77256b96 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1574,8 +1574,8 @@ (define_insn "load_pair_sw_"
XEXP (operands[1], 0),
GET_MODE_SIZE (mode)))"
   "@
-   ldp\\t%w0, %w2, %1
-   ldp\\t%s0, %s2, %1"
+   ldp\\t%w0, %w2, %z1
+   ldp\\t%s0, %s2, %z1"
   [(set_attr "type" "load_8,neon_load1_2reg")
(set_attr "arch" "*,fp")]
 )
@@ -1591,8 +1591,8 @@ (define_insn "load_pair_dw_"
XEXP (operands[1], 0),
GET_MODE_SIZE (mode)))"
   "@
-   ldp\\t%x0, %x2, %1
-   ldp\\t%d0, %d2, %1"
+   ldp\\t%x0, %x2, %z1
+   ldp\\t%d0, %d2, %z1"
   [(set_attr "type" "load_16,neon_load1_2reg")
(set_attr "arch" "*,fp")]
 )
@@ -1607,7 +1607,7 @@ (define_insn "load_pair_dw_tftf"
plus_constant (Pmode,
   XEXP (operands[1], 0),
   GET_MODE_SIZE (TFmode)))"
-  "ldp\\t%q0, %q2, %1"
+  "ldp\\t%q0, %q2, %z1"
   [(set_attr "type" "neon_ldp_q")
(set_attr "fp" "yes")]
 )
@@ -1624,8 +1624,8 @@ (define_insn "store_pair_sw_"
XEXP (operands[0], 0),
GET_MODE_SIZE (mode)))"
   "@
-   stp\\t%w1, %w3, %0
-   stp\\t%s1, %s3, %0"
+   stp\\t%w1, %w3, %z0
+   stp\\t%s1, %s3, %z0"
   [(set_attr "type" "store_8,neon_store1_2reg")
(set_attr "arch" "*,fp")]
 )
@@ -1641,8 +1641,8 @@ (define_insn "store_pair_dw_"
XEXP (operands[0], 0),
GET_MODE_SIZE (mode)))"
   "@
-   stp\\t%x1, %x3, %0
-   stp\\t%d1, %d3, %0"
+   stp\\t%x1, %x3, %z0
+   stp\\t%d1, %d3, %z0"
   [(set_attr "type" "store_16,neon_store1_2reg")
(set_attr "arch" "*,fp")]
 )
@@ -1657,7 +1657,7 @@ (define_insn "store_pair_dw_tftf"
 plus_constant (Pmode,
XEXP (operands[0], 0),
GET_MODE_SIZE (TFmode)))"
-  "stp\\t%q1, %q3, %0"
+  "stp\\t%q1, %q3, %z0"
   [(set_attr "type" "neon_stp_q")
(set_attr "fp" "yes")]
 )
@@ -1790,7 +1790,7 @@ (define_insn "*load_pair_extendsidi2_aarch64"
plus_constant (Pmode,
   XEXP (operands[1], 0),
 

VEC_COND_EXPR optimizations v2

2020-08-05 Thread Marc Glisse

New version that passed bootstrap+regtest during the night.

When vector comparisons were forced to use vec_cond_expr, we lost a number of 
optimizations (my fault for not adding enough testcases to prevent that). 
This patch tries to unwrap vec_cond_expr a bit so some optimizations can 
still happen.


I wasn't planning to add all those transformations together, but adding one 
caused a regression, whose fix introduced a second regression, etc.


Restricting to constant folding would not be sufficient, we also need at 
least things like X|0 or X The transformations are quite conservative 
with :s and folding only if everything simplifies, we may want to relax 
this later. And of course we are going to miss things like a?b:c + a?c:b 
-> b+c.


In terms of number of operations, some transformations turning 2 
VEC_COND_EXPR into VEC_COND_EXPR + BIT_IOR_EXPR + BIT_NOT_EXPR might not look 
like a gain... I expect the bit_not disappears in most cases, and 
VEC_COND_EXPR looks more costly than a simpler BIT_IOR_EXPR.


I am a bit confused that with avx512 we get types like "vector(4) 
" with :2 and not :1 (is it a hack so true is 1 and not 
-1?), but that doesn't matter for this patch.


2020-08-05  Marc Glisse  

PR tree-optimization/95906
PR target/70314
* match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e),
(v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): New transformations.
(op (c ? a : b)): Update to match the new transformations.

* gcc.dg/tree-ssa/andnot-2.c: New file.
* gcc.dg/tree-ssa/pr95906.c: Likewise.
* gcc.target/i386/pr70314.c: Likewise.

--
Marc Glissediff --git a/gcc/match.pd b/gcc/match.pd
index a052c9e3dbc..f9297fcadbe 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3436,20 +3436,66 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (integer_zerop (@0))
@2)))
 
-/* Sink unary operations to constant branches, but only if we do fold it to
-   constants.  */
+#if GIMPLE
+/* Sink unary operations to branches, but only if we do fold both.  */
 (for op (negate bit_not abs absu)
  (simplify
-  (op (vec_cond @0 VECTOR_CST@1 VECTOR_CST@2))
-  (with
-   {
- tree cst1, cst2;
- cst1 = const_unop (op, type, @1);
- if (cst1)
-   cst2 = const_unop (op, type, @2);
-   }
-   (if (cst1 && cst2)
-(vec_cond @0 { cst1; } { cst2; })
+  (op (vec_cond:s @0 @1 @2))
+  (vec_cond @0 (op! @1) (op! @2
+
+/* Sink binary operation to branches, but only if we can fold it.  */
+(for op (tcc_comparison plus minus mult bit_and bit_ior bit_xor
+	 rdiv trunc_div ceil_div floor_div round_div
+	 trunc_mod ceil_mod floor_mod round_mod min max)
+/* (c ? a : b) op (c ? d : e)  -->  c ? (a op d) : (b op e) */
+ (simplify
+  (op (vec_cond:s @0 @1 @2) (vec_cond:s @0 @3 @4))
+  (vec_cond @0 (op! @1 @3) (op! @2 @4)))
+
+/* (c ? a : b) op d  -->  c ? (a op d) : (b op d) */
+ (simplify
+  (op (vec_cond:s @0 @1 @2) @3)
+  (vec_cond @0 (op! @1 @3) (op! @2 @3)))
+ (simplify
+  (op @3 (vec_cond:s @0 @1 @2))
+  (vec_cond @0 (op! @3 @1) (op! @3 @2
+#endif
+
+/* (v ? w : 0) ? a : b is just (v & w) ? a : b  */
+(simplify
+ (vec_cond (vec_cond:s @0 @3 integer_zerop) @1 @2)
+ (if (types_match (@0, @3))
+  (vec_cond (bit_and @0 @3) @1 @2)))
+(simplify
+ (vec_cond (vec_cond:s @0 integer_all_onesp @3) @1 @2)
+ (if (types_match (@0, @3))
+  (vec_cond (bit_ior @0 @3) @1 @2)))
+(simplify
+ (vec_cond (vec_cond:s @0 integer_zerop @3) @1 @2)
+ (if (types_match (@0, @3))
+  (vec_cond (bit_ior @0 (bit_not @3)) @2 @1)))
+(simplify
+ (vec_cond (vec_cond:s @0 @3 integer_all_onesp) @1 @2)
+ (if (types_match (@0, @3))
+  (vec_cond (bit_and @0 (bit_not @3)) @2 @1)))
+
+/* c1 ? c2 ? a : b : b  -->  (c1 & c2) ? a : b  */
+(simplify
+ (vec_cond @0 (vec_cond:s @1 @2 @3) @3)
+ (if (types_match (@0, @1))
+  (vec_cond (bit_and @0 @1) @2 @3)))
+(simplify
+ (vec_cond @0 @2 (vec_cond:s @1 @2 @3))
+ (if (types_match (@0, @1))
+  (vec_cond (bit_ior @0 @1) @2 @3)))
+(simplify
+ (vec_cond @0 (vec_cond:s @1 @2 @3) @2)
+ (if (types_match (@0, @1))
+  (vec_cond (bit_ior (bit_not @0) @1) @2 @3)))
+(simplify
+ (vec_cond @0 @3 (vec_cond:s @1 @2 @3))
+ (if (types_match (@0, @1))
+  (vec_cond (bit_and (bit_not @0) @1) @2 @3)))
 
 /* Simplification moved from fold_cond_expr_with_comparison.  It may also
be extended.  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/andnot-2.c b/gcc/testsuite/gcc.dg/tree-ssa/andnot-2.c
new file mode 100644
index 000..e0955ce3ffd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/andnot-2.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-forwprop3-raw -w -Wno-psabi" } */
+
+typedef long vec __attribute__((vector_size(16)));
+vec f(vec x){
+  vec y = x < 10;
+  return y & (y == 0);
+}
+
+/* { dg-final { scan-tree-dump-not "_expr" "forwprop3" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr95906.c b/gcc/testsuite/gcc.dg/tree-ssa/pr95906.c
new file mode 100644
index 000..3d820a58e93
--- /dev/null
+++ 

[PATCH] vect: Try smaller vector size when SLP split fails

2020-08-05 Thread Andrew Stubbs
This patch improves SLP performance in combination with some patches I 
have in development to add multiple vector sizes to amdgcn.


The problem is that amdgcn's preferred vector size has 64 lanes, and SLP 
does not support lane masking.  My patches will add smaller vector sizes 
(32, 16, 8, 4, 2) which make the lane masking implicit, but still SLP 
doesn't use them; it simply rejects the first size it sees and gives up.


This patch detects the rejection early and looks to see if there is a 
smaller, more suitable vector size.  The result is many more successful 
SLP testcases.


OK to commit? (I have an x86_64 bootstrap and test in progress.)

Andrew
vect: Try smaller vector size when SLP split fails

If the preferred vector size is larger than can be used then try again with
a smaller size.  This allows SLP to work more effectively on targets with very
large vectors.

gcc/ChangeLog:

	* tree-vect-slp.c (vect_analyze_slp_instance): Reduce vector size if
	the default mode is too large.

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 72192b5a813..95518a263c7 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -2367,6 +2367,16 @@ vect_analyze_slp_instance (vec_info *vinfo,
   for (i = 0; i < group_size; i++)
 	if (!matches[i]) break;
 
+  if (i > 1 && i < group_size && i < const_nunits && scalar_type)
+	{
+	  tree vec = get_vectype_for_scalar_type (vinfo, scalar_type, i);
+	  if (vec)
+	{
+	  nunits = TYPE_VECTOR_SUBPARTS (vec);
+	  const_nunits = nunits.to_constant ();
+	}
+	}
+
   if (i >= const_nunits && i < group_size)
 	{
 	  /* Split into two groups at the first vector boundary before i.  */


[PATCH] emit-rtl.c: Allow vector subreg of float vectors

2020-08-05 Thread Andrew Stubbs
This patch is a prerequisite for some patches I have to add multiple 
vector sizes on amdgcn.


The problem is that validate_subreg rejects things like this:

  (subreg:V32SF (reg:V64SF) 0)

These are commonly generated by the compiler. The integer equivalents 
work fine.


To be honest, I don't know why other targets have not encountered this 
problem before? Perhaps because most other targets require all vectors 
to have the same number of bits, meaning that float vectors have a fixed 
number of lanes? In my case, amdgcn can convert V64SFmode to V32SFmode 
by simply masking off some lanes.


OK to commit? (I have an x86_64 bootstrap and test in progress.)

Andrew

Allow vector subreg of float vectors

Integer vector subregs were already permitted.

gcc/ChangeLog:

	* emit-rtl.c (validate_subreg): Permit sections of float vectors.

diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index f9b0e9714d9..d7067989ad7 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -947,6 +947,11 @@ validate_subreg (machine_mode omode, machine_mode imode,
   else if (VECTOR_MODE_P (omode)
 	   && GET_MODE_INNER (omode) == GET_MODE_INNER (imode))
 ;
+  /* Allow sections of vectors, both smaller and paradoxical.  (This just
+ works for integers, but needs to be explicitly allowed for floats.)  */
+  else if (VECTOR_MODE_P (omode) && VECTOR_MODE_P (imode)
+	   && GET_MODE_INNER (omode) == GET_MODE_INNER (imode))
+;
   /* Subregs involving floating point modes are not allowed to
  change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
  (subreg:SI (reg:DF) 0) isn't.  */


Re: RFC: Monitoring old PRs, new dg directives

2020-08-05 Thread Marek Polacek via Gcc-patches
On Wed, Aug 05, 2020 at 08:58:40AM +0100, Richard Sandiford wrote:
> Marek Polacek via Gcc-patches  writes:
> > On Thu, Jul 30, 2020 at 11:54:23AM +0200, Jakub Jelinek via Gcc-patches 
> > wrote:
> >> On Tue, Jul 28, 2020 at 05:44:47PM -0400, Marek Polacek via Gcc-patches 
> >> wrote:
> >> > We will still have a surfeit of bugs that we've given short shrift to, 
> >> > but
> >> > let's at least automate what we can.  The initial addition of the 
> >> > relevant
> >> > old(-ish) tests won't of course happen automagically, but it's a price 
> >> > I'm
> >> > willing to pay.  My goal here isn't merely to reduce the number of open 
> >> > PRs;
> >> > it is to improve the testing of the compiler overall.
> >> > 
> >> > Thoughts?
> >> 
> >> Looks useful to me, but I'd think it might be desirable to use separate
> >> directories for those tests, so that it is more obvious that it is a
> >> different category of tests.  Now that we use git, just using git mv
> >> to move them to another place once they are fixed for good (together with
> >> some dg-* directive tweaks) wouldn't be that much work later.
> >> 
> >> So having gcc.dg/unfixed/ , g++.dg/unfixed/ , c-c++-common/unfixed/
> >> and their torture/ suffixed variants (or better directory name for those)?
> >
> > Thanks.  I was afraid that it would cause too much friction when you happen
> > to fix one of the unfixed tests: you will have to find the correct directory
> > to put the test in and perhaps even rename the test to avoid conflicts with
> > tests with the same name in the final destination.  But it's also true that
> > git is much better at moving files, and the extra clarity might be worth the
> > occasional hassle.  It would also make it easy to skip testing unfixed 
> > tests.
> > dg-ice tests are easy to spot/grep for, but accepts-invalid/rejects-valid 
> > are
> > a different story.
> 
> FWIW, I agree with Mike that it might be better to put tests in the
> location that they'd have normally.  Some reasons:
> 
> - As already mentioned, it'd give more stable names.  That's not an
>   issue for things like git log, but it does mean that comparing one
>   set of test results with another gives a meaningful XFAIL->PASS
>   transition rather than a “removed test” + “added test” transition.
>   It's also more convenient when comparing different branches.
> 
> - There are a lot of specialised .exp harnesses, and I don't think it
>   would be a good idea to encourage all of them to have unfixed/
>   variants, or unfixed/ subdirectories.  E.g. vect.exp is not something
>   we'd want to duplicate or subclass, but there are others.
> 
> - There's a temptation to remove empty directories/harnesses when
>   they have no associated tests, so we might oscillate between having
>   unfixed/ directories and not.  (The alternative is for the directory
>   or harness to stick around based on the fact that it was useful at
>   some point in the past.)
> 
> - It's inconsistent with existing tests that are already XFAILed for all
>   targets.  Having a separate directory for completely-XFAILed tests
>   might create a requirement (or the impression of a requirement) to
>   move tests to an unfixed/ directory when XFAILing them.
> 
> - IMO it draws an artificial distinction between tests that are
>   completely XFAILed on all targets and tests that are either partly
>   XFAILed for all targets, or XFAILed only for some targets.

Those are all great points.  I have to reconsider my position once again ;-).

Perhaps we should add an // UNFIXED comment then to easily disambiguate
between tests that are not fixed.  But I fear that people won't always
use this consistently.

Marek



Re: RFC: Monitoring old PRs, new dg directives

2020-08-05 Thread Marek Polacek via Gcc-patches
On Wed, Aug 05, 2020 at 09:04:53AM +0100, Richard Sandiford wrote:
> Marek Polacek  writes:
> > On Wed, Jul 29, 2020 at 09:40:35AM +0100, Richard Sandiford wrote:
> >> I guess there's a possibility that some tests happen to pass already
> >> on some targets.  That's more likely with middle-end and backend bugs
> >> rather than frontend stuff though.  Perhaps for those it would make
> >> sense to have a convention in which the failing testcase is restricted
> >> (at the whole-test level) to the targets that the person committing the
> >> testcase has actually tried.  Maybe with a comment on the dg-ice etc.
> >> to remind people to reconsider the main target selector when un-XFAILing
> >> the test.
> >
> > Interesting point.  With my frontend hat on, I hadn't really thought of
> > this much, but the dg-ice directive allows you to specify the targets and
> > specific options when to expect an ICE.  So you could run a test everywhere
> > but only expect an ICE on aarch64.
> 
> Yeah.  But the problem I was thinking of was: whoever adds the test
> will only test on a subset of targets.  If the test runs for all targets,
> the dg-ice condition has to be exact for all targets too.  Missing out
> one target will generate a new FAIL, while adding a target unnecessarily
> will generate an XPASS.  So I think the condition has to be applied at
> a whole-test level instead, unless the person committing the test is
> confident about which targets are and aren't affected.
> 
> (The same goes for other directives, dg-ice is just an example.)

Ah, got it.  Thanks for the explanation.

Marek



Re: RFC: Monitoring old PRs, new dg directives

2020-08-05 Thread Marek Polacek via Gcc-patches
On Tue, Aug 04, 2020 at 03:45:11PM -0700, Mike Stump wrote:
> On Aug 4, 2020, at 3:08 PM, Marek Polacek via Gcc-patches 
>  wrote:
> > 
> > That works well if you know where to expect an error.  But if you don't, 
> > it's
> > worse.  E.g.,
> > 
> > // { dg-xfail-if "" { *-*-* } }
> > int i = nothere; // demonstrates something that errors out
> > // { dg-error "" } didn't know where to put this
> > 
> > only prints unexpected failures, but no unexpected successes.  I guess 
> > that's
> > OK though, at least for now, so I'll drop dg-accepts-invalid.
> 
> There are two cases, either you get an error message that is wrong, and you 
> can use:
> 
>   strncpy (p, s, 60);   /* { dg-bogus "-Wstringop-truncation" } */  
> 
> or, you don't get an error, but you should:
> 
>   A foo(void i = 0);  // { dg-error "incomplete type|invalid use" }   
> 
> ?  Do you have an example of a specific case that doesn't work?  I'm not sure 
> I'm following.

Sorry for being unclear.  Say you have

// PR c++/55408

struct foo {
template
void bar();
};

template
void foo::bar() {}

int main()
{
extern int i;
foo {}.bar<>();
}

which we wrongly accept.  It might be unclear what line to put that dg-error
on.  So you put it at the end of the file.  Then when we start issuing an error
for the testcase, you will just get a FAIL, not an XPASS.  That might be
confusing if that file isn't in unfixed/.

Maybe it's not a real problem though -- upon inspection you'll find the
dg-error line and just tweak the testcase as needed.

Marek



Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-05 Thread H.J. Lu via Gcc-patches
On Wed, Aug 5, 2020 at 5:31 AM Richard Biener  wrote:
>
> On Wed, 5 Aug 2020, H.J. Lu wrote:
>
> > On Wed, Aug 5, 2020 at 12:06 AM Richard Biener  wrote:
> > >
> > > On Tue, 4 Aug 2020, H.J. Lu wrote:
> > >
> > > > On Tue, Aug 4, 2020 at 12:35 AM Richard Biener  
> > > > wrote:
> > > > >
> > > > > On Mon, 3 Aug 2020, Qing Zhao wrote:
> > > > >
> > > > > > Hi, Uros,
> > > > > >
> > > > > > Thanks a lot for your review on X86 parts.
> > > > > >
> > > > > > Hi, Richard,
> > > > > >
> > > > > > Could you please take a look at the middle-end part to see whether 
> > > > > > the
> > > > > > rewritten addressed your previous concern?
> > > > >
> > > > > I have a few comments below - I'm not sure I'm qualified to fully
> > > > > review the rest though.
> > > > >
> > > > > > Thanks a lot.
> > > > > >
> > > > > > Qing
> > > > > >
> > > > > >
> > > > > > > On Jul 31, 2020, at 12:57 PM, Uros Bizjak  
> > > > > > > wrote:
> > > > > > >
> > > > > > >
> > > > > > > 22:05, tor., 28. jul. 2020 je oseba Qing Zhao 
> > > > > > > mailto:qing.z...@oracle.com>> napisala:
> > > > > > > >
> > > > > > > >
> > > > > > > > Richard and Uros,
> > > > > > > >
> > > > > > > > Could you please review the change that H.J and I rewrote based 
> > > > > > > > on your comments in the previous round of discussion?
> > > > > > > >
> > > > > > > > This patch is a nice security enhancement for GCC that has been 
> > > > > > > > requested by security people for quite some time.
> > > > > > > >
> > > > > > > > Thanks a lot for your time.
> > > > > > >
> > > > > > > I'll be away from the keyboard for the next week, but the patch 
> > > > > > > needs a middle end approval first.
> > > > > > >
> > > > > > > That said, x86 parts looks OK.
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > > Uros.
> > > > > > > > Qing
> > > > > > > >
> > > > > > > > > On Jul 14, 2020, at 9:45 AM, Qing Zhao via Gcc-patches 
> > > > > > > > > mailto:gcc-patches@gcc.gnu.org>> 
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Hi, Gcc team,
> > > > > > > > >
> > > > > > > > > This patch is a follow-up on the previous patch and 
> > > > > > > > > corresponding discussion:
> > > > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html
> > > > > > > > >  
> > > > > > > > > 
> > > > > > > > >  
> > > > > > > > >  > > > > > > > >  
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > From the previous round of discussion, the major issues 
> > > > > > > > > raised were:
> > > > > > > > >
> > > > > > > > > A. should be rewritten by using regsets infrastructure.
> > > > > > > > > B. Put the patch into middle-end instead of x86 backend.
> > > > > > > > >
> > > > > > > > > This new patch is rewritten based on the above 2 comments.  
> > > > > > > > > The major changes compared to the previous patch are:
> > > > > > > > >
> > > > > > > > > 1. Change the names of the option and attribute from
> > > > > > > > > -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all]  
> > > > > > > > > and zero_caller_saved_regs("skip|used-gpr|all-gpr||used|all”)
> > > > > > > > > to:
> > > > > > > > > -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]   and  
> > > > > > > > > zero_call_used_regs("skip|used-gpr|all-gpr||used|all”)
> > > > > > > > > Add the new option and  new attribute in general.
> > > > > > > > > 2. The main code generation part is moved from i386 backend 
> > > > > > > > > to middle-end;
> > > > > > > > > 3. Add 4 target-hooks;
> > > > > > > > > 4. Implement these 4 target-hooks on i386 backend.
> > > > > > > > > 5. On a target that does not implement the target hook, issue 
> > > > > > > > > error for the new option, issue warning for the new attribute.
> > > > > > > > >
> > > > > > > > > The patch is as following:
> > > > > > > > >
> > > > > > > > > [PATCH] Add 
> > > > > > > > > -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> > > > > > > > > command-line option and
> > > > > > > > > zero_call_used_regs("skip|used-gpr|all-gpr||used|all") 
> > > > > > > > > function attribue:
> > > > > > > > >
> > > > > > > > >  1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
> > > > > > > > >
> > > > > > > > >  Don't zero call-used registers upon function return.
> > > > >
> > > > > Does a return via EH unwinding also constitute a function return?  I
> > > > > think you may want to have a finally handler or support in the 
> > > > > unwinder
> > > > > for this?  Then there's abnormal return via longjmp & friends, I guess
> > > > > there's nothing that can be done there besides patching glibc?
> > > >
> > > > Abnormal returns, like EH unwinding and longjmp, aren't covered by this
> > > > patch. Only normal returns are covered.
> > >
> > > What's the point then?  Also specifically thinking about spill slots.
> > >
> >
> > The goal of this patch 

Re: [PATCH V2] aarch64: Use Q-reg loads/stores in movmem expansion

2020-08-05 Thread Andreas Schwab
This breaks bootstrap.

during RTL pass: final
../../gcc/timevar.c: In member function ‘void 
timer::push_internal(timer::timevar_def*)’:
../../gcc/timevar.c:373:1: internal compiler error: output_operand: invalid 
expression as operand
  373 | }
  | ^
0xbabdff output_operand_lossage(char const*, ...)
../../gcc/final.c:3609
0xbac5bb output_addr_const(_IO_FILE*, rtx_def*)
../../gcc/final.c:4166
0xbac04f output_address(machine_mode, rtx_def*)
../../gcc/final.c:4067
0xbabf8b output_operand(rtx_def*, int)
../../gcc/final.c:4051
0xbac9ff output_asm_insn(char const*, rtx_def**)
../../gcc/final.c:3963
0xbb09b7 output_asm_insn(char const*, rtx_def**)
../../gcc/final.c:3840
0xbb09b7 final_scan_insn_1
../../gcc/final.c:3106
0xbb1163 final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
../../gcc/final.c:3152
0xbb127b final_1
../../gcc/final.c:2020
0xbb1f2b rest_of_handle_final
../../gcc/final.c:4658
0xbb1f2b execute
../../gcc/final.c:4736

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-05 Thread Richard Biener
On Wed, 5 Aug 2020, H.J. Lu wrote:

> On Wed, Aug 5, 2020 at 12:06 AM Richard Biener  wrote:
> >
> > On Tue, 4 Aug 2020, H.J. Lu wrote:
> >
> > > On Tue, Aug 4, 2020 at 12:35 AM Richard Biener  wrote:
> > > >
> > > > On Mon, 3 Aug 2020, Qing Zhao wrote:
> > > >
> > > > > Hi, Uros,
> > > > >
> > > > > Thanks a lot for your review on X86 parts.
> > > > >
> > > > > Hi, Richard,
> > > > >
> > > > > Could you please take a look at the middle-end part to see whether the
> > > > > rewritten addressed your previous concern?
> > > >
> > > > I have a few comments below - I'm not sure I'm qualified to fully
> > > > review the rest though.
> > > >
> > > > > Thanks a lot.
> > > > >
> > > > > Qing
> > > > >
> > > > >
> > > > > > On Jul 31, 2020, at 12:57 PM, Uros Bizjak  wrote:
> > > > > >
> > > > > >
> > > > > > 22:05, tor., 28. jul. 2020 je oseba Qing Zhao  > > > > > > napisala:
> > > > > > >
> > > > > > >
> > > > > > > Richard and Uros,
> > > > > > >
> > > > > > > Could you please review the change that H.J and I rewrote based 
> > > > > > > on your comments in the previous round of discussion?
> > > > > > >
> > > > > > > This patch is a nice security enhancement for GCC that has been 
> > > > > > > requested by security people for quite some time.
> > > > > > >
> > > > > > > Thanks a lot for your time.
> > > > > >
> > > > > > I'll be away from the keyboard for the next week, but the patch 
> > > > > > needs a middle end approval first.
> > > > > >
> > > > > > That said, x86 parts looks OK.
> > > > > >
> > > > > >
> > > > >
> > > > > > Uros.
> > > > > > > Qing
> > > > > > >
> > > > > > > > On Jul 14, 2020, at 9:45 AM, Qing Zhao via Gcc-patches 
> > > > > > > > mailto:gcc-patches@gcc.gnu.org>> 
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > Hi, Gcc team,
> > > > > > > >
> > > > > > > > This patch is a follow-up on the previous patch and 
> > > > > > > > corresponding discussion:
> > > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html 
> > > > > > > > 
> > > > > > > >  
> > > > > > > >  > > > > > > > >
> > > > > > > >
> > > > > > > > From the previous round of discussion, the major issues raised 
> > > > > > > > were:
> > > > > > > >
> > > > > > > > A. should be rewritten by using regsets infrastructure.
> > > > > > > > B. Put the patch into middle-end instead of x86 backend.
> > > > > > > >
> > > > > > > > This new patch is rewritten based on the above 2 comments.  The 
> > > > > > > > major changes compared to the previous patch are:
> > > > > > > >
> > > > > > > > 1. Change the names of the option and attribute from
> > > > > > > > -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all]  and 
> > > > > > > > zero_caller_saved_regs("skip|used-gpr|all-gpr||used|all”)
> > > > > > > > to:
> > > > > > > > -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]   and  
> > > > > > > > zero_call_used_regs("skip|used-gpr|all-gpr||used|all”)
> > > > > > > > Add the new option and  new attribute in general.
> > > > > > > > 2. The main code generation part is moved from i386 backend to 
> > > > > > > > middle-end;
> > > > > > > > 3. Add 4 target-hooks;
> > > > > > > > 4. Implement these 4 target-hooks on i386 backend.
> > > > > > > > 5. On a target that does not implement the target hook, issue 
> > > > > > > > error for the new option, issue warning for the new attribute.
> > > > > > > >
> > > > > > > > The patch is as following:
> > > > > > > >
> > > > > > > > [PATCH] Add 
> > > > > > > > -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> > > > > > > > command-line option and
> > > > > > > > zero_call_used_regs("skip|used-gpr|all-gpr||used|all") function 
> > > > > > > > attribue:
> > > > > > > >
> > > > > > > >  1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
> > > > > > > >
> > > > > > > >  Don't zero call-used registers upon function return.
> > > >
> > > > Does a return via EH unwinding also constitute a function return?  I
> > > > think you may want to have a finally handler or support in the unwinder
> > > > for this?  Then there's abnormal return via longjmp & friends, I guess
> > > > there's nothing that can be done there besides patching glibc?
> > >
> > > Abnormal returns, like EH unwinding and longjmp, aren't covered by this
> > > patch. Only normal returns are covered.
> >
> > What's the point then?  Also specifically thinking about spill slots.
> >
> 
> The goal of this patch is to zero caller-saved registers upon normal
> function return.  Abnormal returns and spill slots are outside of the
> scope of this patch.

Sure, I can write a patch that spills some regs, writes zeros to them
and then restores them.  And the patch will fulfil what it was designed
to do.

Still I need to come up with a reason that this is a useful feature
by its own 

Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-05 Thread H.J. Lu via Gcc-patches
On Wed, Aug 5, 2020 at 12:06 AM Richard Biener  wrote:
>
> On Tue, 4 Aug 2020, H.J. Lu wrote:
>
> > On Tue, Aug 4, 2020 at 12:35 AM Richard Biener  wrote:
> > >
> > > On Mon, 3 Aug 2020, Qing Zhao wrote:
> > >
> > > > Hi, Uros,
> > > >
> > > > Thanks a lot for your review on X86 parts.
> > > >
> > > > Hi, Richard,
> > > >
> > > > Could you please take a look at the middle-end part to see whether the
> > > > rewritten addressed your previous concern?
> > >
> > > I have a few comments below - I'm not sure I'm qualified to fully
> > > review the rest though.
> > >
> > > > Thanks a lot.
> > > >
> > > > Qing
> > > >
> > > >
> > > > > On Jul 31, 2020, at 12:57 PM, Uros Bizjak  wrote:
> > > > >
> > > > >
> > > > > 22:05, tor., 28. jul. 2020 je oseba Qing Zhao  > > > > > napisala:
> > > > > >
> > > > > >
> > > > > > Richard and Uros,
> > > > > >
> > > > > > Could you please review the change that H.J and I rewrote based on 
> > > > > > your comments in the previous round of discussion?
> > > > > >
> > > > > > This patch is a nice security enhancement for GCC that has been 
> > > > > > requested by security people for quite some time.
> > > > > >
> > > > > > Thanks a lot for your time.
> > > > >
> > > > > I'll be away from the keyboard for the next week, but the patch needs 
> > > > > a middle end approval first.
> > > > >
> > > > > That said, x86 parts looks OK.
> > > > >
> > > > >
> > > >
> > > > > Uros.
> > > > > > Qing
> > > > > >
> > > > > > > On Jul 14, 2020, at 9:45 AM, Qing Zhao via Gcc-patches 
> > > > > > > mailto:gcc-patches@gcc.gnu.org>> wrote:
> > > > > > >
> > > > > > > Hi, Gcc team,
> > > > > > >
> > > > > > > This patch is a follow-up on the previous patch and corresponding 
> > > > > > > discussion:
> > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html 
> > > > > > >  
> > > > > > >  > > > > > > >
> > > > > > >
> > > > > > > From the previous round of discussion, the major issues raised 
> > > > > > > were:
> > > > > > >
> > > > > > > A. should be rewritten by using regsets infrastructure.
> > > > > > > B. Put the patch into middle-end instead of x86 backend.
> > > > > > >
> > > > > > > This new patch is rewritten based on the above 2 comments.  The 
> > > > > > > major changes compared to the previous patch are:
> > > > > > >
> > > > > > > 1. Change the names of the option and attribute from
> > > > > > > -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all]  and 
> > > > > > > zero_caller_saved_regs("skip|used-gpr|all-gpr||used|all”)
> > > > > > > to:
> > > > > > > -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]   and  
> > > > > > > zero_call_used_regs("skip|used-gpr|all-gpr||used|all”)
> > > > > > > Add the new option and  new attribute in general.
> > > > > > > 2. The main code generation part is moved from i386 backend to 
> > > > > > > middle-end;
> > > > > > > 3. Add 4 target-hooks;
> > > > > > > 4. Implement these 4 target-hooks on i386 backend.
> > > > > > > 5. On a target that does not implement the target hook, issue 
> > > > > > > error for the new option, issue warning for the new attribute.
> > > > > > >
> > > > > > > The patch is as following:
> > > > > > >
> > > > > > > [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> > > > > > > command-line option and
> > > > > > > zero_call_used_regs("skip|used-gpr|all-gpr||used|all") function 
> > > > > > > attribue:
> > > > > > >
> > > > > > >  1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
> > > > > > >
> > > > > > >  Don't zero call-used registers upon function return.
> > >
> > > Does a return via EH unwinding also constitute a function return?  I
> > > think you may want to have a finally handler or support in the unwinder
> > > for this?  Then there's abnormal return via longjmp & friends, I guess
> > > there's nothing that can be done there besides patching glibc?
> >
> > Abnormal returns, like EH unwinding and longjmp, aren't covered by this
> > patch. Only normal returns are covered.
>
> What's the point then?  Also specifically thinking about spill slots.
>

The goal of this patch is to zero caller-saved registers upon normal
function return.  Abnormal returns and spill slots are outside of the
scope of this patch.

-- 
H.J.


[PATCH] refactor LIM a bit

2020-08-05 Thread Richard Biener
This refactors LIM to eschew alloc_aux_for_edges and re-uses the RPO
order of the move_computations walk for invariantness computation as well.
It also removes one unnecessary sorting (but retaining it as checking
code because we bsearch the vector) and moves edge insert commit code
to the place where it doesn't have to scan all the functions edges.

This was all done when investigating whether LIM can be refactored
to work on a specific loop for on-demand processing (but we're not
there yet).

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2020-08-05  Richard Biener  

* tree-ssa-loop-im.c (invariantness_dom_walker): Remove.
(invariantness_dom_walker::before_dom_children): Move to ...
(compute_invariantness): ... this function.
(move_computations): Inline ...
(tree_ssa_lim): ... here, share RPO order and avoid some
cfun references.
(analyze_memory_references): Remove sorting of location
lists, instead assert they are sorted already when checking.
(prev_flag_edges): Remove.
(execute_sm_if_changed): Pass down and adjust prev edge state.
(execute_sm_exit): Likewise.
(hoist_memory_references): Likewise.  Commit edge insertions
of each processed exit.
(store_motion_loop): Do not commit edge insertions on all
edges in the function.
(tree_ssa_lim_initialize): Do not call alloc_aux_for_edges.
(tree_ssa_lim_finalize): Do not call free_aux_for_edges.
---
 gcc/tree-ssa-loop-im.c | 153 -
 1 file changed, 58 insertions(+), 95 deletions(-)

diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index d33f5335e2b..35da1fb26a6 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -37,7 +37,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-loop.h"
 #include "tree-into-ssa.h"
 #include "cfgloop.h"
-#include "domwalk.h"
 #include "tree-affine.h"
 #include "tree-ssa-propagate.h"
 #include "trans-mem.h"
@@ -970,25 +969,12 @@ rewrite_bittest (gimple_stmt_iterator *bsi)
   return stmt;
 }
 
-/* For each statement determines the outermost loop in that it is invariant,
-   -   statements on whose motion it depends and the cost of the computation.
-   -   This information is stored to the LIM_DATA structure associated with
-   -   each statement.  */
-class invariantness_dom_walker : public dom_walker
-{
-public:
-  invariantness_dom_walker (cdi_direction direction)
-: dom_walker (direction) {}
-
-  virtual edge before_dom_children (basic_block);
-};
-
 /* Determine the outermost loops in that statements in basic block BB are
-   invariant, and record them to the LIM_DATA associated with the statements.
-   Callback for dom_walker.  */
+   invariant, and record them to the LIM_DATA associated with the
+   statements.  */
 
-edge
-invariantness_dom_walker::before_dom_children (basic_block bb)
+static void
+compute_invariantness (basic_block bb)
 {
   enum move_pos pos;
   gimple_stmt_iterator bsi;
@@ -998,7 +984,7 @@ invariantness_dom_walker::before_dom_children (basic_block 
bb)
   struct lim_aux_data *lim_data;
 
   if (!loop_outer (bb->loop_father))
-return NULL;
+return;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
 fprintf (dump_file, "Basic block %d (loop %d -- depth %d):\n\n",
@@ -1122,7 +1108,6 @@ invariantness_dom_walker::before_dom_children 
(basic_block bb)
   if (lim_data->cost >= LIM_EXPENSIVE)
set_profitable_level (stmt);
 }
-  return NULL;
 }
 
 /* Hoist the statements in basic block BB out of the loops prescribed by
@@ -1289,28 +1274,6 @@ move_computations_worker (basic_block bb)
   return todo;
 }
 
-/* Hoist the statements out of the loops prescribed by data stored in
-   LIM_DATA structures associated with each statement.*/
-
-static unsigned int
-move_computations (void)
-{
-  int *rpo = XNEWVEC (int, last_basic_block_for_fn (cfun));
-  int n = pre_and_rev_post_order_compute_fn (cfun, NULL, rpo, false);
-  unsigned todo = 0;
-
-  for (int i = 0; i < n; ++i)
-todo |= move_computations_worker (BASIC_BLOCK_FOR_FN (cfun, rpo[i]));
-
-  free (rpo);
-
-  gsi_commit_edge_inserts ();
-  if (need_ssa_update_p (cfun))
-rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa);
-
-  return todo;
-}
-
 /* Checks whether the statement defining variable *INDEX can be hoisted
out of the loop passed in DATA.  Callback for for_each_index.  */
 
@@ -1678,7 +1641,9 @@ analyze_memory_references (void)
  bb_loop_postorder);
 
   /* Visit blocks in loop postorder and assign mem-ref IDs in that order.
- That results in better locality for all the bitmaps.  */
+ That results in better locality for all the bitmaps.  It also
+ automatically sorts the location list of gathered memory references
+ after their loop postorder number allowing to binary-search it.  */
   for (i = 0; i < n; ++i)
 {
   basic_block bb = bbs[i];
@@ 

RE: [PATCH 2/5][Arm] New pattern for CSINV instructions

2020-08-05 Thread Omar Tahir
Hi Kyrill,

> -/* Only thumb1 can't support conditional execution, so return true if
> -   the target is not thumb1.  */
> static bool
> 
> 
> Functions should have comments in GCC. Can you please write something 
> describing the new logic of the function.
> 
> arm_have_conditional_execution (void)
> {
> -  return !TARGET_THUMB1;
> +  bool has_cond_exec, enable_ifcvt_trans;
> +
> +  /* Only THUMB1 cannot support conditional execution. */
> +  has_cond_exec = !TARGET_THUMB1;
> +
> +  /* When TARGET_COND_ARITH is defined we'd like to turn on some ifcvt
> + transformations before reload. */
> +  enable_ifcvt_trans = TARGET_COND_ARITH && !reload_completed;
> +
> +  /* The ifcvt transformations are only turned on if we return false. */
> +  return has_cond_exec && !enable_ifcvt_trans;
> 
> I don't think that comment is very useful. Perhaps "Enable ifcvt 
> transformations only if..."
> 
> }

Fixed, let me know if the new comments are a bit clearer now.

> +(define_constraint "Z"
> +  "@internal
> +   Integer constant zero."
> +  (match_test "op == const0_rtx"))
> 
> 
> We're usually wary of adding more constraints unless necessary as it gets 
> complicated to read patterns quickly (especially once we get into 
> multi-letter constraints).
> I think you can reuse the existing "Pz" constraint for your purposes.

Yes Pz works, I'll replace Z with Pz in the other patches as well. In patch 5 I 
introduce UM (-1) and U1 (1), I don't think there's any existing combination of 
constraints that can be used instead.

> 
> Ok with those changes.
> If you'd like to commit it yourself please apply for write access at 
> https://sourceware.org/cgi-bin/pdw/ps_form.cgi listing my email address from 
> MAINTAINERS as the approver.

Excellent, thanks. If the other three patches are okay I'll commit them as well?

Thanks,
Omar

---

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index dac9a6fb5c4..e1bb2db9c8a 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -29833,12 +29833,23 @@ arm_frame_pointer_required (void)
   return false;
 }
 
-/* Only thumb1 can't support conditional execution, so return true if
-   the target is not thumb1.  */
+/* Implement the TARGET_HAVE_CONDITIONAL_EXECUTION hook.
+   All modes except THUMB1 have conditional execution.
+   If we have conditional arithmetic, return false before reload to
+   enable some ifcvt transformations. */
 static bool
 arm_have_conditional_execution (void)
 {
-  return !TARGET_THUMB1;
+  bool has_cond_exec, enable_ifcvt_trans;
+
+  /* Only THUMB1 cannot support conditional execution. */
+  has_cond_exec = !TARGET_THUMB1;
+
+  /* Enable ifcvt transformations if we have conditional arithmetic, but only
+ before reload. */
+  enable_ifcvt_trans = TARGET_COND_ARITH && !reload_completed;
+
+  return has_cond_exec && !enable_ifcvt_trans;
 }
 
 /* The AAPCS sets the maximum alignment of a vector to 64 bits.  */
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 30e1d6dc994..d67c91796e4 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -177,6 +177,10 @@ emission of floating point pcs attributes.  */
 
 #define TARGET_CRC32   (arm_arch_crc)
 
+/* Thumb-2 but also has some conditional arithmetic instructions like csinc,
+   csinv, etc. */
+#define TARGET_COND_ARITH  (arm_arch8_1m_main)
+
 /* The following two macros concern the ability to execute coprocessor
instructions for VFPv3 or NEON.  TARGET_VFP3/TARGET_VFPD32 are currently
only ever tested when we know we are generating for VFP hardware; we need
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 981eec520ba..2144520829c 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -485,6 +485,18 @@
   (and (match_operand 0 "expandable_comparison_operator")
(match_test "maybe_get_arm_condition_code (op) != ARM_NV")))
 
+(define_special_predicate "arm_comparison_operation"
+  (match_code "eq,ne,le,lt,ge,gt,geu,gtu,leu,ltu,unordered,
+ ordered,unlt,unle,unge,ungt")
+{
+  if (XEXP (op, 1) != const0_rtx)
+return false;
+  rtx op0 = XEXP (op, 0);
+  if (!REG_P (op0) || REGNO (op0) != CC_REGNUM)
+return false;
+  return maybe_get_arm_condition_code (op) != ARM_NV;
+})
+
 (define_special_predicate "lt_ge_comparison_operator"
   (match_code "lt,ge"))
 
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 793f6706868..ecc903970db 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -938,6 +938,20 @@
(set_attr "type" "multiple")]
 )
 
+(define_insn "*thumb2_csinv"
+  [(set (match_operand:SI 0 "arm_general_register_operand" "=r, r")
+  (if_then_else:SI
+(match_operand 1 "arm_comparison_operation" "")
+(not:SI (match_operand:SI 2 "arm_general_register_operand" "r, r"))
+(match_operand:SI 3 "reg_or_zero_operand" "r, Pz")))]
+  "TARGET_COND_ARITH"
+  "@
+   csinv\\t%0, %3, %2, %D1
+   csinv\\t%0, zr, %2, %D1"
+  [(set_attr 

[PATCH] Make genmatch transform failure handling more consistent

2020-08-05 Thread Richard Biener
Currently whether a fail during the transform stage is fatal or
whether following patterns are still considers is a bit random
depending on whether the pattern is wrapped in a for for example.
The follwing makes it consistent by replacing early returns with
gotos to the end of the pattern processing.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2020-08-05  Richard Biener  

* genmatch.c (fail_label): New global.
(expr::gen_transform): Branch to fail_label instead of
returning.  Fix indent of call argument checking.
(dt_simplify::gen_1): Compute and emit fail_label, branch
to it instead of returning early.
---
 gcc/genmatch.c | 44 +---
 1 file changed, 29 insertions(+), 15 deletions(-)

diff --git a/gcc/genmatch.c b/gcc/genmatch.c
index 4e13bc51579..107f6f2ec9e 100644
--- a/gcc/genmatch.c
+++ b/gcc/genmatch.c
@@ -2352,6 +2352,10 @@ capture_info::walk_c_expr (c_expr *e)
 }
 
 
+/* The current label failing the current matched pattern during
+   code generation.  */
+static char *fail_label;
+
 /* Code generation off the decision tree and the refered AST nodes.  */
 
 bool
@@ -2527,8 +2531,8 @@ expr::gen_transform (FILE *f, int indent, const char 
*dest, bool gimple,
  "_r%d = maybe_push_res_to_seq (_op, %s);\n", depth,
  !force_leaf ? "lseq" : "NULL");
   fprintf_indent (f, indent,
- "if (!_r%d) return false;\n",
- depth);
+ "if (!_r%d) goto %s;\n",
+ depth, fail_label);
   if (*opr == CONVERT_EXPR)
{
  indent -= 4;
@@ -2560,7 +2564,7 @@ expr::gen_transform (FILE *f, int indent, const char 
*dest, bool gimple,
   if (opr->kind != id_base::CODE)
{
  fprintf_indent (f, indent, "  if (!_r%d)\n", depth);
- fprintf_indent (f, indent, "return NULL_TREE;\n");
+ fprintf_indent (f, indent, "goto %s;\n", fail_label);
  fprintf_indent (f, indent, "}\n");
}
   if (*opr == CONVERT_EXPR)
@@ -3068,12 +3072,12 @@ dt_node::gen_kids_1 (FILE *f, int indent, bool gimple, 
int depth,
  /* We need to be defensive against bogus prototypes allowing
 calls with not enough arguments.  */
  fprintf_indent (f, indent,
- "  if (gimple_call_num_args (_c%d) == %d)\n"
- "{\n", depth, e->ops.length ());
+ "  if (gimple_call_num_args (_c%d) == %d)\n",
+ depth, e->ops.length ());
+ fprintf_indent (f, indent, "{\n");
  fns[i]->gen (f, indent + 6, true, depth);
- fprintf_indent (f, indent,
- "}\n"
- "  break;\n");
+ fprintf_indent (f, indent, "}\n");
+ fprintf_indent (f, indent, "  break;\n");
}
 
  fprintf_indent (f, indent, "default:;\n");
@@ -3278,6 +3282,11 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, 
operand *result)
}
 }
 
+  static unsigned fail_label_cnt;
+  char local_fail_label[256];
+  snprintf (local_fail_label, 256, "next_after_fail%u", ++fail_label_cnt);
+  fail_label = local_fail_label;
+
   /* Analyze captures and perform early-outs on the incoming arguments
  that cover cases we cannot handle.  */
   capture_info cinfo (s, result, gimple);
@@ -3289,8 +3298,8 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, 
operand *result)
if (cinfo.force_no_side_effects & (1 << i))
  {
fprintf_indent (f, indent,
-   "if (TREE_SIDE_EFFECTS (_p%d)) return 
NULL_TREE;\n",
-   i);
+   "if (TREE_SIDE_EFFECTS (_p%d)) goto %s;\n",
+   i, fail_label);
if (verbose >= 1)
  warning_at (as_a  (s->match)->ops[i]->location,
  "forcing toplevel operand to have no "
@@ -3305,7 +3314,7 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, 
operand *result)
  {
fprintf_indent (f, indent,
"if (TREE_SIDE_EFFECTS (captures[%d])) "
-   "return NULL_TREE;\n", i);
+   "goto %s;\n", i, fail_label);
if (verbose >= 1)
  warning_at (cinfo.info[i].c->location,
  "forcing captured operand to have no "
@@ -3348,8 +3357,7 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, 
operand *result)
 }
 
   if (s->kind == simplify::SIMPLIFY)
-fprintf_indent (f, indent, "if (__builtin_expect (!dbg_cnt (match), 0)) 
return %s;\n",
-   gimple ? "false" : "NULL_TREE");
+fprintf_indent (f, indent, "if (__builtin_expect (!dbg_cnt 

RE: [PATCH] aarch64: Add A64FX machine model

2020-08-05 Thread Qian, Jianhua
Hi Richard

> Would you like the patch to be backported further than GCC 10?
That is good if it could be backported to GCC9.

> Does the attached patch to document the addition to GCC 10.3 look OK?
To avoid misunderstanding, could you please remove "for a 512-bit vector 
length"?

If there is anything I can do, please let me know.

Regards,
Qian

Richard Sandiford  writes:
>Qian Jianhua  writes:
>> This patch add support for Fujitsu A64FX, as the first step of adding 
>> A64FX machine model.
>> 
>> A64FX is used in FUJITSU Supercomputer PRIMEHPC FX1000, PRIMEHPC 
>> FX700, and supercomputer Fugaku.
>> The official microarchitecture information of A64FX can be read at 
>> https://github.com/fujitsu/A64FX.
>> 
>> Changelog:
>> 2020-08-03 Qian jianhua 
>> 
>> * config/aarch64/aarch64-cores.def: Add the chip name.
>> * config/aarch64/aarch64-tune.md: Regenerated.
>> * config/aarch64/aarch64.c: Add tuning table for the chip.
>> * doc/invoke.texi: Add the new name to the list.
>> 
>> Test results:
>> * Bootstrap on aarch64 --- [OK]
>> * Regression tests --- [OK]
>> * Compile with -mcpu=a64fx --- [OK]
>
>Thanks for doing this, looks great.  Pushed to trunk and the GCC 10 branch.
>
>Would you like the patch to be backported further than GCC 10?  I wasn't sure 
>whether GCC 9 and earlier would be useful, given that those releases didn't 
>support the ACLE and were missing optimisations that went into GCC 10.
>
>Very minor, but I tweaked the changelog entry slightly to:
>
>2020-08-03  Qian jianhua  
>
>gcc/
>* config/aarch64/aarch64-cores.def (a64fx): New core.
>* config/aarch64/aarch64-tune.md: Regenerated.
>* config/aarch64/aarch64.c (a64fx_prefetch_tune, a64fx_tunings): New.
>* doc/invoke.texi: Add a64fx to the list.
>
>before committing.  The changelog entries are automatically applied to files 
>like gcc/ChangeLog on a nightly basis, and doing that would lose the context 
>in the covering message about which chip the patch is supporting.
>
>Does the attached patch to document the addition to GCC 10.3 look OK?
>We'll need something similar for GCC 11, but personally I tend to prefer 
>adding the notes closer to the release.
>
>Thanks,
>Richard




[PATCH][GCC-10 Backport] arm: Enable no-writeback vldr.16/vstr.16.

2020-08-05 Thread Joe Ramsay
From: Joe Ramsay 

Hi,

There was previously no way to specify that a register operand cannot
have any writeback modifiers, and as a result the argument to vldr.16
and vstr.16 could be erroneously output with post-increment. This
change adds a constraint which forbids all writeback, and
selects it in the relevant case for vldr.16 and vstr.16

Bootstrapped on arm-none-eabi. Patch backports cleanly onto gcc-10
branch with no regressions. OK for gcc-10 branch?

Thanks,
Joe

gcc/ChangeLog:

2020-08-04  Joe Ramsay  

Backported from master
2020-05-20  Joe Ramsay  

* config/arm/arm-protos.h (arm_coproc_mem_operand_no_writeback):
Declare prototype.
(arm_mve_mode_and_operands_type_check): Declare prototype.
* config/arm/arm.c (arm_coproc_mem_operand): Refactor to use
_arm_coproc_mem_operand.
(arm_coproc_mem_operand_wb): New function to cover full, limited
and no writeback.
(arm_coproc_mem_operand_no_writeback): New constraint for memory
operand with no writeback.
(arm_print_operand): Extend 'E' specifier for memory operand
that does not support writeback.
(arm_mve_mode_and_operands_type_check): New constraint check for
MVE memory operands.
* config/arm/constraints.md: Add Uj constraint for VFP vldr.16
and vstr.16.
* config/arm/vfp.md (*mov_load_vfp_hf16): New pattern for
vldr.16.
(*mov_store_vfp_hf16): New pattern for vstr.16.
(*mov_vfp_16): Remove MVE moves.

gcc/testsuite/ChangeLog:

2020-08-04  Joe Ramsay  

Backported from master
2020-05-20  Joe Ramsay  

* gcc.target/arm/mve/intrinsics/mve-vldstr16-no-writeback.c: New test.
---
 gcc/config/arm/arm-protos.h|  3 +
 gcc/config/arm/arm.c   | 74 ++
 gcc/config/arm/constraints.md  |  7 ++
 gcc/config/arm/vfp.md  | 26 +---
 .../arm/mve/intrinsics/mve-vldstr16-no-writeback.c | 17 +
 5 files changed, 105 insertions(+), 22 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/mve-vldstr16-no-writeback.c

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 33d162c..e811da4 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -115,8 +115,11 @@ extern enum reg_class coproc_secondary_reload_class 
(machine_mode, rtx,
 extern bool arm_tls_referenced_p (rtx);
 
 extern int arm_coproc_mem_operand (rtx, bool);
+extern int arm_coproc_mem_operand_no_writeback (rtx);
+extern int arm_coproc_mem_operand_wb (rtx, int);
 extern int neon_vector_mem_operand (rtx, int, bool);
 extern int mve_vector_mem_operand (machine_mode, rtx, bool);
+bool arm_mve_mode_and_operands_type_check (machine_mode, rtx, rtx);
 extern int neon_struct_mem_operand (rtx);
 
 extern rtx *neon_vcmla_lane_prepare_operands (rtx *);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index a8825ee..d8da167 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -13192,13 +13192,14 @@ neon_element_bits (machine_mode mode)
 /* Predicates for `match_operand' and `match_operator'.  */
 
 /* Return TRUE if OP is a valid coprocessor memory address pattern.
-   WB is true if full writeback address modes are allowed and is false
+   WB level is 2 if full writeback address modes are allowed, 1
if limited writeback address modes (POST_INC and PRE_DEC) are
-   allowed.  */
+   allowed and 0 if no writeback at all is supported.  */
 
 int
-arm_coproc_mem_operand (rtx op, bool wb)
+arm_coproc_mem_operand_wb (rtx op, int wb_level)
 {
+  gcc_assert (wb_level == 0 || wb_level == 1 || wb_level == 2);
   rtx ind;
 
   /* Reject eliminable registers.  */
@@ -13231,16 +13232,18 @@ arm_coproc_mem_operand (rtx op, bool wb)
 
   /* Autoincremment addressing modes.  POST_INC and PRE_DEC are
  acceptable in any case (subject to verification by
- arm_address_register_rtx_p).  We need WB to be true to accept
+ arm_address_register_rtx_p).  We need full writeback to accept
+ PRE_INC and POST_DEC, and at least restricted writeback for
  PRE_INC and POST_DEC.  */
-  if (GET_CODE (ind) == POST_INC
-  || GET_CODE (ind) == PRE_DEC
-  || (wb
- && (GET_CODE (ind) == PRE_INC
- || GET_CODE (ind) == POST_DEC)))
+  if (wb_level > 0
+  && (GET_CODE (ind) == POST_INC
+ || GET_CODE (ind) == PRE_DEC
+ || (wb_level > 1
+ && (GET_CODE (ind) == PRE_INC
+ || GET_CODE (ind) == POST_DEC
 return arm_address_register_rtx_p (XEXP (ind, 0), 0);
 
-  if (wb
+  if (wb_level > 1
   && (GET_CODE (ind) == POST_MODIFY || GET_CODE (ind) == PRE_MODIFY)
   && arm_address_register_rtx_p (XEXP (ind, 0), 0)
   && GET_CODE (XEXP (ind, 1)) == PLUS
@@ -13262,6 +13265,25 @@ arm_coproc_mem_operand (rtx op, bool wb)
   return FALSE;
 }
 
+/* Return TRUE if OP is a 

[PATCH] testsuite: Update some vect cases for partial vectors

2020-08-05 Thread Kewen.Lin via Gcc-patches
Hi,

This patch is to adjust some existing vectorization test cases
to stay well with the newly introduced partial vector usages.

Bootstrapped/regtested on aarch64-linux-gnu and powerpc64le-linux-gnu
P9 (with explicit param vect-partial-vector-usage=1 and enablement on
check_effective_target_vect_partial_vectors_usage_1 check).

Is it ok for trunk?

BR,
Kewen
-

gcc/testsuite/ChangeLog:

* gcc.dg/vect/bb-slp-pr69907.c: Adjust for partial vector usages.
* gcc.dg/vect/slp-3.c: Likewise.
* gcc.dg/vect/slp-multitypes-11.c: Likewise.
* gcc.dg/vect/slp-perm-1.c: Likewise.
* gcc.dg/vect/slp-perm-5.c: Likewise.
* gcc.dg/vect/slp-perm-6.c: Likewise.
* gcc.dg/vect/slp-perm-7.c: Likewise.
* gcc.dg/vect/slp-perm-8.c: Likewise.
* gcc.dg/vect/slp-perm-9.c: Likewise.
* gcc.dg/vect/vect-version-2.c: Likewise.
* lib/target-supports.exp
(check_effective_target_vect_partial_vectors): New function.
(check_effective_target_vect_partial_vectors_usage_1): Likewise.
(check_effective_target_vect_partial_vectors_usage_2): Likewise.
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c
index fe52d18525a..b348526b62f 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr69907.c
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-O3" } */
+/* Disable for vectorization using partial vectors since it would have only
+   one iteration left, consequently BB vectorization won't happen.  */
+/* { dg-additional-options "-O3 --param=vect-partial-vector-usage=0" } */
 /* { dg-require-effective-target vect_unpack } */
 
 #include "tree-vect.h"
diff --git a/gcc/testsuite/gcc.dg/vect/slp-3.c 
b/gcc/testsuite/gcc.dg/vect/slp-3.c
index 5e40499ff96..46ab584419a 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-3.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-3.c
@@ -141,8 +141,8 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target { 
! vect_fully_masked } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect" { target 
vect_fully_masked } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { 
target { ! vect_fully_masked } } } }*/
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { 
target vect_fully_masked } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target { 
! vect_partial_vectors } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect" { target 
vect_partial_vectors } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { 
target { ! vect_partial_vectors } } } }*/
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { 
target vect_partial_vectors } } } */
   
diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-11.c 
b/gcc/testsuite/gcc.dg/vect/slp-multitypes-11.c
index 5200ed1cd94..da6fb12eb0d 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-11.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-11.c
@@ -48,6 +48,9 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target 
vect_unpack } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  { 
target vect_unpack xfail { vect_variable_length && vect_load_lanes } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { 
vect_unpack && {! vect_partial_vectors_usage_1 } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { 
vect_unpack && { vect_partial_vectors_usage_1 } } } } } */
+/* The epilogues are vectorized using partial vectors.  */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  { 
target { vect_unpack && {! vect_partial_vectors_usage_1 } } xfail { 
vect_variable_length && vect_load_lanes } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect"  { 
target { vect_unpack && { vect_partial_vectors_usage_1 } } xfail { 
vect_variable_length && vect_load_lanes } } } } */
   
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-1.c 
b/gcc/testsuite/gcc.dg/vect/slp-perm-1.c
index ca7803ec1a9..af6fe08856f 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-1.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-1.c
@@ -80,8 +80,10 @@ int main (int argc, const char* argv[])
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target 
vect_perm } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { 
target { vect_perm3_int && {! vect_load_lanes } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { 
target { vect_perm3_int && { {! vect_load_lanes } && {! 
vect_partial_vectors_usage_1 } } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts 

Re: [PATCH] vect: Skip epilogue loops for dbgcnt check [PR96451]

2020-08-05 Thread Richard Biener via Gcc-patches
On Wed, Aug 5, 2020 at 9:48 AM Kewen.Lin  wrote:
>
> Hi,
>
> As the PR shows, commit r11-2453 exposed one issue that vectorizer
> wants to vectorize the epilogue loop and leaves the if-cvt body there,
> but later dbgcnt check disables it, the left scalar mask_store
> statement cause ICE.
>
> As Richard pointed out in that PR, the dbgcnt is to count original
> scalar loops, so this fix is to make it skip the epilogue loops.
>
> Testing just launched, is it OK for trunk if everthing goes well?

OK.

Thanks,
Richard.

> BR,
> Kewen
> -
>
> gcc/ChangeLog:
>
> * tree-vectorizer.c (try_vectorize_loop_1): Skip the epilogue loops
> for dbgcnt check.
>
> -
> diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
> index 26a184696aa..1e9f46148ca 100644
> --- a/gcc/tree-vectorizer.c
> +++ b/gcc/tree-vectorizer.c
> @@ -1066,7 +1066,8 @@ try_vectorize_loop_1 (hash_table 
> *_to_vf_htab,
>return ret;
>  }
>
> -  if (!dbg_cnt (vect_loop))
> +  /* Only count the original scalar loops.  */
> +  if (!LOOP_VINFO_EPILOGUE_P (loop_vinfo) && !dbg_cnt (vect_loop))
>  {
>/* Free existing information if loop is analyzed with some
>  assumptions.  */


[committed] openmp: Handle even some combined non-rectangular loops

2020-08-05 Thread Jakub Jelinek via Gcc-patches
Hi!

The number of loops computation and logical iteration -> actual iterator values
computations can now be done separately even on composite constructs (though
for triangular loops it would still be more efficient to propagate a few values
through, will handle that incrementally).
simd and taskloop are still unhandled.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2020-08-05  Jakub Jelinek  

* omp-expand.c (expand_omp_for): Don't disallow combined non-rectangular
loops.

* testsuite/libgomp.c/loop-22.c: New test.
* testsuite/libgomp.c/loop-23.c: New test.

--- gcc/omp-expand.c.jj 2020-08-04 13:05:18.522759444 +0200
+++ gcc/omp-expand.c2020-08-04 16:58:53.834538817 +0200
@@ -7640,11 +7640,6 @@ expand_omp_for (struct omp_region *regio
   else if (fd.sched_kind == OMP_CLAUSE_SCHEDULE_STATIC
   && !fd.have_ordered)
 {
-  if (fd.non_rect
- && (gimple_omp_for_combined_into_p (fd.for_stmt)
- || gimple_omp_for_combined_p (fd.for_stmt)))
-   sorry_at (gimple_location (fd.for_stmt),
- "non-rectangular OpenMP loops not supported yet");
   if (fd.chunk_size == NULL)
expand_omp_for_static_nochunk (region, , inner_stmt);
   else
--- libgomp/testsuite/libgomp.c/loop-22.c.jj2020-08-04 16:04:03.471418037 
+0200
+++ libgomp/testsuite/libgomp.c/loop-22.c   2020-08-04 16:08:06.827891401 
+0200
@@ -0,0 +1,189 @@
+/* { dg-do run } */
+
+extern void abort (void);
+
+signed char v[5][7][9][21][4][42][3];
+volatile int zero = 0, one = 1, two = 2, three = 3;
+volatile int five = 5, seven = 7, nine = 9, eleven = 11;
+
+int
+main ()
+{
+  for (int i = 0; i < 5; i++)
+  for (int j = 0; j < 7; j++)
+  for (int k = 0; k < 9; k++)
+  for (int l = 2 * j; l < 3 * j; l++)
+  for (int m = 7; m < 11; m++)
+  for (int n = l; n < 2 * l; n++)
+  for (int o = 0; o < 3; o++)
+v[i][j][k][l][m - 7][n][o] = 1;
+
+  int niters = 0;
+  #pragma omp teams reduction(+:niters)
+  #pragma omp distribute collapse(7)
+  for (int i = 0; i < 5; i++)
+  for (int j = 0; j < 7; j++)
+  for (int k = 0; k < 9; k++)
+  for (int l = 2 * j; l < 3 * j; l++)
+  for (int m = 7; m < 11; m++)
+  for (int n = l; n < 2 * l; n++)
+  for (int o = 0; o < 3; o++)
+{
+  niters++;
+  if (i < 0 || i >= 5
+ || j < 0 || j >= 7
+ || k < 0 || k >= 9
+ || l < 2 * j || l >= 3 * j
+ || m < 7 || m >= 11
+ || n < l || n >= 2 * l
+ || o < 0 || o >= 3)
+   abort ();
+  if (v[i][j][k][l][m - 7][n][o] != 1)
+   abort ();
+  v[i][j][k][l][m - 7][n][o]++;
+}
+
+  if (niters != 117180)
+abort ();
+
+  int niters2 = 0;
+  #pragma omp teams reduction(+:niters2)
+  #pragma omp distribute collapse(7)
+  for (int i = zero; i < five; i += one)
+  for (int j = seven - one; j >= zero; j -= one)
+  for (int k = nine - one; k >= zero; k += -one)
+  for (int l = two * j + zero; l < three * j; l += one)
+  for (int m = eleven - one; m >= seven; m -= one)
+  for (int n = two * l - one; n > one * l - one; n -= one)
+  for (int o = zero; o < three; o += one)
+{
+  niters2++;
+  if (i < 0 || i >= 5
+ || j < 0 || j >= 7
+ || k < 0 || k >= 9
+ || l < 2 * j || l >= 3 * j
+ || m < 7 || m >= 11
+ || n < l || n >= 2 * l
+ || o < 0 || o >= 3)
+   abort ();
+  if (v[i][j][k][l][m - 7][n][o] != 2)
+   abort ();
+  v[i][j][k][l][m - 7][n][o]++;
+}
+
+  if (niters2 != 117180)
+abort ();
+
+  for (int i = 0; i < 5; i++)
+  for (int j = 0; j < 7; j++)
+  for (int k = 0; k < 9; k++)
+  for (int l = 2 * j; l < 3 * j; l++)
+  for (int m = 7; m < 11; m++)
+  for (int n = l; n < 2 * l; n++)
+  for (int o = 0; o < 3; o++)
+if (v[i][j][k][l][m - 7][n][o] != 3)
+  abort ();
+
+  int niters3 = 0;
+  #pragma omp teams reduction(+:niters3)
+  #pragma omp distribute collapse(5)
+  for (int i = 4; i >= 0; i--)
+  for (int j = 6; j >= 0; --j)
+  for (int l = 3 * j - 1; l >= 2 * j; l--)
+  for (int n = 2 * l + -1; n > l - 1; --n)
+  for (int o = 2; o >= 0; o--)
+{
+  niters3++;
+  if (i < 0 || i >= 5
+ || j < 0 || j >= 7
+ || l < 2 * j || l >= 3 * j
+ || n < l || n >= 2 * l
+ || o < 0 || o >= 3)
+   abort ();
+  if (v[i][j][0][l][0][n][o] != 3)
+   abort ();
+  v[i][j][0][l][0][n][o]++;
+}
+
+  if (niters3 != 3255)
+abort ();
+
+  int niters4 = 0;
+  #pragma omp teams reduction(+:niters4)
+  #pragma omp distribute collapse(5)
+  for (int i = zero; i < five; i += one)
+  for (int j = zero; j <= seven - one; j += one)
+  for (int l = j * two; l < three * j + zero; l += one)
+  for (int n = one * l; n <= l * two - one; n += one)
+  for (int o = zero; o < three; o += one)
+{
+  niters4++;
+  if (i < 0 || i >= 5
+ || j < 0 || j >= 7
+ || l < 2 * j || l >= 3 * j
+ || n < l || n >= 2 * l
+ || o < 0 || o >= 3)
+

Re: [PATCH] Amend match.pd syntax with force-simplified results

2020-08-05 Thread Richard Biener
On Wed, 5 Aug 2020, Richard Biener wrote:

> On Tue, 4 Aug 2020, Marc Glisse wrote:
> 
> > On Fri, 31 Jul 2020, Richard Biener wrote:
> > 
> > > This adds a ! marker to result expressions that should simplify
> > > (and if not fail the simplification).  This can for example be
> > > used like
> > >
> > > (simplify
> > >  (plus (vec_cond:s @0 @1 @2) @3)
> > >  (vec_cond @0 (plus! @1 @3) (plus! @2 @3)))
> > >
> > > to make the simplification only apply in case both plus operations
> > > in the result end up simplified to a simple operand.
> > 
> > (replacing plus with bit_ior)
> > The generated code in gimple_simplify_BIT_IOR_EXPR may look like
> > 
> >   {
> > tree _o1[2], _r1;
> > _o1[0] = captures[2];
> > _o1[1] = captures[4];
> > gimple_match_op tem_op (res_op->cond.any_else (), BIT_IOR_EXPR, 
> > TREE_TYPE
> > (_o1[0]), _o1[0], _o1[1]);
> > tem_op.resimplify (lseq, valueize);
> > _r1 = maybe_push_res_to_seq (_op, NULL);
> > if (!_r1) return false;
> > res_op->ops[1] = _r1;
> >   }
> > 
> > In particular, it contains this "return false" which directly exits the
> > function, instead of just giving up on this particular transformation and
> > trying the next one. I'll reorder my transformations to work around this, 
> > but
> > it looks like a pre-existing limitation.
> 
> Yes, that's a pre-existing limitation/feature.  Once the "match" part of
> a pattern applied, (plus (vec_cond:s @0 @1 @2) @3) in this case,
> it wins even if it doesn't succeed in the end.  Patterns are "tried"
> in the order they appear in match.pd, so you have to make sure to
> put more specific ones before generic ones for example
> 
> (simplify
>  (plus @1 @2)
>  (if (false)
>   @1)))
> 
> would shadow any later pattern with an outermost (plus ...).
> 
> genmatch tries to warn about such situations but obviously it can't
> always guess correctly, in particular about "late" fails.
> 
> Hmm, looks like it doesn't warn about
> 
> (simplify
>  (plus @1 @2)
>  (if (false)
>   @1))
> 
> (simplify
>  (plus (minus @1 @2) @2)
>  @1)
> 
> ah, it works in this case.  I guess which cases fall thru really
> depend on the actual code generation and we could "fix" the above
> mentioned issue with, hmm ... EH?  (need some block-structured thing,
> placing gotos and labels will be too ugly in the code generator).
> Or simply nest things more and turn
> 
>  if (..) return false;
>  ...
>  return true;
> 
> into
> 
>  if (!...)
>{
>  ...
>  return true;
>}
> 
> I guess it's something we could indeed improve on.

OK, so the if (!..) route would be quite awkward with how the
code generator is structured.  Using EH would be straight-forward,
just wrap the code generation part in

 try {
...
 } catch (...) { return false }

and replace return false/NULL_TREE with throw; but of course we
have EH disabled.  Then the above might reasonably well translate
to a goto to a unique label we'd have to invent.

I'm testing a patch with the goto.

Richard.


[committed] openmp: Handle reduction clauses on host teams construct [PR96459]

2020-08-05 Thread Jakub Jelinek via Gcc-patches
Hi!

As the new testcase shows, we weren't actually performing reductions on
host teams construct.  And fixing that revealed a flaw in the for-14.c testcase.
The problem is that the tests perform also initialization and checking around 
the
calls to the functions with the OpenMP constructs.  In that testcase, all the
tests have been spawned from a teams construct but only the tested loops were
distribute, which means the initialization and checking has been performed
redundantly and racily in each team.  Fixed by performing the initialization
and checking outside of host teams and only do the calls to functions with
the tested constructs inside of host teams. 
   

2020-08-05  Jakub Jelinek  

PR middle-end/96459
* omp-low.c (lower_omp_taskreg): Call lower_reduction_clauses even in
for host teams.

* testsuite/libgomp.c/teams-3.c: New test.
* testsuite/libgomp.c-c++-common/for-2.h (OMPTEAMS): Define to nothing
if not defined yet.
(N(test)): Use it before all N(f*) calls.
* testsuite/libgomp.c-c++-common/for-14.c (DO_PRAGMA, OMPTEAMS): Define.
(main): Don't call all test_* functions from within
#pragma omp teams reduction(|:err), call them directly.

--- gcc/omp-low.c.jj2020-08-03 22:54:51.433531450 +0200
+++ gcc/omp-low.c   2020-08-04 16:29:49.269885408 +0200
@@ -11236,7 +11236,7 @@ lower_omp_taskreg (gimple_stmt_iterator
   gimple_seq par_rlist = NULL;
   lower_rec_input_clauses (clauses, _ilist, _olist, ctx, NULL);
   lower_omp (_body, ctx);
-  if (gimple_code (stmt) == GIMPLE_OMP_PARALLEL)
+  if (gimple_code (stmt) != GIMPLE_OMP_TASK)
 lower_reduction_clauses (clauses, _rlist, NULL, ctx);
 
   /* Declare all the variables created by mapping and the variables
--- libgomp/testsuite/libgomp.c/teams-3.c.jj2020-08-04 16:44:54.063734515 
+0200
+++ libgomp/testsuite/libgomp.c/teams-3.c   2020-08-04 16:52:38.961983329 
+0200
@@ -0,0 +1,20 @@
+/* PR middle-end/96459 */
+
+#include 
+
+int
+main ()
+{
+  int niters = 0, i, j, k;
+  #pragma omp teams reduction(+:niters)
+  {
+#pragma omp distribute collapse(3)
+for (i = 0; i < 3; i++)
+  for (j = 0; j < 8; j += 2)
+   for (k = 0; k < 25; k += 3)
+ niters++;
+  }
+  if (niters != 108)
+abort ();
+  return 0;
+}
--- libgomp/testsuite/libgomp.c-c++-common/for-2.h.jj   2020-01-12 
11:54:39.027373971 +0100
+++ libgomp/testsuite/libgomp.c-c++-common/for-2.h  2020-08-05 
10:19:05.171118863 +0200
@@ -14,6 +14,9 @@ noreturn (void)
 #ifndef OMPTGT
 #define OMPTGT
 #endif
+#ifndef OMPTEAMS
+#define OMPTEAMS
+#endif
 #ifndef OMPTO
 #define OMPTO(v) do {} while (0)
 #endif
@@ -214,31 +217,37 @@ N(test) (void)
   for (i = 0; i < 1500; i++)
 a[i] = i - 25;
   OMPTO (a);
+  OMPTEAMS
   N(f0) ();
   OMPFROM (a);
   for (i = 0; i < 1500; i++)
 if (a[i] != i - 23)
   return 1;
+  OMPTEAMS
   N(f1) ();
   OMPFROM (a);
   for (i = 0; i < 1500; i++)
 if (a[i] != i - 25)
   return 1;
+  OMPTEAMS
   N(f2) ();
   OMPFROM (a);
   for (i = 0; i < 1500; i++)
 if (a[i] != i - 29)
   return 1;
+  OMPTEAMS
   N(f3) (1500LL - 1 - 23 - 48, -1LL + 25 - 48, 1LL);
   OMPFROM (a);
   for (i = 0; i < 1500; i++)
 if (a[i] != i - 22)
   return 1;
+  OMPTEAMS
   N(f3) (1500LL - 1 - 23 - 48, 1500LL - 1, 7LL);
   OMPFROM (a);
   for (i = 0; i < 1500; i++)
 if (a[i] != i - 22)
   return 1;
+  OMPTEAMS
   N(f4) ();
   OMPFROM (a);
   for (i = 0; i < 1500; i++)
@@ -249,6 +258,7 @@ N(test) (void)
   for (k = 0; k < 10; k++)
b[i][j][k] = i - 2.5 + 1.5 * j - 1.5 * k;
   OMPTO (b);
+  OMPTEAMS
   N(f5) (0, 10, 0, 15, 0, 10, 1, 1, 1);
   OMPFROM (b);
   for (i = 0; i < 10; i++)
@@ -256,6 +266,7 @@ N(test) (void)
   for (k = 0; k < 10; k++)
if (b[i][j][k] != i + 1.5 * j - 1.5 * k)
  return 1;
+  OMPTEAMS
   N(f5) (0, 10, 30, 15, 0, 10, 4, 5, 6);
   OMPFROM (b);
   for (i = 0; i < 10; i++)
@@ -263,6 +274,7 @@ N(test) (void)
   for (k = 0; k < 10; k++)
if (b[i][j][k] != i + 1.5 * j - 1.5 * k)
  return 1;
+  OMPTEAMS
   N(f6) (9, -1, 29, 0, 9, -1, -1, -2, -1);
   OMPFROM (b);
   for (i = 0; i < 10; i++)
@@ -270,6 +282,7 @@ N(test) (void)
   for (k = 0; k < 10; k++)
if (b[i][j][k] != i - 4.5 + 1.5 * j - 1.5 * k)
  return 1;
+  OMPTEAMS
   N(f7) ();
   OMPFROM (b);
   for (i = 0; i < 10; i++)
@@ -277,6 +290,7 @@ N(test) (void)
   for (k = 0; k < 10; k++)
if (b[i][j][k] != i + 1.0 + 1.5 * j - 1.5 * k)
  return 1;
+  OMPTEAMS
   N(f8) ();  
   OMPFROM (b);
   for (i = 0; i < 10; i++)
@@ -284,9 +298,13 @@ N(test) (void)
   for (k = 0; k < 10; k++)
if (b[i][j][k] != i + 1.0 + 1.5 * j - 1.5 * k)
  return 1;
+  OMPTEAMS
   N(f9) ();
+  OMPTEAMS
   N(f10) ();
+  OMPTEAMS
   N(f11) (10);
+  OMPTEAMS
   N(f12) (12);
   OMPFROM (a);
   OMPFROM (b);

[committed] openmp: Use more efficient logical -> actual computation even if # iterations is computed at runtime

2020-08-05 Thread Jakub Jelinek via Gcc-patches
Hi!

For triangular loops use more efficient logical iteration number
to actual iterator values computation even for non-rectangular loops
where number of loop iterations could not be computed at compile time.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2020-08-05  Jakub Jelinek  

* omp-expand.c (expand_omp_for_init_counts): Remember
first_inner_iterations, factor and n1o from the number of iterations
computation in *fd.
(expand_omp_for_init_vars): Use more efficient logical iteration number
to actual iterator values computation even for non-rectangular loops
where number of loop iterations could not be computed at compile time.

--- gcc/omp-expand.c.jj 2020-08-04 10:53:03.114637202 +0200
+++ gcc/omp-expand.c2020-08-04 13:05:18.522759444 +0200
@@ -2181,6 +2181,13 @@ expand_omp_for_init_counts (struct omp_f
  set_immediate_dominator (CDI_DOMINATORS, bb3, bb1);
  set_immediate_dominator (CDI_DOMINATORS, bb5, bb2);
  set_immediate_dominator (CDI_DOMINATORS, entry_bb, bb1);
+
+ if (fd->first_nonrect + 1 == fd->last_nonrect)
+   {
+ fd->first_inner_iterations = first_inner_iterations;
+ fd->factor = factor;
+ fd->adjn1 = n1o;
+   }
}
   else
{
@@ -2469,8 +2476,11 @@ expand_omp_for_init_vars (struct omp_for
   build_zero_cst (type), true);
  basic_block bb_triang = NULL, bb_triang_dom = NULL;
  if (fd->first_nonrect + 1 == fd->last_nonrect
- /* For now.  */
- && TREE_CODE (fd->loop.n2) == INTEGER_CST
+ && (TREE_CODE (fd->loop.n2) == INTEGER_CST
+ || (fd->first_inner_iterations
+ /* For now.  Later add clauses to propagate the
+values.  */
+ && !gimple_omp_for_combined_into_p (fd->for_stmt)))
  && (optab_handler (sqrt_optab, TYPE_MODE (double_type_node))
  != CODE_FOR_nothing))
{
@@ -2567,14 +2577,19 @@ expand_omp_for_init_vars (struct omp_for
   build_one_cst (ulltype));
  t = fold_build2 (MULT_EXPR, ulltype, c, t);
  t = fold_build2 (RSHIFT_EXPR, ulltype, t, integer_one_node);
- t = fold_build2 (MULT_EXPR, ulltype, fd->factor, t);
- tree t2 = fold_build2 (MULT_EXPR, ulltype, c,
-fd->first_inner_iterations);
+ t = fold_build2 (MULT_EXPR, ulltype,
+  fold_convert (ulltype, fd->factor), t);
+ tree t2
+   = fold_build2 (MULT_EXPR, ulltype, c,
+  fold_convert (ulltype,
+fd->first_inner_iterations));
  t = fold_build2 (PLUS_EXPR, ulltype, t, t2);
  expand_omp_build_assign (gsi, d, t, true);
- t = fold_build2 (MULT_EXPR, ulltype, fd->factor, c);
+ t = fold_build2 (MULT_EXPR, ulltype,
+  fold_convert (ulltype, fd->factor), c);
  t = fold_build2 (PLUS_EXPR, ulltype,
-  t, fd->first_inner_iterations);
+  t, fold_convert (ulltype,
+   fd->first_inner_iterations));
  t2 = force_gimple_operand_gsi (gsi, t, true, NULL_TREE, false,
 GSI_CONTINUE_LINKING);
  cond_stmt = gimple_build_cond (GE_EXPR, stopvalull, d,

Jakub



RE: [PATCH 2/5][Arm] New pattern for CSINV instructions

2020-08-05 Thread Kyrylo Tkachov
Hi Omar,

From: Omar Tahir  
Sent: 04 August 2020 17:12
To: Kyrylo Tkachov ; ni...@redhat.com; Ramana 
Radhakrishnan ; Richard Earnshaw 
; gcc-patches@gcc.gnu.org
Subject: [PATCH 2/5][Arm] New pattern for CSINV instructions

This patch adds a new pattern, *thumb2_csinv, for generating CSINV nstructions.

This pattern relies on a few general changes that will be used throughout
the following patches:
- A new macro, TARGET_COND_ARITH, which is only true on 8.1-M Mainline
  and represents the existence of these conditional instructions.
- A change to the cond exec hook, arm_have_conditional_execution, which
  now returns false if TARGET_COND_ARITH before reload. This allows for
  some ifcvt transformations when they would usually be disabled. I've
  written a rather verbose comment (with the risk of over-explaining)
  as it's a bit of a confusing change.
- One new predicate and one new constraint.
- *thumb2_movcond has been restricted to only match if 
!TARGET_COND_ARITH,
  otherwise it triggers undesirable combines.

2020-07-30: Sudakshina Das 
Omar Tahir 


No ":" in the ChangeLog and two spaces between the 3 fields.


* config/arm/arm.h (TARGET_COND_ARITH): New macro.
* config/arm/arm.c (arm_have_conditional_execution): Return false if
TARGET_COND_ARITH before reload.
* config/arm/constraints.md: (Z): New constant zero.
* config/arm/predicates.md(arm_comparison_operation): Returns true if
comparing CC_REGNUM with constant zero.
* config/arm/thumb2.md (*thumb2_csinv): New.
(*thumb2_movcond): Don't match if TARGET_COND_ARITH.

Regression tested on arm-none-eabi.


gcc/testsuite/ChangeLog:

2020-07-30: Sudakshina Das 
Omar Tahir 

* gcc.target/arm/csinv-1.c: New test.


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index dac9a6fb5c4..3a9684cdcd8 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -29833,12 +29833,20 @@ arm_frame_pointer_required (void)
   return false;
}

-/* Only thumb1 can't support conditional execution, so return true if
-   the target is not thumb1.  */
static bool


Functions should have comments in GCC. Can you please write something 
describing the new logic of the function.

arm_have_conditional_execution (void)
{
-  return !TARGET_THUMB1;
+  bool has_cond_exec, enable_ifcvt_trans;
+
+  /* Only THUMB1 cannot support conditional execution. */
+  has_cond_exec = !TARGET_THUMB1;
+
+  /* When TARGET_COND_ARITH is defined we'd like to turn on some ifcvt
+ transformations before reload. */
+  enable_ifcvt_trans = TARGET_COND_ARITH && !reload_completed;
+
+  /* The ifcvt transformations are only turned on if we return false. */
+  return has_cond_exec && !enable_ifcvt_trans;

I don't think that comment is very useful. Perhaps "Enable ifcvt 
transformations only if..."

}

 /* The AAPCS sets the maximum alignment of a vector to 64 bits.  */
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 30e1d6dc994..d67c91796e4 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -177,6 +177,10 @@ emission of floating point pcs attributes.  */

 #define TARGET_CRC32   (arm_arch_crc)

+/* Thumb-2 but also has some conditional arithmetic instructions like csinc,
+   csinv, etc. */
+#define TARGET_COND_ARITH  (arm_arch8_1m_main)
+
/* The following two macros concern the ability to execute coprocessor
instructions for VFPv3 or NEON.  TARGET_VFP3/TARGET_VFPD32 are currently
only ever tested when we know we are generating for VFP hardware; we need
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 011badc9957..048b25ef4a1 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -28,6 +28,7 @@
;; The following normal constraints have been used:
;; in ARM/Thumb-2 state: G, I, j, J, K, L, M
;; in Thumb-1 state: I, J, K, L, M, N, O
+;; in all states: Z
;; 'H' was previously used for FPA.

 ;; The following multi-letter normal constraints have been used:
@@ -479,6 +480,11 @@
  (and (match_code "mem")
   (match_test "TARGET_32BIT && neon_vector_mem_operand (op, 1, true)")))

+(define_constraint "Z"
+  "@internal
+   Integer constant zero."
+  (match_test "op == const0_rtx"))


We're usually wary of adding more constraints unless necessary as it gets 
complicated to read patterns quickly (especially once we get into multi-letter 
constraints).
I think you can reuse the existing "Pz" constraint for your purposes.


Ok with those changes.
If you'd like to commit it yourself please apply for write access at 
https://sourceware.org/cgi-bin/pdw/ps_form.cgi listing my email address from 
MAINTAINERS as the approver.

Thanks,
Kyrill




Re: RFC: Monitoring old PRs, new dg directives

2020-08-05 Thread Richard Sandiford
Marek Polacek  writes:
> On Wed, Jul 29, 2020 at 09:40:35AM +0100, Richard Sandiford wrote:
>> I guess there's a possibility that some tests happen to pass already
>> on some targets.  That's more likely with middle-end and backend bugs
>> rather than frontend stuff though.  Perhaps for those it would make
>> sense to have a convention in which the failing testcase is restricted
>> (at the whole-test level) to the targets that the person committing the
>> testcase has actually tried.  Maybe with a comment on the dg-ice etc.
>> to remind people to reconsider the main target selector when un-XFAILing
>> the test.
>
> Interesting point.  With my frontend hat on, I hadn't really thought of
> this much, but the dg-ice directive allows you to specify the targets and
> specific options when to expect an ICE.  So you could run a test everywhere
> but only expect an ICE on aarch64.

Yeah.  But the problem I was thinking of was: whoever adds the test
will only test on a subset of targets.  If the test runs for all targets,
the dg-ice condition has to be exact for all targets too.  Missing out
one target will generate a new FAIL, while adding a target unnecessarily
will generate an XPASS.  So I think the condition has to be applied at
a whole-test level instead, unless the person committing the test is
confident about which targets are and aren't affected.

(The same goes for other directives, dg-ice is just an example.)

Richard




Re: RFC: Monitoring old PRs, new dg directives

2020-08-05 Thread Richard Sandiford
Marek Polacek via Gcc-patches  writes:
> On Thu, Jul 30, 2020 at 11:54:23AM +0200, Jakub Jelinek via Gcc-patches wrote:
>> On Tue, Jul 28, 2020 at 05:44:47PM -0400, Marek Polacek via Gcc-patches 
>> wrote:
>> > We will still have a surfeit of bugs that we've given short shrift to, but
>> > let's at least automate what we can.  The initial addition of the relevant
>> > old(-ish) tests won't of course happen automagically, but it's a price I'm
>> > willing to pay.  My goal here isn't merely to reduce the number of open 
>> > PRs;
>> > it is to improve the testing of the compiler overall.
>> > 
>> > Thoughts?
>> 
>> Looks useful to me, but I'd think it might be desirable to use separate
>> directories for those tests, so that it is more obvious that it is a
>> different category of tests.  Now that we use git, just using git mv
>> to move them to another place once they are fixed for good (together with
>> some dg-* directive tweaks) wouldn't be that much work later.
>> 
>> So having gcc.dg/unfixed/ , g++.dg/unfixed/ , c-c++-common/unfixed/
>> and their torture/ suffixed variants (or better directory name for those)?
>
> Thanks.  I was afraid that it would cause too much friction when you happen
> to fix one of the unfixed tests: you will have to find the correct directory
> to put the test in and perhaps even rename the test to avoid conflicts with
> tests with the same name in the final destination.  But it's also true that
> git is much better at moving files, and the extra clarity might be worth the
> occasional hassle.  It would also make it easy to skip testing unfixed tests.
> dg-ice tests are easy to spot/grep for, but accepts-invalid/rejects-valid are
> a different story.

FWIW, I agree with Mike that it might be better to put tests in the
location that they'd have normally.  Some reasons:

- As already mentioned, it'd give more stable names.  That's not an
  issue for things like git log, but it does mean that comparing one
  set of test results with another gives a meaningful XFAIL->PASS
  transition rather than a “removed test” + “added test” transition.
  It's also more convenient when comparing different branches.

- There are a lot of specialised .exp harnesses, and I don't think it
  would be a good idea to encourage all of them to have unfixed/
  variants, or unfixed/ subdirectories.  E.g. vect.exp is not something
  we'd want to duplicate or subclass, but there are others.

- There's a temptation to remove empty directories/harnesses when
  they have no associated tests, so we might oscillate between having
  unfixed/ directories and not.  (The alternative is for the directory
  or harness to stick around based on the fact that it was useful at
  some point in the past.)

- It's inconsistent with existing tests that are already XFAILed for all
  targets.  Having a separate directory for completely-XFAILed tests
  might create a requirement (or the impression of a requirement) to
  move tests to an unfixed/ directory when XFAILing them.

- IMO it draws an artificial distinction between tests that are
  completely XFAILed on all targets and tests that are either partly
  XFAILed for all targets, or XFAILed only for some targets.

Thanks,
Richard


[PATCH] vect: Skip epilogue loops for dbgcnt check [PR96451]

2020-08-05 Thread Kewen.Lin via Gcc-patches
Hi,

As the PR shows, commit r11-2453 exposed one issue that vectorizer
wants to vectorize the epilogue loop and leaves the if-cvt body there,
but later dbgcnt check disables it, the left scalar mask_store
statement cause ICE.

As Richard pointed out in that PR, the dbgcnt is to count original
scalar loops, so this fix is to make it skip the epilogue loops.

Testing just launched, is it OK for trunk if everthing goes well?

BR,
Kewen
-

gcc/ChangeLog:

* tree-vectorizer.c (try_vectorize_loop_1): Skip the epilogue loops
for dbgcnt check.

-
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index 26a184696aa..1e9f46148ca 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -1066,7 +1066,8 @@ try_vectorize_loop_1 (hash_table 
*_to_vf_htab,
   return ret;
 }

-  if (!dbg_cnt (vect_loop))
+  /* Only count the original scalar loops.  */
+  if (!LOOP_VINFO_EPILOGUE_P (loop_vinfo) && !dbg_cnt (vect_loop))
 {
   /* Free existing information if loop is analyzed with some
 assumptions.  */


Re: [PATCH v5] vect/rs6000: Support vector with length cost modeling

2020-08-05 Thread Richard Sandiford
"Kewen.Lin"  writes:
> diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
> index 490e7befea3..2a0d3b1840d 100644
> --- a/gcc/tree-vect-loop-manip.c
> +++ b/gcc/tree-vect-loop-manip.c
> @@ -412,7 +412,10 @@ vect_maybe_permute_loop_masks (gimple_seq *seq, 
> rgroup_controls *dest_rgm,
>  
> This means that we cannot guarantee that such an induction variable
> would ever hit a value that produces a set of all-false masks or zero
> -   lengths for RGC.  */
> +   lengths for RGC.
> +
> +   Note that please check cost modeling whether needed to be updated in
> +   function vect_estimate_min_profitable_iters if any changes.  */

Maybe:

   Note: the cost of the code generated by this function is modeled
   by vect_estimate_min_profitable_iters, so changes here may need
   corresponding changes there.  */

> […]
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 43ec4fb04d5..dea9ca6778f 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -3798,6 +3798,58 @@ vect_estimate_min_profitable_iters (loop_vec_info 
> loop_vinfo,
>(void) add_stmt_cost (loop_vinfo, target_cost_data, num_masks - 1,
>   vector_stmt, NULL, NULL_TREE, 0, vect_body);
>  }
> +  else if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> +{
> +  /* Referring to the functions vect_set_loop_condition_partial_vectors
> +  and vect_set_loop_controls_directly, we need to generate each
> +  length in the prologue and in the loop body if required. Although
> +  there are some possible optimizations, we consider the worst case
> +  here.  */
> +
> +  bool niters_known_p = LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo);
> +  bool need_iterate_p
> + = (!LOOP_VINFO_EPILOGUE_P (loop_vinfo)
> +&& !vect_known_niters_smaller_than_vf (loop_vinfo));
> +
> +  /* Calculate how many statements to be added.  */
> +  unsigned int prologue_stmt = 0;
> +  unsigned int body_stmt = 0;

Sorry to be nit-picky, but “_stmts” reads better to me.

> […]
> @@ -4015,6 +4067,10 @@ vect_estimate_min_profitable_iters (loop_vec_info 
> loop_vinfo,
>&& min_profitable_iters < (assumed_vf + peel_iters_prologue))
>  /* We want the vectorized loop to execute at least once.  */
>  min_profitable_iters = assumed_vf + peel_iters_prologue;
> +  else if (min_profitable_iters < peel_iters_prologue)
> +/* For LOOP_VINFO_USING_PARTIAL_VECTORS_P, we need to ensure the
> +   vectorized loop to execute at least once.  */
> +min_profitable_iters = peel_iters_prologue;

s/to execute/executes/

> […]
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index 31af46ae19c..8550a252f44 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -12090,7 +12090,10 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, 
> stmt_vec_info stmt_info,
> min_of_start_and_end = min (START_INDEX, END_INDEX);
> left_len = END_INDEX - min_of_start_and_end;
> rhs = min (left_len, LEN_LIMIT);
> -   LEN = rhs;  */
> +   LEN = rhs;
> +
> +   Note that please check cost modeling whether needed to be updated in
> +   function vect_estimate_min_profitable_iters if any changes.  */

Same suggested edit as above.

OK for the vectoriser parts with those changes, thanks.

Richard


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-05 Thread Richard Biener
On Tue, 4 Aug 2020, H.J. Lu wrote:

> On Tue, Aug 4, 2020 at 12:35 AM Richard Biener  wrote:
> >
> > On Mon, 3 Aug 2020, Qing Zhao wrote:
> >
> > > Hi, Uros,
> > >
> > > Thanks a lot for your review on X86 parts.
> > >
> > > Hi, Richard,
> > >
> > > Could you please take a look at the middle-end part to see whether the
> > > rewritten addressed your previous concern?
> >
> > I have a few comments below - I'm not sure I'm qualified to fully
> > review the rest though.
> >
> > > Thanks a lot.
> > >
> > > Qing
> > >
> > >
> > > > On Jul 31, 2020, at 12:57 PM, Uros Bizjak  wrote:
> > > >
> > > >
> > > > 22:05, tor., 28. jul. 2020 je oseba Qing Zhao  > > > > napisala:
> > > > >
> > > > >
> > > > > Richard and Uros,
> > > > >
> > > > > Could you please review the change that H.J and I rewrote based on 
> > > > > your comments in the previous round of discussion?
> > > > >
> > > > > This patch is a nice security enhancement for GCC that has been 
> > > > > requested by security people for quite some time.
> > > > >
> > > > > Thanks a lot for your time.
> > > >
> > > > I'll be away from the keyboard for the next week, but the patch needs a 
> > > > middle end approval first.
> > > >
> > > > That said, x86 parts looks OK.
> > > >
> > > >
> > >
> > > > Uros.
> > > > > Qing
> > > > >
> > > > > > On Jul 14, 2020, at 9:45 AM, Qing Zhao via Gcc-patches 
> > > > > > mailto:gcc-patches@gcc.gnu.org>> wrote:
> > > > > >
> > > > > > Hi, Gcc team,
> > > > > >
> > > > > > This patch is a follow-up on the previous patch and corresponding 
> > > > > > discussion:
> > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html 
> > > > > >  
> > > > > >  > > > > > >
> > > > > >
> > > > > > From the previous round of discussion, the major issues raised were:
> > > > > >
> > > > > > A. should be rewritten by using regsets infrastructure.
> > > > > > B. Put the patch into middle-end instead of x86 backend.
> > > > > >
> > > > > > This new patch is rewritten based on the above 2 comments.  The 
> > > > > > major changes compared to the previous patch are:
> > > > > >
> > > > > > 1. Change the names of the option and attribute from
> > > > > > -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all]  and 
> > > > > > zero_caller_saved_regs("skip|used-gpr|all-gpr||used|all”)
> > > > > > to:
> > > > > > -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]   and  
> > > > > > zero_call_used_regs("skip|used-gpr|all-gpr||used|all”)
> > > > > > Add the new option and  new attribute in general.
> > > > > > 2. The main code generation part is moved from i386 backend to 
> > > > > > middle-end;
> > > > > > 3. Add 4 target-hooks;
> > > > > > 4. Implement these 4 target-hooks on i386 backend.
> > > > > > 5. On a target that does not implement the target hook, issue error 
> > > > > > for the new option, issue warning for the new attribute.
> > > > > >
> > > > > > The patch is as following:
> > > > > >
> > > > > > [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> > > > > > command-line option and
> > > > > > zero_call_used_regs("skip|used-gpr|all-gpr||used|all") function 
> > > > > > attribue:
> > > > > >
> > > > > >  1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
> > > > > >
> > > > > >  Don't zero call-used registers upon function return.
> >
> > Does a return via EH unwinding also constitute a function return?  I
> > think you may want to have a finally handler or support in the unwinder
> > for this?  Then there's abnormal return via longjmp & friends, I guess
> > there's nothing that can be done there besides patching glibc?
> 
> Abnormal returns, like EH unwinding and longjmp, aren't covered by this
> patch. Only normal returns are covered.

What's the point then?  Also specifically thinking about spill slots.

Richard.

> > In general I am missing reasoning as why to use -fzero-call-used-regs=
> > in the documentation, that is, what is the thread model and what are
> > the guarantees?  Is there any point zeroing registers when spill slots
> > are left populated with stale register contents?  How do I (and why
> > would I want to?) ensure that there's no information leak from the
> > implementation of 'foo' to their callers?  Do I need to compile all
> > of 'foo' and functions called from 'foo' with -fzero-call-used-regs=
> > or is it enough to annotate API boundaries I want to proptect with
> > zero_call_used_regs("...")?
> >
> > Again - what's the intended use (and how does it fulful anything useful
> > for that case)?
> >
> > > > > >  2. -fzero-call-used-regs=used-gpr and 
> > > > > > zero_call_used_regs("used-gpr")
> > > > > >
> > > > > >  Zero used call-used general purpose registers upon function return.
> > > > > >
> > > > > >  3. -fzero-call-used-regs=all-gpr and 

Re: [PATCH] Amend match.pd syntax with force-simplified results

2020-08-05 Thread Richard Biener
On Tue, 4 Aug 2020, Marc Glisse wrote:

> On Fri, 31 Jul 2020, Richard Biener wrote:
> 
> > This adds a ! marker to result expressions that should simplify
> > (and if not fail the simplification).  This can for example be
> > used like
> >
> > (simplify
> >  (plus (vec_cond:s @0 @1 @2) @3)
> >  (vec_cond @0 (plus! @1 @3) (plus! @2 @3)))
> >
> > to make the simplification only apply in case both plus operations
> > in the result end up simplified to a simple operand.
> 
> (replacing plus with bit_ior)
> The generated code in gimple_simplify_BIT_IOR_EXPR may look like
> 
>   {
> tree _o1[2], _r1;
> _o1[0] = captures[2];
> _o1[1] = captures[4];
> gimple_match_op tem_op (res_op->cond.any_else (), BIT_IOR_EXPR, TREE_TYPE
> (_o1[0]), _o1[0], _o1[1]);
> tem_op.resimplify (lseq, valueize);
> _r1 = maybe_push_res_to_seq (_op, NULL);
> if (!_r1) return false;
> res_op->ops[1] = _r1;
>   }
> 
> In particular, it contains this "return false" which directly exits the
> function, instead of just giving up on this particular transformation and
> trying the next one. I'll reorder my transformations to work around this, but
> it looks like a pre-existing limitation.

Yes, that's a pre-existing limitation/feature.  Once the "match" part of
a pattern applied, (plus (vec_cond:s @0 @1 @2) @3) in this case,
it wins even if it doesn't succeed in the end.  Patterns are "tried"
in the order they appear in match.pd, so you have to make sure to
put more specific ones before generic ones for example

(simplify
 (plus @1 @2)
 (if (false)
  @1)))

would shadow any later pattern with an outermost (plus ...).

genmatch tries to warn about such situations but obviously it can't
always guess correctly, in particular about "late" fails.

Hmm, looks like it doesn't warn about

(simplify
 (plus @1 @2)
 (if (false)
  @1))

(simplify
 (plus (minus @1 @2) @2)
 @1)

ah, it works in this case.  I guess which cases fall thru really
depend on the actual code generation and we could "fix" the above
mentioned issue with, hmm ... EH?  (need some block-structured thing,
placing gotos and labels will be too ugly in the code generator).
Or simply nest things more and turn

 if (..) return false;
 ...
 return true;

into

 if (!...)
   {
 ...
 return true;
   }

I guess it's something we could indeed improve on.

Richard,

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)