date:20200804

Re: [RISC-V] Add support for AddressSanitizer on RISC-V GCC

2020-08-04 Thread Kito Cheng via Gcc-patches

Hi Joshua, Jim:

> > +/* Implement TARGET_ASAN_SHADOW_OFFSET.  */
> > +
> > +static unsigned HOST_WIDE_INT
> > +riscv_asan_shadow_offset (void)
> > +{
> > +  return HOST_WIDE_INT_UC (0x1000);
> > +}
>
> Is there a reason why you used 0x1000?
>
> Looking at other targets, it appears the convention is 1<<29 for
> 32-bit targets, and a number larger than 1<<32 for 64-bit targets.  I
> think the RISC-V Linux port has a minimum of 39-bit virtual addresses
> (SV39) suggesting that this should be 1<<36 for 64-bit targets.  I can
> test the 32-bit support on qemu, and the 64-bit support on hardware,
> but my hardware is doing other stuff today.  I should be able to try
> testing this tomorrow.
>
> Otherwise the gcc stuff is pretty simple and looks OK.  We just need
> to double check these numbers.

The default offset in LLVM is 1ULL << 44 for 64-bit targets and 1ULL << 29
for 32-bit targets [1, 2].
I am not saying we should use those values, just a reminder that we
should keep this offset value in sync with LLVM :)

[1] https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp#L96
[2] https://github.com/llvm/llvm-project/blob/master/compiler-rt/lib/asan/asan_mapping.h#L159
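
For reference, the offset under discussion enters ASan's usual address
mapping, shadow = (addr >> 3) + offset.  A minimal sketch in C, assuming the
default shadow scale of 3; shadow_address and ASAN_SHADOW_SCALE are
illustrative names, not GCC or libsanitizer code, and the offsets one would
pass in are simply the candidates mentioned in this thread:

```c
#include <stdint.h>

/* Assumed default: one shadow byte describes 8 application bytes.  */
#define ASAN_SHADOW_SCALE 3

/* Illustrative helper only: map an application address to the address
   of its shadow byte for a given shadow offset.  */
static uint64_t
shadow_address (uint64_t addr, uint64_t shadow_offset)
{
  return (addr >> ASAN_SHADOW_SCALE) + shadow_offset;
}
```

With the patch's 0x1000, shadow_address (0x1000, 0x1000) comes out as 0x1200,
so the shadow bytes would start in the lowest pages of the address space,
which may be part of why the value was questioned.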

[PATCH] Power10: Add BRH, BRW, BRD support.

2020-08-04 Thread Michael Meissner via Gcc-patches

Power10: Add BRH, BRW, BRD support.

The power10 processor adds 3 new instructions (BRH, BRW, BRD) that byte swap
half-words, words, and double-words within a GPR.  This patch adds
support for these instructions.  I have applied the suggestions from the
previous times I have submitted this patch.  I have done bootstrap builds on a
Linux power8 system.  I have run the regression tests, and there were no
regressions, and the 3 new tests pass.  Can I check this into the master
branch?
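
As a rough reference for what the three instructions compute, here is a plain
C model.  It assumes each instruction applies the byte swap to every element
of the given size in a 64-bit GPR; the model_* names are made up for
illustration and are not part of the patch.

```c
#include <stdint.h>

/* Model of BRD: reverse all 8 bytes of the doubleword.  */
static uint64_t
model_brd (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    r = (r << 8) | ((x >> (8 * i)) & 0xff);
  return r;
}

/* Model of BRW: reverse the bytes inside each 32-bit word.  Reversing
   the whole doubleword also swaps the two words, so swap them back.  */
static uint64_t
model_brw (uint64_t x)
{
  uint64_t r = model_brd (x);
  return (r << 32) | (r >> 32);
}

/* Model of BRH: swap the two bytes inside each 16-bit halfword.  */
static uint64_t
model_brh (uint64_t x)
{
  const uint64_t low_bytes = UINT64_C (0x00ff00ff00ff00ff);
  return ((x & low_bytes) << 8) | ((x >> 8) & low_bytes);
}
```

For the input 0x0102030405060708 the three models give 0x0807060504030201
(BRD), 0x0403020108070605 (BRW) and 0x0201040306050807 (BRH).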

gcc/
2020-08-04  Michael Meissner  

* config/rs6000/rs6000.md (bswaphi2_reg): Generate the BRH
instruction on ISA 3.1.
(bswapsi2_reg): Generate the BRW instruction on ISA 3.1.
(bswapdi2): Rename bswapdi2_xxbrd to bswapdi2_brd.
(bswapdi2_brd): Rename from bswapdi2_xxbrd.  Generate the BRD
instruction on ISA 3.1.

gcc/testsuite/
2020-08-04  Michael Meissner  

* gcc.target/powerpc/bswap-brd.c: New test.
* gcc.target/powerpc/bswap-brw.c: New test.
* gcc.target/powerpc/bswap-brh.c: New test.
---
 gcc/config/rs6000/rs6000.md  | 44 +++-
 gcc/testsuite/gcc.target/powerpc/bswap-brd.c | 23 +++
 gcc/testsuite/gcc.target/powerpc/bswap-brh.c | 11 +++
 gcc/testsuite/gcc.target/powerpc/bswap-brw.c | 22 ++
 4 files changed, 80 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bswap-brd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bswap-brh.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bswap-brw.c

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 48f1f1c..43b620a 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -2591,15 +2591,16 @@ (define_insn "bswap2_store"
   [(set_attr "type" "store")])
 
 (define_insn_and_split "bswaphi2_reg"
-  [(set (match_operand:HI 0 "gpc_reg_operand" "=&r,wa")
+  [(set (match_operand:HI 0 "gpc_reg_operand" "=r,&r,wa")
(bswap:HI
-(match_operand:HI 1 "gpc_reg_operand" "r,wa")))
-   (clobber (match_scratch:SI 2 "=&r,X"))]
+(match_operand:HI 1 "gpc_reg_operand" "r,r,wa")))
+   (clobber (match_scratch:SI 2 "=X,&r,X"))]
   ""
   "@
+   brh %0,%1
#
xxbrh %x0,%x1"
-  "reload_completed && int_reg_operand (operands[0], HImode)"
+  "reload_completed && !TARGET_POWER10 && int_reg_operand (operands[0], HImode)"
   [(set (match_dup 3)
(and:SI (lshiftrt:SI (match_dup 4)
 (const_int 8))
@@ -2615,21 +2616,22 @@ (define_insn_and_split "bswaphi2_reg"
   operands[3] = simplify_gen_subreg (SImode, operands[0], HImode, 0);
   operands[4] = simplify_gen_subreg (SImode, operands[1], HImode, 0);
 }
-  [(set_attr "length" "12,4")
-   (set_attr "type" "*,vecperm")
-   (set_attr "isa" "*,p9v")])
+  [(set_attr "length" "*,12,*")
+   (set_attr "type" "shift,*,vecperm")
+   (set_attr "isa" "p10,*,p9v")])
 
 ;; We are always BITS_BIG_ENDIAN, so the bit positions below in
 ;; zero_extract insns do not change for -mlittle.
 (define_insn_and_split "bswapsi2_reg"
-  [(set (match_operand:SI 0 "gpc_reg_operand" "=&r,wa")
+  [(set (match_operand:SI 0 "gpc_reg_operand" "=r,&r,wa")
(bswap:SI
-(match_operand:SI 1 "gpc_reg_operand" "r,wa")))]
+(match_operand:SI 1 "gpc_reg_operand" "r,r,wa")))]
   ""
   "@
+   brw %0,%1
#
xxbrw %x0,%x1"
-  "reload_completed && int_reg_operand (operands[0], SImode)"
+  "reload_completed && !TARGET_POWER10 && int_reg_operand (operands[0], SImode)"
   [(set (match_dup 0)  ; DABC
(rotate:SI (match_dup 1)
   (const_int 24)))
@@ -2646,9 +2648,9 @@ (define_insn_and_split "bswapsi2_reg"
(and:SI (match_dup 0)
(const_int -256))))]
   ""
-  [(set_attr "length" "12,4")
-   (set_attr "type" "*,vecperm")
-   (set_attr "isa" "*,p9v")])
+  [(set_attr "length" "4,12,4")
+   (set_attr "type" "shift,*,vecperm")
+   (set_attr "isa" "p10,*,p9v")])
 
 ;; On systems with LDBRX/STDBRX generate the loads/stores directly, just like
 ;; we do for L{H,W}BRX and ST{H,W}BRX above.  If not, we have to generate more
@@ -2681,7 +2683,7 @@ (define_expand "bswapdi2"
  emit_insn (gen_bswapdi2_store (dest, src));
 }
   else if (TARGET_P9_VECTOR)
-   emit_insn (gen_bswapdi2_xxbrd (dest, src));
+   emit_insn (gen_bswapdi2_brd (dest, src));
   else
emit_insn (gen_bswapdi2_reg (dest, src));
   DONE;
@@ -2712,13 +2714,15 @@ (define_insn "bswapdi2_store"
   "stdbrx %1,%y0"
   [(set_attr "type" "store")])
 
-(define_insn "bswapdi2_xxbrd"
-  [(set (match_operand:DI 0 "gpc_reg_operand" "=wa")
-   (bswap:DI (match_operand:DI 1 "gpc_reg_operand" "wa")))]
+(define_insn "bswapdi2_brd"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=r,wa")
+   (bswap:DI (match_operand:DI 1 "gpc_reg_operand" "r,wa")))]
   "TARGET_P9_VECTOR"
-  "xxbrd %x0,%x1"
-  [(set_attr "type" "vecperm")
-

Re: [PATCH] [AVX512]For vector compare to mask register, UNSPEC is needed instead of comparison operator [PR96243]

2020-08-04 Thread Hongtao Liu via Gcc-patches

On Tue, Aug 4, 2020 at 6:28 PM Kirill Yukhin  wrote:
>
> On 04 Aug 13:26, Kirill Yukhin wrote:
> > Could you please clarify how your patch is related to [1]?
> > I see from the bug that it describes perf issue w.r.t. scalar
> > operations.
>
Sorry for the typo, it's PR96243.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96243
> [1] - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96226
>
> >
> > --
> > Regards, Kirill Yukhin



-- 
BR,
Hongtao

Re: RFC: Monitoring old PRs, new dg directives

2020-08-04 Thread Marek Polacek via Gcc-patches

On Tue, Aug 04, 2020 at 03:53:50PM -0700, Mike Stump wrote:
> On Aug 4, 2020, at 3:16 PM, Marek Polacek via Gcc-patches wrote:
> > 
> > The benefit of dg-accepts-invalid was that you would
> > get an XPASS even for a test that should not be accepted, but you didn't
> > know what line to expect an error on, so you put a dg-error at the end of
> > the test.
> 
> I think for most cases it's easy enough to figure out where the error goes.
> I do see the subtlety of the dg-accepts-invalid directive now in the harder
> cases.  A change of state of them by the new error message I'd like to think
> is enough to get the people to look at the test case and the corresponding
> bug report.  I'd propose seeing if people don't also push along the bug in
> that sort of complex case, I think they will.

You're probably right.  Let's go ahead without dg-accepts-invalid for now and
perhaps reconsider later.

Marek

Re: RFC: Monitoring old PRs, new dg directives

2020-08-04 Thread Marek Polacek via Gcc-patches

On Tue, Aug 04, 2020 at 03:33:23PM -0700, Mike Stump wrote:
> I think the read of the room is that people think it would be generally 
> useful, so let's approve the general plan.

Cool.

> So, now we are down to the fine details.  Please do see just how far you can 
> stretch the existing mechanisms to cover what you need to do.  I think the 
> existing mechanisms should be able to cover it all; but the devil is in the 
> details and those matter.

At this point I'm only proposing one new directive, dg-ice.  I think we can't
really do without it.  The other one was a matter of convenience.

> For the suggestion to isolate the tests into their own area easily 
> distinguished by filename, I think it is better to choose an original home 
> for them using the existing naming scheme as much as possible, that way when 
> fixed, they are already in the right spot.  We in theory can move them 
> around, but, there is a beauty in having a long term stable name that just 
> doesn't change any.  You can look through time to see all state changes.  git 
> log file.C shows you all the history nicely, and so on.

What you say makes sense.  I'm still pretty wishy-washy about this.

But since git tracks renaming files well (you see the whole history of the
renamed file, even with its old name), I'm currently of the mind that having
a dedicated directory is preferable.

> You can experiment with dg-prms-id and see if that lets you tag in bug report 
> numbers in a more meaningful way.  Anyway, would be good to always include 
> the bug report number in the test case, and in the bug report, add the name 
> of the added test case (so others don't add yet another instance of the bug).

Absolutely, the PR number should be in the test, and the monitored PR ought
to say what test it's covered by.

I've looked at dg-prms-id in dejagnu, but I don't readily see how that
could help us.

> So with that as a backdrop, I think it's reasonable to self-approve additions 
> of this sort to the test suite.  If you have test cases that can go in with 
> existing mechanisms, feel free to start adding them in.

Sounds good.  As I mentioned, they should be of high quality, just like the
tests we normally add when fixing bugs.

> As you find it difficult to express a test using the existing mechanisms, 
> let's talk about those and see if anyone has a good idea on how to express 
> it.  I think ICEs are the most annoying to manage, but, I think excess and 
> prune should be able to handle them.  I think the "should get an error or 
> warning" and "should not get an error or warning" cases are more trivial to manage.

I experimented with
// { dg-prune-output ".*internal compiler error.*" }
// { dg-xfail-if "" { *-*-* } }
but it's a mouthful and the results were poor (when the ICE is fixed but we
generate errors instead).  dg-ice is convenient, handles even the different
kind of ICE (when the diagnostic routines were re-entered), and generates
nice XPASSes when the ICE goes away.

I've also played games with dg-regexp but it was too ugly.

(I honestly don't see why new directives are such a big deal, if they're
properly documented.)

> A word of caution, if we produce core files, before you add tons of core file 
> producing test cases, you'll want to submit a ulimit -c 0 patch that can 
> avoid the issue.  Core file writing is slow and consumes disk.  I can't recall 
> at the moment if the current infrastructure will reliably avoid core files.

I thought we'd already set ulimit -c 0, but I don't see that now.  I definitely
agree that we don't want core dumps.  It probably needs some hack that sets
ulimit -c to 0 when we're running a test in */unfixed/.  :/  Which also argues
for a separate directory for unfixed tests.
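
At the process level, ulimit -c 0 boils down to lowering RLIMIT_CORE.  A
minimal C sketch using the POSIX getrlimit/setrlimit calls; disable_core_dumps
is a hypothetical helper, and this is not a claim about how the testsuite
harness does or would do it:

```c
#include <sys/resource.h>

/* Hypothetical helper: stop this process (and children it later spawns)
   from writing core files.  Only the soft limit is lowered here, so the
   caller could still raise it again up to the hard limit.
   Returns 0 on success and -1 on failure, like setrlimit.  */
static int
disable_core_dumps (void)
{
  struct rlimit rl;

  if (getrlimit (RLIMIT_CORE, &rl) != 0)
    return -1;

  rl.rlim_cur = 0;
  return setrlimit (RLIMIT_CORE, &rl);
}
```

A driver would call this once before running the compiler on a test that is
expected to crash; resource limits are inherited across fork and exec.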

> If test cases infinitely loop, timeout, consume all available ram, fill the 
> disk, crash the host machine, or do other really nasty stuff, please, let's 
> avoid those for now.  It is amazing how fast testing goes on a 200 thread 
> count box, and how slow it can go if a single test case needs to time out.  
> If you start with the idea that any individual test case should only take 
> 2-10 seconds, you won't go wrong.

I agree, but that should be true for all tests.  Really, only the expect-ice
tests are novel.

Marek

Re: [RISC-V] Add support for AddressSanitizer on RISC-V GCC

2020-08-04 Thread Jim Wilson

On Thu, Jul 30, 2020 at 5:31 AM Joshua via Gcc-patches
 wrote:
> +/* Implement TARGET_ASAN_SHADOW_OFFSET.  */
> +
> +static unsigned HOST_WIDE_INT
> +riscv_asan_shadow_offset (void)
> +{
> +  return HOST_WIDE_INT_UC (0x1000);
> +}

Is there a reason why you used 0x1000?

Looking at other targets, it appears the convention is 1<<29 for
32-bit targets, and a number larger than 1<<32 for 64-bit targets.  I
think the RISC-V Linux port has a minimum of 39-bit virtual addresses
(SV39) suggesting that this should be 1<<36 for 64-bit targets.  I can
test the 32-bit support on qemu, and the 64-bit support on hardware,
but my hardware is doing other stuff today.  I should be able to try
testing this tomorrow.

Otherwise the gcc stuff is pretty simple and looks OK.  We just need
to double check these numbers.

> diff --git a/libsanitizer/sanitizer_common/sanitizer_common.h 
> b/libsanitizer/sanitizer_common/sanitizer_common.h
> index ac16e0e..ea7dff7 100644
> --- a/libsanitizer/sanitizer_common/sanitizer_common.h
> +++ b/libsanitizer/sanitizer_common/sanitizer_common.h
> @@ -649,7 +649,8 @@ enum ModuleArch {
>kModuleArchARMV7,
>kModuleArchARMV7S,
>kModuleArchARMV7K,
> -  kModuleArchARM64
> +  kModuleArchARM64,
> +  kModuleArchRISCV
>  };
>

Libsanitizer patches should go upstream and then be pulled down into
gcc.  I haven't done libsanitizer work before so I'm not sure of the
exact details.  I would expect to find the info here:
https://gcc.gnu.org/codingconventions.html#upstream
but it doesn't mention libsanitizer.  I found the rules in the
libsanitizer/README.gcc and HOWTO_MERGE files.  As expected it says to
submit upstream first.

You are adding a SANITIZER_RISCV macro but not using it.  It isn't
clear why you need this, unless maybe it is just for completeness.  It
looks harmless though, and might be useful later.  This is something
for upstream reviewers to decide though.

In sanitizer_common.h I see a comment
// When adding a new architecture, don't forget to also update
// script/asan_symbolize.py and sanitizer_symbolizer_libcdep.cpp.
but I don't see any script/asan_symbolize.py file.  So maybe the
comment should be fixed.  Or if there is a file in the llvm tree that
we don't import into gcc, then you will need a patch for it.  If the
comment is wrong, then there is a similar comment in
sanitizer_symbolizer_libcdep.cpp that needs to be fixed too.  If the
file is gone, the comment fix can be a separate patch.

Otherwise this stuff looks pretty simple and obvious but it needs to
be submitted upstream.

Jim

Re: [PATCH] Adjust tree-ssa-strlen.c for irange API.

2020-08-04 Thread Martin Sebor via Gcc-patches


On 8/4/20 3:23 PM, Aldy Hernandez wrote:



On 8/4/20 9:34 PM, Martin Sebor wrote:

On 8/4/20 5:33 AM, Aldy Hernandez via Gcc-patches wrote:

This patch adapts the strlen pass to use the irange API.

I wasn't able to remove the one annoying use of VR_ANTI_RANGE, because
I'm not sure what to do.  Perhaps Martin can shed some light.  The
current code has:

  else if (rng == VR_ANTI_RANGE)
    {
      wide_int maxobjsize = wi::to_wide (TYPE_MAX_VALUE (ptrdiff_type_node));

      if (wi::ltu_p (cntrange[1], maxobjsize))
        {
          cntrange[0] = cntrange[1] + 1;
          cntrange[1] = maxobjsize;

Suppose we have ~[10,20], won't the above set cntrange[] to [21,MAX]?  Won't
this ignore the 0..9 that is part of the range?  What should we do here?


cntrange is the range of the strncpy (and strncat) bound.  It does
ignore the lower subrange but I think that's intentional because
the lower the bound the more likely the truncation, so it serves
to minimize false positives.

I didn't see any tests fail with the anti-range block disabled but
with some effort I was able to come up with one:

   char a[7];

   void f (int n)
   {
     if (n > 3)
       n = 0;

     strncpy (a, "12345678", n);   // -Wstringop-truncation
   }

The warning disappears when the anti-range handling is removed so
unless that's causing headaches for the new API I think we want to
keep it (and add the test case :)


Hi Martin.

Thanks for taking the time to respond.

On the strlen1 dump I see that the 3rd argument to strncpy above is:

   long unsigned int ~[4, 18446744071562067967]

which is a fancy way of saying:

   long unsigned int [0,3][18446744071562067968,+INF]

The second sub-range is basically [INT_MIN,+INF] for the original int N, 
which makes sense because N could be negative on the way in.
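
The arithmetic behind that anti-range can be checked with a small C sketch.
It assumes a 64-bit unsigned long, as in the dump; bound_from_n and
in_anti_range are illustrative names and not part of the irange API.  Note
that the upper sub-range starts one past the excluded endpoint, at
18446744071562067968, which is INT_MIN converted to the 64-bit unsigned type.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* The strncpy bound from the test case: N is reset to 0 when it exceeds
   3, then sign-extended and converted to a 64-bit unsigned value.  */
static uint64_t
bound_from_n (int n)
{
  if (n > 3)
    n = 0;
  return (uint64_t) (int64_t) n;
}

/* True if V lies in the anti-range ~[LO, HI], i.e. outside [LO, HI].  */
static bool
in_anti_range (uint64_t v, uint64_t lo, uint64_t hi)
{
  return v < lo || v > hi;
}
```

Every int value of N maps into ~[4, 18446744071562067967]: 0 through 3 land in
the low sub-range, anything above 3 becomes 0, and negative values land in the
high sub-range.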


I don't understand the warning though:

a.c:8:5: warning: ‘__builtin_strncpy’ output truncated copying between 0 and 3 bytes from a string of length 8 [-Wstringop-truncation]
    8 | __builtin_strncpy (a, "12345678", n);   // -Wstringop-truncation
      | ^~~~

The range of the bound to strncpy can certainly be [0,3], but it can 
also be [1844...,+INF] which shouldn't warn.


strncpy(d, s, n) zeroes out the destination after it copies s, up
to n - strlen (s).  It's been a while and -Wstringop-truncation
certainly has its quirks (to put it nicely) but I think this one
is a feature.



In a world without anti-ranges, we'd see the 2 sub-ranges above.  How 
would you suggest handling it?  We could nuke out the uppermost 
sub-range, but what if the range is [0,3][10,20]?  Perhaps remove from 
some arbitrary number on up?  Say...[0xf.,+INF]?  This seems like a 
hack, but perhaps is what's needed???


The second subrange in [0, 3][10, 20] is out of bounds for char[7]
and the first one truncates so either we issue -Wstringop-truncation
or if not, -Wstringop-overflow.

A better example might be [0, 3][5, 7] with "1234" as the source.
Only one of these ranges truncates so a warning would probably not
be called for.  Only warn when there's no range that doesn't lead
to truncation.
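
That rule can be sketched in C as follows.  This is only an illustration under
stated assumptions, not the strlen pass: struct subrange and both function
names are hypothetical, "truncates" is taken to mean that the bound does not
exceed the source length (so no terminating NUL is copied), a sub-range counts
as truncating only if every bound in it truncates, and the destination size
(hence -Wstringop-overflow) is ignored.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical representation of one contiguous piece of the bound.  */
struct subrange
{
  unsigned long long lo, hi;
};

/* Assumption: every bound in R truncates a source of length SRCLEN
   exactly when the largest bound does, i.e. R.hi <= SRCLEN.  */
static bool
subrange_always_truncates (struct subrange r, unsigned long long srclen)
{
  return r.hi <= srclen;
}

/* Warn only when no sub-range avoids truncation.  */
static bool
should_warn_truncation (const struct subrange *r, size_t n,
                        unsigned long long srclen)
{
  for (size_t i = 0; i < n; i++)
    if (!subrange_always_truncates (r[i], srclen))
      return false;
  return n > 0;
}
```

With [0, 3][5, 7] and the four-character source "1234" the second sub-range
copies the NUL, so the sketch stays silent; with [0, 3] alone it warns.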

It will get more interesting if/when the length of the source string
is also in a discontiguous range.  My head is already starting to hurt.

Martin



It doesn't seem like the above source should warn.  Am I missing something?

Thanks.
Aldy

Re: [RISC-V] Add support for AddressSanitizer on RISC-V GCC

2020-08-04 Thread Jim Wilson

On Thu, Jul 30, 2020 at 6:28 AM Martin Liška  wrote:
> What's the reason for sending the same patch multiple times
> from a different sender?

I see 3 in the gcc.gnu.org email archive, and I saw 3 on the NNTP feed
from gmane, but it seems only one of them ended up in my gmail inbox.
The other two appear to have problems with mail headers.  Maybe they
resent because of bounces.  Alibaba is fairly new to gcc development,
so I'd just chalk this up as newbies trying to get the procedure right
with tools that they aren't familiar with.  Few people still use email
the same way that we do for patches.

I am curious about the names though.  The first one was from "shaj
" and the last two are from "cooper.joshua
".  If more than one person
contributed to the patch then it should include all names in the
changelog entry for correct attribution.

Jim

Re: [PATCH] c++: dependent constraint on placeholder return type [PR96443]

2020-08-04 Thread Patrick Palka via Gcc-patches

On Tue, 4 Aug 2020, Patrick Palka wrote:

> In the testcase below, we never substitute function-template arguments
> into f15's placeholder-return-type constraint, which leads to us
> incorrectly rejecting this instantiation in do_auto_deduction due to
> satisfaction failure (of the constraint SameAs).
> 
> The fact that we incorrectly reject this testcase is masked by the
> other instantiation f15, which we correctly reject and diagnose
> (by accident).
> 
> A good place to do this missing substitution seems to be during
> TEMPLATE_TYPE_PARM level lowering.  So this patch adds a call to
> tsubst_constraint there, and also adds dg-bogus directives to this
> testcase wherever we expect instantiation to succeed. (So without the
> substitution fix, this last dg-bogus would FAIL).
> 
> Successfully tested on x86_64-pc-linux-gnu, and also on the cmcstl2 and
> range-v3 projects.  Does this look OK to commit?
> 
> gcc/cp/ChangeLog:
> 
>   PR c++/96443
>   * pt.c (tsubst) : Substitute into
>   the constraints on a placeholder type when its level.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR c++/96443
>   * g++.dg/cpp2a/concepts-ts1.C: Add dg-bogus wherever we expect
>   instantiation to succeed.

Looking back at this patch with fresh eyes, I realized that the commit
message is not the best.  I rewrote the commit message to hopefully be
more coherent below:

-- >8 --

Subject: [PATCH] c++: dependent constraint on placeholder return type
 [PR96443]

In the testcase concepts-ts1.C, we're incorrectly rejecting the call to
'f15(0)' due to satisfaction failure of the function's
placeholder-return-type constraint.

The testcase doesn't spot this rejection because the error we emit for
the constraint failure points to f15's return statement instead of the
call site, and we already have a dg-error at the return statement to
verify the (correct) rejection of the call f15('a').  So in order to
verify that we indeed accept the call 'f15(0)', we need to add a
dg-bogus directive at the call site to look for the "required from here"
diagnostic line that generally accompanies an instantiation failure.

As for why satisfaction failure occurs, it turns out that we never
substitute the template arguments of a function template specialization
into its placeholder-return-type constraint.  So in this case, during
do_auto_deduction we end up checking satisfaction of the still-dependent
constraint SameAs, which fails because it's dependent.

A good place to do this missing substitution seems to be during
TEMPLATE_TYPE_PARM level lowering; so this patch adds a call to
tsubst_constraint there.

Successfully tested on x86_64-pc-linux-gnu, and also on the cmcstl2 and
range-v3 projects.  Does this look OK to commit?

gcc/cp/ChangeLog:

PR c++/96443
* pt.c (tsubst) : Substitute into
the constraints on a placeholder type when reducing its level.

gcc/testsuite/ChangeLog:

PR c++/96443
* g++.dg/cpp2a/concepts-ts1.C: Add dg-bogus to the call to f15
that we expect to accept.
---
 gcc/cp/pt.c   | 7 ---
 gcc/testsuite/g++.dg/cpp2a/concepts-ts1.C | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index e7496002c1c..9f3426f8249 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -15524,10 +15524,11 @@ tsubst (tree t, tree args, tsubst_flags_t complain, tree in_decl)

 if (TREE_CODE (t) == TEMPLATE_TYPE_PARM)
  {
-   /* Propagate constraints on placeholders since they are
-  only instantiated during satisfaction.  */
+   /* Substitute constraints on placeholders when reducing
+  their level.  */
if (tree constr = PLACEHOLDER_TYPE_CONSTRAINTS (t))
- PLACEHOLDER_TYPE_CONSTRAINTS (r) = constr;
+ PLACEHOLDER_TYPE_CONSTRAINTS (r)
+   = tsubst_constraint (constr, args, complain, in_decl);
else if (tree pl = CLASS_PLACEHOLDER_TEMPLATE (t))
  {
pl = tsubst_copy (pl, args, complain, in_decl);
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-ts1.C b/gcc/testsuite/g++.dg/cpp2a/concepts-ts1.C
index 1cefe3b243f..a116cac4ea4 100644
--- a/gcc/testsuite/g++.dg/cpp2a/concepts-ts1.C
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-ts1.C
@@ -40,7 +40,7 @@ void driver()
   f3('a'); // { dg-error "" }
   f4(0, 0);
   f4(0, 'a'); // { dg-error "" }
-  f15(0);
+  f15(0); // { dg-bogus "" }
   f15('a'); // { dg-message "" }
 }

-- 
2.28.0.89.g85b4e0a6dc

Re: RFC: Monitoring old PRs, new dg directives

2020-08-04 Thread Mike Stump via Gcc-patches

On Aug 4, 2020, at 3:16 PM, Marek Polacek via Gcc-patches 
 wrote:
> 
> The benefit of dg-accepts-invalid was that you would
> get an XPASS even for a test that should not be accepted, but you didn't know
> what line to expect an error on, so you put a dg-error at the end of the test.

I think for most cases it's easy enough to figure out where the error goes.  I 
do see the subtlety of the dg-accepts-invalid directive now in the harder cases.  
A change of state of them by the new error message I'd like to think is enough 
to get the people to look at the test case and the corresponding bug report.  
I'd propose seeing if people don't also push along the bug in that sort of 
complex case, I think they will.

Re: [PATCH] nvptx: Add support for PTX highpart multiplications (e.g. mul.hi.s32)

2020-08-04 Thread Tom de Vries

On 8/4/20 2:20 PM, Roger Sayle wrote:
> 
> This patch adds support for signed and unsigned, HImode, SImode and
> DImode highpart multiplications to the nvptx backend.  Without the
> middle-end patch that I've just posted, the middle-end is able to
> (easily) make use of the narrow four of the six instructions, but
> with that patch, all six of these instructions are generated in the
> provided test cases.
> 
> This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
> with a "make" and "make -k check" with no new failures with the
> above patch, and just the two failures to find mul.hi.?64 against
> current mainline.  I'd considered submitting this patch either without
> support for the 64bit variants, or without tests for them, but it
> seemed more reasonable to make both enhancements at the same time.
> 
> Ok for mainline (once the previous patch has been approved/pushed)?

I've committed the HImode/SImode part of the patches (as attached below).

The DImode part is OK once the respective tests start passing.

Thanks,
- Tom
[PATCH] nvptx: Add support for PTX highpart multiplications (HI/SI)

This patch adds support for signed and unsigned, HImode and SImode highpart
multiplications to the nvptx backend.

This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
with a "make" and "make -k check" with no new failures with the
above patch.

2020-08-04  Roger Sayle  

gcc/ChangeLog:

	* config/nvptx/nvptx.md (smulhi3_highpart, smulsi3_highpart)
	(umulhi3_highpart, umulsi3_highpart): New instructions.

gcc/testsuite/ChangeLog:

	* gcc.target/nvptx/mul-hi.c: New test.
	* gcc.target/nvptx/umul-hi.c: New test.

---
 gcc/config/nvptx/nvptx.md| 48 
 gcc/testsuite/gcc.target/nvptx/mul-hi.c  | 15 ++
 gcc/testsuite/gcc.target/nvptx/umul-hi.c | 15 ++
 3 files changed, 78 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index c23edcf34bf..4168190fa42 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -568,6 +568,54 @@
   ""
   "%.\\tmul.wide.u32\\t%0, %1, %2;")
 
+(define_insn "smulhi3_highpart"
+  [(set (match_operand:HI 0 "nvptx_register_operand" "=R")
+	(truncate:HI
+	 (lshiftrt:SI
+	  (mult:SI (sign_extend:SI
+		(match_operand:HI 1 "nvptx_register_operand" "R"))
+		   (sign_extend:SI
+		(match_operand:HI 2 "nvptx_register_operand" "R")))
+	  (const_int 16))))]
+  ""
+  "%.\\tmul.hi.s16\\t%0, %1, %2;")
+
+(define_insn "smulsi3_highpart"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+	(truncate:SI
+	 (lshiftrt:DI
+	  (mult:DI (sign_extend:DI
+		(match_operand:SI 1 "nvptx_register_operand" "R"))
+		   (sign_extend:DI
+		(match_operand:SI 2 "nvptx_register_operand" "R")))
+	  (const_int 32))))]
+  ""
+  "%.\\tmul.hi.s32\\t%0, %1, %2;")
+
+(define_insn "umulhi3_highpart"
+  [(set (match_operand:HI 0 "nvptx_register_operand" "=R")
+	(truncate:HI
+	 (lshiftrt:SI
+	  (mult:SI (zero_extend:SI
+		(match_operand:HI 1 "nvptx_register_operand" "R"))
+		   (zero_extend:SI
+		(match_operand:HI 2 "nvptx_register_operand" "R")))
+	  (const_int 16))))]
+  ""
+  "%.\\tmul.hi.u16\\t%0, %1, %2;")
+
+(define_insn "umulsi3_highpart"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+	(truncate:SI
+	 (lshiftrt:DI
+	  (mult:DI (zero_extend:DI
+		(match_operand:SI 1 "nvptx_register_operand" "R"))
+		   (zero_extend:DI
+		(match_operand:SI 2 "nvptx_register_operand" "R")))
+	  (const_int 32))))]
+  ""
+  "%.\\tmul.hi.u32\\t%0, %1, %2;")
+
 ;; Shifts
 
 (define_insn "ashl3"
diff --git a/gcc/testsuite/gcc.target/nvptx/mul-hi.c b/gcc/testsuite/gcc.target/nvptx/mul-hi.c
new file mode 100644
index 000..c66fa38623b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/mul-hi.c
@@ -0,0 +1,15 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -save-temps" } */
+
+short smulhi3_highpart(short x, short y)
+{
+  return ((int)x * (int)y) >> 16;
+}
+
+int smulsi3_highpart(int x, int y)
+{
+  return ((long)x * (long)y) >> 32;
+}
+
+/* { dg-final { scan-assembler-times "mul.hi.s16" 1 } } */
+/* { dg-final { scan-assembler-times "mul.hi.s32" 1 } } */
diff --git a/gcc/testsuite/gcc.target/nvptx/umul-hi.c b/gcc/testsuite/gcc.target/nvptx/umul-hi.c
new file mode 100644
index 000..3b35d6b50ec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/umul-hi.c
@@ -0,0 +1,15 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -save-temps" } */
+
+unsigned short umulhi3_highpart(unsigned short x, unsigned short y)
+{
+  return ((unsigned int)x * (unsigned int)y) >> 16;
+}
+
+unsigned int umulsi3_highpart(unsigned int x, unsigned int y)
+{
+  return ((unsigned long)x * (unsigned long)y) >> 32;
+}
+
+/* { dg-final { scan-assembler-times "mul.hi.u16" 1 } } */
+/* { dg-final { scan-assembler-times "mul.hi.u32" 1 } } */

Re: RFC: Monitoring old PRs, new dg directives

2020-08-04 Thread Mike Stump via Gcc-patches

On Aug 4, 2020, at 3:08 PM, Marek Polacek via Gcc-patches 
 wrote:
> 
> That works well if you know where to expect an error.  But if you don't, it's
> worse.  E.g.,
> 
> // { dg-xfail-if "" { *-*-* } }
> int i = nothere; // demonstrates something that errors out
> // { dg-error "" } didn't know where to put this
> 
> only prints unexpected failures, but no unexpected successes.  I guess that's
> OK though, at least for now, so I'll drop dg-accepts-invalid.

There are two cases, either you get an error message that is wrong, and you can 
use:

  strncpy (p, s, 60);   /* { dg-bogus "-Wstringop-truncation" } */  

or, you don't get an error, but you should:

  A foo(void i = 0);  // { dg-error "incomplete type|invalid use" }   

?  Do you have an example of a specific case that doesn't work?  I'm not sure 
I'm following.

Re: RFC: Monitoring old PRs, new dg directives

2020-08-04 Thread Marek Polacek via Gcc-patches

On Thu, Jul 30, 2020 at 11:54:23AM +0200, Jakub Jelinek via Gcc-patches wrote:
> On Tue, Jul 28, 2020 at 05:44:47PM -0400, Marek Polacek via Gcc-patches wrote:
> > We will still have a surfeit of bugs that we've given short shrift to, but
> > let's at least automate what we can.  The initial addition of the relevant
> > old(-ish) tests won't of course happen automagically, but it's a price I'm
> > willing to pay.  My goal here isn't merely to reduce the number of open PRs;
> > it is to improve the testing of the compiler overall.
> > 
> > Thoughts?
> 
> Looks useful to me, but I'd think it might be desirable to use separate
> directories for those tests, so that it is more obvious that it is a
> different category of tests.  Now that we use git, just using git mv
> to move them to another place once they are fixed for good (together with
> some dg-* directive tweaks) wouldn't be that much work later.
> 
> So having gcc.dg/unfixed/ , g++.dg/unfixed/ , c-c++-common/unfixed/
> and their torture/ suffixed variants (or better directory name for those)?

Thanks.  I was afraid that it would cause too much friction when you happen
to fix one of the unfixed tests: you will have to find the correct directory
to put the test in and perhaps even rename the test to avoid conflicts with
tests with the same name in the final destination.  But it's also true that
git is much better at moving files, and the extra clarity might be worth the
occasional hassle.  It would also make it easy to skip testing unfixed tests.
dg-ice tests are easy to spot/grep for, but accepts-invalid/rejects-valid are
a different story.

I'll post a v2 patch soon with the unfixed/ dir in mind.

Marek

Re: RFC: Monitoring old PRs, new dg directives

2020-08-04 Thread Mike Stump via Gcc-patches

I think the read of the room is that people think it would be generally useful, 
so let's approve the general plan.

So, now we are down to the fine details.  Please do see just how far you can 
stretch the existing mechanisms to cover what you need to do.  I think the 
existing mechanisms should be able to cover it all; but the devil is in the 
details and those matter.

For the suggestion to isolate the tests into their own area easily 
distinguished by filename, I think it is better to choose an original home for 
them using the existing naming scheme as much as possible, that way when fixed, 
they are already in the right spot.  We in theory can move them around, but, 
there is a beauty in having a long term stable name that just doesn't change 
any.  You can look through time to see all state changes.  git log file.C shows 
you all the history nicely, and so on.

You can experiment with dg-prms-id and see if that lets you tag in bug report 
numbers in a more meaningful way.  Anyway, would be good to always include the 
bug report number in the test case, and in the bug report, add the name of the 
added test case (so others don't add yet another instance of the bug).

So with that as a backdrop, I think it's reasonable to self-approve additions 
of this sort to the test suite.  If you have test cases that can go in with 
existing mechanisms, feel free to start adding them in.

As you find it difficult to express a test using the existing mechanisms, let's 
talk about those and see if anyone has a good idea on how to express it.  I 
think ICEs are the most annoying to manage, but, I think excess and prune 
should be able to handle them.  I think should get an error or warning, or 
should not get an error or warning are more trivial to manage.

A word of caution, if we produce core files, before you add tons of core file 
producing test cases, you'll want to submit a ulimit -c 0 patch that can avoid 
the issue.  corefile writing is slow and consumes disk.  I can't recall at the 
moment if the current infrastructure will reliably avoid core files.

If test cases infinitely loop, timeout, consume all available ram, fill the 
disk, crash the host machine, or do other really nasty stuff, please, let's 
avoid those for now.  It is amazing how fast testing goes on a 200 thread count 
box, and how slow it can go if a single test case needs to time out.  If you 
start with the idea that any individual test case should only take 2-10 
seconds, you won't go wrong.

Re: RFC: Monitoring old PRs, new dg directives

2020-08-04 Thread Marek Polacek via Gcc-patches

On Thu, Jul 30, 2020 at 11:08:03AM +0200, Martin Liška wrote:
> Hello.
> 
> I support the initiative!
> What would be nice is to add a leading 'PR component/12345'
> to a git commit so that these test additions are linked to bugzilla issues.

Thanks!  Yes, it should be clear which test tests a PR that has the monitored
keyword.  That may get lost when adding a lot of tests in one commit, but can
always be clarified in a comment.  Or just grep 'PR component/12345' in the
testsuite; new tests should have this as their first line.

Marek

Re: RFC: Monitoring old PRs, new dg directives

2020-08-04 Thread Marek Polacek via Gcc-patches

On Wed, Jul 29, 2020 at 04:00:27PM -0600, Martin Sebor wrote:
> I've created a much more rudimentary setup for myself to deal
> with the same problem.  I copy tests from Bugzilla, sometimes
> with tweaks, and compile them from time to time as I revisit
> unresolved bugs.  I've also thought about adding those to
> the test suite and marking them XFAIL but I don't think I've
> actually done it more than a handful of times.  I was told
> adding tests (passing or xfailing) is fine without approval.
> 
> I think your proposal to add tests for known failures is a good
> idea.  I don't have much of an opinion about extending the test
> harness to differentiate other kinds of failures (like ICEs) and
> mark them as expected.  I'm not sure I understand the benefit
> of adding directives like dg-accepts-invalid over using xfail.

Thanks for the feedback.  The benefit of dg-accepts-invalid was that you would
get an XPASS even for a test that should not be accepted, but you didn't know
what line to expect an error on, so you put a dg-error at the end of the test.

That's probably not necessary for the first incarnation of this patch so I've
dropped it.

Thanks,
Marek

> On 7/28/20 3:44 PM, Marek Polacek via Gcc-patches wrote:
> > In Bugzilla, for the c++ component, we currently have over 3200 open bugs.  
> > In
> > my experience, a good amount of them have already been fixed; my periodical
> > sweeps always turn up a bunch of PRs that had already been fixed previously.
> > Sometimes my sweeps are more or less random, but more often than not I'm 
> > just
> > looking for duplicates of an existing PR.  Sometimes the reason the already
> > fixed PRs are still open is because a PR that was fixed had duplicates that 
> > we
> > didn't catch earlier when confirming the PR.  Sometimes a PR gets fixed as a
> > side-effect of fixing another PR.  Manual sweeps are tedious and 
> > time-consuming
> > because often you need to grab the test from the Bugzilla yet again (and
> > sometimes there are multiple tests).  Even if you find a PR that was fixed, 
> > you
> > still need to bisect the fix and perhaps add the test to our testsuite.  
> > That's
> > draining and since the number of bugs only increases, never decreases, it 
> > is not
> > sustainable.
> > 
> > So I've started a personal repo where I've gathered dozens of tests and 
> > wrote a
> > script that just compiles every test in the repo and reports if anything
> > changed.  One line from it:
> > 
> > pr=59798; $cxx $o -c $pr.C 2>&1 | grep -qE 'internal compiler error' || 
> > echo -e "$pr: ${msg_ice}"
> > 
> > This has major drawbacks: you have to remember to run this manually, keep
> > updating it, and it's yet another repo that people interested in this would
> > have to clone, but the worst thing is that typically you would only discover
> > that a patch fixed a different PR long after the patch was committed.  And
> > quite likely it wasn't even your patch.  We know that finding problems 
> > earlier
> > in the developer workflow reduces costs; if we can catch this before the
> > original developer commits & pushes the changes, it's cheaper, because the
> > developer already understands what the patch does.
> > 
> > A case in point: https://gcc.gnu.org/PR58156 which has been fixed recently
> > by an unrelated (?) patch.  Knowing that the tsubst_pack_expansion hunk in
> > the patch had this effect would probably have been very useful.  More 
> > testing
> > will lead to a better compiler.
> > 
> > Another case: https://gcc.gnu.org/35098 which was fixed 12 years (!) after
> > it was reported by a different change.
> > 
> > Or another: https://gcc.gnu.org/91525 where the patch contained a test, but
> > that was ice-on-invalid, whereas the test in PR91525 was ice-on-valid.
> > 
> > To alleviate some of these problems, I propose that we introduce a means to 
> > our
> > DejaGNU infrastructure that allows adding tests for old bugs that have not 
> > been
> > fixed yet, and re-introduce the keyword monitored (no longer used for 
> > anything
> > -- I think Volker stepped away) to the GCC Bugzilla to signal that a PR is
> > tracked in the testsuite.  I don't want any unnecessary moving tests 
> > around, so
> > the tests would go where they would normally go; they have to be reduced and
> > have proper targets, etc.  Having such tests in the testsuite means that 
> > when
> > something changes, you will know immediately, before you push any changes.
> > 
> > My thinking is that for:
> > 
> > * rejects-valid: use the existing dg-xfail-if
> > * accepts-valid: use the new dg-accepts-invalid
> > * ICEs: use the new dg-ice
> > 
> > dg-ice can be used like this:
> > 
> > // { dg-ice "build_over_call" { target c++11 } }
> > 
> > and it means that if the test still ICEs, you'll get a quiet XFAIL.  If the
> > ICE is fixed, you'll get an XPASS; if the ICE is gone but there are errors,
> > you'll get an XPASS + FAIL.  Then you can close the old PR.
> > 
> > Similarly,

Re: RFC: Monitoring old PRs, new dg directives

2020-08-04 Thread Marek Polacek via Gcc-patches

On Wed, Jul 29, 2020 at 04:37:03PM -0400, Jason Merrill wrote:
> On Tue, Jul 28, 2020 at 5:45 PM Marek Polacek via Gcc-patches <
> gcc-patches@gcc.gnu.org> wrote:
> 
> > In Bugzilla, for the c++ component, we currently have over 3200 open
> > bugs.  In
> > my experience, a good amount of them have already been fixed; my periodical
> > sweeps always turn up a bunch of PRs that had already been fixed
> > previously.
> > Sometimes my sweeps are more or less random, but more often than not I'm
> > just
> > looking for duplicates of an existing PR.  Sometimes the reason the already
> > fixed PRs are still open is because a PR that was fixed had duplicates
> > that we
> > didn't catch earlier when confirming the PR.  Sometimes a PR gets fixed as
> > a
> > side-effect of fixing another PR.  Manual sweeps are tedious and
> > time-consuming
> > because often you need to grab the test from the Bugzilla yet again (and
> > sometimes there are multiple tests).  Even if you find a PR that was
> > fixed, you
> > still need to bisect the fix and perhaps add the test to our testsuite.
> > That's
> > draining and since the number of bugs only increases, never decreases, it
> > is not
> > sustainable.
> >
> > So I've started a personal repo where I've gathered dozens of tests and
> > wrote a
> > script that just compiles every test in the repo and reports if anything
> > changed.  One line from it:
> >
> > pr=59798; $cxx $o -c $pr.C 2>&1 | grep -qE 'internal compiler error' ||
> > echo -e "$pr: ${msg_ice}"
> >
> > This has major drawbacks: you have to remember to run this manually, keep
> > updating it, and it's yet another repo that people interested in this would
> > have to clone, but the worst thing is that typically you would only
> > discover
> > that a patch fixed a different PR long after the patch was committed.  And
> > quite likely it wasn't even your patch.  We know that finding problems
> > earlier
> > in the developer workflow reduces costs; if we can catch this before the
> > original developer commits & pushes the changes, it's cheaper, because the
> > developer already understands what the patch does.
> >
> > A case in point: https://gcc.gnu.org/PR58156 which has been fixed recently
> > by an unrelated (?) patch.  Knowing that the tsubst_pack_expansion hunk in
> > the patch had this effect would probably have been very useful.  More
> > testing
> > will lead to a better compiler.
> >
> > Another case: https://gcc.gnu.org/35098 which was fixed 12 years (!) after
> > it was reported by a different change.
> >
> > Or another: https://gcc.gnu.org/91525 where the patch contained a test,
> > but
> > that was ice-on-invalid, whereas the test in PR91525 was ice-on-valid.
> >
> > To alleviate some of these problems, I propose that we introduce a means
> > to our
> > DejaGNU infrastructure that allows adding tests for old bugs that have not
> > been
> > fixed yet, and re-introduce the keyword monitored (no longer used for
> > anything
> > -- I think Volker stepped away) to the GCC Bugzilla to signal that a PR is
> > tracked in the testsuite.  I don't want any unnecessary moving tests
> > around, so
> > the tests would go where they would normally go; they have to be reduced
> > and
> > have proper targets, etc.  Having such tests in the testsuite means that
> > when
> > something changes, you will know immediately, before you push any changes.
> >
> > My thinking is that for:
> >
> > * rejects-valid: use the existing dg-xfail-if
> >
> 
> Or dg-excess-errors, or xfailed dg-bogus.

Yeah, whatever works.  With e.g.

int i = ...;  // { dg-xfail-if "" { *-*-* } }

one gets expected failures when we emit an error, and unexpected successes when
we stop emitting an error there.

> > * accepts-valid: use the new dg-accepts-invalid

(Surely I meant accepts-*in*valid here.)

> >
> 
> xfailed dg-error should cover this case.

That works well if you know where to expect an error.  But if you don't, it's
worse.  E.g.,

// { dg-xfail-if "" { *-*-* } }
int i = nothere; // demonstrates something that errors out
// { dg-error "" } didn't know where to put this

only prints unexpected failures, but no unexpected successes.  I guess that's
OK though, at least for now, so I'll drop dg-accepts-invalid.

> > * ICEs: use the new dg-ice.
> >
> 
> This seems like a good addition.

Thanks!

Marek

Re: RFC: Monitoring old PRs, new dg directives

2020-08-04 Thread Marek Polacek via Gcc-patches

On Wed, Jul 29, 2020 at 09:40:35AM +0100, Richard Sandiford wrote:
> Thanks for doing this.  +1 for the best fix being to add XFAILing tests
> to the main testsuite, enabled by default.  I don't see any other realistic
> way of ensuring that fixes are matched with PRs at the time that the fix
> is made (rather than some time after the fact).

I appreciate the feedback!

> Marek Polacek via Gcc-patches  writes:
> > […]
> > My thinking is that for:
> >
> > * rejects-valid: use the existing dg-xfail-if
> > * accepts-valid: use the new dg-accepts-invalid
> > * ICEs: use the new dg-ice
> >
> > dg-ice can be used like this:
> >
> > // { dg-ice "build_over_call" { target c++11 } }
> >
> > and it means that if the test still ICEs, you'll get a quiet XFAIL.  If the
> > ICE is fixed, you'll get an XPASS; if the ICE is gone but there are errors,
> > you'll get an XPASS + FAIL.  Then you can close the old PR.
> 
> This is long overdue IMO, thanks for adding it.
> 
> > Similarly, dg-accepts-invalid:
> >
> > // { dg-accepts-invalid "PR86500" }
> >
> > means that if the test still compiles without errors, you'll get a quiet 
> > XFAIL.
> > If we start giving errors, you'll get an XPASS.
> >
> > If the bug is fixed, simply remove the directive.
> >
> > The patch implementing these new directives is appended.  Once/if this is
> > accepted, I can start adding the old tests we have in our Bugzilla.  (I'm
> > only concerned about the c++ component, if that wasn't already clear.)
> >
> > The question is what makes the bug "old": is it one year without it being
> > assigned?  6 months?  3 months?  Note: I *don't* propose to add every test 
> > for
> > every new PR, just the reasonably old ones that are useful/important.  Such
> > additions should be done in batches, so that we don't have dozens of 
> > commits,
> > each of them merely adding a single test.
> 
> IMO it should be OK to add a testcase for any open PR, if someone thinks
> it's useful, regardless of age and without being forced to batch the
> commits.  I.e. I think it should come under the “obvious” rule and
> people should just use their judgement about when it's appropriate.
> Adding XFAILing tests shouldn't disturb anyone else very much.

Sounds good.  I do think it should be left up to developers, and that such
tests can go in under the obvious rule -- you can hardly break stuff.

My point about the batches was that if you know you're going to add 10 tests,
it's better to add them in 1 squashed commit rather than 10 separate commits.

> I guess there's a possibility that some tests happen to pass already
> on some targets.  That's more likely with middle-end and backend bugs
> rather than frontend stuff though.  Perhaps for those it would make
> sense to have a convention in which the failing testcase is restricted
> (at the whole-test level) to the targets that the person committing the
> testcase has actually tried.  Maybe with a comment on the dg-ice etc.
> to remind people to reconsider the main target selector when un-XFAILing
> the test.

Interesting point.  With my frontend hat on, I hadn't really thought of
this much, but the dg-ice directive allows you to specify the targets and
specific options when to expect an ICE.  So you could run a test everywhere
but only expect an ICE on aarch64.
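
For instance, a C test that runs on every target but only expects the ICE
on aarch64 might look roughly like the sketch below.  The PR number, the
function name in the dg-ice pattern, the options and the test body are all
placeholders rather than a real bug; the directive itself follows the
syntax proposed in this thread.

```c
/* PR target/NNNNN (placeholder, not a real PR number)  */
/* { dg-do compile } */
/* { dg-options "-O2" } */
/* Placeholder pattern and selector: expect the ICE on aarch64 only.  */
/* { dg-ice "hypothetical_function" { target aarch64*-*-* } } */

/* Placeholder body; a real test would contain the reduced code
   from the bug report.  */
int
f (int x)
{
  return x + 1;
}
```

On targets outside the selector the test is then expected to compile
cleanly, so a failure there would show up as an ordinary FAIL.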

Thanks,
Marek

Re: [PATCH] Adjust tree-ssa-strlen.c for irange API.

2020-08-04 Thread Aldy Hernandez via Gcc-patches





On 8/4/20 11:23 PM, Aldy Hernandez wrote:



On 8/4/20 9:34 PM, Martin Sebor wrote:

On 8/4/20 5:33 AM, Aldy Hernandez via Gcc-patches wrote:

This patch adapts the strlen pass to use the irange API.

I wasn't able to remove the one annoying use of VR_ANTI_RANGE, because
I'm not sure what to do.  Perhaps Martin can shed some light.  The
current code has:

   else if (rng == VR_ANTI_RANGE)
     {
       wide_int maxobjsize = wi::to_wide (TYPE_MAX_VALUE (ptrdiff_type_node));

       if (wi::ltu_p (cntrange[1], maxobjsize))
         {
           cntrange[0] = cntrange[1] + 1;
           cntrange[1] = maxobjsize;

Suppose we have ~[10,20], won't the above set cntrange[] to [21,MAX]?  Won't
this ignore the 0..9 that is part of the range?  What should we do here?


cntrange is the range of the strncpy (and strncat) bound.  It does
ignore the lower subrange but I think that's intentional because
the lower the bound the more likely the truncation, so it serves
to minimize false positives.

I didn't see any tests fail with the anti-range block disabled but
with some effort I was able to come up with one:

   char a[7];

   void f (int n)
   {
     if (n > 3)
       n = 0;

     strncpy (a, "12345678", n);   // -Wstringop-truncation
   }

The warning disappears when the anti-range handling is removed so
unless that's causing headaches for the new API I think we want to
keep it (and add the test case :)


Hi Martin.

Thanks for taking the time to respond.

On the strlen1 dump I see that the 3rd argument to strncpy above is:

   long unsigned int ~[4, 18446744071562067967]

which is a fancy way of saying:

   long unsigned int [0,3][18446744071562067968,+INF]

The second sub-range is basically [INT_MIN,+INF] for the original int N, 
which makes sense because N could be negative on the way in.


I don't understand the warning though:

a.c:8:5: warning: ‘__builtin_strncpy’ output truncated copying between 0 and 3 bytes from a string of length 8 [-Wstringop-truncation]
    8 | __builtin_strncpy (a, "12345678", n);   // -Wstringop-truncation
      | ^~~~

The range of the bound to strncpy can certainly be [0,3], but it can 
also be [1844...,+INF] which shouldn't warn.


In a world without anti-ranges, we'd see the 2 sub-ranges above.  How 
would you suggest handling it?  We could nuke out the uppermost 
sub-range, but what if the range is [0,3][10,20]?  Perhaps remove from 
some arbitrary number on up?  Say...[0xf.,+INF]?  This seems like a 
hack, but perhaps is what's needed???


It doesn't seem like the above source should warn.  Am I missing something?


FWIW, evrp gets a slightly more pessimistic range:

_1: long unsigned int ~[2147483648, 18446744071562067967]

it isn't until VRP1 that the range excludes [0,3]:

_1: long unsigned int ~[4, 18446744071562067967]

Whereas the ranger can get the more refined range at -O1, and without 
dominators.  I would even venture to say it could get it at -O0, with 
just SSA + CFG.
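
The two anti-ranges quoted above can be sanity-checked with a small model
of the test case.  This is only a sketch: it assumes a 32-bit int and a
64-bit unsigned bound, and the helper names are made up for illustration;
nothing here uses the irange API.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if V is a member of the anti-range ~[lo, hi], i.e. V lies
   outside the excluded interval.  */
bool
in_anti_range (uint64_t v, uint64_t lo, uint64_t hi)
{
  return v < lo || v > hi;
}

/* Model of the test case: clamp N as f does, then convert it to the
   64-bit unsigned bound passed to strncpy.  Assumes int is 32 bits
   and the bound type is 64 bits.  */
uint64_t
strncpy_bound (int32_t n)
{
  if (n > 3)
    n = 0;
  return (uint64_t) (int64_t) n;
}
```

Every value strncpy_bound can return lies outside the hole of the VRP1
range, whereas the evrp range still admits values such as 5 that the clamp
rules out.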


I noticed that the strlen pass only runs at -O2.  Perhaps we could 
explore running the pass for lower or no optimization levels when the 
ranger becomes available.  Just a thought.


Aldy

Re: RFC: Monitoring old PRs, new dg directives

2020-08-04 Thread Marek Polacek via Gcc-patches

On Tue, Jul 28, 2020 at 09:02:17PM -0600, Jeff Law wrote:
> On Tue, 2020-07-28 at 17:44 -0400, Marek Polacek via Gcc-patches wrote:
> > In Bugzilla, for the c++ component, we currently have over 3200 open bugs.  
> > In
> > my experience, a good amount of them have already been fixed; my periodical
> > sweeps always turn up a bunch of PRs that had already been fixed previously.
> > Sometimes my sweeps are more or less random, but more often than not I'm 
> > just
> > looking for duplicates of an existing PR.  Sometimes the reason the already
> > fixed PRs are still open is because a PR that was fixed had duplicates that 
> > we
> > didn't catch earlier when confirming the PR.  Sometimes a PR gets fixed as a
> > side-effect of fixing another PR.  Manual sweeps are tedious and 
> > time-consuming
> > because often you need to grab the test from the Bugzilla yet again (and
> > sometimes there are multiple tests).  Even if you find a PR that was fixed, 
> > you
> > still need to bisect the fix and perhaps add the test to our testsuite.  
> > That's
> > draining and since the number of bugs only increases, never decreases, it 
> > is not
> > sustainable.
> [ ... ]
> Another approach is to add tests for unfixed bugs as XFAILs.  When we see the
> test go from XFAIL to XPASS, then we know the bug got fixed.

That's the plan precisely.  XFAILs won't show up in your test summary so won't
clutter the output, whereas XPASSs will be noticeable.

> Anyway, there's certainly room to do something here to make it easier to find
> bugs we've already fixed.

Right, I don't expect that people will start adding unfixed tests by the
hundred.  ;-)

Marek

Re: RFC: Monitoring old PRs, new dg directives

2020-08-04 Thread Marek Polacek via Gcc-patches

Hi Mike,

thanks for your comments.

On Tue, Jul 28, 2020 at 06:37:26PM -0700, Mike Stump via Gcc-patches wrote:
> I'll punt to the C++ front-end folks to chime in.  Usually we only check 
> in bugs that are fixed, as they are fixed, this is what makes it a regression 
> suite.  Doing this does have advantages, like, the testsuite is small and 
> doesn't have duplicates and doesn't test anything that is known to fail that 
> isn't an actual regression.

We also add sanity tests for new language features, for example.  I also like
adding (small) tests for DRs that happen to be already fixed to ensure that the
compiler continues to behave as expected.  Avoiding duplicates is a good thing,
obviously, but, with this new scheme, if you fix something and while testing
the fix you find that we already have a test, you can choose not to introduce
a new test.  Whether or not a compiler bug is a regression shouldn't, IMHO, be
a criterion for (not) including the test.

tree(1) in testsuite/g++.dg says 70 directories, 13406 files.  If I go wild and
add 200 new C++ tests, that's ~1.5% increase in the number of tests.  That seems
reasonable.  If it still causes grief for some people, and we go with the idea
of using an unfixed/ directory, we could add GCC_TESTSUITE_TEST_UNFIXED envvar
to enable or disable running such tests.

> Ideally, it would be nice to have a way to test bugs out of bugzilla, and 
> report on those fixed bugs as they are fixed.  If people want to keep the 
> test suite a regression suite, then my counter proposal would be to have a 
> branch with the non-regression bugs on it and then people can checkout and 
> test that branch.  Most of the people don't (saving the testing time, which 
> is handy), and then more sporadically, the old bugs branch can be tested and BZ 
> state can be moved along as bugs are fixed.  A run once a week or even once a 
> month would seem to be plenty often.

I'm afraid this would defeat the point of this proposal.  If the additional
tests are only available on a branch, no one is going to use it.  Quite
frankly, I'd just stick with my personal repo (where I can do whatever I want,
include tests with #includes, unreduced tests, ...) rather than to bother with
rebasing a branch etc.  Even if someone would actually use such a branch from
time to time, we'd lose the benefit of "left shifting" bugs in the developer
workflow -- you would only notice that your patch had changed something days
or weeks after the commit at which point you likely have lost state on it.

> You in general don't need to check in fixes from bug reports that have been 
> fixed in the past, as those fixes generally already have a test case for the 
> fix that went in with the fix.

I agree, but experience shows that's not always the case.  Just the few PRs
I mentioned in my original mail prove that.  There are patches that change
something as a side-effect only, and, as a developer, you want to be aware
of those side-effects.

> As for how old is old, we'd leave that to the contributor of work to decide.  
> In theory, bugs can be added as soon as they come in, no need to wait.  To 
> the extent that waiting saves work, well, that's a personal choice, for the 
> person doing the work to choose.

I agree.  I'd leave it up to developers.  Just... be reasonable.  No need to
add every broken nonsense test lurking in Bugzilla.

> Why not just use xfail and xpass?  Seems less work than doing a setup_xfail.  
> Also, why not just use the existing directives instead of adding new 
> directives?  I'm suspicious of expect_ice and accepts_invalid.  You set them 
> to 1 all the time, but almost never set them to 0?  I'm wondering if it 
> should be more like shouldfail?

Good point, I missed that I could use xfail and xpass.  Fixed.

For accepts_invalid I think we *could* use the existing directives, if you know
where to expect an error.  But for ICEs we currently have nothing that would
work well.  I'll probably drop dg-accepts-invalid completely.

I followed shouldfail's suit when it comes to setting them to 1, so that should
be fine.

Thanks,
Marek

> On Jul 28, 2020, at 2:44 PM, Marek Polacek via Gcc-patches 
>  wrote:
> > 
> > In Bugzilla, for the c++ component, we currently have over 3200 open bugs.  
> > In
> > my experience, a good amount of them have already been fixed; my periodical
> > sweeps always turn up a bunch of PRs that had already been fixed previously.
> > Sometimes my sweeps are more or less random, but more often than not I'm 
> > just
> > looking for duplicates of an existing PR.  Sometimes the reason the already
> > fixed PRs are still open is because a PR that was fixed had duplicates that 
> > we
> > didn't catch earlier when confirming the PR.  Sometimes a PR gets fixed as a
> > side-effect of fixing another PR.  Manual sweeps are tedious and 
> > time-consuming
> > because often you need to grab the test from the Bugzilla yet again (and
> > sometimes there are multiple tests).  Even if

Re: [PATCH] Adjust tree-ssa-strlen.c for irange API.

2020-08-04 Thread Aldy Hernandez via Gcc-patches





On 8/4/20 9:34 PM, Martin Sebor wrote:

On 8/4/20 5:33 AM, Aldy Hernandez via Gcc-patches wrote:

This patch adapts the strlen pass to use the irange API.

I wasn't able to remove the one annoying use of VR_ANTI_RANGE, because
I'm not sure what to do.  Perhaps Martin can shed some light.  The
current code has:

   else if (rng == VR_ANTI_RANGE)
     {
       wide_int maxobjsize = wi::to_wide (TYPE_MAX_VALUE (ptrdiff_type_node));

       if (wi::ltu_p (cntrange[1], maxobjsize))
         {
           cntrange[0] = cntrange[1] + 1;
           cntrange[1] = maxobjsize;

Suppose we have ~[10,20], won't the above set cntrange[] to [21,MAX]?  Won't
this ignore the 0..9 that is part of the range?  What should we do here?


cntrange is the range of the strncpy (and strncat) bound.  It does
ignore the lower subrange but I think that's intentional because
the lower the bound the more likely the truncation, so it serves
to minimize false positives.

I didn't see any tests fail with the anti-range block disabled but
with some effort I was able to come up with one:

   char a[7];

   void f (int n)
   {
     if (n > 3)
       n = 0;

     strncpy (a, "12345678", n);   // -Wstringop-truncation
   }

The warning disappears when the anti-range handling is removed so
unless that's causing headaches for the new API I think we want to
keep it (and add the test case :)


Hi Martin.

Thanks for taking the time to respond.

On the strlen1 dump I see that the 3rd argument to strncpy above is:

  long unsigned int ~[4, 18446744071562067967]

which is a fancy way of saying:

  long unsigned int [0,3][18446744071562067968,+INF]

The second sub-range is basically [INT_MIN,+INF] for the original int N, 
which makes sense because N could be negative on the way in.
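
The correspondence between the anti-range and its two sub-ranges can be
checked with a small sketch.  It assumes a 64-bit unsigned type and a
32-bit int, and the struct and function names are made up for
illustration; they are not part of the irange API.

```c
#include <stdint.h>

/* Hypothetical closed interval [lo, hi] of 64-bit unsigned values.  */
struct subrange
{
  uint64_t lo;
  uint64_t hi;
};

/* Split the anti-range ~[lo, hi] into the values below the hole
   (out[0]) and the values above it (out[1]).  Assumes
   0 < lo <= hi < UINT64_MAX so that both pieces are non-empty.  */
void
split_anti_range (uint64_t lo, uint64_t hi, struct subrange out[2])
{
  out[0].lo = 0;
  out[0].hi = lo - 1;
  out[1].lo = hi + 1;
  out[1].hi = UINT64_MAX;
}

/* Value of INT_MIN after sign extension to 64 bits, assuming a
   32-bit int: 2^64 - 2^31.  */
uint64_t
sign_extended_int_min (void)
{
  return (uint64_t) (int64_t) INT32_MIN;
}
```

Splitting ~[4, 18446744071562067967] this way gives [0,3] plus a second
piece whose lower bound is one past the upper end of the hole, which is
the same value that sign_extended_int_min () returns.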


I don't understand the warning though:

a.c:8:5: warning: ‘__builtin_strncpy’ output truncated copying between 0 and 3 bytes from a string of length 8 [-Wstringop-truncation]
    8 | __builtin_strncpy (a, "12345678", n);   // -Wstringop-truncation
      | ^~~~

The range of the bound to strncpy can certainly be [0,3], but it can 
also be [1844...,+INF] which shouldn't warn.


In a world without anti-ranges, we'd see the 2 sub-ranges above.  How 
would you suggest handling it?  We could nuke out the uppermost 
sub-range, but what if the range is [0,3][10,20]?  Perhaps remove from 
some arbitrary number on up?  Say...[0xf.,+INF]?  This seems like a 
hack, but perhaps is what's needed???


It doesn't seem like the above source should warn.  Am I missing something?

Thanks.
Aldy

Re: [PATCH] Adjust tree-ssa-strlen.c for irange API.

2020-08-04 Thread Martin Sebor via Gcc-patches


On 8/4/20 5:33 AM, Aldy Hernandez via Gcc-patches wrote:

This patch adapts the strlen pass to use the irange API.

I wasn't able to remove the one annoying use of VR_ANTI_RANGE, because
I'm not sure what to do.  Perhaps Martin can shed some light.  The
current code has:

   else if (rng == VR_ANTI_RANGE)
     {
       wide_int maxobjsize = wi::to_wide (TYPE_MAX_VALUE (ptrdiff_type_node));
       if (wi::ltu_p (cntrange[1], maxobjsize))
         {
           cntrange[0] = cntrange[1] + 1;
           cntrange[1] = maxobjsize;

Suppose we have ~[10,20], won't the above set cntrange[] to [21,MAX]?  Won't
this ignore the 0..9 that is part of the range?  What should we do here?


cntrange is the range of the strncpy (and strncat) bound.  It does
ignore the lower subrange but I think that's intentional because
the lower the bound the more likely the truncation, so it serves
to minimize false positives.
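
As a rough illustration of the branch quoted above, the following sketch
models it with plain 64-bit integers instead of wide_int.  The names
adjust_anti_range and MAXOBJSIZE are hypothetical, MAXOBJSIZE assumes a
64-bit ptrdiff_t, and only the branch shown in the excerpt is modelled;
whatever the pass does when the upper bound is not below the maximum
object size is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: ptrdiff_t is 64 bits, so the largest object size is
   INT64_MAX.  */
#define MAXOBJSIZE ((uint64_t) INT64_MAX)

/* Hypothetical model of the quoted branch.  On entry cntrange[] holds
   the bounds of the excluded interval ~[cntrange[0], cntrange[1]].
   Returns true if the branch applied and cntrange[] was rewritten.  */
bool
adjust_anti_range (uint64_t cntrange[2])
{
  if (cntrange[1] < MAXOBJSIZE)
    {
      /* Keep only the part above the hole; the part below it
         (0 .. cntrange[0] - 1) is deliberately dropped.  */
      cntrange[0] = cntrange[1] + 1;
      cntrange[1] = MAXOBJSIZE;
      return true;
    }
  return false;
}
```

For ~[10,20] this yields [21, MAXOBJSIZE], so the values 0..9 are not
represented in the result.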

I didn't see any tests fail with the anti-range block disabled but
with some effort I was able to come up with one:

  char a[7];

  void f (int n)
  {
    if (n > 3)
      n = 0;

    strncpy (a, "12345678", n);   // -Wstringop-truncation
  }

The warning disappears when the anti-range handling is removed so
unless that's causing headaches for the new API I think we want to
keep it (and add the test case :)

Martin



Anyways, I've left the anti-range in place, but the rest of the patch still
stands.

OK?

gcc/ChangeLog:

* tree-ssa-strlen.c (get_range): Adjust for irange API.
(compare_nonzero_chars): Same.
(dump_strlen_info): Same.
(get_range_strlen_dynamic): Same.
(set_strlen_range): Same.
(maybe_diag_stxncpy_trunc): Same.
(get_len_or_size): Same.
(count_nonzero_bytes_addr): Same.
(handle_integral_assign): Same.
---
  gcc/tree-ssa-strlen.c | 122 --
  1 file changed, 57 insertions(+), 65 deletions(-)

diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
index fbaee745f7d..e6009874ee5 100644
--- a/gcc/tree-ssa-strlen.c
+++ b/gcc/tree-ssa-strlen.c
@@ -220,21 +220,25 @@ get_range (tree val, wide_int minmax[2], const vr_values 
*rvals /* = NULL */)
 GCC 11).  */
const value_range *vr
= (CONST_CAST (class vr_values *, rvals)->get_value_range (val));
-  value_range_kind rng = vr->kind ();
-  if (rng != VR_RANGE || !range_int_cst_p (vr))
+  if (vr->undefined_p () || vr->varying_p ())
return NULL_TREE;
  
-  minmax[0] = wi::to_wide (vr->min ());

-  minmax[1] = wi::to_wide (vr->max ());
+  minmax[0] = vr->lower_bound ();
+  minmax[1] = vr->upper_bound ();
return val;
  }
  
-  value_range_kind rng = get_range_info (val, minmax, minmax + 1);

-  if (rng == VR_RANGE)
-return val;
+  value_range vr;
+  get_range_info (val, vr);
+  if (!vr.undefined_p () && !vr.varying_p ())
+{
+  minmax[0] = vr.lower_bound ();
+  minmax[1] = vr.upper_bound ();
+  return val;
+}
  
-  /* Do not handle anti-ranges and instead make use of the on-demand

- VRP if/when it becomes available (hopefully in GCC 11).  */
+  /* We should adjust for the on-demand VRP if/when it becomes
+ available (hopefully in GCC 11).  */
return NULL_TREE;
  }
  
@@ -278,16 +282,18 @@ compare_nonzero_chars (strinfo *si, unsigned HOST_WIDE_INT off,

  = (CONST_CAST (class vr_values *, rvals)
 ->get_value_range (si->nonzero_chars));
  
-  value_range_kind rng = vr->kind ();

-  if (rng != VR_RANGE || !range_int_cst_p (vr))
+  if (vr->undefined_p () || vr->varying_p ())
  return -1;
  
/* If the offset is less than the minimum length or if the bounds

   of the length range are equal return the result of the comparison
   same as in the constant case.  Otherwise return a conservative
   result.  */
-  int cmpmin = compare_tree_int (vr->min (), off);
-  if (cmpmin > 0 || tree_int_cst_equal (vr->min (), vr->max ()))
+  tree type = TREE_TYPE (si->nonzero_chars);
+  tree tmin = wide_int_to_tree (type, vr->lower_bound ());
+  tree tmax = wide_int_to_tree (type, vr->upper_bound ());
+  int cmpmin = compare_tree_int (tmin, off);
+  if (cmpmin > 0 || tree_int_cst_equal (tmin, tmax))
  return cmpmin;
  
return -1;

@@ -905,32 +911,14 @@ dump_strlen_info (FILE *fp, gimple *stmt, const vr_values 
*rvals)
  print_generic_expr (fp, si->nonzero_chars);
  if (TREE_CODE (si->nonzero_chars) == SSA_NAME)
{
- value_range_kind rng = VR_UNDEFINED;
- wide_int min, max;
+ value_range vr;
  if (rvals)
-   {
- const value_range *vr
-   = CONST_CAST (class vr_values *, rvals)
-   ->get_value_range (si->nonzero_chars);
- rng = vr->kind ();
- if

[PATCH] c++: dependent constraint on placeholder return type [PR96443]

2020-08-04 Thread Patrick Palka via Gcc-patches

In the testcase below, we never substitute function-template arguments
into f15's placeholder-return-type constraint, which leads to us
incorrectly rejecting this instantiation in do_auto_deduction due to
satisfaction failure (of the constraint SameAs).

The fact that we incorrectly reject this testcase is masked by the
other instantiation f15, which we correctly reject and diagnose
(by accident).

A good place to do this missing substitution seems to be during
TEMPLATE_TYPE_PARM level lowering.  So this patch adds a call to
tsubst_constraint there, and also adds dg-bogus directives to this
testcase wherever we expect instantiation to succeed. (So without the
substitution fix, this last dg-bogus would FAIL).

Successfully tested on x86_64-pc-linux-gnu, and also on the cmcstl2 and
range-v3 projects.  Does this look OK to commit?

gcc/cp/ChangeLog:

PR c++/96443
* pt.c (tsubst) <case TEMPLATE_TYPE_PARM>: Substitute into
the constraints on a placeholder type when reducing its level.

gcc/testsuite/ChangeLog:

PR c++/96443
* g++.dg/cpp2a/concepts-ts1.C: Add dg-bogus wherever we expect
instantiation to succeed.
---
 gcc/cp/pt.c   | 7 ---
 gcc/testsuite/g++.dg/cpp2a/concepts-ts1.C | 8 
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index e7496002c1c..04bf6da0cdd 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -15524,10 +15524,11 @@ tsubst (tree t, tree args, tsubst_flags_t complain, 
tree in_decl)
 
 if (TREE_CODE (t) == TEMPLATE_TYPE_PARM)
  {
-   /* Propagate constraints on placeholders since they are
-  only instantiated during satisfaction.  */
+   /* Substitute constraints on placeholder when reducing
+  their level.  */
if (tree constr = PLACEHOLDER_TYPE_CONSTRAINTS (t))
- PLACEHOLDER_TYPE_CONSTRAINTS (r) = constr;
+ PLACEHOLDER_TYPE_CONSTRAINTS (r)
+   = tsubst_constraint (constr, args, complain, in_decl);
else if (tree pl = CLASS_PLACEHOLDER_TEMPLATE (t))
  {
pl = tsubst_copy (pl, args, complain, in_decl);
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-ts1.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-ts1.C
index 1cefe3b243f..1a9b71c2296 100644
--- a/gcc/testsuite/g++.dg/cpp2a/concepts-ts1.C
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-ts1.C
@@ -34,13 +34,13 @@ auto f15(auto x) -> SameAs { return 0; } // { 
dg-error "deduced ret
 
 void driver()
 {
-  f1(0);
+  f1(0); // { dg-bogus "" }
   f2(0); // { dg-error "" }
-  f3(0);
+  f3(0); // { dg-bogus "" }
   f3('a'); // { dg-error "" }
-  f4(0, 0);
+  f4(0, 0); // { dg-bogus "" }
   f4(0, 'a'); // { dg-error "" }
-  f15(0);
+  f15(0); // { dg-bogus "" }
   f15('a'); // { dg-message "" }
 }
 
-- 
2.28.0.89.g85b4e0a6dc

[patch, fortran] Compile-time check for change in DO variable in contained procedures

2020-08-04 Thread Thomas Koenig via Gcc-patches


Hello world,

the attached patch issues an error for something that I am sure most
people did at least once (I know I did), something like

  do i=1,10
 call foo
  end do
...
contains
  subroutine foo
do i=1,5
   ...
end do

which is, of course, illegal, and the programmer's fault. We issue a
run-time error with -fcheck=all, but a compile-time error is better, of course.

As you can see from the modification of do_check_4.f90, you have to go
to some lengths to fool the compiler with this patch.

As an aside, I could really have used three places for the error
message here.  As is, I settled for the place of the call from
the DO loop checked, and the place where it is modified.  With
the name of the variable, the user should be able to figure out
what's wrong.

Regression-tested. OK for trunk?

Best regards

Thomas

Static analysis for definition of DO index variables in contained 
procedures.


When encountering a procedure call in a DO loop, this patch checks if
the call is to a contained procedure, and if it is, checks for
changes in the index variable.
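
For readers who do not know the frontend passes, the idea can be modelled in a few lines of C++.  This is only a sketch under simplifying assumptions: the types Stmt and Procedure and the function find_index_redefinitions are hypothetical stand-ins for the gfc_code walk in the patch, and only assignments and nested DO loops are modelled (the patch also looks at I/O statements and INTENT(OUT)/INTENT(INOUT) arguments).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified statement kinds; the real pass walks gfc_code nodes.
enum class StmtKind { Assign, DoLoop, Other };

struct Stmt {
  StmtKind kind;
  std::string target;  // variable being assigned, or index variable of a DO
  int line;            // source line, used only for reporting
};

struct Procedure {
  std::string name;
  std::vector<Stmt> body;
};

// Return the lines in a contained procedure that redefine `do_var`.
std::vector<int> find_index_redefinitions(const Procedure &proc,
                                          const std::string &do_var) {
  std::vector<int> lines;
  for (const Stmt &s : proc.body) {
    bool defines = s.kind == StmtKind::Assign || s.kind == StmtKind::DoLoop;
    if (defines && s.target == do_var)
      lines.push_back(s.line);
  }
  return lines;
}

// Model of the example in the message: foo contains "do i=1,5".
Procedure make_foo() {
  Procedure p;
  p.name = "foo";
  p.body.push_back(Stmt{StmtKind::Other, "", 10});
  p.body.push_back(Stmt{StmtKind::DoLoop, "i", 11});
  return p;
}
```

Calling find_index_redefinitions (make_foo (), "i") reports the inner DO statement, which is the place the new error points at, together with the location of the outer loop.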

gcc/fortran/ChangeLog:

PR fortran/96469
* frontend-passes.c (doloop_contained_function_call): New
function.
(doloop_contained_procedure_code): New function.
(CHECK_INQ): Macro for inquire checks.
(doloop_code): Invoke doloop_contained_procedure_code and
doloop_contained_function_call if appropriate.
(do_intent): Likewise.

gcc/testsuite/ChangeLog:

PR fortran/96469
* gfortran.dg/do_check_4.f90: Hide change in index variable
from compile-time analysis.
* gfortran.dg/do_check_4.f90: New test.
diff --git a/gcc/fortran/frontend-passes.c b/gcc/fortran/frontend-passes.c
index cdeed8943b0..13390e33188 100644
--- a/gcc/fortran/frontend-passes.c
+++ b/gcc/fortran/frontend-passes.c
@@ -2305,6 +2305,208 @@ optimize_minmaxloc (gfc_expr **e)
   mpz_set_ui (a->expr->value.integer, 1);
 }
 
+typedef struct contained_info
+{
+  gfc_symbol *do_var;
+  gfc_symbol *procedure;
+  locus where_do;
+} contained_info;
+
+
+/* Callback function that goes through the code in a contained
+   procedure to make sure it does not change a variable in a DO
+   loop.  */
+
+static enum gfc_exec_op last_io_op;
+
+static int
+doloop_contained_function_call (gfc_expr **e,
+int *walk_subtrees ATTRIBUTE_UNUSED, void *data)
+{
+  gfc_expr *expr = *e;
+  gfc_formal_arglist *f;
+  gfc_actual_arglist *a;
+  gfc_symbol *sym, *do_var;
+  contained_info *info;
+
+  if (expr->expr_type != EXPR_FUNCTION || expr->value.function.isym)
+return 0;
+
+  sym = expr->value.function.esym;
+  f = gfc_sym_get_dummy_args (sym);
+  if (f == NULL)
+return 0;
+
+  info = (contained_info *) data;
+  do_var = info->do_var;
+  a = expr->value.function.actual;
+
+  while (a && f)
+{
+  if (a->expr && a->expr->symtree && a->expr->symtree->n.sym == do_var)
+	{
+	  if (f->sym->attr.intent == INTENT_OUT)
+	{
+	  gfc_error_now ("Index variable %qs set to undefined as "
+			 "INTENT(OUT) argument at %L in procedure %qs "
+			 "called from within DO loop at %L", do_var->name,
+			 &a->expr->where, info->procedure->name,
+			 &info->where_do);
+	  return 1;
+	}
+	  else if (f->sym->attr.intent == INTENT_INOUT)
+	{
+	  gfc_error_now ("Index variable %qs not definable as "
+			 "INTENT(INOUT) argument at %L in procedure %qs "
+			 "called from within DO loop at %L", do_var->name,
+			 &a->expr->where, info->procedure->name,
+			 &info->where_do);
+	  return 1;
+	}
+	}
+  a = a->next;
+  f = f->next;
+}
+  return 0;
+}
+
+static int
+doloop_contained_procedure_code (gfc_code **c,
+ int *walk_subtrees ATTRIBUTE_UNUSED,
+ void *data)
+{
+  gfc_code *co = *c;
+  contained_info *info = (contained_info *) data;
+  gfc_symbol *do_var = info->do_var;
+  const char *errmsg = _("Index variable %qs redefined at %L in procedure %qs "
+			 "called from within DO loop at %L");
+  static enum gfc_exec_op saved_io_op;
+
+  switch (co->op)
+{
+case EXEC_ASSIGN:
+  if (co->expr1->symtree->n.sym == do_var)
+	gfc_error_now (errmsg, do_var->name, &co->loc, info->procedure->name,
+		   &info->where_do);
+  break;
+
+case EXEC_DO:
+  if (co->ext.iterator && co->ext.iterator->var
+	  && co->ext.iterator->var->symtree->n.sym == do_var)
+	gfc_error (errmsg, do_var->name, &co->loc, info->procedure->name,
+		   &info->where_do);
+  break;
+
+case EXEC_READ:
+case EXEC_WRITE:
+case EXEC_INQUIRE:
+  saved_io_op = last_io_op;
+  last_io_op = co->op;
+  break;
+
+case EXEC_OPEN:
+  if (co->ext.open->iostat
+	  && co->ext.open->iostat->symtree->n.sym == do_var)
+	gfc_error_now (errmsg, do_var->name, &co->ext.open->iostat->where,
+		   info->procedure->name, &info->where_do);
+  break;
+
+case EXEC_CLOSE:
+  if (co->ext.close->iostat
+	  && co->ext.close->iostat->symtree->n.sym == do_var)
+	gfc_error_now (errmsg,

Re: [PATCH] c++: Template keyword following :: [PR96082]

2020-08-04 Thread Nathan Sidwell


On 8/4/20 1:30 PM, Jason Merrill via Gcc-patches wrote:

On 8/4/20 10:05 AM, Marek Polacek wrote:

In r9-4235 I tried to make sure that the template keyword follows
a nested-name-specifier.  :: is a valid nested-name-specifier, so
I also have to check 'globalscope' before giving the error.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10/9?


Why would anyone write that?


Users are more ... imaginative than you or me!



The patch is OK.


nathan

--
Nathan Sidwell

Re: [PATCH] c++: cxx_eval_vec_init after zero initialization [PR96282]

2020-08-04 Thread Patrick Palka via Gcc-patches

On Tue, 4 Aug 2020, Jason Merrill wrote:

> On 8/4/20 9:45 AM, Patrick Palka wrote:
> > On Mon, 3 Aug 2020, Patrick Palka wrote:
> > 
> > > On Mon, 3 Aug 2020, Jason Merrill wrote:
> > > 
> > > > On 8/3/20 2:45 PM, Patrick Palka wrote:
> > > > > On Mon, 3 Aug 2020, Jason Merrill wrote:
> > > > > 
> > > > > > On 8/3/20 8:53 AM, Patrick Palka wrote:
> > > > > > > On Mon, 3 Aug 2020, Patrick Palka wrote:
> > > > > > > 
> > > > > > > > In the first testcase below, expand_aggr_init_1 sets up t's
> > > > > > > > default
> > > > > > > > constructor such that the ctor first zero-initializes the entire
> > > > > > > > base
> > > > > > > > b,
> > > > > > > > followed by calling b's default constructor, the latter of which
> > > > > > > > just
> > > > > > > > default-initializes the array member b::m via a VEC_INIT_EXPR.
> > > > > > > > 
> > > > > > > > So upon constexpr evaluation of this latter VEC_INIT_EXPR,
> > > > > > > > ctx->ctor
> > > > > > > > is
> > > > > > > > nonempty due to the prior zero-initialization, and we proceed in
> > > > > > > > cxx_eval_vec_init to append new constructor_elts to the end of
> > > > > > > > ctx->ctor
> > > > > > > > without first checking if a matching constructor_elt already
> > > > > > > > exists.
> > > > > > > > This leads to ctx->ctor having two matching constructor_elts for
> > > > > > > > each
> > > > > > > > index.
> > > > > > > > 
> > > > > > > > This patch partially fixes this issue by making the RANGE_EXPR
> > > > > > > > optimization in cxx_eval_vec_init truncate ctx->ctor before
> > > > > > > > adding the
> > > > > > > > single RANGE_EXPR constructor_elt.  This isn't a complete fix
> > > > > > > > because
> > > > > > > > the RANGE_EXPR optimization applies only when the constant
> > > > > > > > initializer
> > > > > > > > is relocatable, so whenever it's not relocatable we can still
> > > > > > > > build up
> > > > > > > > an invalid CONSTRUCTOR, e.g. if in the first testcase we add an
> > > > > > > > NSDMI
> > > > > > > > such as 'e *p = this;' to struct e, then the ICE still occurs
> > > > > > > > even
> > > > > > > > with
> > > > > > > > this patch.
> > > > > > > 
> > > > > > > A complete but more risky one-line fix would be to always truncate
> > > > > > > ctx->ctor beforehand, not just when the RANGE_EXPR optimization
> > > > > > > applies.
> > > > > > > If it's true that the initializer of a VEC_INIT_EXPR can't observe
> > > > > > > the
> > > > > > > previous elements of the target array, then it should be safe to
> > > > > > > always
> > > > > > > truncate I think?
> > > > > > 
> > > > > > What if default-initialization of the array element type doesn't
> > > > > > fully
> > > > > > initialize the elements, e.g. if 'e' had another member without a
> > > > > > default
> > > > > > initializer?  Does truncation first mean we lose the
> > > > > > zero-initialization
> > > > > > of
> > > > > > such a member?
> > > > > 
> > > > > Hmm, it looks like we would lose the zero-initialization of such a
> > > > > member with or without truncation first (so with any one of the three
> > > > > proposed fixes).  I think it's because the evaluation loop in
> > > > > cxx_eval_vec_init disregards each element's prior (zero-initialized)
> > > > > state.
> > > > > 
> > > > > > 
> > > > > > We could probably still do the truncation, but clear the
> > > > > > CONSTRUCTOR_NO_CLEARING flag on the element initializer.
> > > > > 
> > > > > Ah, this seems to work well.  Like this?
> > > > > 
> > > > > -- >8 --
> > > > > 
> > > > > Subject: [PATCH] c++: cxx_eval_vec_init after zero initialization
> > > > > [PR96282]
> > > > > 
> > > > > In the first testcase below, expand_aggr_init_1 sets up t's default
> > > > > constructor such that the ctor first zero-initializes the entire base
> > > > > b,
> > > > > followed by calling b's default constructor, the latter of which just
> > > > > default-initializes the array member b::m via a VEC_INIT_EXPR.
> > > > > 
> > > > > So upon constexpr evaluation of this latter VEC_INIT_EXPR, ctx->ctor
> > > > > is
> > > > > nonempty due to the prior zero-initialization, and we proceed in
> > > > > cxx_eval_vec_init to append new constructor_elts to the end of
> > > > > ctx->ctor
> > > > > without first checking if a matching constructor_elt already exists.
> > > > > This leads to ctx->ctor having two matching constructor_elts for each
> > > > > index.
> > > > > 
> > > > > This patch fixes this issue by truncating a zero-initialized array
> > > > > object in cxx_eval_vec_init_1 before we begin appending
> > > > > default-initialized
> > > > > array elements to it.  Since default-initialization may leave parts of
> > > > > > the element type uninitialized, we also preserve the array's prior
> > > > > > zero-initialized state by clearing CONSTRUCTOR_NO_CLEARING on each
> > > > > > appended element initializer.
> > > > > 
> > > > > gcc/cp/ChangeLog:
> > > > > 
> > > > >   PR c++/96282
> > > > >   * constexpr.c (cxx_eval_vec_init_1): Truncate ctx->ctor and
> > >

Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-04 Thread H.J. Lu via Gcc-patches

On Tue, Aug 4, 2020 at 12:35 AM Richard Biener  wrote:
>
> On Mon, 3 Aug 2020, Qing Zhao wrote:
>
> > Hi, Uros,
> >
> > Thanks a lot for your review on X86 parts.
> >
> > Hi, Richard,
> >
> > Could you please take a look at the middle-end part to see whether the
> > rewrite addresses your previous concerns?
>
> I have a few comments below - I'm not sure I'm qualified to fully
> review the rest though.
>
> > Thanks a lot.
> >
> > Qing
> >
> >
> > > On Jul 31, 2020, at 12:57 PM, Uros Bizjak  wrote:
> > >
> > >
> > > > On Tue, 28 Jul 2020 at 22:05, Qing Zhao wrote:
> > > >
> > > >
> > > > Richard and Uros,
> > > >
> > > > Could you please review the change that H.J and I rewrote based on your 
> > > > comments in the previous round of discussion?
> > > >
> > > > This patch is a nice security enhancement for GCC that has been 
> > > > requested by security people for quite some time.
> > > >
> > > > Thanks a lot for your time.
> > >
> > > I'll be away from the keyboard for the next week, but the patch needs a 
> > > middle end approval first.
> > >
> > > > That said, the x86 parts look OK.
> > >
> > >
> >
> > > Uros.
> > > > Qing
> > > >
> > > > > > On Jul 14, 2020, at 9:45 AM, Qing Zhao via Gcc-patches
> > > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > >
> > > > > Hi, Gcc team,
> > > > >
> > > > > This patch is a follow-up on the previous patch and corresponding 
> > > > > discussion:
> > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545101.html 
> > > > >  
> > > > >  > > > > >
> > > > >
> > > > > From the previous round of discussion, the major issues raised were:
> > > > >
> > > > > A. should be rewritten by using regsets infrastructure.
> > > > > B. Put the patch into middle-end instead of x86 backend.
> > > > >
> > > > > This new patch is rewritten based on the above 2 comments.  The major 
> > > > > changes compared to the previous patch are:
> > > > >
> > > > > 1. Change the names of the option and attribute from
> > > > > > -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all]  and
> > > > > > zero_caller_saved_regs("skip|used-gpr|all-gpr|used|all")
> > > > > > to:
> > > > > > -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]   and
> > > > > > zero_call_used_regs("skip|used-gpr|all-gpr|used|all")
> > > > > Add the new option and  new attribute in general.
> > > > > 2. The main code generation part is moved from i386 backend to 
> > > > > middle-end;
> > > > > 3. Add 4 target-hooks;
> > > > > 4. Implement these 4 target-hooks on i386 backend.
> > > > > 5. On a target that does not implement the target hook, issue error 
> > > > > for the new option, issue warning for the new attribute.
> > > > >
> > > > > The patch is as following:
> > > > >
> > > > > [PATCH] Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]
> > > > > command-line option and
> > > > > > zero_call_used_regs("skip|used-gpr|all-gpr|used|all") function
> > > > > > attribute:
> > > > >
> > > > >  1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
> > > > >
> > > > >  Don't zero call-used registers upon function return.
>
> Does a return via EH unwinding also constitute a function return?  I
> think you may want to have a finally handler or support in the unwinder
> for this?  Then there's abnormal return via longjmp & friends, I guess
> there's nothing that can be done there besides patching glibc?

Abnormal returns, like EH unwinding and longjmp, aren't covered by this
patch. Only normal returns are covered.
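
As an illustration of the per-function form, here is a minimal sketch that assumes the attribute keeps the spelling proposed in this thread, zero_call_used_regs("used-gpr").  The macro ZERO_USED_GPR and the function checksum are made up for the example, and the __has_attribute guard lets the code build unchanged on compilers that do not know the attribute.

```cpp
#include <cassert>

// Assumption: the attribute is spelled as proposed in this thread.  When the
// compiler does not recognize it, the macro expands to nothing.
#if defined(__has_attribute)
#  if __has_attribute(zero_call_used_regs)
#    define ZERO_USED_GPR __attribute__((zero_call_used_regs("used-gpr")))
#  endif
#endif
#ifndef ZERO_USED_GPR
#  define ZERO_USED_GPR  /* attribute unavailable: no zeroing requested */
#endif

// Hypothetical API boundary whose scratch registers should not leak.
ZERO_USED_GPR
int checksum(const unsigned char *buf, int len) {
  int sum = 0;
  for (int i = 0; i < len; ++i)
    sum += buf[i];
  return sum;  // a normal return, the only kind of exit covered by the patch
}
```

The attribute does not change the result of the function; it only asks for the call-used general purpose registers that the function used to be cleared on the normal return path.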

> In general I am missing reasoning as why to use -fzero-call-used-regs=
> in the documentation, that is, what is the threat model and what are
> the guarantees?  Is there any point zeroing registers when spill slots
> are left populated with stale register contents?  How do I (and why
> would I want to?) ensure that there's no information leak from the
> implementation of 'foo' to their callers?  Do I need to compile all
> of 'foo' and functions called from 'foo' with -fzero-call-used-regs=
> or is it enough to annotate API boundaries I want to protect with
> zero_call_used_regs("...")?
>
> Again - what's the intended use (and how does it fulfill anything useful
> for that case)?
>
> > > > >  2. -fzero-call-used-regs=used-gpr and zero_call_used_regs("used-gpr")
> > > > >
> > > > >  Zero used call-used general purpose registers upon function return.
> > > > >
> > > > >  3. -fzero-call-used-regs=all-gpr and zero_call_used_regs("all-gpr")
> > > > >
> > > > >  Zero all call-used general purpose registers upon function return.
> > > > >
> > > > >  4. -fzero-call-used-regs=used and zero_call_used_regs("used")
> > > > >
> > > > >  Zero used call-used registers upon function return.
> > > > >
> > > > >  5. -fzero-call-used-regs=all and zero_call_used_regs("all")
> > > > >
>

Re: [PATCH 0/6] Backport power10 prefixed instruction tests to GCC 10

2020-08-04 Thread Segher Boessenkool

On Tue, Aug 04, 2020 at 01:46:49AM -0400, Michael Meissner wrote:
> The following 6 patches backport the tests on the master branch that were 
> added
> to test the new prefixed instructions being added to the Power10 processor.
> These patches include changes made by David Edelsohn to make the patches work
> on AIX.  I have tested them on a GCC 10 compiler on a little endian Linux
> power8 system, and all of the tests now pass.  Can I check these patches into
> the GCC 10 branch?

All six are okay for backport to GCC 10.  Thanks!


Segher

Re: [PATCH] c++: Template keyword following :: [PR96082]

2020-08-04 Thread Jason Merrill via Gcc-patches


On 8/4/20 10:05 AM, Marek Polacek wrote:

In r9-4235 I tried to make sure that the template keyword follows
a nested-name-specifier.  :: is a valid nested-name-specifier, so
I also have to check 'globalscope' before giving the error.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10/9?


Why would anyone write that?

The patch is OK.


gcc/cp/ChangeLog:

PR c++/96082
* parser.c (cp_parser_elaborated_type_specifier): Allow
'template' following ::.

gcc/testsuite/ChangeLog:

PR c++/96082
* g++.dg/template/template-keyword3.C: New test.
---
  gcc/cp/parser.c   |  2 +-
  gcc/testsuite/g++.dg/template/template-keyword3.C | 11 +++
  2 files changed, 12 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/template/template-keyword3.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index ab088874ba7..3782edd429e 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -18826,7 +18826,7 @@ cp_parser_elaborated_type_specifier (cp_parser* parser,
if (!template_p)
cp_parser_parse_tentatively (parser);
/* The `template' keyword must follow a nested-name-specifier.  */
-  else if (!nested_name_specifier)
+  else if (!nested_name_specifier && !globalscope)
{
  cp_parser_error (parser, "%<template%> must follow a nested-"
   "name-specifier");
diff --git a/gcc/testsuite/g++.dg/template/template-keyword3.C 
b/gcc/testsuite/g++.dg/template/template-keyword3.C
new file mode 100644
index 000..91af2b3dc02
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/template-keyword3.C
@@ -0,0 +1,11 @@
+// PR c++/96082
+// { dg-do compile { target c++11 } }
+
+template  class A {};
+
+void
+f ()
+{
+  typename::template A  a;
+  ::template A  a2;
+}

base-commit: 7bd72dd5a385dfa6d49cfe640cefc9ed187361d3

Re: [PATCH] c++: cxx_eval_vec_init after zero initialization [PR96282]

2020-08-04 Thread Jason Merrill via Gcc-patches


On 8/4/20 9:45 AM, Patrick Palka wrote:

On Mon, 3 Aug 2020, Patrick Palka wrote:


On Mon, 3 Aug 2020, Jason Merrill wrote:


On 8/3/20 2:45 PM, Patrick Palka wrote:

On Mon, 3 Aug 2020, Jason Merrill wrote:


On 8/3/20 8:53 AM, Patrick Palka wrote:

On Mon, 3 Aug 2020, Patrick Palka wrote:


In the first testcase below, expand_aggr_init_1 sets up t's default
constructor such that the ctor first zero-initializes the entire base
b,
followed by calling b's default constructor, the latter of which just
default-initializes the array member b::m via a VEC_INIT_EXPR.

So upon constexpr evaluation of this latter VEC_INIT_EXPR, ctx->ctor
is
nonempty due to the prior zero-initialization, and we proceed in
cxx_eval_vec_init to append new constructor_elts to the end of
ctx->ctor
without first checking if a matching constructor_elt already exists.
This leads to ctx->ctor having two matching constructor_elts for each
index.

This patch partially fixes this issue by making the RANGE_EXPR
optimization in cxx_eval_vec_init truncate ctx->ctor before adding the
single RANGE_EXPR constructor_elt.  This isn't a complete fix because
the RANGE_EXPR optimization applies only when the constant initializer
is relocatable, so whenever it's not relocatable we can still build up
an invalid CONSTRUCTOR, e.g. if in the first testcase we add an NSDMI
such as 'e *p = this;' to struct e, then the ICE still occurs even
with
this patch.


A complete but more risky one-line fix would be to always truncate
ctx->ctor beforehand, not just when the RANGE_EXPR optimization applies.
If it's true that the initializer of a VEC_INIT_EXPR can't observe the
previous elements of the target array, then it should be safe to always
truncate I think?


What if default-initialization of the array element type doesn't fully
initialize the elements, e.g. if 'e' had another member without a default
initializer?  Does truncation first mean we lose the zero-initialization
of
such a member?


Hmm, it looks like we would lose the zero-initialization of such a
member with or without truncation first (so with any one of the three
proposed fixes).  I think it's because the evaluation loop in
cxx_eval_vec_init disregards each element's prior (zero-initialized)
state.



We could probably still do the truncation, but clear the
CONSTRUCTOR_NO_CLEARING flag on the element initializer.


Ah, this seems to work well.  Like this?

-- >8 --

Subject: [PATCH] c++: cxx_eval_vec_init after zero initialization [PR96282]

In the first testcase below, expand_aggr_init_1 sets up t's default
constructor such that the ctor first zero-initializes the entire base b,
followed by calling b's default constructor, the latter of which just
default-initializes the array member b::m via a VEC_INIT_EXPR.

So upon constexpr evaluation of this latter VEC_INIT_EXPR, ctx->ctor is
nonempty due to the prior zero-initialization, and we proceed in
cxx_eval_vec_init to append new constructor_elts to the end of ctx->ctor
without first checking if a matching constructor_elt already exists.
This leads to ctx->ctor having two matching constructor_elts for each
index.

This patch fixes this issue by truncating a zero-initialized array
object in cxx_eval_vec_init_1 before we begin appending default-initialized
array elements to it.  Since default-initialization may leave parts of
the element type uninitialized, we also preserve the array's prior
zero-initialized state by clearing CONSTRUCTOR_NO_CLEARING on each
appended element initializer.
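
The duplicate-element problem can be reproduced with a toy model.  This is only a sketch: Ctor is a hypothetical stand-in for a CONSTRUCTOR's element list, and it models only the append-versus-truncate behaviour, not the CONSTRUCTOR_NO_CLEARING flag.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-in for a CONSTRUCTOR: a list of (index, value) pairs.
typedef std::vector<std::pair<std::size_t, int> > Ctor;

// Mimic the prior zero-initialization: one zero element per index.
Ctor zero_init(std::size_t n) {
  Ctor c;
  for (std::size_t i = 0; i < n; ++i)
    c.push_back(std::make_pair(i, 0));
  return c;
}

// Append one element per index, optionally truncating the list first.
void vec_init(Ctor &c, std::size_t n, int value, bool truncate_first) {
  if (truncate_first)
    c.clear();
  for (std::size_t i = 0; i < n; ++i)
    c.push_back(std::make_pair(i, value));
}

// Count how many entries share the index `idx`.
std::size_t count_index(const Ctor &c, std::size_t idx) {
  std::size_t n = 0;
  for (std::size_t i = 0; i < c.size(); ++i)
    if (c[i].first == idx)
      ++n;
  return n;
}
```

Without truncation every index ends up with two entries, which corresponds to the invalid CONSTRUCTOR described above; with truncation each index appears exactly once.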

gcc/cp/ChangeLog:

PR c++/96282
* constexpr.c (cxx_eval_vec_init_1): Truncate ctx->ctor and
then clear CONSTRUCTOR_NO_CLEARING on each appended element
initializer if we're default-initializing a previously
zero-initialized array object.

gcc/testsuite/ChangeLog:

PR c++/96282
* g++.dg/cpp0x/constexpr-array26.C: New test.
* g++.dg/cpp0x/constexpr-array27.C: New test.
* g++.dg/cpp2a/constexpr-init18.C: New test.
---
   gcc/cp/constexpr.c | 17 -
   gcc/testsuite/g++.dg/cpp0x/constexpr-array26.C | 13 +
   gcc/testsuite/g++.dg/cpp0x/constexpr-array27.C | 13 +
   gcc/testsuite/g++.dg/cpp2a/constexpr-init18.C  | 16 
   4 files changed, 58 insertions(+), 1 deletion(-)
   create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-array26.C
   create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-array27.C
   create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-init18.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index b1c1d249c6e..706bef323b2 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -4171,6 +4171,17 @@ cxx_eval_vec_init_1 (const constexpr_ctx *ctx, tree
atype, tree init,
 pre_init = true;
   }
   +  bool zero_initialized_p = false;
+  if ((pre_init || value_init || !init) && initializer_zerop (ctx->ctor))


Does initializer_zerop capture the difference between a default-initialized

Re: libgo patch committed: Update to go1.15rc1

2020-08-04 Thread Ian Lance Taylor via Gcc-patches

On Sun, Aug 2, 2020 at 1:00 PM Rainer Orth  
wrote:
>
> Hi Ian,
>
> > This libgo patch updates the sources to the go1.15rc1 release
> > candidate.  As usual, the changes for this update are too large to
> > include in an e-mail message.  I've just included the highlights and
> > changes to GCC-specific files below.  Bootstrapped and ran Go
> > testsuite on x86_64-pc-linux-gnu.  Committed to mainline.
>
> this seems to have broken the libgo build with 32-bit compilers:
>
> $ files=`echo /vol/gcc/src/hg/master/local/libgo/go/time/tzdata/tzdata.go 
> /vol/gcc/src/hg/master/local/libgo/go/time/tzdata/zipdata.go errors.gox 
> syscall.gox | sed -e 's/[^ ]*\.gox//g' -e 's/[^ ]*\.dep//'`; /bin/ksh 
> ./libtool --tag GO --mode=compile 
> /var/gcc/regression/master/11.4-gcc-gas/build/./gcc/gccgo 
> -B/var/gcc/regression/master/11.4-gcc-gas/build/./gcc/ 
> -B/vol/gcc/i386-pc-solaris2.11/bin/ -B/vol/gcc/i386-pc-solaris2.11/lib/ 
> -isystem /vol/gcc/i386-pc-solaris2.11/include -isystem 
> /vol/gcc/i386-pc-solaris2.11/sys-include   -fchecking=1  
> -minline-all-stringops  -O2 -g -I . -c -fgo-pkgpath=`echo time/tzdata.lo | 
> sed -e 's/.lo$//'`  -o time/tzdata.lo $files
> libtool: compile:  /var/gcc/regression/master/11.4-gcc-gas/build/./gcc/gccgo 
> -B/var/gcc/regression/master/11.4-gcc-gas/build/./gcc/ 
> -B/vol/gcc/i386-pc-solaris2.11/bin/ -B/vol/gcc/i386-pc-solaris2.11/lib/ 
> -isystem /vol/gcc/i386-pc-solaris2.11/include -isystem 
> /vol/gcc/i386-pc-solaris2.11/sys-include -fchecking=1 -minline-all-stringops 
> -O2 -g -I . -c -fgo-pkgpath=time/tzdata 
> /vol/gcc/src/hg/master/local/libgo/go/time/tzdata/tzdata.go 
> /vol/gcc/src/hg/master/local/libgo/go/time/tzdata/zipdata.go  -fPIC -o 
> time/.libs/tzdata.o
> terminate called after throwing an instance of 'std::bad_alloc'
>   what():  std::bad_alloc
> go1: internal compiler error: Abort
> mmap: Not enough space
>
> I'm seeing this on all of i386-pc-solaris2.11, sparc-sun-solaris2.11,
> and i686-pc-linux-gnu.  amd64-pc-solaris2.11 and sparcv9-sun-solaris2.11
> are ok, though (running make check right now).

This is fixed by this patch to the Go frontend that deletes lowered
constant strings.  If we lower a constant string operation in a
Binary_expression, we delete the strings.  This is safe because
constant strings are always newly allocated.

This is a hack to use much less memory when compiling the new
time/tzdata package, which has a file that contains the sum of over
13,000 constant strings.  We don't do this for numeric expressions
because that could cause us to delete an Iota_expression.
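
To show why this helps, here is a small self-contained model.  It is a sketch under assumptions: StringExpr, fold_concat and fold_sum are hypothetical names, not Go frontend classes, and the live counter exists only to make the memory behaviour visible.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical miniature of a constant string expression node.
struct StringExpr {
  std::string value;
  static int live;  // number of nodes currently allocated
  explicit StringExpr(const std::string &v) : value(v) { ++live; }
  ~StringExpr() { --live; }
};
int StringExpr::live = 0;

// Fold left + right into a new node and delete the operands, in the spirit
// of the "delete left; delete right;" lines in the patch.
StringExpr *fold_concat(StringExpr *left, StringExpr *right) {
  std::string folded = left->value + right->value;
  delete left;
  delete right;
  return new StringExpr(folded);
}

// Fold a whole sum of constant pieces; returns the single remaining node.
StringExpr *fold_sum(const std::vector<std::string> &pieces) {
  StringExpr *acc = new StringExpr("");
  for (std::size_t i = 0; i < pieces.size(); ++i)
    acc = fold_concat(acc, new StringExpr(pieces[i]));
  return acc;
}
```

After folding, only the result node is still allocated.  Without the two deletes, every intermediate node (each holding a copy of the growing string) would stay alive, which is the kind of growth that exhausted the address space of the 32-bit compilers.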

The Go frontend should have a cleaner approach to memory usage some day.

This also fixes PR go/96450.  Bootstrapped and tested on
x86_64-pc-linux-gnu and verified that I could build the time/tzdata
package on i686-pc-linux-gnu.  Committed to mainline.

Ian
2066393280f5c1573535c6a863c38cfe6baedae8
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 0bc8e1b5a59..c21b6000229 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-7f0d3834ac40cf3bcbeb9b13926ab5ccb2523537
+f45afedf90ac9af8f03d7d4515e952cbd724953a
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/expressions.cc b/gcc/go/gofrontend/expressions.cc
index 90f860bd735..7e7fb8c7313 100644
--- a/gcc/go/gofrontend/expressions.cc
+++ b/gcc/go/gofrontend/expressions.cc
@@ -556,7 +556,10 @@ Expression::get_backend(Translate_context* context)
 {
   // The child may have marked this expression as having an error.
   if (this->classification_ == EXPRESSION_ERROR)
-return context->backend()->error_expression();
+{
+  go_assert(saw_errors());
+  return context->backend()->error_expression();
+}
 
   return this->do_get_backend(context);
 }
@@ -6080,6 +6083,8 @@ Binary_expression::do_lower(Gogo* gogo, Named_object*,
   Type* result_type = (left->type()->named_type() != NULL
? left->type()
: right->type());
+ delete left;
+ delete right;
   return Expression::make_string_typed(left_string + right_string,
result_type, location);
 }
@@ -6087,6 +6092,8 @@ Binary_expression::do_lower(Gogo* gogo, Named_object*,
{
  int cmp = left_string.compare(right_string);
  bool r = Binary_expression::cmp_to_bool(op, cmp);
+ delete left;
+ delete right;
  return Expression::make_boolean(r, location);
}
}

[committed] amdgcn: Remove dead defines from gcn-run

2020-08-04 Thread Andrew Stubbs

This is just an obvious code cleanup; the relocation defines have been 
unused since the move to HSACOv3. They were just left in by mistake.


Andrew
amdgcn: Remove dead defines from gcn-run

Nothing uses these since the switch to HSACOv3.

gcc/ChangeLog:

	* config/gcn/gcn-run.c (R_AMDGPU_NONE): Delete.
	(R_AMDGPU_ABS32_LO): Delete.
	(R_AMDGPU_ABS32_HI): Delete.
	(R_AMDGPU_ABS64): Delete.
	(R_AMDGPU_REL32): Delete.
	(R_AMDGPU_REL64): Delete.
	(R_AMDGPU_ABS32): Delete.
	(R_AMDGPU_GOTPCREL): Delete.
	(R_AMDGPU_GOTPCREL32_LO): Delete.
	(R_AMDGPU_GOTPCREL32_HI): Delete.
	(R_AMDGPU_REL32_LO): Delete.
	(R_AMDGPU_REL32_HI): Delete.
	(reserved): Delete.
	(R_AMDGPU_RELATIVE64): Delete.

diff --git a/gcc/config/gcn/gcn-run.c b/gcc/config/gcn/gcn-run.c
index 8961ea17d37..31f14f39c6d 100644
--- a/gcc/config/gcn/gcn-run.c
+++ b/gcc/config/gcn/gcn-run.c
@@ -34,24 +34,6 @@
 #include 
 #include 
 
-/* These probably won't be in elf.h for a while.  */
-#ifndef R_AMDGPU_NONE
-#define R_AMDGPU_NONE		0
-#define R_AMDGPU_ABS32_LO	1	/* (S + A) & 0x  */
-#define R_AMDGPU_ABS32_HI	2	/* (S + A) >> 32  */
-#define R_AMDGPU_ABS64		3	/* S + A  */
-#define R_AMDGPU_REL32		4	/* S + A - P  */
-#define R_AMDGPU_REL64		5	/* S + A - P  */
-#define R_AMDGPU_ABS32		6	/* S + A  */
-#define R_AMDGPU_GOTPCREL	7	/* G + GOT + A - P  */
-#define R_AMDGPU_GOTPCREL32_LO	8	/* (G + GOT + A - P) & 0x  */
-#define R_AMDGPU_GOTPCREL32_HI	9	/* (G + GOT + A - P) >> 32  */
-#define R_AMDGPU_REL32_LO	10	/* (S + A - P) & 0x  */
-#define R_AMDGPU_REL32_HI	11	/* (S + A - P) >> 32  */
-#define reserved		12
-#define R_AMDGPU_RELATIVE64	13	/* B + A  */
-#endif
-
 #include "hsa.h"
 
 #ifndef HSA_RUNTIME_LIB

Re: [PATCH] Amend match.pd syntax with force-simplified results

2020-08-04 Thread Marc Glisse


On Fri, 31 Jul 2020, Richard Biener wrote:


This adds a ! marker to result expressions that should simplify
(and if not fail the simplification).  This can for example be
used like

(simplify
 (plus (vec_cond:s @0 @1 @2) @3)
 (vec_cond @0 (plus! @1 @3) (plus! @2 @3)))

to make the simplification only apply in case both plus operations
in the result end up simplified to a simple operand.


(replacing plus with bit_ior)
The generated code in gimple_simplify_BIT_IOR_EXPR may look like

  {
    tree _o1[2], _r1;
    _o1[0] = captures[2];
    _o1[1] = captures[4];
    gimple_match_op tem_op (res_op->cond.any_else (), BIT_IOR_EXPR,
                            TREE_TYPE (_o1[0]), _o1[0], _o1[1]);
    tem_op.resimplify (lseq, valueize);
    _r1 = maybe_push_res_to_seq (&tem_op, NULL);
    if (!_r1) return false;
    res_op->ops[1] = _r1;
  }

In particular, it contains this "return false" which directly exits the 
function, instead of just giving up on this particular transformation and 
trying the next one. I'll reorder my transformations to work around this, 
but it looks like a pre-existing limitation.
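
To make the limitation concrete, here is a minimal sketch in plain C. It is not what genmatch emits; pattern_a_matches, pattern_a_subsimplify, pattern_b and the two simplify_* functions are hypothetical stand-ins for two transformations that apply to the same operation.

```c
#include <stdbool.h>

/* Hypothetical pattern A: matches positive inputs, but its forced
   sub-simplification only succeeds for even ones.  */
static bool pattern_a_matches (int x) { return x > 0; }

static bool pattern_a_subsimplify (int x, int *out)
{
  if (x % 2 != 0)
    return false;
  *out = x / 2;
  return true;
}

/* Hypothetical pattern B: always succeeds.  */
static bool pattern_b (int x, int *out) { *out = -x; return true; }

/* Shape described above: a failed forced simplification leaves the
   whole function, so pattern B is never tried.  */
bool simplify_early_exit (int x, int *out)
{
  if (pattern_a_matches (x))
    {
      if (!pattern_a_subsimplify (x, out))
        return false;
      return true;
    }
  return pattern_b (x, out);
}

/* Alternative shape: give up on pattern A only and try the next one.  */
bool simplify_fallthrough (int x, int *out)
{
  if (pattern_a_matches (x) && pattern_a_subsimplify (x, out))
    return true;
  return pattern_b (x, out);
}
```

For an odd positive input the first function fails outright, while the second still applies pattern B. Reordering the transformations so that the one with the forced simplification comes last avoids the difference.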


--
Marc Glisse

RE: [PATCH 1/5][Arm] Modify default tuning of armv8.1-m.main to use Cortex-M55

2020-08-04 Thread Kyrylo Tkachov

Hi Omar,

Ok, thanks.
I've pushed this to master.
Kyrill

From: Omar Tahir 
Sent: 04 August 2020 17:10
To: Kyrylo Tkachov ; ni...@redhat.com; Ramana 
Radhakrishnan ; Richard Earnshaw 
; gcc-patches@gcc.gnu.org
Subject: [PATCH 1/5][Arm] Modify default tuning of armv8.1-m.main to use 
Cortex-M55

Previously, compiling with -march=armv8.1-m.main would tune for Cortex-M7.
However, the Cortex-M7 only supports up to Armv7e-M. The Cortex-M55 is the
earliest CPU that supports Armv8.1-M Mainline so is more appropriate. This
also has the effect of changing the branch cost function used, which will be
necessary to correctly prioritise conditional instructions over branches in
the rest of this patch series.

Regression tested on arm-none-eabi.

gcc/ChangeLog:

2020-07-30: Omar Tahir

* config/arm/arm-cpus.in (armv8.1-m.main): Tune for Cortex-M55.

diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 728be500b80..c98f8ede8fd 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -716,7 +716,7 @@ begin arch armv8-r
end arch armv8-r

begin arch armv8.1-m.main
- tune for cortex-m7
+ tune for cortex-m55
  tune flags CO_PROC
  base 8M_MAIN
  profile M

Re: [PATCH] target: delete unnecessary codes in aarch64.c

2020-08-04 Thread Richard Sandiford

Hu Jiangping  writes:
> Hi,
>
> This patch deletes two unnecessary lines of code in function
> aarch64_if_then_else_costs, which duplicate a declaration
> made where the function starts.

Thanks, pushed to trunk.

Richard

>
> Tested on aarch64. OK for trunk?
>
> Regards!
> Hujp
>
> ---
>  gcc/config/aarch64/aarch64.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 7c3ab3eeb1f..a70b2287b2c 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -11774,8 +11774,6 @@ aarch64_if_then_else_costs (rtx op0, rtx op1, rtx 
> op2, int *cost, bool speed)
> if (speed)
>   {
> machine_mode mode = GET_MODE (XEXP (op1, 0));
> -   const struct cpu_cost_table *extra_cost
> - = aarch64_tune_params.insn_extra_cost;
>  
> if (GET_MODE_CLASS (mode) == MODE_INT)
>   *cost += extra_cost->alu.arith;

c++: fix template parm count leak

2020-08-04 Thread Nathan Sidwell

I noticed that we could leak parser->num_template_parameter_lists with 
erroneous specializations.  We'd increment, notice a problem and then 
bail out.  This refactors cp_parser_explicit_specialization to avoid 
that code path.  A couple of tests get different diagnostics because of 
the fix.  pr39425 then goes to unbounded template instantiation and 
exceeds the implementation limit.
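
A minimal sketch of the leak and of the shape of the fix, in plain C rather than the parser's own code. The names struct parser, begin_spec, end_spec, parse_leaky and parse_balanced are hypothetical stand-ins; only the counter name is taken from the patch.

```c
#include <stdbool.h>

struct parser
{
  int num_template_parameter_lists;
};

/* Stand-ins for begin_specialization and end_specialization.  */
static bool begin_spec (bool ok) { return ok; }
static void end_spec (void) {}

/* Old shape: the early return skips the decrement, so the counter
   stays one too high after an erroneous specialization.  */
void parse_leaky (struct parser *p, bool ok)
{
  ++p->num_template_parameter_lists;
  if (!begin_spec (ok))
    {
      end_spec ();
      return;
    }
  /* ... parse the declaration ... */
  end_spec ();
  --p->num_template_parameter_lists;
}

/* New shape: the parsing is nested under the success test, and every
   path reaches the common tail that restores the counter.  */
void parse_balanced (struct parser *p, bool ok)
{
  ++p->num_template_parameter_lists;
  if (begin_spec (ok))
    {
      /* ... parse the declaration ... */
    }
  end_spec ();
  --p->num_template_parameter_lists;
}
```

With a failing begin_spec, parse_leaky leaves the counter at 1 and parse_balanced returns it to 0.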


gcc/cp/
* parser.c (cp_parser_explicit_specialization): Refactor
to avoid leak of num_template_parameter_lists value.
gcc/testsuite/
* g++.dg/template/pr39425.C: Adjust errors, (unbounded
template recursion).
* g++.old-deja/g++.pt/spec20.C: Remove fallout diagnostics.

pushed,

nathan
--
Nathan Sidwell
diff --git i/gcc/cp/parser.c w/gcc/cp/parser.c
index ab088874ba7..9946acdb42f 100644
--- i/gcc/cp/parser.c
+++ w/gcc/cp/parser.c
@@ -17640,7 +17640,6 @@ cp_parser_explicit_instantiation (cp_parser* parser)
 static void
 cp_parser_explicit_specialization (cp_parser* parser)
 {
-  bool need_lang_pop;
   cp_token *token = cp_lexer_peek_token (parser->lexer);
 
   /* Look for the `template' keyword.  */
@@ -17651,52 +17650,54 @@ cp_parser_explicit_specialization (cp_parser* parser)
   cp_parser_require (parser, CPP_GREATER, RT_GREATER);
   /* We have processed another parameter list.  */
   ++parser->num_template_parameter_lists;
+
   /* [temp]
 
  A template ... explicit specialization ... shall not have C
  linkage.  */
-  if (current_lang_name == lang_name_c)
+  bool need_lang_pop = current_lang_name == lang_name_c;
+  if (need_lang_pop)
 {
   error_at (token->location, "template specialization with C linkage");
   maybe_show_extern_c_location ();
+
   /* Give it C++ linkage to avoid confusing other parts of the
 	 front end.  */
   push_lang_context (lang_name_cplusplus);
   need_lang_pop = true;
 }
-  else
-need_lang_pop = false;
-  /* Let the front end know that we are beginning a specialization.  */
-  if (!begin_specialization ())
-{
-  end_specialization ();
-  return;
-}
 
-  /* If the next keyword is `template', we need to figure out whether
- or not we're looking a template-declaration.  */
-  if (cp_lexer_next_token_is_keyword (parser->lexer, RID_TEMPLATE))
+  /* Let the front end know that we are beginning a specialization.  */
+  if (begin_specialization ())
 {
-  if (cp_lexer_peek_nth_token (parser->lexer, 2)->type == CPP_LESS
-	  && cp_lexer_peek_nth_token (parser->lexer, 3)->type != CPP_GREATER)
-	cp_parser_template_declaration_after_export (parser,
-		 /*member_p=*/false);
+  /* If the next keyword is `template', we need to figure out
+	 whether or not we're looking a template-declaration.  */
+  if (cp_lexer_next_token_is_keyword (parser->lexer, RID_TEMPLATE))
+	{
+	  if (cp_lexer_peek_nth_token (parser->lexer, 2)->type == CPP_LESS
+	  && cp_lexer_peek_nth_token (parser->lexer, 3)->type != CPP_GREATER)
+	cp_parser_template_declaration_after_export (parser,
+			 /*member_p=*/false);
+	  else
+	cp_parser_explicit_specialization (parser);
+	}
   else
-	cp_parser_explicit_specialization (parser);
+	/* Parse the dependent declaration.  */
+	cp_parser_single_declaration (parser,
+  /*checks=*/NULL,
+  /*member_p=*/false,
+  /*explicit_specialization_p=*/true,
+  /*friend_p=*/NULL);
 }
-  else
-/* Parse the dependent declaration.  */
-cp_parser_single_declaration (parser,
-  /*checks=*/NULL,
-  /*member_p=*/false,
-  /*explicit_specialization_p=*/true,
-  /*friend_p=*/NULL);
+
   /* We're done with the specialization.  */
   end_specialization ();
+
   /* For the erroneous case of a template with C linkage, we pushed an
  implicit C++ linkage scope; exit that scope now.  */
   if (need_lang_pop)
 pop_lang_context ();
+
   /* We're done with this parameter list.  */
   --parser->num_template_parameter_lists;
 }
diff --git i/gcc/testsuite/g++.dg/template/pr39425.C w/gcc/testsuite/g++.dg/template/pr39425.C
index d55f547e253..cd304896a61 100644
--- i/gcc/testsuite/g++.dg/template/pr39425.C
+++ w/gcc/testsuite/g++.dg/template/pr39425.C
@@ -5,14 +5,16 @@ class a {
 
   template
 struct _rec {
-  static const char size = _rec< (s >> 1) >::size;
+static const char size = _rec< (s >> 1) >::size; // { dg-error "depth" }
 };
 
   template<>	// { dg-error "explicit" }
-  struct _rec <0> {
+  struct _rec <0> { // { dg-error "too few" }
 static const char size = 0;
   };
 
   static const unsigned int value = _rec < 1 >::size;
 
-} // { dg-error "after class definition" }
+};
+
+// { dg-prune-output "compilation terminated" }
diff --git i/gcc/testsuite/g++.old-deja/g++.pt/spec20.C w/gcc/testsuite/g++.old-deja/g++.pt/spec20.C
index 610e6c73371..51bc26906eb 100644
--- i/gcc/testsuite/g++.old-deja/g++.pt/spec20.C
+++ w/gcc/testsuite/g++.old-deja/g++.pt/spec20.C

Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions emitted at -O3

2020-08-04 Thread Richard Sandiford

xiezhiheng  writes:
>> > Sorry, I should have used it.  And I prepare a patch to use FLOAT_MODE_P
>> > macro and add a flag FLAG_SUPPRESS_FP_EXCEPTIONS to suppress
>> > FLAG_RAISE_FP_EXCEPTIONS for certain intrinsics in future.
>> 
>> The same thing is true for reading FPCR as well, so I think the flag
>> should suppress the FLOAT_MODE_P check, instead of fixing up the flags
>> afterwards.
>> 
>> I'm struggling to think of a good name though.  How about adding
>> FLAG_AUTO_FP and making the FLOAT_MODE_P check dependent on
>> FLAG_AUTO_FP
>> being set?
>> 
>> We could leave FLAG_AUTO_FP out of FLAG_ALL, since FLAG_ALL already
>> includes FLAG_FP.  Including it in FLAG_ALL wouldn't do any harm
>> though.
>
> I could not think of a better name either.  So I choose to use FLAG_AUTO_FP
> to control the check of FLOAT_MODE_P finally.
>
> Bootstrapped and tested on aarch64 Linux platform.

Thanks, pushed to master.

Richard

[PATCH 5/5][Arm] New pattern for CSEL, CSET and CSETM instructions

2020-08-04 Thread Omar Tahir

This patch adds a new pattern, *cmovsi_insn, for generating CSEL, CSET and
CSETM instructions. It also generates CSINV and CSINC instructions in specific
cases where one of the operands is constant.

To facilitate this, one new predicate and two new constraints are added, and
*compare_scc is restricted to only match if !TARGET_COND_ARITH to prevent
an unwanted split. Additionally, alternatives 8-10 are re-enabled in
*thumb2_movsicc_insn and splitting only occurs if !TARGET_COND_ARITH. This
forces the new pattern to be used when possible, but for more complex cases
that can't directly use CSEL it falls back to using IT blocks.
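
As a rough illustration of the source shapes this pattern is aimed at, consider the C functions below. They are a sketch, not the contents of the new tests; the function names are made up, and whether a given build actually emits CSEL, CSET or CSETM for them depends on the options used and on the whole series being applied.

```c
/* Candidate for CSEL: choose between two register values.  */
int
pick (int a, int b, int x, int y)
{
  return (a == b) ? x : y;
}

/* Candidate for CSET: materialise a comparison as 1 or 0.  */
int
is_equal (int a, int b)
{
  return (a == b) ? 1 : 0;
}

/* Candidate for CSETM: materialise a comparison as -1 (all ones) or 0.  */
int
equal_mask (int a, int b)
{
  return (a == b) ? -1 : 0;
}
```

Compiling such functions with -O2 -march=armv8.1-m.main, as the tests elsewhere in this series do, and inspecting the assembly is one way to see which alternative was chosen.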

Regression tested on arm-none-eabi. The entire patch set was regression
tested on arm-linux-gnueabi also.

That's all folks!

Thanks,
Omar


2020-07-30: Sudakshina Das 
Omar Tahir 

* config/arm/thumb2.md (*cmovsi_insn): New.
(*thumb2_movsicc_insn): Don't split if TARGET_COND_ARITH, re-enable
alternatives 8-10.
* config/arm/arm.md (*compare_scc): Don't match if TARGET_COND_ARITH.


gcc/testsuite/ChangeLog:

2020-07-30: Omar Tahir 

* gcc.target/arm/csel.c: New test.
* gcc.target/arm/cset.c: New test.
* gcc.target/arm/csetm.c: New test.
* gcc.target/arm/csinv-2.c: New test.
* gcc.target/arm/csinc-2.c: New test.


diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 950e46edfee..b8dd6af50a8 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -9760,7 +9760,7 @@
[(match_operand:SI 2 "s_register_operand" "r,r")
 (match_operand:SI 3 "arm_add_operand" "rI,L")]))
   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
+  "TARGET_32BIT && !TARGET_COND_ARITH"
   "#"
   "&& reload_completed"
   [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 2) (match_dup 3)))
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 048b25ef4a1..37d240d139b 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -37,7 +37,7 @@
;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
;; in Thumb-2 state: Ha, Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py, Pz, Rd, Rf, Rb, Ra,
;; Rg, Ri
-;; in all states: Pf, Pg
+;; in all states: Pf, Pg, UM, U1
 ;; The following memory constraints have been used:
;; in ARM/Thumb-2 state: Uh, Ut, Uv, Uy, Un, Um, Us, Up, Uf, Ux, Ul
@@ -485,6 +485,16 @@
Integer constant zero."
   (match_test "op == const0_rtx"))
+(define_constraint "UM"
+  "@internal
+   A constraint that matches the immediate constant -1."
+  (match_test "op == constm1_rtx"))
+
+(define_constraint "U1"
+  "@internal
+   A constraint that matches the immediate constant +1."
+  (match_test "op == const1_rtx"))
+
(define_memory_constraint "Ux"
  "@internal
   In ARM/Thumb-2 state a valid address and load into CORE regs or only to
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 2144520829c..5d75341c9ef 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -454,6 +454,13 @@
   && arm_general_register_operand (op, 
GET_MODE (op))")
(match_test "satisfies_constraint_Pg (op)")))
+(define_predicate "arm_reg_or_m1_or_1_or_zero"
+  (and (match_code "reg,subreg,const_int")
+   (ior (match_operand 0 "arm_general_register_operand")
+(match_test "op == constm1_rtx")
+(match_test "op == const1_rtx")
+(match_test "op == const0_rtx"))))
+
;; True for MULT, to identify which variant of shift_operator is in use.
(define_special_predicate "mult_operator"
   (match_code "mult"))
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index d12467d7644..bc6f2a52004 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -432,6 +432,30 @@
(set_attr "type" "multiple")]
)
+(define_insn "*cmovsi_insn"
+  [(set (match_operand:SI 0 "arm_general_register_operand" "=r,r,r,r,r,r,r,r,r")
+(if_then_else:SI
+ (match_operator 1 "arm_comparison_operator"
+  [(match_operand 2 "cc_register" "") (const_int 0)])
+ (match_operand:SI 3 "arm_reg_or_m1_or_1_or_zero" "r, r,UM, r,U1,U1, Z,UM, Z")
+ (match_operand:SI 4 "arm_reg_or_m1_or_1_or_zero" "r,UM, r,U1, r, Z,U1, Z,UM")))]
+  "TARGET_THUMB2 && TARGET_COND_ARITH
+   && (!((operands[3] == const1_rtx && operands[4] == constm1_rtx)
+   || (operands[3] == constm1_rtx && operands[4] == const1_rtx)))"
+  "@
+   csel\\t%0, %3, %4, %d1
+   csinv\\t%0, %3, zr, %d1
+   csinv\\t%0, %4, zr, %D1
+   csinc\\t%0, %3, zr, %d1
+   csinc\\t%0, %4, zr, %D1
+   cset\\t%0, %d1
+   cset\\t%0, %D1
+   csetm\\t%0, %d1
+   csetm\\t%0, %D1"
+  [(set_attr "type" "csel")
+   (set_attr "predicable" "no")]
+)
+
(define_insn_and_split "*thumb2_movsicc_insn"
   [(set (match_operand:SI 0 "s_register_operand" "=l,l,r,r,r,r,r,r,r,r,r,r")

[PATCH 4/5][Arm] New pattern for CSNEG instructions

2020-08-04 Thread Omar Tahir

This patch adds a new pattern, *thumb2_csneg, for generating CSNEG
instructions. It also restricts *if_neg_move and *thumb2_negscc to only match
if !TARGET_COND_ARITH which prevents undesirable matches during ifcvt.

Regression tested on arm-none-eabi.


2020-07-30: Sudakshina Das 
Omar Tahir 

* config/arm/thumb2.md (*thumb2_csneg): New.
(*thumb2_negscc): Don't match if TARGET_COND_ARITH.
* config/arm/arm.md (*if_neg_move): Don't match if TARGET_COND_ARITH.

gcc/testsuite/ChangeLog:

2020-07-30: Sudakshina Das 
Omar Tahir 

* gcc.target/arm/csneg.c: New test.


diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index a6a31f8f4ef..950e46edfee 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -11211,7 +11211,7 @@
 [(match_operand 3 "cc_register" "") (const_int 0)])
(neg:SI (match_operand:SI 2 "s_register_operand" "l,r"))
(match_operand:SI 1 "s_register_operand" "0,0")))]
-  "TARGET_32BIT"
+  "TARGET_32BIT && !TARGET_COND_ARITH"
   "#"
   "&& reload_completed"
   [(cond_exec (match_op_dup 4 [(match_dup 3) (const_int 0)])
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 79cf684e5cb..d12467d7644 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -880,7 +880,7 @@
[(match_operand:SI 1 "s_register_operand" "r")
 (match_operand:SI 2 "arm_rhs_operand" "rI")])))
(clobber (reg:CC CC_REGNUM))]
-  "TARGET_THUMB2"
+  "TARGET_THUMB2 && !TARGET_COND_ARITH"
   "#"
   "&& reload_completed"
   [(const_int 0)]
@@ -970,6 +970,20 @@
(set_attr "predicable" "no")]
)
+(define_insn "*thumb2_csneg"
+  [(set (match_operand:SI 0 "arm_general_register_operand" "=r, r")
+  (if_then_else:SI
+(match_operand 1 "arm_comparison_operation" "")
+(neg:SI (match_operand:SI 2 "arm_general_register_operand" "r, r"))
+(match_operand:SI 3 "reg_or_zero_operand" "r, Z")))]
+  "TARGET_COND_ARITH"
+  "@
+   csneg\\t%0, %3, %2, %D1
+   csneg\\t%0, zr, %2, %D1"
+  [(set_attr "type" "csel")
+   (set_attr "predicable" "no")]
+)
+
(define_insn "*thumb2_movcond"
   [(set (match_operand:SI 0 "s_register_operand" "=Ts,Ts,Ts")
   (if_then_else:SI
diff --git a/gcc/testsuite/gcc.target/arm/csneg.c b/gcc/testsuite/gcc.target/arm/csneg.c
new file mode 100644
index 000..e48606265af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/csneg.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arch_v8_1m_main_ok } */
+/* { dg-options "-O2 -march=armv8.1-m.main" } */
+
+int
+test_csneg32_condasn1(int w0, int w1, int w2, int w3)
+{
+  int w4;
+
+  /* { dg-final { scan-assembler "csneg\tr\[0-9\]*.*ne" } } */
+  w4 = (w0 == w1) ? -w2 : w3;
+  return w4;
+}
+
+int
+test_csneg32_condasn2(int w0, int w1, int w2, int w3)
+{
+  int w4;
+
+  /* { dg-final { scan-assembler "csneg\tr\[0-9\]*.*eq" } } */
+  w4 = (w0 == w1) ? w3 : -w2;
+  return w4;
+}
+
+unsigned long long
+test_csneg_uxtw (unsigned int a, unsigned int b, unsigned int c)
+{
+  unsigned int val;
+
+  /* { dg-final { scan-assembler "csneg\tr\[0-9\]*.*ne" } } */
+  val = a ? b : -c;
+  return val;
+}



Re: [Patch] Fortran/OpenMP: Fix detecting not perfectly nested loops

2020-08-04 Thread Jakub Jelinek via Gcc-patches

On Tue, Aug 04, 2020 at 05:54:18PM +0200, Tobias Burnus wrote:
> I am not sure whether the following code is supposed
> to work but the "x = 5" is never converted into
> tree-code in gfc_trans_omp_do – hence, it makes sense
> to error out. (I have the feeling that this needs to
> be revisited for OpenMP 5.x.)

Yes 5.0 allows this, but we don't handle that yet.
And I'm afraid I have quite a few questions that need to be discussed :(

> (The equivalent C/C++ code is rejected, see PR.)
> 
>!$omp parallel do collapse(3)
>do i = 1, 8
>   do j = 1, 8
> do k = 1, 8
> end do
> x = 5  ! <<<
>   end do
>end do
> 
> OK?

Ok, thanks.

Jakub

[PATCH 3/5][Arm] New pattern for CSINC instructions

2020-08-04 Thread Omar Tahir

This patch adds a new pattern, *thumb2_csinc, for generating CSINC
instructions. It also modifies an existing pattern, *thumb2_cond_arith, to
output CINC when the operation is an addition and TARGET_COND_ARITH is true.

Regression tested on arm-none-eabi.


2020-07-30: Sudakshina Das 
Omar Tahir 

* config/arm/thumb2.md (*thumb2_csinc): New.
(*thumb2_cond_arith): Generate CINC where possible.

gcc/testsuite/ChangeLog:

2020-07-30: Sudakshina Das 
Omar Tahir 

* gcc.target/arm/csinc-1.c: New test.


diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 0b00aef7ef7..79cf684e5cb 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -743,6 +743,9 @@
 if (GET_CODE (operands[4]) == LT && operands[3] == const0_rtx)
   return \"%i5\\t%0, %1, %2, lsr #31\";
+if (GET_CODE (operands[5]) == PLUS && TARGET_COND_ARITH)
+  return \"cinc\\t%0, %1, %d4\";
+
 output_asm_insn (\"cmp\\t%2, %3\", operands);
 if (GET_CODE (operands[5]) == AND)
   {
@@ -952,6 +955,21 @@
(set_attr "predicable" "no")]
)
+(define_insn "*thumb2_csinc"
+  [(set (match_operand:SI 0 "arm_general_register_operand" "=r, r")
+  (if_then_else:SI
+(match_operand 1 "arm_comparison_operation" "")
+(plus:SI (match_operand:SI 2 "arm_general_register_operand" "r, r")
+ (const_int 1))
+(match_operand:SI 3 "reg_or_zero_operand" "r, Z")))]
+  "TARGET_COND_ARITH"
+  "@
+   csinc\\t%0, %3, %2, %D1
+   csinc\\t%0, zr, %2, %D1"
+  [(set_attr "type" "csel")
+   (set_attr "predicable" "no")]
+)
+
(define_insn "*thumb2_movcond"
   [(set (match_operand:SI 0 "s_register_operand" "=Ts,Ts,Ts")
   (if_then_else:SI
diff --git a/gcc/testsuite/gcc.target/arm/csinc-1.c b/gcc/testsuite/gcc.target/arm/csinc-1.c
new file mode 100644
index 000..b9928493862
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/csinc-1.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arch_v8_1m_main_ok } */
+/* { dg-options "-O2 -march=armv8.1-m.main" } */
+
+int
+test_csinc32_condasn1(int w0, int w1, int w2, int w3)
+{
+  int w4;
+
+  /* { dg-final { scan-assembler "csinc\tr\[0-9\]*.*ne" } } */
+  w4 = (w0 == w1) ? (w2 + 1) : w3;
+  return w4;
+}
+
+int
+test_csinc32_condasn2(int w0, int w1, int w2, int w3)
+{
+  int w4;
+
+  /* { dg-final { scan-assembler "csinc\tr\[0-9\]*.*eq" } } */
+  w4 = (w0 == w1) ? w3 : (w2 + 1);
+  return w4;
+}



[PATCH 2/5][Arm] New pattern for CSINV instructions

2020-08-04 Thread Omar Tahir

This patch adds a new pattern, *thumb2_csinv, for generating CSINV instructions.

This pattern relies on a few general changes that will be used throughout
the following patches:
- A new macro, TARGET_COND_ARITH, which is only true on 8.1-M Mainline
  and represents the existence of these conditional instructions.
- A change to the cond exec hook, arm_have_conditional_execution, which
  now returns false if TARGET_COND_ARITH before reload. This allows for
  some ifcvt transformations when they would usually be disabled. I've
  written a rather verbose comment (with the risk of over-explaining)
  as it's a bit of a confusing change.
- One new predicate and one new constraint.
- *thumb2_movcond has been restricted to only match if !TARGET_COND_ARITH,
  otherwise it triggers undesirable combines.
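
The hook change in the second item can be modelled as a small truth function. This is a sketch only: the three boolean parameters are assumptions standing in for TARGET_THUMB1, TARGET_COND_ARITH and reload_completed, which in the compiler are global state rather than arguments.

```c
#include <stdbool.h>

/* Model of the new arm_have_conditional_execution logic.  */
bool
have_conditional_execution (bool thumb1, bool cond_arith,
                            bool reload_completed)
{
  /* Only Thumb-1 cannot support conditional execution.  */
  bool has_cond_exec = !thumb1;

  /* With conditional arithmetic available, report "no conditional
     execution" until reload has completed.  */
  bool enable_ifcvt_trans = cond_arith && !reload_completed;

  return has_cond_exec && !enable_ifcvt_trans;
}
```

The only input for which the answer differs from the old !TARGET_THUMB1 is a non-Thumb-1 target with conditional arithmetic before reload.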

2020-07-30: Sudakshina Das 
Omar Tahir 

* config/arm/arm.h (TARGET_COND_ARITH): New macro.
* config/arm/arm.c (arm_have_conditional_execution): Return false if
TARGET_COND_ARITH before reload.
* config/arm/constraints.md (Z): New constant zero.
* config/arm/predicates.md (arm_comparison_operation): Returns true if
comparing CC_REGNUM with constant zero.
* config/arm/thumb2.md (*thumb2_csinv): New.
(*thumb2_movcond): Don't match if TARGET_COND_ARITH.

Regression tested on arm-none-eabi.


gcc/testsuite/ChangeLog:

2020-07-30: Sudakshina Das 
Omar Tahir 

* gcc.target/arm/csinv-1.c: New test.


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index dac9a6fb5c4..3a9684cdcd8 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -29833,12 +29833,20 @@ arm_frame_pointer_required (void)
   return false;
}
-/* Only thumb1 can't support conditional execution, so return true if
-   the target is not thumb1.  */
static bool
arm_have_conditional_execution (void)
{
-  return !TARGET_THUMB1;
+  bool has_cond_exec, enable_ifcvt_trans;
+
+  /* Only THUMB1 cannot support conditional execution. */
+  has_cond_exec = !TARGET_THUMB1;
+
+  /* When TARGET_COND_ARITH is defined we'd like to turn on some ifcvt
+ transformations before reload. */
+  enable_ifcvt_trans = TARGET_COND_ARITH && !reload_completed;
+
+  /* The ifcvt transformations are only turned on if we return false. */
+  return has_cond_exec && !enable_ifcvt_trans;
}
 /* The AAPCS sets the maximum alignment of a vector to 64 bits.  */
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 30e1d6dc994..d67c91796e4 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -177,6 +177,10 @@ emission of floating point pcs attributes.  */
 #define TARGET_CRC32			(arm_arch_crc)
+/* Thumb-2 but also has some conditional arithmetic instructions like csinc,
+   csinv, etc. */
+#define TARGET_COND_ARITH		(arm_arch8_1m_main)
+
/* The following two macros concern the ability to execute coprocessor
instructions for VFPv3 or NEON.  TARGET_VFP3/TARGET_VFPD32 are currently
only ever tested when we know we are generating for VFP hardware; we need
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 011badc9957..048b25ef4a1 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -28,6 +28,7 @@
;; The following normal constraints have been used:
;; in ARM/Thumb-2 state: G, I, j, J, K, L, M
;; in Thumb-1 state: I, J, K, L, M, N, O
+;; in all states: Z
;; 'H' was previously used for FPA.
 ;; The following multi-letter normal constraints have been used:
@@ -479,6 +480,11 @@
  (and (match_code "mem")
   (match_test "TARGET_32BIT && neon_vector_mem_operand (op, 1, true)")))
+(define_constraint "Z"
+  "@internal
+   Integer constant zero."
+  (match_test "op == const0_rtx"))
+
(define_memory_constraint "Ux"
  "@internal
   In ARM/Thumb-2 state a valid address and load into CORE regs or only to
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 981eec520ba..2144520829c 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -485,6 +485,18 @@
   (and (match_operand 0 "expandable_comparison_operator")
(match_test "maybe_get_arm_condition_code (op) != ARM_NV")))
+(define_special_predicate "arm_comparison_operation"
+  (match_code "eq,ne,le,lt,ge,gt,geu,gtu,leu,ltu,unordered,
+ ordered,unlt,unle,unge,ungt")
+{
+  if (XEXP (op, 1) != const0_rtx)
+return false;
+  rtx op0 = XEXP (op, 0);
+  if (!REG_P (op0) || REGNO (op0) != CC_REGNUM)
+return false;
+  return maybe_get_arm_condition_code (op) != ARM_NV;
+})
+
(define_special_predicate "lt_ge_comparison_operator"
   (match_code "lt,ge"))
diff --git a/gcc/config/arm/thumb2.md

[PATCH 1/5][Arm] Modify default tuning of armv8.1-m.main to use Cortex-M55

2020-08-04 Thread Omar Tahir

Previously, compiling with -march=armv8.1-m.main would tune for Cortex-M7.
However, the Cortex-M7 only supports up to Armv7e-M. The Cortex-M55 is the
earliest CPU that supports Armv8.1-M Mainline so is more appropriate. This
also has the effect of changing the branch cost function used, which will be
necessary to correctly prioritise conditional instructions over branches in
the rest of this patch series.

Regression tested on arm-none-eabi.


gcc/ChangeLog:

2020-07-30: Omar Tahir 

* config/arm/arm-cpus.in (armv8.1-m.main): Tune for Cortex-M55.


diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 728be500b80..c98f8ede8fd 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -716,7 +716,7 @@ begin arch armv8-r
end arch armv8-r
 begin arch armv8.1-m.main
- tune for cortex-m7
+ tune for cortex-m55
  tune flags CO_PROC
  base 8M_MAIN
  profile M



[PATCH 0/5][Arm] Add support for conditional instructions (CSEL, CSINC etc.) for Armv8.1-M Mainline

2020-08-04 Thread Omar Tahir

Hi all,

This patch series provides support for the following instructions that were
added in Armv8.1-M Mainline [1]:
- CSEL
- CSET
- CSETM
- CSNEG
- CSINV
- CSINC
- CINC

The patch series is organised as follows:
1) Modify default tuning for -march=armv8.1-m.main.
2) New macro, predicate and constraint. New pattern *thumb2_csinv that
   generates CSINV.
3) New pattern *thumb2_csinc that generates CSINC.
4) New pattern *thumb2_csneg that generates CSNEG.
5) New predicate, new constraints. New pattern *cmovsi_insn that generates
   CSEL, CSET, CSETM, CSINC and CSINV in specific cases.

CINV and CNEG aren't used as they are aliases for CSINV and CSNEG
respectively. There is one place CINC is used, as an optimisation in an
existing pattern.

Some existing patterns are modified to force the new patterns to be used
when appropriate and to prevent undesirable "optimisations". For example,
often `if_then_else` insns are split into `cond_exec` insns (see *compare_scc
in arm.md). This makes it harder to generate instructions like CSEL, so this
behaviour is disabled when targeting Armv8.1-M Mainline. The combine and ifcvt
passes also cause problems, for example *thumb2_movcond which can cause
unwanted combines. In some cases the define_insn is disabled, in others only
splitting is disabled.

Along with matching the obvious cases, some edge cases are taken advantage of
to slightly optimise code generation. For example, R1 = CC ? 1 : R0 can take
advantage of the zero register to generate CSINC R1, ZR, R0.
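
For readers less familiar with these instructions, the following C model sketches what each one computes. The function names are made up for illustration, and the operand order follows the usual architectural form "Rd, Rn, Rm, cond"; the assembly templates in the patches remain authoritative for what GCC prints.

```c
#include <stdbool.h>
#include <stdint.h>

/* CSEL Rd, Rn, Rm, cond:  Rd = cond ? Rn : Rm.  */
uint32_t csel (bool cond, uint32_t rn, uint32_t rm)
{
  return cond ? rn : rm;
}

/* CSINC Rd, Rn, Rm, cond:  Rd = cond ? Rn : Rm + 1.  */
uint32_t csinc (bool cond, uint32_t rn, uint32_t rm)
{
  return cond ? rn : rm + 1u;
}

/* CSINV Rd, Rn, Rm, cond:  Rd = cond ? Rn : ~Rm.  */
uint32_t csinv (bool cond, uint32_t rn, uint32_t rm)
{
  return cond ? rn : ~rm;
}

/* CSNEG Rd, Rn, Rm, cond:  Rd = cond ? Rn : -Rm.  */
uint32_t csneg (bool cond, uint32_t rn, uint32_t rm)
{
  return cond ? rn : 0u - rm;
}

/* The edge case from the text, R1 = CC ? 1 : R0, expressed as a CSINC
   with the zero register as the incremented operand and the condition
   inverted.  */
uint32_t one_or_r0 (bool cc, uint32_t r0)
{
  return csinc (!cc, r0, 0u);
}
```

In this model CSET and CSETM are the special cases where both source operands are the zero register: csinc and csinv of (0, 0) under the inverted condition yield 1 or all ones when the original condition holds, and 0 otherwise.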

There are a few cases where CSEL etc. could be used, but it's more cumbersome
to do so, therefore the default IT block implementation is kept (see
*thumb2_movsicc_insn alts 8-10). In general however, code generated on
Armv8.1-M Mainline will see a large decrease in the number of IT blocks.

Entire patch series together regression tested on arm-none-eabi and
arm-none-linux-gnueabi with no regressions, with a minor performance
improvement (-0.1% cycle count) on a proprietary benchmark.

Thanks,
Omar

[1] https://static.docs.arm.com/ddi0553/bf/DDI0553B_f_armv8m_arm.pdf




[Patch] Fortran/OpenMP: Fix detecting not perfectly nested loops

2020-08-04 Thread Tobias Burnus


I am not sure whether the following code is supposed
to work but the "x = 5" is never converted into
tree-code in gfc_trans_omp_do – hence, it makes sense
to error out. (I have the feeling that this needs to
be revisited for OpenMP 5.x.)

(The equivalent C/C++ code is rejected, see PR.)

   !$omp parallel do collapse(3)
   do i = 1, 8
  do j = 1, 8
do k = 1, 8
end do
x = 5  ! <<<
  end do
   end do
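
The rule being enforced can be sketched with a toy statement tree in C. This is a simplified model and not the logic of resolve_omp_do itself; struct stmt, enum kind and perfectly_nested are hypothetical, and the model assumes the outermost loop is the only statement of the construct.

```c
#include <stdbool.h>
#include <stddef.h>

enum kind { STMT_LOOP, STMT_NOP, STMT_ASSIGN };

struct stmt
{
  enum kind kind;
  struct stmt *body;   /* First statement of a loop body, else NULL.  */
  struct stmt *next;   /* Next statement in the same block.  */
};

/* Return true if OUTER heads COLLAPSE perfectly nested loops: each inner
   loop is the first statement of its parent's body, and every collapsed
   inner loop, including the innermost one, is followed only by no-ops.  */
bool
perfectly_nested (const struct stmt *outer, int collapse)
{
  const struct stmt *loop = outer;

  for (int i = 1; i <= collapse; i++)
    {
      if (loop == NULL || loop->kind != STMT_LOOP)
        return false;   /* Not enough loops.  */

      /* Inside the nest, nothing but no-ops may follow the loop.  */
      if (i > 1)
        for (const struct stmt *s = loop->next; s != NULL; s = s->next)
          if (s->kind != STMT_NOP)
            return false;

      loop = loop->body;
    }
  return true;
}
```

The example above corresponds to an assignment hanging off the innermost collapsed loop, which this model rejects for collapse(3) and accepts for collapse(2).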

OK?

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
Fortran/OpenMP: Fix detecting not perfectly nested loops

gcc/fortran/ChangeLog:

	* openmp.c (resolve_omp_do): Detect not perfectly
	nested loop with innermost collapse.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/collapse1.f90: Add dg-error.
	* gfortran.dg/gomp/collapse2.f90: New test.

 gcc/fortran/openmp.c |  4 +---
 gcc/testsuite/gfortran.dg/gomp/collapse1.f90 |  2 +-
 gcc/testsuite/gfortran.dg/gomp/collapse2.f90 | 32 
 3 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index ec116206a5c..f402febc211 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -6201,18 +6201,16 @@ resolve_omp_do (gfc_code *code)
 		}
 	  do_code2 = do_code2->block->next;
 	}
 	}
-  if (i == collapse)
-	break;
   for (c = do_code->next; c; c = c->next)
 	if (c->op != EXEC_NOP && c->op != EXEC_CONTINUE)
 	  {
 	gfc_error ("collapsed %s loops not perfectly nested at %L",
 		   name, &c->loc);
 	break;
 	  }
-  if (c)
+  if (i == collapse || c)
 	break;
   do_code = do_code->block;
   if (do_code->op != EXEC_DO && do_code->op != EXEC_DO_WHILE)
 	{
diff --git a/gcc/testsuite/gfortran.dg/gomp/collapse1.f90 b/gcc/testsuite/gfortran.dg/gomp/collapse1.f90
index f16a780ad99..1a06eaba823 100644
--- a/gcc/testsuite/gfortran.dg/gomp/collapse1.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/collapse1.f90
@@ -30,9 +30,9 @@ subroutine collapse1
   !$omp parallel do collapse(2)
 do i = 1, 3
   do j = 4, 6
   end do
-  k = 4
+  k = 4  ! { dg-error "loops not perfectly nested" }
 end do
   !$omp parallel do collapse(2)
 do i = 1, 3
   do			! { dg-error "cannot be a DO WHILE or DO without loop control" }
diff --git a/gcc/testsuite/gfortran.dg/gomp/collapse2.f90 b/gcc/testsuite/gfortran.dg/gomp/collapse2.f90
new file mode 100644
index 000..1ab934e3d0d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/collapse2.f90
@@ -0,0 +1,32 @@
+program p
+   integer :: i, j, k
+   real :: x
+   !$omp parallel do collapse(3)
+   do i = 1, 8
+  do j = 1, 8
+do k = 1, 8
+end do
+x = 5  ! { dg-error "loops not perfectly nested" }
+  end do
+   end do
+   !$omp parallel do ordered(3)
+   do i = 1, 8
+  do j = 1, 8
+do k = 1, 8
+end do
+  end do
+  x = 5  ! { dg-error "loops not perfectly nested" }
+   end do
+   !$omp parallel do collapse(2)  ! { dg-error "not enough DO loops for collapsed" }
+   do i = 1, 8
+  x = 5
+  do j = 1, 8
+  end do
+   end do
+   !$omp parallel do ordered(2)  ! { dg-error "not enough DO loops for collapsed" }
+   do i = 1, 8
+  x = 5
+  do j = 1, 8
+  end do
+   end do
+end

Re: [PATCH] Enable GCC support for AMX

2020-08-04 Thread Hongyu Wang via Gcc-patches

Kirill Yukhin wrote on Tue, Aug 4, 2020 at 10:47 PM:
>
> Hello,
>
> On 06 Jul 09:58, Hongyu Wang via Gcc-patches wrote:
> > Hi:
> >
> > This patch is about to support Intel Advanced Matrix Extensions (AMX)
> > which will be enabled in GLC.
> >
> > AMX is a new 64-bit programming paradigm consisting of two
> > components: a set of 2-dimensional registers (tiles) representing
> > sub-arrays from a larger 2-dimensional memory image,
> > and an accelerator able to operate on tiles
> >
> > Supported instructions are
> >
> > AMX-TILE:ldtilecfg/sttilecfg/tileloadd/tileloaddt1/tilezero/tilerelease
> > AMX-INT8:tdpbssd/tdpbsud/tdpbusd/tdpbuud
> > AMX-BF16:tdpbf16ps
> >
> > The intrinsics adopts constant tile register number as its input parameters.
>
> I didn't go into the patch deeply, but why did you use inline asm for intrinsics
> definition? Are you going to introduce register classes for those new tmm
> registers and new instruction definitions for new insns in machine description?

In this version of the patch, we simply align our implementation with what
has been submitted to the LLVM community. Since AMX allows the register size
to vary with the runtime configuration, the implementation of register
allocation is still under discussion. We will introduce a new register class
and new insns in a future patch.

>
> --
> K

Re: Simplify X * C1 == C2 with undefined overflow

2020-08-04 Thread Marc Glisse


On Mon, 3 Aug 2020, Richard Biener wrote:


On Sat, Aug 1, 2020 at 9:29 AM Marc Glisse  wrote:


Hello,

this transformation is quite straightforward: without overflow, 3*X==15 is
the same as X==5, and 3*X==5 cannot happen. Adding a single_use restriction
for the first case didn't seem necessary, although of course it can
slightly increase register pressure in some cases.
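
To illustrate the rule described above, here is a small Python sketch. It is
not part of the patch and does not use match.pd; it models the operands as
mathematical integers (so "no overflow" holds trivially), and the function
name and return convention are made up for the example. It also assumes C1 is
nonzero, since X * 0 is a separate case.

```python
def fold_mult_eq(c1, c2, is_eq=True):
    """Fold `X * c1 == c2` (or `!=` when is_eq is False), assuming no overflow.

    Returns ("compare", c) when the test reduces to `X == c` / `X != c`,
    or ("constant", b) when the outcome is known without looking at X.
    """
    if c1 == 0:
        # Hypothetical restriction of this sketch: X * 0 is handled elsewhere.
        raise ValueError("c1 must be nonzero")
    quotient, remainder = divmod(c2, c1)
    if remainder == 0:
        # 3*X == 15 becomes X == 5.
        return ("compare", quotient)
    # 3*X == 5 can never hold, so `==` is false and `!=` is true.
    return ("constant", not is_eq)
```

For instance, fold_mult_eq(3, 15) reduces to a comparison against 5, while
fold_mult_eq(3, 5) yields a constant false result for `==` and a constant true
result for `!=`.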

Bootstrap+regtest on x86_64-pc-linux-gnu.


OK with using constant_boolean_node (cmp == NE_EXPR, type).

ISTR we had the x * 0 == CST simplification somewhere
but maybe it was x * y == 0 ...  ah, yes:

/* Transform comparisons of the form X * C1 CMP 0 to X CMP 0 in the
  signed arithmetic case.  That form is created by the compiler
  often enough for folding it to be of value.  One example is in
  computing loop trip counts after Operator Strength Reduction.  */
(for cmp (simple_comparison)
scmp (swapped_simple_comparison)
(simplify
 (cmp (mult@3 @0 INTEGER_CST@1) integer_zerop@2)

As it is placed after your pattern it will never be matched, I think
(but we don't warn because of INTEGER_CST vs. integer_zerop).

But I think your pattern subsumes it apart from the X * 0 == 0
compare - oh, and the other pattern also handles relational compares
(those will still trigger).

Maybe place the patterns next to each other?  Also see whether
moving yours after the above causes the testcases to stop being handled
because it's no longer matched - if not this might be the better
order.


I moved it after, it still works, so I pushed the patch. Note that the 
other transformation has a single_use restriction, while this one doesn't, 
that's not very consistent, but also hopefully not so important...


--
Marc Glisse

[committed] d: Fix struct literals that have non-deterministic hash values (PR96153)

2020-08-04 Thread Iain Buclaw via Gcc-patches

Hi,

This patch generates a temporary for struct and array literals and
pre-fills it with zeroes before assigning, so that alignment holes don't
cause objects to produce a non-deterministic hash value.  A new field has
been added to the expression visitor to track whether the result is being
generated for another literal, so that memset() is only called once on the
top-level literal expression, and not for nested structs or arrays.
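
As a rough illustration of why the pre-fill matters, the Python sketch below
simulates an object with an alignment hole. Everything in it is an assumption
made for the example: the 8-byte layout (1-byte tag, 3 padding bytes, 4-byte
int), the helper name, and the idea that the hash is computed over the raw
bytes of the object. It does not reflect the actual D front-end code.

```python
import hashlib
import struct

SIZE = 8  # hypothetical layout: 1-byte tag, 3 padding bytes, 4-byte int


def construct(storage, tag, value, prefill):
    """Write both fields into storage and return a hash of the raw bytes.

    storage is a bytearray of SIZE bytes that may contain stale data.
    """
    if prefill:
        # Plays the role of memset (&obj, 0, sizeof (obj)) before assignment.
        storage[:] = bytes(SIZE)
    storage[0] = tag                           # field at offset 0
    struct.pack_into("<i", storage, 4, value)  # field at offset 4
    # Bytes 1..3 are the alignment hole; nothing above writes to them.
    return hashlib.sha256(bytes(storage)).hexdigest()
```

Two buffers that start with different stale contents hash differently when
prefill is False, even though both fields are equal, and hash identically when
prefill is True.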

Bootstrapped and regression tested on x86_64-linux-gnu with multilib
configurations -m32 and -mx32.  Committed to mainline.

Regards
Iain

---
gcc/d/ChangeLog:

PR d/96153
* d-tree.h (build_expr): Add literalp argument.
* expr.cc (ExprVisitor): Add literalp_ field.
(ExprVisitor::ExprVisitor): Initialize literalp_.
(ExprVisitor::visit (AssignExp *)): Call memset() on blits where RHS
is a struct literal.  Elide assignment if initializer is all zeroes.
(ExprVisitor::visit (CastExp *)): Forward literalp_ to generation of
subexpression.
(ExprVisitor::visit (AddrExp *)): Likewise.
(ExprVisitor::visit (ArrayLiteralExp *)): Use memset() to pre-fill
object with zeroes.  Set literalp in subexpressions.
(ExprVisitor::visit (StructLiteralExp *)): Likewise.
(ExprVisitor::visit (TupleExp *)): Set literalp in subexpressions.
(ExprVisitor::visit (VectorExp *)): Likewise.
(ExprVisitor::visit (VectorArrayExp *)): Likewise.
(build_expr): Forward literal_p to ExprVisitor.

gcc/testsuite/ChangeLog:

PR d/96153
* gdc.dg/pr96153.d: New test.
---
 gcc/d/d-tree.h |   2 +-
 gcc/d/expr.cc  | 104 ++---
 gcc/testsuite/gdc.dg/pr96153.d |  31 ++
 3 files changed, 101 insertions(+), 36 deletions(-)
 create mode 100644 gcc/testsuite/gdc.dg/pr96153.d

diff --git a/gcc/d/d-tree.h b/gcc/d/d-tree.h
index 2be80dd1867..072de7e6543 100644
--- a/gcc/d/d-tree.h
+++ b/gcc/d/d-tree.h
@@ -633,7 +633,7 @@ extern void d_comdat_linkage (tree);
 extern void d_linkonce_linkage (tree);
 
 /* In expr.cc.  */
-extern tree build_expr (Expression *, bool = false);
+extern tree build_expr (Expression *, bool = false, bool = false);
 extern tree build_expr_dtor (Expression *);
 extern tree build_return_dtor (Expression *, Type *, TypeFunction *);
 
diff --git a/gcc/d/expr.cc b/gcc/d/expr.cc
index ac3d4aaa171..85407ac7eb0 100644
--- a/gcc/d/expr.cc
+++ b/gcc/d/expr.cc
@@ -223,12 +223,14 @@ class ExprVisitor : public Visitor
 
   tree result_;
   bool constp_;
+  bool literalp_;
 
 public:
-  ExprVisitor (bool constp)
+  ExprVisitor (bool constp, bool literalp)
   {
 this->result_ = NULL_TREE;
 this->constp_ = constp;
+this->literalp_ = literalp;
   }
 
   tree result (void)
@@ -1072,7 +1074,7 @@ public:
 if (tb1->ty == Tstruct)
   {
tree t1 = build_expr (e->e1);
-   tree t2 = convert_for_assignment (build_expr (e->e2),
+   tree t2 = convert_for_assignment (build_expr (e->e2, false, true),
  e->e2->type, e->e1->type);
StructDeclaration *sd = tb1->isTypeStruct ()->sym;
 
@@ -1101,11 +1103,22 @@ public:
tree init = NULL_TREE;
 
/* Fill any alignment holes in the struct using memset.  */
-   if (e->op == TOKconstruct && !identity_compare_p (sd))
- init = build_memset_call (t1);
+   if ((e->op == TOKconstruct
+|| (e->e2->op == TOKstructliteral && e->op == TOKblit))
+   && (sd->isUnionDeclaration () || !identity_compare_p (sd)))
+ {
+   t1 = stabilize_reference (t1);
+   init = build_memset_call (t1);
+ }
 
-   tree result = build_assign (modifycode, t1, t2);
-   this->result_ = compound_expr (init, result);
+   /* Elide generating assignment if init is all zeroes.  */
+   if (init != NULL_TREE && initializer_zerop (t2))
+ this->result_ = compound_expr (init, t1);
+   else
+ {
+   tree result = build_assign (modifycode, t1, t2);
+   this->result_ = compound_expr (init, result);
+ }
  }
 
return;
@@ -1135,6 +1148,7 @@ public:
   to call postblits, this assignment should call dtors on old
   assigned elements.  */
if ((!postblit && !destructor)
+   || (e->op == TOKconstruct && e->e2->op == TOKarrayliteral)
|| (e->op == TOKconstruct && !lvalue && postblit)
|| (e->op == TOKblit || e->e1->type->size () == 0))
  {
@@ -1452,7 +1466,7 @@ public:
   {
 Type *ebtype = e->e1->type->toBasetype ();
 Type *tbtype = e->to->toBasetype ();
-tree result = build_expr (e->e1, this->constp_);
+tree result = build_expr (e->e1, this->constp_, this->literalp_);
 
 /* Just evaluate e1 if it has any side effects.  */
 if

Re: [PATCH] doc: Add @cindex to symver attribute

2020-08-04 Thread Sandra Loosemore


On 8/4/20 2:46 AM, Jakub Jelinek wrote:

Hi!

When looking at the symver attr documentation in html, I found there is no
name to refer to for it.
The following patch fixes that, bootstrapped on x86_64-linux, ok for trunk
and 10.3?

2020-08-04  Jakub Jelinek  

* doc/extend.texi (symver): Add @cindex for symver function attribute.


Thanks!  This looks fine.

-Sandra

[committed] amdgcn: TImode shifts

2020-08-04 Thread Andrew Stubbs


This patch implements scalar TImode shifts using hardware DImode shifts.

The middle-end cannot synthesize these because BITS_PER_WORD is 32 on
this architecture, meaning it would try to use SImode shifts, and it only
knows how to synthesize double-word shifts, whereas TImode is quad-word here.
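
The general idea of building a 128-bit shift out of 64-bit pieces can be
sketched in Python as follows. This is only a model using plain integers, not
the RTL the patch emits; the function names are invented for the example, and
only left and logical right shifts are shown. The three cases (zero, below 64,
64 and above) correspond to the cases the expander distinguishes.

```python
MASK64 = (1 << 64) - 1


def shl128(lo, hi, n):
    """Left-shift the 128-bit value hi:lo by n using only 64-bit pieces."""
    if not 0 <= n < 128:
        raise ValueError("shift count out of range")
    if n == 0:
        # Handled separately: the 64 - n shift below would be a full-width
        # shift, which a 64-bit hardware shifter cannot be relied on to do.
        return lo, hi
    if n < 64:
        new_lo = (lo << n) & MASK64
        # Patch in the bits that cross the 64-bit boundary.
        new_hi = ((hi << n) & MASK64) | (lo >> (64 - n))
        return new_lo, new_hi
    # 64 <= n < 128: everything comes from the low half.
    return 0, (lo << (n - 64)) & MASK64


def lshr128(lo, hi, n):
    """Logical right-shift of the 128-bit value hi:lo by n."""
    if not 0 <= n < 128:
        raise ValueError("shift count out of range")
    if n == 0:
        return lo, hi
    if n < 64:
        new_hi = hi >> n
        new_lo = (lo >> n) | ((hi << (64 - n)) & MASK64)
        return new_lo, new_hi
    return hi >> (n - 64), 0
```

Both helpers return the result as a (lo, hi) pair of 64-bit values.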


This fixes a large number of test failures, caused by the need to enable 
TImode to support libgomp.


TImode multiply and divide remain unimplemented.

Andrew
amdgcn: TImode shifts

Implement TImode shifts in the backend.

The middle-end support that does it for other architectures doesn't work for
GCN because BITS_PER_WORD==32, meaning that TImode is quad-word, not
double-word.

gcc/ChangeLog:

	* config/gcn/gcn.md ("<expander>ti3"): New.

diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index 8cfb3a85d25..ed98d2d2706 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -1538,6 +1538,111 @@ (define_insn "<expander>di3"
   [(set_attr "type" "sop2,sop2,vop2")
(set_attr "length" "8")])
 
+;; }}}
+;; {{{ ALU: generic 128-bit binop
+
+; TImode shifts can't be synthesized by the middle-end
+(define_expand "<expander>ti3"
+  [(set (match_operand:TI 0 "register_operand")
+	(vec_and_scalar_nocom:TI
+	  (match_operand:TI 1 "gcn_alu_operand")
+	  (match_operand:SI 2 "gcn_alu_operand")))]
+  ""
+  {
+rtx dest = operands[0];
+rtx src = operands[1];
+rtx shift = operands[2];
+
+enum {ashr, lshr, ashl} shiftop = <expander>;
+rtx (*inverse_shift_fn) (rtx, rtx, rtx)
+  = (shiftop == ashl ? gen_lshrdi3 : gen_ashldi3);
+rtx (*logical_shift_fn) (rtx, rtx, rtx)
+  = (shiftop == ashl ? gen_ashldi3 : gen_lshrdi3);
+
+/* We shift "from" one subreg "to" the other, according to shiftop.  */
+int from = (shiftop == ashl ? 0 : 8);
+int to = (shiftop == ashl ? 8 : 0);
+rtx destfrom = simplify_gen_subreg (DImode, dest, TImode, from);
+rtx destto = simplify_gen_subreg (DImode, dest, TImode, to);
+rtx srcfrom = simplify_gen_subreg (DImode, src, TImode, from);
+rtx srcto = simplify_gen_subreg (DImode, src, TImode, to);
+
+int shiftval = (CONST_INT_P (shift) ? INTVAL (shift) : -1);
+enum {RUNTIME, ZERO, SMALL, LARGE} shiftcomparison
+ = (!CONST_INT_P (shift) ? RUNTIME
+: shiftval == 0 ? ZERO
+: shiftval < 64 ? SMALL
+: LARGE);
+
+rtx large_label, zero_label, exit_label;
+
+if (shiftcomparison == RUNTIME)
+  {
+zero_label = gen_label_rtx ();
+large_label = gen_label_rtx ();
+exit_label = gen_label_rtx ();
+
+rtx cond = gen_rtx_EQ (VOIDmode, shift, const0_rtx);
+emit_insn (gen_cbranchsi4 (cond, shift, const0_rtx, zero_label));
+
+rtx sixtyfour = GEN_INT (64);
+cond = gen_rtx_GE (VOIDmode, shift, sixtyfour);
+emit_insn (gen_cbranchsi4 (cond, shift, sixtyfour, large_label));
+  }
+
+if (shiftcomparison == SMALL || shiftcomparison == RUNTIME)
+  {
+/* Shift both parts by the same amount, then patch in the bits that
+   cross the boundary.
+   This does *not* work for zero-length shifts.  */
+rtx tmpto1 = gen_reg_rtx (DImode);
+rtx tmpto2 = gen_reg_rtx (DImode);
+emit_insn (gen_<expander>di3 (destfrom, srcfrom, shift));
+emit_insn (logical_shift_fn (tmpto1, srcto, shift));
+rtx lessershiftval = gen_reg_rtx (SImode);
+emit_insn (gen_subsi3 (lessershiftval, GEN_INT (64), shift));
+emit_insn (inverse_shift_fn (tmpto2, srcfrom, lessershiftval));
+emit_insn (gen_iordi3 (destto, tmpto1, tmpto2));
+  }
+
+if (shiftcomparison == RUNTIME)
+  {
+emit_jump_insn (gen_jump (exit_label));
+emit_barrier ();
+
+emit_label (zero_label);
+  }
+
+if (shiftcomparison == ZERO || shiftcomparison == RUNTIME)
+  emit_move_insn (dest, src);
+
+if (shiftcomparison == RUNTIME)
+  {
+emit_jump_insn (gen_jump (exit_label));
+emit_barrier ();
+
+emit_label (large_label);
+  }
+
+if (shiftcomparison == LARGE || shiftcomparison == RUNTIME)
+  {
+/* Do the shift within one part, and set the other part appropriately.
+   Shifts of 128+ bits are an error.  */
+rtx lessershiftval = gen_reg_rtx (SImode);
+emit_insn (gen_subsi3 (lessershiftval, shift, GEN_INT (64)));
+emit_insn (gen_<expander>di3 (destto, srcfrom, lessershiftval));
+if (shiftop == ashr)
+  emit_insn (gen_ashrdi3 (destfrom, srcfrom, GEN_INT (63)));
+else
+  emit_move_insn (destfrom, const0_rtx);
+  }
+
+if (shiftcomparison == RUNTIME)
+  emit_label (exit_label);
+
+DONE;
+  })
+
 ;; }}}
 ;; {{{ Atomics

[PING] Re: [PATCH] nvptx: Add support for subword compare-and-swap

2020-08-04 Thread Kwok Cheung Yeung


Hello

I posted a revised patchset about two weeks ago at:

https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550291.html

Are you able to take a look at it?

Thanks

Kwok

[RFC] libstdc++: Fix pretty-printing old implementations of std::unique_ptr

2020-08-04 Thread Andres Rodriguez via Gcc-patches

On binaries compiled against gcc5 the impl_type parameter is None,
which results in an exception being raised by is_specialization_of().

These versions of std::unique_ptr have the tuple as a root element.
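
The selection logic described above can be modelled without gdb as in the
sketch below. It is a simplification for illustration only: the function name
is invented, the prefix test stands in for is_specialization_of() (which is
more careful about namespaces), and raising ValueError for unknown layouts is
an assumption of this sketch.

```python
def tuple_member_path(impl_tag):
    """Return the member names to follow from a unique_ptr to its tuple.

    impl_tag models val.type.fields()[0].type.tag, which may be None.
    """
    if impl_tag is None:
        # Old layout: the tuple is the root element.
        return ["_M_t"]
    if impl_tag.startswith(("std::__uniq_ptr_data<", "std::__uniq_ptr_impl<")):
        # Newer layouts wrap the tuple in an implementation class.
        return ["_M_t", "_M_t"]
    if impl_tag.startswith("std::tuple<"):
        return ["_M_t"]
    raise ValueError("unsupported unique_ptr implementation: %s" % impl_tag)
```

Checking for None first avoids passing a non-string to the name-matching
helper.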
---

Hi,

I ran into this issue when debugging a binary built using gcc5.

I'm not very familiar with python or the gcc codebase, so this might be
the wrong way to address this problem. But a patch seemed like a good
way to start the conversation.

Thanks for taking a look.

-Andres

P.S.: This is a resend of the patch. I joined the mailing list after
sending this patch so I'm guessing the original email got stuck in a
moderation queue.


 libstdc++-v3/python/libstdcxx/v6/printers.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index e4da8dfe5b6..3154a2a6f9d 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -247,7 +247,9 @@ class UniquePointerPrinter:
 self.val = val
 impl_type = val.type.fields()[0].type.tag
 # Check for new implementations first:
-if is_specialization_of(impl_type, '__uniq_ptr_data') \
+if impl_type is None:
+tuple_member = val['_M_t']
+elif is_specialization_of(impl_type, '__uniq_ptr_data') \
 or is_specialization_of(impl_type, '__uniq_ptr_impl'):
 tuple_member = val['_M_t']['_M_t']
 elif is_specialization_of(impl_type, 'tuple'):
-- 
2.27.0

Re: [PATCH] Enable GCC support for AMX

2020-08-04 Thread Kirill Yukhin via Gcc-patches

Hello,

On 06 Jul 09:58, Hongyu Wang via Gcc-patches wrote:
> Hi:
> 
> This patch is about to support Intel Advanced Matrix Extensions (AMX)
> which will be enabled in GLC.
> 
> AMX is a new 64-bit programming paradigm consisting of two
> components: a set of 2-dimensional registers (tiles) representing
> sub-arrays from a larger 2-dimensional memory image,
> and an accelerator able to operate on tiles
> 
> Supported instructions are
> 
> AMX-TILE:ldtilecfg/sttilecfg/tileloadd/tileloaddt1/tilezero/tilerelease
> AMX-INT8:tdpbssd/tdpbsud/tdpbusd/tdpbuud
> AMX-BF16:tdpbf16ps
> 
> The intrinsics adopts constant tile register number as its input parameters.

I didn't go into the patch deeply, but why did you use inline asm for intrinsics
definition? Are you going to introduce register classes for those new tmm
registers and new instruction definitions for new insns in machine description?

--
K

Re: [PATCH] Adjust gimple-ssa-sprintf.c for irange API.

2020-08-04 Thread Martin Sebor via Gcc-patches


On 8/4/20 8:11 AM, Aldy Hernandez wrote:

On Tue, Aug 4, 2020 at 3:59 PM Martin Sebor  wrote:


On 8/4/20 5:21 AM, Aldy Hernandez via Gcc-patches wrote:

This is a rather obvious patch, but I'd like a nod before committing.

Martin, I've removed your anti-range check, as it is subsumed by the
lower_bound/upper_bound code.  However, you will have to adapt the code
for multi-ranges if desired.  For example, you may want to loop through the
sub-ranges and do the right thing.  Look at value-range.h and see the comments
for class irange.  Those are the methods you should stick to.

i.e.
   for (i=0; i < vr->num_pairs(); ++i)
   stuff_with(vr->lower_bound(i), vr->upper_bound(i))
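
As a hedged illustration of what looping over sub-ranges could buy, the Python
sketch below models a multi-range as a list of (lower, upper) pairs and
computes the shortest and longest decimal rendering over all of them. The
function name and the list-of-pairs representation are assumptions of the
example; it does not use the irange class.

```python
def formatted_length_bounds(pairs):
    """Return (shortest, longest) decimal length over all sub-ranges.

    pairs is a list of inclusive (lower, upper) integer bounds, one per
    sub-range, playing the role of lower_bound(i) / upper_bound(i).
    """
    shortest, longest = None, None
    for lower, upper in pairs:
        if lower <= 0 <= upper:
            # The sub-range contains 0, which prints as a single character.
            sub_min = 1
            sub_max = max(len(str(lower)), len(str(upper)))
        elif lower > 0:
            sub_min, sub_max = len(str(lower)), len(str(upper))
        else:
            # Entirely negative: the bound closest to zero is the shortest.
            sub_min, sub_max = len(str(upper)), len(str(lower))
        shortest = sub_min if shortest is None else min(shortest, sub_min)
        longest = sub_max if longest is None else max(longest, sub_max)
    return shortest, longest
```

With this representation, what used to be an anti-range such as "anything but
0..9" in a signed 8-bit type is just the two pairs (-128, -1) and (10, 127),
and needs no special case.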

There should be no functional changes with this patch.


I have no concern with this change but I appreciate the heads
up and the tip on how to add the multi-range support.  Just
one suggestion: I'd prefer to keep the comment about the POSIX
requirement somewhere just as a reminder.


The comment is still there, as you had a duplicate one further up:

  else if (dstsize > target_int_max ())
 {
   warning_at (gimple_location (info.callstmt), info.warnopt (),
   "specified bound %wu exceeds %<INT_MAX%>",
   dstsize);
   /* POSIX requires snprintf to fail if DSTSIZE is greater
  than INT_MAX.  Avoid folding in that case.  */
   posunder4k = false;
 }

Are you ok with this, or would you rather I copy that comment somewhere else?


I'm fine with it as is, I didn't see the other copy.

Thanks!
Martin

[PATCH PR96375] arm: Fix testcase selection for Low Overhead Loop tests

2020-08-04 Thread Andrea Corallo

Hi all,

I'd like to submit the following patch to fix PR96375 ([11 regression]
arm/lob[2-5].c fail on some configurations).

It fix the observed regression making sure -mthumb is always used and
allowing Low Overhead Loop tests to be executed only on cortex-M profile
targets.

Does not introduce regressions in my testing and fix the reported one
according to Christophe (in Cc).

Okay for trunk?

Thanks

  Andrea

2020-07-31  Andrea Corallo  

* gcc.target/arm/lob1.c: Fix missing flag.
* gcc.target/arm/lob2.c: Likewise.
* gcc.target/arm/lob3.c: Likewise.
* gcc.target/arm/lob4.c: Likewise.
* gcc.target/arm/lob5.c: Likewise.
* gcc.target/arm/lob6.c: Likewise.
* lib/target-supports.exp
(check_effective_target_arm_v8_1_lob_ok): Return 1 only for
cortex-m targets, add '-mthumb' flag.

From c4e7c1193a736c1e25dd0646c00310bc8dc833df Mon Sep 17 00:00:00 2001
From: Andrea Corallo 
Date: Fri, 31 Jul 2020 14:52:24 +0100
Subject: [PATCH] arm: Fix testcase selection for Low Overhead Loop tests
 [PR96375]

gcc/testsuite/ChangeLog

2020-07-31  Andrea Corallo  

* gcc.target/arm/lob1.c: Fix missing flag.
* gcc.target/arm/lob2.c: Likewise.
* gcc.target/arm/lob3.c: Likewise.
* gcc.target/arm/lob4.c: Likewise.
* gcc.target/arm/lob5.c: Likewise.
* gcc.target/arm/lob6.c: Likewise.
* lib/target-supports.exp
(check_effective_target_arm_v8_1_lob_ok): Return 1 only for
cortex-m targets, add '-mthumb' flag.
---
 gcc/testsuite/gcc.target/arm/lob1.c   | 2 +-
 gcc/testsuite/gcc.target/arm/lob2.c   | 2 +-
 gcc/testsuite/gcc.target/arm/lob3.c   | 2 +-
 gcc/testsuite/gcc.target/arm/lob4.c   | 2 +-
 gcc/testsuite/gcc.target/arm/lob5.c   | 2 +-
 gcc/testsuite/gcc.target/arm/lob6.c   | 2 +-
 gcc/testsuite/lib/target-supports.exp | 4 ++--
 7 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/lob1.c b/gcc/testsuite/gcc.target/arm/lob1.c
index b92dc551d50..ba5c82cd55c 100644
--- a/gcc/testsuite/gcc.target/arm/lob1.c
+++ b/gcc/testsuite/gcc.target/arm/lob1.c
@@ -3,7 +3,7 @@
 /* { dg-do run } */
 /* { dg-require-effective-target arm_v8_1_lob_ok } */
 /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" "-mcpu=*" } } */
-/* { dg-options "-march=armv8.1-m.main -O3 --save-temps" } */
+/* { dg-options "-march=armv8.1-m.main -mthumb -O3 --save-temps" } */
 #include 
 #include "lob.h"
 
diff --git a/gcc/testsuite/gcc.target/arm/lob2.c b/gcc/testsuite/gcc.target/arm/lob2.c
index 1fe9a9d82bb..fdeb2686f51 100644
--- a/gcc/testsuite/gcc.target/arm/lob2.c
+++ b/gcc/testsuite/gcc.target/arm/lob2.c
@@ -2,7 +2,7 @@
if a non-inlineable function call takes place inside the loop.  */
 /* { dg-do compile } */
 /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" "-mcpu=*" } } */
-/* { dg-options "-march=armv8.1-m.main -O3 --save-temps" } */
+/* { dg-options "-march=armv8.1-m.main -mthumb -O3 --save-temps" } */
 #include 
 #include "lob.h"
 
diff --git a/gcc/testsuite/gcc.target/arm/lob3.c b/gcc/testsuite/gcc.target/arm/lob3.c
index 17cba007ccb..70314ea84b3 100644
--- a/gcc/testsuite/gcc.target/arm/lob3.c
+++ b/gcc/testsuite/gcc.target/arm/lob3.c
@@ -2,7 +2,7 @@
if causes VFP emulation library calls to happen inside the loop.  */
 /* { dg-do compile } */
 /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" "-mcpu=*" } } */
-/* { dg-options "-march=armv8.1-m.main -O3 --save-temps -mfloat-abi=soft" } */
+/* { dg-options "-march=armv8.1-m.main -mthumb -O3 --save-temps -mfloat-abi=soft" } */
 /* { dg-require-effective-target arm_softfloat } */
 #include 
 #include "lob.h"
diff --git a/gcc/testsuite/gcc.target/arm/lob4.c b/gcc/testsuite/gcc.target/arm/lob4.c
index 444a2c7b4bf..792f352d682 100644
--- a/gcc/testsuite/gcc.target/arm/lob4.c
+++ b/gcc/testsuite/gcc.target/arm/lob4.c
@@ -2,7 +2,7 @@
if LR is modified within the loop.  */
 /* { dg-do compile } */
 /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" "-mcpu=*" } } */
-/* { dg-options "-march=armv8.1-m.main -O3 --save-temps -mfloat-abi=soft" } */
+/* { dg-options "-march=armv8.1-m.main -mthumb -O3 --save-temps -mfloat-abi=soft" } */
 /* { dg-require-effective-target arm_softfloat } */
 #include 
 #include "lob.h"
diff --git a/gcc/testsuite/gcc.target/arm/lob5.c b/gcc/testsuite/gcc.target/arm/lob5.c
index c4f46e41532..1a6adf1e28e 100644
--- a/gcc/testsuite/gcc.target/arm/lob5.c
+++ b/gcc/testsuite/gcc.target/arm/lob5.c
@@ -3,7 +3,7 @@
therefore is not optimizable.  Outer loops are not optimized.  */
 /* { dg-do compile } */
 /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" "-mcpu=*" } } */
-/* { dg-options "-march=armv8.1-m.main -O3 --save-temps" } */
+/* { dg-options "-march=armv8.1-m.main -mthumb -O3 --save-temps" } */
 #include 
 #include "lob.h"
 
diff --git a/gcc/testsuite/gcc.target/arm/lob6.c

Re: [PATCH 2/2] Decouple adjust_range_from_scev from vr_values and value_range_equiv.

2020-08-04 Thread Aldy Hernandez via Gcc-patches

On Tue, Aug 4, 2020 at 3:27 PM Richard Biener
 wrote:
>
> On Tue, Aug 4, 2020 at 2:05 PM Aldy Hernandez via Gcc-patches
>  wrote:
> >
> > I've abstracted out the parts of the code that had nothing to do with
> > value_range_equiv into an externally visible range_of_var_in_loop().
> > This way, it can be called with any range.
> >
> > adjust_range_with_scev still works as before, intersecting with a
> > known range.  Due to the way value_range_equiv::intersect works,
> > intersecting a value_range_equiv with no equivalences into one
> > with equivalences will result in the resulting range maintaining
> > whatever equivalences it had.  So everything works as the
> > vr->update() did before (remember that ::update() retains
> > equivalences).
> >
> > OK?
> >
> > gcc/ChangeLog:
> >
> > * vr-values.c (check_for_binary_op_overflow): Change type of store
> > to range_query.
> > (vr_values::adjust_range_with_scev): Abstract most of the code...
> > (range_of_var_in_loop): ...here.  Remove value_range_equiv uses.
> > (simplify_using_ranges::simplify_using_ranges): Change type of store
> > to range_query.
> > * vr-values.h (class range_query): New.
> > (class simplify_using_ranges): Use range_query.
> > (class vr_values): Add OVERRIDE to get_value_range.
> > (range_of_var_in_loop): New.
> > ---
> >  gcc/vr-values.c | 140 ++--
> >  gcc/vr-values.h |  23 ++--
> >  2 files changed, 81 insertions(+), 82 deletions(-)
> >
> > diff --git a/gcc/vr-values.c b/gcc/vr-values.c
> > index 9002d87c14b..e7f97bdbf7b 100644
> > --- a/gcc/vr-values.c
> > +++ b/gcc/vr-values.c
> > @@ -1004,7 +1004,7 @@ vr_values::extract_range_from_comparison (value_range_equiv *vr,
> > overflow.  */
> >
> >  static bool
> > -check_for_binary_op_overflow (vr_values *store,
> > +check_for_binary_op_overflow (range_query *store,
> >   enum tree_code subcode, tree type,
> >   tree op0, tree op1, bool *ovf)
> >  {
> > @@ -1737,22 +1737,18 @@ compare_range_with_value (enum tree_code comp, const value_range *vr,
> >
> >gcc_unreachable ();
> >  }
> > +
> >  /* Given a range VR, a LOOP and a variable VAR, determine whether it
> > would be profitable to adjust VR using scalar evolution information
> > for VAR.  If so, update VR with the new limits.  */
>
> Certainly this comment needs updating now.  It's tempting to provide
> a range from the scalar evolution info separately from "adjusting" a range,
> at least the comment suggests we'll not always do so.  I'm not sure
> your patch factors that decision out or simply returns [-INF,+INF] for
> intersection.  For example...

The comment belonged to the original method, which is now a wrapper.  I've moved
the comment to its original location and have added a comment to the new
function.  And yes, we return VARYING if we weren't able to determine a range.
I've documented it.

Thanks.
Aldy

>
> >  void
> > -vr_values::adjust_range_with_scev (value_range_equiv *vr, class loop *loop,
> > -  gimple *stmt, tree var)
> > +range_of_var_in_loop (irange *vr, range_query *query,
> > + class loop *loop, gimple *stmt, tree var)
> >  {
> > -  tree init, step, chrec, tmin, tmax, min, max, type, tem;
> > +  tree init, step, chrec, tmin, tmax, min, max, type;
> >enum ev_direction dir;
> >
> > -  /* TODO.  Don't adjust anti-ranges.  An anti-range may provide
> > - better opportunities than a regular range, but I'm not sure.  */
> > -  if (vr->kind () == VR_ANTI_RANGE)
> > -return;
> > -
>
> ... this (probably the worst example).  The rest seem to be more
> correctness issues than profitability.
>
> >chrec = instantiate_parameters (loop, analyze_scalar_evolution (loop, var));
> >
> >/* Like in PR19590, scev can return a constant function.  */
> > @@ -1763,16 +1759,17 @@ vr_values::adjust_range_with_scev (value_range_equiv *vr, class loop *loop,
> >  }
> >
> >if (TREE_CODE (chrec) != POLYNOMIAL_CHREC)
> > -return;
> > +{
> > +  vr->set_varying (TREE_TYPE (var));
> > +  return;
> > +}
> >
> >init = initial_condition_in_loop_num (chrec, loop->num);
> > -  tem = op_with_constant_singleton_value_range (init);
> > -  if (tem)
> > -init = tem;
> > +  if (TREE_CODE (init) == SSA_NAME)
> > +query->get_value_range (init, stmt)->singleton_p ();
> >step = evolution_part_in_loop_num (chrec, loop->num);
> > -  tem = op_with_constant_singleton_value_range (step);
> > -  if (tem)
> > -step = tem;
> > +  if (TREE_CODE (step) == SSA_NAME)
> > +query->get_value_range (step, stmt)->singleton_p ();
> >
> >/* If STEP is symbolic, we can't know whether INIT will be the
> >   minimum or maximum value in the range.  Also, unless INIT is
> > @@ -1781,7 +1778,10 @@ vr_values::adjust_range_with_scev

Re: [PATCH] Adjust gimple-ssa-sprintf.c for irange API.

2020-08-04 Thread Aldy Hernandez via Gcc-patches

On Tue, Aug 4, 2020 at 3:59 PM Martin Sebor  wrote:
>
> On 8/4/20 5:21 AM, Aldy Hernandez via Gcc-patches wrote:
> > This is a rather obvious patch, but I'd like a nod before committing.
> >
> > Martin, I've removed your anti-range check, as it is subsumed by the
> > lower_bound/upper_bound code.  However, you will have to adapt the code
> > for multi-ranges if desired.  For example, you may want to loop through the
> > sub-ranges and do the right thing.  Look at value-range.h and see the comments
> > for class irange.  Those are the methods you should stick to.
> >
> > i.e.
> >   for (i=0; i < vr->num_pairs(); ++i)
> >   stuff_with(vr->lower_bound(i), vr->upper_bound(i))
> >
> > There should be no functional changes with this patch.
>
> I have no concern with this change but I appreciate the heads
> up and the tip on how to add the multi-range support.  Just
> one suggestion: I'd prefer to keep the comment about the POSIX
> requirement somewhere just as a reminder.

The comment is still there, as you had a duplicate one further up:

 else if (dstsize > target_int_max ())
{
  warning_at (gimple_location (info.callstmt), info.warnopt (),
  "specified bound %wu exceeds %<INT_MAX%>",
  dstsize);
  /* POSIX requires snprintf to fail if DSTSIZE is greater
 than INT_MAX.  Avoid folding in that case.  */
  posunder4k = false;
}

Are you ok with this, or would you rather I copy that comment somewhere else?

Thanks.
Aldy

[PATCH] c++: Template keyword following :: [PR96082]

2020-08-04 Thread Marek Polacek via Gcc-patches

In r9-4235 I tried to make sure that the template keyword follows
a nested-name-specifier.  :: is a valid nested-name-specifier, so
I also have to check 'globalscope' before giving the error.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10/9?

gcc/cp/ChangeLog:

PR c++/96082
* parser.c (cp_parser_elaborated_type_specifier): Allow
'template' following ::.

gcc/testsuite/ChangeLog:

PR c++/96082
* g++.dg/template/template-keyword3.C: New test.
---
 gcc/cp/parser.c   |  2 +-
 gcc/testsuite/g++.dg/template/template-keyword3.C | 11 +++
 2 files changed, 12 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/template/template-keyword3.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index ab088874ba7..3782edd429e 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -18826,7 +18826,7 @@ cp_parser_elaborated_type_specifier (cp_parser* parser,
   if (!template_p)
cp_parser_parse_tentatively (parser);
   /* The `template' keyword must follow a nested-name-specifier.  */
-  else if (!nested_name_specifier)
+  else if (!nested_name_specifier && !globalscope)
{
  cp_parser_error (parser, "%<template%> must follow a nested-"
   "name-specifier");
diff --git a/gcc/testsuite/g++.dg/template/template-keyword3.C b/gcc/testsuite/g++.dg/template/template-keyword3.C
new file mode 100644
index 00000000000..91af2b3dc02
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/template-keyword3.C
@@ -0,0 +1,11 @@
+// PR c++/96082
+// { dg-do compile { target c++11 } }
+
+template <typename> class A {};
+
+void
+f ()
+{ 
+  typename::template A <int> a;
+  ::template A <int> a2;
+}

base-commit: 7bd72dd5a385dfa6d49cfe640cefc9ed187361d3
-- 
2.26.2

Re: [PATCH] Adjust gimple-ssa-sprintf.c for irange API.

2020-08-04 Thread Martin Sebor via Gcc-patches


On 8/4/20 5:21 AM, Aldy Hernandez via Gcc-patches wrote:

This is a rather obvious patch, but I'd like a nod before committing.

Martin, I've removed your anti-range check, as it is subsumed by the
lower_bound/upper_bound code.  However, you will have to adapt the code
for multi-ranges if desired.  For example, you may want to loop through the
sub-ranges and do the right thing.  Look at value-range.h and see the comments
for class irange.  Those are the methods you should stick to.

i.e.
for (i=0; i < vr->num_pairs(); ++i)
stuff_with(vr->lower_bound(i), vr->upper_bound(i))

There should be no functional changes with this patch.


I have no concern with this change but I appreciate the heads
up and the tip on how to add the multi-range support.  Just
one suggestion: I'd prefer to keep the comment about the POSIX
requirement somewhere just as a reminder.

Thanks
Martin



Aldy

gcc/ChangeLog:

* gimple-ssa-sprintf.c (get_int_range): Adjust for irange API.
(format_integer): Same.
(handle_printf_call): Same.
---
  gcc/gimple-ssa-sprintf.c | 37 -
  1 file changed, 16 insertions(+), 21 deletions(-)

diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index 3d77459d811..70b031fe7b9 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -1070,7 +1070,7 @@ get_int_range (tree arg, HOST_WIDE_INT *pmin, HOST_WIDE_INT *pmax,
  const value_range_equiv *vr
= CONST_CAST (class vr_values *, vr_values)->get_value_range (arg);
  
-	  if (range_int_cst_p (vr))

+ if (!vr->undefined_p () && !vr->varying_p () && !vr->symbolic_p ())
{
  HOST_WIDE_INT type_min
= (TYPE_UNSIGNED (argtype)
@@ -1079,8 +1079,11 @@ get_int_range (tree arg, HOST_WIDE_INT *pmin, 
HOST_WIDE_INT *pmax,
  
  	  HOST_WIDE_INT type_max = tree_to_uhwi (TYPE_MAX_VALUE (argtype));
  
-	  *pmin = TREE_INT_CST_LOW (vr->min ());

- *pmax = TREE_INT_CST_LOW (vr->max ());
+ tree type = TREE_TYPE (arg);
+ tree tmin = wide_int_to_tree (type, vr->lower_bound ());
+ tree tmax = wide_int_to_tree (type, vr->upper_bound ());
+ *pmin = TREE_INT_CST_LOW (tmin);
+ *pmax = TREE_INT_CST_LOW (tmax);
  
  	  if (*pmin < *pmax)

{
@@ -1372,10 +1375,10 @@ format_integer (const directive , tree arg, const 
vr_values *vr_values)
const value_range_equiv *vr
= CONST_CAST (class vr_values *, vr_values)->get_value_range (arg);
  
-  if (range_int_cst_p (vr))

+  if (!vr->varying_p () && !vr->undefined_p () && !vr->symbolic_p ())
{
- argmin = vr->min ();
- argmax = vr->max ();
+ argmin = wide_int_to_tree (TREE_TYPE (arg), vr->lower_bound ());
+ argmax = wide_int_to_tree (TREE_TYPE (arg), vr->upper_bound ());
  
  	  /* Set KNOWNRANGE if the argument is in a known subrange

 of the directive's type and neither width nor precision
@@ -1388,11 +1391,7 @@ format_integer (const directive , tree arg, const 
vr_values *vr_values)
  res.argmin = argmin;
  res.argmax = argmax;
}
-  else if (vr->kind () == VR_ANTI_RANGE)
-   {
- /* Handle anti-ranges if/when bug 71690 is resolved.  */
-   }
-  else if (vr->varying_p () || vr->undefined_p ())
+  else
{
  /* The argument here may be the result of promoting the actual
 argument to int.  Try to determine the type of the actual
@@ -4561,10 +4560,13 @@ handle_printf_call (gimple_stmt_iterator *gsi, const 
vr_values *vr_values)
  const value_range_equiv *vr
= CONST_CAST (class vr_values *, vr_values)->get_value_range (size);
  
-	  if (range_int_cst_p (vr))

+ if (!vr->undefined_p () && !vr->symbolic_p ())
{
- unsigned HOST_WIDE_INT minsize = TREE_INT_CST_LOW (vr->min ());
- unsigned HOST_WIDE_INT maxsize = TREE_INT_CST_LOW (vr->max ());
+ tree type = TREE_TYPE (size);
+ tree tmin = wide_int_to_tree (type, vr->lower_bound ());
+ tree tmax = wide_int_to_tree (type, vr->upper_bound ());
+ unsigned HOST_WIDE_INT minsize = TREE_INT_CST_LOW (tmin);
+ unsigned HOST_WIDE_INT maxsize = TREE_INT_CST_LOW (tmax);
  dstsize = warn_level < 2 ? maxsize : minsize;
  
  	  if (minsize > target_int_max ())

@@ -4578,13 +4580,6 @@ handle_printf_call (gimple_stmt_iterator *gsi, const 
vr_values *vr_values)
  if (maxsize > target_int_max ())
posunder4k = false;
}
- else if (vr->varying_p ())
-   {
- /* POSIX requires snprintf to fail if DSTSIZE is greater
-than INT_MAX.  Since SIZE's range is unknown, avoid
-folding.  */
- posunder4k = false;
-   }
  
  	  /* The destination size is not

Re: [PATCH] c++: cxx_eval_vec_init after zero initialization [PR96282]

2020-08-04 Thread Patrick Palka via Gcc-patches

On Mon, 3 Aug 2020, Patrick Palka wrote:

> On Mon, 3 Aug 2020, Jason Merrill wrote:
> 
> > On 8/3/20 2:45 PM, Patrick Palka wrote:
> > > On Mon, 3 Aug 2020, Jason Merrill wrote:
> > > 
> > > > On 8/3/20 8:53 AM, Patrick Palka wrote:
> > > > > On Mon, 3 Aug 2020, Patrick Palka wrote:
> > > > > 
> > > > > > In the first testcase below, expand_aggr_init_1 sets up t's default
> > > > > > constructor such that the ctor first zero-initializes the entire 
> > > > > > base
> > > > > > b,
> > > > > > followed by calling b's default constructor, the latter of which 
> > > > > > just
> > > > > > default-initializes the array member b::m via a VEC_INIT_EXPR.
> > > > > > 
> > > > > > So upon constexpr evaluation of this latter VEC_INIT_EXPR, ctx->ctor
> > > > > > is
> > > > > > nonempty due to the prior zero-initialization, and we proceed in
> > > > > > cxx_eval_vec_init to append new constructor_elts to the end of
> > > > > > ctx->ctor
> > > > > > without first checking if a matching constructor_elt already exists.
> > > > > > This leads to ctx->ctor having two matching constructor_elts for 
> > > > > > each
> > > > > > index.
> > > > > > 
> > > > > > This patch partially fixes this issue by making the RANGE_EXPR
> > > > > > optimization in cxx_eval_vec_init truncate ctx->ctor before adding 
> > > > > > the
> > > > > > single RANGE_EXPR constructor_elt.  This isn't a complete fix 
> > > > > > because
> > > > > > the RANGE_EXPR optimization applies only when the constant 
> > > > > > initializer
> > > > > > is relocatable, so whenever it's not relocatable we can still build 
> > > > > > up
> > > > > > an invalid CONSTRUCTOR, e.g. if in the first testcase we add an 
> > > > > > NSDMI
> > > > > > such as 'e *p = this;' to struct e, then the ICE still occurs even
> > > > > > with
> > > > > > this patch.
> > > > > 
> > > > > A complete but more risky one-line fix would be to always truncate
> > > > > ctx->ctor beforehand, not just when the RANGE_EXPR optimization 
> > > > > applies.
> > > > > If it's true that the initializer of a VEC_INIT_EXPR can't observe the
> > > > > previous elements of the target array, then it should be safe to 
> > > > > always
> > > > > truncate I think?
> > > > 
> > > > What if default-initialization of the array element type doesn't fully
> > > > initialize the elements, e.g. if 'e' had another member without a 
> > > > default
> > > > initializer?  Does truncation first mean we lose the zero-initialization
> > > > of
> > > > such a member?
> > > 
> > > Hmm, it looks like we would lose the zero-initialization of such a
> > > member with or without truncation first (so with any one of the three
> > > proposed fixes).  I think it's because the evaluation loop in
> > > cxx_eval_vec_init disregards each element's prior (zero-initialized)
> > > state.
> > > 
> > > > 
> > > > We could probably still do the truncation, but clear the
> > > > CONSTRUCTOR_NO_CLEARING flag on the element initializer.
> > > 
> > > Ah, this seems to work well.  Like this?
> > > 
> > > -- >8 --
> > > 
> > > Subject: [PATCH] c++: cxx_eval_vec_init after zero initialization 
> > > [PR96282]
> > > 
> > > In the first testcase below, expand_aggr_init_1 sets up t's default
> > > constructor such that the ctor first zero-initializes the entire base b,
> > > followed by calling b's default constructor, the latter of which just
> > > default-initializes the array member b::m via a VEC_INIT_EXPR.
> > > 
> > > So upon constexpr evaluation of this latter VEC_INIT_EXPR, ctx->ctor is
> > > nonempty due to the prior zero-initialization, and we proceed in
> > > cxx_eval_vec_init to append new constructor_elts to the end of ctx->ctor
> > > without first checking if a matching constructor_elt already exists.
> > > This leads to ctx->ctor having two matching constructor_elts for each
> > > index.
> > > 
> > > This patch fixes this issue by truncating a zero-initialized array
> > > object in cxx_eval_vec_init_1 before we begin appending
> > > default-initialized array elements to it.  Since default-initialization
> > > may leave parts of the element type uninitialized, we also preserve the
> > > array's prior zero-initialized state by clearing CONSTRUCTOR_NO_CLEARING
> > > on each appended element initializer.
> > > 
> > > gcc/cp/ChangeLog:
> > > 
> > >   PR c++/96282
> > >   * constexpr.c (cxx_eval_vec_init_1): Truncate ctx->ctor and
> > >   then clear CONSTRUCTOR_NO_CLEARING on each appended element
> > >   initializer if we're default-initializing a previously
> > >   zero-initialized array object.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   PR c++/96282
> > >   * g++.dg/cpp0x/constexpr-array26.C: New test.
> > >   * g++.dg/cpp0x/constexpr-array27.C: New test.
> > >   * g++.dg/cpp2a/constexpr-init18.C: New test.
> > > ---
> > >   gcc/cp/constexpr.c | 17 -
> > >   gcc/testsuite/g++.dg/cpp0x/constexpr-array26.C | 13 +
> > >

[PATCH] tree-optimization/88240 - stopgap for floating point code-hoisting issues

2020-08-04 Thread Richard Biener

This adds a stopgap measure to avoid performing code-hoisting
on mixed type loads when the load we'd insert in the hoisting
position would be a floating point one.  This is because certain
targets (hello x87) cannot perform floating point loads without
possibly altering the bit representation and thus cannot be used
in place of integral loads.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed to trunk
so far.

Richard.
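
A small sketch of why the constant in the new testcase is sensitive to being loaded as a floating-point value, assuming IEEE 754 binary64 layout for double. The helper names is_signaling_nan_bits and quieted_bits are hypothetical and only decode bit patterns; they do not touch the FPU. Under that assumption the test constant 18442936822990639076ULL has an all-ones exponent, a nonzero mantissa and a clear quiet bit, i.e. it decodes as a signaling NaN, which an x87 load may turn into a quiet NaN with a different bit pattern.

```c
#include <stdint.h>

/* Return 1 if BITS, viewed as an IEEE 754 binary64 value, is a
   signaling NaN: exponent all ones, mantissa nonzero, quiet bit
   (mantissa bit 51) clear.  */
int
is_signaling_nan_bits (uint64_t bits)
{
  uint64_t exponent = (bits >> 52) & 0x7ff;
  uint64_t mantissa = bits & ((UINT64_C (1) << 52) - 1);
  uint64_t quiet_bit = (bits >> 51) & 1;
  return exponent == 0x7ff && mantissa != 0 && quiet_bit == 0;
}

/* Model of what quieting a NaN does to the representation: the quiet
   bit gets set, so the integer view of the value changes.  */
uint64_t
quieted_bits (uint64_t bits)
{
  return bits | (UINT64_C (1) << 51);
}
```

If the hoisted load were a floating-point one, the integer read of u.i in the testcase could observe the quieted pattern instead of the stored one, which is the failure the stopgap avoids.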

2020-08-04  Richard Biener  

PR tree-optimization/88240
* tree-ssa-sccvn.h (vn_reference_s::punned): New flag.
* tree-ssa-sccvn.c (vn_reference_insert): Initialize punned.
(vn_reference_insert_pieces): Likewise.
(visit_reference_op_call): Likewise.
(visit_reference_op_load): Track whether a ref was punned.
* tree-ssa-pre.c (do_hoist_insertion): Refuse to perform hoist
insertion on punned floating point loads.

* gcc.target/i386/pr88240.c: New testcase.
---
 gcc/testsuite/gcc.target/i386/pr88240.c | 26 +
 gcc/tree-ssa-pre.c  | 10 ++
 gcc/tree-ssa-sccvn.c| 13 -
 gcc/tree-ssa-sccvn.h|  1 +
 4 files changed, 49 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr88240.c

diff --git a/gcc/testsuite/gcc.target/i386/pr88240.c 
b/gcc/testsuite/gcc.target/i386/pr88240.c
new file mode 100644
index 000..5ee02f3193c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr88240.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mno-sse" } */
+
+int flag;
+union { double f; unsigned long long i; } u;
+void __attribute__((noinline))
+init ()
+{
+  flag = 1;
+  u.i = 18442936822990639076ULL;
+}
+unsigned long long __attribute__((noinline))
+test ()
+{
+  if (flag)
+return u.i;
+  else
+return u.f;
+}
+int main()
+{
+  init ();
+  if (test () != 18442936822990639076ULL)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index 0c1654f3580..7d67305bf4b 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -3571,6 +3571,16 @@ do_hoist_insertion (basic_block block)
  continue;
}
 
+  /* If we end up with a punned expression representation and this
+happens to be a float typed one give up - we can't know for
+sure whether all paths perform the floating-point load we are
+about to insert and on some targets this can cause correctness
+issues.  See PR88240.  */
+  if (expr->kind == REFERENCE
+ && PRE_EXPR_REFERENCE (expr)->punned
+ && FLOAT_TYPE_P (get_expr_type (expr)))
+   continue;
+
   /* OK, we should hoist this value.  Perform the transformation.  */
   pre_stats.hoist_insert++;
   if (dump_file && (dump_flags & TDF_DETAILS))
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 2e925a1afbf..934ae40670d 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -3601,6 +3601,7 @@ vn_reference_insert (tree op, tree result, tree vuse, 
tree vdef)
   vr1->vuse = vuse_ssa_val (vuse);
   vr1->operands = valueize_shared_reference_ops_from_ref (op, ).copy ();
   vr1->type = TREE_TYPE (op);
+  vr1->punned = 0;
   ao_ref op_ref;
   ao_ref_init (&op_ref, op);
   vr1->set = ao_ref_alias_set (&op_ref);
@@ -3660,6 +3661,7 @@ vn_reference_insert_pieces (tree vuse, alias_set_type set,
   vr1->vuse = vuse_ssa_val (vuse);
   vr1->operands = valueize_refs (operands);
   vr1->type = type;
+  vr1->punned = 0;
   vr1->set = set;
   vr1->base_set = base_set;
   vr1->hashcode = vn_reference_compute_hash (vr1);
@@ -4892,6 +4894,7 @@ visit_reference_op_call (tree lhs, gcall *stmt)
 them here.  */
   vr2->operands = vr1.operands.copy ();
   vr2->type = vr1.type;
+  vr2->punned = vr1.punned;
   vr2->set = vr1.set;
   vr2->base_set = vr1.base_set;
   vr2->hashcode = vr1.hashcode;
@@ -4918,10 +4921,11 @@ visit_reference_op_load (tree lhs, tree op, gimple 
*stmt)
   bool changed = false;
   tree last_vuse;
   tree result;
+  vn_reference_t res;
 
   last_vuse = gimple_vuse (stmt);
   result = vn_reference_lookup (op, gimple_vuse (stmt),
-   default_vn_walk_kind, NULL, true, &last_vuse);
+   default_vn_walk_kind, &res, true, &last_vuse);
 
   /* We handle type-punning through unions by value-numbering based
  on offset and size of the access.  Be prepared to handle a
@@ -4943,6 +4947,13 @@ visit_reference_op_load (tree lhs, tree op, gimple *stmt)
  gimple_match_op res_op (gimple_match_cond::UNCOND,
  VIEW_CONVERT_EXPR, TREE_TYPE (op), result);
  result = vn_nary_build_or_lookup (&res_op);
+ if (result
+ && TREE_CODE (result) == SSA_NAME
+ && VN_INFO (result)->needs_insertion)
+   /* Track whether this is the canonical expression for different
+  typed loads.  We use that as a stopgap

Re: [PATCH 2/2] Decouple adjust_range_from_scev from vr_values and value_range_equiv.

2020-08-04 Thread Richard Biener via Gcc-patches

On Tue, Aug 4, 2020 at 2:05 PM Aldy Hernandez via Gcc-patches
 wrote:
>
> I've abstracted out the parts of the code that had nothing to do with
> value_range_equiv into an externally visible range_of_var_in_loop().
> This way, it can be called with any range.
>
> adjust_range_with_scev still works as before, intersecting with a
> known range.  Due to the way value_range_equiv::intersect works,
> intersecting a value_range_equiv with no equivalences into one
> with equivalences will result in the resulting range maintaining
> whatever equivalences it had.  So everything works as the
> vr->update() did before (remember that ::update() retains
> equivalences).
>
> OK?
>
> gcc/ChangeLog:
>
> * vr-values.c (check_for_binary_op_overflow): Change type of store
> to range_query.
> (vr_values::adjust_range_with_scev): Abstract most of the code...
> (range_of_var_in_loop): ...here.  Remove value_range_equiv uses.
> (simplify_using_ranges::simplify_using_ranges): Change type of store
> to range_query.
> * vr-values.h (class range_query): New.
> (class simplify_using_ranges): Use range_query.
> (class vr_values): Add OVERRIDE to get_value_range.
> (range_of_var_in_loop): New.
> ---
>  gcc/vr-values.c | 140 ++--
>  gcc/vr-values.h |  23 ++--
>  2 files changed, 81 insertions(+), 82 deletions(-)
>
> diff --git a/gcc/vr-values.c b/gcc/vr-values.c
> index 9002d87c14b..e7f97bdbf7b 100644
> --- a/gcc/vr-values.c
> +++ b/gcc/vr-values.c
> @@ -1004,7 +1004,7 @@ vr_values::extract_range_from_comparison 
> (value_range_equiv *vr,
> overflow.  */
>
>  static bool
> -check_for_binary_op_overflow (vr_values *store,
> +check_for_binary_op_overflow (range_query *store,
>   enum tree_code subcode, tree type,
>   tree op0, tree op1, bool *ovf)
>  {
> @@ -1737,22 +1737,18 @@ compare_range_with_value (enum tree_code comp, const 
> value_range *vr,
>
>gcc_unreachable ();
>  }
> +
>  /* Given a range VR, a LOOP and a variable VAR, determine whether it
> would be profitable to adjust VR using scalar evolution information
> for VAR.  If so, update VR with the new limits.  */

Certainly this comment needs updating now.  It's tempting to provide
a range from the scalar evolution info separately from "adjusting" a range,
at least the comment suggests we'll not always do so.  I'm not sure
your patch factors that decision out or simply returns [-INF,+INF] for
intersection.  For example...

>  void
> -vr_values::adjust_range_with_scev (value_range_equiv *vr, class loop *loop,
> -  gimple *stmt, tree var)
> +range_of_var_in_loop (irange *vr, range_query *query,
> + class loop *loop, gimple *stmt, tree var)
>  {
> -  tree init, step, chrec, tmin, tmax, min, max, type, tem;
> +  tree init, step, chrec, tmin, tmax, min, max, type;
>enum ev_direction dir;
>
> -  /* TODO.  Don't adjust anti-ranges.  An anti-range may provide
> - better opportunities than a regular range, but I'm not sure.  */
> -  if (vr->kind () == VR_ANTI_RANGE)
> -return;
> -

... this (probably the worst example).  The rest seem to be more
correctness issues than profitability.

>chrec = instantiate_parameters (loop, analyze_scalar_evolution (loop, 
> var));
>
>/* Like in PR19590, scev can return a constant function.  */
> @@ -1763,16 +1759,17 @@ vr_values::adjust_range_with_scev (value_range_equiv 
> *vr, class loop *loop,
>  }
>
>if (TREE_CODE (chrec) != POLYNOMIAL_CHREC)
> -return;
> +{
> +  vr->set_varying (TREE_TYPE (var));
> +  return;
> +}
>
>init = initial_condition_in_loop_num (chrec, loop->num);
> -  tem = op_with_constant_singleton_value_range (init);
> -  if (tem)
> -init = tem;
> +  if (TREE_CODE (init) == SSA_NAME)
> +query->get_value_range (init, stmt)->singleton_p ();
>step = evolution_part_in_loop_num (chrec, loop->num);
> -  tem = op_with_constant_singleton_value_range (step);
> -  if (tem)
> -step = tem;
> +  if (TREE_CODE (step) == SSA_NAME)
> +query->get_value_range (step, stmt)->singleton_p ();
>
>/* If STEP is symbolic, we can't know whether INIT will be the
>   minimum or maximum value in the range.  Also, unless INIT is
> @@ -1781,7 +1778,10 @@ vr_values::adjust_range_with_scev (value_range_equiv 
> *vr, class loop *loop,
>if (step == NULL_TREE
>|| !is_gimple_min_invariant (step)
>|| !valid_value_p (init))
> -return;
> +{
> +  vr->set_varying (TREE_TYPE (var));
> +  return;
> +}
>
>dir = scev_direction (chrec);
>if (/* Do not adjust ranges if we do not know whether the iv increases
> @@ -1790,7 +1790,10 @@ vr_values::adjust_range_with_scev (value_range_equiv 
> *vr, class loop *loop,
>/* ... or if it may wrap.  */
>|| scev_probably_wraps_p (NULL_TREE, init, step,

Re: [PATCH] Aarch64: Add missing clobber for fjcvtzs

2020-08-04 Thread Andrea Corallo

Andrea Corallo  writes:

> Hi Kyrill,
>
> thanks for catching that.
>
> The attached is committed into master as d2b86e14c14.
>
>   Andrea

And backported in releases/gcc-10 as e5907f3b631.

  Andrea

Re: [Patch] Fortran: Fix for OpenMP's 'lastprivate(conditional:'

2020-08-04 Thread Jakub Jelinek via Gcc-patches

On Tue, Aug 04, 2020 at 02:26:45PM +0200, Tobias Burnus wrote:
> Follow-up to my 'lastprivate(conditional:' patch for
> a left-over problem which I mostly missed.
> Checking the dump also shows that it now works
> as expected.
> 
> OK?

Ok, thanks.
Note, we'll need to remove the warning when we properly implement the OpenMP
5.1 zero iterations lastprivate conditional semantics, because then there is
a difference between conditional and non-conditional iterators that are
never written in the loop.  Assuming it passes 2nd voting today.

Jakub
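
For readers unfamiliar with the clause, here is a minimal C sketch of what lastprivate(conditional:) is for, on a variable that is not the loop iterator. The function name last_negative_index is hypothetical. It assumes a compiler with OpenMP 5.0 support for the conditional modifier and compilation with -fopenmp; without OpenMP the pragma is ignored and the loop runs serially with the same result.

```c
/* Return the index of the last negative element of A[0..N-1].
   With lastprivate (conditional: x), x receives the value assigned in
   the sequentially last iteration that actually wrote it.  The result
   is only meaningful here when at least one element is negative,
   because the value of x is not relied upon when no iteration
   assigns it.  */
int
last_negative_index (const int *a, int n)
{
  int x = -1;
#pragma omp parallel for lastprivate (conditional: x)
  for (int i = 0; i < n; i++)
    if (a[i] < 0)
      x = i;
  return x;
}
```

A loop iterator, by contrast, is written on every iteration, so the conditional modifier adds nothing there; that is the situation the warning in the testcase covers.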

[Patch] Fortran: Fix for OpenMP's 'lastprivate(conditional:'

2020-08-04 Thread Tobias Burnus


Follow-up to my 'lastprivate(conditional:' patch for
a left-over problem which I mostly missed.
Checking the dump also shows that it now works
as expected.

OK?

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
Fortran: Fix for OpenMP's 'lastprivate(conditional:'

gcc/fortran/ChangeLog:

	* trans-openmp.c (gfc_trans_omp_do): Fix 'lastprivate(conditional:'.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/lastprivate-conditional-3.f90: Enable some
	previously disabled 'lastprivate(conditional:' dg-warnings.

 gcc/fortran/trans-openmp.c   |  2 ++
 gcc/testsuite/gfortran.dg/gomp/lastprivate-conditional-3.f90 | 12 
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 98702b1aa22..7891a7e651b 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -4570,6 +4570,8 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
 		  {
 		tree l = build_omp_clause (input_location,
 	   OMP_CLAUSE_LASTPRIVATE);
+		if (OMP_CLAUSE_LASTPRIVATE_CONDITIONAL (c))
+		  OMP_CLAUSE_LASTPRIVATE_CONDITIONAL (l) = 1;
 		OMP_CLAUSE_DECL (l) = dovar_decl;
 		OMP_CLAUSE_CHAIN (l) = omp_clauses;
 		OMP_CLAUSE_LASTPRIVATE_STMT (l) = tmp;
diff --git a/gcc/testsuite/gfortran.dg/gomp/lastprivate-conditional-3.f90 b/gcc/testsuite/gfortran.dg/gomp/lastprivate-conditional-3.f90
index 720fe9b64a8..932249c9225 100644
--- a/gcc/testsuite/gfortran.dg/gomp/lastprivate-conditional-3.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/lastprivate-conditional-3.f90
@@ -27,14 +27,12 @@ subroutine foo
 end do
   !$omp end parallel
 
-  ! Error in eqiv. C code: "conditional 'lastprivate' on loop iterator 'i' ignored"
-  !$omp parallel do lastprivate (conditional: i)
+  !$omp parallel do lastprivate (conditional: i)  ! { dg-warning "conditional 'lastprivate' on loop iterator 'i' ignored" }
   do i = 1, 32
   end do
   !$omp end parallel do
 
-  ! Error in eqiv. C code: "conditional 'lastprivate' on loop iterator 'i' ignored"
-  !$omp parallel do collapse (3) lastprivate (conditional: i)
+  !$omp parallel do collapse (3) lastprivate (conditional: i)  ! { dg-warning "conditional 'lastprivate' on loop iterator 'i' ignored" }
   do i = 1, 32
 do j = 1, 32
   do k = 1, 32
@@ -43,8 +41,7 @@ subroutine foo
   end do
   !$omp end parallel do
 
-  ! Error in eqiv. C code: "conditional 'lastprivate' on loop iterator 'j' ignored"
-  !$omp parallel do collapse (3) lastprivate (conditional: j)
+  !$omp parallel do collapse (3) lastprivate (conditional: j)  ! { dg-warning "conditional 'lastprivate' on loop iterator 'j' ignored" }
   do i = 1, 32
 do j = 1, 32
   do k = 1, 32
@@ -53,8 +50,7 @@ subroutine foo
   end do
   !$omp end parallel do
 
-  ! Error in eqiv. C code: "conditional 'lastprivate' on loop iterator 'k' ignored"
-  !$omp parallel do collapse (3) lastprivate (conditional: k)
+  !$omp parallel do collapse (3) lastprivate (conditional: k)  ! { dg-warning "conditional 'lastprivate' on loop iterator 'k' ignored" }
   do i = 1, 32
 do j = 1, 32
   do k = 1, 32

[PATCH] nvptx: Add support for PTX highpart multiplications (e.g. mul.hi.s32)

2020-08-04 Thread Roger Sayle


This patch adds support for signed and unsigned, HImode, SImode and
DImode highpart multiplications to the nvptx backend.  Without the
middle-end patch that I've just posted, the middle-end is able to
(easily) make use of the narrow four of the six instructions, but
with that patch, all six of these instructions are generated in the
provided test cases.

This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
with a "make" and "make -k check" with no new failures with the
above patch, and just the two failures to find mul.hi.?64 against
current mainline.  I'd considered submitting this patch either without
support for the 64bit variants, or without tests for them, but it
seemed more reasonable to make both enhancements at the same time.

Ok for mainline (once the previous patch has been approved/pushed)?


2020-08-04  Roger Sayle  

gcc/ChangeLog
* config/nvptx/nvptx.md (smulhi3_highpart, smulsi3_highpart,
smuldi3_highpart, umulhi3_highpart, umulsi3_highpart,
umuldi3_highpart): New instructions.

gcc/testsuite/ChangeLog
* gcc.target/nvptx/mul-hi.c: New test.
* gcc.target/nvptx/umul-hi.c: New test.


Thanks in advance,
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index c23edcf..0459549 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -568,6 +568,78 @@
   ""
   "%.\\tmul.wide.u32\\t%0, %1, %2;")
 
+(define_insn "smulhi3_highpart"
+  [(set (match_operand:HI 0 "nvptx_register_operand" "=R")
+   (truncate:HI
+(lshiftrt:SI
+ (mult:SI (sign_extend:SI
+   (match_operand:HI 1 "nvptx_register_operand" "R"))
+  (sign_extend:SI
+   (match_operand:HI 2 "nvptx_register_operand" "R")))
+ (const_int 16))))]
+  ""
+  "%.\\tmul.hi.s16\\t%0, %1, %2;")
+
+(define_insn "smulsi3_highpart"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+   (truncate:SI
+(lshiftrt:DI
+ (mult:DI (sign_extend:DI
+   (match_operand:SI 1 "nvptx_register_operand" "R"))
+  (sign_extend:DI
+   (match_operand:SI 2 "nvptx_register_operand" "R")))
+ (const_int 32))))]
+  ""
+  "%.\\tmul.hi.s32\\t%0, %1, %2;")
+
+(define_insn "smuldi3_highpart"
+  [(set (match_operand:DI 0 "nvptx_register_operand" "=R")
+   (truncate:DI
+(lshiftrt:TI
+ (mult:TI (sign_extend:TI
+   (match_operand:DI 1 "nvptx_register_operand" "R"))
+  (sign_extend:TI
+   (match_operand:DI 2 "nvptx_register_operand" "R")))
+ (const_int 64))))]
+  ""
+  "%.\\tmul.hi.s64\\t%0, %1, %2;")
+
+(define_insn "umulhi3_highpart"
+  [(set (match_operand:HI 0 "nvptx_register_operand" "=R")
+   (truncate:HI
+(lshiftrt:SI
+ (mult:SI (zero_extend:SI
+   (match_operand:HI 1 "nvptx_register_operand" "R"))
+  (zero_extend:SI
+   (match_operand:HI 2 "nvptx_register_operand" "R")))
+ (const_int 16))))]
+  ""
+  "%.\\tmul.hi.u16\\t%0, %1, %2;")
+
+(define_insn "umulsi3_highpart"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+   (truncate:SI
+(lshiftrt:DI
+ (mult:DI (zero_extend:DI
+   (match_operand:SI 1 "nvptx_register_operand" "R"))
+  (zero_extend:DI
+   (match_operand:SI 2 "nvptx_register_operand" "R")))
+ (const_int 32))))]
+  ""
+  "%.\\tmul.hi.u32\\t%0, %1, %2;")
+
+(define_insn "umuldi3_highpart"
+  [(set (match_operand:DI 0 "nvptx_register_operand" "=R")
+   (truncate:DI
+(lshiftrt:TI
+ (mult:TI (zero_extend:TI
+   (match_operand:DI 1 "nvptx_register_operand" "R"))
+  (zero_extend:TI
+   (match_operand:DI 2 "nvptx_register_operand" "R")))
+ (const_int 64))))]
+  ""
+  "%.\\tmul.hi.u64\\t%0, %1, %2;")
+
 ;; Shifts
 
 (define_insn "ashl3"
diff --git a/gcc/testsuite/gcc.target/nvptx/mul-hi.c 
b/gcc/testsuite/gcc.target/nvptx/mul-hi.c
new file mode 100644
index 000..2cc35af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/mul-hi.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Wno-long-long" } */
+
+typedef int __attribute ((mode(TI))) ti_t;
+
+short smulhi3_highpart(short x, short y)
+{
+  return ((int)x * (int)y) >> 16;
+}
+
+int smulsi3_highpart(int x, int y)
+{
+  return ((long)x * (long)y) >> 32;
+}
+
+long smuldi3_highpart(long x, long y)
+{
+  return ((ti_t)x * (ti_t)y) >> 64;
+}
+
+/* { dg-final { scan-assembler-times "mul.hi.s16" 1 } } */
+/* { dg-final { scan-assembler-times "mul.hi.s32" 1 } } */
+/* { dg-final { scan-assembler-times "mul.hi.s64" 1 } } */
diff --git a/gcc/testsuite/gcc.target/nvptx/umul-hi.c 
b/gcc/testsuite/gcc.target/nvptx/umul-hi.c
new file mode 100644
index 000..148d1ce
--- /dev/null
+++

[PATCH] middle-end: Recognize/canonicalize MULT_HIGHPART_EXPR and expand it.

2020-08-04 Thread Roger Sayle


This middle-end patch teaches fold/match to recognize the idiom for
a highpart multiplication and represent it internally as a
MULT_HIGHPART_EXPR tree code.  At RTL expansion time, the compiler
will try using an appropriate instruction (sequence) provided
by the backend, but if that fails, this patch now provides a fallback
by synthesizing a suitable sequence using either a widening multiply
or a multiplication in a wider mode [matching the original tree].
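
As a concrete instance of the idiom described above, here is a minimal C sketch of the source pattern: widen both operands to a type of at least twice the precision, multiply, shift right by the narrow precision, and convert back. The function names umulhi32 and smulhi32 are illustrative only. Note that right-shifting a negative signed value is implementation-defined in ISO C; the signed variant assumes the arithmetic shift that GCC provides.

```c
#include <stdint.h>

/* Unsigned 32x32->high 32: both operands zero-extended to 64 bits.  */
uint32_t
umulhi32 (uint32_t x, uint32_t y)
{
  return (uint32_t) (((uint64_t) x * (uint64_t) y) >> 32);
}

/* Signed 32x32->high 32: both operands sign-extended to 64 bits.
   Assumes >> on a negative int64_t is an arithmetic shift.  */
int32_t
smulhi32 (int32_t x, int32_t y)
{
  return (int32_t) (((int64_t) x * (int64_t) y) >> 32);
}
```

The shift count equals the precision of the narrow type and the wide type has the same signedness as the result, which matches the conditions the new match.pd pattern checks.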

The benefit of this internal canonicalization is that it allows GCC
to generate muldi3_highpart instructions even on targets that require
a libcall to perform TImode multiplications.  Currently the RTL
optimizers can recognize highpart multiplications in combine, but
this matching fails when the multiplication requires a libcall.
Rather than attempt to do something via REG_EQUAL_NOTEs, a clever
solution is to make more use of the MULT_HIGHPART_EXPR tree code
in the tree optimizers.
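
To illustrate why a highpart multiply is cheaper than a full double-width multiplication, the sketch below computes the high 64 bits of a 64x64 product from four 32x32->64 partial products, without any 128-bit type. This is a generic schoolbook technique shown for illustration under the hypothetical name umulhi64_pieces; it is not the expansion sequence the patch emits.

```c
#include <stdint.h>

/* High 64 bits of the 128-bit product A * B, built from 32-bit halves.  */
uint64_t
umulhi64_pieces (uint64_t a, uint64_t b)
{
  uint64_t a_lo = (uint32_t) a, a_hi = a >> 32;
  uint64_t b_lo = (uint32_t) b, b_hi = b >> 32;

  uint64_t lo_lo = a_lo * b_lo;
  uint64_t hi_lo = a_hi * b_lo;
  uint64_t lo_hi = a_lo * b_hi;
  uint64_t hi_hi = a_hi * b_hi;

  /* Sum the terms that land in bits 32..95.  This cannot overflow:
     lo_hi <= 2^64 - 2^33 + 1 and each other term is below 2^32.  */
  uint64_t cross = (lo_lo >> 32) + (uint32_t) hi_lo + lo_hi;

  return hi_hi + (hi_lo >> 32) + (cross >> 32);
}
```

A target with a native highpart instruction does all of this in one operation, which is the motivation for keeping the operation recognizable as a single MULT_HIGHPART_EXPR instead of a wide multiply plus shift.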

This patch has been tested on x86_64-pc-linux-gnu with a "make
bootstrap" and "make -k check", and on nvptx-none with a "make"
and "make -k check", both with no new failures.  There's an
additional target-specific test in the nvptx patch to support
"mul.hi.s64" and "mul.hi.u64" that I'm just about to post, but
this code is already well exercised during bootstrap by libgcc.

Ok for mainline?


2020-08-04  Roger Sayle  

gcc/ChangeLog
* match.pd (((wide)x * (wide)y)>>C -> mult_highpart): New
simplification/canonicalization to recognize MULT_HIGHPART_EXPR.
* optabs.c (expand_mult_highpart_1): New function to expand
MULT_HIGHPART_EXPR as a widening or a wide multiplication
followed by a right shift (or a gen_highpart subreg).
(expand_mult_highpart): Call the above function if the target
doesn't provide a suitable optab.

gcc/testsuite/ChangeLog
* gcc.dg/fold-mult-highpart-1.c: New test.


Thanks in advance,
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

diff --git a/gcc/match.pd b/gcc/match.pd
index a052c9e..15c33f2 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -6443,3 +6443,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
to the number of trailing zeroes.  */
 (match (ctz_table_index @1 @2 @3)
   (rshift (mult (bit_and:c (negate @1) @1) INTEGER_CST@2) INTEGER_CST@3))
+
+/* Recognize MULT_HIGHPART_EXPR.  */
+(simplify
+  (convert (rshift (mult:s (convert@3 @0) (convert @1))
+  (INTEGER_CST@2)))
+  (if (INTEGRAL_TYPE_P (type)
+   && INTEGRAL_TYPE_P (TREE_TYPE (@3))
+   && types_match (type, TREE_TYPE (@0))
+   && types_match (type, TREE_TYPE (@1))
+   && (TYPE_PRECISION (TREE_TYPE (@3))
+  >= 2 * TYPE_PRECISION (type))
+   && tree_fits_uhwi_p (@2)
+   && tree_to_uhwi (@2) == TYPE_PRECISION (type)
+   && TYPE_SIGN (TREE_TYPE (@3)) == TYPE_SIGN (type))
+(mult_highpart @0 @1)))
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 184827f..2416a69 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -5870,6 +5870,52 @@ expand_vec_cmp_expr (tree type, tree exp, rtx target)
   return ops[0].value;
 }
 
+/* Helper function of expand_mult_highpart.  Expand a highpart
+   multiplication using a widening or wider multiplication.  */
+
+static rtx
+expand_mult_highpart_1 (machine_mode mode, rtx op0, rtx op1, bool uns_p)
+{
+  scalar_int_mode narrow_mode;
+  rtx tem = NULL_RTX;
+  optab t;
+
+  if (!is_a <scalar_int_mode> (mode, &narrow_mode))
+return NULL_RTX;
+
+  scalar_int_mode wide_mode = GET_MODE_WIDER_MODE (narrow_mode).require ();
+
+  /* Try a widening multiplication.  */
+  t = uns_p ? umul_widen_optab : smul_widen_optab;
+  if (convert_optab_handler (t, wide_mode, narrow_mode) != CODE_FOR_nothing)
+tem = expand_binop (wide_mode, t, op0, op1, 0, uns_p, OPTAB_WIDEN);
+
+  /* If that fails, try a wider multiplication.  */
+  if (!tem)
+{
+  rtx_insn *insns;
+  rtx wop0, wop1;
+  start_sequence();
+  wop0 = convert_modes (wide_mode, narrow_mode, op0, uns_p);
+  wop1 = convert_modes (wide_mode, narrow_mode, op1, uns_p);
+  tem = expand_binop (wide_mode, smul_optab, wop0, wop1, 0,
+ uns_p, OPTAB_LIB_WIDEN);
+  insns = get_insns ();
+  end_sequence ();
+
+  if (!tem)
+   return NULL_RTX;
+
+  emit_insn (insns);
+}
+
+  if (narrow_mode == word_mode)
+return gen_highpart (narrow_mode, tem);
+  tem = expand_shift (RSHIFT_EXPR, wide_mode, tem,
+GET_MODE_BITSIZE (narrow_mode), 0, 1);
+  return convert_modes (narrow_mode, wide_mode, tem, 0);
+}
+
 /* Expand a highpart multiply.  */
 
 rtx
@@ -5887,7 +5933,8 @@ expand_mult_highpart (machine_mode mode, rtx op0, rtx op1,
   switch (method)
 {
 case 0:
-  return NULL_RTX;
+  /* We don't have an optab, try expanding this the hard way.  */
+  return expand_mult_highpart_1 (mode, op0, op1, uns_p);
 case 1:
   tab1 = uns_p ? umul_highpart_optab : smul_highpart_optab;
   return expand_binop (mode, tab1, op0, op1,

Re: [PATCH] Enable GCC support for AMX

2020-08-04 Thread Hongyu Wang via Gcc-patches

PING^3

Hongyu Wang  wrote on Fri, Jul 24, 2020 at 1:41 PM:
>
> PING^2
>
> Hongyu Wang  wrote on Fri, Jul 17, 2020 at 1:40 PM:
> >
> > Update for SAPPHIRERAPIDS and PING
> >
> > Hongyu Wang  wrote on Tue, Jul 7, 2020 at 11:24 AM:
> >
> > >
> > > Hi Kirill, could you help review this patch?
> > >
> > > Hongyu Wang  wrote on Mon, Jul 6, 2020 at 9:58 AM:
> > > >
> > > > Hi:
> > > >
> > > > This patch is about to support Intel Advanced Matrix Extensions (AMX)
> > > > which will be enabled in GLC.
> > > >
> > > > AMX is a new 64-bit programming paradigm consisting of two
> > > > > components: a set of 2-dimensional registers (tiles) representing
> > > > sub-arrays from a larger 2-dimensional memory image,
> > > > and an accelerator able to operate on tiles
> > > >
> > > > Supported instructions are
> > > >
> > > > AMX-TILE:ldtilecfg/sttilecfg/tileloadd/tileloaddt1/tilezero/tilerelease
> > > > AMX-INT8:tdpbssd/tdpbsud/tdpbusd/tdpbuud
> > > > AMX-BF16:tdpbf16ps
> > > >
> > > > > The intrinsics take constant tile register numbers as their input
> > > > > parameters.
> > > >
> > > > For detailed information, please refer to
> > > > https://software.intel.com/content/dam/develop/public/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
> > > >
> > > > Bootstrap ok, regression test on i386/x86 backend is ok.
> > > >
> > > > OK for master?
> > > >
> > > > gcc/ChangeLog
> > > >
> > > > * common/config/i386/i386-common.c (OPTION_MASK_ISA2_AMX_TILE_SET,
> > > > OPTION_MASK_ISA2_AMX_INT8_SET, OPTION_MASK_ISA2_AMX_BF16_SET,
> > > > OPTION_MASK_ISA2_AMX_TILE_UNSET,
> > > > OPTION_MASK_ISA2_AMX_INT8_UNSET, OPTION_MASK_ISA2_AMX_BF16_UNSET):
> > > > > New macros.
> > > > > (ix86_handle_option): Handle -mamx-tile, -mamx-int8, -mamx-bf16.
> > > > * common/config/i386/i386-cpuinfo.h (processor_types): Add
> > > > FEATURE_AMX_TILE, FEATURE_AMX_INT8, FEATURE_AMX_BF16.
> > > > * common/config/i386/cpuinfo.h (XSTATE_TILECFG,
> > > > XSTATE_TILEDATA, XCR_AMX_ENABLED_MASK): New macro.
> > > > (get_available_features): Enable AMX features only if
> > > > > their states are supported by OSXSAVE.
> > > > * common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY
> > > > for amx-tile, amx-int8, amx-bf16.
> > > > * config.gcc: Add amxtileintrin.h, amxint8intrin.h,
> > > > amxbf16intrin.h to extra headers.
> > > > * config/i386/amxbf16intrin.h: New file.
> > > > * config/i386/amxint8intrin.h: Ditto.
> > > > * config/i386/amxtileintrin.h: Ditto.
> > > > * config/i386/cpuid.h (bit_AMX_BF16, bit_AMX_TILE, bit_AMX_INT8):
> > > > New macro.
> > > > * config/i386/i386-c.c (ix86_target_macros_internal): Define
> > > > > __AMX_TILE__, __AMX_INT8__, __AMX_BF16__.
> > > > * config/i386/i386-options.c (ix86_target_string): Add
> > > > -mamx-tile, -mamx-int8, -mamx-bf16.
> > > > (ix86_option_override_internal): Handle AMX-TILE,
> > > > AMX-INT8, AMX-BF16.
> > > > * config/i386/i386.h (TARGET_AMX_TILE, TARGET_AMX_TILE_P,
> > > > TARGET_AMX_INT8, TARGET_AMX_INT8_P, TARGET_AMX_BF16_P,
> > > > PTA_AMX_TILE, PTA_AMX_INT8, PTA_AMX_BF16): New macros.
> > > > * config/i386/i386.opt: Add -mamx-tile, -mamx-int8, -mamx-bf16.
> > > > * config/i386/immintrin.h: Include amxtileintrin.h,
> > > > amxint8intrin.h, amxbf16intrin.h.
> > > > * doc/invoke.texi: Document -mamx-tile, -mamx-int8, -mamx-bf16.
> > > > * doc/extend.texi: Document amx-tile, amx-int8, amx-bf16.
> > > > > * doc/sourcebuild.texi (Effective-Target Keywords, Other
> > > > hardware attributes): Document amx_int8, amx_tile, amx_bf16.
> > > >
> > > > gcc/testsuite/ChangeLog
> > > >
> > > > * lib/target-supports.exp (check_effective_target_amx_tile,
> > > > check_effective_target_amx_int8,
> > > > check_effective_target_amx_bf16): New proc.
> > > > * g++.dg/other/i386-2.C: Add -mamx-tile, -mamx-int8, -mamx-bf16.
> > > > * g++.dg/other/i386-3.C: Ditto.
> > > > * gcc.target/i386/sse-12.c: Ditto.
> > > > * gcc.target/i386/sse-13.c: Ditto.
> > > > * gcc.target/i386/sse-14.c: Ditto.
> > > > * gcc.target/i386/sse-22.c: Ditto.
> > > > * gcc.target/i386/sse-23.c: Ditto.
> > > > * gcc.target/i386/funcspec-56.inc: Add new target attribute.
> > > > * gcc.target/i386/amxbf16-asmatt-1.c: New test.
> > > > * gcc.target/i386/amxint8-asmatt-1.c: Ditto.
> > > > * gcc.target/i386/amxtile-asmatt-1.c: Ditto.
> > > > * gcc.target/i386/amxbf16-asmintel-1.c: Ditto.
> > > > * gcc.target/i386/amxint8-asmintel-1.c: Ditto.
> > > > * gcc.target/i386/amxtile-asmintel-1.c: Ditto.
> > > > * gcc.target/i386/amxbf16-asmatt-2.c: Ditto.
> > > > * gcc.target/i386/amxint8-asmatt-2.c: Ditto.
> > > > * gcc.target/i386/amxtile-asmatt-2.c: Ditto.
> > > > * gcc.target/i386/amxbf16-asmintel-2.c: Ditto.
> > > > * gcc.target/i386/amxint8-asmintel-2.c: Ditto.
> > > > * gcc.target/i386/amxtile-asmintel-2.c: Ditto.

[PATCH 2/2] Decouple adjust_range_from_scev from vr_values and value_range_equiv.

2020-08-04 Thread Aldy Hernandez via Gcc-patches

I've abstracted out the parts of the code that had nothing to do with
value_range_equiv into an externally visible range_of_var_in_loop().
This way, it can be called with any range.

adjust_range_with_scev still works as before, intersecting with a
known range.  Due to the way value_range_equiv::intersect works,
intersecting a value_range_equiv with no equivalences into one
with equivalences will result in the resulting range maintaining
whatever equivalences it had.  So everything works as the
vr->update() did before (remember that ::update() retains
equivalences).

OK?

gcc/ChangeLog:

* vr-values.c (check_for_binary_op_overflow): Change type of store
to range_query.
(vr_values::adjust_range_with_scev): Abstract most of the code...
(range_of_var_in_loop): ...here.  Remove value_range_equiv uses.
(simplify_using_ranges::simplify_using_ranges): Change type of store
to range_query.
* vr-values.h (class range_query): New.
(class simplify_using_ranges): Use range_query.
(class vr_values): Add OVERRIDE to get_value_range.
(range_of_var_in_loop): New.
---
 gcc/vr-values.c | 140 ++--
 gcc/vr-values.h |  23 ++--
 2 files changed, 81 insertions(+), 82 deletions(-)

diff --git a/gcc/vr-values.c b/gcc/vr-values.c
index 9002d87c14b..e7f97bdbf7b 100644
--- a/gcc/vr-values.c
+++ b/gcc/vr-values.c
@@ -1004,7 +1004,7 @@ vr_values::extract_range_from_comparison 
(value_range_equiv *vr,
overflow.  */
 
 static bool
-check_for_binary_op_overflow (vr_values *store,
+check_for_binary_op_overflow (range_query *store,
  enum tree_code subcode, tree type,
  tree op0, tree op1, bool *ovf)
 {
@@ -1737,22 +1737,18 @@ compare_range_with_value (enum tree_code comp, const 
value_range *vr,
 
   gcc_unreachable ();
 }
+
 /* Given a range VR, a LOOP and a variable VAR, determine whether it
would be profitable to adjust VR using scalar evolution information
for VAR.  If so, update VR with the new limits.  */
 
 void
-vr_values::adjust_range_with_scev (value_range_equiv *vr, class loop *loop,
-  gimple *stmt, tree var)
+range_of_var_in_loop (irange *vr, range_query *query,
+ class loop *loop, gimple *stmt, tree var)
 {
-  tree init, step, chrec, tmin, tmax, min, max, type, tem;
+  tree init, step, chrec, tmin, tmax, min, max, type;
   enum ev_direction dir;
 
-  /* TODO.  Don't adjust anti-ranges.  An anti-range may provide
- better opportunities than a regular range, but I'm not sure.  */
-  if (vr->kind () == VR_ANTI_RANGE)
-return;
-
   chrec = instantiate_parameters (loop, analyze_scalar_evolution (loop, var));
 
   /* Like in PR19590, scev can return a constant function.  */
@@ -1763,16 +1759,17 @@ vr_values::adjust_range_with_scev (value_range_equiv 
*vr, class loop *loop,
 }
 
   if (TREE_CODE (chrec) != POLYNOMIAL_CHREC)
-return;
+{
+  vr->set_varying (TREE_TYPE (var));
+  return;
+}
 
   init = initial_condition_in_loop_num (chrec, loop->num);
-  tem = op_with_constant_singleton_value_range (init);
-  if (tem)
-init = tem;
+  if (TREE_CODE (init) == SSA_NAME)
+query->get_value_range (init, stmt)->singleton_p (&init);
   step = evolution_part_in_loop_num (chrec, loop->num);
-  tem = op_with_constant_singleton_value_range (step);
-  if (tem)
-step = tem;
+  if (TREE_CODE (step) == SSA_NAME)
+query->get_value_range (step, stmt)->singleton_p (&step);
 
   /* If STEP is symbolic, we can't know whether INIT will be the
  minimum or maximum value in the range.  Also, unless INIT is
@@ -1781,7 +1778,10 @@ vr_values::adjust_range_with_scev (value_range_equiv 
*vr, class loop *loop,
   if (step == NULL_TREE
   || !is_gimple_min_invariant (step)
   || !valid_value_p (init))
-return;
+{
+  vr->set_varying (TREE_TYPE (var));
+  return;
+}
 
   dir = scev_direction (chrec);
   if (/* Do not adjust ranges if we do not know whether the iv increases
@@ -1790,7 +1790,10 @@ vr_values::adjust_range_with_scev (value_range_equiv 
*vr, class loop *loop,
   /* ... or if it may wrap.  */
   || scev_probably_wraps_p (NULL_TREE, init, step, stmt,
get_chrec_loop (chrec), true))
-return;
+{
+  vr->set_varying (TREE_TYPE (var));
+  return;
+}
 
   type = TREE_TYPE (var);
   if (POINTER_TYPE_P (type) || !TYPE_MIN_VALUE (type))
@@ -1807,7 +1810,7 @@ vr_values::adjust_range_with_scev (value_range_equiv *vr, 
class loop *loop,
   if (TREE_CODE (step) == INTEGER_CST
   && is_gimple_val (init)
   && (TREE_CODE (init) != SSA_NAME
- || get_value_range (init, stmt)->kind () == VR_RANGE))
+ || query->get_value_range (init, stmt)->kind () == VR_RANGE))
 {
   widest_int nit;
 
@@ -1830,21 +1833,32 @@ vr_values::adjust_range_with_scev (value_range_equiv 
*vr, class

[PATCH 1/2] Add statement context to get_value_range.

2020-08-04 Thread Aldy Hernandez via Gcc-patches

This is in line with the statement context that we have for get_value()
in the substitute_and_fold_engine class.
---
 gcc/vr-values.c | 64 ++---
 gcc/vr-values.h | 14 +--
 2 files changed, 41 insertions(+), 37 deletions(-)

diff --git a/gcc/vr-values.c b/gcc/vr-values.c
index 511342f2f13..9002d87c14b 100644
--- a/gcc/vr-values.c
+++ b/gcc/vr-values.c
@@ -147,7 +147,8 @@ vr_values::get_lattice_entry (const_tree var)
return NULL.  Otherwise create an empty range if none existed for VAR.  */
 
 const value_range_equiv *
-vr_values::get_value_range (const_tree var)
+vr_values::get_value_range (const_tree var,
+   gimple *stmt ATTRIBUTE_UNUSED)
 {
   /* If we have no recorded ranges, then return NULL.  */
   if (!vr_value)
@@ -450,7 +451,7 @@ simplify_using_ranges::op_with_boolean_value_range_p (tree 
op)
 
   /* ?? Errr, this should probably check for [0,0] and [1,1] as well
  as [0,1].  */
-  const value_range *vr = get_value_range (op);
+  const value_range *vr = get_value_range (op, NULL);
   return *vr == value_range (build_zero_cst (TREE_TYPE (op)),
 build_one_cst (TREE_TYPE (op)));
 }
@@ -972,12 +973,13 @@ vr_values::extract_range_from_cond_expr 
(value_range_equiv *vr, gassign *stmt)
 
 void
 vr_values::extract_range_from_comparison (value_range_equiv *vr,
+ gimple *stmt,
  enum tree_code code,
  tree type, tree op0, tree op1)
 {
   bool sop;
   tree val
-= simplifier.vrp_evaluate_conditional_warnv_with_ops (code, op0, op1,
+= simplifier.vrp_evaluate_conditional_warnv_with_ops (stmt, code, op0, op1,
  false, &sop, NULL);
   if (val)
 {
@@ -1008,14 +1010,14 @@ check_for_binary_op_overflow (vr_values *store,
 {
   value_range vr0, vr1;
   if (TREE_CODE (op0) == SSA_NAME)
-vr0 = *store->get_value_range (op0);
+vr0 = *store->get_value_range (op0, NULL);
   else if (TREE_CODE (op0) == INTEGER_CST)
 vr0.set (op0);
   else
 vr0.set_varying (TREE_TYPE (op0));
 
   if (TREE_CODE (op1) == SSA_NAME)
-vr1 = *store->get_value_range (op1);
+vr1 = *store->get_value_range (op1, NULL);
   else if (TREE_CODE (op1) == INTEGER_CST)
 vr1.set (op1);
   else
@@ -1472,7 +1474,7 @@ vr_values::extract_range_from_assignment 
(value_range_equiv *vr, gassign *stmt)
   else if (code == COND_EXPR)
 extract_range_from_cond_expr (vr, stmt);
   else if (TREE_CODE_CLASS (code) == tcc_comparison)
-extract_range_from_comparison (vr, gimple_assign_rhs_code (stmt),
+extract_range_from_comparison (vr, stmt, gimple_assign_rhs_code (stmt),
   gimple_expr_type (stmt),
   gimple_assign_rhs1 (stmt),
   gimple_assign_rhs2 (stmt));
@@ -1805,7 +1807,7 @@ vr_values::adjust_range_with_scev (value_range_equiv *vr, 
class loop *loop,
   if (TREE_CODE (step) == INTEGER_CST
   && is_gimple_val (init)
   && (TREE_CODE (init) != SSA_NAME
- || get_value_range (init)->kind () == VR_RANGE))
+ || get_value_range (init, stmt)->kind () == VR_RANGE))
 {
   widest_int nit;
 
@@ -1838,7 +1840,7 @@ vr_values::adjust_range_with_scev (value_range_equiv *vr, 
class loop *loop,
  value_range initvr;
 
  if (TREE_CODE (init) == SSA_NAME)
-   initvr = *(get_value_range (init));
+   initvr = *(get_value_range (init, stmt));
  else if (is_gimple_min_invariant (init))
initvr.set (init);
  else
@@ -2090,7 +2092,7 @@ const value_range_equiv *
 simplify_using_ranges::get_vr_for_comparison (int i, value_range_equiv *tem)
 {
   /* Shallow-copy equiv bitmap.  */
-  const value_range_equiv *vr = get_value_range (ssa_name (i));
+  const value_range_equiv *vr = get_value_range (ssa_name (i), NULL);
 
   /* If name N_i does not have a valid range, use N_i as its own
  range.  This allows us to compare against names that may
@@ -2115,7 +2117,7 @@ simplify_using_ranges::compare_name_with_value
 bool *strict_overflow_p, bool use_equiv_p)
 {
   /* Get the set of equivalences for VAR.  */
-  bitmap e = get_value_range (var)->equiv ();
+  bitmap e = get_value_range (var, NULL)->equiv ();
 
   /* Start at -1.  Set it to 0 if we do a comparison without relying
  on overflow, or 1 if all comparisons rely on overflow.  */
@@ -2195,8 +2197,8 @@ simplify_using_ranges::compare_names (enum tree_code 
comp, tree n1, tree n2,
 {
   /* Compare the ranges of every name equivalent to N1 against the
  ranges of every name equivalent to N2.  */
-  bitmap e1 = get_value_range (n1)->equiv ();
-  bitmap e2 = get_value_range (n2)->equiv ();
+  bitmap e1 = get_value_range (n1, NULL)->equiv ();
+

[PATCH 0/2] decouple adjust_range_from_scev from vr_values

2020-08-04 Thread Aldy Hernandez via Gcc-patches

The goal here is to disassociate adjust_range_from_scev from vr_values,
and value_range_equiv while we're at it.

We've already done something similar with simplify_using_ranges, where we
take in a "store" which is a class providing get_value_range().  Initially
we set it up to take a vr_values, but the ultimate purpose was to have
it work with either *vrp or the ranger.  As such, I have abstracted
out the get_value_range() method into its own abstract class from
which vr_values inherits, and provides said method.

As I did for the substitute_and_fold_engine, I future proofed
get_value_range so it takes a gimple statement.  This provides context for
the SSA being queried.  I purposely did not provide a default statement
of NULL, as I want each caller to pass a statement if available.
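The abstraction described above can be pictured with a small standalone sketch. Everything here is a hypothetical stand-in (Stmt, Range, RangeQuery, TableQuery and is_singleton are made-up names, not GCC's types); it only illustrates a client coded against an abstract get_value_range() that takes a statement argument with no default.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins; none of these are GCC's real types.
struct Stmt { int id; };
struct Range { long lo, hi; };

// Abstract query interface: a client only needs a way to ask for the
// range of a name in the context of a statement.
class RangeQuery {
public:
  virtual ~RangeQuery() = default;
  // No default for STMT: every caller decides what context to pass.
  virtual const Range *get_value_range(const std::string &name,
                                       const Stmt *stmt) = 0;
};

// A table-backed provider, playing the role vr_values plays in the text.
class TableQuery : public RangeQuery {
public:
  void record(const std::string &name, Range r) {
    assert(r.lo <= r.hi);
    table_[name] = r;
  }
  const Range *get_value_range(const std::string &name,
                               const Stmt * /* unused by this provider */) override {
    auto it = table_.find(name);
    return it == table_.end() ? nullptr : &it->second;
  }
private:
  std::map<std::string, Range> table_;
};

// A client written against the interface only, so a different provider
// could be swapped in without touching this function.
inline bool is_singleton(RangeQuery &q, const std::string &name,
                         const Stmt *stmt) {
  const Range *r = q.get_value_range(name, stmt);
  return r != nullptr && r->lo == r->hi;
}
```

Leaving out a default argument means a caller with no statement at hand has to write nullptr explicitly, which keeps such call sites easy to find later.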

This patchset is divided in two: one patch to provide the additional
argument to get_value_range and one to do the ripping apart in
adjust_range_from_scev.  I will discuss the SCEV part in the relevant
patch.

Aldy

[PATCH] Adjust tree-ssa-dom.c for irange API.

2020-08-04 Thread Aldy Hernandez via Gcc-patches

This patch removes all uses of VR_ANTI_RANGE in DOM.  It required
minor surgery in the switch handling code.

In doing so, I was able to abstract all the code handling the cases
with ranges into its own function.  Interestingly, there is an exact
copy of this function in VRP, so I was able to use that there too.
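As a rough model of what the shared function decides, here is a simplified sketch based on the logic visible in the removed lines of the diff. It is not the GCC implementation: CaseLabel and label_for_range are hypothetical names, labels are plain integers rather than trees, and index 0 is assumed to hold the default label.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a case label; single-value labels have low == high.
struct CaseLabel { long low, high; };

// labels[0] is assumed to be the default label (its bounds are ignored);
// labels[1..] are assumed sorted and non-overlapping.  Returns the index
// of the label taken for every value in [min, max], or -1 if the range
// does not pin down a single label.
inline int label_for_range(const std::vector<CaseLabel> &labels,
                           long min, long max) {
  assert(!labels.empty() && min <= max);
  int first = -1, last = -1;  // first and last labels overlapping the range
  for (std::size_t i = 1; i < labels.size(); ++i)
    if (labels[i].high >= min && labels[i].low <= max) {
      if (first < 0)
        first = static_cast<int>(i);
      last = static_cast<int>(i);
    }
  if (first < 0)
    return 0;  // no label overlaps the range: the default is taken
  if (first == last
      && labels[first].low <= min && max <= labels[first].high)
    return first;  // exactly one label, and it covers the whole range
  return -1;
}
```

The -1 result corresponds to the NULL_TREE returns in the patch: the range straddles several labels, or only partly overlaps one, so nothing can be concluded.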

I also saw that most of simplify_stmt_for_jump_threading() is
duplicated in VRP/DOM, but I left that alone.  The amount of
duplicated code in this space is mind boggling.

OK?

gcc/ChangeLog:

* tree-ssa-dom.c (simplify_stmt_for_jump_threading): Abstract code out
to...
* tree-vrp.c (find_case_label_range): ...here.  Rewrite to use irange
API.
(simplify_stmt_for_jump_threading): Call find_case_label_range instead
of duplicating the code in simplify_stmt_for_jump_threading.
* tree-vrp.h (find_case_label_range): New prototype.
---
 gcc/tree-ssa-dom.c |  56 +++---
 gcc/tree-vrp.c | 117 +++--
 gcc/tree-vrp.h |   1 +
 3 files changed, 67 insertions(+), 107 deletions(-)

diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index 69eaec345bf..de5025f3879 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -868,7 +868,11 @@ make_pass_dominator (gcc::context *ctxt)
 static class vr_values *x_vr_values;
 
 /* A trivial wrapper so that we can present the generic jump
-   threading code with a simple API for simplifying statements.  */
+   threading code with a simple API for simplifying statements.
+
+   ?? This should be cleaned up.  There's a virtually identical copy
+   of this function in tree-vrp.c.  */
+
 static tree
 simplify_stmt_for_jump_threading (gimple *stmt,
  gimple *within_stmt ATTRIBUTE_UNUSED,
@@ -901,55 +905,7 @@ simplify_stmt_for_jump_threading (gimple *stmt,
return NULL_TREE;
 
   const value_range_equiv *vr = x_vr_values->get_value_range (op);
-  if (vr->undefined_p ()
- || vr->varying_p ()
- || vr->symbolic_p ())
-   return NULL_TREE;
-
-  if (vr->kind () == VR_RANGE)
-   {
- size_t i, j;
-
- find_case_label_range (switch_stmt, vr->min (), vr->max (), &i, &j);
-
- /* Is there only one such label?  */
- if (i == j)
-   {
- tree label = gimple_switch_label (switch_stmt, i);
- tree singleton;
-
- /* The i'th label will only be taken if the value range of the
-operand is entirely within the bounds of this label.  */
- if (CASE_HIGH (label) != NULL_TREE
- ? (tree_int_cst_compare (CASE_LOW (label), vr->min ()) <= 0
-&& tree_int_cst_compare (CASE_HIGH (label), vr->max ()) >= 0)
- : (vr->singleton_p (&singleton)
-&& tree_int_cst_equal (CASE_LOW (label), singleton)))
-   return label;
-   }
-
- /* If there are no such labels, then the default label
-will be taken.  */
- if (i > j)
-   return gimple_switch_label (switch_stmt, 0);
-   }
-
-  if (vr->kind () == VR_ANTI_RANGE)
-  {
-unsigned n = gimple_switch_num_labels (switch_stmt);
-tree min_label = gimple_switch_label (switch_stmt, 1);
-tree max_label = gimple_switch_label (switch_stmt, n - 1);
-
-/* The default label will be taken only if the anti-range of the
-   operand is entirely outside the bounds of all the (non-default)
-   case labels.  */
-if (tree_int_cst_compare (vr->min (), CASE_LOW (min_label)) <= 0
-&& (CASE_HIGH (max_label) != NULL_TREE
-? tree_int_cst_compare (vr->max (), CASE_HIGH (max_label)) >= 0
-: tree_int_cst_compare (vr->max (), CASE_LOW (max_label)) >= 0))
-return gimple_switch_label (switch_stmt, 0);
-  }
-   return NULL_TREE;
+  return find_case_label_range (switch_stmt, vr);
 }
 
   if (gassign *assign_stmt = dyn_cast <gassign *> (stmt))
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index de84c1d505d..8c1a1854daa 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -3802,6 +3802,61 @@ find_case_label_range (gswitch *stmt, tree min, tree 
max, size_t *min_idx,
 }
 }
 
+/* Given a SWITCH_STMT, return the case label that encompasses the
+   known possible values for the switch operand.  RANGE_OF_OP is a
+   range for the known values of the switch operand.  */
+
+tree
+find_case_label_range (gswitch *switch_stmt, const irange *range_of_op)
+{
+  if (range_of_op->undefined_p ()
+  || range_of_op->varying_p ()
+  || range_of_op->symbolic_p ())
+return NULL_TREE;
+
+  size_t i, j;
+  tree op = gimple_switch_index (switch_stmt);
+  tree type = TREE_TYPE (op);
+  tree tmin = wide_int_to_tree (type, range_of_op->lower_bound ());
+  tree tmax = wide_int_to_tree (type, range_of_op->upper_bound ());
+  find_case_label_range

[PATCH] Adjust tree-ssa-strlen.c for irange API.

2020-08-04 Thread Aldy Hernandez via Gcc-patches

This patch adapts the strlen pass to use the irange API.

I wasn't able to remove the one annoying use of VR_ANTI_RANGE, because
I'm not sure what to do.  Perhaps Martin can shed some light.  The
current code has:

  else if (rng == VR_ANTI_RANGE)
{
  wide_int maxobjsize = wi::to_wide (TYPE_MAX_VALUE (ptrdiff_type_node));
  if (wi::ltu_p (cntrange[1], maxobjsize))
{
  cntrange[0] = cntrange[1] + 1;
  cntrange[1] = maxobjsize;

Suppose we have ~[10,20], won't the above set cntrange[] to [21,MAX]?  Won't
this ignore the 0..9 that is part of the range?  What should we do here?

Anyways, I've left the anti-range in place, but the rest of the patch still
stands.
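To make the ~[10,20] question concrete, here is a small standalone sketch, assuming an unsigned domain [0, MAX]. It is not taken from tree-ssa-strlen.c; Pair and anti_range_pairs are hypothetical names. It lists the sub-ranges an anti-range really contains, which shows that keeping only [21,MAX] drops the lower piece.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using Pair = std::pair<std::uint64_t, std::uint64_t>;

// Split the unsigned anti-range ~[lo, hi] over the domain [0, max] into
// the sub-ranges it actually contains.
inline std::vector<Pair> anti_range_pairs(std::uint64_t lo, std::uint64_t hi,
                                          std::uint64_t max) {
  assert(lo <= hi && hi <= max);
  std::vector<Pair> pairs;
  if (lo > 0)
    pairs.emplace_back(0, lo - 1);    // values below the excluded hole
  if (hi < max)
    pairs.emplace_back(hi + 1, max);  // values above the excluded hole
  return pairs;
}
```

For ~[10,20] this yields two pieces, [0,9] and [21,max]. Whether the strlen code is entitled to discard the first piece for other reasons is the open question in the mail, and the sketch does not answer it.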

OK?

gcc/ChangeLog:

* tree-ssa-strlen.c (get_range): Adjust for irange API.
(compare_nonzero_chars): Same.
(dump_strlen_info): Same.
(get_range_strlen_dynamic): Same.
(set_strlen_range): Same.
(maybe_diag_stxncpy_trunc): Same.
(get_len_or_size): Same.
(count_nonzero_bytes_addr): Same.
(handle_integral_assign): Same.
---
 gcc/tree-ssa-strlen.c | 122 --
 1 file changed, 57 insertions(+), 65 deletions(-)

diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
index fbaee745f7d..e6009874ee5 100644
--- a/gcc/tree-ssa-strlen.c
+++ b/gcc/tree-ssa-strlen.c
@@ -220,21 +220,25 @@ get_range (tree val, wide_int minmax[2], const vr_values 
*rvals /* = NULL */)
 GCC 11).  */
   const value_range *vr
= (CONST_CAST (class vr_values *, rvals)->get_value_range (val));
-  value_range_kind rng = vr->kind ();
-  if (rng != VR_RANGE || !range_int_cst_p (vr))
+  if (vr->undefined_p () || vr->varying_p ())
return NULL_TREE;
 
-  minmax[0] = wi::to_wide (vr->min ());
-  minmax[1] = wi::to_wide (vr->max ());
+  minmax[0] = vr->lower_bound ();
+  minmax[1] = vr->upper_bound ();
   return val;
 }
 
-  value_range_kind rng = get_range_info (val, minmax, minmax + 1);
-  if (rng == VR_RANGE)
-return val;
+  value_range vr;
+  get_range_info (val, vr);
+  if (!vr.undefined_p () && !vr.varying_p ())
+{
+  minmax[0] = vr.lower_bound ();
+  minmax[1] = vr.upper_bound ();
+  return val;
+}
 
-  /* Do not handle anti-ranges and instead make use of the on-demand
- VRP if/when it becomes available (hopefully in GCC 11).  */
+  /* We should adjust for the on-demand VRP if/when it becomes
+ available (hopefully in GCC 11).  */
   return NULL_TREE;
 }
 
@@ -278,16 +282,18 @@ compare_nonzero_chars (strinfo *si, unsigned 
HOST_WIDE_INT off,
 = (CONST_CAST (class vr_values *, rvals)
->get_value_range (si->nonzero_chars));
 
-  value_range_kind rng = vr->kind ();
-  if (rng != VR_RANGE || !range_int_cst_p (vr))
+  if (vr->undefined_p () || vr->varying_p ())
 return -1;
 
   /* If the offset is less than the minimum length or if the bounds
  of the length range are equal return the result of the comparison
  same as in the constant case.  Otherwise return a conservative
  result.  */
-  int cmpmin = compare_tree_int (vr->min (), off);
-  if (cmpmin > 0 || tree_int_cst_equal (vr->min (), vr->max ()))
+  tree type = TREE_TYPE (si->nonzero_chars);
+  tree tmin = wide_int_to_tree (type, vr->lower_bound ());
+  tree tmax = wide_int_to_tree (type, vr->upper_bound ());
+  int cmpmin = compare_tree_int (tmin, off);
+  if (cmpmin > 0 || tree_int_cst_equal (tmin, tmax))
 return cmpmin;
 
   return -1;
@@ -905,32 +911,14 @@ dump_strlen_info (FILE *fp, gimple *stmt, const vr_values 
*rvals)
  print_generic_expr (fp, si->nonzero_chars);
  if (TREE_CODE (si->nonzero_chars) == SSA_NAME)
{
- value_range_kind rng = VR_UNDEFINED;
- wide_int min, max;
+ value_range vr;
  if (rvals)
-   {
- const value_range *vr
-   = CONST_CAST (class vr_values *, rvals)
-   ->get_value_range (si->nonzero_chars);
- rng = vr->kind ();
- if (range_int_cst_p (vr))
-   {
- min = wi::to_wide (vr->min ());
- max = wi::to_wide (vr->max ());
-   }
- else
-   rng = VR_UNDEFINED;
-   }
+   vr = *(CONST_CAST (class vr_values *, rvals)
+  ->get_value_range (si->nonzero_chars));
  else
-   rng = get_range_info (si->nonzero_chars, &min, &max);
-
- if (rng == VR_RANGE || rng == VR_ANTI_RANGE)
-   {
- fprintf (fp, " %s[%llu, %llu]",
-  rng

[PATCH] Adjust gimple-ssa-sprintf.c for irange API.

2020-08-04 Thread Aldy Hernandez via Gcc-patches

This is a rather obvious patch, but I'd like a nod before committing.

Martin, I've removed your anti-range check, as it is subsumed by the
lower_bound/upper_bound code.  However, you will have to adapt the code
for multi-ranges if desired.  For example, you may want to loop through the
sub-ranges and do the right thing.  Look at value-range.h and see the comments
for class irange.  Those are the methods you should stick to.

i.e.
for (i=0; i < vr->num_pairs(); ++i)
stuff_with(vr->lower_bound(i), vr->upper_bound(i))
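The loop shape suggested above can be tried out in a standalone sketch. MultiRange below is a hypothetical stand-in that borrows only the three method names from the mail (num_pairs, lower_bound, upper_bound); it is not GCC's irange class, and count_values is a made-up example of per-sub-range work.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a multi-range; not GCC's irange.
struct MultiRange {
  // Sub-ranges, assumed sorted and disjoint.
  std::vector<std::pair<std::int64_t, std::int64_t>> pairs;
  unsigned num_pairs() const { return static_cast<unsigned>(pairs.size()); }
  std::int64_t lower_bound(unsigned i) const { return pairs[i].first; }
  std::int64_t upper_bound(unsigned i) const { return pairs[i].second; }
};

// Example of per-sub-range work: count how many values the range holds.
inline std::uint64_t count_values(const MultiRange &vr) {
  std::uint64_t total = 0;
  for (unsigned i = 0; i < vr.num_pairs(); ++i) {
    assert(vr.lower_bound(i) <= vr.upper_bound(i));
    total += static_cast<std::uint64_t>(vr.upper_bound(i)
                                        - vr.lower_bound(i)) + 1;
  }
  return total;
}
```

A consumer that reads only the overall lower and upper bound would see [0,30] for the sub-ranges [0,9] and [21,30]; walking the pairs keeps the hole visible.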

There should be no functional changes with this patch.

Aldy

gcc/ChangeLog:

* gimple-ssa-sprintf.c (get_int_range): Adjust for irange API.
(format_integer): Same.
(handle_printf_call): Same.
---
 gcc/gimple-ssa-sprintf.c | 37 -
 1 file changed, 16 insertions(+), 21 deletions(-)

diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index 3d77459d811..70b031fe7b9 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -1070,7 +1070,7 @@ get_int_range (tree arg, HOST_WIDE_INT *pmin, 
HOST_WIDE_INT *pmax,
  const value_range_equiv *vr
= CONST_CAST (class vr_values *, vr_values)->get_value_range (arg);
 
- if (range_int_cst_p (vr))
+ if (!vr->undefined_p () && !vr->varying_p () && !vr->symbolic_p ())
{
  HOST_WIDE_INT type_min
= (TYPE_UNSIGNED (argtype)
@@ -1079,8 +1079,11 @@ get_int_range (tree arg, HOST_WIDE_INT *pmin, 
HOST_WIDE_INT *pmax,
 
  HOST_WIDE_INT type_max = tree_to_uhwi (TYPE_MAX_VALUE (argtype));
 
- *pmin = TREE_INT_CST_LOW (vr->min ());
- *pmax = TREE_INT_CST_LOW (vr->max ());
+ tree type = TREE_TYPE (arg);
+ tree tmin = wide_int_to_tree (type, vr->lower_bound ());
+ tree tmax = wide_int_to_tree (type, vr->upper_bound ());
+ *pmin = TREE_INT_CST_LOW (tmin);
+ *pmax = TREE_INT_CST_LOW (tmax);
 
  if (*pmin < *pmax)
{
@@ -1372,10 +1375,10 @@ format_integer (const directive &dir, tree arg, const vr_values *vr_values)
   const value_range_equiv *vr
= CONST_CAST (class vr_values *, vr_values)->get_value_range (arg);
 
-  if (range_int_cst_p (vr))
+  if (!vr->varying_p () && !vr->undefined_p () && !vr->symbolic_p ())
{
- argmin = vr->min ();
- argmax = vr->max ();
+ argmin = wide_int_to_tree (TREE_TYPE (arg), vr->lower_bound ());
+ argmax = wide_int_to_tree (TREE_TYPE (arg), vr->upper_bound ());
 
  /* Set KNOWNRANGE if the argument is in a known subrange
 of the directive's type and neither width nor precision
@@ -1388,11 +1391,7 @@ format_integer (const directive &dir, tree arg, const vr_values *vr_values)
  res.argmin = argmin;
  res.argmax = argmax;
}
-  else if (vr->kind () == VR_ANTI_RANGE)
-   {
- /* Handle anti-ranges if/when bug 71690 is resolved.  */
-   }
-  else if (vr->varying_p () || vr->undefined_p ())
+  else
{
  /* The argument here may be the result of promoting the actual
 argument to int.  Try to determine the type of the actual
@@ -4561,10 +4560,13 @@ handle_printf_call (gimple_stmt_iterator *gsi, const 
vr_values *vr_values)
  const value_range_equiv *vr
= CONST_CAST (class vr_values *, vr_values)->get_value_range (size);
 
- if (range_int_cst_p (vr))
+ if (!vr->undefined_p () && !vr->symbolic_p ())
{
- unsigned HOST_WIDE_INT minsize = TREE_INT_CST_LOW (vr->min ());
- unsigned HOST_WIDE_INT maxsize = TREE_INT_CST_LOW (vr->max ());
+ tree type = TREE_TYPE (size);
+ tree tmin = wide_int_to_tree (type, vr->lower_bound ());
+ tree tmax = wide_int_to_tree (type, vr->upper_bound ());
+ unsigned HOST_WIDE_INT minsize = TREE_INT_CST_LOW (tmin);
+ unsigned HOST_WIDE_INT maxsize = TREE_INT_CST_LOW (tmax);
  dstsize = warn_level < 2 ? maxsize : minsize;
 
  if (minsize > target_int_max ())
@@ -4578,13 +4580,6 @@ handle_printf_call (gimple_stmt_iterator *gsi, const 
vr_values *vr_values)
  if (maxsize > target_int_max ())
posunder4k = false;
}
- else if (vr->varying_p ())
-   {
- /* POSIX requires snprintf to fail if DSTSIZE is greater
-than INT_MAX.  Since SIZE's range is unknown, avoid
-folding.  */
- posunder4k = false;
-   }
 
  /* The destination size is not constant.  If the function is
 bounded (e.g., snprintf) a lower bound of zero doesn't
-- 
2.26.2

RE: [PATCH V2] aarch64: Use Q-reg loads/stores in movmem expansion

2020-08-04 Thread Sudakshina Das

Hi Richard

> -Original Message-
> From: Richard Sandiford 
> Sent: 31 July 2020 16:14
> To: Sudakshina Das 
> Cc: gcc-patches@gcc.gnu.org; Kyrylo Tkachov 
> Subject: Re: [PATCH V2] aarch64: Use Q-reg loads/stores in movmem
> expansion
> 
> Sudakshina Das  writes:
> > Hi
> >
> > This is my attempt at reviving the old patch
> > https://gcc.gnu.org/pipermail/gcc-patches/2019-January/514632.html
> >
> > I have followed up on Kyrill's comment upstream at the link above and I am
> > using the recommended option iii that he mentioned:
> > "1) Adjust the copy_limit to 256 bits after checking
> >     AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS in the tuning.
> >  2) Adjust aarch64_copy_one_block_and_progress_pointers to handle
> >     256-bit moves, by iii:
> >     iii) Emit explicit V4SI (or any other 128-bit vector mode) pairs
> >          ldp/stps.  This wouldn't need any adjustments to MD patterns,
> >          but would make aarch64_copy_one_block_and_progress_pointers
> >          more complex as it would now have two paths, where one handles
> >          two adjacent memory addresses in one call."
> >
> > With this patch the following test
> >
> > #define N 8
> > extern int src[N], dst[N];
> >
> > void
> > foo (void)
> > {
> >   __builtin_memcpy (dst, src, N * sizeof (int));
> > }
> >
> > which was originally giving
> > foo:
> > adrp x1, src
> > add x1, x1, :lo12:src
> > ldp x4, x5, [x1]
> > adrp x0, dst
> > add x0, x0, :lo12:dst
> > ldp x2, x3, [x1, 16]
> > stp x4, x5, [x0]
> > stp x2, x3, [x0, 16]
> > ret
> >
> >
> > changes to the following
> > foo:
> > adrp x1, src
> > add x1, x1, :lo12:src
> > adrp x0, dst
> > add x0, x0, :lo12:dst
> > ldp q1, q0, [x1]
> > stp q1, q0, [x0]
> > ret
> >
> > This gives about 1.3% improvement on 523.xalancbmk_r in SPEC2017 and
> > an overall code size reduction on most SPEC2017 Int benchmarks on
> > Neoverse N1 due to more LDP/STP Q pair registers.
> 
> Sorry for the slow review.  LGTM with a very minor nit (sorry)…

Thanks. Committed with the change.
> 
> > @@ -21150,9 +21177,12 @@ aarch64_expand_cpymem (rtx *operands)
> >/* Convert n to bits to make the rest of the code simpler.  */
> >n = n * BITS_PER_UNIT;
> >
> > -  /* Maximum amount to copy in one go.  The AArch64 back-end has integer modes
> > - larger than TImode, but we should not use them for loads/stores here.  */
> > -  const int copy_limit = GET_MODE_BITSIZE (TImode);
> > +  /* Maximum amount to copy in one go.  We allow 256-bit chunks based on the
> > + AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning parameter and
> > +TARGET_SIMD.  */
> > +  const int copy_limit = ((aarch64_tune_params.extra_tuning_flags
> > +  & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)
> > + || !TARGET_SIMD)
> > +? GET_MODE_BITSIZE (TImode) :  256;
> 
> Should only be one space before “256”.
> 
> I guess at some point we should consider handling fixed-length SVE too, but
> that's only worth it for -msve-vector-bits=512 and higher.

Yes sure I will add this for future backlog.
> 
> Thanks,
> Richard

Re: [PATCH] [AVX512]For vector compare to mask register, UNSPEC is needed instead of comparison operator [PR96243]

2020-08-04 Thread Kirill Yukhin via Gcc-patches

On 04 авг 13:26, Kirill Yukhin wrote:
> Could you please clarify how your patch is related to [1]?
> I see from the bug that it describes perf issue w.r.t. scalar
> operations.

[1] - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96226

> 
> --
> Regards, Kirill Yukhin

Re: [PATCH] Aarch64: Add missing clobber for fjcvtzs

2020-08-04 Thread Andrea Corallo

Kyrylo Tkachov  writes:

> I just remembered a recurring bit of review feedback from Ramana on my 
> patches...
> New effective target checks need to be documented in doc/sourcebuild.texi.
>
> Ok with the documentation.
> Thanks,
> Kyrill

Hi Kyrill,

thanks for catching that.

The attached is committed into master as d2b86e14c14.

  Andrea

>From d2b86e14c14020f3e119ab8f462e2a91bd7d46e5 Mon Sep 17 00:00:00 2001
From: Andrea Corallo 
Date: Wed, 29 Jul 2020 19:04:40 +0200
Subject: [PATCH] aarch64: Add missing clobber for fjcvtzs

gcc/ChangeLog

2020-07-30  Andrea Corallo  

* config/aarch64/aarch64.md (aarch64_fjcvtzs): Add missing
clobber.
* doc/sourcebuild.texi (aarch64_fjcvtzs_hw): Document new
target supports option.

gcc/testsuite/ChangeLog

2020-07-30  Andrea Corallo  

* gcc.target/aarch64/acle/jcvt_2.c: New testcase.
* lib/target-supports.exp
(check_effective_target_aarch64_fjcvtzs_hw): Add new check for
FJCVTZS hw.
---
 gcc/config/aarch64/aarch64.md |  3 +-
 gcc/doc/sourcebuild.texi  |  3 ++
 .../gcc.target/aarch64/acle/jcvt_2.c  | 33 +++
 gcc/testsuite/lib/target-supports.exp | 21 
 4 files changed, 59 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/jcvt_2.c

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index d5ca1898c02..df780b86370 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7059,7 +7059,8 @@
 (define_insn "aarch64_fjcvtzs"
   [(set (match_operand:SI 0 "register_operand" "=r")
(unspec:SI [(match_operand:DF 1 "register_operand" "w")]
-  UNSPEC_FJCVTZS))]
+  UNSPEC_FJCVTZS))
+   (clobber (reg:CC CC_REGNUM))]
   "TARGET_JSCVT"
   "fjcvtzs\\t%w0, %d1"
   [(set_attr "type" "f_cvtf2i")]
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index a7a922d84a2..63216a0daba 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2063,6 +2063,9 @@ whether it does so by default).
 @itemx aarch64_sve2048_hw
 Like @code{aarch64_sve_hw}, but also test for an exact hardware vector length.
 
+@item aarch64_fjcvtzs_hw
+AArch64 target that is able to generate and execute armv8.3-a FJCVTZS
+instruction.
 @end table
 
 @subsubsection MIPS-specific attributes
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/jcvt_2.c 
b/gcc/testsuite/gcc.target/aarch64/acle/jcvt_2.c
new file mode 100644
index 000..ea2dfd14cf2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/jcvt_2.c
@@ -0,0 +1,33 @@
+/* Test the __jcvt ACLE intrinsic.  */
+/* { dg-do run } */
+/* { dg-options "-O2 -march=armv8.3-a -save-temps" } */
+/* { dg-require-effective-target aarch64_fjcvtzs_hw } */
+
+#include <arm_acle.h>
+
+extern void abort (void);
+
+#ifdef __ARM_FEATURE_JCVT
+volatile int32_t x;
+
+int __attribute__((noinline))
+foo (double a, int b, int c)
+{
+  b = b > c;
+  x = __jcvt (a);
+  return b;
+}
+
+int
+main (void)
+{
+  int x = foo (1.1, 2, 3);
+  if (x)
+abort ();
+
+  return 0;
+}
+
+#endif
+
+/* { dg-final { scan-assembler-times "fjcvtzs\tw\[0-9\]+, d\[0-9\]+\n" 1 } } */
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index ba9db0be2f9..e79015b4d54 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4848,6 +4848,27 @@ proc check_effective_target_aarch64_bti_hw { } {
 } "-O2" ]
 }
 
+# Return 1 if the target supports executing the armv8.3-a FJCVTZS
+# instruction.
+proc check_effective_target_aarch64_fjcvtzs_hw { } {
+if { ![istarget aarch64*-*-*] } {
+   return 0
+}
+return [check_runtime aarch64_fjcvtzs_hw_available {
+   int
+   main (void)
+   {
+ double in = 25.1;
+ int out;
+ asm volatile ("fjcvtzs %w0, %d1"
+   : "=r" (out)
+   : "w" (in)
+   : /* No clobbers.  */);
+ return out != 25;
+   }
+} "-march=armv8.3-a" ]
+}
+
 # Return 1 if GCC was configured with --enable-standard-branch-protection
 proc check_effective_target_default_branch_protection { } {
 return [check_configured_with "enable-standard-branch-protection"]
-- 
2.17.1
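
For readers who have not met the instruction: FJCVTZS converts a double to a 32-bit integer with JavaScript's ToInt32 rules (truncate toward zero, wrap modulo 2^32, NaN and infinities give 0) and also writes the condition flags. That flag write is why the pattern needs the CC clobber: in jcvt_2.c the result of b > c must survive across the conversion. The following is a minimal portable sketch of the value computation only. The function name js_to_int32 is hypothetical and the flags are not modelled.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: the integer value that JavaScript's ToInt32 (and
// therefore FJCVTZS) produces for a double.  Condition flags are not modelled.
std::int32_t js_to_int32(double d)
{
  if (!std::isfinite(d))
    return 0;                                  // NaN and +/-Inf convert to 0
  double t = std::trunc(d);                    // round toward zero
  double m = std::fmod(t, 4294967296.0);       // wrap modulo 2^32 (sign follows t)
  if (m < 0)
    m += 4294967296.0;                         // bring into [0, 2^32)
  assert(m >= 0 && m < 4294967296.0);
  std::uint32_t u = static_cast<std::uint32_t>(m);
  // Reinterpret the low 32 bits as a two's complement signed value.
  std::int64_t wide = static_cast<std::int64_t>(u);
  if (u >= 0x80000000u)
    wide -= 4294967296LL;
  return static_cast<std::int32_t>(wide);
}
```

With this model, js_to_int32 (25.1) is 25, which matches what the aarch64_fjcvtzs_hw check above expects from the real instruction.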

Re: [PATCH] [AVX512]For vector compare to mask register, UNSPEC is needed instead of comparison operator [PR96243]

2020-08-04 Thread Kirill Yukhin via Gcc-patches

Hello,

On 20 Jul 13:46, Hongtao Liu wrote:
> Hi:
>   For rtx like (eq:HI (V8SI 90) (V8SI 91)), cse will take it as a
> boolean value and try to do some optimization. But it is not true for
> vector compare, also other places in rtl passes hold the same
> assumption.
> 
> Bootstrap is ok, regression test is ok for i386 backend.
> 
> 2020-07-20  Hongtao Liu  
> 
> gcc/
> PR target/96226

Could you please clarify how your patch is related to [1]?
I see from the bug that it describes a perf issue w.r.t. scalar
operations.

[1] - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96226

--
Regards, Kirill Yukhin

[committed][nvptx] Handle V2DI/V2SI mode in nvptx_gen_shuffle

2020-08-04 Thread Tom de Vries

Hi,

With the pr96628-part1.f90 source and -ftree-slp-vectorize, we run into an
ICE due to the fact that V2DI mode is not handled in nvptx_gen_shuffle.

Fix this by adding handling of V2DI as well as V2SI mode in
nvptx_gen_shuffle.

Build and reg-tested on x86_64 with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom
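
The approach in the patch below (split the two-element vector into scalars, shuffle each, reassemble) can be modelled on the host. This is only a sketch under stated assumptions: the warp is simulated as a std::vector of lanes, only a butterfly-style exchange is shown, out-of-range source lanes keep their own value, and all names (V2DI as a std::array, shuffle_bfly, shuffle_bfly_v2) are hypothetical rather than nvptx.c code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using V2DI = std::array<std::int64_t, 2>;   // toy stand-in for a V2DI value

// Scalar butterfly shuffle over a simulated warp: lane i receives the value
// held by lane (i ^ mask).  Lanes whose partner does not exist keep their own.
std::vector<std::int64_t>
shuffle_bfly(const std::vector<std::int64_t>& lanes, std::size_t mask)
{
  std::vector<std::int64_t> out(lanes.size());
  for (std::size_t i = 0; i < lanes.size(); ++i) {
    std::size_t src = i ^ mask;
    out[i] = src < lanes.size() ? lanes[src] : lanes[i];
  }
  return out;
}

// Vector shuffle built from two scalar shuffles, one per element.
std::vector<V2DI>
shuffle_bfly_v2(const std::vector<V2DI>& lanes, std::size_t mask)
{
  std::vector<std::int64_t> elt0(lanes.size()), elt1(lanes.size());
  for (std::size_t i = 0; i < lanes.size(); ++i) {
    elt0[i] = lanes[i][0];                   // like the subreg at offset 0
    elt1[i] = lanes[i][1];                   // like the subreg at offset 8
  }
  elt0 = shuffle_bfly(elt0, mask);
  elt1 = shuffle_bfly(elt1, mask);
  assert(elt0.size() == lanes.size() && elt1.size() == lanes.size());
  std::vector<V2DI> out(lanes.size());
  for (std::size_t i = 0; i < lanes.size(); ++i)
    out[i] = V2DI{elt0[i], elt1[i]};
  return out;
}
```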

[nvptx] Handle V2DI/V2SI mode in nvptx_gen_shuffle

gcc/ChangeLog:

PR target/96428
* config/nvptx/nvptx.c (nvptx_gen_shuffle): Handle V2SI/V2DI.

libgomp/ChangeLog:

PR target/96428
* testsuite/libgomp.oacc-fortran/pr96628-part1.f90: New test.
* testsuite/libgomp.oacc-fortran/pr96628-part2.f90: New test.

---
 gcc/config/nvptx/nvptx.c   | 38 ++
 .../libgomp.oacc-fortran/pr96628-part1.f90 | 20 
 .../libgomp.oacc-fortran/pr96628-part2.f90 | 37 +
 3 files changed, 95 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index d8a8fb2d55b..cf53a921e5b 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -1796,6 +1796,44 @@ nvptx_gen_shuffle (rtx dst, rtx src, rtx idx, 
nvptx_shuffle_kind kind)
end_sequence ();
   }
   break;
+case E_V2SImode:
+  {
+   rtx src0 = gen_rtx_SUBREG (SImode, src, 0);
+   rtx src1 = gen_rtx_SUBREG (SImode, src, 4);
+   rtx dst0 = gen_rtx_SUBREG (SImode, dst, 0);
+   rtx dst1 = gen_rtx_SUBREG (SImode, dst, 4);
+   rtx tmp0 = gen_reg_rtx (SImode);
+   rtx tmp1 = gen_reg_rtx (SImode);
+   start_sequence ();
+   emit_insn (gen_movsi (tmp0, src0));
+   emit_insn (gen_movsi (tmp1, src1));
+   emit_insn (nvptx_gen_shuffle (tmp0, tmp0, idx, kind));
+   emit_insn (nvptx_gen_shuffle (tmp1, tmp1, idx, kind));
+   emit_insn (gen_movsi (dst0, tmp0));
+   emit_insn (gen_movsi (dst1, tmp1));
+   res = get_insns ();
+   end_sequence ();
+  }
+  break;
+case E_V2DImode:
+  {
+   rtx src0 = gen_rtx_SUBREG (DImode, src, 0);
+   rtx src1 = gen_rtx_SUBREG (DImode, src, 8);
+   rtx dst0 = gen_rtx_SUBREG (DImode, dst, 0);
+   rtx dst1 = gen_rtx_SUBREG (DImode, dst, 8);
+   rtx tmp0 = gen_reg_rtx (DImode);
+   rtx tmp1 = gen_reg_rtx (DImode);
+   start_sequence ();
+   emit_insn (gen_movdi (tmp0, src0));
+   emit_insn (gen_movdi (tmp1, src1));
+   emit_insn (nvptx_gen_shuffle (tmp0, tmp0, idx, kind));
+   emit_insn (nvptx_gen_shuffle (tmp1, tmp1, idx, kind));
+   emit_insn (gen_movdi (dst0, tmp0));
+   emit_insn (gen_movdi (dst1, tmp1));
+   res = get_insns ();
+   end_sequence ();
+  }
+  break;
 case E_BImode:
   {
rtx tmp = gen_reg_rtx (SImode);
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/pr96628-part1.f90 
b/libgomp/testsuite/libgomp.oacc-fortran/pr96628-part1.f90
new file mode 100644
index 000..71219f9c467
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/pr96628-part1.f90
@@ -0,0 +1,20 @@
+! { dg-do run }
+! { dg-additional-sources pr96628-part2.f90 }
+! { dg-additional-options "-ftree-slp-vectorize" }
+!
+! This file is compiled first
+module m2
+  real*8 :: mysum
+  !$acc declare device_resident(mysum)
+contains
+SUBROUTINE one(t)
+  !$acc routine
+  REAL*8,  INTENT(IN):: t(:)
+  mysum = sum(t)
+END SUBROUTINE one
+SUBROUTINE two(t)
+  !$acc routine seq
+  REAL*8, INTENT(INOUT) :: t(:)
+  t = (100.0_8*t)/sum
+END SUBROUTINE two
+end module m2
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/pr96628-part2.f90 
b/libgomp/testsuite/libgomp.oacc-fortran/pr96628-part2.f90
new file mode 100644
index 000..784dc27e19e
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/pr96628-part2.f90
@@ -0,0 +1,37 @@
+! { dg-do compile  { target skip-all-targets } }
+!
+! Main file is pr96628-part1.f90
+
+MODULE m
+  IMPLICIT NONE
+  REAL*8, ALLOCATABLE :: t(:)
+CONTAINS
+  SUBROUTINE run()
+use m2
+IMPLICIT NONE
+
+INTEGER :: i,j! loop indices
+!$acc data present(t)
+!$acc parallel
+!$acc loop gang
+DO j = 1,2
+  !$acc loop vector
+  DO i = 1,2
+CALL one(t(:))
+CALL two(t(:))
+  END DO
+   END DO
+   !$acc end parallel
+   !$acc end data
+  END SUBROUTINE run
+END MODULE m
+
+use m
+implicit none
+integer :: i
+t = [(3.0_8*i, i = 1, 100)]
+!$acc data copy(t)
+call run
+!$acc end data
+if (any (abs(t - [((300.0_8*i)/15150.0_8, i = 1, 100)]) < 10.0_8*epsilon(t))) stop 1
+end

Re: [PUSHED 6/8] Use irange API in test_for_singularity.

2020-08-04 Thread Aldy Hernandez via Gcc-patches





On 8/4/20 8:58 AM, Richard Biener wrote:

On Tue, Aug 4, 2020 at 8:40 AM Aldy Hernandez via Gcc-patches wrote:


gcc/ChangeLog:

 * vr-values.c (test_for_singularity): Use irange API.
 (simplify_using_ranges::simplify_cond_using_ranges_1): Do not
 special case VR_RANGE.
---
  gcc/vr-values.c | 13 -
  1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/gcc/vr-values.c b/gcc/vr-values.c
index 90ba8fca246..e78b25596b0 100644
--- a/gcc/vr-values.c
+++ b/gcc/vr-values.c
@@ -3480,10 +3480,13 @@ test_for_singularity (enum tree_code cond_code, tree 
op0,
   value range information we have for op0.  */
if (min && max)
  {
-  if (compare_values (vr->min (), min) == 1)
-   min = vr->min ();
-  if (compare_values (vr->max (), max) == -1)
-   max = vr->max ();
+  tree type = TREE_TYPE (op0);
+  tree tmin = wide_int_to_tree (type, vr->lower_bound ());
+  tree tmax = wide_int_to_tree (type, vr->upper_bound ());


I guess with symbolic ranges this just doesn't work anymore
(or rather will give a pessimistic upper/lower bound)?


Yes, though we do slightly better than VARYING.  The symbolic 
normalizing code will rewrite [SYM, 5] as [-INF, 5], etc.


When I implemented this originally in the ranger branch, I 
pessimistically downgraded all symbolics to [MIN,MAX] to see if there 
was any difference in the generated code.  There wasn't.


I think most of vr-values.c does no better without symbolics, with the 
exception of compare_value* and the corresponding code that handles 
comparisons and equivalences.


Aldy
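
To make the [SYM, 5] to [-INF, 5] point concrete, here is a toy model rather than vr-values.c code. A bound is either a known int or "symbolic" (std::nullopt), normalization replaces a symbolic bound with the type's extreme, and a simplified singularity test for "x <= limit" then looks for a unique satisfying value. All names are hypothetical and the test is deliberately reduced to one comparison code.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <optional>
#include <utility>

// Toy bound: a known constant, or std::nullopt when the bound is symbolic.
using Bound = std::optional<int>;

// Rewrite symbolic bounds as the extremes of the type: [SYM, 5] -> [MIN, 5].
std::pair<int, int> normalize_symbolic(Bound lo, Bound hi)
{
  return {lo.value_or(INT_MIN), hi.value_or(INT_MAX)};
}

// If exactly one value in range r satisfies "x <= limit", return it.
std::optional<int> singleton_le(std::pair<int, int> r, int limit)
{
  assert(r.first <= r.second);
  int lo = r.first;
  int hi = std::min(r.second, limit);
  if (lo == hi)
    return lo;
  return std::nullopt;
}
```

In this model a constant range [5, 10] tested against x <= 5 yields the single value 5, while [SYM, 10] normalizes to [MIN, 10] and yields nothing. The result is conservative but still correct, which is the pessimistic behaviour discussed above.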

Re: [PUSHED 4/8] Adjust op_with_boolean_value_range_p for irange API.

2020-08-04 Thread Aldy Hernandez via Gcc-patches





On 8/4/20 8:55 AM, Richard Biener wrote:

On Tue, Aug 4, 2020 at 8:39 AM Aldy Hernandez via Gcc-patches wrote:


It seems to me that we should also check for [0,0] and [1,1] in the
range, but I am leaving things as is to avoid functional changes.

gcc/ChangeLog:

 * vr-values.c (simplify_using_ranges::op_with_boolean_value_range_p): 
Adjust
 for irange API.
---
  gcc/vr-values.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/vr-values.c b/gcc/vr-values.c
index 609375c072e..1190fa96453 100644
--- a/gcc/vr-values.c
+++ b/gcc/vr-values.c
@@ -448,10 +448,11 @@ simplify_using_ranges::op_with_boolean_value_range_p 
(tree op)
if (TREE_CODE (op) != SSA_NAME)
  return false;

+  /* ?? Errr, this should probably check for [0,0] and [1,1] as well
+ as [0,1].  */
const value_range *vr = get_value_range (op);
-  return (vr->kind () == VR_RANGE
- && integer_zerop (vr->min ())
- && integer_onep (vr->max ()));
+  return *vr == value_range (build_zero_cst (TREE_TYPE (op)),
+build_one_cst (TREE_TYPE (op)));


This now builds trees in a predicate which is highly dubious.  Isn't
there a better (cheaper) primitive to build a [0, 1] range to compare against?
I guess from what I read it will also allocate memory for the range entry?


Both 0 and 1 are cached for all integer types.  We are also caching 1 
and MAX for pointers per the irange patchset, so this shouldn't be a 
performance issue.


Also, we do the same thing for irange::nonzero_p(), which is used a lot 
more often (and probably on critical paths), and per our benchmarks we 
didn't find it to take any significant time:


  tree zero = build_zero_cst (type ());
  return *this == int_range<1> (zero, zero, VR_ANTI_RANGE);

However, I suppose we could implement it with something like:

return vr->num_pairs() == 1
  && vr->lower_bound () == 0
  && vr->upper_bound () == 1;

but that's hardly any faster, especially since num_pairs() is 
non-trivial for the value_range legacy code.


I just don't think it's worth it.

Aldy
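
The two predicates under discussion can be compared on a toy range type. This is a sketch only: ToyRange is a hypothetical stand-in and not GCC's irange, and fits_in_zero_one shows the wider check hinted at by the "??" comment (also accepting [0,0] and [1,1]), which the patch intentionally does not implement.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy multi-pair integer range: sorted, disjoint [lo, hi] sub-ranges.
struct ToyRange {
  std::vector<std::pair<long, long>> pairs;

  static ToyRange single(long lo, long hi)
  {
    ToyRange r;
    r.pairs.push_back({lo, hi});
    return r;
  }
  std::size_t num_pairs() const { return pairs.size(); }
  long lower_bound() const { assert(!pairs.empty()); return pairs.front().first; }
  long upper_bound() const { assert(!pairs.empty()); return pairs.back().second; }
};

// Mirrors the num_pairs/lower_bound/upper_bound formulation: exactly [0, 1].
bool is_exactly_zero_one(const ToyRange& r)
{
  return r.num_pairs() == 1 && r.lower_bound() == 0 && r.upper_bound() == 1;
}

// Wider variant: any non-empty range contained in [0, 1].
bool fits_in_zero_one(const ToyRange& r)
{
  return r.num_pairs() >= 1 && r.lower_bound() >= 0 && r.upper_bound() <= 1;
}
```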

RE: [PATCH] aarch64: Add A64FX machine model

2020-08-04 Thread Qian, Jianhua

Hi Richard

Thanks for your help.

> Would you like the patch to be backported further than GCC 10?
> Does the attached patch to document the addition to GCC 10.3 look OK?
I will reply to you after the internal discussion.

Regards,
Qian

Richard Sandiford  writes:
>Qian Jianhua  writes:
>> This patch adds support for Fujitsu A64FX, as the first step of adding 
>> A64FX machine model.
>> 
>> A64FX is used in FUJITSU Supercomputer PRIMEHPC FX1000, PRIMEHPC 
>> FX700, and supercomputer Fugaku.
>> The official microarchitecture information of A64FX can be read at 
>> https://github.com/fujitsu/A64FX.
>> 
>> Changelog:
>> 2020-08-03 Qian jianhua 
>> 
>> * config/aarch64/aarch64-cores.def: Add the chip name.
>> * config/aarch64/aarch64-tune.md: Regenerated.
>> * config/aarch64/aarch64.c: Add tuning table for the chip.
>> * doc/invoke.texi: Add the new name to the list.
>> 
>> Test results:
>> * Bootstrap on aarch64 --- [OK]
>> * Regression tests --- [OK]
>> * Compile with -mcpu=a64fx --- [OK]
>
>Thanks for doing this, looks great.  Pushed to trunk and the GCC 10 branch.
>
>Would you like the patch to be backported further than GCC 10?  I wasn't sure 
>whether GCC 9 and earlier would be useful, given that those releases didn't 
>support the ACLE and were missing optimisations that went into GCC 10.
>
>Very minor, but I tweaked the changelog entry slightly to:
>
>2020-08-03  Qian jianhua  
>
>gcc/
>* config/aarch64/aarch64-cores.def (a64fx): New core.
>* config/aarch64/aarch64-tune.md: Regenerated.
>* config/aarch64/aarch64.c (a64fx_prefetch_tune, a64fx_tunings): New.
>* doc/invoke.texi: Add a64fx to the list.
>
>before committing.  The changelog entries are automatically applied to files 
>like gcc/ChangeLog on a nightly basis, and doing that would lose the context 
>in the covering message about which chip the patch is supporting.
>
>Does the attached patch to document the addition to GCC 10.3 look OK?
>We'll need something similar for GCC 11, but personally I tend to prefer 
>adding the notes closer to the release.
>
>Thanks,
>Richard

Re: [PUSHED 5/8] Adjust vrp_evaluate_conditional for irange API.

2020-08-04 Thread Aldy Hernandez via Gcc-patches




On 8/4/20 8:55 AM, Richard Biener wrote:

On Tue, Aug 4, 2020 at 8:40 AM Aldy Hernandez via Gcc-patches wrote:


VR_RANGE of [-INF,+INF] is canonicalized to VARYING at creation.
That is why the test now becomes varying_p().

gcc/ChangeLog:

 * vr-values.c (simplify_using_ranges::vrp_evaluate_conditional): Adjust
 for irange API.
---
  gcc/vr-values.c | 6 +-
  1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/gcc/vr-values.c b/gcc/vr-values.c
index 1190fa96453..90ba8fca246 100644
--- a/gcc/vr-values.c
+++ b/gcc/vr-values.c
@@ -2495,11 +2495,7 @@ simplify_using_ranges::vrp_evaluate_conditional 
(tree_code code, tree op0,
tree type = TREE_TYPE (op0);
const value_range_equiv *vr0 = get_value_range (op0);

-  if (vr0->kind () == VR_RANGE
- && INTEGRAL_TYPE_P (type)
- && vrp_val_is_min (vr0->min ())
- && vrp_val_is_max (vr0->max ())
- && is_gimple_min_invariant (op1))
+  if (vr0->varying_p () && INTEGRAL_TYPE_P (type))


You dropped the is_gimple_min_invariant (op1) check.


Ah, thanks.

Pushed the attached patch.

Aldy
commit a44293840c0cb1376ac2f45212da7b8d5d21037a
Author: Aldy Hernandez 
Date:   Tue Aug 4 11:19:39 2020 +0200

Add is_gimple_min_invariant dropped from previous patch.

gcc/ChangeLog:

* vr-values.c (simplify_using_ranges::vrp_evaluate_conditional):
Call is_gimple_min_invariant dropped from previous patch.

diff --git a/gcc/vr-values.c b/gcc/vr-values.c
index 2fd4956a2e4..511342f2f13 100644
--- a/gcc/vr-values.c
+++ b/gcc/vr-values.c
@@ -2495,7 +2495,9 @@ simplify_using_ranges::vrp_evaluate_conditional (tree_code code, tree op0,
   tree type = TREE_TYPE (op0);
   const value_range_equiv *vr0 = get_value_range (op0);
 
-  if (vr0->varying_p () && INTEGRAL_TYPE_P (type))
+  if (vr0->varying_p ()
+	  && INTEGRAL_TYPE_P (type)
+	  && is_gimple_min_invariant (op1))
 	{
 	  location_t location;
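
As a reminder of why the old four-way test collapses to varying_p (), here is a toy model of canonicalization at construction time. ToyValueRange and ToyKind are hypothetical names and this is not the value_range implementation; it only illustrates that a range spanning the whole type is stored as VARYING, so callers never see a VR_RANGE of [-INF, +INF].

```cpp
#include <cassert>
#include <climits>

enum class ToyKind { Range, Varying };

// Toy value range over int that canonicalizes [MIN, MAX] to Varying.
struct ToyValueRange {
  ToyKind kind;
  int lo;
  int hi;

  ToyValueRange(int l, int h)
    : kind(l == INT_MIN && h == INT_MAX ? ToyKind::Varying : ToyKind::Range),
      lo(l), hi(h)
  {
    assert(l <= h);
  }

  bool varying_p() const { return kind == ToyKind::Varying; }
};
```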

Re: [PUSHED 2/8] Adjust expr_not_equal_to to use irange API.

2020-08-04 Thread Aldy Hernandez via Gcc-patches





On 8/4/20 8:52 AM, Richard Biener wrote:

On Tue, Aug 4, 2020 at 8:37 AM Aldy Hernandez via Gcc-patches wrote:


gcc/ChangeLog:

 * fold-const.c (expr_not_equal_to): Adjust for irange API.
---
  gcc/fold-const.c | 17 -
  1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 1324a194995..5d27927f6bf 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -10194,8 +10194,7 @@ tree_expr_nonzero_p (tree t)
  bool
  expr_not_equal_to (tree t, const wide_int &w)
  {
-  wide_int min, max, nz;
-  value_range_kind rtype;
+  value_range vr;
switch (TREE_CODE (t))
  {
  case INTEGER_CST:
@@ -10204,17 +10203,9 @@ expr_not_equal_to (tree t, const wide_int &w)
  case SSA_NAME:
if (!INTEGRAL_TYPE_P (TREE_TYPE (t)))
 return false;
-  rtype = get_range_info (t, &min, &max);
-  if (rtype == VR_RANGE)
-   {
- if (wi::lt_p (max, w, TYPE_SIGN (TREE_TYPE (t))))
-   return true;
- if (wi::lt_p (w, min, TYPE_SIGN (TREE_TYPE (t))))
-   return true;
-   }
-  else if (rtype == VR_ANTI_RANGE
-  && wi::le_p (min, w, TYPE_SIGN (TREE_TYPE (t)))
-  && wi::le_p (w, max, TYPE_SIGN (TREE_TYPE (t))))
+  get_range_info (t, vr);


Ick.  Do we now use references for out parameters?  I find this
highly non-obvious semantics.  What's wrong with

  vr = get_range_info (t);

if you dislike get_range_info (t, &vr)?


Using references was decided last year.

We want the ability to use the same range granularity as what the 
user requested, so we can't just return a range, as we don't know how 
many sub-ranges the user wants:


int_range<10> big_range;
twiddle_range(big_range);

We want the above to work with big_range, or a small range, or a 
value_range.  Twiddle_range should be range agnostic.


get_range_info(const_tree, value_range &) has been available since last 
year.  I suppose if one were so inclined, one could change it to take a 
pointer to be consistent with the other get_range_info() overload.


Aldy
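
A rough sketch of the design argument, assuming a simplified layout: the caller owns the storage and therefore picks the number of sub-ranges, while the producer only sees a base-class reference. The toy_irange, toy_int_range and this twiddle_range body are hypothetical illustrations and not the real irange classes. Returning by value could not preserve the caller's chosen capacity in this scheme.

```cpp
#include <cassert>
#include <cstddef>

// Toy base class: the pair storage lives in the derived class, so the caller
// chooses the capacity and callees stay agnostic of it.
class toy_irange {
public:
  toy_irange(const toy_irange&) = delete;
  toy_irange& operator=(const toy_irange&) = delete;

  std::size_t num_pairs() const { return m_num; }
  long lower(std::size_t i) const { assert(i < m_num); return m_base[2 * i]; }
  long upper(std::size_t i) const { assert(i < m_num); return m_base[2 * i + 1]; }
  void set_empty() { m_num = 0; }

  // Append [lo, hi].  When storage is full, widen the last pair instead,
  // which loses precision but never excludes a possible value.
  void add_pair(long lo, long hi)
  {
    if (m_num < m_max) {
      m_base[2 * m_num] = lo;
      m_base[2 * m_num + 1] = hi;
      ++m_num;
    } else {
      m_base[2 * m_num - 1] = hi;
    }
  }

protected:
  toy_irange(long* base, std::size_t max) : m_base(base), m_num(0), m_max(max) {}

private:
  long* m_base;
  std::size_t m_num;
  std::size_t m_max;
};

template <std::size_t N>
class toy_int_range : public toy_irange {
  static_assert(N >= 1, "need room for at least one pair");
public:
  toy_int_range() : toy_irange(m_storage, N) {}
private:
  long m_storage[2 * N];
};

// Range-agnostic producer: fills in as much detail as the caller has room for.
void twiddle_range(toy_irange& r)
{
  r.set_empty();
  r.add_pair(0, 1);
  r.add_pair(10, 20);
  r.add_pair(100, 200);
}
```

Called with a toy_int_range<10>, the result keeps three separate pairs; called with a toy_int_range<1>, the same function produces the single conservative pair [0, 200].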

Re: [PATCH] veclower: Don't ICE on .VEC_CONVERT calls with no lhs [PR96426]

2020-08-04 Thread Richard Biener

On Tue, 4 Aug 2020, Jakub Jelinek wrote:

> Hi!
> 
> .VEC_CONVERT is a const internal call, so normally if the lhs is not used,
> we'd DCE it far before getting to veclower, but with -O0 (or perhaps
> -fno-tree-dce and some other -fno-* options) it can happen.
> But as the internal fn needs the lhs to know the type to which the
> conversion is done (and I think that is a reasonable representation, having
> some magic extra argument and having to create constants with that type
> looks overkill to me), we just should DCE those calls ourselves.
> During veclower, we can't really remove insns, as the callers would be
> upset, so this just replaces it with a GIMPLE_NOP.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

OK.

Richard.

> 2020-08-04  Jakub Jelinek  
> 
>   PR middle-end/96426
>   * tree-vect-generic.c (expand_vector_conversion): Replace .VEC_CONVERT
>   call with GIMPLE_NOP if there is no lhs.
> 
>   * gcc.c-torture/compile/pr96426.c: New test.
> 
> --- gcc/tree-vect-generic.c.jj2020-07-28 15:39:10.081755224 +0200
> +++ gcc/tree-vect-generic.c   2020-08-03 12:34:32.193423693 +0200
> @@ -1775,6 +1775,12 @@ expand_vector_conversion (gimple_stmt_it
>gimple *stmt = gsi_stmt (*gsi);
>gimple *g;
>tree lhs = gimple_call_lhs (stmt);
> +  if (lhs == NULL_TREE)
> +{
> +  g = gimple_build_nop ();
> +  gsi_replace (gsi, g, false);
> +  return;
> +}
>tree arg = gimple_call_arg (stmt, 0);
>tree ret_type = TREE_TYPE (lhs);
>tree arg_type = TREE_TYPE (arg);
> --- gcc/testsuite/gcc.c-torture/compile/pr96426.c.jj  2020-08-03 
> 12:40:23.442449729 +0200
> +++ gcc/testsuite/gcc.c-torture/compile/pr96426.c 2020-08-03 
> 12:40:09.458647750 +0200
> @@ -0,0 +1,10 @@
> +/* PR middle-end/96426 */
> +
> +typedef long long V __attribute__((vector_size(16)));
> +typedef double W __attribute__((vector_size(16)));
> +
> +void
> +foo (V *v)
> +{
> +  __builtin_convertvector (*v, W);
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Re: [PATCH] gimple-fold: Fix ICE in maybe_canonicalize_mem_ref_addr on debug stmt [PR96354]

2020-08-04 Thread Richard Biener

On Tue, 4 Aug 2020, Jakub Jelinek wrote:

> Hi!
> 
> In debug stmts, we are less strict about what is and what is not accepted
> there, so this patch just punts on optimization of a debug stmt rather than
> ICEing.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2020-08-04  Jakub Jelinek  
> 
>   PR debug/96354
>   * gimple-fold.c (maybe_canonicalize_mem_ref_addr): Add IS_DEBUG
>   argument.  Return false instead of gcc_unreachable if it is true and
>   get_addr_base_and_unit_offset returns NULL.
>   (fold_stmt_1) : Adjust caller.
> 
>   * g++.dg/opt/pr96354.C: New test.
> 
> --- gcc/gimple-fold.c.jj  2020-07-28 15:39:09.908757602 +0200
> +++ gcc/gimple-fold.c 2020-08-03 13:23:57.579436442 +0200
> @@ -4875,7 +4875,7 @@ replace_stmt_with_simplification (gimple
>  /* Canonicalize MEM_REFs invariant address operand after propagation.  */
>  
>  static bool
> -maybe_canonicalize_mem_ref_addr (tree *t)
> +maybe_canonicalize_mem_ref_addr (tree *t, bool is_debug = false)
>  {
>bool res = false;
>tree *orig_t = t;
> @@ -4939,7 +4939,11 @@ maybe_canonicalize_mem_ref_addr (tree *t
> base = get_addr_base_and_unit_offset (TREE_OPERAND (addr, 0),
>   );
> if (!base)
> - gcc_unreachable ();
> + {
> +   if (is_debug)
> + return false;
> +   gcc_unreachable ();
> + }
>  
> TREE_OPERAND (*t, 0) = build_fold_addr_expr (base);
> TREE_OPERAND (*t, 1) = int_const_binop (PLUS_EXPR,
> @@ -5119,7 +5123,7 @@ fold_stmt_1 (gimple_stmt_iterator *gsi,
> if (*val
> && (REFERENCE_CLASS_P (*val)
> || TREE_CODE (*val) == ADDR_EXPR)
> -   && maybe_canonicalize_mem_ref_addr (val))
> +   && maybe_canonicalize_mem_ref_addr (val, true))
>   changed = true;
>   }
>break;
> --- gcc/testsuite/g++.dg/opt/pr96354.C.jj 2020-07-29 11:25:15.701164242 
> +0200
> +++ gcc/testsuite/g++.dg/opt/pr96354.C2020-07-29 11:26:16.490291018 
> +0200
> @@ -0,0 +1,24 @@
> +// PR debug/96354
> +// { dg-do compile }
> +// { dg-options "-O2 -g -fopenmp-simd" }
> +
> +template  struct A { typedef double T[N]; };
> +template  struct B { typename A::T b; double *baz () { return b; } 
> };
> +template  struct C { B d; C (); };
> +template  C::C () { double c = *d.baz (); }
> +template  void operator- (C, const C &);
> +template  struct D {};
> +template  C foo (D, C) { C t; return t; }
> +int e;
> +struct E { D<3> d; void bar (); };
> +
> +void
> +E::bar ()
> +{
> +#pragma omp simd
> +  for (int i = 0; i < e; i++)
> +{
> +  C<3> f, g;
> +  g - foo (d, f);
> +}
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

[committed] openmp: Compute number of collapsed loop iterations more efficiently for some non-rectangular loops

2020-08-04 Thread Jakub Jelinek via Gcc-patches

Hi!

This patch computes number of loop iterations for (so far signed only)
non-rectangular loops where only a single loop in the nest depends on an
outer loop iterator.

Bootstrapped/regtested on x86_64-linux, committed to trunk.
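
The idea can be illustrated outside the compiler. For a nest where the inner bounds are linear in the outer iterator (j runs from m1*i + n1 up to m2*i + n2), the inner trip count is linear in i, so the total is an arithmetic series. The sketch below assumes unit steps and "<" conditions only, and falls back to counting whenever the inner trip count would go negative at either end. Names such as n1o, n2o, m2minusm1 and outer_niters are borrowed from the patch, but the functions themselves are hypothetical.

```cpp
#include <cassert>

// Reference: count iterations by actually running the loop nest
// (the approach of the old fallback implementation).
long count_by_iteration(long n1o, long n2o, long m1, long n1, long m2, long n2)
{
  long count = 0;
  for (long i = n1o; i < n2o; ++i)
    for (long j = m1 * i + n1; j < m2 * i + n2; ++j)
      ++count;
  return count;
}

// Closed form: sum of an arithmetic series over the outer iterations.
long count_closed_form(long n1o, long n2o, long m1, long n1, long m2, long n2)
{
  long outer_niters = n2o > n1o ? n2o - n1o : 0;
  if (outer_niters == 0)
    return 0;
  long m2minusm1 = m2 - m1;
  long last_i = n1o + outer_niters - 1;              // last outer iterator value
  long first = m2minusm1 * n1o + (n2 - n1);          // inner count at first i
  long last = m2minusm1 * last_i + (n2 - n1);        // inner count at last i
  if (first < 0 || last < 0)
    return count_by_iteration(n1o, n2o, m1, n1, m2, n2);  // some inner loops empty
  long twice = outer_niters * (first + last);
  assert(twice % 2 == 0);                            // series sum is integral
  return twice / 2;
}
```

For example, i in [0, 10) with j in [0, i) gives 10 * (0 + 9) / 2 = 45 iterations without running either loop.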

2020-08-04  Jakub Jelinek  

* omp-expand.c (expand_omp_for_init_counts): For triangular loops
compute number of iterations at runtime more efficiently.
(expand_omp_for_init_vars): Adjust immediate dominators.
(extract_omp_for_update_vars): Likewise.

--- gcc/omp-expand.c.jj 2020-07-31 23:06:38.176464758 +0200
+++ gcc/omp-expand.c2020-08-03 18:12:34.393576057 +0200
@@ -1973,139 +1973,390 @@ expand_omp_for_init_counts (struct omp_f
 {
   gcc_assert (fd->last_nonrect != -1);
 
-  /* Fallback implementation.  Evaluate the loops with m1/m2
-non-NULL as well as their outer loops at runtime using temporaries
-instead of the original iteration variables, and in the
-body just bump the counter.  */
   counts[fd->last_nonrect] = create_tmp_reg (type, ".count");
   expand_omp_build_assign (gsi, counts[fd->last_nonrect],
   build_zero_cst (type));
-  gimple_stmt_iterator gsi2 = *gsi;
-  gsi_prev (&gsi2);
-  e = split_block (entry_bb, gsi_stmt (gsi2));
-  e = split_block (e->dest, (gimple *) NULL);
-  basic_block cur_bb = e->src;
-  basic_block next_bb = e->dest;
-  entry_bb = e->dest;
-  *gsi = gsi_after_labels (entry_bb);
-
-  tree *vs = XALLOCAVEC (tree, fd->last_nonrect);
-  memset (vs, 0, fd->last_nonrect * sizeof (tree));
-
-  for (i = 0; i <= fd->last_nonrect; i++)
-   {
- if (fd->loops[i].m1 == NULL_TREE
- && fd->loops[i].m2 == NULL_TREE
- && !fd->loops[i].non_rect_referenced)
-   continue;
+  for (i = fd->first_nonrect + 1; i < fd->last_nonrect; i++)
+   if (fd->loops[i].m1
+   || fd->loops[i].m2
+   || fd->loops[i].non_rect_referenced)
+ break;
+  if (i == fd->last_nonrect
+ && fd->loops[i].outer == fd->last_nonrect - fd->first_nonrect
+ && !TYPE_UNSIGNED (TREE_TYPE (fd->loops[i].v)))
+   {
+ int o = fd->first_nonrect;
+ tree itype = TREE_TYPE (fd->loops[o].v);
+ tree n1o = create_tmp_reg (itype, ".n1o");
+ t = fold_convert (itype, unshare_expr (fd->loops[o].n1));
+ expand_omp_build_assign (gsi, n1o, t);
+ tree n2o = create_tmp_reg (itype, ".n2o");
+ t = fold_convert (itype, unshare_expr (fd->loops[o].n2));
+ expand_omp_build_assign (gsi, n2o, t);
+ if (fd->loops[i].m1 && fd->loops[i].m2)
+   t = fold_build2 (MINUS_EXPR, itype, unshare_expr (fd->loops[i].m2),
+unshare_expr (fd->loops[i].m1));
+ else if (fd->loops[i].m1)
+   t = fold_unary (NEGATE_EXPR, itype,
+   unshare_expr (fd->loops[i].m1));
+ else
+   t = unshare_expr (fd->loops[i].m2);
+ tree m2minusm1
+   = force_gimple_operand_gsi (gsi, t, true, NULL_TREE,
+   true, GSI_SAME_STMT);
 
- tree itype = TREE_TYPE (fd->loops[i].v);
+ gimple_stmt_iterator gsi2 = *gsi;
+ gsi_prev (&gsi2);
+ e = split_block (entry_bb, gsi_stmt (gsi2));
+ e = split_block (e->dest, (gimple *) NULL);
+ basic_block bb1 = e->src;
+ entry_bb = e->dest;
+ *gsi = gsi_after_labels (entry_bb);
 
- gsi2 = gsi_after_labels (cur_bb);
- tree n1, n2;
+ gsi2 = gsi_after_labels (bb1);
+ tree ostep = fold_convert (itype, fd->loops[o].step);
+ t = build_int_cst (itype, (fd->loops[o].cond_code
+== LT_EXPR ? -1 : 1));
+ t = fold_build2 (PLUS_EXPR, itype, ostep, t);
+ t = fold_build2 (PLUS_EXPR, itype, t, n2o);
+ t = fold_build2 (MINUS_EXPR, itype, t, n1o);
+ if (TYPE_UNSIGNED (itype)
+ && fd->loops[o].cond_code == GT_EXPR)
+   t = fold_build2 (TRUNC_DIV_EXPR, itype,
+fold_build1 (NEGATE_EXPR, itype, t),
+fold_build1 (NEGATE_EXPR, itype, ostep));
+ else
+   t = fold_build2 (TRUNC_DIV_EXPR, itype, t, ostep);
+ tree outer_niters
+   = force_gimple_operand_gsi (&gsi2, t, true, NULL_TREE,
+   true, GSI_SAME_STMT);
+ t = fold_build2 (MINUS_EXPR, itype, outer_niters,
+  build_one_cst (itype));
+ t = fold_build2 (MULT_EXPR, itype, t, ostep);
+ t = fold_build2 (PLUS_EXPR, itype, n1o, t);
+ tree last = force_gimple_operand_gsi (&gsi2, t, true, NULL_TREE,
+   true, GSI_SAME_STMT);
+ tree n1, n2, n1e, n2e;
  t = fold_convert (itype, unshare_expr (fd->loops[i].n1));
  if (fd->loops[i].m1)
{

[PATCH] target: delete unnecessary codes in aarch64.c

2020-08-04 Thread Hu Jiangping

Hi,

This patch deletes two unnecessary lines in aarch64_if_then_else_costs:
a local declaration of extra_cost that duplicates the one at the start
of the function.

Tested on aarch64. OK for trunk?

Regards!
Hujp

---
 gcc/config/aarch64/aarch64.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 7c3ab3eeb1f..a70b2287b2c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11774,8 +11774,6 @@ aarch64_if_then_else_costs (rtx op0, rtx op1, rtx op2, 
int *cost, bool speed)
  if (speed)
{
  machine_mode mode = GET_MODE (XEXP (op1, 0));
- const struct cpu_cost_table *extra_cost
-   = aarch64_tune_params.insn_extra_cost;
 
  if (GET_MODE_CLASS (mode) == MODE_INT)
*cost += extra_cost->alu.arith;
-- 
2.17.1

Re: [PATCH] veclower: Don't ICE on .VEC_CONVERT calls with no lhs [PR96426]

2020-08-04 Thread Richard Sandiford

Jakub Jelinek via Gcc-patches  writes:
> Hi!
>
> .VEC_CONVERT is a const internal call, so normally if the lhs is not used,
> we'd DCE it far before getting to veclower, but with -O0 (or perhaps
> -fno-tree-dce and some other -fno-* options) it can happen.
> But as the internal fn needs the lhs to know the type to which the
> conversion is done (and I think that is a reasonable representation, having
> some magic extra argument and having to create constants with that type
> looks overkill to me), we just should DCE those calls ourselves.

FWIW, the reason we took that approach for .WHILE_ULT was for things
like value numbering.  It seems reasonable to expect that one const call
to internal function F with arguments A produces the same result as
another const call to internal function F with arguments A, without
having to differentiate based on lhs type as well.

(Not a comment on the patch itself btw.)

Thanks,
Richard
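
The value-numbering expectation described above can be sketched as a table keyed only on the function and its arguments. This is a toy model, not GCC's value-numbering code: ValueNumbers is a hypothetical class, operands are plain ints, and the point is simply that the lhs type plays no part in the key.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy value-numbering table for const calls: the key is (function, arguments).
class ValueNumbers {
public:
  // Return the value number for fn(args), allocating a fresh one on first use.
  int lookup_or_add(const std::string& fn, const std::vector<int>& args)
  {
    assert(!fn.empty());
    auto key = std::make_pair(fn, args);
    auto it = m_table.find(key);
    if (it != m_table.end())
      return it->second;          // same function, same arguments: same value
    int vn = m_next++;
    m_table.emplace(std::move(key), vn);
    return vn;
  }

private:
  std::map<std::pair<std::string, std::vector<int>>, int> m_table;
  int m_next = 0;
};
```

If the result type were determined by the lhs alone, two calls with identical keys could legitimately produce different values, and a table like this one would wrongly merge them.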

[PATCH] doc: Add @cindex to symver attribute

2020-08-04 Thread Jakub Jelinek via Gcc-patches

Hi!

When looking at the symver attr documentation in html, I found there is no
name to refer to for it.
The following patch fixes that, bootstrapped on x86_64-linux, ok for trunk
and 10.3?

2020-08-04  Jakub Jelinek  

* doc/extend.texi (symver): Add @cindex for symver function attribute.

--- gcc/doc/extend.texi.jj  2020-07-28 15:39:09.849758414 +0200
+++ gcc/doc/extend.texi 2020-08-03 17:02:06.221350978 +0200
@@ -3723,6 +3723,7 @@ Function Attributes}, @ref{PowerPC Funct
 for details.
 
 @item symver ("@var{name2}@@@var{nodename}")
+@cindex @code{symver} function attribute
 On ELF targets this attribute creates a symbol version.  The @var{name2} part
 of the parameter is the actual name of the symbol by which it will be
 externally referenced.  The @code{nodename} portion should be the name of a

Jakub

[PATCH] veclower: Don't ICE on .VEC_CONVERT calls with no lhs [PR96426]

2020-08-04 Thread Jakub Jelinek via Gcc-patches

Hi!

.VEC_CONVERT is a const internal call, so normally if the lhs is not used,
we'd DCE it far before getting to veclower, but with -O0 (or perhaps
-fno-tree-dce and some other -fno-* options) it can happen.
But as the internal fn needs the lhs to know the type to which the
conversion is done (and I think that is a reasonable representation, having
some magic extra argument and having to create constants with that type
looks overkill to me), we just should DCE those calls ourselves.
During veclower, we can't really remove insns, as the callers would be
upset, so this just replaces it with a GIMPLE_NOP.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2020-08-04  Jakub Jelinek  

PR middle-end/96426
* tree-vect-generic.c (expand_vector_conversion): Replace .VEC_CONVERT
call with GIMPLE_NOP if there is no lhs.

* gcc.c-torture/compile/pr96426.c: New test.

--- gcc/tree-vect-generic.c.jj  2020-07-28 15:39:10.081755224 +0200
+++ gcc/tree-vect-generic.c 2020-08-03 12:34:32.193423693 +0200
@@ -1775,6 +1775,12 @@ expand_vector_conversion (gimple_stmt_it
   gimple *stmt = gsi_stmt (*gsi);
   gimple *g;
   tree lhs = gimple_call_lhs (stmt);
+  if (lhs == NULL_TREE)
+{
+  g = gimple_build_nop ();
+  gsi_replace (gsi, g, false);
+  return;
+}
   tree arg = gimple_call_arg (stmt, 0);
   tree ret_type = TREE_TYPE (lhs);
   tree arg_type = TREE_TYPE (arg);
--- gcc/testsuite/gcc.c-torture/compile/pr96426.c.jj2020-08-03 
12:40:23.442449729 +0200
+++ gcc/testsuite/gcc.c-torture/compile/pr96426.c   2020-08-03 
12:40:09.458647750 +0200
@@ -0,0 +1,10 @@
+/* PR middle-end/96426 */
+
+typedef long long V __attribute__((vector_size(16)));
+typedef double W __attribute__((vector_size(16)));
+
+void
+foo (V *v)
+{
+  __builtin_convertvector (*v, W);
+}

Jakub
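
The "replace, don't remove" choice can be shown with a toy statement list. This is an illustration under simplified assumptions and not the GIMPLE API: Stmt, Op and lower_conversions are hypothetical, and a std::vector stands in for the statement sequence that the caller is still iterating over.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Op { Nop, VecConvert, Other };

struct Stmt {
  Op op;
  bool has_lhs;
};

// Replace result-less conversions with no-ops in place.  The sequence keeps
// its length and every other statement keeps its position, so any cursor a
// caller holds into the sequence stays valid.
std::size_t lower_conversions(std::vector<Stmt>& stmts)
{
  std::size_t replaced = 0;
  for (Stmt& s : stmts)
    if (s.op == Op::VecConvert && !s.has_lhs) {
      s = Stmt{Op::Nop, false};
      ++replaced;
    }
  assert(replaced <= stmts.size());
  return replaced;
}
```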

[PATCH] gimple-fold: Fix ICE in maybe_canonicalize_mem_ref_addr on debug stmt [PR96354]

2020-08-04 Thread Jakub Jelinek via Gcc-patches

Hi!

In debug stmts, we are less strict about what is and what is not accepted
there, so this patch just punts on optimization of a debug stmt rather than
ICEing.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-08-04  Jakub Jelinek  

PR debug/96354
* gimple-fold.c (maybe_canonicalize_mem_ref_addr): Add IS_DEBUG
argument.  Return false instead of gcc_unreachable if it is true and
get_addr_base_and_unit_offset returns NULL.
(fold_stmt_1) : Adjust caller.

* g++.dg/opt/pr96354.C: New test.

--- gcc/gimple-fold.c.jj2020-07-28 15:39:09.908757602 +0200
+++ gcc/gimple-fold.c   2020-08-03 13:23:57.579436442 +0200
@@ -4875,7 +4875,7 @@ replace_stmt_with_simplification (gimple
 /* Canonicalize MEM_REFs invariant address operand after propagation.  */
 
 static bool
-maybe_canonicalize_mem_ref_addr (tree *t)
+maybe_canonicalize_mem_ref_addr (tree *t, bool is_debug = false)
 {
   bool res = false;
   tree *orig_t = t;
@@ -4939,7 +4939,11 @@ maybe_canonicalize_mem_ref_addr (tree *t
  base = get_addr_base_and_unit_offset (TREE_OPERAND (addr, 0),
);
  if (!base)
-   gcc_unreachable ();
+   {
+ if (is_debug)
+   return false;
+ gcc_unreachable ();
+   }
 
  TREE_OPERAND (*t, 0) = build_fold_addr_expr (base);
  TREE_OPERAND (*t, 1) = int_const_binop (PLUS_EXPR,
@@ -5119,7 +5123,7 @@ fold_stmt_1 (gimple_stmt_iterator *gsi,
  if (*val
  && (REFERENCE_CLASS_P (*val)
  || TREE_CODE (*val) == ADDR_EXPR)
- && maybe_canonicalize_mem_ref_addr (val))
+ && maybe_canonicalize_mem_ref_addr (val, true))
changed = true;
}
   break;
--- gcc/testsuite/g++.dg/opt/pr96354.C.jj   2020-07-29 11:25:15.701164242 +0200
+++ gcc/testsuite/g++.dg/opt/pr96354.C  2020-07-29 11:26:16.490291018 +0200
@@ -0,0 +1,24 @@
+// PR debug/96354
+// { dg-do compile }
+// { dg-options "-O2 -g -fopenmp-simd" }
+
+template <int N> struct A { typedef double T[N]; };
+template <int N> struct B { typename A<N>::T b; double *baz () { return b; } };
+template <int N> struct C { B<N> d; C (); };
+template <int N> C<N>::C () { double c = *d.baz (); }
+template <int N> void operator- (C<N>, const C<N> &);
+template <int N> struct D {};
+template <int N> C<N> foo (D<N>, C<N>) { C<N> t; return t; }
+int e;
+struct E { D<3> d; void bar (); };
+
+void
+E::bar ()
+{
+#pragma omp simd
+  for (int i = 0; i < e; i++)
+{
+  C<3> f, g;
+  g - foo (d, f);
+}
+}

Jakub

RE: SLS Mitigation patches backported for GCC9

2020-08-04 Thread Kyrylo Tkachov

Hi Matthew,

> -----Original Message-----
> From: Matthew Malcomson
> Sent: 24 July 2020 17:03
> To: Kyrylo Tkachov; gcc-patches@gcc.gnu.org
> Cc: Richard Earnshaw; Ross Burton; Richard Sandiford
> Subject: Re: SLS Mitigation patches backported for GCC9
> 
> On 24/07/2020 12:01, Kyrylo Tkachov wrote:
> > Hi Matthew,
> >
> >> -----Original Message-----
> >> From: Matthew Malcomson
> >> Sent: 21 July 2020 16:16
> >> To: gcc-patches@gcc.gnu.org
> >> Cc: Richard Earnshaw; Kyrylo Tkachov; Ross Burton
> >> Subject: SLS Mitigation patches backported for GCC9
> >>
> >> Hello,
> >>
> >> Eventually we will want to backport the SLS patches to older branches.
> >>
> >> When the GCC10 release is unfrozen we will work on getting the same
> >> patches already posted backported to that branch.  The patches already
> >> posted on the mailing list apply cleanly to the current releases/gcc-10
> >> branch.
> >>
> >> I've heard interest in having the GCC 9 patches, so I'm posting the
> >> modified versions upstream sooner than otherwise.
> >
> > I'd say let's go ahead with the GCC 10 patches (assuming testing works out
> > well on there).
> > For the GCC 9 patches it would be useful if you included a bit of text of
> > how they differ from the GCC 10/11 patches.
> > This would speed up the technical review.
> > Thanks,
> > Kyrill
> >
> >>
> >> Cheers,
> >> Matthew
> >>
> >> Entire patch series attached to cover letter.
> 
> Below were the only two "interesting" hunks that failed to apply after
> `patch -p1`.
> 
> The differences causing these were:
> - in GCC-9 the `retab` instruction wasn't in the "do_return" pattern.
> - `simple_return` had "aarch64_use_simple_return_insn_p ()" as a
> condition.
> 
> 

Thanks, the backports to GCC 10 and GCC 9 are okay, let's go ahead with them.
Kyrill

> 
> 
> --- gcc/config/aarch64/aarch64.md
> +++ gcc/config/aarch64/aarch64.md
> @@ -863,18 +882,23 @@
> [(return)]
> ""
> {
> +const char *ret = NULL;
>   if (aarch64_return_address_signing_enabled ()
>  && TARGET_ARMV8_3
>  && !crtl->calls_eh_return)
> {
>  if (aarch64_ra_sign_key == AARCH64_KEY_B)
> - return "retab";
> + ret = "retab";
>  else
> - return "retaa";
> + ret = "retaa";
> }
> -return "ret";
> +else
> +  ret = "ret";
> +output_asm_insn (ret, operands);
> +return aarch64_sls_barrier (aarch64_harden_sls_retbr_p ());
> }
> -  [(set_attr "type" "branch")]
> +  [(set_attr "type" "branch")
> +   (set_attr "sls_length" "retbr")]
>   )
> 
>   (define_expand "return"
> @@ -886,8 +910,12 @@
>   (define_insn "simple_return"
> [(simple_return)]
> ""
> -  "ret"
> -  [(set_attr "type" "branch")]
> +  {
> +output_asm_insn ("ret", operands);
> +return aarch64_sls_barrier (aarch64_harden_sls_retbr_p ());
> +  }
> +  [(set_attr "type" "branch")
> +   (set_attr "sls_length" "retbr")]
>   )
> 
>   (define_insn "*cb1"

RE: [PATCH] Aarch64: Add missing clobber for fjcvtzs

2020-08-04 Thread Kyrylo Tkachov

Hi Andrea,

> -----Original Message-----
> From: Andrea Corallo
> Sent: 04 August 2020 09:28
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov; Richard Earnshaw; Richard Sandiford
> Subject: Re: [PATCH] Aarch64: Add missing clobber for fjcvtzs
> 
> Hi all,
> 
> Looking into the regression, I realized I got the condition in the
> effective target test wrong (== vs !=).  Apologies for the noise.
> 
> Bootstrapped on aarch64-none-linux-gnu and regression tested on
> aarch64-none-elf.
> 
> Still okay for trunk and to backport on gcc-10?

I just remembered a recurring bit of review feedback from Ramana on my
patches...
New effective target checks need to be documented in doc/sourcebuild.texi.

Ok with the documentation.
Thanks,
Kyrill

> 
> Thanks
> 
>   Andrea
> 
> gcc/ChangeLog
> 
> 2020-07-30  Andrea Corallo  
> 
>   * config/aarch64/aarch64.md (aarch64_fjcvtzs): Add missing
>   clobber.
> 
> gcc/testsuite/ChangeLog
> 
> 2020-07-30  Andrea Corallo  
> 
>   * gcc.target/aarch64/acle/jcvt_2.c: New testcase.
>   * lib/target-supports.exp
>   (check_effective_target_aarch64_fjcvtzs_hw): Add new check for
>   FJCVTZS hw.
>

Re: [PATCH] Aarch64: Add missing clobber for fjcvtzs

2020-08-04 Thread Andrea Corallo

Hi all,

Looking into the regression, I realized I got the condition in the
effective target test wrong (== vs !=).  Apologies for the noise.
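
Why the comparison direction matters can be sketched as follows. The probe program in the patch returns `out != 25` from `main`, so a correct conversion yields exit status 0, and `check_runtime` counts a zero exit status as "feature available". The sketch below is a portable stand-in: the function name `probe_status` is hypothetical, and a plain C cast replaces the FJCVTZS instruction so that it runs on any host.

```c
/* Hypothetical stand-in for the probe's main: returns the value that
   would become the process exit status.  */
int
probe_status (double in)
{
  int out = (int) in;   /* truncates toward zero, so 25.1 becomes 25 */
  return out != 25;     /* 0 means success, non-zero means failure */
}
```

With `==` in place of `!=`, the probe would report failure exactly when the instruction works.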

Bootstrapped on aarch64-none-linux-gnu and regression tested on
aarch64-none-elf.

Still okay for trunk and to backport on gcc-10?

Thanks

  Andrea

gcc/ChangeLog

2020-07-30  Andrea Corallo  

* config/aarch64/aarch64.md (aarch64_fjcvtzs): Add missing
clobber.

gcc/testsuite/ChangeLog

2020-07-30  Andrea Corallo  

* gcc.target/aarch64/acle/jcvt_2.c: New testcase.
* lib/target-supports.exp
(check_effective_target_aarch64_fjcvtzs_hw): Add new check for
FJCVTZS hw.

From 573b7a09f8c87621efeef80ea42055630391a40c Mon Sep 17 00:00:00 2001
From: Andrea Corallo 
Date: Wed, 29 Jul 2020 19:04:40 +0200
Subject: [PATCH] Add missing clobber for fjcvtzs

gcc/ChangeLog

2020-07-30  Andrea Corallo  

* config/aarch64/aarch64.md (aarch64_fjcvtzs): Add missing
clobber.

gcc/testsuite/ChangeLog

2020-07-30  Andrea Corallo  

* gcc.target/aarch64/acle/jcvt_2.c: New testcase.
* lib/target-supports.exp
(check_effective_target_aarch64_fjcvtzs_hw): Add new check for
FJCVTZS hw.
---
 gcc/config/aarch64/aarch64.md |  3 +-
 .../gcc.target/aarch64/acle/jcvt_2.c  | 33 +++
 gcc/testsuite/lib/target-supports.exp | 21 
 3 files changed, 56 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/jcvt_2.c

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index d5ca1898c02e..df780b863707 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7059,7 +7059,8 @@
 (define_insn "aarch64_fjcvtzs"
   [(set (match_operand:SI 0 "register_operand" "=r")
(unspec:SI [(match_operand:DF 1 "register_operand" "w")]
-  UNSPEC_FJCVTZS))]
+  UNSPEC_FJCVTZS))
+   (clobber (reg:CC CC_REGNUM))]
   "TARGET_JSCVT"
   "fjcvtzs\\t%w0, %d1"
   [(set_attr "type" "f_cvtf2i")]
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/jcvt_2.c b/gcc/testsuite/gcc.target/aarch64/acle/jcvt_2.c
new file mode 100644
index ..ea2dfd14cf29
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/jcvt_2.c
@@ -0,0 +1,33 @@
+/* Test the __jcvt ACLE intrinsic.  */
+/* { dg-do run } */
+/* { dg-options "-O2 -march=armv8.3-a -save-temps" } */
+/* { dg-require-effective-target aarch64_fjcvtzs_hw } */
+
+#include <arm_acle.h>
+
+extern void abort (void);
+
+#ifdef __ARM_FEATURE_JCVT
+volatile int32_t x;
+
+int __attribute__((noinline))
+foo (double a, int b, int c)
+{
+  b = b > c;
+  x = __jcvt (a);
+  return b;
+}
+
+int
+main (void)
+{
+  int x = foo (1.1, 2, 3);
+  if (x)
+abort ();
+
+  return 0;
+}
+
+#endif
+
+/* { dg-final { scan-assembler-times "fjcvtzs\tw\[0-9\]+, d\[0-9\]+\n" 1 } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 57eed3012b94..885689675e1d 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4848,6 +4848,27 @@ proc check_effective_target_aarch64_bti_hw { } {
 } "-O2" ]
 }
 
+# Return 1 if the target supports executing the armv8.3-a FJCVTZS
+# instruction.
+proc check_effective_target_aarch64_fjcvtzs_hw { } {
+if { ![istarget aarch64*-*-*] } {
+   return 0
+}
+return [check_runtime aarch64_fjcvtzs_hw_available {
+   int
+   main (void)
+   {
+ double in = 25.1;
+ int out;
+ asm volatile ("fjcvtzs %w0, %d1"
+   : "=r" (out)
+   : "w" (in)
+   : /* No clobbers.  */);
+ return out != 25;
+   }
+} "-march=armv8.3-a" ]
+}
+
 # Return 1 if GCC was configured with --enable-standard-branch-protection
 proc check_effective_target_default_branch_protection { } {
 return [check_configured_with "enable-standard-branch-protection"]
-- 
2.17.1

[committed] d: Fix PR96429: Pointer subtraction uses TRUNC_DIV_EXPR

2020-08-04 Thread Iain Buclaw via Gcc-patches

Hi,

This patch detects the pattern for pointer subtraction in the front-end
AST and uses EXACT_DIV_EXPR rather than TRUNC_DIV_EXPR.
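
As a rough illustration of why an exact division is valid here (a sketch in C rather than D, with a made-up function name `int_ptr_diff`): the byte distance between two pointers into the same array is always a whole multiple of the element size, so the division by the stride never leaves a remainder.

```c
#include <assert.h>
#include <stddef.h>

/* Element distance between two pointers into the same int array,
   computed as byte distance divided by the stride.  */
ptrdiff_t
int_ptr_diff (const int *p1, const int *p2)
{
  ptrdiff_t bytes = (const char *) p1 - (const char *) p2;
  ptrdiff_t stride = (ptrdiff_t) sizeof (int);
  /* No remainder: this property is what an exact division relies on.  */
  assert (bytes % stride == 0);
  return bytes / stride;
}
```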

Bootstrapped and regression tested on x86_64-linux-gnu with multilib
configurations -m32/-mx32.  Committed to mainline.

Regards
Iain

---
gcc/d/ChangeLog:

PR d/96429
* expr.cc (ExprVisitor::visit (BinExp*)): Use EXACT_DIV_EXPR for
pointer diff expressions.

gcc/testsuite/ChangeLog:

PR d/96429
* gdc.dg/pr96429.d: New test.
---
 gcc/d/expr.cc  | 12 
 gcc/testsuite/gdc.dg/pr96429.d | 26 ++
 2 files changed, 38 insertions(+)
 create mode 100644 gcc/testsuite/gdc.dg/pr96429.d

diff --git a/gcc/d/expr.cc b/gcc/d/expr.cc
index 20ab49d7b8c..ac3d4aaa171 100644
--- a/gcc/d/expr.cc
+++ b/gcc/d/expr.cc
@@ -620,6 +620,18 @@ public:
break;
 
   case TOKdiv:
+   /* Determine if the div expression is a lowered pointer diff operation.
+  The front-end rewrites `(p1 - p2)' into `(p1 - p2) / stride'.  */
+   if (MinExp *me = e->e1->isMinExp ())
+ {
+   if (me->e1->type->ty == Tpointer && me->e2->type->ty == Tpointer
+   && e->e2->op == TOKint64)
+ {
+   code = EXACT_DIV_EXPR;
+   break;
+ }
+ }
+
code = e->e1->type->isintegral ()
  ? TRUNC_DIV_EXPR : RDIV_EXPR;
break;
diff --git a/gcc/testsuite/gdc.dg/pr96429.d b/gcc/testsuite/gdc.dg/pr96429.d
new file mode 100644
index 000..af096e26b5a
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/pr96429.d
@@ -0,0 +1,26 @@
+// https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96429
+// { dg-do compile }
+// { dg-options "-fdump-tree-original" }
+ptrdiff_t subbyte(byte* bp1, byte* bp2)
+{
+// { dg-final { scan-tree-dump "bp1 - bp2;" "original" } }
+return bp1 - bp2;
+}
+
+ptrdiff_t subshort(short* sp1, short* sp2)
+{
+// { dg-final { scan-tree-dump "\\\(sp1 - sp2\\\) /\\\[ex\\\] 2;" "original" } }
+return sp1 - sp2;
+}
+
+ptrdiff_t subint(int* ip1, int* ip2)
+{
+// { dg-final { scan-tree-dump "\\\(ip1 - ip2\\\) /\\\[ex\\\] 4;" "original" } }
+return ip1 - ip2;
+}
+
+ptrdiff_t sublong(long* lp1, long* lp2)
+{
+// { dg-final { scan-tree-dump "\\\(lp1 - lp2\\\) /\\\[ex\\\] 8;" "original" } }
+return lp1 - lp2;
+}
-- 
2.25.1

Re: [PATCH] Using gen_int_mode instead of GEN_INT to avoid ICE caused by type promotion.

2020-08-04 Thread Hongtao Liu via Gcc-patches

ping ^2

On Mon, Jul 27, 2020 at 5:31 PM Hongtao Liu  wrote:
>
> ping
>
> On Wed, Jul 22, 2020 at 3:57 PM Hongtao Liu  wrote:
> >
> >   Bootstrap is ok, regression test is ok for i386 backend.
> >
> > gcc/
> > PR target/96262
> > * config/i386/i386-expand.c
> > (ix86_expand_vec_shift_qihi_constant): Refine.
> >
> > gcc/testsuite/
> > * gcc.target/i386/pr96262-1.c: New test.
> >
> > ---
> >  gcc/config/i386/i386-expand.c |  6 +++---
> >  gcc/testsuite/gcc.target/i386/pr96262-1.c | 11 +++
> >  2 files changed, 14 insertions(+), 3 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr96262-1.c
> >
> > diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> > index e194214804b..d57d043106a 100644
> > --- a/gcc/config/i386/i386-expand.c
> > +++ b/gcc/config/i386/i386-expand.c
> > @@ -19537,7 +19537,7 @@ bool
> >  ix86_expand_vec_shift_qihi_constant (enum rtx_code code, rtx dest,
> > rtx op1, rtx op2)
> >  {
> >machine_mode qimode, himode;
> > -  unsigned int and_constant, xor_constant;
> > +  HOST_WIDE_INT and_constant, xor_constant;
> >HOST_WIDE_INT shift_amount;
> >rtx vec_const_and, vec_const_xor;
> >rtx tmp, op1_subreg;
> > @@ -19612,7 +19612,7 @@ ix86_expand_vec_shift_qihi_constant (enum
> > rtx_code code, rtx dest, rtx op1, rtx
> >emit_move_insn (dest, simplify_gen_subreg (qimode, tmp, himode, 0));
> >emit_move_insn (vec_const_and,
> >   ix86_build_const_vector (qimode, true,
> > -  GEN_INT (and_constant)));
> > > +  gen_int_mode (and_constant, QImode)));
> >emit_insn (gen_and (dest, dest, vec_const_and));
> >
> >/* For ASHIFTRT, perform extra operation like
> > @@ -19623,7 +19623,7 @@ ix86_expand_vec_shift_qihi_constant (enum
> > rtx_code code, rtx dest, rtx op1, rtx
> >vec_const_xor = gen_reg_rtx (qimode);
> >emit_move_insn (vec_const_xor,
> >   ix86_build_const_vector (qimode, true,
> > -  GEN_INT (xor_constant)));
> > > +  gen_int_mode (xor_constant, QImode)));
> >emit_insn (gen_xor (dest, dest, vec_const_xor));
> >emit_insn (gen_sub (dest, dest, vec_const_xor));
> >  }
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr96262-1.c b/gcc/testsuite/gcc.target/i386/pr96262-1.c
> > new file mode 100644
> > index 000..1825388072e
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr96262-1.c
> > @@ -0,0 +1,11 @@
> > +/* PR target/96262 */
> > +/* { dg-do compile } */
> > +/* { dg-options "-mavx512bw -O" } */
> > +
> > +typedef char __attribute__ ((__vector_size__ (64))) V;
> > +
> > +V
> > +foo (V v)
> > +{
> > +  return ~(v << 1);
> > +}
> > --
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao
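
For readers following the quoted patch: GEN_INT wraps the value as given, while gen_int_mode first truncates it to the given mode and sign-extends it, which is the canonical form for a constant of that mode. A minimal sketch of that canonicalization for an 8-bit mode follows; `canonical_qi` is a hypothetical name, and this is not GCC's implementation.

```c
/* Sign-extend the low 8 bits of VALUE, i.e. the form a constant must
   have to be canonical for an 8-bit mode.  Illustrative only.  */
long long
canonical_qi (long long value)
{
  long long low = value & 0xff;
  return (low ^ 0x80) - 0x80;
}
```

Under this sketch a byte mask such as 0xfe becomes -2, whereas passing 0xfe through unchanged leaves a value that does not fit a signed 8-bit constant.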

Re: [PATCH][AVX512][PR96246] Merge two define_insn: _blendm, _load_mask.

2020-08-04 Thread Hongtao Liu via Gcc-patches

ping ^2

On Mon, Jul 27, 2020 at 5:31 PM Hongtao Liu  wrote:
>
> ping
>
> On Wed, Jul 22, 2020 at 12:59 PM Hongtao Liu  wrote:
> >
> > >   Those two define_insns have the same pattern, and _load_mask is
> > > always matched since it shows up earlier in the md file.  This may
> > > lose some opportunities in pass_reload, since _load_mask only has
> > > constraint "0C" for operand 2, so the "v" constraint in _vblendm is
> > > never matched.
> >
> > 2020-07-21  Hongtao Liu  
> >
> > gcc/
> >PR target/96246
> > * config/i386/sse.md (_load_mask,
> > _load_mask): Extend to generate blendm
> > instructions.
> > (_blendm, _blendm): Change
> > define_insn to define_expand.
> >
> > gcc/testsuite/
> > * gcc.target/i386/avx512bw-pr96246-1.c: New test.
> > * gcc.target/i386/avx512bw-pr96246-2.c: New test.
> > * gcc.target/i386/avx512vl-pr96246-1.c: New test.
> > * gcc.target/i386/avx512vl-pr96246-2.c: New test.
> > * gcc.target/i386/avx512bw-vmovdqu16-1.c: New test.
> > * gcc.target/i386/avx512bw-vmovdqu8-1.c: New test.
> > * gcc.target/i386/avx512f-vmovapd-1.c: New test.
> > * gcc.target/i386/avx512f-vmovaps-1.c: New test.
> > * gcc.target/i386/avx512f-vmovdqa32-1.c: New test.
> > * gcc.target/i386/avx512f-vmovdqa64-1.c: New test.
> > * gcc.target/i386/avx512vl-pr92686-movcc-1.c: New test.
> > * gcc.target/i386/avx512vl-pr96246-1.c: New test.
> > * gcc.target/i386/avx512vl-pr96246-2.c: New test.
> > * gcc.target/i386/avx512vl-vmovapd-1.c: New test.
> > * gcc.target/i386/avx512vl-vmovaps-1.c: New test.
> > * gcc.target/i386/avx512vl-vmovdqa32-1.c: New test.
> > * gcc.target/i386/avx512vl-vmovdqa64-1.c: New test.
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao

Re: [PATCH] [AVX512]For vector compare to mask register, UNSPEC is needed instead of comparison operator [PR96243]

2020-08-04 Thread Hongtao Liu via Gcc-patches

ping^2

On Mon, Jul 27, 2020 at 5:31 PM Hongtao Liu  wrote:
>
> ping
>
> On Mon, Jul 20, 2020 at 4:40 PM Hongtao Liu  wrote:
> >
> > Correct PR number in ChangeLog
> > it's pr96243.
> >
> > On Mon, Jul 20, 2020 at 1:46 PM Hongtao Liu  wrote:
> > >
> > > Hi:
> > > >   For rtx like (eq:HI (V8SI 90) (V8SI 91)), cse takes it as a
> > > > boolean value and tries to optimize it accordingly.  That is not
> > > > valid for a vector compare, and other places in the rtl passes make
> > > > the same assumption.
> > >
> > > Bootstrap is ok, regression test is ok for i386 backend.
> > >
> > > 2020-07-20  Hongtao Liu  
> > >
> > > gcc/
> > > PR target/96243
> > > * config/i386/i386-expand.c (ix86_expand_sse_cmp): Refine for
> > > maskcmp.
> > > (ix86_expand_mask_vec_cmp): Change prototype.
> > > * config/i386/i386-protos.h (ix86_expand_mask_vec_cmp): Change
> > > prototype.
> > > * config/i386/i386.c (ix86_print_operand): Remove operand
> > > modifier 'I'.
> > > * config/i386/sse.md
> > > (*_cmp3,
> > > *_cmp3,
> > > *_ucmp3,
> > > *_ucmp3,
> > > avx512f_maskcmp3): Deleted.
> > >
> > > gcc/testsuite
> > > * gcc.target/i386/pr92865-1.c: Adjust testcase.
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao
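
The point in the quoted description, that a vector compare into a mask register is not a boolean, can be sketched in plain C. The names `LANES` and `cmp_eq_mask` are made up for illustration, and the loop only models the semantics, not the AVX-512 instruction.

```c
#include <stdint.h>

#define LANES 8

/* Lane-wise equality of two 8-element vectors.  Bit i of the result is
   set when lane i compares equal, as in a mask register.  */
uint16_t
cmp_eq_mask (const int32_t a[LANES], const int32_t b[LANES])
{
  uint16_t mask = 0;
  for (int i = 0; i < LANES; i++)
    if (a[i] == b[i])
      mask |= (uint16_t) (1u << i);
  return mask;
}
```

Comparing a vector with itself gives 0xff here rather than 1, so any pass that folds the result as a 0/1 truth value would produce the wrong mask.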
