Re: use -mfpu=auto for arm/simd/vmmla_1.c (was: use -mfpu=neon for arm/simd/vmmla_1.c)

2021-02-05 Thread Alexandre Oliva
On Jan 13, 2021, Alexandre Oliva  wrote:

> On Jan  5, 2021, Alexandre Oliva  wrote:
>> So this patch adds -mfpu=auto to the test, overriding any implicit
>> flags with the fpu implied by the arch.

>> * gcc.target/arm/simd/vmmla_1.c: Pass -mfpu=auto.

> Ping?
> https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562798.html

Ping?

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar


Add release notes for new C2X features in GCC 11

2021-02-05 Thread Joseph Myers
Binary constants aren't mentioned individually in this list because
they are covered by the general wording about features previously
supported as extensions.  Allowing duplicate standard attributes isn't
mentioned as it's considered a minor change of details of a feature
supported in GCC 10, rather than a new feature.

Committed.

diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html
index a8b036c3..ba8180e8 100644
--- a/htdocs/gcc-11/changes.html
+++ b/htdocs/gcc-11/changes.html
@@ -232,6 +232,47 @@ a work-in-progress.
   and -j to -H.
 
 
+C
+
+  Several new features from the upcoming C2X revision of the ISO C
+  standard are supported with -std=c2x
+  and -std=gnu2x.  Some of these features are also
+  supported as extensions when compiling for older language versions.
+  In addition to the features listed, some features previously
+  supported as extensions and now added to the C standard are enabled
+  by default in C2X mode and not diagnosed with -std=c2x
+  -Wpedantic.
+  
+The BOOL_MAX and BOOL_WIDTH macros
+are provided in limits.h.
+As in C++, function definitions no longer need to give names
+for unused function parameters.
+The expansions of the true and false
+macros in stdbool.h have changed so that they
+have type bool.
+The [[nodiscard]] standard attribute is now
+supported.
+The __has_c_attribute preprocessor operator is
+now supported.
+Macros INFINITY, NAN, FLT_SNAN,
+DBL_SNAN, LDBL_SNAN, DEC_INFINITY,
+DEC_NAN, and corresponding signaling NaN macros for
+_FloatN, _FloatNx
+and _DecimalN types, are provided
+in float.h.  There are also corresponding
+built-in functions __builtin_nansdN for
+decimal signaling NaNs.
+Macros FLT_IS_IEC_60559, DBL_IS_IEC_60559
+and LDBL_IS_IEC_60559 are provided
+in float.h.
+The feature test
+macro __STDC_WANT_IEC_60559_EXT__ is supported
+by float.h.
+Labels may appear before declarations and at the end of a
+compound statement.
+  
+
+
 C++
 
   The default mode has been changed to -std=gnu++17.

-- 
Joseph S. Myers
jos...@codesourcery.com


[committed] d: Remove the expansion of intrinsic and built-in codes from the DEF_D_INTRINSIC macro

2021-02-05 Thread Iain Buclaw via Gcc-patches
Hi,

This patch is a small tidy-up of the intrinsics.def file to make it
clear that the parameters refer to INTRINSIC and BUILT_IN codes.

Bootstrapped and regression tested on x86_64-linux-gnu, and committed to
mainline.

Regards,
Iain.

---
gcc/d/ChangeLog:

* d-tree.h (DEF_D_INTRINSIC): Don't insert INTRINSIC_ into the
intrinsic code name.
* intrinsics.cc (DEF_D_INTRINSIC): Don't insert INTRISIC_ and
BUILT_IN_ into the intrinsic and built-in code names.
* intrinsics.def:  Explicitly use full intrinsic and built-in
codes in all definitions.
---
 gcc/d/d-tree.h   |   2 +-
 gcc/d/intrinsics.cc  |   2 +-
 gcc/d/intrinsics.def | 298 ---
 3 files changed, 195 insertions(+), 107 deletions(-)

diff --git a/gcc/d/d-tree.h b/gcc/d/d-tree.h
index c41969e36b1..724a0bfd87b 100644
--- a/gcc/d/d-tree.h
+++ b/gcc/d/d-tree.h
@@ -80,7 +80,7 @@ enum level_kind
 
 enum intrinsic_code
 {
-#define DEF_D_INTRINSIC(CODE, B, N, M, D, C) INTRINSIC_ ## CODE,
+#define DEF_D_INTRINSIC(CODE, B, N, M, D, C) CODE,
 
 #include "intrinsics.def"
 
diff --git a/gcc/d/intrinsics.cc b/gcc/d/intrinsics.cc
index 7f97c1d1fa8..539dc0c1f37 100644
--- a/gcc/d/intrinsics.cc
+++ b/gcc/d/intrinsics.cc
@@ -62,7 +62,7 @@ struct intrinsic_decl
 static const intrinsic_decl intrinsic_decls[] =
 {
 #define DEF_D_INTRINSIC(CODE, BUILTIN, NAME, MODULE, DECO, CTFE) \
-{ INTRINSIC_ ## CODE, BUILT_IN_ ## BUILTIN, NAME, MODULE, DECO, CTFE },
+{ CODE, BUILTIN, NAME, MODULE, DECO, CTFE },
 
 #include "intrinsics.def"
 
diff --git a/gcc/d/intrinsics.def b/gcc/d/intrinsics.def
index dc6b104f6d5..f5af2a5fe5c 100644
--- a/gcc/d/intrinsics.def
+++ b/gcc/d/intrinsics.def
@@ -36,133 +36,221 @@ along with GCC; see the file COPYING3.  If not see
 #define DEF_CTFE_BUILTIN(C, B, N, M, D) \
   DEF_D_INTRINSIC (C, B, N, M, D, true)
 
-DEF_D_BUILTIN (NONE, NONE, 0, 0, 0)
+DEF_D_BUILTIN (INTRINSIC_NONE, BUILT_IN_NONE, 0, 0, 0)
 
 /* core.bitop intrinsics.  */
 
-DEF_D_BUILTIN (BSF, NONE, "bsf", "core.bitop", "FNaNbNiNfkZi")
-DEF_D_BUILTIN (BSR, NONE, "bsr", "core.bitop", "FNaNbNiNfkZi")
-DEF_D_BUILTIN (BT, NONE, "bt", "core.bitop", "FNaNbNiMxPkkZi")
-DEF_D_BUILTIN (BTC, NONE, "btc", "core.bitop", "FNaNbNiPkkZi")
-DEF_D_BUILTIN (BTR, NONE, "btr", "core.bitop", "FNaNbNiPkkZi")
-DEF_D_BUILTIN (BTS, NONE, "bts", "core.bitop", "FNaNbNiPkkZi")
-DEF_D_BUILTIN (BSF64, NONE, "bsf", "core.bitop", "FNaNbNiNfmZi")
-DEF_D_BUILTIN (BSR64, NONE, "bsr", "core.bitop", "FNaNbNiNfmZi")
-DEF_D_BUILTIN (BT64, NONE, "bt", "core.bitop", "FNaNbNiMxPmmZi")
-DEF_D_BUILTIN (BTC64, NONE, "btc", "core.bitop", "FNaNbNiPmmZi")
-DEF_D_BUILTIN (BTR64, NONE, "btr", "core.bitop", "FNaNbNiPmmZi")
-DEF_D_BUILTIN (BTS64, NONE, "bts", "core.bitop", "FNaNbNiPmmZi")
-
-DEF_D_BUILTIN (BSWAP16, BSWAP16, "byteswap", "core.bitop", "FNaNbNiNftZt")
-DEF_D_BUILTIN (BSWAP32, BSWAP32, "bswap", "core.bitop", "FNaNbNiNfkZk")
-DEF_D_BUILTIN (BSWAP64, BSWAP64, "bswap", "core.bitop", "FNaNbNiNfmZm")
-
-DEF_D_BUILTIN (POPCNT32, NONE, "popcnt", "core.bitop", "FNaNbNiNfkZi")
-DEF_D_BUILTIN (POPCNT64, NONE, "popcnt", "core.bitop", "FNaNbNiNfmZi")
-
-DEF_D_BUILTIN (ROL, NONE, "rol", "core.bitop", "FNaI1TkZI1T")
-DEF_D_BUILTIN (ROL_TIARG, NONE, "rol", "core.bitop", "FNaI1TZI1T")
-DEF_D_BUILTIN (ROR, NONE, "ror", "core.bitop", "FNaI1TkZI1T")
-DEF_D_BUILTIN (ROR_TIARG, NONE, "ror", "core.bitop", "FNaI1TZI1T")
+DEF_D_BUILTIN (INTRINSIC_BSF, BUILT_IN_NONE, "bsf", "core.bitop",
+  "FNaNbNiNfkZi")
+DEF_D_BUILTIN (INTRINSIC_BSR, BUILT_IN_NONE, "bsr", "core.bitop",
+  "FNaNbNiNfkZi")
+DEF_D_BUILTIN (INTRINSIC_BT, BUILT_IN_NONE, "bt", "core.bitop",
+  "FNaNbNiMxPkkZi")
+DEF_D_BUILTIN (INTRINSIC_BTC, BUILT_IN_NONE, "btc", "core.bitop",
+  "FNaNbNiPkkZi")
+DEF_D_BUILTIN (INTRINSIC_BTR, BUILT_IN_NONE, "btr", "core.bitop",
+  "FNaNbNiPkkZi")
+DEF_D_BUILTIN (INTRINSIC_BTS, BUILT_IN_NONE, "bts", "core.bitop",
+  "FNaNbNiPkkZi")
+DEF_D_BUILTIN (INTRINSIC_BSF64, BUILT_IN_NONE, "bsf", "core.bitop",
+  "FNaNbNiNfmZi")
+DEF_D_BUILTIN (INTRINSIC_BSR64, BUILT_IN_NONE, "bsr", "core.bitop",
+  "FNaNbNiNfmZi")
+DEF_D_BUILTIN (INTRINSIC_BT64, BUILT_IN_NONE, "bt", "core.bitop",
+  "FNaNbNiMxPmmZi")
+DEF_D_BUILTIN (INTRINSIC_BTC64, BUILT_IN_NONE, "btc", "core.bitop",
+  "FNaNbNiPmmZi")
+DEF_D_BUILTIN (INTRINSIC_BTR64, BUILT_IN_NONE, "btr", "core.bitop",
+  "FNaNbNiPmmZi")
+DEF_D_BUILTIN (INTRINSIC_BTS64, BUILT_IN_NONE, "bts", "core.bitop",
+  "FNaNbNiPmmZi")
+
+DEF_D_BUILTIN (INTRINSIC_BSWAP16, BUILT_IN_BSWAP16, "byteswap", "core.bitop",
+  "FNaNbNiNftZt")
+DEF_D_BUILTIN (INTRINSIC_BSWAP32, BUILT_IN_BSWAP32, "bswap", "core.bitop",
+  "FNaNbNiNfkZk")
+DEF_D_BUILTIN (INTRINSIC_BSWAP64, BUILT_IN_BSWAP64, "bswap", "core.bitop",
+  "FNaNbNiNfmZm")
+
+DEF_D_BUILTIN (INTRINSIC_POPCNT32, 

Re: [PATCH] rs6000: Fix MMA API - Add support for compatibility built-ins

2021-02-05 Thread Segher Boessenkool
On Fri, Feb 05, 2021 at 04:11:30PM +0100, Florian Weimer wrote:
> * Peter Bergner:
> > On 2/5/21 4:28 AM, Florian Weimer wrote:
> >> Maybe add a check that the compatibility builtins are flagged as
> >> availble using __has_builtin?
> >
> > Do you mean add a test in the testsuite for this?  I can check on
> > adding that to the test case.
> 
> Right, in the test case.  Given that it's a new kind of built-in.
> (Not sure if it makes sense.)

There aren't many such tests yet, so it will be helpful just because of
that.  But __has_builtin should work for any builtin function
whatsoever, and there is nothing special about these compatibility
builtins (it is just a name, it is defined as any other).


Segher


Re: [PATCH] rs6000: Fix MMA API - Add support for compatibility built-ins

2021-02-05 Thread Segher Boessenkool
On Thu, Feb 04, 2021 at 10:05:19PM -0600, Peter Bergner wrote:
> On 2/4/21 3:16 PM, Segher Boessenkool wrote:
> > On Thu, Feb 04, 2021 at 02:40:20PM -0600, Peter Bergner wrote:
> >> The LLVM and GCC teams agreed to rename the __builtin_mma_assemble_pair and
> >> __builtin_mma_disassemble_pair built-ins to __builtin_vsx_assemble_pair and
> >> __builtin_vsx_disassemble_pair respectively.  It's too late to remove the
> >> old names, so this patch adds support for creating compatibility built-ins
> >> (ie, multiple built-in functions generate the same code) and then creates
> >> compatibility built-ins using the new names.

> >> +#ifndef RS6000_BUILTIN_COMPAT
> >> +  #undef BU_COMPAT
> >> +  #define BU_COMPAT(ENUM, COMPAT_NAME)
> > 
> > Please do not do #undef unless necessary: it hides bugs (and that causes
> > more bugs).
> 
> I thought it was needed, since rs6000-builtin.def is #included into
> rs6000-call.c and rs6000.h multiple times.

Ah yes, that is a good enough reason.  Bill's rewrite will clean all
this (and much more) up anyway, so it is fine for now.

> >>  BU_MMA_3 (ASSEMBLE_PAIR,"assemble_pair",  MISC, mma_assemble_pair)
> >> +BU_COMPAT (MMA_BUILTIN_ASSEMBLE_PAIR, "vsx_assemble_pair")
> > 
> > You should do those the other way around (the mma_ one is the
> > compatibility one).  This matters, because if you disable the
> > compatibility builtins the vsx_ one should still be there, but not the
> > old name.  (It also makes more sense of course).
> 
> I actually did it that way initially, but decided to do it this was
> just to make the patch smaller.  You are correct that it's "more"
> correct to rename it though. 

It is less work (less confusion) in the future, and the number of lines
isn't important anyway :-)


Segher


Re: [PATCH] c++: Private parent access check for using decls [PR19377]

2021-02-05 Thread Marek Polacek via Gcc-patches
On Fri, Feb 05, 2021 at 05:28:07PM +, Anthony Sharp wrote:
> Hi Marek,
> 
> > Pedantically, "Altered function." is not very good, it should say what
> > in enforce_access changed.
> 
> I would normally 100% agree, but the changes are a bit too complicated
> to be concisely (and helpfully) described there. I think the patch
> description covers it well enough; not sure what I would say without
> having to write a big paragraph there.

I think at least something like "Improve private access checking." would
be better.  No need to be verbose in the ChangeLog.  :)
 
> > Let's move the test into g++.dg/cpp1z and call it using9.C.
> 
> Okie dokie - it's a bit hard to know where stuff's supposed to go in
> that folder. I'll put a comment in the testcase mentioning pr19377
> just in case there's ever a regression.

Yeah, it's customary to start a test with
// PR c++/19377

> > I don't understand the "generate a diagnostic decl location".  Maybe just
> > "generate a diagnostic?"
> 
> get_class_access_diagnostic_decl returns a decl which is used as the
> location that the error-producing code points to as the source of the
> problem. It could be better - I will change it to say "Examine certain
> special cases to find the decl that is the source of the problem" to
> make it a bit clearer.

Oh, I see now.
 
> > These two comments can be merged into one.
> 
> Technically yes ... but I like how it is since in a very subtle way it
> creates a sense of separation between the first two paragraphs and the
> third. The first two paras are sort of an introduction and the third
> begins to describe the code.

Fair enough.
 
> > I think for comparing a binfo with a type we should use SAME_BINFO_TYPE_P.
> 
> Okay, I think that simplifies the code a bit aswell.
> 
> > This first term doesn't need to be wrapped in ().
> 
> I normally wouldn't use the (), but I think that's how Jason likes it
> so that's why I did it. I guess it makes it more readable.
> 
> > This could be part of the if above.
> 
> Oops - I forgot to change that when I modified the code.
> 
> > Just "had" instead of "did had"?
> 
> Good spot - that's a spelling mistake on my part. Should be "did have".
> 
> > Looks like this line is indented too much (even in the newer patch).
> 
> You're right (if you meant access_failure_reason = ak_private), I will change.

Yup, this one.

> If you mean get_class_access_diagnostic_decl (parent_binfo, diag_decl)
> then I donno, because Linux Text Editor says both are on column 64.
> 
> To be honest, I'm sure there is a way to do it, but I'm not really
> sure how to consistently align code. Every text/code editor/browser
> seems to give different column numbers (and display it differently)
> since they seem to all treat whitespace differently.

Yeah, that can be a pain.  Best if your editor does it for you.  If you
use vim, you can use gcc/contrib/vimrc and then vim will do most of the
formatting for you.
 
> > We usually use dg-do compile.
> 
> True, but wouldn't that technically be slower? I would agree if it
> were more than just error-handling code.
> 
> > Best to replace both with
> > // { dg-do compile { target c++17 } }
> 
> Okie dokie. I did see both being used and I wasn't sure which to go for.
> 
> I'll probably send another patch over tomorrow.

Sounds good, thanks!

> On Fri, 5 Feb 2021 at 16:06, Marek Polacek  wrote:
> >
> > Hi,
> >
> > a few comments:
> >
> > On Fri, Feb 05, 2021 at 03:39:25PM +, Anthony Sharp via Gcc-patches 
> > wrote:
> > > 2021-02-05  Anthony Sharp  
> > >
> > >   * semantics.c (get_class_access_diagnostic_decl): New function.
> > >   (enforce_access): Altered function.
> >
> > Pedantically, "Altered function." is not very good, it should say what
> > in enforce_access changed.
> >
> > > gcc/testsuite/ChangeLog:
> > >
> > > 2021-02-05  Anthony Sharp  
> > >
> > >   * g++.dg/pr19377.C: New test.
> >
> > Let's move the test into g++.dg/cpp1z and call it using9.C.
> >
> > >  gcc/cp/semantics.c | 89 ++
> > >  gcc/testsuite/g++.dg/pr19377.C | 28 +++
> > >  2 files changed, 98 insertions(+), 19 deletions(-)
> > >  create mode 100644 gcc/testsuite/g++.dg/pr19377.C
> > >
> > > diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
> > > index 73834467fca..ffb627f08ea 100644
> > > --- a/gcc/cp/semantics.c
> > > +++ b/gcc/cp/semantics.c
> > > @@ -256,6 +256,58 @@ pop_to_parent_deferring_access_checks (void)
> > >  }
> > >  }
> > >
> > > +/* Called from enforce_access.  A class has attempted (but failed) to 
> > > access
> > > +   DECL.  It is already established that a baseclass of that class,
> > > +   PARENT_BINFO, has private access to DECL.  Examine certain special 
> > > cases to
> > > +   generate a diagnostic decl location.  If no special cases are found, 
> > > simply
> >
> > I don't understand the "generate a diagnostic decl location".  Maybe just
> > "generate a diagnostic?"
> >
> > > +   return DECL.  */
> > > +
> > > 

Re: PR98974: Fix vectorizable_condition after STMT_VINFO_VEC_STMTS

2021-02-05 Thread Andre Vieira (lists) via Gcc-patches



On 05/02/2021 12:47, Richard Sandiford wrote:

"Andre Vieira (lists)"  writes:

Hi,

As mentioned in the PR, this patch fixes up the nvectors parameter passed to 
vect_get_loop_mask in vectorizable_condition.
Before the STMT_VINFO_VEC_STMTS rework we used to handle each ncopy separately, 
now we gather them all at the same time and don't need to multiply vec_num with 
ncopies.

The reduced testcase I used to illustrate the issue in the PR gives a warning, 
if someone knows how to get rid of that (it's Fortran) I'd include it as a 
testcase for this.

Looks like Richi's since posted one.

Included it.

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 
0bc1cb1c5b4f6c1f0447241b4d31434bf7cca1a4..d07602f6d38f9c51936ac09942599fc5a14f46ab
 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -10237,8 +10237,7 @@ vectorizable_condition (vec_info *vinfo,
{
  unsigned vec_num = vec_oprnds0.length ();
  tree loop_mask
-   = vect_get_loop_mask (gsi, masks, vec_num * ncopies,
- vectype, i);
+   = vect_get_loop_mask (gsi, masks, vec_num, vectype, i);
  tree tmp2 = make_ssa_name (vec_cmp_type);
  gassign *g
= gimple_build_assign (tmp2, BIT_AND_EXPR, vec_compare,

Does removing the shadowed vec_num work?  I think that would be less
confusing, and means that the calculation stays in sync with the

Yeah that works too.

Here's a reworked patch.


gcc/ChangeLog:
2021-02-05  Andre Vieira  

    PR middle-end/98974
    * tree-vect-stmts.c (vectorizable_condition): Fix nvectors 
parameter

    for vect_get_loop_mask call.

gcc/testsuite/ChangeLog:
2021-02-05  Andre Vieira  

    * gfortran.dg/pr98974.F90: New test.
diff --git a/gcc/testsuite/gfortran.dg/pr98974.F90 
b/gcc/testsuite/gfortran.dg/pr98974.F90
new file mode 100644
index 
..b3db6a6654a0b36bc567405c70429a5dbe168d1e
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr98974.F90
@@ -0,0 +1,20 @@
+! PR middle-end/98974
+! { dg-do compile { target { aarch64*-*-* } } }
+! { dg-options "-Ofast -mcpu=neoverse-v1" }
+
+module module_foobar
+  integer,parameter :: fp_kind = selected_real_kind(15)
+contains
+ subroutine foobar( foo, ix ,jx ,kx,iy,ky)
+   real, dimension( ix, kx, jx )  :: foo
+   real(fp_kind), dimension( iy, ky, 3 ) :: bar, baz
+   do k=1,ky
+  do i=1,iy
+if ( baz(i,k,1) > 0. ) then
+  bar(i,k,1) = 0
+endif
+foo(i,nk,j) = baz0 *  bar(i,k,1)
+  enddo
+   enddo
+ end
+end
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 
0bc1cb1c5b4f6c1f0447241b4d31434bf7cca1a4..064e5d138ce9a151287662a0caefd9925b0a2920
 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -10235,7 +10235,6 @@ vectorizable_condition (vec_info *vinfo,
 
  if (masks)
{
- unsigned vec_num = vec_oprnds0.length ();
  tree loop_mask
= vect_get_loop_mask (gsi, masks, vec_num * ncopies,
  vectype, i);


Re: [PATCH] c++: Private parent access check for using decls [PR19377]

2021-02-05 Thread Anthony Sharp via Gcc-patches
Hi Marek,

> Pedantically, "Altered function." is not very good, it should say what
> in enforce_access changed.

I would normally 100% agree, but the changes are a bit too complicated
to be concisely (and helpfully) described there. I think the patch
description covers it well enough; not sure what I would say without
having to write a big paragraph there.

> Let's move the test into g++.dg/cpp1z and call it using9.C.

Okie dokie - it's a bit hard to know where stuff's supposed to go in
that folder. I'll put a comment in the testcase mentioning pr19377
just in case there's ever a regression.

> I don't understand the "generate a diagnostic decl location".  Maybe just
> "generate a diagnostic?"

get_class_access_diagnostic_decl returns a decl which is used as the
location that the error-producing code points to as the source of the
problem. It could be better - I will change it to say "Examine certain
special cases to find the decl that is the source of the problem" to
make it a bit clearer.

> These two comments can be merged into one.

Technically yes ... but I like how it is since in a very subtle way it
creates a sense of separation between the first two paragraphs and the
third. The first two paras are sort of an introduction and the third
begins to describe the code.

> I think for comparing a binfo with a type we should use SAME_BINFO_TYPE_P.

Okay, I think that simplifies the code a bit aswell.

> This first term doesn't need to be wrapped in ().

I normally wouldn't use the (), but I think that's how Jason likes it
so that's why I did it. I guess it makes it more readable.

> This could be part of the if above.

Oops - I forgot to change that when I modified the code.

> Just "had" instead of "did had"?

Good spot - that's a spelling mistake on my part. Should be "did have".

> Looks like this line is indented too much (even in the newer patch).

You're right (if you meant access_failure_reason = ak_private), I will change.

If you mean get_class_access_diagnostic_decl (parent_binfo, diag_decl)
then I donno, because Linux Text Editor says both are on column 64.

To be honest, I'm sure there is a way to do it, but I'm not really
sure how to consistently align code. Every text/code editor/browser
seems to give different column numbers (and display it differently)
since they seem to all treat whitespace differently.

> We usually use dg-do compile.

True, but wouldn't that technically be slower? I would agree if it
were more than just error-handling code.

> Best to replace both with
> // { dg-do compile { target c++17 } }

Okie dokie. I did see both being used and I wasn't sure which to go for.

I'll probably send another patch over tomorrow.

Anthony


On Fri, 5 Feb 2021 at 16:06, Marek Polacek  wrote:
>
> Hi,
>
> a few comments:
>
> On Fri, Feb 05, 2021 at 03:39:25PM +, Anthony Sharp via Gcc-patches wrote:
> > 2021-02-05  Anthony Sharp  
> >
> >   * semantics.c (get_class_access_diagnostic_decl): New function.
> >   (enforce_access): Altered function.
>
> Pedantically, "Altered function." is not very good, it should say what
> in enforce_access changed.
>
> > gcc/testsuite/ChangeLog:
> >
> > 2021-02-05  Anthony Sharp  
> >
> >   * g++.dg/pr19377.C: New test.
>
> Let's move the test into g++.dg/cpp1z and call it using9.C.
>
> >  gcc/cp/semantics.c | 89 ++
> >  gcc/testsuite/g++.dg/pr19377.C | 28 +++
> >  2 files changed, 98 insertions(+), 19 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.dg/pr19377.C
> >
> > diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
> > index 73834467fca..ffb627f08ea 100644
> > --- a/gcc/cp/semantics.c
> > +++ b/gcc/cp/semantics.c
> > @@ -256,6 +256,58 @@ pop_to_parent_deferring_access_checks (void)
> >  }
> >  }
> >
> > +/* Called from enforce_access.  A class has attempted (but failed) to 
> > access
> > +   DECL.  It is already established that a baseclass of that class,
> > +   PARENT_BINFO, has private access to DECL.  Examine certain special 
> > cases to
> > +   generate a diagnostic decl location.  If no special cases are found, 
> > simply
>
> I don't understand the "generate a diagnostic decl location".  Maybe just
> "generate a diagnostic?"
>
> > +   return DECL.  */
> > +
> > +static tree
> > +get_class_access_diagnostic_decl (tree parent_binfo, tree decl)
> > +{
> > +  /* When a class is denied access to a decl in a baseclass, most of the
> > + time it is because the decl itself was declared as private at the 
> > point
> > + of declaration.  So, by default, DECL is at fault.
> > +
> > + However, in C++, there are (at least) two situations in which a decl
> > + can be private even though it was not originally defined as such.  */
> > +
> > +  /* These two situations only apply if a baseclass had private access to
> > + DECL (this function is only called if that is the case).  We must 
> > however
> > + also ensure that the reason the parent had private 

Re: [PATCH 3/4] openacc: Fix lowering for derived-type mappings through array elements

2021-02-05 Thread Tobias Burnus

(CC fortran@)

Hi Julian,

not doing an extensive review yet, but the following gives an ICE
with this patch applied. (I believe the others are already in, aren't they?)

type t
 integer :: i, j
end type t
type t2
 type(t) :: b(4)
end type
type(t2) :: var(10)
!$acc update host(var(3)%b(:)%j)
!$acc update host(var(3)%b%j)
end

That's a noncontiguous array – which is permitted for 'update'
and it gives an ICE via:

0x9b0c59 gfc_conv_scalarized_array_ref
../../repos/gcc/gcc/fortran/trans-array.c:3570
0x9b2134 gfc_conv_array_ref(gfc_se*, gfc_array_ref*, gfc_expr*, locus*)
../../repos/gcc/gcc/fortran/trans-array.c:3721
0x9e9cc6 gfc_conv_variable
../../repos/gcc/gcc/fortran/trans-expr.c:2998
0xa22682 gfc_trans_omp_clauses
../../repos/gcc/gcc/fortran/trans-openmp.c:2963

 * * *


+   bool allocatable = false, pointer = false;
+
+   if (lastref && lastref->type == REF_COMPONENT)
+ {
+   gfc_component *c = lastref->u.c.component;
+
+   if (c->ts.type == BT_CLASS)
+ {
+   pointer = CLASS_DATA (c)->attr.class_pointer;
+   allocatable = CLASS_DATA (c)->attr.allocatable;
+ }
+   else
+ {
+   pointer = c->attr.pointer;
+   allocatable = c->attr.allocatable;
+ }
+ }
+


I am not sure how the rest will change, but I was wondering
whether the following helps. I see that 'lastref' is used
elsewhere – hence, I am not sure whether it is indeed better.

symbol_attribute attr = {};
if (n->expr)
  attr = gfc_expr_attr (n->expr);

(Not really looked at the rest before wondering whether
my testcase above is handled – which isn't.)

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf


Re: [committed][nvptx] Set -misa=sm_35 by default

2021-02-05 Thread Thomas Schwinge
Hi Tom!

Ping.


Grüße
 Thomas


On 2021-01-22T16:23:57+0100, I wrote:
> Hi Tom!
>
> Ping.
>
>
> Grüße
>  Thomas
>
>
> On 2021-01-13T12:59:14+0100, I wrote:
>> Hi Tom!
>>
>> On 2020-10-09T13:56:09+0200, Tom de Vries  wrote:
>>> The nvptx-as assembler verifies the ptx code using ptxas, if there's any
>>> in the PATH.
>>>
>>
>> After quite some digression to first add a testsuite to nvptx-tools (see
>>  or just
>> ), which
>> I found advisable generally, and then given the kinds of changes we're
>> now doing :-) -- I've now prepared nvptx-as code changes as discussed in
>>  "nvptx-as
>> should not assume a default sm version".  (Currently testing.)
>>
>>> The default in the nvptx port for -misa=sm_xx is sm_30, but the ptxas of the
>>> latest cuda release (11.1) no longer supports sm_30.
>>>
>>> Consequently we cannot build gcc against that release (although we should
>>> still be able to build without any cuda release).
>>>
>>> Fix this by setting -misa=sm_35 by default.
>>>
>>> Tested check-gcc on nvptx.
>>>
>>> Tested libgomp on x86_64-linux with nvpx accelerator.
>>>
>>> Both build against cuda 9.1.
>>>
>>> Committed to trunk.
>>
>> ACK.
>>
>> What is your opinion about backporting that (plus Tobias' documentation
>> update, plus corresponding web 'changes.html' updates?) to release
>> branches, so that nvptx offloading users may use GCC 10/9/8 with CUDA
>> 11.0+?
>>
>> I don't think losing sm_30 support is a major concern: support for sm_35
>> has been introduced with PTX ISA 3.1, CUDA 5.0, driver r302.
>>
>>
>> Further:
>>
>>> [nvptx] Set -misa=sm_35 by default
>>
>>> PR target/97348
>>> * config/nvptx/nvptx.h (ASM_SPEC): Also pass -m to nvptx-as if
>>> default is used.
>>> * config/nvptx/nvptx.opt (misa): Init with PTX_ISA_SM35.
>>
>>> --- a/gcc/config/nvptx/nvptx.h
>>> +++ b/gcc/config/nvptx/nvptx.h
>>
>>> -#define ASM_SPEC "%{misa=*:-m %*}"
>>> +/* Default needs to be in sync with default for misa in nvptx.opt.
>>> +   We add a default here to work around a hard-coded sm_30 default in
>>> +   nvptx-as.  */
>>> +#define ASM_SPEC "%{misa=*:-m %*; :-m sm_35}"
>>
>>> --- a/gcc/config/nvptx/nvptx.opt
>>> +++ b/gcc/config/nvptx/nvptx.opt
>>
>>> +; Default needs to be in sync with default in ASM_SPEC in nvptx.h.
>>>  misa=
>>> -Target RejectNegative ToLower Joined Enum(ptx_isa) Var(ptx_isa_option) 
>>> Init(PTX_ISA_SM30)
>>> +Target RejectNegative ToLower Joined Enum(ptx_isa) Var(ptx_isa_option) 
>>> Init(PTX_ISA_SM35)
>>>  Specify the version of the ptx ISA to use.
>>
>> As I'd suggested in
>>  "nvptx-as
>> should not assume a default sm version", I'd then push the attached
>> "[nvptx] Let nvptx-as figure out the target architecture [PR97348]" to
>> GCC master branch, OK?  (Currently testing.)
>>
>> That one I wouldn't backport to GCC release branches, so that we don't
>> require users to update nvptx-tools for these builds.
>>
>>
>> Grüße
>>  Thomas


-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
>From eac0d3458f38cd5bb4c930b2887a547b64b046ef Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 13 Jan 2021 09:04:47 +0100
Subject: [PATCH] [nvptx] Let nvptx-as figure out the target architecture
 [PR97348]

... now that it has been enhanced to do so.

This is a follow-up to PR97348 commit 383400a6078d75bbfa1216c9af2c37f7e88740c9
"[nvptx] Set -misa=sm_35 by default".

	gcc/
	PR target/97348
	* config/nvptx/nvptx.h (ASM_SPEC): Don't set.
	* config/nvptx/nvptx.opt (misa): Adjust comment.
---
 gcc/config/nvptx/nvptx.h   | 5 -
 gcc/config/nvptx/nvptx.opt | 1 -
 2 files changed, 6 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index 2451703e77f..1a61e6207f6 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -29,11 +29,6 @@
 
 #define STARTFILE_SPEC "%{mmainkernel:crt0.o}"
 
-/* Default needs to be in sync with default for misa in nvptx.opt.
-   We add a default here to work around a hard-coded sm_30 default in
-   nvptx-as.  */
-#define ASM_SPEC "%{misa=*:-m %*; :-m sm_35}"
-
 #define TARGET_CPU_CPP_BUILTINS()		\
   do		\
 {		\
diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index 51363e4e276..cf7f9022663 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -61,7 +61,6 @@ Enum(ptx_isa) String(sm_30) Value(PTX_ISA_SM30)
 EnumValue
 Enum(ptx_isa) String(sm_35) Value(PTX_ISA_SM35)
 
-; Default needs to be in sync with default in ASM_SPEC in nvptx.h.
 misa=
 Target RejectNegative ToLower Joined Enum(ptx_isa) Var(ptx_isa_option) Init(PTX_ISA_SM35)
 Specify the version of the ptx ISA to use.
-- 
2.17.1



Re: [PATCH] c++: Private parent access check for using decls [PR19377]

2021-02-05 Thread Marek Polacek via Gcc-patches
Hi,

a few comments:

On Fri, Feb 05, 2021 at 03:39:25PM +, Anthony Sharp via Gcc-patches wrote:
> 2021-02-05  Anthony Sharp  
> 
>   * semantics.c (get_class_access_diagnostic_decl): New function.
>   (enforce_access): Altered function.

Pedantically, "Altered function." is not very good, it should say what
in enforce_access changed.

> gcc/testsuite/ChangeLog:
> 
> 2021-02-05  Anthony Sharp  
> 
>   * g++.dg/pr19377.C: New test.

Let's move the test into g++.dg/cpp1z and call it using9.C.

>  gcc/cp/semantics.c | 89 ++
>  gcc/testsuite/g++.dg/pr19377.C | 28 +++
>  2 files changed, 98 insertions(+), 19 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/pr19377.C
> 
> diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
> index 73834467fca..ffb627f08ea 100644
> --- a/gcc/cp/semantics.c
> +++ b/gcc/cp/semantics.c
> @@ -256,6 +256,58 @@ pop_to_parent_deferring_access_checks (void)
>  }
>  }
>  
> +/* Called from enforce_access.  A class has attempted (but failed) to access
> +   DECL.  It is already established that a baseclass of that class,
> +   PARENT_BINFO, has private access to DECL.  Examine certain special cases 
> to
> +   generate a diagnostic decl location.  If no special cases are found, 
> simply

I don't understand the "generate a diagnostic decl location".  Maybe just
"generate a diagnostic?"

> +   return DECL.  */
> +
> +static tree
> +get_class_access_diagnostic_decl (tree parent_binfo, tree decl)
> +{
> +  /* When a class is denied access to a decl in a baseclass, most of the
> + time it is because the decl itself was declared as private at the point
> + of declaration.  So, by default, DECL is at fault.
> +
> + However, in C++, there are (at least) two situations in which a decl
> + can be private even though it was not originally defined as such.  */
> +
> +  /* These two situations only apply if a baseclass had private access to
> + DECL (this function is only called if that is the case).  We must 
> however
> + also ensure that the reason the parent had private access wasn't simply
> + because it was declared as private in the parent.  */

These two comments can be merged into one.

> +  if (context_for_name_lookup (decl) == BINFO_TYPE (parent_binfo))
> +return decl;

I think for comparing a binfo with a type we should use SAME_BINFO_TYPE_P.

> +  /* 1.  If the "using" keyword is used to inherit DECL within a baseclass,
> + this may cause DECL to be private, so we return the using statement as
> + the source of the problem.
> +
> + Scan the fields of PARENT_BINFO and see if there are any using decls.  
> If
> + there are, see if they inherit DECL.  If they do, that's where DECL was
> + truly declared private.  */
> +  for (tree parent_field = TYPE_FIELDS (BINFO_TYPE (parent_binfo));
> +  parent_field;
> +  parent_field = DECL_CHAIN (parent_field))
> +{
> +  if ((TREE_CODE (parent_field) == USING_DECL)

This first term doesn't need to be wrapped in ().

> +  && TREE_PRIVATE (parent_field))
> + {
> +   if (DECL_NAME (decl) == DECL_NAME (parent_field))

This could be part of the if above.  And then we can drop the braces (in the
if and for both).

> + return parent_field;
> + }
> +}
> +
> +  /* 2.  If decl was privately inherited by a baseclass of the current class,
> + then decl will be inaccessible, even though it may originally have
> + been accessible to deriving classes.  In that case, the fault lies with
> + the baseclass that used a private inheritance, so we return the
> + baseclass type as the source of the problem.
> +
> + Since this is the last check, we just assume it's true.  */
> +  return TYPE_NAME (BINFO_TYPE (parent_binfo));
> +}
> +
>  /* If the current scope isn't allowed to access DECL along
> BASETYPE_PATH, give an error, or if we're parsing a function or class
> template, defer the access check to be performed at instantiation time.
> @@ -317,34 +369,33 @@ enforce_access (tree basetype_path, tree decl, tree 
> diag_decl,
>   diag_decl = strip_inheriting_ctors (diag_decl);
>if (complain & tf_error)
>   {
> -   /* We will usually want to point to the same place as
> -  diag_decl but not always.  */
> +   access_kind access_failure_reason = ak_none;
> +
> +   /* By default, using the decl as the source of the problem will
> +  usually give correct results.  */
> tree diag_location = diag_decl;
> -   access_kind parent_access = ak_none;
>  
> -   /* See if any of BASETYPE_PATH's parents had private access
> -  to DECL.  If they did, that will tell us why we don't.  */
> +   /* However, if a parent of BASETYPE_PATH had private access to decl,
> +  then it actually might be the case that the source of the problem
> +  is not DECL.  */
> tree parent_binfo = 

Re: [PATCH] c++: Private parent access check for using decls [PR19377]

2021-02-05 Thread Anthony Sharp via Gcc-patches
Sorry - last one had a formatting error. This one might be better.
Linux text editor doesn't consistently show whitespace for me!

On Fri, 5 Feb 2021 at 15:39, Anthony Sharp  wrote:
>
> > I'm imagining member functions, i.e. A::f() and A2::f(int).
>
> You're right - good spot.
>
> > == is enough, identifiers are unique.
>
>  Done.
>
> > Agreed, but the using-declaration might not be introducing DECL.  This
> > would be another way to skip irrelevant usings.
>
> Okie dokie.
>
> New patch attached (that rhymes?). C++ (compiled only) compiles fine
> and fixes the bug.
>
> Also modified the new regression test to use your overloaded function
> testcase. I made the testcase c++17 only ... I could also add a
> pre-c++17 testcase but not sure if that would really be beneficial?
> Anyways, regression tests (C++ only) now give:
>
> -- G++ --
>
> # of expected passes203708
> # of unexpected failures2
> # of expected failures989
> # of unsupported tests8714
>
> Anthony
>
>
> On Thu, 4 Feb 2021 at 20:55, Jason Merrill  wrote:
> >
> > On 2/4/21 12:46 PM, Anthony Sharp wrote:
> > >> Yes, thanks; it would take a lot to make me request less comments.
> > >
> > > Awesome.
> > >
> > >> The second lines of arguments are indented one space too far in both 
> > >> these calls.
> > >
> > > Oops! I will fix that.
> > >
> > >> Well, I guess it could be using a declaration of the same name from 
> > >> another base.
> > >
> > >   Yes I had been worrying about that.
> > >
> > >> But in that case you could end up with an overload set
> > >> containing both the decl we're complaining about and another of the same
> > >> name from another base, in which case the lookup result would include
> > >> both, and so the comparison would fail and we would fall through to the
> > >> private base assumption.
> > >
> > > I think technically yes ... but also no since such code would not
> > > compile anyways, plus oddly it seems to work anyway. For instance
> > > (assuming I'm understanding correctly), if you do this (with the patch
> > > applied):
> > >
> > > class A
> > > {
> > >protected:
> > >int i;
> > > };
> > >
> > > class A2
> > > {
> > >protected:
> > >int i;
> > > };
> > >
> > > class B:public A, public A2
> > > {
> > >private:
> > >using A::i, A2::i;
> > > };
> > >
> > > class C:public B
> > > {
> > >void f()
> > >{
> > >  A::i = 0;
> > >}
> > > };
> > >
> > > You get:
> > >
> > > error: redeclaration of ‘using A::i’
> > > using A::i;
> > >
> > > note: previous declaration ‘using A2::i’
> > >  using A2::i;
> > >
> > > error: redeclaration of ‘using A2::i’
> > > using A::i, A2::i;
> > >
> > > previous declaration ‘using A::i’
> > > using A::i, A2::i;
> > >
> > > In member function ‘void C::f()’:
> > > error: ‘int A::i’ is private within this context
> > > A::i = 0;
> > >
> > > note: declared private here
> > > using A::i, A2::i;
> > >
> > > Which seems to work (well ... more like fail to compile ... as
> > > expected). Maybe you're imagining a different situation to me?
> >
> > I'm imagining member functions, i.e. A::f() and A2::f(int).
> >
> > > You can even use void f() { i = 0; } and void f() { A2::i = 0; } and
> > > you seem to get the same results either which way (although, to be
> > > fair, if you do A2::i = 0, it suddenly doesn't complain about it being
> > > private anymore [no idea why], it just complains about the
> > > redeclaration , and once you fix the redeclaration, it THEN complains
> > > about being private, so it's got a bit of a quirk - don't think that's
> > > related to the patch though).
> >
> > That sounds fine, one error can hide another.
> >
> > >> But checking the name is a simple way to skip irrelevant usings.
> > >
> > > That does sound like a better way of doing it. Would I just do
> > > cp_tree_equal (DECL_NAME (blah1), DECL_NAME (blah2)) [assuming
> > > DECL_NAME returns a tree], or perhaps DECL_NAME (blah1) == DECL_NAME
> > > (blah2)?
> >
> > == is enough, identifiers are unique.
> >
> > >> Maybe also check that the using is TREE_PRIVATE?
> > >
> > > Would that be necessary? Maybe if you wanted to sanity-check it I
> > > suppose. We know for sure that PARENT_BINFO has private access to
> > > DECL. If we find a using statement introducing DECL in PARENT_BINFO,
> > > then surely the using statement must (by definition) have been
> > > private? If it were not private, then the child class would have been
> > > able to access it, and enforce_access wouldn't have thrown an error.
> > > It doesn't seem to be the case that DECL could be private for any
> > > other reason other than the using decl being private.
> >
> > Agreed, but the using-declaration might not be introducing DECL.  This
> > would be another way to skip irrelevant usings.
> >
> > > Let me know your thoughts and I will update the patch. Thanks for your 
> > > help.
> > >
> > > Anthony
> > >
> > >
> > > On Thu, 4 Feb 2021 at 16:33, Jason Merrill  wrote:

Re: [PATCH] c++: Private parent access check for using decls [PR19377]

2021-02-05 Thread Anthony Sharp via Gcc-patches
> I'm imagining member functions, i.e. A::f() and A2::f(int).

You're right - good spot.

> == is enough, identifiers are unique.

 Done.

> Agreed, but the using-declaration might not be introducing DECL.  This
> would be another way to skip irrelevant usings.

Okie dokie.

New patch attached (that rhymes?). C++ (compiled only) compiles fine
and fixes the bug.

Also modified the new regression test to use your overloaded function
testcase. I made the testcase c++17 only ... I could also add a
pre-c++17 testcase but not sure if that would really be beneficial?
Anyways, regression tests (C++ only) now give:

-- G++ --

# of expected passes203708
# of unexpected failures2
# of expected failures989
# of unsupported tests8714

Anthony


On Thu, 4 Feb 2021 at 20:55, Jason Merrill  wrote:
>
> On 2/4/21 12:46 PM, Anthony Sharp wrote:
> >> Yes, thanks; it would take a lot to make me request less comments.
> >
> > Awesome.
> >
> >> The second lines of arguments are indented one space too far in both these 
> >> calls.
> >
> > Oops! I will fix that.
> >
> >> Well, I guess it could be using a declaration of the same name from 
> >> another base.
> >
> >   Yes I had been worrying about that.
> >
> >> But in that case you could end up with an overload set
> >> containing both the decl we're complaining about and another of the same
> >> name from another base, in which case the lookup result would include
> >> both, and so the comparison would fail and we would fall through to the
> >> private base assumption.
> >
> > I think technically yes ... but also no since such code would not
> > compile anyways, plus oddly it seems to work anyway. For instance
> > (assuming I'm understanding correctly), if you do this (with the patch
> > applied):
> >
> > class A
> > {
> >protected:
> >int i;
> > };
> >
> > class A2
> > {
> >protected:
> >int i;
> > };
> >
> > class B:public A, public A2
> > {
> >private:
> >using A::i, A2::i;
> > };
> >
> > class C:public B
> > {
> >void f()
> >{
> >  A::i = 0;
> >}
> > };
> >
> > You get:
> >
> > error: redeclaration of ‘using A::i’
> > using A::i;
> >
> > note: previous declaration ‘using A2::i’
> >  using A2::i;
> >
> > error: redeclaration of ‘using A2::i’
> > using A::i, A2::i;
> >
> > previous declaration ‘using A::i’
> > using A::i, A2::i;
> >
> > In member function ‘void C::f()’:
> > error: ‘int A::i’ is private within this context
> > A::i = 0;
> >
> > note: declared private here
> > using A::i, A2::i;
> >
> > Which seems to work (well ... more like fail to compile ... as
> > expected). Maybe you're imagining a different situation to me?
>
> I'm imagining member functions, i.e. A::f() and A2::f(int).
>
> > You can even use void f() { i = 0; } and void f() { A2::i = 0; } and
> > you seem to get the same results either which way (although, to be
> > fair, if you do A2::i = 0, it suddenly doesn't complain about it being
> > private anymore [no idea why], it just complains about the
> > redeclaration , and once you fix the redeclaration, it THEN complains
> > about being private, so it's got a bit of a quirk - don't think that's
> > related to the patch though).
>
> That sounds fine, one error can hide another.
>
> >> But checking the name is a simple way to skip irrelevant usings.
> >
> > That does sound like a better way of doing it. Would I just do
> > cp_tree_equal (DECL_NAME (blah1), DECL_NAME (blah2)) [assuming
> > DECL_NAME returns a tree], or perhaps DECL_NAME (blah1) == DECL_NAME
> > (blah2)?
>
> == is enough, identifiers are unique.
>
> >> Maybe also check that the using is TREE_PRIVATE?
> >
> > Would that be necessary? Maybe if you wanted to sanity-check it I
> > suppose. We know for sure that PARENT_BINFO has private access to
> > DECL. If we find a using statement introducing DECL in PARENT_BINFO,
> > then surely the using statement must (by definition) have been
> > private? If it were not private, then the child class would have been
> > able to access it, and enforce_access wouldn't have thrown an error.
> > It doesn't seem to be the case that DECL could be private for any
> > other reason other than the using decl being private.
>
> Agreed, but the using-declaration might not be introducing DECL.  This
> would be another way to skip irrelevant usings.
>
> > Let me know your thoughts and I will update the patch. Thanks for your help.
> >
> > Anthony
> >
> >
> > On Thu, 4 Feb 2021 at 16:33, Jason Merrill  wrote:
> >>
> >> On 2/4/21 11:24 AM, Jason Merrill wrote:
> >>> On 2/4/21 10:02 AM, Anthony Sharp via Gcc-patches wrote:
>  Hello,
> 
>  New bugfix for PR19377
>  (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19377). This is
>  basically an extension of what I did before for PR17314 except it also
>  fixes this other bug.
> 
>  I hope I didn't over-comment in the code ... better to say too much
>  than too little! It's a niche bug so I thought it 

Re: [PATCH] rs6000: Fix MMA API - Add support for compatibility built-ins

2021-02-05 Thread Florian Weimer via Gcc-patches
* Peter Bergner:

> On 2/5/21 4:28 AM, Florian Weimer wrote:
>> * Peter Bergner via Gcc-patches:
>> 
>>> The LLVM and GCC teams agreed to rename the __builtin_mma_assemble_pair and
>>> __builtin_mma_disassemble_pair built-ins to __builtin_vsx_assemble_pair and
>>> __builtin_vsx_disassemble_pair respectively.  It's too late to remove the
>>> old names, so this patch adds support for creating compatibility built-ins
>>> (ie, multiple built-in functions generate the same code) and then creates
>>> compatibility built-ins using the new names.
>> 
>> Maybe add a check that the compatibility builtins are flagged as
>> availble using __has_builtin?
>
> Do you mean add a test in the testsuite for this?  I can check on
> adding that to the test case.

Right, in the test case.  Given that it's a new kind of built-in.
(Not sure if it makes sense.)

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill



Re: [PATCH] rs6000: Fix MMA API - Add support for compatibility built-ins

2021-02-05 Thread Peter Bergner via Gcc-patches
On 2/5/21 4:28 AM, Florian Weimer wrote:
> * Peter Bergner via Gcc-patches:
> 
>> The LLVM and GCC teams agreed to rename the __builtin_mma_assemble_pair and
>> __builtin_mma_disassemble_pair built-ins to __builtin_vsx_assemble_pair and
>> __builtin_vsx_disassemble_pair respectively.  It's too late to remove the
>> old names, so this patch adds support for creating compatibility built-ins
>> (ie, multiple built-in functions generate the same code) and then creates
>> compatibility built-ins using the new names.
> 
> Maybe add a check that the compatibility builtins are flagged as
> availble using __has_builtin?

Do you mean add a test in the testsuite for this?  I can check on
adding that to the test case.

Peter




Re: [PATCH] c++: Fix ICE with invalid using enum [PR96462]

2021-02-05 Thread Jason Merrill via Gcc-patches

On 2/4/21 6:10 PM, Marek Polacek wrote:

Here we ICE in finish_nonmember_using_decl -> lookup_using_decl ->
... -> find_namespace_slot because "name" is not an IDENTIFIER_NODE.
It is a BIT_NOT_EXPR because this broken test uses

   using E::~E; // SCOPE::NAME

A using-decl can't refer to a destructor, and lookup_using_decl already
checks that in the class member case.  But in C++17, we do the "enum
scope is the enclosing scope" block, and so scope gets set to ::, and
we go into the NAMESPACE_DECL block.  In C++20 we don't do it, we go
to the ENUMERAL_TYPE block.

I resorted to hoisting the check along with a diagnostic tweak: we
don't want to print "~E names destructor".

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


gcc/cp/ChangeLog:

PR c++/96462
* name-lookup.c (lookup_using_decl): Hoist the destructor check.

gcc/testsuite/ChangeLog:

PR c++/96462
* g++.dg/cpp2a/using-enum-8.C: New test.
---
  gcc/cp/name-lookup.c  | 15 ---
  gcc/testsuite/g++.dg/cpp2a/using-enum-8.C |  5 +
  2 files changed, 13 insertions(+), 7 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/using-enum-8.C

diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index 52e4a630e25..1f4a7ac1d0c 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -5700,6 +5700,14 @@ lookup_using_decl (tree scope, name_lookup )
scope = ctx;
  }
  
+  /* You cannot using-decl a destructor.  */

+  if (TREE_CODE (lookup.name) == BIT_NOT_EXPR)
+{
+  error ("%<%T%s%D%> names destructor", scope,
+&"::"[scope == global_namespace ? 2 : 0], lookup.name);
+  return NULL_TREE;
+}
+
if (TREE_CODE (scope) == NAMESPACE_DECL)
  {
/* Naming a namespace member.  */
@@ -5739,13 +5747,6 @@ lookup_using_decl (tree scope, name_lookup )
  return NULL_TREE;
}
  
-  /* You cannot using-decl a destructor.  */

-  if (TREE_CODE (lookup.name) == BIT_NOT_EXPR)
-   {
- error ("%<%T::%D%> names destructor", scope, lookup.name);
- return NULL_TREE;
-   }
-
/* Using T::T declares inheriting ctors, even if T is a typedef.  */
if (lookup.name == TYPE_IDENTIFIER (npscope)
  || constructor_name_p (lookup.name, npscope))
diff --git a/gcc/testsuite/g++.dg/cpp2a/using-enum-8.C 
b/gcc/testsuite/g++.dg/cpp2a/using-enum-8.C
new file mode 100644
index 000..9a743a98399
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/using-enum-8.C
@@ -0,0 +1,5 @@
+// PR c++/96462
+// { dg-do compile { target c++11 } }
+
+enum E {};
+using E::~E; // { dg-error "names destructor" }

base-commit: ce5720447c69286599b96bae53ae854b1bbe41fa





Re: [PATCH v2] c++: Fix bogus -Wvolatile warning in C++20 [PR98947]

2021-02-05 Thread Jason Merrill via Gcc-patches

On 2/4/21 6:09 PM, Marek Polacek wrote:

On Thu, Feb 04, 2021 at 03:47:54PM -0500, Jason Merrill via Gcc-patches wrote:

On 2/4/21 1:11 PM, Marek Polacek wrote:

On Thu, Feb 04, 2021 at 10:59:21AM -0500, Jason Merrill wrote:

On 2/3/21 7:03 PM, Marek Polacek wrote:

Since most of volatile is deprecated in C++20, we are required to warn
for compound assignments to volatile variables and so on.  But here we
have

 volatile int x, y, z;
 (b ? x : y) = 1;

and we shouldn't warn, because simple assignments like x = 24; should
not provoke the warning when they are a discarded-value expression.

We warn here because when ?: is used as an lvalue, we transform it in
cp_build_modify_expr/COND_EXPR from (a ? b : c) = rhs to

 (a ? (b = rhs) : (c = rhs))

and build_conditional_expr then calls mark_lvalue_use for the new
artificial assignments


Hmm, that seems wrong; the ?: expression itself does not use lvalue operands
any more than ',' does.  I notice that removing those mark_lvalue_use calls
doesn't regress Wunused-var-10.c, which was added with them in r160289.


The mark_lvalue_use calls didn't strike me as wrong because [expr.cond]/7
says that lvalue-to-rvalue conversion is performed on the second and third
operands.


Only after we've decided (in /6) that the result is a prvalue.


Oh, I see.  :~


With those mark_lvalue_use calls removed, we'd not issue the
warning for

(b ? (x = 2) : y) = 1;
(b ? x : (y = 5)) = 1;


Why wouldn't we?  The assignment should call mark_lvalue_use for the LHS,
which recursively applies it to the arms of the ?:.


Aha: we call mark_lvalue_use_*nonread* and the warning was guarded by
read_p.  I don't know why I did it, I see no regressions if I just
remove the check.  And then I get the expected results.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10?


OK.


-- >8 --
Since most of volatile is deprecated in C++20, we are required to warn
for compound assignments to volatile variables and so on.  But here we
have

   volatile int x, y, z;
   (b ? x : y) = 1;

and we shouldn't warn, because simple assignments like x = 24; should
not provoke the warning when they are a discarded-value expression.

We warn here because when ?: is used as an lvalue, we transform it in
cp_build_modify_expr/COND_EXPR from (a ? b : c) = rhs to

   (a ? (b = rhs) : (c = rhs))

and build_conditional_expr then calls mark_lvalue_use for the new
artificial assignments, which then evokes the warning.  The calls
to mark_lvalue_use were added in r160289 to suppress warnings in
Wunused-var-10.c, but looks like they're no longer needed.

To warn on

 (b ? (x = 2) : y) = 1;
 (b ? x : (y = 5)) = 1;

I've tweaked a check in mark_use/MODIFY_EXPR.

I'd argue this is a regression because GCC 9 doesn't warn.

gcc/cp/ChangeLog:

PR c++/98947
* call.c (build_conditional_expr_1): Don't call mark_lvalue_use
on arg2/arg3.
* expr.c (mark_use) : Don't check read_p when
issuing the -Wvolatile warning.  Only set TREE_THIS_VOLATILE if
a warning was emitted.

gcc/testsuite/ChangeLog:

PR c++/98947
* g++.dg/cpp2a/volatile5.C: New test.
---
  gcc/cp/call.c  |  2 --
  gcc/cp/expr.c  | 14 +++---
  gcc/testsuite/g++.dg/cpp2a/volatile5.C | 15 +++
  3 files changed, 22 insertions(+), 9 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/volatile5.C

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index c7e13f3a22b..4744c9768ec 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -5559,8 +5559,6 @@ build_conditional_expr_1 (const op_location_t ,
&& same_type_p (arg2_type, arg3_type))
  {
result_type = arg2_type;
-  arg2 = mark_lvalue_use (arg2);
-  arg3 = mark_lvalue_use (arg3);
goto valid_operands;
  }
  
diff --git a/gcc/cp/expr.c b/gcc/cp/expr.c

index 480e740f08c..d16d1896f2d 100644
--- a/gcc/cp/expr.c
+++ b/gcc/cp/expr.c
@@ -224,17 +224,17 @@ mark_use (tree expr, bool rvalue_p, bool read_p,
 a volatile-qualified type is deprecated unless the assignment
 is either a discarded-value expression or appears in an
 unevaluated context."  */
- if (read_p
- && !cp_unevaluated_operand
+ if (!cp_unevaluated_operand
  && (TREE_THIS_VOLATILE (lhs)
  || CP_TYPE_VOLATILE_P (TREE_TYPE (lhs)))
  && !TREE_THIS_VOLATILE (expr))
{
- warning_at (location_of (expr), OPT_Wvolatile,
- "using value of simple assignment with %-"
- "qualified left operand is deprecated");
- /* Make sure not to warn about this assignment again.  */
- TREE_THIS_VOLATILE (expr) = true;
+ if (warning_at (location_of (expr), OPT_Wvolatile,
+ "using value of simple assignment with "
+ "%-qualified left 

Re: [PATCH] Fix Ada bootstrap failure on Cygwin since switch to C++11 (PR98590)

2021-02-05 Thread Arnaud Charlet
> > We'd rather not have PR references in the source files, so please remove it
> > (it will be there as part of the commit log and git annotate).
> >
> > OK with the comment updated.
> 
> Thanks, here's the revised patch.

OK, thanks.

> gcc/ada/
> 
> 2021-02-05  Mikael Pettersson  
> 
> PR bootstrap/98590
> * cstreams.c: Ensure fileno_unlocked() is visible on Cygwin.
> 
> diff --git a/gcc/ada/cstreams.c b/gcc/ada/cstreams.c
> index 4e00dedbbd6..7d64277110b 100644
> --- a/gcc/ada/cstreams.c
> +++ b/gcc/ada/cstreams.c
> @@ -37,6 +37,11 @@
>  #define _FILE_OFFSET_BITS 64
>  /* the define above will make off_t a 64bit type on GNU/Linux */
> 
> +/* tell Cygwin's  to expose fileno_unlocked() */
> +#if defined(__CYGWIN__) && !defined(__CYGWIN32__) && !defined(_GNU_SOURCE)
> +#define _GNU_SOURCE
> +#endif
> +
>  #include 
>  #include 
>  #include 


RE: [PATCH] tree-optimization/97236 - fix bad use of VMAT_CONTIGUOUS

2021-02-05 Thread Richard Biener
On Fri, 5 Feb 2021, Kyrylo Tkachov wrote:

> Hi Richard,
> 
> > -Original Message-
> > From: Gcc-patches  On Behalf Of
> > Richard Biener
> > Sent: 01 October 2020 14:15
> > To: gcc-patches@gcc.gnu.org
> > Subject: [PATCH] tree-optimization/97236 - fix bad use of
> > VMAT_CONTIGUOUS
> > 
> > This avoids using VMAT_CONTIGUOUS with single-element interleaving
> > when using V1mode vectors.  Instead keep VMAT_ELEMENTWISE but
> > continue to avoid load-lanes and gathers.
> > 
> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> 
> I've checked that this fix also fixes the recently-reported PR 98949 on 
> aarch64 on the GCC 9 branch.
> I've bootstrapped and tested it on the branch on aarch64-none-linux.
> Is it okay to backport to the branch?

Yes.

Thanks,
Richard.

> Thanks,
> Kyrill
> 
> > 
> > Richard.
> > 
> > 2020-10-01  Richard Biener  
> > 
> > PR tree-optimization/97236
> > * tree-vect-stmts.c (get_group_load_store_type): Keep
> > VMAT_ELEMENTWISE for single-element vectors.
> > 
> > * gcc.dg/vect/pr97236.c: New testcase.
> > ---
> >  gcc/testsuite/gcc.dg/vect/pr97236.c | 43
> > +
> >  gcc/tree-vect-stmts.c   | 20 ++
> >  2 files changed, 52 insertions(+), 11 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/pr97236.c
> > 
> > diff --git a/gcc/testsuite/gcc.dg/vect/pr97236.c
> > b/gcc/testsuite/gcc.dg/vect/pr97236.c
> > new file mode 100644
> > index 000..03e0cc38984
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/pr97236.c
> > @@ -0,0 +1,43 @@
> > +typedef unsigned char __uint8_t;
> > +typedef __uint8_t uint8_t;
> > +typedef struct plane_t {
> > +  uint8_t *p_pixels;
> > +  int i_lines;
> > +  int i_pitch;
> > +} plane_t;
> > +
> > +typedef struct {
> > +  plane_t p[5];
> > +} picture_t;
> > +
> > +#define N 4
> > +
> > +void __attribute__((noipa))
> > +picture_Clone(picture_t *picture, picture_t *res)
> > +{
> > +  for (int i = 0; i < N; i++) {
> > +res->p[i].p_pixels = picture->p[i].p_pixels;
> > +res->p[i].i_lines = picture->p[i].i_lines;
> > +res->p[i].i_pitch = picture->p[i].i_pitch;
> > +  }
> > +}
> > +
> > +int
> > +main()
> > +{
> > +  picture_t aaa, bbb;
> > +  uint8_t pixels[10] = {1, 1, 1, 1, 1, 1, 1, 1};
> > +
> > +  for (unsigned i = 0; i < N; i++)
> > +aaa.p[i].p_pixels = pixels;
> > +
> > +  picture_Clone (, );
> > +
> > +  uint8_t c;
> > +  for (unsigned i = 0; i < N; i++)
> > +c += bbb.p[i].p_pixels[0];
> > +
> > +  if (c != N)
> > +__builtin_abort ();
> > +  return 0;
> > +}
> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> > index 191957c3543..3575f25241f 100644
> > --- a/gcc/tree-vect-stmts.c
> > +++ b/gcc/tree-vect-stmts.c
> > @@ -2235,25 +2235,23 @@ get_group_load_store_type (vec_info *vinfo,
> > stmt_vec_info stmt_info,
> >   /* First cope with the degenerate case of a single-element
> >  vector.  */
> >   if (known_eq (TYPE_VECTOR_SUBPARTS (vectype), 1U))
> > -   *memory_access_type = VMAT_CONTIGUOUS;
> > +   ;
> > 
> >   /* Otherwise try using LOAD/STORE_LANES.  */
> > - if (*memory_access_type == VMAT_ELEMENTWISE
> > - && (vls_type == VLS_LOAD
> > - ? vect_load_lanes_supported (vectype, group_size,
> > masked_p)
> > - : vect_store_lanes_supported (vectype, group_size,
> > -   masked_p)))
> > + else if (vls_type == VLS_LOAD
> > +  ? vect_load_lanes_supported (vectype, group_size,
> > masked_p)
> > +  : vect_store_lanes_supported (vectype, group_size,
> > +masked_p))
> > {
> >   *memory_access_type = VMAT_LOAD_STORE_LANES;
> >   overrun_p = would_overrun_p;
> > }
> > 
> >   /* If that fails, try using permuting loads.  */
> > - if (*memory_access_type == VMAT_ELEMENTWISE
> > - && (vls_type == VLS_LOAD
> > - ? vect_grouped_load_supported (vectype,
> > single_element_p,
> > -group_size)
> > - : vect_grouped_store_supported (vectype, group_size)))
> > + else if (vls_type == VLS_LOAD
> > +  ? vect_grouped_load_supported (vectype,
> > single_element_p,
> > + group_size)
> > +  : vect_grouped_store_supported (vectype, group_size))
> > {
> >   *memory_access_type = VMAT_CONTIGUOUS_PERMUTE;
> >   overrun_p = would_overrun_p;
> > --
> > 2.26.2
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


RE: [PATCH] tree-optimization/97236 - fix bad use of VMAT_CONTIGUOUS

2021-02-05 Thread Kyrylo Tkachov via Gcc-patches
Hi Richard,

> -Original Message-
> From: Gcc-patches  On Behalf Of
> Richard Biener
> Sent: 01 October 2020 14:15
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH] tree-optimization/97236 - fix bad use of
> VMAT_CONTIGUOUS
> 
> This avoids using VMAT_CONTIGUOUS with single-element interleaving
> when using V1mode vectors.  Instead keep VMAT_ELEMENTWISE but
> continue to avoid load-lanes and gathers.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

I've checked that this fix also fixes the recently-reported PR 98949 on aarch64 
on the GCC 9 branch.
I've bootstrapped and tested it on the branch on aarch64-none-linux.
Is it okay to backport to the branch?

Thanks,
Kyrill

> 
> Richard.
> 
> 2020-10-01  Richard Biener  
> 
>   PR tree-optimization/97236
>   * tree-vect-stmts.c (get_group_load_store_type): Keep
>   VMAT_ELEMENTWISE for single-element vectors.
> 
>   * gcc.dg/vect/pr97236.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/vect/pr97236.c | 43
> +
>  gcc/tree-vect-stmts.c   | 20 ++
>  2 files changed, 52 insertions(+), 11 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr97236.c
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/pr97236.c
> b/gcc/testsuite/gcc.dg/vect/pr97236.c
> new file mode 100644
> index 000..03e0cc38984
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr97236.c
> @@ -0,0 +1,43 @@
> +typedef unsigned char __uint8_t;
> +typedef __uint8_t uint8_t;
> +typedef struct plane_t {
> +  uint8_t *p_pixels;
> +  int i_lines;
> +  int i_pitch;
> +} plane_t;
> +
> +typedef struct {
> +  plane_t p[5];
> +} picture_t;
> +
> +#define N 4
> +
> +void __attribute__((noipa))
> +picture_Clone(picture_t *picture, picture_t *res)
> +{
> +  for (int i = 0; i < N; i++) {
> +res->p[i].p_pixels = picture->p[i].p_pixels;
> +res->p[i].i_lines = picture->p[i].i_lines;
> +res->p[i].i_pitch = picture->p[i].i_pitch;
> +  }
> +}
> +
> +int
> +main()
> +{
> +  picture_t aaa, bbb;
> +  uint8_t pixels[10] = {1, 1, 1, 1, 1, 1, 1, 1};
> +
> +  for (unsigned i = 0; i < N; i++)
> +aaa.p[i].p_pixels = pixels;
> +
> +  picture_Clone (, );
> +
> +  uint8_t c;
> +  for (unsigned i = 0; i < N; i++)
> +c += bbb.p[i].p_pixels[0];
> +
> +  if (c != N)
> +__builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index 191957c3543..3575f25241f 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -2235,25 +2235,23 @@ get_group_load_store_type (vec_info *vinfo,
> stmt_vec_info stmt_info,
> /* First cope with the degenerate case of a single-element
>vector.  */
> if (known_eq (TYPE_VECTOR_SUBPARTS (vectype), 1U))
> - *memory_access_type = VMAT_CONTIGUOUS;
> + ;
> 
> /* Otherwise try using LOAD/STORE_LANES.  */
> -   if (*memory_access_type == VMAT_ELEMENTWISE
> -   && (vls_type == VLS_LOAD
> -   ? vect_load_lanes_supported (vectype, group_size,
> masked_p)
> -   : vect_store_lanes_supported (vectype, group_size,
> - masked_p)))
> +   else if (vls_type == VLS_LOAD
> +? vect_load_lanes_supported (vectype, group_size,
> masked_p)
> +: vect_store_lanes_supported (vectype, group_size,
> +  masked_p))
>   {
> *memory_access_type = VMAT_LOAD_STORE_LANES;
> overrun_p = would_overrun_p;
>   }
> 
> /* If that fails, try using permuting loads.  */
> -   if (*memory_access_type == VMAT_ELEMENTWISE
> -   && (vls_type == VLS_LOAD
> -   ? vect_grouped_load_supported (vectype,
> single_element_p,
> -  group_size)
> -   : vect_grouped_store_supported (vectype, group_size)))
> +   else if (vls_type == VLS_LOAD
> +? vect_grouped_load_supported (vectype,
> single_element_p,
> +   group_size)
> +: vect_grouped_store_supported (vectype, group_size))
>   {
> *memory_access_type = VMAT_CONTIGUOUS_PERMUTE;
> overrun_p = would_overrun_p;
> --
> 2.26.2


Re: [PATCH] Fix Ada bootstrap failure on Cygwin since switch to C++11 (PR98590)

2021-02-05 Thread Mikael Pettersson via Gcc-patches
On Fri, Feb 5, 2021 at 9:35 AM Arnaud Charlet  wrote:
>
> > This fixes the bootstrap failure with Ada on Cygwin since the switch
> > to C++11. The configure checks detect that fileno_unlocked () is
> > present, but when Ada's cstreams.c is compiled in C++11 mode,
> >  does not declare it, causing a hard error.
> >
> > Fixed by defining _GNU_SOURCE before including .
> >
> > Ok for the master branch?
> >
> > gcc/ada/
> >
> > 2021-02-03  Mikael Pettersson  
> >
> > PR bootstrap/98590
> > * cstreams.c: Ensure fileno_unlocked() is visible on Cygwin.
>
> We'd rather not have PR references in the source files, so please remove it
> (it will be there as part of the commit log and git annotate).
>
> OK with the comment updated.

Thanks, here's the revised patch.

gcc/ada/

2021-02-05  Mikael Pettersson  

PR bootstrap/98590
* cstreams.c: Ensure fileno_unlocked() is visible on Cygwin.

diff --git a/gcc/ada/cstreams.c b/gcc/ada/cstreams.c
index 4e00dedbbd6..7d64277110b 100644
--- a/gcc/ada/cstreams.c
+++ b/gcc/ada/cstreams.c
@@ -37,6 +37,11 @@
 #define _FILE_OFFSET_BITS 64
 /* the define above will make off_t a 64bit type on GNU/Linux */

+/* tell Cygwin's  to expose fileno_unlocked() */
+#if defined(__CYGWIN__) && !defined(__CYGWIN32__) && !defined(_GNU_SOURCE)
+#define _GNU_SOURCE
+#endif
+
 #include 
 #include 
 #include 
From f9b9b44f5341b0d8df7b1bb00de6f7231258891b Mon Sep 17 00:00:00 2001
From: Mikael Pettersson 
Date: Fri, 5 Feb 2021 14:43:52 +0100
Subject: [PATCH] Ensure fileno_unlocked() is visible on Cygwin.

---
 gcc/ada/cstreams.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/ada/cstreams.c b/gcc/ada/cstreams.c
index 4e00dedbbd6..7d64277110b 100644
--- a/gcc/ada/cstreams.c
+++ b/gcc/ada/cstreams.c
@@ -37,6 +37,11 @@
 #define _FILE_OFFSET_BITS 64
 /* the define above will make off_t a 64bit type on GNU/Linux */
 
+/* tell Cygwin's  to expose fileno_unlocked() */
+#if defined(__CYGWIN__) && !defined(__CYGWIN32__) && !defined(_GNU_SOURCE)
+#define _GNU_SOURCE
+#endif
+
 #include 
 #include 
 #include 
-- 
2.26.2



Re: [PATCH] tree-optimization/98855 - redo BB vectorization costing

2021-02-05 Thread Richard Biener
On Fri, 5 Feb 2021, Richard Biener wrote:

> On Fri, 5 Feb 2021, Richard Sandiford wrote:
> 
> > Richard Biener  writes:
> > > On Fri, 5 Feb 2021, Richard Sandiford wrote:
> > >> Richard Biener  writes:
> > >> > +  /* First produce cost vectors sorted by loop index.  */
> > >> > +  auto_vec >
> > >> > +li_scalar_costs (scalar_costs.length ());
> > >> > +  auto_vec >
> > >> > +li_vector_costs (vector_costs.length ());
> > >> > +  FOR_EACH_VEC_ELT (scalar_costs, i, cost)
> > >> > +{
> > >> > +  unsigned l = gimple_bb 
> > >> > (cost->stmt_info->stmt)->loop_father->num;
> > >> > +  li_scalar_costs.quick_push (std::make_pair (l, cost));
> > >> > +}
> > >> > +  unsigned l = li_scalar_costs[0].first;
> > >> 
> > >> Is this just to silence an unused warning?  Might be worth a comment if 
> > >> so
> > >> (although couldn't we just use 0?).
> > >
> > > The issue is that not all vector_costs entries have a stmt_info so 
> > > this uses a random loop also used in the scalar code (that's going
> > > to be the correct loop in case there's only one loop involved which
> > > is probably 99% of the cases).  I'mm add a comment.
> > 
> > Hmm, OK, but I thought the comment below was saying that all such
> > cases follow the relevant statement, instead of coming first.
> > Is this situation specific to the first run of costs?
> 
> It's to some extent out of caution since I did not try to provoke
> the situation but I think there are some paths in costing that
> fail to set a stmt_info for costing of an acutal stmt.  While
> we do want to fix this I don't think it's appropriate to ICE
> for the cases at this point.

So the following passes vect.exp testing on x86_64 but I'm quite
sure it will eventually trigger.

Richard.

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index b9f12c30fb8..0737621cb1b 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -4412,13 +4412,14 @@ vect_bb_vectorization_profitable_p (bb_vec_info 
bb_vinfo,
 }
   /* Use a random used loop as fallback in case the first vector_costs
  entry does not have a stmt_info associated with it.  */
-  unsigned l = li_scalar_costs[0].first;
+  unsigned l = -1u;
   FOR_EACH_VEC_ELT (vector_costs, i, cost)
 {
   /* We inherit from the previous COST, invariants, externals and
 extracts immediately follow the cost for the related stmt.  */
   if (cost->stmt_info)
l = gimple_bb (cost->stmt_info->stmt)->loop_father->num;
+  gcc_assert (l != -1u);
   li_vector_costs.quick_push (std::make_pair (l, cost));
 }
   li_scalar_costs.qsort (li_cost_vec_cmp);



Re: [PATCH] tree-optimization/98855 - redo BB vectorization costing

2021-02-05 Thread Richard Biener
On Fri, 5 Feb 2021, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Fri, 5 Feb 2021, Richard Sandiford wrote:
> >> Richard Biener  writes:
> >> > +  /* First produce cost vectors sorted by loop index.  */
> >> > +  auto_vec >
> >> > +li_scalar_costs (scalar_costs.length ());
> >> > +  auto_vec >
> >> > +li_vector_costs (vector_costs.length ());
> >> > +  FOR_EACH_VEC_ELT (scalar_costs, i, cost)
> >> > +{
> >> > +  unsigned l = gimple_bb (cost->stmt_info->stmt)->loop_father->num;
> >> > +  li_scalar_costs.quick_push (std::make_pair (l, cost));
> >> > +}
> >> > +  unsigned l = li_scalar_costs[0].first;
> >> 
> >> Is this just to silence an unused warning?  Might be worth a comment if so
> >> (although couldn't we just use 0?).
> >
> > The issue is that not all vector_costs entries have a stmt_info so 
> > this uses a random loop also used in the scalar code (that's going
> > to be the correct loop in case there's only one loop involved which
> > is probably 99% of the cases).  I'mm add a comment.
> 
> Hmm, OK, but I thought the comment below was saying that all such
> cases follow the relevant statement, instead of coming first.
> Is this situation specific to the first run of costs?

It's to some extent out of caution since I did not try to provoke
the situation but I think there are some paths in costing that
fail to set a stmt_info for costing of an acutal stmt.  While
we do want to fix this I don't think it's appropriate to ICE
for the cases at this point.

Richard.

> Richard
> 
> >
> >> > +  FOR_EACH_VEC_ELT (vector_costs, i, cost)
> >> > +{
> >> > +  /* We inherit from the previous SI, invariants, externals and
> >> > + extracts immediately follow the cost for the related stmt.  */
> >> 
> >> Looks like the comment predates s/SI/COST/.
> >
> > Indeed, will fix and push.
> >
> > Thanks,
> > Richard.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [r11-7112 Regression] FAIL: libgomp.oacc-fortran/array-stride-dt-1.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -Os (test for excess errors) on Linux/x86_64

2021-02-05 Thread Julian Brown
On Thu, 4 Feb 2021 20:35:32 -0800
sunil.k.pandey  wrote:

> On Linux/x86_64,
> 
> 9a4d32f85ccebc0ee4b24e6d9d7a4f11c04d7146 is the first bad commit
> commit 9a4d32f85ccebc0ee4b24e6d9d7a4f11c04d7146
> Author: Julian Brown 
> Date:   Tue Feb 2 03:44:34 2021 -0800
> 
> openacc: Allow strided arrays in update directives
> 
> caused
> 
> FAIL: gfortran.dg/goacc/array-with-dt-2.f90   -O  (internal compiler
> error) FAIL: gfortran.dg/goacc/array-with-dt-2.f90   -O  (test for
> excess errors) FAIL: libgomp.oacc-fortran/array-stride-dt-1.f90
> -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O0
> (internal compiler error) FAIL:
> libgomp.oacc-fortran/array-stride-dt-1.f90 -DACC_DEVICE_TYPE_host=1
> -DACC_MEM_SHARED=1 -foffload=disable  -O0  (test for excess errors)
> FAIL: libgomp.oacc-fortran/array-stride-dt-1.f90
> -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O1
> (internal compiler error) FAIL:
> libgomp.oacc-fortran/array-stride-dt-1.f90 -DACC_DEVICE_TYPE_host=1
> -DACC_MEM_SHARED=1 -foffload=disable  -O1  (test for excess errors)
> FAIL: libgomp.oacc-fortran/array-stride-dt-1.f90
> -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2
> (internal compiler error) FAIL:
> libgomp.oacc-fortran/array-stride-dt-1.f90 -DACC_DEVICE_TYPE_host=1
> -DACC_MEM_SHARED=1 -foffload=disable  -O2  (test for excess errors)
> FAIL: libgomp.oacc-fortran/array-stride-dt-1.f90
> -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O3
> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer
> -finline-functions  (internal compiler error) FAIL:
> libgomp.oacc-fortran/array-stride-dt-1.f90 -DACC_DEVICE_TYPE_host=1
> -DACC_MEM_SHARED=1 -foffload=disable  -O3 -fomit-frame-pointer
> -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for
> excess errors) FAIL: libgomp.oacc-fortran/array-stride-dt-1.f90
> -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O3 -g
>  (internal compiler error) FAIL:
> libgomp.oacc-fortran/array-stride-dt-1.f90 -DACC_DEVICE_TYPE_host=1
> -DACC_MEM_SHARED=1 -foffload=disable  -O3 -g  (test for excess
> errors) FAIL: libgomp.oacc-fortran/array-stride-dt-1.f90
> -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -Os
> (internal compiler error) FAIL:
> libgomp.oacc-fortran/array-stride-dt-1.f90 -DACC_DEVICE_TYPE_host=1
> -DACC_MEM_SHARED=1 -foffload=disable  -Os  (test for excess errors)

These will be fixed by:

https://gcc.gnu.org/pipermail/gcc-patches/2021-February/564711.html

I tested on a tree with that patch already applied (and evidently
forgot it was a dependency).

Thanks,

Julian


Re: [PATCH] tree-optimization/98855 - redo BB vectorization costing

2021-02-05 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Fri, 5 Feb 2021, Richard Sandiford wrote:
>> Richard Biener  writes:
>> > +  /* First produce cost vectors sorted by loop index.  */
>> > +  auto_vec >
>> > +li_scalar_costs (scalar_costs.length ());
>> > +  auto_vec >
>> > +li_vector_costs (vector_costs.length ());
>> > +  FOR_EACH_VEC_ELT (scalar_costs, i, cost)
>> > +{
>> > +  unsigned l = gimple_bb (cost->stmt_info->stmt)->loop_father->num;
>> > +  li_scalar_costs.quick_push (std::make_pair (l, cost));
>> > +}
>> > +  unsigned l = li_scalar_costs[0].first;
>> 
>> Is this just to silence an unused warning?  Might be worth a comment if so
>> (although couldn't we just use 0?).
>
> The issue is that not all vector_costs entries have a stmt_info so 
> this uses a random loop also used in the scalar code (that's going
> to be the correct loop in case there's only one loop involved which
> is probably 99% of the cases).  I'mm add a comment.

Hmm, OK, but I thought the comment below was saying that all such
cases follow the relevant statement, instead of coming first.
Is this situation specific to the first run of costs?

Richard

>
>> > +  FOR_EACH_VEC_ELT (vector_costs, i, cost)
>> > +{
>> > +  /* We inherit from the previous SI, invariants, externals and
>> > +   extracts immediately follow the cost for the related stmt.  */
>> 
>> Looks like the comment predates s/SI/COST/.
>
> Indeed, will fix and push.
>
> Thanks,
> Richard.


Re: [PATCH] tree-optimization/98855 - redo BB vectorization costing

2021-02-05 Thread Richard Biener
On Fri, 5 Feb 2021, Richard Sandiford wrote:

> Richard Biener  writes:
> > The following attempts to account for the fact that BB vectorization
> > regions now can span multiple loop levels and that an unprofitable
> > inner loop vectorization shouldn't be offsetted by a profitable
> > outer loop vectorization to make it overall profitable.
> >
> > For now I've implemented a heuristic based on the premise that
> > vectorization should be profitable even if loops may not be entered
> > or if they iterate any number of times.  Especially the first
> > assumption then requires that stmts directly belonging to loop A
> > need to be costed separately from stmts belonging to another loop
> > which also simplifies the implementation.
> >
> > On x86 the added testcase has in the outer loop
> >
> > t.c:38:20: note: Cost model analysis for part in loop 1:
> >   Vector cost: 56
> >   Scalar cost: 192
> >
> > and the inner loop
> >
> > t.c:38:20: note: Cost model analysis for part in loop 2:
> >   Vector cost: 132
> >   Scalar cost: 48
> >
> > and thus the vectorization is considered not profitable
> > (note the same would happen in case the 2nd cost were for
> > a loop outer to the 1st costing).
> >
> > Future enhancements may consider static knowledge of whether
> > a loop is always entered which would allow some inefficiency
> > in the vectorization of its loop header.  Likewise stmts only
> > reachable from a loop exit can be treated this way.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu and it fixes
> > the regression reported in the PR.
> >
> > Does this look sensible and good enough for GCC 11?
> 
> I don't know the new SLP code very well, but FWIW it seems reasonable
> to me.
> 
> > +/* Comparator for the loop-index sorted cost vectors.  */
> > +
> > +static int
> > +li_cost_vec_cmp (const void *a_, const void *b_)
> > +{
> > +  auto *a = (const std::pair *)a_;
> > +  auto *b = (const std::pair *)b_;
> > +  if (a->first < b->first)
> > +return -1;
> > +  else if (a->first == b->first)
> > +return 0;
> > +  return 1;
> > +}
> 
> This isn't a stable sort.  (Or isn't that an issue these days?)
> 
> > +  /* First produce cost vectors sorted by loop index.  */
> > +  auto_vec >
> > +li_scalar_costs (scalar_costs.length ());
> > +  auto_vec >
> > +li_vector_costs (vector_costs.length ());
> > +  FOR_EACH_VEC_ELT (scalar_costs, i, cost)
> > +{
> > +  unsigned l = gimple_bb (cost->stmt_info->stmt)->loop_father->num;
> > +  li_scalar_costs.quick_push (std::make_pair (l, cost));
> > +}
> > +  unsigned l = li_scalar_costs[0].first;
> 
> Is this just to silence an unused warning?  Might be worth a comment if so
> (although couldn't we just use 0?).

The issue is that not all vector_costs entries have a stmt_info so 
this uses a random loop also used in the scalar code (that's going
to be the correct loop in case there's only one loop involved which
is probably 99% of the cases).  I'mm add a comment.

> > +  FOR_EACH_VEC_ELT (vector_costs, i, cost)
> > +{
> > +  /* We inherit from the previous SI, invariants, externals and
> > +extracts immediately follow the cost for the related stmt.  */
> 
> Looks like the comment predates s/SI/COST/.

Indeed, will fix and push.

Thanks,
Richard.


Re: [r11-7113 Regression] FAIL: gfortran.dg/goacc/derived-chartypes-2.f90 -O (test for excess errors) on Linux/x86_64

2021-02-05 Thread Julian Brown
On Thu, 4 Feb 2021 20:35:32 -0800
sunil.k.pandey  wrote:

> On Linux/x86_64,
> 
> b2d84e9f9cccbe4ee662f7002b83105629d09939 is the first bad commit
> commit b2d84e9f9cccbe4ee662f7002b83105629d09939
> Author: Julian Brown 
> Date:   Thu Feb 4 10:13:22 2021 -0800
> 
> openacc: Tests for character types in derived-type mappings
> 
> caused
> 
> FAIL: gfortran.dg/goacc/derived-chartypes-1.f90   -O  (internal
> compiler error) FAIL: gfortran.dg/goacc/derived-chartypes-1.f90   -O
> (test for excess errors) FAIL:
> gfortran.dg/goacc/derived-chartypes-2.f90   -O  (internal compiler
> error) FAIL: gfortran.dg/goacc/derived-chartypes-2.f90   -O  (test
> for excess errors)

Oops -- I tested on a tree with the following patch already applied:

https://gcc.gnu.org/pipermail/gcc-patches/2021-February/564711.html

That patch fixes these failures.

Thanks,

Julian


[PATCH] opts: fix handling of -fpatchable-function-entries option

2021-02-05 Thread Martin Liška

Hello.

As seen the flag -fpatchable-function-entry is properly marked as Optimization.
However, it's the argument is parsed early and stored into the following tuple:

; How many NOP insns to place at each function entry by default
Variable
HOST_WIDE_INT function_entry_patch_area_size

; And how far the real asm entry point is into this area
Variable
HOST_WIDE_INT function_entry_patch_area_start

That does not work with set_current_function where per-function arguments are 
restored.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

PR lto/98971
* cfgexpand.c (pass_expand::execute): Parse per-function option
flag_patchable_function_entry and use it.
* common.opt: Remove function_entry_patch_area_size and
function_entry_patch_area_start global variables.
* opts.c (parse_and_check_patch_area): New function.
(common_handle_option): Use it.
* opts.h (parse_and_check_patch_area): New function.
* toplev.c (process_options): Parse and use
function_entry_patch_area_size.
---
 gcc/cfgexpand.c |  6 +++--
 gcc/common.opt  | 10 +---
 gcc/opts.c  | 65 +++--
 gcc/opts.h  |  4 +++
 gcc/toplev.c|  6 -
 5 files changed, 55 insertions(+), 36 deletions(-)

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 8d20ca6cefb..aef9e916fcd 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -73,6 +73,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-address.h"
 #include "output.h"
 #include "builtins.h"
+#include "opts.h"
 
 /* Some systems use __main in a way incompatible with its use in gcc, in these

cases use the macros NAME__MAIN to give a quoted symbol and SYMBOL__MAIN to
@@ -6845,8 +6846,9 @@ pass_expand::execute (function *fun)
   if (crtl->tail_call_emit)
 fixup_tail_calls ();
 
-  unsigned HOST_WIDE_INT patch_area_size = function_entry_patch_area_size;

-  unsigned HOST_WIDE_INT patch_area_entry = function_entry_patch_area_start;
+  HOST_WIDE_INT patch_area_size, patch_area_entry;
+  parse_and_check_patch_area (flag_patchable_function_entry, false,
+ _area_size, _area_entry);
 
   tree patchable_function_entry_attr

 = lookup_attribute ("patchable_function_entry",
diff --git a/gcc/common.opt b/gcc/common.opt
index a8a2b67a99d..c75dd36843e 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -144,14 +144,6 @@ bool flag_stack_usage_info = false
 Variable
 int flag_debug_asm
 
-; How many NOP insns to place at each function entry by default

-Variable
-HOST_WIDE_INT function_entry_patch_area_size
-
-; And how far the real asm entry point is into this area
-Variable
-HOST_WIDE_INT function_entry_patch_area_start
-
 ; Balance between GNAT encodings and standard DWARF to emit.
 Variable
 enum dwarf_gnat_encodings gnat_encodings = DWARF_GNAT_ENCODINGS_DEFAULT
@@ -2309,7 +2301,7 @@ Common Var(flag_profile_reorder_functions) Optimization
 Enable function reordering that improves code placement.
 
 fpatchable-function-entry=

-Common Joined Optimization
+Common Var(flag_patchable_function_entry) Joined Optimization
 Insert NOP instructions at each function entry.
 
 frandom-seed

diff --git a/gcc/opts.c b/gcc/opts.c
index 1f1cf8388f7..fc5f61e13cc 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -2130,6 +2130,44 @@ check_alignment_argument (location_t loc, const char 
*flag, const char *name,
 }
 }
 
+/* Parse argument of -fpatchable-function-entry option ARG and store

+   corresponding values to PATCH_AREA_SIZE and PATCH_AREA_START.
+   If REPORT_ERROR is set to true, generate error for a problematic
+   option arguments.  */
+
+void
+parse_and_check_patch_area (const char *arg, bool report_error,
+   HOST_WIDE_INT *patch_area_size,
+   HOST_WIDE_INT *patch_area_start)
+{
+  *patch_area_size = 0;
+  *patch_area_start = 0;
+
+  if (arg == NULL)
+return;
+
+  char *patch_area_arg = xstrdup (arg);
+  char *comma = strchr (patch_area_arg, ',');
+  if (comma)
+{
+  *comma = '\0';
+  *patch_area_size = integral_argument (patch_area_arg);
+  *patch_area_start = integral_argument (comma + 1);
+}
+  else
+*patch_area_size = integral_argument (patch_area_arg);
+
+  if (*patch_area_size < 0
+  || *patch_area_size > USHRT_MAX
+  || *patch_area_start < 0
+  || *patch_area_start > USHRT_MAX
+  || *patch_area_size < *patch_area_start)
+if (report_error)
+  error ("invalid arguments for %<-fpatchable-function-entry%>");
+
+  free (patch_area_arg);
+}
+
 /* Print help when OPT__help_ is set.  */
 
 void

@@ -2671,30 +2709,9 @@ common_handle_option (struct gcc_options *opts,
 
 case OPT_fpatchable_function_entry_:

   {
-   char *patch_area_arg = xstrdup (arg);
-   char *comma = strchr (patch_area_arg, ',');
-   if (comma)
- {
-  

Re: PR98974: Fix vectorizable_condition after STMT_VINFO_VEC_STMTS

2021-02-05 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)"  writes:
> Hi,
>
> As mentioned in the PR, this patch fixes up the nvectors parameter passed to 
> vect_get_loop_mask in vectorizable_condition.
> Before the STMT_VINFO_VEC_STMTS rework we used to handle each ncopy 
> separately, now we gather them all at the same time and don't need to 
> multiply vec_num with ncopies.
>
> The reduced testcase I used to illustrate the issue in the PR gives a 
> warning, if someone knows how to get rid of that (it's Fortran) I'd include 
> it as a testcase for this.

Looks like Richi's since posted one.

> Bootstrapped and regression tested on aarch64-none-linux-gnu. I don't believe 
> that code triggers for other targets, so not sure it makes sense to test on 
> others?
>
> Is this OK for trunk? Would you rather wait for the testcase?
>
> gcc/ChangeLog:
> 2021-02-05  Andre Vieira  
>
>  PR middle-end/98974
>  * tree-vect-stmts.c (vectorizable_condition): Fix nvectors 
> parameter
>  for vect_get_loop_mask call.
>
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index 
> 0bc1cb1c5b4f6c1f0447241b4d31434bf7cca1a4..d07602f6d38f9c51936ac09942599fc5a14f46ab
>  100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -10237,8 +10237,7 @@ vectorizable_condition (vec_info *vinfo,
>   {
> unsigned vec_num = vec_oprnds0.length ();
> tree loop_mask
> - = vect_get_loop_mask (gsi, masks, vec_num * ncopies,
> -   vectype, i);
> + = vect_get_loop_mask (gsi, masks, vec_num, vectype, i);
> tree tmp2 = make_ssa_name (vec_cmp_type);
> gassign *g
>   = gimple_build_assign (tmp2, BIT_AND_EXPR, vec_compare,

Does removing the shadowed vec_num work?  I think that would be less
confusing, and means that the calculation stays in sync with the
vect_record_loop_mask call.

Thanks,
Richard


Re: [PATCH] tree-optimization/98855 - redo BB vectorization costing

2021-02-05 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> The following attempts to account for the fact that BB vectorization
> regions now can span multiple loop levels and that an unprofitable
> inner loop vectorization shouldn't be offsetted by a profitable
> outer loop vectorization to make it overall profitable.
>
> For now I've implemented a heuristic based on the premise that
> vectorization should be profitable even if loops may not be entered
> or if they iterate any number of times.  Especially the first
> assumption then requires that stmts directly belonging to loop A
> need to be costed separately from stmts belonging to another loop
> which also simplifies the implementation.
>
> On x86 the added testcase has in the outer loop
>
> t.c:38:20: note: Cost model analysis for part in loop 1:
>   Vector cost: 56
>   Scalar cost: 192
>
> and the inner loop
>
> t.c:38:20: note: Cost model analysis for part in loop 2:
>   Vector cost: 132
>   Scalar cost: 48
>
> and thus the vectorization is considered not profitable
> (note the same would happen in case the 2nd cost were for
> a loop outer to the 1st costing).
>
> Future enhancements may consider static knowledge of whether
> a loop is always entered which would allow some inefficiency
> in the vectorization of its loop header.  Likewise stmts only
> reachable from a loop exit can be treated this way.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu and it fixes
> the regression reported in the PR.
>
> Does this look sensible and good enough for GCC 11?

I don't know the new SLP code very well, but FWIW it seems reasonable
to me.

> +/* Comparator for the loop-index sorted cost vectors.  */
> +
> +static int
> +li_cost_vec_cmp (const void *a_, const void *b_)
> +{
> +  auto *a = (const std::pair *)a_;
> +  auto *b = (const std::pair *)b_;
> +  if (a->first < b->first)
> +return -1;
> +  else if (a->first == b->first)
> +return 0;
> +  return 1;
> +}

This isn't a stable sort.  (Or isn't that an issue these days?)

> +  /* First produce cost vectors sorted by loop index.  */
> +  auto_vec >
> +li_scalar_costs (scalar_costs.length ());
> +  auto_vec >
> +li_vector_costs (vector_costs.length ());
> +  FOR_EACH_VEC_ELT (scalar_costs, i, cost)
> +{
> +  unsigned l = gimple_bb (cost->stmt_info->stmt)->loop_father->num;
> +  li_scalar_costs.quick_push (std::make_pair (l, cost));
> +}
> +  unsigned l = li_scalar_costs[0].first;

Is this just to silence an unused warning?  Might be worth a comment if so
(although couldn't we just use 0?).

> +  FOR_EACH_VEC_ELT (vector_costs, i, cost)
> +{
> +  /* We inherit from the previous SI, invariants, externals and
> +  extracts immediately follow the cost for the related stmt.  */

Looks like the comment predates s/SI/COST/.

Thanks,
Richard


[PATCH] debug: fix switch lowering debug info

2021-02-05 Thread Martin Liška

I'm sending Tom's patch that was approved by Jakub in the PR.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
I'm going to install it.

Thanks,
Martin

gcc/ChangeLog:

PR debug/98656
* tree-switch-conversion.c (jump_table_cluster::emit): Add loc
argument.
(bit_test_cluster::emit): Reuse location_t for newly created
gswitch statement.
(switch_decision_tree::try_switch_expansion): Preserve
location_t.
* tree-switch-conversion.h: Change function signatures.
---
 gcc/tree-switch-conversion.c | 11 +++
 gcc/tree-switch-conversion.h |  8 
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-switch-conversion.c b/gcc/tree-switch-conversion.c
index 24dbad93138..a4798812dbf 100644
--- a/gcc/tree-switch-conversion.c
+++ b/gcc/tree-switch-conversion.c
@@ -1112,7 +1112,8 @@ group_cluster::dump (FILE *f, bool details)
 
 void

 jump_table_cluster::emit (tree index_expr, tree,
- tree default_label_expr, basic_block default_bb)
+ tree default_label_expr, basic_block default_bb,
+ location_t loc)
 {
   unsigned HOST_WIDE_INT range = get_range (get_low (), get_high ());
   unsigned HOST_WIDE_INT nondefault_range = 0;
@@ -1131,6 +1132,7 @@ jump_table_cluster::emit (tree index_expr, tree,
 
   gswitch *s = gimple_build_switch (index_expr,

unshare_expr (default_label_expr), labels);
+  gimple_set_location (s, loc);
   gimple_stmt_iterator gsi = gsi_start_bb (m_case_bb);
   gsi_insert_after (, s, GSI_NEW_STMT);
 
@@ -1491,7 +1493,7 @@ case_bit_test::cmp (const void *p1, const void *p2)
 
 void

 bit_test_cluster::emit (tree index_expr, tree index_type,
-   tree, basic_block default_bb)
+   tree, basic_block default_bb, location_t)
 {
   case_bit_test test[m_max_case_bit_tests] = { {} };
   unsigned int i, j, k;
@@ -1892,7 +1894,8 @@ switch_decision_tree::try_switch_expansion (vec 
)
 {
   cluster *c = clusters[0];
   c->emit (index_expr, index_type,
-  gimple_switch_default_label (m_switch), m_default_bb);
+  gimple_switch_default_label (m_switch), m_default_bb,
+  gimple_location (m_switch));
   redirect_edge_succ (single_succ_edge (bb), c->m_case_bb);
 }
   else
@@ -1904,7 +1907,7 @@ switch_decision_tree::try_switch_expansion (vec 
)
if (clusters[i]->get_type () != SIMPLE_CASE)
  clusters[i]->emit (index_expr, index_type,
 gimple_switch_default_label (m_switch),
-m_default_bb);
+m_default_bb, gimple_location (m_switch));
 }
 
   fix_phi_operands_for_edges ();

diff --git a/gcc/tree-switch-conversion.h b/gcc/tree-switch-conversion.h
index d31fdf71427..d3d568e84cc 100644
--- a/gcc/tree-switch-conversion.h
+++ b/gcc/tree-switch-conversion.h
@@ -71,7 +71,7 @@ public:
   virtual void dump (FILE *f, bool details = false) = 0;
 
   /* Emit GIMPLE code to handle the cluster.  */

-  virtual void emit (tree, tree, tree, basic_block) = 0;
+  virtual void emit (tree, tree, tree, basic_block, location_t) = 0;
 
   /* Return true if a cluster handles only a single case value and the

  value is not a range.  */
@@ -170,7 +170,7 @@ public:
 fprintf (f, " ");
   }
 
-  void emit (tree, tree, tree, basic_block)

+  void emit (tree, tree, tree, basic_block, location_t)
   {
 gcc_unreachable ();
   }
@@ -260,7 +260,7 @@ public:
   }
 
   void emit (tree index_expr, tree index_type,

-tree default_label_expr, basic_block default_bb);
+tree default_label_expr, basic_block default_bb, location_t loc);
 
   /* Find jump tables of given CLUSTERS, where all members of the vector

  are of type simple_cluster.  New clusters are returned.  */
@@ -378,7 +378,7 @@ public:
 There *MUST* be max_case_bit_tests or less unique case
 node targets.  */
   void emit (tree index_expr, tree index_type,
-tree default_label_expr, basic_block default_bb);
+tree default_label_expr, basic_block default_bb, location_t loc);
 
   /* Find bit tests of given CLUSTERS, where all members of the vector

  are of type simple_cluster.  New clusters are returned.  */
--
2.30.0



PR98974: Fix vectorizable_condition after STMT_VINFO_VEC_STMTS

2021-02-05 Thread Andre Vieira (lists) via Gcc-patches

Hi,

As mentioned in the PR, this patch fixes up the nvectors parameter passed to 
vect_get_loop_mask in vectorizable_condition.
Before the STMT_VINFO_VEC_STMTS rework we used to handle each ncopy separately, 
now we gather them all at the same time and don't need to multiply vec_num with 
ncopies.

The reduced testcase I used to illustrate the issue in the PR gives a warning, 
if someone knows how to get rid of that (it's Fortran) I'd include it as a 
testcase for this.

Bootstrapped and regression tested on aarch64-none-linux-gnu. I don't believe 
that code triggers for other targets, so not sure it makes sense to test on 
others?

Is this OK for trunk? Would you rather wait for the testcase?

gcc/ChangeLog:
2021-02-05  Andre Vieira  

PR middle-end/98974
* tree-vect-stmts.c (vectorizable_condition): Fix nvectors parameter
for vect_get_loop_mask call.

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 
0bc1cb1c5b4f6c1f0447241b4d31434bf7cca1a4..d07602f6d38f9c51936ac09942599fc5a14f46ab
 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -10237,8 +10237,7 @@ vectorizable_condition (vec_info *vinfo,
{
  unsigned vec_num = vec_oprnds0.length ();
  tree loop_mask
-   = vect_get_loop_mask (gsi, masks, vec_num * ncopies,
- vectype, i);
+   = vect_get_loop_mask (gsi, masks, vec_num, vectype, i);
  tree tmp2 = make_ssa_name (vec_cmp_type);
  gassign *g
= gimple_build_assign (tmp2, BIT_AND_EXPR, vec_compare,


[PATCH] tree-optimization/98855 - redo BB vectorization costing

2021-02-05 Thread Richard Biener
The following attempts to account for the fact that BB vectorization
regions now can span multiple loop levels and that an unprofitable
inner loop vectorization shouldn't be offsetted by a profitable
outer loop vectorization to make it overall profitable.

For now I've implemented a heuristic based on the premise that
vectorization should be profitable even if loops may not be entered
or if they iterate any number of times.  Especially the first
assumption then requires that stmts directly belonging to loop A
need to be costed separately from stmts belonging to another loop
which also simplifies the implementation.

On x86 the added testcase has in the outer loop

t.c:38:20: note: Cost model analysis for part in loop 1:
  Vector cost: 56
  Scalar cost: 192

and the inner loop

t.c:38:20: note: Cost model analysis for part in loop 2:
  Vector cost: 132
  Scalar cost: 48

and thus the vectorization is considered not profitable
(note the same would happen in case the 2nd cost were for
a loop outer to the 1st costing).

Future enhancements may consider static knowledge of whether
a loop is always entered which would allow some inefficiency
in the vectorization of its loop header.  Likewise stmts only
reachable from a loop exit can be treated this way.

Bootstrapped and tested on x86_64-unknown-linux-gnu and it fixes
the regression reported in the PR.

Does this look sensible and good enough for GCC 11?

Thanks,
Richard.

2021-02-05  Richard Biener  

PR tree-optimization/98855
* tree-vectorizer.h (add_stmt_cost): New overload.
* tree-vect-slp.c (li_cost_vec_cmp): New.
(vect_bb_slp_scalar_cost): Cost individual loop regions
separately.  Account for the scalar instance root stmt.

* g++.dg/vect/slp-pr98855.cc: New testcase.
---
 gcc/testsuite/g++.dg/vect/slp-pr98855.cc |  84 +++
 gcc/tree-vect-slp.c  | 171 ++-
 gcc/tree-vectorizer.h|   9 ++
 3 files changed, 230 insertions(+), 34 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/vect/slp-pr98855.cc

diff --git a/gcc/testsuite/g++.dg/vect/slp-pr98855.cc 
b/gcc/testsuite/g++.dg/vect/slp-pr98855.cc
new file mode 100644
index 000..0b4e479b513
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/slp-pr98855.cc
@@ -0,0 +1,84 @@
+// { dg-do compile }
+// { dg-additional-options "-fvect-cost-model=cheap" }
+// { dg-additional-options "-mavx2" { target x86_64-*-* i?86-*-* } }
+
+#include 
+#include 
+
+inline uint32_t make_uint32(uint8_t i0, uint8_t i1, uint8_t i2, uint8_t i3)
+{
+  return ((static_cast(i0) << 24) |
+ (static_cast(i1) << 16) |
+ (static_cast(i2) <<  8) |
+ (static_cast(i3)));
+}
+
+inline uint32_t load_be(const uint8_t in[], size_t off)
+{
+  in += off * sizeof(uint32_t);
+  return make_uint32(in[0], in[1], in[2], in[3]);
+}
+
+template
+inline void load_be(const uint8_t in[],
+   T& x0, T& x1, T& x2, T& x3,
+   T& x4, T& x5, T& x6, T& x7)
+{
+  x0 = load_be(in, 0);
+  x1 = load_be(in, 1);
+  x2 = load_be(in, 2);
+  x3 = load_be(in, 3);
+  x4 = load_be(in, 4);
+  x5 = load_be(in, 5);
+  x6 = load_be(in, 6);
+  x7 = load_be(in, 7);
+}
+
+inline void store_be(uint32_t in, uint8_t out[4])
+{
+  uint32_t o = __builtin_bswap32 (in);
+  __builtin_memcpy (out, , sizeof (uint32_t));
+}
+
+template
+inline void store_be(uint8_t out[], T x0, T x1, T x2, T x3,
+T x4, T x5, T x6, T x7)
+{
+  store_be(x0, out + (0 * sizeof(T)));
+  store_be(x1, out + (1 * sizeof(T)));
+  store_be(x2, out + (2 * sizeof(T)));
+  store_be(x3, out + (3 * sizeof(T)));
+  store_be(x4, out + (4 * sizeof(T)));
+  store_be(x5, out + (5 * sizeof(T)));
+  store_be(x6, out + (6 * sizeof(T)));
+  store_be(x7, out + (7 * sizeof(T)));
+}
+
+#define BLOCK_SIZE 8
+void encrypt_n(const uint8_t in[], uint8_t out[], size_t blocks, uint32_t *EK)
+{
+  const size_t blocks4 = blocks / 4;
+
+  for (size_t i = 0; i < blocks4; i++)
+{
+  uint32_t L0, R0, L1, R1, L2, R2, L3, R3;
+  load_be(in + 4*BLOCK_SIZE*i, L0, R0, L1, R1, L2, R2, L3, R3);
+
+  for(size_t r = 0; r != 32; ++r)
+   {
+ L0 += (((R0 << 4) ^ (R0 >> 5)) + R0) ^ EK[2*r];
+ L1 += (((R1 << 4) ^ (R1 >> 5)) + R1) ^ EK[2*r];
+ L2 += (((R2 << 4) ^ (R2 >> 5)) + R2) ^ EK[2*r];
+ L3 += (((R3 << 4) ^ (R3 >> 5)) + R3) ^ EK[2*r];
+
+ R0 += (((L0 << 4) ^ (L0 >> 5)) + L0) ^ EK[2*r+1];
+ R1 += (((L1 << 4) ^ (L1 >> 5)) + L1) ^ EK[2*r+1];
+ R2 += (((L2 << 4) ^ (L2 >> 5)) + L2) ^ EK[2*r+1];
+ R3 += (((L3 << 4) ^ (L3 >> 5)) + L3) ^ EK[2*r+1];
+   }
+
+  store_be(out + 4*BLOCK_SIZE*i, L0, R0, L1, R1, L2, R2, L3, R3);
+}
+}
+
+// { dg-final { scan-tree-dump-times "not vectorized: vectorization is not 
profitable" 2 "slp1" { target x86_64-*-* i?86-*-* } } }
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 2305bbdec3a..25f58af3a43 100644
--- a/gcc/tree-vect-slp.c
+++ 

Re: [PATCH] rs6000: Fix MMA API - Add support for compatibility built-ins

2021-02-05 Thread Florian Weimer via Gcc-patches
* Peter Bergner via Gcc-patches:

> The LLVM and GCC teams agreed to rename the __builtin_mma_assemble_pair and
> __builtin_mma_disassemble_pair built-ins to __builtin_vsx_assemble_pair and
> __builtin_vsx_disassemble_pair respectively.  It's too late to remove the
> old names, so this patch adds support for creating compatibility built-ins
> (ie, multiple built-in functions generate the same code) and then creates
> compatibility built-ins using the new names.

Maybe add a check that the compatibility builtins are flagged as
availble using __has_builtin?

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill



RE: [PR97903][ARM] Missed optimization in lowering to vtst

2021-02-05 Thread Kyrylo Tkachov via Gcc-patches
Hi Prathamesh,

> -Original Message-
> From: Prathamesh Kulkarni 
> Sent: 05 February 2021 09:53
> To: gcc Patches ; Kyrylo Tkachov
> 
> Subject: [PR97903][ARM] Missed optimization in lowering to vtst
> 
> Hi,
> For the following test-case:
> 
> #include 
> 
> uint8x8_t f1(int8x8_t a, int8x8_t b) {
>   return (uint8x8_t) ((a & b) != 0);
> }
> 
> gcc fails to lower test operation to vtst, and instead emits:
> f1:
> vandd0, d0, d1
> vceq.i8 d0, d0, #0
> vmvnd0, d0
> bx  lr
> 
> The attached patch tries to fix this by adding a pattern to match this 
> combine:
> Trying 7, 8 -> 9:
> 7: r120:V8QI=r123:V8QI:V8QI
>   REG_DEAD r124:V8QI
>   REG_DEAD r123:V8QI
> 8: r122:V8QI=-r120:V8QI==const_vector
>   REG_DEAD r120:V8QI
> 9: r121:V8QI=~r122:V8QI
>   REG_DEAD r122:V8QI
> Failed to match this instruction:
> (set (reg:V8QI 121)
> (plus:V8QI (eq:V8QI (and:V8QI (reg:V8QI 123)
> (reg:V8QI 124))
> (const_vector:V8QI [
> (const_int 0 [0]) repeated x8
> ]))
> (const_vector:V8QI [
> (const_int -1 [0x]) repeated x8
> ])))
> 
> Essentially it converts:
> r120 = (and r123 r124)
> r122 = (neg (eq r120 0))
> r121 = (not r122)
> -->
> r121 = vtst r123, r124
> 
> (I guess it simplifies (not (neg X)) to (plus X -1) above).
> 
> Code-gen after patch:
> f1:
> vtst.8  d0, d0, d1
> bx  lr
> 

+(define_insn "neon_vtst_combine"
+  [(set (match_operand:VDQIW 0 "s_register_operand" "=w")
+(plus:VDQIW
+ (eq:VDQIW
+   (and:VDQIW (match_operand:VDQIW 1 "s_register_operand" "w")
+  (match_operand:VDQIW 2 "s_register_operand" "w"))
+   (match_operand:VDQIW 3 "zero_operand" "i"))
+ (match_operand:VDQIW 4 "minus_one_operand" "i")))]
+  "TARGET_NEON"
+  "vtst.\t%0, %1, %2"
+)

This will need a type attribute for scheduling.

> Bootstrapped + tested on arm-linux-gnueabihf, and
> cross tested on arm*-*-*.
> Does it look OK for next stage-1 ?

It looks sensible to me for stage 1.
Thanks,
Kyrill

> 
> Thanks,
> Prathamesh


[PATCH 3/3] MIPS: fix compact-branches test FAIL for PIC default configuration

2021-02-05 Thread YunQiang Su
GCC may be configured to use PIC by default, then the test with
-mno-abicall may fail. Just add -fno-PIC option for it.
---
 gcc/testsuite/gcc.target/mips/compact-branches-5.c | 2 +-
 gcc/testsuite/gcc.target/mips/compact-branches-6.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/mips/compact-branches-5.c 
b/gcc/testsuite/gcc.target/mips/compact-branches-5.c
index 90d312c614d..0189635bf61 100644
--- a/gcc/testsuite/gcc.target/mips/compact-branches-5.c
+++ b/gcc/testsuite/gcc.target/mips/compact-branches-5.c
@@ -1,4 +1,4 @@
-/* { dg-options "-mno-abicalls -mcompact-branches=never isa_rev>=6" } */
+/* { dg-options "-mno-abicalls -fno-PIC -mcompact-branches=never isa_rev>=6" } 
*/
 void bar (int);
 
 void
diff --git a/gcc/testsuite/gcc.target/mips/compact-branches-6.c 
b/gcc/testsuite/gcc.target/mips/compact-branches-6.c
index dd35a5581bd..36180b0c76c 100644
--- a/gcc/testsuite/gcc.target/mips/compact-branches-6.c
+++ b/gcc/testsuite/gcc.target/mips/compact-branches-6.c
@@ -1,4 +1,4 @@
-/* { dg-options "-mno-abicalls -mcompact-branches=optimal isa_rev>=6" } */
+/* { dg-options "-mno-abicalls -fno-PIC -mcompact-branches=optimal isa_rev>=6" 
} */
 void bar (int);
 
 void
-- 
2.20.1



[PATCH 2/3] MIPS: add builtime option for -mcompact-branches

2021-02-05 Thread YunQiang Su
For R6+, it allows to configure gcc to use compact branches only.
---
 gcc/config.gcc   | 18 +-
 gcc/doc/install.texi | 23 +++
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 17fea83b2e4..7d50e7995d2 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4743,7 +4743,7 @@ case "${target}" in
;;
 
mips*-*-*)
-   supported_defaults="abi arch arch_32 arch_64 float fpu nan 
fp_32 odd_spreg_32 tune tune_32 tune_64 divide llsc mips-plt synci lxc1-sxc1 
madd4"
+   supported_defaults="abi arch arch_32 arch_64 float fpu nan 
fp_32 odd_spreg_32 tune tune_32 tune_64 divide llsc mips-plt synci lxc1-sxc1 
madd4 compact-branches"
 
case ${with_float} in
"" | soft | hard)
@@ -4896,6 +4896,22 @@ case "${target}" in
exit 1
;;
esac
+
+   case ${with_compact_branches} in
+   never | always | optimal | prefer)
+   if test "$with_compact_branches" = "always" -a \
+   "$default_mips_arch" != "mips32r6" -a  \
+   "$default_mips_arch" != "mips64r6";then
+   echo "Compact-branch=always is not allowed for 
pre-R6" 1>&2
+   exit 1
+   fi
+   with_compact_branches=${with_compact_branches}
+   ;;
+   *)
+   echo "Unknown compact-branches policy used in 
--with-compact-branches" 1>&2
+   exit 1
+   ;;
+   esac
;;
 
nds32*-*-*)
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 4c38244ae58..6b9520569ba 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1464,6 +1464,29 @@ systems that support conditional traps).
 Division by zero checks use the break instruction.
 @end table
 
+@item --with-compact-branches=@var{policy}
+Specify how the compiler should generate code for checking for
+division by zero.  This option is only supported on the MIPS target.
+The possibilities for @var{type} are:
+@table @code
+@item optimal
+Cause a delay slot branch to be used if one is available in the
+current ISA and the delay slot is successfully filled. If the delay slot
+is not filled, a compact branch will be chosen if one is available.
+@item never
+Ensures that compact branch instructions will never be generated.
+@item always
+Ensures that a compact branch instruction will be generated if available.
+If a compact branch instruction is not available,
+a delay slot form of the branch will be used instead.
+This option is supported from MIPS Release 6 onwards.
+@item prefer
+Ensures that a compact branch instruction will be generated if available
+on MIPS Release 6 onwards.
+Simliar with @option{always} besides that this option works for pre-R6
+target, on which, this option will just be ignored.
+@end table
+
 @c If you make --with-llsc the default for additional targets,
 @c update the --with-llsc description in the MIPS section below.
 
-- 
2.20.1



[PATCH 1/3] MIPS: add -mcompact-branches=prefer option

2021-02-05 Thread YunQiang Su
For MIPSr6, we may wish to use compact-branches only.
Currently, we have to use `always' option, while it is mark as conflict
with pre-R6.
  cc1: error: unsupported combination: ‘mips32r2’ -mcompact-branches=always

It make some trouble for distributions to make -mcompact-branches=always
default for R6 only.

The new added `prefer' option:
   just ignored by pre-R6 target.
   do the same as `always' for R6+.
---
 gcc/config/mips/mips-opts.h|  3 ++-
 gcc/config/mips/mips.h |  6 --
 gcc/config/mips/mips.opt   |  3 +++
 gcc/doc/invoke.texi|  6 ++
 gcc/testsuite/gcc.target/mips/compact-branches-8.c | 10 ++
 gcc/testsuite/gcc.target/mips/compact-branches-9.c | 10 ++
 6 files changed, 35 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/compact-branches-8.c
 create mode 100644 gcc/testsuite/gcc.target/mips/compact-branches-9.c

diff --git a/gcc/config/mips/mips-opts.h b/gcc/config/mips/mips-opts.h
index 6214849f3e1..f3804b9722b 100644
--- a/gcc/config/mips/mips-opts.h
+++ b/gcc/config/mips/mips-opts.h
@@ -51,6 +51,7 @@ enum mips_r10k_cache_barrier_setting {
 enum mips_cb_setting {
   MIPS_CB_NEVER,
   MIPS_CB_OPTIMAL,
-  MIPS_CB_ALWAYS
+  MIPS_CB_ALWAYS,
+  MIPS_CB_PREFER
 };
 #endif
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index b4a60a55d80..f8762fe6638 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -106,20 +106,22 @@ struct mips_cpu_info {
'never' policy or the 'optimal' policy on a core that lacks
compact branch instructions.  */
 #define TARGET_CB_NEVER (mips_cb == MIPS_CB_NEVER  \
-|| (mips_cb == MIPS_CB_OPTIMAL \
+|| ((mips_cb == MIPS_CB_OPTIMAL || mips_cb == 
MIPS_CB_PREFER) \
 && !ISA_HAS_COMPACT_BRANCHES))
 
 /* Compact branches may be used if the user either selects the
'always' policy or the 'optimal' policy on a core that supports
compact branch instructions.  */
 #define TARGET_CB_MAYBE (TARGET_CB_ALWAYS  \
-|| (mips_cb == MIPS_CB_OPTIMAL \
+|| ((mips_cb == MIPS_CB_OPTIMAL || mips_cb == 
MIPS_CB_PREFER) \
 && ISA_HAS_COMPACT_BRANCHES))
 
 /* Compact branches must always be generated if the user selects
the 'always' policy or the 'optimal' policy om a core that
lacks delay slot branch instructions.  */
 #define TARGET_CB_ALWAYS (mips_cb == MIPS_CB_ALWAYS\
+|| (mips_cb == MIPS_CB_PREFER \
+&& ISA_HAS_COMPACT_BRANCHES)   \
 || (mips_cb == MIPS_CB_OPTIMAL \
 && !ISA_HAS_DELAY_SLOTS))
 
diff --git a/gcc/config/mips/mips.opt b/gcc/config/mips/mips.opt
index 6af8037e9bd..f2d7550e36c 100644
--- a/gcc/config/mips/mips.opt
+++ b/gcc/config/mips/mips.opt
@@ -464,6 +464,9 @@ Enum(mips_cb_setting) String(optimal) Value(MIPS_CB_OPTIMAL)
 EnumValue
 Enum(mips_cb_setting) String(always) Value(MIPS_CB_ALWAYS)
 
+EnumValue
+Enum(mips_cb_setting) String(prefer) Value(MIPS_CB_PREFER)
+
 mloongson-mmi
 Target Mask(LOONGSON_MMI)
 Use Loongson MultiMedia extensions Instructions (MMI) instructions.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 3751bc3ac7c..9493c508d5b 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -25199,9 +25199,11 @@ and MIPS64 architectures specifically deprecate their 
use.
 @item -mcompact-branches=never
 @itemx -mcompact-branches=optimal
 @itemx -mcompact-branches=always
+@itemx -mcompact-branches=prefer
 @opindex mcompact-branches=never
 @opindex mcompact-branches=optimal
 @opindex mcompact-branches=always
+@opindex mcompact-branches=prefer
 These options control which form of branches will be generated.  The
 default is @option{-mcompact-branches=optimal}.
 
@@ -25215,6 +25217,10 @@ used instead.
 
 This option is supported from MIPS Release 6 onwards.
 
+The @option{-mcompact-branches=prefer} option is same with
+@option{-mcompact-branches=always} for MIPS Release 6 onwards, and
+is same with @option{-mcompact-branches=never} for pre-R6.
+
 The @option{-mcompact-branches=optimal} option will cause a delay slot
 branch to be used if one is available in the current ISA and the delay
 slot is successfully filled.  If the delay slot is not filled, a compact
diff --git a/gcc/testsuite/gcc.target/mips/compact-branches-8.c 
b/gcc/testsuite/gcc.target/mips/compact-branches-8.c
new file mode 100644
index 000..72ffcb49cfc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/compact-branches-8.c
@@ -0,0 +1,10 @@
+/* { dg-options "-mno-abicalls -mcompact-branches=prefer isa_rev<=5" } */
+void bar (int);
+
+void
+foo ()
+{
+  bar (1);
+}
+
+/* { dg-final { scan-assembler "\t(j|jal)\t" } } */
diff --git a/gcc/testsuite/gcc.target/mips/compact-branches-9.c 

[PR97903][ARM] Missed optimization in lowering to vtst

2021-02-05 Thread Prathamesh Kulkarni via Gcc-patches
Hi,
For the following test-case:

#include 

uint8x8_t f1(int8x8_t a, int8x8_t b) {
  return (uint8x8_t) ((a & b) != 0);
}

gcc fails to lower test operation to vtst, and instead emits:
f1:
vandd0, d0, d1
vceq.i8 d0, d0, #0
vmvnd0, d0
bx  lr

The attached patch tries to fix this by adding a pattern to match this combine:
Trying 7, 8 -> 9:
7: r120:V8QI=r123:V8QI:V8QI
  REG_DEAD r124:V8QI
  REG_DEAD r123:V8QI
8: r122:V8QI=-r120:V8QI==const_vector
  REG_DEAD r120:V8QI
9: r121:V8QI=~r122:V8QI
  REG_DEAD r122:V8QI
Failed to match this instruction:
(set (reg:V8QI 121)
(plus:V8QI (eq:V8QI (and:V8QI (reg:V8QI 123)
(reg:V8QI 124))
(const_vector:V8QI [
(const_int 0 [0]) repeated x8
]))
(const_vector:V8QI [
(const_int -1 [0x]) repeated x8
])))

Essentially it converts:
r120 = (and r123 r124)
r122 = (neg (eq r120 0))
r121 = (not r122)
-->
r121 = vtst r123, r124

(I guess it simplifies (not (neg X)) to (plus X -1) above).

Code-gen after patch:
f1:
vtst.8  d0, d0, d1
bx  lr

Bootstrapped + tested on arm-linux-gnueabihf, and
cross tested on arm*-*-*.
Does it look OK for next stage-1 ?

Thanks,
Prathamesh


pr97903-1.diff
Description: Binary data


RE: [PATCH] aarch64: Reimplement vget_low* intrinsics

2021-02-05 Thread Kyrylo Tkachov via Gcc-patches
Hi Richard,

> -Original Message-
> From: Richard Biener 
> Sent: 05 February 2021 09:25
> To: Kyrylo Tkachov 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] aarch64: Reimplement vget_low* intrinsics
> 
> On Fri, Feb 5, 2021 at 9:59 AM Kyrylo Tkachov via Gcc-patches
>  wrote:
> >
> > Hi all,
> >
> > We can do better on the vget_low* intrinsics.
> > Currently they reinterpret their argument into a V2DI vector and extract the
> low "lane",
> > reinterpreting that back into the shorter vector.
> > This is functionally correct and generates a sequence of subregs and a
> vec_select that, by itself,
> > gets optimised away eventually.
> > However it's bad when we want to use the result in a other SIMD
> operations.
> > Then the subreg-vec_select-subreg combo blocks many combine patterns.
> >
> > This patch reimplements them to emit a proper low vec_select from the
> start.
> > It generates much cleaner RTL and allows for more aggressive
> combinations, particularly
> > with the patterns that Jonathan has been pushing lately.
> >
> > Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-
> none-elf.
> > Pushing to trunk.
> 
> Just to remind you folks that we're in stage4 which means fixes to
> regressions
> (or wrong-code) only.  aarch64 is a primary target and you should provide a
> good
> example of following the rules we set up for GCC development.

Apologies for the stream of such patches this late in development.
Indeed it is quite late in the development cycle and I'll be more careful for 
the rest of stage4.

> 
> I'd expect _at least_ a short sentence on why you think this change is
> absolutely
> required for GCC 11.

It was mostly reports from some users on really bad code generation with 
intrinsics, similar to PR94442.
The root cause for most of these is are implementations of the intrinsics with 
inline assembly, which can be fixed in a mostly-mechanical way, thus the 
similarly-looking patches.
I appreciate though that this needs to be weighed against the stability 
requirements at this stage...
> 
> The change also comes with zero testcases and zero bug references.

Indeed, I could have elaborated more. The recent changes have been targeted at 
the intrinsics in arm_neon.h.
We have quite a detailed testsuite for them at 
gcc.target/aarch64/advsimd-intrinsics that exercises them, which is why I felt 
confident to push changes in that area at this stage. 

> 
> Sorry for this particular change taking the fire, I just picked a random one 
> of
> the non-regression change-storm I'm seeing for arm/aarch64 recently.
> 

Thanks keeping me honest.
Kyrill

> Thanks for your consideration,
> Richard.
> 
> > Thanks,
> > Kyrill
> >
> > Thanks,
> > Kyrill
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-simd-builtins.def (get_low): Define 
> > builtin.
> > * config/aarch64/aarch64-simd.md (aarch64_get_low): Define.
> > * config/aarch64/arm_neon.h (__GET_LOW): Delete.
> > (vget_low_f16): Reimplement using new builtin.
> > (vget_low_f32): Likewise.
> > (vget_low_f64): Likewise.
> > (vget_low_p8): Likewise.
> > (vget_low_p16): Likewise.
> > (vget_low_p64): Likewise.
> > (vget_low_s8): Likewise.
> > (vget_low_s16): Likewise.
> > (vget_low_s32): Likewise.
> > (vget_low_s64): Likewise.
> > (vget_low_u8): Likewise.
> > (vget_low_u16): Likewise.
> > (vget_low_u32): Likewise.
> > (vget_low_u64): Likewise.


Re: [PATCH] i386: Fix up TARGET_QIMODE_MATH for many AMD CPU tunings [PR98957]

2021-02-05 Thread Jakub Jelinek via Gcc-patches
On Fri, Feb 05, 2021 at 10:38:44AM +0100, Uros Bizjak wrote:
> > > --- gcc/config/i386/i386-options.c.jj   2021-01-14 19:34:06.041425286 
> > > +0100
> > > +++ gcc/config/i386/i386-options.c  2021-02-04 16:30:18.424999701 
> > > +0100
> > > @@ -98,6 +98,8 @@ along with GCC; see the file COPYING3.
> > >  #endif
> > >
> > >  /* Processor feature/optimization bitmasks.  */
> > > +#define m_NONE HOST_WIDE_INT_0U
> > > +#define m_ALL (~HOST_WIDE_INT_0U)
> 
> Perhaps HOST_WIDE_INT_M1U can be used here?

On two's complement hosts sure, but e.g. in the INTTYPE_MAXIMUM macro we
still use (t) ~ (t) 0.  It is all theoretic because we probably don't
support any non-two's complement hosts.

Jakub



Re: [PATCH] i386: Fix up TARGET_QIMODE_MATH for many AMD CPU tunings [PR98957]

2021-02-05 Thread Uros Bizjak via Gcc-patches
On Fri, Feb 5, 2021 at 10:31 AM Uros Bizjak  wrote:
>
> On Fri, Feb 5, 2021 at 10:29 AM Jakub Jelinek  wrote:
> >
> > Hi!
> >
> > As written in the PR, TARGET_QIMODE_MATH was meant to be set for all
> > tunings and it was the case for GCC <= 7, but as the number of
> > PROCESSOR_* enumerators grew, some AMD tunings (which are at the end
> > of the list) over time got enumerators with values >= 32 and
> > TARGET_QIMODE_MATH became disabled for them, in GCC 8 for 2
> > tunings, in GCC 9 for 7 tunings, in GCC 10 for 8 tunings, and
> > on the trunk for 11 tunings.
>
> Ouch.
>
> > The following patch fixes it by using uhwis rather than uints
> > and gives them also symbolic names.
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >
> > 2021-02-05  Jakub Jelinek  
> >
> > PR target/98957
> > * config/i386/i386-options.c (m_NONE, m_ALL): Define.
> > * config/i386/x86-tune.def (X86_TUNE_BRANCH_PREDICTION_HINTS,
> > X86_TUNE_PROMOTE_QI_REGS): Use m_NONE instead of 0U.
> > (X86_TUNE_QIMODE_MATH): Use m_ALL instead of ~0U.
>
> OK for mainline and backports.
>
> Thanks,
> Uros.
>
> > --- gcc/config/i386/i386-options.c.jj   2021-01-14 19:34:06.041425286 +0100
> > +++ gcc/config/i386/i386-options.c  2021-02-04 16:30:18.424999701 +0100
> > @@ -98,6 +98,8 @@ along with GCC; see the file COPYING3.
> >  #endif
> >
> >  /* Processor feature/optimization bitmasks.  */
> > +#define m_NONE HOST_WIDE_INT_0U
> > +#define m_ALL (~HOST_WIDE_INT_0U)

Perhaps HOST_WIDE_INT_M1U can be used here?

Uros.

> >  #define m_386 (HOST_WIDE_INT_1U< >  #define m_486 (HOST_WIDE_INT_1U< >  #define m_PENT (HOST_WIDE_INT_1U< > --- gcc/config/i386/x86-tune.def.jj 2021-01-04 10:25:45.175162012 +0100
> > +++ gcc/config/i386/x86-tune.def2021-02-04 16:32:20.239645476 +0100
> > @@ -580,15 +580,15 @@ DEF_TUNE (X86_TUNE_AVOID_VECTOR_DECODE,
> > on simulation result. But after P4 was made, no performance benefit
> > was observed with branch hints.  It also increases the code size.
> > As a result, icc never generates branch hints.  */
> > -DEF_TUNE (X86_TUNE_BRANCH_PREDICTION_HINTS, "branch_prediction_hints", 0U)
> > +DEF_TUNE (X86_TUNE_BRANCH_PREDICTION_HINTS, "branch_prediction_hints", 
> > m_NONE)
> >
> >  /* X86_TUNE_QIMODE_MATH: Enable use of 8bit arithmetic.  */
> > -DEF_TUNE (X86_TUNE_QIMODE_MATH, "qimode_math", ~0U)
> > +DEF_TUNE (X86_TUNE_QIMODE_MATH, "qimode_math", m_ALL)
> >
> >  /* X86_TUNE_PROMOTE_QI_REGS: This enables generic code that promotes all 
> > 8bit
> > arithmetic to 32bit via PROMOTE_MODE macro.  This code generation scheme
> > is usually used for RISC targets.  */
> > -DEF_TUNE (X86_TUNE_PROMOTE_QI_REGS, "promote_qi_regs", 0U)
> > +DEF_TUNE (X86_TUNE_PROMOTE_QI_REGS, "promote_qi_regs", m_NONE)
> >
> >  /* X86_TUNE_EMIT_VZEROUPPER: This enables vzeroupper instruction insertion
> > before a transfer of control flow out of the function.  */
> >
> > Jakub
> >


[PATCH] Change semantics of -frecord-gcc-switches and add -frecord-gcc-switches-format.

2021-02-05 Thread Martin Liška

Hello.

Based on discussion with Richi, I'm re-sending the patch. Note that the patch
has been waiting for a review for almost one year and I would like to see
it in GCC 11.1.

Thank you,
Martin
>From 4cb8475a40464d5b7805ececb280bb2463d3caa4 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Fri, 5 Feb 2021 10:28:57 +0100
Subject: [PATCH] Change semantics of -frecord-gcc-switches and add
 -frecord-gcc-switches-format.

gcc/ChangeLog:

	* common.opt: Document new options -frecord-gcc-switches-format
	and -frecord-gcc-switches-file.
	* doc/invoke.texi: Document the new options.
	* dwarf2out.c (dwarf2out_early_finish): Respect
	flag_record_gcc_switches_format.
	* flag-types.h (enum record_gcc_switches_format): New.
	* gcc.c (save_driver_cmdline): Save command line to a temporary
	file.
	(driver::main): Call set_commandline.
	(driver::set_commandline): New.
	* gcc.h (set_commandline): New.
	* opts.c (get_driver_command_line): New.
	* opts.h (get_driver_command_line): New.
	* toplev.c (init_asm_output): Use new function get_producer_string.
	(process_options): Respect flag_record_gcc_switches_format
	option.

Co-Authored-By: Egeyar Bagcioglu  
---
 gcc/common.opt  | 19 ++-
 gcc/doc/invoke.texi | 23 ++-
 gcc/dwarf2out.c | 17 -
 gcc/flag-types.h|  7 +++
 gcc/gcc.c   | 37 -
 gcc/gcc.h   |  1 +
 gcc/opts.c  | 31 +++
 gcc/opts.h  |  2 ++
 gcc/toplev.c| 27 ---
 9 files changed, 149 insertions(+), 15 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index a8a2b67a99d..088d552cf31 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2326,9 +2326,26 @@ Common Joined RejectNegative Var(common_deferred_options) Defer
 ; records information in the assembler output file as comments, so
 ; they never reach the object file.
 frecord-gcc-switches
-Common Var(flag_record_gcc_switches)
+Common Driver Var(flag_record_gcc_switches)
 Record gcc command line switches in the object file.
 
+Enum
+Name(record_gcc_switches_format) Type(enum record_gcc_switches_format)
+
+EnumValue
+Enum(record_gcc_switches_format) String(processed) Value(RECORD_GCC_SWITCHES_PROCESSED)
+
+EnumValue
+Enum(record_gcc_switches_format) String(driver) Value(RECORD_GCC_SWITCHES_DRIVER)
+
+frecord-gcc-switches-format=
+Common Joined RejectNegative Var(flag_record_gcc_switches_format) Enum(record_gcc_switches_format) Init(RECORD_GCC_SWITCHES_PROCESSED)
+-frecord-gcc-switches-format=[processed|driver]	Format of recorded gcc command line switches.
+
+frecord-gcc-switches-file=
+Common Joined RejectNegative Var(flag_record_gcc_switches_file) Undocumented
+Path to file that contains driver command line options.
+
 freg-struct-return
 Common Var(flag_pcc_struct_return,0) Optimization
 Return small aggregates in registers.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 3751bc3ac7c..acdf9ed2dba 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -654,7 +654,7 @@ Objective-C and Objective-C++ Dialects}.
 -finhibit-size-directive  -fcommon  -fno-ident @gol
 -fpcc-struct-return  -fpic  -fPIC  -fpie  -fPIE  -fno-plt @gol
 -fno-jump-tables -fno-bit-tests @gol
--frecord-gcc-switches @gol
+-frecord-gcc-switches -frecord-gcc-switches-format @gol
 -freg-struct-return  -fshort-enums  -fshort-wchar @gol
 -fverbose-asm  -fpack-struct[=@var{n}]  @gol
 -fleading-underscore  -ftls-model=@var{model} @gol
@@ -16359,6 +16359,27 @@ comments, so it never reaches the object file.
 See also @option{-grecord-gcc-switches} for another
 way of storing compiler options into the object file.
 
+@item -frecord-gcc-switches-format=@var{format}
+@opindex frecord-gcc-switches-format
+This switch controls the output format of options that record
+the command line.  The affected options are @option{-frecord-gcc-switches},
+@option{-grecord-gcc-switches} and @option{-fverbose-asm}.
+
+The @var{format} argument should be one of the following:
+
+@table @samp
+
+@item processed
+
+Options are printed after processing by the compiler driver.
+It is the default option value.
+
+@item driver
+
+Options are printed as provided to the compiler driver.
+
+@end table
+
 @item -fpic
 @opindex fpic
 @cindex global offset table
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 0a61d148c8a..2de79446b5e 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -32149,13 +32149,20 @@ dwarf2out_early_finish (const char *filename)
  header compilation, so always fill it with empty string initially
  and overwrite only here.  */
   dw_attr_node *producer = get_AT (comp_unit_die (), DW_AT_producer);
+  producer_string = concat (lang_hooks.name, " ", version_string, NULL);
 
   if (dwarf_record_gcc_switches)
-producer_string = gen_producer_string (lang_hooks.name,
-	   save_decoded_options,
-	   save_decoded_options_count);
-  else
-producer_string = concat 

Re: [PATCH] i386: Fix up TARGET_QIMODE_MATH for many AMD CPU tunings [PR98957]

2021-02-05 Thread Uros Bizjak via Gcc-patches
On Fri, Feb 5, 2021 at 10:29 AM Jakub Jelinek  wrote:
>
> Hi!
>
> As written in the PR, TARGET_QIMODE_MATH was meant to be set for all
> tunings and it was the case for GCC <= 7, but as the number of
> PROCESSOR_* enumerators grew, some AMD tunings (which are at the end
> of the list) over time got enumerators with values >= 32 and
> TARGET_QIMODE_MATH became disabled for them, in GCC 8 for 2
> tunings, in GCC 9 for 7 tunings, in GCC 10 for 8 tunings, and
> on the trunk for 11 tunings.

Ouch.

> The following patch fixes it by using uhwis rather than uints
> and gives them also symbolic names.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2021-02-05  Jakub Jelinek  
>
> PR target/98957
> * config/i386/i386-options.c (m_NONE, m_ALL): Define.
> * config/i386/x86-tune.def (X86_TUNE_BRANCH_PREDICTION_HINTS,
> X86_TUNE_PROMOTE_QI_REGS): Use m_NONE instead of 0U.
> (X86_TUNE_QIMODE_MATH): Use m_ALL instead of ~0U.

OK for mainline and backports.

Thanks,
Uros.

> --- gcc/config/i386/i386-options.c.jj   2021-01-14 19:34:06.041425286 +0100
> +++ gcc/config/i386/i386-options.c  2021-02-04 16:30:18.424999701 +0100
> @@ -98,6 +98,8 @@ along with GCC; see the file COPYING3.
>  #endif
>
>  /* Processor feature/optimization bitmasks.  */
> +#define m_NONE HOST_WIDE_INT_0U
> +#define m_ALL (~HOST_WIDE_INT_0U)
>  #define m_386 (HOST_WIDE_INT_1U<  #define m_486 (HOST_WIDE_INT_1U<  #define m_PENT (HOST_WIDE_INT_1U< --- gcc/config/i386/x86-tune.def.jj 2021-01-04 10:25:45.175162012 +0100
> +++ gcc/config/i386/x86-tune.def2021-02-04 16:32:20.239645476 +0100
> @@ -580,15 +580,15 @@ DEF_TUNE (X86_TUNE_AVOID_VECTOR_DECODE,
> on simulation result. But after P4 was made, no performance benefit
> was observed with branch hints.  It also increases the code size.
> As a result, icc never generates branch hints.  */
> -DEF_TUNE (X86_TUNE_BRANCH_PREDICTION_HINTS, "branch_prediction_hints", 0U)
> +DEF_TUNE (X86_TUNE_BRANCH_PREDICTION_HINTS, "branch_prediction_hints", 
> m_NONE)
>
>  /* X86_TUNE_QIMODE_MATH: Enable use of 8bit arithmetic.  */
> -DEF_TUNE (X86_TUNE_QIMODE_MATH, "qimode_math", ~0U)
> +DEF_TUNE (X86_TUNE_QIMODE_MATH, "qimode_math", m_ALL)
>
>  /* X86_TUNE_PROMOTE_QI_REGS: This enables generic code that promotes all 8bit
> arithmetic to 32bit via PROMOTE_MODE macro.  This code generation scheme
> is usually used for RISC targets.  */
> -DEF_TUNE (X86_TUNE_PROMOTE_QI_REGS, "promote_qi_regs", 0U)
> +DEF_TUNE (X86_TUNE_PROMOTE_QI_REGS, "promote_qi_regs", m_NONE)
>
>  /* X86_TUNE_EMIT_VZEROUPPER: This enables vzeroupper instruction insertion
> before a transfer of control flow out of the function.  */
>
> Jakub
>


[PATCH] i386: Fix up TARGET_QIMODE_MATH for many AMD CPU tunings [PR98957]

2021-02-05 Thread Jakub Jelinek via Gcc-patches
Hi!

As written in the PR, TARGET_QIMODE_MATH was meant to be set for all
tunings and it was the case for GCC <= 7, but as the number of
PROCESSOR_* enumerators grew, some AMD tunings (which are at the end
of the list) over time got enumerators with values >= 32 and
TARGET_QIMODE_MATH became disabled for them, in GCC 8 for 2
tunings, in GCC 9 for 7 tunings, in GCC 10 for 8 tunings, and
on the trunk for 11 tunings.

The following patch fixes it by using uhwis rather than uints
and gives them also symbolic names.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-02-05  Jakub Jelinek  

PR target/98957
* config/i386/i386-options.c (m_NONE, m_ALL): Define.
* config/i386/x86-tune.def (X86_TUNE_BRANCH_PREDICTION_HINTS,
X86_TUNE_PROMOTE_QI_REGS): Use m_NONE instead of 0U.
(X86_TUNE_QIMODE_MATH): Use m_ALL instead of ~0U.

--- gcc/config/i386/i386-options.c.jj   2021-01-14 19:34:06.041425286 +0100
+++ gcc/config/i386/i386-options.c  2021-02-04 16:30:18.424999701 +0100
@@ -98,6 +98,8 @@ along with GCC; see the file COPYING3.
 #endif
 
 /* Processor feature/optimization bitmasks.  */
+#define m_NONE HOST_WIDE_INT_0U
+#define m_ALL (~HOST_WIDE_INT_0U)
 #define m_386 (HOST_WIDE_INT_1U<

Re: [PATCH] aarch64: Reimplement vget_low* intrinsics

2021-02-05 Thread Richard Biener via Gcc-patches
On Fri, Feb 5, 2021 at 9:59 AM Kyrylo Tkachov via Gcc-patches
 wrote:
>
> Hi all,
>
> We can do better on the vget_low* intrinsics.
> Currently they reinterpret their argument into a V2DI vector and extract the 
> low "lane",
> reinterpreting that back into the shorter vector.
> This is functionally correct and generates a sequence of subregs and a 
> vec_select that, by itself,
> gets optimised away eventually.
> However it's bad when we want to use the result in a other SIMD operations.
> Then the subreg-vec_select-subreg combo blocks many combine patterns.
>
> This patch reimplements them to emit a proper low vec_select from the start.
> It generates much cleaner RTL and allows for more aggressive combinations, 
> particularly
> with the patterns that Jonathan has been pushing lately.
>
> Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
> Pushing to trunk.

Just to remind you folks that we're in stage4 which means fixes to regressions
(or wrong-code) only.  aarch64 is a primary target and you should provide a good
example of following the rules we set up for GCC development.

I'd expect _at least_ a short sentence on why you think this change is
absolutely
required for GCC 11.

The change also comes with zero testcases and zero bug references.

Sorry for this particular change taking the fire, I just picked a random one of
the non-regression change-storm I'm seeing for arm/aarch64 recently.

Thanks for your consideration,
Richard.

> Thanks,
> Kyrill
>
> Thanks,
> Kyrill
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-simd-builtins.def (get_low): Define builtin.
> * config/aarch64/aarch64-simd.md (aarch64_get_low): Define.
> * config/aarch64/arm_neon.h (__GET_LOW): Delete.
> (vget_low_f16): Reimplement using new builtin.
> (vget_low_f32): Likewise.
> (vget_low_f64): Likewise.
> (vget_low_p8): Likewise.
> (vget_low_p16): Likewise.
> (vget_low_p64): Likewise.
> (vget_low_s8): Likewise.
> (vget_low_s16): Likewise.
> (vget_low_s32): Likewise.
> (vget_low_s64): Likewise.
> (vget_low_u8): Likewise.
> (vget_low_u16): Likewise.
> (vget_low_u32): Likewise.
> (vget_low_u64): Likewise.


[PATCH] aarch64: Reimplement vget_high* intrinsics

2021-02-05 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

Similar to the vget_low* intrinsics we should just use a proper vec_select 
rather than
going through V2DI subregs.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.

Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-simd-builtins.def (get_high): Define builtin.
* config/aarch64/aarch64-simd.md (aarch64_get_high): Define.
* config/aarch64/arm_neon.h (__GET_HIGH): Delete.
(vget_high_f16): Reimplement using new builtin.
(vget_high_f32): Likewise.
(vget_high_f64): Likewise.
(vget_high_p8): Likewise.
(vget_high_p16): Likewise.
(vget_high_p64): Likewise.
(vget_high_s8): Likewise.
(vget_high_s16): Likewise.
(vget_high_s32): Likewise.
(vget_high_s64): Likewise.
(vget_high_u8): Likewise.
(vget_high_u16): Likewise.
(vget_high_u32): Likewise.
(vget_high_u64): Likewise.


get-high.patch
Description: get-high.patch


Re: [PATCH] Fix Ada bootstrap failure on Cygwin since switch to C++11 (PR98590)

2021-02-05 Thread Arnaud Charlet
> This fixes the bootstrap failure with Ada on Cygwin since the switch
> to C++11. The configure checks detect that fileno_unlocked () is
> present, but when Ada's cstreams.c is compiled in C++11 mode,
>  does not declare it, causing a hard error.
> 
> Fixed by defining _GNU_SOURCE before including .
> 
> Ok for the master branch?
> 
> gcc/ada/
> 
> 2021-02-03  Mikael Pettersson  
> 
> PR bootstrap/98590
> * cstreams.c: Ensure fileno_unlocked() is visible on Cygwin.

We'd rather not have PR references in the source files, so please remove it 
(it will be there as part of the commit log and git annotate).

OK with the comment updated.

> diff --git a/gcc/ada/cstreams.c b/gcc/ada/cstreams.c
> index 4e00dedbbd6..9d2f41c5269 100644
> --- a/gcc/ada/cstreams.c
> +++ b/gcc/ada/cstreams.c
> @@ -37,6 +37,11 @@
>  #define _FILE_OFFSET_BITS 64
>  /* the define above will make off_t a 64bit type on GNU/Linux */
> 
> +/* tell Cygwin's  to expose fileno_unlocked() to work around
> PR98590 */
> +#if defined(__CYGWIN__) && !defined(__CYGWIN32__) && !defined(_GNU_SOURCE)
> +#define _GNU_SOURCE
> +#endif
> +
>  #include 
>  #include 
>  #include 


[PATCH] aarch64: Reimplement vget_low* intrinsics

2021-02-05 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

We can do better on the vget_low* intrinsics.
Currently they reinterpret their argument into a V2DI vector and extract the 
low "lane",
reinterpreting that back into the shorter vector.
This is functionally correct and generates a sequence of subregs and a 
vec_select that, by itself,
gets optimised away eventually.
However it's bad when we want to use the result in a other SIMD operations.
Then the subreg-vec_select-subreg combo blocks many combine patterns.

This patch reimplements them to emit a proper low vec_select from the start.
It generates much cleaner RTL and allows for more aggressive combinations, 
particularly
with the patterns that Jonathan has been pushing lately.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
Pushing to trunk.
Thanks,
Kyrill

Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-simd-builtins.def (get_low): Define builtin.
* config/aarch64/aarch64-simd.md (aarch64_get_low): Define.
* config/aarch64/arm_neon.h (__GET_LOW): Delete.
(vget_low_f16): Reimplement using new builtin.
(vget_low_f32): Likewise.
(vget_low_f64): Likewise.
(vget_low_p8): Likewise.
(vget_low_p16): Likewise.
(vget_low_p64): Likewise.
(vget_low_s8): Likewise.
(vget_low_s16): Likewise.
(vget_low_s32): Likewise.
(vget_low_s64): Likewise.
(vget_low_u8): Likewise.
(vget_low_u16): Likewise.
(vget_low_u32): Likewise.
(vget_low_u64): Likewise.


get_low.patch
Description: get_low.patch


Re: [RFC] test builtin ratio for loop distribution

2021-02-05 Thread Richard Biener via Gcc-patches
On Thu, Feb 4, 2021 at 11:18 PM Alexandre Oliva  wrote:
>
> On Feb  4, 2021, Richard Biener  wrote:
>
> >> >  b) if expansion would use BY_PIECES then expand to an unrolled loop
> >>
> >> Why would that be better than keeping the constant-length memset call,
> >> that would be turned into an unrolled loop during expand?
>
> > Well, because of the possibly lost ctz and alignment info.
>
> Funny you should mention that.  I got started with the expand-time
> expansion yesterday, and found out that we're not using the alignment
> information that is available.  Though the pointer is known to point to
> an aligned object, we are going for 8-bit alignment for some reason.
>
> The strategy I used there was to first check whether by_pieces would
> expand inline a constant length near the max known length, then loop
> over the bits in the variable length, expand in each iteration a
> constant-length store-by-pieces for the fixed length corresponding to
> that bit, and a test comparing the variable length with the fixed length
> guarding the expansion of the store-by-pieces.  We may get larger code
> this way (no loops), but only O(log(len)) compares.
>
> I've also fixed some bugs in the ldist expander, so now it bootstraps,
> but with a few regressions in the testsuite, that I'm yet to look into.
>
> >> Uhh, thanks, but...  you realize nearly all of the gimple-building code
> >> is one and the same for the loop and for trailing count misalignment?
>
> > Sorry, the code lacked comments and so I didn't actually try decipering
> > the code you generate ;)
>
> Oh, come on, it was planly obscure ;-D
>
> Sorry for posting an early-draft before polishing it up.
>
> > The original motivation was really that esp. for small trip count loops
> > the target knows best how to implement them.  Now, that completely
> > fails of course in case the target doesn't implement any of this or
> > the generic code fails because we lost ctz and alignment info.
>
> In our case, generic code fails because it won't handle variable-sized
> clear-by-pieces.  But then, I found out, when it's fixed-size, it also
> makes the code worse, because it seems to expand to byte stores even
> when the store-to object is known to have wider alignment:
>
> union u {
>   long long i;
>   char c[8];
> } x[8];
> int s(union u *p, int k) {
>   for (int i = k ? 0 : 3; i < 8; i++) {
> for (int j = 0; j < 8; j++) {
>   p[i].c[j] = 0;
> } // becomes a memset to an 8-byte-aligned 8-byte object, then 8 byte 
> stores
>   }
> }

On x86_64 I see two DImode stores generated, but that appears to be
done via setmem.

> >> > I think the builtins with alignment and calloc-style element count
> >> > will be useful on its own.
> >>
> >> Oh, I see, you're suggesting actual separate builtin functions.  Uhh...
> >> I'm not sure I want to go there.  I'd much rather recover the ctz of the
> >> length, and use it in existing code.
>
> > Yeah, but when we generate memcpy there might not be a way to
> > store the ctz info until RTL expansion where the magic should really happen 
> > ...
>
> True.  It can be recovered without much difficulty in the cases I've
> looked at, but it could be lost in others.
>
> > So I'd say go for improving RTL expansion.
>
> 'k, thanks
>
> --
> Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
>Free Software Activist GNU Toolchain Engineer
> Vim, Vi, Voltei pro Emacs -- GNUlius Caesar