date:20200629

Emit a variable defined in gcc

2020-06-29 Thread Harshit Sharma via Gcc-patches

Hello,
I am working on a gcc patch for asan. The patch is almost ready except for one
thing. To make sure that the user has applied this patch before using the asan
feature, I want to declare an additional variable in gcc which is
referenced by our source code so that if this patch is missing, the user
gets an error compiling the code because the reference to this variable
will not be resolved.

I am still new to gcc development. So, can anyone tell me how I can make
gcc emit this variable?


Thanks,
Harshit

Re: [PATCH 2/7] PowerPC tests: Add PLI/PADDI tests.

2020-06-29 Thread Michael Meissner via Gcc-patches

On Mon, Jun 29, 2020 at 01:42:56PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Jun 29, 2020 at 02:23:22PM -0400, Michael Meissner wrote:
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/prefix-di-constant.c
> > @@ -0,0 +1,13 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target powerpc_prefixed_addr } */
> > +/* { dg-require-effective-target lp64 } */
> 
> Please always say (_in the test_) why something is required, if it isn't
> obvious.
> 
> > +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
> > +
> > +/* Test that PLI (PADDI) is generated to load a large constant.  */
> > +unsigned long long
> > +large (void)
> > +{
> > +  return 0x12345678ULL;
> > +}
> > +
> > +/* { dg-final { scan-assembler {\mpli\M} } } */
> 
> I have no idea why 64-bit mode (or 64-bit addressing) is needed here.
> *Is* it needed?

Yes it is needed.  Otherwise two separate load immediates would be needed to
load each part of the DI constant that is held in 2 registers.
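
To illustrate the lp64 point, here is a minimal Python sketch (not part of the
patch; the helper name split_di is made up for this example). It models a
64-bit DI constant held in a pair of 32-bit registers, where each half would
need its own load immediate:

```python
def split_di(value):
    """Split a 64-bit constant into (high, low) 32-bit halves.

    Hypothetical helper for illustration only: it models a DI value
    held in two 32-bit registers.
    """
    value &= 0xFFFFFFFFFFFFFFFF
    return (value >> 32) & 0xFFFFFFFF, value & 0xFFFFFFFF


high, low = split_di(0x12345678)
print(hex(high), hex(low))
```

With 64-bit registers the same constant fits in one register, which is the
case the PLI test is meant to exercise.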

> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/prefix-ds-dq.c
> > @@ -0,0 +1,161 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target powerpc_prefixed_addr } */
> > +/* { dg-require-effective-target lp64 } */
> > +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
> 
> > +  unsigned int si; /* offset 49 bytes.  */
> 
> > +int
> > +load_si (struct packed_struct *p)
> > +{
> > +  return p->si;/* PLWA 3,49(3).  */
> > +}
> 
> Here it is because this would be just lwz on 32-bit.
> 
> But that is the only difference, so you could just make that single test
> conditional, not the whole file.
> 
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/prefix-no-update.c
> > @@ -0,0 +1,51 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target powerpc_prefixed_addr } */
> > +/* { dg-require-effective-target lp64 } */
> > +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
> 
> For this testcase, I have no idea at all why you want lp64?

Because to show the bug you need a stack frame larger than 64K.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797

[PATCH] RISC-V: Handle multi-letter extension for multilib-generator

2020-06-29 Thread Kito Cheng

 - The order of multi-lib config could be wrong if multi-letter extensions
   are used, e.g. `./multilib-generator rv32izfh-ilp32--c` would be expected
   to make rv32ic_zfh/ilp32 reuse rv32izfh/ilp32; however, multi-letter
   extensions are not handled correctly, so it generates a reuse rule for
   rv32izfhc/ilp32, which is an invalid arch configuration.

gcc/ChangeLog:

* config/riscv/multilib-generator (arch_canonicalize): Handle
multi-letter extension.
Use underscore as separator between different extensions.
---
 gcc/config/riscv/multilib-generator | 24 +++-
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/gcc/config/riscv/multilib-generator b/gcc/config/riscv/multilib-generator
index b9194e6d3cc1..d4334c8847da 100755
--- a/gcc/config/riscv/multilib-generator
+++ b/gcc/config/riscv/multilib-generator
@@ -39,7 +39,6 @@ reuse = []
 canonical_order = "mafdgqlcbjtpvn"
 
 def arch_canonicalize(arch):
-  # TODO: Support Z, S, H, or X extensions.
   # TODO: Support implied extensions, e.g. D implied F in latest spec.
   # TODO: Support extension version.
   new_arch = ""
@@ -56,19 +55,34 @@ def arch_canonicalize(arch):
   long_ext_prefixes_idx = list(filter(lambda x: x != -1, long_ext_prefixes_idx))
   if long_ext_prefixes_idx:
     first_long_ext_idx = min(long_ext_prefixes_idx)
-    long_exts = arch[first_long_ext_idx:]
+    long_exts = arch[first_long_ext_idx:].split("_")
     std_exts = arch[5:first_long_ext_idx]
   else:
-    long_exts = ""
+    long_exts = []
     std_exts = arch[5:]
 
+  # Single letter extension might appear in the long_exts list,
+  # because we just append extensions list to the arch string.
+  std_exts += "".join(filter(lambda x:len(x) == 1, long_exts))
+
+  # Multi-letter extension must be in lexicographic order.
+  long_exts = sorted(filter(lambda x:len(x) != 1, long_exts))
+
   # Put extensions in canonical order.
   for ext in canonical_order:
     if ext in std_exts:
       new_arch += ext
 
+  # Check every extension is processed.
+  for ext in std_exts:
+    if ext == '_':
+      continue
+    if ext not in canonical_order:
+      raise Exception("Unsupported extension `%s`" % ext)
+
   # Concat rest of the multi-char extensions.
-  new_arch += long_exts
+  if long_exts:
+    new_arch += "_" + "_".join(long_exts)
   return new_arch
 
 for cfg in sys.argv[1:]:
@@ -77,7 +91,7 @@ for cfg in sys.argv[1:]:
   abis[abi] = 1
   extra = list(filter(None, extra.split(',')))
   ext = list(filter(None, ext.split(',')))
-  alts = sum([[x] + [x + y for y in ext] for x in [arch] + extra], [])
+  alts = sum([[x] + [x + "_" + y for y in ext] for x in [arch] + extra], [])
   # TODO: We should expand g to imadzifencei once we support newer spec.
   alts = alts + [x.replace('imafd', 'g') for x in alts if 'imafd' in x]
   alts = list(map(arch_canonicalize, alts))
-- 
2.27.0
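
For readers who want to try the canonicalization outside the GCC tree, here is
a standalone Python sketch of the logic in this patch. It is a simplification
under stated assumptions, not the script itself: the 5-character base (such as
rv32i) is taken as-is, the prefix set "zshx" for multi-letter extensions is an
assumption based on the removed TODO, and the 'g' expansion is left out.

```python
CANONICAL_ORDER = "mafdgqlcbjtpvn"   # taken from the patch context
LONG_EXT_PREFIXES = "zshx"           # assumption: Z, S, H, X start long names


def arch_canonicalize(arch):
    """Canonicalize e.g. 'rv32izfh_c' to 'rv32ic_zfh' (simplified sketch)."""
    base, rest = arch[:5], arch[5:]  # assumption: base is 5 chars, e.g. 'rv32i'

    # Find where the first multi-letter extension starts, if there is one.
    idxs = [i for i in (rest.find(p) for p in LONG_EXT_PREFIXES) if i != -1]
    if idxs:
        first = min(idxs)
        std_exts, long_exts = rest[:first], rest[first:].split("_")
    else:
        std_exts, long_exts = rest, []

    # Single-letter extensions may have been appended after the long ones.
    std_exts += "".join(x for x in long_exts if len(x) == 1)
    # Multi-letter extensions go in lexicographic order.
    long_exts = sorted(x for x in long_exts if len(x) != 1)

    # Reject anything that the canonical order does not know about.
    for ext in std_exts:
        if ext != "_" and ext not in CANONICAL_ORDER:
            raise ValueError("Unsupported extension `%s`" % ext)

    new_arch = base + "".join(e for e in CANONICAL_ORDER if e in std_exts)
    if long_exts:
        new_arch += "_" + "_".join(long_exts)
    return new_arch


print(arch_canonicalize("rv32izfh_c"))
```

The example input corresponds to the rv32izfh arch with the extra c extension
appended using the new "_" separator.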

RE: [PATCH PR95855]A missing ifcvt optimization to generate fcsel

2020-06-29 Thread yangyang (ET)

Hi,

> > Hi,
> >
> > This is a simple fix for pr95855.
> >
> > With this fix, pass_split_paths can recognize the if-conversion
> opportunity of the testcase and doesn't duplicate the corresponding block.
> >
> > Added one testcase for this. Bootstrap and tested on both aarch64 and
> x86 Linux platform, no new regression witnessed.
> >
> > Ok for trunk?
> 
> Can you try using the num_stmts_in_pred[12] counts instead of using
> empty_block_p?

It's OK to use num_stmts_in_pred[12] to judge whether pred[12] is 
empty, since bb's immediate dominator can't meet the constraints 
"single_pred_p (pred[12]) && single_pred (pred[12]) == pred[21]".

> 
> Your matching doesn't allow for FP constants like
> 
>  dmax[0] = d1[i] < 1.0 ? 1.0 : d1[i];
> 
> since FP constants are not shared.  You likely want to use operand_equal_p to
> do the PHI argument comparison.

That's right. After using operand_equal_p instead of == to do the PHI argument
comparison, the mentioned case is covered as well.

> 
> Thanks,
> Richard.

Thanks for your suggestions. We have revised our patch based on your 
suggestions. 

Bootstrapped and tested on both aarch64 and x86 Linux platforms. Does the v1 patch 
look better?

Yang Yang

+2020-06-30  Yang Yang  
+
+   PR tree-optimization/95855
+   * gimple-ssa-split-paths.c (is_feasible_trace): Add extra
+   checks to recognize a missed if-conversion opportunity when
+   judging whether to duplicate a block.
+

+2020-06-30 Yang Yang  
+
+   PR tree-optimization/95855
+   * gcc.dg/tree-ssa/split-paths-12.c: New testcase.
+


PR95855-v1.patch
Description: PR95855-v1.patch

[PATCH] RISC-V: Preserve arch version info during normalizing arch string

2020-06-29 Thread Kito Cheng

- Arch version should be preserved if the user explicitly specified the version.
  e.g.
After normalization, -march=rv32if3d should be -march=rv32i_f3p0d
instead of -march=rv32ifd.

gcc/ChangeLog:

* common/config/riscv/riscv-common.c (riscv_subset_t): New field
added.
(riscv_subset_list::parsing_subset_version): Add parameter to
indicate explicit version, and handle explicit version.
(riscv_subset_list::handle_implied_ext): Ditto.
(riscv_subset_list::add): Ditto.
(riscv_subset_t::riscv_subset_t): Init new field.
(riscv_subset_list::to_string): Always output version info if version
explicitly specified.
(riscv_subset_list::parsing_subset_version): Handle explicit
arch version.
(riscv_subset_list::parse_std_ext): Ditto.
(riscv_subset_list::parse_multiletter_ext): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/attribute-13.c: New.
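
As a rough illustration of the to_string change in this patch, here is a small
Python model. It is only a sketch: the Subset class and the default 2.0
versions are assumptions made for the example, and the leading "rv<xlen>" is
assumed from the -march strings in the description.

```python
from dataclasses import dataclass


@dataclass
class Subset:
    name: str
    major: int
    minor: int
    explicit_version: bool = False   # models the new explicit_version_p field


def to_string(xlen, subsets, version_p=False):
    out = "rv%d" % xlen              # assumption: string starts with rv<xlen>
    first = True
    for s in subsets:
        show_version = version_p or s.explicit_version
        # Use '_' when a version is printed or the name is multi-letter.
        if not first and (show_version or len(s.name) > 1):
            out += "_"
        first = False
        out += s.name
        if show_version:
            out += "%dp%d" % (s.major, s.minor)
    return out


# -march=rv32if3d: only 'f' carries an explicit version (3.0).
subsets = [Subset("i", 2, 0), Subset("f", 3, 0, True), Subset("d", 2, 0)]
print(to_string(32, subsets))
```

In this model, only subsets with an explicit version get the "_" separator and
the <major>p<minor> suffix unless version_p is set for the whole string.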
---
 gcc/common/config/riscv/riscv-common.c| 70 ---
 gcc/testsuite/gcc.target/riscv/attribute-13.c |  6 ++
 2 files changed, 52 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/attribute-13.c

diff --git a/gcc/common/config/riscv/riscv-common.c b/gcc/common/config/riscv/riscv-common.c
index 2df93460165b..96a128acdeac 100644
--- a/gcc/common/config/riscv/riscv-common.c
+++ b/gcc/common/config/riscv/riscv-common.c
@@ -42,6 +42,8 @@ struct riscv_subset_t
   int major_version;
   int minor_version;
   struct riscv_subset_t *next;
+
+  bool explicit_version_p;
 };
 
 /* Type for implied ISA info.  */
@@ -80,19 +82,19 @@ private:
   riscv_subset_list (const char *, location_t);
 
   const char *parsing_subset_version (const char *, unsigned *, unsigned *,
- unsigned, unsigned, bool);
+ unsigned, unsigned, bool, bool *);
 
   const char *parse_std_ext (const char *);
 
   const char *parse_multiletter_ext (const char *, const char *,
 const char *);
 
-  void handle_implied_ext (const char *, int, int);
+  void handle_implied_ext (const char *, int, int, bool);
 
 public:
   ~riscv_subset_list ();
 
-  void add (const char *, int, int);
+  void add (const char *, int, int, bool);
 
   riscv_subset_t *lookup (const char *,
  int major_version = RISCV_DONT_CARE_VERSION,
@@ -111,7 +113,8 @@ static const char *riscv_supported_std_ext (void);
 static riscv_subset_list *current_subset_list = NULL;
 
 riscv_subset_t::riscv_subset_t ()
-  : name (), major_version (0), minor_version (0), next (NULL)
+  : name (), major_version (0), minor_version (0), next (NULL),
+explicit_version_p(false)
 {
 }
 
@@ -138,7 +141,7 @@ riscv_subset_list::~riscv_subset_list ()
 
 void
 riscv_subset_list::add (const char *subset, int major_version,
-   int minor_version)
+   int minor_version, bool explicit_version_p)
 {
   riscv_subset_t *s = new riscv_subset_t ();
 
@@ -148,6 +151,7 @@ riscv_subset_list::add (const char *subset, int major_version,
   s->name = subset;
   s->major_version = major_version;
   s->minor_version = minor_version;
+  s->explicit_version_p = explicit_version_p;
   s->next = NULL;
 
   if (m_tail != NULL)
@@ -173,13 +177,15 @@ riscv_subset_list::to_string (bool version_p) const
   /* For !version_p, we only separate extension with underline for
 multi-letter extension.  */
   if (!first &&
- (version_p || subset->name.length() > 1))
+ (version_p
+  || subset->explicit_version_p
+  || subset->name.length() > 1))
oss << '_';
   first = false;
 
   oss << subset->name;
 
-  if (version_p)
+  if (version_p || subset->explicit_version_p)
oss  << subset->major_version
 << 'p'
 << subset->minor_version;
@@ -240,7 +246,8 @@ riscv_supported_std_ext (void)
  `major_version` using default_major_version.
  `default_major_version`: Default major version.
  `default_minor_version`: Default minor version.
- `std_ext_p`: True if parsing std extension.  */
+ `std_ext_p`: True if parsing std extension.
+ `explicit_version_p`: True if this subset is not using default version.  */
 
 const char *
 riscv_subset_list::parsing_subset_version (const char *p,
@@ -248,13 +255,15 @@ riscv_subset_list::parsing_subset_version (const char *p,
   unsigned *minor_version,
   unsigned default_major_version,
   unsigned default_minor_version,
-  bool std_ext_p)
+  bool std_ext_p,
+  bool *explicit_version_p)
 {
   bool major_p = true;
   unsigned version = 0;
   unsigned major = 0;
   unsigned minor = 0;
   char

[PATCH PR95961] vect: ICE: in exact_div, at poly-int.h:2182

2020-06-29 Thread Yangfei (Felix)

Hi,

PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95961 

In the test case for PR95961, vectorization factor computed by 
vect_determine_vectorization_factor is [8,8].  But this is updated to [1,1] 
later by vect_update_vf_for_slp.
When we call vect_get_num_vectors in vect_enhance_data_refs_alignment, the 
number of scalars which is based on the vectorization factor is not a multiple 
of the
number of elements in the vector type.  This leads to the ICE.  We should check 
that before calling vect_get_num_vectors and set local variable 
'possible_npeel_number' to
zero if there are too few scalars.
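
The failure mode can be modelled with plain integers. The Python sketch below
is an assumption-laden illustration, not the patch: poly_int values are
replaced by ints, the function names are made up, and the guard tests "not a
multiple", which may be broader than the condition the real patch uses.

```python
def exact_div(a, b):
    """Divide a by b, insisting that the division is exact."""
    if a % b != 0:
        raise ValueError("%d is not a multiple of %d" % (a, b))
    return a // b


def possible_npeel_number(nscalars, nunits):
    """Number of vectors for nscalars scalars, or 0 if they do not fill
    a whole number of vectors (hypothetical stand-in for the guarded call)."""
    if nscalars % nunits != 0:
        return 0
    return exact_div(nscalars, nunits)


print(possible_npeel_number(1, 8), possible_npeel_number(16, 8))
```

Calling exact_div(1, 8) directly raises, which corresponds to the assertion
failure reported in the PR.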

Bootstrapped and tested on aarch64-linux-gnu.  ChangeLog updates are contained 
in the patch.

Comments?

Thanks,
Felix


pr95961-v1.diff
Description: pr95961-v1.diff

[wwwdocs PATCH] Fix typo

2020-06-29 Thread Hu Jiangping

Hi,

This patch fixes a typo in contribute.html.

Best Regards.
Hujp

---
 htdocs/contribute.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/contribute.html b/htdocs/contribute.html
index 80a4470e..a913565b 100644
--- a/htdocs/contribute.html
+++ b/htdocs/contribute.html
@@ -133,7 +133,7 @@ other changes applied.
 Documentation changes do not require a new bootstrap (a working
 bootstrap is necessary to get the build environment correct), but you
 must perform make info and make dvi and correct
-and errors.  You should investigate complaints about overfull or
+any errors.  You should investigate complaints about overfull or
 underfull hboxes from make dvi, as these can be the only
 indication of serious markup problems, but do not feel obliged to
 eliminate them all.
-- 
2.17.1

Re: [RFC] rs6000: Add builtins for fegetround, feclearexcept and feraiseexcept [PR94193]

2020-06-29 Thread Joseph Myers

On Mon, 29 Jun 2020, Richard Biener via Gcc-patches wrote:

> I'm not sure if the actual choice of macro values for the fe* builtins
> need glueing logic or if we want them to be determined statically
> by the target configuration - see how we handle folding of
> fpclassify.  At least without -frounding-math fegetround could be
> constant folded to FE_TONEAREST for which we'd need the
> actual value of FE_TONEAREST.

In most cases, target architectures have fixed values for the exceptions, 
independent of the target OS, but on SPARC there is OS dependence, which 
the SPARC_LOW_FE_EXCEPT_VALUES macro deals with (for the interface between 
atomic compound assignment and libatomic's __atomic_feraiseexcept).  I'd 
guess they tend to have fixed values for the rounding modes as well.

If GCC had a built-in fegetround that was always expanded inline and 
always knew the correct value for each rounding mode, that would allow 
fixing bug 30569 (making FLT_ROUNDS depend on the rounding mode at runtime 
- note that expanding the FLT_ROUNDS macro mustn't introduce a dependency 
on libm, hence the need to expand inline).  That bug could also be fixed 
incrementally, target by target, if the target-independent code had some 
way to determine whether the target would expand the built-in fegetround 
inline and what the rounding mode values would be.

(I've tended to suppose that defining a separate __builtin_flt_rounds 
would be the way to go for fixing bug 30569, but that could easily wrap 
__builtin_fegetround and return a constant value for targets that can't 
expand __builtin_fegetround inline.)
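
A minimal Python model of that wrapping idea, for illustration only. The
FLT_ROUNDS result values are the ones defined by ISO C; the function name, the
fallback to "to nearest", and the example table of FE_* values are assumptions
made up for this sketch.

```python
# FLT_ROUNDS results as defined by ISO C: 0 toward zero, 1 to nearest,
# 2 toward +infinity, 3 toward -infinity (-1 means indeterminable).
TOWARD_ZERO, TO_NEAREST, UPWARD, DOWNWARD = 0, 1, 2, 3


def flt_rounds(fegetround=None, fe_values=None):
    """Model of a __builtin_flt_rounds that wraps an inline fegetround.

    fegetround: callable returning the target's current FE_* value, or
                None if the target cannot expand it inline.
    fe_values:  dict mapping the target's FE_* values to FLT_ROUNDS results.
    """
    if fegetround is None or fe_values is None:
        return TO_NEAREST            # assumed constant fallback
    return fe_values.get(fegetround(), -1)


# Hypothetical target table, invented for the example.
example_table = {0: TO_NEAREST, 1: TOWARD_ZERO, 2: UPWARD, 3: DOWNWARD}
print(flt_rounds(), flt_rounds(lambda: 1, example_table))
```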

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] [og10] amdgcn: Add waitcnt after LDS write instructions

2020-06-29 Thread Stubbs, Andrew

On 29 Jun 2020 22:03, "Brown, Julian"  wrote:
> On Mon, 29 Jun 2020 21:32:41 +0100
> Andrew Stubbs  wrote:
> > In particular, it seems logical that any barrier should be a memory
> > barrier, so inserting it in the barrier pattern is not a big deal.
> > IIRC, only OpenACC is using that anyway (OpenMP has explicit asm
> > inserts in libgomp).
>
> I'd be happier with that idea if ds_{read,write} operations were *only*
> used for broadcasting -- but they're not, they may also be used for
> (some) gang-private variables and for reduction temporaries. I don't
> have a test case for either of those at present demonstrating bad
> behaviour with no waitcnt, but I guess it's theoretically possible for
> there to be one, at least.

If there's no barrier then a few cycles this way or that shouldn't make any 
difference, surely?

The only exception I can think of might be atomic release operators, but those 
do a cache flush already, so there shouldn't be any issue with a slightly 
delayed DS operation. Maybe there should be a wait instruction before those too.

> The "proper" solution is a general (& "optimal") waitcnt insertion
> pass, I think, that works with other memory operations as well as the
> DS ones.

Well, yes, that would be nice. The read waits are surely the worst performance 
loss. It's not a trivial task though, and AMD refused to fund it as a directed 
services task last winter.

Andrew

[PATCH] PR middle-end/90597: gcc_assert ICE in layout_type

2020-06-29 Thread Roger Sayle


It turns out that the ICE diagnosed/fixed in my earlier nvptx patch, caused by
TYPE_SIZE(type) being zero during error handling in gcc.dg/attr-vector_size.c
is actually fairly common among backends, and is known in bugzilla as
PR middle-end/90597,  apparently a recent regression.

The following patch should fix the default implementation of 
TARGET_VECTOR_ALIGNMENT, known as default_vector_alignment,
using the same logic.  This patch is relatively untested: a "make bootstrap"
on x86_64-pc-linux-gnu confirms that this code compiles without problems,
but doesn't actually exercise the code itself.

OK for mainline?  Thanks in advance to anyone who can confirm this patch
resolves the unexpected failure of gcc.dg/attr-vector_size.c on an affected
platform (i.e. a backend that doesn't define TARGET_VECTOR_ALIGNMENT).
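
For anyone who wants to reason about the change without a build, here is a
small Python model of the old and new return logic. It is a sketch under
assumptions: sizes and alignments are plain integers in bits, None stands for
a TYPE_SIZE that does not fit an unsigned HWI, and the initial value of align
is assumed since it is outside the diff context.

```python
def old_default_vector_alignment(type_size, max_ofile_alignment):
    align = max_ofile_alignment      # assumption: initial value not in the diff
    if type_size is not None:
        align = type_size
    return align if align < max_ofile_alignment else max_ofile_alignment


def new_default_vector_alignment(type_size, mode_alignment,
                                 max_ofile_alignment):
    align = max_ofile_alignment      # assumption: initial value not in the diff
    if type_size is not None:
        align = type_size
    if align >= max_ofile_alignment:
        return max_ofile_alignment
    # Never return less than the alignment of the type's mode.
    return max(align, mode_alignment)


MAX_ALIGN = 1 << 28                  # made-up value for the example
print(old_default_vector_alignment(0, MAX_ALIGN),
      new_default_vector_alignment(0, 128, MAX_ALIGN))
```

The zero-size case is the one hit during error handling: the old logic returns
an alignment of 0, while the new logic falls back to the mode alignment.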

2020-06-30  Roger Sayle  

PR middle-end/90597
gcc/ChangeLog:
* targhooks.c (default_vector_alignment): Return at least the
GET_MODE_ALIGNMENT for the type's mode.

Thanks,
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 0113c7b..da4805d 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1233,8 +1233,9 @@ default_vector_alignment (const_tree type)
   tree size = TYPE_SIZE (type);
   if (tree_fits_uhwi_p (size))
 align = tree_to_uhwi (size);
-
-  return align < MAX_OFILE_ALIGNMENT ? align : MAX_OFILE_ALIGNMENT;
+  if (align >= MAX_OFILE_ALIGNMENT)
+return MAX_OFILE_ALIGNMENT;
+  return MAX (align, GET_MODE_ALIGNMENT (TYPE_MODE (type)));
 }
 
 /* The default implementation of

RE: [PATCH 5/6 ver 3] rs6000, Add vector splat builtin support

2020-06-29 Thread Carl Love via Gcc-patches

On Mon, 2020-06-29 at 16:58 -0500, Segher Boessenkool wrote:
> On Mon, Jun 29, 2020 at 02:29:54PM -0700, Carl Love wrote:
> > Segher:
> > 
> > On Thu, 2020-06-25 at 17:39 -0500, Segher Boessenkool wrote:
> > > > +;; Return 1 if op is a constant 32-bit floating point value
> > > > +(define_predicate "f32bit_const_operand"
> > > > +  (match_code "const_double")
> > > > +{
> > > > +  if (GET_MODE (op) == SFmode)
> > > > +return 1;
> > > > +
> > > > +  else if ((GET_MODE (op) == DFmode) && ((UINTVAL (op) >> 32)
> > > > ==
> > > > 0))
> > > > +   {
> > > > +/* Value fits in 32-bits */
> > > > +return 1;
> > > > +}
> > > > +  else
> > > > +/* Not the expected mode.  */
> > > > +return 0;
> > > > +})
> > > 
> > > I don't think this is the correct test.  What you want to see is
> > > if
> > > the
> > > number in "op" can be converted to an IEEE single-precision
> > > number,
> > > and
> > > back again, losslessly.  (And subnormal SP numbers aren't allowed
> > > either, but NaNs and infinities are).
> > 
> > The predicate is used with the xxsplitw_v4sf define_expand.  The
> > "user"
> > claims the given immediate bit pattern is the bit pattern for a
> > single
> > precision floating point number.  The immediate value is not
> > converted
> > to a float.  Rather we are splatting a bit pattern that the "user"
> > already claims represents a 32-bit floating point value.  I just
> > need
> > to make sure the immediate value actually fits into 32-bits.
> > 
> > I don't see that I need to check that the value can be converted to
> > IEEE float and back.  
> 
> Ah, okay.  Can you please put that in the function comment
> then?  Right
> now it says
> ;; Return 1 if op is a constant 32-bit floating point value
> and that is quite misleading.

Would the following be clearer?

;; Return 1 if op is a constant bit pattern representing a floating
;; point value that fits in 32-bits. 

  Carl
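
The two checks discussed in this thread differ in what they accept. The Python
sketch below contrasts them; it is only an illustration with made-up function
names, not the predicate from the patch. One function tests that a bit pattern
fits in 32 bits, the other tests that a double survives a round trip through
IEEE single precision (NaNs and infinities allowed, subnormal singles not).

```python
import math
import struct


def fits_in_32_bits(bits):
    """Bit-pattern check: no bits set above bit 31."""
    return (bits >> 32) == 0


def roundtrips_as_single(x):
    """Value check: double -> IEEE single -> double is lossless."""
    if math.isnan(x) or math.isinf(x):
        return True
    try:
        y = struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:            # magnitude too large for single precision
        return False
    if y != x:
        return False
    # Reject values that are only representable as subnormal singles.
    return x == 0.0 or abs(x) >= 2.0 ** -126


# 0x3FC00000 is the single-precision bit pattern of 1.5.
print(fits_in_32_bits(0x3FC00000), roundtrips_as_single(1.5),
      roundtrips_as_single(0.1))
```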

Re: [RFC] rs6000: Add builtins for fegetround, feclearexcept and feraiseexcept [PR94193]

2020-06-29 Thread Segher Boessenkool

On Mon, Jun 29, 2020 at 11:45:41PM +0200, Marc Glisse wrote:
> On Mon, 29 Jun 2020, Segher Boessenkool wrote:
> 
> >Another question.  How do these builtins prevent other FP insns from
> >being moved (or optimised) "over" them?
> 
> At the GIMPLE level they don't.

And not at RTL level, either.

> They prevent other function calls from 
> moving across, just because function calls where at least one is not pure 
> can't cross, but otherwise fenv_access is one big missing feature in gcc. 
> I started something last year (and postponed indefinitely for lack of 
> time), replacing all FP operations (when the safe mode is enabled) with 
> builtins that get expanded by default to insert asm pass-through on the 
> arguments and the result.

Yes, it is an ancient missing feature, and still very relevant.  Thanks
for any attempt you made / are making / will make to make this better!

My fear is that if we optimise the floating env access better, that then
fewer bad transforms are accidentally prevented :-/


Segher

Re: [PATCH 5/6 ver 3] rs6000, Add vector splat builtin support

2020-06-29 Thread Segher Boessenkool

On Mon, Jun 29, 2020 at 02:29:54PM -0700, Carl Love wrote:
> Segher:
> 
> On Thu, 2020-06-25 at 17:39 -0500, Segher Boessenkool wrote:
> > > +;; Return 1 if op is a constant 32-bit floating point value
> > > +(define_predicate "f32bit_const_operand"
> > > +  (match_code "const_double")
> > > +{
> > > +  if (GET_MODE (op) == SFmode)
> > > +return 1;
> > > +
> > > +  else if ((GET_MODE (op) == DFmode) && ((UINTVAL (op) >> 32) ==
> > > 0))
> > > +   {
> > > +/* Value fits in 32-bits */
> > > +return 1;
> > > +}
> > > +  else
> > > +/* Not the expected mode.  */
> > > +return 0;
> > > +})
> > 
> > I don't think this is the correct test.  What you want to see is if
> > the
> > number in "op" can be converted to an IEEE single-precision number,
> > and
> > back again, losslessly.  (And subnormal SP numbers aren't allowed
> > either, but NaNs and infinities are).
> 
> The predicate is used with the xxsplitw_v4sf define_expand.  The "user"
> claims the given immediate bit pattern is the bit pattern for a single
> precision floating point number.  The immediate value is not converted
> to a float.  Rather we are splatting a bit pattern that the "user"
> already claims represents a 32-bit floating point value.  I just need
> to make sure the immediate value actually fits into 32-bits.
> 
> I don't see that I need to check that the value can be converted to
> IEEE float and back.  

Ah, okay.  Can you please put that in the function comment then?  Right
now it says
;; Return 1 if op is a constant 32-bit floating point value
and that is quite misleading.


Segher

Re: [RFC] rs6000: Add builtins for fegetround, feclearexcept and feraiseexcept [PR94193]

2020-06-29 Thread Marc Glisse


On Mon, 29 Jun 2020, Segher Boessenkool wrote:


> Another question.  How do these builtins prevent other FP insns from
> being moved (or optimised) "over" them?


At the GIMPLE level they don't. They prevent other function calls from 
moving across, just because function calls where at least one is not pure 
can't cross, but otherwise fenv_access is one big missing feature in gcc. 
I started something last year (and postponed indefinitely for lack of 
time), replacing all FP operations (when the safe mode is enabled) with 
builtins that get expanded by default to insert asm pass-through on the 
arguments and the result.


--
Marc Glisse

Re: [PATCH] Accept "user" as an alias for "login" in .netrc

2020-06-29 Thread Gerald Pfeifer

Please ignore this (unless you happen to be a wget maintainer in a
different life ;-). As a friendly soul pointed out - wrong address...

Gerald

RE: [PATCH 5/6 ver 3] rs6000, Add vector splat builtin support

2020-06-29 Thread Carl Love via Gcc-patches

Segher:

On Thu, 2020-06-25 at 17:39 -0500, Segher Boessenkool wrote:
> > +;; Return 1 if op is a constant 32-bit floating point value
> > +(define_predicate "f32bit_const_operand"
> > +  (match_code "const_double")
> > +{
> > +  if (GET_MODE (op) == SFmode)
> > +return 1;
> > +
> > +  else if ((GET_MODE (op) == DFmode) && ((UINTVAL (op) >> 32) ==
> > 0))
> > +   {
> > +/* Value fits in 32-bits */
> > +return 1;
> > +}
> > +  else
> > +/* Not the expected mode.  */
> > +return 0;
> > +})
> 
> I don't think this is the correct test.  What you want to see is if
> the
> number in "op" can be converted to an IEEE single-precision number,
> and
> back again, losslessly.  (And subnormal SP numbers aren't allowed
> either, but NaNs and infinities are).

The predicate is used with the xxsplitw_v4sf define_expand.  The "user"
claims the given immediate bit pattern is the bit pattern for a single
precision floating point number.  The immediate value is not converted
to a float.  Rather we are splatting a bit pattern that the "user"
already claims represents a 32-bit floating point value.  I just need
to make sure the immediate value actually fits into 32-bits.

I don't see that I need to check that the value can be converted to
IEEE float and back.  

   Carl

Go patch committed: Remove some erroneous code that was never run

2020-06-29 Thread Ian Lance Taylor via Gcc-patches

This patch to the Go frontend removes some erroneous code that was
never run.  The code accidentally called Type::type_descriptor rather
than the do_type_descriptor method.  Calling Type::type_descriptor
with a second argument of NULL would always crash.  Since that never
happened, it revealed that this code was never actually executed.
Removing this code fixes GCC PR 95970, in which a new warning triggers
for the now-removed code.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
b2e336fcb15cedeb3c19489e358e47916ac51efc
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index fa3764891fb..ecef60400cc 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-d4dade353648eae4a1eaa1acd3e4ce1f7180a913
+30674246ef60ab74566a21f362a7de7a09b99955
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/types.cc b/gcc/go/gofrontend/types.cc
index d6cd326b2e2..3459a3357a3 100644
--- a/gcc/go/gofrontend/types.cc
+++ b/gcc/go/gofrontend/types.cc
@@ -11106,15 +11106,11 @@ Named_type::do_type_descriptor(Gogo* gogo, Named_type* name)
 {
   if (this->is_error_)
 return Expression::make_error(this->location_);
-  if (name == NULL && this->is_alias_)
-{
-  if (this->seen_alias_)
-   return Expression::make_error(this->location_);
-  this->seen_alias_ = true;
-  Expression* ret = this->type_->type_descriptor(gogo, NULL);
-  this->seen_alias_ = false;
-  return ret;
-}
+
+  // We shouldn't see unnamed type aliases here.  They should have
+  // been removed by the call to unalias in Type::type_descriptor_pointer.
+  // We can see named type aliases via Type::named_type_descriptor.
+  go_assert(name != NULL || !this->is_alias_);
 
   // If NAME is not NULL, then we don't really want the type
   // descriptor for this type; we want the descriptor for the

[PATCH, committed] PR fortran/95978 - [10/11 Regression] ICE in gfc_match_data, at fortran/decl.c:731

2020-06-29 Thread Harald Anlauf

Committed as obvious on master.

Regtested on x86_64-pc-linux-gnu.

Will backport to 10-branch.

Thanks,
Harald


PR fortran/95978 - ICE in gfc_match_data, at fortran/decl.c:731

Catch NULL pointer dereference on invalid DATA statement.

gcc/fortran/
PR fortran/95978
* decl.c (gfc_match_data): Avoid NULL pointer dereference.

diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index ac1f63f66e0..f38def4c291 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -728,7 +728,7 @@ gfc_match_data (void)
 	  gfc_constructor *c;
 	  c = gfc_constructor_first (new_data->value->expr->value.constructor);
 	  for (; c; c = gfc_constructor_next (c))
-	if (c->expr->ts.type == BT_BOZ)
+	if (c->expr && c->expr->ts.type == BT_BOZ)
 	  {
 		gfc_error ("BOZ literal constant at %L cannot appear in a "
 			   "structure constructor", &c->expr->where);
diff --git a/gcc/testsuite/gfortran.dg/pr95978.f90 b/gcc/testsuite/gfortran.dg/pr95978.f90
new file mode 100644
index 000..47bd7067096
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr95978.f90
@@ -0,0 +1,10 @@
+! { dg-do compile }
+! PR fortran/95978 - ICE in gfc_match_data, at fortran/decl.c:731
+
+program p
+  type t
+ integer :: a
+ type(t), allocatable :: b
+ data c /t(1)/   ! { dg-error "Unexpected DATA statement" }
+  end type t
+end

[patch, fortran, committed] Fix bogus recursion check

2020-06-29 Thread Thomas Koenig via Gcc-patches


Hello world,

I just committed the attached patch as obvious and simple. It's
one line, or alternatively, 24 characters long :-)

Best regards

Thomas

Do not generate recursion check for compiler-generated procedures.

This one-line fix removes a check for recursion for procedures
which are compiler-generated, such as finalizers or deallocation.
These need to be recursive, even if the user code should not be.
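
The resulting condition can be written as a small predicate. This Python
sketch uses made-up parameter names that mirror the flags in the diff; it only
models the decision, not the code that emits the check.

```python
def needs_recursion_check(rtcheck_recursion, is_recursive,
                          flag_recursive, artificial):
    """Return True if a runtime recursion check should be generated.

    artificial models sym->attr.artificial, i.e. a compiler-generated
    procedure such as a finalizer or a deallocation helper.
    """
    return (rtcheck_recursion and not is_recursive
            and not flag_recursive and not artificial)


# User procedure with -fcheck=recursion: checked.
print(needs_recursion_check(True, False, False, False))
# Compiler-generated procedure: no longer checked.
print(needs_recursion_check(True, False, False, True))
```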

gcc/fortran/ChangeLog:

PR fortran/95743
* trans-decl.c (gfc_generate_function_code): Do not generate
recursion check for compiler-generated procedures.

diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index e10122e6e0c..769ab20c82d 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -6789,7 +6789,7 @@ gfc_generate_function_code (gfc_namespace * ns)
 		 || (sym->attr.entry_master
 		 && sym->ns->entries->sym->attr.recursive);
   if ((gfc_option.rtcheck & GFC_RTCHECK_RECURSION)
-  && !is_recursive && !flag_recursive)
+  && !is_recursive && !flag_recursive && !sym->attr.artificial)
 {
   char * msg;
 
diff --git a/gcc/testsuite/gfortran.dg/recursive_check_16.f90 b/gcc/testsuite/gfortran.dg/recursive_check_16.f90
new file mode 100644
index 000..d8e9d69ea7b
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/recursive_check_16.f90
@@ -0,0 +1,25 @@
+! { dg-do  run }
+! ! { dg-options "-fcheck=recursion" }
+! PR 95743 - this used to cause a runtime error.
+! Test case by Antoine Lemoine
+
+program test_recursive_call
+   implicit none
+
+   type t_tree_node
+  type(t_tree_node), dimension(:), allocatable :: child
+   end type
+
+   type t_tree
+  type(t_tree_node), allocatable :: root
+   end type
+
+   type(t_tree), allocatable :: tree
+
+   allocate(tree)
+   allocate(tree%root)
+   allocate(tree%root%child(1))
+   ! If the line below is removed, the code works fine.
+   allocate(tree%root%child(1)%child(1))
+   deallocate(tree)
+end program

Re: [PATCH] [og10] amdgcn: Add waitcnt after LDS write instructions

2020-06-29 Thread Julian Brown

On Mon, 29 Jun 2020 21:32:41 +0100
Andrew Stubbs  wrote:

> On 29/06/2020 21:16, Julian Brown wrote:
> > Data-share write (ds_write) instructions do not necessarily complete
> > the write to LDS immediately. When a write completes, LGKM_CNT is
> > decremented. For now, we wait until LGKM_CNT reaches zero after each
> > ds_write instruction.
> > 
> > This fixes a race condition in the case where LDS is read
> > immediately after being written. This can happen with broadcast
> > operations.
> > 
> > OK for og10 branch?  
> 
> I'm not saying no (because this issue needs a fix), but the thought 
> occurs that inserting one wait before the barrier might be better
> than inserting a wait after each and every write.
> 
> In particular, it seems logical that any barrier should be a memory 
> barrier, so inserting it in the barrier pattern is not a big deal.
> IIRC, only OpenACC is using that anyway (OpenMP has explicit asm
> inserts in libgomp).

I'd be happier with that idea if ds_{read,write} operations were *only*
used for broadcasting -- but they're not, they may also be used for
(some) gang-private variables and for reduction temporaries. I don't
have a test case for either of those at present demonstrating bad
behaviour with no waitcnt, but I guess it's theoretically possible for
there to be one, at least.

The "proper" solution is a general (& "optimal") waitcnt insertion
pass, I think, that works with other memory operations as well as the
DS ones.

Thanks,

Julian

[PATCH] PR fortran/95980 - ICE in get_unique_type_string, at fortran/class.c:485

2020-06-29 Thread Harald Anlauf

Dear all,

here's a fix for a couple of NULL pointer dereferences on invalid code.

Regtested on x86_64-pc-linux-gnu.

OK for master?

Thanks,
Harald


PR fortran/95980 - ICE in get_unique_type_string, at fortran/class.c:485

In SELECT TYPE, the argument may be an incorrectly specified unlimited
polymorphic variable.  Avoid a NULL pointer dereference for clean error
recovery.

gcc/fortran/
PR fortran/95980
* match.c (copy_ts_from_selector_to_associate, build_class_sym):
Distinguish between unlimited polymorphic and ordinary variables
to avoid NULL pointer dereference.
* resolve.c (resolve_select_type):
Distinguish between unlimited polymorphic and ordinary variables
to avoid NULL pointer dereference.

diff --git a/gcc/fortran/match.c b/gcc/fortran/match.c
index db5174f3f21..7d3711c55f9 100644
--- a/gcc/fortran/match.c
+++ b/gcc/fortran/match.c
@@ -6159,14 +6159,18 @@ copy_ts_from_selector_to_associate (gfc_expr *associate, gfc_expr *selector)
   while (ref && ref->next)
 ref = ref->next;

-  if (selector->ts.type == BT_CLASS && CLASS_DATA (selector)->as
+  if (selector->ts.type == BT_CLASS
+  && CLASS_DATA (selector)
+  && CLASS_DATA (selector)->as
   && CLASS_DATA (selector)->as->type == AS_ASSUMED_RANK)
 {
   assoc_sym->attr.dimension = 1;
   assoc_sym->as = gfc_copy_array_spec (CLASS_DATA (selector)->as);
   goto build_class_sym;
 }
-  else if (selector->ts.type == BT_CLASS && CLASS_DATA (selector)->as
+  else if (selector->ts.type == BT_CLASS
+	   && CLASS_DATA (selector)
+	   && CLASS_DATA (selector)->as
 	   && ref && ref->type == REF_ARRAY)
 {
   /* Ensure that the array reference type is set.  We cannot use
@@ -6223,7 +6227,8 @@ build_class_sym:
 {
   /* The correct class container has to be available.  */
   assoc_sym->ts.type = BT_CLASS;
-  assoc_sym->ts.u.derived = CLASS_DATA (selector)->ts.u.derived;
+  assoc_sym->ts.u.derived = CLASS_DATA (selector)
+	? CLASS_DATA (selector)->ts.u.derived : selector->ts.u.derived;
   assoc_sym->attr.pointer = 1;
   gfc_build_class_symbol (&assoc_sym->ts, &assoc_sym->attr, &assoc_sym->as);
 }
diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index f3e8ffc204c..4574e8cd752 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -9224,7 +9224,8 @@ resolve_select_type (gfc_code *code, gfc_namespace *old_ns)
 	{
 	  if (code->expr1->symtree->n.sym->attr.untyped)
 	code->expr1->symtree->n.sym->ts = code->expr2->ts;
-	  selector_type = CLASS_DATA (code->expr2)->ts.u.derived;
+	  selector_type = CLASS_DATA (code->expr2)
+	? CLASS_DATA (code->expr2)->ts.u.derived : code->expr2->ts.u.derived;
 	}

   if (code->expr2->rank && CLASS_DATA (code->expr1)->as)
diff --git a/gcc/testsuite/gfortran.dg/pr95980.f90 b/gcc/testsuite/gfortran.dg/pr95980.f90
new file mode 100644
index 000..7c8260a96e0
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr95980.f90
@@ -0,0 +1,10 @@
+! { dg-do compile }
+! PR fortran/95980 - ICE in get_unique_type_string, at fortran/class.c:485
+
+program p
+  type t
+  end type t
+  class(t) :: x! { dg-error "must be dummy, allocatable or pointer" }
+  select type (y => x)
+  end select
+end

Re: [RFC] rs6000: Add builtins for fegetround, feclearexcept and feraiseexcept [PR94193]

2020-06-29 Thread Segher Boessenkool

Hi!

On Mon, Jun 29, 2020 at 08:49:05AM +0200, Richard Biener via Gcc-patches wrote:
> On Fri, Jun 26, 2020 at 10:12 PM Raoni Fassina Firmino via Gcc-patches
>  wrote:
> > This is an early draft I'm working on to add fegetround , feclearexcept
> > and feraiseexcept as builtins on rs6000.  This is my first patch so I
> > welcome any and all feedback.  Foremost I have some questions to ask as
> > I got stuck on some problems.

> > Q2) How to fallback to the default behavior of the function call when
> > the builtin is not suitable for the parameters?

In general, in the expander you can do FAIL in such cases.  For some
patterns that isn't allowed, and you have to copy everything the
standard implementation does to your specialized implementation.  This
of course doesn't scale, and will make you miss all future changes to
the standard implementation.  It is then probably better to do the
work the original implementation skimped on, and *do* allow FAIL.

> > Here, it is more specifically for feclearexcept and feraiseexcept.  The
> > builtin should only be used in the case of the parameter input is a
> > constant number with only 1bit mask (to work on only one exception).

rs6000 has exact_log2_cint_operand (and the "N" constraint).
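
For illustration only, the property being asked for (a constant with
exactly one bit set) can be sketched in plain C.  This is not how the
predicate named above is implemented; the function name is hypothetical
and any flag values passed to it are the caller's business.

```c
#include <stdbool.h>

/* Return true when VALUE has exactly one bit set, i.e. it names a
   single flag rather than zero flags or a combination of flags.
   Hypothetical helper, not part of GCC.  */
bool
is_single_bit_mask (unsigned int value)
{
  /* Clearing the lowest set bit leaves zero only if there was
     exactly one bit to begin with.  */
  return value != 0 && (value & (value - 1)) == 0;
}
```

A call such as is_single_bit_mask (flag_a | flag_b) with two distinct
flags returns false, which is the case where falling back to the
library call would be needed.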

> > Q3) Are the implementations for the builtins more or less on the
> > right places?
> >
> > The first one I did was fegetround and I based it on ppc_get_timebase
> > and other related builtins, so I used a define_expand on rs6000.md, but
> > when I was working on the fe*except I was basing it on other builtins
> > and ended up implementing it all on rs6000-call.c, but I am not sure if
> > there is a canonical way of doing it one way or another.

Patterns have to go to one of the .md files.  rs6000.md is a fine choice;
but put this somewhere near the other floating point patterns?

> GCC already knows fe* builtins, what GCC does not yet have is
> a way for targets to specify custom expansion of them.  So instead
> of adding powerpc specific builtins you should add optabs for the
> RTL expansion part.

Yes.

> > +static rtx
> > +rs6000_expand_feCRexcept_builtin (enum insn_code icode, tree exp, rtx 
> > target)

No caMel case please.  "fe" and "cr" do not mean too much here; think of
a nicer name please?  Saving a character or two isn't useful, this is
called in only a few places.

> > +  //|| INTVAL (op0) == FE_INVALID)

Please fix.

> > +;; int __builtin_fegetround()
> > +(define_expand "rs6000_fegetround"
> > +  [(use (match_operand:SI 0 "gpc_reg_operand"))]
> > +  "TARGET_HARD_FLOAT"
> > +{
> > +rtx tmp_df = gen_reg_rtx (DFmode);
> > +emit_insn (gen_rs6000_mffsl (tmp_df));
> > +
> > +rtx tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);
> > +rtx tmp_di_2 = gen_reg_rtx (DImode);
> > +emit_insn (gen_anddi3 (tmp_di_2, tmp_di, GEN_INT (0x3LL)));

Just  GEN_INT (3)  will do fine.
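
As a rough C picture of what the quoted AND computes: the sketch below
assumes the value read back is viewed as a 64-bit integer whose two
low-order bits hold the rounding mode.  The function name is
hypothetical.  The mask fits in a plain int, so no LL suffix is needed.

```c
#include <stdint.h>

/* Keep only the two low-order bits of a 64-bit status image.
   Hypothetical helper mirroring the "and with 3" step in the quoted
   expander; it makes no claim about the other bits.  */
int
rounding_mode_from_image (uint64_t status_image)
{
  return (int) (status_image & 3);
}
```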

Another question.  How do these builtins prevent other FP insns from
being moved (or optimised) "over" them?

Segher

Re: [PATCH] [og10] amdgcn: Add waitcnt after LDS write instructions

2020-06-29 Thread Andrew Stubbs


On 29/06/2020 21:16, Julian Brown wrote:
> Data-share write (ds_write) instructions do not necessarily complete
> the write to LDS immediately. When a write completes, LGKM_CNT is
> decremented. For now, we wait until LGKM_CNT reaches zero after each
> ds_write instruction.
>
> This fixes a race condition in the case where LDS is read immediately
> after being written. This can happen with broadcast operations.
>
> OK for og10 branch?

I'm not saying no (because this issue needs a fix), but the thought 
occurs that inserting one wait before the barrier might be better than 
inserting a wait after each and every write.


In particular, it seems logical that any barrier should be a memory 
barrier, so inserting it in the barrier pattern is not a big deal. IIRC, 
only OpenACC is using that anyway (OpenMP has explicit asm inserts in 
libgomp).
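
To make the trade-off concrete, here is a toy C model of the two
placements.  It only counts wait instructions and assumes a wait for
zero drains every outstanding write.  It is not a model of the real
hardware, and all names in it are made up for the example.

```c
/* Toy model of an outstanding-operation counter such as LGKM_CNT.  */
struct lds_model
{
  int outstanding;   /* Writes issued but not yet completed.  */
  int waits;         /* Number of wait instructions executed.  */
};

/* Issue one write: the counter goes up until the write completes.  */
void
issue_write (struct lds_model *m)
{
  m->outstanding++;
}

/* Model "wait until the counter reaches zero".  */
void
wait_for_zero (struct lds_model *m)
{
  m->outstanding = 0;
  m->waits++;
}

/* Placement A: a wait after each and every write.  */
int
waits_after_each_write (int num_writes)
{
  struct lds_model m = { 0, 0 };
  for (int i = 0; i < num_writes; i++)
    {
      issue_write (&m);
      wait_for_zero (&m);
    }
  return m.waits;
}

/* Placement B: writes may overlap, one wait ahead of the barrier.  */
int
waits_before_barrier (int num_writes)
{
  struct lds_model m = { 0, 0 };
  for (int i = 0; i < num_writes; i++)
    issue_write (&m);
  wait_for_zero (&m);
  return m.waits;
}
```

Placement B is only safe if every read of the written data sits behind
such a barrier, which is the open question in this thread.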


WDYT?

Andrew

[PATCH] [og10] OpenACC: Fix race condition in Fortran loop collapse tests

2020-06-29 Thread Julian Brown

The gangs participating in a gang-partitioned loop are not all guaranteed
to complete before some given gang continues to execute beyond that loop.
This means that two existing test cases contain a race condition,
because a loop that may be gang-partitioned is followed immediately by
another loop.  The fix is to place the loops in separate parallel regions.
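
The same structure, sketched in C rather than Fortran and not taken
from the test cases themselves: the first parallel region ends before
the second begins, so no gang can start reading before all gangs have
finished writing.  Array sizes and names are illustrative.  Without
-fopenacc the pragmas are ignored and the code runs serially.

```c
#define ROWS 3
#define COLS 3

/* Fill A in one parallel region, then check it in a second one.
   Returns 1 if any element has an unexpected value, 0 otherwise.  */
int
fill_then_check (void)
{
  int a[ROWS][COLS];
  int bad = 0;

#pragma acc parallel copyout(a)
  {
#pragma acc loop gang collapse(2)
    for (int i = 0; i < ROWS; i++)
      for (int j = 0; j < COLS; j++)
        a[i][j] = i + j;
  }

  /* A separate region: the writes above are complete by now.  */
#pragma acc parallel copyin(a) reduction(||:bad)
  {
#pragma acc loop gang collapse(2) reduction(||:bad)
    for (int i = 0; i < ROWS; i++)
      for (int j = 0; j < COLS; j++)
        if (a[i][j] != i + j)
          bad = 1;
  }

  return bad;
}
```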

OK for og10 branch?

Julian

ChangeLog

libgomp/
* testsuite/libgomp.oacc-fortran/collapse-1.f90: Fix race condition.
* testsuite/libgomp.oacc-fortran/collapse-2.f90: Likewise.
---
 libgomp/testsuite/libgomp.oacc-fortran/collapse-1.f90 | 3 +++
 libgomp/testsuite/libgomp.oacc-fortran/collapse-2.f90 | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/collapse-1.f90 
b/libgomp/testsuite/libgomp.oacc-fortran/collapse-1.f90
index 918c5d0d5b1c..4857752f1b0a 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/collapse-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/collapse-1.f90
@@ -14,6 +14,9 @@ program collapse1
 end do
   end do
 end do
+  !$acc end parallel
+
+  !$acc parallel
   !$acc loop collapse(2) reduction(.or.:l)
 do i = 1, 3
   do j = 4, 6
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/collapse-2.f90 
b/libgomp/testsuite/libgomp.oacc-fortran/collapse-2.f90
index 98b6987750ec..0a543909127e 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/collapse-2.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/collapse-2.f90
@@ -13,6 +13,9 @@ program collapse2
 do 164 k = 5, 7
   a(i, j, k) = i + j + k
 164  end do
+  !$acc end parallel
+
+  !$acc parallel
   !$acc loop collapse(2) reduction(.or.:l)
 firstdo: do i = 1, 3
   do j = 4, 6
-- 
2.23.0

[PATCH] [og10] OpenACC: Turn off worker partitioning if num_workers==1

2020-06-29 Thread Julian Brown

This patch turns off the middle-end worker-partitioning support if the
number of workers for an outlined offload function is one.  In that case,
we do not need to perform the broadcasting/neutering code transformation.

OK for og10 branch?

Julian

ChangeLog

gcc/
* omp-offload.c (pass_oacc_gimple_workers::gate): Disable worker
partitioning if num_workers is 1.
---
 gcc/omp-offload.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index bf72782ba4ce..2b730d057781 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -2165,7 +2165,20 @@ public:
   /* opt_pass methods: */
   virtual bool gate (function *)
   {
-return flag_openacc && targetm.goacc.worker_partitioning;
+if (!flag_openacc || !targetm.goacc.worker_partitioning)
+  return false;
+
+tree attr = oacc_get_fn_attrib (current_function_decl);
+
+if (!attr)
+  /* Not an offloaded function.  */
+  return false;
+
+int worker_dim
+  = oacc_get_fn_dim_size (current_function_decl, GOMP_DIM_WORKER);
+
+/* No worker partitioning if we know the number of workers is 1.  */
+return worker_dim != 1;
   };
 
   virtual unsigned int execute (function *)
-- 
2.23.0

[PATCH] [og10] OpenACC: Shared memory layout optimisation

2020-06-29 Thread Julian Brown

This patch implements an algorithm to lay out local data-share (LDS) space.  It 
currently works for AMD GCN.  At the moment, LDS is used for three things:

  1. Gang-private variables
  2. Reduction temporaries (accumulators)
  3. Broadcasting for worker partitioning

After the patch is applied, (2) and (3) are placed at preallocated
locations in LDS, and (1) continues to be handled by the backend (as it
was before this patch). LDS now looks like this:

  +--------------+ (gang local size + 1024, = 1536)
  | free space   |
  |    ...       |
  | - - - - - - -|
  | worker bcast |
  +--------------+
  | reductions   |
  +--------------+ <<< -mgang-local-size= (def. 512)
  | gang private |
  |    vars      |
  +--------------+ (32)
  | low LDS vars |
  +--------------+ LDS base

So, gang-private space is fixed at a constant amount at compile time
(which can be increased with a command-line switch if necessary
for some given code). The layout algorithm takes out a slice of the
remainder of usable space for reduction vars, and uses the rest for
worker partitioning.

The partitioning algorithm works as follows.

 1. An "adjacency" set is built up for each basic block that might
do a broadcast. This is calculated by starting at each such block,
and doing a recursive DFS walk over successors to find the next
block (or blocks) that *also* does a broadcast
(dfs_broadcast_reachable_1).

 2. The adjacency set is inverted to get adjacent predecessor blocks also.

 3. Blocks that will perform a broadcast are sorted by size of that
broadcast: the biggest blocks are handled first.

 4. A splay tree structure is used to calculate the spans of LDS memory
that are already allocated by the blocks adjacent to this one
(merge_ranges{,_1}).

 5. The current block's broadcast space is allocated from the first free
span not allocated in the splay tree structure calculated above
(first_fit_range). This seems to work quite nicely and efficiently
with the splay tree structure.

 6. Continue with the next-biggest broadcast block until we're done.

In this way, "adjacent" broadcasts will not use the same piece of
LDS memory.
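
Steps 3 to 6 can be sketched in standalone C as below.  This is an
illustration under simplifying assumptions, not the code in the patch:
sorted arrays stand in for the splay tree, the adjacency lists are
taken as input and are assumed to already contain both directions
(steps 1 and 2), and every type and function name is hypothetical.

```c
#define MAX_BLOCKS 16

/* Half-open byte range [start, end) within the broadcast area.  */
struct range
{
  int start;
  int end;
};

/* One basic block that performs a broadcast.  */
struct bcast_block
{
  int size;              /* Bytes of broadcast space needed.  */
  int offset;            /* Assigned offset, or -1 if not yet placed.  */
  int num_adj;           /* Number of entries used in ADJ.  */
  int adj[MAX_BLOCKS];   /* Indices of adjacent broadcast blocks.  */
};

/* Insertion-sort ranges by start offset.  */
void
sort_ranges (struct range *r, int n)
{
  for (int i = 1; i < n; i++)
    {
      struct range key = r[i];
      int j = i - 1;
      while (j >= 0 && r[j].start > key.start)
        {
          r[j + 1] = r[j];
          j--;
        }
      r[j + 1] = key;
    }
}

/* Return the lowest offset where SIZE bytes fit without touching any
   of the N ranges in USED, which must be sorted by start.  */
int
first_fit (const struct range *used, int n, int size)
{
  int pos = 0;
  for (int i = 0; i < n; i++)
    {
      if (used[i].start - pos >= size)
        break;
      if (used[i].end > pos)
        pos = used[i].end;
    }
  return pos;
}

/* Place all N blocks, biggest first, avoiding the space already given
   to each block's placed neighbours.  */
void
layout_broadcasts (struct bcast_block *blocks, int n)
{
  int order[MAX_BLOCKS];
  for (int i = 0; i < n; i++)
    {
      order[i] = i;
      blocks[i].offset = -1;
    }

  /* Sort block indices by size, largest first.  */
  for (int i = 1; i < n; i++)
    {
      int key = order[i];
      int j = i - 1;
      while (j >= 0 && blocks[order[j]].size < blocks[key].size)
        {
          order[j + 1] = order[j];
          j--;
        }
      order[j + 1] = key;
    }

  for (int k = 0; k < n; k++)
    {
      struct bcast_block *b = &blocks[order[k]];
      struct range used[MAX_BLOCKS];
      int num_used = 0;

      /* Collect the spans held by neighbours placed so far.  */
      for (int a = 0; a < b->num_adj; a++)
        {
          const struct bcast_block *nb = &blocks[b->adj[a]];
          if (nb->offset >= 0)
            {
              used[num_used].start = nb->offset;
              used[num_used].end = nb->offset + nb->size;
              num_used++;
            }
        }

      sort_ranges (used, num_used);
      b->offset = first_fit (used, num_used, b->size);
    }
}
```

In a chain of three blocks where only neighbours are adjacent, the two
end blocks can end up sharing the same offset while the middle block
keeps its own span.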

OK for og10 branch?

Julian

ChangeLog

gcc/
* config/gcn/gcn-protos.h (gcn_goacc_adjust_private_decl): Update
prototype.
* config/gcn/gcn-tree.c (gcn_goacc_get_worker_red_decl): Use
preallocated block of LDS memory.
(gcn_goacc_create_propagation_record): Add OFFSET parameter, and return
temporary LDS space at that offset.  Return pointer in "sender" case.
(gcn_goacc_adjust_private_decl): Return var.
* config/gcn/gcn.c (acc_lds_size, gangprivate_hwm, lds_allocs): New
global vars.
(ACC_LDS_SIZE): Define as acc_lds_size.
(gcn_init_machine_status): Don't initialise lds_allocated and
lds_allocs fields of machine function struct.
(gcn_option_override): Handle default size for gang-private variables
and -mgang-local-size option.
(gcn_expand_prologue): Use LDS_SIZE instead of LDS_SIZE-1 when
initialising M0_REG.
(gcn_shared_mem_layout): New function.
(gcn_print_lds_decl): Update comment. Use global lds_allocs map and
gangprivate_hwm variable.
(TARGET_GOACC_SHARED_MEM_LAYOUT): Define target hook.
* config/gcn/gcn.h (machine_function): Remove lds_allocated,
lds_allocs. Add reduction_base, reduction_limit.
* config/gcn/gcn.opt (gang_local_size_opt): New global.
(mgang-local-size=): New option.
* doc/tm.texi.in (TARGET_GOACC_SHARED_MEM_LAYOUT): Place documentation
hook.
* doc/tm.texi: Regenerate.
* omp-offload.c (addr_expr_rewrite_info): Change adjusted_vars to a
hash_map.
(rewrite_addr_expr): Rewrite VAR_DECLs also.
(default_goacc_create_propagation_record): Add OFFSET parameter.
(execute_oacc_gimple_workers): Calculate per-function reduction
temporary and private-variable size.  Call OpenACC shared_mem_layout
hook.  Move num_workers==1 handling here.
(execute_oacc_device_lower): Fix for adjusted_vars being a hash_map
rather than a hash_set.
(pass_oacc_gimple_workers::gate): Remove num_workers==1 handling from
here.  Enable pass for all OpenACC routines in order to call shared
memory-layout hook.
* omp-sese.c (targhooks.h, diagnostic-core.h): Add includes.
(build_sender_ref): Handle sender_decl being pointer.
(worker_single_copy): Add PLACEMENT and ISOLATE_BROADCASTS parameters.
Pass placement argument to create_propagation_record hook invocations.
Handle sender_decl being pointer and isolate_broadcasts inserting extra
barriers.
(blk_offset_map_t): Add typedef.
(neuter_worker_single): Add BLK_OFFSET_MAP parameter.  Pass
preallocated range to worker_single_copy call.

[PATCH] [og10] OpenACC: Fix mkoffload SGPR/VGPR count parsing for HSACO v3

2020-06-29 Thread Julian Brown

If an offload kernel uses a large number of VGPRs, AMD GCN hardware may
need to limit the number of threads/workers launched for that kernel.
The number of SGPRs/VGPRs in use is detected by mkoffload and recorded in
the processed output.  The patterns emitted detailing SGPR/VGPR occupancy
changed between HSACO v2 and v3 though, so this patch updates parsing
to account for that.
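
As a standalone illustration of the newer line shapes, here is a
minimal C sketch that scans one metadata line at a time.  The format
strings follow those in the patch below, except that a bounded %63s
replaces %ms so the example needs no allocation.  The struct and
function names are hypothetical.

```c
#include <stdio.h>

/* Register usage collected for one kernel.  */
struct regcount_info
{
  int sgpr_count;
  int vgpr_count;
  char kernel_name[64];
};

/* Scan one line of metadata text.  Returns 1 if the line was
   recognised and INFO was updated, 0 otherwise.  */
int
scan_metadata_line (const char *line, struct regcount_info *info)
{
  if (sscanf (line, " - .name: %63s", info->kernel_name) == 1)
    return 1;
  if (sscanf (line, " .sgpr_count: %d", &info->sgpr_count) == 1)
    return 1;
  if (sscanf (line, " .vgpr_count: %d", &info->vgpr_count) == 1)
    return 1;
  return 0;
}
```

A caller would initialise both counts to -1, feed each line of the
metadata section to the function, and check that the name and both
counts were seen by the end of the section.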

OK for og10 branch? (I will repost for mainline after re-testing, etc.)

Julian

ChangeLog

gcc/
* config/gcn/mkoffload.c (process_asm): Initialise regcount.  Update
scanning for SGPR/VGPR usage for HSACO v3.
---
 gcc/config/gcn/mkoffload.c | 40 --
 1 file changed, 25 insertions(+), 15 deletions(-)

diff --git a/gcc/config/gcn/mkoffload.c b/gcc/config/gcn/mkoffload.c
index 723da108b655..48a86c719532 100644
--- a/gcc/config/gcn/mkoffload.c
+++ b/gcc/config/gcn/mkoffload.c
@@ -230,7 +230,7 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
 int sgpr_count;
 int vgpr_count;
 char *kernel_name;
-  } regcount;
+  } regcount = { -1, -1, NULL };
 
   /* Always add _init_array and _fini_array as kernels.  */
   obstack_ptr_grow (_os, xstrdup ("_init_array"));
@@ -238,7 +238,12 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
   fn_count += 2;
 
   char buf[1000];
-  enum { IN_CODE, IN_AMD_KERNEL_CODE_T, IN_VARS, IN_FUNCS } state = IN_CODE;
+  enum
+{ IN_CODE,
+  IN_METADATA,
+  IN_VARS,
+  IN_FUNCS
+} state = IN_CODE;
   while (fgets (buf, sizeof (buf), in))
 {
   switch (state)
@@ -251,21 +256,25 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
obstack_grow (_os, , sizeof (dim));
dims_count++;
  }
-   else if (sscanf (buf, " .amdgpu_hsa_kernel %ms\n",
-&regcount.kernel_name) == 1)
- break;
 
break;
  }
-   case IN_AMD_KERNEL_CODE_T:
+   case IN_METADATA:
  {
-   gcc_assert (regcount.kernel_name);
-   if (sscanf (buf, " wavefront_sgpr_count = %d\n",
-   &regcount.sgpr_count) == 1)
+   if (sscanf (buf, " - .name: %ms\n", &regcount.kernel_name) == 1)
  break;
-   else if (sscanf (buf, " workitem_vgpr_count = %d\n",
+   else if (sscanf (buf, " .sgpr_count: %d\n",
+&regcount.sgpr_count) == 1)
+ {
+   gcc_assert (regcount.kernel_name);
+   break;
+ }
+   else if (sscanf (buf, " .vgpr_count: %d\n",
 &regcount.vgpr_count) == 1)
- break;
+ {
+   gcc_assert (regcount.kernel_name);
+   break;
+ }
 
break;
  }
@@ -306,9 +315,10 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
state = IN_VARS;
   else if (sscanf (buf, " .section .gnu.offload_funcs%c", ) > 0)
state = IN_FUNCS;
-  else if (sscanf (buf, " .amd_kernel_code_%c", ) > 0)
+  else if (sscanf (buf, " .amdgpu_metadata%c", ) > 0)
{
- state = IN_AMD_KERNEL_CODE_T;
+ state = IN_METADATA;
+ regcount.kernel_name = NULL;
  regcount.sgpr_count = regcount.vgpr_count = -1;
}
   else if (sscanf (buf, " .section %c", ) > 0
@@ -317,7 +327,7 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
   || sscanf (buf, " .data%c", ) > 0
   || sscanf (buf, " .ident %c", ) > 0)
state = IN_CODE;
-  else if (sscanf (buf, " .end_amd_kernel_code_%c", ) > 0)
+  else if (sscanf (buf, " .end_amdgpu_metadata%c", ) > 0)
{
  state = IN_CODE;
  gcc_assert (regcount.kernel_name != NULL
@@ -329,7 +339,7 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
  regcount.sgpr_count = regcount.vgpr_count = -1;
}
 
-  if (state == IN_CODE || state == IN_AMD_KERNEL_CODE_T)
+  if (state == IN_CODE || state == IN_METADATA)
fputs (buf, out);
 }
 
-- 
2.23.0

[PATCH] [og10] amdgcn: Add waitcnt after LDS write instructions

2020-06-29 Thread Julian Brown

Data-share write (ds_write) instructions do not necessarily complete
the write to LDS immediately. When a write completes, LGKM_CNT is
decremented. For now, we wait until LGKM_CNT reaches zero after each
ds_write instruction.

This fixes a race condition in the case where LDS is read immediately
after being written. This can happen with broadcast operations.

OK for og10 branch?

Julian

ChangeLog

gcc/
* config/gcn/gcn-valu.md (scatter_insn_1offset_ds):
Add waitcnt.
(*mov_insn, *movti_insn): Add waitcnt to ds_write alternatives.
---
 gcc/config/gcn/gcn-valu.md | 2 +-
 gcc/config/gcn/gcn.md  | 8 
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 6d7fecaa12c2..9dfaec1d0645 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -923,7 +923,7 @@
   {
 addr_space_t as = INTVAL (operands[3]);
 static char buf[200];
-sprintf (buf, "ds_write%%b2\t%%0, %%2 offset:%%1%s",
+sprintf (buf, "ds_write%%b2\t%%0, %%2 offset:%%1%s\;s_waitcnt\tlgkmcnt(0)",
 (AS_GDS_P (as) ? " gds" : ""));
 return buf;
   }
diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index 8cfb3a85d256..e58669240c67 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -554,7 +554,7 @@
   flat_load_dword\t%0, %A1%O1%g1\;s_waitcnt\t0
   flat_store_dword\t%A0, %1%O0%g0
   v_mov_b32\t%0, %1
-  ds_write_b32\t%A0, %1%O0
+  ds_write_b32\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0)
   ds_read_b32\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0)
   s_mov_b32\t%0, %1
   global_load_dword\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
@@ -582,7 +582,7 @@
   flat_load%o1\t%0, %A1%O1%g1\;s_waitcnt\t0
   flat_store%s0\t%A0, %1%O0%g0
   v_mov_b32\t%0, %1
-  ds_write%b0\t%A0, %1%O0
+  ds_write%b0\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0)
   ds_read%u1\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0)
   global_load%o1\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
   global_store%s0\t%A0, %1%O0%g0"
@@ -611,7 +611,7 @@
   #
   flat_load_dwordx2\t%0, %A1%O1%g1\;s_waitcnt\t0
   flat_store_dwordx2\t%A0, %1%O0%g0
-  ds_write_b64\t%A0, %1%O0
+  ds_write_b64\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0)
   ds_read_b64\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0)
   global_load_dwordx2\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
   global_store_dwordx2\t%A0, %1%O0%g0"
@@ -667,7 +667,7 @@
   #
   global_store_dwordx4\t%A0, %1%O0%g0
   global_load_dwordx4\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
-  ds_write_b128\t%A0, %1%O0
+  ds_write_b128\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0)
   ds_read_b128\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0)"
   "reload_completed
&& REG_P (operands[0])
-- 
2.23.0

[PATCH] [og10] OpenACC: Remove unnecessary barriers (gimple worker partitioning/broadcast)

2020-06-29 Thread Julian Brown

This is an optimisation for middle-end worker-partitioning support (used
to support multiple workers on AMD GCN).  At present, barriers may be
emitted in cases where they aren't needed and cannot be optimised away.
This patch stops the extraneous barriers from being emitted in the
first place.

One exception to the above (where the barrier is still needed) is for
predicated blocks of code that perform a write to gang-private shared
memory from one worker.  We must execute a barrier before other workers
read that shared memory location.

OK for og10 branch?

Julian

ChangeLog

gcc/
* config/gcn/gcn.c (gimple.h): Include.
(gcn_fork_join): Emit barrier for worker-level joins.
* omp-sese.c (find_local_vars_to_propagate): Add writes_gangprivate
bitmap parameter. Set bit for blocks containing gang-private variable
writes.
(worker_single_simple): Don't emit barrier after predicated block.
(worker_single_copy): Don't emit barrier if we're not broadcasting
anything and the block contains no gang-private writes.
(neuter_worker_single): Don't predicate blocks that only contain NOPs
or internal marker functions.  Pass has_gangprivate_write argument to
worker_single_copy.
(oacc_do_neutering): Add writes_gangprivate bitmap handling.
---
 gcc/config/gcn/gcn.c |   9 +++-
 gcc/omp-sese.c   | 115 +--
 2 files changed, 97 insertions(+), 27 deletions(-)

diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index bf996b461547..35b2ef5e752b 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -50,6 +50,7 @@
 #include "varasm.h"
 #include "intl.h"
 #include "rtl-iter.h"
+#include "gimple.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -4898,9 +4899,15 @@ gcn_oacc_dim_pos (int dim)
 /* Implement TARGET_GOACC_FORK_JOIN.  */
 
 static bool
-gcn_fork_join (gcall *ARG_UNUSED (call), const int *ARG_UNUSED (dims),
+gcn_fork_join (gcall *call, const int *ARG_UNUSED (dims),
   bool ARG_UNUSED (is_fork))
 {
+  tree arg = gimple_call_arg (call, 2);
+  unsigned axis = TREE_INT_CST_LOW (arg);
+
+  if (!is_fork && axis == GOMP_DIM_WORKER && dims[axis] != 1)
+return true;
+
   return false;
 }
 
diff --git a/gcc/omp-sese.c b/gcc/omp-sese.c
index 4dd3417066c6..80697358efec 100644
--- a/gcc/omp-sese.c
+++ b/gcc/omp-sese.c
@@ -768,16 +768,19 @@ static void
 find_local_vars_to_propagate (parallel_g *par, unsigned outer_mask,
  hash_set *partitioned_var_uses,
  hash_set *gangprivate_vars,
+ bitmap writes_gangprivate,
  vec *prop_set)
 {
   unsigned mask = outer_mask | par->mask;
 
   if (par->inner)
 find_local_vars_to_propagate (par->inner, mask, partitioned_var_uses,
- gangprivate_vars, prop_set);
+ gangprivate_vars, writes_gangprivate,
+ prop_set);
   if (par->next)
 find_local_vars_to_propagate (par->next, outer_mask, partitioned_var_uses,
- gangprivate_vars, prop_set);
+ gangprivate_vars, writes_gangprivate,
+ prop_set);
 
   if (!(mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)))
 {
@@ -798,8 +801,7 @@ find_local_vars_to_propagate (parallel_g *par, unsigned 
outer_mask,
  if (!VAR_P (var)
  || is_global_var (var)
  || AGGREGATE_TYPE_P (TREE_TYPE (var))
- || !partitioned_var_uses->contains (var)
- || gangprivate_vars->contains (var))
+ || !partitioned_var_uses->contains (var))
continue;
 
  if (stmt_may_clobber_ref_p (stmt, var))
@@ -813,6 +815,14 @@ find_local_vars_to_propagate (parallel_g *par, unsigned 
outer_mask,
  fprintf (dump_file, "\n");
}
 
+ if (gangprivate_vars->contains (var))
+   {
+ /* If we write a gang-private variable, we want a
+barrier at the end of the block.  */
+ bitmap_set_bit (writes_gangprivate, block->index);
+ continue;
+   }
+
  if (!(*prop_set)[block->index])
(*prop_set)[block->index] = new propagation_set;
 
@@ -924,14 +934,6 @@ worker_single_simple (basic_block from, basic_block to,
}
}
 }
-
-  gsi = gsi_start_bb (skip_block);
-
-  decl = builtin_decl_explicit (BUILT_IN_GOACC_BARRIER);
-  gimple *acc_bar = gimple_build_call (decl, 0);
-
-  gsi_insert_before (&gsi, acc_bar, GSI_SAME_STMT);
-  update_stmt (acc_bar);
 }
 
 /* This is a copied and renamed omp-low.c:omp_build_component_ref.  */

Re: [PATCH v2] arm: Warn if IRQ handler is not compiled with -mgeneral-regs-only [PR target/94743]

2020-06-29 Thread Christophe Lyon via Gcc-patches

Ping?

On Tue, 9 Jun 2020 at 11:48, Christophe Lyon wrote:
>
> Ping?
>
> Maybe I could mention that LLVM emits a warning in this case
> (https://reviews.llvm.org/D28820).
>
> Thanks,
>
> Christophe
>
>
> On Wed, 3 Jun 2020 at 15:23, Christophe Lyon  
> wrote:
> >
> > Ping?
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545747.html
> >
> > On Wed, 27 May 2020 at 13:52, Christophe Lyon
> >  wrote:
> > >
> > > Ping?
> > >
> > > On Thu, 14 May 2020 at 16:57, Christophe Lyon
> > >  wrote:
> > > >
> > > > The interrupt attribute does not guarantee that the FP registers are
> > > > saved, which can result in problems difficult to debug.
> > > >
> > > > Saving the FP registers and status registers can be a large penalty,
> > > > so it's probably not desirable to do that all the time.
> > > >
> > > > If the handler calls other functions, we'd likely need to save all of
> > > > them, for lack of knowledge of which registers they actually clobber.
> > > >
> > > > This is even more obscure for the end-user when the compiler inserts
> > > > calls to helper functions such as memcpy (some multilibs do use FP
> > > > registers to speed it up).
> > > >
> > > > In the PR, we discussed adding routines in libgcc to save the FP
> > > > context and saving only locally-clobbered FP registers, but this seems
> > > > to be too much work for the purpose, given that in general such
> > > > handlers try to avoid this kind of penalty.
> > > > I suspect we would also want new attributes to instruct the compiler
> > > > that saving the FP context is not needed.
> > > >
> > > > > In the meantime, emit a warning to suggest re-compiling with
> > > > -mgeneral-regs-only. Note that this can lead to errors if the code
> > > > uses floating-point and -mfloat-abi=hard, eg:
> > > > argument of type 'double' not permitted with -mgeneral-regs-only
> > > >
> > > > This can be troublesome for the user, but at least this would make
> > > > him aware of the latent issue.
> > > >
> > > > The patch adds several testcases:
> > > >
> > > > > - pr94743-1-hard.c checks that a warning is emitted when using
> > > > >   -mfloat-abi=hard. Function IRQ_HDLR_Test can make implicit calls to
> > > > >   runtime floating-point routines (or direct use of FP instructions),
> > > > >   IRQ_HDLR_Test2 doesn't. We emit a warning in both cases, though.
> > > > >
> > > > > - pr94743-1-softfp.c: same as above with -mfloat-abi=softfp.
> > > > >
> > > > > - pr94743-1-soft.c checks that no warning is emitted when using
> > > > >   -mfloat-abi=soft with the same code as above.
> > > > >
> > > > > - pr94743-2.c checks that no warning is emitted when using
> > > > >   -mgeneral-regs-only.
> > > > >
> > > > > - pr94743-3.c checks that no warning is emitted when using
> > > > >   -mgeneral-regs-only even when using floating-point data.
> > > >
> > > > 2020-05-14  Christophe Lyon  
> > > >
> > > > PR target/94743
> > > > gcc/
> > > > * config/arm/arm.c (arm_handle_isr_attribute): Warn if
> > > > -mgeneral-regs-only is not used.
> > > >
> > > > gcc/testsuite/
> > > > * gcc.misc-tests/arm-isr.c: Add -mgeneral-regs-only.
> > > > * gcc.target/arm/empty_fiq_handler.c: Add -mgeneral-regs-only.
> > > > * gcc.target/arm/interrupt-1.c: Add -mgeneral-regs-only.
> > > > * gcc.target/arm/interrupt-2.c: Add -mgeneral-regs-only.
> > > > * gcc.target/arm/pr70830.c: Add -mgeneral-regs-only.
> > > > * gcc.target/arm/pr94743-1-hard.c: New test.
> > > > * gcc.target/arm/pr94743-1-soft.c: New test.
> > > > * gcc.target/arm/pr94743-1-softfp.c: New test.
> > > > * gcc.target/arm/pr94743-2.c: New test.
> > > > * gcc.target/arm/pr94743-3.c: New test.
> > > > ---
> > > >  gcc/config/arm/arm.c |  5 
> > > >  gcc/testsuite/gcc.misc-tests/arm-isr.c   |  2 ++
> > > >  gcc/testsuite/gcc.target/arm/empty_fiq_handler.c |  1 +
> > > >  gcc/testsuite/gcc.target/arm/interrupt-1.c   |  2 +-
> > > >  gcc/testsuite/gcc.target/arm/interrupt-2.c   |  2 +-
> > > >  gcc/testsuite/gcc.target/arm/pr70830.c   |  2 +-
> > > >  gcc/testsuite/gcc.target/arm/pr94743-1-hard.c| 29 
> > > > 
> > > >  gcc/testsuite/gcc.target/arm/pr94743-1-soft.c| 27 
> > > > ++
> > > >  gcc/testsuite/gcc.target/arm/pr94743-1-softfp.c  | 29 
> > > > 
> > > >  gcc/testsuite/gcc.target/arm/pr94743-2.c | 22 
> > > > ++
> > > >  gcc/testsuite/gcc.target/arm/pr94743-3.c | 23 
> > > > +++
> > > >  11 files changed, 141 insertions(+), 3 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.target/arm/pr94743-1-hard.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/arm/pr94743-1-soft.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/arm/pr94743-1-softfp.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/arm/pr94743-2.c
> > > >  create mode 100644

[PATCH] [8/9/10/11 Regression] PR fortran/93423 - ICE on invalid with argument list for module procedure

2020-06-29 Thread Harald Anlauf

Early ping (might make it into 10.2).

> A simple situation where a NULL pointer dereference occurs during error 
> recovery.
>
> Regtested on x86_64-pc-linux-gnu.
>
> OK for master / backports?
>
> Thanks,
> Harald
>
>
> PR fortran/93423 - ICE on invalid with argument list for module procedure
>
> When recovering from an error, a NULL pointer dereference could occur.
> Check for that situation and punt.
>
> gcc/fortran/
>   PR fortran/93423
>   * resolve.c (resolve_symbol): Avoid NULL pointer dereference.
>

Thanks,
Harald

[PATCH] [9/10/11 Regression] PR fortran/93337 - ICE in gfc_dt_upper_string, at fortran/module.c:441

2020-06-29 Thread Harald Anlauf

Early ping (might make it into 10.2.1).

> This PR is due to a plain NULL pointer that needs to get caught in the right 
> place.
>
> Regtested on x86_64-pc-linux-gnu.
>
> OK for master / backports where applicable?
>
> Thanks,
> Harald
>
>
> PR fortran/93337 - ICE in gfc_dt_upper_string, at fortran/module.c:441
>
> When declaring a polymorphic variable that is not a dummy, allocatable or
> pointer, an ICE occurred due to a NULL pointer dereference.  Check for
> that situation and punt.
>
> gcc/fortran/
>   PR fortran/93337
>   * class.c (gfc_find_derived_vtab): Punt if name is not set.
>
>

Thanks,
Harald

Re: [PATCH 2/7] PowerPC tests: Add PLI/PADDI tests.

2020-06-29 Thread Segher Boessenkool

Hi!

On Mon, Jun 29, 2020 at 02:23:22PM -0400, Michael Meissner wrote:
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/prefix-di-constant.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_prefixed_addr } */
> +/* { dg-require-effective-target lp64 } */

Please always say (_in the test_) why something is required, if it isn't
obvious.

> +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
> +
> +/* Test that PLI (PADDI) is generated to load a large constant.  */
> +unsigned long long
> +large (void)
> +{
> +  return 0x12345678ULL;
> +}
> +
> +/* { dg-final { scan-assembler {\mpli\M} } } */

I have no idea why 64-bit mode (or 64-bit addressing) is needed here.
*Is* it needed?

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/prefix-ds-dq.c
> @@ -0,0 +1,161 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_prefixed_addr } */
> +/* { dg-require-effective-target lp64 } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */

> +  unsigned int si;   /* offset 49 bytes.  */

> +int
> +load_si (struct packed_struct *p)
> +{
> +  return p->si;  /* PLWA 3,49(3).  */
> +}

Here it is because this would be just lwz on 32-bit.

But that is the only difference, so you could just make that single test
conditional, not the whole file.
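
A minimal sketch of what restricting only that one check to 64-bit might look like, assuming the usual DejaGnu target selector on dg-final.  The struct below is a hypothetical stand-in for the one in prefix-ds-dq.c (only the 49-byte offset of si comes from the quoted test), and the scan pattern is an assumption rather than the committed test:

```cpp
#include <cassert>
#include <cstddef>

/* Hypothetical stand-in for the packed struct in the test; only the
   offset of SI (49 bytes) is taken from the quoted patch.  */
struct __attribute__ ((packed)) packed_struct
{
  char pad[49];
  unsigned int si;	/* offset 49 bytes.  */
};

static_assert (offsetof (packed_struct, si) == 49,
	       "si is expected at byte offset 49");

int
load_si (struct packed_struct *p)
{
  return p->si;
}

/* Expect PLWA only on 64-bit targets, so the rest of the file would not
   need dg-require-effective-target lp64.  */
/* { dg-final { scan-assembler {\mplwa\M} { target lp64 } } } */
```

With a selector on the individual scan, the remaining checks in the file would still run on 32-bit targets.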

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/prefix-no-update.c
> @@ -0,0 +1,51 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_prefixed_addr } */
> +/* { dg-require-effective-target lp64 } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */

For this testcase, I have no idea at all why you want lp64?

Thanks,


Segher

Re: [PATCH 2/7] PowerPC tests: Add PLI/PADDI tests.

2020-06-29 Thread Michael Meissner via Gcc-patches

From 212475e5757fe3335cba30c9c3eec1707ac0c271 Mon Sep 17 00:00:00 2001
From: Michael Meissner 
Date: Sat, 27 Jun 2020 00:40:48 -0500
Subject: [PATCH, committed] Add PowerPC tests for power10.

2020-06-27  Michael Meissner  

* gcc.target/powerpc/prefix-add.c: New test.
* gcc.target/powerpc/prefix-si-constant.c: New test.
* gcc.target/powerpc/prefix-di-constant.c: New test.
* gcc.target/powerpc/prefix-ds-dq.c: New test.
* gcc.target/powerpc/prefix-no-update.c: New test.
* gcc.target/powerpc/prefix-large-dd.c: New test.
* gcc.target/powerpc/prefix-large-df.c: New test.
* gcc.target/powerpc/prefix-large-di.c: New test.
* gcc.target/powerpc/prefix-large-hi.c: New test.
* gcc.target/powerpc/prefix-large-kf.c: New test.
* gcc.target/powerpc/prefix-large-qi.c: New test.
* gcc.target/powerpc/prefix-large-sd.c: New test.
* gcc.target/powerpc/prefix-large-sf.c: New test.
* gcc.target/powerpc/prefix-large-si.c: New test.
* gcc.target/powerpc/prefix-large-udi.c: New test.
* gcc.target/powerpc/prefix-large-uhi.c: New test.
* gcc.target/powerpc/prefix-large-uqi.c: New test.
* gcc.target/powerpc/prefix-large-usi.c: New test.
* gcc.target/powerpc/prefix-large-v2df.c: New test.
* gcc.target/powerpc/prefix-large.h: Include file for new tests.
* gcc.target/powerpc/prefix-pcrel-dd.c: New test.
* gcc.target/powerpc/prefix-pcrel-df.c: New test.
* gcc.target/powerpc/prefix-pcrel-di.c: New test.
* gcc.target/powerpc/prefix-pcrel-hi.c: New test.
* gcc.target/powerpc/prefix-pcrel-kf.c: New test.
* gcc.target/powerpc/prefix-pcrel-qi.c: New test.
* gcc.target/powerpc/prefix-pcrel-sd.c: New test.
* gcc.target/powerpc/prefix-pcrel-sf.c: New test.
* gcc.target/powerpc/prefix-pcrel-si.c: New test.
* gcc.target/powerpc/prefix-pcrel-udi.c: New test.
* gcc.target/powerpc/prefix-pcrel-uhi.c: New test.
* gcc.target/powerpc/prefix-pcrel-uqi.c: New test.
* gcc.target/powerpc/prefix-pcrel-usi.c: New test.
* gcc.target/powerpc/prefix-pcrel-v2df.c: New test.
* gcc.target/powerpc/prefix-pcrel.h: Include file for new tests.
* gcc.target/powerpc/prefix-stack-protect.c: New test.
---
 gcc/testsuite/gcc.target/powerpc/prefix-add.c |  14 ++
 .../gcc.target/powerpc/prefix-di-constant.c   |  13 ++
 .../gcc.target/powerpc/prefix-ds-dq.c | 161 ++
 .../gcc.target/powerpc/prefix-large-dd.c  |  13 ++
 .../gcc.target/powerpc/prefix-large-df.c  |  13 ++
 .../gcc.target/powerpc/prefix-large-di.c  |  14 ++
 .../gcc.target/powerpc/prefix-large-hi.c  |  13 ++
 .../gcc.target/powerpc/prefix-large-kf.c  |  13 ++
 .../gcc.target/powerpc/prefix-large-qi.c  |  13 ++
 .../gcc.target/powerpc/prefix-large-sd.c  |  19 +++
 .../gcc.target/powerpc/prefix-large-sf.c  |  13 ++
 .../gcc.target/powerpc/prefix-large-si.c  |  13 ++
 .../gcc.target/powerpc/prefix-large-udi.c |  14 ++
 .../gcc.target/powerpc/prefix-large-uhi.c |  13 ++
 .../gcc.target/powerpc/prefix-large-uqi.c |  13 ++
 .../gcc.target/powerpc/prefix-large-usi.c |  13 ++
 .../gcc.target/powerpc/prefix-large-v2df.c|  13 ++
 .../gcc.target/powerpc/prefix-large.h |  40 +
 .../gcc.target/powerpc/prefix-no-update.c |  51 ++
 .../gcc.target/powerpc/prefix-pcrel-dd.c  |  13 ++
 .../gcc.target/powerpc/prefix-pcrel-df.c  |  13 ++
 .../gcc.target/powerpc/prefix-pcrel-di.c  |  14 ++
 .../gcc.target/powerpc/prefix-pcrel-hi.c  |  13 ++
 .../gcc.target/powerpc/prefix-pcrel-kf.c  |  13 ++
 .../gcc.target/powerpc/prefix-pcrel-qi.c  |  13 ++
 .../gcc.target/powerpc/prefix-pcrel-sd.c  |  15 ++
 .../gcc.target/powerpc/prefix-pcrel-sf.c  |  13 ++
 .../gcc.target/powerpc/prefix-pcrel-si.c  |  13 ++
 .../gcc.target/powerpc/prefix-pcrel-udi.c |  14 ++
 .../gcc.target/powerpc/prefix-pcrel-uhi.c |  13 ++
 .../gcc.target/powerpc/prefix-pcrel-uqi.c |  13 ++
 .../gcc.target/powerpc/prefix-pcrel-usi.c |  13 ++
 .../gcc.target/powerpc/prefix-pcrel-v2df.c|  13 ++
 .../gcc.target/powerpc/prefix-pcrel.h |  41 +
 .../gcc.target/powerpc/prefix-si-constant.c   |  12 ++
 .../gcc.target/powerpc/prefix-stack-protect.c |  21 +++
 36 files changed, 729 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-add.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-di-constant.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-ds-dq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-large-dd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-large-df.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-large-di.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/prefix-large-hi.c
 create mode 100644

Re: PSA: Default C++ dialect is now C++17

2020-06-29 Thread Martin Liška


On 6/29/20 9:51 AM, Martin Liška wrote:
> On 6/26/20 9:34 PM, Marek Polacek via Gcc-patches wrote:
> > As discussed last month:
> >
> > it's time to change the C++ default to gnu++17.  I've committed the patch after
> > testing x86_64-pc-linux-gnu and powerpc64le-unknown-linux-gnu.  Brace yourselves!
> >
> > Marek
>
> Just a small note that 510.parest_r SPEC 2017 benchmark can't be built now
> with default changed to -std=c++17. The spec config needs to be adjusted.
>
> Martin


And there is another failure, in 520.omnetpp_r, caused by a run-time error:
 Error during startup: Register_Function() or cMathFunction: attempt to register function "SPEC_HYPOT" with wrong number of arguments 2, should be 3.

This is about the call of std::__hypot3: since C++17, std::hypot also has a
three-argument overload:
https://en.cppreference.com/w/cpp/numeric/math/hypot
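
For illustration, a minimal sketch of the two std::hypot forms.  The wrapper names hypot2 and hypot3 are hypothetical, and how 520.omnetpp_r actually ends up registering the three-argument variant is not shown in this thread:

```cpp
#include <cassert>
#include <cmath>

// Two-argument form, available since C++11.
double
hypot2 (double x, double y)
{
  return std::hypot (x, y);
}

#if __cplusplus >= 201703L
// Three-argument form, only declared when compiling as C++17 or later.
double
hypot3 (double x, double y, double z)
{
  return std::hypot (x, y, z);
}
#endif
```

Code that picks a hypot overload implicitly may therefore see a different arity once the default dialect moves to gnu++17.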

Martin

Re: [PATCH 2/7 v5] rs6000: lenload/lenstore optab support

2020-06-29 Thread Segher Boessenkool

Hi Kewen,

On Mon, Jun 29, 2020 at 02:32:47PM +0800, Kewen.Lin wrote:
> V5: Like V4.

It is still okay for trunk, like before :-)


Segher

Re: PSA: Default C++ dialect is now C++17

2020-06-29 Thread Martin Liška


On 6/29/20 4:57 PM, Marek Polacek wrote:
> On Mon, Jun 29, 2020 at 09:51:57AM +0200, Martin Liška wrote:
> > On 6/26/20 9:34 PM, Marek Polacek via Gcc-patches wrote:
> > > As discussed last month:
> > >
> > > it's time to change the C++ default to gnu++17.  I've committed the patch after
> > > testing x86_64-pc-linux-gnu and powerpc64le-unknown-linux-gnu.  Brace yourselves!
> > >
> > > Marek
> >
> > Just a small note that 510.parest_r SPEC 2017 benchmark can't be built now
> > with default changed to -std=c++17. The spec config needs to be adjusted.
>
> Interesting, do you know why?  Does it use the register keyword?


Apparently it needs -fno-new-ttp-matching for successful compilation.
There's a reduced test-case I made:

cat fe.ii
template  class FiniteElement;
template  class DoFHandler;
class FETools {
  template 
  void back_interpolate(const DoFHandler &, const InVector &,
const FiniteElement &, OutVector &);
  template  class DH, class InVector, class OutVector,
int spacedim>
  void back_interpolate(const DH &, InVector,
const FiniteElement &, OutVector);
};
template  class DoFHandler;
template  class FiniteElement;
template 
void FETools::back_interpolate(const DoFHandler &,
   const InVector &,
   const FiniteElement &,
   OutVector &) {}
template void FETools::back_interpolate(const DoFHandler<3> &, const float &,
const FiniteElement<3> &, float &);

Martin



Marek

[PATCH] nvptx: Fix ICE in nvptx_vector_alignment on gcc.dg/attr-vector_size.c

2020-06-29 Thread Roger Sayle


This patch addresses the ICE in gcc.dg/attr-vector_size.c during make -k check
on nvptx-none.  The actual ICE looks like:

testsuite/gcc.dg/attr-vector_size.c:29:1: internal compiler error: in tree_to_shwi, at tree.c:7321
0xf53bf2 tree_to_shwi(tree_node const*)
        ../../gcc/gcc/tree.c:7321
0xff1969 nvptx_vector_alignment
        ../../gcc/gcc/config/nvptx/nvptx.c:5105

The problem is that the caller has ensured that TYPE_SIZE (type) is
representable as an unsigned HOST_WIDE_INT, but nvptx_vector_alignment is
accessing it as a signed HOST_WIDE_INT, which overflows in pathological
conditions.  Amongst those pathological conditions is that a TYPE_SIZE of zero
can sometimes reach this function, prior to an error being emitted.  Making
sure the result is not less than the mode's alignment and not greater than
BIGGEST_ALIGNMENT fixes the ICEs, and generates the expected compile-time
error messages.
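
The clamping described above can be sketched outside of GCC as follows.  This is only an illustration: kBiggestAlignment and clamp_vector_alignment are hypothetical names, the value 64 is made up, and plain 64-bit integers stand in for HOST_WIDE_INT and the tree accessors:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for BIGGEST_ALIGNMENT (value chosen for the example).
constexpr std::uint64_t kBiggestAlignment = 64;

// Treat the type size as unsigned, cap it at the biggest alignment, and
// never return less than the mode's alignment (covers a size of zero).
std::uint64_t
clamp_vector_alignment (std::uint64_t type_size_bits,
			std::uint64_t mode_align_bits)
{
  if (type_size_bits > kBiggestAlignment)
    return kBiggestAlignment;
  return std::max (type_size_bits, mode_align_bits);
}
```

A size that does not fit a signed 64-bit value is simply capped here, and a size of zero falls back to the mode's alignment.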

Tested on --target=nvptx-none, with a "make" and "make check" which results in
four fewer unexpected failures and three more expected passes.
Ok for mainline?


2020-06-29  Roger Sayle  

gcc/ChangeLog:
* config/nvptx/nvptx.c (nvptx_vector_alignment): Use tree_to_uhwi
to access TYPE_SIZE (type).  Return at least the mode's alignment.

Thanks,
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index e3e84df..bfad91b 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -5102,9 +5102,10 @@ static const struct attribute_spec nvptx_attribute_table[] =
 static HOST_WIDE_INT
 nvptx_vector_alignment (const_tree type)
 {
-  HOST_WIDE_INT align = tree_to_shwi (TYPE_SIZE (type));
-
-  return MIN (align, BIGGEST_ALIGNMENT);
+  unsigned HOST_WIDE_INT align = tree_to_uhwi (TYPE_SIZE (type));
+  if (align > BIGGEST_ALIGNMENT)
+    return BIGGEST_ALIGNMENT;
+  return MAX (align, GET_MODE_ALIGNMENT (TYPE_MODE (type)));
 }
 
 /* Indicate that INSN cannot be duplicated.   */

[committed] middle-end: Optimize (A&C)^(B&C) to (A^B)&C in simplify_rtx (take 3).

2020-06-29 Thread Roger Sayle


Hi Richard,

Many thanks for the review.  Many thanks also to Hans-Peter Nilsson, Joseph
Myers and overse...@gcc.gnu.org for helping get my ssh keys updated.  I've
taken the opportunity of committing this patch to check that everything is
working.  For the record, here's the final version as committed.  I've added
the (xor (ashiftrt x c) (ashiftrt y c)) case as per your suggestion, which
fires 6 times during make -k check on x86_64-pc-linux-gnu.

Cheers,
Roger
--

-Original Message-
From: Richard Sandiford  
Sent: 22 June 2020 20:41
To: Roger Sayle 
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH take 2] middle-end: Optimize (A&C)^(B&C) to (A^B)&C in simplify_rtx.

Hi Roger,

Thanks for the update and sorry for the slow reply.

"Roger Sayle"  writes:
> As suggested by Richard Sandiford, this patch splits out the previous 
> RTL simplification, of (X&C)^(Y&C) to (X^Y)&C, to its own function, 
> simplify_distributive_operation, and calls it when appropriate for 
> IOR, XOR and AND.
>
> Instrumenting a bootstrap reveals this optimization triggers
> 393358 times during stage2, stage3 and building the libraries, and a 
> further 263659 times during make check.  By order of occurrence the 
> RTL transformation opportunities are:
>
>  284447 01 and ior
>  156798 00 xor and
>  131110 11 and ior
>   47253 00 and ior
>   28035 00 ior and
>2804 01 ior and
>2698 11 ior and
>2262 01 xor and
> 602 11 xor and
> 312 00 xor xor
> 298 00 xor rotate
> 120 00 and ashift
> 108 00 ior lshiftrt
>  60 00 and ashiftrt
>  54 00 ior ashift
>  18 00 and lshiftrt
>  12 10 xor and
>  12 00 xor rotatert
>   8 00 xor lshiftrt
>   4 10 ior and
>   2 00 ior ashiftrt

That's an impressive number of hits :-)

> where the two binary digits denote the surviving inner unique 
> operands, so "00 xor and" corresponds to the original (xor (and X Z) 
> (and Y Z)), and "01 and ior" corresponds to (and (ior X Y) (ior Y Z)).
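
The identities behind a few of these table entries can be checked with ordinary integer arithmetic.  The sketch below is only an illustration: the function names are hypothetical, and the simplified form given for "01 and ior" is the standard Boolean identity rather than something quoted from the patch:

```cpp
#include <cassert>
#include <cstdint>

// "00 xor and": (xor (and X Z) (and Y Z)) == (and (xor X Y) Z).
constexpr bool
xor_of_ands_ok (std::uint32_t x, std::uint32_t y, std::uint32_t z)
{
  return ((x & z) ^ (y & z)) == ((x ^ y) & z);
}

// "01 and ior": (and (ior X Y) (ior Y Z)) == (ior (and X Z) Y).
constexpr bool
and_of_iors_ok (std::uint32_t x, std::uint32_t y, std::uint32_t z)
{
  return ((x | y) & (y | z)) == ((x & z) | y);
}

// "00 ior lshiftrt": (ior (lshiftrt X C) (lshiftrt Y C))
// == (lshiftrt (ior X Y) C), for a shift count C in [0, 31].
constexpr bool
ior_of_shifts_ok (std::uint32_t x, std::uint32_t y, unsigned c)
{
  return ((x >> c) | (y >> c)) == ((x | y) >> c);
}

static_assert (xor_of_ands_ok (0xF0F0u, 0x3C3Cu, 0x0FF0u), "xor/and");
static_assert (and_of_iors_ok (0xF0F0u, 0x3C3Cu, 0x0FF0u), "and/ior");
static_assert (ior_of_shifts_ok (0xF0F0u, 0x3C3Cu, 4), "ior/lshiftrt");
```

In each case the right-hand side uses one operation fewer than the left-hand side.
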
>
> Many thanks also to Richard for pointing out simplify_rtx_c_tests, the 
> self-testing framework in simplify-rtx.c, which is new since my day.
> This patch supplements the existing vector mode testing, with a suite 
> of scalar integer mode tests that confirm that many of the expected 
> integer simplifications in simplify-rtx are being applied as expected.
> This includes three tests of the new simplify_distributive_operation.
>
> Before:
> xgcc -xc -fself-test: 59693 pass(es) in 0.820956 seconds
> xgcc -xc++ -fself-test: 59723 pass(es) in 0.786662 seconds
> After:
> xgcc -xc -fself-test: 60003 pass(es) in 0.860637 seconds
> xgcc -xc++ -fself-test: 60033 pass(es) in 0.794624 seconds
>
>
> I do have one thought/suggestion around test_scalar_ops for future 
> generations.  These tests are extremely strict; instead of an 
> unexpected failure in the testsuite, breaking a self-test stops the 
> build.  Instead of reverting this patch, should anything go wrong (in 
> future on a misbehaving platform), might I instead propose simply 
> commenting out the call to test_scalar_ops in simplify_rtx_c_tests as 
> a mitigation strategy whilst the build is restored.  In fact, removing 
> the "static" from test_scalar_ops would avoid the "defined but not 
> used" complication from this disaster recovery plan.

Yeah, we can work around it rather than revert the patch.

> This patch has been tested with "make bootstrap" and "make -k check"
> on x86_64-pc-linux-gnu with no regressions.
>
>
> 2020-06-16  Roger Sayle  
> Richard Sandiford  

Thanks for the gesture, but I don't think I should be co-author here.
I didn't write anything :-)

> […]
> @@ -3064,6 +3112,21 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
>   }
>   }
>  
> +  /* Convert (ior (and A C) (and B C)) into (and (ior A B) C).  */
> +  if (GET_CODE (op0) == GET_CODE (op1)
> +   && (GET_CODE (op0) == AND
> +   || GET_CODE (op0) == IOR
> +   || GET_CODE (op0) == LSHIFTRT
> +   || GET_CODE (op0) == ASHIFTRT
> +   || GET_CODE (op0) == ASHIFT
> +   || GET_CODE (op0) == ROTATE
> +   || GET_CODE (op0) == ROTATERT))
> + {
> +   tem = simplify_distributive_operation (code, mode, op0, op1);
> +   if (tem)
> + return tem;
> + }
> +
>tem = simplify_byte_swapping_operation (code, mode, op0, op1);
>if (tem)
>   return tem;
> @@ -3302,6 +3365,20 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
> && (reversed = reversed_comparison (op0, int_mode)))
>   return reversed;
>  
> +  /* Convert (xor (and A C) (and B C)) into (and (xor A B) C).  */
> +  if (GET_CODE (op0) == GET_CODE (op1)
> +   && (GET_CODE (op0) == AND
> +   || GET_CODE (op0) == XOR
> +   || GET_CODE (op0) == LSHIFTRT
> +   || GET_CODE (op0) == ASHIFT
> +   || GET_CODE (op0) == ROTATE
> +   || GET_CODE (op0) == ROTATERT))
> +

Re: [PATCH] underline null argument in -Wnonnull (PR c++/86568)

2020-06-29 Thread Martin Sebor via Gcc-patches


On 6/29/20 3:24 AM, Andreas Schwab wrote:

This breaks bootstrap:

In static member function 'static Expression* Type::type_descriptor(Gogo*, Type*)',
 inlined from 'virtual Expression* Named_type::do_type_descriptor(Gogo*, Named_type*)' at ../../gcc/go/gofrontend/types.cc:4:53,
 inlined from 'virtual Expression* Named_type::do_type_descriptor(Gogo*, Named_type*)' at ../../gcc/go/gofrontend/types.cc:11105:1:
../../gcc/go/gofrontend/types.cc:1474:34: error: 'this' pointer null [-Werror=nonnull]
  1474 |   return type->do_type_descriptor(gogo, NULL);
   |  ^~~~
cc1plus: all warnings being treated as errors
make[3]: *** [../../gcc/go/Make-lang.in:242: go/types.o] Error 1


I opened pr95970.

I don't build Go anymore because of PR 91992.

Martin

Re: [PATCH] c++: Make convert_like complain about bad ck_ref_bind again [PR95789]

2020-06-29 Thread Marek Polacek via Gcc-patches

Ping.

On Mon, Jun 22, 2020 at 10:09:27PM -0400, Marek Polacek via Gcc-patches wrote:
> convert_like issues errors about bad_p conversions at the beginning
> of the function, but in the ck_ref_bind case, it only issues them
> after we've called convert_like on the next conversion.
> 
> This doesn't work as expected since r10-7096 because when we see
> a conversion from/to class type in a template, we return early, thereby
> missing the error, and a bad_p conversion goes by undetected.  That
> made the attached test compile even though it should not.
> 
> I had thought that I could just move the ck_ref_bind/bad_p errors
> above to the rest of them, but that regressed diagnostics because
> expr then wasn't converted yet by the nested convert_like_real call.
> 
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk and 10?
> 
> gcc/cp/ChangeLog:
> 
>   PR c++/95789
>   * call.c (convert_like_real): Do the normal processing for
>   ck_ref_bind conversion that are bad_p.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR c++/95789
>   * g++.dg/conversion/ref4.C: New test.
> ---
>  gcc/cp/call.c  |  9 -
>  gcc/testsuite/g++.dg/conversion/ref4.C | 22 ++
>  2 files changed, 30 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/conversion/ref4.C
> 
> diff --git a/gcc/cp/call.c b/gcc/cp/call.c
> index 2b39a3700fc..7b16895d5db 100644
> --- a/gcc/cp/call.c
> +++ b/gcc/cp/call.c
> @@ -7402,7 +7402,14 @@ convert_like_real (conversion *convs, tree expr, tree fn, int argnum,
>   function.  */
>if (processing_template_decl
>&& convs->kind != ck_identity
> -  && (CLASS_TYPE_P (totype) || CLASS_TYPE_P (TREE_TYPE (expr))))
> +  && (CLASS_TYPE_P (totype) || CLASS_TYPE_P (TREE_TYPE (expr)))
> +  /* Do the normal processing to give the bad_p errors in ck_ref_bind
> +  to avoid losing the fact that this conversion was bad.  Since we
> +  are going to return error_mark_node, we don't care about trees
> +  breaking in templates.  */
> +  && !(convs->kind == ck_ref_bind
> +&& convs->bad_p
> +&& !next_conversion (convs)->bad_p))
>  {
>expr = build1 (IMPLICIT_CONV_EXPR, totype, expr);
>return convs->kind == ck_ref_bind ? expr : convert_from_reference (expr);
> diff --git a/gcc/testsuite/g++.dg/conversion/ref4.C b/gcc/testsuite/g++.dg/conversion/ref4.C
> new file mode 100644
> index 000..464a4cf6c0f
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/conversion/ref4.C
> @@ -0,0 +1,22 @@
> +// PR c++/95789
> +// { dg-do compile { target c++11 } }
> +
> +struct B {
> +int n;
> +};
> +
> +template 
> +struct A {
> +B& get() const { return f; } // { dg-error "binding reference" }
> +
> +B f;
> +};
> +
> +int main() {
> +A a;
> +a.f = {};
> +
> +a.get().n = 10;
> +if (a.f.n != 0)
> +  __builtin_abort();
> +}
> 
> base-commit: 0164e59835de81d758fd4c56248ad7a46435fbfa
> -- 
> Marek Polacek • Red Hat, Inc. • 300 A St, Boston, MA
> 

Marek

Re: PSA: Default C++ dialect is now C++17

2020-06-29 Thread Marek Polacek via Gcc-patches

On Mon, Jun 29, 2020 at 09:51:57AM +0200, Martin Liška wrote:
> On 6/26/20 9:34 PM, Marek Polacek via Gcc-patches wrote:
> > As discussed last month:
> > 
> > it's time to change the C++ default to gnu++17.  I've committed the patch after
> > testing x86_64-pc-linux-gnu and powerpc64le-unknown-linux-gnu.  Brace yourselves!
> > 
> > Marek
> > 
> 
> Just a small note that 510.parest_r SPEC 2017 benchmark can't be built now
> with default changed to -std=c++17. The spec config needs to be adjusted.

Interesting, do you know why?  Does it use the register keyword?

Marek

Re: [PATCH] reassoc: Propagate PHI_LOOP_BIAS along single uses

2020-06-29 Thread Ilya Leoshkevich via Gcc-patches

On Thu, 2020-06-25 at 14:34 +0200, Richard Biener wrote:
> On Wed, Jun 24, 2020 at 1:31 AM Ilya Leoshkevich via Gcc-patches
>  wrote:
> > Bootstrapped and regtested x86_64-redhat-linux, ppc64le-redhat-linux and
> > s390x-redhat-linux.  I also ran SPEC 2006 and 2017 on these platforms,
> > and the only measurable regression was 3% in 520.omnetpp_r on ppc, which
> > went away after inserting a single nop at the beginning of
> > cDynamicExpression::evaluate.
> > 
> > OK for master?
> 
> As you might know this is incredibly fragile so I'd prefer if you submit
> and push the change to disable PHI biasing in the early pass instance
> separately.  At first glance that change looks reasonable (but we'll
> watch for fallout).

Will do.

> 
> Comments on the other changes inline
> 
> > ---
> > 
> > PR tree-optimization/49749 introduced code that shortens dependency
> > chains containing loop accumulators by placing them last on operand
> > lists of associative operations.
> > 
> > 456.hmmer benchmark on s390 could benefit from this, however, the code
> > that needs it modifies loop accumulator before using it, and since only
> > so-called loop-carried phis are treated as loop accumulators, the
> > code in the present form doesn't really help.  According to Bill
> > Schmidt - the original author - such a conservative approach was chosen
> > so as to avoid unnecessarily swapping operands, which might cause
> > unpredictable effects.  However, giving special treatment to forms of
> > loop accumulators is acceptable.
> > 
> > The definition of loop-carried phi is: it's a single-use phi, which is
> > used in the same innermost loop it's defined in, at least one argument
> > of which is defined in the same innermost loop as the phi itself.
> > Given this, it seems natural to treat single uses of such phis as phis
> > themselves.
> > 
> > gcc/ChangeLog:
> > 
> > 2020-05-06  Ilya Leoshkevich  
> > 
> > > * passes.def (pass_reassoc): Rename parameter to early_p.
> > > * tree-ssa-reassoc.c (reassoc_bias_loop_carried_phi_ranks_p):
> > > New variable.
> > > (phi_rank): Don't bias loop-carried phi ranks
> > > before vectorization pass.
> > > (loop_carried_phi): Remove (superseded by
> > > operand_rank::biased_p).
> > > (propagate_rank): Propagate bias along single uses.
> > > (get_rank): Pass stmt to propagate_rank.
> > > (execute_reassoc): Add bias_loop_carried_phi_ranks_p parameter.
> > > (pass_reassoc::pass_reassoc): Add bias_loop_carried_phi_ranks_p
> > > initializer.
> > > (pass_reassoc::set_param): Set bias_loop_carried_phi_ranks_p value.
> > > (pass_reassoc::execute): Pass bias_loop_carried_phi_ranks_p to
> > > execute_reassoc.
> > > (pass_reassoc::bias_loop_carried_phi_ranks_p): New member.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > 2020-05-06  Ilya Leoshkevich  
> > 
> > * gcc.target/s390/reassoc-1.c: New test.
> > * gcc.target/s390/reassoc-2.c: New test.
> > * gcc.target/s390/reassoc-3.c: New test.
> > * gcc.target/s390/reassoc.h: New test.
> > ---
> >  gcc/passes.def|  4 +-
> >  gcc/testsuite/gcc.target/s390/reassoc-1.c |  6 ++
> >  gcc/testsuite/gcc.target/s390/reassoc-2.c |  7 ++
> >  gcc/testsuite/gcc.target/s390/reassoc-3.c |  8 ++
> >  gcc/testsuite/gcc.target/s390/reassoc.h   | 22 +
> >  gcc/tree-ssa-reassoc.c| 97 ++-
> > 
> >  6 files changed, 105 insertions(+), 39 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/s390/reassoc-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/s390/reassoc-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/s390/reassoc-3.c
> >  create mode 100644 gcc/testsuite/gcc.target/s390/reassoc.h
> > 
> > diff --git a/gcc/passes.def b/gcc/passes.def
> > index 2b1e09fdda3..6864f583f20 100644
> > --- a/gcc/passes.def
> > +++ b/gcc/passes.def
> > @@ -235,7 +235,7 @@ along with GCC; see the file COPYING3.  If not see
> >  program and isolate those paths.  */
> >NEXT_PASS (pass_isolate_erroneous_paths);
> >NEXT_PASS (pass_dse);
> > -  NEXT_PASS (pass_reassoc, true /* insert_powi_p */);
> > +  NEXT_PASS (pass_reassoc, true /* early_p */);
> >NEXT_PASS (pass_dce);
> >NEXT_PASS (pass_forwprop);
> >NEXT_PASS (pass_phiopt, false /* early_p */);
> > @@ -312,7 +312,7 @@ along with GCC; see the file COPYING3.  If not see
> >NEXT_PASS (pass_lower_vector_ssa);
> >NEXT_PASS (pass_lower_switch);
> >NEXT_PASS (pass_cse_reciprocals);
> > -  NEXT_PASS (pass_reassoc, false /* insert_powi_p */);
> > +  NEXT_PASS (pass_reassoc, false /* early_p */);
> >NEXT_PASS (pass_strength_reduction);
> >NEXT_PASS (pass_split_paths);
> >NEXT_PASS (pass_tracer);
> > diff --git

[PATCH] Accept "user" as an alias for "login" in .netrc

2020-06-29 Thread Gerald Pfeifer

On one of my systems I have been getting warnings like

  wget: .../.netrc:2: unknown token "user"
  wget: .../.netrc:2: unknown token "..."
  wget: .../.netrc:8: unknown token "user"
  wget: .../.netrc:8: unknown token "..."
  wget: .../.netrc:15: unknown token "user"
  wget: .../.netrc:15: unknown token "..."

for ages (where those entries were used by formail among others).

Finally digging into this a bit deeper, into the wget sources, I
found that wget solely accepts "login" to specify username in .netrc.

This patch makes it more flexible to also accept "user", like fetchmail
does, and hence silences those warnings as a positive side effect.
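
As a rough, standalone sketch of the keyword handling this describes: the enum and the function name classify_keyword below are hypothetical (only tok_login and tok_macdef appear in the patch), and the real parse_netrc_fp does considerably more than this:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical token kinds for the example.
enum netrc_token { tok_nothing, tok_login, tok_macdef };

// Map a .netrc keyword to a token kind, treating "user" like "login".
netrc_token
classify_keyword (const char *tok)
{
  if (!std::strcmp (tok, "login"))
    return tok_login;
  /* "user" sometimes serves as an alias for "login".  */
  else if (!std::strcmp (tok, "user"))
    return tok_login;
  else if (!std::strcmp (tok, "macdef"))
    return tok_macdef;
  return tok_nothing;
}
```

With the alias in place, "user" no longer falls through to the unknown-token path in this sketch.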


(I don't know how best to provide patches to you and hope this is okay?)

Thanks,
Gerald


2020-06-29  Gerald Pfeifer   

* src/netrc.c (parse_netrc_fp): Accept "user" as an alias for "login".

---
 src/netrc.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/netrc.c b/src/netrc.c
index a9232ed4..214e3cef 100644
--- a/src/netrc.c
+++ b/src/netrc.c
@@ -391,6 +391,10 @@ parse_netrc_fp (const char *path, FILE *fp)
   else if (!strcmp (tok, "login"))
 last_token = tok_login;
 
+  /* "user" sometimes serves as an alias for "login". */
+  else if (!strcmp (tok, "user"))
+last_token = tok_login;
+
   else if (!strcmp (tok, "macdef"))
 last_token = tok_macdef;
 
-- 
2.26.2

Re: [PATCH] coroutines: Collect the function body rewrite code.

2020-06-29 Thread Nathan Sidwell


On 6/27/20 4:13 PM, Iain Sandoe wrote:

Hi,

This is an enabler patch that enables a reasonable approach to
fixing 5 reported PRs (it doesn’t fix anything in its own right).

It has been tested on x86_64-linux, darwin, powerpc64-linux
OK for master / 10.2?
thanks
Iain


OK, yes it was quite a slippery target to hit!



--

The standard describes a rewrite of the body of the user-authored
function (which wraps it in a try-catch block and provides the
initial and final suspend expressions).  The exact arrangement of
this was still in flux right up until the WG21 meeting in Prague and
as a consequence was a bit of a moving target.

The net result was a fragmented implementation of the parts of
this rewrite which is now impeding progress in fixing other issues.

This patch collates the rewrite action into a single function and
carries this out earlier.

gcc/cp/ChangeLog:

* coroutines.cc (expand_one_await_expression): Remove
code dealing with initial suspend.
(build_actor_fn): Remove code special-casing initial
and final suspend. Handle the final suspend and marking
of the coroutine as done.
(coro_rewrite_function_body): New.
(bind_expr_find_in_subtree): Remove.
(coro_body_contains_bind_expr_p): Remove.
(morph_fn_to_coro): Split the rewrite of the original
function into coro_rewrite_function_body and call it.
---
  gcc/cp/coroutines.cc | 534 +--
  1 file changed, 208 insertions(+), 326 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 54f9cb3b4e4..8e0f0e09b56 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -1573,8 +1573,6 @@ expand_one_await_expression (tree *stmt, tree *await_expr, void *d)
tree awaiter_calls = TREE_OPERAND (saved_co_await, 3);
  
tree source = TREE_OPERAND (saved_co_await, 4);

-  bool is_initial =
-(source && TREE_INT_CST_LOW (source) == (int) INITIAL_SUSPEND_POINT);
bool is_final = (source
   && TREE_INT_CST_LOW (source) == (int) FINAL_SUSPEND_POINT);
bool needs_dtor = TYPE_HAS_NONTRIVIAL_DESTRUCTOR (TREE_TYPE (var));
@@ -1724,16 +1722,6 @@ expand_one_await_expression (tree *stmt, tree *await_expr, void *d)
resume_label = build_stmt (loc, LABEL_EXPR, resume_label);
append_to_statement_list (resume_label, _list);
  
-  if (is_initial)

-{
-  /* Note that we are about to execute the await_resume() for the initial
-await expression.  */
-  r = build2_loc (loc, MODIFY_EXPR, boolean_type_node, data->i_a_r_c,
- boolean_true_node);
-  r = coro_build_cvt_void_expr_stmt (r, loc);
-  append_to_statement_list (r, _list);
-}
-
/* This will produce the value (if one is provided) from the co_await
   expression.  */
tree resume_call = TREE_VEC_ELT (awaiter_calls, 2); /* await_resume().  */
@@ -2102,19 +2090,16 @@ static void
  build_actor_fn (location_t loc, tree coro_frame_type, tree actor, tree fnbody,
tree orig, hash_map *param_uses,
hash_map *local_var_uses,
-   vec *param_dtor_list, tree initial_await,
-   tree final_await, unsigned body_count, tree frame_size)
+   vec *param_dtor_list,
+   tree fs_label, tree resume_fn_field,
+   unsigned body_count, tree frame_size)
  {
verify_stmt_tree (fnbody);
/* Some things we inherit from the original function.  */
-  tree coro_frame_ptr = build_pointer_type (coro_frame_type);
tree handle_type = get_coroutine_handle_type (orig);
tree self_h_proxy = get_coroutine_self_handle_proxy (orig);
tree promise_type = get_coroutine_promise_type (orig);
tree promise_proxy = get_coroutine_promise_proxy (orig);
-  tree act_des_fn_type
-= build_function_type_list (void_type_node, coro_frame_ptr, NULL_TREE);
-  tree act_des_fn_ptr = build_pointer_type (act_des_fn_type);
  
/* One param, the coro frame pointer.  */

tree actor_fp = DECL_ARGUMENTS (actor);
@@ -2145,21 +2130,15 @@ build_actor_fn (location_t loc, tree coro_frame_type, tree actor, tree fnbody,
DECL_CONTEXT (continuation) = actor;
BIND_EXPR_VARS (actor_bind) = continuation;
  
-  /* Update the block associated with the outer scope of the orig fn.  */

+  /* Link in the block associated with the outer scope of the re-written
+ function body.  */
tree first = expr_first (fnbody);
-  if (first && TREE_CODE (first) == BIND_EXPR)
-{
-  /* We will discard this, since it's connected to the original scope
-nest.  */
-  tree block = BIND_EXPR_BLOCK (first);
-  if (block) /* For this to be missing is probably a bug.  */
-   {
- gcc_assert (BLOCK_SUPERCONTEXT (block) == NULL_TREE);
- gcc_assert (BLOCK_CHAIN (block) == NULL_TREE);
- BLOCK_SUPERCONTEXT (block) = top_block;
- BLOCK_SUBBLOCKS (top_block) = block;
-   }
-}
+  gcc_checking_assert

[PATCH] tree-optimization/95916 - treat scalar ops explicitely

2020-06-29 Thread Richard Biener

This explicitly treats the case of scalar operands for SLP
when computing insert locations.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2020-06-29  Richard Biener  

PR tree-optimization/95916
* tree-vect-slp.c (vect_schedule_slp_instance): Explicitely handle
the case of not vectorized externals.

* gcc.dg/vect/pr95916.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr95916.c | 13 +
 gcc/tree-vect-slp.c | 15 +++
 2 files changed, 28 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr95916.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr95916.c b/gcc/testsuite/gcc.dg/vect/pr95916.c
new file mode 100644
index 000..61b8ca3fa0c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr95916.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+
+extern short var_3, var_8;
+extern int var_5;
+extern char var_10;
+extern int arr_99[][16];
+void test()
+{
+  for (; 0 < var_10;)
+for (long a = var_8;; a++)
+  arr_99[4][a] = var_3 << var_5;
+}
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index b223956e3af..1ffbf6f6af9 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -4293,6 +4293,21 @@ vect_schedule_slp_instance (vec_info *vinfo,
  || vect_stmt_dominates_stmt_p (last_stmt, vstmt))
last_stmt = vstmt;
  }
+   else if (!SLP_TREE_VECTYPE (child))
+ {
+   /* For externals we use unvectorized at all scalar defs.  */
+   unsigned j;
+   tree def;
+   FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_OPS (child), j, def)
+ if (TREE_CODE (def) == SSA_NAME
+ && !SSA_NAME_IS_DEFAULT_DEF (def))
+   {
+ gimple *stmt = SSA_NAME_DEF_STMT (def);
+ if (!last_stmt
+ || vect_stmt_dominates_stmt_p (last_stmt, stmt))
+   last_stmt = stmt;
+   }
+ }
else
  {
/* For externals we have to look at all defs since their
-- 
2.26.2

[PATCH] do not include from tree-vectorizer.h

2020-06-29 Thread Richard Biener

This removes the duplicate  include from tree-vectorizer.h.

2020-06-29  Richard Biener  

* tree-vectorizer.h: Do not include .
---
 gcc/tree-vectorizer.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index c393d7e5fa6..dfe88cc8af3 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -26,7 +26,7 @@ typedef class _stmt_vec_info *stmt_vec_info;
 #include "tree-data-ref.h"
 #include "tree-hash-traits.h"
 #include "target.h"
-#include 
+
 
 /* Used for naming of new temporaries.  */
 enum vect_var_kind {
-- 
2.26.2

Re: [RFC] aarch64: Treat GNU and Advanced SIMD vectors as distinct [PR92789, PR95726]

2020-06-29 Thread Jakub Jelinek via Gcc-patches

On Mon, Jun 29, 2020 at 12:38:45PM +0100, Richard Sandiford wrote:
> It looks like aarch64_comp_type_attributes is missing cases for
> the SVE attributes, but I'll handle that in a separate patch.
> 
> Any thoughts?  I'll apply this after 5pm UTC tomorrow if no one asks
> me not to. :-)
> 
> If I do apply the patch in its current form and there's no fallout,
> I'll post a similar one for AArch32.  I'm not sure yet what to do
> about backports though -- there does seem to be a too-high risk
> of breaking things.

I believe Jason said to do it only if comparing_specializations is set.
The problem is that comparing_specializations is defined in the C++ FE
and the hooks can be linked without the C++ FE being linked in.
Perhaps, for backports only, move the int comparing_specializations;
definition out of the C++ FE into some generic file (tree.c) with
a comment that it is an ugly hack?  Non-C++ FEs would just keep the
variable 0 all the time, so nothing would change except when the C++ FE
compares template arguments.

Thanks for working on this.

> 2020-06-26  Richard Sandiford  
> 
> gcc/
>   PR target/92789
>   PR target/95726
>   * config/aarch64/aarch64.c (aarch64_attribute_table): Add
>   "Advanced SIMD type".
>   (aarch64_comp_type_attributes): Check that the "Advanced SIMD type"
>   attributes are equal.
>   * config/aarch64/aarch64-builtins.c: Include stringpool.h and
>   attribs.h.
>   (aarch64_mangle_builtin_vector_type): Use the mangling recorded
>   in the "Advanced SIMD type" attribute.
>   (aarch64_init_simd_builtin_types): Add an "Advanced SIMD type"
>   attribute to each Advanced SIMD type, using the mangled type
>   as the attribute's single argument.
> 
> gcc/testsuite/
>   PR target/92789
>   PR target/95726
>   * g++.target/aarch64/pr95726.C: New test.

Jakub

Re: [PATCH] underline null argument in -Wnonnull (PR c++/86568)

2020-06-29 Thread Jakub Jelinek via Gcc-patches

On Fri, Jun 05, 2020 at 01:41:16PM -0600, Martin Sebor via Gcc-patches wrote:
> PR c++/86568 - -Wnonnull warnings should highlight the relevant argument not 
> the closing parenthesis
> 
> gcc/c-family/ChangeLog:
> 
>   PR c++/86568
>   * c-common.c (struct nonnull_arg_ctx): Add members.
>   (check_function_nonnull): Use nonnull_arg_ctx as argument.  Handle
>   C++ member functions specially.  Consider the this pointer implicitly
>   nonnull.
>   (check_nonnull_arg): Use location of argument when available.
>   (check_function_arguments): Use nonnull_arg_ctx as argument.
> 
> gcc/ChangeLog:
> 
>   PR c++/86568
>   * calls.c (maybe_warn_rdwr_sizes): Use location of argument if
>   available.
>   * tree-ssa-ccp.c (pass_post_ipa_warn::execute): Same.  Adjust
>   indentation.
>   * tree.c (get_nonnull_args): Consider the this pointer implicitly
>   nonnull.
>   * gcc/var-tracking.c (deps_vec): New type.
>   (var_loc_dep_vec): New function.
>   (VAR_LOC_DEP_VEC): Use it.

This broke c-c++-common/builtin-arith-overflow-1.c on all arches.

Fixed thusly, tested on x86_64-linux, committed to trunk as obvious.

2020-06-29  Jakub Jelinek  

PR c++/86568
* c-c++-common/builtin-arith-overflow-1.c (generic_3, typed_3_null):
Adjust dg-warning.

--- gcc/testsuite/c-c++-common/builtin-arith-overflow-1.c.jj	2020-01-12 11:54:37.001404537 +0100
+++ gcc/testsuite/c-c++-common/builtin-arith-overflow-1.c	2020-06-29 13:41:27.057188795 +0200
@@ -82,20 +82,20 @@ generic_3 (int a, int b, int c)
 x += __builtin_add_overflow (0, 0, (enum E *)0);
   */
 
-  x += __builtin_sub_overflow (0, 0, (char *)0);   /* { dg-warning "null argument" } */
-  x += __builtin_add_overflow (0, 0, (short *)0);   /* { dg-warning "null argument" } */
-  x += __builtin_add_overflow (a, b, (int *)0);   /* { dg-warning "null argument" } */
-  x += __builtin_sub_overflow (a, b, (int *)0);   /* { dg-warning "null argument" } */
-  x += __builtin_mul_overflow (a, b, (int *)0);   /* { dg-warning "null argument" } */
-  x += __builtin_add_overflow (a, 1, (int *)0);   /* { dg-warning "null argument" } */
-  x += __builtin_sub_overflow (a, 2, (int *)0);   /* { dg-warning "null argument" } */
-  x += __builtin_mul_overflow (a, 3, (int *)0);   /* { dg-warning "null argument" } */
-  x += __builtin_add_overflow (4, b, (int *)0);   /* { dg-warning "null argument" } */
-  x += __builtin_sub_overflow (5, b, (int *)0);   /* { dg-warning "null argument" } */
-  x += __builtin_mul_overflow (6, b, (int *)0);   /* { dg-warning "null argument" } */
-  x += __builtin_add_overflow (7, 8, (int *)0);   /* { dg-warning "null argument" } */
-  x += __builtin_sub_overflow (9, 10, (int *)0);   /* { dg-warning "null argument" } */
-  x += __builtin_mul_overflow (11, 12, (int *)0);   /* { dg-warning "null argument" } */
+  x += __builtin_sub_overflow (0, 0, (char *)0);   /* { dg-warning "argument 3 null" } */
+  x += __builtin_add_overflow (0, 0, (short *)0);   /* { dg-warning "argument 3 null" } */
+  x += __builtin_add_overflow (a, b, (int *)0);   /* { dg-warning "argument 3 null" } */
+  x += __builtin_sub_overflow (a, b, (int *)0);   /* { dg-warning "argument 3 null" } */
+  x += __builtin_mul_overflow (a, b, (int *)0);   /* { dg-warning "argument 3 null" } */
+  x += __builtin_add_overflow (a, 1, (int *)0);   /* { dg-warning "argument 3 null" } */
+  x += __builtin_sub_overflow (a, 2, (int *)0);   /* { dg-warning "argument 3 null" } */
+  x += __builtin_mul_overflow (a, 3, (int *)0);   /* { dg-warning "argument 3 null" } */
+  x += __builtin_add_overflow (4, b, (int *)0);   /* { dg-warning "argument 3 null" } */
+  x += __builtin_sub_overflow (5, b, (int *)0);   /* { dg-warning "argument 3 null" } */
+  x += __builtin_mul_overflow (6, b, (int *)0);   /* { dg-warning "argument 3 null" } */
+  x += __builtin_add_overflow (7, 8, (int *)0);   /* { dg-warning "argument 3 null" } */
+  x += __builtin_sub_overflow (9, 10, (int *)0);   /* { dg-warning "argument 3 null" } */
+  x += __builtin_mul_overflow (11, 12, (int *)0);   /* { dg-warning "argument 3 null" } */
 
   return x;
 }
@@ -167,34 +167,34 @@ typed_3_null (int a, int b)
 {
   int x = 0;
 
-  x += __builtin_sadd_overflow (a, b, (int *)0); /* { dg-warning "null argument" } */
-  x += __builtin_uadd_overflow (a, b, (unsigned *)0); /* { dg-warning "null argument" } */
+  x += __builtin_sadd_overflow (a, b, (int *)0); /* { dg-warning "argument 3 null" } */
+  x += __builtin_uadd_overflow (a, b, (unsigned *)0); /* { dg-warning "argument 3 null" } */
 
-  x += __builtin_saddl_overflow (a, b, (long *)0); /* { dg-warning "null argument" } */
-  x += __builtin_uaddl_overflow (a, b, (unsigned long *)0); /* { dg-warning "null argument" } */
+  x += __builtin_saddl_overflow (a, b, (long *)0); /* { dg-warning "argument 3 null" } */
+  x += __builtin_uaddl_overflow (a, b, (unsigned long *)0); /* {

[RFC] aarch64: Treat GNU and Advanced SIMD vectors as distinct [PR92789, PR95726]

2020-06-29 Thread Richard Sandiford

PR95726 is about template look-up for things like:

foo
foo

The immediate cause of the problem is that the hash function usually
returns different hashes for these types, yet the equality function
thinks they are equal.  This then raises the question of how the types
are supposed to be treated.

I think the answer is that the GNU vector type should be treated as
distinct from float32x4_t, not least because the two types mangle
differently.  However, each type should implicitly convert to the other.

This would mean that, as far as the PR is concerned, the hashing
function is right to (sometimes) treat the types differently and
the equality function is wrong to treat them as the same.
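
To make the hash/equality mismatch concrete, here is a minimal C++ sketch.
It is not GCC code: TypeKey, hash_key, equal_loose, equal_strict and
respects_hash_contract are hypothetical names, and the two mangled strings
are only illustrative stand-ins for the GNU vector type and float32x4_t.
It shows that an equality function which ignores the distinction breaks
the rule "equal keys must hash equally", while one that honours the
distinction does not.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a type node; not a real GCC structure.
struct TypeKey
{
  std::string mangled;  // illustrative mangled name
  unsigned bits;        // total vector size in bits
};

// Deterministic toy hash that looks at the mangled name, so the two
// spellings of a 128-bit float vector hash differently.
inline std::size_t
hash_key (const TypeKey &k)
{
  std::size_t h = k.bits;
  for (unsigned char c : k.mangled)
    h += c;
  return h;
}

// Equality that only looks at the size: treats both types as the same.
inline bool
equal_loose (const TypeKey &a, const TypeKey &b)
{
  return a.bits == b.bits;
}

// Equality that also compares the mangled name: treats them as distinct.
inline bool
equal_strict (const TypeKey &a, const TypeKey &b)
{
  return a.bits == b.bits && a.mangled == b.mangled;
}

// A hash table needs: eq (a, b) implies hash_key (a) == hash_key (b).
template <typename Eq>
bool
respects_hash_contract (const TypeKey &a, const TypeKey &b, Eq eq)
{
  return !eq (a, b) || hash_key (a) == hash_key (b);
}

// Illustrative keys for the GNU vector type and float32x4_t.
inline TypeKey gnu_vector () { return { "Dv4_f", 128 }; }
inline TypeKey advsimd_vector () { return { "13__Float32x4_t", 128 }; }

// Usage: the loose equality violates the contract, the strict one does not.
inline void
self_check ()
{
  assert (!respects_hash_contract (gnu_vector (), advsimd_vector (),
                                   equal_loose));
  assert (respects_hash_contract (gnu_vector (), advsimd_vector (),
                                  equal_strict));
}
```

In this toy model, making the equality function stricter (rather than
weakening the hash) is the change that restores consistency, which is
the direction argued for above.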

The most obvious way to enforce the type difference is to use a
target-specific type attribute.  That on its own is enough to fix
the PR.  The difficulty is deciding whether the knock-on effects
are acceptable.

One obvious effect is that GCC then rejects:

typedef float vecf __attribute__((vector_size(16)));
vecf x;
float32x4_t  = x;

on the basis that the types are no longer reference-compatible.
I think that's again the correct behaviour, and consistent with
current Clang.

A trickier question is whether:

vecf x;
float32x4_t y;
… c ? x : y …

should be valid, and if so, what its type should be [PR92789].
As explained in the comment in the testcase, GCC and Clang both
accepted this, but GCC chose the “then” type while Clang chose
the “else” type.  This can lead to different mangling for (probably
artificial) corner cases, as seen for “sel1” and “sel2” in the
testcase.

Adding the attribute makes GCC reject the conditional expression
as ambiguous.  I think that too is the correct behaviour, for the
reasons described in the testcase.  However, it does seem to have
the potential to break existing code.

It looks like aarch64_comp_type_attributes is missing cases for
the SVE attributes, but I'll handle that in a separate patch.

Any thoughts?  I'll apply this after 5pm UTC tomorrow if no one asks
me not to. :-)

If I do apply the patch in its current form and there's no fallout,
I'll post a similar one for AArch32.  I'm not sure yet what to do
about backports though -- there does seem to be a too-high risk
of breaking things.

Tested on aarch64-linux-gnu (with and without SVE).

Richard


2020-06-26  Richard Sandiford  

gcc/
PR target/92789
PR target/95726
* config/aarch64/aarch64.c (aarch64_attribute_table): Add
"Advanced SIMD type".
(aarch64_comp_type_attributes): Check that the "Advanced SIMD type"
attributes are equal.
* config/aarch64/aarch64-builtins.c: Include stringpool.h and
attribs.h.
(aarch64_mangle_builtin_vector_type): Use the mangling recorded
in the "Advanced SIMD type" attribute.
(aarch64_init_simd_builtin_types): Add an "Advanced SIMD type"
attribute to each Advanced SIMD type, using the mangled type
as the attribute's single argument.

gcc/testsuite/
PR target/92789
PR target/95726
* g++.target/aarch64/pr95726.C: New test.
---
 gcc/config/aarch64/aarch64-builtins.c  | 34 
 gcc/config/aarch64/aarch64.c   | 15 ++-
 gcc/testsuite/g++.target/aarch64/pr95726.C | 46 ++
 3 files changed, 77 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/aarch64/pr95726.C

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 95213cd70c8..e87a4559c36 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -43,6 +43,8 @@
 #include "gimple-iterator.h"
 #include "case-cfn-macros.h"
 #include "emit-rtl.h"
+#include "stringpool.h"
+#include "attribs.h"
 
 #define v8qi_UP  E_V8QImode
 #define v4hi_UP  E_V4HImode
@@ -639,18 +641,12 @@ aarch64_mangle_builtin_scalar_type (const_tree type)
 static const char *
 aarch64_mangle_builtin_vector_type (const_tree type)
 {
-  int i;
-  int nelts = sizeof (aarch64_simd_types) / sizeof (aarch64_simd_types[0]);
-
-  for (i = 0; i < nelts; i++)
-if (aarch64_simd_types[i].mode ==  TYPE_MODE (type)
-   && TYPE_NAME (type)
-   && TREE_CODE (TYPE_NAME (type)) == TYPE_DECL
-   && DECL_NAME (TYPE_NAME (type))
-   && !strcmp
-(IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type))),
- aarch64_simd_types[i].name))
-  return aarch64_simd_types[i].mangle;
+  tree attrs = TYPE_ATTRIBUTES (type);
+  if (tree attr = lookup_attribute ("Advanced SIMD type", attrs))
+{
+  tree mangled_name = TREE_VALUE (TREE_VALUE (attr));
+  return IDENTIFIER_POINTER (mangled_name);
+}
 
   return NULL;
 }
@@ -802,10 +798,16 @@ aarch64_init_simd_builtin_types (void)
 
   if (aarch64_simd_types[i].itype == NULL)
{
- aarch64_simd_types[i].itype
-   = build_distinct_type_copy
- (build_vector_type (eltype,

[PATCH][OBVIOUS] Use gsi_bb instead of iterator->bb.

2020-06-29 Thread Martin Liška


One obvious transformation, pushed to master.

Martin

gcc/ChangeLog:

* tree-ssa-ccp.c (gsi_prev_dom_bb_nondebug): Use gsi_bb
instead of gimple_stmt_iterator::bb.
* tree-ssa-math-opts.c (insert_reciprocals): Likewise.
* tree-vectorizer.h: Likewise.
---
 gcc/tree-ssa-ccp.c   | 2 +-
 gcc/tree-ssa-math-opts.c | 2 +-
 gcc/tree-vectorizer.h| 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-ssa-ccp.c b/gcc/tree-ssa-ccp.c
index e8333ac27d9..7e3921869b8 100644
--- a/gcc/tree-ssa-ccp.c
+++ b/gcc/tree-ssa-ccp.c
@@ -2159,7 +2159,7 @@ gsi_prev_dom_bb_nondebug (gimple_stmt_iterator *i)
   gsi_prev_nondebug (i);
   while (gsi_end_p (*i))
 {
-  dom = get_immediate_dominator (CDI_DOMINATORS, i->bb);
+  dom = get_immediate_dominator (CDI_DOMINATORS, gsi_bb (*i));
   if (dom == NULL || dom == ENTRY_BLOCK_PTR_FOR_FN (cfun))
return;
 
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c

index 104ae97a707..8423caa3ee3 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -446,7 +446,7 @@ insert_reciprocals (gimple_stmt_iterator *def_gsi, struct occurrence *occ,
  if (should_insert_square_recip)
gsi_insert_before (, new_square_stmt, GSI_SAME_STMT);
}
-  else if (def_gsi && occ->bb == def_gsi->bb)
+  else if (def_gsi && occ->bb == gsi_bb (*def_gsi))
{
  /* Case 2: insert right after the definition.  Note that this will
 never happen if the definition statement can throw, because in
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index d9f6a67264d..c393d7e5fa6 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -869,7 +869,7 @@ public:
   {
 const_reverse_iterator begin = region_end;
 if (*begin == NULL)
-  begin = const_reverse_iterator (gsi_last_bb (region_end.bb));
+  begin = const_reverse_iterator (gsi_last_bb (gsi_bb (region_end)));
 else
   ++begin;
 
--

2.27.0

Re: [PATCH] PR fortran/71706 - [8/9/10/11 Regression] [Coarray] ICE on using sync images with integer(kind<>4), with -fcoarray=lib -fcheck=bounds

2020-06-29 Thread Thomas Koenig via Gcc-patches


Hi Harald,


Here's a fix to bounds-checking code that manifests itself essentially
with checking enabled.  Once found and understood, the fix is trivial:
just properly convert the argument kind of SYNC IMAGES for checking.


OK.  (I had checked if "images" would be used later on. It is, so
images = fold_convert (images, ...) would probably lead to all sorts
of errors).

Thanks for the patch!

Regards

Thomas

Re: [PATCH] testsuite: Ignore line no. for BB vectorization message

2020-06-29 Thread Richard Biener via Gcc-patches

On Mon, Jun 29, 2020 at 12:24 PM Kewen.Lin via Gcc-patches
 wrote:
>
> Hi,
>
> In my testing of vectors with length, I found that the case
> g++.dg/vect/slp-pr56812.cc needs a small fix to ignore the line
> number, since the message for basic block vectorization looks
> like:
>   slp-pr56812.cc:19:1: optimized: basic block part vectorized using 16 byte vectors
>
> while for loop vectorization, it looks like:
>   slp-pr56812.cc:17:18: optimized: loop vectorized using 16 byte vectors
>
> Is it ok for trunk?

OK.

> Thanks!
> Kewen
> -
> gcc/testsuite/ChangeLog:
>
> * g++.dg/vect/slp-pr56812.cc: Ignore line number for basic block
> vectorization messages.
>
> -
>
> diff --git a/gcc/testsuite/g++.dg/vect/slp-pr56812.cc b/gcc/testsuite/g++.dg/vect/slp-pr56812.cc
> index 3e7a495aadd..37c47acd191 100644
> --- a/gcc/testsuite/g++.dg/vect/slp-pr56812.cc
> +++ b/gcc/testsuite/g++.dg/vect/slp-pr56812.cc
> @@ -14,6 +14,6 @@ public:
>  void mydata::Set (float x)
>  {
>/* We want to vectorize this either as loop or basic-block.  */
> -  for (int i=0; i +  for (int i=0; i target *-*-* } 0 } */
>  data[i] = x;
>  }

[committed] amdgcn: Support basic DWARF

2020-06-29 Thread Andrew Stubbs

This patch configures the DWARF debug output to match the proposed DWARF 
specification from AMD.  This is already implemented in LLVM and rocgdb 
(out of tree).


This makes no attempt to support CFI, yet, and has some issues with 
vector registers. GCC will need to support some DWARF extensions to make 
those work properly (they're part of the AMD proposal).


Andrew
amdgcn: Support basic DWARF

This is enough DWARF support for "-O0 -g" to work OK, within a single frame,
using the new rocgdb debugger from AMD.  Debugging with optimization enabled
also works when the values are located in SGPRs or memory.

Scalars in VGPRs are problematic, EXEC_HI and VCC_HI are unmappable, and
CFI remains unimplemented.

gcc/ChangeLog:

	* config/gcn/gcn-hsa.h (DBX_REGISTER_NUMBER): New macro.
	* config/gcn/gcn-protos.h (gcn_dwarf_register_number): New prototype.
	* config/gcn/gcn.c (gcn_expand_prologue): Add RTX_FRAME_RELATED_P
	and REG_FRAME_RELATED_EXPR to stack and frame pointer adjustments.
	(gcn_dwarf_register_number): New function.
	(gcn_dwarf_register_span): New function.
	(TARGET_DWARF_REGISTER_SPAN): New hook macro.
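
As background for the prologue change, the following C++ sketch models the
arithmetic of a split 64-bit add: a 32-bit add of the low halves that
produces a carry, followed by a 32-bit add-with-carry of the high halves,
where the high half of the adjustment is 0 or -1 depending on its sign.
The function name add_split is hypothetical, and the code models only the
arithmetic, not the RTL the patch emits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: add a sign-extended 32-bit adjustment to a 64-bit
// value using only 32-bit additions (low add, then high add-with-carry).
inline std::uint64_t
add_split (std::uint64_t base, std::int32_t adjust)
{
  const std::uint32_t base_lo = static_cast<std::uint32_t> (base);
  const std::uint32_t base_hi = static_cast<std::uint32_t> (base >> 32);

  // Low half of the adjustment, and its sign extension for the high half.
  const std::uint32_t adj_lo = static_cast<std::uint32_t> (adjust);
  const std::uint32_t adj_hi = adjust < 0 ? 0xffffffffu : 0u;

  // Low add: unsigned wrap-around tells us whether a carry was produced.
  const std::uint32_t lo = base_lo + adj_lo;
  const std::uint32_t carry = lo < base_lo ? 1u : 0u;

  // High add-with-carry.
  const std::uint32_t hi = base_hi + adj_hi + carry;

  return (static_cast<std::uint64_t> (hi) << 32) | lo;
}

// Usage: a negative adjustment that borrows from the high half, and a
// positive adjustment that carries into it.
inline void
self_check ()
{
  assert (add_split (0x100000010ULL, -32) == 0xfffffff0ULL);
  assert (add_split (0xfffffff0ULL, 32) == 0x100000010ULL);
}
```

The sign-extended high half is why a negative frame pointer adjustment
needs -1 rather than 0 as the high addend.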

diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h
index 4fd1365416f..cb291726e19 100644
--- a/gcc/config/gcn/gcn-hsa.h
+++ b/gcc/config/gcn/gcn-hsa.h
@@ -103,3 +103,4 @@ extern const char *last_arg_spec_function (int argc, const char **argv);
 #define DWARF2_DEBUGGING_INFO  1
 #define DWARF2_ASM_LINE_DEBUG_INFO 1
 #define EH_FRAME_THROUGH_COLLECT2  1
+#define DBX_REGISTER_NUMBER(REGNO) gcn_dwarf_register_number (REGNO)
diff --git a/gcc/config/gcn/gcn-protos.h b/gcc/config/gcn/gcn-protos.h
index e4dadd37f21..92b1a602610 100644
--- a/gcc/config/gcn/gcn-protos.h
+++ b/gcc/config/gcn/gcn-protos.h
@@ -23,6 +23,7 @@ extern bool gcn_can_split_p (machine_mode, rtx);
 extern bool gcn_constant64_p (rtx);
 extern bool gcn_constant_p (rtx);
 extern rtx gcn_convert_mask_mode (rtx reg);
+extern unsigned int gcn_dwarf_register_number (unsigned int regno);
 extern char * gcn_expand_dpp_shr_insn (machine_mode, const char *, int, int);
 extern void gcn_expand_epilogue ();
 extern rtx gcn_expand_scaled_offsets (addr_space_t as, rtx base, rtx offsets,
diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index 5693b75b672..babecef7888 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -2795,7 +2795,11 @@ gcn_expand_prologue ()
   if (!cfun || !cfun->machine || cfun->machine->normal_function)
 {
   rtx sp = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM);
+  rtx sp_hi = gcn_operand_part (Pmode, sp, 1);
+  rtx sp_lo = gcn_operand_part (Pmode, sp, 0);
   rtx fp = gen_rtx_REG (Pmode, HARD_FRAME_POINTER_REGNUM);
+  rtx fp_hi = gcn_operand_part (Pmode, fp, 1);
+  rtx fp_lo = gcn_operand_part (Pmode, fp, 0);
 
   start_sequence ();
 
@@ -2812,14 +2816,40 @@ gcn_expand_prologue ()
 	+ offsets->callee_saves
 	+ offsets->local_vars + offsets->outgoing_args_size;
   if (sp_adjust > 0)
-	emit_insn (gen_adddi3_scc (sp, sp, gen_int_mode (sp_adjust, DImode)));
+	{
+	  /* Adding RTX_FRAME_RELATED_P effectively disables spliting, so
+	 we use split add explictly, and specify the DImode add in
+	 the note.  */
+	  rtx scc = gen_rtx_REG (BImode, SCC_REG);
+	  rtx adjustment = gen_int_mode (sp_adjust, SImode);
+	  rtx insn = emit_insn (gen_addsi3_scalar_carry (sp_lo, sp_lo,
+			 adjustment, scc));
+	  RTX_FRAME_RELATED_P (insn) = 1;
+	  add_reg_note (insn, REG_FRAME_RELATED_EXPR,
+			gen_rtx_SET (sp,
+ gen_rtx_PLUS (DImode, sp, adjustment)));
+	  emit_insn (gen_addcsi3_scalar_zero (sp_hi, sp_hi, scc));
+	}
 
   if (offsets->need_frame_pointer)
-	emit_insn (gen_adddi3_scc (fp, sp,
-   gen_int_mode
-   (-(offsets->local_vars +
-  offsets->outgoing_args_size),
-DImode)));
+	{
+	  /* Adding RTX_FRAME_RELATED_P effectively disables spliting, so
+	 we use split add explictly, and specify the DImode add in
+	 the note.  */
+	  rtx scc = gen_rtx_REG (BImode, SCC_REG);
+	  int fp_adjust = -(offsets->local_vars + offsets->outgoing_args_size);
+	  rtx adjustment = gen_int_mode (fp_adjust, SImode);
+	  rtx insn = emit_insn (gen_addsi3_scalar_carry(fp_lo, sp_lo,
+			adjustment, scc));
+	  RTX_FRAME_RELATED_P (insn) = 1;
+	  add_reg_note (insn, REG_FRAME_RELATED_EXPR,
+			gen_rtx_SET (fp,
+ gen_rtx_PLUS (DImode, sp, adjustment)));
+	  emit_insn (gen_addcsi3_scalar (fp_hi, sp_hi,
+	 (fp_adjust < 0 ? GEN_INT (-1)
+	  : const0_rtx),
+	 scc, scc));
+	}
 
   rtx_insn *seq = get_insns ();
   end_sequence ();
@@ -2865,6 +2895,8 @@ gcn_expand_prologue ()
 
   /* Set up frame pointer and stack pointer.  */
   rtx sp = gen_rtx_REG (DImode, STACK_POINTER_REGNUM);
+  rtx sp_hi = simplify_gen_subreg (SImode, sp, DImode, 4);
+  rtx sp_lo = simplify_gen_subreg (SImode, sp, DImode, 0);
   rtx fp = gen_rtx_REG (DImode, HARD_FRAME_POINTER_REGNUM);
   rtx fp_hi = simplify_gen_subreg (SImode, fp,

[PATCH] aarch64: Fix missing BTI instruction in trampolines

2020-06-29 Thread Omar Tahir

Hi,

Got a small bugfix here regarding BTIs and trampolines.

If two functions require trampolines, and the first has BTI enabled while the
second doesn't, the generated template will be lacking a BTI instruction.
This patch fixes this by always adding a BTI instruction, which is safe as BTI
instructions are ignored on unsupported architecture versions.

I don't have write access, so could someone commit for me?

Bootstrapped and tested on aarch64 with no regressions.

gcc/ChangeLog:

2020-06-29  Omar Tahir omar.ta...@arm.com

* config/aarch64/aarch64.c (aarch64_asm_trampoline_template): Always
generate a BTI instruction.


gcc/testsuite/ChangeLog:

2020-06-29  Omar Tahir omar.ta...@arm.com

* gcc.target/aarch64/bti-4.c: New test.


trampoline_bti.patch
Description: trampoline_bti.patch

GCC 10.1.1 Status Report (2020-06-29)

2020-06-29 Thread Richard Biener



Status
======

The GCC 10 branch is in regression and documentation fixing mode.

We're close to two months after the GCC 10.1 release which means
a first bugfix release is about to happen.  The plan is to release
mid July and I am targeting for a release candidate mid next
week, no later than July 17th.

Branch status looks mostly good so this is a heads up for backporting
of important regression fixes that already happened on trunk as well
as checking build status of non-primary targets.


Quality Data
============

Priority          #   Change from last report
--------        ---   -----------------------
P1
P2              216   +   8
P3               47   +  33
P4              174   +   1
P5               22   +   1
--------        ---   -----------------------
Total P1-P3     263   +  41
Total           459   +  43


Previous Report
===============

https://gcc.gnu.org/pipermail/gcc/2020-April/000504.html

Re: [PATCH] handle MEM_REF with void* arguments (PR c++/95768)

2020-06-29 Thread Richard Biener via Gcc-patches

On Mon, Jun 29, 2020 at 1:08 AM Martin Sebor  wrote:
>
> On 6/23/20 1:12 AM, Richard Biener wrote:
> > On Tue, Jun 23, 2020 at 12:22 AM Martin Sebor via Gcc-patches
> >  wrote:
> >>
> >> On 6/22/20 12:55 PM, Jason Merrill wrote:
> >>> On 6/22/20 1:25 PM, Martin Sebor wrote:
>  The attached fix parallels the one for the equivalent C bug 95580
>  where the pretty printers don't correctly handle MEM_REF arguments
>  with type void* or other pointers to an incomplete type.
> 
>  The incorrect handling was exposed by the recent change to
>  -Wuninitialized which includes such expressions in diagnostics.
> >>>
>  +if (tree size = TYPE_SIZE_UNIT (TREE_TYPE (argtype)))
>  +  if (!integer_onep (size))
>  +{
>  +  pp_cxx_left_paren (pp);
>  +  dump_type (pp, ptr_type_node, flags);
>  +  pp_cxx_right_paren (pp);
>  +}
> >>>
> >>> Don't we want to print the cast if the pointer target type is incomplete?
> >>
> >> I suppose, yes, although after some more testing I think what should
> >> be output is the type of the access.  The target pointer type isn't
> >> meaningful (at least not in this case).
> >>
> >> Here's what the warning looks like in C for the test case in
> >> gcc.dg/pr95580.c:
> >>
> >> warning: ‘*((void *)(p)+1)’ may be used uninitialized
> >>
> >> and like this in C++:
> >>
> >> warning: ‘*(p +1)’ may be used uninitialized
> >>
> >> The +1 is a byte offset, which is correct given that incrementing
> >> a void* in GCC is the same as adding 1 to the byte address, but
> >> dereferencing a void* doesn't correspond to what's going on in
> >> the source.
> >>
> >> Even for a complete type (with size greater than 1), printing
> >> the type of the argument plus a byte offset is wrong.  It ends
> >> up with this for the C++ test case from 95768:
> >>
> >> warning: ‘*((int*) +4)’ is used uninitialized
> >>
> >> when the access is actually ‘*((int*) +1)’
> >>
> >> So it seems to me for MEM_REF, to make the output meaningful,
> >> it's the type of the access (i.e., the MEM_REF type) that should
> >> be printed here, and the offset should either be in elements of
> >> the accessed type, i.e.,
> >>
> >> warning: ‘*((int*) +1)’ is used uninitialized
> >>
> >> or, if the access is misaligned, the argument should first be
> >> cast to char*, the offset added, and the result then cast to
> >> the access type, like this:
> >>
> >> warning: ‘*(T*)((char*) +1)’ is used uninitialized
> >>
> >> The attached revised and less than fully tested patch implements
> >> this for C++ only for now.  If we agree on this approach I'll see
> >> about making the corresponding change in C.
> >
> > Note that there is no C/C++ way of fully expressing MEM_REF
> > semantics.  __MEM  ((T *)p + 1) is not actually
> > *(int *)((char *)p + 1) because that does not reflect that the
> > effective type of the lvalue when TBAA is concerned is 'T'
> > rather than 'int'.
>
> What form would you say is closest to the C/C++ semantics, or
> likely the most useful to users, that GCC could print instead?

Hmm, I'd try *() maybe?  Because there's
no C/C++ that can express what GIMPLE can do here.  Of course
pattern matching the exact cases we can handle like your patch
is an improvement (but as said the TBAA issue is still present).

> > Note for MEM_REF the offset is always
> > a constant byte offset but it indeed does not have to be a
> > multiple of the MEM_REF type size.
> >
> > I wonder whether printing the MEM_REF in full provides
> > any real diagnostic value in the more "obfuscated" cases.
>
> I'm not sure what obfuscated cases you're thinking of, or what
> you mean by printing it in full.

I think that printing ‘*(T*)((char*) +1)’ is likely going
to confuse users because they cannot match this to a source
location.  If we have a source location we should have caret
diagnostics.

>  I instrumented the code to
> print every MEM_REF that comes up in warn_uninitialized_vars
> and rebuilt GCC.  There are 17,456 distinct instances so I didn't
> review them all but those I did look at all look reasonable.
> Probably the least useful are those that mention  by
> itself (i.e.,  or *).  Those with an offset
> are more informative (e.g., *((access**) +1).  In
> a few the offset is very large, such as *((unsigned int*)sp
> +4611686018427387900), but that doesn't seem like a problem.
> I'd be happy to share the result.

Here +4611686018427387900 should be printed as -4; MEM_REF
offsets are to be interpreted as signed.
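
The arithmetic behind that remark can be checked with a small C++ sketch.
It assumes the printed value came from dividing a byte offset of -16 by
the 4-byte element size as an unsigned 64-bit number; the helper names are
hypothetical and this is not the pretty-printer code.

```cpp
#include <cassert>
#include <cstdint>

// Element offset obtained by treating the byte offset as unsigned.
inline std::uint64_t
element_offset_unsigned (std::uint64_t byte_off, std::uint64_t elt_size)
{
  return byte_off / elt_size;
}

// Element offset obtained by reinterpreting the same bits as signed.
// The conversion is modular in C++20 and behaves the same way under GCC
// in earlier language modes.
inline std::int64_t
element_offset_signed (std::uint64_t byte_off, std::int64_t elt_size)
{
  return static_cast<std::int64_t> (byte_off) / elt_size;
}

// Usage: a byte offset of -16 with a 4-byte element type.
inline void
self_check ()
{
  const std::uint64_t byte_off = static_cast<std::uint64_t> (-16);
  assert (element_offset_unsigned (byte_off, 4) == 4611686018427387900ULL);
  assert (element_offset_signed (byte_off, 4) == -4);
}
```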

> >
> > I'd also not print  but .
>
> I also don't find  helpful, but I don't see 
> as an improvement.  I think printing the SSA variable would be
> more informative here since its name is usually related to
> the variable it was derived from in the source.  But making that
> change (or any other like it) feels like too much feature creep
> for this fix.  I'd be happy to do it in a follow up if we

[PATCH 1/7 v7] ifn/optabs: Support vector load/store with length

2020-06-29 Thread Kewen.Lin via Gcc-patches

Hi Richard,

Thanks for the comments!

on 2020/6/29 6:07 PM, Richard Sandiford wrote:
> Thanks for the update.  I agree with the summary of the IRC discussion
> except for…
> 
> "Kewen.Lin"  writes:
>> Hi Richard S./Richi/Jim/Segher,
>>
>> Thanks a lot for your comments to make this patch more solid.
>>
>> Based on our discussion, for the vector load/store with length
>> optab, the length unit would be measured in lanes by default.
>> For the targets which support length measured in bytes like Power,
>> they should only define VnQI modes to wrap the other same size
>> vector modes.  If the length is larger than total lane/byte count
>> of the given mode, it's taken to load all lanes/bytes implicitly.
> 
> …this last bit.  IMO the behaviour of the optab should be undefined
> when the supplied length is greater than the number of lanes.
> 
> I think that also makes things better for the lxvl implementation,
> which ignores the upper 56 bits of the length.  It sounds like the
> above semantics would instead require Power to saturate the value
> at 255 before shifting it.
> 

Good catch.  I just realized that this part is inconsistent with what I
implemented in patch 5/7, where the function vect_gen_len still does
the min operation between the given length and length_limit.

This patch is updated accordingly to state the behavior to be undefined.
The others aren't required to change.
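
The saturation point raised above can be illustrated with a small C++
model.  It assumes, as a simplification, that the instruction reads the
length from the most significant byte of a 64-bit register, so the caller
shifts the length left by 56; place_length, place_length_saturating and
length_seen are hypothetical helpers, not GCC or ISA names.

```cpp
#include <cassert>
#include <cstdint>

// Put the length where the (modelled) instruction looks for it.  Any bits
// of LEN above the low 8 are shifted out and lost.
constexpr std::uint64_t
place_length (std::uint64_t len)
{
  return len << 56;
}

// Variant needed if over-long lengths had to mean "load everything":
// clamp to 255 before shifting so the information is not lost.
constexpr std::uint64_t
place_length_saturating (std::uint64_t len)
{
  return (len < 255 ? len : 255) << 56;
}

// The length the (modelled) instruction actually sees.
constexpr unsigned
length_seen (std::uint64_t reg)
{
  return static_cast<unsigned> (reg >> 56);
}

// Usage: in-range lengths survive; 256 silently becomes 0 without the clamp.
inline void
self_check ()
{
  assert (length_seen (place_length (16)) == 16);
  assert (length_seen (place_length (256)) == 0);
  assert (length_seen (place_length_saturating (256)) == 255);
}
```

With undefined behavior for over-long lengths, the plain shift is enough
and the clamp in the second helper is not needed.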

Could you have a further look? Thanks in advance!

v6/v7: Updated optab descriptions.

v5:
  - Updated lenload/lenstore optab to len_load/len_store and the docs.
  - Rename expand_mask_{load,store}_optab_fn to 
expand_partial_{load,store}_optab_fn
  - Added/updated macros for expand_mask_{load,store}_optab_fn
and expand_len_{load,store}_optab_fn

v4: Update len_load_direct/len_store_direct to align with direct optab.

v3: Get rid of length mode hook.

BR,
Kewen
-
gcc/ChangeLog:

2020-MM-DD  Kewen Lin  

* doc/md.texi (len_load_@var{m}): Document.
(len_store_@var{m}): Likewise.
* internal-fn.c (len_load_direct): New macro.
(len_store_direct): Likewise.
(expand_len_load_optab_fn): Likewise.
(expand_len_store_optab_fn): Likewise.
(direct_len_load_optab_supported_p): Likewise.
(direct_len_store_optab_supported_p): Likewise.
(expand_mask_load_optab_fn): New macro.  Original renamed to ...
(expand_partial_load_optab_fn): ... here.  Add handlings for
len_load_optab.
(expand_mask_store_optab_fn): New macro.  Original renamed to ...
(expand_partial_store_optab_fn): ... here. Add handlings for
len_store_optab.
(internal_load_fn_p): Handle IFN_LEN_LOAD.
(internal_store_fn_p): Handle IFN_LEN_STORE.
(internal_fn_stored_value_index): Handle IFN_LEN_STORE.
* internal-fn.def (LEN_LOAD): New internal function.
(LEN_STORE): Likewise.
* optabs.def (len_load_optab, len_store_optab): New optab.
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 2c67c818da5..c8d7bcc7f62 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5167,6 +5167,33 @@ mode @var{n}.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{len_load_@var{m}} instruction pattern
+@item @samp{len_load_@var{m}}
+Load the number of units specified by operand 2 from memory operand 1
+into register operand 0, setting the other bytes of operand 0 to
+undefined values.  Operands 0 and 1 have mode @var{m}.  Operand 2 has
+whichever integer mode the target prefers.  If operand 2 exceeds the
+maximum units of mode @var{m}, the behavior is undefined.  For targets
+which support length measured in bytes, they should only define VnQI
+mode to wrap the other vector modes with the same size.  Meanwhile,
+it's required that the byte count should be a multiple of the element
+size (wrapped vector).
+
+This pattern is not allowed to @code{FAIL}.
+
+@cindex @code{len_store_@var{m}} instruction pattern
+@item @samp{len_store_@var{m}}
+Store the number of units specified by operand 2 from nonmemory operand 1
+into memory operand 0, leaving the other bytes of operand 0 unchanged.
+Operands 0 and 1 have mode @var{m}.  Operand 2 has whichever integer
+mode the target prefers.  If operand 2 exceeds the maximum units of mode
+@var{m}, the behavior is undefined.  For targets which support length
+measured in bytes, they should only define VnQI mode to wrap the other
+vector modes with the same size.  Meanwhile, it's required that the byte
+count should be a multiple of the element size (wrapped vector).
+
+This pattern is not allowed to @code{FAIL}.
+
 @cindex @code{vec_perm@var{m}} instruction pattern
 @item @samp{vec_perm@var{m}}
 Output a (variable) vector permutation.  Operand 0 is the destination
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 4f088de48d5..1e53ced60eb 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -104,10 +104,12 @@ init_internal_fns ()
 #define load_lanes_direct { -1, -1, false }
 #define

[PATCH] testsuite: Ignore line no. for BB vectorization message

2020-06-29 Thread Kewen.Lin via Gcc-patches

Hi,

While testing vectorization with length, I found that the test case
g++.dg/vect/slp-pr56812.cc needs a small fix to ignore the line
number, since the message for basic block vectorization looks
like:
  slp-pr56812.cc:19:1: optimized: basic block part vectorized using 16 byte 
vectors

while for loop vectorization, it looks like:
  slp-pr56812.cc:17:18: optimized: loop vectorized using 16 byte vectors

Is it ok for trunk? 

Thanks!
Kewen
-
gcc/testsuite/ChangeLog:

* g++.dg/vect/slp-pr56812.cc: Ignore line number for basic block
vectorization messages.

-

diff --git a/gcc/testsuite/g++.dg/vect/slp-pr56812.cc 
b/gcc/testsuite/g++.dg/vect/slp-pr56812.cc
index 3e7a495aadd..37c47acd191 100644
--- a/gcc/testsuite/g++.dg/vect/slp-pr56812.cc
+++ b/gcc/testsuite/g++.dg/vect/slp-pr56812.cc
@@ -14,6 +14,6 @@ public:
 void mydata::Set (float x)
 {
   /* We want to vectorize this either as loop or basic-block.  */
-  for (int i=0; i

Re: [PATCH] tree-ssa-threadbackward.c (profitable_jump_thread_path): Do not allow __builtin_constant_p.

2020-06-29 Thread Jakub Jelinek via Gcc-patches

On Mon, Jun 29, 2020 at 08:58:57AM +0200, Richard Biener via Gcc-patches wrote:
> Most of the cases I've seen involve transforms that make _b_c_p constant
> on one path and then introduce a new PHI merging two _b_c_p values
> to be then tested in a not simplified condition.  I'm not sure how to fend
> off jump threading (yeah, it's nearly always jump threading doing this...)
> doing this but certainly the easiest way would be to simply disallow
> [jump threading] from duplicating _b_c_p calls.
> 
> Or fold _b_c_p even earlier (though I definitely saw early backwards
> jump threading mess up such a case).

Yeah, perhaps disallow duplicating bcp calls in the threading before IPA
(for bcp to work well, we want to preserve it until inlining had a chance to
propagate constants in there) and after IPA allow that, but just fold them
into 0 during that on both paths.
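
For readers without the kernel source at hand, the kind of code under
discussion looks roughly like the sketch below.  The helper names are
invented for illustration and the two asm forms are replaced by plain
additions; only the use of __builtin_constant_p to choose between an
"immediate" and a "register" form mirrors the real testcase.

```cpp
// Hypothetical stand-in for the counter updated by the asm statements.
int active_cpus_sketch;

// Stand-in for the form that needs a compile-time immediate operand.
static inline void add_immediate_form(int *ptr, int val) { *ptr += val; }

// Stand-in for the form that takes its operand in a register.
static inline void add_register_form(int *ptr, int val) { *ptr += val; }

// The immediate form is only valid when VAL is a known constant in
// [-128, 127], so it is guarded by __builtin_constant_p.
static inline void atomic_add_sketch(int val, int *ptr)
{
  if (__builtin_constant_p(val) && val >= -128 && val <= 127)
    add_immediate_form(ptr, val);
  else
    add_register_form(ptr, val);
}

// VAL is 1 on one path and -1 on the other.  If the b_c_p call is
// duplicated onto both paths it folds to true on each of them, even
// though VAL is not a constant at the point where the paths merge.
void ledtrig_cpu_sketch(bool is_active)
{
  atomic_add_sketch(is_active ? 1 : -1, &active_cpus_sketch);
}
```

Which branch is taken depends on the optimization level and on the
passes discussed here; the visible result of the sketch is the same
either way.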

Jakub

Re: [PATCH] c++: Check uniqueness of concepts/variable templates [PR94553]

2020-06-29 Thread Jason Merrill via Gcc-patches


On 6/26/20 3:26 PM, Marek Polacek wrote:

This patch wraps up PR94553.  Variable template names have no C
compatibility implications so they should be unique in their
declarative region.  It occurred to me that this applies to concepts
as well.  This is not specified in [basic.scope.declarative]/4.2
but that seems like a bug in the standard.

I couldn't use variable_template_p because that uses PRIMARY_TEMPLATE_P
which uses DECL_PRIMARY_TEMPLATE and that might not have been set up yet
(push_template_decl hasn't yet been called).  PRIMARY_TEMPLATE_P is
important to distinguish between a variable template and a variable in a
function template.  But I think we don't have to worry about that in
duplicate_decls: a template declaration cannot appear at block scope,
and additional checks in duplicate_decls suggest that it won't ever
see a TEMPLATE_DECL for a variable in a function template.  So
checking that the DECL_TEMPLATE_RESULT is a VAR_DECL seems to be fine.
I could have added a default argument to variable_template_p too to
avoid checking PRIMARY_TEMPLATE_P but it didn't seem worth the effort.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


gcc/cp/ChangeLog:

PR c++/94553
* decl.c (duplicate_decls): Make sure a concept or a variable
template is unique in its declarative region.

gcc/testsuite/ChangeLog:

PR c++/94553
* g++.dg/cpp1y/pr68578.C: Adjust dg-error.
* g++.dg/cpp1y/var-templ66.C: New test.
* g++.dg/cpp2a/concepts-redecl1.C: New test.
---
  gcc/cp/decl.c | 12 +++-
  gcc/testsuite/g++.dg/cpp1y/pr68578.C  |  2 +-
  gcc/testsuite/g++.dg/cpp1y/var-templ66.C  |  7 +++
  gcc/testsuite/g++.dg/cpp2a/concepts-redecl1.C |  7 +++
  4 files changed, 26 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/var-templ66.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-redecl1.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 3afad5ca805..45c871af741 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -1679,6 +1679,16 @@ duplicate_decls (tree newdecl, tree olddecl, bool 
newdecl_is_friend)
else if (DECL_TYPE_TEMPLATE_P (olddecl)
   || DECL_TYPE_TEMPLATE_P (newdecl))
/* Class template conflicts.  */;
+  else if ((TREE_CODE (olddecl) == TEMPLATE_DECL
+   && DECL_TEMPLATE_RESULT (olddecl)
+   && TREE_CODE (DECL_TEMPLATE_RESULT (olddecl)) == VAR_DECL)
+  || (TREE_CODE (newdecl) == TEMPLATE_DECL
+  && DECL_TEMPLATE_RESULT (newdecl)
+  && TREE_CODE (DECL_TEMPLATE_RESULT (newdecl)) == VAR_DECL))
+   /* Variable template conflicts.  */;
+  else if (concept_definition_p (olddecl)
+  || concept_definition_p (newdecl))
+   /* Concept conflicts.  */;
else if ((TREE_CODE (newdecl) == FUNCTION_DECL
&& DECL_FUNCTION_TEMPLATE_P (olddecl))
   || (TREE_CODE (olddecl) == FUNCTION_DECL
@@ -1701,7 +1711,7 @@ duplicate_decls (tree newdecl, tree olddecl, bool 
newdecl_is_friend)
  " literal operator template", newdecl);
  else
return NULL_TREE;
-   
+
  inform (olddecl_loc, "previous declaration %q#D", olddecl);
  return error_mark_node;
}
diff --git a/gcc/testsuite/g++.dg/cpp1y/pr68578.C 
b/gcc/testsuite/g++.dg/cpp1y/pr68578.C
index 18edd83cd7f..9b3898176f1 100644
--- a/gcc/testsuite/g++.dg/cpp1y/pr68578.C
+++ b/gcc/testsuite/g++.dg/cpp1y/pr68578.C
@@ -1,4 +1,4 @@
  // { dg-do compile { target c++14 } }
  
-template  struct bar foo; template <> struct foo<>:  // { dg-error "class template" }

+template  struct bar foo; template <> struct foo<>:  // { dg-error "class 
template|redeclared" }
  // { dg-error "-:expected" "" { target *-*-* } .+1 }
diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ66.C 
b/gcc/testsuite/g++.dg/cpp1y/var-templ66.C
new file mode 100644
index 000..65cd3d9d31b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/var-templ66.C
@@ -0,0 +1,7 @@
+// PR c++/94553
+// { dg-do compile { target c++14 } }
+
+struct C { };
+template int C; // { dg-error "different kind of entity" }
+template int D;
+struct D { }; // { dg-error "different kind of entity" }
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-redecl1.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-redecl1.C
new file mode 100644
index 000..33cd778a318
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-redecl1.C
@@ -0,0 +1,7 @@
+// PR c++/94553
+// { dg-do compile { target c++20 } }
+
+struct E { };
+template concept E = false; // { dg-error "different kind of entity" 
}
+template concept F = false;
+struct F { }; // { dg-error "different kind of entity" }

base-commit: b3d77404c060c0d65d8d4c97254995737d0fc032

Re: [RFC] rs6000: Add builtins for fegetround, feclearexcept and feraiseexcept [PR94193]

2020-06-29 Thread Richard Biener via Gcc-patches

On Fri, Jun 26, 2020 at 10:12 PM Raoni Fassina Firmino via Gcc-patches
 wrote:
>
> Hi all,
>
>
> This is an early draft I'm working on to add fegetround, feclearexcept
> and feraiseexcept as builtins on rs6000.  This is my first patch so I
> welcome any and all feedback.  Foremost I have some questions to ask as
> I got stuck on some problems.
>
>
> Q1) How to implement a target specific builtin for a C standard
> function?
>
> More specifically, how to make gcc use a rs6000 builtin for a
> standard C function? Right now, I am getting a double define of the
> builtin.  I don't know if define is the right word for it, maybe
> register an implementation?
>
> The context is that I am creating builtin optimizations for fegetround,
> feclearexcept and feraiseexcept.  Early on I discovered that there is
> a file that defines builtins for the whole C library but does not
> actually implement them (gcc/builtins.def), and trying to redefine them
> in gcc/config/rs6000/rs6000-builtin.def ends up with a name clash.  So I
> implemented the builtins with a suffix in their names and pushed this
> problem for later...  And this later time is now.
>
> I tried my best to find something about it in the GCC internals
> documentation but I may have missed it.
>
> So this is my question: how do I link the builtin defined in
> gcc/builtins.def to use my implementation on rs6000? If someone has a
> pointer about it or a patch that does it for some other C function (in
> any target architecture) that would be great.
>
>
> Q2) How to fallback to the default behavior of the function call when
> the builtin is not suitable for the parameters?
>
> Here, it is more specifically for feclearexcept and feraiseexcept.  The
> builtin should only be used when the input parameter is a constant
> with a single-bit mask (to work on only one exception).  Right now, I
> do the check correctly and it works (I validate the builtins using a
> name suffix to avoid the problem mentioned in Q1), but it aborts when
> the input is not valid instead of falling back to a function call.
>
>
> Q3) Are the implementations for the builtins more or less on the
> right places?
>
> The first one I did was fegetround and I based it on ppc_get_timebase
> and other related builtins, so I used a define_expand on rs6000.md, but
> when I was working on the fe*except I was basing it on other builtins
> and ended up implementing it all on rs6000-call.c, but I am not sure if
> there is a canonical way of doing it one way or another.

GCC already knows fe* builtins, what GCC does not yet have is
a way for targets to specify custom expansion of them.  So instead
of adding powerpc specific builtins you should add optabs for the
RTL expansion part.

I'm not sure if the actual choice of macro values for the fe* builtins
need glueing logic or if we want them to be determined statically
by the target configuration - see how we handle folding of
fpclassify.  At least without -frounding-math fegetround could be
constant folded to FE_TONEAREST for which we'd need the
actual value of FE_TONEAREST.
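
As a small illustration of the two properties mentioned in this thread,
here is a sketch in standard C++ using only <cfenv>.  The helper names
are invented; nothing here is GCC-internal.  It shows the "single
exception bit" condition from Q2 and the fact that a program which never
changes the rounding mode observes FE_TONEAREST.

```cpp
#include <cfenv>

// Hypothetical helper: true when MASK has exactly one bit set, which is
// the condition under which the proposed feclearexcept/feraiseexcept
// expansion would apply (assuming each FE_* exception macro is a single
// bit, as it is on common targets).
bool is_single_exception_mask(int mask)
{
  return mask != 0 && (mask & (mask - 1)) == 0;
}

// In a program that never calls fesetround, the rounding mode is the
// default one, which is why fegetround could be folded to FE_TONEAREST
// when -frounding-math is not in effect.
bool default_rounding_is_to_nearest()
{
  return std::fegetround() == FE_TONEAREST;
}
```

Any input that fails the single-bit check would simply take the normal
library call.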

Richard.

>
> o/
> Raoni Fassina Firmino
>
>  8< 
>
> These optimizations were originally in glibc, but were removed, and it
> was suggested that they would be a good fit as gcc builtins[1].
>
> The associated bugreport: PR target/94193
>
> [1] https://sourceware.org/legacy-ml/libc-alpha/2020-03/msg00047.html
> https://sourceware.org/legacy-ml/libc-alpha/2020-03/msg00080.html
>
> Signed-off-by: Raoni Fassina Firmino 
> ---
>  gcc/config/rs6000/rs6000-builtin.def | 13 ++
>  gcc/config/rs6000/rs6000-call.c  | 69 
>  gcc/config/rs6000/rs6000.md  | 18 
>  3 files changed, 100 insertions(+)
>
> diff --git a/gcc/config/rs6000/rs6000-builtin.def 
> b/gcc/config/rs6000/rs6000-builtin.def
> index 54f750c8384..d5ca15141b1 100644
> --- a/gcc/config/rs6000/rs6000-builtin.def
> +++ b/gcc/config/rs6000/rs6000-builtin.def
> @@ -2567,12 +2567,25 @@ BU_SPECIAL_X (RS6000_BUILTIN_GET_TB, 
> "__builtin_ppc_get_timebase",
>  BU_SPECIAL_X (RS6000_BUILTIN_MFTB, "__builtin_ppc_mftb",
>   RS6000_BTM_ALWAYS, RS6000_BTC_MISC)
>
> +BU_SPECIAL_X (RS6000_BUILTIN_FEGETROUND, "__builtin_fegetround",
> + RS6000_BTM_ALWAYS, RS6000_BTC_MISC)
> +
>  BU_SPECIAL_X (RS6000_BUILTIN_MFFS, "__builtin_mffs",
>   RS6000_BTM_ALWAYS, RS6000_BTC_MISC)
>
>  BU_SPECIAL_X (RS6000_BUILTIN_MFFSL, "__builtin_mffsl",
>   RS6000_BTM_ALWAYS, RS6000_BTC_MISC)
>
> +RS6000_BUILTIN_X (RS6000_BUILTIN_FECLEAREXCEPT, "__builtin_feclearexcept",
> + RS6000_BTM_ALWAYS,
> + RS6000_BTC_MISC | RS6000_BTC_UNARY,
> + CODE_FOR_nothing)
> +
> +RS6000_BUILTIN_X (RS6000_BUILTIN_FERAISEEXCEPT, "__builtin_feraiseexcept",
> + RS6000_BTM_ALWAYS,
> + RS6000_BTC_MISC | RS6000_BTC_UNARY,
> + CODE_FOR_nothing)
> +
>  RS6000_BUILTIN_X (RS6000_BUILTIN_MTFSF,

Re: [PATCH] tree-ssa-threadbackward.c (profitable_jump_thread_path): Do not allow __builtin_constant_p.

2020-06-29 Thread Richard Biener via Gcc-patches

On Sat, Jun 27, 2020 at 4:52 PM Marc Glisse  wrote:
>
> On Fri, 26 Jun 2020, Jeff Law via Gcc-patches wrote:
>
> > In theory yes, but there are cases where paths converge (like you've shown) 
> > where
> > you may have evaluated to a constant on the paths, but it's not a constant 
> > at the
> > convergence point.  You have to be very careful using b_c_p like this and 
> > it's
> > been a regular source of kernel bugs.
> >
> >
> > I'd recommend looking at the .ssa dump and walk forward from there if the 
> > .ssa
> > dump looks correct.
>
> Here is the last dump before thread1 (105t.mergephi2). I don't see
> anything incorrect in it.
>
> ledtrig_cpu (_Bool is_active)
> {
>int old;
>int iftmp.0_1;
>int _5;
>
> [local count: 1073741824]:
>if (is_active_2(D) != 0)
>  goto ; [50.00%]
>else
>  goto ; [50.00%]
>
> [local count: 536870913]:
>
> [local count: 1073741824]:
># iftmp.0_1 = PHI <1(2), -1(3)>
>_5 = __builtin_constant_p (iftmp.0_1);
>if (_5 != 0)
>  goto ; [50.00%]
>else
>  goto ; [50.00%]
>
> [local count: 536870913]:
>if (iftmp.0_1 >= -128)
>  goto ; [50.00%]
>else
>  goto ; [50.00%]
>
> [local count: 268435456]:
>if (iftmp.0_1 <= 127)
>  goto ; [34.00%]
>else
>  goto ; [66.00%]
>
> [local count: 91268056]:
>__asm__ __volatile__("asi %0,%1
> " : "ptr" "=Q" MEM[(int *)_active_cpus] : "val" "i" iftmp.0_1, "Q" 
> MEM[(int *)_active_cpus] : "memory", "cc");
>goto ; [100.00%]
>
> [local count: 982473769]:
>__asm__ __volatile__("laa %0,%2,%1
> " : "old" "=d" old_8, "ptr" "=Q" MEM[(int *)_active_cpus] : "val" "d" 
> iftmp.0_1, "Q" MEM[(int *)_active_cpus] : "memory", "cc");
>
> [local count: 1073741824]:
>return;
>
> }
>
> There is a single _b_c_p, the immediate asm argument is exactly the
> argument of _b_c_p, and it is in the branch protected by _b_c_p.
>
> Now the thread1 dump, for comparison
>
> ledtrig_cpu (_Bool is_active)
> {
>int old;
>int iftmp.0_4;
>int iftmp.0_6;
>int _7;
>int _12;
>int iftmp.0_13;
>int iftmp.0_14;
>
> [local count: 1073741824]:
>if (is_active_2(D) != 0)
>  goto ; [50.00%]
>else
>  goto ; [50.00%]
>
> [local count: 536870912]:
># iftmp.0_6 = PHI <1(2)>
>_7 = __builtin_constant_p (iftmp.0_6);
>if (_7 != 0)
>  goto ; [50.00%]
>else
>  goto ; [50.00%]
>
> [local count: 536870912]:
># iftmp.0_4 = PHI <-1(2)>
>_12 = __builtin_constant_p (iftmp.0_4);
>if (_12 != 0)
>  goto ; [50.00%]
>else
>  goto ; [50.00%]
>
> [local count: 268435456]:
>if (iftmp.0_4 >= -128)
>  goto ; [20.00%]
>else
>  goto ; [80.00%]
>
> [local count: 214748364]:
>if (iftmp.0_6 <= 127)
>  goto ; [12.00%]
>else
>  goto ; [88.00%]
>
> [local count: 91268056]:
># iftmp.0_13 = PHI 
>__asm__ __volatile__("asi %0,%1
> " : "ptr" "=Q" MEM[(int *)_active_cpus] : "val" "i" iftmp.0_13, "Q" 
> MEM[(int *)_active_cpus] : "memory", "cc");
>goto ; [100.00%]
>
> [local count: 982473769]:
># iftmp.0_14 = PHI 
>__asm__ __volatile__("laa %0,%2,%1
> " : "old" "=d" old_8, "ptr" "=Q" MEM[(int *)_active_cpus] : "val" "d" 
> iftmp.0_14, "Q" MEM[(int *)_active_cpus] : "memory", "cc");
>
> [local count: 1073741824]:
>return;
>
> }
>
> Thread1 decides to separate the paths is_active and !is_active
> (surprisingly, for one it optimizes out the comparison <= 127 and for the
> other the comparison >= -128, while it could optimize both in both cases).
> And it decides to converge after the comparisons, but before the asm.
>
> What the pass did does seem to hurt. It looks like if we duplicate _b_c_p,
> we may need to duplicate far enough to include all the blocks dominated by
> _b_c_p==true (including the asm, here). Otherwise, any _b_c_p can be
> optimized to true, because for a boolean
>
> b is the same as b ? true : false
> __builtin_constant_p(b ? true : false) would be the same as b ?
> __builtin_constant_p(true) : __builtin_constant_p(false), i.e. true.
>
> It is too bad we don't have any optimization pass using ranges between IPA
> and thread1, that would have gotten rid of the comparisons, and hence the
> temptation to thread. Adding always_inline on atomic_add (or flatten on
> the caller) does help: EVRP removes the comparisons.
>
> Do you see a way forward without changing what thread1 does or declaring
> the testcase as unsupported?

Most of the cases I've seen involve transforms that make _b_c_p constant
on one path and then introduce a new PHI merging two _b_c_p values
to be then tested in a not simplified condition.  I'm not sure how to fend
off jump threading (yeah, it's nearly always jump threading doing this...)
doing this but certainly the easiest way would be to simply disallow
[jump threading] from duplicating _b_c_p calls.

Or fold _b_c_p even earlier (though I definitely saw early backwards
jump threading mess up such

Re: [PATCH v3] c++: Fix CTAD for aggregates in template [PR95568]

2020-06-29 Thread Jason Merrill via Gcc-patches


On 6/26/20 5:38 PM, Marek Polacek wrote:

On Fri, Jun 26, 2020 at 04:19:08PM -0400, Jason Merrill wrote:

Please also test scoped enum.


Here:

-- >8 --
95568 complains that CTAD for aggregates doesn't work within
requires-clause and it turned out that it doesn't work when we try
the deduction in a template.  The reason is that maybe_aggr_guide
creates a guide that can look like this

   template X(decltype (X::x))-> X

where the parameter is a decltype, which is a non-deduced context.  So
the subsequent build_new_function_call fails because unify_one_argument
can't deduce anything from it ([temp.deduct.type]: "If a template
parameter is used only in non-deduced contexts and is not explicitly
specified, template argument deduction fails.")
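
The deduction rule quoted above can be seen outside of CTAD with ordinary
function templates.  The sketch below is only an analogy: the function
names and the detection traits are made up for illustration, and plain
function templates stand in for the deduction guides.  It needs nothing
newer than C++11.

```cpp
#include <type_traits>
#include <utility>

template<typename T> struct X { T x; };

// Shaped like the problematic guide: T appears only inside decltype,
// which is a non-deduced context.
template<typename T> X<T> guide_via_decltype(decltype(X<T>::x));

// Shaped like the guide after the fix: T appears directly.
template<typename T> X<T> guide_via_type(T);

template<typename...> struct voider { using type = void; };

// Detect whether a call with an argument of type Arg can deduce T.
template<typename Arg, typename = void>
struct deduces_via_decltype : std::false_type {};
template<typename Arg>
struct deduces_via_decltype<Arg,
  typename voider<decltype(guide_via_decltype(std::declval<Arg>()))>::type>
  : std::true_type {};

template<typename Arg, typename = void>
struct deduces_via_type : std::false_type {};
template<typename Arg>
struct deduces_via_type<Arg,
  typename voider<decltype(guide_via_type(std::declval<Arg>()))>::type>
  : std::true_type {};

// Deduction fails through decltype but succeeds through the plain type.
static_assert(!deduces_via_decltype<int>::value,
              "T is only used in a non-deduced context");
static_assert(deduces_via_type<int>::value, "T is deducible");

// With T given explicitly, the decltype form is perfectly usable.
static_assert(std::is_same<decltype(guide_via_decltype<int>(0)),
                           X<int>>::value,
              "explicit template argument works");
```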

Those decltypes come from finish_decltype_type.  We can just use
TREE_TYPE instead.  I pondered using unlowered_expr_type, but that
didn't make any difference for the FIELD_DECLs I saw in
class-deduction-aggr6.C.


OK, thanks.


gcc/cp/ChangeLog:

PR c++/95568
* pt.c (collect_ctor_idx_types): Use TREE_TYPE.

gcc/testsuite/ChangeLog:

PR c++/95568
* g++.dg/cpp2a/class-deduction-aggr5.C: New test.
* g++.dg/cpp2a/class-deduction-aggr6.C: New test.
---
  gcc/cp/pt.c   |  2 +-
  .../g++.dg/cpp2a/class-deduction-aggr5.C  | 20 +++
  .../g++.dg/cpp2a/class-deduction-aggr6.C  | 35 +++
  3 files changed, 56 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr5.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr6.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 53a64c3a15e..618bf68b2d6 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -28329,7 +28329,7 @@ collect_ctor_idx_types (tree ctor, tree list, tree elt 
= NULL_TREE)
tree idx, val; unsigned i;
FOR_EACH_CONSTRUCTOR_ELT (v, i, idx, val)
  {
-  tree ftype = elt ? elt : finish_decltype_type (idx, true, tf_none);
+  tree ftype = elt ? elt : TREE_TYPE (idx);
if (BRACE_ENCLOSED_INITIALIZER_P (val)
  && CONSTRUCTOR_NELTS (val)
  /* As in reshape_init_r, a non-aggregate or array-of-dependent-bound
diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr5.C 
b/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr5.C
new file mode 100644
index 000..01253f42006
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr5.C
@@ -0,0 +1,20 @@
+// PR c++/95568
+// { dg-do compile { target c++20 } }
+
+template struct X { T x; };
+template struct X2 { T x; U y; };
+template concept Y = requires { X{0}; };
+
+template
+void g()
+{
+  X{0};
+  X2{1, 2.2};
+  Y auto y = X{1};
+}
+
+void
+fn ()
+{
+  g();
+}
diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr6.C 
b/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr6.C
new file mode 100644
index 000..95d7c5eec18
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr6.C
@@ -0,0 +1,35 @@
+// PR c++/95568
+// { dg-do compile { target c++20 } }
+// CTAD with aggregates containing bit-fields.
+
+template struct same_type;
+template struct same_type {};
+
+enum E { e };
+enum class F { f };
+
+template
+struct X {
+  T a : 5;
+};
+
+template
+void g()
+{
+  auto x = X{ 0 };
+  same_type();
+  auto x2 = X{ E::e };
+  same_type();
+  auto x3 = X{ false };
+  same_type();
+  auto x4 = X{ 0u };
+  same_type();
+  auto x5 = X{ F::f };
+  same_type();
+}
+
+void
+fn ()
+{
+  g();
+}

base-commit: 0801f419440c14f6772b28f763ad7d40f7f7a580

Re: [PATCH][GCC][Arm] PR target/95646: Do not clobber callee saved registers with CMSE

2020-06-29 Thread Christophe Lyon via Gcc-patches

On Mon, 29 Jun 2020 at 10:56, Andre Vieira (lists)
 wrote:
>
>
> On 23/06/2020 21:52, Christophe Lyon wrote:
> > On Tue, 23 Jun 2020 at 15:28, Andre Vieira (lists)
> >  wrote:
> >> On 23/06/2020 13:10, Kyrylo Tkachov wrote:
>  -Original Message-
>  From: Andre Vieira (lists) 
>  Sent: 22 June 2020 09:52
>  To: gcc-patches@gcc.gnu.org
>  Cc: Kyrylo Tkachov 
>  Subject: [PATCH][GCC][Arm] PR target/95646: Do not clobber callee saved
>  registers with CMSE
> 
>  Hi,
> 
>  As reported in bugzilla when the -mcmse option is used while compiling
>  for size (-Os) with a thumb-1 target the generated code will clear the
>  registers r7-r10. These however are callee saved and should be preserved
>  across ABI boundaries. The reason this happens is because these
>  registers are made "fixed" when optimising for size with Thumb-1 in a
>  way to make sure they are not used, as pushing and popping hi-registers
>  requires extra moves to and from LO_REGS.
> 
>  To fix this, this patch uses 'callee_saved_reg_p', which accounts for
>  this optimisation, instead of 'call_used_or_fixed_reg_p'. Be aware of
>  'callee_saved_reg_p''s definition, as it does still take call used
>  registers into account, which aren't callee saved in my opinion, so the
>  name is rather a misnomer. It works to our advantage here though, as it
>  does exactly what we need.
> 
>  Regression tested on arm-none-eabi.
> 
>  Is this OK for trunk? (Will eventually backport to previous versions if
>  stable.)
> >>> Ok.
> >>> Thanks,
> >>> Kyrill
> >> As I was getting ready to push this I noticed I didn't add any skip-ifs
> >> to prevent this failing with specific target options. So here's a new
> >> version with those.
> >>
> >> Still OK?
> >>
> > Hi,
> >
> > This is not sufficient to skip arm-linux-gnueabi* configs built with
> > non-default cpu/fpu.
> >
> > For instance, with arm-linux-gnueabihf --with-cpu=cortex-a9
> > --with-fpu=neon-fp16 --with-float=hard
> > I see:
> > FAIL: gcc.target/arm/pr95646.c (test for excess errors)
> > Excess errors:
> > cc1: error: ARMv8-M Security Extensions incompatible with selected FPU
> > cc1: error: target CPU does not support ARM mode
> >
> > and the testcase is compiled with -mcpu=cortex-m23 -mcmse -Os
> Resending as I don't think my earlier one made it to the lists (sorry if
> you are receiving this double!)
>
> I'm not following this, before I go off and try to reproduce it, what do
> you mean by 'the testcase is compiled with -mcpu=cortex-m23 -mcmse -Os'?
> These are the options you are seeing in the log file? Surely they should
> override the default options? Only thing I can think of is this might
> need an extra -mfloat-abi=soft to make sure it overrides the default
> float-abi.  Could you give that a try?

No, it doesn't make a difference on its own.

I also had to add:
-mfpu=auto (that clears the above warning)
-mthumb otherwise we now get cc1: error: target CPU does not support ARM mode

Looks like some effective-target machinery is needed

Christophe


>
> Cheers,
> Andre
> >
> > Christophe
> >
> >> Cheers,
> >> Andre
>  Cheers,
>  Andre
> 
>  gcc/ChangeLog:
>  2020-06-22  Andre Vieira  
> 
> PR target/95646
> * config/arm/arm.c: 
>  (cmse_nonsecure_entry_clear_before_return):
>  Use 'callee_saved_reg_p' instead of
> 'call_used_or_fixed_reg_p'.
> 
>  gcc/testsuite/ChangeLog:
>  2020-06-22  Andre Vieira  
> 
> PR target/95646
> * gcc.target/arm/pr95646.c: New test.

[PATCH 1/7 v6] ifn/optabs: Support vector load/store with length

2020-06-29 Thread Kewen.Lin via Gcc-patches

Hi Richard S./Richi/Jim/Segher,

Thanks a lot for your comments to make this patch more solid.

Based on our discussion, for the vector load/store with length
optab, the length unit would be measured in lanes by default.
For the targets which support length measured in bytes like Power,
they should only define VnQI modes to wrap the other same size
vector modes.  If the length is larger than total lane/byte count
of the given mode, it's taken to load all lanes/bytes implicitly.
The remaining lanes/bytes which aren't specified by length
are taken as undefined values.  For length in bytes,
it's required that the byte count should be a multiple of the
element size (wrapped vector), otherwise it's undefined.

This patch has been updated as attached.

2/7 for rs6000 optab definition has been updated to use V16QI.
5/7 for vectorizer change has been updated accordingly.

-

v6: Updated optab descriptions.

v5:
  - Updated lenload/lenstore optab to len_load/len_store and the docs.
  - Rename expand_mask_{load,store}_optab_fn to 
expand_partial_{load,store}_optab_fn
  - Added/updated macros for expand_mask_{load,store}_optab_fn
and expand_len_{load,store}_optab_fn

v4: Update len_load_direct/len_store_direct to align with direct optab.

v3: Get rid of length mode hook.

BR,
Kewen
-
gcc/ChangeLog:

2020-MM-DD  Kewen Lin  

* doc/md.texi (len_load_@var{m}): Document.
(len_store_@var{m}): Likewise.
* internal-fn.c (len_load_direct): New macro.
(len_store_direct): Likewise.
(expand_len_load_optab_fn): Likewise.
(expand_len_store_optab_fn): Likewise.
(direct_len_load_optab_supported_p): Likewise.
(direct_len_store_optab_supported_p): Likewise.
(expand_mask_load_optab_fn): New macro.  Original renamed to ...
(expand_partial_load_optab_fn): ... here.  Add handlings for
len_load_optab.
(expand_mask_store_optab_fn): New macro.  Original renamed to ...
(expand_partial_store_optab_fn): ... here. Add handlings for
len_store_optab.
(internal_load_fn_p): Handle IFN_LEN_LOAD.
(internal_store_fn_p): Handle IFN_LEN_STORE.
(internal_fn_stored_value_index): Handle IFN_LEN_STORE.
* internal-fn.def (LEN_LOAD): New internal function.
(LEN_STORE): Likewise.
* optabs.def (len_load_optab, len_store_optab): New optab.
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 2c67c818da5..690c384ff66 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5167,6 +5167,34 @@ mode @var{n}.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{len_load_@var{m}} instruction pattern
+@item @samp{len_load_@var{m}}
+Load the number of units specified by operand 2 from memory operand 1
+into register operand 0, setting the other bytes of operand 0 to
+undefined values.  Operands 0 and 1 have mode @var{m}.  Operand 2 has
+whichever integer mode the target prefers.  If operand 2 exceeds the
+maximum units of mode @var{m}, it will be set to the maximum units of
+mode @var{m}.  For targets which support length measured in bytes,
+they should only define VnQI mode to wrap the other vector modes with
+the same size.  Meanwhile, it's required that the byte count should
+be a multiple of the element size (wrapped vector).
+
+This pattern is not allowed to @code{FAIL}.
+
+@cindex @code{len_store_@var{m}} instruction pattern
+@item @samp{len_store_@var{m}}
+Store the number of units specified by operand 2 from nonmemory operand 1
+into memory operand 0, leaving the other bytes of operand 0 unchanged.
+Operands 0 and 1 have mode @var{m}.  Operand 2 has whichever integer
+mode the target prefers.  If operand 2 exceeds the maximum units of mode
+@var{m}, it will be set to the maximum units of mode @var{m}.  For
+targets which support length measured in bytes, they should only define
+VnQI mode to wrap the other vector modes with the same size.  Meanwhile,
+it's required that the byte count should be a multiple of the element
+size (wrapped vector).
+
+This pattern is not allowed to @code{FAIL}.
+
 @cindex @code{vec_perm@var{m}} instruction pattern
 @item @samp{vec_perm@var{m}}
 Output a (variable) vector permutation.  Operand 0 is the destination
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 4f088de48d5..1e53ced60eb 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -104,10 +104,12 @@ init_internal_fns ()
 #define load_lanes_direct { -1, -1, false }
 #define mask_load_lanes_direct { -1, -1, false }
 #define gather_load_direct { 3, 1, false }
+#define len_load_direct { -1, -1, false }
 #define mask_store_direct { 3, 2, false }
 #define store_lanes_direct { 0, 0, false }
 #define mask_store_lanes_direct { 0, 0, false }
 #define scatter_store_direct { 3, 1, false }
+#define len_store_direct { 3, 3, false }
 #define unary_direct { 0, 0, true }
 #define binary_direct { 0, 0, true }
 #define ternary_direct { 0, 0, true }
@@ -2478,10 +2480,10 @@ expand_call_mem_ref

Re: [PATCH 1/7 v6] ifn/optabs: Support vector load/store with length

2020-06-29 Thread Richard Sandiford

Thanks for the update.  I agree with the summary of the IRC discussion
except for…

"Kewen.Lin"  writes:
> Hi Richard S./Richi/Jim/Segher,
>
> Thanks a lot for your comments to make this patch more solid.
>
> Based on our discussion, for the vector load/store with length
> optab, the length unit would be measured in lanes by default.
> For the targets which support length measured in bytes like Power,
> they should only define VnQI modes to wrap the other same size
> vector modes.  If the length is larger than total lane/byte count
> of the given mode, it's taken to load all lanes/bytes implicitly.

…this last bit.  IMO the behaviour of the optab should be undefined
when the supplied length is greater than the number of lanes.

I think that also makes things better for the lxvl implementation,
which ignores the upper 56 bits of the length.  It sounds like the
above semantics would instead require Power to saturate the value
at 255 before shifting it.
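
To make the lxvl point concrete, here is a tiny sketch.  It assumes that
lxvl takes its byte count from the most significant 8 bits of the length
register, so an implementation shifts the length left by 56 first; the
function names are made up for illustration.

```cpp
#include <cstdint>

// Plain shift: any length bits above the low 8 are shifted out.
std::uint64_t lxvl_operand_plain(std::uint64_t len)
{
  return len << 56;
}

// What "clamp to the maximum" semantics would need: saturate at 255
// before shifting, so that large lengths do not wrap around.
std::uint64_t lxvl_operand_saturating(std::uint64_t len)
{
  return (len > 255 ? 255 : len) << 56;
}

// The byte count as the instruction would read it (top 8 bits).
std::uint64_t lxvl_byte_count(std::uint64_t operand)
{
  return operand >> 56;
}
```

With the plain shift a length of 256 reads back as 0, which is harmless
only if lengths above the lane count are undefined.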

Richard

> The remaining lanes/bytes which aren't specified by length
> are taken as undefined values.  For length in bytes,
> it's required that the byte count should be a multiple of the
> element size (wrapped vector), otherwise it's undefined.
>
> This patch has been updated as attached.
>
> 2/7 for rs6000 optab definition has been updated to use V16QI.
> 5/7 for vectorizer change has been updated accordingly.
>
> -
>
> v6: Updated optab descriptions.
>
> v5:
>   - Updated lenload/lenstore optab to len_load/len_store and the docs.
>   - Rename expand_mask_{load,store}_optab_fn to 
> expand_partial_{load,store}_optab_fn
>   - Added/updated macros for expand_mask_{load,store}_optab_fn
> and expand_len_{load,store}_optab_fn
>
> v4: Update len_load_direct/len_store_direct to align with direct optab.
>
> v3: Get rid of length mode hook.
>
> BR,
> Kewen
> -
> gcc/ChangeLog:
>
> 2020-MM-DD  Kewen Lin  
>
>   * doc/md.texi (len_load_@var{m}): Document.
>   (len_store_@var{m}): Likewise.
>   * internal-fn.c (len_load_direct): New macro.
>   (len_store_direct): Likewise.
>   (expand_len_load_optab_fn): Likewise.
>   (expand_len_store_optab_fn): Likewise.
>   (direct_len_load_optab_supported_p): Likewise.
>   (direct_len_store_optab_supported_p): Likewise.
>   (expand_mask_load_optab_fn): New macro.  Original renamed to ...
>   (expand_partial_load_optab_fn): ... here.  Add handlings for
>   len_load_optab.
>   (expand_mask_store_optab_fn): New macro.  Original renamed to ...
>   (expand_partial_store_optab_fn): ... here. Add handlings for
>   len_store_optab.
>   (internal_load_fn_p): Handle IFN_LEN_LOAD.
>   (internal_store_fn_p): Handle IFN_LEN_STORE.
>   (internal_fn_stored_value_index): Handle IFN_LEN_STORE.
>   * internal-fn.def (LEN_LOAD): New internal function.
>   (LEN_STORE): Likewise.
>   * optabs.def (len_load_optab, len_store_optab): New optab.

[PATCH 5/7 v6] vect: Support vector load/store with length in vectorizer

2020-06-29 Thread Kewen.Lin via Gcc-patches

Hi,

v6 changes against v5:
  - As len_load/store optab changes, added function can_vec_len_load_store_p
and vect_get_same_size_vec_for_len.
  - Updated several places like vectorizable_load/store for optab changes.

v5 changes against v4:
  - Updated the conditions of clearing LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P
in vectorizable_condition (which fixed the aarch reg failure).
  - Rebased and updated some macro and function names as the
renaming/refactoring patch.
  - Updated some comments and dumpings.

v4 changes against v3:
  - split out some renaming and refactoring.
  - use QImode for length.
  - update the iv type determination.
  - introduce factor into rgroup_controls.
  - use using_partial_vectors_p for both approaches.

Bootstrapped/regtested on aarch64-linux-gnu and powerpc64le-linux-gnu P9.
Even with explicit vect-with-length-scope settings 1/2, I didn't find
any remarkable failures (only some trivial test case issues).

Is it ok for trunk?

BR,
Kewen

gcc/ChangeLog

* doc/invoke.texi (vect-with-length-scope): Document new option.
* optabs-query.c (can_vec_len_load_store_p): New function.
* optabs-query.h (can_vec_len_load_store_p): New declare.
* params.opt (vect-with-length-scope): New.
* tree-vect-loop-manip.c (vect_set_loop_controls_directly): Add the
handlings for vectorization using length-based partial vectors, call
vect_gen_len for length generation.
(vect_set_loop_condition_partial_vectors): Add the handlings for
vectorization using length-based partial vectors.
(vect_do_peeling): Allow remaining eiters less than epilogue vf for
LOOP_VINFO_USING_PARTIAL_VECTORS_P.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Init
epil_using_partial_vectors_p.
(_loop_vec_info::~_loop_vec_info): Call release_vec_loop_controls
for lengths destruction.
(vect_verify_loop_lens): New function.
(vect_analyze_loop_2): Add the check to allow only one vectorization
approach using partial vectorization at the same time.  Check
loop-wide reasons using length-based partial vectors decision.  Mark
LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P if the epilogue is
considerable to use length-based approach.  Call
release_vec_loop_controls for lengths destruction.
(vect_analyze_loop): Add handlings for epilogue of loop when it's
marked to use vectorization using partial vectors.
(vect_estimate_min_profitable_iters): Adjust for loop vectorization
using length-based partial vectors.
(vect_record_loop_mask): Init factor to 1 for vectorization using
mask-based partial vectors.
(vect_record_loop_len): New function.
(vect_get_loop_len): New function.
* tree-vect-stmts.c (check_load_store_for_partial_vectors): Add
checks for vectorization using length-based partial vectors.
(vect_get_same_size_vec_for_len): New function.
(vectorizable_store): Add handlings when using length-based partial
vectors.
(vectorizable_load): Likewise.
(vectorizable_condition): Add some checks to disable vectorization
using partial vectors for reduction.
(vect_gen_len): New function.
* tree-vectorizer.h (struct rgroup_controls): Add field factor
mainly for length-based partial vectors.
(vec_loop_lens): New typedef.
(_loop_vec_info): Add lens and epil_using_partial_vectors_p.
(LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P): New macro.
(LOOP_VINFO_LENS): Likewise.
(LOOP_VINFO_FULLY_WITH_LENGTH_P): Likewise.
(vect_record_loop_len): New declare.
(vect_get_loop_len): Likewise.
(vect_gen_len): Likewise.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 06a04e3d7dd..284c15705ea 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13389,6 +13389,13 @@ by the copy loop headers pass.
 @item vect-epilogues-nomask
 Enable loop epilogue vectorization using smaller vector size.
 
+@item vect-with-length-scope
+Control the scope of vector memory access with length exploitation.  0 means we
+don't exploit any vector memory access with length, 1 means we only exploit
+vector memory access with length for those loops whose iteration number are
+less than VF, such as very small loop or epilogue, 2 means we want to exploit
+vector memory access with length for any loops if possible.
+
 @item slp-max-insns-in-bb
 Maximum number of instructions in basic block to be
 considered for SLP vectorization.
diff --git a/gcc/optabs-query.c b/gcc/optabs-query.c
index 215d68e4225..9c351759204 100644
--- a/gcc/optabs-query.c
+++ b/gcc/optabs-query.c
@@ -606,6 +606,60 @@ can_vec_mask_load_store_p (machine_mode mode,
   return false;
 }
 
+/* Return true if target supports vector load/store with length for vector
+   mode MODE.  There are two flavors for vector load/store

Re: [RFC patch] Clean all (sub)?module files

2020-06-29 Thread dhumieres . dominique


On 2020-06-27 13:34, Thomas Koenig wrote:

Hi Dominique,

While investigating pr95538, I see several module files that were not cleaned.

Several were cleaned by a patch I had in my working directory.
However new ones were not cleaned (e.g., gfortran.dg/pr95091.f90) due to
continuation lines.

This is now fixed with the attached patch (patch-mod).


Thanks for working on this.

My problem is that my dejagnu-fu is almost nonexistent, so I could,
in theory, review and commit this, but I do not really understand what
you did.  So, maybe somebody more knowledgeable about this could
comment on


Hi Thomas,

Thanks for having a look. The patch is for two Tcl procedures, and thus has
very little to do with dejagnu.

It is basically related to Tcl and regular expressions.

If you want to refresh your Tcl you may want to look, e.g.,

http://www.tcl-lang.org/man/tcl8.6/TclCmd/contents.htm

The first part of the patch for the proc 'list-module-names-1'
extends the existing procedure to handle additional-sources
and aux-modules. In order to avoid too long lines, I have
split the initial pattern 'pat' in four pieces.

The second part of the patch is for the proc 'f90grep' (borrowed from
dejagnu and renamed from igrep, as in grep -i) in order to handle
(sub)?module with free-form continuation lines.
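
The idea of matching (sub)?module statements across free-form continuation lines can be sketched as follows. The patch itself is written in Tcl; this C++ version is only an illustration of the approach, and the helper names `logical_lines` and `module_names` are hypothetical. It assumes a trailing '&' marks a continued line and ignores comments and fixed-form sources.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: merge free-form continuation lines (trailing '&')
// into single logical lines.
static std::vector<std::string> logical_lines(const std::string &src)
{
  std::vector<std::string> out;
  std::istringstream in(src);
  std::string line, pending;
  bool continuing = false;
  while (std::getline(in, line))
    {
      // Drop trailing blanks.
      std::size_t end = line.find_last_not_of(" \t\r");
      line = (end == std::string::npos) ? "" : line.substr(0, end + 1);
      // A continued line may itself start with '&'.
      std::size_t start = line.find_first_not_of(" \t");
      if (continuing && start != std::string::npos && line[start] == '&')
	line = line.substr(start + 1);
      continuing = !line.empty() && line.back() == '&';
      if (continuing)
	line.pop_back();
      pending += line;
      if (!continuing)
	{
	  out.push_back(pending);
	  pending.clear();
	}
    }
  if (!pending.empty())
    out.push_back(pending);
  return out;
}

// Hypothetical helper: names declared by "module NAME" or
// "submodule (PARENT) NAME", matched case-insensitively.
static std::vector<std::string> module_names(const std::string &src)
{
  static const std::regex pat(
    R"(\s*(?:module\s+|submodule\s*\([^)]*\)\s*)([A-Za-z_][A-Za-z0-9_]*)\s*)",
    std::regex::icase);
  std::vector<std::string> names;
  for (const std::string &l : logical_lines(src))
    {
      std::smatch m;
      if (std::regex_match(l, m, pat))
	names.push_back(m[1].str());
    }
  return names;
}
```

Because each logical line must match the pattern as a whole, statements such as "end module foo" or "module procedure p" are not reported as declarations.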

I am fully aware that some user may find a way to break the proposed
logic; however, it does not introduce new Tcl errors, nor new failures in
the test suite, and it works as expected for the present test suite.

I can take responsibility for the patch if there is nobody to review it,
and I'll do my best if a new test introduces a regression.

Cheers,

Dominique

PS. Handling continuations in fixed-form is certainly doable, but more
complicated since you need to keep track of the previous line, and it is
not needed at the moment.




https://gcc.gnu.org/pipermail/fortran/2020-June/054533.html ?

Best regards

Thomas

[COMMITTED] sparc: Remove register storage class in sparc.c

2020-06-29 Thread Rainer Orth

The switch to C++17 broke SPARC bootstrap:

/vol/gcc/src/hg/master/local/gcc/config/sparc/sparc.c:8887:34: error: ISO C++17 
does not allow 'register' storage class specifier [-Werror=register]
 8887 | epilogue_renumber (register rtx *where, int test)
  |  ^
/vol/gcc/src/hg/master/local/gcc/config/sparc/sparc.c: In function 'int 
epilogue_renumber(rtx_def**, int)':
/vol/gcc/src/hg/master/local/gcc/config/sparc/sparc.c:8889:24: error: ISO C++17 
does not allow 'register' storage class specifier [-Werror=register]
 8889 |   register const char *fmt;
  |^~~
/vol/gcc/src/hg/master/local/gcc/config/sparc/sparc.c:8890:16: error: ISO C++17 
does not allow 'register' storage class specifier [-Werror=register]
 8890 |   register int i;
  |^
/vol/gcc/src/hg/master/local/gcc/config/sparc/sparc.c:8891:26: error: ISO C++17 
does not allow 'register' storage class specifier [-Werror=register]
 8891 |   register enum rtx_code code;
  |  ^~~~
/vol/gcc/src/hg/master/local/gcc/config/sparc/sparc.c:8948:17: error: ISO C++17 
does not allow 'register' storage class specifier [-Werror=register]
 8948 |register int j;
  | ^
/vol/gcc/src/hg/master/local/gcc/config/sparc/sparc.c: In function 'void 
sparc_print_operand_address(std::FILE*, machine_mode, rtx)':
/vol/gcc/src/hg/master/local/gcc/config/sparc/sparc.c:9671:16: error: ISO C++17 
does not allow 'register' storage class specifier [-Werror=register]
 9671 |   register rtx base, index = 0;
  |^~~~
/vol/gcc/src/hg/master/local/gcc/config/sparc/sparc.c:9671:22: error: ISO C++17 
does not allow 'register' storage class specifier [-Werror=register]
 9671 |   register rtx base, index = 0;
  |  ^
/vol/gcc/src/hg/master/local/gcc/config/sparc/sparc.c:9673:16: error: ISO C++17 
does not allow 'register' storage class specifier [-Werror=register]
 9673 |   register rtx addr = x;
  |^~~~
/vol/gcc/src/hg/master/local/gcc/config/sparc/sparc.c: At global scope:
/vol/gcc/src/hg/master/local/gcc/config/sparc/sparc.c:9807:32: error: ISO C++17 
does not allow 'register' storage class specifier [-Werror=register]
 9807 | sparc_type_code (register tree type)
  |^~~~
/vol/gcc/src/hg/master/local/gcc/config/sparc/sparc.c: In function 'long 
unsigned int sparc_type_code(tree)':
/vol/gcc/src/hg/master/local/gcc/config/sparc/sparc.c:9809:26: error: ISO C++17 
does not allow 'register' storage class specifier [-Werror=register]
 9809 |   register unsigned long qualifiers = 0;
  |  ^~
/vol/gcc/src/hg/master/local/gcc/config/sparc/sparc.c:9810:21: error: ISO C++17 
does not allow 'register' storage class specifier [-Werror=register]
 9810 |   register unsigned shift;
  | ^
/vol/gcc/src/hg/master/local/gcc/config/sparc/sparc.c: In function 'int 
set_extends(rtx_insn*)':
/vol/gcc/src/hg/master/local/gcc/config/sparc/sparc.c:10306:16: error: ISO 
C++17 does not allow 'register' storage class specifier [-Werror=register]
10306 |   register rtx pat = PATTERN (insn);
  |^~~

Fixed by removing the register keyword.  Bootstrapped on
sparc-sun-solaris2.11.  Installed as obvious.
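
For readers unfamiliar with the change: the 'register' storage class specifier was deprecated in C++11 and removed in C++17, so dropping it is purely mechanical. A minimal sketch with a made-up function (`sum_to` is not from sparc.c):

```cpp
#include <cassert>

// Pre-C++17 spelling, kept as a comment because it is rejected in C++17:
//   static int sum_to (register int n) { register int i, s = 0; ... }

// Same function with the specifier dropped; behaviour is unchanged, since
// 'register' was only ever a hint to the compiler.
static int sum_to(int n)
{
  int s = 0;
  for (int i = 1; i <= n; i++)
    s += i;
  return s;
}
```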

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2020-06-28  Rainer Orth  

* config/sparc/sparc.c (epilogue_renumber): Remove register.
(sparc_print_operand_address): Likewise.
(sparc_type_code): Likewise.
(set_extends): Likewise.

# HG changeset patch
# Parent  30ae6f30133e6cba64c50970a215f2578bc18897
sparc: Remove register storage class in sparc.c

diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -8884,11 +8884,11 @@ output_v9branch (rtx op, rtx dest, int r
  */
 
 static int
-epilogue_renumber (register rtx *where, int test)
-{
-  register const char *fmt;
-  register int i;
-  register enum rtx_code code;
+epilogue_renumber (rtx *where, int test)
+{
+  const char *fmt;
+  int i;
+  enum rtx_code code;
 
   if (*where == 0)
 return 0;
@@ -8945,7 +8945,7 @@ epilogue_renumber (register rtx *where, 
 {
   if (fmt[i] == 'E')
 	{
-	  register int j;
+	  int j;
 	  for (j = XVECLEN (*where, i) - 1; j >= 0; j--)
 	if (epilogue_renumber (&(XVECEXP (*where, i, j)), test))
 	  return 1;
@@ -9668,9 +9668,9 @@ sparc_print_operand (FILE *file, rtx x, 
 static void
 sparc_print_operand_address (FILE *file, machine_mode /*mode*/, rtx x)
 {
-  register rtx base, index = 0;
+  rtx base, index = 0;
   int offset = 0;
-  register rtx addr = x;
+  rtx addr = x;
 
   if (REG_P (addr))
 fputs (reg_names[REGNO (addr)], file);
@@ -9804,10 +9804,10 @@ sparc_assemble_integer (rtx x,

[PATCH 2/7 v5] rs6000: lenload/lenstore optab support

2020-06-29 Thread Kewen.Lin via Gcc-patches

Hi,

V5: Like V4.

V4: Update define_expand names as optab name changes.

V3: Update the define_expand as optab changes.

BR,
Kewen
--
gcc/ChangeLog:

2020-MM-DD  Kewen Lin  

* config/rs6000/vsx.md (len_load_v16qi): New define_expand.
(len_store_v16qi): Likewise.

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 2a28215ac5b..fe85f60c681 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5082,6 +5082,34 @@ (define_expand "stxvl"
   operands[3] = gen_reg_rtx (DImode);
 })
 
+;; Define optab for vector access with length vectorization exploitation.
+(define_expand "len_load_v16qi"
+  [(match_operand:V16QI 0 "vlogical_operand")
+   (match_operand:V16QI 1 "memory_operand")
+   (match_operand:QI 2 "gpc_reg_operand")]
+  "TARGET_P9_VECTOR && TARGET_64BIT"
+{
+  rtx mem = XEXP (operands[1], 0);
+  mem = force_reg (DImode, mem);
+  rtx len = gen_lowpart (DImode, operands[2]);
+  emit_insn (gen_lxvl (operands[0], mem, len));
+  DONE;
+})
+
+(define_expand "len_store_v16qi"
+  [(match_operand:V16QI 0 "memory_operand")
+   (match_operand:V16QI 1 "vlogical_operand")
+   (match_operand:QI 2 "gpc_reg_operand")
+  ]
+  "TARGET_P9_VECTOR && TARGET_64BIT"
+{
+  rtx mem = XEXP (operands[0], 0);
+  mem = force_reg (DImode, mem);
+  rtx len = gen_lowpart (DImode, operands[2]);
+  emit_insn (gen_stxvl (operands[1], mem, len));
+  DONE;
+})
+
 (define_insn "*stxvl"
   [(set (mem:V16QI (match_operand:DI 1 "gpc_reg_operand" "b"))
(unspec:V16QI

Re: [PATCH PR95700] Use nullptr instead of NULL as a sentinel value

2020-06-29 Thread Richard Biener via Gcc-patches

On Sat, Jun 27, 2020 at 2:06 PM Richard Sandiford
 wrote:
>
> Ilya Leoshkevich via Gcc-patches  writes:
> > Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat-linux
> > and s390x-redhat-linux.
>
> Agree we should do this FWIW, but as a belt-and-braces fix, would it
> make sense to define NULL to nullptr in system.h for all hosts?
>
> Currently we have:
>
> /* Define a generic NULL if one hasn't already been defined.  */
> #ifndef NULL
> #define NULL 0
> #endif
>
> which we might be able to change to:
>
> #undef NULL
> #define NULL nullptr
>
> The current position is probably too early though.  I think it should
> instead be after all system headers have been included, so that there's
> no chance of a multiple definition error, and no risk that our definition
> confuses the system headers.

Good idea, IMHO this should be done before this big patch (or we can
even go without it to reduce backporting issues).
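
To illustrate why the spelling of the sentinel matters, here is a minimal sketch of a variadic function terminated by a null pointer. The helper `count_strings` is hypothetical and not part of GCC; the point is only that nullptr is always passed through "..." as a pointer, whereas a NULL defined as plain 0 would be passed as an int.

```cpp
#include <cassert>
#include <cstdarg>

// Hypothetical helper: count the strings passed before the terminating
// null pointer.
static int count_strings(const char *first, ...)
{
  va_list ap;
  va_start(ap, first);
  int n = 0;
  for (const char *p = first; p != nullptr; p = va_arg(ap, const char *))
    n++;
  va_end(ap);
  return n;
}

// nullptr has the size of a pointer, so the callee reads back exactly what
// the caller passed.
static_assert(sizeof(nullptr) == sizeof(void *), "nullptr is pointer-sized");
```

A call such as `count_strings ("a", "b", nullptr)` is then well defined on every host, regardless of how the system headers define NULL.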

> This of course relies on files sticking to the “don't include system
> headers directly“ rule, and using INCLUDE_* macros instead.  (Which
> isn't pretty, but that's where we are…)

Yeah.

Richard.

> Thanks,
> Richard

Re: [PATCH][GCC][Arm] PR target/95646: Do not clobber callee saved registers with CMSE

2020-06-29 Thread Andre Vieira (lists)




On 23/06/2020 21:52, Christophe Lyon wrote:

On Tue, 23 Jun 2020 at 15:28, Andre Vieira (lists)
 wrote:

On 23/06/2020 13:10, Kyrylo Tkachov wrote:

-Original Message-
From: Andre Vieira (lists) 
Sent: 22 June 2020 09:52
To: gcc-patches@gcc.gnu.org
Cc: Kyrylo Tkachov 
Subject: [PATCH][GCC][Arm] PR target/95646: Do not clobber callee saved
registers with CMSE

Hi,

As reported in bugzilla when the -mcmse option is used while compiling
for size (-Os) with a thumb-1 target the generated code will clear the
registers r7-r10. These however are callee saved and should be preserved
across ABI boundaries. The reason this happens is because these
registers are made "fixed" when optimising for size with Thumb-1 in a
way to make sure they are not used, as pushing and popping hi-registers
requires extra moves to and from LO_REGS.

To fix this, this patch uses 'callee_saved_reg_p', which accounts for
this optimisation, instead of 'call_used_or_fixed_reg_p'. Be aware of
'callee_saved_reg_p''s definition, as it does still take call used
registers into account, which aren't callee_saved in my opinion, so it
is rather a misnomer; it works to our advantage here, though, as it does
exactly what we need.

Regression tested on arm-none-eabi.

Is this OK for trunk? (Will eventually backport to previous versions if
stable.)

Ok.
Thanks,
Kyrill

As I was getting ready to push this I noticed I didn't add any skip-ifs
to prevent this failing with specific target options. So here's a new
version with those.

Still OK?


Hi,

This is not sufficient to skip arm-linux-gnueabi* configs built with
non-default cpu/fpu.

For instance, with arm-linux-gnueabihf --with-cpu=cortex-a9
--with-fpu=neon-fp16 --with-float=hard
I see:
FAIL: gcc.target/arm/pr95646.c (test for excess errors)
Excess errors:
cc1: error: ARMv8-M Security Extensions incompatible with selected FPU
cc1: error: target CPU does not support ARM mode

and the testcase is compiled with -mcpu=cortex-m23 -mcmse -Os

I'm not following this. Before I go off and try to reproduce it, what do
you mean by 'the testcase is compiled with -mcpu=cortex-m23 -mcmse -Os'?
Are these the options you are seeing in the log file? Surely they should
override the default options? The only thing I can think of is that this
might need an extra -mfloat-abi=soft to make sure it overrides the default
float-abi.  Could you give that a try?


Cheers,
Andre


Christophe


Cheers,
Andre

Cheers,
Andre

gcc/ChangeLog:
2020-06-22  Andre Vieira  

   PR target/95646
   * config/arm/arm.c (cmse_nonsecure_entry_clear_before_return):
   Use 'callee_saved_reg_p' instead of
   'call_used_or_fixed_reg_p'.

gcc/testsuite/ChangeLog:
2020-06-22  Andre Vieira  

   PR target/95646
   * gcc.target/arm/pr95646.c: New test.

Re: PSA: Default C++ dialect is now C++17

2020-06-29 Thread Martin Liška


On 6/26/20 9:34 PM, Marek Polacek via Gcc-patches wrote:

As discussed last month:

it's time to change the C++ default to gnu++17.  I've committed the patch after
testing x86_64-pc-linux-gnu and powerpc64le-unknown-linux-gnu.  Brace yourselves!

Marek



Just a small note that 510.parest_r SPEC 2017 benchmark can't be built now
with default changed to -std=c++17. The spec config needs to be adjusted.

Martin

RE: [PATCH PR95854] ICE in find_bswap_or_nop_1 of pass store-merging

2020-06-29 Thread Richard Biener

On Sun, 28 Jun 2020, zhoukaipeng (A) wrote:

> Hi,
> 
> Thanks for your good suggestions!
> 
> This patch was remade and attached.  Does the v2 patch look better?
> 
> Bootstrap and new testcase tested on aarch64 & x86 Linux platform.

OK.

Thanks,
Richard.

RE: [PATCH PR95854] ICE in find_bswap_or_nop_1 of pass store-merging

2020-06-29 Thread zhoukaipeng (A)

> On Sun, 28 Jun 2020, zhoukaipeng (A) wrote:
> 
> > Hi,
> >
> > Thanks for your good suggestions!
> >
> > This patch was remade and attached.  Does the v2 patch look better?
> >
> > Bootstrap and new testcase tested on aarch64 & x86 Linux platform.
> 
> OK.
> 
> Thanks,
> Richard.

Thanks for reviewing this.  Could you please help install it?

Kaipeng Zhou

Re: [PATCH] Add TARGET_LOWER_LOCAL_DECL_ALIGNMENT [PR95237]

2020-06-29 Thread Richard Biener via Gcc-patches

On Fri, Jun 26, 2020 at 10:11 PM H.J. Lu  wrote:
>
> On Thu, Jun 25, 2020 at 1:10 AM Richard Biener
>  wrote:
> >
> > On Thu, Jun 25, 2020 at 2:53 AM Sunil Pandey  wrote:
> > >
> > > On Wed, Jun 24, 2020 at 12:30 AM Richard Biener
> > >  wrote:
> > > >
> > > > On Tue, Jun 23, 2020 at 5:31 PM Sunil K Pandey via Gcc-patches
> > > >  wrote:
> > > > >
> > > > > From: Sunil K Pandey 
> > > > >
> > > > > Default for this hook is NOP. For x86, in 32 bit mode, this hook
> > > > > sets alignment of long long on stack to 32 bits if preferred stack
> > > > > boundary is 32 bits.
> > > > >
> > > > >  - This patch fixes
> > > > > gcc.target/i386/pr69454-2.c
> > > > > gcc.target/i386/stackalign/longlong-1.c
> > > > >  - Regression test on x86-64, no new fail introduced.
> > > >
> > > > I think the name is badly chosen, TARGET_LOWER_LOCAL_DECL_ALIGNMENT
> > >
> > > Yes, I can change the target hook name.
> > >
> > > > would be better suited (and then asks for LOCAL_DECL_ALIGNMENT to be
> > > > renamed to INCREASE_LOCAL_DECL_ALIGNMENT).
> > >
> > > It seems like LOCAL_DECL_ALIGNMENT macro documentation is incorrect.
> > > It increases as well as decreases alignment based on condition(-m32
> > > -mpreferred-stack-boundary=2)
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95885
> > >
> > > >
> > > > You're calling it from do_type_align which IMHO is dangerous since 
> > > > that's
> > > > invoked from FIELD_DECL layout as well.  Instead invoke it from
> > > > layout_decl itself where we do
> > > >
> > > >   if (code != FIELD_DECL)
> > > > /* For non-fields, update the alignment from the type.  */
> > > > do_type_align (type, decl);
> > > >
> > > > and invoke the hook _after_ do_type_align.  Also avoid
> > > > invoking the hook on globals or hard regs and only
> > > > invoke it on VAR_DECLs, thus only
> > > >
> > > >   if (VAR_P (decl) && !is_global_var (decl) && !DECL_HARD_REGISTER (decl))
> > >
> > > It seems the decl's properties are not fully populated at this point: a
> > > call to is_global_var (decl) on a global variable returns false.
> > >
> > > $ cat foo.c
> > > long long x;
> > > int main()
> > > {
> > > if (__alignof__(x) != 8)
> > >   __builtin_abort();
> > > return 0;
> > > }
> > >
> > > Breakpoint 1, layout_decl (decl=0x77ffbb40, known_align=0)
> > > at /local/skpandey/gccwork/gccwork/gcc/gcc/stor-layout.c:674
> > > 674 do_type_align (type, decl);
> > > Missing separate debuginfos, use: dnf debuginfo-install
> > > gmp-6.1.2-10.fc31.x86_64 isl-0.16.1-9.fc31.x86_64
> > > libmpc-1.1.0-4.fc31.x86_64 mpfr-3.1.6-5.fc31.x86_64
> > > zlib-1.2.11-20.fc31.x86_64
> > > (gdb) call debug_tree(decl)
> > >   > > type  > > size 
> > > unit-size 
> > > align:64 warn_if_not_align:0 symtab:0 alias-set -1
> > > canonical-type 0x7fffea801888 precision:64 min  > > 0x7fffea7e8fd8 -9223372036854775808> max  > > 9223372036854775807>
> > > pointer_to_this >
> > > DI foo.c:1:11 size  unit-size
> > > 
> > > align:1 warn_if_not_align:0>
> > >
> > > (gdb) p is_global_var(decl)
> > > $1 = false
> > > (gdb)
> > >
> > >
> > > What about calling hook here
> > >
> > >  603 do_type_align (tree type, tree decl)
> > >  604 {
> > >  605   if (TYPE_ALIGN (type) > DECL_ALIGN (decl))
> > >  606 {
> > >  607   SET_DECL_ALIGN (decl, TYPE_ALIGN (type));
> > >  608   if (TREE_CODE (decl) == FIELD_DECL)
> > >  609 DECL_USER_ALIGN (decl) = TYPE_USER_ALIGN (type);
> > >  610   else
> > >  611 /* Lower local decl alignment */
> > >  612 if (VAR_P (decl)
> > >  613 && !is_global_var (decl)
> > >  614 && !DECL_HARD_REGISTER (decl)
> > >  615 && cfun != NULL)
> > >  616   targetm.lower_local_decl_alignment (decl);
> > >  617 }
> >
> > But that doesn't change anything (obviously).  layout_decl
> > is called quite early, too early it looks like.
> >
> > Now there doesn't seem to be any other good place where
> > we are sure to catch the decl before we evaluate things
> > like __alignof__
> >
> > void __attribute__((noipa))
> > foo (__SIZE_TYPE__ align, long long *p)
> > {
> >   if ((__SIZE_TYPE__)p & (align-1))
> > __builtin_abort ();
> > }
> > int main()
> > {
> >   long long y;
> >   foo (_Alignof y, &y);
> >   return 0;
> > }
> >
> > Joseph/Jason - do you have a good recommendation
> > how to deal with targets where natural alignment
> > is supposed to be lowered for optimization purposes?
> > (this case is for i?86 to avoid dynamic stack re-alignment
> > to align long long to 8 bytes with -mpreferred-stack-boundary=2)
> >
> > I note that for -mincoming-stack-boundary=2 we do perform
> > dynamic stack re-alignment already.
> >
> > I can't find a suitable existing target macro/hook for this,
> > but my gut feeling is that the default alignment should
> > instead be the lower one and instead the alignment for
> > globals should be raised as optimization?
> >
>
> Here is the updated patch from

Re: [PATCH] underline null argument in -Wnonnull (PR c++/86568)

2020-06-29 Thread Andreas Schwab

This breaks bootstrap:

In static member function 'static Expression* Type::type_descriptor(Gogo*, Type*)',
inlined from 'virtual Expression* Named_type::do_type_descriptor(Gogo*, Named_type*)' at ../../gcc/go/gofrontend/types.cc:4:53,
inlined from 'virtual Expression* Named_type::do_type_descriptor(Gogo*, Named_type*)' at ../../gcc/go/gofrontend/types.cc:11105:1:
../../gcc/go/gofrontend/types.cc:1474:34: error: 'this' pointer null [-Werror=nonnull]
 1474 |   return type->do_type_descriptor(gogo, NULL);
  |  ^~~~
cc1plus: all warnings being treated as errors
make[3]: *** [../../gcc/go/Make-lang.in:242: go/types.o] Error 1

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

Re: [PATCH][GCC][Arm] PR target/95646: Do not clobber callee saved registers with CMSE

2020-06-29 Thread Andre Vieira (lists)




On 23/06/2020 21:52, Christophe Lyon wrote:

On Tue, 23 Jun 2020 at 15:28, Andre Vieira (lists)
 wrote:

On 23/06/2020 13:10, Kyrylo Tkachov wrote:

-Original Message-
From: Andre Vieira (lists) 
Sent: 22 June 2020 09:52
To: gcc-patches@gcc.gnu.org
Cc: Kyrylo Tkachov 
Subject: [PATCH][GCC][Arm] PR target/95646: Do not clobber callee saved
registers with CMSE

Hi,

As reported in bugzilla when the -mcmse option is used while compiling
for size (-Os) with a thumb-1 target the generated code will clear the
registers r7-r10. These however are callee saved and should be preserved
across ABI boundaries. The reason this happens is because these
registers are made "fixed" when optimising for size with Thumb-1 in a
way to make sure they are not used, as pushing and popping hi-registers
requires extra moves to and from LO_REGS.

To fix this, this patch uses 'callee_saved_reg_p', which accounts for
this optimisation, instead of 'call_used_or_fixed_reg_p'. Be aware of
'callee_saved_reg_p''s definition, as it does still take call used
registers into account, which aren't callee_saved in my opinion, so it
is rather a misnomer; it works to our advantage here, though, as it does
exactly what we need.

Regression tested on arm-none-eabi.

Is this OK for trunk? (Will eventually backport to previous versions if
stable.)

Ok.
Thanks,
Kyrill

As I was getting ready to push this I noticed I didn't add any skip-ifs
to prevent this failing with specific target options. So here's a new
version with those.

Still OK?


Hi,

This is not sufficient to skip arm-linux-gnueabi* configs built with
non-default cpu/fpu.

For instance, with arm-linux-gnueabihf --with-cpu=cortex-a9
--with-fpu=neon-fp16 --with-float=hard
I see:
FAIL: gcc.target/arm/pr95646.c (test for excess errors)
Excess errors:
cc1: error: ARMv8-M Security Extensions incompatible with selected FPU
cc1: error: target CPU does not support ARM mode

and the testcase is compiled with -mcpu=cortex-m23 -mcmse -Os

Resending as I don't think my earlier one made it to the lists (sorry if
you are receiving this twice!)

I'm not following this. Before I go off and try to reproduce it, what do
you mean by 'the testcase is compiled with -mcpu=cortex-m23 -mcmse -Os'?
Are these the options you are seeing in the log file? Surely they should
override the default options? The only thing I can think of is that this
might need an extra -mfloat-abi=soft to make sure it overrides the default
float-abi.  Could you give that a try?


Cheers,
Andre


Christophe


Cheers,
Andre

Cheers,
Andre

gcc/ChangeLog:
2020-06-22  Andre Vieira  

   PR target/95646
   * config/arm/arm.c (cmse_nonsecure_entry_clear_before_return):
   Use 'callee_saved_reg_p' instead of
   'call_used_or_fixed_reg_p'.

gcc/testsuite/ChangeLog:
2020-06-22  Andre Vieira  

   PR target/95646
   * gcc.target/arm/pr95646.c: New test.

Re: [PATCH PR95855]A missing ifcvt optimization to generate fcsel

2020-06-29 Thread Richard Biener via Gcc-patches

On Sun, Jun 28, 2020 at 2:32 PM yangyang (ET)  wrote:
>
> Hi,
>
> This is a simple fix for pr95855.
>
> With this fix, pass_split_paths can recognize the if-conversion
> opportunity of the testcase and doesn't duplicate the corresponding block.
>
> Added one testcase for this. Bootstrap and tested on both aarch64 and x86
> Linux platform, no new regression witnessed.
>
> Ok for trunk?

Can you try using the num_stmts_in_pred[12] counts instead of using
empty_block_p?

Your matching doesn't allow for FP constants like

 dmax[0] = d1[i] < 1.0 ? 1.0 : d1[i];

since FP constants are not shared.  You likely want to use
operand_equal_p to do the PHI argument comparison.
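
For reference, the source-level shape of the case mentioned above can be reduced to the sketch below. The function names are made up for illustration, and whether a particular compiler version turns the loop body into a conditional select is not something this sketch can show; it only fixes the semantics of the pattern.

```cpp
#include <cassert>

// Hypothetical reduction of the pattern: one arm of the conditional is a
// floating-point constant that also appears in the comparison.
constexpr double clamp_below(double x)
{
  return x < 1.0 ? 1.0 : x;
}

// Apply the clamp element-wise, mirroring the d1[i] expression above.
static void clamp_all(const double *d1, double *dmax, int n)
{
  for (int i = 0; i < n; i++)
    dmax[i] = clamp_below(d1[i]);
}
```

The two occurrences of 1.0 are distinct constant nodes inside the compiler, which is why a value comparison rather than a pointer comparison is suggested for the PHI arguments.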

Thanks,
Richard.

> Thanks,
> Yang Yang
>
>
> +2020-06-28  Yang Yang  
> +
> +   PR tree-optimization/95855
> +   * gimple-ssa-split-paths.c (is_feasible_trace): Add extra
> +   checks to recognize a missed if-conversion opportunity when
> +   judging whether to duplicate a block.
> +
>
> +2020-06-28 Yang Yang  
> +
> +   PR tree-optimization/95855
> +   * gcc.dg/tree-ssa/split-paths-12.c: New testcase.
> +

Re: [PATCH] arc: add exceptions for PR92860.

2020-06-29 Thread Martin Liška


On 6/27/20 12:59 AM, Jeff Law wrote:

On Wed, 2020-06-24 at 09:43 +0200, Martin Liška wrote:

Hey.

The patch is about addition of some exceptions for arc target that
address:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92860#c26

It's again another example where optimization options influence target
options.

Ready for master?
Martin

gcc/ChangeLog:

PR tree-optimization/92860
* optc-save-gen.awk: Add exceptions for arc target.

It doesn't look like you're explicitly handling the OPT_mmillicode case.  Was
that intentional?  Regardless, this patch is OK as-is or with the millicode
stuff added.


Yes, it's a target mask and these are not checked:

#define MASK_MILLICODE_THUNK_SET (1U << 12)

I'm going to install the patch as is.
Martin



jeff
