date:20210819

[Fortran] OG11 backports

2021-08-19 Thread Sandra Loosemore

I've backported several patches having to do with Fortran/C 
interoperability from mainline to the OG11 branch.  See attached log for 
details.


-Sandra
commit d554155c07771935778f557e9ef649cc3624d1ce
Author: Sandra Loosemore 
Date:   Wed Aug 11 19:24:17 2021 -0700

Fortran: Fix c_float128 and c_float128_complex definitions.

gfc_float128_type_node is only non-NULL on targets that support a
128-bit type that is not long double.  Use float128_type_node instead
when computing the value of the kind constants c_float128 and
c_float128_complex from the ISO_C_BINDING intrinsic module; this also
ensures it actually corresponds to __float128 (the IEEE encoding) and
not some other 128-bit floating-point type.

2021-08-11  Sandra Loosemore  

gcc/fortran/
	* iso-c-binding.def (c_float128, c_float128_complex): Check
	float128_type_node instead of gfc_float128_type_node.
	* trans-types.c (gfc_init_kinds, gfc_build_real_type):
	Update comments re supported 128-bit floating-point types.

(cherry picked from commit 58340a7cd3670024bafdbbc6ca63a9af841df98a)

commit 0a8af79817f5c633543d2aa32f4f0385af4cf22c
Author: Tobias Burnus 
Date:   Wed Aug 11 19:18:42 2021 -0700

gfortran: Fix in-build-tree testing [PR101305, PR101660]

ISO_Fortran_binding.h is written in the build dir - hence, a previous commit
added it as include directory for in-build-tree testing.  However,
it turned out that -I$specdir/libgfortran interferes with reading .mod files
as they are then no longer regarded as intrinsic modules.  Solution: Create
an extra include/ directory in the libgfortran build dir and copy
ISO_Fortran_binding.h to that directory.  As -B$specdir/libgfortran already
causes gfortran to read that include subdirectory, the -I flag is no longer
needed.

	PR libfortran/101305
	PR fortran/101660
	PR testsuite/101847

libgfortran/ChangeLog:

	* Makefile.am (ISO_Fortran_binding.h): Create include/ in the build dir
	and copy the include file to it.
	(clean-local): Add for removing the 'include' directory.
	* Makefile.in: Regenerate.

gcc/testsuite/ChangeLog:

	* lib/gfortran.exp (gfortran_init): Remove -I$specpath/libgfortran
	from the string used to set GFORTRAN_UNDER_TEST.

(cherry picked from commit 2ba0376ac40447ce7ee09fcef00511c18db25dfa)

commit d2b1fbc8a159a465ce6114723301721808972a6e
Author: Tobias Burnus 
Date:   Mon Aug 9 12:35:23 2021 +0200

testsuite/lib/gfortran.exp: Add -I for ISO*.h [PR101305, PR101660]

This patch adds -I$specdir/libgfortran to GFORTRAN_UNDER_TEST, when
set by proc gfortran_init. As the $specdir depends on the multilib
setting, it has to be re-set for a different multilib; hence, we track
whether a previous call to gfortran_init set that var or whether it
was set differently.

gcc/testsuite/
	PR libfortran/101305
	PR fortran/101660

	* lib/gfortran.exp (gfortran_init): Add -I $specdir/libgfortran to
	GFORTRAN_UNDER_TEST; update it when set by previous gfortran_init call.
	* gfortran.dg/ISO_Fortran_binding_1.c: Use <...> not "..." for
	ISO_Fortran_binding.h's #include.
	* gfortran.dg/ISO_Fortran_binding_10.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_11.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_12.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_15.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_16.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_17.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_18.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_3.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_5.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_6.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_7.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_8.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_9.c: Likewise.
	* gfortran.dg/PR94327.c: Likewise.
	* gfortran.dg/PR94331.c: Likewise.
	* gfortran.dg/bind_c_array_params_3_aux.c: Likewise.
	* gfortran.dg/iso_fortran_binding_uint8_array_driver.c: Likewise.
	* gfortran.dg/pr93524.c: Likewise.

(cherry picked from commit 527a1cf32c27a3fbeaf6be7596241570d864cc4c)

commit 5084c7d199d149cb58a3c41aae4ed9e97ef9ad31
Author: Sandra Loosemore 
Date:   Wed Aug 11 18:57:34 2021 -0700

Bind(c): Improve error checking in CFI_* functions

This patch adds additional run-time checking for invalid arguments to
CFI_establish and CFI_setpointer.  It also changes existing messages
throughout the CFI_* functions to use PRIiPTR to format CFI_index_t
values instead of casting them to int and using %d (which may not work
on targets where int is a smaller type), simplifies wording of some
messages, and fixes issues with capitalization, typos, and the like.
Additionally some coding standards problems such as >80 character lines

Re: [PATCH] more warning code refactoring

2021-08-19 Thread Kewen.Lin via Gcc-patches

Hi Martin,

on 2021/8/20 12:30 AM, Martin Sebor wrote:
> On 8/19/21 9:03 AM, Martin Sebor wrote:
>> On 8/18/21 11:56 PM, Kewen.Lin wrote:
>>> Hi David,
>>>
>>> on 2021/8/19 11:26 AM, David Edelsohn via Gcc-patches wrote:
>>>> Hi, Martin
>>>>
>>>> A few PowerPC-specific testcases started failing yesterday on AIX with
>>>> a strange failure mode: the compiler runs out of memory.  As you may
>>>> expect from telling you this in an email reply to your patch, I have
>>>> bisected the failure and landed on your commit.  I can alternate
>>>> between the previous commit and your commit, and the failure
>>>> definitely appears with your patch, although I'm unsure how your patch
>>>> affected memory allocation in the compiler.  Maybe moving the code
>>>> changed a type of allocation or some memory no longer is being freed?
>>>>
>>>
>>>
>>> To get rid of GTY variable alloc_object_size_limit looks suspicious,
>>> maybe tree objects returned by alloc_max_size after the change are out
>>> of GC's tracking?
>>>
>>> If the suspicion holds, the attached explorative diff may help.
>>
>> I wouldn't expect that to make a difference.  There are thousands
>> of similar calls to build_int_cst() throughout the middle end.
>>
>> Looking at the original patch, the change that I'm not sure about
>> and that shouldn't have been part of the refactoring is the call
>> to enable_ranger() in pass_waccess::execute().  It's something
>> I was planning to do next.  But even that I wouldn't expect to
>> eat up a whole 1GB of memory.
> 
> I have reproduced the excessive memory consumption with
> the rlwimi-0.c test and a powerpc-linux cross-compiler, and
> confirmed that it is indeed caused by the call to enable_ranger().
> The test defines some six thousand functions so it seems that
> unless each call to enable_ranger() is paired with some call to
> release the memory it allocates, the memory leaks.
> 
> The removal of the alloc_object_size_limit global variable doesn't
> have any effect on the test case.  The function that used it (and
> now calls build_int_cst () instead) isn't called when the test
> is compiled.  (It's only called for calls to allocation functions
> in the source and the test case has none.)
> 

Thanks for the clarification and sorry for noisy suspicion!

BR,
Kewen

> Let me take care of releasing the ranger memory.
> 
> Martin
> 
> 
>>
>>>
>>> BR,
>>> Kewen
>>>
>>>> Previously, compiler bootstrap and all testcases ran with a data size
>>>> of 1GB.  After your change, the data size required for those
>>>> particular testcases jumped to 2GB.
>>>>
>>>> The testcases are
>>>>
>>>> gcc/testsuite/gcc.target/powerpc/rlwimi-[012].c
>>>>
>>>> The failure is
>>>>
>>>> cc1: out of memory allocating 65536 bytes after a total of 1608979296
>>>>
>>>> This seems like a significant memory use regression.  Any ideas what
>>>> happened?
>>
>> Not really.  The patch just moved code around.  I didn't make any
>> changes that I'd expect to impact memory allocation to an appreciable
>> extent, at least not intentionally.  Let me look into it and get back
>> to you.
>>
>> Martin
>>

>>>> Thanks, David

>>
>
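
The leak discussed in this thread (per-function state that is enabled but never released) can be sketched in isolation.  This is only a toy model, not GCC code: struct analysis, analysis_enable, analysis_disable and run_pass are hypothetical names invented for the illustration, and the 256-byte payload is an arbitrary stand-in for whatever a real analysis allocates.

```c
#include <stdlib.h>

/* Hypothetical per-function analysis state; stands in for whatever a
   pass allocates when it turns on an analysis for one function.  */
struct analysis { char scratch[256]; };

static long live_contexts;  /* Contexts allocated but not yet released.  */

static struct analysis *
analysis_enable (void)
{
  struct analysis *a = malloc (sizeof *a);
  if (a)
    ++live_contexts;
  return a;
}

static void
analysis_disable (struct analysis *a)
{
  if (a)
    {
      free (a);
      --live_contexts;
    }
}

/* Run a pretend pass over NFUNCS functions.  When PAIRED is nonzero every
   enable is matched by a disable before the next function is visited;
   otherwise the state is abandoned until the end.  Returns the peak
   number of simultaneously live contexts, or -1 on allocation failure.  */
long
run_pass (int nfuncs, int paired)
{
  size_t n = (size_t) (nfuncs > 0 ? nfuncs : 1);
  struct analysis **abandoned = calloc (n, sizeof *abandoned);
  long peak = 0;

  if (!abandoned)
    return -1;

  live_contexts = 0;
  for (int i = 0; i < nfuncs; ++i)
    {
      struct analysis *a = analysis_enable ();
      if (live_contexts > peak)
        peak = live_contexts;
      if (paired)
        analysis_disable (a);
      else
        abandoned[i] = a;  /* Never released by the "pass" itself.  */
    }

  /* Clean up so the demonstration itself does not leak.  */
  for (int i = 0; i < nfuncs; ++i)
    analysis_disable (abandoned[i]);
  free (abandoned);
  return peak;
}
```

With paired calls the peak stays at one live context no matter how many functions are visited; without pairing it grows linearly with the function count, which is the shape of growth one would expect for a test that defines some six thousand functions.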

Re: [PATCH] PR fortran/100950 - ICE in output_constructor_regular_field, at varasm.c:5514

2021-08-19 Thread H.J. Lu via Gcc-patches

On Thu, Aug 19, 2021 at 12:12 PM Harald Anlauf via Gcc-patches
 wrote:
>
> Hi Tobias,
>
> > I am inclined to say that the Intel compiler has a bug by not
> > accepting it – but as written before, I regard sub-string length
> > (esp. with const expr) inquiries as an odd corner case which
> > is unlikely to occur in real-world code.
>
> ok.
>
> > Still does not work – or rather: ...%t(:)(3:4) [i.e. substring with array
> > section] and ...%str(3:4) [i.e. substring of deferred-length scalar] both
> > do work but if one combines the two (→ ...%str2(:)(3:4), i.e. substring of
> > deferred-length array section), it does not:
> >
> > Array ‘r’ at (1) is a variable, which does not reduce to a constant
> > expression
> >
> > for:
> >
> > --- a/gcc/testsuite/gfortran.dg/pr100950.f90
> > +++ b/gcc/testsuite/gfortran.dg/pr100950.f90
> > @@ -15,2 +15,3 @@ program p
> >character(len=:), allocatable :: str
> > + character(len=:), allocatable :: str2(:)
> > end type t_
> > @@ -24,2 +25,4 @@ program p
> > integer,  parameter :: l6 = len (r(1)%str (3:4))
> > +  integer,  parameter :: l7 = len (r(1)%str2(1)(3:4))
> > +  integer,  parameter :: l8 = len (r(1)%str2(:)(3:4))
> >
> >
> > which feels odd.
>
> I agree.  I have revised the code slightly to accept substrings
> of deferred-length.  Your suggested variants now work correctly.
>
> > In principle, LGTM – except I wonder what we do about the
> > len(r(1)%str(1)(3:4));
> > I think we really do handle most code available and I would like to
> > close this topic – but still it feels a bit odd to leave this bit out.
>
> That is handled now as discussed, see attached final patch.
> Regtested again.
>
> > I was also wondering whether we should check that the
> > compile-time simplification works – i.e. use -fdump-tree-original for this;
> > I attached a patch for this.
>
> I added this to the final patch and have taken the liberty to push the result
> to master as d881460deb1f0bdfc3e8fa2d391a03a9763cbff4.
>
> Thanks for your patience, given the rather extensive review...
>
> Harald

This may have broken bootstrap on 32-bit hosts:

https://gcc.gnu.org/pipermail/gcc-regression/2021-August/075209.html

-- 
H.J.

Re: [PATCH] Simplify (truncate:QI (subreg:SI (reg:QI x))) to (reg:QI x)

2021-08-19 Thread Andrew Pinski via Gcc-patches

On Thu, Aug 19, 2021 at 4:18 PM Roger Sayle  wrote:
>
>
> Whilst working on a backend patch, I noticed that the middle-end's
> RTL optimizers weren't simplifying a truncation of a paradoxical
> subreg extension, though it does transform closely related (more
> complex) expressions.  The main (first) part of this patch
> implements this simplification, reusing much of the logic already
> in place.
>
> I briefly considered suggesting that it's difficult to provide a new
> testcase for this change, but then realized the reviewer's response
> would be that this type of transformation should be self-tested
> in simplify-rtx, so this patch adds a bunch of tests that integer
> extensions and truncations are simplified as expected.  No good
> deed goes unpunished and I was equally surprised to see that we
> don't currently simplify/check/defend (zero_extend:SI (reg:SI)),
> i.e. useless no-op extensions to the same mode.  So I've added
> some logic to simplify (or more accurately prevent us generating
> dubious RTL for) those.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and "make -k check" with no new failures.
>
> Ok for mainline?

The main target I know of that uses truncate a lot is MIPS64 which has
TARGET_TRULY_NOOP_TRUNCATION defined to be:
static bool
mips_truly_noop_truncation (poly_uint64 outprec, poly_uint64 inprec)
{
  return !TARGET_64BIT || inprec <= 32 || outprec > 32;
}

So you might want to make sure this is still correct for this case.

Thanks,
Andrew Pinski
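
As a rough illustration of the predicate quoted above, here is a stand-alone model that can be evaluated outside GCC.  It is a sketch under assumptions: TARGET_64BIT is passed in as an ordinary parameter (in GCC it is a target macro), the precisions are plain unsigned values rather than poly_uint64, and model_truly_noop_truncation is a hypothetical name.

```c
#include <stdbool.h>

/* Stand-alone model of the quoted mips_truly_noop_truncation.
   TARGET_64BIT is a parameter here purely for illustration.  */
bool
model_truly_noop_truncation (bool target_64bit, unsigned outprec,
                             unsigned inprec)
{
  return !target_64bit || inprec <= 32 || outprec > 32;
}
```

Under this model, with the 64-bit flag set, a truncation from 32 bits to 8 bits is reported as a no-op (inprec <= 32), while a truncation from 64 bits to 32 bits or to 8 bits is not.  The second kind is where a target-specific check of the new simplification would seem most relevant.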


>
>
> 2021-08-20  Roger Sayle  
>
> gcc/ChangeLog
> * simplify-rtx.c (simplify_truncation): Generalize simplification
> of (truncate:A (subreg:B X)).
> (simplify_unary_operation_1) [FLOAT_TRUNCATE, FLOAT_EXTEND,
> SIGN_EXTEND, ZERO_EXTEND]: Handle cases where the operand
> already has the desired machine mode.
> (test_scalar_int_ops): Add tests that useless extensions and
> truncations are optimized away.
> (test_scalar_int_ext_ops): New self-test function to confirm
> that truncations of extensions are correctly simplified.
> (test_scalar_int_ext_ops2): New self-test function to check
> truncations of truncations, extensions of extensions, and
> truncations of extensions.
> (test_scalar_ops): Call the above two functions with a
> representative sampling of integer machine modes.
>
> Roger
> --
> Roger Sayle
> NextMove Software
> Cambridge, UK
>

Re: [PATCH] rs6000: Fix ICE expanding lxvp and stxvp gimple built-ins [PR101849]

2021-08-19 Thread Peter Bergner via Gcc-patches

On 8/13/21 12:15 PM, Bill Schmidt wrote:
> Honestly, I don't see how it matters.  So far as I can tell, all you've done
> here is hand-inlined what build_simple_mem_ref would do.  So I guess I have
> a slight preference for your original patch (but with the new test case,
> of course).

Ok, I ended up pushing the original patch then with the expanded test case.
I'll let this bake on trunk for a bit before back porting.  Thanks everyone.

Peter

[PATCH] Simplify (truncate:QI (subreg:SI (reg:QI x))) to (reg:QI x)

2021-08-19 Thread Roger Sayle


Whilst working on a backend patch, I noticed that the middle-end's
RTL optimizers weren't simplifying a truncation of a paradoxical
subreg extension, though it does transform closely related (more
complex) expressions.  The main (first) part of this patch
implements this simplification, reusing much of the logic already
in place.

I briefly considered suggesting that it's difficult to provide a new
testcase for this change, but then realized the reviewer's response
would be that this type of transformation should be self-tested
in simplify-rtx, so this patch adds a bunch of tests that integer
extensions and truncations are simplified as expected.  No good
deed goes unpunished and I was equally surprised to see that we
don't currently simplify/check/defend (zero_extend:SI (reg:SI)),
i.e. useless no-op extensions to the same mode.  So I've added
some logic to simplify (or more accurately prevent us generating
dubious RTL for) those.
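
At the value level, the transformation in the subject line can be sketched in plain C.  This is only a model of the semantics, not RTL: the upper bits of a paradoxical subreg are unspecified, so the hypothetical parameter high_garbage supplies arbitrary upper bits, and all three function names are invented for the illustration.

```c
#include <stdint.h>

/* Model a paradoxical subreg: view an 8-bit value as 32 bits whose upper
   24 bits are unspecified.  HIGH_GARBAGE supplies those bits.  */
uint32_t
widen_with_garbage (uint8_t x, uint32_t high_garbage)
{
  return (high_garbage & 0xFFFFFF00u) | x;
}

/* Model truncate:QI of a 32-bit value: keep only the low 8 bits.  */
uint8_t
truncate_qi (uint32_t wide)
{
  return (uint8_t) (wide & 0xFFu);
}

/* Return 1 if widening and then truncating preserves every 8-bit value
   for the given upper-bit pattern, 0 otherwise.  */
int
round_trip_is_identity (uint32_t high_garbage)
{
  for (unsigned v = 0; v <= 0xFFu; ++v)
    if (truncate_qi (widen_with_garbage ((uint8_t) v, high_garbage))
        != (uint8_t) v)
      return 0;
  return 1;
}
```

Whatever the upper bits hold, the low 8 bits survive the round trip, which is why the combined expression can be replaced by the inner register.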

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and "make -k check" with no new failures.

Ok for mainline?


2021-08-20  Roger Sayle  

gcc/ChangeLog
* simplify-rtx.c (simplify_truncation): Generalize simplification
of (truncate:A (subreg:B X)).
(simplify_unary_operation_1) [FLOAT_TRUNCATE, FLOAT_EXTEND,
SIGN_EXTEND, ZERO_EXTEND]: Handle cases where the operand
already has the desired machine mode.
(test_scalar_int_ops): Add tests that useless extensions and
truncations are optimized away.
(test_scalar_int_ext_ops): New self-test function to confirm
that truncations of extensions are correctly simplified.
(test_scalar_int_ext_ops2): New self-test function to check
truncations of truncations, extensions of extensions, and
truncations of extensions.
(test_scalar_ops): Call the above two functions with a
representative sampling of integer machine modes.

Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index a719f57..f3df614 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -813,23 +813,49 @@ simplify_context::simplify_truncation (machine_mode mode, rtx op,
 return simplify_gen_unary (GET_CODE (op), mode,
   XEXP (XEXP (op, 0), 0), mode);
 
-  /* (truncate:A (subreg:B (truncate:C X) 0)) is
- (truncate:A X).  */
+  /* Simplifications of (truncate:A (subreg:B X 0)).  */
   if (GET_CODE (op) == SUBREG
   && is_a <scalar_int_mode> (mode, &int_mode)
   && SCALAR_INT_MODE_P (op_mode)
   && is_a <scalar_int_mode> (GET_MODE (SUBREG_REG (op)), &subreg_mode)
-  && GET_CODE (SUBREG_REG (op)) == TRUNCATE
   && subreg_lowpart_p (op))
 {
-  rtx inner = XEXP (SUBREG_REG (op), 0);
-  if (GET_MODE_PRECISION (int_mode) <= GET_MODE_PRECISION (subreg_mode))
-   return simplify_gen_unary (TRUNCATE, int_mode, inner,
-  GET_MODE (inner));
-  else
-   /* If subreg above is paradoxical and C is narrower
-  than A, return (subreg:A (truncate:C X) 0).  */
-   return simplify_gen_subreg (int_mode, SUBREG_REG (op), subreg_mode, 0);
+  /* (truncate:A (subreg:B (truncate:C X) 0)) is (truncate:A X).  */
+  if (GET_CODE (SUBREG_REG (op)) == TRUNCATE)
+   {
+ rtx inner = XEXP (SUBREG_REG (op), 0);
+ if (GET_MODE_PRECISION (int_mode)
+ <= GET_MODE_PRECISION (subreg_mode))
+   return simplify_gen_unary (TRUNCATE, int_mode, inner,
+  GET_MODE (inner));
+ else
+   /* If subreg above is paradoxical and C is narrower
+  than A, return (subreg:A (truncate:C X) 0).  */
+   return simplify_gen_subreg (int_mode, SUBREG_REG (op),
+   subreg_mode, 0);
+   }
+
+  /* Simplifications of (truncate:A (subreg:B X:C 0)) with
+paradoxical subregs (B is wider than C).  */
+  if (is_a <scalar_int_mode> (op_mode, &int_op_mode))
+   {
+ unsigned int int_op_prec = GET_MODE_PRECISION (int_op_mode);
+ unsigned int subreg_prec = GET_MODE_PRECISION (subreg_mode);
+ if (int_op_prec > subreg_prec)
+   {
+ if (int_mode == subreg_mode)
+   return SUBREG_REG (op);
+ if (GET_MODE_PRECISION (int_mode) < subreg_prec)
+   return simplify_gen_unary (TRUNCATE, int_mode,
+  SUBREG_REG (op), subreg_mode);
+   }
+ /* Simplification of (truncate:A (subreg:B X:C 0)) where
+A is narrower than B and B is narrower than C.  */
+ else if (int_op_prec < subreg_prec
+  && GET_MODE_PRECISION (int_mode) < int_op_prec)
+   return simplify_gen_unary (TRUNCATE, int_mode,
+  SUBREG_REG (op), subreg_mode);
+   }
 }
 
   /* (truncate:A (truncate:B X)) is (truncate:A X).  */
@@ -1245,6 +1271,10 @@

[PATCH] enable ranger and caching in pass_waccess

2021-08-19 Thread Martin Sebor via Gcc-patches


The attached patch changes the new access warning pass to use
the per-function ranger instance.  To do that it makes a number
of the global static functions members of the pass (that involved
moving one to a later point in the file, increasing the diff;
the body of the function hasn't changed otherwise).  Still more
functions remain.  At the same time, the patch also enables
the simple pointer_query cache to avoid repeatedly recomputing
the properties of related pointers into the same objects, and
makes the cache more effective (trunk fails to cache a bunch of
intermediate results).  Finally, the patch enhances the debugging
support for the cache.

Other than the ranger/caching the changes have no user-visible
effect.

Tested on x86_64-linux.

Martin

Previous patches in this series:
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577526.html
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576821.html
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575377.html
gcc/ChangeLog:

	* gimple-ssa-warn-access.cc (check_memop_access): Remove template and
	make a member function.
	(maybe_check_dealloc_call): Make a pass_waccess member function.
	(class pass_waccess): Add and rename members.
	(pass_waccess::pass_waccess): Adjust to name change.
	(pass_waccess::~pass_waccess): Same.
	(check_alloca): Make a member function.
	(check_alloc_size_call): Same.
	(check_strcat): Same.
	(check_strncat): Same.
	(check_stxcpy): Same.
	(check_stxncpy): Same.
	(check_strncmp): Same.
	(maybe_warn_rdwr_sizes): Rename...
	(pass_waccess::maybe_check_access_sizes): ...to this.
	(pass_waccess::check_call): Adjust to name changes.
	(pass_waccess::maybe_check_dealloc_call): Make a pass_waccess member
	function.
	(pass_waccess::execute): Adjust to name changes.
	* gimple-ssa-warn-access.h (check_memop_access): Remove.
	* pointer-query.cc (access_ref::phi): Handle null pointer.
	(access_ref::inform_access): Same.
	(pointer_query::put_ref): Modify a cached value, not a copy of it.
	(pointer_query::dump): New function.
	(compute_objsize_r): Avoid overwriting access_ref::bndrng.  Cache
	more results.
	* pointer-query.h (pointer_query::dump): Declare.
	* tree-ssa-strlen.c (printf_strlen_execute): Factor code out into
	pointer_query::put_ref.

gcc/testsuite/ChangeLog:

	* gcc.dg/Wstringop-overflow-73.c: New test.
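
The ChangeLog entry for pointer_query::put_ref ("Modify a cached value, not a copy of it") names a general pitfall that is easy to show in a few lines of C.  The sketch below is not the GCC code; struct entry, cache, put_by_copy, put_in_place and get_size are hypothetical names, and the four-slot table is an arbitrary simplification.

```c
/* Hypothetical cache entry and a tiny fixed-size cache.  */
struct entry { int size; };

static struct entry cache[4];

/* Pitfall: the update lands in a local copy, so the cache never sees it.  */
void
put_by_copy (unsigned idx, int size)
{
  struct entry e = cache[idx % 4];  /* Copies the cached entry.  */
  e.size = size;                    /* Modifies only the copy.  */
  (void) e;                         /* Silence unused-variable warnings.  */
}

/* Intended behavior: update the cached entry itself through a pointer.  */
void
put_in_place (unsigned idx, int size)
{
  struct entry *e = &cache[idx % 4];
  e->size = size;
}

/* Read back what the cache holds for IDX.  */
int
get_size (unsigned idx)
{
  return cache[idx % 4].size;
}
```

After put_by_copy the cached value is unchanged and a later lookup has to recompute it; after put_in_place the lookup sees the stored result, which is the property a cache is there to provide.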

diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
index 4a2dd9ade77..4473b093f88 100644
--- a/gcc/gimple-ssa-warn-access.cc
+++ b/gcc/gimple-ssa-warn-access.cc
@@ -1511,41 +1511,6 @@ check_access (tree expr, tree dstwrite,
 			mode, pad);
 }
 
-/* Helper to determine and check the sizes of the source and the destination
-   of calls to __builtin_{bzero,memcpy,mempcpy,memset} calls.  EXP is the
-   call expression, DEST is the destination argument, SRC is the source
-   argument or null, and LEN is the number of bytes.  Use Object Size type-0
-   regardless of the OPT_Wstringop_overflow_ setting.  Return true on success
-   (no overflow or invalid sizes), false otherwise.  */
-
-template <class GimpleOrTree>
-static bool
-check_memop_access (GimpleOrTree expr, tree dest, tree src, tree size)
-{
-  /* For functions like memset and memcpy that operate on raw memory
- try to determine the size of the largest source and destination
- object using type-0 Object Size regardless of the object size
- type specified by the option.  */
-  access_data data (expr, access_read_write);
-  tree srcsize = src ? compute_objsize (src, 0, &data) : NULL_TREE;
-  tree dstsize = compute_objsize (dest, 0, &data);
-
-  return check_access (expr, size, /*maxread=*/NULL_TREE,
-		   srcsize, dstsize, data.mode, &data);
-}
-
-bool
-check_memop_access (gimple *stmt, tree dest, tree src, tree size)
-{
-  return check_memop_access<gimple *>(stmt, dest, src, size);
-}
-
-bool
-check_memop_access (tree expr, tree dest, tree src, tree size)
-{
-  return check_memop_access<tree>(expr, dest, src, size);
-}
-
 /* A convenience wrapper for check_access above to check access
by a read-only function like puts.  */
 
@@ -2093,135 +2058,6 @@ warn_dealloc_offset (location_t loc, gimple *call, const access_ref &aref)
   return true;
 }
 
-/* Issue a warning if a deallocation function such as free, realloc,
-   or C++ operator delete is called with an argument not returned by
-   a matching allocation function such as malloc or the corresponding
-   form of C++ operatorn new.  */
-
-static void
-maybe_check_dealloc_call (gcall *call)
-{
-  tree fndecl = gimple_call_fndecl (call);
-  if (!fndecl)
-return;
-
-  unsigned argno = fndecl_dealloc_argno (fndecl);
-  if ((unsigned) call_nargs (call) <= argno)
-return;
-
-  tree ptr = gimple_call_arg (call, argno);
-  if (integer_zerop (ptr))
-return;
-
-  access_ref aref;
-  if (!compute_objsize (ptr, 0, &aref))
-return;
-
-  tree ref = aref.ref;
-  if (integer_zerop (ref))
-return;
-
-  tree dealloc_decl = fndecl;
-  location_t loc = gimple_location (call);
-
-  if (DECL_P (ref) || EXPR_P (ref))
-{
-  /*

[PATCH] nvptx: Add a __PTX_ISA__ predefined macro based on target ISA.

2021-08-19 Thread Roger Sayle


This patch adds a __PTX_ISA__ predefined macro to the nvptx backend that
allows code to check the compute model being targeted by the compiler.
This is equivalent to the __CUDA_ARCH__ macro defined by CUDA's nvcc
compiler, but to avoid causing problems for source code that checks
for that compiler, this macro uses GCC's nomenclature; it's easy
enough for users to "#define __CUDA_ARCH__ __PTX_ISA__", but I'm
also happy to modify this patch to define __CUDA_ARCH__ if that's
the preference of the nvptx backend maintainers.
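
A minimal sketch of how user code might test for the proposed macro follows.  It assumes only that __PTX_ISA__ is predefined when compiling for nvptx with this patch applied; the value it expands to is not shown in this excerpt, so the example checks only whether it is defined.  have_ptx_isa and have_cuda_arch are hypothetical helper names, and the shim is the "#define __CUDA_ARCH__ __PTX_ISA__" suggestion from the message.

```c
/* Optional compatibility shim suggested in the message: map the nvcc
   macro name onto the GCC one, unless it is already defined.  */
#if defined (__PTX_ISA__) && !defined (__CUDA_ARCH__)
# define __CUDA_ARCH__ __PTX_ISA__
#endif

/* Return 1 when the compiler predefines __PTX_ISA__, else 0.  */
int
have_ptx_isa (void)
{
#ifdef __PTX_ISA__
  return 1;
#else
  return 0;
#endif
}

/* Return 1 when __CUDA_ARCH__ is visible, either natively or through the
   shim above, else 0.  */
int
have_cuda_arch (void)
{
#ifdef __CUDA_ARCH__
  return 1;
#else
  return 0;
#endif
}
```

On an ordinary host compiler both functions return 0.  Wherever __PTX_ISA__ is predefined, the shim ensures have_cuda_arch also returns 1, so code written against the nvcc macro name keeps working.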

What might have been a four line patch is actually a little more
complicated, as this patch takes the opportunity to upgrade the
nvptx backend to use the now preferred nvptx-c.c idiom.

This patch has been tested with a cross-compiler from
x86_64-pc-linux-gnu to nvptx-none, and tested with
"make -k check" with no new failures.  This feature is
useful for implementing clock() on nvptx in newlib.

Ok for mainline?


2021-08-19  Roger Sayle  

gcc/ChangeLog
* config.gcc (nvptx-*-*): Define {c,c++}_target_objs.
* config/nvptx/nvptx-protos.h (nvptx_cpu_cpp_builtins): Prototype.
* config/nvptx/nvptx.h (TARGET_CPU_CPP_BUILTINS): Implement with
a call to the new nvptx_cpu_cpp_builtins function in nvptx-c.c.
* config/nvptx/t-nvptx (nvptx-c.o): New rule.
* config/nvptx/nvptx-c.c: New source file.
(nvptx_cpu_cpp_builtins): Move implementation here.

Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 93e2b32..cbad989 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -468,6 +468,8 @@ nios2-*-*)
;;
 nvptx-*-*)
cpu_type=nvptx
+   c_target_objs="nvptx-c.o"
+   cxx_target_objs="nvptx-c.o"
;;
 or1k*-*-*)
cpu_type=or1k
diff --git a/gcc/config/nvptx/nvptx-protos.h b/gcc/config/nvptx/nvptx-protos.h
index b7e6ae2..b29ddc9 100644
--- a/gcc/config/nvptx/nvptx-protos.h
+++ b/gcc/config/nvptx/nvptx-protos.h
@@ -40,6 +40,7 @@ extern void nvptx_output_aligned_decl (FILE *file, const char *name,
 extern void nvptx_function_end (FILE *);
 extern void nvptx_output_skip (FILE *, unsigned HOST_WIDE_INT);
 extern void nvptx_output_ascii (FILE *, const char *, unsigned HOST_WIDE_INT);
+extern void nvptx_cpu_cpp_builtins (void);
 extern void nvptx_register_pragmas (void);
 extern unsigned int nvptx_data_alignment (const_tree, unsigned int);
 
diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index fdaacdd..d367174 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -34,17 +34,7 @@
nvptx-as.  */
 #define ASM_SPEC "%{misa=*:-m %*; :-m sm_35}"
 
-#define TARGET_CPU_CPP_BUILTINS()  \
-  do   \
-{  \
-  builtin_assert ("machine=nvptx");\
-  builtin_assert ("cpu=nvptx");\
-  builtin_define ("__nvptx__");\
-  if (TARGET_SOFT_STACK)   \
-builtin_define ("__nvptx_softstack__");\
-  if (TARGET_UNIFORM_SIMT) \
-builtin_define ("__nvptx_unisimt__");  \
-} while (0)
+#define TARGET_CPU_CPP_BUILTINS() nvptx_cpu_cpp_builtins ()
 
 /* Avoid the default in ../../gcc.c, which adds "-pthread", which is not
supported for nvptx.  */
diff --git a/gcc/config/nvptx/t-nvptx b/gcc/config/nvptx/t-nvptx
index 6c1010d..d33bacd 100644
--- a/gcc/config/nvptx/t-nvptx
+++ b/gcc/config/nvptx/t-nvptx
@@ -1,3 +1,7 @@
+nvptx-c.o: $(srcdir)/config/nvptx/nvptx-c.c
+   $(COMPILE) $<
+   $(POSTCOMPILE)
+
 CFLAGS-mkoffload.o += $(DRIVER_DEFINES) \
-DGCC_INSTALL_NAME=\"$(GCC_INSTALL_NAME)\"
 mkoffload.o: $(srcdir)/config/nvptx/mkoffload.c
diff --git a/gcc/config/nvptx/nvptx-c.c b/gcc/config/nvptx/nvptx-c.c
new file mode 100644
index 0000000..5a7a0ef
--- /dev/null
+++ b/gcc/config/nvptx/nvptx-c.c
@@ -0,0 +1,47 @@
+/* Subroutines for the C front end on the NVPTX architecture.
+ * Copyright (C) 2021 Free Software Foundation, Inc.
+ *
+ * This file is part of GCC.
+ *
+ * GCC is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published
+ * by the Free Software Foundation; either version 3, or (at your
+ * option) any later version.
+ *
+ * GCC is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+ * License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with GCC; see the file COPYING3.  If not see
+ * <http://www.gnu.org/licenses/>.  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "target.h"
+#include "c-family/c-common.h"
+#include "memmodel.h"
+#include "tm_p.h"
+#include "c-family/c-pragma.h"
+
+/* Function to tell the

Re: [committed] Drop stabs support from h8300 and v850 ports

2021-08-19 Thread Jeff Law via Gcc-patches





On 8/19/2021 12:24 PM, Gerald Pfeifer wrote:
> On Thu, 19 Aug 2021, Jeff Law via Gcc-patches wrote:
>> Whee, two more ports dropping stabs. Committed to the trunk.
>
> Are you saying you're on a mission to stab wooden stakes into stabs?

Seems that way  :-)  I hadn't really set out to do that, but once rl78
started failing stabs stuff, I figured I might as well just start the
process of killing off stabs.  I suspect for the majority of targets
it's trivial like we've seen for rl78, v8 and h8.


jeff

[Patch][doc][PR101843]clarification on building gcc and binutils together

2021-08-19 Thread Qing Zhao via Gcc-patches

Hi,

This patch is on behalf of John Henning, who opened PR 101843: 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101843

He proposed the following doc change, please take a look and let me know 
whether this is Okay for commit?

Thanks.

Qing

From 9bf6f9a5964df26cac32d90f57719f4871874d54 Mon Sep 17 00:00:00 2001
From: qing zhao 
Date: Thu, 19 Aug 2021 18:20:49 -0400
Subject: [PATCH] doc/install.texi: add a generic advice on building gcc and
 binutils together.

gcc/ChangeLog:

* doc/install.texi: Add a generic advice on building gcc and binutils together.
---
 gcc/doc/install.texi | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 8e974d2..4f1abbd 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -562,11 +562,13 @@ language front end and the language runtime (where appropriate).
 
 If you also intend to build binutils (either to upgrade an existing
 installation or for use in place of the corresponding tools of your
-OS), unpack the binutils distribution either in the same directory or
-a separate one.  In the latter case, add symbolic links to any
-components of the binutils you intend to build alongside the compiler
-(@file{bfd}, @file{binutils}, @file{gas}, @file{gprof}, @file{ld},
-@file{opcodes}, @dots{}) to the directory containing the GCC sources.
+OS), begin by identifying a version of binutils that was created at
+about the same time as your version of GCC.  Then unpack the binutils
+distribution either in the same directory or a separate one. In the
+latter case, add symbolic links to any components of the binutils you
+intend to build alongside the compiler (@file{bfd}, @file{binutils},
+@file{gas}, @file{gprof}, @file{ld}, @file{opcodes}, @dots{}) to the
+directory containing the GCC sources.
 
 Likewise the GMP, MPFR and MPC libraries can be automatically built
 together with GCC.  You may simply run the
-- 
1.8.3.1

Re: [PATCH] Move xx* builtins to vsx.md.

2021-08-19 Thread Segher Boessenkool

On Thu, Aug 19, 2021 at 06:10:46PM -0400, Michael Meissner wrote:
> On Wed, Aug 18, 2021 at 06:11:03PM -0500, Segher Boessenkool wrote:
> > I think the current vector.md / altivec.md / vsx.md / rs6000.md
> > division is artificial at best.  Most of the basic (movement etc.)
> > things are in rs6000.md (and all should be), but nothing else is clear.
> > 
> > The name "altivec.md" suggests it is only for the very old things, but
> > it is not used that way, and that is untenable anyway: we have more
> > recent insns to plug holes in that (for example 64-bit integer support),
> > so it arguably is not just for that.
> > 
> > Using it for instructions that only work on the high 32 VSRs (i.e. the
> > VRs) is quite artificial as well -- sometimes there are equivalent insns
> > for the other 32 VSRs already, sometimes it is just because of opcode
> > scarcity, sometimes it is because it is for the slow vector unit only
> > (but those seem to live in rs6000.md and crypto.md anyway).
> > 
> > Maybe we should give up on dividing these things, and put both in one
> > file, say vector.md?
> 
> Yes but that is more ambitious.

Absolutely.  But setting a destination before starting to walk is
sometimes helpful ;-)

> Basically I have 2 patches coming that use and
> update the xxsplti instructions.  I can avoid putting in this specific change
> and reformulate them for altivec.md instead of vsx.md.  Or I can check in
> these changes.  Which do you want?  I don't want to do both insn movement
> and new patches at the same time.

I am fine with this patch, it is a clear improvement already.

> The original design of vector.md was to allow for alternate vector units, and
> vector.md was just the define_expands.  But the likelihood of new vector
> units is probably low.

Right, history has caught up with us.

> When I wrote vsx.md in the power7 days, we were toying with the notion of doing
> VSX and not Altivec instructions.  But I quickly realized you always need
> Altivec for VSX.

There also were 8-byte vectors back then.  Or was that completely
separate code?

> In general, I would prefer not to have a flag day where everything gets moved
> all at once.

Yup, it is too easy to make mistakes here, ordering in machine
descriptions is significant.  Although comparing the generated insn-*
files before and after might help.


Segher

Re: [PATCH] Move xx* builtins to vsx.md.

2021-08-19 Thread Michael Meissner via Gcc-patches

On Wed, Aug 18, 2021 at 06:11:03PM -0500, Segher Boessenkool wrote:
> On Wed, Aug 18, 2021 at 04:42:42PM -0400, David Edelsohn wrote:
> > I wanted to give Segher a chance to comment on the structure.
> 
> I think the current vector.md / altivec.md / vsx.md / rs6000.md
> division is artificial at best.  Most of the basic (movement etc.)
> things are in rs6000.md (and all should be), but nothing else is clear.
> 
> The name "altivec.md" suggests it is only for the very old things, but
> it is not used that way, and that is untenable anyway: we have more
> recent insns to plug holes in that (for example 64-bit integer support),
> so it arguably is not just for that.
> 
> Using it for instructions that only work on the high 32 VSRs (i.e. the
> VRs) is quite artificial as well -- sometimes there are equivalent insns
> for the other 32 VSRs already, sometimes it is just because of opcode
> scarcity, sometimes it is because it is for the slow vector unit only
> (but those seem to live in rs6000.md and crypto.md anyway).
> 
> Maybe we should give up on dividing these things, and put both in one
> file, say vector.md?

Yes but that is more ambitious.  Basically I have 2 patches coming that use and
update the xxsplti instructions.  I can avoid putting in this specific change
and reformulate them for altivec.md instead of vsx.md.  Or I can check in these
changes.  Which do you want?  I don't want to do both insn movement and new
patches at the same time.

The original design of vector.md was to allow for alternate vector units, and
vector.md was just the define_expands.  But the likelihood of new vector units
is probably low.

When I wrote vsx.md in the power7 days, we were toying with the notion of doing
VSX and not Altivec instructions.  But I quickly realized you always need
Altivec for VSX.

In general, I would prefer not to have a flag day where everything gets moved
all at once.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797

[PATCH v2] rs6000: Avoid buffer overruns

2021-08-19 Thread Bill Schmidt via Gcc-patches


Hi,

I totally biffed the previous version of this patch, as it was built
against an experimental tree instead of trunk.  Trying again...

Although safe_inc_pos avoids buffer overruns in rs6000-gen-builtins.c,
there are some other routines where we fail to detect the possibility.
Clean those up!  (Also, safe_inc_pos is not quite right itself.)
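
The off-by-one in safe_inc_pos can be seen in isolation with a minimal
sketch.  The helper names and the tiny LINELEN below are made up for
illustration and are not taken from rs6000-gen-builtins.c; only the two
comparison forms come from the patch.

```c
#include <stdbool.h>

/* Illustrative size only; the real generator uses its own LINELEN.  */
#define LINELEN 8

/* Old form: tests the value *before* the increment, so a caller
   sitting at LINELEN - 1 ends up at LINELEN with no diagnostic.  */
static bool
old_check_fires (int *pos)
{
  return (*pos)++ >= LINELEN;
}

/* New form: tests the value *after* the increment, so stepping
   onto LINELEN is diagnosed immediately.  */
static bool
new_check_fires (int *pos)
{
  return ++(*pos) >= LINELEN;
}
```

With pos starting at LINELEN - 1, the old form leaves pos equal to
LINELEN (one past the last valid index) and reports nothing, while the
new form reports the overrun.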

Bootstrapped and tested on powerpc64le-linux-gnu, this time without
some unsubmitted patches in the way.  Is this OK for trunk?

Thanks,
Bill


2021-08-19  Bill Schmidt  

gcc/
PR target/101830
* config/rs6000/rs6000-gen-builtins.c (consume_whitespace):
Diagnose buffer overrun.
(safe_inc_pos): Fix overrun detection.
(match_identifier): Diagnose buffer overrun.
(match_integer): Likewise.
(match_to_right_bracket): Likewise.
---
 gcc/config/rs6000/rs6000-gen-builtins.c | 34 ++---
 1 file changed, 30 insertions(+), 4 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-gen-builtins.c b/gcc/config/rs6000/rs6000-gen-builtins.c
index e5d3b71b622..05b2d2939b5 100644
--- a/gcc/config/rs6000/rs6000-gen-builtins.c
+++ b/gcc/config/rs6000/rs6000-gen-builtins.c
@@ -597,6 +597,13 @@ consume_whitespace (void)
 {
   while (pos < LINELEN && isspace(linebuf[pos]) && linebuf[pos] != '\n')
 pos++;
+
+  if (pos >= LINELEN)
+{
+  diag ("line length overrun at %d.\n", pos);
+  exit (1);
+}
+
   return;
 }
 
@@ -623,7 +630,7 @@ advance_line (FILE *file)
 static inline void
 safe_inc_pos (void)
 {
-  if (pos++ >= LINELEN)
+  if (++pos >= LINELEN)
 {
   (*diag) ("line length overrun.\n");
   exit (1);
@@ -636,9 +643,16 @@ static char *
 match_identifier (void)
 {
   int lastpos = pos - 1;
-  while (isalnum (linebuf[lastpos + 1]) || linebuf[lastpos + 1] == '_')
+  while (lastpos < LINELEN - 1
+&& (isalnum (linebuf[lastpos + 1]) || linebuf[lastpos + 1] == '_'))
 ++lastpos;
 
+  if (lastpos >= LINELEN - 1)
+{
+  diag ("line length overrun at %d.\n", lastpos);
+  exit (1);
+}
+
   if (lastpos < pos)
 return 0;
 
@@ -660,9 +674,15 @@ match_integer (void)
 safe_inc_pos ();
 
   int lastpos = pos - 1;
-  while (isdigit (linebuf[lastpos + 1]))
+  while (lastpos < LINELEN - 1 && isdigit (linebuf[lastpos + 1]))
 ++lastpos;
 
+  if (lastpos >= LINELEN - 1)
+{
+  diag ("line length overrun at %d.\n", lastpos);
+  exit (1);
+}
+
   if (lastpos < pos)
 return NULL;
 
@@ -680,7 +700,7 @@ static const char *
 match_to_right_bracket (void)
 {
   int lastpos = pos - 1;
-  while (linebuf[lastpos + 1] != ']')
+  while (lastpos < LINELEN - 1 && linebuf[lastpos + 1] != ']')
 {
   if (linebuf[lastpos + 1] == '\n')
{
@@ -690,6 +710,12 @@ match_to_right_bracket (void)
   ++lastpos;
 }
 
+  if (lastpos >= LINELEN - 1)
+{
+  diag ("line length overrun at %d.\n", lastpos);
+  exit (1);
+}
+
   if (lastpos < pos)
 return 0;
 
-- 
2.27.0
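
For reference, the bounded-scan pattern the patch applies to
match_identifier, match_integer and match_to_right_bracket can be
sketched as a stand-alone function.  This is only a sketch under
assumptions: scan_identifier, its return convention and the small
LINELEN are hypothetical, and the real routines call diag and exit
rather than returning an error value.

```c
#include <ctype.h>

/* Illustrative size only.  */
#define LINELEN 16

/* Scan identifier characters in BUF starting at POS without reading
   past the end of the buffer.  Returns the index of the last
   identifier character, POS - 1 if there is none, or -1 if the scan
   ran up against the end of the buffer (the case the patch now
   diagnoses).  */
static int
scan_identifier (const char *buf, int pos)
{
  int lastpos = pos - 1;

  /* The bound is tested first, so buf[lastpos + 1] is only read
     while lastpos + 1 is a valid index.  */
  while (lastpos < LINELEN - 1
	 && (isalnum ((unsigned char) buf[lastpos + 1])
	     || buf[lastpos + 1] == '_'))
    ++lastpos;

  if (lastpos >= LINELEN - 1)
    return -1;

  return lastpos;
}
```

As in the patch, a token that ends exactly on the last buffer position
is treated as an overrun, since the scan cannot tell whether the input
was truncated there.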

Re: [ping] Re-unify 'omp_build_component_ref' and 'oacc_build_component_ref'

2021-08-19 Thread Thomas Schwinge

Hi!

Richard, maybe you have an opinion here, in particular about my
"SLP vectorizer" comment below?  Please see
<http://mid.mail-archive.com/87r1f2puss.fsf@euler.schwinge.homeip.net>
for the full context.

On 2021-08-16T10:21:04+0200, Jakub Jelinek  wrote:
> On Mon, Aug 16, 2021 at 10:08:42AM +0200, Thomas Schwinge wrote:
>>  /* Build COMPONENT_REF and set TREE_THIS_VOLATILE and TREE_READONLY on it
>> as appropriate.  */
>>
>>  tree
>>  omp_build_component_ref (tree obj, tree field)
>>  {
>> +  tree field_type = TREE_TYPE (field);
>> +  tree obj_type = TREE_TYPE (obj);
>> +  if (!ADDR_SPACE_GENERIC_P (TYPE_ADDR_SPACE (obj_type)))
>> +field_type
>> +  = build_qualified_type (field_type,
>> +  KEEP_QUAL_ADDR_SPACE (TYPE_QUALS (obj_type)));

(For later reference: "Kwok's new code" here is to propagate to
'field_type' any non-generic address space of 'obj_type'.)

|> Concerning the current 'gcc/omp-low.c:omp_build_component_ref', for the
|> current set of offloading testcases, we never see a
|> '!ADDR_SPACE_GENERIC_P' there, so the address space handling doesn't seem
|> to be necessary there (but also won't do any harm: no-op).
>
> Are you sure this can't trigger?
> Say
> extern int __seg_fs a;
>
> void
> foo (void)
> {
>   #pragma omp parallel private (a)
>   a = 2;
> }

That test case doesn't run into 'omp_build_component_ref' at all,
but I'm attaching an altered and extended variant that does,
"Add 'libgomp.c/address-space-1.c'".  OK to push to master branch?

In this case, 'omp_build_component_ref' called via host compilation
'pass_lower_omp', it's the 'field_type' that has 'address-space-1', not
'obj_type', so indeed Kwok's new code is a no-op:

(gdb) call debug_tree(field_type)
 
unit-size 
align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
0x77686498 precision:32 min  max 

pointer_to_this >
unsigned DI
size  constant 64>
unit-size  constant 8>
align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
0x77686b28>

(gdb) call debug_tree(obj_type)
  constant 64>
unit-size  constant 8>
align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
0x77686bd0
fields 
unsigned DI size  unit-size 

align:64 warn_if_not_align:0 symtab:0 alias-set -1 
canonical-type 0x77686b28>
unsigned DI /home/thomas/shared/gcc/omp/as.c:4:14 size  unit-size 
align:64 warn_if_not_align:0 offset_align 128
offset 
bit-offset  context 
> reference_to_this >

The case that Kwok's new code handles, however, is when 'obj_type' has a
non-generic address space, and then propagates that one to 'field_type'.

For a similar OpenACC example, 'omp_build_component_ref' called via GCN
offloading compilation 'pass_omp_oacc_neuter_broadcast', we've got
without Kwok's new code:

(gdb) call debug_tree(field_type)
  constant 8>
unit-size  constant 1>
align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
0x77550b28 precision:1 min  max >

(gdb) call debug_tree(obj_type)
  constant 8>
unit-size  constant 1>
align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
0x77631000
fields  unit-size 
align:8 warn_if_not_align:0 symtab:0 alias-set -1 
canonical-type 0x77550b28 precision:1 min  
max >
unsigned QI :0:0 size  
unit-size 
align:8 warn_if_not_align:0 offset_align 64
offset 
bit-offset  context 
>
pointer_to_this >

..., and with Kwok's new code the 'address-space-4' of 'obj_type' is
propagated to 'field_type':

(gdb) call debug_tree(field_type)
  constant 8>
unit-size  constant 1>
align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
0x77631540 precision:1 min  max >

I'm not familiar enough with these bits to tell whether Kwok's new code
is the right solution to this problem -- or if, for example, the problem
is rather in the SLP vectorizer, where the ICE seems to ultimately
emerge?

Without (ICEs later) vs. with (works) Kwok's new code, we see the
'a.xamdgcn-amdhsa.mkoffload.175t.slp1' dump change as follows (word-diff,
only additional '', occasionally):

[...]
  {++} vector(2) long int * vectp.58;
  {++} vector(2) long int * vectp_.oacc_worker_o.57;
  {++} vector(2) int * vectp.56;
  {++} vector(2) int * vectp_.oacc_worker_o.55;
[...]
  {++} long int * _104;
[...]
  {++} long int * _108;
[...]
   void * _350;
[...]
  _350 = __builtin_gcn_single_copy_start (&.oacc_worker_o.6);
[...]
  MEM <{++} vector(2) long int> [(long int 
*)&.oacc_worker_o.6] = _101;
  _108 = &.oacc_worker_o.6._22 + 16;
  MEM <{++} vector(2) long int> [(long int *)_108] = _100;
  _104 =

[PATCH] libgfortran : Use the libtool macro to determine libm availability.

2021-08-19 Thread Iain Sandoe

Hi,

A while ago I had a report of build failure against a Darwin branch on
the latest OS release.  This was because (temporarily) the symlink
from libm.dylib => libSystem.dylib had been removed/omitted.

libm is not needed on Darwin, and should not be added unconditionally
even if that is (mostly) harmless since it is a symlink to libc.

There could be cases where the addition was not completely harmless
because the presentation of the symlink would cause the symbols exposed
in libSystem to be considered ahead of ones presented in convenience
libraries.

tested on x86_64, i686-darwin, x86_64-linux,
OK for master?
thanks
Iain

libgfortran/ChangeLog:

* Makefile.am: Use configured libm availability.
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Use libtool macro to find libm availability.
* libgfortran.spec.in: Use configured libm availability.
---
 libgfortran/Makefile.am |   2 +-
 libgfortran/Makefile.in |   3 +-
 libgfortran/configure   | 142 
 libgfortran/configure.ac|   1 +
 libgfortran/libgfortran.spec.in |   2 +-
 5 files changed, 147 insertions(+), 3 deletions(-)

diff --git a/libgfortran/Makefile.am b/libgfortran/Makefile.am
index 8d104321567..6fc4b465c6e 100644
--- a/libgfortran/Makefile.am
+++ b/libgfortran/Makefile.am
@@ -42,7 +42,7 @@ libgfortran_la_LINK = $(LINK) $(libgfortran_la_LDFLAGS)
 libgfortran_la_LDFLAGS = -version-info `grep -v '^\#' $(srcdir)/libtool-version` \
$(LTLDFLAGS) $(LIBQUADLIB) ../libbacktrace/libbacktrace.la \
$(HWCAP_LDFLAGS) \
-   -lm $(extra_ldflags_libgfortran) \
+   $(LIBM) $(extra_ldflags_libgfortran) \
$(version_arg) -Wc,-shared-libgcc
 libgfortran_la_DEPENDENCIES = $(version_dep) libgfortran.spec $(LIBQUADLIB_DEP)
 
diff --git a/libgfortran/configure.ac b/libgfortran/configure.ac
index 523eb24bca1..513fd80b936 100644
--- a/libgfortran/configure.ac
+++ b/libgfortran/configure.ac
@@ -260,6 +260,7 @@ AC_PROG_INSTALL
 #AC_MSG_NOTICE([== Starting libtool configuration])
 AC_LIBTOOL_DLOPEN
 AM_PROG_LIBTOOL
+AC_CHECK_LIBM
 ACX_LT_HOST_FLAGS
 AC_SUBST(enable_shared)
 AC_SUBST(enable_static)
diff --git a/libgfortran/libgfortran.spec.in b/libgfortran/libgfortran.spec.in
index 95aa3f842a3..b870e78c151 100644
--- a/libgfortran/libgfortran.spec.in
+++ b/libgfortran/libgfortran.spec.in
@@ -5,4 +5,4 @@
 #
 
 %rename lib liborig
-*lib: @LIBQUADSPEC@ -lm %(libgcc) %(liborig)
+*lib: @LIBQUADSPEC@  @LIBM@ %(libgcc) %(liborig)
-- 
2.24.3 (Apple Git-128)

[PATCH] testsuite, Darwin : Do not claim 'GAS' for cctools assembler.

2021-08-19 Thread Iain Sandoe

Hi,

Although the cctools assembler is based on GNU GAS, it is from a
very old version (1.38) which does not support many of the features
that the target-supports test is expecting***.

tested on i686 and x86_64 darwin versions using the cctools as.
OK for master?

thanks
Iain

*** I guess we could be more clever and parse the output to find a version
and then alter the supports condition to “gas NN”, but I don’t currently
have cycles to implement that.


gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Exclude cctools assembler based on
GAS 1.38.
---
 gcc/testsuite/lib/target-supports.exp | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 44465b14b06..ac9daee26b8 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -9454,7 +9454,14 @@ proc check_effective_target_gas { } {
set status [remote_exec host "$gcc_as" "-v /dev/null"]
set as_output [lindex $status 1]
if { [ string first "GNU" $as_output ] >= 0 } {
-   set use_gas_saved 1
+   # Some Darwin versions have an assembler which is based on an old
+   # version of GAS (and reports GNU assembler in its -v output) but
+   # doesn't support many of the modern GAS features.
+   if { [ string first "cctools" $as_output ] >= 0 } {
+   set use_gas_saved 0
+   } else {
+   set use_gas_saved 1
+   }
} else {
set use_gas_saved 0
}
-- 
2.24.3 (Apple Git-128)
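
The decision the Tcl change makes can be restated as a small C
predicate, purely as an illustration.  The function name is
hypothetical; the only facts taken from the patch are the two
substring checks ("GNU" present, "cctools" absent) on the assembler's
-v output.

```c
#include <stdbool.h>
#include <string.h>

/* Return true when the assembler's -v output claims to be GNU as and
   does not identify itself as the cctools assembler.  */
static bool
looks_like_modern_gas (const char *as_output)
{
  if (strstr (as_output, "GNU") == NULL)
    return false;

  /* cctools as reports "GNU assembler" too, but is based on GAS 1.38,
     so it is excluded.  */
  return strstr (as_output, "cctools") == NULL;
}
```

Any version banner passed to this function in a test is a made-up
sample string, not captured assembler output.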

Re: [PATCH] more warning code refactoring

2021-08-19 Thread Segher Boessenkool

On Thu, Aug 19, 2021 at 12:53:16PM -0600, Martin Sebor wrote:
> That said, I introduced
> the variable in r243470 to begin with and I consider its removal
> a trivially correct and appropriate part of refactoring.

It is not a refactoring.  It changes behaviour.


Segher

Re: [PATCH 1/6] rs6000: Support SSE4.1 "round" intrinsics

2021-08-19 Thread Segher Boessenkool

Hi!

On Thu, Aug 19, 2021 at 01:16:16PM -0500, Paul A. Clarke wrote:
> On Wed, Aug 18, 2021 at 05:46:58PM -0500, Segher Boessenkool wrote:
> > There are __builtin_set_fpscr_rn and friends, please use those, those
> > are optimised for any platform.
> 
> I do.  (Unless I missed an opportunity somewhere?)

It looked to me like you do a lot of unnecessary asm.

> > >   * config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd,
> > >   _mm_ceil_ss, _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss):
> > >   Convert from function to macro.
> > 
> > Please explain why you regress this (not in the changelog of course).
> 
> I'm not sure what "regress" means here?

Macros are from the 1970's, inline functions are the new hot.  Why do
you need macros here?  The patch should say (the patch message likely).

> > > +#define _MM_FROUND_TO_NEAREST_INT   0x00
> > > +#define _MM_FROUND_TO_ZERO  0x01
> > > +#define _MM_FROUND_TO_POS_INF   0x02
> > > +#define _MM_FROUND_TO_NEG_INF   0x03
> > > +#define _MM_FROUND_CUR_DIRECTION 0x04
> > 
> > You can just write "0" .. "4", heh.
> 
> Copied from config/i386/smmintrin.h.

That doesn't make it less silly :-)

> > > +__inline __m128d
> > > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > > +_mm_round_pd (__m128d __A, int __rounding)
> > > +{
> > 
> > Non-static inline is not what you want, esp. with gnu-inline?  Or, what
> > is the goal, and why can you not do it with modern inline?
> 
> This is the same basic signature as the other 600+ intrinsics.
> Actually, they were all described as "extern", but in a previous
> review, you said:
> > "extern" on definitions is superfluous
> So, I've dropped that for newer ones.
> Should they all instead be "static"?
> 
> The goal is to be compatible with the i386 implementations.
> Those typically use something like:
> 
>   extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> 
> (which kinda makes me want to put "extern" back, now that I think
> about it).

"extern" is not redundant for inline functions.  Since you have
always_inline here, gnu_inline extern inline has the same meaning as
static inline in portable C.

> I'm not sure what you mean by "modern inline".

Not using the long deprecated gnu_inline.
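
To make the two spellings under discussion concrete, here is a minimal
sketch that assumes GCC or a compatible compiler.  The function names
are hypothetical and nothing here is taken from smmintrin.h beyond the
attribute list quoted above.

```c
/* i386-header style: with gnu_inline, "extern inline" means the body
   is used only for inlining and no out-of-line copy is emitted;
   always_inline makes sure every call is in fact inlined.  */
extern __inline int
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
add_gnu_inline (int a, int b)
{
  return a + b;
}

/* Portable C spelling: internal linkage, so no external definition
   is needed or produced.  */
static inline int
add_static_inline (int a, int b)
{
  return a + b;
}
```

Both can be called the same way.  The practical difference shows up
only if a call is not inlined or the function's address is taken, in
which case the gnu_inline form needs an external definition elsewhere.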

> > Wrong indent.  This code is very hard to read because of that.
> 
> OK, will fix in v2.

Thanks!

> > If you figure that gee, it would be a nice if we had a builtin for
> > mffsce, then please make one?  :-)
> 
> Is one use-case sufficient grounds?  I can give it a shot if so.

If it is useful for others, then yes please!  Ideally you can make a
builtin that we can also reasonably implement without support for the
new insns, so we can use the builtin whenever the builtin exists.

Thanks,


Segher

Re: [PATCH] PR fortran/100950 - ICE in output_constructor_regular_field, at varasm.c:5514

2021-08-19 Thread Harald Anlauf via Gcc-patches

Hi Tobias,

> I am inclined to say that the Intel compiler has a bug by not
> accepting it – but as written before, I regard sub-string length
> (esp. with const expr) inquiries as an odd corner case which
> is unlikely to occur in real-world code.

ok.

> Still does not work – or rather: ...%t(:)(3:4) [i.e. substring with array section]
> and ...%str(3:4) [i.e. substring of deferred-length scalar] both do work
> but if one combines the two (→ ...%str2(:)(3:4), i.e. substring of deferred-length
> array section), it does not:
> 
> Array ‘r’ at (1) is a variable, which does not reduce to a constant expression
> 
> for:
> 
> --- a/gcc/testsuite/gfortran.dg/pr100950.f90
> +++ b/gcc/testsuite/gfortran.dg/pr100950.f90
> @@ -15,2 +15,3 @@ program p
>character(len=:), allocatable :: str
> + character(len=:), allocatable :: str2(:)
> end type t_
> @@ -24,2 +25,4 @@ program p
> integer,  parameter :: l6 = len (r(1)%str (3:4))
> +  integer,  parameter :: l7 = len (r(1)%str2(1)(3:4))
> +  integer,  parameter :: l8 = len (r(1)%str2(:)(3:4))
> 
> 
> which feels odd.

I agree.  I have revised the code slightly to accept substrings
of deferred-length.  Your suggested variants now work correctly.
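
The length rule that substring_has_constant_len applies to constant
bounds can be sketched outside the compiler as follows.  This is an
illustration only: the function name and the -1 error convention are
assumptions, whereas the real code reports errors through gfc_error
and stores the result in the expression.

```c
/* Length of the substring (istart:iend) of a string of length
   STRING_LEN, or -1 if the bounds are invalid.  An empty range
   (istart > iend) has length 0 and is not range-checked.  */
static long
constant_substring_len (long istart, long iend, long string_len)
{
  if (istart > iend)
    return 0;

  if (istart < 1 || iend > string_len)
    return -1;

  return iend - istart + 1;
}
```

For a deferred-length string the patch uses the end index itself as a
stand-in for the string length, which corresponds to calling this
sketch with string_len equal to iend, so the upper-bound check always
passes.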

> In principle, LGTM – except I wonder what we do about the
> len(r(1)%str(1)(3:4));
> I think we really do handle most code available and I would like to
> close this
> topic – but still it feels a bit odd to leave this bit out.

That is handled now as discussed, see attached final patch.
Regtested again.

> I was also wondering whether we should check that the
> compile-time simplification works – i.e. use -fdump-tree-original for this;
> I attached a patch for this.

I added this to the final patch and have taken the liberty to push the result
to master as d881460deb1f0bdfc3e8fa2d391a03a9763cbff4.

Thanks for your patience, given the rather extensive review...

Harald
diff --git a/gcc/fortran/simplify.c b/gcc/fortran/simplify.c
index c27b47aa98f..492867e12cb 100644
--- a/gcc/fortran/simplify.c
+++ b/gcc/fortran/simplify.c
@@ -4512,6 +4512,78 @@ gfc_simplify_leadz (gfc_expr *e)
 }


+/* Check for constant length of a substring.  */
+
+static bool
+substring_has_constant_len (gfc_expr *e)
+{
+  gfc_ref *ref;
+  HOST_WIDE_INT istart, iend, length;
+  bool equal_length = false;
+
+  if (e->ts.type != BT_CHARACTER)
+return false;
+
+  for (ref = e->ref; ref; ref = ref->next)
+if (ref->type != REF_COMPONENT && ref->type != REF_ARRAY)
+  break;
+
+  if (!ref
+  || ref->type != REF_SUBSTRING
+  || !ref->u.ss.start
+  || ref->u.ss.start->expr_type != EXPR_CONSTANT
+  || !ref->u.ss.end
+  || ref->u.ss.end->expr_type != EXPR_CONSTANT
+  || !ref->u.ss.length)
+return false;
+
+  /* For non-deferred strings the given length shall be constant.  */
+  if (!e->ts.deferred
+  && (!ref->u.ss.length->length
+	  || ref->u.ss.length->length->expr_type != EXPR_CONSTANT))
+return false;
+
+  /* Basic checks on substring starting and ending indices.  */
+  if (!gfc_resolve_substring (ref, &equal_length))
+return false;
+
+  istart = gfc_mpz_get_hwi (ref->u.ss.start->value.integer);
+  iend = gfc_mpz_get_hwi (ref->u.ss.end->value.integer);
+
+  if (istart <= iend)
+{
+  if (istart < 1)
+	{
+	  gfc_error ("Substring start index (" HOST_WIDE_INT_PRINT_DEC
+		 ") at %L below 1",
+		 istart, &ref->u.ss.start->where);
+	  return false;
+	}
+
+  /* For deferred strings use end index as proxy for length.  */
+  if (e->ts.deferred)
+	length = iend;
+  else
+	length = gfc_mpz_get_hwi (ref->u.ss.length->length->value.integer);
+  if (iend > length)
+	{
+	  gfc_error ("Substring end index (" HOST_WIDE_INT_PRINT_DEC
+		 ") at %L exceeds string length",
+		 iend, &ref->u.ss.end->where);
+	  return false;
+	}
+  length = iend - istart + 1;
+}
+  else
+length = 0;
+
+  /* Fix substring length.  */
+  e->value.character.length = length;
+
+  return true;
+}
+
+
 gfc_expr *
 gfc_simplify_len (gfc_expr *e, gfc_expr *kind)
 {
@@ -4521,7 +4593,8 @@ gfc_simplify_len (gfc_expr *e, gfc_expr *kind)
   if (k == -1)
 return &gfc_bad_expr;

-  if (e->expr_type == EXPR_CONSTANT)
+  if (e->expr_type == EXPR_CONSTANT
+  || substring_has_constant_len (e))
 {
   result = gfc_get_constant_expr (BT_INTEGER, k, &e->where);
   mpz_set_si (result->value.integer, e->value.character.length);
diff --git a/gcc/testsuite/gfortran.dg/pr100950.f90 b/gcc/testsuite/gfortran.dg/pr100950.f90
new file mode 100644
index 000..cb9d126bc18
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr100950.f90
@@ -0,0 +1,53 @@
+! { dg-do run }
+! { dg-additional-options "-fdump-tree-original" }
+! PR fortran/100950 - ICE in output_constructor_regular_field, at varasm.c:5514
+
+program p
+  character(8), parameter :: u = "123"
+  character(8):: x = "", s
+  character(2):: w(2) = [character(len(x(3:4))) :: 'a','b' ]
+  character(*), parameter :: y(*) =

[PATCH] Jit, testsuite: Amend expect processing to tolerate more platforms.

2021-08-19 Thread Iain Sandoe

Hi,

Preface:

this is the last patch for now in my series - with this applied Darwin
reports the same results as Linux (at least, for modern x86_64
platform versions).

Note
a)  that the expect expression in {fixed}host_execute seems to depend
on the assumption that the dejagnu.h output is used by the testcase
and that the executable’s output can be seen to end with the totals
produced there (which might in itself be erroneous, see 3).

b) the main GCC testsuite processing does not do this; rather the expect
expression is somewhat simple and the output from the executable
is copied into a secondary buffer, which is then processed by
prune expressions and then to find the requisite matches (so most
of the issues seen below do not occur there).

 patch discussion

The current 'fixed_host_execute' implementation fails on Darwin
platforms for a number of reasons:

1/ If the sub-process spawn fails (e.g. because of missing or mal-
   formed params); rather than reporting the fail output into the
   match stream, as indicated by the expect manual, it terminates
   the script.

 - We fix this by (a) checking that the executable is valid as well
   as existing (b) we put the spawn into a catch block and report
   a failure.

2/ There is no recovery path at all for a buffer-full case (and we
   do see buffer-full events with the default sizes).

 - Added by the patch here, however it is not as sophisticated as
   the methods used by dejagnu internally.  Here we set the process
   to be "nowait" and then close the connection - with the intent
   that this will terminate the spawned process.

3/  The expect logic assumes that 'Totals:' is a valid indicator
for the end of the spawned process output.  This is not true
even for the default dejagnu header (there are a number of
additional reporting lines after).  In addition to this, there
are some tests that intentionally produce more output after
the totals report (and there are tests that do not use that
mechanism at all).

The effect is that we might arrive at the "wait" for the spawned
process to finish - but that process might not have completed
all its output.  For Darwin, at least, that causes a deadlock
between expect and the spawnee - the latter is doing a non-
cancellable write and the former is waiting for the latter to
terminate.  For some reason this does not seem to affect Linux;
perhaps the pty implementation allows the write(s) to
proceed even though there is no reader.

 -  This is fixed by modifying the loop termination condition to be
either EOF (which will be the 'correct' condition) or a timeout
which would represent an error either in the runtime or in the
parsing of the output.  As added precautions, we only try to
wait if there is a correctly-spawned process, and we are also
specific about which process we are waiting for.

4/  Darwin appears to have a bug in either the tcl or termios
'cooking' code that occasionally inserts an additional CR char
into the stream - thus '\n' => '\r\r\n' instead of '\r\n'. The
original program output is correct (it only contains a single
\n) - the additional character is being inserted somewhere in
the translations applied before the output reaches expect.

The logic of this expect implementation does not tolerate single
\r or \n characters (it will fail with a timeout or buffer-full
if that occurs).

 -  This is fixed by having a line-end match that is adjusted for
Darwin.

5/  The default buffer size does seem to be too small in some cases
noting that GCC uses 1 as the match buffer size and the
default is 2000.

 -  Fixed by increasing the size to 8192.

6/  There is a somewhat arbitrary dumping of output where we match
^$prefix\tSOMETHING... and then process the something.  This
essentially allows the match to start at any place in the buffer
following any collection of non-line-end chars.

 -  Fixed by amending the match for 'general' lines to accommodate
these cases, and reporting such lines to the log.  At least this
should allow debugging of any cases where output that should be
recognized is being dropped.
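
As an illustration of the line-end tolerance described in item 4, the
following sketch strips a terminator of the form "any number of CR
followed by LF".  It is not the Tcl/expect pattern from the patch (the
helper name is hypothetical); it only shows the same idea of accepting
'\n', '\r\n' and '\r\r\n' alike.

```c
#include <stddef.h>
#include <string.h>

/* Remove one trailing line terminator matching \r*\n from LINE, in
   place, and return the new length.  A line without a trailing '\n'
   is left untouched.  */
static size_t
chomp_line_end (char *line)
{
  size_t len = strlen (line);

  if (len == 0 || line[len - 1] != '\n')
    return len;

  /* Drop the LF, then any CRs that precede it.  */
  --len;
  while (len > 0 && line[len - 1] == '\r')
    --len;

  line[len] = '\0';
  return len;
}
```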

tested on i686, x86_64-darwin, x86_64,powerpc64-linux,
OK for master?
thanks
Iain

Signed-off-by: Iain Sandoe 

gcc/testsuite/ChangeLog:

* jit.dg/jit.exp (fixed_host_execute): Amend the match and
exit conditions to cater for more platforms.
---
 gcc/testsuite/jit.dg/jit.exp | 123 +++
 1 file changed, 83 insertions(+), 40 deletions(-)

diff --git a/gcc/testsuite/jit.dg/jit.exp b/gcc/testsuite/jit.dg/jit.exp
index 005ba01601a..8541a44e1b2 100644
--- a/gcc/testsuite/jit.dg/jit.exp
+++ b/gcc/testsuite/jit.dg/jit.exp
@@ -167,6 +167,9 @@ proc fixed_host_execute {args} {
 if {![file exists ${executable}]} {
perror "The executable, \"$executable\" is missing" 0
return "No source file

Re: [PATCH] more warning code refactoring

2021-08-19 Thread Martin Sebor via Gcc-patches


On 8/19/21 9:36 AM, Segher Boessenkool wrote:
> On Thu, Aug 19, 2021 at 09:03:44AM -0600, Martin Sebor via Gcc-patches wrote:
>> On 8/18/21 11:56 PM, Kewen.Lin wrote:
>>> To get rid of GTY variable alloc_object_size_limit looks suspicious,
>>> maybe tree objects returned by alloc_max_size after the change are out
>>> of GC's tracking?
>>
>> I wouldn't expect that to make a difference.  There are thousands
>> of similar calls to build_int_cst() throughout the middle end.
>
> But it does make a huge difference.  It has nothing to do with the calls
> to build_int_cst, either.
>
> Please don't mess with GTY (and call it a refactoring, to add insult to
> injury) if you do not know what you are doing :-(

As I explained, the leak was caused by failing to pair the call to
enable_ranger() with the corresponding call to disable_ranger().
The removal of the global GTY variable has no impact on the test
case, or on memory usage in general.  That said, I introduced
the variable in r243470 to begin with and I consider its removal
a trivially correct and appropriate part of refactoring.

>> Looking at the original patch, the change that I'm not sure about
>> and that shouldn't have been part of the refactoring is the call
>> to enable_ranger() in pass_waccess::execute().  It's something
>> I was planning to do next.  But even that I wouldn't expect to
>> eat up a whole 1GB of memory.
>
> The testcase is super heavy in the instruction combiner, so you get a
> lot of garbage.  Which is not a problem, except you made the garbage
> collector not pick this up, so we get a ton of garbage accumulating.

Neither that nor anything else you said is relevant or helpful.
On the off chance you were trying to be and not just using this as
an opportunity to lecture, I recommend you do your homework
first next time and choose your words more carefully.

Martin

[PATCH] configure: Allow a host makefile fragment to override PIE flag settings.

2021-08-19 Thread Iain Sandoe

Hi,

This concerns the settings of flags (using the host makefile fragment) for
tools that will run on the host.

At present the (no)PIE flags are computed in gcc/configure but it is not
possible to override them (either from higher level Makefile or from the
command line).  Secondly the ordering of placement of flags assumes
ELF semantics are OK for ordering of -fno-PIE and -fPIC.

For Darwin, this introduces problems if -fno-PIE causes PIC to be switched
off and the bootstrap compiler does not support -mdynamic-no-pic (which
is the case when we bootstrap a 32b toolchain with clang).  This causes
the host files to be built '-static' which is not legal for user-space
code, and the build terminates with illegal relocations (so that bootstrap
fails).

This patch:

1. Allows the computed PIE flags to be overridden by the top level
   configure.

2. Allows a host fragment to set values on the basis of the configured
   host platform/version.

3. Sets suitable values for the Darwin cases that currently fail.
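
The flag selection that the mh-darwin fragment performs for stage 1
can be restated as a one-line function, purely as an illustration.
The function name is hypothetical; the two flag strings and the
condition are the ones used in the fragment (the test of
BOOTSTRAP_TOOL_CAN_USE_MDYNAMIC_NO_PIC).

```c
#include <stdbool.h>

/* Value assigned to STAGE1_NO_PIE_CFLAGS: when the bootstrap tool
   cannot use -mdynamic-no-pic, -fno-PIE is followed by -fPIC so that
   PIC code generation is switched back on.  */
static const char *
stage1_no_pie_cflags (bool tool_can_use_mdynamic_no_pic)
{
  return tool_can_use_mdynamic_no_pic ? "-fno-PIE" : "-fno-PIE -fPIC";
}
```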

tested on i686,powerpc,x86-64-darwin, x86_64, powerpc64-linux,
OK for master?

thanks
Iain

Signed-off-by: Iain Sandoe 

ChangeLog:

* Makefile.in: Regenerated.
* Makefile.tpl: Add PIE flags to HOST_EXPORTS and
POST_STAGE1_HOST_EXPORTS as specified by the host makefile
fragment.

config/ChangeLog:

* mh-darwin: Specify suitable PIE/PIC flags for 32b Darwin
hosts.

gcc/ChangeLog:

* configure: Regenerate.
* configure.ac: Allow top level configure to override assumed
PIE flags.
---
 Makefile.in  | 14 ++
 Makefile.tpl | 14 ++
 config/mh-darwin | 20 
 gcc/configure|  4 ++--
 gcc/configure.ac |  4 ++--
 5 files changed, 48 insertions(+), 8 deletions(-)

diff --git a/Makefile.tpl b/Makefile.tpl
index 9adf4f94728..9523be5a761 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -576,6 +576,20 @@ all:
 @host_makefile_frag@
 ###
 
+# Allow host makefile fragment to override PIE settings.
+ifneq ($(STAGE1_NO_PIE_CFLAGS),)
+  HOST_EXPORTS += export NO_PIE_CFLAGS="$(STAGE1_NO_PIE_CFLAGS)";
+endif
+ifneq ($(STAGE1_NO_PIE_FLAG),)
+  HOST_EXPORTS += export NO_PIE_FLAG="$(STAGE1_NO_PIE_FLAG)";
+endif
+ifneq ($(BOOT_NO_PIE_CFLAGS),)
+  POSTSTAGE1_HOST_EXPORTS += export NO_PIE_CFLAGS="$(BOOT_NO_PIE_CFLAGS)";
+endif
+ifneq ($(BOOT_NO_PIE_FLAG),)
+  POSTSTAGE1_HOST_EXPORTS += export NO_PIE_FLAG="$(BOOT_NO_PIE_FLAG)";
+endif
+
 # This is the list of directories that may be needed in RPATH_ENVVAR
 # so that programs built for the target machine work.
 TARGET_LIB_PATH = [+ FOR target_modules +][+
diff --git a/config/mh-darwin b/config/mh-darwin
index b72835ae953..d3c357a0574 100644
--- a/config/mh-darwin
+++ b/config/mh-darwin
@@ -7,6 +7,13 @@
 # We use Werror, since some versions of clang report unknown command line flags
 # as a warning only.
 
+# In addition, all versions of clang released to date treat -fno-PIE in -m32
+# compilations  as switching PIC code off too, which means that -fno-PIE,
+# without -mdynamic-no-pic produces broken relocations (and we cannot enable
+# -mdynamic-no-pic because the inverse setting doesn't work).  To work around
+# this, we need to ensure that the no-PIE option is followed by -fPIE, when
+# the compiler does not support mdynamic-no-pic.
+
 # We only need to determine this for the host tool used to build stage1 (or a
 # non-bootstrapped compiler), later stages will be built by GCC which supports
 # the required flags.
@@ -24,23 +31,28 @@ endif
 @if gcc-bootstrap
 ifeq (${BOOTSTRAP_TOOL_CAN_USE_MDYNAMIC_NO_PIC},true)
 STAGE1_CFLAGS += -mdynamic-no-pic
+STAGE1_NO_PIE_CFLAGS = -fno-PIE
 else
 STAGE1_CFLAGS += -fPIC
+STAGE1_NO_PIE_CFLAGS = -fno-PIE -fPIC
 endif
 ifeq (${host_shared},no)
 # Add -mdynamic-no-pic to later stages when we know it is built with GCC.
 BOOT_CFLAGS += -mdynamic-no-pic
+BOOT_NO_PIE_CFLAGS = -fno-PIE
+else
+BOOT_NO_PIE_CFLAGS = -fno-PIE -fPIC
 endif
 @endif gcc-bootstrap
 
 @unless gcc-bootstrap
 ifeq (${BOOTSTRAP_TOOL_CAN_USE_MDYNAMIC_NO_PIC},true)
-# FIXME: we should also enable this for cross and non-bootstrap builds but
-# that needs amendment to libcc1.
-# CFLAGS += -mdynamic-no-pic
-# CXXFLAGS += -mdynamic-no-pic
+CFLAGS += -mdynamic-no-pic
+CXXFLAGS += -mdynamic-no-pic
+STAGE1_NO_PIE_CFLAGS = -fno-PIE
 else
 CFLAGS += -fPIC
 CXXFLAGS += -fPIC
+STAGE1_NO_PIE_CFLAGS = -fno-PIE -fPIC
 endif
 @endunless gcc-bootstrap

diff --git a/gcc/configure.ac b/gcc/configure.ac
index ad8fa5a4604..fad9b27879e 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -7580,7 +7580,7 @@ AC_CACHE_CHECK([for -fno-PIE option],
  [gcc_cv_c_no_fpie=no])
CXXFLAGS="$saved_CXXFLAGS"])
 if test "$gcc_cv_c_no_fpie" = "yes"; then
-  NO_PIE_CFLAGS="-fno-PIE"
+  NO_PIE_CFLAGS=${NO_PIE_CFLAGS-"-fno-PIE"}
 fi
 AC_SUBST([NO_PIE_CFLAGS])
 
@@ -7594,7 +7594,7 @@ AC_CACHE_CHECK([for -no-pie option],
  [gcc_cv_no_pie=no])
LDFLAGS="$saved_LDFLAGS"])
 if test

[committed] release ranger instance in pass_waccess (PR 101984)

2021-08-19 Thread Martin Sebor via Gcc-patches


The changes in last night's patch to the new access warning pass
(somewhat prematurely) included a call to enable_ranger() with no
matching call to disable_ranger().  The two calls must be paired
in order for the latter to release resources allocated by
the former, otherwise the resources leak and might cause GCC to
run out of memory (as was observed in PR 101984).

Besides a native x86_64 build I have also tested the change with
the affected test and a powerpc-linux cross-compiler simply by
observing memory usage.  Committed as obvious in r12-3031.

Martin
PR middle-end/101984 - gimple-ssa-warn-access memory leak

gcc/ChangeLog:

	PR middle-end/101984
	* gimple-ssa-warn-access.cc (pass_waccess::execute): Also call
	disable_ranger.

diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
index f3efe564af0..4a2dd9ade77 100644
--- a/gcc/gimple-ssa-warn-access.cc
+++ b/gcc/gimple-ssa-warn-access.cc
@@ -3310,12 +3310,16 @@ pass_waccess::check (basic_block bb)
 unsigned
 pass_waccess::execute (function *fun)
 {
+  /* Create a new ranger instance and associate it with FUN.  */
   m_ranger = enable_ranger (fun);
 
   basic_block bb;
   FOR_EACH_BB_FN (bb, fun)
 check (bb);
 
+  /* Release the ranger instance and replace it with a global ranger.  */
+  disable_ranger (fun);
+
   return 0;
 }

[pushed] Objective-C, NeXT runtime: Correct the default for fobjc-nilcheck.

2021-08-19 Thread Iain Sandoe

Hi,

It is intended that the default for the NeXT runtime at ABI 2 is to
check for nil message receivers.  This patch updates the default to match
the documented behaviour and the behaviour of the system tools.

tested on x86_64, i686-darwin, x86_64-linux,
pushed to master, thanks
Iain

Signed-off-by: Iain Sandoe 

gcc/objc/ChangeLog:

* objc-next-runtime-abi-02.c (objc_next_runtime_abi_02_init):
Default receiver nilchecks on.
---
 gcc/objc/objc-next-runtime-abi-02.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/objc/objc-next-runtime-abi-02.c b/gcc/objc/objc-next-runtime-abi-02.c
index 0d963e357c4..ce831fc34ae 100644
--- a/gcc/objc/objc-next-runtime-abi-02.c
+++ b/gcc/objc/objc-next-runtime-abi-02.c
@@ -254,6 +254,10 @@ objc_next_runtime_abi_02_init (objc_runtime_hooks *rthooks)
   flag_objc_sjlj_exceptions = 0;
 }
 
+  /* NeXT ABI 2 is intended to default to checking for nil receivers.  */
+  if (! global_options_set.x_flag_objc_nilcheck)
+flag_objc_nilcheck = 1;
+
   rthooks->initialize = next_runtime_02_initialize;
   rthooks->default_constant_string_class_name = DEF_CONSTANT_STRING_CLASS_NAME;
   rthooks->tag_getclass = TAG_GETCLASS;
-- 
2.24.3 (Apple Git-128)

Re: [committed] Drop stabs support from h8300 and v850 ports

2021-08-19 Thread Gerald Pfeifer

On Thu, 19 Aug 2021, Jeff Law via Gcc-patches wrote:
> Whee, two more ports dropping stabs. Committed to the trunk.

Are you saying you're on a mission to stab wooden stakes into stabs?

SCNR :-)
Gerald

Re: [PATCH] aix: handle 64bit inodes for include directories

2021-08-19 Thread Jeff Law via Gcc-patches





On 6/28/2021 1:16 AM, CHIGOT, CLEMENT wrote:

On 6/23/2021 12:53 AM, CHIGOT, CLEMENT via Gcc-patches wrote:

Hi David,

Did you have a chance to take a look at this patch?

Thanks,
Clément



+DavidMalcolm

Can you review this patch when you have a moment?

Thanks, David

On Mon, May 17, 2021 at 3:05 PM David Edelsohn  wrote:

The aix.h change is okay with me, but you need to get approval for the
incpath.c and cpplib.h parts of the patch from the appropriate
maintainers.

Thanks, David

On Mon, May 17, 2021 at 7:44 AM CHIGOT, CLEMENT  wrote:

On AIX, stat will store inodes in 32bit even when using LARGE_FILES.
If the inode is larger, it will return -1 in st_ino.
Thus, in incpath.c when comparing include directories, if several
of them have 64bit inodes, they will be considered as duplicated.

gcc/ChangeLog:
2021-05-06  Clément Chigot  

   * configure.ac: Check sizeof ino_t and dev_t.
   * config.in: Regenerate.
   * configure: Regenerate.
   * config/rs6000/aix.h (HOST_STAT_FOR_64BIT_INODES): New define.
   * incpath.c (HOST_STAT_FOR_64BIT_INODES): New define.
   (remove_duplicates): Use it.

libcpp/ChangeLog:
2021-05-06  Clément Chigot  

   * configure.ac: Check sizeof ino_t and dev_t.
   * config.in: Regenerate.
   * configure: Regenerate.
   * include/cpplib.h (INO_T_CPP): Change for AIX.
   (DEV_T_CPP): New macro.
   (struct cpp_dir): Use it.

So my worry here is this is really a host property -- ie, this is
behavior of where GCC runs, not the target for which GCC is generating code.

That implies that the change in aix.h is wrong.  aix.h is for the
target, not the host -- you don't want to define something like
HOST_STAT_FOR_64BIT_INODES there.

You'd want to be triggering this behavior via a host fragment, x-aix, or
better yet via an autoconf test.

Indeed, would this version be better?  I'm not sure about the configure test.
But as we are retrieving the sizes of dev_t and ino_t just above, I'm assuming
that those are the types stat uses directly.  At least, that's the case on AIX,
and this test is only made for AIX.

It's a clear improvement.  It's still checking for the aix target though:

+# Select the right stat being able to handle 64bit inodes, if needed.
+if test "$enable_largefile" != no; then
+  case "$target" in
+    *-*-aix*)
+  if test "$ac_cv_sizeof_ino_t" == "4" -a "$ac_cv_sizeof_dev_t" == 4; then

+
+$as_echo "#define HOST_STAT_FOR_64BIT_INODES stat64x" >>confdefs.h
+
+  fi;;
+  esac
+fi

Again, we're dealing with a host property.  You might be able to just 
change $target above to $host.  Hmm, that makes me wonder about Canadian
crosses where host != build.  We may need to do this for both the AIX
host and AIX build.


Sorry about the delay,
jeff
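
The duplicate-directory problem from the original report can be modelled in a few lines of C++. This is a sketch under stated assumptions, not the incpath.c code: `DirId`, `narrow_stat` and `same_directory` are hypothetical names, and the narrowing rule simply follows the description above (an inode that does not fit in 32 bits is reported as -1).

```cpp
#include <cassert>
#include <cstdint>

// Identity of a directory as reported by a stat-like call.
struct DirId
{
  std::uint64_t dev;
  std::uint64_t ino;
};

// Model of a stat call with a 32-bit st_ino: any inode that does not
// fit is reported as (uint32_t)-1, as described in the report.
DirId
narrow_stat (DirId real)
{
  constexpr std::uint64_t max32 = 0xFFFFFFFFu;
  if (real.ino > max32)
    real.ino = max32;
  return real;
}

// Two include directories are treated as duplicates when both the
// device and the inode match.
bool
same_directory (const DirId &a, const DirId &b)
{
  return a.dev == b.dev && a.ino == b.ino;
}
```

Two distinct directories on the same device whose inodes both exceed 32 bits compare equal after `narrow_stat`, which is exactly the false duplicate the patch avoids by selecting a 64-bit-capable stat on the host.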

[committed] Drop stabs support from h8300 and v850 ports

2021-08-19 Thread Jeff Law via Gcc-patches


Whee, two more ports dropping stabs. Committed to the trunk.

Jeff
commit 18e9e7db7afb8635316414b560c10852db13c4c1
Author: Jeff Law 
Date:   Thu Aug 19 14:15:03 2021 -0400

Drop stabs from h8/300 and v850 ports

gcc/
* config.gcc (h8300-*-elf*): Do not include dbxelf.h.
(h8300-*-linux*, v850-*-rtems*, v850*-elf*): Likewise.
* config/v850/v850.h (DEFAULT_GDB_EXTENSIONS): Remove.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index eb232df6df4..08e6c6779a5 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1677,11 +1677,11 @@ moxie-*-moxiebox*)
;;
 h8300-*-elf*)
tmake_file="h8300/t-h8300"
-   tm_file="h8300/h8300.h dbxelf.h elfos.h newlib-stdint.h h8300/elf.h"
+   tm_file="h8300/h8300.h elfos.h newlib-stdint.h h8300/elf.h"
;;
 h8300-*-linux*)
tmake_file="${tmake_file} h8300/t-h8300 h8300/t-linux"
-   tm_file="h8300/h8300.h dbxelf.h elfos.h gnu-user.h linux.h glibc-stdint.h h8300/linux.h"
+   tm_file="h8300/h8300.h elfos.h gnu-user.h linux.h glibc-stdint.h h8300/linux.h"
;;
 hppa*64*-*-linux*)
target_cpu_default="MASK_PA_11|MASK_PA_20"
@@ -3473,7 +3473,7 @@ tilepro*-*-linux*)
;;
 v850-*-rtems*)
target_cpu_default="TARGET_CPU_generic"
-   tm_file="dbxelf.h elfos.h v850/v850.h"
+   tm_file="elfos.h v850/v850.h"
tm_file="${tm_file} v850/rtems.h rtems.h newlib-stdint.h"
tmake_file="${tmake_file} v850/t-v850"
tmake_file="${tmake_file} v850/t-rtems"
@@ -3502,11 +3502,7 @@ v850*-*-*)
target_cpu_default="TARGET_CPU_generic"
;;
esac
-   tm_file="dbxelf.h elfos.h newlib-stdint.h v850/v850.h"
-   if test x$stabs = xyes
-   then
-   tm_file="${tm_file} dbx.h"
-   fi
+   tm_file="elfos.h newlib-stdint.h v850/v850.h"
use_collect2=no
c_target_objs="v850-c.o"
cxx_target_objs="v850-c.o"
diff --git a/gcc/config/v850/v850.h b/gcc/config/v850/v850.h
index 386f9f59e0b..51622684622 100644
--- a/gcc/config/v850/v850.h
+++ b/gcc/config/v850/v850.h
@@ -694,9 +694,6 @@ typedef enum
   if ((LOG) != 0)  \
 fprintf (FILE, "\t.align %d\n", (LOG))
 
-/* We don't have to worry about dbx compatibility for the v850.  */
-#define DEFAULT_GDB_EXTENSIONS 1
-
 /* Use dwarf2 debugging info by default.  */
 #undef  PREFERRED_DEBUGGING_TYPE
 #define PREFERRED_DEBUGGING_TYPE   DWARF2_DEBUG

Re: [PATCH 1/6] rs6000: Support SSE4.1 "round" intrinsics

2021-08-19 Thread Paul A. Clarke via Gcc-patches

On Wed, Aug 18, 2021 at 05:46:58PM -0500, Segher Boessenkool wrote:
> On Mon, Aug 09, 2021 at 03:23:50PM -0500, Paul A. Clarke wrote:
> > Suppress exceptions (when specified), by saving, manipulating, and
> > restoring the FPSCR.  Similarly, save, set, and restore the floating-point
> > rounding mode when required.
> > 
> > No attempt is made to optimize writing the FPSCR (by checking if the new
> > value would be the same), other than using lighter weight instructions
> > when possible.
> 
> There are __builtin_set_fpscr_rn and friends, please use those, those
> are optimised for any platform.

I do.  (Unless I missed an opportunity somewhere?)

The "optimize" comment refers to, for example, not checking the current
rounding mode before setting and restoring it.

> > * config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd,
> > _mm_ceil_ss, _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss):
> > Convert from function to macro.
> 
> Please explain why you regress this (not in the changelog of course).

I'm not sure what "regress" means here?

I should've said that these are now identical implementations to those
found in config/i386/smmintrin.h.  I'll add that to the commit message
in v2.

> > +/* Rounding mode macros. */
> > +#define _MM_FROUND_TO_NEAREST_INT   0x00
> > +#define _MM_FROUND_TO_ZERO  0x01
> > +#define _MM_FROUND_TO_POS_INF   0x02
> > +#define _MM_FROUND_TO_NEG_INF   0x03
> > +#define _MM_FROUND_CUR_DIRECTION    0x04
> 
> You can just write "0" .. "4", heh.

Copied from config/i386/smmintrin.h.

> > +#define _MM_FROUND_NINT\
> > +  (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC)
> > +#define _MM_FROUND_FLOOR   \
> > +  (_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC)
> > +#define _MM_FROUND_CEIL\
> > +  (_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC)
> > +#define _MM_FROUND_TRUNC   \
> > +  (_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC)
> > +#define _MM_FROUND_RINT\
> > +  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC)
> > +#define _MM_FROUND_NEARBYINT   \
> > +  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC)
> 
> All these macro definitions will comfortably fit on one line.

Copied from config/i386/smmintrin.h.

> > +__inline __m128d
> > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > +_mm_round_pd (__m128d __A, int __rounding)
> > +{
> 
> Non-static inline is not what you want, esp. with gnu-inline?  Or, what
> is the goal, and why can you not do it with modern inline?

This is the same basic signature as the other 600+ intrinsics.
Actually, they were all described as "extern", but in a previous
review, you said:
> "extern" on definitions is superfluous
So, I've dropped that for newer ones.
Should they all instead be "static"?

The goal is to be compatible with the i386 implementations.
Those typically use something like:

  extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))

(which kinda makes me want to put "extern" back, now that I think
about it).

I'm not sure what you mean by "modern inline".

> > +  __v2df __r;
> > +  union {
> > +double __fr;
> > +long long __fpscr;
> > +  } __save, __tmp;
> > +
> > +  if (__rounding & _MM_FROUND_NO_EXC)
> > +  {
> 
> Wrong indent.  This code is very hard to read because of that.

OK, will fix in v2.

> If you figure that gee, it would be nice if we had a builtin for
> mffsce, then please make one?  :-)

Is one use-case sufficient grounds?  I can give it a shot if so.

> > +case _MM_FROUND_TO_NEAREST_INT:
> > +  __tmp.__fr = __builtin_mffsl ();
> > +  __attribute__((fallthrough));
> 
> Space before (.

OK

> > +case _MM_FROUND_TO_NEAREST_INT |_MM_FROUND_NO_EXC:
> 
> Space after |.

OK

> Please fix these things and resend.

Will do.  Thanks!

PC
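
For readers following the macro discussion, here is a small C++ sketch of how a combined `__rounding` argument splits into a rounding mode and an exception-suppression flag. The five mode values are the ones in the quoted hunk. The values of `_MM_FROUND_RAISE_EXC` (0x00) and `_MM_FROUND_NO_EXC` (0x08) are not shown in the quoted text and are assumed here to match the x86 header. The unprefixed names, `RoundRequest` and `decode_rounding` are hypothetical, and the patch itself switches on the full value rather than decoding it this way.

```cpp
#include <cassert>

// Rounding-mode values as given in the quoted patch hunk.
constexpr int FROUND_TO_NEAREST_INT = 0x00;
constexpr int FROUND_TO_ZERO        = 0x01;
constexpr int FROUND_TO_POS_INF     = 0x02;
constexpr int FROUND_TO_NEG_INF     = 0x03;
constexpr int FROUND_CUR_DIRECTION  = 0x04;

// Assumption: exception-control bits as in the x86 <smmintrin.h>.
constexpr int FROUND_RAISE_EXC = 0x00;
constexpr int FROUND_NO_EXC    = 0x08;

// Two of the composite macros from the quoted hunk.
constexpr int FROUND_FLOOR     = FROUND_TO_NEG_INF | FROUND_RAISE_EXC;
constexpr int FROUND_NEARBYINT = FROUND_CUR_DIRECTION | FROUND_NO_EXC;

struct RoundRequest
{
  int mode;                  // one of the FROUND_TO_* / CUR_DIRECTION values
  bool suppress_exceptions;  // true when the NO_EXC bit is set
};

// Split a combined rounding argument into its two components.
constexpr RoundRequest
decode_rounding (int rounding)
{
  return RoundRequest{ rounding & ~FROUND_NO_EXC,
                       (rounding & FROUND_NO_EXC) != 0 };
}
```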

Re: [PATCH] document enable/disable_ranger

2021-08-19 Thread Andrew MacLeod via Gcc-patches


On 8/19/21 2:00 PM, David Malcolm wrote:

On Thu, 2021-08-19 at 11:30 -0600, Martin Sebor via Gcc-patches wrote:

Hey Aldy & Andrew,

I introduced a leak by calling enable_ranger() without pairing it
with one to disable_ranger() on the same function (PR 101984).
I didn't realize (or look to see) that enable_ranger() dynamically
allocates memory.

The patch below adds comments to make it clear that the calls need
to be paired.  That seems obvious now but wasn't before from just
the function names.  So I'm wondering if we might want to rename
them to make it more obvious that the former involves allocating
memory that must be explicitly deallocated.

If you agree, names along the following lines would make this
clearer (to me, anyway) but I'm open to others:

    gimple_ranger *set_new_ranger (function *);
    void release_ranger (function *);


Could an RAII class help here, to make the acquire/release pairing more
automatic?

Dave


Well, we discussed making it a pass property, among other things, but
there are also use cases we can imagine where we want to be able to have
some alternate control.


I think for the moment it's not overly difficult to create a ranger when
your pass is ready for it, and then dispose of it when you are done.


Andrew

Re: [PATCH 0/4] drop version checks for in-tree gas [PR91602]

2021-08-19 Thread Jeff Law via Gcc-patches





On 8/9/2021 12:46 AM, Serge Belyshev wrote:

Jeff Law  writes:


On 7/20/2021 9:44 AM, Serge Belyshev wrote:

Special-casing checks for in-tree gas features is unnecessary since
r17 which made configure-gcc depend on all-gas, and thus making
alternate code path in gcc_GAS_CHECK_FEATURE for in-tree gas
redundant.

Along the way this fixes PR 91602, which is caused by an incorrect guess
about the presence of leb128 support for RISC-V.

First patch removes alternate code path in gcc_GAS_CHECK_FEATURE and
related code, the rest are further cleanups.  Patches 2 and 3 in
series make no functional changes, thus configure is unchanged.

Bootstrapped/regtested on x86_64-pc-linux-gnu, riscv64-unknown-linux-gnu,
sparc-sun-solaris2.11 and powerpc-ibm-aix7.{1.5.0,2.4.0}, with and without
in-tree binutils (except on aix where combined tree does not appear to work
due to dynamic linker peculiarity).

OK for mainline ?

Serge Belyshev (4):
configure: drop version checks for in-tree gas [PR91602]
configure: remove version argument from gcc_GAS_CHECK_FEATURE
configure: fixup formatting from previous change
configure: remove gas versions from tls check

So just to be clear, the point here is to stop checking the version # and
instead always do a real feature check by testing the behavior of the
assembler, even an in-tree assembler, right?

That is correct, yes.
This set is approved.    Push them to the trunk when it's convenient for 
you.


Thanks for your patience,
jeff

Re: [PATCH] document enable/disable_ranger

2021-08-19 Thread Andrew MacLeod via Gcc-patches


On 8/19/21 1:30 PM, Martin Sebor wrote:

Hey Aldy & Andrew,

I introduced a leak by calling enable_ranger() without pairing it
with one to disable_ranger() on the same function (PR 101984).
I didn't realize (or look to see) that enable_ranger() dynamically
allocates memory.

The patch below adds comments to make it clear that the calls need
to be paired.  That seems obvious now but wasn't before from just
the function names.  So I'm wondering if we might want to rename
them to make it more obvious that the former involves allocating
memory that must be explicitly deallocated.

If you agree, names along the following lines would make this
clearer (to me, anyway) but I'm open to others:

  gimple_ranger *set_new_ranger (function *);
  void release_ranger (function *); 



I think the missing comments alone are enough for now.

OK.

Andrew

Re: [PATCH] Fold more constants during veclower pass.

2021-08-19 Thread Jeff Law via Gcc-patches





On 8/19/2021 9:53 AM, Roger Sayle wrote:

An issue with a backend patch I've been investigating has revealed
a missed optimization opportunity during GCC's vector lowering pass.
An unrecognized insn for "(set (reg:SI) (not:SI (const_int 0))"
revealed that not only was my expander not expecting a NOT with
a constant operand, but also that veclower was producing the
dubious tree expression ~0.

The attached patch replaces a call to gimple_build_assign with a
call to either gimplify_build1 or gimplify_build2 depending upon
whether the operation takes one or two operands.  The net effect
is that where GCC previously produced the following optimized
gimple for testsuite/c-c++-common/Wunused-var-16.c (notice the ~0
and the "& 0"):

void foo ()
{
   V x;
   V y;
   vector(16) unsigned char _1;
   unsigned char _7;
   unsigned char _8;

   y_2 = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
   x_3 = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
   _7 = ~0;
   _1 = {_7, _7, _7, _7, _7, _7, _7, _7, _7, _7, _7, _7, _7, _7, _7, _7};
   _8 = 0 & _7;
   y_4 = {_8, _8, _8, _8, _8, _8, _8, _8, _8, _8, _8, _8, _8, _8, _8, _8};
   v = y_4;
   return;
}

With this patch we now generate:

void foo ()
{
   V x;
   V y;
   vector(16) unsigned char _1;

   y_2 = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
   x_3 = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
   _1 = { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255 };
   y_4 = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
   v = y_4;
   return;
}

This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check".

Ok for mainline?


2021-08-18  Roger Sayle  

gcc/ChangeLog
* tree-vect-generic.c (expand_vector_operations_1): Use either
gimplify_build1 or gimplify_build2 instead of gimple_build_assign
when constructing scalar splat expressions.

gcc/testsuite/ChangeLog
* c-c++-common/Wunused-var-16.c: Add extra check that ~0
is optimized away.

OK
jeff
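
The two constants in the "after" dump follow directly from folding on the 8-bit element type. A short C++ sketch of that arithmetic only; `elem_t`, `fold_not` and `fold_and` are hypothetical helpers for illustration and say nothing about how gimplify_build1/gimplify_build2 are implemented:

```cpp
#include <cassert>

// Element type of the vector(16) unsigned char in the dump.
using elem_t = unsigned char;

// ~x on an 8-bit element: the operand is promoted to int, inverted,
// then truncated back to the element type.
constexpr elem_t
fold_not (elem_t x)
{
  return static_cast<elem_t> (~x);
}

// a & b on 8-bit elements.
constexpr elem_t
fold_and (elem_t a, elem_t b)
{
  return static_cast<elem_t> (a & b);
}

// _7 = ~0 folds to 255, and _8 = 0 & _7 folds to 0.
static_assert (fold_not (0) == 255, "~0 folds to 255 for unsigned char");
static_assert (fold_and (0, fold_not (0)) == 0, "0 & x folds to 0");
```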

Re: [PATCH] document enable/disable_ranger

2021-08-19 Thread David Malcolm via Gcc-patches

On Thu, 2021-08-19 at 11:30 -0600, Martin Sebor via Gcc-patches wrote:
> Hey Aldy & Andrew,
> 
> I introduced a leak by calling enable_ranger() without pairing it
> with one to disable_ranger() on the same function (PR 101984).
> I didn't realize (or look to see) that enable_ranger() dynamically
> allocates memory.
> 
> The patch below adds comments to make it clear that the calls need
> to be paired.  That seems obvious now but wasn't before from just
> the function names.  So I'm wondering if we might want to rename
> them to make it more obvious that the former involves allocating
> memory that must be explicitly deallocated.
> 
> If you agree, names along the following lines would make this
> clearer (to me, anyway) but I'm open to others:
> 
>    gimple_ranger *set_new_ranger (function *);
>    void release_ranger (function *);


Could an RAII class help here, to make the acquire/release pairing more
automatic?

Dave
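
A minimal sketch of what such an RAII wrapper could look like, assuming only the two signatures quoted above. The `function` and `gimple_ranger` types and the bodies of `enable_ranger`/`disable_ranger` below are hypothetical stand-ins so the snippet compiles on its own (the real ones live in gcc/gimple-range.{h,cc}); `ranger_guard`, `live_rangers` and `rangers_after_pass` are invented names, not anything proposed in this thread.

```cpp
#include <cassert>

// Hypothetical stand-ins for the GCC types and functions.
struct gimple_ranger {};
struct function { gimple_ranger *ranger = nullptr; };

static int live_rangers = 0;  // outstanding allocations in this model

gimple_ranger *
enable_ranger (function *fun)
{
  fun->ranger = new gimple_ranger;
  ++live_rangers;
  return fun->ranger;
}

void
disable_ranger (function *fun)
{
  delete fun->ranger;
  fun->ranger = nullptr;
  --live_rangers;
}

// RAII guard: the constructor enables the ranger and the destructor
// disables it, so the two calls cannot get out of step.
class ranger_guard
{
public:
  explicit ranger_guard (function *fun)
    : m_fun (fun), m_ranger (enable_ranger (fun)) {}
  ~ranger_guard () { disable_ranger (m_fun); }

  ranger_guard (const ranger_guard &) = delete;
  ranger_guard &operator= (const ranger_guard &) = delete;

  gimple_ranger *get () const { return m_ranger; }

private:
  function *m_fun;
  gimple_ranger *m_ranger;
};

// Model of a pass running over NFUNCS functions; returns the number of
// rangers still allocated afterwards.
int
rangers_after_pass (int nfuncs)
{
  for (int i = 0; i < nfuncs; ++i)
    {
      function fun;
      ranger_guard guard (&fun);
      (void) guard.get ();  // a real pass would query ranges here
    }
  return live_rangers;
}
```

In this model `rangers_after_pass (6000)` leaves no ranger allocated, whereas calling `enable_ranger` alone in the loop would leave one per function, which is the shape of the leak in PR 101984.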

[PATCH] document enable/disable_ranger

2021-08-19 Thread Martin Sebor via Gcc-patches


Hey Aldy & Andrew,

I introduced a leak by calling enable_ranger() without pairing it
with one to disable_ranger() on the same function (PR 101984).
I didn't realize (or look to see) that enable_ranger() dynamically
allocates memory.

The patch below adds comments to make it clear that the calls need
to be paired.  That seems obvious now but wasn't before from just
the function names.  So I'm wondering if we might want to rename
them to make it more obvious that the former involves allocating
memory that must be explicitly deallocated.

If you agree, names along the following lines would make this
clearer (to me, anyway) but I'm open to others:

  gimple_ranger *set_new_ranger (function *);
  void release_ranger (function *);

Martin

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 60b7d3a59cd..ef3afeacc90 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -381,6 +381,10 @@ gimple_ranger::dump (FILE *f)
   m_cache.dump (f);
 }

+/* Create a new ranger instance and associate it with function FUN.
+   Each call must be paired with a call to disable_ranger to release
+   resources.  */
+
 gimple_ranger *
 enable_ranger (struct function *fun)
 {
@@ -392,6 +396,9 @@ enable_ranger (struct function *fun)
   return r;
 }

+/* Destroy and release the ranger instance associated with function FUN
+   and replace it with the global ranger.  */
+
 void
 disable_ranger (struct function *fun)
 {
diff --git a/gcc/gimple-range.h b/gcc/gimple-range.h
index 41845b14fd6..eaebb9c5833 100644
--- a/gcc/gimple-range.h
+++ b/gcc/gimple-range.h
@@ -62,6 +62,9 @@ protected:
   range_tracer tracer;
 };

+/* Create a new ranger instance and associate it with a function.
+   Each call must be paired with a call to disable_ranger to release
+   resources.  */
 extern gimple_ranger *enable_ranger (struct function *);
 extern void disable_ranger (struct function *);

Re: [PATCH v3, Fortran] TS 29113 testsuite

2021-08-19 Thread Sandra Loosemore


On 7/27/21 5:07 AM, Tobias Burnus wrote:

Hi Sandra, hi Thomas, hi all,

@Thomas K: Comments about the following - and of course to the
testsuite itself - are highly welcome.

In my opinion, the testsuite LGTM and can be committed.

@Sandra:
- Thoughts on the directory name? (cf. below)
- Give others/Thomas a chance to comment on this,
   before committing it. (And remove the now passing xfails.)
   Thanks for the testsuite!

Regarding:

* XFAILS - as discussed before, I think having some XFAILS is
   not ideal but fine, especially if the XFAIL/PASS ratio is low
   and there are plans to fix the known fails, some posted patches
   for those, and open PRs for the issues.

(I think there is one patch pending review and two patches pending
committal with some modifications by Sandra - plus several patches
by José which still need to be reviewed.)


* Naming of the directory + .exp file:
  ts29113/ts29113.exp
   is okay. Given that 'select rank' (in F2018 but not in TS29113)
   is also tested, there was some controversy regarding the name
   and the coverage; additionally, TS29113 is a name which is not
   immediately clear. Thus, we could use some other name like:
  c-interop/c-interop.exp
   or  (suggestions?).
   In any case, I do not feel strongly about either name.

* I had a closer look at earlier versions of the testsuite, I did
   browse through the current one + looked at the diff to previous
   version, but it is big enough and the spec is complex enough that
   I have likely missed something.
   Thus: Additional reviews are highly welcome!


Here is the current version of the testsuite.  Changes since the last 
version include:


* Renaming the directory and .exp file from ts29113 -> c-interop per the 
request above.  There have been no additional review comments.


* I also made it explicit that section and constraint numbers mentioned 
in comments in the test cases refer to TS 29113.  I considered using the 
numbering from 2018 standard, but given that the standard already 
renumbered things twice since the time TS 29113 was published I didn't 
really see the point, as long as it is unambiguous what document is 
being cited.


* I flattened the subdirectory structure after realizing that 
dg-additional-sources can't cope with relative pathnames in remote-host 
testing.


* I split up the typecodes tests (for testing that descriptors 
constructed by the front end match ISO_Fortran_binding.h) to allow for 
finer-grained control over xfails and dg-require-effective-target, and 
added a new effective target for Fortran C_FLOAT128 support.  There are 
also some additional things being tested now in this group.


The current xfails in the tests reflect the two patches I posted last 
night that are still waiting for review:


https://gcc.gnu.org/pipermail/fortran/2021-August/056382.html
https://gcc.gnu.org/pipermail/fortran/2021-August/056383.html

I've been testing on x86 (both 32- and 64-bit) and powerpc64le-linux-gnu.

Given that Tobias already said the last version of the patch was OK, I'd 
like to commit this soon, either at the same time I push the patches 
above, or next week if there is some hold-up on them.  If anybody wants 
more time to review this first, let me know.


-Sandra


ts29113-aug19.patch.gz
Description: application/gzip

Re: [PATCH] more warning code refactoring

2021-08-19 Thread Martin Sebor via Gcc-patches


On 8/19/21 9:03 AM, Martin Sebor wrote:

On 8/18/21 11:56 PM, Kewen.Lin wrote:

Hi David,

on 2021/8/19 上午11:26, David Edelsohn via Gcc-patches wrote:

Hi, Martin

A few PowerPC-specific testcases started failing yesterday on AIX with
a strange failure mode: the compiler runs out of memory.  As you may
expect from telling you this in an email reply to your patch, I have
bisected the failure and landed on your commit.  I can alternate
between the previous commit and your commit, and the failure
definitely appears with your patch, although I'm unsure how your patch
affected memory allocation in the compiler.  Maybe moving the code
changed a type of allocation or some memory no longer is being freed?




Getting rid of the GTY variable alloc_object_size_limit looks suspicious;
maybe tree objects returned by alloc_max_size after the change are out
of GC's tracking?

If the suspicion holds, the attached explorative diff may help.


I wouldn't expect that to make a difference.  There are thousands
of similar calls to build_int_cst() throughout the middle end.

Looking at the original patch, the change that I'm not sure about
and that shouldn't have been part of the refactoring is the call
to enable_ranger() in pass_waccess::execute().  It's something
I was planning to do next.  But even that I wouldn't expect to
eat up a whole 1GB of memory.


I have reproduced the excessive memory consumption with
the rlwimi-0.c test and a powerpc-linux cross-compiler, and
confirmed that it is indeed caused by the call to enable_ranger().
The test defines some six thousand functions, so it seems that
unless each call to enable_ranger() is paired with a call that
releases the memory it allocates, that memory leaks.

The removal of the alloc_object_size_limit global variable doesn't
have any effect on the test case.  The function that used it (and
now calls build_int_cst () instead) isn't called when the test
is compiled.  (It's only called for calls to allocation functions
in the source and the test case has none.)

Let me take care of releasing the ranger memory.

Martin






BR,
Kewen


Previously, compiler bootstrap and all testcases ran with a data size
of 1GB.  After your change, the data size required for those
particular testcases jumped to 2GB.

The testcases are

gcc/testsuite/gcc.target/powerpc/rlwimi-[012].c

The failure is

cc1: out of memory allocating 65536 bytes after a total of 1608979296

This seems like a significant memory use regression.  Any ideas what 
happened?


Not really.  The patch just moved code around.  I didn't make any
changes that I'd expect to impact memory allocation to an appreciable
extent, at least not intentionally.  Let me look into it and get back
to you.

Martin



Thanks, David

Re: [PATCH] Use __builtin_trap() for abort() if inhibit_libc

2021-08-19 Thread Jeff Law via Gcc-patches





On 8/17/2021 2:41 AM, Sebastian Huber wrote:

abort() is used in gcc_assert() and gcc_unreachable() which is used by target
libraries such as libgcov.a.  This patch changes the abort() definition under
certain conditions.  If inhibit_libc is defined and abort is not already
defined, then abort() is defined to __builtin_trap().

The inhibit_libc define is usually defined if GCC is built for targets running
in embedded systems which may optionally use a C standard library.  If
inhibit_libc is defined, then there may be still a full featured abort()
available.  abort() is a heavy weight function which depends on signals and
file streams.  For statically linked applications, this means that a dependency
on gcc_assert() pulls in the support for signals and file streams.  This could
prevent using gcov to test low end targets for example.  Using __builtin_trap()
avoids these dependencies if the target implements a "trap" instruction.  The
application or operating system could use a trap handler to react to failed GCC
runtime checks which caused a trap.

gcc/

* tsystem.h (abort): Define abort() if inhibit_libc is defined and it
is not already defined.

OK.
Jeff
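
The conditional definition described in the patch can be sketched as standalone C++ for GCC or Clang, both of which provide `__builtin_trap`. To avoid redefining the real abort(), the sketch uses the hypothetical name `sketch_abort`; `abort_is_trap` and `checked_div` are likewise illustrative, and `inhibit_libc` is defined locally only because there is no GCC build system here to supply it.

```cpp
#include <cassert>
#include <cstdlib>

// Stand-in for the define normally supplied when building GCC's
// target libraries without a C library.
#define inhibit_libc 1

// Only provide a definition if none exists yet; with inhibit_libc the
// lightweight trap is used instead of the full abort().
#ifndef sketch_abort
# ifdef inhibit_libc
#  define sketch_abort() __builtin_trap ()
constexpr bool abort_is_trap = true;
# else
#  define sketch_abort() std::abort ()
constexpr bool abort_is_trap = false;
# endif
#endif

// A gcc_assert-style runtime check that ends in the selected abort.
int
checked_div (int a, int b)
{
  if (b == 0)
    sketch_abort ();
  return a / b;
}
```

Because the trap variant needs neither signals nor file streams, a statically linked program using such checks does not pull in that part of the C library.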

Re: [PATCH][MIPS] Remove TARGET_ASM_FUNCTION_RODATA_SECTION

2021-08-19 Thread Jeff Law via Gcc-patches





On 8/19/2021 6:11 AM, Dragan Mladjenovic wrote:

Since 'Remove obsolete IRIX 6.5 support' [1] we only use
gp-relative jump-tables for PIC code. We can fall back to
default behaviour for asm_function_rodata_section.

[1] https://gcc.gnu.org/ml/libstdc++/2012-03/msg00067.html

2018-06-04 Dragan Mladjenovic 
gcc/

* config/mips/mips.c (mips_function_rodata_section,
TARGET_ASM_FUNCTION_RODATA_SECTION): Removed.

OK
jeff

Re: [PATCH] more warning code refactoring

2021-08-19 Thread David Edelsohn via Gcc-patches

Hi, Kewen

Good catch!

The patch is in the right direction, but gimple-ssa-warn-access.cc is
the first file that requires GTY and ends in ".cc".  The GCC Makefile
machinery to create the GTY headers performs the substitution for
files with file extension ".c", so this requires more adjustment in
the Makefile.

Thanks, David

On Thu, Aug 19, 2021 at 1:57 AM Kewen.Lin  wrote:
>
> Hi David,
>
> on 2021/8/19 上午11:26, David Edelsohn via Gcc-patches wrote:
> > Hi, Martin
> >
> > A few PowerPC-specific testcases started failing yesterday on AIX with
> > a strange failure mode: the compiler runs out of memory.  As you may
> > expect from telling you this in an email reply to your patch, I have
> > bisected the failure and landed on your commit.  I can alternate
> > between the previous commit and your commit, and the failure
> > definitely appears with your patch, although I'm unsure how your patch
> > affected memory allocation in the compiler.  Maybe moving the code
> > changed a type of allocation or some memory no longer is being freed?
> >
>
>
> Getting rid of the GTY variable alloc_object_size_limit looks suspicious;
> maybe tree objects returned by alloc_max_size after the change are out
> of GC's tracking?
>
> If the suspicion holds, the attached exploratory diff may help.
>
> BR,
> Kewen
>
> > Previously, compiler bootstrap and all testcases ran with a data size
> > of 1GB.  After your change, the data size required for those
> > particular testcases jumped to 2GB.
> >
> > The testcases are
> >
> > gcc/testsuite/gcc.target/powerpc/rlwimi-[012].c
> >
> > The failure is
> >
> > cc1: out of memory allocating 65536 bytes after a total of 1608979296
> >
> > This seems like a significant memory use regression.  Any ideas what 
> > happened?
> >
> > Thanks, David
> >

RE: [x86_64 PATCH] Tweak -Os costs for scalar-to-vector pass.

2021-08-19 Thread Roger Sayle


Doh!  ENOPATCH.

-Original Message-
From: Roger Sayle  
Sent: 19 August 2021 16:59
To: 'GCC Patches' 
Subject: [x86_64 PATCH] Tweak -Os costs for scalar-to-vector pass.


Back in June I briefly mentioned in one of my gcc-patches posts that a
change that should have always reduced code size, would mysteriously
occasionally result in slightly larger code (according to CSiBE):
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573233.html

Investigating further, the cause turns out to be that x86_64's
scalar-to-vector (stv) pass is relying on poor estimates of the size
costs/benefits.  This patch tweaks the backend's compute_convert_gain method
to provide slightly more accurate values when compiling with -Os.
Compilation without -Os is (should be) unaffected.  And for completeness,
I'll mention that the stv pass is a net win for code size so it's much
better to improve its heuristics than simply gate the pass on
!optimize_for_size.

The net effect of this change is to save 1399 bytes on the CSiBE code size
benchmark when compiling with -Os.
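
As a rough model of the size-only adjustment described above, the three CONST_INT cases can be written as a standalone function. `COSTS_N_BYTES` is the macro moved by the patch; `fits_simm32` is a hypothetical stand-in for x86_64_immediate_operand (src, SImode), and `size_gain_for_const` is an invented name, so treat this as a sketch rather than the backend code.

```c
#include <assert.h>
#include <stdint.h>

/* Same definition as the one the patch moves into i386.h.  */
#define COSTS_N_BYTES(N) ((N) * 2)

static_assert (COSTS_N_BYTES (1) == 2, "one byte is two cost units");

/* Hypothetical stand-in for x86_64_immediate_operand (src, SImode):
   true if VALUE fits a sign-extended 32-bit immediate.  */
static int
fits_simm32 (int64_t value)
{
  return value >= INT32_MIN && value <= INT32_MAX;
}

/* Size-only change to the conversion gain for loading constant VALUE
   into a register.  A negative result means the vector form is larger.  */
static int
size_gain_for_const (int64_t value)
{
  if (value == 0)
    return -COSTS_N_BYTES (1);   /* xor (2 bytes) vs. xorps (3 bytes) */
  if (fits_simm32 (value))
    return -COSTS_N_BYTES (2);   /* mov (5 bytes) vs. movaps (7 bytes) */
  return 0;                      /* no adjustment for other constants */
}
```

The byte counts in the comments are the ones given in the patch; the sketch only makes the sign convention explicit.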

This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check" with no new failures.

Ok for mainline?


2021-08-19  Roger Sayle  

gcc/ChangeLog
* config/i386/i386-features.c (compute_convert_gain): Provide
more accurate values for CONST_INT, when optimizing for size.
* config/i386/i386.c (COSTS_N_BYTES): Move definition from here...
* config/i386/i386.h (COSTS_N_BYTES): to here.

Roger
--

diff --git a/gcc/config/i386/i386-features.c b/gcc/config/i386/i386-features.c
index d9c6652..cdae3dc 100644
--- a/gcc/config/i386/i386-features.c
+++ b/gcc/config/i386/i386-features.c
@@ -610,12 +610,31 @@ general_scalar_chain::compute_convert_gain ()
 
  case CONST_INT:
if (REG_P (dst))
- /* DImode can be immediate for TARGET_64BIT and SImode always.  */
- igain += m * COSTS_N_INSNS (1);
+ {
+   if (optimize_insn_for_size_p ())
+ {
+   /* xor (2 bytes) vs. xorps (3 bytes).  */
+   if (src == const0_rtx)
+ igain -= COSTS_N_BYTES (1);
+   /* movdi_internal vs. movv2di_internal.  */
+   /* => mov (5 bytes) vs. movaps (7 bytes).  */
+   else if (x86_64_immediate_operand (src, SImode))
+ igain -= COSTS_N_BYTES (2);
+ }
+   else
+ {
+   /* DImode can be immediate for TARGET_64BIT
+  and SImode always.  */
+   igain += m * COSTS_N_INSNS (1);
+   igain -= vector_const_cost (src);
+ }
+ }
else if (MEM_P (dst))
- igain += (m * ix86_cost->int_store[2]
-   - ix86_cost->sse_store[sse_cost_idx]);
-   igain -= vector_const_cost (src);
+ {
+   igain += (m * ix86_cost->int_store[2]
+ - ix86_cost->sse_store[sse_cost_idx]);
+   igain -= vector_const_cost (src);
+ }
break;
 
  default:
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 4d4ab6a..5abf2a6 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -19982,8 +19982,6 @@ ix86_division_cost (const struct processor_costs *cost,
 return cost->divide[MODE_INDEX (mode)];
 }
 
-#define COSTS_N_BYTES(N) ((N) * 2)
-
 /* Return cost of shift in MODE.
If CONSTANT_OP1 is true, the op1 value is known and set in OP1_VAL.
AND_IN_OP1 specify in op1 is result of and and SHIFT_AND_TRUNCATE
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 21fe51b..edbfcaf 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -88,6 +88,11 @@ struct stringop_algs
   } size [MAX_STRINGOP_ALGS];
 };
 
+/* Analog of COSTS_N_INSNS when optimizing for size.  */
+#ifndef COSTS_N_BYTES
+#define COSTS_N_BYTES(N) ((N) * 2)
+#endif
+
 /* Define the specific costs for a given cpu.  NB: hard_register is used
by TARGET_REGISTER_MOVE_COST and TARGET_MEMORY_MOVE_COST to compute
hard register move costs by register allocator.  Relative costs of

[x86_64 PATCH] Tweak -Os costs for scalar-to-vector pass.

2021-08-19 Thread Roger Sayle



Back in June I briefly mentioned in one of my gcc-patches posts that
a change that should have always reduced code size, would mysteriously
occasionally result in slightly larger code (according to CSiBE):
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573233.html

Investigating further, the cause turns out to be that x86_64's
scalar-to-vector (stv) pass is relying on poor estimates of the size
costs/benefits.  This patch tweaks the backend's compute_convert_gain
method to provide slightly more accurate values when compiling with
-Os. Compilation without -Os is (should be) unaffected.  And for
completeness, I'll mention that the stv pass is a net win for code
size so it's much better to improve its heuristics than simply gate
the pass on !optimize_for_size.

The net effect of this change is to save 1399 bytes on the CSiBE
code size benchmark when compiling with -Os.

This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check" with no new failures.

Ok for mainline?


2021-08-19  Roger Sayle  

gcc/ChangeLog
* config/i386/i386-features.c (compute_convert_gain): Provide
more accurate values for CONST_INT, when optimizing for size.
* config/i386/i386.c (COSTS_N_BYTES): Move definition from here...
* config/i386/i386.h (COSTS_N_BYTES): to here.

Roger
--

[PATCH] Fold more constants during veclower pass.

2021-08-19 Thread Roger Sayle


An issue with a backend patch I've been investigating has revealed
a missed optimization opportunity during GCC's vector lowering pass.
An unrecognized insn for "(set (reg:SI) (not:SI (const_int 0)))"
revealed that not only was my expander not expecting a NOT with
a constant operand, but also that veclower was producing the
dubious tree expression ~0.

The attached patch replaces a call to gimple_build_assign with a
call to either gimplify_build1 or gimplify_build2 depending upon
whether the operation takes one or two operands.  The net effect
is that where GCC previously produced the following optimized
gimple for testsuite/c-c++-common/Wunused-var-16.c (notice the ~0
and the "& 0"):

void foo ()
{
  V x;
  V y;
  vector(16) unsigned char _1;
  unsigned char _7;
  unsigned char _8;

  y_2 = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  x_3 = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  _7 = ~0;
  _1 = {_7, _7, _7, _7, _7, _7, _7, _7, _7, _7, _7, _7, _7, _7, _7, _7};
  _8 = 0 & _7;
  y_4 = {_8, _8, _8, _8, _8, _8, _8, _8, _8, _8, _8, _8, _8, _8, _8, _8};
  v = y_4;
  return;
}

With this patch we now generate:

void foo ()
{
  V x;
  V y;
  vector(16) unsigned char _1;

  y_2 = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  x_3 = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  _1 = { 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255 };
  y_4 = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  v = y_4;
  return;
}
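
The folded values in the second dump can be checked at the source level with GCC's vector extension. This is only a sketch of the arithmetic (every lane of ~0 is 255 for unsigned char, and 0 & 255 is 0); it says nothing about what veclower emits. `and_not` and `all_lanes_equal` are hypothetical helper names.

```c
#include <assert.h>

/* Same vector type as in Wunused-var-16.c.  */
typedef unsigned char V __attribute__ ((vector_size (16)));

/* Same operation as in foo (): y &= ~x.  */
static V
and_not (V y, V x)
{
  y &= ~x;
  return y;
}

/* Return 1 if every lane of VEC equals BYTE, 0 otherwise.  */
static int
all_lanes_equal (V vec, unsigned char byte)
{
  for (int i = 0; i < 16; i++)
    if (vec[i] != byte)
      return 0;
  return 1;
}
```

With both inputs zero, ~x has 255 in every lane and the final result is all zeros, matching the constants shown for _1 and y_4.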

This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check".

Ok for mainline?


2021-08-18  Roger Sayle  

gcc/ChangeLog
* tree-vect-generic.c (expand_vector_operations_1): Use either
gimplify_build1 or gimplify_build2 instead of gimple_build_assign
when constructing scalar splat expressions.

gcc/testsuite/ChangeLog
* c-c++-common/Wunused-var-16.c: Add extra check that ~0
is optimized away.

Roger
--

diff --git a/gcc/testsuite/c-c++-common/Wunused-var-16.c b/gcc/testsuite/c-c++-common/Wunused-var-16.c
index 8bdbcd3..31c7db3 100644
--- a/gcc/testsuite/c-c++-common/Wunused-var-16.c
+++ b/gcc/testsuite/c-c++-common/Wunused-var-16.c
@@ -1,6 +1,6 @@
 /* PR c++/78949 */
 /* { dg-do compile } */
-/* { dg-options "-Wunused" } */
+/* { dg-options "-Wunused -fdump-tree-optimized" } */
 /* { dg-additional-options "-fno-common" { target hppa*-*-hpux* } } */
 
 typedef unsigned char V __attribute__((vector_size(16)));
@@ -14,3 +14,5 @@ foo ()
   y &= ~x;
   v = y;
 }
+
+/* { dg-final { scan-tree-dump-not " ~0" "optimized" } } */
diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index 2e00b3e..0d7f041 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -2162,9 +2162,10 @@ expand_vector_operations_1 (gimple_stmt_iterator *gsi,
   if (op >= FIRST_NORM_OPTAB && op <= LAST_NORM_OPTAB
  && optab_handler (op, TYPE_MODE (TREE_TYPE (type))) != CODE_FOR_nothing)
{
- tree slhs = make_ssa_name (TREE_TYPE (TREE_TYPE (lhs)));
- gimple *repl = gimple_build_assign (slhs, code, srhs1, srhs2);
- gsi_insert_before (gsi, repl, GSI_SAME_STMT);
+ tree stype = TREE_TYPE (TREE_TYPE (lhs));
+ tree slhs = (rhs2 != NULL_TREE)
+ ? gimplify_build2 (gsi, code, stype, srhs1, srhs2)
+ : gimplify_build1 (gsi, code, stype, srhs1);
  gimple_assign_set_rhs_from_tree (gsi,
   build_vector_from_val (type, slhs));
  update_stmt (stmt);

[committed] Fix ada build on hpux

2021-08-19 Thread John David Anglin

Tested on hppa2.0w-hp-hpux11.11.  Committed to trunk.

Dave

Define STAGE1_LIBS to link against libcl.a in stage1 on hpux.

2021-08-19  Arnaud Charlet  

PR ada/101924
gcc/ada/ChangeLog:
* gcc-interface/Make-lang.in (STAGE1_LIBS): Define on hpux.

diff --git a/gcc/ada/gcc-interface/Make-lang.in b/gcc/ada/gcc-interface/Make-lang.in
index b68081ed065..765654fc36b 100644
--- a/gcc/ada/gcc-interface/Make-lang.in
+++ b/gcc/ada/gcc-interface/Make-lang.in
@@ -85,6 +85,10 @@ ifeq ($(strip $(filter-out linux%,$(host_os))),)
   STAGE1_LIBS=-ldl
 endif

+ifeq ($(strip $(filter-out hpux%,$(host_os))),)
+  STAGE1_LIBS=/usr/lib/libcl.a
+endif
+
 ifeq ($(STAGE1),True)
   ADA_INCLUDES=$(COMMON_ADA_INCLUDES)
   adalib=$(dir $(shell $(CC) -print-libgcc-file-name))adalib

Re: [PATCH] more warning code refactoring

2021-08-19 Thread Segher Boessenkool

On Thu, Aug 19, 2021 at 09:03:44AM -0600, Martin Sebor via Gcc-patches wrote:
> On 8/18/21 11:56 PM, Kewen.Lin wrote:
> >Getting rid of the GTY variable alloc_object_size_limit looks suspicious;
> >maybe tree objects returned by alloc_max_size after the change are out
> >of GC's tracking?
> 
> I wouldn't expect that to make a difference.  There are thousands
> of similar calls to build_int_cst() throughout the middle end.

But it does make a huge difference.  It has nothing to do with the calls
to build_int_cst, either.

Please don't mess with GTY (and call it a refactoring, to add insult to
injury) if you do not know what you are doing :-(

> Looking at the original patch, the change that I'm not sure about
> and that shouldn't have been part of the refactoring is the call
> to enable_ranger() in pass_waccess::execute().  It's something
> I was planning to do next.  But even that I wouldn't expect to
> eat up a whole 1GB of memory.

The testcase is super heavy in the instruction combiner, so you get a
lot of garbage.  Which is not a problem, except you made the garbage
collector not pick this up, so we get a ton of garbage accumulating.


Segher

Re: [PATCH, V2 2/3] targhooks: New target hook for CTF/BTF debug info emission

2021-08-19 Thread Jose E. Marchesi via Gcc-patches



> Hmm, well.  How about emitting .BTF.ext.string from GCC and have the linker
> merge the .BTF.ext.string section with the CTF string section then?  You can't
> really say "the ship has sailed" if I read the CTF webpage - there seems to be
> many format changes planned.

Forgot to mention that BPF programs are never linked in practice, even
if we support it in the GNU toolchain.

BPF programmers compile C code into an object file, and then that object
file is loaded in the kernel via libbpf.  So they don't ever use the
linker.

A pity, because this was a neat idea.

Re: [PATCH] more warning code refactoring

2021-08-19 Thread Martin Sebor via Gcc-patches


On 8/18/21 11:56 PM, Kewen.Lin wrote:

Hi David,

on 2021/8/19 上午11:26, David Edelsohn via Gcc-patches wrote:

Hi, Martin

A few PowerPC-specific testcases started failing yesterday on AIX with
a strange failure mode: the compiler runs out of memory.  As you may
expect from telling you this in an email reply to your patch, I have
bisected the failure and landed on your commit.  I can alternate
between the previous commit and your commit, and the failure
definitely appears with your patch, although I'm unsure how your patch
affected memory allocation in the compiler.  Maybe moving the code
changed a type of allocation or some memory no longer is being freed?




Getting rid of the GTY variable alloc_object_size_limit looks suspicious;
maybe tree objects returned by alloc_max_size after the change are out
of GC's tracking?

If the suspicion holds, the attached exploratory diff may help.


I wouldn't expect that to make a difference.  There are thousands
of similar calls to build_int_cst() throughout the middle end.

Looking at the original patch, the change that I'm not sure about
and that shouldn't have been part of the refactoring is the call
to enable_ranger() in pass_waccess::execute().  It's something
I was planning to do next.  But even that I wouldn't expect to
eat up a whole 1GB of memory.



BR,
Kewen


Previously, compiler bootstrap and all testcases ran with a data size
of 1GB.  After your change, the data size required for those
particular testcases jumped to 2GB.

The testcases are

gcc/testsuite/gcc.target/powerpc/rlwimi-[012].c

The failure is

cc1: out of memory allocating 65536 bytes after a total of 1608979296

This seems like a significant memory use regression.  Any ideas what happened?


Not really.  The patch just moved code around.  I didn't make any
changes that I'd expect to impact memory allocation to an appreciable
extent, at least not intentionally.  Let me look into it and get back
to you.

Martin



Thanks, David

Re: [PATCH, V2 2/3] targhooks: New target hook for CTF/BTF debug info emission

2021-08-19 Thread Jose E. Marchesi via Gcc-patches



> On Tue, Aug 17, 2021 at 7:26 PM Indu Bhagat  wrote:
>>
>> On 8/17/21 1:04 AM, Richard Biener wrote:
>> > On Mon, Aug 16, 2021 at 7:39 PM Indu Bhagat  wrote:
>> >>
>> >> On 8/10/21 4:54 AM, Richard Biener wrote:
>> >>> On Thu, Aug 5, 2021 at 2:52 AM Indu Bhagat via Gcc-patches
>> >>>  wrote:
>> >>>>
>> >>>> This patch adds a new target hook to detect if the CTF container can
>> >>>> allow the emission of CTF/BTF debug info at DWARF debug info early
>> >>>> finish time.  Some backends, e.g., BPF when generating code for CO-RE
>> >>>> usecase, may need to emit the CTF/BTF debug info sections around the
>> >>>> time when late DWARF debug is finalized (dwarf2out_finish).
>> >>>
>> >>> Without looking at the dwarf2out.c usage in the next patch - I think
>> >>> the CTF part
>> >>> should be always emitted from dwarf2out_early_finish, the "hooks" should 
>> >>> somehow
>> >>> arrange for the alternate output specific data to be preserved until
>> >>> dwarf2out_finish
>> >>> time so the late BTF data can be emitted from there.
>> >>>
>> >>> Lumping everything together now just makes it harder to see what info
>> >>> is required
>> >>> to persist and thus make LTO support more intrusive than necessary.
>> >>
>> >> In principle, I agree the approach to split generate/emit CTF/BTF like
>> >> you mention is ideal.  But, the BTF CO-RE relocations format is such
>> >> that the .BTF section cannot be finalized until .BTF.ext contents are
>> >> all fully known (David Faust summarizes this issue in the other thread
>> >> "[PATCH, V2 3/3] dwarf2out: Emit BTF in dwarf2out_finish for BPF CO-RE
>> >> usecase".)
>> >>
>> >> In summary, the .BTF.ext section refers to strings in the .BTF section.
>> >> These strings are added at the time the CO-RE relocations are added.
>> >> Recall that the .BTF section's header has information about the .BTF
>> >> string table start offset and length. So, this means the "CTF part" (or
>> >> the .BTF section) cannot simply be emitted in the dwarf2out_early_finish
>> >> because it's not ready yet. If it is still unclear, please let me know.
>> >>
>> >> My judgement here is that the BTF format itself is not amenable to split
>> >> early/late emission like DWARF. BTF has no linker support yet either.
>> >
>> > But are the strings used for the CO-RE relocations not all present already?
>> > Or does the "CTF part" have only "foo", "bar" and "baz" while the CO-RE
>> > part wants to output sth like "foo->bar.baz" (which IMHO would be quite
>> > stupid also for size purposes)?
>> >
>>
>> Yes, the latter ("foo->bar.baz") is closer to what the format does for
>> CO-RE relocations!
>>
>> > That said, fix the format.
>> >
>> > Alternatively hand the CO-RE part its own string table (what's the fuss
>> > with re-using the CTF string table if there's nothing to share ...)
>> >
>>
>> BTF and .BTF.ext formats are specified already by implementations in the
>> kernel, libbpf, and LLVM. For that matter, I should add BPF CO-RE to the
>> mix and say that BPF CO-RE capability _and_ .BTF/.BTF.ext debug formats
>> have been defined already by the BPF kernel developers/associated
>> entities. At this time, we as GCC developers simply extending the BPF
>> backend/BTF generation support in GCC, cannot fix the format. That ship
>> has sailed.
>
> Hmm, well.  How about emitting .BTF.ext.string from GCC and have the linker
> merge the .BTF.ext.string section with the CTF string section then?  You can't
> really say "the ship has sailed" if I read the CTF webpage - there seems to be
> many format changes planned.

Unfortunately we have little (if any) influence on the design of BPF,
BTF and CO-RE.  All of these have been designed, and are being evolved,
by the kernel people.

CTF, on the other hand, is unrelated to CO-RE, and we are definitely
keeping LTO in mind when designing the extra extensions (like the
backtraces support) that need input from the compiler backends.

> Well.  Guess that was it from my side on the topic of ranting about the
> not well thought out debug format ;)

Trust me, we feel you ;)
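
The ordering problem raised in the quoted exchange (the .BTF header records the string table's offset and length, but CO-RE relocations add strings late) can be illustrated with a toy model. Everything here is hypothetical and heavily simplified: `toy_btf_header`, `toy_strtab`, `toy_strtab_add` and `toy_finalize` are invented names, the real header has more fields, and "foo->bar.baz" is just the example string from the discussion.

```c
#include <assert.h>
#include <string.h>

/* Toy header: only the two string-table fields discussed above.  */
struct toy_btf_header { unsigned str_off; unsigned str_len; };

/* Toy string table: NUL-terminated strings stored back to back.  */
struct toy_strtab { char data[256]; unsigned len; };

/* Append S (including its NUL) and return its offset in the table.  */
static unsigned
toy_strtab_add (struct toy_strtab *t, const char *s)
{
  unsigned off = t->len;
  size_t n = strlen (s) + 1;
  assert (off + n <= sizeof t->data);
  memcpy (t->data + off, s, n);
  t->len += (unsigned) n;
  return off;
}

/* Build the header from the table as it stands right now.  The result
   is stale as soon as another string is added.  */
static struct toy_btf_header
toy_finalize (const struct toy_strtab *t, unsigned type_section_len)
{
  struct toy_btf_header h = { type_section_len, t->len };
  return h;
}
```

A header built at "early finish" time records a str_len that no longer covers a string added afterwards for a relocation, which is why the section cannot simply be emitted early in this model.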

RE: [Patch][GCC][middle-end] - Generate FRINTZ for (double)(int) under -ffast-math on aarch64

2021-08-19 Thread Jirui Wu via Gcc-patches

Hi all,

This patch generates FRINTZ instruction to optimize type casts.

The changes in this patch cover:
* Generate FRINTZ for (double)(int) casts.
* Add new test cases.

The intermediate type is not checked.  According to the C99 spec,
overflow of the integral part when casting floats to integers is undefined
behavior, so optimizing the cast pair to trunc() is not invalid.
I've confirmed that a Boolean intermediate type does not match the pattern.
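
A small C sketch of the semantics in question, under the assumption that the input is in range for int. `round_trip_int` and `round_trip_bool` are hypothetical names; the second function shows why a Boolean intermediate type has to be excluded, since conversion to _Bool yields 0 or 1 rather than a truncated value.

```c
#include <assert.h>
#include <limits.h>

/* (double)(int)x: truncates toward zero.  Only defined when the
   truncated value is representable in int, hence the precondition.  */
static double
round_trip_int (double x)
{
  assert (x > (double) INT_MIN - 1.0 && x < (double) INT_MAX + 1.0);
  return (double) (int) x;
}

/* Same shape with a Boolean intermediate type: any nonzero value
   becomes 1, so this is not a truncation.  */
static double
round_trip_bool (double x)
{
  return (double) (_Bool) x;
}
```

For in-range inputs the first function agrees with truncation toward zero, while the second does not for any input of magnitude 2 or more.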

Regtested on aarch64-none-linux-gnu and no issues.

Ok for master? If OK, can it be committed for me? I have no commit rights.

Thanks,
Jirui

gcc/ChangeLog:

* match.pd: Generate IFN_TRUNC.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/merge_trunc1.c: New test.

> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, August 17, 2021 9:13 AM
> To: Andrew Pinski 
> Cc: Jirui Wu ; Richard Sandiford
> ; i...@airs.com; gcc-patches@gcc.gnu.org;
> rguent...@suse.de
> Subject: Re: [Patch][GCC][middle-end] - Generate FRINTZ for (double)(int)
> under -ffast-math on aarch64
> 
> On Mon, Aug 16, 2021 at 8:48 PM Andrew Pinski via Gcc-patches wrote:
> >
> > On Mon, Aug 16, 2021 at 9:15 AM Jirui Wu via Gcc-patches
> >  wrote:
> > >
> > > Hi all,
> > >
> > > This patch generates FRINTZ instruction to optimize type casts.
> > >
> > > > The changes in this patch cover:
> > > > * Optimization of a FIX_TRUNC_EXPR cast inside a FLOAT_EXPR using IFN_TRUNC.
> > > * Change of corresponding test cases.
> > >
> > > Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master? If OK can it be committed for me, I have no commit rights.
> >
> > Is there a reason why you are doing the transformation manually inside
> > forwprop rather than handling it inside match.pd?
> > Also can't this only be done for -ffast-math case?
> 
> You definitely have to look at the intermediate type - that could be a uint8_t
> or even a boolean type.  So unless the intermediate type can represent all
> float values optimizing to trunc() is invalid.  Also if you emit IFN_TRUNC you
> have to make sure there's target support - we don't emit calls to a library
> trunc() from an internal function call (and we wouldn't want to optimize it
> that way).
> 
> Richard.
> 
> >
> > Thanks,
> > Andrew Pinski
> >
> > >
> > > Thanks,
> > > Jirui
> > >
> > > gcc/ChangeLog:
> > >
> > > > * tree-ssa-forwprop.c (pass_forwprop::execute): Optimize with frintz.
> > >
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/aarch64/fix_trunc1.c: Update to new expectation.



Re: [PATCH] more warning code refactoring

2021-08-19 Thread Segher Boessenkool

Hi!

On Thu, Aug 19, 2021 at 01:56:56PM +0800, Kewen.Lin via Gcc-patches wrote:
> on 2021/8/19 上午11:26, David Edelsohn via Gcc-patches wrote:
> > A few PowerPC-specific testcases started failing yesterday on AIX with
> > a strange failure mode: the compiler runs out of memory.  As you may
> > expect from telling you this in an email reply to your patch, I have
> > bisected the failure and landed on your commit.  I can alternate
> > between the previous commit and your commit, and the failure
> > definitely appears with your patch, although I'm unsure how your patch
> > affected memory allocation in the compiler.  Maybe moving the code
> > changed a type of allocation or some memory no longer is being freed?
> 
> Getting rid of the GTY variable alloc_object_size_limit looks suspicious;
> maybe tree objects returned by alloc_max_size after the change are out
> of GC's tracking?

That looks exactly right.  Thanks Ke Wen!

This also means this was not a refactoring at all -- nothing changing
anything to do with GC is ever a refactoring.  Refactorings are trivial
one-to-one code transforms that do not change semantics at all.
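
What "out of GC's tracking" means can be shown with a toy mark-and-sweep collector. This is a deliberately tiny model, not GCC's ggc: all names (`toy_alloc`, `toy_register_root`, `toy_collect`, the two caches) are hypothetical, and it does not try to reproduce the memory growth reported in this thread. It only shows that a cached pointer the collector has not been told about does not keep its object alive.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OBJS 8
#define MAX_ROOTS 4

struct obj { int live; int marked; int value; };

static struct obj heap[MAX_OBJS];
static struct obj **roots[MAX_ROOTS];  /* addresses of registered pointers */
static int n_roots;

/* Allocate the first free slot; return NULL when the toy heap is full.  */
static struct obj *
toy_alloc (int value)
{
  for (int i = 0; i < MAX_OBJS; i++)
    if (!heap[i].live)
      {
        heap[i] = (struct obj) { 1, 0, value };
        return &heap[i];
      }
  return NULL;
}

/* Rough analog of marking a variable GTY(()): the collector will treat
   *SLOT as a root from now on.  */
static void
toy_register_root (struct obj **slot)
{
  assert (n_roots < MAX_ROOTS);
  roots[n_roots++] = slot;
}

/* Mark objects referenced by registered roots, free everything else.
   Return the number of objects freed.  */
static int
toy_collect (void)
{
  int freed = 0;
  for (int i = 0; i < n_roots; i++)
    if (*roots[i])
      (*roots[i])->marked = 1;
  for (int i = 0; i < MAX_OBJS; i++)
    {
      if (heap[i].live && !heap[i].marked)
        {
          heap[i].live = 0;
          freed++;
        }
      heap[i].marked = 0;
    }
  return freed;
}

static struct obj *tracked_cache;    /* registered as a root by the caller */
static struct obj *untracked_cache;  /* never registered */
```

After a collection, the object behind `tracked_cache` survives while the one behind `untracked_cache` is freed even though a pointer to it still exists, which is the hazard of dropping a GTY marker from a cached tree.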


Segher

Re: [committed] libstdc++: Document P1739R4 status [PR100139]

2021-08-19 Thread Jonathan Wakely via Gcc-patches


On 19/08/21 13:03 +0100, Jonathan Wakely wrote:

We should document the status of this unimplemented feature.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/100139
* doc/xml/manual/status_cxx2020.xml: Add P1739R4 to status table.
* doc/html/manual/status.html: Regenerate.



I forgot that I'd grouped all the Ranges and Concepts papers together
in the status table. This moves the new row to be in that group.

Committed to trunk and 10 and 11 branches.

commit c5e0f954aef8caf4ee54b185e0fbfa88aeab62c6
Author: Jonathan Wakely 
Date:   Thu Aug 19 15:03:21 2021

libstdc++: Move status table entry to be with other ranges papers

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* doc/xml/manual/status_cxx2020.xml: Move row  earlier in table.
* doc/html/manual/status.html: Regenerate.

diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2020.xml b/libstdc++-v3/doc/xml/manual/status_cxx2020.xml
index 45de79311a1..a729ddd3ada 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2020.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2020.xml
@@ -545,6 +545,16 @@ or any notes about the implementation.
   
 
 
+
+  
+   Avoid template bloat for safe_ranges in combination with ‘subrange-y’ view adaptors.
+  
+http://www.w3.org/1999/xlink; xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1739r4.html;>
+P1739R4 
+  
+   
+  
+
 
 
 
@@ -1437,17 +1447,6 @@ or any notes about the implementation.
   
 
 
-
-  
-   Avoid template bloat for safe_ranges in combination with ‘subrange-y’ view adaptors.
-  
-http://www.w3.org/1999/xlink; xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1739r4.html;>
-P1739R4 
-  
-   
-  
-
-

[committed] libstdc++: Update Doxygen config template to Doxygen 1.9.2

2021-08-19 Thread Jonathan Wakely via Gcc-patches

This adds my new SHOW_HEADERFILE option, and removes some obsolete
options.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* doc/doxygen/user.cfg.in: Update to Doxygen 1.9.2

Tested powerpc64le-linux. Committed to trunk.

commit 778044ccf59205e85bc5fdcd1760d789fdd05022
Author: Jonathan Wakely 
Date:   Thu Aug 19 14:50:09 2021

libstdc++: Update Doxygen config template to Doxygen 1.9.2

This adds my new SHOW_HEADERFILE option, and removes some obsolete
options.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* doc/doxygen/user.cfg.in: Update to Doxygen 1.9.2

diff --git a/libstdc++-v3/doc/doxygen/user.cfg.in b/libstdc++-v3/doc/doxygen/user.cfg.in
index ab9e552701c..17cd6fc1c0e 100644
--- a/libstdc++-v3/doc/doxygen/user.cfg.in
+++ b/libstdc++-v3/doc/doxygen/user.cfg.in
@@ -93,14 +93,6 @@ ALLOW_UNICODE_NAMES= NO
 
 OUTPUT_LANGUAGE= English
 
-# The OUTPUT_TEXT_DIRECTION tag is used to specify the direction in which all
-# documentation generated by doxygen is written. Doxygen will use this
-# information to generate all generated output in the proper direction.
-# Possible values are: None, LTR, RTL and Context.
-# The default value is: None.
-
-OUTPUT_TEXT_DIRECTION  = None
-
 # If the BRIEF_MEMBER_DESC tag is set to YES, doxygen will include brief member
 # descriptions after the members that are listed in the file and class
 # documentation (similar to Javadoc). Set to NO to disable this.
@@ -588,6 +580,12 @@ HIDE_SCOPE_NAMES   = NO
 
 HIDE_COMPOUND_REFERENCE= NO
 
+# If the SHOW_HEADERFILE tag is set to YES then the documentation for a class
+# will show which file needs to be included to use the class.
+# The default value is: YES.
+
+SHOW_HEADERFILE= YES
+
 # If the SHOW_INCLUDE_FILES tag is set to YES then doxygen will put a list of
 # the files that are included by a file in the documentation of that file.
 # The default value is: YES.
@@ -1293,38 +1291,6 @@ USE_HTAGS  = NO
 
 VERBATIM_HEADERS   = NO
 
-# If the CLANG_ASSISTED_PARSING tag is set to YES then doxygen will use the
-# clang parser (see: http://clang.llvm.org/) for more accurate parsing at the
-# cost of reduced performance. This can be particularly helpful with template
-# rich C++ code for which doxygen's built-in parser lacks the necessary type
-# information.
-# Note: The availability of this option depends on whether or not doxygen was
-# generated with the -Duse_libclang=ON option for CMake.
-# The default value is: NO.
-
-CLANG_ASSISTED_PARSING = NO
-
-# If clang assisted parsing is enabled you can provide the compiler with command
-# line options that you would normally use when invoking the compiler. Note that
-# the include paths will already be set by doxygen for the files and directories
-# specified with INPUT and INCLUDE_PATH.
-# This tag requires that the tag CLANG_ASSISTED_PARSING is set to YES.
-
-CLANG_OPTIONS  =
-
-# If clang assisted parsing is enabled you can provide the clang parser with the
-# path to the directory containing a file called compile_commands.json. This
-# file is the compilation database (see:
-# http://clang.llvm.org/docs/HowToSetupToolingForLLVM.html) containing the
-# options used when the source files were built. This is equivalent to
-# specifying the "-p" option to a clang tool, such as clang-check. These options
-# will then be passed to the parser. Any options specified with CLANG_OPTIONS
-# will be added as well.
-# Note: The availability of this option depends on whether or not doxygen was
-# generated with the -Duse_libclang=ON option for CMake.
-
-CLANG_DATABASE_PATH=
-
 #---
 # Configuration options related to the alphabetical class index
 #---
@@ -2072,16 +2038,6 @@ LATEX_BATCHMODE= YES
 
 LATEX_HIDE_INDICES = YES
 
-# If the LATEX_SOURCE_CODE tag is set to YES then doxygen will include source
-# code with syntax highlighting in the LaTeX output.
-#
-# Note that which sources are shown also depends on other settings such as
-# SOURCE_BROWSER.
-# The default value is: NO.
-# This tag requires that the tag GENERATE_LATEX is set to YES.
-
-LATEX_SOURCE_CODE  = NO
-
 # The LATEX_BIB_STYLE tag can be used to specify the style to use for the
 # bibliography, e.g. plainnat, or ieeetr. See
 # https://en.wikipedia.org/wiki/BibTeX and \cite for more info.
@@ -2162,16 +2118,6 @@ RTF_STYLESHEET_FILE=
 
 RTF_EXTENSIONS_FILE=
 
-# If the RTF_SOURCE_CODE tag is set to YES then doxygen will include source code
-# with syntax highlighting in the RTF output.
-#
-# Note that which sources are shown also depends on other settings such as
-# SOURCE_BROWSER.
-# The default value is: NO.
-# This tag requires that the tag GENERATE_RTF is set to YES.
-
-RTF_SOURCE_CODE= NO
-

[committed] libstdc++: Don't check always-true condition [PR101965]

2021-08-19 Thread Jonathan Wakely via Gcc-patches

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/101965
* include/std/charconv (__to_chars_i): Remove redundant check.

Tested powerpc64le-linux. Committed to trunk.

commit 85a709595005b5df4b2ee9d81717a5df19c0023f
Author: Jonathan Wakely 
Date:   Thu Aug 19 13:05:54 2021

libstdc++: Don't check always-true condition [PR101965]

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/101965
* include/std/charconv (__to_chars_i): Remove redundant check.

diff --git a/libstdc++-v3/include/std/charconv b/libstdc++-v3/include/std/charconv
index ac9c34d4601..2e368843fc9 100644
--- a/libstdc++-v3/include/std/charconv
+++ b/libstdc++-v3/include/std/charconv
@@ -336,12 +336,10 @@ namespace __detail
  *__first = '0';
  return { __first + 1, errc{} };
}
-
-  if _GLIBCXX17_CONSTEXPR (std::is_signed<_Tp>::value)
+  else if _GLIBCXX17_CONSTEXPR (std::is_signed<_Tp>::value)
if (__value < 0)
  {
-   if (__builtin_expect(__first != __last, 1))
- *__first++ = '-';
+   *__first++ = '-';
__unsigned_val = _Up(~__value) + _Up(1);
  }

Re: [patch][version 6] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-08-19 Thread Qing Zhao via Gcc-patches



> On Aug 19, 2021, at 4:00 AM, Richard Biener  wrote:
> 
> On Wed, 18 Aug 2021, Qing Zhao wrote:
> 
>> 
>> 
>>> On Aug 18, 2021, at 2:15 AM, Richard Biener  wrote:
>>> 
>>> On Tue, 17 Aug 2021, Qing Zhao wrote:
>>> 
 
 
>>>>> On Aug 17, 2021, at 9:50 AM, Qing Zhao via Gcc-patches wrote:
>>>>>
>>>>>
>>>>>
>>>>>> On Aug 17, 2021, at 3:29 AM, Richard Biener  wrote:
>>>>>>
>>>>>> On Mon, 16 Aug 2021, Qing Zhao wrote:
>>>>>>
>>>>>>> My current code for expand_DEFERRED_INIT is like the following, could
>>>>>>> you check and see whether there is any issue for it:
>>>>>>>
>>>>>>> #define INIT_PATTERN_VALUE  0xFE
>>>>>>> static void
>>>>>>> expand_DEFERRED_INIT (internal_fn, gcall *stmt)
>>>>>>> {
>>>>>>> tree lhs = gimple_call_lhs (stmt);
>>>>>>> tree var_size = gimple_call_arg (stmt, 0);
>>>>>>> enum auto_init_type init_type
>>>>>>> = (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
>>>>>>> bool is_vla = (bool) TREE_INT_CST_LOW (gimple_call_arg (stmt, 2));
>>>>>>>
>>>>>>> tree var_type = TREE_TYPE (lhs);
>>>>>>> gcc_assert (init_type > AUTO_INIT_UNINITIALIZED);
>>>>>>>
>>>>>>> if (is_vla || (!use_register_for_decl (lhs)))
>>>>>>> {
>>>>>>>   if (TREE_CODE (lhs) == SSA_NAME)
>>>>>>> lhs = SSA_NAME_VAR (lhs);
>>>>>>
>>>>>> this should not be necessary (in fact you shouldn't see a SSA_NAME
>>>>>> here, if you do then using SSA_NAME_VAR is wrong)
>>>>> You mean during RTL expansion phase, all SSA_NAMEs are gone already?
>>>>
>>>> Actually, the lhs could be SSA_NAME here,
>>>>
>>>> Breakpoint 1, expand_DEFERRED_INIT (stmt=0x7fffe96ae348) at ../../latest-gcc/gcc/internal-fn.c:3021
>>>> 3021 mark_addressable (lhs);
>>>> (gdb) call debug_tree(lhs)
 >>>   type >>>   size 
   unit-size 
   align:32 warn_if_not_align:0 symtab:0 alias-set 2 canonical-type 
 0x7fffe959b2a0 precision:32
   pointer_to_this >
   visited var 
   def_stmt temp1_5 = .DEFERRED_INIT (4, 2, 0, &"temp1"[0]);
   version:5>
 
 when I deleted:
 
 if (TREE_CODE (lhs) == SSA_NAME)
  lhs = SSA_NAME_VAR (lhs);
>>> 
>>> but then using SSA_NAME_VAR is broken.  I suspect use_register_for_decl
>>> isn't the correct thing to look at.  I think we need to look at what
>>> the LHS expanded to if it is a SSA_VAR_P (that includes SSA names
>>> but also plain DECLs but not what we get from VLAs where we'd see
>>> *ptr).  So sth like
>>> 
>>> bool reg_lhs;
>>> if (SSA_VAR_P (lhs))
>>>   {
>>> rtx tem = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
>>> reg_lhs = !MEM_P (tem);
>>> /* If not MEM_P reg_lhs should be REG_P or SUBREG_P (but maybe
>>>also CONCAT or lowpart...?)  */
>>>   }
>>> else
>>>   {
>>> gcc_assert (is_vla);
>>> reg_lhs = false;
>>>   }
>>> 
>>> if (!reg_lhs)
>>>   memset path
>>> else
>>>   expand_assignment path
>> 
>> After making the following change:
>> 
>> +  bool reg_lhs = true;
>> 
>>   tree var_type = TREE_TYPE (lhs);
>>   gcc_assert (init_type > AUTO_INIT_UNINITIALIZED);
>> 
>> -  if (is_vla || (!use_register_for_decl (lhs)))
>> +  if (SSA_VAR_P (lhs))
>> +{
>> +  rtx tem = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
>> +  reg_lhs = !MEM_P (tem);
>> +}
>> +  else
>> +{
>> +  gcc_assert (is_vla);
>> +  reg_lhs = false;
>> +}
>> +
>> +  if (!reg_lhs)
>> {
>> 
>> I got exactly the same internal error that failed at expr.c:
>> 
>> 8436   /* We must have made progress.  */
>> 8437   gcc_assert (inner != exp);
>> 
>> 
>> Looks like for the following code:
>> 
>> 3026   if (!reg_lhs)
>> 3027 {
>> 3028 /* If this is a VLA or the variable is not in register,
>> 3029expand to a memset to initialize it.  */
>> 3030   mark_addressable (lhs);
>> 3031   tree var_addr = build_fold_addr_expr (lhs);
>> 3032 
>> 3033   tree value = (init_type == AUTO_INIT_PATTERN) ?
>> 3034 build_int_cst (integer_type_node,
>> 3035INIT_PATTERN_VALUE) :
>> 3036 integer_zero_node;
>> 3037   tree m_call = build_call_expr (builtin_decl_implicit 
>> (BUILT_IN_MEMSET),
>> 3038  3, var_addr, value, var_size);
>> 3039   /* Expand this memset call.  */
>> 3040   expand_builtin_memset (m_call, NULL_RTX, TYPE_MODE (var_type));
>> 3041 }
>> 
>> At line 3030, “lhs” could be a SSA_NAME.
>> 
>> My questions are:
>> 
>> 1. Could the routine “mark_addressable” and “build_fold_addr_expr” be 
>> applied on SSA_NAME?
> 
> No.
> 
>> 2. Could the routine “expand_builtin_memset” be applied on the memset call 
>> whose “DEST” is
>>an address expression on SSA_NAME? 
> 
> No.
> 
>> 3. Within “expand_DEFERRED_INIT”, can I call “expand_builtin_memset” to 
>> expand .DEFERRED_INIT?
> 
> Well, not with "invalid" GENERIC I fear (address of a SSA name).
> 
>> I suspect that one of the above 3 might be the issue, but not sure which one?
>
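
For reference, the memset path discussed above fills every byte of the variable with either zero or `INIT_PATTERN_VALUE` (0xFE). The following is only a stand-alone sketch of that effect on an object in memory; `init_kind` and `deferred_init_bytes` are hypothetical names standing in for the GCC internals, and nothing here models RTL expansion:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-ins for the GCC internals quoted above.
enum init_kind { INIT_ZERO, INIT_PATTERN };
const int INIT_PATTERN_VALUE = 0xFE;

// Fill SIZE bytes at ADDR the way the memset expansion would: every byte
// becomes 0xFE for pattern init, or 0 for zero init.
inline void deferred_init_bytes(void *addr, std::size_t size, init_kind kind)
{
  std::memset(addr, kind == INIT_PATTERN ? INIT_PATTERN_VALUE : 0, size);
}
```

This only applies to objects that live in memory, which is why the thread is concerned with telling a register LHS apart from a memory LHS.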

Re: [ping][vect-patterns][RFC] Refactor widening patterns to allow internal_fn's

2021-08-19 Thread Richard Biener via Gcc-patches

On Tue, 17 Aug 2021, Joel Hutton wrote:

> Ping. Is there still interest in refactoring vect-patterns to internal_fn's? 

Yes, sorry ...

+  internal_fn fn = as_internal_fn ((combined_fn) wide_code_or_ifn);

maybe add an overload to as_internal_fn.

+  pattern_stmt = gimple_build_call_internal (fn, 2,
+   oprnd[0],
+   oprnd[1]);

some whitespace problems here?  It looks like it fits on fewer lines.

+  if (ch.is_tree_code ())
+  {
+new_stmt = gimple_build_assign (vec_dest, ch, vec_oprnd0, 
vec_oprnd1);
+new_temp = make_ssa_name (vec_dest, new_stmt);
+gimple_assign_set_lhs (new_stmt, new_temp);
+  }
+  else
+  {
+  new_stmt = gimple_build_call_internal (as_internal_fn 
((combined_fn) ch),
+2, vec_oprnd0, vec_oprnd1);
+  new_temp = make_ssa_name (vec_dest, new_stmt);
+  gimple_call_set_lhs (new_stmt, new_temp);
+  }

only the new_stmt = needs to be conditional if you use
gimple_set_lhs ().

+  if (is_gimple_assign (stmt))
+  {
+code_or_ifn = gimple_assign_rhs_code (stmt);
+  }
+  else
+code_or_ifn = (combined_fn) gimple_call_combined_fn (stmt);

is the cast really necessary?

+  if (is_gimple_assign (stmt))
+  {
+if (!CONVERT_EXPR_CODE_P (code_or_ifn)
+   && code_or_ifn != FIX_TRUNC_EXPR
+   && code_or_ifn != FLOAT_EXPR
+   && code_or_ifn != WIDEN_PLUS_EXPR
+   && code_or_ifn != WIDEN_MINUS_EXPR
+   && code_or_ifn != WIDEN_MULT_EXPR
+   && code_or_ifn != WIDEN_LSHIFT_EXPR)
+  return false;
+  }
+

no constraints for the IFN case?  The check should be
code_or_ifn.is_tree_code () for clarity.

+  if (supportable_convert_operation (code_or_ifn, vectype_out, 
vectype_in,
+ (tree_code*) &code_or_ifn1))

that looks fragile (TBAA), better pass  and if successful
update code_or_ifn1 from it?

I wonder if we should add as_tree_code () / as_fn_code ()
that work safely when the code isn't a tree or fn code like

   enum tree_code as_tree_code () const { return is_tree_code () ? 
(tree_code)*this : MAX_TREE_CODES; }

and similar for as_fn_code (with CFN_LAST).  That would make

+  if (code_or_ifn.is_tree_code ())
+  {
+switch ((tree_code) code_or_ifn)
+  {

nicer to be just

  switch (code_or_ifn.as_tree_code ())
   {
...
   }
  switch (code_or_ifn.as_fn_code ())
   {
...
   }

that said, the patch doesn't include any actual new IFNs for the
widening stuff, correct?  Still thanks for the work so far.

Thanks,
Richard.
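
The `as_tree_code ()` / `as_fn_code ()` suggestion above can be sketched in isolation. This is a toy model under stated assumptions: `toy_code_helper`, the shortened enums and the `CFN_WIDEN_*` members are made up for illustration and do not reproduce GCC's real `code_helper`:

```cpp
// Toy stand-ins for GCC's enums; the real ones have many more members,
// and the CFN_WIDEN_* names are invented for this sketch.
enum tree_code { PLUS_EXPR, WIDEN_PLUS_EXPR, MAX_TREE_CODES };
enum combined_fn { CFN_WIDEN_PLUS, CFN_WIDEN_MINUS, CFN_LAST };

// Toy version of code_helper: holds either a tree code or a function code.
class toy_code_helper
{
public:
  toy_code_helper (tree_code code) : m_is_tree (true), m_rep (code) {}
  toy_code_helper (combined_fn fn) : m_is_tree (false), m_rep (fn) {}

  bool is_tree_code () const { return m_is_tree; }
  bool is_fn_code () const { return !m_is_tree; }

  // Safe accessors: return the out-of-range sentinel on a kind mismatch,
  // so a switch on the result simply ends up in its default case.
  tree_code as_tree_code () const
  { return m_is_tree ? (tree_code) m_rep : MAX_TREE_CODES; }
  combined_fn as_fn_code () const
  { return m_is_tree ? CFN_LAST : (combined_fn) m_rep; }

private:
  bool m_is_tree;
  int m_rep;
};

// Example client: no is_tree_code () guard is needed before the switches.
inline bool is_widening (toy_code_helper ch)
{
  switch (ch.as_tree_code ())
    {
    case WIDEN_PLUS_EXPR:
      return true;
    default:
      break;
    }
  switch (ch.as_fn_code ())
    {
    case CFN_WIDEN_PLUS:
    case CFN_WIDEN_MINUS:
      return true;
    default:
      break;
    }
  return false;
}
```

The point of the sentinel return values is that callers can switch directly on the accessor result instead of casting after an explicit kind check.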

> > -Original Message-
> > From: Joel Hutton
> > Sent: 07 June 2021 14:30
> > To: gcc-patches@gcc.gnu.org
> > Cc: Richard Biener ; Richard Sandiford
> > 
> > Subject: [vect-patterns][RFC] Refactor widening patterns to allow
> > internal_fn's
> > 
> > Hi all,
> > 
> > This refactor allows widening patterns (such as widen_plus/widen_minus) to
> > be represented as either internal_fns or tree_codes. The widening patterns
> > were originally added as tree codes with the expectation that they would be
> > refactored later.
> > 
> > [vect-patterns] Refactor as internal_fn's
> > 
> > Refactor vect-patterns to allow patterns to be internal_fns starting with
> > widening_plus/minus patterns.
> > 
> > 
> > gcc/ChangeLog:
> > 
> > * gimple-match.h (class code_helper): Move code_helper class to more
> > visible header.
> > * internal-fn.h (internal_fn_name): Add internal_fn range check.
> > * optabs-tree.h (supportable_convert_operation): Change function
> > prototypes to use code_helper.
> > * tree-vect-patterns.c (vect_recog_widen_op_pattern): Refactor to 
> > use
> > code_helper.
> > * tree-vect-stmts.c (vect_gen_widened_results_half): Refactor to use
> > code_helper, build internal_fns.
> > (vect_create_vectorized_promotion_stmts): Refactor to use
> > code_helper.
> > (vectorizable_conversion): Refactor to use code_helper.
> > (supportable_widening_operation): Refactor to use code_helper.
> > (supportable_narrowing_operation): Refactor to use code_helper.
> > * tree-vectorizer.h (supportable_widening_operation): Refactor to 
> > use
> > code_helper.
> > (supportable_narrowing_operation): Refactor to use code_helper.
> > * tree.h (class code_helper): Refactor to use code_helper.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

[PATCH, OG11, OpenACC, committed] Fix ICE for non-contiguous arrays

2021-08-19 Thread Chung-Lin Tang


Currently we ICE when non-decl base-pointers (like struct members) are
used in OpenACC non-contiguous array sections.

This patch is kind of a band-aid to reject such cases ATM. We'll deal
with the more elaborate middle-end stuff to fully support them later.

Committed to devel/omp/gcc-11 after testing. This is not for mainline.

Chung-Lin
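
The decision the patch below makes for each array section can be modelled roughly as follows. This is a simplified sketch, not the front-end code: `classify_section`, `section_result` and the convention that a negative entry means "no explicit length" (NULL_TREE in the patch) are assumptions made for illustration, and only the inner loop over prior dimensions is modelled:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result codes for the toy model.
enum section_result
{
  CONTIGUOUS,            // every prior dimension has length one
  NON_CONTIGUOUS_OK,     // OpenACC with a DECL base pointer: accepted
  ERR_BASE_POINTER,      // OpenACC with a non-DECL base pointer: rejected
  ERR_NOT_CONTIGUOUS     // OpenMP: non-contiguous sections are rejected
};

// prior_lengths holds the lengths of the dimensions already walked; a
// negative entry stands for a missing length.
inline section_result
classify_section (const std::vector<long> &prior_lengths,
                  bool is_openacc, bool base_is_decl)
{
  for (std::size_t i = 0; i < prior_lengths.size (); ++i)
    if (prior_lengths[i] != 1)
      {
        if (is_openacc)
          return base_is_decl ? NON_CONTIGUOUS_OK : ERR_BASE_POINTER;
        return ERR_NOT_CONTIGUOUS;
      }
  return CONTIGUOUS;
}
```

In this model a struct-member base pointer corresponds to `base_is_decl == false`, which is the case that used to ICE and is now reported as unsupported.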

From 4e34710679ac084d7ca15ccf387c1b6f1e64c2d1 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang 
Date: Thu, 19 Aug 2021 16:17:02 +0800
Subject: [PATCH] openacc: fix ICE for non-decl expression in non-contiguous
 array base-pointer

Currently, we do not support cases like struct-members as the base-pointer
for an OpenACC non-contiguous array. Mark such cases as unsupported in the
C/C++ front-ends, instead of ICEing on them.

gcc/c/ChangeLog:

* c-typeck.c (handle_omp_array_sections_1): Robustify non-contiguous
array check and reject non-DECL base-pointer cases as unsupported.

gcc/cp/ChangeLog:

* semantics.c (handle_omp_array_sections_1): Robustify non-contiguous
array check and reject non-DECL base-pointer cases as unsupported.
---
 gcc/c/c-typeck.c   | 35 +++
 gcc/cp/semantics.c | 39 ---
 2 files changed, 47 insertions(+), 27 deletions(-)

diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 9c4822bbf27..a8b54c676c0 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -13431,25 +13431,36 @@ handle_omp_array_sections_1 (tree c, tree t, 
vec ,
  && OMP_CLAUSE_CODE (c) != OMP_CLAUSE_AFFINITY
  && TREE_CODE (TREE_CHAIN (t)) == TREE_LIST)
{
- if (ort == C_ORT_ACC)
-   /* Note that OpenACC does accept these kinds of non-contiguous
-  pointer based arrays.  */
-   non_contiguous = true;
- else
+ /* If any prior dimension has a non-one length, then deem this
+array section as non-contiguous.  */
+ for (tree d = TREE_CHAIN (t); TREE_CODE (d) == TREE_LIST;
+  d = TREE_CHAIN (d))
{
- /* If any prior dimension has a non-one length, then deem this
-array section as non-contiguous.  */
- for (tree d = TREE_CHAIN (t); TREE_CODE (d) == TREE_LIST;
-  d = TREE_CHAIN (d))
+ tree d_length = TREE_VALUE (d);
+ if (d_length == NULL_TREE || !integer_onep (d_length))
{
- tree d_length = TREE_VALUE (d);
- if (d_length == NULL_TREE || !integer_onep (d_length))
+ if (ort == C_ORT_ACC)
{
+ while (TREE_CODE (d) == TREE_LIST)
+   d = TREE_CHAIN (d);
+ if (DECL_P (d))
+   {
+ /* Note that OpenACC does accept these kinds of
+non-contiguous pointer based arrays.  */
+ non_contiguous = true;
+ break;
+   }
  error_at (OMP_CLAUSE_LOCATION (c),
-   "array section is not contiguous in %qs clause",
+   "base-pointer expression in %qs clause not "
+   "supported for non-contiguous arrays",
omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
  return error_mark_node;
}
+
+ error_at (OMP_CLAUSE_LOCATION (c),
+   "array section is not contiguous in %qs clause",
+   omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
+ return error_mark_node;
}
}
}
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index e56ad8aa1e1..ad62ad76ff9 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -5292,32 +5292,41 @@ handle_omp_array_sections_1 (tree c, tree t, vec 
,
  return error_mark_node;
}
   /* If there is a pointer type anywhere but in the very first
-array-section-subscript, the array section could be non-contiguous.
-Note that OpenACC does accept these kinds of non-contiguous pointer
-based arrays.  */
+array-section-subscript, the array section could be non-contiguous.  */
   if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_AFFINITY
  && OMP_CLAUSE_CODE (c) != OMP_CLAUSE_DEPEND
  && TREE_CODE (TREE_CHAIN (t)) == TREE_LIST)
{
- if (ort == C_ORT_ACC)
-   /* Note that OpenACC does accept these kinds of non-contiguous
-  pointer based arrays.  */
-   non_contiguous = true;
- else
+ /* If any prior dimension has a non-one length, then deem this
+array section as non-contiguous.  */
+ for (tree d = TREE_CHAIN (t); TREE_CODE (d) == TREE_LIST;
+  d = TREE_CHAIN (d))
{
- /* If any prior dimension

[PATCH][MIPS] Remove TARGET_ASM_FUNCTION_RODATA_SECTION

2021-08-19 Thread Dragan Mladjenovic via Gcc-patches

Since 'Remove obsolete IRIX 6.5 support' [1] we only use
gp-relative jump-tables for PIC code. We can fall back to
default behaviour for asm_function_rodata_section.

[1] https://gcc.gnu.org/ml/libstdc++/2012-03/msg00067.html

2018-06-04 Dragan Mladjenovic 
gcc/

* config/mips/mips.c (mips_function_rodata_section,
TARGET_ASM_FUNCTION_RODATA_SECTION): Removed.
---
Tested against mips64-linux-gnu with -mabi=64|32|n32 and
mips-mti-elf with mips32r2.

 gcc/config/mips/mips.c | 38 --
 1 file changed, 38 deletions(-)

diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 89d1be6cea6..39666d6973f 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -9306,42 +9306,6 @@ mips_select_rtx_section (machine_mode mode, rtx x,
   return default_elf_select_rtx_section (mode, x, align);
 }
 
-/* Implement TARGET_ASM_FUNCTION_RODATA_SECTION.
-
-   The complication here is that, with the combination TARGET_ABICALLS
-   && !TARGET_ABSOLUTE_ABICALLS && !TARGET_GPWORD, jump tables will use
-   absolute addresses, and should therefore not be included in the
-   read-only part of a DSO.  Handle such cases by selecting a normal
-   data section instead of a read-only one.  The logic apes that in
-   default_function_rodata_section.  */
-
-static section *
-mips_function_rodata_section (tree decl, bool)
-{
-  if (!TARGET_ABICALLS || TARGET_ABSOLUTE_ABICALLS || TARGET_GPWORD)
-return default_function_rodata_section (decl, false);
-
-  if (decl && DECL_SECTION_NAME (decl))
-{
-  const char *name = DECL_SECTION_NAME (decl);
-  if (DECL_COMDAT_GROUP (decl) && startswith (name, ".gnu.linkonce.t."))
-   {
- char *rname = ASTRDUP (name);
- rname[14] = 'd';
- return get_section (rname, SECTION_LINKONCE | SECTION_WRITE, decl);
-   }
-  else if (flag_function_sections
-  && flag_data_sections
-  && startswith (name, ".text."))
-   {
- char *rname = ASTRDUP (name);
- memcpy (rname + 1, "data", 4);
- return get_section (rname, SECTION_WRITE, decl);
-   }
-}
-  return data_section;
-}
-
 /* Implement TARGET_IN_SMALL_DATA_P.  */
 
 static bool
@@ -22606,8 +22570,6 @@ mips_asm_file_end (void)
 #define TARGET_ASM_FUNCTION_EPILOGUE mips_output_function_epilogue
 #undef TARGET_ASM_SELECT_RTX_SECTION
 #define TARGET_ASM_SELECT_RTX_SECTION mips_select_rtx_section
-#undef TARGET_ASM_FUNCTION_RODATA_SECTION
-#define TARGET_ASM_FUNCTION_RODATA_SECTION mips_function_rodata_section
 
 #undef TARGET_SCHED_INIT
 #define TARGET_SCHED_INIT mips_sched_init
-- 
2.17.1

Re: [PATCH v2] Fix incomplete computation in fill_always_executed_in_1

2021-08-19 Thread Richard Biener via Gcc-patches

On Tue, 17 Aug 2021, Xionghu Luo wrote:

> 
> 
> On 2021/8/17 15:12, Richard Biener wrote:
> > On Tue, 17 Aug 2021, Xionghu Luo wrote:
> > 
> >> Hi,
> >>
> >> On 2021/8/16 19:46, Richard Biener wrote:
> >>> On Mon, 16 Aug 2021, Xiong Hu Luo wrote:
> >>>
>  It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for
>  nested loops.  inn_loop is updated to the inner loop, so it needs to be restored
>  when exiting from the innermost loop. With this patch, the store instruction
>  in the outer loop can also be moved out of the outer loop by store motion.
>  Any comments?  Thanks.
> >>>
>  gcc/ChangeLog:
> 
>    * tree-ssa-loop-im.c (fill_always_executed_in_1): Restore
>    inn_loop when exiting from innermost loop.
> 
>  gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/tree-ssa/ssa-lim-19.c: New test.
>  ---
> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 24 ++
> gcc/tree-ssa-loop-im.c |  6 +-
> 2 files changed, 29 insertions(+), 1 deletion(-)
> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
> 
>  diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
>  b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
>  new file mode 100644
>  index 000..097a5ee4a4b
>  --- /dev/null
>  +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
>  @@ -0,0 +1,24 @@
>  +/* PR/101293 */
>  +/* { dg-do compile } */
>  +/* { dg-options "-O2 -fdump-tree-lim2-details" } */
>  +
>  +struct X { int i; int j; int k;};
>  +
>  +void foo(struct X *x, int n, int l)
>  +{
>  +  for (int j = 0; j < l; j++)
>  +{
>  +  for (int i = 0; i < n; ++i)
>  +{
>  +  int *p = &x->j;
>  +  int tem = *p;
>  +  x->j += tem * i;
>  +}
>  +  int *r = &x->k;
>  +  int tem2 = *r;
>  +  x->k += tem2 * j;
>  +}
>  +}
>  +
>  +/* { dg-final { scan-tree-dump-times "Executing store motion" 2 "lim2" 
>  } }
>  */
>  +
>  diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
>  index b24bc64f2a7..5ca4738b20e 100644
>  --- a/gcc/tree-ssa-loop-im.c
>  +++ b/gcc/tree-ssa-loop-im.c
>  @@ -3211,6 +3211,10 @@ fill_always_executed_in_1 (class loop *loop, 
>  sbitmap
>  @@ contains_call)
>    if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
>  last = bb;
> +   if (inn_loop != loop
>  +  && flow_loop_nested_p (bb->loop_father, inn_loop))
>  +inn_loop = bb->loop_father;
>  +
> >>>
> >>> The comment says
> >>>
> >>> /* In a loop that is always entered we may proceed anyway.
> >>>But record that we entered it and stop once we leave 
> >>> it.
> >>> */
> >>> inn_loop = bb->loop_father;
> >>>
> >>> and your change would defeat that early return, no?
> >>
> >> The issue is that the search exits too early when iterating over the outer
> >> loop.  Take a nested loop where loop 1 contains bbs 5, 8, 3, 10, 4 and 9,
> >> and loop 2 contains bbs 3 and 10.  Currently the walk breaks at bb 3, as bb 3
> >> doesn't dominate bb 9 of loop 1.  But actually, both bb 5 and bb 4 are
> >> ALWAYS_EXECUTED for loop 1, so any store instructions in bb 4 are never
> >> considered by store motion.
> >>
> >>
> >>  5<
> >>  |\   |
> >>  8 \  9
> >>  |  \ |
> >> --->3--->4
> >> ||
> >> 10---|
> >>
> >>
> >> On master, SET_ALWAYS_EXECUTED_IN is currently set only for bb 5.  With this
> >> patch, the search continues past bb 3 until it reaches bb 4, so last is
> >> updated to bb 4; the walk then stops when an exit edge is found at bb 4 by
> >> "if (!flow_bb_inside_loop_p (loop, e->dest))".  The loop that follows then
> >> marks bb 4 and its immediate dominators up to bb 5 as ALWAYS_EXECUTED.
> >>
> >>
> >>   while (1)
> >>{
> >>  SET_ALWAYS_EXECUTED_IN (last, loop);
> >>  if (last == loop->header)
> >>break;
> >>  last = get_immediate_dominator (CDI_DOMINATORS, last);
> >>}
> >>
> >> After further discussion with Kewen, we found that the inn_loop variable is
> >> totally useless and could be removed.
> >>
> >>
> >>>
>    if (bitmap_bit_p (contains_call, bb->index))
>  break;
> 
>  @@ -3238,7 +3242,7 @@ fill_always_executed_in_1 (class loop *loop, 
>  sbitmap
>  @@ contains_call)
> 
>    if (bb->loop_father->header == bb)
>   {
>  -  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
>  +  if (!dominated_by_p (CDI_DOMINATORS, 
>  bb->loop_father->latch,
>  bb))
>   break;
> >>>
> >>> That's now an always-false condition - a loop's latch is always dominated
> >>> by its header.  The condition as written tries to verify whether the
> >>> loop is always entered - mind we visit all

[committed] libstdc++: Fix move construction of std::tuple with array elements [PR101960]

2021-08-19 Thread Jonathan Wakely via Gcc-patches

An array member cannot be direct-initialized in a ctor-initializer-list,
so use the base class' move constructor, which does the right thing for
both arrays and non-arrays.

This constructor could be defaulted, but that would make it trivial for
some specializations, which would change the argument passing ABI. Do
that for the versioned namespace only.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/101960
* include/std/tuple (_Tuple_impl(_Tuple_impl&&)): Use base
class' move constructor. Define as defaulted for versioned
namespace.
* testsuite/20_util/tuple/cons/101960.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

commit 0187e0d7360f327f88d8b2294668669306ae4630
Author: Jonathan Wakely 
Date:   Thu Aug 19 11:48:40 2021

libstdc++: Fix move construction of std::tuple with array elements 
[PR101960]

An array member cannot be direct-initialized in a ctor-initializer-list,
so use the base class' move constructor, which does the right thing for
both arrays and non-arrays.

This constructor could be defaulted, but that would make it trivial for
some specializations, which would change the argument passing ABI. Do
that for the versioned namespace only.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/101960
* include/std/tuple (_Tuple_impl(_Tuple_impl&&)): Use base
class' move constructor. Define as defaulted for versioned
namespace.
* testsuite/20_util/tuple/cons/101960.cc: New test.

diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple
index 1292aee45c0..f082ccb8a3b 100644
--- a/libstdc++-v3/include/std/tuple
+++ b/libstdc++-v3/include/std/tuple
@@ -438,11 +438,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // 2729. Missing SFINAE on std::pair::operator=
   _Tuple_impl& operator=(const _Tuple_impl&) = delete;
 
+#if _GLIBCXX_INLINE_VERSION
+  _Tuple_impl(_Tuple_impl&&) = default;
+#else
   constexpr
   _Tuple_impl(_Tuple_impl&& __in)
   noexcept(is_nothrow_move_constructible<_Head>::value)
-  : _Base(std::forward<_Head>(_M_head(__in)))
+  : _Base(static_cast<_Base&&>(__in))
   { }
+#endif
 
   template
constexpr
diff --git a/libstdc++-v3/testsuite/20_util/tuple/cons/101960.cc 
b/libstdc++-v3/testsuite/20_util/tuple/cons/101960.cc
new file mode 100644
index 000..f14604cdc69
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/tuple/cons/101960.cc
@@ -0,0 +1,4 @@
+// { dg-do compile { target c++11 } }
+#include <tuple>
+std::tuple<int[1]> t;
+auto tt = std::move(t); // PR libstdc++/101960
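
The reasoning behind the fix can be reproduced with a small stand-alone model. The types below are hypothetical stand-ins (`head_base` for the element-holding base class, `impl` for `_Tuple_impl`), not the libstdc++ definitions; the sketch only shows that delegating to the base class' implicit move constructor works for an array member, where initializing the base from the array itself would not compile:

```cpp
#include <utility>

// Stand-in for the base that stores one tuple element; its implicit move
// constructor moves an array member element by element.
struct head_base
{
  int head[2];
};

// Stand-in for _Tuple_impl.
struct impl : head_base
{
  impl() { head[0] = 1; head[1] = 2; }

  // Initializing head_base from other.head directly is ill-formed, since
  // an array cannot be initialized from another array expression.  Casting
  // the whole object to the base type and moving that is fine.
  impl(impl&& other) noexcept
  : head_base(static_cast<head_base&&>(other))
  { }
};
```

Moving an `impl` then copies both array elements through the base, which matches the behaviour the commit message describes for arrays and non-arrays alike.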

[committed] libstdc++: Document P1739R4 status [PR100139]

2021-08-19 Thread Jonathan Wakely via Gcc-patches

We should document the status of this unimplemented feature.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/100139
* doc/xml/manual/status_cxx2020.xml: Add P1739R4 to status table.
* doc/html/manual/status.html: Regenerate.

Tested powerpc64le-linux. Committed to trunk.

commit 926d4a71c7e5a2f7d17a4f943d6e7fe9f1e3ba55
Author: Jonathan Wakely 
Date:   Thu Aug 19 11:44:57 2021

libstdc++: Document P1739R4 status [PR100139]

We should document the status of this unimplemented feature.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/100139
* doc/xml/manual/status_cxx2020.xml: Add P1739R4 to status table.
* doc/html/manual/status.html: Regenerate.

diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2020.xml 
b/libstdc++-v3/doc/xml/manual/status_cxx2020.xml
index ca12d8023f1..45de79311a1 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2020.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2020.xml
@@ -1437,6 +1437,17 @@ or any notes about the implementation.
   
 
 
+
+  
+   Avoid template bloat for safe_ranges in combination 
with ‘subrange-y’ view adaptors.
+  
+http://www.w3.org/1999/xlink; 
xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1739r4.html;>
+P1739R4 
+  
+   
+  
+
+

[committed] libstdc++: Improve doxygen docs for smart pointers

2021-08-19 Thread Jonathan Wakely via Gcc-patches

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/shared_ptr.h: Add @since and @headerfile tags.
* include/bits/unique_ptr.h: Add @headerfile tags.

Tested powerpc64le-linux. Committed to trunk.
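
One of the doc comments touched by the patch below describes constructing a `shared_ptr` to a sub-object of an object managed by an existing `shared_ptr`. A minimal sketch of that use with the lvalue aliasing constructor (available since C++11) follows; `record` and `payload_of` are made-up names for illustration:

```cpp
#include <memory>

struct record
{
  int id;
  int payload;
};

// Returns a shared_ptr to the payload sub-object.  It shares ownership
// with R, so the whole record stays alive while the result is in use.
inline std::shared_ptr<int> payload_of(const std::shared_ptr<record>& r)
{
  return std::shared_ptr<int>(r, &r->payload);   // aliasing constructor
}
```

The returned pointer stays valid after the original `shared_ptr<record>` is reset, because both share the same control block.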

commit 30b300de8eb9a53c8ad8d80caf06e386e916bc66
Author: Jonathan Wakely 
Date:   Thu Aug 19 11:27:32 2021

libstdc++: Improve doxygen docs for smart pointers

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/shared_ptr.h: Add @since and @headerfile tags.
* include/bits/unique_ptr.h: Add @headerfile tags.

diff --git a/libstdc++-v3/include/bits/shared_ptr.h 
b/libstdc++-v3/include/bits/shared_ptr.h
index d5386ad535f..214ce20a878 100644
--- a/libstdc++-v3/include/bits/shared_ptr.h
+++ b/libstdc++-v3/include/bits/shared_ptr.h
@@ -102,6 +102,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   /**
*  @brief  A smart pointer with reference-counted copy semantics.
+   *  @headerfile memory
+   *  @since C++11
*
* A `shared_ptr` object is either empty or _owns_ a pointer passed
* to the constructor. Copies of a `shared_ptr` share ownership of
@@ -139,6 +141,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #if __cplusplus >= 201703L
 # define __cpp_lib_shared_ptr_weak_type 201606
   /// The corresponding weak_ptr type for this shared_ptr
+  /// @since C++17
   using weak_type = weak_ptr<_Tp>;
 #endif
   /**
@@ -266,6 +269,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  @param  __r  A `shared_ptr`.
*  @param  __p  A pointer that will remain valid while `*__r` is valid.
*  @post   `get() == __p && !__r.use_count() && !__r.get()`
+   *  @since C++17
*
*  This can be used to construct a `shared_ptr` to a sub-object
*  of an object managed by an existing `shared_ptr`. The complete
@@ -607,6 +611,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 #if __cplusplus >= 201703L
   /// Convert type of `shared_ptr`, via `reinterpret_cast`
+  /// @since C++17
   template
 inline shared_ptr<_Tp>
 reinterpret_pointer_cast(const shared_ptr<_Up>& __r) noexcept
@@ -620,6 +625,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // 2996. Missing rvalue overloads for shared_ptr operations
 
   /// Convert type of `shared_ptr` rvalue, via `static_cast`
+  /// @since C++20
   template
 inline shared_ptr<_Tp>
 static_pointer_cast(shared_ptr<_Up>&& __r) noexcept
@@ -630,6 +636,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
   /// Convert type of `shared_ptr` rvalue, via `const_cast`
+  /// @since C++20
   template
 inline shared_ptr<_Tp>
 const_pointer_cast(shared_ptr<_Up>&& __r) noexcept
@@ -640,6 +647,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
   /// Convert type of `shared_ptr` rvalue, via `dynamic_cast`
+  /// @since C++20
   template
 inline shared_ptr<_Tp>
 dynamic_pointer_cast(shared_ptr<_Up>&& __r) noexcept
@@ -651,6 +659,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
   /// Convert type of `shared_ptr` rvalue, via `reinterpret_cast`
+  /// @since C++20
   template
 inline shared_ptr<_Tp>
 reinterpret_pointer_cast(shared_ptr<_Up>&& __r) noexcept
@@ -666,6 +675,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   /**
* @brief  A non-owning observer for a pointer owned by a shared_ptr
+   * @headerfile memory
+   * @since C++11
*
* A weak_ptr provides a safe alternative to a raw pointer when you want
* a non-owning reference to an object that is managed by a shared_ptr.
@@ -786,7 +797,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { };
 
   /**
-   *  @brief Base class allowing use of member function shared_from_this.
+   * @brief Base class allowing use of the member function `shared_from_this`.
+   * @headerfile memory
+   * @since C++11
*/
   template
 class enable_shared_from_this
@@ -813,6 +826,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 #if __cplusplus > 201402L || !defined(__STRICT_ANSI__) // c++1z or gnu++11
 #define __cpp_lib_enable_shared_from_this 201603
+  /** @{
+   * Get a `weak_ptr` referring to the object that has `*this` as its base.
+   * @since C++17
+   */
   weak_ptr<_Tp>
   weak_from_this() noexcept
   { return this->_M_weak_this; }
@@ -820,6 +837,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   weak_ptr
   weak_from_this() const noexcept
   { return this->_M_weak_this; }
+  /// @}
 #endif
 
 private:
diff --git a/libstdc++-v3/include/bits/unique_ptr.h 
b/libstdc++-v3/include/bits/unique_ptr.h
index 023bd4d7f31..f34ca10ce65 100644
--- a/libstdc++-v3/include/bits/unique_ptr.h
+++ b/libstdc++-v3/include/bits/unique_ptr.h
@@ -240,6 +240,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // 20.7.1.2 unique_ptr for single objects.
 
   /// A move-only smart pointer that manages unique ownership of a resource.
+  /// @headerfile memory
   /// @since C++11
   template >
 class unique_ptr
@@ -478,6 +479,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // DR 740 - omit specialization for array objects with

[committed] libstdc++: Improve overflow check for file timestamps

2021-08-19 Thread Jonathan Wakely via Gcc-patches

The current code assumes that system_clock::duration is nanoseconds, and
also performs a value-changing conversion from nanoseconds::max() to
double (which doesn't matter after dividing by 1e9, but triggers a
warning with Clang nonetheless).

A better solution is to use system_clock::duration::max() and perform
the comparison entirely using the std::chrono types, rather than with
dimensionless arithmetic types.

This doesn't address the FIXME in the function, so the overflow check
still rejects some values that could be represented by the file_clock.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* src/filesystem/ops-common.h (filesystem::file_time): Improve
overflow check by using system_clock::duration::max().

Tested powerpc64le-linux. Committed to trunk.

commit 65441d8fc3c132a58c8bef6faefa2bc25e82a913
Author: Jonathan Wakely 
Date:   Thu Aug 19 11:03:01 2021

libstdc++: Improve overflow check for file timestamps

The current code assumes that system_clock::duration is nanoseconds, and
also performs a value-changing conversion from nanoseconds::max() to
double (which doesn't matter after dividing by 1e9, but triggers a
warning with Clang nonetheless).

A better solution is to use system_clock::duration::max() and perform
the comparison entirely using the std::chrono types, rather than with
dimensionless arithmetic types.

This doesn't address the FIXME in the function, so the overflow check
still rejects some values that could be represented by the file_clock.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* src/filesystem/ops-common.h (filesystem::file_time): Improve
overflow check by using system_clock::duration::max().

diff --git a/libstdc++-v3/src/filesystem/ops-common.h 
b/libstdc++-v3/src/filesystem/ops-common.h
index 304e5b263fb..bf26c06b7b5 100644
--- a/libstdc++-v3/src/filesystem/ops-common.h
+++ b/libstdc++-v3/src/filesystem/ops-common.h
@@ -229,7 +229,7 @@ namespace __gnu_posix
 // (This only applies to the C++17 Filesystem library, because for the
 // Filesystem TS we don't have a distinct __file_clock, we just use the
 // system clock for file timestamps).
-if (s >= (nanoseconds::max().count() / 1e9))
+if (seconds{s} >= floor<seconds>(system_clock::duration::max()))
   {
ec = std::make_error_code(std::errc::value_too_large); // EOVERFLOW
return system_clock::time_point::min();
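
The idea of doing the overflow comparison in std::chrono types rather than in raw numbers can be tried in isolation. The helper below is a hypothetical sketch, not the libstdc++ function; it uses `duration_cast` instead of `floor` so that it also builds as C++11, and the two agree here because the value being converted is non-negative:

```cpp
#include <chrono>

// Hypothetical helper mirroring the patched condition: true when a
// timestamp of S whole seconds is too large for system_clock::duration.
inline bool overflows_system_clock(std::chrono::seconds s)
{
  using namespace std::chrono;
  // duration_cast truncates toward zero, which equals floor<seconds> for
  // the non-negative maximum.  No assumption is made about the period of
  // system_clock::duration.
  return s >= duration_cast<seconds>(system_clock::duration::max());
}
```

Because the limit is derived from `system_clock::duration::max()`, the check stays correct if the clock's tick period is not nanoseconds.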

[committed] libstdc++: Tweak whitespace

2021-08-19 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux. Committed to trunk.

commit c8a1cf1a7a8be1dc0de48035d88fecf4954e37ba
Author: Jonathan Wakely 
Date:   Wed Aug 18 16:57:47 2021

libstdc++: Tweak whitespace

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/stl_tree.h: Tweak whitespace.

diff --git a/libstdc++-v3/include/bits/stl_tree.h 
b/libstdc++-v3/include/bits/stl_tree.h
index 96299129810..e4e3e0b985c 100644
--- a/libstdc++-v3/include/bits/stl_tree.h
+++ b/libstdc++-v3/include/bits/stl_tree.h
@@ -322,7 +322,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
 
   _Base_ptr _M_node;
-  };
+};
 
   template
 struct _Rb_tree_const_iterator

aarch64/arm zero bitfield handling (was Re: [PATCH] c++, v3: Implement P0466R5 __cpp_lib_is_layout_compatible compiler helpers [PR101539])

2021-08-19 Thread Jakub Jelinek via Gcc-patches

On Thu, Aug 19, 2021 at 08:59:16AM +0200, Christophe Lyon wrote:
> This patch (r12-2975) is causing regressions on arm and aarch64:
> 
> g++:g++.target/aarch64/aarch64.exp=g++.target/aarch64/no_unique_address_1.C
> check-function-bodies _Z8caller_pR1P
> 
> g++:g++.target/aarch64/aarch64.exp=g++.target/aarch64/no_unique_address_2.C
>  (test for warnings, line 169)
> 
> g++:g++.target/aarch64/aarch64.exp=g++.target/aarch64/no_unique_address_2.C
> check-function-bodies _Z8caller_pR1P
> 
> g++:g++.target/arm/arm.exp=g++.target/arm/no_unique_address_1.C
> check-function-bodies _Z8caller_pR1P
> g++:g++.target/arm/arm.exp=g++.target/arm/no_unique_address_2.C  (test
> for warnings, line 163)
> g++:g++.target/arm/arm.exp=g++.target/arm/no_unique_address_2.C
> check-function-bodies _Z8caller_pR1P

The only change in the patch that affects this is the removal of the
remove_zero_width_bit_fields function and call to it in layout_class_type.

As I've tried to explain, that has been done as some kind of optimization
by the C++ FE only (e.g. the C FE doesn't do that), but because it is
significant for layout compatibility and its traits, it is no longer
something that can be done (at least not by the FE).

The optimized dump from the test is identical; the difference is only in how
the aarch64 (and arm?) backend handles those bit-fields, which on the first
test results in this difference:
 _Z8caller_pR1P:
 .LFB15:
-   ldp s0, s1, [x0]
-   ldp s2, s3, [x0, 8]
+   ldp x0, x1, [x0]
    b   _Z8callee_p1P
where P does have an int : 0; bitfield in it.

Though, if the r10-8042-g56fe3ca30e1343e4f232ca539726506440e23dd3
and r10-8044-g127abeb2e8448b2932bd52245f055d0c5c4b44a0 patches were about
the handling of zero width bitfields, I don't see how that could have
worked properly when the FE would just throw them away, so the
backend wouldn't know if they were there or not.
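
To make the layout point concrete, here is a minimal sketch in plain C.  The
struct names are made up for illustration and are not the P type from the
no_unique_address tests.  It only shows that an int : 0; member can change
where the following member is placed, which is why it cannot be discarded
before layout-sensitive decisions are made.  The exact offset depends on the
ABI, so the check only assumes that the member never moves to a lower offset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types, for illustration only.  */
struct plain { char a; char b; };
struct with_zero_width { char a; int : 0; char b; };

/* Offset of member b without the zero-width bit-field.  */
size_t
offset_b_plain (void)
{
  return offsetof (struct plain, b);
}

/* Offset of member b when an int : 0; separates it from a.  */
size_t
offset_b_zero_width (void)
{
  return offsetof (struct with_zero_width, b);
}

void
check_layout (void)
{
  /* Two adjacent chars are always packed back to back.  */
  assert (offset_b_plain () == 1);
  /* On many ABIs this is sizeof (int); only a lower bound is assumed.  */
  assert (offset_b_zero_width () >= offset_b_plain ());
}
```

On x86_64-linux the second offset is typically 4 rather than 1, but that
value is an observation about one ABI and not something the sketch relies on.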

Richard, can you please have a look?

Jakub

Re: [PATCH] move x86 to use gather/scatter internal functions

2021-08-19 Thread Richard Biener via Gcc-patches

On Thu, 19 Aug 2021, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Wed, 18 Aug 2021, Richard Sandiford wrote:
> >> I think it would be OK/sensible to use the larger of the index or
> >> result vectors to determine the mask, if that helps.  There just
> >> wasn't any need to make a distinction for SVE, since there the
> >> mask type is determined entirely by the number of elements.
> >
> > I don't think that would help - for a V4SFmode gather with V4DImode
> > indices the x86 ISA expects the AVX2 mask to be V4SImode, that is,
> > the mask corresponds to the data mode, not to the index mode.
> 
> Ah, OK.  So the widest type determines the ISA and then the data
> type determines the mask type within that ISA?

Yes.

> > The real issue is that we're asking for a mask mode just using
> > the data mode but not the instruction.  So with V8SFmode data
> > mode and -mavx512f -mno-avx512vl it appears we want a AVX2
> > mask mode but in reality the gather instruction we can use
> > uses EVEX.512 encoding when the index mode is V8DImode
> > and thus is available w/ -mavx512f.
> >
> > So it appears that we'd need to pass another mode to the
> > get_mask_mode target hook that determines the instruction
> > encoding ...
> >
> > I'm also not sure if this wouldn't open up a can of worms
> > with the mask computation stmts which would not know which
> > (maybe there are two) context they are used in.  We're
> > fundamentally using two different vector sizes and ISA subsets
> > here :/
> 
> Yeah.
> 
> > Maybe vinfo->vector_mode should fully determine whether
> > we're using EVEX or VEX encoding in the loop and thus we should
> > reject attempts to do VEX encoding vectorization in EVEX mode?
> > I suppose this is how you don't get a mix of SVE and ADVSIMD
> > vectorization in one loop?
> 
> Yeah, that's right.  It means that for -msve-vector-bits=128
> we effectively have two V16QIs: the “real” AdvSIMD V16QI and
> the SVE VNx16QI.  I can't imagine that being popular for x86 :-)
> 
> > But since VEX/EVEX use the same modes (but SVE/ADVSIMD do not)
> > this will get interesting.
> >
> > Oh, and -mavx512bw would also be prone to use (different size again)
> > AVX2 vectors at the same time.  Maybe the "solution" is to simply
> > not expose the EVEX modes to the vectorizer when there's such
> > a possibility, thus only when all F, VL and BW are available?
> > At least when thinking about relaxing the vector sizes we can
> > deal with.
> >
> > I've opened this can of worms with
> >
> > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> > index 97745a830a2..db938f79c9c 100644
> > --- a/gcc/tree-vect-data-refs.c
> > +++ b/gcc/tree-vect-data-refs.c
> > @@ -3749,13 +3749,15 @@ vect_gather_scatter_fn_p (vec_info *vinfo, bool read_p, bool masked_p,
> >  
> >    for (;;)
> >      {
> > -      tree offset_vectype = get_vectype_for_scalar_type (vinfo, offset_type);
> > -      if (!offset_vectype)
> > -       return false;
> > -
> > +      tree offset_vectype
> > +       = get_related_vectype_for_scalar_type (vinfo->vector_mode, offset_type,
> > +                                              TYPE_VECTOR_SUBPARTS (vectype));
> >
> > to even get the mixed size optabs to be considered.  If we were to
> > expose the real instructions instead we'd have
> > mask_gather_loadv16sfv8di but as said the semantics of such
> > optab entry would need to be defined.  That's the way the current
> > builtin_gather target hook code works, so it seems that would be
> > a working approach, side-stepping the mixed size problems.
> > Here we'd define that excess elements on either the data or the
> > index operand are ignored (on the output operand they are zeroed).
> 
> It seems a bit ugly to expose such a low-level target-specific detail
> at the gimple level though.  Also, if we expose the v16sf at the gimple
> level then we would need to construct a v16sf vector for scatter stores
> and initialise all the elements (at least until we have a don't care
> VEC_PERM_EXPR encoding).

True - which is why I tried to not do this in the first place ...

> I'm not sure this would solve the problem with the mixture of mask
> computation stmts that you mentioned above.  E.g. what if the input
> to the v16sf scatter store above is a COND_EXPR that shares some mask
> subexpressions with the scatter store mask?  This COND_EXPR would be a
> v8sf VEC_COND_EXPR and so would presumably use the get_mask_mode
> corresponding to v8sf rather than v16sf.  The mask computation stmts
> would still need to cope with a mixture of AVX512 and AVX2 masks.

But the smaller data mode is only exposed because of the gather
optab - the index mode is large only because the rest of the loop
is vectorized with the larger size.  This is I think why the current
scheme using the builtin_gather target hook works - we expose the
larger data vector here.

> Even if x86 doesn't support that combination yet (because it instead
> requires all vector

Re: [PATCH] move x86 to use gather/scatter internal functions

2021-08-19 Thread Richard Sandiford via Gcc-patches

Richard Biener  writes:
> On Wed, 18 Aug 2021, Richard Sandiford wrote:
>> I think it would be OK/sensible to use the larger of the index or
>> result vectors to determine the mask, if that helps.  There just
>> wasn't any need to make a distinction for SVE, since there the
>> mask type is determined entirely by the number of elements.
>
> I don't think that would help - for a V4SFmode gather with V4DImode
> indices the x86 ISA expects the AVX2 mask to be V4SImode, that is,
> the mask corresponds to the data mode, not to the index mode.

Ah, OK.  So the widest type determines the ISA and then the data
type determines the mask type within that ISA?
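
A scalar model may help to pin down what "the mask corresponds to the data
mode" means for a V4SF gather with V4DI indices.  This is only a sketch under
assumptions: the function name is hypothetical and is not a GCC optab or
builtin, and it models the AVX2 behaviour where each mask element has the
width of a data element, an element is loaded when the sign bit of its mask
element is set, and unselected elements keep their previous value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of a masked gather with four float data
   elements and four 64-bit indices.  The mask has one 32-bit element per
   data element (V4SI), not one per index element (V4DI).  */
void
model_mask_gather_v4sf_v4di (float dst[4], const float *base,
                             const int64_t idx[4], const int32_t mask[4])
{
  for (int i = 0; i < 4; i++)
    /* The sign bit of the mask element selects the load; otherwise the
       previous contents of dst[i] are kept.  */
    if (mask[i] < 0)
      dst[i] = base[idx[i]];
}
```

The indices are twice as wide as the data, so the index vector is larger than
the data vector, yet the mask still follows the data element width.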

> The real issue is that we're asking for a mask mode just using
> the data mode but not the instruction.  So with V8SFmode data
> mode and -mavx512f -mno-avx512vl it appears we want a AVX2
> mask mode but in reality the gather instruction we can use
> uses EVEX.512 encoding when the index mode is V8DImode
> and thus is available w/ -mavx512f.
>
> So it appears that we'd need to pass another mode to the
> get_mask_mode target hook that determines the instruction
> encoding ...
>
> I'm also not sure if this wouldn't open up a can of worms
> with the mask computation stmts which would not know which
> (maybe there are two) context they are used in.  We're
> fundamentally using two different vector sizes and ISA subsets
> here :/

Yeah.

> Maybe vinfo->vector_mode should fully determine whether
> we're using EVEX or VEX encoding in the loop and thus we should
> reject attempts to do VEX encoding vectorization in EVEX mode?
> I suppose this is how you don't get a mix of SVE and ADVSIMD
> vectorization in one loop?

Yeah, that's right.  It means that for -msve-vector-bits=128
we effectively have two V16QIs: the “real” AdvSIMD V16QI and
the SVE VNx16QI.  I can't imagine that being popular for x86 :-)

> But since VEX/EVEX use the same modes (but SVE/ADVSIMD do not)
> this will get interesting.
>
> Oh, and -mavx512bw would also be prone to use (different size again)
> AVX2 vectors at the same time.  Maybe the "solution" is to simply
> not expose the EVEX modes to the vectorizer when there's such
> a possibility, thus only when all F, VL and BW are available?
> At least when thinking about relaxing the vector sizes we can
> deal with.
>
> I've opened this can of worms with
>
> diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> index 97745a830a2..db938f79c9c 100644
> --- a/gcc/tree-vect-data-refs.c
> +++ b/gcc/tree-vect-data-refs.c
> @@ -3749,13 +3749,15 @@ vect_gather_scatter_fn_p (vec_info *vinfo, bool read_p, bool masked_p,
>  
>    for (;;)
>      {
> -      tree offset_vectype = get_vectype_for_scalar_type (vinfo, offset_type);
> -      if (!offset_vectype)
> -       return false;
> -
> +      tree offset_vectype
> +       = get_related_vectype_for_scalar_type (vinfo->vector_mode, offset_type,
> +                                              TYPE_VECTOR_SUBPARTS (vectype));
>
> to even get the mixed size optabs to be considered.  If we were to
> expose the real instructions instead we'd have
> mask_gather_loadv16sfv8di but as said the semantics of such
> optab entry would need to be defined.  That's the way the current
> builtin_gather target hook code works, so it seems that would be
> a working approach, side-stepping the mixed size problems.
> Here we'd define that excess elements on either the data or the
> index operand are ignored (on the output operand they are zeroed).

It seems a bit ugly to expose such a low-level target-specific detail
at the gimple level though.  Also, if we expose the v16sf at the gimple
level then we would need to construct a v16sf vector for scatter stores
and initialise all the elements (at least until we have a don't care
VEC_PERM_EXPR encoding).

I'm not sure this would solve the problem with the mixture of mask
computation stmts that you mentioned above.  E.g. what if the input
to the v16sf scatter store above is a COND_EXPR that shares some mask
subexpressions with the scatter store mask?  This COND_EXPR would be a
v8sf VEC_COND_EXPR and so would presumably use the get_mask_mode
corresponding to v8sf rather than v16sf.  The mask computation stmts
would still need to cope with a mixture of AVX512 and AVX2 masks.

Even if x86 doesn't support that combination yet (because it instead
requires all vector sizes to be equal), it seems like something that
is likely to be supported in future.

Thanks,
Richard

Re: [patch][version 6] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-08-19 Thread Richard Biener via Gcc-patches

On Wed, 18 Aug 2021, Qing Zhao wrote:

> 
> 
> > On Aug 18, 2021, at 2:15 AM, Richard Biener  wrote:
> > 
> > On Tue, 17 Aug 2021, Qing Zhao wrote:
> > 
> >> 
> >> 
> >>> On Aug 17, 2021, at 9:50 AM, Qing Zhao via Gcc-patches 
> >>>  wrote:
> >>> 
> >>> 
> >>> 
>  On Aug 17, 2021, at 3:29 AM, Richard Biener  wrote:
>  
>  On Mon, 16 Aug 2021, Qing Zhao wrote:
>  
> > My current code for expand_DEFERRED_INIT is like the following, could 
> > you check and see whether there is any issue for it:
> > 
> > #define INIT_PATTERN_VALUE  0xFE
> > static void
> > expand_DEFERRED_INIT (internal_fn, gcall *stmt)
> > {
> > tree lhs = gimple_call_lhs (stmt);
> > tree var_size = gimple_call_arg (stmt, 0);
> > enum auto_init_type init_type
> >  = (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
> > bool is_vla = (bool) TREE_INT_CST_LOW (gimple_call_arg (stmt, 2));
> > 
> > tree var_type = TREE_TYPE (lhs);
> > gcc_assert (init_type > AUTO_INIT_UNINITIALIZED);
> > 
> > if (is_vla || (!use_register_for_decl (lhs)))
> >  {
> >if (TREE_CODE (lhs) == SSA_NAME)
> >  lhs = SSA_NAME_VAR (lhs);
>  
>  this should not be necessary (in fact you shouldn't see a SSA_NAME
>  here, if you do then using SSA_NAME_VAR is wrong)
> >>> You mean during RTL expansion phase, all SSA_NAMEs are gone already?
> >> 
> >> Actually, the lhs could be SSA_NAME here, 
> >> 
> >> Breakpoint 1, expand_DEFERRED_INIT (stmt=0x7fffe96ae348) at 
> >> ../../latest-gcc/gcc/internal-fn.c:3021
> >> 3021 mark_addressable (lhs);
> >> (gdb) call debug_tree(lhs)
> >>  >>type  >>size 
> >>unit-size 
> >>align:32 warn_if_not_align:0 symtab:0 alias-set 2 canonical-type 
> >> 0x7fffe959b2a0 precision:32
> >>pointer_to_this >
> >>visited var 
> >>def_stmt temp1_5 = .DEFERRED_INIT (4, 2, 0, &"temp1"[0]);
> >>version:5>
> >> 
> >> when I deleted:
> >> 
> >> if (TREE_CODE (lhs) == SSA_NAME)
> >>   lhs = SSA_NAME_VAR (lhs);
> > 
> > but then using SSA_NAME_VAR is broken.  I suspect use_register_for_decl
> > isn't the correct thing to look at.  I think we need to look at what
> > the LHS expanded to if it is a SSA_VAR_P (that includes SSA names
> > but also plain DECLs but not what we get from VLAs where we'd see
> > *ptr).  So sth like
> > 
> >  bool reg_lhs;
> >  if (SSA_VAR_P (lhs))
> >{
> >  rtx tem = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> >  reg_lhs = !MEM_P (tem);
> >  /* If not MEM_P reg_lhs should be REG_P or SUBREG_P (but maybe
> > also CONCAT or lowpart...?)  */
> >}
> >  else
> >{
> >  gcc_assert (is_vla);
> >  reg_lhs = false;
> >}
> > 
> >  if (!reg_lhs)
> >memset path
> >  else
> >expand_assignment path
> 
> After making the following change:
> 
> +  bool reg_lhs = true;
>  
>tree var_type = TREE_TYPE (lhs);
>gcc_assert (init_type > AUTO_INIT_UNINITIALIZED);
>  
> -  if (is_vla || (!use_register_for_decl (lhs)))
> +  if (SSA_VAR_P (lhs))
> +{
> +  rtx tem = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> +  reg_lhs = !MEM_P (tem);
> +}
> +  else
> +{
> +  gcc_assert (is_vla);
> +  reg_lhs = false;
> +}
> +
> +  if (!reg_lhs)
>  {
> 
> I got exactly the same internal error that failed at expr.c:
> 
>  8436   /* We must have made progress.  */
>  8437   gcc_assert (inner != exp);
> 
> 
> Looks like for the following code:
> 
> 3026   if (!reg_lhs)
> 3027 {
> 3028 /* If this is a VLA or the variable is not in register,
> 3029expand to a memset to initialize it.  */
> 3030   mark_addressable (lhs);
> 3031   tree var_addr = build_fold_addr_expr (lhs);
> 3032 
> 3033   tree value = (init_type == AUTO_INIT_PATTERN) ?
> 3034 build_int_cst (integer_type_node,
> 3035INIT_PATTERN_VALUE) :
> 3036 integer_zero_node;
> 3037   tree m_call = build_call_expr (builtin_decl_implicit 
> (BUILT_IN_MEMSET),
> 3038  3, var_addr, value, var_size);
> 3039   /* Expand this memset call.  */
> 3040   expand_builtin_memset (m_call, NULL_RTX, TYPE_MODE (var_type));
> 3041 }
> 
> At line 3030, “lhs” could be a SSA_NAME.
> 
> My questions are:
> 
> 1. Could the routine “mark_addressable” and “build_fold_addr_expr” be applied 
> on SSA_NAME?

No.

> 2. Could the routine “expand_builtin_memset” be applied on the memset call 
> whose “DEST” is
> an address expression on SSA_NAME? 

No.

> 3. Within “expand_DEFERRED_INIT”, can I call “expand_builtin_memset” to 
> expand .DEFERRED_INIT?

Well, not with "invalid" GENERIC I fear (address of a SSA name).

> I suspect that one of the above 3 might be the issue, but not sure which one?

All of the above ;)  So while reg_lhs is now precise as to how the
variable will end up (the SSA name will

Re: [PATCH] expand: Add new clrsb fallback expansion [PR101950]

2021-08-19 Thread Richard Biener via Gcc-patches

On Thu, 19 Aug 2021, Jakub Jelinek wrote:

> Hi!
> 
> As suggested in the PR, the following patch adds two new clrsb
> expansion possibilities if target doesn't have clrsb_optab for the
> requested nor wider modes, but does have clz_optab for the requested
> mode.
> One expansion is
> clrsb (op0)
> expands as
> clz (op0 ^ (((stype)op0) >> (prec-1))) - 1
> which is usable if CLZ_DEFINED_VALUE_AT_ZERO is 2 with value
> of prec, because the clz argument can be 0 and clrsb should give
> prec-1 in that case.
> The other expansion is
> clz (((op0 << 1) ^ (((stype)op0) >> (prec-1))) | 1)
> where the clz argument is never 0, but it is one operation longer.
> E.g. on x86_64-linux with -O2 -mno-lzcnt, this results for
> int foo (int x) { return __builtin_clrsb (x); }
> in
> - subq    $8, %rsp
> - movslq  %edi, %rdi
> - call    __clrsbdi2
> - addq    $8, %rsp
> - subl    $32, %eax
> + leal    (%rdi,%rdi), %eax
> + sarl    $31, %edi
> + xorl    %edi, %eax
> + orl     $1, %eax
> + bsrl    %eax, %eax
> + xorl    $31, %eax
> and with -O2 -mlzcnt:
> + movl    %edi, %eax
> + sarl    $31, %eax
> + xorl    %edi, %eax
> + lzcntl  %eax, %eax
> + subl    $1, %eax
> On armv7hl-linux-gnueabi with -O2:
> - push    {r4, lr}
> - bl      __clrsbsi2
> - pop     {r4, pc}
> + @ link register save eliminated.
> + eor     r0, r0, r0, asr #31
> + clz     r0, r0
> + sub     r0, r0, #1
> + bx      lr
> As it (at least usually) will make code larger, it is
> disabled for -Os or cold instructions.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2021-08-19  Jakub Jelinek  
> 
>   PR middle-end/101950
>   * optabs.c (expand_clrsb_using_clz): New function.
>   (expand_unop): Use it as another clrsb expansion fallback.
> 
>   * gcc.target/i386/pr101950-1.c: New test.
>   * gcc.target/i386/pr101950-2.c: New test.
> 
> --- gcc/optabs.c.jj   2021-07-15 10:16:13.027581160 +0200
> +++ gcc/optabs.c  2021-08-18 13:36:56.410818265 +0200
> @@ -2600,6 +2600,82 @@ widen_leading (scalar_int_mode mode, rtx
>return 0;
>  }
>  
> +/* Attempt to emit (clrsb:mode op0) as
> +   (plus:mode (clz:mode (xor:mode op0 (ashr:mode op0 (const_int prec-1))))
> +   (const_int -1))
> +   if CLZ_DEFINED_VALUE_AT_ZERO (mode, val) is 2 and val is prec,
> +   or as
> +   (clz:mode (ior:mode (xor:mode (ashl:mode op0 (const_int 1))
> +  (ashr:mode op0 (const_int prec-1)))
> +(const_int 1)))
> +   otherwise.  */
> +
> +static rtx
> +expand_clrsb_using_clz (scalar_int_mode mode, rtx op0, rtx target)
> +{
> +  if (optimize_insn_for_size_p ()
> +  || optab_handler (clz_optab, mode) == CODE_FOR_nothing)
> +return NULL_RTX;
> +
> +  start_sequence ();
> +  HOST_WIDE_INT val = 0;
> +  if (CLZ_DEFINED_VALUE_AT_ZERO (mode, val) != 2
> +  || val != GET_MODE_PRECISION (mode))
> +val = 0;
> +  else
> +val = 1;
> +
> +  rtx temp2 = op0;
> +  if (!val)
> +{
> +  temp2 = expand_binop (mode, ashl_optab, op0, const1_rtx,
> + NULL_RTX, 0, OPTAB_DIRECT);
> +  if (!temp2)
> + {
> + fail:
> +   end_sequence ();
> +   return NULL_RTX;
> + }
> +}
> +
> +  rtx temp = expand_binop (mode, ashr_optab, op0,
> +GEN_INT (GET_MODE_PRECISION (mode) - 1),
> +NULL_RTX, 0, OPTAB_DIRECT);
> +  if (!temp)
> +goto fail;
> +
> +  temp = expand_binop (mode, xor_optab, temp2, temp, NULL_RTX, 0,
> +OPTAB_DIRECT);
> +  if (!temp)
> +goto fail;
> +
> +  if (!val)
> +{
> +  temp = expand_binop (mode, ior_optab, temp, const1_rtx,
> +NULL_RTX, 0, OPTAB_DIRECT);
> +  if (!temp)
> + goto fail;
> +}
> +  temp = expand_unop_direct (mode, clz_optab, temp, val ? NULL_RTX : target,
> +  true);
> +  if (!temp)
> +goto fail;
> +  if (val)
> +{
> +  temp = expand_binop (mode, add_optab, temp, constm1_rtx,
> +target, 0, OPTAB_DIRECT);
> +  if (!temp)
> + goto fail;
> +}
> +
> +  rtx_insn *seq = get_insns ();
> +  end_sequence ();
> +
> +  add_equal_note (seq, temp, CLRSB, op0, NULL_RTX, mode);
> +  emit_insn (seq);
> +  return temp;
> +}
> +
>  /* Try calculating clz of a double-word quantity as two clz's of word-sized
> quantities, choosing which based on whether the high word is nonzero.  */
>  static rtx
> @@ -3171,6 +3247,9 @@ expand_unop (machine_mode mode, optab un
> temp = widen_leading (int_mode, op0, target, unoptab);
> if (temp)
>   return temp;
> +   temp = expand_clrsb_using_clz (int_mode, op0, target);
> +   if (temp)
> + return temp;
>   }
>goto try_libcall;
>  }
> --- gcc/testsuite/gcc.target/i386/pr101950-1.c.jj 2021-08-18 
> 13:58:05.363093681 +0200
> +++

Re: [PATCH] move x86 to use gather/scatter internal functions

2021-08-19 Thread Richard Biener via Gcc-patches

On Wed, 18 Aug 2021, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Wed, 18 Aug 2021, Hongtao Liu wrote:
> >
> >> On Wed, Aug 18, 2021 at 7:37 PM Hongtao Liu  wrote:
> >> >
> >> > On Wed, Aug 18, 2021 at 7:30 PM Hongtao Liu  wrote:
> >> > >
> >> > > On Wed, Aug 18, 2021 at 6:28 PM Richard Biener  
> >> > > wrote:
> >> > > >
> >> > > > On Wed, 18 Aug 2021, Richard Biener wrote:
> >> > > >
> >> > > > >
> >> > > > > So in the end I seem to be able to combine AVX & AVX512 arriving
> >> > > > > at the following which passes basic testing.  I will now see to
> >> > > > > teach the vectorizer the required "promotion" to handle
> >> > > > > mask_gather_loadv4dfv4si and mask_gather_loadv4sfv4di.
> >> > > > >
> >> > > > > Meanwhile, do you see any hole in the below?  If not I'll
> >> > > > > do mask_scatter_store accordingly (that will be simpler since
> >> > > > > AVX doesn't have scatter).
> >> > > >
> >> > > > There seems to be one more complication ... we have
> >> > > >
> >> > > > (define_expand "avx2_gatherdi"
> >> > > >   [(parallel [(set (match_operand:VEC_GATHER_MODE 0 
> >> > > > "register_operand")
> >> > > >(unspec:VEC_GATHER_MODE
> >> > > >  [(match_operand: 1
> >> > > > "register_operand")
> >> > > >   (mem:
> >> > > > (match_par_dup 6
> >> > > >   [(match_operand 2 "vsib_address_operand")
> >> > > >(match_operand:
> >> > > >   3 "register_operand")
> >> > > >
> >> > > > but VEC_GATHER_IDXDI is
> >> > > >
> >> > > > (define_mode_attr VEC_GATHER_IDXDI
> >> > > >   [(V2DI "V2DI") (V4DI "V4DI") (V8DI "V8DI")
> >> > > >(V2DF "V2DI") (V4DF "V4DI") (V8DF "V8DI")
> >> > > >(V4SI "V2DI") (V8SI "V4DI") (V16SI "V8DI")
> >> > > >(V4SF "V2DI") (V8SF "V4DI") (V16SF "V8DI")])
> >> > > >
> >> > > > I'd have expected (V4SF "V4DI") for example, or (V8SF "V8DI").
> >> > > VEX.128 version: For dword indices, the instruction will gather four
> >> > > single-precision floating-point values. For
> >> > > qword indices, the instruction will gather two values and zero the
> >> > > upper 64 bits of the destination.
> >> > > VEX.256 version: For dword indices, the instruction will gather eight
> >> > > single-precision floating-point values. For
> >> > > qword indices, the instruction will gather four values and zero the
> >> > > upper 128 bits of the destination.
> >> > >
> >> > > So, for expander name, it should be v2sfv2di and v4sfv4di for IDXDI
> >> > > under avx2, and v8sfv8di under avx512.
> >> > >
> >> > > cut pattern
> >> > > (define_insn "*avx2_gatherdi_2"
> >> > >   [(set (match_operand:VEC_GATHER_MODE 0 "register_operand" "=")
> >> > > (unspec:VEC_GATHER_MODE
> >> > >   [(pc)
> >> > >(match_operator: 6 "vsib_mem_operator"
> >> > >  [(unspec:P
> >> > > [(match_operand:P 2 "vsib_address_operand" "Tv")
> >> > > (match_operand: 3 "register_operand" "x")
> >> > > (match_operand:SI 5 "const1248_operand" "n")]
> >> > > UNSPEC_VSIBADDR)])
> >> > >(mem:BLK (scratch))
> >> > >(match_operand: 4 "register_operand" "1")]
> >> > >   UNSPEC_GATHER))
> >> > >(clobber (match_scratch:VEC_GATHER_MODE 1 "="))]
> >> > >   "TARGET_AVX2"
> >> > > {
> >> > >   if (mode != mode)
> >> > > return "%M2vgatherq\t{%4, %6,
> >> > > %x0|%x0, %6, %4}";
> >> > > cut end---
> >> > > We are using the trick of the operand modifier %x0 to force print xmm.
> >> > (define_mode_attr VEC_GATHER_SRCDI
> >> >   [(V2DI "V2DI") (V4DI "V4DI") (V8DI "V8DI")
> >> >(V2DF "V2DF") (V4DF "V4DF") (V8DF "V8DF")
> >> >(V4SI "V4SI") (V8SI "V4SI") (V16SI "V8SI")
> >> >(V4SF "V4SF") (V8SF "V4SF") (V16SF "V8SF")])
> >> >
> >> > (define_insn "*avx2_gathersi"
> >> >   [(set (match_operand:VEC_GATHER_MODE 0 "register_operand" "=")
> >> > (unspec:VEC_GATHER_MODE
> >> >   [(match_operand:VEC_GATHER_MODE 2 "register_operand" "0")
> >> >(match_operator: 7 "vsib_mem_operator"
> >> >  [(unspec:P
> >> > [(match_operand:P 3 "vsib_address_operand" "Tv")
> >> > (match_operand: 4 "register_operand" "x")
> >> > (match_operand:SI 6 "const1248_operand" "n")]
> >> > UNSPEC_VSIBADDR)])
> >> >(mem:BLK (scratch))
> >> >(match_operand:VEC_GATHER_MODE 5 "register_operand" "1")]
> >> >   UNSPEC_GATHER))
> >> >(clobber (match_scratch:VEC_GATHER_MODE 1 "="))]
> >> >   "TARGET_AVX2"
> >> >   "%M3vgatherd\t{%1, %7, %0|%0, %7, %1}"
> >> >
> >> > Or only print operands[1] which has mode VEC_GATHER_SRCDI as index.
> >> 
> >> Typo should be operands[2] in the below one
> >> (define_insn "*avx2_gatherdi"
> >>   [(set (match_operand:VEC_GATHER_MODE 0 "register_operand" "=")
> >> (unspec:VEC_GATHER_MODE
> >>   [(match_operand: 2 "register_operand" "0")
> >>(match_operator: 7 "vsib_mem_operator"
> >>  [(unspec:P
> >> [(match_operand:P 3

[committed] openmp: Fix ICE on requires clause with atomic_default_mem_order (

2021-08-19 Thread Jakub Jelinek via Gcc-patches

Hi!

When working on error directive, I've noticed the C FE ICEs on
  #pragma omp requires atomic_default_mem_order (
where it tries to peek 2nd token after the CPP_PRAGMA_EOL (or CPP_EOF)
in there in order to improve error-recovery on say
atomic_default_mem_order (acquire)
or
atomic_default_mem_order (seqcst)
etc.  The C++ FE didn't ICE, but it is better to follow the same thing there.
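
The shape of the fix can be sketched with a toy token stream.  Everything
below is hypothetical (the toy_ names, the token kinds and the overrun
counter are assumptions for illustration and are not the GCC parser API); the
counter merely stands in for the ICE that results from looking past the end
of the pragma line.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds, loosely modelled on the cpp token types.  */
enum toy_tok { TOY_NAME, TOY_OPEN_PAREN, TOY_CLOSE_PAREN, TOY_PRAGMA_EOL };

struct toy_parser
{
  const enum toy_tok *toks;
  size_t ntoks;
  size_t pos;      /* Index of the current token.  */
  int overruns;    /* Number of attempts to look past the last token.  */
};

/* Return the Nth token counted from the current one (N == 1 is current).  */
enum toy_tok
toy_peek_nth (struct toy_parser *p, size_t n)
{
  size_t i = p->pos + n - 1;
  if (i >= p->ntoks)
    {
      p->overruns++;
      return TOY_PRAGMA_EOL;
    }
  return p->toks[i];
}

void
toy_consume (struct toy_parser *p)
{
  if (p->pos < p->ntoks)
    p->pos++;
}

/* Old behaviour: always look at the 2nd token.  */
void
toy_recover_unguarded (struct toy_parser *p)
{
  if (toy_peek_nth (p, 2) == TOY_CLOSE_PAREN)
    toy_consume (p);
}

/* New behaviour: only look ahead if the current token does not already
   end the clause or the pragma line.  */
void
toy_recover_guarded (struct toy_parser *p)
{
  switch (toy_peek_nth (p, 1))
    {
    case TOY_PRAGMA_EOL:
    case TOY_CLOSE_PAREN:
      break;
    default:
      if (toy_peek_nth (p, 2) == TOY_CLOSE_PAREN)
        toy_consume (p);
      break;
    }
}
```

With the current token at the end of the line, the unguarded variant looks
past the last token, while the guarded one leaves the position untouched and
still skips a bogus argument such as acquire when a ")" follows it.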

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2021-08-19  Jakub Jelinek  

gcc/c/
* c-parser.c (c_parser_omp_requires): Don't call
c_parser_peek_2nd_token and optionally consume token if current
token is CPP_EOF, CPP_PRAGMA_EOL or CPP_CLOSE_PAREN.
gcc/cp/
* parser.c (cp_parser_omp_requires): Don't call cp_lexer_nth_token_is
and optionally consume token if current token is CPP_EOF,
CPP_PRAGMA_EOL or CPP_CLOSE_PAREN.
gcc/testsuite/
* c-c++-common/gomp/requires-3.c: Add testcase for
atomic_default_mem_order ( at the end of line without corresponding ).

--- gcc/c/c-parser.c.jj 2021-08-18 11:10:34.922869013 +0200
+++ gcc/c/c-parser.c2021-08-18 18:51:19.692144764 +0200
@@ -21710,9 +21714,18 @@ c_parser_omp_requires (c_parser *parser)
  error_at (c_parser_peek_token (parser)->location,
            "expected %<seq_cst%>, %<relaxed%> or "
            "%<acq_rel%>");
- if (c_parser_peek_2nd_token (parser)->type
- == CPP_CLOSE_PAREN)
-   c_parser_consume_token (parser);
+ switch (c_parser_peek_token (parser)->type)
+   {
+   case CPP_EOF:
+   case CPP_PRAGMA_EOL:
+   case CPP_CLOSE_PAREN:
+ break;
+   default:
+ if (c_parser_peek_2nd_token (parser)->type
+ == CPP_CLOSE_PAREN)
+   c_parser_consume_token (parser);
+ break;
+   }
}
  else
c_parser_consume_token (parser);
--- gcc/cp/parser.c.jj  2021-08-18 21:37:03.225488478 +0200
+++ gcc/cp/parser.c 2021-08-18 21:39:34.343353030 +0200
@@ -45479,9 +45479,18 @@ cp_parser_omp_requires (cp_parser *parse
  error_at (cp_lexer_peek_token (parser->lexer)->location,
            "expected %<seq_cst%>, %<relaxed%> or "
            "%<acq_rel%>");
- if (cp_lexer_nth_token_is (parser->lexer, 2,
-CPP_CLOSE_PAREN))
-   cp_lexer_consume_token (parser->lexer);
+ switch (cp_lexer_peek_token (parser->lexer)->type)
+   {
+   case CPP_EOF:
+   case CPP_PRAGMA_EOL:
+   case CPP_CLOSE_PAREN:
+ break;
+   default:
+ if (cp_lexer_nth_token_is (parser->lexer, 2,
+CPP_CLOSE_PAREN))
+   cp_lexer_consume_token (parser->lexer);
+ break;
+   }
}
  else
cp_lexer_consume_token (parser->lexer);
--- gcc/testsuite/c-c++-common/gomp/requires-3.c.jj 2020-01-12 11:54:37.016404311 +0100
+++ gcc/testsuite/c-c++-common/gomp/requires-3.c    2021-08-18 18:44:27.700851769 +0200
@@ -1,3 +1,5 @@
 #pragma omp requires atomic_default_mem_order(acquire) /* { dg-error "expected 'seq_cst', 'relaxed' or 'acq_rel'" } */
 #pragma omp requires atomic_default_mem_order(release) /* { dg-error "expected 'seq_cst', 'relaxed' or 'acq_rel'" } */
 #pragma omp requires atomic_default_mem_order(foobar)  /* { dg-error "expected 'seq_cst', 'relaxed' or 'acq_rel'" } */
+#pragma omp requires atomic_default_mem_order (/* { dg-error "expected 'seq_cst', 'relaxed' or 'acq_rel'" } */
+/* { dg-error "expected '\\\)' before end of line" "" { target *-*-* } .-1 } */


Jakub

[committed] openmp: For C++ ensure nothing directive has no operands

2021-08-19 Thread Jakub Jelinek via Gcc-patches

Hi!

When working on the error directive, I've noticed that while the C FE
diagnoses clauses on the nothing directive, which doesn't allow any, the
C++ FE silently accepted them.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
committed to trunk.

2021-08-19  Jakub Jelinek  

* parser.c (cp_parser_omp_nothing): Use cp_parser_require_pragma_eol
instead of cp_parser_skip_to_pragma_eol.

* c-c++-common/gomp/nothing-2.c: New test.

--- gcc/cp/parser.c.jj  2021-08-18 11:10:34.926868957 +0200
+++ gcc/cp/parser.c 2021-08-18 18:53:23.894424288 +0200
@@ -45570,7 +45570,7 @@ cp_parser_omp_requires (cp_parser *parse
 static void
 cp_parser_omp_nothing (cp_parser *parser, cp_token *pragma_tok)
 {
-  cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+  cp_parser_require_pragma_eol (parser, pragma_tok);
 }
 
 
--- gcc/testsuite/c-c++-common/gomp/nothing-2.c.jj  2021-08-18 18:35:22.068409297 +0200
+++ gcc/testsuite/c-c++-common/gomp/nothing-2.c 2021-08-18 18:35:54.499960209 +0200
@@ -0,0 +1,2 @@
+#pragma omp nothing ,  /* { dg-error "expected end of line before" } */
+#pragma omp nothing asdf   /* { dg-error "expected end of line before" } */

Jakub

Re: [PATCH] JIT, testsuite, Darwin: Initial testsuite fixes.

2021-08-19 Thread Iain Sandoe

Hi David,

> On 18 Aug 2021, at 20:54, David Malcolm via Gcc-patches 
>  wrote:
> 
> On Wed, 2021-08-18 at 20:40 +0100, Iain Sandoe wrote:
>> Hi,
>> 
>> * Note, the strategy in jit.exp has the assumption that $target==$host
>>   the patches here adhere to that - there is far less testsuite library
>>   support for host-side facilities (which we’d probably want to add if
>>   it was desirable to operate the Jit in a cross-compiler environment).
> 
> Various people have expressed wanting to use libgccjit for ahead-of-
> time cross-compilation, so that's a use-case we're going to want to
> support at some point (e.g. the libgccjit-based rustc backend).

ack.
That will be interesting to arrange; e.g. there will have to be some way to
find N host-side libraries that correspond to N cross-toolchains + 1 host-
side library that generates code for the host as a target.  That follows from
the fact that we can only have one backend per library - although a standard
toolchain layout can accommodate multiple cross-toolchains + the native.

Probably, there’s a fair amount of test-suite library work to do to make it
possible to query capability information about the host (similar to what
we currently query for targets).

Finally, I guess, some way of offloading the built objects to the cross-target
so that the execute portion can be done.

>> ———
[snip]

>> diff --git a/gcc/jit/docs/examples/tut04-toyvm/toyvm.cc
>> b/gcc/jit/docs/examples/tut04-toyvm/toyvm.cc
>> index 4b9c7651ee3..7e9550159ad 100644
>> --- a/gcc/jit/docs/examples/tut04-toyvm/toyvm.cc
>> +++ b/gcc/jit/docs/examples/tut04-toyvm/toyvm.cc
>> @@ -24,7 +24,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include 
>>  #include 
>>  
>> -#include <dejagnu.h>
>> +#include "jit-dejagnu.h"
>>  
>>  #include 
>>  
> 
> There's a Makefile in gcc/jit/docs/examples/tut04-toyvm which can be
> used to build these, so do the
>  #include "jit-dejagnu.h"
> will need to be adjusted to give a path that finds the new header?

> That said, the Makefile seems to assume pkg-config, so it's not working
> particularly well as-is, so maybe there's no need to fix this.

I don’t think that the current Makefile has any provision for finding dejagnu.h;
it will happen to work if that header is in one of the default include search
paths - but not otherwise.

Given that the Makefile seems to be designed for building the examples in-
source, I could make a patch that adds -I ../../.. to it to include the jit 
root?

Or we could symlink it - although presumably that would not work for hosts
that do not support symlinks.

> test-threads.c does some preprocessor hackery to make <dejagnu.h>
> threadsafe which it would be nice to fix, but that feels like followup
> work and not needed for this patch.

thanks, I’ve applied it as-is - but happy to make one of the two changes above
to the tut04 Makefile.

thanks,
Iain

[PATCH] expand: Add new clrsb fallback expansion [PR101950]

2021-08-19 Thread Jakub Jelinek via Gcc-patches

Hi!

As suggested in the PR, the following patch adds two new clrsb
expansion possibilities for the case where the target has no clrsb_optab
for the requested mode or any wider mode, but does have clz_optab for the
requested mode.
One expansion is
clrsb (op0)
expands as
clz (op0 ^ (((stype)op0) >> (prec-1))) - 1
which is usable if CLZ_DEFINED_VALUE_AT_ZERO is 2 with value
of prec, because the clz argument can be 0 and clrsb should give
prec-1 in that case.
The other expansion is
clz (((op0 << 1) ^ (((stype)op0) >> (prec-1))) | 1)
where the clz argument is never 0, but it is one operation longer.
E.g. on x86_64-linux with -O2 -mno-lzcnt, this results for
int foo (int x) { return __builtin_clrsb (x); }
in
-   subq    $8, %rsp
-   movslq  %edi, %rdi
-   call    __clrsbdi2
-   addq    $8, %rsp
-   subl    $32, %eax
+   leal    (%rdi,%rdi), %eax
+   sarl    $31, %edi
+   xorl    %edi, %eax
+   orl     $1, %eax
+   bsrl    %eax, %eax
+   xorl    $31, %eax
and with -O2 -mlzcnt:
+   movl    %edi, %eax
+   sarl    $31, %eax
+   xorl    %edi, %eax
+   lzcntl  %eax, %eax
+   subl    $1, %eax
On armv7hl-linux-gnueabi with -O2:
-   push    {r4, lr}
-   bl      __clrsbsi2
-   pop     {r4, pc}
+   @ link register save eliminated.
+   eor     r0, r0, r0, asr #31
+   clz     r0, r0
+   sub     r0, r0, #1
+   bx      lr
As it (at least usually) will make code larger, it is
disabled for -Os or cold instructions.
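
The two expansions described above can be checked at the source level. What follows is only a sketch in plain C, not the RTL expansion from the patch: the helper names are made up for illustration, `clz_defined_at_zero` merely models a target where CLZ_DEFINED_VALUE_AT_ZERO is 2 with value prec, and the code assumes GCC-style arithmetic right shift of negative `int` values.

```c
#include <assert.h>
#include <limits.h>

/* Precision of int in bits (prec in the description above).  */
#define PREC ((int) (sizeof (int) * CHAR_BIT))

/* Hypothetical helper: models a clz that is defined at zero and
   yields PREC there.  __builtin_clz itself is undefined for 0.  */
static int
clz_defined_at_zero (unsigned x)
{
  return x ? __builtin_clz (x) : PREC;
}

/* First form: clz (op0 ^ (op0 >> (prec-1))) - 1.
   The clz argument is 0 for op0 == 0 and op0 == -1.  */
static int
clrsb_via_clz_minus_1 (int x)
{
  /* Assumes >> on a negative int is an arithmetic shift, as in GCC.  */
  unsigned sign = (unsigned) (x >> (PREC - 1));
  return clz_defined_at_zero ((unsigned) x ^ sign) - 1;
}

/* Second form: clz (((op0 << 1) ^ (op0 >> (prec-1))) | 1).
   The clz argument is never 0, at the cost of one more operation.  */
static int
clrsb_via_clz_or_1 (int x)
{
  unsigned sign = (unsigned) (x >> (PREC - 1));
  return __builtin_clz ((((unsigned) x << 1) ^ sign) | 1u);
}

/* Returns nonzero if both forms agree with __builtin_clrsb for X.  */
int
clrsb_forms_agree (int x)
{
  int expected = __builtin_clrsb (x);
  return clrsb_via_clz_minus_1 (x) == expected
         && clrsb_via_clz_or_1 (x) == expected;
}

/* Spot-check the boundary values mentioned in the description.  */
void
check_clrsb_samples (void)
{
  assert (clrsb_forms_agree (0));
  assert (clrsb_forms_agree (-1));
  assert (clrsb_forms_agree (INT_MIN));
  assert (clrsb_forms_agree (INT_MAX));
}
```

Calling `check_clrsb_samples ()` exercises the cases where the clz argument of the first form becomes zero (0 and -1), which is exactly why that form needs a clz defined at zero.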

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-08-19  Jakub Jelinek  

PR middle-end/101950
* optabs.c (expand_clrsb_using_clz): New function.
(expand_unop): Use it as another clrsb expansion fallback.

* gcc.target/i386/pr101950-1.c: New test.
* gcc.target/i386/pr101950-2.c: New test.

--- gcc/optabs.c.jj 2021-07-15 10:16:13.027581160 +0200
+++ gcc/optabs.c    2021-08-18 13:36:56.410818265 +0200
@@ -2600,6 +2600,82 @@ widen_leading (scalar_int_mode mode, rtx
   return 0;
 }
 
+/* Attempt to emit (clrsb:mode op0) as
+   (plus:mode (clz:mode (xor:mode op0 (ashr:mode op0 (const_int prec-1))))
+              (const_int -1))
+   if CLZ_DEFINED_VALUE_AT_ZERO (mode, val) is 2 and val is prec,
+   or as
+   (clz:mode (ior:mode (xor:mode (ashl:mode op0 (const_int 1))
+                                 (ashr:mode op0 (const_int prec-1)))
+                       (const_int 1)))
+   otherwise.  */
+
+static rtx
+expand_clrsb_using_clz (scalar_int_mode mode, rtx op0, rtx target)
+{
+  if (optimize_insn_for_size_p ()
+      || optab_handler (clz_optab, mode) == CODE_FOR_nothing)
+    return NULL_RTX;
+
+  start_sequence ();
+  HOST_WIDE_INT val = 0;
+  if (CLZ_DEFINED_VALUE_AT_ZERO (mode, val) != 2
+      || val != GET_MODE_PRECISION (mode))
+    val = 0;
+  else
+    val = 1;
+
+  rtx temp2 = op0;
+  if (!val)
+    {
+      temp2 = expand_binop (mode, ashl_optab, op0, const1_rtx,
+                            NULL_RTX, 0, OPTAB_DIRECT);
+      if (!temp2)
+        {
+        fail:
+          end_sequence ();
+          return NULL_RTX;
+        }
+    }
+
+  rtx temp = expand_binop (mode, ashr_optab, op0,
+                           GEN_INT (GET_MODE_PRECISION (mode) - 1),
+                           NULL_RTX, 0, OPTAB_DIRECT);
+  if (!temp)
+    goto fail;
+
+  temp = expand_binop (mode, xor_optab, temp2, temp, NULL_RTX, 0,
+                       OPTAB_DIRECT);
+  if (!temp)
+    goto fail;
+
+  if (!val)
+    {
+      temp = expand_binop (mode, ior_optab, temp, const1_rtx,
+                           NULL_RTX, 0, OPTAB_DIRECT);
+      if (!temp)
+        goto fail;
+    }
+  temp = expand_unop_direct (mode, clz_optab, temp, val ? NULL_RTX : target,
+                             true);
+  if (!temp)
+    goto fail;
+  if (val)
+    {
+      temp = expand_binop (mode, add_optab, temp, constm1_rtx,
+                           target, 0, OPTAB_DIRECT);
+      if (!temp)
+        goto fail;
+    }
+
+  rtx_insn *seq = get_insns ();
+  end_sequence ();
+
+  add_equal_note (seq, temp, CLRSB, op0, NULL_RTX, mode);
+  emit_insn (seq);
+  return temp;
+}
+
 /* Try calculating clz of a double-word quantity as two clz's of word-sized
    quantities, choosing which based on whether the high word is nonzero.  */
 static rtx
@@ -3171,6 +3247,9 @@ expand_unop (machine_mode mode, optab un
           temp = widen_leading (int_mode, op0, target, unoptab);
           if (temp)
             return temp;
+          temp = expand_clrsb_using_clz (int_mode, op0, target);
+          if (temp)
+            return temp;
         }
       goto try_libcall;
     }
--- gcc/testsuite/gcc.target/i386/pr101950-1.c.jj   2021-08-18 13:58:05.363093681 +0200
+++ gcc/testsuite/gcc.target/i386/pr101950-1.c  2021-08-18 14:01:22.905335834 +0200
@@ -0,0 +1,20 @@
+/* PR middle-end/101950 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-lzcnt" } */
+/* { dg-final { scan-assembler-not "call\[^\n\r]*__clrsb.i2" } } */
+/* { dg-final {

Re: [PATCH] PR fortran/100950 - ICE in output_constructor_regular_field, at varasm.c:5514

2021-08-19 Thread Tobias Burnus


Hi Harald,

On 18.08.21 23:01, Harald Anlauf wrote:

From: "Tobias Burnus"

Note, however, that gfc_simplify_len still will handle neither
deferred strings nor their substrings.

Obviously, nonsubstrings cannot be simplified but I do not
see why  len(str(1:2))  cannot or should not be simplified.

well, here's an example that Intel rejects:
...
  character(:), allocatable :: str
   end type u
   type(u) :: q
...
   integer, parameter :: k3 = len (q% str (3:4)) ! Rejected by Intel

pr100950-ww.f90(7): error #6814: When using this inquiry function, the length 
of this object cannot be evaluated to a constant.   [LEN]


I think the question is really how to interpret "10.1.12 Constant expression"

"(4) a specification inquiry where each designator or argument is
   ...
 (b) a variable whose properties inquired about are not
(i) assumed,
(ii) deferred, or
(iii) defined by an expression that is not a constant expression,"

And as the substring bounds are constant expressions,
one can argue that (4)(b) is fulfilled as (i)–(iii) do not apply.

I am inclined to say that the Intel compiler has a bug by not
accepting it – but as written before, I regard sub-string length
(esp. with const expr) inquiries as an odd corner case which
is unlikely to occur in real-world code.


However, there is no reason why the user cannot do [...]

Maybe you can enlighten me here.  [...]

I can't as I did not understand your question. However ...

But, IMHO, the latter remark does _not_ imply that we
shall/must/have to accept code like:

if (allocated(str)) then
block
   integer, parameter :: n = len(str(:5))
end block
endif

So shall we not simplify here (and thus reject it)?
This is important!  Or silently simplify and accept it?


I tried to draw the line between simplification (to generate better code)
and 'constant expression' handling (accept where permitted, diagnose
non-standard-conforming code). However, nearly (but not quite) always,
if something can be simplified to a constant, the standard also regards
it as a constant expression.

I think that, for the purpose of the examples in this email thread,
we do not need to distinguish the two, and can always simplify
deferred-length substrings where the substring bounds are constant
expressions (or the lower bound is absent and, hence, 1).
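
The arithmetic behind that rule is small enough to sketch. The following C model is purely illustrative and is not gfortran code: `substring_len` is a hypothetical helper, and it only captures the point that the length of str(lo:hi) depends on the bounds alone, not on the (deferred) length of str itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not gfortran's gfc_simplify_len: length of the
   substring str(lo:hi) when the bounds are compile-time constants.
   HAS_LO is false when the lower bound is absent, which means 1.
   The declared length of str never enters the computation.  */
static long
substring_len (bool has_lo, long lo, long hi)
{
  long start = has_lo ? lo : 1;
  long len = hi - start + 1;
  return len > 0 ? len : 0;   /* hi < start gives a zero-length string */
}

/* The cases that come up in this thread.  */
void
check_substring_len_examples (void)
{
  assert (substring_len (true, 3, 4) == 2);    /* str(3:4) */
  assert (substring_len (true, 4, 5) == 2);    /* str_array(:)(4:5) */
  assert (substring_len (false, 0, 5) == 5);   /* str(:5) */
}
```

The array-section case str_array(:)(4:5) gives the same result as the scalar case because the substring range applies to every element alike.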


With the caveat from above that len() is rather special,
there is no real reason why:  str_array(:)(4:5)  cannot be handled.
(→ len = 2).

Good point.  This is fixed in the revised patch and tested for.


Still does not work – or rather: ...%t(:)(3:4) [i.e. substring with array
section] and ...%str(3:4) [i.e. substring of deferred-length scalar] both do
work, but if one combines the two (→ ...%str2(:)(3:4), i.e. substring of
deferred-length array section), it does not:

Array ‘r’ at (1) is a variable, which does not reduce to a constant expression

for:

--- a/gcc/testsuite/gfortran.dg/pr100950.f90
+++ b/gcc/testsuite/gfortran.dg/pr100950.f90
@@ -15,2 +15,3 @@ program p
  character(len=:), allocatable :: str
+ character(len=:), allocatable :: str2(:)
   end type t_
@@ -24,2 +25,4 @@ program p
   integer,  parameter :: l6 = len (r(1)%str (3:4))
+  integer,  parameter :: l7 = len (r(1)%str2(1)(3:4))
+  integer,  parameter :: l8 = len (r(1)%str2(:)(3:4))


which feels odd.


The updated patch regtests fine.  OK?

Looks good to me except for the caveats.

Regtested again.

[...]

Well, there's already
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101735


I have added the example to the PR.


For deferred length, I have no strong opinion; [...]

Actually, this is now an important point.  If we really want
to allow to handle substrings of deferred length strings
in constant expressions, the new patch would be fine,

I think handling len=: substrings is fine.

In principle, LGTM – except I wonder what we do about the
len(r(1)%str(1)(3:4));
I think we really do handle most code available and I would like to
close this topic – but still it feels a bit odd to leave this bit out.

I was also wondering whether we should check that the
compile-time simplification works – i.e. use -fdump-tree-original for this;
I attached a patch for this.

Thanks,

Tobias

-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
München; limited liability company; Managing Directors: Thomas
Heurung, Frank Thürauf; Registered office: München; Commercial register:
München, HRB 106955
diff --git a/gcc/testsuite/gfortran.dg/pr100950.f90 b/gcc/testsuite/gfortran.dg/pr100950.f90
index 7de589fe882..b9dcef0a7af 100644
--- a/gcc/testsuite/gfortran.dg/pr100950.f90
+++ b/gcc/testsuite/gfortran.dg/pr100950.f90
@@ -1,0 +2 @@
+! { dg-additional-options "-fdump-tree-original" }
@@ -15,0 +17 @@ program p
+ character(len=:), allocatable :: str2(:)
@@ -24,0 +27,2 @@ program p
+!  integer,  parameter :: l7 = len (r(1)%str2(1)(3:4))
+!  integer,  parameter :: l8 = len (r(1)%str2(:)(3:4))
@@

Re: [committed] Introduce selftest::locate_file (v5)

2021-08-19 Thread Thomas Schwinge

Hi!

On 2021-08-18T16:56:18-0700, "H.J. Lu"  wrote:
> On Tue, Aug 17, 2021 at 12:01 AM Thomas Schwinge
>  wrote:
>> On 2016-12-14T21:31:05-0500, David Malcolm  wrote:
>> > On Wed, 2016-12-14 at 15:02 +0100, Bernd Schmidt wrote:
>> >> On 12/09/2016 08:32 PM, David Malcolm wrote:
>> >> > Thanks.  Unfortunately, applying the "locate_file" patch
>> >> >   https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01186.html
>> >> > would now introduce a regression in a recently-added test case:
>> >>
>> >> > The problem is that this DejaGnu test case uses -fself-test, and
>> >> > doesn't provide any arguments.  With the locate_file patch, we need to
>> >> > pass the path to $(srcdir)/testsuite/selftests as an argument to -fself
>> >> > -test, and it's not clear to me how to do that sanely in a DejaGnu test
>> >> > case
>>
>> Rather simple, actually -- once you realize how all this works.  ;-)
>>
>> >> > if I pass in a dummy value (like for pr71591.c), then the
>> >> > selftests that use locate_file fail.
>>
>> > I've committed the following updated version to trunk (as r243681).
>> >
>> > Changed in v5:
>> > * disable DejaGnu test for PR 78213
>> >
>> > Successfully bootstrapped on x86_64-pc-linux-gnu (with 2 PASS
>> > results converted to 1 UNSUPPORTED in gcc.sum, re gcc.dg/pr78213.c).
>>
>> > --- a/gcc/testsuite/gcc.dg/pr78213.c
>> > +++ b/gcc/testsuite/gcc.dg/pr78213.c
>> > @@ -1,6 +1,13 @@
>> >  /* { dg-do compile } */
>> >  /* { dg-options "-fself-test" } */
>> >
>> > +/* When this test was written -fself-test took no argument, but it
>> > +   has subsequently gained a mandatory argument, giving the path
>> > +   to selftest support files (within the srcdir).
>> > +   It's not clear how to provide this path sanely from
>> > +   within DejaGnu, so for now, this test is disabled.  */
>> > +/* { dg-skip-if "" { *-*-* } } */
>> > +
>> >  /* Verify that -fself-test does not fail on a non empty source.  */
>> >
>> >  int i;
>> >  void bar();
>> >  void foo()
>>
>> OK to push the attached "Restore 'gcc.dg/pr78213.c' testing" to master
>> branch?
>>
>> See 'git grep --cached 'dg-.*options .*\$' -- */testsuite/' for
>> pre-existing '$srcdir' usage in DejaGnu directives.
>
> This caused:
>
> cc1: note: self-tests are not enabled in this build
> FAIL: gcc.dg/pr78213.c -fself-test (test for warnings, line )

Sorry for that.

> on release branches.

Specifically: in '--enable-checking=release' etc. configurations.
(I had done my testing with checking enabled...)

This is, in fact, a problem in the original
r242748 (commit 3615816da830d41f67a5d8955ae588eba7f0b6fb)
"[PR target/78213] Do not ICE on non-empty -fself-test", as made
apparent by recent commit a42467bdb70650cd2f421e67b6c3418f74feaec2
"Restore 'gcc.dg/pr78213.c' testing", after the test case had gotten
disabled in r243681 (commit ecfc21ff34ddc6f8aa517251fb51494c68ff741f)
"Introduce selftest::locate_file" shortly after its original introduction.
(This can be seen in a few  reports from
back then.)

I've pushed "Fix up 'gcc.dg/pr78213.c' for '--enable-checking=release' etc."
to master branch in commit b7fc42073c04813f6b63e0641d3f6765424857c9,
cherry-picked into releases/gcc-11 branch in
commit 5fb588a677bf34dc864c577ed848405752905b89, releases/gcc-10 branch
in commit ee7502e5fec1a1c0215febfd486a0df9ffaf5692, and releases/gcc-9
branch in commit fc1993af02a3076e91c24f372be1883517453095, see attached.


Regards,
 Thomas


From b7fc42073c04813f6b63e0641d3f6765424857c9 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 19 Aug 2021 08:25:47 +0200
Subject: [PATCH] Fix up 'gcc.dg/pr78213.c' for '--enable-checking=release'
 etc.

Fix up for r242748 (commit 3615816da830d41f67a5d8955ae588eba7f0b6fb)
"[PR target/78213] Do not ICE on non-empty -fself-test", as made
apparent by recent commit a42467bdb70650cd2f421e67b6c3418f74feaec2
"Restore 'gcc.dg/pr78213.c' testing", after the test case had gotten
disabled in r243681 (commit ecfc21ff34ddc6f8aa517251fb51494c68ff741f)
"Introduce selftest::locate_file" shortly after its original introduction.

	gcc/testsuite/
	PR testsuite/101969
	* gcc.dg/pr78213.c: Fix up for '--enable-checking=release' etc.
---
 gcc/testsuite/gcc.dg/pr78213.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/pr78213.c b/gcc/testsuite/gcc.dg/pr78213.c
index 40dd3c82b60..04bf0381f76 100644
--- a/gcc/testsuite/gcc.dg/pr78213.c
+++ b/gcc/testsuite/gcc.dg/pr78213.c
@@ -8,4 +8,5 @@ int i;
   while (i--)
 bar();
 }
-/* { dg-message "fself\-test: " "-fself-test" { target *-*-* } 0 } */
+
+/* { dg-regexp {^-fself-test: [0-9]+

Re: [PATCH] c++, v3: Implement P0466R5 __cpp_lib_is_layout_compatible compiler helpers [PR101539]

2021-08-19 Thread Christophe Lyon via Gcc-patches

Hi Jakub,


On Tue, Aug 17, 2021 at 5:35 PM Jason Merrill via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> On 8/17/21 10:55 AM, Jakub Jelinek wrote:
> > On Tue, Aug 17, 2021 at 07:10:28AM -0700, Jason Merrill wrote:
> >> Looks good, thanks.  I think you didn't see that I also asked for some added
> >> comments; OK with those added.
> >
> > Oops, I've indeed missed them, sorry.
> >
> > On Mon, Aug 16, 2021 at 03:57:21PM -0400, Jason Merrill wrote:
> >> Add a comment that discussion in core suggests that we might move toward
> >> treating multiple union fields of the same type as the same field, so this
> >> constraint might get dropped in the future.
> >
> > Just same type fields, or even any fields with layout compatible types?
> > Anyway, either of that would require further changes in the code.
>
> Just same type.
>
> > So that I don't repost the whole large patch, here is just incremental
> > diff with the added comments:
>
> Looks good, thanks.
>
>
This patch (r12-2975) is causing regressions on arm and aarch64:

g++:g++.target/aarch64/aarch64.exp=g++.target/aarch64/no_unique_address_1.C
check-function-bodies _Z8caller_pR1P

g++:g++.target/aarch64/aarch64.exp=g++.target/aarch64/no_unique_address_2.C
 (test for warnings, line 169)

g++:g++.target/aarch64/aarch64.exp=g++.target/aarch64/no_unique_address_2.C
check-function-bodies _Z8caller_pR1P

g++:g++.target/arm/arm.exp=g++.target/arm/no_unique_address_1.C
check-function-bodies _Z8caller_pR1P

g++:g++.target/arm/arm.exp=g++.target/arm/no_unique_address_2.C
 (test for warnings, line 163)

g++:g++.target/arm/arm.exp=g++.target/arm/no_unique_address_2.C
check-function-bodies _Z8caller_pR1P


Christophe

> > --- gcc/cp/semantics.c    2021-08-17 11:36:44.024227609 +0200
> > +++ gcc/cp/semantics.c    2021-08-17 16:41:57.070923754 +0200
> > @@ -10923,6 +10923,16 @@
> >  basetype2, membertype2, arg2);
> > if (TREE_TYPE (ret) == boolean_type_node)
> >   return ret;
> > +  /* If both arg1 and arg2 are INTEGER_CSTs, is_corresponding_member_aggr
> > + already returns boolean_{true,false}_node whether those particular
> > + members are corresponding members or not.  Otherwise, if only
> > + one of them is INTEGER_CST (canonicalized to first being INTEGER_CST
> > + above), it returns boolean_false_node if it is certainly not a
> > + corresponding member and otherwise we need to do a runtime check that
> > + those two OFFSET_TYPE offsets are equal.
> > + If neither of the operands is INTEGER_CST, is_corresponding_member_aggr
> > + returns the largest offset at which the members would be corresponding
> > + members, so perform arg1 <= ret && arg1 == arg2 runtime check.  */
> > gcc_assert (TREE_CODE (arg2) != INTEGER_CST);
> > if (TREE_CODE (arg1) == INTEGER_CST)
> >   return fold_build2 (EQ_EXPR, boolean_type_node, arg1,
> > --- gcc/cp/typeck.c   2021-08-17 11:18:53.271850970 +0200
> > +++ gcc/cp/typeck.c   2021-08-17 16:48:56.165115017 +0200
> > @@ -1727,6 +1727,15 @@
> > field2 = DECL_CHAIN (field2);
> >   }
> >   }
> > +  /* Otherwise both types must be union types.
> > +  The standard says:
> > +  "Two standard-layout unions are layout-compatible if they have
> > +  the same number of non-static data members and corresponding
> > +  non-static data members (in any order) have layout-compatible
> > +  types."
> > +  but the code anticipates that bitfield vs. non-bitfield,
> > +  different bitfield widths or presence/absence of
> > +  [[no_unique_address]] should be checked as well.  */
> > auto_vec vec;
> > unsigned int count = 0;
> > for (; field1; field1 = DECL_CHAIN (field1))
> > @@ -1735,6 +1744,9 @@
> > for (; field2; field2 = DECL_CHAIN (field2))
> >   if (TREE_CODE (field2) == FIELD_DECL)
> > vec.safe_push (field2);
> > +  /* Discussions on core lean towards treating multiple union fields
> > +  of the same type as the same field, so this might need changing
> > +  in the future.  */
> > if (count != vec.length ())
> >   return false;
> > for (field1 = TYPE_FIELDS (type1); field1; field1 = DECL_CHAIN (field1))
> >
> >   Jakub
> >
>
>

Re: [PATCH] x86-64: Remove HAVE_LD_PIE_COPYRELOC

2021-08-19 Thread Fāng-ruì Sòng via Gcc-patches

PING^3 https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html

On Fri, Jun 4, 2021 at 3:04 PM Fāng-ruì Sòng  wrote:
>
> PING^2 https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html
>
> On Mon, May 24, 2021 at 9:43 AM Fāng-ruì Sòng  wrote:
> >
> > Ping https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html
> >
> > On Tue, May 11, 2021 at 8:29 PM Fangrui Song  wrote:
> > >
> > > > This was introduced in 2014-12 to use local binding for external symbols
> > > > for -fPIE. Now that we have had H.J. Lu's GOTPCRELX for years, which mostly
> > > > nullifies the benefit of HAVE_LD_PIE_COPYRELOC, HAVE_LD_PIE_COPYRELOC
> > > > should be retired.
> > >
> > > One design goal of -fPIE was to avoid copy relocations.
> > > HAVE_LD_PIE_COPYRELOC has deviated from the goal.  With this change, the
> > > -fPIE behavior of x86-64 will be closer to x86-32 and other targets.
> > >
> > > ---
> > >
> > > See https://gcc.gnu.org/legacy-ml/gcc/2019-05/msg00215.html for a list
> > > of fixed and unfixed (e.g. gold incompatibility with protected
> > > https://sourceware.org/bugzilla/show_bug.cgi?id=19823) issues.
> > >
> > > If you prefer a longer write-up, see
> > > https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected
> > > ---
> > >  gcc/config.in |  6 ---
> > >  gcc/config/i386/i386.c| 11 +---
> > >  gcc/configure | 52 ---
> > >  gcc/configure.ac  | 48 -
> > >  gcc/doc/sourcebuild.texi  |  3 --
> > >  .../gcc.target/i386/pie-copyrelocs-1.c| 14 -
> > >  .../gcc.target/i386/pie-copyrelocs-2.c| 14 -
> > >  .../gcc.target/i386/pie-copyrelocs-3.c| 14 -
> > >  .../gcc.target/i386/pie-copyrelocs-4.c| 17 --
> > >  gcc/testsuite/lib/target-supports.exp | 47 -
> > >  10 files changed, 2 insertions(+), 224 deletions(-)
> > >  delete mode 100644 gcc/testsuite/gcc.target/i386/pie-copyrelocs-1.c
> > >  delete mode 100644 gcc/testsuite/gcc.target/i386/pie-copyrelocs-2.c
> > >  delete mode 100644 gcc/testsuite/gcc.target/i386/pie-copyrelocs-3.c
> > >  delete mode 100644 gcc/testsuite/gcc.target/i386/pie-copyrelocs-4.c
> > >
> > > diff --git a/gcc/config.in b/gcc/config.in
> > > index e54f59ce0c3..a65bf5d4176 100644
> > > --- a/gcc/config.in
> > > +++ b/gcc/config.in
> > > @@ -1659,12 +1659,6 @@
> > >  #endif
> > >
> > >
> > > -/* Define 0/1 if your linker supports -pie option with copy reloc. */
> > > -#ifndef USED_FOR_TARGET
> > > -#undef HAVE_LD_PIE_COPYRELOC
> > > -#endif
> > > -
> > > -
> > > >  /* Define if your PowerPC linker has .gnu.attributes long double support. */
> > >  #ifndef USED_FOR_TARGET
> > >  #undef HAVE_LD_PPC_GNU_ATTR_LONG_DOUBLE
> > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > index 915f89f571a..5ec3c6fd0c9 100644
> > > --- a/gcc/config/i386/i386.c
> > > +++ b/gcc/config/i386/i386.c
> > > @@ -10579,11 +10579,7 @@ legitimate_pic_address_disp_p (rtx disp)
> > > return true;
> > > }
> > >   else if (!SYMBOL_REF_FAR_ADDR_P (op0)
> > > -  && (SYMBOL_REF_LOCAL_P (op0)
> > > -  || (HAVE_LD_PIE_COPYRELOC
> > > -  && flag_pie
> > > -  && !SYMBOL_REF_WEAK (op0)
> > > -  && !SYMBOL_REF_FUNCTION_P (op0)))
> > > +  && SYMBOL_REF_LOCAL_P (op0)
> > >&& ix86_cmodel != CM_LARGE_PIC)
> > > return true;
> > >   break;
> > > @@ -22892,10 +22888,7 @@ ix86_atomic_assign_expand_fenv (tree *hold, tree 
> > > *clear, tree *update)
> > >  static bool
> > >  ix86_binds_local_p (const_tree exp)
> > >  {
> > > -  return default_binds_local_p_3 (exp, flag_shlib != 0, true, true,
> > > - (!flag_pic
> > > -  || (TARGET_64BIT
> > > -  && HAVE_LD_PIE_COPYRELOC != 0)));
> > > > +  return default_binds_local_p_3 (exp, flag_shlib != 0, true, true, !flag_pic);
> > >  }
> > >  #endif
> > >
> > > diff --git a/gcc/configure b/gcc/configure
> > > index f03fe888384..c500f5ca11e 100755
> > > --- a/gcc/configure
> > > +++ b/gcc/configure
> > > @@ -29968,58 +29968,6 @@ fi
> > >  { $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_ld_pie" >&5
> > >  $as_echo "$gcc_cv_ld_pie" >&6; }
> > >
> > > > -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking linker PIE support with copy reloc" >&5
> > > -$as_echo_n "checking linker PIE support with copy reloc... " >&6; }
> > > -gcc_cv_ld_pie_copyreloc=no
> > > -if test $gcc_cv_ld_pie = yes ; then
> > > -  if test $in_tree_ld = yes ; then
> > > > -if test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" -ge 25 -o "$gcc_cv_gld_major_version" -gt 2; then
> > > -
