Re: [PATCH] Simplify testing symbol sections

2020-11-13 Thread Jeff Law via Gcc-patches


On 11/20/19 11:08 PM, Strager Neds wrote:
> While fixing some bugs in __attribute__((section)), I found it difficult
> to write tests. Make testing easier: introduce the
> scan-assembler-symbol-section and scan-symbol-section helpers. See
> in-line documentation for details.
>
> Testing:
>
> * Run `make check` on x86_64-linux-gnu with --disable-multilib
>   --enable-checking=release --enable-languages=c,c++. Observe no new
>   failures in test results.
> * Run `make check` on macOS x86_64-apple-darwin16.7.0 with
>   --disable-multilib --enable-checking=release --enable-languages=c,c++.
>   Observe no new failures in test results.
> * Run test-framework.exp with CHECK_TEST_FRAMEWORK=1, and post-process
>   results with test-framework.awk. Observe no new failures in test
>   results.
>
> 2019-11-12  Matthew Glazar 
>
> * gcc/testsuite/lib/scanasm.exp (dg-scan): Extract file globbing
> code ...
> (dg_glob_remote): ... into this new procedure.
> (scan-assembler-symbol-section): Define.
> (scan-symbol-section): Define.

[ ... ]

So this was marginally painful due to the mangling from your mailer. 
But after a fair amount of hand-editing the patch file I got it to
apply.  I actually botched that ever-so-slightly resulting in pr25376
not being updated properly which the testsuite naturally complained
about -- good ;-)  The darwin test has changed slightly since you
originally submitted this patch.  Hopefully I updated that properly as well.


I could then generate a fresh diff with proper formatting ;-)  Things
look generally OK.  A few caveats worth mentioning.


This may fail on targets that silently put certain objects into
nonstandard sections.  I don't think that's a problem with your patch,
but more a note that it's likely someone could write a test with this
new bit of framework and find that it fails on some targets.  I did
throw this into my tester just to see if any of the existing tests might
tickle this issue on one of the crosses, but it didn't flag anything
(well, there is one msp430 failure, but I'm pretty sure that's not your
change).


I worry a bit about the less common native targets -- aix, hpux and the
like.  But testing them is too painful to contemplate these days.  I'm
sure those with access to suitable hardware will chime in if something
is amiss.


I'm going to go ahead and push the patch to the trunk.  Thanks again for
your patience.


jeff




Re: [PATCH v2] PR target/97682 - Fix to reuse t1 register between call address and epilogue.

2020-11-13 Thread Jim Wilson
On Thu, Nov 12, 2020 at 10:56 PM Monk Chiang  wrote:

>   - When expanding the call pattern, choose the t1 register as the jump
> register.  The epilogue also uses t1 to adjust the stack pointer, so if
> both are generated in the same function, t1 is initialized twice.  The
> call pattern emits 'la t1,symbol' and 'jalr t1' instructions, and the
> epilogue emits 'li t1,4096' and 'add sp,sp,t1' instructions.  The li
> and add instructions can end up between the la and jalr instructions.
> Because t1 is then defined twice, the first definition looks redundant
> and some optimizations remove the la instruction.
>

Thanks.  Committed and pushed.

Jim


[Bug target/97682] Miscompiled tail call with -fPIC

2020-11-13 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97682

--- Comment #4 from CVS Commits  ---
The master branch has been updated by Jim Wilson :

https://gcc.gnu.org/g:207de83922bda8707aa33d6a2185e691116377e7

commit r11-5026-g207de83922bda8707aa33d6a2185e691116377e7
Author: Monk Chiang 
Date:   Fri Nov 13 19:35:11 2020 -0800

PR target/97682 - Fix to reuse t1 register between call address and epilogue.

  - When expanding the call pattern, choose the t1 register as the jump
register.  The epilogue also uses t1 to adjust the stack pointer, so if
both are generated in the same function, t1 is initialized twice.  The
call pattern emits 'la t1,symbol' and 'jalr t1' instructions, and the
epilogue emits 'li t1,4096' and 'add sp,sp,t1' instructions.  The li
and add instructions can end up between the la and jalr instructions.
Because t1 is then defined twice, the first definition looks redundant
and some optimizations remove the la instruction.

  - To resolve this issue, the prologue and epilogue use the t0 register
as their temporary register, and the call pattern uses the t1 register
as its temporary register.

gcc/
2020-11-13  Monk Chiang  

PR target/97682
* config/riscv/riscv.h (RISCV_PROLOGUE_TEMP_REGNUM): Change register
to t0.
(RISCV_CALL_ADDRESS_TEMP_REGNUM): New macro, define t1 register.
(RISCV_CALL_ADDRESS_TEMP): Use it for call instructions.
* config/riscv/riscv.c (riscv_legitimize_call_address): Use
RISCV_CALL_ADDRESS_TEMP.
(riscv_compute_frame_info): Change temporary register to t0 from t1.
(riscv_trampoline_init): Adjust comment.

gcc/testsuite/
2020-11-13  Monk Chiang  

PR target/97682
* g++.target/riscv/pr97682.C: New test.
* gcc.target/riscv/interrupt-3.c: Check register for t0.
* gcc.target/riscv/interrupt-4.c: Likewise.

Re: [PATCH] Add a new pattern in 4-insn combine

2020-11-13 Thread Jim Wilson
On Tue, Nov 10, 2020 at 4:18 PM Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:

>
> On 11/8/20 7:48 PM, HAO CHEN GUI via Gcc-patches wrote:
> > ChangeLog
> >
> >   * combine.c (combine_validate_cost): Add an argument for newi1pat.
> >   (try_combine): Add a 4-insn combine pattern for optimizing rtx
> >   sign_extend (op:zero_extend, zero_extend).
>
> It'd be nice to see motivating examples.  Depending on their structure,
> we may get better results cleaning things up with match.pd patterns.  We
> already have some which work in this space.
>

I don't have a use case for this specifically, but for the general case of
allowing a 4-insn combine to split into 3 I do have a RISC-V use case for
that in Philipp Tomsich's recent match.pd thread.
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/558765.html

Maybe instead of modifying combine to know about the
sign_extend(zero_extend zero_extend) case you could add a 4->3 splitter to
the rs6000.md file, and then where we call combine_split_insns modify the
code to accept 3 output insns when there were 4 input insns.  That would
fix this rs6000 case, and also allow me to fix my RISC-V case with a
riscv.md splitter.

Jim


Re: [PATCH] Asan changes for RISC-V.

2020-11-13 Thread Jim Wilson
On Fri, Nov 13, 2020 at 11:12 AM Jeff Law  wrote:

>
> On 10/28/20 5:58 PM, Jim Wilson wrote:
> > We have only riscv64 asan support, there is no riscv32 support as yet.  So I
> > need to be able to conditionally enable asan support for the riscv target.  I
> > implemented this by returning zero from the asan_shadow_offset function.  This
> > requires a change to toplev.c and docs in target.def.
> >
>


> I noticed you hadn't committed this change.  Just to be explicit, this
> is OK for the trunk.
>

Thanks, committed.  I can self-approve the RISC-V parts but not the asan
changes, so I wanted a review for those.

Jim


Re: [PATCH][RFC] Make mingw-w64 printf/scanf attribute alias to ms_printf/ms_scanf only for C89

2020-11-13 Thread Liu Hao via Gcc-patches
On 2020/11/14 8:12 AM, Joseph Myers wrote:
> On Fri, 13 Nov 2020, Liu Hao via Gcc-patches wrote:
> 
>> On 2020/11/13 2:46, Joseph Myers wrote:
>>> I'd expect these patches to include updates to the gcc.dg/format/ms_*.c 
>>> tests to reflect the changed semantics (or new tests there if some of the 
>>> changes don't result in any failures in the existing tests).
>>>
>>
>> Does the attached patch suffice?
> 
> This is testing the sort of thing I'd expect tests for regarding 'L' and 
> 'll'.  What about the changes for 'I' - do those result in changes to how 
> the compiler behaves, which should also have tests?
> 

There is already a test for `%Ix` in 'ms_c99-printf-2.c', but none of the
'ms_*printf*' files has a test that checks for the absence of warnings.


-- 
Best regards,
LH_Mouse





Re: [RS6000] Use LIB2_SIDITI_CONV_FUNCS in place of ppc64-fp.c

2020-11-13 Thread Segher Boessenkool
Hi!

On Fri, Nov 13, 2020 at 11:39:35PM +1030, Alan Modra wrote:
> This patch retires ppc64-fp.c in favour of using
> "LIB2_SIDITI_CONV_FUNCS = yes", which is a lot better solution than
> having a copy of selected libgcc2.c functions.

Nice :-)

> Bootstrapped and regression tested powerpc64-linux, powerpc64le-linux,
> powerpc-linux and powerpc-ibm-aix7.2.4.0.  OK?

Yes, okay for trunk, thank you!

>   * config/rs6000/t-ppc64-fp (LIB2ADD): Delete.
>   (LIB2_SIDITI_CONV_FUNCS): Define.
>   * config/rs6000/ppc64-fp.c: Delete file.

(Don't forget to delete the file, it isn't in your patch.)


Segher


Re: [PATCH] std::experimental::simd

2020-11-13 Thread Matthias Kretz
On Thursday, 12 November 2020 00:43:31 CET Jonathan Wakely wrote:
> On 08/05/20 21:03 +0200, Matthias Kretz wrote:
> >Here's my last update to the std::experimental::simd patch. It's currently
> >based on the gcc-10 branch.
> >
> >
> >+
> >+// __next_power_of_2{{{
> >+/**
> >+ * \internal
> 
> We use @foo for Doxygen comments rather than \foo

Done.

> >+ * Returns the next power of 2 larger than or equal to \p __x.
> >+ */
> >+constexpr std::size_t
> >+__next_power_of_2(std::size_t __x)
> >+{
> >+  return (__x & (__x - 1)) == 0 ? __x
> >+: __next_power_of_2((__x | (__x >> 1)) + 1);
> >+}
> 
> Can this be replaced with std::__bit_ceil ?
> 
> std::bit_ceil is C++20, but we provide __private versions of
> everything in  for C++14 and up.

Ah good. I'll delete some code.
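As a quick sanity check of the suggested replacement, here is a small standalone sketch. The recursive helper is the one quoted above, renamed to avoid reserved identifiers. `bit_ceil_portable` is a hypothetical stand-in that uses C++20 `std::bit_ceil` when available and a simple loop otherwise. One difference is worth keeping in mind: `std::bit_ceil(0)` is 1 by definition, whereas the recursive helper returns 0 for an input of 0.

```cpp
#include <cassert>
#include <cstddef>
#if __cplusplus >= 202002L
#include <bit>
#endif

// The recursive helper quoted in the review, renamed (no reserved names).
constexpr std::size_t next_power_of_2(std::size_t x)
{
  return (x & (x - 1)) == 0 ? x : next_power_of_2((x | (x >> 1)) + 1);
}

// Hypothetical stand-in for std::bit_ceil: smallest power of two >= x.
// bit_ceil(0) == 1. The fallback loop assumes the result is representable.
constexpr std::size_t bit_ceil_portable(std::size_t x)
{
#if __cplusplus >= 202002L
  return std::bit_ceil(x);
#else
  std::size_t p = 1;
  while (p < x)
    p <<= 1;
  return p;
#endif
}

static_assert(next_power_of_2(5) == 8, "rounds up to a power of two");
static_assert(bit_ceil_portable(5) == 8, "rounds up to a power of two");
```

For every nonzero input the two agree, so the swap only needs care where a zero argument is possible.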

> >+// vvv  type traits  vvv
> >+// integer type aliases{{{
> >+using _UChar = unsigned char;
> >+using _SChar = signed char;
> >+using _UShort = unsigned short;
> >+using _UInt = unsigned int;
> >+using _ULong = unsigned long;
> >+using _ULLong = unsigned long long;
> >+using _LLong = long long;
> 
> I have a suspicion some of these might clash with libc macros on some
> OS somewhere, but we can cross that bridge when we come to it.

I need those to help cut down the code for 80 cols. ;-)

> >+// __make_dependent_t {{{
> >+template  struct __make_dependent
> >+{
> >+  using type = _Up;
> >+};
> >+template 
> >+using __make_dependent_t = typename __make_dependent<_Tp, _Up>::type;
> 
> Do you need a distinct class template for this, or can
> __make_dependent_t be an alias to __type_identity::type or
> something else that already exists?

With GCC it would suffice to use __type_identity::type here. But Clang 
rejects it. Clang sees that the first template argument is not used in the 
definition of the alias and thus doesn't make _Up a dependent type.
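For readers unfamiliar with the idiom, a minimal sketch of a make-dependent helper follows. The template parameter lists were lost from the quoted patch in this archive, so the names `make_dependent`, `make_dependent_t` and `as_int` are hypothetical, modelled on the discussion. The first parameter is deliberately unused; going through a distinct class template is what keeps the result dependent on it, which is the Clang behaviour described above.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical reconstruction: yields U, but mentions T so that the result
// is a dependent type whenever T is a template parameter.
template <typename T, typename U>
struct make_dependent
{
  using type = U;
};

template <typename T, typename U>
using make_dependent_t = typename make_dependent<T, U>::type;

// Example use: the return type is int, but spelled as a type that depends
// on T, so it is only checked when as_int is instantiated.
template <typename T>
make_dependent_t<T, int> as_int(T value)
{
  return static_cast<make_dependent_t<T, int>>(value);
}

static_assert(std::is_same<make_dependent_t<float, double>, double>::value,
              "the first argument only contributes dependence");
```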

> >+// __call_with_n_evaluations{{{
> >+template 
> >+_GLIBCXX_SIMD_INTRINSIC constexpr auto
> >+__call_with_n_evaluations(std::index_sequence<_I...>, _F0&& __f0,
> >+  _FArgs&& __fargs)
> 
> I'm not sure if it matters here, but old versions of G++ passed empty
> types (like index_sequence) using the wrong ABI. Passing them as the
> last argument makes it a non-issue. If they're not the last argument,
> you get incompatible code when compiling with -fabi-version=7 or
> lower.

These are all [[gnu::always_inline]]. So it shouldn't matter.
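The helper's body is not shown in the quoted hunk. Judging from its name and signature, it presumably calls `__f0` with `__fargs` evaluated once per index; the sketch below assumes exactly that. It also follows the review's suggestion of passing the empty `std::index_sequence` tag last, which the review says makes the old `-fabi-version=7` empty-class difference a non-issue. `call_with_n_evaluations` here is a hypothetical name, not the libstdc++ function.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical sketch: returns
//   f0(fargs(ic<0>), fargs(ic<1>), ..., fargs(ic<N-1>))
// where ic<I> is std::integral_constant<std::size_t, I>.
// The empty index_sequence tag is taken as the last parameter.
template <typename F0, typename FArgs, std::size_t... I>
auto call_with_n_evaluations(F0&& f0, FArgs&& fargs,
                             std::index_sequence<I...>)
{
  return f0(fargs(std::integral_constant<std::size_t, I>())...);
}
```

For example, with `fargs` returning `I + 1` and `f0` packing three digits, `std::make_index_sequence<3>()` yields 123.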

> >+// __is_narrowing_conversion<_From, _To>{{{
> >+template  >std::is_arithmetic<_From>::value, +bool =
> >std::is_arithmetic<_To>::value>
> 
> These could use is_arithmetic_v.

Right. That was me trying to work around a clang-format bug. Will fix. I'm in 
the process of ditching clang-format anyway.

> >+{
> >+};
> >+
> >+template 
> >+struct __is_narrowing_conversion : public true_type
> 
> This looks odd, bool to arithmetic type T is narrowing?
> I assume there's a reason for it, so maybe a comment explaining it
> would help.

Odd indeed. Either I wanted to take a shortcut to implement: "From is a 
vectorizable type and every possible value of From can be represented with
type value_type, or [...]". Or I wanted to swap bool and _Tp here and say that 
anything other than bool converting to bool is narrowing.

I should clean this up.
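One way to make the intent explicit, offered as a swapped-in technique rather than what the patch does, is to let the language's list-initialization rules decide: `To{from}` is ill-formed for a narrowing conversion, and that can be detected in SFINAE context. Under those rules bool to int is not narrowing while int to bool is, which matches the second reading above. They are stricter than the simd wording in places, though: int to double counts as narrowing for list-initialization even on targets where double can represent every int. `is_list_init_narrowing` and `void_t_sketch` are hypothetical names.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// C++14-friendly void_t. Routing through a class template avoids the
// unused-alias-parameter problem (CWG 1558) in older compilers.
template <typename... Ts>
struct make_void_sketch { using type = void; };
template <typename... Ts>
using void_t_sketch = typename make_void_sketch<Ts...>::type;

// Primary template: treat From -> To as narrowing (or impossible) ...
template <typename From, typename To, typename = void>
struct is_list_init_narrowing : std::true_type {};

// ... unless To{from} is well-formed, which list-initialization only
// permits for non-narrowing conversions.
template <typename From, typename To>
struct is_list_init_narrowing<
    From, To, void_t_sketch<decltype(To{std::declval<From>()})>>
  : std::false_type {};
```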

> 
> >+// _BitOps {{{
> >+struct _BitOps
> > [...]
> std::__popcount in 
> > [...]
> std::__countl_zero in 

Yes. I'll clean up all of _BitOps.
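The reviewer points at libstdc++'s internal `std::__popcount` and `std::__countl_zero`. Those are implementation details, so this standalone sketch uses the public C++20 `std::popcount` and `std::countl_zero` when available and falls back to the GCC/Clang builtins otherwise. `popcount_u` and `countl_zero_u` are hypothetical names.

```cpp
#include <cassert>
#include <climits>
#if __cplusplus >= 202002L
#include <bit>
#endif

// Number of set bits in x.
inline int popcount_u(unsigned x)
{
#if __cplusplus >= 202002L
  return std::popcount(x);
#else
  return __builtin_popcount(x);
#endif
}

// Number of leading zero bits in x; the full width for x == 0.
inline int countl_zero_u(unsigned x)
{
#if __cplusplus >= 202002L
  return std::countl_zero(x);
#else
  // __builtin_clz(0) is undefined, so handle zero explicitly.
  return x == 0 ? static_cast<int>(sizeof(unsigned) * CHAR_BIT)
                : __builtin_clz(x);
#endif
}
```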

> >+template 
> 
> We generally avoid single letter names, although _V isn't in the list
> of BADNAMES in the manual, so maybe this one's OK.
> 
> >+template ,
> >+  typename _R
> 
> Same for _R, it's not listed as a BADNAME.

I believe I checked the list. ;-)

> >+
> >+template 
> >+_GLIBCXX_SIMD_INTRINSIC constexpr _Tp
> >+__and(_Tp __a, _Tp __b) noexcept
> 
> Calls to __and are done unqualified. Are they only with types that
> won't cause ADL to look outside namespace std?
> 
> Even though __and is a reserved name, avoiding ADL has other benefits.

Called either with integers, [[gnu::vector_size(N)]] types, or 
std::experimental::parallelism_v2::_SimdWrapper. I request a column limit 
relaxation to at least 100 cols if I should qualify all of them with 
std::experimental:: ;-)
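Why unqualified calls can matter even for a reserved name: with ADL, an unqualified call inside a library template also considers functions in the namespaces of the argument types, and a better-matching user overload wins. The sketch below shows this; the namespaces and names (`lib`, `user`, `and_op`, `Bits`) are hypothetical. Qualifying the call pins it to the library's function.

```cpp
#include <cassert>

namespace lib
{
  template <typename T>
  T and_op(T a, T b) { return a & b; }

  // Unqualified: ordinary lookup finds lib::and_op, but ADL may add more.
  template <typename T>
  T combine_unqualified(T a, T b) { return and_op(a, b); }

  // Qualified: always lib::and_op, no ADL.
  template <typename T>
  T combine_qualified(T a, T b) { return lib::and_op(a, b); }
}

namespace user
{
  struct Bits
  {
    unsigned value;
    friend Bits operator&(Bits a, Bits b) { return Bits{a.value & b.value}; }
  };

  // A non-template overload in the argument's namespace: found by ADL and
  // preferred over lib::and_op<Bits> during overload resolution.
  inline Bits and_op(Bits, Bits) { return Bits{0xdeadu}; }
}
```

Fundamental types such as `unsigned` have no associated namespaces, so for them the unqualified call still resolves to `lib::and_op`.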

> That's all for now ... not very far through the huge patch though.
> Generally this looks very good. The things mentioned above are
> stylistic or just remove some redundancy, they're not critical.

Thanks. I'll post a new patch ASAP. My tests are running.

-Matthias

-- 
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
 std::experimental::simd  https://github.com/VcDevel/std-simd
──





[committed] openmp: Add support for non-VLA {, first}private allocate on omp task

2020-11-13 Thread Jakub Jelinek via Gcc-patches
Hi!

This patch adds support for custom allocators on private/firstprivate
clauses for task (and taskloop) constructs.  Private didn't need anything
special, but firstprivate, when the variable is passed by reference, needs
GOMP_alloc calls in the copyfn and GOMP_free in the task body.
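At the source level, the construct this enables looks roughly like the sketch below: a task whose firstprivate copy of `x` comes from a non-default allocator. This assumes GCC 11 or later with `-fopenmp`; `omp_low_lat_mem_alloc` is one of the predefined OpenMP 5.0 allocators. Without `-fopenmp` the pragmas are compiled out and the code simply runs serially. `task_plus_one` is a hypothetical name.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical example: x is firstprivate in the task; with OpenMP enabled
// its private copy is allocated from omp_low_lat_mem_alloc.
int task_plus_one(int x)
{
  int result = 0;
#ifdef _OPENMP
#pragma omp task firstprivate(x) allocate(omp_low_lat_mem_alloc: x) shared(result)
#endif
  {
    result = x + 1;
  }
#ifdef _OPENMP
#pragma omp taskwait
#endif
  return result;
}
```

`shared(result)` is needed because a local variable of the enclosing function would otherwise be firstprivate in the task by default, and the update would be lost.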

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2020-11-14  Jakub Jelinek  

* gimplify.c (gimplify_omp_for): Add OMP_CLAUSE_ALLOCATE_ALLOCATOR
decls as firstprivate on task clauses even when allocate clause
decl is not lastprivate.
* omp-low.c (install_var_field): Don't dereference omp_is_reference
types if mask is 33 rather than 1.
(scan_sharing_clauses): Populate allocate_map even for task
constructs.  For now remove it back for variables mentioned in
reduction and in_reduction clauses on task/taskloop constructs
or on VLA task firstprivates.  For firstprivate on task construct,
install the var field into field_map with by_ref and 33 instead
of false and 1 if mentioned in allocate clause.
(lower_private_allocate): Set TREE_THIS_NOTRAP on the created
MEM_REF.
(lower_rec_input_clauses): Handle allocate for task firstprivatized
non-VLA variables.
(create_task_copyfn): Likewise.

* testsuite/libgomp.c-c++-common/allocate-1.c (struct S): New type.
(foo): Add tests for non-VLA private and firstprivate clauses on
omp task.
(bar): Likewise.  Remove taking of address from private/firstprivate
variables.
* testsuite/libgomp.c++/allocate-1.C (struct S): New type.
(foo): Add p, q, px and s arguments.  Add tests for array reductions
and for non-VLA private and firstprivate clauses on omp task.
(bar): Removed.
(main): Adjust foo caller.  Don't call bar.

--- gcc/gimplify.c.jj   2020-11-12 21:37:53.907422938 +0100
+++ gcc/gimplify.c  2020-11-13 22:48:46.738345390 +0100
@@ -12463,22 +12463,22 @@ gimplify_omp_for (tree *expr_p, gimple_s
  /* Allocate clause we duplicate on task and inner taskloop
 if the decl is lastprivate, otherwise just put on task.  */
  case OMP_CLAUSE_ALLOCATE:
+   if (OMP_CLAUSE_ALLOCATE_ALLOCATOR (c)
+   && DECL_P (OMP_CLAUSE_ALLOCATE_ALLOCATOR (c)))
+ {
+   /* Additionally, put firstprivate clause on task
+  for the allocator if it is not constant.  */
+   *gtask_clauses_ptr
+ = build_omp_clause (OMP_CLAUSE_LOCATION (c),
+ OMP_CLAUSE_FIRSTPRIVATE);
+   OMP_CLAUSE_DECL (*gtask_clauses_ptr)
+ = OMP_CLAUSE_ALLOCATE_ALLOCATOR (c);
+   gtask_clauses_ptr = _CLAUSE_CHAIN (*gtask_clauses_ptr);
+ }
if (lastprivate_uids
&& bitmap_bit_p (lastprivate_uids,
 DECL_UID (OMP_CLAUSE_DECL (c
  {
-   if (OMP_CLAUSE_ALLOCATE_ALLOCATOR (c)
-   && DECL_P (OMP_CLAUSE_ALLOCATE_ALLOCATOR (c)))
- {
-   /* Additionally, put firstprivate clause on task
-  for the allocator if it is not constant.  */
-   *gtask_clauses_ptr
- = build_omp_clause (OMP_CLAUSE_LOCATION (c),
- OMP_CLAUSE_FIRSTPRIVATE);
-   OMP_CLAUSE_DECL (*gtask_clauses_ptr)
- = OMP_CLAUSE_ALLOCATE_ALLOCATOR (c);
-   gtask_clauses_ptr = _CLAUSE_CHAIN (*gtask_clauses_ptr);
- }
*gfor_clauses_ptr = c;
gfor_clauses_ptr = _CLAUSE_CHAIN (c);
*gtask_clauses_ptr = copy_node (c);
--- gcc/omp-low.c.jj   2020-11-13 19:00:46.788619911 +0100
+++ gcc/omp-low.c   2020-11-13 23:07:21.996118893 +0100
@@ -803,7 +803,7 @@ install_var_field (tree var, bool by_ref
 }
   else if (by_ref)
 type = build_pointer_type (type);
-  else if ((mask & 3) == 1 && omp_is_reference (var))
+  else if ((mask & (32 | 3)) == 1 && omp_is_reference (var))
 type = TREE_TYPE (type);
 
   field = build_decl (DECL_SOURCE_LOCATION (var),
@@ -1141,8 +1141,6 @@ scan_sharing_clauses (tree clauses, omp_
/* omp_default_mem_alloc is 1 */
|| !integer_onep (OMP_CLAUSE_ALLOCATE_ALLOCATOR (c
   {
-   if (is_task_ctx (ctx))
- continue; /* For now.  */
if (ctx->allocate_map == NULL)
  ctx->allocate_map = new hash_map;
ctx->allocate_map->put (OMP_CLAUSE_DECL (c),
@@ -1222,18 +1220,20 @@ scan_sharing_clauses (tree clauses, omp_
  ctx->local_reduction_clauses
= tree_cons (NULL, c, ctx->local_reduction_clauses);
}
- if ((OMP_CLAUSE_REDUCTION_INSCAN (c)
-  || OMP_CLAUSE_REDUCTION_TASK (c)) && ctx->allocate_map)
+ /* 

[Bug libstdc++/93456] No overflow check in __atomic_futex_unsigned_base::_M_futex_wait_until

2020-11-13 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93456

--- Comment #4 from Jonathan Wakely  ---
Fixed on trunk by the patch above and a few other recent commits today.

[Bug libstdc++/93421] futex.cc use of futex syscall is not time64-compatible

2020-11-13 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93421

--- Comment #9 from Jonathan Wakely  ---
(In reply to Rich Felker from comment #7)
> Indeed, the direct clock_gettime syscall stuff is just unnecessary on any
> modern system, certainly any time64 one. I read the patch briefly and I
> don't see anywhere it would break anything, but it also wouldn't produce a
> useful Y2038-ready configuration, so I don't think it makes sense. Configure
> or source-level assertions should just ensure that, if time_t is larger than
> long and there's a distinct time64 syscall, the direct syscall is never used.

I didn't get around to making use of the new syscalls for GCC 11, so I've
committed that patch. I'll keep this bug open to revisit it and do it properly
later.

[committed] libstdc++: Use custom timespec in system calls [PR 93421]

2020-11-13 Thread Jonathan Wakely via Gcc-patches
On 32-bit targets where userspace has switched to 64-bit time_t, we
cannot pass struct timespec to SYS_futex or SYS_clock_gettime, because
the userspace definition of struct timespec will not match what the
kernel expects.

We use the existence of the SYS_futex_time64 or SYS_clock_gettime_time64
macros to imply that userspace *might* have switched to the new timespec
definition.  This is a conservative assumption. It's possible that the
new syscall numbers are defined in the libc headers but that timespec
hasn't been updated yet (as is the case for glibc currently). But using
the alternative struct with two longs is still OK, it's just redundant
if userspace timespec still uses a 32-bit time_t.

We also check that SYS_futex_time64 != SYS_futex so that we don't try
to use a 32-bit tv_sec on modern targets that only support the 64-bit
system calls and define the old macro to the same value as the new one.

We could possibly check #ifdef __USE_TIME_BITS64 to see whether
userspace has actually been updated, but it's not clear if user code
is meant to inspect that or if it's only for libc internal use.
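A minimal sketch of the same selection logic outside libstdc++, assuming a Linux target. `syscall_timespec` mirrors the type defined in the patch below. `raw_monotonic_now` is a hypothetical helper; where `SYS_clock_gettime` is not available it falls back to the libc `clock_gettime`.

```cpp
#include <cassert>
#include <time.h>
#if defined(__linux__)
#include <sys/syscall.h>
#include <unistd.h>
#endif

#if defined(SYS_clock_gettime)
# if defined(SYS_clock_gettime_time64) \
  && SYS_clock_gettime_time64 != SYS_clock_gettime
// The old syscall still expects a 32-bit tv_sec, even if userspace
// timespec may have switched to a 64-bit time_t.
struct syscall_timespec { long tv_sec; long tv_nsec; };
# else
using syscall_timespec = ::timespec;
# endif
#endif

// Hypothetical helper: read CLOCK_MONOTONIC, via the raw syscall if possible.
bool raw_monotonic_now(long& sec, long& nsec)
{
#if defined(SYS_clock_gettime)
  syscall_timespec ts{};
  if (syscall(SYS_clock_gettime, CLOCK_MONOTONIC, &ts) != 0)
    return false;
#else
  ::timespec ts{};
  if (::clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
    return false;
#endif
  sec = static_cast<long>(ts.tv_sec);
  nsec = static_cast<long>(ts.tv_nsec);
  return true;
}
```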

libstdc++-v3/ChangeLog:

PR libstdc++/93421
* src/c++11/chrono.cc [_GLIBCXX_USE_CLOCK_GETTIME_SYSCALL]
(syscall_timespec): Define a type suitable for SYS_clock_gettime
calls.
(system_clock::now(), steady_clock::now()): Use syscall_timespec
instead of timespec.
* src/c++11/futex.cc (syscall_timespec): Define a type suitable
for SYS_futex and SYS_clock_gettime calls.
(relative_timespec): Use syscall_timespec instead of timespec.
(__atomic_futex_unsigned_base::_M_futex_wait_until): Likewise.
(__atomic_futex_unsigned_base::_M_futex_wait_until_steady):
Likewise.

Tested x86_64-linux, -m32 too. Committed to trunk.

commit 4d039cb9a1d0ffc6910fe09b726c3561b64527dc
Author: Jonathan Wakely 
Date:   Thu Sep 24 17:35:52 2020

libstdc++: Use custom timespec in system calls [PR 93421]

On 32-bit targets where userspace has switched to 64-bit time_t, we
cannot pass struct timespec to SYS_futex or SYS_clock_gettime, because
the userspace definition of struct timespec will not match what the
kernel expects.

We use the existence of the SYS_futex_time64 or SYS_clock_gettime_time64
macros to imply that userspace *might* have switched to the new timespec
definition.  This is a conservative assumption. It's possible that the
new syscall numbers are defined in the libc headers but that timespec
hasn't been updated yet (as is the case for glibc currently). But using
the alternative struct with two longs is still OK, it's just redundant
if userspace timespec still uses a 32-bit time_t.

We also check that SYS_futex_time64 != SYS_futex so that we don't try
to use a 32-bit tv_sec on modern targets that only support the 64-bit
system calls and define the old macro to the same value as the new one.

We could possibly check #ifdef __USE_TIME_BITS64 to see whether
userspace has actually been updated, but it's not clear if user code
is meant to inspect that or if it's only for libc internal use.

libstdc++-v3/ChangeLog:

PR libstdc++/93421
* src/c++11/chrono.cc [_GLIBCXX_USE_CLOCK_GETTIME_SYSCALL]
(syscall_timespec): Define a type suitable for SYS_clock_gettime
calls.
(system_clock::now(), steady_clock::now()): Use syscall_timespec
instead of timespec.
* src/c++11/futex.cc (syscall_timespec): Define a type suitable
for SYS_futex and SYS_clock_gettime calls.
(relative_timespec): Use syscall_timespec instead of timespec.
(__atomic_futex_unsigned_base::_M_futex_wait_until): Likewise.
(__atomic_futex_unsigned_base::_M_futex_wait_until_steady):
Likewise.

diff --git a/libstdc++-v3/src/c++11/chrono.cc b/libstdc++-v3/src/c++11/chrono.cc
index 723f3002d11a..f10be7d8c073 100644
--- a/libstdc++-v3/src/c++11/chrono.cc
+++ b/libstdc++-v3/src/c++11/chrono.cc
@@ -35,6 +35,17 @@
 #ifdef _GLIBCXX_USE_CLOCK_GETTIME_SYSCALL
 #include 
 #include 
+
+# if defined(SYS_clock_gettime_time64) \
+  && SYS_clock_gettime_time64 != SYS_clock_gettime
+  // Userspace knows about the new time64 syscalls, so it's possible that
+  // userspace has also updated timespec to use a 64-bit tv_sec.
+  // The SYS_clock_gettime syscall still uses the old definition
+  // of timespec where tv_sec is 32 bits, so define a type that matches that.
+  struct syscall_timespec { long tv_sec; long tv_nsec; };
+# else
+  using syscall_timespec = ::timespec;
+# endif
 #endif
 
 namespace std _GLIBCXX_VISIBILITY(default)
@@ -52,11 +63,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 system_clock::now() noexcept
 {
 #ifdef _GLIBCXX_USE_CLOCK_REALTIME
-  timespec tp;
   // -EINVAL, -EFAULT
 #ifdef _GLIBCXX_USE_CLOCK_GETTIME_SYSCALL
+  syscall_timespec 

Re: [committed] libstdc++: Optimise std::future::wait_for and fix futex polling

2020-11-13 Thread Jonathan Wakely via Gcc-patches

On 13/11/20 22:45 +, Jonathan Wakely wrote:
> On 13/11/20 21:12 +, Jonathan Wakely wrote:
>> On 13/11/20 20:29 +, Mike Crowe via Libstdc++ wrote:
>>> On Friday 13 November 2020 at 17:25:22 +, Jonathan Wakely wrote:
>>>> +  // Return the relative duration from (now_s + now_ns) to (abs_s + abs_ns)
>>>> +  // as a timespec.
>>>> +  struct timespec
>>>> +  relative_timespec(chrono::seconds abs_s, chrono::nanoseconds abs_ns,
>>>> +                    time_t now_s, long now_ns)
>>>> +  {
>>>> +    struct timespec rt;
>>>> +
>>>> +    // Did we already time out?
>>>> +    if (now_s > abs_s.count())
>>>> +      {
>>>> +        rt.tv_sec = -1;
>>>> +        return rt;
>>>> +      }
>>>> +
>>>> +    auto rel_s = abs_s.count() - now_s;
>>>> +
>>>> +    // Avoid overflows
>>>> +    if (rel_s > __gnu_cxx::__int_traits::__max)
>>>> +      rel_s = __gnu_cxx::__int_traits::__max;
>>>> +    else if (rel_s < __gnu_cxx::__int_traits::__min)
>>>> +      rel_s = __gnu_cxx::__int_traits::__min;
>>>
>>> I may be missing something, but if the line above executes...
>>>
>>>> +
>>>> +    // Convert the absolute timeout value to a relative timeout
>>>> +    rt.tv_sec = rel_s;
>>>> +    rt.tv_nsec = abs_ns.count() - now_ns;
>>>> +    if (rt.tv_nsec < 0)
>>>> +      {
>>>> +        rt.tv_nsec += 1000000000;
>>>> +        --rt.tv_sec;
>>>
>>> ...and so does this line above, then I think that we'll end up
>>> underflowing. (Presumably rt.tv_sec will wrap round to being some time in
>>> 2038 on most 32-bit targets.)
>>
>> Ugh.
>>
>>> I'm currently trying to persuade myself that this can actually happen and
>>> if so work out how to come up with a test case for it.
>>
>> Maybe something like:
>>
>> auto d = chrono::floor(system_clock::now().time_since_epoch()
>>                        - seconds(INT_MAX + 2LL));
>> fut.wait_until(system_clock::time_point(d));
>>
>> This will create a sys_time with a value that is slightly more than
>> INT_MAX seconds before the current time, with a zero nanoseconds
>
> Ah, but such a time will never reach the overflow because the first
> thing that the new relative_timespec function does is:
>
> if (now_s > abs_s.count())
>   {
>     rt.tv_sec = -1;
>     return rt;
>   }
>
> So in fact we can never have a negative rel_s anyway.


Here's what I've pushed now, after testing on x86_64-linux.


commit b8d36dcc917e8a06d8c20b9f5ecc920ed2b9e947
Author: Jonathan Wakely 
Date:   Fri Nov 13 20:57:15 2020

libstdc++: Remove redundant overflow check for futex timeout [PR 93456]

The relative_timespec function already checks for the case where the
specified timeout is in the past, so the difference can never be
negative. That means we don't need to check if it's more negative than
the minimum time_t value.

libstdc++-v3/ChangeLog:

PR libstdc++/93456
* src/c++11/futex.cc (relative_timespec): Remove redundant check
for negative values.
* testsuite/30_threads/future/members/wait_until_overflow.cc: Moved to...
* testsuite/30_threads/future/members/93456.cc: ...here.

diff --git a/libstdc++-v3/src/c++11/futex.cc b/libstdc++-v3/src/c++11/futex.cc
index 15959cebee57..c008798318c9 100644
--- a/libstdc++-v3/src/c++11/futex.cc
+++ b/libstdc++-v3/src/c++11/futex.cc
@@ -73,21 +73,23 @@ namespace
 	return rt;
   }
 
-auto rel_s = abs_s.count() - now_s;
+const auto rel_s = abs_s.count() - now_s;
 
-// Avoid overflows
+// Convert the absolute timeout to a relative timeout, without overflow.
 if (rel_s > __int_traits::__max) [[unlikely]]
-  rel_s = __int_traits::__max;
-else if (rel_s < __int_traits::__min) [[unlikely]]
-  rel_s = __int_traits::__min;
-
-// Convert the absolute timeout value to a relative timeout
-rt.tv_sec = rel_s;
-rt.tv_nsec = abs_ns.count() - now_ns;
-if (rt.tv_nsec < 0)
   {
-	rt.tv_nsec += 1000000000;
-	--rt.tv_sec;
+	rt.tv_sec = __int_traits::__max;
+	rt.tv_nsec = 999999999;
+  }
+else
+  {
+	rt.tv_sec = rel_s;
+	rt.tv_nsec = abs_ns.count() - now_ns;
+	if (rt.tv_nsec < 0)
+	  {
+	rt.tv_nsec += 1000000000;
+	--rt.tv_sec;
+	  }
   }
 
 return rt;
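The pushed logic can be exercised in isolation with the standalone sketch below. It mirrors the structure of the new relative_timespec but uses a hypothetical `rel_timespec` struct and `std::numeric_limits<std::time_t>` in place of libstdc++'s internal `__int_traits`. It assumes `abs_ns` and `now_ns` are sub-second values in [0, 1e9).

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <limits>

struct rel_timespec { std::time_t tv_sec; long tv_nsec; };

rel_timespec relative_timespec_sketch(std::chrono::seconds abs_s,
                                      std::chrono::nanoseconds abs_ns,
                                      std::time_t now_s, long now_ns)
{
  rel_timespec rt{};
  // Timeout already in the past: signal it with a negative tv_sec.
  if (now_s > abs_s.count())
    {
      rt.tv_sec = -1;
      return rt;
    }
  // Never negative here, because of the check above.
  const auto rel_s = abs_s.count() - now_s;
  if (rel_s > std::numeric_limits<std::time_t>::max())
    {
      // Clamp to the largest representable relative timeout.
      rt.tv_sec = std::numeric_limits<std::time_t>::max();
      rt.tv_nsec = 999999999;
    }
  else
    {
      rt.tv_sec = static_cast<std::time_t>(rel_s);
      rt.tv_nsec = static_cast<long>(abs_ns.count() - now_ns);
      if (rt.tv_nsec < 0)
        {
          // Borrow one second; tv_sec cannot underflow since rel_s >= 0.
          rt.tv_nsec += 1000000000;
          --rt.tv_sec;
        }
    }
  return rt;
}
```

Because the clamping branch sets both fields and skips the borrow, the decrement that worried the review can only run on a value that is at least zero.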


[Bug libstdc++/93456] No overflow check in __atomic_futex_unsigned_base::_M_futex_wait_until

2020-11-13 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93456

--- Comment #3 from CVS Commits  ---
The master branch has been updated by Jonathan Wakely :

https://gcc.gnu.org/g:b8d36dcc917e8a06d8c20b9f5ecc920ed2b9e947

commit r11-5021-gb8d36dcc917e8a06d8c20b9f5ecc920ed2b9e947
Author: Jonathan Wakely 
Date:   Fri Nov 13 20:57:15 2020 +

libstdc++: Remove redundant overflow check for futex timeout [PR 93456]

The relative_timespec function already checks for the case where the
specified timeout is in the past, so the difference can never be
negative. That means we don't need to check if it's more negative than
the minimum time_t value.

libstdc++-v3/ChangeLog:

PR libstdc++/93456
* src/c++11/futex.cc (relative_timespec): Remove redundant check
for negative values.
* testsuite/30_threads/future/members/wait_until_overflow.cc: Moved to...
* testsuite/30_threads/future/members/93456.cc: ...here.

[Bug libstdc++/93421] futex.cc use of futex syscall is not time64-compatible

2020-11-13 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93421

--- Comment #8 from CVS Commits  ---
The master branch has been updated by Jonathan Wakely :

https://gcc.gnu.org/g:4d039cb9a1d0ffc6910fe09b726c3561b64527dc

commit r11-5022-g4d039cb9a1d0ffc6910fe09b726c3561b64527dc
Author: Jonathan Wakely 
Date:   Thu Sep 24 17:35:52 2020 +0100

libstdc++: Use custom timespec in system calls [PR 93421]

On 32-bit targets where userspace has switched to 64-bit time_t, we
cannot pass struct timespec to SYS_futex or SYS_clock_gettime, because
the userspace definition of struct timespec will not match what the
kernel expects.

We use the existence of the SYS_futex_time64 or SYS_clock_gettime_time64
macros to imply that userspace *might* have switched to the new timespec
definition.  This is a conservative assumption. It's possible that the
new syscall numbers are defined in the libc headers but that timespec
hasn't been updated yet (as is the case for glibc currently). But using
the alternative struct with two longs is still OK, it's just redundant
if userspace timespec still uses a 32-bit time_t.

We also check that SYS_futex_time64 != SYS_futex so that we don't try
to use a 32-bit tv_sec on modern targets that only support the 64-bit
system calls and define the old macro to the same value as the new one.

We could possibly check #ifdef __USE_TIME_BITS64 to see whether
userspace has actually been updated, but it's not clear if user code
is meant to inspect that or if it's only for libc internal use.

libstdc++-v3/ChangeLog:

PR libstdc++/93421
* src/c++11/chrono.cc [_GLIBCXX_USE_CLOCK_GETTIME_SYSCALL]
(syscall_timespec): Define a type suitable for SYS_clock_gettime
calls.
(system_clock::now(), steady_clock::now()): Use syscall_timespec
instead of timespec.
* src/c++11/futex.cc (syscall_timespec): Define a type suitable
for SYS_futex and SYS_clock_gettime calls.
(relative_timespec): Use syscall_timespec instead of timespec.
(__atomic_futex_unsigned_base::_M_futex_wait_until): Likewise.
(__atomic_futex_unsigned_base::_M_futex_wait_until_steady):
Likewise.

Re: [PATCH][RFC] Make mingw-w64 printf/scanf attribute alias to ms_printf/ms_scanf only for C89

2020-11-13 Thread Joseph Myers
On Fri, 13 Nov 2020, Liu Hao via Gcc-patches wrote:

> On 2020/11/13 2:46, Joseph Myers wrote:
> > I'd expect these patches to include updates to the gcc.dg/format/ms_*.c 
> > tests to reflect the changed semantics (or new tests there if some of the 
> > changes don't result in any failures in the existing tests).
> > 
> 
> Does the attached patch suffice?

This is testing the sort of thing I'd expect tests for regarding 'L' and 
'll'.  What about the changes for 'I' - do those result in changes to how 
the compiler behaves, which should also have tests?

-- 
Joseph S. Myers
jos...@codesourcery.com


float.h: Handle C2x __STDC_WANT_IEC_60559_EXT__

2020-11-13 Thread Joseph Myers
TS 18661-1 and 18661-2 have various definitions conditional on
__STDC_WANT_IEC_60559_BFP_EXT__ and __STDC_WANT_IEC_60559_DFP_EXT__
macros.  When those TSes were integrated into C2x, most of the feature
test macro conditionals were removed (with declarations for decimal FP
becoming conditional only on whether decimal FP is supported by the
implementation and those for binary FP becoming unconditionally
required).

A few tests of those feature test macros remained for declarations
that appeared only in Annex F and not in the main part of the
standard.  A change accepted for C2x at the last WG14 meeting (but not
yet added to the working draft in git) was to replace both those
macros by __STDC_WANT_IEC_60559_EXT__; if __STDC_WANT_IEC_60559_EXT__
is defined, the specific declarations in the headers will then depend
on which features are supported by the implementation, as for
declarations not controlled by a feature test macro at all.

Thus, add a check of __STDC_WANT_IEC_60559_EXT__ for CR_DECIMAL_DIG in
float.h, the only case of this change relevant to GCC.
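As a minimal C sketch of how a program opts in under the new arrangement: define the feature test macro before including <float.h>. This assumes a compiler whose <float.h> includes this change. On older compilers the macro may simply be absent, and the sketch then reports 0. The value can be very large (GCC's header uses __UINTMAX_MAX__), so the sketch stores it as a uintmax_t rather than an int. The helper name cr_decimal_dig is hypothetical.

```c
/* Request the Annex F declarations before including the header.  */
#define __STDC_WANT_IEC_60559_EXT__ 1
#include <float.h>
#include <stdint.h>

/* Return CR_DECIMAL_DIG if this <float.h> provides it, else 0.  */
uintmax_t
cr_decimal_dig (void)
{
#ifdef CR_DECIMAL_DIG
  return (uintmax_t) CR_DECIMAL_DIG;
#else
  return 0;   /* macro not provided by this compiler's <float.h> */
#endif
}
```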

Bootstrapped with no regressions for x86_64-pc-linux-gnu.  OK to commit?

gcc/
2020-11-14  Joseph Myers  

* ginclude/float.h (CR_DECIMAL_DIG): Also define for
[__STDC_WANT_IEC_60559_EXT__].

gcc/testsuite/
2020-11-14  Joseph Myers  

* gcc.dg/cr-decimal-dig-3.c: New test.

diff --git a/gcc/ginclude/float.h b/gcc/ginclude/float.h
index 9c4b0385568..0442f26ec56 100644
--- a/gcc/ginclude/float.h
+++ b/gcc/ginclude/float.h
@@ -250,7 +250,8 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 
 #endif /* C2X */
 
-#ifdef __STDC_WANT_IEC_60559_BFP_EXT__
+#if (defined __STDC_WANT_IEC_60559_BFP_EXT__ \
+ || defined __STDC_WANT_IEC_60559_EXT__)
 /* Number of decimal digits for which conversions between decimal
character strings and binary formats, in both directions, are
correctly rounded.  */
diff --git a/gcc/testsuite/gcc.dg/cr-decimal-dig-3.c b/gcc/testsuite/gcc.dg/cr-decimal-dig-3.c
new file mode 100644
index 000..8e07b67dd52
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/cr-decimal-dig-3.c
@@ -0,0 +1,14 @@
+/* Test C2x CR_DECIMAL_DIG: defined for __STDC_WANT_IEC_60559_EXT__.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c2x" } */
+
+#define __STDC_WANT_IEC_60559_EXT__
+#include <float.h>
+
+#ifndef CR_DECIMAL_DIG
+#error "CR_DECIMAL_DIG not defined"
+#endif
+
+#if CR_DECIMAL_DIG < DECIMAL_DIG + 3
+#error "CR_DECIMAL_DIG too small"
+#endif

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH v2] c: Silently ignore pragma region [PR85487]

2020-11-13 Thread Jeff Law via Gcc-patches


On 11/13/20 7:57 AM, Austin Morton wrote:
> On the contrary, as a user of GCC I would much prefer a consistent
> behavior for #pragma region based purely on GCC version.

You can get consistent behavior with a command line argument and it's
much more useful over time as the world changes.


>
> IE, so you can tell people:
> "just update to GCC X.Y and those warnings will go away"
> rather than:
> "update to GCC X.Y and pass some new flags - but make sure
>  not to pass them to old GCC versions, since that will generate
>  a new warning"

I'm aware of those benefits.  But I would still claim that embedding
knowledge of other toolchains' pragmas into GCC itself is just plain
wrong from a design standpoint.  A flag to allow specifying pragmas to
ignore would be much more useful and gets you the same level of
consistency with a much higher degree of control and future proofing.

Being able to specify them in a file would be even better (IMHO)


I'm not going to ACK this patch.  However, I won't object if someone
else wants to ACK it.


Jeff



[Bug c/97817] -Wformat-truncation=2 elicits invalid warning

2020-11-13 Thread jim at meyering dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97817

--- Comment #4 from jim at meyering dot net ---
Thanks for explaining. It would be nice if the diagnostic were to say something
along the lines of "... writing into a region whose size may be as low as N".
Given the wording of the current diagnostic, I initially went looking for a
caller whose buffer really did have length 2 (in the original it was 2, not 6).

It's only when I finally noticed that initial "if" block in the implementation
that I understood where the "2" (6 in this example) was coming from.

Re: [PATCH] c++: Predefine __STDCPP_THREADS__ in the compiler if thread model is not single

2020-11-13 Thread Jonathan Wakely via Gcc-patches

On 13/11/20 22:46 +0100, Jakub Jelinek wrote:

On Fri, Nov 13, 2020 at 04:39:25PM -0500, Jason Merrill wrote:

On 11/13/20 2:20 PM, Tom Tromey wrote:
> > > > > > "Jakub" == Jakub Jelinek via Gcc-patches writes:
>
> Jakub> 2020-11-13  Jakub Jelinek  
>
> Jakub>  * c-cppbuiltin.c: Include configargs.h.
> Jakub>  (c_cpp_builtins): For C++11 and later if THREAD_MODEL_SPEC is not
> Jakub>  defined, predefine __STDCPP_THREADS__ to 1 unless thread_model is
> Jakub>  "single".
>
> Note this is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63287

Any opinions about the relative advantage of doing this in the compiler vs.
doing it in the library, as in Jonathan's patch in the PR?


If it is done in the library, it will be defined only if any of the library
headers are included.
The https://eel.is/c++draft/cpp.predefined wording doesn't look like it
would allow defining it only if certain headers are included
(unlike e.g. the __cpp_lib_* macros which have associated list of headers
that should define those).


Right, the standard says "The values of the predefined macros (except
for __FILE__ and __LINE__) remain constant throughout the translation
unit."

Libc++ defines it in the library, and I added a testcase showing why
that's non-conforming to https://bugs.llvm.org/show_bug.cgi?id=33230#c11



Revert accidentally comitted sanity check

2020-11-13 Thread Jan Hubicka
Hi,
I accidentally committed the following sanity check, which fails and
is discussed in
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/558538.html
The current ipa-icf code is conservatively correct because 
alias_ptr_types_compatible_p returns false on ref_all_alias_ptr_type_p.

Committed as obvious.

I apologize for the breakage,
Honza

* tree-ssa-alias.c (ao_ref_base_alias_ptr_type,
ao_ref_alias_ptr_type): Revert accidental commit.

diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
index 9f279c805e5..5ebbb087285 100644
--- a/gcc/tree-ssa-alias.c
+++ b/gcc/tree-ssa-alias.c
@@ -755,7 +755,6 @@ ao_ref_base_alias_ptr_type (ao_ref *ref)
   while (handled_component_p (base_ref))
 base_ref = TREE_OPERAND (base_ref, 0);
   tree ret = reference_alias_ptr_type (base_ref);
-  gcc_checking_assert (get_deref_alias_set (ret) == ao_ref_base_alias_set (ref));
   return ret;
 }
 
@@ -768,7 +767,6 @@ ao_ref_alias_ptr_type (ao_ref *ref)
   if (!ref->ref)
 return NULL_TREE;
   tree ret = reference_alias_ptr_type (ref->ref);
-  gcc_checking_assert (get_deref_alias_set (ret) == ao_ref_alias_set (ref));
   return ret;
 }
 


c: C2x binary constants

2020-11-13 Thread Joseph Myers
C2x adds binary integer constants (approved at the last WG14 meeting,
though not yet added to the working draft in git).  Configure libcpp
to consider these a standard feature in C2x mode, with appropriate
updates to diagnostics including support for diagnosing them with
-std=c2x -Wc11-c2x-compat.

Bootstrapped with no regressions for x86_64-pc-linux-gnu.  Applied to 
mainline.
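For readers unfamiliar with the syntax, a minimal C sketch of binary constants in both expressions and #if. It assumes GCC, which accepts them as an extension in earlier modes and, with this change, as a standard feature under -std=c2x. Per the new tests, -std=c11 -pedantic still diagnoses them. The function names are hypothetical.

```c
/* Binary constants are usable in #if as well as in ordinary code.  */
#if 0b101 != 5
#error "unexpected value for 0b101"
#endif

/* Low four bits set: 0b1111 == 15.  */
unsigned int
low_nibble_mask (void)
{
  return 0b1111u;
}

/* The uppercase 0B prefix is accepted too: 0B1101 == 13.  */
int
flags_value (void)
{
  return 0B1101;
}
```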

gcc/testsuite/
2020-11-13  Joseph Myers  

* gcc.dg/binary-constants-2.c, gcc.dg/binary-constants-3.c,
gcc.dg/system-binary-constants-1.c: Update expected diagnostics.
* gcc.dg/c11-binary-constants-1.c,
gcc.dg/c11-binary-constants-2.c, gcc.dg/c2x-binary-constants-1.c,
gcc.dg/c2x-binary-constants-2.c, gcc.dg/c2x-binary-constants-3.c:
New tests.

libcpp/
2020-11-13  Joseph Myers  

* expr.c (cpp_classify_number): Update diagnostic for binary
constants for C.  Also diagnose binary constants for
-Wc11-c2x-compat.
* init.c (lang_defaults): Enable binary constants for GNUC2X and
STDC2X.

diff --git a/gcc/testsuite/gcc.dg/binary-constants-2.c b/gcc/testsuite/gcc.dg/binary-constants-2.c
index 6c3928aa2a0..5339d57b991 100644
--- a/gcc/testsuite/gcc.dg/binary-constants-2.c
+++ b/gcc/testsuite/gcc.dg/binary-constants-2.c
@@ -9,8 +9,8 @@
 int
 foo (void)
 {
-#if FOO /* { dg-warning "binary constants are a GCC extension" } */
+#if FOO /* { dg-warning "binary constants are a C2X feature or GCC extension" } */
   return 23;
 #endif
-  return 0b1101; /* { dg-warning "binary constants are a GCC extension" } */
+  return 0b1101; /* { dg-warning "binary constants are a C2X feature or GCC extension" } */
 }
diff --git a/gcc/testsuite/gcc.dg/binary-constants-3.c b/gcc/testsuite/gcc.dg/binary-constants-3.c
index 410fc4cd725..5b49cb4efbb 100644
--- a/gcc/testsuite/gcc.dg/binary-constants-3.c
+++ b/gcc/testsuite/gcc.dg/binary-constants-3.c
@@ -9,8 +9,8 @@
 int
 foo (void)
 {
-#if FOO /* { dg-error "binary constants are a GCC extension" } */
+#if FOO /* { dg-error "binary constants are a C2X feature or GCC extension" } */
   return 23;
 #endif
-  return 0b1101; /* { dg-error "binary constants are a GCC extension" } */
+  return 0b1101; /* { dg-error "binary constants are a C2X feature or GCC extension" } */
 }
diff --git a/gcc/testsuite/gcc.dg/c11-binary-constants-1.c b/gcc/testsuite/gcc.dg/c11-binary-constants-1.c
new file mode 100644
index 000..fdc7df4bfad
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c11-binary-constants-1.c
@@ -0,0 +1,11 @@
+/* Test that binary constants are diagnosed in C11 mode: -pedantic.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c11 -pedantic" } */
+
+int a = 0b1; /* { dg-warning "binary constants" } */
+#if 0b101 /* { dg-warning "binary constants" } */
+#endif
+
+int b = 0B1; /* { dg-warning "binary constants" } */
+#if 0B101 /* { dg-warning "binary constants" } */
+#endif
diff --git a/gcc/testsuite/gcc.dg/c11-binary-constants-2.c b/gcc/testsuite/gcc.dg/c11-binary-constants-2.c
new file mode 100644
index 000..6b48a5d005b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c11-binary-constants-2.c
@@ -0,0 +1,11 @@
+/* Test that binary constants are diagnosed in C11 mode: -pedantic-errors.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c11 -pedantic-errors" } */
+
+int a = 0b1; /* { dg-error "binary constants" } */
+#if 0b101 /* { dg-error "binary constants" } */
+#endif
+
+int b = 0B1; /* { dg-error "binary constants" } */
+#if 0B101 /* { dg-error "binary constants" } */
+#endif
diff --git a/gcc/testsuite/gcc.dg/c2x-binary-constants-1.c b/gcc/testsuite/gcc.dg/c2x-binary-constants-1.c
new file mode 100644
index 000..bbb2bc842c9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c2x-binary-constants-1.c
@@ -0,0 +1,5 @@
+/* Test C2x binary constants.  Valid syntax and types.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c2x -pedantic-errors" } */
+
+#include "binary-constants-1.c"
diff --git a/gcc/testsuite/gcc.dg/c2x-binary-constants-2.c b/gcc/testsuite/gcc.dg/c2x-binary-constants-2.c
new file mode 100644
index 000..4379427d6ce
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c2x-binary-constants-2.c
@@ -0,0 +1,11 @@
+/* Test that binary constants are accepted in C2X mode: compat warnings.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c2x -Wc11-c2x-compat" } */
+
+int a = 0b1; /* { dg-warning "C2X feature" } */
+#if 0b101 /* { dg-warning "C2X feature" } */
+#endif
+
+int b = 0B1; /* { dg-warning "C2X feature" } */
+#if 0B101 /* { dg-warning "C2X feature" } */
+#endif
diff --git a/gcc/testsuite/gcc.dg/c2x-binary-constants-3.c b/gcc/testsuite/gcc.dg/c2x-binary-constants-3.c
new file mode 100644
index 000..7604791fa85
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c2x-binary-constants-3.c
@@ -0,0 +1,9 @@
+/* Test C2x binary constants.  Invalid constants.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c2x -pedantic-errors" } */
+
+int a = 0b; /* { dg-error "invalid suffix" } */
+int b = 0B2; /* { dg-error 

Re: [committed] libstdc++: Optimise std::future::wait_for and fix futex polling

2020-11-13 Thread Jonathan Wakely via Gcc-patches

On 13/11/20 21:12 +, Jonathan Wakely wrote:

On 13/11/20 20:29 +, Mike Crowe via Libstdc++ wrote:

On Friday 13 November 2020 at 17:25:22 +, Jonathan Wakely wrote:

+  // Return the relative duration from (now_s + now_ns) to (abs_s + abs_ns)
+  // as a timespec.
+  struct timespec
+  relative_timespec(chrono::seconds abs_s, chrono::nanoseconds abs_ns,
+   time_t now_s, long now_ns)
+  {
+    struct timespec rt;
+
+    // Did we already time out?
+    if (now_s > abs_s.count())
+      {
+        rt.tv_sec = -1;
+        return rt;
+      }
+
+    auto rel_s = abs_s.count() - now_s;
+
+    // Avoid overflows
+    if (rel_s > __gnu_cxx::__int_traits<time_t>::__max)
+      rel_s = __gnu_cxx::__int_traits<time_t>::__max;
+    else if (rel_s < __gnu_cxx::__int_traits<time_t>::__min)
+      rel_s = __gnu_cxx::__int_traits<time_t>::__min;


I may be missing something, but if the line above executes...


+
+    // Convert the absolute timeout value to a relative timeout
+    rt.tv_sec = rel_s;
+    rt.tv_nsec = abs_ns.count() - now_ns;
+    if (rt.tv_nsec < 0)
+      {
+        rt.tv_nsec += 1000000000;
+        --rt.tv_sec;


...and so does this line above, then I think that we'll end up
underflowing. (Presumably rt.tv_sec will wrap round to being some time in
2038 on most 32-bit targets.)


Ugh.


I'm currently trying to persuade myself that this can actually happen and
if so work out how to come up with a test case for it.


Maybe something like:

auto d = chrono::floor<seconds>(system_clock::now().time_since_epoch()
                                - seconds(INT_MAX + 2LL));
fut.wait_until(system_clock::time_point(d));

This will create a sys_time with a value that is slightly more than
INT_MAX seconds before the current time, with a zero nanoseconds


Ah, but such a time will never reach the overflow because the first
thing that the new relative_timespec function does is:

 if (now_s > abs_s.count())
   {
 rt.tv_sec = -1;
 return rt;
   }

So in fact we can never have a negative rel_s anyway.
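The argument above can be checked with a small C transcription of the same conversion. It shows that once the early "already timed out" exit is in place, rel_s starts non-negative, so the nanosecond borrow can take it to -1 at most and only the upper bound ever needs clamping. This is an illustrative sketch, not the libstdc++ function. It assumes both times are non-negative, nanosecond fields lie in [0, 1e9), and time_t is a signed 32- or 64-bit integer. relative_timeout is a hypothetical name.

```c
#include <stdint.h>
#include <time.h>

/* Convert an absolute deadline (abs_s, abs_ns) into a timeout relative to
   "now" (now_s, now_ns).  Returns -1 if the deadline has already passed,
   otherwise 0 with *rel filled in.  */
int
relative_timeout (int64_t abs_s, long abs_ns, int64_t now_s, long now_ns,
                  struct timespec *rel)
{
  /* Early exit, as in the patch: from here on abs_s >= now_s.  */
  if (now_s > abs_s)
    return -1;

  int64_t rel_s = abs_s - now_s;   /* never negative here */
  long rel_ns = abs_ns - now_ns;
  if (rel_ns < 0)
    {
      /* Borrow one second; rel_s can drop to -1 at most.  */
      rel_ns += 1000000000L;
      --rel_s;
    }
  if (rel_s < 0)   /* same second, but now_ns > abs_ns */
    return -1;

  /* Only the upper bound can be exceeded, since rel_s >= 0.  */
  if (sizeof (time_t) < sizeof (int64_t) && rel_s > INT32_MAX)
    rel_s = INT32_MAX;

  rel->tv_sec = (time_t) rel_s;
  rel->tv_nsec = rel_ns;
  return 0;
}
```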



gcc-9-20201113 is now available

2020-11-13 Thread GCC Administrator via Gcc
Snapshot gcc-9-20201113 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/9-20201113/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 9 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch releases/gcc-9 
revision 62c2d527307d8adce31f5c9ca6e19e15b2866b83

You'll find:

 gcc-9-20201113.tar.xz    Complete GCC

  SHA256=84376ef0ee749441eaf7821c7cd0b6622b3594114ab8028bc57ff96fa721b9ad
  SHA1=63048c06b7d2c9f691faf400abb665848903fc5f

Diffs from 9-20201106 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-9
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


[Bug c/97817] -Wformat-truncation=2 elicits invalid warning

2020-11-13 Thread msebor at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97817

Martin Sebor  changed:

   What|Removed |Added

 CC||msebor at gcc dot gnu.org
 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #3 from Martin Sebor  ---
Level 2 of the warning is documented to warn also about calls to bounded
functions whose return value is used and that might result in truncation given
an argument of sufficient length or magnitude.  The level is meant to help
write code with the least likelihood of truncation given unknown arguments.

In the test case, the output of the function will be truncated unless buflen is
at least 16.  It will also be truncated if buflen is 16 and errnum is either
negative or bigger than 9.  The note printed after the warning indicates the
minimum size of output (i.e., 16) and the maximum (26) beyond which truncation
is impossible.
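The pattern that level 2 is aimed at is code that uses the return value of a bounded function such as snprintf. A minimal C sketch of checking that value for truncation follows. snprintf returns the length the complete output would have had, so a result greater than or equal to the buffer size means the output was cut short. The format string and the helper format_errnum are hypothetical and are not taken from the test case in this bug.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Format a message for ERRNUM into BUF; return true only if the
   complete output (including the terminating NUL) fit in BUFLEN.  */
bool
format_errnum (char *buf, size_t buflen, int errnum)
{
  int n = snprintf (buf, buflen, "unknown error %d", errnum);
  return n >= 0 && (size_t) n < buflen;
}
```

With this format, the output is 15 characters for a one-digit errnum but up to 25 for a negative ten-digit one. Sizing a buffer for the worst case avoids truncation regardless of the argument's magnitude.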

In 'gcc/omp-oacc-kernels-decompose.cc', use langhook instead of accessing language-specific decl information (was: [PATCH 04/10, OpenACC] Turn OpenACC kernels regions into a sequence of, parallel regi

2020-11-13 Thread Thomas Schwinge
Hi!

On 2019-08-05T22:51:22+0100, Kwok Cheung Yeung  wrote:
> On 18/07/2019 10:30 am, Jakub Jelinek wrote:
>> On Wed, Jul 17, 2019 at 10:06:07PM +0100, Kwok Cheung Yeung wrote:
>>> --- a/gcc/omp-oacc-kernels.c
>>> +++ b/gcc/omp-oacc-kernels.c
>>> @@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.  If not see
>>>   #include "backend.h"
>>>   #include "target.h"
>>>   #include "tree.h"
>>> +#include "cp/cp-tree.h"
>>
>> No, you certainly don't want to do this.  Use langhooks if needed

ACK.

>> though
>> that can be only for stuff done before IPA.  After IPA, because of LTO FE,
>> you must not rely on anything that is not in the IL generically.

ACK, and this is very early:

   NEXT_PASS (pass_diagnose_omp_blocks);
   NEXT_PASS (pass_diagnose_tm_blocks);
+  NEXT_PASS (pass_omp_oacc_kernels_decompose);
   NEXT_PASS (pass_lower_omp);

> I have modified the patch to use the get_generic_function_decl langhook
> to determine whether current_function_decl is an instantiation of a
> template (in this case, we don't care what the generic decl is - just
> whether the function decl has one).

To me, it's not obvious that the original:

(DECL_LANG_SPECIFIC (current_function_decl)
 && DECL_TEMPLATE_INSTANTIATION (current_function_decl)))

... may be replaced with:

(lang_hooks.decls.get_generic_function_decl (current_function_decl)
 != NULL)

..., so thanks, Kwok, that you've figured that out.  :-)
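The design point behind the langhook is that the middle end only calls through a table of function pointers that each front end fills in, so it never needs to include front-end headers such as cp/cp-tree.h. The following is a much-simplified, hypothetical C model of that pattern. decl_hooks, no_generic_decl and generic_instantiation_p are illustrative names, and the real lang_hooks structure in GCC is far larger.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified hook table: the middle end sees only this.  */
struct decl_hooks
{
  /* Return the generic (template) decl for FN, or NULL if there is none.  */
  const void *(*get_generic_function_decl) (const void *fn);
};

/* Default for a front end without templates: never a template instance.  */
static const void *
no_generic_decl (const void *fn)
{
  (void) fn;
  return NULL;
}

/* A C++-like front end would install its own function here.  */
struct decl_hooks lang_decl_hooks = { no_generic_decl };

/* Middle-end query: only whether a generic decl exists matters.  */
bool
generic_instantiation_p (const void *fn)
{
  return lang_decl_hooks.get_generic_function_decl (fn) != NULL;
}
```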

I've just pushed to master branch commit
ccd56db89806a5f6eb3be99fc3b4fe364cf35e98 "In
'gcc/omp-oacc-kernels-decompose.cc', use langhook instead of accessing
language-specific decl information", see attached.


Grüße
 Thomas


-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
From ccd56db89806a5f6eb3be99fc3b4fe364cf35e98 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Mon, 5 Aug 2019 22:51:22 +0100
Subject: [PATCH] In 'gcc/omp-oacc-kernels-decompose.cc', use langhook instead
 of accessing language-specific decl information

	gcc/
	* omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region):
	Use langhook instead of accessing language-specific decl
	information.
---
 gcc/omp-oacc-kernels-decompose.cc | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/gcc/omp-oacc-kernels-decompose.cc b/gcc/omp-oacc-kernels-decompose.cc
index c585e5d092b..baad1b9a348 100644
--- a/gcc/omp-oacc-kernels-decompose.cc
+++ b/gcc/omp-oacc-kernels-decompose.cc
@@ -25,7 +25,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "backend.h"
 #include "target.h"
 #include "tree.h"
-#include "cp/cp-tree.h"
+#include "langhooks.h"
 #include "gimple.h"
 #include "tree-pass.h"
 #include "cgraph.h"
@@ -792,6 +792,12 @@ static gimple *
 maybe_build_inner_data_region (location_t loc, gimple *body,
 			   tree inner_bind_vars, gimple *inner_cleanup)
 {
+  /* Is this an instantiation of a template?  (In this case, we don't care what
+ the generic decl is - just whether the function decl has one.)  */
+  bool generic_inst_p
+= (lang_hooks.decls.get_generic_function_decl (current_function_decl)
+   != NULL);
+
   /* Build data 'create (var)' clauses for these local variables.
  Below we will add these to a data region enclosing the entire body
  of the decomposed kernels region.  */
@@ -802,8 +808,7 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
   next = TREE_CHAIN (v);
   if (DECL_ARTIFICIAL (v)
 	  || TREE_CODE (v) == CONST_DECL
-	  || (DECL_LANG_SPECIFIC (current_function_decl)
-	  && DECL_TEMPLATE_INSTANTIATION (current_function_decl)))
+	  || generic_inst_p)
 	{
 	  /* If this is an artificial temporary, it need not be mapped.  We
 	 move its declaration into the bind inside the data region.
-- 
2.17.1



Re: Ping: [PATCH] Ensure colorization doesn't corrupt multibyte sequences in diagnostics

2020-11-13 Thread Jeff Law via Gcc-patches


On 1/14/20 5:05 PM, Lewis Hyatt wrote:
> Hello-
>
> I thought I might ping this short patch please, just in case it may
> make sense to include in GCC 10 along with the other UTF-8-related
> fixes to diagnostics. Thanks!
>
> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00915.html

This is fine for the trunk.  Note that the changes to handle
tabs/control bytes will require this patch to be updated.  It may be as
simple as moving the c = dw.next_byte() statement up.


Go ahead and do the necessary update and retest & repost the patch for
archival purposes.  If you have commit privs, go ahead and commit the
updated patch, else indicate in the patch repost that someone needs to
apply it for you.


Thanks for your patience,

Jeff


>> #1. diagnostic_show_locus() should be sure it will not corrupt output in
>> this way, regardless of what ranges it is given to work with.

Yes.


>>
>> #2. libcpp should probably generate a range that includes the whole UTF-8
>> character. Actually in other ways the range seems not ideal, for example
>> if an invalid character appears in the middle of the identifier, the
>> diagnostic still points to the first byte of the identifier.

Probably.  We haven't traditionally worried a lot about multibyte
sequences, so I'm not surprised we're not handling them particularly well.
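The invariant in point #1 can be stated compactly: a color escape must never be inserted at a byte that is a UTF-8 continuation byte (bit pattern 10xxxxxx). A minimal C sketch of snapping a byte index back to the start of its character follows. It illustrates the invariant only and is not the code in the patch. utf8_char_start is a hypothetical name.

```c
#include <stddef.h>

/* Move byte index I in S back to the first byte of the UTF-8 sequence
   that contains it, so text inserted there cannot split a character.
   Continuation bytes satisfy (b & 0xC0) == 0x80.  */
size_t
utf8_char_start (const unsigned char *s, size_t i)
{
  while (i > 0 && (s[i] & 0xC0) == 0x80)
    --i;
  return i;
}
```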


>>
>> The attached patch fixes #1. It's essentially a one-line change, plus a
>> new selftest. Would you please have a look at it sometime? bootstrap
>> and testsuite were done on linux x86-64.
>>
>> Other questions that I have:
>>
>> - I am not quite clear when a selftest is preferred vs a dejagnu test. In
>>   this case I stuck with the selftest because color diagnostics don't seem
>>   to work well with dg-error etc, and it didn't seem worth creating a new
>>   plugin-based test like g++.dg/plugin just for this. (I also considered
>>   using the existing g++.dg plugin, but it seems this test should run for
>>   gcc as well.)

It varies, and there are cases that are fine either way.  I suspect there
are many tests in the dejagnu suite that would be better as selftests --
selftests are a fairly new concept.


The guidance I would give is the more a particular test is tied to the
internals of the code, the more likely a selftest is the right
approach.  The more the test needs an end-to-end run through passes of
the compiler, the more it belongs in the dejagnu suite.



>>
>> - I wasn't sure if I should create a PR for an issue such as this, if
>>   there is already a patch readily available. And if I did create a PR,
>>   not sure if it's preferred to post the patch to gcc-patches, or as an
>>   attachment to the PR.

We still prefer patches to go to gcc-patches -- I personally don't troll
BZ looking for attached patches.


>>
>> - Does it seem worth me looking into #2? I think the patch to address #1 is
>>   appropriate in any case, because it handles generically all potential
>>   cases where this may arise, but still perhaps the ranges coming out of
>>   libcpp could be improved?

I don't think it can hurt to look into the difficulty in addressing #2.


jeff



[Bug c++/63287] __STDCPP_THREADS__ is not defined

2020-11-13 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63287

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:1d9a8675d379f02f5e39639f469ae8dfcf33fea9

commit r11-5017-g1d9a8675d379f02f5e39639f469ae8dfcf33fea9
Author: Jakub Jelinek 
Date:   Fri Nov 13 23:23:33 2020 +0100

c++: Predefine __STDCPP_THREADS__ in the compiler if thread model is not single [PR63287]

The following patch predefines __STDCPP_THREADS__ macro to 1 if c++11 or
later and thread model (e.g. printed by gcc -v) is not single.
There are two targets not handled by this patch, those that define
THREAD_MODEL_SPEC.  In one case - QNX - it looks just like a mistake
to me, instead of setting thread_model=posix in config.gcc it uses
THREAD_MODEL_SPEC macro to set it unconditionally to posix.
The other is hpux10, which uses -threads option to decide if threads
are enabled or not, but that option isn't really passed to the compiler.
I think that is something that really should be solved in config/pa/
instead, e.g. in the config/xxx/xxx-c.c targets usually set their own
predefined macros and it could handle this, and either pass the option
also to the compiler, or say predefine __STDCPP_THREADS__ if _DCE_THREADS
macro is defined already (or -D_DCE_THREADS found on the command line),
or whatever else.

2020-11-13  Jakub Jelinek  

PR c++/63287
* c-cppbuiltin.c: Include configargs.h.
(c_cpp_builtins): For C++11 and later if THREAD_MODEL_SPEC is not
defined, predefine __STDCPP_THREADS__ to 1 unless thread_model is
"single".

[PATCH] IBM Z: Do not run long double tests on old machines

2020-11-13 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on z13 s390x-redhat-linux.  Ok for master?

gcc/testsuite/ChangeLog:

2020-11-12  Ilya Leoshkevich  

* gcc.target/s390/s390.exp (check_effective_target_s390_z14_hw):
New predicate.
* gcc.target/s390/vector/long-double-caller-abi-run.c: Use the
new predicate.
* gcc.target/s390/vector/long-double-copysign.c: Likewise.
* gcc.target/s390/vector/long-double-from-double.c: Likewise.
* gcc.target/s390/vector/long-double-from-float.c: Likewise.
* gcc.target/s390/vector/long-double-from-i16.c: Likewise.
* gcc.target/s390/vector/long-double-from-i32.c: Likewise.
* gcc.target/s390/vector/long-double-from-i64.c: Likewise.
* gcc.target/s390/vector/long-double-from-i8.c: Likewise.
* gcc.target/s390/vector/long-double-from-u16.c: Likewise.
* gcc.target/s390/vector/long-double-from-u32.c: Likewise.
* gcc.target/s390/vector/long-double-from-u64.c: Likewise.
* gcc.target/s390/vector/long-double-from-u8.c: Likewise.
* gcc.target/s390/vector/long-double-to-double.c: Likewise.
* gcc.target/s390/vector/long-double-to-float.c: Likewise.
* gcc.target/s390/vector/long-double-to-i16.c: Likewise.
* gcc.target/s390/vector/long-double-to-i32.c: Likewise.
* gcc.target/s390/vector/long-double-to-i64.c: Likewise.
* gcc.target/s390/vector/long-double-to-i8.c: Likewise.
* gcc.target/s390/vector/long-double-to-u16.c: Likewise.
* gcc.target/s390/vector/long-double-to-u32.c: Likewise.
* gcc.target/s390/vector/long-double-to-u64.c: Likewise.
* gcc.target/s390/vector/long-double-to-u8.c: Likewise.
* gcc.target/s390/vector/long-double-wfaxb.c: Likewise.
* gcc.target/s390/vector/long-double-wfdxb.c: Likewise.
* gcc.target/s390/vector/long-double-wfsxb-1.c: Likewise.
---
 gcc/testsuite/gcc.target/s390/s390.exp | 10 ++
 .../s390/vector/long-double-caller-abi-run.c   |  3 ++-
 .../gcc.target/s390/vector/long-double-copysign.c  |  3 ++-
 .../gcc.target/s390/vector/long-double-from-double.c   |  3 ++-
 .../gcc.target/s390/vector/long-double-from-float.c|  3 ++-
 .../gcc.target/s390/vector/long-double-from-i16.c  |  3 ++-
 .../gcc.target/s390/vector/long-double-from-i32.c  |  3 ++-
 .../gcc.target/s390/vector/long-double-from-i64.c  |  3 ++-
 .../gcc.target/s390/vector/long-double-from-i8.c   |  3 ++-
 .../gcc.target/s390/vector/long-double-from-u16.c  |  3 ++-
 .../gcc.target/s390/vector/long-double-from-u32.c  |  3 ++-
 .../gcc.target/s390/vector/long-double-from-u64.c  |  3 ++-
 .../gcc.target/s390/vector/long-double-from-u8.c   |  3 ++-
 .../gcc.target/s390/vector/long-double-to-double.c |  3 ++-
 .../gcc.target/s390/vector/long-double-to-float.c  |  3 ++-
 .../gcc.target/s390/vector/long-double-to-i16.c|  3 ++-
 .../gcc.target/s390/vector/long-double-to-i32.c|  3 ++-
 .../gcc.target/s390/vector/long-double-to-i64.c|  3 ++-
 .../gcc.target/s390/vector/long-double-to-i8.c |  3 ++-
 .../gcc.target/s390/vector/long-double-to-u16.c|  3 ++-
 .../gcc.target/s390/vector/long-double-to-u32.c|  3 ++-
 .../gcc.target/s390/vector/long-double-to-u64.c|  3 ++-
 .../gcc.target/s390/vector/long-double-to-u8.c |  3 ++-
 .../gcc.target/s390/vector/long-double-wfaxb.c |  3 ++-
 .../gcc.target/s390/vector/long-double-wfdxb.c |  3 ++-
 .../gcc.target/s390/vector/long-double-wfsxb-1.c   |  3 ++-
 26 files changed, 60 insertions(+), 25 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/s390.exp b/gcc/testsuite/gcc.target/s390/s390.exp
index 387a720b8e3..00e0555d55c 100644
--- a/gcc/testsuite/gcc.target/s390/s390.exp
+++ b/gcc/testsuite/gcc.target/s390/s390.exp
@@ -192,6 +192,16 @@ proc check_effective_target_s390_z13_hw { } {
}
 }] "-march=z13 -m64 -mzarch" ] } { return 0 } else { return 1 }
 }
+proc check_effective_target_s390_z14_hw { } {
+if { ![check_runtime s390_check_s390_z14_hw [subst {
+   int main (void)
+   {
+   int x = 0;
+   asm ("msgrkc %%0,%%0,%%0" : "+r" (x) : );
+   return x;
+   }
+}] "-march=z14 -m64 -mzarch" ] } { return 0 } else { return 1 }
+}
 
 # If a testcase doesn't have special options, use these.
 global DEFAULT_CFLAGS
diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-caller-abi-run.c b/gcc/testsuite/gcc.target/s390/vector/long-double-caller-abi-run.c
index f3a41bacc2f..f7315f6c2e9 100644
--- a/gcc/testsuite/gcc.target/s390/vector/long-double-caller-abi-run.c
+++ b/gcc/testsuite/gcc.target/s390/vector/long-double-caller-abi-run.c
@@ -1,4 +1,5 @@
-/* { dg-do run } */
+/* { dg-do compile } */
 /* { dg-options "-O3 -march=z14 -mzarch" } */
+/* { dg-do run { target { s390_z14_hw } } } */
 #include "long-double-callee-abi-scan.c"
 #include 

Re: [PATCH v5 2/8] libstdc++ futex: Use FUTEX_CLOCK_REALTIME for wait

2020-11-13 Thread Jonathan Wakely via Gcc-patches

On 13/11/20 21:58 +, Mike Crowe via Libstdc++ wrote:

On Thursday 12 November 2020 at 23:07:47 +, Jonathan Wakely wrote:

On 29/05/20 07:17 +0100, Mike Crowe via Libstdc++ wrote:
> The futex system call supports waiting for an absolute time if
> FUTEX_WAIT_BITSET is used rather than FUTEX_WAIT.  Doing so provides two
> benefits:
>
> 1. The call to gettimeofday is not required in order to calculate a
>   relative timeout.
>
> 2. If someone changes the system clock during the wait then the futex
>   timeout will correctly expire earlier or later.  Currently that only
>   happens if the clock is changed prior to the call to gettimeofday.
>
> According to futex(2), support for FUTEX_CLOCK_REALTIME was added in the
> v2.6.28 Linux kernel and FUTEX_WAIT_BITSET was added in v2.6.25.  To ensure
> that the code still works correctly with earlier kernel versions, an ENOSYS
> error from futex[1] results in the futex_clock_realtime_unavailable flag
> being set.  This flag is used to avoid the unnecessary unsupported futex
> call in the future and to fall back to the previous gettimeofday and
> relative time implementation.
>
> glibc applied an equivalent switch in pthread_cond_timedwait to use
> FUTEX_CLOCK_REALTIME and FUTEX_WAIT_BITSET rather than FUTEX_WAIT for
> glibc-2.10 back in 2009.  See
> glibc:cbd8aeb836c8061c23a5e00419e0fb25a34abee7
>
> The futex_clock_realtime_unavailable flag is accessed using
> std::memory_order_relaxed to stop it becoming a bottleneck.  If the first
> two calls to _M_futex_wait_until happen simultaneously then the
> only consequence is that both will try to use FUTEX_CLOCK_REALTIME, both
> risk discovering that it doesn't work and, if so, both set the flag.
>
> [1] This is how glibc's nptl-init.c determines whether these flags are
>supported.
>
>* libstdc++-v3/src/c++11/futex.cc: Add new constants for required
>futex flags.  Add futex_clock_realtime_unavailable flag to store
>result of trying to use
>FUTEX_CLOCK_REALTIME. (__atomic_futex_unsigned_base::_M_futex_wait_until):
>Try to use FUTEX_WAIT_BITSET with FUTEX_CLOCK_REALTIME and only
>fall back to using gettimeofday and FUTEX_WAIT if that's not
>supported.
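The approach described in this patch can be sketched in C. First try FUTEX_WAIT_BITSET with FUTEX_CLOCK_REALTIME and an absolute deadline. If the kernel answers ENOSYS, record that in a relaxed atomic flag and fall back to computing a relative timeout for plain FUTEX_WAIT. This is a hedged illustration, not the libstdc++ code. It assumes a 64-bit Linux target where struct timespec matches the kernel's layout, and futex_wait_until_realtime is a hypothetical name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Set once the kernel reports ENOSYS for the absolute-time variant.  */
static atomic_bool clock_realtime_unavailable;

/* Wait while *ADDR == EXPECTED, until the absolute CLOCK_REALTIME
   DEADLINE.  Returns 0 if woken, or -1 with errno set (ETIMEDOUT,
   EAGAIN, EINTR, ...).  */
int
futex_wait_until_realtime (uint32_t *addr, uint32_t expected,
                           const struct timespec *deadline)
{
  if (!atomic_load_explicit (&clock_realtime_unavailable,
                             memory_order_relaxed))
    {
      long r = syscall (SYS_futex, addr,
                        FUTEX_WAIT_BITSET | FUTEX_CLOCK_REALTIME,
                        expected, deadline, NULL, FUTEX_BITSET_MATCH_ANY);
      if (r == 0)
        return 0;
      if (errno != ENOSYS)
        return -1;
      /* Old kernel: remember that, so we do not retry every time.  */
      atomic_store_explicit (&clock_realtime_unavailable, true,
                             memory_order_relaxed);
    }

  /* Fallback: derive a relative timeout and use plain FUTEX_WAIT.  */
  struct timespec now, rel;
  clock_gettime (CLOCK_REALTIME, &now);
  rel.tv_sec = deadline->tv_sec - now.tv_sec;
  rel.tv_nsec = deadline->tv_nsec - now.tv_nsec;
  if (rel.tv_nsec < 0)
    {
      rel.tv_nsec += 1000000000L;
      --rel.tv_sec;
    }
  if (rel.tv_sec < 0)
    {
      errno = ETIMEDOUT;
      return -1;
    }
  return syscall (SYS_futex, addr, FUTEX_WAIT, expected, &rel) == 0 ? 0 : -1;
}
```

Relaxed ordering is enough for the flag because, as the message says, a race only means two callers both try the absolute variant once.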

Mike,

I've been doing some performance comparisons and this patch seems to
make quite a big difference to code that polls a future by calling
fut.wait_until(t) using any t < now() as the timeout. For example,
fut.wait_until(chrono::system_clock::time_point{}) to wait until the
UNIX epoch.

With GCC 10 (or with the if (!futex_clock_realtime_unavailable.load(...)
commented out) I see that polling take < 100ns. With the change, it
takes 3000ns or more.

Now this is still far better than polling using fut.wait_for(0s) which
takes around 5ns due to the clock_gettime call, but I'm about to
fix that.

I'm not sure how important it is for wait_until(past) to be fast, but
the difference from 100ns to 3000ns seems significant. Do you see the
same kind of numbers? Is this just a property of the futex wait with
an absolute time?

N.B. using wait_until(system_clock::time_point::min()) or any other
time before the epoch doesn't work. The futex syscall returns EINVAL
which we don't check for. I'm about to fix that too.
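For reference, the polling pattern being measured looks roughly like this. It is a minimal sketch: poll_ready is a hypothetical helper name, and the code does not reproduce the timings above, which depend on the kernel and libstdc++ version. It uses the system_clock epoch as the deadline rather than a pre-epoch value such as time_point::min(), which hits the EINVAL problem just described.

```cpp
#include <cassert>
#include <chrono>
#include <future>

// Non-blocking readiness check: a deadline in the past (here the
// system_clock epoch) makes wait_until return without waiting.
bool poll_ready(const std::future<int> &fut)
{
  const std::chrono::system_clock::time_point epoch{};
  return fut.wait_until(epoch) == std::future_status::ready;
}
```

Because the deadline has always passed, the call returns future_status::timeout until the shared state is ready, and future_status::ready afterwards.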


I see similar behaviour. I suppose this is because the
gettimeofday/clock_gettime system calls are in the VDSO and therefore
usually much cheaper to call than the real system call SYS_futex.

If rather than bailing out early when the relative timeout is negative, I
call the relative SYS_futex with rt.tv_sec = rt.tv_nsec = 0 then the
wait_until call takes about ten times longer than when using the absolute
SYS_futex. I can't really explain that.

Calling these functions with a time in the past is probably quite common if
you calculate a single timeout for several operations in sequence. What's
less clear is whether the performance matters that much when the return
value indicates a timeout anyway.

If gettimeofday/clock_gettime are cheap enough then I suppose we can call
them even in the absolute timeout case (losing benefit 1 above, which
appears not to really exist) to get the improved performance for timeouts
in the past, whilst retaining the correct behaviour under clock warps
that this patch addressed (benefit 2 above).
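The suggestion above reads the clock first and reports a deadline that has already passed without making a futex call. It could look roughly like this. This is a sketch only: realtime_deadline_passed is a hypothetical helper. A real implementation would still issue the absolute FUTEX_WAIT_BITSET | FUTEX_CLOCK_REALTIME wait when the deadline lies in the future, so that clock warps are still handled by the kernel.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical fast path: read CLOCK_REALTIME (usually cheap, since it is
// typically served by the vDSO) and report whether an absolute deadline has
// already passed, so the caller can return a timeout without a system call.
bool realtime_deadline_passed(const timespec &abs)
{
  timespec now;
  clock_gettime(CLOCK_REALTIME, &now);
  return now.tv_sec > abs.tv_sec
         || (now.tv_sec == abs.tv_sec && now.tv_nsec >= abs.tv_nsec);
}
```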

I'll try to come up with some standalone test cases with results for
further discussion. I suspect that the glibc people will be interested too.


Thanks, that would be great. I have about twenty things on my plate
already.



Decompose OpenACC 'kernels' constructs into parts, a sequence of compute constructs (was: [og8] OpenACC 'kernels' construct changes: splitting of the construct into several regions)

2020-11-13 Thread Thomas Schwinge
Hi!

On 2019-02-01T00:59:30+0100, I wrote:
> I've just pushed the attached nine patches to openacc-gcc-8-branch:
> OpenACC 'kernels' construct changes: splitting of the construct into
> several regions.

Now, slightly more polished, I've pushed to master branch a variant of
most of these patches combined in commit
e898ce7997733c29dcab9c3c62ca102c7f9fa6eb "Decompose OpenACC 'kernels'
constructs into parts, a sequence of compute constructs", see attached.

> There's more work to be done there, and we're aware of a number of TODO
> items, but nevertheless: it's a good first step.

That's still the case...  :-)
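As a rough illustration of what "decomposing a 'kernels' construct into a sequence of compute constructs" means, the two functions below compute the same result. The first uses one 'kernels' region. The second is a hand-written decomposition: a data region enclosing one compute construct per loop. This is a conceptual sketch, not the pass's actual output. Internally the pass creates 'parallel'-kind regions (parallelized or gang-single, per the new GF_OMP_TARGET_KIND_* values in the ChangeLog), and 'acc serial' is used here only as the nearest user-level equivalent of a gang-single part. Per the commit message, the default handling remains '-fopenacc-kernels=parloops'.

```cpp
#include <cassert>

// Compile with -fopenacc to honour the directives; without it the pragmas
// are ignored and both functions simply run on the host.

// One 'kernels' construct containing two loops.
void scale_then_prefix_kernels(const float *a, float *b, int n)
{
#pragma acc kernels copyin(a[0:n]) copyout(b[0:n])
  {
#pragma acc loop independent
    for (int i = 0; i < n; ++i)
      b[i] = 2.0f * a[i];
    // Loop-carried dependence: this loop cannot be parallelized.
    for (int i = 1; i < n; ++i)
      b[i] += b[i - 1];
  }
}

// Hand decomposition: a data region plus one compute construct per part.
void scale_then_prefix_decomposed(const float *a, float *b, int n)
{
#pragma acc data copyin(a[0:n]) copyout(b[0:n])
  {
#pragma acc parallel loop
    for (int i = 0; i < n; ++i)
      b[i] = 2.0f * a[i];
#pragma acc serial
    for (int i = 1; i < n; ++i)
      b[i] += b[i - 1];
  }
}
```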


Grüße
 Thomas


-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
From e898ce7997733c29dcab9c3c62ca102c7f9fa6eb Mon Sep 17 00:00:00 2001
From: Gergö Barany
Date: Fri, 1 Feb 2019 00:59:30 +0100
Subject: [PATCH] Decompose OpenACC 'kernels' constructs into parts, a sequence
 of compute constructs

Not yet enabled by default: for now, the current mode of OpenACC 'kernels'
constructs handling still remains '-fopenacc-kernels=parloops', but that is to
change later.

	gcc/
	* omp-oacc-kernels-decompose.cc: New.
	* Makefile.in (OBJS): Add it.
	* passes.def: Instantiate it.
	* tree-pass.h (make_pass_omp_oacc_kernels_decompose): Declare.
	* flag-types.h (enum openacc_kernels): Add.
	* doc/invoke.texi (-fopenacc-kernels): Document.
	* gimple.h (enum gf_mask): Add
	'GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED',
	'GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE',
	'GF_OMP_TARGET_KIND_OACC_DATA_KERNELS'.
	(is_gimple_omp_oacc, is_gimple_omp_offloaded): Handle these.
	* gimple-pretty-print.c (dump_gimple_omp_target): Likewise.
	* omp-expand.c (expand_omp_target, build_omp_regions_1)
	(omp_make_gimple_edges): Likewise.
	* omp-low.c (scan_sharing_clauses, scan_omp_for)
	(check_omp_nesting_restrictions, lower_oacc_reductions)
	(lower_oacc_head_mark, lower_omp_target): Likewise.
	* omp-offload.c (execute_oacc_device_lower): Likewise.
	gcc/c-family/
	* c.opt (fopenacc-kernels): Add.
	gcc/fortran/
	* lang.opt (fopenacc-kernels): Add.
	gcc/testsuite/
	* c-c++-common/goacc/kernels-decompose-1.c: New.
	* c-c++-common/goacc/kernels-decompose-2.c: New.
	* c-c++-common/goacc/kernels-decompose-ice-1.c: New.
	* c-c++-common/goacc/kernels-decompose-ice-2.c: New.
	* gfortran.dg/goacc/kernels-decompose-1.f95: New.
	* gfortran.dg/goacc/kernels-decompose-2.f95: New.
	* c-c++-common/goacc/if-clause-2.c: Adjust.
	* gfortran.dg/goacc/kernels-tree.f95: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
	New.
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Adjust.
	* testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise.

Co-authored-by: Thomas Schwinge 
---
 gcc/Makefile.in   |1 +
 gcc/c-family/c.opt|   13 +
 gcc/doc/invoke.texi   |   14 +-
 gcc/flag-types.h  |7 +
 gcc/fortran/lang.opt  |4 +
 gcc/gimple-pretty-print.c |9 +
 gcc/gimple.h  |   14 +
 gcc/omp-expand.c  |   22 +
 gcc/omp-low.c |   66 +-
 gcc/omp-oacc-kernels-decompose.cc | 1531 +
 gcc/omp-offload.c |   19 +
 gcc/passes.def|1 +
 .../c-c++-common/goacc/if-clause-2.c  |   24 +-
 .../c-c++-common/goacc/kernels-decompose-1.c  |   83 +
 .../c-c++-common/goacc/kernels-decompose-2.c  |  141 ++
 .../goacc/kernels-decompose-ice-1.c   |  108 ++
 .../goacc/kernels-decompose-ice-2.c   |   16 +
 .../gfortran.dg/goacc/kernels-decompose-1.f95 |   81 +
 .../gfortran.dg/goacc/kernels-decompose-2.f95 |  142 ++
 .../gfortran.dg/goacc/kernels-tree.f95|5 +
 gcc/tree-pass.h   |1 +
 .../declare-vla-kernels-decompose-ice-1.c |8 +
 .../declare-vla-kernels-decompose.c   |6 +
 .../libgomp.oacc-c-c++-common/declare-vla.c   |6 +
 .../kernels-decompose-1.c |   38 +
 .../libgomp.oacc-fortran/pr94358-1.f90|   11 +-
 26 files changed, 2355 insertions(+), 16 deletions(-)
 create mode 100644 gcc/omp-oacc-kernels-decompose.cc
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-2.c
 create mode 100644 

[Bug tree-optimization/97821] New: wrong code with -ftree-vectorize at -O1 on x86_64-pc-linux-gnu

2020-11-13 Thread zhendong.su at inf dot ethz.ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97821

Bug ID: 97821
   Summary: wrong code with -ftree-vectorize at -O1 on
x86_64-pc-linux-gnu
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zhendong.su at inf dot ethz.ch
  Target Milestone: ---

The code is valid, but it is hard to reduce, so still quite large.

[509] % gcctk -v
Using built-in specs.
COLLECT_GCC=gcctk
COLLECT_LTO_WRAPPER=/local/suz-local/software/local/gcc-trunk/libexec/gcc/x86_64-pc-linux-gnu/11.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-trunk/configure --disable-bootstrap
--prefix=/local/suz-local/software/local/gcc-trunk --enable-languages=c,c++
--disable-werror --enable-multilib --with-system-zlib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 11.0.0 20201113 (experimental) [master revision
54896b10dbe:c3a97a9df4b:a514934a0565255276adaa4fbd4aa35579ec33c6] (GCC) 
[510] % 
[510] % gcctk -O1 small.c; ./a.out
5030-170
[511] % gcctk -O1 -ftree-vectorize small.c; ./a.out
5030-176
[512] % 
[512] % cat small.c
int printf (const char *, ...);

static unsigned a, f, v;
int b, h, aa, ab, ac, ad, ae, y, z, af;
static long c, m, t, ag, ah = 3;
static signed d;
static char e, ai;
static short g, j = 1, o, w;
int *i, *s;
long long l;
static int *n;
char p;
static int q;
static int r;
static int u;
short x;
long long *aj = 
static signed ak;
static volatile unsigned al = 5;
static volatile short am = 1;
int *an(int *ao, int *ap) { return ap; }
static int aq() {
  int ar[] = {2, 2, 2, 2, 2, 2};
  short *as = 
  int at[] = {0, 1, 0, 1};
  int au = ab = 0;
  for (; m <= 1; m++) {
int av = 0, k, aw = e && u, ax = aw || ag;
int **ay = 
for (; ab; ab++)
  ac = 0;
for (; ac; ac++)
  am;
u &
short az = am || a ^ w;
unsigned bc = am & w | am || ag;
  ba:
aw = u;
i = 0;
for (; i; i++)
  b = a;
printf("0");
if (p) {
  printf("%ld", ag);
  continue;
}
if (ag) {
  printf("7");
  e = w | ag c < ax;
}
if (w) {
  printf("%d", u);
  goto bb;
}
if (u)
  printf("%d", e);
s = 
u = aw;
t = 0;
for (; t <= 1; t++)
  *ay = an(, );
e++;
  }
  for (; r >= 0;)
for (; ag <= 5;) {
  signed bd[6];
  int be = 0, bf = am % al;
  for (; be < 6; be++)
bd[0] = 9;
  h = 0;
  for (; h <= 5; h++)
*aj = *as = aa;
  for (; w; w = d)
;
  short bg = d + j ^ e + r;
  al % am;
  int bi = bg & al >> am;
  am ^ al;
  am / al;
  am 
  al;
  am / al;
  if (c)
if (q) {
  be = 0;
  for (; be; be++)
z = 0;
}
  am;
  int bj = 0;
  if (m || q) {
  bh:
l = ad = c;
int bm = al || q;
al;
al;
char bn = al || q;
al;
al;
bm = q;
ae = a;
  bk:
ai = h || q > d;
ag = d;
al;
al;
printf("%d", q);
if (a > 1)
  break;
if (q)
  printf("%d", d);
if (q) {
  printf("3");
  h = d | bm > q;
  goto bk;
}
if (!ai || al && 0) {
  printf("%d", d);
  al;
  printf("%d", a);
  goto bb;
}
d = al;
printf("%lld", l);
m = q;
if (ak) {
  printf("%ld", c);
  ad = c & q;
}
if (!ah) {
  printf("%d", q);
  goto bh;
}
  }
  if (c)
s = 
  m = q = d && c;
  r = ~(e / j & al > r);
  f |= d = al;
  v |= am;
  al / al ^ am;
  ak = am + al | al;
  am / al + al ^ am;
  j = am;
  al;
bb:
  if (c)
g++;
  a = q || e & d;
  am || al;
  am;
  am;
  am;
  al 
  am;
  am;
bl:
  am;
  if (q) {
printf("%d", q);
a = q - am;
goto bl;
  }
  am;
  printf("%d", d);
  m = q & am;
  am;
  printf("%d", a);
  if (d < -41) {
printf("%ld", ag);
goto ba;
  }
  h = *n;
  printf("3");
  c = e / d;
  printf("%ld", m);
  d = d << q / ag;
  o = 2;
  for (; o; o++)
i = 
  x = m = e;
  printf("%d", r) && (ah = r) || (d = ak && e);
  printf("%d", ak);
  if (!bf) {
printf("%d", e);
*as = a;
i = n;
bi = ak / am > r;
*n = 0;
for (; n; n++)
  ;
  }
  y = bi;
}
  return 0;
}
int main() {
  for (; af < 6; af++) {
d = 8;
aq();
  }
  printf("%d\n", h);
  return 0;
}

More explicit checking of which OMP constructs we're expecting

2020-11-13 Thread Thomas Schwinge
Hi!

I've pushed "More explicit checking of which OMP constructs we're
expecting" to master branch in commit
bd7885755405bc9947ebe805a53d6100c78c8e82, and backported to
releases/gcc-10 branch in commit
00d4aa2128fd73b49e28c8a8c5fcb81150b640fe, see attached.


Grüße
 Thomas


From bd7885755405bc9947ebe805a53d6100c78c8e82 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 13 Oct 2020 14:56:59 +0200
Subject: [PATCH] More explicit checking of which OMP constructs we're
 expecting

In particular, more precisely highlight what applies generally vs. the special
handling for the current 'parloops'-based OpenACC 'kernels' implementation.

	gcc/
	* omp-low.c (scan_sharing_clauses, scan_omp_for)
	(lower_oacc_reductions, lower_omp_target): More explicit checking
	of which OMP constructs we're expecting.
---
 gcc/omp-low.c | 59 +++
 1 file changed, 45 insertions(+), 14 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index eec34818f1b..2602189d687 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -1194,9 +1194,16 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	  goto do_private;
 
 	case OMP_CLAUSE_REDUCTION:
-	  if (is_oacc_parallel_or_serial (ctx) || is_oacc_kernels (ctx))
-	ctx->local_reduction_clauses
-	  = tree_cons (NULL, c, ctx->local_reduction_clauses);
+	  /* Collect 'reduction' clauses on OpenACC compute construct.  */
+	  if (is_gimple_omp_oacc (ctx->stmt)
+	  && is_gimple_omp_offloaded (ctx->stmt))
+	{
+	  /* No 'reduction' clauses on OpenACC 'kernels'.  */
+	  gcc_checking_assert (!is_oacc_kernels (ctx));
+
+	  ctx->local_reduction_clauses
+		= tree_cons (NULL, c, ctx->local_reduction_clauses);
+	}
 	  if ((OMP_CLAUSE_REDUCTION_INSCAN (c)
 	   || OMP_CLAUSE_REDUCTION_TASK (c)) && ctx->allocate_map)
 	{
@@ -2502,7 +2509,7 @@ scan_omp_for (gomp_for *stmt, omp_context *outer_ctx)
 {
   omp_context *tgt = enclosing_target_ctx (outer_ctx);
 
-  if (!tgt || is_oacc_parallel_or_serial (tgt))
+  if (!(tgt && is_oacc_kernels (tgt)))
 	for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
 	  {
 	tree c_op0;
@@ -6921,6 +6928,9 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
   for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
 if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION)
   {
+	/* No 'reduction' clauses on OpenACC 'kernels'.  */
+	gcc_checking_assert (!is_oacc_kernels (ctx));
+
 	tree orig = OMP_CLAUSE_DECL (c);
 	tree var = maybe_lookup_decl (orig, ctx);
 	tree ref_to_res = NULL_TREE;
@@ -6958,10 +6968,11 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 		break;
 
 		  case GIMPLE_OMP_TARGET:
-		if ((gimple_omp_target_kind (probe->stmt)
-			 != GF_OMP_TARGET_KIND_OACC_PARALLEL)
-			&& (gimple_omp_target_kind (probe->stmt)
-			!= GF_OMP_TARGET_KIND_OACC_SERIAL))
+		/* No 'reduction' clauses inside OpenACC 'kernels'
+		   regions.  */
+		gcc_checking_assert (!is_oacc_kernels (probe));
+
+		if (!is_gimple_omp_offloaded (probe->stmt))
 		  goto do_lookup;
 
 		cls = gimple_omp_target_clauses (probe->stmt);
@@ -7768,8 +7779,16 @@ lower_oacc_head_mark (location_t loc, tree ddvar, tree clauses,
   tag |= OLF_GANG_STATIC;
 }
 
-  /* In a parallel region, loops are implicitly INDEPENDENT.  */
   omp_context *tgt = enclosing_target_ctx (ctx);
+  if (!tgt || is_oacc_parallel_or_serial (tgt))
+;
+  else if (is_oacc_kernels (tgt))
+/* Not using this loops handling inside OpenACC 'kernels' regions.  */
+gcc_unreachable ();
+  else
+gcc_unreachable ();
+
+  /* In a parallel region, loops are implicitly INDEPENDENT.  */
   if (!tgt || is_oacc_parallel_or_serial (tgt))
 tag |= OLF_INDEPENDENT;
 
@@ -11805,8 +11824,14 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	break;
 
   case OMP_CLAUSE_FIRSTPRIVATE:
-	if (is_oacc_parallel_or_serial (ctx))
-	  goto oacc_firstprivate;
+	gcc_checking_assert (offloaded);
+	if (is_gimple_omp_oacc (ctx->stmt))
+	  {
+	/* No 'firstprivate' clauses on OpenACC 'kernels'.  */
+	gcc_checking_assert (!is_oacc_kernels (ctx));
+
+	goto oacc_firstprivate;
+	  }
 	map_cnt++;
 	var = OMP_CLAUSE_DECL (c);
 	if (!omp_is_reference (var)
@@ -11831,8 +11856,14 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	break;
 
   case OMP_CLAUSE_PRIVATE:
+	gcc_checking_assert (offloaded);
 	if (is_gimple_omp_oacc (ctx->stmt))
-	  break;
+	  {
+	/* No 'private' clauses on OpenACC 'kernels'.  */
+	gcc_checking_assert (!is_oacc_kernels (ctx));
+
+	break;
+	  }
 	var = OMP_CLAUSE_DECL (c);
 	if (is_variable_sized (var))
 	  {
@@ -12195,7 +12226,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, 

Attach an attribute to all outlined OpenACC compute regions

2020-11-13 Thread Thomas Schwinge
Hi!

I've pushed "Attach an attribute to all outlined OpenACC compute regions"
to master branch in commit 703e4f86496214e4915db898397fcd0ae1d955e0, and
backported to releases/gcc-10 branch in commit
40bf92be5b621318a43347236508696cc387f3a6, see attached.
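The new check in execute_oacc_device_lower relies on each outlined function carrying exactly one kind marker ("oacc parallel", "oacc kernels", "oacc serial", or being an OpenACC routine). Summing the booleans is a compact way to assert that. The sketch below shows the same idea outside GCC. The fn_attrs struct is a hypothetical stand-in for DECL_ATTRIBUTES, and routine_level plays the role of the oacc_fn_attrib_level result. None of this is GCC's tree API.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a function's attribute list.
struct fn_attrs
{
  const char *names[4];
  int count;
};

static bool has_attr(const fn_attrs &attrs, const char *name)
{
  for (int i = 0; i < attrs.count; ++i)
    if (std::strcmp(attrs.names[i], name) == 0)
      return true;
  return false;
}

enum class oacc_fn_kind { parallel, kernels, serial, routine };

// routine_level >= 0 marks an OpenACC routine, as with oacc_fn_attrib_level.
oacc_fn_kind classify_oacc_fn(const fn_attrs &attrs, int routine_level)
{
  bool is_parallel = has_attr(attrs, "oacc parallel");
  bool is_kernels = has_attr(attrs, "oacc kernels");
  bool is_serial = has_attr(attrs, "oacc serial");
  bool is_routine = routine_level >= 0;
  // Exactly one kind must apply; the bools promote to int when summed.
  assert(is_parallel + is_kernels + is_serial + is_routine == 1);
  if (is_parallel)
    return oacc_fn_kind::parallel;
  if (is_kernels)
    return oacc_fn_kind::kernels;
  if (is_serial)
    return oacc_fn_kind::serial;
  return oacc_fn_kind::routine;
}
```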


Grüße
 Thomas


From 703e4f86496214e4915db898397fcd0ae1d955e0 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 28 Oct 2020 11:43:49 +0100
Subject: [PATCH] Attach an attribute to all outlined OpenACC compute regions

This allows for making some things more explicit, later on.

	gcc/
	* omp-expand.c (expand_omp_target): Attach an attribute to all
	outlined OpenACC compute regions.
	* omp-offload.c (execute_oacc_device_lower): Adjust.
	gcc/testsuite/
	* c-c++-common/goacc/classify-parallel.c: Adjust.
	* gfortran.dg/goacc/classify-parallel.f95: Likewise.
	* c-c++-common/goacc/classify-serial.c: New.
	* gfortran.dg/goacc/classify-serial.f95: Likewise.
---
 gcc/omp-expand.c  | 22 +---
 gcc/omp-offload.c | 51 +--
 .../c-c++-common/goacc/classify-parallel.c|  4 +-
 .../c-c++-common/goacc/classify-serial.c  | 29 +++
 .../gfortran.dg/goacc/classify-parallel.f95   |  4 +-
 .../gfortran.dg/goacc/classify-serial.f95 | 31 +++
 6 files changed, 114 insertions(+), 27 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/classify-serial.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/classify-serial.f95

diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index ddca3d33bdd..c6ee3eb0857 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -9284,27 +9284,33 @@ expand_omp_target (struct omp_region *region)
   entry_bb = region->entry;
   exit_bb = region->exit;
 
+  if (target_kind == GF_OMP_TARGET_KIND_OACC_KERNELS)
+mark_loops_in_oacc_kernels_region (region->entry, region->exit);
+
+  /* Going on, all OpenACC compute constructs are mapped to
+ 'BUILT_IN_GOACC_PARALLEL', and get their compute regions outlined.
+ To distinguish between them, we attach attributes.  */
   switch (target_kind)
 {
+case GF_OMP_TARGET_KIND_OACC_PARALLEL:
+  DECL_ATTRIBUTES (child_fn)
+	= tree_cons (get_identifier ("oacc parallel"),
+		 NULL_TREE, DECL_ATTRIBUTES (child_fn));
+  break;
 case GF_OMP_TARGET_KIND_OACC_KERNELS:
-  mark_loops_in_oacc_kernels_region (region->entry, region->exit);
-
-  /* Further down, all OpenACC compute constructs will be mapped to
-	 BUILT_IN_GOACC_PARALLEL, and to distinguish between them, there
-	 is an "oacc kernels" attribute set for OpenACC kernels.  */
   DECL_ATTRIBUTES (child_fn)
 	= tree_cons (get_identifier ("oacc kernels"),
 		 NULL_TREE, DECL_ATTRIBUTES (child_fn));
   break;
 case GF_OMP_TARGET_KIND_OACC_SERIAL:
-  /* Further down, all OpenACC compute constructs will be mapped to
-	 BUILT_IN_GOACC_PARALLEL, and to distinguish between them, there
-	 is an "oacc serial" attribute set for OpenACC serial.  */
   DECL_ATTRIBUTES (child_fn)
 	= tree_cons (get_identifier ("oacc serial"),
 		 NULL_TREE, DECL_ATTRIBUTES (child_fn));
   break;
 default:
+  /* Make sure we don't miss any.  */
+  gcc_checking_assert (!(is_gimple_omp_oacc (entry_stmt)
+			 && is_gimple_omp_offloaded (entry_stmt)));
   break;
 }
 
diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 4490701147c..21583433d6d 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -1762,12 +1762,45 @@ execute_oacc_device_lower ()
   flag_openacc_dims = (char *)_openacc_dims;
 }
 
+  bool is_oacc_parallel
+= (lookup_attribute ("oacc parallel",
+			 DECL_ATTRIBUTES (current_function_decl)) != NULL);
   bool is_oacc_kernels
 = (lookup_attribute ("oacc kernels",
 			 DECL_ATTRIBUTES (current_function_decl)) != NULL);
+  bool is_oacc_serial
+= (lookup_attribute ("oacc serial",
+			 DECL_ATTRIBUTES (current_function_decl)) != NULL);
+  int fn_level = oacc_fn_attrib_level (attrs);
+  bool is_oacc_routine = (fn_level >= 0);
+  gcc_checking_assert (is_oacc_parallel
+		   + is_oacc_kernels
+		   + is_oacc_serial
+		   + is_oacc_routine
+		   == 1);
+
   bool is_oacc_kernels_parallelized
 = (lookup_attribute ("oacc kernels parallelized",
 			 DECL_ATTRIBUTES (current_function_decl)) != NULL);
+  if (is_oacc_kernels_parallelized)
+gcc_checking_assert (is_oacc_kernels);
+
+  if (dump_file)
+{
+  if (is_oacc_parallel)
+	fprintf (dump_file, "Function is OpenACC parallel offload\n");
+  else if (is_oacc_kernels)
+	fprintf (dump_file, "Function is %s OpenACC kernels offload\n",
+		 (is_oacc_kernels_parallelized
+		  ? "parallelized" : "unparallelized"));
+  else if (is_oacc_serial)
+	fprintf (dump_file, "Function is OpenACC serial offload\n");
+  

Re: testsuite: Adjust pr96789.c to exclude vect_load_lanes

2020-11-13 Thread Jeff Law via Gcc-patches


On 11/10/20 7:42 PM, Kewen.Lin via Gcc-patches wrote:
> Hi Richard,
>
> Thanks for the review!
>
> on 2020/11/10 7:31 PM, Richard Sandiford wrote:
>> "Kewen.Lin"  writes:
>>> Hi,
>>>
>>> As Lyon pointed out, the newly introduced test case
>>> gcc.dg/tree-ssa/pr96789.c fails on arm-none-linux-gnueabihf.
>>> Loop vectorizer is able to vectorize the two loops which
>>> operate on array tmp with load_lanes feature support.  It
>>> makes dse3 get unexpected inputs and do nothing.
>>>
>>> This patch is to teach the case to respect vect_load_lanes,
>>> meanwhile to guard the check only under vect_int.
>> I'm not sure this is the right check.  The test passes on aarch64,
>> which also has load lanes, but apparently doesn't use them for this
>> test.  I think the way the loop vectoriser handles the loops will
>> depend a lot on target costs, which can vary in unpredictable ways.
>>
> You are right, although aarch64 doesn't have this failure, it can fail
> with explicit -march=armv8-a+sve.  It can vary as target features/costs
> change.  The check is still fragile.
>
> Your suggestion with -ftree-slp-vectorize below is better!
>
>> Does it work if you instead change -ftree-vectorize to -ftree-slp-vectorize?
>> Or does that defeat the purpose of the test?
> It works, nice, thanks for the suggestion!
>
> I appended an explicit -fno-tree-loop-vectorize to keep it from failing
> in case someone kicks off testing with an explicit -ftree-loop-vectorize.
>
> The updated version is pasted below, is it ok for trunk?
>
> BR,
> Kewen
> -
> gcc/testsuite/ChangeLog:
>
>   * gcc.dg/tree-ssa/pr96789.c: Adjusted by disabling loop vectorization.

OK

jeff




Re: [PATCH v3] Include checking of 0 cost dependency due to bypass in rank_for_schedule

2020-11-13 Thread Jeff Law via Gcc-patches


On 11/6/20 2:37 AM, Jojo R wrote:
> Insn seqs before sched:
>
> .L1:
> a5 = insn-1 (a0)
> a6 = insn-2 (a1)
> a7 = insn-3 (a7, a5)
> a8 = insn-4 (a8, a6)
> Jmp .L1
>
> Insn-3 and insn-4 have REG_DEP_TRUE dependencies on insn-1 and insn-2,
> so insn-3 and insn-4 end up last in the ready list.
> This patch also puts an insn with a 0-cost dependency due to a bypass
> in the highest-numbered class, if the target has a forwarding feature
> between DEP_PRO and DEP_CON.
>
> If the insns are in the same cost class under -fsched-last-insn-heuristic,
> we then go on to "prefer the insn which has more later insns that depend
> on it", but the value returned by dep_list_size() is not suitable, because
> it includes all dependencies of the insn.
> We need to ignore the ones that have a 0-cost dependency due to a bypass.
>
> With this patch and pipeline description as below:
>
> (define_bypass 0 "insn-1, insn-2" "insn-3, insn-4")
>
> We can get better insn seqs after sched:
>
> .L1:
> a5 = insn-1 (a0)
> a7 = insn-3 (a7, a5)
> a6 = insn-2 (a1)
> a8 = insn-4 (a8, a6)
> Jmp .L1
>
> I have tested on ck860 of C-SKY arch and C960 of T-Head based on RISCV arch
>
>   gcc/
>   * haifa-sched.c (dep_list_costs): New.
>   (rank_for_schedule): Replace dep_list_size with dep_list_costs.
>   Add 0 cost dependency due to bypass on -fsched-last-insn-heuristic.

OK for the trunk.  Thanks,

Jeff
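The counting idea described above is that, when preferring the insn with more later insns depending on it, consumers reached through a zero-cost bypass should not count. A minimal sketch follows. The dep struct and count_costly_deps are hypothetical and are not haifa-sched's dep_t or the actual dep_list_costs implementation, which this sketch does not claim to reproduce.

```cpp
#include <cassert>
#include <vector>

// Hypothetical forward dependence edge.
struct dep
{
  int consumer;  // index of the dependent insn
  int cost;      // latency; 0 when a define_bypass forwards the result
};

// Like a plain dependence count, but skip consumers reached through a
// 0-cost bypass, since they can issue right after this insn anyway.
int count_costly_deps(const std::vector<dep> &forw_deps)
{
  int n = 0;
  for (const dep &d : forw_deps)
    if (d.cost != 0)
      ++n;
  return n;
}
```

With (define_bypass 0 "insn-1, insn-2" "insn-3, insn-4") from the example, insn-1's only forward dependence (on insn-3) counts as zero.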




Re: [PATCH] dumpfile.c: use prefixes other than 'note: ' for MSG_{OPTIMIZED_LOCATIONS|MISSED_OPTIMIZATION}

2020-11-13 Thread Jeff Law via Gcc-patches


On 11/6/20 1:38 AM, Thomas Schwinge wrote:
> Hi!
>
> On 2018-09-25T16:00:14-0400, David Malcolm  wrote:
>> As noted at Cauldron, dumpfile.c currently emits "note: " for all kinds
>> of dump message, so that (after filtering) there's no distinction between
>> MSG_OPTIMIZED_LOCATIONS vs MSG_NOTE vs MSG_MISSED_OPTIMIZATION in the
>> textual output.
>>
>> This patch changes dumpfile.c so that the "note: " varies to show
>> which MSG_* was used, with the string prefix matching that used for
>> filtering in -fopt-info, hence e.g.
>>   directive_unroll_3.f90:24:0: optimized: loop unrolled 7 times
>> and:
>>   pr19210-1.c:24:3: missed: missed loop optimization: niters analysis ends up with assumptions.
>>
>> The patch adds "dg-optimized" and "dg-missed" directives for use
>> in the testsuite for matching these (with -fopt-info on stderr; they
>> don't help for dumpfile output).
> Thanks, this is very useful.
>
>
> I just ran into a problem regarding these two:
>
>> --- a/gcc/testsuite/lib/gcc-dg.exp
>> +++ b/gcc/testsuite/lib/gcc-dg.exp
>> +# Handle output from -fopt-info for MSG_OPTIMIZED_LOCATIONS:
>> +# a successful optimization.
>> +
>> +proc dg-optimized { args } {
>> +# Make this variable available here and to the saved proc.
>> +upvar dg-messages dg-messages
>> +
>> +process-message saved-dg-error "optimized: " "$args"
>> +}
>> +
>> +# Handle output from -fopt-info for MSG_MISSED_OPTIMIZATION:
>> +# a missed optimization.
>> +
>> +proc dg-missed { args } {
>> +# Make this variable available here and to the saved proc.
>> +upvar dg-messages dg-messages
>> +
>> +process-message saved-dg-error "missed: " "$args"
>> +}
> If, in addition to the usual line location checking, you'd like to do
> column location checking ("[column]: " prefix before the actual
> diagnostic), and the actual diagnostic doesn't begin with whitespace,
> then this currently fails.  To address this, OK to push the attached
> patch "[testsuite] Enable column location checking for 'dg-optimized',
> 'dg-missed'" -- with or without the demonstrator
> 'gcc.dg/vect/nodump-vect-opt-info-1.c',
> 'gcc.dg/vect/nodump-vect-opt-info-2.c' changes, your call?  (I still have
> to run this through regression testing.)
>
>
> Grüße
>  Thomas
>
>
>
> 0001-testsuite-Enable-column-location-checking-for-dg-opt.patch
>
> From f3046b8bea6a2a6489dd10d72cb038b92aa4fc38 Mon Sep 17 00:00:00 2001
> From: Thomas Schwinge 
> Date: Fri, 6 Nov 2020 09:18:06 +0100
> Subject: [PATCH] [testsuite] Enable column location checking for
>  'dg-optimized', 'dg-missed'
>
> 'process-message' would like the 'msgprefix' argument without trailing space.
>
> This is a bug-fix for commit ed2d9d3720adef3a260b8a55e17e744352a901fc
> "dumpfile.c: use prefixes other than 'note: ' for
> MSG_{OPTIMIZED_LOCATIONS|MISSED_OPTIMIZATION}", which added 'dg-optimized',
> 'dg-missed'.
>
>   gcc/testsuite/
>   * lib/gcc-dg.exp (dg-optimized, dg-missed): Fix 'process-message'
>   call.
>   * gcc.dg/vect/nodump-vect-opt-info-1.c: Demonstrate.
>   * gcc.dg/vect/nodump-vect-opt-info-2.c: Likewise.

OK

jeff




Add 'libgomp.oacc-fortran/pr94358-1.f90' [PR94358] (was: [PATCH, OpenACC] Rework OpenACC Fortran DO loop initialization)

2020-11-13 Thread Thomas Schwinge
Hi!

The whole topic of GCC PR94358 "[OMP] Privatize internal array variables
introduced by the Fortran FE" is yet to be resolved, but we may already
now add Gergő's testcase:

On 2019-01-25T15:13:48+0100, Gergö Barany  wrote:
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/initialize_kernels_loops.f90
> @@ -0,0 +1,31 @@
> +[...]

... to document the status quo, and so that it may help highlight any
behavioral changes later on.  I've pushed "Add
'libgomp.oacc-fortran/pr94358-1.f90' [PR94358]" to master branch in
commit d1ba078d9bcc3457d36ba12695cfef29eb3ca942, see attached.


Grüße
 Thomas


From d1ba078d9bcc3457d36ba12695cfef29eb3ca942 Mon Sep 17 00:00:00 2001
From: Gergö Barany
Date: Mon, 21 Jan 2019 03:08:57 -0800
Subject: [PATCH] Add 'libgomp.oacc-fortran/pr94358-1.f90' [PR94358]

Document status quo re PR94358 "[OMP] Privatize internal array variables
introduced by the Fortran FE".

	libgomp/
	PR fortran/94358
	* testsuite/libgomp.oacc-fortran/pr94358-1.f90: New.

Co-authored-by: Thomas Schwinge 
---
 .../libgomp.oacc-fortran/pr94358-1.f90| 34 +++
 1 file changed, 34 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90
new file mode 100644
index 000..5013c5ba04b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90
@@ -0,0 +1,34 @@
+! { dg-do run }
+! { dg-additional-options "-fopt-info-omp-all" }
+
+subroutine kernel(lo, hi, a, b, c)
+  implicit none
+  integer :: lo, hi, i
+  real, dimension(lo:hi) :: a, b, c
+
+  !$acc kernels copyin(lo, hi) ! { dg-optimized "assigned OpenACC seq loop parallelism" }
+  !$acc loop independent
+  do i = lo, hi
+ b(i) = a(i)
+  end do
+  !$acc loop independent
+  do i = lo, hi
+ c(i) = b(i)
+  end do
+  !$acc end kernels
+end subroutine kernel
+
+program main
+  integer :: n = 20
+  real, dimension(1:20) :: a, b, c
+
+  a(:) = 1
+  b(:) = 2
+  c(:) = 3
+
+  call kernel(1, n, a, b, c)
+
+  do i = 1, n
+ if (c(i) .ne. 1) call abort
+  end do
+end program main
-- 
2.17.1



Re: [PATCH] dumpfile.c: use prefixes other than 'note: ' for MSG_{OPTIMIZED_LOCATIONS|MISSED_OPTIMIZATION}

2020-11-13 Thread Jeff Law via Gcc-patches


On 11/6/20 1:50 AM, Thomas Schwinge wrote:
> Hi!
>
> On 2018-09-25T16:00:14-0400, David Malcolm  wrote:
>> The patch adds "dg-optimized" and "dg-missed" directives
> Another small thing I just noticed:
>
>> --- a/gcc/testsuite/lib/gcc-dg.exp
>> +++ b/gcc/testsuite/lib/gcc-dg.exp
>> +# Handle output from -fopt-info for MSG_OPTIMIZED_LOCATIONS:
>> +# a successful optimization.
>> +
>> +proc dg-optimized { args } {
>> +# Make this variable available here and to the saved proc.
>> +upvar dg-messages dg-messages
>> +
>> +process-message saved-dg-error "optimized: " "$args"
>> +}
>> +
>> +# Handle output from -fopt-info for MSG_MISSED_OPTIMIZATION:
>> +# a missed optimization.
>> +
>> +proc dg-missed { args } {
>> +# Make this variable available here and to the saved proc.
>> +upvar dg-messages dg-messages
>> +
>> +process-message saved-dg-error "missed: " "$args"
>> +}
> These currently print "(test for *errors*, line [...])".  However, these
> diagnostics are not actually error diagnostics (fatal, meaning: causes
> compilation to fail) but rather warning diagnostics (non-fatal, doesn't
> cause compilation to fail).  Thus, same as 'dg-message', these should use
> 'saved-dg-warning' instead of 'saved-dg-error', which will print: "(test
> for *warnings*, line [...])".  OK to change that after regression
> testing?

Yes.

jeff



[PATCH] ipa-cp: Avoid unwanted multiple propagations (PR 97816)

2020-11-13 Thread Martin Jambor
Hi,

When looking at the testcase of PR 97816, I realized that the reason
why we were hitting overflows in size growth estimates in IPA-CP is
not because the chains of how lattices feed values to each other are
so long but mainly because we add estimates in callee lattices to
caller lattices for each value source, which roughly corresponds to a
call graph edge, and therefore if there are multiple calls between two
functions passing the same value in a parameter we end up doing it
more than once, sometimes actually quite many times.

This patch avoids it by using a hash_set to remember the source values
we have already updated.  This should make any overflows very unlikely
but not impossible, so I still included checks for overflows but
decided to restructure the code to only need it in the
propagate_effects function and modified it so that it does not need to
perform the check before each sum.

This is because I decided to add local estimates to propagated
estimates already in propagate_effects and not at the evaluation time.
The function can then do the sums in a wide type and discard them in
the unlikely case of an overflow.  I also decided to use the
opportunity to make propagated effect stats now include stats from
other values in the same SCCs.  In the dumps I have seen this tended
to increase size cost a tiny bit more than the estimated time benefit
but both increases were small.
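The core of the fix can be modelled in isolation. The idea is to visit each distinct source value once, even when several call-graph edges (sources) lead to it, and to do the addition in a wider type so that an overflowing sum can be discarded. The sketch below models only size costs. Value, Source and propagate_size are hypothetical simplifications, not ipa-cp.c's ipcp_value classes, and time benefits and SCC handling are left out.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <unordered_set>
#include <vector>

struct Value;

// One source of a value, roughly one call-graph edge.
struct Source
{
  Value *val;  // caller-side value feeding this one (may be null)
};

struct Value
{
  int local_size_cost = 0;
  int prop_size_cost = 0;  // accumulated from values depending on this one
  std::vector<Source> sources;
};

// Add v's total size cost to each distinct source value exactly once.
void propagate_size(Value &v)
{
  const std::int64_t size =
    static_cast<std::int64_t>(v.local_size_cost) + v.prop_size_cost;
  std::unordered_set<Value *> processed;
  for (const Source &src : v.sources)
    {
      // Skip missing values and values already updated via another edge.
      if (!src.val || !processed.insert(src.val).second)
        continue;
      const std::int64_t sum = size + src.val->prop_size_cost;
      if (sum > INT_MAX)
        continue;  // Discard the update in the unlikely overflow case.
      src.val->prop_size_cost = static_cast<int>(sum);
    }
}
```

Without the processed set, two edges from the same caller passing the same constant would add v's cost to b twice.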

I will look at how this affects the IPA-CP heuristics a bit more but
so far the changes do not seem to matter much.  The patch passes
bootstrap and testing on x86_64-linux.  I plan to start LTO bootstrap
straight away.

OK for trunk if I don't find any issues?

Martin


gcc/ChangeLog:

2020-11-13  Martin Jambor  

PR ipa/97816
* ipa-cp.c (safe_add): Removed.
(good_cloning_opportunity_p): Remove special handling of INT_MAX.
(value_topo_info::propagate_effects): Take care not to
propagate from one value to another through more sources.  Include
local time and size in propagated ones here.  Take care not to
overflow size.
(decide_about_value): Do not add local and propagated effects when
passing them to good_cloning_opportunity_p.
---
 gcc/ipa-cp.c | 66 
 1 file changed, 30 insertions(+), 36 deletions(-)

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index c3ee71e16e1..45cc8fcbdee 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -3264,13 +3264,6 @@ good_cloning_opportunity_p (struct cgraph_node *node, sreal time_benefit,
 return false;
 
   gcc_assert (size_cost > 0);
-  if (size_cost == INT_MAX)
-{
-  if (dump_file && (dump_flags & TDF_DETAILS))
-   fprintf (dump_file, " good_cloning_opportunity_p returning "
-"false because of size overflow.\n");
-  return false;
-}
 
   class ipa_node_params *info = IPA_NODE_REF (node);
   int eval_threshold = opt_for_fn (node->decl, param_ipa_cp_eval_threshold);
@@ -3840,20 +3833,6 @@ propagate_constants_topo (class ipa_topo_info *topo)
 }
 }
 
-
-/* Return the sum of A and B if none of them is bigger than INT_MAX/2, return
-   INT_MAX.  */
-
-static int
-safe_add (int a, int b)
-{
-  if (a > INT_MAX/2 || b > INT_MAX/2)
-return INT_MAX;
-  else
-return a + b;
-}
-
-
 /* Propagate the estimated effects of individual values along the topological
from the dependent values to those they depend on.  */
 
@@ -3862,30 +3841,49 @@ void
 value_topo_info::propagate_effects ()
 {
   ipcp_value *base;
+  hash_set *> processed_srcvals;
 
   for (base = values_topo; base; base = base->topo_next)
 {
   ipcp_value_source *src;
   ipcp_value *val;
   sreal time = 0;
-  int size = 0;
+  HOST_WIDE_INT size = 0;
 
   for (val = base; val; val = val->scc_next)
{
  time = time + val->local_time_benefit + val->prop_time_benefit;
- size = safe_add (size, safe_add (val->local_size_cost,
-  val->prop_size_cost));
+ size = size + val->local_size_cost + val->prop_size_cost;
}
 
   for (val = base; val; val = val->scc_next)
-   for (src = val->sources; src; src = src->next)
- if (src->val
- && src->cs->maybe_hot_p ())
+   {
+ processed_srcvals.empty ();
+ for (src = val->sources; src; src = src->next)
+   if (src->val
+   && !processed_srcvals.contains (src->val)
+   && src->cs->maybe_hot_p ())
+ {
+   processed_srcvals.add (src->val);
+   HOST_WIDE_INT prop_size = size + src->val->prop_size_cost;
+   if (prop_size < INT_MAX)
+ {
+   src->val->prop_time_benefit += time;
+   src->val->prop_size_cost = prop_size;
+ }
+ }
+
+ if (size < INT_MAX)
{
- src->val->prop_time_benefit = time + 

[PATCH, rs6000] Add Power10 scheduling description

2020-11-13 Thread Pat Haugen via Gcc-patches
Add Power10 scheduling description.

This patch adds the Power10 scheduling description. Since power10.md is pretty
much a complete rewrite (the existing version of power10.md is mostly just a
copy of power9.md), I diffed power10.md against /dev/null so that the full
contents of the file are shown rather than a diff. This should make it easier
to read, but for that reason the patch will not apply on current trunk.
 
Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. Ok 
for trunk?

-Pat


2020-11-13  Pat Haugen  

gcc/
* config/rs6000/rs6000.c (power10_cost): New.
(rs6000_option_override_internal): Set Power10 costs.
(rs6000_issue_rate): Set Power10 issue rate.
* config/rs6000/power10.md: Rewrite for Power10.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 4d528a39a37..85bb42d6dce 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1080,6 +1080,26 @@ struct processor_costs power9_cost = {
   COSTS_N_INSNS (3),   /* SF->DF convert */
 };
 
+/* Instruction costs on POWER10 processors.  */
+static const
+struct processor_costs power10_cost = {
+  COSTS_N_INSNS (1),   /* mulsi */
+  COSTS_N_INSNS (1),   /* mulsi_const */
+  COSTS_N_INSNS (1),   /* mulsi_const9 */
+  COSTS_N_INSNS (1),   /* muldi */
+  COSTS_N_INSNS (4),   /* divsi */
+  COSTS_N_INSNS (4),   /* divdi */
+  COSTS_N_INSNS (2),   /* fp */
+  COSTS_N_INSNS (2),   /* dmul */
+  COSTS_N_INSNS (7),   /* sdiv */
+  COSTS_N_INSNS (9),   /* ddiv */
+  128, /* cache line size */
+  32,  /* l1 cache */
+  512, /* l2 cache */
+  16,  /* prefetch streams */
+  COSTS_N_INSNS (2),   /* SF->DF convert */
+};
+
 /* Instruction costs on POWER A2 processors.  */
 static const
 struct processor_costs ppca2_cost = {
@@ -4734,10 +4754,13 @@ rs6000_option_override_internal (bool global_init_p)
break;
 
   case PROCESSOR_POWER9:
-  case PROCESSOR_POWER10:
rs6000_cost = _cost;
break;
 
+  case PROCESSOR_POWER10:
+   rs6000_cost = _cost;
+   break;
+
   case PROCESSOR_PPCA2:
rs6000_cost = _cost;
break;
@@ -18001,8 +18024,9 @@ rs6000_issue_rate (void)
   case PROCESSOR_POWER8:
 return 7;
   case PROCESSOR_POWER9:
-  case PROCESSOR_POWER10:
 return 6;
+  case PROCESSOR_POWER10:
+return 8;
   default:
 return 1;
   }
diff --git a/gcc/config/rs6000/power10.md b/gcc/config/rs6000/power10.md
new file mode 100644
index 000..f9ca4cbf10e
--- /dev/null
+++ b/gcc/config/rs6000/power10.md
@@ -0,0 +1,553 @@
+;; Scheduling description for the IBM POWER10 processor.
+;; Copyright (C) 2020-2020 Free Software Foundation, Inc.
+;;
+;; Contributed by Pat Haugen (pthau...@us.ibm.com).
+
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+; For Power10 we model (and try to pack) the in-order decode/dispatch groups
+; which consist of 8 instructions max.  We do not try to model the details of
+; the out-of-order issue queues and how insns flow to the various execution
+; units except for the simple representation of the issue limitation of at
+; most 4 insns to the execution units/2 insns to the load units/2 insns to
+; the store units.
+(define_automaton "power10dsp,power10issue,power10div")
+
+; Decode/dispatch slots
+(define_cpu_unit "du0_power10,du1_power10,du2_power10,du3_power10,
+ du4_power10,du5_power10,du6_power10,du7_power10" "power10dsp")
+
+; Four execution units
+(define_cpu_unit "exu0_power10,exu1_power10,exu2_power10,exu3_power10"
+"power10issue")
+; Two load units and two store units
+(define_cpu_unit "lu0_power10,lu1_power10" "power10issue")
+(define_cpu_unit "stu0_power10,stu1_power10" "power10issue")
+; Create false units for use by non-pipelined div/sqrt
+(define_cpu_unit "fx_div0_power10,fx_div1_power10" "power10div")
+(define_cpu_unit "fp_div0_power10,fp_div1_power10,fp_div2_power10,
+ fp_div3_power10" "power10div")
+
+
+; Dispatch slots are allocated in order conforming to program order.
+(absence_set "du0_power10" "du1_power10,du2_power10,du3_power10,du4_power10,\
+  du5_power10,du6_power10,du7_power10")
+(absence_set "du1_power10" "du2_power10,du3_power10,du4_power10,du5_power10,\
+  du6_power10,du7_power10")

Re: [PATCH v5 2/8] libstdc++ futex: Use FUTEX_CLOCK_REALTIME for wait

2020-11-13 Thread Mike Crowe via Gcc-patches
On Thursday 12 November 2020 at 23:07:47 +, Jonathan Wakely wrote:
> On 29/05/20 07:17 +0100, Mike Crowe via Libstdc++ wrote:
> > The futex system call supports waiting for an absolute time if
> > FUTEX_WAIT_BITSET is used rather than FUTEX_WAIT.  Doing so provides two
> > benefits:
> > 
> > 1. The call to gettimeofday is not required in order to calculate a
> >   relative timeout.
> > 
> > 2. If someone changes the system clock during the wait then the futex
> >   timeout will correctly expire earlier or later.  Currently that only
> >   happens if the clock is changed prior to the call to gettimeofday.
> > 
> > According to futex(2), support for FUTEX_CLOCK_REALTIME was added in the
> > v2.6.28 Linux kernel and FUTEX_WAIT_BITSET was added in v2.6.25.  To ensure
> > that the code still works correctly with earlier kernel versions, an ENOSYS
> > error from futex[1] results in the futex_clock_realtime_unavailable flag
> > being set.  This flag is used to avoid the unnecessary unsupported futex
> > call in the future and to fall back to the previous gettimeofday and
> > relative time implementation.
> > 
> > glibc applied an equivalent switch in pthread_cond_timedwait to use
> > FUTEX_CLOCK_REALTIME and FUTEX_WAIT_BITSET rather than FUTEX_WAIT for
> > glibc-2.10 back in 2009.  See
> > glibc:cbd8aeb836c8061c23a5e00419e0fb25a34abee7
> > 
> > The futex_clock_realtime_unavailable flag is accessed using
> > std::memory_order_relaxed to stop it becoming a bottleneck.  If the first
> > two calls to _M_futex_wait_until happen to occur simultaneously then the
> > only consequence is that both will try to use FUTEX_CLOCK_REALTIME, both
> > risk discovering that it doesn't work and, if so, both set the flag.
> > 
> > [1] This is how glibc's nptl-init.c determines whether these flags are
> >supported.
> > 
> > * libstdc++-v3/src/c++11/futex.cc: Add new constants for required
> > futex flags.  Add futex_clock_realtime_unavailable flag to store
> > result of trying to use
> > FUTEX_CLOCK_REALTIME. 
> > (__atomic_futex_unsigned_base::_M_futex_wait_until):
> > Try to use FUTEX_WAIT_BITSET with FUTEX_CLOCK_REALTIME and only
> > fall back to using gettimeofday and FUTEX_WAIT if that's not
> > supported.
> 
> Mike,
> 
> I've been doing some performance comparisons and this patch seems to
> make quite a big difference to code that polls a future by calling
> fut.wait_until(t) using any t < now() as the timeout. For example,
> fut.wait_until(chrono::system_clock::time_point{}) to wait until the
> UNIX epoch.
> 
> With GCC 10 (or with the if (!futex_clock_realtime_unavailable.load(...)
> commented out) I see that polling takes < 100ns. With the change, it
> takes 3000ns or more.
> 
> Now this is still far better than polling using fut.wait_for(0s) which
> takes around 5ns due to the clock_gettime call, but I'm about to
> fix that.
> 
> I'm not sure how important it is for wait_until(past) to be fast, but
> the difference from 100ns to 3000ns seems significant. Do you see the
> same kind of numbers? Is this just a property of the futex wait with
> an absolute time?
> 
> N.B. using wait_until(system_clock::time_point::min()) or any other
> time before the epoch doesn't work. The futex syscall returns EINVAL
> which we don't check for. I'm about to fix that too.

I see similar behaviour. I suppose this is because the
gettimeofday/clock_gettime system calls are in the VDSO and therefore
usually much cheaper to call than the real system call SYS_futex.

If rather than bailing out early when the relative timeout is negative, I
call the relative SYS_futex with rt.tv_sec = rt.tv_nsec = 0 then the
wait_until call takes about ten times longer than when using the absolute
SYS_futex. I can't really explain that.

Calling these functions with a time in the past is probably quite common if
you calculate a single timeout for several operations in sequence. What's
less clear is whether the performance matters that much when the return
value indicates a timeout anyway.

If gettimeofday/clock_gettime are cheap enough then I suppose we can call
them even in the absolute timeout case (losing benefit 1 above, which
appears to not really exist) to get the improved performance for timeouts
in the past whilst retaining the correct behaviour if the clock is warped
that this patch addressed (benefit 2 above.)
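The idea in the previous paragraph, reading the clock before making the absolute futex call so that deadlines in the past return immediately, might look roughly like this. It is only a sketch, not libstdc++'s implementation: `relative_timeout` and `deadline_passed` are hypothetical names, and the code assumes POSIX `clock_gettime` with `CLOCK_REALTIME`.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical helper: compute DEADLINE - NOW as a normalized timespec in
// REL.  Returns false, leaving REL untouched, when the deadline is not in
// the future, so the caller can report a timeout without calling futex.
bool
relative_timeout (const timespec &deadline, const timespec &now,
                  timespec &rel)
{
  // Both inputs are expected to be normalized.
  assert (now.tv_nsec >= 0 && now.tv_nsec < 1000000000L);
  assert (deadline.tv_nsec >= 0 && deadline.tv_nsec < 1000000000L);

  if (now.tv_sec > deadline.tv_sec
      || (now.tv_sec == deadline.tv_sec && now.tv_nsec >= deadline.tv_nsec))
    return false;                       // already timed out

  rel.tv_sec = deadline.tv_sec - now.tv_sec;
  rel.tv_nsec = deadline.tv_nsec - now.tv_nsec;
  if (rel.tv_nsec < 0)                  // borrow one second
    {
      rel.tv_sec -= 1;
      rel.tv_nsec += 1000000000L;
    }
  return true;
}

// Hypothetical pre-check before an absolute FUTEX_WAIT_BITSET wait: read
// CLOCK_REALTIME first and skip the system call for deadlines in the past.
bool
deadline_passed (const timespec &deadline)
{
  timespec now;
  clock_gettime (CLOCK_REALTIME, &now);
  timespec unused;
  return !relative_timeout (deadline, now, unused);
}
```

The same `relative_timeout` computation is what the fallback path using a relative `FUTEX_WAIT` timeout needs, so both paths could share it.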

I'll try to come up with some standalone test cases with results for
further discussion. I suspect that the glibc people will be interested too.

Thanks for investigating this.

Mike.


[Bug middle-end/94527] RFE: Add an __attribute__ that marks a function as freeing an object

2020-11-13 Thread msebor at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94527

--- Comment #12 from Martin Sebor  ---
David Malcolm: I went with "overloading" attribute malloc in my patch for the
reasons I explained in my comments on your patch and in the patch submission
email.  I'm open to changing the name (or the association from the allocator to
the deallocator) so long as it's limited to pointers (with integers and other
handles handled by some other attribute), and provided it lets users specify
the position of the argument in the deallocation function's argument list.

David Howells: If/when you have a chance please comment on the design and let
us know if you have any concerns.

Re: [00/32] C++ 20 Modules

2020-11-13 Thread Mike Stump via Gcc-patches
On Nov 3, 2020, at 1:12 PM, Nathan Sidwell  wrote:
> 
> Here is the implementation of C++20 modules that I have been developing on 
> the devel/c++-modules branch over the last few years.

I was just recently wondering about this.  Congratulations.

> It is some 25K new lines of code (plus testsuite).

> Definitely the most important event of today :)

I agree.

> don't forget to vote.

I vote yes; although, I didn't know we had switched to voting patches in.

:-)

Re: [PATCH] c++: Predefine __STDCPP_THREADS__ in the compiler if thread model is not single

2020-11-13 Thread Jason Merrill via Gcc-patches

On 11/13/20 4:46 PM, Jakub Jelinek wrote:
> On Fri, Nov 13, 2020 at 04:39:25PM -0500, Jason Merrill wrote:
>> On 11/13/20 2:20 PM, Tom Tromey wrote:
>>>>>>> "Jakub" == Jakub Jelinek via Gcc-patches  writes:
>>>
>>> Jakub> 2020-11-13  Jakub Jelinek  
>>>
>>> Jakub>   * c-cppbuiltin.c: Include configargs.h.
>>> Jakub>   (c_cpp_builtins): For C++11 and later if THREAD_MODEL_SPEC is not
>>> Jakub>   defined, predefine __STDCPP_THREADS__ to 1 unless thread_model is
>>> Jakub>   "single".
>>>
>>> Note this is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63287
>>
>> Any opinions about the relative advantage of doing this in the compiler vs.
>> doing it in the library, as in Jonathan's patch in the PR?
>
> If it is done in the library, it will be defined only if any of the library
> headers are included.
> The https://eel.is/c++draft/cpp.predefined wording doesn't look like it
> would allow defining it only if certain headers are included
> (unlike e.g. the __cpp_lib_* macros which have associated list of headers
> that should define those).

Then the patch is OK.

Jason



[Bug middle-end/94527] RFE: Add an __attribute__ that marks a function as freeing an object

2020-11-13 Thread msebor at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94527

--- Comment #11 from Martin Sebor  ---
Patch: https://gcc.gnu.org/pipermail/gcc-patches/2020-November/559053.html

Re: [PATCH] c++: Predefine __STDCPP_THREADS__ in the compiler if thread model is not single

2020-11-13 Thread Jakub Jelinek via Gcc-patches
On Fri, Nov 13, 2020 at 04:39:25PM -0500, Jason Merrill wrote:
> On 11/13/20 2:20 PM, Tom Tromey wrote:
> > > > > > > "Jakub" == Jakub Jelinek via Gcc-patches  writes:
> > 
> > Jakub> 2020-11-13  Jakub Jelinek  
> > 
> > Jakub>  * c-cppbuiltin.c: Include configargs.h.
> > Jakub>  (c_cpp_builtins): For C++11 and later if THREAD_MODEL_SPEC is not
> > Jakub>  defined, predefine __STDCPP_THREADS__ to 1 unless thread_model is
> > Jakub>  "single".
> > 
> > Note this is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63287
> 
> Any opinions about the relative advantage of doing this in the compiler vs.
> doing it in the library, as in Jonathan's patch in the PR?

If it is done in the library, it will be defined only if any of the library
headers are included.
The https://eel.is/c++draft/cpp.predefined wording doesn't look like it
would allow defining it only if certain headers are included
(unlike e.g. the __cpp_lib_* macros which have associated list of headers
that should define those).

Jakub



[PATCH] detect allocation/deallocation mismatches in user-defined functions (PR94527)

2020-11-13 Thread Martin Sebor via Gcc-patches

Bug 94527 is a request from the kernel developers for an attribute
to indicate that a user-defined function deallocates an object
allocated by an earlier call to an allocation function.  Their
goal is to detect misuses of such functions and the pointers or
objects returned from them.

The recently submitted patches(*) enable the detection of a subset
of such misuses for standard allocation functions like malloc and
free, but those are just a small fraction of allocation/deallocation
functions used in practice, and only rarely used in the kernel
(mostly in utility programs). The attached patch extends attribute
malloc to enable this detection also for user-defined functions.

The design extends attribute malloc to accept one or two optional
arguments: one naming a deallocation function that deallocates
pointers returned from the malloc-like function, and another to
denote the position of the pointer argument in the deallocation
function's parameter list.  Any number of deallocators can be
associated with any number of allocators.  This makes it possible
to annotate, for example, all the POSIX  functions that
open and close FILE streams and detect mismatches between any
pairs that aren't suitable (in addition to calling free on
a FILE* returned from fopen, for instance).

An association with an allocator results in adding an internal
"*dealloc" attribute to the deallocator so that the former can
be quickly looked up based on a call to the latter.
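A header using the proposed argument form of attribute malloc might look like the following. The `widget` API is hypothetical, the `malloc (deallocator, argno)` spelling follows the design described above, and the GCC 11 version check is an assumption about where that form is available; other compilers simply see an unannotated declaration.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical library API: every widget returned by widget_create must be
// released by widget_destroy, which takes the pointer as its 1st argument.
struct widget
{
  int id;
};

void
widget_destroy (widget *w)
{
  std::free (w);
}

// Assumed: the two-argument form of attribute malloc is only understood by
// GCC 11 and later, so other compilers get the plain declaration.
#if defined (__GNUC__) && !defined (__clang__) && __GNUC__ >= 11
#  define WIDGET_ALLOCATOR __attribute__ ((malloc, malloc (widget_destroy, 1)))
#else
#  define WIDGET_ALLOCATOR
#endif

WIDGET_ALLOCATOR widget *
widget_create (int id)
{
  widget *w = static_cast<widget *> (std::malloc (sizeof *w));
  if (w)
    w->id = id;
  return w;
}

// Correct pairing.  Calling std::free (w) or delete w here instead is the
// kind of mismatch -Wmismatched-dealloc is meant to diagnose.
inline void
widget_example ()
{
  widget *w = widget_create (7);
  assert (w && w->id == 7);
  widget_destroy (w);
}
```

Because the association is recorded on the allocator, the header only needs one annotation per allocator, and several allocators can name the same deallocator.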

Tested on x86_64-linux + Glibc & Binutils/GDB (no instances
of the new warnings).

Martin

[*] Prerequisite patch
add -Wmismatched-new-delete to middle end (PR 90629)
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557987.html

PS In pr94527 Jonathan notes that failing to properly match pairs
of calls isn't limited to APIs that return pointers and applies
to other kinds of "handles" including integers (e.g., the POSIX
open/close APIs), and a detection of such mismatches would be
helpful as well.  David submitted a prototype of this for
the analyzer here:
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/44.html
I chose not to implement nonpointer detection for some of the same
reasons as mentioned in comment #8 on the bug (and also because
there's no support for it in the machinery I use).  I also didn't
use the same attribute as David, in part because I think it's better
to provide separate attributes for pointer APIs and for others
(integers), and in part also because the deallocated_by attribute
design as is cannot accommodate my goal of supporting all standard
functions (including the  freopen which "deallocates"
the third argument).
PR middle-end/94527 - Add an __attribute__ that marks a function as freeing an object

gcc/ChangeLog:

	PR middle-end/94527
	* builtins.c (gimple_call_alloc_p): Handle user-defined functions.
	(fndecl_alloc_p): New helper.
	(call_dealloc_argno): New helper.
	(gimple_call_dealloc_p): Call it.
	(call_dealloc_p): Same.
	(matching_alloc_calls_p): Handle user-defined functions.
	(maybe_emit_free_warning): Same.
	* doc/extend.texi (attribute malloc): Update.
	* doc/invoke.texi (-Wmismatched-dealloc): Document new option.

gcc/c-family/ChangeLog:

	PR middle-end/94527
	* c-attribs.c (handle_dealloc_attribute): New function.
	(handle_malloc_attribute): Handle argument forms of attribute.
	* c.opt (-Wmismatched-dealloc): New option.
	(-Wmismatched-new-delete): Update description.

gcc/testsuite/ChangeLog:

	PR middle-end/94527
	* g++.dg/warn/Wmismatched-dealloc-2.C: New test.
	* g++.dg/warn/Wmismatched-dealloc.C: New test.
	* gcc.dg/Wmismatched-dealloc.c: New test.
	* gcc.dg/attr-malloc.c: New test.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index ebdded69189..aad99da01c2 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -13014,10 +13014,9 @@ find_assignment_location (tree var)
allocated objects.  Otherwise return true even for all forms of
alloca (including VLA).  */
 
-bool
-gimple_call_alloc_p (gimple *stmt, bool all_alloc /* = false */)
+static bool
+fndecl_alloc_p (tree fndecl, bool all_alloc /* = false */)
 {
-  tree fndecl = gimple_call_fndecl (stmt);
   if (!fndecl)
 return false;
 
@@ -13025,24 +13024,53 @@ gimple_call_alloc_p (gimple *stmt, bool all_alloc /* = false */)
   if (DECL_IS_OPERATOR_NEW_P (fndecl))
 return true;
 
-  /* TODO: Handle user-defined functions with attribute malloc.  */
-  if (!gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
+  if (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL))
+{
+  switch (DECL_FUNCTION_CODE (fndecl))
+	{
+	case BUILT_IN_ALLOCA:
+	case BUILT_IN_ALLOCA_WITH_ALIGN:
+	  return all_alloc;
+	case BUILT_IN_CALLOC:
+	case BUILT_IN_MALLOC:
+	case BUILT_IN_REALLOC:
+	case BUILT_IN_STRDUP:
+	case BUILT_IN_STRNDUP:
+	  return true;
+	default:
+	  break;
+	}
+}
+
+  /* A function is considered an allocation function if it's declared
+ with attribute malloc with an argument naming its associated
+ deallocation function.  */
+  tree attrs = 

Re: [PATCH] RFC: add "deallocated_by" attribute for use by analyzer

2020-11-13 Thread Jeff Law via Gcc-patches


On 10/5/20 5:12 PM, David Malcolm via Gcc-patches wrote:
> This work-in-progress patch generalizes the malloc/free problem-checking
> in -fanalyzer so that it can work on arbitrary acquire/release API pairs.
>
> It adds a new __attribute__((deallocated_by(FOO))) that could be used
> like this in a library header:
>
>   struct foo;
>
>   extern void foo_release (struct foo *);
>
>   extern struct foo *foo_acquire (void)
> __attribute__ ((deallocated_by(foo_release)));
>
> In theory, the analyzer then "knows" these functions are an
> acquire/release pair, and can emit diagnostics for leaks, double-frees,
> use-after-frees, mismatching deallocations, etc.
>
> My hope was that this would provide a minimal level of markup that would
> support library-checking without requiring lots of further markup.
> I attempted to use this to detect a memory leak within a Linux
> driver (CVE-2019-19078), by adding the attribute to mark these fns:
>   extern struct urb *usb_alloc_urb(int iso_packets, gfp_t mem_flags);
>   extern void usb_free_urb(struct urb *urb);
> where there is a leak of a "urb" on an error-handling path.
> Unfortunately I ran into the problem that there are various other fns
> that take "struct urb *" and the analyzer conservatively assumes that a
> urb passed to them might or might not be freed and thus stops tracking
> state for them.
>
> So I don't know how much use this feature would be as-is.
> (without either requiring lots of additional attributes for marking
> fndecl args as being merely borrowed, or simply assuming that they
> are borrowed in the absence of a function body to analyze)
>
> Thoughts?
> Dave
>
> gcc/analyzer/ChangeLog:
>   * region-model-impl-calls.cc
>   (region_model::impl_deallocation_call): New.
>   * region-model.cc: Include "attribs.h".
>   (region_model::on_call_post): Handle fndecls referenced by
>   __attribute__((deallocated_by(FOO))).
>   * region-model.h (region_model::impl_deallocation_call): New decl.
>   * sm-malloc.cc: Include "stringpool.h" and "attribs.h".
>   (enum wording): Add WORDING_DEALLOCATED.
>   (malloc_state_machine::custom_api_map_t): New typedef.
>   (malloc_state_machine::m_custom_apis): New field.
>   (start_p): New.
>   (use_after_free::describe_state_change): Handle
>   WORDING_DEALLOCATED.
>   (use_after_free::describe_final_event): Likewise.
>   (malloc_leak::describe_state_change): Only emit "allocated here" on
>   a start->nonnull transition, rather than on other transitions to
>   nonnull.
>   (malloc_state_machine::~malloc_state_machine): New.
>   (malloc_state_machine::on_stmt): Handle
>   "__attribute__((deallocated_by(FOO)))", and the special attribute
>   set on FOO.
>   (malloc_state_machine::get_or_create_api): New.
>   (malloc_state_machine::on_allocator_call): Add "returns_nonnull"
>   param and use it to affect which state to transition to.
>
> gcc/c-family/ChangeLog:
>   * c-attribs.c (c_common_attribute_table): Add entry for
>   "deallocated_by".
>   (matching_deallocator_type_p): New.
>   (maybe_add_deallocator_attribute): New.
>   (handle_deallocated_by_attribute): New.
>
> gcc/ChangeLog:
>   * doc/extend.texi (Common Function Attributes): Add
>   "deallocated_by".
>
> gcc/testsuite/ChangeLog:
>   * gcc.dg/analyzer/attr-deallocated_by-1.c: New test.
>   * gcc.dg/analyzer/attr-deallocated_by-1a.c: New test.
>   * gcc.dg/analyzer/attr-deallocated_by-2.c: New test.
>   * gcc.dg/analyzer/attr-deallocated_by-3.c: New test.
>   * gcc.dg/analyzer/attr-deallocated_by-4.c: New test.
>   * gcc.dg/analyzer/attr-deallocated_by-CVE-2019-19078-usb-leak.c:
>   New test.
>   * gcc.dg/analyzer/attr-deallocated_by-misuses.c: New test.

I'd probably go with something more like acquire/release since I think
the same concepts apply to things like file descriptors acquired by open
and released by close.  I think the basic concept makes sense and would
be useful, so I'd lean towards moving forward even if it hasn't been
particularly useful for the analyzer yet.  One could even ponder
propagation of the attribute similar to what we do with const/pure so
that we could see through wrappers without the user having to do more
markup.


What I wonder here is whether or not Martin's work could take advantage
of the attribute.   I don't see that as strictly necessary for the patch
to move forward, just a question we should try to answer.


So I don't mind seeing it go forward.  I leave it as your call.


jeff




[Bug libstdc++/96322] 22_locale/numpunct/members/char/3.cc is outdated: expects grouping=0, actual=3

2020-11-13 Thread slyfox at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96322

--- Comment #3 from Sergei Trofimovich  ---
Good point.

I tried pt_PT on FreeBSD to check if it's the same as in Linux and it's not:
there grouping=3 is used. +1 for custom locale.

Re: [PATCH] c++: Predefine __STDCPP_THREADS__ in the compiler if thread model is not single

2020-11-13 Thread Jonathan Wakely via Gcc-patches
>> "Jakub" == Jakub Jelinek via Gcc-patches  writes:
>
>Jakub> * c-cppbuiltin.c: Include configargs.h.
>Jakub> (c_cpp_builtins): For C++11 and later if THREAD_MODEL_SPEC is not
>Jakub> defined, predefine __STDCPP_THREADS__ to 1 unless thread_model is
>Jakub> "single".
>
>Note this is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63287

And FTR the QNX case that Jakub mentioned is now:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97815



Re: [PATCH] c++: Predefine __STDCPP_THREADS__ in the compiler if thread model is not single

2020-11-13 Thread Jason Merrill via Gcc-patches

On 11/13/20 2:20 PM, Tom Tromey wrote:
>>>>> "Jakub" == Jakub Jelinek via Gcc-patches  writes:
>
> Jakub> 2020-11-13  Jakub Jelinek  
>
> Jakub>   * c-cppbuiltin.c: Include configargs.h.
> Jakub>   (c_cpp_builtins): For C++11 and later if THREAD_MODEL_SPEC is not
> Jakub>   defined, predefine __STDCPP_THREADS__ to 1 unless thread_model is
> Jakub>   "single".
>
> Note this is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63287


Any opinions about the relative advantage of doing this in the compiler 
vs. doing it in the library, as in Jonathan's patch in the PR?


Jason



Re: [PATCH] handle conditionals in -Wstringop-overflow et al. (PR 92936)

2020-11-13 Thread Jeff Law via Gcc-patches


On 11/2/20 7:24 PM, Martin Sebor wrote:
> The attached patch extends compute_objsize() to handle conditional
> expressions represented either as PHIs or MIN_EXPR and MAX_EXPR.
>
> To simplify the handling of the -Wstringop-overflow/-overread
> warnings the change factors this code out of tree-ssa-strlen.c
> and into inform_access() in builtins.c, making it a member of
> access_ref.  Besides eliminating a decent amount of code
> duplication this also improves the consistency of the warnings.
>
> Finally, the change introduces a distinction between the definite
> kinds of -Wstringop-overflow (and -Wstringop-overread) warnings
> and the maybe kind.  The latter are currently only being issued
> for function array parameters but I expect to make use of them
> more extensively in the future.
>
> Besides the usual GCC bootstrap/regtest I have tested the change
> with Binutils/GDB and Glibc and verified that it doesn't introduce
> any false positives.
>
> Martin
>
> gcc-92936.diff
>
> PR middle-end/92936 - missing warning on a past-the-end store to a PHI
> PR middle-end/92940 - incorrect offset and size in -Wstringop-overflow for out-of-bounds store into VLA and two offset ranges
> PR middle-end/89428 - missing -Wstringop-overflow on a PHI with variable offset
>
> gcc/ChangeLog:
>
>   PR middle-end/92936
>   PR middle-end/92940
>   PR middle-end/89428
>   * builtins.c (access_ref::access_ref): Initialize member.
>   (access_ref::phi): New function.
>   (access_ref::get_ref): New function.
>   (access_ref::add_offset): Remove duplicate assignment.
>   (maybe_warn_for_bound): Add "maybe" kind of warning messages.
>   (warn_for_access): Same.
>   (inform_access): Rename...
>   (access_ref::inform_access): ...to this.  Print PHI arguments.  Format
>   offset the same as size and simplify.  Improve printing of allocation
>   functions and VLAs.
>   (check_access): Adjust to the above.
>   (gimple_parm_array_size): Change argument.
>   (handle_min_max_size): New function.
>   * builtins.h (struct access_ref): Declare new members.
>   (gimple_parm_array_size): Change argument.
>   * tree-ssa-strlen.c (maybe_warn_overflow): Use access_ref and simplify.
>   (handle_builtin_memcpy): Correct argument passed to maybe_warn_overflow.
>   (handle_builtin_memset): Same.
>
> gcc/testsuite/ChangeLog:
>
>   PR middle-end/92936
>   PR middle-end/92940
>   PR middle-end/89428
>   * c-c++-common/Wstringop-overflow-2.c: Adjust text of expected
>   informational notes.
>   * gcc.dg/Wstringop-overflow-11.c: Remove xfails.
>   * gcc.dg/Wstringop-overflow-12.c: Same.
>   * gcc.dg/Wstringop-overflow-17.c: Adjust text of expected messages.
>   * gcc.dg/Wstringop-overflow-27.c: Same.  Remove xfails.
>   * gcc.dg/Wstringop-overflow-28.c: Adjust text of expected messages.
>   * gcc.dg/Wstringop-overflow-29.c: Same.
>   * gcc.dg/Wstringop-overflow-37.c: Same.
>   * gcc.dg/Wstringop-overflow-46.c: Same.
>   * gcc.dg/Wstringop-overflow-47.c: Same.
>   * gcc.dg/Wstringop-overflow-54.c: Same.
>   * gcc.dg/warn-strnlen-no-nul.c: Add expected warning.
>   * gcc.dg/Wstringop-overflow-58.c: New test.
>   * gcc.dg/Wstringop-overflow-59.c: New test.
>   * gcc.dg/Wstringop-overflow-60.c: New test.
>   * gcc.dg/Wstringop-overflow-61.c: New test.
>   * gcc.dg/Wstringop-overflow-62.c: New test.

So my only significant concern here is the recursive nature and the lack
of a limiter for pathological cases.  We certainly run into cases with
thousands of PHI arguments and deep chains of PHIs feeding other PHIs. 
Can you put in a PARAM to limit the amount of recursion and the number of PHI
arguments you look at?  With that I think this is fine -- I consider it
unlikely this patch is the root cause of the ICEs I sent you earlier
today from the tester since those failures are in the array bounds
checking bits.
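The kind of limiter requested above can be sketched on a toy model of PHI chains. Everything here is hypothetical (`Node`, `SizeRange`, `objsize_range`, and both limits are not GCC names, PARAMs or defaults): the walk returns the range of sizes of the objects a pointer may refer to and gives up once the recursion depth or the number of PHI arguments exceeds a limit, which also guarantees termination on PHI cycles.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of what feeds an access: either a leaf with a known
// object size, or a PHI (or MIN_EXPR/MAX_EXPR) merging other nodes.
struct Node
{
  long size = 0;                    // meaningful only for leaves
  std::vector<const Node *> args;   // empty for a leaf
};

struct SizeRange
{
  long min, max;
  bool known;
};

// Hypothetical limits standing in for the requested PARAMs.
constexpr int max_phi_depth = 8;
constexpr std::size_t max_phi_args = 32;

// Return the range of object sizes N may refer to.  The walk gives up
// (known == false) on deep chains, on PHIs with too many arguments, and,
// through the depth limit, on PHI cycles.
SizeRange
objsize_range (const Node *n, int depth = 0)
{
  if (n->args.empty ())
    return {n->size, n->size, true};
  if (depth >= max_phi_depth || n->args.size () > max_phi_args)
    return {0, 0, false};

  SizeRange r = objsize_range (n->args[0], depth + 1);
  for (std::size_t i = 1; r.known && i < n->args.size (); ++i)
    {
      SizeRange a = objsize_range (n->args[i], depth + 1);
      if (!a.known)
        return a;
      r.min = std::min (r.min, a.min);
      r.max = std::max (r.max, a.max);
    }
  return r;
}

// A store of N bytes through such a pointer overflows every candidate
// object when N > max, and only possibly ("maybe") when N > min.
inline void
objsize_example ()
{
  Node a, b, phi;
  a.size = 8;
  b.size = 16;
  phi.args = {&a, &b};
  SizeRange r = objsize_range (&phi);
  assert (r.known && r.min == 8 && r.max == 16);
}
```

Returning "unknown" when a limit is hit keeps the warning conservative: it simply stays silent for pathological inputs instead of spending unbounded time on them.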


jeff




[pushed] c++: Add feature test macro for C++20 using enum.

2020-11-13 Thread Jason Merrill via Gcc-patches
Missing piece from the 'using enum' implementation patch.

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/c-family/ChangeLog:

* c-cppbuiltin.c (c_cpp_builtins): Define __cpp_using_enum.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/feat-cxx2a.C: Check it.
---
 gcc/c-family/c-cppbuiltin.c | 1 +
 gcc/testsuite/g++.dg/cpp2a/feat-cxx2a.C | 6 ++
 2 files changed, 7 insertions(+)

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 15f71e681ac..7229c59dd6b 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -1005,6 +1005,7 @@ c_cpp_builtins (cpp_reader *pfile)
  cpp_define (pfile, "__cpp_constexpr_dynamic_alloc=201907L");
  cpp_define (pfile, "__cpp_impl_three_way_comparison=201907L");
  cpp_define (pfile, "__cpp_aggregate_paren_init=201902L");
+ cpp_define (pfile, "__cpp_using_enum=201907L");
}
   if (flag_concepts)
 {
diff --git a/gcc/testsuite/g++.dg/cpp2a/feat-cxx2a.C b/gcc/testsuite/g++.dg/cpp2a/feat-cxx2a.C
index 7f1fe34ad7f..dc46e48f6d7 100644
--- a/gcc/testsuite/g++.dg/cpp2a/feat-cxx2a.C
+++ b/gcc/testsuite/g++.dg/cpp2a/feat-cxx2a.C
@@ -533,3 +533,9 @@
 #elif __cpp_concepts != 201907
 #  error "__cpp_concepts != 201907"
 #endif
+
+#ifndef __cpp_using_enum
+#  error "__cpp_using_enum"
+#elif __cpp_using_enum != 201907
+#  error "__cpp_using_enum != 201907"
+#endif

base-commit: 1a90e99fa2f2423a195301adca060dccc3d0755e
-- 
2.18.4
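For illustration, a hedged sketch of how user code might consume the new macro: check `__cpp_using_enum` against 201907L and only use `using enum` when it is advertised. The `channel` enum and `channel_index` function are made up for this example.

```cpp
#include <cassert>

enum class channel { red, green, blue };

// Map each channel to a small index.  With C++20 'using enum' the
// enumerators can be named unqualified inside the switch; otherwise
// fall back to qualified names.
int channel_index(channel c)
{
#if defined(__cpp_using_enum) && __cpp_using_enum >= 201907L
  switch (c)
    {
      using enum channel;
    case red:   return 0;
    case green: return 1;
    case blue:  return 2;
    }
#else
  switch (c)
    {
    case channel::red:   return 0;
    case channel::green: return 1;
    case channel::blue:  return 2;
    }
#endif
  assert(false && "invalid channel value");
  return -1;
}
```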



Re: [committed] libstdc++: Optimise std::future::wait_for and fix futex polling

2020-11-13 Thread Jonathan Wakely via Gcc-patches

On 13/11/20 21:12 +, Jonathan Wakely wrote:

On 13/11/20 20:29 +, Mike Crowe via Libstdc++ wrote:

On Friday 13 November 2020 at 17:25:22 +, Jonathan Wakely wrote:

+  // Return the relative duration from (now_s + now_ns) to (abs_s + abs_ns)
+  // as a timespec.
+  struct timespec
+  relative_timespec(chrono::seconds abs_s, chrono::nanoseconds abs_ns,
+   time_t now_s, long now_ns)
+  {
+struct timespec rt;
+
+// Did we already time out?
+if (now_s > abs_s.count())
+  {
+   rt.tv_sec = -1;
+   return rt;
+  }
+
+auto rel_s = abs_s.count() - now_s;
+
+// Avoid overflows
+if (rel_s > __gnu_cxx::__int_traits::__max)
+  rel_s = __gnu_cxx::__int_traits::__max;
+else if (rel_s < __gnu_cxx::__int_traits::__min)
+  rel_s = __gnu_cxx::__int_traits::__min;


I may be missing something, but if the line above executes...


+
+// Convert the absolute timeout value to a relative timeout
+rt.tv_sec = rel_s;
+rt.tv_nsec = abs_ns.count() - now_ns;
+if (rt.tv_nsec < 0)
+  {
+   rt.tv_nsec += 1000000000;
+   --rt.tv_sec;


...and so does this line above, then I think that we'll end up
underflowing. (Presumably rt.tv_sec will wrap round to being some time in
2038 on most 32-bit targets.)


Ugh.


I'm currently trying to persuade myself that this can actually happen and
if so work out how to come up with a test case for it.


Maybe something like:

auto d = chrono::floor(system_clock::now().time_since_epoch() 
- seconds(INT_MAX + 2LL));
fut.wait_until(system_clock::time_point(d));

This will create a sys_time with a value that is slightly more than
INT_MAX seconds before the current time, with a zero nanoseconds
component. The difference between the gettimeofday result and this
time will be slightly more negative than INT_MIN and so will overflow
a 32-bit time_t, causing the code to use __int_traits::__min.
As long as the gettimeofday call doesn't happen to also have a zero
nanoseconds component, the difference of the nanoseconds values will
be negative, and we will decrement the time_t value.


I don't believe that this part changed in your later patch.


Right. Thanks for catching this.

The attached patch should fix it. There's no point clamping very
negative values to the minimum time_t value, since any relative
timeout less than zero has already passed. So we can just use -1
there (and not bother with the tv_nsec field at all):

   else if (rel_s <= 0) [[unlikely]]
 {
rt.tv_sec = -1;
 }


Gah, no, this has to be < 0 not <= 0 otherwise we treat 0.1s as -1.1

And we should probably lose the [[unlikely]] there, since it's quite
feasible that an absolute time in the recent past gets used.
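A self-contained sketch of the conversion with the correction above applied (`rel_s < 0` rather than `<= 0`), assuming the caller treats a negative tv_sec as "already timed out". To exercise the 32-bit case on any host, it uses a hypothetical `timespec32` with an `int32_t` seconds field instead of `struct timespec`. It is not the committed libstdc++ code.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>

// Hypothetical timespec with a 32-bit tv_sec, so the 32-bit time_t case
// can be exercised even where time_t is 64 bits.
struct timespec32
{
  std::int32_t tv_sec;
  long tv_nsec;
};

// Relative timeout from (now_s, now_ns) to (abs_s, abs_ns).
// A result with tv_sec == -1 means the deadline has already passed.
timespec32
relative_timespec32(std::chrono::seconds abs_s,
                    std::chrono::nanoseconds abs_ns,
                    std::int64_t now_s, long now_ns)
{
  assert(now_ns >= 0 && now_ns < 1000000000);
  assert(abs_ns.count() >= 0 && abs_ns.count() < 1000000000);

  using limits = std::numeric_limits<std::int32_t>;
  timespec32 rt{};
  // Subtract in 64 bits; narrow only after the range checks.
  const std::int64_t rel_s = abs_s.count() - now_s;

  if (rel_s > limits::max())
    {
      // Too far in the future to represent: clamp to the largest value.
      rt.tv_sec = limits::max();
      rt.tv_nsec = 999999999;
    }
  else if (rel_s < 0) // not <= 0, or 0.1s remaining would look expired
    {
      rt.tv_sec = -1;
    }
  else
    {
      rt.tv_sec = static_cast<std::int32_t>(rel_s);
      rt.tv_nsec = abs_ns.count() - now_ns;
      if (rt.tv_nsec < 0)
        {
          // Borrow one second; with rel_s == 0 this yields -1 (expired),
          // and it can no longer wrap because rel_s >= 0 here.
          rt.tv_nsec += 1000000000;
          --rt.tv_sec;
        }
    }
  return rt;
}
```

Because the clamp to the minimum value is gone, the decrement only ever runs on a non-negative seconds value, which removes the underflow discussed in this thread.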




Re: [committed] libstdc++: Optimise std::future::wait_for and fix futex polling

2020-11-13 Thread Jonathan Wakely via Gcc-patches

On 13/11/20 20:29 +, Mike Crowe via Libstdc++ wrote:

On Friday 13 November 2020 at 17:25:22 +, Jonathan Wakely wrote:

+  // Return the relative duration from (now_s + now_ns) to (abs_s + abs_ns)
+  // as a timespec.
+  struct timespec
+  relative_timespec(chrono::seconds abs_s, chrono::nanoseconds abs_ns,
+   time_t now_s, long now_ns)
+  {
+struct timespec rt;
+
+// Did we already time out?
+if (now_s > abs_s.count())
+  {
+   rt.tv_sec = -1;
+   return rt;
+  }
+
+auto rel_s = abs_s.count() - now_s;
+
+// Avoid overflows
+if (rel_s > __gnu_cxx::__int_traits::__max)
+  rel_s = __gnu_cxx::__int_traits::__max;
+else if (rel_s < __gnu_cxx::__int_traits::__min)
+  rel_s = __gnu_cxx::__int_traits::__min;


I may be missing something, but if the line above executes...


+
+// Convert the absolute timeout value to a relative timeout
+rt.tv_sec = rel_s;
+rt.tv_nsec = abs_ns.count() - now_ns;
+if (rt.tv_nsec < 0)
+  {
+   rt.tv_nsec += 1000000000;
+   --rt.tv_sec;


...and so does this line above, then I think that we'll end up
underflowing. (Presumably rt.tv_sec will wrap round to being some time in
2038 on most 32-bit targets.)


Ugh.


I'm currently trying to persuade myself that this can actually happen and
if so work out how to come up with a test case for it.


Maybe something like:

auto d = chrono::floor(system_clock::now().time_since_epoch() 
- seconds(INT_MAX + 2LL));
fut.wait_until(system_clock::time_point(d));

This will create a sys_time with a value that is slightly more than
INT_MAX seconds before the current time, with a zero nanoseconds
component. The difference between the gettimeofday result and this
time will be slightly more negative than INT_MIN and so will overflow
a 32-bit time_t, causing the code to use __int_traits::__min.
As long as the gettimeofday call doesn't happen to also have a zero
nanoseconds component, the difference of the nanoseconds values will
be negative, and we will decrement the time_t value.
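The test case proposed above, written out as a hedged C++17 sketch (C++17 is needed for `std::chrono::floor`). The function name and its out-parameter are hypothetical. A caller would check both that the status is `timeout` and that `elapsed` is small, since the bug shows up as blocking for years. On a host with a 64-bit time_t the overflow path is not reached, so this only exercises the bug where time_t is 32 bits.

```cpp
#include <cassert>
#include <chrono>
#include <climits>
#include <future>

// Wait on a never-satisfied future with a system_clock deadline slightly
// more than INT_MAX seconds in the past, floored to whole seconds so the
// deadline's nanoseconds component is zero.
std::future_status
wait_until_distant_past(std::chrono::steady_clock::duration &elapsed)
{
  using namespace std::chrono;
  std::promise<int> p;
  std::future<int> fut = p.get_future();   // never made ready

  auto d = std::chrono::floor<seconds>(system_clock::now().time_since_epoch()
                                       - seconds(INT_MAX + 2LL));
  auto start = steady_clock::now();
  std::future_status s = fut.wait_until(system_clock::time_point(d));
  elapsed = steady_clock::now() - start;
  assert(elapsed >= steady_clock::duration::zero());
  return s;
}
```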


I don't believe that this part changed in your later patch.


Right. Thanks for catching this.

The attached patch should fix it. There's no point clamping very
negative values to the minimum time_t value, since any relative
timeout less than zero has already passed. So we can just use -1
there (and not bother with the tv_nsec field at all):

else if (rel_s <= 0) [[unlikely]]
  {
rt.tv_sec = -1;
  }

But please check my working.




Oh how I wish all these functions didn't expect already cracked seconds and
nanoseconds. :(


Indeed.

At some point (probably not for GCC 11) I'm going to add a proper
wrapper class for all these futex calls, and make it robust, and then
replace the various hand-crafted uses of syscall(SYS_futex, ...) with
that type so we only need to fix all this in one place. And it will
just take a duration or time_point.


commit a77f131249202c7342cc385f2d0473ee032ae0eb
Author: Jonathan Wakely 
Date:   Fri Nov 13 20:57:15 2020

libstdc++: Fix another 32-bit time_t overflow in futex timeouts

libstdc++-v3/ChangeLog:

* src/c++11/futex.cc (relative_timespec): Avoid overflow if the
difference of seconds is the minimum time_t value and the
difference of nanoseconds is also negative.

diff --git a/libstdc++-v3/src/c++11/futex.cc b/libstdc++-v3/src/c++11/futex.cc
index 15959cebee57..50768403cc63 100644
--- a/libstdc++-v3/src/c++11/futex.cc
+++ b/libstdc++-v3/src/c++11/futex.cc
@@ -75,19 +75,25 @@ namespace
 
 auto rel_s = abs_s.count() - now_s;
 
-// Avoid overflows
+// Convert the absolute timeout to a relative timeout, without overflow.
 if (rel_s > __int_traits::__max) [[unlikely]]
-  rel_s = __int_traits::__max;
-else if (rel_s < __int_traits::__min) [[unlikely]]
-  rel_s = __int_traits::__min;
-
-// Convert the absolute timeout value to a relative timeout
-rt.tv_sec = rel_s;
-rt.tv_nsec = abs_ns.count() - now_ns;
-if (rt.tv_nsec < 0)
   {
-	rt.tv_nsec += 1000000000;
-	--rt.tv_sec;
+	rt.tv_sec = __int_traits::__max;
+	rt.tv_nsec = 999999999;
+  }
+else if (rel_s <= 0) [[unlikely]]
+  {
+	rt.tv_sec = -1;
+  }
+else
+  {
+	rt.tv_sec = rel_s;
+	rt.tv_nsec = abs_ns.count() - now_ns;
+	if (rt.tv_nsec < 0)
+	  {
+	rt.tv_nsec += 1000000000;
+	--rt.tv_sec;
+	  }
   }
 
 return rt;


Re: [RFC, Instruction Scheduler, Stage1] New hook/code to perform fusion of dependent instructions

2020-11-13 Thread Pat Haugen via Gcc-patches
On 11/12/20 10:05 AM, Jeff Law wrote:
>> I have coded up a proof of concept that implements our needs via a new
>> target hook. The hook is passed a pair of dependent insns and returns whether
>> they are a fusion candidate. It is called while removing the forward
>> dependencies of the just scheduled insn. If a dependent insn becomes
>> available to schedule and it's a fusion candidate with the just
>> scheduled insn, then the new code moves it to the ready list (if
>> necessary) and marks it as SCHED_GROUP (piggy-backing on the existing
>> code used by TARGET_SCHED_MACRO_FUSION) to make sure the fusion
>> candidate will be scheduled next. Following is the scheduling part of
>> the diff. Does this sound like a feasible approach? I welcome any
>> comments/discussion.
> It looks fairly reasonable to me.   Do you plan on trying to take this
> forward at all?

Due to other requirements where this approach did not work, we are pursuing a 
different approach of creating combine patterns for the target.

-Pat
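For readers unfamiliar with the proposal quoted above, here is a rough C++ model of the scheduling step it describes. Everything in it (`insn`, `fusion_hook`, `resolve_forward_deps`) is hypothetical and far simpler than GCC's scheduler. Placing the candidate at the front of the ready list stands in for the SCHED_GROUP marking.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified instruction node.
struct insn
{
  int id;
  int unresolved_deps = 0;        // forward deps not yet satisfied
  std::vector<insn *> consumers;  // insns that depend on this one
  bool sched_group = false;       // "must be scheduled next" marker
};

// Stand-in for the proposed target hook: is (prev, next) a fusion pair?
using fusion_hook = bool (*)(const insn &prev, const insn &next);

// Called after SCHEDULED issues: resolve its forward dependencies and
// move newly ready consumers onto READY.  A ready consumer accepted by
// the hook is marked as a group and put at the front so it is picked next.
void resolve_forward_deps(insn &scheduled, std::vector<insn *> &ready,
                          fusion_hook is_fusion_candidate)
{
  for (insn *c : scheduled.consumers)
    {
      assert(c->unresolved_deps > 0);
      if (--c->unresolved_deps != 0)
        continue;                 // still waiting on other producers
      if (is_fusion_candidate(scheduled, *c))
        {
          c->sched_group = true;
          ready.insert(ready.begin(), c);
        }
      else
        ready.push_back(c);
    }
}
```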



Re: [PATCH] nvptx: Cache stacks block for OpenMP kernel launch

2020-11-13 Thread Julian Brown
Hi Alexander,

Thanks for the review! Comments below.

On Tue, 10 Nov 2020 00:32:36 +0300
Alexander Monakov  wrote:

> On Mon, 26 Oct 2020, Jakub Jelinek wrote:
> 
> > On Mon, Oct 26, 2020 at 07:14:48AM -0700, Julian Brown wrote:  
> > > This patch adds caching for the stack block allocated for
> > > offloaded OpenMP kernel launches on NVPTX. This is a performance
> > > optimisation -- we observed an average 11% or so performance
> > > improvement with this patch across a set of accelerated GPU
> > > benchmarks on one machine (results vary according to individual
> > > benchmark and with hardware used).  
> 
> In this patch you're folding two changes together: reuse of allocated
> stacks and removing one host-device synchronization.  Why is that?
> Can you report performance change separately for each change (and
> split out the patches)?

An accident of the development process of the patch, really -- the idea
for removing the post-kernel-launch synchronisation came from the
OpenACC side, and adapting it to OpenMP meant the stacks had to remain
allocated after the return of the GOMP_OFFLOAD_run function.

> > > A given kernel launch will reuse the stack block from the
> > > previous launch if it is large enough, else it is freed and
> > > reallocated. A slight caveat is that memory will not be freed
> > > until the device is closed, so e.g. if code is using highly
> > > variable launch geometries and large amounts of GPU RAM, you
> > > might run out of resources slightly quicker with this patch.
> > > 
> > > Another way this patch gains performance is by omitting the
> > > synchronisation at the end of an OpenMP offload kernel launch --
> > > it's safe for the GPU and CPU to continue executing in parallel
> > > at that point, because e.g. copies-back from the device will be
> > > synchronised properly with kernel completion anyway.  
> 
> I don't think this explanation is sufficient. My understanding is
> that OpenMP forbids the host to proceed asynchronously after the
> target construct unless it is a 'target nowait' construct. This may
> be observable if there's a printf in the target region for example
> (or if it accesses memory via host pointers).
> 
> So this really needs to be a separate patch with more explanation why
> this is okay (if it is okay).

As long as the offload kernel only touches GPU memory and does not have
any CPU-visible side effects (like the printf you mentioned -- I hadn't
really considered that, oops!), it's probably OK.

But anyway, the benefit obtained on OpenMP code (the same set of
benchmarks run before) of omitting the synchronisation at the end of
GOMP_OFFLOAD_run seems minimal. So it's good enough to just do the
stacks caching, and miss out the synchronisation removal for now. (It
might still be something worth considering later, perhaps, as long as
we can show some given kernel doesn't use printf or access memory via
host pointers -- I guess the former might be easier than the latter. I
have observed the equivalent OpenACC patch provide a significant boost
on some benchmarks, so there's probably something that could be gained
on the OpenMP side too.)

The benefit with the attached patch -- just stacks caching, no
synchronisation removal -- is about 12% on the same set of benchmarks
as before. Results are a little noisy on the machine I'm benchmarking
on, so this isn't necessarily proof that the synchronisation removal is
harmful for performance!
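A minimal sketch of the reuse policy described in the patch: reuse the previous block if it is large enough, otherwise free it and reallocate, and release it only when the device is closed. `std::malloc`/`std::free` stand in for the CUDA driver allocator (e.g. cuMemAlloc/cuMemFree), and the names are hypothetical. This is not the nvptx plugin code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical per-device cache of the soft-stacks block.
struct stacks_cache
{
  void *ptr = nullptr;
  std::size_t size = 0;
};

// Return a block of at least NEEDED bytes, reusing the cached block when
// it is large enough and otherwise replacing it.
void *
get_stacks(stacks_cache &cache, std::size_t needed)
{
  assert(needed > 0);
  if (cache.ptr && cache.size >= needed)
    return cache.ptr;            // reuse: no allocator round trip
  std::free(cache.ptr);          // too small (or none yet): replace it
  cache.ptr = std::malloc(needed);
  cache.size = cache.ptr ? needed : 0;
  return cache.ptr;
}

// The cached block is only released when the device is closed.
void
close_device(stacks_cache &cache)
{
  std::free(cache.ptr);
  cache.ptr = nullptr;
  cache.size = 0;
}
```

The trade-off mentioned in the patch follows directly: the largest block requested so far stays allocated until `close_device`.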

> > > In turn, the last part necessitates a change to the way "(perhaps
> > > abort was called)" errors are detected and reported.  
> 
> As already mentioned using callbacks is problematic. Plus, I'm sure
> the way you lock out other threads is a performance loss when
> multiple threads have target regions: even though they will not run
> concurrently on the GPU, you still want to allow host threads to
> submit GPU jobs while the GPU is occupied.
> 
> I would suggest to have a small pool (up to 3 entries perhaps) of
> stacks. Then you can arrange reuse without totally serializing host
> threads on target regions.
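A hedged sketch of the small-pool alternative suggested above. It keeps up to three cached blocks and holds a lock only while an entry is chosen or (re)allocated, so host threads are not serialized for the whole kernel. The class and its methods are hypothetical, `std::malloc`/`std::free` again stand in for the device allocator, and the sketch does not address the memory-footprint concern raised in the reply.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>

class stacks_pool
{
  struct entry { void *ptr = nullptr; std::size_t size = 0; bool in_use = false; };
  static constexpr int max_entries = 3;
  entry entries_[max_entries];
  std::mutex lock_;  // held while choosing/allocating, not for the kernel

public:
  // Return an entry index whose block holds at least NEEDED bytes, or -1
  // if every entry is busy (the caller could wait or allocate directly).
  int acquire(std::size_t needed)
  {
    std::lock_guard<std::mutex> g(lock_);
    int victim = -1;
    for (int i = 0; i < max_entries; ++i)
      {
        entry &e = entries_[i];
        if (e.in_use)
          continue;
        if (e.ptr && e.size >= needed)
          {
            e.in_use = true;     // reuse a free, large-enough block
            return i;
          }
        if (victim < 0)
          victim = i;            // first free slot to (re)allocate
      }
    if (victim < 0)
      return -1;
    entry &e = entries_[victim];
    std::free(e.ptr);
    e.ptr = std::malloc(needed);
    if (!e.ptr)
      {
        e.size = 0;
        return -1;
      }
    e.size = needed;
    e.in_use = true;
    return victim;
  }

  void *block(int i) { return entries_[i].ptr; }

  void release(int i)
  {
    std::lock_guard<std::mutex> g(lock_);
    assert(entries_[i].in_use);
    entries_[i].in_use = false;
  }

  ~stacks_pool()
  {
    for (entry &e : entries_)
      std::free(e.ptr);
  }
};
```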

I'm really wary of the additional complexity of adding a stack pool,
and the memory allocation/freeing code paths in CUDA appear to be so
slow that we get a benefit with this patch even when the GPU stream has
to wait for the CPU to unlock the stacks block. Also, for large GPU
launches, the size of the soft-stacks block isn't really trivial (I've
seen something like 50MB on the hardware I'm using, with default
options), and multiplying that by 3 could start to eat into the GPU
heap memory for "useful data" quite significantly.

Consider the attached (probably not amazingly-written) microbenchmark.
It spawns 8 threads which each launch lots of OpenMP kernels
performing some trivial work, then joins the threads and checks the
results. As a baseline, with the "FEWER_KERNELS" parameters set (256
kernel launches over 8 threads), this gives us over 5 runs:

real    3m55.375s
user    7m14.192s
sys     0m30.148s

real

float.h: C2x *_IS_IEC_60559 macros

2020-11-13 Thread Joseph Myers
C2x adds float.h macros that say whether float, double and long double
match an IEC 60559 (IEEE 754) format and operations.  Add these
macros to GCC's float.h.

Bootstrapped with no regressions for x86_64-pc-linux-gnu.  OK to commit?

Although this changes the same part of float.h as

(pending review), there are no actual dependencies between the
patches; new tests are named to avoid conflicts with the tests added
in that patch.

gcc/c-family/
2020-11-13  Joseph Myers  

* c-cppbuiltin.c (builtin_define_float_constants): Define
*_IS_IEC_60559__ macros.

gcc/
2020-11-13  Joseph Myers  

* ginclude/float.h [__STDC_VERSION__ > 201710L] (FLT_IS_IEC_60559,
DBL_IS_IEC_60559, LDBL_IS_IEC_60559): New macros.

gcc/testsuite/
2020-11-13  Joseph Myers  

* gcc.dg/c11-float-6.c, gcc.dg/c2x-float-10.c: New tests.
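To make the encoding concrete, here is a C++ restatement of the expression the patch uses in builtin_define_float_constants. `format_info` is a hypothetical two-field stand-in for GCC's internal `struct real_format`, and `is_iec_60559_value` is a made-up name.

```cpp
// Hypothetical stand-in for the two real_format fields the patch reads.
struct format_info
{
  int ieee_bits;            // 0 if the format is not an IEEE format
  bool round_towards_zero;  // format rounds toward zero
};

// Value chosen for __<TYPE>_IS_IEC_60559__ in this patch:
// 0 = no IEC 60559 format, 1 = format only, 2 = format and operations.
constexpr int is_iec_60559_value(format_info f)
{
  return f.ieee_bits == 0 ? 0 : (f.round_towards_zero ? 1 : 2);
}
```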

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index e5ebb79e22a..5a839d7fa6f 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -316,6 +316,16 @@ builtin_define_float_constants (const char *name_prefix,
   sprintf (name, "__FP_FAST_FMA%s", fma_suffix);
   builtin_define_with_int_value (name, 1);
 }
+
+  /* For C2x *_IS_IEC_60559.  0 means the type does not match an IEC
+ 60559 format, 1 that it matches a format but not operations and 2
+ that it matches a format and operations (but may not conform to
+ Annex F; we take this as meaning exceptions and rounding modes
+ need not be supported).  */
+  sprintf (name, "__%s_IS_IEC_60559__", name_prefix);
+  builtin_define_with_int_value (name,
+(fmt->ieee_bits == 0
+ ? 0 : (fmt->round_towards_zero ? 1 : 2)));
 }
 
 /* Define __DECx__ constants for TYPE using NAME_PREFIX and SUFFIX. */
diff --git a/gcc/ginclude/float.h b/gcc/ginclude/float.h
index 9c4b0385568..70149564ff1 100644
--- a/gcc/ginclude/float.h
+++ b/gcc/ginclude/float.h
@@ -248,6 +248,15 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define DBL_NORM_MAX   __DBL_NORM_MAX__
 #define LDBL_NORM_MAX  __LDBL_NORM_MAX__
 
+/* Whether each type matches an IEC 60559 format (1 for format, 2 for
+   format and operations).  */
+#undef FLT_IS_IEC_60559
+#undef DBL_IS_IEC_60559
+#undef LDBL_IS_IEC_60559
+#define FLT_IS_IEC_60559   __FLT_IS_IEC_60559__
+#define DBL_IS_IEC_60559   __DBL_IS_IEC_60559__
+#define LDBL_IS_IEC_60559  __LDBL_IS_IEC_60559__
+
 #endif /* C2X */
 
 #ifdef __STDC_WANT_IEC_60559_BFP_EXT__
diff --git a/gcc/testsuite/gcc.dg/c11-float-6.c b/gcc/testsuite/gcc.dg/c11-float-6.c
new file mode 100644
index 000..b0381e57884
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c11-float-6.c
@@ -0,0 +1,17 @@
+/* Test *_IS_IEC_60559 not defined for C11.  */
+/* { dg-do preprocess } */
+/* { dg-options "-std=c11 -pedantic-errors" } */
+
+#include 
+
+#ifdef FLT_IS_IEC_60559
+#error "FLT_IS_IEC_60559 defined"
+#endif
+
+#ifdef DBL_IS_IEC_60559
+#error "DBL_IS_IEC_60559 defined"
+#endif
+
+#ifdef LDBL_IS_IEC_60559
+#error "LDBL_IS_IEC_60559 defined"
+#endif
diff --git a/gcc/testsuite/gcc.dg/c2x-float-10.c b/gcc/testsuite/gcc.dg/c2x-float-10.c
new file mode 100644
index 000..7b53a6ab050
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c2x-float-10.c
@@ -0,0 +1,33 @@
+/* Test *_IS_IEC_60559 macros.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c2x -pedantic-errors" } */
+
+#include 
+
+#ifndef FLT_IS_IEC_60559
+#error "FLT_IS_IEC_60559 undefined"
+#endif
+
+#ifndef DBL_IS_IEC_60559
+#error "DBL_IS_IEC_60559 undefined"
+#endif
+
+#ifndef LDBL_IS_IEC_60559
+#error "LDBL_IS_IEC_60559 undefined"
+#endif
+
+#if defined __pdp11__ || defined __vax__
+_Static_assert (FLT_IS_IEC_60559 == 0);
+_Static_assert (DBL_IS_IEC_60559 == 0);
+_Static_assert (LDBL_IS_IEC_60559 == 0);
+#else
+_Static_assert (FLT_IS_IEC_60559 == 2);
+_Static_assert (DBL_IS_IEC_60559 == 2);
+#if LDBL_MANT_DIG == 106 || LDBL_MIN_EXP == -16382
+/* IBM long double and m68k extended format do not meet the definition
+   of an IEC 60559 interchange or extended format.  */
+_Static_assert (LDBL_IS_IEC_60559 == 0);
+#else
+_Static_assert (LDBL_IS_IEC_60559 == 2);
+#endif
+#endif

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [committed] libstdc++: Optimise std::future::wait_for and fix futex polling

2020-11-13 Thread Mike Crowe via Gcc-patches
On Friday 13 November 2020 at 17:25:22 +, Jonathan Wakely wrote:
> On 13/11/20 11:02 +, Jonathan Wakely wrote:
> > On 12/11/20 23:49 +, Jonathan Wakely wrote:
> > > To poll a std::future to see if it's ready you have to call one of the
> > > timed waiting functions. The most obvious way is wait_for(0s) but this
> > > was previously very inefficient because it would turn the relative
> > > timeout to an absolute one by calling system_clock::now(). When the
> > > relative timeout is zero (or less) we're obviously going to get a time
> > > that has already passed, but the overhead of obtaining the current time
> > > can be dozens of microseconds. The alternative is to call wait_until
> > > with an absolute timeout that is in the past. If you know the clock's
> > > epoch is in the past you can use a default constructed time_point.
> > > Alternatively, using some_clock::time_point::min() gives the earliest
> > > time point supported by the clock, which should be safe to assume is in
> > > the past. However, using a futex wait with an absolute timeout before
> > > the UNIX epoch fails and sets errno=EINVAL. The new code using futex
> > > waits with absolute timeouts was not checking for this case, which could
> > > result in hangs (or killing the process if the library is built with
> > > assertions enabled).
> > > 
> > > This patch checks for times before the epoch before attempting to wait
> > > on a futex with an absolute timeout, which fixes the hangs or crashes.
> > > It also makes it very fast to poll using an absolute timeout before the
> > > epoch (because we skip the futex syscall).
> > > 
> > > It also makes future::wait_for avoid waiting at all when the relative
> > > timeout is zero or less, to avoid the unnecessary overhead of getting
> > > the current time. This makes polling with wait_for(0s) take only a few
> > > cycles instead of dozens of microseconds.
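A short sketch of the two polling styles described above, using only standard `std::future` members. The helper names are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <future>

// Relative poll: a zero timeout never blocks.
bool poll_ready(const std::future<int> &f)
{
  assert(f.valid());
  return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}

// Absolute poll: a default-constructed time_point is the clock's epoch,
// which for system_clock is in the past.
bool poll_ready_abs(const std::future<int> &f)
{
  assert(f.valid());
  return f.wait_until(std::chrono::system_clock::time_point())
         == std::future_status::ready;
}
```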
> > > 
> > > libstdc++-v3/ChangeLog:
> > > 
> > >   * include/std/future (future::wait_for): Do not wait for
> > >   durations less than or equal to zero.
> > >   * src/c++11/futex.cc (_M_futex_wait_until)
> > >   (_M_futex_wait_until_steady): Do not wait for timeouts before
> > >   the epoch.
> > >   * testsuite/30_threads/future/members/poll.cc: New test.
> > > 
> > > Tested powerpc64le-linux. Committed to trunk.
> > > 
> > > I think the shortcut in future::wait_for is worth backporting. The
> > > changes in src/c++11/futex.cc are not needed because the code using
> > > absolute timeouts with futex waits is not present on any release
> > > branch.
> > 
> > I've committed this fix for the new test.
> 
> Backporting the change to gcc-10 revealed an overflow bug in the
> existing code, resulting in blocking for years when given an absolute
> timeout in the distant past. There's still a similar bug in the new
> code (using futexes with absolute timeouts against clocks) where a
> large chrono::seconds value can overflow and produce an incorrect
> tv_sec value. Apart from the overflow itself being UB, the result in
> that case is just a spurious wakeup (the call says it timed out when
> it didn't reach the specified time). That should still be fixed, but
> I'll do it separately.
> 
> Tested x86_64-linux. Committed to trunk.
> 
> 

> commit e7e0eeeb6e6707be2a6c6da49d4b6be3199e2af8
> Author: Jonathan Wakely 
> Date:   Fri Nov 13 15:19:04 2020
> 
> libstdc++: Avoid 32-bit time_t overflows in futex calls
> 
> The existing code doesn't check whether the chrono::seconds value is out
> of range of time_t. When using a timeout before the epoch (with a
> negative value) subtracting the current time (as time_t) and then
> assigning it to a time_t can overflow to a large positive value. This
> means that we end up waiting several years even though the specified
> timeout was in the distant past.
> 
> We do have a check for negative timeouts, but that happens after the
> conversion to time_t so happens after the overflow.
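A tiny illustration of the overflow described above, assuming a 32-bit time_t modelled with `std::int32_t`. The function name is hypothetical. Before C++20 the out-of-range conversion is implementation-defined, and GCC documents it as reduction modulo 2^N. That reduction is what turns a deadline in the distant past into a very large positive wait.

```cpp
#include <cstdint>

// The subtraction itself is done in 64 bits and cannot overflow; the
// problem is storing the result in a 32-bit seconds field.
constexpr std::int32_t naive_relative_seconds(std::int64_t abs_s,
                                              std::int64_t now_s)
{
  return static_cast<std::int32_t>(abs_s - now_s);  // may wrap
}
```

For a deadline INT32_MAX + 2 seconds in the past, the difference is -2147483649, which wraps to 2147483647, roughly 68 years in the future.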
> 
> The conversion to a relative timeout is done in two places, so this
> factors it into a new function and adds the overflow checks there.
> 
> libstdc++-v3/ChangeLog:
> 
> * src/c++11/futex.cc (relative_timespec): New function to
> create relative time from two absolute times.
> (__atomic_futex_unsigned_base::_M_futex_wait_until)
> (__atomic_futex_unsigned_base::_M_futex_wait_until_steady):
> Use relative_timespec.
> 
> diff --git a/libstdc++-v3/src/c++11/futex.cc b/libstdc++-v3/src/c++11/futex.cc
> index 57f7dfe87e9e..c2b2d32e8c43 100644
> --- a/libstdc++-v3/src/c++11/futex.cc
> +++ b/libstdc++-v3/src/c++11/futex.cc
> @@ -31,6 +31,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #ifdef _GLIBCXX_USE_CLOCK_GETTIME_SYSCALL
> @@ -46,20 +47,55 @@ const unsigned futex_clock_realtime_flag = 256;
>  const unsigned futex_bitset_match_any = ~0;
>  const unsigned futex_wake_op = 1;
>  

[PATCH] pru: Add builtins for HALT and LMBD

2020-11-13 Thread Dimitar Dimitrov
Add builtins for HALT and LMBD, per Texas Instruments document
SPRUHV7C.  Use the new LMBD pattern to define an expand for clz.

Binutils [1] and sim [2] support for the LMBD instruction has now been merged.

[1] https://sourceware.org/pipermail/binutils/2020-October/113901.html
[2] https://sourceware.org/pipermail/gdb-patches/2020-November/173141.html
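The following software model of the LMBD semantics shows why LMBD yields clz. It reflects a reading of SPRUHV7C and should be treated as an assumption: the scan starts at bit 31, looks for a bit equal to bit 0 of the second operand, and yields 32 when none is found. The function names are hypothetical.

```cpp
#include <cstdint>

// Assumed LMBD behaviour: index of the left-most bit of VALUE equal to
// bit 0 of WANT, or 32 if there is no such bit.
constexpr unsigned lmbd_model(std::uint32_t value, unsigned want)
{
  for (int i = 31; i >= 0; --i)
    if (((value >> i) & 1u) == (want & 1u))
      return static_cast<unsigned>(i);
  return 32;
}

// For nonzero X, count-leading-zeros follows from a left-most "1" search.
// X == 0 is deliberately not handled here; how the patch's clz expand and
// CLZ_DEFINED_VALUE_AT_ZERO treat it is not modelled.
constexpr unsigned clz_via_lmbd(std::uint32_t x)
{
  return 31 - lmbd_model(x, 1);
}
```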

gcc/ChangeLog:

* config/pru/alu-zext.md: Add lmbd patterns for zero_extend
variants.
* config/pru/pru.c (enum pru_builtin): Add HALT and LMBD.
(pru_init_builtins): Ditto.
(pru_builtin_decl): Ditto.
(pru_expand_builtin): Ditto.
* config/pru/pru.h (CLZ_DEFINED_VALUE_AT_ZERO): Define PRU
value for CLZ with zero value parameter.
* config/pru/pru.md: Add halt, lmbd and clz patterns.
* doc/extend.texi: Document PRU builtins.

gcc/testsuite/ChangeLog:

* gcc.target/pru/halt.c: New test.
* gcc.target/pru/lmbd.c: New test.

Signed-off-by: Dimitar Dimitrov 
---
 gcc/config/pru/alu-zext.md  | 51 
 gcc/config/pru/pru.c| 62 ++---
 gcc/config/pru/pru.h|  3 ++
 gcc/config/pru/pru.md   | 40 +++
 gcc/doc/extend.texi | 28 +
 gcc/testsuite/gcc.target/pru/halt.c |  9 +
 gcc/testsuite/gcc.target/pru/lmbd.c | 14 +++
 7 files changed, 201 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/pru/halt.c
 create mode 100644 gcc/testsuite/gcc.target/pru/lmbd.c

diff --git a/gcc/config/pru/alu-zext.md b/gcc/config/pru/alu-zext.md
index 65916c70d65..35a6dbdda79 100644
--- a/gcc/config/pru/alu-zext.md
+++ b/gcc/config/pru/alu-zext.md
@@ -37,6 +37,10 @@ (define_subst_attr "alu3_zext_op1" "alu3_zext_op1_subst" "_z1" "_noz1")
 (define_subst_attr "alu3_zext_op2" "alu3_zext_op2_subst" "_z2" "_noz2")
 (define_subst_attr "alu3_zext" "alu3_zext_subst" "_z" "_noz")
 
+(define_subst_attr "lmbd_zext_op1" "lmbd_zext_op1_subst" "_z1" "_noz1")
+(define_subst_attr "lmbd_zext_op2" "lmbd_zext_op2_subst" "_z2" "_noz2")
+(define_subst_attr "lmbd_zext" "lmbd_zext_subst" "_z"  "_noz")
+
 (define_subst_attr "bitalu_zext"   "bitalu_zext_subst"   "_z" "_noz")
 
 (define_code_iterator ALUOP3 [plus minus and ior xor umin umax ashift lshiftrt])
@@ -72,6 +76,19 @@ (define_insn "sub_impl__"
+  [(set (match_operand:EQD 0 "register_operand" "=r")
+   (unspec:EQD
+ [(zero_extend:EQD
+(match_operand:EQS0 1 "register_operand" "r"))
+  (zero_extend:EQD
+(match_operand:EQS1 2 "reg_or_ubyte_operand" "r"))]
+ UNSPEC_LMBD))]
+  ""
+  "lmbd\t%0, %1, %2"
+  [(set_attr "type" "alu")])
+
 (define_insn "neg_impl_"
   [(set (match_operand:EQD 0 "register_operand" "=r")
(neg:EQD
@@ -179,3 +196,37 @@ (define_subst "alu3_zext_op2_subst"
   [(set (match_dup 0)
(ALUOP3:EQD (zero_extend:EQD (match_dup 1))
(match_dup 2)))])
+
+
+(define_subst "lmbd_zext_subst"
+  [(set (match_operand:EQD 0)
+   (unspec:EQD [(zero_extend:EQD (match_operand:EQD 1))
+(zero_extend:EQD (match_operand:EQD 2))]
+   UNSPEC_LMBD))]
+  ""
+  [(set (match_dup 0)
+   (unspec:EQD [(match_dup 1)
+(match_dup 2)]
+   UNSPEC_LMBD))])
+
+(define_subst "lmbd_zext_op1_subst"
+  [(set (match_operand:EQD 0)
+   (unspec:EQD [(zero_extend:EQD (match_operand:EQD 1))
+(zero_extend:EQD (match_operand:EQS1 2))]
+   UNSPEC_LMBD))]
+  ""
+  [(set (match_dup 0)
+   (unspec:EQD [(match_dup 1)
+(zero_extend:EQD (match_dup 2))]
+   UNSPEC_LMBD))])
+
+(define_subst "lmbd_zext_op2_subst"
+  [(set (match_operand:EQD 0)
+   (unspec:EQD [(zero_extend:EQD (match_operand:EQD 1))
+(zero_extend:EQD (match_operand:EQD 2))]
+   UNSPEC_LMBD))]
+  ""
+  [(set (match_dup 0)
+   (unspec:EQD [(zero_extend:EQD (match_dup 1))
+(match_dup 2)]
+   UNSPEC_LMBD))])
diff --git a/gcc/config/pru/pru.c b/gcc/config/pru/pru.c
index 39104e5f9cd..65ad6878a12 100644
--- a/gcc/config/pru/pru.c
+++ b/gcc/config/pru/pru.c
@@ -2705,6 +2705,8 @@ pru_reorg (void)
 enum pru_builtin
 {
   PRU_BUILTIN_DELAY_CYCLES,
+  PRU_BUILTIN_HALT,
+  PRU_BUILTIN_LMBD,
   PRU_BUILTIN_max
 };
 
@@ -2719,11 +2721,31 @@ pru_init_builtins (void)
 = build_function_type_list (void_type_node,
long_long_integer_type_node,
NULL);
+  tree uint_ftype_uint_uint
+= build_function_type_list (unsigned_type_node,
+   unsigned_type_node,
+   unsigned_type_node,
+   NULL);
+
+  tree void_ftype_void
+= build_function_type_list (void_type_node,
+   void_type_node,
+  

[PATCH] Add MODE_OPAQUE

2020-11-13 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey 

After discussion with Richard Sandiford on IRC, he suggested adding a
new mode class MODE_OPAQUE to deal with the problems (PR 96791) we had
been having with POImode/PXImode in powerpc target. This patch is the
accumulation of changes I needed to make to add this and make it usable
for what power10 MMA needs.

MODE_OPAQUE modes allow you to have modes for which you can just
define loads and stores. By design, optimization does not expect to
know how to do arithmetic or subregs on these modes. This allows us to
have modes for multi-register vector operations where we don't want to
open Pandora's Box and define general arithmetic operations.

This patch will be followed by a target specific patch to change the
powerpc power10 MMA builtins to use opaque modes, and will also let us use
the vector pair loads/stores defined with that in the inline expansion
of memcpy/memmove, allowing me to fix PR 96791.

Regstrap in progress on ppc64le and x86_64, ok for trunk if successful?

Thanks,
   Aaron


 gcc/ChangeLog
 PR target/96791
 * mode-classes.def: Add MODE_OPAQUE.
 * machmode.def: Add OPAQUE_MODE.
 * tree.def: Add OPAQUE_TYPE for types that will use MODE_OPAQUE.
 * machmode.h: Add OPAQUE_MODE_P().
 * genmodes.c (complete_mode): Add MODE_OPAQUE.
 (opaque_mode): New function.
 * tree.c (tree_code_size): Add OPAQUE_TYPE.
 * tree.h: Add OPAQUE_TYPE_P().
 * tree-ssanames.c (get_nonzero_bits): OPAQUE_TYPE has an unknown
 number of nonzero bits.
 * stor-layout.c (int_mode_for_mode): Treat MODE_OPAQUE modes
 like BLKmode.
 * ira.c (find_moveable_pseudos): Treat MODE_OPAQUE modes more
 like integer/float modes here.
 * emit-rtl.c (init_emit_once): Create small rtx consts because we
 do want const0_rtx to work with opaque modes.
 * dbxout.c (dbxout_type): Treat OPAQUE_TYPE like VOID_TYPE.
 * tree-pretty-print.c (dump_generic_node): Treat OPAQUE_TYPE like
 other types.

---
 gcc/dbxout.c|  1 +
 gcc/emit-rtl.c  |  3 +++
 gcc/genmodes.c  | 22 ++
 gcc/ira.c   |  3 ++-
 gcc/machmode.def|  3 +++
 gcc/machmode.h  |  4 
 gcc/mode-classes.def|  3 ++-
 gcc/stor-layout.c   |  3 +++
 gcc/tree-pretty-print.c |  1 +
 gcc/tree-ssanames.c |  3 +++
 gcc/tree.c  |  1 +
 gcc/tree.def|  6 ++
 gcc/tree.h  |  3 +++
 13 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/gcc/dbxout.c b/gcc/dbxout.c
index 5a20fdecdcc..eaee2f19ce0 100644
--- a/gcc/dbxout.c
+++ b/gcc/dbxout.c
@@ -1963,6 +1963,7 @@ dbxout_type (tree type, int full)
 case VOID_TYPE:
 case NULLPTR_TYPE:
 case LANG_TYPE:
+case OPAQUE_TYPE:
   /* For a void type, just define it as itself; i.e., "5=5".
 This makes us consider it defined
 without saying what it is.  The debugger will make it
diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 3706f0a03fd..44a3b660bd0 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -6268,6 +6268,9 @@ init_emit_once (void)
   mode <= MAX_MODE_PARTIAL_INT;
   mode = (machine_mode)((int)(mode) + 1))
const_tiny_rtx[i][(int) mode] = GEN_INT (i);
+
+  FOR_EACH_MODE_IN_CLASS (mode, MODE_OPAQUE)
+   const_tiny_rtx[i][(int) mode] = GEN_INT (i);
 }
 
   const_tiny_rtx[3][(int) VOIDmode] = constm1_rtx;
diff --git a/gcc/genmodes.c b/gcc/genmodes.c
index bd78310ea24..369fe0aaec5 100644
--- a/gcc/genmodes.c
+++ b/gcc/genmodes.c
@@ -358,6 +358,14 @@ complete_mode (struct mode_data *m)
   m->component = 0;
   break;
 
+case MODE_OPAQUE:
+  /* Opaque modes have size and precision.  */
+  validate_mode (m, OPTIONAL, SET, UNSET, UNSET, UNSET);
+
+  m->ncomponents = 1;
+  m->component = 0;
+  break;
+
 case MODE_PARTIAL_INT:
   /* A partial integer mode uses ->component to say what the
 corresponding full-size integer mode is, and may also
@@ -588,6 +596,20 @@ make_int_mode (const char *name,
   m->precision = precision;
 }
 
+#define OPAQUE_MODE(N, B)  \
+  make_opaque_mode (#N, -1U, B, __FILE__, __LINE__)
+
+static void __attribute__((unused))
+make_opaque_mode (const char *name,
+ unsigned int precision,
+ unsigned int bytesize,
+ const char *file, unsigned int line)
+{
+  struct mode_data *m = new_mode (MODE_OPAQUE, name, file, line);
+  m->bytesize = bytesize;
+  m->precision = precision;
+}
+
 #define FRACT_MODE(N, Y, F) \
make_fixed_point_mode (MODE_FRACT, #N, Y, 0, F, __FILE__, __LINE__)
 
diff --git a/gcc/ira.c b/gcc/ira.c
index 050405f1833..d7a0482d121 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -4666,7 +4666,8 @@ find_moveable_pseudos (void)
|| !DF_REF_INSN_INFO (def)
|| HARD_REGISTER_NUM_P (regno)

[Bug c++/97820] New: VLAs in function declarations fail to compile

2020-11-13 Thread josephcsible at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97820

Bug ID: 97820
   Summary: VLAs in function declarations fail to compile
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Keywords: rejects-valid
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: josephcsible at gmail dot com
  Target Milestone: ---

Consider these 4 attempts at declaring a function:

void f1(int rows, int cols, int arr[rows][cols]);
void f2(int rows, int cols, int (*arr)[cols]);
void f3(int, int, int [*][*]);
void f4(int, int, int (*)[*]);

When I compile them with g++, they all have errors:

:1:41: error: use of parameter outside function body before ']' token
1 | void f1(int rows, int cols, int arr[rows][cols]);
  | ^
:1:47: error: use of parameter outside function body before ']' token
1 | void f1(int rows, int cols, int arr[rows][cols]);
  |   ^
:2:44: error: use of parameter outside function body before ']' token
2 | void f2(int rows, int cols, int (*arr)[cols]);
  |^
:3:25: error: expected primary-expression before ']' token
3 | void f3(int, int, int [*][*]);
  | ^
:3:28: error: expected primary-expression before ']' token
3 | void f3(int, int, int [*][*]);
  |^
:4:28: error: expected primary-expression before ']' token
4 | void f4(int, int, int (*)[*]);
  |^

Since we support VLAs in C++ as an extension, I expected that all of these
would work.

https://godbolt.org/z/Wca9ra

Re: [PATCH] dwarf2: Emit DW_TAG_unspecified_parameters even in late DWARF [PR97599]

2020-11-13 Thread Jeff Law via Gcc-patches


On 11/13/20 10:54 AM, Jakub Jelinek via Gcc-patches wrote:
> Hi!
>
> Aldy's PR71855 fix avoided emitting multiple redundant
> DW_TAG_unspecified_parameters sub-DIEs of a single DIE by restricting
> it to early dwarf only.  That unfortunately means if we need to emit
> another DIE for the function (whether it is for LTO, or e.g. because of
> IPA cloning), we don't emit DW_TAG_unspecified_parameters, it remains
> solely in the DW_AT_abstract_origin's referenced DIE.
> But DWARF consumers don't really use DW_TAG_unspecified_parameters
> from there, like we duplicate DW_TAG_formal_parameter sub-DIEs even in the
> clones because either they have some more specific location, or e.g.
> a function clone could have fewer or different argument types etc.,
> they need to assume that originally stdarg function isn't later stdarg etc.
> Unfortunately, while for DW_TAG_formal_parameter sub-DIEs, we can use the
> hash tables to look up the PARM_DECLs if we already have the DIEs, for
> DW_TAG_unspecified_parameters we don't have an easy way to look it up.
>
> The following patch handles it by trying to figure out if we are creating a
> fresh new DIE (in that case we add DW_TAG_unspecified_parameters if it is
> stdarg), or if gen_subprogram_die is called again on a pre-existing DIE
> to fill in some further details (then it will not touch it).
>
> Except for lto, subr_die != old_die would be good enough, but unfortunately
> for LTO the new DIE that will refer to early dwarf created DIE is created
> on the fly during lookup_decl_die.  So the patch tracks if the DIE has
> no children before any children are added to it.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2020-11-13  Jakub Jelinek  
>
>   PR debug/97599
>   * dwarf2out.c (gen_subprogram_die): Call
>   gen_unspecified_parameters_die even if not early dwarf, but only
>   if subr_die is a newly created DIE.

OK

jeff
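Jakub's approach above (treat the subprogram DIE as newly created only if it had no children before any were added, and only then emit DW_TAG_unspecified_parameters) can be illustrated with a toy tree. This is a hypothetical sketch: `Die`, `fill_subprogram` and `count_tag` are made-up names, not dwarf2out.c API, and returning early for a non-fresh DIE is a simplification of what GCC actually does with existing parameter DIEs.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy DIE: a tag plus owned children.  Not GCC's dw_die_ref.
struct Die {
  std::string tag;
  std::vector<std::unique_ptr<Die>> children;

  explicit Die(std::string t) : tag(std::move(t)) {}

  void add_child(const std::string &t) {
    children.push_back(std::make_unique<Die>(t));
  }
};

// Hypothetical analogue of the patched logic: decide "fresh or not" by
// looking at the child list *before* anything is added.  Only a fresh DIE
// gets parameter children, so calling this again on a DIE that is merely
// being completed never duplicates DW_TAG_unspecified_parameters.
void fill_subprogram(Die &subr, std::size_t nparams, bool stdarg) {
  const bool fresh = subr.children.empty();
  if (!fresh)
    return;
  for (std::size_t i = 0; i < nparams; ++i)
    subr.add_child("DW_TAG_formal_parameter");
  if (stdarg)
    subr.add_child("DW_TAG_unspecified_parameters");
}

// Count the direct children of D that carry TAG.
std::size_t count_tag(const Die &d, const std::string &tag) {
  std::size_t n = 0;
  for (const auto &c : d.children)
    if (c->tag == tag)
      ++n;
  return n;
}
```

A second DIE created later (for an LTO or IPA clone) starts out childless, so it gets its own DW_TAG_unspecified_parameters, while a repeated call on the early DIE adds nothing.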



[Bug c++/97819] New: Pack expansion in member initializer lists nested with their parameter list got rejected.

2020-11-13 Thread gnaggnoyil at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97819

Bug ID: 97819
   Summary: Pack expansion in member initializer lists nested with
their parameter list got rejected.
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gnaggnoyil at gmail dot com
  Target Milestone: ---

The following code is accepted by Clang 10.0 and MSVC 19.27.29112, yet fails
to compile under GCC:

template <typename... Ts>
class foo : public Ts...{
public:
template <typename T, typename... Us>
explicit foo(T &&t, Us &&...us)
//: Ts(Ts(t, us))...{} -> This line is accepted
//: Ts{t, us...}...{} -> This line is accepted too
: Ts(t, us...)...{} // This line is rejected
};

class bar{
public:
explicit bar(int){}
};

class qux{
public:
explicit qux(int){}
};

class baz{
public:
explicit baz(int){}
};

int main(){
foo<bar, baz, qux> x(3);
(void)x;
return 0;
}

I tried GCC version 4.9.3 with -std=c++14, 5.5.0, 6.3.0, 7.3.0 with -std=c++17,
and 8.3.0, 9.3.0, 10.1.0 with -std=c++2a; all of them reject this code.

The error output from GCC 4.9.3, 5.5.0, 9.3.0 and 10.1.0 is:

prog.cc: In instantiation of 'foo::foo(T&&, Us&& ...) [with T = int; Us =
{}; Ts = bar, baz, qux]':
prog.cc:27:27:   required from here
prog.cc:8:23: error: invalid use of pack expansion expression
 : Ts(t, us...)...{}
   ^

The error output from GCC 6.3.0, 7.3.0 and 8.3.0 is:

prog.cc: In instantiation of 'foo::foo(T&&, Us&& ...) [with T = int; Us =
{}; Ts = {bar, baz, qux}]':
prog.cc:27:27:   required from here
prog.cc:8:23: error: no matching function for call to 'bar::bar(int&, bool)'
 : Ts(t, us...)...{}
   ^~~
prog.cc:13:14: note: candidate: bar::bar(int)
 explicit bar(int){}
  ^~~
prog.cc:13:14: note:   candidate expects 1 argument, 2 provided
prog.cc:11:7: note: candidate: constexpr bar::bar(const bar&)
 class bar{
   ^~~
prog.cc:11:7: note:   candidate expects 1 argument, 2 provided
prog.cc:11:7: note: candidate: constexpr bar::bar(bar&&)
prog.cc:11:7: note:   candidate expects 1 argument, 2 provided
prog.cc:8:23: error: no matching function for call to 'baz::baz(int&, bool)'
 : Ts(t, us...)...{}
   ^~~
prog.cc:23:14: note: candidate: baz::baz(int)
 explicit baz(int){}
  ^~~
prog.cc:23:14: note:   candidate expects 1 argument, 2 provided
prog.cc:21:7: note: candidate: constexpr baz::baz(const baz&)
 class baz{
   ^~~
prog.cc:21:7: note:   candidate expects 1 argument, 2 provided
prog.cc:21:7: note: candidate: constexpr baz::baz(baz&&)
prog.cc:21:7: note:   candidate expects 1 argument, 2 provided
prog.cc:8:23: error: no matching function for call to 'qux::qux(int&, bool)'
 : Ts(t, us...)...{}
   ^~~
prog.cc:18:14: note: candidate: qux::qux(int)
 explicit qux(int){}
  ^~~
prog.cc:18:14: note:   candidate expects 1 argument, 2 provided
prog.cc:16:7: note: candidate: constexpr qux::qux(const qux&)
 class qux{
   ^~~
prog.cc:16:7: note:   candidate expects 1 argument, 2 provided
prog.cc:16:7: note: candidate: constexpr qux::qux(qux&&)
prog.cc:16:7: note:   candidate expects 1 argument, 2 provided

[Bug fortran/97818] New: PDT Parameterized Derived Type fails with SIGABRT at -O1

2020-11-13 Thread adamjermyn at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97818

Bug ID: 97818
   Summary: PDT Parameterized Derived Type fails with SIGABRT at
-O1
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: adamjermyn at gmail dot com
  Target Milestone: ---

Created attachment 49558
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49558&action=edit
Minimal broken example. Compile with -O1. Should result in SIGABRT.

Hello,

I encountered an error (SIGABRT on macOS) using parameterized derived types in
Fortran with gfortran v10.2.0. The error only appears at -O1 or above. I've
attached a minimal broken example. An example of the error message is:

---
a.out(64288,0x116e325c0) malloc: *** error for object 0x7ffee429c8e8: pointer
being freed was not allocated
a.out(64288,0x116e325c0) malloc: *** set a breakpoint in malloc_error_break to
debug

Program received signal SIGABRT: Process abort signal.
---

The derived type supports arithmetic operations (+-/*), but for the minimal
example I've stripped this down to just addition, and removed the actual
addition logic (just leaving empty functions).

From some testing, the error only appears when there are common
sub-expressions. So for instance in the attached example I've written


z = (x + 2.0) + ((x + 2.0) + 1.0)

which fails. When I manually break the expression up into smaller pieces that
don't share common sub-expressions (as below) it works:

y = x + 2.0
z = y + (y + 1.0)

I've tested ~10 examples like this and only ones where common sub-expressions
occur in a single function produce this problem.

I'm happy to provide a version of this example that has the arithmetic logic
filled in if that would be helpful.

If this is not a compiler problem and I'm doing something wrong I would greatly
appreciate being pointed in the right direction.

Many thanks,
Adam

Re: [PATCH] PR preprocessor/94657: use $AR, not 'ar',

2020-11-13 Thread Sergei Trofimovich via Gcc-patches
On Fri, 13 Nov 2020 11:45:56 -0700
Jeff Law  wrote:

> 
> On 4/22/20 4:05 PM, Sergei Trofimovich wrote:
> > From: Sergei Trofimovich 
> >
> > On systems with both 'ar' and '${CHOST}-ar', the latter is preferred,
> > as it might not match the default 'ar'.
> >
> > Bug is initially reported downstream as https://bugs.gentoo.org/718004.
> >
> > libcpp/ChangeLog:
> >
> > PR libcpp/94657
> > * Makefile.in: use @AR@ placeholder
> > * configure.ac: use AC_CHECK_TOOL to find 'ar'
> > * configure: regenerate
> 
> This was subsumed by David Edelsohn's patch to libcpp and libdecnumber
> which does effectively the same thing.

Agreed. It was also mentioned in
  https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546756.html
Probably should not have changed the topic to make it more visible.

-- 

  Sergei


Re: [PATCH] c++: Predefine __STDCPP_THREADS__ in the compiler if thread model is not single

2020-11-13 Thread Tom Tromey
> "Jakub" == Jakub Jelinek via Gcc-patches  writes:

Jakub> 2020-11-13  Jakub Jelinek  

Jakub>  * c-cppbuiltin.c: Include configargs.h.
Jakub>  (c_cpp_builtins): For C++11 and later if THREAD_MODEL_SPEC is not
Jakub>  defined, predefine __STDCPP_THREADS__ to 1 unless thread_model is
Jakub>  "single".

Note this is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63287

Tom


Re: [committed] libstdc++: Optimise std::future::wait_for and fix futex polling

2020-11-13 Thread Jonathan Wakely via Gcc-patches
>Backporting the change to gcc-10 revealed an overflow bug in the
>existing code, resulting in blocking for years when given an absolute
>timeout in the distant past. There's still a similar bug in the new
>code (using futexes with absolute timeouts against clocks) where a
>large chrono::seconds value can overflow and produce an incorrect
>tv_sec value. Apart from the overflow itself being UB, the result in
>that case is just a spurious wakeup (the call says it timed out when
>it didn't reach the specified time). That should still be fixed, but
>I'll do it separately.

And here's the separate fix for that other overflow. In fact it
doesn't just produce spurious wakeups. It either spins forever
(instead of sleeping gently) or incorrectly reports a timeout when it
should keep blocking. Either way, it's fixed now.

Tested x86_64-linux && powerpc64le-linux. Committed to trunk.
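The overflow described in the commit message below comes down to storing a `std::chrono::seconds` count in a narrower `tv_sec` field. The patch compares against `__int_traits` limits before the assignment; the sketch here shows the same saturating idea with standard facilities only. `clamp_seconds` and `absolute_tv_sec` are hypothetical helpers, `std::int32_t` stands in for a 32-bit `time_t`, and the code assumes `TimeT` is no wider than `seconds::rep`.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>
#include <optional>

// Saturate a seconds count into the range of TimeT (e.g. a 32-bit time_t)
// instead of letting the narrowing conversion produce a wrong, possibly
// negative, value.  Assumes TimeT is no wider than seconds::rep.
template <typename TimeT>
TimeT clamp_seconds(std::chrono::seconds s) {
  using rep = std::chrono::seconds::rep;
  using lim = std::numeric_limits<TimeT>;
  const rep n = s.count();
  if (n > static_cast<rep>(lim::max()))
    return lim::max();
  if (n < static_cast<rep>(lim::min()))
    return lim::min();
  return static_cast<TimeT>(n);
}

// Absolute-timeout variant: a time before the epoch is reported as
// "already timed out" (nullopt) rather than passed on, and a time beyond
// TimeT's range is clamped to its maximum.
template <typename TimeT>
std::optional<TimeT> absolute_tv_sec(std::chrono::seconds abs_time) {
  if (abs_time.count() < 0)
    return std::nullopt;
  return clamp_seconds<TimeT>(abs_time);
}
```

Clamping to the maximum means a far-future deadline simply becomes the latest representable one, which keeps the caller sleeping instead of spinning or reporting a premature timeout.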


commit 91004436daaf8d54daa467908d1b634a1a352707
Author: Jonathan Wakely 
Date:   Fri Nov 13 19:11:02 2020

libstdc++: Avoid more 32-bit time_t overflows in futex calls

This fixes another overflow in code converting a std::chrono::seconds
duration to a time_t. This time in the new code using a futex wait with
an absolute timeout (so this one doesn't need to be backported to the
release branches).

A timeout after the epochalypse would overflow the tv_sec field,
producing an incorrect value. If that incorrect value happened to be
negative, the syscall would return with EINVAL and then the caller would
keep retrying, spinning until the timeout was reached.  If the value
happened to be positive, we would wake up too soon and incorrectly
report a timeout.

libstdc++-v3/ChangeLog:

* src/c++11/futex.cc (relative_timespec): Add [[unlikely]]
attributes.
(__atomic_futex_unsigned_base::_M_futex_wait_until)
(__atomic_futex_unsigned_base::_M_futex_wait_until_steady):
Check for overflow.
* testsuite/30_threads/future/members/wait_until_overflow.cc:
New test.

diff --git a/libstdc++-v3/src/c++11/futex.cc b/libstdc++-v3/src/c++11/futex.cc
index c2b2d32e8c43..15959cebee57 100644
--- a/libstdc++-v3/src/c++11/futex.cc
+++ b/libstdc++-v3/src/c++11/futex.cc
@@ -51,6 +51,8 @@ namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
+  using __gnu_cxx::__int_traits;
+
 namespace
 {
   std::atomic<bool> futex_clock_realtime_unavailable;
@@ -74,10 +76,10 @@ namespace
 auto rel_s = abs_s.count() - now_s;
 
 // Avoid overflows
-if (rel_s > __gnu_cxx::__int_traits<time_t>::__max)
-  rel_s = __gnu_cxx::__int_traits<time_t>::__max;
-else if (rel_s < __gnu_cxx::__int_traits<time_t>::__min)
-  rel_s = __gnu_cxx::__int_traits<time_t>::__min;
+if (rel_s > __int_traits<time_t>::__max) [[unlikely]]
+  rel_s = __int_traits<time_t>::__max;
+else if (rel_s < __int_traits<time_t>::__min) [[unlikely]]
+  rel_s = __int_traits<time_t>::__min;
 
 // Convert the absolute timeout value to a relative timeout
 rt.tv_sec = rel_s;
@@ -111,14 +113,17 @@ namespace
   {
 	if (!futex_clock_realtime_unavailable.load(std::memory_order_relaxed))
 	  {
-	struct timespec rt;
-	rt.tv_sec = __s.count();
-	rt.tv_nsec = __ns.count();
-
 	// futex sets errno=EINVAL for absolute timeouts before the epoch.
-	if (__builtin_expect(rt.tv_sec < 0, false))
+	if (__s.count() < 0)
 	  return false;
 
+	struct timespec rt;
+	if (__s.count() > __int_traits<time_t>::__max) [[unlikely]]
+	  rt.tv_sec = __int_traits<time_t>::__max;
+	else
+	  rt.tv_sec = __s.count();
+	rt.tv_nsec = __ns.count();
+
 	if (syscall (SYS_futex, __addr,
 			 futex_wait_bitset_op | futex_clock_realtime_flag,
 			 __val, &rt, nullptr, futex_bitset_match_any) == -1)
@@ -184,14 +189,17 @@ namespace
   {
 	if (!futex_clock_monotonic_unavailable.load(std::memory_order_relaxed))
 	  {
-	struct timespec rt;
-	rt.tv_sec = __s.count();
-	rt.tv_nsec = __ns.count();
-
 	// futex sets errno=EINVAL for absolute timeouts before the epoch.
-	if (__builtin_expect(rt.tv_sec < 0, false))
+	if (__s.count() < 0) [[unlikely]]
 	  return false;
 
+	struct timespec rt;
+	if (__s.count() > __int_traits<time_t>::__max) [[unlikely]]
+	  rt.tv_sec = __int_traits<time_t>::__max;
+	else
+	  rt.tv_sec = __s.count();
+	rt.tv_nsec = __ns.count();
+
 	if (syscall (SYS_futex, __addr,
 			 futex_wait_bitset_op | futex_clock_monotonic_flag,
 			 __val, &rt, nullptr, futex_bitset_match_any) == -1)
diff --git a/libstdc++-v3/testsuite/30_threads/future/members/wait_until_overflow.cc b/libstdc++-v3/testsuite/30_threads/future/members/wait_until_overflow.cc
new file mode 100644
index ..8d6a5148ce3c
--- /dev/null
+++ b/libstdc++-v3/testsuite/30_threads/future/members/wait_until_overflow.cc
@@ -0,0 +1,48 @@
+// { dg-do run }
+// { dg-additional-options "-pthread" { target pthread } }
+// { dg-require-effective-target c++11 }
+// { 

Re: [PATCH 1/7] C-SKY: Add fpuv3 instructions and CK860 arch

2020-11-13 Thread Jeff Law via Gcc-patches


On 10/29/20 9:28 PM, Cooper Qu via Gcc-patches wrote:
> Hi gengqi,
>
> I could not find the patches [3/7], [4/7] and [7/7]. Could you check
> the emails and send them again ?

That's strange, I have them in my inbox.  I can send them to you
directly -- if you could review them it'd be greatly appreciated.


jeff




Re: [PATCH] Asan changes for RISC-V.

2020-11-13 Thread Jeff Law via Gcc-patches


On 10/28/20 5:58 PM, Jim Wilson wrote:
> We have only riscv64 asan support, there is no riscv32 support as yet.  So I
> need to be able to conditionally enable asan support for the riscv target.  I
> implemented this by returning zero from the asan_shadow_offset function.  This
> requires a change to toplev.c and docs in target.def.
>
> The asan support works on a 5.5 kernel, but does not work on a 4.15 kernel.
> The problem is that the asan high memory region is a small wedge below
> 0x40.  The new kernel puts shared libraries at 0x3f and going
> down which works.  But the old kernel puts shared libraries at 0x20
> and going up which does not work, as it isn't in any recognized memory
> region.  This might be fixable with more asan work, but we don't really need
> support for old kernel versions.
>
> The asan port is curious in that it uses 1<<29 for the shadow offset, but all
> other 64-bit targets use a number larger than 1<<32.  But what we have is
> working OK for now.
>
> I did a make check RUNTESTFLAGS="asan.exp" on Fedora rawhide image running on
> qemu and the results look reasonable.
>
>   === gcc Summary ===
>
> # of expected passes            1905
> # of unexpected failures        11
> # of unsupported tests          224
>
>   === g++ Summary ===
>
> # of expected passes            2002
> # of unexpected failures        6
> # of unresolved testcases       1
> # of unsupported tests          175
>
> OK?
>
> Jim
>
> 2020-10-28  Jim Wilson  
>
>   gcc/
>   * config/riscv/riscv.c (riscv_asan_shadow_offset): New.
>   (TARGET_ASAN_SHADOW_OFFSET): New.
>   * doc/tm.texi: Regenerated.
>   * target.def (asan_shadow_offset): Mention that it can return zero.
>   * toplev.c (process_options): Check for and handle zero return from
>   targetm.asan_shadow_offset call.

I noticed you hadn't committed this change.  Just to be explicit, this
is OK for the trunk.


Thanks,

jeff
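Jim's patch uses a zero return from the shadow-offset hook to mean "ASan is not supported in this configuration" (riscv32 here), and the caller has to check for that. A hedged sketch of both halves follows, using the usual ASan mapping with the default scale of 3, shadow = (addr >> 3) + offset, and the 1<<29 offset mentioned above. `example_asan_shadow_offset` and `shadow_address` are hypothetical helpers; the real hook is `TARGET_ASAN_SHADOW_OFFSET`, and the real mapping lives in the instrumentation and runtime.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a target hook that returns 0 to signal
// "no ASan support" (e.g. a 32-bit configuration without a runtime).
std::uint64_t example_asan_shadow_offset(bool is_64bit) {
  return is_64bit ? (std::uint64_t{1} << 29) : 0;
}

// Default ASan mapping: 8 application bytes per shadow byte, so
// shadow = (addr >> 3) + offset.  A zero offset is treated as
// "unsupported" and yields no shadow address at all.
std::optional<std::uint64_t> shadow_address(std::uint64_t addr,
                                            std::uint64_t offset) {
  if (offset == 0)
    return std::nullopt;
  return (addr >> 3) + offset;
}
```

Keeping "unsupported" in the hook's return value avoids a separate capability query, at the cost of making zero unusable as a real shadow offset.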




[Bug target/97787] [10/11 regression] 64bit mips lto: .symtab local symbol at index x (>= sh_info of y)

2020-11-13 Thread bunk at stusta dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97787

--- Comment #7 from Adrian Bunk  ---
(In reply to Richard Biener from comment #6)
> I see.  Still GCC or GAS produces a bogus object file (the original linker
> error).  It might be the new problem is an entirely different one?  It looks
> more and more like a target problem to me.

My guess would be that the situations where -mxgot is required on 64bit MIPS
are not (no longer?) handled properly with LTO.

Note that when compiling from precompiled sources the linker also exits with an
error, the main difference in that case is that the correct "relocation
truncated to fit" error message is not output in the LTO case.

More worrisome is that adding -mxgot to compiler and linker flags did not fix
it in the LTO case.

Re: [PATCH] Clarify the documentation for the ms_abi function attribute

2020-11-13 Thread Jeff Law via Gcc-patches


On 9/10/20 11:04 AM, Peter Jones via Gcc-patches wrote:
> Somewhere in the process of writing the documentation for the ms_abi
> function attribute, there has been some (justifiable) confusion about
> which calling conventions are which, and the documentation currently
> states that on non-Windows x86 targets we default to the "x86/AMD ABI".
>
> In the past I've heard "the AMD calling conventions" used to refer to
> the Microsoft parameter passing register usage, so I looked it up.  As
> far as I can tell, AMD does not specify *any* calling conventions for
> the 64-bit x86 instruction set.  The only times the AMD64 documentation
> refers to calling conventions or parameter passing are in a general
> introduction of PUSHA and ENTER, which do not implement a relevant part
> of either convention, and a section about saving SSE state when passing
> YMM/XMM register values as arguments.
>
> This patch changes the documentation to explicitly refer to either the
> "Microsoft ABI" or the "System V ELF ABI".
>
> Signed-off-by: Peter Jones 
> ---
>  ChangeLog   | 5 +
>  gcc/doc/extend.texi | 7 ---
>  2 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/ChangeLog b/ChangeLog
> index b0239316868..691d4cd619c 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,3 +1,8 @@
> +2020-09-10  Peter Jones  
> +
> + * gcc/doc/extend.texi: Clarify the documentation for the ms_abi
> + function attribute

Thanks.  I've pushed this to the trunk.  Sorry for the delays.


jeff
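For reference, the two conventions the revised text names differ most visibly in argument registers: the Microsoft x64 ABI passes the first four integer arguments in RCX, RDX, R8 and R9, while the System V ELF ABI uses RDI, RSI, RDX, RCX, R8 and R9. A minimal sketch of selecting a convention per function with GCC's `ms_abi` and `sysv_abi` attributes; the function names are illustrative, and the attributes are only applied when compiling for x86-64.

```cpp
#include <cassert>

#if defined(__GNUC__) && defined(__x86_64__)
// Each function is compiled with the named calling convention; the
// compiler adjusts every call site, so callers need no special syntax.
__attribute__((ms_abi)) long add_ms(long a, long b) { return a + b; }
__attribute__((sysv_abi)) long add_sysv(long a, long b) { return a + b; }
#else
// Other targets: plain functions so the example still builds.
long add_ms(long a, long b) { return a + b; }
long add_sysv(long a, long b) { return a + b; }
#endif
```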




Re: [PATCH] c++: Predefine __STDCPP_THREADS__ in the compiler if thread model is not single

2020-11-13 Thread Jeff Law via Gcc-patches


On 11/13/20 12:03 PM, John David Anglin wrote:
> On 2020-11-13 1:20 p.m., Jeff Law wrote:
>> On 11/13/20 10:29 AM, Jakub Jelinek via Gcc-patches wrote:
>>> Hi!
>>>
>>> The following patch predefines __STDCPP_THREADS__ macro to 1 if c++11 or
>>> later and thread model (e.g. printed by gcc -v) is not single.
>>> There are two targets not handled by this patch, those that define
>>> THREAD_MODEL_SPEC.  In one case - QNX - it looks just like a mistake
>>> to me, instead of setting thread_model=posix in config.gcc it uses
>>> THREAD_MODEL_SPEC macro to set it unconditionally to posix.
>>> The other is hpux10, which uses -threads option to decide if threads
>>> are enabled or not, but that option isn't really passed to the compiler.
>>> I think that is something that really should be solved in config/pa/
>>> instead, e.g. in the config/xxx/xxx-c.c targets usually set their own
>>> predefined macros and it could handle this, and either pass the option
>>> also to the compiler, or say predefine __STDCPP_THREADS__ if _DCE_THREADS
>>> macro is defined already (or -D_DCE_THREADS found on the command line),
>>> or whatever else.
>>>
>>> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 
>>>
>>> 2020-11-13  Jakub Jelinek  
>>>
>>> * c-cppbuiltin.c: Include configargs.h.
>>> (c_cpp_builtins): For C++11 and later if THREAD_MODEL_SPEC is not
>>> defined, predefine __STDCPP_THREADS__ to 1 unless thread_model is
>>> "single".
>> OK.  Note that hpux10 should be considered long dead.   I wouldn't let
>> that get in the way of anything.  One could argue we should remove
>> hpux10 and earlier, leaving just hpux11.
> In principle, I agree.  But there are some interactions in the header
> defines and I have limited time at the moment.

ACK.  I don't think removing the old hpux stuff is a high priority.   My
primary point was that I think hpux10 is dead and we shouldn't let it
get in the way of making progress on platforms that are still viable.


jeff
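With this change, code can test for thread support at compile time through the standard macro. A small sketch; `stdcpp_threads_value` is an illustrative name, and the value observed depends on the compiler release and its configured thread model (earlier releases may leave the macro undefined even where threads work).

```cpp
#include <cassert>

// Returns the value of __STDCPP_THREADS__, or 0 when it is not defined.
// With the patch above, GCC predefines it to 1 for C++11 and later unless
// the configured thread model is "single".
constexpr int stdcpp_threads_value() {
#ifdef __STDCPP_THREADS__
  return __STDCPP_THREADS__;
#else
  return 0;
#endif
}
```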



Re: ubsan: d-demangle.c:214 signed integer overflow

2020-11-13 Thread Jeff Law via Gcc-patches


On 9/4/20 7:34 AM, Alan Modra via Gcc-patches wrote:
> So this one is on top of the previously posted patch.
>
>   * d-demangle.c (string_need): Take a size_t n arg, and use size_t tem.
>   (string_append): Use size_t n.
>   (string_appendn, string_prependn): Take a size_t n arg.
>   (TEMPLATE_LENGTH_UNKNOWN): Define as -1UL.
>   * d-demangle.c (dlang_number): Make "ret" an unsigned long*.
>   Only succeed for result of [0,4294967295UL].
>   (dlang_decode_backref): Only succeed for result [1,MAX_LONG].
>   (dlang_backref): Remove now unnecessary range check.
>   (dlang_symbol_name_p): Likewise.
>   (dlang_lname, dlang_parse_template): Take an unsigned long len
>   arg.
>   (dlang_symbol_backref, dlang_identifier, dlang_parse_integer),
>   (dlang_parse_integer, dlang_parse_string),
>   (dlang_parse_arrayliteral, dlang_parse_assocarray),
>   (dlang_parse_structlit, dlang_parse_tuple),
>   (dlang_template_symbol_param, dlang_template_args): Use
>   unsigned long variables.
>   * testsuite/d-demangle-expected: Add new tests.

Explicitly leaving this to Iain.


jeff
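The ChangeLog says `dlang_number` should now succeed only for results in [0,4294967295UL]. A hedged sketch of overflow-safe decimal parsing with that bound follows; `parse_number` is a hypothetical helper, not the libiberty function, whose exact behaviour may differ.

```cpp
#include <cassert>
#include <cstdint>

// Parse a run of decimal digits from S.  On success store the value in
// *RET and return a pointer just past the digits; return nullptr if there
// are no digits or the value would exceed 4294967295.  The bound is
// checked before each multiply-add, so no arithmetic ever overflows.
const char *parse_number(const char *s, unsigned long *ret) {
  constexpr std::uint64_t limit = 4294967295UL;
  if (*s < '0' || *s > '9')
    return nullptr;
  std::uint64_t val = 0;
  while (*s >= '0' && *s <= '9') {
    const unsigned digit = static_cast<unsigned>(*s - '0');
    // val * 10 + digit <= limit  <=>  val <= (limit - digit) / 10
    if (val > (limit - digit) / 10)
      return nullptr;
    val = val * 10 + digit;
    ++s;
  }
  *ret = static_cast<unsigned long>(val);
  return s;
}
```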




Re: [PATCH] c++: Predefine __STDCPP_THREADS__ in the compiler if thread model is not single

2020-11-13 Thread John David Anglin
On 2020-11-13 1:20 p.m., Jeff Law wrote:
> On 11/13/20 10:29 AM, Jakub Jelinek via Gcc-patches wrote:
>> Hi!
>>
>> The following patch predefines __STDCPP_THREADS__ macro to 1 if c++11 or
>> later and thread model (e.g. printed by gcc -v) is not single.
>> There are two targets not handled by this patch, those that define
>> THREAD_MODEL_SPEC.  In one case - QNX - it looks just like a mistake
>> to me, instead of setting thread_model=posix in config.gcc it uses
>> THREAD_MODEL_SPEC macro to set it unconditionally to posix.
>> The other is hpux10, which uses -threads option to decide if threads
>> are enabled or not, but that option isn't really passed to the compiler.
>> I think that is something that really should be solved in config/pa/
>> instead, e.g. in the config/xxx/xxx-c.c targets usually set their own
>> predefined macros and it could handle this, and either pass the option
>> also to the compiler, or say predefine __STDCPP_THREADS__ if _DCE_THREADS
>> macro is defined already (or -D_DCE_THREADS found on the command line),
>> or whatever else.
>>
>> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 
>>
>> 2020-11-13  Jakub Jelinek  
>>
>>  * c-cppbuiltin.c: Include configargs.h.
>>  (c_cpp_builtins): For C++11 and later if THREAD_MODEL_SPEC is not
>>  defined, predefine __STDCPP_THREADS__ to 1 unless thread_model is
>>  "single".
> OK.  Note that hpux10 should be considered long dead.   I wouldn't let
> that get in the way of anything.  One could argue we should remove
> hpux10 and earlier, leaving just hpux11.
In principle, I agree.  But there are some interactions in the header defines
and I have limited time at the moment.

Regards,
Dave

-- 
John David Anglin  dave.ang...@bell.net



[PATCH] Re: Fix gimple_expr_code?

2020-11-13 Thread Andrew MacLeod via Gcc-patches

On 11/13/20 4:05 AM, Richard Biener wrote:
On Thu, Nov 12, 2020 at 10:12 PM Andrew MacLeod wrote:


On 11/12/20 3:53 PM, Richard Biener wrote:
> On November 12, 2020 9:43:52 PM GMT+01:00, Andrew MacLeod via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>> So I spent some time tracking down a ranger issue, and in the end, it
>> boiled down to the range-op handler not being picked up properly.
>>
>> The handler is picked up by:
>>
>>    if ((gimple_code (s) == GIMPLE_ASSIGN) || (gimple_code (s) == GIMPLE_COND))
>>      return range_op_handler (gimple_expr_code (s), gimple_expr_type (s));
> IMHO this should use more specific functions. Gimple_expr_code should go
> away similar to gimple_expr_type.

gimple_expr_type is quite pervasive... and each consumer is going to have
to roll their own version of it.  Why do we want to get rid of it?

If we are trying to save a few bytes by storing the information in
different places, then we're going to need some sort of accessing
function like that.
>
>> where it is indexing the table with the gimple_expr_code..
>> the stmt being processed was for a pointer assignment,
>>    _5 = _33
>> and it was coming back with a gimple_expr_code of VAR_DECL instead of
>> an SSA_NAME... which confused me greatly.
>>
>>
>> gimple_expr_code (const gimple *stmt)
>> {
>>    enum gimple_code code = gimple_code (stmt);
>>    if (code == GIMPLE_ASSIGN || code == GIMPLE_COND)
>>      return (enum tree_code) stmt->subcode;
>>
>> A little more digging shows this:
>>
>> static inline enum tree_code
>> gimple_assign_rhs_code (const gassign *gs)
>> {
>>    enum tree_code code = (enum tree_code) gs->subcode;
>>    /* While we initially set subcode to the TREE_CODE of the rhs for
>>       GIMPLE_SINGLE_RHS assigns we do not update that subcode to stay
>>       in sync when we rewrite stmts into SSA form or do SSA
>>       propagations.  */
>>    if (get_gimple_rhs_class (code) == GIMPLE_SINGLE_RHS)
>>      code = TREE_CODE (gs->op[1]);
>>
>>    return code;
>> }
>>
>> Fascinating comment.
> ... 
>
>> But it means that gimple_expr_code() isn't returning the correct result
>> for GIMPLE_SINGLE_RHS
> It depends. A SSA name isn't an expression code either. As said, the
> generic gimple_expr_code should be used with extreme care.

What is an expression code?  It seems like it's just a tree_code
representing what is on the RHS?  I'm not sure I understand why one
needs to be careful with it.  It only applies to COND, ASSIGN and CALL,
and it's currently right for everything except GIMPLE_SINGLE_RHS?

If we don't fix gimple_expr_code, then I'm basically going to be
reimplementing it myself... which seems kind of pointless.


Well sure we can fix it.  Your patch looks OK but can be optimized like

  if (gassign *ass = dyn_cast <gassign *> (stmt))
    return gimple_assign_rhs_code (stmt);

note it looks odd that we use this for gimple_assign but
directly access subcode for GIMPLE_COND instead
of returning gimple_cond_code () (again, operate on
gcond to avoid an extra check).

Thanks,
Richard.

Andrew



And with a little bit of const-ness...  I adjusted gimple_range_handler
to use gassign and gcond as well.


Bootstrapped on x86_64-pc-linux-gnu, no regressions. pushed.

Andrew
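The underlying issue is that the stored subcode of a single-rhs assignment can go stale after SSA rewriting, so the accessor must recompute it from the operand, and the fix routes the generic query through the per-kind accessors after a checked downcast. The toy model below illustrates only that design: `TreeCode`, `Stmt`, `Assign`, `Cond`, `assign_rhs_code` and `expr_code` are made-up names, and it uses `dynamic_cast` where GCC uses its own `dyn_cast`.

```cpp
#include <cassert>

// Toy model, not GCC's gimple.h.
enum class TreeCode { VarDecl, SsaName, PlusExpr, LtExpr };

struct Stmt {
  virtual ~Stmt() = default;  // polymorphic, so dynamic_cast works
};

struct Assign : Stmt {
  TreeCode subcode;   // recorded once, when the statement was built
  TreeCode rhs_code;  // code of the single rhs operand as it is now
  bool single_rhs;    // stands in for GIMPLE_SINGLE_RHS
  Assign(TreeCode sub, TreeCode rhs, bool single)
      : subcode(sub), rhs_code(rhs), single_rhs(single) {}
};

struct Cond : Stmt {
  TreeCode code;
  explicit Cond(TreeCode c) : code(c) {}
};

// Analogue of gimple_assign_rhs_code: for single-rhs assignments the
// stored subcode may be stale, so look at the operand instead.
TreeCode assign_rhs_code(const Assign &a) {
  return a.single_rhs ? a.rhs_code : a.subcode;
}

// Analogue of the fixed gimple_expr_code: downcast, then defer to the
// per-kind accessor rather than reading the raw subcode.  This toy
// returns false for any other statement kind.
bool expr_code(const Stmt &s, TreeCode *out) {
  if (const auto *a = dynamic_cast<const Assign *>(&s)) {
    *out = assign_rhs_code(*a);
    return true;
  }
  if (const auto *c = dynamic_cast<const Cond *>(&s)) {
    *out = c->code;
    return true;
  }
  return false;
}
```

In the `_5 = _33` case from the thread, the statement was built with a VAR_DECL operand and later rewritten into SSA form; reading the raw subcode gives the stale VarDecl, whereas going through the accessor gives SsaName.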
commit fcbb6018abaf04d30e2cf6fff2eb35115412cdd5
Author: Andrew MacLeod 
Date:   Fri Nov 13 13:56:01 2020 -0500

Re: Fix gimple_expr_code?

have gimple_expr_code return the correct code for GIMPLE_ASSIGN.
use gassign and gcond in gimple_range_handler.

* gimple-range.h (gimple_range_handler): Cast to gimple stmt
kinds before asking for code and type.
* gimple.h (gimple_expr_code): Call gassign and gcond routines
to get their expr_code.

diff --git a/gcc/gimple-range.h b/gcc/gimple-range.h
index dde41e9e743..92bb5305c18 100644
--- a/gcc/gimple-range.h
+++ b/gcc/gimple-range.h
@@ -97,12 +97,12 @@ extern bool gimple_range_calc_op2 (irange &r, const gimple *s,
 static inline range_operator *
 gimple_range_handler (const gimple *s)
 {
-  if (gimple_code (s) == GIMPLE_ASSIGN)
-return range_op_handler (gimple_assign_rhs_code (s),
-			 TREE_TYPE (gimple_assign_lhs (s)));
-  if (gimple_code (s) == GIMPLE_COND)
-return range_op_handler (gimple_cond_code (s),
-			 TREE_TYPE (gimple_cond_lhs (s)));
+  if (const gassign *ass = dyn_cast <const gassign *> (s))
+return range_op_handler (gimple_assign_rhs_code (ass),
+			 TREE_TYPE (gimple_assign_lhs (ass)));
+  if (const gcond *cond = dyn_cast <const gcond *> (s))
+return range_op_handler (gimple_cond_code (cond),
+			 TREE_TYPE (gimple_cond_lhs (cond)));
   return NULL;
 }
 
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 

Re: [PATCH] libiberty: Make strstr.c in libiberty ANSI compliant

2020-11-13 Thread Jeff Law via Gcc-patches


On 5/1/20 6:06 PM, Seija Kijin via Gcc-patches wrote:
> The original code in libiberty says "FIXME" and then says it has not been
> validated to be ANSI compliant. However, this patch changes the function to
> match implementations that ARE compliant, and such code is in the public
> domain.
>
> I ran the test results, and there are no test failures.

Thanks.  This seems to be the standard "simple" strstr implementation. 
There are significantly faster implementations available, but I doubt it's
worth the effort as the version in this file only gets used if there is
no system strstr.c.


I've pushed this patch to the trunk after fixing some minor formatting
issues.

jeff
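The "simple" strstr Jeff refers to tries every starting position in the haystack and compares the needle at each one, which is O(n*m) in the worst case. A minimal sketch of that technique, written here as a standalone function; `simple_strstr` is an illustrative name, not the libiberty code itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Naive substring search: return a pointer to the first occurrence of
// NEEDLE in HAYSTACK, or nullptr.  An empty needle matches at the start,
// as ISO C requires of strstr.
const char *simple_strstr(const char *haystack, const char *needle) {
  const std::size_t len = std::strlen(needle);
  if (len == 0)
    return haystack;
  for (; *haystack != '\0'; ++haystack)
    // Cheap first-character test before the full comparison; strncmp
    // stops at the haystack's terminator, so it never reads past it.
    if (*haystack == *needle && std::strncmp(haystack, needle, len) == 0)
      return haystack;
  return nullptr;
}
```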




Re: [pushed] c++: Implement C++20 'using enum'. [PR91367]

2020-11-13 Thread Nathan Sidwell

On 11/13/20 1:35 PM, Jason Merrill via Gcc-patches wrote:

This feature allows the programmer to import enumerator names into the
current scope so later mentions don't need to use the fully-qualified name.
These usings are not subject to the usual restrictions on using-declarations:
in particular, they can move between class and non-class scopes, and between
classes that are not related by inheritance.  This last caused difficulty
for our normal approach to using-decls within a class hierarchy, as we
assume that the class where we looked up a used declaration is derived from
the class where it was first declared.  So to simplify things, in that case
we make a clone of the CONST_DECL in the using class.


Thanks for finishing this off!  Now, let's see what that broke in 
modules ...


nathan

--
Nathan Sidwell


[Bug c++/86252] Abstract class in function return type

2020-11-13 Thread jason at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86252

Jason Merrill  changed:

               What|Removed                       |Added
---------------------------------------------------------------------------
                 CC|                              |jason at gcc dot gnu.org
             Status|NEW                           |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |jason at gcc dot gnu.org

Re: [PATCH] PR preprocessor/94657: use $AR, not 'ar',

2020-11-13 Thread Jeff Law via Gcc-patches


On 4/22/20 4:05 PM, Sergei Trofimovich wrote:
> From: Sergei Trofimovich 
>
> On systems with both 'ar' and '${CHOST}-ar', the latter is preferred,
> as it might not match the default 'ar'.
>
> Bug is initially reported downstream as https://bugs.gentoo.org/718004.
>
> libcpp/ChangeLog:
>
>   PR libcpp/94657
>   * Makefile.in: use @AR@ placeholder
>   * configure.ac: use AC_CHECK_TOOL to find 'ar'
>   * configure: regenerate

This was subsumed by David Edelsohn's patch to libcpp and libdecnumber
which does effectively the same thing.

jeff



[Bug c++/88323] implement C++20 language features.

2020-11-13 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88323
Bug 88323 depends on bug 91370, which changed state.

Bug 91370 Summary: Implement P1041R4 and P1139R2: Stronger Unicode requirements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91370

               What|Removed                       |Added
---------------------------------------------------------------------------
             Status|NEW                           |RESOLVED
         Resolution|---                           |FIXED

[Bug c++/91370] Implement P1041R4 and P1139R2: Stronger Unicode requirements

2020-11-13 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91370

Jakub Jelinek  changed:

               What|Removed                       |Added
---------------------------------------------------------------------------
                 CC|                              |jakub at gcc dot gnu.org
         Resolution|---                           |FIXED
             Status|NEW                           |RESOLVED

--- Comment #2 from Jakub Jelinek  ---
Fixed for 10.1+.

[Bug c++/88323] implement C++20 language features.

2020-11-13 Thread jason at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88323
Bug 88323 depends on bug 90537, which changed state.

Bug 90537 Summary: Implement P1286R2, Contra CWG DR1778
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90537

               What|Removed                       |Added
---------------------------------------------------------------------------
             Status|ASSIGNED                      |RESOLVED
         Resolution|---                           |FIXED

[Bug c++/91581] ICE on usage requiring complete class in exception-specification of defaulted method

2020-11-13 Thread jason at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91581
Bug 91581 depends on bug 90537, which changed state.

Bug 90537 Summary: Implement P1286R2, Contra CWG DR1778
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90537

               What|Removed                       |Added
---------------------------------------------------------------------------
             Status|ASSIGNED                      |RESOLVED
         Resolution|---                           |FIXED

[Bug c++/90537] Implement P1286R2, Contra CWG DR1778

2020-11-13 Thread jason at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90537

Jason Merrill  changed:

               What|Removed                       |Added
---------------------------------------------------------------------------
           Assignee|mpolacek at gcc dot gnu.org   |jason at gcc dot gnu.org
         Resolution|---                           |FIXED
             Status|ASSIGNED                      |RESOLVED
                 CC|                              |jason at gcc dot gnu.org

--- Comment #3 from Jason Merrill  ---
Implemented in r277351.

[Bug c++/88323] implement C++20 language features.

2020-11-13 Thread jason at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88323
Bug 88323 depends on bug 91367, which changed state.

Bug 91367 Summary: Implement P1099R5: using enum
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91367

               What|Removed                       |Added
---------------------------------------------------------------------------
             Status|NEW                           |RESOLVED
         Resolution|---                           |FIXED

[Bug c++/91367] Implement P1099R5: using enum

2020-11-13 Thread jason at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91367

Jason Merrill  changed:

               What|Removed                       |Added
---------------------------------------------------------------------------
         Resolution|---                           |FIXED
             Status|NEW                           |RESOLVED
                 CC|                              |jason at gcc dot gnu.org
           Assignee|unassigned at gcc dot gnu.org |jason at gcc dot gnu.org

--- Comment #1 from Jason Merrill  ---
Implemented for GCC 11.

[Bug c++/97388] By-value function parameter changes are rolled back prior to destructor call during constant evaluation

2020-11-13 Thread jason at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97388

Jason Merrill  changed:

               What|Removed                       |Added
---------------------------------------------------------------------------
             Status|ASSIGNED                      |RESOLVED
         Resolution|---                           |FIXED
                 CC|                              |jason at gcc dot gnu.org

--- Comment #6 from Jason Merrill  ---
Fixed.

[Bug debug/97060] Missing DW_AT_declaration=1 in dwarf data

2020-11-13 Thread jason at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97060

Jason Merrill  changed:

               What|Removed                       |Added
---------------------------------------------------------------------------
   Target Milestone|---                           |11.0
         Resolution|---                           |FIXED
             Status|ASSIGNED                      |RESOLVED

--- Comment #15 from Jason Merrill  ---
Fixed for GCC 11.  The patch will also be backported to the Red Hat GCC 10
branch that has the same bug.

[Bug c/97817] -Wformat-truncation=2 elicits invalid warning

2020-11-13 Thread schwab--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97817

--- Comment #2 from Andreas Schwab  ---
But when it's 6, it's truncated.

Re: Split DWARF and rnglists, gcc vs clang

2020-11-13 Thread Mark Wielaard
Hi Simon,

On Fri, 2020-11-13 at 10:41 -0500, Simon Marchi wrote:
> So in the end the logical thing to do when encountering a
> DW_FORM_rnglistx in a split-unit, in order to support everybody, is
> probably to go to the .debug_rnglists.dwo section, if there's one,
> disregarding the (inherited) DW_AT_rnglists_base.  If there isn't, then
> try the linked file's .debug_rnglists section, using
> DW_AT_rnglists_base.  If there isn't, then something is malformed.

Yes, I think that makes sense.
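The lookup order proposed above can be written down as a small decision function. This is only a sketch: `SplitUnitInfo`, `RnglistSource` and `choose_rnglistx_source` are hypothetical names, not elfutils or GDB APIs, and requiring `DW_AT_rnglists_base` in the fallback case is an assumption based on the phrase "using DW_AT_rnglists_base".

```cpp
#include <cstdint>
#include <optional>

// Hypothetical summary of what a consumer knows about one split unit.
struct SplitUnitInfo {
  bool dwo_has_rnglists;   // .debug_rnglists.dwo present in the DWO file
  bool main_has_rnglists;  // .debug_rnglists present in the linked file
  std::optional<std::uint64_t> rnglists_base;  // inherited from the skeleton
};

enum class RnglistSource { DwoSection, MainSectionWithBase, Malformed };

// Decide where a DW_FORM_rnglistx value in a split unit should be resolved.
RnglistSource choose_rnglistx_source(const SplitUnitInfo &u) {
  if (u.dwo_has_rnglists)
    return RnglistSource::DwoSection;  // disregard the inherited base
  if (u.main_has_rnglists && u.rnglists_base.has_value())
    return RnglistSource::MainSectionWithBase;  // index relative to the base
  return RnglistSource::Malformed;
}
```

Checking the `.dwo` section first is what makes this work for clang's output, where the skeleton carries a `DW_AT_rnglists_base` that applies only to the skeleton's own ranges.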

> > I interpreted it as when there is a base attribute in the (skeleton)
> > unit, then the corresponding section (index table) can be found in the
> > main object file.
> 
> That doesn't work with how clang produces it, AFAIU.  There is a
> DW_AT_rnglists_base attribute in the skeleton and a .debug_rnglists in
> the linked file, which is used for the skeleton's DW_AT_ranges
> attribute.  And there are also .debug_rnglists.dwo sections in the DWO
> files.  So DW_FORM_rnglistx values in the skeleton use the
> .debug_rnglists in the linked file, while the DW_FORM_rnglistx values
> in the DWO file use the .debug_rnglists.dwo in that file (even though
> there is a DW_AT_rnglists_base in the skeleton).

I would have expected the skeleton's DW_AT_ranges to use
DW_FORM_sec_offset, not DW_FORM_rnglistx, precisely because using
DW_FORM_rnglistx there creates this ambiguity. But it would indeed be
good to handle that situation.

> > I think it depends on who exactly you ask and what their specific
> > goals/setups are. Both things, reducing the number of relocations and
> > moving data out of the main object file, are independently useful in
> > different context. But I think it is mainly reducing the number of
> > relocations that is beneficial. For example clang (but not yet gcc)
> > supports having the .dwo sections themselves in the main object file
> > (using SHF_EXCLUDE for the .dwo sections, so the linker will still
> > skip them). That is also a possibility the spec describes, and it
> > really makes split DWARF much more usable, because then you don't
> > need to change your build system to deal with multiple output files.
> 
> Not sure I understand.  Does that mean that the .dwo sections are
> emitted in the .o files, and that's the end of the road for them?  The
> DW_AT_dwo_name attributes of the skeletons then refer to the .o files?

Yes, precisely. I am not sure whether it is already in any released
clang, but if it is you could try -gsplit-dwarf=single to see an
example.

Note that elfutils libdw doesn't yet handle that variant. Luckily not
because of a design issue, but because there are some sanity checks
that trigger when seeing a .debug_xxx and .debug_xxx.dwo section in the
same file. I have a partial patch to fix that and make it so that you
can explicitly open a file as either a "main" Dwarf or "split" Dwarf.
The only thing it doesn't do yet is share the file handle between the
two Dwarf objects (which isn't strictly needed, but would be a nice
optimization).

I actually think having a "single" split-dwarf file (.o == .dwo) is the
best way to support Split Dwarf more generically because then it would
simply work without having to adjust all build systems to work
with/around separate .dwo files.

Cheers,

Mark


[pushed] c++: Implement C++20 'using enum'. [PR91367]

2020-11-13 Thread Jason Merrill via Gcc-patches
This feature allows the programmer to import enumerator names into the
current scope so later mentions don't need to use the fully-qualified name.
These usings are not subject to the usual restrictions on using-declarations:
in particular, they can move between class and non-class scopes, and between
classes that are not related by inheritance.  This last caused difficulty
for our normal approach to using-decls within a class hierarchy, as we
assume that the class where we looked up a used declaration is derived from
the class where it was first declared.  So to simplify things, in that case
we make a clone of the CONST_DECL in the using class.

Thanks to Nathan for the start of this work: in particular, the
lookup_using_decl rewrite.

The changes to dwarf2out revealed an existing issue with the D front-end: we
were doing the wrong thing for importing a D CONST_DECL, because
dwarf2out_imported_module_or_decl_1 was looking through it to its type,
expecting it to be an enumerator, but in one case in thread.d, the constant
had type int.  Adding the ability to import a C++ enumerator also fixed
that, but that led to a crash in force_decl_die, which didn't know what to
do with a CONST_DECL.  So now it does.

Tested x86_64-pc-linux-gnu, applying to trunk.

Co-authored-by: Nathan Sidwell 

gcc/cp/ChangeLog:

* cp-tree.h (USING_DECL_UNRELATED_P): New.
(CONST_DECL_USING_P): New.
* class.c (handle_using_decl): If USING_DECL_UNRELATED_P,
clone the CONST_DECL.
* name-lookup.c (supplement_binding_1): A clone hides its
using-declaration.
(lookup_using_decl): Rewrite to separate lookup and validation.
(do_class_using_decl): Adjust.
(finish_nonmember_using_decl): Adjust.
* parser.c (make_location): Add cp_token overload.
(finish_using_decl): Split out from...
(cp_parser_using_declaration): ...here.  Don't look through enums.
(cp_parser_using_enum): New.
(cp_parser_block_declaration): Call it.
(cp_parser_member_declaration): Call it.
* semantics.c (finish_id_expression_1): Handle enumerator
used from class scope.

gcc/ChangeLog:

* dwarf2out.c (gen_enumeration_type_die): Call
equate_decl_number_to_die for enumerators.
(gen_member_die): Don't move enumerators to their
enclosing class.
(dwarf2out_imported_module_or_decl_1): Allow importing
individual enumerators.
(force_decl_die): Handle CONST_DECL.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/inh-ctor28.C: Adjust expected diagnostic.
* g++.dg/cpp0x/inh-ctor33.C: Likewise.
* g++.dg/cpp0x/using-enum-1.C: Add comment.
* g++.dg/cpp0x/using-enum-2.C: Allowed in C++20.
* g++.dg/cpp0x/using-enum-3.C: Likewise.
* g++.dg/cpp1z/class-deduction69.C: Adjust diagnostic.
* g++.dg/inherit/using5.C: Likewise.
* g++.dg/cpp2a/using-enum-1.C: New test.
* g++.dg/cpp2a/using-enum-2.C: New test.
* g++.dg/cpp2a/using-enum-3.C: New test.
* g++.dg/cpp2a/using-enum-4.C: New test.
* g++.dg/cpp2a/using-enum-5.C: New test.
* g++.dg/cpp2a/using-enum-6.C: New test.
* g++.dg/debug/dwarf2/using-enum.C: New test.
---
 gcc/cp/cp-tree.h  |  11 +
 gcc/cp/class.c|  17 ++
 gcc/cp/name-lookup.c  | 278 +++---
 gcc/cp/parser.c   | 145 +++--
 gcc/cp/semantics.c|  14 +-
 gcc/dwarf2out.c   |  16 +-
 gcc/testsuite/g++.dg/cpp0x/inh-ctor28.C   |   2 +-
 gcc/testsuite/g++.dg/cpp0x/inh-ctor33.C   |   2 +-
 gcc/testsuite/g++.dg/cpp0x/using-enum-1.C |   3 +
 gcc/testsuite/g++.dg/cpp0x/using-enum-2.C |  11 +-
 gcc/testsuite/g++.dg/cpp0x/using-enum-3.C |  15 +-
 .../g++.dg/cpp1z/class-deduction69.C  |   2 +-
 gcc/testsuite/g++.dg/cpp2a/using-enum-1.C |  62 
 gcc/testsuite/g++.dg/cpp2a/using-enum-2.C |  48 +++
 gcc/testsuite/g++.dg/cpp2a/using-enum-3.C |   6 +
 gcc/testsuite/g++.dg/cpp2a/using-enum-4.C |  13 +
 gcc/testsuite/g++.dg/cpp2a/using-enum-5.C | 132 +
 gcc/testsuite/g++.dg/cpp2a/using-enum-6.C |   5 +
 .../g++.dg/debug/dwarf2/using-enum.C  |  21 ++
 gcc/testsuite/g++.dg/inherit/using5.C |   2 +-
 20 files changed, 659 insertions(+), 146 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/using-enum-1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/using-enum-2.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/using-enum-3.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/using-enum-4.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/using-enum-5.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/using-enum-6.C
 create mode 100644 gcc/testsuite/g++.dg/debug/dwarf2/using-enum.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 63724c0e84f..9ae6ff5f7a2 

[PATCH] vect: Add a “very cheap” cost model

2020-11-13 Thread Richard Sandiford via Gcc-patches
Currently we have three vector cost models: cheap, dynamic and
unlimited.  -O2 -ftree-vectorize uses “cheap” by default, but that's
still relatively aggressive about peeling and aliasing checks,
and can lead to significant code size growth.

This patch adds an even more conservative choice, which for lack of
imagination I've called “very cheap”.  It only allows vectorisation
if the vector code entirely replaces the scalar code.  It also
requires one iteration of the vector loop to pay for itself,
regardless of how often the loop iterates.  (If the vector loop
needs multiple iterations to be beneficial then things are
probably too close to call, and the conservative thing would
be to stick with the scalar code.)
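The acceptance rule described above can be sketched as a predicate over a simplified cost summary. This is not GCC's internal interface: `LoopCosts` and `very_cheap_accepts` are hypothetical, costs are abstract units, and setup overheads are ignored for brevity.

```cpp
// Hypothetical per-loop cost summary (abstract units).
struct LoopCosts {
  unsigned scalar_iter_cost;  // cost of one scalar iteration
  unsigned vector_iter_cost;  // cost of one vector iteration
  unsigned vf;                // scalar iterations covered by one vector iteration
  bool needs_peeling;         // prologue/epilogue scalar code would remain
  bool needs_alias_checks;    // runtime alias versioning would be required
};

// Accept only if the vector loop fully replaces the scalar loop and a single
// vector iteration is cheaper than the vf scalar iterations it replaces,
// regardless of how many times the loop runs.
bool very_cheap_accepts(const LoopCosts &c) {
  if (c.needs_peeling || c.needs_alias_checks)
    return false;
  return c.vector_iter_cost < c.scalar_iter_cost * c.vf;
}
```

Because the test compares one vector iteration against its scalar equivalent, no trip-count estimate is needed, which is what keeps the model conservative.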

The idea is that this should be suitable for -O2, although the patch
doesn't change any defaults itself.

I tested this by building and running a bunch of workloads for SVE,
with three options:

  (1) -O2
  (2) -O2 -ftree-vectorize -fvect-cost-model=very-cheap
  (3) -O2 -ftree-vectorize [-fvect-cost-model=cheap]

All three builds used the default -msve-vector-bits=scalable and
ran with the minimum vector length of 128 bits, which should give
a worst-case bound for the performance impact.

The workloads included a mixture of microbenchmarks and full
applications.  Because it's quite an eclectic mix, there's not
much point giving exact figures.  The aim was more to get a general
impression.

Code size growth with (2) was much lower than with (3).  Only a
handful of tests increased by more than 5%, and all of them were
microbenchmarks.

In terms of performance, (2) was significantly faster than (1)
on microbenchmarks (as expected) but also on some full apps.
Again, performance only regressed on a handful of tests.

As expected, the performance of (3) vs. (1) and (3) vs. (2) is more
of a mixed bag.  There are several significant improvements with (3)
over (2), but also some (smaller) regressions.  That seems to be in
line with -O2 -ftree-vectorize being a kind of -O2.5.

The patch reorders vect_cost_model so that values are in order
of increasing aggressiveness, which makes it possible to use
range checks.  The value 0 still represents “unlimited”,
so “if (flag_vect_cost_model)” is still a meaningful check.
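A sketch of why the reordering helps: if values increase with aggressiveness and "unlimited" stays at 0, a single comparison selects every model at least as conservative as "cheap", and a plain truth test still asks whether any cost model applies. The negative values and the `VCM_` names are assumptions for illustration; the message only states the ordering and that 0 means "unlimited".

```cpp
// Hypothetical enum following the described ordering.
enum vect_cost_model_sketch {
  VCM_VERY_CHEAP = -3,
  VCM_CHEAP = -2,
  VCM_DYNAMIC = -1,
  VCM_UNLIMITED = 0
};

// One range check covers both "cheap" and "very cheap".
bool at_most_cheap(vect_cost_model_sketch m) { return m <= VCM_CHEAP; }

// Same meaning as testing the flag for nonzero: some limit is in force.
bool has_cost_limit(vect_cost_model_sketch m) { return m != VCM_UNLIMITED; }
```

Adding a further, even more conservative model later would only require a new, smaller value; existing range checks would pick it up automatically.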

Tested on aarch64-linux-gnu, arm-linux-gnueabihf and
x86_64-linux-gnu.  OK to install?

Richard


gcc/
* doc/invoke.texi (-fvect-cost-model): Add a very-cheap model.
* common.opt (fvect-cost-model=): Add very-cheap as a possible option.
(fsimd-cost-model=): Likewise.
(vect_cost_model): Add very-cheap.
* flag-types.h (vect_cost_model): Add VECT_COST_MODEL_VERY_CHEAP.
Put the values in order of increasing aggressiveness.
* tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Use
range checks when comparing against VECT_COST_MODEL_CHEAP.
(vect_prune_runtime_alias_test_list): Do not allow any alias
checks for the very-cheap cost model.
* tree-vect-loop.c (vect_analyze_loop_costing): Do not allow
any peeling for the very-cheap cost model.  Also require one
iteration of the vector loop to pay for itself.

gcc/testsuite/
* gcc.dg/vect/vect-cost-model-1.c: New test.
* gcc.dg/vect/vect-cost-model-2.c: Likewise.
* gcc.dg/vect/vect-cost-model-3.c: Likewise.
* gcc.dg/vect/vect-cost-model-4.c: Likewise.
* gcc.dg/vect/vect-cost-model-5.c: Likewise.
* gcc.dg/vect/vect-cost-model-6.c: Likewise.
---
 gcc/common.opt|  7 +++--
 gcc/doc/invoke.texi   | 11 ++--
 gcc/flag-types.h  | 10 ---
 gcc/testsuite/gcc.dg/vect/vect-cost-model-1.c | 11 
 gcc/testsuite/gcc.dg/vect/vect-cost-model-2.c | 11 
 gcc/testsuite/gcc.dg/vect/vect-cost-model-3.c | 11 
 gcc/testsuite/gcc.dg/vect/vect-cost-model-4.c | 11 
 gcc/testsuite/gcc.dg/vect/vect-cost-model-5.c | 11 
 gcc/testsuite/gcc.dg/vect/vect-cost-model-6.c | 12 +
 gcc/tree-vect-data-refs.c |  8 --
 gcc/tree-vect-loop.c  | 27 +++
 11 files changed, 120 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-cost-model-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-cost-model-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-cost-model-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-cost-model-4.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-cost-model-5.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-cost-model-6.c

diff --git a/gcc/common.opt b/gcc/common.opt
index 7d0e0d9c88a..6ae613e3743 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3008,11 +3008,11 @@ Enable basic block vectorization (SLP) on trees.
 
 fvect-cost-model=
 Common Joined RejectNegative Enum(vect_cost_model) Var(flag_vect_cost_model) Init(VECT_COST_MODEL_DEFAULT) Optimization

[Bug c/97817] -Wformat-truncation=2 elicits invalid warning

2020-11-13 Thread jim at meyering dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97817

--- Comment #1 from jim at meyering dot net ---
I confirmed this happens both with the very latest built from git: gcc version
11.0.0 20201113 (experimental) (GCC), and Fedora 32's gcc version 10.2.1
20201016 (Red Hat 10.2.1-6) (GCC).

Re: [PATCH] c++: Predefine __STDCPP_THREADS__ in the compiler if thread model is not single

2020-11-13 Thread Jeff Law via Gcc-patches


On 11/13/20 10:29 AM, Jakub Jelinek via Gcc-patches wrote:
> Hi!
>
> The following patch predefines __STDCPP_THREADS__ macro to 1 if c++11 or
> later and thread model (e.g. printed by gcc -v) is not single.
> There are two targets not handled by this patch, those that define
> THREAD_MODEL_SPEC.  In one case - QNX - it looks just like a mistake
> to me, instead of setting thread_model=posix in config.gcc it uses
> THREAD_MODEL_SPEC macro to set it unconditionally to posix.
> The other is hpux10, which uses -threads option to decide if threads
> are enabled or not, but that option isn't really passed to the compiler.
> I think that is something that really should be solved in config/pa/
> instead, e.g. in the config/xxx/xxx-c.c targets usually set their own
> predefined macros and it could handle this, and either pass the option
> also to the compiler, or say predefine __STDCPP_THREADS__ if _DCE_THREADS
> macro is defined already (or -D_DCE_THREADS found on the command line),
> or whatever else.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 
>
> 2020-11-13  Jakub Jelinek  
>
>   * c-cppbuiltin.c: Include configargs.h.
>   (c_cpp_builtins): For C++11 and later if THREAD_MODEL_SPEC is not
>   defined, predefine __STDCPP_THREADS__ to 1 unless thread_model is
>   "single".

OK.  Note that hpux10 should be considered long dead.   I wouldn't let
that get in the way of anything.  One could argue we should remove
hpux10 and earlier, leaving just hpux11.


Jeff
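User code can observe the predefined macro directly. A minimal sketch (the function name `advertises_threads` is hypothetical): with this patch GCC defines `__STDCPP_THREADS__` as 1 for C++11 and later unless the configured thread model is "single". The result therefore depends on the compiler version and how it was configured.

```cpp
// True if the implementation advertises thread support through the
// standard __STDCPP_THREADS__ macro (defined as 1 when threads are supported).
constexpr bool advertises_threads() {
#if defined(__STDCPP_THREADS__) && __STDCPP_THREADS__ == 1
  return true;
#else
  return false;
#endif
}
```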




Re: [PATCH] testsuite: guality/redeclaration1.C test workaround

2020-11-13 Thread Jeff Law via Gcc-patches


On 11/13/20 10:37 AM, Jakub Jelinek via Gcc-patches wrote:
> Hi!
>
> Apparently older GDB versions didn't handle this test right and so while
> it has been properly printing 42 on line 14 (e.g. on x86_64), it issued
> a weird error on line 17 (and because it didn't print any value, guality
> testsuite wasn't marking it as FAIL).
> That has been apparently fixed in GDB 10, where it now (on x86_64) prints
> properly.
> Unfortunately that revealed that the test can suffer from instruction
> scheduling, where e.g. on i686 (and various other arches) the very first
> insn of the function (or whatever b 14 is on) happens to be load of the
> S::i variable from memory and that insn has the inner lexical scope, so
> GDB 10 prints there 24 instead of 42.  The following insn is then
> the first store to l and there the automatic i is in scope and prints as 42
> and then the second store to l where the inner lexical scope is current
> and prints 24 again.
> The test isn't meant to check insn scheduling but whether we emit the
> DIEs properly, so this hack attempts to prevent the undesirable scheduling.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2020-11-13  Jakub Jelinek  
>
>   * g++.dg/guality/redeclaration1.C (p): New variable.
>   (S::f): Increment what p points to before storing S::i into l.  Adjust
>   gdb-test line numbers.
>   (main): Initialize p to address of an automatic variable.

OK

jeff




[Bug c/97817] New: -Wformat-truncation=2 elicits invalid warning

2020-11-13 Thread jim at meyering dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97817

Bug ID: 97817
   Summary: -Wformat-truncation=2 elicits invalid warning
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jim at meyering dot net
  Target Milestone: ---

Here's the invalid warning.
The buffer's size is obviously not 6. It is AT LEAST 6.

$ gcc -Wformat-truncation=2 -O2 -c strerror_r.c
strerror_r.c: In function ‘strerror_r’:
strerror_r.c:12:35: warning: ‘Unknown error ’ directive output truncated writing 14 bytes into a region of size 6 [-Wformat-truncation=]
   12 | snprintf (buf, buflen, "Unknown error %d", errnum);
  | ~~^~~~
strerror_r.c:12:5: note: ‘snprintf’ output between 16 and 26 bytes into a destination of size 6
   12 | snprintf (buf, buflen, "Unknown error %d", errnum);
  | ^~

Here's the reduced test case (from gnulib's strerror.c):

$ cat strerror_r.c
#define size_t unsigned long long
extern int snprintf (char *__restrict __s, size_t __maxlen,
                     const char *__restrict __format, ...)
  __attribute__ ((__format__ (__printf__, 3, 4)));
extern int __xpg_strerror_r (int errnum, char *buf, size_t buflen);
int strerror_r (int errnum, char *buf, size_t buflen)
{
  if (buflen <= 5)
    return 9;
  int ret = __xpg_strerror_r (errnum, buf, buflen);
  if (ret == 1 && !*buf)
    snprintf (buf, buflen, "Unknown error %d", errnum);
  return ret;
}
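The warning concerns the smallest buffer the guard admits: `buflen <= 5` returns early, so `buflen` can be exactly 6, while "Unknown error %d" needs at least 16 bytes including the terminating NUL. The sketch below (the helper `format_unknown_error` is hypothetical) shows the truncation at run time using `snprintf`'s return value, which is the length the complete output would have had.

```cpp
#include <cstddef>
#include <cstdio>

// Formats the message from the reduced test case into buf and reports whether
// snprintf truncated it.  snprintf returns the untruncated length (excluding
// the NUL), so truncation happened exactly when that length is >= buflen.
bool format_unknown_error(char *buf, std::size_t buflen, int errnum, int *needed) {
  int n = std::snprintf(buf, buflen, "Unknown error %d", errnum);
  *needed = n;
  return n >= 0 && static_cast<std::size_t>(n) >= buflen;
}
```

With a 6-byte buffer and `errnum == 0`, the full message needs 15 characters, so only "Unkno" plus the NUL is written.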

[COMMITTED] Implementation of asm goto outputs

2020-11-13 Thread Vladimir Makarov via Gcc-patches
The original patch has been modified according to the reviewers' comments,
and the following patch has been committed.



commit e3b3b59683c1e7d31a9d313dd97394abebf644be
Author: Vladimir N. Makarov 
Date:   Fri Nov 13 12:45:59 2020 -0500

[PATCH] Implementation of asm goto outputs

gcc/
* cfgexpand.c (expand_asm_stmt): Output asm goto with outputs too.
Place insns after asm goto on edges.
* doc/extend.texi: Reflect the changes in asm goto documentation.
* gimple.c (gimple_build_asm_1): Remove an assert checking output
absence for asm goto.
* gimple.h (gimple_asm_label_op, gimple_asm_set_label_op): Take
possible asm goto outputs into account.
* ira.c (ira): Remove critical edges for potential asm goto output
reloads.
(ira_nullify_asm_goto): New function.
* ira.h (ira_nullify_asm_goto): New prototype.
* lra-assigns.c (lra_split_hard_reg_for): Use ira_nullify_asm_goto.
Check that splitting is done inside a basic block.
* lra-constraints.c (curr_insn_transform): Permit output reloads
for any jump insn.
* lra-spills.c (lra_final_code_change): Remove USEs added in ira
for asm gotos.
* lra.c (lra_process_new_insns): Place output reload insns after
jumps in the beginning of destination BBs.
* reload.c (find_reloads): Report error for asm gotos with
outputs.  Modify them to keep CFG consistency to avoid crashes.
* tree-into-ssa.c (rewrite_stmt): Don't put debug stmt after asm
goto.

gcc/c/
* c-parser.c (c_parser_asm_statement): Parse outputs for asm
goto too.
* c-typeck.c (build_asm_expr): Remove an assert checking output
absence for asm goto.

gcc/cp/
* parser.c (cp_parser_asm_definition): Parse outputs for asm
goto too.

gcc/testsuite/
* c-c++-common/asmgoto-2.c: Permit output in asm goto.
* gcc.c-torture/compile/asmgoto-2.c: New.
* gcc.c-torture/compile/asmgoto-3.c: New.
* gcc.c-torture/compile/asmgoto-4.c: New.
* gcc.c-torture/compile/asmgoto-5.c: New.
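A small example of what this change permits at the source level: an `asm goto` that both produces an output operand and may jump to a C label. This sketch assumes x86-64 and GCC 11 or later; elsewhere it falls back to plain C++ with the same behaviour. The function name `copy_if_nonnegative` is hypothetical, and the sketch conservatively reads the output only on the fall-through path.

```cpp
// Copies x into *out and returns 0 when x >= 0; returns 1 and leaves *out
// untouched when x is negative.
int copy_if_nonnegative(int x, int *out) {
#if defined(__x86_64__) && defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 11
  int copy;
  asm goto("testl %1, %1\n\t"     // set SF from the sign of x
           "movl %1, %0\n\t"      // write the output; mov leaves flags alone
           "js %l[negative]"      // jump to the C label if x < 0
           : "=r"(copy)
           : "r"(x)
           : "cc"
           : negative);
  *out = copy;  // output operand used only on the fall-through path
  return 0;
negative:
  return 1;
#else
  if (x < 0)
    return 1;
  *out = x;
  return 0;
#endif
}
```

Before this commit, the parsers rejected output operands in `asm goto`, so code like this had to move the value out through memory instead.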

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index f4c4cf7bf8f..7540a15d65d 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -7144,10 +7144,7 @@ c_parser_asm_statement (c_parser *parser)
 	switch (section)
 	  {
 	  case 0:
-	/* For asm goto, we don't allow output operands, but reserve
-	   the slot for a future extension that does allow them.  */
-	if (!is_goto)
-	  outputs = c_parser_asm_operands (parser);
+	outputs = c_parser_asm_operands (parser);
 	break;
 	  case 1:
 	inputs = c_parser_asm_operands (parser);
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 26a5f7128d2..413109c916c 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -10666,10 +10666,6 @@ build_asm_expr (location_t loc, tree string, tree outputs, tree inputs,
   TREE_VALUE (tail) = input;
 }
 
-  /* ASMs with labels cannot have outputs.  This should have been
- enforced by the parser.  */
-  gcc_assert (outputs == NULL || labels == NULL);
-
   args = build_stmt (loc, ASM_EXPR, string, outputs, inputs, clobbers, labels);
 
   /* asm statements without outputs, including simple ones, are treated
diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 1b7bdbc15be..1df6f4bc55a 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -3371,20 +3371,21 @@ expand_asm_stmt (gasm *stmt)
 			   ARGVEC CONSTRAINTS OPNAMES))
  If there is more than one, put them inside a PARALLEL.  */
 
-  if (nlabels > 0 && nclobbers == 0)
-{
-  gcc_assert (noutputs == 0);
-  emit_jump_insn (body);
-}
-  else if (noutputs == 0 && nclobbers == 0)
+  if (noutputs == 0 && nclobbers == 0)
 {
   /* No output operands: put in a raw ASM_OPERANDS rtx.  */
-  emit_insn (body);
+  if (nlabels > 0)
+	emit_jump_insn (body);
+  else
+	emit_insn (body);
 }
   else if (noutputs == 1 && nclobbers == 0)
 {
   ASM_OPERANDS_OUTPUT_CONSTRAINT (body) = constraints[0];
-  emit_insn (gen_rtx_SET (output_rvec[0], body));
+  if (nlabels > 0)
+	emit_jump_insn (gen_rtx_SET (output_rvec[0], body));
+  else 
+	emit_insn (gen_rtx_SET (output_rvec[0], body));
 }
   else
 {
@@ -3461,7 +3462,27 @@ expand_asm_stmt (gasm *stmt)
   if (after_md_seq)
 emit_insn (after_md_seq);
   if (after_rtl_seq)
-emit_insn (after_rtl_seq);
+{
+  if (nlabels == 0)
+	emit_insn (after_rtl_seq);
+  else
+	{
+	  edge e;
+	  edge_iterator ei;
+	  
+	  FOR_EACH_EDGE (e, ei, gimple_bb (stmt)->succs)
+	{
+	  start_sequence ();
+	  for (rtx_insn *curr = after_rtl_seq;
+		   curr != NULL_RTX;
+		   curr = NEXT_INSN (curr))
+		emit_insn (copy_insn (PATTERN (curr)));
+	  rtx_insn 

[committed] openmp: Support allocate for C/C++ array section reductions

2020-11-13 Thread Jakub Jelinek via Gcc-patches
Hi!

This adds allocate clause support for array section reductions.
Furthermore, it fixes one bug that caused C to reject inscan reductions
with allocate, and for now just ignores allocate for inscan/task
reductions, which will need slightly more work.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2020-11-13  Jakub Jelinek  

gcc/
* omp-low.c (scan_sharing_clauses): For now remove for reduction
clauses with inscan or task modifiers decl from allocate_map.
(lower_private_allocate): Handle TYPE_P (new_var).
(lower_rec_input_clauses): Handle allocate clause for C/C++ array
reductions.
gcc/c/
* c-typeck.c (c_finish_omp_clauses): Don't clear
OMP_CLAUSE_REDUCTION_INSCAN unless reduction_seen == -2.
libgomp/
* testsuite/libgomp.c-c++-common/allocate-1.c (foo): Add tests
for array reductions.
(main): Adjust foo callers.

--- gcc/omp-low.c.jj	2020-11-12 21:37:53.909422916 +0100
+++ gcc/omp-low.c	2020-11-13 15:55:09.479302108 +0100
@@ -1197,6 +1197,14 @@ scan_sharing_clauses (tree clauses, omp_
  if (is_oacc_parallel_or_serial (ctx) || is_oacc_kernels (ctx))
ctx->local_reduction_clauses
  = tree_cons (NULL, c, ctx->local_reduction_clauses);
+ if ((OMP_CLAUSE_REDUCTION_INSCAN (c)
+  || OMP_CLAUSE_REDUCTION_TASK (c)) && ctx->allocate_map)
+   {
+ tree decl = OMP_CLAUSE_DECL (c);
+ /* For now.  */
+ if (ctx->allocate_map->get (decl))
+   ctx->allocate_map->remove (decl);
+   }
  /* FALLTHRU */
 
case OMP_CLAUSE_IN_REDUCTION:
@@ -4392,13 +4400,17 @@ lower_private_allocate (tree var, tree n
   if (allocator)
 return false;
   gcc_assert (allocate_ptr == NULL_TREE);
-  if (ctx->allocate_map && DECL_P (new_var))
+  if (ctx->allocate_map
+  && (DECL_P (new_var) || (TYPE_P (new_var) && size)))
 if (tree *allocatorp = ctx->allocate_map->get (var))
   allocator = *allocatorp;
   if (allocator == NULL_TREE)
 return false;
   if (!is_ref && omp_is_reference (var))
-return false;
+{
+  allocator = NULL_TREE;
+  return false;
+}
 
   if (TREE_CODE (allocator) != INTEGER_CST)
 allocator = build_outer_var_ref (allocator, ctx);
@@ -4410,19 +4422,24 @@ lower_private_allocate (tree var, tree n
   allocator = var;
 }
 
-  tree ptr_type, align, sz;
-  if (is_ref)
+  tree ptr_type, align, sz = size;
+  if (TYPE_P (new_var))
+{
+  ptr_type = build_pointer_type (new_var);
+  align = build_int_cst (size_type_node, TYPE_ALIGN_UNIT (new_var));
+}
+  else if (is_ref)
 {
   ptr_type = build_pointer_type (TREE_TYPE (TREE_TYPE (new_var)));
   align = build_int_cst (size_type_node,
 TYPE_ALIGN_UNIT (TREE_TYPE (ptr_type)));
-  sz = size;
 }
   else
 {
   ptr_type = build_pointer_type (TREE_TYPE (new_var));
   align = build_int_cst (size_type_node, DECL_ALIGN_UNIT (new_var));
-  sz = fold_convert (size_type_node, DECL_SIZE_UNIT (new_var));
+  if (sz == NULL_TREE)
+   sz = fold_convert (size_type_node, DECL_SIZE_UNIT (new_var));
 }
   if (TREE_CODE (sz) != INTEGER_CST)
 {
@@ -4855,7 +4872,23 @@ lower_rec_input_clauses (tree clauses, g
  tree type = TREE_TYPE (d);
  gcc_assert (TREE_CODE (type) == ARRAY_TYPE);
  tree v = TYPE_MAX_VALUE (TYPE_DOMAIN (type));
+ tree sz = v;
  const char *name = get_name (orig_var);
+ if (pass != 3 && !TREE_CONSTANT (v))
+   {
+ tree t = maybe_lookup_decl (v, ctx);
+ if (t)
+   v = t;
+ else
+   v = maybe_lookup_decl_in_outer_ctx (v, ctx);
+ gimplify_expr (&v, ilist, NULL, is_gimple_val, fb_rvalue);
+ t = fold_build2_loc (clause_loc, PLUS_EXPR,
+  TREE_TYPE (v), v,
+  build_int_cst (TREE_TYPE (v), 1));
+ sz = fold_build2_loc (clause_loc, MULT_EXPR,
+   TREE_TYPE (v), t,
+   TYPE_SIZE_UNIT (TREE_TYPE (type)));
+   }
  if (pass == 3)
{
  tree xv = create_tmp_var (ptr_type_node);
@@ -4913,6 +4946,13 @@ lower_rec_input_clauses (tree clauses, g
  gimplify_assign (cond, x, ilist);
  x = xv;
}
+ else if (lower_private_allocate (var, type, allocator,
+  allocate_ptr, ilist, ctx,
+  true,
+  TREE_CONSTANT (v)
+  ? TYPE_SIZE_UNIT (type)
+  : sz))
+  
