date:20210517

[PATCH] RISC-V: Properly parse the letter 'p' in '-march'.

2021-05-17 Thread Geng Qi via Gcc-patches

gcc/ChangeLog:
* common/config/riscv/riscv-common.c
(riscv_subset_list::parsing_subset_version): Properly parse the letter
'p' in '-march'.
(riscv_subset_list::parse_std_ext,
 riscv_subset_list::parse_multiletter_ext): To handle errors generated
in riscv_subset_list::parsing_subset_version.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/arch-12.c: New.
* gcc.target/riscv/attribute-19.c: New.
---
 gcc/common/config/riscv/riscv-common.c| 67 ++-
 gcc/testsuite/gcc.target/riscv/arch-12.c  |  4 ++
 gcc/testsuite/gcc.target/riscv/attribute-19.c |  4 ++
 3 files changed, 42 insertions(+), 33 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/attribute-19.c

diff --git a/gcc/common/config/riscv/riscv-common.c b/gcc/common/config/riscv/riscv-common.c
index 34b74e5..65e5641 100644
--- a/gcc/common/config/riscv/riscv-common.c
+++ b/gcc/common/config/riscv/riscv-common.c
@@ -518,40 +518,38 @@ riscv_subset_list::parsing_subset_version (const char *ext,
   unsigned version = 0;
   unsigned major = 0;
   unsigned minor = 0;
-  char np;
   *explicit_version_p = false;
 
-  for (; *p; ++p)
-    {
-      if (*p == 'p')
-        {
-          np = *(p + 1);
-
-          if (!ISDIGIT (np))
-            {
-              /* Might be beginning of `p` extension.  */
-              if (std_ext_p)
-                {
-                  get_default_version (ext, major_version, minor_version);
-                  return p;
-                }
-              else
-                {
-                  error_at (m_loc, "%<-march=%s%>: Expect number "
-                            "after %<%dp%>.", m_arch, version);
-                  return NULL;
-                }
-            }
-
-          major = version;
-          major_p = false;
-          version = 0;
-        }
-      else if (ISDIGIT (*p))
-        version = (version * 10) + (*p - '0');
-      else
-        break;
-    }
+  if (*p == 'p')
+    gcc_assert (std_ext_p);
+  else {
+    for (; *p; ++p)
+      {
+        if (*p == 'p')
+          {
+            if (!ISDIGIT (*(p+1)))
+              {
+                error_at (m_loc, "%<-march=%s%>: Expect number "
+                          "after %<%dp%>.", m_arch, version);
+                return NULL;
+              }
+            if (!major_p)
+              {
+                error_at (m_loc, "%<-march=%s%>: For %<%s%dp%dp?%>, version "
+                          "number with more than 2 level is not supported.",
+                          m_arch, ext, major, version);
+                return NULL;
+              }
+            major = version;
+            major_p = false;
+            version = 0;
+          }
+        else if (ISDIGIT (*p))
+          version = (version * 10) + (*p - '0');
+        else
+          break;
+      }
+  }
 
   if (major_p)
     major = version;
@@ -643,7 +641,7 @@ riscv_subset_list::parse_std_ext (const char *p)
       return NULL;
     }
 
-  while (*p)
+  while (p != NULL && *p)
     {
       char subset[2] = {0, 0};
 
@@ -771,6 +769,9 @@ riscv_subset_list::parse_multiletter_ext (const char *p,
                                  /* std_ext_p= */ false, &explicit_version_p);
       free (ext);
 
+      if (end_of_version == NULL)
+        return NULL;
+
       *q = '\0';
 
       if (strlen (subset) == 1)
diff --git a/gcc/testsuite/gcc.target/riscv/arch-12.c b/gcc/testsuite/gcc.target/riscv/arch-12.c
new file mode 100644
index 000..29e16c3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-12.c
@@ -0,0 +1,4 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv64im1p2p3 -mabi=lp64" } */
+int foo() {}
+/* { dg-error "'-march=rv64im1p2p3': For 'm1p2p\\?', version number with more than 2 level is not supported." "" { target *-*-* } 0 } */
diff --git a/gcc/testsuite/gcc.target/riscv/attribute-19.c b/gcc/testsuite/gcc.target/riscv/attribute-19.c
new file mode 100644
index 000..18f68d9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/attribute-19.c
@@ -0,0 +1,4 @@
+/* { dg-do compile } */
+/* { dg-options "-mriscv-attribute -march=rv64imp0p9 -mabi=lp64" } */
+int foo() {}
+/* { dg-final { scan-assembler ".attribute arch, \"rv64i2p0_m2p0_p0p9\"" } } */
-- 
2.7.4
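
A minimal standalone sketch of the version-parsing rule the patch above enforces, for readers who want to try it outside GCC. It is not the GCC code: `VersionSketch` and `parse_version_sketch` are hypothetical names, errors are reported through a flag rather than `error_at`, and a leading `p` is simply treated as "no explicit version".

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type; GCC itself reports through out-parameters.
struct VersionSketch {
  bool ok;                 // false on a malformed version
  unsigned major_version;
  unsigned minor_version;
  std::size_t consumed;    // characters of the input that were used
};

// Parse "<major>[p<minor>]" from the start of `s`, the text that follows an
// extension letter.  Anything after the version is left for the caller.
inline VersionSketch parse_version_sketch(const std::string& s) {
  VersionSketch result = {true, 0, 0, 0};

  // A leading 'p' is not a version separator; it may start the `p`
  // extension, so report "no explicit version" and consume nothing.
  if (s.empty() || s[0] == 'p')
    return result;

  unsigned version = 0;
  unsigned major_part = 0;
  bool reading_major = true;
  std::size_t i = 0;
  for (; i < s.size(); ++i) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    if (c == 'p') {
      const bool digit_follows =
          i + 1 < s.size()
          && std::isdigit(static_cast<unsigned char>(s[i + 1])) != 0;
      // Reject "2p" with no number after it, and a third level ("1p2p3").
      if (!digit_follows || !reading_major) {
        result.ok = false;
        return result;
      }
      major_part = version;
      reading_major = false;
      version = 0;
    } else if (std::isdigit(c) != 0) {
      version = version * 10 + static_cast<unsigned>(c - '0');
    } else {
      break;  // start of the next extension
    }
  }

  result.major_version = reading_major ? version : major_part;
  result.minor_version = reading_major ? 0 : version;
  result.consumed = i;
  return result;
}
```

With this sketch, `parse_version_sketch("1p2p3").ok` is false, which corresponds to the new arch-12.c test, while a suffix that starts with `p` (as in `rv64imp0p9`) consumes nothing, so `p` can then be parsed as its own extension with version `0p9`.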

Re: [PATCH] Run pass_sink_code once more before store_merging

2021-05-17 Thread Xionghu Luo via Gcc-patches


Hi,

On 2021/5/17 16:11, Richard Biener wrote:

On Fri, 14 May 2021, Xionghu Luo wrote:


Hi Richi,

On 2021/4/21 19:54, Richard Biener wrote:

On Tue, 20 Apr 2021, Xionghu Luo wrote:




On 2021/4/15 19:34, Richard Biener wrote:

On Thu, 15 Apr 2021, Xionghu Luo wrote:


Thanks,

On 2021/4/14 14:41, Richard Biener wrote:

"#538,#235,#234,#233" will all be sunk from bb 35 to bb 37 by rtl-sink,
but it moves #538 first, then #235, because there is a strong dependency
between them.  That seems unlike the LCM framework, which could solve
everything and do the delete-insert in one iteration.

So my question was whether we want to do both within the LCM store
sinking framework.  The LCM dataflow is also used by RTL PRE which
handles both loads and non-loads so in principle it should be able
to handle stores and non-stores for the sinking case (PRE on the
reverse CFG).

A global dataflow is more powerful than any local ad-hoc method.


My biggest concern is whether the LCM DF framework could support sinking
*multiple* reverse-dependent non-store instructions together in *one*
call of LCM DF.  If this is not supported, we need to run LCM multiple
times until there are no new changes, which would obviously be time
consuming (unless compile time is not important here).


As said it is used for PRE and there it most definitely can do that.


I did some investigation into PRE and attached a case to show how it
works.  It is quite like store-motion, and there is also an rtl-hoist
pass in gcse.c which only works for code size.  All of them leverage
the LCM framework to move instructions upward or downward.

PRE and rtl-hoist move instructions upward: they analyze/hash the SOURCE
exprs and call pre_edge_lcm.  Store-motion and rtl-sink move instructions
downward, so they analyze/hash the DEST exprs and call pre_edge_rev_lcm.
All four problems are converted to the LCM DF problem, with four
n_basic_blocks * m_exprs matrices (antic, transp, avail, kill) as input
and two outputs saying where to insert and where to delete.

PRE scans each instruction and hashes the SRC into the table without
*checking the relationship between instructions*.  For the attached case,
BB 37, BB 38 and BB 41 all contain the SOURCE expr "r262:DI+r139:DI", but
BB 37 and BB 41 save it to index 106 while BB 38 saves it to index 110.
After this pass finishes, "r262:DI+r139:DI" in BB 41 is replaced with
"r194:DI=r452:DI", then the expr is inserted into BB 75~BB 80 to create
full redundancies from partial redundancies, and finally the instruction
in BB 37 is updated.


I'm not familiar with the actual PRE code but reading the toplevel comment
it seems that indeed it can only handle expressions contained in a single
insn unless a REG_EQUAL note provides a short-hand for the larger one.

That of course means it would need to mark things as not transparent
for correctness where they'd be if moved together.  Now, nothing
prevents you changing the granularity of what you feed LCM.

So originally we arrived at looking into LCM because there's already
a (store) sinking pass on RTL (using LCM) so adding another (loop-special)
one didn't look like the best obvious solution.

That said, LCM would work for single-instruction expressions.
Alternatively a greedy algorithm like you prototyped could be used.
Another pass to look at would be RTL invariant motion which seems to
compute some kind of dependency graph - not sure if that would be
adaptable for the reverse CFG problem.



Actually my RTL sinking pass patch is borrowed from RTL loop invariant
motion.  It is quite limited since it only moves instructions from the
loop header to loop exits, though it could be refined with various
algorithms.  Compared to the initial method of running the gimple sink
pass once more, it seems much more complicated and limited without
gaining an obvious performance benefit.  Shall we go back to the original
gimple sink2 pass, since we are in stage1 now?


OK, so while there might be new sinking opportunities exposed during
RTL expansion and early RTL opts we can consider adding another sink pass
on GIMPLE.  Since it's basically a scheduling optimization placement
shouldn't matter much but I suppose we should run it before store
merging, so anywhere between cd_dce and that.

Richard.



Attached is the patch as discussed; putting it before store_merging is fine.
Regression testing passed on P8LE.  OK for trunk? :)


Thanks,
Xionghu
From 7fcc6ca9ef3b6acbfbcbd3da4be1d1c0eef4be80 Mon Sep 17 00:00:00 2001
From: Xiong Hu Luo 
Date: Mon, 17 May 2021 20:46:15 -0500
Subject: [PATCH] Run pass_sink_code once more before store_merging

The gimple sink code pass runs quite early, so later gimple optimization
passes may expose new opportunities.  This patch runs the sink code pass
once more before store_merging.  For detailed discussion, please refer to:
https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562352.html

Tested SPEC2017 performance on P8LE: 544.nab_r improves by 2.43%, with
no big changes in the other cases; the GEOMEAN improves slightly, by
0.25%.

gcc/ChangeLog:

Re: [PATCH] libstdc++: Fix iterator caching inside range adaptors [PR100479]

2021-05-17 Thread Patrick Palka via Gcc-patches

On Mon, 17 May 2021, Tim Song wrote:

> On Mon, May 17, 2021 at 2:59 PM Patrick Palka  wrote:
> >
> > +   constexpr _CachedPosition&
> > +   operator=(_CachedPosition&& __other) noexcept
> > +   {
> > + if (std::__addressof(__other) != this)
> 
> I don't think we need this check - self-move-assigning the underlying
> view isn't required to be a no-op, so we should still invalidate.

Sounds good, so like so:

-- >8 --

Subject: [PATCH] libstdc++: Fix iterator caching inside range adaptors
 [PR100479]

This fixes two issues with our iterator caching as described in detail
in the PR.  Since we recently added the __non_propagating_cache class
template as part of r12-336 for P2328, this patch just rewrites the
problematic _CachedPosition partial specialization in terms of this
class template.

For the offset partial specialization, it's safe to propagate the cached
offset on copy/move, but we should still invalidate the cached offset in
the source object on move.
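
The two propagation policies described above can be sketched as follows. This is only an illustration, not the libstdc++ code: the class names are hypothetical, the first class stores a plain `bool` plus a default-constructed value where libstdc++ builds on `_Optional_base`, and the `-1` sentinel for "no cached offset" is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Never propagates the cached value: copies start empty, and a move also
// empties the source.  Assumes T is default-constructible and copyable.
template<typename T>
class NonPropagatingCacheSketch {
  bool engaged_ = false;
  T value_{};

public:
  NonPropagatingCacheSketch() = default;
  NonPropagatingCacheSketch(const NonPropagatingCacheSketch&) {}
  NonPropagatingCacheSketch(NonPropagatingCacheSketch&& other) { other.reset(); }

  NonPropagatingCacheSketch& operator=(const NonPropagatingCacheSketch& other) {
    if (&other != this)
      reset();
    return *this;
  }

  // No self-assignment check: a self-move still invalidates the cache.
  NonPropagatingCacheSketch& operator=(NonPropagatingCacheSketch&& other) {
    reset();
    other.reset();
    return *this;
  }

  bool has_value() const { return engaged_; }
  void set(const T& v) { value_ = v; engaged_ = true; }
  void reset() { engaged_ = false; }
  const T& get() const { assert(engaged_); return value_; }
};

// Propagates the cached offset on copy and move, but additionally
// invalidates the source on move.
class OffsetCacheSketch {
  std::ptrdiff_t offset_ = -1;  // -1 means "nothing cached" (sketch assumption)

public:
  OffsetCacheSketch() = default;
  OffsetCacheSketch(const OffsetCacheSketch&) = default;
  OffsetCacheSketch& operator=(const OffsetCacheSketch&) = default;

  OffsetCacheSketch(OffsetCacheSketch&& other) noexcept
    : offset_(other.offset_)
  { other.offset_ = -1; }

  OffsetCacheSketch& operator=(OffsetCacheSketch&& other) noexcept {
    offset_ = other.offset_;
    other.offset_ = -1;  // on a self-move this leaves the cache empty
    return *this;
  }

  bool has_value() const { return offset_ >= 0; }
  void set(std::ptrdiff_t off) { offset_ = off; }
  std::ptrdiff_t get() const { return offset_; }
};
```

For example, copying a `NonPropagatingCacheSketch<int>` that holds a value yields an empty copy, whereas `OffsetCacheSketch q(std::move(o));` carries the offset over to `q` and leaves `o` empty.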

libstdc++-v3/ChangeLog:

PR libstdc++/100479
* include/std/ranges (__detail::__non_propagating_cache): Move
definition up to before that of _CachedPosition.  Make base
class _Optional_base protected instead of private.  Add const
overload for operator*.
(__detail::_CachedPosition): Rewrite the partial specialization
for forward ranges as a derived class of __non_propagating_cache.
Remove the size constraint on the partial specialization for
random access ranges.  Add copy/move/copy-assignment/move-assignment
members to the offset partial specialization for random
access ranges that propagate the cached value but additionally
invalidate it in the source object on move.
* testsuite/std/ranges/adaptors/100479.cc: New test.
---
 libstdc++-v3/include/std/ranges   | 155 ++
 .../testsuite/std/ranges/adaptors/100479.cc   | 113 +
 2 files changed, 201 insertions(+), 67 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/ranges/adaptors/100479.cc

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index ca62f73ae5d..f7ffee56f56 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -1039,6 +1039,67 @@ namespace views::__adaptor
 
   namespace __detail
   {
+template<typename _Tp>
+  struct __non_propagating_cache
+  {
+   // When _Tp is not an object type (e.g. is a reference type), we make
+   // __non_propagating_cache<_Tp> empty rather than ill-formed so that
+   // users can easily conditionally declare data members with this type
+   // (such as join_view::_M_inner).
+  };
+
+template<typename _Tp>
+  requires is_object_v<_Tp>
+  struct __non_propagating_cache<_Tp> : protected _Optional_base<_Tp>
+  {
+   __non_propagating_cache() = default;
+
+   constexpr
+   __non_propagating_cache(const __non_propagating_cache&) noexcept
+   { }
+
+   constexpr
+   __non_propagating_cache(__non_propagating_cache&& __other) noexcept
+   { __other._M_reset(); }
+
+   constexpr __non_propagating_cache&
+   operator=(const __non_propagating_cache& __other) noexcept
+   {
+ if (std::__addressof(__other) != this)
+   this->_M_reset();
+ return *this;
+   }
+
+   constexpr __non_propagating_cache&
+   operator=(__non_propagating_cache&& __other) noexcept
+   {
+ this->_M_reset();
+ __other._M_reset();
+ return *this;
+   }
+
+   constexpr _Tp&
+   operator*() noexcept
+   { return this->_M_get(); }
+
+   constexpr const _Tp&
+   operator*() const noexcept
+   { return this->_M_get(); }
+
+   template<typename _Iter>
+ _Tp&
+ _M_emplace_deref(const _Iter& __i)
+ {
+   this->_M_reset();
+   // Using _Optional_base::_M_construct to initialize from '*__i'
+   // would incur an extra move due to the indirection, so we instead
+   // use placement new directly.
+   ::new ((void *) std::__addressof(this->_M_payload._M_payload)) _Tp(*__i);
+   this->_M_payload._M_engaged = true;
+   return this->_M_get();
+ }
+  };
+
 template
   struct _CachedPosition
   {
@@ -1060,27 +1121,26 @@ namespace views::__adaptor
 
 template
   struct _CachedPosition<_Range>
+   : protected __non_propagating_cache<iterator_t<_Range>>
   {
-  private:
-   iterator_t<_Range> _M_iter{};
-
-  public:
constexpr bool
_M_has_value() const
-   { return _M_iter != iterator_t<_Range>{}; }
+   { return this->_M_is_engaged(); }
 
constexpr iterator_t<_Range>
_M_get(const _Range&) const
{
  __glibcxx_assert(_M_has_value());
- return _M_iter;
+ return **this;
}
 
constexpr void
_M_set(const _Range&, const iterator_t<_Range>& __it)
{

Re: [PATCH] libstdc++: Fix access issues in elements_view::_Sentinel [PR100631]

2021-05-17 Thread Patrick Palka via Gcc-patches




On Mon, 17 May 2021, Tim Song wrote:

> On Mon, May 17, 2021 at 12:21 PM Patrick Palka via Gcc-patches
>  wrote:
> >
> >   using _Base = elements_view::_Base<_Const>;
> >   sentinel_t<_Base> _M_end = sentinel_t<_Base>();
> > @@ -3800,7 +3807,7 @@ namespace views::__adaptor
> > > requires sized_sentinel_for<sentinel_t<_Base>, iterator_t<_Base2>>
> > friend constexpr range_difference_t<_Base>
> 
> Preexisting, but this one should be _Base2 - we always want to get the
> difference type from the iterator being used.

Patch committed with this as a drive-by change.  Thanks!

-- >8 --

Subject: [PATCH] libstdc++: Fix miscellaneous issues with
 elements_view::_Sentinel [PR100631]

libstdc++-v3/ChangeLog:

PR libstdc++/100631
* include/std/ranges (elements_view::_Iterator): Also befriend
_Sentinel.
(elements_view::_Sentinel::_M_equal): Templatize.
(elements_view::_Sentinel::_M_distance_from): Split out from ...
(elements_view::_Sentinel::operator-): ... here.  Depend on
_Base2 instead of _Base in the return type.
* testsuite/std/ranges/adaptors/elements.cc (test06, test07):
New tests.
---
 libstdc++-v3/include/std/ranges   | 18 +++
 .../testsuite/std/ranges/adaptors/elements.cc | 31 +++
 2 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 48100e9d7f2..36bccd6b33b 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -3634,16 +3634,22 @@ namespace views::__adaptor
             requires sized_sentinel_for<iterator_t<_Base>, iterator_t<_Base>>
           { return __x._M_current - __y._M_current; }
 
-          friend _Sentinel<_Const>;
+          template <bool> friend struct _Sentinel;
         };
 
       template<bool _Const>
         struct _Sentinel
         {
         private:
-          constexpr bool
-          _M_equal(const _Iterator<_Const>& __x) const
-          { return __x._M_current == _M_end; }
+          template<bool _Const2>
+            constexpr bool
+            _M_equal(const _Iterator<_Const2>& __x) const
+            { return __x._M_current == _M_end; }
+
+          template<bool _Const2>
+            constexpr auto
+            _M_distance_from(const _Iterator<_Const2>& __i) const
+            { return _M_end - __i._M_current; }
 
           using _Base = elements_view::_Base<_Const>;
           sentinel_t<_Base> _M_end = sentinel_t<_Base>();
@@ -3684,9 +3690,9 @@ namespace views::__adaptor
           template<bool _Const2,
                    typename _Base2 = __detail::__maybe_const_t<_Const2, _Vp>>
             requires sized_sentinel_for<sentinel_t<_Base>, iterator_t<_Base2>>
-            friend constexpr range_difference_t<_Base>
+            friend constexpr range_difference_t<_Base2>
             operator-(const _Sentinel& __x, const _Iterator<_Const2>& __y)
-            { return __x._M_end - __y._M_current; }
+            { return __x._M_distance_from(__y); }
 
           friend _Sentinel<!_Const>;
         };
diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/elements.cc b/libstdc++-v3/testsuite/std/ranges/adaptors/elements.cc
index 134afd6a873..c6839e38ce5 100644
--- a/libstdc++-v3/testsuite/std/ranges/adaptors/elements.cc
+++ b/libstdc++-v3/testsuite/std/ranges/adaptors/elements.cc
@@ -115,6 +115,35 @@ test05()
   VERIFY( r2[0] == 1 && r2[1] == 3 );
 }
 
+void
+test06()
+{
+  // PR libstdc++/100631
+  auto r = views::iota(0)
+| views::filter([](int){ return true; })
+| views::take(42)
+| views::reverse
+| views::transform([](int) { return std::make_pair(42, "hello"); })
+| views::take(42)
+| views::keys;
+  auto b = r.begin();
+  auto e = r.end();
+  e - b;
+}
+
+void
+test07()
+{
+  // PR libstdc++/100631 comment #2
+  auto r = views::iota(0)
+| views::transform([](int) { return std::make_pair(42, "hello"); })
+| views::keys;
+  auto b = ranges::cbegin(r);
+  auto e = ranges::end(r);
+  b.base() == e.base();
+  b == e;
+}
+
 int
 main()
 {
@@ -123,4 +152,6 @@ main()
   test03();
   test04();
   test05();
+  test06();
+  test07();
 }
-- 
2.31.1.621.g97eea85a0a

Re: [PATCH] Add a couple of A?CST1:CST2 match and simplify optimizations

2021-05-17 Thread Andrew Pinski via Gcc-patches

x86_64-pc-linux-gnu/include -isystem 
> > /home/ed/gnu/install/x86_64-pc-linux-gnu/sys-include   -fchecking=1 -c -g 
> > -O2 -fchecking=1  -gnatpg -gnata -W -Wall -nostdinc -I- -I. -Iada/generated 
> > -Iada -Iada/gcc-interface -I../../gcc-trunk-1/gcc/ada 
> > -I../../gcc-trunk-1/gcc/ada/gcc-interface -Iada/libgnat 
> > -I../../gcc-trunk-1/gcc/ada/libgnat 
> > ../../gcc-trunk-1/gcc/ada/libgnat/a-charac.ads -o ada/libgnat/a-charac.o
> > /home/ed/gnu/gcc-build-2/./prev-gcc/xgcc 
> > -B/home/ed/gnu/gcc-build-2/./prev-gcc/ 
> > -B/home/ed/gnu/install/x86_64-pc-linux-gnu/bin/ 
> > -B/home/ed/gnu/install/x86_64-pc-linux-gnu/bin/ 
> > -B/home/ed/gnu/install/x86_64-pc-linux-gnu/lib/ -isystem 
> > /home/ed/gnu/install/x86_64-pc-linux-gnu/include -isystem 
> > /home/ed/gnu/install/x86_64-pc-linux-gnu/sys-include   -fchecking=1 -c -g 
> > -O2 -fchecking=1  -gnatpg -gnata -W -Wall -nostdinc -I- -I. -Iada/generated 
> > -Iada -Iada/gcc-interface -I../../gcc-trunk-1/gcc/ada 
> > -I../../gcc-trunk-1/gcc/ada/gcc-interface -Iada/libgnat 
> > -I../../gcc-trunk-1/gcc/ada/libgnat 
> > ../../gcc-trunk-1/gcc/ada/libgnat/a-chlat1.ads -o ada/libgnat/a-chlat1.o
> > +===GNAT BUG DETECTED==+
> > | 12.0.0 20210517 (experimental) (x86_64-pc-linux-gnu) Storage_Error stack 
> > overflow or erroneous memory access|
> > | Error detected at a-charac.ads:16:12 |
> > | Please submit a bug report; see https://gcc.gnu.org/bugs/ .  |
> > | Use a subject line meaningful to you and us to track the bug.|
> > | Include the entire contents of this bug box in the report.   |
> > | Include the exact command that you entered.  |
> > | Also include sources listed below.   |
> > +==+
> >
> > Please include these source files with error report
> > Note that list may not be accurate in some cases,
> > so please double check that the problem can still
> > be reproduced with the set of files listed.
> > Consider also -gnatd.n switch (see debug.adb).
> >
> > ../../gcc-trunk-1/gcc/ada/gcc-interface/system.ads
> > ../../gcc-trunk-1/gcc/ada/libgnat/a-charac.ads
> > ../../gcc-trunk-1/gcc/ada/libgnat/ada.ads
> >
> > compilation abandoned
> > echo timestamp > s-modes
> > make[3]: *** [../../gcc-trunk-1/gcc/ada/gcc-interface/Make-lang.in:144: 
> > ada/libgnat/a-charac.o] Error 1
> > make[3]: *** Waiting for unfinished jobs
> > +===GNAT BUG DETECTED==+
> > | 12.0.0 20210517 (experimental) (x86_64-pc-linux-gnu) Storage_Error stack 
> > overflow or erroneous memory access|
> > | Error detected at a-chlat1.ads:18:23 |
> > | Please submit a bug report; see https://gcc.gnu.org/bugs/ .  |
> > | Use a subject line meaningful to you and us to track the bug.|
> > | Include the entire contents of this bug box in the report.   |
> > | Include the exact command that you entered.  |
> > | Also include sources listed below.   |
> > +==+
> >
> > Please include these source files with error report
> > Note that list may not be accurate in some cases,
> > so please double check that the problem can still
> > be reproduced with the set of files listed.
> > Consider also -gnatd.n switch (see debug.adb).
> >
> > ../../gcc-trunk-1/gcc/ada/gcc-interface/system.ads
> > ../../gcc-trunk-1/gcc/ada/libgnat/a-chlat1.ads
> > ../../gcc-trunk-1/gcc/ada/libgnat/a-charac.ads
> > ../../gcc-trunk-1/gcc/ada/libgnat/ada.ads
> >
> > compilation abandoned
> > make[3]: *** [../../gcc-trunk-1/gcc/ada/gcc-interface/Make-lang.in:144: 
> > ada/libgnat/a-chlat1.o] Error 1
> > rm gcov.pod gcov-dump.pod fsf-funding.pod gfdl.pod gpl.pod cpp.pod 
> > gccgo.pod gcc.pod gdc.pod gfortran.pod gcov-tool.pod lto-dump.pod
> > make[3]: Leaving directory '/home/ed/gnu/gcc-build-2/gcc'
> > make[2]: *** [Makefile:4854: all-stage3-gcc] Error 2
> > make[2]: Leaving directory '/home/ed/gnu/gcc-build-2'
> > make[1]: *** [Makefile:32080: stage3-bubble] Error 2
> > make[1]: Leaving directory '/home/ed/gnu/gcc-build-2'
> > make: *** [Makefile:1009: all] Error 2
> >
> >
> > Bernd.

Re: [PATCH] go/100537 - Bootstrap-O3 and bootstrap-debug fail

2021-05-17 Thread Ian Lance Taylor via Gcc-patches

On Mon, May 17, 2021 at 1:17 AM Richard Biener via Gcc-patches
 wrote:
>
> On Fri, May 14, 2021 at 11:19 AM guojiufu via Gcc-patches
>  wrote:
> >
> > On 2021-05-14 15:39, guojiufu via Gcc-patches wrote:
> > > On 2021-05-14 15:15, Richard Biener wrote:
> > >> On May 14, 2021 4:52:56 AM GMT+02:00, Jiufu Guo
> > >>  wrote:
> > >>> As discussed in the PR, Richard mentioned the method to
> > >>> figure out which VAR was not set TREE_ADDRESSABLE, and
> > >>> then cause this failure.  It is address_expression which
> > >>> build addr_expr (build_fold_addr_expr_loc), but not set
> > >>> TREE_ADDRESSABLE.
> > >>>
> > >>> I drafted this patch with reference the comments from Richard
> > >>> in this PR, while I'm not quite sure if more thing need to do.
> > >>> So, please have review, thanks!
> > >>>
> > >>> Bootstrap and regtest pass on ppc64le. Is this ok for trunk?
> > >>
> > >> I suggest to use mark_addressable unless we're sure expr is always an
> > >> entity where TREE_ADDRESSABLE has the desired meaning.
> >
> > Thanks, Richard!
> > You point out the root concern, I'm not sure ;)
> >
> > With looking at code "mark_addressable" and code around
> > tree-ssa.c:1013,
> > VAR_P, PARM_DECL, and RESULT_DECL are checked before accessing
> > TREE_ADDRESSABLE.
> > So, just wondering if these entities need to be marked as
> > TREE_ADDRESSABLE?
> >
> > diff --git a/gcc/go/go-gcc.cc b/gcc/go/go-gcc.cc
> > index 5d9dbb5d068..85d324a92cc 100644
> > --- a/gcc/go/go-gcc.cc
> > +++ b/gcc/go/go-gcc.cc
> > @@ -1680,6 +1680,11 @@ Gcc_backend::address_expression(Bexpression*
> > bexpr, Location location)
> > if (expr == error_mark_node)
> >   return this->error_expression();
> >
> > +  if (VAR_P(expr)
> > +   || TREE_CODE(expr) == PARM_DECL
> > +   || TREE_CODE(expr) == RESULT_DECL)
> > +TREE_ADDRESSABLE (expr) = 1;
> > +
>
> The root concern is that mark_addressable does
>
>   while (handled_component_p (x))
> x = TREE_OPERAND (x, 0);
>
> and I do not know the constraints on 'expr' as passed to
> Gcc_backend::address_expression.
>
> I think we need input from Ian here.  Most FEs have their own 
> *_mark_addressable
> function where they also emit diagnostics (guess this is handled in
> the actual Go frontend).
> Since Gcc_backend does lowering to GENERIC using a middle-end is probably OK.

I doubt I understand all the issues here.

In general the Go frontend only takes the addresses of VAR_DECLs or
PARM_DECLs.  It doesn't bother to set TREE_ADDRESSABLE for global
variables for which TREE_STATIC or DECL_EXTERNAL is true.  For local
variables it sets TREE_ADDRESSABLE based on the is_address_taken
parameter to Gcc_backend::local_variable, and similarly for PARM_DECLs
and Gcc_backend::parameter_variable.

The name in the bug report is for a string initializer, which should
be TREE_STATIC == 1 and TREE_PUBLIC == 0.  Perhaps the fix is simply
to set TREE_ADDRESSABLE in Gcc_backend::immutable_struct and
Gcc_backend::implicit_variable.  I can't see how it would hurt to set
TREE_ADDRESSABLE unnecessarily for a TREE_STATIC variable.

But, again, I doubt I understand all the issues here.

Ian
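
To make the concern about the `handled_component_p` loop concrete, here is a toy model of what the quoted `mark_addressable` fragment does: it walks from a component or array reference down to the base object and marks that object, not the reference. None of this is GCC's tree API; `KindSketch`, `NodeSketch` and both functions are hypothetical stand-ins, and the real function handles more cases.

```cpp
#include <cstddef>

// Hypothetical stand-ins for a few tree codes.
enum class KindSketch { VarDecl, ParmDecl, ResultDecl, ComponentRef, ArrayRef, Other };

struct NodeSketch {
  KindSketch kind;
  NodeSketch* operand0;  // base object for ComponentRef/ArrayRef, else nullptr
  bool addressable;      // plays the role of TREE_ADDRESSABLE
};

// Toy counterpart of handled_component_p: references that wrap a base object.
inline bool handled_component_sketch(const NodeSketch* x) {
  return x->kind == KindSketch::ComponentRef || x->kind == KindSketch::ArrayRef;
}

// Strip the references, then mark the base if it is a variable, parameter
// or result declaration.  Returns the marked node, or nullptr if nothing
// was marked.
inline NodeSketch* mark_addressable_sketch(NodeSketch* x) {
  while (handled_component_sketch(x))
    x = x->operand0;

  if (x->kind == KindSketch::VarDecl
      || x->kind == KindSketch::ParmDecl
      || x->kind == KindSketch::ResultDecl) {
    x->addressable = true;
    return x;
  }
  return nullptr;
}
```

In this model, taking the address of `var.field[i]` marks `var` itself, which is why the constraints on what `Gcc_backend::address_expression` receives matter: setting the flag directly on `expr` would mark the reference node instead of the underlying declaration.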

Re: [PATCH] go/100537 - Bootstrap-O3 and bootstrap-debug fail

2021-05-17 Thread guojiufu via Gcc-patches


On 2021-05-17 16:17, Richard Biener wrote:

On Fri, May 14, 2021 at 11:19 AM guojiufu via Gcc-patches
 wrote:


On 2021-05-14 15:39, guojiufu via Gcc-patches wrote:
> On 2021-05-14 15:15, Richard Biener wrote:
>> On May 14, 2021 4:52:56 AM GMT+02:00, Jiufu Guo
>>  wrote:
>>> As discussed in the PR, Richard mentioned the method to
>>> figure out which VAR was not set TREE_ADDRESSABLE, and
>>> then cause this failure.  It is address_expression which
>>> build addr_expr (build_fold_addr_expr_loc), but not set
>>> TREE_ADDRESSABLE.
>>>
>>> I drafted this patch with reference the comments from Richard
>>> in this PR, while I'm not quite sure if more thing need to do.
>>> So, please have review, thanks!
>>>
>>> Bootstrap and regtest pass on ppc64le. Is this ok for trunk?
>>
>> I suggest to use mark_addressable unless we're sure expr is always an
>> entity where TREE_ADDRESSABLE has the desired meaning.

Thanks, Richard!
You point out the root concern, I'm not sure ;)

With looking at code "mark_addressable" and code around
tree-ssa.c:1013,
VAR_P, PARM_DECL, and RESULT_DECL are checked before accessing
TREE_ADDRESSABLE.
So, just wondering if these entities need to be marked as
TREE_ADDRESSABLE?

diff --git a/gcc/go/go-gcc.cc b/gcc/go/go-gcc.cc
index 5d9dbb5d068..85d324a92cc 100644
--- a/gcc/go/go-gcc.cc
+++ b/gcc/go/go-gcc.cc
@@ -1680,6 +1680,11 @@ Gcc_backend::address_expression(Bexpression*
bexpr, Location location)
if (expr == error_mark_node)
  return this->error_expression();

+  if (VAR_P(expr)
+   || TREE_CODE(expr) == PARM_DECL
+   || TREE_CODE(expr) == RESULT_DECL)
+TREE_ADDRESSABLE (expr) = 1;
+


The root concern is that mark_addressable does

  while (handled_component_p (x))
x = TREE_OPERAND (x, 0);

and I do not know the constraints on 'expr' as passed to
Gcc_backend::address_expression.

I think we need input from Ian here.  Most FEs have their own 
*_mark_addressable

function where they also emit diagnostics (guess this is handled in
the actual Go frontend).
Since Gcc_backend does lowering to GENERIC using a middle-end is 
probably OK.


Yeap.  Hope this patch is ok, then the bootstrap could pass.
Otherwise, we may need more help from Ian and guys ;)

Jiufu Guo.



tree ret = build_fold_addr_expr_loc(location.gcc_location(), 
expr);

return this->make_expression(ret);
  }


Or call mark_addressable, and update mark_addressable to avoid the NULL
pointer ICE.  The patch below also passes bootstrap-debug.

diff --git a/gcc/gimple-expr.c b/gcc/gimple-expr.c
index b8c732b632a..f682841391b 100644
--- a/gcc/gimple-expr.c
+++ b/gcc/gimple-expr.c
@@ -915,6 +915,7 @@ mark_addressable (tree x)
if (TREE_CODE (x) == VAR_DECL
&& !DECL_EXTERNAL (x)
&& !TREE_STATIC (x)
+  && cfun != NULL


I'd be OK with this hunk of course.


&& cfun->gimple_df != NULL
&& cfun->gimple_df->decls_to_pointers != NULL)
  {
diff --git a/gcc/go/go-gcc.cc b/gcc/go/go-gcc.cc
index 5d9dbb5d068..fe9dfaf8579 100644
--- a/gcc/go/go-gcc.cc
+++ b/gcc/go/go-gcc.cc
@@ -1680,6 +1680,7 @@ Gcc_backend::address_expression(Bexpression*
bexpr, Location location)
if (expr == error_mark_node)
  return this->error_expression();

+  mark_addressable(expr);
tree ret = build_fold_addr_expr_loc(location.gcc_location(), 
expr);

return this->make_expression(ret);
  }


>
> I noticed you mentioned "mark_addressable" in the PR.
> I tried it yesterday, and it causes new ICEs at gimple-expr.c:918,
> at the line below:
>
>   && cfun->gimple_df != NULL
>
>
>
>>
>> Richard.
>>
>>> Jiufu Guo.
>>>
>>> 2021-05-14  Richard Biener  
>>> Jiufu Guo 
>>>
>>> PR go/100537
>>> * go-gcc.cc
>>> (Gcc_backend::address_expression): Set TREE_ADDRESSABLE.
>>>
>>> ---
>>> gcc/go/go-gcc.cc | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/gcc/go/go-gcc.cc b/gcc/go/go-gcc.cc
>>> index 5d9dbb5d068..8ed20a3b479 100644
>>> --- a/gcc/go/go-gcc.cc
>>> +++ b/gcc/go/go-gcc.cc
>>> @@ -1680,6 +1680,7 @@ Gcc_backend::address_expression(Bexpression*
>>> bexpr, Location location)
>>>   if (expr == error_mark_node)
>>> return this->error_expression();
>>>
>>> +  TREE_ADDRESSABLE (expr) = 1;
>>>   tree ret = build_fold_addr_expr_loc(location.gcc_location(), expr);
>>>   return this->make_expression(ret);
>>> }

[PATCH] libgccjit: Add support for types used by atomic builtins [PR96066] [PR96067]

2021-05-17 Thread Antoni Boucher via Gcc-patches

Hello.
This patch fixes the issue with using atomic builtins in libgccjit.
Thanks for reviewing it.
From 0ce53d373ffba9f3f80a2d2b4e1a7d724ba31b7d Mon Sep 17 00:00:00 2001
From: Antoni Boucher 
Date: Sun, 9 May 2021 20:14:37 -0400
Subject: [PATCH] Add support for types used by atomic builtins [PR96066]
 [PR96067]

2021-05-17  Antoni Boucher  

gcc/jit/
PR target/PR96066
PR target/PR96067
* jit-builtins.c: Implement missing types for builtins.
* jit-recording.c: Allow sending a volatile const void * as
argument.
gcc/testsuite/
PR target/PR96066
PR target/PR96067
* jit.dg/all-non-failing-tests.h: Add test-builtin-types.c.
* jit.dg/test-builtin-types.c
---
 gcc/jit/jit-builtins.c   | 10 ++---
 gcc/jit/jit-recording.c  | 14 ++-
 gcc/testsuite/jit.dg/all-non-failing-tests.h |  7 
 gcc/testsuite/jit.dg/test-builtin-types.c| 41 
 4 files changed, 65 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/jit.dg/test-builtin-types.c

diff --git a/gcc/jit/jit-builtins.c b/gcc/jit/jit-builtins.c
index 1ea96f4e025..c279dd858f9 100644
--- a/gcc/jit/jit-builtins.c
+++ b/gcc/jit/jit-builtins.c
@@ -541,11 +541,11 @@ builtins_manager::make_primitive_type (enum jit_builtin_type type_id)
 // case BT_DFLOAT128:
 // case BT_VALIST_REF:
 // case BT_VALIST_ARG:
-// case BT_I1:
-// case BT_I2:
-// case BT_I4:
-// case BT_I8:
-// case BT_I16:
+case BT_I1: return m_ctxt->get_int_type (1, true);
+case BT_I2: return m_ctxt->get_int_type (2, true);
+case BT_I4: return m_ctxt->get_int_type (4, true);
+case BT_I8: return m_ctxt->get_int_type (8, true);
+case BT_I16: return m_ctxt->get_int_type (16, true);
 // case BT_PTR_CONST_STRING:
 }
 }
diff --git a/gcc/jit/jit-recording.c b/gcc/jit/jit-recording.c
index 117ff70114c..de876ff9fa6 100644
--- a/gcc/jit/jit-recording.c
+++ b/gcc/jit/jit-recording.c
@@ -2598,8 +2598,18 @@ recording::memento_of_get_pointer::accepts_writes_from (type *rtype)
 return false;
 
   /* It's OK to assign to a (const T *) from a (T *).  */
-  return m_other_type->unqualified ()
-->accepts_writes_from (rtype_points_to);
+  if (m_other_type->unqualified ()
+->accepts_writes_from (rtype_points_to)) {
+  return true;
+  }
+
+  /* It's OK to assign to a (volatile const T *) from a (volatile const T *). */
+  if (m_other_type->unqualified ()->unqualified ()
+->accepts_writes_from (rtype_points_to->unqualified ())) {
+  return true;
+  }
+
+  return false;
 }
 
 /* Implementation of pure virtual hook recording::memento::replay_into
diff --git a/gcc/testsuite/jit.dg/all-non-failing-tests.h b/gcc/testsuite/jit.dg/all-non-failing-tests.h
index 4202eb7798b..dfc6596358c 100644
--- a/gcc/testsuite/jit.dg/all-non-failing-tests.h
+++ b/gcc/testsuite/jit.dg/all-non-failing-tests.h
@@ -181,6 +181,13 @@
 #undef create_code
 #undef verify_code
 
+/* test-builtin-types.c */
+#define create_code create_code_builtin_types
+#define verify_code verify_code_builtin_types
+#include "test-builtin-types.c"
+#undef create_code
+#undef verify_code
+
 /* test-hello-world.c */
 #define create_code create_code_hello_world
 #define verify_code verify_code_hello_world
diff --git a/gcc/testsuite/jit.dg/test-builtin-types.c b/gcc/testsuite/jit.dg/test-builtin-types.c
new file mode 100644
index 000..e20d71571b5
--- /dev/null
+++ b/gcc/testsuite/jit.dg/test-builtin-types.c
@@ -0,0 +1,41 @@
+#include 
+#include 
+#include 
+#include 
+
+#include "libgccjit.h"
+
+#include "harness.h"
+
+void
+create_code (gcc_jit_context *ctxt, void *user_data)
+{
+  CHECK_NON_NULL (gcc_jit_context_get_builtin_function (ctxt, "__atomic_fetch_add_4"));
+
+  gcc_jit_function *atomic_load = gcc_jit_context_get_builtin_function (ctxt, "__atomic_load_8");
+
+  gcc_jit_type *volatile_void_ptr =
+gcc_jit_type_get_volatile(gcc_jit_type_get_const(gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_VOID_PTR)));
+  gcc_jit_type *void_type =
+gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_VOID);
+  gcc_jit_type *long_type =
+gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_LONG);
+  gcc_jit_type *int_type =
+gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_INT);
+  gcc_jit_function *func =
+gcc_jit_context_new_function (ctxt, NULL, GCC_JIT_FUNCTION_EXPORTED, void_type, "atomics", 0, NULL, 0);
+
+  gcc_jit_lvalue *variable = gcc_jit_function_new_local (func, NULL, long_type, "variable");
+  gcc_jit_rvalue *builtin_args[2];
+  gcc_jit_rvalue *param1 = gcc_jit_lvalue_get_address(variable, NULL);
+  builtin_args[0] = gcc_jit_context_new_cast(ctxt, NULL, param1, volatile_void_ptr);
+  builtin_args[1] = gcc_jit_context_new_rvalue_from_long(ctxt, int_type, 0);
+  gcc_jit_context_new_call (ctxt, NULL, atomic_load, 2, builtin_args);
+}
+
+void
+verify_code (gcc_jit_context *ctxt, gcc_jit_result *result)
+{

Re: [PATCH][v2] c/100547 - reject overly large vector_size attributes

2021-05-17 Thread Joseph Myers

On Mon, 17 May 2021, Richard Biener wrote:

> This rejects a number of vector components that does not fit an 'int'
> which is an internal limitation of RTVEC.  This requires adjusting
> gcc.dg/attr-vector_size.c which checks for much larger
> supported vectors.  Note that the RTVEC limitation is a host specific
> limitation (unless we change this 'int' to int32_t), but should be
> 32bits in practice everywhere.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> OK?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com
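
As an illustration of the limit discussed above, the check amounts to
rejecting a vector_size whose component count does not fit in a host
'int'.  A minimal sketch, assuming a hypothetical helper named
vector_count_fits_int (this is not a GCC function, and the real check
lives in the attribute handler):

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical helper, not part of GCC: returns true when a vector of
// VECTOR_BYTES bytes made of ELEM_BYTES-byte elements has a component
// count that fits in a host 'int' (the RTVEC limitation quoted above).
bool vector_count_fits_int(std::uint64_t vector_bytes, std::uint64_t elem_bytes)
{
  if (elem_bytes == 0 || vector_bytes % elem_bytes != 0)
    return false;                       // not a whole number of elements
  std::uint64_t count = vector_bytes / elem_bytes;
  return count <= static_cast<std::uint64_t>(INT_MAX);
}
```

With a 32-bit 'int', a 16-byte vector of 4-byte elements passes, while a
2^40-byte vector of 1-byte elements is rejected.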

Re: [PATCH][DOCS] Remove install-old.texi

2021-05-17 Thread Joseph Myers

On Mon, 17 May 2021, Martin Liška wrote:

> -@enumerate
> -@item
> -If you have chosen a configuration for GCC which requires other GNU
> -tools (such as GAS or the GNU linker) instead of the standard system
> -tools, install the required tools in the build directory under the names
> -@file{as}, @file{ld} or whatever is appropriate.

This bit is obsoleted by --with-build-time-tools (putting tools in the 
build directory was needed in some cases before that option was added).

> -@item
> -Specify the host, build and target machine configurations.  You do this
> -when you run the @file{configure} script.

But install.texi doesn't appear to have such documentation of what host, 
build and target are and how to specify them.

> -Here are the possible CPU types:
> -
> -@quotation
> -@c gmicro, fx80, spur and tahoe omitted since they don't work.
> -1750a, a29k, alpha, arm, avr, c@var{n}, clipper, dsp16xx, elxsi, fr30, h8300,
> -hppa1.0, hppa1.1, i370, i386, i486, i586, i686, i786, i860, i960, ip2k, m32r,
> -m68000, m68k, m88k, mcore, mips, mipsel, mips64, mips64el,
> -mn10200, mn10300, ns32k, pdp11, powerpc, powerpcle, romp, rs6000, sh, sparc,
> -sparclite, sparc64, v850, vax, we32k.
> -@end quotation

The very outdated list of specific names may not be very useful now, 
although arguably there *should* be a current list of supported targets 
(closer to that in config-list.mk, or at least a list of supported CPU 
names and another list of supported OS names) in the installation 
documentation.

> -Often a particular model of machine has a name.  Many machine names are
> -recognized as aliases for CPU/company combinations.  Thus, the machine

All such machine names can probably be considered obsolete; the main case 
to document is CPU-SYSTEM (no company mentioned), not machine name aliases 
(and mention somewhere that config.sub produces the canonical version of a 
name).

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] libstdc++: Fix iterator caching inside range adaptors [PR100479]

2021-05-17 Thread Tim Song via Gcc-patches

On Mon, May 17, 2021 at 2:59 PM Patrick Palka  wrote:
>
> +   constexpr _CachedPosition&
> +   operator=(_CachedPosition&& __other) noexcept
> +   {
> + if (std::__addressof(__other) != this)

I don't think we need this check - self-move-assigning the underlying
view isn't required to be a no-op, so we should still invalidate.


> +   {
> + // Propagate the cached offset, but invalidate the source.
> + this->_M_offset = __other._M_offset;
> + __other._M_offset = -1;
> +   }
> + return *this;
> +   }
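
To make the point about the self-check concrete, here is a small sketch
of an offset cache whose move-assignment always invalidates the source,
even on self-move.  The type cached_offset is a stand-in invented for
this example; it is not the libstdc++ _CachedPosition.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Toy stand-in for an offset-based cache; names are illustrative only.
struct cached_offset
{
  std::ptrdiff_t offset = -1;           // -1 means "nothing cached"

  cached_offset() = default;
  cached_offset(const cached_offset&) = default;
  cached_offset& operator=(const cached_offset&) = default;

  // Move-assignment without a self-check: propagate the offset, then
  // invalidate the source.  On self-move the object ends up invalidated.
  cached_offset& operator=(cached_offset&& other) noexcept
  {
    offset = other.offset;
    other.offset = -1;
    return *this;
  }

  bool has_value() const { return offset >= 0; }
};
```

Moving from one object to another transfers the offset and clears the
source; moving an object onto itself simply leaves it empty, which is
the conservative outcome when the underlying view may have changed.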

Re: [PATCH] libstdc++: Fix access issues in elements_view::_Sentinel [PR100631]

2021-05-17 Thread Tim Song via Gcc-patches

On Mon, May 17, 2021 at 12:21 PM Patrick Palka via Gcc-patches
 wrote:
>
>   using _Base = elements_view::_Base<_Const>;
>   sentinel_t<_Base> _M_end = sentinel_t<_Base>();
> @@ -3800,7 +3807,7 @@ namespace views::__adaptor
> requires sized_sentinel_for, iterator_t<_Base2>>
> friend constexpr range_difference_t<_Base>

Preexisting, but this one should be _Base2 - we always want to get the
difference type from the iterator being used.



> operator-(const _Sentinel& __x, const _Iterator<_Const2>& __y)
> -   { return __x._M_end - __y._M_current; }
> +   { return __x._M_distance_from(__y); }
>

[PATCH] PR tree-optimization/100512: Once a range becomes constant, make it invariant.

2021-05-17 Thread Andrew MacLeod via Gcc-patches

The code in PR 100512 triggers an interaction between ranger and the 
propagation engine related to undefined values.


I put the detailed analysis in the PR, but it boils down to this: the early 
VRP pass has concluded that something is a constant and can be replaced, 
and removes the definition, expecting the constant to be propagated 
everywhere.



If the code is in an undefined region that the CFG is going to remove, 
we can find impossible situations, and ranger then changes that value to 
UNDEFINED, because, well, it is.  But then the propagation engine 
panics because it doesn't have a constant any more, so it doesn't replace it, 
and now we have a used but not defined value.


Once we get to a globally constant range where further refinements can 
only end up in an UNDEFINED state, stop further evaluating the range.  
This is typically in places which are about to be removed by CFG cleanup 
anyway, and it will make the propagation engine happy with no surprises.


Bootstraps on x86_64-pc-linux-gnu with no regressions, and fixes the PR.

Pushed.

Andrew
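
The idea can be sketched with a toy cache that freezes a name once its
range has collapsed to a single value.  Everything below (int_range,
toy_range_cache) is invented for illustration and is not ranger's API;
it only models the "once constant, stop refining" rule.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy closed interval [lo, hi]; a singleton has lo == hi.
struct int_range
{
  long lo, hi;
  bool singleton() const { return lo == hi; }
};

// Toy model only; this is not the interface of gimple-range-cache.cc.
class toy_range_cache
{
  std::map<std::string, int_range> m_ranges;
  std::map<std::string, bool> m_invariant;

public:
  // Record a new global range for NAME.  Returns false, and changes
  // nothing, if NAME has already been frozen as a constant.
  bool set_global_range(const std::string& name, int_range r)
  {
    if (m_invariant[name])
      return false;
    m_ranges[name] = r;
    if (r.singleton())
      m_invariant[name] = true;         // constant: stop refining
    return true;
  }

  int_range get(const std::string& name) const { return m_ranges.at(name); }
};
```

After a name is set to a singleton such as [5, 5], later attempts to
refine it are ignored, so a consumer that already substituted the
constant never sees the value change underneath it.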



commit 3f476de7fd274f619a0b04c2e2f7077ee8ab17a5
Author: Andrew MacLeod 
Date:   Mon May 17 15:53:39 2021 -0400

Once a range becomes constant, make it invariant.

Once a range is forced to a constant globally, simply make it invariant.
Unify this with the code which makes non-zero pointer ranges invariant.

gcc/
PR tree-optimization/100512
* gimple-range-cache.cc (ranger_cache::set_global_range): Mark const
and non-zero pointer ranges as invariant.
* gimple-range.cc (gimple_ranger::range_of_stmt): Remove pointer
processing from here.

gcc/testsuite/
PR tree-optimization/100512
* gcc.dg/pr100512.c: New.

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 60e5d66c52d..2c922e32913 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -703,8 +703,19 @@ ranger_cache::set_global_range (tree name, const irange )
 
   propagate_updated_value (name, bb);
 }
-  // Mark the value as up-to-date.
-  m_temporal->set_timestamp (name);
+  // Constants no longer need to tracked.  Any further refinement has to be
+  // undefined. Propagation works better with constants. PR 100512.
+  // Pointers which resolve to non-zero also do not need
+  // tracking in the cache as they will never change.  See PR 98866.
+  // Otherwise mark the value as up-to-date.
+  if (r.singleton_p ()
+  || (POINTER_TYPE_P (TREE_TYPE (name)) && r.nonzero_p ()))
+{
+  set_range_invariant (name);
+  m_temporal->set_always_current (name);
+}
+  else
+m_temporal->set_timestamp (name);
 }
 
 // Register a dependency on DEP to name.  If the timestamp for DEP is ever
diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 5b288d8e6a7..710bc7f9632 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -1082,11 +1082,6 @@ gimple_ranger::range_of_stmt (irange , gimple *s, tree name)
   r.intersect (tmp);
   m_cache.set_global_range (name, r);
 
-  // Pointers which resolve to non-zero at the defintion point do not need
-  // tracking in the cache as they will never change.  See PR 98866.
-  if (POINTER_TYPE_P (TREE_TYPE (name)) && r.nonzero_p ())
-m_cache.set_range_invariant (name);
-
   return true;
 }
 
diff --git a/gcc/testsuite/gcc.dg/pr100512.c b/gcc/testsuite/gcc.dg/pr100512.c
new file mode 100644
index 000..70b90e04be9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr100512.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -w" } */
+
+#include 
+int a;
+void b() {
+  int16_t *c;
+  uint16_t d = 2;
+  if (0 == d) {
+uint64_t e;
+uint64_t *f = 
+for (;;) {
+  if (e += 0 >= 0)
+for (;;)
+  ;
+g:
+  for (; a;) {
+int16_t i = 
+*c = i && *f;
+  }
+}
+  }
+  goto g;
+}
+

Re: [PATCH] PING implement pre-c++20 contracts

2021-05-17 Thread Jason Merrill via Gcc-patches


On 5/14/21 4:54 PM, Jason Merrill wrote:

On 4/30/21 1:44 PM, Jeff Chapman wrote:

Hello! Looping back around to this. re:
https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567334.html

On 3/25/21, Jason Merrill  wrote:

On 3/1/21 8:12 AM, Jeff Chapman wrote:

On 1/18/21, Jason Merrill  wrote:

On 1/4/21 9:58 AM, Jeff Chapman wrote:

Ping. re:
https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561135.html


https://github.com/lock3/gcc/tree/contracts-jac-alt



Why is some of the code in c-family?  From the modules merge there is
now a cp_handle_option function that you could add the option handling
to, and I don't see a reason for cxx-contracts.c to be in c-family/
rather than cp/.


I've been pushing changes that address the points raised and wanted to
ping to see if there's more feedback and give a status summary. The
notable change is making sure the implementation plays nicely with
modules and a mangling change that did away with a good chunk of code.

The contracts code outside of cp/ has been moved into it, and the
contract attributes have been altered to work with language independent
handling code. More of the contracts code can probably still be moved to
cxx-contracts which I'll loop back to after other refactoring. The
naming, spelling, and comments (or lack thereof) have been addressed.


Sounds good.  I plan to get back to this when GCC 11 branches, which
should be mid-April.


Please let me know if you see any more issues when you pick it back up.
Particularly in modules interop, since I don't think you've had a chance
to look at that yet.

Completed another merge with master earlier this week which didn't bring
to light any new issues or regressions, but I'll keep on that :)

+  /* If we have contracts, check that they're valid in this context. */

+  // FIXME: These aren't entirely correct.


How not?  Can local extern function decls have contract attributes?

+ /* FIXME when we instantiate a template class with guarded
+  * members, particularly guarded template members, the resulting
+  * pre/post functions end up being inaccessible because their
+  * template info's context is the original uninstantiated class.


This sounds like a significant functionality gap.  I'm guessing you want
to reconsider the unshare_template approach.


One approach would be to only do the pre/post/guarded/unguarded 
transformation for a fully-instantiated function; a temploid function 
would leave the contracts as attributes.


+  /* FIXME do we need magic to perfectly forward this so we don't clobber
+    RVO/NRVO etc?  */


Yes.  CALL_FROM_THUNK_P will probably get you some of the magic you
want.


These points are still being investigated and addressed; including them
for reference.


Any update?


More soon.


Please let me know what other issues need work.


If there's anything I can do to make the process smoother please don't
hesitate to ask.


Larger-scope comments:

Please add an overview of the implementation strategy to the top of 
cxx-contracts.c.  Particularly to discuss the why and how of 
pre/post/guarded/unguarded functions.


I'm confused by the approach to late parsing of contracts; it seems like 
you wait until the end of parsing the function to go back and parse the 
contracts.  Why not parse them sooner, along with nsdmis, noexcept, and 
function bodies?


Smaller-scope comments:


+   /* If we didn't build a check, insert a NOP so we don't leak
+  contracts into GENERIC.  */
+   *stmt_p = build1 (NOP_EXPR, void_type_node, integer_zero_node);


You can use void_node for the NOP.


+  error_at (EXPR_LOCATION (new_contract),
+   "mismatched contract condition in %s",
+   ctx == cmc_declaration ? "declaration" : "override");


This sort of build-up of diagnostic messages by substring replacement 
doesn't work very well for translations.  In general, don't use %s for 
inserting English text, only code strings that will be the same in all 
languages.
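
A sketch of the alternative: select one complete message per case, so
that each sentence can be translated as a unit.  The enumerator
cmc_declaration appears in the patch; cmc_override and the helper
mismatch_message are names assumed for this example.

```cpp
#include <cassert>
#include <cstring>

// cmc_declaration is taken from the quoted code; cmc_override is an
// assumed name for the other case.
enum contract_match_context { cmc_declaration, cmc_override };

// Hypothetical helper: return a whole message for each context instead
// of splicing an English word into a shared format string with %s.
const char* mismatch_message(contract_match_context ctx)
{
  return ctx == cmc_declaration
    ? "mismatched contract condition in declaration"
    : "mismatched contract condition in override";
}
```

Each of the two strings is then a full sentence that a translator can
reorder or inflect freely.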



+  /* Remove the associated contracts and unchecked result, if any.  */
+  if (flag_contracts && TREE_CODE (newdecl) == FUNCTION_DECL)
+    {
+  remove_contract_attributes (newdecl);
+  set_contract_functions (newdecl, NULL_TREE, NULL_TREE);
+    }


Why bother removing attributes on a decl that's about to be freed?

Why did we set the contract functions above only to clear them now?


   if (DECL_EXTERNAL (decl) && ! DECL_TEMPLATE_SPECIALIZATION (decl)
  /* Aliases are definitions. */
- && !alias)
+ && !alias
+ && (DECL_VIRTUAL_P (decl) || !flag_contracts))
    permerror (declarator->id_loc,
   "declaration of %q#D outside of class is not definition",
   decl);
+  else if (DECL_EXTERNAL (decl) && !

[COMMITTED] MAINTAINERS: Add myself for write after approval

2021-05-17 Thread Serge Belyshev via Gcc-patches

2021-05-17  Serge Belyshev  

* MAINTAINERS (Write After Approval): Add myself.

---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5b10f212ce8..fbaa183cea4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -317,6 +317,7 @@ Gergö Barany

 Charles Baylis 
 Tejas Belagod  
 Matthew Beliveau   
+Serge Belyshev 
 Jon Beniston   
 Andrew Bennett 
 Andrew Benson

Re: [PATCH] Add a couple of A?CST1:CST2 match and simplify optimizations

2021-05-17 Thread Andrew Pinski via Gcc-patches

On Mon, May 17, 2021 at 9:41 AM Bernd Edlinger
 wrote:
>
> On 5/16/21 10:36 PM, apinski--- via Gcc-patches wrote:
> > From: Andrew Pinski 
> >
> > Instead of some of the more manual optimizations inside phi-opt,
> > it would be good idea to do a lot of the heavy lifting inside match
> > and simplify instead. In the process, this moves the three simple
> > A?CST1:CST2 (where CST1 or CST2 is zero) simplifications.
> >
> > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> >
> > Thanks,
> > Andrew Pinski
> >
> > gcc:
> > * match.pd (A?CST1:CST2): Add simplifications for A?0:+-1, A?+-1:0,
> > A?POW2:0 and A?0:POW2.
> > ---
> >  gcc/match.pd | 37 +
> >  1 file changed, 37 insertions(+)
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 10503b97ab5..844f7dd5f87 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3711,6 +3711,43 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > (if (integer_all_onesp (@1) && integer_zerop (@2))
> >  @0
> >
> > +/* A few simplifications of "a ? CST1 : CST2". */
> > +/* NOTE: Only do this on gimple as the if-chain-to-switch
> > +   optimization depends on the gimple to have if statements in it. */
> > +#if GIMPLE
> > +(simplify
> > + (cond @0 INTEGER_CST@1 INTEGER_CST@2)
> > + (switch
> > +  (if (integer_zerop (@2))
> > +   (switch
> > +/* a ? 1 : 0 -> a if 0 and 1 are integral types. */
> > +(if (integer_onep (@1))
> > + (convert (convert:boolean_type_node @0)))
> > +/* a ? -1 : 0 -> -a. */
> > +(if (integer_all_onesp (@1))
> > + (negate (convert (convert:boolean_type_node @0
> > +/* a ? powerof2cst : 0 -> a << (log2(powerof2cst)) */
> > +(if (!POINTER_TYPE_P (type) && integer_pow2p (@1))
> > + (with {
> > +   tree shift = build_int_cst (integer_type_node, wi::exact_log2 
> > (wi::to_wide (@1)));
> > +  }
> > +  (lshift (convert (convert:boolean_type_node @0)) { shift; })
> > +  (if (integer_zerop (@1))
> > +   (switch
> > +/* a ? 0 : 1 -> !a. */
> > +(if (integer_onep (@2))
> > + (convert (bit_not:boolean_type_node (convert:boolean_type_node @0
> > +/* a ? 0 : -1 -> -(!a). */
> > +(if (integer_all_onesp (@2))
> > + (negate (convert (bit_not:boolean_type_node 
> > (convert:boolean_type_node @0)
> > +/* a ? 0 : powerof2cst -> (!a) << (log2(powerof2cst)) */
> > +(if (!POINTER_TYPE_P (type) && integer_pow2p (@2))
> > + (with {
> > +   tree shift = build_int_cst (integer_type_node, wi::exact_log2 
> > (wi::to_wide (@2)));
> > +  }
> > +  (lshift (convert (bit_not:boolean_type_node 
> > (convert:boolean_type_node @0))) { shift; })))
> > +#endif
> > +
> >  /* Simplification moved from fold_cond_expr_with_comparison.  It may also
> > be extended.  */
> >  /* This pattern implements two kinds simplification:
> >
>
> Hi Andrew,
>
> Sorry, but I don't know what is exactly  wrong with this patch,
> but it seems to cause this, when I try to bootstrap it:

Thanks for testing the patch out.  I do not normally build with Ada
enabled so I did not see this. Anyway, I will try to debug it and see
what is going wrong.  I might have made a typo or something else is
broken from the IR change.  I do know that COND_EXPR is not normally
generated, so it could be anything.  Plus it is definitely
stage 2 being miscompiled which is causing stage 3 to crash.

Thanks,
Andrew Pinski
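
For reference, the six rewrites in the quoted pattern correspond to the
following branch-free forms at the source level.  This is only a sketch
in plain C++ with helper names made up for the example; it says nothing
about how match.pd types the intermediate conversions.

```cpp
#include <cassert>

// Each helper is the branch-free form of the conditional in its comment.
inline int sel_1_0(bool a)           { return static_cast<int>(a); }        // a ? 1 : 0
inline int sel_m1_0(bool a)          { return -static_cast<int>(a); }       // a ? -1 : 0
inline int sel_pow2_0(bool a, int k) { return static_cast<int>(a) << k; }   // a ? (1 << k) : 0
inline int sel_0_1(bool a)           { return static_cast<int>(!a); }       // a ? 0 : 1
inline int sel_0_m1(bool a)          { return -static_cast<int>(!a); }      // a ? 0 : -1
inline int sel_0_pow2(bool a, int k) { return static_cast<int>(!a) << k; }  // a ? 0 : (1 << k)

// Compare every helper with the plain conditional for both values of A
// and for shift counts 0..30 (so that 1 << k stays within 'int').
void check_identities()
{
  for (int i = 0; i < 2; ++i)
    {
      bool a = (i != 0);
      assert(sel_1_0(a) == (a ? 1 : 0));
      assert(sel_m1_0(a) == (a ? -1 : 0));
      assert(sel_0_1(a) == (a ? 0 : 1));
      assert(sel_0_m1(a) == (a ? 0 : -1));
      for (int k = 0; k < 31; ++k)
        {
          assert(sel_pow2_0(a, k) == (a ? (1 << k) : 0));
          assert(sel_0_pow2(a, k) == (a ? 0 : (1 << k)));
        }
    }
}
```

Calling check_identities() exercises all six forms; any mismatch would
trip an assertion.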


>
>
> /home/ed/gnu/gcc-build-2/./prev-gcc/xgcc 
> -B/home/ed/gnu/gcc-build-2/./prev-gcc/ 
> -B/home/ed/gnu/install/x86_64-pc-linux-gnu/bin/ 
> -B/home/ed/gnu/install/x86_64-pc-linux-gnu/bin/ 
> -B/home/ed/gnu/install/x86_64-pc-linux-gnu/lib/ -isystem 
> /home/ed/gnu/install/x86_64-pc-linux-gnu/include -isystem 
> /home/ed/gnu/install/x86_64-pc-linux-gnu/sys-include   -fchecking=1 -c -g -O2 
> -fchecking=1  -gnatpg -gnata -W -Wall -nostdinc -I- -I. -Iada/generated -Iada 
> -Iada/gcc-interface -I../../gcc-trunk-1/gcc/ada 
> -I../../gcc-trunk-1/gcc/ada/gcc-interface -Iada/libgnat 
> -I../../gcc-trunk-1/gcc/ada/libgnat 
> ../../gcc-trunk-1/gcc/ada/libgnat/a-charac.ads -o ada/libgnat/a-charac.o
> /home/ed/gnu/gcc-build-2/./prev-gcc/xgcc 
> -B/home/ed/gnu/gcc-build-2/./prev-gcc/ 
> -B/home/ed/gnu/install/x86_64-pc-linux-gnu/bin/ 
> -B/home/ed/gnu/install/x86_64-pc-linux-gnu/bin/ 
> -B/home/ed/gnu/install/x86_64-pc-linux-gnu/lib/ -isystem 
> /home/ed/gnu/install/x8

Re: [committed] arm: correctly handle inequality comparisons against max constants [PR100563]

2021-05-17 Thread Hans-Peter Nilsson

On Thu, 13 May 2021, Richard Earnshaw via Gcc-patches wrote:
>
> Normally we expect the gimple optimizers to fold away comparisons that
> are always true, but at some lower optimization levels this is not
> always the case, so the back-end has to be able to generate correct
> code in these cases.
>
> In this example, we have a comparison of the form
>
>   (unsigned long long) op <= ~0ULL
>
> which, of course is always true.
>
> Normally, in the arm back-end we handle these expansions where the
> immediate cannot be handled directly by adding 1 to the constant and
> then adjusting the comparison operator:
>
>   (unsigned long long) op < CONST + 1
>
> but we cannot do that when the constant is already the largest value.

Sounds like a target-independent bug in the making, lurking and
waiting for a target to do the above adjustment but missing the
bounds-check.

>  gcc/testsuite/gcc.dg/pr100563.c |  9 +
>  2 files changed, 34 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr100563.c

I'll therefore humbly suggest adjusting the test-case to be a
run-time check (and if not done by others, I plan to do that
myself... some time late next summer).

brgds, H-P
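
The adjustment and its missing bounds check can be sketched as follows.
The types cmp_op and adjusted_cmp and the function adjust_leu are
invented for this example and do not correspond to arm back-end code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

enum class cmp_op { LTU, ALWAYS_TRUE };

struct adjusted_cmp { cmp_op op; std::uint64_t constant; };

// Rewrite "x <= c" as "x < c + 1".  When c is already the largest
// value, c + 1 would wrap to 0, so report the comparison as always true.
adjusted_cmp adjust_leu(std::uint64_t c)
{
  if (c == std::numeric_limits<std::uint64_t>::max())
    return { cmp_op::ALWAYS_TRUE, c };
  return { cmp_op::LTU, c + 1 };
}

// Evaluate the adjusted comparison for a concrete X.
bool evaluate(adjusted_cmp a, std::uint64_t x)
{
  return a.op == cmp_op::ALWAYS_TRUE || x < a.constant;
}
```

A run-time test along these lines would compare evaluate(adjust_leu(c), x)
against the plain x <= c for boundary values of c.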

Re: [PATCH] libstdc++: Fix iterator caching inside range adaptors [PR100479]

2021-05-17 Thread Patrick Palka via Gcc-patches

On Mon, 17 May 2021, Tim Song wrote:

> On Mon, May 17, 2021 at 11:46 AM Patrick Palka via Libstdc++
>  wrote:
> > constexpr void
> > _M_set(const _Range&, const iterator_t<_Range>& __it)
> > {
> >   __glibcxx_assert(!_M_has_value());
> > - _M_iter = __it;
> > + this->_M_payload._M_payload._M_value = __it;
> > + this->_M_payload._M_engaged = true;
> > }
> >};
> 
> This part looks questionable - if we don't have a value then we can't
> assign to a nonexistent object.

Whoops, that was my lazy attempt at making the function
constexpr-friendly without thinking enough about it.  I now changed this
to use std::construct_at appropriately.
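
The distinction between assigning and constructing can be shown with a
small cache over raw storage.  This is a sketch with invented names
(toy_cache), not the libstdc++ code; it uses placement new, which does
the same job at run time but is not usable in C++20 constant
evaluation, which is where std::construct_at comes in.

```cpp
#include <cassert>
#include <new>
#include <string>

// Toy optional-like cache; illustrative only.
template <typename T>
class toy_cache
{
  alignas(T) unsigned char m_storage[sizeof(T)];
  T* m_value = nullptr;                 // non-null once a T has been constructed

public:
  toy_cache() = default;
  toy_cache(const toy_cache&) = delete;
  toy_cache& operator=(const toy_cache&) = delete;
  ~toy_cache() { reset(); }

  bool has_value() const { return m_value != nullptr; }

  // Precondition: nothing is cached.  A new T is constructed in the raw
  // storage; assigning would be wrong because no T object exists yet.
  void set(const T& value)
  {
    assert(!m_value);
    m_value = ::new (static_cast<void*>(m_storage)) T(value);
  }

  // Destroy the cached object, if any.
  void reset()
  {
    if (m_value)
      {
        m_value->~T();
        m_value = nullptr;
      }
  }

  T& get() { assert(m_value); return *m_value; }
};
```

With T = std::string the difference matters: assigning into bytes that
were never constructed as a string would read garbage internal state,
whereas constructing starts the object's lifetime properly.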

> 
> Also, I believe the offset partial specialization of _CachedPosition
> needs a change to invalidate the source on move.

Ah, true.  I reckoned that because it's safe to propagate an offset on
copy, we can leave alone its behavior on move.  I changed this to
propagate the cached offset on copy/move, but additionally invalidate
the source object on move.

How does this look?

-- >8 --

Subject: [PATCH] libstdc++: Fix iterator caching inside range adaptors
 [PR100479]

This fixes two issues with our iterator caching as described in detail
in the PR.  Since we recently added the __non_propagating_cache class
template as part of r12-336 for P2328, this patch just rewrites the
problematic _CachedPosition partial specialization in terms of this
class template.

For the offset partial specialization, it's safe to propagate the cached
value on copy/move, but we should still invalidate the cached value in
the source object on move.

libstdc++-v3/ChangeLog:

PR libstdc++/100479
* include/std/ranges (__detail::__non_propagating_cache): Move
definition up to before that of _CachedPosition.  Make base
class _Optional_base protected instead of private.  Add const
overload for operator*.
(__detail::_CachedPosition): Rewrite the partial specialization
for forward ranges as a derived class of __non_propagating_cache.
Remove the size constraint on the partial specialization for
random access ranges.  Add copy/move/copy-assignment/move-assignment
members to the offset partial specialization for random access
ranges that propagate the cached value but additionally
invalidates it in the source object on move.
* testsuite/std/ranges/adaptors/100479.cc: New test.
---
 libstdc++-v3/include/std/ranges   | 158 ++
 .../testsuite/std/ranges/adaptors/100479.cc   | 113 +
 2 files changed, 204 insertions(+), 67 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/ranges/adaptors/100479.cc

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 1707aeaebcd..b48fbca3cf6 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -1139,6 +1139,67 @@ namespace views::__adaptor
 
   namespace __detail
   {
+template
+  struct __non_propagating_cache
+  {
+   // When _Tp is not an object type (e.g. is a reference type), we make
+   // __non_propagating_cache<_Tp> empty rather than ill-formed so that
+   // users can easily conditionally declare data members with this type
+   // (such as join_view::_M_inner).
+  };
+
+template
+  requires is_object_v<_Tp>
+  struct __non_propagating_cache<_Tp> : protected _Optional_base<_Tp>
+  {
+   __non_propagating_cache() = default;
+
+   constexpr
+   __non_propagating_cache(const __non_propagating_cache&) noexcept
+   { }
+
+   constexpr
+   __non_propagating_cache(__non_propagating_cache&& __other) noexcept
+   { __other._M_reset(); }
+
+   constexpr __non_propagating_cache&
+   operator=(const __non_propagating_cache& __other) noexcept
+   {
+ if (std::__addressof(__other) != this)
+   this->_M_reset();
+ return *this;
+   }
+
+   constexpr __non_propagating_cache&
+   operator=(__non_propagating_cache&& __other) noexcept
+   {
+ this->_M_reset();
+ __other._M_reset();
+ return *this;
+   }
+
+   constexpr _Tp&
+   operator*() noexcept
+   { return this->_M_get(); }
+
+   constexpr const _Tp&
+   operator*() const noexcept
+   { return this->_M_get(); }
+
+   template
+ _Tp&
+ _M_emplace_deref(const _Iter& __i)
+ {
+   this->_M_reset();
+   // Using _Optional_base::_M_construct to initialize from '*__i'
+   // would incur an extra move due to the indirection, so we instead
+   // use placement new directly.
+   ::new ((void *) std::__addressof(this->_M_payload._M_payload)) 
_Tp(*__i);
+   this->_M_payload._M_engaged = true;
+   return this->_M_get();
+ }
+  };
+
 template
   struct _CachedPosition
   {
@@

Re: [PATCH] Hashtable PR96088

2021-05-17 Thread François Dumont via Gcc-patches


Hi

    No chance yet to review this proposal ?

François

On 06/05/21 10:03 pm, François Dumont wrote:

Hi

    Addressing your feedback on backtrace in debug mode is going to take me
some time, so here is another one.

    Compared to the latest submission I've added a _Hash_arg_t partial
specialization for std::hash<>. It is not strictly necessary for the
moment but it will be when we eventually remove its nested argument_type.
I also wonder whether it is easier for the compiler to handle, though I am
not sure about that.


Tested under Linux x86_64, ok to commit ?

François


On 04/12/20 10:10 am, François Dumont wrote:
Following submission of the heterogeneous lookup in unordered
containers I rebased this patch on top of it.

Apart from reducing its size because of some code reuse, the
heterogeneous lookup had no impact on this one. This is because when
I cannot find out if conversion from the inserted element type to the hash
functor argument type can throw, I pass the element as-is, as if the hash
functor was transparent.


    libstdc++: Limit allocation on iterator insertion in Hashtable [PR 96088]

    Detect Hash functor argument type to find out if it is different to the
    container key_type and if a temporary instance needs to be generated to invoke
    the functor from the iterator value_type key part. If this temporary generation
    can throw a key_type instance is generated at Hashtable level and used to call
    the functors and, if necessary, moved to the storage.

    libstdc++-v3/ChangeLog:

    PR libstdc++/96088
    * include/bits/hashtable_policy.h (_Select2nd): New.
    (_NodeBuilder<>): New.
    (_ReuseOrAllocNode<>::operator()): Use variadic template args.
    (_AllocNode<>::operator()): Likewise.
    (_Hash_code_base<>::_M_hash_code): Add _Kt template parameter.
    (_Hashtable_base<>::_M_equals): Add _Kt template parameter.
    * include/bits/hashtable.h
    (_Hashtable<>::__node_builder_t): New.
    (_Hashtable<>::_M_find_before_node): Add _Kt template parameter.
    (_Hashtable<>::_M_find_node): Likewise.
    (_Hashtable<>::_Hash_arg_t): New.
    (_Hashtable<>::_S_forward_key): New.
    (_Hashtable<>::_M_insert_unique<>(_Kt&&, _Arg&&, const _NodeGenerator&)):
    New.
    (_Hashtable<>::_M_insert): Use latter.
    * testsuite/23_containers/unordered_map/96088.cc: New test.
    * testsuite/23_containers/unordered_multimap/96088.cc: New test.
    * testsuite/23_containers/unordered_multiset/96088.cc: New test.
    * testsuite/23_containers/unordered_set/96088.cc: New test.
    * testsuite/util/replacement_memory_operators.h
    (counter::_M_increment): New.
    (counter::_M_decrement): New.
    (counter::reset()): New.

Note that I plan to change counter type name to something more 
meaningful but only when back to stage 1.
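
The detection described above can be approximated with a small trait
that uses a functor's nested argument_type when it has one and falls
back to the inserted key type otherwise.  The names hash_arg,
string_hash_with_arg and transparent_like_hash are invented for this
sketch; the patch's _Hash_arg_t may be defined quite differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// C++11-compatible equivalent of std::void_t.
template <typename...> struct make_void { using type = void; };

// Primary template: the functor does not advertise an argument type,
// so the key is passed through as-is.
template <typename Hash, typename Key, typename = void>
struct hash_arg { using type = Key; };

// Chosen when Hash has a nested argument_type.
template <typename Hash, typename Key>
struct hash_arg<Hash, Key,
                typename make_void<typename Hash::argument_type>::type>
{ using type = typename Hash::argument_type; };

// Example functor that names its argument type.
struct string_hash_with_arg
{
  using argument_type = std::string;
  std::size_t operator()(const std::string& s) const { return s.size(); }
};

// Example functor that accepts anything and names no argument type.
struct transparent_like_hash
{
  template <typename T>
  std::size_t operator()(const T&) const { return 0; }
};
```

For string_hash_with_arg and a const char* key the trait yields
std::string, signalling that a temporary would be needed to call the
functor; for transparent_like_hash it yields const char* unchanged.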


François

On 24/10/20 4:25 pm, François Dumont wrote:

Hi

    Just a rebase of this patch.

François

On 17/10/20 6:21 pm, François Dumont wrote:

I eventually would like to propose the following resolution.

For multi-key containers I kept the same resolution: build the node
first and compute the hash code from the node key.

For unique-key ones I changed the behavior when I can't find out the hash
functor argument type. I am instead using the iterator key type and
just hope that the user's functors are prepared for it.

For now I am using the functor's argument_type, which is deprecated. I
just hope that the day we remove it we will have a compiler
built-in to get any functor argument type given an input type.


    libstdc++: Limit allocation on iterator insertion in Hashtable [PR 96088]

    Detect Hash functor argument type to find out if it is different to the
    container key_type and if a temporary instance needs to be generated to invoke
    the functor from the iterator value_type key part. If this temporary generation
    can throw a key_type instance is generated at Hashtable level and used to call
    the functors and, if needed, moved to the storage.

    libstdc++-v3/ChangeLog:

    PR libstdc++/96088
    * include/bits/hashtable_policy.h (_Select2nd): New.
    (_NodeBuilder<>): New.
    (_ReuseOrAllocNode<>::operator()): Use variadic template args.
    (_AllocNode<>::operator()): Likewise.
    (_Hash_code_base<>::_M_hash_code): Add _KType template parameter.
    (_Hashtable_base<>::_M_equals): Add _KType template parameter.
    * include/bits/hashtable.h
    (_Hashtable<>::__node_builder_t): New.
    (_Hashtable<>::_M_find_before_node): Add _KType template parameter.
    (_Hashtable<>::_M_find_node): Likewise.
    (_Hashtable<>::_Hash_arg_t): New.
    (_Hashtable<>::_S_forward_key): New.
(_Hashtable<>::_M_insert_unique<>(_KType&&, _Arg&&, const

Re: [PATCH] libstdc++: Fix wrong thread waking on notify [PR100334]

2021-05-17 Thread Thomas Rodgers


On 2021-05-17 09:43, Jonathan Wakely wrote:

On 14/05/21 18:09 +0100, Jonathan Wakely wrote:
On 13/05/21 18:54 -0700, Thomas Rodgers wrote:
From: Thomas Rodgers

Please ignore the previous patch. This one removes the need to carry any
extra state in the case of a 'laundered' atomic wait.

libstdc++/ChangeLog:
* include/bits/atomic_wait.h (__waiter::_M_do_wait_v): loop
until value change observed.
(__waiter_base::_M_laundered): New member function.
(__waiter_base::_M_notify): Check _M_laundered() to determine
whether to wake one or all.
(__detail::__atomic_compare): Return true if call to
__builtin_memcmp() == 0.
(__waiter_base::_S_do_spin_v): Adjust predicate.
* testsuite/29_atomics/atomic/wait_notify/100334.cc: New
test.
---
libstdc++-v3/include/bits/atomic_wait.h   | 28 --
.../29_atomics/atomic/wait_notify/100334.cc   | 94 +++
2 files changed, 114 insertions(+), 8 deletions(-)
create mode 100644 
libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/100334.cc


diff --git a/libstdc++-v3/include/bits/atomic_wait.h 
b/libstdc++-v3/include/bits/atomic_wait.h

index 984ed70f16c..07bb744d822 100644
--- a/libstdc++-v3/include/bits/atomic_wait.h
+++ b/libstdc++-v3/include/bits/atomic_wait.h
@@ -181,11 +181,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
return false;
}

+// return true if equal
template
bool __atomic_compare(const _Tp& __a, const _Tp& __b)
{
// TODO make this do the correct padding bit ignoring comparison
-return __builtin_memcmp(&__a, &__b, sizeof(_Tp)) != 0;
+return __builtin_memcmp(&__a, &__b, sizeof(_Tp)) == 0;
}

struct __waiter_pool_base
@@ -300,14 +301,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
explicit __waiter_base(const _Up* __addr) noexcept
: _M_w(_S_for(__addr))
, _M_addr(_S_wait_addr(__addr, &_M_w._M_ver))
-  {
-  }
+  { }
+
+bool
+_M_laundered() const
+{ return _M_addr == &_M_w._M_ver; }

void
_M_notify(bool __all, bool __bare = false)
{
-  if (_M_addr == &_M_w._M_ver)
-__atomic_fetch_add(_M_addr, 1, __ATOMIC_ACQ_REL);
+  if (_M_laundered())
+{
+  __atomic_fetch_add(_M_addr, 1, __ATOMIC_ACQ_REL);
Please mention this increment in the changelog.


Ugh, sorry, I seem to have forgotten how to read a diff.


OK for trunk and gcc-11 with that change, thanks.


OK to push, no changes needed.

Tested x86_64-pc-linux-gnu, committed to master and cherry-picked to 
releases/gcc-11.
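
The sense of the comparison fixed above (return true when the two values are equal) can be sketched outside libstdc++ as follows. This is only an illustration: `bytes_equal` and `value_changed` are hypothetical names, not the library's internals, and like the original the byte-wise comparison does not ignore padding bits.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for __detail::__atomic_compare: returns true when
// the two objects have identical object representations.
template<typename T>
bool bytes_equal(const T& a, const T& b)
{
  return std::memcmp(&a, &b, sizeof(T)) == 0;
}

// Hypothetical helper for a wait loop: a waiter keeps waiting while the
// freshly loaded value still equals the old one, so "changed" is simply
// the negation of the equality test.
template<typename T, typename Load>
bool value_changed(const T& old_value, Load load)
{
  T current = load();
  return !bytes_equal(old_value, current);
}
```

With the predicate returning true on equality, a loop of the form "while not changed, wait again" reads naturally, which is presumably why the spin predicate had to be adjusted along with it.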

Re: [PATCH] aix: handle 64bit inodes for include directories

2021-05-17 Thread David Edelsohn via Gcc-patches

The aix.h change is okay with me, but you need to get approval for the
incpath.c and cpplib.h parts of the patch from the appropriate
maintainers.

Thanks, David

On Mon, May 17, 2021 at 7:44 AM CHIGOT, CLEMENT  wrote:
>
> On AIX, stat will store inodes in 32bit even when using LARGE_FILES.
> If the inode is larger, it will return -1 in st_ino.
> Thus, in incpath.c when comparing include directories, if several
> of them have 64bit inodes, they will be considered as duplicated.
>
> gcc/ChangeLog:
> 2021-05-06  Clément Chigot  
>
> * configure.ac: Check sizeof ino_t and dev_t.
> * config.in: Regenerate.
> * configure: Regenerate.
> * config/rs6000/aix.h (HOST_STAT_FOR_64BIT_INODES): New define.
> * incpath.c (HOST_STAT_FOR_64BIT_INODES): New define.
> (remove_duplicates): Use it.
>
> libcpp/ChangeLog:
> 2021-05-06  Clément Chigot  
>
> * configure.ac: Check sizeof ino_t and dev_t.
> * config.in: Regenerate.
> * configure: Regenerate.
> * include/cpplib.h (INO_T_CPP): Change for AIX.
> (DEV_T_CPP): New macro.
> (struct cpp_dir): Use it.
>
>
>
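
A minimal sketch of the failure mode described in the quoted patch, under stated assumptions: `DirId`, `same_directory` and `narrow_to_32bit` are hypothetical names, and the all-ones value stands in for the -1 that the description says is stored in st_ino when the inode does not fit in 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical (device, inode) pair, as used to detect duplicate
// include directories.
struct DirId
{
  std::uint64_t dev;
  std::uint64_t ino;
};

// Two paths name the same directory when both fields match.
bool same_directory(const DirId& a, const DirId& b)
{
  return a.dev == b.dev && a.ino == b.ino;
}

// Models a stat-like call that can only report 32-bit inodes and reports
// an all-ones sentinel (the unsigned view of -1) for anything larger.
DirId narrow_to_32bit(const DirId& d)
{
  const std::uint64_t limit = 0xFFFFFFFFu;
  DirId r = d;
  if (d.ino > limit)
    r.ino = limit;  // every large inode collapses to the same value
  return r;
}
```

Two distinct directories on the same device whose inodes exceed 32 bits compare as different with the full values, but as duplicates once both have been narrowed.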

Re: [PATCH v3] bpf.2: Use standard types and attributes

2021-05-17 Thread Daniel Borkmann


On 5/16/21 11:16 AM, Alejandro Colomar (man-pages) wrote:

On 5/15/21 9:01 PM, Alejandro Colomar wrote:

Some manual pages are already using C99 syntax for integral
types 'uint32_t', but some aren't.  There are some using kernel
syntax '__u32'.  Fix those.

Both the kernel and the standard types are 100% binary compatible,
and the source code differences between them are very small, and
not important in a manual page:

- Some of them are implemented with different underlying types
   (e.g., s64 is always long long, while int64_t may be long long
   or long, depending on the arch).  This causes the following
   differences.

- length modifiers required by printf are different, resulting in
   a warning ('-Wformat=').

- pointer assignment causes a warning:
   ('-Wincompatible-pointer-types'), but there aren't any pointers
   in this page.

But, AFAIK, all of those warnings can be safely ignored, due to
the binary compatibility between the types.
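
The length-modifier difference mentioned above can be illustrated in C++ (the man page itself concerns C). This is a sketch only; `format_u64` and `format_ull` are hypothetical helpers. A 64-bit value held in `unsigned long long` is printed with `%llu`, while `uint64_t` should be printed with the `PRIu64` macro because its underlying type varies by architecture.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Both types are assumed to be 64 bits wide here, which is what makes
// them interchangeable at the binary level.
static_assert(sizeof(std::uint64_t) == sizeof(unsigned long long),
              "sketch assumes a 64-bit unsigned long long");

// uint64_t: use the portable PRIu64 macro for the length modifier.
std::string format_u64(std::uint64_t v)
{
  char buf[32];
  std::snprintf(buf, sizeof buf, "%" PRIu64, v);
  return buf;
}

// unsigned long long (the kernel-style underlying type): always %llu.
std::string format_ull(unsigned long long v)
{
  char buf[32];
  std::snprintf(buf, sizeof buf, "%llu", v);
  return buf;
}
```

Passing a `uint64_t` to `%llu` on a platform where it is `unsigned long` is what triggers the '-Wformat=' warning the message refers to.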

...

Some pages also document attributes, using GNU syntax
'__attribute__((xxx))'.  Update those to use the shorter and more
portable C11 keywords such as 'alignas()' when possible, and C2x
syntax '[[gnu::xxx]]' elsewhere, which hasn't been standardized
yet, but is already implemented in GCC, and available through
either --std=c2x or any of the --std=gnu... options.
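
As a rough illustration of the three alignment spellings being compared, here is a C++ sketch (the page is about C, where the attribute placement rules differ slightly). The struct names are hypothetical, and the `gnu::` and `__attribute__` forms assume GCC or a compatible compiler.

```cpp
#include <cassert>
#include <cstddef>

// Standard keyword.
struct alignas(8) WithAlignas { char c; };

// Standard attribute syntax with a vendor namespace (GCC/Clang).
struct [[gnu::aligned(8)]] WithGnuAttr { char c; };

// Traditional GNU syntax.
struct __attribute__((aligned(8))) WithOldAttr { char c; };
```

All three are expected to yield the same alignment for a type declared this way; the differences lie in portability and in where the specifier may appear.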

The standard isn't very clear on how to use alignas() or
[[]]-style attributes, and the GNU documentation isn't better, so
the following link is a useful experiment about the different
alignment syntaxes:
__attribute__((aligned())), alignas(), and [[gnu::aligned()]]:


Signed-off-by: Alejandro Colomar 

Discussion: 


Nacked-by: Alexei Starovoitov 
Nacked-by: Greg Kroah-Hartman 


You forgot to retain my ...

Nacked-by: Daniel Borkmann 


Acked-by: Zack Weinberg 
Cc: LKML 
Cc: glibc 
Cc: GCC 
Cc: bpf 
Cc: David Laight 
Cc: Joseph Myers 
Cc: Florian Weimer 
Cc: Daniel Borkmann 
---
  man2/bpf.2 | 49 -
  1 file changed, 24 insertions(+), 25 deletions(-)

Re: [PATCH] libstdc++: Fix iterator caching inside range adaptors [PR100479]

2021-05-17 Thread Tim Song via Gcc-patches

On Mon, May 17, 2021 at 11:46 AM Patrick Palka via Libstdc++
 wrote:
> constexpr void
> _M_set(const _Range&, const iterator_t<_Range>& __it)
> {
>   __glibcxx_assert(!_M_has_value());
> - _M_iter = __it;
> + this->_M_payload._M_payload._M_value = __it;
> + this->_M_payload._M_engaged = true;
> }
>};

This part looks questionable - if we don't have a value then we can't
assign to a nonexistent object.

Also, I believe the offset partial specialization of _CachedPosition
needs a change to invalidate the source on move.
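
The first concern can be sketched with a simplified, hypothetical slot type (`CachedSlot` is not the libstdc++ `_CachedPosition` or its payload). When the slot is disengaged no object exists in the storage, so the value has to be constructed in place rather than assigned.

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <string>

// Hypothetical optional-like storage: `value` only holds a live object
// while `engaged` is true.
template<typename T>
struct CachedSlot
{
  union { T value; };
  bool engaged = false;

  CachedSlot() {}
  ~CachedSlot() { reset(); }
  CachedSlot(const CachedSlot&) = delete;
  CachedSlot& operator=(const CachedSlot&) = delete;

  void set(const T& v)
  {
    assert(!engaged);
    // Start the lifetime of `value` with placement new; writing
    // `value = v` here would assign to an object that does not exist.
    ::new (static_cast<void*>(std::addressof(value))) T(v);
    engaged = true;
  }

  void reset()
  {
    if (engaged)
      {
        value.~T();
        engaged = false;
      }
  }
};
```

For a trivially assignable iterator the difference may not be observable in practice, but for a type such as `std::string` assignment into raw storage would be undefined behaviour.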

Re: [PATCH][DOCS] Remove install-old.texi

2021-05-17 Thread Richard Sandiford via Gcc-patches

Martin Liška  writes:
> Hello.
>
> As mentioned at the beginning of https://gcc.gnu.org/install/old.html:
> "Note most of this information is out of date and superseded by the previous 
> chapters of this manual."
>
> The installation page is deprecated for 20 years now.
>
> Does it make sense to remove it?
> Thanks,
> Martin

I agree this makes sense, but it looks like there's some stuff here that
isn't explained (or isn't explained as well) in the new version.  E.g.:

> -Here is the procedure for installing GCC on a GNU or Unix system.
> -
> -@enumerate
> -@item
> -If you have chosen a configuration for GCC which requires other GNU
> -tools (such as GAS or the GNU linker) instead of the standard system
> -tools, install the required tools in the build directory under the names
> -@file{as}, @file{ld} or whatever is appropriate.
> -
> -Alternatively, you can do subsequent compilation using a value of the
> -@code{PATH} environment variable such that the necessary GNU tools come
> -before the standard system tools.
> -
> -@item
> -Specify the host, build and target machine configurations.  You do this
> -when you run the @file{configure} script.
> -
> -The @dfn{build} machine is the system which you are using, the
> -@dfn{host} machine is the system where you want to run the resulting
> -compiler (normally the build machine), and the @dfn{target} machine is
> -the system for which you want the compiler to generate code.

I can't see the equivalent of this paragraph in the new docs.
There are scattered examples that use --build and --host, but nothing
that says what --build and --host actually do.

> -If you are building a compiler to produce code for the machine it runs
> -on (a native compiler), you normally do not need to specify any operands
> -to @file{configure}; it will try to guess the type of machine you are on
> -and use that as the build, host and target machines.  So you don't need
> -to specify a configuration when building a native compiler unless
> -@file{configure} cannot figure out what your configuration is or guesses
> -wrong.
> -
> -In those cases, specify the build machine's @dfn{configuration name}
> -with the @option{--host} option; the host and target will default to be
> -the same as the host machine.
> -
> -Here is an example:
> -
> -@smallexample
> -./configure --host=sparc-sun-sunos4.1
> -@end smallexample
> -
> -A configuration name may be canonical or it may be more or less
> -abbreviated.
> -
> -A canonical configuration name has three parts, separated by dashes.
> -It looks like this: @samp{@var{cpu}-@var{company}-@var{system}}.

This too isn't said explicitly in the new docs AFAICS.

Thanks for cleaning this up.

Richard

> -(The three parts may themselves contain dashes; @file{configure}
> -can figure out which dashes serve which purpose.)  For example,
> -@samp{m68k-sun-sunos4.1} specifies a Sun 3.
> -
> -You can also replace parts of the configuration by nicknames or aliases.
> -For example, @samp{sun3} stands for @samp{m68k-sun}, so
> -@samp{sun3-sunos4.1} is another way to specify a Sun 3.
> -
> -You can specify a version number after any of the system types, and some
> -of the CPU types.  In most cases, the version is irrelevant, and will be
> -ignored.  So you might as well specify the version if you know it.
> -
> -See @ref{Configurations}, for a list of supported configuration names and
> -notes on many of the configurations.  You should check the notes in that
> -section before proceeding any further with the installation of GCC@.
> -
> -@end enumerate
> -
> -@ifnothtml
> -@node Configurations, , , Old
> -@section Configurations Supported by GCC
> -@end ifnothtml
> -@html
> -@anchor{Configurations}Configurations Supported by GCC
> -@end html
> -@cindex configurations supported by GCC
> -
> -Here are the possible CPU types:
> -
> -@quotation
> -@c gmicro, fx80, spur and tahoe omitted since they don't work.
> -1750a, a29k, alpha, arm, avr, c@var{n}, clipper, dsp16xx, elxsi, fr30, h8300,
> -hppa1.0, hppa1.1, i370, i386, i486, i586, i686, i786, i860, i960, ip2k, m32r,
> -m68000, m68k, m88k, mcore, mips, mipsel, mips64, mips64el,
> -mn10200, mn10300, ns32k, pdp11, powerpc, powerpcle, romp, rs6000, sh, sparc,
> -sparclite, sparc64, v850, vax, we32k.
> -@end quotation
> -
> -Here are the recognized company names.  As you can see, customary
> -abbreviations are used rather than the longer official names.
> -
> -@c What should be done about merlin, tek*, dolphin?
> -@quotation
> -acorn, alliant, altos, apollo, apple, att, bull,
> -cbm, convergent, convex, crds, dec, dg, dolphin,
> -elxsi, encore, harris, hitachi, hp, ibm, intergraph, isi,
> -mips, motorola, ncr, next, ns, omron, plexus,
> -sequent, sgi, sony, sun, tti, unicom, wrs.
> -@end quotation
> -
> -The company name is meaningful only to disambiguate when the rest of
> -the information supplied is insufficient.  You can omit it, writing
> -just @samp{@var{cpu}-@var{system}}, if it is not needed.  For example,
>

Re: [PATCH] tree-sra: Avoid refreshing into const base decls (PR 100453)

2021-05-17 Thread Eric Botcazou

> sorry for breaking Ada over the weekend, however...

No problem.

> None of the non-ACATS tests fail for me without doing a bootstrap, but I
> did manage to reproduce this one (by the way, there really should be
> documentation on how to run ACATS tests manually, I should not need to
> spend an hour and a half figuring it out).

https://gcc.gnu.org/wiki/DebuggingGCC

> The problem is that before (early) SRA, there is a TREE_READONLY decl
> that is being written to and my patch eliminated those writes.
> Specifically, I see:
> 
>:
>   var_ara_5D.5012.FD.4868[1]{lb: 1 sz: 4}[1]{lb: 1 sz: 1} = _877;
>   _880 = report.ident_bool (1);
> 
>:
>   var_ara_5D.5012.FD.4868[1]{lb: 1 sz: 4}[2]{lb: 1 sz: 1} = _880;
>   _883 = report.ident_bool (1);
> 
>:
>   var_ara_5D.5012.FD.4868[1]{lb: 1 sz: 4}[3]{lb: 1 sz: 1} = _883;
>   _886 = report.ident_bool (1);
> 
>:
>   var_ara_5D.5012.FD.4868[1]{lb: 1 sz: 4}[4]{lb: 1 sz: 1} = _886;
>   _889 = report.ident_bool (1);
> 
> [...and many more.]  Note that this is an -fdump-tree-all-uid dump, the
> DECL that is being assigned to has DECL_UID 5012 and when I dump it:
> 
> DECL_UID of racc->base is: 5012
> print_node (dump_file, "", racc->base, 0):
>   type  sizes-gimplified type_5 TI size 
> unit-size 
> align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
> 0x7fc249faf690 fields  0x7fc249faf1f8 c41325a__p__array_5> decl_3 TI c41325a.adb:57:11
> size 
> unit-size 
> align:8 warn_if_not_align:0 offset_align 128
> offset 
> bit-offset  context
> > context
>  Ada size  constant 128>
> pointer_to_this  chain  0x7fc249fb0390 c41325a__p__p__obj_ara_5___PAD>> --> readonly TI
> c41325a.adb:70:6
> size  bitsizetype> constant 128> unit-size   constant 16> align:64
> warn_if_not_align:0 context  chain
> >
> 
> I can see that base is TREE_READONLY.
> 
> Am I right that this is a bug happening at some point earlier in the Ada
> compiler?

Yes, in the gimplifier apparently, so try with:

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index e790f08b23f..52cef6f8bff 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -1822,6 +1822,7 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p)
  if (!TREE_STATIC (decl))
{
  DECL_INITIAL (decl) = NULL_TREE;
+ TREE_READONLY (decl) = 0;
  init = build2 (INIT_EXPR, void_type_node, decl, init);
  gimplify_and_add (init, seq_p);
  ggc_free (init);

-- 
Eric Botcazou

Re: [PATCH][nvptx] Handle memmodel for atomic ops

2021-05-17 Thread Tobias Burnus


Hi Tom,

Short version: it works now :-) (and I don't know why it didn't before).
Long version:

On 17.05.21 18:58, Tom de Vries wrote:

I have:
...
@ %r25 atom.global.exch.b32 %r22,[atomic_lock],1;
@ %r25 membar.sys;
...


I tried with the PowerPC cross build - running on
PowerPC + Quadro GV100 did work for both reduction-{5,6}.c
and I can confirm that this line does appear in all four
nvptx libgomp.a (x86->power, power->power and lib/ vs. lib/mgomp/)

On the x86-64 build, which did fail before and where I didn't spot that
line, it now also shows up (after 'rm -rf' of all files). I wonder why
it didn't work before; the script should have deleted the directory
before compiling the files.
In any case, it now works there as well with non-Volta 'Tesla K80'
and with 'TITAN V'.

Sorry for the red herring and thanks for the patch!

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf

[r12-837 Regression] FAIL: gcc.target/i386/pr100582.c scan-assembler-times pblendvb 1 on Linux/x86_64

2021-05-17 Thread sunil.k.pandey via Gcc-patches

On Linux/x86_64,

e0a5daf81f2c79a0275eccd7c1a25349990a7a4d is the first bad commit
commit e0a5daf81f2c79a0275eccd7c1a25349990a7a4d
Author: Richard Biener 
Date:   Mon May 17 13:56:14 2021 +0200

middle-end/100582 - fix array_at_struct_end_p for vector indexing

caused

FAIL: gcc.target/i386/pr100582.c scan-assembler-times pblendvb 1

with GCC configured with

../../gcc/configure \
  --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-837/usr \
  --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld \
  --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl \
  --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr100582.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr100582.c --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me
at skpgkp2 at gmail dot com)

Re: [PATCH 1/1 v2] PR100281 C++: Fix SImode pointer handling

2021-05-17 Thread Jason Merrill via Gcc-patches


On 5/17/21 4:48 AM, Richard Biener wrote:

On Thu, May 13, 2021 at 8:28 AM Andreas Krebbel via Gcc-patches
 wrote:


v1 -> v2: build_reference_type_for_mode and build_pointer_type_for_mode now
pick pointer mode if MODE argument is VOIDmode.

Bootstrapped and regression tested on x86_64 and s390x.

Ok for mainline and GCC 11?


The middle-end parts are fine with me.


The C++ parts are OK as well.


Richard.


Andreas


gcc/cp/ChangeLog:

 PR c++/100281
 * cvt.c (cp_convert_to_pointer): Use the size of the target
 pointer type.
 * tree.c (cp_build_reference_type): Call
 cp_build_reference_type_for_mode with VOIDmode.
 (cp_build_reference_type_for_mode): Rename from
 cp_build_reference_type.  Add MODE argument and invoke
 build_reference_type_for_mode.
 (strip_typedefs): Use build_pointer_type_for_mode and
 cp_build_reference_type_for_mode for pointers and references.

gcc/ChangeLog:

 PR c++/100281
 * tree.c (build_reference_type_for_mode)
 (build_pointer_type_for_mode): Pick pointer mode if MODE argument
 is VOIDmode.
 (build_reference_type, build_pointer_type): Invoke
 build_*_type_for_mode with VOIDmode.

gcc/testsuite/ChangeLog:

 PR c++/100281
 * g++.target/s390/pr100281-1.C: New test.
 * g++.target/s390/pr100281-2.C: New test.
---
  gcc/cp/cvt.c   |  2 +-
  gcc/cp/tree.c  | 25 ++-
  gcc/testsuite/g++.target/s390/pr100281-1.C | 10 
  gcc/testsuite/g++.target/s390/pr100281-2.C |  9 +++
  gcc/tree.c | 29 ++
  5 files changed, 57 insertions(+), 18 deletions(-)
  create mode 100644 gcc/testsuite/g++.target/s390/pr100281-1.C
  create mode 100644 gcc/testsuite/g++.target/s390/pr100281-2.C

diff --git a/gcc/cp/cvt.c b/gcc/cp/cvt.c
index f1687e804d1..7fa6e8df52b 100644
--- a/gcc/cp/cvt.c
+++ b/gcc/cp/cvt.c
@@ -232,7 +232,7 @@ cp_convert_to_pointer (tree type, tree expr, bool dofold,
  {
if (TYPE_PRECISION (intype) == POINTER_SIZE)
 return build1 (CONVERT_EXPR, type, expr);
-  expr = cp_convert (c_common_type_for_size (POINTER_SIZE, 0), expr,
+  expr = cp_convert (c_common_type_for_size (TYPE_PRECISION (type), 0), expr,
  complain);
/* Modes may be different but sizes should be the same.  There
  is supposed to be some integral type that is the same width
diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index 7f148b4b158..35faeff065a 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -1206,12 +1206,14 @@ vla_type_p (tree t)
return false;
  }

-/* Return a reference type node referring to TO_TYPE.  If RVAL is
+
+/* Return a reference type node of MODE referring to TO_TYPE.  If MODE
+   is VOIDmode the standard pointer mode will be picked.  If RVAL is
 true, return an rvalue reference type, otherwise return an lvalue
 reference type.  If a type node exists, reuse it, otherwise create
 a new one.  */
  tree
-cp_build_reference_type (tree to_type, bool rval)
+cp_build_reference_type_for_mode (tree to_type, machine_mode mode, bool rval)
  {
tree lvalue_ref, t;

@@ -1224,7 +1226,8 @@ cp_build_reference_type (tree to_type, bool rval)
to_type = TREE_TYPE (to_type);
  }

-  lvalue_ref = build_reference_type (to_type);
+  lvalue_ref = build_reference_type_for_mode (to_type, mode, false);
+
if (!rval)
  return lvalue_ref;

@@ -1250,7 +1253,7 @@ cp_build_reference_type (tree to_type, bool rval)
  SET_TYPE_STRUCTURAL_EQUALITY (t);
else if (TYPE_CANONICAL (to_type) != to_type)
  TYPE_CANONICAL (t)
-  = cp_build_reference_type (TYPE_CANONICAL (to_type), rval);
+  = cp_build_reference_type_for_mode (TYPE_CANONICAL (to_type), mode, rval);
else
  TYPE_CANONICAL (t) = t;

@@ -1260,6 +1263,16 @@ cp_build_reference_type (tree to_type, bool rval)

  }

+/* Return a reference type node referring to TO_TYPE.  If RVAL is
+   true, return an rvalue reference type, otherwise return an lvalue
+   reference type.  If a type node exists, reuse it, otherwise create
+   a new one.  */
+tree
+cp_build_reference_type (tree to_type, bool rval)
+{
+  return cp_build_reference_type_for_mode (to_type, VOIDmode, rval);
+}
+
  /* Returns EXPR cast to rvalue reference type, like std::move.  */

  tree
@@ -1561,11 +1574,11 @@ strip_typedefs (tree t, bool *remove_attributes, unsigned int flags)
  {
  case POINTER_TYPE:
type = strip_typedefs (TREE_TYPE (t), remove_attributes, flags);
-  result = build_pointer_type (type);
+  result = build_pointer_type_for_mode (type, TYPE_MODE (t), false);
break;
  case REFERENCE_TYPE:
type = strip_typedefs (TREE_TYPE (t), remove_attributes, flags);
-  result = cp_build_reference_type (type, TYPE_REF_IS_RVALUE (t));
+  result =

Re: [PATCH] c++: Fix diagnostic for binding lvalue reference to volatile rvalue [PR 100635]

2021-05-17 Thread Jason Merrill via Gcc-patches


On 5/17/21 7:15 AM, Jonathan Wakely wrote:

The current diagnostic assumes the reference binding fails because the
reference is non-const, but it can also fail if the rvalue is volatile.

Use the current diagnostic for non-const cases, and a modified
diagnostic otherwise.
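
The two ways the binding can fail can be checked with a standard type trait. This is only a sketch; the variable names are hypothetical and `int` is used in place of the type from the PR.

```cpp
#include <cassert>
#include <type_traits>

// An rvalue binds to a const lvalue reference.
constexpr bool rvalue_to_const_ref =
  std::is_convertible<int, const int&>::value;

// Failure 1: the reference is not const.
constexpr bool rvalue_to_nonconst_ref =
  std::is_convertible<int, int&>::value;

// Failure 2: the reference is const, but the rvalue is volatile, and
// the binding would drop the volatile qualifier.
constexpr bool volatile_rvalue_to_const_ref =
  std::is_convertible<volatile int, const int&>::value;
```

Only the second case is accurately described by a message about a non-const reference; the third needs different wording, which is what the patch adds.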

gcc/cp/ChangeLog:

PR c++/100635
* call.c (convert_like_internal): Print different diagnostic if
the lvalue reference is const.

gcc/testsuite/ChangeLog:

* g++.dg/conversion/pr100635.C: New test.

Tested powerpc64le-linux.

OK for trunk?


OK, thanks.

Jason

Re: [PATCH RFA] tree-iterator: C++11 range-for and tree_stmt_iterator

2021-05-17 Thread Jason Merrill via Gcc-patches


On 5/17/21 3:56 AM, Richard Biener wrote:

On Fri, May 14, 2021 at 2:23 AM Martin Sebor via Gcc-patches
 wrote:


On 5/13/21 1:26 PM, Jason Merrill via Gcc-patches wrote:

Ping.

On 5/1/21 12:29 PM, Jason Merrill wrote:

Like my recent patch to add ovl_range and lkp_range in the C++ front end,
this patch adds the tsi_range adaptor for using C++11 range-based 'for' with
a STATEMENT_LIST, e.g.

for (tree stmt : tsi_range (stmt_list)) { ... }

This also involves adding some operators to tree_stmt_iterator that are
needed for range-for iterators, and should also be useful in code that uses
the iterators directly.

The patch updates the suitable loops in the C++ front end, but does not
touch any loops elsewhere in the compiler.
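
The shape of such an adaptor can be sketched on a toy doubly linked list. Everything here (`Node`, `NodeIterator`, `NodeRange`, `sum_statements`) is a hypothetical stand-in, not GCC's tree_stmt_iterator or tsi_range; it only shows which operators range-based 'for' needs.

```cpp
#include <cassert>

// Toy list node standing in for a statement list entry.
struct Node
{
  int stmt;
  Node* prev;
  Node* next;
};

// Iterator with the operators that range-for relies on:
// operator!=, prefix operator++ and operator*.
struct NodeIterator
{
  Node* ptr;

  bool operator==(NodeIterator b) const { return b.ptr == ptr; }
  bool operator!=(NodeIterator b) const { return !(*this == b); }
  NodeIterator& operator++() { ptr = ptr->next; return *this; }
  NodeIterator& operator--() { ptr = ptr->prev; return *this; }
  int& operator*() const { return ptr->stmt; }
};

// Range adaptor: begin() is the head, end() is the null sentinel.
class NodeRange
{
  Node* head;
public:
  explicit NodeRange(Node* h) : head(h) {}
  NodeIterator begin() const { return NodeIterator{head}; }
  NodeIterator end() const { return NodeIterator{nullptr}; }
};

// Simple iteration over all elements, the common case the patch targets.
int sum_statements(Node* head)
{
  int total = 0;
  for (int stmt : NodeRange(head))
    total += stmt;
  return total;
}
```

Loops that need to manipulate the iterator itself (insertion, reverse walks, early repositioning) would still use the iterator interface directly.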


I like the modernization of the loops.


The only worry I have (and why I stopped looking at range-for) is that
this adds another style of looping over stmts without opening the
possibility to remove another or even unify all of them.  That's because
range-for isn't powerful enough w/o jumping through hoops and/or
we cannot use what apparently ranges<> was intended for (fix
this limitation).


The range-for enabled by my patch simplifies the common case of simple 
iteration over elements; that seems worth doing to me even if it doesn't 
replace all loops.  Just as FOR_EACH_VEC_ELT isn't suitable for all 
loops over a vector.



That said, if some C++ literate could see if for example
what gimple-iterator.h provides can be completely modernized
then that would be great of course.

There's stuff like reverse iteration


This is typically done with the reverse_iterator<> adaptor, which we 
could get from <iterator> or duplicate.  I didn't see enough reverse 
iterations to make it seem worth the bother.



iteration skipping debug stmts,


There you can move the condition into the loop:

if (gimple_code (stmt) == GIMPLE_DEBUG)
  continue;


compares of iterators like gsi_one_before_end_p, etc.


Certainly anything where you want to mess with the iterators directly 
doesn't really translate to range-for.



Given my failed tries (but I'm a C++ illiterate) my TODO list now
only contains turning the iterators into STL style ones, thus
gsi_stmt (it) -> *it, gsi_next () -> ++it, etc. - but even
it != end_p looks a bit awkward there.


Well, it < end_val is pretty typical for loops involving integer 
iterators.  But you don't have to use that style if you'd rather not. 
You could continue to use gsi_end_p, or just *it, since we know that *it 
is NULL at the end of the sequence.



I can't find anything terribly wrong with the iterator but let me
at least pick on some nits ;)



gcc/ChangeLog:

 * tree-iterator.h (struct tree_stmt_iterator): Add operator++,
 operator--, operator*, operator==, and operator!=.
 (class tsi_range): New.

gcc/cp/ChangeLog:

 * constexpr.c (build_data_member_initialization): Use tsi_range.
 (build_constexpr_constructor_member_initializers): Likewise.
 (constexpr_fn_retval, cxx_eval_statement_list): Likewise.
 (potential_constant_expression_1): Likewise.
 * coroutines.cc (await_statement_expander): Likewise.
 (await_statement_walker): Likewise.
 * module.cc (trees_out::core_vals): Likewise.
 * pt.c (tsubst_expr): Likewise.
 * semantics.c (set_cleanup_locs): Likewise.
---
   gcc/tree-iterator.h  | 28 +++-
   gcc/cp/constexpr.c   | 42 ++
   gcc/cp/coroutines.cc | 10 --
   gcc/cp/module.cc |  5 ++---
   gcc/cp/pt.c  |  5 ++---
   gcc/cp/semantics.c   |  5 ++---
   6 files changed, 47 insertions(+), 48 deletions(-)

diff --git a/gcc/tree-iterator.h b/gcc/tree-iterator.h
index 076fff8644c..f57456bb473 100644
--- a/gcc/tree-iterator.h
+++ b/gcc/tree-iterator.h
@@ -1,4 +1,4 @@
-/* Iterator routines for manipulating GENERIC tree statement list.
+/* Iterator routines for manipulating GENERIC tree statement list. -*- C++ -*-
  Copyright (C) 2003-2021 Free Software Foundation, Inc.
  Contributed by Andrew MacLeod  
@@ -32,6 +32,13 @@ along with GCC; see the file COPYING3.  If not see
   struct tree_stmt_iterator {
 struct tree_statement_list_node *ptr;
 tree container;


I assume the absence of ctors is intentional.  If so, I suggest
to add a comment explaining why.  Otherwise, I would provide one
(or as many as needed).


+
+  bool operator== (tree_stmt_iterator b) const
+{ return b.ptr == ptr && b.container == container; }
+  bool operator!= (tree_stmt_iterator b) const { return !(*this == b); }
+  tree_stmt_iterator &operator++ () { ptr = ptr->next; return *this; }
+  tree_stmt_iterator &operator-- () { ptr = ptr->prev; return *this; }


I would suggest to add postincrement and postdecrement.


+  tree &operator* () { return ptr->stmt; }


Given the pervasive lack of const-safety in GCC and the by-value
semantics of the iterator this probably isn't worth it but maybe
add a const overload.  operator-> would probably never be used.


   };
   static inline

Re: [PATCH] gcc-changelog: Remove non-strict mode.

2021-05-17 Thread Jonathan Wakely via Gcc-patches


On 11/05/21 10:56 +0200, Martin Liška wrote:

Hello.

I'm going to push a commit that removes non-strict mode. It's useless right now.

Martin

contrib/ChangeLog:

* gcc-changelog/git_check_commit.py: Remove --non-strict-mode.
* gcc-changelog/git_commit.py: Remove strict mode.
* gcc-changelog/git_email.py: Likewise.
* gcc-changelog/git_repository.py: Likewise.
* gcc-changelog/test_email.py: Likewise.
* gcc-changelog/test_patches.txt: Update patches so that they
don't contain ChangeLog file changes.



I think this is needed too:

--- a/contrib/gcc-changelog/git_email.py
+++ b/contrib/gcc-changelog/git_email.py
@@ -110,7 +110,7 @@ if __name__ == '__main__':
 print()
 print('Successfully parsed: %d/%d' % (success, len(allfiles)))
 else:
-email = GitEmail(sys.argv[1], False)
+email = GitEmail(sys.argv[1])
 if email.success:
 print('OK')
 email.print_output()


I currently get:

$ ../contrib/gcc-changelog/git_email.py /tmp/git-commit.lfiXB8
Traceback (most recent call last):
  File "/home/jwakely/src/gcc/gcc-11/libstdc++-v3/../contrib/gcc-changelog/git_email.py", line 113, in <module>
    email = GitEmail(sys.argv[1], False)
TypeError: __init__() takes 2 positional arguments but 3 were given

[PATCH v2] x86: Warn for excessive argument alignment in main

2021-05-17 Thread H.J. Lu via Gcc-patches

On Thu, May 13, 2021 at 9:15 AM Bernd Edlinger
 wrote:
>
> On 5/13/21 3:37 PM, H.J. Lu via Gcc-patches wrote:
> > Warn for excessive argument alignment in main instead of ICE.
> >
> > gcc/
> >
> >   PR c/100575
> >   * cfgexpand.c (expand_stack_alignment): Add a bool argument for
> >   expanding main.  Warn for excessive argument alignment in main.
> >   (pass_expand::execute): Pass true to expand_stack_alignment when
> >   expanding main.
> >
> > gcc/testsuite/
> >
> >   PR c/100575
> >   * c-c++-common/pr100575.c: New test.
> > ---
> >  gcc/cfgexpand.c   | 26 --
> >  gcc/testsuite/c-c++-common/pr100575.c | 11 +++
> >  2 files changed, 31 insertions(+), 6 deletions(-)
> >  create mode 100644 gcc/testsuite/c-c++-common/pr100575.c
> >
> > diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> > index e3814ee9d06..50ccb720e6c 100644
> > --- a/gcc/cfgexpand.c
> > +++ b/gcc/cfgexpand.c
> > @@ -6363,7 +6363,7 @@ discover_nonconstant_array_refs (void)
> > virtual_incoming_args_rtx with the virtual register.  */
> >
> >  static void
> > -expand_stack_alignment (void)
> > +expand_stack_alignment (bool expanding_main)
> >  {
> >rtx drap_rtx;
> >unsigned int preferred_stack_boundary;
> > @@ -6385,9 +6385,18 @@ expand_stack_alignment (void)
> >if (targetm.calls.update_stack_boundary)
> >  targetm.calls.update_stack_boundary ();
> >
> > -  /* The incoming stack frame has to be aligned at least at
> > - parm_stack_boundary.  */
> > -  gcc_assert (crtl->parm_stack_boundary <= INCOMING_STACK_BOUNDARY);
> > +  if (crtl->parm_stack_boundary > INCOMING_STACK_BOUNDARY)
> > +{
> > +  /* The incoming stack frame has to be aligned at least at
> > +  parm_stack_boundary.  NB: The incoming stack frame alignment
> > +  for main is fixed.  */
> > +  if (expanding_main)
> > + warning_at (DECL_SOURCE_LOCATION (current_function_decl),
> > + OPT_Wmain, "argument alignment of %q+D is too large",
> > + current_function_decl);
> > +  else
> > + gcc_unreachable ();
> > +}
>
> Could you do this instead in ix86_minimum_incoming_stack_boundary

Fixed.

>   /* The incoming stack frame has to be aligned at least at
>  parm_stack_boundary.  */
>   if (incoming_stack_boundary < crtl->parm_stack_boundary)
> incoming_stack_boundary = crtl->parm_stack_boundary;
>
>   /* Stack at entrance of main is aligned by runtime.  We use the
>  smallest incoming stack boundary. */
>   if (incoming_stack_boundary > MAIN_STACK_BOUNDARY
>   && DECL_NAME (current_function_decl)
>   && MAIN_NAME_P (DECL_NAME (current_function_decl))
>   && DECL_FILE_SCOPE_P (current_function_decl))
> incoming_stack_boundary = MAIN_STACK_BOUNDARY;
>
>
> maybe just repeat this after incoming_stack_boundary is set to
> MAIN_STACK_BOUNDARY:
>
>   /* The incoming stack frame has to be aligned at least at
>  parm_stack_boundary.  */
>   if (incoming_stack_boundary < crtl->parm_stack_boundary)
> incoming_stack_boundary = crtl->parm_stack_boundary;
>
> and print the warning here?
>

Here is the v2 patch to issue a warning in
ix86_minimum_incoming_stack_boundary.

OK for master?
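
The control flow of the v2 approach can be sketched in isolation. This is a simplified model with hypothetical names (`BoundaryResult`, `min_incoming_boundary`); the real function reads its inputs from compiler state and emits the diagnostic itself rather than returning a flag.

```cpp
#include <cassert>

// Result of the sketch: the chosen boundary (in bits) and whether the
// "argument alignment is too large" warning would have been issued.
struct BoundaryResult
{
  unsigned boundary;
  bool warned;
};

BoundaryResult min_incoming_boundary(unsigned incoming,
                                     unsigned parm_boundary,
                                     unsigned main_boundary,
                                     bool is_main)
{
  bool warned = false;

  // The incoming frame must be aligned at least to parm_boundary.
  if (incoming < parm_boundary)
    incoming = parm_boundary;

  // The stack at entry to main is aligned by the runtime, so use the
  // smallest boundary there.
  if (is_main && incoming > main_boundary)
    {
      incoming = main_boundary;
      // Parameters of main demand more than the runtime guarantees:
      // warn instead of asserting, then honour the parameter boundary.
      if (parm_boundary > incoming)
        {
          warned = true;
          incoming = parm_boundary;
        }
    }

  return {incoming, warned};
}
```

For example, with a 128-bit main boundary and 256-bit parameter alignment, main gets a 256-bit boundary plus a warning, while an ordinary function gets 256 bits silently.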

Thanks.

-- 
H.J.
From c28fb9d5c6f6d68ac137e461aada23f8e10352bb Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Thu, 13 May 2021 06:31:04 -0700
Subject: [PATCH v2] x86: Warn for excessive argument alignment in main

Warn for excessive argument alignment in main instead of ICE.

gcc/

	PR c/100575
	* config/i386/i386.c (ix86_minimum_incoming_stack_boundary):
	Warn for excessive argument alignment in main.

gcc/testsuite/

	PR c/100575
	* gcc.target/i386/pr100575.c: New test.
---
 gcc/config/i386/i386.c   | 12 +++-
 gcc/testsuite/gcc.target/i386/pr100575.c | 11 +++
 2 files changed, 22 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100575.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index befe69e5eeb..85283d23bd3 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -7272,7 +7272,17 @@ ix86_minimum_incoming_stack_boundary (bool sibcall)
   && DECL_NAME (current_function_decl)
   && MAIN_NAME_P (DECL_NAME (current_function_decl))
   && DECL_FILE_SCOPE_P (current_function_decl))
-incoming_stack_boundary = MAIN_STACK_BOUNDARY;
+{
+  incoming_stack_boundary = MAIN_STACK_BOUNDARY;
+  if (crtl->parm_stack_boundary > incoming_stack_boundary)
+	{
+	  warning_at (DECL_SOURCE_LOCATION (current_function_decl),
+		  OPT_Wmain,
+		  "argument alignment of %q+D is too large",
+		  current_function_decl);
+	  incoming_stack_boundary = crtl->parm_stack_boundary;
+	}
+}
 
   return incoming_stack_boundary;
 }
diff --git a/gcc/testsuite/gcc.target/i386/pr100575.c b/gcc/testsuite/gcc.target/i386/pr100575.c
new file mode 100644
index 000..e7366a8fe7f
--- /dev/null
+++

Re: [PATCH] tree-sra: Avoid refreshing into const base decls (PR 100453)

2021-05-17 Thread Martin Jambor

Hi Eric,

sorry for breaking Ada over the weekend, however...

On Fri, May 14 2021, Eric Botcazou wrote:
>> Looks like this caused:
>> 
>> === acats tests ===
>> FAIL:   c41325a

None of the non-ACATS tests fail for me without doing a bootstrap, but I
did manage to reproduce this one (by the way, there really should be
documentation on how to run ACATS tests manually, I should not need to
spend an hour and a half figuring it out).

The problem is that before (early) SRA, there is a TREE_READONLY decl
that is being written to and my patch eliminated those writes.
Specifically, I see:

   :
  var_ara_5D.5012.FD.4868[1]{lb: 1 sz: 4}[1]{lb: 1 sz: 1} = _877;
  _880 = report.ident_bool (1);

   :
  var_ara_5D.5012.FD.4868[1]{lb: 1 sz: 4}[2]{lb: 1 sz: 1} = _880;
  _883 = report.ident_bool (1);

   :
  var_ara_5D.5012.FD.4868[1]{lb: 1 sz: 4}[3]{lb: 1 sz: 1} = _883;
  _886 = report.ident_bool (1);

   :
  var_ara_5D.5012.FD.4868[1]{lb: 1 sz: 4}[4]{lb: 1 sz: 1} = _886;
  _889 = report.ident_bool (1);

[...and many more.]  Note that this is an -fdump-tree-all-uid dump, the
DECL that is being assigned to has DECL_UID 5012 and when I dump it:

DECL_UID of racc->base is: 5012
print_node (dump_file, "", racc->base, 0):

unit-size 
align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
0x7fc249faf690
fields 
decl_3 TI c41325a.adb:57:11
size 
unit-size 
align:8 warn_if_not_align:0 offset_align 128
offset 
bit-offset  context 
> context 

Ada size 
pointer_to_this  chain >
--> readonly TI c41325a.adb:70:6  
size  constant 128>
unit-size  constant 16>
align:64 warn_if_not_align:0 context  
chain >

I can see that base is TREE_READONLY.

Am I right that this is a bug happening at some point earlier in the Ada
compiler?

Would you please have a look at why this is so?  Otherwise I can modify
my patch to only consider TREE_READONLY meaningful for PARM_DECLs but
that does not seem right.

Thanks,

Martin

>> FAIL:   c45347d
>> FAIL:   c74402a
>> FAIL:   c95085m
>> FAIL:   cc3601a
>> 
>> === gnat tests ===
>> 
>> FAIL: gnat.dg/addr12.adb (test for excess errors)
>> UNRESOLVED: gnat.dg/addr12.adb compilation failed to produce executable
>> FAIL: gnat.dg/addr12_a.adb (test for excess errors)
>> FAIL: gnat.dg/bip_overlay.adb (test for excess errors)
>> FAIL: gnat.dg/global.adb (test for excess errors)
>> FAIL: gnat.dg/spark1.adb  (test for errors, line 8)
>> FAIL: gnat.dg/spark1.adb (test for excess errors)
>> FAIL: gnat.dg/sync2.adb (test for excess errors)
>> FAIL: gnat.dg/synchronized1.adb (test for excess errors)
>
> Yes, it did, as well as PR bootstrap/100597 probably.
>
> -- 
> Eric Botcazou

Re: [PATCH] PR libstdc++/89728 diagnose some misuses of [locale.convenience] functions

2021-05-17 Thread Jonathan Wakely via Gcc-patches


On 12/05/21 17:16 +0100, Jonathan Wakely wrote:

On 12/05/21 18:51 +0300, Antony Polukhin via Libstdc++ wrote:

On Wed, 12 May 2021 at 18:38, Antony Polukhin wrote:


On Wed, 12 May 2021 at 17:44, Jonathan Wakely wrote:


On 12/05/21 12:58 +0300, Antony Polukhin wrote:
>On Wed, 12 May 2021 at 12:18, Jonathan Wakely wrote:
><...>
>> Or just leave it undefined, as libc++ seems to do according to your
>> comment in PR 89728:
>>
>> error: implicit instantiation of undefined template 
'std::__1::ctype >'
>>
>> Was your aim to have a static_assert that gives a more descriptive
>> error? We could leave it undefined in C++98 and have the static assert
>> for C++11 and up.
>
>Leaving it undefined would be the best. It would allow SFINAE on ctype
>and a compile time error is informative enough.
>
>However, there may be users who instantiate ctype in a
>shared library without ctype template specializations in
>the main executable. Making the default ctype undefined would break
>their compilation:
>
>#include 
>// no ctype specialization
>c = std::tolower(ThierChar{42}, locale_from_shared_library()); // OK
>right now in libstdc++, fails on libc++

What I meant was leaving the partial specialization undefined, not the
primary template, i.e.

--- a/libstdc++-v3/include/bits/locale_facets.h
+++ b/libstdc++-v3/include/bits/locale_facets.h
@@ -1476,6 +1476,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  };
  #endif //_GLIBCXX_USE_WCHAR_T

+  template<typename _CharT, typename _Traits, typename _Alloc>
+class ctype<basic_string<_CharT, _Traits, _Alloc> >;
+
/// class ctype_byname [22.2.1.2].
template
  class ctype_byname : public ctype<_CharT>

This makes your test fail with errors like this:

In file included from /home/jwakely/gcc/12/include/c++/12.0.0/locale:40,
  from loc.C:1:
/home/jwakely/gcc/12/include/c++/12.0.0/bits/locale_facets.h: In instantiation of 'bool 
std::isspace(_CharT, const std::locale&) [with _CharT = 
std::__cxx11::basic_string]':
loc.C:16:15:   required from here
/home/jwakely/gcc/12/include/c++/12.0.0/bits/locale_facets.h:2600:47: error: invalid use of 
incomplete type 'const class std::ctype >'
  2600 | { return use_facet >(__loc).is(ctype_base::space, 
__c); }
   |  ~^~

But it shouldn't affect the uses of ctype.

What do you think?
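
A minimal sketch of the technique discussed above, not the libstdc++ code itself: `facet_sketch` and `is_complete` are hypothetical names introduced only for illustration. It shows that declaring a partial specialization without defining it leaves the primary template usable for every other type, while any use that needs the complete `basic_string` specialization is rejected at compile time.

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-in for a facet template such as std::ctype.
template<typename CharT>
struct facet_sketch
{
  constexpr bool is_space(CharT c) const { return c == CharT(' '); }
};

// Declared but never defined: code that needs the complete type fails to
// compile, while all other specializations keep using the primary template.
template<typename CharT, typename Traits, typename Alloc>
struct facet_sketch<std::basic_string<CharT, Traits, Alloc>>;

// Illustrative completeness probe: sizeof is ill-formed for incomplete types,
// so the partial specialization below is discarded for them.
template<typename T, typename = void>
struct is_complete : std::false_type { };

template<typename T>
struct is_complete<T, std::void_t<decltype(sizeof(T))>> : std::true_type { };

static_assert(is_complete<facet_sketch<char>>::value,
              "the primary template is still usable");
static_assert(!is_complete<facet_sketch<std::string>>::value,
              "the basic_string specialization stays incomplete");
```

Because the failure comes from an incomplete type rather than a `static_assert`, it also works in C++98 mode, which matches the motivation given earlier in the thread.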


Good idea. That way the compiler message points directly to the
misused function.

Patch is in attachment


Replaced {} with () in test to be C++98 compatible


Looks great, thanks.

I'll test and commit this tomorrow.


Not quite "tomorrow", but it's pushed to trunk now. Thanks again!

Re: [PATCH] libstdc++: Fix filesystem::path constraints for volatile [PR 100630]

2021-05-17 Thread Jonathan Wakely via Gcc-patches


On 17/05/21 15:25 +0100, Jonathan Wakely wrote:

On 17/05/21 15:02 +0100, Jonathan Wakely wrote:

The constraint check for filesystem::path construction uses
decltype(__is_path_src(declval<Source>())) which means it considers
conversion from an rvalue.  When Source is a volatile-qualified type
it cannot use is_path_src(const Unknown&) because a const lvalue
reference can only bind to a non-volatile rvalue.

Since the relevant path members all have a const Source& parameter,
the constraint should be defined in terms of declval<const Source&>(),
not declval<Source>(). This avoids the problem of volatile-qualified
rvalues, because we no longer use an rvalue at all.
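
The binding rule described above can be checked in isolation. This is only a sketch under stated assumptions: `probe` is a hypothetical stand-in for a catch-all overload taking `const Unknown&`, and the two traits are illustrative names, not the ones used in `fs_path.h`.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical catch-all overload, loosely modelled on a function taking
// (const Unknown&, int).  Only its declaration is needed.
template<typename Unknown>
std::false_type probe(const Unknown&, int);

// Constraint phrased in terms of an rvalue of type Source.
template<typename Source, typename = void>
struct callable_with_rvalue : std::false_type { };

template<typename Source>
struct callable_with_rvalue<Source,
    std::void_t<decltype(probe(std::declval<Source>(), 0))>>
: std::true_type { };

// Constraint phrased in terms of a const lvalue of type Source.
template<typename Source, typename = void>
struct callable_with_const_lvalue : std::false_type { };

template<typename Source>
struct callable_with_const_lvalue<Source,
    std::void_t<decltype(probe(std::declval<const Source&>(), 0))>>
: std::true_type { };

static_assert(callable_with_rvalue<bool>::value,
              "a plain rvalue binds to const bool&");
static_assert(!callable_with_rvalue<volatile bool>::value,
              "Unknown is deduced as volatile bool, and const volatile bool& "
              "cannot bind to an rvalue");
static_assert(callable_with_const_lvalue<volatile bool>::value,
              "a const volatile lvalue binds without trouble");
```

Testing with the const lvalue form gives the same answer for volatile and non-volatile sources, which is why it is the safer way to phrase the constraint.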

libstdc++-v3/ChangeLog:

PR libstdc++/100630
* include/bits/fs_path.h (__is_constructible_from): Test
construction from a const lvalue, not an rvalue.
* include/experimental/bits/fs_path.h (__is_constructible_from):
Likewise.
* testsuite/27_io/filesystem/path/construct/100630.cc: New test.
* testsuite/experimental/filesystem/path/construct/100630.cc:
New test.

Tested x86_64-linux, pushed to gcc-10 (this isn't needed for gcc-11 or
trunk, but I also plan to backport it to gcc-9).


Oh actually this is needed for experimental::filesystem::path on trunk
and gcc-11 (as I found when I added the new tests to trunk) so
I'll fix it there too.


Here's the patch for trunk and gcc-11.

commit 45aa7a447652e8541cc381d7ab128544f81ed857
Author: Jonathan Wakely 
Date:   Mon May 17 11:54:06 2021

libstdc++: Fix filesystem::path constraints for volatile [PR 100630]

The constraint check for filesystem::path construction uses
decltype(__is_path_src(declval<Source>())) which means it considers
conversion from an rvalue.  When Source is a volatile-qualified type
it cannot use is_path_src(const Unknown&) because a const lvalue
reference can only bind to a non-volatile rvalue.

Since the relevant path members all have a const Source& parameter,
the constraint should be defined in terms of declval<const Source&>(),
not declval<Source>(). This avoids the problem of volatile-qualified
rvalues, because we no longer use an rvalue at all.

libstdc++-v3/ChangeLog:

PR libstdc++/100630
* include/experimental/bits/fs_path.h (__is_constructible_from):
Test construction from a const lvalue, not an rvalue.
* testsuite/27_io/filesystem/path/construct/100630.cc: New test.
* testsuite/experimental/filesystem/path/construct/100630.cc:
New test.

diff --git a/libstdc++-v3/include/experimental/bits/fs_path.h b/libstdc++-v3/include/experimental/bits/fs_path.h
index 2df2bba3dcd..1ecf2f3a7bd 100644
--- a/libstdc++-v3/include/experimental/bits/fs_path.h
+++ b/libstdc++-v3/include/experimental/bits/fs_path.h
@@ -124,7 +124,7 @@ namespace __detail
 
   template
 struct __constructible_from<_Source, void>
-: decltype(__is_path_src(std::declval<_Source>(), 0))
+: decltype(__is_path_src(std::declval<const _Source&>(), 0))
 { };
 
   template
+
+void f(bool) { }
+void f(const std::filesystem::path&) { }
+
+void
+test_100630()
+{
+  volatile bool b = true;
+  f(b);
+}
diff --git a/libstdc++-v3/testsuite/experimental/filesystem/path/construct/100630.cc b/libstdc++-v3/testsuite/experimental/filesystem/path/construct/100630.cc
new file mode 100644
index 000..b2428ff74cf
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/filesystem/path/construct/100630.cc
@@ -0,0 +1,14 @@
+// { dg-do compile { target c++11 } }
+// { dg-require-filesystem-ts "" }
+
+#include <experimental/filesystem>
+
+void f(bool) { }
+void f(const std::experimental::filesystem::path&) { }
+
+void
+test_100630()
+{
+  volatile bool b = true;
+  f(b);
+}

[committed] libstdc++: Fix std::jthread assertion and re-enable skipped test

2021-05-17 Thread Jonathan Wakely via Gcc-patches

libstdc++-v3/ChangeLog:

* include/std/thread (jthread::_S_create): Fix static assert
message.
* testsuite/30_threads/jthread/95989.cc: Re-enable test.
* testsuite/30_threads/jthread/jthread.cc: Do not require
pthread effective target.
* testsuite/30_threads/jthread/2.cc: Moved to...
* testsuite/30_threads/jthread/version.cc: ...here.

Tested powerpc64le-linux. Committed to trunk.

Let's see if this test is actually fixed, or if it still causes
failures on some targets.


commit 60a156ae53e976dfe44689f7c89e607596e7cf67
Author: Jonathan Wakely 
Date:   Mon May 17 14:55:22 2021

libstdc++: Fix std::jthread assertion and re-enable skipped test

libstdc++-v3/ChangeLog:

* include/std/thread (jthread::_S_create): Fix static assert
message.
* testsuite/30_threads/jthread/95989.cc: Re-enable test.
* testsuite/30_threads/jthread/jthread.cc: Do not require
pthread effective target.
* testsuite/30_threads/jthread/2.cc: Moved to...
* testsuite/30_threads/jthread/version.cc: ...here.

diff --git a/libstdc++-v3/include/std/thread b/libstdc++-v3/include/std/thread
index 886994c1320..f51392ab42c 100644
--- a/libstdc++-v3/include/std/thread
+++ b/libstdc++-v3/include/std/thread
@@ -219,7 +219,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  {
 static_assert(is_invocable_v<decay_t<_Callable>,
 decay_t<_Args>...>,
- "std::thread arguments must be invocable after"
+ "std::jthread arguments must be invocable after"
  " conversion to rvalues");
return thread{std::forward<_Callable>(__f),
  std::forward<_Args>(__args)...};
diff --git a/libstdc++-v3/testsuite/30_threads/jthread/95989.cc 
b/libstdc++-v3/testsuite/30_threads/jthread/95989.cc
index 53f90827f2e..fb3f43bc722 100644
--- a/libstdc++-v3/testsuite/30_threads/jthread/95989.cc
+++ b/libstdc++-v3/testsuite/30_threads/jthread/95989.cc
@@ -20,7 +20,6 @@
 // { dg-require-gthreads {} }
 // { dg-additional-options "-pthread" { target pthread } }
 // { dg-additional-options "-static" { target static } }
-// { dg-skip-if "broken" { *-*-* } }
 
 #include 
 
diff --git a/libstdc++-v3/testsuite/30_threads/jthread/jthread.cc 
b/libstdc++-v3/testsuite/30_threads/jthread/jthread.cc
index 6adc4981175..799787088ac 100644
--- a/libstdc++-v3/testsuite/30_threads/jthread/jthread.cc
+++ b/libstdc++-v3/testsuite/30_threads/jthread/jthread.cc
@@ -16,9 +16,9 @@
 // .
 
 // { dg-options "-std=gnu++2a -pthread" }
-// { dg-add-options libatomic }
 // { dg-do run { target c++2a } }
-// { dg-require-effective-target pthread }
+// { dg-add-options libatomic }
+// { dg-additional-options "-pthread" { target pthread } }
 // { dg-require-gthreads "" }
 
 #include

Re: [PATCH][nvptx] Handle memmodel for atomic ops

2021-05-17 Thread Tom de Vries

On 5/17/21 6:47 PM, Tobias Burnus wrote:
> On 17.05.21 17:49, Tom de Vries wrote:
>> [ Tobias, can you test this on volta ? ]
> 
> Unfortunately, it does not seem to help. On a non-Volta system, it still
> works (run time 0.3s) but on a Volta system it fails after 1.5s (abort).
> 
> Looking (with an editor) at nvptx-none/lib/mgomp/libgomp.a, I still see
>  @ %r25 atom.global.exch.b32 %r22,[atomic_lock],1;
> with no prior membar in GOMP_atomic_start.

I have:
...
@ %r25 atom.global.exch.b32 %r22,[atomic_lock],1;
@ %r25 membar.sys;
...

> Likewise with
> nvptx-none/lib/libgomp.a which has
>  atom.global.exch.b32 %r22,[atomic_lock],1;
> I thought a barrier would show up there?
> 

and:
...
atom.global.exch.b32 %r22,[atomic_lock],1;
membar.sys;
...

So both look as expected.

>> The atomic ops in nvptx.md have memmodel arguments, which are currently
>> ignored.
>> Handle these, fixing test-case fails
>> libgomp.c-c++-common/reduction-{5,6}.c
>> on volta.
> 
> Is there a reason that PR target/96932 isn't listed in the
> ChangeLog?

Just that I'm going to mark it a duplicate when this is fixed.

> Or is it supposed that the barrier does not show up
> at GOMP_atomic_start (as it doesn't) and it should show up elsewhere
> and still help with those two testcases?
> 

Nope, it should show up.

> Sorry for not having better news. (Unless I messed up and it is an
> issue on my side - but it doesn't look like it.)

Well yes, it's possible that the patch somehow does not work, but then
you'll need to investigate why that is.

Thanks,
- Tom

Re: [PATCH] libstdc++: Fix iterator caching inside range adaptors [PR100479]

2021-05-17 Thread Jonathan Wakely via Gcc-patches


On 17/05/21 11:43 -0400, Patrick Palka via Libstdc++ wrote:

This fixes two issues with our iterator caching as described in detail
in the PR.  Since r12-336 added the __non_propagating_cache class
template as part of P2328, this patch just rewrites the _CachedPosition
partial specialization in terms of this class template.
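
For readers unfamiliar with the idea, here is a rough sketch of what a non-propagating cache does, assuming the copy and move semantics specified by P2328. `cache_sketch` is a hypothetical name, and the code is not the libstdc++ `__non_propagating_cache` implementation; it only illustrates why a copied or moved view starts with an empty cache instead of a stale iterator.

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Hypothetical sketch of a cache whose contents never travel with the object.
template<typename T>
class cache_sketch
{
  std::optional<T> slot_;

public:
  cache_sketch() = default;

  // Copying never copies the cached value: the new object starts empty.
  cache_sketch(const cache_sketch&) noexcept { }

  // Moving empties the source and leaves the new object empty as well.
  cache_sketch(cache_sketch&& other) noexcept { other.slot_.reset(); }

  cache_sketch& operator=(const cache_sketch& other) noexcept
  {
    if (this != &other)
      slot_.reset();
    return *this;
  }

  cache_sketch& operator=(cache_sketch&& other) noexcept
  {
    slot_.reset();
    other.slot_.reset();
    return *this;
  }

  bool has_value() const noexcept { return slot_.has_value(); }
  void set(T value) { slot_ = std::move(value); }
  const T& get() const { return *slot_; }
};

// Small self-check exercising the copy and move behaviour.
inline void check_cache_sketch()
{
  cache_sketch<int> original;
  original.set(42);

  cache_sketch<int> copy = original;
  assert(original.has_value() && original.get() == 42);
  assert(!copy.has_value());

  cache_sketch<int> moved = std::move(original);
  assert(!moved.has_value() && !original.has_value());
}
```

A cached iterator points into the range owned by one particular view object, so dropping it on copy or move is the conservative choice.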

Tested on x86_64-pc-linux-gnu, does this look OK for trunk? 


OK, thanks.


Shall we
also backport this?


I think so, but give it a couple of weeks (or more) on trunk first.

Re: [PATCH] libstdc++: Fix access issues in elements_view::_Sentinel [PR100631]

2021-05-17 Thread Jonathan Wakely via Gcc-patches


On 17/05/21 12:17 -0400, Patrick Palka via Libstdc++ wrote:

Tested on x86_64-pc-linux-gnu, does this look OK for 10/11/trunk?


OK, thanks.


libstdc++-v3/ChangeLog:

PR libstdc++/100631
* include/std/ranges (elements_view::_Iterator): Befriend
_Sentinel.
(elements_view::_Sentinel::_M_equal): Templatize.
(elements_view::_Sentinel::_M_distance_from): Split out from ...
(elements_view::_Sentinel::operator-): Here.
* testsuite/std/ranges/adaptors/elements.cc (test06, test07):
New tests.

Re: [PATCH] libstdc++: Fix condition for memoizing reverse_view::begin() [PR100621]

2021-05-17 Thread Jonathan Wakely via Gcc-patches


On 17/05/21 12:17 -0400, Patrick Palka via Libstdc++ wrote:

A range being a random access range is not a sufficient condition for
ranges::next(iter, sent) to have constant time complexity; the range
must also have a sized sentinel.  This adjusts the memoization condition
for reverse_view accordingly.

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?  Doesn't
seem to be worth backporting.


OK for trunk. I agree the backports probably aren't needed, but if
it causes anybody problems we can do it later.



libstdc++-v3/ChangeLog:

* include/std/ranges (reverse_view::_S_needs_cached_begin):
Set to false if the underlying non-common random-access range
doesn't have a sized sentinel.
---
libstdc++-v3/include/std/ranges | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index bf52074ca05..e93469ca3b4 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -3340,7 +3340,9 @@ namespace views::__adaptor
{
private:
  static constexpr bool _S_needs_cached_begin
-   = !common_range<_Vp> && !random_access_range<_Vp>;
+   = !common_range<_Vp> && !(random_access_range<_Vp>
+ && sized_sentinel_for<sentinel_t<_Vp>,
+   iterator_t<_Vp>>);

  [[no_unique_address]]
__detail::__maybe_present_t<_S_needs_cached_begin,
--
2.31.1.621.g97eea85a0a

Re: [PATCH][nvptx] Handle memmodel for atomic ops

2021-05-17 Thread Tobias Burnus


On 17.05.21 17:49, Tom de Vries wrote:

[ Tobias, can you test this on volta ? ]


Unfortunately, it does not seem to help. On a non-Volta system, it still
works (run time 0.3s) but on a Volta system it fails after 1.5s (abort).

Looking (with an editor) at nvptx-none/lib/mgomp/libgomp.a, I still see
  @ %r25 atom.global.exch.b32 %r22,[atomic_lock],1;
with no prior membar in GOMP_atomic_start. Likewise with
nvptx-none/lib/libgomp.a which has
  atom.global.exch.b32 %r22,[atomic_lock],1;
I thought a barrier would show up there?


The atomic ops in nvptx.md have memmodel arguments, which are currently
ignored.
Handle these, fixing test-case fails libgomp.c-c++-common/reduction-{5,6}.c
on volta.
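
To make the role of the memory model argument concrete, here is a small source-level sketch of an exchange-based lock, loosely modelled on the `atomic_lock` exchange visible in the PTX above. All names are hypothetical and this is not the libgomp source; the point is only that the `std::memory_order` argument is the information a back end receives as the memmodel operand and can use to decide whether a barrier is needed.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical lock word and the data it protects.
inline std::atomic<int> lock_word{0};
inline int guarded_counter = 0;

// Acquire: later accesses must not be reordered before the exchange.
inline void lock_acquire()
{
  while (lock_word.exchange(1, std::memory_order_acquire) != 0)
    { /* spin until the previous holder releases the lock */ }
}

// Release: earlier accesses must be visible before the lock is dropped.
inline void lock_release()
{
  lock_word.store(0, std::memory_order_release);
}

inline int guarded_increment()
{
  lock_acquire();
  assert(lock_word.load(std::memory_order_relaxed) == 1);
  int result = ++guarded_counter;
  lock_release();
  return result;
}
```

If the back end ignores the ordering argument, the exchange still happens atomically, but nothing orders the protected accesses relative to it, which is consistent with the failures only showing up on hardware with weaker default ordering.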


Is there a reason that PR target/96932 isn't listed in the
ChangeLog? Or is it supposed that the barrier does not show up
at GOMP_atomic_start (as it doesn't) and it should show up elsewhere
and still help with those two testcases?

Sorry for not having better news. (Unless I messed up and it is an
issue on my side - but it doesn't look like it.)

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf

Re: [PATCH] libstdc++: Fix up semiregular-box partial specialization [PR100475]

2021-05-17 Thread Jonathan Wakely via Gcc-patches


On 17/05/21 11:43 -0400, Patrick Palka via Libstdc++ wrote:

This makes the in-place constructor of our partial specialization of
__box for already-semiregular types to use direct-non-list-initialization
(in accordance with the specification of the primary template), and
additionally makes its data() member function use std::__addressof.
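
Both points can be illustrated with a small stand-alone sketch. `box_sketch` and `odd_address` are hypothetical names and this is not the libstdc++ `__box` code; it only shows why parentheses and `std::addressof` matter.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

template<typename T>
struct box_sketch
{
  T value;

  // Direct-non-list-initialization: parentheses, not braces.
  template<typename... Args>
  explicit box_sketch(std::in_place_t, Args&&... args)
  : value(std::forward<Args>(args)...)
  { }

  // std::addressof bypasses any user-provided operator&.
  T* data() noexcept { return std::addressof(value); }
};

// A type whose operator& lies about its address.
struct odd_address
{
  odd_address* operator&() noexcept { return nullptr; }
};

inline void check_box_sketch()
{
  // With parentheses this is vector<int>(3, 7): three elements equal to 7.
  // With braces it would be the two-element list {3, 7}.
  box_sketch<std::vector<int>> b(std::in_place, 3, 7);
  assert(b.value.size() == 3 && b.value[0] == 7);

  box_sketch<odd_address> o(std::in_place);
  assert(&o.value == nullptr);   // the overloaded operator& is misleading
  assert(o.data() != nullptr);   // data() still returns the real address
}
```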

Tested on x86_64-pc-linux-gnu, does this look OK for 10/11/trunk?


Yes for all, thanks.

Re: [PATCH] libstdc++: Fix wrong thread waking on notify [PR100334]

2021-05-17 Thread Jonathan Wakely via Gcc-patches


On 14/05/21 18:09 +0100, Jonathan Wakely wrote:

On 13/05/21 18:54 -0700, Thomas Rodgers wrote:

From: Thomas Rodgers 

Please ignore the previous patch. This one removes the need to carry any
extra state in the case of a 'laundered' atomic wait.

libstdc++/ChangeLog:
* include/bits/atomic_wait.h (__waiter::_M_do_wait_v): loop
until value change observed.
(__waiter_base::_M_laundered): New member function.
(__waiter_base::_M_notify): Check _M_laundered() to determine
whether to wake one or all.
(__detail::__atomic_compare): Return true if call to
__builtin_memcmp() == 0.
(__waiter_base::_S_do_spin_v): Adjust predicate.
* testsuite/29_atomics/atomic/wait_notify/100334.cc: New
test.
---
libstdc++-v3/include/bits/atomic_wait.h   | 28 --
.../29_atomics/atomic/wait_notify/100334.cc   | 94 +++
2 files changed, 114 insertions(+), 8 deletions(-)
create mode 100644 
libstdc++-v3/testsuite/29_atomics/atomic/wait_notify/100334.cc

diff --git a/libstdc++-v3/include/bits/atomic_wait.h 
b/libstdc++-v3/include/bits/atomic_wait.h
index 984ed70f16c..07bb744d822 100644
--- a/libstdc++-v3/include/bits/atomic_wait.h
+++ b/libstdc++-v3/include/bits/atomic_wait.h
@@ -181,11 +181,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
return false;
 }

+// return true if equal
   template<typename _Tp>
 bool __atomic_compare(const _Tp& __a, const _Tp& __b)
 {
// TODO make this do the correct padding bit ignoring comparison
-   return __builtin_memcmp(&__a, &__b, sizeof(_Tp)) != 0;
+   return __builtin_memcmp(&__a, &__b, sizeof(_Tp)) == 0;
 }

   struct __waiter_pool_base
@@ -300,14 +301,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  explicit __waiter_base(const _Up* __addr) noexcept
: _M_w(_S_for(__addr))
, _M_addr(_S_wait_addr(__addr, &_M_w._M_ver))
- {
- }
+ { }
+
+   bool
+   _M_laundered() const
+   { return _M_addr == &_M_w._M_ver; }

void
_M_notify(bool __all, bool __bare = false)
{
- if (_M_addr == &_M_w._M_ver)
-   __atomic_fetch_add(_M_addr, 1, __ATOMIC_ACQ_REL);
+ if (_M_laundered())
+   {
+ __atomic_fetch_add(_M_addr, 1, __ATOMIC_ACQ_REL);


Please mention this increment in the changelog.


Ugh, sorry, I seem to have forgotten how to read a diff.


OK for trunk and gcc-11 with that change, thanks.


OK to push, no changes needed.
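
As a rough illustration of two of the ChangeLog items above (the comparison returning true on equality, and the wait looping until a value change is observed), here is a self-contained sketch. The names `bytes_equal` and `wait_until_changed` are hypothetical, and the blocking primitive is replaced by a callback, so this is not the `atomic_wait.h` code.

```cpp
#include <cassert>
#include <cstring>

// Returns true if the two objects have identical object representations.
// Like the TODO in the patch notes, this does not ignore padding bits.
template<typename T>
bool bytes_equal(const T& a, const T& b)
{ return std::memcmp(&a, &b, sizeof(T)) == 0; }

// Keep blocking until the loaded value differs from `old`.  A wake-up alone
// is not treated as proof that the value changed.
template<typename T, typename Load, typename Block>
void wait_until_changed(const T& old, Load load, Block block)
{
  while (bytes_equal<T>(load(), old))
    block();
}

// Simulates two spurious wake-ups followed by a real change.
inline int count_wakeups_until_change()
{
  int value = 0;
  int wakeups = 0;
  wait_until_changed(0,
                     [&] { return value; },
                     [&] { if (++wakeups == 3) value = 1; });
  assert(value == 1);
  return wakeups;
}
```

With the inverted comparison from before the patch, a loop of this shape would either return immediately or never return, depending on where the predicate is used.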

Re: [PATCH] arm: Fix ICE with CMSE nonsecure call on Armv8.1-M [PR100333]

2021-05-17 Thread Richard Earnshaw via Gcc-patches





On 30/04/2021 09:30, Alex Coplan via Gcc-patches wrote:

Hi,

As the PR shows, we ICE shortly after expanding nonsecure calls for
Armv8.1-M.  For Armv8.1-M, we have TARGET_HAVE_FPCXT_CMSE. As it stands,
the expander (arm.md:nonsecure_call_internal) moves the callee's address
to a register (with copy_to_suggested_reg) only if
!TARGET_HAVE_FPCXT_CMSE.

However, looking at the pattern which the insn appears to be intended to
match (thumb2.md:*nonsecure_call_reg_thumb2_fpcxt), it requires the
callee's address to be in a register.

This patch therefore just forces the callee's address into a register in
the expander.

Testing:
  * Regtested an arm-eabi cross configured with
  --with-arch=armv8.1-m.main+mve.fp+fp.dp --with-float=hard. No regressions.
  * Bootstrap and regtest on arm-linux-gnueabihf in progress.

OK for trunk and backports as appropriate if bootstrap looks good?

Thanks,
Alex

gcc/ChangeLog:

PR target/100333
* config/arm/arm.md (nonsecure_call_internal): Always ensure
callee's address is in a register.

gcc/testsuite/ChangeLog:

PR target/100333
* gcc.target/arm/cmse/pr100333.c: New test.




-  "
   {
-if (!TARGET_HAVE_FPCXT_CMSE)
-  {
-   rtx tmp =
- copy_to_suggested_reg (XEXP (operands[0], 0),
-gen_rtx_REG (SImode, R4_REGNUM),
-SImode);
+rtx tmp = NULL_RTX;
+rtx addr = XEXP (operands[0], 0);

-   operands[0] = replace_equiv_address (operands[0], tmp);
-  }
-  }")
+if (TARGET_HAVE_FPCXT_CMSE && !REG_P (addr))
+  tmp = force_reg (SImode, addr);
+else if (!TARGET_HAVE_FPCXT_CMSE)
+  tmp = copy_to_suggested_reg (XEXP (operands[0], 0),
+  gen_rtx_REG (SImode, R4_REGNUM),
+  SImode);


I think it might be better to handle the !TARGET_HAVE_FPCXT_CMSE case 
via a pseudo as well, then we don't end up generating a potentially 
non-trivial insn that directly writes a fixed hard reg - it's better to 
let later passes clean that up if they can.


Also, you've extracted XEXP (operands[0], 0) into 'addr', but then 
continue to use the XEXP form in the existing path.  Please be 
consistent: use XEXP directly everywhere, or use 'addr' everywhere.


So you want something like

  addr = XEXP (operands[0], 0);
  if (!REG_P (addr))
addr = force_reg (SImode, addr);

  if (!T_H_F_C)
addr = copy...(addr, gen(r4), SImode);

  operands[0] = replace_equiv_addr (operands[0], addr);

R.


RE: [PATCH] arm: Fix ICEs with compare-and-swap and -march=armv8-m.base [PR99977]

2021-05-17 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Alex Coplan 
> Sent: 17 May 2021 17:29
> To: Kyrylo Tkachov 
> Cc: gcc-patches@gcc.gnu.org; ni...@redhat.com; Richard Earnshaw
> ; Ramana Radhakrishnan
> 
> Subject: Re: [PATCH] arm: Fix ICEs with compare-and-swap and -
> march=armv8-m.base [PR99977]
> 
> Hi Kyrill,
> 
> On 27/04/2021 13:47, Kyrylo Tkachov wrote:
> > Hi Alex,
> >
> > > -Original Message-
> > > From: Alex Coplan 
> > > Sent: 27 April 2021 14:14
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: ni...@redhat.com; Richard Earnshaw
> ;
> > > Ramana Radhakrishnan ; Kyrylo
> > > Tkachov 
> > > Subject: Re: [PATCH] arm: Fix ICEs with compare-and-swap and -
> > > march=armv8-m.base [PR99977]
> > >
> > > Ping
> > >
> > > On 15/04/2021 15:39, Alex Coplan via Gcc-patches wrote:
> > > > Hi all,
> > > >
> > > > The PR shows two ICEs with __sync_bool_compare_and_swap and
> > > > -mcpu=cortex-m23 (equivalently, -march=armv8-m.base): one in LRA
> and
> > > one
> > > > later on, after the CAS insn is split.
> > > >
> > > > The LRA ICE occurs because the
> > > > @atomic_compare_and_swap_1 pattern
> > > attempts to tie
> > > > two output operands together (operands 0 and 1 in the third
> > > > alternative). LRA can't handle this, since it doesn't make sense for an
> > > > insn to assign to the same operand twice.
> > > >
> > > > The later (post-splitting) ICE occurs because the expansion of the
> > > > cbranchsi4_scratch insn doesn't quite go according to plan. As it
> > > > stands, arm_split_compare_and_swap calls gen_cbranchsi4_scratch,
> > > > attempting to pass a register (neg_bval) to use as a scratch register.
> > > > However, since the RTL template has a match_scratch here,
> > > > gen_cbranchsi4_scratch ignores this argument and produces a scratch
> rtx.
> > > > Since this is all happening after RA, this is doomed to fail (and we get
> > > > an ICE about the insn not matching its constraints).
> > > >
> > > > It seems that the motivation for the choice of constraints in the
> > > > atomic_compare_and_swap pattern comes from an attempt to satisfy
> the
> > > > constraints of the cbranchsi4_scratch insn. This insn requires the
> > > > scratch register to be the same as the input register in the case that
> > > > we use a larger negative immediate (one that satisfies J, but not L).
> > > >
> > > > Of course, as noted above, LRA refuses to assign two output operands
> to
> > > > the same register, so this was never going to work.
> > > >
> > > > The solution I'm proposing here is to collapse the alternatives to the
> > > > CAS insn (allowing the two output register operands to be matched to
> > > > different registers) and to ensure that the constraints for
> > > > cbranchsi4_scratch are met in arm_split_compare_and_swap. We do
> this
> > > by
> > > > inserting a move to ensure the source and destination registers match if
> > > > necessary (i.e. in the case of large negative immediates).
> > > >
> > > > Another notable change here is that we only do:
> > > >
> > > >   emit_move_insn (neg_bval, const1_rtx);
> > > >
> > > > for non-negative immediates. This is because the ADDS instruction used
> in
> > > > the negative case suffices to leave a suitable value in neg_bval: if the
> > > > operands compare equal, we don't take the branch (so neg_bval will be
> > > > set by the load exclusive). Otherwise, the ADDS will leave a nonzero
> > > > value in neg_bval, which will correctly signal that the CAS has failed
> > > > when it is later negated.
> > > >
> > > > Testing:
> > > >  * Bootstrapped and regtested on arm-linux-gnueabihf, no regressions.
> > > >  * Regtested an arm-eabi cross configured with --with-arch=armv8-
> m.base,
> > > no
> > > >  regressions. The patch fixes the gcc.dg/ia64-sync-3.c test in this 
> > > > config.
> > > >
> > > > OK for trunk?
> >
> > Ok.
> 
> The patch applies cleanly on the 11 branch and passes bootstrap/regtest on
> arm-linux-gnueabihf as well as a regtest on arm-eabi configured with
> --with-arch=armv8-m.base.
> 
> OK for the 11 branch? OK for the other affected branches if the same is
> true there?

Yes,
Thanks,
Kyrill

> 
> Thanks,
> Alex
> 
> > Thanks,
> > Kyrill
> >
> > > >
> > > > Thanks,
> > > > Alex
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > PR target/99977
> > > > * config/arm/arm.c (arm_split_compare_and_swap): Fix up codegen
> > > > with negative immediates: ensure we expand cbranchsi4_scratch
> > > > correctly and ensure we satisfy its constraints.
> > > > * config/arm/sync.md
> > > > (@atomic_compare_and_swap_1):
> > > Don't
> > > > attempt to tie two output operands together with constraints;
> > > > collapse two alternatives.
> > > > (@atomic_compare_and_swap_1): Likewise.
> > > > * config/arm/thumb1.md (cbranchsi4_neg_late): New.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > PR target/99977
> > > > * gcc.target/arm/pr99977.c: New test.
> > >
> > > > diff --git

Re: [PATCH] arm: Fix ICEs with compare-and-swap and -march=armv8-m.base [PR99977]

2021-05-17 Thread Alex Coplan via Gcc-patches

Hi Kyrill,

On 27/04/2021 13:47, Kyrylo Tkachov wrote:
> Hi Alex,
> 
> > -Original Message-
> > From: Alex Coplan 
> > Sent: 27 April 2021 14:14
> > To: gcc-patches@gcc.gnu.org
> > Cc: ni...@redhat.com; Richard Earnshaw ;
> > Ramana Radhakrishnan ; Kyrylo
> > Tkachov 
> > Subject: Re: [PATCH] arm: Fix ICEs with compare-and-swap and -
> > march=armv8-m.base [PR99977]
> > 
> > Ping
> > 
> > On 15/04/2021 15:39, Alex Coplan via Gcc-patches wrote:
> > > Hi all,
> > >
> > > The PR shows two ICEs with __sync_bool_compare_and_swap and
> > > -mcpu=cortex-m23 (equivalently, -march=armv8-m.base): one in LRA and
> > one
> > > later on, after the CAS insn is split.
> > >
> > > The LRA ICE occurs because the
> > > @atomic_compare_and_swap_1 pattern
> > attempts to tie
> > > two output operands together (operands 0 and 1 in the third
> > > alternative). LRA can't handle this, since it doesn't make sense for an
> > > insn to assign to the same operand twice.
> > >
> > > The later (post-splitting) ICE occurs because the expansion of the
> > > cbranchsi4_scratch insn doesn't quite go according to plan. As it
> > > stands, arm_split_compare_and_swap calls gen_cbranchsi4_scratch,
> > > attempting to pass a register (neg_bval) to use as a scratch register.
> > > However, since the RTL template has a match_scratch here,
> > > gen_cbranchsi4_scratch ignores this argument and produces a scratch rtx.
> > > Since this is all happening after RA, this is doomed to fail (and we get
> > > an ICE about the insn not matching its constraints).
> > >
> > > It seems that the motivation for the choice of constraints in the
> > > atomic_compare_and_swap pattern comes from an attempt to satisfy the
> > > constraints of the cbranchsi4_scratch insn. This insn requires the
> > > scratch register to be the same as the input register in the case that
> > > we use a larger negative immediate (one that satisfies J, but not L).
> > >
> > > Of course, as noted above, LRA refuses to assign two output operands to
> > > the same register, so this was never going to work.
> > >
> > > The solution I'm proposing here is to collapse the alternatives to the
> > > CAS insn (allowing the two output register operands to be matched to
> > > different registers) and to ensure that the constraints for
> > > cbranchsi4_scratch are met in arm_split_compare_and_swap. We do this
> > by
> > > inserting a move to ensure the source and destination registers match if
> > > necessary (i.e. in the case of large negative immediates).
> > >
> > > Another notable change here is that we only do:
> > >
> > >   emit_move_insn (neg_bval, const1_rtx);
> > >
> > > for non-negative immediates. This is because the ADDS instruction used in
> > > the negative case suffices to leave a suitable value in neg_bval: if the
> > > operands compare equal, we don't take the branch (so neg_bval will be
> > > set by the load exclusive). Otherwise, the ADDS will leave a nonzero
> > > value in neg_bval, which will correctly signal that the CAS has failed
> > > when it is later negated.
> > >
> > > Testing:
> > >  * Bootstrapped and regtested on arm-linux-gnueabihf, no regressions.
> > >  * Regtested an arm-eabi cross configured with --with-arch=armv8-m.base,
> > no
> > >  regressions. The patch fixes the gcc.dg/ia64-sync-3.c test in this 
> > > config.
> > >
> > > OK for trunk?
> 
> Ok.

The patch applies cleanly on the 11 branch and passes bootstrap/regtest on
arm-linux-gnueabihf as well as a regtest on arm-eabi configured with
--with-arch=armv8-m.base.

OK for the 11 branch? OK for the other affected branches if the same is
true there?

Thanks,
Alex

> Thanks,
> Kyrill
> 
> > >
> > > Thanks,
> > > Alex
> > >
> > > gcc/ChangeLog:
> > >
> > >   PR target/99977
> > >   * config/arm/arm.c (arm_split_compare_and_swap): Fix up codegen
> > >   with negative immediates: ensure we expand cbranchsi4_scratch
> > >   correctly and ensure we satisfy its constraints.
> > >   * config/arm/sync.md
> > >   (@atomic_compare_and_swap_1):
> > Don't
> > >   attempt to tie two output operands together with constraints;
> > >   collapse two alternatives.
> > >   (@atomic_compare_and_swap_1): Likewise.
> > >   * config/arm/thumb1.md (cbranchsi4_neg_late): New.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   PR target/99977
> > >   * gcc.target/arm/pr99977.c: New test.
> > 
> > > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> > > index 475fb0d827f..8d19b8a73fd 100644
> > > --- a/gcc/config/arm/arm.c
> > > +++ b/gcc/config/arm/arm.c
> > > @@ -30737,13 +30737,31 @@ arm_split_compare_and_swap (rtx
> > operands[])
> > >  }
> > >else
> > >  {
> > > -  emit_move_insn (neg_bval, const1_rtx);
> > >cond = gen_rtx_NE (VOIDmode, rval, oldval);
> > >if (thumb1_cmpneg_operand (oldval, SImode))
> > > - emit_unlikely_jump (gen_cbranchsi4_scratch (neg_bval, rval, oldval,
> > > - label2, cond));
> > > + {
> > > +   rtx src = rval;
> > > +

Re: RFA: Add option -fretry-compilation

2021-05-17 Thread Richard Biener via Gcc-patches

On Mon, May 17, 2021 at 5:33 PM Joern Rennecke
 wrote:
>
> On Mon, 17 May 2021 at 11:59, Richard Biener  
> wrote:
>
> > The plan for reload is to axe it similar to CC0 support.  Sooner than 
> > later, but
> > given it's still used exclusively by a lot of targets, that means it might
> > take some time.
>
> > So for you it's always just -fretry-compilation -m[no-]lra?  Given 
> > -m[no-]lra
> > is a thing cycling between the two directly in RA lra/reload should be 
> > possible?
>
> Even if that were possible, it wouldn't solve the problem.  When I try 
> compiling
> newlib without -fretry-compilation, it's falling over first for
> libc/time/strftime.c .
> With lra, lra finishes, but it ignores an earlyclobber constraint, so
> reload_cse_simplify_operands ICEs.  With reload, you get a spill failure.
> I've tried various options, but only -O0 seems to work.  Compiling strftime 
> with
> -O0 is not really an issue because the target is too deeply embedded to hope
> to link something that uses strftime.  But identifying all the files
> that can't be
> compiled with optimization and treating them differently is a problem if it 
> has
> to be done by hand.
>
> > Or are reload/LRA too greedy in that they ICE when having transformed half
> > of the code already?
>
> Both of them do a lot of transformations before they ICE.  Or they don't even
> ICE themselves, but leave behind invalid rtl that a later pass catches.
>
> Even if we fixed both passes so that they could roll back everything
> (which I think would be a lot harder for lra; reload can already roll
> back a lot),
> what's the point if you axe reload soon after?
>
> > I see.  It's of course difficult for the FSF tree to cater for
> > extremes that are not
> > represented in its tree.  I wonder what prevents you from contributing the 
> > port?
>
> I can neither confirm nor deny that I can't contribute the port.
>
> > Still if that solves a lot of the issues this seems like the way to go.
>
> It has merit in its own right, but it can't fix all the ICEs, and thus
> doesn't make building libraries manageable.

But then it's a sub-par quality port (whoever is to blame here), and working
around this "officially" doesn't sound like a good thing.  So I suppose
this plumbing has to stay private to your port.

Richard.

[PATCH] libstdc++: Fix condition for memoizing reverse_view::begin() [PR100621]

2021-05-17 Thread Patrick Palka via Gcc-patches

A range being a random access range is not a sufficient condition for
ranges::next(iter, sent) to have constant time complexity; the range
must also have a sized sentinel.  This adjusts the memoization condition
for reverse_view accordingly.
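
As a rough illustration of the complexity argument (not part of the patch), the sketch below walks a random-access iterator to a sentinel that supports only equality comparison.  The names NulSentinel and next_to_end are hypothetical; the point is that without a "sentinel - iterator" operation the end can only be found by stepping, so the cost is linear even though const char* is a random-access iterator.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sentinel marking the end of a NUL-terminated string.
// It can be compared against an iterator but not subtracted from one,
// so it is not a "sized" sentinel.
struct NulSentinel {};

inline bool operator==(const char* it, NulSentinel) { return *it == '\0'; }
inline bool operator!=(const char* it, NulSentinel s) { return !(it == s); }

// Advance IT until it reaches the sentinel and count the increments.
// There is no O(1) shortcut here: the distance to the end is unknown
// until the sentinel has actually been reached.
inline const char*
next_to_end(const char* it, NulSentinel s, std::size_t& steps)
{
  assert(it != nullptr);
  steps = 0;
  while (it != s)
    {
      ++it;
      ++steps;
    }
  return it;
}
```

Calling next_to_end on a five-character string takes five increments, which is why caching the result is still worthwhile for such ranges.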

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?  Doesn't
seem to be worth backporting.

libstdc++-v3/ChangeLog:

* include/std/ranges (reverse_view::_S_needs_cached_begin):
Set to false if the underlying non-common random-access range
doesn't have a sized sentinel.
---
 libstdc++-v3/include/std/ranges | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index bf52074ca05..e93469ca3b4 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -3340,7 +3340,9 @@ namespace views::__adaptor
 {
 private:
   static constexpr bool _S_needs_cached_begin
-   = !common_range<_Vp> && !random_access_range<_Vp>;
+   = !common_range<_Vp> && !(random_access_range<_Vp>
+ && sized_sentinel_for<sentinel_t<_Vp>,
+   iterator_t<_Vp>>);
 
   [[no_unique_address]]
__detail::__maybe_present_t<_S_needs_cached_begin,
-- 
2.31.1.621.g97eea85a0a

[PATCH] libstdc++: Fix access issues in elements_view::_Sentinel [PR100631]

2021-05-17 Thread Patrick Palka via Gcc-patches

Tested on x86_64-pc-linux-gnu, does this look OK for 10/11/trunk?

libstdc++-v3/ChangeLog:

PR libstdc++/100631
* include/std/ranges (elements_view::_Iterator): Befriend
_Sentinel.
(elements_view::_Sentinel::_M_equal): Templatize.
(elements_view::_Sentinel::_M_distance_from): Split out from ...
(elements_view::_Sentinel::operator-): Here.
* testsuite/std/ranges/adaptors/elements.cc (test06, test07):
New tests.
---
 libstdc++-v3/include/std/ranges   | 15 ++---
 .../testsuite/std/ranges/adaptors/elements.cc | 31 +++
 2 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index fe6379fb858..bf52074ca05 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -3749,15 +3749,22 @@ namespace views::__adaptor
  { return __x._M_current - __y._M_current; }
 
  friend _Sentinel<_Const>;
+ friend _Sentinel;
};
 
   template
struct _Sentinel
{
private:
- constexpr bool
- _M_equal(const _Iterator<_Const>& __x) const
- { return __x._M_current == _M_end; }
+ template
+   constexpr bool
+   _M_equal(const _Iterator<_Const2>& __x) const
+   { return __x._M_current == _M_end; }
+
+ template
+   constexpr auto
+   _M_distance_from(const _Iterator<_Const2>& __i) const
+   { return _M_end - __i._M_current; }
 
  using _Base = elements_view::_Base<_Const>;
  sentinel_t<_Base> _M_end = sentinel_t<_Base>();
@@ -3800,7 +3807,7 @@ namespace views::__adaptor
requires sized_sentinel_for<sentinel_t<_Base>, iterator_t<_Base2>>
friend constexpr range_difference_t<_Base>
operator-(const _Sentinel& __x, const _Iterator<_Const2>& __y)
-   { return __x._M_end - __y._M_current; }
+   { return __x._M_distance_from(__y); }
 
  friend _Sentinel;
};
diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/elements.cc 
b/libstdc++-v3/testsuite/std/ranges/adaptors/elements.cc
index 134afd6a873..27aba2c0ff0 100644
--- a/libstdc++-v3/testsuite/std/ranges/adaptors/elements.cc
+++ b/libstdc++-v3/testsuite/std/ranges/adaptors/elements.cc
@@ -115,6 +115,35 @@ test05()
   VERIFY( r2[0] == 1 && r2[1] == 3 );
 }
 
+void
+test06()
+{
+  // PR libstdc++/100631
+  auto r = std::views::iota(0)
+| std::views::filter([](int){ return true; })
+| std::views::take(42)
+| std::views::reverse
+| std::views::transform([](int) { return std::make_pair(42, "hello"); })
+| std::views::take(42)
+| std::views::keys;
+  auto b = r.begin();
+  auto e = r.end();
+  e - b;
+}
+
+void
+test07()
+{
+  // PR libstdc++/100631 comment #2
+  auto r = std::views::iota(0)
+| std::views::transform([](int) { return std::make_pair(42, "hello"); })
+| std::views::keys;
+  auto b = std::ranges::cbegin(r);
+  auto e = std::end(r);
+  b.base() == e.base();
+  b == e;
+}
+
 int
 main()
 {
@@ -123,4 +152,6 @@ main()
   test03();
   test04();
   test05();
+  test06();
+  test07();
 }
-- 
2.31.1.621.g97eea85a0a

Re: [PATCH] openmp: Notify team barrier of pending tasks in omp_fulfill_event

2021-05-17 Thread Jakub Jelinek via Gcc-patches

On Mon, May 17, 2021 at 04:48:03PM +0100, Kwok Cheung Yeung wrote:
> 2021-05-17  Kwok Cheung Yeung  
> 
>   libgomp/
>   * task.c (omp_fulfill_event): Call gomp_team_barrier_set_task_pending
>   if new tasks generated.
>   * testsuite/libgomp.c-c++-common/task-detach-13.c: New.
> ---
>  libgomp/task.c|  1 +
>  .../libgomp.c-c++-common/task-detach-13.c | 60 +++
>  2 files changed, 61 insertions(+)
>  create mode 100644 libgomp/testsuite/libgomp.c-c++-common/task-detach-13.c
> 
> diff --git a/libgomp/task.c b/libgomp/task.c
> index 1c73c759a8d..feb4796a3ac 100644
> --- a/libgomp/task.c
> +++ b/libgomp/task.c
> @@ -2460,6 +2460,7 @@ omp_fulfill_event (omp_event_handle_t event)
>if (new_tasks > 0)
>  {
>/* Wake up threads to run new tasks.  */
> +  gomp_team_barrier_set_task_pending (&team->barrier);
>do_wake = team->nthreads - team->task_running_count;
>if (do_wake > new_tasks)
>   do_wake = new_tasks;
> diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-13.c 
> b/libgomp/testsuite/libgomp.c-c++-common/task-detach-13.c
> new file mode 100644
> index 000..4306524526d
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-13.c
> @@ -0,0 +1,60 @@
> +/* { dg-do run } */
> +/* { dg-options "-fopenmp" } */

-fopenmp as dg-options is implicit, please remove it.

> +/* { dg-timeout 10 } */

This will fail on targets that don't have pthreads.
We have already some tests that do use pthread_create,
and those currently use
/* { dg-do run { target *-*-linux* *-*-gnu* *-*-freebsd* } } */
so I'd do the same for this test.
There is also effective target pthread but am not sure if it covers
everything we need to test.

> +
> +
> +  pthread_join (thr, 0);

I'd add return 0;
While we default to C17 which doesn't need it, we don't say anywhere
in the testcase that it is C99+ or C++ only, so I think better make it valid
C89 too.

Otherwise LGTM, thanks.

Jakub

[PATCH][nvptx] Handle memmodel for atomic ops

2021-05-17 Thread Tom de Vries

Hi,

[ Tobias, can you test this on volta ? ]

The atomic ops in nvptx.md have memmodel arguments, which are currently
ignored.

Handle these, fixing test-case fails libgomp.c-c++-common/reduction-{5,6}.c
on volta.
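
For readers skimming the diff, the pre/post barrier decision can be sketched in standalone C++ using std::memory_order in place of GCC's internal MEMMODEL_* values.  This is only an illustration of the mapping in the patch below; barrier_needed is a hypothetical name and the MEMMODEL_SYNC_* variants are not modelled.

```cpp
#include <atomic>
#include <cassert>

// Decide whether a barrier is wanted before (pre == true) or after
// (pre == false) an atomic operation with memory order MO.
inline bool
barrier_needed(std::memory_order mo, bool pre)
{
  switch (mo)
    {
    case std::memory_order_relaxed:
      return false;             // no ordering requested
    case std::memory_order_consume:
    case std::memory_order_acquire:
      return !pre;              // barrier after the operation only
    case std::memory_order_release:
      return pre;               // barrier before the operation only
    case std::memory_order_acq_rel:
    case std::memory_order_seq_cst:
      return true;              // barrier on both sides
    }
  assert(false && "unknown memory order");
  return false;
}
```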

Tested libgomp on x86_64-linux with nvptx accelerator.

Any comments?

Thanks,
- Tom

[nvptx] Handle memmodel for atomic ops

gcc/ChangeLog:

2021-05-17  Tom de Vries  

PR target/100497
* config/nvptx/nvptx-protos.h (nvptx_output_atomic_insn): Declare
* config/nvptx/nvptx.c (nvptx_output_barrier)
(nvptx_output_atomic_insn): New function.
(nvptx_print_operand): Add support for 'B'.
* config/nvptx/nvptx.md: Use nvptx_output_atomic_insn for atomic
insns.

---
 gcc/config/nvptx/nvptx-protos.h |  1 +
 gcc/config/nvptx/nvptx.c| 77 +
 gcc/config/nvptx/nvptx.md   | 31 ++---
 3 files changed, 104 insertions(+), 5 deletions(-)

diff --git a/gcc/config/nvptx/nvptx-protos.h b/gcc/config/nvptx/nvptx-protos.h
index 15122096487..b7e6ae26522 100644
--- a/gcc/config/nvptx/nvptx-protos.h
+++ b/gcc/config/nvptx/nvptx-protos.h
@@ -57,5 +57,6 @@ extern const char *nvptx_output_set_softstack (unsigned);
 extern const char *nvptx_output_simt_enter (rtx, rtx, rtx);
 extern const char *nvptx_output_simt_exit (rtx);
 extern const char *nvptx_output_red_partition (rtx, rtx);
+extern const char *nvptx_output_atomic_insn (const char *, rtx *, int, int);
 #endif
 #endif
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index ebbfa921589..722b0faa330 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -2444,6 +2444,53 @@ nvptx_output_mov_insn (rtx dst, rtx src)
   return "%.\tcvt%t0%t1\t%0, %1;";
 }
 
+/* Output a pre/post barrier for MEM_OPERAND according to MEMMODEL.  */
+
+static void
+nvptx_output_barrier (rtx *mem_operand, int memmodel, bool pre_p)
+{
+  bool post_p = !pre_p;
+
+  switch (memmodel)
+{
+case MEMMODEL_RELAXED:
+  return;
+case MEMMODEL_CONSUME:
+case MEMMODEL_ACQUIRE:
+case MEMMODEL_SYNC_ACQUIRE:
+  if (post_p)
+   break;
+  return;
+case MEMMODEL_RELEASE:
+case MEMMODEL_SYNC_RELEASE:
+  if (pre_p)
+   break;
+  return;
+case MEMMODEL_ACQ_REL:
+case MEMMODEL_SEQ_CST:
+case MEMMODEL_SYNC_SEQ_CST:
+  if (pre_p || post_p)
+   break;
+  return;
+default:
+  gcc_unreachable ();
+}
+
+  output_asm_insn ("%.\tmembar%B0;", mem_operand);
+}
+
+const char *
+nvptx_output_atomic_insn (const char *asm_template, rtx *operands, int mem_pos,
+ int memmodel_pos)
+{
+  nvptx_output_barrier (&operands[mem_pos], INTVAL (operands[memmodel_pos]),
+   true);
+  output_asm_insn (asm_template, operands);
+  nvptx_output_barrier (&operands[mem_pos], INTVAL (operands[memmodel_pos]),
+   false);
+  return "";
+}
+
 static void nvptx_print_operand (FILE *, rtx, int);
 
 /* Output INSN, which is a call to CALLEE with result RESULT.  For ptx, this
@@ -2660,6 +2707,36 @@ nvptx_print_operand (FILE *file, rtx x, int code)
 
   switch (code)
 {
+case 'B':
+  if (SYMBOL_REF_P (XEXP (x, 0)))
+   switch (SYMBOL_DATA_AREA (XEXP (x, 0)))
+ {
+ case DATA_AREA_GENERIC:
+   /* Assume worst-case: global.  */
+   gcc_fallthrough (); /* FALLTHROUGH.  */
+ case DATA_AREA_GLOBAL:
+   break;
+ case DATA_AREA_SHARED:
+   fputs (".cta", file);
+   return;
+ case DATA_AREA_LOCAL:
+ case DATA_AREA_CONST:
+ case DATA_AREA_PARAM:
+ default:
+   gcc_unreachable ();
+ }
+
+  /* There are 2 cases where membar.sys differs from membar.gl:
+- host accesses global memory (f.i. systemwide atomics)
+- 2 or more devices are setup in peer-to-peer mode, and one
+  peer can access global memory of other peer.
+Neither are currently supported by openMP/OpenACC on nvptx, but
+that could change, so we default to membar.sys.  We could support
+this more optimally by adding DATA_AREA_SYS and then emitting
+.gl for DATA_AREA_GLOBAL and .sys for DATA_AREA_SYS.  */
+  fputs (".sys", file);
+  return;
+
 case 'A':
   x = XEXP (x, 0);
   gcc_fallthrough (); /* FALLTHROUGH. */
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 00bb8fea821..108de1c0c59 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -1642,7 +1642,11 @@
(set (match_dup 1)
(unspec_volatile:SDIM [(const_int 0)] UNSPECV_CAS))]
   ""
-  "%.\\tatom%A1.cas.b%T0\\t%0, %1, %2, %3;"
+  {
+const char *t
+  = "%.\\tatom%A1.cas.b%T0\\t%0, %1, %2, %3;";
+return nvptx_output_atomic_insn (t, operands, 1, 4);
+  }
   [(set_attr "atomic" "true")])
 
 (define_insn "atomic_exchange"
@@ -1654,7 +1658,11 @@
(set (match_dup 1)
(match_operand:SDIM 2

[PATCH] openmp: Notify team barrier of pending tasks in omp_fulfill_event

2021-05-17 Thread Kwok Cheung Yeung


Hello

This patch fixes the issue where a call to omp_fulfill_event could fail to 
trigger the execution of tasks that were dependent on the task whose completion 
event is being fulfilled.


This mainly (or can only?) occurs when the thread is external to OpenMP, and all 
the barrier threads are sleeping when the omp_fulfill_event is called. 
omp_fulfill_event wakes the appropriate number of threads, but if 
BAR_TASK_PENDING is not set on bar->generation, the threads go back to sleep 
again rather than process new tasks.
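
The failure mode can be pictured with a small single-threaded toy model.  This is not libgomp code: ToyBarrier, wake_threads and fulfill are hypothetical names, and the model only assumes what is described above, namely that a woken thread runs a task only when the barrier's pending flag is set.

```cpp
#include <cassert>

// Toy stand-in for the team barrier state.
struct ToyBarrier
{
  bool task_pending = false;  // models BAR_TASK_PENDING
  int queued_tasks = 0;       // runnable tasks waiting for a thread
};

// Wake WOKEN sleeping threads; return how many tasks they actually ran.
inline int
wake_threads(ToyBarrier& bar, int woken)
{
  int ran = 0;
  for (int i = 0; i < woken; ++i)
    {
      if (!bar.task_pending)
        continue;               // thread goes straight back to sleep
      if (bar.queued_tasks > 0)
        {
          --bar.queued_tasks;
          ++ran;
        }
      if (bar.queued_tasks == 0)
        bar.task_pending = false;
    }
  return ran;
}

// Models a fulfilled event making NEW_TASKS runnable.  SET_PENDING selects
// the behaviour without (false) or with (true) the flag being set first.
inline int
fulfill(ToyBarrier& bar, int new_tasks, bool set_pending)
{
  assert(new_tasks >= 0);
  bar.queued_tasks += new_tasks;
  if (new_tasks > 0 && set_pending)
    bar.task_pending = true;
  return wake_threads(bar, new_tasks);
}
```

In this model, waking threads without setting the flag leaves the new task queued, which corresponds to the hang; setting the flag first lets the woken thread pick it up.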


I have added a new testcase using a pthread thread to call omp_fulfill_event on 
a suspended task after a short delay. I have not included a Fortran version as 
there doesn't appear to be a standard interface for threading in Fortran.


I have tested all the task-detach-* libgomp tests (which are the only tests that 
call omp_fulfill_event) with no offloading and offloading to Nvidia, with no 
fails. Okay to commit to master, releases/gcc-11 and devel/omp/gcc-11?


Thanks

Kwok
From 348c7cd00e358a8dc0b7563055f367fce2713fa5 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Fri, 14 May 2021 09:59:11 -0700
Subject: [PATCH] openmp: Notify team barrier of pending tasks in
 omp_fulfill_event

The team barrier should be notified of any new tasks that become runnable
as the result of a completing task, otherwise the barrier threads might
not resume processing available tasks, resulting in a hang.

2021-05-17  Kwok Cheung Yeung  

libgomp/
* task.c (omp_fulfill_event): Call gomp_team_barrier_set_task_pending
if new tasks generated.
* testsuite/libgomp.c-c++-common/task-detach-13.c: New.
---
 libgomp/task.c|  1 +
 .../libgomp.c-c++-common/task-detach-13.c | 60 +++
 2 files changed, 61 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/task-detach-13.c

diff --git a/libgomp/task.c b/libgomp/task.c
index 1c73c759a8d..feb4796a3ac 100644
--- a/libgomp/task.c
+++ b/libgomp/task.c
@@ -2460,6 +2460,7 @@ omp_fulfill_event (omp_event_handle_t event)
   if (new_tasks > 0)
 {
   /* Wake up threads to run new tasks.  */
+  gomp_team_barrier_set_task_pending (&team->barrier);
   do_wake = team->nthreads - team->task_running_count;
   if (do_wake > new_tasks)
do_wake = new_tasks;
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-13.c 
b/libgomp/testsuite/libgomp.c-c++-common/task-detach-13.c
new file mode 100644
index 000..4306524526d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-13.c
@@ -0,0 +1,60 @@
+/* { dg-do run } */
+/* { dg-options "-fopenmp" } */
+/* { dg-timeout 10 } */
+
+/* Test that omp_fulfill_event works when called from an external
+   non-OpenMP thread.  */
+
+#include 
+#include 
+#include 
+#include 
+
+int finished = 0;
+int event_pending = 0;
+omp_event_handle_t detach_event;
+
+void*
+fulfill_thread (void *)
+{
+  while (!__atomic_load_n (&finished, __ATOMIC_RELAXED))
+{
+  if (__atomic_load_n (&event_pending, __ATOMIC_ACQUIRE))
+   {
+ omp_fulfill_event (detach_event);
+ __atomic_store_n (&event_pending, 0, __ATOMIC_RELEASE);
+   }
+
+  sleep(1);
+}
+
+  return 0;
+}
+
+int
+main (void)
+{
+  pthread_t thr;
+  int dep;
+  pthread_create (&thr, NULL, fulfill_thread, 0);
+
+  #pragma omp parallel
+#pragma omp single
+  {
+   omp_event_handle_t ev;
+
+   #pragma omp task depend (out: dep) detach (ev)
+   {
+ detach_event = ev;
+ __atomic_store_n (&event_pending, 1, __ATOMIC_RELEASE);
+   }
+
+   #pragma omp task depend (in: dep)
+   {
+ __atomic_store_n (&finished, 1, __ATOMIC_RELAXED);
+   }
+  }
+
+
+  pthread_join (thr, 0);
+}
-- 
2.30.0.335.ge636282

[PATCH] libstdc++: Fix up semiregular-box partial specialization [PR100475]

2021-05-17 Thread Patrick Palka via Gcc-patches

This makes the in-place constructor of our partial specialization of
__box for already-semiregular types use direct-non-list-initialization
(in accordance with the specification of the primary template), and
additionally makes its data() member function use std::__addressof.
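
To make the two points concrete outside of libstdc++, here is a minimal sketch with hypothetical names (InPlace, Awkward, Box).  It assumes an element type whose initializer_list constructor and unary operator& are both deleted, similar in spirit to the new tests: braces in the member initializer would select the deleted constructor, and a plain &value would hit the deleted operator.

```cpp
#include <cassert>
#include <initializer_list>
#include <memory>
#include <utility>

struct InPlace {};  // hypothetical tag, playing the role of std::in_place_t

// Element type for which list-initialization and unary & are unusable.
struct Awkward
{
  int a = 0, b = 0;
  Awkward() = default;
  Awkward(int x, int y) : a(x), b(y) { }
  Awkward(std::initializer_list<int>) = delete;
  void operator&() const = delete;
};

template<typename T>
  struct Box
  {
    template<typename... Args>
      explicit
      Box(InPlace, Args&&... args)
      : value(std::forward<Args>(args)...)  // value{...} would pick the deleted ctor
      { }

    // std::addressof bypasses any user-provided (or deleted) operator&.
    T* data() noexcept { return std::addressof(value); }
    const T* data() const noexcept { return std::addressof(value); }

  private:
    T value;
  };
```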

Tested on x86_64-pc-linux-gnu, does this look OK for 10/11/trunk?

libstdc++-v3/ChangeLog:

PR libstdc++/100475
* include/std/ranges (__box::__box): Use non-list-initialization
in member initializer list of in-place constructor of the
partial specialization for semiregular types.
(__box::operator->): Use std::__addressof.
* testsuite/std/ranges/adaptors/detail/semiregular_box.cc
(test02): New test.
* testsuite/std/ranges/single_view.cc (test04): New test.
---
 libstdc++-v3/include/std/ranges|  6 +++---
 .../ranges/adaptors/detail/semiregular_box.cc  | 18 ++
 .../testsuite/std/ranges/single_view.cc| 16 
 3 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 0f69d4f0839..1707aeaebcd 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -163,7 +163,7 @@ namespace ranges
  constexpr explicit
  __box(in_place_t, _Args&&... __args)
  noexcept(is_nothrow_constructible_v<_Tp, _Args...>)
- : _M_value{std::forward<_Args>(__args)...}
+ : _M_value(std::forward<_Args>(__args)...)
  { }
 
constexpr bool
@@ -180,11 +180,11 @@ namespace ranges
 
constexpr _Tp*
operator->() noexcept
-   { return &_M_value; }
+   { return std::__addressof(_M_value); }
 
constexpr const _Tp*
operator->() const noexcept
-   { return &_M_value; }
+   { return std::__addressof(_M_value); }
   };
   } // namespace __detail
 
diff --git 
a/libstdc++-v3/testsuite/std/ranges/adaptors/detail/semiregular_box.cc 
b/libstdc++-v3/testsuite/std/ranges/adaptors/detail/semiregular_box.cc
index 65931dea51a..ed694e04fd1 100644
--- a/libstdc++-v3/testsuite/std/ranges/adaptors/detail/semiregular_box.cc
+++ b/libstdc++-v3/testsuite/std/ranges/adaptors/detail/semiregular_box.cc
@@ -81,3 +81,21 @@ test01()
   return true;
 }
 static_assert(test01());
+
+template
+  struct A {
+A() requires make_semiregular;
+A(int, int);
+A(std::initializer_list) = delete;
+  };
+
+void
+test02()
+{
+  // PR libstdc++/100475
+  static_assert(std::semiregular>);
+  __box> x2(std::in_place, 0, 0);
+
+  static_assert(!std::semiregular>);
+  __box> x1(std::in_place, 0, 0);
+}
diff --git a/libstdc++-v3/testsuite/std/ranges/single_view.cc 
b/libstdc++-v3/testsuite/std/ranges/single_view.cc
index 97bc39bb636..f530cc07565 100644
--- a/libstdc++-v3/testsuite/std/ranges/single_view.cc
+++ b/libstdc++-v3/testsuite/std/ranges/single_view.cc
@@ -58,9 +58,25 @@ test03()
   VERIFY(*std::ranges::begin(s3) == 'a');
 }
 
+void
+test04()
+{
+  // PR libstdc++/100475
+  struct A {
+A() = default;
+A(int, int) { }
+A(std::initializer_list) = delete;
+void operator&() const = delete;
+  };
+  std::ranges::single_view s(std::in_place, 0, 0);
+  s.data();
+  std::as_const(s).data();
+}
+
 int main()
 {
   test01();
   test02();
   test03();
+  test04();
 }
-- 
2.31.1.621.g97eea85a0a

[PATCH] libstdc++: Fix iterator caching inside range adaptors [PR100479]

2021-05-17 Thread Patrick Palka via Gcc-patches

This fixes two issues with our iterator caching as described in detail
in the PR.  Since r12-336 added the __non_propagating_cache class
template as part of P2328, this patch just rewrites the _CachedPosition
partial specialization in terms of this class template.
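
For context, the copy/move behaviour that a non-propagating cache provides can be sketched as below.  This is a simplified stand-in, not the libstdc++ class: NonPropagatingCache is a hypothetical name, and it stores a default-constructed T plus a flag instead of deriving from _Optional_base.

```cpp
#include <cassert>
#include <utility>

template<typename T>
  class NonPropagatingCache
  {
    T value_{};
    bool engaged_ = false;

  public:
    NonPropagatingCache() = default;

    // Copying never transfers the cached value: the copy starts empty.
    NonPropagatingCache(const NonPropagatingCache&) { }

    // Moving leaves both the new object and the source empty.
    NonPropagatingCache(NonPropagatingCache&& other) { other.reset(); }

    NonPropagatingCache&
    operator=(const NonPropagatingCache& other)
    {
      if (&other != this)
        reset();
      return *this;
    }

    NonPropagatingCache&
    operator=(NonPropagatingCache&& other)
    {
      reset();
      other.reset();
      return *this;
    }

    bool has_value() const noexcept { return engaged_; }

    void set(const T& v) { value_ = v; engaged_ = true; }

    const T&
    get() const
    {
      assert(engaged_);
      return value_;
    }

    void reset() noexcept { engaged_ = false; }
  };
```

A cached iterator held this way cannot end up referring to the range of the object it was copied or moved from, because the copy or move target always starts out empty.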

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?  Shall we
also backport this?

libstdc++-v3/ChangeLog:

PR libstdc++/100479
* include/std/ranges (__detail::__non_propagating_cache): Move
definition up to before that of _CachedPosition.  Make base
class _Optional_base protected instead of private.  Add const
overload for operator*.
(__detail::_CachedPosition): Rewrite the partial specialization
for forward ranges as a derived class of __non_propagating_cache.
Remove the size constraint on the partial specialization for
random access ranges.
* testsuite/std/ranges/adaptors/100479.cc: New test.
---
 libstdc++-v3/include/std/ranges   | 133 +-
 .../testsuite/std/ranges/adaptors/100479.cc   |  82 +++
 2 files changed, 148 insertions(+), 67 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/ranges/adaptors/100479.cc

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 1707aeaebcd..fe6379fb858 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -1139,6 +1139,67 @@ namespace views::__adaptor
 
   namespace __detail
   {
+template
+  struct __non_propagating_cache
+  {
+   // When _Tp is not an object type (e.g. is a reference type), we make
+   // __non_propagating_cache<_Tp> empty rather than ill-formed so that
+   // users can easily conditionally declare data members with this type
+   // (such as join_view::_M_inner).
+  };
+
+template
+  requires is_object_v<_Tp>
+  struct __non_propagating_cache<_Tp> : protected _Optional_base<_Tp>
+  {
+   __non_propagating_cache() = default;
+
+   constexpr
+   __non_propagating_cache(const __non_propagating_cache&) noexcept
+   { }
+
+   constexpr
+   __non_propagating_cache(__non_propagating_cache&& __other) noexcept
+   { __other._M_reset(); }
+
+   constexpr __non_propagating_cache&
+   operator=(const __non_propagating_cache& __other) noexcept
+   {
+ if (std::__addressof(__other) != this)
+   this->_M_reset();
+ return *this;
+   }
+
+   constexpr __non_propagating_cache&
+   operator=(__non_propagating_cache&& __other) noexcept
+   {
+ this->_M_reset();
+ __other._M_reset();
+ return *this;
+   }
+
+   constexpr _Tp&
+   operator*() noexcept
+   { return this->_M_get(); }
+
+   constexpr const _Tp&
+   operator*() const noexcept
+   { return this->_M_get(); }
+
+   template
+ _Tp&
+ _M_emplace_deref(const _Iter& __i)
+ {
+   this->_M_reset();
+   // Using _Optional_base::_M_construct to initialize from '*__i'
+   // would incur an extra move due to the indirection, so we instead
+   // use placement new directly.
+   ::new ((void *) std::__addressof(this->_M_payload._M_payload)) 
_Tp(*__i);
+   this->_M_payload._M_engaged = true;
+   return this->_M_get();
+ }
+  };
+
 template
   struct _CachedPosition
   {
@@ -1160,27 +1221,25 @@ namespace views::__adaptor
 
 template
   struct _CachedPosition<_Range>
+   : protected __non_propagating_cache>
   {
-  private:
-   iterator_t<_Range> _M_iter{};
-
-  public:
constexpr bool
_M_has_value() const
-   { return _M_iter != iterator_t<_Range>{}; }
+   { return this->_M_is_engaged(); }
 
constexpr iterator_t<_Range>
_M_get(const _Range&) const
{
  __glibcxx_assert(_M_has_value());
- return _M_iter;
+ return **this;
}
 
constexpr void
_M_set(const _Range&, const iterator_t<_Range>& __it)
{
  __glibcxx_assert(!_M_has_value());
- _M_iter = __it;
+ this->_M_payload._M_payload._M_value = __it;
+ this->_M_payload._M_engaged = true;
}
   };
 
@@ -2339,66 +2398,6 @@ namespace views::__adaptor
 inline constexpr _DropWhile drop_while;
   } // namespace views
 
-  namespace __detail
-  {
-template
-  struct __non_propagating_cache
-  {
-   // When _Tp is not an object type (e.g. is a reference type), we make
-   // __non_propagating_cache<_Tp> empty rather than ill-formed so that
-   // users can easily conditionally declare data members with this type
-   // (such as join_view::_M_inner).
-  };
-
-template
-  requires is_object_v<_Tp>
-  struct __non_propagating_cache<_Tp> : private _Optional_base<_Tp>
-  {
-   __non_propagating_cache() = default;
-
-

Re: [PATCH] Add a couple of A?CST1:CST2 match and simplify optimizations

2021-05-17 Thread Bernd Edlinger

On 5/16/21 10:36 PM, apinski--- via Gcc-patches wrote:
> From: Andrew Pinski 
> 
> Instead of some of the more manual optimizations inside phi-opt,
> it would be good idea to do a lot of the heavy lifting inside match
> and simplify instead. In the process, this moves the three simple
> A?CST1:CST2 (where CST1 or CST2 is zero) simplifications.
> 
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> 
> Thanks,
> Andrew Pinski
> 
> gcc:
> * match.pd (A?CST1:CST2): Add simplifications for A?0:+-1, A?+-1:0,
> A?POW2:0 and A?0:POW2.
> ---
>  gcc/match.pd | 37 +
>  1 file changed, 37 insertions(+)
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 10503b97ab5..844f7dd5f87 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3711,6 +3711,43 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (if (integer_all_onesp (@1) && integer_zerop (@2))
>  @0
>  
> +/* A few simplifications of "a ? CST1 : CST2". */
> +/* NOTE: Only do this on gimple as the if-chain-to-switch
> +   optimization depends on the gimple to have if statements in it. */
> +#if GIMPLE
> +(simplify
> + (cond @0 INTEGER_CST@1 INTEGER_CST@2)
> + (switch
> +  (if (integer_zerop (@2))
> +   (switch
> +/* a ? 1 : 0 -> a if 0 and 1 are integral types. */
> +(if (integer_onep (@1))
> + (convert (convert:boolean_type_node @0)))
> +/* a ? -1 : 0 -> -a. */
> +(if (integer_all_onesp (@1))
> + (negate (convert (convert:boolean_type_node @0
> +/* a ? powerof2cst : 0 -> a << (log2(powerof2cst)) */
> +(if (!POINTER_TYPE_P (type) && integer_pow2p (@1))
> + (with {
> +   tree shift = build_int_cst (integer_type_node, wi::exact_log2 
> (wi::to_wide (@1)));
> +  }
> +  (lshift (convert (convert:boolean_type_node @0)) { shift; })
> +  (if (integer_zerop (@1))
> +   (switch
> +/* a ? 0 : 1 -> !a. */
> +(if (integer_onep (@2))
> + (convert (bit_not:boolean_type_node (convert:boolean_type_node @0
> +/* a ? -1 : 0 -> -(!a). */
> +(if (integer_all_onesp (@2))
> + (negate (convert (bit_not:boolean_type_node (convert:boolean_type_node 
> @0)
> +/* a ? powerof2cst : 0 -> (!a) << (log2(powerof2cst)) */
> +(if (!POINTER_TYPE_P (type) && integer_pow2p (@2))
> + (with {
> +   tree shift = build_int_cst (integer_type_node, wi::exact_log2 
> (wi::to_wide (@2)));
> +  }
> +  (lshift (convert (bit_not:boolean_type_node (convert:boolean_type_node 
> @0))) { shift; })))
> +#endif
> +
>  /* Simplification moved from fold_cond_expr_with_comparison.  It may also
> be extended.  */
>  /* This pattern implements two kinds simplification:
> 
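
For reference, the arithmetic identities that the quoted patterns rely on can be checked in plain C++.  This is only a sanity check of the math on int, not a model of how match.pd applies the patterns; identities_hold is a hypothetical helper.

```cpp
#include <cassert>

// Check each "a ? CST1 : CST2" form against its branch-free equivalent
// for a power of two equal to 1 << SHIFT.
inline bool
identities_hold(bool a, unsigned shift)
{
  assert(shift < 31);           // keep 1 << shift within int
  const int b = a;              // a converted to 0 or 1
  const int nb = !a;            // !a converted to 0 or 1
  const int pow2 = 1 << shift;
  return (a ? 1 : 0) == b
         && (a ? -1 : 0) == -b
         && (a ? pow2 : 0) == (b << shift)
         && (a ? 0 : 1) == nb
         && (a ? 0 : -1) == -nb
         && (a ? 0 : pow2) == (nb << shift);
}
```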

Hi Andrew,

Sorry, I don't know exactly what is wrong with this patch,
but it seems to cause the following failure when I try to bootstrap it:


/home/ed/gnu/gcc-build-2/./prev-gcc/xgcc -B/home/ed/gnu/gcc-build-2/./prev-gcc/ 
-B/home/ed/gnu/install/x86_64-pc-linux-gnu/bin/ 
-B/home/ed/gnu/install/x86_64-pc-linux-gnu/bin/ 
-B/home/ed/gnu/install/x86_64-pc-linux-gnu/lib/ -isystem 
/home/ed/gnu/install/x86_64-pc-linux-gnu/include -isystem 
/home/ed/gnu/install/x86_64-pc-linux-gnu/sys-include   -fchecking=1 -c -g -O2 
-fchecking=1  -gnatpg -gnata -W -Wall -nostdinc -I- -I. -Iada/generated -Iada 
-Iada/gcc-interface -I../../gcc-trunk-1/gcc/ada 
-I../../gcc-trunk-1/gcc/ada/gcc-interface -Iada/libgnat 
-I../../gcc-trunk-1/gcc/ada/libgnat 
../../gcc-trunk-1/gcc/ada/libgnat/a-charac.ads -o ada/libgnat/a-charac.o
/home/ed/gnu/gcc-build-2/./prev-gcc/xgcc -B/home/ed/gnu/gcc-build-2/./prev-gcc/ 
-B/home/ed/gnu/install/x86_64-pc-linux-gnu/bin/ 
-B/home/ed/gnu/install/x86_64-pc-linux-gnu/bin/ 
-B/home/ed/gnu/install/x86_64-pc-linux-gnu/lib/ -isystem 
/home/ed/gnu/install/x86_64-pc-linux-gnu/include -isystem 
/home/ed/gnu/install/x86_64-pc-linux-gnu/sys-include   -fchecking=1 -c -g -O2 
-fchecking=1  -gnatpg -gnata -W -Wall -nostdinc -I- -I. -Iada/generated -Iada 
-Iada/gcc-interface -I../../gcc-trunk-1/gcc/ada 
-I../../gcc-trunk-1/gcc/ada/gcc-interface -Iada/libgnat 
-I../../gcc-trunk-1/gcc/ada/libgnat 
../../gcc-trunk-1/gcc/ada/libgnat/a-chlat1.ads -o ada/libgnat/a-chlat1.o
+===GNAT BUG DETECTED==+
| 12.0.0 20210517 (experimental) (x86_64-pc-linux-gnu) Storage_Error stack 
overflow or erroneous memory access|
| Error detected at a-charac.ads:16:12 |
| Please submit a bug report; see https://gcc.gnu.org/bugs/ .  |
| Use a subject line meaningful to you and us to track the bug.|
| Include the entire contents of this bug box in the report.   |
| Include the exact command that you entered.

Re: RFA: Add option -fretry-compilation

2021-05-17 Thread Joern Rennecke

On Mon, 17 May 2021 at 11:59, Richard Biener  wrote:

> The plan for reload is to axe it, similar to CC0 support.  Sooner rather than
> later, but given it's still used exclusively by a lot of targets, it might
> take some time.

> So for you it's always just -fretry-compilation -m[no-]lra?  Given -m[no-]lra
> is a thing cycling between the two directly in RA lra/reload should be 
> possible?

Even if that were possible, it wouldn't solve the problem.  When I try compiling
newlib without -fretry-compilation, it's falling over first for
libc/time/strftime.c .
With lra, lra finishes, but it ignores an earlyclobber constraint, so
reload_cse_simplify_operands ICEs.  With reload, you get a spill failure.
I've tried various options, but only -O0 seems to work.  Compiling strftime with
-O0 is not really an issue because the target is too deeply embedded to hope
to link something that uses strftime.  But identifying all the files
that can't be
compiled with optimization and treating them differently is a problem if it has
to be done by hand.

> Or are reload/LRA too greedy in that they ICE when having transformed half
> of the code already?

Both of them do a lot of transformations before they ICE.  Or they don't even
ICE themselves, but leave behind invalid rtl that a later pass catches.

Even if we fixed both passes so that they could roll back everything
(which I think would be a lot harder for lra; reload can already roll
back a lot),
what's the point if you axe reload soon after?

> I see.  It's of course difficult for the FSF tree to cater for
> extremes that are not
> represented in its tree.  I wonder what prevents you from contributing the 
> port?

I can neither confirm nor deny that I can't contribute the port.

> Still if that solves a lot of the issues this seems like the way to go.

It has merit in its own right, but it can't fix all the ICEs, and thus doesn't
make building libraries manageable.

Re: [PATCH] Bail in bounds_of_var_in_loop if scev returns NULL.

2021-05-17 Thread Andrew MacLeod via Gcc-patches


On 5/13/21 4:15 PM, Aldy Hernandez via Gcc-patches wrote:

Both initial_condition_in_loop_num and evolution_part_in_loop_num
can return NULL.  This patch exits if either one is NULL.  Presumably
this didn't happen before, because adjust_range_with_scev was called
far less frequently than in ranger, which can call it for every PHI.

OK pending tests?

gcc/ChangeLog:

PR tree-optimization/100349
* vr-values.c (bounds_of_var_in_loop): Bail if scev returns
  NULL.

gcc/testsuite/ChangeLog:

* gcc.dg/pr100349.c: New test.
-


OK.

Andrew

Re: [PATCH 7/7] [og10] WIP GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION changes

2021-05-17 Thread Julian Brown

On Mon, 17 May 2021 21:14:26 +0800
Chung-Lin Tang  wrote:

> On 2021/5/11 4:57 PM, Julian Brown wrote:
> > This work-in-progress patch tries to get
> > GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION to behave more like
> > GOMP_MAP_ATTACH_DETACH -- in that the mapping is made to form groups
> > to be processed by build_struct_group/build_struct_comp_map.  I
> > think that's important to integrate with how groups of mappings for
> > array sections are handled in other cases.
> > 
> > This patch isn't sufficient by itself to fix a couple of broken
> > test cases at present (libgomp.c++/target-lambda-1.C,
> > libgomp.c++/target-this-4.C), though.  
> 
> No, GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION is supposed to be just
> a slightly different behavior version of GOMP_MAP_ATTACH; it
> tolerates an unmapped pointer-target and assigns NULL on the device,
> instead of just gomp_fatal(). (see its handling in libgomp/target.c)
> 
> In case OpenACC can have the same such zero-length array section
> behavior, we can just share one GOMP_MAP_ATTACH map. For now it is
> treated as separate cases.

OK, understood. But, I'm a bit concerned that we're ignoring some
"hidden rules" with regards to OMP pointer clause ordering/grouping that
certain code (at least the bit that creates GOMP_MAP_STRUCT node
groups, and parts of omp-low.c) relies on. I believe those rules are as
follows:

 - an array slice is mapped using two or three pointers -- two for a
   normal (non-reference) base pointer, and three if we have a
   reference to a pointer (i.e. in C++) or an array descriptor (i.e. in
   Fortran). So we can have e.g.

   GOMP_MAP_TO
   GOMP_MAP_ALWAYS_POINTER

   GOMP_MAP_TO
   GOMP_MAP_.*_POINTER
   GOMP_MAP_ALWAYS_POINTER

   GOMP_MAP_TO
   GOMP_MAP_TO_PSET
   GOMP_MAP_ALWAYS_POINTER

 - for OpenACC, we extend this to allow (up to and including
   gimplify.c) the GOMP_MAP_ATTACH_DETACH mapping. So we can have (for
   component refs):

   GOMP_MAP_TO
   GOMP_MAP_ATTACH_DETACH

   GOMP_MAP_TO
   GOMP_MAP_TO_PSET
   GOMP_MAP_ATTACH_DETACH

   GOMP_MAP_TO
   GOMP_MAP_.*_POINTER
   GOMP_MAP_ATTACH_DETACH

For the scanning in insert_struct_comp_map (as it is at present) to
work right, these groups must stay intact.  I think the current
behaviour of omp_target_reorder_clauses on the og10 branch can break
those groups apart though!

(The "prev_list_p" stuff in the loop in question in gimplify.c just
keeps track of the first node in these groups.)
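
To make the grouping concern concrete, here is a minimal sketch. It is not GCC code: the mapping kinds are modelled as a plain enum, and split_groups is a hypothetical helper that starts a new group at every GOMP_MAP_TO-like node and attaches the pointer-ish nodes that follow it. A reordering pass that moves whole groups, rather than individual nodes, would keep each pointer node next to its base mapping.

```cpp
#include <vector>

// Simplified stand-ins for the mapping kinds discussed above (assumption:
// the real kinds are GCC's GOMP_MAP_* constants and carry more state).
enum class MapKind { To, ToPset, Pointer, AlwaysPointer, AttachDetach };

// Hypothetical rule: a data mapping opens a group; every other kind is a
// pointer-ish node that belongs to the group opened before it.
inline bool starts_group(MapKind k) { return k == MapKind::To; }

// Split a flat clause list into groups of one To node plus the
// pointer-ish nodes that immediately follow it.
inline std::vector<std::vector<MapKind>>
split_groups(const std::vector<MapKind>& clauses)
{
  std::vector<std::vector<MapKind>> groups;
  for (MapKind k : clauses) {
    if (groups.empty() || starts_group(k))
      groups.emplace_back();
    groups.back().push_back(k);
  }
  return groups;
}
```

Sorting or moving the outer vector then reorders mappings without ever separating, say, a GOMP_MAP_ALWAYS_POINTER from the node it depends on.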

For OpenACC, the GOMP_MAP_ATTACH_DETACH code does *not* depend on the
previous clause when lowering in omp-low.c. But GOMP_MAP_ALWAYS_POINTER
does! And in one case ("update" directive), GOMP_MAP_ATTACH_DETACH is
rewritten to GOMP_MAP_ALWAYS_POINTER, so for that case at least, the
dependency on the preceding mapping node must stay intact.

OpenACC also allows "bare" GOMP_MAP_ATTACH and GOMP_MAP_DETACH nodes
(corresponding to the "attach" and "detach" clauses). Those are handled
a bit differently to GOMP_MAP_ATTACH_DETACH in gimplify.c -- but
GOMP_MAP_ATTACH_Z_L_A_S doesn't quite behave like that either, I don't
think?

Anyway: I've not entirely understood what omp_target_reorder_clauses is
doing, but I think it may need to try harder to keep the groups
mentioned above together.  What do you think?

Thanks,

Julian

Re: [PATCH] libstdc++: Fix filesystem::path constraints for volatile [PR 100630]

2021-05-17 Thread Jonathan Wakely via Gcc-patches


On 17/05/21 15:02 +0100, Jonathan Wakely wrote:

The constraint check for filesystem::path construction uses
decltype(__is_path_src(declval<Source>())) which means it considers
conversion from an rvalue.  When Source is a volatile-qualified type
it cannot use is_path_src(const Unknown&) because a const lvalue
reference can only bind to a non-volatile rvalue.

Since the relevant path members all have a const Source& parameter,
the constraint should be defined in terms of declval<const Source&>(),
not declval<Source>(). This avoids the problem of volatile-qualified
rvalues, because we no longer use an rvalue at all.

libstdc++-v3/ChangeLog:

PR libstdc++/100630
* include/bits/fs_path.h (__is_constructible_from): Test
construction from a const lvalue, not an rvalue.
* include/experimental/bits/fs_path.h (__is_constructible_from):
Likewise.
* testsuite/27_io/filesystem/path/construct/100630.cc: New test.
* testsuite/experimental/filesystem/path/construct/100630.cc:
New test.

Tested x86_64-linux, pushed to gcc-10 (this isn't needed for gcc-11 or
trunk, but I also plan to backport it to gcc-9).


Oh actually this is needed for experimental::filesystem::path on trunk
and gcc-11 (as I found when I added the new tests to trunk) so
I'll fix it there too.

[PATCH] c/100625 - avoid building invalid labels in the GIMPLE FE

2021-05-17 Thread Richard Biener

When duplicate labels are diagnosed, avoid building a GIMPLE_LABEL.

Bootstrapped / tested on x86_64-unknown-linux-gnu, pushed.

2021-05-17  Richard Biener  

PR c/100625
gcc/c/
* gimple-parser.c (c_parser_gimple_label): Avoid building
a GIMPLE label with NULL label decl.

* gcc.dg/gimplefe-error-9.c: New testcase.
---
 gcc/c/gimple-parser.c   | 3 ++-
 gcc/testsuite/gcc.dg/gimplefe-error-9.c | 9 +
 2 files changed, 11 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/gimplefe-error-9.c

diff --git a/gcc/c/gimple-parser.c b/gcc/c/gimple-parser.c
index 3a6e72ef002..398e21631d9 100644
--- a/gcc/c/gimple-parser.c
+++ b/gcc/c/gimple-parser.c
@@ -1887,7 +1887,8 @@ c_parser_gimple_label (gimple_parser &parser, gimple_seq *seq)
   gcc_assert (c_parser_next_token_is (parser, CPP_COLON));
   c_parser_consume_token (parser);
   tree label = define_label (loc1, name);
-  gimple_seq_add_stmt_without_update (seq, gimple_build_label (label));
+  if (label)
+    gimple_seq_add_stmt_without_update (seq, gimple_build_label (label));
   return;
 }
 
diff --git a/gcc/testsuite/gcc.dg/gimplefe-error-9.c b/gcc/testsuite/gcc.dg/gimplefe-error-9.c
new file mode 100644
index 000..87014c1cbbf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gimplefe-error-9.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-fgimple" } */
+
+void __GIMPLE
+foo()
+{
+bb1:
+bb1:; /* { dg-error "duplicate" } */
+}
-- 
2.26.2

Re: [PATCH 5/5] Mapping of components of references to pointers to structs for OpenMP/OpenACC

2021-05-17 Thread Julian Brown

On Mon, 17 May 2021 21:07:19 +0800
Chung-Lin Tang  wrote:

> Hi Julian,
> 
> On 2021/5/15 5:27 AM, Julian Brown wrote:
> > GCC currently raises a parse error for indirect accesses to struct
> > members, where the base of the access is a reference to a pointer.
> > This patch fixes that case.  
> 
> > gcc/cp/
> > * semantics.c (finish_omp_clauses): Handle components of
> > references to pointers to structs.
> > 
> > libgomp/
> > * testsuite/libgomp.oacc-c++/deep-copy-17.C: Update test.  
> 
> > --- a/gcc/cp/semantics.c
> > +++ b/gcc/cp/semantics.c
> > @@ -7670,7 +7670,12 @@ finish_omp_clauses (tree clauses, enum
> > c_omp_region_type ort) if ((ort == C_ORT_ACC || ort == C_ORT_OMP)
> >   && TREE_CODE (t) == COMPONENT_REF
> >   && TREE_CODE (TREE_OPERAND (t, 0)) == INDIRECT_REF)
> > -   t = TREE_OPERAND (TREE_OPERAND (t, 0), 0);
> > +   {
> > + t = TREE_OPERAND (TREE_OPERAND (t, 0), 0);
> > + /* References to pointers have a double indirection
> > here.  */
> > + if (TREE_CODE (t) == INDIRECT_REF)
> > +   t = TREE_OPERAND (t, 0);
> > +   }
> >   if (TREE_CODE (t) == COMPONENT_REF
> >   && ((ort & C_ORT_OMP_DECLARE_SIMD) == C_ORT_OMP
> >   || ort == C_ORT_ACC)  
> 
> There is already a large plethora of such modifications in this patch:
> "[PATCH, OG10, OpenMP 5.0, committed] Remove array section
> base-pointer mapping semantics, and other front-end adjustments."
> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570075.html
> 
> I am in the process of taking that patch to mainline, so are you sure
> this is not already handled there?

Hmm, it might be -- thanks. Consider this patch withdrawn if so. (But
yeah, keep the test case by all means!)

Julian

Re: [PATCH 4/5] Rework indirect struct handling for OpenACC/OpenMP in gimplify.c

2021-05-17 Thread Julian Brown

On Mon, 17 May 2021 14:12:00 +0200
Bernd Edlinger  wrote:

> >  */ @@ -8715,19 +8770,26 @@ static tree
> >  build_struct_group (struct gimplify_omp_ctx *ctx,
> > enum omp_region_type region_type, enum
> > tree_code code, tree decl, unsigned int *flags, tree c,
> > -   hash_map *_map_to_clause,
> > +   hash_map
> > *_map_to_clause, tree *_list_p, tree *_p, bool
> > *cont) {
> >poly_offset_int coffset;
> >poly_int64 cbitpos;
> > -  tree base_ref;
> > +  tree base_ind, base_ref;
> > +  tree *list_in_p = list_p, *prev_list_in_p = prev_list_p;
> >
> 
> Is this a kind of debug code?
> This fails to compile:
> 
> ../../gcc-trunk/gcc/gimplify.c: In function ‘tree_node*
> build_struct_group(gimplify_omp_ctx*, omp_region_type, tree_code,
> tree, unsigned int*, tree, hash_map*&,
> tree_node**&, tree_node**&, bool*)’:
> ../../gcc-trunk/gcc/gimplify.c:8779:9: error: unused variable
> ‘list_in_p’ [-Werror=unused-variable] 8779 |   tree *list_in_p =
> list_p, *prev_list_in_p = prev_list_p; | ^
> ../../gcc-trunk/gcc/gimplify.c:8779:30: error: unused variable
> ‘prev_list_in_p’ [-Werror=unused-variable] 8779 |   tree *list_in_p =
> list_p, *prev_list_in_p = prev_list_p; |
> ^~

Oops, that's left over from an earlier iteration of the patch, and
indeed isn't needed any more. I'll be sure to bootstrap the next
iteration of these patches I send upstream.

Thanks,

Julian

Re: [PATCH 1/2] c-family: Copy DECL_USER_ALIGN even if DECL_ALIGN is similar.

2021-05-17 Thread Robin Dapp via Gcc-patches


on s390 a warning test fails:

inline int ATTR ((cold, aligned (8)))
finline_hot_noret_align (int);

inline int ATTR ((warn_unused_result))
finline_hot_noret_align (int);

inline int ATTR ((aligned (4)))
finline_hot_noret_align (int);  /* { dg-warning "ignoring attribute
.aligned \\(4\\). because it conflicts with attribute .aligned \\(8\\)."

This test actually uncovered two problems.  First, on s390 the default
function alignment is 8 bytes.  When the second decl above is merged
with the first one, DECL_USER_ALIGN is only copied if DECL_ALIGN (old) >
DECL_ALIGN (new).  Subsequently, when merging the third decl, no warning
is emitted since DECL_USER_ALIGN is unset.
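
The first problem can be modelled with a small sketch. This is an illustration only, not the code in the C front end: Decl is a hypothetical two-field stand-in for a declaration, merge_strict follows the behaviour described above, and merge_inclusive is one possible reading of the subject line ("copy DECL_USER_ALIGN even if DECL_ALIGN is similar"). Whether the actual patch takes exactly this form is an assumption.

```cpp
// Hypothetical stand-in for the two properties of a decl that matter here.
struct Decl {
  unsigned align;   // models DECL_ALIGN
  bool user_align;  // models DECL_USER_ALIGN
};

// Behaviour as described: the user-align flag is carried over only when
// the old alignment is strictly greater than the new one.
inline Decl merge_strict(Decl old_decl, Decl new_decl)
{
  if (old_decl.align > new_decl.align) {
    new_decl.align = old_decl.align;
    new_decl.user_align |= old_decl.user_align;
  }
  return new_decl;
}

// Assumed variant: also carry the flag over when the alignments are equal.
inline Decl merge_inclusive(Decl old_decl, Decl new_decl)
{
  if (old_decl.align >= new_decl.align) {
    new_decl.align = old_decl.align;
    new_decl.user_align |= old_decl.user_align;
  }
  return new_decl;
}
```

With a default function alignment of 8, merging a user-aligned (8) decl into a default-aligned (8) decl loses the flag under merge_strict and keeps it under merge_inclusive, which is the condition the later aligned (4) warning depends on.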


[..]

Ping.

[PATCH] libstdc++: Fix filesystem::path constraints for volatile [PR 100630]

2021-05-17 Thread Jonathan Wakely via Gcc-patches

The constraint check for filesystem::path construction uses
decltype(__is_path_src(declval<Source>())) which means it considers
conversion from an rvalue.  When Source is a volatile-qualified type
it cannot use is_path_src(const Unknown&) because a const lvalue
reference can only bind to a non-volatile rvalue.

Since the relevant path members all have a const Source& parameter,
the constraint should be defined in terms of declval<const Source&>(),
not declval<Source>(). This avoids the problem of volatile-qualified
rvalues, because we no longer use an rvalue at all.
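
The binding rule behind this can be checked in isolation with standard type traits. The sketch below is independent of the libstdc++ internals: binds is a hypothetical helper introduced only for illustration, and bool stands in for Source.

```cpp
#include <type_traits>

// Hypothetical helper: can a reference of type Ref be initialised from an
// expression of type From (as produced by std::declval<From>())?
template<typename Ref, typename From>
constexpr bool binds()
{
  return std::is_constructible<Ref, From>::value;
}

// An rvalue of type volatile bool cannot bind to const volatile bool&:
// only a const, non-volatile lvalue reference may bind to an rvalue.
static_assert(!binds<const volatile bool&, volatile bool>(),
              "volatile rvalue must not bind");

// Starting from a const lvalue avoids the rvalue rule altogether.
static_assert(binds<const volatile bool&, const volatile bool&>(),
              "const volatile lvalue binds");

// Without volatile, the rvalue form is fine.
static_assert(binds<const bool&, bool>(), "non-volatile rvalue binds");
```

The first assertion corresponds to the rvalue-based form of the check and the second to the lvalue-based form.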

libstdc++-v3/ChangeLog:

PR libstdc++/100630
* include/bits/fs_path.h (__is_constructible_from): Test
construction from a const lvalue, not an rvalue.
* include/experimental/bits/fs_path.h (__is_constructible_from):
Likewise.
* testsuite/27_io/filesystem/path/construct/100630.cc: New test.
* testsuite/experimental/filesystem/path/construct/100630.cc:
New test.

Tested x86_64-linux, pushed to gcc-10 (this isn't needed for gcc-11 or
trunk, but I also plan to backport it to gcc-9).


commit 4cd69a5a0dd31bc6fdef1bbabc8d6d1416014ea1
Author: Jonathan Wakely 
Date:   Mon May 17 11:54:06 2021

libstdc++: Fix filesystem::path constraints for volatile [PR 100630]

The constraint check for filesystem::path construction uses
decltype(__is_path_src(declval<Source>())) which means it considers
conversion from an rvalue.  When Source is a volatile-qualified type
it cannot use is_path_src(const Unknown&) because a const lvalue
reference can only bind to a non-volatile rvalue.

Since the relevant path members all have a const Source& parameter,
the constraint should be defined in terms of declval<const Source&>(),
not declval<Source>(). This avoids the problem of volatile-qualified
rvalues, because we no longer use an rvalue at all.

libstdc++-v3/ChangeLog:

PR libstdc++/100630
* include/bits/fs_path.h (__is_constructible_from): Test
construction from a const lvalue, not an rvalue.
* include/experimental/bits/fs_path.h (__is_constructible_from):
Likewise.
* testsuite/27_io/filesystem/path/construct/100630.cc: New test.
* testsuite/experimental/filesystem/path/construct/100630.cc:
New test.

diff --git a/libstdc++-v3/include/bits/fs_path.h b/libstdc++-v3/include/bits/fs_path.h
index 3d341916db5..6e0a85417cc 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -116,7 +116,7 @@ namespace __detail
 
   template
 struct __constructible_from<_Source, void>
-: decltype(__is_path_src(std::declval<_Source>(), 0))
+: decltype(__is_path_src(std::declval<const _Source&>(), 0))
 { };
 
   template
diff --git a/libstdc++-v3/include/experimental/bits/fs_path.h b/libstdc++-v3/include/experimental/bits/fs_path.h
index c5fc3beed1f..0a8f4eee0a1 100644
--- a/libstdc++-v3/include/experimental/bits/fs_path.h
+++ b/libstdc++-v3/include/experimental/bits/fs_path.h
@@ -124,7 +124,7 @@ namespace __detail
 
   template
 struct __constructible_from<_Source, void>
-: decltype(__is_path_src(std::declval<_Source>(), 0))
+: decltype(__is_path_src(std::declval<const _Source&>(), 0))
 { };
 
   template
+
+void f(bool) { }
+void f(const std::filesystem::path&) { }
+
+void
+test_100630()
+{
+  volatile bool b = true;
+  f(b);
+}
diff --git a/libstdc++-v3/testsuite/experimental/filesystem/path/construct/100630.cc b/libstdc++-v3/testsuite/experimental/filesystem/path/construct/100630.cc
new file mode 100644
index 000..b2428ff74cf
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/filesystem/path/construct/100630.cc
@@ -0,0 +1,14 @@
+// { dg-do compile { target c++11 } }
+// { dg-require-filesystem-ts "" }
+
+#include <experimental/filesystem>
+
+void f(bool) { }
+void f(const std::experimental::filesystem::path&) { }
+
+void
+test_100630()
+{
+  volatile bool b = true;
+  f(b);
+}

[PATCH][v2] c/100547 - reject overly large vector_size attributes

2021-05-17 Thread Richard Biener

This rejects a number of vector components that does not fit an 'int'
which is an internal limitation of RTVEC.  This requires adjusting
gcc.dg/attr-vector_size.c which checks for much larger
supported vectors.  Note that the RTVEC limitation is a host specific
limitation (unless we change this 'int' to int32_t), but should be
32bits in practice everywhere.
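
As a rough illustration of the two checks on the component count, here is a standalone sketch. It is not the c-attribs.c code: check_nunits and NunitsCheck are hypothetical names, std::uint64_t stands in for unsigned HOST_WIDE_INT, and a 32-bit int is assumed. A zero count is not handled here.

```cpp
#include <climits>
#include <cstdint>

// Hypothetical result type for the sketch.
enum class NunitsCheck { Ok, NotPowerOfTwo, TooLarge };

// Mirror the order of the checks in the patch: power of two first,
// then the RTVEC-imposed limit of fitting in an int.
inline NunitsCheck check_nunits(std::uint64_t nunits)
{
  if (nunits & (nunits - 1))
    return NunitsCheck::NotPowerOfTwo;
  if (nunits >= static_cast<std::uint64_t>(INT_MAX))
    return NunitsCheck::TooLarge;
  return NunitsCheck::Ok;
}
```

Because the count must be a power of two, the largest value that passes both checks with a 32-bit int is 2^30.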

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

2021-05-12  Richard Biener  

PR c/100547
gcc/c-family/
* c-attribs.c (type_valid_for_vector_size): Reject too large nunits.
Reword existing nunit diagnostic.

* gcc.dg/pr100547.c: New testcase.
* gcc.dg/attr-vector_size.c: Adjust.
---
 gcc/c-family/c-attribs.c| 16 +--
 gcc/testsuite/gcc.dg/attr-vector_size.c | 16 ---
 gcc/testsuite/gcc.dg/pr100547.c | 35 +
 3 files changed, 49 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr100547.c

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index f54388e9939..ecb32c70172 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -4245,10 +4245,22 @@ type_valid_for_vector_size (tree type, tree atname, tree args,
   if (nunits & (nunits - 1))
 {
   if (error_p)
-   error ("number of components of the vector not a power of two");
+   error ("number of vector components %wu not a power of two", nunits);
   else
warning (OPT_Wattributes,
-"number of components of the vector not a power of two");
+"number of vector components %wu not a power of two", nunits);
+  return NULL_TREE;
+}
+
+  if (nunits >= (unsigned HOST_WIDE_INT)INT_MAX)
+{
+  if (error_p)
+   error ("number of vector components %wu exceeds %d",
+  nunits, INT_MAX - 1);
+  else
+   warning (OPT_Wattributes,
+"number of vector components %wu exceeds %d",
+nunits, INT_MAX - 1);
   return NULL_TREE;
 }
 
diff --git a/gcc/testsuite/gcc.dg/attr-vector_size.c b/gcc/testsuite/gcc.dg/attr-vector_size.c
index 00be26accd5..3f2ce889121 100644
--- a/gcc/testsuite/gcc.dg/attr-vector_size.c
+++ b/gcc/testsuite/gcc.dg/attr-vector_size.c
@@ -22,14 +22,6 @@ DEFVEC (extern, 30);
 
 #if __SIZEOF_SIZE_T__ > 4
 
-DEFVEC (extern, 31);
-DEFVEC (extern, 32);
-DEFVEC (extern, 33);
-DEFVEC (extern, 34);
-DEFVEC (extern, 60);
-DEFVEC (extern, 61);
-DEFVEC (extern, 62);
-
 VEC (POW2 (63)) char v63; /* { dg-error  "'vector_size' attribute argument value '9223372036854775808' exceeds 9223372036854775807" "LP64" { target lp64 } } */
 
 #else
@@ -49,14 +41,6 @@ void test_local_scope (void)
 
 #if __SIZEOF_SIZE_T__ > 4
 
-  DEFVEC (auto, 31);
-  DEFVEC (auto, 32);
-  DEFVEC (auto, 33);
-  DEFVEC (auto, 34);
-  DEFVEC (auto, 60);
-  DEFVEC (auto, 61);
-  DEFVEC (auto, 62);
-
   VEC (POW2 (63)) char v63;   /* { dg-error  "'vector_size' attribute argument value '9223372036854775808' exceeds 9223372036854775807" "LP64" { target lp64 } } */
 
 #else
diff --git a/gcc/testsuite/gcc.dg/pr100547.c b/gcc/testsuite/gcc.dg/pr100547.c
new file mode 100644
index 000..2d3da4eb50e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr100547.c
@@ -0,0 +1,35 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O -g" } */
+
+typedef int __attribute__((vector_size(
+((8 * sizeof(short)) * sizeof(short)) *
+sizeof(short)) *
+   sizeof(short)) *
+  sizeof(short)) *
+ sizeof(short)) *
+sizeof(short)) *
+   sizeof(short)) *
+  sizeof(short)) *
+ sizeof(short)) *
+sizeof(short)) *
+   sizeof(short)) *
+  sizeof(short)) *
+ sizeof(short)) *
+sizeof(short)) *
+   sizeof(short)) *
+  sizeof(short)) *
+ sizeof(short)) *
+sizeof(short)) *
+   sizeof(short)) *
+  sizeof(short)) *
+ sizeof(short)) *
+sizeof(short)) *
+   sizeof(short)) *
+  sizeof(short)) *
+ sizeof(short)) *
+sizeof(short)) *
+   sizeof(short)) *
+  sizeof(short)) *
+ sizeof(short)) *
+sizeof(short V; /* { dg-error "number of vector components" } */
+void k() { V w = { 0 }; }
-- 
2.26.2

[PATCH] middle-end/100582 - fix array_at_struct_end_p for vector indexing

2021-05-17 Thread Richard Biener

Vector indexing leaves us with ARRAY_REFs of VIEW_CONVERT_EXPRs,
something which array_at_struct_end_p considers an array-at-struct-end
even when there's an underlying decl visible.  The following fixes
the latter.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-05-17  Richard Biener  

PR middle-end/100582
* tree.c (array_at_struct_end_p): Get to the base of the
reference before looking for the underlying decl.

* gcc.target/i386/pr100582.c: New testcase.
---
 gcc/testsuite/gcc.target/i386/pr100582.c | 16 
 gcc/tree.c   |  8 +++-
 2 files changed, 19 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100582.c

diff --git a/gcc/testsuite/gcc.target/i386/pr100582.c b/gcc/testsuite/gcc.target/i386/pr100582.c
new file mode 100644
index 000..9520fe7a197
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100582.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx2" } */
+
+typedef unsigned char v32qi __attribute__((vector_size(32)));
+
+v32qi
+f2 (v32qi x, v32qi a, v32qi b)
+{
+v32qi e;
+  for (int i = 0; i != 32; i++)
+ e[i] = x[i] ? a[i] : b[i];
+
+  return e;
+}
+
+/* { dg-final { scan-assembler-times "pblendvb" 1 } } */
diff --git a/gcc/tree.c b/gcc/tree.c
index 01eda553a65..8afba598eb5 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -12550,13 +12550,11 @@ array_at_struct_end_p (tree ref)
   || ! TYPE_MAX_VALUE (TYPE_DOMAIN (atype)))
 return true;
 
-  if (TREE_CODE (ref) == MEM_REF
-  && TREE_CODE (TREE_OPERAND (ref, 0)) == ADDR_EXPR)
-ref = TREE_OPERAND (TREE_OPERAND (ref, 0), 0);
-
   /* If the reference is based on a declared entity, the size of the array
  is constrained by its given domain.  (Do not trust commons PR/69368).  */
-  if (DECL_P (ref)
+  ref = get_base_address (ref);
+  if (ref
+  && DECL_P (ref)
   && !(flag_unconstrained_commons
   && VAR_P (ref) && DECL_COMMON (ref))
   && DECL_SIZE_UNIT (ref)
-- 
2.26.2

Re: RFA: fix gcc.dg/tree-ssa/popcount4l.c 16 bit failure, improve 64 bit popcount expansion for 32 bit target

2021-05-17 Thread Joern Wolfgang Rennecke

Attached is the updated version of the patch.
Bootstrapped and regtested on x86_64-pc-linux-gnu.

OK to apply?
Recognize popcount also when a double width operation is needed.

2021-01-27  Joern Rennecke  

gcc/
* match.pd :
When generating popcount directly fails, try doing it in two halves.
gcc/testsuite/
* gcc.dg/tree-ssa/popcount4ll.c: Remove lp64 condition.
Adjust scanning pattern for !lp64.
* gcc.dg/tree-ssa/popcount5ll.c: Likewise.
* gcc.dg/tree-ssa/popcount4l.c: Adjust scanning pattern
for ! int32plus.
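
The idea of doing a popcount in two halves can be shown outside the compiler. The sketch below is plain C++, not the match.pd pattern: popcount32 is a portable bit-trick helper standing in for a target's narrower popcount instruction, and popcount64_in_halves adds the counts of the low and high 32 bits.

```cpp
#include <cstdint>

// Portable 32-bit population count (parallel bit-sum trick).
inline unsigned popcount32(std::uint32_t x)
{
  x = x - ((x >> 1) & 0x55555555u);                  // 2-bit sums
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);  // 4-bit sums
  x = (x + (x >> 4)) & 0x0f0f0f0fu;                  // 8-bit sums
  return (x * 0x01010101u) >> 24;                    // add the four bytes
}

// 64-bit popcount built from two half-width popcounts: count the low half,
// count the high half after a right shift, and add the results.
inline unsigned popcount64_in_halves(std::uint64_t x)
{
  return popcount32(static_cast<std::uint32_t>(x))
         + popcount32(static_cast<std::uint32_t>(x >> 32));
}
```

This matches the adjusted test expectation that two .POPCOUNT calls appear for the long long tests on !lp64 targets.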

diff --git a/gcc/match.pd b/gcc/match.pd
index cdb87636951..3cfa5e761a4 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -6550,10 +6550,27 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
&& tree_to_uhwi (@3) == c2
&& tree_to_uhwi (@9) == c3
&& tree_to_uhwi (@7) == c3
-   && tree_to_uhwi (@11) == c4
-   && direct_internal_fn_supported_p (IFN_POPCOUNT, type,
-  OPTIMIZE_FOR_BOTH))
-(convert (IFN_POPCOUNT:type @0)
+   && tree_to_uhwi (@11) == c4)
+(if (direct_internal_fn_supported_p (IFN_POPCOUNT, type,
+OPTIMIZE_FOR_BOTH))
+ (convert (IFN_POPCOUNT:type @0))
+ /* Try to do popcount in two halves.  PREC must be at least
+   five bits for this to work without extension before adding.  */
+ (with {
+   tree half_type = NULL_TREE;
+   machine_mode m = mode_for_size ((prec + 1) / 2, MODE_INT, 1).require ();
+   int half_prec = GET_MODE_PRECISION (as_a  (m));
+   if (m != TYPE_MODE (type))
+half_type = build_nonstandard_integer_type (half_prec, 1);
+   gcc_assert (half_prec > 2);
+  }
+  (if (half_type != NULL_TREE
+  && direct_internal_fn_supported_p (IFN_POPCOUNT, half_type,
+ OPTIMIZE_FOR_BOTH))
+   (convert (plus
+(IFN_POPCOUNT:half_type (convert @0))
+(IFN_POPCOUNT:half_type (convert (rshift @0
+   { build_int_cst (integer_type_node, half_prec); } )))
 
 /* __builtin_ffs needs to deal on many targets with the possible zero
argument.  If we know the argument is always non-zero, __builtin_ctz + 1
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount4l.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount4l.c
index 69fb2d1134d..269e56e90f9 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/popcount4l.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount4l.c
@@ -25,6 +25,7 @@ int popcount64c(unsigned long x)
 return (x * h01) >> shift;
 }
 
-/* { dg-final { scan-tree-dump-times "\.POPCOUNT" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\.POPCOUNT" 1 "optimized" { target int32plus } } } */
+/* { dg-final { scan-tree-dump "\.POPCOUNT" "optimized" { target { ! int32plus } } } } */
 
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount4ll.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount4ll.c
index c1588be68e4..7abadf6df04 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/popcount4ll.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount4ll.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { lp64 } } } */
+/* { dg-do compile } */
 /* { dg-require-effective-target popcountll } */
 /* { dg-options "-O2 -fdump-tree-optimized" } */
 
@@ -16,4 +16,5 @@ int popcount64c(unsigned long long x)
 return (x * h01) >> shift;
 }
 
-/* { dg-final { scan-tree-dump-times "\.POPCOUNT" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\.POPCOUNT" 1 "optimized" { target { lp64 } } } } */
+/* { dg-final { scan-tree-dump-times "\.POPCOUNT" 2 "optimized" { target { ! lp64 } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount5ll.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount5ll.c
index edb191bf894..2afe08124fe 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/popcount5ll.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount5ll.c
@@ -1,5 +1,5 @@
 /* PR tree-optimization/94800 */
-/* { dg-do compile { target { lp64 } } } */
+/* { dg-do compile } */
 /* { dg-require-effective-target popcountll } */
 /* { dg-options "-O2 -fdump-tree-optimized" } */
 
@@ -19,4 +19,5 @@ int popcount64c(unsigned long long x)
 return x >> shift;
 }
 
-/* { dg-final { scan-tree-dump-times "\.POPCOUNT" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\.POPCOUNT" 1 "optimized" { target { lp64 } } } } */
+/* { dg-final { scan-tree-dump-times "\.POPCOUNT" 2 "optimized" { target { ! lp64 } } } } */

[PATCH v3 09/12] x86: Also pass -mno-avx to cold-attribute-1.c

2021-05-17 Thread H.J. Lu via Gcc-patches

Also pass -mno-avx to cold-attribute-1.c to avoid copying data with YMM
or ZMM registers.

* gcc.target/i386/cold-attribute-1.c: Also pass -mno-avx.
---
 gcc/testsuite/gcc.target/i386/cold-attribute-1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/cold-attribute-1.c b/gcc/testsuite/gcc.target/i386/cold-attribute-1.c
index 57666ac60b6..658eb3e25bb 100644
--- a/gcc/testsuite/gcc.target/i386/cold-attribute-1.c
+++ b/gcc/testsuite/gcc.target/i386/cold-attribute-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -mno-avx" } */
 #include 
 static inline
 __attribute__ ((cold)) void
-- 
2.31.1

[PATCH v3 05/12] x86: Update piecewise move and store

2021-05-17 Thread H.J. Lu via Gcc-patches

We can use TImode/OImode/XImode integers for piecewise move and store.
When a vector register is used for piecewise move and store, we don't
increase stack_alignment_needed, since piecewise move and store don't
require a vector register spill.  Since stack_realign_needed is set to
true by checking stack_alignment_estimated, which is set by pseudo vector
register usage, we also need to check stack_realign_needed to eliminate
the frame pointer.
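
For readers who want the MOVE_MAX_PIECES selection from the i386.h hunk in a more readable form, here is a sketch as an ordinary function. TargetFlags and move_max_pieces are hypothetical names; the fields merely mirror the TARGET_* macros used in the patch and are not GCC's interface.

```cpp
// Hypothetical model of the target flags consulted by the macro.
struct TargetFlags {
  bool avx512f = false;
  bool prefer_avx256 = false;
  bool avx = false;
  bool prefer_avx128 = false;
  bool avx256_split_unaligned_load = false;
  bool avx256_split_unaligned_store = false;
  bool sse2 = false;
  bool sse_unaligned_load_optimal = false;
  bool sse_unaligned_store_optimal = false;
  unsigned units_per_word = 8;  // models UNITS_PER_WORD
};

// Pick the widest piece size, in bytes, following the same cascade as
// the MOVE_MAX_PIECES definition in the patch: ZMM, then YMM, then XMM,
// then a plain word.
inline unsigned move_max_pieces(const TargetFlags& t)
{
  if (t.avx512f && !t.prefer_avx256)
    return 64;
  if (t.avx && !t.prefer_avx128
      && !t.avx256_split_unaligned_load
      && !t.avx256_split_unaligned_store)
    return 32;
  if (t.sse2 && t.sse_unaligned_load_optimal
      && t.sse_unaligned_store_optimal)
    return 16;
  return t.units_per_word;
}
```

Note that an AVX512F target that prefers 256-bit vectors falls through to the 32-byte case rather than straight to 16 bytes.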

gcc/

* config/i386/i386.c (ix86_finalize_stack_frame_flags): Also
check stack_realign_needed for stack realignment.
(ix86_legitimate_constant_p): Always allow CONST_WIDE_INT smaller
than the largest integer supported by vector register.
* config/i386/i386.h (MOVE_MAX): Set to 64.
(MOVE_MAX_PIECES): Set to bytes of the largest integer supported
by vector register.
(STORE_MAX_PIECES): New.

gcc/testsuite/

* gcc.target/i386/pr90773-1.c: Adjust to expect movq for 32-bit.
* gcc.target/i386/pr90773-4.c: Also run for 32-bit.
* gcc.target/i386/pr90773-14.c: Likewise.
* gcc.target/i386/pr90773-15.c: Likewise.
* gcc.target/i386/pr90773-16.c: Likewise.
* gcc.target/i386/pr90773-17.c: Likewise.
---
 gcc/config/i386/i386.c | 21 ---
 gcc/config/i386/i386.h | 31 +-
 gcc/testsuite/gcc.target/i386/pr90773-1.c  | 10 +++
 gcc/testsuite/gcc.target/i386/pr90773-14.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-15.c |  6 ++---
 gcc/testsuite/gcc.target/i386/pr90773-16.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-17.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-4.c  |  2 +-
 8 files changed, 53 insertions(+), 23 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 8b9b2346478..b5c1436464f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -7943,8 +7943,17 @@ ix86_finalize_stack_frame_flags (void)
  assumed stack realignment might be needed or -fno-omit-frame-pointer
  is used, but in the end nothing that needed the stack alignment had
  been spilled nor stack access, clear frame_pointer_needed and say we
- don't need stack realignment.  */
-  if ((stack_realign || (!flag_omit_frame_pointer && optimize))
+ don't need stack realignment.
+
+ When vector register is used for piecewise move and store, we don't
+ increase stack_alignment_needed as there is no register spill for
+ piecewise move and store.  Since stack_realign_needed is set to true
+ by checking stack_alignment_estimated which is updated by pseudo
+ vector register usage, we also need to check stack_realign_needed to
+ eliminate frame pointer.  */
+  if ((stack_realign
+   || (!flag_omit_frame_pointer && optimize)
+   || crtl->stack_realign_needed)
   && frame_pointer_needed
   && crtl->is_leaf
   && crtl->sp_is_unchanging
@@ -10403,7 +10412,13 @@ ix86_legitimate_constant_p (machine_mode mode, rtx x)
  /* FALLTHRU */
case E_OImode:
case E_XImode:
- if (!standard_sse_constant_p (x, mode))
+ if (!standard_sse_constant_p (x, mode)
+ && GET_MODE_SIZE (TARGET_AVX512F
+   ? XImode
+   : (TARGET_AVX
+  ? OImode
+  : (TARGET_SSE2
+ ? TImode : DImode))) < GET_MODE_SIZE 
(mode))
return false;
default:
  break;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 45d86802c51..677afbf7031 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1754,7 +1754,7 @@ typedef struct ix86_args {
 
 /* Max number of bytes we can move from memory to memory
in one reasonably fast instruction.  */
-#define MOVE_MAX 16
+#define MOVE_MAX 64
 
 /* MOVE_MAX_PIECES is the number of bytes at a time which we can
move efficiently, as opposed to  MOVE_MAX which is the maximum
@@ -1765,11 +1765,30 @@ typedef struct ix86_args {
widest mode with MAX_FIXED_MODE_SIZE, we can only use TImode in
64-bit mode.  */
 #define MOVE_MAX_PIECES \
-  ((TARGET_64BIT \
-&& TARGET_SSE2 \
-&& TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
-&& TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
-   ? GET_MODE_SIZE (TImode) : UNITS_PER_WORD)
+  ((TARGET_AVX512F && !TARGET_PREFER_AVX256) \
+   ? 64 \
+   : ((TARGET_AVX \
+   && !TARGET_PREFER_AVX128 \
+   && !TARGET_AVX256_SPLIT_UNALIGNED_LOAD \
+   && !TARGET_AVX256_SPLIT_UNALIGNED_STORE) \
+  ? 32 \
+  : ((TARGET_SSE2 \
+ && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
+ && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
+? 16 : UNITS_PER_WORD)))
+
+/* STORE_MAX_PIECES is the number of bytes at a time that we can
+   store efficiently.  */
+#define STORE_MAX_PIECES \
+  ((TARGET_AVX512F && !TARGET_PREFER_AVX256) \
+   ? 64 \
+   : ((TARGET_AVX \
+   &&

[PATCH v3 12/12] constructor: Check if it is faster to load constant from memory

2021-05-17 Thread H.J. Lu via Gcc-patches

When expanding a constant constructor, don't call expand_constructor if
it is more efficient to load the data from the memory via move by pieces.
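
The condition added to expand_expr_real_1 can be restated as a small predicate. This is only a sketch of the heuristic as it appears in the hunk: prefer_load_from_memory is a hypothetical name, and the result of can_move_by_pieces is passed in as a bool rather than computed.

```cpp
// Return true when the constant should be loaded from memory with move by
// pieces instead of being expanded element by element.
//   bytes              size of the object (TYPE_SIZE_UNIT)
//   units_per_word     models UNITS_PER_WORD
//   move_max_pieces    models MOVE_MAX_PIECES
//   can_move_by_pieces models the answer of can_move_by_pieces ()
inline bool prefer_load_from_memory(unsigned long long bytes,
                                    unsigned units_per_word,
                                    unsigned move_max_pieces,
                                    bool can_move_by_pieces)
{
  return (bytes / units_per_word) > 2      // more than two words of data
         && move_max_pieces > units_per_word  // wider-than-word moves exist
         && can_move_by_pieces;
}
```

For a 64-byte object with 8-byte words and 16-byte pieces, the predicate holds, which is in line with the four movups stores expected by pr90773-24.c, assuming struct S there occupies 64 bytes.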

gcc/

PR middle-end/90773
* expr.c (expand_expr_real_1): Don't call expand_constructor if
it is more efficient to load the data from the memory.

gcc/testsuite/

PR middle-end/90773
* gcc.target/i386/pr90773-24.c: New test.
* gcc.target/i386/pr90773-25.c: Likewise.
---
 gcc/expr.c | 10 ++
 gcc/testsuite/gcc.target/i386/pr90773-24.c | 22 ++
 gcc/testsuite/gcc.target/i386/pr90773-25.c | 20 
 3 files changed, 52 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-24.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-25.c

diff --git a/gcc/expr.c b/gcc/expr.c
index d09ee42e262..80e01ea1cbe 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -10886,6 +10886,16 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
unsigned HOST_WIDE_INT ix;
tree field, value;
 
+   /* Check if it is more efficient to load the data from
+  the memory directly.  FIXME: How many stores do we
+  need here if not moved by pieces?  */
+   unsigned HOST_WIDE_INT bytes
+ = tree_to_uhwi (TYPE_SIZE_UNIT (type));
+   if ((bytes / UNITS_PER_WORD) > 2
+   && MOVE_MAX_PIECES > UNITS_PER_WORD
+   && can_move_by_pieces (bytes, TYPE_ALIGN (type)))
+ goto normal_inner_ref;
+
FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (init), ix,
  field, value)
  if (tree_int_cst_equal (field, index))
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-24.c b/gcc/testsuite/gcc.target/i386/pr90773-24.c
new file mode 100644
index 000..4a4b62533dc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-24.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64" } */
+
+struct S
+{
+  long long s1 __attribute__ ((aligned (8)));
+  unsigned s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14;
+};
+
+const struct S array[] = {
+  { 0, 60, 640, 2112543726, 39682, 48, 16, 33, 10, 96, 2, 0, 0, 4 }
+};
+
+void
+foo (struct S *x)
+{
+  x[0] = array[0];
+}
+/* { dg-final { scan-assembler-times "movups\[\\t \]%xmm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movups\[\\t \]%xmm\[0-9\]+, 16\\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movups\[\\t \]%xmm\[0-9\]+, 32\\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movups\[\\t \]%xmm\[0-9\]+, 48\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-25.c b/gcc/testsuite/gcc.target/i386/pr90773-25.c
new file mode 100644
index 000..2520b670989
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-25.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+struct S
+{
+  long long s1 __attribute__ ((aligned (8)));
+  unsigned s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14;
+};
+
+const struct S array[] = {
+  { 0, 60, 640, 2112543726, 39682, 48, 16, 33, 10, 96, 2, 0, 0, 4 }
+};
+
+void
+foo (struct S *x)
+{
+  x[0] = array[0];
+}
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, 32\\(%\[\^,\]+\\)" 1 } } */
-- 
2.31.1

[PATCH v3 07/12] x86: Add tests for piecewise move and store

2021-05-17 Thread H.J. Lu via Gcc-patches

* gcc.target/i386/pieces-memcpy-10.c: New test.
* gcc.target/i386/pieces-memcpy-11.c: Likewise.
* gcc.target/i386/pieces-memcpy-12.c: Likewise.
* gcc.target/i386/pieces-memcpy-13.c: Likewise.
* gcc.target/i386/pieces-memcpy-14.c: Likewise.
* gcc.target/i386/pieces-memcpy-15.c: Likewise.
* gcc.target/i386/pieces-memcpy-16.c: Likewise.
* gcc.target/i386/pieces-memcpy-17.c: Likewise.
* gcc.target/i386/pieces-memcpy-18.c: Likewise.
* gcc.target/i386/pieces-memcpy-19.c: Likewise.
* gcc.target/i386/pieces-memset-1.c: Likewise.
* gcc.target/i386/pieces-memset-2.c: Likewise.
* gcc.target/i386/pieces-memset-3.c: Likewise.
* gcc.target/i386/pieces-memset-4.c: Likewise.
* gcc.target/i386/pieces-memset-5.c: Likewise.
* gcc.target/i386/pieces-memset-6.c: Likewise.
* gcc.target/i386/pieces-memset-7.c: Likewise.
* gcc.target/i386/pieces-memset-8.c: Likewise.
* gcc.target/i386/pieces-memset-9.c: Likewise.
* gcc.target/i386/pieces-memset-10.c: Likewise.
* gcc.target/i386/pieces-memset-11.c: Likewise.
* gcc.target/i386/pieces-memset-12.c: Likewise.
* gcc.target/i386/pieces-memset-13.c: Likewise.
* gcc.target/i386/pieces-memset-14.c: Likewise.
* gcc.target/i386/pieces-memset-15.c: Likewise.
* gcc.target/i386/pieces-memset-16.c: Likewise.
* gcc.target/i386/pieces-memset-17.c: Likewise.
* gcc.target/i386/pieces-memset-18.c: Likewise.
* gcc.target/i386/pieces-memset-19.c: Likewise.
* gcc.target/i386/pieces-memset-20.c: Likewise.
* gcc.target/i386/pieces-memset-21.c: Likewise.
* gcc.target/i386/pieces-memset-22.c: Likewise.
* gcc.target/i386/pieces-memset-23.c: Likewise.
* gcc.target/i386/pieces-memset-24.c: Likewise.
* gcc.target/i386/pieces-memset-25.c: Likewise.
* gcc.target/i386/pieces-memset-26.c: Likewise.
* gcc.target/i386/pieces-memset-27.c: Likewise.
* gcc.target/i386/pieces-memset-28.c: Likewise.
* gcc.target/i386/pieces-memset-29.c: Likewise.
* gcc.target/i386/pieces-memset-30.c: Likewise.
* gcc.target/i386/pieces-memset-31.c: Likewise.
* gcc.target/i386/pieces-memset-32.c: Likewise.
* gcc.target/i386/pieces-memset-33.c: Likewise.
* gcc.target/i386/pieces-memset-34.c: Likewise.
* gcc.target/i386/pieces-memset-35.c: Likewise.
* gcc.target/i386/pieces-memset-36.c: Likewise.
* gcc.target/i386/pieces-memset-37.c: Likewise.
* gcc.target/i386/pieces-memset-38.c: Likewise.
* gcc.target/i386/pieces-memset-39.c: Likewise.
* gcc.target/i386/pieces-memset-40.c: Likewise.
* gcc.target/i386/pieces-memset-41.c: Likewise.
* gcc.target/i386/pieces-memset-42.c: Likewise.
* gcc.target/i386/pieces-memset-43.c: Likewise.
* gcc.target/i386/pieces-memset-44.c: Likewise.
---
 .../gcc.target/i386/pieces-memcpy-10.c | 16 
 .../gcc.target/i386/pieces-memcpy-11.c | 17 +
 .../gcc.target/i386/pieces-memcpy-12.c | 16 
 .../gcc.target/i386/pieces-memcpy-13.c | 16 
 .../gcc.target/i386/pieces-memcpy-14.c | 17 +
 .../gcc.target/i386/pieces-memcpy-15.c | 16 
 .../gcc.target/i386/pieces-memcpy-16.c | 16 
 .../gcc.target/i386/pieces-memcpy-7.c  | 15 +++
 .../gcc.target/i386/pieces-memcpy-8.c  | 14 ++
 .../gcc.target/i386/pieces-memcpy-9.c  | 14 ++
 .../gcc.target/i386/pieces-memset-1.c  | 16 
 .../gcc.target/i386/pieces-memset-10.c | 16 
 .../gcc.target/i386/pieces-memset-11.c | 16 
 .../gcc.target/i386/pieces-memset-12.c | 16 
 .../gcc.target/i386/pieces-memset-13.c | 16 
 .../gcc.target/i386/pieces-memset-14.c | 16 
 .../gcc.target/i386/pieces-memset-15.c | 16 
 .../gcc.target/i386/pieces-memset-16.c | 16 
 .../gcc.target/i386/pieces-memset-17.c | 16 
 .../gcc.target/i386/pieces-memset-18.c | 16 
 .../gcc.target/i386/pieces-memset-19.c | 17 +
 .../gcc.target/i386/pieces-memset-2.c  | 12 
 .../gcc.target/i386/pieces-memset-20.c | 17 +
 .../gcc.target/i386/pieces-memset-21.c | 17 +
 .../gcc.target/i386/pieces-memset-22.c | 17 +
 .../gcc.target/i386/pieces-memset-23.c | 17 +
 .../gcc.target/i386/pieces-memset-24.c | 17 +
 .../gcc.target/i386/pieces-memset-25.c | 17 +

[PATCH v3 06/12] x86: Add AVX2 tests for PR middle-end/90773

2021-05-17 Thread H.J. Lu via Gcc-patches

PR middle-end/90773
* gcc.target/i386/pr90773-20.c: New test.
* gcc.target/i386/pr90773-21.c: Likewise.
* gcc.target/i386/pr90773-22.c: Likewise.
* gcc.target/i386/pr90773-23.c: Likewise.
---
 gcc/testsuite/gcc.target/i386/pr90773-20.c | 13 +
 gcc/testsuite/gcc.target/i386/pr90773-21.c | 13 +
 gcc/testsuite/gcc.target/i386/pr90773-22.c | 13 +
 gcc/testsuite/gcc.target/i386/pr90773-23.c | 13 +
 4 files changed, 52 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-20.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-21.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-22.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-23.c

diff --git a/gcc/testsuite/gcc.target/i386/pr90773-20.c b/gcc/testsuite/gcc.target/i386/pr90773-20.c
new file mode 100644
index 000..e61e405f2b6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-20.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+extern char *dst;
+
+void
+foo (int c)
+{
+  __builtin_memset (dst, c, 33);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movb\[\\t \]+.+, 32\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-21.c b/gcc/testsuite/gcc.target/i386/pr90773-21.c
new file mode 100644
index 000..16ad17f3cbb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-21.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+extern char *dst;
+
+void
+foo (int c)
+{
+  __builtin_memset (dst, c, 34);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movw\[\\t \]%.*, 32\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-22.c b/gcc/testsuite/gcc.target/i386/pr90773-22.c
new file mode 100644
index 000..45a8ff65a84
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-22.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 0, 33);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movb\[\\t \]+.+, 32\\(%\[\^,\]+\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr90773-23.c b/gcc/testsuite/gcc.target/i386/pr90773-23.c
new file mode 100644
index 000..9256ce10ff0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90773-23.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=skylake" } */
+
+extern char *dst;
+
+void
+foo (void)
+{
+  __builtin_memset (dst, 0, 34);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[\\t \]%ymm\[0-9\]+, \\(%\[\^,\]+\\)" 1 } } */
+/* { dg-final { scan-assembler-times "movw\[\\t \]+.+, 32\\(%\[\^,\]+\\)" 1 } } */
-- 
2.31.1

[PATCH v3 11/12] x86: Update gcc.target/i386/incoming-11.c

2021-05-17 Thread H.J. Lu via Gcc-patches

Expect no stack realignment, since we no longer realign the stack when
copying data.

* gcc.target/i386/incoming-11.c: Expect no stack realignment.
---
 gcc/testsuite/gcc.target/i386/incoming-11.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/incoming-11.c b/gcc/testsuite/gcc.target/i386/incoming-11.c
index a830c96f7d1..4b822684b88 100644
--- a/gcc/testsuite/gcc.target/i386/incoming-11.c
+++ b/gcc/testsuite/gcc.target/i386/incoming-11.c
@@ -15,4 +15,4 @@ void f()
for (i = 0; i < 100; i++) q[i] = 1;
 }
 
-/* { dg-final { scan-assembler "andl\[\\t \]*\\$-16,\[\\t \]*%esp" } } */
+/* { dg-final { scan-assembler-not "andl\[\\t \]*\\$-16,\[\\t \]*%esp" } } */
-- 
2.31.1

[PATCH v3 10/12] x86: Also pass -mno-avx to sw-1.c for ia32

2021-05-17 Thread H.J. Lu via Gcc-patches

Also pass -mno-avx to sw-1.c for ia32, since copying data with YMM or ZMM
registers disables shrink-wrapping when the second argument is passed on
the stack.

* gcc.target/i386/sw-1.c: Also pass -mno-avx for ia32.
---
 gcc/testsuite/gcc.target/i386/sw-1.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.target/i386/sw-1.c b/gcc/testsuite/gcc.target/i386/sw-1.c
index aec095eda62..a9c89fca4ec 100644
--- a/gcc/testsuite/gcc.target/i386/sw-1.c
+++ b/gcc/testsuite/gcc.target/i386/sw-1.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mtune=generic -fshrink-wrap -fdump-rtl-pro_and_epilogue" } */
+/* { dg-additional-options "-mno-avx" { target ia32 } } */
 /* { dg-skip-if "No shrink-wrapping preformed" { x86_64-*-mingw* } } */
 
 #include 
-- 
2.31.1

[PATCH v3 08/12] x86: Also pass -mno-avx to pr72839.c

2021-05-17 Thread H.J. Lu via Gcc-patches

Also pass -mno-avx to pr72839.c to avoid copying data with YMM or ZMM
registers.

* gcc.target/i386/pr72839.c: Also pass -mno-avx.
---
 gcc/testsuite/gcc.target/i386/pr72839.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr72839.c b/gcc/testsuite/gcc.target/i386/pr72839.c
index ea724f70377..6888d9d0a55 100644
--- a/gcc/testsuite/gcc.target/i386/pr72839.c
+++ b/gcc/testsuite/gcc.target/i386/pr72839.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target ia32 } */
-/* { dg-options "-O2 -mtune=lakemont" } */
+/* { dg-options "-O2 -mtune=lakemont -mno-avx" } */
 
 extern char *strcpy (char *, const char *);
 
-- 
2.31.1

[PATCH v3 02/12] x86: Add TARGET_READ_MEMSET_VALUE/TARGET_GEN_MEMSET_VALUE

2021-05-17 Thread H.J. Lu via Gcc-patches

1. Make ix86_expand_vector_init_duplicate global to duplicate a QImode
value to a TImode/OImode/XImode value.
2. Make ix86_minimum_incoming_stack_boundary global and add an argument
to ignore stack_alignment_estimated.
3. Define SCRATCH_SSE_REG as a scratch register for ix86_gen_memset_value.
4. Add TARGET_READ_MEMSET_VALUE and TARGET_GEN_MEMSET_VALUE to support
target instructions to duplicate a QImode value to a TImode/OImode/XImode
value for memset.
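
As a rough illustration of item 4, the sketch below duplicates one byte
across a 16-byte (TImode-sized) value using SSE2 intrinsics.  It is not the
code in ix86_gen_memset_value: the function name broadcast_byte_16, the
choice of intrinsics, and the memset fallback for builds without SSE2 are
all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined (__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper, not part of the patch: fill the 16-byte buffer
   OUT with copies of VALUE.  */
void
broadcast_byte_16 (uint8_t value, uint8_t out[16])
{
  assert (out != NULL);
#if defined (__SSE2__)
  /* Put VALUE in byte 0 of a vector register, the rest is zero.  */
  __m128i v = _mm_cvtsi32_si128 (value);
  /* Interleave the register with itself: bytes 0-1 now hold VALUE.  */
  v = _mm_unpacklo_epi8 (v, v);
  /* Interleave 16-bit words: bytes 0-3 now hold VALUE.  */
  v = _mm_unpacklo_epi16 (v, v);
  /* Copy the low 32-bit element to all four elements.  */
  v = _mm_shuffle_epi32 (v, 0);
  /* Unaligned 16-byte store.  */
  _mm_storeu_si128 ((__m128i *) out, v);
#else
  /* Assumed fallback so the sketch also builds on non-SSE2 hosts.  */
  memset (out, value, 16);
#endif
}
```

Calling broadcast_byte_16 (0x5a, buf) on a 16-byte buffer should leave every
byte of buf equal to 0x5a on either path.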

gcc/

PR middle-end/90773
* config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
Make it global.
* config/i386/i386-protos.h (ix86_minimum_incoming_stack_boundary):
New.
(ix86_expand_vector_init_duplicate): Likewise.
* config/i386/i386.c (ix86_minimum_incoming_stack_boundary): Add
an argument to ignore stack_alignment_estimated.  It is passed
as false by default.  Make it global.
(ix86_gen_memset_value_from_prev): New function.
(ix86_gen_memset_value): Likewise.
(ix86_read_memset_value): Likewise.
(TARGET_GEN_MEMSET_VALUE): New.
(TARGET_READ_MEMSET_VALUE): Likewise.
* config/i386/i386.h (SCRATCH_SSE_REG): New.

gcc/testsuite/

PR middle-end/90773
* gcc.target/i386/pr90773-15.c: New test.
* gcc.target/i386/pr90773-16.c: Likewise.
* gcc.target/i386/pr90773-17.c: Likewise.
* gcc.target/i386/pr90773-18.c: Likewise.
* gcc.target/i386/pr90773-19.c: Likewise.
---
 gcc/config/i386/i386-expand.c  |   2 +-
 gcc/config/i386/i386-protos.h  |   5 +
 gcc/config/i386/i386.c | 268 -
 gcc/config/i386/i386.h |   4 +
 gcc/testsuite/gcc.target/i386/pr90773-15.c |  14 ++
 gcc/testsuite/gcc.target/i386/pr90773-16.c |  14 ++
 gcc/testsuite/gcc.target/i386/pr90773-17.c |  14 ++
 gcc/testsuite/gcc.target/i386/pr90773-18.c |  15 ++
 gcc/testsuite/gcc.target/i386/pr90773-19.c |  14 ++
 9 files changed, 345 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-15.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-16.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-17.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-18.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-19.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 0fa8d45a684..485825b3c15 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -13648,7 +13648,7 @@ static bool expand_vec_perm_1 (struct expand_vec_perm_d *d);
 /* A subroutine of ix86_expand_vector_init.  Store into TARGET a vector
with all elements equal to VAR.  Return true if successful.  */
 
-static bool
+bool
 ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode,
   rtx target, rtx val)
 {
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 7782cf1163f..c4896c2da74 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -50,6 +50,9 @@ extern void ix86_reset_previous_fndecl (void);
 
 extern bool ix86_using_red_zone (void);
 
+extern unsigned int ix86_minimum_incoming_stack_boundary (bool,
+ bool = false);
+
 extern unsigned int ix86_regmode_natural_size (machine_mode);
 #ifdef RTX_CODE
 extern int standard_80387_constant_p (rtx);
@@ -257,6 +260,8 @@ extern void ix86_expand_mul_widen_hilo (rtx, rtx, rtx, bool, bool);
 extern void ix86_expand_sse2_mulv4si3 (rtx, rtx, rtx);
 extern void ix86_expand_sse2_mulvxdi3 (rtx, rtx, rtx);
 extern void ix86_expand_sse2_abs (rtx, rtx);
+extern bool ix86_expand_vector_init_duplicate (bool, machine_mode, rtx,
+  rtx);
 
 /* In i386-c.c  */
 extern void ix86_target_macros (void);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 6a1f5746089..8b9b2346478 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -415,7 +415,6 @@ static unsigned int split_stack_prologue_scratch_regno (void);
 static bool i386_asm_output_addr_const_extra (FILE *, rtx);
 
 static bool ix86_can_inline_p (tree, tree);
-static unsigned int ix86_minimum_incoming_stack_boundary (bool);
 
 
 /* Whether -mtune= or -march= were specified */
@@ -7232,8 +7231,9 @@ find_drap_reg (void)
 
 /* Return minimum incoming stack alignment.  */
 
-static unsigned int
-ix86_minimum_incoming_stack_boundary (bool sibcall)
+unsigned int
+ix86_minimum_incoming_stack_boundary (bool sibcall,
+ bool ignore_estimated)
 {
   unsigned int incoming_stack_boundary;
 
@@ -7248,7 +7248,8 @@ ix86_minimum_incoming_stack_boundary (bool sibcall)
  estimated stack alignment is 128bit.  */
   else if (!sibcall
   && ix86_force_align_arg_pointer
-  && crtl->stack_alignment_estimated == 128)
+  && (ignore_estimated
+

[PATCH v3 03/12] x86: Avoid stack realignment when copying data

2021-05-17 Thread H.J. Lu via Gcc-patches

To avoid stack realignment, use SCRATCH_SSE_REG to copy data from one
memory location to another.

gcc/

* config/i386/i386-expand.c (ix86_expand_vector_move): Use
SCRATCH_SSE_REG to copy data from one memory location to
another.

gcc/testsuite/

* gcc.target/i386/eh_return-1.c: New test.
---
 gcc/config/i386/i386-expand.c   | 16 -
 gcc/testsuite/gcc.target/i386/eh_return-1.c | 26 +
 2 files changed, 41 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/eh_return-1.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 485825b3c15..f799678b273 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -431,7 +431,21 @@ ix86_expand_vector_move (machine_mode mode, rtx operands[])
   && !register_operand (op0, mode)
   && !register_operand (op1, mode))
 {
-  emit_move_insn (op0, force_reg (GET_MODE (op0), op1));
+  rtx tmp;
+  mode = GET_MODE (op0);
+  if (TARGET_SSE
+ && (GET_MODE_ALIGNMENT (mode)
+ > ix86_minimum_incoming_stack_boundary (false, true)))
+   {
+ /* NB: Don't increase stack alignment requirement by using
+a scratch SSE register to copy data from one memory
+location to another since it doesn't require a spill.  */
+ tmp = gen_rtx_REG (mode, SCRATCH_SSE_REG);
+ emit_move_insn (tmp, op1);
+   }
+  else
+   tmp = force_reg (mode, op1);
+  emit_move_insn (op0, tmp);
   return;
 }
 
diff --git a/gcc/testsuite/gcc.target/i386/eh_return-1.c b/gcc/testsuite/gcc.target/i386/eh_return-1.c
new file mode 100644
index 000..671ba635e88
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/eh_return-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell -mno-avx512f" } */
+
+struct _Unwind_Context
+{
+  void *ra;
+  char array[48];
+};
+
+extern long uw_install_context_1 (struct _Unwind_Context *);
+
+void
+_Unwind_RaiseException (void)
+{
+  struct _Unwind_Context this_context, cur_context;
+  long offset = uw_install_context_1 (_context);
+  __builtin_memcpy (_context, _context,
+   sizeof (struct _Unwind_Context));
+  void *handler = __builtin_frob_return_addr ((_context)->ra);
+  uw_install_context_1 (_context);
+  __builtin_eh_return (offset, handler);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 4 } } */
+/* No need to dynamically realign the stack here.  */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
-- 
2.31.1

[PATCH v3 01/12] Add TARGET_READ_MEMSET_VALUE/TARGET_GEN_MEMSET_VALUE

2021-05-17 Thread H.J. Lu via Gcc-patches

Add TARGET_READ_MEMSET_VALUE and TARGET_GEN_MEMSET_VALUE to support
target instructions to duplicate a QImode value to a TImode/OImode/XImode
value for memset.
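
For orientation, the generic code that this patch moves out of
builtin_memset_gen_str builds a coefficient made of 0x01 bytes and
multiplies the byte value by it.  A minimal scalar sketch of that idea,
limited to 64 bits, is shown below; the function name broadcast_byte and
the 64-bit limit are assumptions for the example, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: return an integer whose
   low SIZE bytes are all equal to VALUE (1 <= SIZE <= 8).  */
uint64_t
broadcast_byte (uint8_t value, size_t size)
{
  assert (size >= 1 && size <= sizeof (uint64_t));

  /* Build the coefficient 0x01, 0x0101, 0x010101, ... with SIZE bytes.  */
  uint64_t coeff = 0;
  for (size_t i = 0; i < size; i++)
    coeff = (coeff << 8) | 0x01;

  /* Each 0x01 byte of the coefficient yields one copy of VALUE.  The
     product cannot overflow because VALUE is at most 0xff.  */
  return (uint64_t) value * coeff;
}
```

For a 4-byte mode this computes 0x01010101 * value, which matches the
example given in the TARGET_GEN_MEMSET_VALUE documentation added by the
patch.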

PR middle-end/90773
* builtins.c (builtin_memset_read_str): Call
targetm.read_memset_value.
(builtin_memset_gen_str): Call targetm.gen_memset_value.
* target.def (read_memset_value): New hook.
(gen_memset_value): Likewise.
* targhooks.c: Include "builtins.h".
(default_read_memset_value): New function.
(default_gen_memset_value): Likewise.
* targhooks.h (default_read_memset_value): New prototype.
(default_gen_memset_value): Likewise.
* doc/tm.texi.in: Add TARGET_READ_MEMSET_VALUE and
TARGET_GEN_MEMSET_VALUE hooks.
* doc/tm.texi: Regenerated.
---
 gcc/builtins.c | 47 --
 gcc/doc/tm.texi| 16 +
 gcc/doc/tm.texi.in |  4 
 gcc/target.def | 20 +
 gcc/targhooks.c| 56 ++
 gcc/targhooks.h|  4 
 6 files changed, 104 insertions(+), 43 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index e1b284846b1..f78a36478ef 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -6584,24 +6584,11 @@ expand_builtin_strncpy (tree exp, rtx target)
previous iteration.  */
 
 rtx
-builtin_memset_read_str (void *data, void *prevp,
+builtin_memset_read_str (void *data, void *prev,
 HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
 scalar_int_mode mode)
 {
-  by_pieces_prev *prev = (by_pieces_prev *) prevp;
-  if (prev != nullptr && prev->data != nullptr)
-{
-  /* Use the previous data in the same mode.  */
-  if (prev->mode == mode)
-   return prev->data;
-}
-
-  const char *c = (const char *) data;
-  char *p = XALLOCAVEC (char, GET_MODE_SIZE (mode));
-
-  memset (p, *c, GET_MODE_SIZE (mode));
-
-  return c_readstr (p, mode);
+  return targetm.read_memset_value ((const char *) data, prev, mode);
 }
 
 /* Callback routine for store_by_pieces.  Return the RTL of a register
@@ -6611,37 +6598,11 @@ builtin_memset_read_str (void *data, void *prevp,
nullptr, it has the RTL info from the previous iteration.  */
 
 static rtx
-builtin_memset_gen_str (void *data, void *prevp,
+builtin_memset_gen_str (void *data, void *prev,
HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
scalar_int_mode mode)
 {
-  rtx target, coeff;
-  size_t size;
-  char *p;
-
-  by_pieces_prev *prev = (by_pieces_prev *) prevp;
-  if (prev != nullptr && prev->data != nullptr)
-{
-  /* Use the previous data in the same mode.  */
-  if (prev->mode == mode)
-   return prev->data;
-
-  target = simplify_gen_subreg (mode, prev->data, prev->mode, 0);
-  if (target != nullptr)
-   return target;
-}
-
-  size = GET_MODE_SIZE (mode);
-  if (size == 1)
-return (rtx) data;
-
-  p = XALLOCAVEC (char, size);
-  memset (p, 1, size);
-  coeff = c_readstr (p, mode);
-
-  target = convert_to_mode (mode, (rtx) data, 1);
-  target = expand_mult (mode, target, coeff, NULL_RTX, 1);
-  return force_reg (mode, target);
+  return targetm.gen_memset_value ((rtx) data, prev, mode);
 }
 
 /* Expand expression EXP, which is a call to the memset builtin.  Return
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 85ea9395560..51385044e76 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11868,6 +11868,22 @@ This function prepares to emit a conditional comparison within a sequence
  @var{bit_code} is @code{AND} or @code{IOR}, which is the op on the compares.
 @end deftypefn
 
+@deftypefn {Target Hook} rtx TARGET_READ_MEMSET_VALUE (const char *@var{c}, void *@var{prev}, scalar_int_mode @var{mode})
+This function returns the RTL of a constant integer corresponding to
+target reading @code{GET_MODE_SIZE (@var{mode})} bytes from the string
+constant @var{c}.  If @var{prev} is not @samp{nullptr}, it contains
+the RTL information from the previous iteration.
+@end deftypefn
+
+@deftypefn {Target Hook} rtx TARGET_GEN_MEMSET_VALUE (rtx @var{data}, void *@var{prev}, scalar_int_mode @var{mode})
+This function returns the RTL of a register containing
+@code{GET_MODE_SIZE (@var{mode})} consecutive copies of the unsigned
+char value given in the RTL register @var{data}.  For example, if
+@var{mode} is 4 bytes wide, return the RTL for 0x01010101*@var{data}.
+If @var{PREV} is not @samp{nullptr}, it is the RTL information from
+the previous iteration.
+@end deftypefn
+
 @deftypefn {Target Hook} unsigned TARGET_LOOP_UNROLL_ADJUST (unsigned @var{nunroll}, class loop *@var{loop})
 This target hook returns a new value for the number of times @var{loop}
 should be unrolled. The parameter @var{nunroll} is the number of times
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index d8e3de14af1..8d4c3949fbf 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@

[PATCH v3 00/12] Allow TImode/OImode/XImode in op_by_pieces operations

2021-05-17 Thread H.J. Lu via Gcc-patches

Changes in the v3 patches:

1. Split the TARGET_READ_MEMSET_VALUE and TARGET_GEN_MEMSET_VALUE changes
into the generic part and the x86 part.


1. Add TARGET_READ_MEMSET_VALUE and TARGET_GEN_MEMSET_VALUE to support
target instructions to duplicate a QImode value to a TImode/OImode/XImode
value for memset.
2. x86: Avoid stack realignment when copying data.
3. x86: Remove MAX_BITSIZE_MODE_ANY_INT.  Only the x86 backend defines it.
4. x86: Use TImode/OImode/XImode integers for piecewise move and store.
5. x86: Add tests for TImode/OImode/XImode for piecewise move and store.
6. x86: Adjust existing tests.

On x86-64, SPEC CPU 2017 performance impact is neutral.  Glibc code size
differences with -O2 build are:

         Before   After
libc.so  1906572  1906444

Some code sequence differences in libc.so are:

:
...
jne                               | jne
test   %r15,%r15                    test   %r15,%r15
je                                | je
mov    %r13d,(%r14)                 mov    %r13d,(%r14)
lea    0x10(%r14),%rdi              lea    0x10(%r14),%rdi
mov    $0x1,%ecx                    mov    $0x1,%ecx
mov    %r13d,%edx                   mov    %r13d,%edx
mov    %r15,0x40(%r12)              mov    %r15,0x40(%r12)
mov    %r15,%rsi                    mov    %r15,%rsi
call                                call
lea    0xa2f9b(%rip),%rax #       | lea    0xa2fab(%rip),%rax #
xor    %esi,%esi                    xor    %esi,%esi
mov    %ebp,%edi                    mov    %ebp,%edi
mov    %rax,0x8(%r12)               mov    %rax,0x8(%r12)
movzwl 0x12(%rsp),%eax              movzwl 0x12(%rsp),%eax
mov    $0x8,%edx                  <
lea    0xc(%rsp),%rcx               lea    0xc(%rsp),%rcx
mov    %r14,0x48(%r12)            <
add    $0x40,%r14                 <
mov    $0x4,%r8d                    mov    $0x4,%r8d
                                  > movq   $0x0,0x1d0(%r14)
                                  > mov    $0x8,%edx
rol    $0x8,%ax                     rol    $0x8,%ax
mov    %ebp,(%r12)                | mov    %r14,0x48(%r12)
movq   $0x0,0x190(%r14)           | add    $0x40,%r14
mov    %ax,0x4(%r12)              <
mov    %r14,0x30(%r12)              mov    %r14,0x30(%r12)
                                  > mov    %ax,0x4(%r12)
                                  > mov    %ebp,(%r12)
movl   $0x1,0xc(%rsp)               movl   $0x1,0xc(%rsp)
call                                call
mov    %r12,%rdi                    mov    %r12,%rdi
movabs $0x101010101010101,%rdx    <
test   %eax,%eax                    test   %eax,%eax
mov    $0xff,%eax                   mov    $0xff,%eax
cmove  %eax,%ebx                    cmove  %eax,%ebx
movzbl %bl,%eax                   | movd   %ebx,%xmm0
mov    %ebx,0xc(%rsp)               mov    %ebx,0xc(%rsp)
mov    %rax,%rsi                  | punpcklbw %xmm0,%xmm0
imul   %rdx,%rsi                  | punpcklwd %xmm0,%xmm0
mul    %rdx                       | pshufd $0x0,%xmm0,%xmm0
add    %rsi,%rdx                  | movups %xmm0,0x50(%r12)
mov    %rax,0x50(%r12)            | movups %xmm0,0x60(%r12)
mov    %rdx,0x58(%r12)            | movups %xmm0,0x70(%r12)
mov    %rax,0x60(%r12)            | movups %xmm0,0x80(%r12)
mov    %rdx,0x68(%r12)            | movups %xmm0,0x90(%r12)
mov    %rax,0x70(%r12)            | movups %xmm0,0xa0(%r12)
mov    %rdx,0x78(%r12)            |

[PATCH v3 04/12] Remove MAX_BITSIZE_MODE_ANY_INT

2021-05-17 Thread H.J. Lu via Gcc-patches

It is only defined for i386 and everyone uses the default:

 #define MAX_BITSIZE_MODE_ANY_INT (64*BITS_PER_UNIT)

Whatever problems we had before, they have been fixed now.

* config/i386/i386-modes.def (MAX_BITSIZE_MODE_ANY_INT): Removed.
---
 gcc/config/i386/i386-modes.def | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def
index dbddfd8e48f..4e7014be034 100644
--- a/gcc/config/i386/i386-modes.def
+++ b/gcc/config/i386/i386-modes.def
@@ -107,19 +107,10 @@ INT_MODE (XI, 64);
 PARTIAL_INT_MODE (HI, 16, P2QI);
 PARTIAL_INT_MODE (SI, 32, P2HI);
 
-/* Mode used for signed overflow checking of TImode.  As
-   MAX_BITSIZE_MODE_ANY_INT is only 160, wide-int.h reserves only that
-   rounded up to multiple of HOST_BITS_PER_WIDE_INT bits in wide_int etc.,
-   so OImode is too large.  For the overflow checking we actually need
-   just 1 or 2 bits beyond TImode precision.  Use 160 bits to have
-   a multiple of 32.  */
+/* Mode used for signed overflow checking of TImode.  For the overflow
+   checking we actually need just 1 or 2 bits beyond TImode precision.
+   Use 160 bits to have a multiple of 32.  */
 PARTIAL_INT_MODE (OI, 160, POI);
 
-/* Keep the OI and XI modes from confusing the compiler into thinking
-   that these modes could actually be used for computation.  They are
-   only holders for vectors during data movement.  Include POImode precision
-   though.  */
-#define MAX_BITSIZE_MODE_ANY_INT (160)
-
 /* The symbol Pmode stands for one of the above machine modes (usually SImode).
The tm.h file specifies which one.  It is not a distinct mode.  */
-- 
2.31.1

Re: [PATCH 7/7] [og10] WIP GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION changes

2021-05-17 Thread Chung-Lin Tang


On 2021/5/11 4:57 PM, Julian Brown wrote:

This work-in-progress patch tries to get
GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION to behave more like
GOMP_MAP_ATTACH_DETACH -- in that the mapping is made to form groups
to be processed by build_struct_group/build_struct_comp_map.  I think
that's important to integrate with how groups of mappings for array
sections are handled in other cases.

This patch isn't sufficient by itself to fix a couple of broken test cases
at present (libgomp.c++/target-lambda-1.C, libgomp.c++/target-this-4.C),
though.


No, GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION is supposed to be just a variant
of GOMP_MAP_ATTACH with slightly different behavior: it tolerates an unmapped
pointer-target and assigns NULL on the device, instead of just calling
gomp_fatal() (see its handling in libgomp/target.c).

If OpenACC can have the same zero-length array section behavior, we can just
share one GOMP_MAP_ATTACH map.  For now they are treated as separate
cases.
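
To make the difference concrete, here is a small self-contained model of
the two behaviors described above.  It is only a sketch: the enum and
function names are hypothetical stand-ins, a NULL device_target stands for
"the pointer-target is not mapped", and the real logic lives in
libgomp/target.c and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two map kinds being compared.  */
enum attach_kind
{
  ATTACH,
  ATTACH_ZERO_LENGTH_ARRAY_SECTION
};

/* Possible outcomes in this model.  FATAL stands for the point where
   the runtime would call gomp_fatal.  */
enum attach_result
{
  ATTACHED,
  ASSIGNED_NULL,
  FATAL
};

struct attach_outcome
{
  enum attach_result result;
  void *device_ptr;
};

/* Model of attaching a pointer whose target maps to DEVICE_TARGET on
   the device, or to NULL if the target is not mapped.  */
struct attach_outcome
attach_pointer (enum attach_kind kind, void *device_target)
{
  assert (kind == ATTACH || kind == ATTACH_ZERO_LENGTH_ARRAY_SECTION);

  struct attach_outcome out = { FATAL, NULL };
  if (device_target != NULL)
    {
      /* Target is mapped: both kinds attach normally.  */
      out.result = ATTACHED;
      out.device_ptr = device_target;
    }
  else if (kind == ATTACH_ZERO_LENGTH_ARRAY_SECTION)
    {
      /* Unmapped target is tolerated: the device pointer becomes NULL.  */
      out.result = ASSIGNED_NULL;
    }
  /* Otherwise a plain attach with an unmapped target stays FATAL.  */
  return out;
}
```

In this model the two kinds differ only in the unmapped case, which is why
sharing a single map kind would be possible if OpenACC wanted the same
behavior.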

Chung-Lin


2021-05-11  Julian Brown  

gcc/
* gimplify.c (build_struct_comp_nodes): Add
GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION handling.
(build_struct_group): Process GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION
as part of pointer group.
(gimplify_scan_omp_clauses): Update prev_list_p such that
GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION will form part of pointer
group.
---
  gcc/gimplify.c | 16 
  1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 6d204908c82..c5cb486aa23 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -8298,7 +8298,9 @@ build_struct_comp_nodes (enum tree_code code, tree grp_start, tree grp_end,
if (grp_mid
&& OMP_CLAUSE_CODE (grp_mid) == OMP_CLAUSE_MAP
&& (OMP_CLAUSE_MAP_KIND (grp_mid) == GOMP_MAP_ALWAYS_POINTER
- || OMP_CLAUSE_MAP_KIND (grp_mid) == GOMP_MAP_ATTACH_DETACH))
+ || OMP_CLAUSE_MAP_KIND (grp_mid) == GOMP_MAP_ATTACH_DETACH
+ || (OMP_CLAUSE_MAP_KIND (grp_mid)
+ == GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION)))
  {
tree c3
= build_omp_clause (OMP_CLAUSE_LOCATION (grp_end), OMP_CLAUSE_MAP);
@@ -8774,12 +8776,14 @@ build_struct_group (struct gimplify_omp_ctx *ctx,
 ? splay_tree_lookup (ctx->variables, (splay_tree_key) decl)
 : NULL);
bool ptr = (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ALWAYS_POINTER);
-  bool attach_detach = (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH_DETACH);
+  bool attach_detach = (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH_DETACH
+   || (OMP_CLAUSE_MAP_KIND (c)
+   == GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION));
bool attach = (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH
 || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_DETACH);
bool has_attachments = false;
/* For OpenACC, pointers in structs should trigger an attach action.  */
-  if (attach_detach
+  if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH_DETACH
&& ((region_type & (ORT_ACC | ORT_TARGET | ORT_TARGET_DATA))
  || code == OMP_TARGET_ENTER_DATA
  || code == OMP_TARGET_EXIT_DATA))
@@ -9784,6 +9788,8 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
  if (!remove
  && OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_ALWAYS_POINTER
  && OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_ATTACH_DETACH
+ && (OMP_CLAUSE_MAP_KIND (c)
+ != GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION)
  && OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_TO_PSET
  && OMP_CLAUSE_CHAIN (c)
  && OMP_CLAUSE_CODE (OMP_CLAUSE_CHAIN (c)) == OMP_CLAUSE_MAP
@@ -9792,7 +9798,9 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
  || (OMP_CLAUSE_MAP_KIND (OMP_CLAUSE_CHAIN (c))
  == GOMP_MAP_ATTACH_DETACH)
  || (OMP_CLAUSE_MAP_KIND (OMP_CLAUSE_CHAIN (c))
- == GOMP_MAP_TO_PSET)))
+ == GOMP_MAP_TO_PSET)
+ || (OMP_CLAUSE_MAP_KIND (OMP_CLAUSE_CHAIN (c))
+ == GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION)))
prev_list_p = list_p;
  
  	  break;

Re: [PATCH 5/5] Mapping of components of references to pointers to structs for OpenMP/OpenACC

2021-05-17 Thread Chung-Lin Tang


Hi Julian,

On 2021/5/15 5:27 AM, Julian Brown wrote:

GCC currently raises a parse error for indirect accesses to struct
members, where the base of the access is a reference to a pointer.
This patch fixes that case.



gcc/cp/
* semantics.c (finish_omp_clauses): Handle components of references to
pointers to structs.

libgomp/
* testsuite/libgomp.oacc-c++/deep-copy-17.C: Update test.



--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -7670,7 +7670,12 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
  if ((ort == C_ORT_ACC || ort == C_ORT_OMP)
  && TREE_CODE (t) == COMPONENT_REF
  && TREE_CODE (TREE_OPERAND (t, 0)) == INDIRECT_REF)
-   t = TREE_OPERAND (TREE_OPERAND (t, 0), 0);
+   {
+ t = TREE_OPERAND (TREE_OPERAND (t, 0), 0);
+ /* References to pointers have a double indirection here.  */
+ if (TREE_CODE (t) == INDIRECT_REF)
+   t = TREE_OPERAND (t, 0);
+   }
  if (TREE_CODE (t) == COMPONENT_REF
  && ((ort & C_ORT_OMP_DECLARE_SIMD) == C_ORT_OMP
  || ort == C_ORT_ACC)


There are already many such modifications in this patch:
"[PATCH, OG10, OpenMP 5.0, committed] Remove array section base-pointer mapping semantics, and other front-end adjustments."
https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570075.html

I am in the process of taking that patch to mainline, so are you sure this is
not already handled there?


diff --git a/libgomp/testsuite/libgomp.oacc-c++/deep-copy-17.C b/libgomp/testsuite/libgomp.oacc-c++/deep-copy-17.C
index dacbb520f3d..e038e9e3802 100644
--- a/libgomp/testsuite/libgomp.oacc-c++/deep-copy-17.C
+++ b/libgomp/testsuite/libgomp.oacc-c++/deep-copy-17.C
@@ -83,7 +83,7 @@ void strrp (void)
a[0] = 8;
c[0] = 10;
e[0] = 12;
-  #pragma acc parallel copy(n->a[0:10], n->c[0:10], n->e[0:10])
+  #pragma acc parallel copy(n->a[0:10], n->b, n->c[0:10], n->d, n->e[0:10])
{
  n->a[0] = n->c[0] + n->e[0];
}


This testcase can be added.

Chung-Lin

Re: [PATCH 6/9] arm: Auto-vectorization for MVE: vcmp

2021-05-17 Thread Christophe Lyon via Gcc-patches

On Mon, 17 May 2021 at 12:35, Kyrylo Tkachov  wrote:
>
>
>
> > -Original Message-
> > From: Gcc-patches  On Behalf Of
> > Christophe Lyon via Gcc-patches
> > Sent: 05 May 2021 15:08
> > To: Andre Simoes Dias Vieira 
> > Cc: gcc Patches 
> > Subject: Re: [PATCH 6/9] arm: Auto-vectorization for MVE: vcmp
> >
> > On Tue, 4 May 2021 at 15:41, Christophe Lyon 
> > wrote:
> > >
> > > On Tue, 4 May 2021 at 13:29, Andre Vieira (lists)
> > >  wrote:
> > > >
> > > > Hi Christophe,
> > > >
> > > > On 30/04/2021 15:09, Christophe Lyon via Gcc-patches wrote:
> > > > > Since MVE has a different set of vector comparison operators from
> > > > > Neon, we have to update the expansion to take into account the new
> > > > > ones, for instance 'NE' for which MVE does not require to use 'EQ'
> > > > > with the inverted condition.
> > > > >
> > > > > Conversely, Neon supports comparisons with #0, MVE does not.
> > > > >
> > > > > For:
> > > > > typedef long int vs32 __attribute__((vector_size(16)));
> > > > > vs32 cmp_eq_vs32_reg (vs32 a, vs32 b) { return a == b; }
> > > > >
> > > > > we now generate:
> > > > > cmp_eq_vs32_reg:
> > > > >   vldr.64 d4, .L123   @ 8 [c=8 l=4]  *mve_movv4si/8
> > > > >   vldr.64 d5, .L123+8
> > > > >   vldr.64 d6, .L123+16@ 9 [c=8 l=4]  *mve_movv4si/8
> > > > >   vldr.64 d7, .L123+24
> > > > >   vcmp.i32  eq, q0, q1@ 7 [c=16 l=4]  mve_vcmpeqq_v4si
> > > > >   vpsel q0, q3, q2@ 15[c=8 l=4]  mve_vpselq_sv4si
> > > > >   bx  lr  @ 26[c=8 l=4]  *thumb2_return
> > > > > .L124:
> > > > >   .align  3
> > > > > .L123:
> > > > >   .word   0
> > > > >   .word   0
> > > > >   .word   0
> > > > >   .word   0
> > > > >   .word   1
> > > > >   .word   1
> > > > >   .word   1
> > > > >   .word   1
> > > > >
> > > > > For some reason emit_move_insn (zero, CONST0_RTX (cmp_mode))
> > produces
> > > > > a pair of vldr instead of vmov.i32, qX, #0
> > > > I think ideally we would even want:
> > > > vpte  eq, q0, q1
> > > > vmovt.i32 q0, #0
> > > > vmove.i32 q0, #1
> > > >
> > > > But we don't have a way to generate VPT blocks with multiple
> > > > instructions yet unfortunately so I guess VPSEL will have to do for now.
> > >
> > > TBH,  I looked at what LLVM generates currently ;-)
> > >
> >
> > Here is an updated version, which adds
> > && (! || flag_unsafe_math_optimizations)
> > to vcond_mask_
> >
> > This condition was not present in the neon.md version I moved to
> > vec-common.md, but since the VDQW iterator includes V2SF and V4SF,
> > it should take floating-point flags into account.
> >
>
> -  emit_insn (gen_neon_vc (code, cmp_mode, target, op0, op1));
> +case NE:
> +  if (TARGET_HAVE_MVE) {
> +   rtx vpr_p0;
>
> GNU style wants the '{' on the new line. This appears a few other times in 
> the patch.
>
> +   if (vcond_mve)
> + vpr_p0 = target;
> +   else
> + vpr_p0 = gen_reg_rtx (HImode);
> +
> +   switch (cmp_mode)
> + {
> + case E_V16QImode:
> + case E_V8HImode:
> + case E_V4SImode:
> +   emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0, force_reg 
> (cmp_mode, op1)));
> +   break;
> + case E_V8HFmode:
> + case E_V4SFmode:
> +   if (TARGET_HAVE_MVE_FLOAT)
> + emit_insn (gen_mve_vcmpq_f (code, cmp_mode, vpr_p0, op0, 
> force_reg (cmp_mode, op1)));
> +   else
> + gcc_unreachable ();
> +   break;
> + default:
> +   gcc_unreachable ();
> + }
>
> Hmm, I think we can just check GET_MODE_CLASS (cmp_mode) for MODE_VECTOR_INT 
> or MODE_VECTOR_FLOAT here rather than have this switch statement.
>
> +
> +   /* If we are not expanding a vcond, build the result here.  */
> +   if (!vcond_mve) {
> + rtx zero = gen_reg_rtx (cmp_result_mode);
> + rtx one = gen_reg_rtx (cmp_result_mode);
> + emit_move_insn (zero, CONST0_RTX (cmp_result_mode));
> + emit_move_insn (one, CONST1_RTX (cmp_result_mode));
> + emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, 
> zero, vpr_p0));
> +   }
> +  }
> +  else
>
> ...
>bool inverted = arm_expand_vector_compare (mask, GET_CODE (operands[3]),
> -operands[4], operands[5], true);
> +operands[4], operands[5], true, 
> vcond_mve);
>if (inverted)
>  std::swap (operands[1], operands[2]);
> +  if (TARGET_NEON)
>emit_insn (gen_neon_vbsl (GET_MODE (operands[0]), operands[0],
> mask, operands[1], operands[2]));
> +  else
> +{
> +  machine_mode cmp_mode = GET_MODE (operands[4]);
> +  rtx vpr_p0 = mask;
> +  rtx zero = gen_reg_rtx (cmp_mode);
> +  rtx one = gen_reg_rtx (cmp_mode);
> +  emit_move_insn (zero, CONST0_RTX (cmp_mode));
> +  emit_move_insn (one, CONST1_RTX

Re: [PATCH 4/5] Rework indirect struct handling for OpenACC/OpenMP in gimplify.c

2021-05-17 Thread Bernd Edlinger

On 5/14/21 11:26 PM, Julian Brown wrote:
> This patch reworks indirect struct handling in gimplify.c (i.e. for struct
> components mapped with "mystruct->a[0:n]", "mystruct->b", etc.), for
> both OpenACC and OpenMP.  The key observation leading to these changes
> was that component mappings of references-to-structures is already
> implemented and working, and indirect struct component handling via a
> pointer can work quite similarly.  That lets us remove some earlier,
> special-case handling for mapping indirect struct component accesses
> for OpenACC, which required the pointed-to struct to be manually mapped
> before the indirect component mapping.
> 
> With this patch, you can map struct components directly (e.g. an array
> slice "mystruct->a[0:n]") just like you can map a non-indirect struct
> component slice ("mystruct.a[0:n]"). Both references-to-pointers (with
> the former syntax) and references to structs (with the latter syntax)
> work now.
> 
> For Fortran class pointers, we no longer re-use GOMP_MAP_TO_PSET for the
> class metadata (the structure that points to the class data and vptr)
> -- it is instead treated as any other struct.
> 
> For C++, the struct handling also works for class members ("this->foo"),
> without having to explicitly map "this[:1]" first.
> 
> For OpenACC, we permit chained indirect component references
> ("mystruct->a->b[0:n]"), though only the last part of such mappings will
> trigger an attach/detach operation.  To properly use such a construct
> on the target, you must still manually map "mystruct->a[:1]" first --
> but there's no need to map "mystruct[:1]" explicitly before that.
> 
> This patch incorporates parts of Chung-Lin's patch "Recommit "Enable
> gimplify GOMP_MAP_STRUCT handling of (COMPONENT_REF (INDIRECT_REF
> ...)) map clauses"." from the og10 branch.
> 
> OK for trunk?
> 
> Thanks,
> 
> Julian
> 
> 2021-05-14  Julian Brown  
>   Chung-Lin Tang  
> 
> gcc/fortran/
>   * trans-openmp.c (gfc_trans_omp_clauses): Don't create GOMP_MAP_TO_PSET
>   mappings for class metadata, nor GOMP_MAP_POINTER mappings for
>   POINTER_TYPE_P decls.
> 
> gcc/
>   * gimplify.c (tree-hash-traits.h): Include.
>   (extract_base_bit_offset): Add BASE_IND parameter.  Handle
>   pointer-typed indirect references alongside reference-typed ones.
>   (strip_components_and_deref, aggregate_base_p): New functions.
>   (build_struct_group): Update struct_map_to_clause type.  Add pointer
>   type indirect ref handling, including chained references.  Handle
>   pointers and references to structs in OpenACC regions as well as
>   OpenMP ones.
>   (gimplify_scan_omp_clauses): Remove struct_deref_set handling.  Rework
>   pointer-type indirect structure access handling to work more like
>   the reference-typed handling.
>   * omp-low.c (scan_sharing_clauses): Handle pointer-type indirect struct
>   references, and references to pointers to structs also.
> 
> gcc/testsuite/
>   * g++.dg/goacc/member-array-acc.C: New test (XFAILed for now).
>   * g++.dg/gomp/member-array-omp.C: New test (XFAILed for now).
> 
> libgomp/
>   * testsuite/libgomp.oacc-c-c++-common/deep-copy-15.c: New test.
>   * testsuite/libgomp.oacc-c-c++-common/deep-copy-16.c: New test.
>   * testsuite/libgomp.oacc-c++/deep-copy-17.C: New test.
> ---
>  gcc/fortran/trans-openmp.c|  20 +-
>  gcc/gimplify.c| 285 ++
>  gcc/omp-low.c |  16 +-
>  gcc/testsuite/g++.dg/goacc/member-array-acc.C |  14 +
>  gcc/testsuite/g++.dg/gomp/member-array-omp.C  |  14 +
>  .../testsuite/libgomp.oacc-c++/deep-copy-17.C | 101 +++
>  .../libgomp.oacc-c-c++-common/deep-copy-15.c  |  71 +
>  .../libgomp.oacc-c-c++-common/deep-copy-16.c  | 231 ++
>  8 files changed, 612 insertions(+), 140 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/goacc/member-array-acc.C
>  create mode 100644 gcc/testsuite/g++.dg/gomp/member-array-omp.C
>  create mode 100644 libgomp/testsuite/libgomp.oacc-c++/deep-copy-17.C
>  create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-15.c
>  create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-16.c
> 
> diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
> index 5666cd68c7e..ff614ffe744 100644
> --- a/gcc/fortran/trans-openmp.c
> +++ b/gcc/fortran/trans-openmp.c
> @@ -2721,30 +2721,16 @@ gfc_trans_omp_clauses (stmtblock_t *block, 
> gfc_omp_clauses *clauses,
> tree present = gfc_omp_check_optional_argument (decl, true);
> if (openacc && n->sym->ts.type == BT_CLASS)
>   {
> -   tree type = TREE_TYPE (decl);
> if (n->sym->attr.optional)
>   sorry ("optional class parameter");
> -   if (POINTER_TYPE_P (type))
> - {
> -

[PATCH] RISC-V: Properly parse the letter 'p' in '-march'.

2021-05-17 Thread Geng Qi via Gcc-patches

gcc/ChangeLog:
* common/config/riscv/riscv-common.c
(riscv_subset_list::parsing_subset_version): Properly parse the letter
'p' in '-march'.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/arch-12.c: New.
* gcc.target/riscv/attribute-19.c: New.
---
 gcc/common/config/riscv/riscv-common.c| 64 +--
 gcc/testsuite/gcc.target/riscv/arch-12.c  |  4 ++
 gcc/testsuite/gcc.target/riscv/attribute-19.c |  4 ++
 3 files changed, 40 insertions(+), 32 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/attribute-19.c

diff --git a/gcc/common/config/riscv/riscv-common.c 
b/gcc/common/config/riscv/riscv-common.c
index 34b74e5..76f544e 100644
--- a/gcc/common/config/riscv/riscv-common.c
+++ b/gcc/common/config/riscv/riscv-common.c
@@ -518,40 +518,38 @@ riscv_subset_list::parsing_subset_version (const char 
*ext,
   unsigned version = 0;
   unsigned major = 0;
   unsigned minor = 0;
-  char np;
   *explicit_version_p = false;
 
-  for (; *p; ++p)
-{
-  if (*p == 'p')
-   {
- np = *(p + 1);
-
- if (!ISDIGIT (np))
-   {
- /* Might be beginning of `p` extension.  */
- if (std_ext_p)
-   {
- get_default_version (ext, major_version, minor_version);
- return p;
-   }
- else
-   {
- error_at (m_loc, "%<-march=%s%>: Expect number "
-   "after %<%dp%>.", m_arch, version);
- return NULL;
-   }
-   }
-
- major = version;
- major_p = false;
- version = 0;
-   }
-  else if (ISDIGIT (*p))
-   version = (version * 10) + (*p - '0');
-  else
-   break;
-}
+  if (*p == 'p')
+gcc_assert (std_ext_p);
+  else {
+for (; *p; ++p)
+  {
+   if (*p == 'p')
+ {
+   if (!ISDIGIT (*(p+1)))
+ {
+   error_at (m_loc, "%<-march=%s%>: Expect number "
+ "after %<%dp%>.", m_arch, version);
+   return NULL;
+ }
+   if (!major_p)
+ {
+   error_at (m_loc, "%<-march=%s%>: For %<%s%dp%dp?%>, version "
+ "number with more than 2 level is not supported.",
+ m_arch, ext, major, version);
+   return NULL;
+ }
+   major = version;
+   major_p = false;
+   version = 0;
+ }
+   else if (ISDIGIT (*p))
+ version = (version * 10) + (*p - '0');
+   else
+ break;
+  }
+  }
 
   if (major_p)
 major = version;
@@ -681,6 +679,8 @@ riscv_subset_list::parse_std_ext (const char *p)
 
   p = parsing_subset_version (subset, p, _version, _version,
  /* std_ext_p= */ true, _version_p);
+  if (p == NULL)
+   return NULL;
 
   add (subset, major_version, minor_version, explicit_version_p, false);
 }
diff --git a/gcc/testsuite/gcc.target/riscv/arch-12.c 
b/gcc/testsuite/gcc.target/riscv/arch-12.c
new file mode 100644
index 000..29e16c3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-12.c
@@ -0,0 +1,4 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv64im1p2p3 -mabi=lp64" } */
+int foo() {}
+/* { dg-error "'-march=rv64im1p2p3': For 'm1p2p\\?', version number with more 
than 2 level is not supported." "" { target *-*-* } 0 } */
diff --git a/gcc/testsuite/gcc.target/riscv/attribute-19.c 
b/gcc/testsuite/gcc.target/riscv/attribute-19.c
new file mode 100644
index 000..18f68d9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/attribute-19.c
@@ -0,0 +1,4 @@
+/* { dg-do compile } */
+/* { dg-options "-mriscv-attribute -march=rv64imp0p9 -mabi=lp64" } */
+int foo() {}
+/* { dg-final { scan-assembler ".attribute arch, \"rv64i2p0_m2p0_p0p9\"" } } */
-- 
2.7.4
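
As a rough illustration of the rule this patch enforces, here is a standalone sketch of parsing an optional "<major>[p<minor>]" suffix. It is not the GCC code: "version_parse" and "parse_version_suffix" are hypothetical names, and the real function also looks up default versions, which is left out here. The sketch assumes the three cases visible in the patch: a leading 'p' consumes nothing, a 'p' must be followed by a digit, and a second 'p' separator is rejected.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Result of parsing an optional "<major>[p<minor>]" version suffix.
struct version_parse
{
  bool ok;                // false when the suffix is malformed
  bool explicit_version;  // true when at least one digit was consumed
  unsigned major_version;
  unsigned minor_version;
  std::size_t consumed;   // characters consumed from the input
};

// Hypothetical standalone model of the rule; not the GCC function.
version_parse
parse_version_suffix (const std::string &s)
{
  version_parse r = { true, false, 0, 0, 0 };

  // A leading 'p' carries no version: it starts the next extension.
  if (!s.empty () && s[0] == 'p')
    return r;

  unsigned value = 0;
  bool in_major = true;
  std::size_t i = 0;
  for (; i < s.size (); ++i)
    {
      unsigned char c = static_cast<unsigned char> (s[i]);
      if (c == 'p')
        {
          bool digit_next
            = (i + 1 < s.size ()
               && std::isdigit (static_cast<unsigned char> (s[i + 1])));
          // 'p' needs a digit after it and may separate only two levels.
          if (!digit_next || !in_major)
            {
              r.ok = false;
              return r;
            }
          r.major_version = value;
          value = 0;
          in_major = false;
        }
      else if (std::isdigit (c))
        {
          value = value * 10 + (c - '0');
          r.explicit_version = true;
        }
      else
        break;
    }

  if (in_major)
    r.major_version = value;
  else
    r.minor_version = value;
  r.consumed = i;
  return r;
}
```

With the inputs from the two new tests, the text after 'm' in rv64im1p2p3 ("1p2p3") is rejected, while the text after 'm' in rv64imp0p9 ("p0p9") consumes nothing, so 'p' is then parsed as its own extension with suffix "0p9".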

[PATCH] aix: handle 64bit inodes for include directories

2021-05-17 Thread CHIGOT, CLEMENT via Gcc-patches

On AIX, stat stores inodes in 32 bits even when LARGE_FILES is used.
If the inode is larger, it returns -1 in st_ino.
Thus, in incpath.c, when comparing include directories, if several
of them have 64-bit inodes, they are treated as duplicates.

gcc/ChangeLog:
2021-05-06  Clément Chigot  

* configure.ac: Check sizeof ino_t and dev_t.
* config.in: Regenerate.
* configure: Regenerate.
* config/rs6000/aix.h (HOST_STAT_FOR_64BIT_INODES): New define.
* incpath.c (HOST_STAT_FOR_64BIT_INODES): New define.
(remove_duplicates): Use it.

libcpp/ChangeLog:
2021-05-06  Clément Chigot  

* configure.ac: Check sizeof ino_t and dev_t.
* config.in: Regenerate.
* configure: Regenerate.
* include/cpplib.h (INO_T_CPP): Change for AIX.
(DEV_T_CPP): New macro.
(struct cpp_dir): Use it.





0001-aix-handle-64bit-inodes-for-include-directories.patch
Description: 0001-aix-handle-64bit-inodes-for-include-directories.patch
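
The failure mode described above can be sketched as follows. This is a simplified model and not the incpath.c code: "dir_identity", "narrow_to_32bit" and "same_directory" are hypothetical names, and the -1 reported for an oversized inode is modelled as the all-ones 32-bit value.

```cpp
#include <cstdint>

// Hypothetical stand-in for the (st_ino, st_dev) pair used to spot
// duplicate include directories.
struct dir_identity
{
  std::uint64_t ino;
  std::uint64_t dev;
};

// Models a 32-bit st_ino field: an inode that does not fit is
// reported as -1, i.e. all ones in 32 bits.
std::uint64_t
narrow_to_32bit (std::uint64_t ino)
{
  return ino > 0xFFFFFFFFull ? 0xFFFFFFFFull : ino;
}

// Two directories are taken to be the same when inode and device match.
bool
same_directory (const dir_identity &a, const dir_identity &b)
{
  return a.ino == b.ino && a.dev == b.dev;
}
```

Two distinct directories on the same device whose inodes both exceed 32 bits compare as different with the full values, but as equal once both inodes have been narrowed.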

[PATCH] c++: Fix diagnostic for binding lvalue reference to volatile rvalue [PR 100635]

2021-05-17 Thread Jonathan Wakely via Gcc-patches

The current diagnostic assumes the reference binding fails because the
reference is non-const, but it can also fail if the rvalue is volatile.

Use the current diagnostic for non-const cases, and a modified
diagnostic otherwise.

gcc/cp/ChangeLog:

PR c++/100635
* call.c (convert_like_internal): Print different diagnostic if
the lvalue reference is const.

gcc/testsuite/ChangeLog:

* g++.dg/conversion/pr100635.C: New test.

Tested powerpc64le-linux.

OK for trunk?


commit 26624b68aebd80d0c922ee48f944124dcc8c02e2
Author: Jonathan Wakely 
Date:   Mon May 17 10:53:56 2021

c++: Fix diagnostic for binding lvalue reference to volatile rvalue [PR 
100635]

The current diagnostic assumes the reference binding fails because the
reference is non-const, but it can also fail if the rvalue is volatile.

Use the current diagnostic for non-const cases, and a modified
diagnostic otherwise.

gcc/cp/ChangeLog:

PR c++/100635
* call.c (convert_like_internal): Print different diagnostic if
the lvalue reference is const.

gcc/testsuite/ChangeLog:

* g++.dg/conversion/pr100635.C: New test.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index f07e09a36d1..1e2d1d43184 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -7900,9 +7900,13 @@ convert_like_internal (conversion *convs, tree expr, 
tree fn, int argnum,
  "type %qH to a value of type %qI",
  totype, next->type);
  }
-   else
+   else if (!CP_TYPE_CONST_P (TREE_TYPE (ref_type)))
  error_at (loc, "cannot bind non-const lvalue reference of "
"type %qH to an rvalue of type %qI", totype, 
extype);
+   else // extype is volatile
+ error_at (loc, "cannot bind lvalue reference of type "
+   "%qH to an rvalue of type %qI", totype,
+   extype);
  }
else if (!reference_compatible_p (TREE_TYPE (totype), extype))
  {
diff --git a/gcc/testsuite/g++.dg/conversion/pr100635.C 
b/gcc/testsuite/g++.dg/conversion/pr100635.C
new file mode 100644
index 000..58412152238
--- /dev/null
+++ b/gcc/testsuite/g++.dg/conversion/pr100635.C
@@ -0,0 +1,12 @@
+// PR c++/100635
+// { dg-do compile }
+// { dg-additional-options "-Wno-volatile" { target c++2a } }
+
+struct S { };
+volatile S v();
+const volatile S& svol = v(); // { dg-error "cannot bind lvalue reference of 
type 'const volatile S&' to an rvalue of type 'volatile S'" }
+
+#if __cplusplus >= 201103L
+volatile int&& declvol();
+const volatile int& voli = declvol(); // { dg-error "cannot bind lvalue 
reference of type 'const volatile int&' to an rvalue of type 'volatile int'" "" 
{ target c++11} }
+#endif
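
For readers who want to see the binding rules behind the two diagnostics without triggering an error, here is a small compile-time sketch. It is an illustration only and relies on std::is_convertible as a proxy for "this reference can be initialized from that rvalue"; the struct S mirrors the one in the new test.

```cpp
#include <type_traits>

struct S { };

// A const, non-volatile lvalue reference can bind to an rvalue.
static_assert (std::is_convertible<S, const S&>::value,
               "const S& binds to an rvalue");

// A non-const lvalue reference cannot: this is the case covered by the
// existing "non-const lvalue reference" wording.
static_assert (!std::is_convertible<S, S&>::value,
               "S& does not bind to an rvalue");

// A const volatile lvalue reference cannot either, although it is const,
// so the "non-const" wording would be misleading here.
static_assert (!std::is_convertible<volatile S, const volatile S&>::value,
               "const volatile S& does not bind to an rvalue");
static_assert (!std::is_convertible<volatile int, const volatile int&>::value,
               "const volatile int& does not bind to an rvalue");
```

The last two assertions correspond to the two declarations in pr100635.C that now get the modified diagnostic.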

[committed] libstdc++: Allow lualatex to be used for Doxygen PDF

2021-05-17 Thread Jonathan Wakely via Gcc-patches

This allows the Doxygen PDF to be built using lualatex instead of
pdflatex, which solves a problem with pdflatex running out of memory
sometimes. This is done by adding a --latex_cmd option to the
run_doxygen script, which then sets the specified command in the
generated user.cfg file used by Doxygen. The makefile is adjusted to
pass --latex_cmd=$(LATEX_CMD) to the script, so running make with
LATEX_CMD=lualatex will override the default.

Additionally, this does some refactoring of the doc/Makefile.am rules
and the run_doxygen script.

libstdc++-v3/ChangeLog:

* doc/Makefile.am: Simplify doxygen recipes and use --latex_cmd.
* doc/Makefile.in: Regenerate.
* doc/doxygen/user.cfg.in (LATEX_CMD_NAME): Add placeholder
value.
* scripts/run_doxygen (print_usage): Always print to stdout and
do not exit.
(fail): New function for exiting on error.
(parse_options): Handle --latex_cmd. Do not treat --help the
same as errors. Simplify handling of required arguments.

Tested x86_64-linux. Committed to trunk.

commit e3b6d3a887fc0df09ea742c9c5a5acbc27c11ea7
Author: Jonathan Wakely 
Date:   Fri May 14 14:19:50 2021

libstdc++: Allow lualatex to be used for Doxygen PDF

This allows the Doxygen PDF to be built using lualatex instead of
pdflatex, which solves a problem with pdflatex running out of memory
sometimes. This is done by adding a --latex_cmd option to the
run_doxygen script, which then sets the specified command in the
generated user.cfg file used by Doxygen. The makefile is adjusted to
pass --latex_cmd=$(LATEX_CMD) to the script, so running make with
LATEX_CMD=lualatex will override the default.

Additionally, this does some refactoring of the doc/Makefile.am rules
and the run_doxygen script.

libstdc++-v3/ChangeLog:

* doc/Makefile.am: Simplify doxygen recipes and use --latex_cmd.
* doc/Makefile.in: Regenerate.
* doc/doxygen/user.cfg.in (LATEX_CMD_NAME): Add placeholder
value.
* scripts/run_doxygen (print_usage): Always print to stdout and
do not exit.
(fail): New function for exiting on error.
(parse_options): Handle --latex_cmd. Do not treat --help the
same as errors. Simplify handling of required arguments.

diff --git a/libstdc++-v3/doc/Makefile.am b/libstdc++-v3/doc/Makefile.am
index 2f8bb0770f3..487e8621b23 100644
--- a/libstdc++-v3/doc/Makefile.am
+++ b/libstdc++-v3/doc/Makefile.am
@@ -226,10 +226,10 @@ ${doxygen_outdir}/man:
mkdir -p ${doxygen_outdir}/man
 
 stamp-xml-doxygen: ${doxygen_outdir}/xml
-   -(srcdir=`cd ${top_srcdir}; ${PWD_COMMAND}`; \
- builddir=`cd ..; ${PWD_COMMAND}`; \
+   @builddir=`cd ..; ${PWD_COMMAND}`; \
  ${SHELL} ${doxygen_script} \
- --host_alias=${host_alias} --mode=xml $${srcdir} $${builddir} NO)
+ --host_alias=${host_alias} --mode=xml \
+ "${top_srcdir}" "$${builddir}" NO || true
$(STAMP) stamp-xml-doxygen
 
 stamp-xml-single-doxygen: stamp-xml-doxygen
@@ -239,29 +239,29 @@ stamp-xml-single-doxygen: stamp-xml-doxygen
$(STAMP) stamp-xml-single-doxygen
 
 stamp-html-doxygen: ${doxygen_outdir}/html
-   -(srcdir=`cd ${top_srcdir}; ${PWD_COMMAND}`; \
- builddir=`cd ..; ${PWD_COMMAND}`; \
+   @builddir=`cd ..; ${PWD_COMMAND}`; \
  ${SHELL} ${doxygen_script} \
- --host_alias=${host_alias} --mode=html $${srcdir} $${builddir} YES)
+ --host_alias=${host_alias} --mode=html \
+ "${top_srcdir}" "$${builddir}" YES || true
$(STAMP) stamp-html-doxygen
 
 stamp-latex-doxygen: ${doxygen_outdir}/latex
-   -(srcdir=`cd ${top_srcdir}; ${PWD_COMMAND}`; \
- builddir=`cd ..; ${PWD_COMMAND}`; \
+   @builddir=`cd ..; ${PWD_COMMAND}`; \
  ${SHELL} ${doxygen_script} \
- --host_alias=${host_alias} --mode=latex $${srcdir} $${builddir} NO)
+ --host_alias=${host_alias} --mode=latex --latex_cmd=$(LATEX_CMD) \
+ "${top_srcdir}" "$${builddir}" NO || true
$(STAMP) stamp-latex-doxygen
 
 # Chance of loonnggg creation time on this rule.  Iff this fails,
 # look at refman.log and see if TeX's memory is exhausted. Symptoms
 # include asking a wizard to enlarge capacity. If this is the case,
 # find texmf.cnf and add a zero for pool_size, string_vacancies,
-# max_strings, and pool_free values. A much simpler workaround is to install
-# lualatex and set LATEX_CMD_NAME = lualatex in the doxygen user.cfg file.
+# max_strings, and pool_free values. A much simpler workaround is to
+# install lualatex and set LATEX_CMD=lualatex when running make.
 # Errors like "File `foo.sty' not found" mean a TeX package is missing.
 stamp-pdf-doxygen: stamp-latex-doxygen ${doxygen_outdir}/pdf
-   -(cd ${doxygen_outdir}/latex && $(MAKE) -i pdf;)
@echo "Generating doxygen pdf file...";
+

Re: RFA: Add option -fretry-compilation

2021-05-17 Thread Richard Biener via Gcc-patches

On Mon, May 17, 2021 at 10:55 AM Joern Rennecke
 wrote:
>
> On Mon, 17 May 2021 at 08:36, Richard Biener  
> wrote:
> >
> > On Sun, May 16, 2021 at 8:53 PM Joern Rennecke
> >  wrote:
> > >
> > > For architectures with likely spilled register classes, neither
> > > register allocator is guaranteed
> > > to succeed when using optimization.  If you have just a few files to
> > > compile, you can try
> > > by hand which compiler options will succeed and still give reasonable
> > > code, but for large projects,
> > > hand-tweaking library / program build rules on a file-by-file basis is
> > > time intensive and does not
> > > scale well across different build environments and compiler versions.
> > >
> > > The attached patch adds a new option -fretry-compilation that allows
> > > you to specify a list - or
> > > lists - of options to use for a compilation retry, which is
> > > implemented in the compiler driver.
> > >
> > > Bootstrapped on x86_64-pc-linux-gnu.
> >
> > Eh, no ;)  But funny idea, nevertheless.
>
> Why no?
>
> lra just throws a ton of transformations at the code with no theoretical
> concept that I can discern that it should - modulo bugs - succeed for
> all well-formed code.  It works well most of the time so I'd like to use it as
> a default, but how are you supposed to compile libgcc and newlib with
> a register allocator that only works most of the time?
>
> reload is more robust in the basic design, but it's so complex that it's
> rather time-consuming to debug.  The failures I had left with reload
> were not spill-failures per se, but code that was considered mal-formed by
> the postreload passes and it's hard to decide which one was actually wrong.
> And if I debug the failures seen with reload, will this do any good in the
> long run, or will it just be changed beyond all recognition (which works for
> the top five most popular processor architectures but not quite for anything
> else) or plain ripped out a few years down the line?

The plan for reload is to axe it, similar to CC0 support.  Sooner rather than
later, but given it's still used exclusively by a lot of targets, it might
take some time.

> I had a proof-of-concept for the option in the target code first, but that 
> used
> fork(2) and thus left non-POSIX hosts (even if they have a pretend POSIX
> subsystem) high and dry.  The logical place to implement the option to
> make it portable is in the compiler driver.
> I've called the option originally -mretry-regalloc / -fretry-regalloc, but 
> when
> I got around to write the invoke.texi patch, I realized that the option can be
> used more generally to work around glitches, so it's more apt to name it
> -fretry-compilation .

So for you it's always just -fretry-compilation -m[no-]lra?  Given -m[no-]lra
is a thing, cycling between the two (lra/reload) directly in RA should be possible?
Or are reload/LRA too greedy in that they ICE when having transformed half
of the code already?

> > Do you run into the issues
> > with the first scheduling pass disabled?
>
> The target doesn't have anything that needs scheduling, and hence no 
> scheduling
> description.  But it also has more severe register pressures for
> memory access than
> ports in the FSF tree.

I see.  It's of course difficult for the FSF tree to cater for
extremes that are not
represented in its tree.  I wonder what prevents you from contributing the port?

> The bane of lra are memory-memory moves.  Instead of using an intermediate
> register, it starts by reloading the well-formed addresses and thus jacking up
> the base register pressure.
>
> I had a patch for that, but I found it needs a bit more work.

Still if that solves a lot of the issues this seems like the way to go.

Richard.
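
To make the proposal under discussion concrete, the retry idea can be sketched as a plain loop. This is a guess at the general shape only, not the code from the attached patch: "compile_with_retries" and the "try_compile" callback are hypothetical, and appending each retry list to the original options is an assumption.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

typedef std::vector<std::string> options;

// Returns 0 if the original options succeed, K if the K-th retry list
// (counting from 1) succeeds, and -1 if every attempt fails.
int
compile_with_retries (const options &base,
                      const std::vector<options> &retries,
                      const std::function<bool (const options &)> &try_compile)
{
  if (try_compile (base))
    return 0;

  for (std::size_t k = 0; k < retries.size (); ++k)
    {
      // Assumption: retry options are appended to the original ones.
      options opts = base;
      opts.insert (opts.end (), retries[k].begin (), retries[k].end ());
      if (try_compile (opts))
        return static_cast<int> (k) + 1;
    }
  return -1;
}
```

In the scenario from the thread, the first attempt would use the default register allocator and a retry list could switch to the other one.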

[PATCH][DOCS] Remove install-old.texi

2021-05-17 Thread Martin Liška


Hello.

As mentioned at the beginning of https://gcc.gnu.org/install/old.html:
"Note most of this information is out of date and superseded by the previous 
chapters of this manual."

The installation page has been deprecated for 20 years now.

Does it make sense to remove it?
Thanks,
Martin

gcc/ChangeLog:

* Makefile.in: Remove it.
* doc/include/fdl.texi: Update next/previous chapters.
* doc/install.texi: Likewise.
* doc/install-old.texi: Removed.
---
 gcc/Makefile.in  |   2 +-
 gcc/doc/include/fdl.texi |   2 +-
 gcc/doc/install-old.texi | 184 ---
 gcc/doc/install.texi |  20 +
 4 files changed, 3 insertions(+), 205 deletions(-)
 delete mode 100644 gcc/doc/install-old.texi

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 1b5d3f4696c..5fd6ac97117 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3314,7 +3314,7 @@ TEXI_GCCINT_FILES = gccint.texi gcc-common.texi 
gcc-vers.texi \
 loop.texi generic.texi gimple.texi plugins.texi optinfo.texi   \
 match-and-simplify.texi analyzer.texi ux.texi poly-int.texi
 
-TEXI_GCCINSTALL_FILES = install.texi install-old.texi fdl.texi		\

+TEXI_GCCINSTALL_FILES = install.texi fdl.texi  \
 gcc-common.texi gcc-vers.texi
 
 TEXI_CPPINT_FILES = cppinternals.texi gcc-common.texi gcc-vers.texi

diff --git a/gcc/doc/include/fdl.texi b/gcc/doc/include/fdl.texi
index 4e3457fe9c4..7fa222c5f32 100644
--- a/gcc/doc/include/fdl.texi
+++ b/gcc/doc/include/fdl.texi
@@ -19,7 +19,7 @@ of this license document, but changing it is not allowed.
 @ifset gfdlhtml
 @ifnothtml
 @comment node-name, next,  previous, up
-@nodeGNU Free Documentation License, Concept Index, Old, Top
+@nodeGNU Free Documentation License, Concept Index, Specific, Top
 @end ifnothtml
 @html
 Installing GCC: GNU Free Documentation License
diff --git a/gcc/doc/install-old.texi b/gcc/doc/install-old.texi
deleted file mode 100644
index b425971f944..000
--- a/gcc/doc/install-old.texi
+++ /dev/null
@@ -1,184 +0,0 @@
-@c Copyright (C) 1988-2021 Free Software Foundation, Inc.
-@c This is part of the GCC manual.
-@c For copying conditions, see the file install.texi.
-
-@ifnothtml
-@comment node-name, next,  previous, up
-@nodeOld, GNU Free Documentation License, Specific, Top
-@end ifnothtml
-@html
-Old installation documentation
-@end html
-@ifnothtml
-@chapter Old installation documentation
-@end ifnothtml
-
-Note most of this information is out of date and superseded by the
-previous chapters of this manual.  It is provided for historical
-reference only, because of a lack of volunteers to merge it into the
-main manual.
-
-@ifnothtml
-@menu
-* Configurations::Configurations Supported by GCC.
-@end menu
-@end ifnothtml
-
-Here is the procedure for installing GCC on a GNU or Unix system.
-
-@enumerate
-@item
-If you have chosen a configuration for GCC which requires other GNU
-tools (such as GAS or the GNU linker) instead of the standard system
-tools, install the required tools in the build directory under the names
-@file{as}, @file{ld} or whatever is appropriate.
-
-Alternatively, you can do subsequent compilation using a value of the
-@code{PATH} environment variable such that the necessary GNU tools come
-before the standard system tools.
-
-@item
-Specify the host, build and target machine configurations.  You do this
-when you run the @file{configure} script.
-
-The @dfn{build} machine is the system which you are using, the
-@dfn{host} machine is the system where you want to run the resulting
-compiler (normally the build machine), and the @dfn{target} machine is
-the system for which you want the compiler to generate code.
-
-If you are building a compiler to produce code for the machine it runs
-on (a native compiler), you normally do not need to specify any operands
-to @file{configure}; it will try to guess the type of machine you are on
-and use that as the build, host and target machines.  So you don't need
-to specify a configuration when building a native compiler unless
-@file{configure} cannot figure out what your configuration is or guesses
-wrong.
-
-In those cases, specify the build machine's @dfn{configuration name}
-with the @option{--host} option; the host and target will default to be
-the same as the host machine.
-
-Here is an example:
-
-@smallexample
-./configure --host=sparc-sun-sunos4.1
-@end smallexample
-
-A configuration name may be canonical or it may be more or less
-abbreviated.
-
-A canonical configuration name has three parts, separated by dashes.
-It looks like this: @samp{@var{cpu}-@var{company}-@var{system}}.
-(The three parts may themselves contain dashes; @file{configure}
-can figure out which dashes serve which purpose.)  For example,
-@samp{m68k-sun-sunos4.1} specifies a Sun 3.
-
-You can also replace parts of the configuration by nicknames or aliases.
-For example, @samp{sun3} stands for @samp{m68k-sun},

RE: [PATCH 7/9] arm: Auto-vectorization for MVE: add __fp16 support to VCMP

2021-05-17 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Gcc-patches  On Behalf Of
> Christophe Lyon via Gcc-patches
> Sent: 05 May 2021 15:09
> To: Andre Simoes Dias Vieira 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH 7/9] arm: Auto-vectorization for MVE: add __fp16
> support to VCMP
> 
> On Tue, 4 May 2021 at 19:03, Christophe Lyon 
> wrote:
> >
> > On Tue, 4 May 2021 at 15:43, Christophe Lyon
>  wrote:
> > >
> > > On Tue, 4 May 2021 at 13:48, Andre Vieira (lists)
> > >  wrote:
> > > >
> > > > It would be good to also add tests for NEON as you also enable auto-
> vec
> > > > for it. I checked and I do think the necessary 'neon_vc' patterns exist
> > > > for 'VH', so we should be OK there.
> > > >
> > >
> > > Actually since I posted the patch series, I've noticed a regression in
> > > armv8_2-fp16-arith-1.c, because we now vectorize all the float16x[48]_t
> loops,
> > > but we lose the fact that some FP comparisons can throw exceptions.
> > >
> > > I'll have to revisit this patch.
> >
> > Actually it looks like my patch does the right thing: we now vectorize
> > appropriately, given that the testcase is compiled with -ffast-math.
> > I need to update the testcase, though.
> >
> 
> Here is a new version, with armv8_2-fp16-arith-1.c updated to take
> into account the new vectorization.

Ok.
Thanks,
Kyrill

> 
> Christophe
> 
> 
> > >
> > > Thanks,
> > >
> > > Christophe
> > >
> > > > On 30/04/2021 15:09, Christophe Lyon via Gcc-patches wrote:
> > > > > This patch adds __fp16 support to the previous patch that added
> vcmp
> > > > > support with MVE. For this we update existing expanders to use
> VDQWH
> > > > > iterator, and add a new expander vcond.  In the
> > > > > process we need to create suitable iterators, and update
> v_cmp_result
> > > > > as needed.
> > > > >
> > > > > 2021-04-26  Christophe Lyon  
> > > > >
> > > > >   gcc/
> > > > >   * config/arm/iterators.md (V16): New iterator.
> > > > >   (VH_cvtto): New iterator.
> > > > >   (v_cmp_result): Added V4HF and V8HF support.
> > > > >   * config/arm/vec-common.md (vec_cmp):
> Use VDQWH.
> > > > >   (vcond): Likewise.
> > > > >   (vcond_mask_): Likewise.
> > > > >   (vcond): New expander.
> > > > >
> > > > >   gcc/testsuite/
> > > > >   * gcc.target/arm/simd/mve-compare-3.c: New test with GCC
> vectors.
> > > > >   * gcc.target/arm/simd/mve-vcmp-f16.c: New test for
> > > > >   auto-vectorization.
> > > > > ---
> > > > >   gcc/config/arm/iterators.md   |  6 
> > > > >   gcc/config/arm/vec-common.md  | 40
> ---
> > > > >   gcc/testsuite/gcc.target/arm/simd/mve-compare-3.c | 38
> +
> > > > >   gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f16.c  | 30
> +
> > > > >   4 files changed, 102 insertions(+), 12 deletions(-)
> > > > >   create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-
> compare-3.c
> > > > >   create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vcmp-
> f16.c
> > > > >
> > > > > > diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> > > > > index a128465..3042baf 100644
> > > > > --- a/gcc/config/arm/iterators.md
> > > > > +++ b/gcc/config/arm/iterators.md
> > > > > @@ -231,6 +231,9 @@ (define_mode_iterator VU [V16QI V8HI V4SI])
> > > > >   ;; Vector modes for 16-bit floating-point support.
> > > > >   (define_mode_iterator VH [V8HF V4HF])
> > > > >
> > > > > +;; Modes with 16-bit elements only.
> > > > > +(define_mode_iterator V16 [V4HI V4HF V8HI V8HF])
> > > > > +
> > > > > >   ;; 16-bit floating-point vector modes suitable for moving (includes BFmode).
> > > > >   (define_mode_iterator VHFBF [V8HF V4HF V4BF V8BF])
> > > > >
> > > > > @@ -571,6 +574,8 @@ (define_mode_attr V_cvtto [(V2SI "v2sf")
> (V2SF "v2si")
> > > > >   ;; (Opposite) mode to convert to/from for vector-half mode
> conversions.
> > > > >   (define_mode_attr VH_CVTTO [(V4HI "V4HF") (V4HF "V4HI")
> > > > >   (V8HI "V8HF") (V8HF "V8HI")])
> > > > > +(define_mode_attr VH_cvtto [(V4HI "v4hf") (V4HF "v4hi")
> > > > > + (V8HI "v8hf") (V8HF "v8hi")])
> > > > >
> > > > >   ;; Define element mode for each vector mode.
> > > > >   (define_mode_attr V_elem [(V8QI "QI") (V16QI "QI")
> > > > > @@ -720,6 +725,7 @@ (define_mode_attr V_cmp_result [(V8QI
> "V8QI") (V16QI "V16QI")
> > > > >   (define_mode_attr v_cmp_result [(V8QI "v8qi") (V16QI "v16qi")
> > > > >   (V4HI "v4hi") (V8HI  "v8hi")
> > > > >   (V2SI "v2si") (V4SI  "v4si")
> > > > > + (V4HF "v4hi") (V8HF  "v8hi")
> > > > >   (DI   "di")   (V2DI  "v2di")
> > > > >   (V2SF "v2si") (V4SF  "v4si")])
> > > > >
> > > > > > diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
> > > > > index 034b48b..3fd341c 100644
> > > > > --- a/gcc/config/arm/vec-common.md
> > > >

Re: [Patch] OpenMP/Fortran: Reject nonintrinsic assignments in OMP WORKSHARE [PR100633]

2021-05-17 Thread Jakub Jelinek via Gcc-patches

On Mon, May 17, 2021 at 12:27:22PM +0200, Tobias Burnus wrote:
> OK for mainline?
> It is an ice-on-invalid; does a GCC 11 backport nonetheless make sense?
> 
> Tobias
> 
> -
> Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
> Thürauf

> OpenMP/Fortran: Reject nonintrinsic assignments in OMP WORKSHARE [PR100633]
> 
>   PR fortran/100633
> 
> gcc/fortran/ChangeLog:
> 
>   * resolve.c (gfc_resolve_code): Reject nonintrinsic assignments in
>   OMP WORKSHARE.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gfortran.dg/gomp/workshare-59.f90: New test.

LGTM for both trunk and 11.
Thanks.

Jakub

RE: [PATCH 6/9] arm: Auto-vectorization for MVE: vcmp

2021-05-17 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Gcc-patches  On Behalf Of
> Christophe Lyon via Gcc-patches
> Sent: 05 May 2021 15:08
> To: Andre Simoes Dias Vieira 
> Cc: gcc Patches 
> Subject: Re: [PATCH 6/9] arm: Auto-vectorization for MVE: vcmp
> 
> On Tue, 4 May 2021 at 15:41, Christophe Lyon 
> wrote:
> >
> > On Tue, 4 May 2021 at 13:29, Andre Vieira (lists)
> >  wrote:
> > >
> > > Hi Christophe,
> > >
> > > On 30/04/2021 15:09, Christophe Lyon via Gcc-patches wrote:
> > > > Since MVE has a different set of vector comparison operators from
> > > > Neon, we have to update the expansion to take into account the new
> > > > ones, for instance 'NE' for which MVE does not require to use 'EQ'
> > > > with the inverted condition.
> > > >
> > > > Conversely, Neon supports comparisons with #0, MVE does not.
> > > >
> > > > For:
> > > > typedef long int vs32 __attribute__((vector_size(16)));
> > > > vs32 cmp_eq_vs32_reg (vs32 a, vs32 b) { return a == b; }
> > > >
> > > > we now generate:
> > > > cmp_eq_vs32_reg:
> > > >   vldr.64 d4, .L123   @ 8 [c=8 l=4]  *mve_movv4si/8
> > > >   vldr.64 d5, .L123+8
> > > >   vldr.64 d6, .L123+16@ 9 [c=8 l=4]  *mve_movv4si/8
> > > >   vldr.64 d7, .L123+24
> > > >   vcmp.i32  eq, q0, q1@ 7 [c=16 l=4]  mve_vcmpeqq_v4si
> > > >   vpsel q0, q3, q2@ 15[c=8 l=4]  mve_vpselq_sv4si
> > > >   bx  lr  @ 26[c=8 l=4]  *thumb2_return
> > > > .L124:
> > > >   .align  3
> > > > .L123:
> > > >   .word   0
> > > >   .word   0
> > > >   .word   0
> > > >   .word   0
> > > >   .word   1
> > > >   .word   1
> > > >   .word   1
> > > >   .word   1
> > > >
> > > > > For some reason emit_move_insn (zero, CONST0_RTX (cmp_mode)) produces
> > > > > a pair of vldr instead of vmov.i32, qX, #0
> > > I think ideally we would even want:
> > > vpte  eq, q0, q1
> > > vmovt.i32 q0, #0
> > > vmove.i32 q0, #1
> > >
> > > But we don't have a way to generate VPT blocks with multiple
> > > instructions yet unfortunately so I guess VPSEL will have to do for now.
> >
> > TBH,  I looked at what LLVM generates currently ;-)
> >
> 
> Here is an updated version, which adds
> && (! || flag_unsafe_math_optimizations)
> to vcond_mask_
> 
> This condition was not present in the neon.md version I moved to
> vec-common.md, but since the VDQW iterator includes V2SF and V4SF, it
> should take floating-point flags into account.
> 

-  emit_insn (gen_neon_vc (code, cmp_mode, target, op0, op1));
+case NE:
+  if (TARGET_HAVE_MVE) {
+   rtx vpr_p0;

GNU style wants the '{' on a new line. This appears a few other times in the
patch.

+   if (vcond_mve)
+ vpr_p0 = target;
+   else
+ vpr_p0 = gen_reg_rtx (HImode);
+
+   switch (cmp_mode)
+ {
+ case E_V16QImode:
+ case E_V8HImode:
+ case E_V4SImode:
+   emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1)));
+   break;
+ case E_V8HFmode:
+ case E_V4SFmode:
+   if (TARGET_HAVE_MVE_FLOAT)
+ emit_insn (gen_mve_vcmpq_f (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1)));
+   else
+ gcc_unreachable ();
+   break;
+ default:
+   gcc_unreachable ();
+ }

Hmm, I think we can just check GET_MODE_CLASS (cmp_mode) for MODE_VECTOR_INT or 
MODE_VECTOR_FLOAT here rather than have this switch statement.
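
For readers unfamiliar with mode classes, the suggestion above can be sketched in plain C as follows. This is only an illustration under stated assumptions: the sketch_* enums, the lookup table and sketch_pick_vcmp are hypothetical stand-ins, not GCC's machine_mode, GET_MODE_CLASS or the generated gen_mve_vcmpq* functions. The point is that the choice depends on the class of the mode, so no per-mode switch is needed.

```c
#include <assert.h>

/* Hypothetical stand-ins for a few vector modes and their classes.
   None of these names exist in GCC.  */
enum sketch_mode { SK_V16QI, SK_V8HI, SK_V4SI, SK_V8HF, SK_V4SF, SK_NUM_MODES };
enum sketch_class { SK_VECTOR_INT, SK_VECTOR_FLOAT };
enum sketch_insn { SK_VCMPQ_INT, SK_VCMPQ_FLOAT };

/* Table lookup playing the role of a mode-class query.  */
static const enum sketch_class sketch_class_of[SK_NUM_MODES] = {
  [SK_V16QI] = SK_VECTOR_INT,
  [SK_V8HI] = SK_VECTOR_INT,
  [SK_V4SI] = SK_VECTOR_INT,
  [SK_V8HF] = SK_VECTOR_FLOAT,
  [SK_V4SF] = SK_VECTOR_FLOAT,
};

/* Pick the integer or floating-point compare from the mode class alone.  */
enum sketch_insn
sketch_pick_vcmp (enum sketch_mode mode, int have_mve_float)
{
  if (sketch_class_of[mode] == SK_VECTOR_INT)
    return SK_VCMPQ_INT;

  /* Plays the role of the gcc_unreachable () calls in the patch: a
     floating-point compare without floating-point MVE is a bug.  */
  assert (have_mve_float);
  return SK_VCMPQ_FLOAT;
}
```

With this shape, supporting another integer or floating-point vector mode only needs a new table entry rather than a new case label.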

+
+   /* If we are not expanding a vcond, build the result here.  */
+   if (!vcond_mve) {
+ rtx zero = gen_reg_rtx (cmp_result_mode);
+ rtx one = gen_reg_rtx (cmp_result_mode);
+ emit_move_insn (zero, CONST0_RTX (cmp_result_mode));
+ emit_move_insn (one, CONST1_RTX (cmp_result_mode));
+ emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0));
+   }
+  }
+  else

...
   bool inverted = arm_expand_vector_compare (mask, GET_CODE (operands[3]),
-operands[4], operands[5], true);
+operands[4], operands[5], true, vcond_mve);
   if (inverted)
 std::swap (operands[1], operands[2]);
+  if (TARGET_NEON)
   emit_insn (gen_neon_vbsl (GET_MODE (operands[0]), operands[0],
mask, operands[1], operands[2]));
+  else
+{
+  machine_mode cmp_mode = GET_MODE (operands[4]);
+  rtx vpr_p0 = mask;
+  rtx zero = gen_reg_rtx (cmp_mode);
+  rtx one = gen_reg_rtx (cmp_mode);
+  emit_move_insn (zero, CONST0_RTX (cmp_mode));
+  emit_move_insn (one, CONST1_RTX (cmp_mode));
+  switch (cmp_mode)
+   {
+   case E_V16QImode:
+   case E_V8HImode:
+   case E_V4SImode:
+ emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, operands[0], one, zero, vpr_p0));
+ break;
+   case E_V8HFmode:
+   case E_V4SFmode:
+ if (TARGET_HAVE_MVE_FLOAT)
+
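
As background for the cmp_eq_vs32_reg example quoted in this thread, here is a minimal sketch of what a vector comparison means at the C level when using the GCC/Clang generic vector extension. The names v4si, cmp_eq_v4si and select_v4si are made up for this note, and int32_t is used instead of long int so that the vector has four lanes on every target. The sketch says nothing about which instructions the MVE expander emits.

```c
#include <stdint.h>

/* Four 32-bit lanes in a 128-bit vector (GCC/Clang vector extension).  */
typedef int32_t v4si __attribute__ ((vector_size (16)));

/* Lane-wise equality: with the generic vector extension each result
   lane is -1 (all bits set) where the inputs match and 0 elsewhere.  */
v4si
cmp_eq_v4si (v4si a, v4si b)
{
  return a == b;
}

/* Lane-wise select driven by such a mask, the source-level counterpart
   of a predicated select.  */
v4si
select_v4si (v4si mask, v4si if_true, v4si if_false)
{
  return (mask & if_true) | (~mask & if_false);
}
```

For a = {1, 2, 3, 4} and b = {1, 0, 3, 0}, cmp_eq_v4si yields {-1, 0, -1, 0}, and select_v4si with that mask takes lanes 0 and 2 from its second argument and lanes 1 and 3 from its third.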

Re: [OG11] Merge GCC 11 into branch, cherry picks from mainline

2021-05-17 Thread Tobias Burnus


On 14.05.21 10:51, Tobias Burnus wrote:


OG11 = devel/omp/gcc-11, a branch with some OpenMP/OpenACC/offload patches
which are not yet on mainline. Additionally, patches in this area are
cherry-picked from mainline


Changes since last email (cherry pick, merge, post-cherry-pick fix):

0b8439a602c Fortran/OpenMP: Support 'omp parallel master'
e9e03ca4b9f Merge branch 'releases/gcc-11' into devel/omp/gcc-11
17c55806b37 c-c++-common/gomp/map-6.c: Fix dg-error due to mapping changes

Tobias


Re: [PATCH] arm: Fix ICE with CMSE nonsecure call on Armv8.1-M [PR100333]

2021-05-17 Thread Alex Coplan via Gcc-patches

On 30/04/2021 09:30, Alex Coplan via Gcc-patches wrote:
> Hi,
> 
> As the PR shows, we ICE shortly after expanding nonsecure calls for
> Armv8.1-M.  For Armv8.1-M, we have TARGET_HAVE_FPCXT_CMSE. As it stands,
> the expander (arm.md:nonsecure_call_internal) moves the callee's address
> to a register (with copy_to_suggested_reg) only if
> !TARGET_HAVE_FPCXT_CMSE.
> 
> However, looking at the pattern which the insn appears to be intended to
> match (thumb2.md:*nonsecure_call_reg_thumb2_fpcxt), it requires the
> callee's address to be in a register.
> 
> This patch therefore just forces the callee's address into a register in
> the expander.
> 
> Testing:
>  * Regtested an arm-eabi cross configured with
>  --with-arch=armv8.1-m.main+mve.fp+fp.dp --with-float=hard. No regressions.
>  * Bootstrap and regtest on arm-linux-gnueabihf in progress.
> 
> OK for trunk and backports as appropriate if bootstrap looks good?

Ping? Bootstrap/regtest looked good, FWIW.

> 
> Thanks,
> Alex
> 
> gcc/ChangeLog:
> 
>   PR target/100333
>   * config/arm/arm.md (nonsecure_call_internal): Always ensure
>   callee's address is in a register.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR target/100333
>   * gcc.target/arm/cmse/pr100333.c: New test.

> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 45a471a887a..e2ad1a962e3 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -8580,18 +8580,21 @@ (define_expand "nonsecure_call_internal"
> (use (match_operand 2 "" ""))
> (clobber (reg:SI LR_REGNUM))])]
>"use_cmse"
> -  "
>{
> -if (!TARGET_HAVE_FPCXT_CMSE)
> -  {
> - rtx tmp =
> -   copy_to_suggested_reg (XEXP (operands[0], 0),
> -  gen_rtx_REG (SImode, R4_REGNUM),
> -  SImode);
> +rtx tmp = NULL_RTX;
> +rtx addr = XEXP (operands[0], 0);
>  
> - operands[0] = replace_equiv_address (operands[0], tmp);
> -  }
> -  }")
> +if (TARGET_HAVE_FPCXT_CMSE && !REG_P (addr))
> +  tmp = force_reg (SImode, addr);
> +else if (!TARGET_HAVE_FPCXT_CMSE)
> +  tmp = copy_to_suggested_reg (XEXP (operands[0], 0),
> +gen_rtx_REG (SImode, R4_REGNUM),
> +SImode);
> +
> +if (tmp)
> +  operands[0] = replace_equiv_address (operands[0], tmp);
> +  }
> +)
>  
>  (define_insn "*call_reg_armv5"
>[(call (mem:SI (match_operand:SI 0 "s_register_operand" "r"))
> diff --git a/gcc/testsuite/gcc.target/arm/cmse/pr100333.c b/gcc/testsuite/gcc.target/arm/cmse/pr100333.c
> new file mode 100644
> index 000..d8e3d809f73
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/cmse/pr100333.c
> @@ -0,0 +1,7 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-mcmse" } */
> +typedef void __attribute__((cmse_nonsecure_call)) t(void);
> +t g;
> +void f() {
> +  g();
> +}


-- 
Alex
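
The control flow of the patched expander can be summarised with a small standalone model. This is a sketch only: the enum and the function name are hypothetical, and the two boolean parameters stand in for TARGET_HAVE_FPCXT_CMSE and REG_P (addr) from the diff above. It records which of the three outcomes applies, nothing more.

```c
#include <stdbool.h>

/* What the expander does with the callee's address (hypothetical names).  */
enum addr_action
{
  KEEP_ADDR,        /* Address is already in a register; leave it alone.  */
  FORCE_TO_ANY_REG, /* Models force_reg (SImode, addr).  */
  COPY_TO_R4        /* Models copy_to_suggested_reg (..., R4, ...).  */
};

enum addr_action
nonsecure_call_addr_action (bool have_fpcxt_cmse, bool addr_is_reg)
{
  if (have_fpcxt_cmse)
    /* Armv8.1-M: any register will do, but it must be a register.  */
    return addr_is_reg ? KEEP_ADDR : FORCE_TO_ANY_REG;

  /* Without FPCXT the address is always copied towards r4, as before.  */
  return COPY_TO_R4;
}
```

Before the patch, the first branch did nothing at all, which is how a non-register address could reach a pattern that requires a register.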

RE: [PATCH] testsuite/arm: Add mve-vadd-scalar-1.c test

2021-05-17 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Gcc-patches  On Behalf Of
> Christophe Lyon via Gcc-patches
> Sent: 17 May 2021 10:54
> To: gcc Patches 
> Subject: Re: [PATCH] testsuite/arm: Add mve-vadd-scalar-1.c test
> 
> ping?
> 
> On Mon, 10 May 2021 at 13:22, Christophe Lyon
>  wrote:
> >
> > Ping?
> >
> > On Fri, 30 Apr 2021 at 16:06, Christophe Lyon
> >  wrote:
> > >
> > > This patch adds a test for the scalar mode of vadd, precisely noting
> > > that we do not yet use the T2 variants of vadd, which take a scalar as
> > > final argument.
> > >

Ok.
Thanks,
Kyrill

> > > 2021-04-22  Christophe Lyon  
> > >
> > > gcc/testsuite/
> > > * gcc.target/arm/simd/mve-vadd-scalar-1: New.
> > > ---
> > > >  .../gcc.target/arm/simd/mve-vadd-scalar-1.c| 47 ++
> > > >  1 file changed, 47 insertions(+)
> > > >  create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vadd-scalar-1.c
> > > >
> > > > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vadd-scalar-1.c b/gcc/testsuite/gcc.target/arm/simd/mve-vadd-scalar-1.c
> > > new file mode 100644
> > > index 000..bbf70e1
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vadd-scalar-1.c
> > > @@ -0,0 +1,47 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > +/* { dg-additional-options "-O3" } */
> > > +
> > > > +#include <stdint.h>
> > > +
> > > +#define FUNC_IMM(SIGN, TYPE, BITS, NB, OP, NAME)   \
> > > > +  void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, \
> > > > +TYPE##BITS##_t *a) { \
> > > > +int i; \
> > > > +for (i=0; i<NB; i++) { \
> > > > +  dest[i] = a[i] OP 1; \
> > > +}  \
> > > +}
> > > +
> > > +/* 128-bit vectors.  */
> > > +FUNC_IMM(s, int, 32, 4, +, vaddimm)
> > > +FUNC_IMM(u, uint, 32, 4, +, vaddimm)
> > > +FUNC_IMM(s, int, 16, 8, +, vaddimm)
> > > +FUNC_IMM(u, uint, 16, 8, +, vaddimm)
> > > +FUNC_IMM(s, int, 8, 16, +, vaddimm)
> > > +FUNC_IMM(u, uint, 8, 16, +, vaddimm)
> > > +
> > > > +/* For the moment we do not select the T2 vadd variant operating on a scalar
> > > > +   final argument.  */
> > > > +/* { dg-final { scan-assembler-times {vadd\.i32  q[0-9]+, q[0-9]+, r[0-9]+} 2 { xfail *-*-* } } } */
> > > > +/* { dg-final { scan-assembler-times {vadd\.i16  q[0-9]+, q[0-9]+, r[0-9]+} 2 { xfail *-*-* } } } */
> > > > +/* { dg-final { scan-assembler-times {vadd\.i8  q[0-9]+, q[0-9]+, r[0-9]+} 2 { xfail *-*-* } } } */
> > > +
> > > +void test_vaddimm_f32 (float * dest, float * a) {
> > > +  int i;
> > > +  for (i=0; i<4; i++) {
> > > +dest[i] = a[i] + 5.0;
> > > +  }
> > > +}
> > > > +/* { dg-final { scan-assembler-times {vadd\.f32 q[0-9]+, q[0-9]+, r[0-9]+} 1 { xfail *-*-* } } } */
> > > +
> > > +/* Note that dest[i] = a[i] + 5.0f16 is not vectorized.  */
> > > +void test_vaddimm_f16 (__fp16 * dest, __fp16 * a) {
> > > +  int i;
> > > +  __fp16 b = 5.0f16;
> > > +  for (i=0; i<8; i++) {
> > > +dest[i] = a[i] + b;
> > > +  }
> > > +}
> > > > +/* { dg-final { scan-assembler-times {vadd\.f16 q[0-9]+, q[0-9]+, r[0-9]+} 1 { xfail *-*-* } } } */
> > > --
> > > 2.7.4
> > >

[Patch] OpenMP/Fortran: Reject nonintrinsic assignments in OMP WORKSHARE [PR100633]

2021-05-17 Thread Tobias Burnus


OK for mainline?
It is an ice-on-invalid; does a GCC 11 backport nonetheless make sense?

Tobias

OpenMP/Fortran: Reject nonintrinsic assignments in OMP WORKSHARE [PR100633]

	PR fortran/100633

gcc/fortran/ChangeLog:

	* resolve.c (gfc_resolve_code): Reject nonintrinsic assignments in
	OMP WORKSHARE.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/workshare-59.f90: New test.

 gcc/fortran/resolve.c   |  6 ++
 gcc/testsuite/gfortran.dg/gomp/workshare-59.f90 | 26 +
 2 files changed, 32 insertions(+)

diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index c02bbed8739..747516fbc1d 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -11940,6 +11940,12 @@ start:
 
 	  if (resolve_ordinary_assign (code, ns))
 	{
+	  if (omp_workshare_flag)
+		{
+		  gfc_error ("Expected intrinsic assignment in OMP WORKSHARE "
+			 "at %L", &code->loc);
+		  break;
+		}
 	  if (code->op == EXEC_COMPCALL)
 		goto compcall;
 	  else
diff --git a/gcc/testsuite/gfortran.dg/gomp/workshare-59.f90 b/gcc/testsuite/gfortran.dg/gomp/workshare-59.f90
new file mode 100644
index 000..65d04c2b55d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/workshare-59.f90
@@ -0,0 +1,26 @@
+! PR fortran/100633
+
+module defined_assign
+  interface assignment(=)
+module procedure work_assign
+  end interface
+
+  contains
+subroutine work_assign(a,b)
+  integer, intent(out) :: a
+  logical, intent(in) :: b(:)
+end subroutine work_assign
+end module defined_assign
+
+program omp_workshare
+  use defined_assign
+
+  integer :: a
+  logical :: l(10)
+  l = .TRUE.
+
+  !$omp workshare
+  a = l   ! { dg-error "Expected intrinsic assignment in OMP WORKSHARE" }
+  !$omp end workshare
+
+end program omp_workshare

RE: [PATCH] testsuite/arm: Add mve-vadd-1.c test

2021-05-17 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Gcc-patches  On Behalf Of
> Christophe Lyon via Gcc-patches
> Sent: 17 May 2021 10:54
> To: gcc Patches 
> Subject: Re: [PATCH] testsuite/arm: Add mve-vadd-1.c test
> 
> ping?
> 
> On Mon, 10 May 2021 at 13:22, Christophe Lyon
>  wrote:
> >
> > Ping?
> >
> > On Tue, 27 Apr 2021 at 13:32, Christophe Lyon
> >  wrote:
> > >
> > > Support for vadd has been present for a while, but it was lacking a
> > > test.
> > >

Ok.
Thanks,
Kyrill

> > > 2021-04-22  Christophe Lyon  
> > >
> > > gcc/testsuite/
> > > * gcc.target/arm/simd/mve-vadd-1.c: New.
> > > ---
> > > >  gcc/testsuite/gcc.target/arm/simd/mve-vadd-1.c | 43 ++
> > > >  1 file changed, 43 insertions(+)
> > > >  create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vadd-1.c
> > > >
> > > > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vadd-1.c b/gcc/testsuite/gcc.target/arm/simd/mve-vadd-1.c
> > > new file mode 100644
> > > index 000..15a9daa
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vadd-1.c
> > > @@ -0,0 +1,43 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > +/* { dg-additional-options "-O3" } */
> > > +
> > > > +#include <stdint.h>
> > > +
> > > +#define FUNC(SIGN, TYPE, BITS, NB, OP, NAME)   \
> > > > +  void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, \
> > > > +TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
> > > > +int i; \
> > > > +for (i=0; i<NB; i++) { \
> > > > +  dest[i] = a[i] OP b[i];  \
> > > +}  \
> > > +}
> > > +
> > > +/* 128-bit vectors.  */
> > > +FUNC(s, int, 32, 4, +, vadd)
> > > +FUNC(u, uint, 32, 4, +, vadd)
> > > +FUNC(s, int, 16, 8, +, vadd)
> > > +FUNC(u, uint, 16, 8, +, vadd)
> > > +FUNC(s, int, 8, 16, +, vadd)
> > > +FUNC(u, uint, 8, 16, +, vadd)
> > > +
> > > > +/* { dg-final { scan-assembler-times {vadd\.i32  q[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
> > > > +/* { dg-final { scan-assembler-times {vadd\.i16  q[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
> > > > +/* { dg-final { scan-assembler-times {vadd\.i8  q[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
> > > +
> > > +void test_vadd_f32 (float * dest, float * a, float * b) {
> > > +  int i;
> > > +  for (i=0; i<4; i++) {
> > > +dest[i] = a[i] + b[i];
> > > +  }
> > > +}
> > > > +/* { dg-final { scan-assembler-times {vadd\.f32 q[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
> > > +
> > > +void test_vadd_f16 (__fp16 * dest, __fp16 * a, __fp16 * b) {
> > > +  int i;
> > > +  for (i=0; i<8; i++) {
> > > +dest[i] = a[i] + b[i];
> > > +  }
> > > +}
> > > > +/* { dg-final { scan-assembler-times {vadd\.f16 q[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
> > > --
> > > 2.7.4
> > >

RE: [PATCH] testsuite/arm: Factorize and increase coverage in mve-sub_1.c

2021-05-17 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Gcc-patches  On Behalf Of
> Christophe Lyon via Gcc-patches
> Sent: 17 May 2021 10:54
> To: gcc Patches 
> Subject: Re: [PATCH] testsuite/arm: Factorize and increase coverage in mve-sub_1.c
> 
> ping?
> 
> On Mon, 10 May 2021 at 13:22, Christophe Lyon
>  wrote:
> >
> > Ping?
> >
> > On Tue, 27 Apr 2021 at 13:32, Christophe Lyon
> >  wrote:
> > >
> > > Use a template macro to factorize the existing test functions.
> > >
> > > This patch also adds a version to check subtraction with __fp16 type.
> > >

Ok.
Thanks,
Kyrill

> > > 2021-04-26  Christophe Lyon  
> > >
> > > gcc/testsuite/
> > > * gcc.target/arm/simd/mve-vsub_1.c: Factorize and add __fp16 test.
> > > ---
> > > >  gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c | 60 +-
> > > >  1 file changed, 21 insertions(+), 39 deletions(-)
> > > >
> > > > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c b/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c
> > > index 842e5c6..5a6c345 100644
> > > --- a/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c
> > > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c
> > > @@ -5,60 +5,42 @@
> > >
> > > >  #include <stdint.h>
> > >
> > > -void test_vsub_i32 (int32_t * dest, int32_t * a, int32_t * b) {
> > > -  int i;
> > > -  for (i=0; i<4; i++) {
> > > -dest[i] = a[i] - b[i];
> > > -  }
> > > +#define FUNC(SIGN, TYPE, BITS, NB, OP, NAME)   \
> > > > +  void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, \
> > > > +TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
> > > > +int i; \
> > > > +for (i=0; i<NB; i++) { \
> > > > +  dest[i] = a[i] OP b[i];  \
> > > +}  \
> > >  }
> > >
> > > -void test_vsub_i32_u (uint32_t * dest, uint32_t * a, uint32_t * b) {
> > > -  int i;
> > > -  for (i=0; i<4; i++) {
> > > -dest[i] = a[i] - b[i];
> > > -  }
> > > -}
> > > +/* 128-bit vectors.  */
> > > +FUNC(s, int, 32, 4, -, vsub)
> > > +FUNC(u, uint, 32, 4, -, vsub)
> > > +FUNC(s, int, 16, 8, -, vsub)
> > > +FUNC(u, uint, 16, 8, -, vsub)
> > > +FUNC(s, int, 8, 16, -, vsub)
> > > +FUNC(u, uint, 8, 16, -, vsub)
> > >
> > > >  /* { dg-final { scan-assembler-times {vsub\.i32\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
> > > -
> > > -void test_vsub_i16 (int16_t * dest, int16_t * a, int16_t * b) {
> > > -  int i;
> > > -  for (i=0; i<8; i++) {
> > > -dest[i] = a[i] - b[i];
> > > -  }
> > > -}
> > > -
> > > -void test_vsub_i16_u (uint16_t * dest, uint16_t * a, uint16_t * b) {
> > > -  int i;
> > > -  for (i=0; i<8; i++) {
> > > -dest[i] = a[i] - b[i];
> > > -  }
> > > -}
> > > -
> > > >  /* { dg-final { scan-assembler-times {vsub\.i16\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
> > > > +/* { dg-final { scan-assembler-times {vsub\.i8\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
> > >
> > > -void test_vsub_i8 (int8_t * dest, int8_t * a, int8_t * b) {
> > > -  int i;
> > > -  for (i=0; i<16; i++) {
> > > -dest[i] = a[i] - b[i];
> > > -  }
> > > -}
> > > -
> > > -void test_vsub_i8_u (uint8_t * dest, uint8_t * a, uint8_t * b) {
> > > +void test_vsub_f32 (float * dest, float * a, float * b) {
> > >int i;
> > > -  for (i=0; i<16; i++) {
> > > +  for (i=0; i<4; i++) {
> > >  dest[i] = a[i] - b[i];
> > >}
> > >  }
> > > > +/* { dg-final { scan-assembler-times {vsub\.f32\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
> > > >
> > > > -/* { dg-final { scan-assembler-times {vsub\.i8\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
> > >
> > > -void test_vsub_f32 (float * dest, float * a, float * b) {
> > > +void test_vsub_f16 (__fp16 * dest, __fp16 * a, __fp16 * b) {
> > >int i;
> > > -  for (i=0; i<4; i++) {
> > > +  for (i=0; i<8; i++) {
> > >  dest[i] = a[i] - b[i];
> > >}
> > >  }
> > >
> > > > -/* { dg-final { scan-assembler-times {vsub\.f32\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
> > > > +/* { dg-final { scan-assembler-times {vsub\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
> > >
> > > --
> > > 2.7.4
> > >

RE: [PATCH] testsuite/arm: Improve mve-vshr.c

2021-05-17 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Gcc-patches  On Behalf Of
> Christophe Lyon via Gcc-patches
> Sent: 17 May 2021 10:54
> To: gcc Patches 
> Subject: Re: [PATCH] testsuite/arm: Improve mve-vshr.c
> 
> ping?
> 
> On Mon, 10 May 2021 at 13:22, Christophe Lyon
>  wrote:
> >
> > Ping?
> >
> > On Tue, 27 Apr 2021 at 13:32, Christophe Lyon
> >  wrote:
> > >
> > > Vector right shifts by immediate use vshr, while right shifts by
> > > vectors instead use vneg and vshl.
> > >
> > > This patch adds the corresponding scan-assembler-times that were
> > > missing.
> > >

Ok.
Thanks,
Kyrill

> > > 2021-04-22  Christophe Lyon  
> > >
> > > gcc/testsuite/
> > > * gcc.target/arm/simd/mve-vshr.c: Add more scan-assembler-times.
> > > ---
> > >  gcc/testsuite/gcc.target/arm/simd/mve-vshr.c | 7 +++
> > >  1 file changed, 7 insertions(+)
> > >
> > > > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> > > index d4e658c..d4258e9 100644
> > > --- a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> > > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> > > @@ -55,5 +55,12 @@ FUNC_IMM(u, uint, 8, 16, >>, vshrimm)
> > >
> > >  /* MVE has only 128-bit vectors, so we can vectorize only half of the
> > > functions above.  */
> > > +/* Vector right shifts use vneg and left shifts.  */
> > > > +/* { dg-final { scan-assembler-times {vshl.s[0-9]+\tq[0-9]+, q[0-9]+} 3 } } */
> > > > +/* { dg-final { scan-assembler-times {vshl.u[0-9]+\tq[0-9]+, q[0-9]+} 3 } } */
> > > > +/* { dg-final { scan-assembler-times {vneg.s[0-9]+  q[0-9]+, q[0-9]+} 6 } } */
> > > +
> > > +
> > > +/* Shift by immediate.  */
> > > >  /* { dg-final { scan-assembler-times {vshr.s[0-9]+\tq[0-9]+, q[0-9]+} 3 } } */
> > > >  /* { dg-final { scan-assembler-times {vshr.u[0-9]+\tq[0-9]+, q[0-9]+} 3 } } */
> > > --
> > > 2.7.4
> > >
