Re: [PATCH] c++: Fix self-mapping in map_arguments [PR96531, PR97103]

2020-09-18 Thread Jason Merrill via Gcc-patches

On 9/18/20 8:42 PM, Patrick Palka wrote:

On Fri, 18 Sep 2020, Patrick Palka wrote:


On Fri, 18 Sep 2020, Jason Merrill wrote:


On 9/18/20 4:07 PM, Patrick Palka wrote:

With r10-8077 we stopped passing the argified current_template_parms to
normalize_constraint_expression from finish_nested_requirement, and
instead tweaked map_arguments to perform a self-mapping of parameters
when args is NULL.  We're currently not handling parameter packs and
BOUND_TEMPLATE_TEMPLATE_PARMs properly during this self-mapping, which
leads to ICEs later during satisfaction.

To fix the self-mapping of a parameter pack, this patch makes
map_arguments use template_parm_to_arg which already does the right
thing for parameter packs.

Before r10-8077, a BOUND_TEMPLATE_TEMPLATE_PARM would get mapped to the
corresponding TEMPLATE_TEMPLATE_PARM.  We could restore this behavior in
map_arguments, but since a BOUND_TEMPLATE_TEMPLATE_PARM is not really a
template parameter it seems better to make keep_template_parm not give
us a BOUND_TEMPLATE_TEMPLATE_PARM in the first place.  I think what we
actually want is to map the TEMPLATE_TEMPLATE_PARM to itself, so this
patch adjusts keep_template_parm to give us the corresponding
TEMPLATE_TEMPLATE_PARM of a BOUND_TEMPLATE_TEMPLATE_PARM instead.
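
For context, a reduced C++20 example of the parameter-pack case, in the
spirit of the new concepts-variadic1.C test below (illustrative sketch
only, not the exact testcase):

  template<typename T>
  concept is_bool = __is_same(bool, T);

  template<typename... Ts>
  concept C = requires {
    requires (is_bool<Ts> || ...);  // satisfying this nested requirement ICEd
  };

  static_assert(C<bool>);
  static_assert(!C<int, char>);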

Tested on x86_64-pc-linux-gnu, and also tested with the cmcstl2 library.
Does this look OK for trunk/10?

gcc/cp/ChangeLog:

PR c++/96531
PR c++/97103
* constraint.cc (map_arguments): Call template_parm_to_arg
appropriately when doing a self-mapping.
* pt.c (keep_template_parm): Don't record a
BOUND_TEMPLATE_TEMPLATE_PARM, instead record its corresponding
TEMPLATE_TEMPLATE_PARM.

gcc/testsuite/ChangeLog:

PR c++/96531
PR c++/97103
* g++.dg/cpp2a/concepts-ttp2.C: New test.
* g++.dg/cpp2a/concepts-variadic1.C: New test.
---
   gcc/cp/constraint.cc  | 27 --
   gcc/cp/pt.c   |  5 
   gcc/testsuite/g++.dg/cpp2a/concepts-ttp2.C| 11 
   .../g++.dg/cpp2a/concepts-variadic1.C | 28 +++
   4 files changed, 62 insertions(+), 9 deletions(-)
   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-ttp2.C
   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-variadic1.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 0aab3073cc1..43336d191d9 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -546,15 +546,24 @@ static tree
   map_arguments (tree parms, tree args)
   {
 for (tree p = parms; p; p = TREE_CHAIN (p))
-if (args)
-  {
-   int level;
-   int index;
-   template_parm_level_and_index (TREE_VALUE (p), &level, &index);
-   TREE_PURPOSE (p) = TMPL_ARG (args, level, index);
-  }
-else
-  TREE_PURPOSE (p) = TREE_VALUE (p);
+{
+  tree parm = TREE_VALUE (p);
+  if (args)
+   {
+ int level;
+ int index;
+ template_parm_level_and_index (parm, &level, &index);
+ TREE_PURPOSE (p) = TMPL_ARG (args, level, index);
+   }
+  else
+   {
+ tree tpi = (TYPE_P (parm)
+ ? TEMPLATE_TYPE_PARM_INDEX (parm) : parm);
+ TREE_PURPOSE (p)
+   = template_parm_to_arg (build_tree_list (NULL_TREE,
+    TEMPLATE_PARM_DECL (tpi)));


Doesn't passing 'p' to template_parm_to_arg work?


Unfortunately not, template_parm_to_arg expects a TREE_LIST node whose
TREE_VALUE is the corresponding *_DECL for the template parm (i.e. it
expects a node from current_template_parms), and 'p' is a
TEMPLATE_TYPE_PARM, TEMPLATE_TEMPLATE_PARM or TEMPLATE_PARM_INDEX (given
to us by find_template_parameters).


Oops, 'p' is of course a TREE_LIST of a TEMPLATE_PARM_P node.
But template_parm_to_arg wants a TREE_LIST of a DECL_TEMPLATE_PARM_P
node.



Would it be appropriate to adjust template_parm_to_arg to accept the
latter inputs as well?


This version extends template_parm_to_arg to handle TEMPLATE_PARM_P
nodes alongside its existing DECL_TEMPLATE_PARM_P handling, which allows
us to simply pass 'p' to template_parm_to_arg from map_arguments.

Bootstrapped and regtested on x86_64-pc-linux-gnu and tested on cmcstl2
and range-v3.


OK, thanks.


-- >8 --

Subject: [PATCH] c++: Fix self-mapping in map_arguments [PR96531, PR97103]

With r10-8077 we stopped passing the argified current_template_parms to
normalize_constraint_expression from finish_nested_requirement, and
instead made map_arguments perform a self-mapping of parameters when
args is NULL.  But we're currently not handling parameter packs and
BOUND_TEMPLATE_TEMPLATE_PARMs properly during this self-mapping, which
leads to ICEs later during satisfaction.

To properly handle self-mapping of a parameter pack, this patch
extends template_parm_to_arg to handle TEMPLATE_PARM_P nodes, and
makes map_arguments use it.  This change revealed that the call to

[PATCH] libstdc++: Fix division by zero in std::sample

2020-09-18 Thread Patrick Palka via Gcc-patches
This fixes a division by zero in the selection-sampling std::__sample
overload when the input range is empty (and hence __unsampled_sz is 0).
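
A minimal stand-alone sketch of the failure mode (illustrative only, not
the new testcase): with an empty population the selection-sampling code
divides by __unsampled_sz, which is 0.

  // C++17
  #include <algorithm>
  #include <cassert>
  #include <random>
  #include <vector>

  int main()
  {
    std::vector<int> pop;        // empty input range
    std::vector<int> out(1);
    std::mt19937 rng;
    auto it = std::sample(pop.begin(), pop.end(), out.begin(), 1, rng);
    assert(it == out.begin());   // nothing is sampled
  }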

Tested on x86_64-pc-linux-gnu.

libstdc++-v3/ChangeLog:

* include/bits/stl_algo.h (__sample): Exit early when the
input range is empty.
* testsuite/25_algorithms/sample/3.cc: New test.
---
 libstdc++-v3/include/bits/stl_algo.h  |  3 ++
 .../testsuite/25_algorithms/sample/3.cc   | 50 +++
 2 files changed, 53 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/25_algorithms/sample/3.cc

diff --git a/libstdc++-v3/include/bits/stl_algo.h 
b/libstdc++-v3/include/bits/stl_algo.h
index a0b96c61798..2478b5857c1 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -5775,6 +5775,9 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
   using _Gen = remove_reference_t<_UniformRandomBitGenerator>;
   using __uc_type = common_type_t;
 
+  if (__first == __last)
+   return __out;
+
   __distrib_type __d{};
   _Size __unsampled_sz = std::distance(__first, __last);
   __n = std::min(__n, __unsampled_sz);
diff --git a/libstdc++-v3/testsuite/25_algorithms/sample/3.cc 
b/libstdc++-v3/testsuite/25_algorithms/sample/3.cc
new file mode 100644
index 000..e89c40e27ee
--- /dev/null
+++ b/libstdc++-v3/testsuite/25_algorithms/sample/3.cc
@@ -0,0 +1,50 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-options "-std=gnu++17" }
+// { dg-do run { target c++17 } }
+// { dg-require-cstdint "" }
+
+#include <algorithm>
+#include <random>
+#include <testsuite_hooks.h>
+#include <testsuite_iterators.h>
+
+std::mt19937 rng;
+
+using std::sample;
+using __gnu_test::test_container;
+using __gnu_test::output_iterator_wrapper;
+using __gnu_test::forward_iterator_wrapper;
+
+void
+test01()
+{
+  const int in = 0;
+  test_container<const int, forward_iterator_wrapper> pop(&in, &in);
+  int out;
+  test_container<int, output_iterator_wrapper> samp(&out, &out + 1);
+
+  auto it = sample(pop.begin(), pop.end(), samp.begin(), 1, rng);
+  VERIFY( it.ptr == &out );
+}
+
+int
+main()
+{
+  test01();
+}
-- 
2.28.0.497.g54e85e7af1



[PATCH] libstdc++: Mark some more algorithms constexpr for C++20

2020-09-18 Thread Patrick Palka via Gcc-patches
As per P0202.
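
A hedged illustration of what this enables in C++20 (it assumes the
library pieces it relies on, e.g. constexpr std::default_searcher, are in
place); a sketch only, not the new testsuite code:

  #include <algorithm>
  #include <array>
  #include <functional>

  constexpr bool
  example()
  {
    std::array<int, 5> a{1, 2, 3, 4, 5};

    int sum = 0;
    std::for_each_n(a.begin(), 3, [&sum](int i) { sum += i; });

    std::array<int, 2> needle{3, 4};
    auto it = std::search(a.begin(), a.end(),
                          std::default_searcher(needle.begin(),
                                                needle.end()));

    return sum == 6 && it == a.begin() + 2;
  }

  static_assert(example());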

Tested on x86_64-pc-linux-gnu.

libstdc++-v3/ChangeLog:

* include/bits/stl_algo.h (for_each_n): Mark constexpr for C++20.
(search): Likewise for the overload that takes a searcher.
* testsuite/25_algorithms/for_each/constexpr.cc: Test constexpr
std::for_each_n.
* testsuite/25_algorithms/search/constexpr.cc: Test constexpr
std::search overload that takes a searcher.
---
 libstdc++-v3/include/bits/stl_algo.h |  2 ++
 .../testsuite/25_algorithms/for_each/constexpr.cc| 12 
 .../testsuite/25_algorithms/search/constexpr.cc  |  4 
 3 files changed, 18 insertions(+)

diff --git a/libstdc++-v3/include/bits/stl_algo.h 
b/libstdc++-v3/include/bits/stl_algo.h
index 550a15f2b3b..a0b96c61798 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -3832,6 +3832,7 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
*  If `__f` has a return value it is ignored.
   */
    template<typename _InputIterator, typename _Size, typename _Function>
+_GLIBCXX20_CONSTEXPR
 _InputIterator
 for_each_n(_InputIterator __first, _Size __n, _Function __f)
 {
@@ -4251,6 +4252,7 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
*  @return @p __searcher(__first,__last).first
   */
    template<typename _ForwardIterator, typename _Searcher>
+_GLIBCXX20_CONSTEXPR
 inline _ForwardIterator
 search(_ForwardIterator __first, _ForwardIterator __last,
   const _Searcher& __searcher)
diff --git a/libstdc++-v3/testsuite/25_algorithms/for_each/constexpr.cc 
b/libstdc++-v3/testsuite/25_algorithms/for_each/constexpr.cc
index 1bece35a0d9..b3aca23eccc 100644
--- a/libstdc++-v3/testsuite/25_algorithms/for_each/constexpr.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/for_each/constexpr.cc
@@ -34,3 +34,15 @@ test()
 }
 
 static_assert(test());
+
+constexpr bool
+test_n()
+{
+  int tot = 0;
+  auto sum = [&total = tot](int i){ total += i; };
+  auto sum2 = std::for_each_n(ca0.begin(), std::size(ca0)-1, sum);
+
+  return tot == 55;
+}
+
+static_assert(test_n());
diff --git a/libstdc++-v3/testsuite/25_algorithms/search/constexpr.cc 
b/libstdc++-v3/testsuite/25_algorithms/search/constexpr.cc
index ba9437eced7..e34194cfc5d 100644
--- a/libstdc++-v3/testsuite/25_algorithms/search/constexpr.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/search/constexpr.cc
@@ -31,6 +31,10 @@ test()
 cam.begin(), cam.end(),
 std::equal_to());
 
+  const auto outtt2
+= std::search(ca0.begin(), ca0.end(),
+ std::default_searcher(cam.begin(), cam.end()));
+
   return true;
 }
 
-- 
2.28.0.497.g54e85e7af1



Re: [PATCH] c++: Fix self-mapping in map_arguments [PR96531, PR97103]

2020-09-18 Thread Patrick Palka via Gcc-patches
On Fri, 18 Sep 2020, Patrick Palka wrote:

> On Fri, 18 Sep 2020, Jason Merrill wrote:
> 
> > On 9/18/20 4:07 PM, Patrick Palka wrote:
> > > With r10-8077 we stopped passing the argified current_template_parms to
> > > normalize_constraint_expression from finish_nested_requirement, and
> > > instead tweaked map_arguments to perform a self-mapping of parameters
> > > when args is NULL.  We're currently not handling parameter packs and
> > > BOUND_TEMPLATE_TEMPLATE_PARMs properly during this self-mapping, which
> > > leads to ICEs later during satisfaction.
> > > 
> > > To fix the self-mapping of a parameter pack, this patch makes
> > > map_arguments use template_parm_to_arg which already does the right
> > > thing for parameter packs.
> > > 
> > > Before r10-8077, a BOUND_TEMPLATE_TEMPLATE_PARM would get mapped to the
> > > corresponding TEMPLATE_TEMPLATE_PARM.  We could restore this behavior in
> > > map_arguments, but since a BOUND_TEMPLATE_TEMPLATE_PARM is not really a
> > > template parameter it seems better to make keep_template_parm not give
> > > us a BOUND_TEMPLATE_TEMPLATE_PARM in the first place.  I think what we
> > > actually want is to map the TEMPLATE_TEMPLATE_PARM to itself, so this
> > > patch adjusts keep_template_parm to give us the corresponding
> > > TEMPLATE_TEMPLATE_PARM of a BOUND_TEMPLATE_TEMPLATE_PARM instead.
> > > 
> > > Tested on x86_64-pc-linux-gnu, and also tested with the cmcstl2 library.
> > > Does this look OK for trunk/10?
> > > 
> > > gcc/cp/ChangeLog:
> > > 
> > >   PR c++/96531
> > >   PR c++/97103
> > >   * constraint.cc (map_arguments): Call template_parm_to_arg
> > >   appropriately when doing a self-mapping.
> > >   * pt.c (keep_template_parm): Don't record a
> > >   BOUND_TEMPLATE_TEMPLATE_PARM, instead record its corresponding
> > >   TEMPLATE_TEMPLATE_PARM.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   PR c++/96531
> > >   PR c++/97103
> > >   * g++.dg/cpp2a/concepts-ttp2.C: New test.
> > >   * g++.dg/cpp2a/concepts-variadic1.C: New test.
> > > ---
> > >   gcc/cp/constraint.cc  | 27 --
> > >   gcc/cp/pt.c   |  5 
> > >   gcc/testsuite/g++.dg/cpp2a/concepts-ttp2.C| 11 
> > >   .../g++.dg/cpp2a/concepts-variadic1.C | 28 +++
> > >   4 files changed, 62 insertions(+), 9 deletions(-)
> > >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-ttp2.C
> > >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-variadic1.C
> > > 
> > > diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> > > index 0aab3073cc1..43336d191d9 100644
> > > --- a/gcc/cp/constraint.cc
> > > +++ b/gcc/cp/constraint.cc
> > > @@ -546,15 +546,24 @@ static tree
> > >   map_arguments (tree parms, tree args)
> > >   {
> > > for (tree p = parms; p; p = TREE_CHAIN (p))
> > > -if (args)
> > > -  {
> > > - int level;
> > > - int index;
> > > - template_parm_level_and_index (TREE_VALUE (p), &level, &index);
> > > - TREE_PURPOSE (p) = TMPL_ARG (args, level, index);
> > > -  }
> > > -else
> > > -  TREE_PURPOSE (p) = TREE_VALUE (p);
> > > +{
> > > +  tree parm = TREE_VALUE (p);
> > > +  if (args)
> > > + {
> > > +   int level;
> > > +   int index;
> > > +   template_parm_level_and_index (parm, &level, &index);
> > > +   TREE_PURPOSE (p) = TMPL_ARG (args, level, index);
> > > + }
> > > +  else
> > > + {
> > > +   tree tpi = (TYPE_P (parm)
> > > +   ? TEMPLATE_TYPE_PARM_INDEX (parm) : parm);
> > > +   TREE_PURPOSE (p)
> > > + = template_parm_to_arg (build_tree_list (NULL_TREE,
> > > +  TEMPLATE_PARM_DECL (tpi)));
> > 
> > Doesn't passing 'p' to template_parm_to_arg work?
> 
> Unfortunately not, template_parm_to_arg expects a TREE_LIST node whose
> TREE_VALUE is the corresponding *_DECL for the template parm (i.e. it
> expects a node from current_template_parms), and 'p' is a
> TEMPLATE_TYPE_PARM, TEMPLATE_TEMPLATE_PARM or TEMPLATE_PARM_INDEX (given
> to us by find_template_parameters).

Oops, 'p' is of course a TREE_LIST of a TEMPLATE_PARM_P node.
But template_parm_to_arg wants a TREE_LIST of a DECL_TEMPLATE_PARM_P
node.

> 
> Would it be appropriate to adjust template_parm_to_arg to accept the
> latter inputs as well?

This version extends template_parm_to_arg to handle TEMPLATE_PARM_P
nodes alongside its existing DECL_TEMPLATE_PARM_P handling, which allows
us to simply pass 'p' to template_parm_to_arg from map_arguments.

Bootstrapped and regtested on x86_64-pc-linux-gnu and tested on cmcstl2
and range-v3.

-- >8 --

Subject: [PATCH] c++: Fix self-mapping in map_arguments [PR96531, PR97103]

With r10-8077 we stopped passing the argified current_template_parms to
normalize_constraint_expression from finish_nested_requirement, and
instead made map_arguments perform a self-mapping of parameters when
args is NULL.  But we're currently not handling parameter packs and
BOUND_TEMPLATE_TEMPLATE_PARMs properly 

Re: [Patch 5/5] rs6000, Conversions between 128-bit integer and floating point values.

2020-09-18 Thread will schmidt via Gcc-patches
On Tue, 2020-08-11 at 12:23 -0700, Carl Love wrote:
> Segher, Will:
> 
> Patch 5 adds the 128-bit integer to/from 128-floating point
> conversions.  This patch has to invoke the routines to use the 128-bit
> hardware instructions if on Power 10 or use software routines if
> running on a pre Power 10 system via the resolve function.  
> 
>   Carl 


Some mostly cosmetic bits below.
Thanks
-Will


> 
> ---
> Conversions between 128-bit integer and floating point values.
> 
> gcc/ChangeLog
> 
> 2020-08-10  Carl Love  
>   config/rs6000/rs6000.md (floatunsti2,
>   fix_truncti2, fixuns_truncti2): Add
>   define_insn for mode IEEE 128.

also floatti2



>   libgcc/config/rs6000/fixkfi-sw.c: New file.
>   libgcc/config/rs6000/fixkfi.c: Remove file.

Should that be fixkfti-sw.c  (missing t)?

Adjust to indicate this is a rename
libgcc/config/rs6000/fixkfti.c: Rename to
libgcc/config/rs6000/fixkfti-sw.c


>   libgcc/config/rs6000/fixunskfi-sw.c: New file.
>   libgcc/config/rs6000/fixunskfi.c: Remove file.
>   libgcc/config/rs6000/float128-hw.c (__floattikf_hw,
>   __floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw):
>   New functions.

>   libgcc/config/rs6000/float128-ifunc.c (SW_OR_HW_ISA3_1):
>   New macro.
>   (__floattikf_resolve, __floatuntikf_resolve, __fixkfti_resolve,
>   __fixunskfti_resolve): Add resolve functions.
>   (__floattikf, __floatuntikf, __fixkfti, __fixunskfti): New
>   functions.
>   libgcc/config/rs6000/float128-sed (floattitf, __floatuntitf,
>   __fixtfti, __fixunstfti): Add editor commands to change
>   names.
>   libgcc/config/rs6000/float128-sed-hw (__floattitf,
>   __floatuntitf, __fixtfti, __fixunstfti): Add editor commands
>   to change names.
>   libgcc/config/rs6000/floattikf-sw.c: New file.
>   libgcc/config/rs6000/floattikf.c: Remove file.
>   libgcc/config/rs6000/floatuntikf-sw.c: New file.
>   libgcc/config/rs6000/floatuntikf.c: Remove file.
>   libgcc/config/rs6000/floatuntikf-sw.c: New file.
>   libgcc/config/rs6000/quaad-float128.h (__floattikf_sw,
>   __floatuntikf_sw, __fixkfti_sw, __fixunskfti_sw, __floattikf_hw,
>   __floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw, __floattikf,
>   __floatuntikf, __fixkfti, __fixunskfti):New extern declarations.

no tab.

>   libgcc/config/rs6000/t-float128 (floattikf, floatuntikf,
>   fixkfti, fixunskfti): Remove file names from fp128_ppc_funcs.
>   (floattikf-sw, floatuntikf-sw, fixkfti-sw, fixunskfti-sw): Add
>   file names to fp128_ppc_funcs.
> 
> gcc/testsuite/ChangeLog
> 
> 2020-08-10  Carl Love  
>   gcc.target/powerpc/fl128_conversions.c: New file.
> ---
>  gcc/config/rs6000/rs6000.md   |  36 +++
>  .../gcc.target/powerpc/fp128_conversions.c| 287 ++
>  .../config/rs6000/{fixkfti.c => fixkfti-sw.c} |   4 +-
>  .../rs6000/{fixunskfti.c => fixunskfti-sw.c}  |   4 +-
>  libgcc/config/rs6000/float128-hw.c|  24 ++
>  libgcc/config/rs6000/float128-ifunc.c |  44 ++-
>  libgcc/config/rs6000/float128-sed |   4 +
>  libgcc/config/rs6000/float128-sed-hw  |   4 +
>  .../rs6000/{floattikf.c => floattikf-sw.c}|   4 +-
>  .../{floatuntikf.c => floatuntikf-sw.c}   |   4 +-
>  libgcc/config/rs6000/quad-float128.h  |  17 +-
>  libgcc/config/rs6000/t-float128   |   3 +-
>  12 files changed, 415 insertions(+), 20 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
>  rename libgcc/config/rs6000/{fixkfti.c => fixkfti-sw.c} (96%)
>  rename libgcc/config/rs6000/{fixunskfti.c => fixunskfti-sw.c} (96%)
>  rename libgcc/config/rs6000/{floattikf.c => floattikf-sw.c} (96%)
>  rename libgcc/config/rs6000/{floatuntikf.c => floatuntikf-sw.c} (96%)
> 
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 43b620ae1c0..3853ebd4195 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -6390,6 +6390,42 @@
> xscvsxddp %x0,%x1"
>[(set_attr "type" "fp")])
> 
> +(define_insn "floatti2"
> +  [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
> +   (float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +{
> +  return  "xscvsqqp %0,%1";
> +}
> +  [(set_attr "type" "fp")])
> +
> +(define_insn "floatunsti2"
> +  [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
> +   (unsigned_float:IEEE128 (match_operand:TI 1 "vsx_register_operand" 
> "v")))]
> +  "TARGET_POWER10"
> +{
> +  return  "xscvuqqp %0,%1";
> +}
> +  [(set_attr "type" "fp")])
> +
> +(define_insn "fix_truncti2"
> +  [(set (match_operand:TI 0 "vsx_register_operand" "=v")
> +   (fix:TI (match_operand:IEEE128 1 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +{
> +  return  "xscvqpsqz %0,%1";
> +}
> +  

Re: [PATCH] CSE negated multiplications and divisions

2020-09-18 Thread Segher Boessenkool
Hi!

On Thu, Sep 17, 2020 at 01:20:35PM +0200, Richard Biener wrote:
> This adds the capability to look for available negated multiplications
> and divisions, replacing them with cheaper negates.

It is longer latency than the original insns.  Combine will try to undo
this, because of that (it depends on the insn costs if that can
succeed).  On gimple it is always cheaper, of course.
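
For reference, the kind of source pattern this matches (an illustrative
C++ sketch only; the actual transformation happens on GIMPLE):

  // If x * y is already available, the negated product can be CSEd into
  // a negate of the existing result instead of a second multiplication.
  double
  f (double x, double y, double &prod)
  {
    prod = x * y;
    return -x * y;   // candidate: becomes -prod
  }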


Segher


Re: [PATCH] bpf: use xBPF signed div, mod insns when available

2020-09-18 Thread Segher Boessenkool
Hi!

On Thu, Sep 17, 2020 at 10:15:30AM -0700, David Faust via Gcc-patches wrote:
> The 'mod' and 'div' operators in eBPF are unsigned, with no signed
> counterpart. xBPF adds two new ALU operations, sdiv and smod, for
> signed division and modulus, respectively. Update bpf.md with
> 'define_insn' blocks for signed div and mod to use them when targetting
> xBPF, and add new tests to ensure they are used appropriately.

So why does xBPF have signed versions of the divides?  Is it because it
is wanted to have it in eBPF eventually?  Is it because the libgcc
routines are just too slow?  Is it because (the generic) libgcc does not
trap for MIN_INT / -1 ?  Some other reason?

(I'm just curious; I cannot figure it out :-) )
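
For readers wondering why MIN_INT / -1 is singled out: the true quotient
does not fit in the type, so the operation is undefined; hardware division
traps on some targets, while a software routine may silently wrap.  A
hedged C++ illustration of the two problem cases for signed division:

  #include <cstdint>
  #include <limits>

  bool
  safe_divide (std::int64_t a, std::int64_t b, std::int64_t &quot)
  {
    if (b == 0
        || (a == std::numeric_limits<std::int64_t>::min () && b == -1))
      return false;   // both cases are undefined for the built-in operator/
    quot = a / b;
    return true;
  }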


Segher


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-09-18 Thread Segher Boessenkool
Hi!

On Fri, Sep 18, 2020 at 03:31:12PM -0500, Qing Zhao wrote:
> Let me know your opinion:
> 
> A.  Will not provide default definition in middle end to generate the zeroing 
> insn for selected registers.  Move the generation work all to target; X86 
> implementation will be provided;
> 
> OR:
> 
> B.  Will provide a default definition in middle end to generate the zeroing 
> insn for selected registers. Then need to add a new target hook 
> “ZERO_CALL_USED_REGNO_P(REGNO, GPR_ONLY)”, same as A, X86 implementation will 
> be provided in my patch. 

Is this just to make the xor thing work?  i386 has a peephole to
transform the mov to a xor for this (and the backend could just handle
it in its mov patterns, maybe a peephole was easier for i386, no
idea).


Segher


Re: [PATCH] PR fortran/97036 - [F2018] Allow ELEMENTAL RECURSIVE procedure prefix

2020-09-18 Thread Jerry DeLisle via Gcc-patches

ok, thanks .

On 9/15/20 1:35 PM, Harald Anlauf wrote:

As stated in the PR, the Fortran 2018 standard removed the restriction
prohibiting ELEMENTAL RECURSIVE procedures.  Adjust the relevant check.

Regtested on x86_64-pc-linux-gnu.

OK for master?

Thanks,
Harald


PR fortran/97036 - [F2018] Allow ELEMENTAL RECURSIVE procedure prefix

gcc/fortran/ChangeLog:

* symbol.c (gfc_check_conflict): Allow ELEMENTAL RECURSIVE
procedure prefix for -std=f2018.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr97036.f90: New test.





[committed] analyzer: fix warning_event::get_desc for global state changes

2020-09-18 Thread David Malcolm via Gcc-patches
When experimenting the a new state_machine with global state I noticed
that the fallback handling in warning_event::get_desc assumes we have
per-value states, and ICEs on global states.  Fixed thusly.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as 1df487a5204462f1ec0b1eab39b0a463a8d6b89f.

gcc/analyzer/ChangeLog:
* checker-path.cc (warning_event::get_desc): Handle global state
changes.
---
 gcc/analyzer/checker-path.cc | 29 +
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/gcc/analyzer/checker-path.cc b/gcc/analyzer/checker-path.cc
index 2503d024a83..c28131651c6 100644
--- a/gcc/analyzer/checker-path.cc
+++ b/gcc/analyzer/checker-path.cc
@@ -872,11 +872,17 @@ warning_event::get_desc (bool can_colorize) const
{
  if (m_sm && flag_analyzer_verbose_state_changes)
{
- label_text result
-   = make_label_text (can_colorize,
-  "%s (%qE is in state %qs)",
-  ev_desc.m_buffer,
-  m_var, m_state->get_name ());
+ label_text result;
+ if (m_var)
+   result = make_label_text (can_colorize,
+ "%s (%qE is in state %qs)",
+ ev_desc.m_buffer,
+ m_var, m_state->get_name ());
+ else
+   result = make_label_text (can_colorize,
+ "%s (in global state %qs)",
+ ev_desc.m_buffer,
+ m_state->get_name ());
  ev_desc.maybe_free ();
  return result;
}
@@ -886,9 +892,16 @@ warning_event::get_desc (bool can_colorize) const
 }
 
   if (m_sm)
-return make_label_text (can_colorize,
-   "here (%qE is in state %qs)",
-   m_var, m_state->get_name ());
+{
+  if (m_var)
+   return make_label_text (can_colorize,
+   "here (%qE is in state %qs)",
+   m_var, m_state->get_name ());
+  else
+   return make_label_text (can_colorize,
+   "here (in global state %qs)",
+   m_state->get_name ());
+}
   else
 return label_text::borrow ("here");
 }
-- 
2.26.2



[committed] analyzer: handle strdup and strndup

2020-09-18 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as c89956cba9d1a5fbf059f7880ff49418718a2965.

gcc/analyzer/ChangeLog:
* sm-malloc.cc (malloc_state_machine::on_stmt): Handle strdup and
strndup as being malloc-like allocators.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/strdup-1.c: New test.
* gcc.dg/analyzer/strndup-1.c: New test.
---
 gcc/analyzer/sm-malloc.cc |  4 +++-
 gcc/testsuite/gcc.dg/analyzer/strdup-1.c  | 21 +
 gcc/testsuite/gcc.dg/analyzer/strndup-1.c | 21 +
 3 files changed, 45 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/strdup-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/strndup-1.c

diff --git a/gcc/analyzer/sm-malloc.cc b/gcc/analyzer/sm-malloc.cc
index 2b5a870d35f..90d1da14586 100644
--- a/gcc/analyzer/sm-malloc.cc
+++ b/gcc/analyzer/sm-malloc.cc
@@ -984,7 +984,9 @@ malloc_state_machine::on_stmt (sm_context *sm_ctxt,
|| is_std_named_call_p (callee_fndecl, "malloc", call, 1)
|| is_std_named_call_p (callee_fndecl, "calloc", call, 2)
|| is_named_call_p (callee_fndecl, "__builtin_malloc", call, 1)
-   || is_named_call_p (callee_fndecl, "__builtin_calloc", call, 2))
+   || is_named_call_p (callee_fndecl, "__builtin_calloc", call, 2)
+   || is_named_call_p (callee_fndecl, "strdup", call, 1)
+   || is_named_call_p (callee_fndecl, "strndup", call, 2))
  {
on_allocator_call (sm_ctxt, call, m_malloc);
return true;
diff --git a/gcc/testsuite/gcc.dg/analyzer/strdup-1.c 
b/gcc/testsuite/gcc.dg/analyzer/strdup-1.c
new file mode 100644
index 000..6b950ca23a9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/strdup-1.c
@@ -0,0 +1,21 @@
+#include <string.h>
+#include <stdlib.h>
+
+extern void requires_nonnull (void *ptr)
+  __attribute__((nonnull));
+
+void test_1 (const char *s)
+{
+  char *p = strdup (s); /* { dg-message "allocated here" } */
+} /* { dg-warning "leak of 'p'" } */
+
+void test_2 (const char *s)
+{
+  char *p = strdup (s);
+  free (p);
+}
+void test_3 (const char *s)
+{
+  char *p = strdup (s); /* { dg-message "this call could return NULL" } */
+  requires_nonnull (p); /* { dg-warning "use of possibly-NULL 'p'" } */
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/strndup-1.c 
b/gcc/testsuite/gcc.dg/analyzer/strndup-1.c
new file mode 100644
index 000..23d9b6070ce
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/strndup-1.c
@@ -0,0 +1,21 @@
+#include <string.h>
+#include <stdlib.h>
+
+extern void requires_nonnull (void *ptr)
+  __attribute__((nonnull));
+
+void test_1 (const char *s)
+{
+  char *p = strndup (s, 42); /* { dg-message "allocated here" } */
+} /* { dg-warning "leak of 'p'" } */
+
+void test_2 (const char *s)
+{
+  char *p = strndup (s, 42);
+  free (p);
+}
+void test_3 (const char *s)
+{
+  char *p = strndup (s, 42); /* { dg-message "this call could return NULL" } */
+  requires_nonnull (p); /* { dg-warning "use of possibly-NULL 'p'" } */
+}
-- 
2.26.2



Re: [PATCH] irange_pool class

2020-09-18 Thread Andrew MacLeod via Gcc-patches

On 9/18/20 4:35 PM, Martin Sebor wrote:

On 9/18/20 11:36 AM, Andrew MacLeod wrote:

On

it works exactly like one would expect a simple allocator to work: as
long as the allocator is "live", its allocations are live.  Once it is
destructed, all the memory it manages is freed.  Its purpose is to avoid
having to walk/find everything that was allocated so it can be freed.


I'll give you the use case from the ranger. in fact, it *is* the 
ranger's allocator, exposed for other passes to use.


Ranger uses the allocator to store the live-on-entry ranges for 
ssa-names.  Each basic block has a vec allocated as needed 
which is indexed by ssa-name.


int_range_max is passed to range_on_entry() to go off and find the 
range.  When it comes back, it could have 0-255 subranges; it
doesn't matter.
We call allocate(range) to get a pointer to an efficient copy and
store it in the vector for the ssa name in that block.
If the updater later discovers that the range can be made more 
accurate, it checks if the new range fits in the memory allocated and 
if it does, simply copies.  If it doesn't fit, then it frees the old
hunk, and allocates a new one and stores that.


The ranger creates an allocator when it starts up, and then frees it 
when it's being destructed.  That's the life of the on-entry cache, so
that matches the life of the allocator.


I don't understand the proxy comment, or why one would ever want to
copy the allocator itself, or why you would derive from irange.  Why
can't you just create an allocator when the pass starts, use it when
you need to store a range, and then let it go at the end of the pass
with the other memory?


The int_range<N> template is derived from irange and provides the array
of trees that the irange works with.  The pool also provides memory
for iranges and effectively returns objects "derived" from irange
(they're bigger than it is).

What I meant by proxy is a substitute class each of whose objects
stands in for a single instance of int_range<N> where N is
a runtime value.  There's no class like that.



No, but that doesn't serve a lot of purpose either.  You can call
allocator.allocate(N), which is effectively that... ?




It's meant to be a convenient way to get a range allocated to store via
a pointer, nothing more.  If you have more complex needs, then you
need to manage those needs.  The ranger manages the live-on-entry
vectors separately from the ranges.


What I'm thinking of is actually more basic than that: an irange
class with a runtime number of subranges, one that can be either
directly stored in a container like vec:

  vec<dynamic_range>

where dynamic_range is such a class, or that can be a member of
a class that's stored in it.  I expect that will be the default
use case for the passes that walk the IL looking for the sizes
and offsets into the objects, accesses to which they validate.
(This can be done with int_range<N> for constant N but not with
irange because it doesn't own the memory it works with.)

To illustrate what I mean here's a completely untested outline
of a plain-Jane dynamic_irange class (it won't compile because
it accesses private and protected members of irange, but it
should give you the idea):

  class dynamic_irange: public irange
  {
  public:
    dynamic_irange (unsigned num_pairs)
  : irange (new tree[num_pairs], num_pairs) { }

    dynamic_irange (const dynamic_irange &rhs)
  : irange (new tree[rhs.m_max_ranges], rhs.m_num_ranges)
    { irange::operator= (rhs); }

    dynamic_irange (dynamic_irange &&rhs)
  : irange (rhs.m_base, rhs.m_max_ranges)
    { rhs.m_base = 0; }

    dynamic_irange& operator= (const dynamic_irange &rhs)
    {
  if (this != &rhs)
    {
  delete[] m_base;
  m_base = new tree[rhs.m_max_ranges];
  m_num_ranges = rhs.m_num_ranges;
  irange::operator= (rhs);
    }
  return *this;
    }
    ~dynamic_irange () { delete[] m_base; }
  };

A more fancy class would be parameterized on an Allocator policy
that it would allocate memory with, much like C++ standard classes
do.  That would let you define an obstack-based allocator class to
use the way you need, as well and any others.  (Providing
the allocator as a template argument to just the ctor as opposed
to the whole class itself would make different "instances"
interchangeable for one another.)

Martin


 We had a dynamically sized irange; I told Aldy to kill it off and we
replaced it with int_range_max with 255 ranges, because the combo of
int_range_max for calculating and allocation to store seemed to solve
all the problems with far less allocation overhead, and wasn't
particularly onerous.


int_range_max used to have a small vector of something like 5 pairs.  If
a range was being created and we needed more by accessing elements
higher than that, it would allocate a hunk of memory to be able to
handle it, plus a little extra buffer space, and point to that instead.
So it was a dynamically sized int_range_max that managed it
Re: [PATCH] c++: Fix self-mapping in map_arguments [PR96531, PR97103]

2020-09-18 Thread Patrick Palka via Gcc-patches
On Fri, 18 Sep 2020, Jason Merrill wrote:

> On 9/18/20 4:07 PM, Patrick Palka wrote:
> > With r10-8077 we stopped passing the argified current_template_parms to
> > normalize_constraint_expression from finish_nested_requirement, and
> > instead tweaked map_arguments to perform a self-mapping of parameters
> > when args is NULL.  We're currently not handling parameter packs and
> > BOUND_TEMPLATE_TEMPLATE_PARMs properly during this self-mapping, which
> > leads to ICEs later during satisfaction.
> > 
> > To fix the self-mapping of a parameter pack, this patch makes
> > map_arguments use template_parm_to_arg which already does the right
> > thing for parameter packs.
> > 
> > Before r10-8077, a BOUND_TEMPLATE_TEMPLATE_PARM would get mapped to the
> > corresponding TEMPLATE_TEMPLATE_PARM.  We could restore this behavior in
> > map_arguments, but since a BOUND_TEMPLATE_TEMPLATE_PARM is not really a
> > template parameter it seems better to make keep_template_parm not give
> > us a BOUND_TEMPLATE_TEMPLATE_PARM in the first place.  I think what we
> > actually want is to map the TEMPLATE_TEMPLATE_PARM to itself, so this
> > patch adjusts keep_template_parm to give us the corresponding
> > TEMPLATE_TEMPLATE_PARM of a BOUND_TEMPLATE_TEMPLATE_PARM instead.
> > 
> > Tested on x86_64-pc-linux-gnu, and also tested with the cmcstl2 library.
> > Does this look OK for trunk/10?
> > 
> > gcc/cp/ChangeLog:
> > 
> > PR c++/96531
> > PR c++/97103
> > * constraint.cc (map_arguments): Call template_parm_to_arg
> > appropriately when doing a self-mapping.
> > * pt.c (keep_template_parm): Don't record a
> > BOUND_TEMPLATE_TEMPLATE_PARM, instead record its corresponding
> > TEMPLATE_TEMPLATE_PARM.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > PR c++/96531
> > PR c++/97103
> > * g++.dg/cpp2a/concepts-ttp2.C: New test.
> > * g++.dg/cpp2a/concepts-variadic1.C: New test.
> > ---
> >   gcc/cp/constraint.cc  | 27 --
> >   gcc/cp/pt.c   |  5 
> >   gcc/testsuite/g++.dg/cpp2a/concepts-ttp2.C| 11 
> >   .../g++.dg/cpp2a/concepts-variadic1.C | 28 +++
> >   4 files changed, 62 insertions(+), 9 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-ttp2.C
> >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-variadic1.C
> > 
> > diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> > index 0aab3073cc1..43336d191d9 100644
> > --- a/gcc/cp/constraint.cc
> > +++ b/gcc/cp/constraint.cc
> > @@ -546,15 +546,24 @@ static tree
> >   map_arguments (tree parms, tree args)
> >   {
> > for (tree p = parms; p; p = TREE_CHAIN (p))
> > -if (args)
> > -  {
> > -   int level;
> > -   int index;
> > -   template_parm_level_and_index (TREE_VALUE (p), &level, &index);
> > -   TREE_PURPOSE (p) = TMPL_ARG (args, level, index);
> > -  }
> > -else
> > -  TREE_PURPOSE (p) = TREE_VALUE (p);
> > +{
> > +  tree parm = TREE_VALUE (p);
> > +  if (args)
> > +   {
> > + int level;
> > + int index;
> > + template_parm_level_and_index (parm, &level, &index);
> > + TREE_PURPOSE (p) = TMPL_ARG (args, level, index);
> > +   }
> > +  else
> > +   {
> > + tree tpi = (TYPE_P (parm)
> > + ? TEMPLATE_TYPE_PARM_INDEX (parm) : parm);
> > + TREE_PURPOSE (p)
> > +   = template_parm_to_arg (build_tree_list (NULL_TREE,
> > +  TEMPLATE_PARM_DECL (tpi)));
> 
> Doesn't passing 'p' to template_parm_to_arg work?

Unfortunately not, template_parm_to_arg expects a TREE_LIST node whose
TREE_VALUE is the corresponding *_DECL for the template parm (i.e. it
expects a node from current_template_parms), and 'p' is a
TEMPLATE_TYPE_PARM, TEMPLATE_TEMPLATE_PARM or TEMPLATE_PARM_INDEX (given
to us by find_template_parameters).

Would it be appropriate to adjust template_parm_to_arg to accept the
latter inputs as well?

> 
> > +   }
> > +}
> >   return parms;
> >   }
> > diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> > index cfe5ff4a94f..55d8060b911 100644
> > --- a/gcc/cp/pt.c
> > +++ b/gcc/cp/pt.c
> > @@ -10539,6 +10539,11 @@ keep_template_parm (tree t, void* data)
> > if (level > ftpi->max_depth)
> >   return 0;
> >   +  if (TREE_CODE (t) == BOUND_TEMPLATE_TEMPLATE_PARM)
> > +/* A BOUND_TEMPLATE_TEMPLATE_PARM isn't a template parameter.  What we
> > +   really want is the corresponding TEMPLATE_TEMPLATE_PARM.  */
> > +t = TREE_TYPE (TEMPLATE_TEMPLATE_PARM_TEMPLATE_DECL (t));
> > +
> > /* Arguments like const T yield parameters like const T. This means that
> >a template-id like X would yield two distinct parameters:
> >T and const T. Adjust types to their unqualified versions.  */
> > diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-ttp2.C
> > b/gcc/testsuite/g++.dg/cpp2a/concepts-ttp2.C
> > new file mode 100644
> > index 000..7f4883754dd
> > --- /dev/null
> > +++ 

Re: [PATCH] generalized range_query class for multiple contexts

2020-09-18 Thread Aldy Hernandez via Gcc-patches

Forgot to include ChangeLog entries.

Aldy
>From 49246a5aa51aff0e1beb97b8415985ffdbd5d922 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Thu, 17 Sep 2020 09:23:12 +0200
Subject: [PATCH 1/3] Initial implementation of value query class.

gcc/ChangeLog:

* Makefile.in: Add value-query.o.
* value-query.cc: New file.
* value-query.h: New file.
---
 gcc/Makefile.in|   1 +
 gcc/value-query.cc | 161 +
 gcc/value-query.h  |  97 +++
 3 files changed, 259 insertions(+)
 create mode 100644 gcc/value-query.cc
 create mode 100644 gcc/value-query.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 79e854aa938..9bc568fa9fd 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1644,6 +1644,7 @@ OBJS = \
 	typed-splay-tree.o \
 	unique-ptr-tests.o \
 	valtrack.o \
+	value-query.o \
 	value-range.o \
 	value-range-equiv.o \
 	value-prof.o \
diff --git a/gcc/value-query.cc b/gcc/value-query.cc
new file mode 100644
index 000..7afa1ae5250
--- /dev/null
+++ b/gcc/value-query.cc
@@ -0,0 +1,161 @@
+/* Support routines for value queries.
+   Copyright (C) 2020 Free Software Foundation, Inc.
+   Contributed by Aldy Hernandez  and
+   Andrew MacLeod .
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple.h"
+#include "ssa.h"
+#include "tree-pretty-print.h"
+#include "fold-const.h"
+#include "value-range-equiv.h"
+#include "value-query.h"
+#include "alloc-pool.h"
+
+// value_query default methods.
+
+tree
+value_query::value_on_edge (edge, tree name)
+{
+  return value_of_expr (name);
+}
+
+tree
+value_query::value_of_stmt (gimple *stmt, tree name)
+{
+  if (!name)
+name = gimple_get_lhs (stmt);
+
+  gcc_checking_assert (!name || name == gimple_get_lhs (stmt));
+
+  if (name)
+return value_of_expr (name);
+  return NULL_TREE;
+}
+
+// range_query default methods.
+
+bool
+range_query::range_on_edge (irange , edge, tree name)
+{
+  return range_of_expr (r, name);
+}
+
+bool
+range_query::range_of_stmt (irange , gimple *stmt, tree name)
+{
+  if (!name)
+name = gimple_get_lhs (stmt);
+
+  gcc_checking_assert (!name || name == gimple_get_lhs (stmt));
+
+  if (name)
+return range_of_expr (r, name);
+  return false;
+}
+
+tree
+range_query::value_of_expr (tree name, gimple *stmt)
+{
+  tree t;
+  value_range r;
+
+  if (!irange::supports_type_p (TREE_TYPE (name)))
+return NULL_TREE;
+  if (range_of_expr (r, name, stmt) && r.singleton_p (&t))
+return t;
+  return NULL_TREE;
+}
+
+tree
+range_query::value_on_edge (edge e, tree name)
+{
+  tree t;
+  value_range r;
+
+  if (!irange::supports_type_p (TREE_TYPE (name)))
+return NULL_TREE;
+  if (range_on_edge (r, e, name) && r.singleton_p (&t))
+return t;
+  return NULL_TREE;
+
+}
+
+tree
+range_query::value_of_stmt (gimple *stmt, tree name)
+{
+  tree t;
+  value_range r;
+
+  if (!name)
+name = gimple_get_lhs (stmt);
+
+  gcc_checking_assert (!name || name == gimple_get_lhs (stmt));
+
+  if (!name || !irange::supports_type_p (TREE_TYPE (name)))
+return NULL_TREE;
+  if (range_of_stmt (r, stmt, name) && r.singleton_p (&t))
+return t;
+  return NULL_TREE;
+
+}
+
+// valuation_query support routines for value_range_equiv's.
+
+class equiv_allocator : public object_allocator<value_range_equiv>
+{
+public:
+  equiv_allocator ()
+: object_allocator<value_range_equiv> ("equiv_allocator pool") { }
+};
+
+value_range_equiv *
+range_query::allocate_value_range_equiv ()
+{
+  return new (equiv_alloc->allocate ()) value_range_equiv;
+}
+
+void
+range_query::free_value_range_equiv (value_range_equiv *v)
+{
+  equiv_alloc->remove (v);
+}
+
+const class value_range_equiv *
+range_query::get_value_range (const_tree expr, gimple *stmt)
+{
+  int_range_max r;
+  if (range_of_expr (r, const_cast<tree> (expr), stmt))
+return new (equiv_alloc->allocate ()) value_range_equiv (r);
+  return new (equiv_alloc->allocate ()) value_range_equiv (TREE_TYPE (expr));
+}
+
+range_query::range_query ()
+{
+  equiv_alloc = new equiv_allocator;
+}
+
+range_query::~range_query ()
+{
+  equiv_alloc->release ();
+}
diff --git a/gcc/value-query.h b/gcc/value-query.h
new file mode 100644
index 000..a64e37d7596
--- /dev/null
+++ b/gcc/value-query.h
@@ -0,0 +1,97 @@
+/* Support routines for value queries.
+   

Re: [PATCH] irange_pool class

2020-09-18 Thread Martin Sebor via Gcc-patches

On 9/18/20 11:36 AM, Andrew MacLeod wrote:

On 9/18/20 1:07 PM, Martin Sebor wrote:

On 9/18/20 8:10 AM, Aldy Hernandez via Gcc-patches wrote:



On 9/18/20 2:28 PM, David Malcolm wrote:

On Fri, 2020-09-18 at 07:49 +0200, Aldy Hernandez wrote:


On 9/18/20 3:43 AM, David Malcolm wrote:

On Thu, 2020-09-17 at 12:36 +0200, Aldy Hernandez via Gcc-patches
wrote:

This is the irange storage class. It is used to allocate the
minimum
amount of storage needed for a given irange.  Storage is
automatically
freed at destruction.

It is meant for long term storage, as opposed to int_range_max
which
is
meant for intermediate temporary results on the stack.

The general gist is:

irange_pool pool;

// Allocate an irange of 5 sub-ranges.
irange *p = pool.allocate (5);

// Allocate an irange of 3 sub-ranges.
irange *q = pool.allocate (3);

// Allocate an irange with as many sub-ranges as are currently
// used in "some_other_range".
irange *r = pool.allocate (some_other_range);


FWIW my first thoughts reading this example were - "how do I
deallocate
these iranges?" and "do I need to call pool.deallocate on them, or
is
that done for me by the irange dtor?"


Thus the description front and center of the header file:

"Storage is automatically freed at destruction..."


I think of a "pool allocator" as something that makes a small
number of
large allocation under the covers, and then uses that to serve
large
numbers of fixed sized small allocations and deallocations with
O(1)
using a free list.


Ah, I didn't know pool had a different meaning.


See e.g. gcc/alloc-pool.h
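
As an aside, a minimal stand-alone C++ sketch of the distinction being
drawn (illustrative only, not GCC code): a bump/obstack-style arena hands
out pieces of larger chunks and frees everything only when the arena
itself dies, which is the lifetime model described for irange_pool,
whereas an alloc-pool recycles fixed-size objects individually through a
free list.

  #include <cstddef>
  #include <cstdlib>
  #include <vector>

  class bump_arena
  {
    static constexpr std::size_t chunk_size = 4096;
    std::vector<char *> m_chunks;
    std::size_t m_used = chunk_size;   // force a chunk on first allocate
  public:
    bump_arena () = default;
    bump_arena (const bump_arena &) = delete;            // owning,
    bump_arena &operator= (const bump_arena &) = delete; // non-copyable

    // Assumes requests are small (<= chunk_size), as for range objects.
    void *allocate (std::size_t n)
    {
      n = (n + alignof (std::max_align_t) - 1)
          & ~(alignof (std::max_align_t) - 1);
      if (m_used + n > chunk_size)
        {
          m_chunks.push_back (static_cast<char *> (std::malloc (chunk_size)));
          m_used = 0;
        }
      void *p = m_chunks.back () + m_used;
      m_used += n;
      return p;   // no per-object free
    }

    ~bump_arena ()
    {
      for (char *c : m_chunks)   // everything is released with the arena
        std::free (c);
    }
  };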



[...]


+// This is the irange storage class.  It is used to allocate the
+// minimum amount of storage needed for a given irange.  Storage
is
+// automatically freed at destruction.


"at destruction" of what object - the irange or the irange_pool?
Reading the code, it turns out to be "at destruction of the
irange_pool", and it turns out that irange_pool is an obstack under
the
covers (also called a "bump allocator") and thus, I believe, the
lifetime of the irange instances is that of the storage instance.


The sentence is talking about the storage class, so I thought it was
obvious that the destruction we talk about is the storage class
itself.
I suppose if it isn't clear we could say:

"Storage is automatically freed at destruction of the storage class."




I think it would be clearer to name this "irange_obstack", or
somesuch.


I'd prefer something more generic.  We don't want to tie the name of
the
allocator to the underlying implementation.  What if we later change
to
malloc?  We'd have to change the name to irange_malloc.



irange_allocator?  Or is there something more generically appropriate
here?


How about "irange_bump_allocator?"   Rather long, but it expresses the
key point that the irange instances coming out of it don't have
independent lifetimes - their lifetimes are those of the allocator; the
client code doesn't need to find and clean up all of those individual
iranges, right?  (assuming I'm reading the code correctly) (That name
also avoids mentioning the implementation detail that it uses obstack).


I'm sorry, but that's _really_ ugly.

Surely irange_allocator can't be that confusing.  A casual look at 
the header file would dispel all doubts.


David's point about different strategies was also in the back of my
mind, but it took me a bit to formulate a question about the design.
Is a pool of ranges with a fixed allocation policy the right design
for long-term storage of irange objects?  What are some example use
cases?

Here's one that comes to mind based on what I want to do in
gimple-ssa-isolate-paths.c.  I need to store irange instances as
members of my own class, but I don't know how many subranges each
instance might need to store (it depends on the IL).  I store
objects of this class in a container (e.g., hash_map or set).
I don't want to use int_range_max because that would be wasteful
but I can't use the pool as a proxy because it's not copyable.
So I either have to dynamically allocate the pool and store
a pointer to it instead, in addition to the instances, or derive
my own class from irange and manage the tree arrays myself.  In
both cases I'm adding a layer of memory management on top of what
that the pool is there to provide.  So the design doesn't seem
very well suited for my use case.

I don't mean this as an objection to the patch (I'm sure there's
a use case I'm not thinking of), but more as a question.

Martin



I don't know why  this is confusing...

it works exactly like one would expect a simple allocator to work: as
long as the allocator is "live", its allocations are live.  Once it is
destructed, all the memory it manages is freed.  Its purpose is to
avoid having to walk/find everything that was allocated so it can be freed.


I'll give you the use case from the ranger. in fact, it *is* the 
ranger's allocator, exposed for other passes to use.


Ranger uses the allocator to 

Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-09-18 Thread Qing Zhao via Gcc-patches
Hi, Richard,

During my implementation of the new version of the patch, I still feel that
it’s not practical to add a default definition in the middle end that just uses
move patterns to zero each selected register.

The major issues are:

There is some target-specific information on how to define the “general register”
set and the “all register” set, so we have to add a new target hook to get
such target-specific information and pass it to the middle end.


For example, on X86, for CALL_USED_REGISTERS, we have:

#define CALL_USED_REGISTERS \
/*ax,dx,cx,bx,si,di,bp,sp,st,st1,st2,st3,st4,st5,st6,st7*/  \
{  1, 1, 1, 0, 4, 4, 0, 1, 1,  1,  1,  1,  1,  1,  1,  1,   \
/*arg,flags,fpsr,frame*/\
1,   1,1,1, \
/*xmm0,xmm1,xmm2,xmm3,xmm4,xmm5,xmm6,xmm7*/ \
 1,   1,   1,   1,   1,   1,   6,   6,  \
/* mm0, mm1, mm2, mm3, mm4, mm5, mm6, mm7*/   


From the above, we can see that “st0 to st7” are call-used registers for x86;
however, we should not zero these registers on x86.

Such details are only known by the x86 backend.

I guess that other platforms might have similar issue. 

If we still want a default definition in the middle end to generate the zeroing
insn for selected registers, I have to add another target hook, say,
“ZERO_CALL_USED_REGNO_P(REGNO, GPR_ONLY)”, to check whether a register should be
zeroed based on gpr_only (general register only) and a target-specific decision.
I will provide an x86 implementation for this target hook in this patch.

Other targets have to implement this new target hook to utilize the default 
handler. 

Let me know your opinion:

A.  Will not provide default definition in middle end to generate the zeroing 
insn for selected registers.  Move the generation work all to target; X86 
implementation will be provided;

OR:

B.  Will provide a default definition in middle end to generate the zeroing 
insn for selected registers. Then need to add a new target hook 
“ZERO_CALL_USED_REGNO_P(REGNO, GPR_ONLY)”, same as A, X86 implementation will 
be provided in my patch. 


thanks.

Qing


> On Sep 11, 2020, at 4:44 PM, Richard Sandiford  
> wrote:
> 
> Having a target hook sounds good, but I think it should have a
> default definition that just uses the move patterns to zero each
> selected register.  I expect the default will be good enough for
> most targets.
> 
> Thanks,
> Richard



Re: [PATCH] c++: Fix self-mapping in map_arguments [PR96531, PR97103]

2020-09-18 Thread Jason Merrill via Gcc-patches

On 9/18/20 4:07 PM, Patrick Palka wrote:

With r10-8077 we stopped passing the argified current_template_parms to
normalize_constraint_expression from finish_nested_requirement, and
instead tweaked map_arguments to perform a self-mapping of parameters
when args is NULL.  We're currently not handling parameter packs and
BOUND_TEMPLATE_TEMPLATE_PARMs properly during this self-mapping, which
leads to ICEs later during satisfaction.

To fix the self-mapping of a parameter pack, this patch makes
map_arguments use template_parm_to_arg which already does the right
thing for parameter packs.

Before r10-8077, a BOUND_TEMPLATE_TEMPLATE_PARM would get mapped to the
corresponding TEMPLATE_TEMPLATE_PARM.  We could restore this behavior in
map_arguments, but since a BOUND_TEMPLATE_TEMPLATE_PARM is not really a
template parameter it seems better to make keep_template_parm not give
us a BOUND_TEMPLATE_TEMPLATE_PARM in the first place.  I think what we
actually want is to map the TEMPLATE_TEMPLATE_PARM to itself, so this
patch adjusts keep_template_parm to give us the corresponding
TEMPLATE_TEMPLATE_PARM of a BOUND_TEMPLATE_TEMPLATE_PARM instead.

Tested on x86_64-pc-linux-gnu, and also tested with the cmcstl2 library.
Does this look OK for trunk/10?

gcc/cp/ChangeLog:

PR c++/96531
PR c++/97103
* constraint.cc (map_arguments): Call template_parm_to_arg
appropriately when doing a self-mapping.
* pt.c (keep_template_parm): Don't record a
BOUND_TEMPLATE_TEMPLATE_PARM, instead record its corresponding
TEMPLATE_TEMPLATE_PARM.

gcc/testsuite/ChangeLog:

PR c++/96531
PR c++/97103
* g++.dg/cpp2a/concepts-ttp2.C: New test.
* g++.dg/cpp2a/concepts-variadic1.C: New test.
---
  gcc/cp/constraint.cc  | 27 --
  gcc/cp/pt.c   |  5 
  gcc/testsuite/g++.dg/cpp2a/concepts-ttp2.C| 11 
  .../g++.dg/cpp2a/concepts-variadic1.C | 28 +++
  4 files changed, 62 insertions(+), 9 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-ttp2.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-variadic1.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 0aab3073cc1..43336d191d9 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -546,15 +546,24 @@ static tree
  map_arguments (tree parms, tree args)
  {
for (tree p = parms; p; p = TREE_CHAIN (p))
-if (args)
-  {
-   int level;
-   int index;
-   template_parm_level_and_index (TREE_VALUE (p), &level, &index);
-   TREE_PURPOSE (p) = TMPL_ARG (args, level, index);
-  }
-else
-  TREE_PURPOSE (p) = TREE_VALUE (p);
+{
+  tree parm = TREE_VALUE (p);
+  if (args)
+   {
+ int level;
+ int index;
+ template_parm_level_and_index (parm, &level, &index);
+ TREE_PURPOSE (p) = TMPL_ARG (args, level, index);
+   }
+  else
+   {
+ tree tpi = (TYPE_P (parm)
+ ? TEMPLATE_TYPE_PARM_INDEX (parm) : parm);
+ TREE_PURPOSE (p)
+   = template_parm_to_arg (build_tree_list (NULL_TREE,
+TEMPLATE_PARM_DECL (tpi)));


Doesn't passing 'p' to template_parm_to_arg work?


+   }
+}
  
return parms;

  }
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index cfe5ff4a94f..55d8060b911 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -10539,6 +10539,11 @@ keep_template_parm (tree t, void* data)
if (level > ftpi->max_depth)
  return 0;
  
+  if (TREE_CODE (t) == BOUND_TEMPLATE_TEMPLATE_PARM)

+/* A BOUND_TEMPLATE_TEMPLATE_PARM isn't a template parameter.  What we
+   really want is the corresponding TEMPLATE_TEMPLATE_PARM.  */
+t = TREE_TYPE (TEMPLATE_TEMPLATE_PARM_TEMPLATE_DECL (t));
+
/* Arguments like const T yield parameters like const T. This means that
   a template-id like X would yield two distinct parameters:
   T and const T. Adjust types to their unqualified versions.  */
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-ttp2.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-ttp2.C
new file mode 100644
index 000..7f4883754dd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-ttp2.C
@@ -0,0 +1,11 @@
+// PR c++/97103
+// { dg-do compile { target c++20 } }
+
+template
+class quantity {};
+
+template typename Q>
+inline constexpr bool valid_template_arguments = requires {
+  requires requires { typename Q; };
+};
+static_assert(valid_template_arguments);
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-variadic1.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-variadic1.C
new file mode 100644
index 000..deab028ca3c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-variadic1.C
@@ -0,0 +1,28 @@
+// PR c++/96531
+// { dg-do compile { target c++20 } }
+
+template
+concept is_bool = __is_same(bool, T);
+
+template 
+concept C = requires {
+  requires (is_bool || ...);
+};
+
+template 

Re: [PATCH v2 2/2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-18 Thread Segher Boessenkool
Hi!

On Fri, Sep 18, 2020 at 01:17:41AM -0500, Xiong Hu Luo wrote:
> vec_insert accepts 3 arguments, arg0 is input vector, arg1 is the value
> to be insert, arg2 is the place to insert arg1 to arg0.  Current expander
> generates stxv+stwx+lxv if arg2 is variable instead of constant, which
> causes serious store hit load performance issue on Power.  This patch tries
>  1) Build VIEW_CONVERT_EXPR for vec_insert (i, v, n) like v[n&3] = i to
> unify the gimple code, then expander could use vec_set_optab to expand.
>  2) Expand the IFN VEC_SET to fast instructions: lvsl+xxperm+xxsel.
> In this way, "vec_insert (i, v, n)" and "v[n&3] = i" won't be expanded too
> early in gimple stage if arg2 is variable, avoid generating store hit load
> instructions.

Sounds good.
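
For readers following along, the two equivalent source forms being unified
(an illustrative sketch for a Power target with VSX, not part of the patch):

  #include <altivec.h>

  __vector int
  set_elem_builtin (__vector int v, int i, int n)
  {
    /* Variable index: the old expansion went through the stack
       (stxv+stwx+lxv).  */
    return vec_insert (i, v, n);
  }

  __vector int
  set_elem_subscript (__vector int v, int i, int n)
  {
    v[n & 3] = i;   /* the subscript form the builtin is folded to */
    return v;
  }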

> For Power9 V4SI:
>   addi 9,1,-16
>   rldic 6,6,2,60
>   stxv 34,-16(1)
>   stwx 5,9,6
>   lxv 34,-16(1)
> =>
>   addis 9,2,.LC0@toc@ha
>   addi 9,9,.LC0@toc@l
>   mtvsrwz 33,5
>   lxv 32,0(9)
>   sradi 9,6,2
>   addze 9,9

These last two insns are a signed modulo.  That seems wrong.

>   sldi 9,9,2
>   subf 9,9,6
>   subfic 9,9,3
>   sldi 9,9,2
>   subfic 9,9,20

This should be optimised quite a bit.  Why isn't it?  Is this -O0
output?  That isn't interesting, please show -O2 instead.

>   lvsl 13,0,9
>   xxperm 33,33,45
>   xxperm 32,32,45
>   xxsel 34,34,33,32

It feels like this can be done in fewer insns.  Hrm.  Since you already
rotate one of the data and the mask, does it work better if you rotate
to some known position and then use vinsertw (and then rotate again)?

> Though instructions increase from 5 to 15, the performance is improved
> 60% in typical cases.

Yeah, great :-)

> Tested with V2DI, V2DF V4SI, V4SF, V8HI, V16QI

That is what the new testcases do, right?

>   * config/rs6000/altivec.md (altivec_lvsl_reg_2): Rename to
>(altivec_lvsl_reg_2) and extend to SDI mode.

(wrong leading space)

You say the same thing is renamed to the same thing (and not how we
normally do that).

"extend to SDI mode"...  Please rephrase, "extend" usually means sign-
or zero-extension in RTL.  But, see below.

>   * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
>   Ajdust variable index vec_insert to VIEW_CONVERT_EXPR.

"Adjust", but this changelog entry does not tell me what the change is.

>   * config/rs6000/rs6000-protos.h (rs6000_expand_vector_set_var):
>   New declare.

(declaration)

>   * config/rs6000/vsx.md: Call gen_altivec_lvsl_reg_di2.

>From some specific define_something, surely?

> -(define_insn "altivec_lvsl_reg"
> +(define_insn "altivec_lvsl_reg_2"
>[(set (match_operand:V16QI 0 "altivec_register_operand" "=v")
>   (unspec:V16QI
> - [(match_operand:DI 1 "gpc_reg_operand" "b")]
> + [(match_operand:SDI 1 "gpc_reg_operand" "b")]
>   UNSPEC_LVSL_REG))]
>"TARGET_ALTIVEC"
>"lvsl %0,0,%1"

"DI" is wrong, yes; but so is SDI.  It should just be P here, afaics?
SI on -m32, DI on -m64.

> +/* Insert value from VEC into idx of TARGET.  */

What does idx count?  Bytes?  Things the same size as "value"?

What does "value" mean, anyway?  Do you mean "VAL" (we use upercase for
all parameter names; "IDX" as well).

> +void
> +rs6000_expand_vector_set_var (rtx target, rtx vec, rtx val, rtx idx)
> +{
> +  machine_mode mode = GET_MODE (vec);
> +
> +  if (VECTOR_MEM_VSX_P (mode) && CONST_INT_P (idx))
> +gcc_unreachable ();
> +  else if (VECTOR_MEM_VSX_P (mode) && !CONST_INT_P (idx)
> +&& TARGET_DIRECT_MOVE_64BIT)

You already handled the CONST_INT_P case a line before, CONST_INT_P is
always true here.

This actually does *nothing* if this if() isn't true, which cannot be
right.

Maybe you want to replace these four lines with just
  gcc_assert (VECTOR_MEM_VSX_P (mode) && !CONST_INT_P (idx));

and handle !TARGET_DIRECT_MOVE_64BIT some way as well (maybe this cannot
be called then; assert it as well, then, or at least document it is
required).

> +   if (!BYTES_BIG_ENDIAN)
> + {
> +   /*  idx = 1 - idx.  */
> +   emit_insn (gen_subsi3 (tmp, GEN_INT (1), idx));
> +   /*  idx = idx * 8.  */
> +   emit_insn (gen_ashlsi3 (tmp, tmp, GEN_INT (3)));
> +   /*  idx = 16 - idx.  */
> +   emit_insn (gen_subsi3 (tmp, GEN_INT (16), tmp));
> + }

Maybe combine this here already?  16 - 8*(1 - idx) is the same as
8 + 8*idx.  Which might not be correct at all?

(GEN_INT (1) is const1_rtx, btw.)
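
Something like this, maybe (untested sketch, and only if that 8 + 8*idx
rewrite really is what you want -- two insns instead of three):

  /* tmp = 8 + idx*8  */
  emit_insn (gen_ashlsi3 (tmp, idx, GEN_INT (3)));
  emit_insn (gen_addsi3 (tmp, tmp, GEN_INT (8)));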

> +  else if (GET_MODE_SIZE (inner_mode) == 4)
> + {
> +   if (!BYTES_BIG_ENDIAN)
> + {
> +   /*  idx = 3 - idx.  */
> +   emit_insn (gen_subsi3 (tmp, GEN_INT (3), idx));
> +   /*  idx = idx * 4.  */
> +   emit_insn (gen_ashlsi3 (tmp, tmp, GEN_INT (2)));
> +   /*  idx = 20 - idx.  */
> +   emit_insn (gen_subsi3 (tmp, GEN_INT (20), tmp));
> + }
> +   else
> +   {
> +  

[PATCH] c++: Fix self-mapping in map_arguments [PR96531, PR97103]

2020-09-18 Thread Patrick Palka via Gcc-patches
With r10-8077 we stopped passing the argified current_template_parms to
normalize_constraint_expression from finish_nested_requirement, and
instead tweaked map_arguments to perform a self-mapping of parameters
when args is NULL.  We're currently not handling parameter packs and
BOUND_TEMPLATE_TEMPLATE_PARMs properly during this self-mapping, which
leads to ICEs later during satisfaction.

To fix the self-mapping of a parameter pack, this patch makes
map_arguments use template_parm_to_arg which already does the right
thing for parameter packs.

Before r10-8077, a BOUND_TEMPLATE_TEMPLATE_PARM would get mapped to the
corresponding TEMPLATE_TEMPLATE_PARM.  We could restore this behavior in
map_arguments, but since a BOUND_TEMPLATE_TEMPLATE_PARM is not really a
template parameter it seems better to make keep_template_parm not give
us a BOUND_TEMPLATE_TEMPLATE_PARM in the first place.  I think what we
actually want is to map the TEMPLATE_TEMPLATE_PARM to itself, so this
patch adjusts keep_template_parm to give us the corresponding
TEMPLATE_TEMPLATE_PARM of a BOUND_TEMPLATE_TEMPLATE_PARM instead.

Tested on x86_64-pc-linux-gnu, and also tested with the cmcstl2 library.
Does this look OK for trunk/10?

gcc/cp/ChangeLog:

PR c++/96531
PR c++/97103
* constraint.cc (map_arguments): Call template_parm_to_arg
appropriately when doing a self-mapping.
* pt.c (keep_template_parm): Don't record a
BOUND_TEMPLATE_TEMPLATE_PARM, instead record its corresponding
TEMPLATE_TEMPLATE_PARM.

gcc/testsuite/ChangeLog:

PR c++/96531
PR c++/97103
* g++.dg/cpp2a/concepts-ttp2.C: New test.
* g++.dg/cpp2a/concepts-variadic1.C: New test.
---
 gcc/cp/constraint.cc  | 27 --
 gcc/cp/pt.c   |  5 
 gcc/testsuite/g++.dg/cpp2a/concepts-ttp2.C| 11 
 .../g++.dg/cpp2a/concepts-variadic1.C | 28 +++
 4 files changed, 62 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-ttp2.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-variadic1.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 0aab3073cc1..43336d191d9 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -546,15 +546,24 @@ static tree
 map_arguments (tree parms, tree args)
 {
   for (tree p = parms; p; p = TREE_CHAIN (p))
-if (args)
-  {
-   int level;
-   int index;
-   template_parm_level_and_index (TREE_VALUE (p), &level, &index);
-   TREE_PURPOSE (p) = TMPL_ARG (args, level, index);
-  }
-else
-  TREE_PURPOSE (p) = TREE_VALUE (p);
+{
+  tree parm = TREE_VALUE (p);
+  if (args)
+   {
+ int level;
+ int index;
+ template_parm_level_and_index (parm, &level, &index);
+ TREE_PURPOSE (p) = TMPL_ARG (args, level, index);
+   }
+  else
+   {
+ tree tpi = (TYPE_P (parm)
+ ? TEMPLATE_TYPE_PARM_INDEX (parm) : parm);
+ TREE_PURPOSE (p)
+   = template_parm_to_arg (build_tree_list (NULL_TREE,
+TEMPLATE_PARM_DECL (tpi)));
+   }
+}
 
   return parms;
 }
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index cfe5ff4a94f..55d8060b911 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -10539,6 +10539,11 @@ keep_template_parm (tree t, void* data)
   if (level > ftpi->max_depth)
 return 0;
 
+  if (TREE_CODE (t) == BOUND_TEMPLATE_TEMPLATE_PARM)
+/* A BOUND_TEMPLATE_TEMPLATE_PARM isn't a template parameter.  What we
+   really want is the corresponding TEMPLATE_TEMPLATE_PARM.  */
+t = TREE_TYPE (TEMPLATE_TEMPLATE_PARM_TEMPLATE_DECL (t));
+
   /* Arguments like const T yield parameters like const T. This means that
  a template-id like X would yield two distinct parameters:
  T and const T. Adjust types to their unqualified versions.  */
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-ttp2.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-ttp2.C
new file mode 100644
index 000..7f4883754dd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-ttp2.C
@@ -0,0 +1,11 @@
+// PR c++/97103
+// { dg-do compile { target c++20 } }
+
+template
+class quantity {};
+
+template typename Q>
+inline constexpr bool valid_template_arguments = requires {
+  requires requires { typename Q; };
+};
+static_assert(valid_template_arguments);
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-variadic1.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-variadic1.C
new file mode 100644
index 000..deab028ca3c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-variadic1.C
@@ -0,0 +1,28 @@
+// PR c++/96531
+// { dg-do compile { target c++20 } }
+
+template
+concept is_bool = __is_same(bool, T);
+
+template 
+concept C = requires {
+  requires (is_bool || ...);
+};
+
+template 
+concept D = requires {
+  requires (Bs || ...);
+};
+
+template 
+requires C
+void bar() {}
+
+template 
+requires D

Re: [PATCH] c++: Fix P0846 (ADL and function templates) in template [PR97010]

2020-09-18 Thread Marek Polacek via Gcc-patches
Ping.

On Thu, Sep 10, 2020 at 06:15:24PM -0400, Marek Polacek via Gcc-patches wrote:
> To quickly recap, P0846 says that a name is also considered to refer to
> a template if it is an unqualified-id followed by a < and name lookup
> finds either one or more functions or finds nothing.
> 
> In a template, when parsing a function call that has type-dependent
> arguments, we can't perform ADL right away so we set KOENIG_LOOKUP_P in
> the call to remember to do it when instantiating the call
> (tsubst_copy_and_build/CALL_EXPR).  When the called function is a
> function template, we represent the call with a TEMPLATE_ID_EXPR;
> usually the operand is an OVERLOAD.
> 
> In the P0846 case though, the operand can be an IDENTIFIER_NODE, when
> name lookup found nothing when parsing the template name.  But we
> weren't handling this correctly in tsubst_copy_and_build.  First
> we need to pass the FUNCTION_P argument from  to
> , otherwise we give a bogus error.  And then in
>  we need to perform ADL.  The rest of the changes is to
> give better errors when ADL didn't find anything.
> 
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> I think I'd like to backport to 10 too.
> 
> gcc/cp/ChangeLog:
> 
>   PR c++/97010
>   * pt.c (tsubst_copy_and_build) : Call
>   tsubst_copy_and_build explicitly instead of using the RECUR macro.
>   Handle a TEMPLATE_ID_EXPR with an IDENTIFIER_NODE as its operand.
>   : Perform ADL for a TEMPLATE_ID_EXPR with an
>   IDENTIFIER_NODE as its operand.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR c++/97010
>   * g++.dg/cpp2a/fn-template21.C: New test.
>   * g++.dg/cpp2a/fn-template22.C: New test.
> ---
>  gcc/cp/pt.c| 37 --
>  gcc/testsuite/g++.dg/cpp2a/fn-template21.C | 24 ++
>  gcc/testsuite/g++.dg/cpp2a/fn-template22.C | 25 +++
>  3 files changed, 77 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/fn-template21.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/fn-template22.C
> 
> diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> index 30c6735dede..566e24f9bf3 100644
> --- a/gcc/cp/pt.c
> +++ b/gcc/cp/pt.c
> @@ -19241,7 +19241,8 @@ out:
>  }
>  
>  /* Like tsubst but deals with expressions and performs semantic
> -   analysis.  FUNCTION_P is true if T is the "F" in "F (ARGS)".  */
> +   analysis.  FUNCTION_P is true if T is the "F" in "F (ARGS)" or
> +   "F (ARGS)".  */
>  
>  tree
>  tsubst_copy_and_build (tree t,
> @@ -19323,7 +19324,10 @@ tsubst_copy_and_build (tree t,
>  case TEMPLATE_ID_EXPR:
>{
>   tree object;
> - tree templ = RECUR (TREE_OPERAND (t, 0));
> + tree templ = tsubst_copy_and_build (TREE_OPERAND (t, 0), args,
> + complain, in_decl,
> + function_p,
> + integral_constant_expression_p);
>   tree targs = TREE_OPERAND (t, 1);
>  
>   if (targs)
> @@ -19370,13 +19374,21 @@ tsubst_copy_and_build (tree t,
> }
>   else
> object = NULL_TREE;
> - templ = lookup_template_function (templ, targs);
> +
> + tree tid = lookup_template_function (templ, targs);
>  
>   if (object)
> -   RETURN (build3 (COMPONENT_REF, TREE_TYPE (templ),
> -  object, templ, NULL_TREE));
> +   RETURN (build3 (COMPONENT_REF, TREE_TYPE (tid),
> +  object, tid, NULL_TREE));
> + else if (identifier_p (templ))
> +   {
> + /* C++20 P0846: we can encounter an IDENTIFIER_NODE here when
> +name lookup found nothing when parsing the template name.  */
> + gcc_assert (cxx_dialect >= cxx20 || seen_error ());
> + RETURN (tid);
> +   }
>   else
> -   RETURN (baselink_for_fns (templ));
> +   RETURN (baselink_for_fns (tid));
>}
>  
>  case INDIRECT_REF:
> @@ -19967,14 +19979,17 @@ tsubst_copy_and_build (tree t,
>  
>   /* We do not perform argument-dependent lookup if normal
>  lookup finds a non-function, in accordance with the
> -expected resolution of DR 218.  */
> +resolution of DR 218.  */
>   if (koenig_p
>   && ((is_overloaded_fn (function)
>/* If lookup found a member function, the Koenig lookup is
>   not appropriate, even if an unqualified-name was used
>   to denote the function.  */
>&& !DECL_FUNCTION_MEMBER_P (get_first_fn (function)))
> - || identifier_p (function))
> + || identifier_p (function)
> + /* C++20 P0846: Lookup found nothing.  */
> + || (TREE_CODE (function) == TEMPLATE_ID_EXPR
> + && identifier_p (TREE_OPERAND (function, 0
>   /* Only do this when substitution turns a dependent call
>  into a non-dependent call.  */
>   && type_dependent_expression_p_push (t)

Re: [PATCH] c++: std::is_constant_evaluated inside a constraint [PR97051]

2020-09-18 Thread Jason Merrill via Gcc-patches

On 9/17/20 12:36 PM, Patrick Palka wrote:

According to [expr.const]/14, the result of substitution into an atomic
constraint is manifestly constant-evaluated; this patch adjusts the call
to maybe_constant_value in satisfy_atom to that effect.

Tested on x86_64-pc-linux-gnu, and also tested on the cmcstl2 and
range-v3 libraries.  Does this look OK to commit?


OK.


gcc/cp/ChangeLog:

PR c++/97051
* constraint.cc (satisfy_atom): Pass true as the
manifestly_const_eval argument to maybe_constant_value.

gcc/testsuite/ChangeLog:

PR c++/97051
* g++.dg/cpp2a/is-constant-evaluated11.C: New test.
---
  gcc/cp/constraint.cc |  3 ++-
  .../g++.dg/cpp2a/is-constant-evaluated11.C   | 16 
  2 files changed, 18 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/is-constant-evaluated11.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 0aab3073cc1..8137df59e37 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -2643,7 +2643,8 @@ satisfy_atom (tree t, tree args, subst_info info)
  result = cxx_constant_value (result);
else
  {
-  result = maybe_constant_value (result);
+  result = maybe_constant_value (result, NULL_TREE,
+/*manifestly_const_eval=*/true);
if (!TREE_CONSTANT (result))
result = error_mark_node;
  }
diff --git a/gcc/testsuite/g++.dg/cpp2a/is-constant-evaluated11.C 
b/gcc/testsuite/g++.dg/cpp2a/is-constant-evaluated11.C
new file mode 100644
index 000..a31867f74fb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/is-constant-evaluated11.C
@@ -0,0 +1,16 @@
+// PR c++/97051
+// { dg-do compile { target c++20 } }
+
+namespace std {
+  constexpr inline bool
+  is_constant_evaluated () noexcept
+  {
+return __builtin_is_constant_evaluated ();
+  }
+}
+
+template
+  requires (std::is_constant_evaluated())
+constexpr int a = 0;
+
+constexpr int b = a;





[PATCH] generalized range_query class for multiple contexts

2020-09-18 Thread Aldy Hernandez via Gcc-patches
As part of the ranger work, we have been trying to clean up and 
generalize interfaces whenever possible.  This not only helps in 
reducing the maintenance burden going forward, but provides mechanisms 
for backwards compatibility between ranger and other providers/users of 
ranges throughout the compiler like evrp and VRP.


One such interface is the range_query class in vr_values.h, which 
provides a range query mechanism for use in the simplify_using_ranges 
module.  With it, simplify_using_ranges can be used with the ranger, or 
the VRP twins by providing a get_value_range() method.  This has helped 
us in comparing apples to apples while doing our work, and has also 
future proofed the interface so that asking for a range can be done 
within the context in which it appeared.  For example, get_value_range 
now takes a gimple statement which provides context.  We are no longer 
tied to asking for a global SSA range, but can ask for the range of an 
SSA within a statement.  Granted, this functionality is currently only 
in the ranger, but evrp/vrp could be adapted to pass such context.


The range_query is a good first step, but what we really want is a 
generic query mechanism that can ask for SSA ranges within an 
expression, a statement, an edge, or anything else that may come up.  We 
think that a generic mechanism can be used not only for range producers, 
but consumers such as the substitute_and_fold_engine (see get_value 
virtual) and possibly the gimple folder (see valueize).


The attached patchset provides such an interface.  It is meant to be a 
replacement for range_query that can be used for vr_values, 
substitute_and_fold, the subsitute_and_fold_engine, as well as the 
ranger.  The general API is:


class value_query
{
public:
  // Return the singleton expression for NAME at a gimple statement,
  // or NULL if none found.
  virtual tree value_of_expr (tree name, gimple * = NULL) = 0;
  // Return the singleton expression for NAME at an edge, or NULL if
  // none found.
  virtual tree value_on_edge (edge, tree name);
  // Return the singleton expression for the LHS of a gimple
  // statement, assuming an (optional) initial value of NAME.  Returns
  // NULL if none found.
  //
  // Note this method calculates the range the LHS would have *after*
  // the statement has executed.
  virtual tree value_of_stmt (gimple *, tree name = NULL);
};

class range_query : public value_query
{
public:
  range_query ();
  virtual ~range_query ();

  virtual tree value_of_expr (tree name, gimple * = NULL) OVERRIDE;
  virtual tree value_on_edge (edge, tree name) OVERRIDE;
  virtual tree value_of_stmt (gimple *, tree name = NULL) OVERRIDE;

  // These are the range equivalents of the value_* methods.  Instead
  // of returning a singleton, they calculate a range and return it in
  // R.  TRUE is returned on success or FALSE if no range was found.
  virtual bool range_of_expr (irange &r, tree name, gimple * = NULL) = 0;
  virtual bool range_on_edge (irange &r, edge, tree name);
  virtual bool range_of_stmt (irange &r, gimple *, tree name = NULL);

  // DEPRECATED: This method is used from vr-values.  The plan is to
  // rewrite all uses of it to the above API.
  virtual const class value_range_equiv *get_value_range (const_tree,
  gimple * = NULL);
};

The duality of the API (value_of_* and range_on_*) is because some 
passes are interested in a singleton value 
(substitute_and_fold_engine), while others are interested in ranges 
(vr_values).  Passes that are only interested in singletons can take a 
value_query, while passes that are interested in full ranges, can take a 
range_query.  Of course, for future proofing, we would recommend taking 
a range_query, since if you provide a default range_of_expr, sensible 
defaults will be provided for the others in terms of range_of_expr.


Note that the absolute bare minimum that must be provided is a 
value_of_expr and a range_of_expr respectively.
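
For example, the bare-bones skeleton of a range producer could look
roughly like this (untested sketch; the body of range_of_expr is only
for illustration, a real producer would plug in its own information):

class global_range_query : public range_query
{
public:
  // Everything else (value_of_expr, value_on_edge, range_of_stmt, etc.)
  // then gets sensible defaults written in terms of this one method.
  virtual bool range_of_expr (irange &r, tree name, gimple * = NULL) OVERRIDE
  {
    wide_int min, max;
    if (TREE_CODE (name) != SSA_NAME
        || !INTEGRAL_TYPE_P (TREE_TYPE (name))
        || get_range_info (name, &min, &max) != VR_RANGE)
      return false;
    tree type = TREE_TYPE (name);
    r.set (wide_int_to_tree (type, min), wide_int_to_tree (type, max));
    return true;
  }
};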


One piece of the API which is missing is a method  to return the range 
of an arbitrary SSA_NAME *after* a statement.  Currently range_of_expr 
calculates the range of an expression upon entry to the statement, 
whereas range_of_stmt calculates the range of *only* the LHS of a 
statement AFTER the statement has executed.


This would allow for complete representation of the ranges/values in 
something like:


d_4 = *g_7;

Here the range of g_7 upon entry could be VARYING, but after the 
dereference we know it must be non-zero.  Well for sane targets anyhow.


Choices would be to:

  1) add a 4th method such as "range_after_stmt", or

  2) merge that functionality with the existing range_of_stmt method to 
provide "after" functionality for any ssa_name.  Currently the SSA_NAME 
must be the same as the LHS if specified.  It also does not need to be 
specified to allow evaluation of statements without a LHS, (if (x < 1) 
has no LHS, but will return [0,0] [1,1] or [0,1] 

Re: [PATCH v2 1/2] IFN: Implement IFN_VEC_SET for ARRAY_REF with VIEW_CONVERT_EXPR

2020-09-18 Thread Segher Boessenkool
On Fri, Sep 18, 2020 at 01:17:40AM -0500, Xiong Hu Luo wrote:
> This patch enables transformation from ARRAY_REF(VIEW_CONVERT_EXPR) to
> VEC_SET internal function in gimple-isel pass if target supports
> vec_set with variable index by checking can_vec_set_var_idx_p.

> +  tree_code code = TREE_CODE (TREE_TYPE(view_op0));
  ^
Missing space here ---/

Thanks,


Segher


Re: [PATCH] vect/test: Don't check for epilogue loop [PR97075]

2020-09-18 Thread Richard Sandiford
Thanks for looking at this.

"Kewen.Lin"  writes:
> Hi,
>
> The commit r11-3230 brings a nice improvement to use full
> vectors instead of partial vectors when available.  But
> it caused some vector with length test cases to fail on
> Power.
>
> The failure on gcc.target/powerpc/p9-vec-length-epil-7.c
> exposed one issue that: we call function 
> vect_need_peeling_or_partial_vectors_p in function
> vect_analyze_loop_2, since it's in analysis phase, for
> the epilogue loop, we could get the wrong information like
> LOOP_VINFO_INT_NITERS (loop_vinfo), further get the wrong
> answer for vect_need_peeling_or_partial_vectors_p.
> For the epilogue loop in this failure specific, the niter
> that we get is 58 (should be 1), vf is 2.
>
> For epilogue loop with partial vectors, it would use the
> same VF as the main loop, so it won't be able to use full
> vector, this patch is to exclude epilogue loop for the
> check vect_need_peeling_or_partial_vectors_p in
> vect_analyze_loop_2.

Hmm.  For better or worse, I think the analysis phase is actually
operating on the assumption that the vector code needs to handle
all scalar iterations, even in the epilogue case.  We actually
rely on that to implement VECT_COMPARE_COSTS (although it wasn't
the original motivation for handling epilogues this way).

Perhaps we should expand the functionality (and name)
of determine_peel_for_niter so that it also computes
LOOP_VINFO_USING_PARTIAL_VECTORS_P.  We'd then recompute
that flag once we know the final epilogue scalar iteration count,
just like we recompute LOOP_VINFO_PEELING_FOR_NITER.

As a sanity check, I think it would also be good for the
renamed function to do the following parts of vect_analyze_loop_2:

  /* If epilog loop is required because of data accesses with gaps,
 one additional iteration needs to be peeled.  Check if there is
 enough iterations for vectorization.  */
  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
  && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
  && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
{
  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
  tree scalar_niters = LOOP_VINFO_NITERSM1 (loop_vinfo);

  if (known_lt (wi::to_widest (scalar_niters), vf))
return opt_result::failure_at (vect_location,
   "loop has no enough iterations to"
   " support peeling for gaps.\n");
}

  /* If we're vectorizing an epilogue loop, the vectorized loop either needs
 to be able to handle fewer than VF scalars, or needs to have a lower VF
 than the main loop.  */
  if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
  && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
  && maybe_ge (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
   LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo)))
return opt_result::failure_at (vect_location,
   "Vectorization factor too high for"
   " epilogue loop.\n");

and then assert that no failure occurs when called for epilogues
from vect_do_peeling.  So the function would be doing three things:

- compute LOOP_VINFO_USING_PARTIAL_VECTORS_P, using the current code
  in vect_analyze_loop_2

- perform the checks quoted above

- what the function currently does

That's all speculation though -- I haven't tried any of this.

Andrea, how should we handle this?  Is it something you'd have time to
look at?

Thanks,
Richard

>
> The failure on gcc.target/powerpc/p9-vec-length-full-6.c
> is just a test issue, the 64bit/32bit pairs are able to
> use full vector, fixed in the patch accordingly.
>
> Bootstrapped/regtested on powerpc64le-linux-gnu P9.
>
> Is it OK for trunk?
>
> BR,
> Kewen
> -
> gcc/ChangeLog:
>
>   * tree-vect-loop.c (vect_analyze_loop_2): Don't check
>   vect_need_peeling_or_partial_vectors_p for the epilogue loop.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/powerpc/p9-vec-length-full-6.c: Adjust.
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c 
> b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c
> index cfae9bbc927..5d2357aabfa 100644
> --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c
> +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c
> @@ -9,8 +9,7 @@
>  #include "p9-vec-length-6.h"
>  
>  /* It can use normal vector load for constant vector load.  */
> -/* { dg-final { scan-assembler-not   {\mstxv\M} } } */
> -/* { dg-final { scan-assembler-not   {\mlxvx\M} } } */
> -/* { dg-final { scan-assembler-not   {\mstxvx\M} } } */
> -/* { dg-final { scan-assembler-times {\mlxvl\M} 16 } } */
> -/* { dg-final { scan-assembler-times {\mstxvl\M} 16 } } */
> +/* { dg-final { scan-assembler-times {\mstxvx?\M} 6 } } */
> +/* 64bit/32bit pairs won't use partial vectors.  */
> +/* { dg-final { scan-assembler-times {\mlxvl\M} 10 } } */
> +/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */
> diff --git a/gcc/tree-vect-loop.c 

Re: [RS6000] rs6000_rtx_costs reduce cost for SETs

2020-09-18 Thread Segher Boessenkool
On Fri, Sep 18, 2020 at 01:08:42PM +0930, Alan Modra wrote:
> On Thu, Sep 17, 2020 at 12:51:25PM -0500, Segher Boessenkool wrote:
> > > -  if (CONST_INT_P (XEXP (x, 1))
> > > -   && satisfies_constraint_I (XEXP (x, 1)))
> > > +  if (!speed)
> > > + /* A little more than one insn so that nothing is tempted to
> > > +turn a shift left into a multiply.  */
> > > + *total = COSTS_N_INSNS (1) + 1;
> > 
> > Please don't.  We have a lot of code elsewhere to handle this directly,
> > already.  Also, this is just wrong for size.  Five shifts is *not*
> > better than four muls.  If that is the only way to get good results,
> > than unfortunately we probably have to; but do not do this without any
> > proof.
> 
> Huh.  If a cost of 5 is "just wrong for size" then you prefer a cost
> of 12 for example (power9 mulsi or muldi rs6000_cost)?  Noticing that
> result for !speed rs6000_rtx_costs is the entire basis for the !speed
> changes.  I don't have any proof that this is correct.

No, just 4, like all other insns -- it is the size of the insn, after
all!  Not accidentally using the speed cost is a fine change of course,
but the + 1 isn't based on anything afaics, so it can only hurt.

> > > -   if (GET_MODE (XEXP (x, 1)) == DImode)
> > > +   if (!speed)
> > > + *total = COSTS_N_INSNS (1) + 2;
> > > +   else if (GET_MODE (XEXP (x, 1)) == DImode)
> > >   *total = rs6000_cost->divdi;
> > > else
> > >   *total = rs6000_cost->divsi;
> > 
> > (more)
> 
> OK, I can remove all the !speed changes.

No, those are quite okay (as a separate patch though).  But you
shouldn't add random numbers to it, one insn is just one insn.  The cost
function for -O2 is just code size, nothing more, nothing less.

> > >  case AND:
> > > +  *total = COSTS_N_INSNS (1);
> > >right = XEXP (x, 1);
> > >if (CONST_INT_P (right))
> > >   {
> > > @@ -21380,15 +21391,15 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
> > > outer_code,
> > >  || left_code == LSHIFTRT)
> > > && rs6000_is_valid_shift_mask (right, left, mode))
> > >   {
> > > -   *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
> > > -   if (!CONST_INT_P (XEXP (left, 1)))
> > > - *total += rtx_cost (XEXP (left, 1), SImode, left_code, 1, 
> > > speed);
> > > -   *total += COSTS_N_INSNS (1);
> > > +   rtx reg_op = XEXP (left, 0);
> > > +   if (!REG_P (reg_op))
> > > + *total += rtx_cost (reg_op, mode, left_code, 0, speed);
> > > +   reg_op = XEXP (left, 1);
> > > +   if (!REG_P (reg_op) && !CONST_INT_P (reg_op))
> > > + *total += rtx_cost (reg_op, mode, left_code, 1, speed);
> > > return true;
> > >   }
> > >   }
> > > -
> > > -  *total = COSTS_N_INSNS (1);
> > >return false;
> > 
> > This doesn't improve anything?  It just makes it different from all
> > surrounding code?
> 
> So it moves the common COSTS_N_INSNS (1) count and doesn't recurse for
> regs, like it doesn't for const_int.

I meant that *total set only.  It is fine to have one extra source line.

Does not recursing for regs help anything?  Yes, for CONST_INT that is
questionable as well, but adding more stuff like it does not help
(without actual justification).

This routine is not hot at all.  Maybe decades ago it was, and back then
the thought was that micro-optimisations like this were useful (and
maybe they were, _back then_ :-) )

> > > +case SET:
> > > +  /* The default cost of a SET is the number of general purpose
> > > +  regs being set multiplied by COSTS_N_INSNS (1).  That only
> > > +  works where the incremental cost of the operation and
> > > +  operands is zero, when the operation performed can be done in
> > > +  one instruction.  For other cases where we add COSTS_N_INSNS
> > > +  for some operation (see point 5 above), COSTS_N_INSNS (1)
> > > +  should be subtracted from the total cost.  */
> > 
> > What does "incremental cost" mean there?  If what increases?

Can you improve that comment please?

> > > +  {
> > > + rtx_code src_code = GET_CODE (SET_SRC (x));
> > > + if (src_code == CONST_INT
> > > + || src_code == CONST_DOUBLE
> > > + || src_code == CONST_WIDE_INT)
> > > +   return false;
> > > + int set_cost = (rtx_cost (SET_SRC (x), mode, SET, 1, speed)
> > > + + rtx_cost (SET_DEST (x), mode, SET, 0, speed));
> > 
> > This should use set_src_cost, if anything.  But that will recurse then,
> > currently?  Ugh.
> > 
> > Using rtx_cost for SET_DEST is problematic, too.
> > 
> > What (if anything!) calls this for a SET?  Oh, set_rtx_cost still does
> > that, hrm.
> > 
> > rtx_cost costs RTL *expressions*.  Not instructions.  Except where it is
> > (ab)used for that, sigh.
> > 
> > Many targets have something for it already, but all quite different from
> > this.
> 
> Right, you are starting to understand just how difficult it is to do
> anything at all to rs6000_rtx_costs.

"Starting to understand", lol :-)  I 

Re: [pushed] c++: Layout decls with newly-complete type.

2020-09-18 Thread H.J. Lu via Gcc-patches
On Thu, Sep 17, 2020 at 11:43 PM Richard Biener via Gcc-patches
 wrote:
>
> On Fri, Sep 18, 2020 at 5:20 AM Jason Merrill via Gcc-patches
>  wrote:
> >
> > Martin's -Wplacement-new patch ran into a problem with DECL_SIZE not being
> > set on an extern variable for which the type was not complete until after
> > its declaration.  complete_vars was deliberately not calling layout_decl for
> > some reason, instead leaving that for expand_expr_real_1 much later in the
> > compilation.  But if we layout decls at declaration time, I don't see any
> > reason we shouldn't lay them out here, when their type is newly complete.
> >
> > Tested x86_64-pc-linux-gnu, applying to trunk.
>
> Seems to break bootstrap in stage2,
>
> /home/rguenther/src/trunk/gcc/dumpfile.c:169:33: error: storage size
> of 'optgroup_options' isn't known
>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97118


-- 
H.J.


Re: [OG10] Backporting + merge of GCC 10 into branch + pre-applying (OpenMP: Handle cpp_implicit_alias in declare-target discovery)

2020-09-18 Thread Tobias Burnus

On 9/17/20 7:04 PM, Tobias Burnus wrote:


OG10 = devel/omp/gcc-10


Added additionally:

5e8af933d6f libgomp.c-c++-common/pr96390.c: XFAIL on nvptx.
d759c5ff1a0 OpenMP: Fix declare-target discovery with aliasing

Cheers,

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


Re: [PATCH] irange_pool class

2020-09-18 Thread Andrew MacLeod via Gcc-patches

On 9/18/20 1:07 PM, Martin Sebor wrote:

On 9/18/20 8:10 AM, Aldy Hernandez via Gcc-patches wrote:



On 9/18/20 2:28 PM, David Malcolm wrote:

On Fri, 2020-09-18 at 07:49 +0200, Aldy Hernandez wrote:


On 9/18/20 3:43 AM, David Malcolm wrote:

On Thu, 2020-09-17 at 12:36 +0200, Aldy Hernandez via Gcc-patches
wrote:

This is the irange storage class. It is used to allocate the
minimum
amount of storage needed for a given irange.  Storage is
automatically
freed at destruction.

It is meant for long term storage, as opposed to int_range_max
which
is
meant for intermediate temporary results on the stack.

The general gist is:

irange_pool pool;

// Allocate an irange of 5 sub-ranges.
irange *p = pool.allocate (5);

// Allocate an irange of 3 sub-ranges.
irange *q = pool.allocate (3);

// Allocate an irange with as many sub-ranges as are currently
// used in "some_other_range".
irange *r = pool.allocate (some_other_range);


FWIW my first thoughts reading this example were - "how do I
deallocate
these iranges?" and "do I need to call pool.deallocate on them, or
is
that done for me by the irange dtor?"


Thus the description front and center of the header file:

"Storage is automatically freed at destruction..."


I think of a "pool allocator" as something that makes a small
number of
large allocation under the covers, and then uses that to serve
large
numbers of fixed sized small allocations and deallocations with
O(1)
using a free list.


Ah, I didn't know pool had a different meaning.


See e.g. gcc/alloc-pool.h



[...]


+// This is the irange storage class.  It is used to allocate the
+// minimum amount of storage needed for a given irange.  Storage
is
+// automatically freed at destruction.


"at destruction" of what object - the irange or the irange_pool?
Reading the code, it turns out to be "at destruction of the
irange_pool", and it turns out that irange_pool is an obstack under
the
covers (also called a "bump allocator") and thus, I believe, the
lifetime of the irange instances is that of the storage instance.


The sentence is talking about the storage class, so I thought it was
obvious that the destruction we talk about is the storage class
itself.
I suppose if it isn't clear we could say:

"Storage is automatically freed at destruction of the storage class."




I think it would be clearer to name this "irange_obstack", or
somesuch.


I'd prefer something more generic.  We don't want to tie the name of
the
allocator to the underlying implementation.  What if we later change
to
malloc?  We'd have to change the name to irange_malloc.



irange_allocator?  Or is there something more generically appropriate
here?


How about "irange_bump_allocator?"   Rather long, but it expresses the
key point that the irange instances coming out of it don't have
independent lifetimes - their lifetimes are those of the allocator; the
client code doesn't need to find and clean up all of those individual
iranges, right?  (assuming I'm reading the code correctly) (That name
also avoids mentioning the implementation detail that it uses obstack).


I'm sorry, but that's _really_ ugly.

Surely irange_allocator can't be that confusing.  A casual look at 
the header file would dispel all doubts.


David's point abut different strategies was also in the back my
mind but it took me a bit to formulate a question about the design.
Is a pool of ranges with a fixed allocation policy the right design
for long-term storage of irange objects?  What are some example use
cases?

Here's one that comes to mind based on what I want to do in
gimple-ssa-isolate-paths.c.  I need to store irange instances as
members of my own class, but I don't know how many subranges each
instance might need to store (it depends on the IL).  I store
objects of this class in a container (e.g., hash_map or set).
I don't want to use int_range_max because that would be wasteful
but I can't use the pool as a proxy because it's not copyable.
So I either have to dynamically allocate the pool and store
a pointer to it instead, in addition to the instances, or derive
my own class from irange and manage the tree arrays myself.  In
both cases I'm adding a layer of memory management on top of what
that the pool is there to provide.  So the design doesn't seem
very well suited for my use case.

I don't mean this as an objection to the patch (I'm sure there's
a use case I'm not thinking of), but more as a question.

Martin



I don't know why  this is confusing...

it works exactly like one would expect a simple allocator to work.. as 
long as the allocator is "live", its allocations are live.  Once it is 
destructed, all the memory it manages is freed.  Its purpose is to 
avoid having to walk/find everything that was allocated so it can be freed.


I'll give you the use case from the ranger. in fact, it *is* the 
ranger's allocator, exposed for other passes to use.


Ranger uses the allocator to store the live-on-entry ranges for 

Re: [PATCH v2] builtins: rs6000: Add builtins for fegetround, feclearexcept and feraiseexcept [PR94193]

2020-09-18 Thread Raoni Fassina Firmino via Gcc-patches
Ping.


[GCC 10] [PATCH] IRA: Don't make a global register eliminable

2020-09-18 Thread H.J. Lu via Gcc-patches
On Thu, Sep 17, 2020 at 3:52 PM Jeff Law  wrote:
>
>
> On 9/16/20 8:46 AM, Richard Sandiford wrote:
>
> "H.J. Lu"  writes:
>
> On Tue, Sep 15, 2020 at 7:44 AM Richard Sandiford
>  wrote:
>
> Thanks for looking at this.
>
> "H.J. Lu"  writes:
>
> commit 1bcb4c4faa4bd6b1c917c75b100d618faf9e628c
> Author: Richard Sandiford 
> Date:   Wed Oct 2 07:37:10 2019 +
>
> [LRA] Don't make eliminable registers live (PR91957)
>
> didn't make eliminable registers live which breaks
>
> register void *cur_pro asm("reg");
>
> where "reg" is an eliminable register.  Make fixed eliminable registers
> live to fix it.
>
> I don't think fixedness itself is the issue here: it's usual for at
> least some registers involved in eliminations to be fixed registers.
>
> I think what makes this case different is instead that cur_pro/ebp
> is a global register.  But IMO things have already gone wrong if we
> think that a global register is eliminable.
>
> So I wonder if instead we should check global_regs at the beginning of:
>
>   for (i = 0; i < fp_reg_count; i++)
> if (!TEST_HARD_REG_BIT (crtl->asm_clobbers,
> HARD_FRAME_POINTER_REGNUM + i))
>   {
> SET_HARD_REG_BIT (eliminable_regset,
>   HARD_FRAME_POINTER_REGNUM + i);
> if (frame_pointer_needed)
>   SET_HARD_REG_BIT (ira_no_alloc_regs,
> HARD_FRAME_POINTER_REGNUM + i);
>   }
> else if (frame_pointer_needed)
>   error ("%s cannot be used in % here",
>  reg_names[HARD_FRAME_POINTER_REGNUM + i]);
> else
>   df_set_regs_ever_live (HARD_FRAME_POINTER_REGNUM + i, true);
>
> (ira_setup_eliminable_regset), and handle the global_regs[] case in
> the same way as the else case, i.e. short-circuiting both of the ifs.
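> Roughly (untested sketch):
>
>   for (i = 0; i < fp_reg_count; i++)
>     if (global_regs[HARD_FRAME_POINTER_REGNUM + i])
>       /* Cannot be eliminated (or eliminated to), and already
>          marked live where appropriate.  */
>       ;
>     else if (!TEST_HARD_REG_BIT (crtl->asm_clobbers,
>                                  HARD_FRAME_POINTER_REGNUM + i))
>       ...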
>
> Like this?
>
> Sorry for the delay.  I was testing this in parallel.
>
> Bootstrapped & regression-tested on x86_64-linux-gnu.
>
> Thanks,
> Richard
>
>
> 0001-ira-Fix-elimination-for-global-hard-FPs-PR91957.patch
>
> From af4499845d26fe65573b21197a79fd22fd38694e Mon Sep 17 00:00:00 2001
> From: "H.J. Lu" 
> Date: Tue, 15 Sep 2020 06:23:26 -0700
> Subject: [PATCH] ira: Fix elimination for global hard FPs [PR91957]
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> If the hard frame pointer is being used as a global register,
> we should skip the usual handling for eliminations.  As the
> comment says, the register cannot in that case be eliminated
> (or eliminated to) and is already marked live where appropriate.
>
> Doing this removes the duplicate error for gcc.target/i386/pr82673.c.
> The “cannot be used in 'asm' here” message is meant to be for asm
> statements rather than register asms, and the function that the
> error is reported against doesn't use asm.
>
> gcc/
> 2020-09-16  Richard Sandiford  
>
> PR middle-end/91957
> * ira.c (ira_setup_eliminable_regset): Skip the special elimination
> handling of the hard frame pointer if the hard frame pointer is fixed.
>
> gcc/testsuite/
> 2020-09-16  H.J. Lu  
>Richard Sandiford  
>
> PR middle-end/91957
> * g++.target/i386/pr97054.C: New test.
> * gcc.target/i386/pr82673.c: Remove redundant extra message.
>
> OK

OK for GCC 10 branch?

Thanks.

-- 
H.J.
From 0f4a9cfe04c89c0aabe20e6c91932676704a6496 Mon Sep 17 00:00:00 2001
From: Richard Sandiford 
Date: Fri, 18 Sep 2020 16:55:45 +0100
Subject: [PATCH] ira: Fix elimination for global hard FPs [PR97054]
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

If the hard frame pointer is being used as a global register,
we should skip the usual handling for eliminations.  As the
comment says, the register cannot in that case be eliminated
(or eliminated to) and is already marked live where appropriate.

Doing this removes the duplicate error for gcc.target/i386/pr82673.c.
The “cannot be used in 'asm' here” message is meant to be for asm
statements rather than register asms, and the function that the
error is reported against doesn't use asm.

gcc/
2020-09-18  Richard Sandiford  

	PR middle-end/97054
	* ira.c (ira_setup_eliminable_regset): Skip the special elimination
	handling of the hard frame pointer if the hard frame pointer is fixed.

gcc/testsuite/
2020-09-18  H.J. Lu  
	Richard Sandiford  

	PR middle-end/97054
	* g++.target/i386/pr97054.C: New test.
	* gcc.target/i386/pr82673.c: Remove redundant extra message.

(cherry picked from commit 3c7c5f1d4a4b8328fb4c07483cdbfe4ea7762155)
---
 gcc/ira.c   |  8 ++-
 gcc/testsuite/g++.target/i386/pr97054.C | 96 +
 gcc/testsuite/gcc.target/i386/pr82673.c |  2 +-
 3 files changed, 103 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/pr97054.C

diff --git a/gcc/ira.c b/gcc/ira.c
index a655ae12eb2..681ec2f46f9 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ 

Re: [PATCH] bpf: use xBPF signed div, mod insns when available

2020-09-18 Thread David Faust via Gcc-patches
> 
>>> The 'mod' and 'div' operators in eBPF are unsigned, with no signed
>>> counterpart. xBPF adds two new ALU operations, sdiv and smod, for
>>> signed division and modulus, respectively. Update bpf.md with
>>> 'define_insn' blocks for signed div and mod to use them when targetting
>>> xBPF, and add new tests to ensure they are used appropriately.
>>>
>>> 2020-09-17  David Faust  
>>>
>>> gcc/
>>> * config/bpf/bpf.md: Add defines for signed div and mod operators.
>>>
>>> gcc/testsuite/
>>> * gcc.target/bpf/diag-sdiv.c: New test.
>>> * gcc.target/bpf/diag-smod.c: New test.
>>> * gcc.target/bpf/xbpf-sdiv-1.c: New test.
>>> * gcc.target/bpf/xbpf-smod-1.c: New test.
>>
>> OK.  But give Jose 48hrs before committing just in case there's
>> something he wants to comment on.  Y'all are far more familiar with bpf
>> than I ;-)
> 
> Looks good to me! :)
> But the related pending patch in binutils should go in first.
> 
Hi!

I just checked in the binutils patch, but I don't actually have write
access for gcc.

If someone could check this in for me I'd appreciate it :)

Thanks!
David


Re: [PATCH] irange_pool class

2020-09-18 Thread Martin Sebor via Gcc-patches

On 9/18/20 8:10 AM, Aldy Hernandez via Gcc-patches wrote:



On 9/18/20 2:28 PM, David Malcolm wrote:

On Fri, 2020-09-18 at 07:49 +0200, Aldy Hernandez wrote:


On 9/18/20 3:43 AM, David Malcolm wrote:

On Thu, 2020-09-17 at 12:36 +0200, Aldy Hernandez via Gcc-patches
wrote:

This is the irange storage class.  It is used to allocate the
minimum
amount of storage needed for a given irange.  Storage is
automatically
freed at destruction.

It is meant for long term storage, as opposed to int_range_max
which
is
meant for intermediate temporary results on the stack.

The general gist is:

irange_pool pool;

// Allocate an irange of 5 sub-ranges.
irange *p = pool.allocate (5);

// Allocate an irange of 3 sub-ranges.
irange *q = pool.allocate (3);

// Allocate an irange with as many sub-ranges as are currently
// used in "some_other_range".
irange *r = pool.allocate (some_other_range);


FWIW my first thoughts reading this example were - "how do I
deallocate
these iranges?" and "do I need to call pool.deallocate on them, or
is
that done for me by the irange dtor?"


Thus the description front and center of the header file:

"Storage is automatically freed at destruction..."


I think of a "pool allocator" as something that makes a small
number of
large allocation under the covers, and then uses that to serve
large
numbers of fixed sized small allocations and deallocations with
O(1)
using a free list.


Ah, I didn't know pool had a different meaning.


See e.g. gcc/alloc-pool.h



[...]


+// This is the irange storage class.  It is used to allocate the
+// minimum amount of storage needed for a given irange.  Storage
is
+// automatically freed at destruction.


"at destruction" of what object - the irange or the irange_pool?
Reading the code, it turns out to be "at destruction of the
irange_pool", and it turns out that irange_pool is an obstack under
the
covers (also called a "bump allocator") and thus, I believe, the
lifetime of the irange instances is that of the storage instance.


The sentence is talking about the storage class, so I thought it was
obvious that the destruction we talk about is the storage class
itself.
I suppose if it isn't clear we could say:

"Storage is automatically freed at destruction of the storage class."




I think it would be clearer to name this "irange_obstack", or
somesuch.


I'd prefer something more generic.  We don't want to tie the name of
the
allocator to the underlying implementation.  What if we later change
to
malloc?  We'd have to change the name to irange_malloc.



irange_allocator?  Or is there something more generically appropriate
here?


How about "irange_bump_allocator?"   Rather long, but it expresses the
key point that the irange instances coming out of it don't have
independent lifetimes - their lifetimes are those of the allocator; the
client code doesn't need to find and clean up all of those individual
iranges, right?  (assuming I'm reading the code correctly)   (That name
also avoids mentioning the implementation detail that it uses obstack).


I'm sorry, but that's _really_ ugly.

Surely irange_allocator can't be that confusing.  A casual look at the 
header file would dispel all doubts.


David's point abut different strategies was also in the back my
mind but it took me a bit to formulate a question about the design.
Is a pool of ranges with a fixed allocation policy the right design
for long-term storage of irange objects?  What are some example use
cases?

Here's one that comes to mind based on what I want to do in
gimple-ssa-isolate-paths.c.  I need to store irange instances as
members of my own class, but I don't know how many subranges each
instance might need to store (it depends on the IL).  I store
objects of this class in a container (e.g., hash_map or set).
I don't want to use int_range_max because that would be wasteful
but I can't use the pool as a proxy because it's not copyable.
So I either have to dynamically allocate the pool and store
a pointer to it instead, in addition to the instances, or derive
my own class from irange and manage the tree arrays myself.  In
both cases I'm adding a layer of memory management on top of what
that the pool is there to provide.  So the design doesn't seem
very well suited for my use case.
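
In code, the shape I have in mind is roughly this (all names made up):

  struct isolated_path_info
  {
    // As many subranges as the IL calls for, so not int_range_max...
    // but then allocated from which pool, and owned by whom?
    irange *bounds;
  };

  static hash_map<gimple *, isolated_path_info> *path_info_map;
  static irange_pool *pool;  // must outlive the map, hence a pointer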

I don't mean this as an objection to the patch (I'm sure there's
a use case I'm not thinking of), but more as a question.

Martin



Aldy



Sorry if I'm nitpicking; I think my high level thought here is that we
have various memory management strategies inside GCC (in no particular
order):
   - garbage collection
   - explicit malloc/free
   - explicit new/delete
   - explicit new/delete backed by allocation pools
   - RAII
   - bump allocators aka obstacks
and I like to be clear about what "owns" each object (responsibility
for cleanup, lifetimes, etc)

Hope this is constructive
Dave







Re: [PATCH] irange_pool class

2020-09-18 Thread Aldy Hernandez via Gcc-patches

On 9/18/20 6:42 PM, Andrew MacLeod wrote:
On 9/18/20 8:28 AM, David Malcolm wrote:
I think of a "pool allocator" as 
something that makes a small

number of
large allocation under the covers, and then uses that to serve
large
numbers of fixed sized small allocations and deallocations with
O(1)
using a free list.

Ah, I didn't know pool had a different meaning.

See e.g. gcc/alloc-pool.h


The name originated when the original v1 version was based on using 
alloc-pool.h.  When we went to varying sizes, we switched to an obstack 
implementation and never changed the name.

  <...>


I think it would be clearer to name this "irange_obstack", or
somesuch.

I'd prefer something more generic.  We don't want to tie the name of
the
allocator to the underlying implementation.  What if we later change
to
malloc?  We'd have to change the name to irange_malloc.
irange_allocator?  Or is there something more generically appropriate
here?

How about "irange_bump_allocator?"   Rather long, but it expresses the




"irange_allocator" is sufficient .      The consumer should not care 
what the implementation is, and we may decide to implement it 
differently down the road. So I don't want to imply something specific 
in the name or we'd have to change it again.


Updated patch attached.

Aldy
commit 343463f8887ab510503d9e230268963a299a84ef
Author: Aldy Hernandez 
Date:   Fri Sep 11 10:15:12 2020 +0200

irange_allocator class

This is the irange storage class.  It is used to allocate the
minimum amount of storage needed for a given irange.  Storage is
automatically freed at destruction of the storage class.

It is meant for long term storage, as opposed to int_range_max
which is meant for intermediate temporary results on the stack.

The general gist is:

irange_allocator alloc;

// Allocate an irange of 5 sub-ranges.
irange *p = alloc.allocate (5);

// Allocate an irange of 3 sub-ranges.
irange *q = alloc.allocate (3);

// Allocate an irange with as many sub-ranges as are currently
// used in "some_other_range".
irange *r = alloc.allocate (some_other_range);

diff --git a/gcc/value-range.h b/gcc/value-range.h
index 8497791c7b3..c875e713d65 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -43,6 +43,7 @@ enum value_range_kind
 
 class irange
 {
+  friend class irange_allocator;
 public:
   // In-place setters.
   void set (tree, tree, value_range_kind = VR_RANGE);
@@ -619,4 +620,68 @@ vrp_val_min (const_tree type)
   return NULL_TREE;
 }
 
+// This is the irange storage class.  It is used to allocate the
+// minimum amount of storage needed for a given irange.  Storage is
+// automatically freed at destruction of the storage class.
+//
+// It is meant for long term storage, as opposed to int_range_max
+// which is meant for intermediate temporary results on the stack.
+//
+// The newly allocated irange is initialized to the empty set
+// (undefined_p() is true).
+
+class irange_allocator
+{
+public:
+  irange_allocator ();
+  ~irange_allocator ();
+  // Return a new range with NUM_PAIRS.
+  irange *allocate (unsigned num_pairs);
+  // Return a copy of SRC with the minimum amount of sub-ranges needed
+  // to represent it.
+  irange *allocate (const irange &src);
+private:
+  DISABLE_COPY_AND_ASSIGN (irange_allocator);
+  struct obstack m_obstack;
+};
+
+inline
+irange_allocator::irange_allocator ()
+{
+  obstack_init (&m_obstack);
+}
+
+inline
+irange_allocator::~irange_allocator ()
+{
+  obstack_free (&m_obstack, NULL);
+}
+
+// Return a new range with NUM_PAIRS.
+
+inline irange *
+irange_allocator::allocate (unsigned num_pairs)
+{
+  // Never allocate 0 pairs.
+  // Don't allocate 1 either, or we get legacy value_range's.
+  if (num_pairs < 2)
+num_pairs = 2;
+
+  struct newir {
+irange range;
+tree mem[1];
+  };
+  size_t nbytes = (sizeof (newir) + sizeof (tree) * 2 * (num_pairs - 1));
+  struct newir *r = (newir *) obstack_alloc (&m_obstack, nbytes);
+  return new (r) irange (r->mem, num_pairs);
+}
+
+inline irange *
+irange_allocator::allocate (const irange &src)
+{
+  irange *r = allocate (src.num_pairs ());
+  *r = src;
+  return r;
+}
+
 #endif // GCC_VALUE_RANGE_H


Re: [PATCH] irange_pool class

2020-09-18 Thread Andrew MacLeod via Gcc-patches
On 9/18/20 8:28 AM, David Malcolm wrote:
I think of a "pool allocator" as 
something that makes a small

number of
large allocation under the covers, and then uses that to serve
large
numbers of fixed sized small allocations and deallocations with
O(1)
using a free list.

Ah, I didn't know pool had a different meaning.

See e.g. gcc/alloc-pool.h


The name originated when the original v1 version was based on using 
alloc-pool.h.  When we went to varying sizes, we switched to an obstack 
implementation and never changed the name.

 <...>


I think it would be clearer to name this "irange_obstack", or
somesuch.

I'd prefer something more generic.  We don't want to tie the name of
the
allocator to the underlying implementation.  What if we later change
to
malloc?  We'd have to change the name to irange_malloc.
irange_allocator?  Or is there something more generically appropriate
here?

How about "irange_bump_allocator?"   Rather long, but it expresses the




"irange_allocator" is sufficient .      The consumer should not care 
what the implementation is, and we may decide to implement it 
differently down the road. So I don't want to imply something specific 
in the name or we'd have to change it again.


Andrew



Re: [PATCH] aarch64: Don't generate invalid zero/sign-extend syntax

2020-09-18 Thread Alex Coplan
Hi Christophe,

On 08/09/2020 10:14, Christophe Lyon wrote:
> On Mon, 17 Aug 2020 at 11:00, Alex Coplan  wrote:
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64.md
> > (*adds__): Ensure extended operand
> > agrees with width of extension specifier.
> > (*subs__): Likewise.
> > (*adds__shift_): Likewise.
> > (*subs__shift_): Likewise.
> > (*add__): Likewise.
> > (*add__shft_): Likewise.
> > (*add_uxt_shift2): Likewise.
> > (*sub__): Likewise.
> > (*sub__shft_): Likewise.
> > (*sub_uxt_shift2): Likewise.
> > (*cmp_swp__reg): Likewise.
> > (*cmp_swp__shft_): Likewise.
> >
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/adds3.c: Fix test w.r.t. new syntax.
> > * gcc.target/aarch64/cmp.c: Likewise.
> > * gcc.target/aarch64/subs3.c: Likewise.
> > * gcc.target/aarch64/subsp.c: Likewise.
> > * gcc.target/aarch64/extend-syntax.c: New test.
> >
> 
> Hi,
> 
> I've noticed some of the new tests fail with -mabi=ilp32:
> gcc.target/aarch64/extend-syntax.c check-function-bodies add1
> gcc.target/aarch64/extend-syntax.c check-function-bodies add3
> gcc.target/aarch64/extend-syntax.c check-function-bodies sub2
> gcc.target/aarch64/extend-syntax.c check-function-bodies sub3
> gcc.target/aarch64/extend-syntax.c scan-assembler-times
> subs\tx[0-9]+, x[0-9]+, w[0-9]+, sxtw 3 1
> gcc.target/aarch64/subsp.c scan-assembler sub\tsp, sp, w[0-9]*, sxtw 4\n
> 
> Christophe

AFAICT the second scan-assembler in that subsp test failed on ILP32
before my commit. This is because we generate slightly suboptimal code
here. On LP64 with -O, we get:

f2:
stp x29, x30, [sp, -16]!
mov x29, sp
add w1, w1, 1
sub sp, sp, x1, sxtw 4
mov x0, sp
bl  foo
mov sp, x29
ldp x29, x30, [sp], 16
ret

On ILP32, we get:

f2:
stp x29, x30, [sp, -16]!
mov x29, sp
add w1, w1, 1
lsl w1, w1, 4
sub sp, sp, x1
mov w0, wsp
bl  foo
mov sp, x29
ldp x29, x30, [sp], 16
ret

And we see similar results all the way back to GCC 6. So AFAICT this
scan-assembler has never worked. The attached patch disables it on ILP32
since this isn't a code quality regression.

This patch also fixes up the DejaGnu directives in extend-syntax.c to
work on ILP32: we change the check-function-bodies directive to only run
on LP64, adding scan-assembler directives for ILP32 where required.

OK for trunk?

Thanks,
Alex
diff --git a/gcc/testsuite/gcc.target/aarch64/extend-syntax.c 
b/gcc/testsuite/gcc.target/aarch64/extend-syntax.c
index 23fa9f4ffc5..1bfcdb59dde 100644
--- a/gcc/testsuite/gcc.target/aarch64/extend-syntax.c
+++ b/gcc/testsuite/gcc.target/aarch64/extend-syntax.c
@@ -20,6 +20,7 @@ unsigned long long *add1(unsigned long long *p, unsigned x)
 */
 unsigned long long add2(unsigned long long x, unsigned y)
 {
+  /* { dg-final { scan-assembler-times "add\tx0, x0, w1, uxtw" 1 { target 
ilp32 } } } */
   return x + y;
 }
 
@@ -34,6 +35,9 @@ double *add3(double *p, int x)
   return p + x;
 }
 
+// add1 and add3 should both generate this on ILP32:
+/* { dg-final { scan-assembler-times "add\tw0, w0, w1, lsl 3" 2 { target ilp32 
} } } */
+
 // Hits *sub_zero_extendsi_di (*sub__).
 /*
 ** sub1:
@@ -42,6 +46,7 @@ double *add3(double *p, int x)
 */
 unsigned long long sub1(unsigned long long x, unsigned n)
 {
+/* { dg-final { scan-assembler-times "sub\tx0, x0, w1, uxtw" 1 { target 
ilp32 } } } */
 return x - n;
 }
 
@@ -67,6 +72,9 @@ double *sub3(double *p, int n)
   return p - n;
 }
 
+// sub2 and sub3 should both generate this on ILP32:
+/* { dg-final { scan-assembler-times "sub\tw0, w0, w1, lsl 3" 2 { target ilp32 
} } } */
+
 // Hits *adds_zero_extendsi_di (*adds__).
 int adds1(unsigned long long x, unsigned y)
 {
@@ -97,7 +105,8 @@ int subs1(unsigned long long x, unsigned y)
 unsigned long long *w;
 int subs2(unsigned long long *x, int y)
 {
-  /* { dg-final { scan-assembler-times "subs\tx\[0-9\]+, x\[0-9\]+, w\[0-9\]+, 
sxtw 3" 1 } } */
+  /* { dg-final { scan-assembler-times "subs\tx\[0-9\]+, x\[0-9\]+, w\[0-9\]+, 
sxtw 3" 1 { target lp64 } } } */
+  /* { dg-final { scan-assembler-times "subs\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, 
lsl 3" 1 { target ilp32 } } } */
   unsigned long long *t = x - y;
   w = t;
   return !!t;
@@ -117,4 +126,4 @@ int cmp2(unsigned long long x, int y)
   return x == ((unsigned long long)y << 3);
 }
 
-/* { dg-final { check-function-bodies "**" "" "" } } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/subsp.c 
b/gcc/testsuite/gcc.target/aarch64/subsp.c
index 341b83dca86..e7f61e0799b 100644
--- a/gcc/testsuite/gcc.target/aarch64/subsp.c
+++ b/gcc/testsuite/gcc.target/aarch64/subsp.c

Re: [PATCH] vect/test: Don't check for epilogue loop [PR97075]

2020-09-18 Thread Segher Boessenkool
On Fri, Sep 18, 2020 at 10:37:47AM +0800, Kewen.Lin wrote:
> The commit r11-3230 brings a nice improvement to use full
> vectors instead of partial vectors when available.  But
> it caused some vector with length test cases to fail on
> Power.
> 
> The failure on gcc.target/powerpc/p9-vec-length-epil-7.c
> exposed one issue: we call vect_need_peeling_or_partial_vectors_p
> from vect_analyze_loop_2, and since that runs in the analysis phase,
> for the epilogue loop we can pick up wrong information such as
> LOOP_VINFO_INT_NITERS (loop_vinfo) and therefore get the wrong
> answer from vect_need_peeling_or_partial_vectors_p.
> For the epilogue loop in this particular failure, the niters we
> get is 58 (it should be 1) and the VF is 2.
> 
> An epilogue loop using partial vectors has the same VF as the main
> loop, so it won't be able to use full vectors.  This patch therefore
> excludes the epilogue loop from the
> vect_need_peeling_or_partial_vectors_p check in vect_analyze_loop_2.
> 
> The failure on gcc.target/powerpc/p9-vec-length-full-6.c
> is just a test issue: the 64bit/32bit pairs are able to use
> full vectors.  The test is fixed in the patch accordingly.

> gcc/ChangeLog:
> 
>   * tree-vect-loop.c (vect_analyze_loop_2): Don't check
>   vect_need_peeling_or_partial_vectors_p for the epilogue loop.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/p9-vec-length-full-6.c: Adjust.

The testcase part of course is okay for trunk, if this is the expected
(and good :-) ) code.

Thanks,


Segher


Re: [PATCH] debug: Pass --gdwarf-N to assembler if fixed gas is detected during configure

2020-09-18 Thread Mark Wielaard
Hi,

On Tue, 2020-09-15 at 20:40 +0200, Jakub Jelinek wrote:
> Ok, here it is in patch form.
> I've briefly tested it, with the older binutils I have around (no --gdwarf-N
> support), with latest gas (--gdwarf-N that can be passed to as even when
> compiling C/C++ etc. code and emitting .debug_line) and latest gas with 
> Mark's fix
> reverted (--gdwarf-N support, but can only pass it to as when assembling
> user .s/.S files, not when compiling C/C++ etc.).
> Will bootstrap/regtest (with the older binutils) later tonight.
> 
> 2020-09-15  Jakub Jelinek  
> 
>   * configure.ac (HAVE_AS_GDWARF_5_DEBUG_FLAG,
>   HAVE_AS_WORKING_DWARF_4_FLAG): New tests.
>   * gcc.c (ASM_DEBUG_DWARF_OPTION): Define.
>   (ASM_DEBUG_SPEC): Use ASM_DEBUG_DWARF_OPTION instead of
>   "--gdwarf2".  Use %{cond:opt1;:opt2} style.
>   (ASM_DEBUG_OPTION_DWARF_OPT): Define.
>   (ASM_DEBUG_OPTION_SPEC): Define.
>   (asm_debug_option): New variable.
>   (asm_options): Add "%(asm_debug_option)".
>   (static_specs): Add asm_debug_option entry.
>   (static_spec_functions): Add dwarf-version-gt.
>   (debug_level_greater_than_spec_func): New function.
>   * config/darwin.h (ASM_DEBUG_OPTION_SPEC): Define.
>   * config/darwin9.h (ASM_DEBUG_OPTION_SPEC): Redefine.
>   * config.in: Regenerated.
>   * configure: Regenerated.

Once this is in we can more generally emit DW_FORM_line_str for
filepaths in CU DIEs for the name and comp_dir attribute. There
currently is a bit of a hack to do this in dwarf2out_early_finish, but
that only works when the assembler doesn't emit a DWARF5 .debug_line,
but gcc does it itself.

What do you think of the attached patch?
From c31667db57de62c3107a0b2a5e30fbd57a4708a3 Mon Sep 17 00:00:00 2001
From: Mark Wielaard 
Date: Fri, 18 Sep 2020 17:07:03 +0200
Subject: [PATCH] Output filepath strings in .debug_line_str for DWARF5

DWARF5 has a new string table specially for file paths. .debug_line
file and dir tables reference strings in .debug_line_str.  If a
.debug_line_str section is emitted then also place CU DIE file
names and comp dirs there.

gcc/ChangeLog:

	* dwarf2out.c (add_filepath_AT_string): New function.
	(asm_outputs_debug_line_str): Likewise.
	(add_filename_attribute): Likewise.
	(add_comp_dir_attribute): Call add_filepath_AT_string.
	(gen_compile_unit_die): Call add_filename_attribute for name.
	(init_sections_and_labels): Init debug_line_str_section when
	asm_outputs_debug_line_str return true.
	(dwarf2out_early_finish): Remove DW_AT_name and DW_AT_comp_dir
	hack and call add_filename_attribute for the remap_debug_filename.
---
 gcc/dwarf2out.c | 96 -
 1 file changed, 64 insertions(+), 32 deletions(-)

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 4096c0c0d69f..a43082864a75 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -3347,6 +3347,8 @@ output_asm_line_debug_info (void)
 	  || !debug_variable_location_views));
 }
 
+static bool asm_outputs_debug_line_str (void);
+
 /* Minimum line offset in a special line info. opcode.
This value was chosen to give a reasonable range of values.  */
 #define DWARF_LINE_BASE  -10
@@ -4731,6 +4733,33 @@ reset_indirect_string (indirect_string_node **h, void *)
   return 1;
 }
 
+static inline void
+add_filepath_AT_string (dw_die_ref die, enum dwarf_attribute attr_kind,
+			const char *str)
+{
+  if (! asm_outputs_debug_line_str ())
+add_AT_string (die, attr_kind, str);
+  else
+{
+  dw_attr_node attr;
+  struct indirect_string_node *node;
+
+  if (!debug_line_str_hash)
+	debug_line_str_hash
+	  = hash_table<indirect_string_hasher>::create_ggc (10);
+
+  node = find_AT_string_in_table (str, debug_line_str_hash);
+  set_indirect_string (node);
+  node->form = DW_FORM_line_strp;
+
+  attr.dw_attr = attr_kind;
+  attr.dw_attr_val.val_class = dw_val_class_str;
+  attr.dw_attr_val.val_entry = NULL;
+  attr.dw_attr_val.v.val_str = node;
+  add_dwarf_attr (die, &attr);
+}
+}
+
 /* Find out whether a string should be output inline in DIE
or out-of-line in .debug_str section.  */
 
@@ -11839,6 +11868,29 @@ output_ranges (void)
   for -gsplit-dwarf we should use DW_FORM_strx instead.  */	\
&& !dwarf_split_debug_info)
 
+
+/* Returns TRUE if we are outputting DWARF5 and the assembler supports
+   DWARF5 .debug_line tables using .debug_line_str or we generate
+   it ourselves, except for split-dwarf which doesn't have a
+   .debug_line_str.  */
+static bool
+asm_outputs_debug_line_str (void)
+{
+  if (dwarf_version >= 5
+  && ! output_asm_line_debug_info ()
+  && DWARF5_USE_DEBUG_LINE_STR)
+return true;
+  else
+{
+#if defined(HAVE_AS_GDWARF_5_DEBUG_FLAG) && defined(HAVE_AS_WORKING_DWARF_4_FLAG)
+  return !dwarf_split_debug_info && dwarf_version >= 5;
+#else
+  return false;
+#endif
+}
+}
+
+
 /* Assign .debug_rnglists indexes.  */
 
 static void
@@ -20514,6 +20566,13 @@ add_name_attribute 

Re: [PATCH] bpf: use xBPF signed div, mod insns when available

2020-09-18 Thread Jose E. Marchesi via Gcc-patches


>> The 'mod' and 'div' operators in eBPF are unsigned, with no signed
>> counterpart. xBPF adds two new ALU operations, sdiv and smod, for
>> signed division and modulus, respectively. Update bpf.md with
>> 'define_insn' blocks for signed div and mod to use them when targetting
>> xBPF, and add new tests to ensure they are used appropriately.
>>
>> 2020-09-17  David Faust  
>>
>> gcc/
>>  * config/bpf/bpf.md: Add defines for signed div and mod operators.
>>
>> gcc/testsuite/
>>  * gcc.target/bpf/diag-sdiv.c: New test.
>>  * gcc.target/bpf/diag-smod.c: New test.
>>  * gcc.target/bpf/xbpf-sdiv-1.c: New test.
>>  * gcc.target/bpf/xbpf-smod-1.c: New test.
>
> OK.  But give Jose 48hrs before committing just in case there's
> something he wants to comment on.  Y'all are far more familiar with bpf
> than I ;-)

Looks good to me! :)
But the related pending patch in binutils should go in first.


Re: [PATCH] irange_pool class

2020-09-18 Thread Aldy Hernandez via Gcc-patches




On 9/18/20 2:28 PM, David Malcolm wrote:

On Fri, 2020-09-18 at 07:49 +0200, Aldy Hernandez wrote:


On 9/18/20 3:43 AM, David Malcolm wrote:

On Thu, 2020-09-17 at 12:36 +0200, Aldy Hernandez via Gcc-patches
wrote:

This is the irange storage class.  It is used to allocate the minimum
amount of storage needed for a given irange.  Storage is automatically
freed at destruction.

It is meant for long term storage, as opposed to int_range_max which is
meant for intermediate temporary results on the stack.

The general gist is:

irange_pool pool;

// Allocate an irange of 5 sub-ranges.
irange *p = pool.allocate (5);

// Allocate an irange of 3 sub-ranges.
irange *q = pool.allocate (3);

// Allocate an irange with as many sub-ranges as are currently
// used in "some_other_range".
irange *r = pool.allocate (some_other_range);


FWIW my first thoughts reading this example were - "how do I deallocate
these iranges?" and "do I need to call pool.deallocate on them, or is
that done for me by the irange dtor?"


Thus the description front and center of the header file:

"Storage is automatically freed at destruction..."


I think of a "pool allocator" as something that makes a small number of
large allocations under the covers, and then uses that to serve large
numbers of fixed sized small allocations and deallocations with O(1)
using a free list.


Ah, I didn't know pool had a different meaning.


See e.g. gcc/alloc-pool.h



[...]


+// This is the irange storage class.  It is used to allocate the
+// minimum amount of storage needed for a given irange.  Storage is
+// automatically freed at destruction.


"at destruction" of what object - the irange or the irange_pool?
Reading the code, it turns out to be "at destruction of the
irange_pool", and it turns out that irange_pool is an obstack under the
covers (also called a "bump allocator") and thus, I believe, the
lifetime of the irange instances is that of the storage instance.


The sentence is talking about the storage class, so I thought it was
obvious that the destruction we talk about is the storage class
itself.
I suppose if it isn't clear we could say:

"Storage is automatically freed at destruction of the storage class."




I think it would be clearer to name this "irange_obstack", or
somesuch.


I'd prefer something more generic.  We don't want to tie the name of the
allocator to the underlying implementation.  What if we later change to
malloc?  We'd have to change the name to irange_malloc.



irange_allocator?  Or is there something more generically appropriate
here?


How about "irange_bump_allocator?"   Rather long, but it expresses the
key point that the irange instances coming out of it don't have
independent lifetimes - their lifetimes are those of the allocator; the
client code doesn't need to find and clean up all of those individual
iranges, right?  (assuming I'm reading the code correctly)   (That name
also avoids mentioning the implementation detail that it uses obstack).


I'm sorry, but that's _really_ ugly.

Surely irange_allocator can't be that confusing.  A casual look at the 
header file would dispel all doubts.


Aldy



Sorry if I'm nitpicking; I think my high level thought here is that we
have various memory management strategies inside GCC (in no particular
order):
   - garbage collection
   - explicit malloc/free
   - explicit new/delete
   - explicit new/delete backed by allocation pools
   - RAII
   - bump allocators aka obstacks
and I like to be clear about what "owns" each object (responsibility
for cleanup, lifetimes, etc)

Hope this is constructive
Dave





Re: [PATCH] tree-optimization/97081 - fix wrong-code with vectorized shift

2020-09-18 Thread Jakub Jelinek via Gcc-patches
On Fri, Sep 18, 2020 at 01:39:16PM +0200, Richard Biener wrote:
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -2456,7 +2456,6 @@ vect_recog_rotate_pattern (vec_info *vinfo,
>append_pattern_def_seq (vinfo, stmt_vinfo, def_stmt);
>  }
>stype = TREE_TYPE (def);
> -  scalar_int_mode smode = SCALAR_INT_TYPE_MODE (stype);
>  
>if (TREE_CODE (def) == INTEGER_CST)
>  {
> @@ -2485,7 +2484,7 @@ vect_recog_rotate_pattern (vec_info *vinfo,
>   append_pattern_def_seq (vinfo, stmt_vinfo, def_stmt, vecstype);
>  
>def2 = vect_recog_temp_ssa_var (stype, NULL);
> -  tree mask = build_int_cst (stype, GET_MODE_PRECISION (smode) - 1);
> +  tree mask = build_int_cst (stype, GET_MODE_PRECISION (mode) - 1);
>def_stmt = gimple_build_assign (def2, BIT_AND_EXPR,
> gimple_assign_lhs (def_stmt), mask);
>if (ext_def)

The bug didn't affect just the > 32 precision type rotates, but also the
narrower ones where we'd end up with UB on the vector shifts and some arches
like x86 would end up with 0 as the result of the shift rather than the
shift count being masked with the operand width.
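
As a rough illustration of the narrow-type case (this example is mine, not
part of the patch, and it assumes the shift amount ends up in a 32-bit type
while the element is 8 bits wide): the rotate is lowered to
x << n | x >> ((-n) & mask), and the mask has to come from the element
precision, not from the shift-amount type.

  /* Illustration only: complement shift amount for an 8-bit rotate by n = 3.  */
  #include <stdio.h>

  int main ()
  {
    int n = 3;
    int fixed = (-n) & (8 - 1);   /* 5: mask from the element precision.  */
    int buggy = (-n) & (32 - 1);  /* 29: mask from the wider shift-amount type;
                                     shifting an 8-bit element by 29 is out of
                                     range, hence the UB and the 0 result seen
                                     on x86.  */
    printf ("%d %d\n", fixed, buggy);
    return 0;
  }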

So I think it is worth adding testcase for that too.  Tested on x86_64-linux
-m32/-m64 without and with your fix, committed to trunk as obvious:

2020-09-18  Jakub Jelinek  

PR tree-optimization/97081
* gcc.dg/vect/pr97081-2.c: New test.

--- gcc/testsuite/gcc.dg/vect/pr97081-2.c
+++ gcc/testsuite/gcc.dg/vect/pr97081-2.c
@@ -0,0 +1,32 @@
+/* PR tree-optimization/97081 */
+
+#include "tree-vect.h"
+
+unsigned short s[1024];
+unsigned char c[1024];
+
+__attribute__((noipa)) void
+foo (int n)
+{
+  for (int i = 0; i < 1024; i++)
+s[i] = (s[i] << n) | (s[i] >> (__SIZEOF_SHORT__ * __CHAR_BIT__ - n));
+  for (int i = 0; i < 1024; i++)
+c[i] = (c[i] << n) | (c[i] >> (__CHAR_BIT__ - n));
+}
+
+int
+main ()
+{
+  check_vect ();
+  for (int i = 0; i < 1024; i++)
+{
+  s[i] = i;
+  c[i] = i;
+}
+  foo (3);
+  for (int i = 0; i < 1024; i++)
+    if (s[i] != (unsigned short) ((i << 3) | (i >> (__SIZEOF_SHORT__ * __CHAR_BIT__ - 3)))
+        || c[i] != (unsigned char) ((((unsigned char) i) << 3) | (((unsigned char) i) >> (__CHAR_BIT__ - 3))))
+  __builtin_abort ();
+  return 0;
+}


Jakub



Re: [PATCH] aarch64: Add extend-as-extract-with-shift pattern [PR96998]

2020-09-18 Thread Alex Coplan
Hi Richard, Segher,

On 17/09/2020 08:10, Richard Sandiford wrote:
> Alex Coplan  writes:
> > Hi Richard,
> >
> > On 10/09/2020 19:18, Richard Sandiford wrote:
> >> Alex Coplan  writes:
> >> > Hello,
> >> >
> >> > Since r11-2903-g6b3034eaba83935d9f6dfb20d2efbdb34b5b00bf introduced a
> >> > canonicalization from mult to shift on address reloads, a missing
> >> > pattern in the AArch64 backend was exposed.
> >> >
> >> > In particular, we were missing the ashift variant of
> >> > *add__multp2 (this mult variant is redundant and was
> >> > removed in r11-3033-g2f8ae301f6a125f50b0a758047fcddae7b68daa8).
> >> >
> >> > This patch adds that missing pattern (fixing PR96998), updates
> >> > aarch64_is_extend_from_extract() to work for the shift pattern (instead
> >> > of the redundant mult variant) and updates callers in the cost
> >> > calculations to apply the costing for the shift variant instead.
> >> 
> >> Hmm.  I think we should instead fix this in combine.
> >
> > Ok, thanks for the review. Please find a revised patch attached which
> > fixes this in combine instead.
> 
> Thanks for doing this, looks like a really nice clean-up.
> 
> > @@ -7650,20 +7650,29 @@ make_extraction (machine_mode mode, rtx inner, 
> > HOST_WIDE_INT pos,
> > is_mode = GET_MODE (SUBREG_REG (inner));
> >inner = SUBREG_REG (inner);
> >  }
> > -  else if (GET_CODE (inner) == ASHIFT
> > +  else if ((GET_CODE (inner) == ASHIFT || GET_CODE (inner) == MULT)
> >&& CONST_INT_P (XEXP (inner, 1))
> > -  && pos_rtx == 0 && pos == 0
> > -  && len > UINTVAL (XEXP (inner, 1)))
> > -{
> > -  /* We're extracting the least significant bits of an rtx
> > -(ashift X (const_int C)), where LEN > C.  Extract the
> > -least significant (LEN - C) bits of X, giving an rtx
> > -whose mode is MODE, then shift it left C times.  */
> > -  new_rtx = make_extraction (mode, XEXP (inner, 0),
> > -0, 0, len - INTVAL (XEXP (inner, 1)),
> > -unsignedp, in_dest, in_compare);
> > -  if (new_rtx != 0)
> > -   return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1));
> > +  && pos_rtx == 0 && pos == 0)
> > +{
> > +  const HOST_WIDE_INT ci = INTVAL (XEXP (inner, 1));
> > +  const bool mult = GET_CODE (inner) == MULT;
> > +  const int shift_amt = mult ? exact_log2 (ci) : ci;
> 
> Not going to be a problem in practice but: HOST_WIDE_INT would be better
> than int here, so that we don't truncate the value for ASHIFT before it
> has been tested.  Similarly:
> 
> > +
> > +  if (shift_amt > 0 && len > (unsigned)shift_amt)
> 
> unsigned HOST_WIDE_INT here.

Makes sense, thanks.

> 
> > +   {
> > + /* We're extracting the least significant bits of an rtx
> > +(ashift X (const_int C)) or (mult X (const_int (2^C))),
> > +where LEN > C.  Extract the least significant (LEN - C) bits
> > +of X, giving an rtx whose mode is MODE, then shift it left
> > +C times.  */
> > + new_rtx = make_extraction (mode, XEXP (inner, 0),
> > +0, 0, len - shift_amt,
> > +unsignedp, in_dest, in_compare);
> > + if (new_rtx)
> > +   return mult
> > +   ? gen_rtx_MULT (mode, new_rtx, XEXP (inner, 1))
> > +   : gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1));
> 
> Easier as gen_rtx_fmt_ee (GET_CODE (inner), mode, …);

Ah, I thought there must have been an easier way to do that!

> 
> The combine parts LGTM otherwise, but Segher should have the
> final say.

Great. FWIW, I tested the previous patch on aarch64, arm, and x86 and
there were no regressions there. I've attached a revised patch with
these fixes and that bootstraps/regtests fine on aarch64-none-linux-gnu.



> The aarch64 changes are OK with those (incredibly inconsequential)
> comments fixed.

Great, thanks. I'll wait for Segher's verdict on the combine bits.

Alex

---

gcc/ChangeLog:

* combine.c (expand_compound_operation): Tweak variable name in
comment to match source.
(make_extraction): Handle mult by power of two in addition to
ashift.
* config/aarch64/aarch64.c (aarch64_is_extend_from_extract): Delete.
(aarch64_classify_index): Remove redundant extend-as-extract handling.
(aarch64_strip_extend): Likewise.
(aarch64_rtx_arith_op_extract_p): Likewise, remove now-unused
parameter. Update callers...
(aarch64_rtx_costs): ... here.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/pr96998.c: New test.
diff --git a/gcc/combine.c b/gcc/combine.c
index c88382efbd3..fe8eff2b464 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7419,8 +7419,8 @@ expand_compound_operation (rtx x)
 }
 
   /* If we reach here, we want to return a pair of shifts.  The inner
- shift is a left shift of BITSIZE - POS - LEN bits.  The outer
- shift is a right shift of BITSIZE - LEN bits.  It is arithmetic or
+ shift is a left shift of 

Re: [PATCH] CSE negated multiplications and divisions

2020-09-18 Thread Christophe Lyon via Gcc-patches
On Thu, 17 Sep 2020 at 13:20, Richard Biener  wrote:
>
> This adds the capability to look for available negated multiplications
> and divisions, replacing them with cheaper negates.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
>

This patch caused a regression in fortran, I filed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97109


> 2020-09-17  Richard Biener  
>
> * tree-ssa-sccvn.c (visit_nary_op): Value-number multiplications
> and divisions to negates of available negated forms.
>
> * gcc.dg/tree-ssa/ssa-fre-88.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-88.c | 18 +++
>  gcc/tree-ssa-sccvn.c   | 35 ++
>  2 files changed, 53 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-88.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-88.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-88.c
> new file mode 100644
> index 000..15d2ca05e65
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-88.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-fre1" } */
> +
> +double y[2];
> +void foo (double x)
> +{
> +  y[0] = x * -3.;
> +  y[1] = x * 3.;
> +}
> +void bar (double x, double z)
> +{
> +  y[0] = -z / x;
> +  y[1] = z / x;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* " 1 "fre1" } } */
> +/* { dg-final { scan-tree-dump-times " / " 1 "fre1" } } */
> +/* { dg-final { scan-tree-dump-times "= -_" 2 "fre1" } } */
> diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
> index 8fbb1dd46d1..64f1e8c9160 100644
> --- a/gcc/tree-ssa-sccvn.c
> +++ b/gcc/tree-ssa-sccvn.c
> @@ -4824,6 +4824,40 @@ visit_nary_op (tree lhs, gassign *stmt)
> }
> }
>break;
> +case RDIV_EXPR:
> +case TRUNC_DIV_EXPR:
> +case MULT_EXPR:
> +  /* Match up ([-]a){/,*}([-])b with v=a{/,*}b, replacing it with -v.  */
> +  if (! HONOR_SIGN_DEPENDENT_ROUNDING (type))
> +   {
> + tree rhs[2];
> + rhs[0] = rhs1;
> + rhs[1] = gimple_assign_rhs2 (stmt);
> + for (unsigned i = 0; i <= 1; ++i)
> +   {
> + unsigned j = i == 0 ? 1 : 0;
> + tree ops[2];
> + gimple_match_op match_op (gimple_match_cond::UNCOND,
> +   NEGATE_EXPR, type, rhs[i]);
> + ops[i] = vn_nary_build_or_lookup_1 (&match_op, false);
> + ops[j] = rhs[j];
> + if (ops[i]
> + && (ops[0] = vn_nary_op_lookup_pieces (2, code,
> +type, ops, NULL)))
> +   {
> + gimple_match_op match_op (gimple_match_cond::UNCOND,
> +   NEGATE_EXPR, type, ops[0]);
> + result = vn_nary_build_or_lookup (&match_op);
> + if (result)
> +   {
> + bool changed = set_ssa_val_to (lhs, result);
> + vn_nary_op_insert_stmt (stmt, result);
> + return changed;
> +   }
> +   }
> +   }
> +   }
> +  break;
>  default:
>break;
>  }
> @@ -5739,6 +5773,7 @@ eliminate_dom_walker::eliminate_insert (basic_block bb,
>if (!stmt
>|| (!CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt))
>   && gimple_assign_rhs_code (stmt) != VIEW_CONVERT_EXPR
> + && gimple_assign_rhs_code (stmt) != NEGATE_EXPR
>   && gimple_assign_rhs_code (stmt) != BIT_FIELD_REF
>   && (gimple_assign_rhs_code (stmt) != BIT_AND_EXPR
>   || TREE_CODE (gimple_assign_rhs2 (stmt)) != INTEGER_CST)))
> --
> 2.26.2


Re: [PATCH] libiberty/pex-win32.c: Initialize orig_err

2020-09-18 Thread Christophe Lyon via Gcc-patches
On Thu, 17 Sep 2020 at 23:33, Jeff Law  wrote:
>
>
> On 9/14/20 3:29 AM, Christophe Lyon via Gcc-patches wrote:
> > Initializing orig_err avoids a warning: "may be used uninitialized".
> >
> > 2020-09-14  Torbjörn SVENSSON 
> >   Christophe Lyon  
> >
> >   libiberty/
> >   * pex-win32 (pex_win32_exec_child): Initialize orig_err.
>
> Rather than just blindly initializing orig_err, we'd actually prefer to
> have a compilable testcase and analyze why we're getting the warning.  A
> false positive -Wuninitialized warning is often a missed optimization
> under the hood.
>

OK, I filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97108
after checking the problem is still present with trunk and not only
with the old gcc version we have in our builders.

Thanks

Christophe

>
> jeff
>
>


Re: [PATCH] irange_pool class

2020-09-18 Thread David Malcolm via Gcc-patches
On Fri, 2020-09-18 at 07:49 +0200, Aldy Hernandez wrote:
> 
> On 9/18/20 3:43 AM, David Malcolm wrote:
> > On Thu, 2020-09-17 at 12:36 +0200, Aldy Hernandez via Gcc-patches
> > wrote:
> > > This is the irange storage class.  It is used to allocate the
> > > minimum
> > > amount of storage needed for a given irange.  Storage is
> > > automatically
> > > freed at destruction.
> > > 
> > > It is meant for long term storage, as opposed to int_range_max
> > > which
> > > is
> > > meant for intermediate temporary results on the stack.
> > > 
> > > The general gist is:
> > > 
> > >   irange_pool pool;
> > > 
> > >   // Allocate an irange of 5 sub-ranges.
> > >   irange *p = pool.allocate (5);
> > > 
> > >   // Allocate an irange of 3 sub-ranges.
> > >   irange *q = pool.allocate (3);
> > > 
> > >   // Allocate an irange with as many sub-ranges as are currently
> > >   // used in "some_other_range".
> > >   irange *r = pool.allocate (some_other_range);
> > 
> > FWIW my first thoughts reading this example were - "how do I
> > deallocate
> > these iranges?" and "do I need to call pool.deallocate on them, or
> > is
> > that done for me by the irange dtor?"
> 
> Thus the description front and center of the header file:
> 
> "Storage is automatically freed at destruction..."
> 
> > I think of a "pool allocator" as something that makes a small
> > number of
> > large allocation under the covers, and then uses that to serve
> > large
> > numbers of fixed sized small allocations and deallocations with
> > O(1)
> > using a free list.
> 
> Ah, I didn't know pool had a different meaning.

See e.g. gcc/alloc-pool.h


> > [...]
> > 
> > > +// This is the irange storage class.  It is used to allocate the
> > > +// minimum amount of storage needed for a given irange.  Storage
> > > is
> > > +// automatically freed at destruction.
> > 
> > "at destruction" of what object - the irange or the irange_pool?
> > Reading the code, it turns out to be "at destruction of the
> > irange_pool", and it turns out that irange_pool is an obstack under
> > the
> > covers (also called a "bump allocator") and thus, I believe, the
> > lifetime of the irange instances is that of the storage instance.
> 
> The sentence is talking about the storage class, so I thought it was 
> obvious that the destruction we talk about is the storage class
> itself. 
> I suppose if it isn't clear we could say:
> 
> "Storage is automatically freed at destruction of the storage class."


> > I think it would be clearer to name this "irange_obstack", or
> > somesuch.
> 
> I'd prefer something more generic.  We don't want to tie the name of
> the 
> allocator to the underlying implementation.  What if we later change
> to 
> malloc?  We'd have to change the name to irange_malloc.

> irange_allocator?  Or is there something more generically appropriate
> here?

How about "irange_bump_allocator?"   Rather long, but it expresses the
key point that the irange instances coming out of it don't have
independent lifetimes - their lifetimes are those of the allocator; the
client code doesn't need to find and clean up all of those individual
iranges, right?  (assuming I'm reading the code correctly)   (That name
also avoids mentioning the implementation detail that it uses obstack).
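
To make the distinction concrete, here is a minimal sketch of the two
allocation styles being contrasted (my own illustration, not GCC code; the
names and sizes are made up):

  #include <cstddef>
  #include <new>

  // Bump/obstack-style arena: objects are never freed individually;
  // everything is released when the arena itself goes away.
  // Simplified: no per-object alignment or overflow handling.
  struct bump_arena
  {
    alignas (alignof (std::max_align_t)) char buf[4096];
    std::size_t used = 0;
    void *get (std::size_t n) { void *p = buf + used; used += n; return p; }
  };

  // Pool allocator: fixed-size objects, O(1) allocate/free via a free list.
  // Simplified: the backing memory is only reclaimed at process exit.
  struct object_pool
  {
    struct link { link *next; };
    std::size_t obj_size;
    link *free_list = nullptr;
    explicit object_pool (std::size_t n)
      : obj_size (n < sizeof (link) ? sizeof (link) : n) {}
    void *get ()
    {
      if (free_list)
        { void *p = free_list; free_list = free_list->next; return p; }
      return ::operator new (obj_size);
    }
    void put (void *p)
    { link *l = static_cast<link *> (p); l->next = free_list; free_list = l; }
  };

With the first style the client never hands objects back, which is how the
iranges coming out of the proposed class behave; with the second, each object
is explicitly returned with put ().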

Sorry if I'm nitpicking; I think my high level thought here is that we
have various memory management strategies inside GCC (in no particular
order):
  - garbage collection
  - explicit malloc/free
  - explicit new/delete
  - explicit new/delete backed by allocation pools
  - RAII
  - bump allocators aka obstacks
and I like to be clear about what "owns" each object (responsibility
for cleanup, lifetimes, etc)

Hope this is constructive
Dave



Re: [PATCH] tree-optimization/97081 - fix wrong-code with vectorized shift

2020-09-18 Thread Jakub Jelinek via Gcc-patches
On Fri, Sep 18, 2020 at 01:39:16PM +0200, Richard Biener wrote:
> This corrects the mask for creation of x << s | x >> (-x & mask)
> from a rotate x < 
> Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> 
> 2020-09-18  Richard Biener  
> 
>   PR tree-optimization/97081
>   * tree-vect-patterns.c (vect_recog_rotate_pattern): Use the
>   precision of the shifted operand to determine the mask.
> 
>   * gcc.dg/vect/pr97081.c: New testcase.

LGTM, thanks.  Please make sure to backport it down up to 8.x.

Jakub



[PATCH] tree-optimization/97081 - fix wrong-code with vectorized shift

2020-09-18 Thread Richard Biener
This corrects the mask for creation of x << s | x >> (-x & mask)
from a rotate x <

PR tree-optimization/97081
* tree-vect-patterns.c (vect_recog_rotate_pattern): Use the
precision of the shifted operand to determine the mask.

* gcc.dg/vect/pr97081.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr97081.c | 26 ++
 gcc/tree-vect-patterns.c|  3 +--
 2 files changed, 27 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr97081.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr97081.c b/gcc/testsuite/gcc.dg/vect/pr97081.c
new file mode 100644
index 000..bc83c88c019
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr97081.c
@@ -0,0 +1,26 @@
+#include "tree-vect.h"
+
+#define EXEC_ROR2(a, b, sz) (a >> b) | (a << (64 - b))
+
+#define TYPE __UINT64_TYPE__
+
+void __attribute__((noipa))
+exec_VRORudi_i(TYPE *__restrict__ pvd,
+  TYPE *__restrict__ const pva, unsigned char IMM)
+{
+  unsigned char I2 = IMM & 63;
+
+  for (unsigned i = 0; i < 4; i++)
+pvd[i] = EXEC_ROR2(pva[i], I2, 8);
+}
+
+int main()
+{
+  check_vect ();
+
+  TYPE pvd[4], pva[4] = { 0x0102030405060708, 0x0102030405060708, 0x0102030405060708, 0x0102030405060708 };
+  exec_VRORudi_i (pvd, pva, 7);
+  if (pvd[0] != 0x10020406080a0c0e)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index 478a45a2281..db45740da3c 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -2456,7 +2456,6 @@ vect_recog_rotate_pattern (vec_info *vinfo,
   append_pattern_def_seq (vinfo, stmt_vinfo, def_stmt);
 }
   stype = TREE_TYPE (def);
-  scalar_int_mode smode = SCALAR_INT_TYPE_MODE (stype);
 
   if (TREE_CODE (def) == INTEGER_CST)
 {
@@ -2485,7 +2484,7 @@ vect_recog_rotate_pattern (vec_info *vinfo,
append_pattern_def_seq (vinfo, stmt_vinfo, def_stmt, vecstype);
 
   def2 = vect_recog_temp_ssa_var (stype, NULL);
-  tree mask = build_int_cst (stype, GET_MODE_PRECISION (smode) - 1);
+  tree mask = build_int_cst (stype, GET_MODE_PRECISION (mode) - 1);
   def_stmt = gimple_build_assign (def2, BIT_AND_EXPR,
  gimple_assign_lhs (def_stmt), mask);
   if (ext_def)
-- 
2.26.2


[PATCH] amdgcn, nvptx: Disable OMP barriers in nested teams

2020-09-18 Thread Andrew Stubbs
This patch fixes a problem in which nested OpenMP parallel regions cause 
errors if the number of inner teams is not balanced (i.e. the number of 
loop iterations is not divisible by the number of physical threads). A 
testcase is included.


On NVPTX the symptom was a fatal error:

libgomp: cuCtxSynchronize error: an illegal instruction was encountered

This was caused by mismatched "bar.sync" instructions (one waiting for 
32 threads while another was waiting for 256). The source of the mismatch 
was that some threads were still busy while others had run out of work 
to do.


On GCN there was no such error (GCN barriers always wait for all 
threads), but it worked only by chance: the idle threads were "matching" 
different barriers to the busy threads, but it was harmless because the 
thread function pointer remained NULL.


This patch simply skips barriers when they would "wait" for only one 
thread (the current thread). This means that teams nested inside other 
teams now run independently, instead of strictly in lock-step, and is 
only valid as long as inner teams are limited to one thread each 
(currently the case). When the inner regions exit then the barriers for 
the outer region will sync everything up again.


OK to commit?

Andrew

P.S. I can approve the amdgcn portion myself; I'm seeking approval for 
the nvptx portion.
libgomp: disable barriers in nested teams

Both GCN and NVPTX allow nested parallel regions, but the barrier
implementation did not allow the nested teams to run independently of each
other (due to hardware limitations).  This patch fixes that, under the
assumption that each thread will create a new subteam of one thread, by
simply not using barriers when there's no other thread to synchronise.

libgomp/ChangeLog:

	* config/gcn/bar.c (gomp_barrier_wait_end): Skip the barrier if the
	total number of threads is one.
	(gomp_team_barrier_wake): Likewise.
	(gomp_team_barrier_wait_end): Likewise.
	(gomp_team_barrier_wait_cancel_end): Likewise.
	* config/nvptx/bar.c (gomp_barrier_wait_end): Likewise.
	(gomp_team_barrier_wake): Likewise.
	(gomp_team_barrier_wait_end): Likewise.
	(gomp_team_barrier_wait_cancel_end): Likewise.
	* testsuite/libgomp.c-c++-common/nested-parallel-unbalanced.c: New test.

diff --git a/libgomp/config/gcn/bar.c b/libgomp/config/gcn/bar.c
index 02fd19710d4..a21529a624b 100644
--- a/libgomp/config/gcn/bar.c
+++ b/libgomp/config/gcn/bar.c
@@ -43,7 +43,8 @@ gomp_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
   __atomic_store_n (&bar->generation, bar->generation + BAR_INCR,
 			MEMMODEL_RELAXED);
 }
-  asm ("s_barrier" ::: "memory");
+  if (bar->total > 1)
+asm ("s_barrier" ::: "memory");
 }
 
 void
@@ -71,7 +72,8 @@ gomp_barrier_wait_last (gomp_barrier_t *bar)
 void
 gomp_team_barrier_wake (gomp_barrier_t *bar, int count)
 {
-  asm ("s_barrier" ::: "memory");
+  if (bar->total > 1)
+asm ("s_barrier" ::: "memory");
 }
 
 void
@@ -97,7 +99,8 @@ gomp_team_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
 	  state &= ~BAR_CANCELLED;
 	  state += BAR_INCR - BAR_WAS_LAST;
	  __atomic_store_n (&bar->generation, state, MEMMODEL_RELAXED);
-	  asm ("s_barrier" ::: "memory");
+	  if (bar->total > 1)
+	asm ("s_barrier" ::: "memory");
 	  return;
 	}
 }
@@ -172,7 +175,8 @@ gomp_team_barrier_wait_cancel_end (gomp_barrier_t *bar,
 	{
 	  state += BAR_INCR - BAR_WAS_LAST;
	  __atomic_store_n (&bar->generation, state, MEMMODEL_RELAXED);
-	  asm ("s_barrier" ::: "memory");
+	  if (bar->total > 1)
+	asm ("s_barrier" ::: "memory");
 	  return false;
 	}
 }
@@ -195,7 +199,8 @@ gomp_team_barrier_wait_cancel_end (gomp_barrier_t *bar,
 	  abort();
 	}
 
-  asm ("s_barrier" ::: "memory");
+  if (bar->total > 1)
+	asm ("s_barrier" ::: "memory");
   gen = __atomic_load_n (&bar->generation, MEMMODEL_RELAXED);
   if (__builtin_expect (gen & BAR_CANCELLED, 0))
 	return true;
diff --git a/libgomp/config/nvptx/bar.c b/libgomp/config/nvptx/bar.c
index 125ca3e49ec..0a723087b9e 100644
--- a/libgomp/config/nvptx/bar.c
+++ b/libgomp/config/nvptx/bar.c
@@ -41,7 +41,8 @@ gomp_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
   __atomic_store_n (&bar->generation, bar->generation + BAR_INCR,
 			MEMMODEL_RELEASE);
 }
-  asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
+  if (bar->total > 1)
+asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
 }
 
 void
@@ -69,7 +70,9 @@ gomp_barrier_wait_last (gomp_barrier_t *bar)
 void
 gomp_team_barrier_wake (gomp_barrier_t *bar, int count)
 {
-  asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
+  asm ("bar.sync 1, %0;" : : "r" (32 * 8/*bar->total*/));
+  if (bar->total > 1)
+asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
 }
 
 void
@@ -95,7 +98,8 @@ gomp_team_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
 	  state &= ~BAR_CANCELLED;
 	  state += BAR_INCR - BAR_WAS_LAST;
	  __atomic_store_n (&bar->generation, state, MEMMODEL_RELEASE);
-	  asm 

[PATCH] IBM Z: Try to make use of load-and-test instructions

2020-09-18 Thread Stefan Schulze Frielinghaus via Gcc-patches
This patch enables a peephole2 optimization which transforms the sequence
of loading constant zero into a temporary register and then comparing it
against the floating-point register of interest into a single load-and-test
instruction.  However, the optimization is only applied if both registers
are dead afterwards and if we test for (in)equality only.  This is relaxed
in case of fast math.
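
For illustration, a minimal C function of the kind the peephole is aimed at
(my own sketch, not taken from the new tests; the exact code generated of
course depends on options and register allocation):

  // Equality test against 0.0 where the FP value is dead afterwards; this is
  // the shape that can now use a single load-and-test instruction instead of
  // first materialising a zero in a temporary FP register.
  int is_zero (double x)
  {
    return x == 0.0;
  }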

This is a follow up to PR88856.

Bootstrapped and regtested on IBM Z.

gcc/ChangeLog:

* config/s390/s390.md ("*cmp_ccs_0", "*cmp_ccz_0",
"*cmp_ccs_0_fastmath"): Basically change "*cmp_ccs_0" into
"*cmp_ccz_0" and for fast math add "*cmp_ccs_0_fastmath".

gcc/testsuite/ChangeLog:

* gcc.target/s390/load-and-test-fp-1.c: Change test to include all
possible combinations of dead/live registers and comparisons (equality,
relational).
* gcc.target/s390/load-and-test-fp-2.c: Same as load-and-test-fp-1.c
but for fast math.
* gcc.target/s390/load-and-test-fp.h: New test included by
load-and-test-fp-{1,2}.c.
---
 gcc/config/s390/s390.md   | 54 +++
 .../gcc.target/s390/load-and-test-fp-1.c  | 19 +++
 .../gcc.target/s390/load-and-test-fp-2.c  | 17 ++
 .../gcc.target/s390/load-and-test-fp.h| 12 +
 4 files changed, 67 insertions(+), 35 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/load-and-test-fp.h

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 4c3e5400a2b..e591aa7c324 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -1391,23 +1391,55 @@
 ; (TF|DF|SF|TD|DD|SD) instructions
 
 
-; FIXME: load and test instructions turn SNaN into QNaN what is not
-; acceptable if the target will be used afterwards.  On the other hand
-; they are quite convenient for implementing comparisons with 0.0. So
-; try to enable them via splitter/peephole if the value isn't needed anymore.
-; See testcases: load-and-test-fp-1.c and load-and-test-fp-2.c
+; load and test instructions turn a signaling NaN into a quiet NaN.  Thus they
+; may only be used if the target register is dead afterwards or if fast math
+; is enabled.  The former is done via a peephole optimization.  Note, load and
+; test instructions may only be used for (in)equality comparisons because
+; relational comparisons must treat a quiet NaN like a signaling NaN which is
+; not the case for load and test instructions.  For fast math insn
+; "cmp_ccs_0_fastmath" applies.
+; See testcases load-and-test-fp-{1,2}.c
+
+(define_peephole2
+  [(set (match_operand:FP 0 "register_operand")
+   (match_operand:FP 1 "const0_operand"))
+   (set (reg:CCZ CC_REGNUM)
+   (compare:CCZ (match_operand:FP 2 "register_operand")
+(match_operand:FP 3 "register_operand")))]
+  "TARGET_HARD_FLOAT
+   && FP_REG_P (operands[2])
+   && REGNO (operands[0]) == REGNO (operands[3])
+   && peep2_reg_dead_p (2, operands[0])
+   && peep2_reg_dead_p (2, operands[2])"
+  [(parallel
+[(set (reg:CCZ CC_REGNUM)
+ (match_op_dup 4 [(match_dup 2) (match_dup 1)]))
+ (clobber (match_dup 2))])]
+  "operands[4] = gen_rtx_COMPARE (CCZmode, operands[2], operands[1]);")
 
 ; ltxbr, ltdbr, ltebr, ltxtr, ltdtr
-(define_insn "*cmp_ccs_0"
-  [(set (reg CC_REGNUM)
-   (compare (match_operand:FP 0 "register_operand"  "f")
-(match_operand:FP 1 "const0_operand""")))
-   (clobber (match_operand:FP  2 "register_operand" "=0"))]
-  "s390_match_ccmode(insn, CCSmode) && TARGET_HARD_FLOAT"
+(define_insn "*cmp_ccz_0"
+  [(set (reg:CCZ CC_REGNUM)
+   (compare:CCZ (match_operand:FP 0 "register_operand" "f")
+(match_operand:FP 1 "const0_operand")))
+   (clobber (match_operand:FP 2 "register_operand" "=0"))]
+  "TARGET_HARD_FLOAT"
   "ltr\t%0,%0"
[(set_attr "op_type" "RRE")
 (set_attr "type"  "fsimp")])
 
+(define_insn "*cmp_ccs_0_fastmath"
+  [(set (reg CC_REGNUM)
+   (compare (match_operand:FP 0 "register_operand" "f")
+(match_operand:FP 1 "const0_operand")))]
+  "s390_match_ccmode (insn, CCSmode)
+   && TARGET_HARD_FLOAT
+   && !flag_trapping_math
+   && !flag_signaling_nans"
+  "ltr\t%0,%0"
+  [(set_attr "op_type" "RRE")
+   (set_attr "type" "fsimp")])
+
 ; VX: TFmode in FPR pairs: use cxbr instead of wfcxb
 ; cxtr, cdtr, cxbr, cdbr, cebr, cdb, ceb, wfcsb, wfcdb
 (define_insn "*cmp_ccs"
diff --git a/gcc/testsuite/gcc.target/s390/load-and-test-fp-1.c b/gcc/testsuite/gcc.target/s390/load-and-test-fp-1.c
index 2a7e88c0f1b..ebb8a88c574 100644
--- a/gcc/testsuite/gcc.target/s390/load-and-test-fp-1.c
+++ b/gcc/testsuite/gcc.target/s390/load-and-test-fp-1.c
@@ -1,17 +1,12 @@
 /* { dg-do compile } */
 /* { dg-options "-O3 -mzarch" } */
 
-/* a is used after the comparison.  We cannot use load and test here
-   since it would turn SNaNs into QNaNs.  */
+/* Use load-and-test instructions if compared for (in)equality and if variable
+   `a` is 

[committed] amdgcn: Remove omp_gcn pass

2020-09-18 Thread Andrew Stubbs
This patch removes the amdgcn-specific "omp_gcn" pass that was 
responsible for tweaking the OpenMP middle-end IR for GCN.


In the past there were a few things there to make it work for simple 
cases while real support was built out in the backend and libgomp, but 
those haven't been needed ever since the code was upstreamed.


The only remaining content was replacing two libgomp function calls with 
compiler builtins. Specifically, omp_get_thread_num and omp_get_team_num 
are replaced with the HSA/ROCm grid numbers stored in registers. This is 
purely an optimization to avoid the function call; it serves no 
correctness purpose because the libgomp functions work too.


However, I have now learned that OpenMP allows "parallel" regions inside 
other "parallel" regions, and when this happens the team and thread 
numbers no longer match the runtime grid numbers, so the optimization is 
invalid.
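
A small example of the nesting in question (my own illustration, not the new
testcase): each outer thread starts its own inner team, so the inner
omp_get_thread_num () restarts at 0 and can no longer be read straight off
the hardware position registers.

  #include <omp.h>
  #include <stdio.h>

  int main ()
  {
    #pragma omp parallel num_threads (2)
    {
      int outer = omp_get_thread_num ();
      #pragma omp parallel num_threads (2)
      printf ("outer %d inner %d\n", outer, omp_get_thread_num ());
    }
    return 0;
  }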


Removing the whole pass fixes the correctness issue. It would be nice to 
keep the optimization for the normal case where the regions are not 
nested, but such calls do not usually occur in the "hot" code, it's not 
clear to me how to detect whether nesting is present, and I don't have a 
lot of time to spend on this issue, so I'm just going to let it go for now.


Committed to master. I'll backport it to GCC 10 and OG10 shortly.

Andrew
amdgcn: Remove omp_gcn pass

This pass only had an optimization for obtaining team/thread numbers in it,
and that turns out to be invalid in the presence of nested parallel regions,
so we can simply delete the whole thing.

Of course, it would be nice to apply the optimization where it is valid, but
that will take more effort than I have to spend right now.

gcc/ChangeLog:

	* config/gcn/gcn-tree.c (execute_omp_gcn): Delete.
	(make_pass_omp_gcn): Delete.
	* config/gcn/t-gcn-hsa (PASSES_EXTRA): Delete.
	* config/gcn/gcn-passes.def: Removed.

diff --git a/gcc/config/gcn/gcn-passes.def b/gcc/config/gcn/gcn-passes.def
deleted file mode 100644
index bcf928dd418..000
--- a/gcc/config/gcn/gcn-passes.def
+++ /dev/null
@@ -1,19 +0,0 @@
-/* Copyright (C) 2017-2020 Free Software Foundation, Inc.
-
-   This file is part of GCC.
-   
-   GCC is free software; you can redistribute it and/or modify it under
-   the terms of the GNU General Public License as published by the Free
-   Software Foundation; either version 3, or (at your option) any later
-   version.
-   
-   GCC is distributed in the hope that it will be useful, but WITHOUT ANY
-   WARRANTY; without even the implied warranty of MERCHANTABILITY or
-   FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
-   for more details.
-   
-   You should have received a copy of the GNU General Public License
-   along with GCC; see the file COPYING3.  If not see
-   .  */
-
-INSERT_PASS_AFTER (pass_omp_target_link, 1, pass_omp_gcn);
diff --git a/gcc/config/gcn/gcn-tree.c b/gcc/config/gcn/gcn-tree.c
index a681426139b..4304f13160a 100644
--- a/gcc/config/gcn/gcn-tree.c
+++ b/gcc/config/gcn/gcn-tree.c
@@ -45,125 +45,6 @@
 #include "targhooks.h"
 #include "langhooks-def.h"
 
-/* }}}  */
-/* {{{ OMP GCN pass.
- 
-   This pass is intended to make any GCN-specfic transformations to OpenMP
-   target regions.
- 
-   At present, its only purpose is to convert some "omp" built-in functions
-   to use closer-to-the-metal "gcn" built-in functions.  */
-
-unsigned int
-execute_omp_gcn (void)
-{
-  tree thr_num_tree = builtin_decl_explicit (BUILT_IN_OMP_GET_THREAD_NUM);
-  tree thr_num_id = DECL_NAME (thr_num_tree);
-  tree team_num_tree = builtin_decl_explicit (BUILT_IN_OMP_GET_TEAM_NUM);
-  tree team_num_id = DECL_NAME (team_num_tree);
-  basic_block bb;
-  gimple_stmt_iterator gsi;
-  unsigned int todo = 0;
-
-  FOR_EACH_BB_FN (bb, cfun)
-for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-{
-  gimple *call = gsi_stmt (gsi);
-  tree decl;
-
-  if (is_gimple_call (call) && (decl = gimple_call_fndecl (call)))
-	{
-	  tree decl_id = DECL_NAME (decl);
-	  tree lhs = gimple_get_lhs (call);
-
-	  if (decl_id == thr_num_id)
-	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-		fprintf (dump_file,
-			 "Replace '%s' with __builtin_gcn_dim_pos.\n",
-			 IDENTIFIER_POINTER (decl_id));
-
-	  /* Transform this:
-	 lhs = __builtin_omp_get_thread_num ()
-	 to this:
-	 lhs = __builtin_gcn_dim_pos (1)  */
-	  tree fn = targetm.builtin_decl (GCN_BUILTIN_OMP_DIM_POS, 0);
-	  tree fnarg = build_int_cst (unsigned_type_node, 1);
-	  gimple *stmt = gimple_build_call (fn, 1, fnarg);
-	  gimple_call_set_lhs (stmt, lhs);
-	  gsi_replace (&gsi, stmt, true);
-
-	  todo |= TODO_update_ssa;
-	}
-	  else if (decl_id == team_num_id)
-	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-		fprintf (dump_file,
-			 "Replace '%s' with __builtin_gcn_dim_pos.\n",
-			 IDENTIFIER_POINTER (decl_id));
-
-	  /* Transform this:
-	 lhs 

[PATCH] tree-optimization/97095 - fix typo in vectorizable_live_operation

2020-09-18 Thread Richard Biener
This fixes a typo introduced with the last change and not noticed
because those vectorizer access macros are not type safe ...

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2020-09-18  Richard Biener  

PR tree-optimization/97095
* tree-vect-loop.c (vectorizable_live_operation): Get
the SLP vector type from the correct object.

* gfortran.dg/pr97095.f: New testcase.
---
 gcc/testsuite/gfortran.dg/pr97095.f | 27 +++
 gcc/tree-vect-loop.c|  2 +-
 2 files changed, 28 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pr97095.f

diff --git a/gcc/testsuite/gfortran.dg/pr97095.f b/gcc/testsuite/gfortran.dg/pr97095.f
new file mode 100644
index 000..0b86e8b12d1
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr97095.f
@@ -0,0 +1,27 @@
+! { dg-do compile }
+! { dg-options "-O2 -ftree-vectorize" }
+  subroutine gen3delem(nel,ial,ifix,xta,xnoref,dd,jact,nelshell)
+  real*8 xnoref(3),xta(3,100),xn1(3,100)
+  if(nel.gt.0) then
+do j=1,nel
+enddo
+do
+enddo
+  endif
+  do
+if(ifix.eq.0) then
+  do j=nelshell,nel
+if(ial(j).eq.0) then
+endif
+  enddo
+endif
+do j=nelshell,nel
+enddo
+do j=1,3
+  xnoref(j)=xnoref(j)/dd
+enddo
+xn1(2,jact)=xnoref(3)*xta(1,jact)-xnoref(1)*xta(3,jact)
+xn1(3,jact)=xnoref(1)*xta(2,jact)-xnoref(2)*xta(1,jact)
+call foo(xn1)
+  enddo
+  end
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 3021be3b241..b1a6e1508c7 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -8045,7 +8045,7 @@ vectorizable_live_operation (vec_info *vinfo,
   imm_use_iterator imm_iter;
   tree lhs, lhs_type, bitsize, vec_bitsize;
   tree vectype = (slp_node
- ? SLP_TREE_VECTYPE (SLP_TREE_REPRESENTATIVE (slp_node))
+ ? SLP_TREE_VECTYPE (slp_node)
  : STMT_VINFO_VECTYPE (stmt_info));
   poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
   int ncopies;
-- 
2.26.2


Re: [PATCH] Enable GCC support for AMX

2020-09-18 Thread Hongyu Wang via Gcc-patches
Hi Kirill,

Very Appreciated for your review again

I just update the patch with adding XSAVE dependency and use
__builtin_cpu_supports for runtime test.

Re-based on Sept. 15 trunk and tested with sde. Kindly PING.


Hongyu Wang  于2020年9月12日周六 上午1:00写道:

> Hi
>
> Thanks for your review, and sorry for the late reply. It took a while
> to finish the runtime test.
>
> > > diff --git a/gcc/config.gcc b/gcc/config.gcc
> > > index 797f0ad5edd..d0e59e86a5c 100644
> > > --- a/gcc/config.gcc
> > > +++ b/gcc/config.gcc
> > > @@ -412,7 +412,7 @@ i[34567]86-*-*)
> > >  waitpkgintrin.h cldemoteintrin.h
> avx512bf16vlintrin.h
> > >  avx512bf16intrin.h enqcmdintrin.h
> serializeintrin.h
> > >  avx512vp2intersectintrin.h
> avx512vp2intersectvlintrin.h
> > > -tsxldtrkintrin.h"
> > > +tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h
> amxbf16intrin.h"
> >
> > Line more than 80 chars.
> >
> > >   ;;
> > >  x86_64-*-*)
> > >   cpu_type=i386
> > > @@ -447,7 +447,7 @@ x86_64-*-*)
> > >  waitpkgintrin.h cldemoteintrin.h
> avx512bf16vlintrin.h
> > >  avx512bf16intrin.h enqcmdintrin.h
> serializeintrin.h
> > >  avx512vp2intersectintrin.h
> avx512vp2intersectvlintrin.h
> > > -tsxldtrkintrin.h"
> > > +tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h
> amxbf16intrin.h"
> >
> > Ditto.
>
> Changed.
>
> >
> > > diff --git a/gcc/config/i386/amxbf16intrin.h
> b/gcc/config/i386/amxbf16intrin.h
> > > new file mode 100644
> > > index 000..df0e2262d50
> > > --- /dev/null
> > > +++ b/gcc/config/i386/amxbf16intrin.h
> > > @@ -0,0 +1,25 @@
> > > +#if !defined _IMMINTRIN_H_INCLUDED
> > > +#error "Never use  directly; include 
> instead."
> > > +#endif
> > > +
> > > +#ifndef _AMXBF16INTRIN_H_INCLUDED
> > > +#define _AMXBF16INTRIN_H_INCLUDED
> > > +
> > > +#if !defined(__AMX_BF16__)
> > > +#pragma GCC push_options
> > > +#pragma GCC target("amx-bf16")
> > > +#define __DISABLE_AMX_BF16__
> > > +#endif /* __AMX_BF16__ */
> > > +
> > > +#if defined(__x86_64__) && defined(__AMX_BF16__)
> > > +#define _tile_dpbf16ps(dst,src1,src2)
> \
> > > +  __asm__ volatile\
> > > +  ("{tdpbf16ps\t%%tmm"#src2", %%tmm"#src1",
> %%tmm"#dst"|tdpbf16ps\t%%tmm"#dst", %%tmm"#src1", %%tmm"#src2"}" ::)
> > > +#endif
> >
> > I hope in future we'll replace it with unspecs at least...
>
> Currently we think it is redundant to add builtins with just const int
> parameters, which are supposed to be replaced in the future.
>
> >
> > > diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> > > index c9f7195d423..9389dc24948 100644
> > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > index bca8c856dc8..a46e31f5862 100644
> > > --- a/gcc/doc/invoke.texi
> > > +++ b/gcc/doc/invoke.texi
> > > @@ -1357,6 +1357,7 @@ See RS/6000 and PowerPC Options.
> > >  -mvpclmulqdq  -mavx512bitalg  -mmovdiri  -mmovdir64b
> -mavx512vpopcntdq @gol
> > >  -mavx5124fmaps  -mavx512vnni  -mavx5124vnniw  -mprfchw  -mrdpid @gol
> > >  -mrdseed  -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol
> > > +-mamx-tile -mamx-int8 -mamx-bf16@gol
> >
> > Add space please.
>
> Changed.
>
> >
> > > diff --git a/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
> b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
> > > new file mode 100644
> > > index 000..605a44df3f8
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/amxbf16-asmintel-2.c
> > > @@ -0,0 +1,4 @@
> > > +/* { dg-do assemble { target { ! ia32 } } } */
> > > +/* { dg-options "-O2 -mamx-bf16 -masm=intel" } */
> > > +/* { dg-require-effective-target amx_bf16 } */
> > > +#include"amxbf16-asmintel-1.c"
> >
> > I didn't get it. We ususally use second tescase to actually execute
> > it and (well, a little) verify that semantics is ok. E.g. that
> > operands order is correct. Could you please do that?
> > This applies to all *-2.c cases.
> > I've checked and looks like public SDE simulator supports AMX.
> >
>
> Added runtime test. Tested and passed under SDE.
>
> Also, we adjust the intrinsic call to accept macro parameters.
>
> Updated patch.
>
> > --
> > K
> > Hello,
> >
> > On 03 сен 08:17, H.J. Lu wrote:
> > > On Thu, Sep 3, 2020 at 8:08 AM Kirill Yukhin via Gcc-patches
> > >  wrote:
> > > >
> > > > Hello,
> > > >
> > > > On 06 июл 09:58, Hongyu Wang via Gcc-patches wrote:
> > > > > Hi:
> > > > >
> > > > > This patch is about to support Intel Advanced Matrix Extensions
> (AMX)
> > > > > which will be enabled in GLC.
> > > > >
> > > > > AMX is a new 64-bit programming paradigm consisting of two
> > > > > compo nents: a set of 2-dimensional registers (tiles) representing
> > > > > sub-arrays from a larger 2-dimensional memory image,
> > > > > and an accelerator able to operate on tiles
> > > > >
> > > > > Supported instructions are
> > > > >
> > > > >
> 

[PATCH] optabs: Don't reuse target for multi-word expansions if it overlaps operand(s) [PR97073]

2020-09-18 Thread Jakub Jelinek via Gcc-patches
Hi!

The following testcase is miscompiled on i686-linux, because
we try to expand a double-word bitwise logic operation with op0
being a (mem:DI u) and target (mem:DI u+4), i.e. partial overlap, and
thus end up with:
movl4(%esp), %eax
andlu, %eax
movl%eax, u+4
! movl u+4, %eax optimized out
andl8(%esp), %eax
movl%eax, u+8
rather than with the desired:
movl4(%esp), %edx
movl8(%esp), %eax
andlu, %edx
andlu+4, %eax
movl%eax, u+8
movl%edx, u+4
because the store of the first word to target overwrites the second word of
the operand.
expand_binop for this (and several similar places) already check for target
== op0 or target == op1, this patch just adds reg_overlap_mentioned_p calls
next to it.
Pedantically, at least for some of these it might be sufficient to force
a different target if there is overlap but target is not rtx_equal_p to
the operand (e.g. in this bitwise logical case, but e.g. not in the shift
cases where there is reordering), though that would go against the
preexisting target == op? checks and the rationale that REG_EQUAL notes in
that case isn't correct.
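
For reference, a small program showing where the partial overlap in the
testcase below comes from (illustration only; the 4-byte offset relies on the
i386 ABI giving long long only 4-byte alignment inside structs):

  #include <stdio.h>

  typedef unsigned long long L;
  union U { L i; struct T { unsigned k; L l; } j; };

  int main ()
  {
    union U u;
    /* Prints 4 on i686: u.j.l overlaps the upper word of u.i, so the expander
       sees op0 = (mem:DI u) and target = (mem:DI u+4).  On LP64 targets it
       prints 8 and there is no overlap.  */
    printf ("%d\n", (int) ((char *) &u.j.l - (char *) &u.i));
    return 0;
  }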

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and
release branches?

2020-09-18  Jakub Jelinek  

PR middle-end/97073
* optabs.c (expand_binop, expand_absneg_bit, expand_unop,
expand_copysign_bit): Check reg_overlap_mentioned_p between target
and operand(s) and if it returns true, force a pseudo as target.

* gcc.c-torture/execute/pr97073.c: New test.

--- gcc/optabs.c.jj 2020-07-28 15:39:09.0 +0200
+++ gcc/optabs.c2020-09-17 22:07:53.238396458 +0200
@@ -1395,6 +1395,8 @@ expand_binop (machine_mode mode, optab b
   if (target == 0
  || target == op0
  || target == op1
+ || reg_overlap_mentioned_p (target, op0)
+ || reg_overlap_mentioned_p (target, op1)
  || !valid_multiword_target_p (target))
target = gen_reg_rtx (int_mode);
 
@@ -1475,6 +1477,8 @@ expand_binop (machine_mode mode, optab b
  if (target == 0
  || target == op0
  || target == op1
+ || reg_overlap_mentioned_p (target, op0)
+ || reg_overlap_mentioned_p (target, op1)
  || !valid_multiword_target_p (target))
target = gen_reg_rtx (int_mode);
 
@@ -1533,6 +1537,8 @@ expand_binop (machine_mode mode, optab b
  || target == op0
  || target == op1
  || !REG_P (target)
+ || reg_overlap_mentioned_p (target, op0)
+ || reg_overlap_mentioned_p (target, op1)
  || !valid_multiword_target_p (target))
target = gen_reg_rtx (int_mode);
 
@@ -2670,6 +2676,7 @@ expand_absneg_bit (enum rtx_code code, s
 
   if (target == 0
   || target == op0
+  || reg_overlap_mentioned_p (target, op0)
   || (nwords > 1 && !valid_multiword_target_p (target)))
 target = gen_reg_rtx (mode);
 
@@ -2951,7 +2958,10 @@ expand_unop (machine_mode mode, optab un
   int i;
   rtx_insn *insns;
 
-  if (target == 0 || target == op0 || !valid_multiword_target_p (target))
+  if (target == 0
+ || target == op0
+ || reg_overlap_mentioned_p (target, op0)
+ || !valid_multiword_target_p (target))
target = gen_reg_rtx (int_mode);
 
   start_sequence ();
@@ -3472,6 +3482,8 @@ expand_copysign_bit (scalar_float_mode m
   if (target == 0
   || target == op0
   || target == op1
+  || reg_overlap_mentioned_p (target, op0)
+  || reg_overlap_mentioned_p (target, op1)
   || (nwords > 1 && !valid_multiword_target_p (target)))
 target = gen_reg_rtx (mode);
 
--- gcc/testsuite/gcc.c-torture/execute/pr97073.c.jj	2020-09-17 22:00:39.778614611 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr97073.c	2020-09-17 22:06:08.870893863 +0200
@@ -0,0 +1,21 @@
+/* PR middle-end/97073 */
+/* { dg-additional-options "-mno-stv" { target i?86-*-* x86_64-*-* } } */
+
+typedef unsigned long long L;
+union U { L i; struct T { unsigned k; L l; } j; } u;
+
+__attribute__((noinline,noclone)) void
+foo (L x)
+{
+  u.j.l = u.i & x;
+}
+
+int
+main ()
+{
+  u.i = 5;
+  foo (-1ULL);
+  if (u.j.l != 5)
+__builtin_abort ();
+  return 0;
+}

Jakub



[PATCH 1/2] tree-optimization/97089 - fix bogus unsigned division replacement

2020-09-18 Thread Richard Biener
This fixes bogus replacing of an unsigned (-x)/y division by
-(x/y).
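
A quick counterexample showing why the rewrite only holds for signed operands
(illustration only):

  #include <stdio.h>

  int main ()
  {
    unsigned x = 1, y = 2;
    /* (-x) / y wraps: -1u is 0xffffffff, so the quotient is 0x7fffffff,
       whereas -(x / y) is -(0) == 0.  For signed operands the two forms
       agree (modulo the INT_MIN overflow case).  */
    printf ("%u %u\n", (-x) / y, -(x / y));
    return 0;
  }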

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2020-09-18  Richard Biener  

PR tree-optimization/97089
* tree-ssa-sccvn.c (visit_nary_op): Do not replace unsigned
divisions.
---
 gcc/tree-ssa-sccvn.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 64f1e8c9160..014b7bdfd01 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -4824,8 +4824,11 @@ visit_nary_op (tree lhs, gassign *stmt)
}
}
   break;
-case RDIV_EXPR:
 case TRUNC_DIV_EXPR:
+  if (TYPE_UNSIGNED (type))
+   break;
+  /* Fallthru.  */
+case RDIV_EXPR:
 case MULT_EXPR:
   /* Match up ([-]a){/,*}([-])b with v=a{/,*}b, replacing it with -v.  */
   if (! HONOR_SIGN_DEPENDENT_ROUNDING (type))
-- 
2.26.2




[PATCH] tree-optimization/97098 - fix compile-time hog in SLP live

2020-09-18 Thread Richard Biener
This fixes a missed early-out in SLP live stmt marking when
all scalar stmts were already visited (oops).

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

2020-09-18  Richard Biener  

PR tree-optimization/97098
* tree-vect-slp.c (vect_bb_slp_mark_live_stmts): Do not
recurse to children when all stmts were already visited.
---
 gcc/tree-vect-slp.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index ecce7a982dd..ef62c2dff2e 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -3018,11 +3018,13 @@ vect_bb_slp_mark_live_stmts (bb_vec_info bb_vinfo, slp_tree node,
   unsigned i;
   stmt_vec_info stmt_info;
   stmt_vec_info last_stmt = vect_find_last_scalar_stmt_in_slp (node);
+  bool all_visited = true;
   FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
 {
   stmt_vec_info orig_stmt_info = vect_orig_stmt (stmt_info);
   if (svisited.contains (orig_stmt_info))
continue;
+  all_visited = false;
   bool mark_visited = true;
   gimple *orig_stmt = orig_stmt_info->stmt;
   ssa_op_iter op_iter;
@@ -3091,6 +3093,8 @@ vect_bb_slp_mark_live_stmts (bb_vec_info bb_vinfo, slp_tree node,
   if (mark_visited)
svisited.add (orig_stmt_info);
 }
+  if (all_visited)
+return;
 
   slp_tree child;
   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
-- 
2.26.2


Re: [pushed] c++: Layout decls with newly-complete type.

2020-09-18 Thread Richard Biener via Gcc-patches
On Fri, Sep 18, 2020 at 5:20 AM Jason Merrill via Gcc-patches
 wrote:
>
> Martin's -Wplacement-new patch ran into a problem with DECL_SIZE not being
> set on an extern variable for which the type was not complete until after
> its declaration.  complete_vars was deliberately not calling layout_decl for
> some reason, instead leaving that for expand_expr_real_1 much later in the
> compilation.  But if we layout decls at declaration time, I don't see any
> reason we shouldn't lay them out here, when their type is newly complete.
>
> Tested x86_64-pc-linux-gnu, applying to trunk.

Seems to break bootstrap in stage2,

/home/rguenther/src/trunk/gcc/dumpfile.c:169:33: error: storage size
of 'optgroup_options' isn't known

Richard.

> gcc/cp/ChangeLog:
>
> * decl.c (complete_vars): Call layout_var_decl.
> ---
>  gcc/cp/decl.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
> index ad2a30fcf71..746ed101fef 100644
> --- a/gcc/cp/decl.c
> +++ b/gcc/cp/decl.c
> @@ -17460,10 +17460,10 @@ complete_vars (tree type)
>   && (TYPE_MAIN_VARIANT (strip_array_types (type))
>   == iv->incomplete_type))
> {
> - /* Complete the type of the variable.  The VAR_DECL itself
> -will be laid out in expand_expr.  */
> + /* Complete the type of the variable.  */
>   complete_type (type);
>   cp_apply_type_quals_to_decl (cp_type_quals (type), var);
> + layout_var_decl (var);
> }
>
>   /* Remove this entry from the list.  */
>
> base-commit: 0f079e104a8d1994b6b47169a6b45737615eb2d7
> --
> 2.18.1
>
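
To illustrate the shape of the problem, a hypothetical reduced example (not
the dumpfile.c code): a variable declared while its type is still incomplete
has no DECL_SIZE at declaration time; the type is completed only later, which
is the point where complete_vars now calls layout_var_decl.

  struct S;                        // S is incomplete here
  extern S table[3];               // declared now; size of S still unknown
  struct S { int id; int flag; };  // S completed; complete_vars handles 'table'

  int f () { return sizeof (table); }  // needs table's layout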


Re: [PATCH] warn for integer overflow in allocation calls (PR 96838)

2020-09-18 Thread Aldy Hernandez via Gcc-patches




On 9/17/20 10:18 PM, Martin Sebor wrote:

On 9/17/20 12:39 PM, Andrew MacLeod wrote:

On 9/17/20 12:08 PM, Martin Sebor via Gcc-patches wrote:

On 9/16/20 9:23 PM, Jeff Law wrote:


On 9/15/20 1:47 PM, Martin Sebor wrote:

Overflowing the size of a dynamic allocation (e.g., malloc or VLA)
can lead to a subsequent buffer overflow corrupting the heap or
stack.  The attached patch diagnoses a subset of these cases where
the overflow/wraparound is still detectable.

Besides regtesting GCC on x86_64-linux I also verified the warning
doesn't introduce any false positives into Glibc or Binutils/GDB
builds on the same target.

Martin

gcc-96838.diff

PR middle-end/96838 - missing warning on integer overflow in calls 
to allocation functions


gcc/ChangeLog:

PR middle-end/96838
* calls.c (eval_size_vflow): New function.
(get_size_range): Call it.  Add argument.
(maybe_warn_alloc_args_overflow): Diagnose overflow/wraparound.
* calls.h (get_size_range): Add argument.

gcc/testsuite/ChangeLog:

PR middle-end/96838
* gcc.dg/Walloc-size-larger-than-19.c: New test.
* gcc.dg/Walloc-size-larger-than-20.c: New test.


If an attacker can control an integer overflow that feeds an
allocation, then they can do all kinds of bad things.  In fact, when
my son was asking me about attack vectors, this is one I said I'd look
at if I were a bad guy.



I'm a bit surprised you can't just query the range of the argument 
and get the overflow status out of that range, but I don't see that 
in the APIs.  How painful would it be to make that part of the API? 
The conceptual model would be to just ask for the range of the 
argument to malloc which would include the range and a status bit 
indicating the computation might have overflowed.


  Do we know if it did/would have wrapped?  Sure, since we have to do
the math.  So you are correct in that the information is there, but is
it useful?


We are in the very annoying habit of subtracting one by adding
0xffffffff, which means you get an overflow for unsigned when you
subtract one.  From what I have seen of unsigned math, we would be
flagging very many operations as overflows, so you would still have
the difficulty of figuring out whether it's a "real" overflow or a fake
one because of the way we do unsigned math.


You and me both :)



At the very start, I did have an overflow flag in the range class... 
but it was turning out to be fairly useless so it was removed.



I agree that being able to evaluate an expression in an as-if
infinite precision (in addition to its type) would be helpful.


So again, we get back to adding 0xffffffff when we are trying to
subtract one...  Now, with infinite precision you are going to see

  [2,10] - 1: we end up with [2,10] + 0xffffffff, which will now give
you [0x100000001, 0x100000009], so it's going to look like it
overflowed?





But just to make sure I understood correctly, let me ask again
using an example:

  void* f (size_t n)
  {
    if (n < PTRDIFF_MAX / 2)
      n = PTRDIFF_MAX / 2;

    return malloc (n * sizeof (int));
  }

Can the unsigned wraparound in the argument be readily detected?

On trunk, this ends up with the following:

  # RANGE [4611686018427387903, 18446744073709551615]
  _6 = MAX_EXPR ;
  # RANGE [0, 18446744073709551615] NONZERO 18446744073709551612
  _1 = _6 * 4;
  ...
  p_5 = mallocD.1206 (_1); [tail call]
  ...
  return p_5;

so _1's range reflects the wraparound in size_t, but _6's range
has enough information to uncover it.  So detecting it is possible
and is done in the patch so we get a warning:

warning: argument 1 range [18446744073709551612, 0x3fffffffffffffffc]
is too large to represent in ‘long unsigned int’
[-Walloc-size-larger-than=]

    6 |   return malloc (n * sizeof (int));
  |  ^

The code is very simplistic and only handles a small subset of cases.
It could be generalized and exposed by a more generic API but it does
seem like the ranger must already have all the logic built into it so
if it isn't exposed now it should be a matter of opening it up.


everything is exposed in range-ops.  well, mostly.
if we have _1 = _6 * 4

if one wanted to do that in infinite precision, you query the range for
_6, and the range for 4 (which would be [4,4] :-)

range_of_expr (op1r, _6, stmt)
range_of_expr (op2r, 4, stmt)

you could take their current types, and cast those ranges to whatever
the next higher precision is,

range_cast  (op1r, highertype)
range_cast (op2r, highertype)
then invoke the operation on those parameters

gimple_range_fold (r, stmt,  op1r, op2r)

and that will do your operation in the higher precision.  You could
compare that to the value in regular precision too, I suppose.
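
Putting those steps together, a rough sketch using the calls exactly as named
above (untested; the range_query instance rq, the gassign *stmt for
_1 = _6 * 4, and the wider type wide_type are assumed to come from the
surrounding code):

  int_range_max op1r, op2r, wide_res;
  rq.range_of_expr (op1r, gimple_assign_rhs1 (stmt), stmt);  // range of _6
  rq.range_of_expr (op2r, gimple_assign_rhs2 (stmt), stmt);  // range of 4, i.e. [4,4]
  range_cast (op1r, wide_type);                              // widen both operands
  range_cast (op2r, wide_type);
  gimple_range_fold (wide_res, stmt, op1r, op2r);            // fold in the wider type
  // Comparing wide_res with the range computed in the original type shows
  // whether the operation could have wrapped.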


The patch does pretty much exactly what you described, except in
offset_int, and only for a limited set of arithmetic operations.
It figures out if an unsigned expression wrapped simply by checking
to see if the mathematically correct result 

[PATCH v2 1/2] IFN: Implement IFN_VEC_SET for ARRAY_REF with VIEW_CONVERT_EXPR

2020-09-18 Thread Xiong Hu Luo via Gcc-patches
This patch enables transformation from ARRAY_REF(VIEW_CONVERT_EXPR) to
VEC_SET internal function in gimple-isel pass if target supports
vec_set with variable index by checking can_vec_set_var_idx_p.
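
For context, a hypothetical example of the source pattern this targets (not
part of the patch's testsuite): a store to one element of a non-global vector
at a run-time index, which the front end lowers to an ARRAY_REF of a
VIEW_CONVERT_EXPR and which gimple-isel can then rewrite to .VEC_SET when
can_vec_set_var_idx_p holds.

  typedef int v4si __attribute__ ((vector_size (16)));

  v4si
  set_element (v4si u, int i, unsigned n)
  {
    u[n & 3] = i;   /* an ARRAY_REF of a VIEW_CONVERT_EXPR of u in GIMPLE */
    return u;
  }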

gcc/ChangeLog:

2020-09-18  Xionghu Luo  

* gimple-isel.cc (gimple_expand_vec_set_expr): New function.
(gimple_expand_vec_cond_exprs): Call gimple_expand_vec_set_expr.
* internal-fn.c (vec_set_direct): New define.
(expand_vec_set_optab_fn): New function.
(direct_vec_set_optab_supported_p): New define.
* internal-fn.def (VEC_SET): New DEF_INTERNAL_OPTAB_FN.
* optabs.c (can_vec_set_var_idx_p): New function.
* optabs.h (can_vec_set_var_idx_p): New declare.
---
 gcc/gimple-isel.cc  | 116 +++-
 gcc/internal-fn.c   |  36 ++
 gcc/internal-fn.def |   2 +
 gcc/optabs.c|  17 +++
 gcc/optabs.h|   3 ++
 5 files changed, 172 insertions(+), 2 deletions(-)

diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
index b330cf4c20e..bc61e2895be 100644
--- a/gcc/gimple-isel.cc
+++ b/gcc/gimple-isel.cc
@@ -35,6 +35,80 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-cfg.h"
 #include "bitmap.h"
 #include "tree-ssa-dce.h"
+#include "fold-const.h"
+#include "gimple-fold.h"
+#include "memmodel.h"
+#include "optabs.h"
+
+/* Expand all ARRAY_REF(VIEW_CONVERT_EXPR) gimple assignments into calls to
+   internal function based on vector type of selected expansion.
+   i.e.:
+ VIEW_CONVERT_EXPR<int[4]>(u)[_1] = i_4(D);
+   =>
+ _7 = u;
+ _8 = .VEC_SET (_7, i_4(D), _1);
+ u = _8;  */
+
+static gimple *
+gimple_expand_vec_set_expr (gimple_stmt_iterator *gsi)
+{
+  enum tree_code code;
+  gcall *new_stmt = NULL;
+  gassign *ass_stmt = NULL;
+
+  /* Only consider code == GIMPLE_ASSIGN.  */
+  gassign *stmt = dyn_cast <gassign *> (gsi_stmt (*gsi));
+  if (!stmt)
+return NULL;
+
+  code = TREE_CODE (gimple_assign_lhs (stmt));
+  if (code != ARRAY_REF)
+return NULL;
+
+  tree lhs = gimple_assign_lhs (stmt);
+  tree val = gimple_assign_rhs1 (stmt);
+
+  tree type = TREE_TYPE (lhs);
+  tree op0 = TREE_OPERAND (lhs, 0);
+  if (TREE_CODE (op0) == VIEW_CONVERT_EXPR
+  && tree_fits_uhwi_p (TYPE_SIZE (type)))
+{
+  tree pos = TREE_OPERAND (lhs, 1);
+  tree view_op0 = TREE_OPERAND (op0, 0);
+  machine_mode outermode = TYPE_MODE (TREE_TYPE (view_op0));
+  scalar_mode innermode = GET_MODE_INNER (outermode);
+  tree_code code = TREE_CODE (TREE_TYPE(view_op0));
+  if (!is_global_var (view_op0) && code == VECTOR_TYPE
+ && tree_fits_uhwi_p (TYPE_SIZE (TREE_TYPE (view_op0)))
+ && can_vec_set_var_idx_p (code, outermode, innermode,
+   TYPE_MODE (TREE_TYPE (pos))))
+   {
+ location_t loc = gimple_location (stmt);
+ tree var_src = make_ssa_name (TREE_TYPE (view_op0));
+ tree var_dst = make_ssa_name (TREE_TYPE (view_op0));
+
+ ass_stmt = gimple_build_assign (var_src, view_op0);
+ gimple_set_vuse (ass_stmt, gimple_vuse (stmt));
+ gimple_set_location (ass_stmt, loc);
+ gsi_insert_before (gsi, ass_stmt, GSI_SAME_STMT);
+
+ new_stmt
+   = gimple_build_call_internal (IFN_VEC_SET, 3, var_src, val, pos);
+ gimple_call_set_lhs (new_stmt, var_dst);
+ gimple_set_location (new_stmt, loc);
+ gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+
+ ass_stmt = gimple_build_assign (view_op0, var_dst);
+ gimple_set_location (ass_stmt, loc);
+ gsi_insert_before (gsi, ass_stmt, GSI_SAME_STMT);
+
+ gimple_move_vops (ass_stmt, stmt);
+ gsi_remove (gsi, true);
+   }
+}
+
+  return ass_stmt;
+}
 
 /* Expand all VEC_COND_EXPR gimple assignments into calls to internal
function based on type of selected expansion.  */
@@ -187,8 +261,25 @@ gimple_expand_vec_cond_exprs (void)
 {
   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
{
- gimple *g = gimple_expand_vec_cond_expr (&gsi,
-  &vec_cond_ssa_name_uses);
+ gassign *stmt = dyn_cast <gassign *> (gsi_stmt (gsi));
+ if (!stmt)
+   continue;
+
+ enum tree_code code;
+ gimple *g = NULL;
+ code = gimple_assign_rhs_code (stmt);
+ switch (code)
+   {
+   case VEC_COND_EXPR:
+ g = gimple_expand_vec_cond_expr (&gsi, &vec_cond_ssa_name_uses);
+ break;
+   case ARRAY_REF:
+ /*  TODO: generate IFN for vec_extract with variable index.  */
+ break;
+   default:
+ break;
+   }
+
  if (g != NULL)
{
  tree lhs = gimple_assign_lhs (gsi_stmt (gsi));
@@ -204,6 +295,27 @@ gimple_expand_vec_cond_exprs (void)
 
   simple_dce_from_worklist (dce_ssa_names);
 
+  FOR_EACH_BB_FN (bb, cfun)
+{
+  for (gsi = gsi_start_bb (bb); 

[PATCH v2 2/2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-18 Thread Xiong Hu Luo via Gcc-patches
vec_insert accepts 3 arguments, arg0 is input vector, arg1 is the value
to be insert, arg2 is the place to insert arg1 to arg0.  Current expander
generates stxv+stwx+lxv if arg2 is variable instead of constant, which
causes serious store hit load performance issue on Power.  This patch tries
 1) Build VIEW_CONVERT_EXPR for vec_insert (i, v, n) like v[n&3] = i to
unify the gimple code, then expander could use vec_set_optab to expand.
 2) Expand the IFN VEC_SET to fast instructions: lvsl+xxperm+xxsel.
In this way, "vec_insert (i, v, n)" and "v[n&3] = i" won't be expanded too
early in gimple stage if arg2 is variable, avoid generating store hit load
instructions.
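
For reference, a minimal use of the intrinsic in question (hypothetical
example, not taken from the new tests):

  #include <altivec.h>

  /* With a variable (non-constant) index n this used to expand through the
     stack via the store-hit-load sequence shown below; with the patch it can
     expand through .VEC_SET into the lvsl/xxperm/xxsel sequence instead.  */
  vector int
  insert_var (vector int v, int i, unsigned n)
  {
    return vec_insert (i, v, n);
  }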

For Power9 V4SI:
addi 9,1,-16
rldic 6,6,2,60
stxv 34,-16(1)
stwx 5,9,6
lxv 34,-16(1)
=>
addis 9,2,.LC0@toc@ha
addi 9,9,.LC0@toc@l
mtvsrwz 33,5
lxv 32,0(9)
sradi 9,6,2
addze 9,9
sldi 9,9,2
subf 9,9,6
subfic 9,9,3
sldi 9,9,2
subfic 9,9,20
lvsl 13,0,9
xxperm 33,33,45
xxperm 32,32,45
xxsel 34,34,33,32

Though instructions increase from 5 to 15, the performance is improved
60% in typical cases.
Tested with V2DI, V2DF V4SI, V4SF, V8HI, V16QI on Power9-LE and
Power8-BE, bootstrap tested pass.

gcc/ChangeLog:

2020-09-18  Xionghu Luo  

* config/rs6000/altivec.md (altivec_lvsl_reg_2): Rename to
 (altivec_lvsl_reg_2) and extend to SDI mode.
* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
Adjust variable index vec_insert to VIEW_CONVERT_EXPR.
* config/rs6000/rs6000-protos.h (rs6000_expand_vector_set_var):
New declare.
* config/rs6000/rs6000.c (rs6000_expand_vector_set_var):
New function.
* config/rs6000/rs6000.md (FQHS): New mode iterator.
(FD): New mode iterator.
p8_mtvsrwz_v16qi2: New define_insn.
p8_mtvsrd_v16qi2: New define_insn.
* config/rs6000/vector.md: Add register operand2 match for
vec_set index.
* config/rs6000/vsx.md: Call gen_altivec_lvsl_reg_di2.

gcc/testsuite/ChangeLog:

2020-09-18  Xionghu Luo  

* gcc.target/powerpc/pr79251.c: New test.
* gcc.target/powerpc/pr79251-run.c: New test.
* gcc.target/powerpc/pr79251.h: New header.
---
 gcc/config/rs6000/altivec.md  |   4 +-
 gcc/config/rs6000/rs6000-c.c  |  22 ++-
 gcc/config/rs6000/rs6000-protos.h |   1 +
 gcc/config/rs6000/rs6000.c| 146 ++
 gcc/config/rs6000/rs6000.md   |  19 +++
 gcc/config/rs6000/vector.md   |  19 ++-
 gcc/config/rs6000/vsx.md  |   2 +-
 .../gcc.target/powerpc/pr79251-run.c  |  29 
 gcc/testsuite/gcc.target/powerpc/pr79251.c|  15 ++
 gcc/testsuite/gcc.target/powerpc/pr79251.h|  19 +++
 10 files changed, 257 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr79251-run.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr79251.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr79251.h

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 0a2e634d6b0..66b636059a6 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2772,10 +2772,10 @@ (define_expand "altivec_lvsl"
   DONE;
 })
 
-(define_insn "altivec_lvsl_reg"
+(define_insn "altivec_lvsl_reg_2"
   [(set (match_operand:V16QI 0 "altivec_register_operand" "=v")
(unspec:V16QI
-   [(match_operand:DI 1 "gpc_reg_operand" "b")]
+   [(match_operand:SDI 1 "gpc_reg_operand" "b")]
UNSPEC_LVSL_REG))]
   "TARGET_ALTIVEC"
   "lvsl %0,0,%1"
diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index 2fad3d94706..78abe49c833 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -1509,9 +1509,7 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
   tree arg1;
   tree arg2;
   tree arg1_type;
-  tree arg1_inner_type;
   tree decl, stmt;
-  tree innerptrtype;
   machine_mode mode;
 
   /* No second or third arguments. */
@@ -1563,8 +1561,13 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
  return build_call_expr (call, 3, arg1, arg0, arg2);
}
 
-  /* Build *(((arg1_inner_type*)&(vector type){arg1})+arg2) = arg0. */
-  arg1_inner_type = TREE_TYPE (arg1_type);
+  /* Build *(((arg1_inner_type*)&(vector type){arg1})+arg2) = arg0 with
+VIEW_CONVERT_EXPR.  i.e.:
+D.3192 = v1;
+_1 = n & 3;
+VIEW_CONVERT_EXPR<int[4]>(D.3192)[_1] = i;
+v1 = D.3192;
+D.3194 = v1;  */
   if (TYPE_VECTOR_SUBPARTS (arg1_type) == 1)
arg2 = build_int_cst (TREE_TYPE (arg2), 0);
   else
@@ -1593,15 +1596,8 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
  SET_EXPR_LOCATION 

Re: [PATCH] irange_pool class

2020-09-18 Thread Aldy Hernandez via Gcc-patches

On 9/17/20 8:02 PM, Martin Sebor wrote:
> On 9/17/20 4:36 AM, Aldy Hernandez via Gcc-patches wrote:
>> This is the irange storage class.  It is used to allocate the minimum
>> amount of storage needed for a given irange.  Storage is automatically
>> freed at destruction.
>>
>> It is meant for long term storage, as opposed to int_range_max which
>> is meant for intermediate temporary results on the stack.
>>
>> The general gist is:
>>
>>  irange_pool pool;
>>
>>  // Allocate an irange of 5 sub-ranges.
>>  irange *p = pool.allocate (5);
>>
>>  // Allocate an irange of 3 sub-ranges.
>>  irange *q = pool.allocate (3);
>>
>>  // Allocate an irange with as many sub-ranges as are currently
>>  // used in "some_other_range".
>>  irange *r = pool.allocate (some_other_range);
>
> I used this as an opportunity to learn more about the irange classes,
> so I have more to say than this little change alone might justify.
>
>>
>> OK?
>> Aldy
>> ---
>>   gcc/value-range.h | 63 +++
>>   1 file changed, 63 insertions(+)
>>
>> diff --git a/gcc/value-range.h b/gcc/value-range.h
>> index 8497791c7b3..88cb3075bf0 100644
>> --- a/gcc/value-range.h
>> +++ b/gcc/value-range.h
>> @@ -43,6 +43,7 @@ enum value_range_kind
>>
>>   class irange
>>   {
>> +  friend class irange_pool;
>>   public:
>> // In-place setters.
>> void set (tree, tree, value_range_kind = VR_RANGE);
>> @@ -619,4 +620,66 @@ vrp_val_min (const_tree type)
>> return NULL_TREE;
>>   }
>>
>> +// This is the irange storage class.  It is used to allocate the
>> +// minimum amount of storage needed for a given irange.  Storage is
>> +// automatically freed at destruction.
>> +//
>> +// It is meant for long term storage, as opposed to int_range_max
>> +// which is meant for intermediate temporary results on the stack.
>> +
>> +class irange_pool
>> +{
>> +public:
>> +  irange_pool ();
>> +  ~irange_pool ();
>> +  // Return a new range with NUM_PAIRS.
>> +  irange *allocate (unsigned num_pairs);
>> +  // Return a copy of SRC with the minimum amount of sub-ranges needed
>> +  // to represent it.
>> +  irange *allocate (const irange &src);
>> +private:
>> +  struct obstack irange_obstack;
>> +};
>
> Since the class owns a resource, its copy ctor and assignment should
> either transfer it or copy it between instances, or be disabled if
> copying isn't intended.

But that would be silly.  Who would ever think of copying the allocator 
object? :-).  But it makes perfect sense.  I've disabled copy and 
assignment.
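
Something along these lines, using the macro you mention (just a sketch of
the shape, not necessarily the committed form):

  class irange_pool
  {
  public:
    irange_pool ();
    ~irange_pool ();
    irange *allocate (unsigned num_pairs);
    irange *allocate (const irange &src);
  private:
    // The pool owns its obstack, so copying and assignment are disabled.
    DISABLE_COPY_AND_ASSIGN (irange_pool);
    struct obstack irange_obstack;
  };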


>
> I don't know much about the obstack APIs but if it's anything like
> malloc/free or new/delete I'm guessing the class isn't meant to be
> copied or assigned.  If that's correct, I would suggest to make it
> an error by making its copy ctor and copy assignment operator
> private or deleted, e.g., by making use of DISABLE_COPY_AND_ASSIGN.
>
> Otherwise, if the implicitly provided copy ctor and assignment work
> correctly, I'd suggest to add a comment to the class making it clear
> that copying and assignment are in fact safe.
>
>
>> +
>> +inline
>> +irange_pool::irange_pool ()
>> +{
>> +  obstack_init (&irange_obstack);
>> +}
>> +
>> +inline
>> +irange_pool::~irange_pool ()
>> +{
>> +  obstack_free (&irange_obstack, NULL);
>> +}
>> +
>> +// Return a new range with NUM_PAIRS.
>> +
>> +inline irange *
>> +irange_pool::allocate (unsigned num_pairs)
>> +{
>> +  // Never allocate 0 pairs.
>> +  // Don't allocate 1 either, or we get legacy value_range's.
>> +  if (num_pairs < 2)
>> +    num_pairs = 2;
>> +
>> +  struct newir {
>> +    irange range;
>> +    tree mem[1];
>> +  };
>> +  struct newir *r
>> +    = (struct newir *) obstack_alloc (&irange_obstack,
>> +                                      sizeof (struct newir)
>> +                                      + sizeof (tree) * 2 * (num_pairs - 1));
>> +  return new ((irange *) r) irange (&(r->mem[0]), num_pairs);
>
> FWIW, it took me a minute before I understood what this dense code
> does.  Rewriting it like this helped:

Yeah.  We need the newir object to get the alignment right.

>
>size_t nbytes
>  = (sizeof (newir) + sizeof (tree) * 2 * (num_pairs - 1));
>
>struct newir *r
>  = (newir *) obstack_alloc (&irange_obstack, nbytes);
>
>return new (r) irange (r->mem, num_pairs);

Indeed.  I find splitting things up easier to read.  FWIW, the original 
code had it split up, and then it got munged up in one of our many 
iterations.  I blame Andrew :).


>
> The non-member placement new takes a void* argument so the cast from
> r's type to irange* isn't necessary.  &(r->mem[0]) is the same as
> r->mem which also reads a little clearer to me.  In C++, the class-

Same here :).

> id ("struct" or "class") can be omitted when there's no ambiguity.

I always thought common usage was to put the struct, but avoid the 
class.  I'm fine either way though.


>
> The allocated memory is passed to the irange ctor uninitialized but
> the ctor doesn't access it, and (AFAICS) neither do any 

Re: [PATCH] Increase rtx_cost of sse_to_integer in skylake_cost.

2020-09-18 Thread Hongtao Liu via Gcc-patches
On Thu, Sep 17, 2020 at 12:10 PM Jeff Law  wrote:
>
>
> On 9/15/20 9:20 PM, Hongtao Liu via Gcc-patches wrote:
> > Hi:
> >   The rtx cost of sse_to_integer is used by pass_stv as a
> > measurement for the scalar-to-vector transformation. As
> > https://gcc.gnu.org/pipermail/gcc-patches/2019-August/528839.html
> > indicates, movement between sse regs and gprs should be much more expensive
> > than movement inside gprs (which is 2 by default). This patch would
> > also fix "pr96861".
> >
> >   Bootstrap is ok, regression test is ok for both "i386.exp=*
> > --target_board='unix{-m32,}'" and "i386.exp=*
> > --target_board='unix{-m32\ -march=cascadelake,-m64\
> > -march=cascadelake}"".
> >   No big impact on SPEC2017.
> >   Ok for trunk?
> >
> > gcc/ChangeLog
> >
> > PR target/96861
> > * config/i386/x86-tune-costs.h (skylake_cost): increase rtx
> > cost of sse_to_integer from 2 to 6.
> >
> > gcc/testsuite
> >
> > * gcc.target/i386/pr95021-3.c: Add -mtune=generic.
>
> I'll defer to HJ's judgement here.  If he's OK with it, then it's fine
> by me.
>

I've talked to H.J. and he agrees with this patch.
So I'm going to check in this patch.

>
> jeff
>
>


-- 
BR,
Hongtao