Re: [RFC] expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]

2023-09-28 Thread Vineet Gupta



On 9/28/23 20:17, Jeff Law wrote:
I can bootstrap & regression test alpha using QEMU user mode 
emulation. So we might be able to trigger something that way. It'll 
take some time, but might prove fruitful. 


That would be awesome. It's not like this is burning or anything; 
it's one of the things in the long tail of work we need to do in this area.


Thx,
-Vineet


Re: [RFC] expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]

2023-09-28 Thread Jeff Law




On 9/28/23 15:43, Vineet Gupta wrote:

RISC-V suffers from extraneous sign extensions, despite/given the ABI
guarantee that 32-bit quantities are sign-extended into 64-bit registers,
meaning incoming SI function args need not be explicitly sign extended
(the same goes for SI return values, as most ALU insns implicitly sign-extend too.)

Existing REE doesn't seem to handle this well and there are various ideas
floating around to smarten REE about it.

RISC-V also seems to correctly implement middle-end hook PROMOTE_MODE
etc.

Another approach would be to prevent EXPAND from generating the
sign_extend in the first place which this patch tries to do.

The hunk being removed was introduced way back in 1994 as
5069803972 ("expand_expr, case CONVERT_EXPR .. clear the promotion flag")

This survived a full testsuite run for RISC-V rv64gc with, surprisingly, no
fallout: test results before/after are exactly the same.

|                                | # of unexpected case / # of unique unexpected case
|                                |       gcc |       g++ |  gfortran |
| rv64imafdc_zba_zbb_zbs_zicond/ |  264 / 87 |    5 /  2 |   72 / 12 |
|    lp64d/medlow                |

Granted for something so old to have survived, there must be a valid
reason. Unfortunately the original change didn't have additional
commentary or a test case. That is not to say it can't/won't possibly
break things on other arches/ABIs, hence the RFC for someone to scream
that this is just bonkers, don't do this :-)

I've explicitly CC'ed Jakub and Roger who have last touched subreg
promoted notes in expr.cc for insight and/or screaming ;-)

Thanks to Robin for narrowing this down in an amazing debugging session
@ GNU Cauldron.

So I scoured my old gcc2 archives to see if there was anything that 
might hint as to why this was changed.  Sadly (but not unexpectedly), 
nothing.  The relevant ChangeLog entry is:




Fri Jul  8 11:46:50 1994  Richard Kenner  (ken...@vlsi1.ultra.nyu.edu)

* varasm.c (record_constant_rtx, force_const_mem): Ensure everything
is in saveable_obstack, not current_obstack.

* combine.c (force_to_mode): OP_MODE must be MODE if MODE and
mode of X are of different classes.
(nonzero_bits, num_sign_bit_copies): Say nothing known for
floating-point modes.

* function.c (instantiate_virtual_regs_1, case SET):
If DEST is virtual_stack_vars_rtx, replace with hardware
frame pointer.

* expr.c (expand_expr, case CONVERT_EXPR): If changing signedness
and we have a promoted SUBREG, clear the promotion flag.

* c-decl.c (finish_decl): Put RTL and other stuff in
permanent_obstack if DECL is.

* combine.c (gen_unary): Add new arg, OP0_MODE.
All callers changed.


So standard practice back then was to re-use the header and have a blank 
line between conceptual changes if the same author made a series of 
changes.  So it's reasonable to assume the expr.c change was considered 
independent of the other changes.


At that particular time I think Kenner was mostly focused on the alpha 
and ppc ports, but I think he was also still poking around with romp and 
a29k.  I think romp is an unlikely target for this because it didn't 
promote modes and it wasn't even building for several months 
(April->late July).


PPC and a29k were both 32-bit ports, and while they did promotions, I 
would hazard a guess the alpha was actually more sensitive to this 
stuff.  Which suggests a possible path forward.


I can bootstrap & regression test alpha using QEMU user mode emulation. 
So we might be able to trigger something that way.  It'll take some 
time, but might prove fruitful.



Jeff


Re: [PATCH] libstdc++: Ensure active union member is correctly set

2023-09-28 Thread Nathaniel Shead
On Wed, Sep 27, 2023 at 03:13:35PM +0100, Jonathan Wakely wrote:
> On Sat, 23 Sept 2023 at 08:30, Nathaniel Shead via Libstdc++
>  wrote:
> >
> > On Sat, Sep 23, 2023 at 07:40:48AM +0100, Jonathan Wakely wrote:
> > > On Sat, 23 Sept 2023, 01:39 Nathaniel Shead via Libstdc++, <
> > > libstd...@gcc.gnu.org> wrote:
> > >
> > > > Now that bootstrap has finished, I have gotten regressions in the
> > > > following libstdc++ tests:
> > > >
> > > > Running libstdc++:libstdc++-dg/conformance.exp ...
> > > > FAIL: 20_util/bitset/access/constexpr.cc -std=gnu++23 (test for excess
> > > > errors)
> > > > FAIL: 20_util/bitset/access/constexpr.cc -std=gnu++26 (test for excess
> > > > errors)
> > > > FAIL: 20_util/variant/constexpr.cc -std=gnu++20 (test for excess errors)
> > > > FAIL: 20_util/variant/constexpr.cc -std=gnu++26 (test for excess errors)
> > > > FAIL: 21_strings/basic_string/cons/char/constexpr.cc -std=gnu++20 (test
> > > > for excess errors)
> > > > FAIL: 21_strings/basic_string/cons/char/constexpr.cc -std=gnu++26 (test
> > > > for excess errors)
> > > > FAIL: 21_strings/basic_string/cons/wchar_t/constexpr.cc -std=gnu++20 
> > > > (test
> > > > for excess errors)
> > > > FAIL: 21_strings/basic_string/cons/wchar_t/constexpr.cc -std=gnu++26 
> > > > (test
> > > > for excess errors)
> > > > FAIL: 21_strings/basic_string/modifiers/swap/constexpr-wchar_t.cc
> > > > -std=gnu++20 (test for excess errors)
> > > > FAIL: 21_strings/basic_string/modifiers/swap/constexpr-wchar_t.cc
> > > > -std=gnu++26 (test for excess errors)
> > > > FAIL: 21_strings/basic_string/modifiers/swap/constexpr.cc -std=gnu++20
> > > > (test for excess errors)
> > > > FAIL: 21_strings/basic_string/modifiers/swap/constexpr.cc -std=gnu++26
> > > > (test for excess errors)
> > > > FAIL: std/ranges/adaptors/join_with/1.cc -std=gnu++23 (test for excess
> > > > errors)
> > > > UNRESOLVED: std/ranges/adaptors/join_with/1.cc -std=gnu++23 compilation
> > > > failed to produce executable
> > > > FAIL: std/ranges/adaptors/join_with/1.cc -std=gnu++26 (test for excess
> > > > errors)
> > > > UNRESOLVED: std/ranges/adaptors/join_with/1.cc -std=gnu++26 compilation
> > > > failed to produce executable
> > > >
> > > > On investigation though it looks like the issue might be with libstdc++
> > > > rather than the patch itself; running the failing tests using clang with
> > > > libstdc++ also produces similar errors, and my reading of the code
> > > > suggests that this is correct.
> > > >
> > > > What's the way forward here? Should I look at creating a patch to fix
> > > > the libstdc++ issues before resubmitting this patch for the C++
> > > > frontend? Or should I submit a version of this patch without the
> > > > `std::construct_at` changes and wait till libstdc++ gets fixed for that?
> > > >
> > >
> > > I think we should fix libstdc++. There are probably only a few places that
> > > need a fix, which cause all those failures.
> > >
> > > I can help with those fixes. I'll look into it after the weekend.
> > >
> >
> > Thanks. I did end up getting a chance to look at it earlier today, and
> > with the following patch I had no regressions when applying the frontend
> > changes. Bootstrapped and regtested on x86_64-pc-linux-gnu.
> >
> > -- >8 --
> >
> > This patch ensures that the union members for std::string and
> > std::variant are always properly set when a change occurs.
> >
> > libstdc++-v3/ChangeLog:
> >
> > * include/bits/basic_string.h: (basic_string(basic_string&&)):
> > Activate _M_local_buf when needed.
> > (basic_string(basic_string&&, const _Alloc&)): Likewise.
> > * include/bits/basic_string.tcc: (basic_string::swap): Likewise.
> > * include/std/variant: (__detail::__variant::__construct_n): New.
> > (__detail::_variant::__emplace): Use __construct_n.
> >
> > Signed-off-by: Nathaniel Shead 
> > ---
> >  libstdc++-v3/include/bits/basic_string.h   |  7 +++--
> >  libstdc++-v3/include/bits/basic_string.tcc |  8 +++---
> >  libstdc++-v3/include/std/variant   | 32 --
> >  3 files changed, 38 insertions(+), 9 deletions(-)
> >
> > diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
> > index 09fd62afa66..7c342879827 100644
> > --- a/libstdc++-v3/include/bits/basic_string.h
> > +++ b/libstdc++-v3/include/bits/basic_string.h
> > @@ -678,7 +678,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
> >{
> > if (__str._M_is_local())
> >   {
> > -   traits_type::copy(_M_local_buf, __str._M_local_buf,
> > +   traits_type::copy(_M_use_local_data(), __str._M_local_buf,
> >   __str.length() + 1);
> >   }
> > else
> > @@ -691,7 +691,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
> > // basic_stringbuf relies on writing into unallocated capacity so
> > // we mess up the contents if we put a '\0' in the string.
> > _M_length(__str.length());
> > -   

[RFC] expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]

2023-09-28 Thread Vineet Gupta
RISC-V suffers from extraneous sign extensions, despite/given the ABI
guarantee that 32-bit quantities are sign-extended into 64-bit registers,
meaning incoming SI function args need not be explicitly sign extended
(the same goes for SI return values, as most ALU insns implicitly sign-extend too.)

Existing REE doesn't seem to handle this well and there are various ideas
floating around to smarten REE about it.

RISC-V also seems to correctly implement middle-end hook PROMOTE_MODE
etc.

Another approach would be to prevent EXPAND from generating the
sign_extend in the first place which this patch tries to do.

The hunk being removed was introduced way back in 1994 as
   5069803972 ("expand_expr, case CONVERT_EXPR .. clear the promotion flag")

This survived a full testsuite run for RISC-V rv64gc with, surprisingly, no
fallout: test results before/after are exactly the same.

|                                | # of unexpected case / # of unique unexpected case
|                                |       gcc |       g++ |  gfortran |
| rv64imafdc_zba_zbb_zbs_zicond/ |  264 / 87 |    5 /  2 |   72 / 12 |
|    lp64d/medlow                |

Granted for something so old to have survived, there must be a valid
reason. Unfortunately the original change didn't have additional
commentary or a test case. That is not to say it can't/won't possibly
break things on other arches/ABIs, hence the RFC for someone to scream
that this is just bonkers, don't do this :-)

I've explicitly CC'ed Jakub and Roger who have last touched subreg
promoted notes in expr.cc for insight and/or screaming ;-)

Thanks to Robin for narrowing this down in an amazing debugging session
@ GNU Cauldron.

```
foo2:
        sext.w  a6,a1 <-- this goes away
        beq     a1,zero,.L4
        li      a5,0
        li      a0,0
.L3:
        addw    a4,a2,a5
        addw    a5,a3,a5
        addw    a0,a4,a0
        bltu    a5,a6,.L3
        ret
.L4:
        li      a0,0
        ret
```
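
As a rough illustration of why the sext.w above looks redundant, here is a
small Python model (not RTL, and not anything taken from GCC). It assumes
the ABI guarantee described earlier: a 32-bit value always sits in a 64-bit
register in sign-extended form. The helper names sext_w, ltu64 and demo are
made up for this sketch.

```python
MASK64 = (1 << 64) - 1


def sext_w(reg):
    """Model of sext.w: sign-extend the low 32 bits of reg to 64 bits."""
    low = reg & 0xFFFFFFFF
    if low & 0x80000000:
        low |= 0xFFFFFFFF00000000
    return low & MASK64


def ltu64(a, b):
    """Model of the bltu condition: unsigned 64-bit less-than."""
    return (a & MASK64) < (b & MASK64)


def demo():
    samples = [0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFE, 0xFFFFFFFF]
    for n in samples:
        reg = sext_w(n)  # how a 32-bit argument arrives under the assumed ABI
        # Extending an already sign-extended register changes nothing.
        assert sext_w(reg) == reg
    for x in samples:
        for n in samples:
            # An unsigned 32-bit compare agrees with a 64-bit unsigned
            # compare of the two sign-extended registers.
            assert (x < n) == ltu64(sext_w(x), sext_w(n))
    return True
```

Under that assumption the second extension is a no-op, and the unsigned
compare in the loop gives the same answer either way. It says nothing about
whether other targets or ABIs rely on the flag being cleared.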

Signed-off-by: Vineet Gupta 
Co-developed-by: Robin Dapp 
---
 gcc/expr.cc   |  7 ---
 gcc/testsuite/gcc.target/riscv/pr111466.c | 15 +++
 2 files changed, 15 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr111466.c

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 308ddc09e631..d259c6e53385 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -9332,13 +9332,6 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
  op0 = expand_expr (treeop0, target, VOIDmode,
 modifier);
 
- /* If the signedness of the conversion differs and OP0 is
-a promoted SUBREG, clear that indication since we now
-have to do the proper extension.  */
- if (TYPE_UNSIGNED (TREE_TYPE (treeop0)) != unsignedp
- && GET_CODE (op0) == SUBREG)
-   SUBREG_PROMOTED_VAR_P (op0) = 0;
-
  return REDUCE_BIT_FIELD (op0);
}
 
diff --git a/gcc/testsuite/gcc.target/riscv/pr111466.c b/gcc/testsuite/gcc.target/riscv/pr111466.c
new file mode 100644
index ..007792466a51
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr111466.c
@@ -0,0 +1,15 @@
+/* Simplified variant of gcc.target/riscv/zba-adduw.c.  */
+
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zba_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+int foo2(int unused, int n, unsigned y, unsigned delta){
+  int s = 0;
+  unsigned int x = 0;
+  for (;x

Re: [PATCH 2/7] libstdc++: Use gdb.ValuePrinter base class

2023-09-28 Thread Jonathan Wakely
On Thu, 28 Sept 2023, 21:38 Tom Tromey,  wrote:

> Jonathan> I've pushed the changes I wanted to make, so you'll have to rebase
> Jonathan> your patches now, sorry.
>
> No problem.  I rebased & re-tested them.
> I can send a v2 if you want to double-check (only this large patch
> required any changes), or just go ahead.  Let me know.
>

Just go ahead, the changes are all straightforward so if the tests still
pass, you can push it.


> I may not be able to push until Monday.
>
> Tom
>


Re: [PATCH 2/7] libstdc++: Use gdb.ValuePrinter base class

2023-09-28 Thread Tom Tromey
Jonathan> I've pushed the changes I wanted to make, so you'll have to rebase
Jonathan> your patches now, sorry.

No problem.  I rebased & re-tested them.
I can send a v2 if you want to double-check (only this large patch
required any changes), or just go ahead.  Let me know.
I may not be able to push until Monday.

Tom


Re: [PATCH 2/7] libstdc++: Use gdb.ValuePrinter base class

2023-09-28 Thread Jonathan Wakely
On Thu, 28 Sept 2023 at 18:50, Tom Tromey via Libstdc++
 wrote:
>
> GDB 14 will add a new ValuePrinter tag class that will be used to
> signal that pretty-printers will agree to the "extension protocol" --
> essentially that they will follow some simple namespace rules, so that
> GDB can add new methods over time.
>
> A couple new methods have already been added to GDB, to support DAP.
> While I haven't implemented these for any libstdc++ printers yet, this
> patch makes the basic conversion: printers derive from
> gdb.ValuePrinter if it is available, and all "non-standard" (that is,
> not specified by GDB) members of the various value-printing classes
> are renamed to have a leading underscore.
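
The conversion described above can be tried outside GDB with a stand-in for
the gdb module. This is only a sketch: FakeValuePrinter, pick_printer_base,
make_printer and DemoPrinter are hypothetical names for illustration, and
only the hasattr check mirrors what the patch does.

```python
import types


def pick_printer_base(gdb_module):
    """Return gdb_module.ValuePrinter if it exists, otherwise object."""
    if hasattr(gdb_module, 'ValuePrinter'):
        return gdb_module.ValuePrinter
    return object


# Stand-ins for an older and a newer gdb module (hypothetical).
class FakeValuePrinter(object):
    pass


old_gdb = types.SimpleNamespace()  # no ValuePrinter attribute
new_gdb = types.SimpleNamespace(ValuePrinter=FakeValuePrinter)


def make_printer(base):
    class DemoPrinter(base):
        def __init__(self, val):
            self._val = val  # not specified by GDB, so leading underscore

        def to_string(self):  # specified by GDB, so no underscore
            return 'demo(%s)' % self._val
    return DemoPrinter


OldPrinter = make_printer(pick_printer_base(old_gdb))
NewPrinter = make_printer(pick_printer_base(new_gdb))
```

Both classes behave the same when printing; only NewPrinter carries the tag
base class.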

OK, thanks.

I've pushed the changes I wanted to make, so you'll have to rebase
your patches now, sorry.


> ---
>  libstdc++-v3/python/libstdcxx/v6/printers.py | 1201 +-
>  1 file changed, 605 insertions(+), 596 deletions(-)
>
> diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
> index d60c8003a63..bbc4375541f 100644
> --- a/libstdc++-v3/python/libstdcxx/v6/printers.py
> +++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
> @@ -97,6 +97,12 @@ try:
>  except ImportError:
>      pass
>
> +# Use the base class if available.
> +if hasattr(gdb, 'ValuePrinter'):
> +    printer_base = gdb.ValuePrinter
> +else:
> +    printer_base = object
> +
>  # Starting with the type ORIG, search for the member type NAME.  This
>  # handles searching upward through superclasses.  This is needed to
>  # work around http://sourceware.org/bugzilla/show_bug.cgi?id=13615.
> @@ -241,43 +247,43 @@ class SmartPtrIterator(Iterator):
>  "An iterator for smart pointer types with a single 'child' value"
>
>  def __init__(self, val):
> -self.val = val
> +self._val = val
>
>  def __iter__(self):
>  return self
>
>  def __next__(self):
> -if self.val is None:
> +if self._val is None:
>  raise StopIteration
> -self.val, val = None, self.val
> +self._val, val = None, self._val
>  return ('get()', val)
>
>
> -class SharedPointerPrinter:
> +class SharedPointerPrinter(printer_base):
>  "Print a shared_ptr, weak_ptr, atomic, or atomic"
>
>  def __init__(self, typename, val):
> -self.typename = strip_versioned_namespace(typename)
> -self.val = val
> -self.pointer = val['_M_ptr']
> +self._typename = strip_versioned_namespace(typename)
> +self._val = val
> +self._pointer = val['_M_ptr']
>
>  def children(self):
> -return SmartPtrIterator(self.pointer)
> +return SmartPtrIterator(self._pointer)
>
>  # Return the _Sp_counted_base<>* that holds the refcounts.
>  def _get_refcounts(self):
> -if self.typename == 'std::atomic':
> +if self._typename == 'std::atomic':
>  # A tagged pointer is stored as uintptr_t.
> -ptr_val = self.val['_M_refcount']['_M_val']['_M_i']
> +ptr_val = self._val['_M_refcount']['_M_val']['_M_i']
>  ptr_val = ptr_val - (ptr_val % 2)  # clear lock bit
> -ptr_type = find_type(self.val['_M_refcount'].type, 'pointer')
> +ptr_type = find_type(self._val['_M_refcount'].type, 'pointer')
>  return ptr_val.cast(ptr_type)
> -return self.val['_M_refcount']['_M_pi']
> +return self._val['_M_refcount']['_M_pi']
>
>  def to_string(self):
>  state = 'empty'
>  refcounts = self._get_refcounts()
> -targ = self.val.type.template_argument(0)
> +targ = self._val.type.template_argument(0)
>  targ = strip_versioned_namespace(str(targ))
>
>  if refcounts != 0:
> @@ -288,7 +294,7 @@ class SharedPointerPrinter:
>  else:
>  state = 'use count %d, weak count %d' % (
>  usecount, weakcount - 1)
> -return '%s<%s> (%s)' % (self.typename, targ, state)
> +return '%s<%s> (%s)' % (self._typename, targ, state)
>
>
>  def _tuple_impl_get(val):
> @@ -347,17 +353,17 @@ def unique_ptr_get(val):
>  return tuple_get(0, tuple_member)
>
>
> -class UniquePointerPrinter:
> +class UniquePointerPrinter(printer_base):
>  "Print a unique_ptr"
>
>  def __init__(self, typename, val):
> -self.val = val
> +self._val = val
>
>  def children(self):
> -return SmartPtrIterator(unique_ptr_get(self.val))
> +return SmartPtrIterator(unique_ptr_get(self._val))
>
>  def to_string(self):
> -return ('std::unique_ptr<%s>' % 
> (str(self.val.type.template_argument(0
> +return ('std::unique_ptr<%s>' % 
> (str(self._val.type.template_argument(0
>
>
>  def get_value_from_aligned_membuf(buf, valtype):
> @@ -381,55 +387,56 @@ def get_value_from_list_node(node):
>  raise ValueError("Unsupported implementation for %s" % str(node.type))
>
>
> -class StdListPrinter:

[committed] libstdc++: Refactor Python Xmethods to use is_specialization_of

2023-09-28 Thread Jonathan Wakely
Tested x86_64-linux (GDB 13.2, Python 3.11). Pushed to trunk.

-- >8 --

This copies the is_specialization_of function from printers.py (with
slight modification for versioned namespace handling) and reuses it in
xmethods.py to replace repetitive re.match calls in every class.

This fixes the problem that the regular expressions used \d without
escaping the backslash properly.
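
For experimenting with the matching logic without a GDB session, a
string-only variant can be sketched as follows. It is an assumption-laden
simplification: it takes the type name as a plain string instead of a
gdb.Type and leaves out the _versioned_namespace handling.

```python
import re


def is_specialization_of(type_name, template_name):
    """Return True if type_name looks like std::template_name<...>.

    String-only sketch: no gdb.Type support and no versioned-namespace
    adjustment of template_name.
    """
    pattern = r'^std::(__\d::)?%s<.*>$' % template_name
    return re.match(pattern, type_name) is not None
```

The template name is spliced into the pattern as-is, which is what lets a
caller pass '(__cxx11::)?list' to accept both spellings of std::list.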

libstdc++-v3/ChangeLog:

* python/libstdcxx/v6/xmethods.py (is_specialization_of): Define
new function.
(ArrayMethodsMatcher, DequeMethodsMatcher)
(ForwardListMethodsMatcher, ListMethodsMatcher)
(VectorMethodsMatcher, AssociativeContainerMethodsMatcher)
(UniquePtrGetWorker, UniquePtrMethodsMatcher)
(SharedPtrSubscriptWorker, SharedPtrMethodsMatcher): Use
is_specialization_of instead of re.match.
---
 libstdc++-v3/python/libstdcxx/v6/xmethods.py | 36 +---
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/libstdc++-v3/python/libstdcxx/v6/xmethods.py b/libstdc++-v3/python/libstdcxx/v6/xmethods.py
index ef0a6e3cef3..844c8a2105a 100644
--- a/libstdc++-v3/python/libstdcxx/v6/xmethods.py
+++ b/libstdc++-v3/python/libstdcxx/v6/xmethods.py
@@ -25,10 +25,21 @@ matcher_name_prefix = 'libstdc++::'
 def get_bool_type():
     return gdb.lookup_type('bool')
 
-
 def get_std_size_type():
     return gdb.lookup_type('std::size_t')
 
+def is_specialization_of(x, template_name):
+    """
+    Test whether a type is a specialization of the named class template.
+    The type can be specified as a string or a gdb.Type object.
+    The template should be the name of a class template as a string,
+    without any 'std' qualification.
+    """
+    if isinstance(x, gdb.Type):
+        x = x.tag
+    if _versioned_namespace:
+        template_name = '(%s)?%s' % (_versioned_namespace, template_name)
+    return re.match(r'^std::(__\d::)?%s<.*>$' % template_name, x) is not None
 
 class LibStdCxxXMethod(gdb.xmethod.XMethod):
 def __init__(self, name, worker_class):
@@ -159,7 +170,7 @@ class ArrayMethodsMatcher(gdb.xmethod.XMethodMatcher):
 self.methods = [self._method_dict[m] for m in self._method_dict]
 
 def match(self, class_type, method_name):
-if not re.match('^std::(__\d+::)?array<.*>$', class_type.tag):
+if not is_specialization_of(class_type, 'array'):
 return None
 method = self._method_dict.get(method_name)
 if method is None or not method.enabled:
@@ -171,6 +182,7 @@ class ArrayMethodsMatcher(gdb.xmethod.XMethodMatcher):
 return None
 return method.worker_class(value_type, size)
 
+
 # Xmethods for std::deque
 
 
@@ -284,7 +296,7 @@ class DequeMethodsMatcher(gdb.xmethod.XMethodMatcher):
 self.methods = [self._method_dict[m] for m in self._method_dict]
 
 def match(self, class_type, method_name):
-if not re.match('^std::(__\d+::)?deque<.*>$', class_type.tag):
+if not is_specialization_of(class_type, 'deque'):
 return None
 method = self._method_dict.get(method_name)
 if method is None or not method.enabled:
@@ -332,7 +344,7 @@ class ForwardListMethodsMatcher(gdb.xmethod.XMethodMatcher):
 self.methods = [self._method_dict[m] for m in self._method_dict]
 
 def match(self, class_type, method_name):
-if not re.match('^std::(__\d+::)?forward_list<.*>$', class_type.tag):
+if not is_specialization_of(class_type, 'forward_list'):
 return None
 method = self._method_dict.get(method_name)
 if method is None or not method.enabled:
@@ -419,7 +431,7 @@ class ListMethodsMatcher(gdb.xmethod.XMethodMatcher):
 self.methods = [self._method_dict[m] for m in self._method_dict]
 
 def match(self, class_type, method_name):
-if not re.match('^std::(__\d+::)?(__cxx11::)?list<.*>$', class_type.tag):
+if not is_specialization_of(class_type, '(__cxx11::)?list'):
 return None
 method = self._method_dict.get(method_name)
 if method is None or not method.enabled:
@@ -542,7 +554,7 @@ class VectorMethodsMatcher(gdb.xmethod.XMethodMatcher):
 self.methods = [self._method_dict[m] for m in self._method_dict]
 
 def match(self, class_type, method_name):
-if not re.match('^std::(__\d+::)?vector<.*>$', class_type.tag):
+if not is_specialization_of(class_type, 'vector'):
 return None
 method = self._method_dict.get(method_name)
 if method is None or not method.enabled:
@@ -595,7 +607,7 @@ class AssociativeContainerMethodsMatcher(gdb.xmethod.XMethodMatcher):
 self.methods = [self._method_dict[m] for m in self._method_dict]
 
 def match(self, class_type, method_name):
-if not re.match('^std::(__\d+::)?%s<.*>$' % self._name, class_type.tag):
+if not is_specialization_of(class_type, self._name):
 return None
 method = 

[committed] libstdc++: Reformat Python code

2023-09-28 Thread Jonathan Wakely
Tested x86_64-linux (GDB 13.2, Python 3.11). Pushed to trunk.

-- >8 --

Some of these changes were suggested by autopep8's --aggressive
option, others are for readability.

Break long lines by splitting strings across multiple lines, or
introducing local variables to hold results.

Use raw strings for regular expressions, so that backslashes don't need
to be escaped.
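
A quick way to check that switching to raw strings does not change the
pattern itself (a sketch; the pattern is borrowed from the xmethods code and
the sample type name is made up):

```python
import re

# Ordinary literal: the backslash has to be doubled.
escaped = '^std::(__\\d+::)?array<.*>$'
# Raw literal: the backslash is written once.
raw = r'^std::(__\d+::)?array<.*>$'

same_pattern = (escaped == raw)
matches = re.match(raw, 'std::__8::array<int, 4>') is not None
```

Both spellings produce the same string, so the change is purely about
readability and avoiding Python's warning for unrecognised escapes such as
a lone \d in an ordinary literal.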

libstdc++-v3/ChangeLog:

* python/libstdcxx/v6/printers.py: Break long lines. Use raw
strings for regular expressions. Add whitespace around
operators.
(is_member_of_namespace): Use isinstance to check type.
(is_specialization_of): Likewise. Adjust template_name
for versioned namespace instead of duplicating the re.match
call.
(StdExpAnyPrinter._string_types): New static method.
(StdExpAnyPrinter.to_string): Use _string_types.
---
 libstdc++-v3/python/libstdcxx/v6/printers.py | 122 ---
 1 file changed, 75 insertions(+), 47 deletions(-)

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index 7889235ce1c..3f22ba23452 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -145,7 +145,6 @@ def lookup_templ_spec(templ, *args):
 
 # Use this to find container node types instead of find_type,
 # see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91997 for details.
-
 def lookup_node_type(nodename, containertype):
 """
 Lookup specialization of template nodename corresponding to containertype.
@@ -188,7 +187,7 @@ def is_member_of_namespace(typ, *namespaces):
 Test whether a type is a member of one of the specified namespaces.
 The type can be specified as a string or a gdb.Type object.
 """
-if type(typ) is gdb.Type:
+if isinstance(typ, gdb.Type):
 typ = str(typ)
 typ = strip_versioned_namespace(typ)
 for namespace in namespaces:
@@ -205,10 +204,10 @@ def is_specialization_of(x, template_name):
 without any 'std' qualification.
 """
 global _versioned_namespace
-if type(x) is gdb.Type:
+if isinstance(x, gdb.Type):
 x = x.tag
 if _versioned_namespace:
-return re.match('^std::(%s)?%s<.*>$' % (_versioned_namespace, template_name), x) is not None
+template_name = '(%s)?%s' % (_versioned_namespace, template_name)
 return re.match('^std::%s<.*>$' % template_name, x) is not None
 
 
@@ -225,9 +224,9 @@ def strip_inline_namespaces(type_str):
 type_str = type_str.replace('std::__cxx11::', 'std::')
 expt_ns = 'std::experimental::'
 for lfts_ns in ('fundamentals_v1', 'fundamentals_v2'):
-type_str = type_str.replace(expt_ns+lfts_ns+'::', expt_ns)
+type_str = type_str.replace(expt_ns + lfts_ns + '::', expt_ns)
 fs_ns = expt_ns + 'filesystem::'
-type_str = type_str.replace(fs_ns+'v1::', fs_ns)
+type_str = type_str.replace(fs_ns + 'v1::', fs_ns)
 return type_str
 
 
@@ -365,7 +364,8 @@ class UniquePointerPrinter:
 return SmartPtrIterator(unique_ptr_get(self.val))
 
 def to_string(self):
-return ('std::unique_ptr<%s>' % 
(str(self.val.type.template_argument(0
+t = self.val.type.template_argument(0)
+return 'std::unique_ptr<{}>'.format(str(t))
 
 
 def get_value_from_aligned_membuf(buf, valtype):
@@ -597,7 +597,8 @@ class StdBitIteratorPrinter:
 def to_string(self):
 if not self.val['_M_p']:
 return 'non-dereferenceable iterator for std::vector'
-return bool(self.val['_M_p'].dereference() & (1 << self.val['_M_offset']))
+return bool(self.val['_M_p'].dereference()
+& (1 << self.val['_M_offset']))
 
 
 class StdBitReferencePrinter:
@@ -1087,9 +1088,10 @@ class StdStringStreamPrinter:
 self.val = val
 self.typename = typename
 
-# Check if the stream was redirected:
-# This is essentially: val['_M_streambuf'] == val['_M_stringbuf'].address
-# However, GDB can't resolve the virtual inheritance, so we do that manually
+# Check if the stream was redirected. This is essentially:
+# val['_M_streambuf'] != val['_M_stringbuf'].address
+# However, GDB can't resolve the virtual inheritance, so we do that
+# manually.
 basetype = [f.type for f in val.type.fields() if f.is_base_class][0]
 gdb.set_convenience_variable('__stream', val.cast(basetype).address)
 self.streambuf = gdb.parse_and_eval('$__stream->rdbuf()')
@@ -1097,7 +1099,8 @@ class StdStringStreamPrinter:
 
 def to_string(self):
 if self.was_redirected:
-return "%s redirected to %s" % (self.typename, self.streambuf.dereference())
+return "%s redirected to %s" % (
+self.typename, self.streambuf.dereference())
 return self.val['_M_stringbuf']
 
 def display_hint(self):
@@ -1309,8 +1312,9 @@ class 

[committed] libstdc++: Format Python docstrings according to PEP 257

2023-09-28 Thread Jonathan Wakely
Tested x86_64-linux (GDB 13.2, Python 3.11). Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* python/libstdcxx/v6/printers.py: Format docstrings according
to PEP 257.
* python/libstdcxx/v6/xmethods.py: Likewise.
---
 libstdc++-v3/python/libstdcxx/v6/printers.py | 177 ++-
 libstdc++-v3/python/libstdcxx/v6/xmethods.py |  32 ++--
 2 files changed, 112 insertions(+), 97 deletions(-)

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index d60c8003a63..7889235ce1c 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -69,7 +69,7 @@ else:
 
 # Python 2 does not provide the datetime.UTC singleton.
 class UTC(datetime.tzinfo):
-"""Concrete tzinfo class representing the UTC time zone"""
+"""Concrete tzinfo class representing the UTC time zone."""
 
 def utcoffset(self, dt):
 return datetime.timedelta(0)
@@ -126,7 +126,7 @@ _versioned_namespace = '__8::'
 
 def lookup_templ_spec(templ, *args):
 """
-Lookup template specialization templ
+Lookup template specialization templ.
 """
 t = '{}<{}>'.format(templ, ', '.join([str(a) for a in args]))
 try:
@@ -146,17 +146,23 @@ def lookup_templ_spec(templ, *args):
 # Use this to find container node types instead of find_type,
 # see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91997 for details.
 
-
 def lookup_node_type(nodename, containertype):
 """
-Lookup specialization of template NODENAME corresponding to CONTAINERTYPE.
-e.g. if NODENAME is '_List_node' and CONTAINERTYPE is std::list
-then return the type std::_List_node.
-Returns None if not found.
+Lookup specialization of template nodename corresponding to containertype.
+
+nodename - The name of a class template, as a String
+containertype - The container, as a gdb.Type
+
+Return a gdb.Type for the corresponding specialization of nodename,
+or None if the type cannot be found.
+
+e.g. lookup_node_type('_List_node', gdb.lookup_type('std::list'))
+will return a gdb.Type for the type std::_List_node.
 """
 # If nodename is unqualified, assume it's in namespace std.
 if '::' not in nodename:
 nodename = 'std::' + nodename
+# Use either containertype's value_type or its first template argument.
 try:
 valtype = find_type(containertype, 'value_type')
 except:
@@ -214,7 +220,7 @@ def strip_versioned_namespace(typename):
 
 
 def strip_inline_namespaces(type_str):
-"Remove known inline namespaces from the canonical name of a type."
+"""Remove known inline namespaces from the canonical name of a type."""
 type_str = strip_versioned_namespace(type_str)
 type_str = type_str.replace('std::__cxx11::', 'std::')
 expt_ns = 'std::experimental::'
@@ -226,7 +232,7 @@ def strip_inline_namespaces(type_str):
 
 
 def get_template_arg_list(type_obj):
-"Return a type's template arguments as a list"
+"""Return a type's template arguments as a list."""
 n = 0
 template_args = []
 while True:
@@ -238,7 +244,7 @@ def get_template_arg_list(type_obj):
 
 
 class SmartPtrIterator(Iterator):
-"An iterator for smart pointer types with a single 'child' value"
+"""An iterator for smart pointer types with a single 'child' value."""
 
 def __init__(self, val):
 self.val = val
@@ -254,7 +260,9 @@ class SmartPtrIterator(Iterator):
 
 
 class SharedPointerPrinter:
-"Print a shared_ptr, weak_ptr, atomic, or atomic"
+"""
+Print a shared_ptr, weak_ptr, atomic, or atomic.
+"""
 
 def __init__(self, typename, val):
 self.typename = strip_versioned_namespace(typename)
@@ -292,7 +300,7 @@ class SharedPointerPrinter:
 
 
 def _tuple_impl_get(val):
-"Return the tuple element stored in a _Tuple_impl base class."
+"""Return the tuple element stored in a _Tuple_impl base class."""
 bases = val.type.fields()
 if not bases[-1].is_base_class:
 raise ValueError(
@@ -316,7 +324,7 @@ def _tuple_impl_get(val):
 
 
 def tuple_get(n, val):
-"Return the result of std::get(val) on a std::tuple"
+"""Return the result of std::get(val) on a std::tuple."""
 tuple_size = len(get_template_arg_list(val.type))
 if n > tuple_size:
 raise ValueError("Out of range index for std::get on std::tuple")
@@ -330,7 +338,7 @@ def tuple_get(n, val):
 
 
 def unique_ptr_get(val):
-"Return the result of val.get() on a std::unique_ptr"
+"""Return the result of val.get() on a std::unique_ptr."""
 # std::unique_ptr contains a std::tuple,
 # either as a direct data member _M_t (the old implementation)
 # or within a data member of type __uniq_ptr_data.
@@ -348,7 +356,7 @@ def unique_ptr_get(val):
 
 
 class UniquePointerPrinter:
-"Print a unique_ptr"
+"""Print a unique_ptr."""
 
 def __init__(self, typename, val):
 

[PATCH] ggc: do not wipe out unrelated data via gt_ggc_rtab

2023-09-28 Thread Sergei Trofimovich
From: Sergei Trofimovich 

There are 3 GC root tables:

   gt_ggc_rtab
   gt_ggc_deletable_rtab
   gt_pch_scalar_rtab

`deletable` and `scalar` tables are both simple: each element always
contains a pointer to the beginning of the object and its size is that
of the full object.

`rtab` is different: its `base` is a pointer in the middle of the
struct and `stride` is the distance to the next GC pointer in the array.

Before the change there were 2 problems:

1. We memset()ed not just pointers but data around them.
2. We went out of bounds of the last object described by gt_ggc_rtab
   and triggered bootstrap failures in profile and asan bootstraps.

After the change we handle only pointers themselves like the rest of
ggc-common.cc code.
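
To illustrate the difference outside of GCC, here is a minimal Python
model of the two strategies. It is only a sketch: the 24-byte struct
layout, the 8-byte pointer size and the helper names (memset_root,
zero_root_pointers, make_array) are assumptions made up for this
example, not anything taken from ggc-common.cc.

```python
import struct

PTR = 8  # assumed pointer size in bytes for this model


def memset_root(buf, base, stride, nelt):
    """Old behaviour: clear stride * nelt bytes starting at base."""
    end = base + stride * nelt
    buf[base:end] = bytes(end - base)


def zero_root_pointers(buf, base, stride, nelt):
    """New behaviour: clear only the pointer slot at base + stride * i."""
    for i in range(nelt):
        off = base + stride * i
        buf[off:off + PTR] = bytes(PTR)


def make_array(nelt):
    """nelt structs of 24 bytes each: data, GC pointer, data."""
    one = struct.pack("<QQQ", 0x1111111111111111, 0xDEADBEEF,
                      0x2222222222222222)
    return bytearray(one * nelt)


STRIDE, BASE, NELT = 24, 8, 2  # base points at the pointer field

# Pointer-only zeroing leaves the surrounding data fields intact.
fixed = make_array(NELT)
zero_root_pointers(fixed, BASE, STRIDE, NELT)
fields = struct.unpack("<" + "Q" * (3 * NELT), fixed)

# The memset model wipes the data fields and spills into the bytes that
# follow the array (modelled here as 8 bytes of a neighbouring object).
old = make_array(NELT) + bytearray(b"\xff" * PTR)
memset_root(old, BASE, STRIDE, NELT)
```

In the model, because `base` sits 8 bytes into the first struct, the
memset range ends 8 bytes past the last struct, which corresponds to
problem 2 above.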

gcc/
PR/111505
* ggc-common.cc (ggc_zero_rtab_roots): New helper.
* ggc-common.cc (ggc_common_finalize): Use helper instead of
memset() to wipe out pointers.
---
 gcc/ggc-common.cc | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/gcc/ggc-common.cc b/gcc/ggc-common.cc
index 95803fa95a1..39e2581affd 100644
--- a/gcc/ggc-common.cc
+++ b/gcc/ggc-common.cc
@@ -75,6 +75,18 @@ ggc_mark_root_tab (const_ggc_root_tab_t rt)
   (*rt->cb) (*(void **) ((char *)rt->base + rt->stride * i));
 }
 
+/* Zero out all the roots in the table RT.  */
+
+static void
+ggc_zero_rtab_roots (const_ggc_root_tab_t rt)
+{
+  size_t i;
+
+  for ( ; rt->base != NULL; rt++)
+for (i = 0; i < rt->nelt; i++)
+  (*(void **) ((char *)rt->base + rt->stride * i)) = (void*)0;
+}
+
 /* Iterate through all registered roots and mark each element.  */
 
 void
@@ -1307,8 +1319,7 @@ ggc_common_finalize ()
   memset (rti->base, 0, rti->stride * rti->nelt);
 
   for (rt = gt_ggc_rtab; *rt; rt++)
-for (rti = *rt; rti->base != NULL; rti++)
-  memset (rti->base, 0, rti->stride * rti->nelt);
+ggc_zero_rtab_roots (*rt);
 
   for (rt = gt_pch_scalar_rtab; *rt; rt++)
 for (rti = *rt; rti->base != NULL; rti++)
-- 
2.42.0



[PATCH v3] libstdc++: Fix handling of surrogate CP in codecvt [PR108976]

2023-09-28 Thread Dimitrij Mijoski
This patch fixes the handling of surrogate code points in all standard
facets for transcoding Unicode that are based on std::codecvt. Surrogate
code points should always be treated as an error. Surrogate code units,
on the other hand, can only appear in UTF-16 and only when they come
in a proper pair.

Additionally, it fixes a bug in std::codecvt_utf16::in(): when an odd
number of bytes was given in the range [from, from_end), error was always
returned. The last byte in such a range does not form a full UTF-16 code
unit, so we cannot decide whether it is an error; partial should be
returned instead.
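
The rules described above can be sketched in a few lines of Python.
This is a simplified model for illustration only, not the libstdc++
implementation: the function names and the string results "ok",
"partial" and "error" are stand-ins chosen for this example, and the
byte order is assumed to be big-endian.

```python
def is_surrogate(c):
    """True for code points in the surrogate range U+D800..U+DFFF."""
    return 0xD800 <= c <= 0xDFFF


def ucs4_status(code_points, maxcode=0x10FFFF):
    """A surrogate code point in UCS-4 input is always an error."""
    for c in code_points:
        if is_surrogate(c) or c > maxcode:
            return "error"
    return "ok"


def utf16_bytes_status(data):
    """Classify a big-endian UTF-16 byte range."""
    i = 0
    while i + 2 <= len(data):
        unit = int.from_bytes(data[i:i + 2], "big")
        if 0xD800 <= unit <= 0xDBFF:        # high surrogate
            if i + 4 > len(data):
                return "partial"            # low surrogate not here yet
            low = int.from_bytes(data[i + 2:i + 4], "big")
            if not 0xDC00 <= low <= 0xDFFF:
                return "error"              # unpaired high surrogate
            i += 4
        elif 0xDC00 <= unit <= 0xDFFF:
            return "error"                  # stray low surrogate
        else:
            i += 2
    # A single leftover byte is not a full code unit: partial, not error.
    return "partial" if i < len(data) else "ok"
```

For example, utf16_bytes_status(b"\x00A\x00") gives "partial" in this
model, because the trailing byte cannot be judged until more input
arrives.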

The testsuite for testing these facets was updated in the following
order:

1. All functions that test codecvts that work with UTF-8 were refactored
   and made more generic so they accept codecvt that works with the char
   type char8_t.
2. The same functions were updated with new test cases for transcoding
   errors and now additionally test for surrogates, overlong UTF-8
   sequences, code points out of the Unicode range, and more tests for
   missing leading and trailing code units.
3. New tests were added to test codecvt_utf16 in both of its variants,
   UTF-16 <-> UTF-32/UCS-4 and UTF-16 <-> UCS-2.

libstdc++-v3/ChangeLog:

* src/c++11/codecvt.cc (read_utf8_code_point): Fix handling of
surrogates in UTF-8.
(ucs4_out): Fix handling of surrogates in UCS-4 -> UTF-8.
(ucs4_in): Fix handling of range with odd number of bytes.
(ucs4_out): Fix handling of surrogates in UCS-4 -> UTF-16.
(ucs2_out): Fix handling of surrogates in UCS-2 -> UTF-16.
(ucs2_in): Fix handling of range with odd number of bytes.
(__codecvt_utf16_base<char16_t>::do_in): Likewise.
(__codecvt_utf16_base<char32_t>::do_in): Likewise.
(__codecvt_utf16_base<wchar_t>::do_in): Likewise.
* testsuite/22_locale/codecvt/codecvt_unicode.cc: Renames, add
tests for codecvt_utf16<char16_t> and codecvt_utf16<char32_t>.
* testsuite/22_locale/codecvt/codecvt_unicode.h: Refactor UTF-8
testing functions for char8_t, add more test cases for errors,
add testing functions for codecvt_utf16.
* testsuite/22_locale/codecvt/codecvt_unicode_wchar_t.cc:
Renames, add tests for codecvt_utf16<wchar_t>.
* testsuite/22_locale/codecvt/codecvt_utf16/79980.cc (test06):
Fix test.
* testsuite/22_locale/codecvt/codecvt_unicode_char8_t.cc: New test.
---
 libstdc++-v3/src/c++11/codecvt.cc |   18 +-
 .../22_locale/codecvt/codecvt_unicode.cc  |   38 +-
 .../22_locale/codecvt/codecvt_unicode.h   | 1799 +
 .../codecvt/codecvt_unicode_char8_t.cc|   53 +
 .../codecvt/codecvt_unicode_wchar_t.cc|   32 +-
 .../22_locale/codecvt/codecvt_utf16/79980.cc  |2 +-
 6 files changed, 1493 insertions(+), 449 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode_char8_t.cc

diff --git a/libstdc++-v3/src/c++11/codecvt.cc 
b/libstdc++-v3/src/c++11/codecvt.cc
index 02f05752d..2cc812cfc 100644
--- a/libstdc++-v3/src/c++11/codecvt.cc
+++ b/libstdc++-v3/src/c++11/codecvt.cc
@@ -284,6 +284,8 @@ namespace
return invalid_mb_sequence;
   if (c1 == 0xE0 && c2 < 0xA0) [[unlikely]] // overlong
return invalid_mb_sequence;
+  if (c1 == 0xED && c2 >= 0xA0) [[unlikely]] // surrogate
+   return invalid_mb_sequence;
   if (avail < 3) [[unlikely]]
return incomplete_mb_character;
   char32_t c3 = (unsigned char) from[2];
@@ -484,6 +486,8 @@ namespace
 while (from.size())
   {
const char32_t c = from[0];
+   if (0xD800 <= c && c <= 0xDFFF) [[unlikely]]
+ return codecvt_base::error;
if (c > maxcode) [[unlikely]]
  return codecvt_base::error;
if (!write_utf8_code_point(to, c)) [[unlikely]]
@@ -508,7 +512,7 @@ namespace
  return codecvt_base::error;
to = codepoint;
   }
-return from.size() ? codecvt_base::partial : codecvt_base::ok;
+return from.nbytes() ? codecvt_base::partial : codecvt_base::ok;
   }
 
   // ucs4 -> utf16
@@ -521,6 +525,8 @@ namespace
 while (from.size())
   {
const char32_t c = from[0];
+   if (0xD800 <= c && c <= 0xDFFF) [[unlikely]]
+ return codecvt_base::error;
if (c > maxcode) [[unlikely]]
  return codecvt_base::error;
if (!write_utf16_code_point(to, c, mode)) [[unlikely]]
@@ -653,7 +659,7 @@ namespace
 while (from.size() && to.size())
   {
char16_t c = from[0];
-   if (is_high_surrogate(c))
+   if (0xD800 <= c && c <= 0xDFFF)
  return codecvt_base::error;
if (c > maxcode)
  return codecvt_base::error;
@@ -680,7 +686,7 @@ namespace
  return codecvt_base::error;
to = c;
   }
-return from.size() == 0 ? codecvt_base::ok : codecvt_base::partial;
+return from.nbytes() == 0 ? codecvt_base::ok : codecvt_base::partial;
   }
 
   const char16_t*
@@ -1344,8 +1350,6 @@ 

Re: [PATCH 6/7] libstdc++: Fix regex escapes in pretty-printers

2023-09-28 Thread Tom Tromey
Jonathan> I already have a patch to use r'...' for these, so we only
Jonathan> need the single backslash.

Yeah, probably nicer.

Jonathan> So please don't commit this one, I think it will be
Jonathan> unnecessary in a couple of hours.

No problem, I'll drop it when I rebase on top of your changes.

Tom


Re: [PATCH] Remove poly_int_pod

2023-09-28 Thread Jeff Law




On 9/28/23 11:26, Jason Merrill wrote:

On 9/28/23 05:55, Richard Sandiford wrote:

poly_int was written before the switch to C++11 and so couldn't
use explicit default constructors.  This led to an awkward split
between poly_int_pod and poly_int.  poly_int simply inherited from
poly_int_pod and added constructors, with the argumentless constructor
having an empty body.  But inheritance meant that poly_int had to
repeat the assignment operators from poly_int_pod (again, no C++11,
so no "using" to inherit base-class implementations).

All that goes away if we switch to using default constructors.

The main complication is ensuring that braced initialisation still
gives a constexpr, so that static variables can be initialised without
runtime code.  The two problems here are:

(1) When initialising a poly_int with fewer than N
 coefficients, the other coefficients need to be a zero of
 the same precision as the explicit coefficients.  This was
 previously done in a for loop using wi::ints_for<...>::zero,
 but C++11 constexpr constructors can't have function bodies.
 The patch instead uses a series of delegated initialisers to
 fill in the implicit coefficients.


Perhaps it's time to update the bootstrap requirement to C++14 (i.e. GCC 
5, from eight years ago).  Not that this would affect this particular 
patch.
IIRC the primary reason we settled on gcc-4.8.x was RHEL7/Centos7.  With 
RHEL 7 approaching EOL moving the baseline forward would seem to make sense.


I'd want to know if this affects folks using SuSE's enterprise distro 
before actually making the change, but I'm broadly in favor of moving 
forward if it's not going to have a major impact on users that are using 
enterprise distros.


jeff


Re: [PATCH 5/7] libstdc++: Remove std_ratio_t_tuple

2023-09-28 Thread Jonathan Wakely
On Thu, 28 Sept 2023, 18:55 Tom Tromey via Libstdc++, 
wrote:

> This removes the std_ratio_t_tuple function from the Python
> pretty-printer code.  It is not used.  Apparently the relevant parts
> were moved to StdChronoDurationPrinter._ratio at some point in the
> past.
>

I think I added it at the same time as that printer, rather than moving it
there later. I don't remember if I wanted to replace the _ratio method with
that function, or vice versa, but it looks like I never finished whatever I
meant to do. Either way, we don't need to keep the unused function.

OK, thanks.





> libstdc++-v3/ChangeLog:
>
> * python/libstdcxx/v6/printers.py (std_ratio_t_tuple):
> Remove.
> ---
>  libstdc++-v3/python/libstdcxx/v6/printers.py | 8 
>  1 file changed, 8 deletions(-)
>
> diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py
> b/libstdc++-v3/python/libstdcxx/v6/printers.py
> index 6bf4fe891fd..94ac9232da7 100644
> --- a/libstdc++-v3/python/libstdcxx/v6/printers.py
> +++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
> @@ -1985,14 +1985,6 @@ class StdFormatArgsPrinter(printer_base):
>  return "%s with %d arguments" % (typ, size)
>
>
> -def std_ratio_t_tuple(ratio_type):
> -# TODO use reduced period i.e. duration::period
> -period = self._val.type.template_argument(1)
> -num = period.template_argument(0)
> -den = period.template_argument(1)
> -return (num, den)
> -
> -
>  class StdChronoDurationPrinter(printer_base):
>  "Print a std::chrono::duration"
>
> --
> 2.40.1
>
>


Re: [PATCH 3/7] libstdc++: Remove unused Python imports

2023-09-28 Thread Jonathan Wakely
On Thu, 28 Sept 2023, 18:50 Tom Tromey via Libstdc++, 
wrote:

> flake8 pointed out some unused imports.
>

OK, thanks.



> libstdc++-v3/ChangeLog:
>
> * python/libstdcxx/v6/printers.py: Don't import 'os'.
> * python/libstdcxx/v6/__init__.py: Don't import 'gdb'.
> ---
>  libstdc++-v3/python/libstdcxx/v6/__init__.py | 2 --
>  libstdc++-v3/python/libstdcxx/v6/printers.py | 1 -
>  2 files changed, 3 deletions(-)
>
> diff --git a/libstdc++-v3/python/libstdcxx/v6/__init__.py
> b/libstdc++-v3/python/libstdcxx/v6/__init__.py
> index df654acd0c2..8b2cbc60a1b 100644
> --- a/libstdc++-v3/python/libstdcxx/v6/__init__.py
> +++ b/libstdc++-v3/python/libstdcxx/v6/__init__.py
> @@ -13,8 +13,6 @@
>  # You should have received a copy of the GNU General Public License
>  # along with this program.  If not, see .
>
> -import gdb
> -
>  # Load the xmethods if GDB supports them.
>  def gdb_has_xmethods():
>  try:
> diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py
> b/libstdc++-v3/python/libstdcxx/v6/printers.py
> index bbc4375541f..8d44244afb0 100644
> --- a/libstdc++-v3/python/libstdcxx/v6/printers.py
> +++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
> @@ -19,7 +19,6 @@ import gdb
>  import itertools
>  import re
>  import sys
> -import os
>  import errno
>  import datetime
>
> --
> 2.40.1
>
>


Re: [PATCH 4/7] libstdc++: Remove unused locals from printers.py

2023-09-28 Thread Jonathan Wakely
On Thu, 28 Sept 2023, 18:50 Tom Tromey via Libstdc++, 
wrote:

> flake8 pointed out some unused local variables in the libstdc++
> pretty-printers.  This removes them.
>

OK, thanks.



> libstdc++-v3/ChangeLog:
>
> * python/libstdcxx/v6/printers.py
> (StdExpOptionalPrinter.__init__, lookup_node_type):
> Remove unused variables.
> ---
>  libstdc++-v3/python/libstdcxx/v6/printers.py | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py
> b/libstdc++-v3/python/libstdcxx/v6/printers.py
> index 8d44244afb0..6bf4fe891fd 100644
> --- a/libstdc++-v3/python/libstdcxx/v6/printers.py
> +++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
> @@ -169,7 +169,7 @@ def lookup_node_type(nodename, containertype):
>  valtype = valtype.strip_typedefs()
>  try:
>  return lookup_templ_spec(nodename, valtype)
> -except gdb.error as e:
> +except gdb.error:
>  # For debug mode containers the node is in std::__cxx1998.
>  if is_member_of_namespace(nodename, 'std'):
>  if is_member_of_namespace(containertype, 'std::__cxx1998',
> @@ -1423,7 +1423,6 @@ class
> StdExpOptionalPrinter(SingleObjContainerPrinter):
>  "Print a std::optional or std::experimental::optional"
>
>  def __init__(self, typename, val):
> -valtype = self._recognize(val.type.template_argument(0))
>  typename = strip_versioned_namespace(typename)
>  self._typename = re.sub(
>  '^std::(experimental::|)(fundamentals_v\d::|)(.*)',
> r'std::\1\3', typename, 1)
> --
> 2.40.1
>
>


Re: [PATCH 7/7] libstdc++: Use Python "not in" operator

2023-09-28 Thread Jonathan Wakely
On Thu, 28 Sept 2023, 18:54 Tom Tromey via Libstdc++, 
wrote:

> flake8 warns about code like
>
> not something in "whatever"
>
> Ordinarily in Python this should be written as:
>
> something not in "whatever"
>
> This patch makes this change.
>

OK, thanks.



> libstdc++-v3/ChangeLog:
>
> * python/libstdcxx/v6/printers.py (Printer.add_version)
> (add_one_template_type_printer)
> (FilteringTypePrinter.add_one_type_printer): Use Python
> "not in" operator.
> ---
>  libstdc++-v3/python/libstdcxx/v6/printers.py | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py
> b/libstdc++-v3/python/libstdcxx/v6/printers.py
> index d125236b777..380426cd41e 100644
> --- a/libstdc++-v3/python/libstdcxx/v6/printers.py
> +++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
> @@ -2321,7 +2321,7 @@ class Printer(object):
>  # Add a name using _GLIBCXX_BEGIN_NAMESPACE_VERSION.
>  def add_version(self, base, name, function):
>  self.add(base + name, function)
> -if _versioned_namespace and not '__cxx11' in base:
> +if _versioned_namespace and '__cxx11' not in base:
>  vbase = re.sub('^(std|__gnu_cxx)::', r'\g<0>%s' %
> _versioned_namespace, base)
>  self.add(vbase + name, function)
> @@ -2494,7 +2494,7 @@ def add_one_template_type_printer(obj, name,
> defargs):
>  printer = TemplateTypePrinter('std::__debug::'+name, defargs)
>  gdb.types.register_type_printer(obj, printer)
>
> -if _versioned_namespace and not '__cxx11' in name:
> +if _versioned_namespace and '__cxx11' not in name:
>  # Add second type printer for same type in versioned namespace:
>  ns = 'std::' + _versioned_namespace
>  # PR 86112 Cannot use dict comprehension here:
> @@ -2589,7 +2589,7 @@ class FilteringTypePrinter(object):
>  def add_one_type_printer(obj, template, name, targ1=None):
>  printer = FilteringTypePrinter('std::' + template, 'std::' + name,
> targ1)
>  gdb.types.register_type_printer(obj, printer)
> -if _versioned_namespace and not '__cxx11' in template:
> +if _versioned_namespace and '__cxx11' not in template:
>  ns = 'std::' + _versioned_namespace
>  printer = FilteringTypePrinter(ns + template, ns + name, targ1)
>  gdb.types.register_type_printer(obj, printer)
> --
> 2.40.1
>
>


Re: [PATCH 6/7] libstdc++: Fix regex escapes in pretty-printers

2023-09-28 Thread Jonathan Wakely
On Thu, 28 Sept 2023, 18:50 Tom Tromey via Libstdc++, 
wrote:

> flake8 pointed out that some regexes in the pretty-printers are
> missing a backslash.  This patch fixes these.
>

I already have a patch to use r'...' for these, so we only need the single
backslash.

I'm also refactoring all those re.match calls in xmethods.exp to use a
common function.

So please don't commit this one, I think it will be unnecessary in a couple
of hours.



> libstdc++-v3/ChangeLog:
>
> * python/libstdcxx/v6/printers.py
> (StdExpAnyPrinter.__init__, StdExpOptionalPrinter.__init__):
> Add missing backslash.
> * python/libstdcxx/v6/xmethods.py
> (ArrayMethodsMatcher.match, DequeMethodsMatcher.match)
> (ForwardListMethodsMatcher.match, ListMethodsMatcher.match)
> (VectorMethodsMatcher.match)
> (AssociativeContainerMethodsMatcher.match)
> (UniquePtrGetWorker.__call__, UniquePtrMethodsMatcher.match)
> (SharedPtrSubscriptWorker.__call__)
> (SharedPtrMethodsMatcher.match): Add missing backslash.
> ---
>  libstdc++-v3/python/libstdcxx/v6/printers.py |  6 +++---
>  libstdc++-v3/python/libstdcxx/v6/xmethods.py | 22 ++--
>  2 files changed, 14 insertions(+), 14 deletions(-)
>
> diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py
> b/libstdc++-v3/python/libstdcxx/v6/printers.py
> index 94ac9232da7..d125236b777 100644
> --- a/libstdc++-v3/python/libstdcxx/v6/printers.py
> +++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
> @@ -1344,7 +1344,7 @@ class StdExpAnyPrinter(SingleObjContainerPrinter):
>  def __init__(self, typename, val):
>  self._typename = strip_versioned_namespace(typename)
>  self._typename = re.sub(
> -'^std::experimental::fundamentals_v\d::',
> 'std::experimental::', self._typename, 1)
> +'^std::experimental::fundamentals_v\\d::',
> 'std::experimental::', self._typename, 1)
>  self._val = val
>  self._contained_type = None
>  contained_value = None
> @@ -1377,7 +1377,7 @@ class StdExpAnyPrinter(SingleObjContainerPrinter):
>  mgrtypes = []
>  for s in strings:
>  try:
> -x = re.sub("std::string(?!\w)", s, m.group(1))
> +x = re.sub("std::string(?!\\w)", s, m.group(1))
>  # The following lookup might raise gdb.error if
> the
>  # manager function was never instantiated for 's'
> in the
>  # program, because there will be no such type.
> @@ -1425,7 +1425,7 @@ class
> StdExpOptionalPrinter(SingleObjContainerPrinter):
>  def __init__(self, typename, val):
>  typename = strip_versioned_namespace(typename)
>  self._typename = re.sub(
> -'^std::(experimental::|)(fundamentals_v\d::|)(.*)',
> r'std::\1\3', typename, 1)
> +'^std::(experimental::|)(fundamentals_v\\d::|)(.*)',
> r'std::\1\3', typename, 1)
>  payload = val['_M_payload']
>  if self._typename.startswith('std::experimental'):
>  engaged = val['_M_engaged']
> diff --git a/libstdc++-v3/python/libstdcxx/v6/xmethods.py
> b/libstdc++-v3/python/libstdcxx/v6/xmethods.py
> index 025b1b86ed0..eafecbb148e 100644
> --- a/libstdc++-v3/python/libstdcxx/v6/xmethods.py
> +++ b/libstdc++-v3/python/libstdcxx/v6/xmethods.py
> @@ -159,7 +159,7 @@ class ArrayMethodsMatcher(gdb.xmethod.XMethodMatcher):
>  self.methods = [self._method_dict[m] for m in self._method_dict]
>
>  def match(self, class_type, method_name):
> -if not re.match('^std::(__\d+::)?array<.*>$', class_type.tag):
> +if not re.match('^std::(__\\d+::)?array<.*>$', class_type.tag):
>  return None
>  method = self._method_dict.get(method_name)
>  if method is None or not method.enabled:
> @@ -284,7 +284,7 @@ class DequeMethodsMatcher(gdb.xmethod.XMethodMatcher):
>  self.methods = [self._method_dict[m] for m in self._method_dict]
>
>  def match(self, class_type, method_name):
> -if not re.match('^std::(__\d+::)?deque<.*>$', class_type.tag):
> +if not re.match('^std::(__\\d+::)?deque<.*>$', class_type.tag):
>  return None
>  method = self._method_dict.get(method_name)
>  if method is None or not method.enabled:
> @@ -332,7 +332,7 @@ class
> ForwardListMethodsMatcher(gdb.xmethod.XMethodMatcher):
>  self.methods = [self._method_dict[m] for m in self._method_dict]
>
>  def match(self, class_type, method_name):
> -if not re.match('^std::(__\d+::)?forward_list<.*>$',
> class_type.tag):
> +if not re.match('^std::(__\\d+::)?forward_list<.*>$',
> class_type.tag):
>  return None
>  method = self._method_dict.get(method_name)
>  if method is None or not method.enabled:
> @@ -419,7 +419,7 @@ class ListMethodsMatcher(gdb.xmethod.XMethodMatcher):
>  

Re: [PATCH 1/7] libstdc++: Show full Python stack on error

2023-09-28 Thread Jonathan Wakely
On Thu, 28 Sept 2023, 18:48 Tom Tromey via Libstdc++, 
wrote:

> This changes the libstdc++ test suite to arrange for gdb to show the
> full Python stack if any sort of Python exception occurs.  This makes
> debugging the printers a little simpler.
>

Oh I wish I'd known about this sooner.

OK for trunk, thanks.


> libstdc++-v3/ChangeLog:
>
> * testsuite/lib/gdb-test.exp (gdb-test): Enable Python
> stack traces from gdb.
> ---
>  libstdc++-v3/testsuite/lib/gdb-test.exp | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/libstdc++-v3/testsuite/lib/gdb-test.exp
> b/libstdc++-v3/testsuite/lib/gdb-test.exp
> index d8e572ef7b3..af7d970d388 100644
> --- a/libstdc++-v3/testsuite/lib/gdb-test.exp
> +++ b/libstdc++-v3/testsuite/lib/gdb-test.exp
> @@ -141,6 +141,8 @@ proc gdb-test { marker {selector {}} {load_xmethods 0}
> } {
>  puts $fd "set auto-load no"
>  # Now that we've disabled auto-load, it's safe to set the target file
>  puts $fd "file ./$output_file"
> +# See the full backtrace of any failures.
> +puts $fd "set python print-stack full"
>  # Load & register *our* copy of the pretty-printers
>  puts $fd "source $printer_code"
>  puts $fd "python register_libstdcxx_printers(None)"
> --
> 2.40.1
>
>


Re: [committed] libstdc++: Add GDB printers for types

2023-09-28 Thread Jonathan Wakely
On Thu, 28 Sept 2023, 18:37 Tom Tromey,  wrote:

> Jonathan> The changes made by black seem reasonable, though I prefer it
> Jonathan> with -S to disable string-normalization. It also needs an
> Jonathan> option to use 79 as the maximum line length.
>
> I've got some patches I'm about to send.
>
> I made a pyproject.toml to auto-configure black (and isort), and this
> works fine, but it also makes a bunch of edits.  So I'd rather send that
> separately, after the current batch of patches is handled.
>
> flake8 still isn't really happy, I guess because there are strings that
> cause lines over 79, and black doesn't split those.  But meh, maybe
> suppressing some flake8 errors is the way to go.
>


I was about to push some changes to split those strings up.



> Tom
>


Re: [Fortran, Patch, Coarray, PR 37336] Fix crash in finalizer when derived type coarray is already freed.

2023-09-28 Thread Paul Richard Thomas
Hi Andre,

The patch looks fine to me. Since you mention it in the comment, is it
worth declaring the derived type 'foo' in a module and giving it a
final routine?

Thanks for the patch.

Paul

On Thu, 28 Sept 2023 at 13:45, Andre Vehreschild via Fortran
 wrote:
>
> Hi all,
>
> attached patch fixes a crash in coarray programs when an allocatable derived
> typed coarray was freed explicitly. The generated cleanup code did not take
> into account that the coarray may have been deallocated already. The patch
> fixes this by moving the statements accessing components inside the derived
> type into the block guarded by its allocated check.
>
> Regtested ok on f37/x86_64. Ok for master?
>
> Regards,
> Andre
> --
> Andre Vehreschild * Email: vehre ad gmx dot de


[PATCH 2/7] libstdc++: Use gdb.ValuePrinter base class

2023-09-28 Thread Tom Tromey
GDB 14 will add a new ValuePrinter tag class that will be used to
signal that pretty-printers will agree to the "extension protocol" --
essentially that they will follow some simple namespace rules, so that
GDB can add new methods over time.

A couple new methods have already been added to GDB, to support DAP.
While I haven't implemented these for any libstdc++ printers yet, this
patch makes the basic conversion: printers derive from
gdb.ValuePrinter if it is available, and all "non-standard" (that is,
not specified by GDB) members of the various value-printing classes
are renamed to have a leading underscore.
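
As a rough sketch of the resulting pattern: the snippet below picks
gdb.ValuePrinter when it exists and falls back to object otherwise,
and keeps all non-standard members underscore-prefixed.
ExamplePointerPrinter and its output format are hypothetical, made up
for this illustration, and the ImportError fallback is only there so
the snippet can also be run outside of GDB.

```python
try:
    import gdb  # only importable inside a GDB process
except ImportError:
    gdb = None  # stand-in so this sketch also runs outside GDB

# Use the tag class when this GDB provides it, plain object otherwise.
printer_base = getattr(gdb, 'ValuePrinter', object)


class ExamplePointerPrinter(printer_base):
    """Hypothetical printer, not one of the libstdc++ printers."""

    def __init__(self, typename, val):
        # Leading underscores keep these names out of the namespace
        # that GDB reserves for future printer methods.
        self._typename = typename
        self._val = val

    def to_string(self):
        return '%s holding %r' % (self._typename, self._val)


printer = ExamplePointerPrinter('example::ptr', 42)
public_attrs = [n for n in vars(printer) if not n.startswith('_')]
```

Only the methods GDB itself specifies (to_string, children and so on)
stay public; public_attrs is empty for this printer.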
---
 libstdc++-v3/python/libstdcxx/v6/printers.py | 1201 +-
 1 file changed, 605 insertions(+), 596 deletions(-)

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py 
b/libstdc++-v3/python/libstdcxx/v6/printers.py
index d60c8003a63..bbc4375541f 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -97,6 +97,12 @@ try:
 except ImportError:
 pass
 
+# Use the base class if available.
+if hasattr(gdb, 'ValuePrinter'):
+printer_base = gdb.ValuePrinter
+else:
+printer_base = object
+
 # Starting with the type ORIG, search for the member type NAME.  This
 # handles searching upward through superclasses.  This is needed to
 # work around http://sourceware.org/bugzilla/show_bug.cgi?id=13615.
@@ -241,43 +247,43 @@ class SmartPtrIterator(Iterator):
 "An iterator for smart pointer types with a single 'child' value"
 
 def __init__(self, val):
-self.val = val
+self._val = val
 
 def __iter__(self):
 return self
 
 def __next__(self):
-if self.val is None:
+if self._val is None:
 raise StopIteration
-self.val, val = None, self.val
+self._val, val = None, self._val
 return ('get()', val)
 
 
-class SharedPointerPrinter:
+class SharedPointerPrinter(printer_base):
 "Print a shared_ptr, weak_ptr, atomic, or atomic"
 
 def __init__(self, typename, val):
-self.typename = strip_versioned_namespace(typename)
-self.val = val
-self.pointer = val['_M_ptr']
+self._typename = strip_versioned_namespace(typename)
+self._val = val
+self._pointer = val['_M_ptr']
 
 def children(self):
-return SmartPtrIterator(self.pointer)
+return SmartPtrIterator(self._pointer)
 
 # Return the _Sp_counted_base<>* that holds the refcounts.
 def _get_refcounts(self):
-if self.typename == 'std::atomic':
+if self._typename == 'std::atomic':
 # A tagged pointer is stored as uintptr_t.
-ptr_val = self.val['_M_refcount']['_M_val']['_M_i']
+ptr_val = self._val['_M_refcount']['_M_val']['_M_i']
 ptr_val = ptr_val - (ptr_val % 2)  # clear lock bit
-ptr_type = find_type(self.val['_M_refcount'].type, 'pointer')
+ptr_type = find_type(self._val['_M_refcount'].type, 'pointer')
 return ptr_val.cast(ptr_type)
-return self.val['_M_refcount']['_M_pi']
+return self._val['_M_refcount']['_M_pi']
 
 def to_string(self):
 state = 'empty'
 refcounts = self._get_refcounts()
-targ = self.val.type.template_argument(0)
+targ = self._val.type.template_argument(0)
 targ = strip_versioned_namespace(str(targ))
 
 if refcounts != 0:
@@ -288,7 +294,7 @@ class SharedPointerPrinter:
 else:
 state = 'use count %d, weak count %d' % (
 usecount, weakcount - 1)
-return '%s<%s> (%s)' % (self.typename, targ, state)
+return '%s<%s> (%s)' % (self._typename, targ, state)
 
 
 def _tuple_impl_get(val):
@@ -347,17 +353,17 @@ def unique_ptr_get(val):
 return tuple_get(0, tuple_member)
 
 
-class UniquePointerPrinter:
+class UniquePointerPrinter(printer_base):
 "Print a unique_ptr"
 
 def __init__(self, typename, val):
-self.val = val
+self._val = val
 
 def children(self):
-return SmartPtrIterator(unique_ptr_get(self.val))
+return SmartPtrIterator(unique_ptr_get(self._val))
 
 def to_string(self):
-return ('std::unique_ptr<%s>' % 
(str(self.val.type.template_argument(0
+return ('std::unique_ptr<%s>' % 
(str(self._val.type.template_argument(0
 
 
 def get_value_from_aligned_membuf(buf, valtype):
@@ -381,55 +387,56 @@ def get_value_from_list_node(node):
 raise ValueError("Unsupported implementation for %s" % str(node.type))
 
 
-class StdListPrinter:
+class StdListPrinter(printer_base):
 "Print a std::list"
 
 class _iterator(Iterator):
 def __init__(self, nodetype, head):
-self.nodetype = nodetype
-self.base = head['_M_next']
-self.head = head.address
-self.count = 0
+self._nodetype = nodetype
+self._base = head['_M_next']
+

[PATCH 7/7] libstdc++: Use Python "not in" operator

2023-09-28 Thread Tom Tromey
flake8 warns about code like

not something in "whatever"

Ordinarily in Python this should be written as:

something not in "whatever"

This patch makes this change.
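
For reference, a small standalone check that the two spellings behave
identically, so the change is purely stylistic. The variable names
here are made up for the example.

```python
results = []
for base in ('std::', 'std::__cxx11::'):
    old_style = not '__cxx11' in base   # parsed as: not ('__cxx11' in base)
    new_style = '__cxx11' not in base   # the spelling flake8 prefers
    results.append((base, old_style, new_style))
```

Both forms evaluate to the same boolean for every input, because
"not" binds more loosely than "in".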

libstdc++-v3/ChangeLog:

* python/libstdcxx/v6/printers.py (Printer.add_version)
(add_one_template_type_printer)
(FilteringTypePrinter.add_one_type_printer): Use Python
"not in" operator.
---
 libstdc++-v3/python/libstdcxx/v6/printers.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py 
b/libstdc++-v3/python/libstdcxx/v6/printers.py
index d125236b777..380426cd41e 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -2321,7 +2321,7 @@ class Printer(object):
 # Add a name using _GLIBCXX_BEGIN_NAMESPACE_VERSION.
 def add_version(self, base, name, function):
 self.add(base + name, function)
-if _versioned_namespace and not '__cxx11' in base:
+if _versioned_namespace and '__cxx11' not in base:
 vbase = re.sub('^(std|__gnu_cxx)::', r'\g<0>%s' %
_versioned_namespace, base)
 self.add(vbase + name, function)
@@ -2494,7 +2494,7 @@ def add_one_template_type_printer(obj, name, defargs):
 printer = TemplateTypePrinter('std::__debug::'+name, defargs)
 gdb.types.register_type_printer(obj, printer)
 
-if _versioned_namespace and not '__cxx11' in name:
+if _versioned_namespace and '__cxx11' not in name:
 # Add second type printer for same type in versioned namespace:
 ns = 'std::' + _versioned_namespace
 # PR 86112 Cannot use dict comprehension here:
@@ -2589,7 +2589,7 @@ class FilteringTypePrinter(object):
 def add_one_type_printer(obj, template, name, targ1=None):
 printer = FilteringTypePrinter('std::' + template, 'std::' + name, targ1)
 gdb.types.register_type_printer(obj, printer)
-if _versioned_namespace and not '__cxx11' in template:
+if _versioned_namespace and '__cxx11' not in template:
 ns = 'std::' + _versioned_namespace
 printer = FilteringTypePrinter(ns + template, ns + name, targ1)
 gdb.types.register_type_printer(obj, printer)
-- 
2.40.1



[PATCH 4/7] libstdc++: Remove unused locals from printers.py

2023-09-28 Thread Tom Tromey
flake8 pointed out some unused local variables in the libstdc++
pretty-printers.  This removes them.

libstdc++-v3/ChangeLog:

* python/libstdcxx/v6/printers.py
(StdExpOptionalPrinter.__init__, lookup_node_type):
Remove unused variables.
---
 libstdc++-v3/python/libstdcxx/v6/printers.py | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py 
b/libstdc++-v3/python/libstdcxx/v6/printers.py
index 8d44244afb0..6bf4fe891fd 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -169,7 +169,7 @@ def lookup_node_type(nodename, containertype):
 valtype = valtype.strip_typedefs()
 try:
 return lookup_templ_spec(nodename, valtype)
-except gdb.error as e:
+except gdb.error:
 # For debug mode containers the node is in std::__cxx1998.
 if is_member_of_namespace(nodename, 'std'):
 if is_member_of_namespace(containertype, 'std::__cxx1998',
@@ -1423,7 +1423,6 @@ class StdExpOptionalPrinter(SingleObjContainerPrinter):
 "Print a std::optional or std::experimental::optional"
 
 def __init__(self, typename, val):
-valtype = self._recognize(val.type.template_argument(0))
 typename = strip_versioned_namespace(typename)
 self._typename = re.sub(
 '^std::(experimental::|)(fundamentals_v\d::|)(.*)', r'std::\1\3', 
typename, 1)
-- 
2.40.1



[PATCH 6/7] libstdc++: Fix regex escapes in pretty-printers

2023-09-28 Thread Tom Tromey
flake8 pointed out that some regexes in the pretty-printers are
missing a backslash.  This patch fixes these.
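
As a quick illustration of what the warning is about: in a normal
Python string literal, "\d" is not a recognised escape sequence, so
the regex either needs a doubled backslash or a raw string. The sketch
below uses one of the patterns from printers.py; the sample type name
is just an example input.

```python
import re

# Two equivalent ways of writing the same pattern without relying on
# an invalid escape sequence in the string literal.
doubled = '^std::experimental::fundamentals_v\\d::'
raw = r'^std::experimental::fundamentals_v\d::'

name = 'std::experimental::fundamentals_v1::any'
cleaned = re.sub(raw, 'std::experimental::', name, count=1)
```

The raw-string form is usually easier to read, since the pattern
appears exactly as the regex engine sees it.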

libstdc++-v3/ChangeLog:

* python/libstdcxx/v6/printers.py
(StdExpAnyPrinter.__init__, StdExpOptionalPrinter.__init__):
Add missing backslash.
* python/libstdcxx/v6/xmethods.py
(ArrayMethodsMatcher.match, DequeMethodsMatcher.match)
(ForwardListMethodsMatcher.match, ListMethodsMatcher.match)
(VectorMethodsMatcher.match)
(AssociativeContainerMethodsMatcher.match)
(UniquePtrGetWorker.__call__, UniquePtrMethodsMatcher.match)
(SharedPtrSubscriptWorker.__call__)
(SharedPtrMethodsMatcher.match): Add missing backslash.
---
 libstdc++-v3/python/libstdcxx/v6/printers.py |  6 +++---
 libstdc++-v3/python/libstdcxx/v6/xmethods.py | 22 ++--
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index 94ac9232da7..d125236b777 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -1344,7 +1344,7 @@ class StdExpAnyPrinter(SingleObjContainerPrinter):
 def __init__(self, typename, val):
 self._typename = strip_versioned_namespace(typename)
 self._typename = re.sub(
-'^std::experimental::fundamentals_v\d::', 'std::experimental::', self._typename, 1)
+'^std::experimental::fundamentals_v\\d::', 'std::experimental::', self._typename, 1)
 self._val = val
 self._contained_type = None
 contained_value = None
@@ -1377,7 +1377,7 @@ class StdExpAnyPrinter(SingleObjContainerPrinter):
 mgrtypes = []
 for s in strings:
 try:
-x = re.sub("std::string(?!\w)", s, m.group(1))
+x = re.sub("std::string(?!\\w)", s, m.group(1))
 # The following lookup might raise gdb.error if the
 # manager function was never instantiated for 's' in the
 # program, because there will be no such type.
@@ -1425,7 +1425,7 @@ class StdExpOptionalPrinter(SingleObjContainerPrinter):
 def __init__(self, typename, val):
 typename = strip_versioned_namespace(typename)
 self._typename = re.sub(
-'^std::(experimental::|)(fundamentals_v\d::|)(.*)', r'std::\1\3', typename, 1)
+'^std::(experimental::|)(fundamentals_v\\d::|)(.*)', r'std::\1\3', typename, 1)
 payload = val['_M_payload']
 if self._typename.startswith('std::experimental'):
 engaged = val['_M_engaged']
diff --git a/libstdc++-v3/python/libstdcxx/v6/xmethods.py b/libstdc++-v3/python/libstdcxx/v6/xmethods.py
index 025b1b86ed0..eafecbb148e 100644
--- a/libstdc++-v3/python/libstdcxx/v6/xmethods.py
+++ b/libstdc++-v3/python/libstdcxx/v6/xmethods.py
@@ -159,7 +159,7 @@ class ArrayMethodsMatcher(gdb.xmethod.XMethodMatcher):
 self.methods = [self._method_dict[m] for m in self._method_dict]
 
 def match(self, class_type, method_name):
-if not re.match('^std::(__\d+::)?array<.*>$', class_type.tag):
+if not re.match('^std::(__\\d+::)?array<.*>$', class_type.tag):
 return None
 method = self._method_dict.get(method_name)
 if method is None or not method.enabled:
@@ -284,7 +284,7 @@ class DequeMethodsMatcher(gdb.xmethod.XMethodMatcher):
 self.methods = [self._method_dict[m] for m in self._method_dict]
 
 def match(self, class_type, method_name):
-if not re.match('^std::(__\d+::)?deque<.*>$', class_type.tag):
+if not re.match('^std::(__\\d+::)?deque<.*>$', class_type.tag):
 return None
 method = self._method_dict.get(method_name)
 if method is None or not method.enabled:
@@ -332,7 +332,7 @@ class ForwardListMethodsMatcher(gdb.xmethod.XMethodMatcher):
 self.methods = [self._method_dict[m] for m in self._method_dict]
 
 def match(self, class_type, method_name):
-if not re.match('^std::(__\d+::)?forward_list<.*>$', class_type.tag):
+if not re.match('^std::(__\\d+::)?forward_list<.*>$', class_type.tag):
 return None
 method = self._method_dict.get(method_name)
 if method is None or not method.enabled:
@@ -419,7 +419,7 @@ class ListMethodsMatcher(gdb.xmethod.XMethodMatcher):
 self.methods = [self._method_dict[m] for m in self._method_dict]
 
 def match(self, class_type, method_name):
-if not re.match('^std::(__\d+::)?(__cxx11::)?list<.*>$', class_type.tag):
+if not re.match('^std::(__\\d+::)?(__cxx11::)?list<.*>$', class_type.tag):
 return None
 method = self._method_dict.get(method_name)
 if method is None or not method.enabled:
@@ -542,7 +542,7 @@ class VectorMethodsMatcher(gdb.xmethod.XMethodMatcher):
 self.methods = 

[PATCH 5/7] libstdc++: Remove std_ratio_t_tuple

2023-09-28 Thread Tom Tromey
This removes the std_ratio_t_tuple function from the Python
pretty-printer code.  It is not used.  Apparently the relevant parts
were moved to StdChronoDurationPrinter._ratio at some point in the
past.

libstdc++-v3/ChangeLog:

* python/libstdcxx/v6/printers.py (std_ratio_t_tuple):
Remove.
---
 libstdc++-v3/python/libstdcxx/v6/printers.py | 8 
 1 file changed, 8 deletions(-)

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index 6bf4fe891fd..94ac9232da7 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -1985,14 +1985,6 @@ class StdFormatArgsPrinter(printer_base):
 return "%s with %d arguments" % (typ, size)
 
 
-def std_ratio_t_tuple(ratio_type):
-# TODO use reduced period i.e. duration::period
-period = self._val.type.template_argument(1)
-num = period.template_argument(0)
-den = period.template_argument(1)
-return (num, den)
-
-
 class StdChronoDurationPrinter(printer_base):
 "Print a std::chrono::duration"
 
-- 
2.40.1



[PATCH 3/7] libstdc++: Remove unused Python imports

2023-09-28 Thread Tom Tromey
flake8 pointed out some unused imports.

libstdc++-v3/ChangeLog:

* python/libstdcxx/v6/printers.py: Don't import 'os'.
* python/libstdcxx/v6/__init__.py: Don't import 'gdb'.
---
 libstdc++-v3/python/libstdcxx/v6/__init__.py | 2 --
 libstdc++-v3/python/libstdcxx/v6/printers.py | 1 -
 2 files changed, 3 deletions(-)

diff --git a/libstdc++-v3/python/libstdcxx/v6/__init__.py b/libstdc++-v3/python/libstdcxx/v6/__init__.py
index df654acd0c2..8b2cbc60a1b 100644
--- a/libstdc++-v3/python/libstdcxx/v6/__init__.py
+++ b/libstdc++-v3/python/libstdcxx/v6/__init__.py
@@ -13,8 +13,6 @@
 # You should have received a copy of the GNU General Public License
 # along with this program.  If not, see .
 
-import gdb
-
 # Load the xmethods if GDB supports them.
 def gdb_has_xmethods():
 try:
diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index bbc4375541f..8d44244afb0 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -19,7 +19,6 @@ import gdb
 import itertools
 import re
 import sys
-import os
 import errno
 import datetime
 
-- 
2.40.1



[PATCH 0/7] libstdc++: Use gdb.ValuePrinter in pretty-printers

2023-09-28 Thread Tom Tromey
GDB 14 will include a gdb.ValuePrinter tag class that can be used by
pretty-printers to signal they will accept any extensions that GDB
happens to make over time.

This series started as an attempt to change the libstdc++ printers to
support this.  This just involves renaming a bunch of attributes.
There aren't many interesting GDB API additions yet (and I didn't
implement the new ones in libstdc++ yet anyway), but seeing as these
are the flagship pretty-printers, it seemed worthwhile to do.

I added patch 1 when debugging the changes; then proceeded to fix a
bunch of small issues that were pointed out by flake8.

Tested on x86-64 Fedora 36.  Let me know what you think.

Tom




[PATCH 1/7] libstdc++: Show full Python stack on error

2023-09-28 Thread Tom Tromey
This changes the libstdc++ test suite to arrange for gdb to show the
full Python stack if any sort of Python exception occurs.  This makes
debugging the printers a little simpler.

libstdc++-v3/ChangeLog:

* testsuite/lib/gdb-test.exp (gdb-test): Enable Python
stack traces from gdb.
---
 libstdc++-v3/testsuite/lib/gdb-test.exp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libstdc++-v3/testsuite/lib/gdb-test.exp b/libstdc++-v3/testsuite/lib/gdb-test.exp
index d8e572ef7b3..af7d970d388 100644
--- a/libstdc++-v3/testsuite/lib/gdb-test.exp
+++ b/libstdc++-v3/testsuite/lib/gdb-test.exp
@@ -141,6 +141,8 @@ proc gdb-test { marker {selector {}} {load_xmethods 0} } {
 puts $fd "set auto-load no"
 # Now that we've disabled auto-load, it's safe to set the target file
 puts $fd "file ./$output_file"
+# See the full backtrace of any failures.
+puts $fd "set python print-stack full"
 # Load & register *our* copy of the pretty-printers
 puts $fd "source $printer_code"
 puts $fd "python register_libstdcxx_printers(None)"
-- 
2.40.1



Re: [committed] libstdc++: Add GDB printers for types

2023-09-28 Thread Tom Tromey
Jonathan> The changes made by black seem reasonable, though I prefer it
Jonathan> with -S to disable string-normalization. It also needs an
Jonathan> option to use 79 as the maximum line length.

I've got some patches I'm about to send.

I made a pyproject.toml to auto-configure black (and isort), and this
works fine, but it also makes a bunch of edits.  So I'd rather send that
separately, after the current batch of patches is handled.

flake8 still isn't really happy, I guess because there are strings that
cause lines over 79, and black doesn't split those.  But meh, maybe
suppressing some flake8 errors is the way to go.

Tom


Re: [PATCH] Remove poly_int_pod

2023-09-28 Thread Jason Merrill

On 9/28/23 05:55, Richard Sandiford wrote:

poly_int was written before the switch to C++11 and so couldn't
use explicit default constructors.  This led to an awkward split
between poly_int_pod and poly_int.  poly_int simply inherited from
poly_int_pod and added constructors, with the argumentless constructor
having an empty body.  But inheritance meant that poly_int had to
repeat the assignment operators from poly_int_pod (again, no C++11,
so no "using" to inherit base-class implementations).

All that goes away if we switch to using default constructors.

The main complication is ensuring that braced initialisation still
gives a constexpr, so that static variables can be initialised without
runtime code.  The two problems here are:

(1) When initialising a poly_int with fewer than N
 coefficients, the other coefficients need to be a zero of
 the same precision as the explicit coefficients.  This was
 previously done in a for loop using wi::ints_for<...>::zero,
 but C++11 constexpr constructors can't have function bodies.
 The patch instead uses a series of delegated initialisers to
 fill in the implicit coefficients.


Perhaps it's time to update the bootstrap requirement to C++14 (i.e. GCC 
5, from eight years ago).  Not that this would affect this particular patch.


Jason



Re: [PATCH] [11/12/13/14 Regression] ABI break in _Hash_node_value_base since GCC 11 [PR 111050]

2023-09-28 Thread François Dumont



On 28/09/2023 18:18, Jonathan Wakely wrote:

On Wed, 27 Sept 2023 at 05:44, François Dumont  wrote:

Still no chance to get feedback from TC ? Maybe I can commit the below
then ?

I've heard back from Tim now. Please use "Tim Song
" as the author.

You can change the commit again using git commit --amend --author "Tim
Song "


Sure :-)



OK for trunk with that change - thanks for waiting.


Committed to trunk, let me know for backports.


AFAICS on gcc mailing list several gcc releases were done recently, too
late.

There have been no releases this month, so the delay hasn't caused any problems.


I was confused by emails like this one:

https://gcc.gnu.org/pipermail/gcc/2023-September/242429.html

I have only just subscribed to the gcc mailing list, so I had no idea there
were regular snapshots like this.




[PATCH] Include safe-ctype.h after C++ standard headers, to avoid over-poisoning

2023-09-28 Thread Dimitry Andric
Ref: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111632

When building gcc's C++ sources against recent libc++, the poisoning of
the ctype macros due to including safe-ctype.h before including C++
standard headers such as , , etc, causes many compilation
errors, similar to:

 In file included from /home/dim/src/gcc/master/gcc/gensupport.cc:23:
 In file included from /home/dim/src/gcc/master/gcc/system.h:233:
 In file included from /usr/include/c++/v1/vector:321:
 In file included from
 /usr/include/c++/v1/__format/formatter_bool.h:20:
 In file included from
 /usr/include/c++/v1/__format/formatter_integral.h:32:
 In file included from /usr/include/c++/v1/locale:202:
 /usr/include/c++/v1/__locale:546:5: error: '__abi_tag__' attribute
 only applies to structs, variables, functions, and namespaces
   546 | _LIBCPP_INLINE_VISIBILITY
   | ^
 /usr/include/c++/v1/__config:813:37: note: expanded from macro
 '_LIBCPP_INLINE_VISIBILITY'
   813 | #  define _LIBCPP_INLINE_VISIBILITY _LIBCPP_HIDE_FROM_ABI
   | ^
 /usr/include/c++/v1/__config:792:26: note: expanded from macro
 '_LIBCPP_HIDE_FROM_ABI'
   792 |
   __attribute__((__abi_tag__(_LIBCPP_TOSTRING(
 _LIBCPP_VERSIONED_IDENTIFIER
   |  ^
 In file included from /home/dim/src/gcc/master/gcc/gensupport.cc:23:
 In file included from /home/dim/src/gcc/master/gcc/system.h:233:
 In file included from /usr/include/c++/v1/vector:321:
 In file included from
 /usr/include/c++/v1/__format/formatter_bool.h:20:
 In file included from
 /usr/include/c++/v1/__format/formatter_integral.h:32:
 In file included from /usr/include/c++/v1/locale:202:
 /usr/include/c++/v1/__locale:547:37: error: expected ';' at end of
 declaration list
   547 | char_type toupper(char_type __c) const
   | ^
 /usr/include/c++/v1/__locale:553:48: error: too many arguments
 provided to function-like macro invocation
   553 | const char_type* toupper(char_type* __low, const
   char_type* __high) const
   |^
 /home/dim/src/gcc/master/gcc/../include/safe-ctype.h:146:9: note:
 macro 'toupper' defined here
   146 | #define toupper(c) do_not_use_toupper_with_safe_ctype
   | ^

This is because libc++ uses different transitive includes than
libstdc++, and some of those transitive includes pull in various ctype
declarations (typically via ).

There was already a special case for including <string> before
safe-ctype.h, so move the rest of the C++ standard header includes to
the same location, to fix the problem.

Signed-off-by: Dimitry Andric 
---
gcc/system.h | 39 ++-
1 file changed, 18 insertions(+), 21 deletions(-)

diff --git a/gcc/system.h b/gcc/system.h
index e924152ad4c..7a516b11438 100644
--- a/gcc/system.h
+++ b/gcc/system.h
@@ -194,27 +194,8 @@ extern int fprintf_unlocked (FILE *, const char *, ...);
#undef fread_unlocked
#undef fwrite_unlocked

-/* Include <string> before "safe-ctype.h" to avoid GCC poisoning
-   the ctype macros through safe-ctype.h */
-
-#ifdef __cplusplus
-#ifdef INCLUDE_STRING
-# include <string>
-#endif
-#endif
-
-/* There are an extraordinary number of issues with <ctype.h>.
-   The last straw is that it varies with the locale.  Use libiberty's
-   replacement instead.  */
-#include "safe-ctype.h"
-
-#include 
-
-#include 
-
-#if !defined (errno) && defined (HAVE_DECL_ERRNO) && !HAVE_DECL_ERRNO
-extern int errno;
-#endif
+/* Include C++ standard headers before "safe-ctype.h" to avoid GCC
+   poisoning the ctype macros through safe-ctype.h */

#ifdef __cplusplus
#if defined (INCLUDE_ALGORITHM) || !defined (HAVE_SWAP_IN_UTILITY)
@@ -229,6 +210,9 @@ extern int errno;
#ifdef INCLUDE_SET
# include <set>
#endif
+#ifdef INCLUDE_STRING
+# include <string>
+#endif
#ifdef INCLUDE_VECTOR
# include <vector>
#endif
@@ -245,6 +229,19 @@ extern int errno;
# include 
#endif

+/* There are an extraordinary number of issues with <ctype.h>.
+   The last straw is that it varies with the locale.  Use libiberty's
+   replacement instead.  */
+#include "safe-ctype.h"
+
+#include 
+
+#include 
+
+#if !defined (errno) && defined (HAVE_DECL_ERRNO) && !HAVE_DECL_ERRNO
+extern int errno;
+#endif
+
/* Some of glibc's string inlines cause warnings.  Plus we'd rather
   rely on (and therefore test) GCC's string builtins.  */
#define __NO_STRING_INLINES
-- 
2.42.0



Re: [PATCH] [11/12/13/14 Regression] ABI break in _Hash_node_value_base since GCC 11 [PR 111050]

2023-09-28 Thread Jonathan Wakely
On Wed, 27 Sept 2023 at 05:44, François Dumont  wrote:
>
> Still no chance to get feedback from TC ? Maybe I can commit the below
> then ?

I've heard back from Tim now. Please use "Tim Song
" as the author.

You can change the commit again using git commit --amend --author "Tim
Song "

OK for trunk with that change - thanks for waiting.


>
> AFAICS on gcc mailing list several gcc releases were done recently, too
> late.

There have been no releases this month, so the delay hasn't caused any problems.


>
>
> On 14/09/2023 06:46, François Dumont wrote:
> > Author: TC 
> > Date:   Wed Sep 6 19:31:55 2023 +0200
> >
> > libstdc++: Force _Hash_node_value_base methods inline to fix abi
> > (PR111050)
> >
> > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=1b6f0476837205932613ddb2b3429a55c26c409d
> >
> > changed _Hash_node_value_base to no longer derive from
> > _Hash_node_base, which means
> > that its member functions expect _M_storage to be at a different
> > offset. So explosions
> > result if an out-of-line definition is emitted for any of the
> > member functions (say,
> > in a non-optimized build) and the resulting object file is then
> > linked with code built
> > using older version of GCC/libstdc++.
> >
> > libstdc++-v3/ChangeLog:
> >
> > PR libstdc++/111050
> > * include/bits/hashtable_policy.h
> > (_Hash_node_value_base<>::_M_valptr(),
> > _Hash_node_value_base<>::_M_v())
> > Add [[__gnu__::__always_inline__]].
> >
> > Ok to commit ?
> >
> > On 12/09/2023 18:09, Jonathan Wakely wrote:
> >> On Mon, 11 Sept 2023 at 18:19, François Dumont 
> >> wrote:
> >>>
> >>> On 11/09/2023 13:51, Jonathan Wakely wrote:
>  On Sun, 10 Sept 2023 at 14:57, François Dumont via Libstdc++
>   wrote:
> > Following confirmation of the fix by TC here is the patch where I'm
> > simply adding a 'constexpr' on _M_next().
> >
> > Please let me know this ChangeLog entry is correct. I would prefer
> > this
> > patch to be assigned to 'TC' with me as co-author but I don't know
> > how
> > to do such a thing. Unless I need to change my user git identity
> > to do so ?
>  Sam already explained that, but please check with Tim how he wants to
>  be credited, if at all. He doesn't have a copyright assignment, and
>  hasn't added a DCO sign-off to the patch, but it's small enough to not
>  need it as this is the first contribution credited to him.
> 
> 
> >libstdc++: Add constexpr qualification to
> > _Hash_node::_M_next()
>  What has this constexpr addition got to do with the ABI change and the
>  always_inline attributes?
> 
>  It certainly doesn't seem like it should be the summary line of the
>  git commit message.
> >>> Oops, sorry, that's what I had started to do before Tim submitted
> >>> anything.
> >>>
> >>> Here is latest version:
> >> No patch attached, and the ChangeLog below still mentions the constexpr.
> >>
> >> I've pinged Tim via another channel to ask him about the author
> >> attribution.
> >>
> >>
> >>> Author: TC 
> >>> Date:   Wed Sep 6 19:31:55 2023 +0200
> >>>
> >>>   libstdc++: Force inline on _Hash_node_value_base methods to
> >>> fix abi
> >>> (PR111050)
> >>>
> >>> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=1b6f0476837205932613ddb2b3429a55c26c409d
> >>>
> >>>   changed _Hash_node_value_base to no longer derive from
> >>> _Hash_node_base, which means
> >>>   that its member functions expect _M_storage to be at a different
> >>> offset. So explosions
> >>>   result if an out-of-line definition is emitted for any of the
> >>> member functions (say,
> >>>   in a non-optimized build) and the resulting object file is then
> >>> linked with code built
> >>>   using older version of GCC/libstdc++.
> >>>
> >>>   libstdc++-v3/ChangeLog:
> >>>
> >>>   PR libstdc++/111050
> >>>   * include/bits/hashtable_policy.h
> >>>   (_Hash_node_value_base<>::_M_valptr(),
> >>> _Hash_node_value_base<>::_M_v())
> >>>   Add [[__gnu__::__always_inline__]].
> >>>   (_Hash_node<>::_M_next()): Add constexpr.
> >>>
> >>>   Co-authored-by: François Dumont 
> >>>
> >>> Ok for you TC (Tim ?) ?
> >>>
> >>>
>



reverting patch to improve equiv cost calculation

2023-09-28 Thread Vladimir Makarov
I've got a lot of complaints about my recent patch to improve equiv cost 
calculation.  So I am reverting the patch.
commit 8552dcd8e4448c02fe230662093756b75dd94399
Author: Vladimir N. Makarov 
Date:   Thu Sep 28 11:53:51 2023 -0400

Revert "[RA]: Improve cost calculation of pseudos with equivalences"

This reverts commit 3c834d85f2ec42c60995c2b678196a06cb744959.

    Although the patch improves x86-64 specfp2007, it also results in
    performance and code size regressions on different targets and
    new GCC testsuite failures on tests expecting a specific output.

diff --git a/gcc/ira-costs.cc b/gcc/ira-costs.cc
index 8c93ace5094..d9e700e8947 100644
--- a/gcc/ira-costs.cc
+++ b/gcc/ira-costs.cc
@@ -1947,8 +1947,15 @@ find_costs_and_classes (FILE *dump_file)
 	}
 	  if (i >= first_moveable_pseudo && i < last_moveable_pseudo)
 	i_mem_cost = 0;
-	  else
-	i_mem_cost -= equiv_savings;
+	  else if (equiv_savings < 0)
+	i_mem_cost = -equiv_savings;
+	  else if (equiv_savings > 0)
+	{
+	  i_mem_cost = 0;
+	  for (k = cost_classes_ptr->num - 1; k >= 0; k--)
+		i_costs[k] += equiv_savings;
+	}
+
 	  best_cost = (1 << (HOST_BITS_PER_INT - 2)) - 1;
 	  best = ALL_REGS;
 	  alt_class = NO_REGS;


Re: [PATCH 6/8] vect: Add vector_mode paramater to simd_clone_usable

2023-09-28 Thread Andre Vieira (lists)




On 31/08/2023 07:39, Richard Biener wrote:

On Wed, Aug 30, 2023 at 5:02 PM Andre Vieira (lists)
 wrote:




On 30/08/2023 14:01, Richard Biener wrote:

On Wed, Aug 30, 2023 at 11:15 AM Andre Vieira (lists) via Gcc-patches
 wrote:


This patch adds a machine_mode parameter to the TARGET_SIMD_CLONE_USABLE
hook to enable rejecting SVE modes when the target architecture does not
support SVE.


How does the graph node of the SIMD clone lack this information?  That is, it
should have information on the types (and thus modes) for all formal arguments
and return values already, no?  At least the target would know how to
instantiate it if it's not readily available at the point of use.



Yes it does, but that's the modes the simd clone itself uses, it does
not know what vector_mode we are currently vectorizing for. Which is
exactly why we need the vinfo's vector_mode to make sure the simd clone
and its types are compatible with the vector mode.

In practice, this is to make sure that SVE simd clones are only used in
loops being vectorized for SVE modes. Having said that... I just realized
that the simdlen check already takes care of that currently...

By simdlen check I mean the one that writes off simdclones that match:
  if (!constant_multiple_p (vf, n->simdclone->simdlen, _calls)
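
As a rough illustration of what that check decides, here is a minimal
sketch that assumes the vectorization factor and simdlen are plain
compile-time constants (the real constant_multiple_p works on poly_int
values, which this does not model).  The function name and the None
return convention are hypothetical.

```python
def constant_multiple(vf, simdlen):
    """Return how many clone calls cover vf lanes, or None if simdlen
    does not evenly divide vf (in which case the clone is rejected)."""
    if simdlen <= 0:
        raise ValueError("simdlen must be positive")
    quotient, remainder = divmod(vf, simdlen)
    return quotient if remainder == 0 else None


# A VF of 8 can be covered by two calls to a clone with simdlen 4 ...
print(constant_multiple(8, 4))
# ... but a clone with simdlen 8 cannot serve a VF of 4.
print(constant_multiple(4, 8))
```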

However, when using -msve-vector-bits this will become an issue, as the
VF will be constant and we will match NEON simdclones.  This requires
some further attention though given that we now also reject the use of
SVE simdclones when using -msve-vector-bits, and I'm not entirely sure
we should...


Hmm, but vectorizable_simdclone should check for compatible types here
and if they are compatible why should we reject them?  Are -msve-vector-bits
"SVE" modes different from "NEON" modes?  I suppose not, because otherwise
the type compatibility check would say incompatible.

Prior to transformation we do all checks on the original scalar values,
not the vector types. But I do believe you are right in that we don't
need to pass the vector_mode. The simdlen check should be enough, and if
the length is the same or a multiple, the rest of the code should be
able to deal with that and any conversions when dealing with things like
SVE types that require the attribute.


I'll update the patch series soon and after that I'll look at how this 
reacts to -msve-vector-bits in more detail.


Thanks,
Andre


Re: [PATCH v2] RISC-V: Support {U}INT64 to FP16 auto-vectorization

2023-09-28 Thread 钟居哲
LGTM.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-28 22:15
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v2] RISC-V: Support {U}INT64 to FP16 auto-vectorization
From: Pan Li 
 
Update in v2:
 
* Add math trap check.
* Adjust some test cases.
 
Original logs:
 
This patch adds support for auto-vectorization of the INT64 to FP16
conversion. The conversion is done in the two steps below.
 
* INT64 to FP32.
* FP32 to FP16.
 
Given sample code as below:
void
test_func (int64_t * __restrict a, _Float16 *b, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    b[i] = (_Float16) (a[i]);
}
 
Before this patch:
test.c:6:26: missed: couldn't vectorize loop
test.c:6:26: missed: not vectorized: unsupported data-type
ld      a0,0(s0)
call    __floatdihf
fsh     fa0,0(s1)
addi    s0,s0,8
addi    s1,s1,2
bne     s2,s0,.L3
ld      ra,24(sp)
ld      s0,16(sp)
ld      s1,8(sp)
ld      s2,0(sp)
addi    sp,sp,32
 
After this patch:
vsetvli         a5,a2,e8,mf8,ta,ma
vle64.v         v1,0(a0)
vsetvli         a4,zero,e32,mf2,ta,ma
vfncvt.f.x.w    v1,v1
vsetvli         zero,zero,e16,mf4,ta,ma
vfncvt.f.f.w    v1,v1
vsetvli         zero,a2,e16,mf4,ta,ma
vse16.v         v1,0(a1)
 
Please note VLS mode is also involved in this patch and covered by the
test cases.
 
PR target/111506
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (2):
New pattern.
* config/riscv/vector-iterators.md: New iterator.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/cvt-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/cvt-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cvt-0.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/autovec.md   | 24 ++
gcc/config/riscv/vector-iterators.md  | 38 +++
.../gcc.target/riscv/rvv/autovec/unop/cvt-0.c | 21 +
.../gcc.target/riscv/rvv/autovec/unop/cvt-1.c | 22 +
.../gcc.target/riscv/rvv/autovec/vls/cvt-0.c  | 47 +++
5 files changed, 152 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/cvt-0.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index cd0cbdd2889..d6cf376ebca 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -974,6 +974,30 @@ (define_insn_and_split "2"
}
[(set_attr "type" "vfncvtitof")])
+;; This operation can be performed in the loop vectorizer but unfortunately
+;; not applicable for now. We can remove this pattern after loop vectorizer
+;; is able to take care of INT64 to FP16 conversion.
+(define_insn_and_split "2"
+  [(set (match_operand:  0 "register_operand")
+ (any_float:
+   (match_operand:VWWCONVERTI 1 "register_operand")))]
+  "TARGET_VECTOR && TARGET_ZVFH && can_create_pseudo_p () && 
!flag_trapping_math"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+rtx single = gen_reg_rtx (mode); /* Get vector SF mode.  */
+
+/* Step-1, INT64 => FP32.  */
+emit_insn (gen_2 (single, operands[1]));
+/* Step-2, FP32 => FP16.  */
+emit_insn (gen_trunc2 (operands[0], single));
+
+DONE;
+  }
+  [(set_attr "type" "vfncvtitof")]
+)
+
;; =
;; == Unary arithmetic
;; =
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index b6cd872eb42..c9a7344b1bc 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -1247,6 +1247,24 @@ (define_mode_iterator VWCONVERTI [
   (V512DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && 
TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 4096")
])
+(define_mode_iterator VWWCONVERTI [
+  (RVVM8DI "TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (RVVM4DI "TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (RVVM2DI "TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (RVVM1DI "TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+
+  (V1DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (V2DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (V4DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (V8DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH && 
TARGET_MIN_VLEN >= 64")
+  (V16DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH && 
TARGET_MIN_VLEN >= 128")
+  (V32DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH && 
TARGET_MIN_VLEN >= 256")
+  (V64DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH && 
TARGET_MIN_VLEN >= 512")
+  (V128DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH && 
TARGET_MIN_VLEN >= 1024")
+  (V256DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH && 
TARGET_MIN_VLEN >= 2048")
+  (V512DI 

[PATCH v2] RISC-V: Support {U}INT64 to FP16 auto-vectorization

2023-09-28 Thread pan2 . li
From: Pan Li 

Update in v2:

* Add math trap check.
* Adjust some test cases.

Original logs:

This patch adds support for auto-vectorization of the INT64 to FP16
conversion. The conversion is done in the two steps below.

* INT64 to FP32.
* FP32 to FP16.
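
The two steps above can be sketched in scalar form as follows.  This is
only an illustration using Python's struct module to round through IEEE
binary32 and binary16; it is not how the vectorized code is generated.
The function names are hypothetical, the sketch goes through Python's
double first (so it is only faithful for integers that a double holds
exactly), and values outside the FP16 range are not handled.

```python
import struct


def round_to_fp32(value):
    """Round a number to the nearest IEEE binary32 value."""
    return struct.unpack('<f', struct.pack('<f', float(value)))[0]


def round_to_fp16(value):
    """Round a number to the nearest IEEE binary16 value."""
    return struct.unpack('<e', struct.pack('<e', float(value)))[0]


def int64_to_fp16(value):
    single = round_to_fp32(value)   # Step 1: INT64 => FP32
    return round_to_fp16(single)    # Step 2: FP32 => FP16


print(int64_to_fp16(1000))
print(int64_to_fp16(2049))
```

With these inputs, 1000 is representable in FP16 and comes back unchanged,
while 2049 needs more significand bits than FP16 has and is rounded to
2048.0.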

Given sample code as below:
void
test_func (int64_t * __restrict a, _Float16 *b, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    b[i] = (_Float16) (a[i]);
}

Before this patch:
test.c:6:26: missed: couldn't vectorize loop
test.c:6:26: missed: not vectorized: unsupported data-type
ld      a0,0(s0)
call    __floatdihf
fsh     fa0,0(s1)
addi    s0,s0,8
addi    s1,s1,2
bne     s2,s0,.L3
ld      ra,24(sp)
ld      s0,16(sp)
ld      s1,8(sp)
ld      s2,0(sp)
addi    sp,sp,32

After this patch:
vsetvli         a5,a2,e8,mf8,ta,ma
vle64.v         v1,0(a0)
vsetvli         a4,zero,e32,mf2,ta,ma
vfncvt.f.x.w    v1,v1
vsetvli         zero,zero,e16,mf4,ta,ma
vfncvt.f.f.w    v1,v1
vsetvli         zero,a2,e16,mf4,ta,ma
vse16.v         v1,0(a1)

Please note VLS mode is also involved in this patch and covered by the
test cases.

PR target/111506

gcc/ChangeLog:

* config/riscv/autovec.md (2):
New pattern.
* config/riscv/vector-iterators.md: New iterator.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/cvt-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/cvt-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cvt-0.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/autovec.md   | 24 ++
 gcc/config/riscv/vector-iterators.md  | 38 +++
 .../gcc.target/riscv/rvv/autovec/unop/cvt-0.c | 21 +
 .../gcc.target/riscv/rvv/autovec/unop/cvt-1.c | 22 +
 .../gcc.target/riscv/rvv/autovec/vls/cvt-0.c  | 47 +++
 5 files changed, 152 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/cvt-0.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index cd0cbdd2889..d6cf376ebca 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -974,6 +974,30 @@ (define_insn_and_split "2"
 }
 [(set_attr "type" "vfncvtitof")])
 
+;; This operation can be performed in the loop vectorizer but unfortunately
+;; not applicable for now. We can remove this pattern after loop vectorizer
+;; is able to take care of INT64 to FP16 conversion.
+(define_insn_and_split "2"
+  [(set (match_operand:  0 "register_operand")
+   (any_float:
+ (match_operand:VWWCONVERTI 1 "register_operand")))]
+  "TARGET_VECTOR && TARGET_ZVFH && can_create_pseudo_p () && 
!flag_trapping_math"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+rtx single = gen_reg_rtx (mode); /* Get vector SF mode.  */
+
+/* Step-1, INT64 => FP32.  */
+emit_insn (gen_2 (single, operands[1]));
+/* Step-2, FP32 => FP16.  */
+emit_insn (gen_trunc2 (operands[0], single));
+
+DONE;
+  }
+  [(set_attr "type" "vfncvtitof")]
+)
+
 ;; =
 ;; == Unary arithmetic
 ;; =
diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index b6cd872eb42..c9a7344b1bc 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -1247,6 +1247,24 @@ (define_mode_iterator VWCONVERTI [
   (V512DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 4096")
 ])
 
+(define_mode_iterator VWWCONVERTI [
+  (RVVM8DI "TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (RVVM4DI "TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (RVVM2DI "TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (RVVM1DI "TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+
+  (V1DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (V2DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (V4DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (V8DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH && TARGET_MIN_VLEN >= 64")
+  (V16DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH && TARGET_MIN_VLEN >= 128")
+  (V32DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH && TARGET_MIN_VLEN >= 256")
+  (V64DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH && TARGET_MIN_VLEN >= 512")
+  (V128DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH && TARGET_MIN_VLEN >= 1024")
+  (V256DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH && TARGET_MIN_VLEN >= 2048")
+  (V512DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH && TARGET_MIN_VLEN >= 4096")
+])
+
 (define_mode_iterator VQEXTI [
   RVVM8SI RVVM4SI 

Re: [PATCH] Remove poly_int_pod

2023-09-28 Thread Jakub Jelinek
On Thu, Sep 28, 2023 at 10:55:46AM +0100, Richard Sandiford wrote:
> Tested on aarch64-linux-gnu & x86_64-linux-gnu.  Also tested with
> Jakub's vec.h patch with the static_asserts uncommented; there were
> no errors from poly_int-related stuff.  OK to install?

LGTM (mostly as the general idea, but didn't see anything wrong in the
patch either), please give others a day or so to comment though.

Jakub



[RFC] > WIDE_INT_MAX_PREC support in wide_int and widest_int

2023-09-28 Thread Jakub Jelinek
Hi!

On Tue, Aug 29, 2023 at 05:09:52PM +0200, Jakub Jelinek via Gcc-patches wrote:
> On Tue, Aug 29, 2023 at 11:42:48AM +0100, Richard Sandiford wrote:
> > > I'll note tree-ssa-loop-niter.cc also uses GMP in some cases, widest_int
> > > is really trying to be poor-mans GMP by limiting the maximum precision.
> > 
> > I'd characterise widest_int as "a wide_int that is big enough to hold
> > all supported integer types, without losing sign information".  It's
> > not big enough to do arbitrary arithmetic without losing precision
> > (in the way that GMP is).
> > 
> > If the new limit on integer sizes is 65535 bits for all targets,
> > then I think that means that widest_int needs to become a 65536-bit type.
> > (But not with all bits represented all the time, of course.)
> 
> If the widest_int storage would be dependent on the len rather than
> precision for how it is stored, then I think we'd need a new method which
> would be called at the start of filling the limbs where we'd tell how many
> limbs there would be (i.e. what will set_len be called with later on), and
> do nothing for all storages but the new widest_int_storage.

So, I've spent some time on this.  In the patch, wide_int uses a fixed or
variable number of limbs (aka len) for storage depending on precision
(precision > WIDE_INT_MAX_PRECISION means a heap allocated limb array,
otherwise it is inline).  widest_int always has a very large precision
(WIDEST_INT_MAX_PRECISION, currently defined to the INTEGER_CST imposed
limitation of 255 64-bit limbs), but uses an inline array for lengths
corresponding to up to WIDE_INT_MAX_PRECISION bits and, for larger ones,
a heap allocated array of limbs, similarly to wide_int.
These changes make both wide_int and widest_int obviously non-POD: they are
not trivially default constructible, trivially copy constructible, trivially
destructible or trivially copyable, so they are not a good fit for GC and
some vec operations.
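
To make the storage trade-off concrete, here is a minimal sketch of the
wide_int-style scheme (inline limbs up to a threshold, heap allocated limbs
above it).  It is not the code from the patch: limb_storage,
INLINE_MAX_PRECISION and the 128-bit threshold are hypothetical names and
values chosen for illustration only.  The static_asserts show why such a
type stops being trivially copyable and trivially destructible.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

using limb_t = std::int64_t;

// Hypothetical threshold; stands in for WIDE_INT_MAX_PRECISION.
constexpr unsigned INLINE_MAX_PRECISION = 128;
constexpr unsigned LIMB_BITS = 64;
constexpr unsigned INLINE_LIMBS = INLINE_MAX_PRECISION / LIMB_BITS;

// Hypothetical stand-in for precision-dependent limb storage.
class limb_storage
{
public:
  explicit limb_storage (unsigned precision) : m_precision (precision)
  {
    if (on_heap ())
      m_u.heap = new limb_t[num_limbs ()] ();   // zero-initialized
    else
      std::memset (m_u.inl, 0, sizeof m_u.inl);
  }

  // A deep copy is required once limbs may live on the heap.
  limb_storage (const limb_storage &other) : m_precision (other.m_precision)
  {
    if (on_heap ())
      m_u.heap = new limb_t[num_limbs ()];
    std::memcpy (limbs (), other.limbs (), num_limbs () * sizeof (limb_t));
  }

  limb_storage &operator= (const limb_storage &) = delete;

  ~limb_storage ()
  {
    if (on_heap ())
      delete[] m_u.heap;
  }

  bool on_heap () const { return m_precision > INLINE_MAX_PRECISION; }
  unsigned num_limbs () const
  { return (m_precision + LIMB_BITS - 1) / LIMB_BITS; }
  limb_t *limbs () { return on_heap () ? m_u.heap : m_u.inl; }
  const limb_t *limbs () const { return on_heap () ? m_u.heap : m_u.inl; }

private:
  unsigned m_precision;
  union { limb_t inl[INLINE_LIMBS]; limb_t *heap; } m_u;
};

// The user-provided copy constructor and destructor rule out memcpy-style
// handling, which is what the vec.h and GC assertions check for.
static_assert (!std::is_trivially_copyable<limb_storage>::value, "");
static_assert (!std::is_trivially_destructible<limb_storage>::value, "");
```

With these assumed constants, a precision of 16320 bits needs 255 limbs and
lands on the heap, while a 64-bit value stays inline.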
One common use of wide_int in GC structures was in dwarf2out.{h,cc}; but as
large _BitInt constants don't appear in RTL, we really don't need such large
precisions there.
So, for wide_int the patch introduces rwide_int, restricted wide_int, which
acts like the old wide_int (except that it is now trivially default
constructible and has assertions precision isn't set above
WIDE_INT_MAX_PRECISION).
For widest_int, the nastiness is that because it always has huge precision
of 16320 right now,
a) we need to be told upfront in wide-int.h before calling the large
   value internal functions in wide-int.cc how many elements we'll need for
   the result (some reasonable upper estimate is fine)
b) various of the wide-int.cc functions were lazy and assumed precision is
   small enough and often used up to that many elements, which is
   undesirable; so, it now tries to decrease that and use xi.len etc. based
   estimates instead if possible (sometimes only if precision is above
   WIDE_INT_MAX_PRECISION)
c) with the higher precision, behavior changes for lrshift (-1, 2) etc. or
   unsigned division with dividend having most significant bit set in
   widest_int - while such values were considered to be above or equal to
   1 << (WIDE_INT_MAX_PRECISION - 2), now they are with
   WIDEST_INT_MAX_PRECISION and so much larger; but lrshift on widest_int
   is I think only done in ccp and I'd strongly hope that we treat the
   values as unsigned and so usually much smaller length; so it is just
   when we call wi::lrshift (-1, 2) or similar that results change.
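
As a small worked illustration of point c): a logical right shift views -1
as all-ones at the given precision, so the result depends on that
precision.  The helper below is hypothetical, makes no use of the wi::
routines, and only handles precisions of 1 to 64 bits; it merely shows how
the same call yields a larger value as the precision grows.

```cpp
#include <cassert>
#include <cstdint>

// Logical right shift of VALUE, viewed as an unsigned number that is
// PRECISION bits wide (1 <= PRECISION <= 64).  Hypothetical helper.
std::uint64_t
lrshift_at_precision (std::int64_t value, unsigned shift, unsigned precision)
{
  // Keep only the low PRECISION bits, i.e. zero-extend from PRECISION.
  std::uint64_t mask = precision == 64
		       ? ~std::uint64_t (0)
		       : (std::uint64_t (1) << precision) - 1;
  std::uint64_t zext = std::uint64_t (value) & mask;

  // Shifting out every bit gives zero; this also avoids an undefined
  // shift by 64.
  return shift >= precision ? 0 : zext >> shift;
}
```

For -1 shifted right by 2 the result is 2^(P-2) - 1 for precision P, so
moving from one maximum precision to a much larger one changes the value by
many orders of magnitude.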
I've noticed that for wide_int or widest_int references even simple
operations like eq_p liked to allocate and immediately free huge buffers,
which was caused by wide_int doing allocation on creation with a particular
precision and e.g. get_binary_precision running into that.  So, I've
duplicated that to avoid the allocations when all we need is just a
precision.

The patch below doesn't actually build anymore since the vec.h asserts
(which point to useful stuff though), so temporarily I've applied it also
with
--- gcc/vec.h.xx   2023-09-28 12:56:09.055786055 +0200
+++ gcc/vec.h   2023-09-28 13:15:31.760487111 +0200
@@ -1197,7 +1197,7 @@ template
 inline void
 vec::qsort (int (*cmp) (const void *, const void *))
 {
-  static_assert (vec_detail::is_trivially_copyable_or_pair <T>::value, "");
+//  static_assert (vec_detail::is_trivially_copyable_or_pair <T>::value, "");
   if (length () > 1)
 gcc_qsort (address (), length (), sizeof (T), cmp);
 }
@@ -1422,7 +1422,7 @@ template
 void
 gt_ggc_mx (vec *v)
 {
-  static_assert (std::is_trivially_destructible <T>::value, "");
+//  static_assert (std::is_trivially_destructible <T>::value, "");
   extern void gt_ggc_mx (T &);
   for (unsigned i = 0; i < v->length (); i++)
 gt_ggc_mx ((*v)[i]);
hack.  The two spots that trigger are tree-ssa-loop-niter.cc doing qsort on
widest_int vector (to be exact, swapping elements in the vector of
widest_int or wide_int by memcpy actually would work, the reason it has
non-trivial destructor and copy 

Re: [PATCH v2] aarch64: Fine-grained ldp and stp policies with test-cases.

2023-09-28 Thread Manos Anagnostakis
No problem!

I'll send a follow up with the requested changes.

Thanks for the input!

Manos.

On Thu, Sep 28, 2023 at 4:42 PM Richard Sandiford 
wrote:

> Manos Anagnostakis  writes:
> > Hey Richard,
> >
> > Thanks for taking the time to review this, but it has been committed since
> > yesterday after getting reviewed by Kyrill and Tamar.
> >
> > Discussions:
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631285.html
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631300.html
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631389.html
> >
> > Committed version:
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631484.html
>
> Sorry about that.  I had v3 being filtered differently and so it went
> into a different inbox.
>
> Richard
>
> >
> > Manos.
> >
> > On Thu, Sep 28, 2023 at 4:17 PM Richard Sandiford <
> richard.sandif...@arm.com>
> > wrote:
> >
> >> Thanks for the patch and sorry for the slow review.
> >>
> >> Manos Anagnostakis  writes:
> >> > This patch implements the following TODO in
> gcc/config/aarch64/aarch64.cc
> >> > to provide the requested behaviour for handling ldp and stp:
> >> >
> >> >   /* Allow the tuning structure to disable LDP instruction formation
> >> >  from combining instructions (e.g., in peephole2).
> >> >  TODO: Implement fine-grained tuning control for LDP and STP:
> >> >1. control policies for load and store separately;
> >> >2. support the following policies:
> >> >   - default (use what is in the tuning structure)
> >> >   - always
> >> >   - never
> >> >   - aligned (only if the compiler can prove that the
> >> > load will be aligned to 2 * element_size)  */
> >> >
> >> > It provides two new and concrete command-line options -mldp-policy and
> >> -mstp-policy
> >> > to give the ability to control load and store policies separately as
> >> > stated in part 1 of the TODO.
> >> >
> >> > The accepted values for both options are:
> >> > - default: Use the ldp/stp policy defined in the corresponding tuning
> >> >   structure.
> >> > - always: Emit ldp/stp regardless of alignment.
> >> > - never: Do not emit ldp/stp.
> >> > - aligned: In order to emit ldp/stp, first check if the load/store
> will
> >> >   be aligned to 2 * element_size.
> >> >
> >> > gcc/ChangeLog:
> >> > * config/aarch64/aarch64-protos.h (struct tune_params): Add
> >> >   appropriate enums for the policies.
> >> > * config/aarch64/aarch64-tuning-flags.def
> >> >   (AARCH64_EXTRA_TUNING_OPTION): Remove superseded tuning
> >> >   options.
> >> > * config/aarch64/aarch64.cc (aarch64_parse_ldp_policy): New
> >> >   function to parse ldp-policy option.
> >> > (aarch64_parse_stp_policy): New function to parse stp-policy
> >> option.
> >> > (aarch64_override_options_internal): Call parsing functions.
> >> > (aarch64_operands_ok_for_ldpstp): Add option-value check and
> >> >   alignment check and remove superseded ones
> >> > (aarch64_operands_adjust_ok_for_ldpstp): Add option-value
> check
> >> and
> >> >   alignment check and remove superseded ones.
> >> > * config/aarch64/aarch64.opt: Add options.
> >> >
> >> > gcc/testsuite/ChangeLog:
> >> > * gcc.target/aarch64/ldp_aligned.c: New test.
> >> > * gcc.target/aarch64/ldp_always.c: New test.
> >> > * gcc.target/aarch64/ldp_never.c: New test.
> >> > * gcc.target/aarch64/stp_aligned.c: New test.
> >> > * gcc.target/aarch64/stp_always.c: New test.
> >> > * gcc.target/aarch64/stp_never.c: New test.
> >> >
> >> > Signed-off-by: Manos Anagnostakis 
> >> > ---
> >> > Changes in v2:
> >> > - Fixed committed ldp tests to correctly trigger
> >> >   and test aarch64_operands_adjust_ok_for_ldpstp in
> aarch64.cc.
> >> > - Added "-mcpu=generic" to committed tests to guarantee generic
> >> target code
> >> >   generation and not cause the regressions of v1.
> >> >
> >> >  gcc/config/aarch64/aarch64-protos.h   |  24 ++
> >> >  gcc/config/aarch64/aarch64-tuning-flags.def   |   8 -
> >> >  gcc/config/aarch64/aarch64.cc | 229
> ++
> >> >  gcc/config/aarch64/aarch64.opt|   8 +
> >> >  .../gcc.target/aarch64/ldp_aligned.c  |  66 +
> >> >  gcc/testsuite/gcc.target/aarch64/ldp_always.c |  66 +
> >> >  gcc/testsuite/gcc.target/aarch64/ldp_never.c  |  66 +
> >> >  .../gcc.target/aarch64/stp_aligned.c  |  60 +
> >> >  gcc/testsuite/gcc.target/aarch64/stp_always.c |  60 +
> >> >  gcc/testsuite/gcc.target/aarch64/stp_never.c  |  60 +
> >> >  10 files changed, 586 insertions(+), 61 deletions(-)
> >> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_aligned.c
> >> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_always.c
> >> >  create mode 100644 

[pushed][RA]: Add flag for checking IRA in progress

2023-09-28 Thread Vladimir Makarov
I've pushed the following patch. The explanation is in the commit message.
The patch was successfully bootstrapped on x86-64.
commit 0c8ecbcd3cf7d7187d2017ad02b663a57123b417
Author: Vladimir N. Makarov 
Date:   Thu Sep 28 09:41:18 2023 -0400

[RA]: Add flag for checking IRA in progress

RISCV target developers need a flag to prevent creating
insns in IRA which can not be split after RA as they will need a
temporary reg.  The patch introduces such flag.

gcc/ChangeLog:

* rtl.h (lra_in_progress): Change type to bool.
(ira_in_progress): Add new extern.
* ira.cc (ira_in_progress): New global.
(pass_ira::execute): Set up ira_in_progress.
* lra.cc: (lra_in_progress): Change type to bool and initialize.
(lra): Use bool values for lra_in_progress.
* lra-eliminations.cc (init_elim_table): Ditto.

diff --git a/gcc/ira.cc b/gcc/ira.cc
index 0b0d460689d..d7530f01380 100644
--- a/gcc/ira.cc
+++ b/gcc/ira.cc
@@ -5542,6 +5542,9 @@ bool ira_conflicts_p;
 /* Saved between IRA and reload.  */
 static int saved_flag_ira_share_spill_slots;
 
+/* Set to true while in IRA.  */
+bool ira_in_progress = false;
+
 /* This is the main entry of IRA.  */
 static void
 ira (FILE *f)
@@ -6110,7 +6113,9 @@ public:
 }
   unsigned int execute (function *) final override
 {
+  ira_in_progress = true;
   ira (dump_file);
+  ira_in_progress = false;
   return 0;
 }
 
diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index 4daaff1a124..9ff4774cf5d 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -1294,14 +1294,14 @@ init_elim_table (void)
  will cause, e.g., gen_rtx_REG (Pmode, STACK_POINTER_REGNUM) to
  equal stack_pointer_rtx.  We depend on this. Threfore we switch
  off that we are in LRA temporarily.  */
-  lra_in_progress = 0;
+  lra_in_progress = false;
   for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
 {
   ep->from_rtx = gen_rtx_REG (Pmode, ep->from);
   ep->to_rtx = gen_rtx_REG (Pmode, ep->to);
   eliminable_reg_rtx[ep->from] = ep->from_rtx;
 }
-  lra_in_progress = 1;
+  lra_in_progress = true;
 }
 
 /* Function for initialization of elimination once per function.  It
diff --git a/gcc/lra.cc b/gcc/lra.cc
index 361f84fdacb..bcc00ff7d6b 100644
--- a/gcc/lra.cc
+++ b/gcc/lra.cc
@@ -2262,8 +2262,8 @@ update_inc_notes (void)
   }
 }
 
-/* Set to 1 while in lra.  */
-int lra_in_progress;
+/* Set to true while in LRA.  */
+bool lra_in_progress = false;
 
 /* Start of pseudo regnos before the LRA.  */
 int lra_new_regno_start;
@@ -2360,7 +2360,7 @@ lra (FILE *f)
   if (flag_checking)
 check_rtl (false);
 
-  lra_in_progress = 1;
+  lra_in_progress = true;
 
   lra_live_range_iter = lra_coalesce_iter = lra_constraint_iter = 0;
   lra_assignment_iter = lra_assignment_iter_after_spill = 0;
@@ -2552,7 +2552,7 @@ lra (FILE *f)
   ira_restore_scratches (lra_dump_file);
   lra_eliminate (true, false);
   lra_final_code_change ();
-  lra_in_progress = 0;
+  lra_in_progress = false;
   if (live_p)
 lra_clear_live_ranges ();
   lra_live_ranges_finish ();
diff --git a/gcc/rtl.h b/gcc/rtl.h
index 102ad9b57a6..8e59cd5d156 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -4108,8 +4108,11 @@ extern int epilogue_completed;
 
 extern int reload_in_progress;
 
-/* Set to 1 while in lra.  */
-extern int lra_in_progress;
+/* Set to true while in IRA.  */
+extern bool ira_in_progress;
+
+/* Set to true while in LRA.  */
+extern bool lra_in_progress;
 
 /* This macro indicates whether you may create a new
pseudo-register.  */
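
For readers wondering how such a flag is meant to be used, here is a
minimal, self-contained sketch.  It does not use GCC's real globals: the
*_sketch names and the predicate are hypothetical, and the way a target
would consult the flag is an assumption based only on the commit message
(refuse to create insns during IRA that could not be split after RA).

```cpp
#include <cassert>

// Hypothetical stand-ins for the ira_in_progress / lra_in_progress globals.
static bool ira_in_progress_sketch = false;
static bool lra_in_progress_sketch = false;

// Hypothetical target predicate: a pattern that needs a temporary register
// when it is split must not be created once register allocation has begun.
static bool
may_create_insn_needing_temp_p ()
{
  return !ira_in_progress_sketch && !lra_in_progress_sketch;
}

// Records what the predicate answered while the "pass" was running.
static bool observed_inside_ira = true;

// Mirrors the shape of pass_ira::execute in the patch: set the flag, run
// the allocator, clear the flag again.
static void
run_ira_sketch ()
{
  ira_in_progress_sketch = true;
  observed_inside_ira = may_create_insn_needing_temp_p ();
  ira_in_progress_sketch = false;
}
```

The point of the design is that the flag is only true for the duration of
the pass, so code running before or after IRA sees the usual behaviour.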


Re: [PATCH v2] aarch64: Fine-grained ldp and stp policies with test-cases.

2023-09-28 Thread Richard Sandiford
Manos Anagnostakis  writes:
> Hey Richard,
>
> Thanks for taking the time to review this, but it has been committed since
> yesterday after getting reviewed by Kyrill and Tamar.
>
> Discussions:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631285.html
> https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631300.html
> https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631389.html
>
> Committed version:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631484.html

Sorry about that.  I had v3 being filtered differently and so it went
into a different inbox.

Richard

>
> Manos.
>
> On Thu, Sep 28, 2023 at 4:17 PM Richard Sandiford 
> wrote:
>
>> Thanks for the patch and sorry for the slow review.
>>
>> Manos Anagnostakis  writes:
>> > This patch implements the following TODO in gcc/config/aarch64/aarch64.cc
>> > to provide the requested behaviour for handling ldp and stp:
>> >
>> >   /* Allow the tuning structure to disable LDP instruction formation
>> >  from combining instructions (e.g., in peephole2).
>> >  TODO: Implement fine-grained tuning control for LDP and STP:
>> >1. control policies for load and store separately;
>> >2. support the following policies:
>> >   - default (use what is in the tuning structure)
>> >   - always
>> >   - never
>> >   - aligned (only if the compiler can prove that the
>> > load will be aligned to 2 * element_size)  */
>> >
>> > It provides two new and concrete command-line options -mldp-policy and
>> -mstp-policy
>> > to give the ability to control load and store policies separately as
>> > stated in part 1 of the TODO.
>> >
>> > The accepted values for both options are:
>> > - default: Use the ldp/stp policy defined in the corresponding tuning
>> >   structure.
>> > - always: Emit ldp/stp regardless of alignment.
>> > - never: Do not emit ldp/stp.
>> > - aligned: In order to emit ldp/stp, first check if the load/store will
>> >   be aligned to 2 * element_size.
>> >
>> > gcc/ChangeLog:
>> > * config/aarch64/aarch64-protos.h (struct tune_params): Add
>> >   appropriate enums for the policies.
>> > * config/aarch64/aarch64-tuning-flags.def
>> >   (AARCH64_EXTRA_TUNING_OPTION): Remove superseded tuning
>> >   options.
>> > * config/aarch64/aarch64.cc (aarch64_parse_ldp_policy): New
>> >   function to parse ldp-policy option.
>> > (aarch64_parse_stp_policy): New function to parse stp-policy
>> option.
>> > (aarch64_override_options_internal): Call parsing functions.
>> > (aarch64_operands_ok_for_ldpstp): Add option-value check and
>> >   alignment check and remove superseded ones
>> > (aarch64_operands_adjust_ok_for_ldpstp): Add option-value check
>> and
>> >   alignment check and remove superseded ones.
>> > * config/aarch64/aarch64.opt: Add options.
>> >
>> > gcc/testsuite/ChangeLog:
>> > * gcc.target/aarch64/ldp_aligned.c: New test.
>> > * gcc.target/aarch64/ldp_always.c: New test.
>> > * gcc.target/aarch64/ldp_never.c: New test.
>> > * gcc.target/aarch64/stp_aligned.c: New test.
>> > * gcc.target/aarch64/stp_always.c: New test.
>> > * gcc.target/aarch64/stp_never.c: New test.
>> >
>> > Signed-off-by: Manos Anagnostakis 
>> > ---
>> > Changes in v2:
>> > - Fixed committed ldp tests to correctly trigger
>> >   and test aarch64_operands_adjust_ok_for_ldpstp in aarch64.cc.
>> > - Added "-mcpu=generic" to committed tests to guarantee generic
>> target code
>> >   generation and not cause the regressions of v1.
>> >
>> >  gcc/config/aarch64/aarch64-protos.h   |  24 ++
>> >  gcc/config/aarch64/aarch64-tuning-flags.def   |   8 -
>> >  gcc/config/aarch64/aarch64.cc | 229 ++
>> >  gcc/config/aarch64/aarch64.opt|   8 +
>> >  .../gcc.target/aarch64/ldp_aligned.c  |  66 +
>> >  gcc/testsuite/gcc.target/aarch64/ldp_always.c |  66 +
>> >  gcc/testsuite/gcc.target/aarch64/ldp_never.c  |  66 +
>> >  .../gcc.target/aarch64/stp_aligned.c  |  60 +
>> >  gcc/testsuite/gcc.target/aarch64/stp_always.c |  60 +
>> >  gcc/testsuite/gcc.target/aarch64/stp_never.c  |  60 +
>> >  10 files changed, 586 insertions(+), 61 deletions(-)
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_aligned.c
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_always.c
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_never.c
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_aligned.c
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_always.c
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_never.c
>> >
>> > diff --git a/gcc/config/aarch64/aarch64-protos.h
>> b/gcc/config/aarch64/aarch64-protos.h
>> > index 70303d6fd95..be1d73490ed 100644
>> > --- 

Re: [PATCH v2] aarch64: Fine-grained ldp and stp policies with test-cases.

2023-09-28 Thread Manos Anagnostakis
Sure, I will attend to this.

Manos.

On Thu, Sep 28, 2023 at 4:37 PM Philipp Tomsich 
wrote:

> Manos,
>
> Please submit a follow-on patch implementing the requested
> improvements of the code structure (as this reduces the maintenance
> burden).
>
> Thanks,
> Philipp.
>
>
> On Thu, 28 Sept 2023 at 15:33, Manos Anagnostakis
>  wrote:
> >
> > Hey Richard,
> >
> > Thanks for taking the time to review this, but it has been committed
> since yesterday after getting reviewed by Kyrill and Tamar.
> >
> > Discussions:
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631285.html
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631300.html
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631389.html
> >
> > Committed version:
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631484.html
> >
> > Manos.
> >
> > On Thu, Sep 28, 2023 at 4:17 PM Richard Sandiford <
> richard.sandif...@arm.com> wrote:
> >>
> >> Thanks for the patch and sorry for the slow review.
> >>
> >> Manos Anagnostakis  writes:
> >> > This patch implements the following TODO in
> gcc/config/aarch64/aarch64.cc
> >> > to provide the requested behaviour for handling ldp and stp:
> >> >
> >> >   /* Allow the tuning structure to disable LDP instruction formation
> >> >  from combining instructions (e.g., in peephole2).
> >> >  TODO: Implement fine-grained tuning control for LDP and STP:
> >> >1. control policies for load and store separately;
> >> >2. support the following policies:
> >> >   - default (use what is in the tuning structure)
> >> >   - always
> >> >   - never
> >> >   - aligned (only if the compiler can prove that the
> >> > load will be aligned to 2 * element_size)  */
> >> >
> >> > It provides two new and concrete command-line options -mldp-policy
> and -mstp-policy
> >> > to give the ability to control load and store policies separately as
> >> > stated in part 1 of the TODO.
> >> >
> >> > The accepted values for both options are:
> >> > - default: Use the ldp/stp policy defined in the corresponding tuning
> >> >   structure.
> >> > - always: Emit ldp/stp regardless of alignment.
> >> > - never: Do not emit ldp/stp.
> >> > - aligned: In order to emit ldp/stp, first check if the load/store
> will
> >> >   be aligned to 2 * element_size.
> >> >
> >> > gcc/ChangeLog:
> >> > * config/aarch64/aarch64-protos.h (struct tune_params): Add
> >> >   appropriate enums for the policies.
> >> > * config/aarch64/aarch64-tuning-flags.def
> >> >   (AARCH64_EXTRA_TUNING_OPTION): Remove superseded tuning
> >> >   options.
> >> > * config/aarch64/aarch64.cc (aarch64_parse_ldp_policy): New
> >> >   function to parse ldp-policy option.
> >> > (aarch64_parse_stp_policy): New function to parse stp-policy
> option.
> >> > (aarch64_override_options_internal): Call parsing functions.
> >> > (aarch64_operands_ok_for_ldpstp): Add option-value check and
> >> >   alignment check and remove superseded ones
> >> > (aarch64_operands_adjust_ok_for_ldpstp): Add option-value
> check and
> >> >   alignment check and remove superseded ones.
> >> > * config/aarch64/aarch64.opt: Add options.
> >> >
> >> > gcc/testsuite/ChangeLog:
> >> > * gcc.target/aarch64/ldp_aligned.c: New test.
> >> > * gcc.target/aarch64/ldp_always.c: New test.
> >> > * gcc.target/aarch64/ldp_never.c: New test.
> >> > * gcc.target/aarch64/stp_aligned.c: New test.
> >> > * gcc.target/aarch64/stp_always.c: New test.
> >> > * gcc.target/aarch64/stp_never.c: New test.
> >> >
> >> > Signed-off-by: Manos Anagnostakis 
> >> > ---
> >> > Changes in v2:
> >> > - Fixed committed ldp tests to correctly trigger
> >> >   and test aarch64_operands_adjust_ok_for_ldpstp in
> aarch64.cc.
> >> > - Added "-mcpu=generic" to committed tests to guarantee
> generic target code
> >> >   generation and not cause the regressions of v1.
> >> >
> >> >  gcc/config/aarch64/aarch64-protos.h   |  24 ++
> >> >  gcc/config/aarch64/aarch64-tuning-flags.def   |   8 -
> >> >  gcc/config/aarch64/aarch64.cc | 229
> ++
> >> >  gcc/config/aarch64/aarch64.opt|   8 +
> >> >  .../gcc.target/aarch64/ldp_aligned.c  |  66 +
> >> >  gcc/testsuite/gcc.target/aarch64/ldp_always.c |  66 +
> >> >  gcc/testsuite/gcc.target/aarch64/ldp_never.c  |  66 +
> >> >  .../gcc.target/aarch64/stp_aligned.c  |  60 +
> >> >  gcc/testsuite/gcc.target/aarch64/stp_always.c |  60 +
> >> >  gcc/testsuite/gcc.target/aarch64/stp_never.c  |  60 +
> >> >  10 files changed, 586 insertions(+), 61 deletions(-)
> >> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_aligned.c
> >> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_always.c
> >> >  create mode 100644 

Re: [PATCH, rs6000] Enable vector compare for 16-byte memory equality compare [PR111449]

2023-09-28 Thread Richard Sandiford
HAO CHEN GUI  writes:
> Kewen and Richard,
>   Thanks for your comments. Please let me clarify it.
>
> On 2023/9/27 19:10, Richard Sandiford wrote:
>> Yeah, I agree there doesn't seem to be a good reason to exclude vectors.
>> Sorry to dive straight into details, but maybe we should have something
>> called bitwise_mode_for_size that tries to use integer modes where possible,
>> but falls back to vector modes otherwise.  That mode could then be used
>> for copying, storing, bitwise ops, and equality comparisons (if there
>> is appropriate optabs support).
>
>   Vector modes are not supported for compare_by_pieces and move_by_pieces,
> but they are supported for set_by_pieces and clear_by_pieces. The helper
> function widest_fixed_size_mode_for_size returns a vector mode when
> qi_vector is set to true.
>
> static fixed_size_mode
> widest_fixed_size_mode_for_size (unsigned int size, bool qi_vector)

Ah, had forgotten about that function.

>
> I tried to enable qi_vector for compare_by_pieces. It can pick up a vector
> mode (e.g. V16QImode) and works in some cases. But it fails on a constant
> string case.
>
> int compare (const char* s1)
> {
>   return __builtin_memcmp_eq (s1, "__GCC_HAVE_DWARF2_CFI_ASM", 16);
> }
>
> As the second op is a constant string, it calls builtin_memcpy_read_str to
> build the string. Unfortunately, the inner function doesn't support
> vector mode.
>
>   /* The by-pieces infrastructure does not try to pick a vector mode
>  for memcpy expansion.  */
>   return c_readstr (rep + offset, as_a <scalar_int_mode> (mode),
> /*nul_terminated=*/false);
>
> It seems the by-pieces infrastructure itself supports vector modes, but the
> low-level functions do not.

That looks easily solvable though.  I've posted a potential fix as:

   https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631595.html

Is that the only blocker to doing this in generic code?

Thanks,
Richard

>
> I think there are two ways to enable vector modes for compare_by_pieces.
> One is to modify the by-pieces infrastructure. The other is to enable it
> via the cmpmem expander, which is target specific and more flexible.
>
> What's your opinion?
>
> Thanks
> Gui Haochen


Re: [PATCH v2] aarch64: Fine-grained ldp and stp policies with test-cases.

2023-09-28 Thread Philipp Tomsich
Manos,

Please submit a follow-on patch implementing the requested
improvements of the code structure (as this reduces the maintenance
burden).

Thanks,
Philipp.


On Thu, 28 Sept 2023 at 15:33, Manos Anagnostakis
 wrote:
>
> Hey Richard,
>
> Thanks for taking the time to review this, but it has been committed since 
> yesterday after getting reviewed by Kyrill and Tamar.
>
> Discussions:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631285.html
> https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631300.html
> https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631389.html
>
> Committed version:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631484.html
>
> Manos.
>
> On Thu, Sep 28, 2023 at 4:17 PM Richard Sandiford  
> wrote:
>>
>> Thanks for the patch and sorry for the slow review.
>>
>> Manos Anagnostakis  writes:
>> > This patch implements the following TODO in gcc/config/aarch64/aarch64.cc
>> > to provide the requested behaviour for handling ldp and stp:
>> >
>> >   /* Allow the tuning structure to disable LDP instruction formation
>> >  from combining instructions (e.g., in peephole2).
>> >  TODO: Implement fine-grained tuning control for LDP and STP:
>> >1. control policies for load and store separately;
>> >2. support the following policies:
>> >   - default (use what is in the tuning structure)
>> >   - always
>> >   - never
>> >   - aligned (only if the compiler can prove that the
>> > load will be aligned to 2 * element_size)  */
>> >
>> > It provides two new and concrete command-line options -mldp-policy and 
>> > -mstp-policy
>> > to give the ability to control load and store policies separately as
>> > stated in part 1 of the TODO.
>> >
>> > The accepted values for both options are:
>> > - default: Use the ldp/stp policy defined in the corresponding tuning
>> >   structure.
>> > - always: Emit ldp/stp regardless of alignment.
>> > - never: Do not emit ldp/stp.
>> > - aligned: In order to emit ldp/stp, first check if the load/store will
>> >   be aligned to 2 * element_size.
>> >
>> > gcc/ChangeLog:
>> > * config/aarch64/aarch64-protos.h (struct tune_params): Add
>> >   appropriate enums for the policies.
>> > * config/aarch64/aarch64-tuning-flags.def
>> >   (AARCH64_EXTRA_TUNING_OPTION): Remove superseded tuning
>> >   options.
>> > * config/aarch64/aarch64.cc (aarch64_parse_ldp_policy): New
>> >   function to parse ldp-policy option.
>> > (aarch64_parse_stp_policy): New function to parse stp-policy 
>> > option.
>> > (aarch64_override_options_internal): Call parsing functions.
>> > (aarch64_operands_ok_for_ldpstp): Add option-value check and
>> >   alignment check and remove superseded ones
>> > (aarch64_operands_adjust_ok_for_ldpstp): Add option-value check and
>> >   alignment check and remove superseded ones.
>> > * config/aarch64/aarch64.opt: Add options.
>> >
>> > gcc/testsuite/ChangeLog:
>> > * gcc.target/aarch64/ldp_aligned.c: New test.
>> > * gcc.target/aarch64/ldp_always.c: New test.
>> > * gcc.target/aarch64/ldp_never.c: New test.
>> > * gcc.target/aarch64/stp_aligned.c: New test.
>> > * gcc.target/aarch64/stp_always.c: New test.
>> > * gcc.target/aarch64/stp_never.c: New test.
>> >
>> > Signed-off-by: Manos Anagnostakis 
>> > ---
>> > Changes in v2:
>> > - Fixed committed ldp tests to correctly trigger
>> >   and test aarch64_operands_adjust_ok_for_ldpstp in aarch64.cc.
>> > - Added "-mcpu=generic" to committed tests to guarantee generic 
>> > target code
>> >   generation and not cause the regressions of v1.
>> >
>> >  gcc/config/aarch64/aarch64-protos.h   |  24 ++
>> >  gcc/config/aarch64/aarch64-tuning-flags.def   |   8 -
>> >  gcc/config/aarch64/aarch64.cc | 229 ++
>> >  gcc/config/aarch64/aarch64.opt|   8 +
>> >  .../gcc.target/aarch64/ldp_aligned.c  |  66 +
>> >  gcc/testsuite/gcc.target/aarch64/ldp_always.c |  66 +
>> >  gcc/testsuite/gcc.target/aarch64/ldp_never.c  |  66 +
>> >  .../gcc.target/aarch64/stp_aligned.c  |  60 +
>> >  gcc/testsuite/gcc.target/aarch64/stp_always.c |  60 +
>> >  gcc/testsuite/gcc.target/aarch64/stp_never.c  |  60 +
>> >  10 files changed, 586 insertions(+), 61 deletions(-)
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_aligned.c
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_always.c
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_never.c
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_aligned.c
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_always.c
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_never.c
>> >
>> > diff --git a/gcc/config/aarch64/aarch64-protos.h 
>> > 

[PATCH] Simplify & expand c_readstr

2023-09-28 Thread Richard Sandiford
c_readstr only operated on integer modes.  It worked by reading
the source string into an array of HOST_WIDE_INTs, converting
that array into a wide_int, and from there to an rtx.

It's simpler to do this by building a target memory image and
using native_decode_rtx to convert that memory image into an rtx.
It avoids all the endianness shenanigans because both the string and
native_decode_rtx follow target memory order.  It also means that the
function can handle all fixed-size modes, which simplifies callers
and allows vector modes to be used more widely.
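
To illustrate the idea outside of GCC, here is a minimal sketch of the
"build a memory image" step.  It uses std::vector and a hypothetical
function name (read_string_image) instead of auto_vec, target_unit and
native_decode_rtx, and it stops at the byte image rather than producing an
rtx.  The loop mirrors the one in the patch below: bytes are pushed in
memory order, and once a NUL has been seen in null-terminated mode the
rest of the image is zero filled without reading further from the string.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build a SIZE-byte image of STR in memory order.  Hypothetical helper;
// the caller guarantees STR is readable up to the first NUL (if
// NULL_TERMINATED_P) or for SIZE bytes (otherwise).
std::vector<unsigned char>
read_string_image (const char *str, std::size_t size,
		   bool null_terminated_p = true)
{
  std::vector<unsigned char> bytes;
  bytes.reserve (size);

  // Start non-zero so that the first byte is always read.
  unsigned char ch = 1;
  for (std::size_t i = 0; i < size; ++i)
    {
      // After a NUL in null-terminated mode, CH stays zero and STR is no
      // longer dereferenced.
      if (ch || !null_terminated_p)
	ch = (unsigned char) str[i];
      bytes.push_back (ch);
    }
  return bytes;
}
```

Because the image is already in target memory order, no per-byte index
shuffling for WORDS_BIG_ENDIAN or BYTES_BIG_ENDIAN is needed at this stage.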

Tested on aarch64-linux-gnu so far.  OK to install?

Richard


gcc/
* builtins.h (c_readstr): Take a fixed_size_mode rather than a
scalar_int_mode.
* builtins.cc (c_readstr): Likewise.  Build a local array of
bytes and use native_decode_rtx to get the rtx image.
(builtin_memcpy_read_str): Simplify accordingly.
(builtin_strncpy_read_str): Likewise.
(builtin_memset_read_str): Likewise.
(builtin_memset_gen_str): Likewise.
* expr.cc (string_cst_read_str): Likewise.
---
 gcc/builtins.cc | 46 +++---
 gcc/builtins.h  |  2 +-
 gcc/expr.cc |  5 ++---
 3 files changed, 14 insertions(+), 39 deletions(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 40dfd36a319..cb90bd03b3e 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -743,39 +743,22 @@ c_strlen (tree arg, int only_value, c_strlen_data *data, unsigned eltsize)
as needed.  */
 
 rtx
-c_readstr (const char *str, scalar_int_mode mode,
+c_readstr (const char *str, fixed_size_mode mode,
   bool null_terminated_p/*=true*/)
 {
-  HOST_WIDE_INT ch;
-  unsigned int i, j;
-  HOST_WIDE_INT tmp[MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT];
+  auto_vec<target_unit> bytes;
 
-  gcc_assert (GET_MODE_CLASS (mode) == MODE_INT);
-  unsigned int len = (GET_MODE_PRECISION (mode) + HOST_BITS_PER_WIDE_INT - 1)
-/ HOST_BITS_PER_WIDE_INT;
+  bytes.reserve (GET_MODE_SIZE (mode));
 
-  gcc_assert (len <= MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT);
-  for (i = 0; i < len; i++)
-tmp[i] = 0;
-
-  ch = 1;
-  for (i = 0; i < GET_MODE_SIZE (mode); i++)
+  target_unit ch = 1;
+  for (unsigned int i = 0; i < GET_MODE_SIZE (mode); ++i)
 {
-  j = i;
-  if (WORDS_BIG_ENDIAN)
-   j = GET_MODE_SIZE (mode) - i - 1;
-  if (BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN
- && GET_MODE_SIZE (mode) >= UNITS_PER_WORD)
-   j = j + UNITS_PER_WORD - 2 * (j % UNITS_PER_WORD) - 1;
-  j *= BITS_PER_UNIT;
-
   if (ch || !null_terminated_p)
ch = (unsigned char) str[i];
-  tmp[j / HOST_BITS_PER_WIDE_INT] |= ch << (j % HOST_BITS_PER_WIDE_INT);
+  bytes.quick_push (ch);
 }
 
-  wide_int c = wide_int::from_array (tmp, len, GET_MODE_PRECISION (mode));
-  return immed_wide_int_const (c, mode);
+  return native_decode_rtx (mode, bytes, 0);
 }
 
 /* Cast a target constant CST to target CHAR and if that value fits into
@@ -3530,10 +3513,7 @@ builtin_memcpy_read_str (void *data, void *, 
HOST_WIDE_INT offset,
  string but the caller guarantees it's large enough for MODE.  */
   const char *rep = (const char *) data;
 
-  /* The by-pieces infrastructure does not try to pick a vector mode
- for memcpy expansion.  */
-  return c_readstr (rep + offset, as_a <scalar_int_mode> (mode),
-   /*nul_terminated=*/false);
+  return c_readstr (rep + offset, mode, /*nul_terminated=*/false);
 }
 
 /* LEN specify length of the block of memcpy/memset operation.
@@ -3994,9 +3974,7 @@ builtin_strncpy_read_str (void *data, void *, 
HOST_WIDE_INT offset,
   if ((unsigned HOST_WIDE_INT) offset > strlen (str))
 return const0_rtx;
 
-  /* The by-pieces infrastructure does not try to pick a vector mode
- for strncpy expansion.  */
-  return c_readstr (str + offset, as_a <scalar_int_mode> (mode));
+  return c_readstr (str + offset, mode);
 }
 
 /* Helper to check the sizes of sequences and the destination of calls
@@ -4227,8 +4205,7 @@ builtin_memset_read_str (void *data, void *prev,
 
   memset (p, *c, size);
 
-  /* Vector modes should be handled above.  */
-  return c_readstr (p, as_a <scalar_int_mode> (mode));
+  return c_readstr (p, mode);
 }
 
 /* Callback routine for store_by_pieces.  Return the RTL of a register
@@ -4275,8 +4252,7 @@ builtin_memset_gen_str (void *data, void *prev,
 
   p = XALLOCAVEC (char, size);
   memset (p, 1, size);
-  /* Vector modes should be handled above.  */
-  coeff = c_readstr (p, as_a <scalar_int_mode> (mode));
+  coeff = c_readstr (p, mode);
 
   target = convert_to_mode (mode, (rtx) data, 1);
   target = expand_mult (mode, target, coeff, NULL_RTX, 1);
diff --git a/gcc/builtins.h b/gcc/builtins.h
index 3b5c34c4802..88a26d70cd5 100644
--- a/gcc/builtins.h
+++ b/gcc/builtins.h
@@ -105,7 +105,7 @@ struct c_strlen_data
 };
 
 extern tree c_strlen (tree, int, c_strlen_data * = NULL, unsigned = 1);
-extern rtx c_readstr (const char *, scalar_int_mode, bool = true);
+extern rtx c_readstr (const char *, 

Re: [PATCH v2] aarch64: Fine-grained ldp and stp policies with test-cases.

2023-09-28 Thread Manos Anagnostakis
Hey Richard,

Thanks for taking the time to review this, but it was committed
yesterday after being reviewed by Kyrill and Tamar.

Discussions:
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631285.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631300.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631389.html

Committed version:
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631484.html

Manos.

On Thu, Sep 28, 2023 at 4:17 PM Richard Sandiford wrote:

> Thanks for the patch and sorry for the slow review.
>
> Manos Anagnostakis  writes:
> > This patch implements the following TODO in gcc/config/aarch64/aarch64.cc
> > to provide the requested behaviour for handling ldp and stp:
> >
> >   /* Allow the tuning structure to disable LDP instruction formation
> >  from combining instructions (e.g., in peephole2).
> >  TODO: Implement fine-grained tuning control for LDP and STP:
> >1. control policies for load and store separately;
> >2. support the following policies:
> >   - default (use what is in the tuning structure)
> >   - always
> >   - never
> >   - aligned (only if the compiler can prove that the
> > load will be aligned to 2 * element_size)  */
> >
> > It provides two new and concrete command-line options -mldp-policy and
> > -mstp-policy to give the ability to control load and store policies
> > separately as stated in part 1 of the TODO.
> >
> > The accepted values for both options are:
> > - default: Use the ldp/stp policy defined in the corresponding tuning
> >   structure.
> > - always: Emit ldp/stp regardless of alignment.
> > - never: Do not emit ldp/stp.
> > - aligned: In order to emit ldp/stp, first check if the load/store will
> >   be aligned to 2 * element_size.
> >
> > gcc/ChangeLog:
> > * config/aarch64/aarch64-protos.h (struct tune_params): Add
> >   appropriate enums for the policies.
> > * config/aarch64/aarch64-tuning-flags.def
> >   (AARCH64_EXTRA_TUNING_OPTION): Remove superseded tuning
> >   options.
> > * config/aarch64/aarch64.cc (aarch64_parse_ldp_policy): New
> >   function to parse ldp-policy option.
> > (aarch64_parse_stp_policy): New function to parse stp-policy
> >   option.
> > (aarch64_override_options_internal): Call parsing functions.
> > (aarch64_operands_ok_for_ldpstp): Add option-value check and
> >   alignment check and remove superseded ones
> > (aarch64_operands_adjust_ok_for_ldpstp): Add option-value check
> >   and alignment check and remove superseded ones.
> > * config/aarch64/aarch64.opt: Add options.
> >
> > gcc/testsuite/ChangeLog:
> > * gcc.target/aarch64/ldp_aligned.c: New test.
> > * gcc.target/aarch64/ldp_always.c: New test.
> > * gcc.target/aarch64/ldp_never.c: New test.
> > * gcc.target/aarch64/stp_aligned.c: New test.
> > * gcc.target/aarch64/stp_always.c: New test.
> > * gcc.target/aarch64/stp_never.c: New test.
> >
> > Signed-off-by: Manos Anagnostakis 
> > ---
> > Changes in v2:
> > - Fixed committed ldp tests to correctly trigger
> >   and test aarch64_operands_adjust_ok_for_ldpstp in aarch64.cc.
> > - Added "-mcpu=generic" to committed tests to guarantee generic target
> >   code generation and not cause the regressions of v1.
> >
> >  gcc/config/aarch64/aarch64-protos.h   |  24 ++
> >  gcc/config/aarch64/aarch64-tuning-flags.def   |   8 -
> >  gcc/config/aarch64/aarch64.cc | 229 ++
> >  gcc/config/aarch64/aarch64.opt|   8 +
> >  .../gcc.target/aarch64/ldp_aligned.c  |  66 +
> >  gcc/testsuite/gcc.target/aarch64/ldp_always.c |  66 +
> >  gcc/testsuite/gcc.target/aarch64/ldp_never.c  |  66 +
> >  .../gcc.target/aarch64/stp_aligned.c  |  60 +
> >  gcc/testsuite/gcc.target/aarch64/stp_always.c |  60 +
> >  gcc/testsuite/gcc.target/aarch64/stp_never.c  |  60 +
> >  10 files changed, 586 insertions(+), 61 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_aligned.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_always.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_never.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_aligned.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_always.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_never.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-protos.h
> b/gcc/config/aarch64/aarch64-protos.h
> > index 70303d6fd95..be1d73490ed 100644
> > --- a/gcc/config/aarch64/aarch64-protos.h
> > +++ b/gcc/config/aarch64/aarch64-protos.h
> > @@ -568,6 +568,30 @@ struct tune_params
> >/* Place prefetch struct pointer at the end to enable type checking
> >   errors when tune_params misses elements (e.g., from 

[PATCH 2/2] arm: move the switch tables for Arm to the RO data section.

2023-09-28 Thread Richard Ball
Follow-up patch to "arm: Use deltas for Arm switch tables".
This patch moves the switch tables for Arm from the .text section
into the .rodata section.

gcc/ChangeLog:

* config/arm/aout.h: Change to use the Lrtx label.
* config/arm/arm.h (CASE_VECTOR_PC_RELATIVE): Remove arm targets
from (!target_pure_code) condition.
(ADDR_VEC_ALIGN): Add align for tables in rodata section.
* config/arm/arm.cc (arm_output_casesi): Alter the function to include
.Lrtx label and remove adr instructions.
* config/arm/arm.md
(arm_casesi_internal): Use force_reg to generate ldr instructions that
would otherwise be out of range, and change rtl to accommodate force reg.
Additionally remove unnecessary register temp.
(casesi): Remove pure code check for Arm.
* config/arm/elf.h (JUMP_TABLES_IN_TEXT_SECTION): Remove arm
targets from JUMP_TABLES_IN_TEXT_SECTION definition.

gcc/testsuite/ChangeLog:

* gcc.target/arm/arm-switchstatement.c: Alter the tests to
change adr instruction to ldr.

diff --git a/gcc/config/arm/aout.h b/gcc/config/arm/aout.h
index 6a4c8da5f6d5a1695518f42830b9d045888eeed6..49896bb962081a5ee4b5328029813c681c489a9e 100644
--- a/gcc/config/arm/aout.h
+++ b/gcc/config/arm/aout.h
@@ -187,16 +187,16 @@
  switch (GET_MODE (body))  \
{   \
case E_QImode:  \
- asm_fprintf (STREAM, "\t.byte\t(%LL%d-%LL%d-4)/4\n",  \
+ asm_fprintf (STREAM, "\t.byte\t(%LL%d-%LLrtx%d-4)/4\n",   \
   VALUE, REL); \
  break;\
case E_HImode:  \
- asm_fprintf (STREAM, "\t.2byte\t(%LL%d-%LL%d-4)/4\n", \
+ asm_fprintf (STREAM, "\t.2byte\t(%LL%d-%LLrtx%d-4)/4\n",  \
   VALUE, REL); \
  break;\
case E_SImode:  \
  if (flag_pic) \
-   asm_fprintf (STREAM, "\t.word\t%LL%d-%LL%d-4\n",\
+   asm_fprintf (STREAM, "\t.word\t%LL%d-%LLrtx%d-4\n", \
 VALUE, REL);   \
  else  \
asm_fprintf (STREAM, "\t.word\t%LL%d\n", VALUE);\
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 3063e3489094f04ecf03a52952c185d4a75da645..ba61cf6fb9e4969776b49b499ce2205a940385d0 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -2092,10 +2092,10 @@ enum arm_auto_incmodes
for the index in the tablejump instruction.  */
 #define CASE_VECTOR_MODE Pmode
 
-#define CASE_VECTOR_PC_RELATIVE ((TARGET_ARM || TARGET_THUMB2  \
+#define CASE_VECTOR_PC_RELATIVE (TARGET_ARM || ((TARGET_THUMB2 \
  || (TARGET_THUMB1 \
  && (optimize_size || flag_pic)))  \
-&& (!target_pure_code))
+&& (!target_pure_code)))
 
 
 #define CASE_VECTOR_SHORTEN_MODE(min, max, body)   \
@@ -2301,8 +2301,14 @@ extern int making_const_table;
asm_fprintf (STREAM, "\tpop {%r}\n", REGNO);\
 } while (0)
 
-#define ADDR_VEC_ALIGN(JUMPTABLE)  \
-  ((TARGET_THUMB && GET_MODE (PATTERN (JUMPTABLE)) == SImode) ? 2 : 0)
+/* If the switch table is in the code segment, additional alignment is
+   needed for Thumb SImode tables.  Otherwise, tables in RO data have
+   natural alignment.  */
+#define ADDR_VEC_ALIGN(TABLE)  \
+  (JUMP_TABLES_IN_TEXT_SECTION \
+   ? ((TARGET_THUMB && GET_MODE (PATTERN (TABLE)) == SImode) ? 2 : 0)  \
+   : (exact_log2 (GET_MODE_ALIGNMENT (GET_MODE (PATTERN (TABLE)))  \
+ / BITS_PER_UNIT)))
 
 /* Alignment for case labels comes from ADDR_VEC_ALIGN; avoid the
default alignment from elfos.h.  */
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 4e5e6997ed555372683e01b2aff5c25265f4e50c..c3a5ef274276cdef1b41690eb0ad7fd4f4218ecf 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -30469,44 +30469,58 @@ arm_output_iwmmxt_tinsr (rtx *operands)
 const char *
 arm_output_casesi (rtx *operands)
 {
+  char buf[100];
+  char label[100];
   rtx diff_vec = PATTERN (NEXT_INSN (as_a <rtx_insn *> (operands[2])));
-
   gcc_assert (GET_CODE (diff_vec) == ADDR_DIFF_VEC);
-
   output_asm_insn ("cmp\t%0, %1", operands);
   output_asm_insn ("bhi\t%l3", operands);

[PATCH 1/2] arm: Use deltas for Arm switch tables

2023-09-28 Thread Richard Ball
With normal optimization for the Arm state, GCC emits an uncompressed
table of jump targets. The table sits in the middle of the text segment
and is far larger than necessary, especially at -Os.
This patch compresses the table to use deltas in a similar manner to
Thumb code generation.
Similar code is also used for -fpic where we currently generate a jump
to a jump. In this format the jumps are too dense for the hardware branch
predictor to handle accurately, so execution is likely to be very expensive.

Changes to switch statements for arm include a new function to handle the
assembly generation for different machine modes. This allows for more
optimisation to be performed in aout.h where arm has switched from using
ASM_OUTPUT_ADDR_VEC_ELT to using ASM_OUTPUT_ADDR_DIFF_ELT.
In ASM_OUTPUT_ADDR_DIFF_ELT new assembly generation options have been
added to utilise the different machine modes. Additional changes
made to the casesi expand and insn, CASE_VECTOR_PC_RELATIVE,
CASE_VECTOR_SHORTEN_MODE and LABEL_ALIGN_AFTER_BARRIER are all
to accommodate this new approach to switch statement generation.
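
To make the table format concrete, here is a rough standalone sketch (not code from the patch). delta_entry computes the value that the "(target - base - 4)/4" directives in aout.h describe. choose_width is only an assumed approximation of picking the narrowest entry size; the real selection is done by CASE_VECTOR_SHORTEN_MODE, whose exact thresholds are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Entry sizes corresponding to .byte, .2byte and .word.
enum class entry_width { byte = 1, half = 2, word = 4 };

// Value stored for one case label: the distance from the reference
// label, minus the 4-byte bias, counted in 4-byte Arm instructions.
std::int64_t
delta_entry (std::uint32_t target, std::uint32_t base)
{
  std::int64_t diff = static_cast<std::int64_t> (target) - base - 4;
  assert (diff % 4 == 0);
  return diff / 4;
}

// Assumption for this sketch: entries are unsigned, and the narrowest
// width that holds the largest entry is used for the whole table.
entry_width
choose_width (const std::vector<std::int64_t> &entries)
{
  std::int64_t largest = 0;
  for (std::int64_t e : entries)
    {
      assert (e >= 0);
      if (e > largest)
        largest = e;
    }
  if (largest <= 0xff)
    return entry_width::byte;
  if (largest <= 0xffff)
    return entry_width::half;
  return entry_width::word;
}
```

For example, a case label 16 bytes past the reference label is stored as (16 - 4) / 4 = 3, which fits in a single byte instead of a full branch instruction.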

New tests have been added and no regressions on arm-none-eabi.

gcc/ChangeLog:

* config/arm/aout.h (ASM_OUTPUT_ADDR_DIFF_ELT): Add table output
for different machine modes for arm.
* config/arm/arm-protos.h (arm_output_casesi): New prototype.
* config/arm/arm.h (CASE_VECTOR_PC_RELATIVE): Make arm use
ASM_OUTPUT_ADDR_DIFF_ELT.
(CASE_VECTOR_SHORTEN_MODE): Change table size calculation for
TARGET_ARM.
(LABEL_ALIGN_AFTER_BARRIER): Change to accommodate .p2align 2
for TARGET_ARM.
* config/arm/arm.cc (arm_output_casesi): New function.
* config/arm/arm.md (arm_casesi_internal): Change casesi expand
and insn for arm to use new function arm_output_casesi.

gcc/testsuite/ChangeLog:

* gcc.target/arm/arm-switchstatement.c: New test.

diff --git a/gcc/config/arm/aout.h b/gcc/config/arm/aout.h
index 57c3b9b7b8b02f15e191ffcb9446f0edf27bbce6..6a4c8da5f6d5a1695518f42830b9d045888eeed6 100644
--- a/gcc/config/arm/aout.h
+++ b/gcc/config/arm/aout.h
@@ -183,7 +183,28 @@
   do   \
 {  \
   if (TARGET_ARM)  \
-   asm_fprintf (STREAM, "\tb\t%LL%d\n", VALUE);\
+   {   \
+ switch (GET_MODE (body))  \
+   {   \
+   case E_QImode:  \
+ asm_fprintf (STREAM, "\t.byte\t(%LL%d-%LL%d-4)/4\n",  \
+  VALUE, REL); \
+ break;\
+   case E_HImode:  \
+ asm_fprintf (STREAM, "\t.2byte\t(%LL%d-%LL%d-4)/4\n", \
+  VALUE, REL); \
+ break;\
+   case E_SImode:  \
+ if (flag_pic) \
+   asm_fprintf (STREAM, "\t.word\t%LL%d-%LL%d-4\n",\
+VALUE, REL);   \
+ else  \
+   asm_fprintf (STREAM, "\t.word\t%LL%d\n", VALUE);\
+ break;\
+   default:\
+ gcc_unreachable ();   \
+   }   \
+   }   \
   else if (TARGET_THUMB1)  \
{   \
  if (flag_pic || optimize_size)\
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 77e76336e94096975093c0c7c72a005993a4c27d..2f5ca79ed8ddd647b212782a0454ee4fefc07257 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -261,6 +261,7 @@ extern void thumb_expand_cpymemqi (rtx *);
 extern rtx arm_return_addr (int, rtx);
 extern void thumb_reload_out_hi (rtx *);
 extern void thumb_set_return_address (rtx, rtx);
+extern const char *arm_output_casesi (rtx *);
 extern const char *thumb1_output_casesi (rtx *);
 extern const char *thumb2_output_casesi (rtx *);
 #endif
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 

Re: [PATCH v2] aarch64: Fine-grained ldp and stp policies with test-cases.

2023-09-28 Thread Richard Sandiford
Thanks for the patch and sorry for the slow review.

Manos Anagnostakis  writes:
> This patch implements the following TODO in gcc/config/aarch64/aarch64.cc
> to provide the requested behaviour for handling ldp and stp:
>
>   /* Allow the tuning structure to disable LDP instruction formation
>  from combining instructions (e.g., in peephole2).
>  TODO: Implement fine-grained tuning control for LDP and STP:
>1. control policies for load and store separately;
>2. support the following policies:
>   - default (use what is in the tuning structure)
>   - always
>   - never
>   - aligned (only if the compiler can prove that the
> load will be aligned to 2 * element_size)  */
>
> It provides two new and concrete command-line options -mldp-policy and
> -mstp-policy to give the ability to control load and store policies
> separately as stated in part 1 of the TODO.
>
> The accepted values for both options are:
> - default: Use the ldp/stp policy defined in the corresponding tuning
>   structure.
> - always: Emit ldp/stp regardless of alignment.
> - never: Do not emit ldp/stp.
> - aligned: In order to emit ldp/stp, first check if the load/store will
>   be aligned to 2 * element_size.
>
> gcc/ChangeLog:
> * config/aarch64/aarch64-protos.h (struct tune_params): Add
>   appropriate enums for the policies.
> * config/aarch64/aarch64-tuning-flags.def
>   (AARCH64_EXTRA_TUNING_OPTION): Remove superseded tuning
>   options.
> * config/aarch64/aarch64.cc (aarch64_parse_ldp_policy): New
>   function to parse ldp-policy option.
> (aarch64_parse_stp_policy): New function to parse stp-policy option.
> (aarch64_override_options_internal): Call parsing functions.
> (aarch64_operands_ok_for_ldpstp): Add option-value check and
>   alignment check and remove superseded ones
> (aarch64_operands_adjust_ok_for_ldpstp): Add option-value check and
>   alignment check and remove superseded ones.
> * config/aarch64/aarch64.opt: Add options.
>
> gcc/testsuite/ChangeLog:
> * gcc.target/aarch64/ldp_aligned.c: New test.
> * gcc.target/aarch64/ldp_always.c: New test.
> * gcc.target/aarch64/ldp_never.c: New test.
> * gcc.target/aarch64/stp_aligned.c: New test.
> * gcc.target/aarch64/stp_always.c: New test.
> * gcc.target/aarch64/stp_never.c: New test.
>
> Signed-off-by: Manos Anagnostakis 
> ---
> Changes in v2:
> - Fixed committed ldp tests to correctly trigger
>   and test aarch64_operands_adjust_ok_for_ldpstp in aarch64.cc.
> - Added "-mcpu=generic" to committed tests to guarantee generic target
>   code generation and not cause the regressions of v1.
>
>  gcc/config/aarch64/aarch64-protos.h   |  24 ++
>  gcc/config/aarch64/aarch64-tuning-flags.def   |   8 -
>  gcc/config/aarch64/aarch64.cc | 229 ++
>  gcc/config/aarch64/aarch64.opt|   8 +
>  .../gcc.target/aarch64/ldp_aligned.c  |  66 +
>  gcc/testsuite/gcc.target/aarch64/ldp_always.c |  66 +
>  gcc/testsuite/gcc.target/aarch64/ldp_never.c  |  66 +
>  .../gcc.target/aarch64/stp_aligned.c  |  60 +
>  gcc/testsuite/gcc.target/aarch64/stp_always.c |  60 +
>  gcc/testsuite/gcc.target/aarch64/stp_never.c  |  60 +
>  10 files changed, 586 insertions(+), 61 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_aligned.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_always.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_never.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_aligned.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_always.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_never.c
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 70303d6fd95..be1d73490ed 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -568,6 +568,30 @@ struct tune_params
>/* Place prefetch struct pointer at the end to enable type checking
>   errors when tune_params misses elements (e.g., from erroneous merges).  
> */
>const struct cpu_prefetch_tune *prefetch;
> +/* An enum specifying how to handle load pairs using a fine-grained policy:
> +   - LDP_POLICY_ALIGNED: Emit ldp if the source pointer is aligned
> +   to at least double the alignment of the type.
> +   - LDP_POLICY_ALWAYS: Emit ldp regardless of alignment.
> +   - LDP_POLICY_NEVER: Do not emit ldp.  */
> +
> +  enum aarch64_ldp_policy_model
> +  {
> +LDP_POLICY_ALIGNED,
> +LDP_POLICY_ALWAYS,
> +LDP_POLICY_NEVER
> +  } ldp_policy_model;
> +/* An enum specifying how to handle store pairs using a fine-grained policy:
> +   - STP_POLICY_ALIGNED: Emit stp if the source pointer is aligned
> +   to at 

[PATCH] target/111600 - avoid deep recursion in access diagnostics

2023-09-28 Thread Richard Biener
pass_waccess::check_dangling_stores uses recursion to traverse the CFG.
The following changes this to use a heap allocated worklist to avoid
blowing the stack.

Instead of switching to a better iteration order, it tries hard to
preserve the current one, to avoid new false positives popping up:
the set of stores we keep track of doesn't properly model flow,
so what is diagnosed and what is not is quite random.  We are also
lacking the ideal RPO compute on the inverted graph that would just
ignore reverse unreachable code (as the current iteration scheme does).
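
For illustration, a standalone sketch of the same traversal scheme on a toy CFG (not the GCC code). The cfg type, walk_preds and the visit callback are hypothetical names; each worklist entry plays the role of an edge_iterator over one block's predecessor edges.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// preds[b] lists the predecessors of block b.
using cfg = std::vector<std::vector<int>>;

// Walk backwards from EXIT_BLOCK in the same order as a recursive
// predecessor walk, but keep the pending positions on the heap.
// VISIT returns false to stop walking beyond that block.
// Returns the blocks in the order they were visited.
std::vector<int>
walk_preds (const cfg &preds, int exit_block,
            const std::function<bool (int)> &visit)
{
  assert (exit_block >= 0
          && static_cast<std::size_t> (exit_block) < preds.size ());
  std::vector<int> order;
  if (preds[exit_block].empty ())
    return order;

  std::vector<bool> seen (preds.size (), false);
  // Each entry is (block, index of the predecessor edge being followed).
  std::vector<std::pair<int, std::size_t>> worklist;
  worklist.emplace_back (exit_block, 0);
  while (!worklist.empty ())
    {
      auto &top = worklist.back ();
      int src = preds[top.first][top.second];
      if (!seen[src])
        {
          seen[src] = true;
          order.push_back (src);
          // TOP is not used again after the push below.
          if (visit (src) && !preds[src].empty ())
            worklist.emplace_back (src, 0);
        }
      else if (top.second + 1 == preds[top.first].size ())
        worklist.pop_back ();
      else
        ++top.second;
    }
  return order;
}
```

The stack depth is bounded by the number of blocks, and blocks are reached in the same preorder as the recursive version, which is the property the patch relies on to keep the set of diagnostics unchanged.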

Bootstrapped and tested on x86_64-unknown-linux-gnu; with this,
8MB of stack is now enough to build riscv insn-opinit.cc.

Pushed.

PR target/111600
* gimple-ssa-warn-access.cc (pass_waccess::check_dangling_stores):
Use a heap allocated worklist for CFG traversal instead of
recursion.
---
 gcc/gimple-ssa-warn-access.cc | 51 ++-
 1 file changed, 32 insertions(+), 19 deletions(-)

diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
index ac07a6f9b95..fcaff128d60 100644
--- a/gcc/gimple-ssa-warn-access.cc
+++ b/gcc/gimple-ssa-warn-access.cc
@@ -2141,7 +2141,7 @@ private:
   void check_dangling_uses (tree, tree, bool = false, bool = false);
   void check_dangling_uses ();
   void check_dangling_stores ();
-  void check_dangling_stores (basic_block, hash_set<tree> &, auto_bitmap &);
+  bool check_dangling_stores (basic_block, hash_set<tree> &);
 
   void warn_invalid_pointer (tree, gimple *, gimple *, tree, bool, bool = 
false);
 
@@ -4524,17 +4524,13 @@ pass_waccess::check_dangling_uses (tree var, tree decl, 
bool maybe /* = false */
 
 /* Diagnose stores in BB and (recursively) its predecessors of the addresses
of local variables into nonlocal pointers that are left dangling after
-   the function returns.  BBS is a bitmap of basic blocks visited.  */
+   the function returns.  Returns true when we can continue walking
+   the CFG to predecessors.  */
 
-void
+bool
 pass_waccess::check_dangling_stores (basic_block bb,
-hash_set<tree> &stores,
-auto_bitmap &bbs)
+hash_set<tree> &stores)
 {
-  if (!bitmap_set_bit (bbs, bb->index))
-/* Avoid cycles. */
-return;
-
   /* Iterate backwards over the statements looking for a store of
  the address of a local variable into a nonlocal pointer.  */
   for (auto gsi = gsi_last_nondebug_bb (bb); ; gsi_prev_nondebug (&gsi))
@@ -4550,7 +4546,7 @@ pass_waccess::check_dangling_stores (basic_block bb,
  && !(gimple_call_flags (stmt) & (ECF_CONST | ECF_PURE)))
/* Avoid looking before nonconst, nonpure calls since those might
   use the escaped locals.  */
-   return;
+   return false;
 
   if (!is_gimple_assign (stmt) || gimple_clobber_p (stmt)
  || !gimple_store_p (stmt))
@@ -4576,7 +4572,7 @@ pass_waccess::check_dangling_stores (basic_block bb,
  gimple *def_stmt = SSA_NAME_DEF_STMT (lhs_ref.ref);
  if (!gimple_nop_p (def_stmt))
/* Avoid looking at or before stores into unknown objects.  */
-   return;
+   return false;
 
  lhs_ref.ref = SSA_NAME_VAR (lhs_ref.ref);
}
@@ -4620,13 +4616,7 @@ pass_waccess::check_dangling_stores (basic_block bb,
}
 }
 
-  edge e;
-  edge_iterator ei;
-  FOR_EACH_EDGE (e, ei, bb->preds)
-{
-  basic_block pred = e->src;
-  check_dangling_stores (pred, stores, bbs);
-}
+  return true;
 }
 
 /* Diagnose stores of the addresses of local variables into nonlocal
@@ -4635,9 +4625,32 @@ pass_waccess::check_dangling_stores (basic_block bb,
 void
 pass_waccess::check_dangling_stores ()
 {
+  if (EDGE_COUNT (EXIT_BLOCK_PTR_FOR_FN (m_func)->preds) == 0)
+return;
+
   auto_bitmap bbs;
   hash_set<tree> stores;
-  check_dangling_stores (EXIT_BLOCK_PTR_FOR_FN (m_func), stores, bbs);
+  auto_vec<edge_iterator> worklist (n_basic_blocks_for_fn (cfun) + 1);
+  worklist.quick_push (ei_start (EXIT_BLOCK_PTR_FOR_FN (m_func)->preds));
+  do
+{
+  edge_iterator ei = worklist.last ();
+  basic_block src = ei_edge (ei)->src;
+  if (bitmap_set_bit (bbs, src->index))
+   {
+ if (check_dangling_stores (src, stores)
+ && EDGE_COUNT (src->preds) > 0)
+   worklist.quick_push (ei_start (src->preds));
+   }
+  else
+   {
+ if (ei_one_before_end_p (ei))
+   worklist.pop ();
+ else
+   ei_next (&worklist.last ());
+   }
+}
+  while (!worklist.is_empty ());
 }
 
 /* Check for and diagnose uses of dangling pointers to auto objects
-- 
2.35.3


Re: [PING][PATCH 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops

2023-09-28 Thread Andre Vieira (lists)

Hi,

On 14/09/2023 13:10, Kyrylo Tkachov via Gcc-patches wrote:

Hi Stam,





The arm parts look sensible but we'd need review for the df-core.h and 
df-core.cc changes.
Maybe Jeff can help or can recommend someone to take a look?
Thanks,
Kyrill



FWIW the changes LGTM, if we don't want these in df-core we can always 
implement the extra utility locally. It's really just a helper function 
to check if df_bb_regno_first_def_find and df_bb_regno_last_def_find 
yield the same result, meaning we only have a single definition.
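
Roughly, the check amounts to the following standalone sketch. This is a toy model, not the df-core code: reg_def, first_def_find, last_def_find and single_def_p are hypothetical names standing in for the df references and lookups.

```cpp
#include <vector>

// Hypothetical model of one register definition inside a basic block.
struct reg_def
{
  int insn_uid;
  unsigned regno;
};

// First definition of REGNO in BB, or nullptr if there is none.
const reg_def *
first_def_find (const std::vector<reg_def> &bb, unsigned regno)
{
  for (const reg_def &d : bb)
    if (d.regno == regno)
      return &d;
  return nullptr;
}

// Last definition of REGNO in BB, or nullptr if there is none.
const reg_def *
last_def_find (const std::vector<reg_def> &bb, unsigned regno)
{
  for (auto it = bb.rbegin (); it != bb.rend (); ++it)
    if (it->regno == regno)
      return &*it;
  return nullptr;
}

// True if BB contains exactly one definition of REGNO: the first and
// last definitions exist and are the same one.
bool
single_def_p (const std::vector<reg_def> &bb, unsigned regno)
{
  const reg_def *first = first_def_find (bb, regno);
  return first != nullptr && first == last_def_find (bb, regno);
}
```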


Kind regards,
Andre


[Fortran, Patch, Coarray, PR 37336] Fix crash in finalizer when derived type coarray is already freed.

2023-09-28 Thread Andre Vehreschild
Hi all,

the attached patch fixes a crash in coarray programs when an allocatable
derived-type coarray was freed explicitly. The generated cleanup code did not
take into account that the coarray may have been deallocated already. The patch
fixes this by moving the statements accessing components inside the derived type
into the block guarded by its allocated check.

Regtested ok on f37/x86_64. Ok for master?

Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de
diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index e0fc8ebc46b..8e94a9a469f 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -9320,6 +9320,12 @@ structure_alloc_comps (gfc_symbol * der_type, tree decl, tree dest,
   gfc_add_expr_to_block (, tmp);
 }

+  /* Still having a descriptor array of rank == 0 here, indicates an
+ allocatable coarrays.  Dereference it correctly.  */
+  if (GFC_DESCRIPTOR_TYPE_P (decl_type))
+{
+  decl = build_fold_indirect_ref (gfc_conv_array_data (decl));
+}
   /* Otherwise, act on the components or recursively call self to
  act on a chain of components.  */
   for (c = der_type->components; c; c = c->next)
@@ -11507,7 +11513,11 @@ gfc_trans_deferred_array (gfc_symbol * sym, gfc_wrapped_block * block)
 {
   int rank;
   rank = sym->as ? sym->as->rank : 0;
-  tmp = gfc_deallocate_alloc_comp (sym->ts.u.derived, descriptor, rank);
+  tmp = gfc_deallocate_alloc_comp (sym->ts.u.derived, descriptor, rank,
+   (sym->attr.codimension
+	&& flag_coarray == GFC_FCOARRAY_LIB)
+   ? GFC_STRUCTURE_CAF_MODE_IN_COARRAY
+   : 0);
   gfc_add_expr_to_block (, tmp);
 }

@@ -11521,9 +11531,11 @@ gfc_trans_deferred_array (gfc_symbol * sym, gfc_wrapped_block * block)
 	NULL_TREE, NULL_TREE, true, e,
 	sym->attr.codimension
 	? GFC_CAF_COARRAY_DEREGISTER
-	: GFC_CAF_COARRAY_NOCOARRAY);
+	: GFC_CAF_COARRAY_NOCOARRAY,
+	NULL_TREE, gfc_finish_block ());
   if (e)
 	gfc_free_expr (e);
+  gfc_init_block ();
   gfc_add_expr_to_block (, tmp);
 }

diff --git a/gcc/testsuite/gfortran.dg/coarray/alloc_comp_6.f90 b/gcc/testsuite/gfortran.dg/coarray/alloc_comp_6.f90
new file mode 100644
index 000..e8a74db2c18
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/coarray/alloc_comp_6.f90
@@ -0,0 +1,29 @@
+! { dg-do run }
+
+program alloc_comp_6
+
+  implicit none
+
+  type :: foo
+real :: x
+integer, allocatable :: y(:)
+  end type
+
+  call check()
+
+contains
+
+  subroutine check()
+block
+  type(foo), allocatable :: example[:] ! needs to be a coarray
+
+  allocate(example[*])
+  allocate(example%y(10))
+  example%x = 3.4
+  example%y = 4
+
+  deallocate(example)
+end block  ! example%y shall not be accessed here by the finalizer,
+   ! because example is already deallocated
+  end subroutine check
+end program alloc_comp_6
Fortran: Free alloc. comp. in allocated coarrays only.

When freeing allocatable components of an allocatable coarray, add
a check that the coarray is still allocated, before accessing the
components.

This patch adds to PR fortran/37336, but does not fix it completely.

gcc/fortran/ChangeLog:
PR fortran/37336
* trans-array.cc (structure_alloc_comps): Deref coarray.
(gfc_trans_deferred_array): Add freeing of components after
check for allocated coarray.

gcc/testsuite/ChangeLog:
PR fortran/37336
* gfortran.dg/coarray/alloc_comp_6.f90: New test.



Re: [PATCH] RFC: Add late-combine pass [PR106594]

2023-09-28 Thread Jeff Law




On 9/26/23 10:21, Richard Sandiford wrote:

This patch adds a combine pass that runs late in the pipeline.
There are two instances: one between combine and split1, and one
after postreload.

The pass currently has a single objective: remove definitions by
substituting into all uses.  The pre-RA version tries to restrict
itself to cases that are likely to have a neutral or beneficial
effect on register pressure.

The patch fixes PR106594.  It also fixes a few FAILs and XFAILs
in the aarch64 test results, mostly due to making proper use of
MOVPRFX in cases where we didn't previously.  I hope it would
also help with Robin's vec_duplicate testcase, although the
pressure heuristic might need tweaking for that case.

This is just a first step.  I'm hoping that the pass could be
used for other combine-related optimisations in future.  In particular,
the post-RA version doesn't need to restrict itself to cases where all
uses are substitutable, since it doesn't have to worry about register
pressure.  If we did that, and if we extended it to handle multi-register
REGs, the pass might be a viable replacement for regcprop, which in
turn might reduce the cost of having a post-RA instance of the new pass.

I've run an assembly comparison with one target per CPU directory,
and it seems to be a win for all targets except nvptx (which is hard
to measure, being a higher-level asm).  The biggest winner seemed
to be AVR.

However, if a version of the pass does go in, it might be better
to enable it by default only on targets where the extra compile
time seems to be worth it.  IMO, fixing PR106594 and the MOVPRFX
issues makes it worthwhile for AArch64.

The patch contains various bug fixes and new helper routines.
I'd submit those separately in the final version.  Because of
that, there's no GNU changelog yet.

Bootstrapped & regression tested on aarch64-linux-gnu so far.

Very interesting.

I would generally expect it to be a win on most targets and might allow 
us to reduce the number of post-reload hacks we do.  So I'd lean towards 
enabling it everywhere.



With that in mind, I briefly threw it into my tester.  The first thing 
that popped out was rl78-elf regresses on compile/20021008-1.c.


In the pre-RA version we've taken these insns:


(insn 22 21 7 2 (set (reg/v/f:HI 44 [ buf ])
(const_int 0 [0])) "k.c":9:9 -1
 (nil))
(insn 7 22 8 2 (set (subreg:SI (reg:DF 43 [ _1 ]) 0)
(mem:SI (plus:HI (reg/v/f:HI 44 [ buf ])
(const_int 1 [0x1])) [1 MEM[(long double *)buf_4(D) + 1B]+0 S4 A16])) 
"k.c":9:9 2 {movsi}
 (nil))
(insn 8 7 9 2 (set (subreg:SI (reg:DF 43 [ _1 ]) 4)
(mem:SI (plus:HI (reg/v/f:HI 44 [ buf ])
(const_int 5 [0x5])) [1 MEM[(long double *)buf_4(D) + 1B]+4 S4 A16])) 
"k.c":9:9 2 {movsi}
 (expr_list:REG_DEAD (reg/v/f:HI 44 [ buf ])
(nil)))



We combine insn 22 with insn 7 and 8 resulting in:


(insn 7 22 8 2 (set (subreg:SI (reg:DF 43 [ _1 ]) 0)
(mem:SI (const_int 1 [0x1]) [1 MEM[(long double *)buf_4(D) + 1B]+0 S4 A16])) 
"k.c":9:9 2 {movsi}
 (nil))
(insn 8 7 9 2 (set (subreg:SI (reg:DF 43 [ _1 ]) 4)
(mem:SI (const_int 5 [0x5]) [1 MEM[(long double *)buf_4(D) + 1B]+4 S4 A16])) 
"k.c":9:9 2 {movsi}
 (nil))



Which ultimately triggers an assembler error:


k.s: Assembler messages:
k.s:41: Error: movw ax,!1
k.s:41: Error:  ^ Expression not word-aligned
k.s:43: Error: movw ax,!3
k.s:43: Error:  ^ Expression not word-aligned

[ ... ]


This seems more likely than not to be a target issue.  I suspect combine 
didn't trip over this because of the multiple uses of (reg 44).
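
For reference, the misalignment is visible from the addresses alone: once buf is replaced by 0, the two SImode loads use the constant addresses 1 and 5, and the assembler output above shows word moves from !1 and !3.  The following is only a sketch of that check.  It assumes, based solely on the assembler message, that a word move from an absolute address needs a 2-byte-aligned address; word_aligned_p is a hypothetical name, not a function in the rl78 port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if ADDR is usable for a 16-bit (word)
   access, assuming such accesses require 2-byte alignment.  */
bool
word_aligned_p (uint32_t addr)
{
  return (addr & 1) == 0;
}

/* Hypothetical helper: true if a 4-byte access at ADDR can be split
   into two word accesses that are both aligned.  */
bool
si_access_splittable_p (uint32_t addr)
{
  return word_aligned_p (addr) && word_aligned_p (addr + 2);
}
```

Under that assumption the addresses 1, 3 and 5 all fail the check, which matches the "Expression not word-aligned" diagnostics.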



Jeff



Re: [PATCH v2] Add a GCC Security policy

2023-09-28 Thread Siddhesh Poyarekar

On 2023-09-28 12:55, Siddhesh Poyarekar wrote:

Define a security process and exclusions to security issues for GCC and
all components it ships.

Signed-off-by: Siddhesh Poyarekar 
---


Sorry I forgot to summarize changes since the previous version:

- Reworded the introduction so that it doesn't sound like we know *all* 
scenarios and also encourage reporters to reach out.


- Fixed up support and diagnostic libraries sections based on Jakub's 
feedback.



  SECURITY.txt | 205 +++
  1 file changed, 205 insertions(+)
  create mode 100644 SECURITY.txt

diff --git a/SECURITY.txt b/SECURITY.txt
new file mode 100644
index 000..14cb31570d3
--- /dev/null
+++ b/SECURITY.txt
@@ -0,0 +1,205 @@
+What is a GCC security bug?
+===========================
+
+A security bug is one that threatens the security of a system or
+network, or might compromise the security of data stored on it.
+In the context of GCC there are multiple ways in which this might
+happen and some common scenarios are detailed below.
+
+If you're reporting a security issue and feel like it does not fit
+into any of the descriptions below, you're encouraged to reach out
+through the GCC bugzilla or if needed, privately by following the
+instructions in the last two sections of this document.
+
+Compiler drivers, programs, libgccjit and support libraries
+-----------------------------------------------------------
+
+The compiler driver processes source code, invokes other programs
+such as the assembler and linker and generates the output result,
+which may be assembly code or machine code.  Compiling untrusted
+sources can result in arbitrary code execution and unconstrained
+resource consumption in the compiler. As a result, compilation of
+such code should be done inside a sandboxed environment to ensure
+that it does not compromise the development environment.
+
+The libgccjit library can, despite the name, be used both for
+ahead-of-time compilation and for just-in-time compilation.  In both
+cases it can be used to translate input representations (such as
+source code) in the application context; in the latter case the
+generated code is also run in the application context.
+
+Limitations that apply to the compiler driver apply here too in
+terms of sanitizing inputs, and it is recommended that both the
+compilation *and* execution context of the code are appropriately
+sandboxed to contain the effects of any bugs in libgccjit, the
+application code using it, or its generated code to the sandboxed
+environment.
+
+Libraries such as libiberty, libcc1 and libcpp are not distributed
+for runtime support and have similar challenges to compiler drivers.
+While they are expected to be robust against arbitrary input, they
+should only be used with trusted inputs when linked into the
+compiler.
+
+Libraries such as zlib that are bundled into GCC to build it will be
+treated the same as the compiler drivers and programs as far as
+security coverage is concerned.  However if you find an issue in
+these libraries independent of their use in GCC, you should reach
+out to their upstream projects to report them.
+
+As a result, the only case for a potential security issue in the
+compiler is when it generates vulnerable application code for
+trusted input source code that is conforming to the relevant
+programming standard or extensions documented as supported by GCC
+and the algorithm expressed in the source code does not have the
+vulnerability.  The output application code could be considered
+vulnerable if it produces an actual vulnerability in the target
+application, specifically in the following cases:
+
+- The application dereferences an invalid memory location despite
+  the application sources being valid.
+- The application reads from or writes to a valid but incorrect
+  memory location, resulting in an information integrity issue or an
+  information leak.
+- The application ends up running in an infinite loop or with
+  severe degradation in performance despite the input sources having
+  no such issue, resulting in a Denial of Service.  Note that
+  correct but non-performant code is not a security issue candidate;
+  this only applies to incorrect code that may result in performance
+  degradation severe enough to amount to a denial of service.
+- The application crashes due to the generated incorrect code,
+  resulting in a Denial of Service.
+
+Language runtime libraries
+--------------------------
+
+GCC also builds and distributes libraries that are intended to be
+used widely to implement runtime support for various programming
+languages.  These include the following:
+
+* libada
+* libatomic
+* libbacktrace
+* libcc1
+* libcody
+  

[PATCH v2] Add a GCC Security policy

2023-09-28 Thread Siddhesh Poyarekar
Define a security process and exclusions to security issues for GCC and
all components it ships.

Signed-off-by: Siddhesh Poyarekar 
---
 SECURITY.txt | 205 +++
 1 file changed, 205 insertions(+)
 create mode 100644 SECURITY.txt

diff --git a/SECURITY.txt b/SECURITY.txt
new file mode 100644
index 000..14cb31570d3
--- /dev/null
+++ b/SECURITY.txt
@@ -0,0 +1,205 @@
+What is a GCC security bug?
+===========================
+
+A security bug is one that threatens the security of a system or
+network, or might compromise the security of data stored on it.
+In the context of GCC there are multiple ways in which this might
+happen and some common scenarios are detailed below.
+
+If you're reporting a security issue and feel like it does not fit
+into any of the descriptions below, you're encouraged to reach out
+through the GCC bugzilla or if needed, privately by following the
+instructions in the last two sections of this document.
+
+Compiler drivers, programs, libgccjit and support libraries
+-----------------------------------------------------------
+
+The compiler driver processes source code, invokes other programs
+such as the assembler and linker and generates the output result,
+which may be assembly code or machine code.  Compiling untrusted
+sources can result in arbitrary code execution and unconstrained
+resource consumption in the compiler. As a result, compilation of
+such code should be done inside a sandboxed environment to ensure
+that it does not compromise the development environment.
+
+The libgccjit library can, despite the name, be used both for
+ahead-of-time compilation and for just-in-time compilation.  In both
+cases it can be used to translate input representations (such as
+source code) in the application context; in the latter case the
+generated code is also run in the application context.
+
+Limitations that apply to the compiler driver apply here too in
+terms of sanitizing inputs, and it is recommended that both the
+compilation *and* execution context of the code are appropriately
+sandboxed to contain the effects of any bugs in libgccjit, the
+application code using it, or its generated code to the sandboxed
+environment.
+
+Libraries such as libiberty, libcc1 and libcpp are not distributed
+for runtime support and have similar challenges to compiler drivers.
+While they are expected to be robust against arbitrary input, they
+should only be used with trusted inputs when linked into the
+compiler.
+
+Libraries such as zlib that are bundled into GCC to build it will be
+treated the same as the compiler drivers and programs as far as
+security coverage is concerned.  However if you find an issue in
+these libraries independent of their use in GCC, you should reach
+out to their upstream projects to report them.
+
+As a result, the only case for a potential security issue in the
+compiler is when it generates vulnerable application code for
+trusted input source code that is conforming to the relevant
+programming standard or extensions documented as supported by GCC
+and the algorithm expressed in the source code does not have the
+vulnerability.  The output application code could be considered
+vulnerable if it produces an actual vulnerability in the target
+application, specifically in the following cases:
+
+- The application dereferences an invalid memory location despite
+  the application sources being valid.
+- The application reads from or writes to a valid but incorrect
+  memory location, resulting in an information integrity issue or an
+  information leak.
+- The application ends up running in an infinite loop or with
+  severe degradation in performance despite the input sources having
+  no such issue, resulting in a Denial of Service.  Note that
+  correct but non-performant code is not a security issue candidate;
+  this only applies to incorrect code that may result in performance
+  degradation severe enough to amount to a denial of service.
+- The application crashes due to the generated incorrect code,
+  resulting in a Denial of Service.
+
+Language runtime libraries
+--------------------------
+
+GCC also builds and distributes libraries that are intended to be
+used widely to implement runtime support for various programming
+languages.  These include the following:
+
+* libada
+* libatomic
+* libbacktrace
+* libcc1
+* libcody
+* libcpp
+* libdecnumber
+* libffi
+* libgcc
+* libgfortran
+* libgm2
+* libgo
+* libgomp
+* libitm
+* libobjc
+* libphobos
+* libquadmath
+* libssp
+* libstdc++
+
+These libraries are intended to be used in arbitrary contexts and as
+a result, bugs in these libraries 

Re: [PATCH] use *_grow_cleared rather than *_grow on vec

2023-09-28 Thread Richard Biener
On Thu, 28 Sep 2023, Jakub Jelinek wrote:

> On Thu, Sep 28, 2023 at 12:29:15PM +0200, Jakub Jelinek wrote:
> > On Thu, Sep 28, 2023 at 09:29:31AM +0000, Richard Biener wrote:
> > > > The following patch splits the bitmap_head class into a POD
> > > > struct bitmap_head_pod and bitmap_head derived from it with non-trivial
> > > > default constexpr constructor.  Most code should keep using bitmap_head
> > > > as before, bitmap_head_pod is there just for the cases where we want to
> > > > embed the bitmap head into a vector which we want to e.g. {quick,safe}_grow
> > > > and in a loop bitmap_initialize it afterwards (to avoid having to
> > > > {quick,safe}_grow_cleared them just to overwrite with bitmap_initialize).
> > > > The patch is larger than I hoped, because e.g. some code just used bitmap
> > > > and bitmap_head * or const_bitmap and const bitmap_head * interchangeably.
> > > > 
> > > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > > 
> > > OK if there are no comments indicating otherwise.
> > 
> > A counter argument against this patch would be that it weakens the intent
> > to catch uses of uninitialized bitmaps for saving a few compile time cycles.
> > If one uses
> >   bitmap_head var;
> >   bitmap_initialize (&var, NULL);
> > etc., we spend those extra cycles to initialize it and nothing is told that
> > bitmap_initialize overwrites the whole var without ever using any of its
> > elements, so DSE can't eliminate that.  And in the vec case which prompted
> > this patch it was about
> >   vec<bitmap_head> a;
> >   a.create (n);
> >   a.safe_grow (n); // vs. a.safe_grow_cleared (n);
> >   for (int i = 0; i < n; ++i)
> >     bitmap_initialize (&a[i], NULL);
> > When using bitmap_head_pod, one needs to ensure initialization without
> > help to catch such mistakes.
> 
> Here is the alternative patch which pays the small extra price while not
> undermining the checking.  Verified in all those places there is a loop
> doing bitmap_initialize immediately afterwards or worst case a few lines
> later.
> 
> With the static_assert uncommented, the remaining failures are poly_int
> related (supposedly gone with Richard S.'s poly_int patch) and the
> vect_unpromoted_value/ao_ref still unresolved cases.

OK, I like this better - it's only when we'd sparsely use the vec<>
that it's worth delaying initialization.

Richard.

> 2023-09-28  Jakub Jelinek  
> 
>   * tree-ssa-loop-im.cc (tree_ssa_lim_initialize): Use quick_grow_cleared
>   instead of quick_grow on vec members.
>   * cfganal.cc (control_dependences::control_dependences): Likewise.
>   * rtl-ssa/blocks.cc (function_info::build_info::build_info): Likewise.
>   (function_info::place_phis): Use safe_grow_cleared instead of safe_grow
>   on auto_vec vars.
>   * tree-ssa-live.cc (compute_live_vars): Use quick_grow_cleared instead
>   of quick_grow on vec var.
> 
> --- gcc/tree-ssa-loop-im.cc.jj  2023-09-28 12:06:03.527974171 +0200
> +++ gcc/tree-ssa-loop-im.cc  2023-09-28 12:38:07.028966742 +0200
> @@ -3496,13 +3496,13 @@ tree_ssa_lim_initialize (bool store_moti
>      (mem_ref_alloc (NULL, 0, UNANALYZABLE_MEM_ID));
>  
>    memory_accesses.refs_loaded_in_loop.create (number_of_loops (cfun));
> -  memory_accesses.refs_loaded_in_loop.quick_grow (number_of_loops (cfun));
> +  memory_accesses.refs_loaded_in_loop.quick_grow_cleared (number_of_loops (cfun));
>    memory_accesses.refs_stored_in_loop.create (number_of_loops (cfun));
> -  memory_accesses.refs_stored_in_loop.quick_grow (number_of_loops (cfun));
> +  memory_accesses.refs_stored_in_loop.quick_grow_cleared (number_of_loops (cfun));
>    if (store_motion)
>      {
>        memory_accesses.all_refs_stored_in_loop.create (number_of_loops (cfun));
> -      memory_accesses.all_refs_stored_in_loop.quick_grow
> +      memory_accesses.all_refs_stored_in_loop.quick_grow_cleared
>          (number_of_loops (cfun));
>      }
>  
> --- gcc/cfganal.cc.jj  2023-09-28 11:31:45.013870771 +0200
> +++ gcc/cfganal.cc  2023-09-28 12:37:34.302425957 +0200
> @@ -468,7 +468,7 @@ control_dependences::control_dependences
>  
>    bitmap_obstack_initialize (&m_bitmaps);
>    control_dependence_map.create (last_basic_block_for_fn (cfun));
> -  control_dependence_map.quick_grow (last_basic_block_for_fn (cfun));
> +  control_dependence_map.quick_grow_cleared (last_basic_block_for_fn (cfun));
>    for (int i = 0; i < last_basic_block_for_fn (cfun); ++i)
>      bitmap_initialize (&control_dependence_map[i], &m_bitmaps);
>    for (int i = 0; i < num_edges; ++i)
> --- gcc/rtl-ssa/blocks.cc.jj  2023-09-28 11:31:45.413865158 +0200
> +++ gcc/rtl-ssa/blocks.cc  2023-09-28 12:41:28.063145949 +0200
> @@ -57,7 +57,7 @@ function_info::build_info::build_info (u
>// write to an entry before reading from it.  But poison the contents
>// when checking, just to make sure we don't accidentally use an
>// 

[ARC PATCH] Split SImode shifts pre-reload on !TARGET_BARREL_SHIFTER.

2023-09-28 Thread Roger Sayle

Hi Claudiu,
It was great meeting up with you and the Synopsys ARC team at the
GNU tools Cauldron in Cambridge.

This patch is the first in a series to improve SImode and DImode
shifts and rotates in the ARC backend.  This first piece splits
SImode shifts, for !TARGET_BARREL_SHIFTER targets, after combine
and before reload, in the split1 pass, as suggested by the FIXME
comment above output_shift in arc.cc.  To do this I've copied the
implementation of the x86_pre_reload_split function from i386
backend, and renamed it arc_pre_reload_split.

Although the actual implementations of shifts remain the same
(as in output_shift), having them as explicit instructions in
the RTL stream allows better scheduling and use of compact forms
when available.  The benefits can be seen in two short examples
below.

For the function:
unsigned int foo(unsigned int x, unsigned int y) {
  return y << 2;
}

GCC with -O2 -mcpu=em would previously generate:
foo:    add     r1,r1,r1
        add     r1,r1,r1
        j_s.d   [blink]
        mov_s   r0,r1   ;4
and with this patch now generates:
foo:    asl_s   r0,r1
        j_s.d   [blink]
        asl_s   r0,r0

Notice the original (from shift_si3's output_shift) requires the
shift sequence to be monolithic with the same destination register
as the source (requiring an extra mov_s).  The new version can
eliminate this move, and schedule the second asl in the branch
delay slot of the return.
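
As a rough C model of the two sequences (not code from the patch; the function names are hypothetical and the variables simply mirror the register names in the listings above):

```c
#include <assert.h>
#include <stdint.h>

/* Old sequence: both adds work in place on r1, so a final move into
   the return register r0 is needed.  */
uint32_t
shl2_by_adds (uint32_t y)
{
  uint32_t r1 = y;
  r1 = r1 + r1;         /* add   r1,r1,r1 */
  r1 = r1 + r1;         /* add   r1,r1,r1 */
  uint32_t r0 = r1;     /* mov_s r0,r1 */
  return r0;
}

/* New sequence: the first single-bit shift already writes r0, so no
   extra move is needed and the second shift can sit in the delay slot.  */
uint32_t
shl2_by_asl (uint32_t y)
{
  uint32_t r0 = y << 1; /* asl_s r0,r1 */
  r0 = r0 << 1;         /* asl_s r0,r0 */
  return r0;
}
```

Both functions compute y << 2; the difference the patch cares about is only in how many instructions and registers the expansion ties up.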

For the function:
int x,y,z;

void bar()
{
  x <<= 3;
  y <<= 3;
  z <<= 3;
}

GCC -O2 -mcpu=em currently generates:
bar:    push_s  r13
        ld.as   r12,[gp,@x@sda] ;23
        ld.as   r3,[gp,@y@sda]  ;23
        mov     r2,0
        add3    r12,r2,r12
        mov     r2,0
        add3    r3,r2,r3
        ld.as   r2,[gp,@z@sda]  ;23
        st.as   r12,[gp,@x@sda] ;26
        mov     r13,0
        add3    r2,r13,r2
        st.as   r3,[gp,@y@sda]  ;26
        st.as   r2,[gp,@z@sda]  ;26
        j_s.d   [blink]
        pop_s   r13

where each shift by 3 uses ARC's add3 instruction, which is similar
to x86's lea, implementing x = (y<<3) + z, but requires the value zero
to be placed in a temporary register "z".  Splitting this before reload
allows these pseudos to be shared/reused.  With this patch, we get

bar:    ld.as   r2,[gp,@x@sda]  ;23
        mov_s   r3,0    ;3
        add3    r2,r3,r2
        ld.as   r3,[gp,@y@sda]  ;23
        st.as   r2,[gp,@x@sda]  ;26
        ld.as   r2,[gp,@z@sda]  ;23
        mov_s   r12,0   ;3
        add3    r3,r12,r3
        add3    r2,r12,r2
        st.as   r3,[gp,@y@sda]  ;26
        st.as   r2,[gp,@z@sda]  ;26
        j_s     [blink]

Unfortunately, register allocation means that we only share two of the
three "mov_s z,0", but this is sufficient to reduce register pressure
enough to avoid spilling r13 in the prologue/epilogue.
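
The effect of sharing the zero can be sketched in C.  This is an illustration only: add3 below models the instruction as described above (first source plus second source shifted left by 3), and shift_all_by_3 is a hypothetical stand-in for bar() that takes pointers instead of using globals.

```c
#include <assert.h>
#include <stdint.h>

/* Model of add3 as described in the text: a + (b << 3).  */
static uint32_t
add3 (uint32_t a, uint32_t b)
{
  return a + (b << 3);
}

/* Shift three values left by 3.  A single zero is materialized once
   and reused by every add3, which is what exposing the zero as a
   pseudo before reload makes possible.  */
void
shift_all_by_3 (uint32_t *x, uint32_t *y, uint32_t *z)
{
  const uint32_t zero = 0;      /* one shared "mov_s rN,0" */
  *x = add3 (zero, *x);
  *y = add3 (zero, *y);
  *z = add3 (zero, *z);
}
```

In the listing above the register allocator only manages to share the zero between two of the three add3 instructions, so the C model is the ideal case rather than what is currently generated.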

This patch also contains a (latent?) bug fix.  The implementation of
the default insn "length" attribute assumes instructions of type
"shift" have two input operands and accesses operands[2]; hence
specializations of shifts that don't have an operands[2] need to be
categorized as type "unary" (which results in the correct length).

This patch has been tested on a cross-compiler to arc-elf (hosted on
x86_64-pc-linux-gnu).  Because I have an incomplete toolchain, many
of the regression tests fail, but there are no new failures with the new
test cases added below.  If you can confirm that there are no issues
from additional testing, is this OK for mainline?

Finally a quick technical question.  ARC's zero overhead loops require
at least two instructions in the loop, so currently the backend's
implementation of shr20 pads the loop body with a "nop".

lshr20: mov.f   lp_count, 20
        lpnz    2f
        lsr     r0,r0
        nop
2:      # end single insn loop
        j_s     [blink]

could this be more efficiently implemented as:

lshr20: mov     lp_count, 10
        lp      2f
        lsr_s   r0,r0
        lsr_s   r0,r0
2:      # end single insn loop
        j_s     [blink]

i.e. half the number of iterations, but doing twice as much useful
work in each iteration?  Or might the nop be free on advanced
microarchitectures, and/or the consecutive dependent shifts cause
a pipeline stall?  It would be nice to fuse loops to implement
rotations, such that rotr16 (aka swap) would look like:

rot16:  mov_s   r1,r0
        mov     lp_count, 16
        lp      2f
        asl_s   r0,r0
        lsr_s   r1,r1
2:      # end single insn loop
        j_s.d   [blink]
        or_s    r0,r0,r1
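
To make the intended equivalence concrete, here is a small C model of the three loops.  It only checks the arithmetic; it says nothing about pipeline behaviour, which is the open question.  The function names are hypothetical and lp_count mirrors the loop counter register.

```c
#include <assert.h>
#include <stdint.h>

/* Current expansion: 20 iterations, one useful shift plus a nop each.  */
uint32_t
lshr20_padded (uint32_t r0)
{
  for (int lp_count = 20; lp_count > 0; lp_count--)
    r0 >>= 1;                   /* lsr r0,r0 ; nop */
  return r0;
}

/* Proposed expansion: 10 iterations, two useful shifts each.  */
uint32_t
lshr20_unrolled (uint32_t r0)
{
  for (int lp_count = 10; lp_count > 0; lp_count--)
    {
      r0 >>= 1;                 /* lsr_s r0,r0 */
      r0 >>= 1;                 /* lsr_s r0,r0 */
    }
  return r0;
}

/* Fused rotate by 16: shift one copy left and the other right in the
   same loop body, then combine the halves.  */
uint32_t
rot16_fused (uint32_t r0)
{
  uint32_t r1 = r0;             /* mov_s r1,r0 */
  for (int lp_count = 16; lp_count > 0; lp_count--)
    {
      r0 <<= 1;                 /* asl_s r0,r0 */
      r1 >>= 1;                 /* lsr_s r1,r1 */
    }
  return r0 | r1;               /* or_s r0,r0,r1 */
}
```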

Thanks in advance,
Roger


2023-09-28  Roger Sayle  

gcc/ChangeLog
	* config/arc/arc-protos.h (emit_shift): Delete prototype.
	(arc_pre_reload_split): New function prototype.
	* config/arc/arc.cc (emit_shift): Delete function.
	(arc_pre_reload_split): New predicate function, copied from i386,
	to schedule define_insn_and_split splitters to the split1 pass.
	* config/arc/arc.md (ashlsi3): Expand RTL template unconditionally.
	(ashrsi3): Likewise.
	(lshrsi3): Likewise.

Re: Test with an lto-build of libgfortran.

2023-09-28 Thread Jakub Jelinek
On Thu, Sep 28, 2023 at 01:00:41PM +0200, Tobias Burnus wrote:
> I am not aware of any logical/integer/real(+complex)/character kind > 16,
> except for this PPC one. And complex numbers are pairs of BT_REAL.
> 
> Thus, I think that patch should be fine - except:
> 
> > Does anything error earlier if it is larger?  I mean, say user calling
> > _gfortran_transfer_integer by hand with kind 1024?
> 
> I think this will fail. We have various ways to deal with this in libgfortran;
> I see some cases where the switch "default:" sets the length to 0; we have
> other places where we use an "assert"; I think we have other places where
> we run into UB.
> 
> Thus, one option would be to either 'assert(len <= 16)' or
> 'assert((size_t)len < GFC_OTOA_BUF_SIZE - 1)' instead.
> 
> Or we could handle it as len=0 and silently ignore the output or ...
> 
> I am fine with any of the many options - except that I like something
> explicit involving 'len' and a comparison (unreachable, assert, regarding as
> len = 0) better than the existing warning suppression which is too indirect for
> me. (Besides: it does not work for LTO.) Preferences?
>
> Tobias

Let's go with the __builtin_unreachable, ok for trunk.

Jakub



Re: Test with an lto-build of libgfortran.

2023-09-28 Thread Tobias Burnus

(replace gcc@ by gcc-patches@; see
https://gcc.gnu.org/pipermail/gcc/2023-September/242591.html
and other emails in that thread)

On 28.09.23 11:51, Jakub Jelinek wrote:

On Thu, Sep 28, 2023 at 09:29:02AM +0200, Tobias Burnus wrote:

On 28.09.23 08:25, Richard Biener via Fortran wrote:


This particular place in libgfortran has

/* write_z, which calls xtoa_big, is called from transfer.c,
   formatted_transfer_scalar_write.  There it is passed the kind as
   argument, which means a maximum of 16.  The buffer is large
   enough, but the compiler does not know that, so shut up the
   warning here.  */

...

I have replaced it now by the assert that "len <= 16", i.e.
+  if (len > 16)
+    __builtin_unreachable ();

Is it just that in correct programs len can't be > 16, or that it is really
impossible for it being > 16?  I mean, we have that artificial kind 17 for
powerpc which better should be turned into length of 16, but isn't e.g.
_gfortran_transfer_integer etc.


My understanding is that kind=17 only pops up on PowerPC
for REAL variables as they represent __float128 in multiple ways.

Having said that, the current call tree is:

* xtoa_big: that's where the warning suppression
  was replaced by the unreachable.

* Only caller is 'write_z', which calls it by passing its
  last argument ('len') as last argument ('len')

* "internal_proto(write_z)" implies that it is not called from
  outside libgfortran. The internal only caller is:

*  formatted_transfer_scalar_write, which calls it as:

    case FMT_Z:
      ...
#ifdef HAVE_GFC_REAL_17
      if (type == BT_REAL && kind == 17)
        kind = 16;
#endif
      write_z (dtp, f, p, kind);

I am not aware of any logical/integer/real(+complex)/character kind > 16,
except for this PPC one. And complex numbers are pairs of BT_REAL.

Thus, I think that patch should be fine - except:


Does anything error earlier if it is larger?  I mean, say user calling
_gfortran_transfer_integer by hand with kind 1024?


I think this will fail. We have various ways to deal with this in libgfortran;
I see some cases where the switch "default:" sets the length to 0; we have
other places where we use an "assert"; I think we have other places where
we run into UB.

Thus, one option would be to either 'assert(len <= 16)' or
'assert((size_t)len < GFC_OTOA_BUF_SIZE - 1)' instead.

Or we could handle it as len=0 and silently ignore the output or ...

I am fine with any of the many options - except that I like something
explicit involving 'len' and a comparison (unreachable, assert, regarding as
len = 0) better than the existing warning suppression which is too indirect for
me. (Besides: it does not work for LTO.) Preferences?

Tobias
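
For illustration, the "explicit comparison on len" variant can be sketched in isolation as follows.  This is not the libgfortran code: hex_from_bytes, clamp_real_kind, MAX_KIND and HEX_BUF_SIZE are hypothetical names, and the byte order and buffer size are assumptions made for the example.  Only the 'if (len > 16) __builtin_unreachable ()' idiom and the kind 17 -> 16 mapping are taken from the discussion above.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed limits for this sketch: at most 16 bytes in, two hex digits
   per byte plus a terminating NUL out.  */
enum { MAX_KIND = 16, HEX_BUF_SIZE = 2 * MAX_KIND + 1 };

/* Write LEN bytes from S as upper-case hex digits into BUFFER, which
   must hold at least HEX_BUF_SIZE bytes.  */
const char *
hex_from_bytes (const unsigned char *s, char *buffer, int len)
{
  static const char digits[] = "0123456789ABCDEF";

  /* Tell the optimizer (GCC/Clang builtin) that callers never pass more
     than MAX_KIND bytes.  Passing a larger LEN is undefined behaviour,
     not a checked error.  */
  if (len > MAX_KIND)
    __builtin_unreachable ();

  char *q = buffer;
  for (int i = 0; i < len; i++)
    {
      *q++ = digits[s[i] >> 4];
      *q++ = digits[s[i] & 0xF];
    }
  *q = '\0';
  return buffer;
}

/* Mirror of the caller-side mapping quoted above: the artificial
   REAL kind 17 is passed on as a length of 16.  */
int
clamp_real_kind (int is_real, int kind)
{
  return (is_real && kind == 17) ? 16 : kind;
}
```

Unlike an assert, the unreachable hint does nothing at run time if the bound is violated, which is why it only fits when every caller is known, as in the call tree listed above.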



[PATCH] use *_grow_cleared rather than *_grow on vec

2023-09-28 Thread Jakub Jelinek
On Thu, Sep 28, 2023 at 12:29:15PM +0200, Jakub Jelinek wrote:
> On Thu, Sep 28, 2023 at 09:29:31AM +0000, Richard Biener wrote:
> > > The following patch splits the bitmap_head class into a POD
> > > struct bitmap_head_pod and bitmap_head derived from it with non-trivial
> > > default constexpr constructor.  Most code should keep using bitmap_head
> > > as before, bitmap_head_pod is there just for the cases where we want to
> > > embed the bitmap head into a vector which we want to e.g. {quick,safe}_grow
> > > and in a loop bitmap_initialize it afterwards (to avoid having to
> > > {quick,safe}_grow_cleared them just to overwrite with bitmap_initialize).
> > > The patch is larger than I hoped, because e.g. some code just used bitmap
> > > and bitmap_head * or const_bitmap and const bitmap_head * interchangeably.
> > > 
> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > 
> > OK if there are no comments indicating otherwise.
> 
> A counter argument against this patch would be that it weakens the intent
> to catch uses of uninitialized bitmaps for saving a few compile time cycles.
> If one uses
>   bitmap_head var;
>   bitmap_initialize (&var, NULL);
> etc., we spend those extra cycles to initialize it and nothing is told that
> bitmap_initialize overwrites the whole var without ever using any of its
> elements, so DSE can't eliminate that.  And in the vec case which prompted
> this patch it was about
>   vec<bitmap_head> a;
>   a.create (n);
>   a.safe_grow (n); // vs. a.safe_grow_cleared (n);
>   for (int i = 0; i < n; ++i)
>     bitmap_initialize (&a[i], NULL);
> When using bitmap_head_pod, one needs to ensure initialization without
> help to catch such mistakes.

Here is the alternative patch which pays the small extra price while not
undermining the checking.  Verified in all those places there is a loop
doing bitmap_initialize immediately afterwards or worst case a few lines
later.

With the static_assert uncommented, the remaining failures are poly_int
related (supposedly gone with Richard S.'s poly_int patch) and the
vect_unpromoted_value/ao_ref still unresolved cases.
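
The trade-off can be modelled in plain C.  This is a sketch rather than GCC's vec: struct bm_vec and the bm_* functions are hypothetical names, and the 0xCD fill merely stands in for whatever garbage an uncleared grow would leave behind.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an element with a non-trivial constructor.  */
struct bitmap_head_model
{
  void *first;
  unsigned int indx;
  int ready;
};

struct bm_vec
{
  struct bitmap_head_model *data;
  size_t len;
};

/* Grow V to N elements and fill the new tail with byte FILL.
   Returns 0 on success, -1 if allocation fails.  */
static int
bm_vec_resize (struct bm_vec *v, size_t n, int fill)
{
  assert (n > v->len);
  struct bitmap_head_model *p = realloc (v->data, n * sizeof *p);
  if (p == NULL)
    return -1;
  memset (p + v->len, fill, (n - v->len) * sizeof *p);
  v->data = p;
  v->len = n;
  return 0;
}

/* Model of *_grow: the new elements hold garbage until initialized.  */
int
bm_vec_grow (struct bm_vec *v, size_t n)
{
  return bm_vec_resize (v, n, 0xCD);
}

/* Model of *_grow_cleared: the new elements start out zeroed.  */
int
bm_vec_grow_cleared (struct bm_vec *v, size_t n)
{
  return bm_vec_resize (v, n, 0);
}

/* Model of bitmap_initialize: overwrites every field.  */
void
bm_initialize (struct bitmap_head_model *b)
{
  b->first = NULL;
  b->indx = 0;
  b->ready = 1;
}
```

With the cleared variant every element is written twice when an initialization loop follows immediately, which is the small extra price mentioned above; in exchange, nothing ever observes an element in a garbage state.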

2023-09-28  Jakub Jelinek  

* tree-ssa-loop-im.cc (tree_ssa_lim_initialize): Use quick_grow_cleared
instead of quick_grow on vec members.
* cfganal.cc (control_dependences::control_dependences): Likewise.
* rtl-ssa/blocks.cc (function_info::build_info::build_info): Likewise.
(function_info::place_phis): Use safe_grow_cleared instead of safe_grow
on auto_vec vars.
* tree-ssa-live.cc (compute_live_vars): Use quick_grow_cleared instead
of quick_grow on vec var.

--- gcc/tree-ssa-loop-im.cc.jj  2023-09-28 12:06:03.527974171 +0200
+++ gcc/tree-ssa-loop-im.cc 2023-09-28 12:38:07.028966742 +0200
@@ -3496,13 +3496,13 @@ tree_ssa_lim_initialize (bool store_moti
 (mem_ref_alloc (NULL, 0, UNANALYZABLE_MEM_ID));
 
   memory_accesses.refs_loaded_in_loop.create (number_of_loops (cfun));
-  memory_accesses.refs_loaded_in_loop.quick_grow (number_of_loops (cfun));
+  memory_accesses.refs_loaded_in_loop.quick_grow_cleared (number_of_loops (cfun));
   memory_accesses.refs_stored_in_loop.create (number_of_loops (cfun));
-  memory_accesses.refs_stored_in_loop.quick_grow (number_of_loops (cfun));
+  memory_accesses.refs_stored_in_loop.quick_grow_cleared (number_of_loops (cfun));
   if (store_motion)
 {
   memory_accesses.all_refs_stored_in_loop.create (number_of_loops (cfun));
-  memory_accesses.all_refs_stored_in_loop.quick_grow
+  memory_accesses.all_refs_stored_in_loop.quick_grow_cleared
  (number_of_loops (cfun));
 }
 
--- gcc/cfganal.cc.jj   2023-09-28 11:31:45.013870771 +0200
+++ gcc/cfganal.cc  2023-09-28 12:37:34.302425957 +0200
@@ -468,7 +468,7 @@ control_dependences::control_dependences
 
   bitmap_obstack_initialize (&m_bitmaps);
   control_dependence_map.create (last_basic_block_for_fn (cfun));
-  control_dependence_map.quick_grow (last_basic_block_for_fn (cfun));
+  control_dependence_map.quick_grow_cleared (last_basic_block_for_fn (cfun));
   for (int i = 0; i < last_basic_block_for_fn (cfun); ++i)
     bitmap_initialize (&control_dependence_map[i], &m_bitmaps);
   for (int i = 0; i < num_edges; ++i)
--- gcc/rtl-ssa/blocks.cc.jj2023-09-28 11:31:45.413865158 +0200
+++ gcc/rtl-ssa/blocks.cc   2023-09-28 12:41:28.063145949 +0200
@@ -57,7 +57,7 @@ function_info::build_info::build_info (u
   // write to an entry before reading from it.  But poison the contents
   // when checking, just to make sure we don't accidentally use an
   // uninitialized value.
-  bb_phis.quick_grow (num_bb_indices);
+  bb_phis.quick_grow_cleared (num_bb_indices);
   bb_mem_live_out.quick_grow (num_bb_indices);
   bb_to_rpo.quick_grow (num_bb_indices);
   if (flag_checking)
@@ -614,7 +614,7 @@ function_info::place_phis (build_info 
 
   // Calculate dominance frontiers.
   auto_vec<bitmap_head> frontiers;
-  

Re: [PATCH v2] AArch64: Fix memmove operand corruption [PR111121]

2023-09-28 Thread Richard Sandiford
Wilco Dijkstra  writes:
> A MOPS memmove may corrupt registers since there is no copy of the input
> operands to temporary registers.  Fix this by calling
> aarch64_expand_cpymem_mops.
>
> Passes regress/bootstrap, OK for commit?
>
> gcc/ChangeLog/
> PR target/111121
> * config/aarch64/aarch64.md (aarch64_movmemdi): Add new expander.
> (movmemdi): Call aarch64_expand_cpymem_mops for correct expansion.
> * config/aarch64/aarch64.cc (aarch64_expand_cpymem_mops): Add support
> for memmove.
> * config/aarch64/aarch64-protos.h (aarch64_expand_cpymem_mops): Add new
> function.
>
> gcc/testsuite/ChangeLog/
> PR target/111121
> * gcc.target/aarch64/mops_4.c: Add memmove testcases.

OK, thanks.  Also OK for whichever branches need it.

Sorry for the slow review, too much email backlog :(

Richard
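
As a rough illustration of the bug class (not the AArch64 implementation): the new pattern in the patch sets the size operand to zero and clobbers both address registers, so any value still needed afterwards has to live in a separate temporary.  The C model below makes that explicit.  The function names are hypothetical, and advancing the pointers to the end of the buffers is an assumption of the model; the pattern itself only says the registers are clobbered.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Model of an operation that consumes its operands: after the move,
   the size is zero and both pointers have changed.  */
void
mops_move_model (unsigned char **dst, unsigned char **src, size_t *size)
{
  memmove (*dst, *src, *size);
  *dst += *size;        /* assumption: final pointer values */
  *src += *size;
  *size = 0;
}

/* Expansion that preserves the caller's values: copy each operand to
   a temporary first and let the operation destroy only the copies.  */
void
move_preserving_operands (unsigned char *dst, unsigned char *src, size_t size)
{
  unsigned char *dst_tmp = dst;
  unsigned char *src_tmp = src;
  size_t size_tmp = size;
  mops_move_model (&dst_tmp, &src_tmp, &size_tmp);
}
```

In the patch the copying is done by routing the movmemdi expander through aarch64_expand_cpymem_mops, which for the size uses copy_to_mode_reg, as shown in the quoted hunk below.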

>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index 70303d6fd953e0c397b9138ede8858c2db2e53db..e8d91cba30e32e03c4794ccc24254691d135f2dd 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -765,6 +765,7 @@ bool aarch64_emit_approx_div (rtx, rtx, rtx);
>  bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
>  tree aarch64_vector_load_decl (tree);
>  void aarch64_expand_call (rtx, rtx, rtx, bool);
> +bool aarch64_expand_cpymem_mops (rtx *, bool);
>  bool aarch64_expand_cpymem (rtx *);
>  bool aarch64_expand_setmem (rtx *);
>  bool aarch64_float_const_zero_rtx_p (rtx);
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 219c4ee6d4cd7522f6ad634c794485841e5d08fa..dd6874d13a75f20d10a244578afc355b25c73da2 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -25228,10 +25228,11 @@ aarch64_copy_one_block_and_progress_pointers (rtx *src, rtx *dst,
>*dst = aarch64_progress_pointer (*dst);
>  }
>
> -/* Expand a cpymem using the MOPS extension.  OPERANDS are taken
> -   from the cpymem pattern.  Return true iff we succeeded.  */
> -static bool
> -aarch64_expand_cpymem_mops (rtx *operands)
> +/* Expand a cpymem/movmem using the MOPS extension.  OPERANDS are taken
> +   from the cpymem/movmem pattern.  IS_MEMMOVE is true if this is a memmove
> +   rather than memcpy.  Return true iff we succeeded.  */
> +bool
> +aarch64_expand_cpymem_mops (rtx *operands, bool is_memmove = false)
>  {
>if (!TARGET_MOPS)
>  return false;
> @@ -25243,8 +25244,10 @@ aarch64_expand_cpymem_mops (rtx *operands)
>rtx dst_mem = replace_equiv_address (operands[0], dst_addr);
>rtx src_mem = replace_equiv_address (operands[1], src_addr);
>rtx sz_reg = copy_to_mode_reg (DImode, operands[2]);
> -  emit_insn (gen_aarch64_cpymemdi (dst_mem, src_mem, sz_reg));
> -
> +  if (is_memmove)
> +emit_insn (gen_aarch64_movmemdi (dst_mem, src_mem, sz_reg));
> +  else
> +emit_insn (gen_aarch64_cpymemdi (dst_mem, src_mem, sz_reg));
>return true;
>  }
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 60133b541e9289610ce58116b0258a61f29bdc00..6d0f072a9dd6d094e8764a513222a9129d8296fa 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -1635,7 +1635,22 @@ (define_expand "cpymemdi"
>  }
>  )
>
> -(define_insn "aarch64_movmemdi"
> +(define_expand "aarch64_movmemdi"
> +  [(parallel
> + [(set (match_operand 2) (const_int 0))
> +  (clobber (match_dup 3))
> +  (clobber (match_dup 4))
> +  (clobber (reg:CC CC_REGNUM))
> +  (set (match_operand 0)
> +  (unspec:BLK [(match_operand 1) (match_dup 2)] UNSPEC_MOVMEM))])]
> +  "TARGET_MOPS"
> +  {
> +operands[3] = XEXP (operands[0], 0);
> +operands[4] = XEXP (operands[1], 0);
> +  }
> +)
> +
> +(define_insn "*aarch64_movmemdi"
>[(parallel [
> (set (match_operand:DI 2 "register_operand" "+") (const_int 0))
> (clobber (match_operand:DI 0 "register_operand" "+"))
> @@ -1668,17 +1683,9 @@ (define_expand "movmemdi"
> && INTVAL (sz_reg) < aarch64_mops_memmove_size_threshold)
>   FAIL;
>
> -   rtx addr_dst = XEXP (operands[0], 0);
> -   rtx addr_src = XEXP (operands[1], 0);
> -
> -   if (!REG_P (sz_reg))
> - sz_reg = force_reg (DImode, sz_reg);
> -   if (!REG_P (addr_dst))
> - addr_dst = force_reg (DImode, addr_dst);
> -   if (!REG_P (addr_src))
> - addr_src = force_reg (DImode, addr_src);
> -   emit_insn (gen_aarch64_movmemdi (addr_dst, addr_src, sz_reg));
> -   DONE;
> +  if (aarch64_expand_cpymem_mops (operands, true))
> +DONE;
> +  FAIL;
>  }
>  )
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/mops_4.c 
> b/gcc/testsuite/gcc.target/aarch64/mops_4.c
> index 
> 1b87759cb5e8bbcbb58cf63404d1d579d44b2818..dd796115cb4093251964d881e93bf4b98ade0c32
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/mops_4.c
> +++ b/gcc/testsuite/gcc.target/aarch64/mops_4.c
> @@ -50,6 

Re: [PATCH] bitmap: Introduce bitmap_head_pod

2023-09-28 Thread Jakub Jelinek
On Thu, Sep 28, 2023 at 09:29:31AM +, Richard Biener wrote:
> > The following patch splits the bitmap_head class into a POD
> > struct bitmap_head_pod and bitmap_head derived from it with non-trivial
> > default constexpr constructor.  Most code should keep using bitmap_head
> > as before, bitmap_head_pod is there just for the cases where we want to
> > embed the bitmap head into a vector which we want to e.g. {quick,safe}_grow
> > and in a loop bitmap_initialize it afterwards (to avoid having to
> > {quick,safe}_grow_cleared them just to overwrite with bitmap_initialize).
> > The patch is larger than I hoped, because e.g. some code just used bitmap
> > and bitmap_head * or const_bitmap and const bitmap_head * interchangeably.
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> OK if there are no comments indicating otherwise.

A counter argument against this patch would be that it weakens the intent
to catch uses of uninitialized bitmaps for saving a few compile time cycles.
If one uses
  bitmap_head var;
  bitmap_initialize (&var, NULL);
etc., we spend those extra cycles to initialize it and nothing is told that
bitmap_initialize overwrites the whole var without ever using any of its
elements, so DSE can't eliminate that.  And in the vec case which prompted
this patch it was about
  vec<bitmap_head> a;
  a.create (n);
  a.safe_grow (n); // vs. a.safe_grow_cleared (n);
  for (int i = 0; i < n; ++i)
    bitmap_initialize (&a[i], NULL);
When using bitmap_head_pod, one needs to ensure initialization without
help to catch such mistakes.
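
To make the trade-off concrete, here is a minimal sketch with hypothetical
stand-in types (head_pod, head and head_initialize are illustrative names,
not the real bitmap.h declarations).  The POD variant can be allocated
without running any constructor code, but nothing protects the elements
until the initialization loop has run:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>

// Hypothetical stand-in for a POD bitmap head: default-initialization
// runs no code, so the fields start out indeterminate.
struct head_pod
{
  void *first;
  unsigned indx;
};

// Hypothetical stand-in for the derived type with a non-trivial default
// constructor; every default-constructed object starts in a known state.
struct head : head_pod
{
  head () : head_pod {nullptr, 0} {}
};

// Hypothetical analogue of bitmap_initialize: overwrites every field
// without reading any of them.
inline void
head_initialize (head_pod *h)
{
  h->first = nullptr;
  h->indx = 0;
}

static_assert (std::is_trivially_default_constructible<head_pod>::value,
	       "the POD variant needs no constructor call");
static_assert (!std::is_trivially_default_constructible<head>::value,
	       "the derived variant always runs its constructor");

// Grow-then-initialize pattern: allocation is cheap, but the caller is
// solely responsible for the loop that makes the elements usable.
inline std::unique_ptr<head_pod[]>
make_heads (std::size_t n)
{
  std::unique_ptr<head_pod[]> a (new head_pod[n]);
  for (std::size_t i = 0; i < n; ++i)
    head_initialize (&a[i]);
  return a;
}
```

Dropping the loop in make_heads would still compile with head_pod, which
is exactly the class of mistake the non-trivial constructor guards against.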

Jakub



[PATCH v3 2/2] LoongArch: Modify check_effective_target_vect_int_mod according to SX/ASX capabilities.

2023-09-28 Thread Chenghui Pan
gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add LoongArch in
check_effective_target_vect_int_mod according to SX/ASX capabilities.
---
 gcc/testsuite/lib/target-supports.exp | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 25958aaf0c5..5632904ddfd 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -8747,6 +8747,8 @@ proc check_effective_target_vect_int_mod { } {
 return [check_cached_effective_target_indexed vect_int_mod {
   expr { ([istarget powerpc*-*-*]
  && [check_effective_target_has_arch_pwr10])
+ || ([istarget loongarch*-*-*]
+ && [check_effective_target_loongarch_sx])
  || [istarget amdgcn-*-*] }}]
 }
 
@@ -12824,6 +12826,14 @@ proc 
check_effective_target_const_volatile_readonly_section { } {
   return 1
 }
 
+proc check_effective_target_loongarch_sx { } {
+return [check_no_compiler_messages loongarch_lsx assembly {
+   #if !defined(__loongarch_sx)
+   #error "LSX not defined"
+   #endif
+}]
+}
+
 proc check_effective_target_loongarch_sx_hw { } {
 return [check_runtime loongarch_sx_hw {
    #include <lsxintrin.h>
@@ -12836,6 +12846,14 @@ proc check_effective_target_loongarch_sx_hw { } {
 } "-mlsx"]
 }
 
+proc check_effective_target_loongarch_asx { } {
+return [check_no_compiler_messages loongarch_asx assembly {
+   #if !defined(__loongarch_asx)
+   #error "LASX not defined"
+   #endif
+}]
+}
+
 proc check_effective_target_loongarch_asx_hw { } {
 return [check_runtime loongarch_asx_hw {
    #include <lasxintrin.h>
-- 
2.36.0



[PATCH v3 0/2] LoongArch: Update target-supports.exp for LoongArch SX/ASX.

2023-09-28 Thread Chenghui Pan
This is the update of: 
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631379.html

This version contains no code changes; it only fixes the commit title
format and adds the missing PR info.

Chenghui Pan (2):
  LoongArch: Enable vect.exp for LoongArch. [PR111424]
  LoongArch: Modify check_effective_target_vect_int_mod according to
SX/ASX capabilities.

 gcc/testsuite/lib/target-supports.exp | 49 +++
 1 file changed, 49 insertions(+)

-- 
2.36.0



[PATCH v3 1/2] LoongArch: Enable vect.exp for LoongArch. [PR111424]

2023-09-28 Thread Chenghui Pan
gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Enable vect.exp for LoongArch.
---
 gcc/testsuite/lib/target-supports.exp | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 3a472943a9b..25958aaf0c5 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -11335,6 +11335,13 @@ proc check_vect_support_and_set_flags { } {
lappend DEFAULT_VECTCFLAGS "--param" "riscv-vector-abi"
set dg-do-what-default compile
}
+} elseif [istarget loongarch*-*-*] {
+  lappend DEFAULT_VECTCFLAGS "-mdouble-float" "-mlasx"
+  if [check_effective_target_loongarch_asx_hw] {
+ set dg-do-what-default run
+  } else {
+ set dg-do-what-default compile
+  }
 } else {
 return 0
 }
@@ -12817,6 +12824,30 @@ proc 
check_effective_target_const_volatile_readonly_section { } {
   return 1
 }
 
+proc check_effective_target_loongarch_sx_hw { } {
+return [check_runtime loongarch_sx_hw {
+   #include <lsxintrin.h>
+   int main (void)
+   {
+ __m128i a, b, c;
+ c = __lsx_vand_v (a, b);
+ return 0;
+   }
+} "-mlsx"]
+}
+
+proc check_effective_target_loongarch_asx_hw { } {
+return [check_runtime loongarch_asx_hw {
+   #include <lasxintrin.h>
+   int main (void)
+   {
+ __m256i a, b, c;
+ c = __lasx_xvand_v (a, b);
+ return 0;
+   }
+} "-mlasx"]
+}
+
 # Appends necessary Python flags to extra-tool-flags if Python.h is supported.
 # Otherwise, modifies dg-do-what.
 proc dg-require-python-h { args } {
-- 
2.36.0



[PATCH] Remove poly_int_pod

2023-09-28 Thread Richard Sandiford
poly_int was written before the switch to C++11 and so couldn't
use explicit default constructors.  This led to an awkward split
between poly_int_pod and poly_int.  poly_int simply inherited from
poly_int_pod and added constructors, with the argumentless constructor
having an empty body.  But inheritance meant that poly_int had to
repeat the assignment operators from poly_int_pod (again, no C++11,
so no "using" to inherit base-class implementations).

All that goes away if we switch to using default constructors.
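
As a rough illustration of the difference (a simplified sketch with
hypothetical names coeffs_pod, coeffs_old and coeffs_new, not the real
poly-int.h classes), the pre-C++11 split needs a base class plus a derived
class that repeats the assignment operator, while a defaulted constructor
keeps a single class trivially default constructible:

```cpp
#include <cassert>
#include <type_traits>

// Pre-C++11 style: a POD base that holds the data and the operators...
template <unsigned N, typename C>
struct coeffs_pod
{
  coeffs_pod &operator= (const C &a)
  {
    coeffs[0] = a;
    for (unsigned i = 1; i < N; ++i)
      coeffs[i] = C ();
    return *this;
  }

  C coeffs[N];
};

// ...and a derived class that only adds constructors.  The derived
// operator= hides the base one, so it has to be repeated by hand.
template <unsigned N, typename C>
struct coeffs_old : coeffs_pod<N, C>
{
  coeffs_old () {}
  coeffs_old (const C &a) { *this = a; }

  coeffs_old &operator= (const C &a)
  {
    coeffs_pod<N, C>::operator= (a);
    return *this;
  }
};

// C++11 style: one class, with the default constructor defaulted.
template <unsigned N, typename C>
struct coeffs_new
{
  coeffs_new () = default;
  // Remaining array elements are value-initialized, i.e. zero.
  coeffs_new (const C &a) : coeffs {a} {}

  C coeffs[N];
};

static_assert (!std::is_trivially_default_constructible<
		 coeffs_old<2, int> >::value,
	       "an empty user-provided constructor is still non-trivial");
static_assert (std::is_trivially_default_constructible<
		 coeffs_new<2, int> >::value,
	       "a defaulted constructor keeps the type trivial");
```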

The main complication is ensuring that braced initialisation still
gives a constexpr, so that static variables can be initialised without
runtime code.  The two problems here are:

(1) When initialising a poly_int<N, C> with fewer than N
coefficients, the other coefficients need to be a zero of
the same precision as the explicit coefficients.  This was
previously done in a for loop using wi::ints_for<...>::zero,
but C++11 constexpr constructors can't have function bodies.
The patch instead uses a series of delegated initialisers to
fill in the implicit coefficients.

(2) The initialisation in:

  void f(int x) {
unsigned int foo {x};
  }

produces the warning:

  warning: narrowing conversion of 'x' from 'int' to 'unsigned int' [-Wnarrowing]

whereas:

  void f(int x) {
unsigned int foo = x;
  }

does not.  So switching to direct initialisation of the coeffs array
would mean that:

  poly_uint64 x = 0;

would trigger a warning for using 0 rather than 0u.  That seemed
overly pedantic, so the patch adds explicit casts to the constructor.
The complication is to do that without adding extra code to
wide-int versions.  The patch uses a new init_cast type for that.
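
Both points can be sketched in a few lines.  This is only an illustration
under simplifying assumptions: poly, fullness, full_tag and hungry_tag are
hypothetical names, the padding zero is a plain C (0) rather than a zero of
matching wide-int precision, and a static_cast stands in for init_cast.
The constructors have empty bodies, so they remain valid C++11 constexpr:

```cpp
#include <cassert>
#include <type_traits>

// Tag types that select between the two private constructors.
struct full_tag {};
struct hungry_tag {};

template <bool Full> struct fullness { typedef hungry_tag type; };
template <> struct fullness<true> { typedef full_tag type; };

template <unsigned N, typename C>
class poly
{
public:
  poly () = default;

  // Accept between 1 and N coefficients and dispatch on whether all N
  // were supplied.
  template <typename... Cs,
	    typename = typename std::enable_if<(sizeof... (Cs) >= 1
						 && sizeof... (Cs) <= N)>::type>
  constexpr poly (const Cs &... cs)
    : poly (typename fullness<sizeof... (Cs) == N>::type (), cs...) {}

  C coeffs[N];

private:
  // All N coefficients are present: initialize the array directly.  The
  // explicit cast is what avoids -Wnarrowing for e.g. int -> unsigned.
  template <typename... Cs>
  constexpr poly (const full_tag &, const Cs &... cs)
    : coeffs {static_cast<C> (cs)...} {}

  // Fewer than N coefficients: append one zero and try again via the
  // public constructor, which re-checks the count.
  template <typename... Cs>
  constexpr poly (const hungry_tag &, const Cs &... cs)
    : poly (cs..., C (0)) {}
};
```

With this, poly<2, unsigned long> x = 0; compiles without a narrowing
diagnostic and can still be evaluated at compile time.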

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  Also tested with
Jakub's vec.h patch with the static_asserts uncommented; there were
no errors from poly_int-related stuff.  OK to install?

Richard


gcc/
* poly-int.h (poly_int_pod): Delete.
(poly_coeff_traits::init_cast): New type.
(poly_int_full, poly_int_hungry, poly_int_fullness): New structures.
(poly_int): Replace constructors that take 1 and 2 coefficients with
a general one that takes an arbitrary number of coefficients.
Delegate initialization to two new private constructors, one of
which uses the coefficients as-is and one of which adds an extra
zero of the appropriate type (and precision, where applicable).
(gt_ggc_mx, gt_pch_nx): Operate on poly_ints rather than poly_int_pods.
* poly-int-types.h (poly_uint16_pod, poly_int64_pod, poly_uint64_pod)
(poly_offset_int_pod, poly_wide_int_pod, poly_widest_int_pod): Delete.
* gengtype.cc (main): Don't register poly_int64_pod.
* calls.cc (initialize_argument_information): Use poly_int rather
than poly_int_pod.
(combine_pending_stack_adjustment_and_call): Likewise.
* config/aarch64/aarch64.cc (pure_scalable_type_info): Likewise.
* data-streamer.h (bp_unpack_poly_value): Likewise.
* dwarf2cfi.cc (struct dw_trace_info): Likewise.
(struct queued_reg_save): Likewise.
* dwarf2out.h (struct dw_cfa_location): Likewise.
* emit-rtl.h (struct incoming_args): Likewise.
(struct rtl_data): Likewise.
* expr.cc (get_bit_range): Likewise.
(get_inner_reference): Likewise.
* expr.h (get_bit_range): Likewise.
* fold-const.cc (split_address_to_core_and_offset): Likewise.
(ptr_difference_const): Likewise.
* fold-const.h (ptr_difference_const): Likewise.
* function.cc (try_fit_stack_local): Likewise.
(instantiate_new_reg): Likewise.
* function.h (struct expr_status): Likewise.
(struct args_size): Likewise.
* genmodes.cc (ZERO_COEFFS): Likewise.
(mode_size_inline): Likewise.
(mode_nunits_inline): Likewise.
(emit_mode_precision): Likewise.
(emit_mode_size): Likewise.
(emit_mode_nunits): Likewise.
* gimple-fold.cc (get_base_constructor): Likewise.
* gimple-ssa-store-merging.cc (struct symbolic_number): Likewise.
* inchash.h (class hash): Likewise.
* ipa-modref-tree.cc (modref_access_node::dump): Likewise.
* ipa-modref.cc (modref_access_analysis::merge_call_side_effects):
Likewise.
* ira-int.h (ira_spilled_reg_stack_slot): Likewise.
* lra-eliminations.cc (self_elim_offsets): Likewise.
* machmode.h (mode_size, mode_precision, mode_nunits): Likewise.
* omp-low.cc (omplow_simd_context): Likewise.
* pretty-print.cc (pp_wide_integer): Likewise.
* pretty-print.h (pp_wide_integer): Likewise.
* reload.cc (struct decomposition): Likewise.
* reload.h (struct reload): Likewise.
* reload1.cc (spill_stack_slot_width): Likewise.
(struct elim_table): Likewise.
 

[pushed] Remove some unused poly_int variables

2023-09-28 Thread Richard Sandiford
Switching to default constructors for poly_int exposed some
unused variables that weren't previously diagnosed.

Tested on aarch64-linux-gnu & x86_64-linux-gnu, pushed as obvious.

Richard


gcc/
* dwarf2out.cc (mem_loc_descriptor): Remove unused variables.
* tree-affine.cc (expr_to_aff_combination): Likewise.

gcc/cp/
* constexpr.cc (cxx_fold_indirect_ref): Remove unused variables.

gcc/rust/
* backend/rust-constexpr.cc (rs_fold_indirect_ref): Remove unused
variables.
---
 gcc/cp/constexpr.cc| 1 -
 gcc/dwarf2out.cc   | 1 -
 gcc/rust/backend/rust-constexpr.cc | 1 -
 gcc/tree-affine.cc | 1 -
 4 files changed, 4 deletions(-)

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 2a6601c0cbc..0f948db7c2d 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -5643,7 +5643,6 @@ cxx_fold_indirect_ref (const constexpr_ctx *ctx, 
location_t loc, tree type,
 {
   tree sub = op0;
   tree subtype;
-  poly_uint64 const_op01;
 
   /* STRIP_NOPS, but stop if REINTERPRET_CAST_P.  */
   while (CONVERT_EXPR_P (sub) || TREE_CODE (sub) == NON_LVALUE_EXPR
diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index f60a0656d8f..ad2be7c961a 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -15967,7 +15967,6 @@ mem_loc_descriptor (rtx rtl, machine_mode mode,
   enum dwarf_location_atom op;
   dw_loc_descr_ref op0, op1;
   rtx inner = NULL_RTX;
-  poly_int64 offset;
 
   if (mode == VOIDmode)
 mode = GET_MODE (rtl);
diff --git a/gcc/rust/backend/rust-constexpr.cc 
b/gcc/rust/backend/rust-constexpr.cc
index 4e581a3f2cf..b28fa27b2d0 100644
--- a/gcc/rust/backend/rust-constexpr.cc
+++ b/gcc/rust/backend/rust-constexpr.cc
@@ -737,7 +737,6 @@ rs_fold_indirect_ref (const constexpr_ctx *ctx, location_t 
loc, tree type,
 {
   tree sub = op0;
   tree subtype;
-  poly_uint64 const_op01;
 
   /* STRIP_NOPS, but stop if REINTERPRET_CAST_P.  */
   while (CONVERT_EXPR_P (sub) || TREE_CODE (sub) == NON_LVALUE_EXPR
diff --git a/gcc/tree-affine.cc b/gcc/tree-affine.cc
index ee327e63a23..ecab4671ab4 100644
--- a/gcc/tree-affine.cc
+++ b/gcc/tree-affine.cc
@@ -268,7 +268,6 @@ expr_to_aff_combination (aff_tree *comb, tree_code code, 
tree type,
 tree op0, tree op1 = NULL_TREE)
 {
   aff_tree tmp;
-  poly_int64 bitpos, bitsize, bytepos;
 
   switch (code)
 {
-- 
2.25.1



Re: [PATCH] RISC-V/testsuite: Fix ILP32 RVV failures from missing

2023-09-28 Thread Maciej W. Rozycki
On Wed, 27 Sep 2023, Jeff Law wrote:

> > IMO this is one of those places where we should just be as normal as
> > possible.  So if the other big ports allow system headers then we should,
> > otherwise we should move everyone over to testing in some way we'll catch
> > these before commit.
> Exactly.  I think the dance we've been doing with stdint-gcc.h is a bit silly,
> but I haven't pushed on it at all.
> 
> No other port does anything similar.  When they need stdint.h, they include
> it.  It does mean you have to have the appropriate headers installed for each
> multilib configuration, but that's the way every other port handles this
> problem.  There's no good reason I'm aware of for RISC-V to be different.

 I agree that using standard system headers where required is a reasonable 
expectation, but I maintain that when using a non-default ABI/multilib by 
an explicit request of a test case, it is the responsibility of our test 
framework to verify that the chosen ABI/multilib is compatible with the 
environment.  I think it should apply equally to all the tests whether 
they are run, link, or compilation tests.

 I think a requirement for a verifier to have headers for all the possible 
ABIs/multilibs installed is unreasonable.  For a hosted target such as 
Linux/*BSD/whatever it may yet be feasible.  For a bare-metal target it 
may not be possible, and in particular such a target may possibly support 
one specific ABI only and #error if an incompatible configuration is 
detected.  This must not cause a test to FAIL, because GIGO.

 Overall I think an expectation ought to be for a given ABI/multilib to be 
verified by running the whole testsuite for the desired configuration, 
either by having it as the default for the toolchain under test or via an 
explicit target board option.  I accept the desire to have alternative 
ABIs/multilibs smoke-tested by a couple of tests explicitly requesting 
them via a compilation option, to verify that the option works if nothing 
else.

 I think the burden of verifying the compatibility of the environment must 
not be on the individual tests, but the framework itself, and I think the 
MIPS framework fulfils the requirement, as it verifies the options given 
in the individual tests without the tests themselves having to request 
anything, they just list the compilation options they require the usual 
way via `dg-options'.

 I think that to verify the compatibility of ABI/multilib options for 
compilation tests we can pick up a system header we can reasonably expect 
to change depending on the ABI/multilib chosen, and therefore to pull any 
ABI/multilib-specific bits, such as .  If a given environment 
turns out incompatible with a given ABI/multilib option, then all the 
tests requesting said option ought to be automatically demoted to 
UNSUPPORTED.

 Most importantly implementing this approach in our test framework is a 
one-off project, while requiring people to have their environment set up 
for ABI/multilib configurations they have no interest in would cause them 
continuous effort.

  Maciej


Re: [PATCH] vec.h, v3: Make some ops work with non-trivially copy constructible and/or destructible types

2023-09-28 Thread Richard Biener
On Thu, 28 Sep 2023, Jakub Jelinek wrote:

> Hi!
> 
> On Wed, Sep 27, 2023 at 12:46:45PM +0200, Jakub Jelinek wrote:
> > On Wed, Sep 27, 2023 at 07:17:22AM +, Richard Biener wrote:
> > > OK I guess.  Can you summarize the limitations for non-POD types
> > > in the big comment at the start of vec.h?
> > 
> > Still haven't done that, but will do after we flesh out the details
> > below.
> > 
> > > (can we put in static_asserts
> > > in the places that obviously do not work?)
> > 
> > I've tried to do this though, I think the static_asserts will allow
> > making sure we only use what is supportable and will serve better than
> > any kind of comment.
> 
> The following patch adds the file comment, as discussed on IRC adds an
> exception for qsort/sort/stablesort such that std::pair of 2 trivially
> copyable types is also allowed, and fixes some of the grow vs. grow_cleared
> issues (on top of the bitmap_head_pod patch far more), but still not all
> yet, so I've kept that static_assert for now commented out.  Richard
> Sandiford said he's playing with poly_int_pod vs. poly_int and I'll resolve
> the remaining stuff incrementally afterwards plus enable the assert.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2023-09-28  Jakub Jelinek  
>   Jonathan Wakely  
> 
>   * vec.h: Mention in file comment limited support for non-POD types
>   in some operations.
>   (vec_destruct): New function template.
>   (release): Use it for non-trivially destructible T.
>   (truncate): Likewise.
>   (quick_push): Perform a placement new into slot
>   instead of assignment.
>   (pop): For non-trivially destructible T return void
>   rather than T & and destruct the popped element.
>   (quick_insert, ordered_remove): Note that they aren't suitable
>   for non-trivially copyable types.  Add static_asserts for that.
>   (block_remove): Assert T is trivially copyable.
>   (vec_detail::is_trivially_copyable_or_pair): New trait.
>   (qsort, sort, stablesort): Assert T is trivially copyable or
>   std::pair with both trivially copyable types.
>   (quick_grow): Add assert T is trivially default constructible,
>   for now commented out.
>   (quick_grow_cleared): Don't call quick_grow, instead inline it
>   by hand except for the new static_assert.
>   (gt_ggc_mx): Assert T is trivially destructible.
>   (auto_vec::operator=): Formatting fixes.
>   (auto_vec::auto_vec): Likewise.
>   (vec_safe_grow_cleared): Don't call vec_safe_grow, instead inline
>   it manually and call quick_grow_cleared method rather than quick_grow.
>   (safe_grow_cleared): Likewise.
>   * edit-context.cc (class line_event): Move definition earlier.
>   * tree-ssa-loop-im.cc (seq_entry::seq_entry): Make default ctor
>   defaulted.
>   * ipa-fnsummary.cc (evaluate_properties_for_edge): Use
>   safe_grow_cleared instead of safe_grow followed by placement new
>   constructing the elements.
> 
> --- gcc/vec.h.jj  2023-09-27 10:38:50.635845540 +0200
> +++ gcc/vec.h 2023-09-28 11:05:14.776215137 +0200
> @@ -111,6 +111,24 @@ extern void *ggc_realloc (void *, size_t
> the 'space' predicate will tell you whether there is spare capacity
> in the vector.  You will not normally need to use these two functions.
>  
> +   Not all vector operations support non-POD types and such restrictions
> +   are enforced through static assertions.  Some operations which often use
> +   memmove to move elements around like quick_insert, safe_insert,
> +   ordered_remove, unordered_remove, block_remove etc. require trivially
> +   copyable types.  Sorting operations, qsort, sort and stablesort, require
> +   those too but as an extension allow also std::pair of 2 trivially copyable
> +   types which happens to work even when std::pair itself isn't trivially
> +   copyable.  The quick_grow and safe_grow operations require trivially
> +   default constructible types.  One can use quick_grow_cleared or
> +   safe_grow_cleared for non-trivially default constructible types if needed
> +   (but of course such operation is more expensive then).  The pop operation
> +   returns reference to the last element only for trivially destructible
> +   types, for non-trivially destructible types one should use last operation
> +   followed by pop which in that case returns void.
> +   And finally, the GC and GC atomic vectors should always be used with
> +   trivially destructible types, as nothing will invoke destructors when they
> +   are freed during GC.
> +
> Notes on the different layout strategies
>  
> * Embeddable vectors (vec)
> @@ -185,6 +203,16 @@ extern void dump_vec_loc_statistics (voi
>  /* Hashtable mapping vec addresses to descriptors.  */
>  extern htab_t vec_mem_usage_hash;
>  
> +/* Destruct N elements in DST.  */
> +
> +template <typename T>
> +inline void
> +vec_destruct (T *dst, unsigned n)
> +{

Re: [PATCH] bitmap: Introduce bitmap_head_pod

2023-09-28 Thread Richard Biener
On Thu, 28 Sep 2023, Jakub Jelinek wrote:

> Hi!
> 
> The following patch splits the bitmap_head class into a POD
> struct bitmap_head_pod and bitmap_head derived from it with non-trivial
> default constexpr constructor.  Most code should keep using bitmap_head
> as before, bitmap_head_pod is there just for the cases where we want to
> embed the bitmap head into a vector which we want to e.g. {quick,safe}_grow
> and in a loop bitmap_initialize it afterwards (to avoid having to
> {quick,safe}_grow_cleared them just to overwrite with bitmap_initialize).
> The patch is larger than I hoped, because e.g. some code just used bitmap
> and bitmap_head * or const_bitmap and const bitmap_head * interchangeably.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK if there are no comments indicating otherwise.

Thanks,
Richard.

> 2023-09-28  Jakub Jelinek  
> 
>   * coretypes.h (struct bitmap_head_pod): New forward declaration.
>   (bitmap, const_bitmap): Replace bitmap_head with bitmap_head_pod
>   in typedefs.
>   * bitmap.h (struct bitmap_obstack): Change heads element type to
>   bitmap_head_pod *.
>   (struct bitmap_head_pod): New type.
>   (class bitmap_head): Changed into derived class from bitmap_head_pod
>   with a non-trivial constexpr default ctor.
>   * bitmap.cc (bitmap_head::crashme): Remove.
>   (bitmap_head_pod::crashme): New static data member.
>   (debug): Change bitmap_head to bitmap_head_pod.
>   (bitmap_head::dump): Renamed to ...
>   (bitmap_head_pod::dump): ... this.
>   * tree-ssa-live.h (compute_live_vars, live_vars_at_stmt,
>   destroy_live_vars): Change vec<bitmap_head> to vec<bitmap_head_pod>.
>   * tree-ssa-live.cc (struct compute_live_vars_data): Change active
>   member type to vec<bitmap_head_pod>.
>   (compute_live_vars, live_vars_at_stmt, destroy_live_vars): Change
>   vec<bitmap_head> to vec<bitmap_head_pod>.
>   * sel-sched-ir.h (forced_ebb_heads): Change type to bitmap.
>   * gimple-range-path.cc (path_range_query::path_range_query,
>   path_range_query::reset_path, path_range_query::compute_ranges):
>   Change const bitmap_head * in argument types to const_bitmap.
>   * gimple-range-path.h (path_range_query::path_range_query,
>   path_range_query::reset_path, path_range_query::compute_ranges):
>   Likewise.
>   (path_range_query::compute_exit_dependencies): Change bitmap_head *
>   in argument type to bitmap.
>   * tree-ssa-loop-im.cc (memory_accesses): Change type of 3 members
>   from vec<bitmap_head> to vec<bitmap_head_pod>.
>   * tree-tailcall.cc (live_vars_vec): Change type from vec<bitmap_head>
>   to vec<bitmap_head_pod>.
>   * cfganal.cc (compute_dominance_frontiers, compute_idf): Change
>   argument types from bitmap_head * to bitmap.
>   * rtl-ssa/internals.h (function_info::bb_phi_info): Change regs
>   member type from bitmap_head to bitmap_head_pod.
>   * rtl-ssa/blocks.cc (function_info::place_phis): Change frontiers
>   and unfiltered variable types from auto_vec<bitmap_head> to
>   auto_vec<bitmap_head_pod>.
>   * tree-ssa-pre.cc (class bitmap_set): Change expressions and values
>   member types from bitmap_head to bitmap_head_pod.
>   * tree-inline.cc (add_clobbers_to_eh_landing_pad): Change live
>   variable type from vec<bitmap_head> to vec<bitmap_head_pod>.
>   * cfganal.h (class control_dependences): Change control_dependence_map
>   member type from vec<bitmap_head> to vec<bitmap_head_pod>.
>   (compute_dominance_frontiers, compute_idf): Change argument types from
>   class bitmap_head * to bitmap.
> 
> --- gcc/coretypes.h.jj2023-08-24 15:37:28.233424065 +0200
> +++ gcc/coretypes.h   2023-09-27 14:17:00.764453202 +0200
> @@ -47,9 +47,10 @@ typedef int64_t gcov_type;
>  typedef uint64_t gcov_type_unsigned;
>  
>  struct bitmap_obstack;
> +struct bitmap_head_pod;
>  class bitmap_head;
> -typedef class bitmap_head *bitmap;
> -typedef const class bitmap_head *const_bitmap;
> +typedef class bitmap_head_pod *bitmap;
> +typedef const class bitmap_head_pod *const_bitmap;
>  struct simple_bitmap_def;
>  typedef struct simple_bitmap_def *sbitmap;
>  typedef const struct simple_bitmap_def *const_sbitmap;
> --- gcc/bitmap.h.jj   2023-04-19 09:33:59.141354388 +0200
> +++ gcc/bitmap.h  2023-09-27 15:13:45.700647295 +0200
> @@ -293,7 +293,7 @@ typedef unsigned long BITMAP_WORD;
>  /* Obstack for allocating bitmaps and elements from.  */
>  struct bitmap_obstack {
>struct bitmap_element *elements;
> -  bitmap_head *heads;
> +  bitmap_head_pod *heads;
>struct obstack obstack;
>  };
>  
> @@ -323,15 +323,16 @@ struct GTY((chain_next ("%h.next"))) bit
>  };
>  
>  /* Head of bitmap linked list.  The 'current' member points to something
> -   already pointed to by the chain started by first, so GTY((skip)) it.  */
> +   already pointed to by the chain started by first, so GTY((skip)) it.
> +   desc("0"), tag("0") is just to make gengtype happy, for GC there is
> +   no difference between bitmap_head_pod and bitmap_head types.  */
>  
> -class GTY(()) bitmap_head {
> 

[PATCH] tree-optimization/111614 - missing convert in undistribute_bitref_for_vector

2023-09-28 Thread Richard Biener
The following adjusts a flawed guard for converting the first vector
of the sum we create in undistribute_bitref_for_vector.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
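
For readers skimming the diff, the shape of the fix can be modelled
without any GIMPLE.  In this sketch old_guard and new_guard are
hypothetical functions that only report which operands of the sum would
get a VIEW_CONVERT_EXPR; the three boolean inputs stand for
"sum_vec == tvec", "the first vector's type differs from vec_type" and
"the second vector's type differs from vec_type":

```cpp
#include <cassert>

// Bit flags describing which operand gets converted (illustrative only).
enum { CONVERT_RHS1 = 1, CONVERT_RHS2 = 2 };

// Old structure: the first-vector conversion was nested inside the check
// on the second vector and never looked at the first vector's own type.
inline int
old_guard (bool first_iter, bool rhs1_differs, bool rhs2_differs)
{
  (void) rhs1_differs;	// Not consulted by the old code.
  int converted = 0;
  if (rhs2_differs)
    {
      converted |= CONVERT_RHS2;
      if (first_iter)
	converted |= CONVERT_RHS1;
    }
  return converted;
}

// New structure: each operand is checked independently.
inline int
new_guard (bool first_iter, bool rhs1_differs, bool rhs2_differs)
{
  int converted = 0;
  if (first_iter && rhs1_differs)
    converted |= CONVERT_RHS1;
  if (rhs2_differs)
    converted |= CONVERT_RHS2;
  return converted;
}
```

The interesting case is the first iteration where only the first vector
needs converting: the old guard converts nothing, the new one converts
just that operand.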

PR tree-optimization/111614
* tree-ssa-reassoc.cc (undistribute_bitref_for_vector): Properly
convert the first vector when required.

* gcc.dg/torture/pr111614.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr111614.c | 23 +
 gcc/tree-ssa-reassoc.cc | 27 ++---
 2 files changed, 38 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr111614.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr111614.c 
b/gcc/testsuite/gcc.dg/torture/pr111614.c
new file mode 100644
index 000..0f3ecbae86c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr111614.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+
+int a, b, c, d, e;
+static void f() {
+  int *g = 
+  b = 1;
+  for (; b >= 0; b--) {
+c = 0;
+for (; c <= 1; c++)
+  e = 0;
+for (; e <= 1; e++) {
+  int h, i = h = 13;
+  for (; h; h--)
+i = i << a;
+  d &= i + c + 9 + *g;
+}
+  }
+}
+int main() {
+  f();
+  for (;;)
+;
+}
diff --git a/gcc/tree-ssa-reassoc.cc b/gcc/tree-ssa-reassoc.cc
index eda03bf98a6..41ee36413b5 100644
--- a/gcc/tree-ssa-reassoc.cc
+++ b/gcc/tree-ssa-reassoc.cc
@@ -2102,12 +2102,24 @@ undistribute_bitref_for_vector (enum tree_code opcode,
{
  sum = build_and_add_sum (vec_type, sum_vec,
   valid_vecs[i + 1], opcode);
+ /* Update the operands only after build_and_add_sum,
+so that we don't have to repeat the placement algorithm
+of build_and_add_sum.  */
+ if (sum_vec == tvec
+ && !useless_type_conversion_p (vec_type, TREE_TYPE (sum_vec)))
+   {
+ gimple_stmt_iterator gsi = gsi_for_stmt (sum);
+ tree vce = build1 (VIEW_CONVERT_EXPR, vec_type, sum_vec);
+ tree lhs = make_ssa_name (vec_type);
+ gimple *g = gimple_build_assign (lhs, VIEW_CONVERT_EXPR, vce);
+ gimple_set_uid (g, gimple_uid (sum));
+ gsi_insert_before (&gsi, g, GSI_NEW_STMT);
+ gimple_assign_set_rhs1 (sum, lhs);
+ update_stmt (sum);
+   }
  if (!useless_type_conversion_p (vec_type,
  TREE_TYPE (valid_vecs[i + 1])))
{
- /* Update the operands only after build_and_add_sum,
-so that we don't have to repeat the placement algorithm
-of build_and_add_sum.  */
  gimple_stmt_iterator gsi = gsi_for_stmt (sum);
  tree vce = build1 (VIEW_CONVERT_EXPR, vec_type,
 valid_vecs[i + 1]);
@@ -2116,15 +2128,6 @@ undistribute_bitref_for_vector (enum tree_code opcode,
  gimple_set_uid (g, gimple_uid (sum));
  gsi_insert_before (&gsi, g, GSI_NEW_STMT);
  gimple_assign_set_rhs2 (sum, lhs);
- if (sum_vec == tvec)
-   {
- vce = build1 (VIEW_CONVERT_EXPR, vec_type, sum_vec);
- lhs = make_ssa_name (vec_type);
- g = gimple_build_assign (lhs, VIEW_CONVERT_EXPR, vce);
- gimple_set_uid (g, gimple_uid (sum));
- gsi_insert_before (&gsi, g, GSI_NEW_STMT);
- gimple_assign_set_rhs1 (sum, lhs);
-   }
  update_stmt (sum);
}
  sum_vec = gimple_get_lhs (sum);
-- 
2.35.3


[PATCH] vec.h, v3: Make some ops work with non-trivially copy constructible and/or destructible types

2023-09-28 Thread Jakub Jelinek
Hi!

On Wed, Sep 27, 2023 at 12:46:45PM +0200, Jakub Jelinek wrote:
> On Wed, Sep 27, 2023 at 07:17:22AM +, Richard Biener wrote:
> > OK I guess.  Can you summarize the limitations for non-POD types
> > in the big comment at the start of vec.h?
> 
> Still haven't done that, but will do after we flesh out the details
> below.
> 
> > (can we put in static_asserts
> > in the places that obviously do not work?)
> 
> I've tried to do this though, I think the static_asserts will allow
> making sure we only use what is supportable and will serve better than
> any kind of comment.

The following patch adds the file comment, as discussed on IRC adds an
exception for qsort/sort/stablesort such that std::pair of 2 trivially
copyable types is also allowed, and fixes some of the grow vs. grow_cleared
issues (on top of the bitmap_head_pod patch far more), but still not all
yet, so I've kept that static_assert for now commented out.  Richard
Sandiford said he's playing with poly_int_pod vs. poly_int and I'll resolve
the remaining stuff incrementally afterwards plus enable the assert.
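
As a rough sketch of the std::pair exception (written from the ChangeLog
description, so treat the details as assumptions; the namespace sketch and
the helper checked_sort are hypothetical, and std::sort merely stands in
for the vec sorting routines), the trait and a static_assert using it
could look like this:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

namespace sketch
{
  // True for trivially copyable T...
  template <typename T>
  struct is_trivially_copyable_or_pair : std::is_trivially_copyable<T> {};

  // ...and for std::pair of two trivially copyable types, whether or not
  // the library's std::pair is itself trivially copyable.
  template <typename T, typename U>
  struct is_trivially_copyable_or_pair<std::pair<T, U> >
    : std::integral_constant<bool, std::is_trivially_copyable<T>::value
				   && std::is_trivially_copyable<U>::value>
  {};
}

// Hypothetical sorting entry point that rejects unsupported element
// types at compile time.
template <typename T>
inline void
checked_sort (T *base, std::size_t n)
{
  static_assert (sketch::is_trivially_copyable_or_pair<T>::value,
		 "element type must be trivially copyable or a pair of such");
  std::sort (base, base + n);
}

static_assert (sketch::is_trivially_copyable_or_pair<
		 std::pair<int, long> >::value, "pair of PODs is accepted");
static_assert (!sketch::is_trivially_copyable_or_pair<std::string>::value,
	       "std::string is rejected");
static_assert (!sketch::is_trivially_copyable_or_pair<
		 std::pair<int, std::string> >::value,
	       "pair with a non-trivial member is rejected");
```

Calling checked_sort on an array of std::string would then fail to
compile instead of silently moving objects around bytewise.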

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-09-28  Jakub Jelinek  
Jonathan Wakely  

* vec.h: Mention in file comment limited support for non-POD types
in some operations.
(vec_destruct): New function template.
(release): Use it for non-trivially destructible T.
(truncate): Likewise.
(quick_push): Perform a placement new into slot
instead of assignment.
(pop): For non-trivially destructible T return void
rather than T & and destruct the popped element.
(quick_insert, ordered_remove): Note that they aren't suitable
for non-trivially copyable types.  Add static_asserts for that.
(block_remove): Assert T is trivially copyable.
(vec_detail::is_trivially_copyable_or_pair): New trait.
(qsort, sort, stablesort): Assert T is trivially copyable or
std::pair with both trivially copyable types.
(quick_grow): Add assert T is trivially default constructible,
for now commented out.
(quick_grow_cleared): Don't call quick_grow, instead inline it
by hand except for the new static_assert.
(gt_ggc_mx): Assert T is trivially destructible.
(auto_vec::operator=): Formatting fixes.
(auto_vec::auto_vec): Likewise.
(vec_safe_grow_cleared): Don't call vec_safe_grow, instead inline
it manually and call quick_grow_cleared method rather than quick_grow.
(safe_grow_cleared): Likewise.
* edit-context.cc (class line_event): Move definition earlier.
* tree-ssa-loop-im.cc (seq_entry::seq_entry): Make default ctor
defaulted.
* ipa-fnsummary.cc (evaluate_properties_for_edge): Use
safe_grow_cleared instead of safe_grow followed by placement new
constructing the elements.
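
The quick_push, pop and release entries above can be pictured with the
following sketch.  It is a simplified stand-in, not GCC's vec: tiny_vec, its
fixed capacity N and its member layout are assumptions made purely for
illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical fixed-capacity vector over raw storage (not GCC's vec).
template <typename T, std::size_t N>
struct tiny_vec
{
  alignas (T) unsigned char storage[N * sizeof (T)];
  unsigned num = 0;

  T *address () { return reinterpret_cast<T *> (storage); }

  // Construct the new element in place; assigning into raw storage would
  // invoke operator= on an object that was never constructed.
  T *quick_push (const T &obj)
  {
    assert (num < N);
    return ::new (static_cast<void *> (address () + num++)) T (obj);
  }

  T &last () { return address ()[num - 1]; }

  // Destroy the popped element and return nothing, so that no reference to
  // a destroyed object can escape.  Callers read last () first.
  void pop () { address ()[--num].~T (); }

  // Destroy all remaining elements.
  void release ()
  {
    for (T *p = address (); num; ++p, --num)
      p->~T ();
  }

  ~tiny_vec () { release (); }
};
```

The usage pattern for a non-trivially destructible T is then last () followed
by pop (), matching what the file comment in the patch recommends.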

--- gcc/vec.h.jj        2023-09-27 10:38:50.635845540 +0200
+++ gcc/vec.h   2023-09-28 11:05:14.776215137 +0200
@@ -111,6 +111,24 @@ extern void *ggc_realloc (void *, size_t
the 'space' predicate will tell you whether there is spare capacity
in the vector.  You will not normally need to use these two functions.
 
+   Not all vector operations support non-POD types and such restrictions
+   are enforced through static assertions.  Some operations which often use
+   memmove to move elements around like quick_insert, safe_insert,
+   ordered_remove, unordered_remove, block_remove etc. require trivially
+   copyable types.  Sorting operations, qsort, sort and stablesort, require
+   those too but as an extension allow also std::pair of 2 trivially copyable
+   types which happens to work even when std::pair itself isn't trivially
+   copyable.  The quick_grow and safe_grow operations require trivially
+   default constructible types.  One can use quick_grow_cleared or
+   safe_grow_cleared for non-trivially default constructible types if needed
+   (but of course such operation is more expensive then).  The pop operation
+   returns reference to the last element only for trivially destructible
+   types, for non-trivially destructible types one should use last operation
+   followed by pop which in that case returns void.
+   And finally, the GC and GC atomic vectors should always be used with
+   trivially destructible types, as nothing will invoke destructors when they
+   are freed during GC.
+
Notes on the different layout strategies
 
* Embeddable vectors (vec)
@@ -185,6 +203,16 @@ extern void dump_vec_loc_statistics (voi
 /* Hashtable mapping vec addresses to descriptors.  */
 extern htab_t vec_mem_usage_hash;
 
+/* Destruct N elements in DST.  */
+
+template <typename T>
+inline void
+vec_destruct (T *dst, unsigned n)
+{
+  for ( ; n; ++dst, --n)
+    dst->~T ();
+}
+
 /* Control data for vectors.  This contains the number of allocated
and used slots inside a vector.  */
 
@@ -310,6 +338,9 @@ 

[PATCH] bitmap: Introduce bitmap_head_pod

2023-09-28 Thread Jakub Jelinek
Hi!

The following patch splits the bitmap_head class into a POD
struct bitmap_head_pod and bitmap_head derived from it with non-trivial
default constexpr constructor.  Most code should keep using bitmap_head
as before, bitmap_head_pod is there just for the cases where we want to
embed the bitmap head into a vector which we want to e.g. {quick,safe}_grow
and in a loop bitmap_initialize it afterwards (to avoid having to
{quick,safe}_grow_cleared them just to overwrite with bitmap_initialize).
The patch is larger than I hoped, because e.g. some code just used bitmap
and bitmap_head * or const_bitmap and const bitmap_head * interchangeably.
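
As a rough illustration of the split described above, consider the sketch
below.  It uses made-up names (head_pod, head) and made-up members (indx,
first, init); the real bitmap_head_pod has more members and different
initialization.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical miniature of the POD half: no constructors, so it is
// trivially default constructible and cheap to grow inside a vector.
struct head_pod
{
  unsigned indx;
  void *first;

  // Stand-in for explicit initialization done in a loop after growing.
  void init () { indx = 0; first = nullptr; }
};

// Hypothetical miniature of the derived half: same data, plus a
// non-trivial constexpr default constructor for ordinary uses.
struct head : head_pod
{
  constexpr head () : head_pod {0, nullptr} {}
};

static_assert (std::is_trivially_default_constructible<head_pod>::value,
               "POD half can be used with quick_grow-style operations");
static_assert (!std::is_trivially_default_constructible<head>::value,
               "derived half has a non-trivial default constructor");
```

Under these assumptions, a vector of head_pod can be grown without running
any constructor and each element initialized afterwards, while standalone
head objects still start out in a defined state.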

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-09-28  Jakub Jelinek  

* coretypes.h (struct bitmap_head_pod): New forward declaration.
(bitmap, const_bitmap): Replace bitmap_head with bitmap_head_pod
in typedefs.
* bitmap.h (struct bitmap_obstack): Change heads element type to
bitmap_head_pod *.
(struct bitmap_head_pod): New type.
(class bitmap_head): Changed into derived class from bitmap_head_pod
with a non-trivial constexpr default ctor.
* bitmap.cc (bitmap_head::crashme): Remove.
(bitmap_head_pod::crashme): New static data member.
(debug): Change bitmap_head to bitmap_head_pod.
(bitmap_head::dump): Renamed to ...
(bitmap_head_pod::dump): ... this.
* tree-ssa-live.h (compute_live_vars, live_vars_at_stmt,
destroy_live_vars): Change vec<bitmap_head> to vec<bitmap_head_pod>.
* tree-ssa-live.cc (struct compute_live_vars_data): Change active
member type to vec<bitmap_head_pod>.
(compute_live_vars, live_vars_at_stmt, destroy_live_vars): Change
vec<bitmap_head> to vec<bitmap_head_pod>.
* sel-sched-ir.h (forced_ebb_heads): Change type to bitmap.
* gimple-range-path.cc (path_range_query::path_range_query,
path_range_query::reset_path, path_range_query::compute_ranges):
Change const bitmap_head * in argument types to const_bitmap.
* gimple-range-path.h (path_range_query::path_range_query,
path_range_query::reset_path, path_range_query::compute_ranges):
Likewise.
(path_range_query::compute_exit_dependencies): Change bitmap_head *
in argument type to bitmap.
* tree-ssa-loop-im.cc (memory_accesses): Change type of 3 members
from vec<bitmap_head> to vec<bitmap_head_pod>.
* tree-tailcall.cc (live_vars_vec): Change type from vec<bitmap_head>
to vec<bitmap_head_pod>.
* cfganal.cc (compute_dominance_frontiers, compute_idf): Change
argument types from bitmap_head * to bitmap.
* rtl-ssa/internals.h (function_info::bb_phi_info): Change regs
member type from bitmap_head to bitmap_head_pod.
* rtl-ssa/blocks.cc (function_info::place_phis): Change frontiers
and unfiltered variable types from auto_vec<bitmap_head> to
auto_vec<bitmap_head_pod>.
* tree-ssa-pre.cc (class bitmap_set): Change expressions and values
member types from bitmap_head to bitmap_head_pod.
* tree-inline.cc (add_clobbers_to_eh_landing_pad): Change live
variable type from vec<bitmap_head> to vec<bitmap_head_pod>.
* cfganal.h (class control_dependences): Change control_dependence_map
member type from vec<bitmap_head> to vec<bitmap_head_pod>.
(compute_dominance_frontiers, compute_idf): Change argument types from
class bitmap_head * to bitmap.

--- gcc/coretypes.h.jj  2023-08-24 15:37:28.233424065 +0200
+++ gcc/coretypes.h 2023-09-27 14:17:00.764453202 +0200
@@ -47,9 +47,10 @@ typedef int64_t gcov_type;
 typedef uint64_t gcov_type_unsigned;
 
 struct bitmap_obstack;
+struct bitmap_head_pod;
 class bitmap_head;
-typedef class bitmap_head *bitmap;
-typedef const class bitmap_head *const_bitmap;
+typedef class bitmap_head_pod *bitmap;
+typedef const class bitmap_head_pod *const_bitmap;
 struct simple_bitmap_def;
 typedef struct simple_bitmap_def *sbitmap;
 typedef const struct simple_bitmap_def *const_sbitmap;
--- gcc/bitmap.h.jj 2023-04-19 09:33:59.141354388 +0200
+++ gcc/bitmap.h        2023-09-27 15:13:45.700647295 +0200
@@ -293,7 +293,7 @@ typedef unsigned long BITMAP_WORD;
 /* Obstack for allocating bitmaps and elements from.  */
 struct bitmap_obstack {
   struct bitmap_element *elements;
-  bitmap_head *heads;
+  bitmap_head_pod *heads;
   struct obstack obstack;
 };
 
@@ -323,15 +323,16 @@ struct GTY((chain_next ("%h.next"))) bit
 };
 
 /* Head of bitmap linked list.  The 'current' member points to something
-   already pointed to by the chain started by first, so GTY((skip)) it.  */
+   already pointed to by the chain started by first, so GTY((skip)) it.
+   desc("0"), tag("0") is just to make gengtype happy, for GC there is
+   no difference between bitmap_head_pod and bitmap_head types.  */
 
-class GTY(()) bitmap_head {
-public:
+struct GTY((desc("0"), tag("0"))) bitmap_head_pod {
   static bitmap_obstack crashme;
-  /* Poison obstack to not make it not a valid initialized GC bitmap.  */
-  CONSTEXPR bitmap_head()
-: indx (0), 

Re: [PATCH, rs6000] Enable vector compare for 16-byte memory equality compare [PR111449]

2023-09-28 Thread HAO CHEN GUI
Kewen and Richard,
  Thanks for your comments. Please let me clarify it.

On 2023/9/27 19:10, Richard Sandiford wrote:
> Yeah, I agree there doesn't seem to be a good reason to exclude vectors.
> Sorry to dive straight into details, but maybe we should have something
> called bitwise_mode_for_size that tries to use integer modes where possible,
> but falls back to vector modes otherwise.  That mode could then be used
> for copying, storing, bitwise ops, and equality comparisons (if there
> is appropriate optabs support).

  Vector modes are not supported for compare_by_pieces and move_by_pieces,
but they are supported for set_by_pieces and clear_by_pieces. The helper
function widest_fixed_size_mode_for_size returns a vector mode when
qi_vector is set to true.

static fixed_size_mode
widest_fixed_size_mode_for_size (unsigned int size, bool qi_vector)

I tried to enable qi_vector for compare_by_pieces. It can pick up a vector
mode (e.g. V16QImode) and works in some cases, but it fails in a constant
string case.

int compare (const char* s1)
{
  return __builtin_memcmp_eq (s1, "__GCC_HAVE_DWARF2_CFI_ASM", 16);
}

As the second op is a constant string, it calls builtin_memcpy_read_str to
build the string. Unfortunately, the inner function doesn't support
vector mode.

  /* The by-pieces infrastructure does not try to pick a vector mode
 for memcpy expansion.  */
  return c_readstr (rep + offset, as_a <scalar_int_mode> (mode),
                    /*nul_terminated=*/false);

It seems the by-pieces infrastructure itself supports vector modes, but the
low-level functions do not.

I think there are two ways to enable vector mode for compare_by_pieces.
One is to modify the by-pieces infrastructure. The other is to enable it
through the cmpmem expander, which is target specific and flexible.

What's your opinion?

Thanks
Gui Haochen


Re: [committed] libstdc++: Add GDB printers for types

2023-09-28 Thread Jonathan Wakely
On Wed, 27 Sept 2023 at 20:57, Jonathan Wakely wrote:
>
>
>
> On Wed, 27 Sept 2023, 18:25 Tom Tromey via Libstdc++, <libstd...@gcc.gnu.org> wrote:
>>
>> >> I have fixes for most of the issues that are worth fixing (I didn't
>> >> bother with line lengths -- FWIW in gdb we just run 'black' and don't
>> >> worry about these details),
>>
>> Jonathan> I used autopep8 and committed the result as
>> Jonathan> e08559271b2d797f658579ac8610dbf5e58bcfd8 so the line lengths
>> Jonathan> should be OK now.
>>
>> Yeah, my patches are on top of that, but flake8 still complains, and I
>> still see lines > 79 characters.  However maybe flake8 isn't the checker
>> you want to use, or maybe you have something set up for a different line
>> length?
>
>
> I don't think I have anything set up for python formatting at all, I just
> committed whatever autopep8 did with its default settings.

It looks like adding the -a flag would have made more changes.

>
> If that's suboptimal, we can consider other tools, if they're reliable
> and easy to run.

The changes made by black seem reasonable, though I prefer it with -S to
disable string-normalization. It also needs an option to use 79 as the
maximum line length.


Re: [PATCH v1] RISC-V: Support {U}INT64 to FP16 auto-vectorization

2023-09-28 Thread juzhe.zh...@rivai.ai
Plz add "!flag_trapping_math"



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-28 13:59
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support {U}INT64 to FP16 auto-vectorization
From: Pan Li 
 
This patch supports auto-vectorization of the INT64 to FP16 conversion.
The conversion is done in the two steps below.
 
* INT64 to FP32.
* FP32 to FP16.
 
Given sample code as below:
void
test_func (int64_t * __restrict a, _Float16 *b, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
b[i] = (_Float16) (a[i]);
}
 
Before this patch:
test.c:6:26: missed: couldn't vectorize loop
test.c:6:26: missed: not vectorized: unsupported data-type
ld  a0,0(s0)
call __floatdihf
fsh fa0,0(s1)
addi s0,s0,8
addi s1,s1,2
bne s2,s0,.L3
ld  ra,24(sp)
ld  s0,16(sp)
ld  s1,8(sp)
ld  s2,0(sp)
addi sp,sp,32
 
After this patch:
vsetvli a5,a2,e8,mf8,ta,ma
vle64.v v1,0(a0)
vsetvli a4,zero,e32,mf2,ta,ma
vfncvt.f.x.w v1,v1
vsetvli zero,zero,e16,mf4,ta,ma
vfncvt.f.f.w v1,v1
vsetvli zero,a2,e16,mf4,ta,ma
vse16.v v1,0(a1)
 
Please note VLS mode is also involved in this patch and covered by the
test cases.
 
PR target/111506
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (2):
* config/riscv/vector-iterators.md:
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv32gcv.c:
Adjust checker.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv64gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/unop/cvt-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/cvt-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cvt-0.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/autovec.md   | 24 ++
gcc/config/riscv/vector-iterators.md  | 38 +++
.../autovec/conversions/vfncvt-itof-rv32gcv.c |  5 +-
.../autovec/conversions/vfncvt-itof-rv64gcv.c |  5 +-
.../gcc.target/riscv/rvv/autovec/unop/cvt-0.c | 21 +
.../gcc.target/riscv/rvv/autovec/unop/cvt-1.c | 22 +
.../gcc.target/riscv/rvv/autovec/vls/cvt-0.c  | 47 +++
7 files changed, 158 insertions(+), 4 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/cvt-0.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index cd0cbdd2889..6dd3b96a423 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -974,6 +974,30 @@ (define_insn_and_split "2"
}
[(set_attr "type" "vfncvtitof")])
+;; This operation can be performed in the loop vectorizer but unfortunately
+;; not applicable for now. We can remove this pattern after loop vectorizer
+;; is able to take care of INT64 to FP16 conversion.
+(define_insn_and_split "2"
+  [(set (match_operand:  0 "register_operand")
+ (any_float:
+   (match_operand:VWWCONVERTI 1 "register_operand")))]
+  "TARGET_VECTOR && TARGET_ZVFH && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+rtx single = gen_reg_rtx (mode); /* Get vector SF mode.  */
+
+/* Step-1, INT64 => FP32.  */
+emit_insn (gen_2 (single, operands[1]));
+/* Step-2, FP32 => FP16.  */
+emit_insn (gen_trunc2 (operands[0], single));
+
+DONE;
+  }
+  [(set_attr "type" "vfncvtitof")]
+)
+
;; =
;; == Unary arithmetic
;; =
diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index b6cd872eb42..c9a7344b1bc 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -1247,6 +1247,24 @@ (define_mode_iterator VWCONVERTI [
   (V512DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 4096")
])
+(define_mode_iterator VWWCONVERTI [
+  (RVVM8DI "TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (RVVM4DI "TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (RVVM2DI "TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (RVVM1DI "TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+
+  (V1DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (V2DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (V4DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (V8DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH && TARGET_MIN_VLEN >= 64")
+  (V16DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH && TARGET_MIN_VLEN >= 128")
+  (V32DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH && TARGET_MIN_VLEN >= 256")
+  (V64DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH && TARGET_MIN_VLEN >= 512")
+  (V128DI "TARGET_VECTOR_VLS &&