Re: Re: [PATCH] RISC-V: Add Zvfbfmin extension to the -march= option

2023-12-15 Thread Xiao Zeng
2023-12-16 03:27  Jeff Law  wrote:
>On 12/12/23 20:24, Xiao Zeng wrote:
>> This patch would like to add a new sub-extension (aka Zvfbfmin) to the
>> -march= option. It introduces a new data type, BF16.
>>
>> Depending on the usage scenario, the Zvfbfmin extension may depend
>> on 'V' or 'Zve32f'. This patch only implements the dependency for
>> the Embedded Processor scenario. For the Application Processor
>> scenario, the dependent 'V' extension must be specified explicitly.
>>
>> You can find more information about Zvfbfmin in the spec doc below.
>>
>> https://github.com/riscv/riscv-bfloat16/releases/download/20231027/riscv-bfloat16.pdf
>>
>> gcc/ChangeLog:
>>
>> * common/config/riscv/riscv-common.cc:
>> (riscv_implied_info): Add zvfbfmin item.
>>  (riscv_ext_version_table): Ditto.
>>  (riscv_ext_flag_table): Ditto.
>> * config/riscv/riscv.opt:
>> (MASK_ZVFBFMIN): New macro.
>> (MASK_VECTOR_ELEN_BF_16): Ditto.
>> (TARGET_ZVFBFMIN): Ditto.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/arch-31.c: New test.
>> * gcc.target/riscv/arch-32.c: New test.
>> * gcc.target/riscv/predef-32.c: New test.
>> * gcc.target/riscv/predef-33.c: New test.
>I fixed the trivial whitespace issue with the ChangeLog and pushed this
>to the trunk. 
Thank you, Jeff. I will pay attention to these issues in the future patches.

>However, I do want to stress that all future
>contributions need to indicate that the patch was successfully
>regression tested. 
Understood; future patches will also state that they were regression tested.

>
>jeff
 
Thanks
Xiao Zeng



Re: [committed v4 5/5] aarch64: Add function multiversioning support

2023-12-15 Thread Ramana Radhakrishnan
On Sat, Dec 16, 2023 at 6:18 AM Andrew Carlotti  wrote:
>
> This adds initial support for function multiversioning on aarch64 using
> the target_version and target_clones attributes.  This loosely follows
> the Beta specification in the ACLE [1], although with some differences
> that still need to be resolved (possibly as follow-up patches).
>
> Existing function multiversioning implementations are broken in various
> ways when used across translation units.  This includes placing
> resolvers in the wrong translation units, and using symbol mangling that
> allows callers to unintentionally bypass the resolver in some circumstances.
> Fixing these issues for aarch64 will require modifications to our ACLE
> specification.  It will also require further adjustments to existing
> middle end code, to facilitate different mangling and resolver
> placement while preserving existing target behaviours.
>
> The list of function multiversioning features specified in the ACLE is
> also inconsistent with the list of features supported in target option
> extensions.  I intend to resolve some or all of these inconsistencies at
> a later stage.
>
> The target_version attribute is currently only supported in C++, since
> this is the only frontend with existing support for multiversioning
> using the target attribute.  On the other hand, this patch happens to
> enable multiversioning with the target_clones attribute in Ada and D, as
> well as the entire C family, using their existing frontend support.
>
> This patch also does not support the following aspects of the Beta
> specification:
>
> - The target_clones attribute should allow an implicit unlisted
>   "default" version.
> - There should be an option to disable function multiversioning at
>   compile time.
> - Unrecognised target names in a target_clones attribute should be
>   ignored (with an optional warning).  The current patch raises an
>   error instead.
>
> [1] 
> https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning
>
> Committed as approved with the coding convention fix, plus some adjustments to
> aarch64-option-extensions.def to accommodate recent changes on master. The
> series passed regression testing as a whole post-rebase on aarch64.

Pretty neat, very nice to see this work land - I would consider this
for the NEWS page for GCC-14.

Ramana

>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-feature-deps.h (fmv_deps_):
> Define aarch64_feature_flags mask for each FMV feature.
> * config/aarch64/aarch64-option-extensions.def: Use new macros
> to define FMV feature extensions.
> * config/aarch64/aarch64.cc (aarch64_option_valid_attribute_p):
> Check for target_version attribute after processing target
> attribute.
> (aarch64_fmv_feature_data): New.
> (aarch64_parse_fmv_features): New.
> (aarch64_process_target_version_attr): New.
> (aarch64_option_valid_version_attribute_p): New.
> (get_feature_mask_for_version): New.
> (compare_feature_masks): New.
> (aarch64_compare_version_priority): New.
> (build_ifunc_arg_type): New.
> (make_resolver_func): New.
> (add_condition_to_bb): New.
> (dispatch_function_versions): New.
> (aarch64_generate_version_dispatcher_body): New.
> (aarch64_get_function_versions_dispatcher): New.
> (aarch64_common_function_versions): New.
> (aarch64_mangle_decl_assembler_name): New.
> (TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P): New implementation.
> (TARGET_OPTION_EXPANDED_CLONES_ATTRIBUTE): New implementation.
> (TARGET_OPTION_FUNCTION_VERSIONS): New implementation.
> (TARGET_COMPARE_VERSION_PRIORITY): New implementation.
> (TARGET_GENERATE_VERSION_DISPATCHER_BODY): New implementation.
> (TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New implementation.
> (TARGET_MANGLE_DECL_ASSEMBLER_NAME): New implementation.
> * config/aarch64/aarch64.h (TARGET_HAS_FMV_TARGET_ATTRIBUTE):
> Set target macro.
> * config/arm/aarch-common.h (enum aarch_parse_opt_result): Add
> new value to report duplicate FMV feature.
> * common/config/aarch64/cpuinfo.h: New file.
>
> libgcc/ChangeLog:
>
> * config/aarch64/cpuinfo.c (enum CPUFeatures): Move to shared
> copy in gcc/common
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/options_set_17.c: Reorder expected flags.
> * gcc.target/aarch64/cpunative/native_cpu_0.c: Ditto.
> * gcc.target/aarch64/cpunative/native_cpu_13.c: Ditto.
> * gcc.target/aarch64/cpunative/native_cpu_16.c: Ditto.
> * gcc.target/aarch64/cpunative/native_cpu_17.c: Ditto.
> * gcc.target/aarch64/cpunative/native_cpu_18.c: Ditto.
> * gcc.target/aarch64/cpunative/native_cpu_19.c: Ditto.
> * gcc.target/aarch64/cpunative/native_cpu_20.c: Ditto.
> * 

Re: [PATCH] c++: Fix unchecked use of CLASSTYPE_AS_BASE [PR113031]

2023-12-15 Thread Jason Merrill

On 12/15/23 19:20, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu with GLIBCXX_TESTSUITE_STDS=20
and RUNTESTFLAGS="--target_board=unix/-D_GLIBCXX_USE_CXX11_ABI=0".


OK, thanks.


-- >8 --

My previous patch (naively) assumed that a TREE_CODE of RECORD_TYPE or
UNION_TYPE was sufficient for optype to be considered a "class type".
However, this does not account for e.g. template type parameters of
record or union type. This patch corrects to check for CLASS_TYPE_P
before checking for as-base conversion.

PR c++/113031

gcc/cp/ChangeLog:

* constexpr.cc (cxx_fold_indirect_ref_1): Check for CLASS_TYPE
before using CLASSTYPE_AS_BASE.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/pr113031.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/constexpr.cc   |  3 ++-
  gcc/testsuite/g++.dg/cpp0x/pr113031.C | 34 +++
  2 files changed, 36 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/pr113031.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index e1b2d27fc36..051f73fb73f 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -5709,7 +5709,8 @@ cxx_fold_indirect_ref_1 (const constexpr_ctx *ctx, 
location_t loc, tree type,
  }
  
/* Handle conversion to "as base" type.  */

-  if (CLASSTYPE_AS_BASE (optype) == type)
+  if (CLASS_TYPE_P (optype)
+ && CLASSTYPE_AS_BASE (optype) == type)
return op;
  
/* Handle conversion to an empty base class, which is represented with a

diff --git a/gcc/testsuite/g++.dg/cpp0x/pr113031.C 
b/gcc/testsuite/g++.dg/cpp0x/pr113031.C
new file mode 100644
index 000..aecdc3fc4b2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/pr113031.C
@@ -0,0 +1,34 @@
+// PR c++/113031
+// { dg-do compile }
+
+template <typename _Types> struct variant;
+
+template <typename _Types, typename _Tp>
+variant<_Types> __variant_cast(_Tp __rhs) { return static_cast<variant<_Types>&>(__rhs); }
+
+template <typename _Types>
+struct _Move_assign_base : _Types {
+  void operator=(_Move_assign_base __rhs) { __variant_cast<_Types>(__rhs); }
+};
+
+template <typename _Types>
+struct variant : _Move_assign_base<_Types> {
+  void emplace() {
+variant __tmp;
+*this = __tmp;
+  }
+};
+
+struct _Undefined_class {
+  struct _Nocopy_types {
+void (_Undefined_class::*_M_member_pointer)();
+  };
+  struct function : _Nocopy_types {
+struct optional {
+  void test03() {
+variant<function> v;
+v.emplace();
+  }
+};
+  };
+};




[committed v4 5/5] aarch64: Add function multiversioning support

2023-12-15 Thread Andrew Carlotti
This adds initial support for function multiversioning on aarch64 using
the target_version and target_clones attributes.  This loosely follows
the Beta specification in the ACLE [1], although with some differences
that still need to be resolved (possibly as follow-up patches).

Existing function multiversioning implementations are broken in various
ways when used across translation units.  This includes placing
resolvers in the wrong translation units, and using symbol mangling that
allows callers to unintentionally bypass the resolver in some circumstances.
Fixing these issues for aarch64 will require modifications to our ACLE
specification.  It will also require further adjustments to existing
middle end code, to facilitate different mangling and resolver
placement while preserving existing target behaviours.

The list of function multiversioning features specified in the ACLE is
also inconsistent with the list of features supported in target option
extensions.  I intend to resolve some or all of these inconsistencies at
a later stage.

The target_version attribute is currently only supported in C++, since
this is the only frontend with existing support for multiversioning
using the target attribute.  On the other hand, this patch happens to
enable multiversioning with the target_clones attribute in Ada and D, as
well as the entire C family, using their existing frontend support.

This patch also does not support the following aspects of the Beta
specification:

- The target_clones attribute should allow an implicit unlisted
  "default" version.
- There should be an option to disable function multiversioning at
  compile time.
- Unrecognised target names in a target_clones attribute should be
  ignored (with an optional warning).  The current patch raises an
  error instead.

[1] 
https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning

Committed as approved with the coding convention fix, plus some adjustments to
aarch64-option-extensions.def to accommodate recent changes on master. The
series passed regression testing as a whole post-rebase on aarch64.

gcc/ChangeLog:

* config/aarch64/aarch64-feature-deps.h (fmv_deps_):
Define aarch64_feature_flags mask for each FMV feature.
* config/aarch64/aarch64-option-extensions.def: Use new macros
to define FMV feature extensions.
* config/aarch64/aarch64.cc (aarch64_option_valid_attribute_p):
Check for target_version attribute after processing target
attribute.
(aarch64_fmv_feature_data): New.
(aarch64_parse_fmv_features): New.
(aarch64_process_target_version_attr): New.
(aarch64_option_valid_version_attribute_p): New.
(get_feature_mask_for_version): New.
(compare_feature_masks): New.
(aarch64_compare_version_priority): New.
(build_ifunc_arg_type): New.
(make_resolver_func): New.
(add_condition_to_bb): New.
(dispatch_function_versions): New.
(aarch64_generate_version_dispatcher_body): New.
(aarch64_get_function_versions_dispatcher): New.
(aarch64_common_function_versions): New.
(aarch64_mangle_decl_assembler_name): New.
(TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P): New implementation.
(TARGET_OPTION_EXPANDED_CLONES_ATTRIBUTE): New implementation.
(TARGET_OPTION_FUNCTION_VERSIONS): New implementation.
(TARGET_COMPARE_VERSION_PRIORITY): New implementation.
(TARGET_GENERATE_VERSION_DISPATCHER_BODY): New implementation.
(TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New implementation.
(TARGET_MANGLE_DECL_ASSEMBLER_NAME): New implementation.
* config/aarch64/aarch64.h (TARGET_HAS_FMV_TARGET_ATTRIBUTE):
Set target macro.
* config/arm/aarch-common.h (enum aarch_parse_opt_result): Add
new value to report duplicate FMV feature.
* common/config/aarch64/cpuinfo.h: New file.

libgcc/ChangeLog:

* config/aarch64/cpuinfo.c (enum CPUFeatures): Move to shared
copy in gcc/common

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/options_set_17.c: Reorder expected flags.
* gcc.target/aarch64/cpunative/native_cpu_0.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_13.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_16.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_17.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_18.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_19.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_20.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_21.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_6.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_7.c: Ditto.


diff --git a/gcc/common/config/aarch64/cpuinfo.h 
b/gcc/common/config/aarch64/cpuinfo.h
new file mode 100644
index 

[committed v4 4/5] Add support for target_version attribute

2023-12-15 Thread Andrew Carlotti
This patch adds support for the "target_version" attribute to the middle
end and the C++ frontend, which will be used to implement function
multiversioning in the aarch64 backend.

On targets that don't use the "target" attribute for multiversioning,
there is no conflict between the "target" and "target_clones"
attributes.  This patch therefore makes the mutual exclusion in
C-family, D and Ada conditional upon the value of the
expanded_clones_attribute target hook.

The "target_version" attribute is only added to C++ in this patch,
because this is currently the only frontend which supports
multiversioning using the "target" attribute.  Support for the
"target_version" attribute will be extended to C at a later date.

Targets that currently use the "target" attribute for function
multiversioning (i.e. i386 and rs6000) are not affected by this patch.

Committed as approved with adjustments to comments in c-attribs.

gcc/ChangeLog:

* attribs.cc (decl_attributes): Pass attribute name to target.
(is_function_default_version): Update comment to specify
incompatibility with target_version attributes.
* cgraphclones.cc (cgraph_node::create_version_clone_with_body):
Call valid_version_attribute_p for target_version attributes.
* defaults.h (TARGET_HAS_FMV_TARGET_ATTRIBUTE): New macro.
* target.def (valid_version_attribute_p): New hook.
* doc/tm.texi.in: Add new hook.
* doc/tm.texi: Regenerate.
* multiple_target.cc (create_dispatcher_calls): Remove redundant
is_function_default_version check.
(expand_target_clones): Use target macro to pick attribute name.
* targhooks.cc (default_target_option_valid_version_attribute_p):
New.
* targhooks.h (default_target_option_valid_version_attribute_p):
New.
* tree.h (DECL_FUNCTION_VERSIONED): Update comment to include
target_version attributes.

gcc/c-family/ChangeLog:

* c-attribs.cc (attr_target_exclusions): Make
target/target_clones exclusion target-dependent.
(attr_target_clones_exclusions): Ditto, and add target_version.
(attr_target_version_exclusions): New.
(c_common_attribute_table): Add target_version.
(handle_target_version_attribute): New.
(handle_target_attribute): Amend comment.
(handle_target_clones_attribute): Ditto.

gcc/ada/ChangeLog:

* gcc-interface/utils.cc (attr_target_exclusions): Make
target/target_clones exclusion target-dependent.
(attr_target_clones_exclusions): Ditto.

gcc/d/ChangeLog:

* d-attribs.cc (attr_target_exclusions): Make
target/target_clones exclusion target-dependent.
(attr_target_clones_exclusions): Ditto.

gcc/cp/ChangeLog:

* decl2.cc (check_classfn): Update comment to include
target_version attributes.


diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
index 
3eabbec6bd34116910a0589b4ebf269b916cc607..17f6afd687d1dbd7648d52d86417414b04c0d896
 100644
--- a/gcc/ada/gcc-interface/utils.cc
+++ b/gcc/ada/gcc-interface/utils.cc
@@ -146,14 +146,16 @@ static const struct attribute_spec::exclusions 
attr_noinline_exclusions[] =
 
 static const struct attribute_spec::exclusions attr_target_exclusions[] =
 {
-  { "target_clones", true, true, true },
+  { "target_clones", TARGET_HAS_FMV_TARGET_ATTRIBUTE,
+TARGET_HAS_FMV_TARGET_ATTRIBUTE, TARGET_HAS_FMV_TARGET_ATTRIBUTE },
   { NULL, false, false, false },
 };
 
 static const struct attribute_spec::exclusions attr_target_clones_exclusions[] 
=
 {
   { "always_inline", true, true, true },
-  { "target", true, true, true },
+  { "target", TARGET_HAS_FMV_TARGET_ATTRIBUTE, TARGET_HAS_FMV_TARGET_ATTRIBUTE,
+TARGET_HAS_FMV_TARGET_ATTRIBUTE },
   { NULL, false, false, false },
 };
 
diff --git a/gcc/attribs.cc b/gcc/attribs.cc
index 
4e313d38f0f0608991c3267f55f43e3f0dd9d74a..0ca2779788569b7a02a79eab4db558df112aff87
 100644
--- a/gcc/attribs.cc
+++ b/gcc/attribs.cc
@@ -675,7 +675,8 @@ decl_attributes (tree *node, tree attributes, int flags,
  options to the attribute((target(...))) list.  */
   if (TREE_CODE (*node) == FUNCTION_DECL
   && current_target_pragma
-  && targetm.target_option.valid_attribute_p (*node, NULL_TREE,
+  && targetm.target_option.valid_attribute_p (*node,
+ get_identifier ("target"),
  current_target_pragma, 0))
 {
   tree cur_attr = lookup_attribute ("target", attributes);
@@ -1276,8 +1277,9 @@ make_dispatcher_decl (const tree decl)
   return func_decl;  
 }
 
-/* Returns true if decl is multi-versioned and DECL is the default function,
-   that is it is not tagged with target specific optimization.  */
+/* Returns true if DECL is multi-versioned using the target attribute, and this
+   is the default version.  This function can only be used for targets that do
+   

Re: [PATCH] libstdc++: Make __gnu_debug::vector usable in constant expressions [PR109536]

2023-12-15 Thread Patrick Palka
On Wed, 6 Dec 2023, Jonathan Wakely wrote:

> Any comments on this approach?
> 
> -- >8 --
> 
> This makes constexpr std::vector (mostly) work in Debug Mode. All safe
> iterator instrumentation and checking is disabled during constant
> evaluation, because it requires mutex locks and calls to non-inline
> functions defined in libstdc++.so. It should be OK to disable the safety
> checks, because most UB should be detected during constant evaluation
> anyway.
> 
> We could try to enable the full checking in constexpr, but it would mean
> wrapping all the non-inline functions like _M_attach with an inline
> _M_constexpr_attach that does the iterator housekeeping inline without
> mutex locks when calling for constant evaluation, and calls the
> non-inline function at runtime. That could be done in future if we find
> that we've lost safety or useful checking by disabling the safe
> iterators.
> 
> There are a few test failures in C++20 mode, which I'm unable to
> explain. The _Safe_iterator::operator++() member gives errors for using
> non-constexpr functions during constant evaluation, even though those
> functions are guarded by std::is_constant_evaluated() checks. The same
> code works fine for C++23 and up.

AFAICT these C++20 test failures are really due to the variable
definition of non-literal type

381      __gnu_cxx::__scoped_lock __l(this->_M_get_mutex());

which was prohibited in a constexpr function (even if that code was
never executed) until C++23's P2242R3.

We can use an immediately invoked lambda to work around this:

381  [this] {
382    __gnu_cxx::__scoped_lock __l(this->_M_get_mutex());
383    ++base();
384  }();
385  return *this;

> 
> libstdc++-v3/ChangeLog:
> 
>   PR libstdc++/109536
>   * include/bits/c++config (__glibcxx_constexpr_assert): Remove
>   macro.
>   * include/bits/stl_algobase.h (__niter_base, __copy_move_a)
>   (__copy_move_backward_a, __fill_a, __fill_n_a, __equal_aux)
>   (__lexicographical_compare_aux): Add constexpr to overloads for
>   debug mode iterators.
>   * include/debug/helper_functions.h (__unsafe): Add constexpr.
>   * include/debug/macros.h (_GLIBCXX_DEBUG_VERIFY_COND_AT): Remove
>   macro, folding it into ...
>   (_GLIBCXX_DEBUG_VERIFY_AT_F): ... here. Do not use
>   __glibcxx_constexpr_assert.
>   * include/debug/safe_base.h (_Safe_iterator_base): Add constexpr
>   to some member functions. Omit attaching, detaching and checking
>   operations during constant evaluation.
>   * include/debug/safe_container.h (_Safe_container): Likewise.
>   * include/debug/safe_iterator.h (_Safe_iterator): Likewise.
>   * include/debug/safe_iterator.tcc (__niter_base, __copy_move_a)
>   (__copy_move_backward_a, __fill_a, __fill_n_a, __equal_aux)
>   (__lexicographical_compare_aux): Add constexpr.
>   * include/debug/vector (_Safe_vector, vector): Add constexpr.
>   Omit safe iterator operations during constant evaluation.
>   * testsuite/23_containers/vector/bool/capacity/constexpr.cc:
>   Remove dg-xfail-if for debug mode.
>   * testsuite/23_containers/vector/bool/cmp_c++20.cc: Likewise.
>   * testsuite/23_containers/vector/bool/cons/constexpr.cc:
>   Likewise.
>   * testsuite/23_containers/vector/bool/element_access/1.cc:
>   Likewise.
>   * testsuite/23_containers/vector/bool/element_access/constexpr.cc:
>   Likewise.
>   * testsuite/23_containers/vector/bool/modifiers/assign/constexpr.cc:
>   Likewise.
>   * testsuite/23_containers/vector/bool/modifiers/constexpr.cc:
>   Likewise.
>   * testsuite/23_containers/vector/bool/modifiers/swap/constexpr.cc:
>   Likewise.
>   * testsuite/23_containers/vector/capacity/constexpr.cc:
>   Likewise.
>   * testsuite/23_containers/vector/cmp_c++20.cc: Likewise.
>   * testsuite/23_containers/vector/cons/constexpr.cc: Likewise.
>   * testsuite/23_containers/vector/data_access/constexpr.cc:
>   Likewise.
>   * testsuite/23_containers/vector/element_access/constexpr.cc:
>   Likewise.
>   * testsuite/23_containers/vector/modifiers/assign/constexpr.cc:
>   Likewise.
>   * testsuite/23_containers/vector/modifiers/constexpr.cc:
>   Likewise.
>   * testsuite/23_containers/vector/modifiers/swap/constexpr.cc:
>   Likewise.
> ---
>  libstdc++-v3/include/bits/c++config   |   9 -
>  libstdc++-v3/include/bits/stl_algobase.h  |  15 ++
>  libstdc++-v3/include/debug/helper_functions.h |   1 +
>  libstdc++-v3/include/debug/macros.h   |   9 +-
>  libstdc++-v3/include/debug/safe_base.h|  35 +++-
>  libstdc++-v3/include/debug/safe_container.h   |  15 +-
>  libstdc++-v3/include/debug/safe_iterator.h| 186 +++---
>  libstdc++-v3/include/debug/safe_iterator.tcc  |  15 ++
>  libstdc++-v3/include/debug/vector | 146 --
>  .../vector/bool/capacity/constexpr.cc |   1 -
>  

[PATCH] c++: Fix unchecked use of CLASSTYPE_AS_BASE [PR113031]

2023-12-15 Thread Nathaniel Shead
Bootstrapped and regtested on x86_64-pc-linux-gnu with GLIBCXX_TESTSUITE_STDS=20
and RUNTESTFLAGS="--target_board=unix/-D_GLIBCXX_USE_CXX11_ABI=0".

-- >8 --

My previous patch (naively) assumed that a TREE_CODE of RECORD_TYPE or
UNION_TYPE was sufficient for optype to be considered a "class type".
However, this does not account for e.g. template type parameters of
record or union type. This patch corrects to check for CLASS_TYPE_P
before checking for as-base conversion.

PR c++/113031

gcc/cp/ChangeLog:

* constexpr.cc (cxx_fold_indirect_ref_1): Check for CLASS_TYPE
before using CLASSTYPE_AS_BASE.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/pr113031.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/constexpr.cc   |  3 ++-
 gcc/testsuite/g++.dg/cpp0x/pr113031.C | 34 +++
 2 files changed, 36 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/pr113031.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index e1b2d27fc36..051f73fb73f 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -5709,7 +5709,8 @@ cxx_fold_indirect_ref_1 (const constexpr_ctx *ctx, 
location_t loc, tree type,
  }
 
   /* Handle conversion to "as base" type.  */
-  if (CLASSTYPE_AS_BASE (optype) == type)
+  if (CLASS_TYPE_P (optype)
+ && CLASSTYPE_AS_BASE (optype) == type)
return op;
 
   /* Handle conversion to an empty base class, which is represented with a
diff --git a/gcc/testsuite/g++.dg/cpp0x/pr113031.C 
b/gcc/testsuite/g++.dg/cpp0x/pr113031.C
new file mode 100644
index 000..aecdc3fc4b2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/pr113031.C
@@ -0,0 +1,34 @@
+// PR c++/113031
+// { dg-do compile }
+
+template <typename _Types> struct variant;
+
+template <typename _Types, typename _Tp>
+variant<_Types> __variant_cast(_Tp __rhs) { return static_cast<variant<_Types>&>(__rhs); }
+
+template <typename _Types>
+struct _Move_assign_base : _Types {
+  void operator=(_Move_assign_base __rhs) { __variant_cast<_Types>(__rhs); }
+};
+
+template <typename _Types>
+struct variant : _Move_assign_base<_Types> {
+  void emplace() {
+variant __tmp;
+*this = __tmp;
+  }
+};
+
+struct _Undefined_class {
+  struct _Nocopy_types {
+void (_Undefined_class::*_M_member_pointer)();
+  };
+  struct function : _Nocopy_types {
+struct optional {
+  void test03() {
+variant<function> v;
+v.emplace();
+  }
+};
+  };
+};
-- 
2.42.0



Re: [PATCH] RISC-V: Don't make Ztso imply A

2023-12-15 Thread Andrew Waterman
On Fri, Dec 15, 2023 at 1:38 PM Jeff Law  wrote:
>
>
>
> On 12/12/23 20:54, Palmer Dabbelt wrote:
> > I can't actually find anything in the ISA manual that makes Ztso imply
> > A.  In theory the memory ordering is just a different thing than the set
> > of available instructions (i.e., Ztso without A would still imply TSO for
> > loads and stores).  It also seems like a configuration that could be
> > sane to build: without A it's all but impossible to write any meaningful
> > multi-core code, and TSO is really cheap for a single core.
> >
> > That said, I think it's kind of reasonable to provide A to users asking
> > for Ztso.  So maybe even if this was a mistake it's the right thing to
> > do?
> >
> > gcc/ChangeLog:
> >
> >   * common/config/riscv/riscv-common.cc (riscv_implied_info):
> >   Remove {"ztso", "a"}.
> I'd tend to think step #1 is to determine what the ISA intent is,
> meaning engagement with RVI.
>
> We've got time for that engagement and to adjust based on the result.
> So I'd tend to defer until we know if Ztso should imply A or not.

Palmer is correct.  There is no coupling between Ztso and A.  (And
there are uncontrived examples of such systems: e.g. embedded
processors without caches that don't support the LR/SC instructions,
but happen to be TSO.)

>
> jeff


Re: [PATCH v4 3/3] RISC-V: Add support for XCVbi extension in CV32E40P

2023-12-15 Thread Jeff Law




On 12/12/23 12:32, Mary Bennett wrote:

Spec: 
github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md

Contributors:
   Mary Bennett 
   Nandni Jamnadas 
   Pietra Ferreira 
   Charlie Keaney
   Jessica Mills
   Craig Blackmore 
   Simon Cook 
   Jeremy Bennett 
   Helene Chelin 

gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Create XCVbi extension
  support.
* config/riscv/riscv.opt: Likewise.
* config/riscv/corev.md: Implement cv_branch pattern
  for cv.beqimm and cv.bneimm.
* config/riscv/riscv.md: Add CORE-V branch immediate to RISC-V
  branch instruction pattern.
* config/riscv/constraints.md: Implement constraints
  cv_bi_s5 - signed 5-bit immediate.
* config/riscv/predicates.md: Implement predicate
  const_int5s_operand - signed 5 bit immediate.
* doc/sourcebuild.texi: Add XCVbi documentation.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/cv-bi-beqimm-compile-1.c: New test.
* gcc.target/riscv/cv-bi-beqimm-compile-2.c: New test.
* gcc.target/riscv/cv-bi-bneimm-compile-1.c: New test.
* gcc.target/riscv/cv-bi-bneimm-compile-2.c: New test.
* lib/target-supports.exp: Add proc for XCVbi.
---
  gcc/common/config/riscv/riscv-common.cc   |  2 +
  gcc/config/riscv/constraints.md   |  6 +++
  gcc/config/riscv/corev.md | 32 +
  gcc/config/riscv/predicates.md|  4 ++
  gcc/config/riscv/riscv.md |  2 +-
  gcc/config/riscv/riscv.opt|  2 +
  gcc/doc/sourcebuild.texi  |  3 ++
  .../gcc.target/riscv/cv-bi-beqimm-compile-1.c | 17 +++
  .../gcc.target/riscv/cv-bi-beqimm-compile-2.c | 48 +++
  .../gcc.target/riscv/cv-bi-bneimm-compile-1.c | 17 +++
  .../gcc.target/riscv/cv-bi-bneimm-compile-2.c | 48 +++
  gcc/testsuite/lib/target-supports.exp | 13 +
  12 files changed, 193 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bi-beqimm-compile-1.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bi-beqimm-compile-2.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bi-bneimm-compile-1.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/cv-bi-bneimm-compile-2.c




diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index 2711efe68c5..718b4bd77df 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -247,3 +247,9 @@
(and (match_code "const_int")
 (and (match_test "IN_RANGE (ival, 0, 1073741823)")
  (match_test "exact_log2 (ival + 1) != -1"
+
+(define_constraint "CV_bi_sign5"
+  "@internal
+   A 5-bit signed immediate for CORE-V Immediate Branch."
+  (and (match_code "const_int")
+   (match_test "IN_RANGE (ival, -16, 15)")))
diff --git a/gcc/config/riscv/corev.md b/gcc/config/riscv/corev.md
index 92bf0b5d6a6..92e30a8ae04 100644
--- a/gcc/config/riscv/corev.md
+++ b/gcc/config/riscv/corev.md
@@ -706,3 +706,35 @@
  
   [(set_attr "type" "load")
    (set_attr "mode" "SI")])
+
+;; XCVBI Instructions
+(define_insn "cv_branch"
+  [(set (pc)
+   (if_then_else
+(match_operator 1 "equality_operator"
+[(match_operand:X 2 "register_operand" "r")
+	 (match_operand:X 3 "const_int5s_operand" "CV_bi_sign5")])
+(label_ref (match_operand 0 "" ""))
+(pc)))]
+  "TARGET_XCVBI"
+  "cv.b%C1imm\t%2,%3,%0"
+  [(set_attr "type" "branch")
+   (set_attr "mode" "none")])

So I think Kito wanted the name of this pattern to be prefixed with '*'.

My question is: how does that pattern deal with out-of-range branch
targets?  As Kito mentioned on the V3, you probably need to handle that.



I think this suggestion from Kito was meant to be added to that pattern 
so that it works in a manner similar to the *branch pattern:



if (get_attr_length (insn) == 12)
  return "cv.b%N1\t%2,%z3,1f; jump\t%l0,ra; 1:";



Jeff


Re: [PATCH v4 2/3] RISC-V: Update XCValu constraints to match other vendors

2023-12-15 Thread Jeff Law




On 12/12/23 12:32, Mary Bennett wrote:

gcc/ChangeLog:
* config/riscv/constraints.md: CVP2 -> CV_alu_pow2.
* config/riscv/corev.md: Likewise.
---

Kito ack'd the V3 patch, so I went ahead and pushed this to the trunk.

jeff


Re: [PATCH v4 1/3] RISC-V: Add support for XCVelw extension in CV32E40P

2023-12-15 Thread Jeff Law




On 12/12/23 12:32, Mary Bennett wrote:

Spec: 
github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md

Contributors:
   Mary Bennett 
   Nandni Jamnadas 
   Pietra Ferreira 
   Charlie Keaney
   Jessica Mills
   Craig Blackmore 
   Simon Cook 
   Jeremy Bennett 
   Helene Chelin 

gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Add XCVelw.
* config/riscv/corev.def: Likewise.
* config/riscv/corev.md: Likewise.
* config/riscv/riscv-builtins.cc (AVAIL): Likewise.
* config/riscv/riscv-ftypes.def: Likewise.
* config/riscv/riscv.opt: Likewise.
* doc/extend.texi: Add XCVelw builtin documentation.
* doc/sourcebuild.texi: Likewise.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/cv-elw-compile-1.c: Create test for cv.elw.
* lib/target-supports.exp: Add proc for the XCVelw extension.
Kito ACK'd V3.   I'm going to go ahead and push this to the trunk on 
Mary's behalf.  It looks independent to me and there's no need for it to 
wait.


jeff


Re: [PATCH] RISC-V: Add -fno-vect-cost-model to pr112773 testcase

2023-12-15 Thread Jeff Law




On 12/14/23 14:32, Patrick O'Neill wrote:

The testcase for pr112773 started passing after r14-6472-g8501edba91e
which was before the actual fix. This patch adds -fno-vect-cost-model
which prevents the testcase from passing due to the vls change.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/pr112773.c: Add
-fno-vect-cost-model.

Signed-off-by: Patrick O'Neill 

I've pushed this to the trunk.

jeff


Re: [PATCH] RISC-V: Don't make Ztso imply A

2023-12-15 Thread Jeff Law




On 12/12/23 20:54, Palmer Dabbelt wrote:

I can't actually find anything in the ISA manual that makes Ztso imply
A.  In theory the memory ordering is just a different thing than the set
of available instructions (i.e., Ztso without A would still imply TSO for
loads and stores).  It also seems like a configuration that could be
sane to build: without A it's all but impossible to write any meaningful
multi-core code, and TSO is really cheap for a single core.

That said, I think it's kind of reasonable to provide A to users asking
for Ztso.  So maybe even if this was a mistake it's the right thing to
do?

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_implied_info):
Remove {"ztso", "a"}.
I'd tend to think step #1 is to determine what the ISA intent is, 
meaning engagement with RVI.


We've got time for that engagement and to adjust based on the result. 
So I'd tend to defer until we know if Ztso should imply A or not.


jeff


Re: [PR target/110201] Fix operand types for various scalar crypto insns

2023-12-15 Thread Jeff Law




On 12/14/23 17:14, Christoph Müllner wrote:

On Fri, Dec 15, 2023 at 12:36 AM Jeff Law  wrote:




On 12/14/23 02:46, Christoph Müllner wrote:

On Tue, Jun 20, 2023 at 12:34 AM Jeff Law via Gcc-patches
 wrote:



A handful of the scalar crypto instructions are supposed to take a
constant integer argument 0..3 inclusive.  A suitable constraint was
created and used for this purpose (D03), but the operand's predicate is
"register_operand".  That's just wrong.

This patch adds a new predicate "const_0_3_operand" and fixes the
relevant insns to use it.  One could argue the constraint is redundant
now (and you'd be correct).  I wouldn't lose sleep if someone wanted
that removed, in which case I'll spin up a V2.

The testsuite was broken in a way that made it consistent with the
compiler, so the tests passed, when they really should have been issuing
errors all along.

This patch adjusts the existing tests so that they all expect a
diagnostic on the invalid operand usage (including out of range
constants).  It adds new tests with proper constants, testing the
extremes of valid values.

OK for the trunk, or should we remove the D03 constraint?


Reviewed-by: Christoph Muellner 

The patch does not apply cleanly anymore, because there were some
small changes in crypto.md.

Here's an update to that old patch that also takes care of the pattern
where we allow 0..10 inclusive, but not registers.

Regression tested on rv64gc without new failures.  It'll need a
ChangeLog when approved, but that's easy to adjust.
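The committed predicate isn't quoted in this thread; a predicate of the kind described above would look roughly like this in riscv/predicates.md (a sketch only; the exact committed text may differ):

```lisp
;; Match a constant integer in the range [0, 3] -- illustrative sketch
;; of the "const_0_3_operand" predicate discussed above; check the
;; tree for the committed definition.
(define_predicate "const_0_3_operand"
  (and (match_code "const_int")
       (match_test "IN_RANGE (INTVAL (op), 0, 3)")))
```

With a predicate like this in place, the D03 constraint is indeed redundant for correctness, since the predicate already rejects registers and out-of-range constants at match time.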


Looks good and tests pass for rv64gc and rv32gc.

Reviewed-by: Christoph Muellner 
Tested-by: Christoph Muellner 

I've pushed this to the trunk with Liao listed as a co-author.

jeff


Re: [PATCH v4 10/11] aarch64: Add new load/store pair fusion pass

2023-12-15 Thread Alex Coplan
On 15/12/2023 15:34, Richard Sandiford wrote:
> Alex Coplan  writes:
> > This is a v6 of the aarch64 load/store pair fusion pass, which
> > addresses the feedback from Richard's last review here:
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640539.html
> >
> > In particular this version implements the suggested changes which
> > greatly simplify the double list walk.
> >
> > Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?
> >
> > Thanks,
> > Alex
> >
> > -- >8 --
> >
> > This adds a new aarch64-specific RTL-SSA pass dedicated to forming load
> > and store pairs (LDPs and STPs).
> >
> > As a motivating example for the kind of thing this improves, take the
> > following testcase:
> >
> > extern double c[20];
> >
> > double f(double x)
> > {
> >   double y = x*x;
> >   y += c[16];
> >   y += c[17];
> >   y += c[18];
> >   y += c[19];
> >   return y;
> > }
> >
> > for which we currently generate (at -O2):
> >
> > f:
> > adrp    x0, c
> > add     x0, x0, :lo12:c
> > ldp     d31, d29, [x0, 128]
> > ldr     d30, [x0, 144]
> > fmadd   d0, d0, d0, d31
> > ldr     d31, [x0, 152]
> > fadd    d0, d0, d29
> > fadd    d0, d0, d30
> > fadd    d0, d0, d31
> > ret
> >
> > but with the pass, we generate:
> >
> > f:
> > .LFB0:
> > adrp    x0, c
> > add     x0, x0, :lo12:c
> > ldp     d31, d29, [x0, 128]
> > fmadd   d0, d0, d0, d31
> > ldp     d30, d31, [x0, 144]
> > fadd    d0, d0, d29
> > fadd    d0, d0, d30
> > fadd    d0, d0, d31
> > ret
> >
> > The pass is local (only considers a BB at a time).  In theory, it should
> > be possible to extend it to run over EBBs, at least in the case of pure
> > (MEM_READONLY_P) loads, but this is left for future work.
> >
> > The pass works by identifying two kinds of bases: tree decls obtained
> > via MEM_EXPR, and RTL register bases in the form of RTL-SSA def_infos.
> > If a candidate memory access has a MEM_EXPR base, then we track it via
> > this base, and otherwise if it is of a simple reg +  form, we track
> > it via the RTL-SSA def_info for the register.
> >
> > For each BB, for a given kind of base, we build up a hash table mapping
> > the base to an access_group.  The access_group data structure holds a
> > list of accesses at each offset relative to the same base.  It uses a
> > splay tree to support efficient insertion (while walking the bb), and
> > the nodes are chained using a linked list to support efficient
> > iteration (while doing the transformation).
> >
> > For each base, we then iterate over the access_group to identify
> > adjacent accesses, and try to form load/store pairs for those insns that
> > access adjacent memory.
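The grouping machinery described in the quoted paragraphs can be sketched roughly as follows (illustrative C++ only; the actual pass works on RTL-SSA insns and uses a splay tree plus a chained list, for which a `std::map` stands in here, and the names are made up):

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <utility>
#include <vector>

// Stand-in for one memory access found while walking the BB.
struct access_record
{
  int offset;    // byte offset relative to the common base
  int insn_uid;  // stand-in for the insn the access came from
};

struct access_group
{
  // Accesses at each offset relative to the same base, kept ordered
  // (the real pass uses a splay tree for insertion and a linked list
  // for iteration).
  std::map<int, std::vector<access_record>> by_offset;

  void add (const access_record &a) { by_offset[a.offset].push_back (a); }

  // Walk in offset order and report consecutive offsets exactly
  // `size` bytes apart -- the candidate pair-forming step.
  std::vector<std::pair<int, int>> adjacent_offsets (int size) const
  {
    std::vector<std::pair<int, int>> out;
    if (by_offset.size () < 2)
      return out;
    auto it = by_offset.begin ();
    for (auto next = std::next (it); next != by_offset.end (); ++it, ++next)
      if (next->first - it->first == size)
        out.emplace_back (it->first, next->first);
    return out;
  }
};
```

For the motivating testcase above, the four double loads at offsets 128/136/144/152 from the `c`-based anchor would group under one base, and the adjacency walk exposes the 8-byte-apart candidates that become the two `ldp` instructions in the expected output.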
> >
> > The pass is currently run twice, both before and after register
> > allocation.  The first copy of the pass is run late in the pre-RA RTL
> > pipeline, immediately after sched1, since it was found that sched1 was
> > increasing register pressure when the pass was run before.  The second
> > copy of the pass runs immediately before peephole2, so as to get any
> > opportunities that the existing ldp/stp peepholes can handle.
> >
> > There are some cases that we punt on before RA, e.g.
> > accesses relative to eliminable regs (such as the soft frame pointer).
> > We do this since we can't know the elimination offset before RA, and we
> > want to avoid the RA reloading the offset (due to being out of ldp/stp
> > immediate range) as this can generate worse code.
> >
> > The post-RA copy of the pass is there to pick up the crumbs that were
> > left behind / things we punted on in the pre-RA pass.  Among other
> > things, it's needed to handle accesses relative to the stack pointer.
> > It can also handle code that didn't exist at the time the pre-RA pass
> > was run (spill code, prologue/epilogue code).
> >
> > This is an initial implementation, and there are (among other possible
> > improvements) the following notable caveats / missing features that are
> > left for future work, but could give further improvements:
> >
> >  - Moving accesses between BBs within an EBB, see above.
> >  - Out-of-range opportunities: currently the pass refuses to form pairs
> >if there isn't a suitable base register with an immediate in range
> >for ldp/stp, but it can be profitable to emit anchor addresses in the
> >case that there are four or more out-of-range nearby accesses that can
> >be formed into pairs.  This is handled by the current ldp/stp
> >peepholes, so it would be good to support this in the future.
> >  - Discovery: currently we prioritize MEM_EXPR bases over RTL bases,
> >    which can lead to us missing opportunities in the case that two
> >    accesses have distinct MEM_EXPR bases (i.e. different DECLs) but
> >    they are still adjacent in memory (e.g. adjacent variables on the
> >    stack).  I hope to address this for 

Re: [PATCH] RISC-V: Add Zvfbfmin extension to the -march= option

2023-12-15 Thread Jeff Law




On 12/12/23 20:24, Xiao Zeng wrote:

This patch would like to add new sub extension (aka Zvfbfmin) to the
-march= option. It introduces a new data type BF16.

Depending on different usage scenarios, the Zvfbfmin extension may
depend on 'V' or 'Zve32f'. This patch only implements dependencies
in scenario of Embedded Processor. In scenario of Application
Processor, it is necessary to explicitly indicate the dependent
'V' extension.

You can locate more information about Zvfbfmin from below spec doc.

https://github.com/riscv/riscv-bfloat16/releases/download/20231027/riscv-bfloat16.pdf

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc:
(riscv_implied_info): Add zvfbfmin item.
 (riscv_ext_version_table): Ditto.
 (riscv_ext_flag_table): Ditto.
* config/riscv/riscv.opt:
(MASK_ZVFBFMIN): New macro.
(MASK_VECTOR_ELEN_BF_16): Ditto.
(TARGET_ZVFBFMIN): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-31.c: New test.
* gcc.target/riscv/arch-32.c: New test.
* gcc.target/riscv/predef-32.c: New test.
* gcc.target/riscv/predef-33.c: New test.
I fixed the trivial whitespace issue with the ChangeLog and pushed this 
to the trunk.  However, I do want to stress that all future 
contributions need to indicate that the patch was successfully 
regression tested.


jeff


Re: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-15 Thread Richard Earnshaw




On 15/12/2023 11:31, Lipeng Zhu wrote:



On 2023/12/14 23:50, Richard Earnshaw (lists) wrote:

On 09/12/2023 15:39, Lipeng Zhu wrote:

This patch introduces an rwlock and splits the read/write access to the
unit_root tree and unit_cache, using the rwlock instead of the mutex, to
increase CPU efficiency. In the get_gfc_unit function, the percentage of
calls that step into the insert_unit function is around 30%; in most
instances, we can get the unit in the phase of reading the unit_cache or
unit_root tree. So splitting the read/write phases with an rwlock is an
approach to make it more parallel.

BTW, the IPC metric gains around 9x on our test
server with 220 cores. The benchmark we used is
https://github.com/rwesson/NEAT

libgcc/ChangeLog:

* gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro.
(__gthrw): New function.
(__gthread_rwlock_rdlock): New function.
(__gthread_rwlock_tryrdlock): New function.
(__gthread_rwlock_wrlock): New function.
(__gthread_rwlock_trywrlock): New function.
(__gthread_rwlock_unlock): New function.

libgfortran/ChangeLog:

* io/async.c (DEBUG_LINE): New macro.
* io/async.h (RWLOCK_DEBUG_ADD): New macro.
(CHECK_RDLOCK): New macro.
(CHECK_WRLOCK): New macro.
(TAIL_RWLOCK_DEBUG_QUEUE): New macro.
(IN_RWLOCK_DEBUG_QUEUE): New macro.
(RDLOCK): New macro.
(WRLOCK): New macro.
(RWUNLOCK): New macro.
(RD_TO_WRLOCK): New macro.
(INTERN_RDLOCK): New macro.
(INTERN_WRLOCK): New macro.
(INTERN_RWUNLOCK): New macro.
* io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
a comment.
(unit_lock): Remove including associated internal_proto.
(unit_rwlock): New declarations including associated internal_proto.
(dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock
instead of __gthread_mutex_lock and __gthread_mutex_unlock on
unit_lock.
* io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on
unit_rwlock instead of LOCK and UNLOCK on unit_lock.
(st_write_done_worker): Likewise.
* io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules'
comment. Use unit_rwlock variable instead of unit_lock variable.
(get_gfc_unit_from_unit_root): New function.
(get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock
instead of LOCK and UNLOCK on unit_lock.
(close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of
LOCK and UNLOCK on unit_lock.
(close_units): Likewise.
(newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on
unit_lock.
* io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock
instead of LOCK and UNLOCK on unit_lock.
(flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead
of LOCK and UNLOCK on unit_lock.



It looks like this has broken builds on arm-none-eabi when using newlib:

In file included from /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/runtime/error.c:27:
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h: In function ‘dec_waiting_unlocked’:
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1023:3: error: implicit declaration of function ‘WRLOCK’ [-Wimplicit-function-declaration]
 1023 |   WRLOCK (&unit_rwlock);
      |   ^~~~~~
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1025:3: error: implicit declaration of function ‘RWUNLOCK’ [-Wimplicit-function-declaration]
 1025 |   RWUNLOCK (&unit_rwlock);
      |   ^~~~~~~~


R.


Hi Richard,

The root cause is that the macros WRLOCK and RWUNLOCK are not defined in 
io.h. The reason the x86 platform did not fail is that 
HAVE_ATOMIC_FETCH_ADD is defined there, so the macros above were never 
used. The code logic is shown below:

#ifdef HAVE_ATOMIC_FETCH_ADD
  (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED);
#else
  WRLOCK (&unit_rwlock);
  u->waiting--;
  RWUNLOCK (&unit_rwlock);
#endif

I just drafted a patch to try to fix this bug. Because I don't have an 
arm platform, would you help validate whether it is fixed there?


diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h
index 15daa0995b1..c7f0f7d7d9e 100644
--- a/libgfortran/io/io.h
+++ b/libgfortran/io/io.h
@@ -1020,9 +1020,15 @@ dec_waiting_unlocked (gfc_unit *u)
 #ifdef HAVE_ATOMIC_FETCH_ADD
   (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED);
 #else
-  WRLOCK (&unit_rwlock);
+#ifdef __GTHREAD_RWLOCK_INIT
+  __gthread_rwlock_wrlock (&unit_rwlock);
+  u->waiting--;
+  __gthread_rwlock_unlock (&unit_rwlock);
+#else
+  __gthread_mutex_lock (&unit_rwlock);
   u->waiting--;
-  RWUNLOCK (&unit_rwlock);
+  __gthread_mutex_unlock (&unit_rwlock);
+#endif
 #endif
 }


Lipeng Zhu


Hi Lipeng,

Thanks for the quick reply.  I can confirm that with the above change 
the bootstrap failure is fixed.  However, this shouldn't be considered a 
formal review; libgfortran is not really my area.


I'll be away now until January 2nd.

Richard.


Re: [PATCH 2/2] c++: partial ordering and dep alias tmpl specs [PR90679]

2023-12-15 Thread Patrick Palka
On Thu, 1 Jun 2023, Patrick Palka wrote:

> During partial ordering, we want to look through dependent alias
> template specializations within template arguments and otherwise
> treat them as opaque in other contexts (see e.g. r7-7116-g0c942f3edab108
> and r11-7011-g6e0a231a4aa240).  To that end template_args_equal was
> given a partial_order flag that controls this behavior.  This flag
> does the right thing when a dependent alias template specialization
> appears as template argument of the partial specialization, e.g. in
> 
>   template using first_t = T;
>   template struct traits;
>   template struct traits> { }; // #1
>   template struct traits> { }; // #2
> 
> we correctly consider #2 to be more specialized than #1.  But if
> the alias specialization appears as a template argument of another
> class template specialization, e.g. in
> 
>   template struct traits>> { }; // #1
>   template struct traits>> { }; // #2
> 
> then we incorrectly consider #1 and #2 to be unordered.  This is because
> 
>   1. we don't propagate the flag to recursive template_args_equal calls
>   2. we don't use structural equality for class template specializations
>  written in terms of dependent alias template specializations
> 
> This patch fixes the first issue by turning the partial_order flag into
> a global.  This patch fixes the second issue by making us propagate
> structural equality appropriately when building a class template
> specialization.  In passing this patch also improves hashing of
> specializations that use structural equality.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk?
> 
>   PR c++/90679
> 
> gcc/cp/ChangeLog:
> 
>   * cp-tree.h (comp_template_args): Remove partial_order
>   parameter.
>   (template_args_equal): Likewise.
>   * pt.cc (iterative_hash_template_arg) : Hash
>   the template and arguments for specializations that use
>   structural equality.
>   (comparing_for_partial_ordering): New flag.
>   (template_args_equal): Remove partial order parameter and
>   use comparing_for_partial_ordering instead.
>   (comp_template_args): Likewise.
>   (comp_template_args_porder): Set comparing_for_partial_ordering
>   instead.  Make static.
>   (any_template_arguments_need_structural_equality_p): Return true
>   for an argument that's a dependent alias template specialization
>   or a class template specialization that itself needs structural
>   equality.
>   * tree.cc (cp_tree_equal) : Adjust call to
>   comp_template_args.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp0x/alias-decl-75a.C: New test.
>   * g++.dg/cpp0x/alias-decl-75b.C: New test.

Ping.  Here's the rebased patch:

-- >8 --

Subject: [PATCH 2/2] c++: partial ordering and dep alias tmpl specs [PR90679]

During partial ordering, we want to look through dependent alias
template specializations within template arguments and otherwise
treat them as opaque in other contexts (see e.g. r7-7116-g0c942f3edab108
and r11-7011-g6e0a231a4aa240).  To that end template_args_equal was
given a partial_order flag that controls this behavior.  This flag
does the right thing when a dependent alias template specialization
appears as template argument of the partial specialization, e.g. in

  template using first_t = T;
  template struct traits;
  template struct traits> { }; // #1
  template struct traits> { }; // #2

we correctly consider #2 to be more specialized than #1.  But if
the alias specialization appears as a template argument of another
class template specialization, e.g. in

  template struct traits>> { }; // #1
  template struct traits>> { }; // #2

then we incorrectly consider #1 and #2 to be unordered.  This is because

  1. we don't propagate the flag to recursive template_args_equal calls
  2. we don't use structural equality for class template specializations
 written in terms of dependent alias template specializations

This patch fixes the first issue by turning the partial_order flag into
a global.  This patch fixes the second issue by making us propagate
structural equality appropriately when building a class template
specialization.  In passing this patch also improves hashing of
specializations that use structural equality.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/90679

gcc/cp/ChangeLog:

* cp-tree.h (comp_template_args): Remove partial_order
parameter.
(template_args_equal): Likewise.
* pt.cc (iterative_hash_template_arg) : Hash
the template and arguments for specializations that use
structural equality.
(comparing_for_partial_ordering): New flag.
(template_args_equal): Remove partial order parameter and
use comparing_for_partial_ordering instead.
(comp_template_args): Likewise.
(comp_template_args_porder): Set comparing_for_partial_ordering

Re: [PATCH 1/2] c++: refine dependent_alias_template_spec_p [PR90679]

2023-12-15 Thread Patrick Palka
On Mon, 11 Sep 2023, Patrick Palka wrote:

> On Thu, 1 Jun 2023, Patrick Palka wrote:
> 
> > For a complex alias template-id, dependent_alias_template_spec_p returns
> > true if any template argument of the template-id is dependent.  This
> > predicate indicates that substitution into the template-id may behave
> > differently with respect to SFINAE than substitution into the expanded
> > alias, and so the alias is in a way non-transparent.  For example
> > 'first_t' in
> > 
> >   template using first_t = T;
> >   template first_t f();
> > 
> > is such an alias template-id since first_t doesn't use its second
> > template parameter and so the substitution into the expanded alias would
> > discard the SFINAE effects of the corresponding (dependent) argument 'T&'.
> > 
> > But this predicate is overly conservative since what really matters for
> > sake of SFINAE equivalence is whether a template argument corresponding
> > to an _unused_ template parameter is dependent.  So the predicate should
> > return false for e.g. 'first_t' or 'first_t'.
> > 
> > This patch refines the predicate appropriately.  We need to be able to
> > efficiently determine which template parameters of a complex alias
> > template are unused, so to that end we add a new out parameter to
> > complex_alias_template_p and cache its result in an on-the-side
> > hash_map that replaces the existing TEMPLATE_DECL_COMPLEX_ALIAS_P
> > flag.  And in doing so, we fix a latent bug that this flag wasn't
> > being propagated during partial instantiation, and so we were treating
> > all partially instantiated member alias templates as non-complex.
> > 
> > PR c++/90679
> > 
> > gcc/cp/ChangeLog:
> > 
> > * cp-tree.h (TEMPLATE_DECL_COMPLEX_ALIAS_P): Remove.
> > (most_general_template): Constify parameter.
> > * pt.cc (push_template_decl): Adjust after removing
> > TEMPLATE_DECL_COMPLEX_ALIAS_P.
> > (complex_alias_tmpl_info): New hash_map.
> > (uses_all_template_parms_data::seen): Change type to
> > tree* from bool*.
> > (complex_alias_template_r): Adjust accordingly.
> > (complex_alias_template_p): Add 'seen_out' out parameter.
> > Call most_general_template and check PRIMARY_TEMPLATE_P.
> > Use complex_alias_tmpl_info to cache the result and set
> > '*seen_out' accordingly.
> > (dependent_alias_template_spec_p): Add !processing_template_decl
> > early exit test.  Consider dependence of only template arguments
> > corresponding to seen template parameters as per
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp0x/alias-decl-75.C: New test.
> 
> Ping.

Ping.  Here's a rebased patch:

-- >8 --

Subject: [PATCH 1/2] c++: refine dependent_alias_template_spec_p [PR90679]

For a (complex) alias template-id, dependent_alias_template_spec_p
returns true if any template argument of the template-id is dependent.
This predicate indicates that substitution into the template-id may
behave differently with respect to SFINAE than substitution into the
expanded alias, and so the alias is in a way non-transparent.

For example, 'first_t' in

  template using first_t = T;
  template first_t f();

is such an alias template-id since first_t doesn't use its second
template parameter and so the substitution into the expanded alias would
discard the SFINAE effects of the corresponding (dependent) argument 'T&'.

But this predicate is overly conservative since what really matters for
sake of SFINAE equivalence is whether a template argument corresponding
to an _unused_ template parameter is dependent.  So the predicate should
return false for e.g. 'first_t'.

This patch refines the predicate appropriately.  We need to be able to
efficiently determine which template parameters of a complex alias
template are unused, so to that end we add a new out parameter to
complex_alias_template_p and cache its result in an on-the-side
hash_map that replaces the existing TEMPLATE_DECL_COMPLEX_ALIAS_P
flag.

PR c++/90679

gcc/cp/ChangeLog:

* cp-tree.h (TEMPLATE_DECL_COMPLEX_ALIAS_P): Remove.
(most_general_template): Constify parameter.
* pt.cc (push_template_decl): Adjust after removing
TEMPLATE_DECL_COMPLEX_ALIAS_P.
(complex_alias_tmpl_info): New hash_map.
(uses_all_template_parms_data::seen): Change type to
tree* from bool*.
(complex_alias_template_r): Adjust accordingly.
(complex_alias_template_p): Add 'seen_out' out parameter.
Call most_general_template and check PRIMARY_TEMPLATE_P.
Use complex_alias_tmpl_info to cache the result and set
'*seen_out' accordingly.
(dependent_alias_template_spec_p): Add !processing_template_decl
early exit test.  Consider dependence of only template arguments
corresponding to seen template parameters as per

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/alias-decl-76.C: New test.
---
 gcc/cp/cp-tree.h   |   7 +-
 gcc/cp/pt.cc  

[PATCH 3/3][RFC] RISC-V: Enable assert for insn_has_dfa_reservation

2023-12-15 Thread Edwin Lu
Enables the assert that every typed instruction is associated with a
DFA reservation.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_sched_variable_issue): Enable assert.

Signed-off-by: Edwin Lu 
---
 gcc/config/riscv/riscv.cc | 2 --
 1 file changed, 2 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index ab0f95e5fe9..3adeb415bec 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -8048,9 +8048,7 @@ riscv_sched_variable_issue (FILE *, int, rtx_insn *insn, 
int more)
 
   /* If we ever encounter an insn without an insn reservation, trip
  an assert so we can find and fix this problem.  */
-#if 0
   gcc_assert (insn_has_dfa_reservation_p (insn));
-#endif
 
   return more - 1;
 }
-- 
2.34.1



[PATCH 2/3][RFC] RISC-V: Add vector related reservations

2023-12-15 Thread Edwin Lu
This patch copies the vector reservations from generic-ooo.md and
inserts them into generic.md and sifive.md. The vector pipelines are
necessary to avoid an ICE from the assert

gcc/ChangeLog:

* config/riscv/generic-ooo.md: Syntax fixes.
* config/riscv/generic.md (pipe0): new reservation
(generic_vec_load): ditto
(generic_vec_store): ditto
(generic_vec_loadstore_seg): ditto
(generic_generic_vec_alu): ditto
(generic_vec_fcmp): ditto
(generic_vec_imul): ditto
(generic_vec_fadd): ditto
(generic_vec_fmul): ditto
(generic_crypto): ditto
(generic_vec_perm): ditto
(generic_vec_reduction): ditto
(generic_vec_ordered_reduction): ditto
(generic_vec_idiv): ditto
(generic_vec_float_divsqrt): ditto
(generic_vec_mask): ditto
(generic_vec_vesetvl): ditto
(generic_vec_setrm): ditto
(generic_vec_readlen): ditto
* config/riscv/sifive-7.md (sifive_7): new reservation
(sifive_7_vec_load): ditto
(sifive_7_vec_store): ditto
(sifive_7_vec_loadstore_seg): ditto
(sifive_7_sifive_7_vec_alu): ditto
(sifive_7_vec_fcmp): ditto
(sifive_7_vec_imul): ditto
(sifive_7_vec_fadd): ditto
(sifive_7_vec_fmul): ditto
(sifive_7_crypto): ditto
(sifive_7_vec_perm): ditto
(sifive_7_vec_reduction): ditto
(sifive_7_vec_ordered_reduction): ditto
(sifive_7_vec_idiv): ditto
(sifive_7_vec_float_divsqrt): ditto
(sifive_7_vec_mask): ditto
(sifive_7_vec_vesetvl): ditto
(sifive_7_vec_setrm): ditto
(sifive_7_vec_readlen): ditto

Signed-off-by: Edwin Lu 
Co-authored-by: Robin Dapp 
---
 gcc/config/riscv/generic-ooo.md |  19 ++---
 gcc/config/riscv/generic.md | 118 
 gcc/config/riscv/sifive-7.md| 118 
 3 files changed, 242 insertions(+), 13 deletions(-)

diff --git a/gcc/config/riscv/generic-ooo.md b/gcc/config/riscv/generic-ooo.md
index de93245f965..18b606bb981 100644
--- a/gcc/config/riscv/generic-ooo.md
+++ b/gcc/config/riscv/generic-ooo.md
@@ -106,16 +106,14 @@ (define_insn_reservation "generic_ooo_vec_store" 6
 ;; Vector segment loads/stores.
 (define_insn_reservation "generic_ooo_vec_loadstore_seg" 10
   (and (eq_attr "tune" "generic_ooo")
-   (eq_attr "type" "vlsegde,vlsegds,vlsegdux,vlsegdox,vlsegdff,\
-   vssegte,vssegts,vssegtux,vssegtox"))
+   (eq_attr "type" 
"vlsegde,vlsegds,vlsegdux,vlsegdox,vlsegdff,vssegte,vssegts,vssegtux,vssegtox"))
   "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
 
 
 ;; Generic integer instructions.
 (define_insn_reservation "generic_ooo_alu" 1
   (and (eq_attr "tune" "generic_ooo")
-   (eq_attr "type" "unknown,const,arith,shift,slt,multi,auipc,nop,logical,\
-   
move,bitmanip,rotate,min,max,minu,maxu,clz,ctz,atomic,condmove,cbo,mvpair,zicond"))
+   (eq_attr "type" 
"unknown,const,arith,shift,slt,multi,auipc,nop,logical,move,bitmanip,rotate,min,max,minu,maxu,clz,ctz,atomic,condmove,cbo,mvpair,zicond"))
   "generic_ooo_issue,generic_ooo_ixu_alu")
 
 (define_insn_reservation "generic_ooo_sfb_alu" 2
@@ -193,16 +191,13 @@ (define_insn_reservation "generic_ooo_popcount" 2
 ;; Regular vector operations and integer comparisons.
 (define_insn_reservation "generic_ooo_vec_alu" 3
   (and (eq_attr "tune" "generic_ooo")
-   (eq_attr "type" 
"vialu,viwalu,vext,vicalu,vshift,vnshift,viminmax,vicmp,\
-   vimov,vsalu,vaalu,vsshift,vnclip,vmov,vfmov,vector"))
+   (eq_attr "type" 
"vialu,viwalu,vext,vicalu,vshift,vnshift,viminmax,vicmp,vimov,vsalu,vaalu,vsshift,vnclip,vmov,vfmov,vector"))
   "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
 
 ;; Vector float comparison, conversion etc.
 (define_insn_reservation "generic_ooo_vec_fcmp" 3
   (and (eq_attr "tune" "generic_ooo")
-   (eq_attr "type" "vfrecp,vfminmax,vfcmp,vfsgnj,vfclass,vfcvtitof,\
-   vfcvtftoi,vfwcvtitof,vfwcvtftoi,vfwcvtftof,vfncvtitof,\
-   vfncvtftoi,vfncvtftof"))
+   (eq_attr "type" 
"vfrecp,vfminmax,vfcmp,vfsgnj,vfclass,vfcvtitof,vfcvtftoi,vfwcvtitof,vfwcvtftoi,vfwcvtftof,vfncvtitof,vfncvtftoi,vfncvtftof"))
   "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
 
 ;; Vector integer multiplication.
@@ -232,8 +227,7 @@ (define_insn_reservation "generic_ooo_crypto" 4
 ;; Vector permute.
 (define_insn_reservation "generic_ooo_perm" 3
   (and (eq_attr "tune" "generic_ooo")
-   (eq_attr "type" "vimerge,vfmerge,vslideup,vslidedown,vislide1up,\
-   
vislide1down,vfslide1up,vfslide1down,vgather,vcompress"))
+   (eq_attr "type" 
"vimerge,vfmerge,vslideup,vslidedown,vislide1up,vislide1down,vfslide1up,vfslide1down,vgather,vcompress"))
   "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
 
 ;; Vector reduction.
@@ -265,8 +259,7 @@ (define_insn_reservation 

[PATCH 1/3][RFC] RISC-V: Add non-vector types to pipelines

2023-12-15 Thread Edwin Lu
This patch does not create vector related insn reservations for
generic.md and sifive-7.md. It updates/creates insn reservations
for all non-vector typed insns

gcc/ChangeLog:

* config/riscv/generic-ooo.md (generic_ooo_sfb_alu): create/update 
reservation
(generic_ooo_branch): ditto
* config/riscv/generic.md (generic_sfb_alu): ditto
* config/riscv/sifive-7.md (sifive_7_popcount): ditto

Signed-off-by: Edwin Lu 
---
 gcc/config/riscv/generic-ooo.md | 16 +---
 gcc/config/riscv/generic.md | 13 +
 gcc/config/riscv/sifive-7.md| 12 +---
 3 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/gcc/config/riscv/generic-ooo.md b/gcc/config/riscv/generic-ooo.md
index 78b9e48f935..de93245f965 100644
--- a/gcc/config/riscv/generic-ooo.md
+++ b/gcc/config/riscv/generic-ooo.md
@@ -95,7 +95,7 @@ (define_insn_reservation "generic_ooo_float_store" 6
 ;; Vector load/store
 (define_insn_reservation "generic_ooo_vec_load" 6
   (and (eq_attr "tune" "generic_ooo")
-   (eq_attr "type" "vlde,vldm,vlds,vldux,vldox,vldff,vldr"))
+   (eq_attr "type" "vlde,vldm,vlds,vldux,vldox,vldff,vldr,rdfrm"))
   "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
 
 (define_insn_reservation "generic_ooo_vec_store" 6
@@ -115,9 +115,19 @@ (define_insn_reservation "generic_ooo_vec_loadstore_seg" 10
 (define_insn_reservation "generic_ooo_alu" 1
   (and (eq_attr "tune" "generic_ooo")
(eq_attr "type" "unknown,const,arith,shift,slt,multi,auipc,nop,logical,\
-   move,bitmanip,min,max,minu,maxu,clz,ctz"))
+   
move,bitmanip,rotate,min,max,minu,maxu,clz,ctz,atomic,condmove,cbo,mvpair,zicond"))
   "generic_ooo_issue,generic_ooo_ixu_alu")
 
+(define_insn_reservation "generic_ooo_sfb_alu" 2
+  (and (eq_attr "tune" "generic_ooo")
+   (eq_attr "type" "sfb_alu"))
+  "generic_ooo_issue,generic_ooo_ixu_alu")
+
+;; Branch instructions
+(define_insn_reservation "generic_ooo_branch" 1
+  (and (eq_attr "tune" "generic_ooo")
+   (eq_attr "type" "branch,jump,call,jalr,ret,trap,pushpop"))
+  "generic_ooo_issue,generic_ooo_ixu_alu")
 
 ;; Float move, convert and compare.
 (define_insn_reservation "generic_ooo_float_move" 3
@@ -184,7 +194,7 @@ (define_insn_reservation "generic_ooo_popcount" 2
 (define_insn_reservation "generic_ooo_vec_alu" 3
   (and (eq_attr "tune" "generic_ooo")
(eq_attr "type" 
"vialu,viwalu,vext,vicalu,vshift,vnshift,viminmax,vicmp,\
-   vimov,vsalu,vaalu,vsshift,vnclip,vmov,vfmov"))
+   vimov,vsalu,vaalu,vsshift,vnclip,vmov,vfmov,vector"))
   "generic_ooo_vxu_issue,generic_ooo_vxu_alu")
 
 ;; Vector float comparison, conversion etc.
diff --git a/gcc/config/riscv/generic.md b/gcc/config/riscv/generic.md
index 88940483829..3e49d942495 100644
--- a/gcc/config/riscv/generic.md
+++ b/gcc/config/riscv/generic.md
@@ -27,7 +27,7 @@ (define_cpu_unit "fdivsqrt" "pipe0")
 
 (define_insn_reservation "generic_alu" 1
   (and (eq_attr "tune" "generic")
-   (eq_attr "type" "unknown,const,arith,shift,slt,multi,auipc,nop,logical,move,bitmanip,min,max,minu,maxu,clz,ctz,cpop"))
+   (eq_attr "type" "unknown,const,arith,shift,slt,multi,auipc,nop,logical,move,bitmanip,min,max,minu,maxu,clz,ctz,rotate,atomic,condmove,crypto,mvpair,zicond"))
   "alu")
 
 (define_insn_reservation "generic_load" 3
@@ -42,17 +42,22 @@ (define_insn_reservation "generic_store" 1
 
 (define_insn_reservation "generic_xfer" 3
   (and (eq_attr "tune" "generic")
-   (eq_attr "type" "mfc,mtc,fcvt,fmove,fcmp"))
+   (eq_attr "type" "mfc,mtc,fcvt,fmove,fcmp,cbo"))
   "alu")
 
 (define_insn_reservation "generic_branch" 1
   (and (eq_attr "tune" "generic")
-   (eq_attr "type" "branch,jump,call,jalr"))
+   (eq_attr "type" "branch,jump,call,jalr,ret,trap,pushpop"))
+  "alu")
+
+(define_insn_reservation "generic_sfb_alu" 2
+  (and (eq_attr "tune" "generic")
+   (eq_attr "type" "sfb_alu"))
   "alu")
 
 (define_insn_reservation "generic_imul" 10
   (and (eq_attr "tune" "generic")
-   (eq_attr "type" "imul,clmul"))
+   (eq_attr "type" "imul,clmul,cpop"))
   "imuldiv*10")
 
 (define_insn_reservation "generic_idivsi" 34
diff --git a/gcc/config/riscv/sifive-7.md b/gcc/config/riscv/sifive-7.md
index a63394c8c58..65d27cf6dc9 100644
--- a/gcc/config/riscv/sifive-7.md
+++ b/gcc/config/riscv/sifive-7.md
@@ -34,7 +34,7 @@ (define_insn_reservation "sifive_7_fpstore" 1
 
 (define_insn_reservation "sifive_7_branch" 1
   (and (eq_attr "tune" "sifive_7")
-   (eq_attr "type" "branch"))
+   (eq_attr "type" "branch,ret,trap"))
   "sifive_7_B")
 
 (define_insn_reservation "sifive_7_sfb_alu" 2
@@ -44,7 +44,7 @@ (define_insn_reservation "sifive_7_sfb_alu" 2
 
 (define_insn_reservation "sifive_7_jump" 1
   (and (eq_attr "tune" "sifive_7")
-   (eq_attr "type" "jump,call,jalr"))
+   (eq_attr "type" "jump,call,jalr,pushpop"))
   "sifive_7_B")
 
 (define_insn_reservation "sifive_7_mul" 3
@@ -59,7 

[PATCH 0/3][RFC] RISC-V: Associate typed insns to dfa reservation

2023-12-15 Thread Edwin Lu
This series is a prototype for adding all typed instructions to a dfa 
scheduling pipeline.

I've been working on adding insn reservations for all typed instructions
to ensure all instructions are part of a dfa pipeline. I don't have a good 
understanding of vector instruction latency, so I have been struggling
with what I should do for those. 

As of right now, I have copied the insn reservations from generic-ooo.md 
for vector instructions into the generic.md and sifive-7.md files. This 
prevents ICEs from enabling the assert but introduces numerous scan
dump failures (when tested in linux rv64gcv and rv64gc_zba_zbb_zbc_zbs).

Currently, only patch 1/3 RISC-V: Add non-vector types to pipelines does
not introduce regressions (when tested against linux rv32/64 gc/gcv
on rocket).  I hope that the locations where I added the insn types make sense.
Please let me know if they should change.

The final patch enables the assert for insn_has_dfa_reservation. 

I tested the full patch series on both rocket and sifive-7-series. The 
series does introduce additional scan dump failures compared to their
respective baselines, however, I'm not sure how many failures were
due to the patch vs incorrect modeling assumptions. I created
PR113035 which has the full testsuite failures I saw (without the patches
applied).

Edwin Lu (3):
  RISC-V: Add non-vector types to pipelines
  RISC-V: Add vector related reservations
  RISC-V: Enable assert for insn_has_dfa_reservation

 gcc/config/riscv/generic-ooo.md |  31 
 gcc/config/riscv/generic.md | 131 +++-
 gcc/config/riscv/riscv.cc   |   2 -
 gcc/config/riscv/sifive-7.md| 130 ++-
 4 files changed, 271 insertions(+), 23 deletions(-)

-- 
2.34.1



Re: [PATCH] match.pd: Optimize sign-extension followed by truncation [PR113024]

2023-12-15 Thread Richard Sandiford
Jakub Jelinek  writes:
> Hi!
>
> While looking at a bitint ICE, I've noticed we don't optimize
> in f1 and f5 functions below the 2 casts into just one at GIMPLE,
> even when optimize it in convert_to_integer if it appears in the same
> stmt.  The large match.pd simplification of two conversions in a row
> has many complex rules and as the testcase shows, everything else from
> the narrowest -> widest -> prec_in_between all integer conversions
> is already handled, either because the inside_unsignedp == inter_unsignedp
> rule kicks in, or the
>  && ((inter_unsignedp && inter_prec > inside_prec)
>  == (final_unsignedp && final_prec > inter_prec))
> one, but there is no reason why sign extension from narrowest to
> widest type followed by truncation to something in between can't be
> done just as sign extension from narrowest to the final type.  After all,
> if the widest type is signed rather than unsigned, regardless of the final
> type signedness we already handle it that way.
> And since PR93044 we also handle it if the final precision is not wider
> than the inside precision.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2023-12-14  Jakub Jelinek  
>
>   PR tree-optimization/113024
>   * match.pd (two conversions in a row): Simplify scalar integer
>   sign-extension followed by truncation.
>
>   * gcc.dg/tree-ssa/pr113024.c: New test.
>
> --- gcc/match.pd.jj   2023-12-14 11:59:28.0 +0100
> +++ gcc/match.pd  2023-12-14 18:25:00.457961975 +0100
> @@ -4754,11 +4754,14 @@ (define_operator_list SYNC_FETCH_AND_AND
>  /* If we have a sign-extension of a zero-extended value, we can
> replace that by a single zero-extension.  Likewise if the
> final conversion does not change precision we can drop the
> -   intermediate conversion.  */
> +   intermediate conversion.  Similarly truncation of a sign-extension
> +   can be replaced by a single sign-extension.  */
>  (if (inside_int && inter_int && final_int
>&& ((inside_prec < inter_prec && inter_prec < final_prec
> && inside_unsignedp && !inter_unsignedp)
> -  || final_prec == inter_prec))
> +  || final_prec == inter_prec
> +  || (inside_prec < inter_prec && inter_prec > final_prec
> +  && !inside_unsignedp && inter_unsignedp)))

Just curious: is the inter_unsignedp part needed for correctness?
If it's bigger than both the initial and final type then I wouldn't
have expected its signedness to matter.

Thanks,
Richard

>   (ocvt @0))
>  
>  /* Two conversions in a row are not needed unless:
> --- gcc/testsuite/gcc.dg/tree-ssa/pr113024.c.jj	2023-12-14 18:35:30.652225327 +0100
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr113024.c	2023-12-14 18:37:42.056403418 +0100
> @@ -0,0 +1,22 @@
> +/* PR tree-optimization/113024 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-forwprop1" } */
> +/* Make sure we have just a single cast per function rather than 2 casts in some cases.  */
> +/* { dg-final { scan-tree-dump-times " = \\\(\[a-z \]*\\\) \[xy_\]" 16 "forwprop1" { target { ilp32 || lp64 } } } } */
> +
> +unsigned int f1 (signed char x) { unsigned long long y = x; return y; }
> +unsigned int f2 (unsigned char x) { unsigned long long y = x; return y; }
> +unsigned int f3 (signed char x) { long long y = x; return y; }
> +unsigned int f4 (unsigned char x) { long long y = x; return y; }
> +int f5 (signed char x) { unsigned long long y = x; return y; }
> +int f6 (unsigned char x) { unsigned long long y = x; return y; }
> +int f7 (signed char x) { long long y = x; return y; }
> +int f8 (unsigned char x) { long long y = x; return y; }
> +unsigned int f9 (signed char x) { return (unsigned long long) x; }
> +unsigned int f10 (unsigned char x) { return (unsigned long long) x; }
> +unsigned int f11 (signed char x) { return (long long) x; }
> +unsigned int f12 (unsigned char x) { return (long long) x; }
> +int f13 (signed char x) { return (unsigned long long) x; }
> +int f14 (unsigned char x) { return (unsigned long long) x; }
> +int f15 (signed char x) { return (long long) x; }
> +int f16 (unsigned char x) { return (long long) x; }
>
>   Jakub
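
The equivalence behind the new match.pd rule — sign-extend to the widest type, then truncate, equals a single sign-extension to the final type — can be sanity-checked exhaustively in plain C++. This is a standalone sketch, not part of the patch; the function names are illustrative:

```cpp
// Mimics the shape of f5/f13 in the new testcase:
// signed char -> unsigned long long -> int.
int two_casts(signed char x) { return (int)(unsigned long long)x; }

// What the rule folds it into: one sign-extension.
int one_cast(signed char x) { return (int)x; }

// Exhaustive check over all signed char values, including the negative
// ones where zero- vs sign-extension of the intermediate type could matter.
bool check_all()
{
  for (int v = -128; v <= 127; ++v)
    if (two_casts((signed char)v) != one_cast((signed char)v))
      return false;
  return true;
}
```

This also bears on Richard's question: because the final truncation to int keeps only the low bits, the signedness of the wider intermediate type cannot be observed in the result.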


[r14-6559 Regression] FAIL: gcc.dg/guality/pr58791-4.c -Os -DPREVENT_OPTIMIZATION line pr58791-4.c:32 i == 486 on Linux/x86_64

2023-12-15 Thread haochen.jiang
On Linux/x86_64,

8afdbcdd7abe1e3c7a81e07f34c256e7f2dbc652 is the first bad commit
commit 8afdbcdd7abe1e3c7a81e07f34c256e7f2dbc652
Author: Di Zhao 
Date:   Fri Dec 15 03:22:32 2023 +0800

Consider fully pipelined FMA in get_reassociation_width

caused

FAIL: gcc.dg/pr110279-2.c scan-tree-dump-not reassoc2 "was chosen for reassociation"
FAIL: gcc.dg/pr110279-2.c scan-tree-dump-times optimized "\\.FMA " 3

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-6559/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/pr110279-2.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/pr110279-2.c --target_board='unix{-m64}'"

(Please do not reply to this email; for questions about this report, contact me
at haochen dot jiang at intel.com)
(If you encounter cascadelake-related problems, disabling AVX512F on the command
line might help.)
(However, please make sure that there are no potential problems with AVX512.)


Re: [PATCH v4 10/11] aarch64: Add new load/store pair fusion pass

2023-12-15 Thread Richard Sandiford
Alex Coplan  writes:
> This is a v6 of the aarch64 load/store pair fusion pass, which
> addresses the feedback from Richard's last review here:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640539.html
>
> In particular this version implements the suggested changes which
> greatly simplify the double list walk.
>
> Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?
>
> Thanks,
> Alex
>
> -- >8 --
>
> This adds a new aarch64-specific RTL-SSA pass dedicated to forming load
> and store pairs (LDPs and STPs).
>
> As a motivating example for the kind of thing this improves, take the
> following testcase:
>
> extern double c[20];
>
> double f(double x)
> {
>   double y = x*x;
>   y += c[16];
>   y += c[17];
>   y += c[18];
>   y += c[19];
>   return y;
> }
>
> for which we currently generate (at -O2):
>
> f:
>         adrp    x0, c
>         add     x0, x0, :lo12:c
>         ldp     d31, d29, [x0, 128]
>         ldr     d30, [x0, 144]
>         fmadd   d0, d0, d0, d31
>         ldr     d31, [x0, 152]
>         fadd    d0, d0, d29
>         fadd    d0, d0, d30
>         fadd    d0, d0, d31
>         ret
>
> but with the pass, we generate:
>
> f:
> .LFB0:
>         adrp    x0, c
>         add     x0, x0, :lo12:c
>         ldp     d31, d29, [x0, 128]
>         fmadd   d0, d0, d0, d31
>         ldp     d30, d31, [x0, 144]
>         fadd    d0, d0, d29
>         fadd    d0, d0, d30
>         fadd    d0, d0, d31
>         ret
>
> The pass is local (only considers a BB at a time).  In theory, it should
> be possible to extend it to run over EBBs, at least in the case of pure
> (MEM_READONLY_P) loads, but this is left for future work.
>
> The pass works by identifying two kinds of bases: tree decls obtained
> via MEM_EXPR, and RTL register bases in the form of RTL-SSA def_infos.
> If a candidate memory access has a MEM_EXPR base, then we track it via
> this base, and otherwise if it is of a simple reg + offset form, we track
> it via the RTL-SSA def_info for the register.
>
> For each BB, for a given kind of base, we build up a hash table mapping
> the base to an access_group.  The access_group data structure holds a
> list of accesses at each offset relative to the same base.  It uses a
> splay tree to support efficient insertion (while walking the bb), and
> the nodes are chained using a linked list to support efficient
> iteration (while doing the transformation).
>
> For each base, we then iterate over the access_group to identify
> adjacent accesses, and try to form load/store pairs for those insns that
> access adjacent memory.
>
> The pass is currently run twice, both before and after register
> allocation.  The first copy of the pass is run late in the pre-RA RTL
> pipeline, immediately after sched1, since it was found that sched1 was
> increasing register pressure when the pass was run before.  The second
> copy of the pass runs immediately before peephole2, so as to get any
> opportunities that the existing ldp/stp peepholes can handle.
>
> There are some cases that we punt on before RA, e.g.
> accesses relative to eliminable regs (such as the soft frame pointer).
> We do this since we can't know the elimination offset before RA, and we
> want to avoid the RA reloading the offset (due to being out of ldp/stp
> immediate range) as this can generate worse code.
>
> The post-RA copy of the pass is there to pick up the crumbs that were
> left behind / things we punted on in the pre-RA pass.  Among other
> things, it's needed to handle accesses relative to the stack pointer.
> It can also handle code that didn't exist at the time the pre-RA pass
> was run (spill code, prologue/epilogue code).
>
> This is an initial implementation, and there are (among other possible
> improvements) the following notable caveats / missing features that are
> left for future work, but could give further improvements:
>
>  - Moving accesses between BBs within an EBB, see above.
>  - Out-of-range opportunities: currently the pass refuses to form pairs
>if there isn't a suitable base register with an immediate in range
>for ldp/stp, but it can be profitable to emit anchor addresses in the
>case that there are four or more out-of-range nearby accesses that can
>be formed into pairs.  This is handled by the current ldp/stp
>peepholes, so it would be good to support this in the future.
>  - Discovery: currently we prioritize MEM_EXPR bases over RTL bases, which can
>lead to us missing opportunities in the case that two accesses have 
> distinct
>MEM_EXPR bases (i.e. different DECLs) but they are still adjacent in memory
>(e.g. adjacent variables on the stack).  I hope to address this for GCC 15,
>hopefully getting to the point where we can remove the ldp/stp peepholes 
> and
>scheduling hooks.  Furthermore it would be nice to make the pass aware of
>section anchors (adding these as a third kind of base) allowing merging
>accesses to adjacent variables within 

[PATCH v4 10/11] aarch64: Add new load/store pair fusion pass

2023-12-15 Thread Alex Coplan
This is a v6 of the aarch64 load/store pair fusion pass, which
addresses the feedback from Richard's last review here:

https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640539.html

In particular this version implements the suggested changes which
greatly simplify the double list walk.

Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?

Thanks,
Alex

-- >8 --

This adds a new aarch64-specific RTL-SSA pass dedicated to forming load
and store pairs (LDPs and STPs).

As a motivating example for the kind of thing this improves, take the
following testcase:

extern double c[20];

double f(double x)
{
  double y = x*x;
  y += c[16];
  y += c[17];
  y += c[18];
  y += c[19];
  return y;
}

for which we currently generate (at -O2):

f:
        adrp    x0, c
        add     x0, x0, :lo12:c
        ldp     d31, d29, [x0, 128]
        ldr     d30, [x0, 144]
        fmadd   d0, d0, d0, d31
        ldr     d31, [x0, 152]
        fadd    d0, d0, d29
        fadd    d0, d0, d30
        fadd    d0, d0, d31
        ret

but with the pass, we generate:

f:
.LFB0:
        adrp    x0, c
        add     x0, x0, :lo12:c
        ldp     d31, d29, [x0, 128]
        fmadd   d0, d0, d0, d31
        ldp     d30, d31, [x0, 144]
        fadd    d0, d0, d29
        fadd    d0, d0, d30
        fadd    d0, d0, d31
        ret

The pass is local (only considers a BB at a time).  In theory, it should
be possible to extend it to run over EBBs, at least in the case of pure
(MEM_READONLY_P) loads, but this is left for future work.

The pass works by identifying two kinds of bases: tree decls obtained
via MEM_EXPR, and RTL register bases in the form of RTL-SSA def_infos.
If a candidate memory access has a MEM_EXPR base, then we track it via
this base, and otherwise if it is of a simple reg + offset form, we track
it via the RTL-SSA def_info for the register.

For each BB, for a given kind of base, we build up a hash table mapping
the base to an access_group.  The access_group data structure holds a
list of accesses at each offset relative to the same base.  It uses a
splay tree to support efficient insertion (while walking the bb), and
the nodes are chained using a linked list to support efficient
iteration (while doing the transformation).

For each base, we then iterate over the access_group to identify
adjacent accesses, and try to form load/store pairs for those insns that
access adjacent memory.
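
A rough sketch of this grouping-and-pairing scheme (illustrative only — the pass's real access_group uses RTL-SSA types, a splay tree, and a threaded list; the names below are invented):

```cpp
#include <cstdint>
#include <iterator>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the pass's access records: offset and size in
// bytes relative to some common base, plus an identifier for the insn.
struct Access { int64_t offset; int64_t size; int insn_uid; };

// One ordered group of accesses per base.  An ordered map gives the same
// offset-ordered adjacency walk as the splay tree + linked list.
using AccessGroup = std::map<int64_t, Access>;

// Walk a group in offset order and pair accesses that touch adjacent memory.
std::vector<std::pair<int, int>>
find_pairs(const AccessGroup& group)
{
  std::vector<std::pair<int, int>> pairs;
  auto it = group.begin();
  while (it != group.end())
    {
      auto next = std::next(it);
      if (next != group.end()
          && it->second.size == next->second.size               // same width
          && it->second.offset + it->second.size == next->second.offset)
        {
          pairs.emplace_back(it->second.insn_uid, next->second.insn_uid);
          it = std::next(next);   // each insn joins at most one pair
        }
      else
        ++it;
    }
  return pairs;
}
```

For the f() example above, the two 8-byte ldr insns at offsets 144 and 152 fall into the same group and get paired into one ldp.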

The pass is currently run twice, both before and after register
allocation.  The first copy of the pass is run late in the pre-RA RTL
pipeline, immediately after sched1, since it was found that sched1 was
increasing register pressure when the pass was run before.  The second
copy of the pass runs immediately before peephole2, so as to get any
opportunities that the existing ldp/stp peepholes can handle.

There are some cases that we punt on before RA, e.g.
accesses relative to eliminable regs (such as the soft frame pointer).
We do this since we can't know the elimination offset before RA, and we
want to avoid the RA reloading the offset (due to being out of ldp/stp
immediate range) as this can generate worse code.

The post-RA copy of the pass is there to pick up the crumbs that were
left behind / things we punted on in the pre-RA pass.  Among other
things, it's needed to handle accesses relative to the stack pointer.
It can also handle code that didn't exist at the time the pre-RA pass
was run (spill code, prologue/epilogue code).

This is an initial implementation, and there are (among other possible
improvements) the following notable caveats / missing features that are
left for future work, but could give further improvements:

 - Moving accesses between BBs within an EBB, see above.
 - Out-of-range opportunities: currently the pass refuses to form pairs
   if there isn't a suitable base register with an immediate in range
   for ldp/stp, but it can be profitable to emit anchor addresses in the
   case that there are four or more out-of-range nearby accesses that can
   be formed into pairs.  This is handled by the current ldp/stp
   peepholes, so it would be good to support this in the future.
 - Discovery: currently we prioritize MEM_EXPR bases over RTL bases, which can
   lead to us missing opportunities in the case that two accesses have distinct
   MEM_EXPR bases (i.e. different DECLs) but they are still adjacent in memory
   (e.g. adjacent variables on the stack).  I hope to address this for GCC 15,
   hopefully getting to the point where we can remove the ldp/stp peepholes and
   scheduling hooks.  Furthermore it would be nice to make the pass aware of
   section anchors (adding these as a third kind of base) allowing merging
   accesses to adjacent variables within the same section.

gcc/ChangeLog:

* config.gcc: Add aarch64-ldp-fusion.o to extra_objs for aarch64.
* config/aarch64/aarch64-passes.def: Add copies of pass_ldp_fusion
before and after RA.
* 

Re: [committed] libstdc++: Implement C++23 <print> header [PR107760]

2023-12-15 Thread Tim Song
On Fri, Dec 15, 2023 at 4:43 AM Jonathan Wakely  wrote:

> On Fri, 15 Dec 2023 at 01:17, Tim Song wrote:
> >
> > On Thu, Dec 14, 2023 at 6:05 PM Jonathan Wakely 
> wrote:
> >> +  inline void
> >> +  vprint_unicode(ostream& __os, string_view __fmt, format_args __args)
> >> +  {
> >> +ostream::sentry __cerb(__os);
> >> +if (__cerb)
> >> +  {
> >> +
> >> +   const streamsize __w = __os.width();
> >> +   const bool __left
> >> + = (__os.flags() & ios_base::adjustfield) == ios_base::left;
> >
> >
> > I'm pretty sure - when I wrote this wording anyway - that the intent was
> that it was just an unformatted write at the end. The wording in
> [ostream.formatted.print] doesn't use the "determines padding" words of
> power that would invoke [ostream.formatted.reqmts]/3.
>
> Ah, OK. I misunderstood "formatted output function" as implying
> padding, failing to notice that we need those words of power to be
> present. My thinking was that if the stream has padding set in its
> format flags, it could be surprising if they're ignored by a formatted
> output function. And padding in the format string applies to
> individual replacement fields, not the whole string, and it's hard to
> use the stream's fill character and alignment.
>

But we would get none of the Unicode-aware padding logic we
do in format, which puts it in a very weird place.

And for cases where Unicode is not a problem, it's easy to get padding
with just os << std::format(...);


> You can do this to use the ostream's width:
>
> std::print("{0:{1}}", std::format(...), os.width());
>
> But to reuse its fill char and adjustfield you need to do something
> awful like I did in the committed code:
>
> std::string_view align;
> if ((os.flags() & ios::adjustfield) == ios::right)
>   align = ">";
> auto fs = std::format("{{:{}{}{}}}", os.fill(), align, os.width());
> std::vprint_nonunicode(os, fs, std::make_format_args(std::format(...)));


> And now you have to hardcode a choice between vprint_unicode and
> vprint_nonunicode, instead of letting std::print decide it. Let's hope
> nobody ever needs to do any of that ;-)
>

At least the upcoming runtime_format alleviates that :)


>
> I'll remove the code for padding the padding, thanks for checking the
> patch.
>
>


Re: [pushed] testsuite: move more analyzer test cases to c-c++-common (3) [PR96395]

2023-12-15 Thread Rainer Orth
David Malcolm  writes:

> Move a further 268 tests from gcc.dg/analyzer to c-c++-common/analyzer.
>
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> Pushed to trunk as r14-6564-gae034b9106fbdd.

This patch introduced 840 additional FAILs on i386-pc-solaris2.11, no
doubt more instances of PR analyzer/111475.  Is this supposed to work
anywhere but Linux?  Right now the analyzer testsuite is a total
nightmare...

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[committed] libstdc++: Fix std::print test case for Windows

2023-12-15 Thread Jonathan Wakely
Tested x86_64-linux and x86_64-w64-mingw. Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* src/c++23/print.cc (__write_to_terminal) [_WIN32]: If handle
does not refer to the console then just write to it using normal
file I/O.
* testsuite/27_io/print/2.cc (as_printed_to_terminal): Print
error message on failure.
(test_utf16_transcoding): Adjust for as_printed_to_terminal
modifying its argument.
---
 libstdc++-v3/src/c++23/print.cc | 13 -
 libstdc++-v3/testsuite/27_io/print/2.cc |  7 ++-
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/src/c++23/print.cc b/libstdc++-v3/src/c++23/print.cc
index 2fe7a2e3565..d72ab856017 100644
--- a/libstdc++-v3/src/c++23/print.cc
+++ b/libstdc++-v3/src/c++23/print.cc
@@ -35,7 +35,8 @@
 
 #ifdef _WIN32
 # include <stdio.h>   // _fileno
-# include <io.h>      // _get_osfhandle
+# include <io.h>      // _get_osfhandle, _open_osfhandle, _write
+# include <fcntl.h>   // _O_APPEND
 # include <windows.h> // GetLastError, WriteConsoleW
 #elifdef _GLIBCXX_HAVE_UNISTD_H
 # include <stdio.h>   // fileno
@@ -324,6 +325,16 @@ namespace
 if (!to_valid_utf16(str, wstr))
   ec = std::make_error_code(errc::illegal_byte_sequence);
 
+// This allows us to test this function with a normal file,
+// see testsuite/27_io/print/2.cc
+if (!check_for_console(term))
+  {
+   int fd = _open_osfhandle((intptr_t)term, _O_APPEND);
+   if (_write(fd, wstr.data(), wstr.size() * 2) == -1)
+ ec = {errno, generic_category()};
+   return ec;
+  }
+
 unsigned long nchars = 0;
WriteConsoleW(term, wstr.data(), wstr.size(), &nchars, nullptr);
 if (nchars != wstr.size())
diff --git a/libstdc++-v3/testsuite/27_io/print/2.cc b/libstdc++-v3/testsuite/27_io/print/2.cc
index e101201f109..8aa7888e7bd 100644
--- a/libstdc++-v3/testsuite/27_io/print/2.cc
+++ b/libstdc++-v3/testsuite/27_io/print/2.cc
@@ -39,7 +39,11 @@ as_printed_to_terminal(std::string& s)
 #else
   const auto ec = std::__write_to_terminal(strm, s);
 #endif
-  VERIFY( !ec || ec == std::make_error_code(std::errc::illegal_byte_sequence) );
+  if (ec && ec != std::make_error_code(std::errc::illegal_byte_sequence))
+{
+  std::println("Failed to : {}", ec.message());
+  VERIFY(!ec);
+}
   std::fclose(strm);
   std::ifstream in(f.path);
   s.assign(std::istreambuf_iterator(in), {});
@@ -114,6 +118,7 @@ test_utf16_transcoding()
   VERIFY( as_printed_to_terminal(s) );
   VERIFY( utf16_from_bytes(s) == s2 );
 
+  s = (const char*)u8"£🇬🇧 €🇪🇺";
   s += " \xa3 10.99 \xee\xdd";
   VERIFY( ! as_printed_to_terminal(s) );
   std::u16string repl = u"\uFFFD";
-- 
2.43.0



[committed] libstdc++: Simplify std::vprint_unicode for non-Windows targets

2023-12-15 Thread Jonathan Wakely
Tested x86_64-linux and x86_64-w64-mingw. Pushed to trunk.

-- >8 --

Since we don't need to do anything special to print Unicode on
non-Windows targets, we might as well just use std::vprint_nonunicode to
implement std::vprint_unicode. Removing the duplicated code should
reduce code size in cases where those calls aren't inlined.

Also use an RAII type for the unused case where a non-Windows target
calls __open_terminal(streambuf*) and needs to fclose the result. This
makes the code futureproof in case we ever start using the
__write_terminal function for non-Windows targets.

libstdc++-v3/ChangeLog:

* include/std/ostream (vprint_unicode) [_WIN32]: Use RAII guard.
(vprint_unicode) [!_WIN32]: Just call vprint_nonunicode.
* include/std/print (vprint_unicode) [!_WIN32]: Likewise.
---
 libstdc++-v3/include/std/ostream | 31 +++
 libstdc++-v3/include/std/print   | 10 +++---
 2 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/libstdc++-v3/include/std/ostream b/libstdc++-v3/include/std/ostream
index 0cac293e4d6..a9a7aefad71 100644
--- a/libstdc++-v3/include/std/ostream
+++ b/libstdc++-v3/include/std/ostream
@@ -906,6 +906,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   inline void
   vprint_unicode(ostream& __os, string_view __fmt, format_args __args)
   {
+#ifndef _WIN32
+// For most targets we don't need to do anything special to write
+// Unicode to a terminal.
+std::vprint_nonunicode(__os, __fmt, __args);
+#else
 ostream::sentry __cerb(__os);
 if (__cerb)
   {
@@ -913,12 +918,27 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
std::vformat_to(__buf.out(), __os.getloc(), __fmt, __args);
auto __out = __buf.view();
 
-#ifdef _WIN32
void* __open_terminal(streambuf*);
error_code __write_to_terminal(void*, span<char>);
// If stream refers to a terminal, write a Unicode string to it.
if (auto __term = __open_terminal(__os.rdbuf()))
  {
+#ifndef _WIN32
+   // For POSIX, __open_terminal(streambuf*) uses fdopen to open a
+   // new file, so we would need to close it here. This code is not
+   // actually compiled because it's inside an #ifdef _WIN32 group,
+   // but just in case that changes in future ...
+   struct _Guard
+   {
+ _Guard(void* __p) : _M_f((FILE*)__p) { }
+ ~_Guard() { std::fclose(_M_f); }
+ _Guard(_Guard&&) = delete;
+ _Guard& operator=(_Guard&&) = delete;
+ FILE* _M_f;
+   };
+   _Guard __g(__term);
+#endif
+
ios_base::iostate __err = ios_base::goodbit;
__try
  {
@@ -927,11 +947,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
else if (auto __e = __write_to_terminal(__term, __out))
  if (__e != std::make_error_code(errc::illegal_byte_sequence))
__err = ios::badbit;
-#ifndef _WIN32
-   // __open_terminal(streambuf*) opens a new FILE with fdopen,
-   // so we need to close it here.
-   std::fclose((FILE*)__term);
-#endif
  }
__catch(const __cxxabiv1::__forced_unwind&)
  {
@@ -945,9 +960,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  __os.setstate(__err);
return;
  }
-#endif
 
-   // Otherwise just insert the string as normal.
+   // Otherwise just insert the string as vprint_nonunicode does.
__try
  {
std::__ostream_write(__os, __out.data(), __out.size());
@@ -960,6 +974,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
__catch(...)
  { __os._M_setstate(ios_base::badbit); }
   }
+#endif // _WIN32
   }
 
   template
diff --git a/libstdc++-v3/include/std/print b/libstdc++-v3/include/std/print
index e7099ab6fe3..bb029610d70 100644
--- a/libstdc++-v3/include/std/print
+++ b/libstdc++-v3/include/std/print
@@ -64,11 +64,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   inline void
   vprint_unicode(FILE* __stream, string_view __fmt, format_args __args)
   {
+#ifndef _WIN32
+// For most targets we don't need to do anything special to write
+// Unicode to a terminal.
+std::vprint_nonunicode(__stream, __fmt, __args);
+#else
 __format::_Str_sink __buf;
 std::vformat_to(__buf.out(), __fmt, __args);
 auto __out = __buf.view();
 
-#ifdef _WIN32
 void* __open_terminal(FILE*);
error_code __write_to_terminal(void*, span<char>);
 // If stream refers to a terminal, write a native Unicode string to it.
@@ -88,11 +92,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  __e = error_code(errno, generic_category());
_GLIBCXX_THROW_OR_ABORT(system_error(__e, "std::vprint_unicode"));
   }
-#endif
 
-// Otherwise just write the string to the file.
+// Otherwise just write the string to the file as vprint_nonunicode does.
 if (std::fwrite(__out.data(), 1, __out.size(), __stream) != __out.size())
   

[committed] libstdc++: Do not add padding for std::print to std::ostream

2023-12-15 Thread Jonathan Wakely
Tested x86_64-linux and x86_64-w64-mingw. Pushed to trunk.

-- >8 --

Tim Song pointed out that although std::print behaves as a formatted
output function, it does "determine padding" using the stream's flags.

libstdc++-v3/ChangeLog:

* include/std/ostream (vprint_nonunicode, vprint_unicode): Do
not insert padding.
* testsuite/27_io/basic_ostream/print/1.cc: Adjust expected
behaviour.
---
 libstdc++-v3/include/std/ostream  | 46 +--
 .../testsuite/27_io/basic_ostream/print/1.cc  | 10 ++--
 2 files changed, 8 insertions(+), 48 deletions(-)

diff --git a/libstdc++-v3/include/std/ostream b/libstdc++-v3/include/std/ostream
index 4f1cdc281a3..0cac293e4d6 100644
--- a/libstdc++-v3/include/std/ostream
+++ b/libstdc++-v3/include/std/ostream
@@ -891,21 +891,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
__try
  {
-   const streamsize __w = __os.width();
-   const streamsize __n = __out.size();
-   if (__w > __n)
- {
-   const bool __left
- = (__os.flags() & ios_base::adjustfield) == ios_base::left;
-   if (!__left)
- std::__ostream_fill(__os, __w - __n);
-   if (__os.good())
- std::__ostream_write(__os, __out.data(), __n);
-   if (__left && __os.good())
- std::__ostream_fill(__os, __w - __n);
- }
-   else
- std::__ostream_write(__os, __out.data(), __n);
+   std::__ostream_write(__os, __out.data(), __out.size());
  }
__catch(const __cxxabiv1::__forced_unwind&)
  {
@@ -923,11 +909,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 ostream::sentry __cerb(__os);
 if (__cerb)
   {
-
-   const streamsize __w = __os.width();
-   const bool __left
- = (__os.flags() & ios_base::adjustfield) == ios_base::left;
-
__format::_Str_sink __buf;
std::vformat_to(__buf.out(), __os.getloc(), __fmt, __args);
auto __out = __buf.view();
@@ -938,18 +919,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
// If stream refers to a terminal, write a Unicode string to it.
if (auto __term = __open_terminal(__os.rdbuf()))
  {
-   __format::_Str_sink __buf2;
-   if (__w != 0)
- {
-   char __fmt[] = "{0:..{1}}";
-   __fmt[3] == __os.fill();
-   __fmt[4] == __left ? '<' : '>';
-   string_view __str(__out);
-   std::vformat_to(__buf2.out(), // N.B. no need to use getloc()
-   __fmt, std::make_format_args(__str, __w));
-   __out = __buf2.view();
- }
-
ios_base::iostate __err = ios_base::goodbit;
__try
  {
@@ -981,18 +950,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
// Otherwise just insert the string as normal.
__try
  {
-   const streamsize __n = __out.size();
-   if (__w > __n)
- {
-   if (!__left)
- std::__ostream_fill(__os, __w - __n);
-   if (__os.good())
- std::__ostream_write(__os, __out.data(), __n);
-   if (__left && __os.good())
- std::__ostream_fill(__os, __w - __n);
- }
-   else
- std::__ostream_write(__os, __out.data(), __n);
+   std::__ostream_write(__os, __out.data(), __out.size());
  }
__catch(const __cxxabiv1::__forced_unwind&)
  {
diff --git a/libstdc++-v3/testsuite/27_io/basic_ostream/print/1.cc 
b/libstdc++-v3/testsuite/27_io/basic_ostream/print/1.cc
index 28dc8af33e6..b3abc570d1e 100644
--- a/libstdc++-v3/testsuite/27_io/basic_ostream/print/1.cc
+++ b/libstdc++-v3/testsuite/27_io/basic_ostream/print/1.cc
@@ -42,14 +42,16 @@ test_print_raw()
 }
 
 void
-test_print_formatted()
+test_print_no_padding()
 {
+  // [ostream.formatted.print] does not say this function "determines padding",
+  // see https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640680.html
   char buf[64];
   std::spanstream os(buf);
-  os << std::setw(20) << std::setfill('*') << std::right;
+  os << std::setw(60) << std::setfill('?') << std::right; // should be ignored
   std::print(os, "{} Luftballons", 99);
   std::string_view txt(os.span());
-  VERIFY( txt == "******99 Luftballons" );
+  VERIFY( txt == "99 Luftballons" );
 }
 
 void
@@ -106,7 +108,7 @@ int main()
   test_print_ostream();
   test_println_ostream();
   test_print_raw();
-  test_print_formatted();
+  test_print_no_padding();
   test_vprint_nonunicode();
   test_locale();
 }
-- 
2.43.0



[PATCH] LoongArch: Remove constraint z from movsi_internal

2023-12-15 Thread Xi Ruoyao
We don't allow SImode in FCC, so constraint z is never really used
here.

gcc/ChangeLog:

* config/loongarch/loongarch.md (movsi_internal): Remove
constraint z.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index a5d0dcd65fe..404a663c1a6 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2108,8 +2108,8 @@ (define_expand "movsi"
 })
 
 (define_insn_and_split "*movsi_internal"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,r,r,w,*f,f,*r,*m,*r,*z")
-   (match_operand:SI 1 "move_operand" "r,Yd,w,rJ,*r*J,m,*f,*f,*z,*r"))]
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,r,r,w,*f,f,*r,*m")
+   (match_operand:SI 1 "move_operand" "r,Yd,w,rJ,*r*J,m,*f,*f"))]
   "(register_operand (operands[0], SImode)
 || reg_or_0_operand (operands[1], SImode))"
   { return loongarch_output_move (operands[0], operands[1]); }
@@ -2122,7 +2122,7 @@ (define_insn_and_split "*movsi_internal"
   DONE;
 }
   "
-  [(set_attr "move_type" 
"move,const,load,store,mgtf,fpload,mftg,fpstore,mftg,mgtf")
+  [(set_attr "move_type" "move,const,load,store,mgtf,fpload,mftg,fpstore")
(set_attr "mode" "SI")])
 
 ;; 16-bit Integer moves
-- 
2.43.0



Re: [PATCH v7 4/5] OpenMP/OpenACC: Unordered/non-constant component offset runtime diagnostic

2023-12-15 Thread Thomas Schwinge
Hi!

On 2023-12-14T15:26:38+0100, Tobias Burnus  wrote:
> On 19.08.23 00:47, Julian Brown wrote:
>> This patch adds support for non-constant component offsets in "map"
>> clauses for OpenMP (and the equivalants for OpenACC) [...]

Should eventually also add some OpenACC test cases?


> LGTM with:
>
> - inclusion of your follow-up fix for shared-memory systems (see email
> of August 21)

This was applied here:

>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c

>> +/* { dg-output "(\n|\r|\r\n)" } */
>> +/* { dg-output "libgomp: Mapped array elements must be the same 
>> .*(\n|\r|\r\n)+" } */
>> +/* { dg-shouldfail "" { offload_device_nonshared_as } } */

..., and here:

>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c

>> +/* { dg-output "(\n|\r|\r\n)" } */
>> +/* { dg-output "libgomp: Mapped array elements must be the same 
>> .*(\n|\r|\r\n)+" } */
>> +/* { dg-shouldfail "" { offload_device_nonshared_as } } */

..., but not here:

>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90

>> +! { dg-output "(\n|\r|\r\n)" }
>> +! { dg-output "libgomp: Mapped array elements must be the same 
>> .*(\n|\r|\r\n)+" }
>> +! { dg-shouldfail "" { offload_device_nonshared_as } }

Pushed to master branch commit bc7546e32c5a942e240ef97776352d21105ef291
"In 'libgomp.fortran/map-subarray-5.f90', restrict 'dg-output's to 'target 
offload_device_nonshared_as'",
see attached.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From bc7546e32c5a942e240ef97776352d21105ef291 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 15 Dec 2023 13:05:24 +0100
Subject: [PATCH] In 'libgomp.fortran/map-subarray-5.f90', restrict
 'dg-output's to 'target offload_device_nonshared_as'

..., as in 'libgomp.c-c++-common/map-arrayofstruct-{2,3}.c'.

Minor fix-up for commit f5745dc1426bdb1a53ebaf7af758b2250ccbff02
"OpenMP/OpenACC: Unordered/non-constant component offset runtime diagnostic".

	libgomp/
	* testsuite/libgomp.fortran/map-subarray-5.f90: Restrict
	'dg-output's to 'target offload_device_nonshared_as'.
---
 libgomp/testsuite/libgomp.fortran/map-subarray-5.f90 | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90 b/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90
index e7cdf11e610..59ad01ab76b 100644
--- a/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90
+++ b/libgomp/testsuite/libgomp.fortran/map-subarray-5.f90
@@ -49,6 +49,6 @@ end do
 
 end
 
-! { dg-output "(\n|\r|\r\n)" }
-! { dg-output "libgomp: Mapped array elements must be the same .*(\n|\r|\r\n)+" }
+! { dg-output "(\n|\r|\r\n)" { target offload_device_nonshared_as } }
+! { dg-output "libgomp: Mapped array elements must be the same .*(\n|\r|\r\n)+" { target offload_device_nonshared_as } }
 ! { dg-shouldfail "" { offload_device_nonshared_as } }
-- 
2.34.1



Re: [PATCH V2] RISC-V: Fix vmerge optimization bug in vec_perm vectorization

2023-12-15 Thread Robin Dapp
LGTM.

Regards
 Robin



[PATCH V2] RISC-V: Fix vmerge optimization bug in vec_perm vectorization

2023-12-15 Thread Juzhe-Zhong
This patch fixes the following FAILs in "full coverage" testing:

Running target 
riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8/--param=riscv-autovec-preference=fixed-vlmax
FAIL: gcc.dg/vect/vect-strided-mult-char-ls.c -flto -ffat-lto-objects execution 
test
FAIL: gcc.dg/vect/vect-strided-mult-char-ls.c execution test
FAIL: gcc.dg/vect/vect-strided-u8-i2.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-strided-u8-i2.c execution test

The root cause is vmerge optimization on this following IR:

_45 = VEC_PERM_EXPR ;

It's obvious we have many indices > 255 in the shuffle indices. Here we use the vmerge 
optimization, which is available, but the incorrect codegen causes run failures.

The bug codegen:
vsetvli zero,a4,e8,m8,ta,ma
vmsltu.vi   v0,v0,0 -> it should be 256 instead of 0, but since 
this is an EEW8 vector, 256 is not a value that an 8-bit register can hold.
vmerge.vvm  v8,v8,v16,v0

After this patch:
vmv.v.x v0,a6
vmerge.vvm  v8,v8,v16,v0

gcc/ChangeLog:

* config/riscv/riscv-v.cc (shuffle_merge_patterns): Fix bug.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/bug-1.c: New test.

---
 gcc/config/riscv/riscv-v.cc   | 59 ---
 .../gcc.target/riscv/rvv/autovec/bug-1.c  | 39 
 2 files changed, 90 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/bug-1.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 680e2a0e03a..eade8db4cf1 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -2987,20 +2987,63 @@ shuffle_merge_patterns (struct expand_vec_perm_d *d)
&& !d->perm.series_p (i, n_patterns, vec_len + i, n_patterns))
   return false;
 
+  /* We need to use precomputed mask for such situation and such mask
+ can only be computed in compile-time known size modes.  */
+  bool indices_fit_selector_p
+= GET_MODE_BITSIZE (GET_MODE_INNER (vmode)) > 8 || known_lt (vec_len, 256);
+  if (!indices_fit_selector_p && !vec_len.is_constant ())
+return false;
+
   if (d->testing_p)
 return true;
 
   machine_mode mask_mode = get_mask_mode (vmode);
   rtx mask = gen_reg_rtx (mask_mode);
 
-  rtx sel = vec_perm_indices_to_rtx (sel_mode, d->perm);
-
-  /* MASK = SELECTOR < NUNTIS ? 1 : 0.  */
-  rtx x = gen_int_mode (vec_len, GET_MODE_INNER (sel_mode));
-  insn_code icode = code_for_pred_cmp_scalar (sel_mode);
-  rtx cmp = gen_rtx_fmt_ee (LTU, mask_mode, sel, x);
-  rtx ops[] = {mask, cmp, sel, x};
-  emit_vlmax_insn (icode, COMPARE_OP, ops);
+  if (indices_fit_selector_p)
+{
+  /* MASK = SELECTOR < NUNTIS ? 1 : 0.  */
+  rtx sel = vec_perm_indices_to_rtx (sel_mode, d->perm);
+  rtx x = gen_int_mode (vec_len, GET_MODE_INNER (sel_mode));
+  insn_code icode = code_for_pred_cmp_scalar (sel_mode);
+  rtx cmp = gen_rtx_fmt_ee (LTU, mask_mode, sel, x);
+  rtx ops[] = {mask, cmp, sel, x};
+  emit_vlmax_insn (icode, COMPARE_OP, ops);
+}
+  else
+{
+  /* For EEW8 and NUNITS may be larger than 255, we can't use vmsltu
+directly to generate the selector mask, instead, we can only use
+precomputed mask.
+
+E.g. selector = <0, 257, 2, 259> for EEW8 vector with NUNITS = 256, we
+don't have a QImode scalar register to hold larger than 255.
+We also cannot hold that in a vector QImode register if LMUL = 8, and,
+since there is no larger HI mode vector we cannot create a larger
+selector.
+
+As the mask is a simple {0, 1, ...} pattern and the length is known we
+can store it in a scalar register and broadcast it to a mask register.
+   */
+  gcc_assert (vec_len.is_constant ());
+  int size = CEIL (GET_MODE_NUNITS (mask_mode).to_constant (), 8);
+  machine_mode mode = get_vector_mode (QImode, size).require ();
+  rtx tmp = gen_reg_rtx (mode);
+  rvv_builder v (mode, 1, size);
+  for (int i = 0; i < vec_len.to_constant () / 8; i++)
+   {
+ uint8_t value = 0;
+ for (int j = 0; j < 8; j++)
+   {
+ int index = i * 8 + j;
+ if (known_lt (d->perm[index], 256))
+   value |= 1 << j;
+   }
+ v.quick_push (gen_int_mode (value, QImode));
+   }
+  emit_move_insn (tmp, v.build ());
+  emit_move_insn (mask, gen_lowpart (mask_mode, tmp));
+}
 
   /* TARGET = MASK ? OP0 : OP1.  */
   /* swap op0 and op1 since the order is opposite to pred_merge.  */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/bug-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/bug-1.c
new file mode 100644
index 000..88059971503
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/bug-1.c
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d 
--param=riscv-autovec-lmul=m8 

Re: [PATCH] RISC-V: Fix vmerge optimization bug in vec_perm vectorization

2023-12-15 Thread Robin Dapp
On 12/15/23 13:52, juzhe.zh...@rivai.ai wrote:
> Do you mean :
> 
>   /* We need to use precomputed mask for such situation and such mask
>      can only be computed in compile-time known size modes.  */
>   bool indices_fit_selector_p
>     = GET_MODE_BITSIZE (GET_MODE_INNER (vmode)) > 8 || known_lt (vec_len, 
> 256);
>   if (!indices_fit_selector_p && !vec_len.is_constant ())
>     return false;

Yes and then reuse this in the if.

Regards
 Robin


Re: Re: [PATCH] RISC-V: Fix vmerge optimization bug in vec_perm vectorization

2023-12-15 Thread juzhe.zh...@rivai.ai
Do you mean :

  /* We need to use precomputed mask for such situation and such mask
 can only be computed in compile-time known size modes.  */
  bool indices_fit_selector_p
= GET_MODE_BITSIZE (GET_MODE_INNER (vmode)) > 8 || known_lt (vec_len, 256);
  if (!indices_fit_selector_p && !vec_len.is_constant ())
return false;



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-12-15 20:44
To: juzhe.zh...@rivai.ai; gcc-patches
CC: rdapp.gcc; kito.cheng; Kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Fix vmerge optimization bug in vec_perm 
vectorization
> Oh. I think it should be renamed into not_fit.
> 
> Is this following make sense to you ?
> 
>   /* We need to use precomputed mask for such situation and such mask
>  can only be computed in compile-time known size modes.  */
>   bool indices_not_fit_selector_p
> = maybe_ge (vec_len, 2 << GET_MODE_BITSIZE (GET_MODE_INNER (vmode)));
>   if (GET_MODE_BITSIZE (GET_MODE_INNER (vmode)) == 8
>   && indices_not_fit_selector_p
>   && !vec_len.is_constant ())
> return false;
 
Mhm, right, I don't think this makes it nicer overall.  Maybe just like
the following then:
 
bool ..._p = GET_MODE_BITSIZE (GET_MODE_INNER (vmode)) > 8 || known_lt 
(vec_len, 256);
 
if (!..._p && !vec_len.is_constant ())
 
then later
 
if (..._p)
...
else
...
 
Regards
Robin
 
 


Re: [PATCH] RISC-V: Fix vmerge optimization bug in vec_perm vectorization

2023-12-15 Thread Robin Dapp
> Oh. I think it should be renamed into not_fit.
> 
> Is this following make sense to you ?
> 
>   /* We need to use precomputed mask for such situation and such mask
>      can only be computed in compile-time known size modes.  */
>   bool indices_not_fit_selector_p
>     = maybe_ge (vec_len, 2 << GET_MODE_BITSIZE (GET_MODE_INNER (vmode)));
>   if (GET_MODE_BITSIZE (GET_MODE_INNER (vmode)) == 8
>       && indices_not_fit_selector_p
>       && !vec_len.is_constant ())
>     return false;

Mhm, right, I don't think this makes it nicer overall.  Maybe just like
the following then:

bool ..._p = GET_MODE_BITSIZE (GET_MODE_INNER (vmode)) > 8 || known_lt 
(vec_len, 256);

if (!..._p && !vec_len.is_constant ())

then later

if (..._p)
...
else
...

Regards
 Robin



Re: Re: [PATCH] RISC-V: Fix vmerge optimization bug in vec_perm vectorization

2023-12-15 Thread juzhe.zh...@rivai.ai
Oh. I think it should be renamed into not_fit.

Is this following make sense to you ?

  /* We need to use precomputed mask for such situation and such mask
 can only be computed in compile-time known size modes.  */
  bool indices_not_fit_selector_p
= maybe_ge (vec_len, 2 << GET_MODE_BITSIZE (GET_MODE_INNER (vmode)));
  if (GET_MODE_BITSIZE (GET_MODE_INNER (vmode)) == 8
  && indices_not_fit_selector_p
  && !vec_len.is_constant ())
return false;



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-12-15 20:25
To: juzhe.zh...@rivai.ai; gcc-patches
CC: rdapp.gcc; kito.cheng; Kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Fix vmerge optimization bug in vec_perm 
vectorization
On 12/15/23 13:16, juzhe.zh...@rivai.ai wrote:
> 
>>> bool indices_fit_selector = maybe_ge (vec_len, 2 << GET_MODE_BITSIZE 
>>> (GET_MODE_INNER (vmode)));
> No, I think it will make us miss some optimization.
> 
> For example, for poly value [16,16]  maybe_ge ([16,16], 65536) which makes us 
> missed merge optimization but
> we definitely can do merge optimization.
 
I didn't mean to skip the && !vec_len.is_constant (), that should
stay.  Just the first part of condition that can be re-used in the
if as well (inverted).
 
Regards
Robin
 


Re: Re: [PATCH] RISC-V: Fix vmerge optimization bug in vec_perm vectorization

2023-12-15 Thread juzhe.zh...@rivai.ai
Do you mean like this ?

  /* We need to use precomputed mask for such situation and such mask
 can only be computed in compile-time known size modes.  */
  bool indices_fit_selector_p
= maybe_ge (vec_len, 2 << GET_MODE_BITSIZE (GET_MODE_INNER (vmode)));
  if (GET_MODE_BITSIZE (GET_MODE_INNER (vmode)) == 8
  && indices_fit_selector_p
  && !vec_len.is_constant ())
return false;



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-12-15 20:25
To: juzhe.zh...@rivai.ai; gcc-patches
CC: rdapp.gcc; kito.cheng; Kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Fix vmerge optimization bug in vec_perm 
vectorization
On 12/15/23 13:16, juzhe.zh...@rivai.ai wrote:
> 
>>> bool indices_fit_selector = maybe_ge (vec_len, 2 << GET_MODE_BITSIZE 
>>> (GET_MODE_INNER (vmode)));
> No, I think it will make us miss some optimization.
> 
> For example, for poly value [16,16]  maybe_ge ([16,16], 65536) which makes us 
> missed merge optimization but
> we definitely can do merge optimization.
 
I didn't mean to skip the && !vec_len.is_constant (), that should
stay.  Just the first part of condition that can be re-used in the
if as well (inverted).
 
Regards
Robin
 


Re: [PATCH] RISC-V: Fix vmerge optimization bug in vec_perm vectorization

2023-12-15 Thread Robin Dapp
On 12/15/23 13:16, juzhe.zh...@rivai.ai wrote:
> 
>>> bool indices_fit_selector = maybe_ge (vec_len, 2 << GET_MODE_BITSIZE 
>>> (GET_MODE_INNER (vmode)));
> No, I think it will make us miss some optimization.
> 
> For example, for poly value [16,16]  maybe_ge ([16,16], 65536) which makes us 
> missed merge optimization but
> we definitely can do merge optimization.

I didn't mean to skip the && !vec_len.is_constant (), that should
stay.  Just the first part of condition that can be re-used in the
if as well (inverted).

Regards
 Robin


Re: Re: [PATCH] RISC-V: Fix vmerge optimization bug in vec_perm vectorization

2023-12-15 Thread juzhe.zh...@rivai.ai

>> bool indices_fit_selector = maybe_ge (vec_len, 2 << GET_MODE_BITSIZE 
>> (GET_MODE_INNER (vmode)));
No, I think it will make us miss some optimization.

For example, for poly value [16,16]  maybe_ge ([16,16], 65536) which makes us 
missed merge optimization but
we definitely can do merge optimization.

>> Also add a comment that the non-constant case is handled by
>> shuffle_decompress_patterns in case we have a HImode vector twice the
>> size that can hold our indices.

Ok.


>> Comment should go inside the if branch.
Ok.


>>Also add this as comment:

>>As the mask is a simple {0, 1, ...} pattern and the length is known we can
>>store it in a scalar register and broadcast it to a mask register.

Ok.

>>I would have hoped that a simple

>>  v.quick_push (gen_int_mode (0b01010101, QImode));

>>suffices but that will probably clash if there are more than
>>two npatterns.

No, we definitely cannot use this; for more details, see the current 
vmerge-*.c tests.
We have various patterns:

E.g.
0, nunits + 1, nunits + 2, ... gives mask 011,

while nunits, 1, 2 gives mask 100.


Many different kinds of patterns can be used vmerge optimization.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-12-15 19:14
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Fix vmerge optimization bug in vec_perm 
vectorization
Hi Juzhe,
 
in general looks OK.
 
> +  /* We need to use precomputed mask for such situation and such mask
> + can only be computed in compile-time known size modes.  */
> +  if (GET_MODE_BITSIZE (GET_MODE_INNER (vmode)) == 8 && maybe_ge (vec_len, 
> 256)
> +  && !vec_len.is_constant ())
> +return false;
> +
 
We could make this a separate variable like:
bool indices_fit_selector = maybe_ge (vec_len, 2 << GET_MODE_BITSIZE 
(GET_MODE_INNER (vmode)));
 
Also add a comment that the non-constant case is handled by
shuffle_decompress_patterns in case we have a HImode vector twice the
size that can hold our indices.
 
>/* MASK = SELECTOR < NUNTIS ? 1 : 0.  */
 
Comment should go inside the if branch.
 
> +  if (GET_MODE_BITSIZE (GET_MODE_INNER (vmode)) > 8 || known_lt (vec_len, 
> 256))
 
> +}
> +  else
> +{
> +  /* For EEW8 and NUNITS may be larger than 255, we can't use vmsltu
> + directly to generate the selector mask, instead, we can only use
> + precomputed mask.
 
I find that comment a bit misleading as it's not the vmsltu itself but
rather that the indices cannot be held.
 
> +
> + E.g. selector = <0, 257, 2, 259> for EEW8 vector with NUNITS = 256, we
> + don't have a QImode scalar register to hold larger than 255.  */
 
We also cannot hold that in a vector QImode register, and, since there
is no larger HI mode vector we cannot create a larger selector.
 
Also add this as comment:
 
As the mask is a simple {0, 1, ...} pattern and the length is known we can
store it in a scalar register and broadcast it to a mask register.
 
> +  gcc_assert (vec_len.is_constant ());
> +  int size = CEIL (GET_MODE_NUNITS (mask_mode).to_constant (), 8);
> +  machine_mode mode = get_vector_mode (QImode, size).require ();
> +  rtx tmp = gen_reg_rtx (mode);
> +  rvv_builder v (mode, 1, size);
> +  for (int i = 0; i < vec_len.to_constant () / 8; i++)
> + {
> +   uint8_t value = 0;
> +   for (int j = 0; j < 8; j++)
> + {
> +   int index = i * 8 + j;
> +   if (known_lt (d->perm[index], 256))
> + value |= 1 << j;
> + }
 
I would have hoped that a simple
 
  v.quick_push (gen_int_mode (0b01010101, QImode));
 
suffices but that will probably clash if there are more than
two npatterns.
 
Regards
Robin
 
 


Re: [PATCH v7] libgfortran: Replace mutex with rwlock

2023-12-15 Thread Lipeng Zhu
On 2023/12/14 23:50, Richard Earnshaw (lists) wrote:

On 09/12/2023 15:39, Lipeng Zhu wrote:

This patch try to introduce the rwlock and split the read/write to
unit_root tree and unit_cache with rwlock instead of the mutex to
increase CPU efficiency. In the get_gfc_unit function, the percentage
to step into the insert_unit function is around 30%, in most instances,
we can get the unit in the phase of reading the unit_cache or unit_root
tree. So split the read/write phase by rwlock would be an approach to
make it more parallel.

BTW, the IPC metrics can gain around 9x in our test
server with 220 cores. The benchmark we used is
https://github.com/rwesson/NEAT

libgcc/ChangeLog:

* gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro.
(__gthrw): New function.
(__gthread_rwlock_rdlock): New function.
(__gthread_rwlock_tryrdlock): New function.
(__gthread_rwlock_wrlock): New function.
(__gthread_rwlock_trywrlock): New function.
(__gthread_rwlock_unlock): New function.

libgfortran/ChangeLog:

* io/async.c (DEBUG_LINE): New macro.
* io/async.h (RWLOCK_DEBUG_ADD): New macro.
(CHECK_RDLOCK): New macro.
(CHECK_WRLOCK): New macro.
(TAIL_RWLOCK_DEBUG_QUEUE): New macro.
(IN_RWLOCK_DEBUG_QUEUE): New macro.
(RDLOCK): New macro.
(WRLOCK): New macro.
(RWUNLOCK): New macro.
(RD_TO_WRLOCK): New macro.
(INTERN_RDLOCK): New macro.
(INTERN_WRLOCK): New macro.
(INTERN_RWUNLOCK): New macro.
* io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
a comment.
(unit_lock): Remove including associated internal_proto.
(unit_rwlock): New declarations including associated internal_proto.
(dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock
instead of __gthread_mutex_lock and __gthread_mutex_unlock on
unit_lock.
* io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on
unit_rwlock instead of LOCK and UNLOCK on unit_lock.
(st_write_done_worker): Likewise.
* io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules'
comment. Use unit_rwlock variable instead of unit_lock variable.
(get_gfc_unit_from_unit_root): New function.
(get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock
instead of LOCK and UNLOCK on unit_lock.
(close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of
LOCK and UNLOCK on unit_lock.
(close_units): Likewise.
(newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on
unit_lock.
* io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock
instead of LOCK and UNLOCK on unit_lock.
(flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead
of LOCK and UNLOCK on unit_lock.



It looks like this has broken builds on arm-none-eabi when using newlib:

In file included from /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran
/runtime/error.c:27:
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h: In function
‘dec_waiting_unlocked’:
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1023:3: error: implicit declaration of function ‘WRLOCK’ [-Wimplicit-function-declaration]
  1023 |   WRLOCK (&unit_rwlock);
       |   ^~~~~~
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1025:3: error: implicit declaration of function ‘RWUNLOCK’ [-Wimplicit-function-declaration]
  1025 |   RWUNLOCK (&unit_rwlock);
       |   ^~~~~~~~


R.


Hi Richard,

The root cause is that the macros WRLOCK and RWUNLOCK are not defined in 
io.h. The reason the x86 platform did not fail is that 
HAVE_ATOMIC_FETCH_ADD is defined there, so the above macros were never 
used. The code logic is shown below:

#ifdef HAVE_ATOMIC_FETCH_ADD
  (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED);
#else
  WRLOCK (&unit_rwlock);
  u->waiting--;
  RWUNLOCK (&unit_rwlock);
#endif

I just drafted a patch to try to fix this bug. Since I don't have an arm 
platform, would you help validate whether it fixes the issue on arm?


diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h
index 15daa0995b1..c7f0f7d7d9e 100644
--- a/libgfortran/io/io.h
+++ b/libgfortran/io/io.h
@@ -1020,9 +1020,15 @@ dec_waiting_unlocked (gfc_unit *u)
 #ifdef HAVE_ATOMIC_FETCH_ADD
   (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED);
 #else
-  WRLOCK (&unit_rwlock);
+#ifdef __GTHREAD_RWLOCK_INIT
+  __gthread_rwlock_wrlock (&unit_rwlock);
+  u->waiting--;
+  __gthread_rwlock_unlock (&unit_rwlock);
+#else
+  __gthread_mutex_lock (&unit_rwlock);
   u->waiting--;
-  RWUNLOCK (&unit_rwlock);
+  __gthread_mutex_unlock (&unit_rwlock);
+#endif
 #endif
 }


Lipeng Zhu


[PATCH] tree-optimization/113026 - avoid vector epilog in more cases

2023-12-15 Thread Richard Biener
The following avoids creating a niter peeling epilog more consistently,
matching what peeling later uses for the skip_vector condition, in
particular when versioning is required which then also ensures the
vector loop is entered unless the epilog is vectorized.  This should
ideally match LOOP_VINFO_VERSIONING_THRESHOLD which is only computed
later, some refactoring could make that better matching.

The patch also makes sure to adjust the upper bound of the epilogues
when we do not have a skip edge around the vector loop.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Tamar, I assume this will clash with early break vectorization
a bit so I'll defer until after that's in.

Thanks,
Richard.

PR tree-optimization/113026
* tree-vect-loop.cc (vect_need_peeling_or_partial_vectors_p):
Avoid an epilog in more cases.
* tree-vect-loop-manip.cc (vect_do_peeling): Adjust the
epilogues niter upper bounds and estimates.

* gcc.dg/torture/pr113026-1.c: New testcase.
* gcc.dg/torture/pr113026-2.c: Likewise.
---
 gcc/testsuite/gcc.dg/torture/pr113026-1.c | 11 +++
 gcc/testsuite/gcc.dg/torture/pr113026-2.c | 18 ++
 gcc/tree-vect-loop-manip.cc   | 13 +
 gcc/tree-vect-loop.cc |  6 +-
 4 files changed, 47 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr113026-1.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr113026-2.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr113026-1.c 
b/gcc/testsuite/gcc.dg/torture/pr113026-1.c
new file mode 100644
index 000..56dfef3b36c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr113026-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */ 
+/* { dg-additional-options "-Wall" } */
+
+char dst[16];
+
+void
+foo (char *src, long n)
+{
+  for (long i = 0; i < n; i++)
+dst[i] = src[i]; /* { dg-bogus "" } */
+}
diff --git a/gcc/testsuite/gcc.dg/torture/pr113026-2.c 
b/gcc/testsuite/gcc.dg/torture/pr113026-2.c
new file mode 100644
index 000..b9d5857a403
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr113026-2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */ 
+/* { dg-additional-options "-Wall" } */
+
+char dst1[17];
+void
+foo1 (char *src, long n)
+{
+  for (long i = 0; i < n; i++)
+dst1[i] = src[i]; /* { dg-bogus "" } */
+}
+
+char dst2[18];
+void
+foo2 (char *src, long n)
+{
+  for (long i = 0; i < n; i++)
+dst2[i] = src[i]; /* { dg-bogus "" } */
+}
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index bcd90a331f5..07a30b7ee98 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -3193,6 +3193,19 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, 
tree nitersm1,
bb_before_epilog->count = single_pred_edge 
(bb_before_epilog)->count ();
  bb_before_epilog = loop_preheader_edge (epilog)->src;
}
+  else
+   {
+ /* When we do not have a loop-around edge to the epilog we know
+the vector loop covered at least VF scalar iterations.  Update
+any known upper bound with this knowledge.  */
+ if (loop->any_upper_bound)
+   epilog->nb_iterations_upper_bound -= constant_lower_bound (vf);
+ if (loop->any_likely_upper_bound)
+   epilog->nb_iterations_likely_upper_bound -= constant_lower_bound 
(vf);
+ if (loop->any_estimate)
+   epilog->nb_iterations_estimate -= constant_lower_bound (vf);
+   }
+
   /* If loop is peeled for non-zero constant times, now niters refers to
 orig_niters - prolog_peeling, it won't overflow even the orig_niters
 overflows.  */
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 7a3db5f098b..a4dd2caa400 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1260,7 +1260,11 @@ vect_need_peeling_or_partial_vectors_p (loop_vec_info 
loop_vinfo)
 the epilogue is unnecessary.  */
  && (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
  || ((unsigned HOST_WIDE_INT) max_niter
- > (th / const_vf) * const_vf
+ /* We'd like to use LOOP_VINFO_VERSIONING_THRESHOLD
+but that's only computed later based on our result.
+The following is the most conservative approximation.  */
+ > (std::max ((unsigned HOST_WIDE_INT) th,
+  const_vf) / const_vf) * const_vf
 return true;
 
   return false;
-- 
2.35.3


Re: [PATCH] RISC-V: Fix vmerge optimization bug in vec_perm vectorization

2023-12-15 Thread Robin Dapp
Hi Juzhe,

in general looks OK.

> +  /* We need to use precomputed mask for such situation and such mask
> + can only be computed in compile-time known size modes.  */
> +  if (GET_MODE_BITSIZE (GET_MODE_INNER (vmode)) == 8 && maybe_ge (vec_len, 
> 256)
> +  && !vec_len.is_constant ())
> +return false;
> +

We could make this a separate variable like:
bool indices_fit_selector = maybe_ge (vec_len, 2 << GET_MODE_BITSIZE 
(GET_MODE_INNER (vmode)));

Also add a comment that the non-constant case is handled by
shuffle_decompress_patterns in case we have a HImode vector twice the
size that can hold our indices.

>/* MASK = SELECTOR < NUNTIS ? 1 : 0.  */

Comment should go inside the if branch.

> +  if (GET_MODE_BITSIZE (GET_MODE_INNER (vmode)) > 8 || known_lt (vec_len, 
> 256))

> +}
> +  else
> +{
> +  /* For EEW8 and NUNITS may be larger than 255, we can't use vmsltu
> +  directly to generate the selector mask, instead, we can only use
> +  precomputed mask.

I find that comment a bit misleading as it's not the vmsltu itself but
rather that the indices cannot be held.

> +
> +  E.g. selector = <0, 257, 2, 259> for EEW8 vector with NUNITS = 256, we
> +  don't have a QImode scalar register to hold larger than 255.  */

We also cannot hold that in a vector QImode register, and, since there
is no larger HI mode vector we cannot create a larger selector.

Also add this as comment:

As the mask is a simple {0, 1, ...} pattern and the length is known we can
store it in a scalar register and broadcast it to a mask register.

> +  gcc_assert (vec_len.is_constant ());
> +  int size = CEIL (GET_MODE_NUNITS (mask_mode).to_constant (), 8);
> +  machine_mode mode = get_vector_mode (QImode, size).require ();
> +  rtx tmp = gen_reg_rtx (mode);
> +  rvv_builder v (mode, 1, size);
> +  for (int i = 0; i < vec_len.to_constant () / 8; i++)
> + {
> +   uint8_t value = 0;
> +   for (int j = 0; j < 8; j++)
> + {
> +   int index = i * 8 + j;
> +   if (known_lt (d->perm[index], 256))
> + value |= 1 << j;
> + }

I would have hoped that a simple

  v.quick_push (gen_int_mode (0b01010101, QImode));

suffices but that will probably clash if there are more than
two npatterns.

Regards
 Robin



Re: [committed] libstdc++: Implement C++23 <print> header [PR107760]

2023-12-15 Thread Jonathan Wakely
On Fri, 15 Dec 2023 at 01:17, Tim Song wrote:
>
> On Thu, Dec 14, 2023 at 6:05 PM Jonathan Wakely  wrote:
>> +  inline void
>> +  vprint_unicode(ostream& __os, string_view __fmt, format_args __args)
>> +  {
>> +ostream::sentry __cerb(__os);
>> +if (__cerb)
>> +  {
>> +
>> +   const streamsize __w = __os.width();
>> +   const bool __left
>> + = (__os.flags() & ios_base::adjustfield) == ios_base::left;
>
>
> I'm pretty sure - when I wrote this wording anyway - that the intent was
> that it was just an unformatted write at the end. The wording in
> [ostream.formatted.print] doesn't use the "determines padding" words of
> power that would invoke [ostream.formatted.reqmts]/3.

Ah, OK. I misunderstood "formatted output function" as implying
padding, failing to notice that we need those words of power to be
present. My thinking was that if the stream has padding set in its
format flags, it could be surprising if they're ignored by a formatted
output function. And padding in the format string applies to
individual replacement fields, not the whole string, and it's hard to
use the stream's fill character and alignment.

You can do this to use the ostream's width:

std::print("{0:{1}}", std::format(...), os.width());

But to reuse its fill char and adjustfield you need to do something
awful like I did in the committed code:

std::string_view align;
if ((os.flags() & ios::adjustfield) == ios::right)
  align = ">";
auto fs = std::format("{{:{}{}{}}}", os.fill(), align, os.width());
std::vprint_nonunicode(os, fs, std::make_format_args(std::format(...)));

And now you have to hardcode a choice between vprint_unicode and
vprint_nonunicode, instead of letting std::print decide it. Let's hope
nobody ever needs to do any of that ;-)

I'll remove the code for the padding, thanks for checking the patch.



Re: [PATCH] bitint: Introduce abi_limb_mode

2023-12-15 Thread Richard Biener
On Thu, 14 Dec 2023, Jakub Jelinek wrote:

> Hi!
> 
> Given what I saw in the aarch64/arm psABIs for BITINT_TYPE, as I said
> earlier I'm afraid we need to differentiate between the limb mode/precision
> specified in the psABIs (what is used to decide how it is actually passed,
> aligned or what size it has) vs. what limb mode/precision should be used
> during bitint lowering and in the libgcc bitint APIs.
> While in the x86_64 psABI a limb is 64-bit, which is perfect for both,
> that is a wordsize which we can perform operations natively in,
> e.g. aarch64 wants 128-bit limbs for alignment/sizing purposes, but
> on the bitint lowering side I believe it would result in terribly bad code
> and on the libgcc side wouldn't work at all (because it relies there on
> longlong.h support).
> 
> So, the following patch makes it possible for aarch64 to use TImode
> as abi_limb_mode for _BitInt(129) and larger, while using DImode as
> limb_mode.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2023-12-14  Jakub Jelinek  
> 
>   * target.h (struct bitint_info): Add abi_limb_mode member, adjust
>   comment.
>   * target.def (bitint_type_info): Mention abi_limb_mode instead of
>   limb_mode.
>   * varasm.cc (output_constant): Use abi_limb_mode rather than
>   limb_mode.
>   * stor-layout.cc (finish_bitfield_representative): Likewise.  Assert
>   that if precision is smaller or equal to abi_limb_mode precision or
>   if info.big_endian is different from WORDS_BIG_ENDIAN, info.limb_mode
>   must be the same as info.abi_limb_mode.
>   (layout_type): Use abi_limb_mode rather than limb_mode.
>   * gimple-fold.cc (clear_padding_bitint_needs_padding_p): Likewise.
>   (clear_padding_type): Likewise.
>   * config/i386/i386.cc (ix86_bitint_type_info): Also set
>   info->abi_limb_mode.
>   * doc/tm.texi: Regenerated.
> 
> --- gcc/target.h.jj   2023-09-06 17:28:24.228977486 +0200
> +++ gcc/target.h  2023-12-14 14:26:48.490047206 +0100
> @@ -69,15 +69,23 @@ union cumulative_args_t { void *p; };
>  #endif /* !CHECKING_P */
>  
>  /* Target properties of _BitInt(N) type.  _BitInt(N) is to be represented
> -   as series of limb_mode CEIL (N, GET_MODE_PRECISION (limb_mode)) limbs,
> -   ordered from least significant to most significant if !big_endian,
> +   as series of abi_limb_mode CEIL (N, GET_MODE_PRECISION (abi_limb_mode))
> +   limbs, ordered from least significant to most significant if !big_endian,
> otherwise from most significant to least significant.  If extended is
> false, the bits above or equal to N are undefined when stored in a
> register or memory, otherwise they are zero or sign extended depending on if
> -   it is unsigned _BitInt(N) or _BitInt(N) / signed _BitInt(N).  */
> +   it is unsigned _BitInt(N) or _BitInt(N) / signed _BitInt(N).
> +   limb_mode is either the same as abi_limb_mode, or some narrower mode
> +   in which _BitInt lowering should actually perform operations in and
> +   what libgcc _BitInt helpers should use.
> +   E.g. abi_limb_mode could be TImode which is something some processor
> +   specific ABI would specify to use, but it would be desirable to handle
> +   it as an array of DImode instead for efficiency.
> +   Note, abi_limb_mode can be different from limb_mode only if big_endian
> +   matches WORDS_BIG_ENDIAN.  */
>  
>  struct bitint_info {
> -  machine_mode limb_mode;
> +  machine_mode abi_limb_mode, limb_mode;
>bool big_endian;
>bool extended;
>  };
> --- gcc/target.def.jj 2023-12-08 08:28:23.644171016 +0100
> +++ gcc/target.def2023-12-14 14:27:25.239537794 +0100
> @@ -6357,8 +6357,8 @@ DEFHOOK
>  (bitint_type_info,
>   "This target hook returns true if @code{_BitInt(@var{N})} is supported 
> and\n\
>  provides details on it.  @code{_BitInt(@var{N})} is to be represented as\n\
> -series of @code{info->limb_mode}\n\
> -@code{CEIL (@var{N}, GET_MODE_PRECISION (info->limb_mode))} limbs,\n\
> +series of @code{info->abi_limb_mode}\n\
> +@code{CEIL (@var{N}, GET_MODE_PRECISION (info->abi_limb_mode))} limbs,\n\
>  ordered from least significant to most significant if\n\
>  @code{!info->big_endian}, otherwise from most significant to least\n\
>  significant.  If @code{info->extended} is false, the bits above or equal to\n\
> --- gcc/varasm.cc.jj  2023-12-01 08:10:44.504299177 +0100
> +++ gcc/varasm.cc 2023-12-14 14:55:45.821971713 +0100
> @@ -5315,7 +5315,8 @@ output_constant (tree exp, unsigned HOST
> tree type = TREE_TYPE (exp);
> bool ok = targetm.c.bitint_type_info (TYPE_PRECISION (type), &info);
> gcc_assert (ok);
> -   scalar_int_mode limb_mode = as_a <scalar_int_mode> (info.limb_mode);
> +   scalar_int_mode limb_mode
> + = as_a <scalar_int_mode> (info.abi_limb_mode);
> if (TYPE_PRECISION (type) <= GET_MODE_PRECISION (limb_mode))
>   {
> cst = expand_expr (exp, NULL_RTX, VOIDmode, EXPAND_INITIALIZER);
> --- 

Re: [PATCH v2 2/4] libgrust: Add libproc_macro and build system

2023-12-15 Thread Thomas Schwinge
Hi Jason!

I think you usually deal with these kind of GCC Git things?  If not,
please let me know.

On 2023-10-26T10:21:18+0200, I wrote:
> First, I've pushed into GCC upstream Git branch devel/rust/libgrust-v2
> the "v2" libgrust changes as posted by Arthur, so that people can easily
> test this before it getting into Git master branch.  [...]

Please now delete the GCC Git 'devel/rust/libgrust-v2' branch, which was
only used temporarily, and is now obsolete.

$ git push upstream :devel/rust/libgrust-v2
remote: *** Deleting branch 'devel/rust/libgrust-v2' is not allowed.
remote: *** 
remote: *** This repository currently only allow the deletion of references
remote: *** whose name matches the following:
remote: *** 
remote: *** refs/users/[^/]*/heads/.*
remote: *** refs/vendors/[^/]*/heads/.*
remote: *** 
remote: *** Branch deletion is only allowed for user and vendor branches.  
If another branch was created by mistake, contact an administrator to delete it 
on the server with git update-ref.  If a development branch is dead, also 
contact an administrator to move it under refs/dead/heads/ rather than deleting 
it.
remote: error: hook declined to update refs/heads/devel/rust/libgrust-v2
To git+ssh://gcc.gnu.org/git/gcc.git
 ! [remote rejected]   devel/rust/libgrust-v2 (hook declined)
error: failed to push some refs to 'git+ssh://gcc.gnu.org/git/gcc.git'


Grüße
 Thomas


RE: [PATCH v4] [tree-optimization/110279] Consider FMA in get_reassociation_width

2023-12-15 Thread Thomas Schwinge
Hi!

On 2023-12-13T08:14:28+, Di Zhao OS  wrote:
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr110279-2.c
> @@ -0,0 +1,41 @@
> +/* PR tree-optimization/110279 */
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast --param tree-reassoc-width=4 --param fully-pipelined-fma=1 -fdump-tree-reassoc2-details -fdump-tree-optimized" } */
> +/* { dg-additional-options "-march=armv8.2-a" { target aarch64-*-* } } */
> +
> +#define LOOP_COUNT 8
> +typedef double data_e;
> +
> +#include <stdio.h>
> +
> +__attribute_noinline__ data_e
> +foo (data_e in)

Pushed to master branch commit 91e9e8faea4086b3b8aef2355fc12c1559d425f6
"Fix 'gcc.dg/pr110279-2.c' syntax error due to '__attribute_noinline__'",
see attached.

However:

> +{
> +  data_e a1, a2, a3, a4;
> +  data_e tmp, result = 0;
> +  a1 = in + 0.1;
> +  a2 = in * 0.1;
> +  a3 = in + 0.01;
> +  a4 = in * 0.59;
> +
> +  data_e result2 = 0;
> +
> +  for (int ic = 0; ic < LOOP_COUNT; ic++)
> +{
> +  /* Test that a complete FMA chain with length=4 is not broken.  */
> +  tmp = a1 + a2 * a2 + a3 * a3 + a4 * a4 ;
> +  result += tmp - ic;
> +  result2 = result2 / 2 - tmp;
> +
> +  a1 += 0.91;
> +  a2 += 0.1;
> +  a3 -= 0.01;
> +  a4 -= 0.89;
> +
> +}
> +
> +  return result + result2;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "was chosen for reassociation" "reassoc2"} } */
> +/* { dg-final { scan-tree-dump-times {\.FMA } 3 "optimized"} } */

..., I still see these latter two tree dump scans FAIL, for GCN:

$ grep -C2 'was chosen for reassociation' pr110279-2.c.197t.reassoc2
  2 *: a3_40
  2 *: a2_39
Width = 4 was chosen for reassociation
Transforming _15 = powmult_1 + powmult_3;
 into _63 = powmult_1 + a1_38;
$ grep -F .FMA pr110279-2.c.265t.optimized
  _63 = .FMA (a2_39, a2_39, a1_38);
  _64 = .FMA (a3_40, a3_40, powmult_5);

..., nvptx:

$ grep -C2 'was chosen for reassociation' pr110279-2.c.197t.reassoc2
  2 *: a3_40
  2 *: a2_39
Width = 4 was chosen for reassociation
Transforming _15 = powmult_1 + powmult_3;
 into _63 = powmult_1 + a1_38;
$ grep -F .FMA pr110279-2.c.265t.optimized
  _63 = .FMA (a2_39, a2_39, a1_38);
  _64 = .FMA (a3_40, a3_40, powmult_5);

..., but also x86_64-pc-linux-gnu:

$  grep -C2 'was chosen for reassociation' pr110279-2.c.197t.reassoc2
  2 *: a3_40
  2 *: a2_39
Width = 2 was chosen for reassociation
Transforming _15 = powmult_1 + powmult_3;
 into _63 = powmult_1 + powmult_3;
$ grep -cF .FMA pr110279-2.c.265t.optimized
0


Grüße
 Thomas


From 91e9e8faea4086b3b8aef2355fc12c1559d425f6 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 15 Dec 2023 10:03:12 +0100
Subject: [PATCH] Fix 'gcc.dg/pr110279-2.c' syntax error due to
 '__attribute_noinline__'

For example, for GCN or nvptx target configurations, using newlib:

FAIL: gcc.dg/pr110279-2.c (test for excess errors)
UNRESOLVED: gcc.dg/pr110279-2.c scan-tree-dump-not reassoc2 "was chosen for reassociation"
UNRESOLVED: gcc.dg/pr110279-2.c scan-tree-dump-times optimized "\\.FMA " 3

[...]/source-gcc/gcc/testsuite/gcc.dg/pr110279-2.c:11:1: error: unknown type name '__attribute_noinline__'
[...]/source-gcc/gcc/testsuite/gcc.dg/pr110279-2.c:12:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'foo'

We cannot assume 'stdio.h' to define '__attribute_noinline__' -- but then, that
also isn't necessary for this test case (there is nothing to inline into).

	gcc/testsuite/
	* gcc.dg/pr110279-2.c: Don't '#include <stdio.h>'.  Remove
	'__attribute_noinline__'.
---
 gcc/testsuite/gcc.dg/pr110279-2.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr110279-2.c b/gcc/testsuite/gcc.dg/pr110279-2.c
index 0304a77aa66..b6b69969c6b 100644
--- a/gcc/testsuite/gcc.dg/pr110279-2.c
+++ b/gcc/testsuite/gcc.dg/pr110279-2.c
@@ -6,9 +6,7 @@
 #define LOOP_COUNT 8
 typedef double data_e;
 
-#include <stdio.h>
-
-__attribute_noinline__ data_e
+data_e
 foo (data_e in)
 {
   data_e a1, a2, a3, a4;
-- 
2.34.1



[PATCH v2] LoongArch: Implement FCCmode reload and cstore4

2023-12-15 Thread Xi Ruoyao
We used a branch to load floating-point comparison results into GPR.
This is very slow when the branch is not predictable.

Implement movfcc so we can reload FCCmode into GPRs, FPRs, and MEM.
Then implement cstore4.

gcc/ChangeLog:

* config/loongarch/loongarch-tune.h
(loongarch_rtx_cost_data::movcf2gr): New field.
(loongarch_rtx_cost_data::movcf2gr_): New method.
(loongarch_rtx_cost_data::use_movcf2gr): New method.
* config/loongarch/loongarch-def.cc
(loongarch_rtx_cost_data::loongarch_rtx_cost_data): Set movcf2gr
to COSTS_N_INSNS (7) and movgr2cf to COSTS_N_INSNS (15), based
on timing on LA464.
(loongarch_cpu_rtx_cost_data): Set movcf2gr and movgr2cf to
COSTS_N_INSNS (1) for LA664.
(loongarch_rtx_cost_optimize_size): Set movcf2gr and movgr2cf to
COSTS_N_INSNS (1) + 1.
* config/loongarch/predicates.md (loongarch_fcmp_operator): New
predicate.
* config/loongarch/loongarch.md (movfcc): Change to
define_expand.
(movfcc_internal): New define_insn.
(fcc_to_): New define_insn.
(cstore4): New define_expand.
* config/loongarch/loongarch.cc
(loongarch_hard_regno_mode_ok_uncached): Allow FCCmode in GPRs
and FPRs.
(loongarch_secondary_reload): Reload FCCmode via FPR and/or GPR.
(loongarch_emit_float_compare): Call gen_reg_rtx instead of
loongarch_allocate_fcc.
(loongarch_allocate_fcc): Remove.
(loongarch_move_to_gpr_cost): Handle FCC_REGS -> GR_REGS.
(loongarch_move_from_gpr_cost): Handle GR_REGS -> FCC_REGS.
(loongarch_register_move_cost): Handle FCC_REGS -> FCC_REGS,
FCC_REGS -> FP_REGS, and FP_REGS -> FCC_REGS.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/movcf2gr.c: New test.
* gcc.target/loongarch/movcf2gr-via-fr.c: New test.
---

Supersedes
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640497.html.

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch-def.cc | 13 +++-
 gcc/config/loongarch/loongarch-tune.h | 15 +++-
 gcc/config/loongarch/loongarch.cc | 70 ---
 gcc/config/loongarch/loongarch.md | 69 --
 gcc/config/loongarch/predicates.md|  4 ++
 .../gcc.target/loongarch/movcf2gr-via-fr.c| 10 +++
 gcc/testsuite/gcc.target/loongarch/movcf2gr.c |  9 +++
 7 files changed, 157 insertions(+), 33 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/movcf2gr-via-fr.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/movcf2gr.c

diff --git a/gcc/config/loongarch/loongarch-def.cc b/gcc/config/loongarch/loongarch-def.cc
index 4a8885e8343..843be78e46e 100644
--- a/gcc/config/loongarch/loongarch-def.cc
+++ b/gcc/config/loongarch/loongarch-def.cc
@@ -101,15 +101,21 @@ loongarch_rtx_cost_data::loongarch_rtx_cost_data ()
 int_mult_di (COSTS_N_INSNS (4)),
 int_div_si (COSTS_N_INSNS (5)),
 int_div_di (COSTS_N_INSNS (5)),
+movcf2gr (COSTS_N_INSNS (7)),
+movgr2cf (COSTS_N_INSNS (15)),
 branch_cost (6),
 memory_latency (4) {}
 
 /* The following properties cannot be looked up directly using "cpucfg".
  So it is necessary to provide a default value for "unknown native"
  tune targets (i.e. -mtune=native while PRID does not correspond to
- any known "-mtune" type).  Currently all numbers are default.  */
+ any known "-mtune" type).  */
 array_tune loongarch_cpu_rtx_cost_data =
-  array_tune ();
+  array_tune ()
+.set (CPU_LA664,
+ loongarch_rtx_cost_data ()
+   .movcf2gr_ (COSTS_N_INSNS (1))
+   .movgr2cf_ (COSTS_N_INSNS (1)));
 
 /* RTX costs to use when optimizing for size.
We use a value slightly larger than COSTS_N_INSNS (1) for all of them
@@ -125,7 +131,8 @@ const loongarch_rtx_cost_data loongarch_rtx_cost_optimize_size =
 .int_mult_si_ (COST_COMPLEX_INSN)
 .int_mult_di_ (COST_COMPLEX_INSN)
 .int_div_si_ (COST_COMPLEX_INSN)
-.int_div_di_ (COST_COMPLEX_INSN);
+.int_div_di_ (COST_COMPLEX_INSN)
+.movcf2gr_ (COST_COMPLEX_INSN);
 
 array_tune loongarch_cpu_issue_rate = array_tune ()
   .set (CPU_NATIVE, 4)
diff --git a/gcc/config/loongarch/loongarch-tune.h b/gcc/config/loongarch/loongarch-tune.h
index 4aa01c54c08..7a75c8dd9d9 100644
--- a/gcc/config/loongarch/loongarch-tune.h
+++ b/gcc/config/loongarch/loongarch-tune.h
@@ -35,6 +35,8 @@ struct loongarch_rtx_cost_data
   unsigned short int_mult_di;
   unsigned short int_div_si;
   unsigned short int_div_di;
+  unsigned short movcf2gr;
+  unsigned short movgr2cf;
   unsigned short branch_cost;
   unsigned short memory_latency;
 
@@ -95,6 +97,18 @@ struct loongarch_rtx_cost_data
 return *this;
   }
 
+  loongarch_rtx_cost_data movcf2gr_ (unsigned short _movcf2gr)
+  {
+movcf2gr = _movcf2gr;
+return *this;
+  }
+
+  loongarch_rtx_cost_data movgr2cf_ (unsigned short 

[PATCH] sel-sched: Verify change before replacing dest in EXPR_INSN_RTX [PR112995]

2023-12-15 Thread Kewen.Lin
Hi,

PR112995 exposed one issue in current try_replace_dest_reg
that the result rtx insn after replace_dest_with_reg_in_expr
is probably unable to match any constraints.  Although there
are some checks on the changes onto dest or src of orig_insn,
none is performed on the EXPR_INSN_RTX.

This patch is to add the check before actually replacing dest
in expr with reg.

Bootstrapped and regtested on x86_64-redhat-linux and
powerpc64{,le}-linux-gnu.

Is it ok for trunk?

BR,
Kewen
-
PR rtl-optimization/112995

gcc/ChangeLog:

* sel-sched.cc (try_replace_dest_reg): Check the validity of the
replaced insn before actually replacing dest in expr.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr112995.c: New test.
---
 gcc/sel-sched.cc| 10 +-
 gcc/testsuite/gcc.target/powerpc/pr112995.c | 14 ++
 2 files changed, 23 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr112995.c

diff --git a/gcc/sel-sched.cc b/gcc/sel-sched.cc
index 1925f4a9461..a35b5e16c91 100644
--- a/gcc/sel-sched.cc
+++ b/gcc/sel-sched.cc
@@ -1614,7 +1614,15 @@ try_replace_dest_reg (ilist_t orig_insns, rtx best_reg, 
expr_t expr)
   /* Make sure that EXPR has the right destination
  register.  */
   if (expr_dest_regno (expr) != REGNO (best_reg))
-replace_dest_with_reg_in_expr (expr, best_reg);
+{
+  rtx_insn *vinsn = EXPR_INSN_RTX (expr);
+  validate_change (vinsn, &SET_DEST (PATTERN (vinsn)), best_reg, 1);
+  bool res = verify_changes (0);
+  cancel_changes (0);
+  if (!res)
+   return false;
+  replace_dest_with_reg_in_expr (expr, best_reg);
+}
   else
 EXPR_TARGET_AVAILABLE (expr) = 1;

diff --git a/gcc/testsuite/gcc.target/powerpc/pr112995.c b/gcc/testsuite/gcc.target/powerpc/pr112995.c
new file mode 100644
index 000..4adcb5f3851
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr112995.c
@@ -0,0 +1,14 @@
+/* { dg-require-effective-target float128 } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -fselective-scheduling2" } */
+
+/* Verify there is no ICE.  */
+
+int a[10];
+int b(_Float128 e) {
+  int c;
+  _Float128 d;
+  c = e;
+  d = c;
+  d = a[c] + d;
+  return d;
+}
--
2.39.3



[pushed] wwwdocs: projects/cli: Update ECMA reference

2023-12-15 Thread Gerald Pfeifer
Gerald

---
 htdocs/projects/cli.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/projects/cli.html b/htdocs/projects/cli.html
index 394832b6..47ddb362 100644
--- a/htdocs/projects/cli.html
+++ b/htdocs/projects/cli.html
@@ -460,7 +460,7 @@ allowing the user to provide a native implementation if 
necessary.
 [1] wwwdocs:
 
 
-ECMA, <a href="https://www.ecma-international.org/publications-and-standards/standards/ecma-335/">
+ECMA, <a href="https://ecma-international.org/publications-and-standards/standards/ecma-335/">
 Common Language Infrastructure (CLI), 4th edition, June 2006.
 
 
-- 
2.43.0


[pushed] doc: Update nvptx-tools Github link

2023-12-15 Thread Gerald Pfeifer
I pushed this obvious change.

Gerald


gcc:

* doc/install.texi (Specific) : Update nvptx-tools
Github link.
---
 gcc/doc/install.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index c1128d9274c..fffad700af7 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -4779,7 +4779,7 @@ Andes NDS32 target in big endian mode.
 Nvidia PTX target.
 
 Instead of GNU binutils, you will need to install
-@uref{https://github.com/MentorEmbedded/nvptx-tools/,,nvptx-tools}.
+@uref{https://github.com/SourceryTools/nvptx-tools,,nvptx-tools}.
 Tell GCC where to find it:
 @option{--with-build-time-tools=[install-nvptx-tools]/nvptx-none/bin}.
 
-- 
2.43.0


[Committed] RISC-V: Remove xfail for some of the SLP tests

2023-12-15 Thread Juzhe-Zhong
Due to recent middle-end cost model changes, now we can do more VLA SLP.

Fix these following regressions:

XPASS: gcc.target/riscv/rvv/autovec/partial/slp-1.c scan-assembler \\tvand
XPASS: gcc.target/riscv/rvv/autovec/partial/slp-1.c scan-assembler \\tvand
XPASS: gcc.target/riscv/rvv/autovec/partial/slp-1.c scan-assembler \\tvid\\.v
XPASS: gcc.target/riscv/rvv/autovec/partial/slp-1.c scan-assembler \\tvid\\.v
XPASS: gcc.target/riscv/rvv/autovec/partial/slp-1.c scan-tree-dump-times optimized ".VEC_PERM" 1
XPASS: gcc.target/riscv/rvv/autovec/partial/slp-1.c scan-tree-dump-times optimized ".VEC_PERM" 1
XPASS: gcc.target/riscv/rvv/autovec/partial/slp-16.c scan-assembler \\tvid\\.v
XPASS: gcc.target/riscv/rvv/autovec/partial/slp-16.c scan-assembler \\tvid\\.v
XPASS: gcc.target/riscv/rvv/autovec/partial/slp-16.c scan-tree-dump-times optimized ".VEC_PERM" 1
XPASS: gcc.target/riscv/rvv/autovec/partial/slp-16.c scan-tree-dump-times optimized ".VEC_PERM" 1
XPASS: gcc.target/riscv/rvv/autovec/partial/slp-3.c scan-tree-dump-times optimized ".VEC_PERM" 1
XPASS: gcc.target/riscv/rvv/autovec/partial/slp-3.c scan-tree-dump-times optimized ".VEC_PERM" 1
XPASS: gcc.target/riscv/rvv/autovec/partial/slp-5.c scan-tree-dump-times optimized ".VEC_PERM" 1
XPASS: gcc.target/riscv/rvv/autovec/partial/slp-5.c scan-tree-dump-times optimized ".VEC_PERM" 1

Committed.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/slp-1.c: Remove xfail of M2.
* gcc.target/riscv/rvv/autovec/partial/slp-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-5.c: Ditto.

---
 .../gcc.target/riscv/rvv/autovec/partial/slp-1.c  | 8 
 .../gcc.target/riscv/rvv/autovec/partial/slp-16.c | 6 +++---
 .../gcc.target/riscv/rvv/autovec/partial/slp-3.c  | 4 ++--
 .../gcc.target/riscv/rvv/autovec/partial/slp-5.c  | 4 ++--
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-1.c
index 34622ce9aff..948b20b68d3 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-1.c
@@ -20,7 +20,7 @@ f (int8_t *restrict a, int8_t *restrict b, int n)
 }
 
 /* FIXME: Since we don't have VECT cost model yet, LOAD_LANES/STORE_LANES are chosen
-   instead of SLP when riscv-autovec-lmul=m1 or m2.  */
-/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" { xfail { any-opts "--param riscv-autovec-lmul=m1" "--param riscv-autovec-lmul=m2" "--param riscv-autovec-lmul=m8" } } } } */
-/* { dg-final { scan-assembler {\tvid\.v} { xfail { any-opts "--param riscv-autovec-lmul=m1" "--param riscv-autovec-lmul=m2" } } } } */
-/* { dg-final { scan-assembler {\tvand} { xfail { any-opts "--param riscv-autovec-lmul=m1" "--param riscv-autovec-lmul=m2" } } } } */
+   instead of SLP when riscv-autovec-lmul=m1.  */
+/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" { xfail { any-opts "--param riscv-autovec-lmul=m1" "--param riscv-autovec-lmul=m8" } } } } */
+/* { dg-final { scan-assembler {\tvid\.v} { xfail { any-opts "--param riscv-autovec-lmul=m1" } } } } */
+/* { dg-final { scan-assembler {\tvand} { xfail { any-opts "--param riscv-autovec-lmul=m1" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-16.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-16.c
index 80c77ef679a..7b23cafab3f 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-16.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-16.c
@@ -20,7 +20,7 @@ f (uint8_t *restrict a, uint8_t *restrict b, int n)
 }
 
 /* FIXME: Since we don't have VECT cost model yet, LOAD_LANES/STORE_LANES are chosen
-   instead of SLP when riscv-autovec-lmul=m1 or m2.  */
-/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" { xfail { any-opts "--param riscv-autovec-lmul=m1" "--param riscv-autovec-lmul=m2" "--param riscv-autovec-lmul=m8" } } } } */
-/* { dg-final { scan-assembler {\tvid\.v} { xfail { any-opts "--param riscv-autovec-lmul=m1" "--param riscv-autovec-lmul=m2" } } } } */
+   instead of SLP when riscv-autovec-lmul=m1.  */
+/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 1 "optimized" { xfail { any-opts "--param riscv-autovec-lmul=m1" "--param riscv-autovec-lmul=m8" } } } } */
+/* { dg-final { scan-assembler {\tvid\.v} { xfail { any-opts "--param riscv-autovec-lmul=m1"} } } } */
 /* { dg-final { scan-assembler-not {\tvmul} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-3.c
index 75298bd7525..3622c59c439 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-3.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-3.c
@@ -20,5 +20,5 @@ f (int8_t *restrict a, int8_t *restrict b, int 

Re: [r14-6559 Regression] FAIL: gcc.dg/guality/pr58791-4.c -Os -DPREVENT_OPTIMIZATION line pr58791-4.c:32 i == 486 on Linux/x86_64

2023-12-15 Thread Richard Biener
On Fri, Dec 15, 2023 at 2:25 AM haochen.jiang wrote:
>
> On Linux/x86_64,
>
> 8afdbcdd7abe1e3c7a81e07f34c256e7f2dbc652 is the first bad commit
> commit 8afdbcdd7abe1e3c7a81e07f34c256e7f2dbc652
> Author: Di Zhao 
> Date:   Fri Dec 15 03:22:32 2023 +0800
>
> Consider fully pipelined FMA in get_reassociation_width
>
> caused
>
> FAIL: gcc.dg/guality/pr58791-4.c   -O2  -DPREVENT_OPTIMIZATION  line pr58791-4.c:32 i2 == 487
> FAIL: gcc.dg/guality/pr58791-4.c   -O2  -DPREVENT_OPTIMIZATION  line pr58791-4.c:32 i == 486
> FAIL: gcc.dg/guality/pr58791-4.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  -DPREVENT_OPTIMIZATION line pr58791-4.c:32 i2 == 487
> FAIL: gcc.dg/guality/pr58791-4.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  -DPREVENT_OPTIMIZATION line pr58791-4.c:32 i == 486
> FAIL: gcc.dg/guality/pr58791-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line pr58791-4.c:32 i2 == 487
> FAIL: gcc.dg/guality/pr58791-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line pr58791-4.c:32 i == 486
> FAIL: gcc.dg/guality/pr58791-4.c   -O3 -g  -DPREVENT_OPTIMIZATION  line pr58791-4.c:32 i2 == 487
> FAIL: gcc.dg/guality/pr58791-4.c   -O3 -g  -DPREVENT_OPTIMIZATION  line pr58791-4.c:32 i == 486
> FAIL: gcc.dg/guality/pr58791-4.c   -Os  -DPREVENT_OPTIMIZATION  line pr58791-4.c:32 i2 == 487
> FAIL: gcc.dg/guality/pr58791-4.c   -Os  -DPREVENT_OPTIMIZATION  line pr58791-4.c:32 i == 486
>
> with GCC configured with
>
> ../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-6559/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
>
> To reproduce:
>
> $ cd {build_dir}/gcc && make check RUNTESTFLAGS="guality.exp=gcc.dg/guality/pr58791-4.c --target_board='unix{-m64\ -march=cascadelake}'"

There's an extra intermediate stmt inserted (for much later use, but
reassoc inserts close to defs) that is
then also used for FMA forming.  This disturbs things in some way:

  g_5 = (double) f_4;
  # DEBUG g => g_5
  # DEBUG BEGIN_STMT
  h_7 = (double) b_6(D);
  # DEBUG h => h_7
  # DEBUG BEGIN_STMT
  _39 = h_7 * 3.25e+0;
  # DEBUG D#5 => g_5 * h_7
  # DEBUG i => D#5
  # DEBUG BEGIN_STMT
  # DEBUG i2 => D#5 + 1.0e+0
  # DEBUG BEGIN_STMT
  # DEBUG D#8 => g_5 * _39
  _3 = .FMA (g_5, _39, h_7);

g_5 is dead after the FMA.  Interestingly removing the

  asm volatile (NOP : : : "memory");
  asm volatile (NOP : : : "memory");

lines fixes the regression because then we can TER the FMA, keeping
g_5 live for longer.

Richard.