Re: [PATCH] xtensa: Prepare the transition from Reload to LRA

2022-10-15 Thread Max Filippov via Gcc-patches
Hi Suwa-san,

On Fri, Oct 14, 2022 at 4:19 AM Takayuki 'January June' Suwa
 wrote:
> This patch provides the first step in the transition from Reload to LRA
> in Xtensa.
>
> gcc/ChangeLog:
>
> * config/xtensa/xtensa-proto.h (xtensa_split1_is_finished_p):
> New prototype.
> * config/xtensa/xtensa.cc
> (xtensa_split1_is_finished_p, xtensa_lra_p): New functions.
> (TARGET_LRA_P): Replace the dummy hook with xtensa_lra_p.
> (xt_true_regnum): Rework.
> * gcc/config/xtensa/xtensa.h (CALL_REALLY_USED_REGISTERS):
> Rename from CALL_USED_REGISTERS, and remove what correspond to
> FIXED_REGISTERS.
> * gcc/config/xtensa/constraints.md (Y):
> Use !xtensa_split1_is_finished_p() instead of can_create_pseudo_p().
> * gcc/config/xtensa/predicates.md (move_operand): Ditto.
> * gcc/config/xtensa/xtensa.md:
> Add new split pattern that puts out-of-constraint integer constants
> into the constant pool.
> * gcc/config/xtensa/xtensa.opt (-mlra): New target-specific option
> for testing purpose.
> ---
>  gcc/config/xtensa/constraints.md  |  2 +-
>  gcc/config/xtensa/predicates.md   |  2 +-
>  gcc/config/xtensa/xtensa-protos.h |  1 +
>  gcc/config/xtensa/xtensa.cc   | 48 ---
>  gcc/config/xtensa/xtensa.h|  6 ++--
>  gcc/config/xtensa/xtensa.md   | 12 
>  gcc/config/xtensa/xtensa.opt  |  4 +++
>  7 files changed, 60 insertions(+), 15 deletions(-)

Thank you for doing this, I couldn't find time to get back to it since 2020 ):

This change results in a few new regressions in the following tests
caused by ICE even when running without -mlra option:

+FAIL: gcc.c-torture/execute/pr92904.c   -O1  (internal compiler
error: in extract_insn, at recog.cc:2791)
+FAIL: gcc.c-torture/execute/pr92904.c   -O2  (internal compiler
error: in extract_insn, at recog.cc:2791)
+FAIL: gcc.c-torture/execute/pr92904.c   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  (internal
compiler error: in extract_insn, at recog.cc:2791)
+FAIL: gcc.c-torture/execute/pr92904.c   -O3 -g  (internal compiler
error: in extract_insn, at recog.cc:2791)
+FAIL: gcc.c-torture/execute/pr92904.c   -Os  (internal compiler
error: in extract_insn, at recog.cc:2791)
+FAIL: gcc.c-torture/execute/pr92904.c   -O2 -flto
-fno-use-linker-plugin -flto-partition=none  (internal compiler error:
in extract_insn, at recog.cc:2791)
+FAIL: gcc.c-torture/execute/pr92904.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  (internal compiler error: in extract_insn, at
recog.cc:2791)
+FAIL: g++.dg/torture/vshuf-v2si.C   -O3 -g  (internal compiler error:
in extract_insn, at recog.cc:2791)
+FAIL: g++.dg/torture/vshuf-v8qi.C   -O3 -g  (internal compiler error:
in extract_insn, at recog.cc:2791)

The backtraces look like this in all of them:

gcc/gcc/testsuite/gcc.c-torture/execute/pr92904.c:395:1: error:
unrecognizable insn:
(insn 10501 7 10502 2 (set (reg:SI 5913)
   (const_int 1431655765 [0x]))
"gcc/gcc/testsuite/gcc.c-torture/execute/pr92904.c":239:9 -1
(nil))
during RTL pass: subreg3
gcc/gcc/testsuite/gcc.c-torture/execute/pr92904.c:395:1: internal
compiler error: in extract_insn, at recog.cc:2791
0x6b17f7 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
   gcc/gcc/rtl-error.cc:108
0x6b187a _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
   gcc/gcc/rtl-error.cc:116
0x6a2aa4 extract_insn(rtx_insn*)
   gcc/gcc/recog.cc:2791
0x179e94d decompose_multiword_subregs
   gcc/gcc/lower-subreg.cc:1678
0x179ebdd execute
   gcc/gcc/lower-subreg.cc:1820

There's also the following runtime failures, but only on
call0 configuration:

+FAIL: gcc.c-torture/execute/20010122-1.c   -O1  execution test
+FAIL: gcc.c-torture/execute/20010122-1.c   -O2  execution test
+FAIL: gcc.c-torture/execute/20010122-1.c   -O3 -g  execution test
+FAIL: gcc.c-torture/execute/20010122-1.c   -Os  execution test
+FAIL: gcc.c-torture/execute/20010122-1.c   -O2 -flto
-fno-use-linker-plugin -flto-partition=none  execution test

-- 
Thanks.
-- Max


[committed] Fix bug in register move costing on H8/300

2022-10-15 Thread Jeff Law via Gcc-patches


Spotted while looking at a regression on the H8/300.   We used MAC_REG 
rather than MAC_REGS when referring to a register class. This caused 
code quality regressions while looking into Paul K's suggestion for 
fixing the auto-increment issue on the H8.



Pushed to the trunk,

Jeff


commit 6c3da5ca84ded7c5754183f8d2cad0d01e1562ff
Author: Jeff Law 
Date:   Sat Oct 15 23:38:20 2022 -0400

Fix bug in register move costing on H8/300

gcc/
* config/h8300/h8300.cc (h8300_register_move_cost): Fix typo.

diff --git a/gcc/config/h8300/h8300.cc b/gcc/config/h8300/h8300.cc
index 78cf15f15c7..be3e385c91e 100644
--- a/gcc/config/h8300/h8300.cc
+++ b/gcc/config/h8300/h8300.cc
@@ -1140,7 +1140,7 @@ static int
 h8300_register_move_cost (machine_mode mode ATTRIBUTE_UNUSED,
  reg_class_t from, reg_class_t to)
 {
-  if (from == MAC_REGS || to == MAC_REG)
+  if (from == MAC_REGS || to == MAC_REGS)
 return 6;
   else
 return 3;


Re: [PATCH] c++ modules: streaming constexpr_fundef [PR101449]

2022-10-15 Thread Nathan Sidwell via Gcc-patches

On 10/14/22 13:00, Patrick Palka wrote:

IIUC we currently avoid streaming the RESULT_DECL and PARM_DECLs
of a constexpr_fundef entry under the assumption that they're just
copies of the DECL_RESULT and DECL_ARGUMENTS of the FUNCTION_DECL.
Thus we can just make new copies of DECL_RESULT and DECL_ARGUMENTs
on stream in rather than separately streaming them.

Unfortunately this assumption isn't true generally: the FUNCTION_DECL
contains genericized trees, whereas the constexpr_fundef entry contains
pre-GENERIC trees.  So in particular DECL_RESULT and DECL_ARGUMENTs
may have been turned into invisiref parms which we don't handle during
during constexpr evaluation and so we ICE in the below testcase.

This patch fixes this by faithfully streaming the RESULT_DECL and
PARM_DECLs of a constexpr_fundef entry.


Hm, the reason for the complexity was that I wanted to recreate the tree graph 
where the fndecl came from one TU and the defn came from another one -- we need 
the definition to refer to argument decls from the already-read decl.  However, 
it seems that for constexpr fns here, that is not needed, resulting in a 
significant simplification.


ok.

nathan



PR c++/101449

gcc/cp/ChangeLog:

* module.cc (trees_out::write_function_def): Stream the
parms and result of the constexpr_fundef entry.
(trees_in::read_function_def): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/modules/cexpr-3_a.C: New test.
* g++.dg/modules/cexpr-3_b.C: New test.
---
  gcc/cp/module.cc | 59 
  gcc/testsuite/g++.dg/modules/cexpr-3_a.C | 14 ++
  gcc/testsuite/g++.dg/modules/cexpr-3_b.C |  7 +++
  3 files changed, 29 insertions(+), 51 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/cexpr-3_a.C
  create mode 100644 gcc/testsuite/g++.dg/modules/cexpr-3_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 7ffeefa7c1f..999ff3faafc 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -11553,34 +11553,13 @@ trees_out::write_function_def (tree decl)
tree_node (DECL_FRIEND_CONTEXT (decl));
  
constexpr_fundef *cexpr = retrieve_constexpr_fundef (decl);

-  int tag = 0;
-  if (cexpr)
-{
-  if (cexpr->result == error_mark_node)
-   /* We'll stream the RESULT_DECL naturally during the
-  serialization.  We never need to fish it back again, so
-  that's ok.  */
-   tag = 0;
-  else
-   tag = insert (cexpr->result);
-}
+
if (streaming_p ())
+u (cexpr != nullptr);
+  if (cexpr)
  {
-  i (tag);
-  if (tag)
-   dump (dumper::TREE)
- && dump ("Constexpr:%d result %N", tag, cexpr->result);
-}
-  if (tag)
-{
-  unsigned ix = 0;
-  for (tree parm = cexpr->parms; parm; parm = DECL_CHAIN (parm), ix++)
-   {
- tag = insert (parm);
- if (streaming_p ())
-   dump (dumper::TREE)
- && dump ("Constexpr:%d parm:%u %N", tag, ix, parm);
-   }
+  chained_decls (cexpr->parms);
+  tree_node (cexpr->result);
tree_node (cexpr->body);
  }
  
@@ -11613,32 +11592,10 @@ trees_in::read_function_def (tree decl, tree maybe_template)

tree maybe_dup = odr_duplicate (maybe_template, DECL_SAVED_TREE (decl));
bool installing = maybe_dup && !DECL_SAVED_TREE (decl);
  
-  if (int wtag = i ())

+  if (u ())
  {
-  int tag = 1;
-  cexpr.result = error_mark_node;
-
-  cexpr.result = copy_decl (result);
-  tag = insert (cexpr.result);
-
-  if (wtag != tag)
-   set_overrun ();
-  dump (dumper::TREE)
-   && dump ("Constexpr:%d result %N", tag, cexpr.result);
-
-  cexpr.parms = NULL_TREE;
-  tree *chain = &cexpr.parms;
-  unsigned ix = 0;
-  for (tree parm = DECL_ARGUMENTS (maybe_dup ? maybe_dup : decl);
-  parm; parm = DECL_CHAIN (parm), ix++)
-   {
- tree p = copy_decl (parm);
- tag = insert (p);
- dump (dumper::TREE)
-   && dump ("Constexpr:%d parm:%u %N", tag, ix, p);
- *chain = p;
- chain = &DECL_CHAIN (p);
-   }
+  cexpr.parms = chained_decls ();
+  cexpr.result = tree_node ();
cexpr.body = tree_node ();
cexpr.decl = decl;
  }
diff --git a/gcc/testsuite/g++.dg/modules/cexpr-3_a.C 
b/gcc/testsuite/g++.dg/modules/cexpr-3_a.C
new file mode 100644
index 000..be24bb43a7b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/cexpr-3_a.C
@@ -0,0 +1,14 @@
+// PR c++/101449
+// { dg-additional-options -fmodules-ts }
+// { dg-module-cmi pr101449 }
+
+export module pr101449;
+
+struct X {
+  bool b = true;
+  constexpr X() { }
+  constexpr X(const X&) { }
+};
+
+export constexpr X f() { return {}; }
+export constexpr bool g(X x) { return x.b; }
diff --git a/gcc/testsuite/g++.dg/modules/cexpr-3_b.C 
b/gcc/testsuite/g++.dg/modules/cexpr-3_b.C
new file mode 100644
index 000..cbf3be4fcab
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/cexpr-3_b.C
@@ -0,0

Re: libstdc++: Address '-Wunused-function' for 'print_raw' (was: [committed] libstdc++: Simplify print_raw function for debug assertions)

2022-10-15 Thread Jonathan Wakely via Gcc-patches
On Sat, 15 Oct 2022 at 14:30, Jonathan Wakely  wrote:
>
> On Sat, 15 Oct 2022 at 11:52, Thomas Schwinge  wrote:
> >
> > Hi!
> >
> > On 2022-10-14T15:36:02+0100, Jonathan Wakely via Gcc-patches 
> >  wrote:
> > > Tested powerpc64le-linux. Pushed to trunk.
> > >
> > > -- >8 --
> > >
> > > Replace two uses of print_raw where it's clearer to just use fprintf
> > > directly. Then the only remaining use of print_raw is as the print_func
> > > argument of pretty_print.
> >
> > OK to push the attached
> > "libstdc++: Address '-Wunused-function' for 'print_raw'",
> > or should this be addressed differently?
>
> Oh yes, I didn't notice it's only used within the conditional block,
> because I only tested with stacktrace enabled.
>
> I think it would be a little better to move print_raw down to where
> it's actually needed:
>
> --- a/libstdc++-v3/src/c++11/debug.cc
> +++ b/libstdc++-v3/src/c++11/debug.cc
> @@ -609,14 +609,6 @@ namespace
> print_literal(PrintContext& ctx, const char(&word)[Length])
> { print_word(ctx, word, Length - 1); }
>
> -  void
> -  print_raw(PrintContext& ctx, const char* str, ptrdiff_t nbc)
> -  {
> -if (nbc == -1)
> -  nbc = INT_MAX;
> -ctx._M_column += fprintf(stderr, "%.*s", (int)nbc, str);
> -  }
> -
>   void
>   print_word(PrintContext& ctx, const char* word, ptrdiff_t nbc = -1)
>   {
> @@ -1092,6 +1084,14 @@ namespace
>   { print_string(ctx, str, nbc, nullptr, 0); }
>
> #if _GLIBCXX_HAVE_STACKTRACE
> +  void
> +  print_raw(PrintContext& ctx, const char* str, ptrdiff_t nbc)
> +  {
> +if (nbc == -1)
> +  nbc = INT_MAX;
> +ctx._M_column += fprintf(stderr, "%.*s", (int)nbc, str);
> +  }
> +
>   int
>   print_backtrace(void* data, __UINTPTR_TYPE__ pc, const char* filename,
>  int lineno, const char* function)
>
>
> I'll push that later today, or feel free to do it yourself if you want
> the warning to go away :-)

Now pushed as r13-3314-g030a08c8572049

Thanks for pointing out the warning.



[committed] libstdc++: Implement constexpr std::to_chars for C++23 (P2291R3)

2022-10-15 Thread Jonathan Wakely via Gcc-patches
This has been approved for C++23.

I've just realised that the commit msg doesn't match the actual patch. I
had a consteval lambda in an earlier version, but I changed it to do:

+#if __cpp_lib_constexpr_charconv
+ if (std::__is_constant_evaluated())
+   return __table<_DecOnly>.__data[__c];
+#endif

So the code is unchanged for non-constexpr, and uses a global variable
template for constexpr evaluations. If P2647R0 gets accepted and applies
to C++23 mode then we can remove this and the static constexpr variable
will work for both cases.

Tested powerpc64le-linux. Pushed to trunk.

-- >8 --

Some of the helper functions use static constexpr local variables, which
is not permitted in a core constant expression. Removing the 'static'
seems to have negligible performance effect for __to_chars and
__to_chars_16. For __from_chars_alnum_to_val removing the 'static'
causes a significant performance impact for base 36 conversions. Use a
consteval lambda instead.

libstdc++-v3/ChangeLog:

* include/bits/charconv.h (__to_chars_10_impl): Add constexpr
for C++23. Remove 'static' from array.
* include/std/charconv (__cpp_lib_constexpr_charconv): Define.
(__to_chars, __to_chars_16): Remove 'static' from array, add
constexpr.
(__to_chars_10, __to_chars_8, __to_chars_2, __to_chars_i)
(to_chars, __raise_and_add, __from_chars_pow2_base)
(__from_chars_alnum, from_chars): Add constexpr.
(__from_chars_alnum_to_val): Avoid local static during constant
evaluation. Add constexpr.
* include/std/version (__cpp_lib_constexpr_charconv): Define.
* testsuite/20_util/from_chars/constexpr.cc: New test.
* testsuite/20_util/to_chars/constexpr.cc: New test.
* testsuite/20_util/to_chars/version.cc: New test.
---
 libstdc++-v3/include/bits/charconv.h  |   4 +-
 libstdc++-v3/include/std/charconv |  41 +++--
 libstdc++-v3/include/std/version  |   1 +
 .../testsuite/20_util/from_chars/constexpr.cc |  57 ++
 .../testsuite/20_util/to_chars/constexpr.cc   | 172 ++
 .../testsuite/20_util/to_chars/version.cc |  16 ++
 6 files changed, 275 insertions(+), 16 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/20_util/from_chars/constexpr.cc
 create mode 100644 libstdc++-v3/testsuite/20_util/to_chars/constexpr.cc
 create mode 100644 libstdc++-v3/testsuite/20_util/to_chars/version.cc

diff --git a/libstdc++-v3/include/bits/charconv.h 
b/libstdc++-v3/include/bits/charconv.h
index 4cae10a72f7..d04aab77624 100644
--- a/libstdc++-v3/include/bits/charconv.h
+++ b/libstdc++-v3/include/bits/charconv.h
@@ -68,13 +68,13 @@ namespace __detail
   // The caller is required to provide a buffer of exactly the right size
   // (which can be determined by the __to_chars_len function).
   template
-void
+_GLIBCXX23_CONSTEXPR void
 __to_chars_10_impl(char* __first, unsigned __len, _Tp __val) noexcept
 {
   static_assert(is_integral<_Tp>::value, "implementation bug");
   static_assert(is_unsigned<_Tp>::value, "implementation bug");
 
-  static constexpr char __digits[201] =
+  constexpr char __digits[201] =
"0001020304050607080910111213141516171819"
"2021222324252627282930313233343536373839"
"4041424344454647484950515253545556575859"
diff --git a/libstdc++-v3/include/std/charconv 
b/libstdc++-v3/include/std/charconv
index 64d0584a55d..4b6cc83a567 100644
--- a/libstdc++-v3/include/std/charconv
+++ b/libstdc++-v3/include/std/charconv
@@ -50,6 +50,10 @@
 # define __cpp_lib_to_chars 201611L
 #endif
 
+#if __cplusplus > 202002L
+# define __cpp_lib_constexpr_charconv 202202L
+#endif
+
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -119,7 +123,7 @@ namespace __detail
 
   // Generic implementation for arbitrary bases.
   template
-to_chars_result
+constexpr to_chars_result
 __to_chars(char* __first, char* __last, _Tp __val, int __base) noexcept
 {
   static_assert(is_integral<_Tp>::value, "implementation bug");
@@ -138,7 +142,7 @@ namespace __detail
 
   unsigned __pos = __len - 1;
 
-  static constexpr char __digits[] = {
+  constexpr char __digits[] = {
'0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't',
@@ -160,7 +164,7 @@ namespace __detail
 }
 
   template
-__integer_to_chars_result_type<_Tp>
+constexpr __integer_to_chars_result_type<_Tp>
 __to_chars_16(char* __first, char* __last, _Tp __val) noexcept
 {
   static_assert(is_integral<_Tp>::value, "implementation bug");
@@ -177,7 +181,7 @@ namespace __detail
  return __res;
}
 
-  static constexpr char __digits[] = {
+  constexpr char __digits[] = {
'0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
'a', 'b', 'c', 'd', 'e', 'f'
   };

[committed] libstdc++: Fix uses_allocator_construction args for cv pair (LWG 3677)

2022-10-15 Thread Jonathan Wakely via Gcc-patches
This issue is Tentatively Ready, and the resolution seems obviously
correct so we can do it now.

Tested powerpc64le-linux. Pushed to trunk.

-- >8 --

The _Std_pair concept uses in  handles const
qualified pairs, but not volatile qualified. That's because it just uses
__is_pair which is specialized for const pairs.

This removes the partial specialization __is_pair>, so
that __is_pair is now only true for cv-unqualified pairs. Then _Std_pair
needs to explicitly use remove_cv_t for the argument to __is_pair.

The other use of __is_pair is in map::insert(Pair&&) which doesn't want
to handle volatile so should just use remove_const_t.

libstdc++-v3/ChangeLog:

* include/bits/stl_map.h (map::insert(Pair&&)): Use
remove_const_t on argument to __is_pair.
* include/bits/stl_pair.h (__is_pair>): Remove
partial specialization.
* include/bits/uses_allocator_args.h (_Std_pair): Use
remove_cv_t as per LWG 3677.
* testsuite/20_util/uses_allocator/lwg3677.cc: New test.
---
 libstdc++-v3/include/bits/stl_map.h   |  2 +-
 libstdc++-v3/include/bits/stl_pair.h  |  3 --
 .../include/bits/uses_allocator_args.h|  2 +-
 .../20_util/uses_allocator/lwg3677.cc | 52 +++
 4 files changed, 54 insertions(+), 5 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/20_util/uses_allocator/lwg3677.cc

diff --git a/libstdc++-v3/include/bits/stl_map.h 
b/libstdc++-v3/include/bits/stl_map.h
index 9c2b0745673..83c579aaedc 100644
--- a/libstdc++-v3/include/bits/stl_map.h
+++ b/libstdc++-v3/include/bits/stl_map.h
@@ -847,7 +847,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
{
 #if __cplusplus >= 201703L
  using _P2 = remove_reference_t<_Pair>;
- if constexpr (__is_pair<_P2>)
+ if constexpr (__is_pair>)
if constexpr (is_same_v>)
  if constexpr (__usable_key)
{
diff --git a/libstdc++-v3/include/bits/stl_pair.h 
b/libstdc++-v3/include/bits/stl_pair.h
index d0f07b09d34..195167019b7 100644
--- a/libstdc++-v3/include/bits/stl_pair.h
+++ b/libstdc++-v3/include/bits/stl_pair.h
@@ -889,9 +889,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
 inline constexpr bool __is_pair> = true;
-
-  template
-inline constexpr bool __is_pair> = true;
 #endif
 
   /// @cond undocumented
diff --git a/libstdc++-v3/include/bits/uses_allocator_args.h 
b/libstdc++-v3/include/bits/uses_allocator_args.h
index ef5c4fffb70..77e48602aac 100644
--- a/libstdc++-v3/include/bits/uses_allocator_args.h
+++ b/libstdc++-v3/include/bits/uses_allocator_args.h
@@ -44,7 +44,7 @@ namespace std _GLIBCXX_VISIBILITY(default)
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
-concept _Std_pair = __is_pair<_Tp>;
+concept _Std_pair = __is_pair>;
 
 /** @addtogroup allocators
  *  @{
diff --git a/libstdc++-v3/testsuite/20_util/uses_allocator/lwg3677.cc 
b/libstdc++-v3/testsuite/20_util/uses_allocator/lwg3677.cc
new file mode 100644
index 000..649b98d3922
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/uses_allocator/lwg3677.cc
@@ -0,0 +1,52 @@
+// { dg-options "-std=gnu++23" }
+// { dg-do run { target c++20 } }
+
+#include 
+#include 
+
+struct UsesAlloc
+{
+  using allocator_type = std::allocator;
+
+  bool passed_alloc;
+
+  UsesAlloc(int) : passed_alloc(false) { }
+
+  UsesAlloc(int, std::allocator) : passed_alloc(true) { }
+};
+
+using Pair = std::pair;
+
+void
+test_const()
+{
+  std::allocator a;
+  int i = 0;
+  auto p = std::make_obj_using_allocator(a, i, i);
+  VERIFY( p.first.passed_alloc );
+}
+
+void
+test_volatile()
+{
+  std::allocator a;
+  int i = 0;
+  auto p = std::make_obj_using_allocator(a, i, i);
+  VERIFY( p.first.passed_alloc );
+}
+
+void
+test_const_volatile()
+{
+  std::allocator a;
+  int i = 0;
+  auto p = std::make_obj_using_allocator(a, i, i);
+  VERIFY( p.first.passed_alloc );
+}
+
+int main()
+{
+  test_const();
+  test_volatile();
+  test_const_volatile();
+}
-- 
2.37.3



[PATCH] improved const shifts for AVR targets

2022-10-15 Thread Georg Johann Lay

Hi,
recently I used some arduino uno for a project and realized some areas
which do not output optimal asm code. Especially around shifts and function
calls.
With this as motivation and hacktoberfest I started patching things.
Since patch files do not provide a good overview and I hope for a
"hacktoberfest-accepted" label on the PR on github I also opened it there:
https://github.com/gcc-mirror/gcc/pull/73

This patch improves shifts with const right hand operand. While 8bit and
16bit shifts where mostly fine 24bit and 32bit where not handled well.

Testing
I checked output with a local installation of compiler explorer in asm and
a tiny unit test comparing shifts with mul/div by 2.
I however did not write any testcases in gcc for it.


Hi, for such large changes, IMO it's a good idea to run the testsuite 
against the changes and make sure that there are no regressions.  Maybe 
even add new runtime tests in gcc.target/avr/torture to cover 
significant amount of the changes?


For example a test could go like:

__attribute__((__always_inline__))
static inline void shr (long x, int off)
{
long y = x >> off;
__asm ("" : "+r" (x));
if (x >> off != y)
__builtin_abort();
}

void test_shr (void)
{
long x = 0x76543215;
shr (x, 13);
shr (x, 14);
shr (x, 15);
shr (x, 16);
}

One shift is folded away by the compiler, and the other one has to be 
carried out.


However, the insn output also depends on available register classes like 
"ldi_ok" and whether a "d" class scratch is available, so it will be 
hard to achieve full coverage.  As it appears, testing for the lower 
registers can be forced by, where this won't work for AVR_TINY, of course:


static inline void shr (long x, int off)
{
long y = x >> off;
__asm ("" : "+l" (x));
x >>= off;
__asm ("" : "+l" (x));
if (x != y)
__builtin_abort();
}


Target
This patch is only targeting atmel avr family of chips.

Changelog
improved const shifts for AVR targets


You can have a look at existing ChangeLog files to see the format and style.



Patch
-
diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index 4ed390e4cf9..c7b70812d5c 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -6043,9 +6043,6 @@ out_shift_with_cnt (const char *templ, rtx_insn
*insn, rtx operands[],
   op[2] = operands[2];
   op[3] = operands[3];

-  if (plen)
-*plen = 0;
-


This looks wrong.  These functions are used in two different contexts:

One is computing the instructions lengths (in words) which is needed for 
jump offset computations for relative jumps that are crossing the insn. 
This is done for plen != NULL, and the length must be returned in *plen.


Second is actual output of the instruction sequence rest. return 
respective sting (depending on context), which must have a length no 
longer than computed.  This is performed if plen == NULL.


Not initializing *plen means that you get garbage for instruction 
lengths.  Runtime errors will occur but just not very frequently, e.g. 
if an instruction sequence is longer than anticipated, a jump target 
might be out of reach which results in a linker error.



   if (CONST_INT_P (operands[2]))
 {
   /* Operand 3 is a scratch register if this is a
@@ -6150,96 +6147,68 @@ out_shift_with_cnt (const char *templ, rtx_insn
*insn, rtx operands[],
 /* 8bit shift left ((char)x << i)   */

 const char *
-ashlqi3_out (rtx_insn *insn, rtx operands[], int *len)
+ashlqi3_out (rtx_insn *insn, rtx operands[], int *plen)
 {
   if (CONST_INT_P (operands[2]))
 {
-  int k;
-
-  if (!len)
- len = &k;
-
   switch (INTVAL (operands[2]))
  {
  default:
   if (INTVAL (operands[2]) < 8)
 break;

-  *len = 1;
-  return "clr %0";
-
- case 1:
-  *len = 1;
-  return "lsl %0";
-
- case 2:
-  *len = 2;
-  return ("lsl %0" CR_TAB
-  "lsl %0");
-
- case 3:
-  *len = 3;
-  return ("lsl %0" CR_TAB
-  "lsl %0" CR_TAB
-  "lsl %0");
+return avr_asm_len ("clr %0", operands, plen, 1);


I don't get it.  This prints *one* CLR instruction for all shift offsets 
1...3?




  case 4:
   if (test_hard_reg_class (LD_REGS, operands[0]))
 {
-  *len = 2;
-  return ("swap %0" CR_TAB
-  "andi %0,0xf0");
+return avr_asm_len ("swap %0" CR_TAB
+  "andi %0,0xf0", operands, plen, 2);


Glitch of coding-rules (GNU style it is), similar in many placed down 
the line which seem to have incorrect indentations.  It's not always 
easy to tell this just from looking at a patch, so better double-check 
your indentations.



 }
-  *len = 4;
-  return ("lsl %0" CR_TAB
+return avr_asm_len ("lsl %0" CR_TAB
   "lsl %0" CR_TAB
   "lsl %0" CR_TAB
-  "lsl %0");
+  "lsl %0", operands, plen, 4);

  case 5:
   if (test_hard_reg_class (LD_REGS, operands[0]))
 {
-  *len = 3;
-  return ("swap %0" CR_TAB
+return avr_asm_len ("swap %0" CR_TAB
   "lsl %0"  CR_TAB
-  "andi %0,0xe0");
+  "andi %0,0xe0", operands, plen, 

Re: libstdc++: Address '-Wunused-function' for 'print_raw' (was: [committed] libstdc++: Simplify print_raw function for debug assertions)

2022-10-15 Thread Jonathan Wakely via Gcc-patches
On Sat, 15 Oct 2022 at 11:52, Thomas Schwinge  wrote:
>
> Hi!
>
> On 2022-10-14T15:36:02+0100, Jonathan Wakely via Gcc-patches 
>  wrote:
> > Tested powerpc64le-linux. Pushed to trunk.
> >
> > -- >8 --
> >
> > Replace two uses of print_raw where it's clearer to just use fprintf
> > directly. Then the only remaining use of print_raw is as the print_func
> > argument of pretty_print.
>
> OK to push the attached
> "libstdc++: Address '-Wunused-function' for 'print_raw'",
> or should this be addressed differently?

Oh yes, I didn't notice it's only used within the conditional block,
because I only tested with stacktrace enabled.

I think it would be a little better to move print_raw down to where
it's actually needed:

--- a/libstdc++-v3/src/c++11/debug.cc
+++ b/libstdc++-v3/src/c++11/debug.cc
@@ -609,14 +609,6 @@ namespace
print_literal(PrintContext& ctx, const char(&word)[Length])
{ print_word(ctx, word, Length - 1); }

-  void
-  print_raw(PrintContext& ctx, const char* str, ptrdiff_t nbc)
-  {
-if (nbc == -1)
-  nbc = INT_MAX;
-ctx._M_column += fprintf(stderr, "%.*s", (int)nbc, str);
-  }
-
  void
  print_word(PrintContext& ctx, const char* word, ptrdiff_t nbc = -1)
  {
@@ -1092,6 +1084,14 @@ namespace
  { print_string(ctx, str, nbc, nullptr, 0); }

#if _GLIBCXX_HAVE_STACKTRACE
+  void
+  print_raw(PrintContext& ctx, const char* str, ptrdiff_t nbc)
+  {
+if (nbc == -1)
+  nbc = INT_MAX;
+ctx._M_column += fprintf(stderr, "%.*s", (int)nbc, str);
+  }
+
  int
  print_backtrace(void* data, __UINTPTR_TYPE__ pc, const char* filename,
 int lineno, const char* function)


I'll push that later today, or feel free to do it yourself if you want
the warning to go away :-)



Re: [PATCH] improved const shifts for AVR targets

2022-10-15 Thread A. Binzberger via Gcc-patches

On 12.10.22 19:57, Jeff Law wrote:


On 10/4/22 11:06, Alexander Binzberger via Gcc-patches wrote:

Hi,
recently I used some arduino uno for a project and realized some areas
which do not output optimal asm code. Especially around shifts and 
function

calls.
With this as motivation and hacktoberfest I started patching things.
Since patch files do not provide a good overview and I hope for a
"hacktoberfest-accepted" label on the PR on github I also opened it 
there:

https://github.com/gcc-mirror/gcc/pull/73

This patch improves shifts with const right hand operand. While 8bit and
16bit shifts where mostly fine 24bit and 32bit where not handled well.

Testing
I checked output with a local installation of compiler explorer in 
asm and

a tiny unit test comparing shifts with mul/div by 2.
I however did not write any testcases in gcc for it.

Target
This patch is only targeting atmel avr family of chips.

Changelog
improved const shifts for AVR targets


It would be helpful if you could show the before/after code for the 
cases you're changing.  Extra credit if you include cycles & size 
information for those cases.  That would help someone like me who 
knows GCC well, but isn't particularly well versed in the AVR target 
evaluate the overarching goal of the patch (ie, better code).


about the avr family targets:

* consider every branch instruction = 1/2 cycles

* consider every 2byte/word instruction (besides move word if available) 
= 2 cycles


* consider multiplication (if available) = 2 cycles

* consider every load (beside load immediate "ldi" 1cylce) = 2cycles (+1 
for prog mem)


* pop and jump mostly = 2 cycles

* call is basically = 2-4 cycles

* ret is about =  4/5 cycles

* consider every instruction (bit/bit-test, most compare, arithmetic, 
logic, some other) = 1 cycle


* division does not exist

or as a summary for this patch: branches and such are 2 cycles the rest 
is 1 cycle


note that shifts are 1bit per cycle and the instructions are at least 
mostly byte based.


also note that operations using immediate do only work with the upper 
half of registers.



a description for the code before my change and what changed:

* shifts on 8bit (beside arithmetic shifts right) were optimized and 
always unrolled (only aligned with the rest of the code without actual 
change)


* arithmetic shift 8bit and 16bit shifts were mostly optimized and 
mostly unrolled - depending on registers and Os (I added the missing 
cases there)


* 24bit and 32bit shifts were basically not optimized at all and never 
unrolled (I added those cases and aligned the optimizer logic with the 
others. They also reuse the other shift code since they may reduce to 
those cases after a move for bigger shifts.)


* out_shift_with_cnt provides a fallback implementation as a loop over 
shifts which may get unrolled. in case of Os to about inner_len + 3,4 or 
5 and in other cases of optimizer e.g. O2 it gets unrolled if size is 
smaller 10. see max_len (basically unchanged)


* did not touch non const cases in this patch but may in a future patch 
for O2 and O3


note that in case of Os the smaller code is picked which is the loop at 
least in some cases but other optimizer cases profit a lot.


also note that it is debatable if Os needs to be that strict with size 
since the compute overhead of the loop is high with 5 per loop 
iteration/cycle- so per bit shift. A lot more cases could be covered 
with +1 or +2 more instructions.



about plen:

If plen is NULL the asm code gets returned.

If plen is a pointer the code does count the instruction count which I 
guess is used (or could be used) as a rough estimate of cycles as well 
as byte code size.


Some of the functions named this len. The 24bit functions mainly named 
this plen and used it like it is now in all functions. This is mostly a 
readability improvement.


I am not sure how this works together with the optimizer or the rest.

To my understanding however the functions may get called once by the 
optimizer with a length given, then to output code and potentially again 
with a len given over avr_adjust_length to return the size.


I may be wrong about this part but as far as I can tell I did not change 
the way it operates.



size and cycles summary:

The asm instruction count is used as size and cycle estimate. This gets 
close to the real deal for the shifts since the instructions are all 1 
cylce anyway and similar in byte code size.


8bit gets always optimized and unrolled to get max performance and less 
code size (beside shift of 6 with lower half registers used which is the 
worst case with +1 instruction).


16bit, 24bit and 32bit gets unrolled depending on optimizer setting - 
and registers used (see out_shift_with_cnt:max_len). So 16bit gets 
mostly optimized and unrolled in Os (see comments for plen/max_len) and 
always in O2 and such (max_len=10). Shift optimization and unroll for 
24bit and 32bit is mostly only relevant when not optimizing for size.


I

libstdc++: Address '-Wunused-function' for 'print_raw' (was: [committed] libstdc++: Simplify print_raw function for debug assertions)

2022-10-15 Thread Thomas Schwinge
Hi!

On 2022-10-14T15:36:02+0100, Jonathan Wakely via Gcc-patches 
 wrote:
> Tested powerpc64le-linux. Pushed to trunk.
>
> -- >8 --
>
> Replace two uses of print_raw where it's clearer to just use fprintf
> directly. Then the only remaining use of print_raw is as the print_func
> argument of pretty_print.

OK to push the attached
"libstdc++: Address '-Wunused-function' for 'print_raw'",
or should this be addressed differently?


Grüße
 Thomas


> When called by pretty_print the count is
> either a positive integer or -1, so we can simplify print_raw itself.
>
> Remove the default argument, because it's never used. Remove the check
> for nbc == 0, which never happens (but would be harmless if it did).
> Replace the conditional expression with a single call to fprintf, using
> INT_MAX as the maximum length.
>
> libstdc++-v3/ChangeLog:
>
>   * src/c++11/debug.cc (print_raw): Simplify.
>   (print_word): Print indentation by calling fprintf directly.
>   (_Error_formatter::_M_error): Print unindented string by calling
>   fprintf directly.
> ---
>  libstdc++-v3/src/c++11/debug.cc | 21 -
>  1 file changed, 8 insertions(+), 13 deletions(-)
>
> diff --git a/libstdc++-v3/src/c++11/debug.cc b/libstdc++-v3/src/c++11/debug.cc
> index abc4124c01e..f2b25fbefce 100644
> --- a/libstdc++-v3/src/c++11/debug.cc
> +++ b/libstdc++-v3/src/c++11/debug.cc
> @@ -37,6 +37,7 @@
>  #include// for std::abort
>  #include // for std::isspace.
>  #include// for std::strstr.
> +#include// for INT_MAX
>
>  #include  // for std::min.
>
> @@ -609,14 +610,11 @@ namespace
>  { print_word(ctx, word, Length - 1); }
>
>void
> -  print_raw(PrintContext& ctx, const char* str, ptrdiff_t nbc = -1)
> +  print_raw(PrintContext& ctx, const char* str, ptrdiff_t nbc)
>{
> -if (nbc != 0)
> -  {
> - ctx._M_column += (nbc > 0)
> -   ? fprintf(stderr, "%.*s", (int)nbc, str)
> -   : fprintf(stderr, "%s", str);
> -  }
> +if (nbc == -1)
> +  nbc = INT_MAX;
> +ctx._M_column += fprintf(stderr, "%.*s", (int)nbc, str);
>}
>
>void
> @@ -645,12 +643,9 @@ namespace
>   || (ctx._M_column + visual_length < ctx._M_max_length)
>   || (visual_length >= ctx._M_max_length && ctx._M_column == 1))
>{
> - // If this isn't the first line, indent
> + // If this isn't the first line, indent.
>   if (ctx._M_column == 1 && !ctx._M_first_line)
> -   {
> - const char spacing[PrintContext::_S_indent + 1] = "";
> - print_raw(ctx, spacing, PrintContext::_S_indent);
> -   }
> +   ctx._M_column += fprintf(stderr, "%*c", PrintContext::_S_indent, ' ');
>
>   int written = fprintf(stderr, "%.*s", (int)length, word);
>
> @@ -1166,7 +1161,7 @@ namespace __gnu_debug
>  PrintContext ctx;
>  if (_M_file)
>{
> - print_raw(ctx, _M_file);
> + ctx._M_column += fprintf(stderr, "%s", _M_file);
>   print_literal(ctx, ":");
>   go_to_next_line = true;
>}


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From 9ba7d0e6026ef4c3d095b0a57f9b88a87df403ea Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Sat, 15 Oct 2022 12:15:58 +0200
Subject: [PATCH] libstdc++: Address '-Wunused-function' for 'print_raw'
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Per recent
commit r13-3303-gcf0b7e9787c3686c47219a725f2cbcaa19faaaca
"libstdc++: Simplify print_raw function for debug assertions":

> Replace two uses of print_raw where it's clearer to just use fprintf
> directly. Then the only remaining use of print_raw is as the print_func
> argument of pretty_print.

But that one is only compiled '#if _GLIBCXX_HAVE_STACKTRACE'.
In a '--enable-werror' build I've thus run into:

[...]/source-gcc/libstdc++-v3/src/c++11/debug.cc:613:3: error: ‘void {anonymous}::print_raw(PrintContext&, const char*, ptrdiff_t)’ defined but not used [-Werror=unused-function]
  613 |   print_raw(PrintContext& ctx, const char* str, ptrdiff_t nbc)
  |   ^
cc1plus: all warnings being treated as errors

	libstdc++-v3/
	* src/c++11/debug.cc (print_raw): Only '#if _GLIBCXX_HAVE_STACKTRACE'.
---
 libstdc++-v3/src/c++11/debug.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libstdc++-v3/src/c++11/debug.cc b/libstdc++-v3/src/c++11/debug.cc
index f2b25fbefce0..f62a1f755cf1 100644
--- a/libstdc++-v3/src/c++11/debug.cc
+++ b/libstdc++-v3/src/c++11/debug.cc
@@ -609,6 +609,7 @@ namespace
 print_literal(PrintContext& ctx, const char(&word)[Length])
 { print_word(ctx, word, Length - 1); }
 
+#if _GLIBCXX_HAVE_STACKTRACE
   void
   print_raw(PrintContext& ctx, const char* str, ptrdiff_t nbc)
   {
@@ -616,6 +617,7 @@ namespace
   nbc = INT_MAX;
 ctx

Re: [PATCH] middle-end IFN_ASSUME support [PR106654]

2022-10-15 Thread Jakub Jelinek via Gcc-patches
On Sat, Oct 15, 2022 at 10:07:46AM +0200, Martin Uecker wrote:
> But why? Do we really want to encourage people to
> write such code?

Of course these ++ cases inside of expressions are just obfuscation.
But the point is to support using predicates that can be inlined and
simplified into something really simple the optimizers can understand.
The paper shows as useful e.g. being able to assert something is finite:
[[assume (std::isfinite (x)]];
and with the recent changes on the GCC side it is now or shortly will be
possible to take advantage of such predicates.
It is true that
[[assume (__builtin_isfinite (x)]];
could work if we check TREE_SIDE_EFFECTS on the GCC side because
it is a const function, but that is a GNU extension, so the standard
can't count with that.  std::isfinite isn't even marked const in libstdc++
and one can figure that out during IPA propagation only.
There are many similar predicates, or user could have some that are useful
to his program.  And either in the end it wouldn't have side-effects
but the compiler doesn't know, or would but those side-effects would be
unimportant to the optimizations the compiler can derive from those.

As the spec defines it well what happens with the side-effects and it
is an attribute, not a function and the languages have non-evaluated
contexts in other places, I don't see where a user confusion could come.
We don't warn for sizeof (i++) and similar either.

__builtin_assume (i++) is a bad choice because it looks like a function
call (after all, the compilers have many similar builtins) and its argument
looks like normal argument to the function, so it is certainly unexpected
that the side-effects aren't evaluated.

Jakub



Re: [PATCH] middle-end IFN_ASSUME support [PR106654]

2022-10-15 Thread Martin Uecker via Gcc-patches
Am Freitag, den 14.10.2022, 23:20 +0200 schrieb Jakub Jelinek:
> On Fri, Oct 14, 2022 at 10:43:16PM +0200, Martin Uecker wrote:
> > Am Montag, den 10.10.2022, 10:54 +0200 schrieb Jakub Jelinek:
> > > Hi!
> > > 
> > > My earlier patches gimplify the simplest non-side-effects assumptions
> > > into if (cond) ; else __builtin_unreachable (); and throw the rest
> > > on the floor.
> > > The following patch attempts to do something with the rest too.
> > 
> > My recommendation would be to only process side-effect-free
> > assumptions and warn about the rest (clang does this for
> > __builtin_assume).   I do not think this is worth the
> 
> I think that is bad choice and makes it useless.

I think

[[assume(10 == i + 1)]]

could be useful as it is nicer syntax than

if (10 != i + 1)
  unreachable();

 but

[[assume(10 == i++)]]

is confusing / untestable and therefor seems a bit
dangerous. 

But we do not have to agree, I just stating my opinion
here. I would personally then suggest to have an
option for warning about side effects in assume.

> > complexity and I am not so sure the semantics of a
> > hypothetical evaluation are terribly well defined.
> 
> I think the C++23 paper is quite clear.  Yes, you can't verify it
> in debug mode, but there are many other UBs that are hard to verify
> through runtime instrumentation.

And none are good. Some are very useful though
(or even required in the context of C/C++). But I think
there should be a very good reason for adding more.

Personally, I do not see [[assume]] how side effects
is useful enough to justify adding another kind of
untestable UB.

Especially also because it has somewhat vaguely 
defined semantics. I don't know any formalization the
proposed semantics and the normative wording
"the converted expression would evaluate to true"
is probably good starting point for a PhD thesis.


> And, OpenMP has a similar feature (though, in that case it is even
> a stronger guarantee where something is guaranteed to hold across
> a whole region rather than just on its entry.
>
> > That you can not verify this properly by turning it
> > into traps in debug mode (as this would execute the
> > side effects) also makes this a terrible feature IMHO.
> > 
> > MSVC which this feature was based does not seem to make
> > much to sense to me: https://godbolt.org/z/4Ebar3G9b
> 
> So maybe their feature is different from what is in C++23,
> or is badly implemented?

The paper says as the first sentence in the abstract:

"We propose a standard facility providing the semantics
of existing compiler built-ins such as
__builtin_assume (Clang) and __assume (MSVC, ICC)."

But Clang does not support side effects and MSVC
is broken.  But yes, the paper then describes these
extended semantics for expression with side effects. 
According to the author this was based on discussions
with MSVC developers, but - to me - the fact that MSVC
still implements something else which does not seem
to make sense speaks against this feature.


> I think with what we have in the works for GCC we'll be able to optimize
> in
> int f(int i)
> {
>   [[assume(1 == i++)]];
>   return (1 == i++);
> }
> 
> int g(int i)
> {
>   [[assume(1 == ++i)]];
>   return (1 == ++i);
> }
> 
> extern int i;
> 
> int h(void) 
> {
>   [[assume(1 == ++i)]];
>   return (1 == ++i);
> }
> 
> 
> int k(int i)
> {
>   [[assume(42 == ++i)]];
>   return i;
> }
> at least f/g to return 1; and k to return 41;
> The h case might take a while to take advantage of.

But why? Do we really want to encourage people to
write such code?


Martin