date:20240313

[PATCH] i386[stv]: Handle REG_EH_REGION note

2024-03-13 Thread liuhongt

When we split
(insn 37 36 38 10 (set (reg:DI 104 [ _18 ])
(mem:DI (reg/f:SI 98 [ CallNative_nclosure.0_1 ]) [6 MEM[(struct 
SQRefCounted *)CallNative_nclosure.0_1]._uiRef+0 S8 A32])) "test.C":22:42 84 
{*movdi_internal}
 (expr_list:REG_EH_REGION (const_int -11 [0xfff5])

into

(insn 104 36 37 10 (set (subreg:V2DI (reg:DI 124) 0)
(vec_concat:V2DI (mem:DI (reg/f:SI 98 [ CallNative_nclosure.0_1 ]) [6 
MEM[(struct SQRefCounted *)CallNative_nclosure.0_1]._uiRef+0 S8 A32])
(const_int 0 [0]))) "test.C":22:42 -1
(nil)))
(insn 37 104 105 10 (set (subreg:V2DI (reg:DI 104 [ _18 ]) 0)
(subreg:V2DI (reg:DI 124) 0)) "test.C":22:42 2024 {movv2di_internal}
 (expr_list:REG_EH_REGION (const_int -11 [0xfff5])
(nil)))

we must copy the REG_EH_REGION note to the first insn and split the block
after the newly added insn.  The REG_EH_REGION on the second insn will be
removed later since it no longer traps.

Currently we only handle memory_operand.  Are there any other insns that
need to be handled?

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} for trunk and 
gcc-13/gcc-12 release branch.
Ok for trunk and backport?

gcc/ChangeLog:

* config/i386/i386-features.cc
(general_scalar_chain::convert_op): Handle REG_EH_REGION note.
(convert_scalars_to_vector): Ditto.
* config/i386/i386-features.h (class scalar_chain): New
member control_flow_insns.

gcc/testsuite/ChangeLog:

* g++.target/i386/pr111822.C: New test.
---
 gcc/config/i386/i386-features.cc | 48 ++--
 gcc/config/i386/i386-features.h  |  1 +
 gcc/testsuite/g++.target/i386/pr111822.C | 45 ++
 3 files changed, 90 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/pr111822.C

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index 1de2a07ed75..2ed27a9ebdd 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -998,20 +998,36 @@ general_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
 }
   else if (MEM_P (*op))
 {
+  rtx_insn* eh_insn, *movabs = NULL;
   rtx tmp = gen_reg_rtx (GET_MODE (*op));
 
   /* Handle movabs.  */
   if (!memory_operand (*op, GET_MODE (*op)))
{
  rtx tmp2 = gen_reg_rtx (GET_MODE (*op));
+ movabs = emit_insn_before (gen_rtx_SET (tmp2, *op), insn);
 
- emit_insn_before (gen_rtx_SET (tmp2, *op), insn);
  *op = tmp2;
}
 
-  emit_insn_before (gen_rtx_SET (gen_rtx_SUBREG (vmode, tmp, 0),
-gen_gpr_to_xmm_move_src (vmode, *op)),
-   insn);
+  eh_insn
+   = emit_insn_before (gen_rtx_SET (gen_rtx_SUBREG (vmode, tmp, 0),
+gen_gpr_to_xmm_move_src (vmode, *op)),
+   insn);
+
+  if (cfun->can_throw_non_call_exceptions)
+   {
+ /* Handle REG_EH_REGION note.  */
+ rtx note = find_reg_note (insn, REG_EH_REGION, NULL_RTX);
+ if (note)
+   {
+ if (movabs)
+   eh_insn = movabs;
+ control_flow_insns.safe_push (eh_insn);
+ add_reg_note (eh_insn, REG_EH_REGION, XEXP (note, 0));
+   }
+   }
+
   *op = gen_rtx_SUBREG (vmode, tmp, 0);
 
   if (dump_file)
@@ -2494,6 +2510,7 @@ convert_scalars_to_vector (bool timode_p)
 {
   basic_block bb;
   int converted_insns = 0;
+  auto_vec<rtx_insn *> control_flow_insns;
 
   bitmap_obstack_initialize (NULL);
   const machine_mode cand_mode[3] = { SImode, DImode, TImode };
@@ -2575,6 +2592,11 @@ convert_scalars_to_vector (bool timode_p)
 chain->chain_id);
}
 
+ rtx_insn* iter_insn;
+ unsigned int ii;
+ FOR_EACH_VEC_ELT (chain->control_flow_insns, ii, iter_insn)
+   control_flow_insns.safe_push (iter_insn);
+
  delete chain;
}
 }
@@ -2643,6 +2665,24 @@ convert_scalars_to_vector (bool timode_p)
  DECL_INCOMING_RTL (parm) = gen_rtx_SUBREG (TImode, r, 0);
  }
  }
+
+  if (!control_flow_insns.is_empty ())
+   {
+ free_dominance_info (CDI_DOMINATORS);
+
+ unsigned int i;
+ rtx_insn* insn;
+ FOR_EACH_VEC_ELT (control_flow_insns, i, insn)
+   if (control_flow_insn_p (insn))
+ {
+   /* Split the block after insn.  There will be a fallthru
+  edge, which is OK so we keep it.  We have to create
+  the exception edges ourselves.  */
+   bb = BLOCK_FOR_INSN (insn);
+   split_block (bb, insn);
+   rtl_make_eh_edge (NULL, bb, BB_END (bb));
+ }
+   }
 }
 
   return 0;
diff --git a/gcc/config/i386/i386-features.h b/gcc/config/i386/i386-features.h
index 8bab2d8666d..b259cf679af 100644
---

[PATCH v2] LoongArch: Remove masking process for operand 3 of xvpermi.q.

2024-03-13 Thread Chenghui Pan

The behavior of the xvpermi.q instruction is undefined on LoongArch when
unused bits in its third operand are non-zero.  Following our
discussion (https://github.com/llvm/llvm-project/pull/83540),
we think it is better to leave the original insn operand
unmodified.

This patch partially reverts 7b158e036a95b1ab40793dd53bed7dbd770ffdaf.
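
For illustration only, here is a small C++ sketch of what the removed
masking did to the immediates used in the testsuite.  The names kOldMask
and old_masked_imm are hypothetical and do not appear in GCC; the 0x33
mask is taken from the lines deleted from lasx.md below.

```cpp
#include <cassert>
#include <cstdint>

// Mask applied to operand 3 by the code this patch removes (0x33).
constexpr std::uint8_t kOldMask = 0x33;

// Hypothetical helper: the immediate the old output code would have
// emitted for a given user-supplied immediate.
constexpr std::uint8_t old_masked_imm(std::uint8_t imm) {
  return static_cast<std::uint8_t>(imm & kOldMask);
}
```

Under this sketch the old test immediates 0x2a, 0xb9 and 0xca map to
0x22, 0x31 and 0x02, which are the values the updated test passes
directly.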

gcc/ChangeLog:

* config/loongarch/lasx.md (lasx_xvpermi_q_):
Remove masking of operand 3.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-xvpermi_q.c:
Adjust operand 3's value to fall within the instruction's accepted range.
---
 gcc/config/loongarch/lasx.md| 5 -
 .../gcc.target/loongarch/vector/lasx/lasx-xvpermi_q.c   | 6 +++---
 2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index ac84db7f0ce..3f25c0c1756 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -640,8 +640,6 @@ (define_insn "lasx_xvpermi_d__1"
(set_attr "mode" "")])
 
 ;; xvpermi.q
-;; Unused bits in operands[3] need be set to 0 to avoid
-;; causing undefined behavior on LA464.
 (define_insn "lasx_xvpermi_q_"
   [(set (match_operand:LASX 0 "register_operand" "=f")
(unspec:LASX
@@ -651,9 +649,6 @@ (define_insn "lasx_xvpermi_q_"
  UNSPEC_LASX_XVPERMI_Q))]
   "ISA_HAS_LASX"
 {
-  int mask = 0x33;
-  mask &= INTVAL (operands[3]);
-  operands[3] = GEN_INT (mask);
   return "xvpermi.q\t%u0,%u2,%3";
 }
   [(set_attr "type" "simd_splat")
diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvpermi_q.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvpermi_q.c
index dbc29d2fb22..f89dfc31120 100644
--- a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvpermi_q.c
+++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvpermi_q.c
@@ -27,7 +27,7 @@ main ()
   *((unsigned long*)& __m256i_result[2]) = 0x7fff7fff7fff;
   *((unsigned long*)& __m256i_result[1]) = 0x7fe37fe3001d001d;
   *((unsigned long*)& __m256i_result[0]) = 0x7fff7fff7fff;
-  __m256i_out = __lasx_xvpermi_q (__m256i_op0, __m256i_op1, 0x2a);
+  __m256i_out = __lasx_xvpermi_q (__m256i_op0, __m256i_op1, 0x22);
   ASSERTEQ_64 (__LINE__, __m256i_result, __m256i_out);
 
   *((unsigned long*)& __m256i_op0[3]) = 0x;
@@ -42,7 +42,7 @@ main ()
   *((unsigned long*)& __m256i_result[2]) = 0x0019001c;
   *((unsigned long*)& __m256i_result[1]) = 0x;
   *((unsigned long*)& __m256i_result[0]) = 0x01fe;
-  __m256i_out = __lasx_xvpermi_q (__m256i_op0, __m256i_op1, 0xb9);
+  __m256i_out = __lasx_xvpermi_q (__m256i_op0, __m256i_op1, 0x31);
   ASSERTEQ_64 (__LINE__, __m256i_result, __m256i_out);
 
   *((unsigned long*)& __m256i_op0[3]) = 0x00ff00ff00ff00ff;
@@ -57,7 +57,7 @@ main ()
   *((unsigned long*)& __m256i_result[2]) = 0x;
   *((unsigned long*)& __m256i_result[1]) = 0x00ff00ff00ff00ff;
   *((unsigned long*)& __m256i_result[0]) = 0x00ff00ff00ff00ff;
-  __m256i_out = __lasx_xvpermi_q (__m256i_op0, __m256i_op1, 0xca);
+  __m256i_out = __lasx_xvpermi_q (__m256i_op0, __m256i_op1, 0x02);
   ASSERTEQ_64 (__LINE__, __m256i_result, __m256i_out);
 
   return 0;
-- 
2.39.3

[committed] libstdc++: Move test error_category to global scope

2024-03-13 Thread Jonathan Wakely

Tested with GDB 14.1 on x86_64-linux. I'll backport this too.

-- >8 --

A recent GDB change causes this test to fail due to missing RTTI for the
custom_cat type. This is presumably because the custom_cat type was
defined as a local class, so has no linkage. Moving it to namespace scope
seems to fix the test regressions, and probably makes the test more
realistic, as a local class with no linkage isn't practical to use as an
error category that almost certainly needs to be referred to in other
scopes.
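
As a rough sketch of the more realistic usage described above, a
category defined at namespace scope can be shared through an accessor
function.  The custom_cat definition mirrors the one in the test;
miaow_category and make_miaow are hypothetical names used only for this
illustration.

```cpp
#include <cassert>
#include <string>
#include <system_error>

// Namespace-scope category, so it has linkage and can be named from
// other scopes.
struct custom_cat : std::error_category {
  const char* name() const noexcept override { return "miaow"; }
  std::string message(int) const override { return ""; }
};

// Hypothetical accessor returning the single category instance.
inline const std::error_category& miaow_category() {
  static custom_cat cat;
  return cat;
}

// Hypothetical factory building an error_code in that category.
inline std::error_code make_miaow(int value) {
  return std::error_code(value, miaow_category());
}
```

Error categories are compared by address, so every error_code built
this way refers to the same category object.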

libstdc++-v3/ChangeLog:

* testsuite/libstdc++-prettyprinters/cxx11.cc: Move custom_cat
to namespace scope.
---
 .../testsuite/libstdc++-prettyprinters/cxx11.cc| 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/testsuite/libstdc++-prettyprinters/cxx11.cc b/libstdc++-v3/testsuite/libstdc++-prettyprinters/cxx11.cc
index f867ea18306..2f75d12703c 100644
--- a/libstdc++-v3/testsuite/libstdc++-prettyprinters/cxx11.cc
+++ b/libstdc++-v3/testsuite/libstdc++-prettyprinters/cxx11.cc
@@ -63,6 +63,11 @@ struct datum
 
 std::unique_ptr<datum> global;
 
+struct custom_cat : std::error_category {
+  const char* name() const noexcept { return "miaow"; }
+  std::string message(int) const { return ""; }
+};
+
 int
 main()
 {
@@ -179,10 +184,7 @@ main()
   std::error_condition ecinval = std::make_error_condition(std::errc::invalid_argument);
   // { dg-final { note-test ecinval {std::error_condition = {"generic": EINVAL}} } }
 
-  struct custom_cat : std::error_category {
-const char* name() const noexcept { return "miaow"; }
-std::string message(int) const { return ""; }
-  } cat;
+  custom_cat cat;
   std::error_code emiaow(42, cat);
   // { dg-final { note-test emiaow {std::error_code = {custom_cat: 42}} } }
   std::error_condition ecmiaow(42, cat);
-- 
2.44.0

[committed] libstdc++: Improve documentation on debugging with libstdc++

2024-03-13 Thread Jonathan Wakely

Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* doc/xml/manual/debug.xml: Improve docs on debug builds and
using ASan. Mention _GLIBCXX_ASSERTIONS. Reorder sections to put
the most relevant ones first.
* doc/xml/manual/using.xml: Add comma.
* doc/html/*: Regenerate.
---
 libstdc++-v3/doc/html/index.html  |  2 +-
 libstdc++-v3/doc/html/manual/debug.html   | 75 +--
 .../doc/html/manual/ext_compile_checks.html   | 18 ++--
 libstdc++-v3/doc/html/manual/index.html   |  2 +-
 libstdc++-v3/doc/html/manual/intro.html   |  2 +-
 libstdc++-v3/doc/html/manual/using.html   |  2 +-
 .../doc/html/manual/using_macros.html |  2 +-
 libstdc++-v3/doc/xml/manual/debug.xml | 95 +++
 libstdc++-v3/doc/xml/manual/using.xml |  2 +-
 9 files changed, 120 insertions(+), 80 deletions(-)

diff --git a/libstdc++-v3/doc/xml/manual/debug.xml b/libstdc++-v3/doc/xml/manual/debug.xml
index 7f6d0876fc6..23dbae5e521 100644
--- a/libstdc++-v3/doc/xml/manual/debug.xml
+++ b/libstdc++-v3/doc/xml/manual/debug.xml
@@ -30,7 +30,7 @@
 flags can be varied to change debugging characteristics. For
 instance, turning off all optimization via the -g -O0
 -fno-inline flags will disable inlining and optimizations,
-and add debugging information, so that stepping through all functions,
+and include debugging information, so that stepping through all functions,
 (including inlined constructors and destructors) is possible. In
 addition, -fno-eliminate-unused-debug-types can be
 used when additional debug information, such as nested class info,
@@ -55,41 +55,30 @@
 
 
 
-Debug Versions of Library Binary 
Files
-
+Debug Mode
 
 
-  If you would like debug symbols in libstdc++, there are two ways to
-  build libstdc++ with debug flags. The first is to create a separate
-  debug build by running make from the top-level of a tree
-  freshly-configured with
-
-
- --enable-libstdcxx-debug
-
-and perhaps
-
- --enable-libstdcxx-debug-flags='...'
-
-
-  Both the normal build and the debug build will persist, without
-  having to specify CXXFLAGS, and the debug library will
-  be installed in a separate directory tree, in 
(prefix)/lib/debug.
-  For more information, look at the
-  configuration section.
+  The Debug Mode
+  has compile and run-time checks for many containers.
 
 
 
-  A second approach is to use the configuration flags
+  There are also lightweight assertions for checking function preconditions,
+  such as checking for out-of-bounds indices when accessing a
+  std::vector. These can be enabled without using
+  the full Debug Mode, by using -D_GLIBCXX_ASSERTIONS
+  (see ).
 
-
- make CXXFLAGS='-g3 -fno-inline -O0' all
-
+
+
+
+Tracking uncaught 
exceptions
 
 
-  This quick and dirty approach is often sufficient for quick
-  debugging tasks, when you cannot or don't want to recompile your
-  application to use the debug 
mode.
+  The verbose
+  termination handler gives information about uncaught
+  exceptions which kill the program.
+
 
 
 Memory Leak Hunting
@@ -99,6 +88,13 @@
   which is enabled by the -fsanitize=address option.
 
 
+
+  The std::vector implementation has additional
+  instrumentation to work with AddressSanitizer, but this has to be enabled
+  explicitly by using -D_GLIBCXX_SANITIZE_VECTOR
+  (see ).
+
+
 
   There are also various third party memory tracing and debug utilities
   that can be used to provide detailed memory allocation information
@@ -331,21 +327,44 @@
 
 
 
-Tracking uncaught 
exceptions
+Debug Versions of Library Binary 
Files
 
 
-  The verbose
-  termination handler gives information about uncaught
-  exceptions which kill the program.
+  As described above, libstdc++ is built with debug symbols enabled by default,
+  but because it's also built with optimizations the code can be hard to
+  follow when stepping into the library in a debugger.
 
+
+
+  If you would like to debug libstdc++.so itself,
+  there are two ways to build an unoptimized libstdc++ with debug flags.
+  The first is to create a separate debug build by running make from the
+  top-level of a tree freshly-configured with
+
+
+ --enable-libstdcxx-debug
+
+and perhaps
+
+ --enable-libstdcxx-debug-flags='...'
+
+
+  Both the normal build and the debug build will persist, without
+  having to specify CXXFLAGS, and the debug library will
+  be installed in a separate directory tree, in 
(prefix)/lib/debug.
+  For more information, look at the
+  configuration section.
+
+
+
+  A second approach is to use the configuration flags
+
+
+ make CXXFLAGS='-g3 -fno-inline -O0' all
+
+
 
 
-Debug Mode
-
-   The Debug Mode
-  has compile and run-time checks for many containers.
-  
-
 
 Compile Time 
Checking
 
diff --git a/libstdc++-v3/doc/xml/manual/using.xml b/libstdc++-v3/doc/xml/manual/using.xml
index b3b0c368e44..8ac7e74034c 100644
---

Re: [PATCH] libstdc++: Document that _GLIBCXX_CONCEPT_CHECKS might be removed in future

2024-03-13 Thread Jonathan Wakely

On Thu, 7 Mar 2024 at 12:07, Jonathan Wakely wrote:
>
> Any objection to this update to make the docs reflect reality?

Pushed to trunk now.


>
> -- >8 --
>
> The macro-based concept checks are unmaintained and do not support C++11
> or later, so reject valid code. If nobody plans to update them we should
> consider removing them. Alternatively, we could ignore the macro for
> C++11 and later, so they have no effect and don't reject valid code.
>
> libstdc++-v3/ChangeLog:
>
> * doc/xml/manual/debug.xml: Document that concept checking might
> be removed in future.
> * doc/xml/manual/extensions.xml: Likewise.
> ---
>  libstdc++-v3/doc/xml/manual/debug.xml  |  2 ++
>  libstdc++-v3/doc/xml/manual/extensions.xml | 18 --
>  2 files changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/libstdc++-v3/doc/xml/manual/debug.xml 
> b/libstdc++-v3/doc/xml/manual/debug.xml
> index 42d4d32aa29..7f6d0876fc6 100644
> --- a/libstdc++-v3/doc/xml/manual/debug.xml
> +++ b/libstdc++-v3/doc/xml/manual/debug.xml
> @@ -351,6 +351,8 @@
>
> The Compile-Time
>Checks extension has compile-time checks for many algorithms.
> +  These checks were designed for C++98 and have not been updated to work
> +  with C++11 and later standards. They might be removed at a future date.
>
>  
>
> diff --git a/libstdc++-v3/doc/xml/manual/extensions.xml 
> b/libstdc++-v3/doc/xml/manual/extensions.xml
> index d4fe2f509d4..490a50cc331 100644
> --- a/libstdc++-v3/doc/xml/manual/extensions.xml
> +++ b/libstdc++-v3/doc/xml/manual/extensions.xml
> @@ -77,8 +77,7 @@ extensions, be aware of two things:
>object file.  The checks are also cleaner and easier to read and
>understand.
> 
> -   They are off by default for all versions of GCC from 3.0 to 3.4 (the
> -  latest release at the time of writing).
> +   They are off by default for all GCC 3.0 and all later versions.
>They can be enabled at configure time with
> linkend="manual.intro.setup.configure">--enable-concept-checks.
>You can enable them on a per-translation-unit basis with
> @@ -89,10 +88,17 @@ extensions, be aware of two things:
> 
>
> Please note that the concept checks only validate the requirements
> -   of the old C++03 standard. C++11 was expected to have first-class
> -   support for template parameter constraints based on concepts in the core
> -   language. This would have obviated the need for the library-simulated 
> concept
> -   checking described above, but was not part of C++11.
> +   of the old C++03 standard and reject some valid code that meets the 
> relaxed
> +   requirements of C++11 and later standards.
> +   C++11 was expected to have first-class support for template parameter
> +   constraints based on concepts in the core language.
> +   This would have obviated the need for the library-simulated concept 
> checking
> +   described above, but was not part of C++11.
> +   C++20 adds a different model of concepts, which is now used to constrain
> +   some new parts of the C++20 library, e.g. the
> +   ranges header and the new overloads in the
> +   algorithm header for working with ranges.
> +   The old library-simulated concept checks might be removed at a future 
> date.
> 
>
>  
> --
> 2.43.2
>

Re: [PATCH] libcpp: Fix __has_include_next ICE in the last directory of the path [PR80755]

2024-03-13 Thread Joseph Myers

On Thu, 21 Dec 2023, Lewis Hyatt wrote:

> In libcpp/files.cc, the function _cpp_has_header(), which implements
> __has_include and __has_include_next, does not check for a NULL return value
> from search_path_head(), leading to an ICE tripping an assert when
> _cpp_find_file() tries to use it. Fix it by checking for that case and
> silently returning false instead.
> 
> As suggested by the PR author, it is easiest to make a testcase by using
> the -idirafter option. To enable that, also modify the dg-additional-options
> testsuite procedure to make the global $srcdir available, since -idirafter
> requires the full path.
> 
> libcpp/ChangeLog:
> 
>   PR preprocessor/80755
>   * files.cc (search_path_head): Add SUPPRESS_DIAGNOSTIC argument
>   defaulting to false.
>   (_cpp_has_header): Silently return false if the search path has been
>   exhausted, rather than issuing a diagnostic and then hitting an
>   assert.
> 
> gcc/testsuite/ChangeLog:
> 
>   * lib/gcc-defs.exp (dg-additional-options): Make $srcdir usable in a
>   dg-additional-options directive.
>   * c-c++-common/cpp/has-include-next-2-dir/has-include-next-2.h: New 
> test.
>   * c-c++-common/cpp/has-include-next-2.c: New test.

OK.

-- 
Joseph S. Myers
josmy...@redhat.com

Re: [PATCH V12]: Improve code sinking pass

2024-03-13 Thread Jeff Law





On 3/13/24 4:22 AM, Richard Biener wrote:



... this hunk is OK (please test and split it out separately).  In the spirit of
moving the stmt the least amount (in this case not schedule it within the
basic-block).  In the same spirit one would choose an earlier basic-block
but only if the old chosen one post-dominates that, dominance isn't
a good criterion since you'd move it where the computation might not be
needed.  A practical testcase would be

   tem = a + b;
   if (foo)
 bar ();
   tem2 = tem + d;

where we at the moment would sink 'tem = a+ b' to the block containing
'tem2 = tem + d' not reducing the number of evaluations (of course bar()
might not return, but that's a minor detail).  Code motion like that should
be subject to register-pressure considerations which we do not estimate
here at all.  So it could be argued we shouldn't perform any sinking here.

Agreed.  This looks more like a scheduling and register-pressure issue
rather than a classic sinking issue.


Sinking is supposed to be moving code to less frequently executed points.
In the case above, the only way sinking into the 'tem2 =' block would help
is if bar() doesn't return.  It just doesn't make sense to me from a
sinking standpoint.
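
To make that concrete, here is a small C++ sketch of the testcase, with
the addition routed through a counting helper.  All names here
(evaluations, add, bar, original_order, sunk_order) are hypothetical and
only serve the illustration; this is not how the sink pass is
implemented.

```cpp
#include <cassert>

// Counts how many times 'a + b' is actually evaluated.
inline int evaluations = 0;

inline int add(int a, int b) {
  ++evaluations;
  return a + b;
}

// Stand-in for an arbitrary call that returns normally.
inline void bar() {}

// tem = a + b is computed before the conditional call.
inline int original_order(int a, int b, int d, bool foo) {
  int tem = add(a, b);
  if (foo)
    bar();
  return tem + d;
}

// tem = a + b has been sunk next to its only use.
inline int sunk_order(int a, int b, int d, bool foo) {
  if (foo)
    bar();
  int tem = add(a, b);
  return tem + d;
}
```

Both versions evaluate the addition exactly once per call whenever bar()
returns, so the move changes where the value is live but not how often
it is computed.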


The block execution data generally prevents this kind of gratuitous 
movement.


I actually evaluated our sinking code several years ago against an 
implementation of Click's algorithm.  In general they were quite 
comparable in terms of selecting an "optimal" block from an execution 
standpoint.  There were a couple of fixes that were added to our 
implementation at that time, but again, generally we were picking 
sensible blocks.





A good first-order heuristic would be to avoid the scheduling
when the number of non-virtual SSA uses on the stmt to be moved is bigger
than one.  For zero we reduce the lifetime of the def.  For one we're not
making things worse.  For more uses it depends on whether we're moving
within the lifetime of the uses and it becomes a global problem (we're
greedily moving dependent statements, so we even get "local global" wrong
then).

That said, changing will cause regressions, given both before and after
is somewhat ad-hoc it's hard to argue one is more correct than the other.

IMO scheduling should be left to a stmt scheduler on GIMPLE
(which we don't have).

Click's work can function as a statement scheduler, though I'm not 
convinced it's actually a good one.  Essentially most statements are 
conceptually disassociated from their blocks, then re-scheduled by 
visiting defining statements of "pinned" instructions.  That model is 
mostly for driving redundancy elimination.  Scheduling is just a side 
effect.




Bernd had a statement scheduler for gimple years ago, but it was 
somewhat controversial at the time and never moved forward enough to get 
integrated.  IIRC it ran just before or just after TER and its primary 
objective was to avoid some of the pathological cases that ultimately 
result in significant spilling after we're done with the bulk of the RTL 
pipeline.

Re: [PATCH] libcpp: Fix macro expansion for argument of __has_include [PR110558]

2024-03-13 Thread Joseph Myers

On Tue, 12 Dec 2023, Lewis Hyatt wrote:

> When the file name for a #include directive is the result of stringifying a
> macro argument, libcpp needs to take some care to get the whitespace
> correct; in particular stringify_arg() needs to see a CPP_PADDING token
> between macro tokens so that it can figure out when to output space between
> tokens. The CPP_PADDING tokens are not normally generated when handling a
> preprocessor directive, but for #include-like directives, libcpp sets the
> state variable pfile->state.directive_wants_padding to TRUE so that the
> CPP_PADDING tokens will be output, and then everything works fine for
> computed includes.
> 
> As the PR points out, things do not work fine for __has_include. Fix that by
> setting the state variable the same as is done for #include.
> 
> libcpp/ChangeLog:
> 
>   PR preprocessor/110558
>   * macro.cc (builtin_has_include): Set
>   pfile->state.directive_wants_padding prior to lexing the
>   file name, in case it comes from macro expansion.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR preprocessor/110558
>   * c-c++-common/cpp/has-include-2.c: New test.
>   * c-c++-common/cpp/has-include-2.h: New test.

OK.

-- 
Joseph S. Myers
josmy...@redhat.com

Re: [PATCH v6 4/5] Use the .ACCESS_WITH_SIZE in bound sanitizer.

2024-03-13 Thread Qing Zhao



On Mar 11, 2024, at 13:15, Siddhesh Poyarekar  wrote:



On 2024-02-16 14:47, Qing Zhao wrote:
gcc/c-family/ChangeLog:
* c-ubsan.cc (get_bound_from_access_with_size): New function.
(ubsan_instrument_bounds): Handle call to .ACCESS_WITH_SIZE.
gcc/testsuite/ChangeLog:
* gcc.dg/ubsan/flex-array-counted-by-bounds-2.c: New test.
* gcc.dg/ubsan/flex-array-counted-by-bounds-3.c: New test.
* gcc.dg/ubsan/flex-array-counted-by-bounds.c: New test.
---
 gcc/c-family/c-ubsan.cc   | 42 +
 .../ubsan/flex-array-counted-by-bounds-2.c| 45 ++
 .../ubsan/flex-array-counted-by-bounds-3.c| 34 ++
 .../ubsan/flex-array-counted-by-bounds.c  | 46 +++
 4 files changed, 167 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-3.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds.c
diff --git a/gcc/c-family/c-ubsan.cc b/gcc/c-family/c-ubsan.cc
index 940982819ddf..164b29845b3a 100644
--- a/gcc/c-family/c-ubsan.cc
+++ b/gcc/c-family/c-ubsan.cc
@@ -376,6 +376,40 @@ ubsan_instrument_return (location_t loc)
   return build_call_expr_loc (loc, t, 1, build_fold_addr_expr_loc (loc, data));
 }
 +/* Get the tree that represented the number of counted_by, i.e, the maximum
+   number of the elements of the object that the call to .ACCESS_WITH_SIZE
+   points to, this number will be the bound of the corresponding array.  */
+static tree
+get_bound_from_access_with_size (tree call)
+{
+  if (!is_access_with_size_p (call))
+return NULL_TREE;
+
+  tree ref_to_size = CALL_EXPR_ARG (call, 1);
+  unsigned int type_of_size = TREE_INT_CST_LOW (CALL_EXPR_ARG (call, 2));

Again for consistency, this should probably be class_of_size.

Okay, I will update this consistently with the change related to the 3rd
argument.

+  tree type = TREE_TYPE (CALL_EXPR_ARG (call, 3));
+  tree size = fold_build2 (MEM_REF, type, unshare_expr (ref_to_size),
+build_int_cst (ptr_type_node, 0));
+  /* If size is negative value, treat it as zero.  */
+  if (!TYPE_UNSIGNED (type))
+  {
+tree cond = fold_build2 (LT_EXPR, boolean_type_node,
+  unshare_expr (size), build_zero_cst (type));
+size = fold_build3 (COND_EXPR, type, cond,
+ build_zero_cst (type), size);
+  }
+
+  /* Only when type_of_size is 1,i.e, the number of the elements of
+ the object type, return the size.  */
+  if (type_of_size != 1)
+return NULL_TREE;
+  else
+size = fold_convert (sizetype, size);
+
+  return size;
+}
+
+
 /* Instrument array bounds for ARRAY_REFs.  We create special builtin,
that gets expanded in the sanopt pass, and make an array dimension
of it.  ARRAY is the array, *INDEX is an index to the array.
@@ -401,6 +435,14 @@ ubsan_instrument_bounds (location_t loc, tree array, tree 
*index,
&& COMPLETE_TYPE_P (type)
&& integer_zerop (TYPE_SIZE (type)))
  bound = build_int_cst (TREE_TYPE (TYPE_MIN_VALUE (domain)), -1);
+  else if (INDIRECT_REF_P (array)
+&& is_access_with_size_p ((TREE_OPERAND (array, 0))))
+ {
+   bound = get_bound_from_access_with_size ((TREE_OPERAND (array, 0)));
+   bound = fold_build2 (MINUS_EXPR, TREE_TYPE (bound),
+bound,
+build_int_cst (TREE_TYPE (bound), 1));
+ }

This will wrap if bound == 0, maybe that needs to be special-cased.  And maybe 
also add a test for it below.

Will check on this to see whether a new test is needed.
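
For reference, a minimal C++ sketch of the wrap-around concern, assuming
the bound is computed in an unsigned type such as size_t (the patch
converts it to sizetype).  The helper names max_index_unchecked and
index_in_bounds are hypothetical and are not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Mirrors "bound - 1" on an unsigned count: wraps to the maximum value
// when count is 0.
constexpr std::size_t max_index_unchecked(std::size_t count) {
  return count - 1;
}

// Hypothetical alternative that needs no subtraction: an index is in
// bounds only if it is strictly less than the element count.
constexpr bool index_in_bounds(std::size_t index, std::size_t count) {
  return index < count;
}
```

With a count of 0 the unchecked form yields SIZE_MAX, so a check of the
form "index <= bound - 1" would accept every index, while the strict
comparison rejects them all.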

Thanks a lot for the review.

Qing

   else
  return NULL_TREE;
 }
diff --git a/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c 
b/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
new file mode 100644
index ..148934975ee5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
@@ -0,0 +1,45 @@
+/* test the attribute counted_by and its usage in
+   bounds sanitizer combined with VLA.  */
+/* { dg-do run } */
+/* { dg-options "-fsanitize=bounds" } */
+/* { dg-output "index 11 out of bounds for type 'int 
\\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 20 out of bounds for type 'int 
\\\[\\\*\\\]\\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 11 out of bounds for type 'int 
\\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 10 out of bounds for type 'int 
\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+
+
+#include 
+
+void __attribute__((__noinline__)) setup_and_test_vla (int n, int m)
+{
+   struct foo {
+   int n;
+   int p[][n] __attribute__((counted_by(n)));
+   } *f;
+
+   f = (struct foo *) malloc (sizeof(struct foo) + m*sizeof(int[n]));
+   f->n = m;
+   f->p[m][n-1]=1;
+   return;
+}
+
+void __attribute__((__noinline__)) setup_and_test_vla_1 (int n1, int n2, int m)
+{
+  struct foo {
+int n;
+int p[][n2][n1] __attribute__((counted_by(n)));
+  } *f;
+
+

Re: [PATCH v6 3/5] Use the .ACCESS_WITH_SIZE in builtin object size.

2024-03-13 Thread Qing Zhao



On Mar 11, 2024, at 13:11, Siddhesh Poyarekar  wrote:



On 2024-02-16 14:47, Qing Zhao wrote:
gcc/ChangeLog:
* tree-object-size.cc (access_with_size_object_size): New function.
(call_object_size): Call the new function.
gcc/testsuite/ChangeLog:
* gcc.dg/builtin-object-size-common.h: Add a new macro EXPECT.
* gcc.dg/flex-array-counted-by-3.c: New test.
* gcc.dg/flex-array-counted-by-4.c: New test.
* gcc.dg/flex-array-counted-by-5.c: New test.
---
 .../gcc.dg/builtin-object-size-common.h   |  11 ++
 .../gcc.dg/flex-array-counted-by-3.c  |  63 +++
 .../gcc.dg/flex-array-counted-by-4.c  | 178 ++
 .../gcc.dg/flex-array-counted-by-5.c  |  48 +
 gcc/tree-object-size.cc   |  59 ++
 5 files changed, 359 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-5.c
diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-common.h 
b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
index 66ff7cdd953a..b677067c6e6b 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-common.h
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
@@ -30,3 +30,14 @@ unsigned nfails = 0;
   __builtin_abort ();   \
 return 0;   \
   } while (0)
+
+#define EXPECT(p, _v) do {   \
+  size_t v = _v;   \
+  if (p == v)   \
+__builtin_printf ("ok:  %s == %zd\n", #p, p);   \
+  else   \
+{   \
+  __builtin_printf ("WAT: %s == %zd (expected %zd)\n", #p, p, v);   \
+  FAIL ();   \
+}   \
+} while (0);
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
new file mode 100644
index ..0066c32ca808
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
@@ -0,0 +1,63 @@
+/* test the attribute counted_by and its usage in
+ * __builtin_dynamic_object_size.  */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "builtin-object-size-common.h"
+
+struct flex {
+  int b;
+  int c[];
+} *array_flex;
+
+struct annotated {
+  int b;
+  int c[] __attribute__ ((counted_by (b)));
+} *array_annotated;
+
+struct nested_annotated {
+  struct {
+union {
+  int b;
+  float f;
+};
+int n;
+  };
+  int c[] __attribute__ ((counted_by (b)));
+} *array_nested_annotated;
+
+void __attribute__((__noinline__)) setup (int normal_count, int attr_count)
+{
+  array_flex
+= (struct flex *)malloc (sizeof (struct flex)
+  + normal_count *  sizeof (int));
+  array_flex->b = normal_count;
+
+  array_annotated
+= (struct annotated *)malloc (sizeof (struct annotated)
+   + attr_count *  sizeof (int));
+  array_annotated->b = attr_count;
+
+  array_nested_annotated
+= (struct nested_annotated *)malloc (sizeof (struct nested_annotated)
+  + attr_count *  sizeof (int));
+  array_nested_annotated->b = attr_count;
+
+  return;
+}
+
+void __attribute__((__noinline__)) test ()
+{
+EXPECT(__builtin_dynamic_object_size(array_flex->c, 1), -1);
+EXPECT(__builtin_dynamic_object_size(array_annotated->c, 1),
+array_annotated->b * sizeof (int));
+EXPECT(__builtin_dynamic_object_size(array_nested_annotated->c, 1),
+array_nested_annotated->b * sizeof (int));
+}
+
+int main(int argc, char *argv[])
+{
+  setup (10,10);
+  test ();
+  DONE ();
+}
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
new file mode 100644
index ..3ce7f3545549
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
@@ -0,0 +1,178 @@
+/* test the attribute counted_by and its usage in
+__builtin_dynamic_object_size: what's the correct behavior when the
+allocation size does not match the value of the counted_by attribute?
+we should always use the latest value that is held by the counted_by
+field.  */
+/* { dg-do run } */
+/* { dg-options "-O -fstrict-flex-arrays=3" } */
+
+#include "builtin-object-size-common.h"
+
+struct annotated {
+  size_t foo;
+  char others;
+  char array[] __attribute__((counted_by (foo)));
+};
+
+#define noinline __attribute__((__noinline__))
+#define SIZE_BUMP 10
+#define MAX(a, b) ((a) > (b) ? (a) : (b))
+
+/* In general, Due to type casting, the type for the pointee of a pointer
+   does not say anything about the object it points to,
+   So, __builtin_object_size can not directly use the type of the pointee
+   to decide the size of the object the pointer points to.
+
+   there are only two reliable ways:
+   A. observed allocations  (call to the allocation functions in the routine)
+   B. observed accesses (read or write access to the location of the
+ pointer points to)
+
+   that provide information about the type/existence of an object at
+   the corresponding address.
+
+   for A, we use the "alloc_size" attribute for the corresponding

Re: [PATCH v6 2/5] Convert references with "counted_by" attributes to/from .ACCESS_WITH_SIZE.

2024-03-13 Thread Qing Zhao



> On Mar 11, 2024, at 13:09, Siddhesh Poyarekar  wrote:
> 
> 
> 
> On 2024-02-16 14:47, Qing Zhao wrote:
>> Including the following changes:
>> * The definition of the new internal function .ACCESS_WITH_SIZE
>>   in internal-fn.def.
>> * C FE converts every reference to a FAM with a "counted_by" attribute
>>   to a call to the internal function .ACCESS_WITH_SIZE.
>>   (build_component_ref in c_typeck.cc)
>>   This includes the case when the object is statically allocated and
>>   initialized.
>>   In order to make this work, the routines initializer_constant_valid_p_1
>>   and output_constant in varasm.cc are updated to handle calls to
>>   .ACCESS_WITH_SIZE.
>>   (initializer_constant_valid_p_1 and output_constant in varasm.c)
>>   However, for the reference inside "offsetof", the "counted_by" attribute is
>>   ignored since it's not useful at all.
>>   (c_parser_postfix_expression in c/c-parser.cc)
>>   In addition to "offsetof", for the reference inside operator "typeof" and
>>   "alignof", we ignore counted_by attribute too.
>>   When building ADDR_EXPR for the .ACCESS_WITH_SIZE in C FE,
>>   replace the call with its first argument.
>> * Convert every call to .ACCESS_WITH_SIZE to its first argument.
>>   (expand_ACCESS_WITH_SIZE in internal-fn.cc)
>> * Adjust alias analysis to exclude the new internal from clobbering anything.
>>   (ref_maybe_used_by_call_p_1 and call_may_clobber_ref_p_1 in tree-ssa-alias.cc)
>> * Adjust dead code elimination to eliminate the call to .ACCESS_WITH_SIZE
>>   when its LHS is eliminated as dead code.
>>   (eliminate_unnecessary_stmts in tree-ssa-dce.cc)
>> * Provide the utility routines to check the call is .ACCESS_WITH_SIZE and
>>   get the reference from the call to .ACCESS_WITH_SIZE.
>>   (is_access_with_size_p and get_ref_from_access_with_size in tree.cc)
>> gcc/c/ChangeLog:
>>  * c-parser.cc (c_parser_postfix_expression): Ignore the counted-by
>>  attribute when build_component_ref inside offsetof operator.
>>  * c-tree.h (build_component_ref): Add one more parameter.
>>  * c-typeck.cc (build_counted_by_ref): New function.
>>  (build_access_with_size_for_counted_by): New function.
>>  (build_component_ref): Check the counted-by attribute and build
>>  call to .ACCESS_WITH_SIZE.
>>  (build_unary_op): When building ADDR_EXPR for
>> .ACCESS_WITH_SIZE, use its first argument.
>> (lvalue_p): Accept call to .ACCESS_WITH_SIZE.
>> gcc/ChangeLog:
>>  * internal-fn.cc (expand_ACCESS_WITH_SIZE): New function.
>>  * internal-fn.def (ACCESS_WITH_SIZE): New internal function.
>>  * tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Special case
>>  IFN_ACCESS_WITH_SIZE.
>>  (call_may_clobber_ref_p_1): Special case IFN_ACCESS_WITH_SIZE.
>>  * tree-ssa-dce.cc (eliminate_unnecessary_stmts): Eliminate the call
>>  to .ACCESS_WITH_SIZE when its LHS is dead.
>>  * tree.cc (process_call_operands): Adjust side effect for function
>>  .ACCESS_WITH_SIZE.
>>  (is_access_with_size_p): New function.
>>  (get_ref_from_access_with_size): New function.
>>  * tree.h (is_access_with_size_p): New prototype.
>>  (get_ref_from_access_with_size): New prototype.
>>  * varasm.cc (initializer_constant_valid_p_1): Handle call to
>>  .ACCESS_WITH_SIZE.
>>  (output_constant): Handle call to .ACCESS_WITH_SIZE.
>> gcc/testsuite/ChangeLog:
>>  * gcc.dg/flex-array-counted-by-2.c: New test.
>> ---
>>  gcc/c/c-parser.cc |  10 +-
>>  gcc/c/c-tree.h|   2 +-
>>  gcc/c/c-typeck.cc | 128 +-
>>  gcc/internal-fn.cc|  36 +
>>  gcc/internal-fn.def   |   4 +
>>  .../gcc.dg/flex-array-counted-by-2.c  | 112 +++
>>  gcc/tree-ssa-alias.cc |   2 +
>>  gcc/tree-ssa-dce.cc   |   5 +-
>>  gcc/tree.cc   |  25 +++-
>>  gcc/tree.h|   8 ++
>>  gcc/varasm.cc |  10 ++
>>  11 files changed, 332 insertions(+), 10 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-2.c
>> diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
>> index c31349dae2ff..a6ed5ac43bb1 100644
>> --- a/gcc/c/c-parser.cc
>> +++ b/gcc/c/c-parser.cc
>> @@ -10850,9 +10850,12 @@ c_parser_postfix_expression (c_parser *parser)
>>  if (c_parser_next_token_is (parser, CPP_NAME))
>>{
>>  c_token *comp_tok = c_parser_peek_token (parser);
>> +/* Ignore the counted_by attribute for reference inside
>> +   offsetof since the information is not useful at all.  */
>>  offsetof_ref
>>= build_component_ref (loc, offsetof_ref, comp_tok->value,
>> - comp_tok->location, UNKNOWN_LOCATION);
>> +

Re: [PATCH, OpenACC 2.7] struct/array reductions for Fortran

2024-03-13 Thread Tobias Burnus


Hi Chung-Lin, hi Thomas, hello world,

some thoughts glancing at the patch.

Chung-Lin Tang wrote:

There are still some shortcomings in the current state, mainly that only explicit-shaped 
arrays can be used (like its C counterpart). Anything else is currently a bit more 
complicated in the middle-end, since the existing reduction code creates an 
"init-op" (literal of initial values) which can't be done when say 
TYPE_MAX_VALUE (TYPE_DOMAIN (array_type)) is not a tree constant. I think we'll be on the 
hook to solve this later, but I think the current state is okay to submit.


I think having some initial support is fine, but it needs an 
understandable and somewhat complete error diagnostic and testcases. 
More to this below.



+  if (!TREE_CONSTANT (min_tree) || !TREE_CONSTANT (max_tree))
+   {
+ error_at (loc, "array in reduction must be of constant size");
+ return error_mark_node;
+   }

Shouldn't this use a sorry_at instead?


+ /* OpenACC current only supports array reductions on explicit-shape
+arrays.  */
+ if ((n->sym->as && n->sym->as->type != AS_EXPLICIT)
+ || n->sym->attr.codimension)
gfc_error ("Array %qs is not permitted in reduction at %L",
   n->sym->name, >where);
[Coarray excursion: I am in favor of allowing it, for the reasons given 
in this mail. It could also be rejected, but in that case I would prefer 
to have a proper error message.]


While the behavior of coarrays is unspecified, I do not see a reason why 
a coarray shouldn't be permitted here, as long as it is not coindexed. In 
the end, it is just a normal array with some additional properties, which 
make it possible to access it remotely.


Note: For coarray scalars, we have 'sym->as', thus the check should be 
'(n->sym->as && n->sym->as->rank)' to permit scalar coarrays.
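
The suggested condition can be sketched in isolation. The following is a minimal model, not gfortran code: `struct array_spec`, `struct symbol`, the `SPEC_*` enumerators and `reduction_rejects` are hypothetical stand-ins that only mirror the names used above (`sym->as`, `as->rank`, `as->type`, `AS_EXPLICIT`). It assumes the rank test is combined with the existing explicit-shape test and that the codimension test is dropped, which is one reading of the suggestion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the array-spec kinds; only the explicit-shape
   kind matters for this sketch.  */
enum spec_kind { SPEC_EXPLICIT, SPEC_ASSUMED_SHAPE, SPEC_DEFERRED };

/* Hypothetical stand-in for an array spec: RANK is the number of array
   dimensions, which is 0 for a scalar coarray.  */
struct array_spec {
  int rank;
  enum spec_kind type;
};

/* Hypothetical stand-in for a symbol; AS is NULL for a plain scalar.  */
struct symbol {
  const struct array_spec *as;
};

/* Return true if the symbol should be rejected in a reduction clause:
   it has at least one array dimension and is not explicit-shape.
   A scalar coarray has an array spec but rank 0, so it is accepted.  */
bool reduction_rejects(const struct symbol *sym)
{
  assert(sym != NULL);
  return sym->as != NULL
         && sym->as->rank != 0
         && sym->as->type != SPEC_EXPLICIT;
}
```

With this condition a scalar coarray and an explicit-shape array both pass, while an assumed-shape array is still rejected.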


* * *

Coarray excursion: A coarray variable exists in multiple processes 
("images", e.g. MPI processes). If 'caf' and 'caf2' are coarrays, then 
'caf = 5' and 'i = caf2' refer to the local variable.


On the other hand, 'caf[n] = 5' or 'i = caf[3,m]' refers to the 'caf' 
variable on image 'n' or '[3,m]', respectively, which in general implies 
some function call to read or set the remote data, unless the memory is 
directly accessible (→ e.g. some offset calculation) and the compiler 
already knows how to handle this.


While a coarray might be allocated in some special memory, it behaves 
like a normal variable as long as one uses the local version (i.e. not 
coindexed / without the image index in brackets).


Assume for the example above, e.g., integer :: caf[*], caf2[3:6, 7:*].

* * *

Thus, in terms of OpenACC or OpenMP, there is no reason to worry about a 
coarray as long as it is not coindexed and as long as OpenMP/OpenACC 
does not interfere with the memory allocation – either directly ('!$omp 
allocators') or indirectly by placing it into special memory (pinned, 
pseudo-unified-shared memory → OG13's -foffload-memory=pinned/unified).


Meanwhile, OpenMP explicitly allows coarrays, with a few exceptions, 
while OpenACC talks about unspecified behavior.


* * *

Back to generic comments:

If I look at the existing code, I see at gfc_match_omp_clause_reduction:


 if (gfc_match_omp_variable_list (" :", >lists[list_idx], false, NULL,
  , openacc, allow_derived) != 
MATCH_YES)


If 'openacc' is true, array sections are permitted - but the code added 
(see quote above) does not handle n->expr at all and only n->sym.


I think there needs to be at least a gfc_error ("Sorry, subarrays/array 
sections not yet handled"). ['Subarray' is the OpenACC wording; 'array 
section' is the Fortran one, which might be clearer.]


But you could consider handling at least array elements, i.e. 
n->expr->rank == 0.


Additionally, I think the current error message is completely unhelpful 
given that some arrays are supported but most are not.


I think there should be also some testcases for the not-yet-supported 
case. I think the following will trigger the omp-low.cc 'sorry_at' (or 
currently 'error' - but I think it should be a sorry):


subroutine foo(n)

integer :: n, A(n)

... reduction(+:A)

And most others will trigger in openmp.cc; for those, you should have an 
allocatable/pointer and assumed-shape arrays for the diagnostic testcase 
as well.


* * *

I have not really experimented with the code, but does it handle 
multi-dimensional constant arrays like 'integer :: a(3:6,10,-1:1)' ? — I 
bet it does, at least after handling my example [2] for the C patch [1].


Thanks,

Tobias

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641669.html

[2] https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647704.html

Re: [PATCH v6 1/5] Provide counted_by attribute to flexible array member field (PR108896)

2024-03-13 Thread Qing Zhao

Sid,

Thanks a lot for your time to review the code.
See my reply below:

On Mar 11, 2024, at 10:57, Siddhesh Poyarekar  wrote:

On 2024-02-16 14:47, Qing Zhao wrote:
 return true;
   else
 return targetm.attribute_takes_identifier_p (attr_id);
@@ -2806,6 +2811,53 @@ handle_strict_flex_array_attribute (tree *node, tree 
name,
   return NULL_TREE;
 }
 +/* Handle a "counted_by" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_counted_by_attribute (tree *node, tree name,
+  tree args, int ARG_UNUSED (flags),
+  bool *no_add_attrs)
+{
+  tree decl = *node;
+  tree argval = TREE_VALUE (args);
+
+  /* This attribute only applies to field decls of a structure.  */
+  if (TREE_CODE (decl) != FIELD_DECL)
+{
+  error_at (DECL_SOURCE_LOCATION (decl),
+ "%qE attribute may not be specified for non-field"
+ " declaration %q+D", name, decl);
+  *no_add_attrs = true;
+}
+  /* This attribute only applies to field with array type.  */
+  else if (TREE_CODE (TREE_TYPE (decl)) != ARRAY_TYPE)
+{
+  error_at (DECL_SOURCE_LOCATION (decl),
+ "%qE attribute may not be specified for a non-array field",
+ name);
+  *no_add_attrs = true;
+}
+  /* This attribute only applies to a C99 flexible array member type.  */
+  else if (! c_flexible_array_member_type_p (TREE_TYPE (decl)))
+{
+  error_at (DECL_SOURCE_LOCATION (decl),
+ "%qE attribute may not be specified for a non"
+ " flexible array member field",
+ name);
+  *no_add_attrs = true;
+}

How about "not allowed" instead of "may not be specified"?

Okay, will update them.

+  /* The argument should be an identifier.  */
+  else if (TREE_CODE (argval) != IDENTIFIER_NODE)
+{
+  error_at (DECL_SOURCE_LOCATION (decl),
+ "% argument not an identifier");
+  *no_add_attrs = true;
+}

Validate that the attribute only applies to a C99 flexible array member of a 
structure and the argument should be an identifier node.  OK. 
verify_counted_by_attribute does more extensive validation on argval.
Yes.
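
To illustrate the validation summarized above, here is a minimal sketch of a declaration the handler is described as accepting, with the rejected forms kept as comments. The `COUNTED_BY` guard macro and the helpers `make_annotated` and `payload_bytes` are hypothetical names added for illustration. The guard lets the file build on compilers that do not know the attribute, in which case the annotation expands to nothing.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical guard macro: expands to the attribute only when the
   compiler reports support for it.  */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define COUNTED_BY(member) __attribute__((counted_by(member)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(member)
#endif

/* Accepted: a C99 flexible array member whose attribute argument is an
   identifier naming a field of the same structure.  */
struct annotated {
  int b;
  int c[] COUNTED_BY(b);
};

/* Forms the handler is described as rejecting (kept as comments):
     int global COUNTED_BY(b);                      -- not a field
     struct s { int n; int v COUNTED_BY(n); };      -- not an array
     struct t { int n; int a[4] COUNTED_BY(n); };   -- not a flexible array  */

/* Allocate room for COUNT elements and record COUNT in the counter
   field, as the setup routine of the new tests does.  */
struct annotated *make_annotated(int count)
{
  assert(count >= 0);
  struct annotated *p = malloc(sizeof *p + (size_t)count * sizeof(int));
  if (p != NULL)
    p->b = count;
  return p;
}

/* Size in bytes that the counter field implies for the array.  */
size_t payload_bytes(const struct annotated *p)
{
  return (size_t)p->b * sizeof(int);
}
```

`payload_bytes` computes the same value, `b * sizeof (int)`, that the new run-time test in this series expects from `__builtin_dynamic_object_size` for the annotated array.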

+
+  return NULL_TREE;
+}
+
 /* Handle a "weak" attribute; arguments as in
struct attribute_spec.handler.  */
 diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index e15eff698dfd..56d828e3dfaf 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -9909,6 +9909,19 @@ c_common_finalize_early_debug (void)
   (*debug_hooks->early_global_decl) (cnode->decl);
 }
 +/* Determine whether TYPE is a ISO C99 flexible array memeber type "[]".  */

s/memeber/member/
Okay, will update it.

+bool
+c_flexible_array_member_type_p (const_tree type)
+{
+  if (TREE_CODE (type) == ARRAY_TYPE
+  && TYPE_SIZE (type) == NULL_TREE
+  && TYPE_DOMAIN (type) != NULL_TREE
+  && TYPE_MAX_VALUE (TYPE_DOMAIN (type)) == NULL_TREE)
+return true;
+
+  return false;
+}
+

Moved from c/c-decl.cc.  OK.

 /* Get the LEVEL of the strict_flex_array for the ARRAY_FIELD based on the
values of attribute strict_flex_array and the flag_strict_flex_arrays.  */
 unsigned int
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 2d5f53998855..3e0eed0548b0 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -904,6 +904,7 @@ extern tree fold_for_warn (tree);
 extern tree c_common_get_narrower (tree, int *);
 extern bool get_attribute_operand (tree, unsigned HOST_WIDE_INT *);
 extern void c_common_finalize_early_debug (void);
+extern bool c_flexible_array_member_type_p (const_tree);
 extern unsigned int c_strict_flex_array_level_of (tree);
 extern bool c_option_is_from_cpp_diagnostics (int);
 extern tree c_hardbool_type_attr_1 (tree, tree *, tree *);
diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index fe20bc21c926..4348123502e4 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -5301,19 +5301,6 @@ set_array_declarator_inner (struct c_declarator *decl,
   return decl;
 }
 -/* Determine whether TYPE is a ISO C99 flexible array memeber type "[]".  */
-static bool
-flexible_array_member_type_p (const_tree type)
-{
-  if (TREE_CODE (type) == ARRAY_TYPE
-  && TYPE_SIZE (type) == NULL_TREE
-  && TYPE_DOMAIN (type) != NULL_TREE
-  && TYPE_MAX_VALUE (TYPE_DOMAIN (type)) == NULL_TREE)
-return true;
-
-  return false;
-}
-
 /* Determine whether TYPE is a one-element array type "[1]".  */
 static bool
 one_element_array_type_p (const_tree type)
@@ -5350,7 +5337,7 @@ add_flexible_array_elts_to_size (tree decl, tree init)
 elt = CONSTRUCTOR_ELTS (init)->last ().value;
   type = TREE_TYPE (elt);
-  if (flexible_array_member_type_p (type))
+  if (c_flexible_array_member_type_p (type))
 {
   complete_array_type (, elt, false);
   DECL_SIZE (decl)
@@ -9317,7 +9304,7 @@ is_flexible_array_member_p (bool is_last_field,
 bool is_zero_length_array = zero_length_array_type_p (TREE_TYPE (x));
   bool is_one_element_array = one_element_array_type_p (TREE_TYPE (x));
-  bool is_flexible_array =

Re: [PATCH, OpenACC 2.7] Implement reductions for arrays and structs

2024-03-13 Thread Tobias Burnus


Hi Chung-Lin,


https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641669.html

Chung-Lin Tang wrote:

this patch implements reductions for arrays and structs for OpenACC. Following 
the pattern for OpenACC reductions [...]


(Stumbled over while looking at the Fortran patch, but applying to 
C/C++, hence mentioned here; the Fortran patch is at 
https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645205.html )



OpenACC permits array elements and subarrays. I have not checked whether 
array elements are currently rejected or fully supported, but I miss a 
testcase for both array elements (unless there is one already) and array 
sections.


If implemented, I think there should be a working run-time test.
If not supported, there should be a sorry_at error for those.

Note: the parser should handle array sections as OpenMP handles them.

The testcase should cover something like the following:

void f(int n)
{
  int x[5][5]; // Multidimensional array
  int y[n]; // VLA
  int *z = (int*)malloc(5*5*sizeof(int)); // Allocated array

... reduction(+:x)
... reduction(+:y)

... reduction(+:x[0:5][2:1])  // OK
... reduction(+:x[1:4][2:1])
  // invalid - while contiguous, first dim does not span the whole array
... reduction(+:y[2:2])  // OK
... reduction(+:y[3:])  // OK - same as [3:n-3]
... reduction(+:y[:2])  // OK - same as [0:2]
... reduction(+:z[1:2][1:6])  // OK

And the same where at least one of the const number is replaced by
a variable.

Note: The 'invalid' reduction is fine in terms of being contiguous (the last 
dimension contains a single element, hence the dimension before does 
not need to span the whole extent), but OpenACC requires all 
dimensions but the last to span the whole range.
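
The rule stated in this note can be sketched as a small check. This is a model of the wording in this mail only, not of the OpenACC specification or of any GCC code: `struct section`, `OMITTED`, `fill_defaults` and `all_but_last_span_whole` are hypothetical names, and the sketch covers only arrays with known extents such as `x` and `y` above, not the pointer case `z`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical marker for an omitted start or length, as in [3:] or [:2].  */
#define OMITTED (-1)

/* One dimension of a subarray written as [start:length].  */
struct section {
  int start;
  int length;
};

/* Fill in omitted parts: a missing start is 0, a missing length runs to
   the end of the dimension, so [3:] becomes [3:extent-3].  */
struct section fill_defaults(int extent, int start, int length)
{
  struct section s;
  s.start = (start == OMITTED) ? 0 : start;
  s.length = (length == OMITTED) ? extent - s.start : length;
  return s;
}

/* Return true if every dimension except the last covers its whole
   extent, i.e. starts at 0 and has a length equal to EXTENT[i].  */
bool all_but_last_span_whole(const int *extent, const struct section *sec,
                             size_t ndims)
{
  assert(ndims > 0);
  for (size_t i = 0; i + 1 < ndims; i++)
    if (sec[i].start != 0 || sec[i].length != extent[i])
      return false;
  return true;
}
```

For `int x[5][5]`, the section `x[0:5][2:1]` passes this check and `x[1:4][2:1]` fails it, matching the OK and invalid markings in the list above. A one-dimensional section such as `y[2:2]` always passes, because it has no leading dimension to check.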


See "2.7.1 Data Specification in Data Clauses" for the subarray description.

I think - if known at compile time - there should also be a diagnostic 
if any dimension but the last does not span the whole range.


Thanks,

Tobias

Re: [wwwdocs] Reverse development timeline graph

2024-03-13 Thread Richard Biener




> On 13.03.2024 at 16:45, Jonathan Wakely wrote:
> 
> Every year I have to scroll down further and further to the useful part,
> and I'm getting too old to spend my time doing that! :)
> 
> I suggested this on IRC and iains agreed. What do others think?

It feels a bit odd.  Can we use HTML to collapse the parts referring to 
no-longer-maintained releases, allowing a click to expand them?

> -- >8 --
> 
> This seems more useful with the recent history first.
> ---
> htdocs/develop.html | 819 ++--
> 1 file changed, 411 insertions(+), 408 deletions(-)
> 
> diff --git a/htdocs/develop.html b/htdocs/develop.html
> index 702256cf..f741bd4a 100644
> --- a/htdocs/develop.html
> +++ b/htdocs/develop.html
> @@ -298,421 +298,424 @@ number carried little to no useful information.
> 
> Release Timeline
> 
> -Here is a history of recent and a tentative timeline of upcoming
> +Here is a history of releases and a tentative timeline of 
> upcoming
> stages of development, branch points, and releases:
> 
> 
> 
> -  ... former releases ...
> -   |
> -   +-- GCC 3.0 branch created --+
> -   |  (Feb 12 2001)  \
> -   |  v
> -   v   GCC 3.0 release (Jun 18 2001)
> -  New development plan announced\
> -   |  (Jul 18 2001)  v
> -   |   GCC 3.0.1 release (Aug 20 2001)
> -   |   \
> -   vv
> -  GCC 3.1 Stage 1 (ended Oct 15 2001)  GCC 3.0.2 release (Oct 25 2001)
> -   |  \
> -   v   v
> -  GCC 3.1 Stage 2 (ended Dec 19 2001)  GCC 3.0.3 release (Dec 20 2001)
> -   | \
> -   v  v
> -  GCC 3.1 Stage 3 (ended Feb 26 2002)  GCC 3.0.4 release (Feb 20 2002)
> -   |
> -   +-- GCC 3.1 branch created --+
> -   | \
> -   |  v
> -   v   GCC 3.1 release (May 15 2002)
> -  GCC 3.2 Stage 1 (ended Jun 22 2002)   \
> -   | v
> -   |   GCC 3.1.1 release (Jul 25 2002)
> -   |   \
> -   vv
> -  New development plan announced   Branch renamed to GCC 3.2 to
> -   |  (Jul 14 2002)accommodate for C++ ABI fixes
> -   |   (C++ binary incompatible with
> -   |   GCC 3.1, see release info)
> -   | \
> -   |  v
> -   |   GCC 3.2 release (Aug 14 2002)
> -   |\
> -   | v
> -   |   GCC 3.2.1 release (Nov 19 2002)
> -   |   \
> -   |v
> -   |   GCC 3.2.2 release (Feb 05 2003)
> -   |  \
> -   |   v
> -   |   GCC 3.2.3 release (April 22 2003)
> -   v
> -  GCC 3.3 Stage 2 (ends Aug 15 2002)
> -   |
> -   v
> -  GCC 3.3 Stage 3 (ends Oct 15 2002)
> -   |
> -   +-- GCC 3.3 branch created --+
> -   |(Dec 14 2002)\
> -   |  v
> -   |   GCC 3.3 release (May 13 2003)
> -   |\
> -   v v
> -  GCC 3.4 Stage 1 (ends July 4 2003)   GCC 3.3.1 release (Aug 8 2003)
> -   |   \
> -   vv
> -  GCC 3.4 Stage 2 (ends October 15 2003)   GCC 3.3.2 release (Oct 17 2003)
> -   |  \
> -   v   v
> -  GCC 3.4 Stage 3  GCC 3.3.3 release (Feb 14 2004)
> -   | \
> -   |  v
> -   |   GCC 3.3.4 release (May 31 2004)
> -   |\
> -   | v
> -   |   GCC 3.3.5 release (Sep 30 2004)
> -   |

[PATCH V2 1/1] rs6000: Load store fusion for rs6000 target using common infrastructure

2024-03-13 Thread Ajit Agarwal

Hello All:

Common infrastructure using generic code for load store fusion of rs6000
target.

Generic code is implemented and defined so that it can be used in
target-specific code for the aarch64 and rs6000 targets.

Generic code is implemented in gcc/pair-fusion-base.h,
gcc/pair-fusion-common.cc and gcc/pair-fusion.cc.

Code is implemented with pure virtual functions to interface with target
code.

Target-specific code is added in rs6000-mem-fusion.cc, which uses the generic code.

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit

rs6000: Load store fusion for rs6000 target using common infrastructure

Common infrastructure using generic code for load store fusion of rs6000
target.

Generic code is implemented and defined so that it can be used in
target-specific code for the aarch64 and rs6000 targets.

Generic code is implemented in gcc/pair-fusion-base.h,
gcc/pair-fusion-common.cc and gcc/pair-fusion.cc.

Code is implemented with pure virtual functions to interface with target
code.

Target-specific code is added in rs6000-mem-fusion.cc, which uses the generic code.

2024-03-13  Ajit Kumar Agarwal  

gcc/ChangeLog:

* config/rs6000/rs6000-passes.def: New mem fusion pass
before pass_early_remat.
* config/rs6000/rs6000-mem-fusion.cc: Add new pass.
Add target specific implementation using pure virtual
functions.
* config.gcc: Add new executable.
* config/rs6000/rs6000-protos.h: Add new prototype for mem
fusion pass.
* config/rs6000/rs6000.cc: Add new prototype for mem fusion
pass.
* config/rs6000/t-rs6000: Add new rule.
* rtl-ssa/accesses.h: Moved set_is_live_out_use as public
from private.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/mem-fusion.C: New test.
* g++.target/powerpc/mem-fusion-1.C: New test.
* gcc.target/powerpc/mma-builtin-1.c: Modify test.
---
 gcc/config.gcc|   2 +
 gcc/config/rs6000/rs6000-mem-fusion.cc| 697 ++
 gcc/config/rs6000/rs6000-passes.def   |   4 +-
 gcc/config/rs6000/rs6000-protos.h |   1 +
 gcc/config/rs6000/rs6000.cc   |   1 +
 gcc/config/rs6000/t-rs6000|   5 +
 gcc/rtl-ssa/accesses.h|   2 +-
 .../g++.target/powerpc/mem-fusion-1.C |  22 +
 gcc/testsuite/g++.target/powerpc/mem-fusion.C |  15 +
 .../gcc.target/powerpc/mma-builtin-1.c|   4 +-
 10 files changed, 749 insertions(+), 4 deletions(-)
 create mode 100644 gcc/config/rs6000/rs6000-mem-fusion.cc
 create mode 100644 gcc/testsuite/g++.target/powerpc/mem-fusion-1.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/mem-fusion.C

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 624e0dae191..52ecd66dcc6 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -522,6 +522,7 @@ powerpc*-*-*)
extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
extra_objs="${extra_objs} rs6000-builtins.o rs6000-builtin.o"
+   extra_objs="${extra_objs} rs6000-mem-fusion.o"
extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h"
extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h"
@@ -558,6 +559,7 @@ rs6000*-*-*)
extra_options="${extra_options} g.opt fused-madd.opt 
rs6000/rs6000-tables.opt"
extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
+   extra_objs="${extra_objs} rs6000-mem-fusion.o"
target_gtfiles="$target_gtfiles 
\$(srcdir)/config/rs6000/rs6000-logue.cc 
\$(srcdir)/config/rs6000/rs6000-call.cc"
target_gtfiles="$target_gtfiles 
\$(srcdir)/config/rs6000/rs6000-pcrel-opt.cc"
;;
diff --git a/gcc/config/rs6000/rs6000-mem-fusion.cc b/gcc/config/rs6000/rs6000-mem-fusion.cc
new file mode 100644
index 000..3522582f6fb
--- /dev/null
+++ b/gcc/config/rs6000/rs6000-mem-fusion.cc
@@ -0,0 +1,697 @@
+/* Subroutines used to replace lxv with lxvp
+   for TARGET_POWER10 and TARGET_VSX,
+
+   Copyright (C) 2024 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   .  */
+
+#define IN_TARGET_CODE 1
+#define

[wwwdocs] Reverse development timeline graph

2024-03-13 Thread Jonathan Wakely

Every year I have to scroll down further and further to the useful part,
and I'm getting too old to spend my time doing that! :)

I suggested this on IRC and iains agreed. What do others think?

-- >8 --

This seems more useful with the recent history first.
---
 htdocs/develop.html | 819 ++--
 1 file changed, 411 insertions(+), 408 deletions(-)

diff --git a/htdocs/develop.html b/htdocs/develop.html
index 702256cf..f741bd4a 100644
--- a/htdocs/develop.html
+++ b/htdocs/develop.html
@@ -298,421 +298,424 @@ number carried little to no useful information.
 
 Release Timeline
 
-Here is a history of recent and a tentative timeline of upcoming
+Here is a history of releases and a tentative timeline of upcoming
 stages of development, branch points, and releases:
 
 
 
-  ... former releases ...
-   |
-   +-- GCC 3.0 branch created --+
-   |  (Feb 12 2001)  \
-   |  v
-   v   GCC 3.0 release (Jun 18 2001)
-  New development plan announced\
-   |  (Jul 18 2001)  v
-   |   GCC 3.0.1 release (Aug 20 2001)
-   |   \
-   vv
-  GCC 3.1 Stage 1 (ended Oct 15 2001)  GCC 3.0.2 release (Oct 25 2001)
-   |  \
-   v   v
-  GCC 3.1 Stage 2 (ended Dec 19 2001)  GCC 3.0.3 release (Dec 20 2001)
-   | \
-   v  v
-  GCC 3.1 Stage 3 (ended Feb 26 2002)  GCC 3.0.4 release (Feb 20 2002)
-   |
-   +-- GCC 3.1 branch created --+
-   | \
-   |  v
-   v   GCC 3.1 release (May 15 2002)
-  GCC 3.2 Stage 1 (ended Jun 22 2002)   \
-   | v
-   |   GCC 3.1.1 release (Jul 25 2002)
-   |   \
-   vv
-  New development plan announced   Branch renamed to GCC 3.2 to
-   |  (Jul 14 2002)accommodate for C++ ABI fixes
-   |   (C++ binary incompatible with
-   |   GCC 3.1, see release info)
-   | \
-   |  v
-   |   GCC 3.2 release (Aug 14 2002)
-   |\
-   | v
-   |   GCC 3.2.1 release (Nov 19 2002)
-   |   \
-   |v
-   |   GCC 3.2.2 release (Feb 05 2003)
-   |  \
-   |   v
-   |   GCC 3.2.3 release (April 22 2003)
-   v
-  GCC 3.3 Stage 2 (ends Aug 15 2002)
-   |
-   v
-  GCC 3.3 Stage 3 (ends Oct 15 2002)
-   |
-   +-- GCC 3.3 branch created --+
-   |(Dec 14 2002)\
-   |  v
-   |   GCC 3.3 release (May 13 2003)
-   |\
-   v v
-  GCC 3.4 Stage 1 (ends July 4 2003)   GCC 3.3.1 release (Aug 8 2003)
-   |   \
-   vv
-  GCC 3.4 Stage 2 (ends October 15 2003)   GCC 3.3.2 release (Oct 17 2003)
-   |  \
-   v   v
-  GCC 3.4 Stage 3  GCC 3.3.3 release (Feb 14 2004)
-   | \
-   |  v
-   |   GCC 3.3.4 release (May 31 2004)
-   |\
-   | v
-   |   GCC 3.3.5 release (Sep 30 2004)
-   |   \
-   |v
-   |   GCC 3.3.6 release (May 03 2005)
-   |
-   +-- GCC 3.4 branch created --+
-   |(Jan 16 2004)\
-   |  v
-  Tree SSA infrastructure  GCC 3.4.0 release (Apr 18

[PATCH V2 0/1] rs6000: Load store fusion for rs6000 target using common infrastructure

2024-03-13 Thread Ajit Agarwal



Hello All:

Common infrastructure using generic code for load store fusion of rs6000
target.

This is patch 0 of the split series; it provides the generic code, which
can be used in target-specific code for the aarch64 and rs6000 targets.

Generic code is implemented in gcc/pair-fusion-base.h,
gcc/pair-fusion-common.cc and gcc/pair-fusion.cc.

Code is implemented with pure virtual functions to interface with target
code.

Target-specific code is added in rs6000-mem-fusion.cc, which uses the generic code.

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit


rs6000: Load store fusion for rs6000 target using common infrastructure

Common infrastructure using generic code for load store fusion of rs6000
target.

Generic code is implemented and defined so that it can be used in
target-specific code for the aarch64 and rs6000 targets.

Generic code is implemented in gcc/pair-fusion-base.h,
gcc/pair-fusion-common.cc and gcc/pair-fusion.cc.

Code is implemented with pure virtual functions to interface with target
code.

Target-specific code is added in rs6000-mem-fusion.cc, which uses the generic code.

2024-03-13  Ajit Kumar Agarwal  

gcc/ChangeLog:

* pair-fusion-base.h: Generic header code for load store fusion
that can be shared across different architectures.
* pair-fusion-common.cc: Generic source code for load store
fusion that can be shared across different architectures.
* pair-fusion.cc: Generic implementation of pair_fusion class
defined in pair-fusion-base.h
* Makefile.in: Add new executable pair-fusion.o and
pair-fusion-common.o.
---
 gcc/Makefile.in   |2 +
 gcc/pair-fusion-base.h|  613 ++
 gcc/pair-fusion-common.cc | 1200 
 gcc/pair-fusion.cc| 1230 +
 4 files changed, 3045 insertions(+)
 create mode 100644 gcc/pair-fusion-base.h
 create mode 100644 gcc/pair-fusion-common.cc
 create mode 100644 gcc/pair-fusion.cc

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index a74761b7ab3..df5061ddfe7 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1563,6 +1563,8 @@ OBJS = \
ipa-strub.o \
ipa.o \
ira.o \
+   pair-fusion-common.o \
+   pair-fusion.o \
ira-build.o \
ira-costs.o \
ira-conflicts.o \
diff --git a/gcc/pair-fusion-base.h b/gcc/pair-fusion-base.h
new file mode 100644
index 000..0d9b5db12be
--- /dev/null
+++ b/gcc/pair-fusion-base.h
@@ -0,0 +1,613 @@
+// Generic code for Pair MEM  fusion optimization pass.
+// Copyright (C) 2024 Free Software Foundation, Inc.
+//
+// This file is part of GCC.
+//
+// GCC is free software; you can redistribute it and/or modify it
+// under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// GCC is distributed in the hope that it will be useful, but
+// WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+// General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with GCC; see the file COPYING3.  If not see
+// .
+
+#ifndef GCC_PAIR_FUSION_H
+#define GCC_PAIR_FUSION_H
+#define INCLUDE_ALGORITHM
+#define INCLUDE_FUNCTIONAL
+#define INCLUDE_LIST
+#define INCLUDE_TYPE_TRAITS
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "rtl.h"
+#include "df.h"
+#include "rtl-iter.h"
+#include "rtl-ssa.h"
+#include "cfgcleanup.h"
+#include "tree-pass.h"
+#include "ordered-hash-map.h"
+#include "tree-dfa.h"
+#include "fold-const.h"
+#include "tree-hash-traits.h"
+#include "print-tree.h"
+#include "insn-attr.h"
+using namespace rtl_ssa;
+// We pack these fields (load_p, fpsimd_p, and size) into an integer
+// (LFS) which we use as part of the key into the main hash tables.
+//
+// The idea is that we group candidates together only if they agree on
+// the fields below.  Candidates that disagree on any of these
+// properties shouldn't be merged together.
+struct lfs_fields
+{
+  bool load_p;
+  bool fpsimd_p;
+  unsigned size;
+};
+
+using insn_list_t = std::list;
+using insn_iter_t = insn_list_t::iterator;
+
+// Information about the accesses at a given offset from a particular
+// base.  Stored in an access_group, see below.
+struct access_record
+{
+  poly_int64 offset;
+  std::list<insn_info *> cand_insns;
+  std::list<access_record>::iterator place;
+
+  access_record (poly_int64 off) : offset (off) {}
+};
+
+// A group of accesses where adjacent accesses could be ldp/stp
+// candidates.  The splay tree supports efficient insertion,
+// while the list supports efficient iteration.
+struct access_group
+{
+  splay_tree<access_record *> tree;
+  std::list<access_record> list;
+
+  template<typename Alloc>
+  inline void track (Alloc node_alloc, poly_int64 offset,

[PATCH V1 1/1] rs6000: Load store fusion for rs6000 target using common infrastructure

2024-03-13 Thread Ajit Agarwal

Hello All:

This patch adds load/store fusion for the rs6000 target on top of common,
target-independent infrastructure.

The generic code is written so that it can be used by the target-specific
code for both the aarch64 and rs6000 targets.

The generic code is implemented in gcc/pair-fusion-base.h,
gcc/pair-fusion-common.cc and gcc/pair-fusion.cc.

The generic code interfaces with the target code through pure virtual
functions.

The target-specific code is added in rs6000-mem-fusion.cc, which uses the
generic code.
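
To illustrate the split between generic and target code described above, here
is a minimal sketch of the pure-virtual-hook arrangement.  It is not taken from
the patch: pair_fusion_sketch, toy_target_fusion, access_size_ok and
max_pair_offset are hypothetical names, and the pairing rule is deliberately
simplistic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical generic driver: everything target-neutral lives here.
struct pair_fusion_sketch
{
  virtual ~pair_fusion_sketch () = default;

  // Hypothetical target hooks: each port must override these.
  virtual bool access_size_ok (unsigned size) const = 0;
  virtual long max_pair_offset () const = 0;

  // Generic logic: count adjacent accesses that the target accepts as a
  // pair.  OFFSETS must be sorted in increasing order.
  unsigned count_pairs (const std::vector<long> &offsets, unsigned size) const
  {
    if (!access_size_ok (size))
      return 0;
    unsigned pairs = 0;
    for (std::size_t i = 0; i + 1 < offsets.size (); )
      {
        bool adjacent = offsets[i + 1] - offsets[i] == (long) size;
        if (adjacent && offsets[i] <= max_pair_offset ())
          {
            pairs++;
            i += 2;  // Both accesses are consumed by the pair.
          }
        else
          i++;
      }
    return pairs;
  }
};

// Hypothetical target: only 16-byte accesses may be paired.
struct toy_target_fusion : pair_fusion_sketch
{
  bool access_size_ok (unsigned size) const override { return size == 16; }
  long max_pair_offset () const override { return 1024; }
};

// Usage: the generic code only ever sees the base-class interface.
inline void toy_fusion_usage ()
{
  toy_target_fusion target;
  const pair_fusion_sketch &generic = target;
  assert (generic.count_pairs ({0, 16, 32, 64, 80}, 16) == 2);
}
```

The point of the arrangement is that count_pairs never names a target; a second
port would only add another small derived class.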

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit

rs6000: Load store fusion for rs6000 target using common infrastructure

This patch adds load/store fusion for the rs6000 target on top of common,
target-independent infrastructure.

The generic code is written so that it can be used by the target-specific
code for both the aarch64 and rs6000 targets.

The generic code is implemented in gcc/pair-fusion-base.h,
gcc/pair-fusion-common.cc and gcc/pair-fusion.cc.

The generic code interfaces with the target code through pure virtual
functions.

The target-specific code is added in rs6000-mem-fusion.cc, which uses the
generic code.

2024-03-13  Ajit Kumar Agarwal  

gcc/ChangeLog:

* config/rs6000/rs6000-passes.def: New mem fusion pass
before pass_early_remat.
* config/rs6000/rs6000-mem-fusion.cc: Add new pass.
Add target specific implementation using pure virtual
functions.
* config.gcc: Add new executable.
* config/rs6000/rs6000-protos.h: Add new prototype for mem
fusion pass.
* config/rs6000/rs6000.cc: Add new prototype for mem fusion
pass.
* config/rs6000/t-rs6000: Add new rule.
* rtl-ssa/accesses.h: Moved set_is_live_out_use as public
from private.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/vecload-fusion.C: New test.
* g++.target/powerpc/vecload-fusion_1.C: New test.
* gcc.target/powerpc/mma-builtin-1.c: Modify test.
---
 gcc/config.gcc|   2 +
 gcc/config/rs6000/rs6000-mem-fusion.cc| 704 ++
 gcc/config/rs6000/rs6000-passes.def   |   4 +-
 gcc/config/rs6000/rs6000-protos.h |   1 +
 gcc/config/rs6000/rs6000.cc   |   1 +
 gcc/config/rs6000/t-rs6000|   5 +
 gcc/rtl-ssa/accesses.h|   2 +-
 .../g++.target/powerpc/mem-fusion-1.C |  22 +
 gcc/testsuite/g++.target/powerpc/mem-fusion.C |  15 +
 .../gcc.target/powerpc/mma-builtin-1.c|   4 +-
 10 files changed, 756 insertions(+), 4 deletions(-)
 create mode 100644 gcc/config/rs6000/rs6000-mem-fusion.cc
 create mode 100644 gcc/testsuite/g++.target/powerpc/mem-fusion-1.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/mem-fusion.C

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 624e0dae191..52ecd66dcc6 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -522,6 +522,7 @@ powerpc*-*-*)
extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
extra_objs="${extra_objs} rs6000-builtins.o rs6000-builtin.o"
+   extra_objs="${extra_objs} rs6000-mem-fusion.o"
extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h"
extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h"
@@ -558,6 +559,7 @@ rs6000*-*-*)
extra_options="${extra_options} g.opt fused-madd.opt 
rs6000/rs6000-tables.opt"
extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
+   extra_objs="${extra_objs} rs6000-mem-fusion.o"
target_gtfiles="$target_gtfiles 
\$(srcdir)/config/rs6000/rs6000-logue.cc 
\$(srcdir)/config/rs6000/rs6000-call.cc"
target_gtfiles="$target_gtfiles 
\$(srcdir)/config/rs6000/rs6000-pcrel-opt.cc"
;;
diff --git a/gcc/config/rs6000/rs6000-mem-fusion.cc 
b/gcc/config/rs6000/rs6000-mem-fusion.cc
new file mode 100644
index 000..3a92d5be61c
--- /dev/null
+++ b/gcc/config/rs6000/rs6000-mem-fusion.cc
@@ -0,0 +1,704 @@
+/* Subroutines used to replace lxv with lxvp
+   for TARGET_POWER10 and TARGET_VSX,
+
+   Copyright (C) 2020-2023 Free Software Foundation, Inc.
+   Contributed by Ajit Kumar Agarwal .
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.

[PATCH] match.pd: Only merge truncation with conversion for -fno-signed-zeros

2024-03-13 Thread Joe Ramsay

This optimisation does not honour signed zeros, so should not be
enabled except with -fno-signed-zeros.
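
As a small illustration of why signed zeros matter here (a sketch assuming
IEEE 754 floating point and default compiler flags; via_int and via_trunc are
names made up for this example): converting through an integer loses the sign
of a zero result, while a floating-point truncation keeps it.

```cpp
#include <cassert>
#include <cmath>

// Round trip through an integer: float -> int -> float.
inline float via_int (float x) { return (float) (int) x; }

// What a merged form would compute: one floating-point truncation.
inline float via_trunc (float x) { return std::trunc (x); }

inline void signed_zero_usage ()
{
  // Both forms agree on the numeric value, since -0.0 == +0.0 ...
  assert (via_int (-0.5f) == via_trunc (-0.5f));
  // ... but only the truncation preserves the sign of the zero.
  assert (!std::signbit (via_int (-0.5f)));
  assert (std::signbit (via_trunc (-0.5f)));
}
```

The two forms are therefore interchangeable only when the sign of zero is
allowed to be ignored.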

OK for master? I do not have commit rights for GCC, so if the patch
is fine would someone be able to commit for me? The bug is present
in all GCC versions from 12.1.0 onwards - is it possible to backport
this?

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Thanks,
Joe

gcc/ChangeLog:

* match.pd: Fix truncation pattern for -fno-signed-zeros.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/no_merge_trunc_signed_zero.c: New test.
---
 gcc/match.pd  |  2 +-
 .../aarch64/no_merge_trunc_signed_zero.c  | 24 +++
 2 files changed, 25 insertions(+), 1 deletion(-)
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/no_merge_trunc_signed_zero.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 9ce313323a3..45c34c810cf 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4857,7 +4857,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 #if GIMPLE
 (simplify
(float (fix_trunc @0))
-   (if (!flag_trapping_math
+   (if (!flag_trapping_math && !HONOR_SIGNED_ZEROS(type)
&& types_match (type, TREE_TYPE (@0))
&& direct_internal_fn_supported_p (IFN_TRUNC, type,
  OPTIMIZE_FOR_BOTH))
diff --git a/gcc/testsuite/gcc.target/aarch64/no_merge_trunc_signed_zero.c 
b/gcc/testsuite/gcc.target/aarch64/no_merge_trunc_signed_zero.c
new file mode 100644
index 000..b2c93e55567
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/no_merge_trunc_signed_zero.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-trapping-math -fsigned-zeros" } */
+
+#include <math.h>
+
+float
+f1 (float x)
+{
+  return (int) rintf(x);
+}
+
+double
+f2 (double x)
+{
+  return (long) rint(x);
+}
+
+/* { dg-final { scan-assembler "frintx\\ts\[0-9\]+, s\[0-9\]+" } } */
+/* { dg-final { scan-assembler "cvtzs\\ts\[0-9\]+, s\[0-9\]+" } } */
+/* { dg-final { scan-assembler "scvtf\\ts\[0-9\]+, s\[0-9\]+" } } */
+/* { dg-final { scan-assembler "frintx\\td\[0-9\]+, d\[0-9\]+" } } */
+/* { dg-final { scan-assembler "cvtzs\\td\[0-9\]+, d\[0-9\]+" } } */
+/* { dg-final { scan-assembler "scvtf\\td\[0-9\]+, d\[0-9\]+" } } */
+
-- 
2.27.0

[PATCH V1 0/1] rs6000: Load store fusion for rs6000 target using common infrastructure

2024-03-13 Thread Ajit Agarwal

Hello All:

This series adds load/store fusion for the rs6000 target on top of common,
target-independent infrastructure.

This is split patch 0, which implements and defines the generic code that can
be used by the target-specific code for both the aarch64 and rs6000 targets.

The generic code is implemented in gcc/pair-fusion-base.h,
gcc/pair-fusion-common.cc and gcc/pair-fusion.cc.

The generic code interfaces with the target code through pure virtual
functions.

The target-specific code is added in rs6000-mem-fusion.cc, which uses the
generic code.

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit

rs6000: Load store fusion for rs6000 target using common infrastructure

This series adds load/store fusion for the rs6000 target on top of common,
target-independent infrastructure.

The generic code is written so that it can be used by the target-specific
code for both the aarch64 and rs6000 targets.

The generic code is implemented in gcc/pair-fusion-base.h,
gcc/pair-fusion-common.cc and gcc/pair-fusion.cc.

The generic code interfaces with the target code through pure virtual
functions.

The target-specific code is added in rs6000-mem-fusion.cc, which uses the
generic code.

2024-03-13  Ajit Kumar Agarwal  

gcc/ChangeLog:

* pair-fusion-base.h: Generic header code for load store fusion
that can be shared across different architectures.
* pair-fusion-common.cc: Generic source code for load store
fusion that can be shared across different architectures.
* pair-fusion.cc: Generic implementation of pair_fusion class
defined in pair-fusion-base.h
* Makefile.in: Add new executable pair-fusion.o and
pair-fusion-common.o.
---
 gcc/Makefile.in   |2 +
 gcc/pair-fusion-base.h|  618 +++
 gcc/pair-fusion-common.cc | 1204 
 gcc/pair-fusion.cc| 1232 +
 4 files changed, 3056 insertions(+)
 create mode 100644 gcc/pair-fusion-base.h
 create mode 100644 gcc/pair-fusion-common.cc
 create mode 100644 gcc/pair-fusion.cc

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index a74761b7ab3..df5061ddfe7 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1563,6 +1563,8 @@ OBJS = \
ipa-strub.o \
ipa.o \
ira.o \
+   pair-fusion-common.o \
+   pair-fusion.o \
ira-build.o \
ira-costs.o \
ira-conflicts.o \
diff --git a/gcc/pair-fusion-base.h b/gcc/pair-fusion-base.h
new file mode 100644
index 000..53393c1f823
--- /dev/null
+++ b/gcc/pair-fusion-base.h
@@ -0,0 +1,618 @@
+// Generic code for Pair MEM  fusion optimization pass.
+// Copyright (C) 2024 Free Software Foundation, Inc.
+//
+// This file is part of GCC.
+//
+// GCC is free software; you can redistribute it and/or modify it
+// under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// GCC is distributed in the hope that it will be useful, but
+// WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+// General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License
+// along with GCC; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+#ifndef GCC_PAIR_FUSION_H
+#define GCC_PAIR_FUSION_H
+#define INCLUDE_ALGORITHM
+#define INCLUDE_FUNCTIONAL
+#define INCLUDE_LIST
+#define INCLUDE_TYPE_TRAITS
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "rtl.h"
+#include "df.h"
+#include "rtl-iter.h"
+#include "rtl-ssa.h"
+#include "cfgcleanup.h"
+#include "tree-pass.h"
+#include "ordered-hash-map.h"
+#include "tree-dfa.h"
+#include "fold-const.h"
+#include "tree-hash-traits.h"
+#include "print-tree.h"
+#include "insn-attr.h"
+using namespace rtl_ssa;
+// We pack these fields (load_p, fpsimd_p, and size) into an integer
+// (LFS) which we use as part of the key into the main hash tables.
+//
+// The idea is that we group candidates together only if they agree on
+// the fields below.  Candidates that disagree on any of these
+// properties shouldn't be merged together.
+struct lfs_fields
+{
+  bool load_p;
+  bool fpsimd_p;
+  unsigned size;
+};
+
+using insn_list_t = std::list<insn_info *>;
+using insn_iter_t = insn_list_t::iterator;
+
+// Information about the accesses at a given offset from a particular
+// base.  Stored in an access_group, see below.
+struct access_record
+{
+  poly_int64 offset;
+  std::list<insn_info *> cand_insns;
+  std::list<access_record>::iterator place;
+
+  access_record (poly_int64 off) : offset (off) {}
+};
+
+// A group of accesses where adjacent accesses could be ldp/stp
+// candidates.  The splay tree supports efficient insertion,
+// while the list supports efficient iteration.
+struct access_group
+{
+  splay_tree<access_record *> tree;
+  std::list<access_record> list;
+
+  template<typename Alloc>
+  inline void track (Alloc node_alloc, poly_int64 offset, insn_info

[PATCH 3/3] bpf: Corrected index computation when present with unnamed struct fields

2024-03-13 Thread Cupertino Miranda

An unnamed structure field is not a member of the BTF_KIND_STRUCT.
For that reason, CO-RE access string indexes should take that into
consideration. This patch adds a condition to the incrementer that
computes the index for the field access.
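
The index computation can be sketched as follows.  This is only an
illustration: field_sketch and btf_member_index are hypothetical stand-ins for
the FIELD_DECL walk in bpf_core_get_index, and the field layout in the usage
function is made up to mirror the shape of the testcase.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a FIELD_DECL: only the properties the check needs.
struct field_sketch
{
  const char *name;   // nullptr for an unnamed field
  bool is_aggregate;  // true for struct- or union-typed fields
};

// Return the member index of fields[target], skipping unnamed
// non-aggregate fields (for example padding bit-fields such as "int : 31").
// Returns -1 if TARGET is out of range.
inline int btf_member_index (const std::vector<field_sketch> &fields,
                             std::size_t target)
{
  int index = 0;
  for (std::size_t i = 0; i < fields.size (); i++)
    {
      if (i == target)
        return index;
      // Mirror of the new condition: only count fields that are
      // represented as members.
      if (fields[i].name != nullptr || fields[i].is_aggregate)
        index++;
    }
  return -1;
}

inline void btf_member_index_usage ()
{
  // Made-up layout: ..., char c; char d; int a : 1; int : 31; int f;
  std::vector<field_sketch> fields = {
    {"x", false}, {"s", true}, {"c", false}, {"d", false},
    {"a", false}, {nullptr, false}, {"f", false}
  };
  // "f" is the seventh declaration but the sixth member, so its index is 5.
  assert (btf_member_index (fields, 6) == 5);
}
```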

gcc/ChangeLog:
* config/bpf/core-builtins.cc (bpf_core_get_index): Check if
field contains a DECL_NAME.

gcc/testsuite/ChangeLog:
* gcc.target/bpf/core-builtin-fieldinfo-offset-1.c: Add
testcase for unnamed fields.
---
 gcc/config/bpf/core-builtins.cc|  6 +-
 .../gcc.target/bpf/core-builtin-fieldinfo-offset-1.c   | 10 --
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/gcc/config/bpf/core-builtins.cc b/gcc/config/bpf/core-builtins.cc
index 70b14e48e6e5..8333ad81d0e0 100644
--- a/gcc/config/bpf/core-builtins.cc
+++ b/gcc/config/bpf/core-builtins.cc
@@ -553,7 +553,11 @@ bpf_core_get_index (const tree node, bool *valid)
{
  if (l == node)
return i;
- i++;
+ /* Skip unnamed padding, not represented by BTF.  */
+ if (DECL_NAME(l) != NULL_TREE
+ || TREE_CODE (TREE_TYPE (l)) == UNION_TYPE
+ || TREE_CODE (TREE_TYPE (l)) == RECORD_TYPE)
+   i++;
}
 }
   else if (code == ARRAY_REF || code == ARRAY_RANGE_REF || code == MEM_REF)
diff --git a/gcc/testsuite/gcc.target/bpf/core-builtin-fieldinfo-offset-1.c 
b/gcc/testsuite/gcc.target/bpf/core-builtin-fieldinfo-offset-1.c
index 27654205287d..8b1d8b012a2a 100644
--- a/gcc/testsuite/gcc.target/bpf/core-builtin-fieldinfo-offset-1.c
+++ b/gcc/testsuite/gcc.target/bpf/core-builtin-fieldinfo-offset-1.c
@@ -14,6 +14,9 @@ struct T {
   struct S s[2];
   char c;
   char d;
+  int a: 1;
+  int:31;
+  int f;
 };
 
 enum {
@@ -38,7 +41,9 @@ unsigned int foo (struct T *t)
   unsigned e1 = __builtin_preserve_field_info (bar()->d, FIELD_BYTE_OFFSET);
   unsigned e2 = __builtin_preserve_field_info (bar()->s[1].a4, 
FIELD_BYTE_OFFSET);
 
-  return s0a1 + s0a4 + s0x + s1a1 + s1a4 + s1x + c + d + e1 + e2;
+  unsigned f1 = __builtin_preserve_field_info (t->f, FIELD_BYTE_OFFSET);
+
+  return s0a1 + s0a4 + s0x + s1a1 + s1a4 + s1x + c + d + e1 + e2 + f1;
 }
 
 /* { dg-final { scan-assembler-times "\[\t \]mov\[\t \]%r\[0-9\],4" 2 } } */
@@ -65,5 +70,6 @@ unsigned int foo (struct T *t)
 /* { dg-final { scan-assembler-times "bpfcr_astr_off \\(\"0:1:1:4\"\\)" 1 } } 
*/
 /* { dg-final { scan-assembler-times "bpfcr_astr_off \\(\"0:2\"\\)" 1 } } */
 /* { dg-final { scan-assembler-times "bpfcr_astr_off \\(\"0:3\"\\)" 2 } } */
+/* { dg-final { scan-assembler-times "bpfcr_astr_off \\(\"0:5\"\\)" 1 } } */
 
-/* { dg-final { scan-assembler-times "0\[\t \]+\[^\n\]*bpfcr_kind" 10 } } */
+/* { dg-final { scan-assembler-times "0\[\t \]+\[^\n\]*bpfcr_kind" 11 } } */
-- 
2.30.2

[PATCH 2/3] bpf: Fix access string default for CO-RE type based relocations

2024-03-13 Thread Cupertino Miranda

Although part of all CO-RE relocation data, type based relocations do
not require an access string.
Initial implementation defined it as an empty string.
On the other hand, libbpf when parsing the CO-RE relocations verifies
that those strings would contain "0", otherwise reports an error.
This patch makes GCC compliant with libbpf expectations.

gcc/ChangeLog:
* config/bpf/btfext-out.cc (bpf_core_reloc_add): Correct for new code.
Add assert to validate the string is set.
* config/bpf/core-builtins.cc (cr_final): Make string struct
field as const.
(process_enum_value): Correct for field type change.
(process_type): Set access string to "0".

gcc/testsuite/ChangeLog:
* gcc.target/bpf/core-builtin-type-based.c: Correct.
* gcc.target/bpf/core-builtin-type-id.c: Correct.
---
 gcc/config/bpf/btfext-out.cc   |  5 +++--
 gcc/config/bpf/core-builtins.cc| 10 ++
 gcc/testsuite/gcc.target/bpf/core-builtin-type-based.c |  1 +
 gcc/testsuite/gcc.target/bpf/core-builtin-type-id.c|  1 +
 4 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/gcc/config/bpf/btfext-out.cc b/gcc/config/bpf/btfext-out.cc
index 57c0dc323812..ff1fd0739f1e 100644
--- a/gcc/config/bpf/btfext-out.cc
+++ b/gcc/config/bpf/btfext-out.cc
@@ -299,8 +299,9 @@ bpf_core_reloc_add (const tree type, const char * 
section_name,
 
   /* Buffer the access string in the auxiliary strtab.  */
   bpfcr->bpfcr_astr_off = 0;
-  if (accessor != NULL)
-bpfcr->bpfcr_astr_off = btf_ext_add_string (accessor);
+  gcc_assert (accessor != NULL);
+  bpfcr->bpfcr_astr_off = btf_ext_add_string (accessor);
+
   bpfcr->bpfcr_type = get_btf_id (ctf_lookup_tree_type (ctfc, type));
   bpfcr->bpfcr_insn_label = label;
   bpfcr->bpfcr_kind = kind;
diff --git a/gcc/config/bpf/core-builtins.cc b/gcc/config/bpf/core-builtins.cc
index 4256fea15e49..70b14e48e6e5 100644
--- a/gcc/config/bpf/core-builtins.cc
+++ b/gcc/config/bpf/core-builtins.cc
@@ -205,7 +205,7 @@ struct cr_local
 /* Core Relocation Final data */
 struct cr_final
 {
-  char *str;
+  const char *str;
   tree type;
   enum btf_core_reloc_kind kind;
 };
@@ -868,8 +868,10 @@ process_enum_value (struct cr_builtins *data)
{
  if (TREE_VALUE (l) == expr)
{
- ret.str = (char *) ggc_alloc_atomic ((index / 10) + 1);
- sprintf (ret.str, "%d", index);
+ char *tmp = (char *) ggc_alloc_atomic ((index / 10) + 1);
+ sprintf (tmp, "%d", index);
+ ret.str = (const char *) tmp;
+
  break;
}
  index++;
@@ -987,7 +989,7 @@ process_type (struct cr_builtins *data)
  || data->kind == BPF_RELO_TYPE_MATCHES);
 
   struct cr_final ret;
-  ret.str = NULL;
+  ret.str = ggc_strdup ("0");
   ret.type = data->type;
   ret.kind = data->kind;
 
diff --git a/gcc/testsuite/gcc.target/bpf/core-builtin-type-based.c 
b/gcc/testsuite/gcc.target/bpf/core-builtin-type-based.c
index 74a8d5a14d9d..9d818133c084 100644
--- a/gcc/testsuite/gcc.target/bpf/core-builtin-type-based.c
+++ b/gcc/testsuite/gcc.target/bpf/core-builtin-type-based.c
@@ -56,3 +56,4 @@ int foo(void *data)
 /* { dg-final { scan-assembler-times "0x8\[\t \]+\[^\n\]*bpfcr_kind" 13 } } 
BPF_TYPE_EXISTS */
 /* { dg-final { scan-assembler-times "0x9\[\t \]+\[^\n\]*bpfcr_kind" 11 } } 
BPF_TYPE_SIZE */
 /* { dg-final { scan-assembler-times "0xc\[\t \]+\[^\n\]*bpfcr_kind" 13 } } 
BPF_TYPE_MATCHES */
+/* { dg-final { scan-assembler-times "bpfcr_astr_off \[(\"\]+0\[(\"\]+" 37 } } 
*/
diff --git a/gcc/testsuite/gcc.target/bpf/core-builtin-type-id.c 
b/gcc/testsuite/gcc.target/bpf/core-builtin-type-id.c
index 4b23288eac08..9576b91bc940 100644
--- a/gcc/testsuite/gcc.target/bpf/core-builtin-type-id.c
+++ b/gcc/testsuite/gcc.target/bpf/core-builtin-type-id.c
@@ -38,3 +38,4 @@ int foo(void *data)
 /* { dg-final { scan-assembler-times "0\[\t \]+\[^\n\]*bpfcr_type" 0  { xfail 
*-*-* } } } */
 /* { dg-final { scan-assembler-times "0x6\[\t \]+\[^\n\]*bpfcr_kind" 13 } } 
BPF_TYPE_ID_LOCAL */
 /* { dg-final { scan-assembler-times "0x7\[\t \]+\[^\n\]*bpfcr_kind" 7 } } 
BPF_TYPE_ID_TARGET */
+/* { dg-final { scan-assembler-times "bpfcr_astr_off \[(\"\]+0\[(\"\]+" 20 } } 
*/
-- 
2.30.2

[PATCH 1/3] bpf: Fix CO-RE field expression builtins

2024-03-13 Thread Cupertino Miranda

This patch corrects bugs in the CO-RE field expression builtins.
The following bugs were identified and corrected based on the expected
results of the bpf-next selftests testsuite.
It addresses the following problems:
 - Expressions with pointer dereferencing now point to the BTF structure
   type, instead of the structure pointer type.
 - Pointer addition to the structure root is now identified and constructed
   in CO-RE relocations as if it were an array access. For example,
   "&(s+2)->b" generates "2:1" as an access string, where "2"
   refers to the access for "s+2".
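
As an illustration of the access string format used in the example above, the
accessor indexes are joined with ':'.  This is a minimal sketch;
make_access_string is a name invented here and is not a function in
core-builtins.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Join accessor indexes into a CO-RE style access string such as "2:1".
inline std::string make_access_string (const std::vector<unsigned> &accessors)
{
  std::string out;
  for (std::size_t i = 0; i < accessors.size (); i++)
    {
      if (i != 0)
        out += ':';
      out += std::to_string (accessors[i]);
    }
  return out;
}

inline void access_string_usage ()
{
  // "&(s+2)->b": index 2 for the "s+2" step, then member index 1 for "b".
  assert (make_access_string ({2, 1}) == "2:1");
}
```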

gcc/ChangeLog:
* config/bpf/core-builtins.cc (core_field_info): Add
support for POINTER_PLUS_EXPR in the root of the field expression.
(bpf_core_get_index): Likewise.
(pack_field_expr): Make the BTF type to point to the structure
related node, instead of its pointer type.
(make_core_safe_access_index): Correct to new code.

gcc/testsuite/ChangeLog:
* gcc.target/bpf/core-attr-5.c: Correct.
* gcc.target/bpf/core-attr-6.c: Likewise.
* gcc.target/bpf/core-attr-struct-as-array.c: Add test case for
pointer arithmetics as array access use case.
---
 gcc/config/bpf/core-builtins.cc   | 54 +++
 gcc/testsuite/gcc.target/bpf/core-attr-5.c|  4 +-
 gcc/testsuite/gcc.target/bpf/core-attr-6.c|  4 +-
 .../bpf/core-attr-struct-as-array.c   | 35 
 4 files changed, 82 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/bpf/core-attr-struct-as-array.c

diff --git a/gcc/config/bpf/core-builtins.cc b/gcc/config/bpf/core-builtins.cc
index 8d8c54c1fb3d..4256fea15e49 100644
--- a/gcc/config/bpf/core-builtins.cc
+++ b/gcc/config/bpf/core-builtins.cc
@@ -388,8 +388,8 @@ core_field_info (tree src, enum btf_core_reloc_kind kind)
 
   src = root_for_core_field_info (src);
 
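-  get_inner_reference (src, &bitsize, &bitpos, &var_off, &mode, &unsignedp,
-  &reversep, &volatilep);
+  tree root = get_inner_reference (src, &bitsize, &bitpos, &var_off, &mode,
+  &unsignedp, &reversep, &volatilep);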
 
   /* Note: Use DECL_BIT_FIELD_TYPE rather than DECL_BIT_FIELD here, because it
  remembers whether the field in question was originally declared as a
@@ -414,6 +414,23 @@ core_field_info (tree src, enum btf_core_reloc_kind kind)
 {
 case BPF_RELO_FIELD_BYTE_OFFSET:
   {
+   result = 0;
+   if (var_off == NULL_TREE
+   && TREE_CODE (root) == INDIRECT_REF
+   && TREE_CODE (TREE_OPERAND (root, 0)) == POINTER_PLUS_EXPR)
+ {
+   tree node = TREE_OPERAND (root, 0);
+   tree offset = TREE_OPERAND (node, 1);
+   tree type = TREE_TYPE (TREE_OPERAND (node, 0));
+   type = TREE_TYPE (type);
+
+   gcc_assert (TREE_CODE (offset) == INTEGER_CST && tree_fits_shwi_p 
(offset)
+   && COMPLETE_TYPE_P (type) && tree_fits_shwi_p (TYPE_SIZE 
(type)));
+
+   HOST_WIDE_INT offset_i = tree_to_shwi (offset);
+   result += offset_i;
+ }
+
type = unsigned_type_node;
if (var_off != NULL_TREE)
  {
@@ -422,9 +439,9 @@ core_field_info (tree src, enum btf_core_reloc_kind kind)
  }
 
if (bitfieldp)
- result = start_bitpos / 8;
+ result += start_bitpos / 8;
else
- result = bitpos / 8;
+ result += bitpos / 8;
   }
   break;
 
@@ -552,6 +569,7 @@ bpf_core_get_index (const tree node, bool *valid)
 {
   tree offset = TREE_OPERAND (node, 1);
   tree type = TREE_TYPE (TREE_OPERAND (node, 0));
+  type = TREE_TYPE (type);
 
   if (TREE_CODE (offset) == INTEGER_CST && tree_fits_shwi_p (offset)
  && COMPLETE_TYPE_P (type) && tree_fits_shwi_p (TYPE_SIZE (type)))
@@ -627,14 +645,18 @@ compute_field_expr (tree node, unsigned int *accessors,
 
   switch (TREE_CODE (node))
 {
-case ADDR_EXPR:
-  return 0;
 case INDIRECT_REF:
-  accessors[0] = 0;
-  return 1;
-case POINTER_PLUS_EXPR:
-  accessors[0] = bpf_core_get_index (node, valid);
-  return 1;
+  if (TREE_CODE (node = TREE_OPERAND (node, 0)) == POINTER_PLUS_EXPR)
+   {
+ accessors[0] = bpf_core_get_index (node, valid);
+ *access_node = TREE_OPERAND (node, 0);
+ return 1;
+   }
+  else
+   {
+ accessors[0] = 0;
+ return 1;
+   }
 case COMPONENT_REF:
   n = compute_field_expr (TREE_OPERAND (node, 0), accessors,
  valid,
@@ -660,6 +682,7 @@ compute_field_expr (tree node, unsigned int *accessors,
  access_node, false);
   return n;
 
+case ADDR_EXPR:
 case CALL_EXPR:
 case SSA_NAME:
 case VAR_DECL:
@@ -688,6 +711,9 @@ pack_field_expr (tree *args,
   tree access_node = NULL_TREE;
   tree type = NULL_TREE;
 
+  if (TREE_CODE (root) == ADDR_EXPR)
+root = TREE_OPERAND (root, 0);
+
   ret.reloc_decision = REPLACE_CREATE_RELOCATION;
 
   unsigned

Re: [PATCH v2] testsuite: xfail test for short_enums

2024-03-13 Thread Torbjorn SVENSSON


On 2024-03-12 14:21, Jason Merrill wrote:

On 3/11/24 06:23, Torbjörn SVENSSON wrote:

Changes compared to v1:
- Added reference to r14-6517-gb7e4a4c626e in dg-bogus comment
- Changed arm-*-* to short_enums in target selector
- Updated commit message to align with above changes


As the entire block generating the warning was removed in
r14-6517-gb7e4a4c626e, does it still make sense to add something to
trunk for the same line?
Do you want me to add the dg-bogus, but change "xfail" to "target" for
trunk?


Sounds good.


Pushed as basepoints/gcc-14-9452-g5a44e14eb4f




Is this patch ok for releases/gcc-13?


OK.


Pushed as releases/gcc-13.2.0-824-g1277f69b9b0

Kind regards,
Torbjörn

[committed] testsuite: target test for short_enums

2024-03-13 Thread Torbjörn SVENSSON

Committed the below as requested by Jason in the patch for releases/gcc-13.

--

On arm-none-eabi, the test case fails with the warning below on GCC 13:
.../null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c:63:65: warning: 
converting a packed 'enum obj_type' pointer (alignment 1) to a 'struct 
connection' pointer (alignment 4) may result in an unaligned pointer value 
[-Waddress-of-packed-member]

Add a dg-bogus to ensure that the warning is not reintroduced.

gcc/testsuite/ChangeLog:

* 
c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c:
Added dg-bogus with target on offending line for short_enums.

Signed-off-by: Torbjörn SVENSSON 
---
 .../null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/gcc/testsuite/c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c
 
b/gcc/testsuite/c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c
index c1c8e6f6a39..a37962f1d85 100644
--- 
a/gcc/testsuite/c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c
+++ 
b/gcc/testsuite/c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c
@@ -61,7 +61,7 @@ static inline enum obj_type obj_type(const enum obj_type *t)
 }
 static inline struct connection *__objt_conn(enum obj_type *t)
 {
- return ((struct connection *)(((char *)(t)) - ((long)&((struct connection 
*)0)->obj_type)));
+ return ((struct connection *)(((char *)(t)) - ((long)&((struct connection 
*)0)->obj_type))); /* { dg-bogus "may result in an unaligned pointer value" 
"Fixed in r14-6517-gb7e4a4c626e" { target short_enums } } */
 }
 static inline struct connection *objt_conn(enum obj_type *t)
 {
-- 
2.25.1

[PATCH] tree-ssa-sink: Improve code sinking pass

2024-03-13 Thread Ajit Agarwal

Hello Richard:

Currently, code sinking sinks code to the use points within a loop of the same
nesting depth. The following patch improves code sinking by placing the sunk
code at the beginning of the block, after the labels.

For example :

void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
  int l;
  l = a + b + c + d +e + f;
  if (a != 5)
{
  bar();
  j = l;
}
}

With this patch, code sinking does the following:

void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
  int l;

  if (a != 5)
{
  l = a + b + c + d +e + f;
  bar();
  j = l;
}
}

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards

tree-ssa-sink: Improve code sinking pass

Currently, code sinking sinks code to the use points within a loop of the same
nesting depth. The following patch improves code sinking by placing the sunk
code at the beginning of the block, after the labels.

2024-03-13  Ajit Kumar Agarwal  

gcc/ChangeLog:

PR tree-optimization/81953
* tree-ssa-sink.cc (statement_sink_location): Sink statements at
the beginning of the basic block after labels.

gcc/testsuite/ChangeLog:

PR tree-optimization/81953
* gcc.dg/tree-ssa/ssa-sink-21.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 +++
 gcc/tree-ssa-sink.cc|  7 ++-
 2 files changed, 17 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
new file mode 100644
index 000..d3b79ca5803
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+{
+  bar();
+  j = l;
+}
+}
+/* { dg-final { scan-tree-dump 
{l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index 880d6f70a80..1ec5c048fe7 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -208,7 +208,6 @@ select_best_block (basic_block early_bb,
 loop nest.  */
   temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
 }
-
   /* Placing a statement before a setjmp-like function would be invalid
  (it cannot be reevaluated when execution follows an abnormal edge).
  If we selected a block with abnormal predecessors, just punt.  */
@@ -430,6 +429,7 @@ statement_sink_location (gimple *stmt, basic_block frombb,
continue;
  break;
}
+
   use = USE_STMT (one_use);
 
   if (gimple_code (use) != GIMPLE_PHI)
@@ -439,10 +439,7 @@ statement_sink_location (gimple *stmt, basic_block frombb,
  if (sinkbb == frombb)
return false;
 
- if (sinkbb == gimple_bb (use))
-   *togsi = gsi_for_stmt (use);
- else
-   *togsi = gsi_after_labels (sinkbb);
+ *togsi = gsi_after_labels (sinkbb);
 
  return true;
}
-- 
2.39.3

[PATCH] LoongArch: Remove unused and incorrect "sge_" define_insn

2024-03-13 Thread Xi Ruoyao

If this insn were really used, we would have something like

slti $r4,$r0,$r5

in the code.  The assembler will reject it because slti wants 2
register operands and 1 immediate operand.  But we have not received any
bug report for this, which indicates that this define_insn is never used.

Note that do_store_flag (in expr.cc) is already converting x >= 1 to
x > 0 unconditionally, so this define_insn is indeed unused and we can
just remove it.
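
The equivalence behind that conversion can be checked directly for integer
operands.  The sketch below only illustrates the arithmetic fact and is not
code from expr.cc; the four function names are made up for this example.

```cpp
#include <cassert>
#include <climits>

// Signed comparison against 1 and its rewritten form.
inline bool ge_one (long x) { return x >= 1; }
inline bool gt_zero (long x) { return x > 0; }

// Unsigned variants of the same pair.
inline bool geu_one (unsigned long x) { return x >= 1; }
inline bool gtu_zero (unsigned long x) { return x > 0; }

inline void ge_one_usage ()
{
  // For integers there is no value strictly between 0 and 1, so the
  // two forms agree everywhere, including at the extremes.
  const long samples[] = { LONG_MIN, -1, 0, 1, 2, LONG_MAX };
  for (long x : samples)
    assert (ge_one (x) == gt_zero (x));

  const unsigned long usamples[] = { 0, 1, 2, ULONG_MAX };
  for (unsigned long x : usamples)
    assert (geu_one (x) == gtu_zero (x));
}
```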

gcc/ChangeLog:

* config/loongarch/loongarch.md (any_ge): Remove.
(sge_): Remove.
---

Not fully tested but should be obvious.  Ok for trunk?

 gcc/config/loongarch/loongarch.md | 10 --
 1 file changed, 10 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 525e1e82183..18fd9c1e7d5 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -517,7 +517,6 @@ (define_code_iterator equality_op [eq ne])
 ;; These code iterators allow the signed and unsigned scc operations to use
 ;; the same template.
 (define_code_iterator any_gt [gt gtu])
-(define_code_iterator any_ge [ge geu])
 (define_code_iterator any_lt [lt ltu])
 (define_code_iterator any_le [le leu])
 
@@ -3355,15 +3354,6 @@ (define_insn "*sgt_"
   [(set_attr "type" "slt")
(set_attr "mode" "")])
 
-(define_insn "*sge_"
-  [(set (match_operand:GPR 0 "register_operand" "=r")
-   (any_ge:GPR (match_operand:X 1 "register_operand" "r")
-(const_int 1)))]
-  ""
-  "slti\t%0,%.,%1"
-  [(set_attr "type" "slt")
-   (set_attr "mode" "")])
-
 (define_insn "*slt_"
   [(set (match_operand:GPR 0 "register_operand" "=r")
(any_lt:GPR (match_operand:X 1 "register_operand" "r")
-- 
2.44.0

Re: [PATCH v1] LoongArch: Remove masking process for operand 3 of xvpermi.q.

2024-03-13 Thread Xi Ruoyao

On Tue, 2024-03-12 at 09:56 +0800, Chenghui Pan wrote:
> The behavior of non-zero unused bits in xvpermi.q instruction's
> third operand is undefined on LoongArch, according to our
> discussion (https://github.com/llvm/llvm-project/pull/83540),
> we think that keeping original insn operand as unmodified
> state is better solution.
> 
> This patch partially reverts 7b158e036a95b1ab40793dd53bed7dbd770ffdaf.
> 
> gcc/ChangeLog:
> 
>   * config/loongarch/lasx.md: Remove masking of operand 3.

Add (lasx_xvpermi_q_) before ":".

> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/loongarch/vector/lasx/lasx-xvpermi_q.c:
>     Reposition operand 3's value into instruction's defined accept range.
^^

Remove these two white spaces.

Should be OK with these ChangeLog style issues fixed.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] vect: Call vect_convert_output with the right vecitype [PR114108]

2024-03-13 Thread Tejas Belagod


Ping!

On 3/7/24 4:14 PM, Tejas Belagod wrote:

This patch fixes a bug where vect_recog_abd_pattern called vect_convert_output
with the incorrect vecitype for the corresponding pattern_stmt.
vect_convert_output expects vecitype to be the vector form of the scalar type
of the LHS of pattern_stmt, but we were passing in the vector form of the LHS
of the new impending conversion statement.  This caused a skew in ABD's
pattern_stmt having the vectype of the following gimple pattern_stmt.

2024-03-06  Tejas Belagod  

gcc/ChangeLog:

PR middle-end/114108
* tree-vect-patterns.cc (vect_recog_abd_pattern): Call
vect_convert_output with the correct vecitype.

gcc/testsuite/ChangeLog:
* gcc.dg/vect/pr114108.c: New test.
---
  gcc/testsuite/gcc.dg/vect/pr114108.c | 19 +++
  gcc/tree-vect-patterns.cc|  5 ++---
  2 files changed, 21 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/vect/pr114108.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr114108.c b/gcc/testsuite/gcc.dg/vect/pr114108.c
new file mode 100644
index 000..b3075d41398
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr114108.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+
+#include "tree-vect.h"
+
+typedef signed char schar;
+
+__attribute__((noipa, noinline, optimize("O3")))
+void foo(const schar *a, const schar *b, schar *c, int n)
+{
+  for (int i = 0; i < n; i++)
+{
+  unsigned u = __builtin_abs (a[i] - b[i]);
+  c[i] = u <= 7U ? u : 7U;
+}
+}
+
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target aarch64*-*-* } } } */
+/* { dg-final { scan-tree-dump "vect_recog_abd_pattern: detected" "vect" { target aarch64*-*-* } } } */
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index d562f57920f..4f491c6b833 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -1576,9 +1576,8 @@ vect_recog_abd_pattern (vec_info *vinfo,
&& !TYPE_UNSIGNED (abd_out_type))
  {
tree unsign = unsigned_type_for (abd_out_type);
-  tree unsign_vectype = get_vectype_for_scalar_type (vinfo, unsign);
-  stmt = vect_convert_output (vinfo, stmt_vinfo, unsign, stmt,
- unsign_vectype);
+  stmt = vect_convert_output (vinfo, stmt_vinfo, unsign, stmt, vectype_out);
+  vectype_out = get_vectype_for_scalar_type (vinfo, unsign);
  }
  
return vect_convert_output (vinfo, stmt_vinfo, out_type, stmt, vectype_out);

Re: [PATCH v2] [testsuite] Fixup dg-options in {gcc, g++, gfortran}.dg/vect.exp tests

2024-03-13 Thread Richard Earnshaw





On 13/03/2024 12:12, Maxim Kuvyrkov wrote:

Changes in v2:
- Better changelog entry.
- NFC.


This patch has been tested on
- aarch64-linux-gnu
- arm-linux-gnueabihf (VFP, NEON disabled by default),
- arm-none-eabi (Soft-FP)
with the following [expected] differences in the test results:

   - FAIL now PASS [FAIL => PASS]:
   Executed from: gcc:gcc.dg/vect/vect.exp
 gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c (test for excess errors)
 gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c -flto -ffat-lto-objects (test for excess errors)

   - UNSUPPORTED disappears[UNSUP=> ]:
   Executed from: g++:g++.dg/vect/vect.exp
 g++:g++.dg/vect/vect.exp=g++.dg/vect/pr84556.cc  -std=gnu++98

   - UNSUPPORTED appears   [ =>UNSUP]:
   Executed from: g++:g++.dg/vect/vect.exp
 g++:g++.dg/vect/vect.exp=g++.dg/vect/pr84556.cc  -std=c++98

   - UNRESOLVED disappears [UNRES=> ]:
   Executed from: gcc:gcc.dg/vect/vect.exp
 gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c -flto -ffat-lto-objects compilation failed to produce executable
 gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c compilation failed to produce executable

This patch was motivated by gcc.dg/vect/pr113576.c, which currently
fails to compile for ARM targets without NEON.

=== CUT ===

Testsuites driven by vect.exp rely on check_vect_support_and_set_flags
to set appropriate DEFAULT_VECTFLAGS for a given target (e.g., add
-mfpu=neon for arm-linux-gnueabi).  Unfortunately, these flags are
overwritten by dg-options directive, which can cause tests to fail.

Behavior of dg-options is documented in vect.exp files, but not
all developers look at the .exp file when adding a new testcase.
This caused a few dg-options directives to be used instead of
the more appropriate dg-additional-options.

This patch changes target-independent dg-options into
dg-additional-options.  This patch does not touch target-specific
dg-options and target-specific tests to avoid disturbing the gentle
balance of target-specific vectorization.

This patch also removes a couple of unneeded "dg-do run" directives
to avoid failures on compile-only targets.  Default action is, again,
set by check_vect_support_and_set_flags.

Lastly, I avoided renaming tests that use -O options to O-*
filename format because this support is not consistent between
gcc.dg/vect/, g++.dg/vect/, and gfortran.dg/vect/ testsuites.
It seems dg-additional-options is cleaner.

This patch does the following,
1. do not change target-specific tests, e.g., gcc.dg/vect/costmodel/riscv/*;
2. do not change { dg-options FOO { target { target-*-pattern } } };
3. do not remove { dg-do run { target { target-*-pattern } } };
4. change { dg-options FOO } to { dg-additional-options FOO };
5. remove { dg-do run } in several tests, where it is clearly not needed.

gcc/testsuite/ChangeLog:

PR testsuite/114307
* gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: Remove dg-run.
* gcc.dg/vect/complex/complex-operations-run.c: Likewise.
* gcc.dg/vect/pr113576.c: Remove dg-run.  Use dg-additional-options for
test-specific flags.
* gcc.dg/vect/gimplefe-40.c: Use dg-additional-options for
test-specific flags.
* gcc.dg/vect/gimplefe-41.c: Likewise.
* gcc.dg/vect/pr101145inf.c: Likewise.
* gcc.dg/vect/pr101145inf_1.c: Likewise.
* gcc.dg/vect/pr108316.c: Likewise.
* gcc.dg/vect/pr109011-1.c: Likewise.
* gcc.dg/vect/pr109011-2.c: Likewise.
* gcc.dg/vect/pr109011-3.c: Likewise.
* gcc.dg/vect/pr109011-4.c: Likewise.
* gcc.dg/vect/pr109011-5.c: Likewise.
* gcc.dg/vect/pr111846.c: Likewise.
* gcc.dg/vect/pr111860-2.c: Likewise.
* gcc.dg/vect/pr111860-3.c: Likewise.
* gcc.dg/vect/pr113002.c: Likewise.
* gcc.dg/vect/pr84711.c: Likewise.
* gcc.dg/vect/pr85597.c: Likewise.
* gcc.dg/vect/pr88497-1.c: Likewise.
* gcc.dg/vect/pr88497-2.c: Likewise.
* gcc.dg/vect/pr88497-3.c: Likewise.
* gcc.dg/vect/pr88497-4.c: Likewise.
* gcc.dg/vect/pr88497-5.c: Likewise.
* gcc.dg/vect/pr88497-7.c: Likewise.
* gcc.dg/vect/pr92347.c: Likewise.
* gcc.dg/vect/pr93069.c: Likewise.
* gcc.dg/vect/pr97241.c: Likewise.
* gcc.dg/vect/pr99102.c: Likewise.
* gcc.dg/vect/vect-early-break_65.c: Likewise.
* gcc.dg/vect/vect-fold-1.c: Likewise.
* gcc.dg/vect/vect-ifcvt-19.c: Likewise.
* gcc.dg/vect/vect-ifcvt-20.c: Likewise.
* gcc.dg/vect/vect-reduc-epilogue-gaps.c: Likewise.
* gcc.dg/vect/vect-singleton_1.c: Likewise.
* g++.dg/vect/pr84556.cc: Likewise.
* gfortran.dg/vect/fast-math-mgrid-resid.f: Likewise.
* gfortran.dg/vect/pr77848.f: Likewise.
* gfortran.dg/vect/pr90913.f90: Likewise.


OK.

(I wonder how many of the target-specific additional options are

[PATCH v2] [testsuite] Fixup dg-options in {gcc, g++, gfortran}.dg/vect.exp tests

2024-03-13 Thread Maxim Kuvyrkov

Changes in v2:
- Better changelog entry.
- NFC.


This patch has been tested on
- aarch64-linux-gnu
- arm-linux-gnueabihf (VFP, NEON disabled by default),
- arm-none-eabi (Soft-FP)
with the following [expected] differences in the test results:

  - FAIL now PASS [FAIL => PASS]:
  Executed from: gcc:gcc.dg/vect/vect.exp
gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c (test for excess errors)
gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c -flto -ffat-lto-objects (test for excess errors)

  - UNSUPPORTED disappears[UNSUP=> ]:
  Executed from: g++:g++.dg/vect/vect.exp
g++:g++.dg/vect/vect.exp=g++.dg/vect/pr84556.cc  -std=gnu++98

  - UNSUPPORTED appears   [ =>UNSUP]:
  Executed from: g++:g++.dg/vect/vect.exp
g++:g++.dg/vect/vect.exp=g++.dg/vect/pr84556.cc  -std=c++98

  - UNRESOLVED disappears [UNRES=> ]:
  Executed from: gcc:gcc.dg/vect/vect.exp
gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c -flto -ffat-lto-objects compilation failed to produce executable
gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c compilation failed to produce executable

This patch was motivated by gcc.dg/vect/pr113576.c, which currently
fails to compile for ARM targets without NEON.

=== CUT ===

Testsuites driven by vect.exp rely on check_vect_support_and_set_flags
to set appropriate DEFAULT_VECTFLAGS for a given target (e.g., add
-mfpu=neon for arm-linux-gnueabi).  Unfortunately, these flags are
overwritten by dg-options directive, which can cause tests to fail.

Behavior of dg-options is documented in vect.exp files, but not
all developers look at the .exp file when adding a new testcase.
This caused a few dg-options directives to be used instead of
the more appropriate dg-additional-options.

This patch changes target-independent dg-options into
dg-additional-options.  This patch does not touch target-specific
dg-options and target-specific tests to avoid disturbing the gentle
balance of target-specific vectorization.

This patch also removes a couple of unneeded "dg-do run" directives
to avoid failures on compile-only targets.  Default action is, again,
set by check_vect_support_and_set_flags.

Lastly, I avoided renaming tests that use -O options to O-*
filename format because this support is not consistent between
gcc.dg/vect/, g++.dg/vect/, and gfortran.dg/vect/ testsuites.
It seems dg-additional-options is cleaner.

This patch does the following,
1. do not change target-specific tests, e.g., gcc.dg/vect/costmodel/riscv/*;
2. do not change { dg-options FOO { target { target-*-pattern } } };
3. do not remove { dg-do run { target { target-*-pattern } } };
4. change { dg-options FOO } to { dg-additional-options FOO };
5. remove { dg-do run } in several tests, where it is clearly not needed.
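
As an illustration of items 4 and 5, a converted test might look roughly
like the sketch below.  The -O3 flag and the function are hypothetical
and not taken from any file touched by this patch.  The old form is
spelled without braces in the comment because, as far as I know, DejaGnu
scans the whole file text for directive patterns and would otherwise
treat it as live.

```cpp
// Hypothetical g++.dg/vect test, shown only to illustrate the directive change.
// Before the change the file used:  dg-options "-O3"
//   (that form replaces DEFAULT_VECTFLAGS, so flags such as -mfpu=neon are lost)
// After the change the flag is appended to DEFAULT_VECTFLAGS instead:
/* { dg-additional-options "-O3" } */
// There is no "dg-do run" line: the default action is left to
// check_vect_support_and_set_flags.

#include <cassert>   // only needed for checks when exercising this sketch

// A simple reduction loop for the vectorizer to work on.
int sum_bytes (const signed char *a, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += a[i];
  return sum;
}
```

Target-specific directives of the form described in items 2 and 3 would
be left exactly as they are.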

gcc/testsuite/ChangeLog:

PR testsuite/114307
* gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: Remove dg-run.
* gcc.dg/vect/complex/complex-operations-run.c: Likewise.
* gcc.dg/vect/pr113576.c: Remove dg-run.  Use dg-additional-options for
test-specific flags.
* gcc.dg/vect/gimplefe-40.c: Use dg-additional-options for
test-specific flags.
* gcc.dg/vect/gimplefe-41.c: Likewise.
* gcc.dg/vect/pr101145inf.c: Likewise.
* gcc.dg/vect/pr101145inf_1.c: Likewise.
* gcc.dg/vect/pr108316.c: Likewise.
* gcc.dg/vect/pr109011-1.c: Likewise.
* gcc.dg/vect/pr109011-2.c: Likewise.
* gcc.dg/vect/pr109011-3.c: Likewise.
* gcc.dg/vect/pr109011-4.c: Likewise.
* gcc.dg/vect/pr109011-5.c: Likewise.
* gcc.dg/vect/pr111846.c: Likewise.
* gcc.dg/vect/pr111860-2.c: Likewise.
* gcc.dg/vect/pr111860-3.c: Likewise.
* gcc.dg/vect/pr113002.c: Likewise.
* gcc.dg/vect/pr84711.c: Likewise.
* gcc.dg/vect/pr85597.c: Likewise.
* gcc.dg/vect/pr88497-1.c: Likewise.
* gcc.dg/vect/pr88497-2.c: Likewise.
* gcc.dg/vect/pr88497-3.c: Likewise.
* gcc.dg/vect/pr88497-4.c: Likewise.
* gcc.dg/vect/pr88497-5.c: Likewise.
* gcc.dg/vect/pr88497-7.c: Likewise.
* gcc.dg/vect/pr92347.c: Likewise.
* gcc.dg/vect/pr93069.c: Likewise.
* gcc.dg/vect/pr97241.c: Likewise.
* gcc.dg/vect/pr99102.c: Likewise.
* gcc.dg/vect/vect-early-break_65.c: Likewise.
* gcc.dg/vect/vect-fold-1.c: Likewise.
* gcc.dg/vect/vect-ifcvt-19.c: Likewise.
* gcc.dg/vect/vect-ifcvt-20.c: Likewise.
* gcc.dg/vect/vect-reduc-epilogue-gaps.c: Likewise.
* gcc.dg/vect/vect-singleton_1.c: Likewise.
* g++.dg/vect/pr84556.cc: Likewise.
* gfortran.dg/vect/fast-math-mgrid-resid.f: Likewise.
* gfortran.dg/vect/pr77848.f: Likewise.
* gfortran.dg/vect/pr90913.f90: Likewise.
---
 gcc/testsuite/g++.dg/vect/pr84556.cc   | 2 +-

Re: [PATCH] Fix libcc1plugin and libc1plugin to avoid poisoned identifiers

2024-03-13 Thread Dimitry Andric

On 13 Mar 2024, at 12:30, Iain Sandoe  wrote:
> 
>> On 7 Mar 2024, at 16:48, Dimitry Andric  wrote:
>> 
>> Ref: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111632
>> 
>> Use INCLUDE_VECTOR before including system.h, instead of directly
>> including <vector>, to avoid running into poisoned identifiers.
> 
> I would say that the patch itself is obvious, but you have not mentioned how
> it was tested?

This was tested by doing a --disable-bootstrap build, on a FreeBSD
system where llvm-project's libc++ is the default C++ library
(specifically 15.0-CURRENT, which has llvm-project 17.0.6), against both
the lang/gcc14-devel port, and against gcc master as of
gcc-14-9346-g74e8cc28eda. This also required gcc-14-9360-g9970b576b7e to
be applied, before it was committed to master.

Note that if you do a fully bootstrapped build, there aren't any compile
errors, since it will compile the plugins against a freshly built
libstdc++: it has already transitively included <vector> via other
standard headers, so the #include <vector> statement after #include
"system.h" effectively does nothing, and won't run into poisoned
identifiers.

You would only get compile errors on those poisoned identifiers with the
non-bootstrapped, single-stage build which compiles everything against
the host system's C++ headers.
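
The ordering issue can be sketched in a single file.  This is only an
illustration under stated assumptions: the real system.h is much larger,
the block marked as a stand-in is my approximation of the relevant part,
and the poisoned name (legacy_alloc) is a hypothetical placeholder for
the identifiers GCC actually poisons.

```cpp
// Request <vector> up front, the way the plugins do after this patch.
#define INCLUDE_VECTOR

// --- stand-in for the relevant part of system.h (assumed shape) ---
#ifdef INCLUDE_VECTOR
# include <vector>               // pulled in before anything is poisoned
#endif
#include <cassert>               // only for checks when exercising the sketch
#pragma GCC poison legacy_alloc  // hypothetical poisoned identifier
// --- end of stand-in ---

// From here on, any header that mentions a poisoned name fails to
// compile.  A late #include <vector> is harmless only if the header was
// already seen, so that its include guard makes it a no-op.

// Uses std::vector, which is safe because the header was included above.
int count_items ()
{
  std::vector<int> v;
  v.push_back (1);
  v.push_back (2);
  return static_cast<int> (v.size ());
}
```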

-Dimitry

Re: [PATCH] store-merging: Match bswap64 on 32-bit targets with bswapsi2 [PR114319]

2024-03-13 Thread Richard Biener

On Wed, 13 Mar 2024, Jakub Jelinek wrote:

> Hi!
> 
> gimple-ssa-store-merging.cc tests bswap_optab in 3 different places,
> in 2 of them it has a special exception for double-word bswap using a pair
> of word-mode bswap optabs, but in the last one it doesn't.
> 
> The following patch changes even the last spot.
> We don't handle 128-bit bswaps in the passes at all, because currently we
> just use uint64_t to represent the byte reshuffling (we'd need to use
> offset_int or something like that instead) and we don't have
> __builtin_bswap128 nor type-generic __builtin_bswap, so there is nothing
> for 64-bit targets there.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.
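
For readers less familiar with the double-word case, the identity being
relied on can be sketched as follows.  The helper names are made up for
illustration; __builtin_bswap32 and __builtin_bswap64 are the builtins
named in the patch.  This shows the arithmetic only, not how the
store-merging pass represents or emits it.

```cpp
#include <cassert>   // only for checks when exercising the sketch
#include <cstdint>

// Swap all 8 bytes using two 32-bit swaps: byte-swap each half, then
// exchange the halves.
std::uint64_t bswap64_via_bswap32 (std::uint64_t x)
{
  std::uint32_t lo = static_cast<std::uint32_t> (x);
  std::uint32_t hi = static_cast<std::uint32_t> (x >> 32);
  return (static_cast<std::uint64_t> (__builtin_bswap32 (lo)) << 32)
         | __builtin_bswap32 (hi);
}

// The same byte-by-byte big-endian store as foo () in the new test,
// written as a loop.
void store_be64 (std::uint64_t x, unsigned char *y)
{
  for (int i = 0; i < 8; i++)
    y[i] = static_cast<unsigned char> (x >> (56 - 8 * i));
}
```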

> 2024-03-13  Jakub Jelinek  
> 
>   PR middle-end/114319
>   * gimple-ssa-store-merging.cc
>   (imm_store_chain_info::try_coalesce_bswap): For 32-bit targets
>   allow matching __builtin_bswap64 if there is bswapsi2 optab.
> 
>   * gcc.target/i386/pr114319.c: New test.
> 
> --- gcc/gimple-ssa-store-merging.cc.jj 2024-01-03 11:51:29.449760086 +0100
> +++ gcc/gimple-ssa-store-merging.cc   2024-03-12 23:51:30.740236577 +0100
> @@ -3051,7 +3051,10 @@ imm_store_chain_info::try_coalesce_bswap
>   return false;
>case 64:
>   if (builtin_decl_explicit_p (BUILT_IN_BSWAP64)
> - && optab_handler (bswap_optab, DImode) != CODE_FOR_nothing)
> + && (optab_handler (bswap_optab, DImode) != CODE_FOR_nothing
> + || (word_mode == SImode
> + && builtin_decl_explicit_p (BUILT_IN_BSWAP32)
> + && optab_handler (bswap_optab, SImode) != CODE_FOR_nothing)))
> break;
>   return false;
>default:
> --- gcc/testsuite/gcc.target/i386/pr114319.c.jj   2024-03-13 10:59:57.378404934 +0100
> +++ gcc/testsuite/gcc.target/i386/pr114319.c  2024-03-13 10:59:46.612554118 +0100
> @@ -0,0 +1,19 @@
> +/* PR middle-end/114319 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -masm=att -mno-movbe" } */
> +/* { dg-additional-options "-march=i486" { target ia32 } } */
> +/* { dg-final { scan-assembler-times "\tbswap\t%r" 1 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "\tbswap\t%\[er]" 2 { target ia32 } } } */
> +
> +void
> +foo (unsigned long long x, unsigned char *y)
> +{
> +  y[0] = x >> 56;
> +  y[1] = x >> 48;
> +  y[2] = x >> 40;
> +  y[3] = x >> 32;
> +  y[4] = x >> 24;
> +  y[5] = x >> 16;
> +  y[6] = x >> 8;
> +  y[7] = x;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [PATCH] [testsuite] Fixup dg-options in {gcc, g++, gfortran}.dg/vect.exp tests

2024-03-13 Thread Maxim Kuvyrkov

> On Mar 13, 2024, at 15:25, Richard Earnshaw wrote:
> 
> 
> 
> On 13/03/2024 10:58, Maxim Kuvyrkov wrote:
>> This patch has been tested on
>> - aarch64-linux-gnu
>> - arm-linux-gnueabihf (VFP, NEON disabled by default),
>> - arm-none-eabi (Soft-FP)
>> with the following [expected] differences in the test results:
>>   - FAIL now PASS [FAIL => PASS]:
>>   Executed from: gcc:gcc.dg/vect/vect.exp
>> gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c (test for excess errors)
>> gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c -flto -ffat-lto-objects (test for excess errors)
>>   - UNSUPPORTED disappears[UNSUP=> ]:
>>   Executed from: g++:g++.dg/vect/vect.exp
>> g++:g++.dg/vect/vect.exp=g++.dg/vect/pr84556.cc  -std=gnu++98
>>   - UNSUPPORTED appears   [ =>UNSUP]:
>>   Executed from: g++:g++.dg/vect/vect.exp
>> g++:g++.dg/vect/vect.exp=g++.dg/vect/pr84556.cc  -std=c++98
>>   - UNRESOLVED disappears [UNRES=> ]:
>>   Executed from: gcc:gcc.dg/vect/vect.exp
>> gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c -flto -ffat-lto-objects compilation failed to produce executable
>> gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c compilation failed to produce executable
>> This patch was motivated by gcc.dg/vect/pr113576.c, which currently
>> fails to compile for ARM targets without NEON.
>> === CUT ===
>> Testsuites driven by vect.exp rely on check_vect_support_and_set_flags
>> to set appropriate DEFAULT_VECTFLAGS for a given target (e.g., add
>> -mfpu=neon for arm-linux-gnueabi).  Unfortunately, these flags are
>> overwritten by dg-options directive, which can cause tests to fail.
>> Behavior of dg-options is documented in vect.exp files, but not
>> all developers look at the .exp file when adding a new testcase.
>> This caused a few dg-options directives to be used instead of
>> the more appropriate dg-additional-options.
>> This patch changes target-independent dg-options into
>> dg-additional-options.  This patch does not touch target-specific
>> dg-options and target-specific tests to avoid disturbing the gentle
>> balance of target-specific vectorization.
>> This patch also removes a couple of unneeded "dg-do run" directives
>> to avoid failures on compile-only targets.  Default action is, again,
>> set by check_vect_support_and_set_flags.
>> Lastly, I avoided renaming tests that use -O options to O-*
>> filename format because this support is not consistent between
>> gcc.dg/vect/, g++.dg/vect/, and gfortran.dg/vect/ testsuites.
>> It seems dg-additional-options is cleaner.
>> This patch does the following,
>> 1. do not change target-specific tests, e.g., gcc.dg/vect/costmodel/riscv/*;
>> 2. do not change { dg-options FOO { target { target-*-pattern } } };
>> 3. do not remove { dg-do run { target { target-*-pattern } } };
>> 4. change { dg-options FOO } to { dg-additional-options FOO };
>> 5. remove { dg-do run } in several tests, where it is clearly not needed.
>> gcc/testsuite/ChangeLog:
>> PR testsuite/114307
>> * g++.dg/vect/pr84556.cc: Fixup.
>> * gcc.dg/vect/complex/complex-operations-run.c Fixup.
>> * gcc.dg/vect/gimplefe-40.c Fixup.
>> * gcc.dg/vect/gimplefe-41.c Fixup.
>> * gcc.dg/vect/pr101145inf.c Fixup.
>> * gcc.dg/vect/pr101145inf_1.c Fixup.
>> * gcc.dg/vect/pr108316.c Fixup.
>> * gcc.dg/vect/pr109011-1.c Fixup.
>> * gcc.dg/vect/pr109011-2.c Fixup.
>> * gcc.dg/vect/pr109011-3.c Fixup.
>> * gcc.dg/vect/pr109011-4.c Fixup.
>> * gcc.dg/vect/pr109011-5.c Fixup.
>> * gcc.dg/vect/pr111846.c Fixup.
>> * gcc.dg/vect/pr111860-2.c Fixup.
>> * gcc.dg/vect/pr111860-3.c Fixup.
>> * gcc.dg/vect/pr113002.c Fixup.
>> * gcc.dg/vect/pr113576.c Fixup.
>> * gcc.dg/vect/pr84711.c Fixup.
>> * gcc.dg/vect/pr85597.c Fixup.
>> * gcc.dg/vect/pr88497-1.c Fixup.
>> * gcc.dg/vect/pr88497-2.c Fixup.
>> * gcc.dg/vect/pr88497-3.c Fixup.
>> * gcc.dg/vect/pr88497-4.c Fixup.
>> * gcc.dg/vect/pr88497-5.c Fixup.
>> * gcc.dg/vect/pr88497-7.c Fixup.
>> * gcc.dg/vect/pr92347.c Fixup.
>> * gcc.dg/vect/pr93069.c Fixup.
>> * gcc.dg/vect/pr97241.c Fixup.
>> * gcc.dg/vect/pr99102.c Fixup.
>> * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c Fixup.
>> * gcc.dg/vect/vect-early-break_65.c Fixup.
>> * gcc.dg/vect/vect-fold-1.c Fixup.
>> * gcc.dg/vect/vect-ifcvt-19.c Fixup.
>> * gcc.dg/vect/vect-ifcvt-20.c Fixup.
>> * gcc.dg/vect/vect-reduc-epilogue-gaps.c Fixup.
>> * gcc.dg/vect/vect-singleton_1.c Fixup.
>> * gfortran.dg/vect/fast-math-mgrid-resid.f Fixup.
>> * gfortran.dg/vect/pr77848.f Fixup.
>> * gfortran.dg/vect/pr90913.f90 Fixup.
> 
> Thanks for looking into this, I agree that changing to dg-additional-options 
> looks the right choice.
> 
> The only thing to be wary of is that later 'dg-options' directives may 
> override dg-additional-options directives; you might want to test at least 
> one target where there are target-specific dg-options that you've not 
> modified.

As far as I see, the only case like this is

Re: [PATCH v1] libstdc++: Optimize removal from unique assoc containers [PR112934]

2024-03-13 Thread Jonathan Wakely

On Mon, 11 Mar 2024 at 23:36, Barnabás Pőcze  wrote:
>
> Previously, calling erase(key) on both std::map and std::set
> would execute the same code that std::multi{map,set} would.
> However, doing that is unnecessary because std::{map,set}
> guarantee that all elements are unique.
>
> It is reasonable to expect that erase(key) is equivalent
> to or better than:
>
>   auto it = m.find(key);
>   if (it != m.end())
> m.erase(it);
>
> However, this was not the case. Fix that by adding a new
> function _Rb_tree<>::_M_erase_unique() that is essentially
> equivalent to the above snippet, and use this from both
> std::map and std::set.

Hi, this change looks reasonable, thanks for the patch. Please note
that GCC is currently in "stage 3" of its dev process so this change
would have to wait until after GCC 14 branches from trunk, due in a
few weeks.

I assume you ran the testsuite with no regressions. Do you have
benchmarks to show this making a difference?
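
To make the comparison concrete, here is a rough sketch using only the
public std::set interface.  The function names are hypothetical, and the
first function is my assumption about what the shared erase(key) path
amounts to (a range lookup followed by a range erase); it is not a copy
of the library code.

```cpp
#include <cassert>   // only for checks when exercising the sketch
#include <cstddef>
#include <set>

// Multi-container style: look up the whole range of equivalent keys,
// erase that range, and report how many elements went away.
std::size_t erase_range_style (std::set<int>& s, int key)
{
  auto range = s.equal_range (key);
  const std::size_t old_size = s.size ();
  s.erase (range.first, range.second);
  return old_size - s.size ();
}

// Unique-container style: at most one element can match, so a single
// find () followed by a single-node erase is enough.
std::size_t erase_unique_style (std::set<int>& s, int key)
{
  auto it = s.find (key);
  if (it == s.end ())
    return 0;
  s.erase (it);
  return 1;
}
```

Both functions return the same count for a unique container, so a
benchmark would only need to compare their cost on hits and on misses.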


>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/112934
> * include/bits/stl_tree.h (_Rb_tree<>::_M_erase_unique): Add.
> * include/bits/stl_map.h (map<>::erase): Use _M_erase_unique.
> * include/bits/stl_set.h (set<>::erase): Likewise.
> ---
>  libstdc++-v3/include/bits/stl_map.h  |  2 +-
>  libstdc++-v3/include/bits/stl_set.h  |  2 +-
>  libstdc++-v3/include/bits/stl_tree.h | 17 +
>  3 files changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/stl_map.h b/libstdc++-v3/include/bits/stl_map.h
> index ad58a631af5..229643b77fd 100644
> --- a/libstdc++-v3/include/bits/stl_map.h
> +++ b/libstdc++-v3/include/bits/stl_map.h
> @@ -1115,7 +1115,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
> */
>size_type
>erase(const key_type& __x)
> -  { return _M_t.erase(__x); }
> +  { return _M_t._M_erase_unique(__x); }
>
>  #if __cplusplus >= 201103L
>// _GLIBCXX_RESOLVE_LIB_DEFECTS
> diff --git a/libstdc++-v3/include/bits/stl_set.h b/libstdc++-v3/include/bits/stl_set.h
> index c0eb4dbf65f..51a1717ec62 100644
> --- a/libstdc++-v3/include/bits/stl_set.h
> +++ b/libstdc++-v3/include/bits/stl_set.h
> @@ -684,7 +684,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
> */
>size_type
>erase(const key_type& __x)
> -  { return _M_t.erase(__x); }
> +  { return _M_t._M_erase_unique(__x); }
>
>  #if __cplusplus >= 201103L
>// _GLIBCXX_RESOLVE_LIB_DEFECTS
> diff --git a/libstdc++-v3/include/bits/stl_tree.h b/libstdc++-v3/include/bits/stl_tree.h
> index 6f470f04f6a..9e80d449c7e 100644
> --- a/libstdc++-v3/include/bits/stl_tree.h
> +++ b/libstdc++-v3/include/bits/stl_tree.h
> @@ -1225,6 +1225,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>size_type
>erase(const key_type& __x);
>
> +  size_type
> +  _M_erase_unique(const key_type& __x);
> +
>  #if __cplusplus >= 201103L
>// _GLIBCXX_RESOLVE_LIB_DEFECTS
>// DR 130. Associative erase should return an iterator.
> @@ -2518,6 +2521,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>return __old_size - size();
>  }
>
> +  template<typename _Key, typename _Val, typename _KeyOfValue,
> +           typename _Compare, typename _Alloc>
> +typename _Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::size_type
> +_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::
> +_M_erase_unique(const _Key& __x)
> +{
> +  iterator __it = find(__x);
> +  if (__it == end())
> +   return 0;
> +
> +  _M_erase_aux(__it);
> +  return 1;
> +}
> +
>template<typename _Key, typename _Val, typename _KeyOfValue,
>         typename _Compare, typename _Alloc>
>  typename _Rb_tree<_Key, _Val, _KeyOfValue,
> --
> 2.44.0
>
>

Re: [PATCH][GCC] aarch64: Fix SCHEDULER_IDENT for Cortex-A520

2024-03-13 Thread Richard Earnshaw





On 12/03/2024 14:08, Richard Ball wrote:

The SCHEDULER_IDENT for this CPU was incorrectly
set to cortexa55. This can cause sub-optimal asm
to be generated.

Ok for trunk?

gcc/ChangeLog:
PR target/114272
* config/aarch64/aarch64-cores.def (AARCH64_CORE):
Change SCHEDULER_IDENT from cortexa55 to cortexa53
for Cortex-A520.


I don't see having this as a separate patch to the one for Cortex-A510 
as having any value.


Please merge the two together.  A merged patch is pre-approved.

R.

Re: [PATCH] Fix libcc1plugin and libc1plugin to avoid poisoned identifiers

2024-03-13 Thread Iain Sandoe

Hi Dimitry,

> On 7 Mar 2024, at 16:48, Dimitry Andric  wrote:
> 
> Ref: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111632
> 
> Use INCLUDE_VECTOR before including system.h, instead of directly
> including <vector>, to avoid running into poisoned identifiers.

I would say that the patch itself is obvious, but you have not mentioned how
it was tested?

thanks
Iain

> 
> Signed-off-by: Dimitry Andric 
> ---
> libcc1/libcc1plugin.cc | 3 +--
> libcc1/libcp1plugin.cc | 3 +--
> 2 files changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/libcc1/libcc1plugin.cc b/libcc1/libcc1plugin.cc
> index 72d17c3b81c..e64847466f4 100644
> --- a/libcc1/libcc1plugin.cc
> +++ b/libcc1/libcc1plugin.cc
> @@ -32,6 +32,7 @@
> #undef PACKAGE_VERSION
> 
> #define INCLUDE_MEMORY
> +#define INCLUDE_VECTOR
> #include "gcc-plugin.h"
> #include "system.h"
> #include "coretypes.h"
> @@ -69,8 +70,6 @@
> #include "gcc-c-interface.h"
> #include "context.hh"
> 
> -#include <vector>
> -
> using namespace cc1_plugin;
> 
> 
> diff --git a/libcc1/libcp1plugin.cc b/libcc1/libcp1plugin.cc
> index 0eff7c68d29..da68c5d0ac1 100644
> --- a/libcc1/libcp1plugin.cc
> +++ b/libcc1/libcp1plugin.cc
> @@ -33,6 +33,7 @@
> #undef PACKAGE_VERSION
> 
> #define INCLUDE_MEMORY
> +#define INCLUDE_VECTOR
> #include "gcc-plugin.h"
> #include "system.h"
> #include "coretypes.h"
> @@ -71,8 +72,6 @@
> #include "rpc.hh"
> #include "context.hh"
> 
> -#include <vector>
> -
> using namespace cc1_plugin;
> 
> 
> -- 
> 2.43.2
>

Re: [PATCH] [testsuite] Fixup dg-options in {gcc, g++, gfortran}.dg/vect.exp tests

2024-03-13 Thread Richard Earnshaw





On 13/03/2024 10:58, Maxim Kuvyrkov wrote:

This patch has been tested on
- aarch64-linux-gnu
- arm-linux-gnueabihf (VFP, NEON disabled by default),
- arm-none-eabi (Soft-FP)
with the following [expected] differences in the test results:

   - FAIL now PASS [FAIL => PASS]:
   Executed from: gcc:gcc.dg/vect/vect.exp
 gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c (test for excess errors)
 gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c -flto -ffat-lto-objects (test for excess errors)

   - UNSUPPORTED disappears[UNSUP=> ]:
   Executed from: g++:g++.dg/vect/vect.exp
 g++:g++.dg/vect/vect.exp=g++.dg/vect/pr84556.cc  -std=gnu++98

   - UNSUPPORTED appears   [ =>UNSUP]:
   Executed from: g++:g++.dg/vect/vect.exp
 g++:g++.dg/vect/vect.exp=g++.dg/vect/pr84556.cc  -std=c++98

   - UNRESOLVED disappears [UNRES=> ]:
   Executed from: gcc:gcc.dg/vect/vect.exp
 gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c -flto -ffat-lto-objects compilation failed to produce executable
 gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c compilation failed to produce executable

This patch was motivated by gcc.dg/vect/pr113576.c, which currently
fails to compile for ARM targets without NEON.

=== CUT ===

Testsuites driven by vect.exp rely on check_vect_support_and_set_flags
to set appropriate DEFAULT_VECTFLAGS for a given target (e.g., add
-mfpu=neon for arm-linux-gnueabi).  Unfortunately, these flags are
overwritten by dg-options directive, which can cause tests to fail.

Behavior of dg-options is documented in vect.exp files, but not
all developers look at the .exp file when adding a new testcase.
This caused a few dg-options directives to be used instead of
the more appropriate dg-additional-options.

This patch changes target-independent dg-options into
dg-additional-options.  This patch does not touch target-specific
dg-options and target-specific tests to avoid disturbing the gentle
balance of target-specific vectorization.

This patch also removes a couple of unneeded "dg-do run" directives
to avoid failures on compile-only targets.  Default action is, again,
set by check_vect_support_and_set_flags.

Lastly, I avoided renaming tests that use -O options to O-*
filename format because this support is not consistent between
gcc.dg/vect/, g++.dg/vect/, and gfortran.dg/vect/ testsuites.
It seems dg-additional-options is cleaner.

This patch does the following,
1. do not change target-specific tests, e.g., gcc.dg/vect/costmodel/riscv/*;
2. do not change { dg-options FOO { target { target-*-pattern } } };
3. do not remove { dg-do run { target { target-*-pattern } } };
4. change { dg-options FOO } to { dg-additional-options FOO };
5. remove { dg-do run } in several tests, where it is clearly not needed.

gcc/testsuite/ChangeLog:

PR testsuite/114307
* g++.dg/vect/pr84556.cc: Fixup.
* gcc.dg/vect/complex/complex-operations-run.c Fixup.
* gcc.dg/vect/gimplefe-40.c Fixup.
* gcc.dg/vect/gimplefe-41.c Fixup.
* gcc.dg/vect/pr101145inf.c Fixup.
* gcc.dg/vect/pr101145inf_1.c Fixup.
* gcc.dg/vect/pr108316.c Fixup.
* gcc.dg/vect/pr109011-1.c Fixup.
* gcc.dg/vect/pr109011-2.c Fixup.
* gcc.dg/vect/pr109011-3.c Fixup.
* gcc.dg/vect/pr109011-4.c Fixup.
* gcc.dg/vect/pr109011-5.c Fixup.
* gcc.dg/vect/pr111846.c Fixup.
* gcc.dg/vect/pr111860-2.c Fixup.
* gcc.dg/vect/pr111860-3.c Fixup.
* gcc.dg/vect/pr113002.c Fixup.
* gcc.dg/vect/pr113576.c Fixup.
* gcc.dg/vect/pr84711.c Fixup.
* gcc.dg/vect/pr85597.c Fixup.
* gcc.dg/vect/pr88497-1.c Fixup.
* gcc.dg/vect/pr88497-2.c Fixup.
* gcc.dg/vect/pr88497-3.c Fixup.
* gcc.dg/vect/pr88497-4.c Fixup.
* gcc.dg/vect/pr88497-5.c Fixup.
* gcc.dg/vect/pr88497-7.c Fixup.
* gcc.dg/vect/pr92347.c Fixup.
* gcc.dg/vect/pr93069.c Fixup.
* gcc.dg/vect/pr97241.c Fixup.
* gcc.dg/vect/pr99102.c Fixup.
* gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c Fixup.
* gcc.dg/vect/vect-early-break_65.c Fixup.
* gcc.dg/vect/vect-fold-1.c Fixup.
* gcc.dg/vect/vect-ifcvt-19.c Fixup.
* gcc.dg/vect/vect-ifcvt-20.c Fixup.
* gcc.dg/vect/vect-reduc-epilogue-gaps.c Fixup.
* gcc.dg/vect/vect-singleton_1.c Fixup.
* gfortran.dg/vect/fast-math-mgrid-resid.f Fixup.
* gfortran.dg/vect/pr77848.f Fixup.
* gfortran.dg/vect/pr90913.f90 Fixup.


Thanks for looking into this, I agree that changing to 
dg-additional-options looks the right choice.


The only thing to be wary of is that later 'dg-options' directives may 
override dg-additional-options directives; you might want to test at 
least one target where there are target-specific dg-options that you've 
not modified.


The patch is OK, but the ChangeLog is not!  Fixup doesn't

Re: Patch ping Re: [PATCH] icf: Reset SSA_NAME_{PTR,RANGE}_INFO in successfully merged functions [PR113907]

2024-03-13 Thread Jakub Jelinek

On Wed, Mar 13, 2024 at 12:18:45PM +0100, Jan Hubicka wrote:
> > On Wed, Mar 13, 2024 at 10:55:07AM +0100, Jan Hubicka wrote:
> > > > > So the ipa_jump_func are I think the only thing that actually can 
> > > > > differ
> > > > > on the ICF merging candidates from value range POV.
> > > > 
> > > > I agree.  Btw, I would have approved the original patch in this
> > > > thread that wipes SSA_NAME_INFO in merged bodies to mimic what LTO
> > > > effectively does right now.  That also looks most sensible to
> > > > backport.
> > > > 
> > > > But I'll defer to Honza in the end (but also want to point out we
> > > > need something suitable for backporting).
> > > 
> > > My main worry here is that I tried to relax matching of IL metadata in
> > > the past and it triggers interesting problems.  (I implemented TBAA
> > > merging and one needs to match additional things in summaries and loop
> > > structures)
> > 
> > The point of the patch is that it emulates what happens with LTO (though,
> > does that only for successful ICF merges), because LTO streaming ignores
> > SSA_NAME_{RANGE,PTR}_INFO.
> > So, because with LTO all you have in the IL is the m_vr in jump_tables,
> > pure/const analysis results, whatever else might be derived on the side
> > from the range info, you need to punt or union all that information anyway,
> > otherwise it will misbehave with LTO.
> > So punting on SSA_NAME_{RANGE,PTR}_INFO differences instead of throwing it
> > away means that non-LTO will get fewer ICF merges than LTO unnecessarily,
> > it doesn't improve anything for the code correctness at least for the LTO
> > case.
> 
> We have wrong code with LTO, too.

I know.

> The problem is that IPA passes (and
> not only that, loop analysis too) does analysis at compile time (with
> value numbers in) and streams the info separately.

And that is desirable, because otherwise it simply couldn't derive any
ranges.

>  Removal of value ranges
> (either by LTO or by your patch) happens between computing these
> summaries and using them, so this can be used to trigger wrong code,
> sadly.

Yes.  But with LTO, I don't see how the IPA ICF comparison whether
two functions are the same or not could be done with the
SSA_NAME_{RANGE,PTR}_INFO in, otherwise it could only ICF merge functions
from the same TUs.  So the comparison IMHO (and the assert checks in my
patch prove that) is done when the SSA_NAME_{RANGE,PTR}_INFO aren't in
anymore.  So, one just needs to compare and punt or union whatever
is or could be influenced in the IPA streamed data from the ranges etc.
And because one has to do it for LTO, doing it for non-LTO should be
sufficient too.

Jakub

Re: Patch ping Re: [PATCH] icf: Reset SSA_NAME_{PTR,RANGE}_INFO in successfully merged functions [PR113907]

2024-03-13 Thread Jan Hubicka

> On Wed, Mar 13, 2024 at 10:55:07AM +0100, Jan Hubicka wrote:
> > > > So the ipa_jump_func are I think the only thing that actually can differ
> > > > on the ICF merging candidates from value range POV.
> > > 
> > > I agree.  Btw, I would have approved the original patch in this
> > > thread that wipes SSA_NAME_INFO in merged bodies to mimic what LTO
> > > effectively does right now.  That also looks most sensible to
> > > backport.
> > > 
> > > But I'll defer to Honza in the end (but also want to point out we
> > > need something suitable for backporting).
> > 
> > My main worry here is that I tried to relax matching of IL metadata in
> > the past and it triggers interesting problems.  (I implemented TBAA
> > merging and one needs to match additional things in summaries and loop
> > structures)
> 
> The point of the patch is that it emulates what happens with LTO (though,
> does that only for successful ICF merges), because LTO streaming ignores
> SSA_NAME_{RANGE,PTR}_INFO.
> So, because with LTO all you have in the IL is the m_vr in jump_tables,
> pure/const analysis results, whatever else might be derived on the side
> from the range info, you need to punt or union all that information anyway,
> otherwise it will misbehave with LTO.
> So punting on SSA_NAME_{RANGE,PTR}_INFO differences instead of throwing it
> away means that non-LTO will get fewer ICF merges than LTO unnecessarily,
> it doesn't improve anything for the code correctness at least for the LTO
> case.

We have wrong code with LTO, too.  The problem is that IPA passes (and
not only those, loop analysis too) do their analysis at compile time (with
value numbers in) and stream the info separately.  Removal of value ranges
(either by LTO or by your patch) happens between computing these
summaries and using them, so this can be used to trigger wrong code,
sadly.

Honza
> 
>   Jakub
>

[PATCH] [testsuite] Fixup dg-options in {gcc, g++, gfortran}.dg/vect.exp tests

2024-03-13 Thread Maxim Kuvyrkov

This patch has been tested on
- aarch64-linux-gnu
- arm-linux-gnueabihf (VFP, NEON disabled by default),
- arm-none-eabi (Soft-FP)
with the following [expected] differences in the test results:

  - FAIL now PASS [FAIL => PASS]:
  Executed from: gcc:gcc.dg/vect/vect.exp
gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c (test for excess errors)
gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c -flto -ffat-lto-objects 
(test for excess errors)

  - UNSUPPORTED disappears[UNSUP=> ]:
  Executed from: g++:g++.dg/vect/vect.exp
g++:g++.dg/vect/vect.exp=g++.dg/vect/pr84556.cc  -std=gnu++98

  - UNSUPPORTED appears   [ =>UNSUP]:
  Executed from: g++:g++.dg/vect/vect.exp
g++:g++.dg/vect/vect.exp=g++.dg/vect/pr84556.cc  -std=c++98

  - UNRESOLVED disappears [UNRES=> ]:
  Executed from: gcc:gcc.dg/vect/vect.exp
gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c -flto -ffat-lto-objects 
compilation failed to produce executable
gcc:gcc.dg/vect/vect.exp=gcc.dg/vect/pr113576.c compilation failed to 
produce executable

This patch was motivated by gcc.dg/vect/pr113576.c, which currently
fails to compile for ARM targets without NEON.

=== CUT ===

Testsuites driven by vect.exp rely on check_vect_support_and_set_flags
to set appropriate DEFAULT_VECTFLAGS for a given target (e.g., add
-mfpu=neon for arm-linux-gnueabi).  Unfortunately, these flags are
overwritten by any dg-options directive, which can cause tests to fail.

The behavior of dg-options is documented in the vect.exp files, but not
all developers look at the .exp file when adding a new testcase.
As a result, a few tests use dg-options where dg-additional-options
would have been the more appropriate directive.

This patch changes target-independent dg-options into
dg-additional-options.  This patch does not touch target-specific
dg-options and target-specific tests to avoid disturbing the gentle
balance of target-specific vectorization.

This patch also removes a couple of unneeded "dg-do run" directives
to avoid failures on compile-only targets.  Default action is, again,
set by check_vect_support_and_set_flags.

Lastly, I avoided renaming tests that use -O options to O-*
filename format because this support is not consistent between
gcc.dg/vect/, g++.dg/vect/, and gfortran.dg/vect/ testsuites.
It seems dg-additional-options is cleaner.

This patch does the following:
1. do not change target-specific tests, e.g., gcc.dg/vect/costmodel/riscv/*;
2. do not change { dg-options FOO { target { target-*-pattern } } };
3. do not remove { dg-do run { target { target-*-pattern } } };
4. change { dg-options FOO } to { dg-additional-options FOO };
5. remove { dg-do run } in several tests, where it is clearly not needed.
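
As a rough illustration of item 4, a test under gcc.dg/vect/ could look
something like the sketch below.  The file contents, the names sum_bytes and
check_sum_bytes, and the -O3 flag are hypothetical and are not taken from any
of the tests touched by this patch; only the use of dg-additional-options in
place of dg-options is the point.

```c
/* Hypothetical gcc.dg/vect test, for illustration only.
   Before: { dg-options "-O3" } would replace the DEFAULT_VECTFLAGS
   chosen by vect.exp.
   After: the directive below appends -O3 to those flags instead.  */
/* { dg-additional-options "-O3" } */

#include <assert.h>

/* Simple reduction loop, the kind of code a vect test exercises.  */
int
sum_bytes (const unsigned char *p, int n)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    s += p[i];
  return s;
}

/* Small self-check; a real test would more likely call __builtin_abort.  */
void
check_sum_bytes (void)
{
  static const unsigned char buf[4] = { 1, 2, 3, 4 };
  assert (sum_bytes (buf, 4) == 10);
}
```

With dg-additional-options, a target-specific flag such as the -mfpu=neon
mentioned above stays on the command line and the extra option is added
after it.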

gcc/testsuite/ChangeLog:

PR testsuite/114307
* g++.dg/vect/pr84556.cc: Fixup.
* gcc.dg/vect/complex/complex-operations-run.c: Fixup.
* gcc.dg/vect/gimplefe-40.c: Fixup.
* gcc.dg/vect/gimplefe-41.c: Fixup.
* gcc.dg/vect/pr101145inf.c: Fixup.
* gcc.dg/vect/pr101145inf_1.c: Fixup.
* gcc.dg/vect/pr108316.c: Fixup.
* gcc.dg/vect/pr109011-1.c: Fixup.
* gcc.dg/vect/pr109011-2.c: Fixup.
* gcc.dg/vect/pr109011-3.c: Fixup.
* gcc.dg/vect/pr109011-4.c: Fixup.
* gcc.dg/vect/pr109011-5.c: Fixup.
* gcc.dg/vect/pr111846.c: Fixup.
* gcc.dg/vect/pr111860-2.c: Fixup.
* gcc.dg/vect/pr111860-3.c: Fixup.
* gcc.dg/vect/pr113002.c: Fixup.
* gcc.dg/vect/pr113576.c: Fixup.
* gcc.dg/vect/pr84711.c: Fixup.
* gcc.dg/vect/pr85597.c: Fixup.
* gcc.dg/vect/pr88497-1.c: Fixup.
* gcc.dg/vect/pr88497-2.c: Fixup.
* gcc.dg/vect/pr88497-3.c: Fixup.
* gcc.dg/vect/pr88497-4.c: Fixup.
* gcc.dg/vect/pr88497-5.c: Fixup.
* gcc.dg/vect/pr88497-7.c: Fixup.
* gcc.dg/vect/pr92347.c: Fixup.
* gcc.dg/vect/pr93069.c: Fixup.
* gcc.dg/vect/pr97241.c: Fixup.
* gcc.dg/vect/pr99102.c: Fixup.
* gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: Fixup.
* gcc.dg/vect/vect-early-break_65.c: Fixup.
* gcc.dg/vect/vect-fold-1.c: Fixup.
* gcc.dg/vect/vect-ifcvt-19.c: Fixup.
* gcc.dg/vect/vect-ifcvt-20.c: Fixup.
* gcc.dg/vect/vect-reduc-epilogue-gaps.c: Fixup.
* gcc.dg/vect/vect-singleton_1.c: Fixup.
* gfortran.dg/vect/fast-math-mgrid-resid.f: Fixup.
* gfortran.dg/vect/pr77848.f: Fixup.
* gfortran.dg/vect/pr90913.f90: Fixup.
---
 gcc/testsuite/g++.dg/vect/pr84556.cc                       | 2 +-
 gcc/testsuite/gcc.dg/vect/complex/complex-operations-run.c | 1 -
 gcc/testsuite/gcc.dg/vect/gimplefe-40.c                    | 2 +-
 gcc/testsuite/gcc.dg/vect/gimplefe-41.c                    | 2 +-
 gcc/testsuite/gcc.dg/vect/pr101145inf.c                    | 2 +-
 gcc/testsuite/gcc.dg/vect/pr101145inf_1.c                  | 2 +-

Re: [PATCH] testsuite: Fix vfprintf-chk-1.c with -fhardened

2024-03-13 Thread Jakub Jelinek

On Wed, Mar 13, 2024 at 06:05:29PM +0800, Xi Ruoyao wrote:
> On Tue, 2024-03-12 at 17:19 +0100, Jakub Jelinek wrote:
> > On Thu, Feb 15, 2024 at 10:53:08PM +, Sam James wrote:
> > > With _FORTIFY_SOURCE >= 2 (enabled by -fhardened), vfprintf-chk-1.c's
> > > __vfprintf_chk ends up calling __vprintf_chk rather than vprintf.
> 
> Do we really want to support adding random CFLAGS running the test
> suite?

Random flags certainly not, but some flags should be supported and are very
useful.
We already support the various ABI-changing options (-m32, -m64, -mx32 and
the like) and ISA options in there (-march=whatever, -msse2, etc.).
Testing with -fstack-protector-strong is what some distros have done for
years, and testing with -fhardened is desirable if pretty much everything in
the distros is built with that flag.
Another thing is using --param whatever=whatever in the target_board flags,
or -fno-tree-dce etc.; that may or may not work, and the user needs to be
prepared for extra failures.

Jakub

[PATCH] store-merging: Match bswap64 on 32-bit targets with bswapsi2 [PR114319]

2024-03-13 Thread Jakub Jelinek

Hi!

gimple-ssa-store-merging.cc tests bswap_optab in 3 different places.
In 2 of them it has a special exception for a double-word bswap done with a
pair of word-mode bswap optabs, but in the last one it doesn't.

The following patch adds the same exception to the last spot.
We don't handle 128-bit bswaps in the passes at all, because currently we
just use uint64_t to represent the byte reshuffling (we'd need to use
offset_int or something like that instead) and we don't have
__builtin_bswap128 nor type-generic __builtin_bswap, so there is nothing
for 64-bit targets there.
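
For readers less familiar with the double-word case, here is a minimal
sketch in plain C of why a 64-bit byte swap can be built from a pair of
32-bit ones.  The helper names (swap32, swap64_via_swap32, store_be64,
check_swap64) are made up for this illustration and do not appear in the
pass; swap32 merely stands in for a word-mode bswap instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Portable 32-bit byte swap; stands in for a word-mode bswap insn.  */
static uint32_t
swap32 (uint32_t v)
{
  return (v >> 24) | ((v >> 8) & 0x0000ff00u)
         | ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* 64-bit byte swap from two 32-bit swaps: swap the bytes inside each
   half, then exchange the two halves.  */
uint64_t
swap64_via_swap32 (uint64_t x)
{
  uint32_t lo = (uint32_t) x;
  uint32_t hi = (uint32_t) (x >> 32);
  return ((uint64_t) swap32 (lo) << 32) | swap32 (hi);
}

/* Same byte-wise store pattern as foo in the new test: writes X to Y
   most significant byte first.  */
void
store_be64 (uint64_t x, unsigned char *y)
{
  for (int i = 0; i < 8; i++)
    y[i] = (unsigned char) (x >> (56 - 8 * i));
}

/* Small self-check of the two helpers above.  */
void
check_swap64 (void)
{
  unsigned char buf[8];
  assert (swap64_via_swap32 (0x0102030405060708ULL)
          == 0x0807060504030201ULL);
  store_be64 (0x0102030405060708ULL, buf);
  assert (buf[0] == 0x01 && buf[7] == 0x08);
}
```

On a little-endian target the store pattern in store_be64 is the same as
storing the byte-swapped value in one go, which is the shape the pass tries
to recognize.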

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-03-13  Jakub Jelinek  

PR middle-end/114319
* gimple-ssa-store-merging.cc
(imm_store_chain_info::try_coalesce_bswap): For 32-bit targets
allow matching __builtin_bswap64 if there is bswapsi2 optab.

* gcc.target/i386/pr114319.c: New test.

--- gcc/gimple-ssa-store-merging.cc.jj  2024-01-03 11:51:29.449760086 +0100
+++ gcc/gimple-ssa-store-merging.cc 2024-03-12 23:51:30.740236577 +0100
@@ -3051,7 +3051,10 @@ imm_store_chain_info::try_coalesce_bswap
return false;
   case 64:
if (builtin_decl_explicit_p (BUILT_IN_BSWAP64)
-   && optab_handler (bswap_optab, DImode) != CODE_FOR_nothing)
+   && (optab_handler (bswap_optab, DImode) != CODE_FOR_nothing
+   || (word_mode == SImode
+   && builtin_decl_explicit_p (BUILT_IN_BSWAP32)
+   && optab_handler (bswap_optab, SImode) != CODE_FOR_nothing)))
  break;
return false;
   default:
--- gcc/testsuite/gcc.target/i386/pr114319.c.jj 2024-03-13 10:59:57.378404934 +0100
+++ gcc/testsuite/gcc.target/i386/pr114319.c 2024-03-13 10:59:46.612554118 +0100
@@ -0,0 +1,19 @@
+/* PR middle-end/114319 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -masm=att -mno-movbe" } */
+/* { dg-additional-options "-march=i486" { target ia32 } } */
+/* { dg-final { scan-assembler-times "\tbswap\t%r" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "\tbswap\t%\[er]" 2 { target ia32 } } } */
+
+void
+foo (unsigned long long x, unsigned char *y)
+{
+  y[0] = x >> 56;
+  y[1] = x >> 48;
+  y[2] = x >> 40;
+  y[3] = x >> 32;
+  y[4] = x >> 24;
+  y[5] = x >> 16;
+  y[6] = x >> 8;
+  y[7] = x;
+}

Jakub

Re: [PATCH V12]: Improve code sinking pass

2024-03-13 Thread Richard Biener

On Wed, Mar 13, 2024 at 10:02 AM Ajit Agarwal  wrote:
>
> Hello All:
>
> Currently, code sinking will sink code at the use points with loop having same
> nesting depth. The following patch improves code sinking by placing the sunk
> code in immediate dominator with same loop nest depth.
>
> Changes since v11:
>
> Reorganization of the code.
>
> For example :
>
> void bar();
> int j;
> void foo(int a, int b, int c, int d, int e, int f)
> {
>   int l;
>   l = a + b + c + d +e + f;
>   if (a != 5)
> {
>   bar();
>   j = l;
> }
> }
>
> Code Sinking does the following:
>
> void bar();
> int j;
> void foo(int a, int b, int c, int d, int e, int f)
> {
>   int l;
>
>   if (a != 5)
> {
>   l = a + b + c + d +e + f;
>   bar();
>   j = l;
> }
> }
>
> Bootstrapped regtested on powerpc64-linux-gnu.
>
> Thanks & Regards
>
>
> tree-ssa-sink: Improve code sinking pass
>
> Currently, code sinking will sink code at the use points with loop having same
> nesting depth. The following patch improves code sinking by placing the sunk
> code in immediate dominator with same loop nest depth.
>
> 2024-03-13  Ajit Kumar Agarwal  
>
> gcc/ChangeLog:
>
> PR tree-optimization/81953
> * tree-ssa-sink.cc (statement_sink_location): Move statements with
> same loop nest depth.
> (select_best_block): Add heuristics to select the best blocks in the
> immediate dominator for same loop nest depth.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/81953
> * gcc.dg/tree-ssa/ssa-sink-21.c: New test.
> * gcc.dg/tree-ssa/ssa-sink-22.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c | 19 +++
>  gcc/tree-ssa-sink.cc| 26 +
>  3 files changed, 55 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> new file mode 100644
> index 000..d3b79ca5803
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
> +void bar();
> +int j;
> +void foo(int a, int b, int c, int d, int e, int f)
> +{
> +  int l;
> +  l = a + b + c + d +e + f;
> +  if (a != 5)
> +{
> +  bar();
> +  j = l;
> +}
> +}
> +/* { dg-final { scan-tree-dump {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
> new file mode 100644
> index 000..84e7938c54f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
> +void bar();
> +int j, x;
> +void foo(int a, int b, int c, int d, int e, int f)
> +{
> +  int l;
> +  l = a + b + c + d +e + f;
> +  if (a != 5)
> +{
> +  bar();
> +  if (b != 3)
> +x = 3;
> +  else
> +x = 5;
> +  j = l;
> +}
> +}
> +/* { dg-final { scan-tree-dump {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
> index 880d6f70a80..40f51e2f3b9 100644
> --- a/gcc/tree-ssa-sink.cc
> +++ b/gcc/tree-ssa-sink.cc
> @@ -176,6 +176,9 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
> tree, return the best basic block between them (inclusive) to place
> statements.
>
> +   The best basic block should be an immediate dominator of
> +   best basic block if we've moved to same loop nest.
> +
> We want the most control dependent block in the shallowest loop nest.
>
> If the resulting block is in a shallower loop nest, then use it.  Else
> @@ -209,6 +212,21 @@ select_best_block (basic_block early_bb,
>temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
>  }
>
> +  temp_bb = best_bb;
> +  /* Move sinking to immediate dominator if the statement to be moved
> + is not memory operand and same loop nest.  */
> +  if (best_bb == late_bb
> +  && !gimple_vuse (stmt))
> +{
> +  while (temp_bb != early_bb)
> +   {
> + if (bb_loop_depth (temp_bb) == bb_loop_depth (best_bb))
> +   best_bb = temp_bb;
> +
> + temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
> +   }
> + }
> +

I don't think this is what sinking should do.  Instead ...

>/* Placing a statement before a setjmp-like function would be invalid
>   (it cannot be reevaluated when execution follows an abnormal edge).
>   If we selected a block with abnormal predecessors, just punt.  */
> @@ -250,7 +268,7 @@ select_best_block (basic_block early_bb,
>/* If result of comparsion

Re: Patch ping Re: [PATCH] icf: Reset SSA_NAME_{PTR,RANGE}_INFO in successfully merged functions [PR113907]

2024-03-13 Thread Jakub Jelinek

On Wed, Mar 13, 2024 at 10:55:07AM +0100, Jan Hubicka wrote:
> > > So the ipa_jump_func are I think the only thing that actually can differ
> > > on the ICF merging candidates from value range POV.
> > 
> > I agree.  Btw, I would have approved the original patch in this
> > thread that wipes SSA_NAME_INFO in merged bodies to mimic what LTO
> > effectively does right now.  That also looks most sensible to
> > backport.
> > 
> > But I'll defer to Honza in the end (but also want to point out we
> > need something suitable for backporting).
> 
> My main worry here is that I tried to relax matching of IL metadata in
> the past and it triggers interesting problems.  (I implemented TBAA
> merging and one needs to match additional things in summaries and loop
> structures)

The point of the patch is that it emulates what happens with LTO (though,
does that only for successful ICF merges), because LTO streaming ignores
SSA_NAME_{RANGE,PTR}_INFO.
So, because with LTO all you have in the IL is the m_vr in jump_tables,
pure/const analysis results, whatever else might be derived on the side
from the range info, you need to punt or union all that information anyway,
otherwise it will misbehave with LTO.
So punting on SSA_NAME_{RANGE,PTR}_INFO differences instead of throwing it
away means that non-LTO will get fewer ICF merges than LTO unnecessarily,
it doesn't improve anything for the code correctness at least for the LTO
case.

Jakub

Re: [PATCH] testsuite: Fix vfprintf-chk-1.c with -fhardened

2024-03-13 Thread Xi Ruoyao

On Tue, 2024-03-12 at 17:19 +0100, Jakub Jelinek wrote:
> On Thu, Feb 15, 2024 at 10:53:08PM +, Sam James wrote:
> > With _FORTIFY_SOURCE >= 2 (enabled by -fhardened), vfprintf-chk-1.c's
> > __vfprintf_chk ends up calling __vprintf_chk rather than vprintf.

Do we really want to support adding random CFLAGS when running the test
suite?  AFAIK adding random CFLAGS will just cause test failures here or
there.  We are adjusting the test suite for -fPIE -pie and
-fstack-protector-strong, but that's because they can be implicitly enabled
with --enable-default-* options, and we don't have --enable-default-hardened
as of now.

If we need to bootstrap a hardened GCC and test it, pass -fhardened the
way "info gccinstall" suggests:

make BOOT_CFLAGS="-O2 -g -fhardened"

instead of

env C{,XX}FLAGS="-O2 -g -fhardened" /path/to/gcc/configure ...

which will taint the test suite with -fhardened.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: Patch ping Re: [PATCH] icf: Reset SSA_NAME_{PTR,RANGE}_INFO in successfully merged functions [PR113907]

2024-03-13 Thread Richard Biener

On Wed, 13 Mar 2024, Jan Hubicka wrote:

> > On Tue, 12 Mar 2024, Jakub Jelinek wrote:
> > 
> > > On Tue, Mar 12, 2024 at 05:21:58PM +0100, Jakub Jelinek wrote:
> > > > On Tue, Mar 12, 2024 at 10:46:42AM +0100, Jan Hubicka wrote:
> > > > > I am sorry for delaying this.  I made the variant that simply compares
> > > > > value range of functions and prevents merging if they diverge and 
> > > > > wanted
> > > > > to make some bigger statistics.  This made me notice some performance
> > > > > problems on clang performance and libstdc++ RB-trees which disrailed 
> > > > > me
> > > > > from the original PR.  I will finish the statistics today.
> > > > 
> > > > With the posted patch, perhaps if we don't want to union jump_tables 
> > > > etc.,
> > > > all we could punt on is differences in the jump_table VRs rather than 
> > > > just
> > > > any SSA_NAME_RANGE_INFO differences.
> > > 
> > > To expand on this, I think we need to either union or punt on jump_func
> > > differences in any case, because for LTO we can't really punt on
> > > SSA_NAME_RANGE_INFO differences given that we don't stream that out and 
> > > in.
> 
> I noticed that yesterday too (I added my jump function testcase to
> testsuit and it fails with -flto, too).  I implemented comparator for
> them too and run the stats.  There was over 3000 functions in bootstrap
> where we run into differences in value-range and about 150k in LLVM
> build.
> 
> Inspecting random examples shown that those are usually false positives
> (pair of functions that are different but triggers hash colision) caused
> by the fact that we do not hash PHI arguments, so code like
> 
> int test (int a)
> {
> return a>0 ? CST1:  CST2;
> }
> 
> gets same hash value no matter what CST1/CST2 is.  I added hasher and I
> am re-running stats.

The hash should be commutative here at least.

> > > So the ipa_jump_func are I think the only thing that actually can differ
> > > on the ICF merging candidates from value range POV.
> > 
> > I agree.  Btw, I would have approved the original patch in this
> > thread that wipes SSA_NAME_INFO in merged bodies to mimic what LTO
> > effectively does right now.  That also looks most sensible to
> > backport.
> > 
> > But I'll defer to Honza in the end (but also want to point out we
> > need something suitable for backporting).
> 
> My main worry here is that I tried to relax matching of IL metadata in
> the past and it triggers interesting problems.  (I implemented TBAA
> merging and one needs to match additional things in summaries and loop
> structures)
> 
> If value range differs at IPA analysis time, we need to be sure that we
> compare everything that possibly depends on it. So it is always safer to
> just compare more than try to merge. Which is what we do in all cases so
> far.  Here, for the first time, is the problem is with LTO streaming
> missing the data though.
> 
> Thinking more about it, I wonder if different value ranges can be
> exploited to cause different decisions about function being finite
> (confuse pure/const) or different outcome of alias analysis yielding to
> different aggregate jump functions (confusing ipa-prop).

The obvious thing would be range info making it possible to prove
a stmt cannot trap, say for an array index or for arithmetic with
-ftrapv.  But I'm not sure that makes a difference for pure/const-ness.

Making something looping vs. non-looping should be easy though,
just have range info for a loop with an unsigned IV that evolves
like { 0, +, increment } and with a != exit condition where for some 
'increment' we know we eventually reach equality but with some
other we know we never do.
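
To make the looping vs. non-looping point concrete, here is a small sketch
that models such a loop with an 8-bit unsigned IV so the whole value space
can be walked.  The function names loop_exits and check_loop_exits and the
8-bit width are assumptions for this illustration only; nothing here is
taken from the pure/const analysis itself.

```c
#include <assert.h>
#include <stdint.h>

/* Model  for (i = 0; i != n; i += inc)  with an 8-bit unsigned IV.
   Returns 1 if i reaches n, 0 if it never does.  256 steps are enough
   because the sequence of i values repeats after at most 256 steps.  */
int
loop_exits (uint8_t n, uint8_t inc)
{
  uint8_t i = 0;
  for (int step = 0; step < 256; step++)
    {
      if (i == n)
        return 1;
      i = (uint8_t) (i + inc);
    }
  return 0;
}

/* Small self-check.  */
void
check_loop_exits (void)
{
  /* An odd increment visits every 8-bit value, so the loop is finite.  */
  assert (loop_exits (201, 1) == 1);
  assert (loop_exits (201, 3) == 1);
  /* An even increment only visits even values: never equal to 201.  */
  assert (loop_exits (201, 2) == 0);
}
```

So two bodies that differ only in what is known about 'increment' can end
up with different finiteness results, even though the statements are the
same.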

But that just means pure/const state needs to be recomputed after
merging?  Or compared.

Richard.

Re: Patch ping Re: [PATCH] icf: Reset SSA_NAME_{PTR,RANGE}_INFO in successfully merged functions [PR113907]

2024-03-13 Thread Jan Hubicka

> On Tue, 12 Mar 2024, Jakub Jelinek wrote:
> 
> > On Tue, Mar 12, 2024 at 05:21:58PM +0100, Jakub Jelinek wrote:
> > > On Tue, Mar 12, 2024 at 10:46:42AM +0100, Jan Hubicka wrote:
> > > > I am sorry for delaying this.  I made the variant that simply compares
> > > > value range of functions and prevents merging if they diverge and wanted
> > > > to make some bigger statistics.  This made me notice some performance
> > > > problems on clang performance and libstdc++ RB-trees which disrailed me
> > > > from the original PR.  I will finish the statistics today.
> > > 
> > > With the posted patch, perhaps if we don't want to union jump_tables etc.,
> > > all we could punt on is differences in the jump_table VRs rather than just
> > > any SSA_NAME_RANGE_INFO differences.
> > 
> > To expand on this, I think we need to either union or punt on jump_func
> > differences in any case, because for LTO we can't really punt on
> > SSA_NAME_RANGE_INFO differences given that we don't stream that out and in.

I noticed that yesterday too (I added my jump function testcase to the
testsuite and it fails with -flto, too).  I implemented a comparator for
them too and ran the stats.  There were over 3000 functions in bootstrap
where we ran into differences in value range, and about 150k in an LLVM
build.

Inspecting random examples showed that those are usually false positives
(pairs of functions that are different but trigger a hash collision) caused
by the fact that we do not hash PHI arguments, so code like

int test (int a)
{
return a>0 ? CST1:  CST2;
}

gets the same hash value no matter what CST1/CST2 are.  I added a hasher and
I am re-running the stats.
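
A toy model of the collision, only to show the effect: this is not the ICF
hasher, and the struct toy_fn, the two hash functions, check_toy_hash and
the multiplier constant are all made up for the sketch.  The two constants
play the role of the PHI arguments, and they are combined with addition so
that their order does not matter, which is one possible choice rather than
what the real hasher does.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of  return a > 0 ? cst1 : cst2;  */
struct toy_fn
{
  int cst1;
  int cst2;
};

/* Hash that looks only at the statement shape: every toy_fn collides.  */
uint32_t
hash_shape_only (const struct toy_fn *f)
{
  (void) f;
  return 0x9e3779b9u;
}

/* Hash that also mixes in the PHI arguments.  Addition makes the
   combination independent of the argument order.  */
uint32_t
hash_with_phi_args (const struct toy_fn *f)
{
  uint32_t args = (uint32_t) f->cst1 * 2654435761u
                  + (uint32_t) f->cst2 * 2654435761u;
  return hash_shape_only (f) ^ args;
}

/* Small self-check.  */
void
check_toy_hash (void)
{
  struct toy_fn a = { 1, 2 }, b = { 3, 4 }, c = { 2, 1 };
  assert (hash_shape_only (&a) == hash_shape_only (&b));
  assert (hash_with_phi_args (&a) != hash_with_phi_args (&b));
  assert (hash_with_phi_args (&a) == hash_with_phi_args (&c));
}
```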

> > So the ipa_jump_func are I think the only thing that actually can differ
> > on the ICF merging candidates from value range POV.
> 
> I agree.  Btw, I would have approved the original patch in this
> thread that wipes SSA_NAME_INFO in merged bodies to mimic what LTO
> effectively does right now.  That also looks most sensible to
> backport.
> 
> But I'll defer to Honza in the end (but also want to point out we
> need something suitable for backporting).

My main worry here is that I tried to relax matching of IL metadata in
the past and it triggers interesting problems.  (I implemented TBAA
merging and one needs to match additional things in summaries and loop
structures)

If the value range differs at IPA analysis time, we need to be sure that we
compare everything that possibly depends on it.  So it is always safer to
just compare more than to try to merge, which is what we do in all cases so
far.  Here, for the first time, the problem is with LTO streaming
missing the data, though.

Thinking more about it, I wonder if different value ranges can be
exploited to cause different decisions about a function being finite
(confusing pure/const) or a different outcome of alias analysis leading to
different aggregate jump functions (confusing ipa-prop).

I will try to build testcases today.
Honza
> 
> Richard.

Re: [PATCH v4] LoongArch: Add support for TLS descriptors

2024-03-13 Thread Xi Ruoyao

On Wed, 2024-03-13 at 10:24 +0800, Xi Ruoyao wrote:
>    return TARGET_EXPLICIT_RELOCS
> -    ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\
> -  \taddi.d\t%2,$r0,%%desc_pc_lo12(%1)\n\
> -  \tlu32i.d\t%2,%%desc64_pc_lo20(%1)\n\
> -  \tlu52i.d\t%2,%2,%%desc64_pc_hi12(%1)\n\
> -  \tadd.d\t$r4,$r4,%2\n\
> -  \tld.d\t$r1,$r4,%%desc_ld(%1)\n\
> -  \tjirl\t$r1,$r1,%%desc_call(%1)"
> -    : "la.tls.desc\t%0,%2,%1";
> +    ? "pcalau12i\t$r4,%%desc_pc_hi20(%0)\n\t"
> +  "addi.d\t%1,$r0,%%desc_pc_lo12(%0)\n\t"
> +  "lu32i.d\t%1,%%desc64_pc_lo20(%0)\n\t"
> +  "lu52i.d\t%1,%2,%%desc64_pc_hi12(%0)\n\t"

Oops, the "%2" in the above line should be "%1".

> +  "add.d\t$r4,$r4,%1\n\t"
> +  "ld.d\t$r1,$r4,%%desc_ld(%0)\n\t"
> +  "jirl\t$r1,$r1,%%desc_call(%0)"
> +    : "la.tls.desc\t$r4,%1,%0";
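
For anyone wondering why the hunk above moves from backslash continuations
to adjacent string literals: with a backslash-newline inside a literal, the
indentation of the following source line becomes part of the string, whereas
whitespace between adjacent literals is dropped when they are concatenated.
A minimal sketch, with made-up instruction names and variable names that are
not from loongarch.md:

```c
#include <assert.h>
#include <string.h>

/* Backslash-newline inside one literal: the two spaces that indent the
   next source line end up in the string, in front of the tab.  */
const char *asm_continued = "insn1\n\
  \tinsn2";

/* Adjacent literals: the indentation sits between the literals and is
   not part of the resulting string.  */
const char *asm_concatenated = "insn1\n\t"
  "insn2";

/* Small self-check.  */
void
check_asm_strings (void)
{
  assert (strcmp (asm_concatenated, "insn1\n\tinsn2") == 0);
  assert (strstr (asm_continued, " \t") != NULL);
  assert (strlen (asm_continued) > strlen (asm_concatenated));
}
```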

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[PATCH V3 3/4] ree: Improve ree pass.

2024-03-13 Thread Ajit Agarwal

Hello All:

On the rs6000 target we see redundant zero and sign extensions.  This patch
improves the ree pass to eliminate them.  It adds support for
zero_extend/sign_extend/AND, including AND used as an extension with
constants like 0x7/0x7F/0x7 other than 1.
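
As background, the sketch below shows the general relation between an AND
with a low-bit mask and a zero extension.  The 0xff mask and the function
names are illustrative assumptions only; they are not the constants or the
helpers that the patch itself checks for.

```c
#include <assert.h>
#include <stdint.h>

/* Zero extension from 8 bits written as a narrowing conversion.  */
uint64_t
zext8_by_cast (uint64_t x)
{
  return (uint64_t) (uint8_t) x;
}

/* The same operation written as an AND with a low-bit mask.  */
uint64_t
zext8_by_and (uint64_t x)
{
  return x & 0xff;
}

/* Once a value is zero-extended, masking it again changes nothing;
   that second AND is the kind of redundancy such a pass can remove.  */
uint64_t
zext8_twice (uint64_t x)
{
  return zext8_by_and (zext8_by_and (x));
}

/* Small self-check.  */
void
check_zext8 (void)
{
  assert (zext8_by_cast (0x1234) == 0x34);
  assert (zext8_by_and (0x1234) == zext8_by_cast (0x1234));
  assert (zext8_twice (0xffffffffffffff80ULL) == 0x80);
}
```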

Changes since v2:

- Added all constants 0x7/0x7F/0x7 other than 1 for machine modes.
- Improving coding conventions.
- Reorganization of the code.

Bootstrapped and regtested for powerpc64-linux-gnu.

contrib/check_GNU_style.sh looks good.

SPEC 2017 INT and FP benchmark runs look good.

Thanks & Regards
Ajit


ree: Improve ree pass for rs6000 target

On the rs6000 target we see redundant zero and sign extensions.  This patch
improves the ree pass to eliminate them.  It adds support for
zero_extend/sign_extend/AND, including AND used as an extension with
constants like 0x7/0x7F/0x7 other than 1.

2024-03-13  Ajit Kumar Agarwal  

gcc/ChangeLog:

* ree.cc (eliminate_across_bbs_p): Add checks to enable extension
elimination across and within basic blocks.
(def_arith_p): New function to check definition has arithmetic
operation.
(combine_set_extension): Modification to incorporate AND
and current zero_extend and sign_extend instruction.
(merge_def_and_ext): Add calls to eliminate_across_bbs_p and
zero_extend sign_extend and AND instruction.
(rtx_is_zext_p): New function.
(feasible_cfg): New function.
* rtl.h (reg_used_set_between_p): Add prototype.
* rtlanal.cc (reg_used_set_between_p): New function.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/zext-elim.C: New testcase.
* g++.target/powerpc/zext-elim-1.C: New testcase.
* g++.target/powerpc/zext-elim-2.C: New testcase.
* g++.target/powerpc/sext-elim.C: New testcase.
---
Changes since v2:

- Added all constants 0x7/0x7F/0x7 other than 1 for machine modes.
- Improving coding conventions.
- Reorganization of the code.
---
 gcc/ree.cc| 517 --
 gcc/rtl.h |   1 +
 gcc/rtlanal.cc|  15 +
 gcc/testsuite/g++.target/powerpc/sext-elim.C  |  16 +
 .../g++.target/powerpc/zext-elim-1.C  |  18 +
 .../g++.target/powerpc/zext-elim-2.C  |  10 +
 gcc/testsuite/g++.target/powerpc/zext-elim.C  |  29 +
 7 files changed, 558 insertions(+), 48 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/sext-elim.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-1.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-2.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim.C

diff --git a/gcc/ree.cc b/gcc/ree.cc
index bfc4b4b0412..43fed62d755 100644
--- a/gcc/ree.cc
+++ b/gcc/ree.cc
@@ -253,6 +253,77 @@ struct ext_cand
 
 static int max_insn_uid;
 
+/* Return TRUE if OP can be considered a zero extension from one or
+   more sub-word modes to larger modes up to a full word.
+
+   For example (and:DI (reg) (const_int X))
+
+   Depending on the value of X could be considered a zero extension
+   from QI, HI and SI to larger modes up to DImode.  */
+
+static bool
+rtx_is_zext_p (rtx insn)
+{
+  if (GET_CODE (insn) == AND)
+{
+  rtx set = XEXP (insn, 0);
+  if (REG_P (set))
+   {
+ rtx src = XEXP (insn, 1);
+ machine_mode m_mode = GET_MODE (set);
+
+ if (CONST_INT_P (src)
+ && (INTVAL (src) == 1
+ || (m_mode == QImode && INTVAL (src) == 0x7)
+ || (m_mode == QImode && INTVAL (src) == 0x007F)
+ || (m_mode == HImode && INTVAL (src) == 0x7FFF)
+ || (m_mode == SImode && INTVAL (src) == 0x007F)))
+   return true;
+
+   }
+  else
+   return false;
+}
+
+  return false;
+}
+/* Return TRUE if OP can be considered a zero extension from one or
+   more sub-word modes to larger modes up to a full word.
+
+   For example (and:DI (reg) (const_int X))
+
+   Depending on the value of X could be considered a zero extension
+   from QI, HI and SI to larger modes up to DImode.  */
+
+static bool
+rtx_is_zext_p (rtx_insn *insn)
+{
+  rtx body = single_set (insn);
+
+  if (GET_CODE (body) == SET && GET_CODE (SET_SRC (body)) == AND)
+   {
+ rtx set = XEXP (SET_SRC (body), 0);
+
+ if (REG_P (set) && GET_MODE (SET_DEST (body)) == GET_MODE (set))
+   {
+ rtx src = XEXP (SET_SRC (body), 1);
+ machine_mode m_mode = GET_MODE (set);
+
+ if (CONST_INT_P (src)
+ && (INTVAL (src) == 1
+ || (m_mode == QImode && INTVAL (src) == 0x7)
+ || (m_mode == QImode && INTVAL (src) == 0x007F)
+ || (m_mode == HImode && INTVAL (src) == 0x7FFF)
+ || (m_mode == SImode && INTVAL (src) == 0x007F)))
+   return true;
+   }

Re: [PATCH v4] LoongArch: Add support for TLS descriptors

2024-03-13 Thread Xi Ruoyao

On Wed, 2024-03-13 at 11:06 +0800, mengqinggang wrote:
> 
> On 2024/3/13 at 6:15 AM, Xi Ruoyao wrote:
> > On Tue, 2024-03-12 at 17:20 +0800, mengqinggang wrote:
> > > +(define_insn "@got_load_tls_desc"
> > > +  [(set (match_operand:P 0 "register_operand" "=r")
> > > + (unspec:P
> > > +     [(match_operand:P 1 "symbolic_operand" "")]
> > > +     UNSPEC_TLS_DESC))
> > > +    (clobber (reg:SI FCC0_REGNUM))
> > > +    (clobber (reg:SI FCC1_REGNUM))
> > > +    (clobber (reg:SI FCC2_REGNUM))
> > > +    (clobber (reg:SI FCC3_REGNUM))
> > > +    (clobber (reg:SI FCC4_REGNUM))
> > > +    (clobber (reg:SI FCC5_REGNUM))
> > > +    (clobber (reg:SI FCC6_REGNUM))
> > > +    (clobber (reg:SI FCC7_REGNUM))
> > > +    (clobber (reg:SI RETURN_ADDR_REGNUM))]
> > > +  "TARGET_TLS_DESC"
> > > +{
> > > +  return TARGET_EXPLICIT_RELOCS
> > > +    ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\
> > > +  \taddi.d\t$r4,$r4,%%desc_pc_lo12(%1)\n\
> > > +  \tld.d\t$r1,$r4,%%desc_ld(%1)\n\
> > > +  \tjirl\t$r1,$r1,%%desc_call(%1)"
> > Use something like
> > 
> >  ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\t"
> >    "addi.d\t$r4,$r4,%%desc_pc_lo12(%1)\n\t"
> >    "ld.d\t$r1,$r4,%%desc_ld(%1)\n\t"
> >    "jirl\t$r1,$r1,%%desc_call(%1)"
> >  : "la.tls.desc\t%0,%1";
> > 
> > to prevent additional white spaces in the output asm before tabs.
> > 
> > > +    : "la.tls.desc\t%0,%1";
> > > +}
> > > +  [(set_attr "got" "load")
> > > +   (set_attr "mode" "")
> > > +   (set_attr "length" "16")])
> > > +
> > > +(define_insn "got_load_tls_desc_off64"
> > > +  [(set (match_operand:DI 0 "register_operand" "=r")
> > > + (unspec:DI
> > > +     [(match_operand:DI 1 "symbolic_operand" "")]
> > > +     UNSPEC_TLS_DESC_OFF64))
> > > +    (clobber (reg:SI FCC0_REGNUM))
> > > +    (clobber (reg:SI FCC1_REGNUM))
> > > +    (clobber (reg:SI FCC2_REGNUM))
> > > +    (clobber (reg:SI FCC3_REGNUM))
> > > +    (clobber (reg:SI FCC4_REGNUM))
> > > +    (clobber (reg:SI FCC5_REGNUM))
> > > +    (clobber (reg:SI FCC6_REGNUM))
> > > +    (clobber (reg:SI FCC7_REGNUM))
> > > +    (clobber (reg:SI RETURN_ADDR_REGNUM))
> > > +    (clobber (match_operand:DI 2 "register_operand" "="))]
> > > +  "TARGET_TLS_DESC && TARGET_CMODEL_EXTREME"
> > > +{
> > > +  return TARGET_EXPLICIT_RELOCS
> > > +    ? "pcalau12i\t$r4,%%desc_pc_hi20(%1)\n\
> > > +  \taddi.d\t%2,$r0,%%desc_pc_lo12(%1)\n\
> > > +  \tlu32i.d\t%2,%%desc64_pc_lo20(%1)\n\
> > > +  \tlu52i.d\t%2,%2,%%desc64_pc_hi12(%1)\n\
> > > +  \tadd.d\t$r4,$r4,%2\n\
> > > +  \tld.d\t$r1,$r4,%%desc_ld(%1)\n\
> > > +  \tjirl\t$r1,$r1,%%desc_call(%1)"
> > > +    : "la.tls.desc\t%0,%2,%1";
> > Likewise.
> > 
> > > +}
> > > +  [(set_attr "got" "load")
> > > +   (set_attr "length" "28")])
> > Otherwise OK.
> > 
> > It's better to allow splitting these two instructions but we can do it
> > in another patch.  And IMO it's better to enable TLS desc by default if
> > supported by both the assembler and the libc, but we'll have to defer it
> > until Glibc 2.40 release.
> 
> 
> Do we need to wait until LLVM also supports TLS DESC  before setting it 
> as default?

Hmm, maybe...  I remember when we added R_LARCH_ALIGN lld was being
broken for a while.
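
The string-literal point raised in the review above can be seen outside of a machine description as well. The following is a minimal sketch in plain C, not the actual loongarch.md template: the operand text "sym" and the helper spaces_after_newline are made up for illustration. It contrasts a backslash-continued literal, where the indentation of the continuation line ends up inside the string, with adjacent literals, which the compiler concatenates without the source indentation.

```c
#include <assert.h>
#include <stddef.h>

/* Backslash-newline continuation inside one string literal: the leading
   spaces of the continuation line become part of the string.  "sym" is a
   placeholder operand, not real relocation syntax.  */
static const char continued[] = "pcalau12i\t$r4,sym\n\
  \taddi.d\t$r4,$r4,sym";

/* Adjacent string literals are concatenated by the compiler, so the
   indentation between them never reaches the output.  */
static const char concatenated[] = "pcalau12i\t$r4,sym\n\t"
                                   "addi.d\t$r4,$r4,sym";

/* Count the space characters that directly follow a newline in S.  */
static size_t
spaces_after_newline (const char *s)
{
  size_t n = 0;

  assert (s != NULL);
  for (; *s; s++)
    if (*s == '\n')
      for (const char *p = s + 1; *p == ' '; p++)
        n++;
  return n;
}
```

With these definitions, spaces_after_newline (continued) is non-zero while spaces_after_newline (concatenated) is zero, which is the extra white space before the tabs that the review asks to avoid.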

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: No rule to make target '../libbacktrace/libbacktrace.la', needed by 'libgo.la'. [PR106472]

2024-03-13 Thread Jakub Jelinek

On Wed, Mar 13, 2024 at 07:37:26AM +0100, Дилян Палаузов wrote:
> Non-parallel build can fail, depending on the ./configure parameters -
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106472 .
> 
> The change below does fix the problem.

CCing build system maintainers and the Go maintainer.

While the first Makefile.tpl hunk looks obviously ok, the others look
completely wrong to me.
There is nothing special about libgo vs. libbacktrace/libatomic
compared to any other target library which is not bootstrapped vs. any
of its dependencies which are in the bootstrapped set.
So, Makefile.tpl shouldn't hardcode such dependencies.

The
all-target-libgo: maybe-all-target-libbacktrace
all-target-libgo: maybe-all-target-libatomic
dependencies, which are already in Makefile.in, are I believe intentionally
guarded with
@unless gcc-bootstrap
because when bootstrapping, there are the
all-target-libgo: stage_current
configure-target-libgo: stage_last
dependencies instead plus there is always
all-target-libgo: configure-target-libgo
and stage_last should I believe ensure that everything is bootstrapped,
gcc as well as the bootstrapped target libraries like libbacktrace or
libatomic.
Now, if those are built only sometimes depending on configured languages
- I see
grep 'lib\(backtrace\|atomic\)' gcc/*/config-lang.in 
gcc/ada/gcc-interface/config-lang.in 
gcc/d/config-lang.in:phobos_target_deps="target-zlib target-libbacktrace"
gcc/fortran/config-lang.in:target_libs="target-libgfortran target-libbacktrace"
gcc/go/config-lang.in:target_libs="target-libgo target-libffi 
target-libbacktrace"
then perhaps Makefile.def should know that it is not a bootstrap=true module
target_modules = { module= libbacktrace; bootstrap=true; };
unconditionally and arrange for the dependencies between non-bootstrap
target modules and these maybe ones to be emitted even if gcc-bootstrap.

> I do not understand the build system to say, that this is the best approach,
> so if there are questions I might or might not be able to answer them.
> 
> I tried different things, this worked on the releases/gcc-13 branch.  On the
> master branch last weekend the problem was that stage2 and stage3 results
> are not equal, so I have not verified this change there.  depend= in
> Makefile.def seem to have only effect if bootstrapping is involved and
> gcc/go/config-lang.in does not have boot_language=yes .  The lines below are
> present in the Makefile.in:@unless gcc-bootstrap snippet.  Actually I think
> ./configure --enable-languages=all and then serial build work, because this
> implied D and it does imply bootstrapping for libbacktrace and libatomic.  I
> also do not want to invest much more time on this.
> 
> I do not know, if 2×`maybe-`  is necessary.
> 
> 
> diff --git a/Makefile.in b/Makefile.in
> index 06a9398e172..236e5cda942 100644
> --- a/Makefile.in
> +++ b/Makefile.in
> @@ -66481,6 +66481,7 @@ configure-target-libgfortran:
> maybe-all-target-libquadmath
> 
> 
>  @if gcc-bootstrap
> +all-target-libgo: maybe-all-target-libbacktrace maybe-all-target-libatomic
>  configure-gnattools: stage_last
>  configure-libcc1: stage_last
>  configure-c++tools: stage_last
> diff --git a/Makefile.tpl b/Makefile.tpl
> index dfbd74b68f8..98160c7626b 100644
> --- a/Makefile.tpl
> +++ b/Makefile.tpl
> @@ -1952,7 +1952,7 @@ configure-target-[+module+]: maybe-all-gcc[+
> (define dep-maybe (lambda ()
>(if (exist? "hard") "" "maybe-")))
> 
> -   ;; dep-kind returns returns "prebootstrap" for configure or build
> +   ;; dep-kind returns "prebootstrap" for configure or build
> ;; dependencies of bootstrapped modules on a build module
> ;; (e.g. all-gcc on all-build-bison); "normal" if the dependency is
> ;; on an "install" target, or if the dependence module is not
> @@ -2017,6 +2017,7 @@ configure-target-[+module+]: maybe-all-gcc[+
>  [+ ESAC +][+ ENDFOR dependencies +]
> 
>  @if gcc-bootstrap
> +all-target-libgo: maybe-all-target-libbacktrace maybe-all-target-libatomic
>  [+ FOR dependencies +][+ CASE (dep-kind) +]
>  [+ == "postbootstrap" +][+ (make-postboot-dep) +][+ ESAC +][+
>  ENDFOR dependencies +]@endif gcc-bootstrap
> 

Jakub

Re: [PATCH, OpenACC 2.7, v2] readonly modifier support in front-ends

2024-03-13 Thread Thomas Schwinge

Hi Chung-Lin!

On 2024-03-07T17:02:02+0900, Chung-Lin Tang  wrote:
> On 2023/10/26 6:43 PM, Thomas Schwinge wrote:
>> +++ b/gcc/tree.h
>> @@ -1813,6 +1813,14 @@ class auto_suppress_location_wrappers
>>   #define OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE(NODE) \
>> (OMP_CLAUSE_SUBCODE_CHECK (NODE, 
>> OMP_CLAUSE_MAP)->base.addressable_flag)
>>
>> +/* Nonzero if OpenACC 'readonly' modifier set, used for 'copyin'.  */
>> +#define OMP_CLAUSE_MAP_READONLY(NODE) \
>> +  TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP))
>> +
>> +/* Same as above, for use in OpenACC cache directives.  */
>> +#define OMP_CLAUSE__CACHE__READONLY(NODE) \
>> +  TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__CACHE_))
> I'm not sure if these special accessor functions are actually useful, or
> we should just directly use 'TREE_READONLY' instead?  We're only using
> them in contexts where it's clear that the 'OMP_CLAUSE_SUBCODE_CHECK' is
> satisfied, for example.
>>>> I find directly using TREE_READONLY confusing.
>>>
>>> FWIW, I've changed to use TREE_NOTHROW instead, if it can give a better 
>>> sense of safety :P
>> 
>> I don't understand that, why not use 'TREE_READONLY'?
>> 
>>> I think there's a misunderstanding here anyways: we are not relying on a 
>>> DECL marked
>>> TREE_READONLY here. We merely need the OMP_CLAUSE_MAP to be marked as 
>>> OMP_CLAUSE_MAP_READONLY == 1.
>> 
>> Yes, I understand that.  My question was why we don't just use
>> 'TREE_READONLY (c)', where 'c' is the
>> 'OMP_CLAUSE_MAP'/'OMP_CLAUSE__CACHE_' clause (not its decl), and avoid
>> the indirection through
>> '#define OMP_CLAUSE_MAP_READONLY'/'#define OMP_CLAUSE__CACHE__READONLY',
>> given that we're only using them in contexts where it's clear that the
>> 'OMP_CLAUSE_SUBCODE_CHECK' is satisfied.  I don't have a strong
>> preference, though.
>
> After further re-testing using TREE_NOTHROW, I have reverted to using 
> TREE_READONLY

ACK, thanks.

> because TREE_NOTHROW clashes
> with OMP_CLAUSE_RELEASE_DESCRIPTOR (which doesn't use the OMP_CLAUSE_MAP_* 
> naming convention and is
> not documented in gcc/tree-core.h either, hmmm...)

Yeah, it's a mess...  The same bits of information spread over three
different places.

(One day I'll turn 'tree's into a proper C++ class hierarchy, with
accessor methods for such flags, statically checked at compile-time, and
thus documented in a single place.  Etc.)

> I have added the comment adjustments in gcc/tree-core.h for the new uses of 
> TREE_READONLY/readonly_flag.
>
> We basically all use OMP_CLAUSE_SUBCODE_CHECK macros for OpenMP clause 
> expressions exclusively,
> so I don't see a reason to diverge from that style (even when context is 
> clear).

ACK.
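
For readers following the accessor discussion, here is a toy model in plain C of the "flag accessor wrapped in a subcode check" style that the quoted paragraph argues for. It is only a sketch: struct toy_clause, toy_clause_check and the TOY_* macros are hypothetical names and are not GCC's tree representation or its OMP_CLAUSE_* macros.

```c
#include <assert.h>

/* Hypothetical clause kinds; stand-ins for OMP_CLAUSE_MAP and friends.  */
enum toy_clause_code { TOY_CLAUSE_MAP, TOY_CLAUSE_CACHE, TOY_CLAUSE_OTHER };

/* One generic flag bit shared by every clause kind.  */
struct toy_clause
{
  enum toy_clause_code code;
  unsigned readonly_flag : 1;
};

/* Return C unchanged after checking that it has the EXPECTED kind.  */
static struct toy_clause *
toy_clause_check (struct toy_clause *c, enum toy_clause_code expected)
{
  assert (c->code == expected);
  return c;
}

/* Generic accessor: works on any clause and says nothing about intent.  */
#define TOY_READONLY(C) ((C)->readonly_flag)

/* Named accessor: same bit, but documents the meaning and verifies the
   clause kind before touching the flag.  */
#define TOY_CLAUSE_MAP_READONLY(C) \
  TOY_READONLY (toy_clause_check ((C), TOY_CLAUSE_MAP))
```

Both accessors read and write the same bit; the named one only adds the kind check and a self-describing name, which is the trade-off being weighed in this thread.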

> I have greatly expanded the test scan patterns to include 
> parallel/kernels/serial/data/enter data,
> as well as non-readonly copyin clause together with readonly.

Thanks.

> Also added simple 'declare' tests, but there is not anything to scan in the 
> 'tree-original' dump though.

Yeah, the current OpenACC 'declare' implementation is "special".

>>> --- a/gcc/fortran/openmp.cc
>>> +++ b/gcc/fortran/openmp.cc
>>> @@ -1197,7 +1197,7 @@ omp_inv_mask::omp_inv_mask (const omp_mask ) : 
>>> omp_mask (m)
>>>
>>>  static bool
>>>  gfc_match_omp_map_clause (gfc_omp_namelist **list, gfc_omp_map_op map_op,
>>> -   bool allow_common, bool allow_derived)
>>> +   bool allow_common, bool allow_derived, bool 
>>> readonly = false)
>>>  {
>>>gfc_omp_namelist **head = NULL;
>>>if (gfc_match_omp_variable_list ("", list, allow_common, NULL, , 
>>> true,
>>> @@ -1206,7 +1206,10 @@ gfc_match_omp_map_clause (gfc_omp_namelist **list, 
>>> gfc_omp_map_op map_op,
>>>  {
>>>gfc_omp_namelist *n;
>>>for (n = *head; n; n = n->next)
>>> - n->u.map_op = map_op;
>>> + {
>>> +   n->u.map.op = map_op;
>>> +   n->u.map.readonly = readonly;
>>> + }
>>>return true;
>>>  }
>> 
>> Didn't we conclude that "not doing it here is cleaner" (Tobias' words),
>> and instead do this "Similar to 'c_parser_omp_var_list_parens'" (my
>> words)?  That is, not add the 'bool readonly' formal parameter to
>> 'gfc_match_omp_map_clause'.
>
> Fixed in this v3 patch.

Thanks.

> Again, tested on x86_64-linux + nvptx offloading. Okay for mainline?

Yes, thanks.


Regards
 Thomas


> gcc/c/ChangeLog:
>
>   * c-parser.cc (c_parser_oacc_data_clause): Add parsing support for
>   'readonly' modifier, set OMP_CLAUSE_MAP_READONLY if readonly modifier
>   found, update comments.
>   (c_parser_oacc_cache): Add parsing support for 'readonly' modifier,
>   set OMP_CLAUSE__CACHE__READONLY if readonly modifier found, update
>   comments.
>
> gcc/cp/ChangeLog:
>
>   * parser.cc (cp_parser_oacc_data_clause): Add parsing support for
>   'readonly' modifier, set OMP_CLAUSE_MAP_READONLY if readonly modifier
>

Re: [PATCH] Fortran: fix IS_CONTIGUOUS for polymorphic dummy arguments [PR114001]

2024-03-13 Thread Paul Richard Thomas

Hi Harald,

This looks good to me. The testcase gives the same result with other brands.

OK for mainline and for backporting.

Thanks

Paul

On Tue, 12 Mar 2024 at 22:12, Harald Anlauf  wrote:

> Dear all,
>
> here's another small fix: IS_CONTIGUOUS did erroneously always
> return .true. for CLASS dummy arguments.  The solution was to
> adjust the logic in gfc_is_simply_contiguous to also handle
> CLASS symbols.
>
> Regtested on x86_64-pc-linux-gnu.  OK for mainline?
>
> Thanks,
> Harald
>
>

[PATCH V12]: Improve code sinking pass

2024-03-13 Thread Ajit Agarwal

Hello All:

Currently, code sinking sinks code to the use points when the loop nesting
depth is the same. The following patch improves code sinking by placing the
sunk code in the immediate dominator that has the same loop nest depth.

Changes since v11:

Reorganization of the code.

For example :

void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
  int l;
  l = a + b + c + d +e + f;
  if (a != 5)
{
  bar();
  j = l;
}
}

Code Sinking does the following:

void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
  int l;

  if (a != 5)
{
  l = a + b + c + d +e + f;
  bar();
  j = l;
}
}
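
The block choice described here can be modeled on a toy dominator chain. This is a sketch under assumptions, not the code in tree-ssa-sink.cc: struct toy_bb and toy_select_block are illustrative names, and the extra conditions of the real select_best_block (for example the virtual-operand check and the profile threshold) are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy basic block holding only the two properties the walk looks at.  */
struct toy_bb
{
  int index;            /* Identifier, for inspection only.  */
  int loop_depth;       /* Stand-in for bb_loop_depth ().  */
  struct toy_bb *idom;  /* Stand-in for get_immediate_dominator ().  */
};

/* Walk from LATE_BB up the immediate-dominator chain towards EARLY_BB
   (exclusive) and return the block closest to EARLY_BB whose loop depth
   equals that of LATE_BB.  */
static struct toy_bb *
toy_select_block (struct toy_bb *early_bb, struct toy_bb *late_bb)
{
  struct toy_bb *best_bb = late_bb;

  for (struct toy_bb *bb = late_bb; bb != early_bb; bb = bb->idom)
    {
      assert (bb != NULL);  /* EARLY_BB must dominate LATE_BB.  */
      if (bb->loop_depth == best_bb->loop_depth)
        best_bb = bb;
    }
  return best_bb;
}
```

In the foo example above the then-block is immediately dominated by the entry block, so the walk stops at once and the statement stays in the then-block; the chain only matters when several blocks of equal loop depth lie between the definition and the use.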

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards


tree-ssa-sink: Improve code sinking pass

Currently, code sinking sinks code to the use points when the loop nesting
depth is the same. The following patch improves code sinking by placing the
sunk code in the immediate dominator that has the same loop nest depth.

2024-03-13  Ajit Kumar Agarwal  

gcc/ChangeLog:

PR tree-optimization/81953
* tree-ssa-sink.cc (statement_sink_location): Move statements with
same loop nest depth.
(select_best_block): Add heuristics to select the best blocks in the
immediate dominator for same loop nest depth.

gcc/testsuite/ChangeLog:

PR tree-optimization/81953
* gcc.dg/tree-ssa/ssa-sink-21.c: New test.
* gcc.dg/tree-ssa/ssa-sink-22.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c | 19 +++
 gcc/tree-ssa-sink.cc| 26 +
 3 files changed, 55 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
new file mode 100644
index 000..d3b79ca5803
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+{
+  bar();
+  j = l;
+}
+}
+/* { dg-final { scan-tree-dump 
{l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
new file mode 100644
index 000..84e7938c54f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j, x;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+{
+  bar();
+  if (b != 3)
+x = 3;
+  else
+x = 5;
+  j = l;
+}
+}
+/* { dg-final { scan-tree-dump 
{l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index 880d6f70a80..40f51e2f3b9 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -176,6 +176,9 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool 
*debug_stmts)
tree, return the best basic block between them (inclusive) to place
statements.
 
+   The best basic block should be an immediate dominator of
+   best basic block if we've moved to same loop nest.
+
We want the most control dependent block in the shallowest loop nest.
 
If the resulting block is in a shallower loop nest, then use it.  Else
@@ -209,6 +212,21 @@ select_best_block (basic_block early_bb,
   temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
 }
 
+  temp_bb = best_bb;
+  /* Move sinking to immediate dominator if the statement to be moved
+ is not memory operand and same loop nest.  */
+  if (best_bb == late_bb
+  && !gimple_vuse (stmt))
+{
+  while (temp_bb != early_bb)
+   {
+ if (bb_loop_depth (temp_bb) == bb_loop_depth (best_bb))
+   best_bb = temp_bb;
+
+ temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
+   }
+ }
+
   /* Placing a statement before a setjmp-like function would be invalid
  (it cannot be reevaluated when execution follows an abnormal edge).
  If we selected a block with abnormal predecessors, just punt.  */
@@ -250,7 +268,7 @@ select_best_block (basic_block early_bb,
   /* If result of comparsion is unknown, prefer EARLY_BB.
 Thus use !(...>=..) rather than (...<...)  */
   && !(best_bb->count * 100 >= early_bb->count * threshold))
-return best_bb;
+ return best_bb;
 
   /* No better block found, so return EARLY_BB, which happens to be the
  statement's original block.  */
@@ -430,6 +448,7 @@ statement_sink_location (gimple *stmt, basic_block frombb,
continue;

Re: [PATCH] bitint: Fix up lowering of bitfield loads/stores [PR114313]

2024-03-13 Thread Richard Biener

On Wed, 13 Mar 2024, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase ICEs, because for large/huge _BitInt bitfield
> loads/stores we use the DECL_BIT_FIELD_REPRESENTATIVE as the underlying
> "var" and indexes into it can be larger than the precision of the
> bitfield might normally allow.
> 
> The following patch fixes that by passing NULL_TREE type in that case
> to limb_access, so that we always return m_limb_type type and don't
> do the extra assertions, after all, the callers expect that too.
> I had to add the first hunk to avoid ICE, it was using type in one place
> even when it was NULL.  But TYPE_SIZE (TREE_TYPE (var)) seems like the
> right size to use anyway because the code uses VIEW_CONVERT_EXPR on it.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.
 
> 2024-03-13  Jakub Jelinek  
> 
>   PR middle-end/114313
>   * gimple-lower-bitint.cc (bitint_large_huge::limb_access): Use
>   TYPE_SIZE of TREE_TYPE (var) rather than TYPE_SIZE of type.
>   (bitint_large_huge::handle_load): Pass NULL_TREE rather than
>   rhs_type to limb_access for the bitfield load cases.
>   (bitint_large_huge::lower_mergeable_stmt): Pass NULL_TREE rather than
>   lhs_type to limb_access if nlhs is non-NULL.
> 
>   * gcc.dg/torture/bitint-62.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj 2024-03-05 10:27:04.609415622 +0100
> +++ gcc/gimple-lower-bitint.cc2024-03-12 16:45:50.152914901 +0100
> @@ -640,7 +640,7 @@ bitint_large_huge::limb_access (tree typ
>TREE_TYPE (TREE_TYPE (var
>   {
> unsigned HOST_WIDE_INT nelts
> - = CEIL (tree_to_uhwi (TYPE_SIZE (type)), limb_prec);
> + = CEIL (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var))), limb_prec);
> tree atype = build_array_type_nelts (ltype, nelts);
> var = build1 (VIEW_CONVERT_EXPR, atype, var);
>   }
> @@ -1854,7 +1854,7 @@ bitint_large_huge::handle_load (gimple *
>   m_gsi = gsi_after_labels (gsi_bb (m_gsi));
> else
>   gsi_next (_gsi);
> -   tree t = limb_access (rhs_type, nrhs1, size_int (bo_idx), true);
> +   tree t = limb_access (NULL_TREE, nrhs1, size_int (bo_idx), true);
> tree iv = make_ssa_name (m_limb_type);
> g = gimple_build_assign (iv, t);
> insert_before (g);
> @@ -1941,7 +1941,7 @@ bitint_large_huge::handle_load (gimple *
>tree iv2 = NULL_TREE;
>if (nidx0)
>   {
> -   tree t = limb_access (rhs_type, nrhs1, nidx0, true);
> +   tree t = limb_access (NULL_TREE, nrhs1, nidx0, true);
> iv = make_ssa_name (m_limb_type);
> g = gimple_build_assign (iv, t);
> insert_before (g);
> @@ -1966,7 +1966,7 @@ bitint_large_huge::handle_load (gimple *
> if_then (g, profile_probability::likely (),
>  edge_true, edge_false);
>   }
> -   tree t = limb_access (rhs_type, nrhs1, nidx1, true);
> +   tree t = limb_access (NULL_TREE, nrhs1, nidx1, true);
> if (m_upwards_2limb
> && !m_first
> && !m_bitfld_load
> @@ -2728,8 +2728,8 @@ bitint_large_huge::lower_mergeable_stmt
> /* Otherwise, stores to any other lhs.  */
> if (!done)
>   {
> -   tree l = limb_access (lhs_type, nlhs ? nlhs : lhs,
> - nidx, true);
> +   tree l = limb_access (nlhs ? NULL_TREE : lhs_type,
> + nlhs ? nlhs : lhs, nidx, true);
> g = gimple_build_assign (l, rhs1);
>   }
> insert_before (g);
> @@ -2873,7 +2873,8 @@ bitint_large_huge::lower_mergeable_stmt
> /* Otherwise, stores to any other lhs.  */
> if (!done)
>   {
> -   tree l = limb_access (lhs_type, nlhs ? nlhs : lhs, nidx, true);
> +   tree l = limb_access (nlhs ? NULL_TREE : lhs_type,
> + nlhs ? nlhs : lhs, nidx, true);
> g = gimple_build_assign (l, rhs1);
>   }
> insert_before (g);
> --- gcc/testsuite/gcc.dg/torture/bitint-62.c.jj   2024-03-12 
> 16:40:38.400198787 +0100
> +++ gcc/testsuite/gcc.dg/torture/bitint-62.c  2024-03-12 16:41:43.988297525 
> +0100
> @@ -0,0 +1,28 @@
> +/* PR middle-end/114313 */
> +/* { dg-do run { target bitint } } */
> +/* { dg-options "-std=c23" } */
> +/* { dg-skip-if "" { ! run_expensive_tests }  { "*" } { "-O0" "-O2" } } */
> +/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */
> +
> +#if __BITINT_MAXWIDTH__ >= 256
> +struct S { _BitInt(257) : 257; _BitInt(256) b : 182; } s;
> +
> +__attribute__((noipa)) _BitInt(256)
> +foo (void)
> +{
> +  return s.b;
> +}
> +#endif
> +
> +int
> +main ()
> +{
> +#if __BITINT_MAXWIDTH__ >= 256
> +  s.b = 1414262180967678524960294186228886540125217087586381431wb;
> +  if (foo () !=

[PATCH] bitint: Fix up lowering of bitfield loads/stores [PR114313]

2024-03-13 Thread Jakub Jelinek

Hi!

The following testcase ICEs, because for large/huge _BitInt bitfield
loads/stores we use the DECL_BIT_FIELD_REPRESENTATIVE as the underlying
"var" and indexes into it can be larger than the precision of the
bitfield might normally allow.

The following patch fixes that by passing NULL_TREE type in that case
to limb_access, so that we always return m_limb_type type and don't
do the extra assertions, after all, the callers expect that too.
I had to add the first hunk to avoid ICE, it was using type in one place
even when it was NULL.  But TYPE_SIZE (TREE_TYPE (var)) seems like the
right size to use anyway because the code uses VIEW_CONVERT_EXPR on it.
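
The limb arithmetic behind this can be sketched with two small helpers. The numbers used below are assumptions for illustration only: 64-bit limbs, and a 182-bit bitfield that starts at bit 257 of its underlying storage (as the layout of struct S in the testcase suggests, though the exact representative chosen by the compiler is not shown here). ceil_div and limb_index are hypothetical helpers, not functions from gimple-lower-bitint.cc.

```c
#include <assert.h>

/* Round-up division, analogous to the CEIL macro used in the patch.  */
static unsigned
ceil_div (unsigned x, unsigned y)
{
  assert (y != 0);
  return (x + y - 1) / y;
}

/* Index of the limb that holds bit number BIT when every limb is
   LIMB_PREC bits wide.  */
static unsigned
limb_index (unsigned bit, unsigned limb_prec)
{
  assert (limb_prec != 0);
  return bit / limb_prec;
}
```

Under those assumptions ceil_div (182, 64) is 3, so a type of the bitfield's own precision would only allow limb indexes 0 to 2, whereas the field's first and last bits live in limbs limb_index (257, 64) = 4 and limb_index (257 + 182 - 1, 64) = 6 of the underlying storage. That is the kind of index that the precision-based assertions would reject.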

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-03-13  Jakub Jelinek  

PR middle-end/114313
* gimple-lower-bitint.cc (bitint_large_huge::limb_access): Use
TYPE_SIZE of TREE_TYPE (var) rather than TYPE_SIZE of type.
(bitint_large_huge::handle_load): Pass NULL_TREE rather than
rhs_type to limb_access for the bitfield load cases.
(bitint_large_huge::lower_mergeable_stmt): Pass NULL_TREE rather than
lhs_type to limb_access if nlhs is non-NULL.

* gcc.dg/torture/bitint-62.c: New test.

--- gcc/gimple-lower-bitint.cc.jj   2024-03-05 10:27:04.609415622 +0100
+++ gcc/gimple-lower-bitint.cc  2024-03-12 16:45:50.152914901 +0100
@@ -640,7 +640,7 @@ bitint_large_huge::limb_access (tree typ
 TREE_TYPE (TREE_TYPE (var
{
  unsigned HOST_WIDE_INT nelts
-   = CEIL (tree_to_uhwi (TYPE_SIZE (type)), limb_prec);
+   = CEIL (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var))), limb_prec);
  tree atype = build_array_type_nelts (ltype, nelts);
  var = build1 (VIEW_CONVERT_EXPR, atype, var);
}
@@ -1854,7 +1854,7 @@ bitint_large_huge::handle_load (gimple *
m_gsi = gsi_after_labels (gsi_bb (m_gsi));
  else
gsi_next (_gsi);
- tree t = limb_access (rhs_type, nrhs1, size_int (bo_idx), true);
+ tree t = limb_access (NULL_TREE, nrhs1, size_int (bo_idx), true);
  tree iv = make_ssa_name (m_limb_type);
  g = gimple_build_assign (iv, t);
  insert_before (g);
@@ -1941,7 +1941,7 @@ bitint_large_huge::handle_load (gimple *
   tree iv2 = NULL_TREE;
   if (nidx0)
{
- tree t = limb_access (rhs_type, nrhs1, nidx0, true);
+ tree t = limb_access (NULL_TREE, nrhs1, nidx0, true);
  iv = make_ssa_name (m_limb_type);
  g = gimple_build_assign (iv, t);
  insert_before (g);
@@ -1966,7 +1966,7 @@ bitint_large_huge::handle_load (gimple *
  if_then (g, profile_probability::likely (),
   edge_true, edge_false);
}
- tree t = limb_access (rhs_type, nrhs1, nidx1, true);
+ tree t = limb_access (NULL_TREE, nrhs1, nidx1, true);
  if (m_upwards_2limb
  && !m_first
  && !m_bitfld_load
@@ -2728,8 +2728,8 @@ bitint_large_huge::lower_mergeable_stmt
  /* Otherwise, stores to any other lhs.  */
  if (!done)
{
- tree l = limb_access (lhs_type, nlhs ? nlhs : lhs,
-   nidx, true);
+ tree l = limb_access (nlhs ? NULL_TREE : lhs_type,
+   nlhs ? nlhs : lhs, nidx, true);
  g = gimple_build_assign (l, rhs1);
}
  insert_before (g);
@@ -2873,7 +2873,8 @@ bitint_large_huge::lower_mergeable_stmt
  /* Otherwise, stores to any other lhs.  */
  if (!done)
{
- tree l = limb_access (lhs_type, nlhs ? nlhs : lhs, nidx, true);
+ tree l = limb_access (nlhs ? NULL_TREE : lhs_type,
+   nlhs ? nlhs : lhs, nidx, true);
  g = gimple_build_assign (l, rhs1);
}
  insert_before (g);
--- gcc/testsuite/gcc.dg/torture/bitint-62.c.jj 2024-03-12 16:40:38.400198787 
+0100
+++ gcc/testsuite/gcc.dg/torture/bitint-62.c2024-03-12 16:41:43.988297525 
+0100
@@ -0,0 +1,28 @@
+/* PR middle-end/114313 */
+/* { dg-do run { target bitint } } */
+/* { dg-options "-std=c23" } */
+/* { dg-skip-if "" { ! run_expensive_tests }  { "*" } { "-O0" "-O2" } } */
+/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */
+
+#if __BITINT_MAXWIDTH__ >= 256
+struct S { _BitInt(257) : 257; _BitInt(256) b : 182; } s;
+
+__attribute__((noipa)) _BitInt(256)
+foo (void)
+{
+  return s.b;
+}
+#endif
+
+int
+main ()
+{
+#if __BITINT_MAXWIDTH__ >= 256
+  s.b = 1414262180967678524960294186228886540125217087586381431wb;
+  if (foo () != 1414262180967678524960294186228886540125217087586381431wb)
+__builtin_abort ();
+  s.b = -581849792837428541666755934071828568425158644418477999wb;
+  if (foo () !=

[committed] asan, v2: Fix ICE during instrumentation of returns_twice calls [PR112709]

2024-03-13 Thread Jakub Jelinek

On Tue, Mar 12, 2024 at 02:46:07PM +0100, Richard Biener wrote:
> OK.

Thanks.  Here is the actually committed version which uses
gsi_safe_insert_before instead.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to
trunk.
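
As background on why insertion before such calls needs care, here is a small standalone C sketch of a call that returns twice, using setjmp as the familiar example of a returns_twice function. It is not taken from the patch or its testcase; count_returns is a made-up helper.

```c
#include <setjmp.h>

static jmp_buf env;

/* Count how many times the setjmp call below returns.  The counter is
   volatile because it is modified between setjmp and longjmp.  */
static int
count_returns (void)
{
  volatile int returns = 0;

  if (setjmp (env) == 0)
    {
      returns++;         /* First, direct return from setjmp.  */
      longjmp (env, 1);  /* Jump back into the setjmp call.  */
    }
  else
    returns++;           /* Second return, reached via longjmp.  */

  return returns;
}
```

count_returns () yields 2: control reaches the code after the call once by falling through and once by the jump. Code placed textually just before the call runs only on the first path, which is why a statement cannot simply be inserted at the start of the block holding the call.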

2024-03-13  Jakub Jelinek  

PR sanitizer/112709
* asan.cc (maybe_create_ssa_name, maybe_cast_to_ptrmode,
build_check_stmt, maybe_instrument_call, asan_expand_mark_ifn): Use
gsi_safe_insert_before instead of gsi_insert_before.

* gcc.dg/asan/pr112709-2.c: New test.

--- gcc/asan.cc.jj  2024-03-11 13:49:58.931045179 +0100
+++ gcc/asan.cc 2024-03-11 18:38:29.047330489 +0100
@@ -2574,7 +2589,7 @@ maybe_create_ssa_name (location_t loc, t
   gimple *g = gimple_build_assign (make_ssa_name (TREE_TYPE (base)), base);
   gimple_set_location (g, loc);
   if (before_p)
-gsi_insert_before (iter, g, GSI_SAME_STMT);
+gsi_safe_insert_before (iter, g);
   else
 gsi_insert_after (iter, g, GSI_NEW_STMT);
   return gimple_assign_lhs (g);
@@ -2593,7 +2608,7 @@ maybe_cast_to_ptrmode (location_t loc, t
  NOP_EXPR, len);
   gimple_set_location (g, loc);
   if (before_p)
-gsi_insert_before (iter, g, GSI_SAME_STMT);
+gsi_safe_insert_before (iter, g);
   else
 gsi_insert_after (iter, g, GSI_NEW_STMT);
   return gimple_assign_lhs (g);
@@ -2684,7 +2699,7 @@ build_check_stmt (location_t loc, tree b
 align / BITS_PER_UNIT));
   gimple_set_location (g, loc);
   if (before_p)
-gsi_insert_before (, g, GSI_SAME_STMT);
+gsi_safe_insert_before (, g);
   else
 {
   gsi_insert_after (, g, GSI_NEW_STMT);
@@ -3025,7 +3040,7 @@ maybe_instrument_call (gimple_stmt_itera
  tree decl = builtin_decl_implicit (BUILT_IN_ASAN_HANDLE_NO_RETURN);
  gimple *g = gimple_build_call (decl, 0);
  gimple_set_location (g, gimple_location (stmt));
- gsi_insert_before (iter, g, GSI_SAME_STMT);
+ gsi_safe_insert_before (iter, g);
}
 }
 
@@ -3852,7 +3867,7 @@ asan_expand_mark_ifn (gimple_stmt_iterat
   g = gimple_build_assign (make_ssa_name (pointer_sized_int_node),
   NOP_EXPR, len);
   gimple_set_location (g, loc);
-  gsi_insert_before (iter, g, GSI_SAME_STMT);
+  gsi_safe_insert_before (iter, g);
   tree sz_arg = gimple_assign_lhs (g);
 
   tree fun
--- gcc/testsuite/gcc.dg/asan/pr112709-2.c.jj   2024-03-11 18:30:59.813488200 
+0100
+++ gcc/testsuite/gcc.dg/asan/pr112709-2.c  2024-03-11 18:31:06.506396462 
+0100
@@ -0,0 +1,50 @@
+/* PR sanitizer/112709 */
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=address -O2" } */
+
+struct S { char c[1024]; } *p;
+int foo (int);
+
+__attribute__((returns_twice, noipa)) int
+bar (struct S x)
+{
+  (void) x.c[0];
+  return 0;
+}
+
+void
+baz (int *y)
+{
+  foo (1);
+  *y = bar (*p);
+}
+
+void
+qux (int x, int *y)
+{
+  if (x == 25)
+x = foo (2);
+  else if (x == 42)
+x = foo (foo (3));
+  *y = bar (*p);
+}
+
+void
+corge (int x, int *y)
+{
+  void *q[] = { &, &, &, & };
+  if (x == 25)
+{
+l1:
+  x = foo (2);
+}
+  else if (x == 42)
+{
+l2:
+  x = foo (foo (3));
+}
+l3:
+  *y = bar (*p);
+  if (x < 4)
+goto *q[x & 3];
+}


Jakub

Re: [PATCH] gimple-iterator, ubsan, v3: Fix ICE during instrumentation of returns_twice calls [PR112709]

2024-03-13 Thread Richard Biener

On Tue, 12 Mar 2024, Jakub Jelinek wrote:

> On Tue, Mar 12, 2024 at 02:31:28PM +0100, Richard Biener wrote:
> > Ah, yeah, I see :/
> > 
> > > So, the intention of edge_before_returns_twice_call is just that
> > > it in the common case just finds the non-EDGE_ABNORMAL edge if there is 
> > > one,
> > > if there isn't just one, it adjusts the IL such that there is just one.
> > > And then the next step is to handle that case.
> > 
> > So I guess the updated patch is OK then.
> 
> For the naming thing, another variant would be to export
> 
> void
> gsi_safe_insert_before (gimple_stmt_iterator *iter, gimple *g)
> {
>   gimple *stmt = gsi_stmt (*iter);
>   if (stmt
>   && is_gimple_call (stmt)
>   && (gimple_call_flags (stmt) & ECF_RETURNS_TWICE) != 0)
> gsi_insert_before_returns_twice_call (gsi_bb (*iter), g);
>   else
> gsi_insert_before (iter, g, GSI_SAME_STMT);
> }
>
> /* Similarly for sequence SEQ.  */
> 
> void
> gsi_safe_insert_seq_before (gimple_stmt_iterator *iter, gimple_seq seq)
> ...
> 
> and inline the gsi_insert_*before_returns_twice_call calls by hand
> in there.
> Then asan.cc/ubsan.cc wouldn't need to define those functions.
> I could even outline the updating of SSA_NAMEs on a single statement
> into a helper inline function.
> 
> The patch is even shorter then (the asan patch as well).
> 
> Tested again with make check-gcc check-g++ RUNTESTFLAGS='ubsan.exp asan.exp'

I like this more, thus OK.

Richard.
 
> 2024-03-12  Jakub Jelinek  
> 
>   PR sanitizer/112709
>   * gimple-iterator.h (gsi_safe_insert_before,
>   gsi_safe_insert_seq_before): Declare.
>   * gimple-iterator.cc: Include gimplify.h.
>   (edge_before_returns_twice_call, adjust_before_returns_twice_call,
>   gsi_safe_insert_before, gsi_safe_insert_seq_before): New functions.
>   * ubsan.cc (instrument_mem_ref, instrument_pointer_overflow,
>   instrument_nonnull_arg, instrument_nonnull_return): Use
>   gsi_safe_insert_before instead of gsi_insert_before.
>   (maybe_instrument_pointer_overflow): Use force_gimple_operand,
>   gimple_seq_add_seq_without_update and gsi_safe_insert_seq_before
>   instead of force_gimple_operand_gsi.
>   (instrument_object_size): Likewise.  Use gsi_safe_insert_before
>   instead of gsi_insert_before.
> 
>   * gcc.dg/ubsan/pr112709-1.c: New test.
>   * gcc.dg/ubsan/pr112709-2.c: New test.
> 
> --- gcc/gimple-iterator.h.jj  2024-03-12 10:15:41.253529859 +0100
> +++ gcc/gimple-iterator.h 2024-03-12 15:10:23.594845422 +0100
> @@ -93,6 +93,8 @@ extern void gsi_insert_on_edge (edge, gi
>  extern void gsi_insert_seq_on_edge (edge, gimple_seq);
>  extern basic_block gsi_insert_on_edge_immediate (edge, gimple *);
>  extern basic_block gsi_insert_seq_on_edge_immediate (edge, gimple_seq);
> +extern void gsi_safe_insert_before (gimple_stmt_iterator *, gimple *);
> +extern void gsi_safe_insert_seq_before (gimple_stmt_iterator *, gimple_seq);
>  extern void gsi_commit_edge_inserts (void);
>  extern void gsi_commit_one_edge_insert (edge, basic_block *);
>  extern gphi_iterator gsi_start_phis (basic_block);
> --- gcc/gimple-iterator.cc.jj 2024-03-12 10:15:41.209530471 +0100
> +++ gcc/gimple-iterator.cc 2024-03-12 15:29:17.814171376 +0100
> @@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.
>  #include "tree-cfg.h"
>  #include "tree-ssa.h"
>  #include "value-prof.h"
> +#include "gimplify.h"
>  
>  
>  /* Mark the statement STMT as modified, and update it.  */
> @@ -944,3 +945,137 @@ gsi_start_phis (basic_block bb)
>  
>return i;
>  }
> +
> +/* Helper function for gsi_safe_insert_before and gsi_safe_insert_seq_before.
> +   Find edge to insert statements before returns_twice call at the start of BB,
> +   if there isn't just one, split the bb and adjust PHIs to ensure that.  */
> +
> +static edge
> +edge_before_returns_twice_call (basic_block bb)
> +{
> +  gimple_stmt_iterator gsi = gsi_start_nondebug_bb (bb);
> +  gcc_checking_assert (is_gimple_call (gsi_stmt (gsi))
> +&& (gimple_call_flags (gsi_stmt (gsi))
> +& ECF_RETURNS_TWICE) != 0);
> +  edge_iterator ei;
> +  edge e, ad_edge = NULL, other_edge = NULL;
> +  bool split = false;
> +  FOR_EACH_EDGE (e, ei, bb->preds)
> +{
> +  if ((e->flags & (EDGE_ABNORMAL | EDGE_EH)) == EDGE_ABNORMAL)
> + {
> +   gimple_stmt_iterator gsi
> + = gsi_start_nondebug_after_labels_bb (e->src);
> +   gimple *ad = gsi_stmt (gsi);
> +   if (ad && gimple_call_internal_p (ad, IFN_ABNORMAL_DISPATCHER))
> + {
> +   gcc_checking_assert (ad_edge == NULL);
> +   ad_edge = e;
> +   continue;
> + }
> + }
> +  if (other_edge || e->flags & (EDGE_ABNORMAL | EDGE_EH))
> + split = true;
> +  other_edge = e;
> +}
> +  gcc_checking_assert (ad_edge);
> +  if (other_edge == NULL)
> +split = true;
> +  if (split)
> +{
> +  other_edge = split_block_after_labels (bb);

Re: Patch ping Re: [PATCH] icf: Reset SSA_NAME_{PTR,RANGE}_INFO in successfully merged functions [PR113907]

2024-03-13 Thread Richard Biener

On Tue, 12 Mar 2024, Jakub Jelinek wrote:

> On Tue, Mar 12, 2024 at 05:21:58PM +0100, Jakub Jelinek wrote:
> > On Tue, Mar 12, 2024 at 10:46:42AM +0100, Jan Hubicka wrote:
> > > I am sorry for delaying this.  I made the variant that simply compares
> > > value range of functions and prevents merging if they diverge and wanted
> > > to make some bigger statistics.  This made me notice some performance
> > > problems on clang and libstdc++ RB-trees, which derailed me
> > > from the original PR.  I will finish the statistics today.
> > 
> > With the posted patch, perhaps if we don't want to union jump_tables etc.,
> > all we could punt on is differences in the jump_table VRs rather than just
> > any SSA_NAME_RANGE_INFO differences.
> 
> To expand on this, I think we need to either union or punt on jump_func
> differences in any case, because for LTO we can't really punt on
> SSA_NAME_RANGE_INFO differences given that we don't stream that out and in.
> So the ipa_jump_func are I think the only thing that actually can differ
> on the ICF merging candidates from value range POV.

I agree.  Btw, I would have approved the original patch in this
thread that wipes SSA_NAME_INFO in merged bodies to mimic what LTO
effectively does right now.  That also looks most sensible to
backport.

But I'll defer to Honza in the end (but also want to point out we
need something suitable for backporting).

Richard.

No rule to make target '../libbacktrace/libbacktrace.la', needed by 'libgo.la'. [PR106472]

2024-03-13 Thread Дилян Палаузов

A non-parallel build can fail, depending on the ./configure parameters; see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106472 .


The change below does fix the problem.

I do not understand the build system well enough to say that this is the
best approach, so if there are questions I might or might not be able to
answer them.


I tried different things; this worked on the releases/gcc-13 branch.  On
the master branch last weekend the build failed because the stage2 and
stage3 results were not equal, so I have not verified this change there.
depend= in Makefile.def seems to have an effect only if bootstrapping is
involved, and gcc/go/config-lang.in does not have boot_language=yes .
The lines below are present in the Makefile.in:@unless gcc-bootstrap
snippet.  I think ./configure --enable-languages=all followed by a
serial build works because it implies D, which in turn implies
bootstrapping for libbacktrace and libatomic.  I also do not want to
invest much more time in this.


I do not know whether both `maybe-` prefixes are necessary.


diff --git a/Makefile.in b/Makefile.in
index 06a9398e172..236e5cda942 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -66481,6 +66481,7 @@ configure-target-libgfortran: maybe-all-target-libquadmath



 @if gcc-bootstrap
+all-target-libgo: maybe-all-target-libbacktrace maybe-all-target-libatomic

 configure-gnattools: stage_last
 configure-libcc1: stage_last
 configure-c++tools: stage_last
diff --git a/Makefile.tpl b/Makefile.tpl
index dfbd74b68f8..98160c7626b 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -1952,7 +1952,7 @@ configure-target-[+module+]: maybe-all-gcc[+
(define dep-maybe (lambda ()
   (if (exist? "hard") "" "maybe-")))

-   ;; dep-kind returns returns "prebootstrap" for configure or build
+   ;; dep-kind returns "prebootstrap" for configure or build
;; dependencies of bootstrapped modules on a build module
;; (e.g. all-gcc on all-build-bison); "normal" if the dependency is
;; on an "install" target, or if the dependence module is not
@@ -2017,6 +2017,7 @@ configure-target-[+module+]: maybe-all-gcc[+
 [+ ESAC +][+ ENDFOR dependencies +]

 @if gcc-bootstrap
+all-target-libgo: maybe-all-target-libbacktrace maybe-all-target-libatomic

 [+ FOR dependencies +][+ CASE (dep-kind) +]
 [+ == "postbootstrap" +][+ (make-postboot-dep) +][+ ESAC +][+
 ENDFOR dependencies +]@endif gcc-bootstrap
