[PATCH] RISC-V: Remove vxrm parameter for vsadd[u] and vssub[u]

2023-07-27 Thread Li Xu
From: xuli 

Computation of `vsadd`, `vsaddu`, `vssub`, and `vssubu` does not need the
rounding mode; therefore the intrinsics for these instructions do not take
a parameter for rounding mode control.
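
To illustrate why no rounding mode is involved, here is a minimal scalar sketch in C++. It is not the GCC implementation or the intrinsic API; the names `sat_add_i8`, `avg_add_i8`, and `Vxrm` are hypothetical and model a single 8-bit element. A saturating add only clamps an exact sum, whereas an averaging add (as in `vaadd`) shifts a bit out and so depends on the rounding mode.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical scalar model of one element of a signed saturating add.
// The exact sum is formed in a wider type and clamped to the int8_t range;
// no low-order bits are discarded, so there is nothing to round.
inline std::int8_t sat_add_i8(std::int8_t a, std::int8_t b) {
  const int lo = std::numeric_limits<std::int8_t>::min();
  const int hi = std::numeric_limits<std::int8_t>::max();
  int wide = static_cast<int>(a) + static_cast<int>(b);
  if (wide > hi) return static_cast<std::int8_t>(hi);
  if (wide < lo) return static_cast<std::int8_t>(lo);
  return static_cast<std::int8_t>(wide);
}

// Two of the fixed-point rounding modes, for contrast (names are assumptions
// of this sketch): round-to-nearest-up and round-down (truncate).
enum class Vxrm { rnu, rdn };

// Hypothetical scalar model of an averaging add, (a + b) >> 1.
// The bit shifted out must be handled somehow, so the result depends on
// the rounding mode. Assumes >> on a negative int is an arithmetic shift.
inline std::int8_t avg_add_i8(std::int8_t a, std::int8_t b, Vxrm rm) {
  int wide = static_cast<int>(a) + static_cast<int>(b);
  int increment = (rm == Vxrm::rnu) ? (wide & 1) : 0;
  return static_cast<std::int8_t>((wide >> 1) + increment);
}
```

In this model `sat_add_i8` has no mode argument at all, while `avg_add_i8(1, 2, ...)` gives different results for the two modes.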

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Remove rounding mode of
vsadd[u] and vssub[u].
* config/riscv/vector.md: Ditto.

gcc/testsuite/ChangeLog:

* g++.target/riscv/rvv/base/bug-12.C: Adapt testcase.
* g++.target/riscv/rvv/base/bug-14.C: Ditto.
* g++.target/riscv/rvv/base/bug-18.C: Ditto.
* g++.target/riscv/rvv/base/bug-19.C: Ditto.
* g++.target/riscv/rvv/base/bug-20.C: Ditto.
* g++.target/riscv/rvv/base/bug-21.C: Ditto.
* g++.target/riscv/rvv/base/bug-22.C: Ditto.
* g++.target/riscv/rvv/base/bug-23.C: Ditto.
* g++.target/riscv/rvv/base/bug-3.C: Ditto.
* g++.target/riscv/rvv/base/bug-8.C: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-100.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-101.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-102.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-103.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-104.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-105.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-106.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-107.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-108.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-109.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-110.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-111.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-112.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-113.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-114.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-115.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-116.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-117.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-118.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-119.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-97.c: Ditto.
* gcc.target/riscv/rvv/base/binop_vx_constraint-98.c: Ditto.
* gcc.target/riscv/rvv/base/merge_constraint-1.c: Ditto.
* gcc.target/riscv/rvv/base/fixed-point-vxrm-error.c: New test.
* gcc.target/riscv/rvv/base/fixed-point-vxrm.c: New test.
---
 .../riscv/riscv-vector-builtins-bases.cc  |  6 --
 gcc/config/riscv/vector.md| 42 +++---
 .../g++.target/riscv/rvv/base/bug-12.C|  2 +-
 .../g++.target/riscv/rvv/base/bug-14.C|  2 +-
 .../g++.target/riscv/rvv/base/bug-18.C|  2 +-
 .../g++.target/riscv/rvv/base/bug-19.C|  2 +-
 .../g++.target/riscv/rvv/base/bug-20.C|  2 +-
 .../g++.target/riscv/rvv/base/bug-21.C|  2 +-
 .../g++.target/riscv/rvv/base/bug-22.C|  2 +-
 .../g++.target/riscv/rvv/base/bug-23.C|  2 +-
 .../g++.target/riscv/rvv/base/bug-3.C |  2 +-
 .../g++.target/riscv/rvv/base/bug-8.C |  2 +-
 .../riscv/rvv/base/binop_vx_constraint-100.c  |  4 +-
 .../riscv/rvv/base/binop_vx_constraint-101.c  |  4 +-
 .../riscv/rvv/base/binop_vx_constraint-102.c  |  4 +-
 .../riscv/rvv/base/binop_vx_constraint-103.c  | 28 +++
 .../riscv/rvv/base/binop_vx_constraint-104.c  | 16 ++--
 .../riscv/rvv/base/binop_vx_constraint-105.c  |  4 +-
 .../riscv/rvv/base/binop_vx_constraint-106.c  |  4 +-
 .../riscv/rvv/base/binop_vx_constraint-107.c  |  4 +-
 .../riscv/rvv/base/binop_vx_constraint-108.c  |  4 +-
 .../riscv/rvv/base/binop_vx_constraint-109.c  | 28 +++
 .../riscv/rvv/base/binop_vx_constraint-110.c  | 16 ++--
 .../riscv/rvv/base/binop_vx_constraint-111.c  |  4 +-
 .../riscv/rvv/base/binop_vx_constraint-112.c  |  4 +-
 .../riscv/rvv/base/binop_vx_constraint-113.c  |  4 +-
 .../riscv/rvv/base/binop_vx_constraint-114.c  |  4 +-
 .../riscv/rvv/base/binop_vx_constraint-115.c  | 16 ++--
 .../riscv/rvv/base/binop_vx_constraint-116.c  |  4 +-
 .../riscv/rvv/base/binop_vx_constraint-117.c  |  4 +-
 .../riscv/rvv/base/binop_vx_constraint-118.c  |  4 +-
 .../riscv/rvv/base/binop_vx_constraint-119.c  |  4 +-
 .../riscv/rvv/base/binop_vx_constraint-97.c   | 28 +++
 .../riscv/rvv/base/binop_vx_constraint-98.c   | 16 ++--
 .../riscv/rvv/base/fixed-point-vxrm-error.c   | 24 ++
 .../riscv/rvv/base/fixed-point-vxrm.c | 81 +++
 .../riscv/rvv/base/merge_constraint-1.c   |  4 +-
 37 files changed, 233 insertions(+), 152 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/fixed-point-vxrm-error.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/fixed-point-vxrm.c

diff --git 

[Bug sanitizer/110835] [13/14 Regression] -fsanitize=address causes huge runtime slowdown from std::rethrow_exception not called

2023-07-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110835

--- Comment #5 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #3)
> (In reply to Andrew Pinski from comment #2)
> > Which might mean it is an issue in LLVM too ...
> 
> Yes the same runtime regression shows up between clang 15 and clang 16. This
> should be reported upstream to them too.

What is interesting is that with -stdlib=libc++ the regression for clang/LLVM
shows up between their 14 and 15 releases.

Anyways this should be filed upstream ...

[Bug sanitizer/110835] [13/14 Regression] -fsanitize=address causes huge runtime slowdown from std::rethrow_exception not called

2023-07-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110835

Andrew Pinski  changed:

   What|Removed |Added

Summary|-fsanitize=address causes   |[13/14 Regression]
   |slowdown from   |-fsanitize=address causes
   |std::rethrow_exception not  |huge runtime slowdown from
   |called  |std::rethrow_exception not
   ||called
   Target Milestone|--- |13.3
   Last reconfirmed||2023-07-28
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #4 from Andrew Pinski  ---
Confirmed.

[Bug sanitizer/110835] -fsanitize=address causes slowdown from std::rethrow_exception not called

2023-07-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110835

--- Comment #3 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #2)
> Which might mean it is an issue in LLVM too ...

Yes the same runtime regression shows up between clang 15 and clang 16. This
should be reported upstream to them too.

[Bug sanitizer/110835] -fsanitize=address causes slowdown from std::rethrow_exception not called

2023-07-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110835

Andrew Pinski  changed:

   What|Removed |Added

 CC||dodji at gcc dot gnu.org,
   ||dvyukov at gcc dot gnu.org,
   ||jakub at gcc dot gnu.org,
   ||kcc at gcc dot gnu.org,
   ||marxin at gcc dot gnu.org
  Component|c++ |sanitizer

--- Comment #2 from Andrew Pinski  ---
The code generation does not look any different ...
So I am suspecting this was a library change.
Which might mean it is an issue in LLVM too ...

Re: LRA for avr: Handling hard regs set directly at expand

2023-07-27 Thread SenthilKumar.Selvaraj--- via Gcc
On Thu, 2023-07-27 at 15:11 +0200, Georg-Johann Lay wrote:
> On 17.07.23 at 13:33, SenthilKumar.Selvaraj--- via Gcc wrote:
> > Hi,
> > 
> > The avr target has a bunch of patterns that directly set hard regs at
> > expand time, like so
> 
> The correct approach would be to use usual predicates together with
> constraints that describe the register instead of hard regs, e.g.
> (match_operand:HI n "register_operand" "R18_2") for a 2-byte register
> that starts at R18 instead of (reg:HI 18).  I deprecated and removed
> constraints starting with "R" long ago in order to get "R" free for that
> purpose.
> 
> Some years ago I tried such constraints (and hence also the zoo of new
> register classes that are required to accommodate them).  The resulting
> code quality was so bad that I quickly abandoned that approach, and IIRC
> there were also spill fails.  It appears that reload / IRA was overwhelmed
> by the multitude of new reg classes and made sub-optimal decisions.
> 
> The way out was more of explicit hard regs in expand, together with
> awkward functionalities like avr_fix_operands (PR63633) and the
> functions that use it.  That way we get correct code without performance
> penalties in unrelated places.
> 
> Most of such insns are explicitly modelling hand-written asm functions
> in libgcc, because most of these functions have a footprint smaller than
> the default ABI.  And some functions have an interface not complying to
> default ABI.
> 
> For the case of cpymem etc from below, explicit hard registers were used
> because register allocator did a bad job when using constraints like "e"
> (X, Y, or Z).

I guessed that much. Yes, using constraints works - I used "x" and "z" that
directly correspond to REG_X and REG_Z (ignore the weird operand numbering).

diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index be0f8dcbe0e..6c6c4e4e212 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -1148,20 +1148,20 @@
 ;; "cpymem_qi"
 ;; "cpymem_hi"
 (define_insn_and_split "cpymem_"
-  [(set (mem:BLK (reg:HI REG_X))
-(mem:BLK (reg:HI REG_Z)))
+  [(set (mem:BLK (match_operand:HI 3 "register_operand" "+x"))
+(mem:BLK (match_operand:HI 4 "register_operand" "+z")))
(unspec [(match_operand:QI 0 "const_int_operand" "n")]
UNSPEC_CPYMEM)
(use (match_operand:QIHI 1 "register_operand" ""))
-   (clobber (reg:HI REG_X))
-   (clobber (reg:HI REG_Z))
+   (clobber (match_dup 3))
+   (clobber (match_dup 4))
(clobber (reg:QI LPM_REGNO))
(clobber (match_operand:QIHI 2 "register_operand" "=1"))]
   ""
   "#"
   "&& reload_completed"
-  [(parallel [(set (mem:BLK (reg:HI REG_X))
-   (mem:BLK (reg:HI REG_Z)))
+  [(parallel [(set (mem:BLK (match_dup 3))
+   (mem:BLK (match_dup 4)))
   (unspec [(match_dup 0)]
   UNSPEC_CPYMEM)
   (use (match_dup 1))

I know you did these changes a long time ago, but do you happen to have any
test cases lying around that I can use to see if LRA does a better job than
classic reload?

Vladimir, given that classic reload handled such hardcoded hard regs just 
fine, should LRA also be able to deal with them the same way? Or is this
something that LRA is not going to support?

Regards
Senthil

> 
> Johann
> 
> 
> > (define_expand "cpymemhi"
> >[(parallel [(set (match_operand:BLK 0 "memory_operand" "")
> > (match_operand:BLK 1 "memory_operand" ""))
> >(use (match_operand:HI 2 "const_int_operand" ""))
> >(use (match_operand:HI 3 "const_int_operand" ""))])]
> >""
> >{
> >  if (avr_emit_cpymemhi (operands))
> >DONE;
> > 
> >  FAIL;
> >})
> > 
> > where avr_emit_cpymemhi generates
> > 
> > (insn 14 13 15 4 (set (reg:HI 30 r30)
> >  (reg:HI 48 [ ivtmp.10 ])) "pr53505.c":21:22 -1
> >   (nil))
> > (insn 15 14 16 4 (set (reg:HI 26 r26)
> >  (reg/f:HI 38 virtual-stack-vars)) "pr53505.c":21:22 -1
> >   (nil))
> > (insn 16 15 17 4 (parallel [
> >  (set (mem:BLK (reg:HI 26 r26) [0  A8])
> >  (mem:BLK (reg:HI 30 r30) [0  A8]))
> >  (unspec [
> >  (const_int 0 [0])
> >  ] UNSPEC_CPYMEM)
> >  (use (reg:QI 52))
> >  (clobber (reg:HI 26 r26))
> >  (clobber (reg:HI 30 r30))
> >  (clobber (reg:QI 0 r0))
> >  (clobber (reg:QI 52))
> >  ]) "pr53505.c":21:22 -1
> >   (nil))
> > 
> > Classic reload knows about these - find_reg masks out bad_spill_regs, and
> > bad_spill_regs when ORed with chain->live_throughout in
> > order_regs_for_reload picks up r30.
> > 
> > LRA, however, appears not to consider that, and proceeds to use such regs
> > as reload regs. For the same source, it generates
> > 
> >   Choosing alt 0 in insn 15:  (0) =r  (1) r 

[Bug c++/110835] -fsanitize=address causes slowdown from std::rethrow_exception not called

2023-07-27 Thread ed at catmur dot uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110835

--- Comment #1 from Ed Catmur  ---
Motivation is
https://github.com/boostorg/exception/blob/b039b4ea18ef752d0c1684b3f715ce493b778060/include/boost/exception/detail/exception_ptr.hpp#L550
; the half-reduced code is:

#include <boost/exception_ptr.hpp>
struct S {};
int main() {
    auto ep = boost::copy_exception(S());
    for (int i = 0; i != 10; ++i)
        try { boost::rethrow_exception(ep); } catch (...) {}
}

[Bug c++/110835] New: -fsanitize=address causes slowdown from std::rethrow_exception not called

2023-07-27 Thread ed at catmur dot uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110835

Bug ID: 110835
   Summary: -fsanitize=address causes slowdown from
std::rethrow_exception not called
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ed at catmur dot uk
  Target Milestone: ---

#include <exception>
std::exception_ptr p;
void f() {
  try { throw 1; } catch(char) { std::rethrow_exception(p); }
}
int main() {
  for (int i = 0; i != 10; ++i)
    try { f(); } catch (...) { }
}

Compiled with -fsanitize=address (and at -O0 through -O3), this is roughly 30x
slower under gcc 13 than under gcc 12 (4.7s vs 0.15s on my Core i7 3 GHz).

Note that the std::rethrow_exception() is not called, but is still essential to
exhibit the bug. Also `f` needs to be a separate function (and not `static`).
At low optimization levels it can be an iife.

[PATCH v3 2/2] libstdc++: Use _GLIBCXX_HAS_BUILTIN_TRAIT

2023-07-27 Thread Ken Matsui via Gcc-patches
This patch uses the _GLIBCXX_HAS_BUILTIN_TRAIT macro instead of
__has_builtin in the type_traits header. The macro makes it possible to
toggle the use of built-in traits in the type_traits header through the
_GLIBCXX_NO_BUILTIN_TRAITS macro, without needing to modify the
source code.

libstdc++-v3/ChangeLog:

* include/std/type_traits (__has_builtin): Replace with ...
(_GLIBCXX_HAS_BUILTIN): ... this.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index 9f086992ebc..12423361b6e 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -1411,7 +1411,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : public __bool_constant<__is_base_of(_Base, _Derived)>
 { };
 
-#if __has_builtin(__is_convertible)
+#if _GLIBCXX_HAS_BUILTIN_TRAIT(__is_convertible)
   template
 struct is_convertible
 : public __bool_constant<__is_convertible(_From, _To)>
@@ -1462,7 +1462,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #if __cplusplus >= 202002L
 #define __cpp_lib_is_nothrow_convertible 201806L
 
-#if __has_builtin(__is_nothrow_convertible)
+#if _GLIBCXX_HAS_BUILTIN_TRAIT(__is_nothrow_convertible)
   /// is_nothrow_convertible_v
   template
 inline constexpr bool is_nothrow_convertible_v
@@ -1537,7 +1537,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { using type = _Tp; };
 
   /// remove_cv
-#if __has_builtin(__remove_cv)
+#if _GLIBCXX_HAS_BUILTIN_TRAIT(__remove_cv)
   template
 struct remove_cv
 { using type = __remove_cv(_Tp); };
@@ -1606,7 +1606,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // Reference transformations.
 
   /// remove_reference
-#if __has_builtin(__remove_reference)
+#if _GLIBCXX_HAS_BUILTIN_TRAIT(__remove_reference)
   template
 struct remove_reference
 { using type = __remove_reference(_Tp); };
@@ -2963,7 +2963,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template(_S_get())),
   typename = decltype(_S_conv<_Tp>(_S_get())),
-#if __has_builtin(__reference_converts_from_temporary)
+#if _GLIBCXX_HAS_BUILTIN_TRAIT(__reference_converts_from_temporary)
   bool _Dangle = __reference_converts_from_temporary(_Tp, _Res_t)
 #else
   bool _Dangle = false
@@ -3420,7 +3420,7 @@ template
*/
 #define __cpp_lib_remove_cvref 201711L
 
-#if __has_builtin(__remove_cvref)
+#if _GLIBCXX_HAS_BUILTIN_TRAIT(__remove_cvref)
   template
 struct remove_cvref
 { using type = __remove_cvref(_Tp); };
@@ -3515,7 +3515,7 @@ template
 : public bool_constant>
 { };
 
-#if __has_builtin(__is_layout_compatible)
+#if _GLIBCXX_HAS_BUILTIN_TRAIT(__is_layout_compatible)
 
   /// @since C++20
   template
@@ -3529,7 +3529,7 @@ template
 constexpr bool is_layout_compatible_v
   = __is_layout_compatible(_Tp, _Up);
 
-#if __has_builtin(__builtin_is_corresponding_member)
+#if _GLIBCXX_HAS_BUILTIN_TRAIT(__builtin_is_corresponding_member)
 #define __cpp_lib_is_layout_compatible 201907L
 
   /// @since C++20
@@ -3540,7 +3540,7 @@ template
 #endif
 #endif
 
-#if __has_builtin(__is_pointer_interconvertible_base_of)
+#if _GLIBCXX_HAS_BUILTIN_TRAIT(__is_pointer_interconvertible_base_of)
   /// True if `_Derived` is standard-layout and has a base class of type `_Base`
   /// @since C++20
   template
@@ -3554,7 +3554,7 @@ template
 constexpr bool is_pointer_interconvertible_base_of_v
   = __is_pointer_interconvertible_base_of(_Base, _Derived);
 
-#if __has_builtin(__builtin_is_pointer_interconvertible_with_class)
+#if _GLIBCXX_HAS_BUILTIN_TRAIT(__builtin_is_pointer_interconvertible_with_class)
 #define __cpp_lib_is_pointer_interconvertible 201907L
 
   /// True if `__mp` points to the first member of a standard-layout type
@@ -3590,8 +3590,8 @@ template
   template
 inline constexpr bool is_scoped_enum_v = is_scoped_enum<_Tp>::value;
 
-#if __has_builtin(__reference_constructs_from_temporary) \
-  && __has_builtin(__reference_converts_from_temporary)
+#if _GLIBCXX_HAS_BUILTIN_TRAIT(__reference_constructs_from_temporary) \
+  && _GLIBCXX_HAS_BUILTIN_TRAIT(__reference_converts_from_temporary)
 
 #define __cpp_lib_reference_from_temporary 202202L
 
@@ -3632,7 +3632,7 @@ template
   template
 inline constexpr bool reference_converts_from_temporary_v
   = reference_converts_from_temporary<_Tp, _Up>::value;
-#endif // __has_builtin for reference_from_temporary
+#endif // _GLIBCXX_HAS_BUILTIN_TRAIT for reference_from_temporary
 #endif // C++23
 
 #if _GLIBCXX_HAVE_IS_CONSTANT_EVALUATED
-- 
2.41.0



[PATCH v3 1/2] libstdc++: Define _GLIBCXX_HAS_BUILTIN_TRAIT

2023-07-27 Thread Ken Matsui via Gcc-patches
This patch defines the _GLIBCXX_HAS_BUILTIN_TRAIT macro, which will be used
as a flag to toggle the use of built-in traits in the type_traits header
through the _GLIBCXX_NO_BUILTIN_TRAITS macro, without needing to modify the
source code.
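
The toggle described above can be sketched outside libstdc++ roughly as follows. This is only an illustration of the pattern, not the patch itself: the `MYLIB_*` macros and `my_remove_cv` are hypothetical names, and the sketch assumes a compiler where `__has_builtin` (when present) reports the `__remove_cv` built-in trait.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for _GLIBCXX_HAS_BUILTIN: evaluates to 0 on
// compilers that do not provide __has_builtin at all.
#if defined(__has_builtin)
# define MYLIB_HAS_BUILTIN(B) __has_builtin(B)
#else
# define MYLIB_HAS_BUILTIN(B) 0
#endif

// Hypothetical stand-in for _GLIBCXX_HAS_BUILTIN_TRAIT: defining
// MYLIB_NO_BUILTIN_TRAITS forces every trait onto its library fallback.
#ifndef MYLIB_NO_BUILTIN_TRAITS
# define MYLIB_HAS_BUILTIN_TRAIT(BT) MYLIB_HAS_BUILTIN(BT)
#else
# define MYLIB_HAS_BUILTIN_TRAIT(BT) 0
#endif

#if MYLIB_HAS_BUILTIN_TRAIT(__remove_cv)
// Built-in path: the compiler computes the type directly.
template <typename T>
struct my_remove_cv { using type = __remove_cv(T); };
#else
// Fallback path: classic partial specializations.
template <typename T> struct my_remove_cv { using type = T; };
template <typename T> struct my_remove_cv<const T> { using type = T; };
template <typename T> struct my_remove_cv<volatile T> { using type = T; };
template <typename T> struct my_remove_cv<const volatile T> { using type = T; };
#endif
```

Compiling the same translation unit with and without `-DMYLIB_NO_BUILTIN_TRAITS` should exercise both paths, which is the kind of comparison the macro is meant to allow without editing the header.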

libstdc++-v3/ChangeLog:

* include/bits/c++config (_GLIBCXX_HAS_BUILTIN_TRAIT): Define.
(_GLIBCXX_HAS_BUILTIN): Keep defined.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/bits/c++config | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index dd47f274d5f..984985d6fff 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -854,7 +854,15 @@ namespace __gnu_cxx
 # define _GLIBCXX_HAVE_BUILTIN_LAUNDER 1
 #endif
 
-#undef _GLIBCXX_HAS_BUILTIN
+// Returns 1 if _GLIBCXX_NO_BUILTIN_TRAITS is not defined and the compiler
+// has a corresponding built-in type trait, 0 otherwise.
+// _GLIBCXX_NO_BUILTIN_TRAITS can be defined to disable the use of built-in
+// traits.
+#ifndef _GLIBCXX_NO_BUILTIN_TRAITS
+# define _GLIBCXX_HAS_BUILTIN_TRAIT(BT) _GLIBCXX_HAS_BUILTIN(BT)
+#else
+# define _GLIBCXX_HAS_BUILTIN_TRAIT(BT) 0
+#endif
 
 // Mark code that should be ignored by the compiler, but seen by Doxygen.
 #define _GLIBCXX_DOXYGEN_ONLY(X)
-- 
2.41.0



Re: [PATCH] RISC-V: Fix uninitialized and redundant use of which_alternative

2023-07-27 Thread Demin Han
Sorry for not considering the rv32 config.
The fix is OK. If convenient, please commit it.

On 2023/7/28 4:46, Patrick O'Neill wrote:
> The newly added testcase fails on rv32 targets with this message:
> FAIL: gcc.target/riscv/rvv/autovec/madd-split2-1.c -O3 -ftree-vectorize (test for excess errors)
> 
> verbose log:
> compiler exited with status 1
> output is:
> cc1: error: ABI requires '-march=rv32'
> 
> Something like this appears to fix the issue:
> 
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
> index 14a9802667e..e10a9e9d0f5 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-march=rv64gcv_zvl256b -O3 -fno-cprop-registers -fno-dce 
> --param riscv-autovec-preference=scalable" } */
> +/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3
> -fno-cprop-registers -fno-dce --param riscv-autovec-preference=scalable"
>  } */
>  
>  long
>  foo (long *__restrict a, long *__restrict b, long n)
> 
> On 7/27/23 04:57, Kito Cheng via Gcc-patches wrote:
> 
>> My first impression is those emit_insn (gen_rtx_SET()) seems
>> necessary, but I got the point after I checked vector.md :P
>>
>> Committed to trunk, thanks :)
>>
>>
>> On Thu, Jul 27, 2023 at 6:23 PM juzhe.zh...@rivai.ai wrote:
>>> Oh, YES.
>>>
>>> Thanks for fixing it. It makes sense since the ternary operations in 
>>> "vector.md"
>>> generate "vmv.v.v" according to RA.
>>>
>>> Thanks for fixing it.
>>>
>>> @kito: Could you confirm it? If it's ok to you, commit it for Han (I am 
>>> lazy to commit patches :).
>>>
>>>
>>>
>>> juzhe.zh...@rivai.ai
>>>
>>> From: demin.han
>>> Date: 2023-07-27 17:48
>>> To:gcc-patches@gcc.gnu.org
>>> CC:kito.ch...@gmail.com;juzhe.zh...@rivai.ai
>>> Subject: [PATCH] RISC-V: Fix uninitialized and redundant use of 
>>> which_alternative
>>> When pass split2 starts, which_alternative holds whatever value was
>>> last set by an earlier pass, so it is effectively random.
>>>
>>> Even if it were initialized, the generated move would be redundant:
>>> the move can be generated by the assembly output template.
>>>
>>> Signed-off-by: demin.han
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/riscv/autovec.md: Delete which_alternative use in split
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc.target/riscv/rvv/autovec/madd-split2-1.c: New test.
>>>
>>> ---
>>> gcc/config/riscv/autovec.md | 12 
>>> .../gcc.target/riscv/rvv/autovec/madd-split2-1.c    | 13 +
>>> 2 files changed, 13 insertions(+), 12 deletions(-)
>>> create mode 100644 
>>> gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
>>>
>>> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
>>> index d899922586a..b7ea3101f5a 100644
>>> --- a/gcc/config/riscv/autovec.md
>>> +++ b/gcc/config/riscv/autovec.md
>>> @@ -1012,8 +1012,6 @@ (define_insn_and_split "*fma"
>>>     [(const_int 0)]
>>>     {
>>>   riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
>>> -    if (which_alternative == 2)
>>> -  emit_insn (gen_rtx_SET (operands[0], operands[3]));
>>>   rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
>>> operands[0]};
>>>   riscv_vector::emit_vlmax_ternary_insn (code_for_pred_mul_plus 
>>> (mode),
>>>     riscv_vector::RVV_TERNOP, ops, operands[4]);
>>> @@ -1058,8 +1056,6 @@ (define_insn_and_split "*fnma"
>>>     [(const_int 0)]
>>>     {
>>>   riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
>>> -    if (which_alternative == 2)
>>> -  emit_insn (gen_rtx_SET (operands[0], operands[3]));
>>>   rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
>>> operands[0]};
>>>   riscv_vector::emit_vlmax_ternary_insn (code_for_pred_minus_mul 
>>> (mode),
>>>  riscv_vector::RVV_TERNOP, ops, operands[4]);
>>> @@ -1102,8 +1098,6 @@ (define_insn_and_split "*fma"
>>>     [(const_int 0)]
>>>     {
>>>   riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
>>> -    if (which_alternative == 2)
>>> -  emit_insn (gen_rtx_SET (operands[0], operands[3]));
>>>   rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
>>> operands[0]};
>>>   riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul (PLUS, 
>>> mode),
>>>    riscv_vector::RVV_TERNOP, ops, operands[4]);
>>> @@ -1148,8 +1142,6 @@ (define_insn_and_split "*fnma"
>>>     [(const_int 0)]
>>>     {
>>>   riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
>>> -    if (which_alternative == 2)
>>> -  emit_insn (gen_rtx_SET (operands[0], operands[3]));
>>>   rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
>>> operands[0]};
>>>   riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul_neg 
>>> (PLUS, mode),
>>>    riscv_vector::RVV_TERNOP, ops, operands[4]);
>>> @@ -1194,8 +1186,6 @@ (define_insn_and_split "*fms"
>>>     [(const_int 0)]
>>>   

[Bug target/110788] Spilling to mask register for GPR vec_duplicate

2023-07-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110788

--- Comment #6 from Hongtao.liu  ---
Fixed in trunk.

[Bug modula2/108121] Failing tests on x86_64-linux-gnu

2023-07-27 Thread gaius at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108121

Gaius Mulley  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #12 from Gaius Mulley  ---
Closing now that the patch has been applied on the gcc-13 branch.

Re: [C++] [Coroutines] Does GCC want to support `-fno-coroutines`?

2023-07-27 Thread Andrew Pinski via Gcc
On Thu, Jul 27, 2023 at 7:11 PM chuanqi.xcq via Gcc  wrote:
>
> Hi,
> We're discussing implementing `-fno-coroutines` in clang so that we can
> disable the coroutine feature with C++ standards higher than 20.
> A full discussion can be found here: https://reviews.llvm.org/D156247. A
> major motivation for us to do this is to keep consistency with GCC.
> However, we can't find `-fno-coroutines` in
> https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/C_002b_002b-Dialect-Options.html#index-fcoroutines,
> so we're not sure whether GCC intends to support it. We would like to ask
> GCC developers for their opinions on `-fno-coroutines`.

It is already supported.
Read https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/Invoking-GCC.html which says:
```
Many options have long names starting with ‘-f’ or with ‘-W’—for
example, -fmove-loop-invariants, -Wformat and so on. Most of these
have both positive and negative forms; the negative form of -ffoo is
-fno-foo. This manual documents only one of these two forms, whichever
one is not the default.
```

Thanks,
Andrew

> Thanks,
> Chuanqi


[Bug modula2/108121] Failing tests on x86_64-linux-gnu

2023-07-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108121

--- Comment #11 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Gaius Mulley
:

https://gcc.gnu.org/g:50fc6ce0cb8edf927ae6117a5484e4d8d52e393e

commit r13-7619-g50fc6ce0cb8edf927ae6117a5484e4d8d52e393e
Author: Gaius Mulley 
Date:   Fri Jul 28 03:10:01 2023 +0100

PR modula2/108121 Re-implement overflow detection for constant literals

This patch fixes the overflow detection for constant literals.
The ZTYPE is changed to int128 (or int64) if int128 is unavailable and
constant literals are built from widest_int.  The widest_int is converted
into the tree type and checked for overflow.
m2expr_interpret_integer and append_m2_digit are removed.

gcc/m2/ChangeLog:

PR modula2/108121
* gm2-compiler/M2ALU.mod (Less): Reformatted.
* gm2-compiler/SymbolTable.mod (DetermineSizeOfConstant): Remove
from import.
(ConstantStringExceedsZType): Import.
(GetConstLitType): Re-implement using ConstantStringExceedsZType.
* gm2-gcc/m2decl.cc (m2decl_DetermineSizeOfConstant): Remove.
(m2decl_ConstantStringExceedsZType): New function.
(m2decl_BuildConstLiteralNumber): Re-implement.
* gm2-gcc/m2decl.def (DetermineSizeOfConstant): Remove.
(ConstantStringExceedsZType): New function.
* gm2-gcc/m2decl.h (m2decl_DetermineSizeOfConstant): Remove.
(m2decl_ConstantStringExceedsZType): New function.
* gm2-gcc/m2expr.cc (append_digit): Remove.
(m2expr_interpret_integer): Remove.
(append_m2_digit): Remove.
(m2expr_StrToWideInt): New function.
(m2expr_interpret_m2_integer): Remove.
* gm2-gcc/m2expr.def (CheckConstStrZtypeRange): New function.
* gm2-gcc/m2expr.h (m2expr_StrToWideInt): New function.
* gm2-gcc/m2type.cc (build_m2_word64_type_node): New function.
(build_m2_ztype_node): New function.
(m2type_InitBaseTypes): Call build_m2_ztype_node.
* gm2-lang.cc (gm2_type_for_size): Re-write using early returns.

gcc/testsuite/ChangeLog:

PR modula2/108121
* gm2/pim/fail/largeconst.mod: Increased constant value test
to fail now that cc1gm2 uses widest_int to represent a ZTYPE.
* gm2/pim/fail/largeconst2.mod: New test.

(cherry picked from commit 68201409bc2867da45791331e385198826fa4576)

Signed-off-by: Gaius Mulley 
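
As a rough illustration of the "convert, then check for overflow" idea in the commit message above, here is a small C++ sketch. It is not the code in m2expr.cc or m2decl.cc: `parse_literal_u64` is a hypothetical helper, and a fixed 64-bit target stands in for the ZTYPE, whereas the real change builds literals from widest_int.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical helper: parse an unsigned literal in the given base (2..16)
// into a 64-bit value. Returns false on an invalid digit or if the value
// does not fit, instead of silently wrapping around.
inline bool parse_literal_u64(const std::string& text, unsigned base,
                              std::uint64_t& out) {
  if (text.empty() || base < 2 || base > 16) return false;
  const std::uint64_t max = std::numeric_limits<std::uint64_t>::max();
  std::uint64_t value = 0;
  for (char c : text) {
    unsigned digit;
    if (c >= '0' && c <= '9') digit = static_cast<unsigned>(c - '0');
    else if (c >= 'A' && c <= 'F') digit = static_cast<unsigned>(c - 'A') + 10;
    else return false;
    if (digit >= base) return false;
    // value * base + digit fits exactly when value <= (max - digit) / base,
    // so the check is done before the multiplication can overflow.
    if (value > (max - digit) / base) return false;
    value = value * base + digit;
  }
  out = value;
  return true;
}
```

A literal one past the maximum, such as "18446744073709551616" in base 10, is rejected by this sketch rather than wrapping to zero.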

[C++] [Coroutines] Does GCC want to support `-fno-coroutines`?

2023-07-27 Thread chuanqi.xcq via Gcc
Hi,
We're discussing implementing `-fno-coroutines` in clang so that we can
disable the coroutine feature with C++ standards higher than 20.
A full discussion can be found here: https://reviews.llvm.org/D156247. A major
motivation for us to do this is to keep consistency with GCC.
However, we can't find `-fno-coroutines` in
https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/C_002b_002b-Dialect-Options.html#index-fcoroutines,
so we're not sure whether GCC intends to support it. We would like to ask GCC
developers for their opinions on `-fno-coroutines`.
Thanks,
Chuanqi


[Bug target/110788] Spilling to mask register for GPR vec_duplicate

2023-07-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110788

--- Comment #5 from CVS Commits  ---
The master branch has been updated by hongtao Liu :

https://gcc.gnu.org/g:54e54f77c1012ab53126314181c51eaee146ad5d

commit r14-2833-g54e54f77c1012ab53126314181c51eaee146ad5d
Author: liuhongt 
Date:   Thu Jul 27 15:14:39 2023 +0800

Add UNSPEC_MASKOP to vpbroadcastm pattern.

Prevent rtl optimization of vec_duplicate + zero_extend to
vpbroadcastm since there could be an extra kmov after RA.

gcc/ChangeLog:

PR target/110788
* config/i386/sse.md (avx512cd_maskb_vec_dup): Add
UNSPEC_MASKOP.
(avx512cd_maskw_vec_dup): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110788.c: New test.

Re: [PATCH v2] c-family: Implement pragma_lex () for preprocess-only mode

2023-07-27 Thread Jason Merrill via Gcc-patches

On 7/27/23 18:59, Lewis Hyatt wrote:

In order to support processing #pragma in preprocess-only mode (-E or
-save-temps for gcc/g++), we need a way to obtain the #pragma tokens from
libcpp. In full compilation modes, this is accomplished by calling
pragma_lex (), which is a symbol that must be exported by the frontend, and
which is currently implemented for C and C++. Neither of those frontends
initializes its parser machinery in preprocess-only mode, and consequently
pragma_lex () does not work in this case.

Address that by adding a new function c_init_preprocess () for the frontends
to implement, which arranges for pragma_lex () to work in preprocess-only
mode, and adjusting pragma_lex () accordingly.

In preprocess-only mode, the preprocessor is accustomed to controlling the
interaction with libcpp, and it only knows about tokens that it has called
into libcpp itself to obtain. Since it still needs to see the tokens
obtained by pragma_lex () so that they can be streamed to the output, also
adjust c_lex_with_flags () and related functions in c-family/c-lex.cc to
inform the preprocessor about any tokens it won't be aware of.

Currently, there is one place where we are already supporting #pragma in
preprocess-only mode, namely the handling of `#pragma GCC diagnostic'.  That
was done by directly interfacing with libcpp, rather than making use of
pragma_lex (). Now that pragma_lex () works, that code is no longer
necessary; remove it.
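
The token-streaming idea described above (a wrapper around the raw token read that also informs the preprocessor output about tokens it did not request itself) might be sketched as follows. This is a simplified, self-contained model, not the c-lex.cc code: `Token`, `TokenSource`, `next_raw`, and `enable_streaming` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token type; a real lexer token carries much more state.
struct Token { std::string spelling; };

// Hypothetical model of a lexer front end with optional token streaming.
class TokenSource {
 public:
  explicit TokenSource(std::vector<Token> tokens)
      : tokens_(std::move(tokens)) {}

  // Turn on streaming: every token handed out through get_token() is also
  // reported to the sink (standing in for the preprocessed-output writer).
  void enable_streaming(std::function<void(const Token&)> sink) {
    sink_ = std::move(sink);
  }

  // Wrapper around the raw read. Callers such as a pragma handler use this
  // instead of next_raw(), so the output side still sees what they consumed.
  bool get_token(Token& out) {
    if (!next_raw(out)) return false;
    if (sink_) sink_(out);
    return true;
  }

 private:
  // Raw read; stands in for the underlying library call.
  bool next_raw(Token& out) {
    if (pos_ >= tokens_.size()) return false;
    out = tokens_[pos_++];
    return true;
  }

  std::vector<Token> tokens_;
  std::size_t pos_ = 0;
  std::function<void(const Token&)> sink_;
};
```

With streaming disabled the wrapper behaves like the raw read, which loosely mirrors how the full-compilation path is meant to stay unaffected.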

gcc/c-family/ChangeLog:

* c-common.h (c_init_preprocess): Declare.
(c_lex_enable_token_streaming): Declare.
* c-opts.cc (c_common_init): Call c_init_preprocess ().
* c-lex.cc (stream_tokens_to_preprocessor): New static variable.
(c_lex_enable_token_streaming): New function.
(cb_def_pragma): Add a comment.
(get_token): New function wrapping cpp_get_token.
(c_lex_with_flags): Use the new wrapper function to support
obtaining tokens in preprocess_only mode.
(lex_string): Likewise.
* c-ppoutput.cc (preprocess_file): Call c_lex_enable_token_streaming
when needed.
* c-pragma.cc (pragma_diagnostic_lex_normal): Rename to...
(pragma_diagnostic_lex): ...this.
(pragma_diagnostic_lex_pp): Remove.
(handle_pragma_diagnostic_impl): Call pragma_diagnostic_lex () in
all modes.
(c_pp_invoke_early_pragma_handler): Adapt to support pragma_lex ()
usage.
* c-pragma.h (pragma_lex_discard_to_eol): Declare.

gcc/c/ChangeLog:

* c-parser.cc (pragma_lex_discard_to_eol): New function.
(c_init_preprocess): New function.

gcc/cp/ChangeLog:

* parser.cc (c_init_preprocess): New function.
(maybe_read_tokens_for_pragma_lex): New function.
(pragma_lex): Support preprocess-only mode.
(pragma_lex_discard_to_eol): New function.
---

Notes:
 Hello-

 Here is version 2 of the patch, incorporating Jason's feedback from
 https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625591.html

 Thanks again, please let me know if it's OK? Bootstrap + regtest all
 languages on x86-64 Linux looks good.

 -Lewis

  gcc/c-family/c-common.h|  4 +++
  gcc/c-family/c-lex.cc  | 49 +
  gcc/c-family/c-opts.cc |  1 +
  gcc/c-family/c-ppoutput.cc | 17 +---
  gcc/c-family/c-pragma.cc   | 56 ++
  gcc/c-family/c-pragma.h|  2 ++
  gcc/c/c-parser.cc  | 21 ++
  gcc/cp/parser.cc   | 45 ++
  8 files changed, 138 insertions(+), 57 deletions(-)

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index b5ef5ff6b2c..2fe2f194660 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -990,6 +990,9 @@ extern void c_parse_file (void);
  
  extern void c_parse_final_cleanups (void);
  
+/* This initializes for preprocess-only mode.  */
+extern void c_init_preprocess (void);
+
  /* These macros provide convenient access to the various _STMT nodes.  */
  
  /* Nonzero if a given STATEMENT_LIST represents the outermost binding
@@ -1214,6 +1217,7 @@ extern tree c_build_bind_expr (location_t, tree, tree);
  /* In c-lex.cc.  */
  extern enum cpp_ttype
  conflict_marker_get_final_tok_kind (enum cpp_ttype tok1_kind);
+extern void c_lex_enable_token_streaming (bool enabled);
  
  /* In c-pch.cc  */
  extern void pch_init (void);
diff --git a/gcc/c-family/c-lex.cc b/gcc/c-family/c-lex.cc
index dcd061c7cb1..ac4c018d863 100644
--- a/gcc/c-family/c-lex.cc
+++ b/gcc/c-family/c-lex.cc
@@ -57,6 +57,17 @@ static void cb_ident (cpp_reader *, unsigned int, const cpp_string *);
  static void cb_def_pragma (cpp_reader *, unsigned int);
  static void cb_define (cpp_reader *, unsigned int, cpp_hashnode *);
  static void cb_undef (cpp_reader *, unsigned int, cpp_hashnode *);
+
+/* Flag to remember if we are in a mode (such as flag_preprocess_only) in 

[PATCH v8] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-27 Thread Pan Li via Gcc-patches
From: Pan Li 

Update in PATCH v8:

1. Emit non-abnormal backup insn to edge.
2. Fix _after return when call.
3. Refine some run tests.
4. Cleanup code.

Original commit logs:

In basic dynamic rounding mode, we simply ignore call instructions and
we would like to take care of call in this PATCH.

During the call, the frm may be updated or kept as is. Thus, we must
ensure at least two things.

1. The static frm before the call should not pollute the frm value in the call.
2. The updated frm value in the call should be sticky after the call completes.

We will perform some steps to make the above happen.

1. Mark the call instruction with the new mode DYN_CALL.
2. Mark the instruction after the CALL from NONE to DYN.
3. When emitting for a DYN_CALL, we will restore the frm value.
4. When emitting from a DYN_CALL, we will back up the frm value.

Let's take a flow for this.

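                +-------------+
                | Entry (DYN) | <- frrm a5
                +-------------+
               /               \
  +-----------+                 +-----------+
  | VFADD     |                 | VFADD RTZ | <- fsrmi 1(RTZ)
  +-----------+                 +-----------+
        |                             |
  +-----------+                 +-----------+
  | CALL      |                 | CALL      | <- fsrm a5
  +-----------+                 +-----------+
        |                             |
  +-----------+                 +-----------+
  | SHIFT     | <- frrm a5      | VFADD     | <- frrm a5
  +-----------+                 +-----------+
        |                            /
  +-----------+                     /
  | VFADD RUP | <- fsrmi 3(RUP)    /
  +-----------+                   /
         \                       /
          +-----------------+
          | Exit (DYN_EXIT) | <- fsrm a5
          +-----------------+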

When call is the last insn of one bb, we take care of it when needed
for each insn by inserting one frm backup (frrm) insn to the end of
the current bb.
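
As a rough illustration of the right-hand path in the flow above, the backup/restore sequence can be simulated in plain C. This is a sketch only: the frm CSR is modelled by an ordinary variable, frrm/fsrm are hypothetical helper functions named after the instructions, and the two callees are invented for the example. It is not the code the patch emits.

```c
#include <assert.h>

/* RISC-V frm encodings, matching "fsrmi 1(RTZ)" and "3(RUP)" in the flow.  */
enum frm_mode { FRM_RNE = 0, FRM_RTZ = 1, FRM_RDN = 2, FRM_RUP = 3, FRM_RMM = 4 };

static int frm = FRM_RNE;        /* Simulated frm CSR.  */
int frm_seen_by_callee;          /* What the callee observed on entry.  */

static int frrm (void) { return frm; }     /* Read frm.  */
static void fsrm (int mode) { frm = mode; } /* Write frm.  */

/* Hypothetical callees: one leaves frm alone, one changes it.  */
void callee_keeps_frm (void) { frm_seen_by_callee = frrm (); }
void callee_sets_rup (void) { frm_seen_by_callee = frrm (); fsrm (FRM_RUP); }

/* Entry -> VFADD RTZ -> CALL -> VFADD -> Exit.  Returns frm at exit.  */
int
simulate_right_path (int entry_frm, void (*callee) (void))
{
  fsrm (entry_frm);
  int a5 = frrm ();       /* Entry (DYN): frrm a5, back up the dynamic mode.  */

  fsrm (FRM_RTZ);         /* VFADD RTZ: fsrmi 1, static mode for one insn.  */

  fsrm (a5);              /* Emit for DYN_CALL: fsrm a5, restore before call.  */
  assert (frrm () == entry_frm);  /* Goal 1: RTZ does not leak into the call.  */
  callee ();
  a5 = frrm ();           /* Emit from DYN_CALL: frrm a5, back up again.  */

  /* VFADD (DYN) would run here under whatever mode the callee left.  */

  fsrm (a5);              /* Exit (DYN_EXIT): fsrm a5.  */
  return frrm ();         /* Goal 2: the callee's update is still in effect.  */
}
```

Under these assumptions, simulate_right_path (FRM_RNE, callee_sets_rup) returns FRM_RUP while the callee itself saw FRM_RNE, and with callee_keeps_frm the entry mode comes back unchanged.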

Signed-off-by: Pan Li 
Co-Authored-By: Juzhe-Zhong 

gcc/ChangeLog:

* config/riscv/riscv.cc (DYNAMIC_FRM_RTL): New macro.
(STATIC_FRM_P): Ditto.
(struct mode_switching_info): New struct for mode switching.
(struct machine_function): Add new field mode switching.
(riscv_emit_frm_mode_set): Add DYN_CALL emit.
(riscv_frm_adjust_mode_after_call): New function for call mode.
(riscv_frm_emit_after_call_in_bb_end): New function for emit
insn when call as the end of bb.
(riscv_frm_mode_needed): New function for frm mode needed.
(frm_unknown_dynamic_p): Remove call check.
(riscv_mode_needed): Extract function for frm.
(riscv_frm_mode_after): Add DYN_CALL after.
(riscv_mode_entry): Remove backup rtl initialization.
* config/riscv/vector.md (frm_mode): Add dyn_call.
(fsrmsi_restore_exit): Rename to _volatile.
(fsrmsi_restore_volatile): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-insert-7.c: Adjust
test cases.
* gcc.target/riscv/rvv/base/float-point-frm-run-1.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-frm-run-2.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-frm-run-3.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-33.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-34.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-35.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-36.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-37.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-38.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-39.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-40.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-41.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-42.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-43.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-44.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-45.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-46.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-47.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-48.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-49.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-50.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-51.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-52.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-53.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-54.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-55.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-56.c: New test.
* 

Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-07-27 Thread Jason Merrill via Gcc

On 7/23/23 20:26, Ben Boeckel wrote:

On Fri, Jul 21, 2023 at 16:23:07 -0400, Nathan Sidwell wrote:

It occurs to me that the model I am envisioning is similar to CMake's object
libraries.  Object libraries are a convenient name for a bunch of object files.
IIUC they're linked by naming the individual object files (or I think they could
be implemented as a static lib linked with --whole-archive path/to/libfoo.a
-no-whole-archive).  But for this conversation consider them a bunch of separate
object files with a convenient group name.


Yes, `--whole-archive` would work great if it had any kind of
portability across CMake's platform set.


Consider also that object libraries could themselves contain object libraries (I
don't know if they can, but it seems like a useful concept).  Then one could
create an object library from a collection of object files and object libraries
(recursively).  CMake would handle the transitive graph.


I think this detail is relevant, but you can use
`$` as an `INTERFACE` sources and it would act
like that, but it is an explicit thing. Instead, `OBJECT` libraries
*only* provide their objects to targets that *directly* link them. If
not, given this:

 A (OBJECT library)
 B (library of some kind; links PUBLIC to A)
 C (links to B)

If `A` has things like linker flags (or, more likely, libraries) as part
of its usage requirements, C will get them on its link line. However, if
OBJECT files are transitive in the same way, the linker (on most
platforms at least) chokes because it now has duplicates of all of A's
symbols: those from the B library and those from A's objects on the link
line.


Now, allow an object library to itself have some kind of tangible, on-disk
representation.  *BUT* not like a static library -- it doesn't include the
object files.


Now that immediately maps onto modules.

CMI: Object library
Direct imports: Direct object libraries of an object library

This is why I don't understand the need to explicitly indicate the indirect imports
of a CMI.  CMake knows them, because it knows the graph.


Sure, *CMake* knows them, but the *build tool* needs to be told
(typically `make` or `ninja`) because it is what is actually executing
the build graph. The way this is communicated is via `-MF` files and
that's what I'm providing in this patch. Note that `ninja` does not
allow rules to specify such dependencies for other rules than the one it
is reading the file for.


But since the direct imports need to be rebuilt themselves if the 
transitive imports change, the build graph should be the same whether or 
not the transitive imports are repeated?  Either way, if a transitive 
import changes you need to rebuild the direct import and then the importer.


I guess it shouldn't hurt to have the transitive imports in the -MF 
file, as long as they aren't also in the p1689 file, so I'm not 
particularly opposed to this change, but I don't see how it makes a 
practical difference.


Jason



Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-07-27 Thread Jason Merrill via Gcc-patches

On 7/23/23 20:26, Ben Boeckel wrote:

On Fri, Jul 21, 2023 at 16:23:07 -0400, Nathan Sidwell wrote:

It occurs to me that the model I am envisioning is similar to CMake's object
libraries.  Object libraries are a convenient name for a bunch of object files.
IIUC they're linked by naming the individual object files (or I think they could
be implemented as a static lib linked with --whole-archive path/to/libfoo.a
-no-whole-archive).  But for this conversation consider them a bunch of separate
object files with a convenient group name.


Yes, `--whole-archive` would work great if it had any kind of
portability across CMake's platform set.


Consider also that object libraries could themselves contain object libraries (I
don't know if they can, but it seems like a useful concept).  Then one could
create an object library from a collection of object files and object libraries
(recursively).  CMake would handle the transitive graph.


I think this detail is relevant, but you can use
`$` as an `INTERFACE` sources and it would act
like that, but it is an explicit thing. Instead, `OBJECT` libraries
*only* provide their objects to targets that *directly* link them. If
not, given this:

 A (OBJECT library)
 B (library of some kind; links PUBLIC to A)
 C (links to B)

If `A` has things like linker flags (or, more likely, libraries) as part
of its usage requirements, C will get them on its link line. However, if
OBJECT files are transitive in the same way, the linker (on most
platforms at least) chokes because it now has duplicates of all of A's
symbols: those from the B library and those from A's objects on the link
line.


Now, allow an object library to itself have some kind of tangible, on-disk
representation.  *BUT* not like a static library -- it doesn't include the
object files.


Now that immediately maps onto modules.

CMI: Object library
Direct imports: Direct object libraries of an object library

This is why I don't understand the need to explicitly indicate the indirect imports
of a CMI.  CMake knows them, because it knows the graph.


Sure, *CMake* knows them, but the *build tool* needs to be told
(typically `make` or `ninja`) because it is what is actually executing
the build graph. The way this is communicated is via `-MF` files and
that's what I'm providing in this patch. Note that `ninja` does not
allow rules to specify such dependencies for other rules than the one it
is reading the file for.


But since the direct imports need to be rebuilt themselves if the 
transitive imports change, the build graph should be the same whether or 
not the transitive imports are repeated?  Either way, if a transitive 
import changes you need to rebuild the direct import and then the importer.


I guess it shouldn't hurt to have the transitive imports in the -MF 
file, as long as they aren't also in the p1689 file, so I'm not 
particularly opposed to this change, but I don't see how it makes a 
practical difference.


Jason



[Bug middle-end/109986] missing fold (~a | b) ^ a => ~(a & b)

2023-07-27 Thread vanyacpp at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109986

--- Comment #5 from Ivan Sorokin  ---
(In reply to CVS Commits from comment #4)
> commit r14-2751-g2a3556376c69a1fb588dcf25225950575e42784f
> Author: Drew Ross 
> Co-authored-by: Jakub Jelinek 

Thank you!
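
For reference, the fold named in the bug title, (~a | b) ^ a => ~(a & b), can be checked exhaustively on 8-bit values. The helper names below are made up for this sketch, and the check says nothing about how the match.pd pattern itself is written.

```c
#include <assert.h>
#include <stdint.h>

/* Left-hand side of the fold, truncated back to 8 bits.  */
uint8_t
unfolded (uint8_t a, uint8_t b)
{
  return (uint8_t) ((~a | b) ^ a);
}

/* Right-hand side of the fold, truncated back to 8 bits.  */
uint8_t
folded (uint8_t a, uint8_t b)
{
  return (uint8_t) ~(a & b);
}

/* Count the (a, b) pairs, out of all 65536, where the two sides differ.  */
unsigned
count_mismatches (void)
{
  unsigned mismatches = 0;
  for (unsigned a = 0; a < 256; a++)
    for (unsigned b = 0; b < 256; b++)
      if (unfolded ((uint8_t) a, (uint8_t) b)
          != folded ((uint8_t) a, (uint8_t) b))
        mismatches++;
  return mismatches;
}

/* Abort if any pair disagrees.  */
void
check_fold (void)
{
  assert (count_mismatches () == 0);
}
```

Bit by bit the reasoning is short: where a bit of a is 0 both sides give 1, and where it is 1 both sides give the complement of the corresponding bit of b.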

[PATCH v2] c-family: Implement pragma_lex () for preprocess-only mode

2023-07-27 Thread Lewis Hyatt via Gcc-patches
In order to support processing #pragma in preprocess-only mode (-E or
-save-temps for gcc/g++), we need a way to obtain the #pragma tokens from
libcpp. In full compilation modes, this is accomplished by calling
pragma_lex (), which is a symbol that must be exported by the frontend, and
which is currently implemented for C and C++. Neither of those frontends
initializes its parser machinery in preprocess-only mode, and consequently
pragma_lex () does not work in this case.

Address that by adding a new function c_init_preprocess () for the frontends
to implement, which arranges for pragma_lex () to work in preprocess-only
mode, and adjusting pragma_lex () accordingly.

In preprocess-only mode, the preprocessor is accustomed to controlling the
interaction with libcpp, and it only knows about tokens that it has called
into libcpp itself to obtain. Since it still needs to see the tokens
obtained by pragma_lex () so that they can be streamed to the output, also
adjust c_lex_with_flags () and related functions in c-family/c-lex.cc to
inform the preprocessor about any tokens it won't be aware of.

Currently, there is one place where we are already supporting #pragma in
preprocess-only mode, namely the handling of `#pragma GCC diagnostic'.  That
was done by directly interfacing with libcpp, rather than making use of
pragma_lex (). Now that pragma_lex () works, that code is no longer
necessary; remove it.

gcc/c-family/ChangeLog:

* c-common.h (c_init_preprocess): Declare.
(c_lex_enable_token_streaming): Declare.
* c-opts.cc (c_common_init): Call c_init_preprocess ().
* c-lex.cc (stream_tokens_to_preprocessor): New static variable.
(c_lex_enable_token_streaming): New function.
(cb_def_pragma): Add a comment.
(get_token): New function wrapping cpp_get_token.
(c_lex_with_flags): Use the new wrapper function to support
obtaining tokens in preprocess_only mode.
(lex_string): Likewise.
* c-ppoutput.cc (preprocess_file): Call c_lex_enable_token_streaming
when needed.
* c-pragma.cc (pragma_diagnostic_lex_normal): Rename to...
(pragma_diagnostic_lex): ...this.
(pragma_diagnostic_lex_pp): Remove.
(handle_pragma_diagnostic_impl): Call pragma_diagnostic_lex () in
all modes.
(c_pp_invoke_early_pragma_handler): Adapt to support pragma_lex ()
usage.
* c-pragma.h (pragma_lex_discard_to_eol): Declare.

gcc/c/ChangeLog:

* c-parser.cc (pragma_lex_discard_to_eol): New function.
(c_init_preprocess): New function.

gcc/cp/ChangeLog:

* parser.cc (c_init_preprocess): New function.
(maybe_read_tokens_for_pragma_lex): New function.
(pragma_lex): Support preprocess-only mode.
(pragma_lex_discard_to_eol): New function.
---

Notes:
Hello-

Here is version 2 of the patch, incorporating Jason's feedback from
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625591.html

Thanks again, please let me know if it's OK? Bootstrap + regtest all
languages on x86-64 Linux looks good.

-Lewis

 gcc/c-family/c-common.h|  4 +++
 gcc/c-family/c-lex.cc  | 49 +
 gcc/c-family/c-opts.cc |  1 +
 gcc/c-family/c-ppoutput.cc | 17 +---
 gcc/c-family/c-pragma.cc   | 56 ++
 gcc/c-family/c-pragma.h|  2 ++
 gcc/c/c-parser.cc  | 21 ++
 gcc/cp/parser.cc   | 45 ++
 8 files changed, 138 insertions(+), 57 deletions(-)

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index b5ef5ff6b2c..2fe2f194660 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -990,6 +990,9 @@ extern void c_parse_file (void);
 
 extern void c_parse_final_cleanups (void);
 
+/* This initializes for preprocess-only mode.  */
+extern void c_init_preprocess (void);
+
 /* These macros provide convenient access to the various _STMT nodes.  */
 
 /* Nonzero if a given STATEMENT_LIST represents the outermost binding
@@ -1214,6 +1217,7 @@ extern tree c_build_bind_expr (location_t, tree, tree);
 /* In c-lex.cc.  */
 extern enum cpp_ttype
 conflict_marker_get_final_tok_kind (enum cpp_ttype tok1_kind);
+extern void c_lex_enable_token_streaming (bool enabled);
 
 /* In c-pch.cc  */
 extern void pch_init (void);
diff --git a/gcc/c-family/c-lex.cc b/gcc/c-family/c-lex.cc
index dcd061c7cb1..ac4c018d863 100644
--- a/gcc/c-family/c-lex.cc
+++ b/gcc/c-family/c-lex.cc
@@ -57,6 +57,17 @@ static void cb_ident (cpp_reader *, unsigned int, const cpp_string *);
 static void cb_def_pragma (cpp_reader *, unsigned int);
 static void cb_define (cpp_reader *, unsigned int, cpp_hashnode *);
 static void cb_undef (cpp_reader *, unsigned int, cpp_hashnode *);
+
+/* Flag to remember if we are in a mode (such as flag_preprocess_only) in which
+   tokens obtained here need to be streamed to the preprocessor.  */

Re: [PATCH] bpf: ISA V4 sign-extending move and load insns [PR110782,PR110784]

2023-07-27 Thread David Faust via Gcc-patches



On 7/27/23 15:27, Jose E. Marchesi wrote:
> 
> Hi David.
> Thanks for the patch.
> 
>> BPF ISA V4 introduces sign-extending move and load operations.  This
>> patch makes the BPF backend generate those instructions, when enabled
>> and useful.
>>
>> A new option, -m[no-]smov gates generation of these instructions, and is
>> enabled by default for -mcpu=v4 and above.  Tests for the new
>> instructions and documentation for the new options are included.
>>
>> Tested on bpf-unknown-none.
>> OK?
>>
>> gcc/
>>
>>  * config/bpf/bpf.opt (msmov): New option.
>>  * config/bpf/bpf.cc (bpf_option_override): Handle it here.
>>  * config/bpf/bpf.md (*extendsidi2): New.
>>  (extendhidi2): New.
>>  (extendqidi2): New.
>>  (extendsisi2): New.
>>  (extendhisi2): New.
>>  (extendqisi2): New.
>>  * doc/invoke.texi (Option Summary): Add -msmov eBPF option.
>>  (eBPF Options): Add -m[no-]smov.  Document that -mcpu=v4
>>  also enables -msmov.
>>
>> gcc/testsuite/
>>
>>  * gcc.target/bpf/sload-1.c: New test.
>>  * gcc.target/bpf/sload-pseudoc-1.c: New test.
>>  * gcc.target/bpf/smov-1.c: New test.
>>  * gcc.target/bpf/smov-pseudoc-1.c: New test.
> 
> Looks like you forgot to mention the bugzilla PR in the changelog
> entries.  Would be nice to have them there so automatic updates happen
> in the bugzillas.

Good catch, thanks!

> 
> Other than that, OK.
> Thanks!

Pushed, with PRs added in the changelog and a tiny reword to the doc below.

> 
>> ---
>>  gcc/config/bpf/bpf.cc |  3 ++
>>  gcc/config/bpf/bpf.md | 50 +++
>>  gcc/config/bpf/bpf.opt|  4 ++
>>  gcc/doc/invoke.texi   |  9 +++-
>>  gcc/testsuite/gcc.target/bpf/sload-1.c| 16 ++
>>  .../gcc.target/bpf/sload-pseudoc-1.c  | 16 ++
>>  gcc/testsuite/gcc.target/bpf/smov-1.c | 18 +++
>>  gcc/testsuite/gcc.target/bpf/smov-pseudoc-1.c | 18 +++
>>  8 files changed, 133 insertions(+), 1 deletion(-)
>>  create mode 100644 gcc/testsuite/gcc.target/bpf/sload-1.c
>>  create mode 100644 gcc/testsuite/gcc.target/bpf/sload-pseudoc-1.c
>>  create mode 100644 gcc/testsuite/gcc.target/bpf/smov-1.c
>>  create mode 100644 gcc/testsuite/gcc.target/bpf/smov-pseudoc-1.c
>>
>> diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
>> index 0e07b416add..b5b5674edbb 100644
>> --- a/gcc/config/bpf/bpf.cc
>> +++ b/gcc/config/bpf/bpf.cc
>> @@ -262,6 +262,9 @@ bpf_option_override (void)
>>if (bpf_has_sdiv == -1)
>>  bpf_has_sdiv = (bpf_isa >= ISA_V4);
>>  
>> +  if (bpf_has_smov == -1)
>> +bpf_has_smov = (bpf_isa >= ISA_V4);
>> +
>>/* Disable -fstack-protector as it is not supported in BPF.  */
>>if (flag_stack_protect)
>>  {
>> diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
>> index 66436397bb7..a69a239b9d6 100644
>> --- a/gcc/config/bpf/bpf.md
>> +++ b/gcc/config/bpf/bpf.md
>> @@ -307,6 +307,56 @@ (define_expand "extendsidi2"
>>DONE;
>>  })
>>  
>> +;; ISA V4 introduces sign-extending move and load operations.
>> +
>> +(define_insn "*extendsidi2"
>> +  [(set (match_operand:DI 0 "register_operand" "=r,r")
>> +(sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" "r,q")))]
>> +  "bpf_has_smov"
>> +  "@
>> +   {movs\t%0,%1,32|%0 = (s32) %1}
>> +   {ldxsw\t%0,%1|%0 = *(s32 *) (%1)}"
>> +  [(set_attr "type" "alu,ldx")])
>> +
>> +(define_insn "extendhidi2"
>> +  [(set (match_operand:DI 0 "register_operand" "=r,r")
>> +(sign_extend:DI (match_operand:HI 1 "nonimmediate_operand" "r,q")))]
>> +  "bpf_has_smov"
>> +  "@
>> +   {movs\t%0,%1,16|%0 = (s16) %1}
>> +   {ldxsh\t%0,%1|%0 = *(s16 *) (%1)}"
>> +  [(set_attr "type" "alu,ldx")])
>> +
>> +(define_insn "extendqidi2"
>> +  [(set (match_operand:DI 0 "register_operand" "=r,r")
>> +(sign_extend:DI (match_operand:QI 1 "nonimmediate_operand" "r,q")))]
>> +  "bpf_has_smov"
>> +  "@
>> +   {movs\t%0,%1,8|%0 = (s8) %1}
>> +   {ldxsb\t%0,%1|%0 = *(s8 *) (%1)}"
>> +  [(set_attr "type" "alu,ldx")])
>> +
>> +(define_insn "extendsisi2"
>> +  [(set (match_operand:SI 0 "register_operand" "=r")
>> +(sign_extend:SI (match_operand:SI 1 "register_operand" "r")))]
>> +  "bpf_has_smov"
>> +  "{movs32\t%0,%1,32|%w0 = (s32) %w1}"
>> +  [(set_attr "type" "alu")])
>> +
>> +(define_insn "extendhisi2"
>> +  [(set (match_operand:SI 0 "register_operand" "=r")
>> +(sign_extend:SI (match_operand:HI 1 "register_operand" "r")))]
>> +  "bpf_has_smov"
>> +  "{movs32\t%0,%1,16|%w0 = (s16) %w1}"
>> +  [(set_attr "type" "alu")])
>> +
>> +(define_insn "extendqisi2"
>> +  [(set (match_operand:SI 0 "register_operand" "=r")
>> +(sign_extend:SI (match_operand:QI 1 "register_operand" "r")))]
>> +  "bpf_has_smov"
>> +  "{movs32\t%0,%1,8|%w0 = (s8) %w1}"
>> +  [(set_attr "type" "alu")])
>> +
>>   Data movement
>>  
>>  (define_mode_iterator MM [QI HI SI DI SF DF])
>> diff --git 

gcc-11-20230727 is now available

2023-07-27 Thread GCC Administrator via Gcc
Snapshot gcc-11-20230727 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/11-20230727/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 11 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-11 revision 4286684bacd1189e38c1e6e087662152e0a306a1

You'll find:

 gcc-11-20230727.tar.xz   Complete GCC

  SHA256=144da96e72d5b5aa2e249596bef6b70b840f6ca1abac920d91b50d6e46c8aecd
  SHA1=811e8902343ba583f47e253d9b77d7b5080d3842

Diffs from 11-20230720 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-11
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


[Bug target/110782] bpf: make use of the V4 sign-extended load instructions

2023-07-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110782

--- Comment #1 from CVS Commits  ---
The master branch has been updated by David Faust :

https://gcc.gnu.org/g:14dab1a1bcc3f0315e33d166df06520fba409c9b

commit r14-2831-g14dab1a1bcc3f0315e33d166df06520fba409c9b
Author: David Faust 
Date:   Thu Jul 27 13:55:44 2023 -0700

bpf: ISA V4 sign-extending move and load insns [PR110782,PR110784]

BPF ISA V4 introduces sign-extending move and load operations.  This
patch makes the BPF backend generate those instructions, when enabled
and useful.

A new option, -m[no-]smov gates generation of these instructions, and is
enabled by default for -mcpu=v4 and above.  Tests for the new
instructions and documentation for the new options are included.
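
As a plain-C picture of what a sign-extending move computes, here is a small sketch. The function name sign_extend is invented for the example; the pairing with movs operand widths 8, 16 and 32 follows the pseudo-C forms in the patterns of this patch (for example %0 = (s8) %1), but the code is only a model of the arithmetic, not what the backend emits.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low BITS bits of VALUE to 64 bits.  BITS must be 8, 16
   or 32, mirroring the three widths used by the movs patterns.  */
int64_t
sign_extend (uint64_t value, unsigned bits)
{
  assert (bits == 8 || bits == 16 || bits == 32);
  uint64_t mask = (UINT64_C (1) << bits) - 1;             /* Low BITS bits.  */
  int64_t sign = (int64_t) (UINT64_C (1) << (bits - 1));  /* Sign bit.  */
  int64_t low = (int64_t) (value & mask);
  /* Flip the sign bit, then subtract it: values with the sign bit set
     end up negative, all others are unchanged.  */
  return (low ^ sign) - sign;
}
```

For instance, sign_extend (0xFF, 8) yields -1 and sign_extend (0x7F, 8) yields 127; any bits above the selected width are ignored.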

PR target/110782
PR target/110784

gcc/

* config/bpf/bpf.opt (msmov): New option.
* config/bpf/bpf.cc (bpf_option_override): Handle it here.
* config/bpf/bpf.md (*extendsidi2): New.
(extendhidi2): New.
(extendqidi2): New.
(extendsisi2): New.
(extendhisi2): New.
(extendqisi2): New.
* doc/invoke.texi (Option Summary): Add -msmov eBPF option.
(eBPF Options): Add -m[no-]smov.  Document that -mcpu=v4
also enables -msmov.

gcc/testsuite/

* gcc.target/bpf/sload-1.c: New test.
* gcc.target/bpf/sload-pseudoc-1.c: New test.
* gcc.target/bpf/smov-1.c: New test.
* gcc.target/bpf/smov-pseudoc-1.c: New test.

[Bug target/110784] bpf: make use of the V4 sign-extended move instructions

2023-07-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110784

--- Comment #1 from CVS Commits  ---
The master branch has been updated by David Faust :

https://gcc.gnu.org/g:14dab1a1bcc3f0315e33d166df06520fba409c9b

commit r14-2831-g14dab1a1bcc3f0315e33d166df06520fba409c9b
Author: David Faust 
Date:   Thu Jul 27 13:55:44 2023 -0700

bpf: ISA V4 sign-extending move and load insns [PR110782,PR110784]

BPF ISA V4 introduces sign-extending move and load operations.  This
patch makes the BPF backend generate those instructions, when enabled
and useful.

A new option, -m[no-]smov gates generation of these instructions, and is
enabled by default for -mcpu=v4 and above.  Tests for the new
instructions and documentation for the new options are included.

PR target/110782
PR target/110784

gcc/

* config/bpf/bpf.opt (msmov): New option.
* config/bpf/bpf.cc (bpf_option_override): Handle it here.
* config/bpf/bpf.md (*extendsidi2): New.
(extendhidi2): New.
(extendqidi2): New.
(extendsisi2): New.
(extendhisi2): New.
(extendqisi2): New.
* doc/invoke.texi (Option Summary): Add -msmov eBPF option.
(eBPF Options): Add -m[no-]smov.  Document that -mcpu=v4
also enables -msmov.

gcc/testsuite/

* gcc.target/bpf/sload-1.c: New test.
* gcc.target/bpf/sload-pseudoc-1.c: New test.
* gcc.target/bpf/smov-1.c: New test.
* gcc.target/bpf/smov-pseudoc-1.c: New test.

Re: Update and Questions on CPython Extension Module -fanalyzer plugin development

2023-07-27 Thread David Malcolm via Gcc
On Thu, 2023-07-27 at 18:13 -0400, Eric Feng wrote:
> Hi Dave,
> 
> Thanks for the comments!
> 
> [...]
> > Do you have any DejaGnu tests for this functionality?  For example,
> > given PyList_New
> >   https://docs.python.org/3/c-api/list.html#c.PyList_New
> > there could be a test like:
> > 
> > /* { dg-require-effective-target python_h } */
> > 
> > #define PY_SSIZE_T_CLEAN
> > #include <Python.h>
> > #include "analyzer-decls.h"
> > 
> > PyObject *
> > test_PyList_New (Py_ssize_t len)
> > {
> >   PyObject *obj = PyList_New (len);
> >   if (obj)
> >     {
> >  __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE"
> > } */
> >  __analyzer_eval (PyList_Check (obj)); /* { dg-warning "TRUE" }
> > */
> >  __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning
> > "TRUE" } */
> >     }
> >   else
> >     __analyzer_dump_path (); /* { dg-warning "path" } */
> >   return obj;
> > }
> > 
> > ...or similar, to verify that we simulate that the call can both
> > succeed and fail, and to verify properties of the store along the
> > "success" path.  Caveat: I didn't look at exactly what properties
> > you're simulating, so the above tests might need adjusting.
> > 
> 
> I am currently in the process of developing more tests. Specific to
> the test you provided as an example, we are passing all cases except
> for PyList_Check. PyList_Check does not pass because I have not yet
> added support for the various definitions of tp_flags.

As noted in our chat earlier, I don't think we can easily make these
work.  Looking at CPython's implementation: PyList_Type's initializer
here:
https://github.com/python/cpython/blob/main/Objects/listobject.c#L3101
initializes tp_flags with the flags, but:
(a) we don't see that code when compiling a user's extension module
(b) even if we did, PyList_Type is non-const, so the analyzer has to
assume that tp_flags could have been written to since it was
initialized

In theory we could special-case such lookups, so that, say, a plugin
could register assumptions into the analyzer about the value of bits
within (PyList_Type.tp_flags).

However, this seems like a future feature.

>  I also
> encountered a minor hiccup where PyList_CheckExact appeared to give
> "UNKNOWN" rather than "TRUE", but this has since been fixed. The
> problem was caused by accidentally using the tree representation of
> struct PyList_Type as opposed to struct PyList_Type * when creating a
> pointer sval to the region for PyList_Type.

Ah, good.

> 
> [...]
> > 
> > > Let's consider the following example which lacks error checking:
> > > 
> > > PyObject* foo() {
> > >     PyObject* item = PyLong_FromLong(10);
> > >     PyObject* list = PyList_New(5);
> > >     return list;
> > > }
> > > 
> > > The states for when PyLong_FromLong fails and when
> > > PyLong_FromLong
> > > succeeds are merged before the call to PyObject* list =
> > > PyList_New(5).
> > 
> > Ideally we would emit a leak warning about the "success" case of
> > PyLong_FromLong here.  I think you're running into the problem of
> > the
> > "store" part of the program_state being separate from the "malloc"
> > state machine part of program_state - I'm guessing that you're
> > creating
> > a heap_allocated_region for the new python object, but the "malloc"
> > state machine isn't transitioning the pointer from "start" to
> > "assumed-
> > non-null".  Such state machine states inhibit state-merging, and so
> > this might solve your state-merging problem.
> > 
> > I think we need a way to call
> > malloc_state_machine::on_allocator_call
> > from outside of sm-malloc.cc.  See
> > region_model::on_realloc_with_move
> > for an example of how to do something similar.
> > 
> 
> Thank you for the suggestion — this worked great and has solved the
> issue!

Excellent!

Thanks for the update
Dave



[Bug c/110834] New: Incorrect format-nonliteral warning when wrapping a printf-family function using __builtin_va_arg_pack

2023-07-27 Thread ksperling at apple dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110834

Bug ID: 110834
   Summary: Incorrect format-nonliteral warning when wrapping a
printf-family function using __builtin_va_arg_pack
   Product: gcc
   Version: 13.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ksperling at apple dot com
  Target Milestone: ---

Created attachment 55650
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55650&action=edit
pre-processed example code

When wrapping a printf-family function (e.g. printf, sprintf, …) with an inline
wrapper that uses __builtin_va_arg_pack() to forward the format arguments, a
format-nonliteral warning is incorrectly generated, even though the format
string argument is annotated with the correct format attribute, e.g. this code

  #include 

  __inline__ __attribute__((__always_inline__, __format__(printf, 1, 2)))
  int myprintf(char const *fmt, ...)
  {
    return printf(fmt, __builtin_va_arg_pack());
  }

results in the following warning (or error with -Werror)

  $ gcc -c -Wformat -Werror=format-nonliteral format_va_arg_pack.c 
  format_va_arg_pack.c: In function ‘myprintf’:
  format_va_arg_pack.c:6:5: error: format not a string literal, argument types
not checked [-Werror=format-nonliteral]
   6 | return printf(fmt, __builtin_va_arg_pack());
   | ^~
  cc1: some warnings being treated as errors

The gcc output below and the attached .i file were generated with gcc 10.2.1 on
Debian, but I have verified that the issue also reproduces on gcc 13.2 and gcc
trunk on godbolt.org (https://godbolt.org/z/q6xMs5481).

For a real-world manifestation of this issue, see
https://github.com/openwrt/openwrt/issues/13016 where this issue is triggered
by wrapper functions of this style that are part of the fortify-headers library
(https://git.2f30.org/fortify-headers/).
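
For contrast, the documentation of -Wformat-nonliteral exempts format
functions that take their arguments as a va_list, so a wrapper that forwards
to a v*printf function is not expected to trigger the warning. A minimal
sketch, assuming a hypothetical wrapper name my_snprintf; it is not a drop-in
replacement for the always_inline __builtin_va_arg_pack() style used by
fortify-headers, and it has not been checked against every GCC version
mentioned here:

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper (name chosen for illustration only).
   The format attribute says: argument 3 is the format string,
   variadic arguments to check start at argument 4.  */
__attribute__((__format__(printf, 3, 4)))
int my_snprintf(char *buf, size_t size, char const *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    /* Forward the arguments as a va_list rather than with
       __builtin_va_arg_pack().  */
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}
```

Callers of my_snprintf still get their format strings checked through the
format attribute; the difference is only in how the wrapper forwards them.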


Full compiler output with -v:

Using built-in specs.
COLLECT_GCC=gcc
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 10.2.1-6'
--with-bugurl=file:///usr/share/doc/gcc-10/README.Bugs
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-10
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib
--enable-libphobos-checking=release --with-target-system-zlib=auto
--enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686
--with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib
--with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-10-Km9U7s/gcc-10-10.2.1/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-Km9U7s/gcc-10-10.2.1/debian/tmp-gcn/usr,hsa
--without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu
--with-build-config=bootstrap-lto-lean --enable-link-mutex
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 10.2.1 20210110 (Debian 10.2.1-6) 
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-c' '-Wformat=1'
'-Werror=format-nonliteral' '-mtune=generic' '-march=x86-64'
 /usr/lib/gcc/x86_64-linux-gnu/10/cc1 -E -quiet -v -imultiarch x86_64-linux-gnu
format_va_arg_pack.c -mtune=generic -march=x86-64 -Wformat=1
-Werror=format-nonliteral -fpch-preprocess -fasynchronous-unwind-tables -o
format_va_arg_pack.i
ignoring nonexistent directory "/usr/local/include/x86_64-linux-gnu"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-linux-gnu/10/include-fixed"
ignoring nonexistent directory
"/usr/lib/gcc/x86_64-linux-gnu/10/../../../../x86_64-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/x86_64-linux-gnu/10/include
 /usr/local/include
 /usr/include/x86_64-linux-gnu
 /usr/include
End of search list.
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-c' '-Wformat=1'
'-Werror=format-nonliteral' '-mtune=generic' '-march=x86-64'
 /usr/lib/gcc/x86_64-linux-gnu/10/cc1 -fpreprocessed format_va_arg_pack.i
-quiet -dumpbase format_va_arg_pack.c -mtune=generic -march=x86-64 -auxbase
format_va_arg_pack -Wformat=1 -Werror=format-nonliteral -version
-fasynchronous-unwind-tables -o format_va_arg_pack.s
GNU C17 (Debian 10.2.1-6) version 10.2.1 20210110 (x86_64-linux-gnu)
compiled by GNU C version 10.2.1 20210110, GMP version 6.2.1, MPFR
version 4.1.0, MPC version 1.2.0, isl version isl-0.23-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU C17 (Debian 

Re: [PATCH] bpf: ISA V4 sign-extending move and load insns [PR110782,PR110784]

2023-07-27 Thread Jose E. Marchesi via Gcc-patches


Hi David.
Thanks for the patch.

> BPF ISA V4 introduces sign-extending move and load operations.  This
> patch makes the BPF backend generate those instructions, when enabled
> and useful.
>
> A new option, -m[no-]smov gates generation of these instructions, and is
> enabled by default for -mcpu=v4 and above.  Tests for the new
> instructions and documentation for the new options are included.
>
> Tested on bpf-unknown-none.
> OK?
>
> gcc/
>
>   * config/bpf/bpf.opt (msmov): New option.
>   * config/bpf/bpf.cc (bpf_option_override): Handle it here.
>   * config/bpf/bpf.md (*extendsidi2): New.
>   (extendhidi2): New.
>   (extendqidi2): New.
>   (extendsisi2): New.
>   (extendhisi2): New.
>   (extendqisi2): New.
>   * doc/invoke.texi (Option Summary): Add -msmov eBPF option.
>   (eBPF Options): Add -m[no-]smov.  Document that -mcpu=v4
>   also enables -msmov.
>
> gcc/testsuite/
>
>   * gcc.target/bpf/sload-1.c: New test.
>   * gcc.target/bpf/sload-pseudoc-1.c: New test.
>   * gcc.target/bpf/smov-1.c: New test.
>   * gcc.target/bpf/smov-pseudoc-1.c: New test.

Looks like you forgot to mention the bugzilla PR in the changelog
entries.  Would be nice to have them there so automatic updates happen
in the bugzillas.

Other than that, OK.
Thanks!

> ---
>  gcc/config/bpf/bpf.cc |  3 ++
>  gcc/config/bpf/bpf.md | 50 +++
>  gcc/config/bpf/bpf.opt|  4 ++
>  gcc/doc/invoke.texi   |  9 +++-
>  gcc/testsuite/gcc.target/bpf/sload-1.c| 16 ++
>  .../gcc.target/bpf/sload-pseudoc-1.c  | 16 ++
>  gcc/testsuite/gcc.target/bpf/smov-1.c | 18 +++
>  gcc/testsuite/gcc.target/bpf/smov-pseudoc-1.c | 18 +++
>  8 files changed, 133 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/bpf/sload-1.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/sload-pseudoc-1.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/smov-1.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/smov-pseudoc-1.c
>
> diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
> index 0e07b416add..b5b5674edbb 100644
> --- a/gcc/config/bpf/bpf.cc
> +++ b/gcc/config/bpf/bpf.cc
> @@ -262,6 +262,9 @@ bpf_option_override (void)
>if (bpf_has_sdiv == -1)
>  bpf_has_sdiv = (bpf_isa >= ISA_V4);
>  
> +  if (bpf_has_smov == -1)
> +bpf_has_smov = (bpf_isa >= ISA_V4);
> +
>/* Disable -fstack-protector as it is not supported in BPF.  */
>if (flag_stack_protect)
>  {
> diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
> index 66436397bb7..a69a239b9d6 100644
> --- a/gcc/config/bpf/bpf.md
> +++ b/gcc/config/bpf/bpf.md
> @@ -307,6 +307,56 @@ (define_expand "extendsidi2"
>DONE;
>  })
>  
> +;; ISA V4 introduces sign-extending move and load operations.
> +
> +(define_insn "*extendsidi2"
> +  [(set (match_operand:DI 0 "register_operand" "=r,r")
> +(sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" "r,q")))]
> +  "bpf_has_smov"
> +  "@
> +   {movs\t%0,%1,32|%0 = (s32) %1}
> +   {ldxsw\t%0,%1|%0 = *(s32 *) (%1)}"
> +  [(set_attr "type" "alu,ldx")])
> +
> +(define_insn "extendhidi2"
> +  [(set (match_operand:DI 0 "register_operand" "=r,r")
> +(sign_extend:DI (match_operand:HI 1 "nonimmediate_operand" "r,q")))]
> +  "bpf_has_smov"
> +  "@
> +   {movs\t%0,%1,16|%0 = (s16) %1}
> +   {ldxsh\t%0,%1|%0 = *(s16 *) (%1)}"
> +  [(set_attr "type" "alu,ldx")])
> +
> +(define_insn "extendqidi2"
> +  [(set (match_operand:DI 0 "register_operand" "=r,r")
> +(sign_extend:DI (match_operand:QI 1 "nonimmediate_operand" "r,q")))]
> +  "bpf_has_smov"
> +  "@
> +   {movs\t%0,%1,8|%0 = (s8) %1}
> +   {ldxsb\t%0,%1|%0 = *(s8 *) (%1)}"
> +  [(set_attr "type" "alu,ldx")])
> +
> +(define_insn "extendsisi2"
> +  [(set (match_operand:SI 0 "register_operand" "=r")
> +(sign_extend:SI (match_operand:SI 1 "register_operand" "r")))]
> +  "bpf_has_smov"
> +  "{movs32\t%0,%1,32|%w0 = (s32) %w1}"
> +  [(set_attr "type" "alu")])
> +
> +(define_insn "extendhisi2"
> +  [(set (match_operand:SI 0 "register_operand" "=r")
> +(sign_extend:SI (match_operand:HI 1 "register_operand" "r")))]
> +  "bpf_has_smov"
> +  "{movs32\t%0,%1,16|%w0 = (s16) %w1}"
> +  [(set_attr "type" "alu")])
> +
> +(define_insn "extendqisi2"
> +  [(set (match_operand:SI 0 "register_operand" "=r")
> +(sign_extend:SI (match_operand:QI 1 "register_operand" "r")))]
> +  "bpf_has_smov"
> +  "{movs32\t%0,%1,8|%w0 = (s8) %w1}"
> +  [(set_attr "type" "alu")])
> +
>   Data movement
>  
>  (define_mode_iterator MM [QI HI SI DI SF DF])
> diff --git a/gcc/config/bpf/bpf.opt b/gcc/config/bpf/bpf.opt
> index b21cfcab9ea..8e240d397e4 100644
> --- a/gcc/config/bpf/bpf.opt
> +++ b/gcc/config/bpf/bpf.opt
> @@ -71,6 +71,10 @@ msdiv
>  Target Var(bpf_has_sdiv) Init(-1)
>  Enable signed division and modulus instructions.
>  
> +msmov
> +Target 

Re: [PATCH] bpf: minor doc cleanup for command-line options

2023-07-27 Thread Jose E. Marchesi via Gcc-patches


Hi David, thanks for the patch.
OK.


> This patch makes some minor cleanups to eBPF options documented in
> invoke.texi:
>  - Delete some vestigial docs for removed -mkernel option
>  - Add -mbswap and -msdiv to the option summary
>  - Note the negative versions of several options
>  - Note that -mcpu=v4 also enables -msdiv.
>
> gcc/
>
>   * doc/invoke.texi (Option Summary): Remove -mkernel eBPF option.
>   Add -mbswap and -msdiv eBPF options.
>   (eBPF Options): Remove -mkernel.  Add -mno-{jmpext, jmp32,
>   alu32, v3-atomics, bswap, sdiv}.  Document that -mcpu=v4 also
>   enables -msdiv.
> ---
>  gcc/doc/invoke.texi | 48 ++---
>  1 file changed, 23 insertions(+), 25 deletions(-)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index e0fd7bd5b72..91113dd5821 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -945,9 +945,10 @@ Objective-C and Objective-C++ Dialects}.
>  -mmemory-latency=@var{time}}
>  
>  @emph{eBPF Options}
> -@gccoptlist{-mbig-endian -mlittle-endian -mkernel=@var{version}
> +@gccoptlist{-mbig-endian -mlittle-endian
>  -mframe-limit=@var{bytes} -mxbpf -mco-re -mno-co-re -mjmpext
> --mjmp32 -malu32 -mv3-atomics -mcpu=@var{version} -masm=@var{dialect}}
> +-mjmp32 -malu32 -mv3-atomics -mbswap -msdiv -mcpu=@var{version}
> +-masm=@var{dialect}}
>  
>  @emph{FR30 Options}
>  @gccoptlist{-msmall-model  -mno-lsim}
> @@ -24674,18 +24675,6 @@ the value that can be specified should be less than 
> or equal to
>  @samp{32767}.  Defaults to whatever limit is imposed by the version of
>  the Linux kernel targeted.
>  
> -@opindex mkernel
> -@item -mkernel=@var{version}
> -This specifies the minimum version of the kernel that will run the
> -compiled program.  GCC uses this version to determine which
> -instructions to use, what kernel helpers to allow, etc.  Currently,
> -@var{version} can be one of @samp{4.0}, @samp{4.1}, @samp{4.2},
> -@samp{4.3}, @samp{4.4}, @samp{4.5}, @samp{4.6}, @samp{4.7},
> -@samp{4.8}, @samp{4.9}, @samp{4.10}, @samp{4.11}, @samp{4.12},
> -@samp{4.13}, @samp{4.14}, @samp{4.15}, @samp{4.16}, @samp{4.17},
> -@samp{4.18}, @samp{4.19}, @samp{4.20}, @samp{5.0}, @samp{5.1},
> -@samp{5.2}, @samp{latest} and @samp{native}.
> -
>  @opindex mbig-endian
>  @item -mbig-endian
>  Generate code for a big-endian target.
> @@ -24696,30 +24685,38 @@ Generate code for a little-endian target.  This is 
> the default.
>  
>  @opindex mjmpext
>  @item -mjmpext
> -Enable generation of extra conditional-branch instructions.
> +@itemx -mno-jmpext
> +Enable or disable generation of extra conditional-branch instructions.
>  Enabled for CPU v2 and above.
>  
>  @opindex mjmp32
>  @item -mjmp32
> -Enable 32-bit jump instructions. Enabled for CPU v3 and above.
> +@itemx -mno-jmp32
> +Enable or disable generation of 32-bit jump instructions.
> +Enabled for CPU v3 and above.
>  
>  @opindex malu32
>  @item -malu32
> -Enable 32-bit ALU instructions. Enabled for CPU v3 and above.
> +@itemx -mno-alu32
> +Enable or disable generation of 32-bit ALU instructions.
> +Enabled for CPU v3 and above.
> +
> +@opindex mv3-atomics
> +@item -mv3-atomics
> +@itemx -mno-v3-atomics
> +Enable or disable instructions for general atomic operations introduced
> +in CPU v3.  Enabled for CPU v3 and above.
>  
>  @opindex mbswap
>  @item -mbswap
> -Enable byte swap instructions.  Enabled for CPU v4 and above.
> +@itemx -mno-bswap
> +Enable or disable byte swap instructions.  Enabled for CPU v4 and above.
>  
>  @opindex msdiv
>  @item -msdiv
> -Enable signed division and modulus instructions.  Enabled for CPU v4
> -and above.
> -
> -@opindex mv3-atomics
> -@item -mv3-atomics
> -Enable instructions for general atomic operations introduced in CPU v3.
> -Enabled for CPU v3 and above.
> +@itemx -mno-sdiv
> +Enable or disable signed division and modulus instructions.  Enabled for
> +CPU v4 and above.
>  
>  @opindex mcpu
>  @item -mcpu=@var{version}
> @@ -24747,6 +24744,7 @@ All features of v2, plus:
>  All features of v3, plus:
>  @itemize @minus
>  @item Byte swap instructions, as in @option{-mbswap}
> +@item Signed division and modulus instructions, as in @option{-msdiv}
>  @end itemize
>  @end table


Re: [PATCH] Add -fsarif-time-report [PR109361]

2023-07-27 Thread David Malcolm via Gcc-patches
On Tue, 2023-04-11 at 08:43 +, Richard Biener wrote:
> On Tue, 4 Apr 2023, David Malcolm wrote:
> 
> > Richi, Jakub: I can probably self-approve this, but it's
> > technically a
> > new feature.  OK if I push this to trunk in stage 4?  I believe
> > it's
> > low risk, and is very useful for benchmarking -fanalyzer.
> 
> Please wait for stage1 at this point.  One comment on the patch
> below ...
> 
> > 
> > > This patch adds support for embedding profiling information about
> > the
> > compiler itself into the SARIF output.
> > 
> > In an earlier version of this patch I extended -ftime-report so
> > that
> > as well as writing to stderr, it would embed the information in any
> > SARIF output.  This turned out to be awkward to use, in that I
> > found
> > myself needing to get the data in JSON form without also having it
> > emitted on stderr (which was affecting the output of the build).
> > 
> > Hence this version of the patch adds a new -fsarif-time-report,
> > similar
> > > to the existing -ftime-report for requesting that GCC profile itself
> > using
> > the timevar machinery.
> > 
> > Specifically, if -fsarif-time-report is specified, the timing
> > information will be captured (as if -ftime-report were specified),
> > and
> > will be embedded in JSON form within any SARIF as a
> > "gcc/timeReport"
> > property within a property bag of the "invocation" object.
> > 
> > Here's an example of the output:
> > 
> >   "invocations": [
> >   {
> >   "executionSuccessful": true,
> >   "toolExecutionNotifications": [],
> >   "properties": {
> >   "gcc/timeReport": {
> >   "timevars": [
> >   {
> >   "name": "phase setup",
> >   "elapsed": {
> >   "user": 0.04,
> >   "sys": 0,
> >   "wall": 0.04,
> >   "ggc_mem": 1863472
> >   }
> >   },
> > 
> >   [...snip...]
> > 
> >   {
> >   "name": "analyzer: processing worklist",
> >   "elapsed": {
> >   "user": 0.06,
> >   "sys": 0,
> >   "wall": 0.06,
> >   "ggc_mem": 48
> >   }
> >   },
> >   {
> >   "name": "analyzer: emitting diagnostics",
> >   "elapsed": {
> >   "user": 0.01,
> >   "sys": 0,
> >   "wall": 0.01,
> >   "ggc_mem": 0
> >   }
> >   },
> >   {
> >   "name": "TOTAL",
> >   "elapsed": {
> >   "user": 0.21,
> >   "sys": 0.03,
> >   "wall": 0.24,
> >   "ggc_mem": 3368736
> >   }
> >   }
> >   ],
> >   "CHECKING_P": true,
> >   "flag_checking": true
> >   }
> >   }
> >   }
> >   ]
> > 
> > I have successfully used this in my analyzer integration tests to
> > get
> > timing information about which source files get slowed down by the
> > analyzer.  I've validated the generated .sarif files against the
> > SARIF
> > schema.
> > 
> > The documentation notes that the precise output format is subject
> > to change.
> > 
> > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > 
> > gcc/ChangeLog:
> > PR analyzer/109361
> > * common.opt (fsarif-time-report): New option.
> 
> 'sarif' is currently used only with -fdiagnostics-format= it seems.
> We already have
> 
> ftime-report
> Common Var(time_report)
> Report the time taken by each compiler pass.
> 
> ftime-report-details
> Common Var(time_report_details)
> Record times taken by sub-phases separately. 
> 
> so -fsarif-time-report is not a) -ftime-report-sarif and b) it's
> unclear if it applies to -ftime-report or to both -ftime-report
> and -ftime-report-details?  (note -ftime-report-details needs
> -ftime-report to be effective)
> 
> I'd rather have a -ftime-report-format= (or -freport-format in
> case we want to cover -fmem-report, -fmem-report-wpa,
> -fpre-ipa-mem-report and -fpost-ipa-mem-report as well?)
> 
> ISTR there's a summer of code project in this area as well.
> 
> Thanks,
> Richard.

Revisiting this; sorry about the delay.

As I understand the status quo, we currently have:
* -ftime-report: enable capturing of timing information (with a slight
speed hit), and report it to stderr
* -ftime-report-details: tweak how that information is captured (if -

Re: Update and Questions on CPython Extension Module -fanalyzer plugin development

2023-07-27 Thread Eric Feng via Gcc
Hi Dave,

Thanks for the comments!

[...]
> Do you have any DejaGnu tests for this functionality?  For example,
> given PyList_New
>   https://docs.python.org/3/c-api/list.html#c.PyList_New
> there could be a test like:
>
> /* { dg-require-effective-target python_h } */
>
> #define PY_SSIZE_T_CLEAN
> #include 
> #include "analyzer-decls.h"
>
> PyObject *
> test_PyList_New (Py_ssize_t len)
> {
>   PyObject *obj = PyList_New (len);
>   if (obj)
> {
>  __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
>  __analyzer_eval (PyList_Check (obj)); /* { dg-warning "TRUE" } */
>  __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" } */
> }
>   else
> __analyzer_dump_path (); /* { dg-warning "path" } */
>   return obj;
> }
>
> ...or similar, to verify that we simulate that the call can both
> succeed and fail, and to verify properties of the store along the
> "success" path.  Caveat: I didn't look at exactly what properties
> you're simulating, so the above tests might need adjusting.
>

I am currently in the process of developing more tests. Specific to
the test you provided as an example, we are passing all cases except
for PyList_Check. PyList_Check does not pass because I have not yet
added support for the various definitions of tp_flags. I also
encountered a minor hiccup where PyList_CheckExact appeared to give
"UNKNOWN" rather than "TRUE", but this has since been fixed. The
problem was caused by accidentally using the tree representation of
struct PyList_Type as opposed to struct PyList_Type * when creating a
pointer sval to the region for PyList_Type.

[...]
>
> > Let's consider the following example which lacks error checking:
> >
> > PyObject* foo() {
> > PyObject *item = PyLong_FromLong(10);
> > PyObject *list = PyList_New(5);
> > return list;
> > }
> >
> > The states for when PyLong_FromLong fails and when PyLong_FromLong
> > succeeds are merged before the call to PyObject* list =
> > PyList_New(5).
>
> Ideally we would emit a leak warning about the "success" case of
> PyLong_FromLong here.  I think you're running into the problem of the
> "store" part of the program_state being separate from the "malloc"
> state machine part of program_state - I'm guessing that you're creating
> a heap_allocated_region for the new python object, but the "malloc"
> state machine isn't transitioning the pointer from "start" to "assumed-
> non-null".  Such state machine states inhibit state-merging, and so
> this might solve your state-merging problem.
>
> I think we need a way to call malloc_state_machine::on_allocator_call
> from outside of sm-malloc.cc.  See region_model::on_realloc_with_move
> for an example of how to do something similar.
>

Thank you for the suggestion — this worked great and has solved the issue!

Best,
Eric


[PATCH] bpf: ISA V4 sign-extending move and load insns [PR110782, PR110784]

2023-07-27 Thread David Faust via Gcc-patches
BPF ISA V4 introduces sign-extending move and load operations.  This
patch makes the BPF backend generate those instructions, when enabled
and useful.

A new option, -m[no-]smov gates generation of these instructions, and is
enabled by default for -mcpu=v4 and above.  Tests for the new
instructions and documentation for the new options are included.
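
As a rough illustration of the source-level conversions these patterns are
meant to cover, the sketch below shows plain C widening conversions from
32-, 16- and 8-bit signed values to 64 bits, from a register and from
memory. The function names are hypothetical and are not taken from the new
tests; whether a given function actually compiles to movs or ldxs* depends
on -mcpu/-msmov and on the optimizer, and is not verified here.

```c
/* Hypothetical helpers, for illustration only.  Each one performs a
   sign extension that the C language requires on widening a signed
   value.  long long is used so the result is 64 bits wide on any
   host.  */

/* 32-bit register value widened to 64 bits (SI -> DI).  */
long long sext_word(int x)
{
    return x;
}

/* 16-bit register value widened to 64 bits (HI -> DI).  */
long long sext_half(short x)
{
    return x;
}

/* 8-bit register value widened to 64 bits (QI -> DI).  */
long long sext_byte(signed char x)
{
    return x;
}

/* 32-bit value loaded from memory and widened to 64 bits; this is
   the kind of access a sign-extending load could serve.  */
long long sext_load_word(const int *p)
{
    return *p;
}
```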

Tested on bpf-unknown-none.
OK?

gcc/

* config/bpf/bpf.opt (msmov): New option.
* config/bpf/bpf.cc (bpf_option_override): Handle it here.
* config/bpf/bpf.md (*extendsidi2): New.
(extendhidi2): New.
(extendqidi2): New.
(extendsisi2): New.
(extendhisi2): New.
(extendqisi2): New.
* doc/invoke.texi (Option Summary): Add -msmov eBPF option.
(eBPF Options): Add -m[no-]smov.  Document that -mcpu=v4
also enables -msmov.

gcc/testsuite/

* gcc.target/bpf/sload-1.c: New test.
* gcc.target/bpf/sload-pseudoc-1.c: New test.
* gcc.target/bpf/smov-1.c: New test.
* gcc.target/bpf/smov-pseudoc-1.c: New test.
---
 gcc/config/bpf/bpf.cc |  3 ++
 gcc/config/bpf/bpf.md | 50 +++
 gcc/config/bpf/bpf.opt|  4 ++
 gcc/doc/invoke.texi   |  9 +++-
 gcc/testsuite/gcc.target/bpf/sload-1.c| 16 ++
 .../gcc.target/bpf/sload-pseudoc-1.c  | 16 ++
 gcc/testsuite/gcc.target/bpf/smov-1.c | 18 +++
 gcc/testsuite/gcc.target/bpf/smov-pseudoc-1.c | 18 +++
 8 files changed, 133 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/bpf/sload-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/sload-pseudoc-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/smov-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/smov-pseudoc-1.c

diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
index 0e07b416add..b5b5674edbb 100644
--- a/gcc/config/bpf/bpf.cc
+++ b/gcc/config/bpf/bpf.cc
@@ -262,6 +262,9 @@ bpf_option_override (void)
   if (bpf_has_sdiv == -1)
 bpf_has_sdiv = (bpf_isa >= ISA_V4);
 
+  if (bpf_has_smov == -1)
+bpf_has_smov = (bpf_isa >= ISA_V4);
+
   /* Disable -fstack-protector as it is not supported in BPF.  */
   if (flag_stack_protect)
 {
diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
index 66436397bb7..a69a239b9d6 100644
--- a/gcc/config/bpf/bpf.md
+++ b/gcc/config/bpf/bpf.md
@@ -307,6 +307,56 @@ (define_expand "extendsidi2"
   DONE;
 })
 
+;; ISA V4 introduces sign-extending move and load operations.
+
+(define_insn "*extendsidi2"
+  [(set (match_operand:DI 0 "register_operand" "=r,r")
+(sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" "r,q")))]
+  "bpf_has_smov"
+  "@
+   {movs\t%0,%1,32|%0 = (s32) %1}
+   {ldxsw\t%0,%1|%0 = *(s32 *) (%1)}"
+  [(set_attr "type" "alu,ldx")])
+
+(define_insn "extendhidi2"
+  [(set (match_operand:DI 0 "register_operand" "=r,r")
+(sign_extend:DI (match_operand:HI 1 "nonimmediate_operand" "r,q")))]
+  "bpf_has_smov"
+  "@
+   {movs\t%0,%1,16|%0 = (s16) %1}
+   {ldxsh\t%0,%1|%0 = *(s16 *) (%1)}"
+  [(set_attr "type" "alu,ldx")])
+
+(define_insn "extendqidi2"
+  [(set (match_operand:DI 0 "register_operand" "=r,r")
+(sign_extend:DI (match_operand:QI 1 "nonimmediate_operand" "r,q")))]
+  "bpf_has_smov"
+  "@
+   {movs\t%0,%1,8|%0 = (s8) %1}
+   {ldxsb\t%0,%1|%0 = *(s8 *) (%1)}"
+  [(set_attr "type" "alu,ldx")])
+
+(define_insn "extendsisi2"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+(sign_extend:SI (match_operand:SI 1 "register_operand" "r")))]
+  "bpf_has_smov"
+  "{movs32\t%0,%1,32|%w0 = (s32) %w1}"
+  [(set_attr "type" "alu")])
+
+(define_insn "extendhisi2"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+(sign_extend:SI (match_operand:HI 1 "register_operand" "r")))]
+  "bpf_has_smov"
+  "{movs32\t%0,%1,16|%w0 = (s16) %w1}"
+  [(set_attr "type" "alu")])
+
+(define_insn "extendqisi2"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+(sign_extend:SI (match_operand:QI 1 "register_operand" "r")))]
+  "bpf_has_smov"
+  "{movs32\t%0,%1,8|%w0 = (s8) %w1}"
+  [(set_attr "type" "alu")])
+
  Data movement
 
 (define_mode_iterator MM [QI HI SI DI SF DF])
diff --git a/gcc/config/bpf/bpf.opt b/gcc/config/bpf/bpf.opt
index b21cfcab9ea..8e240d397e4 100644
--- a/gcc/config/bpf/bpf.opt
+++ b/gcc/config/bpf/bpf.opt
@@ -71,6 +71,10 @@ msdiv
 Target Var(bpf_has_sdiv) Init(-1)
 Enable signed division and modulus instructions.
 
+msmov
+Target Var(bpf_has_smov) Init(-1)
+Enable signed move and memory load instructions.
+
 mcpu=
 Target RejectNegative Joined Var(bpf_isa) Enum(bpf_isa) Init(ISA_V4)
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 91113dd5821..e574acfd612 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -947,7 +947,7 @@ Objective-C and Objective-C++ Dialects}.
 @emph{eBPF Options}
 @gccoptlist{-mbig-endian -mlittle-endian
 

[Bug testsuite/108835] gm2 tests at large -jNN numbers do not return

2023-07-27 Thread gaius at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108835

--- Comment #9 from Gaius Mulley  ---
This looks fixed from the commit trail - can this PR be closed now?

[PATCH] bpf: minor doc cleanup for command-line options

2023-07-27 Thread David Faust via Gcc-patches
This patch makes some minor cleanups to eBPF options documented in
invoke.texi:
 - Delete some vestigial docs for removed -mkernel option
 - Add -mbswap and -msdiv to the option summary
 - Note the negative versions of several options
 - Note that -mcpu=v4 also enables -msdiv.

gcc/

* doc/invoke.texi (Option Summary): Remove -mkernel eBPF option.
Add -mbswap and -msdiv eBPF options.
(eBPF Options): Remove -mkernel.  Add -mno-{jmpext, jmp32,
alu32, v3-atomics, bswap, sdiv}.  Document that -mcpu=v4 also
enables -msdiv.
---
 gcc/doc/invoke.texi | 48 ++---
 1 file changed, 23 insertions(+), 25 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e0fd7bd5b72..91113dd5821 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -945,9 +945,10 @@ Objective-C and Objective-C++ Dialects}.
 -mmemory-latency=@var{time}}
 
 @emph{eBPF Options}
-@gccoptlist{-mbig-endian -mlittle-endian -mkernel=@var{version}
+@gccoptlist{-mbig-endian -mlittle-endian
 -mframe-limit=@var{bytes} -mxbpf -mco-re -mno-co-re -mjmpext
--mjmp32 -malu32 -mv3-atomics -mcpu=@var{version} -masm=@var{dialect}}
+-mjmp32 -malu32 -mv3-atomics -mbswap -msdiv -mcpu=@var{version}
+-masm=@var{dialect}}
 
 @emph{FR30 Options}
 @gccoptlist{-msmall-model  -mno-lsim}
@@ -24674,18 +24675,6 @@ the value that can be specified should be less than or 
equal to
 @samp{32767}.  Defaults to whatever limit is imposed by the version of
 the Linux kernel targeted.
 
-@opindex mkernel
-@item -mkernel=@var{version}
-This specifies the minimum version of the kernel that will run the
-compiled program.  GCC uses this version to determine which
-instructions to use, what kernel helpers to allow, etc.  Currently,
-@var{version} can be one of @samp{4.0}, @samp{4.1}, @samp{4.2},
-@samp{4.3}, @samp{4.4}, @samp{4.5}, @samp{4.6}, @samp{4.7},
-@samp{4.8}, @samp{4.9}, @samp{4.10}, @samp{4.11}, @samp{4.12},
-@samp{4.13}, @samp{4.14}, @samp{4.15}, @samp{4.16}, @samp{4.17},
-@samp{4.18}, @samp{4.19}, @samp{4.20}, @samp{5.0}, @samp{5.1},
-@samp{5.2}, @samp{latest} and @samp{native}.
-
 @opindex mbig-endian
 @item -mbig-endian
 Generate code for a big-endian target.
@@ -24696,30 +24685,38 @@ Generate code for a little-endian target.  This is 
the default.
 
 @opindex mjmpext
 @item -mjmpext
-Enable generation of extra conditional-branch instructions.
+@itemx -mno-jmpext
+Enable or disable generation of extra conditional-branch instructions.
 Enabled for CPU v2 and above.
 
 @opindex mjmp32
 @item -mjmp32
-Enable 32-bit jump instructions. Enabled for CPU v3 and above.
+@itemx -mno-jmp32
+Enable or disable generation of 32-bit jump instructions.
+Enabled for CPU v3 and above.
 
 @opindex malu32
 @item -malu32
-Enable 32-bit ALU instructions. Enabled for CPU v3 and above.
+@itemx -mno-alu32
+Enable or disable generation of 32-bit ALU instructions.
+Enabled for CPU v3 and above.
+
+@opindex mv3-atomics
+@item -mv3-atomics
+@itemx -mno-v3-atomics
+Enable or disable instructions for general atomic operations introduced
+in CPU v3.  Enabled for CPU v3 and above.
 
 @opindex mbswap
 @item -mbswap
-Enable byte swap instructions.  Enabled for CPU v4 and above.
+@itemx -mno-bswap
+Enable or disable byte swap instructions.  Enabled for CPU v4 and above.
 
 @opindex msdiv
 @item -msdiv
-Enable signed division and modulus instructions.  Enabled for CPU v4
-and above.
-
-@opindex mv3-atomics
-@item -mv3-atomics
-Enable instructions for general atomic operations introduced in CPU v3.
-Enabled for CPU v3 and above.
+@itemx -mno-sdiv
+Enable or disable signed division and modulus instructions.  Enabled for
+CPU v4 and above.
 
 @opindex mcpu
 @item -mcpu=@var{version}
@@ -24747,6 +24744,7 @@ All features of v2, plus:
 All features of v3, plus:
 @itemize @minus
 @item Byte swap instructions, as in @option{-mbswap}
+@item Signed division and modulus instructions, as in @option{-msdiv}
 @end itemize
 @end table
 
-- 
2.40.1



[Bug tree-optimization/106293] [13/14 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022

2023-07-27 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

--- Comment #16 from Jan Hubicka  ---
It is really hard to get loop splitting to do anything.
It does not like canonicalized invariant variables, since the loop exit
condition should not be NE_EXPR, and it does not like it when VRP turns LT/GT
into NE.

This is what happens in hmmer.  There is a loop iterating 100 times and
splitting happens just before the last BB:
int M = 100;

void
__attribute__ ((noinline,noipa))
do_something()
{
}
void
__attribute__ ((noinline,noipa))
do_something2()
{
}

__attribute__ ((noinline,noipa))
void test1 (int n)
{
  if (n <= 0 || n > 10)
    return;
  for (int i = 0; i <= n; i++)
    if (i < n)
      do_something ();
    else
      do_something2 ();
}
int
main(int, char **)
{
  for (int i = 0 ; i < 1000; i++)
    test1(M);
  return 0;
}
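
For reference, here is a hand-written sketch of the transformation loop
splitting is expected to perform on the loop in test1 above: the iteration
space is cut at the point where the condition i < n changes value, so
neither resulting loop tests it. The names test1_orig and test1_split and
the call counters are assumptions added for illustration; this is not
compiler output.

```c
/* Counters stand in for the side effects of the two callees.  */
static int calls_first, calls_last;

static void do_something(void)  { calls_first++; }
static void do_something2(void) { calls_last++; }

/* Same shape as test1: the branch on i < n is evaluated on every
   iteration.  */
void test1_orig(int n)
{
  if (n <= 0 || n > 10)
    return;
  for (int i = 0; i <= n; i++)
    if (i < n)
      do_something();
    else
      do_something2();
}

/* Hand-split version: the first loop covers the iterations where
   i < n holds, the second covers the remaining iteration (i == n).  */
void test1_split(int n)
{
  int i = 0;

  if (n <= 0 || n > 10)
    return;
  for (; i < n; i++)
    do_something();
  for (; i <= n; i++)
    do_something2();
}
```

Both versions make n calls to do_something followed by one call to
do_something2 for any n accepted by the guard.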

[Bug c++/110824] Gcc crashing on a lambda capture

2023-07-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110824

--- Comment #5 from Andrew Pinski  ---
(In reply to Denis Yaroshevskiy from comment #4)
> Appreciate it.
> 
> I'm still going to support gcc11 for the foreseeable future. Is there some
> easy way you see I can confirm that this is this issue?
> So that I don't create more duplicates?

In this case, the pattern is simple: if there is a trailing return type that
uses decltype with a template function, and that template function is defined
(in that scope or an outer scope), then this will be a dup of that issue. (It
is only valid C++20 rather than being valid C++17 too.)

Hope that helps. 

Also note that C++20 support in GCC is still being fixed in many areas, so
using GCC 11, which came out less than 4 months after C++20 was ratified and
published, is definitely going to be an issue.

[Bug modula2/109586] cc1gm2 ICE when compiling large source file

2023-07-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109586

--- Comment #5 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Gaius Mulley
:

https://gcc.gnu.org/g:2286745b12070320c8dcc5c75d76dd184cb7645e

commit r13-7616-g2286745b12070320c8dcc5c75d76dd184cb7645e
Author: Gaius Mulley 
Date:   Thu Jul 27 22:11:26 2023 +0100

PR modula2/109586 cc1gm2 ICE when compiling large source files.

The function m2block_RememberConstant calls m2tree_IsAConstant.
However IsAConstant does not recognise TREE_CODE(t) ==
CONSTRUCTOR as a constant.  Without this patch CONSTRUCTOR
constants are garbage collected (and not preserved), resulting in
a corrupt tree and crash.

gcc/m2/ChangeLog:

PR modula2/109586
* gm2-gcc/m2tree.cc (m2tree_IsAConstant): Add (TREE_CODE
(t) == CONSTRUCTOR) to expression.

(cherry picked from commit a7e1ee39e4fa37d005929c4ff9457d1a199559c6)

Signed-off-by: Gaius Mulley 
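
The effect of the one-line change can be sketched with a toy predicate. This is
only an illustration: the enum and both function names below are hypothetical
stand-ins, and the set of codes the real m2tree_IsAConstant accepts besides
CONSTRUCTOR is an assumption, not taken from the commit.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GCC tree codes; not the real definitions.  */
enum toy_tree_code
{
  TOY_INTEGER_CST,
  TOY_REAL_CST,
  TOY_STRING_CST,
  TOY_CONSTRUCTOR,
  TOY_VAR_DECL
};

/* Modelled behaviour before the fix: CONSTRUCTOR is not a constant.  */
bool
toy_is_a_constant_old (enum toy_tree_code code)
{
  return code == TOY_INTEGER_CST
	 || code == TOY_REAL_CST
	 || code == TOY_STRING_CST;
}

/* Modelled behaviour after the fix: CONSTRUCTOR is added to the test,
   so such nodes would be remembered instead of being collected.  */
bool
toy_is_a_constant_new (enum toy_tree_code code)
{
  return toy_is_a_constant_old (code) || code == TOY_CONSTRUCTOR;
}
```

With the old predicate a CONSTRUCTOR node is never handed to the "remember"
path, which matches the garbage-collection symptom described in the log.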

Re: [patch] OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rect

2023-07-27 Thread Thomas Schwinge
Hi Tobias!

On 2023-07-25T23:45:54+0200, Tobias Burnus  wrote:
> The attached patch calls CUDA's cuMemcpy2D and cuMemcpy3D
> for omp_target_memcpy_rect[,_async] for dim=2/dim=3. This should
> speed up the data transfer for noncontiguous data.

ACK, thanks.

> While being there, I ended up adding support for device to other device
> copying; while potentially slow, it is still better than not being able to
> copy - and with shared-memory, it shouldn't be that bad.

Makes sense, I guess.

> Comments, suggestions, remarks?
> If there are none, will commit it...

You're so quick -- I'm so slow...  ;-)

I've not verified all the logic in here, but I've got a few comments.

> Disclaimer: While I have done correctness tests (system with two nvptx GPUs),
> I have not done any performance tests.

Well, we should, eventually.

> (I also tested it without offloading
> configured, but that's rather boring.)

> OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rect
>
> When copying a 2D or 3D rectangular memory block, the performance is
> better when using CUDA's cuMemcpy2D/cuMemcpy3D instead of copying the
> data one by one. That's what this commit does.

So you've actually done some performance verification?

> Additionally, it permits device-to-device copies, if necessary using a
> temporary variable on the host.

> --- a/include/cuda/cuda.h
> +++ b/include/cuda/cuda.h

I note that you're not actually using everything you're adding here.
(..., but I understand you're simply adding everything that relates to
these 'cuMemcpy[...]' routines -- OK as far as I'm concerned.)

> @@ -47,6 +47,7 @@ typedef void *CUevent;
>  typedef void *CUfunction;
>  typedef void *CUlinkState;
>  typedef void *CUmodule;
> +typedef void *CUarray;
>  typedef size_t (*CUoccupancyB2DSize)(int);
>  typedef void *CUstream;
>
> @@ -54,7 +55,10 @@ typedef enum {
>CUDA_SUCCESS = 0,
>CUDA_ERROR_INVALID_VALUE = 1,
>CUDA_ERROR_OUT_OF_MEMORY = 2,
> +  CUDA_ERROR_NOT_INITIALIZED = 3,
> +  CUDA_ERROR_DEINITIALIZED = 4,
>CUDA_ERROR_INVALID_CONTEXT = 201,
> +  CUDA_ERROR_INVALID_HANDLE = 400,
>CUDA_ERROR_NOT_FOUND = 500,
>CUDA_ERROR_NOT_READY = 600,
>CUDA_ERROR_LAUNCH_FAILED = 719,
> @@ -126,6 +130,75 @@ typedef enum {
>CU_LIMIT_MALLOC_HEAP_SIZE = 0x02,
>  } CUlimit;
>
> +typedef enum {
> +  CU_MEMORYTYPE_HOST = 0x01,
> +  CU_MEMORYTYPE_DEVICE = 0x02,
> +  CU_MEMORYTYPE_ARRAY = 0x03,
> +  CU_MEMORYTYPE_UNIFIED = 0x04
> +} CUmemorytype;
> +
> +typedef struct {
> +  size_t srcXInBytes, srcY;
> +  CUmemorytype srcMemoryType;
> +  const void *srcHost;
> +  CUdeviceptr srcDevice;
> +  CUarray srcArray;
> +  size_t srcPitch;
> +
> +  size_t dstXInBytes, dstY;
> +  CUmemorytype dstMemoryType;
> +  const void *dstHost;

That last one isn't 'const'.  ;-)

> +  CUdeviceptr dstDevice;
> +  CUarray dstArray;
> +  size_t dstPitch;
> +
> +  size_t WidthInBytes, Height;
> +} CUDA_MEMCPY2D;
> +
> +typedef struct {
> +  size_t srcXInBytes, srcY, srcZ;
> +  size_t srcLOD;
> +  CUmemorytype srcMemoryType;
> +  const void *srcHost;
> +  CUdeviceptr srcDevice;
> +  CUarray srcArray;
> +  void *dummy;

A 'cuda.h' that I looked at calls that last one 'reserved0', with comment
"Must be NULL".

> +  size_t srcPitch, srcHeight;
> +
> +  size_t dstXInBytes, dstY, dstZ;
> +  size_t dstLOD;
> +  CUmemorytype dstMemoryType;
> +  const void *dstHost;

Again, not 'const'.

> +  CUdeviceptr dstDevice;
> +  CUarray dstArray;
> +  void *dummy2;

Similar to above: 'reserved1', with comment "Must be NULL".

> +  size_t dstPitch, dstHeight;
> +
> +  size_t WidthInBytes, Height, Depth;
> +} CUDA_MEMCPY3D;
> +
> +typedef struct {
> +  size_t srcXInBytes, srcY, srcZ;
> +  size_t srcLOD;
> +  CUmemorytype srcMemoryType;
> +  const void *srcHost;
> +  CUdeviceptr srcDevice;
> +  CUarray srcArray;
> +  CUcontext srcContext;
> +  size_t srcPitch, srcHeight;
> +
> +  size_t dstXInBytes, dstY, dstZ;
> +  size_t dstLOD;
> +  CUmemorytype dstMemoryType;
> +  const void *dstHost;
> +  CUdeviceptr dstDevice;
> +  CUarray dstArray;
> +  CUcontext dstContext;
> +  size_t dstPitch, dstHeight;
> +
> +  size_t WidthInBytes, Height, Depth;
> +} CUDA_MEMCPY3D_PEER;
> +
>  #define cuCtxCreate cuCtxCreate_v2
>  CUresult cuCtxCreate (CUcontext *, unsigned, CUdevice);
>  #define cuCtxDestroy cuCtxDestroy_v2
> @@ -183,6 +256,18 @@ CUresult cuMemcpyDtoHAsync (void *, CUdeviceptr, size_t, 
> CUstream);
>  CUresult cuMemcpyHtoD (CUdeviceptr, const void *, size_t);
>  #define cuMemcpyHtoDAsync cuMemcpyHtoDAsync_v2
>  CUresult cuMemcpyHtoDAsync (CUdeviceptr, const void *, size_t, CUstream);
> +#define cuMemcpy2D cuMemcpy2D_v2
> +CUresult cuMemcpy2D (const CUDA_MEMCPY2D *);
> +#define cuMemcpy2DAsync cuMemcpy2DAsync_v2
> +CUresult cuMemcpy2DAsync (const CUDA_MEMCPY2D *, CUstream);
> +#define cuMemcpy2DUnaligned cuMemcpy2DUnaligned_v2
> +CUresult cuMemcpy2DUnaligned (const CUDA_MEMCPY2D *);
> +#define cuMemcpy3D cuMemcpy3D_v2
> +CUresult cuMemcpy3D (const CUDA_MEMCPY3D *);
> 
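
To make the pitch/offset fields quoted above more concrete, here is a host-only
sketch of what a 2D rectangular copy means. It does not call CUDA; the struct
and function names are hypothetical and merely modelled on the
srcXInBytes/srcY/srcPitch style of fields in the quoted header.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified descriptor; not the CUDA_MEMCPY2D struct.  */
struct rect2d_copy
{
  const unsigned char *src;
  size_t src_x_bytes, src_y, src_pitch;
  unsigned char *dst;
  size_t dst_x_bytes, dst_y, dst_pitch;
  size_t width_bytes, height;
};

/* Copy HEIGHT rows of WIDTH_BYTES bytes each.  A row starts at
   base + (y + row) * pitch + x_bytes on both sides.  */
void
rect2d_copy_rows (const struct rect2d_copy *c)
{
  for (size_t row = 0; row < c->height; row++)
    {
      const unsigned char *s
	= c->src + (c->src_y + row) * c->src_pitch + c->src_x_bytes;
      unsigned char *d
	= c->dst + (c->dst_y + row) * c->dst_pitch + c->dst_x_bytes;
      memcpy (d, s, c->width_bytes);
    }
}
```

A driver-level 2D copy takes one such descriptor for the whole rectangle
instead of issuing a separate transfer per row or element, which is where the
expected speed-up for noncontiguous data would come from.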

Re: [PATCH] RISC-V: Fix uninitialized and redundant use of which_alternative

2023-07-27 Thread Patrick O'Neill

The newly added testcase fails on rv32 targets with this message:
FAIL: gcc.target/riscv/rvv/autovec/madd-split2-1.c -O3 -ftree-vectorize (test 
for excess errors)

verbose log:
compiler exited with status 1
output is:
cc1: error: ABI requires '-march=rv32'

Something like this appears to fix the issue:

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
index 14a9802667e..e10a9e9d0f5 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gcv_zvl256b -O3 -fno-cprop-registers -fno-dce --param 
riscv-autovec-preference=scalable" } */
+/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3
-fno-cprop-registers -fno-dce --param riscv-autovec-preference=scalable"
 } */
 
 long

 foo (long *__restrict a, long *__restrict b, long n)

On 7/27/23 04:57, Kito Cheng via Gcc-patches wrote:


My first impression is those emit_insn (gen_rtx_SET()) seems
necessary, but I got the point after I checked vector.md :P

Committed to trunk, thanks :)


On Thu, Jul 27, 2023 at 6:23 PM juzhe.zh...@rivai.ai wrote:

Oh, YES.

Thanks for fixing it. It makes sense since the ternary operations in "vector.md"
generate "vmv.v.v" according to RA.

Thanks for fixing it.

@kito: Could you confirm it? If it's ok to you, commit it for Han (I am lazy to 
commit patches :).



juzhe.zh...@rivai.ai

From: demin.han
Date: 2023-07-27 17:48
To: gcc-patches@gcc.gnu.org
CC: kito.ch...@gmail.com; juzhe.zh...@rivai.ai
Subject: [PATCH] RISC-V: Fix uninitialized and redundant use of 
which_alternative
When the split2 pass starts, which_alternative holds whatever value an
earlier pass last set, so it is effectively random.

Even if it were initialized, the generated move would be redundant:
the move can be generated by the assembly output template.

Signed-off-by: demin.han

gcc/ChangeLog:

* config/riscv/autovec.md: Delete which_alternative use in split

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/madd-split2-1.c: New test.

---
gcc/config/riscv/autovec.md | 12 
.../gcc.target/riscv/rvv/autovec/madd-split2-1.c| 13 +
2 files changed, 13 insertions(+), 12 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index d899922586a..b7ea3101f5a 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1012,8 +1012,6 @@ (define_insn_and_split "*fma"
[(const_int 0)]
{
  riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
-if (which_alternative == 2)
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
  rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
operands[0]};
  riscv_vector::emit_vlmax_ternary_insn (code_for_pred_mul_plus 
(mode),
riscv_vector::RVV_TERNOP, ops, operands[4]);
@@ -1058,8 +1056,6 @@ (define_insn_and_split "*fnma"
[(const_int 0)]
{
  riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
-if (which_alternative == 2)
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
  rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
operands[0]};
  riscv_vector::emit_vlmax_ternary_insn (code_for_pred_minus_mul 
(mode),
 riscv_vector::RVV_TERNOP, ops, operands[4]);
@@ -1102,8 +1098,6 @@ (define_insn_and_split "*fma"
[(const_int 0)]
{
  riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
-if (which_alternative == 2)
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
  rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
operands[0]};
  riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul (PLUS, 
mode),
   riscv_vector::RVV_TERNOP, ops, operands[4]);
@@ -1148,8 +1142,6 @@ (define_insn_and_split "*fnma"
[(const_int 0)]
{
  riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
-if (which_alternative == 2)
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
  rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
operands[0]};
  riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul_neg (PLUS, 
mode),
   riscv_vector::RVV_TERNOP, ops, operands[4]);
@@ -1194,8 +1186,6 @@ (define_insn_and_split "*fms"
[(const_int 0)]
{
  riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
-if (which_alternative == 2)
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
  rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
operands[0]};
  riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul (MINUS, 
mode),
   riscv_vector::RVV_TERNOP, ops, operands[4]);
@@ -1242,8 +1232,6 @@ (define_insn_and_split "*fnms"
[(const_int 0)]
{
  riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
-if (which_alternative == 2)
-  emit_insn 

[Bug tree-optimization/110817] [14 Regression] wrong code with vector compares and vector lowering

2023-07-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110817

--- Comment #9 from Andrew Pinski  ---
Here is a reduced testcase that does not need -mno-sse or any other option but
fails everywhere:
```
typedef unsigned __attribute__((__vector_size__ (1*sizeof(unsigned)))) V;

V v;
unsigned char c;

int
main (void)
{
  V x = (v > 0) > (v != c);
  volatile signed int t = x[0];
  if (t)
    __builtin_abort ();
  return 0;
}
```

t in this case is -2
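
For reference, GCC's vector comparisons yield 0 for false and -1 for true in
each lane, so comparing two such masks can again only give 0 or -1; with v and
c both zero, the expected value of t is 0. A minimal four-lane sketch (the type
and function names are made up for illustration):

```c
typedef unsigned int v4u __attribute__ ((vector_size (16)));
typedef int v4i __attribute__ ((vector_size (16)));

/* Return lane I of (v > 0) > (v != c), computed lane-wise.  */
int
mask_of_masks_lane (const unsigned v_in[4], const unsigned c_in[4], int i)
{
  v4u v = { v_in[0], v_in[1], v_in[2], v_in[3] };
  v4u c = { c_in[0], c_in[1], c_in[2], c_in[3] };
  v4u zero = { 0, 0, 0, 0 };
  v4i gt = v > zero;	/* each lane is 0 or -1 */
  v4i ne = v != c;	/* each lane is 0 or -1 */
  v4i x = gt > ne;	/* signed compare of the two masks */
  return x[i];
}
```

The only way a lane of x becomes -1 is gt == 0 and ne == -1; a value such as -2
cannot come from a correct lowering.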

[Bug fortran/110825] TYPE(*) dummy argument to generate an unused hidden argument

2023-07-27 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110825

anlauf at gcc dot gnu.org changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|NEW                           |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |anlauf at gcc dot gnu.org

--- Comment #4 from anlauf at gcc dot gnu.org ---
Submitted: https://gcc.gnu.org/pipermail/fortran/2023-July/059658.html

[PATCH] Fortran: do not pass hidden character length for TYPE(*) dummy [PR110825]

2023-07-27 Thread Harald Anlauf via Gcc-patches
Dear all,

when passing a character actual argument to an assumed-type dummy
(TYPE(*)), we should not pass the character length for that argument,
as otherwise other hidden arguments that are passed as part of the
gfortran ABI will not be interpreted correctly.  This is in line
with the current way the procedure decl is generated.

The attached patch fixes the caller and clarifies the behavior
in the documentation.
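
As a rough C-side picture of the convention: a procedure with a TYPE(*) dummy
followed by a CHARACTER(*) dummy receives exactly one hidden length, the one
for the CHARACTER(*) argument, even if the actual argument for the TYPE(*)
dummy is a string. The function below is a hypothetical model with an invented
name, not a real gfortran symbol.

```c
#include <stddef.h>

/* Model of:  subroutine sub (useless_var, print_this)
		type(*),      intent(in) :: useless_var
		character(*), intent(in) :: print_this
   Only PRINT_THIS contributes a hidden size_t length, passed by value
   at the end of the argument list.  Returns LEN_TRIM (print_this).  */
size_t
sub_c_view (const void *useless_var, const char *print_this,
	    size_t print_this_len)
{
  (void) useless_var;	/* assumed-type: no length travels with it */
  size_t n = print_this_len;
  while (n > 0 && print_this[n - 1] == ' ')
    n--;
  return n;
}
```

If the caller also pushed a length for the first argument, the callee would
read that value as print_this_len, which is the misinterpretation of hidden
arguments described above.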

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From 199e09c9862f5afe7e583839bc1b108c741a7efb Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Thu, 27 Jul 2023 21:30:26 +0200
Subject: [PATCH] Fortran: do not pass hidden character length for TYPE(*)
 dummy [PR110825]

gcc/fortran/ChangeLog:

	PR fortran/110825
	* gfortran.texi: Clarify argument passing convention.
	* trans-expr.cc (gfc_conv_procedure_call): Do not pass the character
	length as hidden argument when the declared dummy argument is
	assumed-type.

gcc/testsuite/ChangeLog:

	PR fortran/110825
	* gfortran.dg/assumed_type_18.f90: New test.
---
 gcc/fortran/gfortran.texi |  3 +-
 gcc/fortran/trans-expr.cc |  1 +
 gcc/testsuite/gfortran.dg/assumed_type_18.f90 | 52 +++
 3 files changed, 55 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/assumed_type_18.f90

diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
index 7786d23265f..f476a3719f5 100644
--- a/gcc/fortran/gfortran.texi
+++ b/gcc/fortran/gfortran.texi
@@ -3750,7 +3750,8 @@ front ends of GCC, e.g. to GCC's C99 compiler for @code{_Bool}
 or GCC's Ada compiler for @code{Boolean}.)

 For arguments of @code{CHARACTER} type, the character length is passed
-as a hidden argument at the end of the argument list.  For
+as a hidden argument at the end of the argument list, except when the
+corresponding dummy argument is declared as @code{TYPE(*)}.  For
 deferred-length strings, the value is passed by reference, otherwise
 by value.  The character length has the C type @code{size_t} (or
 @code{INTEGER(kind=C_SIZE_T)} in Fortran).  Note that this is
diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index ef3e6d08f78..764565476af 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -7521,6 +7521,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 	  && !(fsym && fsym->ts.type == BT_DERIVED && fsym->ts.u.derived
 	   && fsym->ts.u.derived->intmod_sym_id == ISOCBINDING_PTR
 	   && fsym->ts.u.derived->from_intmod == INTMOD_ISO_C_BINDING )
+	  && !(fsym && fsym->ts.type == BT_ASSUMED)
 	  && !(fsym && UNLIMITED_POLY (fsym)))
 	vec_safe_push (stringargs, parmse.string_length);

diff --git a/gcc/testsuite/gfortran.dg/assumed_type_18.f90 b/gcc/testsuite/gfortran.dg/assumed_type_18.f90
new file mode 100644
index 000..a3d791919a2
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/assumed_type_18.f90
@@ -0,0 +1,52 @@
+! { dg-do run }
+! PR fortran/110825 - TYPE(*) and character actual arguments
+
+program foo
+  use iso_c_binding, only: c_loc, c_ptr, c_associated
+  implicit none
+  character(100):: not_used = ""
+  character(:), allocatable :: deferred
+  character :: c42(6,7) = "*"
+  call sub  (not_used,  "123")
+  call sub  ("0"  , "123")
+  deferred = "d"
+  call sub  (deferred , "123")
+  call sub2 ([1.0,2.0], "123")
+  call sub2 (["1","2"], "123")
+  call sub3 (c42  , "123")
+
+contains
+
+  subroutine sub (useless_var, print_this)
+type(*),  intent(in) :: useless_var
+character(*), intent(in) :: print_this
+if (len  (print_this) /= 3) stop 1
+if (len_trim (print_this) /= 3) stop 2
+  end
+
+  subroutine sub2 (a, c)
+type(*),  intent(in) :: a(:)
+character(*), intent(in) :: c
+if (len  (c) /= 3) stop 10
+if (len_trim (c) /= 3) stop 11
+if (size (a) /= 2) stop 12
+  end
+
+  subroutine sub3 (a, c)
+type(*),  intent(in), target, optional :: a(..)
+character(*), intent(in)   :: c
+type(c_ptr) :: cpt
+if (len  (c) /= 3) stop 20
+if (len_trim (c) /= 3) stop 21
+if (.not. present (a)) stop 22
+if (rank (a) /= 2) stop 23
+if (size (a)/= 42) stop 24
+if (any (shape  (a) /= [6,7])) stop 25
+if (any (lbound (a) /= [1,1])) stop 26
+if (any (ubound (a) /= [6,7])) stop 27
+if (.not. is_contiguous (a))   stop 28
+cpt = c_loc (a)
+if (.not. c_associated (cpt))  stop 29
+  end
+
+end
--
2.35.3



[r14-2797 Regression] FAIL: 23_containers/vector/bool/110807.cc (test for excess errors) on Linux/x86_64

2023-07-27 Thread haochen.jiang via Gcc-patches
On Linux/x86_64,

7931a1de9ec87b996d51d3d60786f5c81f63919f is the first bad commit
commit 7931a1de9ec87b996d51d3d60786f5c81f63919f
Author: Jonathan Wakely 
Date:   Wed Jul 26 14:09:24 2023 +0100

libstdc++: Avoid bogus overflow warnings in std::vector [PR110807]

caused

FAIL: 23_containers/vector/bool/110807.cc (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-2797/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/vector/bool/110807.cc 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=23_containers/vector/bool/110807.cc 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command
line might help.)
(However, please make sure that there are no potential problems with AVX512.)


Re: [PATCH 5/5] testsuite part 2 for _BitInt support [PR102989]

2023-07-27 Thread Joseph Myers
I think there should be tests for _Atomic _BitInt types.  Hopefully atomic 
compound assignment just works via the logic for compare-and-exchange 
loops, but does e.g. atomic_fetch_add work with _Atomic _BitInt types?

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] PR rtl-optimization/110587: Reduce useless moves in compile-time hog.

2023-07-27 Thread Richard Biener via Gcc-patches



> Am 27.07.2023 um 19:12 schrieb Roger Sayle :
> 
> 
> Hi Richard,
> 
> You're 100% right.  It’s possible to significantly clean-up this code, 
> replacing
> the body of the conditional with a call to force_reg and simplifying the 
> conditions
> under which it is called.  These improvements are implemented in the patch
> below, which has been tested on x86_64-pc-linux-gnu, with a bootstrap and
> make -k check, both with and without -m32, as usual.
> 
> Interestingly, the CONCAT clause afterwards is still required (I've learned 
> something
> new),  as calling force_reg (or gen_reg_rtx) with HCmode, actually returns a 
> CONCAT
> instead of a REG,

Heh, interesting.

> so although the code looks dead, it's required to build libgcc during
> a bootstrap.  But the remaining clean-up is good, reducing the number of 
> source lines
> and making the logic easier to understand.
> 
> Ok for mainline?

Ok.

Thanks,
Richard 

> 2023-07-27  Roger Sayle  
>Richard Biener  
> 
> gcc/ChangeLog
>PR middle-end/28071
>PR rtl-optimization/110587
>* expr.cc (emit_group_load_1): Simplify logic for calling
>force_reg on ORIG_SRC, to avoid making a copy if the source
>is already in a pseudo register.
> 
> Roger
> --
> 
>> -Original Message-
>> From: Richard Biener 
>> Sent: 25 July 2023 12:50
>> 
>>> On Tue, Jul 25, 2023 at 1:31 PM Roger Sayle 
>>> wrote:
>>> 
>>> This patch is the third in a series of fixes for PR
>>> rtl-optimization/110587, a compile-time regression with -O0, that
>>> attempts to address the underlying cause.  As noted previously, the
>>> pathological test case pr28071.c contains a large number of useless
>>> register-to-register moves that can produce quadratic behaviour (in
>>> LRA).  These moves are generated during RTL expansion in
>>> emit_group_load_1, where the middle-end attempts to simplify the
>>> source before calling extract_bit_field.  This is reasonable if the
>>> source is a complex expression (from before the tree-ssa optimizers),
>>> or a SUBREG, or a hard register, but it's not particularly useful to
>>> copy a pseudo register into a new pseudo register.  This patch eliminates 
>>> that
>> redundancy.
>>> 
>>> The -fdump-tree-expand for pr28071.c compiled with -O0 currently
>>> contains 777K lines, with this patch it contains 717K lines, i.e.
>>> saving about 60K lines (admittedly of debugging text output, but it makes 
>>> the
>> point).
>>> 
>>> 
>>> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
>>> and make -k check, both with and without --target_board=unix{-m32}
>>> with no new failures.  Ok for mainline?
>>> 
>>> As always, I'm happy to revert this change quickly if there's a
>>> problem, and investigate why this additional copy might (still) be
>>> needed on other
>>> non-x86 targets.
>> 
>> @@ -2622,6 +2622,7 @@ emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_src,
>> tree type,
>> be loaded directly into the destination.  */
>>   src = orig_src;
>>   if (!MEM_P (orig_src)
>> + && (!REG_P (orig_src) || HARD_REGISTER_P (orig_src))
>>  && (!CONSTANT_P (orig_src)
>>  || (GET_MODE (orig_src) != mode
>>  && GET_MODE (orig_src) != VOIDmode)))
>> 
>> so that means the code guarded by the conditional could instead be 
>> transformed
>> to
>> 
>>   src = force_reg (mode, orig_src);
>> 
>> ?  Btw, the || (GET_MODE (orig_src) != mode && GET_MODE (orig_src) !=
>> VOIDmode) case looks odd as in that case we'd use GET_MODE (orig_src) for the
>> move ... that might also mean we have to use force_reg (GET_MODE (orig_src) 
>> ==
>> VOIDmode ? mode : GET_MODE (orig_src), orig_src))
>> 
>> Otherwise I think this is OK, as said, using force_reg somehow would improve
>> readability here I think.
>> 
>> I also wonder how the
>> 
>>  else if (GET_CODE (src) == CONCAT)
>> 
>> case will ever trigger with the current code.
>> 
>> Richard.
>> 
>>> 
>>> 2023-07-25  Roger Sayle  
>>> 
>>> gcc/ChangeLog
>>>PR middle-end/28071
>>>PR rtl-optimization/110587
>>>* expr.cc (emit_group_load_1): Avoid copying a pseudo register into
>>>a new pseudo register, i.e. only copy hard regs into a new pseudo.
>>> 
>>> 
> 
> 
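
The guard quoted from the first version of the patch can be read as a plain
boolean predicate. The sketch below models it with toy types; every name is
hypothetical and nothing here is GCC's real rtx API.

```c
#include <stdbool.h>

/* Toy classification of an operand; stand-ins for MEM_P, REG_P,
   HARD_REGISTER_P and CONSTANT_P.  */
enum toy_kind { TOY_MEM, TOY_PSEUDO_REG, TOY_HARD_REG, TOY_CONSTANT, TOY_OTHER };
enum toy_mode { TOY_VOIDmode, TOY_SImode, TOY_DImode };

struct toy_rtx
{
  enum toy_kind kind;
  enum toy_mode mode;
};

/* True when the source would be copied into a fresh pseudo first.  */
bool
toy_needs_copy (struct toy_rtx src, enum toy_mode mode)
{
  bool is_mem = src.kind == TOY_MEM;
  bool is_reg = src.kind == TOY_PSEUDO_REG || src.kind == TOY_HARD_REG;
  bool is_hard = src.kind == TOY_HARD_REG;
  bool is_const = src.kind == TOY_CONSTANT;

  return !is_mem
	 && (!is_reg || is_hard)	/* the newly added test */
	 && (!is_const
	     || (src.mode != mode && src.mode != TOY_VOIDmode));
}
```

In this model a pseudo register is the one register case that no longer
triggers a copy, while hard registers and non-trivial expressions still do.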


Re: [PATCH] Use substituted GDCFLAGS

2023-07-27 Thread Iain Buclaw via Gcc-patches
Excerpts from Andreas Schwab via Gcc-patches's message of Juli 24, 2023 11:15 
am:
> Ping?
> 

OK from me.

Thanks,
Iain.


[Bug other/110831] [14 regression] gcc.dg/stack-check-3.c ICEs after r14-2822-g499b8079a6419b

2023-07-27 Thread seurer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110831

--- Comment #1 from seurer at gcc dot gnu.org ---
Also this one:

FAIL: gcc.dg/strcmpopt_5.c (internal compiler error: in to_gcov_type, at
profile-count.h:831)
FAIL: gcc.dg/strcmpopt_5.c (test for excess errors)

Re: [PATCH] bpf: correct pseudo-C template for add3 and sub3

2023-07-27 Thread Jose E. Marchesi via Gcc-patches


> The pseudo-C output templates for these instructions were incorrectly
> using operand 1 rather than operand 2 on the RHS, which led to some
> very incorrect assembly generation with -masm=pseudoc.
>
> Tested on bpf-unknown-none.
> OK?

OK.  Thanks for spotting and fixing this!

>
> gcc/
>
>   * config/bpf/bpf.md (add3): Use %w2 instead of %w1
>   in pseudo-C dialect output template.
>   (sub3): Likewise.
>
> gcc/testsuite/
>
>   * gcc.target/bpf/alu-2.c: New test.
>   * gcc.target/bpf/alu-pseudoc-2.c: Likewise.
> ---
>  gcc/config/bpf/bpf.md|  4 ++--
>  gcc/testsuite/gcc.target/bpf/alu-2.c | 12 
>  gcc/testsuite/gcc.target/bpf/alu-pseudoc-2.c | 13 +
>  3 files changed, 27 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/bpf/alu-2.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/alu-pseudoc-2.c
>
> diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
> index 2ffc4ebd17e..66436397bb7 100644
> --- a/gcc/config/bpf/bpf.md
> +++ b/gcc/config/bpf/bpf.md
> @@ -131,7 +131,7 @@ (define_insn "add3"
>  (plus:AM (match_operand:AM 1 "register_operand"   " 0,0")
>   (match_operand:AM 2 "reg_or_imm_operand" " r,I")))]
>"1"
> -  "{add\t%0,%2|%w0 += %w1}"
> +  "{add\t%0,%2|%w0 += %w2}"
>[(set_attr "type" "")])
>  
>  ;;; Subtraction
> @@ -144,7 +144,7 @@ (define_insn "sub3"
>  (minus:AM (match_operand:AM 1 "register_operand" " 0")
>(match_operand:AM 2 "register_operand" " r")))]
>""
> -  "{sub\t%0,%2|%w0 -= %w1}"
> +  "{sub\t%0,%2|%w0 -= %w2}"
>[(set_attr "type" "")])
>  
>  ;;; Negation
> diff --git a/gcc/testsuite/gcc.target/bpf/alu-2.c 
> b/gcc/testsuite/gcc.target/bpf/alu-2.c
> new file mode 100644
> index 000..0444a9bc68a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/bpf/alu-2.c
> @@ -0,0 +1,12 @@
> +/* Check add and sub instructions.  */
> +/* { dg-do compile } */
> +/* { dg-options "" } */
> +
> +long foo (long x, long y)
> +{
> +  return y - x + 4;
> +}
> +
> +/* { dg-final { scan-assembler-not {sub\t(%r.),\1\n} } } */
> +/* { dg-final { scan-assembler {sub\t(\%r.),(\%r.)\n} } } */
> +/* { dg-final { scan-assembler {add\t(\%r.),4\n} } } */
> diff --git a/gcc/testsuite/gcc.target/bpf/alu-pseudoc-2.c 
> b/gcc/testsuite/gcc.target/bpf/alu-pseudoc-2.c
> new file mode 100644
> index 000..751db2477c0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/bpf/alu-pseudoc-2.c
> @@ -0,0 +1,13 @@
> +/* Check add and sub instructions (pseudoc asm dialect).  */
> +/* { dg-do compile } */
> +/* { dg-options "-masm=pseudoc" } */
> +
> +long foo (long x, long y)
> +{
> +  return y - x + 4;
> +}
> +
> +/* { dg-final { scan-assembler-not {\t(r.) -= \1\n} } } */
> +/* { dg-final { scan-assembler {\t(r.) -= (r.)\n} } } */
> +/* { dg-final { scan-assembler {\t(r.) \+= 4\n} } } */
> +
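
Why operand 1 on the right-hand side is wrong can be shown with a toy template
expander. In the quoted patterns operand 1 carries the constraint "0", i.e. it
is tied to the destination register, so printing it on the RHS repeats the
destination. The helper below is a hypothetical illustration, not part of the
BPF backend.

```c
#include <stdio.h>

/* Expand a toy "dst -= src" template, taking the RHS from
   OPERANDS[RHS_INDEX].  OPERANDS[1] is assumed to be the same register
   as OPERANDS[0], mirroring the "0" matching constraint.  */
int
toy_format_sub (char *buf, size_t n, const char *const operands[3],
		int rhs_index)
{
  return snprintf (buf, n, "%s -= %s", operands[0], operands[rhs_index]);
}
```

With operands {"r0", "r0", "r2"}, index 1 gives "r0 -= r0" and index 2 gives
"r0 -= r2"; the first form is what the new scan-assembler-not test rejects.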


Re: [PATCH 0/5] GCC _BitInt support [PR102989]

2023-07-27 Thread Joseph Myers
On Thu, 27 Jul 2023, Jakub Jelinek via Gcc-patches wrote:

> - _BitInt(N) bit-fields aren't supported yet (the patch rejects them); I'd 
> like
>   to enable those incrementally, but don't really see details on how such
>   bit-fields should be laid-out in memory nor passed inside of function
>   arguments; LLVM implements something, but it is a question if that is what
>   the various ABIs want

So if the x86-64 ABI (or any other _BitInt ABI that already exists) 
doesn't specify this adequately then an issue should be filed (at 
https://gitlab.com/x86-psABIs/x86-64-ABI/-/issues in the x86-64 case).

(Note that the language specifies that e.g. _BitInt(123):45 gets promoted 
to _BitInt(123) by the integer promotions, rather than left as a type with 
the bit-field width.)

> - conversions between large/huge (see later) _BitInt and _Decimal{32,64,128}
>   aren't supported and emit a sorry; I'm not familiar enough with DFP stuff
>   to implement that

Doing things incrementally might indicate first doing this only for BID 
(so sufficing for x86-64), with DPD support to be added when _BitInt 
support is added for an architecture using DPD, i.e. powerpc / s390.

This conversion is a mix of base conversion and things specific to DFP 
types.

For conversion *from DFP to _BitInt*, the DFP value needs to be 
interpreted (hopefully using existing libbid code) as the product of a 
sign, an integer and a power of 10, with appropriate truncation of the 
fractional part if there is one (and appropriate handling of infinity / 
NaN / values where the integer part obviously doesn't fit in the type as 
raising "invalid" and returning an arbitrary result).  Then it's just a 
matter of doing an integer multiplication and producing an appropriately 
signed result (which might itself overflow the range of representable 
values with the given sign, meaning "invalid" should be raised).  
Precomputed tables of powers of 10 in binary might speed up the 
multiplication process (don't know if various existing tables in libbid 
are usable for that).  It's unspecified whether "inexact" is raised for 
non-integer DFP values.

For conversion *from _BitInt to DFP*, the _BitInt value needs to be 
expressed in decimal.  In the absence of optimized multiplication / 
division for _BitInt, it seems reasonable enough to do this naively 
(repeatedly dividing by a power of 10 that fits in one limb to determine 
base 10^N digits from the least significant end, for example), modulo 
detecting obvious overflow cases up front (if the absolute value is at 
least 10^97, conversion to _Decimal32 definitely overflows in all rounding 
modes, for example, so you just need to do an overflowing computation that 
produces a result with the right sign in order to get the correct 
rounding-mode-dependent result and exceptions).  Probably it isn't 
necessary to convert most of those base 10^N digits into base 10 digits.  
Rather, it's enough to find the leading M (= precision of the DFP type in 
decimal digits) base 10 digits, plus to know whether what follows is 
exactly 0, exactly 0.5, between 0 and 0.5, or between 0.5 and 1.

Then adding two appropriate DFP values with the right sign produces the 
final DFP result.  Those DFP values would need to be produced from integer 
digits together with the relevant power of 10.  And there might be 
multiple possible choices for the DFP quantum exponent; the preferred 
exponent for exact results is 0, so the resulting exponent needs to be 
chosen to be as close to 0 as possible (which also produces correct 
results when the result is inexact).  (If the result is 0, note that 
quantum exponent of 0 is not the same as the zero from default 
initialization, which has the least exponent possible.)
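
The naive "divide repeatedly by a power of 10 that fits in one limb" step can
be sketched as follows. This is only an illustration under assumptions: it
uses 32-bit limbs with 10^9 as the chunk size, stores the magnitude
little-endian, and the function name is invented; a real implementation would
pick limb and chunk sizes to match the target.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_CHUNK 1000000000u	/* 10^9, fits in a 32-bit limb */

/* Divide the little-endian magnitude LIMBS[0..N) by 10^9 in place and
   return the remainder, i.e. the next base-10^9 digit starting from
   the least significant end.  */
uint32_t
toy_divmod_1e9 (uint32_t *limbs, size_t n)
{
  uint64_t rem = 0;
  for (size_t i = n; i-- > 0;)
    {
      /* rem < 10^9 < 2^30, so the shifted value fits in 64 bits.  */
      uint64_t cur = (rem << 32) | limbs[i];
      limbs[i] = (uint32_t) (cur / TOY_CHUNK);
      rem = cur % TOY_CHUNK;
    }
  return (uint32_t) rem;
}
```

Calling this until all limbs are zero yields the base-10^9 digits from least
to most significant; as noted above, only the leading digits and a summary of
what follows them are needed to form the DFP result.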

-- 
Joseph S. Myers
jos...@codesourcery.com


[Bug middle-end/110833] New: gamess regression on Ice Lake with -Ofast -march=native between g:1c6231c05bdccab3 (2023-07-21 03:06) and g:bbc1a102735c72e3 (2023-07-23 04:55)

2023-07-27 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110833

Bug ID: 110833
   Summary: gamess regression on Ice Lake with -Ofast
-march=native between g:1c6231c05bdccab3 (2023-07-21
03:06) and g:bbc1a102735c72e3 (2023-07-23 04:55)
   Product: gcc
   Version: 13.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=798.50.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=790.50.0

It may be interesting to know why it improved and now regressed again.

[Bug middle-end/110832] 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core

2023-07-27 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832

--- Comment #2 from Jan Hubicka  ---
I tested that the profile change makes no difference.

Make store likely in optimize_mask_stores

2023-07-27 Thread Jan Hubicka via Gcc-patches
Hi,
as discussed with Richard, we want store to be likely in
optimize_mask_stores.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

* tree-vect-loop.cc (optimize_mask_stores): Make store
likely.

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 2561552fe6e..a83952aff60 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -11741,7 +11741,7 @@ optimize_mask_stores (class loop *loop)
   e->flags = EDGE_TRUE_VALUE;
   efalse = make_edge (bb, store_bb, EDGE_FALSE_VALUE);
   /* Put STORE_BB to likely part.  */
-  efalse->probability = profile_probability::unlikely ();
+  efalse->probability = profile_probability::likely ();
   e->probability = efalse->probability.invert ();
   store_bb->count = efalse->count ();
   make_single_succ_edge (store_bb, join_bb, EDGE_FALLTHRU);


Fix profile update after RTL unrolling

2023-07-27 Thread Jan Hubicka via Gcc-patches
This patch fixes the profile update after RTL unrolling, which is now done the
same way as in the tree unroller.  We still produce a (slightly) corrupted
profile for multiple-exit loops, which I can try to fix incrementally.

I also updated testcases to look for profile mismatches so they do not creep
back in again.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

* cfgloop.h (single_dom_exit): Declare.
* cfgloopmanip.h (update_exit_probability_after_unrolling): Declare.
* cfgrtl.cc (struct cfg_hooks): Fix comment.
* loop-unroll.cc (unroll_loop_constant_iterations): Update exit edge.
* tree-ssa-loop-ivopts.h (single_dom_exit): Do not declare it here.
* tree-ssa-loop-manip.cc (update_exit_probability_after_unrolling):
Break out from ...
(tree_transform_and_unroll_loop): ... here;

gcc/testsuite/ChangeLog:

* gcc.dg/tree-prof/peel-1.c: Test for profile mismatches.
* gcc.dg/tree-prof/unroll-1.c: Test for profile mismatches.
* gcc.dg/tree-ssa/peel1.c: Test for profile mismatches.
* gcc.dg/unroll-1.c: Test for profile mismatches.
* gcc.dg/unroll-3.c: Test for profile mismatches.
* gcc.dg/unroll-4.c: Test for profile mismatches.
* gcc.dg/unroll-5.c: Test for profile mismatches.
* gcc.dg/unroll-6.c: Test for profile mismatches.

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 22293e1c237..c4622d4b853 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -921,6 +921,7 @@ extern bool get_estimated_loop_iterations (class loop *loop, widest_int *nit);
 extern bool get_max_loop_iterations (const class loop *loop, widest_int *nit);
 extern bool get_likely_max_loop_iterations (class loop *loop, widest_int *nit);
 extern int bb_loop_depth (const_basic_block);
+extern edge single_dom_exit (class loop *);
 
 /* Converts VAL to widest_int.  */
 
diff --git a/gcc/cfgloopmanip.h b/gcc/cfgloopmanip.h
index af6a29f70c4..dab7b31c1e7 100644
--- a/gcc/cfgloopmanip.h
+++ b/gcc/cfgloopmanip.h
@@ -68,5 +68,6 @@ class loop * loop_version (class loop *, void *,
 void adjust_loop_info_after_peeling (class loop *loop, int npeel, bool precise);
 void scale_dominated_blocks_in_loop (class loop *loop, basic_block bb,
 profile_count num, profile_count den);
+void update_exit_probability_after_unrolling (class loop *loop, edge new_exit);
 
 #endif /* GCC_CFGLOOPMANIP_H */
diff --git a/gcc/cfgrtl.cc b/gcc/cfgrtl.cc
index 36e43d0d737..abcb472e2a2 100644
--- a/gcc/cfgrtl.cc
+++ b/gcc/cfgrtl.cc
@@ -5409,7 +5409,7 @@ struct cfg_hooks cfg_layout_rtl_cfg_hooks = {
   rtl_flow_call_edges_add,
   NULL, /* execute_on_growing_pred */
   NULL, /* execute_on_shrinking_pred */
-  duplicate_loop_body_to_header_edge, /* duplicate loop for trees */
+  duplicate_loop_body_to_header_edge, /* duplicate loop for rtl */
   rtl_lv_add_condition_to_bb, /* lv_add_condition_to_bb */
   NULL, /* lv_adjust_loop_header_phi*/
   rtl_extract_cond_bb_edges, /* extract_cond_bb_edges */
diff --git a/gcc/loop-unroll.cc b/gcc/loop-unroll.cc
index 9d8ba11..bbfa6ccc770 100644
--- a/gcc/loop-unroll.cc
+++ b/gcc/loop-unroll.cc
@@ -487,6 +487,7 @@ unroll_loop_constant_iterations (class loop *loop)
   bool exit_at_end = loop_exit_at_end_p (loop);
   struct opt_info *opt_info = NULL;
   bool ok;
+  bool flat = maybe_flat_loop_profile (loop);
 
   niter = desc->niter;
 
@@ -603,9 +604,14 @@ unroll_loop_constant_iterations (class loop *loop)
   ok = duplicate_loop_body_to_header_edge (
 loop, loop_latch_edge (loop), max_unroll, wont_exit, desc->out_edge,
 &remove_edges,
-DLTHE_FLAG_UPDATE_FREQ | (opt_info ? DLTHE_RECORD_COPY_NUMBER : 0));
+DLTHE_FLAG_UPDATE_FREQ | (opt_info ? DLTHE_RECORD_COPY_NUMBER : 0)
+| (flat ? DLTHE_FLAG_FLAT_PROFILE : 0));
   gcc_assert (ok);
 
+  edge new_exit = single_dom_exit (loop);
+  if (new_exit)
+update_exit_probability_after_unrolling (loop, new_exit);
+
   if (opt_info)
 {
   apply_opt_in_copies (opt_info, max_unroll, true, true);
diff --git a/gcc/profile-count.h b/gcc/profile-count.h
index 88a6431c21a..e860c5db540 100644
--- a/gcc/profile-count.h
+++ b/gcc/profile-count.h
@@ -650,6 +650,9 @@ public:
   return *this;
 }
 
+  /* Compute n-th power.  */
+  profile_probability pow (int) const;
+
   /* Get the value of the count.  */
   uint32_t value () const { return m_val; }
 
diff --git a/gcc/testsuite/gcc.dg/tree-prof/peel-1.c b/gcc/testsuite/gcc.dg/tree-prof/peel-1.c
index 7245b68c1ee..32ecccb16da 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/peel-1.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/peel-1.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O3 -fdump-tree-cunroll-details -fno-unroll-loops -fpeel-loops" } */
+/* { dg-options "-O3 -fdump-tree-cunroll-details-blocks -fdump-tree-optimized-details-blocks -fno-unroll-loops -fpeel-loops" } */
 void abort();
 
 int a[1000];
@@ -21,3 +21,5 @@ main()
   return 0;
 }
 /* { dg-final-use { scan-tree-dump "Peeled loop ., 1 times" "cunroll" } 
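
The `pow` member declared in the profile-count.h hunk above computes the n-th
power of a probability.  As a rough illustration of why such a helper is handy
when unrolling: if a loop continues with probability p on each iteration, the
chance of staying in the loop through n consecutive copies of the body is p^n.
The sketch below uses plain `double` and hypothetical function names, not GCC's
fixed-point `profile_probability`, and is not taken from the patch.

```c
/* Raise probability P (0.0 .. 1.0) to the N-th power by repeated
   squaring.  Hypothetical helper; N is assumed to be non-negative.  */
double prob_pow (double p, int n)
{
  double result = 1.0;
  while (n > 0)
    {
      if (n & 1)
        result *= p;
      p *= p;
      n >>= 1;
    }
  return result;
}

/* Probability of leaving a loop somewhere within N unrolled copies of
   its body, given the per-iteration probability STAY of continuing.  */
double exit_prob_after_unroll (double stay, int n)
{
  return 1.0 - prob_pow (stay, n);
}
```

For example, with a 50% chance of continuing on each iteration, two copies of
the body leave the loop with probability 1 - 0.25 = 0.75.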

[Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1

2023-07-27 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587

Roger Sayle  changed:

   What|Removed |Added

   Assignee|roger at nextmovesoftware dot com  |unassigned at gcc dot gnu.org

--- Comment #16 from Roger Sayle  ---
My patch (in comment #15) is obsoleted by Richard Biener's much better
solution(s):
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625416.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625417.html

[Bug rtl-optimization/110701] [14 Regression] Wrong code at -O1/2/3/s on x86_64-linux-gnu

2023-07-27 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110701

--- Comment #7 from Roger Sayle  ---
Patch proposed here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625532.html

[PATCH] bpf: correct pseudo-C template for add3 and sub3

2023-07-27 Thread David Faust via Gcc-patches
The pseudo-C output templates for these instructions were incorrectly
using operand 1 rather than operand 2 on the RHS, which led to some
very incorrect assembly generation with -masm=pseudoc.

Tested on bpf-unknown-none.
OK?

gcc/

* config/bpf/bpf.md (add3): Use %w2 instead of %w1
in pseudo-C dialect output template.
(sub3): Likewise.

gcc/testsuite/

* gcc.target/bpf/alu-2.c: New test.
* gcc.target/bpf/alu-pseudoc-2.c: Likewise.
---
 gcc/config/bpf/bpf.md|  4 ++--
 gcc/testsuite/gcc.target/bpf/alu-2.c | 12 
 gcc/testsuite/gcc.target/bpf/alu-pseudoc-2.c | 13 +
 3 files changed, 27 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/bpf/alu-2.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/alu-pseudoc-2.c

diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
index 2ffc4ebd17e..66436397bb7 100644
--- a/gcc/config/bpf/bpf.md
+++ b/gcc/config/bpf/bpf.md
@@ -131,7 +131,7 @@ (define_insn "add3"
 (plus:AM (match_operand:AM 1 "register_operand"   " 0,0")
  (match_operand:AM 2 "reg_or_imm_operand" " r,I")))]
   "1"
-  "{add\t%0,%2|%w0 += %w1}"
+  "{add\t%0,%2|%w0 += %w2}"
   [(set_attr "type" "")])
 
 ;;; Subtraction
@@ -144,7 +144,7 @@ (define_insn "sub3"
 (minus:AM (match_operand:AM 1 "register_operand" " 0")
   (match_operand:AM 2 "register_operand" " r")))]
   ""
-  "{sub\t%0,%2|%w0 -= %w1}"
+  "{sub\t%0,%2|%w0 -= %w2}"
   [(set_attr "type" "")])
 
 ;;; Negation
diff --git a/gcc/testsuite/gcc.target/bpf/alu-2.c b/gcc/testsuite/gcc.target/bpf/alu-2.c
new file mode 100644
index 000..0444a9bc68a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/alu-2.c
@@ -0,0 +1,12 @@
+/* Check add and sub instructions.  */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+long foo (long x, long y)
+{
+  return y - x + 4;
+}
+
+/* { dg-final { scan-assembler-not {sub\t(%r.),\1\n} } } */
+/* { dg-final { scan-assembler {sub\t(\%r.),(\%r.)\n} } } */
+/* { dg-final { scan-assembler {add\t(\%r.),4\n} } } */
diff --git a/gcc/testsuite/gcc.target/bpf/alu-pseudoc-2.c b/gcc/testsuite/gcc.target/bpf/alu-pseudoc-2.c
new file mode 100644
index 000..751db2477c0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/alu-pseudoc-2.c
@@ -0,0 +1,13 @@
+/* Check add and sub instructions (pseudoc asm dialect).  */
+/* { dg-do compile } */
+/* { dg-options "-masm=pseudoc" } */
+
+long foo (long x, long y)
+{
+  return y - x + 4;
+}
+
+/* { dg-final { scan-assembler-not {\t(r.) -= \1\n} } } */
+/* { dg-final { scan-assembler {\t(r.) -= (r.)\n} } } */
+/* { dg-final { scan-assembler {\t(r.) \+= 4\n} } } */
+
-- 
2.40.1
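
The output templates in the hunks above use the `{first|second}` syntax for
multiple assembler dialects; the pseudo-C text is the second alternative, which
is why the `%w1` typo only showed up with -masm=pseudoc.  The sketch below is a
simplified, hypothetical model of that selection in plain C.  It ignores
operand substitution and escaping, and `select_dialect` is an illustrative
name, not the code GCC uses.

```c
#include <stddef.h>

/* Copy TMPL into OUT (capacity OUT_SIZE), keeping only alternative
   number DIALECT (0-based) of each "{a|b|...}" group.  Text outside
   braces is copied unchanged.  Returns the length written, excluding
   the terminating NUL.  Hypothetical helper for illustration only.  */
size_t select_dialect (const char *tmpl, int dialect,
                       char *out, size_t out_size)
{
  size_t len = 0;
  int in_group = 0;   /* Nonzero while inside "{...}".  */
  int alt = 0;        /* Index of the alternative being scanned.  */

  if (out_size == 0)
    return 0;

  for (const char *p = tmpl; *p != '\0'; ++p)
    {
      if (!in_group && *p == '{')
        {
          in_group = 1;
          alt = 0;
        }
      else if (in_group && *p == '|')
        ++alt;
      else if (in_group && *p == '}')
        in_group = 0;
      else if ((!in_group || alt == dialect) && len + 1 < out_size)
        out[len++] = *p;
    }
  out[len] = '\0';
  return len;
}
```

Passing the fixed add3 template with dialect 1 yields `%w0 += %w2`, while
dialect 0 yields the conventional `add\t%0,%2` form.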



[Bug tree-optimization/106293] [13/14 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022

2023-07-27 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

--- Comment #15 from Jan Hubicka  ---
   if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
   /* If result of comparsion is unknown, prefer EARLY_BB.
 Thus use !(...>=..) rather than (...<...)  */
-  && !(best_bb->count * 100 >= early_bb->count * threshold))
+  && !(best_bb->count * 100 > early_bb->count * threshold))
 return best_bb;

Comparing loop depths certainly seems odd.
If we want to test that best_bb and early_bb are in the same loop, we should
compare loop_father.  What is the benefit of comparing across loop nests?

Profile report here claims:
dump id   |static mismat|dynamic mismatch         |
          |in count     |in count                 |time                 |
lsplit    |    5      +5|  8151850567  +8151850567| 531506481006  +57.9%|
ldist     |    9      +4| 15345493501  +7193642934| 606848841056  +14.2%|
ifcvt     |   10      +1| 15487514871   +142021370| 689469797790  +13.6%|
vect      |   35     +25| 17558425961  +2070911090| 517375405715  -25.0%|
cunroll   |   42      +7| 16898736178   -659689783| 452445796198   -4.9%|
loopdone  |   33      -9|  2678017188 -14220718990| 330969127663        |
tracer    |   34      +1|  2678018710        +1522| 330613415364   +0.0%|
fre       |   33      -1|  2676980249     -1038461| 330465677073   -0.0%|
expand    |   28      -5|  2497468467   -179511782|--                   |

So it looks like loop splitting, loop distribution and the vectorizer disturb
the profile significantly.
(Ifcvt does so by design and the damage is undone later.)
Not sure if that is the real problem though.

[Bug c++/85944] Address of temporary at global scope not considered constexpr

2023-07-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85944

Andrew Pinski  changed:

   What|Removed |Added

 CC||gccbugbjorn at fahller dot se

--- Comment #12 from Andrew Pinski  ---
*** Bug 110828 has been marked as a duplicate of this bug. ***

[Bug c++/55004] [meta-bug] constexpr issues

2023-07-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55004
Bug 55004 depends on bug 110828, which changed state.

Bug 110828 Summary: union constexpr dtor not constexpr when used in member array
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110828

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

[Bug c++/110828] union constexpr dtor not constexpr when used in member array

2023-07-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110828

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #3 from Andrew Pinski  ---
(In reply to Patrick Palka from comment #1)
> Does it work if you move the static_assert into a function scope? If so then
> this is probably a dup of PR85944.

Yes it does work with:
```
void f1() {
static_assert(S{}.f());
}
```
So yes it is a dup.

*** This bug has been marked as a duplicate of bug 85944 ***

[Bug tree-optimization/110755] [13 Regression] Wrong optimization of fabs on ppc64el at -O1

2023-07-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110755

--- Comment #14 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:e684084a5fa9edaedb1a14e118b966a60e3449b9

commit r13-7615-ge684084a5fa9edaedb1a14e118b966a60e3449b9
Author: Jakub Jelinek 
Date:   Wed Jul 26 10:50:50 2023 +0200

range-op-float: Fix up -frounding-math frange_arithmetic +- handling
[PR110755]

IEEE754 says that x + (-x) and x - x result in +0 in all rounding modes
but rounding towards negative infinity, in which case the result is -0
for all finite x.  x + x and x - (-x) if it is zero retain sign of x.
Now, range_arithmetic implements the normal rounds to even rounding,
and as the addition or subtraction in those cases is exact, we don't do any
further rounding etc. and e.g. on the testcase below distilled from glibc
compute a range [+0, +INF], which is fine for -fno-rounding-math or
if we'd have a guarantee that those statements aren't executed with
rounding towards negative infinity.

I believe it is only +- which has this problematic behavior and I think
it is best to deal with it in frange_arithmetic; if we know -frounding-math
is on, it is x + (-x) or x - x and we are asked to round to negative
infinity (i.e. want low bound rather than high bound), change +0 result to
-0.

2023-07-26  Jakub Jelinek  

PR tree-optimization/110755
* range-op-float.cc (frange_arithmetic): Change +0 result to -0
for PLUS_EXPR or MINUS_EXPR if -frounding-math, inf is negative and
it is exact op1 + (-op1) or op1 - op1.

* gcc.dg/pr110755.c: New test.

(cherry picked from commit 21da32d995c8b574c929ec420cd3b0fcfe6fa4fe)

[Bug gcov-profile/110827] C++20 coroutines aren't being measured by gcov

2023-07-27 Thread mwd at md5i dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110827

--- Comment #4 from Michael Duggan  ---
I should be more explicit.  The `std::cout` line in the example is just a
placeholder for "does some work here," and this example is specifically the
simplest version of a coroutine I could come up with that would demonstrate the
problem.  When I initially encountered this problem, I was doing coverage
testing that included a coroutine that was over 70 lines long, included lots of
loops and branching, and exited and re-entered multiple times via `co_yield`. 
I wanted to know if my test programs properly covered all of the branches.  It
is not enough to know how many times the coroutine itself is called.

[Bug middle-end/110832] 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core

2023-07-27 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832

--- Comment #1 from Jan Hubicka  ---
This time it seems that there is only one profile change:

commit 645c67f80c6258c1f54ec567f604008adbdb8a04
Author: Jan Hubicka 
Date:   Wed Jul 26 08:59:23 2023 +0200

Fix profile_count::to_sreal_scale

gcc/ChangeLog:

* profile-count.cc (profile_count::to_sreal_scale): Value is not
known if we divide by zero.

Which should not be very important.

[Bug middle-end/110832] New: 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core

2023-07-27 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832

Bug ID: 110832
   Summary: 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76
(2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27
03:44) on zen3 and core
   Product: gcc
   Version: 13.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

Biggest regression is seen here
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=466.758.0
zen3
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=466.758.0

Curiously zen2 improves:
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=171.758.0

I can see an instruction count difference in perf stat:
 Performance counter stats for './a.out':

          10923.70 msec task-clock:u              #    1.000 CPUs utilized
                 0      context-switches:u        #    0.000 /sec
                 0      cpu-migrations:u          #    0.000 /sec
             15510      page-faults:u             #    1.420 K/sec
       59062937176      cycles:u                  #    5.407 GHz                      (83.33%)
          12607081      stalled-cycles-frontend:u #    0.02% frontend cycles idle     (83.34%)
         122404896      stalled-cycles-backend:u  #    0.21% backend cycles idle      (83.34%)
      112648123380      instructions:u            #    1.91  insn per cycle
                                                  #    0.00  stalled cycles per insn  (83.34%)
        9666338531      branches:u                #  884.896 M/sec                    (83.34%)
           2937216      branch-misses:u           #    0.03% of all branches          (83.31%)

      10.924108973 seconds time elapsed

      10.912056000 seconds user
       0.01200 seconds sys


 Performance counter stats for './b.out':

          11025.38 msec task-clock:u              #    1.000 CPUs utilized
                 0      context-switches:u        #    0.000 /sec
                 0      cpu-migrations:u          #    0.000 /sec
             14998      page-faults:u             #    1.360 K/sec
       59436352848      cycles:u                  #    5.391 GHz                      (83.31%)
           9217660      stalled-cycles-frontend:u #    0.02% frontend cycles idle     (83.32%)
         210162784      stalled-cycles-backend:u  #    0.35% backend cycles idle      (83.35%)
      131604240004      instructions:u            #    2.21  insn per cycle
                                                  #    0.00  stalled cycles per insn  (83.35%)
        9657712171      branches:u                #  875.953 M/sec                    (83.35%)
           3146487      branch-misses:u           #    0.03% of all branches          (83.33%)

      11.025701172 seconds time elapsed

      11.005646000 seconds user
       0.020002000 seconds sys

However, perf report does not show clear differences in the times of functions.
I

Re: [PATCH 4/5] testsuite part 1 for _BitInt support [PR102989]

2023-07-27 Thread Jakub Jelinek via Gcc-patches
On Thu, Jul 27, 2023 at 07:15:28PM +0200, Jakub Jelinek via Gcc-patches wrote:
> testcases, I've been using
> https://defuse.ca/big-number-calculator.htm
> tool, a randombitint tool I wrote (will post as a reply to this) plus
> LLVM trunk on godbolt and the WIP GCC support checking if both compilers
> agree on stuff (and in case of differences tried constant evaluation etc.).

So, the randombitint.c tool is attached.  When invoked like
./randombitint 174
it prints a pseudo-random 174-bit integer in decimal, when invoked as
./randombitint 575 mask
it prints the all-ones number for the 575-bit precision in decimal, and
./randombitint 275 0x432445aebe435646567547567647
prints the given hexadecimal number in decimal (all using gmp).

In the tests I've often used
__attribute__((noipa)) void
printme (const void *p, int n)
{
  __builtin_printf ("0x");
  /* Print the partial most significant byte first, masked to N bits.  */
  if ((n & 7) != 0)
    __builtin_printf ("%02x",
                      ((const unsigned char *) p)[n / 8] & ((1 << (n & 7)) - 1));
  for (int i = (n / 8) - 1; i >= 0; --i)
    __builtin_printf ("%02x", ((const unsigned char *) p)[i]);
  __builtin_printf ("\n");
}
function to print hexadecimal values (temporaries or finals) and then used
the third invocation of the tool to convert those to decimal.
For unsigned _BitInt I just called the above like
  printme (&x, 575);
where 575 was the N from unsigned _BitInt(N) whatever, or
  _BitInt(575) x = ...
  if (x < 0)
    {
      __builtin_printf ("-");
      x = -x;
    }
  printme (&x, 575);
to print it signed.

Jakub
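#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <gmp.h>

int
main (int argc, const char *argv[])
{
  /* The first argument is the precision in bits.  */
  if (argc < 2)
    return 1;
  int n = atoi (argv[1]);
  int m = (n + 7) / 8;
  char *p = __builtin_alloca (m * 2 + 1);
  const char *q;
  srandom (getpid ());
  /* Build M random bytes as a hexadecimal string, most significant first.  */
  for (int i = 0; i < m; ++i)
    {
      unsigned char v = random ();
      if (argc >= 3 && strcmp (argv[2], "mask") == 0)
        v = 0xff;
      /* Clear the bits above the requested precision in the top byte.  */
      if (i == 0 && (n & 7) != 0)
        v &= (1 << (n & 7)) - 1;
      sprintf (&p[2 * i], "%02x", v);
    }
  p[m * 2] = '\0';
  mpz_t a;
  /* gmp expects the variable to be initialized before it is read into.  */
  mpz_init (a);
  if (argc >= 3 && strcmp (argv[2], "mask") != 0)
    {
      /* A user supplied hexadecimal number, with an optional 0x prefix.  */
      q = argv[2];
      if (q[0] == '0' && q[1] == 'x')
        q += 2;
    }
  else
    q = p;
  gmp_sscanf (q, "%Zx", a);
  gmp_printf ("0x%s\n%Zd\n", q, a);
  return 0;
}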


[PATCH 3/5] C _BitInt support [PR102989]

2023-07-27 Thread Jakub Jelinek via Gcc-patches
Hi!

This patch adds the C FE support, c-family support, small libcpp change
so that 123wb and 42uwb suffixes are handled plus glimits.h change
to define BITINT_MAXWIDTH macro.

The previous two patches really do nothing without this, which enables
all the support.

2023-07-27  Jakub Jelinek  

PR c/102989
gcc/
* glimits.h (BITINT_MAXWIDTH): Define if __BITINT_MAXWIDTH__ is
predefined.
gcc/c-family/
* c-common.cc (c_common_reswords): Add _BitInt as keyword.
(c_common_signed_or_unsigned_type): Handle BITINT_TYPE.
(check_builtin_function_arguments): Handle BITINT_TYPE like
INTEGER_TYPE.
(keyword_begins_type_specifier): Handle RID_BITINT.
* c-common.h (enum rid): Add RID_BITINT enumerator.
* c-cppbuiltin.cc (c_cpp_builtins): For C call
targetm.c.bitint_type_info and predefine __BITINT_MAXWIDTH__
and for -fbuilding-libgcc also __LIBGCC_BITINT_LIMB_WIDTH__ and
__LIBGCC_BITINT_ORDER__ macros if _BitInt is supported.
* c-lex.cc (interpret_integer): Handle CPP_N_BITINT.
* c-pretty-print.cc (c_pretty_printer::simple_type_specifier,
c_pretty_printer::direct_abstract_declarator): Handle BITINT_TYPE.
(pp_c_integer_constant): Handle printing of large precision wide_ints
which would buffer overflow digit_buffer.
* c-ubsan.cc (ubsan_instrument_shift): Use UBSAN_PRINT_FORCE_INT
for type0 type descriptor.
gcc/c/
* c-convert.cc (c_convert): Handle BITINT_TYPE like INTEGER_TYPE.
* c-decl.cc (declspecs_add_type): Formatting fixes.  Handle
cts_bitint.  Adjust for added union in *specs.  Handle RID_BITINT.
(finish_declspecs): Handle cts_bitint.  Adjust for added union in
*specs.
* c-parser.cc (c_keyword_starts_typename, c_token_starts_declspecs,
c_parser_declspecs, c_parser_gnu_attribute_any_word): Handle
RID_BITINT.
* c-tree.h (enum c_typespec_keyword): Mention _BitInt in comment.
Add cts_bitint enumerator.
(struct c_declspecs): Move int_n_idx and floatn_nx_idx into a union
and add bitint_prec there as well.
* c-typeck.cc (composite_type, c_common_type, comptypes_internal):
Handle BITINT_TYPE.
(build_array_ref, build_unary_op, build_conditional_expr,
convert_for_assignment, digest_init, build_binary_op): Likewise.
libcpp/
* expr.cc (interpret_int_suffix): Handle wb and WB suffixes.
* include/cpplib.h (CPP_N_BITINT): Define.

--- gcc/glimits.h.jj2023-01-03 00:20:35.071086812 +0100
+++ gcc/glimits.h   2023-07-27 15:03:24.238234396 +0200
@@ -157,6 +157,11 @@ see the files COPYING3 and COPYING.RUNTI
 # undef BOOL_WIDTH
 # define BOOL_WIDTH 1
 
+# ifdef __BITINT_MAXWIDTH__
+#  undef BITINT_MAXWIDTH
+#  define BITINT_MAXWIDTH __BITINT_MAXWIDTH__
+# endif
+
 # define __STDC_VERSION_LIMITS_H__ 202311L
 #endif
 
--- gcc/c-family/c-common.cc.jj 2023-07-24 17:48:26.436041278 +0200
+++ gcc/c-family/c-common.cc2023-07-27 15:03:24.276233865 +0200
@@ -349,6 +349,7 @@ const struct c_common_resword c_common_r
   { "_Alignas",RID_ALIGNAS,   D_CONLY },
   { "_Alignof",RID_ALIGNOF,   D_CONLY },
   { "_Atomic", RID_ATOMIC,D_CONLY },
+  { "_BitInt", RID_BITINT,D_CONLY },
   { "_Bool",   RID_BOOL,  D_CONLY },
   { "_Complex",RID_COMPLEX,0 },
   { "_Imaginary",  RID_IMAGINARY, D_CONLY },
@@ -2728,6 +2729,9 @@ c_common_signed_or_unsigned_type (int un
   || TYPE_UNSIGNED (type) == unsignedp)
 return type;
 
+  if (TREE_CODE (type) == BITINT_TYPE)
+return build_bitint_type (TYPE_PRECISION (type), unsignedp);
+
 #define TYPE_OK(node)  \
   (TYPE_MODE (type) == TYPE_MODE (node)                \
&& TYPE_PRECISION (type) == TYPE_PRECISION (node))
@@ -6341,8 +6345,10 @@ check_builtin_function_arguments (locati
  code0 = TREE_CODE (TREE_TYPE (args[0]));
  code1 = TREE_CODE (TREE_TYPE (args[1]));
  if (!((code0 == REAL_TYPE && code1 == REAL_TYPE)
-   || (code0 == REAL_TYPE && code1 == INTEGER_TYPE)
-   || (code0 == INTEGER_TYPE && code1 == REAL_TYPE)))
+   || (code0 == REAL_TYPE
+   && (code1 == INTEGER_TYPE || code1 == BITINT_TYPE))
+   || ((code0 == INTEGER_TYPE || code0 == BITINT_TYPE)
+   && code1 == REAL_TYPE)))
{
  error_at (loc, "non-floating-point arguments in call to "
"function %qE", fndecl);
@@ -8402,6 +8408,7 @@ keyword_begins_type_specifier (enum rid
 case RID_FRACT:
 case RID_ACCUM:
 case RID_BOOL:
+case RID_BITINT:
 case RID_WCHAR:
 case RID_CHAR8:
 case RID_CHAR16:
--- gcc/c-family/c-common.h.jj  2023-06-26 09:27:04.276367532 +0200
+++ gcc/c-family/c-common.h 

[PATCH 2/5] libgcc _BitInt support [PR102989]

2023-07-27 Thread Jakub Jelinek via Gcc-patches
Hi!

This patch adds the library helpers for multiplication, division + modulo
and casts from and to floating point.
As described in the intro, the first step is to try to further reduce the
passed-in precision by skipping over most significant limbs that contain just
zeros or sign bit copies.  For multiplication and division I've implemented
a simple algorithm; using something smarter like Karatsuba or Toom N-Way
might be faster for very large _BitInts (which we don't support right now
anyway), but could mean more code in libgcc, which maybe isn't what people
are willing to accept.
For the to/from floating point conversions the patch uses soft-fp, because
it already has tons of handy macros which can be used for that.  In theory
it could be implemented using {,unsigned} long long or {,unsigned} __int128
to/from floating point conversions with some frexp before/after, but at that
point we already need to force it into integer registers and analyze it
anyway.  Plus, for 32-bit arches there is no __int128 that could be used
for XF/TF mode stuff.
I know that soft-fp is owned by glibc and I think the op-common.h change
should be propagated there, but the bitint stuff is really GCC specific
and IMHO doesn't belong into the glibc copy.
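
The two ideas in the description above, trimming redundant most significant
limbs and then running a simple schoolbook algorithm, can be sketched in a few
lines of C.  This is an illustration under assumptions, not the libgcc code:
limbs are 32-bit and stored least significant first, values are unsigned only
(so only all-zero top limbs are skipped, not sign bit copies), and the names
`trim_limbs` and `mul_limbs` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the number of limbs of X (N limbs, least significant first)
   that remain after dropping most significant limbs that are zero.
   At least one limb is always kept.  */
size_t trim_limbs (const uint32_t *x, size_t n)
{
  while (n > 1 && x[n - 1] == 0)
    --n;
  return n;
}

/* Schoolbook multiplication: RES (XN + YN limbs) = X * Y.
   RES must not overlap X or Y.  */
void mul_limbs (uint32_t *res, const uint32_t *x, size_t xn,
                const uint32_t *y, size_t yn)
{
  for (size_t i = 0; i < xn + yn; ++i)
    res[i] = 0;

  for (size_t i = 0; i < xn; ++i)
    {
      uint64_t carry = 0;
      for (size_t j = 0; j < yn; ++j)
        {
          /* A 32x32->64 product plus the partial result and the carry
             cannot overflow 64 bits.  */
          uint64_t t = (uint64_t) x[i] * y[j] + res[i + j] + carry;
          res[i + j] = (uint32_t) t;
          carry = t >> 32;
        }
      res[i + yn] = (uint32_t) carry;
    }
}
```

Trimming first keeps the quadratic loop short: a 3-limb operand whose top limb
is zero is multiplied as a 2-limb value.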

2023-07-27  Jakub Jelinek  

PR c/102989
libgcc/
* config/aarch64/t-softfp (softfp_extras): Use += rather than :=.
* config/i386/64/t-softfp (softfp_extras): Likewise.
* config/i386/libgcc-glibc.ver (GCC_14.0.0): Export _BitInt support
routines.
* config/i386/t-softfp (softfp_extras): Add fixxfbitint and
bf, hf and xf mode floatbitint.
(CFLAGS-floatbitintbf.c, CFLAGS-floatbitinthf.c): Add -msse2.
* config/riscv/t-softfp32 (softfp_extras): Use += rather than :=.
* config/rs6000/t-e500v1-fp (softfp_extras): Likewise.
* config/rs6000/t-e500v2-fp (softfp_extras): Likewise.
* config/t-softfp (softfp_floatbitint_funcs): New.
(softfp_func_list): Add sf and df mode from and to _BitInt libcalls.
* config/t-softfp-sfdftf (softfp_extras): Add fixtfbitint and
floatbitinttf.
* config/t-softfp-tf (softfp_extras): Likewise.
* libgcc2.c (bitint_reduce_prec): New inline function.
(BITINT_INC, BITINT_END): Define.
(bitint_mul_1, bitint_addmul_1): New helper functions.
(__mulbitint3): New function.
(bitint_negate, bitint_submul_1): New helper functions.
(__divmodbitint4): New function.
* libgcc2.h (LIBGCC2_UNITS_PER_WORD): When building _BitInt support
libcalls, redefine depending on __LIBGCC_BITINT_LIMB_WIDTH__.
(__mulbitint3, __divmodbitint4): Declare.
* libgcc-std.ver.in (GCC_14.0.0): Export _BitInt support routines.
* Makefile.in (lib2funcs): Add _mulbitint3.
(LIB2_DIVMOD_FUNCS): Add _divmodbitint4.
* soft-fp/bitint.h: New file.
* soft-fp/fixdfbitint.c: New file.
* soft-fp/fixsfbitint.c: New file.
* soft-fp/fixtfbitint.c: New file.
* soft-fp/fixxfbitint.c: New file.
* soft-fp/floatbitintbf.c: New file.
* soft-fp/floatbitintdf.c: New file.
* soft-fp/floatbitinthf.c: New file.
* soft-fp/floatbitintsf.c: New file.
* soft-fp/floatbitinttf.c: New file.
* soft-fp/floatbitintxf.c: New file.
* soft-fp/op-common.h (_FP_FROM_INT): Add support for rsize up to
4 * _FP_W_TYPE_SIZE rather than just 2 * _FP_W_TYPE_SIZE.

--- libgcc/config/aarch64/t-softfp.jj   2023-03-13 00:11:52.330213322 +0100
+++ libgcc/config/aarch64/t-softfp  2023-07-14 12:38:30.764869473 +0200
@@ -3,7 +3,7 @@ softfp_int_modes := si di ti
 softfp_extensions := sftf dftf hftf bfsf
 softfp_truncations := tfsf tfdf tfhf tfbf dfbf sfbf hfbf
 softfp_exclude_libgcc2 := n
-softfp_extras := fixhfti fixunshfti floattihf floatuntihf \
+softfp_extras += fixhfti fixunshfti floattihf floatuntihf \
 floatdibf floatundibf floattibf floatuntibf
 
 TARGET_LIBGCC2_CFLAGS += -Wno-missing-prototypes
--- libgcc/config/i386/64/t-softfp.jj   2023-03-10 20:39:43.849687830 +0100
+++ libgcc/config/i386/64/t-softfp  2023-07-14 12:37:55.422344930 +0200
@@ -1,4 +1,4 @@
-softfp_extras := fixhfti fixunshfti floattihf floatuntihf \
+softfp_extras += fixhfti fixunshfti floattihf floatuntihf \
 floattibf floatuntibf
 
 CFLAGS-fixhfti.c += -msse2
--- libgcc/config/i386/libgcc-glibc.ver.jj  2023-07-11 13:39:49.760107863 +0200
+++ libgcc/config/i386/libgcc-glibc.ver 2023-07-17 09:45:43.128281615 +0200
@@ -226,3 +226,13 @@ GCC_13.0.0 {
   __truncxfbf2
   __trunchfbf2
 }
+
+%inherit GCC_14.0.0 GCC_13.0.0
+GCC_14.0.0 {
+  __PFX__fixxfbitint
+  __PFX__fixtfbitint
+  __PFX__floatbitintbf
+  __PFX__floatbitinthf
+  __PFX__floatbitintxf
+  __PFX__floatbitinttf
+}
--- libgcc/config/i386/t-softfp.jj  2022-10-14 09:35:56.268989311 +0200
+++ libgcc/config/i386/t-softfp 2023-07-17 09:38:43.575980078 +0200
@@ -10,7 +10,7 @@ 

RE: [PATCH] PR rtl-optimization/110587: Reduce useless moves in compile-time hog.

2023-07-27 Thread Roger Sayle

Hi Richard,

You're 100% right.  It's possible to significantly clean up this code, replacing
the body of the conditional with a call to force_reg and simplifying the
conditions under which it is called.  These improvements are implemented in the
patch below, which has been tested on x86_64-pc-linux-gnu, with a bootstrap and
make -k check, both with and without -m32, as usual.

Interestingly, the CONCAT clause afterwards is still required (I've learned
something new), as calling force_reg (or gen_reg_rtx) with HCmode actually
returns a CONCAT instead of a REG, so although the code looks dead, it's
required to build libgcc during a bootstrap.  But the remaining clean-up is
good, reducing the number of source lines and making the logic easier to
understand.

Ok for mainline?

2023-07-27  Roger Sayle  
Richard Biener  

gcc/ChangeLog
PR middle-end/28071
PR rtl-optimization/110587
* expr.cc (emit_group_load_1): Simplify logic for calling
force_reg on ORIG_SRC, to avoid making a copy if the source
is already in a pseudo register.

Roger
--

> -Original Message-
> From: Richard Biener 
> Sent: 25 July 2023 12:50
> 
> On Tue, Jul 25, 2023 at 1:31 PM Roger Sayle 
> wrote:
> >
> > This patch is the third in a series of fixes for PR
> > rtl-optimization/110587, a compile-time regression with -O0, that
> > attempts to address the underlying cause.  As noted previously, the
> > pathological test case pr28071.c contains a large number of useless
> > register-to-register moves that can produce quadratic behaviour (in
> > LRA).  These moves are generated during RTL expansion in
> > emit_group_load_1, where the middle-end attempts to simplify the
> > source before calling extract_bit_field.  This is reasonable if the
> > source is a complex expression (from before the tree-ssa optimizers),
> > or a SUBREG, or a hard register, but it's not particularly useful to
> > copy a pseudo register into a new pseudo register.  This patch
> > eliminates that redundancy.
> >
> > The -fdump-tree-expand for pr28071.c compiled with -O0 currently
> > contains 777K lines, with this patch it contains 717K lines, i.e.
> > saving about 60K lines (admittedly of debugging text output, but it
> > makes the point).
> >
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> >
> > As always, I'm happy to revert this change quickly if there's a
> > problem, and investigate why this additional copy might (still) be
> > needed on other non-x86 targets.
> 
> @@ -2622,6 +2622,7 @@ emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_src, tree type,
>  be loaded directly into the destination.  */
>src = orig_src;
>if (!MEM_P (orig_src)
> + && (!REG_P (orig_src) || HARD_REGISTER_P (orig_src))
>   && (!CONSTANT_P (orig_src)
>   || (GET_MODE (orig_src) != mode
>   && GET_MODE (orig_src) != VOIDmode)))
> 
> so that means the code guarded by the conditional could instead be
> transformed to
> 
>src = force_reg (mode, orig_src);
> 
> ?  Btw, the || (GET_MODE (orig_src) != mode && GET_MODE (orig_src) !=
> VOIDmode) case looks odd as in that case we'd use GET_MODE (orig_src) for the
> move ... that might also mean we have to use force_reg (GET_MODE (orig_src) ==
> VOIDmode ? mode : GET_MODE (orig_src), orig_src))
> 
> Otherwise I think this is OK, as said, using force_reg somehow would improve
> readability here I think.
> 
> I also wonder how the
> 
>   else if (GET_CODE (src) == CONCAT)
> 
> case will ever trigger with the current code.
> 
> Richard.
> 
> >
> > 2023-07-25  Roger Sayle  
> >
> > gcc/ChangeLog
> > PR middle-end/28071
> > PR rtl-optimization/110587
> > * expr.cc (emit_group_load_1): Avoid copying a pseudo register into
> > a new pseudo register, i.e. only copy hard regs into a new pseudo.
> >
> >

diff --git a/gcc/expr.cc b/gcc/expr.cc
index fff09dc..174f8ac 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -2622,16 +2622,11 @@ emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_src, tree type,
 be loaded directly into the destination.  */
   src = orig_src;
   if (!MEM_P (orig_src)
- && (!CONSTANT_P (orig_src)
- || (GET_MODE (orig_src) != mode
- && GET_MODE (orig_src) != VOIDmode)))
+ && (!REG_P (orig_src) || HARD_REGISTER_P (orig_src))
+ && !CONSTANT_P (orig_src))
{
- if (GET_MODE (orig_src) == VOIDmode)
-   src = gen_reg_rtx (mode);
- else
-   src = gen_reg_rtx (GET_MODE (orig_src));
-
- emit_move_insn (src, orig_src);
+ gcc_assert (GET_MODE (orig_src) != VOIDmode);
+ src = force_reg (GET_MODE (orig_src), orig_src);
}
 
   /* Optimize the access just a bit.  */
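
As an illustration only, the revised guard can be modelled outside GCC as a
predicate over operand kinds.  The OperandKind enum and needs_pseudo_copy
below are hypothetical stand-ins, not GCC APIs; the real code tests rtx
objects with MEM_P, REG_P, HARD_REGISTER_P and CONSTANT_P.

```cpp
// Hypothetical stand-ins for the rtx classes the patch distinguishes.
enum class OperandKind { Memory, PseudoReg, HardReg, Constant, Other };

// Mirrors the shape of the patched condition:
//   !MEM_P (x) && (!REG_P (x) || HARD_REGISTER_P (x)) && !CONSTANT_P (x)
constexpr bool needs_pseudo_copy(OperandKind k) {
  return k != OperandKind::Memory
         && (!(k == OperandKind::PseudoReg || k == OperandKind::HardReg)
             || k == OperandKind::HardReg)
         && k != OperandKind::Constant;
}

// A pseudo is used as-is; a hard register is still copied into a new pseudo.
static_assert(!needs_pseudo_copy(OperandKind::PseudoReg), "pseudo: no copy");
static_assert(needs_pseudo_copy(OperandKind::HardReg), "hard reg: copy");
static_assert(!needs_pseudo_copy(OperandKind::Memory), "memory: no copy");
static_assert(!needs_pseudo_copy(OperandKind::Constant), "constant: no copy");
```

Under this model the only behavioural change from the patch is the first
assertion: a pseudo register no longer gets copied into another pseudo.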


[PATCH 0/5] GCC _BitInt support [PR102989]

2023-07-27 Thread Jakub Jelinek via Gcc-patches

The following patch series introduces support for C23 bit-precise integer
types.  In short, they are similar to other integral types in many ways,
but aren't subject to integral promotions if smaller than int, and they can
have much wider precisions than ordinary integer types.

It is enabled only on targets which have agreed on a processor-specific
ABI for how to lay those out and pass them as function arguments/return
values, which currently is just x86-64, I believe.  It would be nice if target
maintainers helped to get agreement on psABI changes, so that GCC 14 could
enable it on far more architectures than just one.

C23 says that <limits.h> defines the BITINT_MAXWIDTH macro, which is the
largest supported precision of the _BitInt types; its smallest permitted value
is the precision of unsigned long long (but due to lack of psABI agreement
we'll violate that on architectures which don't have the support done yet).
For the time being, the following series uses just WIDE_INT_MAX_PRECISION as
that BITINT_MAXWIDTH, with the intent to increase it incrementally later
on.  WIDE_INT_MAX_PRECISION is 575 bits on x86_64, but will be even smaller
on lots of architectures.  This is the largest precision we can support
without changes of wide_int/widest_int representation (to make those non-POD
and allow use of some allocated buffer rather than the included fixed size
one).  Once that would be overcome, there is another internal enforced limit,
INTEGER_CST in current layout allows at most 255 64-bit limbs, which is
16320 bits as another cap.  And if that is overcome, then we have limitation
of TYPE_PRECISION being 16-bit, so 65535 as maximum precision.  Perhaps
we could make TYPE_PRECISION dependent on BITINT_TYPE vs. others and use
32-bit precision in that case later.  Latest Clang/LLVM I think supports
on paper up to 8388608 bits, but is hardly usable even with much shorter
precisions.
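
The arithmetic behind those caps can be checked with a few compile-time
assertions.  This is only a sketch: the constant names and the limbs_needed
helper are made up for illustration, and the 575-bit figure is simply the
x86_64 value quoted above.

```cpp
#include <cstdint>

// Number of 64-bit limbs needed to hold `bits` bits, rounding up.
// (Hypothetical helper, not a GCC function.)
constexpr std::uint64_t limbs_needed(std::uint64_t bits) {
  return (bits + 63) / 64;
}

// Caps quoted in the cover letter (names are illustrative only).
constexpr std::uint64_t kWideIntMaxPrecisionX86_64 = 575;
constexpr std::uint64_t kIntegerCstMaxLimbs = 255;
constexpr unsigned kTypePrecisionFieldBits = 16;

// 255 limbs of 64 bits each give the 16320-bit INTEGER_CST cap.
static_assert(kIntegerCstMaxLimbs * 64 == 16320, "INTEGER_CST cap");
// A 16-bit TYPE_PRECISION field tops out at 65535.
static_assert((std::uint64_t{1} << kTypePrecisionFieldBits) - 1 == 65535,
              "TYPE_PRECISION cap");
// A 575-bit value occupies 9 limbs, the last one only partially used.
static_assert(limbs_needed(kWideIntMaxPrecisionX86_64) == 9, "575 bits");
```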

Besides this hopefully temporary cap on supported precision and support
only on targets which buy into it, the support has the following limitations:

- _BitInt(N) bit-fields aren't supported yet (the patch rejects them); I'd like
  to enable those incrementally, but don't really see details on how such
  bit-fields should be laid-out in memory nor passed inside of function
  arguments; LLVM implements something, but it is a question if that is what
  the various ABIs want

- conversions between large/huge (see later) _BitInt and _Decimal{32,64,128}
  aren't supported and emit a sorry; I'm not familiar enough with DFP stuff
  to implement that

- _Complex _BitInt(N) isn't supported; again mainly because none of the psABIs
  mention how those should be passed/returned; in a limited way they are
  supported internally because the internal functions into which
  __builtin_{add,sub,mul}_overflow{,_p} is lowered return COMPLEX_TYPE as a
  hack to return 2 values without using references/pointers

- vectors of _BitInt(N) aren't supported, both because psABIs don't specify
  how that works and because I'm not really sure it would be useful given
  lack of hw support for anything but bit-precise integers with the same
  bit precision as standard integer types

Because the bit-precise types have different behavior both in the C FE
(e.g. the lack of promotion) and do or can have different behavior in type
layout and function argument passing/returning values, the patch introduces
a new integral type, BITINT_TYPE, so various spots which explicitly check
for INTEGER_TYPE and not say INTEGRAL_TYPE_P macro need to be adjusted.
Also the assumption that all integral types have scalar integer type mode
is no longer true, larger BITINT_TYPEs have BLKmode type.

The patch makes 4 different categories of _BitInt depending on the target hook
decisions and their precision.  The x86-64 psABI says that _BitInt which fit
into signed/unsigned char, short, int, long and long long are laid out and
passed as those types (with padding bits undefined if they don't have mode
precision).  Such smallest precision bit-precise integer types are categorized
as small, the target hook gives for specific precision a scalar integral mode
where a single such mode contains all the bits.  Such small _BitInt types are
generally kept in the IL until expansion into RTL, with minor tweaks during
expansion to avoid relying on the padding bit values.  All larger precision
_BitInt types are supposed to be handled as structure containing an array
of limbs or so, where a limb has some integral mode (for libgcc purposes
best if it has word-size) and the limbs have either little or big endian
ordering in the array.  The padding bits in the most significant limb if any
are either undefined or should be always sign/zero extended (but support for
this isn't in yet, we don't know if any psABI will require it).  As mentioned in
some psABI proposals, while currently there is just one limb mode, if the limb
ordering would follow normal target endianness, there is always a possibility
to have two limb 

[Bug other/110831] New: [14 regression] gcc.dg/stack-check-3.c ICEs after r14-2822-g499b8079a6419b

2023-07-27 Thread seurer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110831

Bug ID: 110831
   Summary: [14 regression] gcc.dg/stack-check-3.c ICEs after
r14-2822-g499b8079a6419b
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: seurer at gcc dot gnu.org
  Target Milestone: ---

g:499b8079a6419bb8082de062ec30772296c6700c, r14-2822-g499b8079a6419b
make  -k check-gcc RUNTESTFLAGS="dg.exp=gcc.dg/stack-check-3.c"
FAIL: gcc.dg/stack-check-3.c (internal compiler error: in to_gcov_type, at
profile-count.h:831)
FAIL: gcc.dg/stack-check-3.c (test for excess errors)
FAIL: gcc.dg/stack-check-3.c scan-rtl-dump-times expand "allocation and probing
residuals" 7
FAIL: gcc.dg/stack-check-3.c scan-rtl-dump-times expand "allocation and probing
in loop" 4
FAIL: gcc.dg/stack-check-3.c scan-rtl-dump-times expand "allocation and probing
in rotated loop" 1
FAIL: gcc.dg/stack-check-3.c scan-rtl-dump-times expand "allocation and probing
inline" 1
FAIL: gcc.dg/stack-check-3.c scan-rtl-dump-times expand "skipped dynamic
allocation and probing loop" 1
# of unexpected failures        7


spawn -ignore SIGHUP /home/seurer/gcc/git/build/gcc-test/gcc/xgcc
-B/home/seurer/gcc/git/build/gcc-test/gcc/ exceptions_enabled3672263.cc
-fdiagnostics-plain-output -Wno-complain-wrong-lang -S -o
exceptions_enabled3672263.s
FAIL: gcc.dg/stack-check-3.c (test for excess errors)
Excess errors:
during RTL pass: expand
dump file: stack-check-3.c.258r.expand
/home/seurer/gcc/git/gcc-test/gcc/testsuite/gcc.dg/stack-check-3.c:25:1:
internal compiler error: in to_gcov_type, at profile-count.h:831
0x10b3e7b7 profile_count::to_gcov_type() const
/home/seurer/gcc/git/gcc-test/gcc/profile-count.h:831
0x10b3e7b7 dump_prediction
/home/seurer/gcc/git/gcc-test/gcc/predict.cc:797
0x10b495bf combine_predictions_for_insn
/home/seurer/gcc/git/gcc-test/gcc/predict.cc:1039
0x10b495bf guess_outgoing_edge_probabilities(basic_block_def*)
/home/seurer/gcc/git/gcc-test/gcc/predict.cc:2356
0x11a4bec7 compute_outgoing_frequencies
/home/seurer/gcc/git/gcc-test/gcc/cfgbuild.cc:692
0x11a4bec7 find_many_sub_basic_blocks(simple_bitmap_def*)
/home/seurer/gcc/git/gcc-test/gcc/cfgbuild.cc:792
0x10520083 execute
/home/seurer/gcc/git/gcc-test/gcc/cfgexpand.cc:6933


commit 499b8079a6419bb8082de062ec30772296c6700c (HEAD)
Author: Jan Hubicka 
Date:   Thu Jul 27 15:57:54 2023 +0200

Fix profile_count::apply_probability

[committed] OpenMP/Fortran: Extend reject code between target + teams [PR71065, PR110725] (was: Re: [patch] OpenMP/Fortran: Reject declarations between target + teams (was: [Patch] OpenMP/Fortran: Rej

2023-07-27 Thread Tobias Burnus

Yet another omission, the flag was not properly set for deeply buried
'omp teams' as I stopped too early when walking up the stack.

Now fixed by commit r14-2826-g081e25d3cfd86c

* * *

This was found when 'repairing' the feature on the OG13
(devel/omp/gcc-13) branch for metadirectives, cf. the second attached
patch, applied after cherry-picking the mainline patch.

Tobias
-
commit 081e25d3cfd86c4094999ded0bbe99b91762013c
Author: Tobias Burnus 
Date:   Thu Jul 27 18:14:11 2023 +0200

OpenMP/Fortran: Extend reject code between target + teams [PR71065, PR110725]

The previous version failed to diagnose when the 'teams' was nested
more deeply inside the target region, e.g. inside a DO or some
block or structured block.

PR fortran/110725
PR middle-end/71065

gcc/fortran/ChangeLog:

* openmp.cc (resolve_omp_target): Minor cleanup.
* parse.cc (decode_omp_directive): Find TARGET statement
also higher in the stack.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/teams-6.f90: Extend.
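
The difference between the old and new stack walk can be sketched with a toy
model.  Everything here is hypothetical (the Op enum, Frame struct and both
functions are illustrative stand-ins, not gfortran code); frames are listed
from the innermost enclosing construct outwards.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for gfortran statement codes.
enum class Op { Block, Do, Target };

struct Frame {
  Op op;
  bool contains_teams;
};

// Old behaviour (simplified): skip enclosing BLOCK frames, then inspect
// exactly one frame.
bool mark_target_old(std::vector<Frame>& enclosing) {
  std::size_t i = 0;
  while (i < enclosing.size() && enclosing[i].op == Op::Block)
    ++i;
  if (i < enclosing.size() && enclosing[i].op == Op::Target) {
    enclosing[i].contains_teams = true;
    return true;
  }
  return false;
}

// New behaviour (simplified): visit every enclosing frame and flag each
// TARGET found, however deeply the TEAMS is buried.
bool mark_target_new(std::vector<Frame>& enclosing) {
  bool found = false;
  for (Frame& f : enclosing)
    if (f.op == Op::Target) {
      f.contains_teams = true;
      found = true;
    }
  return found;
}
```

For a nesting such as target > block > do > teams, the enclosing frames are
{Do, Block, Target}: the old walk stops at the DO frame and never flags the
TARGET, while the new walk reaches it.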

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 52eeaf2d4da..2952cd300ac 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -10666,15 +10666,14 @@ resolve_omp_target (gfc_code *code)
 
   if (!code->ext.omp_clauses->contains_teams_construct)
 return;
+  gfc_code *c = code->block->next;
   if (code->ext.omp_clauses->target_first_st_is_teams
-  && ((GFC_IS_TEAMS_CONSTRUCT (code->block->next->op)
-	   && code->block->next->next == NULL)
-	  || (code->block->next->op == EXEC_BLOCK
-	  && code->block->next->next
-	  && GFC_IS_TEAMS_CONSTRUCT (code->block->next->next->op)
-	  && code->block->next->next->next == NULL)))
+  && ((GFC_IS_TEAMS_CONSTRUCT (c->op) && c->next == NULL)
+	  || (c->op == EXEC_BLOCK
+	  && c->next
+	  && GFC_IS_TEAMS_CONSTRUCT (c->next->op)
+	  && c->next->next == NULL)))
 return;
-  gfc_code *c = code->block->next;
   while (c && !GFC_IS_TEAMS_CONSTRUCT (c->op))
 c = c->next;
   if (c)
diff --git a/gcc/fortran/parse.cc b/gcc/fortran/parse.cc
index aa6bb663def..e797402b59f 100644
--- a/gcc/fortran/parse.cc
+++ b/gcc/fortran/parse.cc
@@ -1318,32 +1318,27 @@ decode_omp_directive (void)
 case ST_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO:
 case ST_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD:
 case ST_OMP_TEAMS_LOOP:
-  if (gfc_state_stack->previous && gfc_state_stack->previous->tail)
-	{
-	  gfc_state_data *stk = gfc_state_stack;
-	  do {
-	   stk = stk->previous;
-	 } while (stk && stk->tail && stk->tail->op == EXEC_BLOCK);
-	  if (stk && stk->tail)
-	switch (stk->tail->op)
-	  {
-	  case EXEC_OMP_TARGET:
-	  case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE:
-	  case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE_SIMD:
-	  case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE_PARALLEL_DO:
-	  case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD:
-	  case EXEC_OMP_TARGET_TEAMS_LOOP:
-	  case EXEC_OMP_TARGET_PARALLEL:
-	  case EXEC_OMP_TARGET_PARALLEL_DO:
-	  case EXEC_OMP_TARGET_PARALLEL_DO_SIMD:
-	  case EXEC_OMP_TARGET_PARALLEL_LOOP:
-	  case EXEC_OMP_TARGET_SIMD:
-		stk->tail->ext.omp_clauses->contains_teams_construct = 1;
-		break;
-	  default:
-	break;
-	  }
-	}
+  for (gfc_state_data *stk = gfc_state_stack->previous; stk;
+	   stk = stk->previous)
+	if (stk && stk->tail)
+	  switch (stk->tail->op)
+	{
+	case EXEC_OMP_TARGET:
+	case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE:
+	case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE_SIMD:
+	case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE_PARALLEL_DO:
+	case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD:
+	case EXEC_OMP_TARGET_TEAMS_LOOP:
+	case EXEC_OMP_TARGET_PARALLEL:
+	case EXEC_OMP_TARGET_PARALLEL_DO:
+	case EXEC_OMP_TARGET_PARALLEL_DO_SIMD:
+	case EXEC_OMP_TARGET_PARALLEL_LOOP:
+	case EXEC_OMP_TARGET_SIMD:
+	  stk->tail->ext.omp_clauses->contains_teams_construct = 1;
+	  break;
+	default:
+	  break;
+	}
   break;
 case ST_OMP_ERROR:
   if (new_st.ext.omp_clauses->at != OMP_AT_EXECUTION)
diff --git a/gcc/testsuite/gfortran.dg/gomp/teams-6.f90 b/gcc/testsuite/gfortran.dg/gomp/teams-6.f90
index be453f27f40..0bd7735e738 100644
--- a/gcc/testsuite/gfortran.dg/gomp/teams-6.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/teams-6.f90
@@ -37,6 +37,16 @@ end block
   i = 5
   !$omp end teams
 !$omp end target
+
+
+!$omp target  ! { dg-error "OMP TARGET region at .1. with a nested TEAMS may not contain any other statement, declaration or directive outside of the single TEAMS construct" }
+block
+  do i = 5, 8
+!$omp teams
+block; end block
+ 

[Bug c++/110828] union constexpr dtor not constexpr when used in member array

2023-07-27 Thread gccbugbjorn at fahller dot se via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110828

--- Comment #2 from Björn Fahller  ---
If I write it in the same way in a function, it compiles.

consteval auto f()
{
return S{}.f();
}

constexpr auto b = f();


However, if I break it into a constexpr object S s; and return s.f(), it does
not compile, this time because the construction of 's' fails because it refers
to an uninitialized variable, regardless of whether the member is an array or
not.

union type {
constexpr type(){}
constexpr ~type() {}
int t;
};

struct S
{
constexpr S() {}
constexpr bool f() const { return true;}
type v{};
};

consteval auto f()
{
constexpr S s;
return s.f();
}

constexpr auto b = f();

https://godbolt.org/z/68zY6ecxs

[Bug middle-end/71065] Missing diagnostic for statements between OpenMP 'target' and 'teams'

2023-07-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71065

--- Comment #8 from CVS Commits  ---
The master branch has been updated by Tobias Burnus :

https://gcc.gnu.org/g:081e25d3cfd86c4094999ded0bbe99b91762013c

commit r14-2826-g081e25d3cfd86c4094999ded0bbe99b91762013c
Author: Tobias Burnus 
Date:   Thu Jul 27 18:14:11 2023 +0200

OpenMP/Fortran: Extend reject code between target + teams [PR71065,
PR110725]

The previous version failed to diagnose when the 'teams' was nested
more deeply inside the target region, e.g. inside a DO or some
block or structured block.

PR fortran/110725
PR middle-end/71065

gcc/fortran/ChangeLog:

* openmp.cc (resolve_omp_target): Minor cleanup.
* parse.cc (decode_omp_directive): Find TARGET statement
also higher in the stack.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/teams-6.f90: Extend.

[Bug fortran/110725] [13/14 Regression] internal compiler error: in expand_expr_real_1, at expr.cc:10897

2023-07-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110725

--- Comment #9 from CVS Commits  ---
The master branch has been updated by Tobias Burnus :

https://gcc.gnu.org/g:081e25d3cfd86c4094999ded0bbe99b91762013c

commit r14-2826-g081e25d3cfd86c4094999ded0bbe99b91762013c
Author: Tobias Burnus 
Date:   Thu Jul 27 18:14:11 2023 +0200

OpenMP/Fortran: Extend reject code between target + teams [PR71065,
PR110725]

The previous version failed to diagnose when the 'teams' was nested
more deeply inside the target region, e.g. inside a DO or some
block or structured block.

PR fortran/110725
PR middle-end/71065

gcc/fortran/ChangeLog:

* openmp.cc (resolve_omp_target): Minor cleanup.
* parse.cc (decode_omp_directive): Find TARGET statement
also higher in the stack.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/teams-6.f90: Extend.

[Bug target/110781] bpf: make use of the V4 long-range jump instruction (jal/gotol)

2023-07-27 Thread jemarch at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110781

Jose E. Marchesi  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |MOVED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #1 from Jose E. Marchesi  ---
This is actually the assembler's business.  Moved to
https://sourceware.org/bugzilla/show_bug.cgi?id=30690

[Bug c/102989] Implement C2x's n2763 (_BitInt)

2023-07-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #55642|0                           |1
        is obsolete|                            |

--- Comment #91 from Jakub Jelinek  ---
Created attachment 55649
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55649&action=edit
gcc14-bitint.patch

Full patch including ChangeLog I'll submit after testing finishes.

[committed] libstdc++: Fix std::format alternate form for floating-point [PR108046]

2023-07-27 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk. Backport to gcc-13 to follow.

-- >8 --

A decimal point was being added to the end of the string for {:#.0}
because the __expc character was not being set, for the _Pres_none
presentation type, so __s.find(__expc) didn't find the 'e' in "1e+01" and so
we created "1e+01." by appending the radix char to the end.
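
A standalone sketch of that failure mode, using plain std::string rather than
the libstdc++ internals (add_radix is a hypothetical helper written for this
illustration, not the code in <format>):

```cpp
#include <string>

// Insert the radix character before the exponent marker `expc`, or append
// it at the end when no exponent marker is found.
std::string add_radix(std::string s, char expc) {
  std::string::size_type p = s.find(expc);
  if (p == std::string::npos)
    p = s.size();  // no exponent found: the radix goes at the end
  s.insert(p, 1, '.');
  return s;
}
```

With expc left at 0, add_radix("1e+01", '\0') finds no exponent and appends
the dot after the exponent; with expc set to 'e' the dot lands before it.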

This can be fixed by ensuring that __expc='e' is set for the _Pres_none
case. I realized we can also set __expc='P' and __expc='E' when needed,
to save a call to std::toupper later.

For the {:#.0g} format, __expc='e' was being set and so the 'e' was
found in "1e+01" but then __z = __prec - __sigfigs would wrap around to
SIZE_MAX. That meant we would decide not to add a radix char because the
number of extra characters to insert would be 1+SIZE_MAX i.e. zero.

This can be fixed by using __z == 0 when __prec == 0.
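
The wraparound is ordinary unsigned arithmetic and can be reproduced in
isolation.  The two functions below are a simplified, hypothetical model of
the "extras" computation (one radix character plus z padding zeros), not the
actual libstdc++ code:

```cpp
#include <cstddef>
#include <cstdint>

// Before the fix: z = prec - sigfigs is computed unconditionally in size_t.
constexpr std::size_t extras_before_fix(std::size_t prec, std::size_t sigfigs) {
  return std::size_t(1) + (prec - sigfigs);
}

// After the fix: z stays 0 when the precision is 0.
constexpr std::size_t extras_after_fix(std::size_t prec, std::size_t sigfigs) {
  return std::size_t(1) + (prec != 0 ? prec - sigfigs : std::size_t(0));
}

// 0 - 1 wraps to SIZE_MAX, and 1 + SIZE_MAX wraps back to 0.
static_assert(std::size_t(0) - std::size_t(1) == SIZE_MAX, "wraps");
static_assert(extras_before_fix(0, 1) == 0, "radix char is lost");
static_assert(extras_after_fix(0, 1) == 1, "radix char is kept");
```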

libstdc++-v3/ChangeLog:

PR libstdc++/108046
* include/std/format (__formatter_fp::format): Ensure __expc is
always set for all presentation types. Set __z correctly for
zero precision.
* testsuite/std/format/functions/format.cc: Check problem cases.
---
 libstdc++-v3/include/std/format | 17 +
 .../testsuite/std/format/functions/format.cc|  4 
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 0c6069b2681..1e0ef612ddd 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -1430,22 +1430,24 @@ namespace __format
  chars_format __fmt{};
  bool __upper = false;
  bool __trailing_zeros = false;
- char __expc = 0;
+ char __expc = 'e';
 
  switch (_M_spec._M_type)
  {
case _Pres_A:
  __upper = true;
+ __expc = 'P';
  [[fallthrough]];
case _Pres_a:
- __expc = 'p';
+ if (_M_spec._M_type != _Pres_A)
+   __expc = 'p';
  __fmt = chars_format::hex;
  break;
case _Pres_E:
  __upper = true;
+ __expc = 'E';
  [[fallthrough]];
case _Pres_e:
- __expc = 'e';
  __use_prec = true;
  __fmt = chars_format::scientific;
  break;
@@ -1455,10 +1457,10 @@ namespace __format
  break;
case _Pres_G:
  __upper = true;
+ __expc = 'E';
  [[fallthrough]];
case _Pres_g:
  __trailing_zeros = true;
- __expc = 'e';
  __use_prec = true;
  __fmt = chars_format::general;
  break;
@@ -1511,7 +1513,6 @@ namespace __format
{
  for (char* __p = __start; __p != __res.ptr; ++__p)
*__p = std::toupper(*__p);
- __expc = std::toupper(__expc);
}
 
  // Add sign for non-negative values.
@@ -1545,15 +1546,15 @@ namespace __format
  __p = __s.find(__expc);
  if (__p == __s.npos)
__p = __s.size();
- __d = __p;
+ __d = __p; // Position where '.' should be inserted.
  __sigfigs = __d;
}
 
- if (__trailing_zeros)
+ if (__trailing_zeros && __prec != 0)
{
  if (!__format::__is_xdigit(__s[0]))
--__sigfigs;
- __z = __prec - __sigfigs;
+ __z = __prec - __sigfigs; // Number of zeros to insert.
}
 
  if (size_t __extras = int(__d == __p) + __z)
diff --git a/libstdc++-v3/testsuite/std/format/functions/format.cc b/libstdc++-v3/testsuite/std/format/functions/format.cc
index 3485535e3cb..bd914df6d7c 100644
--- a/libstdc++-v3/testsuite/std/format/functions/format.cc
+++ b/libstdc++-v3/testsuite/std/format/functions/format.cc
@@ -152,6 +152,10 @@ test_alternate_forms()
 
   s = std::format("{:#.2g}", -0.0);
   VERIFY( s == "-0.0" );
+
+  // PR libstdc++/108046
+  s = std::format("{0:#.0} {0:#.1} {0:#.0g}", 10.0);
+  VERIFY( s == "1.e+01 1.e+01 1.e+01" );
 }
 
 struct euro_punc : std::numpunct
-- 
2.41.0



[Bug libstdc++/108046] The dot in the floating-point alternative form has wrong position

2023-07-27 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108046

Jonathan Wakely  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2022-12-10 00:00:00         |2023-07-27

--- Comment #4 from Jonathan Wakely  ---
Fixed on trunk so far.

[Bug libstdc++/108046] The dot in the floating-point alternative form has wrong position

2023-07-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108046

--- Comment #3 from CVS Commits  ---
The master branch has been updated by Jonathan Wakely :

https://gcc.gnu.org/g:50bc490c090cc95175e6068ed7438788d7fd7040

commit r14-2825-g50bc490c090cc95175e6068ed7438788d7fd7040
Author: Jonathan Wakely 
Date:   Thu Jul 27 14:07:09 2023 +0100

libstdc++: Fix std::format alternate form for floating-point [PR108046]

A decimal point was being added to the end of the string for {:#.0}
because the __expc character was not being set, for the _Pres_none
presentation type, so __s.find(__expc) didn't find the 'e' in "1e+01" and so
we created "1e+01." by appending the radix char to the end.

This can be fixed by ensuring that __expc='e' is set for the _Pres_none
case. I realized we can also set __expc='P' and __expc='E' when needed,
to save a call to std::toupper later.

For the {:#.0g} format, __expc='e' was being set and so the 'e' was
found in "1e+01" but then __z = __prec - __sigfigs would wrap around to
SIZE_MAX. That meant we would decide not to add a radix char because the
number of extra characters to insert would be 1+SIZE_MAX i.e. zero.

This can be fixed by using __z == 0 when __prec == 0.

libstdc++-v3/ChangeLog:

PR libstdc++/108046
* include/std/format (__formatter_fp::format): Ensure __expc is
always set for all presentation types. Set __z correctly for
zero precision.
* testsuite/std/format/functions/format.cc: Check problem cases.

[Bug target/107172] [13 Regression] wrong code with "-O1 -ftree-vrp" on x86_64-linux-gnu since r13-1268-g8c99e307b20c502e

2023-07-27 Thread shaohua.li at inf dot ethz.ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107172

--- Comment #52 from Shaohua Li  ---
*** Bug 107257 has been marked as a duplicate of this bug. ***

[Bug target/107257] [13 Regression] Wrong code at -O2 on x86_64-linux-gnu since r13-857-gf1652e3343b1ec47

2023-07-27 Thread shaohua.li at inf dot ethz.ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107257

Shaohua Li  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
             Status|WAITING                     |RESOLVED

--- Comment #9 from Shaohua Li  ---
Sorry, this is indeed a dup.

*** This bug has been marked as a duplicate of bug 107172 ***

Re: [PATCH 0/5] Recognize Zicond extension

2023-07-27 Thread Jeff Law via Gcc-patches




On 7/27/23 02:43, Xiao Zeng wrote:

> 2. According to your opinions, I have modified the code, but out of caution
> for upstream, I conducted a complete regression test on patch V2, which took
> some time. I was unable to reply to emails and upload patch V2 in a timely
> manner.

Sorry to have wasted your time -- zicond/xventanacondops has lingered
for quite a while and I had a bit of free time yesterday.  I felt it was
most useful to try and move this stuff forward.

> 3. After you and other maintainers made minor modifications to my patch[1/5]
> and patch[2/5], they have been merged into master, so I will no longer upload
> patch V2.

Agreed.

> 4. patch[1/5] and patch[2/5], which have been merged into master, have only
> completed basic support for Zicond, and further optimization work needs to be
> completed. These further optimizations are reflected in my patch[3/5],
> patch[4/5] and patch[5/5].

Agreed.

> 5. As you mentioned in your previous email
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625427.html
> "eswincomputing and ventana can both reduce our divergence from the trunk
> and work together on the rest of the bits...". I will reorganize patch[3/5],
> patch[4/5] and patch[5/5], provide more detailed explanations, and submit
> them as an alternative solution for further optimization of Zicond.
>
> Does that work for you?

I'm going to look at 3/5 today pretty closely.  Exposing zicond to 
movcc is something we had implemented inside Ventana and I want to 
compare/contrast your work with ours.


What I like about yours is it keeps all the logic in riscv.cc rather 
than scattering it across riscv.cc and riscv.md.  What I like about the 
internal Ventana bits is its ability to support arbitrary comparisons by 
utilizing sCC if the original is not an eq/ne comparison.
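
To make the sCC idea concrete, here is a small model of the two Zicond
instructions and of a select built on top of a "set if less than" result.
This is my reading of the Zicond instruction semantics expressed as plain
C++, not the RTL either implementation emits; select_lt is a hypothetical
name chosen for the example.

```cpp
#include <cstdint>

// czero.eqz rd, rs1, rs2: rd = (rs2 == 0) ? 0 : rs1
constexpr std::int64_t czero_eqz(std::int64_t rs1, std::int64_t rs2) {
  return rs2 == 0 ? 0 : rs1;
}

// czero.nez rd, rs1, rs2: rd = (rs2 != 0) ? 0 : rs1
constexpr std::int64_t czero_nez(std::int64_t rs1, std::int64_t rs2) {
  return rs2 != 0 ? 0 : rs1;
}

// A non-eq/ne comparison is first reduced to a 0/1 value (the sCC step),
// which then drives both czero instructions; an OR merges the two halves.
constexpr std::int64_t select_lt(std::int64_t x, std::int64_t y,
                                 std::int64_t if_true, std::int64_t if_false) {
  return czero_eqz(if_true, x < y ? 1 : 0) | czero_nez(if_false, x < y ? 1 : 0);
}

static_assert(select_lt(1, 2, 10, 20) == 10, "condition true");
static_assert(select_lt(3, 2, 10, 20) == 20, "condition false");
```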


Ideally we'll be able to get the best of both.

Jeff



[Bug c++/110809] ICE: in unify, at cp/pt.cc:25226 with floating-point NTTPs

2023-07-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110809

--- Comment #9 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Patrick Palka:

https://gcc.gnu.org/g:8e811edea309b2097e23cde48ee6fb6467a9094d

commit r13-7614-g8e811edea309b2097e23cde48ee6fb6467a9094d
Author: Patrick Palka 
Date:   Wed Jul 26 16:52:13 2023 -0400

c++: unifying REAL_CSTs [PR110809]

This teaches unify how to compare two REAL_CSTs.

PR c++/110809

gcc/cp/ChangeLog:

* pt.cc (unify) : Generalize to handle
REAL_CST as well.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/nontype-float3.C: New test.

(cherry picked from commit 744e1f35266dbd6b6fb95c7e8422562815f8b56f)

[Bug c++/110828] union constexpr dtor not constexpr when used in member array

2023-07-27 Thread ppalka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110828

Patrick Palka  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ppalka at gcc dot gnu.org

--- Comment #1 from Patrick Palka  ---
Does it work if you move the static_assert into a function scope? If so then
this is probably a dup of PR85944.

Fix profile update in tree_transform_and_unroll_loop

2023-07-27 Thread Jan Hubicka via Gcc-patches
Hi,
This patch fixes the profile update in tree_transform_and_unroll_loop, which is
used by predictive commoning.  I started by attempting to fix
gcc.dg/tree-ssa/update-unroll-1.c, which I xfailed last week, but it turned out
to be a harder job.

Unrolling was never fixed for changes in duplicate_loop_body_to_header_edge,
which is now smarter about getting the profile right when some exits are
eliminated.  A lot of the manual profile updating can thus now be done using
existing infrastructure.

I also noticed that scale_dominated_blocks_in_loop does a job identical
to the loop I wrote in scale_loop_profile, and thus I commonized the
implementation and removed recursion.
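
The recursion-free shape can be sketched on a toy dominator tree.  MiniCfg
and scale_dominated are hypothetical and much simpler than the GCC data
structures (plain integer counts instead of profile_count, and a simplified
zero-denominator guard):

```cpp
#include <cstdint>
#include <vector>

// Toy dominator tree: dom_sons[i] lists the blocks immediately dominated by
// block i; count[i] is its profile count; in_loop[i] marks loop membership.
struct MiniCfg {
  std::vector<std::vector<int>> dom_sons;
  std::vector<std::int64_t> count;
  std::vector<bool> in_loop;
};

// Scale the counts of all in-loop blocks strictly dominated by `bb` by
// num/den, using an explicit worklist instead of recursion.
void scale_dominated(MiniCfg& g, int bb, std::int64_t num, std::int64_t den) {
  if (den == 0)
    return;  // simplified guard; the real code also considers num
  std::vector<int> worklist;
  worklist.push_back(bb);
  while (!worklist.empty()) {
    int cur = worklist.back();
    worklist.pop_back();
    for (int son : g.dom_sons[cur]) {
      if (!g.in_loop[son])
        continue;  // blocks outside the loop are neither scaled nor visited
      g.count[son] = g.count[son] * num / den;
      worklist.push_back(son);
    }
  }
}
```

As in the patch, the starting block itself is left untouched and only the
blocks below it in the dominator tree are scaled.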

I also extended duplicate_loop_body_to_header_edge to handle flat profiles the
same way as we do in the vectorizer.  Without it we end up with an iteration
count less than 0 in gcc.dg/tree-ssa/update-unroll-1.c (it is unrolled 32 times
but predicted to iterate fewer times).  I also added missing code to update
loop_info.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

* cfgloopmanip.cc (scale_dominated_blocks_in_loop): Move here from
tree-ssa-loop-manip.cc and avoid recursion.
(scale_loop_profile): Use scale_dominated_blocks_in_loop.
(duplicate_loop_body_to_header_edge): Add DLTHE_FLAG_FLAT_PROFILE
flag.
* cfgloopmanip.h (DLTHE_FLAG_FLAT_PROFILE): Define.
(scale_dominated_blocks_in_loop): Declare.
* predict.cc (dump_prediction): Do not ICE on uninitialized probability.
(change_edge_frequency): Remove.
* predict.h (change_edge_frequency): Remove.
* tree-ssa-loop-manip.cc (scale_dominated_blocks_in_loop): Move to
cfgloopmanip.cc.
(niter_for_unrolled_loop): Remove.
(tree_transform_and_unroll_loop): Fix profile update.

gcc/testsuite/ChangeLog:

* gcc.dg/pr102385.c: Check for no profile mismatches.
* gcc.dg/pr96931.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-1.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-2.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-3.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-4.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-5.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-7.c: Check for one profile mismatch.
* gcc.dg/tree-ssa/predcom-8.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-dse-1.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-dse-10.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-dse-11.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-dse-12.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-dse-2.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-dse-3.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-dse-4.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-dse-5.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-dse-6.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-dse-7.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-dse-8.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/predcom-dse-9.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/update-unroll-1.c: Unxfail.

diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc
index 3012a8d60f7..c3d292d0dd4 100644
--- a/gcc/cfgloopmanip.cc
+++ b/gcc/cfgloopmanip.cc
@@ -499,6 +499,32 @@ scale_loop_frequencies (class loop *loop, profile_probability p)
   free (bbs);
 }
 
+/* Scales the frequencies of all basic blocks in LOOP that are strictly
+   dominated by BB by NUM/DEN.  */
+
+void
+scale_dominated_blocks_in_loop (class loop *loop, basic_block bb,
+   profile_count num, profile_count den)
+{
+  basic_block son;
+
+  if (!den.nonzero_p () && !(num == profile_count::zero ()))
+return;
+  auto_vec  worklist;
+  worklist.safe_push (bb);
+
+  while (!worklist.is_empty ())
+for (son = first_dom_son (CDI_DOMINATORS, worklist.pop ());
+son;
+son = next_dom_son (CDI_DOMINATORS, son))
+  {
+   if (!flow_bb_inside_loop_p (loop, son))
+ continue;
+   son->count = son->count.apply_scale (num, den);
+   worklist.safe_push (son);
+  }
+}
+
 /* Scale profile in LOOP by P.
If ITERATION_BOUND is not -1, scale even further if loop is predicted
to iterate too many times.
@@ -649,19 +675,9 @@ scale_loop_profile (class loop *loop, profile_probability p,
   if (other_edge && other_edge->dest == loop->latch)
loop->latch->count -= new_exit_count - old_exit_count;
   else
-   {
- basic_block *body = get_loop_body (loop);
- profile_count new_count = exit_edge->src->count - new_exit_count;
- profile_count old_count = exit_edge->src->count - old_exit_count;
-
- for (unsigned int i = 
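
The dominator-tree walk in scale_dominated_blocks_in_loop above can be
sketched outside GCC roughly as follows.  This is only an illustration under
simplifying assumptions: Block, its fields and scale_dominated_blocks are
hypothetical stand-ins (not GCC types), counts are plain integers rather than
profile_count, and the early-exit test is reduced to "den is zero".

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a basic block: an execution count, a flag saying
// whether the block belongs to the loop, and its immediate dominator sons.
struct Block {
  std::uint64_t count;
  bool in_loop;
  std::vector<Block *> dom_sons;
};

// Scale every block that is strictly dominated by BB and lies inside the
// loop by NUM/DEN.  BB itself is left untouched.
void scale_dominated_blocks(Block *bb, std::uint64_t num, std::uint64_t den)
{
  if (den == 0)
    return;  // simplified guard; the patch tests profile_count values
  std::vector<Block *> worklist{bb};
  while (!worklist.empty()) {
    Block *cur = worklist.back();
    worklist.pop_back();
    for (Block *son : cur->dom_sons) {
      if (!son->in_loop)
        continue;  // not scaled, and its dominator subtree is not visited
      son->count = son->count * num / den;  // toy arithmetic, may overflow
      worklist.push_back(son);
    }
  }
}
```

As in the patch, a son outside the loop is skipped together with everything
it dominates, because it is never pushed onto the worklist.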

Fix profile update in tree-ssa-loop-im.cc

2023-07-27 Thread Jan Hubicka via Gcc-patches
Hi,
this fixes two bugs in tree-ssa-loop-im.cc.  The first is that the cap
probability is not reliable, yet it is constructed with adjusted quality.
The second is that the conditional sometimes has the wrong joiner BB count.
This is visible on testsuite/gcc.dg/pr102385.c; however, the testcase
triggers another profile update bug in pcom, so I will update it in a
followup patch.

gcc/ChangeLog:

* tree-ssa-loop-im.cc (execute_sm_if_changed): Turn cap probability
to guessed; fix count of new_bb.

diff --git a/gcc/tree-ssa-loop-im.cc b/gcc/tree-ssa-loop-im.cc
index f5b01e986ae..268f466bdc9 100644
--- a/gcc/tree-ssa-loop-im.cc
+++ b/gcc/tree-ssa-loop-im.cc
@@ -2059,7 +2059,8 @@ execute_sm_if_changed (edge ex, tree mem, tree tmp_var, tree flag,
nbbs++;
 }
 
-  profile_probability cap = profile_probability::always ().apply_scale (2, 3);
+  profile_probability cap
+ = profile_probability::guessed_always ().apply_scale (2, 3);
 
   if (flag_probability.initialized_p ())
 ;
@@ -2103,6 +2104,8 @@ execute_sm_if_changed (edge ex, tree mem, tree tmp_var, tree flag,
 
   old_dest = ex->dest;
   new_bb = split_edge (ex);
+  if (append_cond_position)
+new_bb->count += last_cond_fallthru->count ();
   then_bb = create_empty_bb (new_bb);
   then_bb->count = new_bb->count.apply_probability (flag_probability);
   if (irr)


Fix profile_count::apply_probability

2023-07-27 Thread Jan Hubicka via Gcc-patches
Hi,
profile_count::apply_probability misses a check for an uninitialized
probability, which leads to completely random results when applying an
uninitialized probability to an initialized count.  This can make a
difference when, e.g., inlining a -fno-guess-branch-probability function
into a -fguess-branch-probability one.

Bootstrapped/regtested x86_64-linux, committed.
gcc/ChangeLog:

* profile-count.h (profile_count::apply_probability): Fix
handling of uninitialized probabilities, optimize scaling
by probability 1.

diff --git a/gcc/profile-count.h b/gcc/profile-count.h
index bf1136782a3..e860c5db540 100644
--- a/gcc/profile-count.h
+++ b/gcc/profile-count.h
@@ -1129,11 +1132,11 @@ public:
   /* Scale counter according to PROB.  */
   profile_count apply_probability (profile_probability prob) const
 {
-  if (*this == zero ())
+  if (*this == zero () || prob == profile_probability::always ())
return *this;
   if (prob == profile_probability::never ())
return zero ();
-  if (!initialized_p ())
+  if (!initialized_p () || !prob.initialized_p ())
return uninitialized ();
   profile_count ret;
   uint64_t tmp;
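
The order of checks after this change can be illustrated with a small
standalone model.  It is a sketch only: Prob, Count, kMax and the free
function apply_probability below are hypothetical stand-ins in which
std::nullopt plays the role of "uninitialized"; they are not the real
profile_probability/profile_count classes, and the final scaling is plain
integer arithmetic.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical model: a probability is a fraction of kMax, a count is an
// integer, and std::nullopt means "uninitialized".
constexpr std::uint32_t kMax = 1u << 16;
using Prob = std::optional<std::uint32_t>;
using Count = std::optional<std::uint64_t>;

Count apply_probability(Count count, Prob prob)
{
  // Zero stays zero, and scaling by probability 1 is the identity.
  if ((count && *count == 0) || (prob && *prob == kMax))
    return count;
  // Probability 0 always yields a zero count.
  if (prob && *prob == 0)
    return Count{0};
  // The fixed check: an uninitialized probability must give an
  // uninitialized result instead of being used in the arithmetic below.
  if (!count || !prob)
    return std::nullopt;
  return Count{*count * *prob / kMax};
}
```

In this model, applying an uninitialized probability to a nonzero count
returns "uninitialized" rather than a value computed from garbage, which is
the behaviour the ChangeLog entry describes.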


[Bug rtl-optimization/91838] [8/9 Regression] incorrect use of shr and shrx to shift by 64, missed optimization of vector shift

2023-07-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91838

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|REOPENED                    |RESOLVED

--- Comment #20 from Richard Biener  ---
The testcase is again fixed in GCC 14.

[Bug gcov-profile/110827] C++20 coroutines aren't being measured by gcov

2023-07-27 Thread mwd at md5i dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110827

--- Comment #3 from Michael Duggan  ---
(In reply to Richard Biener from comment #1)
> I'm seeing all code properly instrumented.  The coverage I see is
> 
> -:1:#include 
> -:2:#include 
> -:3:
> -:4:struct task {
> -:5:  struct promise {
> -:6:using handle_t = std::coroutine_handle;
> 1:7:task get_return_object() {
> 1:8:  return task{handle_t::from_promise(*this)};
> -:9:}
> 1:   10:std::suspend_never initial_suspend() noexcept { return {}; }
> 1:   11:std::suspend_always final_suspend() noexcept { return {}; }
> #:   12:void unhandled_exception() { std::terminate(); }
> 1:   13:void return_void() noexcept {}
> -:   14:friend task;
> -:   15:  };
> -:   16:  using promise_type = promise;
> 1:   17:  task(promise_type::handle_t handle) : handle_{handle} {}
> 1:   18:  ~task() {
> 1:   19:if (handle_) {
> 1:   20:  handle_.destroy();
> -:   21:}
> 1:   22:  }
> -:   23: private:
> -:   24:  promise_type::handle_t handle_;
> -:   25:};
> -:   26:
> 1:   27:task foo() {
> -:   28:  std::cout << "Running..." << std::endl;
> -:   29:  co_return;
> 2:   30:}
> -:   31:
> 1:   32:int main(int argc, char **argv) {
> 1:   33:  foo();
> 1:   34:  return 0;
> -:   35:}
> 
> I have no idea why for example line 28 isn't marked executed.

The point is that no matter what is put in the coroutine, foo, nothing within
the coroutine will ever be marked as having been run.

[Bug rtl-optimization/91838] [8/9 Regression] incorrect use of shr and shrx to shift by 64, missed optimization of vector shift

2023-07-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91838

--- Comment #19 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:d1c072a1c3411a6fe29900750b38210af8451eeb

commit r14-2821-gd1c072a1c3411a6fe29900750b38210af8451eeb
Author: Richard Biener 
Date:   Thu Jul 27 13:08:32 2023 +0200

tree-optimization/91838 - fix FAIL of g++.dg/opt/pr91838.C

The following fixes the lack of simplification of a vector shift
by an out-of-bounds shift value.  For scalars this is done both
by CCP and VRP but vectors are not handled there.  This results
in PR91838 differences in outcome dependent on whether a vector
shift ISA is available and thus vector lowering does or does not
expose scalar shifts here.

The following adds a match.pd pattern to catch uniform out-of-bound
shifts, simplifying them to zero when not sanitizing shift amounts.

PR tree-optimization/91838
* gimple-match-head.cc: Include attribs.h and asan.h.
* generic-match-head.cc: Likewise.
* match.pd (([rl]shift @0 out-of-bounds) -> zero): New pattern.
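
The condition the new pattern tests can be sketched on a toy vector type.
This is an illustration under assumptions, not the match.pd code: V4,
fold_oob_uniform_shift and the sanitize_shift flag are hypothetical names,
lanes are fixed at 32 bits, and "not sanitizing shift amounts" is modelled
as a plain boolean.

```cpp
#include <array>
#include <cstdint>
#include <optional>

// Four 32-bit lanes standing in for a vector value.
using V4 = std::array<std::uint32_t, 4>;

// Decide whether a vector shift by AMOUNTS can be folded.  Returns the
// all-zero vector when every lane shifts by the same out-of-bounds amount
// and shift sanitizing is off; returns nullopt when no folding applies.
std::optional<V4> fold_oob_uniform_shift(const V4 &amounts, bool sanitize_shift)
{
  if (sanitize_shift)
    return std::nullopt;  // keep the shift so a sanitizer could report it
  for (std::uint32_t a : amounts)
    if (a != amounts[0])
      return std::nullopt;  // shift amount is not uniform across lanes
  if (amounts[0] < 32)
    return std::nullopt;  // in bounds, a real shift is still needed
  return V4{0, 0, 0, 0};
}
```

The shifted operand does not appear in the helper because, once the uniform
amount is out of bounds, the folded result is zero regardless of its value.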

[Bug target/110788] Spilling to mask register for GPR vec_duplicate

2023-07-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110788

--- Comment #4 from Hongtao.liu  ---

> kmovw   %edx, %k0
> vpbroadcastmw2d %k0, %xmm1
> 
> instead of
> 
> vpbroadcastw%edx, %xmm1
> 

It's not vpbroadcastw, it's
   movzwl  %dx, %ecx
   vpbroadcastd%ecx, %xmm0.

And non-kmask version should be better.

[Bug tree-optimization/92335] [11/12/13/14 Regression] sinking of loads happen too early which causes vectorization not to be done

2023-07-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92335

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #7 from Richard Biener  ---
I have a patch to fix this.

[Bug tree-optimization/64031] (un-)conditional execution state is not preserved by PRE/sink

2023-07-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64031

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |14.0
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #3 from Richard Biener  ---
This is now fixed in GCC 14.

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2023-07-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 64031, which changed state.

Bug 64031 Summary: (un-)conditional execution state is not preserved by PRE/sink
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64031

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

Re: Calling convention for Intel APX extension

2023-07-27 Thread Michael Matz via Gcc
Hey,

On Thu, 27 Jul 2023, Thomas Koenig via Gcc wrote:

> Intel recommends to have the new registers as caller-saved for
> compatibility with current calling conventions.  If I understand this
> correctly, this is required for exception unwinding, but not if the
> function called is __attribute__((nothrow)).

That's not the full truth.  It's not (only) exception handling but also
context switching via setjmp/longjmp and make/get/setcontext.

The data structures for that are part of the ABI unfortunately, and can't
be assumed to be extensible (as Florian says, for glibc there may be
hacks (or maybe not) on x86-64; some other archs implemented
extensibility from the outset).  So all registers (and register parts!)
added after the initial psABI is defined usually _have_ to be
call-clobbered.

> Since Fortran tends to use a lot of registers for its array descriptors,
> and also tends to call nothrow functions (all Fortran functions, and
> all Fortran intrinsics, such as sin/cos/etc) a lot, it could profit from
> making some of the new registers callee-saved, to save some spills
> at function calls.

I've recently submitted a patch that adds some attributes that basically
say "these-and-those regs aren't clobbered by this function" (I did them
for not-clobbered xmm8-15).  Something similar could be used for the new
GPRs as well.  Then it would be a matter of ensuring that the interesting
functions are marked with those attributes (and then of course doing the
necessary call-save/restore).


Ciao,
Michael.


[PATCH] [x86] Add UNSPEC_MASKOP to vpbroadcastm pattern.

2023-07-27 Thread liuhongt via Gcc-patches
Prevent rtl optimization of vec_duplicate + zero_extend to
vpbroadcastm since there could be an extra kmov after RA.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
Ready to push to trunk.

gcc/ChangeLog:

PR target/110788
* config/i386/sse.md (avx512cd_maskb_vec_dup): Add
UNSPEC_MASKOP.
(avx512cd_maskw_vec_dup): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110788.c: New test.
---
 gcc/config/i386/sse.md   |  8 ++--
 gcc/testsuite/gcc.target/i386/pr110788.c | 11 +++
 2 files changed, 17 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr110788.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 35fd66ed4aa..51961bbfc0b 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -26778,11 +26778,14 @@ (define_insn "avx512dq_broadcast_1"
(set_attr "prefix" "evex")
(set_attr "mode" "")])
 
+;; Use unspec to prevent rtl optimizer to optimize zero_extend + vec_duplicate
+;; to pbroadcastm, there could be an extra kmov after RA.
 (define_insn "avx512cd_maskb_vec_dup"
   [(set (match_operand:VI8_AVX512VL 0 "register_operand" "=v")
(vec_duplicate:VI8_AVX512VL
  (zero_extend:DI
-   (match_operand:QI 1 "register_operand" "k"]
+   (match_operand:QI 1 "register_operand" "k"
+   (unspec [(const_int 0)] UNSPEC_MASKOP)]
   "TARGET_AVX512CD"
   "vpbroadcastmb2q\t{%1, %0|%0, %1}"
   [(set_attr "type" "mskmov")
@@ -26793,7 +26796,8 @@ (define_insn "avx512cd_maskw_vec_dup"
   [(set (match_operand:VI4_AVX512VL 0 "register_operand" "=v")
(vec_duplicate:VI4_AVX512VL
  (zero_extend:SI
-   (match_operand:HI 1 "register_operand" "k"]
+   (match_operand:HI 1 "register_operand" "k"
+   (unspec [(const_int 0)] UNSPEC_MASKOP)]
   "TARGET_AVX512CD"
   "vpbroadcastmw2d\t{%1, %0|%0, %1}"
   [(set_attr "type" "mskmov")
diff --git a/gcc/testsuite/gcc.target/i386/pr110788.c b/gcc/testsuite/gcc.target/i386/pr110788.c
new file mode 100644
index 000..4cf1676ccb6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr110788.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=cascadelake --param vect-partial-vector-usage=2" } */
+/* { dg-final { scan-assembler-not "vpbroadcastm" } } */
+
+double a[1024], b[1024];
+
+void foo (int n)
+{
+  for (int i = 0; i < n; ++i)
+a[i] = b[i] * 3.;
+}
-- 
2.39.1.388.g2fc9e9ca3c


