[PATCH] RISC-V: Remove vxrm parameter for vsadd[u] and vssub[u]
From: xuli

Computation of `vsadd`, `vsaddu`, `vssub`, and `vssubu` does not need the rounding mode; therefore, the intrinsics for these instructions do not have a parameter for rounding-mode control.

gcc/ChangeLog:

	* config/riscv/riscv-vector-builtins-bases.cc: Remove rounding mode
	of vsadd[u] and vssub[u].
	* config/riscv/vector.md: Ditto.

gcc/testsuite/ChangeLog:

	* g++.target/riscv/rvv/base/bug-12.C: Adapt testcase.
	* g++.target/riscv/rvv/base/bug-14.C: Ditto.
	* g++.target/riscv/rvv/base/bug-18.C: Ditto.
	* g++.target/riscv/rvv/base/bug-19.C: Ditto.
	* g++.target/riscv/rvv/base/bug-20.C: Ditto.
	* g++.target/riscv/rvv/base/bug-21.C: Ditto.
	* g++.target/riscv/rvv/base/bug-22.C: Ditto.
	* g++.target/riscv/rvv/base/bug-23.C: Ditto.
	* g++.target/riscv/rvv/base/bug-3.C: Ditto.
	* g++.target/riscv/rvv/base/bug-8.C: Ditto.
	* gcc.target/riscv/rvv/base/binop_vx_constraint-100.c: Ditto.
	* gcc.target/riscv/rvv/base/binop_vx_constraint-101.c: Ditto.
	* gcc.target/riscv/rvv/base/binop_vx_constraint-102.c: Ditto.
	* gcc.target/riscv/rvv/base/binop_vx_constraint-103.c: Ditto.
	* gcc.target/riscv/rvv/base/binop_vx_constraint-104.c: Ditto.
	* gcc.target/riscv/rvv/base/binop_vx_constraint-105.c: Ditto.
	* gcc.target/riscv/rvv/base/binop_vx_constraint-106.c: Ditto.
	* gcc.target/riscv/rvv/base/binop_vx_constraint-107.c: Ditto.
	* gcc.target/riscv/rvv/base/binop_vx_constraint-108.c: Ditto.
	* gcc.target/riscv/rvv/base/binop_vx_constraint-109.c: Ditto.
	* gcc.target/riscv/rvv/base/binop_vx_constraint-110.c: Ditto.
	* gcc.target/riscv/rvv/base/binop_vx_constraint-111.c: Ditto.
	* gcc.target/riscv/rvv/base/binop_vx_constraint-112.c: Ditto.
	* gcc.target/riscv/rvv/base/binop_vx_constraint-113.c: Ditto.
	* gcc.target/riscv/rvv/base/binop_vx_constraint-114.c: Ditto.
	* gcc.target/riscv/rvv/base/binop_vx_constraint-115.c: Ditto.
	* gcc.target/riscv/rvv/base/binop_vx_constraint-116.c: Ditto.
	* gcc.target/riscv/rvv/base/binop_vx_constraint-117.c: Ditto.
	* gcc.target/riscv/rvv/base/binop_vx_constraint-118.c: Ditto.
	* gcc.target/riscv/rvv/base/binop_vx_constraint-119.c: Ditto.
	* gcc.target/riscv/rvv/base/binop_vx_constraint-97.c: Ditto.
	* gcc.target/riscv/rvv/base/binop_vx_constraint-98.c: Ditto.
	* gcc.target/riscv/rvv/base/merge_constraint-1.c: Ditto.
	* gcc.target/riscv/rvv/base/fixed-point-vxrm-error.c: New test.
	* gcc.target/riscv/rvv/base/fixed-point-vxrm.c: New test.

---
 .../riscv/riscv-vector-builtins-bases.cc | 6 --
 gcc/config/riscv/vector.md | 42 +++---
 .../g++.target/riscv/rvv/base/bug-12.C | 2 +-
 .../g++.target/riscv/rvv/base/bug-14.C | 2 +-
 .../g++.target/riscv/rvv/base/bug-18.C | 2 +-
 .../g++.target/riscv/rvv/base/bug-19.C | 2 +-
 .../g++.target/riscv/rvv/base/bug-20.C | 2 +-
 .../g++.target/riscv/rvv/base/bug-21.C | 2 +-
 .../g++.target/riscv/rvv/base/bug-22.C | 2 +-
 .../g++.target/riscv/rvv/base/bug-23.C | 2 +-
 .../g++.target/riscv/rvv/base/bug-3.C | 2 +-
 .../g++.target/riscv/rvv/base/bug-8.C | 2 +-
 .../riscv/rvv/base/binop_vx_constraint-100.c | 4 +-
 .../riscv/rvv/base/binop_vx_constraint-101.c | 4 +-
 .../riscv/rvv/base/binop_vx_constraint-102.c | 4 +-
 .../riscv/rvv/base/binop_vx_constraint-103.c | 28 +++
 .../riscv/rvv/base/binop_vx_constraint-104.c | 16 ++--
 .../riscv/rvv/base/binop_vx_constraint-105.c | 4 +-
 .../riscv/rvv/base/binop_vx_constraint-106.c | 4 +-
 .../riscv/rvv/base/binop_vx_constraint-107.c | 4 +-
 .../riscv/rvv/base/binop_vx_constraint-108.c | 4 +-
 .../riscv/rvv/base/binop_vx_constraint-109.c | 28 +++
 .../riscv/rvv/base/binop_vx_constraint-110.c | 16 ++--
 .../riscv/rvv/base/binop_vx_constraint-111.c | 4 +-
 .../riscv/rvv/base/binop_vx_constraint-112.c | 4 +-
 .../riscv/rvv/base/binop_vx_constraint-113.c | 4 +-
 .../riscv/rvv/base/binop_vx_constraint-114.c | 4 +-
 .../riscv/rvv/base/binop_vx_constraint-115.c | 16 ++--
 .../riscv/rvv/base/binop_vx_constraint-116.c | 4 +-
 .../riscv/rvv/base/binop_vx_constraint-117.c | 4 +-
 .../riscv/rvv/base/binop_vx_constraint-118.c | 4 +-
 .../riscv/rvv/base/binop_vx_constraint-119.c | 4 +-
 .../riscv/rvv/base/binop_vx_constraint-97.c | 28 +++
 .../riscv/rvv/base/binop_vx_constraint-98.c | 16 ++--
 .../riscv/rvv/base/fixed-point-vxrm-error.c | 24 ++
 .../riscv/rvv/base/fixed-point-vxrm.c | 81 +++
 .../riscv/rvv/base/merge_constraint-1.c | 4 +-
 37 files changed, 233 insertions(+), 152 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/fixed-point-vxrm-error.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/fixed-point-vxrm.c

diff --git
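Why `vsadd[u]`/`vssub[u]` can drop the `vxrm` operand can be illustrated with a minimal scalar sketch of one 8-bit element. It assumes the RVV 1.0 definitions: saturating add/subtract compute the exact result and only clamp it to the representable range, so no bits are discarded and nothing is rounded, whereas an averaging add such as `vaaddu` shifts out a bit and therefore consults `vxrm`. All function and enum names below are hypothetical, and only two of the four `vxrm` modes (`rnu`, `rdn`) are modelled.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical scalar models of single 8-bit RVV elements (not GCC code).

// vsaddu: unsigned saturating add. The exact sum is clamped to 255;
// no bits are discarded, so there is no rounding step and no vxrm.
inline std::uint8_t sat_addu8(std::uint8_t a, std::uint8_t b) {
  unsigned sum = unsigned{a} + unsigned{b};  // exact: at most 510
  return sum > 0xFFu ? std::uint8_t{0xFF} : static_cast<std::uint8_t>(sum);
}

// vssubu: unsigned saturating subtract, clamped at zero.
inline std::uint8_t sat_subu8(std::uint8_t a, std::uint8_t b) {
  return a < b ? std::uint8_t{0} : static_cast<std::uint8_t>(a - b);
}

// Clamp an exact int result into the int8_t range [-128, 127].
inline std::int8_t clamp_i8(int v) {
  constexpr int lo = std::numeric_limits<std::int8_t>::min();
  constexpr int hi = std::numeric_limits<std::int8_t>::max();
  return static_cast<std::int8_t>(v < lo ? lo : (v > hi ? hi : v));
}

// vsadd / vssub: signed saturating add/subtract; again only clamping.
inline std::int8_t sat_add8(std::int8_t a, std::int8_t b) {
  return clamp_i8(int{a} + int{b});
}
inline std::int8_t sat_sub8(std::int8_t a, std::int8_t b) {
  return clamp_i8(int{a} - int{b});
}

// For contrast: vaaddu computes (a + b) >> 1, which drops the low bit,
// so its result depends on the rounding mode.
enum class Vxrm { rnu, rdn };  // round-to-nearest-up, round-down

inline std::uint8_t avg_addu8(std::uint8_t a, std::uint8_t b, Vxrm mode) {
  unsigned sum = unsigned{a} + unsigned{b};
  // rnu adds back the shifted-out bit; rdn simply truncates.
  unsigned round_bit = (mode == Vxrm::rnu) ? (sum & 1u) : 0u;
  return static_cast<std::uint8_t>((sum >> 1) + round_bit);
}
```

In this model the rounding mode appears only in `avg_addu8`; the saturating operations have no input through which a rounding mode could affect the result.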
[Bug sanitizer/110835] [13/14 Regression] -fsanitize=address causes huge runtime slowdown from std::rethrow_exception not called
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110835

--- Comment #5 from Andrew Pinski ---
(In reply to Andrew Pinski from comment #3)
> (In reply to Andrew Pinski from comment #2)
> > Which might mean it is an issue in LLVM too ...
>
> Yes, the same runtime regression shows up between clang 15 and clang 16. This
> should be reported upstream to them too.

What is interesting is that with -stdlib=libc++ the regression for clang/LLVM shows up between their 14 and 15 releases. Anyway, this should be filed upstream ...
[Bug sanitizer/110835] [13/14 Regression] -fsanitize=address causes huge runtime slowdown from std::rethrow_exception not called
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110835

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|-fsanitize=address causes   |[13/14 Regression]
                   |slowdown from               |-fsanitize=address causes
                   |std::rethrow_exception not  |huge runtime slowdown from
                   |called                      |std::rethrow_exception not
                   |                            |called
   Target Milestone|---                         |13.3
   Last reconfirmed|                            |2023-07-28
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #4 from Andrew Pinski ---
Confirmed.
[Bug sanitizer/110835] -fsanitize=address causes slowdown from std::rethrow_exception not called
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110835

--- Comment #3 from Andrew Pinski ---
(In reply to Andrew Pinski from comment #2)
> Which might mean it is an issue in LLVM too ...

Yes, the same runtime regression shows up between clang 15 and clang 16. This should be reported upstream to them too.
[Bug sanitizer/110835] -fsanitize=address causes slowdown from std::rethrow_exception not called
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110835

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dodji at gcc dot gnu.org,
                   |                            |dvyukov at gcc dot gnu.org,
                   |                            |jakub at gcc dot gnu.org,
                   |                            |kcc at gcc dot gnu.org,
                   |                            |marxin at gcc dot gnu.org
          Component|c++                         |sanitizer

--- Comment #2 from Andrew Pinski ---
The code generation does not look any different ... So I suspect this was a library change. Which might mean it is an issue in LLVM too ...
Re: LRA for avr: Handling hard regs set directly at expand
On Thu, 2023-07-27 at 15:11 +0200, Georg-Johann Lay wrote:
> On 17.07.23 at 13:33, SenthilKumar.Selvaraj--- via Gcc wrote:
> > Hi,
> >
> > The avr target has a bunch of patterns that directly set hard regs at
> > expand time, like so
>
> The correct approach would be to use usual predicates together with
> constraints that describe the register instead of hard regs, e.g.
> (match_operand:HI n "register_operand" "R18_2") for a 2-byte register
> that starts at R18 instead of (reg:HI 18). I deprecated and removed
> constraints starting with "R" long ago in order to get "R" free for that
> purpose.
>
> Some years ago I tried such constraints (and hence also a zoo of new
> register classes that are required to accommodate them). The resulting
> code quality was so bad that I quickly abandoned that approach, and IIRC
> there were also spill fails. It appears that reload / IRA was overwhelmed
> by the multitude of new reg classes and took sub-optimal decisions.
>
> The way out was more of explicit hard regs in expand, together with
> awkward functionalities like avr_fix_operands (PR63633) and the
> functions that use it. That way we get correct code without performance
> penalties in unrelated places.
>
> Most of such insns are explicitly modelling hand-written asm functions
> in libgcc, because most of these functions have a footprint smaller than
> the default ABI. And some functions have an interface not complying with
> the default ABI.
>
> For the case of cpymem etc from below, explicit hard registers were used
> because register allocator did a bad job when using constraints like "e"
> (X, Y, or Z).

I guessed that much.

Yes, using constraints works - I used "x" and "z" that directly correspond to REG_X and REG_Z (ignore the weird operand numbering).
diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index be0f8dcbe0e..6c6c4e4e212 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -1148,20 +1148,20 @@
 ;; "cpymem_qi"
 ;; "cpymem_hi"
 (define_insn_and_split "cpymem_"
-  [(set (mem:BLK (reg:HI REG_X))
-        (mem:BLK (reg:HI REG_Z)))
+  [(set (mem:BLK (match_operand:HI 3 "register_operand" "+x"))
+        (mem:BLK (match_operand:HI 4 "register_operand" "+z")))
    (unspec [(match_operand:QI 0 "const_int_operand" "n")] UNSPEC_CPYMEM)
    (use (match_operand:QIHI 1 "register_operand" ""))
-   (clobber (reg:HI REG_X))
-   (clobber (reg:HI REG_Z))
+   (clobber (match_dup 3))
+   (clobber (match_dup 4))
    (clobber (reg:QI LPM_REGNO))
    (clobber (match_operand:QIHI 2 "register_operand" "=1"))]
   ""
   "#"
   "&& reload_completed"
-  [(parallel [(set (mem:BLK (reg:HI REG_X))
-                   (mem:BLK (reg:HI REG_Z)))
+  [(parallel [(set (mem:BLK (match_dup 3))
+                   (mem:BLK (match_dup 4)))
               (unspec [(match_dup 0)] UNSPEC_CPYMEM)
               (use (match_dup 1))

I know you did these changes a long time ago, but do you happen to have any test cases lying around that I can use to see if LRA does a better job than classic reload?

Vladimir, given that classic reload handled such hardcoded hard regs just fine, should LRA also be able to deal with them the same way? Or is this something that LRA is not going to support?
Regards
Senthil

>
> Johann
>
> >
> > (define_expand "cpymemhi"
> >   [(parallel [(set (match_operand:BLK 0 "memory_operand" "")
> >                    (match_operand:BLK 1 "memory_operand" ""))
> >               (use (match_operand:HI 2 "const_int_operand" ""))
> >               (use (match_operand:HI 3 "const_int_operand" ""))])]
> >   ""
> > {
> >   if (avr_emit_cpymemhi (operands))
> >     DONE;
> >
> >   FAIL;
> > })
> >
> > where avr_emit_cpymemhi generates
> >
> > (insn 14 13 15 4 (set (reg:HI 30 r30)
> >         (reg:HI 48 [ ivtmp.10 ])) "pr53505.c":21:22 -1
> >      (nil))
> > (insn 15 14 16 4 (set (reg:HI 26 r26)
> >         (reg/f:HI 38 virtual-stack-vars)) "pr53505.c":21:22 -1
> >      (nil))
> > (insn 16 15 17 4 (parallel [
> >             (set (mem:BLK (reg:HI 26 r26) [0 A8])
> >                 (mem:BLK (reg:HI 30 r30) [0 A8]))
> >             (unspec [
> >                     (const_int 0 [0])
> >                 ] UNSPEC_CPYMEM)
> >             (use (reg:QI 52))
> >             (clobber (reg:HI 26 r26))
> >             (clobber (reg:HI 30 r30))
> >             (clobber (reg:QI 0 r0))
> >             (clobber (reg:QI 52))
> >         ]) "pr53505.c":21:22 -1
> >      (nil))
> >
> > Classic reload knows about these - find_reg masks out bad_spill_regs, and
> > bad_spill_regs when ORed with chain->live_throughout in
> > order_regs_for_reload picks up r30.
> >
> > LRA, however, appears to not consider that, and proceeds to use such regs
> > as reload regs.
> > For the same source, it generates
> >
> > Choosing alt 0 in insn 15: (0) =r (1) r
[Bug c++/110835] -fsanitize=address causes slowdown from std::rethrow_exception not called
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110835 --- Comment #1 from Ed Catmur --- Motivation is https://github.com/boostorg/exception/blob/b039b4ea18ef752d0c1684b3f715ce493b778060/include/boost/exception/detail/exception_ptr.hpp#L550 ; the half-reduced code is: #include struct S {}; int main() { auto ep = boost::copy_exception(S()); for (int i = 0; i != 10; ++i) try { boost::rethrow_exception(ep); } catch (...) {} }
[Bug c++/110835] New: -fsanitize=address causes slowdown from std::rethrow_exception not called
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110835

            Bug ID: 110835
           Summary: -fsanitize=address causes slowdown from
                    std::rethrow_exception not called
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ed at catmur dot uk
  Target Milestone: ---

#include <exception>
std::exception_ptr p;
void f() {
    try {
        throw 1;
    } catch(char) {
        std::rethrow_exception(p);
    }
}
int main() {
    for (int i = 0; i != 10; ++i)
        try {
            f();
        } catch (...) {
        }
}

Compiled with -fsanitize=address (and at -O0 through -O3), this is roughly 30x slower under gcc 13 than under gcc 12 (4.7s vs 0.15s on my Core i7 3 GHz).

Note that std::rethrow_exception() is not called, but it is still essential to exhibit the bug. Also, `f` needs to be a separate function (and not `static`). At low optimization levels it can be an IIFE (an immediately invoked lambda).
[PATCH v3 2/2] libstdc++: Use _GLIBCXX_HAS_BUILTIN_TRAIT
This patch uses the _GLIBCXX_HAS_BUILTIN_TRAIT macro instead of __has_builtin in the type_traits header. This macro makes it possible to toggle the use of built-in traits in the type_traits header through the _GLIBCXX_NO_BUILTIN_TRAITS macro, without needing to modify the source code.

libstdc++-v3/ChangeLog:

	* include/std/type_traits (__has_builtin): Replace with ...
	(_GLIBCXX_HAS_BUILTIN_TRAIT): ... this.

Signed-off-by: Ken Matsui
---
 libstdc++-v3/include/std/type_traits | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits index 9f086992ebc..12423361b6e 100644 --- a/libstdc++-v3/include/std/type_traits +++ b/libstdc++-v3/include/std/type_traits @@ -1411,7 +1411,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION : public __bool_constant<__is_base_of(_Base, _Derived)> { }; -#if __has_builtin(__is_convertible) +#if _GLIBCXX_HAS_BUILTIN_TRAIT(__is_convertible) template struct is_convertible : public __bool_constant<__is_convertible(_From, _To)> @@ -1462,7 +1462,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #if __cplusplus >= 202002L #define __cpp_lib_is_nothrow_convertible 201806L -#if __has_builtin(__is_nothrow_convertible) +#if _GLIBCXX_HAS_BUILTIN_TRAIT(__is_nothrow_convertible) /// is_nothrow_convertible_v template inline constexpr bool is_nothrow_convertible_v @@ -1537,7 +1537,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { using type = _Tp; }; /// remove_cv -#if __has_builtin(__remove_cv) +#if _GLIBCXX_HAS_BUILTIN_TRAIT(__remove_cv) template struct remove_cv { using type = __remove_cv(_Tp); }; @@ -1606,7 +1606,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION // Reference transformations.
/// remove_reference -#if __has_builtin(__remove_reference) +#if _GLIBCXX_HAS_BUILTIN_TRAIT(__remove_reference) template struct remove_reference { using type = __remove_reference(_Tp); }; @@ -2963,7 +2963,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION template(_S_get())), typename = decltype(_S_conv<_Tp>(_S_get())), -#if __has_builtin(__reference_converts_from_temporary) +#if _GLIBCXX_HAS_BUILTIN_TRAIT(__reference_converts_from_temporary) bool _Dangle = __reference_converts_from_temporary(_Tp, _Res_t) #else bool _Dangle = false @@ -3420,7 +3420,7 @@ template */ #define __cpp_lib_remove_cvref 201711L -#if __has_builtin(__remove_cvref) +#if _GLIBCXX_HAS_BUILTIN_TRAIT(__remove_cvref) template struct remove_cvref { using type = __remove_cvref(_Tp); }; @@ -3515,7 +3515,7 @@ template : public bool_constant> { }; -#if __has_builtin(__is_layout_compatible) +#if _GLIBCXX_HAS_BUILTIN_TRAIT(__is_layout_compatible) /// @since C++20 template @@ -3529,7 +3529,7 @@ template constexpr bool is_layout_compatible_v = __is_layout_compatible(_Tp, _Up); -#if __has_builtin(__builtin_is_corresponding_member) +#if _GLIBCXX_HAS_BUILTIN_TRAIT(__builtin_is_corresponding_member) #define __cpp_lib_is_layout_compatible 201907L /// @since C++20 @@ -3540,7 +3540,7 @@ template #endif #endif -#if __has_builtin(__is_pointer_interconvertible_base_of) +#if _GLIBCXX_HAS_BUILTIN_TRAIT(__is_pointer_interconvertible_base_of) /// True if `_Derived` is standard-layout and has a base class of type `_Base` /// @since C++20 template @@ -3554,7 +3554,7 @@ template constexpr bool is_pointer_interconvertible_base_of_v = __is_pointer_interconvertible_base_of(_Base, _Derived); -#if __has_builtin(__builtin_is_pointer_interconvertible_with_class) +#if _GLIBCXX_HAS_BUILTIN_TRAIT(__builtin_is_pointer_interconvertible_with_class) #define __cpp_lib_is_pointer_interconvertible 201907L /// True if `__mp` points to the first member of a standard-layout type @@ -3590,8 +3590,8 @@ template template inline constexpr bool 
is_scoped_enum_v = is_scoped_enum<_Tp>::value; -#if __has_builtin(__reference_constructs_from_temporary) \ - && __has_builtin(__reference_converts_from_temporary) +#if _GLIBCXX_HAS_BUILTIN_TRAIT(__reference_constructs_from_temporary) \ + && _GLIBCXX_HAS_BUILTIN_TRAIT(__reference_converts_from_temporary) #define __cpp_lib_reference_from_temporary 202202L @@ -3632,7 +3632,7 @@ template template inline constexpr bool reference_converts_from_temporary_v = reference_converts_from_temporary<_Tp, _Up>::value; -#endif // __has_builtin for reference_from_temporary +#endif // _GLIBCXX_HAS_BUILTIN_TRAIT for reference_from_temporary #endif // C++23 #if _GLIBCXX_HAVE_IS_CONSTANT_EVALUATED -- 2.41.0
[PATCH v3 1/2] libstdc++: Define _GLIBCXX_HAS_BUILTIN_TRAIT
This patch defines the _GLIBCXX_HAS_BUILTIN_TRAIT macro, which will be used as a flag to toggle the use of built-in traits in the type_traits header through the _GLIBCXX_NO_BUILTIN_TRAITS macro, without needing to modify the source code.

libstdc++-v3/ChangeLog:

	* include/bits/c++config (_GLIBCXX_HAS_BUILTIN_TRAIT): Define.
	(_GLIBCXX_HAS_BUILTIN): Keep defined.

Signed-off-by: Ken Matsui
---
 libstdc++-v3/include/bits/c++config | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index dd47f274d5f..984985d6fff 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -854,7 +854,15 @@ namespace __gnu_cxx
 # define _GLIBCXX_HAVE_BUILTIN_LAUNDER 1
 #endif
 
-#undef _GLIBCXX_HAS_BUILTIN
+// Returns 1 if _GLIBCXX_NO_BUILTIN_TRAITS is not defined and the compiler
+// has a corresponding built-in type trait, 0 otherwise.
+// _GLIBCXX_NO_BUILTIN_TRAITS can be defined to disable the use of built-in
+// traits.
+#ifndef _GLIBCXX_NO_BUILTIN_TRAITS
+# define _GLIBCXX_HAS_BUILTIN_TRAIT(BT) _GLIBCXX_HAS_BUILTIN(BT)
+#else
+# define _GLIBCXX_HAS_BUILTIN_TRAIT(BT) 0
+#endif
 
 // Mark code that should be ignored by the compiler, but seen by Doxygen.
 #define _GLIBCXX_DOXYGEN_ONLY(X)
-- 
2.41.0
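The toggle defined above is a general preprocessor pattern. The following self-contained sketch reproduces it under hypothetical names: `DEMO_HAS_BUILTIN`, `DEMO_HAS_BUILTIN_TRAIT`, `DEMO_NO_BUILTIN_TRAITS` and `demo_remove_cv` stand in for the libstdc++ macros and `std::remove_cv`; this is not libstdc++ code. It assumes a compiler that supports `__has_builtin`. Where the compiler reports a `__remove_cv` built-in, the fast path uses it; otherwise the classic partial specializations are used.

```cpp
#include <type_traits>

// Hypothetical stand-in for _GLIBCXX_HAS_BUILTIN.
#ifdef __has_builtin
#  define DEMO_HAS_BUILTIN(B) __has_builtin(B)
#else
#  define DEMO_HAS_BUILTIN(B) 0
#endif

// Hypothetical stand-in for _GLIBCXX_HAS_BUILTIN_TRAIT: defining
// DEMO_NO_BUILTIN_TRAITS forces every trait onto its fallback path.
#ifndef DEMO_NO_BUILTIN_TRAITS
#  define DEMO_HAS_BUILTIN_TRAIT(BT) DEMO_HAS_BUILTIN(BT)
#else
#  define DEMO_HAS_BUILTIN_TRAIT(BT) 0
#endif

#if DEMO_HAS_BUILTIN_TRAIT(__remove_cv)
// Fast path: delegate to the compiler's built-in trait.
template<typename T>
struct demo_remove_cv { using type = __remove_cv(T); };
constexpr bool demo_uses_builtin = true;
#else
// Fallback: strip top-level cv-qualifiers with partial specializations.
template<typename T> struct demo_remove_cv { using type = T; };
template<typename T> struct demo_remove_cv<const T> { using type = T; };
template<typename T> struct demo_remove_cv<volatile T> { using type = T; };
template<typename T> struct demo_remove_cv<const volatile T> { using type = T; };
constexpr bool demo_uses_builtin = false;
#endif
```

Compiling with `-DDEMO_NO_BUILTIN_TRAITS` selects the fallback without editing the source, which is the kind of switch the patch adds for the real header; `demo_uses_builtin` reports which path was taken, and both paths must yield the same types.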
Re: [PATCH] RISC-V: Fix uninitialized and redundant use of which_alternative
Sorry for not considering the rv32 config. The fix is OK. If convenient, please commit it.

On 2023/7/28 4:46, Patrick O'Neill wrote:
> The newly added testcase fails on rv32 targets with this message:
> FAIL: gcc.target/riscv/rvv/autovec/madd-split2-1.c -O3 -ftree-vectorize (test
> for excess errors)
>
> verbose log:
> compiler exited with status 1
> output is:
> cc1: error: ABI requires '-march=rv32'
>
> Something like this appears to fix the issue:
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
> index 14a9802667e..e10a9e9d0f5 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-march=rv64gcv_zvl256b -O3 -fno-cprop-registers -fno-dce
> --param riscv-autovec-preference=scalable" } */
> +/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3
> -fno-cprop-registers -fno-dce --param riscv-autovec-preference=scalable"
> } */
>
>  long
>  foo (long *__restrict a, long *__restrict b, long n)
>
> On 7/27/23 04:57, Kito Cheng via Gcc-patches wrote:
>
>> My first impression was that those emit_insn (gen_rtx_SET()) seem
>> necessary, but I got the point after I checked vector.md :P
>>
>> Committed to trunk, thanks :)
>>
>>
>> On Thu, Jul 27, 2023 at 6:23 PM juzhe.zh...@rivai.ai
>> wrote:
>>> Oh, YES.
>>>
>>> Thanks for fixing it. It makes sense since the ternary operations in
>>> "vector.md"
>>> generate "vmv.v.v" according to RA.
>>>
>>> Thanks for fixing it.
>>>
>>> @kito: Could you confirm it? If it's ok to you, commit it for Han (I am
>>> lazy to commit patches :).
>>>
>>>
>>>
>>> juzhe.zh...@rivai.ai
>>>
>>> From: demin.han
>>> Date: 2023-07-27 17:48
>>> To: gcc-patches@gcc.gnu.org
>>> CC: kito.ch...@gmail.com; juzhe.zh...@rivai.ai
>>> Subject: [PATCH] RISC-V: Fix uninitialized and redundant use of
>>> which_alternative
>>> When pass split2 starts, which_alternative holds an arbitrary value,
>>> depending on whichever pass last set it.
>>>
>>> Even if it were initialized, the generated move would be redundant,
>>> because the move can be generated by the assembly output template.
>>>
>>> Signed-off-by: demin.han
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/riscv/autovec.md: Delete which_alternative use in split
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc.target/riscv/rvv/autovec/madd-split2-1.c: New test.
>>>
>>> ---
>>> gcc/config/riscv/autovec.md | 12
>>> .../gcc.target/riscv/rvv/autovec/madd-split2-1.c | 13 +
>>> 2 files changed, 13 insertions(+), 12 deletions(-)
>>> create mode 100644
>>> gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
>>>
>>> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
>>> index d899922586a..b7ea3101f5a 100644
>>> --- a/gcc/config/riscv/autovec.md
>>> +++ b/gcc/config/riscv/autovec.md
>>> @@ -1012,8 +1012,6 @@ (define_insn_and_split "*fma"
>>> [(const_int 0)]
>>> {
>>> riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
>>> - if (which_alternative == 2)
>>> - emit_insn (gen_rtx_SET (operands[0], operands[3]));
>>> rtx ops[] = {operands[0], operands[1], operands[2], operands[3],
>>> operands[0]};
>>> riscv_vector::emit_vlmax_ternary_insn (code_for_pred_mul_plus
>>> (mode),
>>> riscv_vector::RVV_TERNOP, ops, operands[4]);
>>> @@ -1058,8 +1056,6 @@ (define_insn_and_split "*fnma"
>>> [(const_int 0)]
>>> {
>>> riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
>>> - if (which_alternative == 2)
>>> - emit_insn (gen_rtx_SET (operands[0], operands[3]));
>>> rtx ops[] = {operands[0], operands[1], operands[2], operands[3],
>>> operands[0]};
>>> riscv_vector::emit_vlmax_ternary_insn (code_for_pred_minus_mul
>>> (mode),
>>> riscv_vector::RVV_TERNOP, ops, operands[4]);
>>> @@ -1102,8 +1098,6 @@ (define_insn_and_split "*fma"
>>> [(const_int 0)]
>>> {
>>> riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
>>> - if (which_alternative == 2)
>>> - emit_insn (gen_rtx_SET (operands[0], operands[3]));
>>> rtx ops[] = {operands[0], operands[1], operands[2], operands[3],
>>> operands[0]};
>>> riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul (PLUS,
>>> mode),
>>> riscv_vector::RVV_TERNOP, ops, operands[4]);
>>> @@ -1148,8 +1142,6 @@ (define_insn_and_split "*fnma"
>>> [(const_int 0)]
>>> {
>>> riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
>>> - if (which_alternative == 2)
>>> - emit_insn (gen_rtx_SET (operands[0], operands[3]));
>>> rtx ops[] = {operands[0], operands[1], operands[2], operands[3],
>>> operands[0]};
>>> riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul_neg
>>> (PLUS, mode),
>>> riscv_vector::RVV_TERNOP, ops, operands[4]);
>>> @@ -1194,8 +1186,6 @@ (define_insn_and_split "*fms"
>>> [(const_int 0)]
>>>
[Bug target/110788] Spilling to mask register for GPR vec_duplicate
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110788 --- Comment #6 from Hongtao.liu --- Fixed in trunk.
[Bug modula2/108121] Failing tests on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108121

Gaius Mulley changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #12 from Gaius Mulley ---
Closing now that the patch has been applied on the gcc-13 branch.
Re: [C++] [Coroutines] Does GCC want to support `-fno-coroutines`?
On Thu, Jul 27, 2023 at 7:11 PM chuanqi.xcq via Gcc wrote:
>
> Hi,
> We're discussing implementing `-fno-coroutines` in clang so that the
> coroutine feature can be disabled with C++20 and later standards.
> A full discussion can be found here: https://reviews.llvm.org/D156247. A
> major motivation for us to do this is to keep consistency with GCC.
> However, we couldn't find `-fno-coroutines` in
> https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/C_002b_002b-Dialect-Options.html#index-fcoroutines,
> so we're not sure whether GCC intends to support it. We would like to ask
> GCC developers for their opinions on `-fno-coroutines`.

It is already supported. Read https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/Invoking-GCC.html, which says:
```
Many options have long names starting with ‘-f’ or with ‘-W’—for example, -fmove-loop-invariants, -Wformat and so on. Most of these have both positive and negative forms; the negative form of -ffoo is -fno-foo. This manual documents only one of these two forms, whichever one is not the default.
```

Thanks,
Andrew

> Thanks,
> Chuanqi
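The naming rule quoted above is mechanical, which a tiny sketch can show. `toggle_option` is a hypothetical helper, not a GCC API; it only applies the spelling convention, and whether a given option actually accepts the negated form is decided by GCC's option definitions (the manual says most options have both forms, not all).

```cpp
#include <string>

// Hypothetical helper (not part of GCC): flip an -f/-W option between its
// positive and negative spellings, following the documented convention
// that the negative form of -ffoo is -fno-foo (and of -Wfoo is -Wno-foo).
std::string toggle_option(const std::string& opt) {
  // Anything other than "-f<name>" or "-W<name>" is returned unchanged.
  if (opt.size() < 3 || opt[0] != '-' || (opt[1] != 'f' && opt[1] != 'W'))
    return opt;
  const std::string prefix = opt.substr(0, 2);  // "-f" or "-W"
  const std::string name = opt.substr(2);
  if (name.compare(0, 3, "no-") == 0)
    return prefix + name.substr(3);             // -fno-foo -> -ffoo
  return prefix + "no-" + name;                 // -ffoo -> -fno-foo
}
```

For example, `toggle_option("-fcoroutines")` yields `-fno-coroutines`, the spelling asked about in this thread.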
[Bug modula2/108121] Failing tests on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108121

--- Comment #11 from CVS Commits ---
The releases/gcc-13 branch has been updated by Gaius Mulley:

https://gcc.gnu.org/g:50fc6ce0cb8edf927ae6117a5484e4d8d52e393e

commit r13-7619-g50fc6ce0cb8edf927ae6117a5484e4d8d52e393e
Author: Gaius Mulley
Date:   Fri Jul 28 03:10:01 2023 +0100

    PR modula2/108121 Re-implement overflow detection for constant literals

    This patch fixes the overflow detection for constant literals.
    The ZTYPE is changed to int128 (or to int64 if int128 is unavailable),
    and constant literals are built from widest_int.  The widest_int is
    converted into the tree type and checked for overflow.
    m2expr_interpret_integer and append_m2_digit are removed.

    gcc/m2/ChangeLog:

            PR modula2/108121
            * gm2-compiler/M2ALU.mod (Less): Reformatted.
            * gm2-compiler/SymbolTable.mod (DetermineSizeOfConstant): Remove
            from import.
            (ConstantStringExceedsZType): Import.
            (GetConstLitType): Re-implement using ConstantStringExceedsZType.
            * gm2-gcc/m2decl.cc (m2decl_DetermineSizeOfConstant): Remove.
            (m2decl_ConstantStringExceedsZType): New function.
            (m2decl_BuildConstLiteralNumber): Re-implement.
            * gm2-gcc/m2decl.def (DetermineSizeOfConstant): Remove.
            (ConstantStringExceedsZType): New function.
            * gm2-gcc/m2decl.h (m2decl_DetermineSizeOfConstant): Remove.
            (m2decl_ConstantStringExceedsZType): New function.
            * gm2-gcc/m2expr.cc (append_digit): Remove.
            (m2expr_interpret_integer): Remove.
            (append_m2_digit): Remove.
            (m2expr_StrToWideInt): New function.
            (m2expr_interpret_m2_integer): Remove.
            * gm2-gcc/m2expr.def (CheckConstStrZtypeRange): New function.
            * gm2-gcc/m2expr.h (m2expr_StrToWideInt): New function.
            * gm2-gcc/m2type.cc (build_m2_word64_type_node): New function.
            (build_m2_ztype_node): New function.
            (m2type_InitBaseTypes): Call build_m2_ztype_node.
            * gm2-lang.cc (gm2_type_for_size): Re-write using early returns.
    gcc/testsuite/ChangeLog:

            PR modula2/108121
            * gm2/pim/fail/largeconst.mod: Increased constant value test
            to fail now that cc1gm2 uses widest_int to represent a ZTYPE.
            * gm2/pim/fail/largeconst2.mod: New test.

    (cherry picked from commit 68201409bc2867da45791331e385198826fa4576)

    Signed-off-by: Gaius Mulley
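The commit builds each literal at the widest available precision and then checks it against the ZTYPE range. A rough C++ sketch of that idea for decimal literals follows. It assumes a GCC-compatible compiler (for `__int128` and the generic `__builtin_mul_overflow`/`__builtin_add_overflow` built-ins), and `parse_decimal_literal`/`literal_exceeds_ztype` are hypothetical stand-ins, not the Modula-2 front end's `m2decl_ConstantStringExceedsZType` or its `widest_int` code.

```cpp
#include <optional>
#include <string>

// Stand-in for a 128-bit ZTYPE (GCC/Clang extension type).
__extension__ typedef __int128 ztype;

// Hypothetical helper: accumulate decimal digits in ztype, returning
// std::nullopt on a non-digit character or when the value overflows.
std::optional<ztype> parse_decimal_literal(const std::string& digits) {
  if (digits.empty()) return std::nullopt;
  ztype value = 0;
  for (char c : digits) {
    if (c < '0' || c > '9') return std::nullopt;
    // Each built-in stores the (possibly wrapped) result in 'value' and
    // returns true if the mathematically exact result did not fit.
    if (__builtin_mul_overflow(value, 10, &value) ||
        __builtin_add_overflow(value, c - '0', &value))
      return std::nullopt;
  }
  return value;
}

// Hypothetical analogue of "does this literal exceed the ZTYPE range?".
// Assumes 'digits' is already a well-formed decimal digit string.
bool literal_exceeds_ztype(const std::string& digits) {
  return !parse_decimal_literal(digits).has_value();
}
```

With a signed 128-bit type, 2^127 - 1 (170141183460469231731687303715884105727) is the largest accepted literal, and one more is rejected.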
[C++] [Coroutines] Does GCC want to support `-fno-coroutines`?
Hi,

We're discussing implementing `-fno-coroutines` in clang so that the coroutine feature can be disabled with C++20 and later standards. A full discussion can be found here: https://reviews.llvm.org/D156247. A major motivation for us to do this is to keep consistency with GCC. However, we couldn't find `-fno-coroutines` in https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/C_002b_002b-Dialect-Options.html#index-fcoroutines, so we're not sure whether GCC intends to support it. We would like to ask GCC developers for their opinions on `-fno-coroutines`.

Thanks,
Chuanqi
[Bug target/110788] Spilling to mask register for GPR vec_duplicate
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110788

--- Comment #5 from CVS Commits ---
The master branch has been updated by hongtao Liu:

https://gcc.gnu.org/g:54e54f77c1012ab53126314181c51eaee146ad5d

commit r14-2833-g54e54f77c1012ab53126314181c51eaee146ad5d
Author: liuhongt
Date:   Thu Jul 27 15:14:39 2023 +0800

    Add UNSPEC_MASKOP to vpbroadcastm pattern.

    Prevent rtl optimization of vec_duplicate + zero_extend to
    vpbroadcastm since there could be an extra kmov after RA.

    gcc/ChangeLog:

            PR target/110788
            * config/i386/sse.md (avx512cd_maskb_vec_dup): Add UNSPEC_MASKOP.
            (avx512cd_maskw_vec_dup): Ditto.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/pr110788.c: New test.
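The pattern being restricted here combines `zero_extend` of a mask value with `vec_duplicate`. As a hedged illustration, the scalar model below follows the documented semantics of the 512-bit forms of AVX512CD `VPBROADCASTMB2Q` and `VPBROADCASTMW2D`. The byte and word pattern names in the ChangeLog presumably correspond to these two forms, and the helper names are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Hypothetical scalar model of VPBROADCASTMB2Q (512-bit form): the low
// 8 bits of mask register k are zero-extended to 64 bits (zero_extend)
// and copied into every qword element (vec_duplicate).
std::array<std::uint64_t, 8> broadcastmb2q_512(std::uint64_t k) {
  std::array<std::uint64_t, 8> dst{};
  const std::uint64_t lo = k & 0xFFu;
  for (auto& e : dst) e = lo;
  return dst;
}

// Hypothetical scalar model of VPBROADCASTMW2D (512-bit form): the low
// 16 bits of k, zero-extended, copied into each of the 16 dword elements.
std::array<std::uint32_t, 16> broadcastmw2d_512(std::uint64_t k) {
  std::array<std::uint32_t, 16> dst{};
  const std::uint32_t lo = static_cast<std::uint32_t>(k & 0xFFFFu);
  for (auto& e : dst) e = lo;
  return dst;
}
```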
Re: [PATCH v2] c-family: Implement pragma_lex () for preprocess-only mode
On 7/27/23 18:59, Lewis Hyatt wrote: In order to support processing #pragma in preprocess-only mode (-E or -save-temps for gcc/g++), we need a way to obtain the #pragma tokens from libcpp. In full compilation modes, this is accomplished by calling pragma_lex (), which is a symbol that must be exported by the frontend, and which is currently implemented for C and C++. Neither of those frontends initializes its parser machinery in preprocess-only mode, and consequently pragma_lex () does not work in this case. Address that by adding a new function c_init_preprocess () for the frontends to implement, which arranges for pragma_lex () to work in preprocess-only mode, and adjusting pragma_lex () accordingly. In preprocess-only mode, the preprocessor is accustomed to controlling the interaction with libcpp, and it only knows about tokens that it has called into libcpp itself to obtain. Since it still needs to see the tokens obtained by pragma_lex () so that they can be streamed to the output, also adjust c_lex_with_flags () and related functions in c-family/c-lex.cc to inform the preprocessor about any tokens it won't be aware of. Currently, there is one place where we are already supporting #pragma in preprocess-only mode, namely the handling of `#pragma GCC diagnostic'. That was done by directly interfacing with libcpp, rather than making use of pragma_lex (). Now that pragma_lex () works, that code is no longer necessary; remove it. gcc/c-family/ChangeLog: * c-common.h (c_init_preprocess): Declare. (c_lex_enable_token_streaming): Declare. * c-opts.cc (c_common_init): Call c_init_preprocess (). * c-lex.cc (stream_tokens_to_preprocessor): New static variable. (c_lex_enable_token_streaming): New function. (cb_def_pragma): Add a comment. (get_token): New function wrapping cpp_get_token. (c_lex_with_flags): Use the new wrapper function to support obtaining tokens in preprocess_only mode. (lex_string): Likewise. 
* c-ppoutput.cc (preprocess_file): Call c_lex_enable_token_streaming when needed. * c-pragma.cc (pragma_diagnostic_lex_normal): Rename to... (pragma_diagnostic_lex): ...this. (pragma_diagnostic_lex_pp): Remove. (handle_pragma_diagnostic_impl): Call pragma_diagnostic_lex () in all modes. (c_pp_invoke_early_pragma_handler): Adapt to support pragma_lex () usage. * c-pragma.h (pragma_lex_discard_to_eol): Declare. gcc/c/ChangeLog: * c-parser.cc (pragma_lex_discard_to_eol): New function. (c_init_preprocess): New function. gcc/cp/ChangeLog: * parser.cc (c_init_preprocess): New function. (maybe_read_tokens_for_pragma_lex): New function. (pragma_lex): Support preprocess-only mode. (pragma_lex_discard_to_eol): New function. --- Notes: Hello- Here is version 2 of the patch, incorporating Jason's feedback from https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625591.html Thanks again, please let me know if it's OK? Bootstrap + regtest all languages on x86-64 Linux looks good. -Lewis gcc/c-family/c-common.h| 4 +++ gcc/c-family/c-lex.cc | 49 + gcc/c-family/c-opts.cc | 1 + gcc/c-family/c-ppoutput.cc | 17 +--- gcc/c-family/c-pragma.cc | 56 ++ gcc/c-family/c-pragma.h| 2 ++ gcc/c/c-parser.cc | 21 ++ gcc/cp/parser.cc | 45 ++ 8 files changed, 138 insertions(+), 57 deletions(-) diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h index b5ef5ff6b2c..2fe2f194660 100644 --- a/gcc/c-family/c-common.h +++ b/gcc/c-family/c-common.h @@ -990,6 +990,9 @@ extern void c_parse_file (void); extern void c_parse_final_cleanups (void); +/* This initializes for preprocess-only mode. */ +extern void c_init_preprocess (void); + /* These macros provide convenient access to the various _STMT nodes. */ /* Nonzero if a given STATEMENT_LIST represents the outermost binding @@ -1214,6 +1217,7 @@ extern tree c_build_bind_expr (location_t, tree, tree); /* In c-lex.cc. 
*/ extern enum cpp_ttype conflict_marker_get_final_tok_kind (enum cpp_ttype tok1_kind); +extern void c_lex_enable_token_streaming (bool enabled); /* In c-pch.cc */ extern void pch_init (void); diff --git a/gcc/c-family/c-lex.cc b/gcc/c-family/c-lex.cc index dcd061c7cb1..ac4c018d863 100644 --- a/gcc/c-family/c-lex.cc +++ b/gcc/c-family/c-lex.cc @@ -57,6 +57,17 @@ static void cb_ident (cpp_reader *, unsigned int, const cpp_string *); static void cb_def_pragma (cpp_reader *, unsigned int); static void cb_define (cpp_reader *, unsigned int, cpp_hashnode *); static void cb_undef (cpp_reader *, unsigned int, cpp_hashnode *); + +/* Flag to remember if we are in a mode (such as flag_preprocess_only) in
[PATCH v8] RISC-V: Support CALL for RVV floating-point dynamic rounding
From: Pan Li

Update in PATCH v8:
1. Emit non-abnormal backup insn to edge.
2. Fix _after return when call.
3. Refine some run tests.
4. Cleanup code.

Original commit logs:

In basic dynamic rounding mode, we simply ignore call instructions; this
PATCH takes care of calls. During a call, the frm may be updated or kept
as is. Thus, we must make sure of at least 2 things:

1. The static frm before the call should not pollute the frm value in the call.
2. The updated frm value in the call should be sticky after the call completes.

We perform the following steps to make that happen:

1. Mark the call instruction with the new mode DYN_CALL.
2. Mark the instruction after the CALL from NONE to DYN.
3. When emitting for a DYN_CALL, restore the frm value.
4. When emitting from a DYN_CALL, back up the frm value.

Let's take a flow for this.

               +-------------+
               | Entry (DYN) | <- frrm a5
               +-------------+
                /           \
  +-----------+               +-----------+
  | VFADD     |               | VFADD RTZ | <- fsrmi 1(RTZ)
  +-----------+               +-----------+
        |                           |
  +-----------+               +-----------+
  | CALL      |               | CALL      | <- fsrm a5
  +-----------+               +-----------+
        |                           |
  +-----------+               +-----------+
  | SHIFT     | <- frrm a5    | VFADD     | <- frrm a5
  +-----------+               +-----------+
        |                          /
  +-----------+                   /
  | VFADD RUP | <- fsrmi 3(RUP)  /
  +-----------+                 /
          \                    /
         +-----------------+
         | Exit (DYN_EXIT) | <- fsrm a5
         +-----------------+

When the call is the last insn of a bb, we handle it by inserting one frm
backup (frrm) insn at the end of the current bb when needed.

Signed-off-by: Pan Li
Co-Authored-By: Juzhe-Zhong

gcc/ChangeLog:

	* config/riscv/riscv.cc (DYNAMIC_FRM_RTL): New macro.
	(STATIC_FRM_P): Ditto.
	(struct mode_switching_info): New struct for mode switching.
	(struct machine_function): Add new field mode switching.
	(riscv_emit_frm_mode_set): Add DYN_CALL emit.
	(riscv_frm_adjust_mode_after_call): New function for call mode.
	(riscv_frm_emit_after_call_in_bb_end): New function to emit insn
	when a call is the last insn of a bb.
	(riscv_frm_mode_needed): New function for frm mode needed.
	(frm_unknown_dynamic_p): Remove call check.
	(riscv_mode_needed): Extract function for frm.
	(riscv_frm_mode_after): Add DYN_CALL after.
(riscv_mode_entry): Remove backup rtl initialization. * config/riscv/vector.md (frm_mode): Add dyn_call. (fsrmsi_restore_exit): Rename to _volatile. (fsrmsi_restore_volatile): Likewise. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/float-point-frm-insert-7.c: Adjust test cases. * gcc.target/riscv/rvv/base/float-point-frm-run-1.c: Ditto. * gcc.target/riscv/rvv/base/float-point-frm-run-2.c: Ditto. * gcc.target/riscv/rvv/base/float-point-frm-run-3.c: Ditto. * gcc.target/riscv/rvv/base/float-point-dynamic-frm-33.c: New test. * gcc.target/riscv/rvv/base/float-point-dynamic-frm-34.c: New test. * gcc.target/riscv/rvv/base/float-point-dynamic-frm-35.c: New test. * gcc.target/riscv/rvv/base/float-point-dynamic-frm-36.c: New test. * gcc.target/riscv/rvv/base/float-point-dynamic-frm-37.c: New test. * gcc.target/riscv/rvv/base/float-point-dynamic-frm-38.c: New test. * gcc.target/riscv/rvv/base/float-point-dynamic-frm-39.c: New test. * gcc.target/riscv/rvv/base/float-point-dynamic-frm-40.c: New test. * gcc.target/riscv/rvv/base/float-point-dynamic-frm-41.c: New test. * gcc.target/riscv/rvv/base/float-point-dynamic-frm-42.c: New test. * gcc.target/riscv/rvv/base/float-point-dynamic-frm-43.c: New test. * gcc.target/riscv/rvv/base/float-point-dynamic-frm-44.c: New test. * gcc.target/riscv/rvv/base/float-point-dynamic-frm-45.c: New test. * gcc.target/riscv/rvv/base/float-point-dynamic-frm-46.c: New test. * gcc.target/riscv/rvv/base/float-point-dynamic-frm-47.c: New test. * gcc.target/riscv/rvv/base/float-point-dynamic-frm-48.c: New test. * gcc.target/riscv/rvv/base/float-point-dynamic-frm-49.c: New test. * gcc.target/riscv/rvv/base/float-point-dynamic-frm-50.c: New test. * gcc.target/riscv/rvv/base/float-point-dynamic-frm-51.c: New test. * gcc.target/riscv/rvv/base/float-point-dynamic-frm-52.c: New test. * gcc.target/riscv/rvv/base/float-point-dynamic-frm-53.c: New test. * gcc.target/riscv/rvv/base/float-point-dynamic-frm-54.c: New test. 
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-55.c: New test. * gcc.target/riscv/rvv/base/float-point-dynamic-frm-56.c: New test. *
Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies
On 7/23/23 20:26, Ben Boeckel wrote:
> On Fri, Jul 21, 2023 at 16:23:07 -0400, Nathan Sidwell wrote:
>> It occurs to me that the model I am envisioning is similar to CMake's
>> object libraries. Object libraries are a convenient name for a bunch of
>> object files. IIUC they're linked by naming the individual object files
>> (or I think they could be implemented as a static lib linked with
>> --whole-archive path/to/libfoo.a -no-whole-archive). But for this
>> conversation consider them a bunch of separate object files with a
>> convenient group name.
>
> Yes, `--whole-archive` would work great if it had any kind of
> portability across CMake's platform set.
>
>> Consider also that object libraries could themselves contain object
>> libraries (I don't know if they can, but it seems like a useful
>> concept). Then one could create an object library from a collection of
>> object files and object libraries (recursively). CMake would handle the
>> transitive graph.
>
> I think this detail is relevant, but you can use `$` as an `INTERFACE`
> sources and it would act like that, but it is an explicit thing.
> Instead, `OBJECT` libraries *only* provide their objects to targets that
> *directly* link them. If not, given this:
>
>     A (OBJECT library)
>     B (library of some kind; links PUBLIC to A)
>     C (links to B)
>
> If `A` has things like linker flags (or, more likely, libraries) as part
> of its usage requirements, C will get them on its link line. However,
> if OBJECT files are transitive in the same way, the linker (on most
> platforms at least) chokes because it now has duplicates of all of A's
> symbols: those from the B library and those from A's objects on the
> link line.
>
>> Now, allow an object library to itself have some kind of tangible,
>> on-disk representation. *BUT* not like a static library -- it doesn't
>> include the object files.
>>
>> Now that immediately maps onto modules.
>>
>> CMI: Object library
>> Direct imports: Direct object libraries of an object library
>>
>> This is why I don't understand the need to explicitly indicate the
>> indirect imports of a CMI. CMake knows them, because it knows the graph.
>
> Sure, *CMake* knows them, but the *build tool* needs to be told
> (typically `make` or `ninja`) because it is what is actually executing
> the build graph. The way this is communicated is via `-MF` files and
> that's what I'm providing in this patch. Note that `ninja` does not
> allow rules to specify such dependencies for other rules than the one
> it is reading the file for.

But since the direct imports need to be rebuilt themselves if the
transitive imports change, the build graph should be the same whether or
not the transitive imports are repeated? Either way, if a transitive
import changes you need to rebuild the direct import and then the
importer.

I guess it shouldn't hurt to have the transitive imports in the -MF
file, as long as they aren't also in the p1689 file, so I'm not
particularly opposed to this change, but I don't see how it makes a
practical difference.

Jason
[Bug middle-end/109986] missing fold (~a | b) ^ a => ~(a & b)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109986 --- Comment #5 from Ivan Sorokin --- (In reply to CVS Commits from comment #4) > commit r14-2751-g2a3556376c69a1fb588dcf25225950575e42784f > Author: Drew Ross > Co-authored-by: Jakub Jelinek Thank you!
[PATCH v2] c-family: Implement pragma_lex () for preprocess-only mode
In order to support processing #pragma in preprocess-only mode (-E or -save-temps for gcc/g++), we need a way to obtain the #pragma tokens from libcpp. In full compilation modes, this is accomplished by calling pragma_lex (), which is a symbol that must be exported by the frontend, and which is currently implemented for C and C++. Neither of those frontends initializes its parser machinery in preprocess-only mode, and consequently pragma_lex () does not work in this case. Address that by adding a new function c_init_preprocess () for the frontends to implement, which arranges for pragma_lex () to work in preprocess-only mode, and adjusting pragma_lex () accordingly. In preprocess-only mode, the preprocessor is accustomed to controlling the interaction with libcpp, and it only knows about tokens that it has called into libcpp itself to obtain. Since it still needs to see the tokens obtained by pragma_lex () so that they can be streamed to the output, also adjust c_lex_with_flags () and related functions in c-family/c-lex.cc to inform the preprocessor about any tokens it won't be aware of. Currently, there is one place where we are already supporting #pragma in preprocess-only mode, namely the handling of `#pragma GCC diagnostic'. That was done by directly interfacing with libcpp, rather than making use of pragma_lex (). Now that pragma_lex () works, that code is no longer necessary; remove it. gcc/c-family/ChangeLog: * c-common.h (c_init_preprocess): Declare. (c_lex_enable_token_streaming): Declare. * c-opts.cc (c_common_init): Call c_init_preprocess (). * c-lex.cc (stream_tokens_to_preprocessor): New static variable. (c_lex_enable_token_streaming): New function. (cb_def_pragma): Add a comment. (get_token): New function wrapping cpp_get_token. (c_lex_with_flags): Use the new wrapper function to support obtaining tokens in preprocess_only mode. (lex_string): Likewise. * c-ppoutput.cc (preprocess_file): Call c_lex_enable_token_streaming when needed. 
* c-pragma.cc (pragma_diagnostic_lex_normal): Rename to... (pragma_diagnostic_lex): ...this. (pragma_diagnostic_lex_pp): Remove. (handle_pragma_diagnostic_impl): Call pragma_diagnostic_lex () in all modes. (c_pp_invoke_early_pragma_handler): Adapt to support pragma_lex () usage. * c-pragma.h (pragma_lex_discard_to_eol): Declare. gcc/c/ChangeLog: * c-parser.cc (pragma_lex_discard_to_eol): New function. (c_init_preprocess): New function. gcc/cp/ChangeLog: * parser.cc (c_init_preprocess): New function. (maybe_read_tokens_for_pragma_lex): New function. (pragma_lex): Support preprocess-only mode. (pragma_lex_discard_to_eol): New function. --- Notes: Hello- Here is version 2 of the patch, incorporating Jason's feedback from https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625591.html Thanks again, please let me know if it's OK? Bootstrap + regtest all languages on x86-64 Linux looks good. -Lewis gcc/c-family/c-common.h| 4 +++ gcc/c-family/c-lex.cc | 49 + gcc/c-family/c-opts.cc | 1 + gcc/c-family/c-ppoutput.cc | 17 +--- gcc/c-family/c-pragma.cc | 56 ++ gcc/c-family/c-pragma.h| 2 ++ gcc/c/c-parser.cc | 21 ++ gcc/cp/parser.cc | 45 ++ 8 files changed, 138 insertions(+), 57 deletions(-) diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h index b5ef5ff6b2c..2fe2f194660 100644 --- a/gcc/c-family/c-common.h +++ b/gcc/c-family/c-common.h @@ -990,6 +990,9 @@ extern void c_parse_file (void); extern void c_parse_final_cleanups (void); +/* This initializes for preprocess-only mode. */ +extern void c_init_preprocess (void); + /* These macros provide convenient access to the various _STMT nodes. */ /* Nonzero if a given STATEMENT_LIST represents the outermost binding @@ -1214,6 +1217,7 @@ extern tree c_build_bind_expr (location_t, tree, tree); /* In c-lex.cc. 
*/ extern enum cpp_ttype conflict_marker_get_final_tok_kind (enum cpp_ttype tok1_kind); +extern void c_lex_enable_token_streaming (bool enabled); /* In c-pch.cc */ extern void pch_init (void); diff --git a/gcc/c-family/c-lex.cc b/gcc/c-family/c-lex.cc index dcd061c7cb1..ac4c018d863 100644 --- a/gcc/c-family/c-lex.cc +++ b/gcc/c-family/c-lex.cc @@ -57,6 +57,17 @@ static void cb_ident (cpp_reader *, unsigned int, const cpp_string *); static void cb_def_pragma (cpp_reader *, unsigned int); static void cb_define (cpp_reader *, unsigned int, cpp_hashnode *); static void cb_undef (cpp_reader *, unsigned int, cpp_hashnode *); + +/* Flag to remember if we are in a mode (such as flag_preprocess_only) in which + tokens obtained here need to be streamed to the preprocessor. */
Re: [PATCH] bpf: ISA V4 sign-extending move and load insns [PR110782,PR110784]
On 7/27/23 15:27, Jose E. Marchesi wrote: > > Hi David. > Thanks for the patch. > >> BPF ISA V4 introduces sign-extending move and load operations. This >> patch makes the BPF backend generate those instructions, when enabled >> and useful. >> >> A new option, -m[no-]smov gates generation of these instructions, and is >> enabled by default for -mcpu=v4 and above. Tests for the new >> instructions and documentation for the new options are included. >> >> Tested on bpf-unknown-none. >> OK? >> >> gcc/ >> >> * config/bpf/bpf.opt (msmov): New option. >> * config/bpf/bpf.cc (bpf_option_override): Handle it here. >> * config/bpf/bpf.md (*extendsidi2): New. >> (extendhidi2): New. >> (extendqidi2): New. >> (extendsisi2): New. >> (extendhisi2): New. >> (extendqisi2): New. >> * doc/invoke.texi (Option Summary): Add -msmov eBPF option. >> (eBPF Options): Add -m[no-]smov. Document that -mcpu=v4 >> also enables -msmov. >> >> gcc/testsuite/ >> >> * gcc.target/bpf/sload-1.c: New test. >> * gcc.target/bpf/sload-pseudoc-1.c: New test. >> * gcc.target/bpf/smov-1.c: New test. >> * gcc.target/bpf/smov-pseudoc-1.c: New test. > > Looks like you forgot to mention the bugzilla PR in the changelog > entries. Would be nice to have them there so automatic updates happen > in the bugzillas. Good catch, thanks! > > Other than that, OK. > Thanks! Pushed, with PRs added in the changelog and a tiny reword to the doc below. 
> >> --- >> gcc/config/bpf/bpf.cc | 3 ++ >> gcc/config/bpf/bpf.md | 50 +++ >> gcc/config/bpf/bpf.opt| 4 ++ >> gcc/doc/invoke.texi | 9 +++- >> gcc/testsuite/gcc.target/bpf/sload-1.c| 16 ++ >> .../gcc.target/bpf/sload-pseudoc-1.c | 16 ++ >> gcc/testsuite/gcc.target/bpf/smov-1.c | 18 +++ >> gcc/testsuite/gcc.target/bpf/smov-pseudoc-1.c | 18 +++ >> 8 files changed, 133 insertions(+), 1 deletion(-) >> create mode 100644 gcc/testsuite/gcc.target/bpf/sload-1.c >> create mode 100644 gcc/testsuite/gcc.target/bpf/sload-pseudoc-1.c >> create mode 100644 gcc/testsuite/gcc.target/bpf/smov-1.c >> create mode 100644 gcc/testsuite/gcc.target/bpf/smov-pseudoc-1.c >> >> diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc >> index 0e07b416add..b5b5674edbb 100644 >> --- a/gcc/config/bpf/bpf.cc >> +++ b/gcc/config/bpf/bpf.cc >> @@ -262,6 +262,9 @@ bpf_option_override (void) >>if (bpf_has_sdiv == -1) >> bpf_has_sdiv = (bpf_isa >= ISA_V4); >> >> + if (bpf_has_smov == -1) >> +bpf_has_smov = (bpf_isa >= ISA_V4); >> + >>/* Disable -fstack-protector as it is not supported in BPF. */ >>if (flag_stack_protect) >> { >> diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md >> index 66436397bb7..a69a239b9d6 100644 >> --- a/gcc/config/bpf/bpf.md >> +++ b/gcc/config/bpf/bpf.md >> @@ -307,6 +307,56 @@ (define_expand "extendsidi2" >>DONE; >> }) >> >> +;; ISA V4 introduces sign-extending move and load operations. 
>> + >> +(define_insn "*extendsidi2" >> + [(set (match_operand:DI 0 "register_operand" "=r,r") >> +(sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" "r,q")))] >> + "bpf_has_smov" >> + "@ >> + {movs\t%0,%1,32|%0 = (s32) %1} >> + {ldxsw\t%0,%1|%0 = *(s32 *) (%1)}" >> + [(set_attr "type" "alu,ldx")]) >> + >> +(define_insn "extendhidi2" >> + [(set (match_operand:DI 0 "register_operand" "=r,r") >> +(sign_extend:DI (match_operand:HI 1 "nonimmediate_operand" "r,q")))] >> + "bpf_has_smov" >> + "@ >> + {movs\t%0,%1,16|%0 = (s16) %1} >> + {ldxsh\t%0,%1|%0 = *(s16 *) (%1)}" >> + [(set_attr "type" "alu,ldx")]) >> + >> +(define_insn "extendqidi2" >> + [(set (match_operand:DI 0 "register_operand" "=r,r") >> +(sign_extend:DI (match_operand:QI 1 "nonimmediate_operand" "r,q")))] >> + "bpf_has_smov" >> + "@ >> + {movs\t%0,%1,8|%0 = (s8) %1} >> + {ldxsb\t%0,%1|%0 = *(s8 *) (%1)}" >> + [(set_attr "type" "alu,ldx")]) >> + >> +(define_insn "extendsisi2" >> + [(set (match_operand:SI 0 "register_operand" "=r") >> +(sign_extend:SI (match_operand:SI 1 "register_operand" "r")))] >> + "bpf_has_smov" >> + "{movs32\t%0,%1,32|%w0 = (s32) %w1}" >> + [(set_attr "type" "alu")]) >> + >> +(define_insn "extendhisi2" >> + [(set (match_operand:SI 0 "register_operand" "=r") >> +(sign_extend:SI (match_operand:HI 1 "register_operand" "r")))] >> + "bpf_has_smov" >> + "{movs32\t%0,%1,16|%w0 = (s16) %w1}" >> + [(set_attr "type" "alu")]) >> + >> +(define_insn "extendqisi2" >> + [(set (match_operand:SI 0 "register_operand" "=r") >> +(sign_extend:SI (match_operand:QI 1 "register_operand" "r")))] >> + "bpf_has_smov" >> + "{movs32\t%0,%1,8|%w0 = (s8) %w1}" >> + [(set_attr "type" "alu")]) >> + >> Data movement >> >> (define_mode_iterator MM [QI HI SI DI SF DF]) >> diff --git
gcc-11-20230727 is now available
Snapshot gcc-11-20230727 is now available on https://gcc.gnu.org/pub/gcc/snapshots/11-20230727/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 11 git branch with the following options: git://gcc.gnu.org/git/gcc.git branch releases/gcc-11 revision 4286684bacd1189e38c1e6e087662152e0a306a1 You'll find: gcc-11-20230727.tar.xz Complete GCC SHA256=144da96e72d5b5aa2e249596bef6b70b840f6ca1abac920d91b50d6e46c8aecd SHA1=811e8902343ba583f47e253d9b77d7b5080d3842 Diffs from 11-20230720 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-11 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
[Bug target/110782] bpf: make use of the V4 sign-extended load instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110782 --- Comment #1 from CVS Commits --- The master branch has been updated by David Faust : https://gcc.gnu.org/g:14dab1a1bcc3f0315e33d166df06520fba409c9b commit r14-2831-g14dab1a1bcc3f0315e33d166df06520fba409c9b Author: David Faust Date: Thu Jul 27 13:55:44 2023 -0700 bpf: ISA V4 sign-extending move and load insns [PR110782,PR110784] BPF ISA V4 introduces sign-extending move and load operations. This patch makes the BPF backend generate those instructions, when enabled and useful. A new option, -m[no-]smov gates generation of these instructions, and is enabled by default for -mcpu=v4 and above. Tests for the new instructions and documentation for the new options are included. PR target/110782 PR target/110784 gcc/ * config/bpf/bpf.opt (msmov): New option. * config/bpf/bpf.cc (bpf_option_override): Handle it here. * config/bpf/bpf.md (*extendsidi2): New. (extendhidi2): New. (extendqidi2): New. (extendsisi2): New. (extendhisi2): New. (extendqisi2): New. * doc/invoke.texi (Option Summary): Add -msmov eBPF option. (eBPF Options): Add -m[no-]smov. Document that -mcpu=v4 also enables -msmov. gcc/testsuite/ * gcc.target/bpf/sload-1.c: New test. * gcc.target/bpf/sload-pseudoc-1.c: New test. * gcc.target/bpf/smov-1.c: New test. * gcc.target/bpf/smov-pseudoc-1.c: New test.
[Bug target/110784] bpf: make use of the V4 sign-extended move instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110784 --- Comment #1 from CVS Commits --- The master branch has been updated by David Faust : https://gcc.gnu.org/g:14dab1a1bcc3f0315e33d166df06520fba409c9b commit r14-2831-g14dab1a1bcc3f0315e33d166df06520fba409c9b Author: David Faust Date: Thu Jul 27 13:55:44 2023 -0700 bpf: ISA V4 sign-extending move and load insns [PR110782,PR110784] BPF ISA V4 introduces sign-extending move and load operations. This patch makes the BPF backend generate those instructions, when enabled and useful. A new option, -m[no-]smov gates generation of these instructions, and is enabled by default for -mcpu=v4 and above. Tests for the new instructions and documentation for the new options are included. PR target/110782 PR target/110784 gcc/ * config/bpf/bpf.opt (msmov): New option. * config/bpf/bpf.cc (bpf_option_override): Handle it here. * config/bpf/bpf.md (*extendsidi2): New. (extendhidi2): New. (extendqidi2): New. (extendsisi2): New. (extendhisi2): New. (extendqisi2): New. * doc/invoke.texi (Option Summary): Add -msmov eBPF option. (eBPF Options): Add -m[no-]smov. Document that -mcpu=v4 also enables -msmov. gcc/testsuite/ * gcc.target/bpf/sload-1.c: New test. * gcc.target/bpf/sload-pseudoc-1.c: New test. * gcc.target/bpf/smov-1.c: New test. * gcc.target/bpf/smov-pseudoc-1.c: New test.
Re: Update and Questions on CPython Extension Module -fanalyzer plugin development
On Thu, 2023-07-27 at 18:13 -0400, Eric Feng wrote:
> Hi Dave,
>
> Thanks for the comments!
>
> [...]
> > Do you have any DejaGnu tests for this functionality? For example,
> > given PyList_New
> > https://docs.python.org/3/c-api/list.html#c.PyList_New
> > there could be a test like:
> >
> > /* { dg-require-effective-target python_h } */
> >
> > #define PY_SSIZE_T_CLEAN
> > #include
> > #include "analyzer-decls.h"
> >
> > PyObject *
> > test_PyList_New (Py_ssize_t len)
> > {
> >   PyObject *obj = PyList_New (len);
> >   if (obj)
> >     {
> >       __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> >       __analyzer_eval (PyList_Check (obj)); /* { dg-warning "TRUE" } */
> >       __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" } */
> >     }
> >   else
> >     __analyzer_dump_path (); /* { dg-warning "path" } */
> >   return obj;
> > }
> >
> > ...or similar, to verify that we simulate that the call can both
> > succeed and fail, and to verify properties of the store along the
> > "success" path. Caveat: I didn't look at exactly what properties
> > you're simulating, so the above tests might need adjusting.
> >
>
> I am currently in the process of developing more tests. Specific to
> the test you provided as an example, we are passing all cases except
> for PyList_Check. PyList_Check does not pass because I have not yet
> added support for the various definitions of tp_flags.

As noted in our chat earlier, I don't think we can easily make these work.

Looking at CPython's implementation: PyList_Type's initializer here:
https://github.com/python/cpython/blob/main/Objects/listobject.c#L3101
initializes tp_flags with the flags, but:
(a) we don't see that code when compiling a user's extension module
(b) even if we did, PyList_Type is non-const, so the analyzer has to
assume that tp_flags could have been written to since it was
initialized.

In theory we could special-case such lookups, so that, say, a plugin
could register assumptions into the analyzer about the value of bits
within (PyList_Type.tp_flags). However, this seems like a future
feature.

> I also
> encountered a minor hiccup where PyList_CheckExact appeared to give
> "UNKNOWN" rather than "TRUE", but this has since been fixed. The
> problem was caused by accidentally using the tree representation of
> struct PyList_Type as opposed to struct PyList_Type * when creating a
> pointer sval to the region for PyList_Type.

Ah, good.

>
> [...]
> >
> > > Let's consider the following example which lacks error checking:
> > >
> > > PyObject* foo() {
> > >   PyObject *item = PyLong_FromLong(10);
> > >   PyObject *list = PyList_New(5);
> > >   return list;
> > > }
> > >
> > > The states for when PyLong_FromLong fails and when PyLong_FromLong
> > > succeeds are merged before the call to PyObject* list =
> > > PyList_New(5).
> >
> > Ideally we would emit a leak warning about the "success" case of
> > PyLong_FromLong here. I think you're running into the problem of the
> > "store" part of the program_state being separate from the "malloc"
> > state machine part of program_state - I'm guessing that you're
> > creating a heap_allocated_region for the new python object, but the
> > "malloc" state machine isn't transitioning the pointer from "start" to
> > "assumed-non-null". Such state machine states inhibit state-merging,
> > and so this might solve your state-merging problem.
> >
> > I think we need a way to call
> > malloc_state_machine::on_allocator_call from outside of
> > sm-malloc.cc. See region_model::on_realloc_with_move for an example
> > of how to do something similar.
> >
>
> Thank you for the suggestion — this worked great and has solved the
> issue!

Excellent!

Thanks for the update
Dave
[Bug c/110834] New: Incorrect format-nonliteral warning when wrapping a printf-family function using __builtin_va_arg_pack
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110834

Bug ID: 110834
Summary: Incorrect format-nonliteral warning when wrapping a printf-family function using __builtin_va_arg_pack
Product: gcc
Version: 13.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: ksperling at apple dot com
Target Milestone: ---

Created attachment 55650
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55650=edit
pre-processed example code

When wrapping a printf-family function (e.g. printf, sprintf, …) with an
inline wrapper that uses __builtin_va_arg_pack() to forward the format
arguments, a format-nonliteral warning is incorrectly generated, even
though the format string argument is annotated with the correct format
attribute. For example, this code

#include

__inline__ __attribute__((__always_inline__, __format__(printf, 1, 2)))
int myprintf(char const *fmt, ...)
{
    return printf(fmt, __builtin_va_arg_pack());
}

results in the following warning (or error with -Werror):

$ gcc -c -Wformat -Werror=format-nonliteral format_va_arg_pack.c
format_va_arg_pack.c: In function ‘myprintf’:
format_va_arg_pack.c:6:5: error: format not a string literal, argument types not checked [-Werror=format-nonliteral]
    6 |     return printf(fmt, __builtin_va_arg_pack());
      |     ^~
cc1: some warnings being treated as errors

The gcc output below and the attached .i file were generated with gcc
10.2.1 on Debian, but I have verified that the issue also reproduces on
gcc 13.2 and gcc trunk on godbolt.org (https://godbolt.org/z/q6xMs5481).

For a real-world manifestation of this issue, see
https://github.com/openwrt/openwrt/issues/13016 where this issue is
triggered by wrapper functions of this style that are part of the
fortify-headers library (https://git.2f30.org/fortify-headers/).

Full compiler output with -v:

Using built-in specs.
COLLECT_GCC=gcc OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa:hsa OFFLOAD_TARGET_DEFAULT=1 Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Debian 10.2.1-6' --with-bugurl=file:///usr/share/doc/gcc-10/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-10 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-10-Km9U7s/gcc-10-10.2.1/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-Km9U7s/gcc-10-10.2.1/debian/tmp-gcn/usr,hsa --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-mutex Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 10.2.1 20210110 (Debian 10.2.1-6) COLLECT_GCC_OPTIONS='-v' '-save-temps' '-c' '-Wformat=1' '-Werror=format-nonliteral' '-mtune=generic' '-march=x86-64' /usr/lib/gcc/x86_64-linux-gnu/10/cc1 -E -quiet -v -imultiarch x86_64-linux-gnu format_va_arg_pack.c -mtune=generic -march=x86-64 -Wformat=1 -Werror=format-nonliteral -fpch-preprocess -fasynchronous-unwind-tables -o format_va_arg_pack.i ignoring nonexistent directory "/usr/local/include/x86_64-linux-gnu" ignoring nonexistent directory "/usr/lib/gcc/x86_64-linux-gnu/10/include-fixed" ignoring nonexistent 
directory "/usr/lib/gcc/x86_64-linux-gnu/10/../../../../x86_64-linux-gnu/include" #include "..." search starts here: #include <...> search starts here: /usr/lib/gcc/x86_64-linux-gnu/10/include /usr/local/include /usr/include/x86_64-linux-gnu /usr/include End of search list. COLLECT_GCC_OPTIONS='-v' '-save-temps' '-c' '-Wformat=1' '-Werror=format-nonliteral' '-mtune=generic' '-march=x86-64' /usr/lib/gcc/x86_64-linux-gnu/10/cc1 -fpreprocessed format_va_arg_pack.i -quiet -dumpbase format_va_arg_pack.c -mtune=generic -march=x86-64 -auxbase format_va_arg_pack -Wformat=1 -Werror=format-nonliteral -version -fasynchronous-unwind-tables -o format_va_arg_pack.s GNU C17 (Debian 10.2.1-6) version 10.2.1 20210110 (x86_64-linux-gnu) compiled by GNU C version 10.2.1 20210110, GMP version 6.2.1, MPFR version 4.1.0, MPC version 1.2.0, isl version isl-0.23-GMP GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072 GNU C17 (Debian
Re: [PATCH] bpf: ISA V4 sign-extending move and load insns [PR110782,PR110784]
Hi David. Thanks for the patch. > BPF ISA V4 introduces sign-extending move and load operations. This > patch makes the BPF backend generate those instructions, when enabled > and useful. > > A new option, -m[no-]smov gates generation of these instructions, and is > enabled by default for -mcpu=v4 and above. Tests for the new > instructions and documentation for the new options are included. > > Tested on bpf-unknown-none. > OK? > > gcc/ > > * config/bpf/bpf.opt (msmov): New option. > * config/bpf/bpf.cc (bpf_option_override): Handle it here. > * config/bpf/bpf.md (*extendsidi2): New. > (extendhidi2): New. > (extendqidi2): New. > (extendsisi2): New. > (extendhisi2): New. > (extendqisi2): New. > * doc/invoke.texi (Option Summary): Add -msmov eBPF option. > (eBPF Options): Add -m[no-]smov. Document that -mcpu=v4 > also enables -msmov. > > gcc/testsuite/ > > * gcc.target/bpf/sload-1.c: New test. > * gcc.target/bpf/sload-pseudoc-1.c: New test. > * gcc.target/bpf/smov-1.c: New test. > * gcc.target/bpf/smov-pseudoc-1.c: New test. Looks like you forgot to mention the bugzilla PR in the changelog entries. Would be nice to have them there so automatic updates happen in the bugzillas. Other than that, OK. Thanks! 
> --- > gcc/config/bpf/bpf.cc | 3 ++ > gcc/config/bpf/bpf.md | 50 +++ > gcc/config/bpf/bpf.opt| 4 ++ > gcc/doc/invoke.texi | 9 +++- > gcc/testsuite/gcc.target/bpf/sload-1.c| 16 ++ > .../gcc.target/bpf/sload-pseudoc-1.c | 16 ++ > gcc/testsuite/gcc.target/bpf/smov-1.c | 18 +++ > gcc/testsuite/gcc.target/bpf/smov-pseudoc-1.c | 18 +++ > 8 files changed, 133 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.target/bpf/sload-1.c > create mode 100644 gcc/testsuite/gcc.target/bpf/sload-pseudoc-1.c > create mode 100644 gcc/testsuite/gcc.target/bpf/smov-1.c > create mode 100644 gcc/testsuite/gcc.target/bpf/smov-pseudoc-1.c > > diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc > index 0e07b416add..b5b5674edbb 100644 > --- a/gcc/config/bpf/bpf.cc > +++ b/gcc/config/bpf/bpf.cc > @@ -262,6 +262,9 @@ bpf_option_override (void) >if (bpf_has_sdiv == -1) > bpf_has_sdiv = (bpf_isa >= ISA_V4); > > + if (bpf_has_smov == -1) > +bpf_has_smov = (bpf_isa >= ISA_V4); > + >/* Disable -fstack-protector as it is not supported in BPF. */ >if (flag_stack_protect) > { > diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md > index 66436397bb7..a69a239b9d6 100644 > --- a/gcc/config/bpf/bpf.md > +++ b/gcc/config/bpf/bpf.md > @@ -307,6 +307,56 @@ (define_expand "extendsidi2" >DONE; > }) > > +;; ISA V4 introduces sign-extending move and load operations. 
> + > +(define_insn "*extendsidi2" > + [(set (match_operand:DI 0 "register_operand" "=r,r") > +(sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" "r,q")))] > + "bpf_has_smov" > + "@ > + {movs\t%0,%1,32|%0 = (s32) %1} > + {ldxsw\t%0,%1|%0 = *(s32 *) (%1)}" > + [(set_attr "type" "alu,ldx")]) > + > +(define_insn "extendhidi2" > + [(set (match_operand:DI 0 "register_operand" "=r,r") > +(sign_extend:DI (match_operand:HI 1 "nonimmediate_operand" "r,q")))] > + "bpf_has_smov" > + "@ > + {movs\t%0,%1,16|%0 = (s16) %1} > + {ldxsh\t%0,%1|%0 = *(s16 *) (%1)}" > + [(set_attr "type" "alu,ldx")]) > + > +(define_insn "extendqidi2" > + [(set (match_operand:DI 0 "register_operand" "=r,r") > +(sign_extend:DI (match_operand:QI 1 "nonimmediate_operand" "r,q")))] > + "bpf_has_smov" > + "@ > + {movs\t%0,%1,8|%0 = (s8) %1} > + {ldxsb\t%0,%1|%0 = *(s8 *) (%1)}" > + [(set_attr "type" "alu,ldx")]) > + > +(define_insn "extendsisi2" > + [(set (match_operand:SI 0 "register_operand" "=r") > +(sign_extend:SI (match_operand:SI 1 "register_operand" "r")))] > + "bpf_has_smov" > + "{movs32\t%0,%1,32|%w0 = (s32) %w1}" > + [(set_attr "type" "alu")]) > + > +(define_insn "extendhisi2" > + [(set (match_operand:SI 0 "register_operand" "=r") > +(sign_extend:SI (match_operand:HI 1 "register_operand" "r")))] > + "bpf_has_smov" > + "{movs32\t%0,%1,16|%w0 = (s16) %w1}" > + [(set_attr "type" "alu")]) > + > +(define_insn "extendqisi2" > + [(set (match_operand:SI 0 "register_operand" "=r") > +(sign_extend:SI (match_operand:QI 1 "register_operand" "r")))] > + "bpf_has_smov" > + "{movs32\t%0,%1,8|%w0 = (s8) %w1}" > + [(set_attr "type" "alu")]) > + > Data movement > > (define_mode_iterator MM [QI HI SI DI SF DF]) > diff --git a/gcc/config/bpf/bpf.opt b/gcc/config/bpf/bpf.opt > index b21cfcab9ea..8e240d397e4 100644 > --- a/gcc/config/bpf/bpf.opt > +++ b/gcc/config/bpf/bpf.opt > @@ -71,6 +71,10 @@ msdiv > Target Var(bpf_has_sdiv) Init(-1) > Enable signed division and modulus instructions. > > +msmov > +Target
Re: [PATCH] bpf: minor doc cleanup for command-line options
Hi David, thanks for the patch. OK. > This patch makes some minor cleanups to eBPF options documented in > invoke.texi: > - Delete some vestigial docs for removed -mkernel option > - Add -mbswap and -msdiv to the option summary > - Note the negative versions of several options > - Note that -mcpu=v4 also enables -msdiv. > > gcc/ > > * doc/invoke.texi (Option Summary): Remove -mkernel eBPF option. > Add -mbswap and -msdiv eBPF options. > (eBPF Options): Remove -mkernel. Add -mno-{jmpext, jmp32, > alu32, v3-atomics, bswap, sdiv}. Document that -mcpu=v4 also > enables -msdiv. > --- > gcc/doc/invoke.texi | 48 ++--- > 1 file changed, 23 insertions(+), 25 deletions(-) > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index e0fd7bd5b72..91113dd5821 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -945,9 +945,10 @@ Objective-C and Objective-C++ Dialects}. > -mmemory-latency=@var{time}} > > @emph{eBPF Options} > -@gccoptlist{-mbig-endian -mlittle-endian -mkernel=@var{version} > +@gccoptlist{-mbig-endian -mlittle-endian > -mframe-limit=@var{bytes} -mxbpf -mco-re -mno-co-re -mjmpext > --mjmp32 -malu32 -mv3-atomics -mcpu=@var{version} -masm=@var{dialect}} > +-mjmp32 -malu32 -mv3-atomics -mbswap -msdiv -mcpu=@var{version} > +-masm=@var{dialect}} > > @emph{FR30 Options} > @gccoptlist{-msmall-model -mno-lsim} > @@ -24674,18 +24675,6 @@ the value that can be specified should be less than > or equal to > @samp{32767}. Defaults to whatever limit is imposed by the version of > the Linux kernel targeted. > > -@opindex mkernel > -@item -mkernel=@var{version} > -This specifies the minimum version of the kernel that will run the > -compiled program. GCC uses this version to determine which > -instructions to use, what kernel helpers to allow, etc.
Currently, > -@var{version} can be one of @samp{4.0}, @samp{4.1}, @samp{4.2}, > -@samp{4.3}, @samp{4.4}, @samp{4.5}, @samp{4.6}, @samp{4.7}, > -@samp{4.8}, @samp{4.9}, @samp{4.10}, @samp{4.11}, @samp{4.12}, > -@samp{4.13}, @samp{4.14}, @samp{4.15}, @samp{4.16}, @samp{4.17}, > -@samp{4.18}, @samp{4.19}, @samp{4.20}, @samp{5.0}, @samp{5.1}, > -@samp{5.2}, @samp{latest} and @samp{native}. > - > @opindex mbig-endian > @item -mbig-endian > Generate code for a big-endian target. > @@ -24696,30 +24685,38 @@ Generate code for a little-endian target. This is > the default. > > @opindex mjmpext > @item -mjmpext > -Enable generation of extra conditional-branch instructions. > +@itemx -mno-jmpext > +Enable or disable generation of extra conditional-branch instructions. > Enabled for CPU v2 and above. > > @opindex mjmp32 > @item -mjmp32 > -Enable 32-bit jump instructions. Enabled for CPU v3 and above. > +@itemx -mno-jmp32 > +Enable or disable generation of 32-bit jump instructions. > +Enabled for CPU v3 and above. > > @opindex malu32 > @item -malu32 > -Enable 32-bit ALU instructions. Enabled for CPU v3 and above. > +@itemx -mno-alu32 > +Enable or disable generation of 32-bit ALU instructions. > +Enabled for CPU v3 and above. > + > +@opindex mv3-atomics > +@item -mv3-atomics > +@itemx -mno-v3-atomics > +Enable or disable instructions for general atomic operations introduced > +in CPU v3. Enabled for CPU v3 and above. > > @opindex mbswap > @item -mbswap > -Enable byte swap instructions. Enabled for CPU v4 and above. > +@itemx -mno-bswap > +Enable or disable byte swap instructions. Enabled for CPU v4 and above. > > @opindex msdiv > @item -msdiv > -Enable signed division and modulus instructions. Enabled for CPU v4 > -and above. > - > -@opindex mv3-atomics > -@item -mv3-atomics > -Enable instructions for general atomic operations introduced in CPU v3. > -Enabled for CPU v3 and above. > +@itemx -mno-sdiv > +Enable or disable signed division and modulus instructions. 
Enabled for > +CPU v4 and above. > > @opindex mcpu > @item -mcpu=@var{version} > @@ -24747,6 +24744,7 @@ All features of v2, plus: > All features of v3, plus: > @itemize @minus > @item Byte swap instructions, as in @option{-mbswap} > +@item Signed division and modulus instructions, as in @option{-msdiv} > @end itemize > @end table
Re: [PATCH] Add -fsarif-time-report [PR109361]
On Tue, 2023-04-11 at 08:43 +, Richard Biener wrote: > On Tue, 4 Apr 2023, David Malcolm wrote: > > > Richi, Jakub: I can probably self-approve this, but it's > > technically a > > new feature. OK if I push this to trunk in stage 4? I believe > > it's > > low risk, and is very useful for benchmarking -fanalyzer. > > Please wait for stage1 at this point. One comment on the patch > below ... > > > > > This patch adds support for embedding profiling information about > > the > > compiler itself into the SARIF output. > > > > In an earlier version of this patch I extended -ftime-report so > > that > > as well as writing to stderr, it would embed the information in any > > SARIF output. This turned out to be awkward to use, in that I > > found > > myself needing to get the data in JSON form without also having it > > emitted on stderr (which was affecting the output of the build). > > > > Hence this version of the patch adds a new -fsarif-time-report, > > similar > > to the existing -ftime-report for requesting that GCC profile itself > > using > > the timevar machinery. > > > > Specifically, if -fsarif-time-report is specified, the timing > > information will be captured (as if -ftime-report were specified), > > and > > will be embedded in JSON form within any SARIF as a > > "gcc/timeReport" > > property within a property bag of the "invocation" object. > > > > Here's an example of the output: > > > > "invocations": [ > > { > > "executionSuccessful": true, > > "toolExecutionNotifications": [], > > "properties": { > > "gcc/timeReport": { > > "timevars": [ > > { > > "name": "phase setup", > > "elapsed": { > > "user": 0.04, > > "sys": 0, > > "wall": 0.04, > > "ggc_mem": 1863472 > > } > > }, > > > > [...snip...] 
> > > > { > > "name": "analyzer: processing worklist", > > "elapsed": { > > "user": 0.06, > > "sys": 0, > > "wall": 0.06, > > "ggc_mem": 48 > > } > > }, > > { > > "name": "analyzer: emitting diagnostics", > > "elapsed": { > > "user": 0.01, > > "sys": 0, > > "wall": 0.01, > > "ggc_mem": 0 > > } > > }, > > { > > "name": "TOTAL", > > "elapsed": { > > "user": 0.21, > > "sys": 0.03, > > "wall": 0.24, > > "ggc_mem": 3368736 > > } > > } > > ], > > "CHECKING_P": true, > > "flag_checking": true > > } > > } > > } > > ] > > > > I have successfully used this in my analyzer integration tests to > > get > > timing information about which source files get slowed down by the > > analyzer. I've validated the generated .sarif files against the > > SARIF > > schema. > > > > The documentation notes that the precise output format is subject > > to change. > > > > Successfully bootstrapped & regtested on x86_64-pc-linux-gnu. > > > > gcc/ChangeLog: > > PR analyzer/109361 > > * common.opt (fsarif-time-report): New option. > > 'sarif' is currently used only with -fdiagnostics-format= it seems. > We already have > > ftime-report > Common Var(time_report) > Report the time taken by each compiler pass. > > ftime-report-details > Common Var(time_report_details) > Record times taken by sub-phases separately. > > so -fsarif-time-report is not a) -ftime-report-sarif and b) it's > unclear if it applies to -ftime-report or to both -ftime-report > and -ftime-report-details? (note -ftime-report-details needs > -ftime-report to be effective) > > I'd rather have a -ftime-report-format= (or -freport-format in > case we want to cover -fmem-report, -fmem-report-wpa, > -fpre-ipa-mem-report and -fpost-ipa-mem-report as well?) > > ISTR there's a summer of code project in this area as well. > > Thanks, > Richard. Revisiting this; sorry about the delay.
As I understand the status quo, we currently have: * -ftime-report: enable capturing of timing information (with a slight speed hit), and report it to stderr * -ftime-report-details: tweak how that information is captured (if -
Re: Update and Questions on CPython Extension Module -fanalyzer plugin development
Hi Dave, Thanks for the comments! [...] > Do you have any DejaGnu tests for this functionality? For example, > given PyList_New > https://docs.python.org/3/c-api/list.html#c.PyList_New > there could be a test like: > > /* { dg-require-effective-target python_h } */ > > #define PY_SSIZE_T_CLEAN > #include <Python.h> > #include "analyzer-decls.h" > > PyObject * > test_PyList_New (Py_ssize_t len) > { > PyObject *obj = PyList_New (len); > if (obj) > { > __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */ > __analyzer_eval (PyList_Check (obj)); /* { dg-warning "TRUE" } */ > __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" } */ > } > else > __analyzer_dump_path (); /* { dg-warning "path" } */ > return obj; > } > > ...or similar, to verify that we simulate that the call can both > succeed and fail, and to verify properties of the store along the > "success" path. Caveat: I didn't look at exactly what properties > you're simulating, so the above tests might need adjusting. > I am currently in the process of developing more tests. Specific to the test you provided as an example, we are passing all cases except for PyList_Check. PyList_Check does not pass because I have not yet added support for the various definitions of tp_flags. I also encountered a minor hiccup where PyList_CheckExact appeared to give "UNKNOWN" rather than "TRUE", but this has since been fixed. The problem was caused by accidentally using the tree representation of struct PyList_Type as opposed to struct PyList_Type * when creating a pointer sval to the region for PyList_Type. [...] > > > Let's consider the following example which lacks error checking: > > > > PyObject* foo() { > > PyObject* item = PyLong_FromLong(10); > > PyObject* list = PyList_New(5); > > return list; > > } > > > > The states for when PyLong_FromLong fails and when PyLong_FromLong > > succeeds are merged before the call to PyObject* list = > > PyList_New(5).
> > Ideally we would emit a leak warning about the "success" case of > PyLong_FromLong here. I think you're running into the problem of the > "store" part of the program_state being separate from the "malloc" > state machine part of program_state - I'm guessing that you're creating > a heap_allocated_region for the new python object, but the "malloc" > state machine isn't transitioning the pointer from "start" to "assumed-non-null". Such state machine states inhibit state-merging, and so > this might solve your state-merging problem. > > I think we need a way to call malloc_state_machine::on_allocator_call > from outside of sm-malloc.cc. See region_model::on_realloc_with_move > for an example of how to do something similar. > Thank you for the suggestion: this worked great and has solved the issue! Best, Eric
[PATCH] bpf: ISA V4 sign-extending move and load insns [PR110782, PR110784]
BPF ISA V4 introduces sign-extending move and load operations. This patch makes the BPF backend generate those instructions, when enabled and useful. A new option, -m[no-]smov gates generation of these instructions, and is enabled by default for -mcpu=v4 and above. Tests for the new instructions and documentation for the new options are included. Tested on bpf-unknown-none. OK? gcc/ * config/bpf/bpf.opt (msmov): New option. * config/bpf/bpf.cc (bpf_option_override): Handle it here. * config/bpf/bpf.md (*extendsidi2): New. (extendhidi2): New. (extendqidi2): New. (extendsisi2): New. (extendhisi2): New. (extendqisi2): New. * doc/invoke.texi (Option Summary): Add -msmov eBPF option. (eBPF Options): Add -m[no-]smov. Document that -mcpu=v4 also enables -msmov. gcc/testsuite/ * gcc.target/bpf/sload-1.c: New test. * gcc.target/bpf/sload-pseudoc-1.c: New test. * gcc.target/bpf/smov-1.c: New test. * gcc.target/bpf/smov-pseudoc-1.c: New test. --- gcc/config/bpf/bpf.cc | 3 ++ gcc/config/bpf/bpf.md | 50 +++ gcc/config/bpf/bpf.opt| 4 ++ gcc/doc/invoke.texi | 9 +++- gcc/testsuite/gcc.target/bpf/sload-1.c| 16 ++ .../gcc.target/bpf/sload-pseudoc-1.c | 16 ++ gcc/testsuite/gcc.target/bpf/smov-1.c | 18 +++ gcc/testsuite/gcc.target/bpf/smov-pseudoc-1.c | 18 +++ 8 files changed, 133 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/bpf/sload-1.c create mode 100644 gcc/testsuite/gcc.target/bpf/sload-pseudoc-1.c create mode 100644 gcc/testsuite/gcc.target/bpf/smov-1.c create mode 100644 gcc/testsuite/gcc.target/bpf/smov-pseudoc-1.c diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc index 0e07b416add..b5b5674edbb 100644 --- a/gcc/config/bpf/bpf.cc +++ b/gcc/config/bpf/bpf.cc @@ -262,6 +262,9 @@ bpf_option_override (void) if (bpf_has_sdiv == -1) bpf_has_sdiv = (bpf_isa >= ISA_V4); + if (bpf_has_smov == -1) +bpf_has_smov = (bpf_isa >= ISA_V4); + /* Disable -fstack-protector as it is not supported in BPF. 
*/ if (flag_stack_protect) { diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md index 66436397bb7..a69a239b9d6 100644 --- a/gcc/config/bpf/bpf.md +++ b/gcc/config/bpf/bpf.md @@ -307,6 +307,56 @@ (define_expand "extendsidi2" DONE; }) +;; ISA V4 introduces sign-extending move and load operations. + +(define_insn "*extendsidi2" + [(set (match_operand:DI 0 "register_operand" "=r,r") +(sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" "r,q")))] + "bpf_has_smov" + "@ + {movs\t%0,%1,32|%0 = (s32) %1} + {ldxsw\t%0,%1|%0 = *(s32 *) (%1)}" + [(set_attr "type" "alu,ldx")]) + +(define_insn "extendhidi2" + [(set (match_operand:DI 0 "register_operand" "=r,r") +(sign_extend:DI (match_operand:HI 1 "nonimmediate_operand" "r,q")))] + "bpf_has_smov" + "@ + {movs\t%0,%1,16|%0 = (s16) %1} + {ldxsh\t%0,%1|%0 = *(s16 *) (%1)}" + [(set_attr "type" "alu,ldx")]) + +(define_insn "extendqidi2" + [(set (match_operand:DI 0 "register_operand" "=r,r") +(sign_extend:DI (match_operand:QI 1 "nonimmediate_operand" "r,q")))] + "bpf_has_smov" + "@ + {movs\t%0,%1,8|%0 = (s8) %1} + {ldxsb\t%0,%1|%0 = *(s8 *) (%1)}" + [(set_attr "type" "alu,ldx")]) + +(define_insn "extendsisi2" + [(set (match_operand:SI 0 "register_operand" "=r") +(sign_extend:SI (match_operand:SI 1 "register_operand" "r")))] + "bpf_has_smov" + "{movs32\t%0,%1,32|%w0 = (s32) %w1}" + [(set_attr "type" "alu")]) + +(define_insn "extendhisi2" + [(set (match_operand:SI 0 "register_operand" "=r") +(sign_extend:SI (match_operand:HI 1 "register_operand" "r")))] + "bpf_has_smov" + "{movs32\t%0,%1,16|%w0 = (s16) %w1}" + [(set_attr "type" "alu")]) + +(define_insn "extendqisi2" + [(set (match_operand:SI 0 "register_operand" "=r") +(sign_extend:SI (match_operand:QI 1 "register_operand" "r")))] + "bpf_has_smov" + "{movs32\t%0,%1,8|%w0 = (s8) %w1}" + [(set_attr "type" "alu")]) + Data movement (define_mode_iterator MM [QI HI SI DI SF DF]) diff --git a/gcc/config/bpf/bpf.opt b/gcc/config/bpf/bpf.opt index b21cfcab9ea..8e240d397e4 100644 
--- a/gcc/config/bpf/bpf.opt +++ b/gcc/config/bpf/bpf.opt @@ -71,6 +71,10 @@ msdiv Target Var(bpf_has_sdiv) Init(-1) Enable signed division and modulus instructions. +msmov +Target Var(bpf_has_smov) Init(-1) +Enable signed move and memory load instructions. + mcpu= Target RejectNegative Joined Var(bpf_isa) Enum(bpf_isa) Init(ISA_V4) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 91113dd5821..e574acfd612 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -947,7 +947,7 @@ Objective-C and Objective-C++ Dialects}. @emph{eBPF Options} @gccoptlist{-mbig-endian -mlittle-endian
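The new sign-extending patterns give their semantics as pseudo-C templates, for example `%0 = (s8) %1` for `movs %0,%1,8`. As a rough illustration only, the following portable C sketch models what the 64-bit `movs` forms compute. The helper name `bpf_movs64` is hypothetical. This is neither compiler output nor BPF code; it only restates the templates.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the 64-bit forms "movs %0,%1,BITS", whose pseudo-C
   template is "%0 = (sBITS) %1": keep the low BITS bits of SRC and
   sign-extend them to 64 bits.  Uses the portable xor/subtract idiom
   rather than relying on implementation-defined narrowing casts.  */
static int64_t
bpf_movs64 (uint64_t src, int bits)
{
  assert (bits == 8 || bits == 16 || bits == 32);
  uint64_t mask = ((uint64_t) 1 << bits) - 1;
  int64_t low = (int64_t) (src & mask);      /* fits: value < 2^32 */
  int64_t sign = (int64_t) 1 << (bits - 1);  /* sign bit of the field */
  return (low ^ sign) - sign;
}
```

Under the `*(sN *) (%1)` templates, the `ldxsb`/`ldxsh`/`ldxsw` loads perform the same extension on the value read from memory rather than from a register. For example, `bpf_movs64 (0xFF, 8)` yields -1, whereas a plain zero-extending move would yield 255.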
[Bug testsuite/108835] gm2 tests at large -jNN numbers do not return
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108835 --- Comment #9 from Gaius Mulley --- This looks fixed from the commit trail - can this PR be closed now?
[PATCH] bpf: minor doc cleanup for command-line options
This patch makes some minor cleanups to eBPF options documented in invoke.texi: - Delete some vestigial docs for removed -mkernel option - Add -mbswap and -msdiv to the option summary - Note the negative versions of several options - Note that -mcpu=v4 also enables -msdiv. gcc/ * doc/invoke.texi (Option Summary): Remove -mkernel eBPF option. Add -mbswap and -msdiv eBPF options. (eBPF Options): Remove -mkernel. Add -mno-{jmpext, jmp32, alu32, v3-atomics, bswap, sdiv}. Document that -mcpu=v4 also enables -msdiv. --- gcc/doc/invoke.texi | 48 ++--- 1 file changed, 23 insertions(+), 25 deletions(-) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index e0fd7bd5b72..91113dd5821 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -945,9 +945,10 @@ Objective-C and Objective-C++ Dialects}. -mmemory-latency=@var{time}} @emph{eBPF Options} -@gccoptlist{-mbig-endian -mlittle-endian -mkernel=@var{version} +@gccoptlist{-mbig-endian -mlittle-endian -mframe-limit=@var{bytes} -mxbpf -mco-re -mno-co-re -mjmpext --mjmp32 -malu32 -mv3-atomics -mcpu=@var{version} -masm=@var{dialect}} +-mjmp32 -malu32 -mv3-atomics -mbswap -msdiv -mcpu=@var{version} +-masm=@var{dialect}} @emph{FR30 Options} @gccoptlist{-msmall-model -mno-lsim} @@ -24674,18 +24675,6 @@ the value that can be specified should be less than or equal to @samp{32767}. Defaults to whatever limit is imposed by the version of the Linux kernel targeted. -@opindex mkernel -@item -mkernel=@var{version} -This specifies the minimum version of the kernel that will run the -compiled program. GCC uses this version to determine which -instructions to use, what kernel helpers to allow, etc.
Currently, -@var{version} can be one of @samp{4.0}, @samp{4.1}, @samp{4.2}, -@samp{4.3}, @samp{4.4}, @samp{4.5}, @samp{4.6}, @samp{4.7}, -@samp{4.8}, @samp{4.9}, @samp{4.10}, @samp{4.11}, @samp{4.12}, -@samp{4.13}, @samp{4.14}, @samp{4.15}, @samp{4.16}, @samp{4.17}, -@samp{4.18}, @samp{4.19}, @samp{4.20}, @samp{5.0}, @samp{5.1}, -@samp{5.2}, @samp{latest} and @samp{native}. - @opindex mbig-endian @item -mbig-endian Generate code for a big-endian target. @@ -24696,30 +24685,38 @@ Generate code for a little-endian target. This is the default. @opindex mjmpext @item -mjmpext -Enable generation of extra conditional-branch instructions. +@itemx -mno-jmpext +Enable or disable generation of extra conditional-branch instructions. Enabled for CPU v2 and above. @opindex mjmp32 @item -mjmp32 -Enable 32-bit jump instructions. Enabled for CPU v3 and above. +@itemx -mno-jmp32 +Enable or disable generation of 32-bit jump instructions. +Enabled for CPU v3 and above. @opindex malu32 @item -malu32 -Enable 32-bit ALU instructions. Enabled for CPU v3 and above. +@itemx -mno-alu32 +Enable or disable generation of 32-bit ALU instructions. +Enabled for CPU v3 and above. + +@opindex mv3-atomics +@item -mv3-atomics +@itemx -mno-v3-atomics +Enable or disable instructions for general atomic operations introduced +in CPU v3. Enabled for CPU v3 and above. @opindex mbswap @item -mbswap -Enable byte swap instructions. Enabled for CPU v4 and above. +@itemx -mno-bswap +Enable or disable byte swap instructions. Enabled for CPU v4 and above. @opindex msdiv @item -msdiv -Enable signed division and modulus instructions. Enabled for CPU v4 -and above. - -@opindex mv3-atomics -@item -mv3-atomics -Enable instructions for general atomic operations introduced in CPU v3. -Enabled for CPU v3 and above. +@itemx -mno-sdiv +Enable or disable signed division and modulus instructions. Enabled for +CPU v4 and above. 
@opindex mcpu @item -mcpu=@var{version} @@ -24747,6 +24744,7 @@ All features of v2, plus: All features of v3, plus: @itemize @minus @item Byte swap instructions, as in @option{-mbswap} +@item Signed division and modulus instructions, as in @option{-msdiv} @end itemize @end table -- 2.40.1
[Bug tree-optimization/106293] [13/14 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293 --- Comment #16 from Jan Hubicka --- It is really hard to make loop splitting do anything. It does not like canonicalized invariant variables, since the loop exit condition should not be NE_EXPR, and it does not like it when VRP turns LT/GT into NE. This is what happens in hmmer. There is a loop iterating 100 times, and splitting happens just before the last BB:

int M = 100;

void __attribute__ ((noinline,noipa)) do_something()
{
}

void __attribute__ ((noinline,noipa)) do_something2()
{
}

__attribute__ ((noinline,noipa)) void test1 (int n)
{
  if (n <= 0 || n > 10)
    return;
  for (int i = 0; i <= n; i++)
    if (i < n)
      do_something ();
    else
      do_something2 ();
}

int main(int, char **)
{
  for (int i = 0 ; i < 1000; i++)
    test1(M);
  return 0;
}
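For readers unfamiliar with the transformation, the sketch below shows what successful loop splitting would amount to for the loop in `test1`. The `i < n` condition holds for `i` in `[0, n)` and fails only at `i == n`, so the loop can be split into a branch-free loop plus one trailing iteration. This is hand-written C for illustration, not the pass's actual output. The call counters and the `_orig`/`_split` function names are hypothetical instrumentation.

```c
/* Hypothetical counters standing in for do_something/do_something2.  */
static int calls_a, calls_b;

static void do_something (void)  { calls_a++; }
static void do_something2 (void) { calls_b++; }

/* Original shape: one loop with an index-dependent branch.  */
static void
test1_orig (int n)
{
  for (int i = 0; i <= n; i++)
    if (i < n)
      do_something ();
    else
      do_something2 ();
}

/* Hand-split shape: the first loop covers the iterations where i < n
   is known true; the second covers the rest (only i == n), so neither
   loop body contains the branch any more.  */
static void
test1_split (int n)
{
  int i = 0;
  for (; i < n; i++)
    do_something ();
  for (; i <= n; i++)
    do_something2 ();
}
```

Both versions make the same calls for any `n` (for example, 100 calls to `do_something` and 1 to `do_something2` when `n` is 100). The pass has to prove this equivalence from the exit condition, which is why it gives up when it sees an NE_EXPR.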
[Bug c++/110824] Gcc crashing on a lambda capture
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110824 --- Comment #5 from Andrew Pinski --- (In reply to Denis Yaroshevskiy from comment #4) > Appreciate it. > > I'm still going to support gcc11 for the foreseeable future. Is there some > easy way you see I can confirm that this is this issue? > So that I don't create more duplicates? In this case the pattern is simple: if there is a trailing return type that uses decltype with a template function, and that template function is defined (in that scope or an outer scope), then this will be a dup of that issue (such code is only valid C++20, not also valid C++17). Hope that helps. Also note that C++20 support in GCC is still being fixed in many areas, so using GCC 11, which came out less than 4 months after C++20 was ratified and published, is definitely going to be an issue.
[Bug modula2/109586] cc1gm2 ICE when compiling large source file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109586 --- Comment #5 from CVS Commits --- The releases/gcc-13 branch has been updated by Gaius Mulley : https://gcc.gnu.org/g:2286745b12070320c8dcc5c75d76dd184cb7645e commit r13-7616-g2286745b12070320c8dcc5c75d76dd184cb7645e Author: Gaius Mulley Date: Thu Jul 27 22:11:26 2023 +0100 PR modula2/109586 cc1gm2 ICE when compiling large source files. The function m2block_RememberConstant calls m2tree_IsAConstant. However IsAConstant does not recognise TREE_CODE(t) == CONSTRUCTOR as a constant. Without this patch CONSTRUCTOR constants are garbage collected (and not preserved) resulting in a corrupt tree and crash. gcc/m2/ChangeLog: PR modula2/109586 * gm2-gcc/m2tree.cc (m2tree_IsAConstant): Add (TREE_CODE (t) == CONSTRUCTOR) to expression. (cherry picked from commit a7e1ee39e4fa37d005929c4ff9457d1a199559c6) Signed-off-by: Gaius Mulley
Re: [patch] OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rect
Hi Tobias! On 2023-07-25T23:45:54+0200, Tobias Burnus wrote: > The attached patch calls CUDA's cuMemcpy2D and cuMemcpy3D > for omp_target_memcpy_rect{,_async} for dim=2/dim=3. This should > speed up the data transfer for noncontiguous data. ACK, thanks. > While at it, I ended up adding support for device to other device > copying; while potentially slow, it is still better than not being able to > copy - and with shared-memory, it shouldn't be that bad. Makes sense, I guess. > Comments, suggestions, remarks? > If there are none, will commit it... You're so quick -- I'm so slow... ;-) I've not verified all the logic in here, but I've got a few comments. > Disclaimer: While I have done correctness tests (system with two nvptx GPUs), > I have not done any performance tests. Well, we should, eventually. > (I also tested it without offloading > configured, but that's rather boring.) > OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rect > > When copying a 2D or 3D rectangular memory block, the performance is > better when using CUDA's cuMemcpy2D/cuMemcpy3D instead of copying the > data one by one. That's what this commit does. So you've actually done some performance verification? > Additionally, it permits device-to-device copies, if necessary using a > temporary variable on the host. > --- a/include/cuda/cuda.h > +++ b/include/cuda/cuda.h I note that you're not actually using everything you're adding here. (..., but I understand you're simply adding everything that relates to these 'cuMemcpy[...]' routines -- OK as far as I'm concerned.)
> @@ -47,6 +47,7 @@ typedef void *CUevent; > typedef void *CUfunction; > typedef void *CUlinkState; > typedef void *CUmodule; > +typedef void *CUarray; > typedef size_t (*CUoccupancyB2DSize)(int); > typedef void *CUstream; > > @@ -54,7 +55,10 @@ typedef enum { >CUDA_SUCCESS = 0, >CUDA_ERROR_INVALID_VALUE = 1, >CUDA_ERROR_OUT_OF_MEMORY = 2, > + CUDA_ERROR_NOT_INITIALIZED = 3, > + CUDA_ERROR_DEINITIALIZED = 4, >CUDA_ERROR_INVALID_CONTEXT = 201, > + CUDA_ERROR_INVALID_HANDLE = 400, >CUDA_ERROR_NOT_FOUND = 500, >CUDA_ERROR_NOT_READY = 600, >CUDA_ERROR_LAUNCH_FAILED = 719, > @@ -126,6 +130,75 @@ typedef enum { >CU_LIMIT_MALLOC_HEAP_SIZE = 0x02, > } CUlimit; > > +typedef enum { > + CU_MEMORYTYPE_HOST = 0x01, > + CU_MEMORYTYPE_DEVICE = 0x02, > + CU_MEMORYTYPE_ARRAY = 0x03, > + CU_MEMORYTYPE_UNIFIED = 0x04 > +} CUmemorytype; > + > +typedef struct { > + size_t srcXInBytes, srcY; > + CUmemorytype srcMemoryType; > + const void *srcHost; > + CUdeviceptr srcDevice; > + CUarray srcArray; > + size_t srcPitch; > + > + size_t dstXInBytes, dstY; > + CUmemorytype dstMemoryType; > + const void *dstHost; That last one isn't 'const'. ;-) > + CUdeviceptr dstDevice; > + CUarray dstArray; > + size_t dstPitch; > + > + size_t WidthInBytes, Height; > +} CUDA_MEMCPY2D; > + > +typedef struct { > + size_t srcXInBytes, srcY, srcZ; > + size_t srcLOD; > + CUmemorytype srcMemoryType; > + const void *srcHost; > + CUdeviceptr srcDevice; > + CUarray srcArray; > + void *dummy; A 'cuda.h' that I looked at calls that last one 'reserved0', with comment "Must be NULL". > + size_t srcPitch, srcHeight; > + > + size_t dstXInBytes, dstY, dstZ; > + size_t dstLOD; > + CUmemorytype dstMemoryType; > + const void *dstHost; Again, not 'const'. > + CUdeviceptr dstDevice; > + CUarray dstArray; > + void *dummy2; Similar to above: 'reserved1', with comment "Must be NULL". 
> + size_t dstPitch, dstHeight; > + > + size_t WidthInBytes, Height, Depth; > +} CUDA_MEMCPY3D; > + > +typedef struct { > + size_t srcXInBytes, srcY, srcZ; > + size_t srcLOD; > + CUmemorytype srcMemoryType; > + const void *srcHost; > + CUdeviceptr srcDevice; > + CUarray srcArray; > + CUcontext srcContext; > + size_t srcPitch, srcHeight; > + > + size_t dstXInBytes, dstY, dstZ; > + size_t dstLOD; > + CUmemorytype dstMemoryType; > + const void *dstHost; > + CUdeviceptr dstDevice; > + CUarray dstArray; > + CUcontext dstContext; > + size_t dstPitch, dstHeight; > + > + size_t WidthInBytes, Height, Depth; > +} CUDA_MEMCPY3D_PEER; > + > #define cuCtxCreate cuCtxCreate_v2 > CUresult cuCtxCreate (CUcontext *, unsigned, CUdevice); > #define cuCtxDestroy cuCtxDestroy_v2 > @@ -183,6 +256,18 @@ CUresult cuMemcpyDtoHAsync (void *, CUdeviceptr, size_t, > CUstream); > CUresult cuMemcpyHtoD (CUdeviceptr, const void *, size_t); > #define cuMemcpyHtoDAsync cuMemcpyHtoDAsync_v2 > CUresult cuMemcpyHtoDAsync (CUdeviceptr, const void *, size_t, CUstream); > +#define cuMemcpy2D cuMemcpy2D_v2 > +CUresult cuMemcpy2D (const CUDA_MEMCPY2D *); > +#define cuMemcpy2DAsync cuMemcpy2DAsync_v2 > +CUresult cuMemcpy2DAsync (const CUDA_MEMCPY2D *, CUstream); > +#define cuMemcpy2DUnaligned cuMemcpy2DUnaligned_v2 > +CUresult cuMemcpy2DUnaligned (const CUDA_MEMCPY2D *); > +#define cuMemcpy3D cuMemcpy3D_v2 > +CUresult cuMemcpy3D (const CUDA_MEMCPY3D *); >
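To make the `CUDA_MEMCPY2D` field semantics concrete, and to contrast them with "copying the data one by one", here is a host-only C sketch of what a 2D pitched copy computes. It assumes the addressing the CUDA driver API documents for host pointers: start = base + Y * Pitch + XInBytes, then Height rows of WidthInBytes bytes, with rows Pitch bytes apart. The struct `memcpy2d_desc` and the function `memcpy2d_host` are hypothetical names. The sketch does not call CUDA and is not the libgomp implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical host-only subset of the CUDA_MEMCPY2D fields.  */
struct memcpy2d_desc
{
  size_t srcXInBytes, srcY, srcPitch;
  const void *srcHost;
  size_t dstXInBytes, dstY, dstPitch;
  void *dstHost;               /* written to, hence not const */
  size_t WidthInBytes, Height;
};

/* Copy Height rows of WidthInBytes bytes each; consecutive rows are
   srcPitch (resp. dstPitch) bytes apart.  */
static void
memcpy2d_host (const struct memcpy2d_desc *d)
{
  const char *src = (const char *) d->srcHost
                    + d->srcY * d->srcPitch + d->srcXInBytes;
  char *dst = (char *) d->dstHost
              + d->dstY * d->dstPitch + d->dstXInBytes;
  for (size_t row = 0; row < d->Height; row++)
    memcpy (dst + row * d->dstPitch, src + row * d->srcPitch,
            d->WidthInBytes);
}
```

A single `cuMemcpy2D` call describes this whole loop to the driver. Issuing one small transfer per row (or per element) is what the patch avoids. The 3D variant adds a Z index and a per-slice height to the same address computation.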
Re: [PATCH] RISC-V: Fix uninitialized and redundant use of which_alternative
The newly added testcase fails on rv32 targets with this message:

FAIL: gcc.target/riscv/rvv/autovec/madd-split2-1.c -O3 -ftree-vectorize (test for excess errors)

verbose log:

compiler exited with status 1
output is:
cc1: error: ABI requires '-march=rv32'

Something like this appears to fix the issue:

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
index 14a9802667e..e10a9e9d0f5 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gcv_zvl256b -O3 -fno-cprop-registers -fno-dce --param riscv-autovec-preference=scalable" } */
+/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3 -fno-cprop-registers -fno-dce --param riscv-autovec-preference=scalable" } */
 long foo (long *__restrict a, long *__restrict b, long n)

On 7/27/23 04:57, Kito Cheng via Gcc-patches wrote:
My first impression was that those emit_insn (gen_rtx_SET ()) calls seemed necessary, but I got the point after checking vector.md :P

Committed to trunk, thanks :)

On Thu, Jul 27, 2023 at 6:23 PM juzhe.zh...@rivai.ai wrote:

Oh, YES. Thanks for fixing it.
It makes sense since the ternary operations in "vector.md" generate "vmv.v.v" according to RA.
Thanks for fixing it.

@kito: Could you confirm it? If it's ok with you, commit it for Han (I am too lazy to commit patches :).

juzhe.zh...@rivai.ai

From: demin.han
Date: 2023-07-27 17:48
To: gcc-patches@gcc.gnu.org
CC: kito.ch...@gmail.com; juzhe.zh...@rivai.ai
Subject: [PATCH] RISC-V: Fix uninitialized and redundant use of which_alternative

When pass split2 starts, which_alternative is random, depending on the last set by some earlier pass.
Even if initialized, the generated move is redundant.
The move can be generated by the assembly output template.
Signed-off-by: demin.han gcc/ChangeLog: * config/riscv/autovec.md: Delete which_alternative use in split gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/madd-split2-1.c: New test. --- gcc/config/riscv/autovec.md | 12 .../gcc.target/riscv/rvv/autovec/madd-split2-1.c| 13 + 2 files changed, 13 insertions(+), 12 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/madd-split2-1.c diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index d899922586a..b7ea3101f5a 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -1012,8 +1012,6 @@ (define_insn_and_split "*fma" [(const_int 0)] { riscv_vector::emit_vlmax_vsetvl (mode, operands[4]); -if (which_alternative == 2) - emit_insn (gen_rtx_SET (operands[0], operands[3])); rtx ops[] = {operands[0], operands[1], operands[2], operands[3], operands[0]}; riscv_vector::emit_vlmax_ternary_insn (code_for_pred_mul_plus (mode), riscv_vector::RVV_TERNOP, ops, operands[4]); @@ -1058,8 +1056,6 @@ (define_insn_and_split "*fnma" [(const_int 0)] { riscv_vector::emit_vlmax_vsetvl (mode, operands[4]); -if (which_alternative == 2) - emit_insn (gen_rtx_SET (operands[0], operands[3])); rtx ops[] = {operands[0], operands[1], operands[2], operands[3], operands[0]}; riscv_vector::emit_vlmax_ternary_insn (code_for_pred_minus_mul (mode), riscv_vector::RVV_TERNOP, ops, operands[4]); @@ -1102,8 +1098,6 @@ (define_insn_and_split "*fma" [(const_int 0)] { riscv_vector::emit_vlmax_vsetvl (mode, operands[4]); -if (which_alternative == 2) - emit_insn (gen_rtx_SET (operands[0], operands[3])); rtx ops[] = {operands[0], operands[1], operands[2], operands[3], operands[0]}; riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul (PLUS, mode), riscv_vector::RVV_TERNOP, ops, operands[4]); @@ -1148,8 +1142,6 @@ (define_insn_and_split "*fnma" [(const_int 0)] { riscv_vector::emit_vlmax_vsetvl (mode, operands[4]); -if (which_alternative == 2) - emit_insn (gen_rtx_SET (operands[0], operands[3])); 
rtx ops[] = {operands[0], operands[1], operands[2], operands[3], operands[0]}; riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul_neg (PLUS, mode), riscv_vector::RVV_TERNOP, ops, operands[4]); @@ -1194,8 +1186,6 @@ (define_insn_and_split "*fms" [(const_int 0)] { riscv_vector::emit_vlmax_vsetvl (mode, operands[4]); -if (which_alternative == 2) - emit_insn (gen_rtx_SET (operands[0], operands[3])); rtx ops[] = {operands[0], operands[1], operands[2], operands[3], operands[0]}; riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_mul (MINUS, mode), riscv_vector::RVV_TERNOP, ops, operands[4]); @@ -1242,8 +1232,6 @@ (define_insn_and_split "*fnms" [(const_int 0)] { riscv_vector::emit_vlmax_vsetvl (mode, operands[4]); -if (which_alternative == 2) - emit_insn
[Bug tree-optimization/110817] [14 Regression] wrong code with vector compares and vector lowering
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110817

--- Comment #9 from Andrew Pinski ---
Here is a reduced testcase that does not need -mno-sse or any other option but fails everywhere:
```
typedef unsigned __attribute__((__vector_size__ (1*sizeof(unsigned)))) V;

V v;
unsigned char c;

int
main (void)
{
  V x = (v > 0) > (v != c);
  volatile signed int t = x[0];
  if (t)
    __builtin_abort ();
  return 0;
}
```

t in this case is -2.
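Why -2 is impossible: under GCC's documented vector-extension rules, each lane of a comparison result is -1 (all bits set) when the comparison holds and 0 otherwise, in a signed element type. A scalar model of lane 0 (the names lane_mask and model_lane0 are illustrative only) shows that t can only ever be 0 or -1, and with v and c zero-initialized it must be 0.

```c
#include <stdbool.h>

/* One lane of a GCC vector comparison: -1 when the comparison holds,
   0 otherwise.  */
static int
lane_mask (bool holds)
{
  return holds ? -1 : 0;
}

/* Scalar model of lane 0 of  V x = (v > 0) > (v != c);  from the
   testcase.  Both inner comparisons yield signed masks (0 or -1), and
   the outer comparison compares those masks as signed ints.  */
static int
model_lane0 (unsigned v0, unsigned char c)
{
  int gt0 = lane_mask (v0 > 0);    /* (v > 0)  */
  int ne_c = lane_mask (v0 != c);  /* (v != c) */
  return lane_mask (gt0 > ne_c);
}
```

For the testcase itself both inner masks are 0, so the outer comparison is false and t should be 0.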
[Bug fortran/110825] TYPE(*) dummy argument to generate an unused hidden argument
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110825

anlauf at gcc dot gnu.org changed:

           What    |Removed                        |Added
----------------------------------------------------------------------------
             Status|NEW                            |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org  |anlauf at gcc dot gnu.org

--- Comment #4 from anlauf at gcc dot gnu.org ---
Submitted:
https://gcc.gnu.org/pipermail/fortran/2023-July/059658.html
[PATCH] Fortran: do not pass hidden character length for TYPE(*) dummy [PR110825]
Dear all,

when passing a character actual argument to an assumed-type dummy (TYPE(*)), we should not pass the character length for that argument, as otherwise other hidden arguments that are passed as part of the gfortran ABI will not be interpreted correctly. This is in line with the current way the procedure decl is generated.

The attached patch fixes the caller and clarifies the behavior in the documentation.

Regtested on x86_64-pc-linux-gnu. OK for mainline?

Thanks,
Harald

From 199e09c9862f5afe7e583839bc1b108c741a7efb Mon Sep 17 00:00:00 2001
From: Harald Anlauf
Date: Thu, 27 Jul 2023 21:30:26 +0200
Subject: [PATCH] Fortran: do not pass hidden character length for TYPE(*) dummy [PR110825]

gcc/fortran/ChangeLog:

	PR fortran/110825
	* gfortran.texi: Clarify argument passing convention.
	* trans-expr.cc (gfc_conv_procedure_call): Do not pass the
	character length as hidden argument when the declared dummy
	argument is assumed-type.

gcc/testsuite/ChangeLog:

	PR fortran/110825
	* gfortran.dg/assumed_type_18.f90: New test.
---
 gcc/fortran/gfortran.texi                     |  3 +-
 gcc/fortran/trans-expr.cc                     |  1 +
 gcc/testsuite/gfortran.dg/assumed_type_18.f90 | 52 +++
 3 files changed, 55 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/assumed_type_18.f90

diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
index 7786d23265f..f476a3719f5 100644
--- a/gcc/fortran/gfortran.texi
+++ b/gcc/fortran/gfortran.texi
@@ -3750,7 +3750,8 @@ front ends of GCC, e.g. to GCC's C99 compiler for
 @code{_Bool} or GCC's Ada compiler for @code{Boolean}.)

 For arguments of @code{CHARACTER} type, the character length is passed
-as a hidden argument at the end of the argument list.  For
+as a hidden argument at the end of the argument list, except when the
+corresponding dummy argument is declared as @code{TYPE(*)}.  For
 deferred-length strings, the value is passed by reference, otherwise by value.
 The character length has the C type @code{size_t} (or
 @code{INTEGER(kind=C_SIZE_T)} in Fortran).  Note that this is
diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index ef3e6d08f78..764565476af 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -7521,6 +7521,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
           && !(fsym && fsym->ts.type == BT_DERIVED && fsym->ts.u.derived
                && fsym->ts.u.derived->intmod_sym_id == ISOCBINDING_PTR
                && fsym->ts.u.derived->from_intmod == INTMOD_ISO_C_BINDING )
+          && !(fsym && fsym->ts.type == BT_ASSUMED)
           && !(fsym && UNLIMITED_POLY (fsym)))
         vec_safe_push (stringargs, parmse.string_length);

diff --git a/gcc/testsuite/gfortran.dg/assumed_type_18.f90 b/gcc/testsuite/gfortran.dg/assumed_type_18.f90
new file mode 100644
index 000..a3d791919a2
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/assumed_type_18.f90
@@ -0,0 +1,52 @@
+! { dg-do run }
+! PR fortran/110825 - TYPE(*) and character actual arguments
+
+program foo
+  use iso_c_binding, only: c_loc, c_ptr, c_associated
+  implicit none
+  character(100):: not_used = ""
+  character(:), allocatable :: deferred
+  character :: c42(6,7) = "*"
+  call sub (not_used, "123")
+  call sub ("0" , "123")
+  deferred = "d"
+  call sub (deferred , "123")
+  call sub2 ([1.0,2.0], "123")
+  call sub2 (["1","2"], "123")
+  call sub3 (c42 , "123")
+
+contains
+
+  subroutine sub (useless_var, print_this)
+    type(*), intent(in) :: useless_var
+    character(*), intent(in) :: print_this
+    if (len (print_this) /= 3) stop 1
+    if (len_trim (print_this) /= 3) stop 2
+  end
+
+  subroutine sub2 (a, c)
+    type(*), intent(in) :: a(:)
+    character(*), intent(in) :: c
+    if (len (c) /= 3) stop 10
+    if (len_trim (c) /= 3) stop 11
+    if (size (a) /= 2) stop 12
+  end
+
+  subroutine sub3 (a, c)
+    type(*), intent(in), target, optional :: a(..)
+    character(*), intent(in) :: c
+    type(c_ptr) :: cpt
+    if (len (c) /= 3) stop 20
+    if (len_trim (c) /= 3) stop 21
+    if (.not. present (a)) stop 22
+    if (rank (a) /= 2) stop 23
+    if (size (a)/= 42) stop 24
+    if (any (shape (a) /= [6,7])) stop 25
+    if (any (lbound (a) /= [1,1])) stop 26
+    if (any (ubound (a) /= [6,7])) stop 27
+    if (.not. is_contiguous (a)) stop 28
+    cpt = c_loc (a)
+    if (.not. c_associated (cpt)) stop 29
+  end
+
+end
-- 
2.35.3
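To make the documented convention concrete, here is a sketch of how an external procedure shaped like the test's sub would look from C after this patch, assuming gfortran's default trailing-underscore naming (an assumption, not shown in the patch). The function sub_stand_in is a hypothetical C stand-in with the same parameter list; it just reports the hidden length it receives.

```c
#include <stddef.h>
#include <string.h>

/* Assumed C view of an external gfortran procedure
 *
 *   subroutine sub (useless_var, print_this)
 *     type(*),      intent(in) :: useless_var
 *     character(*), intent(in) :: print_this
 *
 * i.e.  void sub_ (const void *, const char *, size_t);
 * no hidden length for the TYPE(*) argument; only PRINT_THIS's length
 * is appended, as size_t.  This stand-in reports the length it sees.  */
static size_t
sub_stand_in (const void *useless_var, const char *print_this,
              size_t print_this_len)
{
  (void) useless_var;
  (void) print_this;
  return print_this_len;
}

/* A caller passing a CHARACTER(100) actual to the TYPE(*) dummy pushes
   only one hidden length, the one belonging to PRINT_THIS.  */
static size_t
call_with_character_actual (void)
{
  static char not_used[100];
  const char *print_this = "123";
  memset (not_used, ' ', sizeof not_used);
  return sub_stand_in (not_used, print_this, strlen (print_this));
}
```

Before the fix, the caller also appended the length of not_used (100) ahead of the length of print_this, so a callee built with this prototype would have read 100 where it expected 3; that is what the len (print_this) /= 3 checks in the test catch.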
[r14-2797 Regression] FAIL: 23_containers/vector/bool/110807.cc (test for excess errors) on Linux/x86_64
On Linux/x86_64,

7931a1de9ec87b996d51d3d60786f5c81f63919f is the first bad commit
commit 7931a1de9ec87b996d51d3d60786f5c81f63919f
Author: Jonathan Wakely
Date:   Wed Jul 26 14:09:24 2023 +0100

    libstdc++: Avoid bogus overflow warnings in std::vector [PR110807]

caused

FAIL: 23_containers/vector/bool/110807.cc (test for excess errors)

with GCC configured with

../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-2797/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check RUNTESTFLAGS="conformance.exp=23_containers/vector/bool/110807.cc --target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check RUNTESTFLAGS="conformance.exp=23_containers/vector/bool/110807.cc --target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me at haochen dot jiang at intel.com.)
(If you run into problems related to cascadelake, disabling AVX512F on the command line might help.)
(However, please make sure that there are no potential problems with AVX512.)
Re: [PATCH 5/5] testsuite part 2 for _BitInt support [PR102989]
I think there should be tests for _Atomic _BitInt types. Hopefully atomic compound assignment just works via the logic for compare-and-exchange loops, but does e.g. atomic_fetch_add work with _Atomic _BitInt types? -- Joseph S. Myers jos...@codesourcery.com
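For reference, the compare-and-exchange lowering that atomic compound assignment relies on looks roughly like this sketch. It uses _Atomic unsigned long long as a stand-in for an _Atomic _BitInt(N) type (an assumption, since the point of the requested tests is whether _BitInt goes through the same path); atomic_mul_assign is a hypothetical name. Whether atomic_fetch_add itself accepts _Atomic _BitInt types is the separate question raised above and is not answered by this sketch.

```c
#include <stdatomic.h>

/* Sketch of lowering  *p *= m  on an atomic object to a CAS loop.  */
static unsigned long long
atomic_mul_assign (_Atomic unsigned long long *p, unsigned long long m)
{
  unsigned long long expected = atomic_load (p);
  unsigned long long desired;
  do
    /* Recompute from the latest observed value; on failure the CAS
       stores the current contents of *p back into EXPECTED.  */
    desired = expected * m;
  while (!atomic_compare_exchange_weak (p, &expected, desired));
  return desired;  /* The value of the compound assignment.  */
}
```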
Re: [PATCH] PR rtl-optimization/110587: Reduce useless moves in compile-time hog.
> Am 27.07.2023 um 19:12 schrieb Roger Sayle : > > > Hi Richard, > > You're 100% right. It’s possible to significantly clean-up this code, > replacing > the body of the conditional with a call to force_reg and simplifying the > conditions > under which it is called. These improvements are implemented in the patch > below, which has been tested on x86_64-pc-linux-gnu, with a bootstrap and > make -k check, both with and without -m32, as usual. > > Interestingly, the CONCAT clause afterwards is still required (I've learned > something > new), as calling force_reg (or gen_reg_rtx) with HCmode, actually returns a > CONCAT > instead of a REG, Heh, interesting. > so although the code looks dead, it's required to build libgcc during > a bootstrap. But the remaining clean-up is good, reducing the number of > source lines > and making the logic easier to understand. > > Ok for mainline? Ok. Thanks, Richard > 2023-07-27 Roger Sayle >Richard Biener > > gcc/ChangeLog >PR middle-end/28071 >PR rtl-optimization/110587 >* expr.cc (emit_group_load_1): Simplify logic for calling >force_reg on ORIG_SRC, to avoid making a copy if the source >is already in a pseudo register. > > Roger > -- > >> -Original Message- >> From: Richard Biener >> Sent: 25 July 2023 12:50 >> >>> On Tue, Jul 25, 2023 at 1:31 PM Roger Sayle >>> wrote: >>> >>> This patch is the third in series of fixes for PR >>> rtl-optimization/110587, a compile-time regression with -O0, that >>> attempts to address the underlying cause. As noted previously, the >>> pathological test case pr28071.c contains a large number of useless >>> register-to-register moves that can produce quadratic behaviour (in >>> LRA). These move are generated during RTL expansion in >>> emit_group_load_1, where the middle-end attempts to simplify the >>> source before calling extract_bit_field. 
This is reasonable if the >>> source is a complex expression (from before the tree-ssa optimizers), >>> or a SUBREG, or a hard register, but it's not particularly useful to >>> copy a pseudo register into a new pseudo register. This patch eliminates >>> that >> redundancy. >>> >>> The -fdump-tree-expand for pr28071.c compiled with -O0 currently >>> contains 777K lines, with this patch it contains 717K lines, i.e. >>> saving about 60K lines (admittedly of debugging text output, but it makes >>> the >> point). >>> >>> >>> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap >>> and make -k check, both with and without --target_board=unix{-m32} >>> with no new failures. Ok for mainline? >>> >>> As always, I'm happy to revert this change quickly if there's a >>> problem, and investigate why this additional copy might (still) be >>> needed on other >>> non-x86 targets. >> >> @@ -2622,6 +2622,7 @@ emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_src, >> tree type, >> be loaded directly into the destination. */ >> src = orig_src; >> if (!MEM_P (orig_src) >> + && (!REG_P (orig_src) || HARD_REGISTER_P (orig_src)) >> && (!CONSTANT_P (orig_src) >> || (GET_MODE (orig_src) != mode >> && GET_MODE (orig_src) != VOIDmode))) >> >> so that means the code guarded by the conditional could instead be >> transformed >> to >> >> src = force_reg (mode, orig_src); >> >> ? Btw, the || (GET_MODE (orig_src) != mode && GET_MODE (orig_src) != >> VOIDmode) case looks odd as in that case we'd use GET_MODE (orig_src) for the >> move ... that might also mean we have to use force_reg (GET_MODE (orig_src) >> == >> VOIDmode ? mode : GET_MODE (orig_src), orig_src)) >> >> Otherwise I think this is OK, as said, using force_reg somehow would improve >> readability here I think. >> >> I also wonder how the >> >> else if (GET_CODE (src) == CONCAT) >> >> case will ever trigger with the current code. >> >> Richard. 
>> >>> >>> 2023-07-25 Roger Sayle >>> >>> gcc/ChangeLog >>>PR middle-end/28071 >>>PR rtl-optimization/110587 >>>* expr.cc (emit_group_load_1): Avoid copying a pseudo register into >>>a new pseudo register, i.e. only copy hard regs into a new pseudo. >>> >>> > >
Re: [PATCH] Use substituted GDCFLAGS
Excerpts from Andreas Schwab via Gcc-patches's message of July 24, 2023 11:15 am:
> Ping?
>

OK from me.

Thanks,
Iain.
[Bug other/110831] [14 regression] gcc.dg/stack-check-3.c ICEs after r14-2822-g499b8079a6419b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110831

--- Comment #1 from seurer at gcc dot gnu.org ---
Also this one:

FAIL: gcc.dg/strcmpopt_5.c (internal compiler error: in to_gcov_type, at profile-count.h:831)
FAIL: gcc.dg/strcmpopt_5.c (test for excess errors)
Re: [PATCH] bpf: correct pseudo-C template for add3 and sub3
> The pseudo-C output templates for these instructions were incorrectly > using operand 1 rather than operand 2 on the RHS, which led to some > very incorrect assembly generation with -masm=pseudoc. > > Tested on bpf-unknown-none. > OK? OK. Thanks for spotting and fixing this! > > gcc/ > > * config/bpf/bpf.md (add3): Use %w2 instead of %w1 > in pseudo-C dialect output template. > (sub3): Likewise. > > gcc/testsuite/ > > * gcc.target/bpf/alu-2.c: New test. > * gcc.target/bpf/alu-pseudoc-2.c: Likewise. > --- > gcc/config/bpf/bpf.md| 4 ++-- > gcc/testsuite/gcc.target/bpf/alu-2.c | 12 > gcc/testsuite/gcc.target/bpf/alu-pseudoc-2.c | 13 + > 3 files changed, 27 insertions(+), 2 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/bpf/alu-2.c > create mode 100644 gcc/testsuite/gcc.target/bpf/alu-pseudoc-2.c > > diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md > index 2ffc4ebd17e..66436397bb7 100644 > --- a/gcc/config/bpf/bpf.md > +++ b/gcc/config/bpf/bpf.md > @@ -131,7 +131,7 @@ (define_insn "add3" > (plus:AM (match_operand:AM 1 "register_operand" " 0,0") > (match_operand:AM 2 "reg_or_imm_operand" " r,I")))] >"1" > - "{add\t%0,%2|%w0 += %w1}" > + "{add\t%0,%2|%w0 += %w2}" >[(set_attr "type" "")]) > > ;;; Subtraction > @@ -144,7 +144,7 @@ (define_insn "sub3" > (minus:AM (match_operand:AM 1 "register_operand" " 0") >(match_operand:AM 2 "register_operand" " r")))] >"" > - "{sub\t%0,%2|%w0 -= %w1}" > + "{sub\t%0,%2|%w0 -= %w2}" >[(set_attr "type" "")]) > > ;;; Negation > diff --git a/gcc/testsuite/gcc.target/bpf/alu-2.c > b/gcc/testsuite/gcc.target/bpf/alu-2.c > new file mode 100644 > index 000..0444a9bc68a > --- /dev/null > +++ b/gcc/testsuite/gcc.target/bpf/alu-2.c > @@ -0,0 +1,12 @@ > +/* Check add and sub instructions. 
*/ > +/* { dg-do compile } */ > +/* { dg-options "" } */ > + > +long foo (long x, long y) > +{ > + return y - x + 4; > +} > + > +/* { dg-final { scan-assembler-not {sub\t(%r.),\1\n} } } */ > +/* { dg-final { scan-assembler {sub\t(\%r.),(\%r.)\n} } } */ > +/* { dg-final { scan-assembler {add\t(\%r.),4\n} } } */ > diff --git a/gcc/testsuite/gcc.target/bpf/alu-pseudoc-2.c > b/gcc/testsuite/gcc.target/bpf/alu-pseudoc-2.c > new file mode 100644 > index 000..751db2477c0 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/bpf/alu-pseudoc-2.c > @@ -0,0 +1,13 @@ > +/* Check add and sub instructions (pseudoc asm dialect). */ > +/* { dg-do compile } */ > +/* { dg-options "-masm=pseudoc" } */ > + > +long foo (long x, long y) > +{ > + return y - x + 4; > +} > + > +/* { dg-final { scan-assembler-not {\t(r.) -= \1\n} } } */ > +/* { dg-final { scan-assembler {\t(r.) -= (r.)\n} } } */ > +/* { dg-final { scan-assembler {\t(r.) \+= 4\n} } } */ > +
Re: [PATCH 0/5] GCC _BitInt support [PR102989]
On Thu, 27 Jul 2023, Jakub Jelinek via Gcc-patches wrote:

> - _BitInt(N) bit-fields aren't supported yet (the patch rejects them); I'd like
>   to enable those incrementally, but don't really see details on how such
>   bit-fields should be laid out in memory nor passed inside of function
>   arguments; LLVM implements something, but it is a question if that is what
>   the various ABIs want

So if the x86-64 ABI (or any other _BitInt ABI that already exists) doesn't specify this adequately then an issue should be filed (at https://gitlab.com/x86-psABIs/x86-64-ABI/-/issues in the x86-64 case).

(Note that the language specifies that e.g. _BitInt(123):45 gets promoted to _BitInt(123) by the integer promotions, rather than left as a type with the bit-field width.)

> - conversions between large/huge (see later) _BitInt and _Decimal{32,64,128}
>   aren't supported and emit a sorry; I'm not familiar enough with DFP stuff
>   to implement that

Doing things incrementally might indicate first doing this only for BID (so sufficing for x86-64), with DPD support to be added when _BitInt support is added for an architecture using DPD, i.e. powerpc / s390.

This conversion is a mix of base conversion and things specific to DFP types.

For conversion *from DFP to _BitInt*, the DFP value needs to be interpreted (hopefully using existing libbid code) as the product of a sign, an integer and a power of 10, with appropriate truncation of the fractional part if there is one (and appropriate handling of infinity / NaN / values where the integer part obviously doesn't fit in the type as raising "invalid" and returning an arbitrary result). Then it's just a matter of doing an integer multiplication and producing an appropriately signed result (which might itself overflow the range of representable values with the given sign, meaning "invalid" should be raised).
Precomputed tables of powers of 10 in binary might speed up the multiplication process (don't know if various existing tables in libbid are usable for that). It's unspecified whether "inexact" is raised for non-integer DFP values. For conversion *from _BitInt to DFP*, the _BitInt value needs to be expressed in decimal. In the absence of optimized multiplication / division for _BitInt, it seems reasonable enough to do this naively (repeatedly dividing by a power of 10 that fits in one limb to determine base 10^N digits from the least significant end, for example), modulo detecting obvious overflow cases up front (if the absolute value is at least 10^97, conversion to _Decimal32 definitely overflows in all rounding modes, for example, so you just need to do an overflowing computation that produces a result with the right sign in order to get the correct rounding-mode-dependent result and exceptions). Probably it isn't necessary to convert most of those base 10^N digits into base 10 digits. Rather, it's enough to find the leading M (= precision of the DFP type in decimal digits) base 10 digits, plus to know whether what follows is exactly 0, exactly 0.5, between 0 and 0.5, or between 0.5 and 1. Then adding two appropriate DFP values with the right sign produces the final DFP result. Those DFP values would need to be produced from integer digits together with the relevant power of 10. And there might be multiple possible choices for the DFP quantum exponent; the preferred exponent for exact results is 0, so the resulting exponent needs to be chosen to be as close to 0 as possible (which also produces correct results when the result is inexact). (If the result is 0, note that quantum exponent of 0 is not the same as the zero from default initialization, which has the least exponent possible.) -- Joseph S. Myers jos...@codesourcery.com
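A minimal sketch of the naive _BitInt-to-decimal step described above, assuming the magnitude is held as little-endian 32-bit limbs (a hypothetical layout chosen for illustration, not necessarily GCC's). It repeatedly divides by 10^9, which fits in one limb, to get base-10^9 digits from the least significant end, then keeps the leading M decimal digits and classifies the discarded tail as zero, below half, exactly half or above half. The sign, the up-front overflow check, the final DFP addition and a carry out of the coefficient when rounding up are all left out; M is limited to 18 so the coefficient fits in uint64_t (enough for _Decimal32 and _Decimal64, not _Decimal128). All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Where the discarded low-order digits fall relative to half an ulp.  */
enum tail_class { TAIL_ZERO, TAIL_BELOW_HALF, TAIL_HALF, TAIL_ABOVE_HALF };

/* Divide the little-endian magnitude LIMBS[0..N-1] in place by DIV and
   return the remainder; 32-bit limbs keep intermediates in uint64_t.  */
static uint32_t
divmod_limbs (uint32_t *limbs, size_t n, uint32_t div)
{
  uint64_t rem = 0;
  for (size_t i = n; i-- > 0;)
    {
      uint64_t cur = (rem << 32) | limbs[i];
      limbs[i] = (uint32_t) (cur / div);
      rem = cur % div;
    }
  return (uint32_t) rem;
}

/* Peel base-10^9 digits off LIMBS (destroyed), least significant first.
   Returns the chunk count, or 0 if CHUNKS is too small.  */
static size_t
to_base_1e9 (uint32_t *limbs, size_t n, uint32_t *chunks, size_t max_chunks)
{
  size_t k = 0, i;
  do
    {
      if (k == max_chunks)
        return 0;
      chunks[k++] = divmod_limbs (limbs, n, 1000000000u);
      for (i = 0; i < n && limbs[i] == 0; i++)
        ;
    }
  while (i < n);  /* Stop once the quotient is zero.  */
  return k;
}

/* Decimal digit POS (0 = least significant) of the chunked value.  */
static unsigned
digit_at (const uint32_t *chunks, size_t pos)
{
  uint32_t c = chunks[pos / 9];
  for (size_t j = 0; j < pos % 9; j++)
    c /= 10;
  return c % 10;
}

/* Keep the leading M (<= 18) digits in *COEFF, set *SCALE so that the
   value is about *COEFF * 10^*SCALE (the exponent closest to 0 that
   fits), and classify the discarded tail.  */
static enum tail_class
leading_digits (const uint32_t *chunks, size_t k, unsigned m,
                uint64_t *coeff, size_t *scale)
{
  size_t ndigits = 9 * (k - 1);
  for (uint32_t top = chunks[k - 1]; top != 0; top /= 10)
    ndigits++;
  if (ndigits == 0)
    ndigits = 1;  /* The value 0 still has one digit.  */

  size_t keep = ndigits < m ? ndigits : m;
  *scale = ndigits - keep;
  *coeff = 0;
  for (size_t pos = ndigits; pos-- > *scale;)
    *coeff = *coeff * 10 + digit_at (chunks, pos);

  if (*scale == 0)
    return TAIL_ZERO;
  unsigned first = digit_at (chunks, *scale - 1);
  int rest_zero = 1;
  for (size_t pos = 0; pos + 1 < *scale && rest_zero; pos++)
    rest_zero = digit_at (chunks, pos) == 0;
  if (first > 5 || (first == 5 && !rest_zero))
    return TAIL_ABOVE_HALF;
  if (first == 5)
    return TAIL_HALF;
  return (first == 0 && rest_zero) ? TAIL_ZERO : TAIL_BELOW_HALF;
}
```

For example, 123456789012 (limbs {3197704724, 28}) with M = 7 gives coefficient 1234567, exponent 5 and an above-half tail (89012), so round-to-nearest would produce 1234568E5.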
[Bug middle-end/110833] New: gamess regression on Ice Lake with -Ofast -march=native between g:1c6231c05bdccab3 (2023-07-21 03:06) and g:bbc1a102735c72e3 (2023-07-23 04:55)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110833

            Bug ID: 110833
           Summary: gamess regression on Ice Lake with -Ofast -march=native between g:1c6231c05bdccab3 (2023-07-21 03:06) and g:bbc1a102735c72e3 (2023-07-23 04:55)
           Product: gcc
           Version: 13.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=798.50.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=790.50.0

It may be interesting to know why it improved and now regressed again.
[Bug middle-end/110832] 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832

--- Comment #2 from Jan Hubicka ---
I tested that the profile change makes no difference.
Make store likely in optimize_mask_stores
Hi,
as discussed with Richard, we want the store to be likely in optimize_mask_stores.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

	* tree-vect-loop.cc (optimize_mask_stores): Make store likely.

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 2561552fe6e..a83952aff60 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -11741,7 +11741,7 @@ optimize_mask_stores (class loop *loop)
       e->flags = EDGE_TRUE_VALUE;
       efalse = make_edge (bb, store_bb, EDGE_FALSE_VALUE);
       /* Put STORE_BB to likely part.  */
-      efalse->probability = profile_probability::unlikely ();
+      efalse->probability = profile_probability::likely ();
       e->probability = efalse->probability.invert ();
       store_bb->count = efalse->count ();
       make_single_succ_edge (store_bb, join_bb, EDGE_FALLTHRU);
Fix profile update after RTL unrolling
This patch fixes the profile update after RTL unrolling, which is now done the same way as in the tree unroller. We still produce a (slightly) corrupted profile for multiple-exit loops; I can try to fix that incrementally. I also updated the testcases to look for profile mismatches so they do not creep back in.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

	* cfgloop.h (single_dom_exit): Declare.
	* cfgloopmanip.h (update_exit_probability_after_unrolling): Declare.
	* cfgrtl.cc (struct cfg_hooks): Fix comment.
	* loop-unroll.cc (unroll_loop_constant_iterations): Update exit edge.
	* tree-ssa-loop-ivopts.h (single_dom_exit): Do not declare it here.
	* tree-ssa-loop-manip.cc (update_exit_probability_after_unrolling):
	Break out from ...
	(tree_transform_and_unroll_loop): ... here;

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-prof/peel-1.c: Test for profile mismatches.
	* gcc.dg/tree-prof/unroll-1.c: Test for profile mismatches.
	* gcc.dg/tree-ssa/peel1.c: Test for profile mismatches.
	* gcc.dg/unroll-1.c: Test for profile mismatches.
	* gcc.dg/unroll-3.c: Test for profile mismatches.
	* gcc.dg/unroll-4.c: Test for profile mismatches.
	* gcc.dg/unroll-5.c: Test for profile mismatches.
	* gcc.dg/unroll-6.c: Test for profile mismatches.

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 22293e1c237..c4622d4b853 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -921,6 +921,7 @@ extern bool get_estimated_loop_iterations (class loop *loop, widest_int *nit);
 extern bool get_max_loop_iterations (const class loop *loop, widest_int *nit);
 extern bool get_likely_max_loop_iterations (class loop *loop, widest_int *nit);
 extern int bb_loop_depth (const_basic_block);
+extern edge single_dom_exit (class loop *);
 /* Converts VAL to widest_int.
*/ diff --git a/gcc/cfgloopmanip.h b/gcc/cfgloopmanip.h index af6a29f70c4..dab7b31c1e7 100644 --- a/gcc/cfgloopmanip.h +++ b/gcc/cfgloopmanip.h @@ -68,5 +68,6 @@ class loop * loop_version (class loop *, void *, void adjust_loop_info_after_peeling (class loop *loop, int npeel, bool precise); void scale_dominated_blocks_in_loop (class loop *loop, basic_block bb, profile_count num, profile_count den); +void update_exit_probability_after_unrolling (class loop *loop, edge new_exit); #endif /* GCC_CFGLOOPMANIP_H */ diff --git a/gcc/cfgrtl.cc b/gcc/cfgrtl.cc index 36e43d0d737..abcb472e2a2 100644 --- a/gcc/cfgrtl.cc +++ b/gcc/cfgrtl.cc @@ -5409,7 +5409,7 @@ struct cfg_hooks cfg_layout_rtl_cfg_hooks = { rtl_flow_call_edges_add, NULL, /* execute_on_growing_pred */ NULL, /* execute_on_shrinking_pred */ - duplicate_loop_body_to_header_edge, /* duplicate loop for trees */ + duplicate_loop_body_to_header_edge, /* duplicate loop for rtl */ rtl_lv_add_condition_to_bb, /* lv_add_condition_to_bb */ NULL, /* lv_adjust_loop_header_phi*/ rtl_extract_cond_bb_edges, /* extract_cond_bb_edges */ diff --git a/gcc/loop-unroll.cc b/gcc/loop-unroll.cc index 9d8ba11..bbfa6ccc770 100644 --- a/gcc/loop-unroll.cc +++ b/gcc/loop-unroll.cc @@ -487,6 +487,7 @@ unroll_loop_constant_iterations (class loop *loop) bool exit_at_end = loop_exit_at_end_p (loop); struct opt_info *opt_info = NULL; bool ok; + bool flat = maybe_flat_loop_profile (loop); niter = desc->niter; @@ -603,9 +604,14 @@ unroll_loop_constant_iterations (class loop *loop) ok = duplicate_loop_body_to_header_edge ( loop, loop_latch_edge (loop), max_unroll, wont_exit, desc->out_edge, _edges, -DLTHE_FLAG_UPDATE_FREQ | (opt_info ? DLTHE_RECORD_COPY_NUMBER : 0)); +DLTHE_FLAG_UPDATE_FREQ | (opt_info ? DLTHE_RECORD_COPY_NUMBER : 0) +| (flat ? 
DLTHE_FLAG_FLAT_PROFILE : 0)); gcc_assert (ok); + edge new_exit = single_dom_exit (loop); + if (new_exit) +update_exit_probability_after_unrolling (loop, new_exit); + if (opt_info) { apply_opt_in_copies (opt_info, max_unroll, true, true); diff --git a/gcc/profile-count.h b/gcc/profile-count.h index 88a6431c21a..e860c5db540 100644 --- a/gcc/profile-count.h +++ b/gcc/profile-count.h @@ -650,6 +650,9 @@ public: return *this; } + /* Compute n-th power. */ + profile_probability pow (int) const; + /* Get the value of the count. */ uint32_t value () const { return m_val; } diff --git a/gcc/testsuite/gcc.dg/tree-prof/peel-1.c b/gcc/testsuite/gcc.dg/tree-prof/peel-1.c index 7245b68c1ee..32ecccb16da 100644 --- a/gcc/testsuite/gcc.dg/tree-prof/peel-1.c +++ b/gcc/testsuite/gcc.dg/tree-prof/peel-1.c @@ -1,4 +1,4 @@ -/* { dg-options "-O3 -fdump-tree-cunroll-details -fno-unroll-loops -fpeel-loops" } */ +/* { dg-options "-O3 -fdump-tree-cunroll-details-blocks -fdump-tree-optimized-details-blocks -fno-unroll-loops -fpeel-loops" } */ void abort(); int a[1000]; @@ -21,3 +21,5 @@ main() return 0; } /* { dg-final-use { scan-tree-dump "Peeled loop ., 1 times" "cunroll" }
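One way to see what "update exit edge" has to achieve, as a sketch under stated assumptions rather than a description of update_exit_probability_after_unrolling (its body is not part of this excerpt): in a single-exit loop every entry through the preheader leaves through the exit exactly once, so the exit edge must be taken entry_count / exit_src_count of the times its source block executes. Both helpers below are hypothetical; the geometric model is only a cross-check.

```c
/* Flow conservation for a single-exit loop: each entry leaves exactly
   once, so the exit probability is entries / executions of the block
   holding the exit test.  */
static double
exit_prob_from_counts (double entry_count, double exit_src_count)
{
  return exit_src_count > 0 ? entry_count / exit_src_count : 0.0;
}

/* Geometric-model cross-check: with per-iteration exit probability P and
   only one exit test left per FACTOR original iterations, the unrolled
   loop stays in with probability (1 - P)^FACTOR.  */
static double
exit_prob_geometric (double p, int factor)
{
  double stay = 1.0;
  for (int i = 0; i < factor; i++)
    stay *= 1.0 - p;
  return 1.0 - stay;
}
```

For example, a loop entered 10 times and running 100 iterations each time has exit probability 0.01; after unrolling by 4 the block holding the remaining exit test runs 250 times, so the exit edge should get 10/250 = 0.04, close to the geometric estimate 1 - 0.99^4 (about 0.0394). Keeping the old 0.01 on that edge is exactly the kind of profile mismatch the updated testcases now look for.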
[Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587

Roger Sayle changed:

           What    |Removed                            |Added
----------------------------------------------------------------------------
           Assignee|roger at nextmovesoftware dot com  |unassigned at gcc dot gnu.org

--- Comment #16 from Roger Sayle ---
My patch (in comment #15) is obsoleted by Richard Biener's much better solution(s):
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625416.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625417.html
[Bug rtl-optimization/110701] [14 Regression] Wrong code at -O1/2/3/s on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110701

--- Comment #7 from Roger Sayle ---
Patch proposed here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625532.html
[PATCH] bpf: correct pseudo-C template for add3 and sub3
The pseudo-C output templates for these instructions were incorrectly
using operand 1 rather than operand 2 on the RHS, which led to some
very incorrect assembly generation with -masm=pseudoc.

Tested on bpf-unknown-none.
OK?

gcc/

	* config/bpf/bpf.md (add3): Use %w2 instead of %w1
	in pseudo-C dialect output template.
	(sub3): Likewise.

gcc/testsuite/

	* gcc.target/bpf/alu-2.c: New test.
	* gcc.target/bpf/alu-pseudoc-2.c: Likewise.
---
 gcc/config/bpf/bpf.md                        |  4 ++--
 gcc/testsuite/gcc.target/bpf/alu-2.c         | 12
 gcc/testsuite/gcc.target/bpf/alu-pseudoc-2.c | 13 +
 3 files changed, 27 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/bpf/alu-2.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/alu-pseudoc-2.c

diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
index 2ffc4ebd17e..66436397bb7 100644
--- a/gcc/config/bpf/bpf.md
+++ b/gcc/config/bpf/bpf.md
@@ -131,7 +131,7 @@ (define_insn "add3"
         (plus:AM (match_operand:AM 1 "register_operand"   " 0,0")
                  (match_operand:AM 2 "reg_or_imm_operand" " r,I")))]
   "1"
-  "{add\t%0,%2|%w0 += %w1}"
+  "{add\t%0,%2|%w0 += %w2}"
   [(set_attr "type" "")])

 ;;; Subtraction
@@ -144,7 +144,7 @@ (define_insn "sub3"
         (minus:AM (match_operand:AM 1 "register_operand" " 0")
                   (match_operand:AM 2 "register_operand" " r")))]
   ""
-  "{sub\t%0,%2|%w0 -= %w1}"
+  "{sub\t%0,%2|%w0 -= %w2}"
   [(set_attr "type" "")])

 ;;; Negation
diff --git a/gcc/testsuite/gcc.target/bpf/alu-2.c b/gcc/testsuite/gcc.target/bpf/alu-2.c
new file mode 100644
index 000..0444a9bc68a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/alu-2.c
@@ -0,0 +1,12 @@
+/* Check add and sub instructions.  */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+long foo (long x, long y)
+{
+  return y - x + 4;
+}
+
+/* { dg-final { scan-assembler-not {sub\t(%r.),\1\n} } } */
+/* { dg-final { scan-assembler {sub\t(\%r.),(\%r.)\n} } } */
+/* { dg-final { scan-assembler {add\t(\%r.),4\n} } } */
diff --git a/gcc/testsuite/gcc.target/bpf/alu-pseudoc-2.c b/gcc/testsuite/gcc.target/bpf/alu-pseudoc-2.c
new file mode 100644
index 000..751db2477c0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/alu-pseudoc-2.c
@@ -0,0 +1,13 @@
+/* Check add and sub instructions (pseudoc asm dialect).  */
+/* { dg-do compile } */
+/* { dg-options "-masm=pseudoc" } */
+
+long foo (long x, long y)
+{
+  return y - x + 4;
+}
+
+/* { dg-final { scan-assembler-not {\t(r.) -= \1\n} } } */
+/* { dg-final { scan-assembler {\t(r.) -= (r.)\n} } } */
+/* { dg-final { scan-assembler {\t(r.) \+= 4\n} } } */
+
-- 
2.40.1
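Why the old templates went so wrong: operand 1 carries the matching constraint "0", so register allocation always gives it the same register as operand 0, and %w1 therefore printed the destination register again, yielding `rN -= rN`. That is precisely what the new scan-assembler-not pattern `\t(r.) -= \1\n` rules out. The toy renderer below is a hypothetical illustration (far simpler than GCC's real output machinery) that substitutes "%wN" with an assumed register name "rK" for operand N.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy renderer for a pseudo-C output template such as "%w0 -= %w2":
   each "%wN" (N = 0..2) is replaced by "r<regno[N]>".  OUTSZ must be
   at least 1; output is truncated to fit.  */
static void
render (char *out, size_t outsz, const char *tmpl, const int regno[3])
{
  size_t n = 0;
  for (const char *p = tmpl; *p != '\0' && n + 1 < outsz; p++)
    {
      if (strncmp (p, "%w", 2) == 0 && p[2] >= '0' && p[2] <= '2')
        {
          int w = snprintf (out + n, outsz - n, "r%d", regno[p[2] - '0']);
          if (w < 0 || (size_t) w >= outsz - n)
            {
              n = outsz - 1;  /* Output truncated.  */
              break;
            }
          n += (size_t) w;
          p += 2;  /* Skip the 'w' and the operand digit.  */
        }
      else
        out[n++] = *p;
    }
  out[n] = '\0';
}
```

With operands {r0, r0, r1} (operand 1 tied to operand 0), the old sub template renders as "r0 -= r0" and the fixed one as "r0 -= r1".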
[Bug tree-optimization/106293] [13/14 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

--- Comment #15 from Jan Hubicka ---
   if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
       /* If result of comparsion is unknown, prefer EARLY_BB.
          Thus use !(...>=..) rather than (...<...)  */
-      && !(best_bb->count * 100 >= early_bb->count * threshold))
+      && !(best_bb->count * 100 > early_bb->count * threshold))
     return best_bb;

Comparing loop depths certainly seems odd. If we want to test that best_bb
and early_bb are in the same loop, we want to test loop_father. What is the
benefit of testing across loop nests?

The profile report here claims:

dump id |static mismat|dynamic mismatch                                    |
        |in count     |in count                 |time                      |
lsplit  |      5    +5|  8151850567  +8151850567|531506481006  +57.9%      |
ldist   |      9    +4| 15345493501  +7193642934|606848841056  +14.2%      |
ifcvt   |     10    +1| 15487514871   +142021370|689469797790  +13.6%      |
vect    |     35   +25| 17558425961  +2070911090|517375405715  -25.0%      |
cunroll |     42    +7| 16898736178   -659689783|452445796198   -4.9%      |
loopdone|     33    -9|  2678017188 -14220718990|330969127663              |
tracer  |     34    +1|  2678018710        +1522|330613415364   +0.0%      |
fre     |     33    -1|  2676980249     -1038461|330465677073   -0.0%      |
expand  |     28    -5|  2497468467   -179511782|--                        |

So it looks like loop splitting, distribution and the vectorizer disturb the
profile significantly. (Ifcvt does so by design and the damage is undone
later.) Not sure if that is the real problem though.
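A small sketch of why the comment insists on !(a >= b) rather than a < b, and of what the proposed >= to > change does when the two sides are equal. It uses a double NaN as a stand-in for an unknown profile_count; that is only an analogy, assuming (as the comment implies) that profile_count comparisons evaluate to false when the result is unknown. The helper names are hypothetical, and the arguments stand for the already-scaled values best_bb->count * 100 and early_bb->count * threshold.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Current form of the test:  !(best * 100 >= early * threshold).  */
static bool
old_accepts_best (double best_scaled, double early_scaled)
{
  return !(best_scaled >= early_scaled);
}

/* Form from the proposed diff:  !(best * 100 > early * threshold).  */
static bool
new_accepts_best (double best_scaled, double early_scaled)
{
  return !(best_scaled > early_scaled);
}

/* The "obvious" rewrite using <, which gives the opposite answer
   when the comparison is unordered (unknown).  */
static bool
naive_accepts_best (double best_scaled, double early_scaled)
{
  return best_scaled < early_scaled;
}
```

With equal scaled counts the current test rejects BEST_BB while the proposed one accepts it. With an unknown (NaN) count both negated forms evaluate to true, whereas the naive < form evaluates to false.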
[Bug c++/85944] Address of temporary at global scope not considered constexpr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85944

Andrew Pinski changed:

           What    |Removed                     |Added
                 CC|                            |gccbugbjorn at fahller dot se

--- Comment #12 from Andrew Pinski ---
*** Bug 110828 has been marked as a duplicate of this bug. ***
[Bug c++/55004] [meta-bug] constexpr issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55004
Bug 55004 depends on bug 110828, which changed state.

Bug 110828 Summary: union constexpr dtor not constexpr when used in member array
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110828

           What    |Removed                     |Added
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE
[Bug c++/110828] union constexpr dtor not constexpr when used in member array
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110828

Andrew Pinski changed:

           What    |Removed                     |Added
         Resolution|---                         |DUPLICATE
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #3 from Andrew Pinski ---
(In reply to Patrick Palka from comment #1)
> Does it work if you move the static_assert into a function scope?  If so then
> this is probably a dup of PR85944.

Yes it does work with:
```
void f1()
{
  static_assert(S{}.f());
}
```

So yes it is a dup.

*** This bug has been marked as a duplicate of bug 85944 ***
[Bug tree-optimization/110755] [13 Regression] Wrong optimization of fabs on ppc64el at -O1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110755

--- Comment #14 from CVS Commits ---
The releases/gcc-13 branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:e684084a5fa9edaedb1a14e118b966a60e3449b9

commit r13-7615-ge684084a5fa9edaedb1a14e118b966a60e3449b9
Author: Jakub Jelinek
Date:   Wed Jul 26 10:50:50 2023 +0200

    range-op-float: Fix up -frounding-math frange_arithmetic +- handling [PR110755]

    IEEE754 says that x + (-x) and x - x result in +0 in all rounding modes
    but rounding towards negative infinity, in which case the result is -0
    for all finite x.  x + x and x - (-x) if it is zero retain sign of x.
    Now, range_arithmetic implements the normal rounds to even rounding,
    and as the addition or subtraction in those cases is exact, we don't do
    any further rounding etc. and e.g. on the testcase below distilled from
    glibc compute a range [+0, +INF], which is fine for -fno-rounding-math
    or if we'd have a guarantee that those statements aren't executed with
    rounding towards negative infinity.

    I believe it is only +- which has this problematic behavior and I think
    it is best to deal with it in frange_arithmetic; if we know
    -frounding-math is on, it is x + (-x) or x - x and we are asked to round
    to negative infinity (i.e. want low bound rather than high bound),
    change +0 result to -0.

    2023-07-26  Jakub Jelinek

            PR tree-optimization/110755
            * range-op-float.cc (frange_arithmetic): Change +0 result to -0
            for PLUS_EXPR or MINUS_EXPR if -frounding-math, inf is negative and
            it is exact op1 + (-op1) or op1 - op1.

            * gcc.dg/pr110755.c: New test.

    (cherry picked from commit 21da32d995c8b574c929ec420cd3b0fcfe6fa4fe)
[Bug gcov-profile/110827] C++20 coroutines aren't being measured by gcov
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110827

--- Comment #4 from Michael Duggan ---
I should be more explicit. The `std::cout` line in the example is just a
placeholder for "does some work here," and this example is specifically the
simplest coroutine I could come up with that demonstrates the problem.

When I initially encountered this problem, I was doing coverage testing that
included a coroutine that was over 70 lines long, included lots of loops and
branching, and exited and re-entered multiple times via `co_yield`. I wanted
to know whether my test programs properly covered all of the branches. It is
not enough to know how many times the coroutine itself is called.
[Bug middle-end/110832] 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832

--- Comment #1 from Jan Hubicka ---
This time it seems that there is only one profile change:

commit 645c67f80c6258c1f54ec567f604008adbdb8a04
Author: Jan Hubicka
Date:   Wed Jul 26 08:59:23 2023 +0200

    Fix profile_count::to_sreal_scale

    gcc/ChangeLog:

            * profile-count.cc (profile_count::to_sreal_scale): Value is not know
            if we divide by zero.

Which should not be very important.
[Bug middle-end/110832] New: 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832

            Bug ID: 110832
           Summary: 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76
                    (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27
                    03:44) on zen3 and core
           Product: gcc
           Version: 13.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

The biggest regression is seen here:
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=466.758.0
zen3
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=466.758.0
Curiously, zen2 improves:
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=171.758.0

I can see an instruction count difference in the perf stats:

 Performance counter stats for './a.out':

         10923.70 msec task-clock:u                 #    1.000 CPUs utilized
                0      context-switches:u           #    0.000 /sec
                0      cpu-migrations:u             #    0.000 /sec
            15510      page-faults:u                #    1.420 K/sec
      59062937176      cycles:u                     #    5.407 GHz                       (83.33%)
         12607081      stalled-cycles-frontend:u    #    0.02% frontend cycles idle      (83.34%)
        122404896      stalled-cycles-backend:u     #    0.21% backend cycles idle       (83.34%)
     112648123380      instructions:u               #    1.91  insn per cycle
                                                    #    0.00  stalled cycles per insn   (83.34%)
       9666338531      branches:u                   #  884.896 M/sec                     (83.34%)
          2937216      branch-misses:u              #    0.03% of all branches           (83.31%)

     10.924108973 seconds time elapsed

     10.912056000 seconds user
      0.01200 seconds sys

 Performance counter stats for './b.out':

         11025.38 msec task-clock:u                 #    1.000 CPUs utilized
                0      context-switches:u           #    0.000 /sec
                0      cpu-migrations:u             #    0.000 /sec
            14998      page-faults:u                #    1.360 K/sec
      59436352848      cycles:u                     #    5.391 GHz                       (83.31%)
          9217660      stalled-cycles-frontend:u    #    0.02% frontend cycles idle      (83.32%)
        210162784      stalled-cycles-backend:u     #    0.35% backend cycles idle       (83.35%)
     131604240004      instructions:u               #    2.21  insn per cycle
                                                    #    0.00  stalled cycles per insn   (83.35%)
       9657712171      branches:u                   #  875.953 M/sec                     (83.35%)
          3146487      branch-misses:u              #    0.03% of all branches           (83.33%)

     11.025701172 seconds time elapsed

     11.005646000 seconds user
      0.020002000 seconds sys

However, perf report does not show clear differences in per-function times. I
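For reference, the two perf runs can be summarized with a little arithmetic. A minimal sketch (the helper names are hypothetical) computing the relative change and the instructions-per-cycle ratio from the raw counters: b.out executes about 16.8% more user-space instructions than a.out while taking only about 0.6% more cycles, consistent with IPC rising from about 1.91 to about 2.21.

```c
#include <assert.h>
#include <math.h>

/* Relative change of NEW_VAL over OLD_VAL, in percent.  */
static double
percent_change (double old_val, double new_val)
{
  return (new_val - old_val) / old_val * 100.0;
}

/* Instructions retired per cycle.  */
static double
ipc (double instructions, double cycles)
{
  return instructions / cycles;
}
```

Plugging in the counters: percent_change (112648123380, 131604240004) is about 16.83, percent_change (59062937176, 59436352848) is about 0.63, and ipc () gives about 1.907 and 2.214 for a.out and b.out respectively.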
Re: [PATCH 4/5] testsuite part 1 for _BitInt support [PR102989]
On Thu, Jul 27, 2023 at 07:15:28PM +0200, Jakub Jelinek via Gcc-patches wrote:
> testcases, I've been using
> https://defuse.ca/big-number-calculator.htm
> tool, a randombitint tool I wrote (will post as a reply to this) plus
> LLVM trunk on godbolt and the WIP GCC support checking if both compilers
> agree on stuff (and in case of differences tried constant evaluation etc.).

So, the randombitint.c tool is attached. When invoked like
./randombitint 174
it prints a pseudo-random 174-bit integer in decimal; when invoked as
./randombitint 575 mask
it prints the all-ones number for 575-bit precision in decimal; and
./randombitint 275 0x432445aebe435646567547567647
prints the given hexadecimal number as decimal (all using gmp).

In the tests I've often used the

__attribute__((noipa)) void
printme (const void *p, int n)
{
  __builtin_printf ("0x");
  if ((n & 7) != 0)
    __builtin_printf ("%02x", ((const unsigned char *) p)[n / 8] & ((1 << (n & 7)) - 1));
  for (int i = (n / 8) - 1; i >= 0; --i)
    __builtin_printf ("%02x", ((const unsigned char *) p)[i]);
  __builtin_printf ("\n");
}

function to print hexadecimal values (temporaries or finals) and then used
the third invocation of the tool to convert those to decimal.
For unsigned _BitInt I just called the above like
printme (, 575);
where 575 was the N from unsigned _BitInt(N) whatever, or
_BitInt(575) x = ...
if (x < 0)
  {
    __builtin_printf ("-");
    x = -x;
  }
printme (, 575);
to print it signed.
	Jakub

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <gmp.h>

int
main (int argc, const char *argv[])
{
  int n = atoi (argv[1]);
  int m = (n + 7) / 8;
  char *p = __builtin_alloca (m * 2 + 1);
  const char *q;
  srandom (getpid ());
  for (int i = 0; i < m; ++i)
    {
      unsigned char v = random ();
      if (argc >= 3 && strcmp (argv[2], "mask") == 0)
	v = 0xff;
      /* Mask the most significant byte down to the requested precision.  */
      if (i == 0 && (n & 7) != 0)
	v &= (1 << (n & 7)) - 1;
      sprintf (&p[2 * i], "%02x", v);
    }
  p[m * 2] = '\0';
  mpz_t a;
  mpz_init (a);
  if (argc >= 3 && strcmp (argv[2], "mask") != 0)
    {
      /* Convert the user-supplied hexadecimal number instead.  */
      q = argv[2];
      if (q[0] == '0' && q[1] == 'x')
	q += 2;
    }
  else
    q = p;
  gmp_sscanf (q, "%Zx", a);
  gmp_printf ("0x%s\n%Zd\n", q, a);
  mpz_clear (a);
  return 0;
}
[PATCH 3/5] C _BitInt support [PR102989]
Hi!

This patch adds the C FE support, c-family support, small libcpp change
so that 123wb and 42uwb suffixes are handled plus glimits.h change
to define BITINT_MAXWIDTH macro.

The previous two patches really do nothing without this, which enables
all the support.

2023-07-27  Jakub Jelinek

	PR c/102989
gcc/
	* glimits.h (BITINT_MAXWIDTH): Define if __BITINT_MAXWIDTH__ is
	predefined.
gcc/c-family/
	* c-common.cc (c_common_reswords): Add _BitInt as keyword.
	(c_common_signed_or_unsigned_type): Handle BITINT_TYPE.
	(check_builtin_function_arguments): Handle BITINT_TYPE like
	INTEGER_TYPE.
	(keyword_begins_type_specifier): Handle RID_BITINT.
	* c-common.h (enum rid): Add RID_BITINT enumerator.
	* c-cppbuiltin.cc (c_cpp_builtins): For C call
	targetm.c.bitint_type_info and predefine __BITINT_MAXWIDTH__
	and for -fbuilding-libgcc also __LIBGCC_BITINT_LIMB_WIDTH__ and
	__LIBGCC_BITINT_ORDER__ macros if _BitInt is supported.
	* c-lex.cc (interpret_integer): Handle CPP_N_BITINT.
	* c-pretty-print.cc (c_pretty_printer::simple_type_specifier,
	c_pretty_printer::direct_abstract_declarator): Handle BITINT_TYPE.
	(pp_c_integer_constant): Handle printing of large precision wide_ints
	which would buffer overflow digit_buffer.
	* c-ubsan.cc (ubsan_instrument_shift): Use UBSAN_PRINT_FORCE_INT
	for type0 type descriptor.
gcc/c/
	* c-convert.cc (c_convert): Handle BITINT_TYPE like INTEGER_TYPE.
	* c-decl.cc (declspecs_add_type): Formatting fixes.  Handle
	cts_bitint.  Adjust for added union in *specs.  Handle RID_BITINT.
	(finish_declspecs): Handle cts_bitint.  Adjust for added union
	in *specs.
	* c-parser.cc (c_keyword_starts_typename, c_token_starts_declspecs,
	c_parser_declspecs, c_parser_gnu_attribute_any_word): Handle
	RID_BITINT.
	* c-tree.h (enum c_typespec_keyword): Mention _BitInt in comment.
	Add cts_bitint enumerator.
	(struct c_declspecs): Move int_n_idx and floatn_nx_idx into a union
	and add bitint_prec there as well.
	* c-typeck.cc (composite_type, c_common_type, comptypes_internal):
	Handle BITINT_TYPE.
(build_array_ref, build_unary_op, build_conditional_expr, convert_for_assignment, digest_init, build_binary_op): Likewise. libcpp/ * expr.cc (interpret_int_suffix): Handle wb and WB suffixes. * include/cpplib.h (CPP_N_BITINT): Define. --- gcc/glimits.h.jj2023-01-03 00:20:35.071086812 +0100 +++ gcc/glimits.h 2023-07-27 15:03:24.238234396 +0200 @@ -157,6 +157,11 @@ see the files COPYING3 and COPYING.RUNTI # undef BOOL_WIDTH # define BOOL_WIDTH 1 +# ifdef __BITINT_MAXWIDTH__ +# undef BITINT_MAXWIDTH +# define BITINT_MAXWIDTH __BITINT_MAXWIDTH__ +# endif + # define __STDC_VERSION_LIMITS_H__ 202311L #endif --- gcc/c-family/c-common.cc.jj 2023-07-24 17:48:26.436041278 +0200 +++ gcc/c-family/c-common.cc2023-07-27 15:03:24.276233865 +0200 @@ -349,6 +349,7 @@ const struct c_common_resword c_common_r { "_Alignas",RID_ALIGNAS, D_CONLY }, { "_Alignof",RID_ALIGNOF, D_CONLY }, { "_Atomic", RID_ATOMIC,D_CONLY }, + { "_BitInt", RID_BITINT,D_CONLY }, { "_Bool", RID_BOOL, D_CONLY }, { "_Complex",RID_COMPLEX,0 }, { "_Imaginary", RID_IMAGINARY, D_CONLY }, @@ -2728,6 +2729,9 @@ c_common_signed_or_unsigned_type (int un || TYPE_UNSIGNED (type) == unsignedp) return type; + if (TREE_CODE (type) == BITINT_TYPE) +return build_bitint_type (TYPE_PRECISION (type), unsignedp); + #define TYPE_OK(node) \ (TYPE_MODE (type) == TYPE_MODE (node) \ && TYPE_PRECISION (type) == TYPE_PRECISION (node)) @@ -6341,8 +6345,10 @@ check_builtin_function_arguments (locati code0 = TREE_CODE (TREE_TYPE (args[0])); code1 = TREE_CODE (TREE_TYPE (args[1])); if (!((code0 == REAL_TYPE && code1 == REAL_TYPE) - || (code0 == REAL_TYPE && code1 == INTEGER_TYPE) - || (code0 == INTEGER_TYPE && code1 == REAL_TYPE))) + || (code0 == REAL_TYPE + && (code1 == INTEGER_TYPE || code1 == BITINT_TYPE)) + || ((code0 == INTEGER_TYPE || code0 == BITINT_TYPE) + && code1 == REAL_TYPE))) { error_at (loc, "non-floating-point arguments in call to " "function %qE", fndecl); @@ -8402,6 +8408,7 @@ keyword_begins_type_specifier (enum rid case 
RID_FRACT: case RID_ACCUM: case RID_BOOL: +case RID_BITINT: case RID_WCHAR: case RID_CHAR8: case RID_CHAR16: --- gcc/c-family/c-common.h.jj 2023-06-26 09:27:04.276367532 +0200 +++ gcc/c-family/c-common.h
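As an illustration of the missing integer promotions, here is a minimal sketch assuming a compiler that predefines __BITINT_MAXWIDTH__ (with this series, GCC on x86-64) and accepts the uwb suffix. The function names are hypothetical, and the _BitInt path is simply compiled out where the macro is not defined.

```c
#include <assert.h>

/* Returns 1 if 127 + 1 wraps to 0 in unsigned _BitInt(7), or -1 if the
   compiler does not provide _BitInt.  */
static int
bitint7_add_wraps (void)
{
#ifdef __BITINT_MAXWIDTH__
  unsigned _BitInt(7) x = 127uwb;
  /* Both operands are unsigned _BitInt, so the usual arithmetic
     conversions yield unsigned _BitInt(7): no promotion to int.  */
  return (x + 1uwb) == 0;
#else
  return -1;
#endif
}

/* The same computation on unsigned char: X is promoted to int first,
   so 255 + 1 is 256 and does not wrap.  */
static int
uchar_add_wraps (void)
{
  unsigned char x = 255;
  return (x + 1) == 0;
}
```

On a compiler with _BitInt support the first function returns 1 and the second returns 0, which is the behavioral difference the C FE has to preserve.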
[PATCH 2/5] libgcc _BitInt support [PR102989]
Hi!

This patch adds the library helpers for multiplication, division + modulo
and casts from and to floating point.
As described in the intro, the first step is to try to further reduce the
passed-in precision by skipping over most significant limbs with just zeros
or sign bit copies.
For multiplication and division I've implemented a simple algorithm;
using something smarter like Karatsuba or Toom N-Way might be faster for
very large _BitInts (which we don't support right now anyway), but could
mean more code in libgcc, which maybe isn't what people are willing to
accept.
For the to/from floating point conversions the patch uses soft-fp, because
it already has tons of handy macros which can be used for that.  In theory
it could be implemented using {,unsigned} long long or {,unsigned} __int128
to/from floating point conversions with some frexp before/after, but at that
point we already need to force it into integer registers and analyze it
anyway.  Plus, for 32-bit arches there is no __int128 that could be used
for XF/TF mode stuff.
I know that soft-fp is owned by glibc and I think the op-common.h change
should be propagated there, but the bitint stuff is really GCC specific
and IMHO doesn't belong in the glibc copy.

2023-07-27  Jakub Jelinek

	PR c/102989
libgcc/
	* config/aarch64/t-softfp (softfp_extras): Use += rather than :=.
	* config/i386/64/t-softfp (softfp_extras): Likewise.
	* config/i386/libgcc-glibc.ver (GCC_14.0.0): Export _BitInt support
	routines.
	* config/i386/t-softfp (softfp_extras): Add fixxfbitint and bf, hf
	and xf mode floatbitint.
	(CFLAGS-floatbitintbf.c, CFLAGS-floatbitinthf.c): Add -msse2.
	* config/riscv/t-softfp32 (softfp_extras): Use += rather than :=.
	* config/rs6000/t-e500v1-fp (softfp_extras): Likewise.
	* config/rs6000/t-e500v2-fp (softfp_extras): Likewise.
	* config/t-softfp (softfp_floatbitint_funcs): New.
	(softfp_func_list): Add sf and df mode from and to _BitInt libcalls.
	* config/t-softfp-sfdftf (softfp_extras): Add fixtfbitint and
	floatbitinttf.
	* config/t-softfp-tf (softfp_extras): Likewise.
	* libgcc2.c (bitint_reduce_prec): New inline function.
	(BITINT_INC, BITINT_END): Define.
	(bitint_mul_1, bitint_addmul_1): New helper functions.
	(__mulbitint3): New function.
	(bitint_negate, bitint_submul_1): New helper functions.
	(__divmodbitint4): New function.
	* libgcc2.h (LIBGCC2_UNITS_PER_WORD): When building _BitInt support
	libcalls, redefine depending on __LIBGCC_BITINT_LIMB_WIDTH__.
	(__mulbitint3, __divmodbitint4): Declare.
	* libgcc-std.ver.in (GCC_14.0.0): Export _BitInt support routines.
	* Makefile.in (lib2funcs): Add _mulbitint3.
	(LIB2_DIVMOD_FUNCS): Add _divmodbitint4.
	* soft-fp/bitint.h: New file.
	* soft-fp/fixdfbitint.c: New file.
	* soft-fp/fixsfbitint.c: New file.
	* soft-fp/fixtfbitint.c: New file.
	* soft-fp/fixxfbitint.c: New file.
	* soft-fp/floatbitintbf.c: New file.
	* soft-fp/floatbitintdf.c: New file.
	* soft-fp/floatbitinthf.c: New file.
	* soft-fp/floatbitintsf.c: New file.
	* soft-fp/floatbitinttf.c: New file.
	* soft-fp/floatbitintxf.c: New file.
	* soft-fp/op-common.h (_FP_FROM_INT): Add support for rsize up to
	4 * _FP_W_TYPE_SIZE rather than just 2 * _FP_W_TYPE_SIZE.
--- libgcc/config/aarch64/t-softfp.jj 2023-03-13 00:11:52.330213322 +0100 +++ libgcc/config/aarch64/t-softfp 2023-07-14 12:38:30.764869473 +0200 @@ -3,7 +3,7 @@ softfp_int_modes := si di ti softfp_extensions := sftf dftf hftf bfsf softfp_truncations := tfsf tfdf tfhf tfbf dfbf sfbf hfbf softfp_exclude_libgcc2 := n -softfp_extras := fixhfti fixunshfti floattihf floatuntihf \ +softfp_extras += fixhfti fixunshfti floattihf floatuntihf \ floatdibf floatundibf floattibf floatuntibf TARGET_LIBGCC2_CFLAGS += -Wno-missing-prototypes --- libgcc/config/i386/64/t-softfp.jj 2023-03-10 20:39:43.849687830 +0100 +++ libgcc/config/i386/64/t-softfp 2023-07-14 12:37:55.422344930 +0200 @@ -1,4 +1,4 @@ -softfp_extras := fixhfti fixunshfti floattihf floatuntihf \ +softfp_extras += fixhfti fixunshfti floattihf floatuntihf \ floattibf floatuntibf CFLAGS-fixhfti.c += -msse2 --- libgcc/config/i386/libgcc-glibc.ver.jj 2023-07-11 13:39:49.760107863 +0200 +++ libgcc/config/i386/libgcc-glibc.ver 2023-07-17 09:45:43.128281615 +0200 @@ -226,3 +226,13 @@ GCC_13.0.0 { __truncxfbf2 __trunchfbf2 } + +%inherit GCC_14.0.0 GCC_13.0.0 +GCC_14.0.0 { + __PFX__fixxfbitint + __PFX__fixtfbitint + __PFX__floatbitintbf + __PFX__floatbitinthf + __PFX__floatbitintxf + __PFX__floatbitinttf +} --- libgcc/config/i386/t-softfp.jj 2022-10-14 09:35:56.268989311 +0200 +++ libgcc/config/i386/t-softfp 2023-07-17 09:38:43.575980078 +0200 @@ -10,7 +10,7 @@
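A minimal sketch of the two ideas from the description, precision reduction by skipping most significant zero limbs followed by a simple schoolbook multiplication. It assumes unsigned operands, 32-bit limbs in little-endian order and hypothetical helper names. It is not the libgcc2.c code, which also handles signed values (sign-bit-copy limbs) and works on target-defined limbs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of significant limbs in an unsigned little-endian limb array:
   most significant all-zero limbs are skipped.  */
static size_t
significant_limbs (const uint32_t *a, size_t n)
{
  while (n > 0 && a[n - 1] == 0)
    n--;
  return n;
}

/* Schoolbook multiplication: R[0 .. NA+NB-1] = A * B.
   R must not overlap A or B.  */
static void
mul_limbs (uint32_t *r, const uint32_t *a, size_t na,
           const uint32_t *b, size_t nb)
{
  for (size_t i = 0; i < na + nb; i++)
    r[i] = 0;
  /* Only the significant limbs take part in the O(ra * rb) loop.  */
  size_t ra = significant_limbs (a, na);
  size_t rb = significant_limbs (b, nb);
  for (size_t i = 0; i < ra; i++)
    {
      uint64_t carry = 0;
      for (size_t j = 0; j < rb; j++)
        {
          /* At most (2^32-1)^2 + 2 * (2^32-1), which fits in 64 bits.  */
          uint64_t t = (uint64_t) a[i] * b[j] + r[i + j] + carry;
          r[i + j] = (uint32_t) t;
          carry = t >> 32;
        }
      r[i + rb] = (uint32_t) carry;  /* not yet written by earlier rows */
    }
}
```

The reduction pays off when a wide _BitInt holds a small value: multiplying two 3-limb numbers whose top limbs are zero runs a single inner iteration instead of nine.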
RE: [PATCH] PR rtl-optimization/110587: Reduce useless moves in compile-time hog.
Hi Richard,
You're 100% right.  It's possible to significantly clean up this code,
replacing the body of the conditional with a call to force_reg and
simplifying the conditions under which it is called.  These improvements
are implemented in the patch below, which has been tested on
x86_64-pc-linux-gnu, with a bootstrap and make -k check, both with and
without -m32, as usual.

Interestingly, the CONCAT clause afterwards is still required (I've learned
something new), as calling force_reg (or gen_reg_rtx) with HCmode actually
returns a CONCAT instead of a REG, so although the code looks dead, it's
required to build libgcc during a bootstrap.  But the remaining clean-up is
good, reducing the number of source lines and making the logic easier to
understand.

Ok for mainline?

2023-07-27  Roger Sayle
	    Richard Biener

gcc/ChangeLog
	PR middle-end/28071
	PR rtl-optimization/110587
	* expr.cc (emit_group_load_1): Simplify logic for calling
	force_reg on ORIG_SRC, to avoid making a copy if the source
	is already in a pseudo register.

Roger
--

> -Original Message-
> From: Richard Biener
> Sent: 25 July 2023 12:50
>
> On Tue, Jul 25, 2023 at 1:31 PM Roger Sayle
> wrote:
> >
> > This patch is the third in series of fixes for PR
> > rtl-optimization/110587, a compile-time regression with -O0, that
> > attempts to address the underlying cause.  As noted previously, the
> > pathological test case pr28071.c contains a large number of useless
> > register-to-register moves that can produce quadratic behaviour (in
> > LRA).  These move are generated during RTL expansion in
> > emit_group_load_1, where the middle-end attempts to simplify the
> > source before calling extract_bit_field.  This is reasonable if the
> > source is a complex expression (from before the tree-ssa optimizers),
> > or a SUBREG, or a hard register, but it's not particularly useful to
> > copy a pseudo register into a new pseudo register.  This patch eliminates
> > that
> redundancy.
> >
> > The -fdump-tree-expand for pr28071.c compiled with -O0 currently
> > contains 777K lines, with this patch it contains 717K lines, i.e.
> > saving about 60K lines (admittedly of debugging text output, but it makes
> > the
> point).
> >
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> >
> > As always, I'm happy to revert this change quickly if there's a
> > problem, and investigate why this additional copy might (still) be
> > needed on other
> > non-x86 targets.
>
> @@ -2622,6 +2622,7 @@ emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_src,
> tree type,
>          be loaded directly into the destination.  */
>       src = orig_src;
>       if (!MEM_P (orig_src)
> +         && (!REG_P (orig_src) || HARD_REGISTER_P (orig_src))
>          && (!CONSTANT_P (orig_src)
>              || (GET_MODE (orig_src) != mode
>                  && GET_MODE (orig_src) != VOIDmode)))
>
> so that means the code guarded by the conditional could instead be transformed
> to
>
>    src = force_reg (mode, orig_src);
>
> ?  Btw, the || (GET_MODE (orig_src) != mode && GET_MODE (orig_src) !=
> VOIDmode) case looks odd as in that case we'd use GET_MODE (orig_src) for the
> move ... that might also mean we have to use force_reg (GET_MODE (orig_src) ==
> VOIDmode ? mode : GET_MODE (orig_src), orig_src))
>
> Otherwise I think this is OK, as said, using force_reg somehow would improve
> readability here I think.
>
> I also wonder how the
>
>   else if (GET_CODE (src) == CONCAT)
>
> case will ever trigger with the current code.
>
> Richard.
>
> >
> > 2023-07-25  Roger Sayle
> >
> > gcc/ChangeLog
> >         PR middle-end/28071
> >         PR rtl-optimization/110587
> >         * expr.cc (emit_group_load_1): Avoid copying a pseudo register into
> >         a new pseudo register, i.e. only copy hard regs into a new pseudo.
> >
> >

diff --git a/gcc/expr.cc b/gcc/expr.cc
index fff09dc..174f8ac 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -2622,16 +2622,11 @@ emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_src, tree type,
 	 be loaded directly into the destination.  */
       src = orig_src;
       if (!MEM_P (orig_src)
-	  && (!CONSTANT_P (orig_src)
-	      || (GET_MODE (orig_src) != mode
-		  && GET_MODE (orig_src) != VOIDmode)))
+	  && (!REG_P (orig_src) || HARD_REGISTER_P (orig_src))
+	  && !CONSTANT_P (orig_src))
 	{
-	  if (GET_MODE (orig_src) == VOIDmode)
-	    src = gen_reg_rtx (mode);
-	  else
-	    src = gen_reg_rtx (GET_MODE (orig_src));
-
-	  emit_move_insn (src, orig_src);
+	  gcc_assert (GET_MODE (orig_src) != VOIDmode);
+	  src = force_reg (GET_MODE (orig_src), orig_src);
 	}
 
       /* Optimize the access just a bit.  */
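To see exactly which sources the simplified condition still copies, the two predicates can be compared side by side. A minimal sketch, assuming a hypothetical boolean description of ORIG_SRC (no GCC types or macros are used). A pseudo register is no longer copied and a hard register still is. Because the mode test was dropped, a constant whose mode is neither MODE nor VOIDmode is no longer copied either.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified description of ORIG_SRC.  */
struct src_kind
{
  bool is_mem;        /* MEM_P (orig_src) */
  bool is_reg;        /* REG_P (orig_src) */
  bool is_hard_reg;   /* HARD_REGISTER_P (orig_src), only if is_reg */
  bool is_constant;   /* CONSTANT_P (orig_src) */
  bool mode_matches;  /* GET_MODE (orig_src) == mode */
  bool mode_is_void;  /* GET_MODE (orig_src) == VOIDmode */
};

/* Condition before the patch.  */
static bool
old_copies (struct src_kind s)
{
  return !s.is_mem
         && (!s.is_constant || (!s.mode_matches && !s.mode_is_void));
}

/* Condition after the patch.  */
static bool
new_copies (struct src_kind s)
{
  return !s.is_mem
         && (!s.is_reg || s.is_hard_reg)
         && !s.is_constant;
}
```

Since constants (the only sources that can be VOIDmode here) are now excluded, the new gcc_assert on the mode holds whenever the copy is made.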
[PATCH 0/5] GCC _BitInt support [PR102989]
The following patch series introduces support for C23 bit-precise integer
types.  In short, they are similar to other integral types in many ways; they
just aren't subject to integral promotions if smaller than int, and they can
have much wider precisions than ordinary integer types.
It is enabled only on targets which have agreed on a processor-specific ABI
for how to lay those out or pass them as function arguments/return values,
which currently is just x86-64 I believe.  It would be nice if target
maintainers helped to get agreement on psABI changes so that GCC 14 could
enable it on far more architectures than just one.
C23 says that defines BITINT_MAXWIDTH macro and that it is the largest
supported precision of the _BitInt types; its smallest permitted value is the
precision of unsigned long long (but due to lack of psABI agreement we'll
violate that on architectures which don't have the support done yet).
For the time being, the series just uses WIDE_INT_MAX_PRECISION as
BITINT_MAXWIDTH, with the intent to increase it incrementally later on.
WIDE_INT_MAX_PRECISION is 575 bits on x86_64, but will be even smaller on
lots of architectures.  This is the largest precision we can support without
changes of wide_int/widest_int representation (to make those non-POD and
allow use of some allocated buffer rather than the included fixed size one).
Once that would be overcome, there is another internally enforced limit:
INTEGER_CST in its current layout allows at most 255 64-bit limbs, which caps
it at 16320 bits.  And if that is overcome, then we have the limitation of
TYPE_PRECISION being 16-bit, so 65535 as maximum precision.  Perhaps we could
make TYPE_PRECISION dependent on BITINT_TYPE vs. others and use 32-bit
precision in that case later.  Latest Clang/LLVM I think supports on paper
up to 8388608 bits, but is hardly usable even with much shorter precisions.
Besides this hopefully temporary cap on supported precision and support only
on targets which buy into it, the support has the following limitations:

- _BitInt(N) bit-fields aren't supported yet (the patch rejects them); I'd
  like to enable those incrementally, but don't really see details on how
  such bit-fields should be laid out in memory nor passed inside of function
  arguments; LLVM implements something, but it is a question if that is what
  the various ABIs want
- conversions between large/huge (see later) _BitInt and
  _Decimal{32,64,128} aren't supported and emit a sorry; I'm not familiar
  enough with DFP stuff to implement that
- _Complex _BitInt(N) isn't supported; again mainly because none of the
  psABIs mention how those should be passed/returned; in a limited way they
  are supported internally because the internal functions into which
  __builtin_{add,sub,mul}_overflow{,_p} is lowered return COMPLEX_TYPE as a
  hack to return 2 values without using references/pointers
- vectors of _BitInt(N) aren't supported, both because psABIs don't specify
  how that works and because I'm not really sure it would be useful given
  lack of hw support for anything but bit-precise integers with the same bit
  precision as standard integer types

Because the bit-precise types have different behavior both in the C FE
(e.g. the lack of promotion) and do or can have different behavior in type
layout and function argument passing/returning values, the patch introduces
a new integral type, BITINT_TYPE, so various spots which explicitly check
for INTEGER_TYPE rather than, say, the INTEGRAL_TYPE_P macro need to be
adjusted.  Also the assumption that all integral types have scalar integer
type mode is no longer true; larger BITINT_TYPEs have BLKmode type.

The patch makes 4 different categories of _BitInt depending on the target
hook decisions and their precision.
The x86-64 psABI says that _BitInt which fit into signed/unsigned char, short, int, long and long long are laid out and passed as those types (with padding bits undefined if they don't have mode precision). Such smallest precision bit-precise integer types are categorized as small, the target hook gives for specific precision a scalar integral mode where a single such mode contains all the bits. Such small _BitInt types are generally kept in the IL until expansion into RTL, with minor tweaks during expansion to avoid relying on the padding bit values. All larger precision _BitInt types are supposed to be handled as structure containing an array of limbs or so, where a limb has some integral mode (for libgcc purposes best if it has word-size) and the limbs have either little or big endian ordering in the array. The padding bits in the most significant limb if any are either undefined or should be always sign/zero extended (but support for this isn't in yet, we don't know if any psABI will require it). As mentioned in some psABI proposals, while currently there is just one limb mode, if the limb ordering would follow normal target endianity, there is always a possibility to have two limb
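The limb arithmetic behind the limits quoted above is easy to check. A minimal sketch, assuming 64-bit limbs as on x86-64 and hypothetical helper names; small _BitInts that fit a standard integer type are laid out as that type and don't use this array-of-limbs layout.

```c
#include <assert.h>

/* Number of 64-bit limbs needed to hold a _BitInt(N).  */
static unsigned
limbs_for (unsigned n)
{
  return (n + 63) / 64;
}

/* Padding bits left in the most significant limb.  */
static unsigned
padding_bits (unsigned n)
{
  return limbs_for (n) * 64 - n;
}
```

For example, the 575-bit WIDE_INT_MAX_PRECISION cap needs 9 limbs with 1 padding bit; 255 limbs correspond exactly to the 16320-bit INTEGER_CST cap; and a 65535-bit precision would need 1024 limbs.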
[Bug other/110831] New: [14 regression] gcc.dg/stack-check-3.c ICEs after r14-2822-g499b8079a6419b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110831

            Bug ID: 110831
           Summary: [14 regression] gcc.dg/stack-check-3.c ICEs after
                    r14-2822-g499b8079a6419b
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: seurer at gcc dot gnu.org
  Target Milestone: ---

g:499b8079a6419bb8082de062ec30772296c6700c, r14-2822-g499b8079a6419b

make -k check-gcc RUNTESTFLAGS="dg.exp=gcc.dg/stack-check-3.c"
FAIL: gcc.dg/stack-check-3.c (internal compiler error: in to_gcov_type, at profile-count.h:831)
FAIL: gcc.dg/stack-check-3.c (test for excess errors)
FAIL: gcc.dg/stack-check-3.c scan-rtl-dump-times expand "allocation and probing residuals" 7
FAIL: gcc.dg/stack-check-3.c scan-rtl-dump-times expand "allocation and probing in loop" 4
FAIL: gcc.dg/stack-check-3.c scan-rtl-dump-times expand "allocation and probing in rotated loop" 1
FAIL: gcc.dg/stack-check-3.c scan-rtl-dump-times expand "allocation and probing inline" 1
FAIL: gcc.dg/stack-check-3.c scan-rtl-dump-times expand "skipped dynamic allocation and probing loop" 1
# of unexpected failures	7

spawn -ignore SIGHUP /home/seurer/gcc/git/build/gcc-test/gcc/xgcc -B/home/seurer/gcc/git/build/gcc-test/gcc/ exceptions_enabled3672263.cc -fdiagnostics-plain-output -Wno-complain-wrong-lang -S -o exceptions_enabled3672263.s
FAIL: gcc.dg/stack-check-3.c (test for excess errors)
Excess errors:
during RTL pass: expand
dump file: stack-check-3.c.258r.expand
/home/seurer/gcc/git/gcc-test/gcc/testsuite/gcc.dg/stack-check-3.c:25:1: internal compiler error: in to_gcov_type, at profile-count.h:831
0x10b3e7b7 profile_count::to_gcov_type() const
	/home/seurer/gcc/git/gcc-test/gcc/profile-count.h:831
0x10b3e7b7 dump_prediction
	/home/seurer/gcc/git/gcc-test/gcc/predict.cc:797
0x10b495bf combine_predictions_for_insn
	/home/seurer/gcc/git/gcc-test/gcc/predict.cc:1039
0x10b495bf guess_outgoing_edge_probabilities(basic_block_def*)
	/home/seurer/gcc/git/gcc-test/gcc/predict.cc:2356
0x11a4bec7 compute_outgoing_frequencies
	/home/seurer/gcc/git/gcc-test/gcc/cfgbuild.cc:692
0x11a4bec7 find_many_sub_basic_blocks(simple_bitmap_def*)
	/home/seurer/gcc/git/gcc-test/gcc/cfgbuild.cc:792
0x10520083 execute
	/home/seurer/gcc/git/gcc-test/gcc/cfgexpand.cc:6933

commit 499b8079a6419bb8082de062ec30772296c6700c (HEAD)
Author: Jan Hubicka
Date:   Thu Jul 27 15:57:54 2023 +0200

    Fix profile_count::apply_probability
[committed] OpenMP/Fortran: Extend reject code between target + teams [PR71065, PR110725] (was: Re: [patch] OpenMP/Fortran: Reject declarations between target + teams (was: [Patch] OpenMP/Fortran: Rej
Yet another omission, the flag was not properly set for deeply buried 'omp teams' as I stopped too early when walking up the stack. Now fixed by commit r14-2826-g081e25d3cfd86c * * * This was found when 'repairing' the feature on the OG13 (devel/omp/gcc-13) branch for metadirectives, cf. the second attached patch, applied after cherry-picking the mainline patch. Tobias commit 081e25d3cfd86c4094999ded0bbe99b91762013c Author: Tobias Burnus Date: Thu Jul 27 18:14:11 2023 +0200 OpenMP/Fortran: Extend reject code between target + teams [PR71065, PR110725] The previous version failed to diagnose when the 'teams' was nested more deeply inside the target region, e.g. inside a DO or some block or structured block. PR fortran/110725 PR middle-end/71065 gcc/fortran/ChangeLog: * openmp.cc (resolve_omp_target): Minor cleanup. * parse.cc (decode_omp_directive): Find TARGET statement also higher in the stack. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/teams-6.f90: Extend.
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 52eeaf2d4da..2952cd300ac 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -10666,15 +10666,14 @@ resolve_omp_target (gfc_code *code)
 
   if (!code->ext.omp_clauses->contains_teams_construct)
     return;
+  gfc_code *c = code->block->next;
   if (code->ext.omp_clauses->target_first_st_is_teams
-      && ((GFC_IS_TEAMS_CONSTRUCT (code->block->next->op)
-           && code->block->next->next == NULL)
-          || (code->block->next->op == EXEC_BLOCK
-              && code->block->next->next
-              && GFC_IS_TEAMS_CONSTRUCT (code->block->next->next->op)
-              && code->block->next->next->next == NULL)))
+      && ((GFC_IS_TEAMS_CONSTRUCT (c->op) && c->next == NULL)
+          || (c->op == EXEC_BLOCK
+              && c->next
+              && GFC_IS_TEAMS_CONSTRUCT (c->next->op)
+              && c->next->next == NULL)))
     return;
-  gfc_code *c = code->block->next;
   while (c && !GFC_IS_TEAMS_CONSTRUCT (c->op))
     c = c->next;
   if (c)
diff --git a/gcc/fortran/parse.cc b/gcc/fortran/parse.cc
index aa6bb663def..e797402b59f 100644
--- a/gcc/fortran/parse.cc
+++ b/gcc/fortran/parse.cc
@@ -1318,32 +1318,27 @@ decode_omp_directive (void)
     case ST_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO:
     case ST_OMP_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD:
     case ST_OMP_TEAMS_LOOP:
-      if (gfc_state_stack->previous && gfc_state_stack->previous->tail)
-        {
-          gfc_state_data *stk = gfc_state_stack;
-          do {
-               stk = stk->previous;
-             } while (stk && stk->tail && stk->tail->op == EXEC_BLOCK);
-          if (stk && stk->tail)
-            switch (stk->tail->op)
-              {
-              case EXEC_OMP_TARGET:
-              case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE:
-              case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE_SIMD:
-              case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE_PARALLEL_DO:
-              case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD:
-              case EXEC_OMP_TARGET_TEAMS_LOOP:
-              case EXEC_OMP_TARGET_PARALLEL:
-              case EXEC_OMP_TARGET_PARALLEL_DO:
-              case EXEC_OMP_TARGET_PARALLEL_DO_SIMD:
-              case EXEC_OMP_TARGET_PARALLEL_LOOP:
-              case EXEC_OMP_TARGET_SIMD:
-                stk->tail->ext.omp_clauses->contains_teams_construct = 1;
-                break;
-              default:
-                break;
-              }
-        }
+      for (gfc_state_data *stk = gfc_state_stack->previous; stk;
+           stk = stk->previous)
+        if (stk && stk->tail)
+          switch (stk->tail->op)
+            {
+            case EXEC_OMP_TARGET:
+            case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE:
+            case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE_SIMD:
+            case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE_PARALLEL_DO:
+            case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE_PARALLEL_DO_SIMD:
+            case EXEC_OMP_TARGET_TEAMS_LOOP:
+            case EXEC_OMP_TARGET_PARALLEL:
+            case EXEC_OMP_TARGET_PARALLEL_DO:
+            case EXEC_OMP_TARGET_PARALLEL_DO_SIMD:
+            case EXEC_OMP_TARGET_PARALLEL_LOOP:
+            case EXEC_OMP_TARGET_SIMD:
+              stk->tail->ext.omp_clauses->contains_teams_construct = 1;
+              break;
+            default:
+              break;
+            }
       break;
     case ST_OMP_ERROR:
       if (new_st.ext.omp_clauses->at != OMP_AT_EXECUTION)
diff --git a/gcc/testsuite/gfortran.dg/gomp/teams-6.f90 b/gcc/testsuite/gfortran.dg/gomp/teams-6.f90
index be453f27f40..0bd7735e738 100644
--- a/gcc/testsuite/gfortran.dg/gomp/teams-6.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/teams-6.f90
@@ -37,6 +37,16 @@ end block
 i = 5
 !$omp end teams
 !$omp end target
+
+
+!$omp target ! { dg-error "OMP TARGET region at .1. with a nested TEAMS may not contain any other statement, declaration or directive outside of the single TEAMS construct" }
+block
+  do i = 5, 8
+!$omp teams
+block; end block
+
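The core of the parse.cc change is that the walk up gfc_state_stack no longer stops at the first entry that is not a BLOCK: every enclosing entry is visited, so a TEAMS buried inside a DO or a nested block still flags its TARGET. The C++ sketch below models only that loop shape; StateEntry, Construct and mark_enclosing_targets are hypothetical stand-ins for gfc_state_data and the EXEC_OMP_TARGET* cases, not gfortran code.

```cpp
// Hypothetical, simplified model of the parser state stack: each entry
// points to its enclosing (previous) entry and records the construct kind.
enum class Construct { Target, Do, Block, Other };

struct StateEntry {
  StateEntry* previous;          // enclosing construct, nullptr at the top
  Construct kind;
  bool contains_teams = false;   // flag set on enclosing TARGET constructs
};

// Called when a TEAMS directive is parsed: walk *all* enclosing entries
// (not only through BLOCKs) and flag every TARGET found on the way up,
// mirroring the shape of the new for-loop in decode_omp_directive.
void mark_enclosing_targets(StateEntry* current) {
  for (StateEntry* stk = current->previous; stk; stk = stk->previous)
    if (stk->kind == Construct::Target)
      stk->contains_teams = true;
}
```

Roughly speaking, the old do/while loop only kept climbing while it saw BLOCK entries, so an intervening DO ended the search early; in this model a TARGET > DO > BLOCK > TEAMS chain still sets the flag on the TARGET entry.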
[Bug c++/110828] union constexpr dtor not constexpr when used in member array
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110828

--- Comment #2 from Björn Fahller ---
If I write it in the same way in a function, it compiles.

consteval auto f() {
  return S{}.f();
}
constexpr auto b = f();

However, if I break it into a constexpr object S s; and return s.f(), it does not compile, this time because the construction of 's' fails because it refers to an uninitialized variable, regardless of whether the member is an array or not.

union type {
  constexpr type(){}
  constexpr ~type() {}
  int t;
};

struct S {
  constexpr S() {}
  constexpr bool f() const { return true;}
  type v{};
};

consteval auto f() {
  constexpr S s;
  return s.f();
}

constexpr auto b = f();

https://godbolt.org/z/68zY6ecxs
[Bug middle-end/71065] Missing diagnostic for statements between OpenMP 'target' and 'teams'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71065 --- Comment #8 from CVS Commits --- The master branch has been updated by Tobias Burnus : https://gcc.gnu.org/g:081e25d3cfd86c4094999ded0bbe99b91762013c commit r14-2826-g081e25d3cfd86c4094999ded0bbe99b91762013c Author: Tobias Burnus Date: Thu Jul 27 18:14:11 2023 +0200 OpenMP/Fortran: Extend reject code between target + teams [PR71065, PR110725] The previous version failed to diagnose when the 'teams' was nested more deeply inside the target region, e.g. inside a DO or some block or structured block. PR fortran/110725 PR middle-end/71065 gcc/fortran/ChangeLog: * openmp.cc (resolve_omp_target): Minor cleanup. * parse.cc (decode_omp_directive): Find TARGET statement also higher in the stack. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/teams-6.f90: Extend.
[Bug fortran/110725] [13/14 Regression] internal compiler error: in expand_expr_real_1, at expr.cc:10897
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110725 --- Comment #9 from CVS Commits --- The master branch has been updated by Tobias Burnus : https://gcc.gnu.org/g:081e25d3cfd86c4094999ded0bbe99b91762013c commit r14-2826-g081e25d3cfd86c4094999ded0bbe99b91762013c Author: Tobias Burnus Date: Thu Jul 27 18:14:11 2023 +0200 OpenMP/Fortran: Extend reject code between target + teams [PR71065, PR110725] The previous version failed to diagnose when the 'teams' was nested more deeply inside the target region, e.g. inside a DO or some block or structured block. PR fortran/110725 PR middle-end/71065 gcc/fortran/ChangeLog: * openmp.cc (resolve_omp_target): Minor cleanup. * parse.cc (decode_omp_directive): Find TARGET statement also higher in the stack. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/teams-6.f90: Extend.
[Bug target/110781] bpf: make use of the V4 long-range jump instruction (jal/gotol)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110781 Jose E. Marchesi changed: What|Removed |Added Resolution|--- |MOVED Status|UNCONFIRMED |RESOLVED --- Comment #1 from Jose E. Marchesi --- This is actually the assembler's business. Moved to https://sourceware.org/bugzilla/show_bug.cgi?id=30690
[Bug c/102989] Implement C2x's n2763 (_BitInt)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989 Jakub Jelinek changed: What|Removed |Added Attachment #55642|0 |1 is obsolete|| --- Comment #91 from Jakub Jelinek --- Created attachment 55649 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55649=edit gcc14-bitint.patch Full patch including ChangeLog I'll submit after testing finishes.
[committed] libstdc++: Fix std::format alternate form for floating-point [PR108046]
Tested x86_64-linux. Pushed to trunk. Backport to gcc-13 to follow. -- >8 -- A decimal point was being added to the end of the string for {:#.0} because the __expc character was not being set for the _Pres_none presentation type, so __s.find(__expc) didn't find the 'e' in "1e+01" and so we created "1e+01." by appending the radix char to the end. This can be fixed by ensuring that __expc='e' is set for the _Pres_none case. I realized we can also set __expc='P' and __expc='E' when needed, to save a call to std::toupper later. For the {:#.0g} format, __expc='e' was being set and so the 'e' was found in "1e+10" but then __z = __prec - __sigfigs would wrap around to SIZE_MAX. That meant we would decide not to add a radix char because the number of extra characters to insert would be 1+SIZE_MAX, i.e. zero. This can be fixed by using __z = 0 when __prec == 0. libstdc++-v3/ChangeLog: PR libstdc++/108046 * include/std/format (__formatter_fp::format): Ensure __expc is always set for all presentation types. Set __z correctly for zero precision. * testsuite/std/format/functions/format.cc: Check problem cases.
---
 libstdc++-v3/include/std/format                 | 17 +++++++++--------
 .../testsuite/std/format/functions/format.cc    |  4 ++++
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 0c6069b2681..1e0ef612ddd 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -1430,22 +1430,24 @@ namespace __format
         chars_format __fmt{};
         bool __upper = false;
         bool __trailing_zeros = false;
-        char __expc = 0;
+        char __expc = 'e';
 
         switch (_M_spec._M_type)
         {
           case _Pres_A:
             __upper = true;
+            __expc = 'P';
             [[fallthrough]];
           case _Pres_a:
-            __expc = 'p';
+            if (_M_spec._M_type != _Pres_A)
+              __expc = 'p';
             __fmt = chars_format::hex;
             break;
           case _Pres_E:
             __upper = true;
+            __expc = 'E';
             [[fallthrough]];
           case _Pres_e:
-            __expc = 'e';
             __use_prec = true;
             __fmt = chars_format::scientific;
             break;
@@ -1455,10 +1457,10 @@ namespace __format
             break;
           case _Pres_G:
             __upper = true;
+            __expc = 'E';
             [[fallthrough]];
           case _Pres_g:
             __trailing_zeros = true;
-            __expc = 'e';
             __use_prec = true;
             __fmt = chars_format::general;
             break;
@@ -1511,7 +1513,6 @@ namespace __format
           {
             for (char* __p = __start; __p != __res.ptr; ++__p)
               *__p = std::toupper(*__p);
-            __expc = std::toupper(__expc);
           }
 
         // Add sign for non-negative values.
@@ -1545,15 +1546,15 @@ namespace __format
               __p = __s.find(__expc);
               if (__p == __s.npos)
                 __p = __s.size();
-              __d = __p;
+              __d = __p; // Position where '.' should be inserted.
               __sigfigs = __d;
             }
 
-            if (__trailing_zeros)
+            if (__trailing_zeros && __prec != 0)
               {
                 if (!__format::__is_xdigit(__s[0]))
                   --__sigfigs;
-                __z = __prec - __sigfigs;
+                __z = __prec - __sigfigs; // Number of zeros to insert.
               }
 
             if (size_t __extras = int(__d == __p) + __z)
diff --git a/libstdc++-v3/testsuite/std/format/functions/format.cc b/libstdc++-v3/testsuite/std/format/functions/format.cc
index 3485535e3cb..bd914df6d7c 100644
--- a/libstdc++-v3/testsuite/std/format/functions/format.cc
+++ b/libstdc++-v3/testsuite/std/format/functions/format.cc
@@ -152,6 +152,10 @@ test_alternate_forms()
 
   s = std::format("{:#.2g}", -0.0);
   VERIFY( s == "-0.0" );
+
+  // PR libstdc++/108046
+  s = std::format("{0:#.0} {0:#.1} {0:#.0g}", 10.0);
+  VERIFY( s == "1.e+01 1.e+01 1.e+01" );
 }
 
 struct euro_punc : std::numpunct
-- 
2.41.0
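The {:#.0g} half of the bug is plain unsigned wraparound. A minimal C++ reduction, assuming only that the precision, significant-figure count and inserted-zero count are size_t values as in the patched code; extras_before_fix and extras_after_fix are hypothetical names, and the real __formatter_fp::format does much more:

```cpp
#include <cstddef>

// Before the fix: with prec == 0 and one significant figure,
// prec - sigfigs wraps to SIZE_MAX, and 1 + SIZE_MAX wraps to 0,
// so "no extra characters" is wrongly concluded.
std::size_t extras_before_fix(std::size_t prec, std::size_t sigfigs,
                              bool radix_missing) {
  std::size_t z = prec - sigfigs;
  return std::size_t(radix_missing) + z;
}

// After the fix: trailing zeros are only computed for a non-zero
// precision (mirrors "__trailing_zeros && __prec != 0"), so z stays 0.
std::size_t extras_after_fix(std::size_t prec, std::size_t sigfigs,
                             bool radix_missing) {
  std::size_t z = 0;
  if (prec != 0)
    z = prec - sigfigs;
  return std::size_t(radix_missing) + z;
}
```

For 10.0 formatted with {:#.0g}, the digits are "1e+01" with one significant figure and precision 0: the old arithmetic yields zero extra characters, so no '.' is inserted, while the guarded version keeps room for the single radix character, giving the "1.e+01" that the new VERIFY expects.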
[Bug libstdc++/108046] The dot in the floating-point alternative form has wrong position
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108046 Jonathan Wakely changed: What|Removed |Added Last reconfirmed|2022-12-10 00:00:00 |2023-07-27 --- Comment #4 from Jonathan Wakely --- Fixed on trunk so far.
[Bug libstdc++/108046] The dot in the floating-point alternative form has wrong position
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108046 --- Comment #3 from CVS Commits --- The master branch has been updated by Jonathan Wakely : https://gcc.gnu.org/g:50bc490c090cc95175e6068ed7438788d7fd7040 commit r14-2825-g50bc490c090cc95175e6068ed7438788d7fd7040 Author: Jonathan Wakely Date: Thu Jul 27 14:07:09 2023 +0100 libstdc++: Fix std::format alternate form for floating-point [PR108046] A decimal point was being added to the end of the string for {:#.0} because the __expc character was not being set for the _Pres_none presentation type, so __s.find(__expc) didn't find the 'e' in "1e+01" and so we created "1e+01." by appending the radix char to the end. This can be fixed by ensuring that __expc='e' is set for the _Pres_none case. I realized we can also set __expc='P' and __expc='E' when needed, to save a call to std::toupper later. For the {:#.0g} format, __expc='e' was being set and so the 'e' was found in "1e+10" but then __z = __prec - __sigfigs would wrap around to SIZE_MAX. That meant we would decide not to add a radix char because the number of extra characters to insert would be 1+SIZE_MAX, i.e. zero. This can be fixed by using __z = 0 when __prec == 0. libstdc++-v3/ChangeLog: PR libstdc++/108046 * include/std/format (__formatter_fp::format): Ensure __expc is always set for all presentation types. Set __z correctly for zero precision. * testsuite/std/format/functions/format.cc: Check problem cases.
[Bug target/107172] [13 Regression] wrong code with "-O1 -ftree-vrp" on x86_64-linux-gnu since r13-1268-g8c99e307b20c502e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107172 --- Comment #52 from Shaohua Li --- *** Bug 107257 has been marked as a duplicate of this bug. ***
[Bug target/107257] [13 Regression] Wrong code at -O2 on x86_64-linux-gnu since r13-857-gf1652e3343b1ec47
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107257 Shaohua Li changed: What|Removed |Added Resolution|--- |DUPLICATE Status|WAITING |RESOLVED --- Comment #9 from Shaohua Li --- Sorry, this is indeed a dup. *** This bug has been marked as a duplicate of bug 107172 ***
Re: [PATCH 0/5] Recognize Zicond extension
On 7/27/23 02:43, Xiao Zeng wrote:

> 2. According to your opinions, I have modified the code, but out of caution for upstream, I conducted complete regression tests on patch V2, which took some time. I was unable to reply to emails and upload patch V2 in a timely manner.

Sorry to have wasted your time -- zicond/xventanacondops has lingered for quite a while and I had a bit of free time yesterday. I felt it was most useful to try and move this stuff forward.

> 3. After you and other maintainers made minor modifications to my patch[1/5] and patch[2/5], they have been merged into master, so I will no longer upload patch V2.

Agreed.

> 4. patch[1/5] and patch[2/5], which have been merged into master, only complete basic support for Zicond, and further optimization work still needs to be done. These further optimizations are reflected in my patch[3/5], patch[4/5] and patch[5/5].

Agreed.

> 5. As you mentioned in your previous email https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625427.html "eswincomputing and ventana can both reduce our divergence from the trunk and work together on the rest of the bits...". I will reorganize patch[3/5], patch[4/5] and patch[5/5], provide more detailed explanations, and submit them as an alternative solution for further optimization of Zicond.
> Does that work for you?

I'm going to look at 3/5 today pretty closely. Exposing zicond to movcc is something we had implemented inside Ventana and I want to compare/contrast your work with ours. What I like about yours is it keeps all the logic in riscv.cc rather than scattering it across riscv.cc and riscv.md. What I like about the internal Ventana bits is its ability to support arbitrary comparisons by utilizing sCC if the original is not an eq/ne comparison. Ideally we'll be able to get the best of both.

Jeff
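For readers new to Zicond: the extension adds two instructions, czero.eqz and czero.nez, which zero the destination depending on whether a condition register is zero. The C++ model below, written from the semantics in the ratified Zicond specification, shows why this is enough for a general conditional move, which is what exposing zicond to movcc relies on. The function names are illustrative, and the instruction sequence GCC actually emits may differ. A condition other than eq/ne would first have to be materialized into a register value, which is where the sCC approach Jeff describes comes in.

```cpp
#include <cstdint>

// czero.eqz rd, rs1, rs2: rd = (rs2 == 0) ? 0 : rs1
std::uint64_t czero_eqz(std::uint64_t rs1, std::uint64_t rs2) {
  return rs2 == 0 ? 0 : rs1;
}

// czero.nez rd, rs1, rs2: rd = (rs2 != 0) ? 0 : rs1
std::uint64_t czero_nez(std::uint64_t rs1, std::uint64_t rs2) {
  return rs2 != 0 ? 0 : rs1;
}

// "cond ? a : b" for a register-valued condition: exactly one of the
// two czero results is zero, so OR-ing them yields the selected value.
std::uint64_t cond_select(std::uint64_t cond, std::uint64_t a,
                          std::uint64_t b) {
  return czero_eqz(a, cond) | czero_nez(b, cond);
}
```

The OR trick is why a full select costs three instructions (two czero plus an or) once the condition is in a register, and fewer when one arm is a constant zero.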
[Bug c++/110809] ICE: in unify, at cp/pt.cc:25226 with floating-point NTTPs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110809 --- Comment #9 from CVS Commits --- The releases/gcc-13 branch has been updated by Patrick Palka : https://gcc.gnu.org/g:8e811edea309b2097e23cde48ee6fb6467a9094d commit r13-7614-g8e811edea309b2097e23cde48ee6fb6467a9094d Author: Patrick Palka Date: Wed Jul 26 16:52:13 2023 -0400 c++: unifying REAL_CSTs [PR110809] This teaches unify how to compare two REAL_CSTs. PR c++/110809 gcc/cp/ChangeLog: * pt.cc (unify) : Generalize to handle REAL_CST as well. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/nontype-float3.C: New test. (cherry picked from commit 744e1f35266dbd6b6fb95c7e8422562815f8b56f)
[Bug c++/110828] union constexpr dtor not constexpr when used in member array
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110828 Patrick Palka changed: What|Removed |Added CC||ppalka at gcc dot gnu.org --- Comment #1 from Patrick Palka --- Does it work if you move the static_assert into a function scope? If so then this is probably a dup of PR85944.
Fix profile update in tree_transform_and_unroll_loop
Hi,

This patch fixes the profile update in tree_transform_and_unroll_loop, which is used by predictive commoning. I started by attempting to fix gcc.dg/tree-ssa/update-unroll-1.c, which I xfailed last week, but it turned out to be a harder job.

Unrolling was never adapted to the changes in duplicate_loop_body_to_header_edge, which is now smarter about getting the profile right when some exits are eliminated. A lot of the manual profile updating can thus now be done using the existing infrastructure.

I also noticed that scale_dominated_blocks_in_loop does the same job as the loop I wrote in scale_loop_profile, so I commonized the implementation and removed the recursion.

I also extended duplicate_loop_body_to_header_edge to handle flat profiles the same way as we do in the vectorizer. Without it we end up with a less than 0 iteration count in gcc.dg/tree-ssa/update-unroll-1.c (it is unrolled 32 times but predicted to iterate fewer times). Finally, I added the missing code to update loop_info.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

	* cfgloopmanip.cc (scale_dominated_blocks_in_loop): Move here from
	tree-ssa-loop-manip.cc and avoid recursion.
	(scale_loop_profile): Use scale_dominated_blocks_in_loop.
	(duplicate_loop_body_to_header_edge): Add DLTHE_FLAG_FLAT_PROFILE
	flag.
	* cfgloopmanip.h (DLTHE_FLAG_FLAT_PROFILE): Define.
	(scale_dominated_blocks_in_loop): Declare.
	* predict.cc (dump_prediction): Do not ICE on uninitialized probability.
	(change_edge_frequency): Remove.
	* predict.h (change_edge_frequency): Remove.
	* tree-ssa-loop-manip.cc (scale_dominated_blocks_in_loop): Move to
	cfgloopmanip.cc.
	(niter_for_unrolled_loop): Remove.
	(tree_transform_and_unroll_loop): Fix profile update.

gcc/testsuite/ChangeLog:

	* gcc.dg/pr102385.c: Check for no profile mismatches.
	* gcc.dg/pr96931.c: Check for no profile mismatches.
	* gcc.dg/tree-ssa/predcom-1.c: Check for no profile mismatches.
	* gcc.dg/tree-ssa/predcom-2.c: Check for no profile mismatches.
	* gcc.dg/tree-ssa/predcom-3.c: Check for no profile mismatches.
	* gcc.dg/tree-ssa/predcom-4.c: Check for no profile mismatches.
	* gcc.dg/tree-ssa/predcom-5.c: Check for no profile mismatches.
	* gcc.dg/tree-ssa/predcom-7.c: Check for one profile mismatch.
	* gcc.dg/tree-ssa/predcom-8.c: Check for no profile mismatches.
	* gcc.dg/tree-ssa/predcom-dse-1.c: Check for no profile mismatches.
	* gcc.dg/tree-ssa/predcom-dse-10.c: Check for no profile mismatches.
	* gcc.dg/tree-ssa/predcom-dse-11.c: Check for no profile mismatches.
	* gcc.dg/tree-ssa/predcom-dse-12.c: Check for no profile mismatches.
	* gcc.dg/tree-ssa/predcom-dse-2.c: Check for no profile mismatches.
	* gcc.dg/tree-ssa/predcom-dse-3.c: Check for no profile mismatches.
	* gcc.dg/tree-ssa/predcom-dse-4.c: Check for no profile mismatches.
	* gcc.dg/tree-ssa/predcom-dse-5.c: Check for no profile mismatches.
	* gcc.dg/tree-ssa/predcom-dse-6.c: Check for no profile mismatches.
	* gcc.dg/tree-ssa/predcom-dse-7.c: Check for no profile mismatches.
	* gcc.dg/tree-ssa/predcom-dse-8.c: Check for no profile mismatches.
	* gcc.dg/tree-ssa/predcom-dse-9.c: Check for no profile mismatches.
	* gcc.dg/tree-ssa/update-unroll-1.c: Unxfail.

diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc
index 3012a8d60f7..c3d292d0dd4 100644
--- a/gcc/cfgloopmanip.cc
+++ b/gcc/cfgloopmanip.cc
@@ -499,6 +499,32 @@ scale_loop_frequencies (class loop *loop, profile_probability p)
   free (bbs);
 }
 
+/* Scales the frequencies of all basic blocks in LOOP that are strictly
+   dominated by BB by NUM/DEN.  */
+
+void
+scale_dominated_blocks_in_loop (class loop *loop, basic_block bb,
+                                profile_count num, profile_count den)
+{
+  basic_block son;
+
+  if (!den.nonzero_p () && !(num == profile_count::zero ()))
+    return;
+  auto_vec worklist;
+  worklist.safe_push (bb);
+
+  while (!worklist.is_empty ())
+    for (son = first_dom_son (CDI_DOMINATORS, worklist.pop ());
+         son;
+         son = next_dom_son (CDI_DOMINATORS, son))
+      {
+        if (!flow_bb_inside_loop_p (loop, son))
+          continue;
+        son->count = son->count.apply_scale (num, den);
+        worklist.safe_push (son);
+      }
+}
+
 /* Scale profile in LOOP by P.
    If ITERATION_BOUND is not -1, scale even further if loop is predicted
    to iterate too many times.
@@ -649,19 +675,9 @@ scale_loop_profile (class loop *loop, profile_probability p,
       if (other_edge && other_edge->dest == loop->latch)
         loop->latch->count -= new_exit_count - old_exit_count;
       else
-        {
-          basic_block *body = get_loop_body (loop);
-          profile_count new_count = exit_edge->src->count - new_exit_count;
-          profile_count old_count = exit_edge->src->count - old_exit_count;
-
-          for (unsigned int i =
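The ChangeLog notes that scale_dominated_blocks_in_loop was moved and made non-recursive. The underlying idea is an explicit-worklist walk of the dominator tree that prunes subtrees leaving the loop. A simplified C++ sketch, assuming a hypothetical Block type with a vector of dominator-tree children and a double count, in place of GCC's first_dom_son/next_dom_son API and profile_count:

```cpp
#include <vector>

// Hypothetical stand-in for a basic block: its dominator-tree children
// ("dom sons"), whether it lies inside the loop, and its execution count.
struct Block {
  std::vector<Block*> dom_sons;
  bool in_loop = true;
  double count = 0;
};

// Scale every block strictly dominated by BB (and inside the loop) by
// NUM/DEN, using an explicit worklist instead of recursion.
void scale_dominated(Block* bb, double num, double den) {
  if (den == 0 && num != 0)   // no meaningful ratio: leave counts alone
    return;
  std::vector<Block*> worklist{bb};
  while (!worklist.empty()) {
    Block* b = worklist.back();
    worklist.pop_back();
    for (Block* son : b->dom_sons) {
      if (!son->in_loop)
        continue;             // subtrees outside the loop are not visited
      // Scaling by zero zeroes the count even when DEN is also zero.
      son->count = (num == 0) ? 0 : son->count * num / den;
      worklist.push_back(son);
    }
  }
}
```

Using a std::vector as the stack keeps native call-stack depth constant however deep the dominator tree is, which is presumably part of the motivation for removing the recursion. The num == 0 branch only avoids computing 0/0 in this double-based model; it is not a claim about how profile_count::apply_scale handles that case.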
Fix profile update in tree-ssa-loop-im.cc
Hi,

this fixes two bugs in tree-ssa-loop-im.cc. The first is that the cap probability is not reliable, but it is constructed with adjusted quality. The second is that sometimes the conditional has the wrong joiner BB count. This is visible on testsuite/gcc.dg/pr102385.c; however, the testcase triggers another profile update bug in pcom, so I will update it in a followup patch.

gcc/ChangeLog:

	* tree-ssa-loop-im.cc (execute_sm_if_changed): Turn cap probability
	to guessed; fix count of new_bb.

diff --git a/gcc/tree-ssa-loop-im.cc b/gcc/tree-ssa-loop-im.cc
index f5b01e986ae..268f466bdc9 100644
--- a/gcc/tree-ssa-loop-im.cc
+++ b/gcc/tree-ssa-loop-im.cc
@@ -2059,7 +2059,8 @@ execute_sm_if_changed (edge ex, tree mem, tree tmp_var, tree flag,
       nbbs++;
     }
 
-  profile_probability cap = profile_probability::always ().apply_scale (2, 3);
+  profile_probability cap
+          = profile_probability::guessed_always ().apply_scale (2, 3);
 
   if (flag_probability.initialized_p ())
     ;
@@ -2103,6 +2104,8 @@ execute_sm_if_changed (edge ex, tree mem, tree tmp_var, tree flag,
 
   old_dest = ex->dest;
   new_bb = split_edge (ex);
+  if (append_cond_position)
+    new_bb->count += last_cond_fallthru->count ();
   then_bb = create_empty_bb (new_bb);
   then_bb->count = new_bb->count.apply_probability (flag_probability);
   if (irr)
Fix profile_count::apply_probability
Hi,

profile_count::apply_probability is missing a check for an uninitialized probability, which leads to completely random results when applying an uninitialized probability to an initialized scale. This can make a difference e.g. when inlining a -fno-guess-branch-probability function into a -fguess-branch-probability one.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

	* profile-count.h (profile_count::apply_probability): Fix handling
	of uninitialized probabilities, optimize scaling by probability 1.

diff --git a/gcc/profile-count.h b/gcc/profile-count.h
index bf1136782a3..e860c5db540 100644
--- a/gcc/profile-count.h
+++ b/gcc/profile-count.h
@@ -1129,11 +1132,11 @@ public:
   /* Scale counter according to PROB.  */
   profile_count apply_probability (profile_probability prob) const
     {
-      if (*this == zero ())
+      if (*this == zero () || prob == profile_probability::always ())
         return *this;
       if (prob == profile_probability::never ())
         return zero ();
-      if (!initialized_p ())
+      if (!initialized_p () || !prob.initialized_p ())
         return uninitialized ();
       profile_count ret;
       uint64_t tmp;
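The order of the early exits in the patched apply_probability can be modelled with std::optional standing in for "uninitialized". This is a hypothetical simplification: GCC's profile_count and profile_probability carry fixed-point values and quality levels, which are ignored here, and Count, Prob and this free apply_probability function are illustrative names only. The point is the new check that an uninitialized probability, not just an uninitialized count, yields an uninitialized result.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical model: an empty optional plays the role of "uninitialized".
using Count = std::optional<std::uint64_t>;
using Prob = std::optional<double>;   // 0.0 = never, 1.0 = always

Count apply_probability(Count count, Prob prob) {
  // Zero stays zero, and scaling by "always" is a no-op.
  if ((count && *count == 0) || (prob && *prob == 1.0))
    return count;
  // Scaling by "never" gives zero.
  if (prob && *prob == 0.0)
    return Count{0};
  // The fix: an uninitialized probability also yields "uninitialized".
  if (!count || !prob)
    return std::nullopt;
  // Round to the nearest integer count.
  return Count{static_cast<std::uint64_t>(*count * *prob + 0.5)};
}
```

Without the !prob part of the guard, the last line would dereference an empty optional, which is undefined behaviour in C++ and loosely analogous to the "completely random results" described in the mail.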
[Bug rtl-optimization/91838] [8/9 Regression] incorrect use of shr and shrx to shift by 64, missed optimization of vector shift
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91838 Richard Biener changed: What|Removed |Added Resolution|--- |FIXED Status|REOPENED|RESOLVED --- Comment #20 from Richard Biener --- The testcase is again fixed in GCC 14.
[Bug gcov-profile/110827] C++20 coroutines aren't being measured by gcov
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110827

--- Comment #3 from Michael Duggan ---
(In reply to Richard Biener from comment #1)
> I'm seeing all code properly instrumented. The coverage I see is
>
>         -:    1:#include
>         -:    2:#include
>         -:    3:
>         -:    4:struct task {
>         -:    5:  struct promise {
>         -:    6:    using handle_t = std::coroutine_handle;
>         1:    7:    task get_return_object() {
>         1:    8:      return task{handle_t::from_promise(*this)};
>         -:    9:    }
>         1:   10:    std::suspend_never initial_suspend() noexcept { return {}; }
>         1:   11:    std::suspend_always final_suspend() noexcept { return {}; }
>         #:   12:    void unhandled_exception() { std::terminate(); }
>         1:   13:    void return_void() noexcept {}
>         -:   14:    friend task;
>         -:   15:  };
>         -:   16:  using promise_type = promise;
>         1:   17:  task(promise_type::handle_t handle) : handle_{handle} {}
>         1:   18:  ~task() {
>         1:   19:    if (handle_) {
>         1:   20:      handle_.destroy();
>         -:   21:    }
>         1:   22:  }
>         -:   23: private:
>         -:   24:  promise_type::handle_t handle_;
>         -:   25:};
>         -:   26:
>         1:   27:task foo() {
>         -:   28:  std::cout << "Running..." << std::endl;
>         -:   29:  co_return;
>         2:   30:}
>         -:   31:
>         1:   32:int main(int argc, char **argv) {
>         1:   33:  foo();
>         1:   34:  return 0;
>         -:   35:}
>
> I have no idea why for example line 28 isn't marked executed.

The point is that no matter what is put in the coroutine, foo, nothing within the coroutine will ever be marked as having been run.
[Bug rtl-optimization/91838] [8/9 Regression] incorrect use of shr and shrx to shift by 64, missed optimization of vector shift
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91838 --- Comment #19 from CVS Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:d1c072a1c3411a6fe29900750b38210af8451eeb commit r14-2821-gd1c072a1c3411a6fe29900750b38210af8451eeb Author: Richard Biener Date: Thu Jul 27 13:08:32 2023 +0200 tree-optimization/91838 - fix FAIL of g++.dg/opt/pr91838.C The following fixes the lack of simplification of a vector shift by an out-of-bounds shift value. For scalars this is done both by CCP and VRP but vectors are not handled there. This results in PR91838 differences in outcome dependent on whether a vector shift ISA is available and thus vector lowering does or does not expose scalar shifts here. The following adds a match.pd pattern to catch uniform out-of-bound shifts, simplifying them to zero when not sanitizing shift amounts. PR tree-optimization/91838 * gimple-match-head.cc: Include attribs.h and asan.h. * generic-match-head.cc: Likewise. * match.pd (([rl]shift @0 out-of-bounds) -> zero): New pattern.
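The new match.pd rule can be summarized as a small decision procedure. The C++ sketch below is a hypothetical model, not the pattern itself: it treats a vector shift as a list of per-lane shift amounts, folds to an all-zero vector only when every lane uses the same amount and that amount is at least the element width, and keeps the shift when shift sanitization is requested, following the commit message's "when not sanitizing shift amounts". The function name and parameters are illustrative.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Returns the folded value (all lanes zero) when the simplification
// applies, or std::nullopt when the shift must be kept as-is.
std::optional<std::vector<std::uint32_t>>
fold_uniform_oob_shift(unsigned element_bits,
                       const std::vector<unsigned>& shift_amounts,
                       bool sanitize_shift) {
  // With shift sanitization enabled the out-of-range shift is kept,
  // presumably so that the sanitizer can still report it at run time.
  if (sanitize_shift || shift_amounts.empty())
    return std::nullopt;
  // Only uniform shift amounts (the same count in every lane) qualify.
  for (unsigned s : shift_amounts)
    if (s != shift_amounts.front())
      return std::nullopt;
  // In-range shifts are left to the normal code paths.
  if (shift_amounts.front() < element_bits)
    return std::nullopt;
  // Uniform out-of-bounds shift: every lane becomes zero.
  return std::vector<std::uint32_t>(shift_amounts.size(), 0);
}
```

For example, a 4-lane 32-bit shift by 40 in every lane folds to zeros, while shift counts of 40, 3, 40, 40 are left alone because they are not uniform.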
[Bug target/110788] Spilling to mask register for GPR vec_duplicate
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110788

--- Comment #4 from Hongtao.liu ---
> kmovw   %edx, %k0
> vpbroadcastmw2d %k0, %xmm1
>
> instead of
>
> vpbroadcastw    %edx, %xmm1
>
It's not vpbroadcastw, it's

movzwl  %dx, %ecx
vpbroadcastd    %ecx, %xmm0

And non-kmask version should be better.
[Bug tree-optimization/92335] [11/12/13/14 Regression] sinking of loads happen too early which causes vectorization not to be done
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92335 Richard Biener changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Status|NEW |ASSIGNED --- Comment #7 from Richard Biener --- I have a patch to fix this.
[Bug tree-optimization/64031] (un-)conditional execution state is not preserved by PRE/sink
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64031 Richard Biener changed: What|Removed |Added Target Milestone|--- |14.0 Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #3 from Richard Biener --- This is now fixed in GCC 14.
[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 64031, which changed state. Bug 64031 Summary: (un-)conditional execution state is not preserved by PRE/sink https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64031 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED
Re: Calling convention for Intel APX extension
Hey,

On Thu, 27 Jul 2023, Thomas Koenig via Gcc wrote:

> Intel recommends to have the new registers as caller-saved for
> compatibility with current calling conventions. If I understand this
> correctly, this is required for exception unwinding, but not if the
> function called is __attribute__((nothrow)).

That's not the full truth. It's not (only) exception handling but also context switching via setjmp/longjmp and make/get/setcontext. The data structures for that are part of the ABI unfortunately, and can't be assumed to be extensible (as Florian says, for glibc there may be hacks (or maybe not) on x86-64. Some other archs implemented extensibility from the outset). So all registers (and register parts!) added after the initial psABI is defined usually _have_ to be call-clobbered.

> Since Fortran tends to use a lot of registers for its array descriptors,
> and also tends to call nothrow functions (all Fortran functions, and
> all Fortran intrinsics, such as sin/cos/etc) a lot, it could profit from
> making some of the new registers callee-saved, to save some spills
> at function calls.

I've recently submitted a patch that adds some attributes that basically say "these-and-those regs aren't clobbered by this function" (I did them for not clobbered xmm8-15). Something similar could be used for the new GPRs as well. Then it would be a matter of ensuring that the interesting functions are marked with those attributes (and then of course do the necessary call-save/restore).

Ciao,
Michael.
[PATCH] [x86] Add UNSPEC_MASKOP to vpbroadcastm pattern.
Prevent rtl optimization of vec_duplicate + zero_extend to
vpbroadcastm since there could be an extra kmov after RA.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
Ready to push to trunk.

gcc/ChangeLog:

	PR target/110788
	* config/i386/sse.md (avx512cd_maskb_vec_dup): Add
	UNSPEC_MASKOP.
	(avx512cd_maskw_vec_dup): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr110788.c: New test.
---
 gcc/config/i386/sse.md                   |  8 ++++++--
 gcc/testsuite/gcc.target/i386/pr110788.c | 11 +++++++++++
 2 files changed, 17 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr110788.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 35fd66ed4aa..51961bbfc0b 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -26778,11 +26778,14 @@ (define_insn "avx512dq_broadcast_1"
    (set_attr "prefix" "evex")
    (set_attr "mode" "")])
 
+;; Use unspec to prevent rtl optimizer to optimize zero_extend + vec_duplicate
+;; to pbroadcastm, there could be an extra kmov after RA.
 (define_insn "avx512cd_maskb_vec_dup"
   [(set (match_operand:VI8_AVX512VL 0 "register_operand" "=v")
         (vec_duplicate:VI8_AVX512VL
           (zero_extend:DI
-            (match_operand:QI 1 "register_operand" "k"]
+            (match_operand:QI 1 "register_operand" "k"
+            (unspec [(const_int 0)] UNSPEC_MASKOP)]
   "TARGET_AVX512CD"
   "vpbroadcastmb2q\t{%1, %0|%0, %1}"
   [(set_attr "type" "mskmov")
@@ -26793,7 +26796,8 @@ (define_insn "avx512cd_maskw_vec_dup"
   [(set (match_operand:VI4_AVX512VL 0 "register_operand" "=v")
         (vec_duplicate:VI4_AVX512VL
           (zero_extend:SI
-            (match_operand:HI 1 "register_operand" "k"]
+            (match_operand:HI 1 "register_operand" "k"
+            (unspec [(const_int 0)] UNSPEC_MASKOP)]
   "TARGET_AVX512CD"
   "vpbroadcastmw2d\t{%1, %0|%0, %1}"
   [(set_attr "type" "mskmov")
diff --git a/gcc/testsuite/gcc.target/i386/pr110788.c b/gcc/testsuite/gcc.target/i386/pr110788.c
new file mode 100644
index 000..4cf1676ccb6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr110788.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=cascadelake --param vect-partial-vector-usage=2" } */
+/* { dg-final { scan-assembler-not "vpbroadcastm" } } */
+
+double a[1024], b[1024];
+
+void foo (int n)
+{
+  for (int i = 0; i < n; ++i)
+    a[i] = b[i] * 3.;
+}
-- 
2.39.1.388.g2fc9e9ca3c