Re: [PATCH 0/6] v2 of libdiagnostics
Hi David - and thanks for posting an outline for libdiagnostics at
https://gcc.gnu.org/wiki/libdiagnostics

Currently this shows both libdiagnostics and libdiagnostics-sarif-dump integrated into GCC. Is this the plan, or would those be available as a top-level project (the program serving as an example for the library), possibly with the library sources also pushed to GCC?

Oh, and one question as I stumbled over that today: would libdiagnostics now (or in the future) use libtextstyle for formatting (and another possible sink: HTML)?

Simon

On 23.11.2023 at 18:36, Pedro Alves wrote:
> Hi David,
>
> On 2023-11-21 22:20, David Malcolm wrote:
>> Here's v2 of the "libdiagnostics" shared library idea; see:
>>   https://gcc.gnu.org/wiki/libdiagnostics
>>
>> As in v1, patch 1 (for GCC) shows libdiagnostic.h (the public header
>> file), along with examples of simple self-contained programs that show
>> various uses of the API.
>>
>> As in v1, patch 2 (for GCC) is the work-in-progress implementation.
>>
>> Patch 3 (for GCC) adds a new libdiagnostics++.h, a wrapper API
>> providing some syntactic sugar when using the API from C++.  I've been
>> using this to "eat my own dogfood" and write a simple SARIF-dumping
>> tool:
>>   https://github.com/davidmalcolm/libdiagnostics-sarif-dump
>>
>> Patch 4 (for GCC) is an internal change needed by patch 1.
>>
>> Patch 5 (for GCC) updates GCC's source printing code so that when
>> there's no column information, we don't print annotation lines.  This
>> fixes the extra lines seen using it from gas discussed in:
>>   https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635575.html
>>
>> Patch 6 (for binutils) is an updated version of the experiment at
>> using the API from gas.
>>
>> Thoughts?
>
> Do you have plans on making this a top-level library instead?  That
> would allow easily making it a non-optional dependency for binutils,
> as we could have the library in the binutils-gdb repo as well, for
> instance.
>
> From the Cauldron discussion I understood that the diagnostics stuff
> doesn't depend on much of GCC's data structures, and doesn't rely on
> the garbage collector.  Is there something preventing that?  (Other
> than "it's-a-matter-of-time/effort", of course.)
>
> Pedro Alves
[PATCH] c++/modules: Handle error header names in modules [PR107594]
I don't provide a new test because this error only happens when there are no include paths at all, and I haven't worked out a way to get this to happen within DejaGNU (as it adds a number of `-B` and `-I` flags).

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

When there are no include paths while preprocessing a header-name token, an empty STRING_CST is returned. This patch ensures this is handled when attempting to create a module for this name.

	PR c++/107594

gcc/cp/ChangeLog:

	* module.cc (get_module): Bail on empty name.

Signed-off-by: Nathaniel Shead
---
 gcc/cp/module.cc | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 840c7ef6dab..3c2fef0e3f4 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -14050,6 +14050,12 @@ get_primary (module_state *parent)
 module_state *
 get_module (tree name, module_state *parent, bool partition)
 {
+  /* We might be given an empty NAME if preprocessing fails to handle
+     a header-name token.  */
+  if (name && TREE_CODE (name) == STRING_CST
+      && TREE_STRING_LENGTH (name) == 0)
+    return nullptr;
+
   if (partition)
     {
       if (!parent)
--
2.43.0
[PATCH] x86: Generate .cfi_undefined for unsaved callee-saved registers
When assembler directives for DWARF frame unwind are enabled, generate the .cfi_undefined directive for unsaved callee-saved registers which have been used in the function.

gcc/

	PR target/38534
	* config/i386/i386.cc (ix86_post_cfi_startproc): New.
	(TARGET_ASM_POST_CFI_STARTPROC): Likewise.

gcc/testsuite/

	PR target/38534
	* gcc.target/i386/no-callee-saved-19.c: New test.
	* gcc.target/i386/no-callee-saved-20.c: Likewise.
	* gcc.target/i386/pr38534-7.c: New test.
	* gcc.target/i386/pr38534-8.c: Likewise.
---
 gcc/config/i386/i386.cc                        | 37 +++
 .../gcc.target/i386/no-callee-saved-19.c       | 17 +
 .../gcc.target/i386/no-callee-saved-20.c       | 12 ++
 gcc/testsuite/gcc.target/i386/pr38534-7.c      | 18 +
 gcc/testsuite/gcc.target/i386/pr38534-8.c      | 13 +++
 5 files changed, 97 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-19.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-20.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-8.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index b3e7c74846e..d4c10a5ef9b 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -22662,6 +22662,40 @@ x86_output_mi_thunk (FILE *file, tree thunk_fndecl, HOST_WIDE_INT delta,
   flag_force_indirect_call = saved_flag_force_indirect_call;
 }

+/* Implement TARGET_ASM_POST_CFI_STARTPROC.  Triggered after a
+   .cfi_startproc directive is emitted into the assembly file.
+   When assembler directives for DWARF frame unwind are enabled,
+   output the .cfi_undefined directive for unsaved callee-saved
+   registers which have been used in the function.  */
+
+void
+ix86_post_cfi_startproc (FILE *f, tree)
+{
+  if ((cfun->machine->call_saved_registers
+       == TYPE_NO_CALLEE_SAVED_REGISTERS)
+      && dwarf2out_do_cfi_asm ())
+    for (int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+      if (df_regs_ever_live_p (i)
+	  && !fixed_regs[i]
+	  && !call_used_regs[i]
+	  && !STACK_REGNO_P (i)
+	  && !MMX_REGNO_P (i))
+	{
+	  if (LEGACY_INT_REGNO_P (i))
+	    {
+	      if (TARGET_64BIT)
+		asm_fprintf (f, "\t.cfi_undefined r%s\n",
+			     hi_reg_name[i]);
+	      else
+		asm_fprintf (f, "\t.cfi_undefined e%s\n",
+			     hi_reg_name[i]);
+	    }
+	  else
+	    asm_fprintf (f, "\t.cfi_undefined %s\n",
+			 hi_reg_name[i]);
+	}
+}
+
 static void
 x86_file_start (void)
 {
@@ -26281,6 +26315,9 @@ static const scoped_attribute_specs *const ix86_attribute_table[] =
 #undef TARGET_ASM_CAN_OUTPUT_MI_THUNK
 #define TARGET_ASM_CAN_OUTPUT_MI_THUNK x86_can_output_mi_thunk

+#undef TARGET_ASM_POST_CFI_STARTPROC
+#define TARGET_ASM_POST_CFI_STARTPROC ix86_post_cfi_startproc
+
 #undef TARGET_ASM_FILE_START
 #define TARGET_ASM_FILE_START x86_file_start

diff --git a/gcc/testsuite/gcc.target/i386/no-callee-saved-19.c b/gcc/testsuite/gcc.target/i386/no-callee-saved-19.c
new file mode 100644
index 000..60a492cffd3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/no-callee-saved-19.c
@@ -0,0 +1,17 @@
+/* { dg-do assemble { target *-*-linux* *-*-gnu* } } */
+/* { dg-options "-save-temps -march=tigerlake -O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+
+#include "no-callee-saved-1.c"
+
+/* { dg-final { scan-assembler-not "push" } } */
+/* { dg-final { scan-assembler-not "pop" } } */
+/* { dg-final { scan-assembler-times ".cfi_undefined rbx" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times ".cfi_undefined rbp" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times ".cfi_undefined r12" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times ".cfi_undefined r13" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times ".cfi_undefined r14" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times ".cfi_undefined r15" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times ".cfi_undefined ebx" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times ".cfi_undefined esi" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times ".cfi_undefined edi" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times ".cfi_undefined ebp" 1 { target ia32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/no-callee-saved-20.c b/gcc/testsuite/gcc.target/i386/no-callee-saved-20.c
new file mode 100644
index 000..fc94778824a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/no-callee-saved-20.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target cfi } } */
+/* { dg-options "-march=tigerlake -O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+
+__attribute__ ((no_callee_saved_registers))
+void
[COMMITTED] bpf: add constant pointer to helper-skb-ancestor-cgroup-id.c test
The purpose of this test is to make sure that constant propagation is achieved at the proper optimization level, so that a BPF call instruction to a kernel helper is generated. This patch updates the test so it also covers kernel helpers defined as constant static pointers.

The motivation for this patch is:
https://lore.kernel.org/bpf/20240127185031.29854-1-jose.march...@oracle.com/T/#u

Tested on a bpf-unknown-none target, x86_64-linux-gnu host.

gcc/testsuite/ChangeLog:

	* gcc.target/bpf/helper-skb-ancestor-cgroup-id.c: Add constant
	version of kernel helper static pointer.
---
 gcc/testsuite/gcc.target/bpf/helper-skb-ancestor-cgroup-id.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/bpf/helper-skb-ancestor-cgroup-id.c b/gcc/testsuite/gcc.target/bpf/helper-skb-ancestor-cgroup-id.c
index 693f390b9bb..075dbe6f852 100644
--- a/gcc/testsuite/gcc.target/bpf/helper-skb-ancestor-cgroup-id.c
+++ b/gcc/testsuite/gcc.target/bpf/helper-skb-ancestor-cgroup-id.c
@@ -5,6 +5,7 @@ struct __sk_buff;
 static uint64_t (*bpf_skb_ancestor_cgroup_id)(struct __sk_buff *skb, int ancestor_level) = (void *) 83;
+static uint64_t (* const const_bpf_skb_ancestor_cgroup_id)(struct __sk_buff *skb, int ancestor_level) = (void *) 84;

 void
 foo ()
@@ -13,7 +14,9 @@ foo ()
   void *skb;
   int ancestor_level;

-  ret = bpf_skb_ancestor_cgroup_id (skb, ancestor_level);
+  ret = bpf_skb_ancestor_cgroup_id (skb, ancestor_level)
+    + const_bpf_skb_ancestor_cgroup_id (skb, ancestor_level);
 }

 /* { dg-final { scan-assembler "call\t83" } } */
+/* { dg-final { scan-assembler "call\t84" } } */
--
2.30.2
[PATCH, committed] Fortran: fix bounds-checking errors for CLASS array dummies [PR104908]
Dear all,

commit r11-1235 for pr95331 addressed array bounds issues with unlimited polymorphic array dummies, but caused regressions for CLASS array dummies that led to either wrong code with bounds-checking, or an ICE.

The solution is simple: add a check whether the dummy is unlimited polymorphic, and otherwise restore the previous behavior.

The attached patch regtested fine on x86_64-pc-linux-gnu and was OK'ed in the PR by Jerry.

Pushed as: r14-8471-gce61de1b8a1bb3

Since this is a 11/12/13/14 regression and appears safe otherwise, I intend to backport as suitable, unless there are comments.

Thanks,
Harald

From ce61de1b8a1bb3a22118e900376f380768f2ba59 Mon Sep 17 00:00:00 2001
From: Harald Anlauf
Date: Sat, 27 Jan 2024 17:41:43 +0100
Subject: [PATCH] Fortran: fix bounds-checking errors for CLASS array dummies
 [PR104908]

Commit r11-1235 addressed issues with bounds of unlimited polymorphic array dummies. However, using the descriptor from sym->backend_decl does break the case of CLASS array dummies. The obvious solution is to restrict the fix to the unlimited polymorphic case, thus keeping the original descriptor in the ordinary case.

gcc/fortran/ChangeLog:

	PR fortran/104908
	* trans-array.cc (gfc_conv_array_ref): Restrict use of transformed
	descriptor (sym->backend_decl) to the unlimited polymorphic case.

gcc/testsuite/ChangeLog:

	PR fortran/104908
	* gfortran.dg/pr104908.f90: New test.
---
 gcc/fortran/trans-array.cc             |  5 +++-
 gcc/testsuite/gfortran.dg/pr104908.f90 | 32 ++
 2 files changed, 36 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pr104908.f90

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index 878a92aff18..1e0d698a949 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -4063,7 +4063,10 @@ gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr,
     }

   decl = se->expr;
-  if (IS_CLASS_ARRAY (sym) && sym->attr.dummy && ar->as->type != AS_DEFERRED)
+  if (UNLIMITED_POLY(sym)
+      && IS_CLASS_ARRAY (sym)
+      && sym->attr.dummy
+      && ar->as->type != AS_DEFERRED)
     decl = sym->backend_decl;

   cst_offset = offset = gfc_index_zero_node;
diff --git a/gcc/testsuite/gfortran.dg/pr104908.f90 b/gcc/testsuite/gfortran.dg/pr104908.f90
new file mode 100644
index 000..c3a30b0003c
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr104908.f90
@@ -0,0 +1,32 @@
+! { dg-do compile }
+! { dg-additional-options "-fcheck=bounds -fdump-tree-original" }
+!
+! PR fortran/104908 - incorrect out-of-bounds runtime error
+
+program test
+  implicit none
+  type vec
+     integer :: x(3) = [2,4,6]
+  end type vec
+  type(vec) :: w(2)
+  call sub(w)
+contains
+  subroutine sub (v)
+    class(vec), intent(in) :: v(:)
+    integer :: k, q(3)
+    q = [ (v(1)%x(k), k = 1, 3) ]  ! <-- was failing here after r11-1235
+    print *, q
+  end
+end
+
+subroutine sub2 (zz)
+  implicit none
+  type vec
+     integer :: x(2,1)
+  end type vec
+  class(vec), intent(in) :: zz(:)  ! used to ICE after r11-1235
+  integer :: k
+  k = zz(1)%x(2,1)
+end
+
+! { dg-final { scan-tree-dump-times " above upper bound " 4 "original" } }
--
2.35.3
Fix ICE with -g and -std=c23 when forming composite types [PR113438]
Debug output ICEs when we do not set TYPE_STUB_DECL; fix this.

Fix ICE with -g and -std=c23 when forming composite types [PR113438]

Set TYPE_STUB_DECL to an artificial decl when creating a new structure as a composite type.

	PR c/113438

gcc/c/
	* c-typeck.cc (composite_type_internal): Set TYPE_STUB_DECL.

gcc/testsuite/
	* gcc.dg/pr113438.c: New test.

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 66c6abc9f07..cfa3b7ab10f 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -585,6 +585,11 @@ composite_type_internal (tree t1, tree t2, struct composite_cache* cache)
 	/* Setup the struct/union type.  Because we inherit all variably
 	   modified components, we can ignore the size expression.  */
 	tree expr = NULL_TREE;
+
+	/* Set TYPE_STUB_DECL for debugging symbols.  */
+	TYPE_STUB_DECL (n) = pushdecl (build_decl (input_location, TYPE_DECL,
+						   NULL_TREE, n));
+
 	n = finish_struct(input_location, n, fields, attributes, NULL, &expr);

 	n = qualify_type (n, t1);
diff --git a/gcc/testsuite/gcc.dg/pr113438.c b/gcc/testsuite/gcc.dg/pr113438.c
new file mode 100644
index 000..5612ee4fa38
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr113438.c
@@ -0,0 +1,7 @@
+/* PR113438
+ * { dg-do compile }
+ * { dg-options "-std=c23 -g" } */
+
+void g(struct foo { int x; } a);
+void g(struct foo { int x; } a);
+
Re: [aarch64] PR112950: gcc.target/aarch64/sve/acle/general/dupq_5.c fails on aarch64_be-linux-gnu
Prathamesh Kulkarni writes:
> Hi,
> The test passes the -mlittle-endian option but doesn't have a target
> check for aarch64_little_endian and thus fails to compile on
> aarch64_be-linux-gnu.  The patch adds the missing
> aarch64_little_endian target check, which makes it unsupported on
> that target.
> OK to commit?
>
> Thanks,
> Prathamesh
>
> PR112950: Add aarch64_little_endian target check for dupq_5.c
>
> gcc/testsuite/ChangeLog:
> 	PR target/112950
> 	* gcc.target/aarch64/sve/acle/general/dupq_5.c: Add
> 	aarch64_little_endian target check.

If we add this requirement, then there's no need to pass -mlittle-endian in the dg-options.

But dupq_6.c (the corresponding big-endian test) has:

  /* To avoid needing big-endian header files.  */
  #pragma GCC aarch64 "arm_sve.h"

instead of:

  #include <arm_sve.h>

Could you do the same thing here?

Thanks,
Richard

> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
> index 6ae8d4c60b2..1990412d0e5 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -mlittle-endian" } */
> +/* { dg-require-effective-target aarch64_little_endian } */
>
>  #include <arm_sve.h>
>
[PATCH] vect: Tighten vect_determine_precisions_from_range [PR113281]
This was another PR caused by the way that vect_determine_precisions_from_range handles shifts. We tried to narrow 32768 >> x to a 16-bit shift based on range information for the inputs and outputs, with vect_recog_over_widening_pattern (after PR110828) adjusting the shift amount. But this doesn't work for the case where x is in [16, 31], since then 32-bit 32768 >> x is a well-defined zero, whereas no well-defined 16-bit 32768 >> y will produce 0.

We could perhaps generate x < 16 ? 32768 >> x : 0 instead, but since vect_determine_precisions_from_range was never really supposed to rely on fix-ups, it seems better to fix that instead.

The patch also makes the code more selective about which codes can be narrowed based on input and output ranges. This showed that vect_truncatable_operation_p was missing cases for BIT_NOT_EXPR (equivalent to BIT_XOR_EXPR of -1) and NEGATE_EXPR (equivalent to BIT_NOT_EXPR followed by a PLUS_EXPR of 1).

pr113281-1.c is the original testcase. pr113281-[23].c failed before the patch due to overly optimistic narrowing. pr113281-[45].c previously passed and are meant to protect against accidental optimisation regressions.

Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install?

Richard

gcc/
	PR target/113281
	* tree-vect-patterns.cc (vect_recog_over_widening_pattern): Remove
	workaround for right shifts.
	(vect_truncatable_operation_p): Handle NEGATE_EXPR and
	BIT_NOT_EXPR.
	(vect_determine_precisions_from_range): Be more selective about
	which codes can be narrowed based on their input and output ranges.
	For shifts, require at least one more bit of precision than the
	maximum shift amount.

gcc/testsuite/
	PR target/113281
	* gcc.dg/vect/pr113281-1.c: New test.
	* gcc.dg/vect/pr113281-2.c: Likewise.
	* gcc.dg/vect/pr113281-3.c: Likewise.
	* gcc.dg/vect/pr113281-4.c: Likewise.
	* gcc.dg/vect/pr113281-5.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/pr113281-1.c |  17 +++
 gcc/testsuite/gcc.dg/vect/pr113281-2.c |  50 +
 gcc/testsuite/gcc.dg/vect/pr113281-3.c |  39 +++
 gcc/testsuite/gcc.dg/vect/pr113281-4.c |  55 ++
 gcc/testsuite/gcc.dg/vect/pr113281-5.c |  66 
 gcc/tree-vect-patterns.cc              | 144 +
 6 files changed, 305 insertions(+), 66 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr113281-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr113281-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr113281-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr113281-4.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr113281-5.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr113281-1.c b/gcc/testsuite/gcc.dg/vect/pr113281-1.c
new file mode 100644
index 000..6df4231cb5f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr113281-1.c
@@ -0,0 +1,17 @@
+#include "tree-vect.h"
+
+unsigned char a;
+
+int main() {
+  check_vect ();
+
+  short b = a = 0;
+  for (; a != 19; a++)
+    if (a)
+      b = 32872 >> a;
+
+  if (b == 0)
+    return 0;
+  else
+    return 1;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/pr113281-2.c b/gcc/testsuite/gcc.dg/vect/pr113281-2.c
new file mode 100644
index 000..3a1170c28b6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr113281-2.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+
+#define N 128
+
+short x[N];
+short y[N];
+
+void
+f1 (void)
+{
+  for (int i = 0; i < N; ++i)
+    x[i] >>= y[i];
+}
+
+void
+f2 (void)
+{
+  for (int i = 0; i < N; ++i)
+    x[i] >>= (y[i] < 32 ? y[i] : 32);
+}
+
+void
+f3 (void)
+{
+  for (int i = 0; i < N; ++i)
+    x[i] >>= (y[i] < 31 ? y[i] : 31);
+}
+
+void
+f4 (void)
+{
+  for (int i = 0; i < N; ++i)
+    x[i] >>= (y[i] & 31);
+}
+
+void
+f5 (void)
+{
+  for (int i = 0; i < N; ++i)
+    x[i] >>= 0x8000 >> y[i];
+}
+
+void
+f6 (void)
+{
+  for (int i = 0; i < N; ++i)
+    x[i] >>= 0x8000 >> (y[i] & 31);
+}
+
+/* { dg-final { scan-tree-dump-not {can narrow[^\n]+>>} "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr113281-3.c b/gcc/testsuite/gcc.dg/vect/pr113281-3.c
new file mode 100644
index 000..5982dd2d16f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr113281-3.c
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+
+#define N 128
+
+short x[N];
+short y[N];
+
+void
+f1 (void)
+{
+  for (int i = 0; i < N; ++i)
+    x[i] >>= (y[i] < 30 ? y[i] : 30);
+}
+
+void
+f2 (void)
+{
+  for (int i = 0; i < N; ++i)
+    x[i] >>= ((y[i] & 15) + 2);
+}
+
+void
+f3 (void)
+{
+  for (int i = 0; i < N; ++i)
+    x[i] >>= (y[i] < 16 ? y[i] : 16);
+}
+
+void
+f4 (void)
+{
+  for (int i = 0; i < N; ++i)
+    x[i] = 32768 >> ((y[i] & 15) + 3);
+}
+
+/* { dg-final { scan-tree-dump {can narrow to signed:31 without loss [^\n]+>>} "vect" } } */
+/* { dg-final { scan-tree-dump {can narrow to signed:18 without loss [^\n]+>>} "vect" } } */
+/* { dg-final { scan-tree-dump {can narrow to signed:17 without loss [^\n]+>>} "vect" } } */
+/* { dg-final {
[PATCH] Handle function symbol reference in readonly data section
For a function symbol reference in a read-only data section, instead of putting it in the .data.rel.ro or .rodata.cst section, call function_rodata_section to get the read-only or relocated read-only data section associated with the function DECL, so that the COMDAT section will be used for a COMDAT function symbol.

gcc/

	PR rtl-optimization/113617
	* varasm.cc (default_elf_select_rtx_section): Call
	function_rodata_section to get the read-only or relocated
	read-only data section for function symbol reference.

gcc/testsuite/

	PR rtl-optimization/113617
	* g++.dg/pr113617-1a.C: New test.
	* g++.dg/pr113617-1b.C: Likewise.
---
 gcc/testsuite/g++.dg/pr113617-1a.C | 170 +
 gcc/testsuite/g++.dg/pr113617-1b.C |   8 ++
 gcc/varasm.cc                      |  18 +++
 3 files changed, 196 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/pr113617-1a.C
 create mode 100644 gcc/testsuite/g++.dg/pr113617-1b.C

diff --git a/gcc/testsuite/g++.dg/pr113617-1a.C b/gcc/testsuite/g++.dg/pr113617-1a.C
new file mode 100644
index 000..effd50841c0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr113617-1a.C
@@ -0,0 +1,170 @@
+// { dg-do compile { target fpic } }
+// { dg-require-visibility "" }
+// { dg-options "-O2 -std=c++11 -fPIC -fvisibility=hidden -fvisibility-inlines-hidden" }
+
+namespace {
+template struct integral_constant {
+  static constexpr int value = __v;
+};
+template using __bool_constant = integral_constant<__v>;
+using true_type = __bool_constant;
+template struct __conditional {
+  template using type = _Tp;
+};
+template
+using __conditional_t = typename __conditional<_Cond>::type<_If, _Else>;
+true_type __trans_tmp_1;
+template struct remove_cv { using type = _Tp; };
+template
+struct __decay_selector
+    : __conditional_t, _Up> {};
+template struct decay {
+  using type = typename __decay_selector<_Tp>::type;
+};
+}
+struct vtkCellArray {};
+namespace blah {
+struct _Any_data;
+enum _Manager_operation {};
+template class function;
+struct _Function_base {
+  using _Manager_type = bool (*)(_Any_data &, const _Any_data &,
+                                 _Manager_operation);
+  _Manager_type _M_manager;
+};
+template class _Function_handler;
+template
+struct _Function_handler<_Res(_ArgTypes...), _Functor> {
+  static bool _M_manager(_Any_data &, const _Any_data &, _Manager_operation) {
+    return false;
+  }
+  __attribute__((noipa)) static _Res _M_invoke(const _Any_data &) {}
+};
+template
+struct function<_Res(_ArgTypes...)> : _Function_base {
+  template
+  using _Handler = _Function_handler<_Res(), _Functor>;
+  template function(_Functor) {
+    using _My_handler = _Handler<_Functor>;
+    _M_invoker = _My_handler::_M_invoke;
+    _M_manager = _My_handler::_M_manager;
+  }
+  using _Invoker_type = _Res (*)(const _Any_data &);
+  _Invoker_type _M_invoker;
+};
+template class _Bind;
+template
+struct _Bind<_Functor(_Bound_args...)> {};
+template using __is_socketlike = decltype(__trans_tmp_1);
+template struct _Bind_helper {
+  typedef _Bind::type(
+      typename decay<_BoundArgs>::type...)>
+      type;
+};
+template
+__attribute__((noipa)) typename _Bind_helper<__is_socketlike<_Func>::value, _Func, _BoundArgs...>::type
+bind(_Func, _BoundArgs...)
+{ return typename _Bind_helper<__is_socketlike<_Func>::value, _Func, _BoundArgs...>::type (); }
+template struct __uniq_ptr_impl {
+  template struct _Ptr { using type = _Up *; };
+  using pointer = typename _Ptr<_Tp>::type;
+};
+template struct unique_ptr {
+  using pointer = typename __uniq_ptr_impl<_Tp>::pointer;
+  pointer operator->();
+};
+}
+extern int For_threadNumber;
+namespace vtk {
+namespace detail {
+namespace smp {
+enum BackendType { Sequential, STDThread };
+template struct vtkSMPToolsImpl {
+  template
+  __attribute__((noipa)) void For(long long, long long, long long, FunctorInternal &) {}
+};
+struct vtkSMPThreadPool {
+  vtkSMPThreadPool(int);
+  void DoJob(blah::function);
+};
+template
+__attribute__((noipa)) void ExecuteFunctorSTDThread(void *, long long, long long, long long) {}
+template <>
+template
+void vtkSMPToolsImpl::For(long long, long long last, long long grain,
+                          FunctorInternal &fi) {
+  vtkSMPThreadPool pool(For_threadNumber);
+  for (;;) {
+    auto job = blah::bind(ExecuteFunctorSTDThread, &fi, grain,
+                          grain, last);
+    pool.DoJob(job);
+  }
+}
+struct vtkSMPToolsAPI {
+  static vtkSMPToolsAPI ();
+  template
+  void For(long first, long last, long grain, FunctorInternal fi) {
+    switch (ActivatedBackend) {
+    case Sequential:
+      SequentialBackend->For(first, last, grain, fi);
+    case STDThread:
+      STDThreadBackend->For(first, last, grain, fi);
+    }
+  }
+  BackendType ActivatedBackend;
+  blah::unique_ptr> SequentialBackend;
+  blah::unique_ptr> STDThreadBackend;
+};
+template struct vtkSMPTools_FunctorInternal;
+template struct
[PATCH v2] x86: Save callee-saved registers in noreturn functions for -O0/-Og
Changes in v2:
1. Look up the noreturn attribute first.
2. Use __attribute__((noreturn, optimize("-Og"))) in pr38534-6.c.

Save callee-saved registers in noreturn functions for -O0/-Og so that the debugger can restore callee-saved registers in the caller's frame.

gcc/

	PR target/38534
	* config/i386/i386-options.cc (ix86_set_func_type): Save
	callee-saved registers in noreturn functions for -O0/-Og.

gcc/testsuite/

	PR target/38534
	* gcc.target/i386/pr38534-5.c: New file.
	* gcc.target/i386/pr38534-6.c: Likewise.
---
 gcc/config/i386/i386-options.cc           |  9 +---
 gcc/testsuite/gcc.target/i386/pr38534-5.c | 26 +++
 gcc/testsuite/gcc.target/i386/pr38534-6.c | 26 +++
 3 files changed, 58 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-6.c

diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 473f5359fc9..a647b1bdf5c 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -3381,7 +3381,8 @@ static void
 ix86_set_func_type (tree fndecl)
 {
   /* No need to save and restore callee-saved registers for a noreturn
-     function with nothrow or compiled with -fno-exceptions.
+     function with nothrow or compiled with -fno-exceptions unless when
+     compiling with -O0 or -Og.

      NB: Don't use TREE_THIS_VOLATILE to check if this is a noreturn
      function.  The local-pure-const pass turns an interrupt function
@@ -3391,8 +3392,10 @@ ix86_set_func_type (tree fndecl)
      function is marked as noreturn in the IR output, which leads the
      incompatible attribute error in LTO1.  */
   bool has_no_callee_saved_registers
-    = (((TREE_NOTHROW (fndecl) || !flag_exceptions)
-	&& lookup_attribute ("noreturn", DECL_ATTRIBUTES (fndecl)))
+    = ((lookup_attribute ("noreturn", DECL_ATTRIBUTES (fndecl))
+	&& optimize
+	&& !optimize_debug
+	&& (TREE_NOTHROW (fndecl) || !flag_exceptions))
        || lookup_attribute ("no_callee_saved_registers",
			    TYPE_ATTRIBUTES (TREE_TYPE (fndecl))));
diff --git a/gcc/testsuite/gcc.target/i386/pr38534-5.c b/gcc/testsuite/gcc.target/i386/pr38534-5.c
new file mode 100644
index 000..91c0c0f8c59
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr38534-5.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+
+#define ARRAY_SIZE 256
+
+extern int array[ARRAY_SIZE][ARRAY_SIZE][ARRAY_SIZE];
+extern int value (int, int, int)
+#ifndef __x86_64__
+__attribute__ ((regparm(3)))
+#endif
+;
+
+void
+__attribute__((noreturn))
+no_return_to_caller (void)
+{
+  unsigned i, j, k;
+  for (i = ARRAY_SIZE; i > 0; --i)
+    for (j = ARRAY_SIZE; j > 0; --j)
+      for (k = ARRAY_SIZE; k > 0; --k)
+	array[i - 1][j - 1][k - 1] = value (i, j, k);
+  while (1);
+}
+
+/* { dg-final { scan-assembler "push" } } */
+/* { dg-final { scan-assembler-not "pop" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr38534-6.c b/gcc/testsuite/gcc.target/i386/pr38534-6.c
new file mode 100644
index 000..cf1463a9c66
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr38534-6.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+
+#define ARRAY_SIZE 256
+
+extern int array[ARRAY_SIZE][ARRAY_SIZE][ARRAY_SIZE];
+extern int value (int, int, int)
+#ifndef __x86_64__
+__attribute__ ((regparm(3)))
+#endif
+;
+
+void
+__attribute__((noreturn, optimize("-Og")))
+no_return_to_caller (void)
+{
+  unsigned i, j, k;
+  for (i = ARRAY_SIZE; i > 0; --i)
+    for (j = ARRAY_SIZE; j > 0; --j)
+      for (k = ARRAY_SIZE; k > 0; --k)
+	array[i - 1][j - 1][k - 1] = value (i, j, k);
+  while (1);
+}
+
+/* { dg-final { scan-assembler "push" } } */
+/* { dg-final { scan-assembler-not "pop" } } */
--
2.43.0
Re: [PATCH] x86: Save callee-saved registers in noreturn functions for -O0/-Og
On Sat, Jan 27, 2024 at 6:09 AM Jakub Jelinek wrote:
>
> On Sat, Jan 27, 2024 at 05:52:34AM -0800, H.J. Lu wrote:
> > @@ -3391,7 +3392,9 @@ ix86_set_func_type (tree fndecl)
> >      function is marked as noreturn in the IR output, which leads the
> >      incompatible attribute error in LTO1.  */
> >    bool has_no_callee_saved_registers
> > -    = (((TREE_NOTHROW (fndecl) || !flag_exceptions)
> > +    = ((optimize
> > +	&& !optimize_debug
>
> Shouldn't that be opt_for_fn (fndecl, optimize) and ditto for
> optimize_debug?
> I mean, aren't the options not restored yet when this function is
> called (i.e. remain in whatever state they were in the previous
> function or global state)?

store_parm_decls is called when parsing a function. store_parm_decls calls allocate_struct_function, which calls

  invoke_set_current_function_hook (fndecl);

which has

  /* Change optimization options if needed.  */
  if (optimization_current_node != opts)
    {
      optimization_current_node = opts;
      cl_optimization_restore (&global_options, &global_options_set,
			       TREE_OPTIMIZATION (opts));
    }

  targetm.set_current_function (fndecl);

which calls ix86_set_current_function after global_options has been updated. ix86_set_func_type is called from ix86_set_current_function. I don't see an issue with optimize and optimize_debug here.

> Also, shouldn't the lookup_attribute ("noreturn" check be the first
> one?  I mean, noreturn functions are quite rare and so checking all
> the other conditions upon each set_cfun could waste too much compile
> time.

I will fix it and update one testcase with

  __attribute__((noreturn, optimize("-Og")))

> Also, why check "noreturn" attribute rather than
> TREE_THIS_VOLATILE (fndecl)?

The comment above this code has

  NB: Don't use TREE_THIS_VOLATILE to check if this is a noreturn
  function.  The local-pure-const pass turns an interrupt function
  into a noreturn function by setting TREE_THIS_VOLATILE.  Normally
  the local-pure-const pass is run after ix86_set_func_type is called.
  When the local-pure-const pass is enabled for LTO, the interrupt
  function is marked as noreturn in the IR output, which leads the
  incompatible attribute error in LTO1.

Thanks.

--
H.J.
Re: [PATCH] x86: Save callee-saved registers in noreturn functions for -O0/-Og
On Sat, Jan 27, 2024 at 05:52:34AM -0800, H.J. Lu wrote:
> @@ -3391,7 +3392,9 @@ ix86_set_func_type (tree fndecl)
>      function is marked as noreturn in the IR output, which leads the
>      incompatible attribute error in LTO1.  */
>    bool has_no_callee_saved_registers
> -    = (((TREE_NOTHROW (fndecl) || !flag_exceptions)
> +    = ((optimize
> +	&& !optimize_debug

Shouldn't that be opt_for_fn (fndecl, optimize) and ditto for optimize_debug? I mean, aren't the options not restored yet when this function is called (i.e. remain in whatever state they were in the previous function or global state)?

Also, shouldn't the lookup_attribute ("noreturn" check be the first one? I mean, noreturn functions are quite rare and so checking all the other conditions upon each set_cfun could waste too much compile time.

Also, why check "noreturn" attribute rather than TREE_THIS_VOLATILE (fndecl)?

	Jakub
Re: [PATCH v4 0/4] When cmodel=extreme, add macro support and only support macros.
On 2024/1/27 at 7:11 PM, Xi Ruoyao wrote:
> On Sat, 2024-01-27 at 18:02 +0800, Xi Ruoyao wrote:
> > On Sat, 2024-01-27 at 11:15 +0800, chenglulu wrote:
> > > On 2024/1/26 at 6:57 PM, Xi Ruoyao wrote:
> > > > On Fri, 2024-01-26 at 16:59 +0800, chenglulu wrote:
> > > > > On 2024/1/26 at 4:49 PM, Xi Ruoyao wrote:
> > > > > > On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote:
> > > > > > > v3 -> v4:
> > > > > > > 1. Add macro support for TLS symbols
> > > > > > > 2. Added support for loading the __get_tls_addr symbol address
> > > > > > > using call36.
> > > > > > > 3. Merge template got_load_tls_{ld/gd/le/ie}.
> > > > > > > 4. Enable explicit reloc for extreme TLS GD/LD with
> > > > > > > -mexplicit-relocs=auto.
> > > > > > I've rebased and attached the patch to fix the bad split in
> > > > > > -mexplicit-relocs={always,auto} -mcmodel=extreme on top of this
> > > > > > series.  I've not tested it seriously though (only tested the
> > > > > > added and modified test cases).
> > > > > OK, I'll test the spec for correctness.
> > > > I suppose this still won't work yet because Binutils is not fully
> > > > fixed.  GAS has been changed not to emit R_LARCH_RELAX for
> > > > "la.tls.ie a0, t0, foo", but ld is still not checking if an
> > > > R_LARCH_RELAX is after R_LARCH_TLS_IE_PC_{HI20,LO12} properly.
> > > > Thus an invalid "partial" TLS transition can still happen.
> > > The following situations are not handled in the patch:
> > >
> > > diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
> > > index 3fab4b64453..6336a9f696f 100644
> > > --- a/gcc/config/loongarch/loongarch.cc
> > > +++ b/gcc/config/loongarch/loongarch.cc
> > > @@ -7472,7 +7472,13 @@ loongarch_output_mi_thunk (FILE *file, tree thunk_fndecl ATTRIBUTE_UNUSED,
> > >  {
> > >    if (TARGET_CMODEL_EXTREME)
> > >      {
> > > -      emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
> > > +      if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
> > > +	{
> > > +	  emit_insn (gen_la_pcrel64_two_parts (temp1, temp2, fnaddr));
> > > +	  emit_move_insn (temp1, gen_rtx_PLUS (Pmode, temp1, temp2));
> > > +	}
> > > +      else
> > > +	emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
> It looks like this part is unreachable: with -mcmodel=extreme
> use_sibcall_p will never be true.
So cleaned up this part and fixed an ERROR in the added test: diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 3a97ba61362..7b8c85a1606 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -7481,21 +7481,24 @@ loongarch_output_mi_thunk (FILE *file, tree thunk_fndecl ATTRIBUTE_UNUSED, allowed, otherwise load the address into a register first. */ if (use_sibcall_p) { - if (TARGET_CMODEL_EXTREME) - { - emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2)); - insn = emit_call_insn (gen_sibcall_internal (temp1, const0_rtx)); - } - else - insn = emit_call_insn (gen_sibcall_internal (fnaddr, const0_rtx)); + /* If TARGET_CMODEL_EXTREME, we cannot do a direct jump at all +and const_call_insn_operand should have returned false. */ + gcc_assert (!TARGET_CMODEL_EXTREME); + + insn = emit_call_insn (gen_sibcall_internal (fnaddr, const0_rtx)); SIBLING_CALL_P (insn) = 1; } else { - if (TARGET_CMODEL_EXTREME) + if (!TARGET_CMODEL_EXTREME) + loongarch_emit_move (temp1, fnaddr); + else if (la_opt_explicit_relocs == EXPLICIT_RELOCS_NONE) emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2)); else - loongarch_emit_move (temp1, fnaddr); + { + emit_insn (gen_la_pcrel64_two_parts (temp1, temp2, fnaddr)); + emit_move_insn (temp1, gen_rtx_PLUS (Pmode, temp1, temp2)); + } emit_jump_insn (gen_indirect_jump (temp1)); } diff --git a/gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c index 27baf4886d6..35bd4570a9e 100644 --- a/gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c +++ b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ /* { dg-options "-O2 -fPIC -mexplicit-relocs=auto -mcmodel=extreme -fno-plt" } */ -/* { dg-final { scan-assembler-not "la.tls.[lg]d" { target tls_native } } } */ +/* { dg-final { 
scan-assembler-not "la.tls.\[lg\]d" { target tls_native } } } */ #include "./explicit-relocs-auto-tls-ld-gd.c" And added 3 tests for output_mi_thunk. The updated patch attached, now running regression test. @@ -2870,20 +2872,30 @@ loongarch_call_tls_get_addr (rtx sym, enum loongarch_symbol_type type, rtx v0) { if (loongarch_explicit_relocs_p (SYMBOL_GOT_DISP)) { - rtx tmp1 = gen_reg_rtx (Pmode); - rtx high = gen_reg_rtx (Pmode); + gcc_assert (la_opt_explicit_relocs != + EXPLICIT_RELOCS_NONE); This operator is written at the end of the line, and I think there is no problem with anything else. But I need to see the
[PATCH] x86: Save callee-saved registers in noreturn functions for -O0/-Og
Save callee-saved registers in noreturn functions for -O0/-Og so that debugger can restore callee-saved registers in caller's frame. gcc/ PR target/38534 * config/i386/i386-options.cc (ix86_set_func_type): Save callee-saved registers in noreturn functions for -O0/-Og. gcc/testsuite/ PR target/38534 * gcc.target/i386/pr38534-5.c: New file. * gcc.target/i386/pr38534-6.c: Likewise. --- gcc/config/i386/i386-options.cc | 7 -- gcc/testsuite/gcc.target/i386/pr38534-5.c | 26 +++ gcc/testsuite/gcc.target/i386/pr38534-6.c | 26 +++ 3 files changed, 57 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-5.c create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-6.c diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc index 473f5359fc9..5ff5560df7a 100644 --- a/gcc/config/i386/i386-options.cc +++ b/gcc/config/i386/i386-options.cc @@ -3381,7 +3381,8 @@ static void ix86_set_func_type (tree fndecl) { /* No need to save and restore callee-saved registers for a noreturn - function with nothrow or compiled with -fno-exceptions. + function with nothrow or compiled with -fno-exceptions unless when + compiling with -O0 or -Og. NB: Don't use TREE_THIS_VOLATILE to check if this is a noreturn function. The local-pure-const pass turns an interrupt function @@ -3391,7 +3392,9 @@ ix86_set_func_type (tree fndecl) function is marked as noreturn in the IR output, which leads the incompatible attribute error in LTO1. 
*/ bool has_no_callee_saved_registers -= (((TREE_NOTHROW (fndecl) || !flag_exceptions) += ((optimize + && !optimize_debug + && (TREE_NOTHROW (fndecl) || !flag_exceptions) && lookup_attribute ("noreturn", DECL_ATTRIBUTES (fndecl))) || lookup_attribute ("no_callee_saved_registers", TYPE_ATTRIBUTES (TREE_TYPE (fndecl; diff --git a/gcc/testsuite/gcc.target/i386/pr38534-5.c b/gcc/testsuite/gcc.target/i386/pr38534-5.c new file mode 100644 index 000..91c0c0f8c59 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr38534-5.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-O0 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */ + +#define ARRAY_SIZE 256 + +extern int array[ARRAY_SIZE][ARRAY_SIZE][ARRAY_SIZE]; +extern int value (int, int, int) +#ifndef __x86_64__ +__attribute__ ((regparm(3))) +#endif +; + +void +__attribute__((noreturn)) +no_return_to_caller (void) +{ + unsigned i, j, k; + for (i = ARRAY_SIZE; i > 0; --i) +for (j = ARRAY_SIZE; j > 0; --j) + for (k = ARRAY_SIZE; k > 0; --k) + array[i - 1][j - 1][k - 1] = value (i, j, k); + while (1); +} + +/* { dg-final { scan-assembler "push" } } */ +/* { dg-final { scan-assembler-not "pop" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr38534-6.c b/gcc/testsuite/gcc.target/i386/pr38534-6.c new file mode 100644 index 000..756e1ec81f5 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr38534-6.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-Og -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */ + +#define ARRAY_SIZE 256 + +extern int array[ARRAY_SIZE][ARRAY_SIZE][ARRAY_SIZE]; +extern int value (int, int, int) +#ifndef __x86_64__ +__attribute__ ((regparm(3))) +#endif +; + +void +__attribute__((noreturn)) +no_return_to_caller (void) +{ + unsigned i, j, k; + for (i = ARRAY_SIZE; i > 0; --i) +for (j = ARRAY_SIZE; j > 0; --j) + for (k = ARRAY_SIZE; k > 0; --k) + array[i - 1][j - 1][k - 1] = value (i, j, k); + while (1); +} + +/* { dg-final { scan-assembler "push" } } */ +/* { dg-final 
{ scan-assembler-not "pop" } } */ -- 2.43.0
[aarch64] PR112950: gcc.target/aarch64/sve/acle/general/dupq_5.c fails on aarch64_be-linux-gnu
Hi,
The test passes the -mlittle-endian option but doesn't have a target check
for aarch64_little_endian and thus fails to compile on aarch64_be-linux-gnu.
The patch adds the missing aarch64_little_endian target check, which makes
the test unsupported on that target.
OK to commit?

Thanks,
Prathamesh

PR112950: Add aarch64_little_endian target check for dupq_5.c

gcc/testsuite/ChangeLog:
	PR target/112950
	* gcc.target/aarch64/sve/acle/general/dupq_5.c: Add
	aarch64_little_endian target check.

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
index 6ae8d4c60b2..1990412d0e5 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mlittle-endian" } */
+/* { dg-require-effective-target aarch64_little_endian } */
 
 #include
Re: [PATCH] lower-bitint: Fix up VIEW_CONVERT_EXPR handling in lower_mergeable_stmt [PR113568]
> On 27.01.2024 at 09:16, Jakub Jelinek wrote:
>
> Hi!
>
> We generally allow merging mergeable stmts with some final cast (but not
> further casts or mergeable operations after the cast).  As some casts
> are handled conditionally (if (idx < cst) handle_operand (idx); else if
> (idx == cst) handle_operand (cst); else ...), we must make sure that e.g.
> the mergeable PLUS_EXPR/MINUS_EXPR/NEGATE_EXPR never appear in
> handle_operand called from such casts, because it ICEs on invalid SSA_NAME
> form (that part could be fixable by adding further PHIs) but also because
> we'd need to correctly propagate the overflow flags from the if to else if.
> So, instead lower_mergeable_stmt handles an outermost widening cast (or
> a widening cast feeding an outermost store) specially.
> The problem was similar to PR113408: the VIEW_CONVERT_EXPR tree is
> present in the gimple_assign_rhs1 while it is not for NOP_EXPR/CONVERT_EXPR,
> so the checks for whether the outermost cast should be handled didn't
> handle the VCE case and so handle_plus_minus was called from the
> conditional handle_cast.
>
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?

Ok

Richard

> 2024-01-27  Jakub Jelinek
>
>	PR tree-optimization/113568
>	* gimple-lower-bitint.cc (bitint_large_huge::lower_mergeable_stmt):
>	For VIEW_CONVERT_EXPR use first operand of rhs1 instead of rhs1
>	in the widening extension checks.
>
>	* gcc.dg/bitint-78.c: New test.
>
> --- gcc/gimple-lower-bitint.cc.jj	2024-01-26 17:40:29.083814064 +0100
> +++ gcc/gimple-lower-bitint.cc	2024-01-26 18:05:24.461891138 +0100
> @@ -2401,6 +2401,8 @@ bitint_large_huge::lower_mergeable_stmt
>        rhs1 = gimple_assign_rhs1 (store_operand
>				  ? SSA_NAME_DEF_STMT (store_operand)
>				  : stmt);
> +      if (TREE_CODE (rhs1) == VIEW_CONVERT_EXPR)
> +	rhs1 = TREE_OPERAND (rhs1, 0);
>        /* Optimize mergeable ops ending with widening cast to _BitInt
>	   (or followed by store).  We can lower just the limbs of the
>	   cast operand and widen afterwards.
*/ > --- gcc/testsuite/gcc.dg/bitint-78.c.jj2024-01-26 18:11:54.164435951 +0100 > +++ gcc/testsuite/gcc.dg/bitint-78.c2024-01-26 18:11:33.642723218 +0100 > @@ -0,0 +1,21 @@ > +/* PR tree-optimization/113568 */ > +/* { dg-do compile { target bitint } } */ > +/* { dg-options "-O2 -std=c23" } */ > + > +signed char c; > +#if __BITINT_MAXWIDTH__ >= 464 > +_BitInt(464) g; > + > +void > +foo (void) > +{ > + _BitInt(464) a[2] = {}; > + _BitInt(464) b; > + while (c) > +{ > + b = g + 1; > + g = a[0]; > + a[0] = b; > +} > +} > +#endif > >Jakub >
Re: [PATCH] lower-bitint: Add debugging dump of SSA_NAME -> decl mappings
> On 27.01.2024 at 09:15, Jakub Jelinek wrote:
>
> Hi!
>
> While the SSA coalescing performed by lower bitint prints some information
> when -fdump-tree-bitintlower-details is used, it is really hard to read and
> doesn't contain the most important information one looks for when debugging
> bitint lowering issues, namely what VAR_DECLs (or PARM_DECLs/RESULT_DECLs)
> each SSA_NAME in the large_huge.m_names bitmap maps to.
>
> So, the following patch adds dumping of that, so that we know that say
>   _3 -> bitint.3
>   _8 -> bitint.7
>   _16 -> bitint.7
> etc.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

Richard

> 2024-01-27  Jakub Jelinek
>
>	* gimple-lower-bitint.cc (gimple_lower_bitint): For
>	TDF_DETAILS dump mapping of SSA_NAMEs to decls.
>
> --- gcc/gimple-lower-bitint.cc.jj	2024-01-26 00:07:35.629797857 +0100
> +++ gcc/gimple-lower-bitint.cc	2024-01-26 17:40:29.083814064 +0100
> @@ -6344,22 +6344,33 @@ gimple_lower_bitint (void)
>	    }
>	}
>        tree atype = NULL_TREE;
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +	fprintf (dump_file, "Mapping SSA_NAMEs to decls:\n");
>        EXECUTE_IF_SET_IN_BITMAP (large_huge.m_names, 0, i, bi)
>	{
>	  tree s = ssa_name (i);
>	  int p = var_to_partition (large_huge.m_map, s);
> -	  if (large_huge.m_vars[p] != NULL_TREE)
> -	    continue;
> -	  if (atype == NULL_TREE
> -	      || !tree_int_cst_equal (TYPE_SIZE (atype),
> -				      TYPE_SIZE (TREE_TYPE (s))))
> +	  if (large_huge.m_vars[p] == NULL_TREE)
>	    {
> -	      unsigned HOST_WIDE_INT nelts
> -		= tree_to_uhwi (TYPE_SIZE (TREE_TYPE (s))) / limb_prec;
> -	      atype = build_array_type_nelts (large_huge.m_limb_type, nelts);
> +	      if (atype == NULL_TREE
> +		  || !tree_int_cst_equal (TYPE_SIZE (atype),
> +					  TYPE_SIZE (TREE_TYPE (s))))
> +		{
> +		  unsigned HOST_WIDE_INT nelts
> +		    = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (s))) / limb_prec;
> +		  atype = build_array_type_nelts (large_huge.m_limb_type,
> +						  nelts);
> +		}
> +	      large_huge.m_vars[p] = create_tmp_var (atype, "bitint");
> +	      mark_addressable (large_huge.m_vars[p]);
> +	    }
> +	  if (dump_file && (dump_flags & TDF_DETAILS))
> +	    {
> +	      print_generic_expr (dump_file, s, TDF_SLIM);
> +	      fprintf (dump_file, " -> ");
> +	      print_generic_expr (dump_file, large_huge.m_vars[p], TDF_SLIM);
> +	      fprintf (dump_file, "\n");
>	    }
> -	  large_huge.m_vars[p] = create_tmp_var (atype, "bitint");
> -	  mark_addressable (large_huge.m_vars[p]);
>	}
>      }
>
>	Jakub
>
Re: [PATCH] lower-bitint: Avoid sign-extending cast to unsigned types feeding div/mod/float [PR113614]
> On 27.01.2024 at 09:18, Jakub Jelinek wrote:
>
> Hi!
>
> The following testcase is miscompiled, because some narrower value
> is sign-extended to a wider unsigned _BitInt used as a division operand.
> handle_operand_addr for that case returns the narrower value and
> precision -prec_of_narrower_value.  That works fine for multiplication
> (at least normal multiplication; we don't merge casts with
> .MUL_OVERFLOW or the ubsan multiplication right now), because the
> result is the same whether we treat the arguments as signed or unsigned.
> But it is completely wrong for division/modulo or conversions to
> floating point: if we pass a negative prec for an input operand of a
> libgcc handler, it treats the operand like a negative number, not an
> unsigned one sign-extended from something smaller (and it doesn't know
> to what precision it has been extended).
>
> So, the following patch fixes it by making sure we don't merge such
> sign-extensions to unsigned _BitInt type with division, modulo or
> conversions to floating point.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

Richard

> 2024-01-27  Jakub Jelinek
>
>	PR tree-optimization/113614
>	* gimple-lower-bitint.cc (gimple_lower_bitint): Don't merge
>	widening casts from signed to unsigned types with TRUNC_DIV_EXPR,
>	TRUNC_MOD_EXPR or FLOAT_EXPR uses.
>
>	* gcc.dg/torture/bitint-54.c: New test.
> > --- gcc/gimple-lower-bitint.cc.jj2024-01-26 18:05:24.461891138 +0100 > +++ gcc/gimple-lower-bitint.cc2024-01-26 19:04:07.948780942 +0100 > @@ -6102,17 +6102,27 @@ gimple_lower_bitint (void) > && (TREE_CODE (rhs1) != SSA_NAME > || !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rhs1))) >{ > - if (TREE_CODE (TREE_TYPE (rhs1)) != BITINT_TYPE > - || (bitint_precision_kind (TREE_TYPE (rhs1)) > - < bitint_prec_large)) > -continue; > if (is_gimple_assign (use_stmt)) >switch (gimple_assign_rhs_code (use_stmt)) > { > - case MULT_EXPR: > case TRUNC_DIV_EXPR: > case TRUNC_MOD_EXPR: > case FLOAT_EXPR: > +/* For division, modulo and casts to floating > + point, avoid representing unsigned operands > + using negative prec if they were sign-extended > + from narrower precision. */ > +if (TYPE_UNSIGNED (TREE_TYPE (s)) > +&& !TYPE_UNSIGNED (TREE_TYPE (rhs1)) > +&& (TYPE_PRECISION (TREE_TYPE (s)) > +> TYPE_PRECISION (TREE_TYPE (rhs1 > + goto force_name; > +/* FALLTHRU */ > + case MULT_EXPR: > +if (TREE_CODE (TREE_TYPE (rhs1)) != BITINT_TYPE > +|| (bitint_precision_kind (TREE_TYPE (rhs1)) > +< bitint_prec_large)) > + continue; >/* Uses which use handle_operand_addr can't > deal with nested casts. */ >if (TREE_CODE (rhs1) == SSA_NAME > @@ -6126,6 +6136,10 @@ gimple_lower_bitint (void) > default: >break; >} > + if (TREE_CODE (TREE_TYPE (rhs1)) != BITINT_TYPE > + || (bitint_precision_kind (TREE_TYPE (rhs1)) > + < bitint_prec_large)) > +continue; > if ((TYPE_PRECISION (TREE_TYPE (rhs1)) > >= TYPE_PRECISION (TREE_TYPE (s))) > && mergeable_op (use_stmt)) > --- gcc/testsuite/gcc.dg/torture/bitint-54.c.jj2024-01-26 > 19:09:01.436688318 +0100 > +++ gcc/testsuite/gcc.dg/torture/bitint-54.c2024-01-26 19:16:24.908504368 > +0100 > @@ -0,0 +1,29 @@ > +/* PR tree-optimization/113614 */ > +/* { dg-do run { target bitint } } */ > +/* { dg-options "-std=c23 -pedantic-errors" } */ > +/* { dg-skip-if "" { ! run_expensive_tests } { "*" } { "-O0" "-O2" } } */ > +/* { dg-skip-if "" { ! 
run_expensive_tests } { "-flto" } { "" } } */ > + > +_BitInt(8) a; > +_BitInt(8) b; > +_BitInt(8) c; > + > +#if __BITINT_MAXWIDTH__ >= 256 > +_BitInt(256) > +foo (_BitInt(8) y, unsigned _BitInt(256) z) > +{ > + unsigned _BitInt(256) d = -y; > + z /= d; > + return z + a + b + c; > +} > +#endif > + > +int > +main () > +{ > +#if __BITINT_MAXWIDTH__ >= 256 > + if (foo (0xfwb, 0x24euwb)) > +__builtin_abort (); > +#endif > + return 0; > +} > >Jakub >
Re: [PATCH v3 0/2] x86: Don't save callee-saved registers if not needed
On Wed, Jan 24, 2024 at 7:36 PM Hongtao Liu wrote: > > On Tue, Jan 23, 2024 at 11:00 PM H.J. Lu wrote: > > > > Changes in v3: > > > > 1. Rebase against commit 02e68389494 > > 2. Don't add call_no_callee_saved_registers to machine_function since > > all callee-saved registers are properly clobbered by callee with > > no_callee_saved_registers attribute. > > > The patch LGTM, it should be low risk since there's already > no_caller_save_registers attribute, the patch just extends to > no_callee_save_registers with the same approach. > So if there's no objection(or any concerns) in the next couple days, > I'm ok for the patch to be in GCC14 and backport. I am checking it in. Thanks. H.J. > > Changes in v2: > > > > 1. Rebase against commit f9df00340e3 > > 2. Don't add redundant clobbered_registers check in ix86_expand_call. > > > > In some cases, there are no need to save callee-saved registers: > > > > 1. If a noreturn function doesn't throw nor support exceptions, it can > > skip saving callee-saved registers. > > > > 2. When an interrupt handler is implemented by an assembly stub which does: > > > > 1. Save all registers. > > 2. Call a C function. > > 3. Restore all registers. > > 4. Return from interrupt. > > > > it is completely unnecessary to save and restore any registers in the C > > function called by the assembly stub, even if they would normally be > > callee-saved. > > > > This patch set adds no_callee_saved_registers function attribute, which > > is complementary to no_caller_saved_registers function attribute, to > > classify x86 backend call-saved register handling type with > > > > 1. Default call-saved registers. > > 2. No caller-saved registers with no_caller_saved_registers attribute. > > 3. No callee-saved registers with no_callee_saved_registers attribute. > > > > Functions of no callee-saved registers won't save callee-saved registers. 
> > If a noreturn function doesn't throw nor support exceptions, it is > > classified as the no callee-saved registers type. > > > > With these changes, __libc_start_main in glibc 2.39, which is a noreturn > > function, is changed from > > > > __libc_start_main: > > endbr64 > > push %r15 > > push %r14 > > mov%rcx,%r14 > > push %r13 > > push %r12 > > push %rbp > > mov%esi,%ebp > > push %rbx > > mov%rdx,%rbx > > sub$0x28,%rsp > > mov%rdi,(%rsp) > > mov%fs:0x28,%rax > > mov%rax,0x18(%rsp) > > xor%eax,%eax > > test %r9,%r9 > > > > to > > > > __libc_start_main: > > endbr64 > > sub$0x28,%rsp > > mov%esi,%ebp > > mov%rdx,%rbx > > mov%rcx,%r14 > > mov%rdi,(%rsp) > > mov%fs:0x28,%rax > > mov%rax,0x18(%rsp) > > xor%eax,%eax > > test %r9,%r9 > > > > In Linux kernel 6.7.0 on x86-64, do_exit is changed from > > > > do_exit: > > endbr64 > > call > > push %r15 > > push %r14 > > push %r13 > > push %r12 > > mov%rdi,%r12 > > push %rbp > > push %rbx > > mov%gs:0x0,%rbx > > sub$0x28,%rsp > > mov%gs:0x28,%rax > > mov%rax,0x20(%rsp) > > xor%eax,%eax > > call *0x0(%rip)# > > test $0x2,%ah > > je > > > > to > > > > do_exit: > > endbr64 > > call > > sub$0x28,%rsp > > mov%rdi,%r12 > > mov%gs:0x28,%rax > > mov%rax,0x20(%rsp) > > xor%eax,%eax > > mov%gs:0x0,%rbx > > call *0x0(%rip)# > > test $0x2,%ah > > je > > > > I compared GCC master branch bootstrap and test times on a slow machine > > with 6.6 Linux kernels compiled with the original GCC 13 and the GCC 13 > > with the backported patch. The performance data isn't precise since the > > measurements were done on different days with different GCC sources under > > different 6.6 kernel versions. 
> > > > GCC master branch build time in seconds: > > > > beforeafter improvement > > 30043.75user 30013.16user 0% > > 1274.85system 1243.72system 2.4% > > > > GCC master branch test time in seconds (new tests added): > > > > beforeafter improvement > > 216035.90user 216547.51user 0 > > 27365.51system26658.54system 2.6% > > > > Backported to GCC 13 to rebuild system glibc and kernel on Fedora 39. > > Systems perform normally. > > > > > > H.J. Lu (2): > > x86: Add no_callee_saved_registers function attribute > > x86: Don't save callee-saved registers in noreturn functions > > > > gcc/config/i386/i386-expand.cc| 52 +--- > > gcc/config/i386/i386-options.cc
Re: [PATCH v4 0/4] When cmodel=extreme, add macro support and only support macros.
On Sat, 2024-01-27 at 18:02 +0800, Xi Ruoyao wrote: > On Sat, 2024-01-27 at 11:15 +0800, chenglulu wrote: > > > > 在 2024/1/26 下午6:57, Xi Ruoyao 写道: > > > On Fri, 2024-01-26 at 16:59 +0800, chenglulu wrote: > > > > 在 2024/1/26 下午4:49, Xi Ruoyao 写道: > > > > > On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote: > > > > > > v3 -> v4: > > > > > > 1. Add macro support for TLS symbols > > > > > > 2. Added support for loading __get_tls_addr symbol address > > > > > > using call36. > > > > > > 3. Merge template got_load_tls_{ld/gd/le/ie}. > > > > > > 4. Enable explicit reloc for extreme TLS GD/LD with > > > > > > -mexplicit-relocs=auto. > > > > > I've rebased and attached the patch to fix the bad split in > > > > > -mexplicit- > > > > > relocs={always,auto} -mcmodel=extreme on top of this series. I've not > > > > > tested it seriously though (only tested the added and modified test > > > > > cases). > > > > > > > > > OK, I'll test the spec for correctness. > > > I suppose this still won't work yet because Binutils is not fully fixed. > > > GAS has been changed not to emit R_LARCH_RELAX for "la.tls.ie a0, t0, > > > foo", but ld is still not checking if an R_LARCH_RELAX is after > > > R_LARCH_TLS_IE_PC_{HI20,LO12} properly. Thus an invalid "partial" TLS > > > transition can still happen. 
> > > > > > > The following situations are not handled in the patch: > > > > diff --git a/gcc/config/loongarch/loongarch.cc > > b/gcc/config/loongarch/loongarch.cc > > > > index 3fab4b64453..6336a9f696f 100644 > > --- a/gcc/config/loongarch/loongarch.cc > > +++ b/gcc/config/loongarch/loongarch.cc > > @@ -7472,7 +7472,13 @@ loongarch_output_mi_thunk (FILE *file, tree > > thunk_fndecl ATTRIBUTE_UNUSED, > > { > > if (TARGET_CMODEL_EXTREME) > > { > > - emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2)); > > + if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE) > > + { > > + emit_insn (gen_la_pcrel64_two_parts (temp1, temp2, fnaddr)); > > + emit_move_insn (temp1, gen_rtx_PLUS (Pmode, temp1, temp2)); > > + } > > + else > > + emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2)); It looks like this part is unreachable: with -mcmodel=extreme use_sibcall_p will never be true. So cleaned up this part and fixed an ERROR in the added test: diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 3a97ba61362..7b8c85a1606 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -7481,21 +7481,24 @@ loongarch_output_mi_thunk (FILE *file, tree thunk_fndecl ATTRIBUTE_UNUSED, allowed, otherwise load the address into a register first. */ if (use_sibcall_p) { - if (TARGET_CMODEL_EXTREME) - { - emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2)); - insn = emit_call_insn (gen_sibcall_internal (temp1, const0_rtx)); - } - else - insn = emit_call_insn (gen_sibcall_internal (fnaddr, const0_rtx)); + /* If TARGET_CMODEL_EXTREME, we cannot do a direct jump at all +and const_call_insn_operand should have returned false. 
*/ + gcc_assert (!TARGET_CMODEL_EXTREME); + + insn = emit_call_insn (gen_sibcall_internal (fnaddr, const0_rtx)); SIBLING_CALL_P (insn) = 1; } else { - if (TARGET_CMODEL_EXTREME) + if (!TARGET_CMODEL_EXTREME) + loongarch_emit_move (temp1, fnaddr); + else if (la_opt_explicit_relocs == EXPLICIT_RELOCS_NONE) emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2)); else - loongarch_emit_move (temp1, fnaddr); + { + emit_insn (gen_la_pcrel64_two_parts (temp1, temp2, fnaddr)); + emit_move_insn (temp1, gen_rtx_PLUS (Pmode, temp1, temp2)); + } emit_jump_insn (gen_indirect_jump (temp1)); } diff --git a/gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c index 27baf4886d6..35bd4570a9e 100644 --- a/gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c +++ b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ /* { dg-options "-O2 -fPIC -mexplicit-relocs=auto -mcmodel=extreme -fno-plt" } */ -/* { dg-final { scan-assembler-not "la.tls.[lg]d" { target tls_native } } } */ +/* { dg-final { scan-assembler-not "la.tls.\[lg\]d" { target tls_native } } } */ #include "./explicit-relocs-auto-tls-ld-gd.c" And added 3 tests for output_mi_thunk. The updated patch attached, now running regression test. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University From ecbadf341234fcec2e0c16e6b2435d117bf80446 Mon Sep 17 00:00:00 2001 From: Xi Ruoyao Date: Fri, 5 Jan 2024 18:40:06 +0800 Subject: [PATCH 5/4] LoongArch: Don't split the instructions
Re: [PATCH v4 0/4] When cmodel=extreme, add macro support and only support macros.
On Sat, 2024-01-27 at 11:15 +0800, chenglulu wrote:
>
> On 2024/1/26 at 6:57 PM, Xi Ruoyao wrote:
> > On Fri, 2024-01-26 at 16:59 +0800, chenglulu wrote:
> > > On 2024/1/26 at 4:49 PM, Xi Ruoyao wrote:
> > > > On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote:
> > > > > v3 -> v4:
> > > > > 1. Add macro support for TLS symbols
> > > > > 2. Added support for loading the __get_tls_addr symbol address
> > > > > using call36.
> > > > > 3. Merge template got_load_tls_{ld/gd/le/ie}.
> > > > > 4. Enable explicit reloc for extreme TLS GD/LD with
> > > > > -mexplicit-relocs=auto.
> > > > I've rebased and attached the patch to fix the bad split in
> > > > -mexplicit-relocs={always,auto} -mcmodel=extreme on top of this
> > > > series.  I've not tested it seriously though (only tested the added
> > > > and modified test cases).
> > > OK, I'll test the spec for correctness.
> > I suppose this still won't work yet because Binutils is not fully fixed.
> > GAS has been changed not to emit R_LARCH_RELAX for "la.tls.ie a0, t0,
> > foo", but ld is still not checking if an R_LARCH_RELAX is after
> > R_LARCH_TLS_IE_PC_{HI20,LO12} properly.  Thus an invalid "partial" TLS
> > transition can still happen.
> > > > The following situations are not handled in the patch: > > diff --git a/gcc/config/loongarch/loongarch.cc > b/gcc/config/loongarch/loongarch.cc > > index 3fab4b64453..6336a9f696f 100644 > --- a/gcc/config/loongarch/loongarch.cc > +++ b/gcc/config/loongarch/loongarch.cc > @@ -7472,7 +7472,13 @@ loongarch_output_mi_thunk (FILE *file, tree > thunk_fndecl ATTRIBUTE_UNUSED, > { > if (TARGET_CMODEL_EXTREME) > { > - emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2)); > + if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE) > + { > + emit_insn (gen_la_pcrel64_two_parts (temp1, temp2, fnaddr)); > + emit_move_insn (temp1, gen_rtx_PLUS (Pmode, temp1, temp2)); > + } > + else > + emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2)); > insn = emit_call_insn (gen_sibcall_internal (temp1, const0_rtx)); > } > else > @@ -7482,7 +7488,15 @@ loongarch_output_mi_thunk (FILE *file, tree > thunk_fndecl ATTRIBUTE_UNUSED, > else > { > if (TARGET_CMODEL_EXTREME) > - emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2)); > + { > + if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE) > + { > + emit_insn (gen_la_pcrel64_two_parts (temp1, temp2, fnaddr)); > + emit_move_insn (temp1, gen_rtx_PLUS (Pmode, temp1, temp2)); > + } > + else > + emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2)); > + } > else > loongarch_emit_move (temp1, fnaddr); In deed. Considering the similarity of these two hunks I'll separate the logic into a static function though. And I'll also add some test case for them... -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
[PATCH] lower-bitint: Avoid sign-extending cast to unsigned types feeding div/mod/float [PR113614]
Hi!

The following testcase is miscompiled, because some narrower value
is sign-extended to a wider unsigned _BitInt used as a division operand.
handle_operand_addr for that case returns the narrower value and
precision -prec_of_narrower_value.  That works fine for multiplication
(at least normal multiplication; we don't merge casts with
.MUL_OVERFLOW or the ubsan multiplication right now), because the
result is the same whether we treat the arguments as signed or unsigned.
But it is completely wrong for division/modulo or conversions to
floating point: if we pass a negative prec for an input operand of a
libgcc handler, it treats the operand like a negative number, not an
unsigned one sign-extended from something smaller (and it doesn't know
to what precision it has been extended).

So, the following patch fixes it by making sure we don't merge such
sign-extensions to unsigned _BitInt type with division, modulo or
conversions to floating point.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-01-27  Jakub Jelinek

	PR tree-optimization/113614
	* gimple-lower-bitint.cc (gimple_lower_bitint): Don't merge
	widening casts from signed to unsigned types with TRUNC_DIV_EXPR,
	TRUNC_MOD_EXPR or FLOAT_EXPR uses.

	* gcc.dg/torture/bitint-54.c: New test.

--- gcc/gimple-lower-bitint.cc.jj	2024-01-26 18:05:24.461891138 +0100
+++ gcc/gimple-lower-bitint.cc	2024-01-26 19:04:07.948780942 +0100
@@ -6102,17 +6102,27 @@ gimple_lower_bitint (void)
		 && (TREE_CODE (rhs1) != SSA_NAME
		     || !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rhs1)))
	  {
-	    if (TREE_CODE (TREE_TYPE (rhs1)) != BITINT_TYPE
-		|| (bitint_precision_kind (TREE_TYPE (rhs1))
-		    < bitint_prec_large))
-	      continue;
	    if (is_gimple_assign (use_stmt))
	      switch (gimple_assign_rhs_code (use_stmt))
		{
-		case MULT_EXPR:
		case TRUNC_DIV_EXPR:
		case TRUNC_MOD_EXPR:
		case FLOAT_EXPR:
+		  /* For division, modulo and casts to floating
+		     point, avoid representing unsigned operands
+		     using negative prec if they were sign-extended
+		     from narrower precision.
*/ + if (TYPE_UNSIGNED (TREE_TYPE (s)) + && !TYPE_UNSIGNED (TREE_TYPE (rhs1)) + && (TYPE_PRECISION (TREE_TYPE (s)) + > TYPE_PRECISION (TREE_TYPE (rhs1 + goto force_name; + /* FALLTHRU */ + case MULT_EXPR: + if (TREE_CODE (TREE_TYPE (rhs1)) != BITINT_TYPE + || (bitint_precision_kind (TREE_TYPE (rhs1)) + < bitint_prec_large)) + continue; /* Uses which use handle_operand_addr can't deal with nested casts. */ if (TREE_CODE (rhs1) == SSA_NAME @@ -6126,6 +6136,10 @@ gimple_lower_bitint (void) default: break; } + if (TREE_CODE (TREE_TYPE (rhs1)) != BITINT_TYPE + || (bitint_precision_kind (TREE_TYPE (rhs1)) + < bitint_prec_large)) + continue; if ((TYPE_PRECISION (TREE_TYPE (rhs1)) >= TYPE_PRECISION (TREE_TYPE (s))) && mergeable_op (use_stmt)) --- gcc/testsuite/gcc.dg/torture/bitint-54.c.jj 2024-01-26 19:09:01.436688318 +0100 +++ gcc/testsuite/gcc.dg/torture/bitint-54.c2024-01-26 19:16:24.908504368 +0100 @@ -0,0 +1,29 @@ +/* PR tree-optimization/113614 */ +/* { dg-do run { target bitint } } */ +/* { dg-options "-std=c23 -pedantic-errors" } */ +/* { dg-skip-if "" { ! run_expensive_tests } { "*" } { "-O0" "-O2" } } */ +/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */ + +_BitInt(8) a; +_BitInt(8) b; +_BitInt(8) c; + +#if __BITINT_MAXWIDTH__ >= 256 +_BitInt(256) +foo (_BitInt(8) y, unsigned _BitInt(256) z) +{ + unsigned _BitInt(256) d = -y; + z /= d; + return z + a + b + c; +} +#endif + +int +main () +{ +#if __BITINT_MAXWIDTH__ >= 256 + if (foo (0xfwb, 0x24euwb)) +__builtin_abort (); +#endif + return 0; +} Jakub
[PATCH] lower-bitint: Fix up VIEW_CONVERT_EXPR handling in lower_mergeable_stmt [PR113568]
Hi!

We generally allow merging mergeable stmts with some final cast (but not
further casts or mergeable operations after the cast).  As some casts are
handled conditionally,
  if (idx < cst) handle_operand (idx);
  else if (idx == cst) handle_operand (cst);
  else ...
we must make sure that e.g. the mergeable PLUS_EXPR/MINUS_EXPR/NEGATE_EXPR
never appears in handle_operand called from such casts, because it ICEs on
invalid SSA_NAME form (that part could be fixable by adding further PHIs),
but also because we'd need to correctly propagate the overflow flags from
the if to the else if.  So, instead lower_mergeable_stmt handles an
outermost widening cast (or a widening cast feeding an outermost store)
specially.

The problem was similar to PR113408: the VIEW_CONVERT_EXPR tree is present
in gimple_assign_rhs1 while it is not for NOP_EXPR/CONVERT_EXPR, so the
checks whether the outermost cast should be handled didn't handle the VCE
case and handle_plus_minus was called from the conditional handle_cast.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok
for trunk?

2024-01-27  Jakub Jelinek

	PR tree-optimization/113568
	* gimple-lower-bitint.cc (bitint_large_huge::lower_mergeable_stmt):
	For VIEW_CONVERT_EXPR use first operand of rhs1 instead of rhs1
	in the widening extension checks.

	* gcc.dg/bitint-78.c: New test.

--- gcc/gimple-lower-bitint.cc.jj	2024-01-26 17:40:29.083814064 +0100
+++ gcc/gimple-lower-bitint.cc	2024-01-26 18:05:24.461891138 +0100
@@ -2401,6 +2401,8 @@ bitint_large_huge::lower_mergeable_stmt
   rhs1 = gimple_assign_rhs1 (store_operand
			      ? SSA_NAME_DEF_STMT (store_operand) : stmt);
+  if (TREE_CODE (rhs1) == VIEW_CONVERT_EXPR)
+    rhs1 = TREE_OPERAND (rhs1, 0);
   /* Optimize mergeable ops ending with widening cast to _BitInt
      (or followed by store).  We can lower just the limbs of the
      cast operand and widen afterwards.  */
--- gcc/testsuite/gcc.dg/bitint-78.c.jj	2024-01-26 18:11:54.164435951 +0100
+++ gcc/testsuite/gcc.dg/bitint-78.c	2024-01-26 18:11:33.642723218 +0100
@@ -0,0 +1,21 @@
+/* PR tree-optimization/113568 */
+/* { dg-do compile { target bitint } } */
+/* { dg-options "-O2 -std=c23" } */
+
+signed char c;
+#if __BITINT_MAXWIDTH__ >= 464
+_BitInt(464) g;
+
+void
+foo (void)
+{
+  _BitInt(464) a[2] = {};
+  _BitInt(464) b;
+  while (c)
+    {
+      b = g + 1;
+      g = a[0];
+      a[0] = b;
+    }
+}
+#endif

	Jakub
[PATCH] lower-bitint: Add debugging dump of SSA_NAME -> decl mappings
Hi!

While the SSA coalescing performed by lower bitint prints some information
with -fdump-tree-bitintlower-details, it is really hard to read and
doesn't contain the most important information one looks for when
debugging bitint lowering issues, namely what VAR_DECLs (or
PARM_DECLs/RESULT_DECLs) each SSA_NAME in the large_huge.m_names bitmap
maps to.

So, the following patch adds dumping of that, so that we know that say
  _3 -> bitint.3
  _8 -> bitint.7
  _16 -> bitint.7
etc.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-01-27  Jakub Jelinek

	* gimple-lower-bitint.cc (gimple_lower_bitint): For TDF_DETAILS
	dump mapping of SSA_NAMEs to decls.

--- gcc/gimple-lower-bitint.cc.jj	2024-01-26 00:07:35.629797857 +0100
+++ gcc/gimple-lower-bitint.cc	2024-01-26 17:40:29.083814064 +0100
@@ -6344,22 +6344,33 @@ gimple_lower_bitint (void)
 	    }
 	}
       tree atype = NULL_TREE;
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "Mapping SSA_NAMEs to decls:\n");
       EXECUTE_IF_SET_IN_BITMAP (large_huge.m_names, 0, i, bi)
	{
	  tree s = ssa_name (i);
	  int p = var_to_partition (large_huge.m_map, s);
-	  if (large_huge.m_vars[p] != NULL_TREE)
-	    continue;
-	  if (atype == NULL_TREE
-	      || !tree_int_cst_equal (TYPE_SIZE (atype),
-				      TYPE_SIZE (TREE_TYPE (s))))
+	  if (large_huge.m_vars[p] == NULL_TREE)
	    {
-	      unsigned HOST_WIDE_INT nelts
-		= tree_to_uhwi (TYPE_SIZE (TREE_TYPE (s))) / limb_prec;
-	      atype = build_array_type_nelts (large_huge.m_limb_type, nelts);
+	      if (atype == NULL_TREE
+		  || !tree_int_cst_equal (TYPE_SIZE (atype),
+					  TYPE_SIZE (TREE_TYPE (s))))
+		{
+		  unsigned HOST_WIDE_INT nelts
+		    = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (s))) / limb_prec;
+		  atype = build_array_type_nelts (large_huge.m_limb_type,
+						  nelts);
+		}
+	      large_huge.m_vars[p] = create_tmp_var (atype, "bitint");
+	      mark_addressable (large_huge.m_vars[p]);
+	    }
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    {
+	      print_generic_expr (dump_file, s, TDF_SLIM);
+	      fprintf (dump_file, " -> ");
+	      print_generic_expr (dump_file, large_huge.m_vars[p], TDF_SLIM);
+	      fprintf (dump_file, "\n");
	    }
-	  large_huge.m_vars[p] = create_tmp_var (atype, "bitint");
-	  mark_addressable (large_huge.m_vars[p]);
	}
     }

	Jakub
[PATCH] libgcc: Fix up _BitInt division [PR113604]
Hi!

The following testcase ends up with SIGFPE in __divmodbitint4.
The problem is a thinko in my attempt to implement Knuth's algorithm.
The algorithm does (where b is 65536, i.e. one larger than what fits in
their unsigned short word):

      // Compute estimate qhat of q[j].
      qhat = (un[j+n]*b + un[j+n-1])/vn[n-1];
      rhat = (un[j+n]*b + un[j+n-1]) - qhat*vn[n-1];
    again:
      if (qhat >= b || qhat*vn[n-2] > b*rhat + un[j+n-2])
	{
	  qhat = qhat - 1;
	  rhat = rhat + vn[n-1];
	  if (rhat < b)
	    goto again;
	}

The problem is that it uses a double-word / word -> double-word division
(and modulo), while all we have is udiv_qrnnd unless we'd want to do
further library calls, and udiv_qrnnd is a double-word / word -> word
division and modulo.

Now, as the algorithm description says, it can produce at most word bits
+ 1 bit of quotient, and I believe the highest qhat the original
algorithm can actually produce is (1 << word_bits) + 1.  The algorithm
performs earlier canonicalization where both the divisor and the dividend
are shifted left such that the divisor has its msb set.  If it had the
msb set already before, no shifting occurs but we start with an added 0
limb, so in the first uv1:uv0 double-word uv1 is 0 and we can't get a
too high qhat; if shifting occurs, the first limb of the dividend is
shifted right by UWtype bits - shift count into a new limb, so again in
the first iteration uv1 in the uv1:uv0 double-word doesn't have its msb
set while vv1 does, and qhat has to fit into a word.

In the following iterations, the previous iteration should guarantee
that the previous quotient digit is correct.  Even if the divisor was
the maximal possible vv1:all_ones_in_all_lower_limbs, if the old
uv0:lower_limbs were larger than or equal to the divisor, the previous
quotient digit would increase and another copy of the divisor would be
subtracted, which I think implies that in the next iteration uv1 <= vv1
in the uv1:uv0 double-word, but uv0 can be up to all ones, e.g. in case
all the lower limbs of the divisor are all ones and at least one
dividend limb below uv0 is not all ones.  So, for 64-bit UWtype we can
e.g. see

  uv1:uv0 / vv1
  0x8000000000000000UL:0xffffffffffffffffUL / 0x8000000000000000UL
or
  0xffffffffffffffffUL:0xffffffffffffffffUL / 0xffffffffffffffffUL

In all these cases (when uv1 == vv1 && uv0 >= uv1), qhat is
0x10000000000000001UL, i.e. 2 more than fits into the UWtype result; if
uv1 == vv1 && uv0 < uv1, it would be 0x10000000000000000UL, i.e. 1 more
than fits into the UWtype result.

Because we only have udiv_qrnnd, which can't deal with those too large
cases (it SIGFPEs or otherwise invokes undefined behavior on them), I've
tried to handle the uv1 >= vv1 case separately, but for one thing I
thought it would be at most 1 larger than what fits, and for two I
actually subtracted vv1:vv1 from uv1:uv0 instead of subtracting 0:vv1
from uv1:uv0.

For the uv1 < vv1 case, the implementation already performs roughly what
the algorithm does.  Now, let's see what happens with the two possible
extra cases in the original algorithm.

If uv1 == vv1 && uv0 < uv1, qhat above would be b, so we take the
if (qhat >= b || ...) branch, decrement qhat by 1 (it becomes b - 1),
add vn[n-1] aka vv1 to rhat and goto again if rhat < b (but because qhat
already fits, we can goto the again label in the uv1 < vv1 code).  rhat
in this case is uv0, and rhat + vv1 can but doesn't have to overflow:
say for uv0 42UL and vv1 0x8000000000000000UL it will not (and so we
should goto again), while for uv0 0x8000000000000000UL and vv1
0x8000000000000001UL it will (and we shouldn't goto again).

If uv1 == vv1 && uv0 >= uv1, qhat above would be b + 1, so we take the
if (qhat >= b || ...) branch, decrement qhat by 1 (it becomes b) and add
vn[n-1] aka vv1 to rhat.  But because vv1 has its msb set and rhat in
this case is uv0 - vv1, the rhat + vv1 addition certainly doesn't
overflow, because (uv0 - vv1) + vv1 is uv0; so in the algorithm we goto
again, again take the if (qhat >= b || ...) branch and decrement qhat so
it finally becomes b - 1, and add vn[n-1] aka vv1 to rhat again.  But
this time I believe it must always overflow, simply because we've added
(uv0 - vv1) + vv1 + vv1 and vv1 has its msb set, so already vv1 + vv1
must overflow.  And because it overflowed, it will not goto again.

So, I believe the following patch implements this correctly, by
subtracting vv1 from the uv1:uv0 double-word once and then comparing
again whether uv1 >= vv1.  If that is true, subtract vv1 from uv1:uv0
again and add 2 * vv1 to rhat; no __builtin_add_overflow is needed
there, as we know it always overflowed and so won't goto again.  If
after the first subtraction uv1 < vv1, use __builtin_add_overflow when
adding vv1 to rhat, because it can but doesn't have to overflow.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-01-27  Jakub Jelinek

	PR libgcc/113604
	* libgcc2.c (__divmodbitint4): If uv1 >= vv1, subtract vv1
	from uv1:uv0 once or twice as needed, rather than subtracting
	vv1:vv1.
	*