Re: [PATCH 0/6] v2 of libdiagnostics

2024-01-27 Thread Simon Sobisch

Hi David - and thanks for posting an outline for libdiagnostics at
https://gcc.gnu.org/wiki/libdiagnostics

Currently this shows both libdiagnostics and libdiagnostics-sarif-dump 
integrated into GCC. Is this the plan or would those be available as a 
top-level project (the program as an example for the library), possibly 
with the library sources also pushed to GCC?


Oh, and one question as I stumbled over that today: Would libdiagnostics 
now (or in the future) use libtextstyle for formatting (and another 
possible sink: HTML)?


Simon

On 23.11.2023 at 18:36, Pedro Alves wrote:

Hi David,

On 2023-11-21 22:20, David Malcolm wrote:

Here's v2 of the "libdiagnostics" shared library idea; see:
   https://gcc.gnu.org/wiki/libdiagnostics

As in v1, patch 1 (for GCC) shows libdiagnostic.h (the public
header file), along with examples of simple self-contained programs that
show various uses of the API.

As in v1, patch 2 (for GCC) is the work-in-progress implementation.

Patch 3 (for GCC) adds a new libdiagnostics++.h, a wrapper API providing
some syntactic sugar when using the API from C++.  I've been using this
to "eat my own dogfood" and write a simple SARIF-dumping tool:
   https://github.com/davidmalcolm/libdiagnostics-sarif-dump

Patch 4 (for GCC) is an internal change needed by patch 1.

Patch 5 (for GCC) updates GCC's source printing code so that when
there's no column information, we don't print annotation lines.  This
fixes the extra lines seen when using it from gas, discussed in:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635575.html

Patch 6 (for binutils) is an updated version of the experiment at using
the API from gas.

Thoughts?


Do you have plans on making this a top-level library instead?  That would
allow easily making it a non-optional dependency for binutils, as we could
have the library in the binutils-gdb repo as well, for instance.  From the
Cauldron discussion I understood that the diagnostics stuff doesn't depend
on much of GCC's data structures, and doesn't rely on the garbage
collector.  Is there something preventing that?  (Other than
"it's-a-matter-of-time/effort", of course.)

Pedro Alves




[PATCH] c++/modules: Handle error header names in modules [PR107594]

2024-01-27 Thread Nathaniel Shead
I don't provide a new test because this error only happens when there
are no include paths at all, and I haven't worked out a way to get this
to happen within DejaGNU (as it adds a number of `-B` and `-I` flags).

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

When there are no include paths while preprocessing a header-name token,
an empty STRING_CST is returned. This patch ensures this is handled when
attempting to create a module for this name.

PR c++/107594

gcc/cp/ChangeLog:

* module.cc (get_module): Bail on empty name.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/module.cc | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 840c7ef6dab..3c2fef0e3f4 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -14050,6 +14050,12 @@ get_primary (module_state *parent)
 module_state *
 get_module (tree name, module_state *parent, bool partition)
 {
+  /* We might be given an empty NAME if preprocessing fails to handle
+ a header-name token.  */
+  if (name && TREE_CODE (name) == STRING_CST
+  && TREE_STRING_LENGTH (name) == 0)
+return nullptr;
+
   if (partition)
 {
   if (!parent)
-- 
2.43.0



[PATCH] x86: Generate .cfi_undefined for unsaved callee-saved registers

2024-01-27 Thread H.J. Lu
When assembler directives for DWARF frame unwind are enabled, generate
the .cfi_undefined directive for unsaved callee-saved registers which
have been used in the function.

gcc/

PR target/38534
* config/i386/i386.cc (ix86_post_cfi_startproc): New.
(TARGET_ASM_POST_CFI_STARTPROC): Likewise.

gcc/testsuite/

PR target/38534
* gcc.target/i386/no-callee-saved-19.c: New test.
* gcc.target/i386/no-callee-saved-20.c: Likewise.
* gcc.target/i386/pr38534-7.c: Likewise.
* gcc.target/i386/pr38534-8.c: Likewise.
---
 gcc/config/i386/i386.cc   | 37 +++
 .../gcc.target/i386/no-callee-saved-19.c  | 17 +
 .../gcc.target/i386/no-callee-saved-20.c  | 12 ++
 gcc/testsuite/gcc.target/i386/pr38534-7.c | 18 +
 gcc/testsuite/gcc.target/i386/pr38534-8.c | 13 +++
 5 files changed, 97 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-19.c
 create mode 100644 gcc/testsuite/gcc.target/i386/no-callee-saved-20.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-8.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index b3e7c74846e..d4c10a5ef9b 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -22662,6 +22662,40 @@ x86_output_mi_thunk (FILE *file, tree thunk_fndecl, HOST_WIDE_INT delta,
   flag_force_indirect_call = saved_flag_force_indirect_call;
 }
 
+/* Implement TARGET_ASM_POST_CFI_STARTPROC.  Triggered after a
+   .cfi_startproc directive is emitted into the assembly file.
+   When assembler directives for DWARF frame unwind is enabled,
+   output the .cfi_undefined directive for unsaved callee-saved
+   registers which have been used in the function.  */
+
+void
+ix86_post_cfi_startproc (FILE *f, tree)
+{
+  if ((cfun->machine->call_saved_registers
+   == TYPE_NO_CALLEE_SAVED_REGISTERS)
+  && dwarf2out_do_cfi_asm ())
+for (int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+  if (df_regs_ever_live_p (i)
+ && !fixed_regs[i]
+ && !call_used_regs[i]
+ && !STACK_REGNO_P (i)
+ && !MMX_REGNO_P (i))
+   {
+ if (LEGACY_INT_REGNO_P (i))
+   {
+ if (TARGET_64BIT)
+   asm_fprintf (f, "\t.cfi_undefined r%s\n",
+hi_reg_name[i]);
+ else
+   asm_fprintf (f, "\t.cfi_undefined e%s\n",
+hi_reg_name[i]);
+   }
+ else
+   asm_fprintf (f, "\t.cfi_undefined %s\n",
+hi_reg_name[i]);
+   }
+}
+
 static void
 x86_file_start (void)
 {
@@ -26281,6 +26315,9 @@ static const scoped_attribute_specs *const 
ix86_attribute_table[] =
 #undef TARGET_ASM_CAN_OUTPUT_MI_THUNK
 #define TARGET_ASM_CAN_OUTPUT_MI_THUNK x86_can_output_mi_thunk
 
+#undef TARGET_ASM_POST_CFI_STARTPROC
+#define TARGET_ASM_POST_CFI_STARTPROC ix86_post_cfi_startproc
+
 #undef TARGET_ASM_FILE_START
 #define TARGET_ASM_FILE_START x86_file_start
 
diff --git a/gcc/testsuite/gcc.target/i386/no-callee-saved-19.c b/gcc/testsuite/gcc.target/i386/no-callee-saved-19.c
new file mode 100644
index 000..60a492cffd3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/no-callee-saved-19.c
@@ -0,0 +1,17 @@
+/* { dg-do assemble { target *-*-linux* *-*-gnu* } } */
+/* { dg-options "-save-temps -march=tigerlake -O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+
+#include "no-callee-saved-1.c"
+
+/* { dg-final { scan-assembler-not "push" } } */
+/* { dg-final { scan-assembler-not "pop" } } */
+/* { dg-final { scan-assembler-times ".cfi_undefined rbx" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times ".cfi_undefined rbp" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times ".cfi_undefined r12" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times ".cfi_undefined r13" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times ".cfi_undefined r14" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times ".cfi_undefined r15" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times ".cfi_undefined ebx" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times ".cfi_undefined esi" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times ".cfi_undefined edi" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times ".cfi_undefined ebp" 1 { target ia32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/no-callee-saved-20.c b/gcc/testsuite/gcc.target/i386/no-callee-saved-20.c
new file mode 100644
index 000..fc94778824a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/no-callee-saved-20.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target cfi } } */
+/* { dg-options "-march=tigerlake -O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+
+__attribute__ ((no_callee_saved_registers))
+void

[COMMITTED] bpf: add constant pointer to helper-skb-ancestor-cgroup-id.c test

2024-01-27 Thread Jose E. Marchesi
The purpose of this test is to make sure that constant propagation is
achieved with the proper optimization level, so a BPF call instruction
to a kernel helper is generated.  This patch updates the test so it
also covers kernel helpers defined as constant static pointers.

The motivation for this patch is:

  
https://lore.kernel.org/bpf/20240127185031.29854-1-jose.march...@oracle.com/T/#u

Tested on bpf-unknown-none target with x86_64-linux-gnu host.

gcc/testsuite/ChangeLog

* gcc.target/bpf/helper-skb-ancestor-cgroup-id.c: Add constant
version of kernel helper static pointer.
---
 gcc/testsuite/gcc.target/bpf/helper-skb-ancestor-cgroup-id.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/bpf/helper-skb-ancestor-cgroup-id.c b/gcc/testsuite/gcc.target/bpf/helper-skb-ancestor-cgroup-id.c
index 693f390b9bb..075dbe6f852 100644
--- a/gcc/testsuite/gcc.target/bpf/helper-skb-ancestor-cgroup-id.c
+++ b/gcc/testsuite/gcc.target/bpf/helper-skb-ancestor-cgroup-id.c
@@ -5,6 +5,7 @@
 
 struct __sk_buff;
 static uint64_t (*bpf_skb_ancestor_cgroup_id)(struct __sk_buff *skb, int ancestor_level) = (void *) 83;
+static uint64_t (* const const_bpf_skb_ancestor_cgroup_id)(struct __sk_buff *skb, int ancestor_level) = (void *) 84;
 
 void
 foo ()
@@ -13,7 +14,9 @@ foo ()
   void *skb;
   int ancestor_level;
 
-  ret = bpf_skb_ancestor_cgroup_id (skb, ancestor_level);
+  ret = bpf_skb_ancestor_cgroup_id (skb, ancestor_level)
++ const_bpf_skb_ancestor_cgroup_id (skb, ancestor_level);
 }
 
 /* { dg-final { scan-assembler "call\t83" } } */
+/* { dg-final { scan-assembler "call\t84" } } */
-- 
2.30.2



[PATCH, committed] Fortran: fix bounds-checking errors for CLASS array dummies [PR104908]

2024-01-27 Thread Harald Anlauf
Dear all,

commit r11-1235 for pr95331 addressed array bounds issues with
unlimited polymorphic array dummies, but caused regressions for
CLASS array dummies that led to either wrong code with bounds checking
or an ICE.  The solution is simple: add a check whether the dummy
is unlimited polymorphic and otherwise restore the previous behavior.

The attached patch regtested fine on x86_64-pc-linux-gnu and was
OK'ed in the PR by Jerry.

Pushed as: r14-8471-gce61de1b8a1bb3

Since this is a 11/12/13/14 regression and appears safe otherwise,
I intend to backport as suitable, unless there are comments.

Thanks,
Harald

From ce61de1b8a1bb3a22118e900376f380768f2ba59 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Sat, 27 Jan 2024 17:41:43 +0100
Subject: [PATCH] Fortran: fix bounds-checking errors for CLASS array dummies
 [PR104908]

Commit r11-1235 addressed issues with bounds of unlimited polymorphic array
dummies.  However, using the descriptor from sym->backend_decl does break
the case of CLASS array dummies.  The obvious solution is to restrict the
fix to the unlimited polymorphic case, thus keeping the original descriptor
in the ordinary case.

gcc/fortran/ChangeLog:

	PR fortran/104908
	* trans-array.cc (gfc_conv_array_ref): Restrict use of transformed
	descriptor (sym->backend_decl) to the unlimited polymorphic case.

gcc/testsuite/ChangeLog:

	PR fortran/104908
	* gfortran.dg/pr104908.f90: New test.
---
 gcc/fortran/trans-array.cc |  5 +++-
 gcc/testsuite/gfortran.dg/pr104908.f90 | 32 ++
 2 files changed, 36 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pr104908.f90

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index 878a92aff18..1e0d698a949 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -4063,7 +4063,10 @@ gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr,
 }

   decl = se->expr;
-  if (IS_CLASS_ARRAY (sym) && sym->attr.dummy && ar->as->type != AS_DEFERRED)
+  if (UNLIMITED_POLY(sym)
+  && IS_CLASS_ARRAY (sym)
+  && sym->attr.dummy
+  && ar->as->type != AS_DEFERRED)
 decl = sym->backend_decl;

   cst_offset = offset = gfc_index_zero_node;
diff --git a/gcc/testsuite/gfortran.dg/pr104908.f90 b/gcc/testsuite/gfortran.dg/pr104908.f90
new file mode 100644
index 000..c3a30b0003c
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr104908.f90
@@ -0,0 +1,32 @@
+! { dg-do compile }
+! { dg-additional-options "-fcheck=bounds -fdump-tree-original" }
+!
+! PR fortran/104908 - incorrect out-of-bounds runtime error
+
+program test
+  implicit none
+  type vec
+ integer :: x(3) = [2,4,6]
+  end type vec
+  type(vec) :: w(2)
+  call sub(w)
+contains
+  subroutine sub (v)
+class(vec), intent(in) :: v(:)
+integer :: k, q(3)
+q = [ (v(1)%x(k), k = 1, 3) ]   ! <-- was failing here after r11-1235
+print *, q
+  end
+end
+
+subroutine sub2 (zz)
+  implicit none
+  type vec
+ integer :: x(2,1)
+  end type vec
+  class(vec), intent(in) :: zz(:)   ! used to ICE after r11-1235
+  integer :: k
+  k = zz(1)%x(2,1)
+end
+
+! { dg-final { scan-tree-dump-times " above upper bound " 4 "original" } }
--
2.35.3



Fix ICE with -g and -std=c23 when forming composite types [PR113438]

2024-01-27 Thread Martin Uecker


Debug output ICEs when we do not set TYPE_STUB_DECL; fix this.


Fix ICE with -g and -std=c23 when forming composite types [PR113438]

Set TYPE_STUB_DECL to an artificial decl when creating a new structure
as a composite type.

PR c/113438

gcc/c/
* c-typeck.cc (composite_type_internal): Set TYPE_STUB_DECL.

gcc/testsuite/
* gcc.dg/pr113438.c: New test.

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 66c6abc9f07..cfa3b7ab10f 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -585,6 +585,11 @@ composite_type_internal (tree t1, tree t2, struct composite_cache* cache)
  /* Setup the struct/union type.  Because we inherit all variably
 modified components, we can ignore the size expression.  */
  tree expr = NULL_TREE;
+
+ /* Set TYPE_STUB_DECL for debugging symbols.  */
+ TYPE_STUB_DECL (n) = pushdecl (build_decl (input_location, TYPE_DECL,
+NULL_TREE, n));
+
  n = finish_struct(input_location, n, fields, attributes, NULL, &expr);
 
  n = qualify_type (n, t1);
diff --git a/gcc/testsuite/gcc.dg/pr113438.c b/gcc/testsuite/gcc.dg/pr113438.c
new file mode 100644
index 000..5612ee4fa38
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr113438.c
@@ -0,0 +1,7 @@
+/* PR113438
+ * { dg-do compile }
+ * { dg-options "-std=c23 -g" } */
+
+void g(struct foo { int x; } a);
+void g(struct foo { int x; } a);
+



Re: [aarch64] PR112950: gcc.target/aarch64/sve/acle/general/dupq_5.c fails on aarch64_be-linux-gnu

2024-01-27 Thread Richard Sandiford
Prathamesh Kulkarni  writes:
> Hi,
> The test passes -mlittle-endian option but doesn't have target check
> for aarch64_little_endian and thus fails to compile on
> aarch64_be-linux-gnu. The patch adds the missing aarch64_little_endian
> target check, which makes it unsupported on the target.
> OK to commit ?
>
> Thanks,
> Prathamesh
>
> PR112950: Add aarch64_little_endian target check for dupq_5.c
>
> gcc/testsuite/ChangeLog:
>   PR target/112950
>   * gcc.target/aarch64/sve/acle/general/dupq_5.c: Add
>   aarch64_little_endian target check.

If we add this requirement, then there's no need to pass -mlittle-endian
in the dg-options.

But dupq_6.c (the corresponding big-endian test) has:

  /* To avoid needing big-endian header files.  */
  #pragma GCC aarch64 "arm_sve.h"

instead of:

  #include <arm_sve.h>

Could you do the same thing here?

Thanks,
Richard

> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
> index 6ae8d4c60b2..1990412d0e5 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -mlittle-endian" } */
> +/* { dg-require-effective-target aarch64_little_endian } */
>  
>  #include <arm_sve.h>
>  


[PATCH] vect: Tighten vect_determine_precisions_from_range [PR113281]

2024-01-27 Thread Richard Sandiford
This was another PR caused by the way that
vect_determine_precisions_from_range handles shifts.  We tried to
narrow 32768 >> x to a 16-bit shift based on range information for
the inputs and outputs, with vect_recog_over_widening_pattern
(after PR110828) adjusting the shift amount.  But this doesn't
work for the case where x is in [16, 31], since then 32-bit
32768 >> x is a well-defined zero, whereas no well-defined
16-bit 32768 >> y will produce 0.

We could perhaps generate x < 16 ? 32768 >> x : 0 instead,
but since vect_determine_precisions_from_range was never really
supposed to rely on fix-ups, it seems better to fix that instead.

The patch also makes the code more selective about which codes
can be narrowed based on input and output ranges.  This showed
that vect_truncatable_operation_p was missing cases for
BIT_NOT_EXPR (equivalent to BIT_XOR_EXPR of -1) and NEGATE_EXPR
(equivalent to BIT_NOT_EXPR followed by a PLUS_EXPR of 1).

pr113281-1.c is the original testcase.  pr113281-[23].c failed
before the patch due to overly optimistic narrowing.  pr113281-[45].c
previously passed and are meant to protect against accidental
optimisation regressions.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
PR target/113281
* tree-vect-patterns.cc (vect_recog_over_widening_pattern): Remove
workaround for right shifts.
(vect_truncatable_operation_p): Handle NEGATE_EXPR and BIT_NOT_EXPR.
(vect_determine_precisions_from_range): Be more selective about
which codes can be narrowed based on their input and output ranges.
For shifts, require at least one more bit of precision than the
maximum shift amount.

gcc/testsuite/
PR target/113281
* gcc.dg/vect/pr113281-1.c: New test.
* gcc.dg/vect/pr113281-2.c: Likewise.
* gcc.dg/vect/pr113281-3.c: Likewise.
* gcc.dg/vect/pr113281-4.c: Likewise.
* gcc.dg/vect/pr113281-5.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/pr113281-1.c |  17 +++
 gcc/testsuite/gcc.dg/vect/pr113281-2.c |  50 +
 gcc/testsuite/gcc.dg/vect/pr113281-3.c |  39 +++
 gcc/testsuite/gcc.dg/vect/pr113281-4.c |  55 ++
 gcc/testsuite/gcc.dg/vect/pr113281-5.c |  66 
 gcc/tree-vect-patterns.cc  | 144 +
 6 files changed, 305 insertions(+), 66 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr113281-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr113281-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr113281-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr113281-4.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr113281-5.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr113281-1.c b/gcc/testsuite/gcc.dg/vect/pr113281-1.c
new file mode 100644
index 000..6df4231cb5f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr113281-1.c
@@ -0,0 +1,17 @@
+#include "tree-vect.h"
+
+unsigned char a;
+
+int main() {
+  check_vect ();
+
+  short b = a = 0;
+  for (; a != 19; a++)
+if (a)
+  b = 32872 >> a;
+
+  if (b == 0)
+return 0;
+  else
+return 1;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/pr113281-2.c b/gcc/testsuite/gcc.dg/vect/pr113281-2.c
new file mode 100644
index 000..3a1170c28b6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr113281-2.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+
+#define N 128
+
+short x[N];
+short y[N];
+
+void
+f1 (void)
+{
+  for (int i = 0; i < N; ++i)
+x[i] >>= y[i];
+}
+
+void
+f2 (void)
+{
+  for (int i = 0; i < N; ++i)
+x[i] >>= (y[i] < 32 ? y[i] : 32);
+}
+
+void
+f3 (void)
+{
+  for (int i = 0; i < N; ++i)
+x[i] >>= (y[i] < 31 ? y[i] : 31);
+}
+
+void
+f4 (void)
+{
+  for (int i = 0; i < N; ++i)
+x[i] >>= (y[i] & 31);
+}
+
+void
+f5 (void)
+{
+  for (int i = 0; i < N; ++i)
+x[i] >>= 0x8000 >> y[i];
+}
+
+void
+f6 (void)
+{
+  for (int i = 0; i < N; ++i)
+x[i] >>= 0x8000 >> (y[i] & 31);
+}
+
+/* { dg-final { scan-tree-dump-not {can narrow[^\n]+>>} "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr113281-3.c b/gcc/testsuite/gcc.dg/vect/pr113281-3.c
new file mode 100644
index 000..5982dd2d16f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr113281-3.c
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+
+#define N 128
+
+short x[N];
+short y[N];
+
+void
+f1 (void)
+{
+  for (int i = 0; i < N; ++i)
+x[i] >>= (y[i] < 30 ? y[i] : 30);
+}
+
+void
+f2 (void)
+{
+  for (int i = 0; i < N; ++i)
+x[i] >>= ((y[i] & 15) + 2);
+}
+
+void
+f3 (void)
+{
+  for (int i = 0; i < N; ++i)
+x[i] >>= (y[i] < 16 ? y[i] : 16);
+}
+
+void
+f4 (void)
+{
+  for (int i = 0; i < N; ++i)
+x[i] = 32768 >> ((y[i] & 15) + 3);
+}
+
+/* { dg-final { scan-tree-dump {can narrow to signed:31 without loss [^\n]+>>} "vect" } } */
+/* { dg-final { scan-tree-dump {can narrow to signed:18 without loss [^\n]+>>} "vect" } } */
+/* { dg-final { scan-tree-dump {can narrow to signed:17 without loss [^\n]+>>} "vect" } } */
+/* { dg-final { 

[PATCH] Handle function symbol reference in readonly data section

2024-01-27 Thread H.J. Lu
For a function symbol reference in a read-only data section, instead of
putting it in a .data.rel.ro or .rodata.cst section, call
function_rodata_section to get the read-only or relocated read-only data
section associated with the function DECL, so that the COMDAT section
will be used for a COMDAT function symbol.

gcc/

PR rtl-optimization/113617
* varasm.cc (default_elf_select_rtx_section): Call
function_rodata_section to get the read-only or relocated
read-only data section for function symbol reference.

gcc/testsuite/

PR rtl-optimization/113617
* g++.dg/pr113617-1a.C: New test.
* g++.dg/pr113617-1b.C: Likewise.
---
 gcc/testsuite/g++.dg/pr113617-1a.C | 170 +
 gcc/testsuite/g++.dg/pr113617-1b.C |   8 ++
 gcc/varasm.cc  |  18 +++
 3 files changed, 196 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/pr113617-1a.C
 create mode 100644 gcc/testsuite/g++.dg/pr113617-1b.C

diff --git a/gcc/testsuite/g++.dg/pr113617-1a.C b/gcc/testsuite/g++.dg/pr113617-1a.C
new file mode 100644
index 000..effd50841c0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr113617-1a.C
@@ -0,0 +1,170 @@
+// { dg-do compile { target fpic } }
+// { dg-require-visibility "" }
+// { dg-options "-O2 -std=c++11 -fPIC -fvisibility=hidden -fvisibility-inlines-hidden" }
+
+namespace {
+template  struct integral_constant {
+  static constexpr int value = __v;
+};
+template  using __bool_constant = integral_constant<__v>;
+using true_type = __bool_constant;
+template  struct __conditional {
+  template  using type = _Tp;
+};
+template 
+using __conditional_t = typename __conditional<_Cond>::type<_If, _Else>;
+true_type __trans_tmp_1;
+template  struct remove_cv { using type = _Tp; };
+template 
+struct __decay_selector
+: __conditional_t, _Up> {};
+template  struct decay {
+  using type = typename __decay_selector<_Tp>::type;
+};
+}
+struct vtkCellArray {};
+namespace blah {
+struct _Any_data;
+enum _Manager_operation {};
+template  class function;
+struct _Function_base {
+  using _Manager_type = bool (*)(_Any_data &, const _Any_data &,
+ _Manager_operation);
+  _Manager_type _M_manager;
+};
+template  class _Function_handler;
+template 
+struct _Function_handler<_Res(_ArgTypes...), _Functor> {
+  static bool _M_manager(_Any_data &, const _Any_data &, _Manager_operation) {
+return false;
+  }
+  __attribute__((noipa)) static _Res _M_invoke(const _Any_data &) {}
+};
+template 
+struct function<_Res(_ArgTypes...)> : _Function_base {
+  template 
+  using _Handler = _Function_handler<_Res(), _Functor>;
+  template  function(_Functor) {
+using _My_handler = _Handler<_Functor>;
+_M_invoker = _My_handler::_M_invoke;
+_M_manager = _My_handler::_M_manager;
+  }
+  using _Invoker_type = _Res (*)(const _Any_data &);
+  _Invoker_type _M_invoker;
+};
+template  class _Bind;
+template 
+struct _Bind<_Functor(_Bound_args...)> {};
+template  using __is_socketlike = decltype(__trans_tmp_1);
+template  struct _Bind_helper {
+  typedef _Bind::type(
+  typename decay<_BoundArgs>::type...)>
+  type;
+};
+template 
+__attribute__((noipa)) typename _Bind_helper<__is_socketlike<_Func>::value, 
_Func, _BoundArgs...>::type
+bind(_Func, _BoundArgs...) { return typename 
_Bind_helper<__is_socketlike<_Func>::value, _Func, _BoundArgs...>::type (); }
+template  struct __uniq_ptr_impl {
+  template  struct _Ptr { using type = _Up *; };
+  using pointer = typename _Ptr<_Tp>::type;
+};
+template  struct unique_ptr {
+  using pointer = typename __uniq_ptr_impl<_Tp>::pointer;
+  pointer operator->();
+};
+}
+extern int For_threadNumber;
+namespace vtk {
+namespace detail {
+namespace smp {
+enum BackendType { Sequential, STDThread };
+template  struct vtkSMPToolsImpl {
+  template 
+  __attribute__((noipa)) void For(long long, long long, long long, 
FunctorInternal &) {}
+};
+struct vtkSMPThreadPool {
+  vtkSMPThreadPool(int);
+  void DoJob(blah::function);
+};
+template 
+__attribute__((noipa)) void ExecuteFunctorSTDThread(void *, long long, long 
long, long long) {}
+template <>
+template 
+void vtkSMPToolsImpl::For(long long, long long last, long long 
grain,
+ FunctorInternal ) {
+  vtkSMPThreadPool pool(For_threadNumber);
+  for (;;) {
+auto job = blah::bind(ExecuteFunctorSTDThread, , grain,
+ grain, last);
+pool.DoJob(job);
+  }
+}
+struct vtkSMPToolsAPI {
+  static vtkSMPToolsAPI ();
+  template 
+  void For(long first, long last, long grain, FunctorInternal fi) {
+switch (ActivatedBackend) {
+case Sequential:
+  SequentialBackend->For(first, last, grain, fi);
+case STDThread:
+  STDThreadBackend->For(first, last, grain, fi);
+}
+  }
+  BackendType ActivatedBackend;
+  blah::unique_ptr> SequentialBackend;
+  blah::unique_ptr> STDThreadBackend;
+};
+template  struct vtkSMPTools_FunctorInternal;
+template  struct 

[PATCH v2] x86: Save callee-saved registers in noreturn functions for -O0/-Og

2024-01-27 Thread H.J. Lu
Changes in v2:

1. Lookup noreturn attribute first.
2. Use __attribute__((noreturn, optimize("-Og"))) in pr38534-6.c.


Save callee-saved registers in noreturn functions for -O0/-Og so that
the debugger can restore callee-saved registers in the caller's frame.

gcc/

PR target/38534
* config/i386/i386-options.cc (ix86_set_func_type): Save
callee-saved registers in noreturn functions for -O0/-Og.

gcc/testsuite/

PR target/38534
* gcc.target/i386/pr38534-5.c: New file.
* gcc.target/i386/pr38534-6.c: Likewise.
---
 gcc/config/i386/i386-options.cc   |  9 +---
 gcc/testsuite/gcc.target/i386/pr38534-5.c | 26 +++
 gcc/testsuite/gcc.target/i386/pr38534-6.c | 26 +++
 3 files changed, 58 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-6.c

diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 473f5359fc9..a647b1bdf5c 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -3381,7 +3381,8 @@ static void
 ix86_set_func_type (tree fndecl)
 {
   /* No need to save and restore callee-saved registers for a noreturn
- function with nothrow or compiled with -fno-exceptions.
+ function with nothrow or compiled with -fno-exceptions unless when
+ compiling with -O0 or -Og.
 
  NB: Don't use TREE_THIS_VOLATILE to check if this is a noreturn
  function.  The local-pure-const pass turns an interrupt function
@@ -3391,8 +3392,10 @@ ix86_set_func_type (tree fndecl)
  function is marked as noreturn in the IR output, which leads the
  incompatible attribute error in LTO1.  */
   bool has_no_callee_saved_registers
-= (((TREE_NOTHROW (fndecl) || !flag_exceptions)
-   && lookup_attribute ("noreturn", DECL_ATTRIBUTES (fndecl)))
+= ((lookup_attribute ("noreturn", DECL_ATTRIBUTES (fndecl))
+   && optimize
+   && !optimize_debug
+   && (TREE_NOTHROW (fndecl) || !flag_exceptions))
|| lookup_attribute ("no_callee_saved_registers",
TYPE_ATTRIBUTES (TREE_TYPE (fndecl;
 
diff --git a/gcc/testsuite/gcc.target/i386/pr38534-5.c b/gcc/testsuite/gcc.target/i386/pr38534-5.c
new file mode 100644
index 000..91c0c0f8c59
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr38534-5.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+
+#define ARRAY_SIZE 256
+
+extern int array[ARRAY_SIZE][ARRAY_SIZE][ARRAY_SIZE];
+extern int value (int, int, int)
+#ifndef __x86_64__
+__attribute__ ((regparm(3)))
+#endif
+;
+
+void
+__attribute__((noreturn))
+no_return_to_caller (void)
+{
+  unsigned i, j, k;
+  for (i = ARRAY_SIZE; i > 0; --i)
+for (j = ARRAY_SIZE; j > 0; --j)
+  for (k = ARRAY_SIZE; k > 0; --k)
+   array[i - 1][j - 1][k - 1] = value (i, j, k);
+  while (1);
+}
+
+/* { dg-final { scan-assembler "push" } } */
+/* { dg-final { scan-assembler-not "pop" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr38534-6.c b/gcc/testsuite/gcc.target/i386/pr38534-6.c
new file mode 100644
index 000..cf1463a9c66
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr38534-6.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } */
+
+#define ARRAY_SIZE 256
+
+extern int array[ARRAY_SIZE][ARRAY_SIZE][ARRAY_SIZE];
+extern int value (int, int, int)
+#ifndef __x86_64__
+__attribute__ ((regparm(3)))
+#endif
+;
+
+void
+__attribute__((noreturn, optimize("-Og")))
+no_return_to_caller (void)
+{
+  unsigned i, j, k;
+  for (i = ARRAY_SIZE; i > 0; --i)
+for (j = ARRAY_SIZE; j > 0; --j)
+  for (k = ARRAY_SIZE; k > 0; --k)
+   array[i - 1][j - 1][k - 1] = value (i, j, k);
+  while (1);
+}
+
+/* { dg-final { scan-assembler "push" } } */
+/* { dg-final { scan-assembler-not "pop" } } */
-- 
2.43.0



Re: [PATCH] x86: Save callee-saved registers in noreturn functions for -O0/-Og

2024-01-27 Thread H.J. Lu
On Sat, Jan 27, 2024 at 6:09 AM Jakub Jelinek  wrote:
>
> On Sat, Jan 27, 2024 at 05:52:34AM -0800, H.J. Lu wrote:
> > @@ -3391,7 +3392,9 @@ ix86_set_func_type (tree fndecl)
> >   function is marked as noreturn in the IR output, which leads the
> >   incompatible attribute error in LTO1.  */
> >bool has_no_callee_saved_registers
> > -= (((TREE_NOTHROW (fndecl) || !flag_exceptions)
> > += ((optimize
> > + && !optimize_debug
>
> Shouldn't that be opt_for_fn (fndecl, optimize) and ditto for
> optimize_debug?
> I mean, aren't the options not restored yet when this function is called
> (i.e. remain in whatever state they were in the previous function or
> global state)?

store_parm_decls is called when parsing a function.  store_parm_decls
calls allocate_struct_function which calls

  invoke_set_current_function_hook (fndecl);

which has

 /* Change optimization options if needed.  */
  if (optimization_current_node != opts)
{
  optimization_current_node = opts;
  cl_optimization_restore (_options, _options_set,
   TREE_OPTIMIZATION (opts));
}

  targetm.set_current_function (fndecl);

which calls ix86_set_current_function after global_options
has been updated.   ix86_set_func_type is called from
ix86_set_current_function.

I don't see an issue with optimize and optimize_debug here.

> Also, shouldn't the lookup_attribute ("noreturn" check be the first one?
> I mean, noreturn functions are quite rare and so checking all the other

I will fix it and update one testcase with

__attribute__((noreturn, optimize("-Og")))

> conditions upon each set_cfun could waste too much compile time.
>
> Also, why check "noreturn" attribute rather than
> TREE_THIS_VOLATILE (fndecl)?
>

The comment above this code has

 NB: Don't use TREE_THIS_VOLATILE to check if this is a noreturn
 function.  The local-pure-const pass turns an interrupt function
 into a noreturn function by setting TREE_THIS_VOLATILE.  Normally
 the local-pure-const pass is run after ix86_set_func_type is called.
 When the local-pure-const pass is enabled for LTO, the interrupt
 function is marked as noreturn in the IR output, which leads the
 incompatible attribute error in LTO1.

Thanks.

-- 
H.J.


Re: [PATCH] x86: Save callee-saved registers in noreturn functions for -O0/-Og

2024-01-27 Thread Jakub Jelinek
On Sat, Jan 27, 2024 at 05:52:34AM -0800, H.J. Lu wrote:
> @@ -3391,7 +3392,9 @@ ix86_set_func_type (tree fndecl)
>   function is marked as noreturn in the IR output, which leads the
>   incompatible attribute error in LTO1.  */
>bool has_no_callee_saved_registers
> -= (((TREE_NOTHROW (fndecl) || !flag_exceptions)
> += ((optimize
> + && !optimize_debug

Shouldn't that be opt_for_fn (fndecl, optimize) and ditto for
optimize_debug?
I mean, aren't the options not restored yet when this function is called
(i.e. remain in whatever state they were in the previous function or
global state)?

Also, shouldn't the lookup_attribute ("noreturn") check be the first one?
I mean, noreturn functions are quite rare and so checking all the other
conditions upon each set_cfun could waste too much compile time.

Also, why check "noreturn" attribute rather than
TREE_THIS_VOLATILE (fndecl)?

Jakub



Re: [PATCH v4 0/4] When cmodel=extreme, add macro support and only support macros.

2024-01-27 Thread chenglulu



On 2024/1/27 at 7:11 PM, Xi Ruoyao wrote:

On Sat, 2024-01-27 at 18:02 +0800, Xi Ruoyao wrote:

On Sat, 2024-01-27 at 11:15 +0800, chenglulu wrote:

On 2024/1/26 at 6:57 PM, Xi Ruoyao wrote:

On Fri, 2024-01-26 at 16:59 +0800, chenglulu wrote:

On 2024/1/26 at 4:49 PM, Xi Ruoyao wrote:

On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote:

v3 -> v4:
     1. Add macro support for TLS symbols
     2. Added support for loading the __tls_get_addr symbol address using call36.
     3. Merge template got_load_tls_{ld/gd/le/ie}.
     4. Enable explicit reloc for extreme TLS GD/LD with -mexplicit-relocs=auto.

I've rebased and attached the patch to fix the bad split in -mexplicit-
relocs={always,auto} -mcmodel=extreme on top of this series.  I've not
tested it seriously though (only tested the added and modified test
cases).


OK, I'll test the spec for correctness.

I suppose this still won't work yet because Binutils is not fully fixed.
GAS has been changed not to emit R_LARCH_RELAX for "la.tls.ie a0, t0,
foo", but ld is still not checking if an R_LARCH_RELAX is after
R_LARCH_TLS_IE_PC_{HI20,LO12} properly.  Thus an invalid "partial" TLS
transition can still happen.


The following situations are not handled in the patch:

diff --git a/gcc/config/loongarch/loongarch.cc
b/gcc/config/loongarch/loongarch.cc

index 3fab4b64453..6336a9f696f 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -7472,7 +7472,13 @@ loongarch_output_mi_thunk (FILE *file, tree
thunk_fndecl ATTRIBUTE_UNUSED,
   {
     if (TARGET_CMODEL_EXTREME)
  {
- emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
+ if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
+   {
+ emit_insn (gen_la_pcrel64_two_parts (temp1, temp2, fnaddr));
+ emit_move_insn (temp1, gen_rtx_PLUS (Pmode, temp1, temp2));
+   }
+ else
+   emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));

It looks like this part is unreachable: with -mcmodel=extreme
use_sibcall_p will never be true.

So I cleaned up this part and fixed an ERROR in the added test:

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 3a97ba61362..7b8c85a1606 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -7481,21 +7481,24 @@ loongarch_output_mi_thunk (FILE *file, tree 
thunk_fndecl ATTRIBUTE_UNUSED,
   allowed, otherwise load the address into a register first.  */
if (use_sibcall_p)
  {
-  if (TARGET_CMODEL_EXTREME)
-   {
- emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
- insn = emit_call_insn (gen_sibcall_internal (temp1, const0_rtx));
-   }
-  else
-   insn = emit_call_insn (gen_sibcall_internal (fnaddr, const0_rtx));
+  /* If TARGET_CMODEL_EXTREME, we cannot do a direct jump at all
+and const_call_insn_operand should have returned false.  */
+  gcc_assert (!TARGET_CMODEL_EXTREME);
+
+  insn = emit_call_insn (gen_sibcall_internal (fnaddr, const0_rtx));
SIBLING_CALL_P (insn) = 1;
  }
else
  {
-  if (TARGET_CMODEL_EXTREME)
+  if (!TARGET_CMODEL_EXTREME)
+   loongarch_emit_move (temp1, fnaddr);
+  else if (la_opt_explicit_relocs == EXPLICIT_RELOCS_NONE)
emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
else
-   loongarch_emit_move (temp1, fnaddr);
+   {
+ emit_insn (gen_la_pcrel64_two_parts (temp1, temp2, fnaddr));
+ emit_move_insn (temp1, gen_rtx_PLUS (Pmode, temp1, temp2));
+   }
  
emit_jump_insn (gen_indirect_jump (temp1));

  }
diff --git 
a/gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c 
b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c
index 27baf4886d6..35bd4570a9e 100644
--- 
a/gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c
+++ 
b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c
@@ -1,5 +1,5 @@
  /* { dg-do compile } */
  /* { dg-options "-O2 -fPIC -mexplicit-relocs=auto -mcmodel=extreme -fno-plt" 
} */
-/* { dg-final { scan-assembler-not "la.tls.[lg]d" { target tls_native } } } */
+/* { dg-final { scan-assembler-not "la.tls.\[lg\]d" { target tls_native } } } 
*/
  
  #include "./explicit-relocs-auto-tls-ld-gd.c"


And added 3 tests for output_mi_thunk.  The updated patch attached, now
running regression test.



@@ -2870,20 +2872,30 @@ loongarch_call_tls_get_addr (rtx sym, enum 
loongarch_symbol_type type, rtx v0)

    {
  if (loongarch_explicit_relocs_p (SYMBOL_GOT_DISP))
    {
- rtx tmp1 = gen_reg_rtx (Pmode);
- rtx high = gen_reg_rtx (Pmode);
+ gcc_assert (la_opt_explicit_relocs !=
+ EXPLICIT_RELOCS_NONE);

This operator is written at the end of the line (GCC style puts it at
the start of the continuation line); I think there is no problem with
anything else.


But I need to see the 

[PATCH] x86: Save callee-saved registers in noreturn functions for -O0/-Og

2024-01-27 Thread H.J. Lu
Save callee-saved registers in noreturn functions for -O0/-Og so that
debugger can restore callee-saved registers in caller's frame.

gcc/

PR target/38534
* config/i386/i386-options.cc (ix86_set_func_type): Save
callee-saved registers in noreturn functions for -O0/-Og.

gcc/testsuite/

PR target/38534
* gcc.target/i386/pr38534-5.c: New file.
* gcc.target/i386/pr38534-6.c: Likewise.
---
 gcc/config/i386/i386-options.cc   |  7 --
 gcc/testsuite/gcc.target/i386/pr38534-5.c | 26 +++
 gcc/testsuite/gcc.target/i386/pr38534-6.c | 26 +++
 3 files changed, 57 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr38534-6.c

diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 473f5359fc9..5ff5560df7a 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -3381,7 +3381,8 @@ static void
 ix86_set_func_type (tree fndecl)
 {
   /* No need to save and restore callee-saved registers for a noreturn
- function with nothrow or compiled with -fno-exceptions.
+ function with nothrow or compiled with -fno-exceptions unless when
+ compiling with -O0 or -Og.
 
  NB: Don't use TREE_THIS_VOLATILE to check if this is a noreturn
  function.  The local-pure-const pass turns an interrupt function
@@ -3391,7 +3392,9 @@ ix86_set_func_type (tree fndecl)
  function is marked as noreturn in the IR output, which leads the
  incompatible attribute error in LTO1.  */
   bool has_no_callee_saved_registers
-= (((TREE_NOTHROW (fndecl) || !flag_exceptions)
+= ((optimize
+   && !optimize_debug
+   && (TREE_NOTHROW (fndecl) || !flag_exceptions)
&& lookup_attribute ("noreturn", DECL_ATTRIBUTES (fndecl)))
|| lookup_attribute ("no_callee_saved_registers",
TYPE_ATTRIBUTES (TREE_TYPE (fndecl))));
diff --git a/gcc/testsuite/gcc.target/i386/pr38534-5.c 
b/gcc/testsuite/gcc.target/i386/pr38534-5.c
new file mode 100644
index 000..91c0c0f8c59
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr38534-5.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } 
*/
+
+#define ARRAY_SIZE 256
+
+extern int array[ARRAY_SIZE][ARRAY_SIZE][ARRAY_SIZE];
+extern int value (int, int, int)
+#ifndef __x86_64__
+__attribute__ ((regparm(3)))
+#endif
+;
+
+void
+__attribute__((noreturn))
+no_return_to_caller (void)
+{
+  unsigned i, j, k;
+  for (i = ARRAY_SIZE; i > 0; --i)
+for (j = ARRAY_SIZE; j > 0; --j)
+  for (k = ARRAY_SIZE; k > 0; --k)
+   array[i - 1][j - 1][k - 1] = value (i, j, k);
+  while (1);
+}
+
+/* { dg-final { scan-assembler "push" } } */
+/* { dg-final { scan-assembler-not "pop" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr38534-6.c 
b/gcc/testsuite/gcc.target/i386/pr38534-6.c
new file mode 100644
index 000..756e1ec81f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr38534-6.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-Og -mtune-ctrl=^prologue_using_move,^epilogue_using_move" } 
*/
+
+#define ARRAY_SIZE 256
+
+extern int array[ARRAY_SIZE][ARRAY_SIZE][ARRAY_SIZE];
+extern int value (int, int, int)
+#ifndef __x86_64__
+__attribute__ ((regparm(3)))
+#endif
+;
+
+void
+__attribute__((noreturn))
+no_return_to_caller (void)
+{
+  unsigned i, j, k;
+  for (i = ARRAY_SIZE; i > 0; --i)
+for (j = ARRAY_SIZE; j > 0; --j)
+  for (k = ARRAY_SIZE; k > 0; --k)
+   array[i - 1][j - 1][k - 1] = value (i, j, k);
+  while (1);
+}
+
+/* { dg-final { scan-assembler "push" } } */
+/* { dg-final { scan-assembler-not "pop" } } */
-- 
2.43.0



[aarch64] PR112950: gcc.target/aarch64/sve/acle/general/dupq_5.c fails on aarch64_be-linux-gnu

2024-01-27 Thread Prathamesh Kulkarni
Hi,
The test passes the -mlittle-endian option but doesn't have a target
check for aarch64_little_endian, and thus fails to compile on
aarch64_be-linux-gnu.  The patch adds the missing aarch64_little_endian
target check, which makes the test unsupported on that target.
OK to commit?

Thanks,
Prathamesh
PR112950: Add aarch64_little_endian target check for dupq_5.c

gcc/testsuite/ChangeLog:
PR target/112950
* gcc.target/aarch64/sve/acle/general/dupq_5.c: Add
aarch64_little_endian target check.

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c 
b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
index 6ae8d4c60b2..1990412d0e5 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mlittle-endian" } */
+/* { dg-require-effective-target aarch64_little_endian } */
 
 #include 
 


Re: [PATCH] lower-bitint: Fix up VIEW_CONVERT_EXPR handling in lower_mergeable_stmt [PR113568]

2024-01-27 Thread Richard Biener



> On 27.01.2024 at 09:16, Jakub Jelinek wrote:
> 
> Hi!
> 
> We generally allow merging mergeable stmts with some final cast (but not
> further casts or mergeable operations after the cast).  As some casts
> are handled conditionally, if (idx < cst) handle_operand (idx); else if
> (idx == cst) handle_operand (cst); else ..., we must make sure that e.g. the
> mergeable PLUS_EXPR/MINUS_EXPR/NEGATE_EXPR never appear in handle_operand
> called from such casts, because it ICEs on invalid SSA_NAME form (that part
> could be fixable by adding further PHIs) but also because we'd need to
> correctly propagate the overflow flags from the if to else if.
> So, instead lower_mergeable_stmt handles an outermost widening cast (or
> widening cast feeding outermost store) specially.
> The problem was similar to PR113408, that VIEW_CONVERT_EXPR tree is
> present in the gimple_assign_rhs1 while it is not for NOP_EXPR/CONVERT_EXPR,
> so the checks whether the outermost cast should be handled didn't handle
> the VCE case and so handle_plus_minus was called from the conditional
> handle_cast.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?

Ok

Richard 

> 2024-01-27  Jakub Jelinek  
> 
>PR tree-optimization/113568
>* gimple-lower-bitint.cc (bitint_large_huge::lower_mergeable_stmt):
>For VIEW_CONVERT_EXPR use first operand of rhs1 instead of rhs1
>in the widening extension checks.
> 
>* gcc.dg/bitint-78.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj	2024-01-26 17:40:29.083814064 +0100
> +++ gcc/gimple-lower-bitint.cc	2024-01-26 18:05:24.461891138 +0100
> @@ -2401,6 +2401,8 @@ bitint_large_huge::lower_mergeable_stmt
>   rhs1 = gimple_assign_rhs1 (store_operand
> ? SSA_NAME_DEF_STMT (store_operand)
> : stmt);
> +  if (TREE_CODE (rhs1) == VIEW_CONVERT_EXPR)
> +rhs1 = TREE_OPERAND (rhs1, 0);
>   /* Optimize mergeable ops ending with widening cast to _BitInt
> (or followed by store).  We can lower just the limbs of the
> cast operand and widen afterwards.  */
> --- gcc/testsuite/gcc.dg/bitint-78.c.jj	2024-01-26 18:11:54.164435951 +0100
> +++ gcc/testsuite/gcc.dg/bitint-78.c	2024-01-26 18:11:33.642723218 +0100
> @@ -0,0 +1,21 @@
> +/* PR tree-optimization/113568 */
> +/* { dg-do compile { target bitint } } */
> +/* { dg-options "-O2 -std=c23" } */
> +
> +signed char c;
> +#if __BITINT_MAXWIDTH__ >= 464
> +_BitInt(464) g;
> +
> +void
> +foo (void)
> +{
> +  _BitInt(464) a[2] = {};
> +  _BitInt(464) b;
> +  while (c)
> +{
> +  b = g + 1;
> +  g = a[0];
> +  a[0] = b;
> +}
> +}
> +#endif
> 
>Jakub
> 


Re: [PATCH] lower-bitint: Add debugging dump of SSA_NAME -> decl mappings

2024-01-27 Thread Richard Biener



> On 27.01.2024 at 09:15, Jakub Jelinek wrote:
> 
> Hi!
> 
> While the SSA coalescing performed by lower bitint prints some information
> if -fdump-tree-bitintlower-details, it is really hard to read and doesn't
> contain the most important information which one looks for when debugging
> bitint lowering issues, namely what VAR_DECLs (or PARM_DECLs/RESULT_DECLs)
> each SSA_NAME in large_huge.m_names bitmap maps to.
> 
> So, the following patch adds dumping of that, so that we know that say
> _3 -> bitint.3
> _8 -> bitint.7
> _16 -> bitint.7
> etc.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

Richard 

> 2024-01-27  Jakub Jelinek  
> 
>* gimple-lower-bitint.cc (gimple_lower_bitint): For
>TDF_DETAILS dump mapping of SSA_NAMEs to decls.
> 
> --- gcc/gimple-lower-bitint.cc.jj	2024-01-26 00:07:35.629797857 +0100
> +++ gcc/gimple-lower-bitint.cc	2024-01-26 17:40:29.083814064 +0100
> @@ -6344,22 +6344,33 @@ gimple_lower_bitint (void)
>  }
>  }
>   tree atype = NULL_TREE;
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +fprintf (dump_file, "Mapping SSA_NAMEs to decls:\n");
>   EXECUTE_IF_SET_IN_BITMAP (large_huge.m_names, 0, i, bi)
>{
>  tree s = ssa_name (i);
>  int p = var_to_partition (large_huge.m_map, s);
> -  if (large_huge.m_vars[p] != NULL_TREE)
> -continue;
> -  if (atype == NULL_TREE
> -  || !tree_int_cst_equal (TYPE_SIZE (atype),
> -  TYPE_SIZE (TREE_TYPE (s
> +  if (large_huge.m_vars[p] == NULL_TREE)
>{
> -  unsigned HOST_WIDE_INT nelts
> -= tree_to_uhwi (TYPE_SIZE (TREE_TYPE (s))) / limb_prec;
> -  atype = build_array_type_nelts (large_huge.m_limb_type, nelts);
> +  if (atype == NULL_TREE
> +  || !tree_int_cst_equal (TYPE_SIZE (atype),
> +  TYPE_SIZE (TREE_TYPE (s
> +{
> +  unsigned HOST_WIDE_INT nelts
> += tree_to_uhwi (TYPE_SIZE (TREE_TYPE (s))) / limb_prec;
> +  atype = build_array_type_nelts (large_huge.m_limb_type,
> +  nelts);
> +}
> +  large_huge.m_vars[p] = create_tmp_var (atype, "bitint");
> +  mark_addressable (large_huge.m_vars[p]);
> +}
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +{
> +  print_generic_expr (dump_file, s, TDF_SLIM);
> +  fprintf (dump_file, " -> ");
> +  print_generic_expr (dump_file, large_huge.m_vars[p], TDF_SLIM);
> +  fprintf (dump_file, "\n");
>}
> -  large_huge.m_vars[p] = create_tmp_var (atype, "bitint");
> -  mark_addressable (large_huge.m_vars[p]);
>}
> }
> 
> 
>Jakub
> 


Re: [PATCH] lower-bitint: Avoid sign-extending cast to unsigned types feeding div/mod/float [PR113614]

2024-01-27 Thread Richard Biener



> On 27.01.2024 at 09:18, Jakub Jelinek wrote:
> 
> Hi!
> 
> The following testcase is miscompiled, because some narrower value
> is sign-extended to wider unsigned _BitInt used as division operand.
> handle_operand_addr for that case returns the narrower value and
> precision -prec_of_narrower_value.  That works fine for multiplication
> (at least, normal multiplication, but we don't merge casts with
> .MUL_OVERFLOW or the ubsan multiplication right now), because the
> result is the same whether we treat the arguments as signed or unsigned.
> But it is completely wrong for division/modulo or conversions to
> floating-point, if we pass negative prec for an input operand of a libgcc
> handler, those treat it like a negative number, not an unsigned one
> sign-extended from something smaller (and it doesn't know to what precision
> it has been extended).
> 
> So, the following patch fixes it by making sure we don't merge such
> sign-extensions to unsigned _BitInt type with division, modulo or
> conversions to floating point.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

Richard 

> 2024-01-27  Jakub Jelinek  
> 
>PR tree-optimization/113614
>* gimple-lower-bitint.cc (gimple_lower_bitint): Don't merge
>widening casts from signed to unsigned types with TRUNC_DIV_EXPR,
>TRUNC_MOD_EXPR or FLOAT_EXPR uses.
> 
>* gcc.dg/torture/bitint-54.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj	2024-01-26 18:05:24.461891138 +0100
> +++ gcc/gimple-lower-bitint.cc	2024-01-26 19:04:07.948780942 +0100
> @@ -6102,17 +6102,27 @@ gimple_lower_bitint (void)
>  && (TREE_CODE (rhs1) != SSA_NAME
>  || !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rhs1)))
>{
> -  if (TREE_CODE (TREE_TYPE (rhs1)) != BITINT_TYPE
> -  || (bitint_precision_kind (TREE_TYPE (rhs1))
> -  < bitint_prec_large))
> -continue;
>  if (is_gimple_assign (use_stmt))
>switch (gimple_assign_rhs_code (use_stmt))
>  {
> -  case MULT_EXPR:
>  case TRUNC_DIV_EXPR:
>  case TRUNC_MOD_EXPR:
>  case FLOAT_EXPR:
> +/* For division, modulo and casts to floating
> +   point, avoid representing unsigned operands
> +   using negative prec if they were sign-extended
> +   from narrower precision.  */
> +if (TYPE_UNSIGNED (TREE_TYPE (s))
> +&& !TYPE_UNSIGNED (TREE_TYPE (rhs1))
> +&& (TYPE_PRECISION (TREE_TYPE (s))
> +> TYPE_PRECISION (TREE_TYPE (rhs1
> +  goto force_name;
> +/* FALLTHRU */
> +  case MULT_EXPR:
> +if (TREE_CODE (TREE_TYPE (rhs1)) != BITINT_TYPE
> +|| (bitint_precision_kind (TREE_TYPE (rhs1))
> +< bitint_prec_large))
> +  continue;
>/* Uses which use handle_operand_addr can't
>   deal with nested casts.  */
>if (TREE_CODE (rhs1) == SSA_NAME
> @@ -6126,6 +6136,10 @@ gimple_lower_bitint (void)
>  default:
>break;
>}
> +  if (TREE_CODE (TREE_TYPE (rhs1)) != BITINT_TYPE
> +  || (bitint_precision_kind (TREE_TYPE (rhs1))
> +  < bitint_prec_large))
> +continue;
>  if ((TYPE_PRECISION (TREE_TYPE (rhs1))
>   >= TYPE_PRECISION (TREE_TYPE (s)))
>  && mergeable_op (use_stmt))
> --- gcc/testsuite/gcc.dg/torture/bitint-54.c.jj2024-01-26 
> 19:09:01.436688318 +0100
> +++ gcc/testsuite/gcc.dg/torture/bitint-54.c2024-01-26 19:16:24.908504368 
> +0100
> @@ -0,0 +1,29 @@
> +/* PR tree-optimization/113614 */
> +/* { dg-do run { target bitint } } */
> +/* { dg-options "-std=c23 -pedantic-errors" } */
> +/* { dg-skip-if "" { ! run_expensive_tests }  { "*" } { "-O0" "-O2" } } */
> +/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */
> +
> +_BitInt(8) a;
> +_BitInt(8) b;
> +_BitInt(8) c;
> +
> +#if __BITINT_MAXWIDTH__ >= 256
> +_BitInt(256)
> +foo (_BitInt(8) y, unsigned _BitInt(256) z)
> +{
> +  unsigned _BitInt(256) d = -y;
> +  z /= d;
> +  return z + a + b + c;
> +}
> +#endif
> +
> +int
> +main ()
> +{
> +#if __BITINT_MAXWIDTH__ >= 256
> +  if (foo (0xfwb, 0x24euwb))
> +__builtin_abort ();
> +#endif
> +  return 0;
> +}
> 
>Jakub
> 


Re: [PATCH v3 0/2] x86: Don't save callee-saved registers if not needed

2024-01-27 Thread H.J. Lu
On Wed, Jan 24, 2024 at 7:36 PM Hongtao Liu  wrote:
>
> On Tue, Jan 23, 2024 at 11:00 PM H.J. Lu  wrote:
> >
> > Changes in v3:
> >
> > 1. Rebase against commit 02e68389494
> > 2. Don't add call_no_callee_saved_registers to machine_function since
> > all callee-saved registers are properly clobbered by callee with
> > no_callee_saved_registers attribute.
> >
> The patch LGTM; it should be low risk since there's already the
> no_caller_saved_registers attribute, and the patch just extends the
> same approach to no_callee_saved_registers.
> So if there's no objection (or any concerns) in the next couple of days,
> I'm OK for the patch to go into GCC 14 and be backported.

I am checking it in.

Thanks.

H.J.
> > Changes in v2:
> >
> > 1. Rebase against commit f9df00340e3
> > 2. Don't add redundant clobbered_registers check in ix86_expand_call.
> >
> > In some cases, there is no need to save callee-saved registers:
> >
> > 1. If a noreturn function doesn't throw nor support exceptions, it can
> > skip saving callee-saved registers.
> >
> > 2. When an interrupt handler is implemented by an assembly stub which does:
> >
> >   1. Save all registers.
> >   2. Call a C function.
> >   3. Restore all registers.
> >   4. Return from interrupt.
> >
> > it is completely unnecessary to save and restore any registers in the C
> > function called by the assembly stub, even if they would normally be
> > callee-saved.
> >
> > This patch set adds the no_callee_saved_registers function attribute,
> > which is complementary to the no_caller_saved_registers function
> > attribute, to classify the x86 backend's call-saved register handling
> > with
> >
> >   1. Default call-saved registers.
> >   2. No caller-saved registers with no_caller_saved_registers attribute.
> >   3. No callee-saved registers with no_callee_saved_registers attribute.
> >
> > Functions of the no callee-saved registers type won't save callee-saved
> > registers.
> > If a noreturn function doesn't throw nor support exceptions, it is
> > classified as the no callee-saved registers type.
> >
> > With these changes, __libc_start_main in glibc 2.39, which is a noreturn
> > function, is changed from
> >
> > __libc_start_main:
> > endbr64
> > push   %r15
> > push   %r14
> > mov    %rcx,%r14
> > push   %r13
> > push   %r12
> > push   %rbp
> > mov    %esi,%ebp
> > push   %rbx
> > mov    %rdx,%rbx
> > sub    $0x28,%rsp
> > mov    %rdi,(%rsp)
> > mov    %fs:0x28,%rax
> > mov    %rax,0x18(%rsp)
> > xor    %eax,%eax
> > test   %r9,%r9
> >
> > to
> >
> > __libc_start_main:
> > endbr64
> > sub    $0x28,%rsp
> > mov    %esi,%ebp
> > mov    %rdx,%rbx
> > mov    %rcx,%r14
> > mov    %rdi,(%rsp)
> > mov    %fs:0x28,%rax
> > mov    %rax,0x18(%rsp)
> > xor    %eax,%eax
> > test   %r9,%r9
> >
> > In Linux kernel 6.7.0 on x86-64, do_exit is changed from
> >
> > do_exit:
> > endbr64
> > call   
> > push   %r15
> > push   %r14
> > push   %r13
> > push   %r12
> > mov    %rdi,%r12
> > push   %rbp
> > push   %rbx
> > mov    %gs:0x0,%rbx
> > sub    $0x28,%rsp
> > mov    %gs:0x28,%rax
> > mov    %rax,0x20(%rsp)
> > xor    %eax,%eax
> > call   *0x0(%rip)  # 
> > test   $0x2,%ah
> > je 
> >
> > to
> >
> > do_exit:
> > endbr64
> > call   
> > sub    $0x28,%rsp
> > mov    %rdi,%r12
> > mov    %gs:0x28,%rax
> > mov    %rax,0x20(%rsp)
> > xor    %eax,%eax
> > mov    %gs:0x0,%rbx
> > call   *0x0(%rip)  # 
> > test   $0x2,%ah
> > je 
> >
> > I compared GCC master branch bootstrap and test times on a slow machine
> > with 6.6 Linux kernels compiled with the original GCC 13 and the GCC 13
> > with the backported patch.  The performance data isn't precise since the
> > measurements were done on different days with different GCC sources under
> > different 6.6 kernel versions.
> >
> > GCC master branch build time in seconds:
> >
> > beforeafter  improvement
> > 30043.75user  30013.16user   0%
> > 1274.85system 1243.72system  2.4%
> >
> > GCC master branch test time in seconds (new tests added):
> >
> > beforeafter  improvement
> > 216035.90user 216547.51user  0
> > 27365.51system26658.54system 2.6%
> >
> > Backported to GCC 13 to rebuild system glibc and kernel on Fedora 39.
> > Systems perform normally.
> >
> >
> > H.J. Lu (2):
> >   x86: Add no_callee_saved_registers function attribute
> >   x86: Don't save callee-saved registers in noreturn functions
> >
> >  gcc/config/i386/i386-expand.cc| 52 +---
> >  gcc/config/i386/i386-options.cc 

Re: [PATCH v4 0/4] When cmodel=extreme, add macro support and only support macros.

2024-01-27 Thread Xi Ruoyao
On Sat, 2024-01-27 at 18:02 +0800, Xi Ruoyao wrote:
> On Sat, 2024-01-27 at 11:15 +0800, chenglulu wrote:
> > 
> > On 2024/1/26 at 6:57 PM, Xi Ruoyao wrote:
> > > On Fri, 2024-01-26 at 16:59 +0800, chenglulu wrote:
> > > > On 2024/1/26 at 4:49 PM, Xi Ruoyao wrote:
> > > > > On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote:
> > > > > > v3 -> v4:
> > > > > >     1. Add macro support for TLS symbols
> > > > > >     2. Added support for loading the __tls_get_addr symbol address
> > > > > > using call36.
> > > > > >     3. Merge template got_load_tls_{ld/gd/le/ie}.
> > > > > >     4. Enable explicit reloc for extreme TLS GD/LD with 
> > > > > > -mexplicit-relocs=auto.
> > > > > I've rebased and attached the patch to fix the bad split in 
> > > > > -mexplicit-
> > > > > relocs={always,auto} -mcmodel=extreme on top of this series.  I've not
> > > > > tested it seriously though (only tested the added and modified test
> > > > > cases).
> > > > > 
> > > > OK, I'll test the spec for correctness.
> > > I suppose this still won't work yet because Binutils is not fully fixed.
> > > GAS has been changed not to emit R_LARCH_RELAX for "la.tls.ie a0, t0,
> > > foo", but ld is still not checking if an R_LARCH_RELAX is after
> > > R_LARCH_TLS_IE_PC_{HI20,LO12} properly.  Thus an invalid "partial" TLS
> > > transition can still happen.
> > > 
> > 
> > The following situations are not handled in the patch:
> > 
> > diff --git a/gcc/config/loongarch/loongarch.cc 
> > b/gcc/config/loongarch/loongarch.cc
> > 
> > index 3fab4b64453..6336a9f696f 100644
> > --- a/gcc/config/loongarch/loongarch.cc
> > +++ b/gcc/config/loongarch/loongarch.cc
> > @@ -7472,7 +7472,13 @@ loongarch_output_mi_thunk (FILE *file, tree 
> > thunk_fndecl ATTRIBUTE_UNUSED,
> >   {
> >     if (TARGET_CMODEL_EXTREME)
> >  {
> > - emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
> > + if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
> > +   {
> > + emit_insn (gen_la_pcrel64_two_parts (temp1, temp2, fnaddr));
> > + emit_move_insn (temp1, gen_rtx_PLUS (Pmode, temp1, temp2));
> > +   }
> > + else
> > +   emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));

It looks like this part is unreachable: with -mcmodel=extreme
use_sibcall_p will never be true.

So I cleaned up this part and fixed an ERROR in the added test:

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 3a97ba61362..7b8c85a1606 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -7481,21 +7481,24 @@ loongarch_output_mi_thunk (FILE *file, tree 
thunk_fndecl ATTRIBUTE_UNUSED,
  allowed, otherwise load the address into a register first.  */
   if (use_sibcall_p)
 {
-  if (TARGET_CMODEL_EXTREME)
-   {
- emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
- insn = emit_call_insn (gen_sibcall_internal (temp1, const0_rtx));
-   }
-  else
-   insn = emit_call_insn (gen_sibcall_internal (fnaddr, const0_rtx));
+  /* If TARGET_CMODEL_EXTREME, we cannot do a direct jump at all
+and const_call_insn_operand should have returned false.  */
+  gcc_assert (!TARGET_CMODEL_EXTREME);
+
+  insn = emit_call_insn (gen_sibcall_internal (fnaddr, const0_rtx));
   SIBLING_CALL_P (insn) = 1;
 }
   else
 {
-  if (TARGET_CMODEL_EXTREME)
+  if (!TARGET_CMODEL_EXTREME)
+   loongarch_emit_move (temp1, fnaddr);
+  else if (la_opt_explicit_relocs == EXPLICIT_RELOCS_NONE)
emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
   else
-   loongarch_emit_move (temp1, fnaddr);
+   {
+ emit_insn (gen_la_pcrel64_two_parts (temp1, temp2, fnaddr));
+ emit_move_insn (temp1, gen_rtx_PLUS (Pmode, temp1, temp2));
+   }
 
   emit_jump_insn (gen_indirect_jump (temp1));
 }
diff --git 
a/gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c 
b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c
index 27baf4886d6..35bd4570a9e 100644
--- 
a/gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c
+++ 
b/gcc/testsuite/gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fPIC -mexplicit-relocs=auto -mcmodel=extreme -fno-plt" } 
*/
-/* { dg-final { scan-assembler-not "la.tls.[lg]d" { target tls_native } } } */
+/* { dg-final { scan-assembler-not "la.tls.\[lg\]d" { target tls_native } } } 
*/
 
 #include "./explicit-relocs-auto-tls-ld-gd.c"

And added 3 tests for output_mi_thunk.  The updated patch attached, now
running regression test.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University
From ecbadf341234fcec2e0c16e6b2435d117bf80446 Mon Sep 17 00:00:00 2001
From: Xi Ruoyao 
Date: Fri, 5 Jan 2024 18:40:06 +0800
Subject: [PATCH 5/4] LoongArch: Don't split the instructions 

Re: [PATCH v4 0/4] When cmodel=extreme, add macro support and only support macros.

2024-01-27 Thread Xi Ruoyao
On Sat, 2024-01-27 at 11:15 +0800, chenglulu wrote:
> 
> > On 2024/1/26 at 6:57 PM, Xi Ruoyao wrote:
> > On Fri, 2024-01-26 at 16:59 +0800, chenglulu wrote:
> > > > On 2024/1/26 at 4:49 PM, Xi Ruoyao wrote:
> > > > On Fri, 2024-01-26 at 15:37 +0800, Lulu Cheng wrote:
> > > > > v3 -> v4:
> > > > >     1. Add macro support for TLS symbols
> > > > >     2. Added support for loading the __tls_get_addr symbol address using
> > > > > call36.
> > > > >     3. Merge template got_load_tls_{ld/gd/le/ie}.
> > > > >     4. Enable explicit reloc for extreme TLS GD/LD with 
> > > > > -mexplicit-relocs=auto.
> > > > I've rebased and attached the patch to fix the bad split in -mexplicit-
> > > > relocs={always,auto} -mcmodel=extreme on top of this series.  I've not
> > > > tested it seriously though (only tested the added and modified test
> > > > cases).
> > > > 
> > > OK, I'll test the spec for correctness.
> > I suppose this still won't work yet because Binutils is not fully fixed.
> > GAS has been changed not to emit R_LARCH_RELAX for "la.tls.ie a0, t0,
> > foo", but ld is still not checking if an R_LARCH_RELAX is after
> > R_LARCH_TLS_IE_PC_{HI20,LO12} properly.  Thus an invalid "partial" TLS
> > transition can still happen.
> > 
> 
> The following situations are not handled in the patch:
> 
> diff --git a/gcc/config/loongarch/loongarch.cc 
> b/gcc/config/loongarch/loongarch.cc
> 
> index 3fab4b64453..6336a9f696f 100644
> --- a/gcc/config/loongarch/loongarch.cc
> +++ b/gcc/config/loongarch/loongarch.cc
> @@ -7472,7 +7472,13 @@ loongarch_output_mi_thunk (FILE *file, tree 
> thunk_fndecl ATTRIBUTE_UNUSED,
>   {
>     if (TARGET_CMODEL_EXTREME)
>  {
> - emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
> + if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
> +   {
> + emit_insn (gen_la_pcrel64_two_parts (temp1, temp2, fnaddr));
> + emit_move_insn (temp1, gen_rtx_PLUS (Pmode, temp1, temp2));
> +   }
> + else
> +   emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
>    insn = emit_call_insn (gen_sibcall_internal (temp1, const0_rtx));
>  }
>     else
> @@ -7482,7 +7488,15 @@ loongarch_output_mi_thunk (FILE *file, tree 
> thunk_fndecl ATTRIBUTE_UNUSED,
>     else
>   {
>     if (TARGET_CMODEL_EXTREME)
> -   emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
> +   {
> + if (la_opt_explicit_relocs != EXPLICIT_RELOCS_NONE)
> +   {
> + emit_insn (gen_la_pcrel64_two_parts (temp1, temp2, fnaddr));
> + emit_move_insn (temp1, gen_rtx_PLUS (Pmode, temp1, temp2));
> +   }
> + else
> +   emit_insn (gen_movdi_symbolic_off64 (temp1, fnaddr, temp2));
> +   }
>     else
>  loongarch_emit_move (temp1, fnaddr);

Indeed.  Considering the similarity of these two hunks, I'll separate
the logic into a static function.  And I'll also add some test cases
for them...

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH] lower-bitint: Avoid sign-extending cast to unsigned types feeding div/mod/float [PR113614]

2024-01-27 Thread Jakub Jelinek
Hi!

The following testcase is miscompiled because some narrower value
is sign-extended to a wider unsigned _BitInt used as a division operand.
handle_operand_addr for that case returns the narrower value and a
negative precision, -precision_of_the_narrower_value.  That works fine
for multiplication (at least normal multiplication; we don't merge casts
with .MUL_OVERFLOW or the ubsan multiplication right now), because the
result is the same whether we treat the arguments as signed or unsigned.
But it is completely wrong for division/modulo or conversions to
floating point: if we pass a negative prec for an input operand of a
libgcc handler, the handler treats it like a negative number, not an
unsigned one sign-extended from something smaller (and it doesn't know
to what precision it has been extended).

So, the following patch fixes it by making sure we don't merge such
sign-extensions to unsigned _BitInt type with division, modulo or
conversions to floating point.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-01-27  Jakub Jelinek  

PR tree-optimization/113614
* gimple-lower-bitint.cc (gimple_lower_bitint): Don't merge
widening casts from signed to unsigned types with TRUNC_DIV_EXPR,
TRUNC_MOD_EXPR or FLOAT_EXPR uses.

* gcc.dg/torture/bitint-54.c: New test.

--- gcc/gimple-lower-bitint.cc.jj   2024-01-26 18:05:24.461891138 +0100
+++ gcc/gimple-lower-bitint.cc  2024-01-26 19:04:07.948780942 +0100
@@ -6102,17 +6102,27 @@ gimple_lower_bitint (void)
  && (TREE_CODE (rhs1) != SSA_NAME
  || !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rhs1)))
{
- if (TREE_CODE (TREE_TYPE (rhs1)) != BITINT_TYPE
- || (bitint_precision_kind (TREE_TYPE (rhs1))
- < bitint_prec_large))
-   continue;
  if (is_gimple_assign (use_stmt))
switch (gimple_assign_rhs_code (use_stmt))
  {
- case MULT_EXPR:
  case TRUNC_DIV_EXPR:
  case TRUNC_MOD_EXPR:
  case FLOAT_EXPR:
+   /* For division, modulo and casts to floating
+  point, avoid representing unsigned operands
+  using negative prec if they were sign-extended
+  from narrower precision.  */
+   if (TYPE_UNSIGNED (TREE_TYPE (s))
+   && !TYPE_UNSIGNED (TREE_TYPE (rhs1))
+   && (TYPE_PRECISION (TREE_TYPE (s))
+   > TYPE_PRECISION (TREE_TYPE (rhs1))))
+ goto force_name;
+   /* FALLTHRU */
+ case MULT_EXPR:
+   if (TREE_CODE (TREE_TYPE (rhs1)) != BITINT_TYPE
+   || (bitint_precision_kind (TREE_TYPE (rhs1))
+   < bitint_prec_large))
+ continue;
/* Uses which use handle_operand_addr can't
   deal with nested casts.  */
if (TREE_CODE (rhs1) == SSA_NAME
@@ -6126,6 +6136,10 @@ gimple_lower_bitint (void)
  default:
break;
}
+ if (TREE_CODE (TREE_TYPE (rhs1)) != BITINT_TYPE
+ || (bitint_precision_kind (TREE_TYPE (rhs1))
+ < bitint_prec_large))
+   continue;
  if ((TYPE_PRECISION (TREE_TYPE (rhs1))
   >= TYPE_PRECISION (TREE_TYPE (s)))
  && mergeable_op (use_stmt))
--- gcc/testsuite/gcc.dg/torture/bitint-54.c.jj	2024-01-26 19:09:01.436688318 +0100
+++ gcc/testsuite/gcc.dg/torture/bitint-54.c	2024-01-26 19:16:24.908504368 +0100
@@ -0,0 +1,29 @@
+/* PR tree-optimization/113614 */
+/* { dg-do run { target bitint } } */
+/* { dg-options "-std=c23 -pedantic-errors" } */
+/* { dg-skip-if "" { ! run_expensive_tests }  { "*" } { "-O0" "-O2" } } */
+/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */
+
+_BitInt(8) a;
+_BitInt(8) b;
+_BitInt(8) c;
+
+#if __BITINT_MAXWIDTH__ >= 256
+_BitInt(256)
+foo (_BitInt(8) y, unsigned _BitInt(256) z)
+{
+  unsigned _BitInt(256) d = -y;
+  z /= d;
+  return z + a + b + c;
+}
+#endif
+
+int
+main ()
+{
+#if __BITINT_MAXWIDTH__ >= 256
+  if (foo (0xfwb, 0x24euwb))
+__builtin_abort ();
+#endif
+  return 0;
+}

Jakub



[PATCH] lower-bitint: Fix up VIEW_CONVERT_EXPR handling in lower_mergeable_stmt [PR113568]

2024-01-27 Thread Jakub Jelinek
Hi!

We generally allow merging mergeable stmts with some final cast (but not
further casts or mergeable operations after the cast).  As some casts
are handled conditionally, if (idx < cst) handle_operand (idx); else if
(idx == cst) handle_operand (cst); else ..., we must make sure that e.g. the
mergeable PLUS_EXPR/MINUS_EXPR/NEGATE_EXPR never appear in handle_operand
called from such casts, because it ICEs on invalid SSA_NAME form (that part
could be fixable by adding further PHIs) but also because we'd need to
correctly propagate the overflow flags from the if to the else if.
So, instead, lower_mergeable_stmt handles an outermost widening cast (or
a widening cast feeding an outermost store) specially.
The problem was similar to PR113408: the VIEW_CONVERT_EXPR tree is
present in the gimple_assign_rhs1 while it is not for NOP_EXPR/CONVERT_EXPR,
so the checks whether the outermost cast should be handled didn't handle
the VCE case, and so handle_plus_minus was called from the conditional
handle_cast.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2024-01-27  Jakub Jelinek  

PR tree-optimization/113568
* gimple-lower-bitint.cc (bitint_large_huge::lower_mergeable_stmt):
For VIEW_CONVERT_EXPR use first operand of rhs1 instead of rhs1
in the widening extension checks.

* gcc.dg/bitint-78.c: New test.

--- gcc/gimple-lower-bitint.cc.jj   2024-01-26 17:40:29.083814064 +0100
+++ gcc/gimple-lower-bitint.cc  2024-01-26 18:05:24.461891138 +0100
@@ -2401,6 +2401,8 @@ bitint_large_huge::lower_mergeable_stmt
   rhs1 = gimple_assign_rhs1 (store_operand
 ? SSA_NAME_DEF_STMT (store_operand)
 : stmt);
+  if (TREE_CODE (rhs1) == VIEW_CONVERT_EXPR)
+   rhs1 = TREE_OPERAND (rhs1, 0);
   /* Optimize mergeable ops ending with widening cast to _BitInt
 (or followed by store).  We can lower just the limbs of the
 cast operand and widen afterwards.  */
--- gcc/testsuite/gcc.dg/bitint-78.c.jj 2024-01-26 18:11:54.164435951 +0100
+++ gcc/testsuite/gcc.dg/bitint-78.c	2024-01-26 18:11:33.642723218 +0100
@@ -0,0 +1,21 @@
+/* PR tree-optimization/113568 */
+/* { dg-do compile { target bitint } } */
+/* { dg-options "-O2 -std=c23" } */
+
+signed char c;
+#if __BITINT_MAXWIDTH__ >= 464
+_BitInt(464) g;
+
+void
+foo (void)
+{
+  _BitInt(464) a[2] = {};
+  _BitInt(464) b;
+  while (c)
+{
+  b = g + 1;
+  g = a[0];
+  a[0] = b;
+}
+}
+#endif

Jakub



[PATCH] lower-bitint: Add debugging dump of SSA_NAME -> decl mappings

2024-01-27 Thread Jakub Jelinek
Hi!

While the SSA coalescing performed by bitint lowering prints some information
with -fdump-tree-bitintlower-details, it is really hard to read and doesn't
contain the most important information one looks for when debugging
bitint lowering issues, namely which VAR_DECLs (or PARM_DECLs/RESULT_DECLs)
each SSA_NAME in the large_huge.m_names bitmap maps to.

So, the following patch adds dumping of that, so that we know that say
_3 -> bitint.3
_8 -> bitint.7
_16 -> bitint.7
etc.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-01-27  Jakub Jelinek  

* gimple-lower-bitint.cc (gimple_lower_bitint): For
TDF_DETAILS dump mapping of SSA_NAMEs to decls.

--- gcc/gimple-lower-bitint.cc.jj   2024-01-26 00:07:35.629797857 +0100
+++ gcc/gimple-lower-bitint.cc  2024-01-26 17:40:29.083814064 +0100
@@ -6344,22 +6344,33 @@ gimple_lower_bitint (void)
  }
  }
   tree atype = NULL_TREE;
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, "Mapping SSA_NAMEs to decls:\n");
   EXECUTE_IF_SET_IN_BITMAP (large_huge.m_names, 0, i, bi)
{
  tree s = ssa_name (i);
  int p = var_to_partition (large_huge.m_map, s);
- if (large_huge.m_vars[p] != NULL_TREE)
-   continue;
- if (atype == NULL_TREE
- || !tree_int_cst_equal (TYPE_SIZE (atype),
- TYPE_SIZE (TREE_TYPE (s))))
+ if (large_huge.m_vars[p] == NULL_TREE)
{
- unsigned HOST_WIDE_INT nelts
-   = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (s))) / limb_prec;
- atype = build_array_type_nelts (large_huge.m_limb_type, nelts);
+ if (atype == NULL_TREE
+ || !tree_int_cst_equal (TYPE_SIZE (atype),
+ TYPE_SIZE (TREE_TYPE (s))))
+   {
+ unsigned HOST_WIDE_INT nelts
+   = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (s))) / limb_prec;
+ atype = build_array_type_nelts (large_huge.m_limb_type,
+ nelts);
+   }
+ large_huge.m_vars[p] = create_tmp_var (atype, "bitint");
+ mark_addressable (large_huge.m_vars[p]);
+   }
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ print_generic_expr (dump_file, s, TDF_SLIM);
+ fprintf (dump_file, " -> ");
+ print_generic_expr (dump_file, large_huge.m_vars[p], TDF_SLIM);
+ fprintf (dump_file, "\n");
}
- large_huge.m_vars[p] = create_tmp_var (atype, "bitint");
- mark_addressable (large_huge.m_vars[p]);
}
 }
 

Jakub



[PATCH] libgcc: Fix up _BitInt division [PR113604]

2024-01-27 Thread Jakub Jelinek
Hi!

The following testcase ends up with SIGFPE in __divmodbitint4.
The problem is a thinko in my attempt to implement Knuth's algorithm.

The algorithm does (where b is 65536, i.e. one larger than what
fits in their unsigned short word):
// Compute estimate qhat of q[j].
qhat = (un[j+n]*b + un[j+n-1])/vn[n-1];
rhat = (un[j+n]*b + un[j+n-1]) - qhat*vn[n-1];
  again:
if (qhat >= b || qhat*vn[n-2] > b*rhat + un[j+n-2])
{ qhat = qhat - 1;
  rhat = rhat + vn[n-1];
  if (rhat < b) goto again;
}
The problem is that it uses a double-word / word -> double-word
division (and modulo), while all we have is udiv_qrnnd unless
we'd want to do further library calls, and udiv_qrnnd is a
double-word / word -> word division and modulo.
Now, as the algorithm description says, it can produce at most
word bits + 1 bit quotient.  And I believe that actually the
highest qhat the original algorithm can produce is
(1 << word_bits) + 1.  The algorithm performs earlier canonicalization
where both the divisor and dividend are shifted left such that divisor
has msb set.  If it has msb set already before, no shifting occurs but
we start with added 0 limb, so in the first uv1:uv0 double-word uv1
is 0 and so we can't get too high qhat, if shifting occurs, the first
limb of dividend is shifted right by UWtype bits - shift count into
a new limb, so again in the first iteration in the uv1:uv0 double-word
uv1 doesn't have msb set while vv1 does and qhat has to fit into word.
In the following iterations, previous iteration should guarantee that
the previous quotient digit is correct.  Even if the divisor was the
maximal possible vv1:all_ones_in_all_lower_limbs, if the old uv0:lower_limbs
would be larger or equal to the divisor, the previous quotient digit
would increase and another divisor would be subtracted, which I think
implies that in the next iteration in uv1:uv0 double-word uv1 <= vv1,
but uv0 could be up to all ones, e.g. in case of all lower limbs
of divisor being all ones and at least one dividend limb below uv0
being not all ones.  So, we can e.g. for 64-bit UWtype see
uv1:uv0 / vv1 0x8000000000000000UL:0xffffffffffffffffUL / 0x8000000000000000UL
or 0xffffffffffffffffUL:0xffffffffffffffffUL / 0xffffffffffffffffUL
In all these cases (when uv1 == vv1 && uv0 >= uv1), qhat is
0x10000000000000001UL, i.e. 2 more than fits into the UWtype result;
if uv1 == vv1 && uv0 < uv1 it would be 0x10000000000000000UL, i.e.
1 more than fits into the UWtype result.
Because we only have udiv_qrnnd which can't deal with those too large
cases (SIGFPEs or otherwise invokes undefined behavior on those), I've
tried to handle the uv1 >= vv1 case separately, but for one thing
I thought it would be at most 1 larger than what fits, and for two
have actually subtracted vv1:vv1 from uv1:uv0 instead of subtracting
0:vv1 from uv1:uv0.
For the uv1 < vv1 case, the implementation already performs roughly
what the algorithm does.
Now, let's see what happens with the two possible extra cases in
the original algorithm.
If uv1 == vv1 && uv0 < uv1, qhat above would be b, so we take the
if (qhat >= b) path, decrement qhat by 1 (it becomes b - 1), add
vn[n-1] aka vv1 to rhat and goto again if rhat < b (but because
qhat already fits we can goto to the again label in the uv1 < vv1
code).  rhat in this case is uv0, and rhat + vv1 can but doesn't
have to overflow; say for uv0 42UL and vv1 0x8000000000000000UL
it will not (and so we should goto again), while for uv0
0x8000000000000000UL and vv1 0x8000000000000001UL it will (and
we shouldn't goto again).
If uv1 == vv1 && uv0 >= uv1, qhat above would be b + 1, so we
take the if (qhat >= b) path, decrement qhat by 1 (it becomes b) and add
vn[n-1] aka vv1 to rhat.  But because vv1 has the msb set and
rhat in this case is uv0 - vv1, the rhat + vv1 addition
certainly doesn't overflow, because (uv0 - vv1) + vv1 is uv0,
so in the algorithm we goto again, again take the if (qhat >= b) path
and decrement qhat so it finally becomes b - 1, and add vn[n-1]
aka vv1 to rhat again.  But this time I believe it must always
overflow, simply because we added (uv0 - vv1) + vv1 + vv1 and
vv1 has the msb set, so already vv1 + vv1 must overflow.  And
because it overflowed, it will not goto again.
So, I believe the following patch implements this correctly, by
subtracting vv1 from uv1:uv0 double-word once, then comparing
again if uv1 >= vv1.  If that is true, subtract vv1 from uv1:uv0
again and add 2 * vv1 to rhat, no __builtin_add_overflow is needed
as we know it always overflowed and so won't goto again.
If after the first subtraction uv1 < vv1, use __builtin_add_overflow 
when adding vv1 to rhat, because it can but doesn't have to overflow.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-01-27  Jakub Jelinek  

PR libgcc/113604
* libgcc2.c (__divmodbitint4): If uv1 >= vv1, subtract
vv1 from uv1:uv0 once or twice as needed, rather than
subtracting vv1:vv1.

*