[PATCH] [i386] Prevent vectorization for load from parm_decl at O2 to avoid STF issue.

2022-03-03 Thread liuhongt via Gcc-patches
When a parameter is passed on the stack, a vectorized load from the
parm_decl in the callee may trigger a serious store-forwarding (STF)
stall. This is why GCC 12 regresses by 50% on cray at -O2 compared to
GCC 11.

The patch adds an extremely large number to stmt_cost to prevent
vectorization of loads from parm_decl under the very-cheap cost model.
This at least prevents the O2 regression caused by the STF issue, but
may lose some performance where there is no such issue (1 vector_load
vs. n scalar_load + CTOR).

No impact on SPEC2017 for either plain O2 or native O2 on ICX.
Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

PR target/101908
* config/i386/i386.cc (ix86_load_maybe_stfs_p): New.
(ix86_vector_costs::add_stmt_cost): Add extra cost for
vector_load/unaligned_load which may have a store-forwarding stall issue.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr101908-1.c: New test.
* gcc.target/i386/pr101908-2.c: New test.
---
 gcc/config/i386/i386.cc| 31 ++
 gcc/testsuite/gcc.target/i386/pr101908-1.c | 12 +
 gcc/testsuite/gcc.target/i386/pr101908-2.c | 12 +
 3 files changed, 55 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr101908-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr101908-2.c
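
Illustration (not part of the patch): the heuristic described in the
commit message can be read as a small predicate plus a cost bump. The
sketch below models it in plain C outside of GCC. struct load_ref,
enum base_kind, load_maybe_stfs_p, vector_load_cost and the SKETCH_*
macros are hypothetical stand-ins for the tree/GIMPLE queries used in
i386.cc, and 64 is an assumed value for MAX_BITS_PER_WORD on x86-64.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the GCC internals used by the patch.  */
enum base_kind { BASE_PARM_DECL, BASE_OTHER };

struct load_ref
{
  enum base_kind base;     /* What get_base_address would report.  */
  bool size_known;         /* Models tree_fits_uhwi_p on the type size.  */
  unsigned long size_bits; /* Models tree_to_uhwi on the type size.  */
};

/* Assumption: 64 stands in for MAX_BITS_PER_WORD on x86-64.  */
#define SKETCH_BITS_PER_WORD 64
/* The penalty value the patch adds to stmt_cost.  */
#define SKETCH_STF_PENALTY 2000

/* Same shape as ix86_load_maybe_stfs_p: only a load whose base is a
   parameter wider than one word is flagged.  */
static bool
load_maybe_stfs_p (const struct load_ref *ref)
{
  if (ref->base != BASE_PARM_DECL
      || !ref->size_known
      || ref->size_bits <= SKETCH_BITS_PER_WORD)
    return false;
  return true;
}

/* Cost of a vector load: the penalty applies only under the very-cheap
   cost model, for pure SLP statements that load from a flagged ref.  */
static int
vector_load_cost (int base_cost, bool very_cheap_model, bool pure_slp,
                  const struct load_ref *ref)
{
  if (very_cheap_model && pure_slp && load_maybe_stfs_p (ref))
    return base_cost + SKETCH_STF_PENALTY;
  return base_cost;
}
```

In these terms, pr101908-2.c (struct X passed by value, 128 bits)
corresponds to a flagged BASE_PARM_DECL load, while pr101908-1.c
(struct X passed by pointer) corresponds to BASE_OTHER and keeps its
normal cost.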

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index b2bf90576d5..3bbaaf65ea8 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -22976,6 +22976,19 @@ ix86_noce_conversion_profitable_p (rtx_insn *seq, struct noce_if_info *if_info)
   return default_noce_conversion_profitable_p (seq, if_info);
 }
 
+/* Return true if REF may have STF issue, otherwise false.  */
+static bool
+ix86_load_maybe_stfs_p (tree ref)
+{
+  tree addr = get_base_address (ref);
+
+  if (TREE_CODE (addr) != PARM_DECL
+      || !tree_fits_uhwi_p (TYPE_SIZE (TREE_TYPE (addr)))
+      || tree_to_uhwi (TYPE_SIZE (TREE_TYPE (addr))) <= MAX_BITS_PER_WORD)
+    return false;
+  return true;
+}
+
 /* x86-specific vector costs.  */
 class ix86_vector_costs : public vector_costs
 {
@@ -23203,6 +23216,24 @@ ix86_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
if (TREE_CODE (op) == SSA_NAME)
  TREE_VISITED (op) = 0;
 }
+
+  /* Prevent vectorization for load from parm_decl at O2 to avoid STF issue.
+     Performance may be lost when there's no STF issue (1 vector_load vs n
+     scalar_load + CTOR).
+     TODO: both extra cost (2000) and ix86_load_maybe_stfs_p need to be fine
+     tuned.  */
+  if ((kind == vector_load || kind == unaligned_load)
+      && flag_vect_cost_model == VECT_COST_MODEL_VERY_CHEAP
+      && stmt_info
+      && stmt_info->slp_type == pure_slp
+      && stmt_info->stmt
+      && gimple_assign_load_p (stmt_info->stmt)
+      && ix86_load_maybe_stfs_p (gimple_assign_rhs1 (stmt_info->stmt)))
+    {
+      stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
+      stmt_cost += 2000;
+    }
+
   if (stmt_cost == -1)
 stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
 
diff --git a/gcc/testsuite/gcc.target/i386/pr101908-1.c 
b/gcc/testsuite/gcc.target/i386/pr101908-1.c
new file mode 100644
index 000..f8e0f2e26bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr101908-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-slp-details" } */
+/* { dg-final { scan-tree-dump {(?n)add new stmt:.*MEM \} 
"slp2" } } */
+
+struct X { double x[2]; };
+typedef double v2df __attribute__((vector_size(16)));
+
+v2df __attribute__((noipa))
+foo (struct X* x, struct X* y)
+{
+  return (v2df) {x->x[1], x->x[0] } + (v2df) { y->x[1], y->x[0] };
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr101908-2.c 
b/gcc/testsuite/gcc.target/i386/pr101908-2.c
new file mode 100644
index 000..7f2f00cebab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr101908-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-slp-details" } */
+/* { dg-final { scan-tree-dump-not {(?n)add new stmt:.*MEM \} "slp2" } } */
+
+struct X { double x[2]; };
+typedef double v2df __attribute__((vector_size(16)));
+
+v2df __attribute__((noipa))
+foo (struct X x, struct X y)
+{
+  return (v2df) {x.x[1], x.x[0] } + (v2df) { y.x[1], y.x[0] };
+}
-- 
2.18.1



[PATCH v8 12/12] LoongArch Port: Add doc.

2022-03-03 Thread xuchenghua
From: chenglulu 

2022-03-04  Chenghua Xu  
Lulu Cheng  

* contrib/config-list.mk: Add LoongArch triplet.
* gcc/doc/install.texi: Add LoongArch options section.
* gcc/doc/invoke.texi: Add LoongArch options section.
* gcc/doc/md.texi: Add LoongArch options section.
---
 contrib/config-list.mk |   5 +-
 gcc/doc/install.texi   |  47 +-
 gcc/doc/invoke.texi| 202 +
 gcc/doc/md.texi|  55 +++
 4 files changed, 303 insertions(+), 6 deletions(-)
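
Illustration (not part of the patch): the install.texi text in this
patch ties the suffix of a loongarch64-linux-gnu* triplet to a default
ABI. A minimal sketch of that table in C follows; the function name
default_abi_for_triplet is hypothetical, and the sketch ignores
--with-abi, which the documentation says can override the default for
the unsuffixed triplet.

```c
#include <stddef.h>
#include <string.h>

/* Return the default ABI implied by a loongarch64-linux-gnu* triplet,
   or NULL if the triplet or its suffix is not one of the documented
   forms.  Hypothetical helper, not a GCC function.  */
static const char *
default_abi_for_triplet (const char *triplet)
{
  static const char prefix[] = "loongarch64-linux-gnu";
  size_t n = sizeof prefix - 1;
  const char *suffix;

  if (strncmp (triplet, prefix, n) != 0)
    return NULL;

  suffix = triplet + n;
  /* The unsuffixed triplet is documented as the same as ...gnuf64.  */
  if (*suffix == '\0' || strcmp (suffix, "f64") == 0)
    return "lp64d/base";
  if (strcmp (suffix, "f32") == 0)
    return "lp64f/base";
  if (strcmp (suffix, "sf") == 0)
    return "lp64s/base";
  return NULL;
}
```

For example, default_abi_for_triplet ("loongarch64-linux-gnusf") yields
"lp64s/base" under these assumptions.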

diff --git a/contrib/config-list.mk b/contrib/config-list.mk
index 3e1d1321861..ba6f12e4693 100644
--- a/contrib/config-list.mk
+++ b/contrib/config-list.mk
@@ -57,7 +57,10 @@ LIST = aarch64-elf aarch64-linux-gnu aarch64-rtems \
   i686-wrs-vxworksae \
   i686-cygwinOPT-enable-threads=yes i686-mingw32crt ia64-elf \
   ia64-freebsd6 ia64-linux ia64-hpux ia64-hp-vms iq2000-elf lm32-elf \
-  lm32-rtems lm32-uclinux m32c-rtems m32c-elf m32r-elf m32rle-elf \
+  lm32-rtems lm32-uclinux \
+  loongarch64-linux-gnu loongarch64-linux-gnuf64 \
+  loongarch64-linux-gnuf32 loongarch64-linux-gnusf \
+  m32c-rtems m32c-elf m32r-elf m32rle-elf \
   m68k-elf m68k-netbsdelf \
   m68k-uclinux m68k-linux m68k-rtems \
   mcore-elf microblaze-linux microblaze-elf \
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 7258f9def6c..5fb55b1d064 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -747,9 +747,9 @@ Here are the possible CPU types:
 @quotation
 aarch64, aarch64_be, alpha, alpha64, amdgcn, arc, arceb, arm, armeb, avr, bfin,
 bpf, cr16, cris, csky, epiphany, fido, fr30, frv, ft32, h8300, hppa, hppa2.0,
-hppa64, i486, i686, ia64, iq2000, lm32, m32c, m32r, m32rle, m68k, mcore,
-microblaze, microblazeel, mips, mips64, mips64el, mips64octeon, mips64orion,
-mips64vr, mipsel, mipsisa32, mipsisa32r2, mipsisa64, mipsisa64r2,
+hppa64, i486, i686, ia64, iq2000, lm32, loongarch64, m32c, m32r, m32rle, m68k,
+mcore, microblaze, microblazeel, mips, mips64, mips64el, mips64octeon,
+mips64orion, mips64vr, mipsel, mipsisa32, mipsisa32r2, mipsisa64, mipsisa64r2,
 mipsisa64r2el, mipsisa64sb1, mipsisa64sr71k, mipstx39, mmix, mn10300, moxie,
 msp430, nds32be, nds32le, nios2, nvptx, or1k, pdp11, powerpc, powerpc64,
 powerpc64le, powerpcle, pru, riscv32, riscv32be, riscv64, riscv64be, rl78, rx,
@@ -1166,8 +1166,9 @@ sysv, aix.
 @itemx --without-multilib-list
 Specify what multilibs to build.  @var{list} is a comma separated list of
 values, possibly consisting of a single value.  Currently only implemented
-for aarch64*-*-*, arm*-*-*, riscv*-*-*, sh*-*-* and x86-64-*-linux*.  The
-accepted values and meaning for each target is given below.
+for aarch64*-*-*, arm*-*-*, loongarch64-*-*, riscv*-*-*, sh*-*-* and
+x86-64-*-linux*.  The accepted values and meaning for each target is given
+below.
 
 @table @code
 @item aarch64*-*-*
@@ -1254,6 +1255,14 @@ profile.  The union of these options is considered when 
specifying both
 @code{-mfloat-abi=hard}
 @end multitable
 
+@item loongarch*-*-*
+@var{list} is a comma-separated list of the following ABI identifiers:
+@code{lp64d[/base]} @code{lp64f[/base]} @code{lp64s[/base]}, where the
+@code{/base} suffix may be omitted, to enable their respective run-time
+libraries.  If @var{list} is empty or @code{default}, or if
+@option{--with-multilib-list} is not specified, then the default ABI as
+specified by @option{--with-abi} or implied by @option{--target} is selected.
+
 @item riscv*-*-*
 @var{list} is a single ABI name.  The target architecture must be either
 @code{rv32gc} or @code{rv64gc}.  This will build a single multilib for the
@@ -4439,6 +4448,34 @@ This configuration is intended for embedded systems.
 Lattice Mico32 processor.
 This configuration is intended for embedded systems running uClinux.
 
+@html
+
+@end html
+@anchor{loongarch}
+@heading LoongArch
+LoongArch processor.
+The following LoongArch targets are available:
+@table @code
+@item loongarch64-linux-gnu*
+LoongArch processor running GNU/Linux.  This target triplet may be coupled
+with a small set of possible suffixes to identify their default ABI type:
+@table @code
+@item f64
+Uses @code{lp64d/base} ABI by default.
+@item f32
+Uses @code{lp64f/base} ABI by default.
+@item sf
+Uses @code{lp64s/base} ABI by default.
+@end table
+
+@item loongarch64-linux-gnu
+Same as @code{loongarch64-linux-gnuf64}, but may be used with
+@option{--with-abi=*} to configure the default ABI type.
+@end table
+
+More information about LoongArch can be found at
+@uref{https://github.com/loongson/LoongArch-Documentation}.
+
 @html
 
 @end html
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 248ed534aee..d884b30b96e 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -995,6 +995,16 @@ Objective-C and Objective-C++ Dialects}.
 @gccoptlist{-mbarrel-shift-enabled  -mdivide-enabled  -mmultiply-enabled @gol
 -msign-extend-enabled  -muser-enabled}
 
+@emph{LoongArch Options}

[PATCH v8 11/12] LoongArch Port: gcc/testsuite

2022-03-03 Thread xuchenghua
From: chenglulu 

2022-03-04  Chenghua Xu  
Lulu Cheng  

gcc/testsuite/

* g++.dg/cpp0x/constexpr-rom.C: Add build options for LoongArch.
* g++.old-deja/g++.abi/ptrmem.C: Add LoongArch support.
* g++.old-deja/g++.pt/ptrmem6.C: xfail for LoongArch.
* gcc.dg/20020312-2.c: Add LoongArch support.
* gcc.dg/loop-8.c: Skip on LoongArch.
* gcc.dg/torture/stackalign/builtin-apply-2.c: Likewise.
* gcc.dg/tree-ssa/ssa-fre-3.c: Likewise.
* go.test/go-test.exp: Define the LoongArch target.
* lib/target-supports.exp: Likewise.
* gcc.target/loongarch/loongarch.exp: New file.
* gcc.target/loongarch/tst-asm-const.c: Likewise.
---
 gcc/testsuite/g++.dg/cpp0x/constexpr-rom.C|  2 +-
 gcc/testsuite/g++.old-deja/g++.abi/ptrmem.C   |  2 +-
 gcc/testsuite/g++.old-deja/g++.pt/ptrmem6.C   |  2 +-
 gcc/testsuite/gcc.dg/20020312-2.c |  2 +
 gcc/testsuite/gcc.dg/loop-8.c |  2 +-
 .../torture/stackalign/builtin-apply-2.c  |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-3.c |  2 +-
 .../gcc.target/loongarch/loongarch.exp| 40 +++
 .../gcc.target/loongarch/tst-asm-const.c  | 16 
 gcc/testsuite/go.test/go-test.exp |  3 ++
 gcc/testsuite/lib/target-supports.exp | 14 +++
 11 files changed, 81 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/loongarch.exp
 create mode 100644 gcc/testsuite/gcc.target/loongarch/tst-asm-const.c
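
Illustration (not part of the patch): the ptrmem.C hunk below adds
__loongarch__ to the targets where the "virtual" flag of a pointer to
member function lives in the lowest bit of the delta, with the delta
itself shifted left by one bit. A minimal sketch of that encoding in C
follows; encode_delta, decode_delta and decode_virtual are hypothetical
names. The test's ADJUST_DELTA macro uses a left shift; the sketch uses
multiplication and exact division instead so that negative deltas stay
well defined in ISO C.

```c
/* Pack DELTA and the virtual flag into one value: the flag occupies
   the lowest bit, the delta the remaining bits.  Mirrors
   (((delta) << 1) + !!(virt)) from the test.  */
static long
encode_delta (long delta, int is_virtual)
{
  return delta * 2 + !!is_virtual;
}

/* Recover the virtual flag from the lowest bit.  */
static int
decode_virtual (long encoded)
{
  return (int) (encoded & 1);
}

/* Recover the original delta by removing the flag and halving.  */
static long
decode_delta (long encoded)
{
  return (encoded - decode_virtual (encoded)) / 2;
}
```

With this layout the function pointer itself keeps all of its bits,
which is what the comment in ptrmem.C describes for such platforms.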

diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-rom.C 
b/gcc/testsuite/g++.dg/cpp0x/constexpr-rom.C
index 2e0ef685f36..424979a604b 100644
--- a/gcc/testsuite/g++.dg/cpp0x/constexpr-rom.C
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-rom.C
@@ -1,6 +1,6 @@
 // PR c++/49673: check that test_data goes into .rodata
 // { dg-do compile { target c++11 } }
-// { dg-additional-options -G0 { target { { alpha*-*-* frv*-*-* ia64-*-* 
lm32*-*-* m32r*-*-* microblaze*-*-* mips*-*-* nios2-*-* powerpc*-*-* 
rs6000*-*-* } && { ! { *-*-darwin* *-*-aix* alpha*-*-*vms* } } } } }
+// { dg-additional-options -G0 { target { { alpha*-*-* frv*-*-* ia64-*-* 
lm32*-*-* m32r*-*-* microblaze*-*-* mips*-*-* loongarch*-*-* nios2-*-* 
powerpc*-*-* rs6000*-*-* } && { ! { *-*-darwin* *-*-aix* alpha*-*-*vms* } } } } 
}
 // { dg-final { scan-assembler "\\.rdata" { target mips*-*-* } } }
 // { dg-final { scan-assembler "rodata" { target { { *-*-linux-gnu *-*-gnu* 
*-*-elf } && { ! { mips*-*-* riscv*-*-* } } } } } }
 
diff --git a/gcc/testsuite/g++.old-deja/g++.abi/ptrmem.C 
b/gcc/testsuite/g++.old-deja/g++.abi/ptrmem.C
index bda7960d8a2..f69000e9081 100644
--- a/gcc/testsuite/g++.old-deja/g++.abi/ptrmem.C
+++ b/gcc/testsuite/g++.old-deja/g++.abi/ptrmem.C
@@ -7,7 +7,7 @@
function.  However, some platforms use all bits to encode a
function pointer.  Such platforms use the lowest bit of the delta,
that is shifted left by one bit.  */
-#if defined __MN10300__ || defined __SH5__ || defined __arm__ || defined 
__thumb__ || defined __mips__ || defined __aarch64__ || defined __PRU__
+#if defined __MN10300__ || defined __SH5__ || defined __arm__ || defined 
__thumb__ || defined __mips__ || defined __aarch64__ || defined __PRU__ || 
defined __loongarch__
 #define ADJUST_PTRFN(func, virt) ((void (*)())(func))
 #define ADJUST_DELTA(delta, virt) (((delta) << 1) + !!(virt))
 #else
diff --git a/gcc/testsuite/g++.old-deja/g++.pt/ptrmem6.C 
b/gcc/testsuite/g++.old-deja/g++.pt/ptrmem6.C
index 9f4bbe43f89..8f8f7017ab7 100644
--- a/gcc/testsuite/g++.old-deja/g++.pt/ptrmem6.C
+++ b/gcc/testsuite/g++.old-deja/g++.pt/ptrmem6.C
@@ -25,7 +25,7 @@ int main() {
   h<::j>(); // { dg-error "" } 
   g<(void (A::*)()) ::f>(); // { dg-error "" "" { xfail c++11 } }
   h<(int A::*) ::i>(); // { dg-error "" "" { xfail c++11 } }
-  g<(void (A::*)()) ::f>(); // { dg-error "" "" { xfail { c++11 && { 
aarch64*-*-* arm*-*-* mips*-*-* } } } }
+  g<(void (A::*)()) ::f>(); // { dg-error "" "" { xfail { c++11 && { 
aarch64*-*-* arm*-*-* mips*-*-* loongarch*-*-* } } } }
   h<(int A::*) ::j>(); // { dg-error "" } 
   g<(void (A::*)()) 0>(); // { dg-error "" "" { target { ! c++11 } } }
   h<(int A::*) 0>(); // { dg-error "" "" { target { ! c++11 } } }
diff --git a/gcc/testsuite/gcc.dg/20020312-2.c 
b/gcc/testsuite/gcc.dg/20020312-2.c
index 52c33d09b90..92bc150df0f 100644
--- a/gcc/testsuite/gcc.dg/20020312-2.c
+++ b/gcc/testsuite/gcc.dg/20020312-2.c
@@ -37,6 +37,8 @@ extern void abort (void);
 /* PIC register is r1, but is used even without -fpic.  */
 #elif defined(__lm32__)
 /* No pic register.  */
+#elif defined(__loongarch__)
+/* No pic register.  */
 #elif defined(__M32R__)
 /* No pic register.  */
 #elif defined(__m68k__)
diff --git a/gcc/testsuite/gcc.dg/loop-8.c b/gcc/testsuite/gcc.dg/loop-8.c
index a685fc25056..8e5f2087831 100644
--- a/gcc/testsuite/gcc.dg/loop-8.c
+++ b/gcc/testsuite/gcc.dg/loop-8.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { 

[PATCH v8 08/12] LoongArch Port: libgcc

2022-03-03 Thread xuchenghua
From: chenglulu 

2022-03-04  Chenghua Xu  
Lulu Cheng  

libgcc/

* config/loongarch/crtfastmath.c: New file.
* config/loongarch/crti.S: Likewise.
* config/loongarch/crtn.S: Likewise.
* config/loongarch/linux-unwind.h: Likewise.
* config/loongarch/sfp-machine.h: Likewise.
* config/loongarch/t-crtstuff: Likewise.
* config/loongarch/t-loongarch: Likewise.
* config/loongarch/t-loongarch64: Likewise.
* config/loongarch/t-softfp-tf: Likewise.
* config.host: Add LoongArch tuples.
* configure.ac: Add LoongArch support.
---
 libgcc/config.host |  28 -
 libgcc/config/loongarch/crtfastmath.c  |  52 +
 libgcc/config/loongarch/crti.S |  43 +++
 libgcc/config/loongarch/crtn.S |  39 +++
 libgcc/config/loongarch/linux-unwind.h |  80 +
 libgcc/config/loongarch/sfp-machine.h  | 152 +
 libgcc/config/loongarch/t-crtstuff |   5 +
 libgcc/config/loongarch/t-loongarch|   7 ++
 libgcc/config/loongarch/t-loongarch64  |   1 +
 libgcc/config/loongarch/t-softfp-tf|   3 +
 libgcc/configure.ac|   2 +-
 11 files changed, 410 insertions(+), 2 deletions(-)
 create mode 100644 libgcc/config/loongarch/crtfastmath.c
 create mode 100644 libgcc/config/loongarch/crti.S
 create mode 100644 libgcc/config/loongarch/crtn.S
 create mode 100644 libgcc/config/loongarch/linux-unwind.h
 create mode 100644 libgcc/config/loongarch/sfp-machine.h
 create mode 100644 libgcc/config/loongarch/t-crtstuff
 create mode 100644 libgcc/config/loongarch/t-loongarch
 create mode 100644 libgcc/config/loongarch/t-loongarch64
 create mode 100644 libgcc/config/loongarch/t-softfp-tf
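
Illustration (not part of the patch): the loongarch*-*-* case added to
config.host below assembles tmake_file from three configure results. A
minimal sketch of the same selection logic in C follows, for readers who
want to check which fragments end up in the list. The names
loongarch_tmake_file and append_part are hypothetical; the three
parameters stand for libgcc_cv_loongarch_hard_float,
ac_cv_sizeof_long_double and host_address.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Append PART to the NUL-terminated string in BUF without writing
   past LEN bytes (anything that does not fit is dropped).  */
static void
append_part (char *buf, size_t len, const char *part)
{
  size_t used = strlen (buf);
  if (used + 1 < len)
    strncat (buf, part, len - 1 - used);
}

/* Build the tmake_file list in the same order as the shell fragment.  */
static void
loongarch_tmake_file (char *buf, size_t len, bool hard_float,
                      int sizeof_long_double, int host_address)
{
  if (len == 0)
    return;
  buf[0] = '\0';

  append_part (buf, len, "loongarch/t-loongarch");
  if (hard_float)
    append_part (buf, len, " t-hardfp-sfdf t-hardfp");
  else
    append_part (buf, len, " t-softfp-sfdf");
  if (sizeof_long_double == 16)
    append_part (buf, len, " loongarch/t-softfp-tf");
  if (host_address == 64)
    append_part (buf, len, " loongarch/t-loongarch64");
  append_part (buf, len, " t-softfp");
}
```

Note that t-softfp is appended unconditionally, so it is present even
in the hard-float case.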

diff --git a/libgcc/config.host b/libgcc/config.host
index 094fd3ad254..8c56fcae5d2 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -138,6 +138,22 @@ hppa*-*-*)
 lm32*-*-*)
cpu_type=lm32
;;
+loongarch*-*-*)
+   cpu_type=loongarch
+   tmake_file="loongarch/t-loongarch"
+   if test "${libgcc_cv_loongarch_hard_float}" = yes; then
+   tmake_file="${tmake_file} t-hardfp-sfdf t-hardfp"
+   else
+   tmake_file="${tmake_file} t-softfp-sfdf"
+   fi
+   if test "${ac_cv_sizeof_long_double}" = 16; then
+   tmake_file="${tmake_file} loongarch/t-softfp-tf"
+   fi
+   if test "${host_address}" = 64; then
+   tmake_file="${tmake_file} loongarch/t-loongarch64"
+   fi
+   tmake_file="${tmake_file} t-softfp"
+   ;;
 m32r*-*-*)
 cpu_type=m32r
 ;;
@@ -925,7 +941,17 @@ lm32-*-rtems*)
 lm32-*-uclinux*)
 extra_parts="$extra_parts crtbegin.o crtendS.o crtbeginT.o"
 tmake_file="lm32/t-lm32 lm32/t-uclinux t-libgcc-pic t-softfp-sfdf 
t-softfp"
-   ;;  
+   ;;
+loongarch*-*-linux*)
+   extra_parts="$extra_parts crtfastmath.o"
+   tmake_file="${tmake_file} t-crtfm loongarch/t-crtstuff"
+   case ${host} in
+ *)
+   tmake_file="${tmake_file} t-slibgcc-libgcc"
+   ;;
+   esac
+   md_unwind_header=loongarch/linux-unwind.h
+   ;;
 m32r-*-elf*)
tmake_file="$tmake_file m32r/t-m32r t-fdpbit"
extra_parts="$extra_parts crtinit.o crtfini.o"
diff --git a/libgcc/config/loongarch/crtfastmath.c 
b/libgcc/config/loongarch/crtfastmath.c
new file mode 100644
index 000..52b0d6da087
--- /dev/null
+++ b/libgcc/config/loongarch/crtfastmath.c
@@ -0,0 +1,52 @@
+/* Copyright (C) 2021-2022 Free Software Foundation, Inc.
+   Contributed by Loongson Ltd.
+   Based on MIPS target for GNU compiler.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License
+and a copy of the GCC Runtime Library Exception along with this
+program; see the files COPYING3 and COPYING.RUNTIME respectively.
+If not, see <http://www.gnu.org/licenses/>.  */
+
+#ifdef __loongarch_hard_float
+
+/* Rounding control.  */
+#define _FPU_RC_NEAREST 0x000 /* RECOMMENDED.  */
+#define _FPU_RC_ZERO0x100
+#define _FPU_RC_UP  0x200
+#define _FPU_RC_DOWN0x300
+
+/* Enable interrupts for IEEE exceptions.  */
+#define _FPU_IEEE 0x001F
+
+/* Macros for accessing the hardware control word.  */
+#define _FPU_GETCW(cw) __asm__ volatile 

[PATCH v8 10/12] LoongArch Port: libgomp

2022-03-03 Thread xuchenghua
From: chenglulu 

2022-03-04  Chenghua Xu  
Lulu Cheng  

libgomp/

* configure.tgt: Add LoongArch triplet.
---
 libgomp/configure.tgt | 4 
 1 file changed, 4 insertions(+)

diff --git a/libgomp/configure.tgt b/libgomp/configure.tgt
index d4f1e741b5a..2cd7272fcd8 100644
--- a/libgomp/configure.tgt
+++ b/libgomp/configure.tgt
@@ -56,6 +56,10 @@ if test x$enable_linux_futex = xyes; then
config_path="linux/ia64 linux posix"
;;
 
+loongarch*-*-linux*)
+   config_path="linux posix"
+   ;;
+
 mips*-*-linux*)
config_path="linux/mips linux posix"
;;
-- 
2.31.1



[PATCH v8 09/12] LoongArch Port: Regenerate libgcc/configure.

2022-03-03 Thread xuchenghua
From: chenglulu 

2022-03-04  Chenghua Xu  
Lulu Cheng  

* libgcc/configure: Regenerated.
---
 libgcc/configure | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/libgcc/configure b/libgcc/configure
index 52bf25d4e94..1f9b2ac578b 100755
--- a/libgcc/configure
+++ b/libgcc/configure
@@ -2403,6 +2403,9 @@ case "${host}" in
# sets the default TLS model and affects inlining.
PICFLAG=-fPIC
;;
+loongarch*-*-*)
+   PICFLAG=-fpic
+   ;;
 mips-sgi-irix6*)
# PIC is the default.
;;
@@ -5073,7 +5076,7 @@ $as_echo "$libgcc_cv_cfi" >&6; }
 # word size rather than the address size.
 cat > conftest.c <

[PATCH v8 06/12] LoongArch Port: Builtin functions.

2022-03-03 Thread xuchenghua
From: chenglulu 

2022-03-04  Chenghua Xu  
Lulu Cheng  

gcc/

* config/loongarch/larchintrin.h: New file.
* config/loongarch/loongarch-builtins.cc: New file.
---
 gcc/config/loongarch/larchintrin.h | 413 +
 gcc/config/loongarch/loongarch-builtins.cc | 511 +
 2 files changed, 924 insertions(+)
 create mode 100644 gcc/config/loongarch/larchintrin.h
 create mode 100644 gcc/config/loongarch/loongarch-builtins.cc

diff --git a/gcc/config/loongarch/larchintrin.h 
b/gcc/config/loongarch/larchintrin.h
new file mode 100644
index 000..d8e2a743ae5
--- /dev/null
+++ b/gcc/config/loongarch/larchintrin.h
@@ -0,0 +1,413 @@
+/* Intrinsics for LoongArch BASE operations.
+   Copyright (C) 2021-2022 Free Software Foundation, Inc.
+   Contributed by Loongson Ltd.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published
+by the Free Software Foundation; either version 3, or (at your
+option) any later version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef _GCC_LOONGARCH_BASE_INTRIN_H
+#define _GCC_LOONGARCH_BASE_INTRIN_H
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+typedef struct drdtime
+{
+  unsigned long dvalue;
+  unsigned long dtimeid;
+} __drdtime_t;
+
+typedef struct rdtime
+{
+  unsigned int value;
+  unsigned int timeid;
+} __rdtime_t;
+
+#ifdef __loongarch64
+extern __inline __drdtime_t
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__builtin_loongarch_rdtime_d (void)
+{
+  __drdtime_t drdtime;
+  __asm__ volatile (
+"rdtime.d\t%[val],%[tid]\n\t"
+: [val]"=&r"(drdtime.dvalue),[tid]"=&r"(drdtime.dtimeid)
+:);
+  return drdtime;
+}
+#define __rdtime_d __builtin_loongarch_rdtime_d
+#endif
+
+extern __inline __rdtime_t
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__builtin_loongarch_rdtimeh_w (void)
+{
+  __rdtime_t rdtime;
+  __asm__ volatile (
+"rdtimeh.w\t%[val],%[tid]\n\t"
+: [val]"=&r"(rdtime.value),[tid]"=&r"(rdtime.timeid)
+:);
+  return rdtime;
+}
+#define __rdtimel_w __builtin_loongarch_rdtimel_w
+
+extern __inline __rdtime_t
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__builtin_loongarch_rdtimel_w (void)
+{
+  __rdtime_t rdtime;
+  __asm__ volatile (
+"rdtimel.w\t%[val],%[tid]\n\t"
+: [val]"=&r"(rdtime.value),[tid]"=&r"(rdtime.timeid)
+:);
+  return rdtime;
+}
+#define __rdtimeh_w __builtin_loongarch_rdtimeh_w
+
+/* Assembly instruction format:rj, fcsr.  */
+/* Data types in instruction templates:  USI, UQI.  */
+#define __movfcsr2gr(/*ui5*/ _1) __builtin_loongarch_movfcsr2gr ((_1));
+
+/* Assembly instruction format:0, fcsr, rj.  */
+/* Data types in instruction templates:  VOID, UQI, USI.  */
+#define __movgr2fcsr(/*ui5*/ _1, _2) \
+  __builtin_loongarch_movgr2fcsr ((unsigned short) _1, (unsigned int) _2);
+
+#if defined __loongarch64
+/* Assembly instruction format:ui5, rj, si12.  */
+/* Data types in instruction templates:  VOID, USI, UDI, SI.  */
+#define __dcacop(/*ui5*/ _1, /*unsigned long int*/ _2, /*si12*/ _3) \
+  ((void) __builtin_loongarch_dcacop ((_1), (unsigned long int) (_2), (_3)))
+#else
+#error "Don't support this ABI."
+#endif
+
+/* Assembly instruction format:rd, rj.  */
+/* Data types in instruction templates:  USI, USI.  */
+extern __inline unsigned int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__cpucfg (unsigned int _1)
+{
+  return (unsigned int) __builtin_loongarch_cpucfg ((unsigned int) _1);
+}
+
+#ifdef __loongarch64
+/* Assembly instruction format:rd, rj.  */
+/* Data types in instruction templates:  DI, DI.  */
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__asrtle_d (long int _1, long int _2)
+{
+  __builtin_loongarch_asrtle_d ((long int) _1, (long int) _2);
+}
+
+/* Assembly instruction format:rd, rj.  */
+/* Data types in instruction templates:  DI, DI.  */
+extern __inline void
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__asrtgt_d (long int _1, long int _2)
+{
+  __builtin_loongarch_asrtgt_d ((long int) _1, (long int) _2);
+}
+#endif
+
+#if defined __loongarch64
+/* Assembly instruction format:rd, rj, ui5.  */
+/* Data types 

[PATCH v8 02/12] LoongArch Port: gcc build

2022-03-03 Thread xuchenghua
From: chenglulu 

2022-03-04  Chenghua Xu  
Lulu Cheng  

gcc/

* common/config/loongarch/loongarch-common.cc: New file.
* config/loongarch/genopts/genstr.sh: New file.
* config/loongarch/genopts/loongarch-strings: New file.
* config/loongarch/genopts/loongarch.opt.in: New file.
* config/loongarch/loongarch-str.h: New file.
* config/loongarch/gnu-user.h: New file.
* config/loongarch/linux.h: New file.
* config/loongarch/loongarch-cpu.cc: New file.
* config/loongarch/loongarch-cpu.h: New file.
* config/loongarch/loongarch-def.c: New file.
* config/loongarch/loongarch-def.h: New file.
* config/loongarch/loongarch-driver.cc: New file.
* config/loongarch/loongarch-driver.h: New file.
* config/loongarch/loongarch-opts.cc: New file.
* config/loongarch/loongarch-opts.h: New file.
* config/loongarch/loongarch.opt: New file.
* config/loongarch/t-linux: New file.
* config/loongarch/t-loongarch: New file.
* gcc_update (files_and_dependencies): Add
config/loongarch/loongarch.opt and config/loongarch/loongarch-str.h.
* config.gcc: Add LoongArch support.
* configure.ac: Add LoongArch support.
---
 contrib/gcc_update|   2 +
 .../config/loongarch/loongarch-common.cc  |  73 +++
 gcc/config.gcc| 410 -
 gcc/config/loongarch/genopts/genstr.sh|  91 +++
 .../loongarch/genopts/loongarch-strings   |  58 ++
 gcc/config/loongarch/genopts/loongarch.opt.in | 189 ++
 gcc/config/loongarch/gnu-user.h   |  84 +++
 gcc/config/loongarch/linux.h  |  50 ++
 gcc/config/loongarch/loongarch-cpu.cc | 206 +++
 gcc/config/loongarch/loongarch-cpu.h  |  30 +
 gcc/config/loongarch/loongarch-def.c  | 164 +
 gcc/config/loongarch/loongarch-def.h  | 151 +
 gcc/config/loongarch/loongarch-driver.cc  | 187 ++
 gcc/config/loongarch/loongarch-driver.h   |  69 +++
 gcc/config/loongarch/loongarch-opts.cc| 580 ++
 gcc/config/loongarch/loongarch-opts.h |  86 +++
 gcc/config/loongarch/loongarch-str.h  |  57 ++
 gcc/config/loongarch/loongarch.opt| 189 ++
 gcc/config/loongarch/t-linux  |  53 ++
 gcc/config/loongarch/t-loongarch  |  68 ++
 gcc/configure.ac  |  33 +-
 21 files changed, 2825 insertions(+), 5 deletions(-)
 create mode 100644 gcc/common/config/loongarch/loongarch-common.cc
 create mode 100755 gcc/config/loongarch/genopts/genstr.sh
 create mode 100644 gcc/config/loongarch/genopts/loongarch-strings
 create mode 100644 gcc/config/loongarch/genopts/loongarch.opt.in
 create mode 100644 gcc/config/loongarch/gnu-user.h
 create mode 100644 gcc/config/loongarch/linux.h
 create mode 100644 gcc/config/loongarch/loongarch-cpu.cc
 create mode 100644 gcc/config/loongarch/loongarch-cpu.h
 create mode 100644 gcc/config/loongarch/loongarch-def.c
 create mode 100644 gcc/config/loongarch/loongarch-def.h
 create mode 100644 gcc/config/loongarch/loongarch-driver.cc
 create mode 100644 gcc/config/loongarch/loongarch-driver.h
 create mode 100644 gcc/config/loongarch/loongarch-opts.cc
 create mode 100644 gcc/config/loongarch/loongarch-opts.h
 create mode 100644 gcc/config/loongarch/loongarch-str.h
 create mode 100644 gcc/config/loongarch/loongarch.opt
 create mode 100644 gcc/config/loongarch/t-linux
 create mode 100644 gcc/config/loongarch/t-loongarch

diff --git a/contrib/gcc_update b/contrib/gcc_update
index 1cf15f9b3c2..641ce164775 100755
--- a/contrib/gcc_update
+++ b/contrib/gcc_update
@@ -86,6 +86,8 @@ gcc/config/arm/arm-tables.opt: gcc/config/arm/arm-cpus.in 
gcc/config/arm/parsecp
 gcc/config/c6x/c6x-tables.opt: gcc/config/c6x/c6x-isas.def 
gcc/config/c6x/genopt.sh
 gcc/config/c6x/c6x-sched.md: gcc/config/c6x/c6x-sched.md.in 
gcc/config/c6x/gensched.sh
 gcc/config/c6x/c6x-mult.md: gcc/config/c6x/c6x-mult.md.in 
gcc/config/c6x/genmult.sh
+gcc/config/loongarch/loongarch-str.h: gcc/config/loongarch/genopts/genstr.sh 
gcc/config/loongarch/genopts/loongarch-string
+gcc/config/loongarch/loongarch.opt: gcc/config/loongarch/genopts/genstr.sh 
gcc/config/loongarch/genopts/loongarch.opt.in
 gcc/config/m68k/m68k-tables.opt: gcc/config/m68k/m68k-devices.def 
gcc/config/m68k/m68k-isas.def gcc/config/m68k/m68k-microarchs.def 
gcc/config/m68k/genopt.sh
 gcc/config/mips/mips-tables.opt: gcc/config/mips/mips-cpus.def 
gcc/config/mips/genopt.sh
 gcc/config/rs6000/rs6000-tables.opt: gcc/config/rs6000/rs6000-cpus.def 
gcc/config/rs6000/genopt.sh
diff --git a/gcc/common/config/loongarch/loongarch-common.cc 
b/gcc/common/config/loongarch/loongarch-common.cc
new file mode 100644
index 000..5bdfd2a30e1
--- /dev/null
+++ b/gcc/common/config/loongarch/loongarch-common.cc
@@ -0,0 +1,73 @@
+/* Common hooks for LoongArch.

[PATCH v8 07/12] LoongArch Port: Builtin macros.

2022-03-03 Thread xuchenghua
From: chenglulu 

2022-03-04  Chenghua Xu  
Lulu Cheng  

gcc/

* config/loongarch/loongarch-c.cc: New file.
---
 gcc/config/loongarch/loongarch-c.cc | 109 
 1 file changed, 109 insertions(+)
 create mode 100644 gcc/config/loongarch/loongarch-c.cc
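
Illustration (not part of the patch): LARCH_CPP_SET_PROCESSOR in the
file below builds a macro name by joining PREFIX, "_" and the CPU's
canonical name, then upper-casing the whole string. A minimal sketch of
that name construction using only the C standard library follows; the
patch itself uses concat and TOUPPER, and the function name
processor_macro_name is hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd string "PREFIX_CPUNAME" converted to upper case,
   or NULL if allocation fails.  The caller frees the result.  */
static char *
processor_macro_name (const char *prefix, const char *cpu_name)
{
  size_t plen = strlen (prefix);
  size_t clen = strlen (cpu_name);
  char *macro = malloc (plen + 1 + clen + 1);
  char *p;

  if (macro == NULL)
    return NULL;

  memcpy (macro, prefix, plen);
  macro[plen] = '_';
  memcpy (macro + plen + 1, cpu_name, clen + 1); /* copies the NUL too */

  /* Upper-case the whole name, prefix included, as the macro does.  */
  for (p = macro; *p != '\0'; p++)
    *p = (char) toupper ((unsigned char) *p);

  return macro;
}
```

With the canonical name "foo" from the comment in the patch,
processor_macro_name ("_LOONGARCH_ARCH", "foo") produces
"_LOONGARCH_ARCH_FOO".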

diff --git a/gcc/config/loongarch/loongarch-c.cc 
b/gcc/config/loongarch/loongarch-c.cc
new file mode 100644
index 000..e914bf306d5
--- /dev/null
+++ b/gcc/config/loongarch/loongarch-c.cc
@@ -0,0 +1,109 @@
+/* LoongArch-specific code for C family languages.
+   Copyright (C) 2021-2022 Free Software Foundation, Inc.
+   Contributed by Loongson Ltd.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "c-family/c-common.h"
+#include "cpplib.h"
+
+#define preprocessing_asm_p() (cpp_get_options (pfile)->lang == CLK_ASM)
+#define builtin_define(TXT) cpp_define (pfile, TXT)
+#define builtin_assert(TXT) cpp_assert (pfile, TXT)
+
+/* Define preprocessor macros for the -march and -mtune options.
+   PREFIX is either _LOONGARCH_ARCH or _LOONGARCH_TUNE, INFO is
+   the selected processor.  If INFO's canonical name is "foo",
+   define PREFIX to be "foo", and define an additional macro
+   PREFIX_FOO.  */
+#define LARCH_CPP_SET_PROCESSOR(PREFIX, CPU_TYPE)  \
+  do   \
+{  \
+  char *macro, *p; \
+  int cpu_type = (CPU_TYPE);   \
+   \
+  macro = concat ((PREFIX), "_",   \
+ loongarch_cpu_strings[cpu_type], NULL);   \
+  for (p = macro; *p != 0; p++)\
+   *p = TOUPPER (*p);  \
+   \
+  builtin_define (macro);  \
+  builtin_define_with_value ((PREFIX), \
+loongarch_cpu_strings[cpu_type], 1);   \
+  free (macro);\
+}  \
+  while (0)
+
+void
+loongarch_cpu_cpp_builtins (cpp_reader *pfile)
+{
+  builtin_assert ("machine=loongarch");
+  builtin_assert ("cpu=loongarch");
+  builtin_define ("__loongarch__");
+
+  LARCH_CPP_SET_PROCESSOR ("_LOONGARCH_ARCH", __ACTUAL_ARCH);
+  LARCH_CPP_SET_PROCESSOR ("_LOONGARCH_TUNE", __ACTUAL_TUNE);
+
+  /* Base architecture / ABI.  */
+  if (TARGET_64BIT)
+{
+  builtin_define ("__loongarch_grlen=64");
+  builtin_define ("__loongarch64");
+}
+
+  if (TARGET_ABI_LP64)
+{
+  builtin_define ("_ABILP64=3");
+  builtin_define ("_LOONGARCH_SIM=_ABILP64");
+  builtin_define ("__loongarch_lp64");
+}
+
+  /* These defines reflect the ABI in use, not whether the
+ FPU is directly accessible.  */
+  if (TARGET_DOUBLE_FLOAT_ABI)
+builtin_define ("__loongarch_double_float=1");
+  else if (TARGET_SINGLE_FLOAT_ABI)
+builtin_define ("__loongarch_single_float=1");
+
+  if (TARGET_DOUBLE_FLOAT_ABI || TARGET_SINGLE_FLOAT_ABI)
+builtin_define ("__loongarch_hard_float=1");
+  else
+builtin_define ("__loongarch_soft_float=1");
+
+
+  /* ISA Extensions.  */
+  if (TARGET_DOUBLE_FLOAT)
+builtin_define ("__loongarch_frlen=64");
+  else if (TARGET_SINGLE_FLOAT)
+builtin_define ("__loongarch_frlen=32");
+  else
+builtin_define ("__loongarch_frlen=0");
+
+  /* Native Data Sizes.  */
+  builtin_define_with_int_value ("_LOONGARCH_SZINT", INT_TYPE_SIZE);
+  builtin_define_with_int_value ("_LOONGARCH_SZLONG", LONG_TYPE_SIZE);
+  builtin_define_with_int_value ("_LOONGARCH_SZPTR", POINTER_SIZE);
+  builtin_define_with_int_value ("_LOONGARCH_FPSET", 32 / MAX_FPRS_PER_FMT);
+  builtin_define_with_int_value ("_LOONGARCH_SPFPSET", 32);
+
+}
-- 
2.31.1
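
As an illustration of how user code might consume the macros defined by loongarch_cpu_cpp_builtins above, here is a minimal sketch. It is not part of the patch: the helper names loongarch_float_abi_name and loongarch_grlen are made up for this example, and on any non-LoongArch compiler the fallback branches are taken.

```c
/* Report which floating-point ABI macro the compiler predefined.
   The three __loongarch_*_float macros come from the patch above;
   the "not-loongarch" fallback is an assumption of this sketch.  */
const char *
loongarch_float_abi_name (void)
{
#if defined (__loongarch_double_float)
  return "double-float";
#elif defined (__loongarch_single_float)
  return "single-float";
#elif defined (__loongarch_soft_float)
  return "soft-float";
#else
  return "not-loongarch";
#endif
}

/* Width of a general-purpose register in bits, taken from
   __loongarch_grlen when it is defined, otherwise 0.  */
int
loongarch_grlen (void)
{
#ifdef __loongarch_grlen
  return __loongarch_grlen;
#else
  return 0;
#endif
}
```

Because every branch is selected by the preprocessor, the same source compiles unchanged on other targets and simply reports the fallback values there.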



[PATCH v8 03/12] LoongArch Port: Regenerate gcc/configure.

2022-03-03 Thread xuchenghua
From: chenglulu 

2022-03-04  Chenghua Xu  
Lulu Cheng  

gcc/
* configure: Regenerate.
---
 gcc/configure | 66 ++-
 1 file changed, 60 insertions(+), 6 deletions(-)

diff --git a/gcc/configure b/gcc/configure
index 14b19c8fe0c..1c1195e95cb 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -5442,6 +5442,9 @@ case "${target}" in
# sets the default TLS model and affects inlining.
PICFLAG_FOR_TARGET=-fPIC
;;
+loongarch*-*-*)
+   PICFLAG_FOR_TARGET=-fpic
+   ;;
 mips-sgi-irix6*)
# PIC is the default.
;;
@@ -7963,6 +7966,9 @@ else
 mips*-*-*)
   enable_fixed_point=yes
   ;;
+loongarch*-*-*)
+  enable_fixed_point=yes
+  ;;
 *)
   { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: fixed-point is not supported for this target, ignored" >&5
 $as_echo "$as_me: WARNING: fixed-point is not supported for this target, ignored" >&2;}
@@ -19667,7 +19673,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 19670 "configure"
+#line 19676 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -19773,7 +19779,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 19776 "configure"
+#line 19782 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -25556,6 +25562,17 @@ foo:   data8   25
 movl r24 = @tprel(foo#)'
tls_as_opt=--fatal-warnings
;;
+  loongarch*-*-*)
+conftest_s='
+   .section .tdata,"awT",@progbits
+x: .word 2
+   .text
+   la.tls.gd $a0,x
+   bl __tls_get_addr'
+   tls_first_major=0
+   tls_first_minor=0
+   tls_as_opt='--fatal-warnings'
+   ;;
   microblaze*-*-*)
 conftest_s='
.section .tdata,"awT",@progbits
@@ -28780,6 +28797,43 @@ $as_echo "#define HAVE_AS_MARCH_ZIFENCEI 1" >>confdefs.h
 fi
 
 ;;
+  loongarch*-*-*)
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for .dtprelword support" >&5
+$as_echo_n "checking assembler for .dtprelword support... " >&6; }
+if ${gcc_cv_as_loongarch_dtprelword+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  gcc_cv_as_loongarch_dtprelword=no
+  if test x$gcc_cv_as != x; then
+$as_echo '' > conftest.s
+if { ac_try='$gcc_cv_as $gcc_cv_as_flags 2,18,0 -o conftest.o conftest.s >&5'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }
+then
+   .section .tdata,"awT",@progbits
+x:
+   .word 2
+   .text
+   .dtprelword x+0x8000
+else
+  echo "configure: failed program was" >&5
+  cat conftest.s >&5
+fi
+rm -f conftest.o conftest.s
+  fi
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_loongarch_dtprelword" >&5
+$as_echo "$gcc_cv_as_loongarch_dtprelword" >&6; }
+
+if test $gcc_cv_as_loongarch_dtprelword != yes; then
+
+$as_echo "#define HAVE_AS_DTPRELWORD 1" >>confdefs.h
+
+fi
+;;
 s390*-*-*)
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for .gnu_attribute support" >&5
 $as_echo_n "checking assembler for .gnu_attribute support... " >&6; }
@@ -28943,11 +28997,11 @@ fi
 ;;
 esac
 
-# Mips and HP-UX need the GNU assembler.
+# Mips, LoongArch and HP-UX need the GNU assembler.
 # Linux on IA64 might be able to use the Intel assembler.
 
 case "$target" in
-  mips*-*-* | *-*-hpux* )
+  mips*-*-* | loongarch*-*-* | *-*-hpux* )
 if test x$gas_flag = xyes \
|| test x"$host" != x"$build" \
|| test ! -x "$gcc_cv_as" \
@@ -29384,8 +29438,8 @@ esac
 # ??? Once 2.11 is released, probably need to add first known working
 # version to the per-target configury.
 case "$cpu_type" in
-  aarch64 | alpha | arc | arm | avr | bfin | cris | csky | i386 | m32c | m68k \
-  | microblaze | mips | nds32 | nios2 | pa | riscv | rs6000 | score | sparc \
+  aarch64 | alpha | arc | arm | avr | bfin | cris | csky | i386 | loongarch | m32c \
+  | m68k | microblaze | mips | nds32 | nios2 | pa | riscv | rs6000 | score | sparc \
   | tilegx | tilepro | visium | xstormy16 | xtensa)
 insn="nop"
 ;;
-- 
2.31.1



[PATCH v8 01/12] LoongArch Port: Regenerate configure

2022-03-03 Thread xuchenghua
From: chenglulu 

2022-03-04  Chenghua Xu  
Lulu Cheng  

* config/picflag.m4: Add build option '-fpic' by default for LoongArch.
* configure: Add LoongArch tuples.
* configure.ac: Likewise.
---
 config/picflag.m4 |  3 +++
 configure | 10 +-
 configure.ac  | 10 +-
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/config/picflag.m4 b/config/picflag.m4
index 8b106f9af88..0aefcf619bf 100644
--- a/config/picflag.m4
+++ b/config/picflag.m4
@@ -44,6 +44,9 @@ case "${$2}" in
# sets the default TLS model and affects inlining.
$1=-fPIC
;;
+loongarch*-*-*)
+   $1=-fpic
+   ;;
 mips-sgi-irix6*)
# PIC is the default.
;;
diff --git a/configure b/configure
index 9c2d7df1bb2..87548f0da96 100755
--- a/configure
+++ b/configure
@@ -3060,7 +3060,7 @@ case "${ENABLE_GOLD}" in
   # Check for target supported by gold.
   case "${target}" in
 i?86-*-* | x86_64-*-* | sparc*-*-* | powerpc*-*-* | arm*-*-* \
-| aarch64*-*-* | tilegx*-*-* | mips*-*-* | s390*-*-*)
+| aarch64*-*-* | tilegx*-*-* | mips*-*-* | s390*-*-* | loongarch*-*-*)
  configdirs="$configdirs gold"
  if test x${ENABLE_GOLD} = xdefault; then
default_ld=gold
@@ -3646,6 +3646,9 @@ case "${target}" in
   i[3456789]86-*-*)
 libgloss_dir=i386
 ;;
+  loongarch*-*-*)
+libgloss_dir=loongarch
+;;
   m68hc11-*-*|m6811-*-*|m68hc12-*-*|m6812-*-*)
 libgloss_dir=m68hc11
 ;;
@@ -4030,6 +4033,11 @@ case "${target}" in
   wasm32-*-*)
 noconfigdirs="$noconfigdirs ld"
 ;;
+  loongarch*-*-linux*)
+;;
+  loongarch*-*-*)
+noconfigdirs="$noconfigdirs gprof"
+;;
 esac
 
 # If we aren't building newlib, then don't build libgloss, since libgloss
diff --git a/configure.ac b/configure.ac
index 68cc5cc31fe..55362afeeae 100644
--- a/configure.ac
+++ b/configure.ac
@@ -353,7 +353,7 @@ case "${ENABLE_GOLD}" in
   # Check for target supported by gold.
   case "${target}" in
 i?86-*-* | x86_64-*-* | sparc*-*-* | powerpc*-*-* | arm*-*-* \
-| aarch64*-*-* | tilegx*-*-* | mips*-*-* | s390*-*-*)
+| aarch64*-*-* | tilegx*-*-* | mips*-*-* | s390*-*-* | loongarch*-*-*)
  configdirs="$configdirs gold"
  if test x${ENABLE_GOLD} = xdefault; then
default_ld=gold
@@ -899,6 +899,9 @@ case "${target}" in
   i[[3456789]]86-*-*)
 libgloss_dir=i386
 ;;
+  loongarch*-*-*)
+libgloss_dir=loongarch
+;;
   m68hc11-*-*|m6811-*-*|m68hc12-*-*|m6812-*-*)
 libgloss_dir=m68hc11
 ;;
@@ -1283,6 +1286,11 @@ case "${target}" in
   wasm32-*-*)
 noconfigdirs="$noconfigdirs ld"
 ;;
+  loongarch*-*-linux*)
+;;
+  loongarch*-*-*)
+noconfigdirs="$noconfigdirs gprof"
+;;
 esac
 
 # If we aren't building newlib, then don't build libgloss, since libgloss
-- 
2.31.1



[PATCH v8 00/12] Add LoongArch support.

2022-03-03 Thread xuchenghua
From: chenglulu 

Hi, all:

This is the v8 version of LoongArch Port. Please review.

We know it is stage 4, but I think that is acceptable for a new port.
The kernel-side upstreaming is waiting for approval from the GCC side;
if it is blocked by stage 4, an approval for GCC 13 would be appreciated.

The LoongArch architecture (LoongArch) is an Instruction Set
Architecture (ISA) that has a Reduced Instruction Set Computer (RISC)
style.
The documents are on
https://loongson.github.io/LoongArch-Documentation/README-EN.html

The ELF ABI Documents are on:
https://loongson.github.io/LoongArch-Documentation/LoongArch-ELF-ABI-EN.html

The binutils port has been merged into trunk:
https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=560b3fe208255ae909b4b1c88ba9c28b09043307

Note: We split -mabi= into -mabi=lp64d/f/s. The new options are not yet
supported by upstream binutils, so this GCC port requires the following
patch to be applied to binutils in order to build:
https://github.com/loongson/binutils-gdb/commit/aacb0bf860f02aa5a7dcb76dd0e392bf871c7586
(it will be submitted upstream once the GCC side has confirmed it)

We have compiled more than 300 CLFS packages with this compiler.
The CLFS are currently used on Cfarm machines gcc400 and gcc401.

changelog:

v1 -> v2
1. Split patch set.
2. Change some code style.
3. Add -mabi=lp64d/f/s options.
4. Change GLIBC_DYNAMIC_LINKER_LP64 name.

v2 -> v3
1. Change some code style.
2. Bug fix.

v3 -> v4
1. Change some code style.
2. Bug fix.
3. Delete some builtin macros.

v4 -> v5
1. Delete wrong insn zero_extendsidi2_internal.
2. Adjust some build options.
3. Change some .c files to .cc.

v5 -> v6
1. Fix compilation issues. The generated files *.opt and *.h
   are generated to $(objdir).

v6 -> v7
1. Bug fix.
2. Change some code style.

v7 -> v8
1. Add new addressing type ADDRESS_REG_REG support.
2. Modify documentation.
3. Eliminate compile-time warnings.

chenglulu (12):
  LoongArch Port: Regenerate configure
  LoongArch Port: gcc build
  LoongArch Port: Regenerate gcc/configure.
  LoongArch Port: Machine description files.
  LoongArch Port: Machine description C files and .h files.
  LoongArch Port: Builtin functions.
  LoongArch Port: Builtin macros.
  LoongArch Port: libgcc
  LoongArch Port: Regenerate libgcc/configure.
  LoongArch Port: libgomp
  LoongArch Port: gcc/testsuite
  LoongArch Port: Add doc.

 config/picflag.m4 |3 +
 configure |   10 +-
 configure.ac  |   10 +-
 contrib/config-list.mk|5 +-
 contrib/gcc_update|2 +
 .../config/loongarch/loongarch-common.cc  |   73 +
 gcc/config.gcc|  410 +-
 gcc/config/host-linux.cc  |2 +
 gcc/config/loongarch/constraints.md   |  204 +
 gcc/config/loongarch/generic.md   |  132 +
 gcc/config/loongarch/genopts/genstr.sh|   91 +
 .../loongarch/genopts/loongarch-strings   |   58 +
 gcc/config/loongarch/genopts/loongarch.opt.in |  189 +
 gcc/config/loongarch/gnu-user.h   |   84 +
 gcc/config/loongarch/la464.md |  132 +
 gcc/config/loongarch/larchintrin.h|  413 ++
 gcc/config/loongarch/linux.h  |   50 +
 gcc/config/loongarch/loongarch-builtins.cc|  511 ++
 gcc/config/loongarch/loongarch-c.cc   |  109 +
 gcc/config/loongarch/loongarch-cpu.cc |  206 +
 gcc/config/loongarch/loongarch-cpu.h  |   30 +
 gcc/config/loongarch/loongarch-def.c  |  164 +
 gcc/config/loongarch/loongarch-def.h  |  151 +
 gcc/config/loongarch/loongarch-driver.cc  |  187 +
 gcc/config/loongarch/loongarch-driver.h   |   69 +
 gcc/config/loongarch/loongarch-ftypes.def |  106 +
 gcc/config/loongarch/loongarch-modes.def  |   29 +
 gcc/config/loongarch/loongarch-opts.cc|  580 ++
 gcc/config/loongarch/loongarch-opts.h |   86 +
 gcc/config/loongarch/loongarch-protos.h   |  240 +
 gcc/config/loongarch/loongarch-str.h  |   57 +
 gcc/config/loongarch/loongarch-tune.h |   72 +
 gcc/config/loongarch/loongarch.cc | 6450 +
 gcc/config/loongarch/loongarch.h  | 1273 
 gcc/config/loongarch/loongarch.md | 3712 ++
 gcc/config/loongarch/loongarch.opt|  189 +
 gcc/config/loongarch/predicates.md|  527 ++
 gcc/config/loongarch/sync.md  |  574 ++
 gcc/config/loongarch/t-linux  |   53 +
 gcc/config/loongarch/t-loongarch  |   68 +
 gcc/configure |   66 +-
 gcc/configure.ac  |   33 +-
 gcc/doc/install.texi  |   47 +-
 gcc/doc/invoke.texi   |  202 +
 gcc/doc/md.texi   |   55 +
 gcc/testsuite/g++.dg/cpp0x/constexpr-rom.C|2 +-
 gcc/testsuite/g++.old-deja/g++.abi/ptrmem.C   |2 

[r12-7468 Regression] FAIL: g++.dg/warn/Wstringop-overflow-6.C -std=gnu++20 (test for excess errors) on Linux/x86_64

2022-03-03 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

9805965e3551b66b5bd751d6076791d00cdeb137 is the first bad commit
commit 9805965e3551b66b5bd751d6076791d00cdeb137
Author: Jonathan Wakely 
Date:   Thu Mar 3 12:34:27 2022 +

libstdc++: Implement std::strong_order for floating-point types [PR96526]

caused

FAIL: g++.dg/warn/Wstringop-overflow-6.C  -std=gnu++20 (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-7468/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/warn/Wstringop-overflow-6.C --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/warn/Wstringop-overflow-6.C --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/warn/Wstringop-overflow-6.C --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/warn/Wstringop-overflow-6.C --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me
at skpgkp2 at gmail dot com.)


[PATCH 3/3] RISC-V:Cache Management Operation instructions testcases

2022-03-03 Thread yulong
From: yulong-plct 

This commit adds testcases for the CMO instructions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmo-zicbom-1.c: New test.
* gcc.target/riscv/cmo-zicbom-2.c: New test.
* gcc.target/riscv/cmo-zicbop-1.c: New test.
* gcc.target/riscv/cmo-zicbop-2.c: New test.
* gcc.target/riscv/cmo-zicboz-1.c: New test.
* gcc.target/riscv/cmo-zicboz-2.c: New test.

---
 gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c | 21 +
 gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c | 21 +
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c | 23 +++
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c | 23 +++
 gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c |  9 
 gcc/testsuite/gcc.target/riscv/cmo-zicboz-2.c |  9 
 6 files changed, 106 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicboz-2.c

diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c b/gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c
new file mode 100644
index 000..16935ff3d31
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zicbom -mabi=lp64" } */
+
+int foo1()
+{
+return __builtin_riscv_clean();
+}
+
+int foo2()
+{
+return __builtin_riscv_flush();
+}
+
+int foo3()
+{
+return __builtin_riscv_inval();
+}
+
+/* { dg-final { scan-assembler-times "cbo.clean" 1 } } */
+/* { dg-final { scan-assembler-times "cbo.flush" 1 } } */
+/* { dg-final { scan-assembler-times "cbo.inval" 1 } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c b/gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c
new file mode 100644
index 000..fc14f2b9c2b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_zicbom -mabi=ilp32" } */
+
+int foo1()
+{
+return __builtin_riscv_clean();
+}
+
+int foo2()
+{
+return __builtin_riscv_flush();
+}
+
+int foo3()
+{
+return __builtin_riscv_inval();
+}
+
+/* { dg-final { scan-assembler-times "cbo.clean" 1 } } */
+/* { dg-final { scan-assembler-times "cbo.flush" 1 } } */
+/* { dg-final { scan-assembler-times "cbo.inval" 1 } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c
new file mode 100644
index 000..b8bac2e8c51
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c
@@ -0,0 +1,23 @@
+/* { dg-do compile target { { rv64-*-*}}} */
+/* { dg-options "-march=rv64gc_zicbop -mabi=lp64" } */
+
+void foo (char *p)
+{
+  __builtin_prefetch (p, 0, 0);
+  __builtin_prefetch (p, 0, 1);
+  __builtin_prefetch (p, 0, 2);
+  __builtin_prefetch (p, 0, 3);
+  __builtin_prefetch (p, 1, 0);
+  __builtin_prefetch (p, 1, 1);
+  __builtin_prefetch (p, 1, 2);
+  __builtin_prefetch (p, 1, 3);
+}
+
+int foo1()
+{
+  return __builtin_riscv_prefetchi(1);
+}
+
+/* { dg-final { scan-assembler-times "prefetch.i" 1 } } */
+/* { dg-final { scan-assembler-times "prefetch.r" 4 } } */
+/* { dg-final { scan-assembler-times "prefetch.w" 4 } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c
new file mode 100644
index 000..5ace6e2b349
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c
@@ -0,0 +1,23 @@
+/* { dg-do compile target { { rv32-*-*}}} */
+/* { dg-options "-march=rv32gc_zicbop -mabi=ilp32" } */
+
+void foo (char *p)
+{
+  __builtin_prefetch (p, 0, 0);
+  __builtin_prefetch (p, 0, 1);
+  __builtin_prefetch (p, 0, 2);
+  __builtin_prefetch (p, 0, 3);
+  __builtin_prefetch (p, 1, 0);
+  __builtin_prefetch (p, 1, 1);
+  __builtin_prefetch (p, 1, 2);
+  __builtin_prefetch (p, 1, 3);
+}
+
+int foo1()
+{
+  return __builtin_riscv_prefetchi(1);
+}
+
+/* { dg-final { scan-assembler-times "prefetch.i" 1 } } */
+/* { dg-final { scan-assembler-times "prefetch.r" 4 } } */
+/* { dg-final { scan-assembler-times "prefetch.w" 4 } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c b/gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c
new file mode 100644
index 000..c2401fe0cf9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zicboz -mabi=lp64" } */
+
+int foo1()
+{
+return __builtin_riscv_zero();
+}
+
+/* { dg-final { scan-assembler-times "cbo.zero" 1 } } */
\ No newline at end of file
diff --git 

[PATCH 2/3] RISC-V:Cache Management Operation instructions

2022-03-03 Thread yulong
From: yulong-plct 

This commit adds the cbo.clean, cbo.flush, cbo.inval, cbo.zero, prefetch.i,
prefetch.r and prefetch.w instructions.

gcc/ChangeLog:

* config/riscv/predicates.md (imm5_operand): Add a new operand type
for prefetch instructions.
* config/riscv/riscv-builtins.cc (AVAIL): Add new AVAILs for CMO
ISA Extensions.
(RISCV_ATYPE_SI): New.
(RISCV_ATYPE_DI): New.
* config/riscv/riscv-ftypes.def (0): New.
(1): New.
* config/riscv/riscv.md (riscv_clean_): Add a new mode for
cbo.clean instruction.
(riscv_flush_): Add a new mode for cbo.flush instruction.
(riscv_inval_): Add a new mode for cbo.inval instruction.
(riscv_zero_): Add a new mode for cbo.zero instruction.
(prefetch): Add a new mode for prefetch.r and prefetch.w
instructions.
(riscv_prefetchi_): Add a new mode for prefetch.i instruction.
* config/riscv/riscv-cmo.def: New file.

---
 gcc/config/riscv/predicates.md |  4 +++
 gcc/config/riscv/riscv-builtins.cc | 16 +
 gcc/config/riscv/riscv-cmo.def | 17 ++
 gcc/config/riscv/riscv-ftypes.def  |  4 +++
 gcc/config/riscv/riscv.md  | 52 ++
 5 files changed, 93 insertions(+)
 create mode 100644 gcc/config/riscv/riscv-cmo.def

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 97cdbdf053b..3fb4d95ab08 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -239,3 +239,7 @@
 (define_predicate "const63_operand"
   (and (match_code "const_int")
(match_test "INTVAL (op) == 63")))
+
+(define_predicate "imm5_operand"
+  (and (match_code "const_int")
+   (match_test "INTVAL (op) < 5")))
\ No newline at end of file
diff --git a/gcc/config/riscv/riscv-builtins.cc b/gcc/config/riscv/riscv-builtins.cc
index 0658f8d3047..795132a0c16 100644
--- a/gcc/config/riscv/riscv-builtins.cc
+++ b/gcc/config/riscv/riscv-builtins.cc
@@ -87,6 +87,18 @@ struct riscv_builtin_description {
 
 AVAIL (hard_float, TARGET_HARD_FLOAT)
 
+
+AVAIL (clean32, TARGET_ZICBOM && !TARGET_64BIT)
+AVAIL (clean64, TARGET_ZICBOM && TARGET_64BIT)
+AVAIL (flush32, TARGET_ZICBOM && !TARGET_64BIT)
+AVAIL (flush64, TARGET_ZICBOM && TARGET_64BIT)
+AVAIL (inval32, TARGET_ZICBOM && !TARGET_64BIT)
+AVAIL (inval64, TARGET_ZICBOM && TARGET_64BIT)
+AVAIL (zero32,  TARGET_ZICBOZ && !TARGET_64BIT)
+AVAIL (zero64,  TARGET_ZICBOZ && TARGET_64BIT)
+AVAIL (prefetchi32, TARGET_ZICBOP && !TARGET_64BIT)
+AVAIL (prefetchi64, TARGET_ZICBOP && TARGET_64BIT)
+
 /* Construct a riscv_builtin_description from the given arguments.
 
INSN is the name of the associated instruction pattern, without the
@@ -119,6 +131,8 @@ AVAIL (hard_float, TARGET_HARD_FLOAT)
 /* Argument types.  */
 #define RISCV_ATYPE_VOID void_type_node
 #define RISCV_ATYPE_USI unsigned_intSI_type_node
+#define RISCV_ATYPE_SI intSI_type_node
+#define RISCV_ATYPE_DI intDI_type_node
 
 /* RISCV_FTYPE_ATYPESN takes N RISCV_FTYPES-like type codes and lists
their associated RISCV_ATYPEs.  */
@@ -128,6 +142,8 @@ AVAIL (hard_float, TARGET_HARD_FLOAT)
   RISCV_ATYPE_##A, RISCV_ATYPE_##B
 
 static const struct riscv_builtin_description riscv_builtins[] = {
+  #include "riscv-cmo.def"
+
   DIRECT_BUILTIN (frflags, RISCV_USI_FTYPE, hard_float),
   DIRECT_NO_TARGET_BUILTIN (fsflags, RISCV_VOID_FTYPE_USI, hard_float)
 };
diff --git a/gcc/config/riscv/riscv-cmo.def b/gcc/config/riscv/riscv-cmo.def
new file mode 100644
index 000..8829a1d664d
--- /dev/null
+++ b/gcc/config/riscv/riscv-cmo.def
@@ -0,0 +1,17 @@
+// zicbom
+RISCV_BUILTIN (clean_si, "clean", RISCV_BUILTIN_DIRECT, RISCV_SI_FTYPE, clean32),
+RISCV_BUILTIN (clean_di, "clean", RISCV_BUILTIN_DIRECT, RISCV_DI_FTYPE, clean64),
+
+RISCV_BUILTIN (flush_si, "flush", RISCV_BUILTIN_DIRECT, RISCV_SI_FTYPE, flush32),
+RISCV_BUILTIN (flush_di, "flush", RISCV_BUILTIN_DIRECT, RISCV_DI_FTYPE, flush64),
+
+RISCV_BUILTIN (inval_si, "inval", RISCV_BUILTIN_DIRECT, RISCV_SI_FTYPE, inval32),
+RISCV_BUILTIN (inval_di, "inval", RISCV_BUILTIN_DIRECT, RISCV_DI_FTYPE, inval64),
+
+// zicboz
+RISCV_BUILTIN (zero_si, "zero", RISCV_BUILTIN_DIRECT, RISCV_SI_FTYPE, zero32),
+RISCV_BUILTIN (zero_di, "zero", RISCV_BUILTIN_DIRECT, RISCV_DI_FTYPE, zero64),
+
+// zicbop
+RISCV_BUILTIN (prefetchi_si, "prefetchi", RISCV_BUILTIN_DIRECT, RISCV_SI_FTYPE_SI, prefetchi32),
+RISCV_BUILTIN (prefetchi_di, "prefetchi", RISCV_BUILTIN_DIRECT, RISCV_DI_FTYPE_DI, prefetchi64),
\ No newline at end of file
diff --git a/gcc/config/riscv/riscv-ftypes.def b/gcc/config/riscv/riscv-ftypes.def
index 2214c496f9b..62421292ce7 100644
--- a/gcc/config/riscv/riscv-ftypes.def
+++ b/gcc/config/riscv/riscv-ftypes.def
@@ -28,3 +28,7 @@ along with GCC; see the file COPYING3.  If not see
 
 DEF_RISCV_FTYPE (0, (USI))
 DEF_RISCV_FTYPE (1, (VOID, USI))
+DEF_RISCV_FTYPE (0, (SI))
+DEF_RISCV_FTYPE (0, (DI))

[PATCH 1/3] RISC-V: Add mininal support for Zicbo[mzp]

2022-03-03 Thread yulong
From: yulong-plct 

This commit adds minimal support for the 'Zicbom', 'Zicboz' and 'Zicbop' extensions.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add zicbom, zicboz, zicbop
extensions.
* config/riscv/riscv-opts.h (MASK_ZICBOZ): New.
(MASK_ZICBOM): New.
(MASK_ZICBOP): New.
(TARGET_ZICBOZ): New.
(TARGET_ZICBOM): New.
(TARGET_ZICBOP): New.
* config/riscv/riscv.opt: New.

---
 gcc/common/config/riscv/riscv-common.cc | 6 ++
 gcc/config/riscv/riscv-opts.h   | 9 +
 gcc/config/riscv/riscv.opt  | 3 +++
 3 files changed, 18 insertions(+)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index a904893b9ed..3ba8f240977 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -164,6 +164,9 @@ static const struct riscv_ext_version riscv_ext_version_table[] =
   {"zksed", ISA_SPEC_CLASS_NONE, 1, 0},
   {"zksh",  ISA_SPEC_CLASS_NONE, 1, 0},
   {"zkt",   ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zicboz",ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zicbom",ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zicbop",ISA_SPEC_CLASS_NONE, 1, 0},
 
   {"zve32x", ISA_SPEC_CLASS_NONE, 1, 0},
   {"zve32f", ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1049,6 +1052,9 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
   {"zksed",  &gcc_options::x_riscv_zk_subext, MASK_ZKSED},
   {"zksh",   &gcc_options::x_riscv_zk_subext, MASK_ZKSH},
   {"zkt",    &gcc_options::x_riscv_zk_subext, MASK_ZKT},
+  {"zicboz", &gcc_options::x_riscv_zicmo_subext, MASK_ZICBOZ},
+  {"zicbom", &gcc_options::x_riscv_zicmo_subext, MASK_ZICBOM},
+  {"zicbop", &gcc_options::x_riscv_zicmo_subext, MASK_ZICBOP},
 
   {"zve32x",   &gcc_options::x_target_flags, MASK_VECTOR},
   {"zve32f",   &gcc_options::x_target_flags, MASK_VECTOR},
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 929e4e3a7c5..d17cf6ea18a 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -83,6 +83,15 @@ enum stack_protector_guard {
 #define TARGET_ZBC ((riscv_zb_subext & MASK_ZBC) != 0)
 #define TARGET_ZBS ((riscv_zb_subext & MASK_ZBS) != 0)
 
+#define MASK_ZICBOZ   (1 << 0)
+#define MASK_ZICBOM   (1 << 1)
+#define MASK_ZICBOP   (1 << 2)
+
+
+#define TARGET_ZICBOZ ((riscv_zicmo_subext & MASK_ZICBOZ) != 0)
+#define TARGET_ZICBOM ((riscv_zicmo_subext & MASK_ZICBOM) != 0)
+#define TARGET_ZICBOP ((riscv_zicmo_subext & MASK_ZICBOP) != 0)
+
 #define MASK_ZBKB (1 << 0)
 #define MASK_ZBKC (1 << 1)
 #define MASK_ZBKX (1 << 2)
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 9fffc08220d..2058a874d31 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -200,6 +200,9 @@ int riscv_zi_subext
 TargetVariable
 int riscv_zb_subext
 
+TargetVariable
+int riscv_zicmo_subext
+
 TargetVariable
 int riscv_zk_subext
 
-- 
2.17.1
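
To make the flag handling in the patch above concrete, here is a minimal stand-alone sketch of the same bitmask scheme. The MASK_* values are copied from the riscv-opts.h hunk; the variable zicmo_subext_demo and the helpers enable_ext and ext_enabled are hypothetical stand-ins for the generated riscv_zicmo_subext option variable and the TARGET_ZICBO* macros, so this only illustrates the idea and is not GCC code.

```c
#include <stdbool.h>

/* Bit assignments as in the riscv-opts.h hunk.  */
#define MASK_ZICBOZ (1 << 0)
#define MASK_ZICBOM (1 << 1)
#define MASK_ZICBOP (1 << 2)

/* Hypothetical stand-in for the riscv_zicmo_subext option variable,
   which GCC generates from the TargetVariable entry in riscv.opt.  */
int zicmo_subext_demo;

/* Hypothetical helper: record that an extension was requested.  */
void
enable_ext (int mask)
{
  zicmo_subext_demo |= mask;
}

/* The same test the TARGET_ZICBO* macros perform on the real variable.  */
bool
ext_enabled (int mask)
{
  return (zicmo_subext_demo & mask) != 0;
}
```

Each extension owns one bit, so the three can be enabled independently and tested with a single AND.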



[PATCH 0/3] RISC-V: Add Ratified Cache Management Operation ISA Extensions

2022-03-03 Thread yulong
From: yulong-plct 

This patchset adds support for three recently ratified RISC-V extensions:

-   Zicbom (Cache-Block Management Instructions)
-   Zicbop (Cache-Block Prefetch hint instructions)
-   Zicboz (Cache-Block Zero Instructions)

Patch 1: Add Zicbom/z/p minimal support
Patch 2: Add Zicbom/z/p instructions arch support
Patch 3: Add Zicbom/z/p instructions testcases

cf. 



yulong-plct (3):
  RISC-V: Add mininal support for Zicbo[mzp]
  RISC-V:Cache Management Operation instructions
  RISC-V:Cache Management Operation instructions testcases

 gcc/common/config/riscv/riscv-common.cc   |  6 +++
 gcc/config/riscv/predicates.md|  4 ++
 gcc/config/riscv/riscv-builtins.cc| 16 ++
 gcc/config/riscv/riscv-cmo.def| 17 ++
 gcc/config/riscv/riscv-ftypes.def |  4 ++
 gcc/config/riscv/riscv-opts.h |  9 
 gcc/config/riscv/riscv.md | 52 +++
 gcc/config/riscv/riscv.opt|  3 ++
 gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c | 21 
 gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c | 21 
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c | 23 
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c | 23 
 gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c |  9 
 gcc/testsuite/gcc.target/riscv/cmo-zicboz-2.c |  9 
 14 files changed, 217 insertions(+)
 create mode 100644 gcc/config/riscv/riscv-cmo.def
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicboz-2.c

-- 
2.17.1
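
As a rough illustration of where the Zicbop prefetch hints could come from in user code, the sketch below uses the generic GCC builtin __builtin_prefetch (address, rw, locality), the same builtin exercised by the cmo-zicbop tests in this series. The function name sum_with_prefetch and the look-ahead distance of 8 elements are arbitrary choices for this example; whether a prefetch.r instruction is actually emitted depends on the target, the -march string and the compiler version.

```c
#include <stddef.h>

/* Sum an array while hinting that an element a few iterations ahead
   will be read soon.  The second argument 0 requests a read prefetch
   and the third argument 3 asks for high temporal locality.  */
long
sum_with_prefetch (const long *data, size_t n)
{
  long total = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (i + 8 < n)
        __builtin_prefetch (&data[i + 8], 0, 3);
      total += data[i];
    }
  return total;
}
```

The prefetch is only a hint, so the result is the same with or without it; on targets lacking a prefetch instruction the builtin is simply dropped.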



Re: [PATCH] [i386] Optimize v4si broadcast for noavx512vl.

2022-03-03 Thread Hongtao Liu via Gcc-patches
On Fri, Mar 4, 2022 at 10:29 AM liuhongt via Gcc-patches
 wrote:
>
> This is incremental patch based on [1], it enables optimization as below
>
> -   vbroadcastss    .LC1(%rip), %xmm0
> +   movl    $-45, %edx
> +   vmovd   %edx, %xmm0
> +   vpshufd $0, %xmm0, %xmm0
>
> According to microbenchmark, it's faster than broadcast from memory.
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591162.html.
>
> Bootstrapped and regtest on x86_64-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/104704
> * config/i386/sse.md (*vec_dupv4si): Add alternative $r and
> corresponding post_reload splitter.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr100865-8a.c: Adjust testcase.
> * gcc.target/i386/pr100865-8c.c: Ditto.
> * gcc.target/i386/pr100865-9c.c: Ditto.
> ---
>  gcc/config/i386/sse.md  | 41 -
>  gcc/testsuite/gcc.target/i386/pr100865-8a.c |  2 +-
>  gcc/testsuite/gcc.target/i386/pr100865-8c.c |  2 +-
>  gcc/testsuite/gcc.target/i386/pr100865-9c.c |  2 +-
>  4 files changed, 35 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 3066ea3734a..d124545aa5d 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -25121,20 +25121,43 @@ (define_insn "vec_dupv4sf"
> (set_attr "mode" "V4SF")])
>
>  (define_insn "*vec_dupv4si"
> -  [(set (match_operand:V4SI 0 "register_operand" "=v,v,x")
> +  [(set (match_operand:V4SI 0 "register_operand" "=v,v,x,v")
> (vec_duplicate:V4SI
> - (match_operand:SI 1 "nonimmediate_operand" "Yv,m,0")))]
> + (match_operand:SI 1 "nonimmediate_operand" "Yv,m,0,$r")))]
>"TARGET_SSE"
>"@
> %vpshufd\t{$0, %1, %0|%0, %1, 0}
> vbroadcastss\t{%1, %0|%0, %1}
> -   shufps\t{$0, %0, %0|%0, %0, 0}"
> -  [(set_attr "isa" "sse2,avx,noavx")
> -   (set_attr "type" "sselog1,ssemov,sselog1")
> -   (set_attr "length_immediate" "1,0,1")
> -   (set_attr "prefix_extra" "0,1,*")
> -   (set_attr "prefix" "maybe_vex,maybe_evex,orig")
> -   (set_attr "mode" "TI,V4SF,V4SF")])
> +   shufps\t{$0, %0, %0|%0, %0, 0}
> +   #"
> +  [(set_attr "isa" "sse2,avx,noavx,noavx512vl")
> +   (set_attr "type" "sselog1,ssemov,sselog1,sselog1")
> +   (set_attr "length_immediate" "1,0,1,1")
> +   (set_attr "prefix_extra" "0,1,*,0")
> +   (set_attr "prefix" "maybe_vex,maybe_evex,orig,maybe_vex")
> +   (set_attr "mode" "TI,V4SF,V4SF,TI")
> +   (set (attr "preferred_for_speed")
> + (cond [(eq_attr "alternative" "3")
> + (symbol_ref "TARGET_INTER_UNIT_MOVES_TO_VEC")
> +  ]
> +  (symbol_ref "true")))])
> +
> +(define_split
> +  [(set (match_operand:V4SI 0 "sse_reg_operand")
> +   (vec_duplicate:V4SI
> + (match_operand:SI 1 "general_reg_operand")))]
> +  "TARGET_SSE && reload_completed
> +   /* Disable this splitter if avx512vl_vec_dup_gprv4si insn is
> +  available, because then we can broadcast from GPRs directly.  */
> +   && !TARGET_AVX512VL"
> +  [(const_int 0)]
> +{
> +  emit_insn (gen_vec_setv4si_0 (gen_lowpart (V4SImode, operands[0]),
> +   CONST0_RTX (V4SImode),
> +   gen_lowpart (SImode, operands[1])));
> +  emit_insn (gen_vec_duplicatev4si (operands[0], operands[0]));
> +  DONE;
> +})
>
>  (define_insn "*vec_dupv2di"
>[(set (match_operand:V2DI 0 "register_operand" "=x,v,v,x")
> diff --git a/gcc/testsuite/gcc.target/i386/pr100865-8a.c 
> b/gcc/testsuite/gcc.target/i386/pr100865-8a.c
> index 911b14d4a25..544a14db6f7 100644
> --- a/gcc/testsuite/gcc.target/i386/pr100865-8a.c
> +++ b/gcc/testsuite/gcc.target/i386/pr100865-8a.c
> @@ -20,5 +20,5 @@ foo (void)
>  array[i] = MK_CONST128_BROADCAST_SIGNED (-45);
>  }
>
> -/* { dg-final { scan-assembler-times "(?:vpbroadcastd|vpshufd)\[\\t 
> \]+\[^\n\]*, %xmm\[0-9\]+" 1 { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-times "(?:vpbroadcastd|vpshufd)\[\\t 
> \]+\[^\n\]*, %xmm\[0-9\]+" 1 } } */
>  /* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } 
> */
> diff --git a/gcc/testsuite/gcc.target/i386/pr100865-8c.c 
> b/gcc/testsuite/gcc.target/i386/pr100865-8c.c
> index 00682edb8c9..efee0488614 100644
> --- a/gcc/testsuite/gcc.target/i386/pr100865-8c.c
> +++ b/gcc/testsuite/gcc.target/i386/pr100865-8c.c
> @@ -3,5 +3,5 @@
>
>  #include "pr100865-8a.c"
>
> -/* { dg-final { scan-assembler-times "vpshufd\[\\t \]+\[^\n\]*, 
> %xmm\[0-9\]+" 1 { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-times "vpshufd\[\\t \]+\[^\n\]*, 
> %xmm\[0-9\]+" 1 } } */
>  /* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } 
> */
> diff --git a/gcc/testsuite/gcc.target/i386/pr100865-9c.c 
> b/gcc/testsuite/gcc.target/i386/pr100865-9c.c
> index 8ffcdc1629d..e6f25902c1d 100644
> --- a/gcc/testsuite/gcc.target/i386/pr100865-9c.c
> +++ b/gcc/testsuite/gcc.target/i386/pr100865-9c.c
> @@ -3,5 +3,5 

[PATCH] [i386] Optimize v4si broadcast for noavx512vl.

2022-03-03 Thread liuhongt via Gcc-patches
This is an incremental patch based on [1]; it enables the optimization below:

-   vbroadcastss    .LC1(%rip), %xmm0
+   movl    $-45, %edx
+   vmovd   %edx, %xmm0
+   vpshufd $0, %xmm0, %xmm0

According to a microbenchmark, this is faster than broadcasting from memory.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591162.html.
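
For readers who want to try the sequence outside the compiler, here is a
minimal sketch of the same movd + pshufd broadcast written with SSE2
intrinsics.  It is not part of the patch: the names Lanes,
broadcast_from_gpr and check_broadcast are made up for illustration, the
instructions a compiler actually emits may differ, and a scalar fallback is
included only so the sketch still builds where SSE2 is unavailable.

```cpp
#include <cassert>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Four 32-bit lanes, used only to inspect the result (hypothetical helper).
struct Lanes { int lane[4]; };

// Broadcast VALUE to all four lanes.  With SSE2 this is the movd + pshufd
// pair; elsewhere a scalar loop stands in so the sketch still compiles.
inline Lanes broadcast_from_gpr (int value)
{
  Lanes out;
#if defined(__SSE2__)
  __m128i low = _mm_cvtsi32_si128 (value);   // movd: lane 0 = value, rest 0
  __m128i all = _mm_shuffle_epi32 (low, 0);  // pshufd $0: copy lane 0 to every lane
  _mm_storeu_si128 (reinterpret_cast<__m128i *> (out.lane), all);
#else
  for (int &lane : out.lane)
    lane = value;
#endif
  return out;
}

// Usage: every lane should hold the broadcast value.
inline void check_broadcast (int value)
{
  Lanes v = broadcast_from_gpr (value);
  for (int lane : v.lane)
    assert (lane == value);
}
```

Calling check_broadcast (-45) exercises the same constant as the testcase.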

Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

PR target/104704
* config/i386/sse.md (*vec_dupv4si): Add alternative $r and
corresponding post_reload splitter.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr100865-8a.c: Adjust testcase.
* gcc.target/i386/pr100865-8c.c: Ditto.
* gcc.target/i386/pr100865-9c.c: Ditto.
---
 gcc/config/i386/sse.md  | 41 -
 gcc/testsuite/gcc.target/i386/pr100865-8a.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr100865-8c.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr100865-9c.c |  2 +-
 4 files changed, 35 insertions(+), 12 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 3066ea3734a..d124545aa5d 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -25121,20 +25121,43 @@ (define_insn "vec_dupv4sf"
(set_attr "mode" "V4SF")])
 
 (define_insn "*vec_dupv4si"
-  [(set (match_operand:V4SI 0 "register_operand" "=v,v,x")
+  [(set (match_operand:V4SI 0 "register_operand" "=v,v,x,v")
(vec_duplicate:V4SI
- (match_operand:SI 1 "nonimmediate_operand" "Yv,m,0")))]
+ (match_operand:SI 1 "nonimmediate_operand" "Yv,m,0,$r")))]
   "TARGET_SSE"
   "@
%vpshufd\t{$0, %1, %0|%0, %1, 0}
vbroadcastss\t{%1, %0|%0, %1}
-   shufps\t{$0, %0, %0|%0, %0, 0}"
-  [(set_attr "isa" "sse2,avx,noavx")
-   (set_attr "type" "sselog1,ssemov,sselog1")
-   (set_attr "length_immediate" "1,0,1")
-   (set_attr "prefix_extra" "0,1,*")
-   (set_attr "prefix" "maybe_vex,maybe_evex,orig")
-   (set_attr "mode" "TI,V4SF,V4SF")])
+   shufps\t{$0, %0, %0|%0, %0, 0}
+   #"
+  [(set_attr "isa" "sse2,avx,noavx,noavx512vl")
+   (set_attr "type" "sselog1,ssemov,sselog1,sselog1")
+   (set_attr "length_immediate" "1,0,1,1")
+   (set_attr "prefix_extra" "0,1,*,0")
+   (set_attr "prefix" "maybe_vex,maybe_evex,orig,maybe_vex")
+   (set_attr "mode" "TI,V4SF,V4SF,TI")
+   (set (attr "preferred_for_speed")
+ (cond [(eq_attr "alternative" "3")
+ (symbol_ref "TARGET_INTER_UNIT_MOVES_TO_VEC")
+  ]
+  (symbol_ref "true")))])
+
+(define_split
+  [(set (match_operand:V4SI 0 "sse_reg_operand")
+   (vec_duplicate:V4SI
+ (match_operand:SI 1 "general_reg_operand")))]
+  "TARGET_SSE && reload_completed
+   /* Disable this splitter if avx512vl_vec_dup_gprv4si insn is
+  available, because then we can broadcast from GPRs directly.  */
+   && !TARGET_AVX512VL"
+  [(const_int 0)]
+{
+  emit_insn (gen_vec_setv4si_0 (gen_lowpart (V4SImode, operands[0]),
+   CONST0_RTX (V4SImode),
+   gen_lowpart (SImode, operands[1])));
+  emit_insn (gen_vec_duplicatev4si (operands[0], operands[0]));
+  DONE;
+})
 
 (define_insn "*vec_dupv2di"
   [(set (match_operand:V2DI 0 "register_operand" "=x,v,v,x")
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-8a.c 
b/gcc/testsuite/gcc.target/i386/pr100865-8a.c
index 911b14d4a25..544a14db6f7 100644
--- a/gcc/testsuite/gcc.target/i386/pr100865-8a.c
+++ b/gcc/testsuite/gcc.target/i386/pr100865-8a.c
@@ -20,5 +20,5 @@ foo (void)
 array[i] = MK_CONST128_BROADCAST_SIGNED (-45);
 }
 
-/* { dg-final { scan-assembler-times "(?:vpbroadcastd|vpshufd)\[\\t 
\]+\[^\n\]*, %xmm\[0-9\]+" 1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times "(?:vpbroadcastd|vpshufd)\[\\t 
\]+\[^\n\]*, %xmm\[0-9\]+" 1 } } */
 /* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-8c.c 
b/gcc/testsuite/gcc.target/i386/pr100865-8c.c
index 00682edb8c9..efee0488614 100644
--- a/gcc/testsuite/gcc.target/i386/pr100865-8c.c
+++ b/gcc/testsuite/gcc.target/i386/pr100865-8c.c
@@ -3,5 +3,5 @@
 
 #include "pr100865-8a.c"
 
-/* { dg-final { scan-assembler-times "vpshufd\[\\t \]+\[^\n\]*, %xmm\[0-9\]+" 
1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times "vpshufd\[\\t \]+\[^\n\]*, %xmm\[0-9\]+" 
1 } } */
 /* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-9c.c 
b/gcc/testsuite/gcc.target/i386/pr100865-9c.c
index 8ffcdc1629d..e6f25902c1d 100644
--- a/gcc/testsuite/gcc.target/i386/pr100865-9c.c
+++ b/gcc/testsuite/gcc.target/i386/pr100865-9c.c
@@ -3,5 +3,5 @@
 
 #include "pr100865-9a.c"
 
-/* { dg-final { scan-assembler-times "vpshufd\[\\t \]+\[^\n\]*, %xmm\[0-9\]+" 
1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times "vpshufd\[\\t \]+\[^\n\]*, %xmm\[0-9\]+" 
1 } } */
 /* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } 

Re: [PATCH] x86: Always return pseudo register in ix86_gen_scratch_sse_rtx

2022-03-03 Thread Hongtao Liu via Gcc-patches
On Thu, Mar 3, 2022 at 10:22 PM H.J. Lu via Gcc-patches
 wrote:
>
> ix86_gen_scratch_sse_rtx returns XMM7/XMM15/XMM31 as a scratch vector
> register to prevent RTL optimizers from removing vector register.  It
> introduces a conflict with explicit XMM7/XMM15/XMM31 usage and when it
> is called by RTL optimizers, it may introduce conflicting usages of
> XMM7/XMM15/XMM31.
>
> Change ix86_gen_scratch_sse_rtx to always return a pseudo register and
> xfail x86 tests which are optimized with a hard scratch register.
LGTM.
>
> gcc/
>
> PR target/104704
> * config/i386/i386.cc (ix86_gen_scratch_sse_rtx): Always return
> a pseudo register.
>
> gcc/testsuite/
>
> PR target/104704
> * gcc.target/i386/incoming-11.c: Xfail.
> * gcc.target/i386/pieces-memset-3.c: Likewise.
> * gcc.target/i386/pieces-memset-37.c: Likewise.
> * gcc.target/i386/pieces-memset-39.c: Likewise.
> * gcc.target/i386/pieces-memset-46.c: Likewise.
> * gcc.target/i386/pieces-memset-47.c: Likewise.
> * gcc.target/i386/pieces-memset-48.c: Likewise.
> * gcc.target/i386/pr90773-5.c: Likewise.
> * gcc.target/i386/pr90773-14.c: Likewise.
> * gcc.target/i386/pr90773-17.c: Likewise.
> * gcc.target/i386/pr100865-8a.c: Likewise.
> * gcc.target/i386/pr100865-8c.c: Likewise.
> * gcc.target/i386/pr100865-9c.c: Likewise.
> * gcc.target/i386/pieces-memset-21.c: Always expect vzeroupper.
> * gcc.target/i386/pr82941-1.c: Likewise.
> * gcc.target/i386/pr82942-1.c: Likewise.
> * gcc.target/i386/pr82990-1.c: Likewise.
> * gcc.target/i386/pr82990-3.c: Likewise.
> * gcc.target/i386/pr82990-5.c: Likewise.
> * gcc.target/i386/pr100865-11b.c: Expect vmovdqa instead of
> vmovdqa64.
> * gcc.target/i386/pr100865-12b.c: Likewise.
> * gcc.target/i386/pr100865-8b.c: Likewise.
> * gcc.target/i386/pr100865-9b.c: Likewise.
> * gcc.target/i386/pr104704-1.c: New test.
> * gcc.target/i386/pr104704-2.c: Likewise.
> * gcc.target/i386/pr104704-3.c: Likewise.
> * gcc.target/i386/pr104704-4.c: Likewise.
> * gcc.target/i386/pr104704-5.c: Likewise.
> * gcc.target/i386/pr104704-6.c: Likewise.
> ---
>  gcc/config/i386/i386.cc   | 19 +--
>  gcc/testsuite/gcc.target/i386/incoming-11.c   |  2 +-
>  .../gcc.target/i386/pieces-memset-21.c|  3 +-
>  .../gcc.target/i386/pieces-memset-3.c |  4 +--
>  .../gcc.target/i386/pieces-memset-37.c|  4 +--
>  .../gcc.target/i386/pieces-memset-39.c|  4 +--
>  .../gcc.target/i386/pieces-memset-46.c|  2 +-
>  .../gcc.target/i386/pieces-memset-47.c|  2 +-
>  .../gcc.target/i386/pieces-memset-48.c|  2 +-
>  gcc/testsuite/gcc.target/i386/pr100865-11b.c  |  2 +-
>  gcc/testsuite/gcc.target/i386/pr100865-12b.c  |  2 +-
>  gcc/testsuite/gcc.target/i386/pr100865-8a.c   |  2 +-
>  gcc/testsuite/gcc.target/i386/pr100865-8b.c   |  2 +-
>  gcc/testsuite/gcc.target/i386/pr100865-8c.c   |  2 +-
>  gcc/testsuite/gcc.target/i386/pr100865-9b.c   |  2 +-
>  gcc/testsuite/gcc.target/i386/pr100865-9c.c   |  2 +-
>  gcc/testsuite/gcc.target/i386/pr104704-1.c| 33 +++
>  gcc/testsuite/gcc.target/i386/pr104704-2.c| 33 +++
>  gcc/testsuite/gcc.target/i386/pr104704-3.c| 33 +++
>  gcc/testsuite/gcc.target/i386/pr104704-4.c| 33 +++
>  gcc/testsuite/gcc.target/i386/pr104704-5.c| 33 +++
>  gcc/testsuite/gcc.target/i386/pr104704-6.c| 33 +++
>  gcc/testsuite/gcc.target/i386/pr82941-1.c |  3 +-
>  gcc/testsuite/gcc.target/i386/pr82942-1.c |  3 +-
>  gcc/testsuite/gcc.target/i386/pr82990-1.c |  3 +-
>  gcc/testsuite/gcc.target/i386/pr82990-3.c |  3 +-
>  gcc/testsuite/gcc.target/i386/pr82990-5.c |  3 +-
>  gcc/testsuite/gcc.target/i386/pr90773-14.c|  2 +-
>  gcc/testsuite/gcc.target/i386/pr90773-17.c|  2 +-
>  gcc/testsuite/gcc.target/i386/pr90773-5.c |  2 +-
>  30 files changed, 225 insertions(+), 50 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr104704-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr104704-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr104704-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr104704-4.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr104704-5.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr104704-6.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index b2bf90576d5..95219902694 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -23786,24 +23786,7 @@ ix86_optab_supported_p (int op, machine_mode mode1, 
> machine_mode,
>  rtx
>  ix86_gen_scratch_sse_rtx (machine_mode mode)
>  {
> -  if (TARGET_SSE && !lra_in_progress)
> -{
> -  unsigned int regno;
> 

Re: [PATCH] call mark_dfs_back_edges() before testing EDGE_DFS_BACK [PR104761]

2022-03-03 Thread Martin Sebor via Gcc-patches

On 3/3/22 01:01, Jakub Jelinek wrote:

On Wed, Mar 02, 2022 at 04:15:09PM -0700, Martin Sebor via Gcc-patches wrote:

The -Wdangling-pointer code tests the EDGE_DFS_BACK but the pass never
calls the mark_dfs_back_edges() function that initializes the bit (I
didn't know about it).  As a result the bit is not set when expected,
which can cause false positives under the right conditions.


Not a review because I also had to look up what computes EDGE_DFS_BACK,
so I don't feel the right person to ack the patch.

So, just a few questions.

The code in question is:
   auto gsi = gsi_for_stmt (use_stmt);

   auto_bitmap visited;

   /* A use statement in the last basic block in a function or one that
  falls through to it is after any other prior clobber of the used
  variable unless it's followed by a clobber of the same variable. */
   basic_block bb = use_bb;
   while (bb != inval_bb
  && single_succ_p (bb)
  && !(single_succ_edge (bb)->flags & (EDGE_EH|EDGE_DFS_BACK)))
 {
   if (!bitmap_set_bit (visited, bb->index))
 /* Avoid cycles. */
 return true;

   for (; !gsi_end_p (gsi); gsi_next_nondebug (&gsi))
 {
   gimple *stmt = gsi_stmt (gsi);
   if (gimple_clobber_p (stmt))
 {
   if (clobvar == gimple_assign_lhs (stmt))
 /* The use is followed by a clobber.  */
 return false;
 }
 }

   bb = single_succ (bb);
   gsi = gsi_start_bb (bb);
 }

1) shouldn't it give up for EDGE_ABNORMAL too?  I mean, e.g.
following a non-local goto forced edge from a noreturn call
to a non-local label (if there is just one) doesn't seem
right to me


Possibly yes.  I can add it but I don't have a lot of experience with
these bits so if you can suggest a test case to exercise this that
would be helpful.


2) if EDGE_DFS_BACK is computed and 1) is done, is there any
reason why you need 2 levels of protection, i.e. the EDGE_DFS_BACK
check as well as the visited bitmap (and having them use
very different answers, if EDGE_DFS_BACK is seen, the function
will return false, if visited bitmap has a bb, it will return true)?
Can't the visited bitmap go away?


Possibly.  As I said above, I don't have enough experience with these
bits to make (and test) the changes quickly, or enough bandwidth to
come up to speed on them.  Please feel free to make these improvements.


3) I'm concerned about compile time with the above, consider you have
100 use_stmts and 100 corresponding inv_stmts and in each
case you enter this loop and go through a series of very large basic
blocks that don't clobber those stmts; shouldn't it bail out
(return false) after walking some param
controlled number of non-debug stmts (say 1000 by default)?
There is an early exit if
if (dominated_by_p (CDI_DOMINATORS, use_bb, inval_bb))
  return true;
(I admit I haven't read the code what happens if there is more than
one clobber for the same variable)


I tend to agree that the loop is less than optimal.  But before
imposing another arbitrary limit my preference would be to see if
it could be made more efficient.  Without thinking about it too hard,
it seems that with some efficient lookup table a single traversal
per function should be sufficient.  The first time through populate
the table with the clobbered variables along the path from use_bb
and each subsequent time just look up clobvar in the table.

But I have to use up the rest of my 2021 PTO next week before I lose
it and I don't expect to have the cycles to work on this anytime soon.

Martin
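
The lookup-table idea above could be sketched roughly as follows.  This is a
toy model, not GCC code: Block, ClobberTable and clobbered_after are
hypothetical names, variables are plain integers, and the model works at
whole-block granularity (the real loop starts at the use statement and also
stops at inval_bb and at EH or back edges).

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Toy stand-in for a basic block: the variables clobbered in it and the
// index of its single successor, or -1 if it has none or more than one.
struct Block
{
  std::vector<int> clobbers;
  int single_succ = -1;
};

// Caches, per starting block, the set of variables clobbered along the
// single-successor chain that begins there.
class ClobberTable
{
public:
  explicit ClobberTable (const std::vector<Block> &cfg) : cfg_ (cfg) {}

  // True if VAR is clobbered in START or in a block reached from it by
  // following single-successor edges.
  bool clobbered_after (int start, int var)
  {
    assert (start >= 0 && static_cast<std::size_t> (start) < cfg_.size ());
    return path_clobbers (start).count (var) != 0;
  }

private:
  const std::unordered_set<int> &path_clobbers (int start)
  {
    auto it = cache_.find (start);
    if (it != cache_.end ())
      return it->second;  // Already walked from this block.

    std::unordered_set<int> vars;
    std::unordered_set<int> seen;  // Guards against cycles in the toy CFG.
    for (int bb = start; bb != -1 && seen.insert (bb).second;
         bb = cfg_[bb].single_succ)
      vars.insert (cfg_[bb].clobbers.begin (), cfg_[bb].clobbers.end ());
    return cache_.emplace (start, std::move (vars)).first->second;
  }

  const std::vector<Block> &cfg_;
  std::unordered_map<int, std::unordered_set<int>> cache_;
};
```

Each chain is walked once per starting block; later queries for other
variables on the same path are hash lookups.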




The attached patch adds a call to the warning pass to initialize
the bit.  Tested on x86_64-linux.

Martin



Call mark_dfs_back_edges before testing EDGE_DFS_BACK [PR104761].

Resolves:
PR middle-end/104761 - bogus -Wdangling-pointer with cleanup and infinite loop

gcc/ChangeLog:

PR middle-end/104761
* gimple-ssa-warn-access.cc (pass_waccess::execute): Call
mark_dfs_back_edges.

gcc/testsuite/ChangeLog:

PR middle-end/104761
* g++.dg/warn/Wdangling-pointer-4.C: New test.
* gcc.dg/Wdangling-pointer-4.c: New test.

diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
index b7cdad517b3..b519712d76e 100644
--- a/gcc/gimple-ssa-warn-access.cc
+++ b/gcc/gimple-ssa-warn-access.cc
@@ -47,7 +47,7 @@
  #include "tree-object-size.h"
  #include "tree-ssa-strlen.h"
  #include "calls.h"
-#include "cfgloop.h"
+#include "cfganal.h"
  #include "intl.h"
  #include "gimple-range.h"
  #include "stringpool.h"
@@ -4710,6 +4710,9 @@ pass_waccess::execute (function *fun)
calculate_dominance_info (CDI_DOMINATORS);
calculate_dominance_info (CDI_POST_DOMINATORS);
  
+  /* Set or clear EDGE_DFS_BACK bits on back edges.  */

+  

Re: [PATCH] libgcc: allow building float128 libraries on FreeBSD

2022-03-03 Thread Segher Boessenkool
Hi!

On Mon, Feb 21, 2022 at 12:37:56AM +0100, pku...@freebsd.org wrote:
> From: Piotr Kubaj 
> 
> While FreeBSD currently uses 64-bit long double, there should be no
> problem with adding support for float128.
> 
> Signed-off-by: Piotr Kubaj 

This needs a changelog.  The entry for configure should just say
"Regenerate." btw :-)

So, I had some problems with this.  Firstly, the test does not make very
much sense, or rather, it is very indirect.  But you copied this from
the Linux code, so :-)

Secondly, I needed to think about if this code should just be shared
between the two then.  Copied code easily turns into a maintenance
nightmare, after all.

And then I just forgot, it fell off my to-do lists, sorry.

The patch is okay for trunk, with a suitable changelog.  Thanks!  Do you
want backports for this as well?


Segher


New German PO file for 'gcc' (version 12.1-b20220213)

2022-03-03 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the German team of translators.  The file is available at:

https://translationproject.org/latest/gcc/de.po

(This file, 'gcc-12.1-b20220213.de.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




[committed] libstdc++: Use non-debug vector in constexpr test [PR104748]

2022-03-03 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux, pushed to trunk.

-- >8 --

The std::__debug::vector isn't usable in constant expressions, so this
test fails in debug mode. Until the debug vector is fixed we can just
make the test use the non-debug one.

libstdc++-v3/ChangeLog:

PR libstdc++/104748
* testsuite/std/ranges/adaptors/all.cc: Use non-debug vector for
constexpr test.
---
 .../testsuite/std/ranges/adaptors/all.cc | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/all.cc 
b/libstdc++-v3/testsuite/std/ranges/adaptors/all.cc
index e457462825d..a4924b9909f 100644
--- a/libstdc++-v3/testsuite/std/ranges/adaptors/all.cc
+++ b/libstdc++-v3/testsuite/std/ranges/adaptors/all.cc
@@ -164,20 +164,26 @@ test07()
 constexpr bool
 test08()
 {
+#ifdef _GLIBCXX_DEBUG
+  using std::_GLIBCXX_STD_C::vector;
+#else
+  using std::vector;
+#endif
+
   // Verify P2415R2 "What is a view?" changes.
   // In particular, rvalue non-view non-borrowed ranges are now viewable.
-  static_assert(ranges::viewable_range<std::vector<int>&&>);
-  static_assert(!ranges::viewable_range<const std::vector<int>&&>);
+  static_assert(ranges::viewable_range<vector<int>&&>);
+  static_assert(!ranges::viewable_range<const vector<int>&&>);
 
   static_assert(ranges::viewable_range<std::initializer_list<int>&>);
   static_assert(ranges::viewable_range<const std::initializer_list<int>&>);
   static_assert(!ranges::viewable_range<std::initializer_list<int>&&>);
   static_assert(!ranges::viewable_range<const std::initializer_list<int>&&>);
 
-  using type = views::all_t<std::vector<int>&&>;
-  using type = ranges::owning_view<std::vector<int>>;
+  using type = views::all_t<vector<int>&&>;
+  using type = ranges::owning_view<vector<int>>;
 
-  std::same_as<type> auto v = std::vector<int>{{1,2,3}} | views::all;
+  std::same_as<type> auto v = vector<int>{{1,2,3}} | views::all;
 
   VERIFY( ranges::equal(v, (int[]){1,2,3}) );
   VERIFY( ranges::size(v) == 3 );
-- 
2.34.1



[committed] libstdc++: Fix test failure on AIX

2022-03-03 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux, pushed to trunk.

-- >8 --

This fixes a test failure due to a non-reserved name in an AIX system
header (included via ). That name clashes with one of the
names we check our own headers for, so skip checking that name on AIX.

libstdc++-v3/ChangeLog:

* testsuite/17_intro/names.cc (func): Undef on AIX.
---
 libstdc++-v3/testsuite/17_intro/names.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libstdc++-v3/testsuite/17_intro/names.cc 
b/libstdc++-v3/testsuite/17_intro/names.cc
index 1f7f83da8fa..ede2fe8caa7 100644
--- a/libstdc++-v3/testsuite/17_intro/names.cc
+++ b/libstdc++-v3/testsuite/17_intro/names.cc
@@ -203,6 +203,8 @@
 #undef y
 //  defines vario::v
 #undef v
+//  defines trb::func and cputime_tmr::func
+#undef func
 #endif
 
 #ifdef __APPLE__
-- 
2.34.1



[committed] libstdc++: Implement std::strong_order for floating-point types [PR96526]

2022-03-03 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux (-m32/-m64), powerpc64-linux (-m32/-m64),
powerpc64le-linux, powerpc-aix (-maix32/-maix64/-mlong-double-128).

Pushed to trunk. I'm inclined to backport this to gcc-11 after some soak
time on trunk (but not gcc-10, because it needs __builtin_bit_cast).

-- >8 --

This removes a FIXME in <compare>, defining the total order for
floating-point types. I originally opened PR96526 to request a new
compiler built-in to implement this, but now that we have std::bit_cast
it can be done entirely in the library.

The implementation is based on the glibc definitions of totalorder,
totalorderf, totalorderl etc.

I think this works for all the types that satisfy std::floating_point
today, and should also work for the types expected to be added by P1467
except for std::bfloat16_t. It also supports some additional types that
don't currently satisfy std::floating_point, such as __float80, but we
probably do want that to satisfy the concept for non-strict modes.
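
As an illustration of the glibc-style technique for the simplest case, here
is a minimal sketch for IEEE binary32 only.  It is not the code from the
patch: the function names are hypothetical, it returns an int rather than
std::strong_ordering, and it uses memcpy instead of std::bit_cast so that it
also builds before C++20 (at the cost of not being constexpr).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Map a binary32 value to a signed integer whose ordering matches the IEEE
// totalOrder predicate:
// -NaN < -inf < ... < -0 < +0 < ... < +inf < +NaN.
inline std::int32_t total_order_key (float value)
{
  static_assert (sizeof (float) == sizeof (std::int32_t),
                 "sketch assumes a 32-bit IEEE float");
  std::int32_t bits;
  std::memcpy (&bits, &value, sizeof bits);
  // For values with the sign bit set, flip the 31 magnitude bits so that a
  // larger magnitude yields a smaller key.
  return bits < 0 ? bits ^ 0x7fffffff : bits;
}

// Three-way comparison: negative, zero or positive.
inline int total_order_compare (float lhs, float rhs)
{
  std::int32_t l = total_order_key (lhs);
  std::int32_t r = total_order_key (rhs);
  return (l > r) - (l < r);
}

// Usage: a few orderings that operator< alone cannot express.
inline void check_total_order ()
{
  const float inf = std::numeric_limits<float>::infinity ();
  const float nan = std::numeric_limits<float>::quiet_NaN ();
  const float pos_nan = std::copysign (nan, 1.0f);
  const float neg_nan = std::copysign (nan, -1.0f);
  assert (total_order_compare (-0.0f, 0.0f) < 0);
  assert (total_order_compare (-2.0f, -1.0f) < 0);
  assert (total_order_compare (1.0f, 1.0f) == 0);
  assert (total_order_compare (inf, pos_nan) < 0);
  assert (total_order_compare (neg_nan, -inf) < 0);
}
```

The wider and non-IEEE formats handled by the patch need more care than this
single-integer case.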

libstdc++-v3/ChangeLog:

PR libstdc++/96526
* libsupc++/compare (strong_order): Add missing support for
floating-point types.
* testsuite/18_support/comparisons/algorithms/strong_order_floats.cc:
New test.
---
 libstdc++-v3/libsupc++/compare| 253 +-
 .../algorithms/strong_order_floats.cc | 102 +++
 2 files changed, 351 insertions(+), 4 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/18_support/comparisons/algorithms/strong_order_floats.cc

diff --git a/libstdc++-v3/libsupc++/compare b/libstdc++-v3/libsupc++/compare
index e08a3ce922f..a8747207b23 100644
--- a/libstdc++-v3/libsupc++/compare
+++ b/libstdc++-v3/libsupc++/compare
@@ -642,7 +642,7 @@ namespace std
 template
   concept __strongly_ordered
= __adl_strong<_Tp, _Up>
- // FIXME: || floating_point>
+ || floating_point>
  || __cmp3way;
 
 template
@@ -667,6 +667,252 @@ namespace std
   friend class _Weak_order;
   friend class _Strong_fallback;
 
+  // Names for the supported floating-point representations.
+  enum class _Fp_fmt
+  {
+   _Binary16, _Binary32, _Binary64, _Binary128, // IEEE
+   _X86_80bit,  // x86 80-bit extended precision
+   _M68k_80bit, // m68k 80-bit extended precision
+   _Dbldbl, // IBM 128-bit double-double
+   // TODO: _Bfloat16,
+  };
+
+  // Identify the format used by a floating-point type.
+  template
+   static consteval _Fp_fmt
+   _S_fp_fmt() noexcept
+   {
+ using enum _Fp_fmt;
+
+ // Identify these formats first, then assume anything else is IEEE.
+ // N.B. ARM __fp16 alternative format can be handled as binary16.
+
+#ifdef __LONG_DOUBLE_IBM128__
+ if constexpr (__is_same(_Tp, long double))
+   return _Dbldbl;
+#elif defined __LONG_DOUBLE_IEEE128__ && defined __SIZEOF_IBM128__
+ if constexpr (__is_same(_Tp, __ibm128))
+   return _Dbldbl;
+#endif
+
+#if __LDBL_MANT_DIG__ == 64
+ if constexpr (__is_same(_Tp, long double))
+   return __LDBL_MIN_EXP__ == -16381 ? _X86_80bit : _M68k_80bit;
+#endif
+#ifdef __SIZEOF_FLOAT80__
+ if constexpr (__is_same(_Tp, __float80))
+   return _X86_80bit;
+#endif
+
+ constexpr int __width = sizeof(_Tp) * __CHAR_BIT__;
+
+ if constexpr (__width == 16)   // IEEE binary16 (or ARM fp16).
+   return _Binary16;
+ else if constexpr (__width == 32)  // IEEE binary32
+   return _Binary32;
+ else if constexpr (__width == 64)  // IEEE binary64
+   return _Binary64;
+ else if constexpr (__width == 128) // IEEE binary128
+   return _Binary128;
+   }
+
+  // So we don't need to include  and pollute the namespace.
+  using int64_t = __INT64_TYPE__;
+  using int32_t = __INT32_TYPE__;
+  using int16_t = __INT16_TYPE__;
+  using uint64_t = __UINT64_TYPE__;
+  using uint16_t = __UINT16_TYPE__;
+
+  // Used to unpack floating-point types that do not fit into an integer.
+  template
+   struct _Int
+   {
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+ uint64_t _M_lo;
+ _Tp _M_hi;
+#else
+ _Tp _M_hi;
+ uint64_t _M_lo;
+#endif
+
+ constexpr explicit
+ _Int(_Tp __hi, uint64_t __lo) noexcept : _M_hi(__hi)
+ { _M_lo = __lo; }
+
+ constexpr explicit
+ _Int(uint64_t __lo) noexcept : _M_hi(0)
+ { _M_lo = __lo; }
+
+ constexpr bool operator==(const _Int&) const = default;
+
+#if defined __hppa__ || (defined __mips__ && !defined __mips_nan2008)
+ consteval _Int
+ operator<<(int __n) const noexcept
+ {
+   // XXX this assumes n >= 64, which is true for the use below.
+   return _Int(static_cast<_Tp>(_M_lo << (__n - 64)), 0);
+ }
+#endif
+
+ constexpr _Int&
+ operator^=(const _Int& __rhs) noexcept
+ {
+   

Re: [PATCH] tree: Fix up warn_deprecated_use [PR104627]

2022-03-03 Thread Jason Merrill via Gcc-patches

On 3/1/22 14:33, Jakub Jelinek wrote:

Hi!

The r12-7287-g1b71bc7c8b18bd1b change improved the -Wdeprecated
warning for C++, but regressed it for C, in particular in
gcc.dg/deprecated.c testcase we now report a type that actually isn't
deprecated as deprecated instead of the one that is deprecated.

The following change tries to find the middle ground between what
we used to do before and what r12-7287 change does.
If TYPE_STUB_DECL (node) is non-NULL (that is what happens with
those C tests), then it will do what it used to do before (just smarter,
there is no need to lookup_attribute when it is called again a few lines
below this), if it is NULL, it will try
TYPE_STUB_DECL (TYPE_MAIN_VARIANT (node)) - what the deprecated-16.C
test needs.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK, thanks.


2022-03-01  Jakub Jelinek  

PR c/104627
* tree.cc (warn_deprecated_use): For types prefer to use node
and only use TYPE_MAIN_VARIANT (node) if TYPE_STUB_DECL (node) is
NULL.

--- gcc/tree.cc.jj  2022-02-18 12:38:06.172391744 +0100
+++ gcc/tree.cc 2022-02-28 13:17:57.223216010 +0100
@@ -12047,8 +12047,11 @@ warn_deprecated_use (tree node, tree att
attr = DECL_ATTRIBUTES (node);
else if (TYPE_P (node))
{
- tree decl = TYPE_STUB_DECL (TYPE_MAIN_VARIANT (node));
+ tree decl = TYPE_STUB_DECL (node);
  if (decl)
+   attr = TYPE_ATTRIBUTES (TREE_TYPE (decl));
+ else if ((decl = TYPE_STUB_DECL (TYPE_MAIN_VARIANT (node)))
+  != NULL_TREE)
{
  node = TREE_TYPE (decl);
  attr = TYPE_ATTRIBUTES (node);

Jakub





[PATCH] c++: merge default targs for function templates [PR65396]

2022-03-03 Thread Patrick Palka via Gcc-patches
We currently merge default template arguments for class templates, but
not for function templates.  This patch fixes this by splitting out the
argument merging logic in redeclare_class_template into a separate
function and using it in duplicate_decls as well.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?
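
To make the merging rule easier to follow, here is a small stand-alone model
of it.  It is only a sketch: Parm and merge_defaults are hypothetical names,
default arguments are plain strings with an empty string standing in for
NULL_TREE, and diagnostics are reduced to a false return value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy template parameter; an empty default_arg models "no default".
struct Parm
{
  std::string name;
  std::string default_arg;
};

// Merge the defaults of a redeclaration (NEW_PARMS) into the earlier
// declaration (OLD_PARMS).  Returns false if a default is given twice.
bool merge_defaults (std::vector<Parm> &new_parms,
                     std::vector<Parm> &old_parms, bool class_p)
{
  assert (new_parms.size () == old_parms.size ());
  for (std::size_t i = 0; i < new_parms.size (); ++i)
    {
      std::string &new_default = new_parms[i].default_arg;
      std::string &old_default = old_parms[i].default_arg;
      if (!new_default.empty () && !old_default.empty ())
        return false;              // Redefinition of a default argument.
      else if (!new_default.empty ())
        old_default = new_default; // The old parameters are the ones that count.
      else if (class_p && !old_default.empty ())
        new_default = old_default; // Class templates: members use the new ones.
    }
  return true;
}
```

For example, merging a redeclaration that supplies a default for T into a
declaration that already has one for U leaves the old parameter list with
both defaults.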

PR c++/65396

gcc/cp/ChangeLog:

* cp-tree.h (merge_default_template_args): Declare.
* decl.cc (merge_default_template_args): Define, split out from
redeclare_class_template.
(duplicate_decls): Use it when merging member function template
and free function declarations.
* pt.cc (redeclare_class_template): Split out default argument
merging logic into merge_default_template_args.  Improve location
of a note when there's a template parameter kind mismatch.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/vt-34314.C: Adjust expected location of
"redeclared here" note.
* g++.dg/template/pr92440.C: Likewise.
* g++.old-deja/g++.pt/redecl1.C: Adjust expected location of
"redefinition of default argument" error.
* g++.dg/template/defarg23.C: New test.
* g++.dg/template/defarg23a.C: New test.
---
 gcc/cp/cp-tree.h|  1 +
 gcc/cp/decl.cc  | 60 -
 gcc/cp/pt.cc| 31 ++-
 gcc/testsuite/g++.dg/cpp0x/vt-34314.C   | 12 ++---
 gcc/testsuite/g++.dg/template/defarg23.C| 21 
 gcc/testsuite/g++.dg/template/defarg23a.C   | 24 +
 gcc/testsuite/g++.dg/template/pr92440.C |  4 +-
 gcc/testsuite/g++.old-deja/g++.pt/redecl1.C | 12 ++---
 8 files changed, 123 insertions(+), 42 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/defarg23.C
 create mode 100644 gcc/testsuite/g++.dg/template/defarg23a.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 8a44218611f..ea53e2d0ef2 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6783,6 +6783,7 @@ extern void note_iteration_stmt_body_end  (bool);
 extern void determine_local_discriminator  (tree);
 extern int decls_match (tree, tree, bool = true);
 extern bool maybe_version_functions(tree, tree, bool);
+extern bool merge_default_template_args(tree, tree, bool);
 extern tree duplicate_decls(tree, tree,
 bool hiding = false,
 bool was_hidden = false);
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 23c06655bde..a0bce56c121 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -1470,6 +1470,43 @@ duplicate_function_template_decls (tree newdecl, tree 
olddecl)
   return false;
 }
 
+/* OLD_PARMS is the innermost set of template parameters for some template
+   declaration, and NEW_PARMS is the corresponding set of template parameters
+   for a redeclaration of that template.  Merge the default arguments within
+   these two sets of parameters.  CLASS_P is true iff the template in
+   question is a class template.  */
+
+bool
+merge_default_template_args (tree new_parms, tree old_parms, bool class_p)
+{
+  gcc_checking_assert (TREE_VEC_LENGTH (new_parms)
+  == TREE_VEC_LENGTH (old_parms));
+  for (int i = 0; i < TREE_VEC_LENGTH (new_parms); i++)
+{
+  tree new_parm = TREE_VALUE (TREE_VEC_ELT (new_parms, i));
+  tree old_parm = TREE_VALUE (TREE_VEC_ELT (old_parms, i));
+  tree& new_default = TREE_PURPOSE (TREE_VEC_ELT (new_parms, i));
+  tree& old_default = TREE_PURPOSE (TREE_VEC_ELT (old_parms, i));
+  if (new_default != NULL_TREE && old_default != NULL_TREE)
+   {
+ auto_diagnostic_group d;
+ error ("redefinition of default argument for %q+#D", new_parm);
+ inform (DECL_SOURCE_LOCATION (old_parm),
+ "original definition appeared here");
+ return false;
+   }
+  else if (new_default != NULL_TREE)
+   /* Update the previous template parameters (which are the ones
+  that will really count) with the new default value.  */
+   old_default = new_default;
+  else if (class_p && old_default != NULL_TREE)
+   /* Update the new parameters, too; they'll be used as the
+  parameters for any members.  */
+   new_default = old_default;
+}
+  return true;
+}
+
 /* If NEWDECL is a redeclaration of OLDDECL, merge the declarations.
If the redeclaration is invalid, a diagnostic is issued, and the
error_mark_node is returned.  Otherwise, OLDDECL is returned.
@@ -1990,7 +2027,23 @@ duplicate_decls (tree newdecl, tree olddecl, bool 
hiding, bool was_hidden)
 template shall be specified on the initial declaration
 of the member function within the class template.  */
  || CLASSTYPE_TEMPLATE_INFO (CP_DECL_CONTEXT (olddecl
- 

Re: [PATCH] call mark_dfs_back_edges() before testing EDGE_DFS_BACK [PR104761]

2022-03-03 Thread Jeff Law via Gcc-patches




On 3/3/2022 1:01 AM, Jakub Jelinek wrote:

On Wed, Mar 02, 2022 at 04:15:09PM -0700, Martin Sebor via Gcc-patches wrote:

The -Wdangling-pointer code tests the EDGE_DFS_BACK but the pass never
calls the mark_dfs_back_edges() function that initializes the bit (I
didn't know about it).  As a result the bit is not set when expected,
which can cause false positives under the right conditions.

Not a review because I also had to look up what computes EDGE_DFS_BACK,
so I don't feel the right person to ack the patch.

So, just a few questions.

The code in question is:
   auto gsi = gsi_for_stmt (use_stmt);

   auto_bitmap visited;

   /* A use statement in the last basic block in a function or one that
  falls through to it is after any other prior clobber of the used
  variable unless it's followed by a clobber of the same variable. */
   basic_block bb = use_bb;
   while (bb != inval_bb
  && single_succ_p (bb)
  && !(single_succ_edge (bb)->flags & (EDGE_EH|EDGE_DFS_BACK)))
 {
   if (!bitmap_set_bit (visited, bb->index))
 /* Avoid cycles. */
 return true;

   for (; !gsi_end_p (gsi); gsi_next_nondebug (&gsi))
 {
   gimple *stmt = gsi_stmt (gsi);
   if (gimple_clobber_p (stmt))
 {
   if (clobvar == gimple_assign_lhs (stmt))
 /* The use is followed by a clobber.  */
 return false;
 }
 }

   bb = single_succ (bb);
   gsi = gsi_start_bb (bb);
 }

1) shouldn't it give up for EDGE_ABNORMAL too?  I mean, e.g.
following a non-local goto forced edge from a noreturn call
to a non-local label (if there is just one) doesn't seem
right to me

I think so.


2) if EDGE_DFS_BACK is computed and 1) is done, is there any
reason why you need 2 levels of protection, i.e. the EDGE_DFS_BACK
check as well as the visited bitmap (and having them use
very different answers, if EDGE_DFS_BACK is seen, the function
will return false, if visited bitmap has a bb, it will return true)?
Can't the visited bitmap go away?

I would think so.  Given how this code is written, I don't see any way
other than cycles to visit a BB more than once and with backedges
marked, there shouldn't be a way to get into a cycle if we ignore backedges.
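
To illustrate the point about back edges, here is a minimal sketch on a toy CFG kept in plain arrays.  None of these names are GCC APIs: toy_mark_back_edges only plays the role of mark_dfs_back_edges (), and dfs_back plays the role of EDGE_DFS_BACK.  Assuming successors are packed from slot 0, a walk that follows single successors and stops at a marked edge terminates without any visited bitmap.

```c
#include <stdbool.h>

#define MAX_BLOCKS 8
#define MAX_SUCCS 2

/* Hypothetical toy CFG: succ[b][i] is the i-th successor of block b,
   or -1 if there is none.  Successors are packed from slot 0.  */
struct toy_cfg
{
  int n_blocks;
  int succ[MAX_BLOCKS][MAX_SUCCS];
  bool dfs_back[MAX_BLOCKS][MAX_SUCCS]; /* stand-in for EDGE_DFS_BACK */
};

enum { WHITE, GREY, BLACK };

static void
dfs_visit (struct toy_cfg *g, int b, int *color)
{
  color[b] = GREY;
  for (int i = 0; i < MAX_SUCCS; i++)
    {
      int s = g->succ[b][i];
      if (s < 0)
        continue;
      if (color[s] == GREY)
        g->dfs_back[b][i] = true; /* target is on the DFS stack */
      else if (color[s] == WHITE)
        dfs_visit (g, s, color);
    }
  color[b] = BLACK;
}

/* Toy analogue of mark_dfs_back_edges (): DFS from block 0.  */
static void
toy_mark_back_edges (struct toy_cfg *g)
{
  int color[MAX_BLOCKS] = { WHITE };
  for (int b = 0; b < g->n_blocks; b++)
    for (int i = 0; i < MAX_SUCCS; i++)
      g->dfs_back[b][i] = false;
  dfs_visit (g, 0, color);
}

/* Follow single successors from START, stopping at a back edge or at a
   block that does not have exactly one successor.  Returns the number
   of blocks visited.  */
static int
toy_walk (const struct toy_cfg *g, int start)
{
  int visited = 1;
  int b = start;
  for (;;)
    {
      if (g->succ[b][0] < 0 || g->succ[b][1] >= 0)
        break; /* not a single-successor block */
      if (g->dfs_back[b][0])
        break; /* ignore back edges */
      b = g->succ[b][0];
      visited++;
    }
  return visited;
}
```

For the loop 0 -> 1 -> 2 -> 1, the edge 2 -> 1 is the only one marked, so the walk from block 0 stops after three blocks.  Without the marking step the same walk would never terminate.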



3) I'm concerned about compile time with the above, consider you have
100 use_stmts and 100 corresponding inv_stmts and in each
case you enter this loop and go through a series of very large basic
blocks that don't clobber those stmts; shouldn't it bail out
(return false) after walking some param
controlled number of non-debug stmts (say 1000 by default)?
There is an early exit if
if (dominated_by_p (CDI_DOMINATORS, use_bb, inval_bb))
  return true;
(I admit I haven't read the code what happens if there is more than
one clobber for the same variable)
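
A rough sketch of the bail-out suggested in 3), assuming the chain of single-successor blocks is summarised by a hypothetical array of per-block non-debug statement counts (this is not the pass's real data structure, and the limit would come from a param):

```c
#include <stdbool.h>
#include <stddef.h>

/* N_STMTS[i] is the number of non-debug statements in the i-th block of
   a single-successor chain (hypothetical representation).  Return false,
   the conservative answer, as soon as scanning the next block would
   exceed BUDGET statements in total; return true if the whole chain
   fits within the budget.  */
static bool
walk_within_budget (const unsigned *n_stmts, size_t n_blocks,
                    unsigned budget)
{
  for (size_t i = 0; i < n_blocks; i++)
    {
      if (n_stmts[i] > budget)
        return false; /* too expensive, give up */
      budget -= n_stmts[i];
    }
  return true;
}
```

With a limit of 1000, two blocks of 400 statements are scanned in full, while a third block of 400 makes the walk give up.  That caps the cost of each use_stmt/inval_stmt pair at the limit regardless of block size.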

I'll let Martin comment on the time complexity question

I think #1 and #2 can be addressed as followups.

jeff



Re: [PATCH] libgcc: allow building float128 libraries on FreeBSD

2022-03-03 Thread David Edelsohn via Gcc-patches
I don't have any objection, but the patch is FreeBSD-specific.  You
are sending the patch from the FreeBSD organization, but I don't know
the authority structure within the organization.   Andreas Tobler is
the FreeBSD maintainer for GCC, but I don't know his current status.

Thanks, David

On Sun, Feb 20, 2022 at 6:38 PM  wrote:
>
> From: Piotr Kubaj 
>
> While FreeBSD currently uses 64-bit long double, there should be no
> problem with adding support for float128.
>
> Signed-off-by: Piotr Kubaj 
> ---
>  libgcc/configure| 22 ++
>  libgcc/configure.ac | 11 +++
>  2 files changed, 33 insertions(+)
>
> diff --git a/libgcc/configure b/libgcc/configure
> index 4919a56f518..334d20d1fb1 100755
> --- a/libgcc/configure
> +++ b/libgcc/configure
> @@ -5300,6 +5300,28 @@ fi
>  { $as_echo "$as_me:${as_lineno-$LINENO}: result: 
> $libgcc_cv_powerpc_3_1_float128_hw" >&5
>  $as_echo "$libgcc_cv_powerpc_3_1_float128_hw" >&6; }
>CFLAGS="$saved_CFLAGS"
> +;;
> +powerpc*-*-freebsd*)
> +  saved_CFLAGS="$CFLAGS"
> +  CFLAGS="$CFLAGS -mabi=altivec -mvsx -mfloat128"
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for PowerPC ISA 2.06 to 
> build __float128 libraries" >&5
> +$as_echo_n "checking for PowerPC ISA 2.06 to build __float128 libraries... " 
> >&6; }
> +if ${libgcc_cv_powerpc_float128+:} false; then :
> +  $as_echo_n "(cached) " >&6
> +else
> +  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
> +/* end confdefs.h.  */
> +vector double dadd (vector double a, vector double b) { return a + b; }
> +_ACEOF
> +if ac_fn_c_try_compile "$LINENO"; then :
> +  libgcc_cv_powerpc_float128=yes
> +else
> +  libgcc_cv_powerpc_float128=no
> +fi
> +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
> +fi
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: 
> $libgcc_cv_powerpc_float128" >&5
> +$as_echo "$libgcc_cv_powerpc_float128" >&6; }
>  esac
>
>  # Collect host-machine-specific information.
> diff --git a/libgcc/configure.ac b/libgcc/configure.ac
> index 13a80b2551b..99ec5d405a4 100644
> --- a/libgcc/configure.ac
> +++ b/libgcc/configure.ac
> @@ -483,6 +483,17 @@ powerpc*-*-linux*)
>  [libgcc_cv_powerpc_3_1_float128_hw=yes],
>  [libgcc_cv_powerpc_3_1_float128_hw=no])])
>CFLAGS="$saved_CFLAGS"
> +;;
> +powerpc*-*-freebsd*)
> +  saved_CFLAGS="$CFLAGS"
> +  CFLAGS="$CFLAGS -mabi=altivec -mvsx -mfloat128"
> +  AC_CACHE_CHECK([for PowerPC ISA 2.06 to build __float128 libraries],
> + [libgcc_cv_powerpc_float128],
> + [AC_COMPILE_IFELSE(
> +[AC_LANG_SOURCE([vector double dadd (vector double a, vector double b) { 
> return a + b; }])],
> +[libgcc_cv_powerpc_float128=yes],
> +[libgcc_cv_powerpc_float128=no])])
> +  CFLAGS="$saved_CFLAGS"
>  esac
>
>  # Collect host-machine-specific information.
> --
> 2.35.1
>


[PATCH][pushed] configure: enable plugin support for ld.mold

2022-03-03 Thread Martin Liška

Hi.

There's another part of mold enablement.

Going to push it.
Martin

gcc/ChangeLog:

* configure.ac: Now that ld.mold supports the LTO plugin API, use it.
* configure: Regenerate.
---
 gcc/configure| 2 ++
 gcc/configure.ac | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/gcc/configure b/gcc/configure
index 22eb3451e3d..6f5fc20fcf3 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -26037,6 +26037,8 @@ if test -f liblto_plugin.la; then
 # Allow -fuse-linker-plugin to enable plugin support in GNU gold 2.20.
 elif test "$ld_is_gold" = yes -a "$ld_vers_major" -eq 2 -a 
"$ld_vers_minor" -eq 20; then
   gcc_cv_lto_plugin=1
+elif test "$ld_is_mold" = yes; then
+  gcc_cv_lto_plugin=1
 fi
   fi
 
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 20da90901f8..3d85d33bc80 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -4278,6 +4278,8 @@ changequote([,])dnl
 # Allow -fuse-linker-plugin to enable plugin support in GNU gold 2.20.
 elif test "$ld_is_gold" = yes -a "$ld_vers_major" -eq 2 -a 
"$ld_vers_minor" -eq 20; then
   gcc_cv_lto_plugin=1
+elif test "$ld_is_mold" = yes; then
+  gcc_cv_lto_plugin=1
 fi
   fi
 
--
2.35.1



[PATCH] x86: Always return pseudo register in ix86_gen_scratch_sse_rtx

2022-03-03 Thread H.J. Lu via Gcc-patches
ix86_gen_scratch_sse_rtx returns XMM7/XMM15/XMM31 as a scratch vector
register to prevent RTL optimizers from removing vector register.  It
introduces a conflict with explicit XMM7/XMM15/XMM31 usage and when it
is called by RTL optimizers, it may introduce conflicting usages of
XMM7/XMM15/XMM31.

Change ix86_gen_scratch_sse_rtx to always return a pseudo register and
xfail x86 tests which are optimized with a hard scratch register.

gcc/

PR target/104704
* config/i386/i386.cc (ix86_gen_scratch_sse_rtx): Always return
a pseudo register.

gcc/testsuite/

PR target/104704
* gcc.target/i386/incoming-11.c: Xfail.
* gcc.target/i386/pieces-memset-3.c: Likewise.
* gcc.target/i386/pieces-memset-37.c: Likewise.
* gcc.target/i386/pieces-memset-39.c: Likewise.
* gcc.target/i386/pieces-memset-46.c: Likewise.
* gcc.target/i386/pieces-memset-47.c: Likewise.
* gcc.target/i386/pieces-memset-48.c: Likewise.
* gcc.target/i386/pr90773-5.c: Likewise.
* gcc.target/i386/pr90773-14.c: Likewise.
* gcc.target/i386/pr90773-17.c: Likewise.
* gcc.target/i386/pr100865-8a.c: Likewise.
* gcc.target/i386/pr100865-8c.c: Likewise.
* gcc.target/i386/pr100865-9c.c: Likewise.
* gcc.target/i386/pieces-memset-21.c: Always expect vzeroupper.
* gcc.target/i386/pr82941-1.c: Likewise.
* gcc.target/i386/pr82942-1.c: Likewise.
* gcc.target/i386/pr82990-1.c: Likewise.
* gcc.target/i386/pr82990-3.c: Likewise.
* gcc.target/i386/pr82990-5.c: Likewise.
* gcc.target/i386/pr100865-11b.c: Expect vmovdqa instead of
vmovdqa64.
* gcc.target/i386/pr100865-12b.c: Likewise.
* gcc.target/i386/pr100865-8b.c: Likewise.
* gcc.target/i386/pr100865-9b.c: Likewise.
* gcc.target/i386/pr104704-1.c: New test.
* gcc.target/i386/pr104704-2.c: Likewise.
* gcc.target/i386/pr104704-3.c: Likewise.
* gcc.target/i386/pr104704-4.c: Likewise.
* gcc.target/i386/pr104704-5.c: Likewise.
* gcc.target/i386/pr104704-6.c: Likewise.
---
 gcc/config/i386/i386.cc   | 19 +--
 gcc/testsuite/gcc.target/i386/incoming-11.c   |  2 +-
 .../gcc.target/i386/pieces-memset-21.c|  3 +-
 .../gcc.target/i386/pieces-memset-3.c |  4 +--
 .../gcc.target/i386/pieces-memset-37.c|  4 +--
 .../gcc.target/i386/pieces-memset-39.c|  4 +--
 .../gcc.target/i386/pieces-memset-46.c|  2 +-
 .../gcc.target/i386/pieces-memset-47.c|  2 +-
 .../gcc.target/i386/pieces-memset-48.c|  2 +-
 gcc/testsuite/gcc.target/i386/pr100865-11b.c  |  2 +-
 gcc/testsuite/gcc.target/i386/pr100865-12b.c  |  2 +-
 gcc/testsuite/gcc.target/i386/pr100865-8a.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr100865-8b.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr100865-8c.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr100865-9b.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr100865-9c.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr104704-1.c| 33 +++
 gcc/testsuite/gcc.target/i386/pr104704-2.c| 33 +++
 gcc/testsuite/gcc.target/i386/pr104704-3.c| 33 +++
 gcc/testsuite/gcc.target/i386/pr104704-4.c| 33 +++
 gcc/testsuite/gcc.target/i386/pr104704-5.c| 33 +++
 gcc/testsuite/gcc.target/i386/pr104704-6.c| 33 +++
 gcc/testsuite/gcc.target/i386/pr82941-1.c |  3 +-
 gcc/testsuite/gcc.target/i386/pr82942-1.c |  3 +-
 gcc/testsuite/gcc.target/i386/pr82990-1.c |  3 +-
 gcc/testsuite/gcc.target/i386/pr82990-3.c |  3 +-
 gcc/testsuite/gcc.target/i386/pr82990-5.c |  3 +-
 gcc/testsuite/gcc.target/i386/pr90773-14.c|  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-17.c|  2 +-
 gcc/testsuite/gcc.target/i386/pr90773-5.c |  2 +-
 30 files changed, 225 insertions(+), 50 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr104704-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr104704-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr104704-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr104704-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr104704-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr104704-6.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index b2bf90576d5..95219902694 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -23786,24 +23786,7 @@ ix86_optab_supported_p (int op, machine_mode mode1, 
machine_mode,
 rtx
 ix86_gen_scratch_sse_rtx (machine_mode mode)
 {
-  if (TARGET_SSE && !lra_in_progress)
-{
-  unsigned int regno;
-  if (TARGET_64BIT)
-   {
- /* In 64-bit mode, use XMM31 to avoid vzeroupper and always
-use XMM31 for CSE.  */
- if (ix86_hard_regno_mode_ok (LAST_EXT_REX_SSE_REG, mode))
-   regno = LAST_EXT_REX_SSE_REG;
- else

Re: [PATCH] libgcc: allow building float128 libraries on FreeBSD

2022-03-03 Thread Piotr Kubaj
Bumping. Is there anything wrong with this patch?

On 22-02-21 00:37:56, pku...@freebsd.org wrote:
> From: Piotr Kubaj 
> 
> While FreeBSD currently uses 64-bit long double, there should be no
> problem with adding support for float128.
> 
> Signed-off-by: Piotr Kubaj 
> ---
>  libgcc/configure| 22 ++
>  libgcc/configure.ac | 11 +++
>  2 files changed, 33 insertions(+)
> 
> diff --git a/libgcc/configure b/libgcc/configure
> index 4919a56f518..334d20d1fb1 100755
> --- a/libgcc/configure
> +++ b/libgcc/configure
> @@ -5300,6 +5300,28 @@ fi
>  { $as_echo "$as_me:${as_lineno-$LINENO}: result: 
> $libgcc_cv_powerpc_3_1_float128_hw" >&5
>  $as_echo "$libgcc_cv_powerpc_3_1_float128_hw" >&6; }
>CFLAGS="$saved_CFLAGS"
> +;;
> +powerpc*-*-freebsd*)
> +  saved_CFLAGS="$CFLAGS"
> +  CFLAGS="$CFLAGS -mabi=altivec -mvsx -mfloat128"
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for PowerPC ISA 2.06 to 
> build __float128 libraries" >&5
> +$as_echo_n "checking for PowerPC ISA 2.06 to build __float128 libraries... " 
> >&6; }
> +if ${libgcc_cv_powerpc_float128+:} false; then :
> +  $as_echo_n "(cached) " >&6
> +else
> +  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
> +/* end confdefs.h.  */
> +vector double dadd (vector double a, vector double b) { return a + b; }
> +_ACEOF
> +if ac_fn_c_try_compile "$LINENO"; then :
> +  libgcc_cv_powerpc_float128=yes
> +else
> +  libgcc_cv_powerpc_float128=no
> +fi
> +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
> +fi
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: 
> $libgcc_cv_powerpc_float128" >&5
> +$as_echo "$libgcc_cv_powerpc_float128" >&6; }
>  esac
>  
>  # Collect host-machine-specific information.
> diff --git a/libgcc/configure.ac b/libgcc/configure.ac
> index 13a80b2551b..99ec5d405a4 100644
> --- a/libgcc/configure.ac
> +++ b/libgcc/configure.ac
> @@ -483,6 +483,17 @@ powerpc*-*-linux*)
>  [libgcc_cv_powerpc_3_1_float128_hw=yes],
>  [libgcc_cv_powerpc_3_1_float128_hw=no])])
>CFLAGS="$saved_CFLAGS"
> +;;
> +powerpc*-*-freebsd*)
> +  saved_CFLAGS="$CFLAGS"
> +  CFLAGS="$CFLAGS -mabi=altivec -mvsx -mfloat128"
> +  AC_CACHE_CHECK([for PowerPC ISA 2.06 to build __float128 libraries],
> + [libgcc_cv_powerpc_float128],
> + [AC_COMPILE_IFELSE(
> +[AC_LANG_SOURCE([vector double dadd (vector double a, vector double b) { 
> return a + b; }])],
> +[libgcc_cv_powerpc_float128=yes],
> +[libgcc_cv_powerpc_float128=no])])
> +  CFLAGS="$saved_CFLAGS"
>  esac
>  
>  # Collect host-machine-specific information.
> -- 
> 2.35.1
> 




[wwwdocs][patch] gcc-12/changes.html: Document -misa update for nvptx

2022-03-03 Thread Tobias Burnus

The current wording, https://gcc.gnu.org/gcc-12/changes.html#nvptx ,
is outdated and (now wrongly) encourages the use of -mptx=.

Updated as follows.

OK?

Tobias
gcc-12/changes.html: Document -misa update for nvptx

diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
index a3e46eeb..63e2bf63 100644
--- a/htdocs/gcc-12/changes.html
+++ b/htdocs/gcc-12/changes.html
@@ -468,9 +468,14 @@ a work-in-progress.
 
 NVPTX
 
+  The -misa flag now supports sm_53,
+sm_70, sm_75, and sm_80 besides
+sm_30 and sm_35 (which is the default).
   The -mptx flag has been added to specify the PTX ISA version
   for the generated code; permitted values are 3.1
-  (default, matches previous GCC versions) and 6.3.
+  (matches previous GCC versions), 6.0, 6.3,
+  and 7.0. If not specified, the used version is the minimal
+  version required for -misa but at least 6.0.
   
   The new __PTX_SM__ predefined macro allows code to check the
   compute model being targeted by the compiler.
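
As an illustration of the "minimal version required for -misa but at least 6.0" rule, here is a small sketch, not the compiler's actual code.  Versions are encoded as major*10+minor.  The 4.2 minimum for sm_53 matches the cc1 error message quoted in the smxx.c testsuite thread; the minimums for the other sm values are assumptions on my side and should be checked against nvptx.cc.

```c
/* PTX ISA versions encoded as major * 10 + minor, e.g. 63 is 6.3.  */

/* Lowest PTX ISA version assumed to support compute model SM (30 for
   sm_30 and so on).  Only the sm_53 entry is backed by text on this
   list; the other entries are assumptions.  Returns -1 for an unknown
   SM.  */
static int
min_ptx_for_sm (int sm)
{
  switch (sm)
    {
    case 30:
    case 35:
      return 31;
    case 53:
      return 42;
    case 70:
      return 60;
    case 75:
      return 63;
    case 80:
      return 70;
    default:
      return -1;
    }
}

/* Version used when -mptx is not given: the minimum required for the
   selected -misa, but at least 6.0.  */
static int
default_ptx_version (int sm)
{
  int min = min_ptx_for_sm (sm);
  if (min < 0)
    return -1;
  return min < 60 ? 60 : min;
}
```

Under these assumptions the default -misa=sm_35 yields 6.0, while -misa=sm_75 and -misa=sm_80 raise the default to 6.3 and 7.0.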


Re: [PATCH] Check if loading const from mem is faster

2022-03-03 Thread Jiufu Guo via Gcc-patches


Hi Segher,

Segher Boessenkool  writes:

> On Tue, Mar 01, 2022 at 10:28:57PM +0800, Jiufu Guo wrote:
>> Segher Boessenkool  writes:
>> > No.  insn_cost is only for correct, existing instructions, not for
>> > made-up nonsense.  I created insn_cost precisely to get away from that
>> > aspect of rtx_cost (and some other issues, like, it is incredibly hard
>> > and cumbersome to write a correct rtx_cost).
>> 
>> Thanks! The implementations of the insn_cost hook align with this
>> design: they check the insn's attributes and COSTS_N_INSNS.
>> 
>> One question on a special case: 
>> For the instruction "r119:DI=0x100803004101001",
>> would we treat it as a valid instruction?
>
> Currently we do, alternative 6 in *movdi_internal64: we allow any r<-n.
> This is costed as 5 insns (cost=20).
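
For reference, the arithmetic behind "5 insns (cost=20)": COSTS_N_INSNS scales an instruction count by 4.  A tiny sketch that repeats the macro as defined in GCC's rtl.h (cost_of_insns is only a hypothetical helper for this example):

```c
/* Same definition as in GCC's rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical helper: cost of a sequence of N simple instructions.  */
static int
cost_of_insns (int n)
{
  return COSTS_N_INSNS (n);
}
```

So a constant that needs five instructions to build costs COSTS_N_INSNS (5) = 20.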
>
> It generally is better to split things into patterns close to the
> eventual machine instructions as early as possible: all the more generic
> optimisations can take advantage of that then.
Got it!
>
>> A patch, attached at the end of this mail, accepts
>> "r119:DI=0x100803004101001" as input to insn_cost.
>> In this patch:
>> - A tmp instruction is generated via make_insn_raw.
>> - A few calls to rtx_cost (in cse_insn) are replaced by insn_cost.
>> - The insn_cost hook checks for the special 'constant' instruction.
>> Does this make sense?
>
> I'll review that patch inline.
>
>> > That is one reason why it is better to generate (close to) machine
>> > insns as early as possible: it makes it much easier to estimate
>> > realistic costs.  (Another important reason is it allows other
>> > optimisations, without us having to do any work for it!)
>> Got it!  In the middle of an optimization pass, an 'interim'
>> instruction may be acceptable, but it would be better for the output
>> of any RTL pass to contain only valid machine insns.
>
> Acceptable only if there is a very good reason for it, really :-(
>
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -22131,6 +22131,16 @@ rs6000_debug_rtx_costs (rtx x, machine_mode mode, 
>> int outer_code,
>>  static int
>>  rs6000_insn_cost (rtx_insn *insn, bool speed)
>>  {
>> +  /* Handle special 'constant int' insn. */
>> +  rtx set = PATTERN (insn);
>> +  if (GET_CODE (set) == SET && CONSTANT_P (SET_SRC (set)))
>> +{
>> +  rtx src = SET_SRC (set);
>> +  machine_mode mode = GET_MODE (SET_DEST (set));
>> +  if (CONST_INT_P (src) || CONST_WIDE_INT_P (src))
>> +return COSTS_N_INSNS (num_insns_constant (src, mode));
>> +}
>> +  
>>if (recog_memoized (insn) < 0)
>>  return 0;
>
> Why would such a set not recog()?
Thanks.  This code is not needed at the top of the insn_cost function.
recog_memoized can check the insn_code of 'insn'.
>
> Needs a comment in any case, to say what this is a workaround for.
>
>> +static int insn_cost_x (rtx_insn *, rtx);
>
> Don't declare functions, just put their definitions before their first
> use.  (And use a better name please :-) )
Got it. :-)
>
>>  static int
>> -notreg_cost (rtx x, machine_mode mode, enum rtx_code outer, int opno)
>> +notreg_cost (rtx x, machine_mode mode, enum rtx_code outer, int opno,
>> + rtx_insn *insn = NULL)
>
> Don't use default arguments like this, it is an abomination.
Thanks.
>
>> @@ -709,9 +713,21 @@ notreg_cost (rtx x, machine_mode mode, enum rtx_code 
>> outer, int opno)
>> && subreg_lowpart_p (x)
>> && TRULY_NOOP_TRUNCATION_MODES_P (int_mode, inner_mode))
>>? 0
>> +  : insn != NULL ? insn_cost_x (insn, x)
>>: rtx_cost (x, mode, outer, opno, optimize_this_for_speed_p) * 2);
>>  }
>
> You can just always use insn_cost?  insn_cost -> pattern_cost ->
> set_src_cost -> rtx_cost.  That works for COST at least, not sure about
> COST_IN, maybe that needs a little more care (cse.c works with invalid
> insns all over the place :-( )
>
This experimental patch just replaces part of rtx_cost with insn_cost.
If COST is called outside cse_insn, 'insn' may not be set, and
then 'insn_cost' may not work.  This would need to be enhanced.
>>  
>> +/* Internal function, to get cost when use X to replace source of insn
>> +   which is a SET.  */
>> +
>> +static int
>> +insn_cost_x (rtx_insn *insn, rtx x)
>> +{
>> +  INSN_CODE (insn) = -1;
>> +  SET_SRC (PATTERN (insn)) = x;
>> +  return insn_cost (insn, optimize_this_for_speed_p);
>> +}
>
> You need to restore stuff as well?
In this patch, this function is called on a tmp_insn, so I did not
restore it.  If the original 'insn' of cse_insn were used to invoke
'insn_cost_x', the fields of 'insn' would have to be restored.
>
>> @@ -4603,6 +4619,7 @@ cse_insn (rtx_insn *insn)
>>  
>>   Nothing in this loop changes the hash table or the register chains.  */
>>  
>> +  rtx_insn *tmp_insn = NULL;
>>for (i = 0; i < n_sets; i++)
>>  {
>>bool repeat = false;
>> @@ -4638,6 +4655,10 @@ cse_insn (rtx_insn *insn)
>>mode = GET_MODE (src) == VOIDmode ? GET_MODE (dest) : GET_MODE (src);
>>

Re: [PATCH] Check if loading const from mem is faster

2022-03-03 Thread Jiufu Guo via Gcc-patches


Hi,

Jeff Law  writes:

> On 3/1/2022 12:47 AM, Richard Biener via Gcc-patches wrote:
>> On Tue, 1 Mar 2022, Jiufu Guo wrote:
>>
>>> Segher Boessenkool  writes:
>>>
>>>> On Thu, Feb 24, 2022 at 09:50:28AM +0100, Richard Biener wrote:
>>>>> On Thu, 24 Feb 2022, Jiufu Guo wrote:
>>>>>> And another thing as Segher pointed out, CSE is doing too
>>>>>> much work.  It may be ok to separate the constant handling
>>>>>> logic from CSE.
>>>>> Not sure - CSE just is value numbering, I don't see that it does
>>>>> more than that.  Yes, it might have developed "heuristics" over
>>>>> the years what to CSE and to what and where to substitute and
>>>>> where not.  But in the end it does just value numbering.
>>>> It also does various micro-optimisations, like all the CC things it
>>>> does.
>>>>
>>>> It is not very good at doing the CSE job, but it cannot easily be
>>>> replaced by a better implementation because it does many other small
>>>> optimisations (that are not done elsewhere).

>>> Thanks a lot for these comments! I'm also wondering if we would
>>> rewrite this cse.cc or refactor it in some aspects.
>> I think time is better spent elsewhere ... I don't think CSE is as
>> bad as Segher depicts it - it might do "CC things" and other bits
>> but in the end that's going to be instruction/expression combination
>> things that "fit" likely because a value lattice (or just nonzero bits
>> in the cselib variant) existed.
>>
>> So what might be interesting would be to work towards cleansing
>> CSE of those, producing testcases and making sure a better fit
>> pass (combine? fwprop? compare-elim?) performs the desired
>> optimization.
>>
>> But I'm not really sure what Segher is talking about - I suppose
>> it must be magic done inside cselib (which only does analysis),
>> not in cse.cc itself.
> I also don't see cse.cc as being *that* bad. It's largely the
> same bits as we had before SSA and I wouldn't be surprised if there
> are things in there that aren't needed anymore.
>
> For example, while we have excised HAVE_cc0, I think there are still
> remnants of those concepts lying around (see this_insn_cc0 and
> friends).
>
> I'd always hoped we'd get to a point where we could eliminate the
> follow-jumps and cse-around-loops bits, but we never managed to
> accomplish that.  When I last looked (a long long time ago), there
> were still things that got exposed when we lowered from gimple to RTL
> that were picked up by those options.   Changing to work  with
> dominators to find EBBs would be a nice cleanup, but the code as it
> stands right now works.
>
> I wouldn't be surprised if the costing stuff could use some serious
> cleanup.  But I don't see it as inherently broken, just very dated.
>
> But in the end, it's really just value-numbering as you say.  It could
> be rewritten to be more modern, but I doubt it's going to make much of
> a real difference in the end.  If there are things that fit logically
> elsewhere, sure move them where they more logically fit, but I doubt
> there's a lot of this stuff.
>
> jeff

Thanks for your comments and suggestions!

I ran a quick test of cse1/cse2 on SPEC2017 on Power9.  Compared with
"-O3", "-O3 -fdisable-rtl-cse1 -fdisable-rtl-cse2" shows both positive
and negative impacts.  There are performance gains on
500.perlbench_r (+1.83%), 538.imagick_r (0.7%) and 520.omnetpp_r (+0.6%),
and performance regressions on 548.exchange2_r (-1.97%) and
557.xz_r (-0.7%).  The performance change on 500.perlbench_r may be
related to the behavior of 'constant loading'.

The performance impact may be caused, directly or indirectly, by
sub-behaviors of the current cse.cc, and the data would change on
different targets.  Anyway, this data may also indicate that
we could clean up or enhance some functionality in cse.cc.

BR,
Jiufu


Re: [PATCH] eliminate mutex in fast path of __register_frame

2022-03-03 Thread Thomas Neumann via Gcc-patches

> We may have to add a new interface.  In some other cases, I've seen
> errno being used for error reporting, but that requires changes in
> applications to recognize the error.  It's probably better to crash here
> than to fail mysteriously later.
>
> Out of curiosity, how many times do you call the registration functions
> from your application?  Would a batch registration interface help?  It
> could address error reporting as well.


We are compiling SQL queries into machine code, which means we call
__(de)register_frame once per query. There is usually no way to batch
that, as the queries typically come in one after the other.
In practice the system caches generated code, which means that most of
the time we are executing an already compiled query with different
parameters. Thus __register_frame is not a big performance problem for
us, but unwinding is, as queries are executed in parallel using hundreds
of threads and some of them fail.


Thank you for your detailed feedback, I will incorporate your 
suggestions and will send an updated patch in a few days.


Best

Thomas


[committed][nvptx] Build libraries with mptx=3.1

2022-03-03 Thread Tom de Vries via Gcc-patches
Hi,

In gcc-5 to gcc-11, the ptx isa version was 3.1.

On trunk, the default is now 6.0, which is also what will be the value in
the libraries.

Consequently, there may be setups with an older driver that worked with
gcc-11, but will become unsupported with gcc-12.

Fix this by building the libraries with mptx=3.1.

After this, setups with an older driver still won't work out of the box
with gcc-12, because the default ptx isa version has changed, but should work
after specifying mptx=3.1.

Committed to trunk.

Thanks,
- Tom

[nvptx] Build libraries with mptx=3.1

gcc/ChangeLog:

2022-03-03  Tom de Vries  

* config/nvptx/t-nvptx (MULTILIB_EXTRA_OPTS): Add mptx=3.1.

---
 gcc/config/nvptx/t-nvptx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/nvptx/t-nvptx b/gcc/config/nvptx/t-nvptx
index 056d2dd2d04..8f67264d132 100644
--- a/gcc/config/nvptx/t-nvptx
+++ b/gcc/config/nvptx/t-nvptx
@@ -32,4 +32,4 @@ s-nvptx-gen-opt: $(srcdir)/config/nvptx/nvptx-sm.def
 
 MULTILIB_OPTIONS = mgomp
 
-MULTILIB_EXTRA_OPTS = misa=sm_30
+MULTILIB_EXTRA_OPTS = misa=sm_30 mptx=3.1


[committed][nvptx] Build libraries with misa=sm_30

2022-03-03 Thread Tom de Vries via Gcc-patches
Hi,

In gcc-11, when specifying -misa=sm_30, an executable may still contain sm_35
code (due to libraries being built with the default -misa=sm_35), so it won't
run on an sm_30 board.

Fix this by building libraries with sm_30, as was the case in gcc-5 to gcc-10.

Committed to trunk.

Thanks,
- Tom

[nvptx] Build libraries with misa=sm_30

gcc/ChangeLog:

2022-03-03  Tom de Vries  

PR target/104758
* config/nvptx/t-nvptx (MULTILIB_EXTRA_OPTS): Add misa=sm_30.

---
 gcc/config/nvptx/t-nvptx | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/nvptx/t-nvptx b/gcc/config/nvptx/t-nvptx
index f17fc9c19aa..056d2dd2d04 100644
--- a/gcc/config/nvptx/t-nvptx
+++ b/gcc/config/nvptx/t-nvptx
@@ -31,3 +31,5 @@ s-nvptx-gen-opt: $(srcdir)/config/nvptx/nvptx-sm.def
$(STAMP) s-nvptx-gen-opt
 
 MULTILIB_OPTIONS = mgomp
+
+MULTILIB_EXTRA_OPTS = misa=sm_30


[committed][nvptx] Use --no-verify for sm_30

2022-03-03 Thread Tom de Vries via Gcc-patches
Hi,

In PR97348, we ran into the problem that recent CUDA dropped support for
sm_30, which inhibited the build when building with CUDA bin in the path,
because the nvptx-tools assembler uses CUDA's ptxas to do ptx verification.

To fix this, in gcc-11 the default sm_xx was moved from sm_30 to sm_35.

This however broke support for sm_30 boards: an executable built for sm_30
might contain sm_35 code from the libraries, which are built with the default
sm_xx (PR104758).

We want to fix this by going back to having the libraries built with sm_30, as
was the case for gcc-5 to gcc-10.  That however reintroduces the problem from
PR97348.

Deal with PR97348 in the simplest way possible: when calling the assembler for
sm_30, specify --no-verify.
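
To make the effect of the two spec fragments in ASM_SPEC concrete, here is a rough emulation in plain C.  It is not how the driver evaluates specs, and nvptx_as_options is a hypothetical name.  It only reproduces the intended outcome: pass "-m <value>" for the given -misa, fall back to "-m sm_35" when -misa is absent, and append --no-verify only for sm_30.

```c
#include <stdio.h>
#include <string.h>

/* MISA is the value given to -misa (for example "sm_30"), or NULL when
   the option was not given.  Writes the resulting assembler options
   into BUF, which holds LEN bytes.  */
static void
nvptx_as_options (const char *misa, char *buf, size_t len)
{
  if (len == 0)
    return;

  /* Mirrors %{misa=*:-m %*; :-m sm_35}.  */
  snprintf (buf, len, "-m %s", misa ? misa : "sm_35");

  /* Mirrors %{misa=sm_30:--no-verify}.  */
  if (misa && strcmp (misa, "sm_30") == 0)
    {
      size_t used = strlen (buf);
      snprintf (buf + used, len - used, " --no-verify");
    }
}
```

So -misa=sm_30 gives "-m sm_30 --no-verify", while any other value, or no -misa at all, leaves verification enabled.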

This has the unfortunate effect that after fixing PR104758 by building
libraries with sm_30, the libraries are no longer verified.  This can be
improved upon by:
- adding a configure test in gcc that tests if CUDA supports sm_30, and
  if so disabling this patch
- dealing with this in nvptx-tools somehow, either:
  - detect at ptxas execution time that it doesn't support sm_30, or
  - detect this at nvptx-tools configure time.

Committed to trunk.

Thanks,
- Tom

[nvptx] Use --no-verify for sm_30

gcc/ChangeLog:

2022-03-03  Tom de Vries  

* config/nvptx/nvptx.h (ASM_SPEC): Add %{misa=sm_30:--no-verify}.

---
 gcc/config/nvptx/nvptx.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index 4ab412bc7d8..3ca22a595d2 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -32,7 +32,7 @@
 /* Default needs to be in sync with default for misa in nvptx.opt.
We add a default here to work around a hard-coded sm_30 default in
nvptx-as.  */
-#define ASM_SPEC "%{misa=*:-m %*; :-m sm_35}"
+#define ASM_SPEC "%{misa=*:-m %*; :-m sm_35}%{misa=sm_30:--no-verify}"
 
 #define TARGET_CPU_CPP_BUILTINS() nvptx_cpu_cpp_builtins ()
 


[committed][nvptx] Add -mptx=_ in gcc.target/nvptx/smxx.c

2022-03-03 Thread Tom de Vries via Gcc-patches
Hi,

With target board nvptx-none-run/-mptx=3.1 we run into:
...
cc1: error: PTX version (-mptx) needs to be at least 4.2 to support \
  selected -misa (sm_53)^M
compiler exited with status 1
FAIL: gcc.target/nvptx/sm53.c (test for excess errors)
...

Fix this by adding -mptx=_ in sm53.c and similar.

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Add -mptx=_ in gcc.target/nvptx/smxx.c

gcc/testsuite/ChangeLog:

2022-03-03  Tom de Vries  

* gcc.target/nvptx/sm53.c: Add -mptx=_.
* gcc.target/nvptx/sm70.c: Same.
* gcc.target/nvptx/sm75.c: Same.
* gcc.target/nvptx/sm80.c: Same.

---
 gcc/testsuite/gcc.target/nvptx/sm53.c | 2 +-
 gcc/testsuite/gcc.target/nvptx/sm70.c | 2 +-
 gcc/testsuite/gcc.target/nvptx/sm75.c | 2 +-
 gcc/testsuite/gcc.target/nvptx/sm80.c | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/nvptx/sm53.c 
b/gcc/testsuite/gcc.target/nvptx/sm53.c
index c47790b6448..b4d819c6a79 100644
--- a/gcc/testsuite/gcc.target/nvptx/sm53.c
+++ b/gcc/testsuite/gcc.target/nvptx/sm53.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-misa=sm_53" } */
+/* { dg-options "-misa=sm_53 -mptx=_" } */
 
 #if __PTX_SM__ != 530
 #error wrong value for __PTX_SM__
diff --git a/gcc/testsuite/gcc.target/nvptx/sm70.c 
b/gcc/testsuite/gcc.target/nvptx/sm70.c
index dc5a5fd8bfa..4bd012b5680 100644
--- a/gcc/testsuite/gcc.target/nvptx/sm70.c
+++ b/gcc/testsuite/gcc.target/nvptx/sm70.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-misa=sm_70" } */
+/* { dg-options "-misa=sm_70 -mptx=_" } */
 
 #if __PTX_SM__ != 700
 #error wrong value for __PTX_SM__
diff --git a/gcc/testsuite/gcc.target/nvptx/sm75.c 
b/gcc/testsuite/gcc.target/nvptx/sm75.c
index c098bf77ca2..d159d3f5fb3 100644
--- a/gcc/testsuite/gcc.target/nvptx/sm75.c
+++ b/gcc/testsuite/gcc.target/nvptx/sm75.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-misa=sm_75" } */
+/* { dg-options "-misa=sm_75 -mptx=_" } */
 
 #if __PTX_SM__ != 750
 #error wrong value for __PTX_SM__
diff --git a/gcc/testsuite/gcc.target/nvptx/sm80.c 
b/gcc/testsuite/gcc.target/nvptx/sm80.c
index 3770563eb16..ef6d8b7fa23 100644
--- a/gcc/testsuite/gcc.target/nvptx/sm80.c
+++ b/gcc/testsuite/gcc.target/nvptx/sm80.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-misa=sm_80" } */
+/* { dg-options "-misa=sm_80 -mptx=_" } */
 
 #if __PTX_SM__ != 800
 #error wrong value for __PTX_SM__


Re: [PATCH] eliminate mutex in fast path of __register_frame

2022-03-03 Thread Florian Weimer via Gcc-patches
* Thomas Neumann:

>>> +// Common logic for version locks
>>> +struct version_lock
>>> +{
>>> +  // The lock itself
>>> +  uintptr_t version_lock;
>>> +};
>> version_lock must not overflow, right?  This means we need a wider
>> counter on 32-bit, too.  glibc contains a 62-bit counter that it uses
>> for its own data structure.
>
> an overflow is not a problem per se, it is only problematic if we hit
> exactly the same value again between lock_optimistic and
> validate. Note that these ranges are usually a handful of assembler
> instructions, and we would have to see 4 billion frame registrations
> in that short time span. I don't think that is a problem in
> practice. But I can switch to 64bit, of course, if you think the risk
> is too high.

This is more of a GCC policy question, which I cannot answer.

At least it needs a comment explaining the decision to ignore overflows.
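
To make the wraparound argument concrete, here is a minimal sketch (not the
libgcc code; toy_version_t, toy_validate and toy_reader_fooled are
hypothetical names).  It uses an 8-bit counter so that the wrap happens after
256 updates instead of 2^32, and it assumes every update bumps the version by
exactly one:

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy 8-bit version counter (assumption: one increment per update).
   The real patch uses uintptr_t, so the wrap period is 2^32 or 2^64.  */
typedef uint8_t toy_version_t;

/* Optimistic validation only compares for equality.  */
static inline bool
toy_validate (toy_version_t seen, toy_version_t current)
{
  return seen == current;
}

/* Return true if a reader that sampled START would wrongly accept its
   read after UPDATES concurrent updates.  */
static bool
toy_reader_fooled (toy_version_t start, unsigned updates)
{
  toy_version_t current = start;
  for (unsigned i = 0; i < updates; i++)
    current++;			/* wraps modulo 256 */
  return updates != 0 && toy_validate (start, current);
}
```

With this model the reader is only fooled when the number of intervening
updates is an exact multiple of the counter period, which is the case the
comment in the patch would have to argue is unreachable in practice.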

> But we could eliminate the spinlock aspect by using a global mutex,
> which would guarantee us that nothing is locked exclusively and thus
> no spinning is required. That would also allow us to fix the
> async-signal-safety at the same time if needed by blocking signals
> globally during updates.

Again, I don't know if people consider the spinning a problem.  In
glibc, we would use that mutex.

> Note that the current code is not async-signal-safe either, it simply
> grabs a mutex. If a signal happens while calling __register_frame, and 
> that handler tries to unwind, the current code will deadlock, too.

Yes, understood.

>>> +// Validate a previously acquired lock
>>> +static inline bool
>>> +version_lock_validate (const struct version_lock *vl, uintptr_t lock)
>>> +{
>>> +  // Check that the node is still in the same state
>>> +  uintptr_t state = __atomic_load_n (&(vl->version_lock), __ATOMIC_SEQ_CST);
>>> +  return (state == lock);
>>> +}
>> I think an acquire fence is missing before the __atomic_load_n.  We
>> learned this the hard way in the glibc implementation.  The reference
>> that Szabolcs found is:
>>   Hans Boehm, Can Seqlocks Get Along with Programming Language
>>   Memory Models?, Section 4.
>
> thanks for the pointer, I will look at this more carefully. When I
> read the text correctly we need a
> __atomic_thread_fence(__ATOMIC_ACQUIRE) before the load, but I will
> double check that.

The equivalent glibc function looks like this:

/* Return true if the read was successful, given the start
   version.  */
static inline bool
_dlfo_read_success (uint64_t start_version)
{
  /* See Hans Boehm, Can Seqlocks Get Along with Programming Language
 Memory Models?, Section 4.  This is necessary so that loads in
 the TM region are not ordered past the version check below.  */
  atomic_thread_fence_acquire ();

  /* Synchronizes with the fences in _dlfo_mappings_begin_update,
 _dlfo_mappings_end_update.  It is important that all stores from
 the last update have become visible, and stores from the next
 update to this version are not before the version number is
 updated.

 Unlike with seqlocks, there is no check for odd versions here
 because we have read the unmodified copy (confirmed to be
 unmodified by the unchanged version).  */
  return _dlfo_read_start_version () == start_version;
}
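
For a self-contained picture of where the fence sits, here is a sketch using
the GCC __atomic builtins.  It assumes a classic seqlock in which the writer
makes the version odd while an update is in flight (unlike the glibc scheme
above, which needs no odd check), and all toy_* names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-writer seqlock protecting one payload word.  */
struct toy_seqlock
{
  uint64_t version;		/* odd while a write is in progress */
  uint64_t payload;
};

/* Writer side: odd version, release fence, data, even version.  */
static void
toy_publish (struct toy_seqlock *s, uint64_t value)
{
  uint64_t v = __atomic_load_n (&s->version, __ATOMIC_RELAXED);
  __atomic_store_n (&s->version, v + 1, __ATOMIC_RELAXED);
  __atomic_thread_fence (__ATOMIC_RELEASE);
  __atomic_store_n (&s->payload, value, __ATOMIC_RELAXED);
  __atomic_store_n (&s->version, v + 2, __ATOMIC_RELEASE);
}

/* Reader side: returns false if the read raced with a writer.  */
static bool
toy_try_read (struct toy_seqlock *s, uint64_t *out)
{
  uint64_t start = __atomic_load_n (&s->version, __ATOMIC_ACQUIRE);
  if (start & 1)
    return false;		/* writer active */

  uint64_t copy = __atomic_load_n (&s->payload, __ATOMIC_RELAXED);

  /* Keep the data load above from being ordered past the version
     re-check below (Boehm, Section 4).  */
  __atomic_thread_fence (__ATOMIC_ACQUIRE);
  if (__atomic_load_n (&s->version, __ATOMIC_RELAXED) != start)
    return false;

  *out = copy;
  return true;
}
```

The point of the sketch is only the placement of the acquire fence between
the data loads and the second version load; the writer protocol is an
assumption and is not taken from the patch under review.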

>>> +// Allocate a node. This node will be returned in locked exclusive state
>>> +static struct btree_node *
>>> +btree_allocate_node (struct btree *t, bool inner)
>>> +{
>> 
>>> +  // No free page available, allocate a new one
>>> +  struct btree_node *new_node
>>> +   = (struct btree_node *) (malloc (sizeof (struct btree_node)));
>>> +  version_lock_initialize_locked_exclusive (
>>> +   &(new_node->version_lock)); // initialize the node in locked state
>>> +  new_node->entry_count = 0;
>>> +  new_node->type = inner ? btree_node_inner : btree_node_leaf;
>>> +  return new_node;
>>> +}
>>> +}
>> This needs some form of error checking for malloc.  But I see the
>> existing code does not have that, either. 8-(
>
> and I do not see how we can really handle a malloc failure here. What
> should we do except die?

We may have to add a new interface.  In some other cases, I've seen
errno being used for error reporting, but that requires changes in
applications to recognize the error.  It's probably better to crash here
than to fail mysteriously later.
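
A minimal sketch of the "crash here" option, assuming abort () is an
acceptable way to die in this context (toy_node and
toy_allocate_node_or_abort are hypothetical stand-ins, not the names from the
patch):

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct btree_node.  */
struct toy_node
{
  unsigned entry_count;
  bool inner;
};

/* Allocate a node and terminate the process on allocation failure,
   rather than returning NULL to callers that cannot report it.  */
static struct toy_node *
toy_allocate_node_or_abort (bool inner)
{
  struct toy_node *node = malloc (sizeof *node);
  if (node == NULL)
    abort ();			/* fail loudly at the point of failure */
  node->entry_count = 0;
  node->inner = inner;
  return node;
}
```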

Out of curiosity, how many times do you call the registration functions
from your application?  Would a batch registration interface help?  It
could address error reporting as well.

>>> +// Find the corresponding entry for the given address
>>> +static struct object *
>>> +btree_lookup (const struct btree *t, uintptr_t target_addr)
>>> +{
>> 
>>> +  if (type == btree_node_inner)
>>> +   {
>>> + // We cannot call find_inner_slot here because we can only trust our
>>> + // validated entries
>>> + unsigned slot = 0;
>>> + while (((slot + 1) < entry_count)
>>> +&& 

Re: [PATCH] call mark_dfs_back_edges() before testing EDGE_DFS_BACK [PR104761]

2022-03-03 Thread Richard Biener via Gcc-patches



> On 03.03.2022 at 09:02, Jakub Jelinek via Gcc-patches wrote:
> 
> On Wed, Mar 02, 2022 at 04:15:09PM -0700, Martin Sebor via Gcc-patches wrote:
>> The -Wdangling-pointer code tests the EDGE_DFS_BACK but the pass never
>> calls the mark_dfs_back_edges() function that initializes the bit (I
>> didn't know about it).  As a result the bit is not set when expected,
>> which can cause false positives under the right conditions.
> 
> Not a review because I also had to look up what computes EDGE_DFS_BACK,
> so I don't feel the right person to ack the patch.

The patch looks OK.  The questions below might all be valid, but they can be
addressed with followup changes.

Richard.


> So, just a few questions.
> 
> The code in question is:
>  auto gsi = gsi_for_stmt (use_stmt);
> 
>  auto_bitmap visited;
> 
>  /* A use statement in the last basic block in a function or one that
> falls through to it is after any other prior clobber of the used
> variable unless it's followed by a clobber of the same variable. */
>  basic_block bb = use_bb;
>  while (bb != inval_bb
> && single_succ_p (bb)
> && !(single_succ_edge (bb)->flags & (EDGE_EH|EDGE_DFS_BACK)))
>{
>  if (!bitmap_set_bit (visited, bb->index))
>/* Avoid cycles. */
>return true;
> 
>  for (; !gsi_end_p (gsi); gsi_next_nondebug (&gsi))
>{
>  gimple *stmt = gsi_stmt (gsi);
>  if (gimple_clobber_p (stmt))
>{
>  if (clobvar == gimple_assign_lhs (stmt))
>/* The use is followed by a clobber.  */
>return false;
>}
>}
> 
>  bb = single_succ (bb);
>  gsi = gsi_start_bb (bb);
>}
> 
> 1) shouldn't it give up for EDGE_ABNORMAL too?  I mean, e.g.
>   following a non-local goto forced edge from a noreturn call
>   to a non-local label (if there is just one) doesn't seem
>   right to me
> 2) if EDGE_DFS_BACK is computed and 1) is done, is there any
>   reason why you need 2 levels of protection, i.e. the EDGE_DFS_BACK
>   check as well as the visited bitmap (and having them use
>   very different answers, if EDGE_DFS_BACK is seen, the function
>   will return false, if visited bitmap has a bb, it will return true)?
>   Can't the visited bitmap go away?
> 3) I'm concerned about compile time with the above, consider you have
>   100 use_stmts and 100 corresponding inv_stmts and in each
>   case you enter this loop and go through a series of very large basic
>   blocks that don't clobber those stmts; shouldn't it bail out
>   (return false) after walking some param
>   controlled number of non-debug stmts (say 1000 by default)?
>   There is an early exit if
>   if (dominated_by_p (CDI_DOMINATORS, use_bb, inval_bb))
> return true;
>   (I admit I haven't read the code what happens if there is more than
>   one clobber for the same variable)
> 
>> The attached patch adds a call to the warning pass to initialize
>> the bit.  Tested on x86_64-linux.
>> 
>> Martin
> 
>> Call mark_dfs_back_edges before testing EDGE_DFS_BACK [PR104761].
>> 
>> Resolves:
>> PR middle-end/104761 - bogus -Wdangling-pointer with cleanup and infinite loop
>> 
>> gcc/ChangeLog:
>> 
>>PR middle-end/104761
>>* gimple-ssa-warn-access.cc (pass_waccess::execute): Call
>>mark_dfs_back_edges.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>PR middle-end/104761
>>* g++.dg/warn/Wdangling-pointer-4.C: New test.
>>* gcc.dg/Wdangling-pointer-4.c: New test.
>> 
>> diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
>> index b7cdad517b3..b519712d76e 100644
>> --- a/gcc/gimple-ssa-warn-access.cc
>> +++ b/gcc/gimple-ssa-warn-access.cc
>> @@ -47,7 +47,7 @@
>> #include "tree-object-size.h"
>> #include "tree-ssa-strlen.h"
>> #include "calls.h"
>> -#include "cfgloop.h"
>> +#include "cfganal.h"
>> #include "intl.h"
>> #include "gimple-range.h"
>> #include "stringpool.h"
>> @@ -4710,6 +4710,9 @@ pass_waccess::execute (function *fun)
>>   calculate_dominance_info (CDI_DOMINATORS);
>>   calculate_dominance_info (CDI_POST_DOMINATORS);
>> 
>> +  /* Set or clear EDGE_DFS_BACK bits on back edges.  */
>> +  mark_dfs_back_edges (fun);
>> +
>>   /* Create a new ranger instance and associate it with FUN.  */
>>   m_ptr_qry.rvals = enable_ranger (fun);
>>   m_func = fun;
> 
>Jakub
> 


[PATCH] rs6000: Adjust mov optabs for opaque modes [PR103353]

2022-03-03 Thread Kewen.Lin via Gcc-patches
Hi,

As PR103353 shows, we may want to continue to expand an MMA built-in
function like a normal function, even if we have already emitted
error messages about some missing required conditions.  As shown in
that PR, without an explicit mov optab provided for OOmode, it would
call emit_move_insn recursively.

So this patch allows the mov pattern to be generated when we are
expanding to RTL and have seen errors, even without MMA support.  The
generated pattern is not expected to cause further ICEs, as the
compilation would stop soon after expanding.

Bootstrapped and regtested on powerpc64-linux-gnu P8 and
powerpc64le-linux-gnu P9 and P10.

Is it ok for trunk?

BR,
Kewen
--

PR target/103353

gcc/ChangeLog:

* config/rs6000/mma.md (define_expand movoo): Move TARGET_MMA condition
check to preparation statements and add handlings for !TARGET_MMA.
(define_expand movxo): Likewise.
---
 gcc/config/rs6000/mma.md | 42 ++--
 1 file changed, 36 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 907c9d6d516..f76a87b4a21 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -268,10 +268,25 @@ (define_int_attr avvi4i4i4
[(UNSPEC_MMA_PMXVI8GER4PP   "pmxvi8ger4pp")
 (define_expand "movoo"
   [(set (match_operand:OO 0 "nonimmediate_operand")
(match_operand:OO 1 "input_operand"))]
-  "TARGET_MMA"
+  ""
 {
-  rs6000_emit_move (operands[0], operands[1], OOmode);
-  DONE;
+  if (TARGET_MMA) {
+rs6000_emit_move (operands[0], operands[1], OOmode);
+DONE;
+  }
+  /* Opaque modes are only expected to be available when MMA is supported,
+ but PR103353 shows we may want to continue to expand a MMA built-in
+ function like a normal function, even if we have already emitted
+ error messages about some missing required conditions.
+ As shown in that PR, without one explicit mov optab on OOmode provided,
+ it would call emit_move_insn recursively.  So we allow this pattern to
+ be generated when we are expanding to RTL and have seen errors, even
+ though there is no MMA support.  It would not cause further ICEs as
+ the compilation would stop soon after expanding.  */
+  else if (currently_expanding_to_rtl && seen_error ())
+;
+  else
+gcc_unreachable ();
 })
 
 (define_insn_and_split "*movoo"
@@ -300,10 +315,25 @@ (define_insn_and_split "*movoo"
 (define_expand "movxo"
   [(set (match_operand:XO 0 "nonimmediate_operand")
(match_operand:XO 1 "input_operand"))]
-  "TARGET_MMA"
+  ""
 {
-  rs6000_emit_move (operands[0], operands[1], XOmode);
-  DONE;
+  if (TARGET_MMA) {
+rs6000_emit_move (operands[0], operands[1], XOmode);
+DONE;
+  }
+  /* Opaque modes are only expected to be available when MMA is supported,
+ but PR103353 shows we may want to continue to expand a MMA built-in
+ function like a normal function, even if we have already emitted
+ error messages about some missing required conditions.
+ As shown in that PR, without one explicit mov optab on OOmode provided,
+ it would call emit_move_insn recursively.  So we allow this pattern to
+ be generated when we are expanding to RTL and have seen errors, even
+ though there is no MMA support.  It would not cause further ICEs as
+ the compilation would stop soon after expanding.  */
+  else if (currently_expanding_to_rtl && seen_error ())
+;
+  else
+gcc_unreachable ();
 })
 
 (define_insn_and_split "*movxo"
-- 
2.25.1



[committed] openmp: Disable SSA form during gimplification on OMP_SIMD clauses and body [PR104757]

2022-03-03 Thread Jakub Jelinek via Gcc-patches
Hi!

When offloading to nvptx is enabled, scan_omp_simd duplicates the simd
region, including its clauses and body, using the inliner's
copy_gimple_seq_and_replace_locals.  That works nicely for decls: it remaps
only those that are seen in the nested bind expr vars (i.e. local variables)
and doesn't remap other vars.  SSA_NAMEs, however, it always remaps, because
it doesn't know whether their def stmt is outside of the simd (then they
shouldn't be remapped) or inside of it (then they should), and without
cfg/dominators that is pretty hard to figure out (well, we could walk the
region twice, once to note the SSA_NAMEs defined by each stmt seen there and
once to remap only those SSA_NAMEs).

This patch takes a simpler approach and temporarily disables into_ssa for the
clauses and body of each simd region.  We already disable into_ssa e.g. in
parallel/target/task etc. regions through push_gimplify_context (), but for
simd we don't push any gimplification context, and apart from into_ssa I
think we don't need one.
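
The save/disable/restore shape of the change can be sketched outside of GCC
like this (toy_ctx, toy_body_sees_ssa and toy_with_ssa_disabled are
hypothetical names; the real code operates on gimplify_ctxp and calls
gimplify_omp_for):

```c
#include <stdbool.h>

/* Hypothetical stand-in for the gimplification context.  */
struct toy_ctx
{
  bool into_ssa;
};

/* Example body: reports whether it ran with into_ssa enabled.  */
static int
toy_body_sees_ssa (struct toy_ctx *ctx)
{
  return ctx->into_ssa ? 1 : 0;
}

/* Run BODY with into_ssa forced off, then restore the caller's
   setting whatever it was.  */
static int
toy_with_ssa_disabled (struct toy_ctx *ctx, int (*body) (struct toy_ctx *))
{
  bool saved_into_ssa = ctx->into_ssa;
  ctx->into_ssa = false;
  int ret = body (ctx);
  ctx->into_ssa = saved_into_ssa;
  return ret;
}
```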

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2022-03-03  Jakub Jelinek  

PR middle-end/104757
* gimplify.cc (gimplify_omp_loop): Call gimplify_expr rather than
gimplify_omp_for.
(gimplify_expr) <case OMP_SIMD>: Temporarily disable
gimplify_ctxp->into_ssa around call to gimplify_omp_for.

* gfortran.dg/gomp/pr104757.f90: New test.
* gcc.dg/gomp/pr104757.c: New test.

--- gcc/gimplify.cc.jj  2022-02-14 13:14:39.058128337 +0100
+++ gcc/gimplify.cc 2022-03-02 19:31:46.675210108 +0100
@@ -13725,7 +13725,7 @@ gimplify_omp_loop (tree *expr_p, gimple_
   *pc = NULL_TREE;
   *expr_p = t;
 }
-  return gimplify_omp_for (expr_p, pre_p);
+  return gimplify_expr (expr_p, pre_p, NULL, is_gimple_stmt, fb_none);
 }
 
 
@@ -15479,8 +15479,19 @@ gimplify_expr (tree *expr_p, gimple_seq
  ret = GS_ALL_DONE;
  break;
 
-   case OMP_FOR:
case OMP_SIMD:
+ {
+   /* Temporarily disable into_ssa, as scan_omp_simd
+  which calls copy_gimple_seq_and_replace_locals can't deal
+  with SSA_NAMEs defined outside of the body properly.  */
+   bool saved_into_ssa = gimplify_ctxp->into_ssa;
+   gimplify_ctxp->into_ssa = false;
+   ret = gimplify_omp_for (expr_p, pre_p);
+   gimplify_ctxp->into_ssa = saved_into_ssa;
+   break;
+ }
+
+   case OMP_FOR:
case OMP_DISTRIBUTE:
case OMP_TASKLOOP:
case OACC_LOOP:
--- gcc/testsuite/gfortran.dg/gomp/pr104757.f90.jj  2022-03-02 18:35:00.606549074 +0100
+++ gcc/testsuite/gfortran.dg/gomp/pr104757.f90 2022-03-02 18:34:42.028804663 +0100
@@ -0,0 +1,19 @@
+! PR middle-end/104757
+! { dg-do compile }
+! { dg-options "-O -fopenmp" }
+
+module pr104757
+  implicit none (external, type)
+  integer :: ll
+  !$omp declare target (ll)
+contains
+  subroutine foo (i1)
+!$omp declare target (foo)
+logical :: i1
+integer :: i
+!$omp distribute simd if(i1)
+do i = 1, 64
+  ll = ll + 1
+end do
+  end
+end module
--- gcc/testsuite/gcc.dg/gomp/pr104757.c.jj 2022-03-02 19:33:52.637485113 +0100
+++ gcc/testsuite/gcc.dg/gomp/pr104757.c 2022-03-02 19:34:09.365256034 +0100
@@ -0,0 +1,14 @@
+/* PR middle-end/104757 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fopenmp" } */
+
+#pragma omp declare target
+void
+foo (int x, int y, int *z)
+{
+  int j = 0;
+  #pragma omp simd linear(j:x + y)
+  for (int i = 0; i < 64; i++)
+j += x + y;
+}
+#pragma omp end declare target

Jakub



Re: [PATCH] call mark_dfs_back_edges() before testing EDGE_DFS_BACK [PR104761]

2022-03-03 Thread Jakub Jelinek via Gcc-patches
On Wed, Mar 02, 2022 at 04:15:09PM -0700, Martin Sebor via Gcc-patches wrote:
> The -Wdangling-pointer code tests the EDGE_DFS_BACK but the pass never
> calls the mark_dfs_back_edges() function that initializes the bit (I
> didn't know about it).  As a result the bit is not set when expected,
> which can cause false positives under the right conditions.

Not a review because I also had to look up what computes EDGE_DFS_BACK,
so I don't feel the right person to ack the patch.
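
For readers who also have to look it up: the idea of marking DFS back edges
can be sketched on a toy adjacency-matrix graph as below.  This is not the
cfganal.cc implementation, and every toy_* name is hypothetical; an edge is
treated as a back edge when its destination is still on the DFS stack.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_BLOCKS 8

struct toy_cfg
{
  int n_blocks;
  bool edge[TOY_MAX_BLOCKS][TOY_MAX_BLOCKS];
  bool dfs_back[TOY_MAX_BLOCKS][TOY_MAX_BLOCKS];
};

enum toy_state { TOY_UNVISITED, TOY_ON_STACK, TOY_DONE };

/* Depth-first walk from BB; returns true if any back edge was found.  */
static bool
toy_dfs (struct toy_cfg *cfg, int bb, enum toy_state *state)
{
  bool found = false;
  state[bb] = TOY_ON_STACK;
  for (int succ = 0; succ < cfg->n_blocks; succ++)
    {
      if (!cfg->edge[bb][succ])
	continue;
      if (state[succ] == TOY_ON_STACK)
	{
	  /* Destination is an ancestor in the DFS tree (or BB itself).  */
	  cfg->dfs_back[bb][succ] = true;
	  found = true;
	}
      else if (state[succ] == TOY_UNVISITED)
	found |= toy_dfs (cfg, succ, state);
    }
  state[bb] = TOY_DONE;
  return found;
}

/* Clear all marks, then set them on the back edges reachable from ENTRY.  */
static bool
toy_mark_dfs_back_edges (struct toy_cfg *cfg, int entry)
{
  enum toy_state state[TOY_MAX_BLOCKS] = { TOY_UNVISITED };
  memset (cfg->dfs_back, 0, sizeof cfg->dfs_back);
  return toy_dfs (cfg, entry, state);
}
```

The relevant property for the warning code is that the marks only exist
after such a walk has run; a consumer that tests the bit without running it
first sees whatever a previous pass left behind.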

So, just a few questions.

The code in question is:
  auto gsi = gsi_for_stmt (use_stmt);

  auto_bitmap visited;

  /* A use statement in the last basic block in a function or one that
 falls through to it is after any other prior clobber of the used
 variable unless it's followed by a clobber of the same variable. */
  basic_block bb = use_bb;
  while (bb != inval_bb
 && single_succ_p (bb)
 && !(single_succ_edge (bb)->flags & (EDGE_EH|EDGE_DFS_BACK)))
{
  if (!bitmap_set_bit (visited, bb->index))
/* Avoid cycles. */
return true;

  for (; !gsi_end_p (gsi); gsi_next_nondebug (&gsi))
{
  gimple *stmt = gsi_stmt (gsi);
  if (gimple_clobber_p (stmt))
{
  if (clobvar == gimple_assign_lhs (stmt))
/* The use is followed by a clobber.  */
return false;
}
}

  bb = single_succ (bb);
  gsi = gsi_start_bb (bb);
}

1) shouldn't it give up for EDGE_ABNORMAL too?  I mean, e.g.
   following a non-local goto forced edge from a noreturn call
   to a non-local label (if there is just one) doesn't seem
   right to me
2) if EDGE_DFS_BACK is computed and 1) is done, is there any
   reason why you need 2 levels of protection, i.e. the EDGE_DFS_BACK
   check as well as the visited bitmap (and having them use
   very different answers, if EDGE_DFS_BACK is seen, the function
   will return false, if visited bitmap has a bb, it will return true)?
   Can't the visited bitmap go away?
3) I'm concerned about compile time with the above, consider you have
   100 use_stmts and 100 corresponding inv_stmts and in each
   case you enter this loop and go through a series of very large basic
   blocks that don't clobber those stmts; shouldn't it bail out
   (return false) after walking some param
   controlled number of non-debug stmts (say 1000 by default)?
   There is an early exit if
   if (dominated_by_p (CDI_DOMINATORS, use_bb, inval_bb))
 return true;
   (I admit I haven't read the code what happens if there is more than
   one clobber for the same variable)
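
To illustrate what 1) to 3) would amount to together, here is a toy sketch,
not GCC code: the CFG model, the TOY_EDGE_* flags and toy_use_survives are
all hypothetical.  It assumes back edges have been marked, so no visited
bitmap is kept, and it gives up conservatively once a statement budget is
exhausted:

```c
#include <stdbool.h>

enum
{
  TOY_EDGE_EH = 1,
  TOY_EDGE_DFS_BACK = 2,
  TOY_EDGE_ABNORMAL = 4
};

/* Toy basic block with at most one successor.  */
struct toy_block
{
  int succ;			/* successor index, or -1 if none */
  unsigned succ_flags;		/* flags on the edge to SUCC */
  int n_stmts;			/* number of non-debug statements */
  int clobber_at;		/* index of the clobber stmt, or -1 */
};

/* Follow the single-successor chain from USE_BB towards INVAL_BB.
   Return false if a clobber follows the use or the budget runs out,
   true otherwise.  Termination on cycles relies on every cycle
   containing a marked back edge or at least one statement.  */
static bool
toy_use_survives (const struct toy_block *cfg, int use_bb, int inval_bb,
		  int stmt_budget)
{
  int bb = use_bb;
  while (bb != inval_bb
	 && cfg[bb].succ >= 0
	 && !(cfg[bb].succ_flags
	      & (TOY_EDGE_EH | TOY_EDGE_DFS_BACK | TOY_EDGE_ABNORMAL)))
    {
      for (int i = 0; i < cfg[bb].n_stmts; i++)
	{
	  if (stmt_budget-- <= 0)
	    return false;	/* too expensive, give up */
	  if (i == cfg[bb].clobber_at)
	    return false;	/* the use is followed by a clobber */
	}
      bb = cfg[bb].succ;
    }
  return true;
}
```

Whether returning false is the right conservative answer on budget
exhaustion is exactly the open question in 3); the sketch just picks it.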

> The attached patch adds a call to the warning pass to initialize
> the bit.  Tested on x86_64-linux.
> 
> Martin

> Call mark_dfs_back_edges before testing EDGE_DFS_BACK [PR104761].
> 
> Resolves:
> PR middle-end/104761 - bogus -Wdangling-pointer with cleanup and infinite loop
> 
> gcc/ChangeLog:
> 
>   PR middle-end/104761
>   * gimple-ssa-warn-access.cc (pass_waccess::execute): Call
>   mark_dfs_back_edges.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR middle-end/104761
>   * g++.dg/warn/Wdangling-pointer-4.C: New test.
>   * gcc.dg/Wdangling-pointer-4.c: New test.
> 
> diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
> index b7cdad517b3..b519712d76e 100644
> --- a/gcc/gimple-ssa-warn-access.cc
> +++ b/gcc/gimple-ssa-warn-access.cc
> @@ -47,7 +47,7 @@
>  #include "tree-object-size.h"
>  #include "tree-ssa-strlen.h"
>  #include "calls.h"
> -#include "cfgloop.h"
> +#include "cfganal.h"
>  #include "intl.h"
>  #include "gimple-range.h"
>  #include "stringpool.h"
> @@ -4710,6 +4710,9 @@ pass_waccess::execute (function *fun)
>calculate_dominance_info (CDI_DOMINATORS);
>calculate_dominance_info (CDI_POST_DOMINATORS);
>  
> +  /* Set or clear EDGE_DFS_BACK bits on back edges.  */
> +  mark_dfs_back_edges (fun);
> +
>/* Create a new ranger instance and associate it with FUN.  */
>m_ptr_qry.rvals = enable_ranger (fun);
>m_func = fun;

Jakub