[PATCH] MAINTAINERS: Add myself as maintainer of the i386 vector extensions.

2021-06-20 Thread liuhongt via Gcc-patches
ChangeLog:

* MAINTAINERS: Add myself as maintainer of the i386 vector
extensions.
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 32a414ba8af..4ac4fc5f3bd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -74,6 +74,7 @@ hppa port John David Anglin   

 i386 port  Jan Hubicka 
 i386 port  Uros Bizjak 
 i386 vector ISA extns  Kirill Yukhin   
+i386 vector ISA extns  Hongtao Liu 
 iq2000 port  Nick Clifton
 lm32 port  Sebastien Bourdeauducq  
 m32r port  Nick Clifton
-- 
2.18.1



[PATCH] Disparage slightly the mask register alternative for bitwise operations. [PR target/101142]

2021-06-20 Thread liuhongt via Gcc-patches
The avx512 supports bitwise operations with mask registers, but the
throughput of those instructions is much lower than that of the
corresponding GPR version, so additionally disparage the mask register
alternative slightly for bitwise operations in LRA.
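In GCC's operand constraint syntax, a ? in an alternative makes the allocator regard that alternative as one unit more costly for each ? it contains; the patch adds one to the k (mask register) alternative. The minimal sketch below counts the ? modifiers per comma-separated alternative of a constraint string such as "=r,rm,r,r,?k" from *anddi_1. The helper count_disparagement is a hypothetical name, not a GCC function, and it ignores every other constraint letter.

```c
#include <assert.h>
#include <stddef.h>

/* Count the '?' modifiers in each comma-separated alternative of an
   operand constraint string such as "=r,rm,r,r,?k".  Writes at most MAX
   counts into OUT and returns the number of alternatives seen.  Other
   characters ('=', register letters, ...) are ignored.  */
size_t
count_disparagement (const char *constraint, int *out, size_t max)
{
  size_t alt = 0;

  assert (constraint != NULL && out != NULL && max > 0);
  out[0] = 0;
  for (const char *p = constraint; *p != '\0'; p++)
    {
      if (*p == ',')
        {
          /* Start counting for the next alternative.  */
          alt++;
          if (alt < max)
            out[alt] = 0;
        }
      else if (*p == '?' && alt < max)
        out[alt]++;
    }
  return alt + 1;
}
```

For "=r,rm,r,r,?k" this reports five alternatives, with only the last (the mask register one) disparaged once.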

Also, when the allocno cost of GENERAL_REGS is the same as that of
MASK_REGS, allocate MASK_REGS first, since it has already been disparaged.
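The tie-breaking intent can be illustrated with a toy picker, assuming candidate classes are listed in allocation order and their costs already include any ? disparagement. This is a simplified model of the rule stated above, not IRA's actual algorithm; struct candidate and pick_candidate are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: a register class with a cost that already includes any
   '?' disparagement.  Candidates are passed in allocation order.  */
struct candidate
{
  const char *cls;
  int cost;
};

/* Return the index of the cheapest candidate.  The strict '<' means that
   on a tie the candidate earlier in the allocation order wins.  */
size_t
pick_candidate (const struct candidate *c, size_t n)
{
  size_t best = 0;

  assert (c != NULL && n > 0);
  for (size_t i = 1; i < n; i++)
    if (c[i].cost < c[best].cost)
      best = i;
  return best;
}
```

With mask registers placed before general registers in the order, equal costs select MASK_REGS, while a strictly cheaper GENERAL_REGS still wins.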

gcc/ChangeLog:

PR target/101142
* config/i386/i386.md (*anddi_1): Disparage slightly the mask
register alternative.
(*and_1): Ditto.
(*andqi_1): Ditto.
(*andn_1): Ditto.
(*_1): Ditto.
(*qi_1): Ditto.
(*one_cmpl2_1): Ditto.
(*one_cmplsi2_1_zext): Ditto.
(*one_cmplqi2_1): Ditto.
* config/i386/i386.c (x86_order_regs_for_local_alloc): Change
the order of mask registers to be before general registers.

gcc/testsuite/ChangeLog:

PR target/101142
* gcc.target/i386/spill_to_mask-1.c: Adjust testcase.
* gcc.target/i386/spill_to_mask-2.c: Adjust testcase.
* gcc.target/i386/spill_to_mask-3.c: Adjust testcase.
* gcc.target/i386/spill_to_mask-4.c: Adjust testcase.
---
 gcc/config/i386/i386.c|  8 +-
 gcc/config/i386/i386.md   | 20 ++---
 .../gcc.target/i386/spill_to_mask-1.c | 89 +--
 .../gcc.target/i386/spill_to_mask-2.c | 11 ++-
 .../gcc.target/i386/spill_to_mask-3.c | 11 ++-
 .../gcc.target/i386/spill_to_mask-4.c | 11 ++-
 6 files changed, 91 insertions(+), 59 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a61255857ff..a651853ca3b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -20463,6 +20463,10 @@ x86_order_regs_for_local_alloc (void)
int pos = 0;
int i;
 
+   /* Mask register.  */
+   for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++)
+ reg_alloc_order [pos++] = i;
+
/* First allocate the local general purpose registers.  */
for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
  if (GENERAL_REGNO_P (i) && call_used_or_fixed_reg_p (i))
@@ -20489,10 +20493,6 @@ x86_order_regs_for_local_alloc (void)
for (i = FIRST_EXT_REX_SSE_REG; i <= LAST_EXT_REX_SSE_REG; i++)
  reg_alloc_order [pos++] = i;
 
-   /* Mask register.  */
-   for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++)
- reg_alloc_order [pos++] = i;
-
/* x87 registers.  */
if (TARGET_SSE_MATH)
  for (i = FIRST_STACK_REG; i <= LAST_STACK_REG; i++)
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 6e4abf32e7c..3eef56b27d7 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -9138,7 +9138,7 @@ (define_insn_and_split "*anddi3_doubleword"
 })
 
 (define_insn "*anddi_1"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,rm,r,r,k")
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,rm,r,r,?k")
(and:DI
 (match_operand:DI 1 "nonimmediate_operand" "%0,0,0,qm,k")
 (match_operand:DI 2 "x86_64_szext_general_operand" "Z,re,m,L,k")))
@@ -9226,7 +9226,7 @@ (define_insn "*andsi_1_zext"
(set_attr "mode" "SI")])
 
 (define_insn "*and_1"
-  [(set (match_operand:SWI24 0 "nonimmediate_operand" "=rm,r,Ya,k")
+  [(set (match_operand:SWI24 0 "nonimmediate_operand" "=rm,r,Ya,?k")
(and:SWI24 (match_operand:SWI24 1 "nonimmediate_operand" "%0,0,qm,k")
   (match_operand:SWI24 2 "" "r,m,L,k")))
(clobber (reg:CC FLAGS_REG))]
@@ -9255,7 +9255,7 @@ (define_insn "*and_1"
(set_attr "mode" ",,SI,")])
 
 (define_insn "*andqi_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r,k")
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r,?k")
(and:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,k")
(match_operand:QI 2 "general_operand" "qn,m,rn,k")))
(clobber (reg:CC FLAGS_REG))]
@@ -9651,7 +9651,7 @@ (define_split
 })
 
 (define_insn "*andn_1"
-  [(set (match_operand:SWI48 0 "register_operand" "=r,r,k")
+  [(set (match_operand:SWI48 0 "register_operand" "=r,r,?k")
(and:SWI48
  (not:SWI48 (match_operand:SWI48 1 "register_operand" "r,r,k"))
  (match_operand:SWI48 2 "nonimmediate_operand" "r,m,k")))
@@ -9667,7 +9667,7 @@ (define_insn "*andn_1"
(set_attr "mode" "")])
 
 (define_insn "*andn_1"
-  [(set (match_operand:SWI12 0 "register_operand" "=r,k")
+  [(set (match_operand:SWI12 0 "register_operand" "=r,?k")
(and:SWI12
  (not:SWI12 (match_operand:SWI12 1 "register_operand" "r,k"))
  (match_operand:SWI12 2 "register_operand" "r,k")))
@@ -9757,7 +9757,7 @@ (define_insn_and_split "*di3_doubleword"
 })
 
 (define_insn "*_1"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,r,k")
+  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,r,?k")
(any_or:SWI248
 (match_operand:SWI248 1 "nonimmediate_operand" "%0,0,k")
 (match_operand:SWI248 2 "" "r,m,k")))

Re: [PATCH] c++: REF_PARENTHESIZED_P wrapper inhibiting NRVO [PR67302]

2021-06-20 Thread Jason Merrill via Gcc-patches

On 6/19/21 3:45 PM, Patrick Palka wrote:

Here, in C++14 or later, we remember the parentheses around 'a' in the
return statement by using a REF_PARENTHESIZED_P wrapper, which ends up
inhibiting NRVO because we don't look through this wrapper before
checking the conditions for NRVO.  This patch fixes this by calling
maybe_undo_parenthesized_ref sooner in check_return_expr.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


PR c++/67302

gcc/cp/ChangeLog:

* typeck.c (check_return_expr): Call maybe_undo_parenthesized_ref
sooner, before the NRVO handling.

gcc/testsuite/ChangeLog:

* g++.dg/opt/nrv21.C: New test.
---
  gcc/cp/typeck.c  |  9 -
  gcc/testsuite/g++.dg/opt/nrv21.C | 14 ++
  2 files changed, 18 insertions(+), 5 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/opt/nrv21.C

diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index dbb2370510c..aa014c3812a 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -10306,7 +10306,10 @@ check_return_expr (tree retval, bool *no_warning)
  
   See finish_function and finalize_nrv for the rest of this optimization.  */

if (retval)
-STRIP_ANY_LOCATION_WRAPPER (retval);
+{
+  retval = maybe_undo_parenthesized_ref (retval);
+  STRIP_ANY_LOCATION_WRAPPER (retval);
+}
  
bool named_return_value_okay_p = can_do_nrvo_p (retval, functype);

if (fn_returns_value_p && flag_elide_constructors)
@@ -10340,10 +10343,6 @@ check_return_expr (tree retval, bool *no_warning)
if (VOID_TYPE_P (functype))
return error_mark_node;
  
-  /* If we had an id-expression obfuscated by force_paren_expr, we need
-to undo it so we can try to treat it as an rvalue below.  */
-  retval = maybe_undo_parenthesized_ref (retval);
-
if (processing_template_decl)
retval = build_non_dependent_expr (retval);
  
diff --git a/gcc/testsuite/g++.dg/opt/nrv21.C b/gcc/testsuite/g++.dg/opt/nrv21.C
new file mode 100644
index 000..31bff79afc1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/opt/nrv21.C
@@ -0,0 +1,14 @@
+// PR c++/67302
+// { dg-additional-options -fdump-tree-gimple }
+// { dg-final { scan-tree-dump-not " = a" "gimple" } }
+
+struct A
+{
+  int ar[42];
+  A();
+};
+
+A f() {
+  A a;
+  return (a);
+}





Re: [PATCH] c++: conversion to base of vbase in NSDMI [PR80431]

2021-06-20 Thread Jason Merrill via Gcc-patches

On 6/18/21 4:39 PM, Patrick Palka wrote:

The delayed processing of conversions to a virtual base inside an NSDMI
assumes the target base type is a (possibly indirect) virtual base of
the current class, but the target base type could also be an indirect
non-virtual base inherited from a virtual base, as in the testcase below.
Since such a base isn't a part of CLASSTYPE_VBASECLASSES, we end up
miscompiling the testcase due to build_base_path (called with
binfo=NULL_TREE) silently returning error_mark_node.  Fix this by
using convert_to_base to build the conversion.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/80431

gcc/cp/ChangeLog:

* tree.c (bot_replace): Use convert_to_base instead of
only looking through CLASSTYPE_VBASECLASSES of the current class
type.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/nsdmi-virtual1a.C: New test.
---
  gcc/cp/tree.c| 10 ++-
  gcc/testsuite/g++.dg/cpp0x/nsdmi-virtual1a.C | 29 
  2 files changed, 32 insertions(+), 7 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/nsdmi-virtual1a.C

diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index fec5afaa2be..3537f395960 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -3244,13 +3244,9 @@ bot_replace (tree* t, int* /*walk_subtrees*/, void* data_)
  {
/* In an NSDMI build_base_path defers building conversions to virtual


So this should be "morally virtual" 
(https://itanium-cxx-abi.github.io/cxx-abi/abi.html#definitions)


OK with that change.


 bases, and we handle it here.  */
-  tree basetype = TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (*t)));
-  vec *vbases = CLASSTYPE_VBASECLASSES (current_class_type);
-  int i; tree binfo;
-  FOR_EACH_VEC_SAFE_ELT (vbases, i, binfo)
-   if (BINFO_TYPE (binfo) == basetype)
- break;
-  *t = build_base_path (PLUS_EXPR, TREE_OPERAND (*t, 0), binfo, true,
+  tree basetype = TREE_TYPE (*t);
+  *t = convert_to_base (TREE_OPERAND (*t, 0), basetype,
+   /*check_access=*/false, /*nonnull=*/true,
tf_warning_or_error);
  }
  
diff --git a/gcc/testsuite/g++.dg/cpp0x/nsdmi-virtual1a.C b/gcc/testsuite/g++.dg/cpp0x/nsdmi-virtual1a.C
new file mode 100644
index 000..fe647fe3cf7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/nsdmi-virtual1a.C
@@ -0,0 +1,29 @@
+// PR c++/80431
+// { dg-do run { target c++11 } }
+
+// A variant of nsdmi-virtual1.C that turns A from a virtual base of B to a base
+// of a virtual base of B, using the intermediate class D.
+
+struct A
+{
+  A(): i(42) { }
+  int i;
+  int f() { return i; }
+};
+
+struct D : A { int pad; };
+
+struct B : virtual D
+{
+  int j = i + f();
+  int k = A::i + A::f();
+};
+
+struct C: B { int pad; };
+
+int main()
+{
+  C c;
+  if (c.j != 84 || c.k != 84)
+__builtin_abort();
+}





[PATCH 2/2] Add TARGET_ASM_EMIT_GNU_PROPERTY_NOTE

2021-06-20 Thread H.J. Lu via Gcc-patches
Generate the GNU_PROPERTY_1_NEEDED_SINGLE_GLOBAL_DEFINITION marker for
-fsingle-global-definition to indicate that the object file requires
canonical function pointers and cannot be used with copy relocation.
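For reference, the note that emit_gnu_property writes (its assembler directives appear in the gnu-property.c hunk of this patch) can be laid out byte by byte in C. This is a sketch under stated assumptions: the value 0xb0008000 for GNU_PROPERTY_UINT32_OR_LO is taken from binutils' include/elf/common.h as we understand it, values are stored in host byte order, and build_gnu_property_note is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NT_GNU_PROPERTY_TYPE_0 5
/* Assumed value, as in binutils' include/elf/common.h.  */
#define GNU_PROPERTY_UINT32_OR_LO 0xb0008000U
#define GNU_PROPERTY_1_NEEDED GNU_PROPERTY_UINT32_OR_LO
#define GNU_PROPERTY_1_NEEDED_SINGLE_GLOBAL_DEFINITION (1U << 0)

/* Round N up to a multiple of A (A must be a power of two).  */
static size_t
align_up (size_t n, size_t a)
{
  return (n + a - 1) & ~(a - 1);
}

/* Store a 32-bit value at BUF + OFF in host byte order.  */
static void
put32 (uint8_t *buf, size_t off, uint32_t v)
{
  memcpy (buf + off, &v, sizeof v);
}

/* Write one property note into BUF: the 12-byte header (namesz, descsz,
   type), the name "GNU" with its NUL, then pr_type, pr_datasz and a 4-byte
   pr_data padded to ALIGN (4 for ILP32, 8 for LP64).  Returns the note
   size in bytes, or 0 if BUFSIZE is too small.  */
size_t
build_gnu_property_note (uint8_t *buf, size_t bufsize, size_t align,
                         uint32_t pr_type, uint32_t pr_data)
{
  const uint32_t name_size = 4;                       /* "GNU" plus NUL */
  size_t desc_start, desc_end;

  assert (align == 4 || align == 8);
  desc_start = align_up (12 + name_size, align);
  desc_end = align_up (desc_start + 12, align);       /* type, datasz, data */
  if (desc_end > bufsize)
    return 0;

  memset (buf, 0, desc_end);                          /* zero all padding */
  put32 (buf, 0, name_size);                          /* n_namesz */
  put32 (buf, 4, (uint32_t) (desc_end - desc_start)); /* n_descsz */
  put32 (buf, 8, NT_GNU_PROPERTY_TYPE_0);             /* n_type */
  memcpy (buf + 12, "GNU", name_size);                /* name with NUL */
  put32 (buf, desc_start, pr_type);                   /* pr_type */
  put32 (buf, desc_start + 4, 4);                     /* pr_datasz */
  put32 (buf, desc_start + 8, pr_data);               /* pr_data */
  return desc_end;
}
```

With 8-byte alignment (LP64) the descriptor is padded to 16 bytes and the whole note is 32 bytes; with 4-byte alignment they are 12 and 28 bytes.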

* configure.ac (HAVE_LD_SINGLE_GLOBAL_DEFINITION_SUPPORT): New.
Define to 1 if linker supports -z single-global-definition.
* output.h (emit_gnu_property): New.
(emit_gnu_property_note): Likewise.
* target.def (emit_gnu_property_note): Add a targetm.asm_out hook.
* toplev.c (compile_file): Call emit_gnu_property_note before
file_end.
* varasm.c (emit_gnu_property): New.
(emit_gnu_property_note): Likewise.
* config.in: Regenerated.
* configure: Likewise.
* doc/tm.texi: Likewise.
* config/i386/gnu-property.c (emit_gnu_property): Removed.
(TARGET_ASM_EMIT_GNU_PROPERTY_NOTE): New.
* doc/tm.texi.in: Add TARGET_ASM_EMIT_GNU_PROPERTY_NOTE.
---
 gcc/config.in  |  6 +
 gcc/config/i386/gnu-property.c | 31 --
 gcc/config/i386/i386.c |  2 ++
 gcc/configure  | 42 +++---
 gcc/configure.ac   | 20 +++
 gcc/doc/tm.texi|  5 
 gcc/doc/tm.texi.in |  2 ++
 gcc/output.h   |  2 ++
 gcc/target.def |  8 ++
 gcc/toplev.c   |  3 +++
 gcc/varasm.c   | 47 ++
 11 files changed, 134 insertions(+), 34 deletions(-)

diff --git a/gcc/config.in b/gcc/config.in
index 18e627141cc..ee2a94f3847 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -1690,6 +1690,12 @@
 #endif
 
 
+/* Define to 1 if your linker supports -z single-global-definition */
+#ifndef USED_FOR_TARGET
+#undef HAVE_LD_SINGLE_GLOBAL_DEFINITION_SUPPORT
+#endif
+
+
 /* Define if your linker supports the *_sol2 emulations. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_LD_SOL2_EMULATION
diff --git a/gcc/config/i386/gnu-property.c b/gcc/config/i386/gnu-property.c
index 4ba04403002..9fe8d00132e 100644
--- a/gcc/config/i386/gnu-property.c
+++ b/gcc/config/i386/gnu-property.c
@@ -24,37 +24,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "output.h"
 #include "linux-common.h"
 
-static void
-emit_gnu_property (unsigned int type, unsigned int data)
-{
-  int p2align = ptr_mode == SImode ? 2 : 3;
-
-  switch_to_section (get_section (".note.gnu.property",
- SECTION_NOTYPE, NULL));
-
-  ASM_OUTPUT_ALIGN (asm_out_file, p2align);
-  /* name length.  */
-  fprintf (asm_out_file, ASM_LONG "1f - 0f\n");
-  /* data length.  */
-  fprintf (asm_out_file, ASM_LONG "4f - 1f\n");
-  /* note type: NT_GNU_PROPERTY_TYPE_0.  */
-  fprintf (asm_out_file, ASM_LONG "5\n");
-  fprintf (asm_out_file, "0:\n");
-  /* vendor name: "GNU".  */
-  fprintf (asm_out_file, STRING_ASM_OP "\"GNU\"\n");
-  fprintf (asm_out_file, "1:\n");
-  ASM_OUTPUT_ALIGN (asm_out_file, p2align);
-  /* pr_type.  */
-  fprintf (asm_out_file, ASM_LONG "0x%x\n", type);
-  /* pr_datasz.  */
-  fprintf (asm_out_file, ASM_LONG "3f - 2f\n");
-  fprintf (asm_out_file, "2:\n");
-  fprintf (asm_out_file, ASM_LONG "0x%x\n", data);
-  fprintf (asm_out_file, "3:\n");
-  ASM_OUTPUT_ALIGN (asm_out_file, p2align);
-  fprintf (asm_out_file, "4:\n");
-}
-
 void
 file_end_indicate_exec_stack_and_gnu_property (void)
 {
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 9878c3126d0..b1268756322 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -24036,6 +24036,8 @@ ix86_run_selftests (void)
 #if !TARGET_MACHO && !TARGET_DLLIMPORT_DECL_ATTRIBUTES
 # undef TARGET_ASM_RELOC_RW_MASK
 # define TARGET_ASM_RELOC_RW_MASK ix86_reloc_rw_mask
+# undef TARGET_ASM_EMIT_GNU_PROPERTY_NOTE
+# define TARGET_ASM_EMIT_GNU_PROPERTY_NOTE emit_gnu_property_note
 #endif
 
 static bool ix86_libc_has_fast_function (int fcode ATTRIBUTE_UNUSED)
diff --git a/gcc/configure b/gcc/configure
index dd0194a57f4..3d53ce8cc9a 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -911,6 +911,7 @@ infodir
 docdir
 oldincludedir
 includedir
+runstatedir
 localstatedir
 sharedstatedir
 sysconfdir
@@ -1085,6 +1086,7 @@ datadir='${datarootdir}'
 sysconfdir='${prefix}/etc'
 sharedstatedir='${prefix}/com'
 localstatedir='${prefix}/var'
+runstatedir='${localstatedir}/run'
 includedir='${prefix}/include'
 oldincludedir='/usr/include'
 docdir='${datarootdir}/doc/${PACKAGE}'
@@ -1337,6 +1339,15 @@ do
   | -silent | --silent | --silen | --sile | --sil)
 silent=yes ;;
 
+  -runstatedir | --runstatedir | --runstatedi | --runstated \
+  | --runstate | --runstat | --runsta | --runst | --runs \
+  | --run | --ru | --r)
+ac_prev=runstatedir ;;
+  -runstatedir=* | --runstatedir=* | --runstatedi=* | --runstated=* \
+  | --runstate=* | --runstat=* | --runsta=* | --runst=* | --runs=* \
+  | --run=* | --ru=* | --r=*)
+runstatedir=$ac_optarg ;;
+
   -sbindir | --sbindir | 

[PATCH 0/2] Implement single global definition

2021-06-20 Thread H.J. Lu via Gcc-patches
On systems with copy relocation:
* A copy of the definition in a shared library is created in the
executable at run-time by ld.so.
* The copy is referenced by the executable and shared libraries.
* The executable can access the copy directly.

Issues are:
* Overhead of a copy, time and space, may be visible at run-time.
* Read-only data in the shared library becomes a read-write copy in the
executable at run-time.
* Local access to data with the STV_PROTECTED visibility in the shared
library must use the GOT.

On systems without function descriptors, function pointers vary depending
on where and how the functions are defined.
* If the function is defined in the executable, its function pointer can
be the address of the function body.
* If the function, including a function with STV_PROTECTED visibility,
is defined in a shared library, its function pointer can be the address
of the PLT entry in the executable or shared library.

Issues are:
* The address of the function body may not be usable as its function pointer.
* ld.so needs to search loaded shared libraries for the function pointer
of the function with STV_PROTECTED visibility.

Here is a proposal to remove copy relocation and use canonical function
pointer:

1. Accesses to undefined symbols, including in PIE and non-PIE, must
use the GOT.
  a. The linker may optimize out GOT access if the data is defined in PIE
  or non-PIE.
2. Read-only data in the shared library remains read-only at run-time.
3. Address of global data with the STV_PROTECTED visibility in the shared
library is the address of data body.
  a. Can use IP-relative access.
  b. May need GOT without IP-relative access.
4. For systems without function descriptors,
  a. All global function pointers of undefined functions in PIE and
  non-PIE must use the GOT.  The linker may optimize out GOT access if
  the function is defined in PIE or non-PIE.
  b. Function pointer of functions with the STV_PROTECTED visibility in
  executable and shared library is the address of function body.
   i. Can use IP-relative access.
   ii. May need GOT without IP-relative access.
   iii. Branches to undefined functions may use PLT.
5. Single global definition marker:

Add GNU_PROPERTY_1_NEEDED:

#define GNU_PROPERTY_1_NEEDED GNU_PROPERTY_UINT32_OR_LO

to indicate the needed properties by the object file.

Add GNU_PROPERTY_1_NEEDED_SINGLE_GLOBAL_DEFINITION:

#define GNU_PROPERTY_1_NEEDED_SINGLE_GLOBAL_DEFINITION (1U << 0)

to indicate that the object file requires canonical function pointers and
cannot be used with copy relocation.

  a. Copy relocation should be disallowed at link-time and run-time.
  b. Canonical function pointers are required at link-time and run-time.
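Because GNU_PROPERTY_1_NEEDED lies in the GNU_PROPERTY_UINT32_OR range, a linker would be expected to merge it across input objects with bitwise OR, treating an object without the note as 0, so one marked object marks the whole output. A minimal sketch of that merge and of check 5a follows; merge_needed and copy_reloc_allowed are hypothetical names, not binutils or ld.so functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GNU_PROPERTY_1_NEEDED_SINGLE_GLOBAL_DEFINITION (1U << 0)

/* Merge the GNU_PROPERTY_1_NEEDED values of N input objects with bitwise
   OR: a bit set by any input is set in the output.  An input without the
   note should be passed as 0.  */
uint32_t
merge_needed (const uint32_t *inputs, size_t n)
{
  uint32_t merged = 0;

  assert (inputs != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    merged |= inputs[i];
  return merged;
}

/* Hypothetical policy check for item 5a: once any input requires a single
   global definition, copy relocations must be rejected.  */
bool
copy_reloc_allowed (uint32_t merged_needed)
{
  return (merged_needed & GNU_PROPERTY_1_NEEDED_SINGLE_GLOBAL_DEFINITION) == 0;
}
```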

Add a compiler option, -fsingle-global-definition:

1. Always use the GOT to access undefined symbols, including in PIE and
non-PIE.  This is safe to do and does not break the ABI.
2. In executables and shared libraries, for symbols with the STV_PROTECTED
visibility:
  a. The address of a data symbol is the address of the data body.
  b. For systems without function descriptors, the function pointer is
  the address of the function body.
These break the ABI and resulting shared libraries may not be compatible
with executables which are not compiled with -fsingle-global-definition.
3. Generate a single global definition marker in relocatable objects.

H.J. Lu (2):
  Add -fsingle-global-definition
  Add TARGET_ASM_EMIT_GNU_PROPERTY_NOTE

 gcc/common.opt|  4 ++
 gcc/config.in |  6 +++
 gcc/config/i386/gnu-property.c| 31 -
 gcc/config/i386/i386-protos.h |  2 +-
 gcc/config/i386/i386.c| 52 --
 gcc/configure | 42 --
 gcc/configure.ac  | 20 +
 gcc/doc/invoke.texi   |  8 +++-
 gcc/doc/tm.texi   |  5 +++
 gcc/doc/tm.texi.in|  2 +
 gcc/output.h  |  2 +
 gcc/target.def|  8 
 gcc/testsuite/g++.dg/pr35513-1.C  | 25 +++
 gcc/testsuite/g++.dg/pr35513-2.C  | 53 +++
 gcc/testsuite/gcc.target/i386/pr35513-1.c | 16 +++
 gcc/testsuite/gcc.target/i386/pr35513-2.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr35513-3.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr35513-4.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr35513-5.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr35513-6.c | 14 ++
 gcc/testsuite/gcc.target/i386/pr35513-7.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr35513-8.c | 41 ++
 gcc/toplev.c  |  3 ++
 gcc/varasm.c  | 47 
 24 files changed, 406 insertions(+), 50 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pr35513-1.C
 create mode 100644 gcc/testsuite/g++.dg/pr35513-2.C
 create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-1.c
 create mode 100644 

[PATCH 1/2] Add -fsingle-global-definition

2021-06-20 Thread H.J. Lu via Gcc-patches
1. Generate a single global definition marker in relocatable objects.
   a. Always use the GOT to access undefined data and function symbols,
  including in PIE and non-PIE.  This avoids copy relocations
  in executables.
   b. This is compatible with existing executables and shared libraries.
2. In executables and shared libraries, bind symbols with the STV_PROTECTED
   visibility locally:
   a. The address of a data symbol is the address of the data body.
   b. For systems without function descriptors, the function pointer is
  the address of the function body.
   c. The resulting shared libraries may not be compatible with
  executables which have copy relocations on protected symbols.
3. Update asm_preferred_eh_data_format to properly select EH encoding
format with -fsingle-global-definition.
4. Add ix86_reloc_rw_mask for TARGET_ASM_RELOC_RW_MASK to avoid copy
relocation with -fsingle-global-definition.
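Item 3 concerns the DWARF EH pointer encodings, which are bit combinations of DW_EH_PE_* values. The sketch below assumes the standard constants (DW_EH_PE_pcrel 0x10, DW_EH_PE_indirect 0x80, DW_EH_PE_sdata4 0x0b, DW_EH_PE_sdata8 0x0c) and composes a PC-relative encoding roughly the way GCC's PIC path does, as we understand it. eh_pcrel_encoding is a hypothetical helper, not the patch's code; its offset-width argument corresponds to the choice the ChangeLog says now depends on ptr_mode rather than TARGET_64BIT.

```c
#include <assert.h>

/* Standard DWARF EH pointer-encoding values (assumed).  */
enum
{
  DW_EH_PE_sdata4 = 0x0b,
  DW_EH_PE_sdata8 = 0x0c,
  DW_EH_PE_pcrel = 0x10,
  DW_EH_PE_indirect = 0x80
};

/* Compose a PC-relative encoding.  GLOBAL selects an indirect reference
   through a data word; OFFSET_SIZE (4 or 8) picks the signed offset width.  */
unsigned
eh_pcrel_encoding (int global, unsigned offset_size)
{
  assert (offset_size == 4 || offset_size == 8);
  return (global ? DW_EH_PE_indirect : 0)
         | DW_EH_PE_pcrel
         | (offset_size == 4 ? DW_EH_PE_sdata4 : DW_EH_PE_sdata8);
}
```

For a global symbol with 4-byte offsets this yields 0x9b (indirect | pcrel | sdata4): the EH data refers to a nearby data word holding the address rather than to the symbol directly.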

gcc/

PR target/35513
PR target/100593
* common.opt: Add -fsingle-global-definition.
* config/i386/i386-protos.h (ix86_force_load_from_GOT_p): Add a
bool argument.
* config/i386/i386.c (ix86_force_load_from_GOT_p): Add a bool
argument to indicate call operand.  Force non-call load
from GOT for -fsingle-global-definition.
(legitimate_pic_address_disp_p): Avoid copy relocation in PIE
for -fsingle-global-definition.
(ix86_print_operand): Pass true to ix86_force_load_from_GOT_p
for call operand.
(asm_preferred_eh_data_format): Use PC-relative format for
-fsingle-global-definition to avoid copy relocation.  Check
ptr_mode instead of TARGET_64BIT when selecting DW_EH_PE_sdata4.
(ix86_binds_local_p): Don't treat protected data as extern and
avoid copy relocation on common symbol.
(ix86_reloc_rw_mask): New to avoid copy relocation for
-fsingle-global-definition.
(TARGET_ASM_RELOC_RW_MASK): New.
* doc/invoke.texi: Document -fsingle-global-definition.

gcc/testsuite/

PR target/35513
PR target/100593
* g++.dg/pr35513-1.C: New file.
* g++.dg/pr35513-2.C: Likewise.
* gcc.target/i386/pr35513-1.c: Likewise.
* gcc.target/i386/pr35513-2.c: Likewise.
* gcc.target/i386/pr35513-3.c: Likewise.
* gcc.target/i386/pr35513-4.c: Likewise.
* gcc.target/i386/pr35513-5.c: Likewise.
* gcc.target/i386/pr35513-6.c: Likewise.
* gcc.target/i386/pr35513-7.c: Likewise.
* gcc.target/i386/pr35513-8.c: Likewise.
---
 gcc/common.opt|  4 ++
 gcc/config/i386/i386-protos.h |  2 +-
 gcc/config/i386/i386.c| 50 +++--
 gcc/doc/invoke.texi   |  8 +++-
 gcc/testsuite/g++.dg/pr35513-1.C  | 25 +++
 gcc/testsuite/g++.dg/pr35513-2.C  | 53 +++
 gcc/testsuite/gcc.target/i386/pr35513-1.c | 16 +++
 gcc/testsuite/gcc.target/i386/pr35513-2.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr35513-3.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr35513-4.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr35513-5.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr35513-6.c | 14 ++
 gcc/testsuite/gcc.target/i386/pr35513-7.c | 15 +++
 gcc/testsuite/gcc.target/i386/pr35513-8.c | 41 ++
 14 files changed, 272 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pr35513-1.C
 create mode 100644 gcc/testsuite/g++.dg/pr35513-2.C
 create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-8.c

diff --git a/gcc/common.opt b/gcc/common.opt
index a1353e06bdc..b1cb53bb780 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2579,6 +2579,10 @@ fsigned-zeros
 Common Var(flag_signed_zeros) Init(1) Optimization SetByCombined
 Disable floating point optimizations that ignore the IEEE signedness of zero.
 
+fsingle-global-definition
+Common Var(flag_single_global_definition) Optimization
+Use GOT to access external symbols and make access to protected symbols local.
+
 fsingle-precision-constant
 Common Var(flag_single_precision_constant) Optimization
 Convert floating point constants to single precision constants.
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index e6ac9390777..30f75b9900b 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -77,7 +77,7 @@ extern bool ix86_expand_cmpstrn_or_cmpmem (rtx, rtx, rtx, rtx, rtx, bool);
 extern bool 

[PATCH] doc/lto.texi: List slim object format as the default

2021-06-20 Thread Dimitar Dimitrov
Slim LTO object files have been the default for quite a while, since:
  commit e9f67e625c2a4225a7169d7220dcb85b6fdd7ca9
  Author: Jan Hubicka 
  common.opt (ffat-lto-objects): Disable by default.

That commit did not update lto.texi, so do it now.

gcc/ChangeLog:

* doc/lto.texi (Design Overview): Update that slim objects are
the default.

Signed-off-by: Dimitar Dimitrov 
---
 gcc/doc/lto.texi | 23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/gcc/doc/lto.texi b/gcc/doc/lto.texi
index 1f55216328a..755258ccb2b 100644
--- a/gcc/doc/lto.texi
+++ b/gcc/doc/lto.texi
@@ -36,11 +36,16 @@ bytecode representation of GIMPLE that is emitted in special sections
 of @code{.o} files.  Currently, LTO support is enabled in most
 ELF-based systems, as well as darwin, cygwin and mingw systems.
 
-Since GIMPLE bytecode is saved alongside final object code, object
-files generated with LTO support are larger than regular object files.
-This ``fat'' object format makes it easy to integrate LTO into
-existing build systems, as one can, for instance, produce archives of
-the files.  Additionally, one might be able to ship one set of fat
+Object files generated with LTO support contain only GIMPLE bytecode.
+Such objects are called ``slim'', and they require that tools like
+@code{ar} and @code{nm} understand symbol tables of LTO sections.  These tools
+have been extended to use the plugin infrastructure, so GCC can support
+``slim'' objects consisting of the intermediate code alone.
+
+GIMPLE bytecode could also be saved alongside final object code if the
+@option{-ffat-lto-objects} option is passed.  But this would make the
+object files generated with LTO support larger than regular object
+files.  This ``fat'' object format makes it possible to ship one set of fat
 objects which could be used both for development and the production of
 optimized builds.  A, perhaps surprising, side effect of this feature
 is that any mistake in the toolchain leads to LTO information not
@@ -49,14 +54,6 @@ This is both an advantage, as the system is more robust, and a
 disadvantage, as the user is not informed that the optimization has
 been disabled.
 
-The current implementation only produces ``fat'' objects, effectively
-doubling compilation time and increasing file sizes up to 5x the
-original size.  This hides the problem that some tools, such as
-@code{ar} and @code{nm}, need to understand symbol tables of LTO
-sections.  These tools were extended to use the plugin infrastructure,
-and with these problems solved, GCC will also support ``slim'' objects
-consisting of the intermediate code alone.
-
 At the highest level, LTO splits the compiler in two.  The first half
 (the ``writer'') produces a streaming representation of all the
 internal data structures needed to optimize and generate code.  This
-- 
2.31.1



[x86_64 PATCH] PR target/11877: Use xor to write zero to memory with -Os

2021-06-20 Thread Roger Sayle

The following patch attempts to resolve PR target/11877 (without
triggering PR target/23102).  On x86_64, writing an SImode or DImode zero
to memory uses an instruction encoding that is larger than first
clearing a register (using xor) then writing that to memory.  Hence,
after reload, the peephole2 pass can determine if there's a suitable
free register, and if so, use that to shrink the code size with -Os.

To improve code size, and avoid inserting a large number of xor
instructions (PR target/23102), this patch makes use of peephole2's
efficient pattern matching to use a single temporary for a run of
consecutive writes.  In theory, one could do better still with a
new target-specific pass, gated on -Os, to shrink these instructions
(like stv), but that's probably overkill for the little remaining
space savings.
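The size arithmetic behind the transformation can be checked against the standard x86-64 encodings. This sketch assumes base-register addressing with no displacement, %eax/%rax as the scratch register (no extra REX byte for it), and that the four-store peephole matches before the two- and one-store ones; xor_count and bytes_saved are illustrative names, not part of the patch.

```c
#include <assert.h>

/* Byte sizes of the encodings involved, for an address like (%rdi).  */
enum
{
  MOV_IMM32_TO_MEM = 6,   /* movl $0, (%rdi):   c7 07 00 00 00 00 */
  MOV_REG32_TO_MEM = 2,   /* movl %eax, (%rdi): 89 07 */
  XOR_REG32 = 2,          /* xorl %eax, %eax:   31 c0 */
  REX_W = 1               /* extra 0x48 prefix for the DImode (movq) forms */
};

/* Number of xor instructions emitted for a run of N zero stores when the
   runs are covered greedily by the 4-, 2- and 1-store peepholes.  */
int
xor_count (int n)
{
  assert (n >= 0);
  return n / 4 + (n % 4) / 2 + n % 2;
}

/* Bytes saved for N consecutive zero stores.  WIDE is nonzero for DImode
   stores (movq), zero for SImode (movl).  The clearing xor is the 32-bit
   xorl in both cases, since it zero-extends into %rax.  */
int
bytes_saved (int n, int wide)
{
  int rex = wide ? REX_W : 0;
  int before = n * (MOV_IMM32_TO_MEM + rex);
  int after = xor_count (n) * XOR_REG32 + n * (MOV_REG32_TO_MEM + rex);
  return before - after;
}
```

Under these assumptions a single store shrinks by 2 bytes in either mode, and a run of four SImode stores shrinks by 14 bytes (24 bytes down to 10). A displacement adds the same bytes to both forms, so it does not change the saving.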

Evaluating this patch on the CSiBE benchmark (v2.1.1) results in a
0.26% code size improvement (3715273 bytes down to 3705477) on x86_64
with -Os [saving 1 byte every 400].  549 of 894 tests improve, two
tests grow larger.  Analysis of these 2 pathological cases reveals
that although peephole2's match_scratch prefers to use a call-clobbered
register (to avoid requiring a new stack frame), very rarely this
interacts with GCC's shrink wrapping optimization, which may previously
have avoided saving/restoring a call clobbered register, such as %eax,
in the calling function.

This patch has been tested on x86_64-pc-linux-gnu with a make bootstrap
and make -k check with no new failures.

Ok for mainline?


2021-06-20  Roger Sayle  

gcc/ChangeLog
PR target/11877
* config/i386/i386.md: New define_peephole2s to shrink writing
1, 2 or 4 consecutive zeros to memory when optimizing for size.

gcc/testsuite/ChangeLog
PR target/11877
* gcc.target/i386/pr11877.c: New test case.

--
Roger Sayle
NextMove Software
Cambridge, UK

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 48532eb..2333261 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -19357,6 +19357,42 @@
   ix86_expand_clear (operands[1]);
 })
 
+;; When optimizing for size, zeroing memory should use a register.
+(define_peephole2
+  [(match_scratch:SWI48 0 "r")
+   (set (match_operand:SWI48 1 "memory_operand" "") (const_int 0))
+   (set (match_operand:SWI48 2 "memory_operand" "") (const_int 0))
+   (set (match_operand:SWI48 3 "memory_operand" "") (const_int 0))
+   (set (match_operand:SWI48 4 "memory_operand" "") (const_int 0))]
+  "optimize_insn_for_size_p () && peep2_regno_dead_p (0, FLAGS_REG)"
+  [(set (match_dup 1) (match_dup 0))
+   (set (match_dup 2) (match_dup 0))
+   (set (match_dup 3) (match_dup 0))
+   (set (match_dup 4) (match_dup 0))]
+{
+  ix86_expand_clear (operands[0]);
+})
+
+(define_peephole2
+  [(match_scratch:SWI48 0 "r")
+   (set (match_operand:SWI48 1 "memory_operand" "") (const_int 0))
+   (set (match_operand:SWI48 2 "memory_operand" "") (const_int 0))]
+  "optimize_insn_for_size_p () && peep2_regno_dead_p (0, FLAGS_REG)"
+  [(set (match_dup 1) (match_dup 0))
+   (set (match_dup 2) (match_dup 0))]
+{
+  ix86_expand_clear (operands[0]);
+})
+
+(define_peephole2
+  [(match_scratch:SWI48 0 "r")
+   (set (match_operand:SWI48 1 "memory_operand" "") (const_int 0))]
+  "optimize_insn_for_size_p () && peep2_regno_dead_p (0, FLAGS_REG)"
+  [(set (match_dup 1) (match_dup 0))]
+{
+  ix86_expand_clear (operands[0]);
+})
+
 ;; Reload dislikes loading constants directly into class_likely_spilled
 ;; hard registers.  Try to tidy things up here.
 (define_peephole2
/* PR target/11877 */
/* { dg-do compile } */
/* { dg-options "-Os" } */

void foo (long long *p)
{
  *p = 0;
}

void bar (int *p)
{
  *p = 0;
}

/* { dg-final { scan-assembler-times "xorl\[ \t\]" 2 } } */
/* { dg-final { scan-assembler-not "\\\$0," } } */


Re: [PATCH] Modula-2 into the GCC tree on master

2021-06-20 Thread Gaius Mulley via Gcc-patches
Segher Boessenkool  writes:

> Hi!
>
> On Fri, Jun 18, 2021 at 10:00:40PM +0100, Gaius Mulley wrote:
>> Segher Boessenkool  writes:
>> > On Thu, Jun 17, 2021 at 11:26:41PM +0100, Gaius Mulley via Gcc-patches wrote:
>> >> Debian Stretch using make -j 4, x86_64 GNU/Linux Debian Stretch built
>> >> using make -j 24 and also under x86_64 GNU/Linux Debian Buster using
>> >> make -j 4.
>> >
>> > I am building it on powerpc64-linux (-m32,-m64) and powerpc64le-linux
>> > currently.  (All CentOS 7 fwiw).
>>
>> Excellent, the more varieties the better.  I'm eagerly awaiting a RISC-V
>> motherboard, which might also be interesting.
>
> I needed a few fixes to get it to build, they are in my branch
> (https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=users/segher/heads/gm2)
>
> The files gm2-libs/getopt.def and gm2-libs/GetOpt.def have filenames
> that differ only in case; this is rejected by the scripts that we run on
> the Git server.  I renamed the former to cgetopt.def for now, but of
> course more changes are needed for this to work at all.

Hi Segher,

Ah yes, thanks for spotting this.  I recall I had a similar issue with
SYSTEM.def.  I will change getopt.def to cgetopt.def.

>> > It does not want to build gm2tools, haven't investigated that yet
>> > either.
>
> Not yet :-)
>
>> > Will report results later.
>
> powerpc64-linux now is building, and is running the testsuite.  My
> powerpc64le-linux build used --enable-languages=all, but Ada fails to
> build, so I'll redo that without Ada.
>
> Gaius, could you look through the two patches I did to get the build to
> work, see if those are correct or if something better needs to be done?
>
> 
> $(subdir) is an absolute path for me, so ../$(subdir) cannot work.

this looks sensible - I'll also test and apply this on a few machines.

> 
> Maybe your texinfo is less picky than mine, I use an older one (5.1)?

(Debian buster texinfo is on 6.5.0 and Debian stretch is on 6.3.0).  But
the up node was inconsistent :-), again thanks for these up node fixes.

I will rebuild on aarch64 (Debian stretch), x86_64 Debian stretch and
x86_64 Debian buster and make source changes for cgetopt.def etc.


regards,
Gaius