Re: [PATCH, libstdc++ testsuite] Correct path to libatomic

2021-04-23 Thread Jeff Law via Gcc-patches



On 4/23/2021 6:54 PM, David Edelsohn via Gcc-patches wrote:

Some ports require libatomic for atomic operations, at least for some
data types and widths.  The libstdc++ testsuite previously was updated
to link against libatomic, but the search path was hard-coded to
something that is not always correct, and the shared library search
path was not set.

The search path was hard-coded to the expected location of the
libatomic build directory relative to the libstdc++ testsuite
directory, but if one uses parallelism when invoking the libstdc++
testsuite, the tests are run in the "normalXX" sub-directories, for
which the hard-coded search path is incorrect. The path also is
incorrect for alternative multilib and tool options.

This patch adopts the logic from gcc/testsuite/lib/atomic-dg.exp to
search for the library and adds the logic to the libstdc++ testsuite
libatomic search path code.  Previously the libstdc++ testsuite atomic
tests failed depending on the build configuration and if a build of
libatomic was installed in the default search path.

Bootstrapped on powerpc-ibm-aix7.2.3.0.

Okay to install?

Thanks, David

* testsuite/lib/dg-options.exp (atomic_link_flags): New.
(add_options_for_libatomic): Use atomic_link_flags.


OK

jeff



Re: [PATCH] Add dg-final option-based target selectors

2021-04-23 Thread Jeff Law via Gcc-patches



On 4/19/2021 1:28 PM, Richard Sandiford via Gcc-patches wrote:

This patch adds target selectors of the form:

   { any-opts "opt1" ... "optn" }
   { no-opts "opt1" ... "optn" }

for skipping or xfailing tests based on compiler options.  It only
works for dg-final selectors.

The patch then uses no-opts to exclude -O0 and (sometimes) -Og from
some guality.exp xfails.  AFAICT (based on gcc-testresults) these
tests pass for those options for all targets.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
If so, OK now, or should it wait for GCC 12?

Richard


gcc/
* doc/sourcebuild.texi: Document no-opts and any-opts target
selectors.

gcc/testsuite/
* lib/target-supports-dg.exp (selector_expression): Handle any-opts
and no-opts.
* gcc.dg/guality/pr41353-1.c: Exclude -O0 from xfail.
* gcc.dg/guality/pr59776.c: Likewise.
* gcc.dg/guality/pr54970.c: Likewise -O0 and -Og.


OK for the trunk.

jeff




Re: [RFC] Run pass_sink_code once more after ivopts/fre

2021-04-23 Thread Jeff Law via Gcc-patches



On 4/14/2021 12:41 AM, Richard Biener wrote:

On Wed, 14 Apr 2021, Xionghu Luo wrote:


Hi,

On 2021/3/26 15:35, Xionghu Luo via Gcc-patches wrote:

Also we already have a sinking pass on RTL which even computes
a proper PRE on the reverse graph - -fgcse-sm aka store-motion.c.
I'm not sure whether this deals with non-stores but the
LCM machinery definitely can handle arbitrary expressions.  I wonder
if it makes more sense to extend this rather than inventing a new
ad-hoc sinking pass?

  From the literal, my pass doesn't handle or process store instructions
like store-motion..  Thanks, will check it.

Store motion only processes store instructions with data flow equations,
generating 4 inputs(st_kill, st_avloc, st_antloc, st_transp) and solve it
by Lazy Code Motion API(5 DF compute call) with 2 outputs (st_delete_map,
st_insert_map) globally, each store place is independently represented in
the input bitmap vectors. Output is which should be delete and where to
insert, current code does what you said "emit copies to a new pseudo at
the original insn location and use it in followed bb", actually it is
"store replacement" instead of "store move", why not save one pseudo by
moving the store instruction to target edge directly?

It probably simply saves the pass from doing analysis whether the
stored value is clobbered on the sinking path, enabling more store
sinking.  For stores that might be even beneficial, for non-stores
it becomes more of a cost issue, yes.


There are many differences between the newly added rtl-sink pass and
store-motion pass.
1. Store motion moves only store instructions, rtl-sink ignores store
instructions.
2. Store motion is a global DF problem solving, rtl-sink only processes
loop header reversely with dependency check in loop, take the below RTL
as example,
"#538,#235,#234,#233" will all be sunk from bb 35 to bb 37 by rtl-sink,
but it moves #538 first, then #235; there is a strong dependency here. It
doesn't seem to be like the LCM framework, which could solve everything and
do the delete-insert in one iteration.

So my question was whether we want to do both within the LCM store
sinking framework.  The LCM dataflow is also used by RTL PRE which
handles both loads and non-loads so in principle it should be able
to handle stores and non-stores for the sinking case (PRE on the
reverse CFG).

A global dataflow is more powerful than any local ad-hoc method.


IIRC you can use LCM on stores like this, but you have to run it 
independently on each store to pick up the secondary effects.   I 
believe the basic concepts are discussed in Morgan's book.   That may 
turn out to be too expensive in practice -- I've never tried it though.


jeff




[PATCH, libstdc++ testsuite] Correct path to libatomic

2021-04-23 Thread David Edelsohn via Gcc-patches
Some ports require libatomic for atomic operations, at least for some
data types and widths.  The libstdc++ testsuite previously was updated
to link against libatomic, but the search path was hard-coded to
something that is not always correct, and the shared library search
path was not set.

The search path was hard-coded to the expected location of the
libatomic build directory relative to the libstdc++ testsuite
directory, but if one uses parallelism when invoking the libstdc++
testsuite, the tests are run in the "normalXX" sub-directories, for
which the hard-coded search path is incorrect. The path also is
incorrect for alternative multilib and tool options.

This patch adopts the logic from gcc/testsuite/lib/atomic-dg.exp to
search for the library and adds the logic to the libstdc++ testsuite
libatomic search path code.  Previously the libstdc++ testsuite atomic
tests failed depending on the build configuration and if a build of
libatomic was installed in the default search path.

Bootstrapped on powerpc-ibm-aix7.2.3.0.

Okay to install?

Thanks, David

* testsuite/lib/dg-options.exp (atomic_link_flags): New.
(add_options_for_libatomic): Use atomic_link_flags.

--- a/libstdc++-v3/testsuite/lib/dg-options.exp
+++ b/libstdc++-v3/testsuite/lib/dg-options.exp
@@ -260,13 +260,58 @@ proc add_options_for_net_ts { flags } {
 # Add to FLAGS all the target-specific flags to link to libatomic,
 # if required for atomics on pointers and 64-bit types.

+proc atomic_link_flags { paths } {
+global srcdir
+global ld_library_path
+global shlib_ext
+
+set gccpath ${paths}
+set flags ""
+
+set shlib_ext [get_shlib_extension]
+
+if { $gccpath != "" } {
+  if { [file exists "${gccpath}/libatomic/.libs/libatomic.a"]
+   || [file exists "${gccpath}/libatomic/.libs/libatomic.${shlib_ext}"] } {
+  append flags " -B${gccpath}/libatomic/ "
+  append flags " -L${gccpath}/libatomic/.libs"
+  append ld_library_path ":${gccpath}/libatomic/.libs"
+  }
+} else {
+  global tool_root_dir
+
+  set libatomic [lookfor_file ${tool_root_dir} libatomic]
+  if { $libatomic != "" } {
+  append flags "-L${libatomic} "
+  append ld_library_path ":${libatomic}"
+  }
+}
+
+set_ld_library_path_env_vars
+
+return "$flags"
+}
+
 proc add_options_for_libatomic { flags } {
 if { [istarget hppa*-*-hpux*]
 || ([istarget powerpc*-*-*] && [check_effective_target_ilp32])
 || [istarget riscv*-*-*]
 || ([istarget sparc*-*-linux-gnu] && [check_effective_target_ilp32])
} {
-   return "$flags -L../../libatomic/.libs -latomic"
+   global TOOL_OPTIONS
+
+   set link_flags ""
+   if ![is_remote host] {
+   if [info exists TOOL_OPTIONS] {
+   set link_flags "[atomic_link_flags [get_multilibs ${TOOL_OPTIONS}]]"
+   } else {
+   set link_flags "[atomic_link_flags [get_multilibs]]"
+   }
+   }
+
+   append link_flags " -latomic "
+
+   return "$flags $link_flags"
 }
 return $flags
 }


Re: [PATCH] Fix logic error in 32-bit trampolines, PR target/98952

2021-04-23 Thread Segher Boessenkool
On Fri, Apr 23, 2021 at 06:24:07PM -0400, Michael Meissner wrote:
> On Thu, Apr 22, 2021 at 05:56:32PM -0500, Segher Boessenkool wrote:
> > As Will says, it looks like the ELFv2 version has the same bug.  Please
> > fix that the same way.
> 
> Yes it has the same bug.  However in practice it would never be hit, since this
> bug is 32-bit, and we only build 64-bit systems with ELF v2.  I did fix it.

Hrm, in that case, why do we have that code at all?!

> > Okay for trunk.  Okay for backport to 11 when that branch opens again.
> > Does this need more backports?  (Those should follow after 11 of
> > course).
> 
> Bill mentioned we may want to backport this to earlier branches before they are
> frozen.  Tulio, are backports to earlier revisions important?

Well, the bug has been there since the original commit to (then)
tramp.asm, which was 25 years ago, and only now people noticed ;-)

We should have a backport to GCC 11 at least.  Older is up to you (and
Tulio).


Segher


Re: [RFC] bpf.2: Use standard types and attributes

2021-04-23 Thread Alexei Starovoitov via Gcc-patches
On Fri, Apr 23, 2021 at 4:15 PM Alejandro Colomar
 wrote:
>
> Some manual pages are already using C99 syntax for integral
> types 'uint32_t', but some aren't.  There are some using kernel
> syntax '__u32'.  Fix those.
>
> Some pages also document attributes, using GNU syntax
> '__attribute__((xxx))'.  Update those to use the shorter and more
> portable C2x syntax, which hasn't been standardized yet, but is
> already implemented in GCC, and available through either --std=c2x
> or any of the --std=gnu... options.
>
> Signed-off-by: Alejandro Colomar 
> ---
>  man2/bpf.2 | 47 +++
>  1 file changed, 23 insertions(+), 24 deletions(-)
>
> diff --git a/man2/bpf.2 b/man2/bpf.2
> index 6e1ffa198..204f01bfc 100644
> --- a/man2/bpf.2
> +++ b/man2/bpf.2
> @@ -188,39 +188,38 @@ commands:
>  .EX
>  union bpf_attr {
>  struct {/* Used by BPF_MAP_CREATE */
> -__u32 map_type;
> -__u32 key_size;/* size of key in bytes */
> -__u32 value_size;  /* size of value in bytes */
> -__u32 max_entries; /* maximum number of entries
> -  in a map */
> +uint32_tmap_type;
> +uint32_tkey_size;/* size of key in bytes */
> +uint32_tvalue_size;  /* size of value in bytes */
> +uint32_tmax_entries; /* maximum number of entries
> +in a map */

Nack.
The man page should describe the kernel api the way it is in .h file.


[RFC] bpf.2: Use standard types and attributes

2021-04-23 Thread Alejandro Colomar via Gcc-patches
Some manual pages are already using C99 syntax for integral
types 'uint32_t', but some aren't.  There are some using kernel
syntax '__u32'.  Fix those.

Some pages also document attributes, using GNU syntax
'__attribute__((xxx))'.  Update those to use the shorter and more
portable C2x syntax, which hasn't been standardized yet, but is
already implemented in GCC, and available through either --std=c2x
or any of the --std=gnu... options.

Signed-off-by: Alejandro Colomar 
---
 man2/bpf.2 | 47 +++
 1 file changed, 23 insertions(+), 24 deletions(-)

diff --git a/man2/bpf.2 b/man2/bpf.2
index 6e1ffa198..204f01bfc 100644
--- a/man2/bpf.2
+++ b/man2/bpf.2
@@ -188,39 +188,38 @@ commands:
 .EX
 union bpf_attr {
 struct {/* Used by BPF_MAP_CREATE */
-__u32 map_type;
-__u32 key_size;/* size of key in bytes */
-__u32 value_size;  /* size of value in bytes */
-__u32 max_entries; /* maximum number of entries
-  in a map */
+uint32_tmap_type;
+uint32_tkey_size;/* size of key in bytes */
+uint32_tvalue_size;  /* size of value in bytes */
+uint32_tmax_entries; /* maximum number of entries
+in a map */
 };
 
-struct {/* Used by BPF_MAP_*_ELEM and BPF_MAP_GET_NEXT_KEY
-   commands */
-__u32 map_fd;
-__aligned_u64 key;
+struct {/* Used by BPF_MAP_*_ELEM and BPF_MAP_GET_NEXT_KEY commands */
+uint32_t map_fd;
+uint64_t [[gnu::aligned(8)]] key;
 union {
-__aligned_u64 value;
-__aligned_u64 next_key;
+uint64_t [[gnu::aligned(8)]] value;
+uint64_t [[gnu::aligned(8)]] next_key;
 };
-__u64 flags;
+uint64_t flags;
 };
 
 struct {/* Used by BPF_PROG_LOAD */
-__u32 prog_type;
-__u32 insn_cnt;
-__aligned_u64 insns;  /* \(aqconst struct bpf_insn *\(aq */
-__aligned_u64 license;/* \(aqconst char *\(aq */
-__u32 log_level;  /* verbosity level of verifier */
-__u32 log_size;   /* size of user buffer */
-__aligned_u64 log_buf;/* user supplied \(aqchar *\(aq
- buffer */
-__u32 kern_version;
-  /* checked when prog_type=kprobe
- (since Linux 4.1) */
+uint32_t prog_type;
+uint32_t insn_cnt;
+uint64_t [[gnu::aligned(8)]] insns; /* \(aqconst struct bpf_insn *\(aq */
+uint64_t [[gnu::aligned(8)]] license;   /* \(aqconst char *\(aq */
+uint32_t log_level; /* verbosity level of verifier */
+uint32_t log_size;  /* size of user buffer */
+uint64_t [[gnu::aligned(8)]] log_buf;   /* user supplied \(aqchar *\(aq
+   buffer */
+uint32_t kern_version;
+/* checked when prog_type=kprobe
+   (since Linux 4.1) */
 .\" commit 2541517c32be2531e0da59dfd7efc1ce844644f5
 };
-} __attribute__((aligned(8)));
+} [[gnu::aligned(8)]];
 .EE
 .in
 .\"
-- 
2.31.0



Re: [PATCH] Fix logic error in 32-bit trampolines, PR target/98952

2021-04-23 Thread Michael Meissner via Gcc-patches
On Thu, Apr 22, 2021 at 05:56:32PM -0500, Segher Boessenkool wrote:
> On Fri, Apr 09, 2021 at 05:09:07PM -0400, Michael Meissner wrote:
> > Fix logic error in 32-bit trampolines, PR target/98952.
> > 
> > The test in the PowerPC 32-bit trampoline support is backwards.  It aborts
> > if the trampoline size is greater than the expected size.  It should abort
> > when the trampoline size is less than the expected size.
> 
> > PR target/98952
> > * config/rs6000/tramp.S (__trampoline_setup): Fix trampoline size
> > comparison in 32-bit.
> 
> > --- a/libgcc/config/rs6000/tramp.S
> > +++ b/libgcc/config/rs6000/tramp.S
> > @@ -64,8 +64,7 @@ FUNC_START(__trampoline_setup)
> >  mflr   r11
> >  addi   r7,r11,trampoline_initial-4-.LCF0 /* trampoline address -4 */
> >  
> > -   li  r8,trampoline_size  /* verify that the trampoline is big enough */
> > -   cmpw    cr1,r8,r4
> > +   cmpwi   cr1,r4,trampoline_size  /* verify that the trampoline is big enough */
> > srwi    r4,r4,2 /* # words to move */
> > addi    r9,r3,-4    /* adjust pointer for lwzu */
> > mtctr   r4
> 
> As Will says, it looks like the ELFv2 version has the same bug.  Please
> fix that the same way.

Yes it has the same bug.  However in practice it would never be hit, since this
bug is 32-bit, and we only build 64-bit systems with ELF v2.  I did fix it.

> In the commit message and the changelog, point out that you folded the
> cmp with the li while you were at it.  It is easier to read code like
> this so the change is fine, but do point it out.
> 
> Can you test this in a testcase somehow?  That would have found the
> ELFv2 case, for example.

I created a test case calling __trampoline_setup with a larger buffer.  If it
doesn't abort the test passes.

> Okay for trunk.  Okay for backport to 11 when that branch opens again.
> Does this need more backports?  (Those should follow after 11 of
> course).

Bill mentioned we may want to backport this to earlier branches before they are
frozen.  Tulio, are backports to earlier revisions important?

I will attach the patch that I just committed.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
From 9a30a3f06b908e4e781324c2e813cd1db87119df Mon Sep 17 00:00:00 2001
From: Michael Meissner 
Date: Fri, 23 Apr 2021 18:16:03 -0400
Subject: [PATCH] Fix logic error in 32-bit trampolines.

The test in the PowerPC 32-bit trampoline support is backwards.  It aborts
if the trampoline size is greater than the expected size.  It should abort
when the trampoline size is less than the expected size.  I fixed the test
so the operands are reversed.  I then folded the load immediate into the
compare instruction.

I verified this by creating a 32-bit trampoline program and manually
changing the size of the trampoline to be 48 instead of 40.  The program
aborted with the larger size.  I updated this code and ran the test again
and it passed.
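
For readers who do not read PowerPC assembly, the before/after logic can be
modelled roughly as below.  This is only a sketch in C++, not the libgcc code:
the value 40 is the expected size quoted in this message, while the constant
and function names are made up for illustration.

```cpp
#include <cstddef>

// Expected trampoline size in bytes; 40 is the value quoted in the message
// above.  The name is illustrative, not a libgcc identifier.
constexpr std::size_t kExpectedTrampolineSize = 40;

// Model of the old test: it rejected buffers LARGER than expected.
constexpr bool old_check_aborts(std::size_t buffer_size) {
  return buffer_size > kExpectedTrampolineSize;
}

// Model of the fixed test: it rejects only buffers that are too SMALL.
constexpr bool fixed_check_aborts(std::size_t buffer_size) {
  return buffer_size < kExpectedTrampolineSize;
}
```

With a 48-byte buffer, as in the manual experiment described above, the old
model aborts and the fixed one does not.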

I added a test case that runs on PowerPC 32-bit Linux systems and it calls
the __trampoline_setup function with a larger buffer size than the
compiler uses.  The test is not run on 64-bit systems, since the function
__trampoline_setup is not called.  I also limited the test to just Linux
systems, in case trampolines are handled differently in other systems.

libgcc/
2021-04-23  Michael Meissner  

PR target/98952
* config/rs6000/tramp.S (__trampoline_setup, elfv1 #ifdef): Fix
trampoline size comparison in 32-bit by reversing test and
combining load immediate with compare.
(__trampoline_setup, elfv2 #ifdef): Fix trampoline size comparison
in 32-bit by reversing test and combining load immediate with
compare.

gcc/testsuite/
2021-04-23  Michael Meissner  

PR target/98952
* gcc.target/powerpc/pr98952.c: New test.
---
 gcc/testsuite/gcc.target/powerpc/pr98952.c | 28 ++
 libgcc/config/rs6000/tramp.S   |  6 ++---
 2 files changed, 30 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr98952.c

diff --git a/gcc/testsuite/gcc.target/powerpc/pr98952.c b/gcc/testsuite/gcc.target/powerpc/pr98952.c
new file mode 100644
index 000..c487fbc403e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr98952.c
@@ -0,0 +1,28 @@
+/* { dg-do run { target { powerpc*-*-linux* && ilp32 } } } */
+/* { dg-options "-O2" } */
+
+/* PR 96983 reported that the test in libgcc's tramp.S was backwards and it
+   would abort if the trampoline size passed to the function was greater than
+   the size the runtime was expecting (40).  It should abort if the size is less
+   than 40, not greater than 40.  This test creates a call to __trampoline_setup
+   with a much larger buffer to make sure the function does not abort.
+
+   We do not run this test on 64-bit since __trampoline_setup is not present in
+ 

Re: [PATCH] c++: do_class_deduction and dependent init [PR93383]

2021-04-23 Thread Jason Merrill via Gcc-patches

On 4/22/21 9:46 AM, Patrick Palka wrote:

On Wed, 21 Apr 2021, Patrick Palka wrote:


On Wed, 21 Apr 2021, Jason Merrill wrote:


On 4/12/21 1:20 PM, Patrick Palka wrote:

Here we're crashing during deduction for a template placeholder from a
dependent initializer because one of the initializer's elements has an
empty TREE_TYPE, something which resolve_args and later unify_one_argument
don't expect.  And if the deduction from a dependent initializer
otherwise fails, we prematurely issue an error rather than reattempting
the deduction at instantiation time.

This patch makes do_class_deduction more tolerant about dependent
initializers, in a manner similar to what do_auto_deduction does: if
deduction from a dependent initializer fails, just return the original
placeholder unchanged.


Why doesn't the type_dependent_expression_p check in do_auto_deduction catch
this already?


That check applies only when context != adc_unify, but here we have
context == adc_unify since we're being called from
convert_template_argument.

And currently, when 'auto' deduction fails for a dependent initializer,
do_auto_deduction will just silently return the original placeholder:

   int val = type_unification_real (tparms, targs, parms, , 1, 0,
DEDUCE_CALL,
NULL, /*explain_p=*/false);
   if (val > 0)
 {
   if (processing_template_decl)
 /* Try again at instantiation time.  */
 return type;

so I suppose this patch just makes do_class_deduction behave more
similarly to do_auto_deduction in this situation.


On second thought, I think attempting CTAD a dependent initializer as the patch
does might sometimes give us the wrong answer.  If e.g. the class template in
question has the deduction guides

   template  A(T) -> A;
   A(int) -> A;

then ahead-of-time CTAD for A{v} where v is type-dependent will succeed and
resolve to A, but at instantiation time the type of v might be int.  So
perhaps we should just have do_class_deduction punt on all type-dependent
expressions, e.g.


OK.
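
The concern about ahead-of-time CTAD can be illustrated with a small
stand-alone example.  This is a sketch under assumptions rather than the
testcase from the PR: the class template, the helper `wrap`, and in particular
the `A<long>` target of the non-template guide are arbitrary choices made for
illustration.

```cpp
#include <type_traits>

template <class T>
struct A {
  // Accept any argument so that every deduced specialization is constructible.
  template <class U>
  A(U) {}
};

// Generic guide: A{v} normally deduces A<decltype(v)>.
template <class T> A(T) -> A<T>;
// Hypothetical non-template guide: preferred over the template for an int.
A(int) -> A<long>;

// Inside the template, v is type-dependent, so the placeholder can only be
// resolved correctly once T is known at instantiation time.
template <class T>
auto wrap(T v) { return A{v}; }
```

Here `wrap(0.5)` yields `A<double>` via the generic guide, while `wrap(0)`
yields `A<long>`, which an early deduction against the generic guide alone
would not have predicted.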


-- >8 --

gcc/cp/ChangeLog:

PR c++/89565
PR c++/93383
PR c++/99200
* pt.c (do_class_deduction): Give up if the initializer is
type-dependent.

gcc/testsuite/ChangeLog:

PR c++/89565
PR c++/93383
PR c++/99200
* g++.dg/cpp2a/nontype-class39.C: Remove dg-ice.
* g++.dg/cpp2a/nontype-class45.C: New test.
* g++.dg/cpp2a/nontype-class46.C: New test.
---
  gcc/cp/pt.c  |  5 +++
  gcc/testsuite/g++.dg/cpp2a/nontype-class39.C |  2 --
  gcc/testsuite/g++.dg/cpp2a/nontype-class45.C | 32 
  gcc/testsuite/g++.dg/cpp2a/nontype-class46.C | 11 +++
  4 files changed, 48 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class45.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class46.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 7bcbe6dc3ce..6673f935ab6 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -29362,6 +29362,11 @@ do_class_deduction (tree ptype, tree tmpl, tree init,
return error_mark_node;
  }
  
+  /* If the initializer is dependent, we can't resolve the class template
+ placeholder ahead of time.  */
+  if (type_dependent_expression_p (init))
+return ptype;
+
tree type = TREE_TYPE (tmpl);
  
bool try_list_ctor = false;

diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class39.C b/gcc/testsuite/g++.dg/cpp2a/nontype-class39.C
index 512afad8e4f..9b4da4f02ea 100644
--- a/gcc/testsuite/g++.dg/cpp2a/nontype-class39.C
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class39.C
@@ -1,7 +1,5 @@
  // PR c++/89565
  // { dg-do compile { target c++20 } }
-// { dg-additional-options "-fchecking" }
-// { dg-ice "resolve_args" }
  
  template 

  struct N{};
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class45.C b/gcc/testsuite/g++.dg/cpp2a/nontype-class45.C
new file mode 100644
index 000..e7addf5f291
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class45.C
@@ -0,0 +1,32 @@
+// PR c++/99200
+// { dg-do compile { target c++20 } }
+
+template 
+struct A
+{
+  constexpr A (const char ()[N]) { for (int i = 0; i < N; i++) v[i] = s[i]; 
v[N] = 0; }
+  char v[N + 1];
+};
+
+template 
+struct B
+{
+  constexpr operator const char *() { return s.v; }
+};
+
+template 
+const char *
+foo ()
+{
+  return B<__PRETTY_FUNCTION__>{};
+}
+
+template 
+const char *
+bar ()
+{
+  return B<__FUNCTION__>{};
+}
+
+auto a = foo  ();
+auto b = bar  ();
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class46.C b/gcc/testsuite/g++.dg/cpp2a/nontype-class46.C
new file mode 100644
index 000..d91e800424f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class46.C
@@ -0,0 +1,11 @@
+// PR c++/93383
+// { dg-do compile { target c++20 } }
+
+template  struct A {};
+
+template  struct B {
+  void foo(B<+a>);
+  void bar(B);
+  template  using 

Re: [patch for gcc12 stage1][version 2] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-04-23 Thread Kees Cook via Gcc-patches
On Fri, Apr 23, 2021 at 08:05:29PM +0100, Richard Sandiford wrote:
> Finally getting to this now that the GCC 11 rush is over.  Sorry for
> the slow response.
> 
> I've tried to review most of the code below, but skipped the testsuite
> parts in the interests of time.  I'll probably have more comments in
> future rounds, just wanted to get the ball rolling.
> 
> This is really Richi's area more than mine though, so please take this
> with a grain of salt.
> 
> Qing Zhao  writes:
> > 2.  initialize all paddings to zero when -ftrivial-auto-var-init is present.
> > In expr.c (store_constructor):
> >
> > Clear the whole structure when
> > -ftrivial-auto-var-init and the structure has paddings.
> >
> > In gimplify.c (gimplify_init_constructor):
> >
> > Clear the whole structure when
> > -ftrivial-auto-var-init and the structure has paddings.
> 
> Just to check: are we sure we want to use zero as the padding fill
> value even for -ftrivial-auto-var-init=pattern?  Or should it be
> 0xAA instead, to match the integer fill pattern?
> 
> I can see the arguments both ways, just thought it was worth asking.

I have no opinion myself, but I can give background.  Originally, Clang
implemented using pattern, but there was discussion around it and the
decision there was to go with zero init, as it seemed to more closely
match the C spec:
https://github.com/llvm/llvm-project/commit/d39fbc7e20d84364e409ce59724ce20625637062

> > +This is C and C++'s default.
> > +
> > +@item
> > +@samp{pattern} Initialize automatic variables with values which will likely
> > +transform logic bugs into crashes down the line, are easily recognized in a
> > +crash dump and without being values that programmers can rely on for useful
> > +program semantics.
> > +The values used for pattern initialization might be changed in the future.
> > +
> > +@item
> > +@samp{zero} Initialize automatic variables with zeroes.
> > +@end itemize
> > +
> > +The default is @samp{uninitialized}.
> > +
> > +You can control this behavior for a specific variable by using the variable
> > +attribute @code{uninitialized} (@pxref{Variable Attributes}).
> > +
> 
> I think it's important to say here that GCC still considers the
> variables to be uninitialised and still considers reading them to
> be undefined behaviour.  The option is simply trying to improve the
> security and predictability of the program in the presence of these
> uninitialised variables.

Excellent point, yes. That'd be good to call out.

> > […]
> > @@ -1831,6 +2000,17 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p)
> >as they may contain a label address.  */
> > walk_tree (, force_labels_r, NULL, NULL);
> > }
> > +  /* When there is no explicit initializer, if the user requested,
> > +We should insert an artifical initializer for this automatic
> > +variable for non vla variables.  */
> 
> I think we should explain why we can skip VLAs here.

FWIW, in testing, VLAs do get initialized, so I guess there's a separate
place where it happens.


Thanks for the review!

-- 
Kees Cook


Re: [PATCH] Fix PR88085

2021-04-23 Thread Jeff Law via Gcc-patches



On 4/20/2021 8:06 AM, Andreas Krebbel via Gcc-patches wrote:

With the current handling of decl alignments it is impossible to
reduce the alignment requirement as part of a variable declaration.

This change has been proposed by Richard in the PR. It fixes the
align-1.c testcase on IBM Z.

Bootstrapped on x86_64 and s390x. No regressions.

Ok for mainline?

gcc/ChangeLog:

PR middle-end/88085
* emit-rtl.c (set_mem_attributes_minus_bitpos): Use the user
alignment if there are no pre-existing mem attrs.


OK

jeff



Re: [patch for gcc12 stage1][version 2] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-04-23 Thread Richard Sandiford via Gcc-patches
Finally getting to this now that the GCC 11 rush is over.  Sorry for
the slow response.

I've tried to review most of the code below, but skipped the testsuite
parts in the interests of time.  I'll probably have more comments in
future rounds, just wanted to get the ball rolling.

This is really Richi's area more than mine though, so please take this
with a grain of salt.

Qing Zhao  writes:
> 2.  initialize all paddings to zero when -ftrivial-auto-var-init is present.
> In expr.c (store_constructor):
>
> Clear the whole structure when
> -ftrivial-auto-var-init and the structure has paddings.
>
> In gimplify.c (gimplify_init_constructor):
>
> Clear the whole structure when
> -ftrivial-auto-var-init and the structure has paddings.

Just to check: are we sure we want to use zero as the padding fill
value even for -ftrivial-auto-var-init=pattern?  Or should it be
0xAA instead, to match the integer fill pattern?

I can see the arguments both ways, just thought it was worth asking.
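
To make the two alternatives concrete, here is a rough model of what the bytes
of a padded structure would look like under each choice.  It is only a sketch:
it builds byte images by hand and says nothing about how the patch itself
emits the initialization.  The struct `Sample` and all function names are
invented for illustration; 0xAA is the integer fill pattern mentioned above.

```cpp
#include <array>
#include <cstddef>

// Illustrative struct; on most ABIs there is padding between tag and value.
struct Sample {
  char tag;
  int value;
};

using Bytes = std::array<unsigned char, sizeof(Sample)>;

constexpr unsigned char kPattern = 0xAA;

// Alternative 1: every byte, padding included, gets the pattern byte.
constexpr Bytes pattern_everywhere() {
  Bytes image{};
  for (std::size_t i = 0; i < image.size(); ++i) image[i] = kPattern;
  return image;
}

// Alternative 2: start from all zeroes, then pattern-fill only the bytes
// that belong to named fields, so padding stays zero.
constexpr Bytes pattern_fields_zero_padding() {
  Bytes image{};
  for (std::size_t i = 0; i < sizeof(char); ++i)
    image[offsetof(Sample, tag) + i] = kPattern;
  for (std::size_t i = 0; i < sizeof(int); ++i)
    image[offsetof(Sample, value) + i] = kPattern;
  return image;
}

// Count how many bytes of the image equal the given value.
constexpr std::size_t count_bytes(const Bytes& image, unsigned char byte) {
  std::size_t n = 0;
  for (std::size_t i = 0; i < image.size(); ++i)
    if (image[i] == byte) ++n;
  return n;
}
```

The two images differ only in the padding bytes, which is exactly the part of
the object the question is about.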

> […]
> @@ -1589,6 +1592,24 @@ handle_retain_attribute (tree *pnode, tree name, tree ARG_UNUSED (args),
>return NULL_TREE;
>  }
>  
> +/* Handle a "uninitialized" attribute; arguments as in

This occurs in existing code too, but s/a/an/.

> +   struct attribute_spec.handler.  */
> +
> +static tree
> +handle_uninitialized_attribute (tree *node, tree name, tree ARG_UNUSED (args),
> + int ARG_UNUSED (flags), bool *no_add_attrs)
> +{
> +  if (VAR_P (*node))
> +DECL_UNINITIALIZED (*node) = 1;
> +  else
> +{
> +  warning (OPT_Wattributes, "%qE attribute ignored", name);
> +  *no_add_attrs = true;
> +}
> +
> +  return NULL_TREE;
> +}
> +
>  /* Handle a "externally_visible" attribute; arguments as in
> struct attribute_spec.handler.  */

> […]
> @@ -11689,6 +11689,34 @@ Perform basic block vectorization on trees. This flag is enabled by default at
>  @option{-O3} and by @option{-ftree-vectorize}, @option{-fprofile-use},
>  and @option{-fauto-profile}.
>  
> +@item -ftrivial-auto-var-init=@var{choice}
> +@opindex ftrivial-auto-var-init
> +Initialize automatic variables with either a pattern or with zeroes to increase
> +program security by preventing uninitialized memory disclosure and use.
> +
> +The three values of @var{choice} are:
> +
> +@itemize @bullet
> +@item
> +@samp{uninitialized} doesn't initialize any automatic variables.
> +This is C and C++'s default.
> +
> +@item
> +@samp{pattern} Initialize automatic variables with values which will likely
> +transform logic bugs into crashes down the line, are easily recognized in a
> +crash dump and without being values that programmers can rely on for useful
> +program semantics.
> +The values used for pattern initialization might be changed in the future.
> +
> +@item
> +@samp{zero} Initialize automatic variables with zeroes.
> +@end itemize
> +
> +The default is @samp{uninitialized}.
> +
> +You can control this behavior for a specific variable by using the variable
> +attribute @code{uninitialized} (@pxref{Variable Attributes}).
> +

I think it's important to say here that GCC still considers the
variables to be uninitialised and still considers reading them to
be undefined behaviour.  The option is simply trying to improve the
security and predictability of the program in the presence of these
uninitialised variables.

I think it would also be worth saying that options like -Wuninitialized
still try to warn about uninitialised variables, although using
-ftrivial-auto-var-init may change which warnings are generated.

(The above comments are just a summary, not suitable for direct
inclusion. :-))

>  @item -fvect-cost-model=@var{model}
>  @opindex fvect-cost-model
>  Alter the cost model used for vectorization.  The @var{model} argument
> […]
> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> index 6da6698..fafd2e9 100644
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -1716,6 +1716,116 @@ gimplify_vla_decl (tree decl, gimple_seq *seq_p)
>  
>gimplify_and_add (t, seq_p);
>  
> +  /* Add a call to memset or calls to memcpy to initialize this vla
> + when the user requested.  */
> +  if (!DECL_ARTIFICIAL (decl)
> +  && VAR_P (decl)
> +  && !DECL_EXTERNAL (decl)
> +  && !TREE_STATIC (decl)
> +  && !DECL_UNINITIALIZED (decl))
> +switch (flag_trivial_auto_var_init)
> +  {
> +  case AUTO_INIT_UNINITIALIZED:
> + break;
> +  case AUTO_INIT_ZERO:
> + {
> +   /* Generate a call to memset to initialize this vla.  */
> +   gcall *gs;
> +   t = builtin_decl_implicit (BUILT_IN_MEMSET);
> +   gs = gimple_build_call (t, 3, addr, integer_zero_node,
> +   DECL_SIZE_UNIT (decl));
> +   gimple_call_set_memset_for_uninit (gs, true);
> +   gimplify_seq_add_stmt (seq_p, gs);
> + }
> + break;
> +  case AUTO_INIT_PATTERN:
> + {
> +   /* Generate the following sequence to initialize this vla:
> + 

Re: [PATCH] lra: Avoid cycling on certain subreg reloads [PR96796]

2021-04-23 Thread Vladimir Makarov via Gcc-patches



On 2021-04-23 12:13 p.m., Richard Sandiford wrote:

This is a backport of the PR96796 fix to GCC 10 and GCC 9.  The original
trunk patch was:

https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552878.html

reviewed here:

https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553308.html


...


This backport is less aggressive than the trunk version, in that the new
code reuses the test for a reload move from in_class_p.  We will therefore
only narrow OP_OUT classes if the instruction is a register move or memory
load that was generated by LRA itself.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK for GCC 10
and GCC 9?

Yes.  I think as the previous patch did not introduce new issues and
this patch works in fewer cases, the patch is ok for GCC10 and GCC9
branches.  I definitely like this version of the patch more.


Thank you, Richard, for working on this issue.


gcc/
PR rtl-optimization/96796
* lra-constraints.c (in_class_p): Add a default-false
allow_all_reload_class_changes_p parameter.  Do not treat
reload moves specially when the parameter is true.
(get_reload_reg): Try to narrow the class of an existing OP_OUT
reload if we're reloading a reload pseudo in a reload instruction.

gcc/testsuite/
PR rtl-optimization/96796
* gcc.c-torture/compile/pr96796.c: New test.
---
  gcc/lra-constraints.c | 59 +++
  gcc/testsuite/gcc.c-torture/compile/pr96796.c | 56 ++
  2 files changed, 105 insertions(+), 10 deletions(-)
  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr96796.c

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 7cc479b3042..29a734e0e10 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -235,12 +235,17 @@ get_reg_class (int regno)
 CL.  Use elimination first if REG is a hard register.  If REG is a
 reload pseudo created by this constraints pass, assume that it will
 be allocated a hard register from its allocno class, but allow that
-   class to be narrowed to CL if it is currently a superset of CL.
+   class to be narrowed to CL if it is currently a superset of CL and
+   if either:
+
+   - ALLOW_ALL_RELOAD_CLASS_CHANGES_P is true or
+   - the instruction we're processing is not a reload move.
  
 If NEW_CLASS is nonnull, set *NEW_CLASS to the new allocno class of

 REGNO (reg), or NO_REGS if no change in its class was needed.  */
  static bool
-in_class_p (rtx reg, enum reg_class cl, enum reg_class *new_class)
+in_class_p (rtx reg, enum reg_class cl, enum reg_class *new_class,
+   bool allow_all_reload_class_changes_p = false)
  {
enum reg_class rclass, common_class;
machine_mode reg_mode;
@@ -267,7 +272,8 @@ in_class_p (rtx reg, enum reg_class cl, enum reg_class *new_class)
 typically moves that have many alternatives, and restricting
 reload pseudos for one alternative may lead to situations
 where other reload pseudos are no longer allocatable.  */
-  || (INSN_UID (curr_insn) >= new_insn_uid_start
+  || (!allow_all_reload_class_changes_p
+ && INSN_UID (curr_insn) >= new_insn_uid_start
  && src != NULL
  && ((REG_P (src) || MEM_P (src))
  || (GET_CODE (src) == SUBREG
@@ -570,13 +576,12 @@ init_curr_insn_input_reloads (void)
curr_insn_input_reloads_num = 0;
  }
  
-/* Create a new pseudo using MODE, RCLASS, ORIGINAL or reuse already

-   created input reload pseudo (only if TYPE is not OP_OUT).  Don't
-   reuse pseudo if IN_SUBREG_P is true and the reused pseudo should be
-   wrapped up in SUBREG.  The result pseudo is returned through
-   RESULT_REG.  Return TRUE if we created a new pseudo, FALSE if we
-   reused the already created input reload pseudo.  Use TITLE to
-   describe new registers for debug purposes.  */
+/* Create a new pseudo using MODE, RCLASS, ORIGINAL or reuse an existing
+   reload pseudo.  Don't reuse an existing reload pseudo if IN_SUBREG_P
+   is true and the reused pseudo should be wrapped up in a SUBREG.
+   The result pseudo is returned through RESULT_REG.  Return TRUE if we
+   created a new pseudo, FALSE if we reused an existing reload pseudo.
+   Use TITLE to describe new registers for debug purposes.  */
  static bool
  get_reload_reg (enum op_type type, machine_mode mode, rtx original,
enum reg_class rclass, bool in_subreg_p,
@@ -588,6 +593,40 @@ get_reload_reg (enum op_type type, machine_mode mode, rtx original,
  
if (type == OP_OUT)

  {
+  /* Output reload registers tend to start out with a conservative
+choice of register class.  Usually this is ALL_REGS, although
+a target might narrow it (for performance reasons) through
+targetm.preferred_reload_class.  It's therefore quite common
+for a reload instruction to require a more restrictive class
+than the class that was originally assigned to the reload 

Re: [PATCH 2/2] bpf: allow BSS symbols to be global symbols

2021-04-23 Thread YiFei Zhu via Gcc-patches
On Fri, Apr 23, 2021 at 12:25 PM David Faust  wrote:
> I've just checked in both patches to master and GCC 10 on your behalf.

On 4/22/21 11:54 PM, Jose E. Marchesi via Gcc-patches wrote:
> Thanks for the patch.
> This is OK for both master and GCC 10.

Thanks to both of you :)

YiFei Zhu


Re: [PATCH 2/2] bpf: allow BSS symbols to be global symbols

2021-04-23 Thread David Faust via Gcc-patches
Hi YiFei,

I've just checked in both patches to master and GCC 10 on your behalf.

Thanks!

On 4/22/21 11:54 PM, Jose E. Marchesi via Gcc-patches wrote:
> 
> Hi YiFei.
> 
>> Prior to this, a BSS declaration such as:
>>
>>   int foo;
>>   static int bar;
>>
>> Generates:
>>
>>   .global foo
>>   .local  foo
>>   .comm   foo,4,4
>>   .local  bar
>>   .comm   bar,4,4
>>
>> Creating symbols:
>>
>>    b foo
>>   0004 b bar
>>
>> Both symbols are local. However, libbpf bpf_object__variable_offset
>> requires symbols to be STB_GLOBAL & STT_OBJECT for data section lookup.
>> This patch makes the same declaration generate:
>>
>>   .global foo
>>   .type   foo, @object
>>   .lcomm  foo,4,4
>>   .local  bar
>>   .comm   bar,4,4
>>
>> Creating symbols:
>>
>>    B foo
>>   0004 b bar
>>
>> And libbpf will be okay with looking up the global symbol "foo".
> 
> Thanks for the patch.
> This is OK for both master and GCC 10.
> 
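
For readers unfamiliar with the symbol attributes mentioned in the quoted description, the following is a minimal sketch of how the binding and type are packed into the ELF `st_info` byte. The constant values follow the ELF specification; the `SKETCH_` names and the helper `is_global_object` are hypothetical and this is not libbpf's code.

```c
/* Values as defined by the ELF specification; prefixed to avoid
   clashing with <elf.h>.  */
enum { SKETCH_STB_LOCAL = 0, SKETCH_STB_GLOBAL = 1 };
enum { SKETCH_STT_NOTYPE = 0, SKETCH_STT_OBJECT = 1 };

/* The binding lives in the high nibble of st_info, the type in the
   low nibble.  */
static unsigned sym_bind (unsigned char st_info) { return st_info >> 4; }
static unsigned sym_type (unsigned char st_info) { return st_info & 0xf; }

/* Hypothetical check: nonzero if the symbol is a global data object,
   which is the property the description says libbpf looks for.  */
int is_global_object (unsigned char st_info)
{
  return sym_bind (st_info) == SKETCH_STB_GLOBAL
	 && sym_type (st_info) == SKETCH_STT_OBJECT;
}
```

With this encoding, a local object such as "bar" above has st_info 0x01 and is rejected, while a global object such as the new "foo" has st_info 0x11 and is accepted.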


Re: [PATCH][libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end

2021-04-23 Thread Tom de Vries
On 4/23/21 5:45 PM, Alexander Monakov wrote:
> On Thu, 22 Apr 2021, Tom de Vries wrote:
> 
>> Ah, I see, agreed, that makes sense.  I was afraid there was some
>> fundamental problem that I overlooked.
>>
>> Here's an updated version.  I've tried to make it clear that the
>> futex_wait/wake are locally used versions, not generic functionality.
> 
> Could you please regenerate the patch passing appropriate flags to
> 'git format-patch' so it presents a rewrite properly (see documentation
> for --patience and --break-rewrites options). The attached patch was mostly
> unreadable, I'm afraid.

Sure.  I did notice that the patch was not readable, but I didn't know
there were options to improve that, so thanks for pointing that out.

Thanks,
- Tom
From d3053a7ec7444b371ee29097a673e637b0d369d9 Mon Sep 17 00:00:00 2001
From: Tom de Vries 
Date: Tue, 20 Apr 2021 08:47:03 +0200
Subject: [PATCH 1/4] [libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end

Consider the following omp fragment.
...
  #pragma omp target
  #pragma omp parallel num_threads (2)
  #pragma omp task
;
...

This hangs at -O0 for nvptx.

Investigating the behaviour gives us the following trace of events:
- both threads execute GOMP_task, where they:
  - deposit a task, and
  - execute gomp_team_barrier_wake
- thread 1 executes gomp_team_barrier_wait_end and, not being the last thread,
  proceeds to wait at the team barrier
- thread 0 executes gomp_team_barrier_wait_end and, being the last thread, it
  calls gomp_barrier_handle_tasks, where it:
  - executes both tasks and marks the team barrier done
  - executes a gomp_team_barrier_wake which wakes up thread 1
- thread 1 exits the team barrier
- thread 0 returns from gomp_barrier_handle_tasks and goes to wait at
  the team barrier.
- thread 0 hangs.

To understand why there is a hang here, it's good to understand how things
are set up for nvptx.  The libgomp/config/nvptx/bar.c implementation is
a copy of the libgomp/config/linux/bar.c implementation, with uses of both
futex_wake and do_wait replaced with uses of ptx insn bar.sync:
...
  if (bar->total > 1)
    asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
...

The point where thread 0 goes to wait at the team barrier, corresponds in
the linux implementation with a do_wait.  In the linux case, the call to
do_wait doesn't hang, because it's waiting for bar->generation to become
a certain value, and if bar->generation already has that value, it just
proceeds, without any need for coordination with other threads.
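
The property that matters here can be sketched in portable C11.  This is an illustration only, assuming a futex-style "wait while the word still holds the value I observed" condition; the name wait_while_equal and the max_spins bound are hypothetical and this is not the libgomp do_wait code.  The point is that the wait depends only on the state of a shared word, so a late arrival returns immediately instead of needing a partner thread.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a value-conditioned wait.  Returns 1 as
   soon as *WORD differs from OBSERVED, or 0 after MAX_SPINS extra
   polls without a change (the bound exists only so the sketch always
   terminates).  */
int wait_while_equal (atomic_uint *word, unsigned observed,
		      unsigned max_spins)
{
  for (unsigned i = 0; ; i++)
    {
      /* If the state already moved on, no coordination with any other
	 thread is required: just proceed.  */
      if (atomic_load_explicit (word, memory_order_acquire) != observed)
	return 1;
      if (i >= max_spins)
	return 0;
    }
}
```

A bar.sync-style rendezvous has no such shortcut: it completes only when the expected number of threads actually arrive.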

In the nvptx case, the bar.sync waits until thread 1 joins it in the same
logical barrier, which never happens: thread 1 is lingering in the
thread pool at the thread pool barrier (using a different logical barrier),
waiting to join a new team.

The easiest way to fix this is to revert to the posix implementation for
bar.{c,h}.  That however falls back on a busy-waiting approach, and
does not take advantage of the ptx bar.sync insn.

Instead, we revert to the linux implementation for bar.c,
and implement bar.c local functions futex_wait and futex_wake using the
bar.sync insn.

This is a WIP version that does not yet take performance into consideration,
but instead focuses on copying a working version as completely as possible,
and isolating the machine-specific changes to as few functions as
possible.

The bar.sync insn takes an argument specifying how many threads are
participating, and that doesn't play well with the futex syntax where it's
not clear in advance how many threads will be woken up.

This is solved by waking up all waiting threads each time a futex_wait or
futex_wake happens, and possibly going back to sleep with an updated thread
count.

Tested libgomp on x86_64 with nvptx accelerator, both as-is and with
do_spin hardcoded to 1.

libgomp/ChangeLog:

2021-04-20  Tom de Vries  

	PR target/99555
	* config/nvptx/bar.c (generation_to_barrier): New function, copied
	from config/rtems/bar.c.
	(futex_wait, futex_wake): New function.
	(do_spin, do_wait): New function, copied from config/linux/wait.h.
	(gomp_barrier_wait_end, gomp_barrier_wait_last)
	(gomp_team_barrier_wake, gomp_team_barrier_wait_end):
	(gomp_team_barrier_wait_cancel_end, gomp_team_barrier_cancel): Remove
	and replace with include of config/linux/bar.c.
	* config/nvptx/bar.h (gomp_barrier_t): Add fields waiters and lock.
	(gomp_barrier_init): Init new fields.
	* testsuite/libgomp.c-c++-common/task-detach-6.c: Remove nvptx-specific
	workarounds.
	* testsuite/libgomp.c/pr99555-1.c: Same.
	* testsuite/libgomp.fortran/task-detach-6.f90: Same.
---
 libgomp/config/nvptx/bar.c| 388 --
 libgomp/config/nvptx/bar.h|   4 +
 .../libgomp.c-c++-common/task-detach-6.c  |   8 -
 libgomp/testsuite/libgomp.c/pr99555-1.c   |   8 -
 .../libgomp.fortran/task-detach-6.f90 |  12 -
 5 files changed, 180 insertions(+), 240 deletions(-)
 rewrite libgomp/config/nvptx/bar.c (76%)

diff --git 

i386: Reject -m96bit-long-double for 64bit targets [PR100041]

2021-04-23 Thread Uros Bizjak via Gcc-patches
64bit targets default to 128bit long double, so -m96bit-long-double should
not be used.  Together with -m128bit-long-double, this option was intended
to be an optimization for 32bit targets only.

Error out when -m96bit-long-double is used with 64bit targets.

2021-04-23  Uroš Bizjak  

gcc/
PR target/100041
* config/i386/i386-options.c (ix86_option_override_internal):
Error out when -m96bit-long-double is used with 64bit targets.
* config/i386/i386.md (*pushxf_rounded): Remove pattern.

gcc/testsuite/

PR target/100041
* gcc.target/i386/pr79514.c (dg-error):
Expect error for 64bit targets.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to mainline.

Uros.
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 91da2849c49..b1059c77b6b 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -2557,6 +2557,9 @@ ix86_option_override_internal (bool main_args_p,
opts->x_ix86_isa_flags
  |= TARGET_SUBTARGET64_ISA_DEFAULT & ~opts->x_ix86_isa_flags_explicit;
 
+  if (!TARGET_128BIT_LONG_DOUBLE_P (opts->x_target_flags))
+   error ("%<-m96bit-long-double%> is not compatible with this target");
+
   if (TARGET_RTD_P (opts->x_target_flags))
warning (0,
 main_args_p
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 9ff35d9a607..9e9dce6d433 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3044,36 +3044,6 @@
   operands[0] = replace_equiv_address (operands[0], stack_pointer_rtx);
 })
 
-(define_insn_and_split "*pushxf_rounded"
-  [(set (mem:XF
- (pre_modify:P
-   (reg:P SP_REG)
-   (plus:P (reg:P SP_REG) (const_int -16))))
-   (match_operand:XF 0 "nonmemory_no_elim_operand" "f,r,*r,C"))]
-  "TARGET_64BIT"
-  "#"
-  "&& 1"
-  [(set (reg:P SP_REG) (plus:P (reg:P SP_REG) (const_int -16)))
-   (set (match_dup 1) (match_dup 0))]
-{
-  rtx pat = PATTERN (curr_insn);
-  operands[1] = SET_DEST (pat);
-
-  /* Preserve memory attributes. */
-  operands[1] = replace_equiv_address (operands[1], stack_pointer_rtx);
-}
-  [(set_attr "type" "multi")
-   (set_attr "unit" "i387,*,*,*")
-   (set (attr "mode")
-   (cond [(eq_attr "alternative" "1,2,3")
-(const_string "DI")
- ]
- (const_string "XF")))
-   (set (attr "preferred_for_size")
- (cond [(eq_attr "alternative" "1")
-  (symbol_ref "false")]
-   (symbol_ref "true")))])
-
 (define_insn "*pushxf"
   [(set (match_operand:XF 0 "push_operand" "=<,<,<,<,<")
(match_operand:XF 1 "general_no_elim_operand" "f,r,*r,oF,oC"))]
diff --git a/gcc/testsuite/gcc.target/i386/pr79514.c b/gcc/testsuite/gcc.target/i386/pr79514.c
index c5b7bf8ef67..8235da6e14c 100644
--- a/gcc/testsuite/gcc.target/i386/pr79514.c
+++ b/gcc/testsuite/gcc.target/i386/pr79514.c
@@ -1,6 +1,7 @@
 /* PR target/79514 */
 /* { dg-do compile } */
 /* { dg-options "-m96bit-long-double" } */
+/* { dg-error "'-m96bit-long-double' is not compatible" "" { target { ! ia32 } } 0 } */
 
 extern void bar (long double);
 


[committed] MAINTAINERS: Add myself for write after approval

2021-04-23 Thread David Faust via Gcc-patches
ChangeLog:

2021-04-23  David Faust  

* MAINTAINERS (Write After Approval): Add myself.
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index db25583b37b..c589c0b30ac 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -387,6 +387,7 @@ Doug Evans  

 Chris Fairles  
 Alessandro Fanfarillo  
 Changpeng Fang 
+David Faust
 Li Feng
 Thomas Fitzsimmons 
 Alexander Fomin

-- 
2.30.2



[PATCH] lra: Avoid cycling on certain subreg reloads [PR96796]

2021-04-23 Thread Richard Sandiford via Gcc-patches
This is a backport of the PR96796 fix to GCC 10 and GCC 9.  The original
trunk patch was:

   https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552878.html

reviewed here:

   https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553308.html

I'm not aware of any fallout since then.  However, as the covering
message for the original patch said, the patch was quite aggressive
about when to apply the new heuristic.  This version takes the
(hopefully) more branch-friendly approach described there and
limits the new heuristic to move and load instructions inserted
by LRA itself.

The full (adjusted) covering message is below:

This PR is about LRA cycling for a reload of the form:


Changing pseudo 196 in operand 1 of insn 103 on equiv [r105:DI*0x8+r140:DI]
  Creating newreg=287, assigning class ALL_REGS to slow/invalid mem r287
  Creating newreg=288, assigning class ALL_REGS to slow/invalid mem r288
  103: r203:SI=r288:SI<<0x1+r196:DI#0
  REG_DEAD r196:DI
Inserting slow/invalid mem reload before:
  316: r287:DI=[r105:DI*0x8+r140:DI]
  317: r288:SI=r287:DI#0


The problem is with r287.  We rightly give it a broad starting class of
POINTER_AND_FP_REGS (reduced from ALL_REGS by preferred_reload_class).
However, we never make forward progress towards narrowing it down to
a specific choice of class (POINTER_REGS or FP_REGS).

I think in practice we rely on two things to narrow a reload pseudo's
class down to a specific choice:

(1) a restricted class is specified when the pseudo is created

This happens for input address reloads, where the class is taken
from the target's chosen base register class.  It also happens
for simple REG reloads, where the class is taken from the chosen
alternative's constraints.

(2) uses of the reload pseudo as a direct input operand

In this case get_reload_reg tries to reuse the existing register
and narrow its class, instead of creating a new reload pseudo.

However, neither occurs here.  As described above, r287 rightly
starts out with a wide choice of class, ultimately derived from
ALL_REGS, so we don't get (1).  And as the comments in the PR
explain, r287 is never used as an input reload, only the subreg is,
so we don't get (2):


 Choosing alt 13 in insn 317:  (0) r  (1) w {*movsi_aarch64}
  Creating newreg=291, assigning class FP_REGS to r291
  317: r288:SI=r291:SI
Inserting insn reload before:
  320: r291:SI=r287:DI#0


IMO, in this case we should rely on the reload of insn 316 to narrow
down the class of r287.  Currently we do:


 Choosing alt 7 in insn 316:  (0) r  (1) m {*movdi_aarch64}
  Creating newreg=289 from oldreg=287, assigning class GENERAL_REGS to r289
  316: r289:DI=[r105:DI*0x8+r140:DI]
Inserting insn reload after:
  318: r287:DI=r289:DI
---

i.e. we create a new pseudo register r289 and give *that* pseudo
GENERAL_REGS instead.  This is because get_reload_reg only narrows
down the existing class for OP_IN and OP_INOUT, not OP_OUT.

But if we have a reload pseudo in a reload instruction and have chosen
a specific class for the reload pseudo, I think we should simply install
it for OP_OUT reloads too, if the class is a subset of the existing class.
We will need to pick such a register whatever happens (for r289 in the
example above).  And as explained in the PR, doing this actually avoids
an unnecessary move via the FP registers too.
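
The "narrow only if the chosen class is a subset of the existing class" rule can be sketched with register classes modelled as bit masks.  This is an illustration under stated assumptions: the mask values, the SKETCH_ names and try_narrow_class are hypothetical, and LRA itself works on its own register-class tables rather than on plain integers.

```c
/* Hypothetical register classes, one bit per hard register.  */
enum
{
  SKETCH_GENERAL_REGS = 0x0f,
  SKETCH_FP_REGS = 0xf0,
  SKETCH_GENERAL_AND_FP_REGS = 0xff
};

/* Try to narrow *CURRENT to WANTED.  Succeeds (returns 1 and updates
   *CURRENT) only when every register in WANTED is also in *CURRENT.
   Otherwise *CURRENT is left alone and 0 is returned, which in the
   scenario above corresponds to creating a new reload pseudo.  */
int try_narrow_class (unsigned *current, unsigned wanted)
{
  if ((wanted & ~*current) != 0)
    return 0;			/* WANTED is not a subset.  */
  *current = wanted;
  return 1;
}
```

In these terms, r287 starts with the broad class and the choice made for insn 316 is a subset of it, so the class can be installed directly instead of introducing r289.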

This backport is less aggressive than the trunk version, in that the new
code reuses the test for a reload move from in_class_p.  We will therefore
only narrow OP_OUT classes if the instruction is a register move or memory
load that was generated by LRA itself.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK for GCC 10
and GCC 9?

Thanks,
Richard


gcc/
PR rtl-optimization/96796
* lra-constraints.c (in_class_p): Add a default-false
allow_all_reload_class_changes_p parameter.  Do not treat
reload moves specially when the parameter is true.
(get_reload_reg): Try to narrow the class of an existing OP_OUT
reload if we're reloading a reload pseudo in a reload instruction.

gcc/testsuite/
PR rtl-optimization/96796
* gcc.c-torture/compile/pr96796.c: New test.
---
 gcc/lra-constraints.c | 59 +++
 gcc/testsuite/gcc.c-torture/compile/pr96796.c | 56 ++
 2 files changed, 105 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr96796.c

diff --git a/gcc/lra-constraints.c 

Re: [PATCH][libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end

2021-04-23 Thread Alexander Monakov via Gcc-patches
On Thu, 22 Apr 2021, Tom de Vries wrote:

> Ah, I see, agreed, that makes sense.  I was afraid there was some
> fundamental problem that I overlooked.
> 
> Here's an updated version.  I've tried to make it clear that the
> futex_wait/wake are locally used versions, not generic functionality.

Could you please regenerate the patch passing appropriate flags to
'git format-patch' so it presents a rewrite properly (see documentation
for --patience and --break-rewrites options). The attached patch was mostly
unreadable, I'm afraid.

Alexander


i386: Fix atomic FP peepholes [PR100182]

2021-04-23 Thread Uros Bizjak via Gcc-patches
64bit loads to/stores from x87 and SSE registers are atomic also on 32-bit
targets, so there is no need for additional atomic moves to a temporary
register.

Introduced load peephole2 patterns assume that there won't be any additional
loads from the load location outside the peepholed sequence and wrongly
removed the source location initialization.

OTOH, introduced store peephole2 patterns assume there won't be any additional
loads from the stored location outside the peepholed sequence and wrongly
removed the destination location initialization.  Note that we can't use plain
x87 FST instruction to initialize destination location because FST converts
the value to the double-precision format, changing bits during move.

The patch restores removed initializations in load and store patterns.
Additionally, plain x87 FST in store peephole2 patterns is prevented by
limiting the store operand source to SSE registers.

2021-04-23  Uroš Bizjak  

gcc/
PR target/100182
* config/i386/sync.md (FILD_ATOMIC/FIST_ATOMIC FP load peephole2):
Copy operand 3 to operand 4.  Use sse_reg_operand
as operand 3 predicate.
(FILD_ATOMIC/FIST_ATOMIC FP load peephole2 with mem blockage): Ditto.
(LDX_ATOMIC/STX_ATOMIC FP load peephole2): Ditto.
(LDX_ATOMIC/LDX_ATOMIC FP load peephole2 with mem blockage): Ditto.
(FILD_ATOMIC/FIST_ATOMIC FP store peephole2):
Copy operand 1 to operand 0.
(FILD_ATOMIC/FIST_ATOMIC FP store peephole2 with mem blockage): Ditto.
(LDX_ATOMIC/STX_ATOMIC FP store peephole2): Ditto.
(LDX_ATOMIC/LDX_ATOMIC FP store peephole2 with mem blockage): Ditto.

gcc/testsuite/

PR target/100182
* gcc.target/i386/pr100182.c: New test.
* gcc.target/i386/pr71245-1.c (dg-final): Xfail scan-assembler-not.
* gcc.target/i386/pr71245-2.c (dg-final): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to mainline, will be pushed to other release branches once gcc-11 opens.

Uros.
diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
index c7c508c8de8..7913b918796 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -226,12 +226,13 @@
(set (match_operand:DI 2 "memory_operand")
(unspec:DI [(match_dup 0)]
   UNSPEC_FIST_ATOMIC))
-   (set (match_operand:DF 3 "any_fp_register_operand")
+   (set (match_operand:DF 3 "sse_reg_operand")
(match_operand:DF 4 "memory_operand"))]
   "!TARGET_64BIT
&& peep2_reg_dead_p (2, operands[0])
&& rtx_equal_p (XEXP (operands[4], 0), XEXP (operands[2], 0))"
-  [(set (match_dup 3) (match_dup 5))]
+  [(set (match_dup 3) (match_dup 5))
+   (set (match_dup 4) (match_dup 3))]
   "operands[5] = gen_lowpart (DFmode, operands[1]);")
 
 (define_peephole2
@@ -243,7 +244,7 @@
   UNSPEC_FIST_ATOMIC))
(set (mem:BLK (scratch:SI))
(unspec:BLK [(mem:BLK (scratch:SI))] UNSPEC_MEMORY_BLOCKAGE))
-   (set (match_operand:DF 3 "any_fp_register_operand")
+   (set (match_operand:DF 3 "sse_reg_operand")
(match_operand:DF 4 "memory_operand"))]
   "!TARGET_64BIT
&& peep2_reg_dead_p (2, operands[0])
@@ -251,6 +252,7 @@
   [(const_int 0)]
 {
   emit_move_insn (operands[3], gen_lowpart (DFmode, operands[1]));
+  emit_move_insn (operands[4], operands[3]);
   emit_insn (gen_memory_blockage ());
   DONE;
 })
@@ -262,12 +264,13 @@
(set (match_operand:DI 2 "memory_operand")
(unspec:DI [(match_dup 0)]
   UNSPEC_STX_ATOMIC))
-   (set (match_operand:DF 3 "any_fp_register_operand")
+   (set (match_operand:DF 3 "sse_reg_operand")
(match_operand:DF 4 "memory_operand"))]
   "!TARGET_64BIT
&& peep2_reg_dead_p (2, operands[0])
&& rtx_equal_p (XEXP (operands[4], 0), XEXP (operands[2], 0))"
-  [(set (match_dup 3) (match_dup 5))]
+  [(set (match_dup 3) (match_dup 5))
+   (set (match_dup 4) (match_dup 3))]
   "operands[5] = gen_lowpart (DFmode, operands[1]);")
 
 (define_peephole2
@@ -279,7 +282,7 @@
   UNSPEC_STX_ATOMIC))
(set (mem:BLK (scratch:SI))
(unspec:BLK [(mem:BLK (scratch:SI))] UNSPEC_MEMORY_BLOCKAGE))
-   (set (match_operand:DF 3 "any_fp_register_operand")
+   (set (match_operand:DF 3 "sse_reg_operand")
(match_operand:DF 4 "memory_operand"))]
   "!TARGET_64BIT
&& peep2_reg_dead_p (2, operands[0])
@@ -287,6 +290,7 @@
   [(const_int 0)]
 {
   emit_move_insn (operands[3], gen_lowpart (DFmode, operands[1]));
+  emit_move_insn (operands[4], operands[3]);
   emit_insn (gen_memory_blockage ());
   DONE;
 })
@@ -392,7 +396,8 @@
   "!TARGET_64BIT
&& peep2_reg_dead_p (3, operands[2])
&& rtx_equal_p (XEXP (operands[0], 0), XEXP (operands[3], 0))"
-  [(set (match_dup 5) (match_dup 1))]
+  [(set (match_dup 0) (match_dup 1))
+   (set (match_dup 5) (match_dup 1))]
   "operands[5] = gen_lowpart (DFmode, operands[4]);")
 
 (define_peephole2
@@ -411,6 +416,7 @@
&& rtx_equal_p (XEXP (operands[0], 0), XEXP (operands[3], 0))"
   [(const_int 0)]
 {
+  emit_move_insn 

Re: [PATCH] ix86: Support V{2, 4}DImode arithmetic right shifts for SSE2+ [PR98856]

2021-04-23 Thread Jakub Jelinek via Gcc-patches
On Tue, Feb 09, 2021 at 12:12:24PM +0100, Jakub Jelinek via Gcc-patches wrote:
> As mentioned in the PR, we don't support arithmetic right V2DImode or
> V4DImode on x86 without -mavx512vl or -mxop.  The ISAs indeed don't have
> {,v}psraq instructions until AVX512VL, but we actually can emulate it quite
> easily.
> One case is arithmetic >> 63, we can just emit {,v}pxor; {,v}pcmpgt for
> that for SSE4.2+, or for SSE2 psrad $31; pshufd $0xf5.
> Then arithmetic >> by constant > 32, that can be done with {,v}psrad $31
> and {,v}psrad $(cst-32) and two operand permutation,
> arithmetic >> 32 can be done as {,v}psrad $31 and permutation of that
> and the original operand.  Arithmetic >> by constant < 32 can be done
> as {,v}psrad $cst and {,v}psrlq $cst and two operand permutation.
> And arithmetic >> by variable scalar amount can be done as
> arithmetic >> 63, logical >> by the amount, << by (64 - amount of the
> >> 63 result; note that the vector << 64 result in 0) and oring together.
> 
> I had to improve the permutation generation so that it actually handles
> the needed permutations (or handles them better).
> 
> Richard, does this actually improve the benchmark that regressed?
> 
> If not, I guess this is a GCC 12 material.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux.
> 
> 2021-02-09  Jakub Jelinek  
> 
>   PR tree-optimization/98856
>   * config/i386/i386.c (ix86_shift_rotate_cost): Add CODE argument.
>   Expect V2DI and V4DI arithmetic right shifts to be emulated.
>   (ix86_rtx_costs, ix86_add_stmt_cost): Adjust ix86_shift_rotate_cost
>   caller.
>   * config/i386/i386-expand.c (expand_vec_perm_2perm_interleave,
>   expand_vec_perm_2perm_pblendv): New functions.
>   (ix86_expand_vec_perm_const_1): Use them.
>   * config/i386/sse.md (ashr3): Rename to ...
>   (ashr3): ... this.
>   (ashr3): New define_expand with VI248_AVX512BW iterator.
>   (ashrv4di3): New define_expand.
>   (ashrv2di3): Change condition to TARGET_SSE2, handle !TARGET_XOP
>   and !TARGET_AVX512VL expansion.
> 
>   * gcc.target/i386/sse2-psraq-1.c: New test.
>   * gcc.target/i386/sse4_2-psraq-1.c: New test.
>   * gcc.target/i386/avx-psraq-1.c: New test.
>   * gcc.target/i386/avx2-psraq-1.c: New test.
>   * gcc.target/i386/avx-pr82370.c: Adjust expected number of vpsrad
>   instructions.
>   * gcc.target/i386/avx2-pr82370.c: Likewise.
>   * gcc.target/i386/avx512f-pr82370.c: Likewise.
>   * gcc.target/i386/avx512bw-pr82370.c: Likewise.
>   * gcc.dg/torture/vshuf-4.inc: Add two further permutations.
>   * gcc.dg/torture/vshuf-8.inc: Likewise.

Ok for trunk now?
https://gcc.gnu.org/pipermail/gcc-patches/2021-February/565026.html

Jakub
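
For reference, the variable-amount case from the quoted description can be written out on a single 64-bit scalar.  This is only a sketch of the arithmetic, not the vector expansion in the patch; the function name ashr64_emulated is hypothetical.  One difference is called out in the comments: the description notes that a vector << 64 yields 0, whereas in C a shift by 64 is undefined, so the zero-amount case is handled explicitly.

```c
#include <stdint.h>

/* Emulate an arithmetic right shift of X by AMOUNT (0..63) using only
   a sign fill, a logical right shift, a left shift and an OR.  */
uint64_t ashr64_emulated (uint64_t x, unsigned amount)
{
  /* Arithmetic >> 63: all ones for negative values, zero otherwise.  */
  uint64_t sign_fill = (x >> 63) ? UINT64_MAX : 0;

  /* Logical >> by the amount: the top AMOUNT bits become zero.  */
  uint64_t low = x >> amount;

  /* Sign fill << (64 - amount): supplies the top AMOUNT bits.  The
     vector instruction gives 0 for a shift by 64; in C that shift is
     undefined, so AMOUNT == 0 is special-cased.  */
  uint64_t high = (amount == 0) ? 0 : (sign_fill << (64 - amount));

  return low | high;
}
```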



Re: [PATCH] match.pd: Add some __builtin_ctz (x) cmp cst simplifications [PR95527]

2021-04-23 Thread Jakub Jelinek via Gcc-patches
On Tue, Feb 02, 2021 at 07:40:02PM +0100, Jakub Jelinek via Gcc-patches wrote:
> On Tue, Feb 02, 2021 at 11:39:30AM -0700, Jeff Law wrote:
> > > This patch adds some ctz simplifications (e.g. ctz (x) >= 3 can be done by
> > > testing if the low 3 bits are zero, etc.).
> > >
> > > In addition, I've noticed that in the CLZ case, the
> > > #ifdef CLZ_DEFINED_VALUE_AT_ZERO don't really work as intended, they
> > > are evaluated during genmatch and the macro is not defined then
> > > (but, because of the missing tm.h includes it isn't defined in
> > > gimple-match.c or generic-match.c either).  And when tm.h is included,
> > > defaults.h is included which defines a fallback version of that macro.
> > >
> > > For GCC 12, I wonder if it wouldn't be better to say in addition to 
> > > __builtin_c[lt]z*
> > > is always UB at zero that it would be undefined for .C[LT]Z ifn too if it
> > > has just one operand and use a second operand to be the constant we expect
> > > at zero.
> > >
> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > >
> > > 2021-01-16  Jakub Jelinek  
> > >
> > >   PR tree-optimization/95527
> > >   * generic-match-head.c: Include tm.h.
> > >   * gimple-match-head.c: Include tm.h.
> > >   * match.pd (CLZ == INTEGER_CST): Don't use
> > >   #ifdef CLZ_DEFINED_VALUE_AT_ZERO, only test CLZ_DEFINED_VALUE_AT_ZERO
> > >   if clz == CFN_CLZ.  Add missing val declaration.
> > >   (CTZ cmp CST): New simplifications.
> > >
> > >   * gcc.dg/tree-ssa/pr95527-2.c: New test.
> > Similarly.  I'd lean towards deferring to gcc-12.
> 
> Ok, will repost at the start of stage1 then (for both).

Ok for trunk now?
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563711.html

Jakub
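
As a small illustration of the kind of equivalence mentioned in the quoted description ("ctz (x) >= 3 can be done by testing if the low 3 bits are zero"), here is a sketch in C.  The helper name ctz_at_least is hypothetical, and the code assumes x is nonzero and k is smaller than the width of unsigned int; it is not the match.pd pattern itself.

```c
/* Nonzero if X has at least K trailing zero bits.  Assumes X != 0 and
   K < the number of bits in unsigned int.  */
int ctz_at_least (unsigned x, unsigned k)
{
  unsigned low_mask = (1u << k) - 1u;	/* The K lowest bits.  */
  return (x & low_mask) == 0;
}
```

For nonzero x this gives the same answer as comparing __builtin_ctz (x) against k, without computing the count.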



Re: [PATCH] expand: Expand x / y * y as x - x % y if the latter is cheaper [PR96696]

2021-04-23 Thread Jakub Jelinek via Gcc-patches
On Tue, Feb 02, 2021 at 11:38:11AM -0700, Jeff Law via Gcc-patches wrote:
> On 1/16/21 11:13 AM, Jakub Jelinek wrote:
> > Hi!
> >
> > The following patch tests both x / y * y and x - x % y expansion for the
> > former GIMPLE code and chooses the cheaper of those sequences.
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >
> > 2021-01-16  Jakub Jelinek  
> >
> > PR tree-optimization/96696
> > * expr.c (expand_expr_divmod): New function.
> > (expand_expr_real_2) : Use it for truncations and
> > divisions.  Formatting fixes.
> > : Optimize x / y * y as x - x % y if the latter is
> > cheaper.
> >
> > * gcc.target/i386/pr96696.c: New test.
> Given this is strictly a missed optimization, I'd lean towards deferring
> to gcc-12 at this point.  Thoughts?

Ok for trunk now?
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563710.html

Jakub
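
The identity behind the quoted patch can be checked directly in C, where integer division truncates towards zero and the remainder takes the sign of the dividend.  The function names below are hypothetical; both assume y is nonzero and that x / y does not overflow.

```c
/* Round X towards zero to a multiple of Y using divide and multiply.  */
int round_via_div_mul (int x, int y)
{
  return x / y * y;
}

/* The same value computed with a remainder and a subtraction, which is
   the alternative sequence the expander may pick when it is cheaper.  */
int round_via_rem_sub (int x, int y)
{
  return x - x % y;
}
```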



[committed 3/3] libstdc++: Allow net::io_context to compile without <poll.h> [PR 100180]

2021-04-23 Thread Jonathan Wakely via Gcc-patches

This adds dummy placeholders to net::io_context so that it can still be
compiled on targets without <poll.h>.

Tested powerpc64le-linux and sparc-solaris. Committed to trunk.

This could be backported so that it fixes PR 100180 everywhere, but
after the gcc-11 release.


commit 0e1e7b77904f1fe2a6dbfe84bb4fc026584ba480
Author: Jonathan Wakely 
Date:   Fri Apr 23 13:38:05 2021

libstdc++: Allow net::io_context to compile without <poll.h> [PR 100180]

This adds dummy placeholders to net::io_context so that it can still be
compiled on targets without <poll.h>.

libstdc++-v3/ChangeLog:

PR libstdc++/100180
* include/experimental/io_context (io_context): Define
dummy_pollfd type so that most member functions still compile
without <poll.h> and struct pollfd.

diff --git a/libstdc++-v3/include/experimental/io_context b/libstdc++-v3/include/experimental/io_context
index 82d7b4f545e..63d7db5b2d0 100644
--- a/libstdc++-v3/include/experimental/io_context
+++ b/libstdc++-v3/include/experimental/io_context
@@ -716,6 +716,7 @@ inline namespace v1
 
 struct __reactor
 {
+#ifdef _GLIBCXX_HAVE_POLL_H
   __reactor() : _M_fds(1)
   {
 	int __pipe[2];
@@ -739,6 +740,7 @@ inline namespace v1
 	::close(_M_fds.back().fd);
 	::close(_M_notify_wr);
   }
+#endif
 
   // write a notification byte to the pipe (ignoring errors)
   void _M_notify()
@@ -799,8 +801,12 @@ inline namespace v1
 	_M_notify();
   }
 
-# ifdef _GLIBCXX_HAVE_POLL_H
+#ifdef _GLIBCXX_HAVE_POLL_H
   using __fdvec = vector<::pollfd>;
+#else
+  struct dummy_pollfd { int fd = -1; short events = 0, revents = 0; };
+  using __fdvec = vector<dummy_pollfd>;
+#endif
 
   // Find first element p such that !(p.fd < __fd)
   // N.B. always returns a dereferencable iterator.
@@ -816,6 +822,7 @@ inline namespace v1
   __status
   wait(__fdvec& __fds, chrono::milliseconds __timeout)
   {
+#ifdef _GLIBCXX_HAVE_POLL_H
 	// XXX not thread-safe!
 	__fds = _M_fds;  // take snapshot to pass to poll()
 
@@ -845,10 +852,14 @@ inline namespace v1
 	__fds.erase(__part, __fds.end());
 
 	return _S_ok;
+#else
+	(void) __timeout;
+	__fds.clear();
+	return _S_error;
+#endif
   }
 
   __fdvec _M_fds;	// _M_fds.back() is the read end of the self-pipe
-#endif
   int _M_notify_wr;	// write end of the self-pipe
 };
 


[committed 2/3] libstdc++ Clarify argument to net::io_context::async_wait

2021-04-23 Thread Jonathan Wakely via Gcc-patches

Add a comment documenting the __w parameter of the private
io_context::async_wait function. Add casts to callers, making the
conversions explicit.

Tested powerpc64le-linux and sparc-solaris. Committed to trunk.


commit 3517dfe05c05a48885149334143230fcf0ebe6be
Author: Jonathan Wakely 
Date:   Fri Apr 23 13:31:33 2021

libstdc++: Clarify argument to net::io_context::async_wait

Add a comment documenting the __w parameter of the private
io_context::async_wait function. Add casts to callers, making the
conversions explicit.

libstdc++-v3/ChangeLog:

* include/experimental/io_context (io_context::async_wait): Add
comment.
* include/experimental/socket (basic_socket::async_connect):
Cast wait_type constant to int.
(basic_datagram_socket::async_receive): Likewise.
(basic_datagram_socket::async_receive_from): Likewise.
(basic_datagram_socket::async_send): Likewise.
(basic_datagram_socket::async_send_to): Likewise.
(basic_stream_socket::async_receive): Likewise.
(basic_stream_socket::async_send): Likewise. Use io_context
parameter directly, instead of via an executor.
(basic_socket_acceptor::async_accept): Likewise.

diff --git a/libstdc++-v3/include/experimental/io_context b/libstdc++-v3/include/experimental/io_context
index c82f30cd119..82d7b4f545e 100644
--- a/libstdc++-v3/include/experimental/io_context
+++ b/libstdc++-v3/include/experimental/io_context
@@ -475,6 +475,9 @@ inline namespace v1
 	return 0;
   }
 
+// The caller must know what the wait-type __w will be interpreted.
+// In the current implementation the reactor is based on <poll.h>
+// so the parameter must be one of POLLIN, POLLOUT or POLLERR.
 template<typename _Op>
   void
   async_wait(int __fd, int __w, _Op&& __op)
diff --git a/libstdc++-v3/include/experimental/socket b/libstdc++-v3/include/experimental/socket
index ec4ed9d95e2..09c3b729607 100644
--- a/libstdc++-v3/include/experimental/socket
+++ b/libstdc++-v3/include/experimental/socket
@@ -954,7 +954,7 @@ inline namespace v1
 	}
 
 	  get_executor().context().async_wait( native_handle(),
-	  socket_base::wait_read,
+	  (int) socket_base::wait_read,
 	  [__h = std::move(__init.completion_handler),
__ep = std::move(__endpoint),
__fd = native_handle()]
@@ -1165,7 +1165,7 @@ inline namespace v1
 __init{__token};
 
 	  this->get_executor().context().async_wait(this->native_handle(),
-	  socket_base::wait_read,
+	  (int) socket_base::wait_read,
 	  [__h = std::move(__init.completion_handler),
&__buffers, __flags = static_cast<int>(__flags),
__fd = this->native_handle()]
@@ -1271,7 +1271,7 @@ inline namespace v1
 __init{__token};
 
 	  this->get_executor().context().async_wait( this->native_handle(),
-	  socket_base::wait_read,
+	  (int) socket_base::wait_read,
 	  [__h = std::move(__init.completion_handler),
&__buffers, __flags = static_cast<int>(__flags),
__sender = std::move(__sender),
@@ -1366,7 +1366,7 @@ inline namespace v1
 __init{__token};
 
 	  this->get_executor().context().async_wait( this->native_handle(),
-	  socket_base::wait_write,
+	  (int) socket_base::wait_write,
 	  [__h = std::move(__init.completion_handler),
&__buffers, __flags = static_cast<int>(__flags),
__fd = this->native_handle()]
@@ -1469,7 +1469,7 @@ inline namespace v1
 __init{__token};
 
 	  this->get_executor().context().async_wait( this->native_handle(),
-	  socket_base::wait_write,
+	  (int) socket_base::wait_write,
 	  [__h = std::move(__init.completion_handler),
&__buffers, __flags = static_cast<int>(__flags),
__recipient = std::move(__recipient),
@@ -1634,7 +1634,7 @@ inline namespace v1
 	}
 
   this->get_executor().context().async_wait(this->native_handle(),
-	  socket_base::wait_read,
+	  (int) socket_base::wait_read,
 	  [__h = std::move(__init.completion_handler),
&__buffers, __flags = static_cast<int>(__flags),
__fd = this->native_handle()]
@@ -1741,7 +1741,7 @@ inline namespace v1
 	}
 
   this->get_executor().context().async_wait(this->native_handle(),
-	  socket_base::wait_write,
+	  (int) socket_base::wait_write,
 	  [__h = std::move(__init.completion_handler),
&__buffers, __flags = static_cast<int>(__flags),
__fd = this->native_handle()]
@@ -2098,8 +2098,8 @@ inline namespace v1
   async_completion<_CompletionToken, void(error_code, socket_type)>
 __init{__token};
 
-	  __ctx.get_executor().context().async_wait(native_handle(),
-	  socket_base::wait_read,
+	  __ctx.async_wait(native_handle(),
+	  (int) socket_base::wait_read,
 	  [__h = 

[committed 1/3] libstdc++ Simplify definition of net::socket_base constants

2021-04-23 Thread Jonathan Wakely via Gcc-patches
libstdc++-v3/ChangeLog:

* include/experimental/socket (socket_base::shutdown_type):
(socket_base::wait_type, socket_base::message_flags):
Remove enumerators. Initialize constants directly with desired
values.
(socket_base::message_flags): Make all operators constexpr and
noexcept.
* testsuite/util/testsuite_common_types.h (test_bitmask_values):
New test utility.
* testsuite/experimental/net/socket/socket_base.cc: New test.

Tested powerpc64le-linux and sparc-solaris. Committed to trunk.
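For readers unfamiliar with the idiom, here is a minimal sketch of an enumerator-less enum with a fixed underlying type plus constexpr noexcept bitmask operators. It is not the libstdc++ header: the type name flags and the values 1, 2 and 4 are assumptions chosen for illustration, whereas the patch initializes its constants from MSG_PEEK, MSG_OOB and MSG_DONTROUTE.

```cpp
#include <cassert>

// An enumeration with a fixed underlying type and no enumerators:
// every int value is a valid value of the type.
enum flags : int { };

// Hypothetical values for this sketch only.
constexpr flags flag_peek      = (flags)1;
constexpr flags flag_oob       = (flags)2;
constexpr flags flag_dontroute = (flags)4;

constexpr flags operator&(flags a, flags b) noexcept
{ return flags(int(a) & int(b)); }

constexpr flags operator|(flags a, flags b) noexcept
{ return flags(int(a) | int(b)); }

constexpr flags operator^(flags a, flags b) noexcept
{ return flags(int(a) ^ int(b)); }

constexpr flags operator~(flags a) noexcept
{ return flags(~int(a)); }

// constexpr compound assignment needs C++14 or later.
constexpr flags& operator|=(flags& a, flags b) noexcept
{ return a = (a | b); }

static_assert((flag_peek | flag_oob) == (flags)3, "bitwise or");
static_assert(noexcept(flag_peek & flag_oob), "operators do not throw");
```

Because the underlying type is fixed, results such as ~flag_peek are still valid values of the enumeration even though no enumerator names them.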

commit a752a43073dc49909c017fd52feacd7526ed31c0
Author: Jonathan Wakely 
Date:   Fri Apr 23 13:25:56 2021

libstdc++ Simplify definition of net::socket_base constants

libstdc++-v3/ChangeLog:

* include/experimental/socket (socket_base::shutdown_type):
(socket_base::wait_type, socket_base::message_flags):
Remove enumerators. Initialize constants directly with desired
values.
(socket_base::message_flags): Make all operators constexpr and
noexcept.
* testsuite/util/testsuite_common_types.h (test_bitmask_values):
New test utility.
* testsuite/experimental/net/socket/socket_base.cc: New test.

diff --git a/libstdc++-v3/include/experimental/socket b/libstdc++-v3/include/experimental/socket
index a5a23ed3c06..ec4ed9d95e2 100644
--- a/libstdc++-v3/include/experimental/socket
+++ b/libstdc++-v3/include/experimental/socket
@@ -250,37 +250,29 @@ inline namespace v1
   static const int _S_name = SO_SNDLOWAT;
 };
 
-enum shutdown_type : int
-{
-  __shutdown_receive   = SHUT_RD,
-  __shutdown_send  = SHUT_WR,
-  __shutdown_both  = SHUT_RDWR
-};
-static constexpr shutdown_type shutdown_receive= __shutdown_receive;
-static constexpr shutdown_type shutdown_send   = __shutdown_send;
-static constexpr shutdown_type shutdown_both   = __shutdown_both;
+enum shutdown_type : int { };
+static constexpr shutdown_type shutdown_receive = (shutdown_type)SHUT_RD;
+static constexpr shutdown_type shutdown_send= (shutdown_type)SHUT_WR;
+static constexpr shutdown_type shutdown_both= (shutdown_type)SHUT_RDWR;
 
+enum wait_type : int { };
 #ifdef _GLIBCXX_HAVE_POLL_H
-enum wait_type : int
-{
-  __wait_read  = POLLIN,
-  __wait_write = POLLOUT,
-  __wait_error = POLLERR
-};
-static constexpr wait_type wait_read   = __wait_read;
-static constexpr wait_type wait_write  = __wait_write;
-static constexpr wait_type wait_error  = __wait_error;
+static constexpr wait_type wait_read  = (wait_type)POLLIN;
+static constexpr wait_type wait_write = (wait_type)POLLOUT;
+static constexpr wait_type wait_error = (wait_type)POLLERR;
+#else
+static constexpr wait_type wait_read  = (wait_type)1;
+static constexpr wait_type wait_write = (wait_type)2;
+static constexpr wait_type wait_error = (wait_type)4;
 #endif
 
-enum message_flags : int
-{
-  __message_peek   = MSG_PEEK,
-  __message_oob= MSG_OOB,
-  __message_dontroute  = MSG_DONTROUTE
-};
-static constexpr message_flags message_peek= __message_peek;
-static constexpr message_flags message_out_of_band = __message_oob;
-static constexpr message_flags message_do_not_route= __message_dontroute;
+enum message_flags : int { };
+static constexpr message_flags message_peek
+  = (message_flags)MSG_PEEK;
+static constexpr message_flags message_out_of_band
+  = (message_flags)MSG_OOB;
+static constexpr message_flags message_do_not_route
+  = (message_flags)MSG_DONTROUTE;
 
 static const int max_listen_connections = SOMAXCONN;
 #endif
@@ -350,30 +342,37 @@ inline namespace v1
 
   constexpr socket_base::message_flags
   operator&(socket_base::message_flags __f1, socket_base::message_flags __f2)
+noexcept
   { return socket_base::message_flags( int(__f1) & int(__f2) ); }
 
   constexpr socket_base::message_flags
   operator|(socket_base::message_flags __f1, socket_base::message_flags __f2)
+noexcept
   { return socket_base::message_flags( int(__f1) | int(__f2) ); }
 
   constexpr socket_base::message_flags
   operator^(socket_base::message_flags __f1, socket_base::message_flags __f2)
+noexcept
   { return socket_base::message_flags( int(__f1) ^ int(__f2) ); }
 
   constexpr socket_base::message_flags
   operator~(socket_base::message_flags __f)
+noexcept
   { return socket_base::message_flags( ~int(__f) ); }
 
-  inline socket_base::message_flags&
+  constexpr socket_base::message_flags&
   operator&=(socket_base::message_flags& __f1, socket_base::message_flags __f2)
+noexcept
   { return __f1 = (__f1 & __f2); }
 
-  inline socket_base::message_flags&
+  constexpr 

Re: [PATCH] early-remat.c: Fix new/delete mismatch [PR100230]

2021-04-23 Thread Richard Biener via Gcc-patches
On Fri, Apr 23, 2021 at 2:13 PM Alex Coplan via Gcc-patches
 wrote:
>
> Hi,
>
> This simple patch fixes a mismatched operator new/delete in
> early-remat.c which triggers ASan errors on (at least) AArch64 when
> compiling SVE code.
>
> Bootstrap and regtest on aarch64-linux-gnu in progress.
>
> OK for trunk and backports (as appropriate) if testing looks good?

OK.

Thanks,
Richard.

> Thanks,
> Alex
>
> gcc/ChangeLog:
>
> PR rtl-optimization/100230
> * early-remat.c (early_remat::sort_candidates): Use delete[]
> instead of delete for array allocated with new[].


[PATCH] VEC_COND_EXPR code cleanup

2021-04-23 Thread Richard Biener
This removes now unnecessary special-casings of VEC_COND_EXPRs after
making its first operand a gimple value.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-04-14  Richard Biener  

* genmatch.c (lower_cond): Remove VEC_COND_EXPR special-casing.
(capture_info::capture_info): Likewise.
(capture_info::walk_match): Likewise.
(expr::gen_transform): Likewise.
(dt_simplify::gen_1): Likewise.
* gimple-match-head.c (maybe_resimplify_conditional_op):
Remove VEC_COND_EXPR special-casing.
(gimple_simplify): Likewise.
* gimple.c (gimple_could_trap_p_1): Adjust.
* tree-ssa-pre.c (compute_avail): Allow VEC_COND_EXPR
to participate in PRE.
---
 gcc/genmatch.c          | 20 +++-
 gcc/gimple-match-head.c |  9 -
 gcc/gimple.c            |  7 +--
 gcc/tree-ssa-pre.c      |  9 -
 4 files changed, 20 insertions(+), 25 deletions(-)

diff --git a/gcc/genmatch.c b/gcc/genmatch.c
index 8311f5d768a..5db1d969688 100644
--- a/gcc/genmatch.c
+++ b/gcc/genmatch.c
@@ -1210,7 +1210,7 @@ lower_opt (simplify *s, vec<simplify *>& simplifiers)
 }
 }
 
-/* Lower the compare operand of COND_EXPRs and VEC_COND_EXPRs to a
+/* Lower the compare operand of COND_EXPRs to a
GENERIC and a GIMPLE variant.  */
 
 static vec<operand *>
@@ -1257,8 +1257,7 @@ lower_cond (operand *o)
   /* If this is a COND with a captured expression or an
  expression with two operands then also match a GENERIC
 form on the compare.  */
-  if ((*e->operation == COND_EXPR
-  || *e->operation == VEC_COND_EXPR)
+  if (*e->operation == COND_EXPR
  && ((is_a <capture *> (e->ops[0])
   && as_a <capture *> (e->ops[0])->what
   && is_a <expr *> (as_a <capture *> (e->ops[0])->what)
@@ -1296,7 +1295,7 @@ lower_cond (operand *o)
   return ro;
 }
 
-/* Lower the compare operand of COND_EXPRs and VEC_COND_EXPRs to a
+/* Lower the compare operand of COND_EXPRs to a
GENERIC and a GIMPLE variant.  */
 
 static void
@@ -2132,9 +2131,7 @@ capture_info::capture_info (simplify *s, operand *result, bool gimple_)
(i != 0 && *e->operation == COND_EXPR)
|| *e->operation == TRUTH_ANDIF_EXPR
|| *e->operation == TRUTH_ORIF_EXPR,
-   i == 0
-   && (*e->operation == COND_EXPR
-   || *e->operation == VEC_COND_EXPR));
+   i == 0 && *e->operation == COND_EXPR);
 
   walk_result (s->result, false, result);
 }
@@ -2197,8 +2194,7 @@ capture_info::walk_match (operand *o, unsigned toplevel_arg,
   || *e->operation == TRUTH_ORIF_EXPR)
cond_p = true;
  if (i == 0
- && (*e->operation == COND_EXPR
- || *e->operation == VEC_COND_EXPR))
+ && *e->operation == COND_EXPR)
expr_cond_p = true;
  walk_match (e->ops[i], toplevel_arg, cond_p, expr_cond_p);
}
@@ -2494,8 +2490,7 @@ expr::gen_transform (FILE *f, int indent, const char *dest, bool gimple,
i == 0 ? NULL : op0type);
   ops[i]->gen_transform (f, indent, dest1, gimple, depth + 1, optype1,
 cinfo, indexes,
-(*opr == COND_EXPR
- || *opr == VEC_COND_EXPR) && i == 0 ? 1 : 2);
+*opr == COND_EXPR && i == 0 ? 1 : 2);
 }
 
   const char *opr_name;
@@ -3417,8 +3412,7 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, operand *result)
 into COND_EXPRs.  */
  int cond_handling = 0;
  if (!is_predicate)
-   cond_handling = ((*opr == COND_EXPR
- || *opr == VEC_COND_EXPR) && j == 0) ? 1 : 2;
+   cond_handling = (*opr == COND_EXPR && j == 0) ? 1 : 2;
  e->ops[j]->gen_transform (f, indent, dest, true, 1, optype,
&cinfo, indexes, cond_handling);
}
diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
index d941b8b386f..84fbaefd762 100644
--- a/gcc/gimple-match-head.c
+++ b/gcc/gimple-match-head.c
@@ -147,10 +147,10 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
   tree_code op_code = (tree_code) res_op->code;
   bool op_could_trap;
 
-  /* COND_EXPR and VEC_COND_EXPR will trap if, and only if, the condition
+  /* COND_EXPR will trap if, and only if, the condition
 traps and hence we have to check this.  For all other operations, we
 don't need to consider the operands.  */
-  if (op_code == COND_EXPR || op_code == VEC_COND_EXPR)
+  if (op_code == COND_EXPR)
op_could_trap = generic_expr_could_trap_p (res_op->ops[0]);
   else
op_could_trap = operation_could_trap_p ((tree_code) res_op->code,
@@ -961,10 +961,9 @@ gimple_simplify (gimple *stmt, gimple_match_op *res_op, gimple_seq *seq,
{
  bool 

Re: [PATCH] gcov: Use system IO buffering

2021-04-23 Thread Martin Liška

On 4/23/21 11:44 AM, Richard Biener wrote:

On Fri, Apr 23, 2021 at 11:24 AM Martin Liška  wrote:


On 4/23/21 8:49 AM, Richard Biener wrote:

On Thu, Apr 22, 2021 at 9:47 PM Andi Kleen via Gcc-patches
 wrote:


Martin Liška  writes:


Hey.

I/O buffering in gcov seems to duplicate what a modern C library can provide.
The patch is a simplification and can provide easier interface for system
that don't have a filesystem and would like using GCOV.

I'm going to install the patch after 11.1 if there are no objections.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.


What happens if someone compiles the C library with gcov?


Haven't tried that..



Yeah, I think this is the wrong direction - we're already having
issues with using
malloc, this makes it much worse.


I don't see where the problem with my patch is. It's not adding usage of any additional
system routines.


True.  So the only impact should be more calls to libc (not necessarily more
I/O since libc should do buffering).


Exactly.



"The patch is a simplification and can provide easier interface for system
that don't have a filesystem and would like using GCOV."

Can you explain?


It's described in the following thread:
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/559342.html

Martin




Martin



Richard.


Being as self contained as possible (only system calls) would seem
safer.

-Andi






Re: [PATCH] arm: Fix PCS for SFmode -> SImode libcalls [PR99748]

2021-04-23 Thread Alex Coplan via Gcc-patches
On 01/04/2021 18:35, Richard Earnshaw wrote:
> 
> 
> On 01/04/2021 17:11, Alex Coplan via Gcc-patches wrote:
> > Hi all,
> > 
> > This patch fixes PR99748 which shows us trying to pass the argument to
> > __aeabi_f2iz in the VFP register s0 when the library function is
> > expecting to use the GPR r0. It also fixes the __aeabi_f2uiz case which
> > was broken in the same way.
> > 
> > For the testcase in the PR, here is the code we generate before the
> > patch (with -mfloat-abi=hard -march=armv8.1-m.main+mve -O0):
> > 
> > main:
> >  push    {r7, lr}
> >  sub     sp, sp, #8
> >  add     r7, sp, #0
> >  mov     r3, #1065353216
> >  str     r3, [r7, #4]    @ float
> >  vldr.32 s0, [r7, #4]
> >  bl      __aeabi_f2iz
> >  mov     r3, r0
> >  cmp     r3, #1
> >  [...]
> > 
> > This becomes:
> > 
> > main:
> >  push    {r7, lr}
> >  sub     sp, sp, #8
> >  add     r7, sp, #0
> >  mov     r3, #1065353216
> >  str     r3, [r7, #4]    @ float
> >  ldr     r0, [r7, #4]    @ float
> >  bl      __aeabi_f2iz
> >  mov     r3, r0
> >  cmp     r3, #1
> >  [...]
> > 
> > after the patch. We see a similar change for the same testcase with a
> > cast to unsigned instead of int.
> > 
> > Testing:
> >   * Bootstrapped and regtested on arm-linux-gnueabihf, no regressions.
> >   * Regtested an arm-eabi cross configured with --with-float=hard
> > --with-arch=armv8.1-m.main+mve. This shows that the patch fixes the
> > following execution failures:
> > 
> > FAIL->PASS: gcc.c-torture/execute/2605-1.c   -O0  execution test
> > FAIL->PASS: gcc.c-torture/execute/conversion.c   -O0  execution test
> > FAIL->PASS: gcc.c-torture/execute/float-floor.c   -O0  execution test
> > FAIL->PASS: gcc.c-torture/execute/float-floor.c   -O1  execution test
> > FAIL->PASS: gcc.c-torture/execute/float-floor.c   -O2  execution test
> > FAIL->PASS: gcc.c-torture/execute/float-floor.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
> > FAIL->PASS: gcc.c-torture/execute/float-floor.c   -O3 -g  execution test
> > FAIL->PASS: gcc.c-torture/execute/float-floor.c   -Os  execution test
> > FAIL->PASS: gcc.c-torture/execute/gofast.c   -O0  execution test
> > FAIL->PASS: gcc.dg/torture/float32-basic.c   -O0  execution test
> > FAIL->PASS: gcc.dg/torture/float32-basic.c   -O1  execution test
> > FAIL->PASS: gcc.dg/torture/float32-basic.c   -O2  execution test
> > FAIL->PASS: gcc.dg/torture/float32-basic.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
> > FAIL->PASS: gcc.dg/torture/float32-basic.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
> > FAIL->PASS: gcc.dg/torture/float32-basic.c   -O3 -g  execution test
> > FAIL->PASS: gcc.dg/torture/float32-basic.c   -Os  execution test
> > 
> > OK for trunk?
> > 
> > Thanks,
> > Alex
> > 
> > gcc/ChangeLog:
> > 
> > * config/arm/arm.c (arm_libcall_uses_aapcs_base): Also use base
> > PCS for [su]fix_optab.
> > 
> 
> OK.
> 
> As a wrong code bug we should probably be looking to backport this if needed
> (though it's likely too late now for 10.3).

Testing shows the patch fixes the issue on the 10 branch. Bootstrapped
on arm-linux-gnueabihf and regtested an MVE cross: no issues.

Cherry-picked as
r10-9755-g283367662c25057fd7c9c98257cca858f85b75fc.

> 
> R.

Thanks,
Alex
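
For reference, source of roughly the following shape exercises the conversions discussed above. This is a sketch and not the testcase from the PR; the function names are made up for illustration. On configurations where GCC implements the float-to-integer conversion with a library call (the thread names __aeabi_f2iz and __aeabi_f2uiz), each cast below becomes such a call, so the argument has to be passed in the register the library routine expects.

```cpp
#include <cassert>

// Convert through a volatile so the compiler cannot fold the
// conversion at compile time.
int to_int(float f)
{
  volatile float v = f;
  return (int) v;       // signed conversion
}

unsigned to_unsigned(float f)
{
  volatile float v = f;
  return (unsigned) v;  // unsigned conversion
}
```

Both conversions truncate toward zero, which makes wrong-register bugs easy to spot in an execution test: the result is then unrelated to the input.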


Re: [PATCH] Bump LTO_major_version to 11.

2021-04-23 Thread Martin Liška
On 4/23/21 12:59 PM, Richard Biener wrote:
> True, the question is on how much detail we have to pay attention to.

Agree with that.

> For us of course the build-id solution works fine.  And hopefully the
> days of PCH are counted...

Yes.

I have a tentative patch that emits the attached checksum.h header file.
We also include flags in the checksum:

...
 build/genchecksum$(build_exeext) $(C_OBJS) $(BACKEND) $(LIBDEPS) \

 checksum-options > cc1-checksum.c.tmp &&   \

...

$ cat checksum-options

g++ -no-pie   -g   -DIN_GCC -fPIC-fno-exceptions -fno-rtti 
-fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wno-error=format-diag -Wmissing-format-attribute 
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
-Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -static-libstdc++ 
-static-libgcc  

Can we ignore them in the checksum calculation?
Martin
/* Checksum based on the following files:

  gt-ada-decl.h
  gt-ada-misc.h
  gt-ada-trans.h
  gt-ada-utils.h
  gt-alias.h
  gt-asan.h
  gt-bitmap.h
  gt-brig-brig-lang.h
  gt-caller-save.h
  gt-calls.h
  gt-c-c-decl.h
  gt-c-c-parser.h
  gt-c-family-c-common.h
  gt-c-family-c-cppbuiltin.h
  gt-c-family-c-format.h
  gt-c-family-c-pragma.h
  gt-cfgrtl.h
  gt-cgraphclones.h
  gt-cgraph.h
  gt-coverage.h
  gt-cp-call.h
  gt-cp-class.h
  gt-cp-constexpr.h
  gt-cp-constraint.h
  gt-cp-coroutines.h
  gt-cp-cp-gimplify.h
  gt-cp-cp-lang.h
  gt-cp-cp-objcp-common.h
  gt-cp-decl2.h
  gt-cp-decl.h
  gt-cp-except.h
  gt-cp-friend.h
  gt-cp-init.h
  gt-cp-lambda.h
  gt-cp-lex.h
  gt-cp-logic.h
  gt-cp-mangle.h
  gt-cp-method.h
  gt-cp-module.h
  gt-cp-name-lookup.h
  gt-cp-parser.h
  gt-cp-pt.h
  gt-cp-rtti.h
  gt-cp-semantics.h
  gt-cp-tree.h
  gt-cp-vtable-class-hierarchy.h
  gt-cselib.h
  gt-dbxout.h
  gt-d-d-builtins.h
  gt-d-d-lang.h
  gt-dojump.h
  gt-d-typeinfo.h
  gt-dwarf2asm.h
  gt-dwarf2cfi.h
  gt-dwarf2out.h
  gt-emit-rtl.h
  gt-except.h
  gt-explow.h
  gt-fortran-f95-lang.h
  gt-fortran-trans-decl.h
  gt-fortran-trans-intrinsic.h
  gt-fortran-trans-io.h
  gt-fortran-trans-stmt.h
  gt-fortran-trans-types.h
  gt-function.h
  gt-gcse.h
  gt-ggc-tests.h
  gt-gimple-expr.h
  gt-godump.h
  gt-go-go-lang.h
  gt-i386-builtins.h
  gt-i386-expand.h
  gt-i386.h
  gt-i386-options.h
  gt-ipa-devirt.h
  gt-ipa-modref.h
  gt-ipa-prop.h
  gt-ipa-sra.h
  gt-jit-dummy-frontend.h
  gt-lists.h
  gt-lto-lto-common.h
  gt-lto-lto-lang.h
  gt-objc-objc-act.h
  gt-objc-objc-gnu-runtime-abi-01.h
  gt-objc-objc-map.h
  gt-objc-objc-next-runtime-abi-01.h
  gt-objc-objc-next-runtime-abi-02.h
  gt-objc-objc-runtime-shared-support.h
  gt-omp-general.h
  gt-omp-low.h
  gt-optabs-libfuncs.h
  gt-stor-layout.h
  gt-stringpool.h
  gt-symtab-thunks.h
  gt-targhooks.h
  gt-trans-mem.h
  gt-tree.h
  gt-tree-iterator.h
  gt-tree-nested.h
  gt-tree-phinodes.h
  gt-tree-profile.h
  gt-tree-scalar-evolution.h
  gt-tree-ssa-address.h
  gt-tree-ssa-loop-ivopts.h
  gt-tree-vect-generic.h
  gt-ubsan.h
  gt-varasm.h
  gt-vtable-verify.h
  auto-host.h
*/

constexpr unsigned char executable_checksum[] = { 0xcf, 0xee, 0xca, 0xc0, 0x17, 0x97, 0x80, 0x55, 0x3a, 0xdd, 0xd4, 0x1e, 0xd4, 0xb9, 0xe7, 0x91 };
From d5025b3148c895f78edaddca0637dc668ae81be9 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Fri, 23 Apr 2021 13:33:55 +0200
Subject: [PATCH] Emit checksum.h from gt-*.h and auto-host.h.

---
 gcc/Makefile.in         | 12 
 gcc/c-family/c-common.h |  3 ---
 gcc/c-family/c-opts.c   |  1 +
 gcc/c-family/c-pch.c    |  1 +
 gcc/c/Make-lang.in      | 20 +++-
 gcc/cp/Make-lang.in     | 20 +++-
 gcc/genchecksum.c       |  9 ++---
 gcc/objc/Make-lang.in   | 12 +++-
 gcc/objcp/Make-lang.in  | 13 +++--
 9 files changed, 28 insertions(+), 63 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index e5d07fb98b0..d1c2da97f52 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1765,7 +1765,7 @@ MOSTLYCLEANFILES = insn-flags.h insn-config.h insn-codes.h \
  gcc-ranlib$(exeext) \
  genversion$(build_exeext) gcov$(exeext) gcov-dump$(exeext) \
  gcov-tool$(exeect) \
- gengtype$(exeext) *.[0-9][0-9].* *.[si] *-checksum.c libbackend.a \
+ gengtype$(exeext) genchecksum$(exeext) *.[0-9][0-9].* *.[si] libbackend.a \
  libcommon-target.a libcommon.a libgcc.mk perf.data
 
 # This symlink makes the full installation name of the driver be available
@@ -2814,7 +2814,6 @@ build/genautomata.o : genautomata.c $(RTL_BASE_H) $(OBSTACK_H)		\
 build/gencheck.o : gencheck.c all-tree.def $(BCONFIG_H) $(GTM_H)	\
 	$(SYSTEM_H) $(CORETYPES_H) tree.def c-family/c-common.def	\
 	$(lang_tree_files) gimple.def
-build/genchecksum.o : genchecksum.c $(BCONFIG_H) $(SYSTEM_H) $(MD5_H)
 build/gencodes.o : gencodes.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H)	\
   $(CORETYPES_H) $(GTM_H) errors.h $(GENSUPPORT_H)
 build/genconditions.o : genconditions.c $(RTL_BASE_H) $(BCONFIG_H)	\
@@ -3037,6 

Re: [PATCH] First do add_noreturn_fake_exit_edges in connect_infinite_loops_to_exit

2021-04-23 Thread Richard Biener via Gcc-patches
On Thu, Feb 25, 2021 at 2:29 PM Richard Biener  wrote:
>
> Most callers of connect_infinite_loops_to_exit already do this but
> the few that do not end up with extra exit edges.  The following
> makes that consistent, also matching the post-dominance DFS walk code.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, queued for stage1.

g:58ad6b2802592f1448eed48e8ad4e0e87985cecb

> 2021-02-25  Richard Biener  
>
> * cfganal.c (connect_infinite_loops_to_exit): First call
> add_noreturn_fake_exit_edges.
> * ipa-sra.c (process_scan_results): Do not call the now redundant
> add_noreturn_fake_exit_edges.
> * predict.c (tree_estimate_probability): Likewise.
> (rebuild_frequencies): Likewise.
> * store-motion.c (one_store_motion_pass): Likewise.
> ---
>  gcc/cfganal.c  | 10 +++---
>  gcc/ipa-sra.c  |  1 -
>  gcc/predict.c  |  2 --
>  gcc/store-motion.c |  1 -
>  4 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/cfganal.c b/gcc/cfganal.c
> index 2627c2ff457..cec5abe30f9 100644
> --- a/gcc/cfganal.c
> +++ b/gcc/cfganal.c
> @@ -582,9 +582,9 @@ add_noreturn_fake_exit_edges (void)
>make_single_succ_edge (bb, EXIT_BLOCK_PTR_FOR_FN (cfun), EDGE_FAKE);
>  }
>
> -/* This function adds a fake edge between any infinite loops to the
> -   exit block.  Some optimizations require a path from each node to
> -   the exit node.
> +/* This function adds a fake edge between any noreturn block and
> +   infinite loops to the exit block.  Some optimizations require a path
> +   from each node to the exit node.
>
> See also Morgan, Figure 3.10, pp. 82-83.
>
> @@ -596,6 +596,10 @@ add_noreturn_fake_exit_edges (void)
>  void
>  connect_infinite_loops_to_exit (void)
>  {
> +  /* First add fake exits to noreturn blocks, this is required to
> + discover only truly infinite loops below.  */
> +  add_noreturn_fake_exit_edges ();
> +
>/* Perform depth-first search in the reverse graph to find nodes
>   reachable from the exit block.  */
>depth_first_search dfs;
> diff --git a/gcc/ipa-sra.c b/gcc/ipa-sra.c
> index 1571921cb48..7a89906cee6 100644
> --- a/gcc/ipa-sra.c
> +++ b/gcc/ipa-sra.c
> @@ -2394,7 +2394,6 @@ process_scan_results (cgraph_node *node, struct function *fun,
> if (!pdoms_calculated)
>   {
> gcc_checking_assert (cfun);
> -   add_noreturn_fake_exit_edges ();
> connect_infinite_loops_to_exit ();
> calculate_dominance_info (CDI_POST_DOMINATORS);
> pdoms_calculated = true;
> diff --git a/gcc/predict.c b/gcc/predict.c
> index d0a8e5f8e04..0bf1748ffa8 100644
> --- a/gcc/predict.c
> +++ b/gcc/predict.c
> @@ -3106,7 +3106,6 @@ tree_estimate_probability (bool dry_run)
>  {
>basic_block bb;
>
> -  add_noreturn_fake_exit_edges ();
>connect_infinite_loops_to_exit ();
>/* We use loop_niter_by_eval, which requires that the loops have
>   preheaders.  */
> @@ -4291,7 +4290,6 @@ rebuild_frequencies (void)
>if (profile_status_for_fn (cfun) == PROFILE_GUESSED)
>  {
>loop_optimizer_init (0);
> -  add_noreturn_fake_exit_edges ();
>mark_irreducible_loops ();
>connect_infinite_loops_to_exit ();
>estimate_bb_frequencies (true);
> diff --git a/gcc/store-motion.c b/gcc/store-motion.c
> index f0401cae272..3f6e003219d 100644
> --- a/gcc/store-motion.c
> +++ b/gcc/store-motion.c
> @@ -1152,7 +1152,6 @@ one_store_motion_pass (void)
>
>/* Now compute kill & transp vectors.  */
>build_store_vectors ();
> -  add_noreturn_fake_exit_edges ();
>connect_infinite_loops_to_exit ();
>
>edge_list = pre_edge_rev_lcm (num_stores, st_transp, st_avloc,
> --
> 2.26.2


Re: [PATCH] Bump LTO_major_version to 11.

2021-04-23 Thread Richard Biener via Gcc-patches
On Fri, Apr 23, 2021 at 11:51 AM Jan Hubicka  wrote:
>
> > > That needs to be combined with the generated auto-host.h header file.
> > > From which locations do you want to build the hash? Any other $objdir
> > > files except auto-host.h?
> >
> > In fact for PCH just summing the gengtype generated files would be
> > good enough I guess ...
>
> I think one can, for example, change datastructure layout/meaning of a
> bit in tree.h that invalidates PCH but makes the accessors same.

True, the question is on how much detail we have to pay attention to.
For us of course the build-id solution works fine.  And hopefully the
days of PCH are counted...

Richard.

> Honza
> >
> > > Note 'git archive' can append arbitrary non-git files.
> > >
> > > 2) Doing checksum of *.[cC] in a given folder + auto-host.h.
> > >
> > > 3) Using git hash (+ auto-host.h), but it's likely too gross, right?
> > >
> > > > link and my "hack" to re-use the version from prev-gcc
> > > > as well as our openSUSE "hack" for reproducible builds
> > > > which elides genchecksum.c for the use of the build-id
> > > > in the actual executables.
> > >
> > > What a hack. The binary is reading its buildid right from the memory,
> > > right?
> >
> > Well, yes (I think I've posted the patch as RFC once, attached for reference).
> >
> > Richard.
> >
> > > Thoughts?
> > >
> > > Martin
> > >
> > > >
> > > > Richard.
> > > >
> > > >> Thanks,
> > > >> Martin
> > >
>
> > Use the binary's build-id as checksum for PCH purposes.
> >
> > diff --git a/gcc/c-family/c-pch.c b/gcc/c-family/c-pch.c
> > index a2292f46a7d..94d633d139a 100644
> > --- a/gcc/c-family/c-pch.c
> > +++ b/gcc/c-family/c-pch.c
> > @@ -65,6 +65,66 @@ static FILE *pch_outfile;
> >
> >  static const char *get_ident (void);
> >
> > +#if _GNU_SOURCE
> > +#include <link.h>
> > +
> > +#define ALIGN(val, align)  (((val) + (align) - 1) & ~((align) - 1))
> > +
> > +static int
> > +get_build_id_1 (struct dl_phdr_info *info, size_t, void *data)
> > +{
> > +  for (unsigned i = 0; i < info->dlpi_phnum; ++i)
> > +{
> > +  if (info->dlpi_phdr[i].p_type != PT_NOTE)
> > + continue;
> > +  ElfW(Nhdr) *nhdr
> > + = (ElfW(Nhdr) *)(info->dlpi_addr + info->dlpi_phdr[i].p_vaddr);
> > +  ptrdiff_t size = info->dlpi_phdr[i].p_filesz;
> > +  ptrdiff_t align = info->dlpi_phdr[i].p_align;
> > +  if (align != 8)
> > + align = 4;
> > +  while (size >= (ptrdiff_t)sizeof (ElfW(Nhdr)))
> > + {
> > +   if (nhdr->n_type == NT_GNU_BUILD_ID
> > +   && nhdr->n_namesz == 4
> > +   && strncmp ((char *)nhdr
> > +   + sizeof (ElfW(Nhdr)),
> > +   "GNU", 4) == 0
> > +   && nhdr->n_descsz >= 16)
> > + {
> > +   memcpy (data,
> > +   (char *)nhdr
> > +   + ALIGN (sizeof (ElfW(Nhdr))
> > ++ nhdr->n_namesz, align), 16);
> > +   return 1;
> > + }
> > +   size_t offset = (ALIGN (sizeof (ElfW(Nhdr))
> > +   + nhdr->n_namesz, align)
> > ++ ALIGN(nhdr->n_descsz, align));
> > +   nhdr = (ElfW(Nhdr) *)((char *)nhdr + offset);
> > +   size -= offset;
> > + }
> > +}
> > +
> > +  return 0;
> > +}
> > +
> > +static const unsigned char *
> > +get_build_id ()
> > +{
> > +  static unsigned char build_id[16];
> > +  if (!dl_iterate_phdr (get_build_id_1, build_id))
> > +return NULL;
> > +  return build_id;
> > +}
> > +#else
> > +static const unsigned char *
> > +get_build_id ()
> > +{
> > +  return NULL;
> > +}
> > +#endif
> > +
> >  /* Compute an appropriate 8-byte magic number for the PCH file, so that
> > utilities like file(1) can identify it, and so that GCC can quickly
> > ignore non-PCH files and PCH files that are of a completely different
> > @@ -120,8 +180,11 @@ pch_init (void)
> >v.pch_init = _init;
> >target_validity = targetm.get_pch_validity (_data_length);
> >
> > +  const unsigned char *chksum = get_build_id ();
> > +  if (!chksum)
> > +chksum = executable_checksum;
> >if (fwrite (partial_pch, IDENT_LENGTH, 1, f) != 1
> > -  || fwrite (executable_checksum, 16, 1, f) != 1
> > +  || fwrite (chksum, 16, 1, f) != 1
> >|| fwrite (&v, sizeof (v), 1, f) != 1
> >|| fwrite (target_validity, v.target_data_length, 1, f) != 1)
> >  fatal_error (input_location, "cannot write to %s: %m", pch_file);
> > @@ -237,7 +300,10 @@ c_common_valid_pch (cpp_reader *pfile, const char 
> > *name, int fd)
> >   }
> >return 2;
> >  }
> > -  if (memcmp (ident + IDENT_LENGTH, executable_checksum, 16) != 0)
> > +  const unsigned char *chksum = get_build_id ();
> > +  if (!chksum)
> > +chksum = executable_checksum;
> > +  if (memcmp (ident + IDENT_LENGTH, chksum, 16) != 0)
> >  {
> >if (cpp_get_options (pfile)->warn_invalid_pch)
> >   cpp_error (pfile, CPP_DL_WARNING,
> > diff --git a/gcc/genchecksum.c 

[PATCH] early-remat.c: Fix new/delete mismatch [PR100230]

2021-04-23 Thread Alex Coplan via Gcc-patches
Hi,

This simple patch fixes a mismatched operator new/delete in
early-remat.c which triggers ASan errors on (at least) AArch64 when
compiling SVE code.

Bootstrap and regtest on aarch64-linux-gnu in progress.

OK for trunk and backports (as appropriate) if testing looks good?

Thanks,
Alex

gcc/ChangeLog:

PR rtl-optimization/100230
* early-remat.c (early_remat::sort_candidates): Use delete[]
instead of delete for array allocated with new[].
diff --git a/gcc/early-remat.c b/gcc/early-remat.c
index c8d4fee937d..92077d094ae 100644
--- a/gcc/early-remat.c
+++ b/gcc/early-remat.c
@@ -1059,7 +1059,7 @@ early_remat::sort_candidates (void)
 
   m_candidates.qsort (compare_candidates);
 
-  delete postorder_index;
+  delete[] postorder_index;
 }
 
 /* Commit to the current candidate indices and initialize cross-references.  */
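
For readers outside the thread, the rule the fix relies on is that storage obtained with new[] must be released with delete[]; releasing it with plain delete is undefined behaviour, which is what ASan flags as an alloc-dealloc mismatch. A minimal sketch follows. The function names and the array contents are hypothetical and are not taken from early-remat.c.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <numeric>

// Hypothetical stand-in for an index array allocated with new[];
// this is an illustration, not the GCC code.
static unsigned int
sum_with_raw_array (std::size_t n)
{
  unsigned int *index = new unsigned int[n];   // array form of new
  std::iota (index, index + n, 0u);            // fill with 0, 1, ..., n-1
  unsigned int sum = std::accumulate (index, index + n, 0u);
  delete[] index;   // must be delete[]; plain delete would be undefined
  return sum;
}

// Same computation with ownership handled by std::unique_ptr<T[]>,
// whose default deleter calls delete[] automatically.
static unsigned int
sum_with_unique_ptr (std::size_t n)
{
  std::unique_ptr<unsigned int[]> index (new unsigned int[n]);
  std::iota (index.get (), index.get () + n, 0u);
  return std::accumulate (index.get (), index.get () + n, 0u);
}
```

The second variant is only one way to make the mismatch impossible by construction; the patch itself keeps the raw pointer and just corrects the delete form.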


Re: [PATCH] Bump LTO_major_version to 11.

2021-04-23 Thread Jan Hubicka
> > That needs to be combined with the generated auto-host.h header file.
> > From which locations do you want to build the hash? Any other $objdir
> > files except auto-host.h?
> 
> In fact for PCH just summing the gengtype generated files would be
> good enough I guess ...

I think one can, for example, change the data structure layout or the
meaning of a bit in tree.h in a way that invalidates PCH but leaves the
accessors the same.

Honza
> 
> > Note 'git archive' can append arbitrary non-git files.
> >
> > 2) Doing checksum of *.[cC] in a given folder + auto-host.h.
> >
> > 3) Using git hash (+ auto-host.h), but it's likely too gross, right?
> >
> > > link and my "hack" to re-use the version from prev-gcc
> > > as well as our openSUSE "hack" for reproducible builds
> > > which elides genchecksum.c for the use of the build-id
> > > in the actual executables.
> >
> > What a hack. The binary is reading its build-id right from memory,
> > right?
> 
> Well, yes (I think I've posted the patch as RFC once, attached for reference).
> 
> Richard.
> 
> > Thoughts?
> >
> > Martin
> >
> > >
> > > Richard.
> > >
> > >> Thanks,
> > >> Martin
> >

> Use the binaries build-id as checksum for PCH purposes.
> 
> diff --git a/gcc/c-family/c-pch.c b/gcc/c-family/c-pch.c
> index a2292f46a7d..94d633d139a 100644
> --- a/gcc/c-family/c-pch.c
> +++ b/gcc/c-family/c-pch.c
> @@ -65,6 +65,66 @@ static FILE *pch_outfile;
>  
>  static const char *get_ident (void);
>  
> +#if _GNU_SOURCE
> +#include 
> +
> +#define ALIGN(val, align)  (((val) + (align) - 1) & ~((align) - 1))
> +
> +static int
> +get_build_id_1 (struct dl_phdr_info *info, size_t, void *data)
> +{
> +  for (unsigned i = 0; i < info->dlpi_phnum; ++i)
> +{
> +  if (info->dlpi_phdr[i].p_type != PT_NOTE)
> + continue;
> +  ElfW(Nhdr) *nhdr
> + = (ElfW(Nhdr) *)(info->dlpi_addr + info->dlpi_phdr[i].p_vaddr);
> +  ptrdiff_t size = info->dlpi_phdr[i].p_filesz;
> +  ptrdiff_t align = info->dlpi_phdr[i].p_align;
> +  if (align != 8)
> + align = 4;
> +  while (size >= (ptrdiff_t)sizeof (ElfW(Nhdr)))
> + {
> +   if (nhdr->n_type == NT_GNU_BUILD_ID
> +   && nhdr->n_namesz == 4
> +   && strncmp ((char *)nhdr
> +   + sizeof (ElfW(Nhdr)),
> +   "GNU", 4) == 0
> +   && nhdr->n_descsz >= 16)
> + {
> +   memcpy (data, 
> +   (char *)nhdr
> +   + ALIGN (sizeof (ElfW(Nhdr))
> ++ nhdr->n_namesz, align), 16);
> +   return 1;
> + }
> +   size_t offset = (ALIGN (sizeof (ElfW(Nhdr))
> +   + nhdr->n_namesz, align)
> ++ ALIGN(nhdr->n_descsz, align));
> +   nhdr = (ElfW(Nhdr) *)((char *)nhdr + offset);
> +   size -= offset;
> + }
> +}
> +
> +  return 0;
> +}
> +
> +static const unsigned char *
> +get_build_id ()
> +{
> +  static unsigned char build_id[16];
> +  if (!dl_iterate_phdr (get_build_id_1, build_id))
> +return NULL;
> +  return build_id;
> +}
> +#else
> +static const unsigned char *
> +get_build_id ()
> +{
> +  return NULL;
> +}
> +#endif
> +
>  /* Compute an appropriate 8-byte magic number for the PCH file, so that
> utilities like file(1) can identify it, and so that GCC can quickly
> ignore non-PCH files and PCH files that are of a completely different
> @@ -120,8 +180,11 @@ pch_init (void)
>v.pch_init = _init;
>target_validity = targetm.get_pch_validity (_data_length);
>  
> +  const unsigned char *chksum = get_build_id ();
> +  if (!chksum)
> +chksum = executable_checksum;
>if (fwrite (partial_pch, IDENT_LENGTH, 1, f) != 1
> -  || fwrite (executable_checksum, 16, 1, f) != 1
> +  || fwrite (chksum, 16, 1, f) != 1
>|| fwrite (, sizeof (v), 1, f) != 1
>|| fwrite (target_validity, v.target_data_length, 1, f) != 1)
>  fatal_error (input_location, "cannot write to %s: %m", pch_file);
> @@ -237,7 +300,10 @@ c_common_valid_pch (cpp_reader *pfile, const char *name, 
> int fd)
>   }
>return 2;
>  }
> -  if (memcmp (ident + IDENT_LENGTH, executable_checksum, 16) != 0)
> +  const unsigned char *chksum = get_build_id ();
> +  if (!chksum)
> +chksum = executable_checksum;
> +  if (memcmp (ident + IDENT_LENGTH, chksum, 16) != 0)
>  {
>if (cpp_get_options (pfile)->warn_invalid_pch)
>   cpp_error (pfile, CPP_DL_WARNING,
> diff --git a/gcc/genchecksum.c b/gcc/genchecksum.c
> index 09fbb63fa93..ec8b3281d53 100644
> --- a/gcc/genchecksum.c
> +++ b/gcc/genchecksum.c
> @@ -113,8 +113,13 @@ main (int argc, char ** argv)
>puts ("#include \"config.h\"");
>puts ("#include \"system.h\"");
>fputs ("EXPORTED_CONST unsigned char executable_checksum[16] = { ", 
> stdout);
> +#if _GNU_SOURCE
> +  for (i = 0; i < 16; i++)
> +printf ("0x%02x%s", 0, i == 15 ? " };\n" : ", ");
> +#else
>for (i = 0; i < 16; i++)
>  printf ("0x%02x%s", result[i], i == 

Re: [PATCH] gcov: Use system IO buffering

2021-04-23 Thread Richard Biener via Gcc-patches
On Fri, Apr 23, 2021 at 11:24 AM Martin Liška  wrote:
>
> On 4/23/21 8:49 AM, Richard Biener wrote:
> > On Thu, Apr 22, 2021 at 9:47 PM Andi Kleen via Gcc-patches
> >  wrote:
> >>
> >> Martin Liška  writes:
> >>
> >>> Hey.
> >>>
> >>> I/O buffering in gcov seems to duplicate what a modern C library can
> >>> provide.
> >>> The patch is a simplification and can provide an easier interface for systems
> >>> that don't have a filesystem and would like to use GCOV.
> >>>
> >>> I'm going to install the patch after 11.1 if there are no objections.
> >>>
> >>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >>
> >> What happens if someone compiles the C library with gcov?
>
> Haven't tried that..
>
> >
> > Yeah, I think this is the wrong direction - we're already having
> > issues with using
> > malloc, this makes it much worse.
>
> I don't see where the problem with my patch is. It's not adding usage of any
> additional system routines.

True.  So the only impact should be more calls to libc (not necessarily more
I/O since libc should do buffering).

"The patch is a simplification and can provide an easier interface for systems
that don't have a filesystem and would like to use GCOV."

Can you explain?

> Martin
>
> >
> > Richard.
> >
> >> Being as self contained as possible (only system calls) would seem
> >> safer.
> >>
> >> -Andi
>


Re: [PATCH] Bump LTO_major_version to 11.

2021-04-23 Thread Richard Biener via Gcc-patches
On Fri, Apr 23, 2021 at 9:59 AM Martin Liška  wrote:
>
> On 4/23/21 9:28 AM, Richard Biener wrote:
> > On Tue, Apr 20, 2021 at 8:49 PM Martin Liška  wrote:
> >>
> >> On 4/20/21 2:46 PM, Richard Biener wrote:
> >>> OK.  Can you somehow arrange for trunk to pick up LTO_major from GCC
> >>> major automagically then?
> >>
> >> I have a pretty nice solution for it where I extended (and simplified)
> >> the existing gcov-iov.c generator. Doing that we can remove 
> >> gcc/version.[ch].
> >>
> >> Using the patch, the following version.h is generated:
> >>
> >> #ifndef VERSION_H
> >> #define VERSION_H
> >>
> >> /* Generated automatically by genversion.  */
> >>
> >> #define GCC_major_version 12
> >>
> >> /* The complete version string, assembled from several pieces.
> >> BASEVER, DATESTAMP, DEVPHASE, and REVISION are defined by the
> >> Makefile.  */
> >>
> >> #define version_string "12.0.0 20210420 (experimental)"
> >> #define pkgversion_string "(GCC) "
> >>
> >> /* This is the location of the online document giving instructions for
> >> reporting bugs.  If you distribute a modified version of GCC,
> >> please configure with --with-bugurl pointing to a document giving
> >> instructions for reporting bugs to you, not us.  (You are of course
> >> welcome to forward us bugs reported to you, if you determine that
> >> they are not bugs in your modifications.)  */
> >>
> >> #define bug_report_url ""
> >>
> >> #define GCOV_VERSION ((gcov_unsigned_t)0x42323020)  /* B20  */
> >>
> >> #endif /* VERSION_H */
> >>
> >> Ready for master?
> >
> > Nice.  This is OK if others do not have further comments.
>
> Thanks, I'm going to install it once GCC 11.1 is released.
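
The GCOV_VERSION value in the generated header above packs four ASCII characters into one 32-bit word, which is why its comment reads "B20". A minimal sketch of that decoding follows; decode_gcov_version is a hypothetical helper written for this note and is not part of GCC.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Unpack a 32-bit tag into its four bytes, most significant byte first.
// Hypothetical helper for illustration only.
static std::string
decode_gcov_version (std::uint32_t v)
{
  std::string s;
  for (int shift = 24; shift >= 0; shift -= 8)
    s.push_back (static_cast<char> ((v >> shift) & 0xff));
  return s;
}
```

Applied to 0x42323020 this yields the bytes 'B', '2', '0' and a space.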
>
> >
> > I think we'd want to explore whether we can integrate
> > genchecksum.c as well and make the PCH checksum
> > based on a set of source files (including the generated
> > auto-host.h) - that might allow removing the two-stage
>
> Definitely. I see multiple options:
>
> 1) using git, which can provide a hash of the contents of a folder:
>
> $ git ls-tree HEAD -- gcc
>
> 040000 tree db613554ec17462c63bace2015c877d6bed70bbe	gcc
>
> One can do that per-file as well:
> $ git ls-tree HEAD -- gcc/c/*.c
>
> 100644 blob bae5757ad137c0af58dbe66229d4201a45094aca	gcc/c/c-aux-info.c
> 100644 blob d0035a31723447657a04c2ef79c9fd7c0ddc7568	gcc/c/c-convert.c
> 100644 blob 3ea4708c5075d9274601a0676f86a6900a9345b0	gcc/c/c-decl.c
> 100644 blob de98958ceabac9d631f937f9f28547d8aed26af2	gcc/c/c-errors.c
> 100644 blob 68c74cc1eb2ef908545b36e2dbff65606f756e15	gcc/c/c-fold.c
>
> ...

I think using git is out of the question unless we want to check in the
generated file.  We ship tarballs (w/o generated files for snapshots) and those
have to build as well.

> That needs to be combined with the generated auto-host.h header file.
> From which locations do you want to build the hash? Any other $objdir
> files except auto-host.h?

In fact for PCH just summing the gengtype generated files would be
good enough I guess ...

> Note 'git archive' can append arbitrary non-git files.
>
> 2) Doing checksum of *.[cC] in a given folder + auto-host.h.
>
> 3) Using git hash (+ auto-host.h), but it's likely too gross, right?
>
> > link and my "hack" to re-use the version from prev-gcc
> > as well as our openSUSE "hack" for reproducible builds
> > which elides genchecksum.c for the use of the build-id
> > in the actual executables.
>
> What a hack. The binary is reading its build-id right from memory,
> right?

Well, yes (I think I've posted the patch as RFC once, attached for reference).

Richard.
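
As background for the attached patch: the build-id lives in an ELF note, and the patch walks the notes of each PT_NOTE segment using the same rounding as its ALIGN macro. The sketch below shows that walk over an in-memory buffer instead of a live dl_iterate_phdr callback. Everything in it is an assumption made for illustration: note_header mirrors the three 32-bit fields of ElfW(Nhdr), note_gnu_build_id is the value 3 of NT_GNU_BUILD_ID, append_note only builds test data, and the bounds checks are additions that the RFC patch does not have.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Same three 32-bit fields as Elf32_Nhdr/Elf64_Nhdr, declared locally so
// the sketch does not depend on <elf.h>.
struct note_header
{
  std::uint32_t n_namesz;
  std::uint32_t n_descsz;
  std::uint32_t n_type;
};

static const std::uint32_t note_gnu_build_id = 3;  // NT_GNU_BUILD_ID

// Round VAL up to a multiple of ALIGN, which must be a power of two.
static std::size_t
align_up (std::size_t val, std::size_t align)
{
  return (val + align - 1) & ~(align - 1);
}

// Scan SIZE bytes of note data at BUF; on finding a GNU build-id note with
// at least 16 descriptor bytes, copy the first 16 into OUT and return true.
static bool
find_build_id (const unsigned char *buf, std::size_t size,
	       std::size_t align, unsigned char out[16])
{
  std::size_t pos = 0;
  while (pos <= size && size - pos >= sizeof (note_header))
    {
      note_header nh;
      std::memcpy (&nh, buf + pos, sizeof nh);
      std::size_t name_off = pos + sizeof nh;
      std::size_t desc_off = pos + align_up (sizeof nh + nh.n_namesz, align);
      if (desc_off > size || nh.n_descsz > size - desc_off)
	return false;  // truncated note (check added in this sketch)
      if (nh.n_type == note_gnu_build_id
	  && nh.n_namesz == 4
	  && std::memcmp (buf + name_off, "GNU", 4) == 0
	  && nh.n_descsz >= 16)
	{
	  std::memcpy (out, buf + desc_off, 16);
	  return true;
	}
      pos = desc_off + align_up (nh.n_descsz, align);
    }
  return false;
}

// Test-data helper: append one note (header, padded name, padded
// descriptor) to BUF, which must already end on an ALIGN boundary.
static void
append_note (std::vector<unsigned char> &buf, std::uint32_t type,
	     const char *name, std::uint32_t namesz,
	     const unsigned char *desc, std::uint32_t descsz,
	     std::size_t align)
{
  note_header nh = { namesz, descsz, type };
  const unsigned char *p = reinterpret_cast<const unsigned char *> (&nh);
  buf.insert (buf.end (), p, p + sizeof nh);
  buf.insert (buf.end (), name, name + namesz);
  buf.resize (align_up (buf.size (), align));
  buf.insert (buf.end (), desc, desc + descsz);
  buf.resize (align_up (buf.size (), align));
}
```

In the patch the buffer would be the segment at dlpi_addr + p_vaddr with length p_filesz, and the alignment is taken as 8 only when p_align is 8, otherwise 4.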

> Thoughts?
>
> Martin
>
> >
> > Richard.
> >
> >> Thanks,
> >> Martin
>
Use the binaries build-id as checksum for PCH purposes.

diff --git a/gcc/c-family/c-pch.c b/gcc/c-family/c-pch.c
index a2292f46a7d..94d633d139a 100644
--- a/gcc/c-family/c-pch.c
+++ b/gcc/c-family/c-pch.c
@@ -65,6 +65,66 @@ static FILE *pch_outfile;
 
 static const char *get_ident (void);
 
+#if _GNU_SOURCE
+#include 
+
+#define ALIGN(val, align)  (((val) + (align) - 1) & ~((align) - 1))
+
+static int
+get_build_id_1 (struct dl_phdr_info *info, size_t, void *data)
+{
+  for (unsigned i = 0; i < info->dlpi_phnum; ++i)
+{
+  if (info->dlpi_phdr[i].p_type != PT_NOTE)
+	continue;
+  ElfW(Nhdr) *nhdr
+	= (ElfW(Nhdr) *)(info->dlpi_addr + info->dlpi_phdr[i].p_vaddr);
+  ptrdiff_t size = info->dlpi_phdr[i].p_filesz;
+  ptrdiff_t align = info->dlpi_phdr[i].p_align;
+  if (align != 8)
+	align = 4;
+  while (size >= (ptrdiff_t)sizeof (ElfW(Nhdr)))
+	{
+	  if (nhdr->n_type == NT_GNU_BUILD_ID
+	  && nhdr->n_namesz == 4
+	  && strncmp ((char *)nhdr
+			  + sizeof (ElfW(Nhdr)),
+			  "GNU", 4) == 0
+	  && nhdr->n_descsz >= 16)
+	{
+	  memcpy (data, 
+		  (char *)nhdr
+		  + ALIGN (sizeof (ElfW(Nhdr))
+			   + nhdr->n_namesz, align), 16);
+	  return 1;
+	}
+	  size_t offset = (ALIGN (sizeof (ElfW(Nhdr))
+  + 

Re: [PATCH] Add folding and remove expanders for x86 *pcmp{et,gt}* builtins [PR target/98911]

2021-04-23 Thread Uros Bizjak via Gcc-patches
On Fri, Apr 23, 2021 at 11:04 AM Hongtao Liu  wrote:
>
> On Fri, Apr 23, 2021 at 3:18 PM Uros Bizjak  wrote:
> >
> > On Fri, Apr 23, 2021 at 9:15 AM Hongtao Liu  wrote:
> > >
> > > On Fri, Apr 23, 2021 at 2:50 PM Uros Bizjak  wrote:
> > > >
> > > > On Fri, Apr 23, 2021 at 8:36 AM Hongtao Liu  wrote:
> > > > >
> > > > > Hi:
> > > > >   The patch is a follow-up to
> > > > > https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564320.html.
> > > > > >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> > > > >   Ok for trunk?
> > > > > gcc/ChangeLog:
> > > > >
> > > > > PR target/98911
> > > > > * config/i386/i386-builtin.def (BDESC): Change the icode of
> > > > > the following builtins to CODE_FOR_nothing.
> > > > > * config/i386/i386.c (ix86_gimple_fold_builtin): Fold
> > > > > IX86_BUILTIN_PCMPEQB128, IX86_BUILTIN_PCMPEQW128,
> > > > > IX86_BUILTIN_PCMPEQD128, IX86_BUILTIN_PCMPEQQ,
> > > > > IX86_BUILTIN_PCMPEQB256, IX86_BUILTIN_PCMPEQW256,
> > > > > IX86_BUILTIN_PCMPEQD256, IX86_BUILTIN_PCMPEQQ256,
> > > > > IX86_BUILTIN_PCMPGTB128, IX86_BUILTIN_PCMPGTW128,
> > > > > IX86_BUILTIN_PCMPGTD128, IX86_BUILTIN_PCMPGTQ,
> > > > > IX86_BUILTIN_PCMPGTB256, IX86_BUILTIN_PCMPGTW256,
> > > > > IX86_BUILTIN_PCMPGTD256, IX86_BUILTIN_PCMPGTQ256.
> > > > > * config/i386/sse.md (avx2_eq3): Deleted.
> > > > > (sse2_eq3): Ditto.
> > > > > (sse2_gt3): Rename to ..
> > > > > (*sse2_gt3): .. this.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > PR target/98911
> > > > > * gcc.target/i386/pr98911.c: New test.
> > > > > * gcc.target/i386/funcspec-8.c: Remove __builtin_ia32_pcmpgtq
> > > > > since it has been folded.
> > > >
> > > >
> > > > -(define_expand "sse2_eq3"
> > > > -  [(set (match_operand:VI124_128 0 "register_operand")
> > > > -(eq:VI124_128
> > > > -  (match_operand:VI124_128 1 "vector_operand")
> > > > -  (match_operand:VI124_128 2 "vector_operand")))]
> > > > -  "TARGET_SSE2 && !TARGET_XOP "
> > > > -  "ix86_fixup_binary_operands_no_copy (EQ, mode, operands);")
> > > > -
> > > >  (define_expand "sse4_1_eqv2di3"
> > > >[(set (match_operand:V2DI 0 "register_operand")
> > > >  (eq:V2DI
> > > >
> > > > You can also remove sse4_1_eqv2di3 expander.
> > >
> > > Oh, yes.
> > >
> > > >
> > > > -#ifdef __SSE4_2__
> > > > -#error "-msse4.2 should not be set for this test"
> > > > -#endif
> > > > -
> > > > -__m128i sse4_2_pcmpgtq (__m128i a, __m128i b)
> > > > __attribute__((__target__("sse4.2")));
> > > > -__m128i generic_pcmpgtq (__m128i ab, __m128i b);
> > > > -
> > > > -__m128i
> > > > -sse4_2_pcmpgtq (__m128i a, __m128i b)
> > > > -{
> > > > -  return __builtin_ia32_pcmpgtq (a, b);
> > > > -}
> > > > -
> > > > -__m128i
> > > > -generic_pcmpgtq (__m128i a, __m128i b)
> > > > -{
> > > > -  return __builtin_ia32_pcmpgtq (a, b);/* { dg-error
> > > > "needs isa option" } */
> > > > -}
> > > >
> > > > Why remove the above? It is testing isa options, it has nothing to do
> > > > with improved folding.
> > >
> > > If the backend does not support the corresponding instruction, the
> > > vector operation will be automatically lowered to scalar, so no error
> > > will be reported
> >
> > I see. It would be nice to change the test to use some other SSE4.2
> > builtin (there are plenty of packed compares) and not remove it
> > altogether.
>
> Updated patch.

LGTM.

Thanks,
Uros.


Re: [PATCH] gcov: Use system IO buffering

2021-04-23 Thread Martin Liška
On 4/23/21 8:49 AM, Richard Biener wrote:
> On Thu, Apr 22, 2021 at 9:47 PM Andi Kleen via Gcc-patches
>  wrote:
>>
>> Martin Liška  writes:
>>
>>> Hey.
>>>
>>> I/O buffering in gcov seems to duplicate what a modern C library can provide.
>>> The patch is a simplification and can provide an easier interface for systems
>>> that don't have a filesystem and would like to use GCOV.
>>>
>>> I'm going to install the patch after 11.1 if there are no objections.
>>>
>>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>>
>> What happens if someone compiles the C library with gcov?

Haven't tried that..

> 
> Yeah, I think this is the wrong direction - we're already having
> issues with using
> malloc, this makes it much worse.

I don't see where the problem with my patch is. It's not adding usage of any
additional system routines.

Martin

> 
> Richard.
> 
>> Being as self contained as possible (only system calls) would seem
>> safer.
>>
>> -Andi



[PATCH] tree-optimization/100222 - remove redundant mark_irreducible_loops calls

2021-04-23 Thread Richard Biener
loop_optimizer_init (LOOPS_NORMAL) already performs this (quite
expensive) marking.

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

2021-04-23  Richard Biener  

PR tree-optimization/100222
* predict.c (pass_profile::execute): Remove redundant call to
mark_irreducible_loops.
(report_predictor_hitrates): Likewise.
---
 gcc/predict.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/gcc/predict.c b/gcc/predict.c
index d0a8e5f8e04..dc2327d4032 100644
--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -4096,8 +4096,6 @@ pass_profile::execute (function *fun)
   if (dump_file && (dump_flags & TDF_DETAILS))
 flow_loops_dump (dump_file, NULL, 0);
 
-  mark_irreducible_loops ();
-
   nb_loops = number_of_loops (fun);
   if (nb_loops > 1)
 scev_initialize ();
@@ -4320,8 +4318,6 @@ report_predictor_hitrates (void)
   if (dump_file && (dump_flags & TDF_DETAILS))
 flow_loops_dump (dump_file, NULL, 0);
 
-  mark_irreducible_loops ();
-
   nb_loops = number_of_loops (cfun);
   if (nb_loops > 1)
 scev_initialize ();
-- 
2.26.2


Re: [PATCH] [i386] Optimize __builtin_shuffle when it's used to zero the upper bits of the dest. [PR target/94680]

2021-04-23 Thread Jakub Jelinek via Gcc-patches
On Fri, Apr 23, 2021 at 12:53:58PM +0800, Hongtao Liu via Gcc-patches wrote:
> +  if (!CONST_INT_P (er))
> + return 0;
> +  ei = INTVAL (er);
> +  if (i < nelt2 && ei != i)
> + return 0;
> +  if (i >= nelt2
> +  && (ei < nelt || ei >= nelt<<1))

Formatting:
1) you have spaces followed by tab, remove the spaces; but,
  if (i >= nelt2 && (ei < nelt || ei >= nelt<<1))
   fits on one line, so keep it on one line.
2) nelt<<1 should be nelt << 1 with spaces around the <<
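
For context, the condition under review accepts a selector whose lower half is the identity and whose upper half picks only elements of the second operand. A rough sketch of that predicate over a plain index vector, with the spacing suggested above, might look as follows. The function name is hypothetical, and the sketch assumes the second shuffle operand is already known to be the zero vector, which it does not check.

```cpp
#include <cassert>
#include <vector>

// Return true if selector SEL keeps the low half of the first operand in
// place and takes every high-half element from the second operand
// (indices nelt .. 2*nelt-1).  Hypothetical helper for illustration.
static bool
zeroes_upper_half_p (const std::vector<int> &sel)
{
  const int nelt = static_cast<int> (sel.size ());
  const int nelt2 = nelt / 2;
  for (int i = 0; i < nelt; ++i)
    {
      const int ei = sel[i];
      if (i < nelt2 && ei != i)
	return false;
      if (i >= nelt2 && (ei < nelt || ei >= nelt << 1))
	return false;
    }
  return true;
}
```

The selector { 0, 1, 2, 3, 8, 9, 10, 11 } from the v8di testcase passes this check, while the identity selector { 0, ..., 7 } does not.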

> -(define_insn "*vec_concatv4si_0"
> -  [(set (match_operand:V4SI 0 "register_operand"   "=v,x")
> - (vec_concat:V4SI
> -   (match_operand:V2SI 1 "nonimmediate_operand" "vm,?!*y")
> -   (match_operand:V2SI 2 "const0_operand"   " C,C")))]
> +(define_insn "*vec_concat_0"
> +  [(set (match_operand:VI124_128 0 "register_operand"   "=v,x")
> + (vec_concat:VI124_128
> +   (match_operand: 1 "nonimmediate_operand" "vm,?!*y")
> +   (match_operand: 2 "const0_operand"   " C,C")))]
>"TARGET_SSE2"
>"@
> %vmovq\t{%1, %0|%0, %1}
> @@ -22154,6 +22157,24 @@ (define_insn "avx_vec_concat"
> (set_attr "prefix" "maybe_evex")
> (set_attr "mode" "")])
>  
> +(define_insn_and_split "*vec_concat_0"

Would be better to use a different pattern name, *vec_concat_0
is already used in the above define_insn.
Use some additional suffix after _0?

> +  return __builtin_shuffle (x, (v32qi) { 0, 0, 0, 0, 0, 0, 0, 0,
> +  0, 0, 0, 0, 0, 0, 0, 0,
> +  0, 0, 0, 0, 0, 0, 0, 0,
> +  0, 0, 0, 0, 0, 0, 0, 0 },
> +(v32qi) { 0, 1, 2, 3, 4, 5, 6, 7,
> +  8, 9, 10, 11, 12, 13, 14, 15,
> +  32, 49, 34, 58, 36, 53, 38, 39,
> +  40, 60, 42, 43, 63, 45, 46, 47 });

In this testcase the shuffles in the part taking indexes from the zero
vector are nicely randomized.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512f-pr94680.c
> @@ -0,0 +1,78 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512bw -mavx512vbmi -O2" } */
> +/* { dg-final { scan-assembler-times {(?n)vmov[a-z0-9]*[ \t]*%ymm[0-9]} 6} } 
> */
> +/* { dg-final { scan-assembler-not "pxor" } } */
> +
> +
> +typedef float v16sf __attribute__((vector_size(64)));
> +typedef double v8df __attribute__ ((vector_size (64)));
> +typedef long long v8di __attribute__((vector_size(64)));
> +typedef int v16si __attribute__((vector_size(64)));
> +typedef short v32hi __attribute__ ((vector_size (64)));
> +typedef char v64qi __attribute__ ((vector_size (64)));
> +
> +v8df
> +foo_v8df (v8df x)
> +{
> +  return __builtin_shuffle (x, (v8df) { 0, 0, 0, 0, 0, 0, 0, 0 },
> + (v8di) { 0, 1, 2, 3, 8, 9, 10, 11 });
> +}
> +
> +v8di
> +foo_v8di (v8di x)
> +{
> +  return __builtin_shuffle (x, (v8di) { 0, 0, 0, 0, 0, 0, 0, 0 },
> + (v8di) { 0, 1, 2, 3, 8, 9, 10, 11 });
> +}
> +
> +v16sf
> +foo_v16sf (v16sf x)
> +{
> +  return __builtin_shuffle (x, (v16sf)  { 0, 0, 0, 0, 0, 0, 0, 0,
> +0, 0, 0, 0, 0, 0, 0, 0 },
> +(v16si) { 0, 1, 2, 3, 4, 5, 6, 7,
> +  16, 17, 18, 19, 20, 21, 22, 23 });
> +}
> +
> +v16si
> +foo_v16si (v16si x)
> +{
> +return __builtin_shuffle (x, (v16si)  { 0, 0, 0, 0, 0, 0, 0, 0,
> +0, 0, 0, 0, 0, 0, 0, 0 },
> +(v16si) { 0, 1, 2, 3, 4, 5, 6, 7,
> +  16, 17, 18, 19, 20, 21, 22, 23 });
> +}
> +
> +v32hi
> +foo_v32hi (v32hi x)
> +{
> +  return __builtin_shuffle (x, (v32hi) { 0, 0, 0, 0, 0, 0, 0, 0,
> +  0, 0, 0, 0, 0, 0, 0, 0,
> +  0, 0, 0, 0, 0, 0, 0, 0,
> +  0, 0, 0, 0, 0, 0, 0, 0 },
> +(v32hi) { 0, 1, 2, 3, 4, 5, 6, 7,
> +  8, 9, 10, 11, 12, 13, 14, 15,
> +  32, 33, 34, 35, 36, 37, 38, 39,
> +  40,41, 42, 43, 44, 45, 46, 47 });
> +}
> +
> +v64qi
> +foo_v64qi (v64qi x)
> +{
> +  return __builtin_shuffle (x, (v64qi) { 0, 0, 0, 0, 0, 0, 0, 0,
> +  0, 0, 0, 0, 0, 0, 0, 0,
> +  0, 0, 0, 0, 0, 0, 0, 0,
> +  0, 0, 0, 0, 0, 0, 0, 0,
> +  0, 0, 0, 0, 0, 0, 0, 0,
> +  0, 0, 0, 0, 0, 0, 0, 0,
> +  0, 0, 0, 0, 0, 0, 0, 0,
> +  0, 0, 0, 0, 0, 0, 0, 0 },
> +(v64qi) {0, 1, 2, 3, 4, 5, 6, 7,
> +   8, 9, 

Re: [PATCH] middle-end/98726 - fix VECTOR_CST element access

2021-04-23 Thread Richard Biener
On Fri, 23 Apr 2021, Richard Sandiford wrote:

> Richard Biener  writes:
> > This fixes VECTOR_CST element access with POLY_INT elements and
> > allows producing dump files of the PR98726 testcase without
> > ICEing.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
> >
> > Thanks,
> > Richard.
> >
> > 2021-01-26  Richard Biener  
> >
> > PR middle-end/98726
> > * tree.h (vector_cst_int_elt): Remove.
> > * tree.c (vector_cst_int_elt): Use poly_wide_int for computations,
> > make static.
> 
> Is it OK to backport this to GCC 10, along with the follow-on rtl patch?
> (1b5f74e8be4dd7abe5624ff60adceff19ca71bda)?

Yes.

Richard.

> Richard
> 
> > ---
> >  gcc/tree.c | 10 +-
> >  gcc/tree.h |  1 -
> >  2 files changed, 5 insertions(+), 6 deletions(-)
> >
> > diff --git a/gcc/tree.c b/gcc/tree.c
> > index 287e5001dc3..f9d57e6d409 100644
> > --- a/gcc/tree.c
> > +++ b/gcc/tree.c
> > @@ -11079,13 +11079,13 @@ build_opaque_vector_type (tree innertype, 
> > poly_int64 nunits)
> >  
> >  /* Return the value of element I of VECTOR_CST T as a wide_int.  */
> >  
> > -wide_int
> > +static poly_wide_int
> >  vector_cst_int_elt (const_tree t, unsigned int i)
> >  {
> >/* First handle elements that are directly encoded.  */
> >unsigned int encoded_nelts = vector_cst_encoded_nelts (t);
> >if (i < encoded_nelts)
> > -return wi::to_wide (VECTOR_CST_ENCODED_ELT (t, i));
> > +return wi::to_poly_wide (VECTOR_CST_ENCODED_ELT (t, i));
> >  
> >/* Identify the pattern that contains element I and work out the index of
> >   the last encoded element for that pattern.  */
> > @@ -11096,13 +11096,13 @@ vector_cst_int_elt (const_tree t, unsigned int i)
> >  
> >/* If there are no steps, the final encoded value is the right one.  */
> >if (!VECTOR_CST_STEPPED_P (t))
> > -return wi::to_wide (VECTOR_CST_ENCODED_ELT (t, final_i));
> > +return wi::to_poly_wide (VECTOR_CST_ENCODED_ELT (t, final_i));
> >  
> >/* Otherwise work out the value from the last two encoded elements.  */
> >tree v1 = VECTOR_CST_ENCODED_ELT (t, final_i - npatterns);
> >tree v2 = VECTOR_CST_ENCODED_ELT (t, final_i);
> > -  wide_int diff = wi::to_wide (v2) - wi::to_wide (v1);
> > -  return wi::to_wide (v2) + (count - 2) * diff;
> > +  poly_wide_int diff = wi::to_poly_wide (v2) - wi::to_poly_wide (v1);
> > +  return wi::to_poly_wide (v2) + (count - 2) * diff;
> >  }
> >  
> >  /* Return the value of element I of VECTOR_CST T.  */
> > diff --git a/gcc/tree.h b/gcc/tree.h
> > index 02b03d1f68e..17a811c02e8 100644
> > --- a/gcc/tree.h
> > +++ b/gcc/tree.h
> > @@ -4762,7 +4762,6 @@ extern tree last_field (const_tree) ATTRIBUTE_NONNULL 
> > (1);
> >  extern bool initializer_zerop (const_tree, bool * = NULL);
> >  extern bool initializer_each_zero_or_onep (const_tree);
> >  
> > -extern wide_int vector_cst_int_elt (const_tree, unsigned int);
> >  extern tree vector_cst_elt (const_tree, unsigned int);
> >  
> >  /* Given a vector VEC, return its first element if all elements are
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [PATCH] middle-end/98726 - fix VECTOR_CST element access

2021-04-23 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> This fixes VECTOR_CST element access with POLY_INT elements and
> allows producing dump files of the PR98726 testcase without
> ICEing.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
>
> Thanks,
> Richard.
>
> 2021-01-26  Richard Biener  
>
>   PR middle-end/98726
>   * tree.h (vector_cst_int_elt): Remove.
>   * tree.c (vector_cst_int_elt): Use poly_wide_int for computations,
>   make static.

Is it OK to backport this to GCC 10, along with the follow-on rtl patch?
(1b5f74e8be4dd7abe5624ff60adceff19ca71bda)?

Richard

> ---
>  gcc/tree.c | 10 +-
>  gcc/tree.h |  1 -
>  2 files changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/tree.c b/gcc/tree.c
> index 287e5001dc3..f9d57e6d409 100644
> --- a/gcc/tree.c
> +++ b/gcc/tree.c
> @@ -11079,13 +11079,13 @@ build_opaque_vector_type (tree innertype, 
> poly_int64 nunits)
>  
>  /* Return the value of element I of VECTOR_CST T as a wide_int.  */
>  
> -wide_int
> +static poly_wide_int
>  vector_cst_int_elt (const_tree t, unsigned int i)
>  {
>/* First handle elements that are directly encoded.  */
>unsigned int encoded_nelts = vector_cst_encoded_nelts (t);
>if (i < encoded_nelts)
> -return wi::to_wide (VECTOR_CST_ENCODED_ELT (t, i));
> +return wi::to_poly_wide (VECTOR_CST_ENCODED_ELT (t, i));
>  
>/* Identify the pattern that contains element I and work out the index of
>   the last encoded element for that pattern.  */
> @@ -11096,13 +11096,13 @@ vector_cst_int_elt (const_tree t, unsigned int i)
>  
>/* If there are no steps, the final encoded value is the right one.  */
>if (!VECTOR_CST_STEPPED_P (t))
> -return wi::to_wide (VECTOR_CST_ENCODED_ELT (t, final_i));
> +return wi::to_poly_wide (VECTOR_CST_ENCODED_ELT (t, final_i));
>  
>/* Otherwise work out the value from the last two encoded elements.  */
>tree v1 = VECTOR_CST_ENCODED_ELT (t, final_i - npatterns);
>tree v2 = VECTOR_CST_ENCODED_ELT (t, final_i);
> -  wide_int diff = wi::to_wide (v2) - wi::to_wide (v1);
> -  return wi::to_wide (v2) + (count - 2) * diff;
> +  poly_wide_int diff = wi::to_poly_wide (v2) - wi::to_poly_wide (v1);
> +  return wi::to_poly_wide (v2) + (count - 2) * diff;
>  }
>  
>  /* Return the value of element I of VECTOR_CST T.  */
> diff --git a/gcc/tree.h b/gcc/tree.h
> index 02b03d1f68e..17a811c02e8 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -4762,7 +4762,6 @@ extern tree last_field (const_tree) ATTRIBUTE_NONNULL 
> (1);
>  extern bool initializer_zerop (const_tree, bool * = NULL);
>  extern bool initializer_each_zero_or_onep (const_tree);
>  
> -extern wide_int vector_cst_int_elt (const_tree, unsigned int);
>  extern tree vector_cst_elt (const_tree, unsigned int);
>  
>  /* Given a vector VEC, return its first element if all elements are
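
To make the arithmetic in the quoted function easier to follow, here is a small sketch of the pattern-based element lookup using plain integers in place of trees and poly_wide_int. The struct and function names are hypothetical, and the index computation that the diff context elides (pattern, count, final_i) is written from my understanding of the encoding and should be treated as an assumption. A pattern with three encoded elements is taken to be "stepped".

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for an encoded vector constant: NPATTERNS
// interleaved patterns, each described by 1, 2 or 3 leading elements.
struct encoded_vector
{
  unsigned int npatterns;
  unsigned int nelts_per_pattern;
  std::vector<std::int64_t> encoded;  // npatterns * nelts_per_pattern values
};

// Return element I of V, extrapolating beyond the encoded elements.
static std::int64_t
encoded_vector_elt (const encoded_vector &v, unsigned int i)
{
  // Elements that are directly encoded.
  unsigned int encoded_nelts = v.npatterns * v.nelts_per_pattern;
  if (i < encoded_nelts)
    return v.encoded[i];

  // Pattern containing element I and the last encoded element of it.
  unsigned int pattern = i % v.npatterns;
  std::int64_t count = i / v.npatterns;
  unsigned int final_i = encoded_nelts - v.npatterns + pattern;

  // Without a step, the final encoded value repeats.
  if (v.nelts_per_pattern != 3)
    return v.encoded[final_i];

  // Otherwise extrapolate from the last two encoded elements.
  std::int64_t v1 = v.encoded[final_i - v.npatterns];
  std::int64_t v2 = v.encoded[final_i];
  std::int64_t diff = v2 - v1;
  return v2 + (count - 2) * diff;
}
```

For example, the series 0, 1, 2, 3, ... is one stepped pattern encoded as { 0, 1, 2 }, and element 5 comes out as 2 + (5 - 2) * 1 = 5. The patch keeps this shape and only changes the value type so that POLY_INT elements survive the subtraction and multiplication.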


Re: [PATCH] Add folding and remove expanders for x86 *pcmp{et,gt}* builtins [PR target/98911]

2021-04-23 Thread Hongtao Liu via Gcc-patches
On Fri, Apr 23, 2021 at 3:18 PM Uros Bizjak  wrote:
>
> On Fri, Apr 23, 2021 at 9:15 AM Hongtao Liu  wrote:
> >
> > On Fri, Apr 23, 2021 at 2:50 PM Uros Bizjak  wrote:
> > >
> > > On Fri, Apr 23, 2021 at 8:36 AM Hongtao Liu  wrote:
> > > >
> > > > Hi:
> > > >   The patch is a follow-up to
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564320.html.
> > > > >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> > > >   Ok for trunk?
> > > > gcc/ChangeLog:
> > > >
> > > > PR target/98911
> > > > * config/i386/i386-builtin.def (BDESC): Change the icode of
> > > > the following builtins to CODE_FOR_nothing.
> > > > * config/i386/i386.c (ix86_gimple_fold_builtin): Fold
> > > > IX86_BUILTIN_PCMPEQB128, IX86_BUILTIN_PCMPEQW128,
> > > > IX86_BUILTIN_PCMPEQD128, IX86_BUILTIN_PCMPEQQ,
> > > > IX86_BUILTIN_PCMPEQB256, IX86_BUILTIN_PCMPEQW256,
> > > > IX86_BUILTIN_PCMPEQD256, IX86_BUILTIN_PCMPEQQ256,
> > > > IX86_BUILTIN_PCMPGTB128, IX86_BUILTIN_PCMPGTW128,
> > > > IX86_BUILTIN_PCMPGTD128, IX86_BUILTIN_PCMPGTQ,
> > > > IX86_BUILTIN_PCMPGTB256, IX86_BUILTIN_PCMPGTW256,
> > > > IX86_BUILTIN_PCMPGTD256, IX86_BUILTIN_PCMPGTQ256.
> > > > * config/i386/sse.md (avx2_eq3): Deleted.
> > > > (sse2_eq3): Ditto.
> > > > (sse2_gt3): Rename to ..
> > > > (*sse2_gt3): .. this.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > PR target/98911
> > > > * gcc.target/i386/pr98911.c: New test.
> > > > * gcc.target/i386/funcspec-8.c: Remove __builtin_ia32_pcmpgtq
> > > > since it has been folded.
> > >
> > >
> > > -(define_expand "sse2_eq3"
> > > -  [(set (match_operand:VI124_128 0 "register_operand")
> > > -(eq:VI124_128
> > > -  (match_operand:VI124_128 1 "vector_operand")
> > > -  (match_operand:VI124_128 2 "vector_operand")))]
> > > -  "TARGET_SSE2 && !TARGET_XOP "
> > > -  "ix86_fixup_binary_operands_no_copy (EQ, mode, operands);")
> > > -
> > >  (define_expand "sse4_1_eqv2di3"
> > >[(set (match_operand:V2DI 0 "register_operand")
> > >  (eq:V2DI
> > >
> > > You can also remove sse4_1_eqv2di3 expander.
> >
> > Oh, yes.
> >
> > >
> > > -#ifdef __SSE4_2__
> > > -#error "-msse4.2 should not be set for this test"
> > > -#endif
> > > -
> > > -__m128i sse4_2_pcmpgtq (__m128i a, __m128i b)
> > > __attribute__((__target__("sse4.2")));
> > > -__m128i generic_pcmpgtq (__m128i ab, __m128i b);
> > > -
> > > -__m128i
> > > -sse4_2_pcmpgtq (__m128i a, __m128i b)
> > > -{
> > > -  return __builtin_ia32_pcmpgtq (a, b);
> > > -}
> > > -
> > > -__m128i
> > > -generic_pcmpgtq (__m128i a, __m128i b)
> > > -{
> > > -  return __builtin_ia32_pcmpgtq (a, b);/* { dg-error
> > > "needs isa option" } */
> > > -}
> > >
> > > Why remove the above? It is testing isa options, it has nothing to do
> > > with improved folding.
> >
> > If the backend does not support the corresponding instruction, the
> > vector operation will be automatically lowered to scalar, so no error
> > will be reported
>
> I see. It would be nice to change the test to use some other SSE4.2
> builtin (there are plenty of packed compares) and not remove it
> altogether.

Updated patch.

>
> Uros.



-- 
BR,
Hongtao
From 31b3110300b9661b5a7bb5811d487ea35dbab8e9 Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Tue, 23 Feb 2021 11:17:40 +0800
Subject: [PATCH] Add folding and remove expanders for x86 *pcmp{et,gt}*
 builtins [PR target/98911]

gcc/ChangeLog:

	PR target/98911
	* config/i386/i386-builtin.def (BDESC): Change the icode of
	the following builtins to CODE_FOR_nothing.
	* config/i386/i386.c (ix86_gimple_fold_builtin): Fold
	IX86_BUILTIN_PCMPEQB128, IX86_BUILTIN_PCMPEQW128,
	IX86_BUILTIN_PCMPEQD128, IX86_BUILTIN_PCMPEQQ,
	IX86_BUILTIN_PCMPEQB256, IX86_BUILTIN_PCMPEQW256,
	IX86_BUILTIN_PCMPEQD256, IX86_BUILTIN_PCMPEQQ256,
	IX86_BUILTIN_PCMPGTB128, IX86_BUILTIN_PCMPGTW128,
	IX86_BUILTIN_PCMPGTD128, IX86_BUILTIN_PCMPGTQ,
	IX86_BUILTIN_PCMPGTB256, IX86_BUILTIN_PCMPGTW256,
	IX86_BUILTIN_PCMPGTD256, IX86_BUILTIN_PCMPGTQ256.
	* config/i386/sse.md (avx2_eq3): Deleted.
	(sse2_eq3): Ditto.
	(sse4_1_eqv2di3): Ditto.
	(sse2_gt3): Rename to ..
	(*sse2_gt3): .. this.

gcc/testsuite/ChangeLog:

	PR target/98911
	* gcc.target/i386/pr98911.c: New test.
	* gcc.target/i386/funcspec-8.c: Replace __builtin_ia32_pcmpgtq
	with __builtin_ia32_pcmpistrm128 since it has been folded.
---
 gcc/config/i386/i386-builtin.def   |  32 +++---
 gcc/config/i386/i386.c |  44 
 gcc/config/i386/sse.md |  26 +
 gcc/testsuite/gcc.target/i386/funcspec-8.c |  17 +--
 gcc/testsuite/gcc.target/i386/pr98911.c| 116 +
 5 files changed, 186 insertions(+), 49 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr98911.c

diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index e3ed4e1578f..4dbd4f23647 

Re: [PATCH] Avoid more temporaries in IVOPTs

2021-04-23 Thread Richard Biener via Gcc-patches
On Wed, Apr 14, 2021 at 2:41 PM Richard Biener  wrote:
>
> This avoids use of valid_gimple_rhs_p and instead gimplifies to
> such a RHS, avoiding more SSA copies being generated by IVOPTs.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, queued for stage1

g:b26485f1af45423980b7bc1206411cf4b8bb84b6

> 2021-04-14  Richard Biener  
>
> * tree-ssa-loop-ivopts.c (rewrite_use_nonlinear_expr): Avoid
> valid_gimple_rhs_p by instead gimplifying to one.
> ---
>  gcc/tree-ssa-loop-ivopts.c | 13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
> index 4012ae3f19d..12a8a49a307 100644
> --- a/gcc/tree-ssa-loop-ivopts.c
> +++ b/gcc/tree-ssa-loop-ivopts.c
> @@ -7286,12 +7286,13 @@ rewrite_use_nonlinear_expr (struct ivopts_data *data,
>  }
>
>comp = fold_convert (type, comp);
> -  if (!valid_gimple_rhs_p (comp)
> -  || (gimple_code (use->stmt) != GIMPLE_PHI
> - /* We can't allow re-allocating the stmt as it might be pointed
> -to still.  */
> - && (get_gimple_rhs_num_ops (TREE_CODE (comp))
> - >= gimple_num_ops (gsi_stmt (bsi)))))
> +  comp = force_gimple_operand (comp, &seq, false, NULL);
> +  gimple_seq_add_seq (&stmt_list, seq);
> +  if (gimple_code (use->stmt) != GIMPLE_PHI
> +  /* We can't allow re-allocating the stmt as it might be pointed
> +to still.  */
> +  && (get_gimple_rhs_num_ops (TREE_CODE (comp))
> + >= gimple_num_ops (gsi_stmt (bsi))))
>  {
>comp = force_gimple_operand (comp, &seq, true, NULL);
>gimple_seq_add_seq (&stmt_list, seq);
> --
> 2.26.2


Re: [PATCH] Bump LTO_major_version to 11.

2021-04-23 Thread Martin Liška
On 4/23/21 9:28 AM, Richard Biener wrote:
> On Tue, Apr 20, 2021 at 8:49 PM Martin Liška  wrote:
>>
>> On 4/20/21 2:46 PM, Richard Biener wrote:
>>> OK.  Can you somehow arrange for trunk to pick up LTO_major from GCC
>>> major automagically then?
>>
>> I have a pretty nice solution for it where I extended (and simplified)
>> the existing gcov-iov.c generator. Doing that we can remove gcc/version.[ch].
>>
>> Using the patch, the following version.h is generated:
>>
>> #ifndef VERSION_H
>> #define VERSION_H
>>
>> /* Generated automatically by genversion.  */
>>
>> #define GCC_major_version 12
>>
>> /* The complete version string, assembled from several pieces.
>> BASEVER, DATESTAMP, DEVPHASE, and REVISION are defined by the
>> Makefile.  */
>>
>> #define version_string "12.0.0 20210420 (experimental)"
>> #define pkgversion_string "(GCC) "
>>
>> /* This is the location of the online document giving instructions for
>> reporting bugs.  If you distribute a modified version of GCC,
>> please configure with --with-bugurl pointing to a document giving
>> instructions for reporting bugs to you, not us.  (You are of course
>> welcome to forward us bugs reported to you, if you determine that
>> they are not bugs in your modifications.)  */
>>
>> #define bug_report_url ""
>>
>> #define GCOV_VERSION ((gcov_unsigned_t)0x42323020)  /* B20  */
>>
>> #endif /* VERSION_H */
>>
>> Ready for master?
> 
> Nice.  This is OK if others do not have further comments.

Thanks, I'm going to install it once GCC 11.1 is released.

> 
> I think we'd want to explore whether we can integrate
> genchecksum.c as well and make the PCH checksum
> based on a set of source files (including the generated
> auto-host.h) - that might allow removing the two-stage

Definitely. I see multiple options:

1) Using git, which can provide a hash for the content of a folder:

$ git ls-tree HEAD -- gcc
040000 tree db613554ec17462c63bace2015c877d6bed70bbe    gcc

One can do that per-file as well:

$ git ls-tree HEAD -- gcc/c/*.c
100644 blob bae5757ad137c0af58dbe66229d4201a45094aca    gcc/c/c-aux-info.c
100644 blob d0035a31723447657a04c2ef79c9fd7c0ddc7568    gcc/c/c-convert.c
100644 blob 3ea4708c5075d9274601a0676f86a6900a9345b0    gcc/c/c-decl.c
100644 blob de98958ceabac9d631f937f9f28547d8aed26af2    gcc/c/c-errors.c
100644 blob 68c74cc1eb2ef908545b36e2dbff65606f756e15    gcc/c/c-fold.c
...

That needs to be combined with the generated auto-host.h header file.
From which locations do you want to build the hash? Any other $objdir
files except auto-host.h?

Note 'git archive' can append arbitrary non-git files.

2) Doing checksum of *.[cC] in a given folder + auto-host.h.

3) Using git hash (+ auto-host.h), but it's likely too gross, right?

> link and my "hack" to re-use the version from prev-gcc
> as well as our openSUSE "hack" for reproducible builds
> which elides genchecksum.c for the use of the build-id
> in the actual executables.

What a hack. The binary is reading its build-id right from memory,
right?

Thoughts?

Martin

> 
> Richard.
> 
>> Thanks,
>> Martin



Re: [PATCH] tree-optimization/99971 - improve BB vect dependence analysis

2021-04-23 Thread Richard Biener via Gcc-patches
On Fri, Apr 9, 2021 at 10:28 AM Richard Biener  wrote:
>
> We can use TBAA even when we have a DR, do so.  For the testcase
> that means fully vectorizing it instead of only vectorizing
> the first store group resulting in suboptimal code.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, queued for stage1.

g:700e542971251b11623cce877075567815f72965

> 2021-04-09  Richard Biener  
>
> PR tree-optimization/99971
> * tree-vect-data-refs.c (vect_slp_analyze_node_dependences):
> Always use TBAA for loads.
>
> * g++.dg/vect/slp-pr99971.cc: New testcase.
> ---
>  gcc/testsuite/g++.dg/vect/slp-pr99971.cc | 36 
>  gcc/tree-vect-data-refs.c| 18 +++-
>  2 files changed, 47 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/vect/slp-pr99971.cc
>
> diff --git a/gcc/testsuite/g++.dg/vect/slp-pr99971.cc 
> b/gcc/testsuite/g++.dg/vect/slp-pr99971.cc
> new file mode 100644
> index 000..bec6418d4e8
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/vect/slp-pr99971.cc
> @@ -0,0 +1,36 @@
> +// { dg-do compile }
> +// { dg-require-effective-target vect_int }
> +
> +struct A
> +{
> +  unsigned int a, b, c, d;
> +
> +  A& operator+= (A const& that)
> +{
> +  a += that.a;
> +  b += that.b;
> +  c += that.c;
> +  d += that.d;
> +  return *this;
> +}
> +
> +  A& operator-= (A const& that)
> +{
> +  a -= that.a;
> +  b -= that.b;
> +  c -= that.c;
> +  d -= that.d;
> +  return *this;
> +}
> +};
> +
> +void test(A& x, A const& y1, A const& y2)
> +{
> +  x += y1;
> +  x -= y2;
> +}
> +
> +// We want to SLP vectorize a single connected SLP subgraph with two 
> instances
> +// { dg-final { scan-tree-dump-not "removing SLP instance" "slp2" } }
> +// { dg-final { scan-tree-dump-times "SLPing BB part" 1 "slp2" } }
> +// { dg-final { scan-tree-dump-times "Vectorizing SLP" 2 "slp2" } }
> diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> index ee266ba62a8..6ea5e3a3eda 100644
> --- a/gcc/tree-vect-data-refs.c
> +++ b/gcc/tree-vect-data-refs.c
> @@ -780,16 +780,20 @@ vect_slp_analyze_node_dependences (vec_info *vinfo, 
> slp_tree node,
>  stmt we have to resort to the alias oracle.  */
>   stmt_vec_info stmt_info = vinfo->lookup_stmt (stmt);
>   data_reference *dr_b = STMT_VINFO_DATA_REF (stmt_info);
> - if (!dr_b)
> +
> + /* We are hoisting a load - this means we can use
> +TBAA for disambiguation.  */
> + if (!ref_initialized_p)
> +   ao_ref_init (, DR_REF (dr_a));
> + if (stmt_may_clobber_ref_p_1 (stmt, , true))
> {
> - /* We are hoisting a load - this means we can use
> -TBAA for disambiguation.  */
> - if (!ref_initialized_p)
> -   ao_ref_init (&ref, DR_REF (dr_a));
> - if (stmt_may_clobber_ref_p_1 (stmt, &ref, true))
> + if (!dr_b)
> return false;
> - continue;
> + /* Resort to dependence checking below.  */
> }
> + else
> +   /* No dependence.  */
> +   continue;
>
>   bool dependent = false;
>   /* If we run into a store of this same instance (we've just
> --
> 2.26.2


Re: [PATCH] Bump LTO_major_version to 11.

2021-04-23 Thread Richard Biener via Gcc-patches
On Tue, Apr 20, 2021 at 8:49 PM Martin Liška  wrote:
>
> On 4/20/21 2:46 PM, Richard Biener wrote:
> > OK.  Can you somehow arrange for trunk to pick up LTO_major from GCC
> > major automagically then?
>
> I have a pretty nice solution for it where I extended (and simplified)
> the existing gcov-iov.c generator. Doing that we can remove gcc/version.[ch].
>
> Using the patch, the following version.h is generated:
>
> #ifndef VERSION_H
> #define VERSION_H
>
> /* Generated automatically by genversion.  */
>
> #define GCC_major_version 12
>
> /* The complete version string, assembled from several pieces.
> BASEVER, DATESTAMP, DEVPHASE, and REVISION are defined by the
> Makefile.  */
>
> #define version_string "12.0.0 20210420 (experimental)"
> #define pkgversion_string "(GCC) "
>
> /* This is the location of the online document giving instructions for
> reporting bugs.  If you distribute a modified version of GCC,
> please configure with --with-bugurl pointing to a document giving
> instructions for reporting bugs to you, not us.  (You are of course
> welcome to forward us bugs reported to you, if you determine that
> they are not bugs in your modifications.)  */
>
> #define bug_report_url ""
>
> #define GCOV_VERSION ((gcov_unsigned_t)0x42323020)  /* B20  */
>
> #endif /* VERSION_H */
>
> Ready for master?

Nice.  This is OK if others do not have further comments.

I think we'd want to explore whether we can integrate
genchecksum.c as well and make the PCH checksum
based on a set of source files (including the generated
auto-host.h) - that might allow removing the two-stage
link and my "hack" to re-use the version from prev-gcc
as well as our openSUSE "hack" for reproducible builds
which elides genchecksum.c for the use of the build-id
in the actual executables.

Richard.

> Thanks,
> Martin


Re: [PATCH] Add folding and remove expanders for x86 *pcmp{et,gt}* builtins [PR target/98911]

2021-04-23 Thread Uros Bizjak via Gcc-patches
On Fri, Apr 23, 2021 at 9:15 AM Hongtao Liu  wrote:
>
> On Fri, Apr 23, 2021 at 2:50 PM Uros Bizjak  wrote:
> >
> > On Fri, Apr 23, 2021 at 8:36 AM Hongtao Liu  wrote:
> > >
> > > Hi:
> > >   The patch is a follow-up to
> > > https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564320.html.
> > > >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> > >   Ok for trunk?
> > > gcc/ChangeLog:
> > >
> > > PR target/98911
> > > * config/i386/i386-builtin.def (BDESC): Change the icode of
> > > the following builtins to CODE_FOR_nothing.
> > > * config/i386/i386.c (ix86_gimple_fold_builtin): Fold
> > > IX86_BUILTIN_PCMPEQB128, IX86_BUILTIN_PCMPEQW128,
> > > IX86_BUILTIN_PCMPEQD128, IX86_BUILTIN_PCMPEQQ,
> > > IX86_BUILTIN_PCMPEQB256, IX86_BUILTIN_PCMPEQW256,
> > > IX86_BUILTIN_PCMPEQD256, IX86_BUILTIN_PCMPEQQ256,
> > > IX86_BUILTIN_PCMPGTB128, IX86_BUILTIN_PCMPGTW128,
> > > IX86_BUILTIN_PCMPGTD128, IX86_BUILTIN_PCMPGTQ,
> > > IX86_BUILTIN_PCMPGTB256, IX86_BUILTIN_PCMPGTW256,
> > > IX86_BUILTIN_PCMPGTD256, IX86_BUILTIN_PCMPGTQ256.
> > > * config/i386/sse.md (avx2_eq3): Deleted.
> > > (sse2_eq3): Ditto.
> > > (sse2_gt3): Rename to ..
> > > (*sse2_gt3): .. this.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > PR target/98911
> > > * gcc.target/i386/pr98911.c: New test.
> > > * gcc.target/i386/funcspec-8.c: Remove __builtin_ia32_pcmpgtq
> > > since it has been folded.
> >
> >
> > -(define_expand "sse2_eq3"
> > -  [(set (match_operand:VI124_128 0 "register_operand")
> > -(eq:VI124_128
> > -  (match_operand:VI124_128 1 "vector_operand")
> > -  (match_operand:VI124_128 2 "vector_operand")))]
> > -  "TARGET_SSE2 && !TARGET_XOP "
> > -  "ix86_fixup_binary_operands_no_copy (EQ, mode, operands);")
> > -
> >  (define_expand "sse4_1_eqv2di3"
> >[(set (match_operand:V2DI 0 "register_operand")
> >  (eq:V2DI
> >
> > You can also remove sse4_1_eqv2di3 expander.
>
> Oh, yes.
>
> >
> > -#ifdef __SSE4_2__
> > -#error "-msse4.2 should not be set for this test"
> > -#endif
> > -
> > -__m128i sse4_2_pcmpgtq (__m128i a, __m128i b)
> > __attribute__((__target__("sse4.2")));
> > -__m128i generic_pcmpgtq (__m128i ab, __m128i b);
> > -
> > -__m128i
> > -sse4_2_pcmpgtq (__m128i a, __m128i b)
> > -{
> > -  return __builtin_ia32_pcmpgtq (a, b);
> > -}
> > -
> > -__m128i
> > -generic_pcmpgtq (__m128i a, __m128i b)
> > -{
> > -  return __builtin_ia32_pcmpgtq (a, b);/* { dg-error
> > "needs isa option" } */
> > -}
> >
> > Why remove the above? It is testing isa options, it has nothing to do
> > with improved folding.
>
> If the backend does not support the corresponding instruction, the
> vector operation will be automatically lowered to scalar, so no error
> will be reported

I see. It would be nice to change the test to use some other SSE4.2
builtin (there are plenty of packed compares) and not remove it
altogether.

Uros.


Re: [PATCH] Add folding and remove expanders for x86 *pcmp{et,gt}* builtins [PR target/98911]

2021-04-23 Thread Hongtao Liu via Gcc-patches
On Fri, Apr 23, 2021 at 2:50 PM Uros Bizjak  wrote:
>
> On Fri, Apr 23, 2021 at 8:36 AM Hongtao Liu  wrote:
> >
> > Hi:
> >   The patch is a follow-up to
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564320.html.
> > >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> >   Ok for trunk?
> > gcc/ChangeLog:
> >
> > PR target/98911
> > * config/i386/i386-builtin.def (BDESC): Change the icode of
> > the following builtins to CODE_FOR_nothing.
> > * config/i386/i386.c (ix86_gimple_fold_builtin): Fold
> > IX86_BUILTIN_PCMPEQB128, IX86_BUILTIN_PCMPEQW128,
> > IX86_BUILTIN_PCMPEQD128, IX86_BUILTIN_PCMPEQQ,
> > IX86_BUILTIN_PCMPEQB256, IX86_BUILTIN_PCMPEQW256,
> > IX86_BUILTIN_PCMPEQD256, IX86_BUILTIN_PCMPEQQ256,
> > IX86_BUILTIN_PCMPGTB128, IX86_BUILTIN_PCMPGTW128,
> > IX86_BUILTIN_PCMPGTD128, IX86_BUILTIN_PCMPGTQ,
> > IX86_BUILTIN_PCMPGTB256, IX86_BUILTIN_PCMPGTW256,
> > IX86_BUILTIN_PCMPGTD256, IX86_BUILTIN_PCMPGTQ256.
> > * config/i386/sse.md (avx2_eq3): Deleted.
> > (sse2_eq3): Ditto.
> > (sse2_gt3): Rename to ..
> > (*sse2_gt3): .. this.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/98911
> > * gcc.target/i386/pr98911.c: New test.
> > * gcc.target/i386/funcspec-8.c: Remove __builtin_ia32_pcmpgtq
> > since it has been folded.
>
>
> -(define_expand "sse2_eq3"
> -  [(set (match_operand:VI124_128 0 "register_operand")
> -(eq:VI124_128
> -  (match_operand:VI124_128 1 "vector_operand")
> -  (match_operand:VI124_128 2 "vector_operand")))]
> -  "TARGET_SSE2 && !TARGET_XOP "
> -  "ix86_fixup_binary_operands_no_copy (EQ, mode, operands);")
> -
>  (define_expand "sse4_1_eqv2di3"
>[(set (match_operand:V2DI 0 "register_operand")
>  (eq:V2DI
>
> You can also remove sse4_1_eqv2di3 expander.

Oh, yes.

>
> -#ifdef __SSE4_2__
> -#error "-msse4.2 should not be set for this test"
> -#endif
> -
> -__m128i sse4_2_pcmpgtq (__m128i a, __m128i b)
> __attribute__((__target__("sse4.2")));
> -__m128i generic_pcmpgtq (__m128i ab, __m128i b);
> -
> -__m128i
> -sse4_2_pcmpgtq (__m128i a, __m128i b)
> -{
> -  return __builtin_ia32_pcmpgtq (a, b);
> -}
> -
> -__m128i
> -generic_pcmpgtq (__m128i a, __m128i b)
> -{
> -  return __builtin_ia32_pcmpgtq (a, b);/* { dg-error
> "needs isa option" } */
> -}
>
> Why remove the above? It is testing isa options, it has nothing to do
> with improved folding.

If the backend does not support the corresponding instruction, the
vector operation will be automatically lowered to scalar, so no error
will be reported
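
To illustrate the lowering described above, here is a minimal sketch using
GCC's generic vector extension.  It is not taken from the patch or from
pr98911.c; the names v2di_t, generic_gt and gt_lane are made up for this
example.  The point is only that a lane-wise comparison written this way
needs no target attribute, so there is nothing left for an ISA check to
reject:

```c
#include <stdint.h>

/* Generic 128-bit vector of two signed 64-bit lanes (vector extension).
   The typedef name is an assumption for this sketch.  */
typedef int64_t v2di_t __attribute__ ((vector_size (16)));

/* Lane-wise signed A > B.  Each result lane is -1 (all bits set) where
   the comparison holds and 0 otherwise.  No -msse4.2 or target attribute
   is needed; when the target has no matching instruction the compiler
   is free to lower the operation to other code.  */
v2di_t
generic_gt (v2di_t a, v2di_t b)
{
  /* The explicit cast keeps the result type independent of which signed
     64-bit element type the compiler picks for the comparison.  */
  return (v2di_t) (a > b);
}

/* Hypothetical helper: lane I of {A0, A1} > {B0, B1}.  */
int64_t
gt_lane (int64_t a0, int64_t a1, int64_t b0, int64_t b1, int i)
{
  v2di_t a = { a0, a1 };
  v2di_t b = { b0, b1 };
  v2di_t r = generic_gt (a, b);
  return r[i];
}
```

For example, gt_lane (5, -1, 3, 7, 0) yields -1 and gt_lane (5, -1, 3, 7, 1)
yields 0.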

>
> Uros.



-- 
BR,
Hongtao


Re: [PATCH 1/2] Generate overlapping operations between two areas of memory

2021-04-23 Thread Richard Biener via Gcc-patches
On Fri, Apr 23, 2021 at 1:35 AM H.J. Lu via Gcc-patches
 wrote:
>
> For op_by_pieces operations between two areas of memory on non-strict
> alignment target, add -foverlap-op-by-pieces=[off|on|max-memset] to
> generate overlapping operations to minimize number of operations if it
> is not a stack push which must not overlap.
>
> When operating on LENGTH bytes of memory, -foverlap-op-by-pieces=on
> starts with the widest usable integer size, MAX_SIZE, for LENGTH bytes
> and finishes with the smallest usable integer size, MIN_SIZE, for the
> remaining bytes where MAX_SIZE >= MIN_SIZE.  If MIN_SIZE > the remaining
> bytes, the last operation is performed on MIN_SIZE bytes of overlapping
> memory from the previous operation.
>
> For memset with non-zero byte, -foverlap-op-by-pieces=max-memset generates
> an overlapping fill with MAX_SIZE if the number of the remaining bytes is
> greater than one.
>
> Tested on Linux/x86-64 with both -foverlap-op-by-pieces enabled and
> disabled by default.

Neither the user documentation nor the patch description tells me what
"generate overlapping operations" does.  I _suspect_ it's doing an
offset-adjusted read/write of the last piece of a memory region to
avoid doing more than one smaller operation.  Thus for a region
of size 7 and 4-byte granular ops you'd do operations at
offset 0 and 3 rather than one at 0, a two-byte at offset 4 and
a one-byte at offset 6.
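
A minimal sketch of that reading, for illustration only: plan_pieces is a
hypothetical helper, not code from the patch, and it assumes power-of-two
piece widths.  It lists the (offset, size) pieces needed to cover a region
with and without overlap:

```c
#include <stddef.h>

/* Hypothetical planner, not GCC code.  Fill OFFS/SIZES with the pieces
   needed to cover LEN bytes using power-of-two widths no larger than
   MAX_WIDTH, and return the number of pieces.  Both arrays must have
   room for LEN entries.  */
size_t
plan_pieces (size_t len, size_t max_width, int overlap,
             size_t *offs, size_t *sizes)
{
  size_t n = 0, pos = 0, width = max_width;

  while (pos < len)
    {
      size_t left = len - pos;

      if (width <= left)
        {
          /* A full piece fits without touching bytes past LEN.  */
          offs[n] = pos;
          sizes[n] = width;
          n++;
          pos += width;
        }
      else if (overlap)
        {
          /* Pick the smallest power-of-two width covering the tail.  */
          while (width / 2 >= left)
            width /= 2;
          if (width <= len)
            {
              /* Back up so that the piece ends exactly at LEN.  */
              offs[n] = len - width;
              sizes[n] = width;
              n++;
              pos = len;
            }
          else
            /* Region shorter than one piece: use a narrower width.  */
            width /= 2;
        }
      else
        /* No overlap allowed: fall back to the next narrower width.  */
        width /= 2;
    }
  return n;
}
```

With len 7 and max_width 4 this gives two pieces, (0,4) and (3,4), when
overlap is set, and three pieces, (0,4), (4,2) and (6,1), when it is not.
With len 6 the tail is already a power of two, so the sketch emits (0,4)
and (4,2) in both modes.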

When the tail is of power-of-two size you still generate non-overlapping
ops?

For memmove there's a correctness issue so you have to make sure
to first load the last two ops before performing the stores which
increases register pressure.
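
A minimal sketch of that ordering constraint, assuming a fixed 7-byte move
done as two overlapping 4-byte pieces (move7_overlapping is a made-up name,
not something from the patch):

```c
#include <stdint.h>
#include <string.h>

/* Move exactly 7 bytes from SRC to DST, which may overlap, using two
   4-byte pieces at offsets 0 and 3.  The fixed-size memcpy calls stand
   in for single unaligned loads and stores.  */
void
move7_overlapping (unsigned char *dst, const unsigned char *src)
{
  uint32_t head, tail;

  /* Load both pieces before any store, so that a store into an
     overlapping DST cannot clobber source bytes not yet read.  This is
     what keeps two values live at once.  */
  memcpy (&head, src, 4);
  memcpy (&tail, src + 3, 4);

  memcpy (dst, &head, 4);
  memcpy (dst + 3, &tail, 4);
}
```

Interleaving the operations instead (load, store, load, store) would read
already-overwritten bytes whenever DST overlaps the second source piece.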

I'm not sure we want a -f option to control this - not all targets will
be able to support this.  So I'd use a target hook or rather extend
the existing use_by_pieces_infrastructure_p hook with an alternate
return (some flags bitmask I guess).  We do have one extra
target hook, compare_by_pieces_branch_ratio, so by that using
an alternate hook might be also OK.

Adding a -m option in targets that want this user-controllable would
be OK of course.

Richard.

> gcc/
>
> PR middle-end/90773
> * common.opt (-foverlap-op-by-pieces): New.
> * expr.c (by_pieces_ninsns): If -foverlap-op-by-pieces is enabled,
> round up size and alignment to the widest integer mode for maximum
> size
> (op_by_pieces_d): Add get_usable_mode, m_push and
> m_non_zero_memset.
> (op_by_pieces_d::op_by_pieces_d): Add 2 bool arguments to
> initialize m_push and m_non_zero_memset.
> (op_by_pieces_d::get_usable_mode): New.
> (op_by_pieces_d::run): Use get_usable_mode to get the largest
> usable integer mode and generate overlapping operations for
> -foverlap-op-by-pieces.
> (PUSHG_P): New.
> (move_by_pieces_d::move_by_pieces_d): Updated for op_by_pieces_d
> change.
> (store_by_pieces_d::store_by_pieces_d): Likewise.
> (clear_by_pieces): Likewise.
> * toplev.c (process_options): Issue an error when
> -foverlap-op-by-pieces is used for strict alignment target.
> * doc/invoke.texi: Document -foverlap-op-by-pieces.
>
> gcc/testsuite/
>
> PR middle-end/90773
> * g++.dg/pr90773-1.h: New test.
> * g++.dg/pr90773-1a.C: Likewise.
> * g++.dg/pr90773-1b.C: Likewise.
> * g++.dg/pr90773-1c.C: Likewise.
> * g++.dg/pr90773-1d.C: Likewise.
> * gcc.target/i386/pr90773-1.c: Likewise.
> * gcc.target/i386/pr90773-2.c: Likewise.
> * gcc.target/i386/pr90773-3.c: Likewise.
> * gcc.target/i386/pr90773-4.c: Likewise.
> * gcc.target/i386/pr90773-5.c: Likewise.
> * gcc.target/i386/pr90773-6.c: Likewise.
> * gcc.target/i386/pr90773-7.c: Likewise.
> * gcc.target/i386/pr90773-8.c: Likewise.
> * gcc.target/i386/pr90773-9.c: Likewise.
> * gcc.target/i386/pr90773-10.c: Likewise.
> * gcc.target/i386/pr90773-11.c: Likewise.
> ---
>  gcc/common.opt |  19 +++
>  gcc/doc/invoke.texi|  14 ++
>  gcc/expr.c | 159 -
>  gcc/testsuite/g++.dg/pr90773-1.h   |  14 ++
>  gcc/testsuite/g++.dg/pr90773-1a.C  |  13 ++
>  gcc/testsuite/g++.dg/pr90773-1b.C  |   5 +
>  gcc/testsuite/g++.dg/pr90773-1c.C  |   5 +
>  gcc/testsuite/g++.dg/pr90773-1d.C  |  19 +++
>  gcc/testsuite/gcc.target/i386/pr90773-1.c  |  17 +++
>  gcc/testsuite/gcc.target/i386/pr90773-10.c |  13 ++
>  gcc/testsuite/gcc.target/i386/pr90773-11.c |  13 ++
>  gcc/testsuite/gcc.target/i386/pr90773-2.c  |  20 +++
>  gcc/testsuite/gcc.target/i386/pr90773-3.c  |  23 +++
>  gcc/testsuite/gcc.target/i386/pr90773-4.c  |  13 ++
>  gcc/testsuite/gcc.target/i386/pr90773-5.c  |  13 ++
>  gcc/testsuite/gcc.target/i386/pr90773-6.c  |  11 ++
>  

Re: [PATCH 2/2] bpf: allow BSS symbols to be global symbols

2021-04-23 Thread Jose E. Marchesi via Gcc-patches


Hi YiFei.

> Prior to this, a BSS declaration such as:
>
>   int foo;
>   static int bar;
>
> Generates:
>
>   .global foo
>   .local  foo
>   .comm   foo,4,4
>   .local  bar
>   .comm bar,4,4
>
> Creating symbols:
>
>    b foo
>   0004 b bar
>
> Both symbols are local. However, libbpf bpf_object__variable_offset
> requires symbols to be STB_GLOBAL & STT_OBJECT for data section lookup.
> This patch makes the same declaration generate:
>
>   .global foo
>   .type   foo, @object
>   .lcomm  foo,4,4
>   .local  bar
>   .comm   bar,4,4
>
> Creating symbols:
>
>    B foo
>   0004 b bar
>
> And libbpf will be okay with looking up the global symbol "foo".

Thanks for the patch.
This is OK for both master and GCC 10.


Re: [PATCH] config/i386: Commentary typo fix

2021-04-23 Thread Richard Biener via Gcc-patches
On Fri, Apr 23, 2021 at 12:19 AM Bernhard Reutner-Fischer via
Gcc-patches  wrote:
>
> From: Bernhard Reutner-Fischer 

OK

> gcc/ChangeLog:
>
> * config/i386/x86-tune-sched-bd.c (dispatch_group): Commentary
> typo fix.
> ---
>  gcc/config/i386/x86-tune-sched-bd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/i386/x86-tune-sched-bd.c 
> b/gcc/config/i386/x86-tune-sched-bd.c
> index ad0edf713f5..be38e48b271 100644
> --- a/gcc/config/i386/x86-tune-sched-bd.c
> +++ b/gcc/config/i386/x86-tune-sched-bd.c
> @@ -67,7 +67,7 @@ along with GCC; see the file COPYING3.  If not see
>  #define BIG 100
>
>
> -/* Dispatch groups.  Istructions that affect the mix in a dispatch window.  
> */
> +/* Dispatch groups.  Instructions that affect the mix in a dispatch window.  
> */
>  enum dispatch_group {
>disp_no_group = 0,
>disp_load,
> --
> 2.31.1
>


Re: [PATCH] Add folding and remove expanders for x86 *pcmp{et,gt}* builtins [PR target/98911]

2021-04-23 Thread Uros Bizjak via Gcc-patches
On Fri, Apr 23, 2021 at 8:36 AM Hongtao Liu  wrote:
>
> Hi:
>   The patch is a follow-up to
> https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564320.html.
> >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
>   Ok for trunk?
> gcc/ChangeLog:
>
> PR target/98911
> * config/i386/i386-builtin.def (BDESC): Change the icode of
> the following builtins to CODE_FOR_nothing.
> * config/i386/i386.c (ix86_gimple_fold_builtin): Fold
> IX86_BUILTIN_PCMPEQB128, IX86_BUILTIN_PCMPEQW128,
> IX86_BUILTIN_PCMPEQD128, IX86_BUILTIN_PCMPEQQ,
> IX86_BUILTIN_PCMPEQB256, IX86_BUILTIN_PCMPEQW256,
> IX86_BUILTIN_PCMPEQD256, IX86_BUILTIN_PCMPEQQ256,
> IX86_BUILTIN_PCMPGTB128, IX86_BUILTIN_PCMPGTW128,
> IX86_BUILTIN_PCMPGTD128, IX86_BUILTIN_PCMPGTQ,
> IX86_BUILTIN_PCMPGTB256, IX86_BUILTIN_PCMPGTW256,
> IX86_BUILTIN_PCMPGTD256, IX86_BUILTIN_PCMPGTQ256.
> * config/i386/sse.md (avx2_eq3): Deleted.
> (sse2_eq3): Ditto.
> (sse2_gt3): Rename to ..
> (*sse2_gt3): .. this.
>
> gcc/testsuite/ChangeLog:
>
> PR target/98911
> * gcc.target/i386/pr98911.c: New test.
> * gcc.target/i386/funcspec-8.c: Remove __builtin_ia32_pcmpgtq
> since it has been folded.


-(define_expand "sse2_eq3"
-  [(set (match_operand:VI124_128 0 "register_operand")
-(eq:VI124_128
-  (match_operand:VI124_128 1 "vector_operand")
-  (match_operand:VI124_128 2 "vector_operand")))]
-  "TARGET_SSE2 && !TARGET_XOP "
-  "ix86_fixup_binary_operands_no_copy (EQ, mode, operands);")
-
 (define_expand "sse4_1_eqv2di3"
   [(set (match_operand:V2DI 0 "register_operand")
 (eq:V2DI

You can also remove sse4_1_eqv2di3 expander.

-#ifdef __SSE4_2__
-#error "-msse4.2 should not be set for this test"
-#endif
-
-__m128i sse4_2_pcmpgtq (__m128i a, __m128i b)
__attribute__((__target__("sse4.2")));
-__m128i generic_pcmpgtq (__m128i ab, __m128i b);
-
-__m128i
-sse4_2_pcmpgtq (__m128i a, __m128i b)
-{
-  return __builtin_ia32_pcmpgtq (a, b);
-}
-
-__m128i
-generic_pcmpgtq (__m128i a, __m128i b)
-{
-  return __builtin_ia32_pcmpgtq (a, b);/* { dg-error
"needs isa option" } */
-}

Why remove the above? It is testing isa options, it has nothing to do
with improved folding.

Uros.


Re: [PATCH] gcov: Use system IO buffering

2021-04-23 Thread Richard Biener via Gcc-patches
On Thu, Apr 22, 2021 at 9:47 PM Andi Kleen via Gcc-patches
 wrote:
>
> Martin Liška  writes:
>
> > Hey.
> >
> > I/O buffering in gcov seems to duplicate what a modern C library can provide.
> > The patch is a simplification and can provide an easier interface for systems
> > that don't have a filesystem and would like to use GCOV.
> >
> > I'm going to install the patch after 11.1 if there are no objections.
> >
> > Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> What happens if someone compiles the C library with gcov?

Yeah, I think this is the wrong direction - we're already having issues
with using malloc, this makes it much worse.

Richard.

> Being as self contained as possible (only system calls) would seem
> safer.
>
> -Andi


Re: [PATCH] libstdc++: Fix semaphore to work with system_clock timeouts

2021-04-23 Thread Richard Biener via Gcc-patches
On Thu, Apr 22, 2021 at 8:48 PM Jonathan Wakely via Gcc-patches
 wrote:
>
> The __cond_wait_until_impl function takes a steady_clock timeout, but
> then sometimes tries to compare it to a time from the system_clock,
> which won't compile.  Additionally, that function gets called with
> system_clock timeouts, which also won't compile. This makes the function
> accept timeouts for either clock, and compare to the time from the right
> clock.
>
> This fixes the compilation error that was causing two tests to fail on
> non-futex targets, so we can revert the r12-11 change to disable them.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/atomic_timed_wait.h (__cond_wait_until_impl):
> Handle system_clock as well as steady_clock.
> * testsuite/30_threads/semaphore/try_acquire_for.cc: Re-enable.
> * testsuite/30_threads/semaphore/try_acquire_until.cc:
> Re-enable.
>
> I'm testing this now on x86_64-linux, powerpc64le-linux, sparc-linux,
> power-aix and sparc-solaris. It looks good so far, so I'll push to
> trunk when the tests finish.
>
> This should also go to the gcc-11 branch, or the timed waits for
> semaphores can't be used with system_clock times on non-futex targets.

Fine with me.

>
>


Re: [committed] libstdc++: Add options for libatomic to test

2021-04-23 Thread Richard Biener via Gcc-patches
On Thu, Apr 22, 2021 at 6:56 PM Jonathan Wakely via Gcc-patches
 wrote:
>
> This fixes a linker error on AIX:
>
> FAIL: 30_threads/semaphore/try_acquire_posix.cc (test for excess errors)
> Excess errors:
> ld: 0711-317 ERROR: Undefined symbol: .__atomic_fetch_add_8
> ld: 0711-317 ERROR: Undefined symbol: .__atomic_load_8
> ld: 0711-317 ERROR: Undefined symbol: .__atomic_fetch_sub_8
> ld: 0711-345 Use the -bloadmap or -bnoquiet option to obtain more information.
> collect2: error: ld returned 8 exit status
>
> libstdc++-v3/ChangeLog:
>
> * testsuite/30_threads/semaphore/try_acquire_posix.cc: Add
> options for libatomic.
>
> Tested powerpc64le-linux and powerpc-aix. Committed to trunk.
>
> I'd like to backport this to gcc-11 too.

Fine with me.


[PATCH] Add folding and remove expanders for x86 *pcmp{et,gt}* builtins [PR target/98911]

2021-04-23 Thread Hongtao Liu via Gcc-patches
Hi:
  The patch is a follow-up to
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564320.html.
  Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
  Ok for trunk?
gcc/ChangeLog:

PR target/98911
* config/i386/i386-builtin.def (BDESC): Change the icode of
the following builtins to CODE_FOR_nothing.
* config/i386/i386.c (ix86_gimple_fold_builtin): Fold
IX86_BUILTIN_PCMPEQB128, IX86_BUILTIN_PCMPEQW128,
IX86_BUILTIN_PCMPEQD128, IX86_BUILTIN_PCMPEQQ,
IX86_BUILTIN_PCMPEQB256, IX86_BUILTIN_PCMPEQW256,
IX86_BUILTIN_PCMPEQD256, IX86_BUILTIN_PCMPEQQ256,
IX86_BUILTIN_PCMPGTB128, IX86_BUILTIN_PCMPGTW128,
IX86_BUILTIN_PCMPGTD128, IX86_BUILTIN_PCMPGTQ,
IX86_BUILTIN_PCMPGTB256, IX86_BUILTIN_PCMPGTW256,
IX86_BUILTIN_PCMPGTD256, IX86_BUILTIN_PCMPGTQ256.
* config/i386/sse.md (avx2_eq3): Deleted.
(sse2_eq3): Ditto.
(sse2_gt3): Rename to ..
(*sse2_gt3): .. this.

gcc/testsuite/ChangeLog:

PR target/98911
* gcc.target/i386/pr98911.c: New test.
* gcc.target/i386/funcspec-8.c: Remove __builtin_ia32_pcmpgtq
since it has been folded.

-- 
BR,
Hongtao
From cce210b95c7382728608b517491a4c682cfaf5f0 Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Tue, 23 Feb 2021 11:17:40 +0800
Subject: [PATCH] Add folding and remove expanders for x86 *pcmp{et,gt}*
 builtins [PR target/98911]

gcc/ChangeLog:

	PR target/98911
	* config/i386/i386-builtin.def (BDESC): Change the icode of
	the following builtins to CODE_FOR_nothing.
	* config/i386/i386.c (ix86_gimple_fold_builtin): Fold
	IX86_BUILTIN_PCMPEQB128, IX86_BUILTIN_PCMPEQW128,
	IX86_BUILTIN_PCMPEQD128, IX86_BUILTIN_PCMPEQQ,
	IX86_BUILTIN_PCMPEQB256, IX86_BUILTIN_PCMPEQW256,
	IX86_BUILTIN_PCMPEQD256, IX86_BUILTIN_PCMPEQQ256,
	IX86_BUILTIN_PCMPGTB128, IX86_BUILTIN_PCMPGTW128,
	IX86_BUILTIN_PCMPGTD128, IX86_BUILTIN_PCMPGTQ,
	IX86_BUILTIN_PCMPGTB256, IX86_BUILTIN_PCMPGTW256,
	IX86_BUILTIN_PCMPGTD256, IX86_BUILTIN_PCMPGTQ256.
	* config/i386/sse.md (avx2_eq3): Deleted.
	(sse2_eq3): Ditto.
	(sse2_gt3): Rename to ..
	(*sse2_gt3): .. this.

gcc/testsuite/ChangeLog:

	PR target/98911
	* gcc.target/i386/pr98911.c: New test.
	* gcc.target/i386/funcspec-8.c: Remove __builtin_ia32_pcmpgtq
	since it has been folded.
---
 gcc/config/i386/i386-builtin.def   |  32 +++---
 gcc/config/i386/i386.c |  44 
 gcc/config/i386/sse.md |  18 +---
 gcc/testsuite/gcc.target/i386/funcspec-8.c |  19 
 gcc/testsuite/gcc.target/i386/pr98911.c| 116 +
 5 files changed, 177 insertions(+), 52 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr98911.c

diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index e3ed4e1578f..4dbd4f23647 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -773,12 +773,12 @@ BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_xorv2di3, "__builtin_ia32_pxor128", IX8
 BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_sse2_uavgv16qi3, "__builtin_ia32_pavgb128", IX86_BUILTIN_PAVGB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI)
 BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_sse2_uavgv8hi3, "__builtin_ia32_pavgw128", IX86_BUILTIN_PAVGW128, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI)
 
-BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_sse2_eqv16qi3, "__builtin_ia32_pcmpeqb128", IX86_BUILTIN_PCMPEQB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI)
-BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_sse2_eqv8hi3, "__builtin_ia32_pcmpeqw128", IX86_BUILTIN_PCMPEQW128, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI)
-BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_sse2_eqv4si3, "__builtin_ia32_pcmpeqd128", IX86_BUILTIN_PCMPEQD128, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI )
-BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_sse2_gtv16qi3, "__builtin_ia32_pcmpgtb128", IX86_BUILTIN_PCMPGTB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI)
-BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_sse2_gtv8hi3, "__builtin_ia32_pcmpgtw128", IX86_BUILTIN_PCMPGTW128, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI)
-BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_sse2_gtv4si3, "__builtin_ia32_pcmpgtd128", IX86_BUILTIN_PCMPGTD128, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI )
+BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_nothing, "__builtin_ia32_pcmpeqb128", IX86_BUILTIN_PCMPEQB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI)
+BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_nothing, "__builtin_ia32_pcmpeqw128", IX86_BUILTIN_PCMPEQW128, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI)
+BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_nothing, "__builtin_ia32_pcmpeqd128", IX86_BUILTIN_PCMPEQD128, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI )
+BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_nothing, "__builtin_ia32_pcmpgtb128", IX86_BUILTIN_PCMPGTB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI)
+BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_nothing, "__builtin_ia32_pcmpgtw128", IX86_BUILTIN_PCMPGTW128, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI)
+BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_nothing, 

Re: [wwwdocs] IPA/LTO/profile-feedback changes

2021-04-23 Thread Gerald Pfeifer
On Fri, 23 Apr 2021, Jan Hubicka wrote:
> this patch adds a changes entry for IPA/LTO and FDO.

Ah, cool!  This looks fine with some minor edits.

> --- a/htdocs/gcc-11/changes.html
> +++ b/htdocs/gcc-11/changes.html
> +
> +  New IPA-modref pass was added to track side-effects of function calls
> +  and improve precision of points-to-analysis. Pass can be controlled
> +   by -fipa-modref attribute.

"A new IPA-modref pass..." or "An IPA-modref pass..."

And simply "side effects" without a dash.

"The pass can be..."   "...by the..."

> +  Identical code folding pass was significantly improved to increase number of
> +   unified functions and to reduce compile-time memory use.

"The identical code..."

"the number"

> +  IPA-CP heuristics improved its estimation of potential usefulness of
> +  known loop bounds and strides by taking into account the estimated
> +  frequency of these loops. 

Here I'd probably say "by taking the ... frequency of these loops into
account".

> +  LTO bytecode file format was optimized for smaller object files and
> +   faster streaming.

"The LTO bytecode format..."

Thank you,
Gerald