date:20200703

[PATCH] libgomp: Add OMPD process functions and datatypes.

2020-07-03 Thread y2s1982 via Gcc-patches

This patch adds OMPD functions defined in 5.5.2 of the OpenMP 5.0 API
documentation. It adds per-process and per-device functions, defines
related handle data types, and adds a helper function for storing device
id.

2020-07-03  Tony Sim  

libgomp/ChangeLog:

* Makefile.am (libgompd_la_OBJECTS): Add ompd-proc.c and
 ompd-helper.c
* Makefile.in: Regenerate.
* libgompd.h (ompd_address_space_handle_t): Define.
(_gompd_device_id): Define.
* ompd-helper.c: New file.
* ompd-helper.h: New file.
* ompd-proc.c: New file.
* ompd-types.h: New file.

---
 libgomp/Makefile.am   |   2 +-
 libgomp/Makefile.in   |  10 ++--
 libgomp/libgompd.h|  10 
 libgomp/ompd-helper.c |  58 +
 libgomp/ompd-helper.h |  37 +
 libgomp/ompd-proc.c   | 117 ++
 libgomp/ompd-types.h  |  90 
 7 files changed, 319 insertions(+), 5 deletions(-)
 create mode 100644 libgomp/ompd-helper.c
 create mode 100644 libgomp/ompd-helper.h
 create mode 100644 libgomp/ompd-proc.c
 create mode 100644 libgomp/ompd-types.h

diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index e15a838e55c..4306951046f 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -90,7 +90,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c 
env.c error.c \
oacc-mem.c oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \
affinity-fmt.c teams.c allocator.c oacc-profiling.c oacc-target.c
 
-libgompd_la_SOURCES = ompd-lib.c
+libgompd_la_SOURCES = ompd-lib.c ompd-proc.c ompd-helper.c
 
 include $(top_srcdir)/plugin/Makefrag.am
 
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index af897d6c6ba..dc0f3b58ebe 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -235,7 +235,7 @@ am_libgomp_la_OBJECTS = alloc.lo atomic.lo barrier.lo 
critical.lo \
$(am__objects_1)
 libgomp_la_OBJECTS = $(am_libgomp_la_OBJECTS)
 libgompd_la_LIBADD =
-am_libgompd_la_OBJECTS = ompd-lib.lo
+am_libgompd_la_OBJECTS = ompd-lib.lo ompd-proc.lo ompd-helper.lo
 libgompd_la_OBJECTS = $(am_libgompd_la_OBJECTS)
 AM_V_P = $(am__v_P_@AM_V@)
 am__v_P_ = $(am__v_P_@AM_DEFAULT_V@)
@@ -574,10 +574,10 @@ nodist_toolexeclib_HEADERS = libgomp.spec
 libgomp_version_info = -version-info $(libtool_VERSION)
 libgompd_version_info = -version-info $(libtool_VERSION)
 libgomp_la_LDFLAGS = $(libgomp_version_info) $(libgomp_version_script) \
-$(lt_host_flags)
+   $(lt_host_flags)
 
 libgompd_la_LDFLAGS = $(libgompd_version_info) $(libgompd_version_script) \
-$(lt_host_flags)
+   $(lt_host_flags)
 
 libgomp_la_DEPENDENCIES = $(libgomp_version_dep)
 libgompd_la_DEPENDENCIES = $(libgompd_version_dep)
@@ -592,7 +592,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c 
env.c \
oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \
affinity-fmt.c teams.c allocator.c oacc-profiling.c \
oacc-target.c $(am__append_4)
-libgompd_la_SOURCES = ompd-lib.c
+libgompd_la_SOURCES = ompd-lib.c ompd-proc.c ompd-helper.c
 
 # Nvidia PTX OpenACC plugin.
 @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_version_info = -version-info 
$(libtool_VERSION)
@@ -816,7 +816,9 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-plugin.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-profiling.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-target.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-helper.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-lib.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-proc.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ordered.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/parallel.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/priority_queue.Plo@am__quote@
diff --git a/libgomp/libgompd.h b/libgomp/libgompd.h
index 3a428e1c1e4..f1338e7b3e4 100644
--- a/libgomp/libgompd.h
+++ b/libgomp/libgompd.h
@@ -38,4 +38,14 @@
 
 extern ompd_callbacks_t gompd_callbacks;
 
+typedef __UINT64_TYPE__ _gompd_device_id;
+
+typedef struct _ompd_aspace_handle {
+  ompd_address_space_context_t *context;
+  ompd_device_t kind;
+  _gompd_device_id id;
+  ompd_address_space_handle_t *process_reference;
+  ompd_size_t ref_count;
+} ompd_address_space_handle_t;
+
 #endif /* LIBGOMPD_H */
diff --git a/libgomp/ompd-helper.c b/libgomp/ompd-helper.c
new file mode 100644
index 000..949b9cf771e
--- /dev/null
+++ b/libgomp/ompd-helper.c
@@ -0,0 +1,58 @@
+/* Copyright (C) 2020 Free Software Foundation, Inc.
+   Contributed by Yoosuk Sim .
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public

Re: [PATCH] c++: Make convert_like complain about bad ck_ref_bind again [PR95789]

2020-07-03 Thread Jason Merrill via Gcc-patches


On 6/22/20 10:09 PM, Marek Polacek wrote:

convert_like issues errors about bad_p conversions at the beginning
of the function, but in the ck_ref_bind case, it only issues them
after we've called convert_like on the next conversion.

This doesn't work as expected since r10-7096 because when we see
a conversion from/to class type in a template, we return early, thereby
missing the error, and a bad_p conversion goes by undetected.  That
made the attached test to compile even though it should not.


Hmm, why isn't there an error at instantiation time?

Though giving an error at template parsing time is definitely preferable.


I had thought that I could just move the ck_ref_bind/bad_p errors
above to the rest of them, but that regressed diagnostics because
expr then wasn't converted yet by the nested convert_like_real call.


Yeah, the early section is really just for scalar conversions.

It would probably be good to do normal processing for all other bad 
conversions and only afterward build the IMPLICIT_CONV_EXPR if we aren't 
returning error_mark_node.


Jason

Re: [PATCH] Add TARGET_LOWER_LOCAL_DECL_ALIGNMENT [PR95237]

2020-07-03 Thread Jason Merrill via Gcc-patches


On 6/29/20 5:00 AM, Richard Biener wrote:

On Fri, Jun 26, 2020 at 10:11 PM H.J. Lu  wrote:


On Thu, Jun 25, 2020 at 1:10 AM Richard Biener
 wrote:


On Thu, Jun 25, 2020 at 2:53 AM Sunil Pandey  wrote:


On Wed, Jun 24, 2020 at 12:30 AM Richard Biener
 wrote:


On Tue, Jun 23, 2020 at 5:31 PM Sunil K Pandey via Gcc-patches
 wrote:


From: Sunil K Pandey 

Default for this hook is NOP. For x86, in 32 bit mode, this hook
sets alignment of long long on stack to 32 bits if preferred stack
boundary is 32 bits.

  - This patch fixes
 gcc.target/i386/pr69454-2.c
 gcc.target/i386/stackalign/longlong-1.c
  - Regression test on x86-64, no new fail introduced.


I think the name is badly chosen, TARGET_LOWER_LOCAL_DECL_ALIGNMENT


Yes, I can change the target hook name.


would be better suited (and then asks for LOCAL_DECL_ALIGNMENT to be
renamed to INCREASE_LOCAL_DECL_ALIGNMENT).


It seems like LOCAL_DECL_ALIGNMENT macro documentation is incorrect.
It increases as well as decreases alignment based on condition(-m32
-mpreferred-stack-boundary=2)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95885



You're calling it from do_type_align which IMHO is dangerous since that's
invoked from FIELD_DECL layout as well.  Instead invoke it from
layout_decl itself where we do

   if (code != FIELD_DECL)
 /* For non-fields, update the alignment from the type.  */
 do_type_align (type, decl);

and invoke the hook _after_ do_type_align.  Also avoid
invoking the hook on globals or hard regs and only
invoke it on VAR_DECLs, thus only

   if (VAR_P (decl) && !is_global_var (decl) && !DECL_HARD_REGISTER (decl))


It seems like decl property is not fully populated at this point call
to is_global_var (decl) on global variable return false.

$ cat foo.c
long long x;
int main()
{
if (__alignof__(x) != 8)
   __builtin_abort();
return 0;
}

Breakpoint 1, layout_decl (decl=0x77ffbb40, known_align=0)
 at /local/skpandey/gccwork/gccwork/gcc/gcc/stor-layout.c:674
674 do_type_align (type, decl);
Missing separate debuginfos, use: dnf debuginfo-install
gmp-6.1.2-10.fc31.x86_64 isl-0.16.1-9.fc31.x86_64
libmpc-1.1.0-4.fc31.x86_64 mpfr-3.1.6-5.fc31.x86_64
zlib-1.2.11-20.fc31.x86_64
(gdb) call debug_tree(decl)
  
 unit-size 
 align:64 warn_if_not_align:0 symtab:0 alias-set -1
canonical-type 0x7fffea801888 precision:64 min  max 
 pointer_to_this >
 DI foo.c:1:11 size  unit-size

 align:1 warn_if_not_align:0>

(gdb) p is_global_var(decl)
$1 = false
(gdb)


What about calling hook here

  603 do_type_align (tree type, tree decl)
  604 {
  605   if (TYPE_ALIGN (type) > DECL_ALIGN (decl))
  606 {
  607   SET_DECL_ALIGN (decl, TYPE_ALIGN (type));
  608   if (TREE_CODE (decl) == FIELD_DECL)
  609 DECL_USER_ALIGN (decl) = TYPE_USER_ALIGN (type);
  610   else
  611 /* Lower local decl alignment */
  612 if (VAR_P (decl)
  613 && !is_global_var (decl)
  614 && !DECL_HARD_REGISTER (decl)
  615 && cfun != NULL)
  616   targetm.lower_local_decl_alignment (decl);
  617 }


But that doesn't change anything (obviously).  layout_decl
is called quite early, too early it looks like.

Now there doesn't seem to be any other good place where
we are sure to catch the decl before we evaluate things
like __alignof__

void __attribute__((noipa))
foo (__SIZE_TYPE__ align, long long *p)
{
   if ((__SIZE_TYPE__)p & (align-1))
 __builtin_abort ();
}
int main()
{
   long long y;
   foo (_Alignof y, );
   return 0;
}

Joseph/Jason - do you have a good recommendation
how to deal with targets where natural alignment
is supposed to be lowered for optimization purposes?
(this case is for i?86 to avoid dynamic stack re-alignment
to align long long to 8 bytes with -mpreferred-stack-boundary=2)

I note that for -mincoming-stack-boundary=2 we do perform
dynamic stack re-alignment already.

I can't find a suitable existing target macro/hook for this,
but my gut feeling is that the default alignment should
instead be the lower one and instead the alignment for
globals should be raised as optimization?



Here is the updated patch from Sunil.


It does not address the fundamental issue that during
do_type_align the is_global_var predicate is not
reliable.  This means that for

int main()
{
   extern long z;
}

the new hook (with -m32 -mpreferred-stack-boundary=2)
will lower the alignment of 'z' which looks wrong.  During
layout_decl we can unfortunately not distinguish between
locals and globals.  We need to find another spot to adjust
alignment of locals.  For C that might be in finish_decl,
for C++ there's probably another suitable place.


cp_finish_decl could work, but layout_decl seems like the right spot; if 
the problem is that the appropriate flags currently aren't being set in 
time, can't we fix that?



Note it needs to be a place before the frontends possibly
inspect the alignment of the decl
In C++ constexpr

Re: [PATCH] avoid -Wnonnull for lambda stubs (PR c++/95984)

2020-07-03 Thread Jason Merrill via Gcc-patches


On 7/2/20 11:21 AM, Martin Sebor wrote:

On 7/1/20 3:25 PM, Jason Merrill wrote:

On 7/1/20 3:31 PM, Martin Sebor wrote:

The attached patch avoids null pointer checking for the first
argument to calls to the member operator() of lambda objects
emitted by the C++ front end.  This avoids both the spurious
-Wnonnull warnings for such calls as well as the ICE reported
in the bug.

In addition, the patch also avoids function argument checking
for any calls when the tf_warning bit isn't set.  This isn't
strictly necessary to avoid the ICE but it seems like a good
precaution against something similar occurring in other checks
in the future(*).

Finally, while testing the fix I noticed that the common code
doesn't recognize nullptr as a poiner when processing templates.
I've extended the handling to let it handle it as well.


Any possible value of nullptr_t is null, so I think ignoring it is 
appropriate.


In ordinary (including variadic) functions, GCC warns for all
null pointers, including nullptr.  In templates (and in auto
arguments), GCC warns for nullptr only when its converted to
a pointer type (e.g., (void*)nullptr) but not for naked nullptr.
As a user, I expect a warning for all null pointers in all these
contexts.  The distinction between the two kinds is too subtle
to appreciate by most of us (or to be useful in this context).
The test case below (I put it together to verify my patch was
working) shows the difference with lambdas:

   template  void f (L f) {
     f ((void*)nullptr);   // -Wnonnull
   }

   template  void g (L f) {
     f (nullptr);  // missing -Wnonnull
   }

   void h ()
   {
     f ([](auto...) __attribute__ ((nonnull)) { });
     g ([](auto...) __attribute__ ((nonnull)) { });
   }




[*] It seems to me that a more robust solution to prevent
the diagnostic subsystem from being re-entered as a result
of callbacks into the front end would be to have the pretty
printer disable all warnings prior to the bcallbacks and
re-enable them afterwards.  That would require an efficient
and reliable way of controlling all warnings (as well as
querying their state), which I think would be a useful
feature to have in any case.  For one thing, it would make
handling warnings and responding to #pragma GCC diagnostics
more robust.



+  const char *arg0str = IDENTIFIER_POINTER (arg0name);
+  closure = !strcmp (arg0str, "__closure");


Let's use id_equal here.


Done in the attached revision.



gcc/c-family/ChangeLog:

PR c++/95984
* c-common.c (check_function_nonnull):
(check_nonnull_arg):

gcc/cp/ChangeLog:

PR c++/95984
* call.c (build_over_call):


You're missing actual descriptions of your changes.  OK with that fixed.

Jason

[PATCH] PR fortran/95709 - [9/10/11 Regression] ICE in gfc_resolve_code, at fortran/resolve.c:11807

2020-07-03 Thread Harald Anlauf

> The obsolescent (=legacy) assigned GOTO should only allow scalar integer
> variables.  Check for proper conditions.

I've played around some more with invalid uses of assigned GOTO and found
that some cases produced a suboptimal error message.  The attached patch
(v2) improves upon this and provides a slightly larger testcase.

Again regtested on x86_64-pc-linux-gnu.

OK for master / backports?

Thanks,
Harald

> PR fortran/95709 - ICE in gfc_resolve_code, at fortran/resolve.c:11807
>
> The legacy "assigned GOTO" accepts only scalar integer variables.
> Check for proper arguments.
>
> gcc/fortran/
>   PR fortran/95709
>   * resolve.c (gfc_resolve_code): Check for valid arguments to
>   assigned GOTO.
>



pr95709-v2-patch
Description: Binary data

Re: [PATCH] libgomp: Add OMPD per-process functions.

2020-07-03 Thread y2s1982 . via Gcc-patches

Hello,

I believe I misused the write_memory() function. This function is currently
used
in ompd_device_initialize() to write a block of memory that represents the
id of
the device, stored in the handle as a pointer. I would need to resubmit the
patch
but I have some questions regarding the issue.

Re-reading the documentation, it seems the callback function is used to
write to the OpenMP program described in the context, which may be a
different device than the host, like a SMX, which would have its own
dedicated memory.
They way it is currently written, I fear it might just corrupt some random
memory block
somewhere in target device instead of writing to the handle if the
device_context
somewhere other than the host.

I am not sure how best to store that information. It is passed in as a void
*id with
the accompanying ompd_size_t sizeof_id variable. Would it be safe to assume
all variable information will be safely stored in the unsigned long long,
do some
casting based on the sizeof_id, and store it in unsigned long long member
variable?

Cheers,

Tony Sim


On Fri, Jul 3, 2020 at 1:49 PM y2s1982  wrote:

> This patch adds OMPD process and device handling functions and related
> data structures as described in the OMP 5.0 API documentation, section
> 5.5.2.
>
> 2020-07-03  Tony Sim  
>
> libgomp/ChangeLog:
>
> * Makefile.am (libgompd_la_SOURCES): Add ompd-proc.c
> * Makefile.in: Regenerate.
> * libgompd.h (ompd_address_space_handle_t): Add definition.
> * ompd-proc.c: New file.
> * ompd-types.h: New file.
>
> ---
>  libgomp/Makefile.am  |   2 +-
>  libgomp/Makefile.in  |   9 +--
>  libgomp/libgompd.h   |   9 +++
>  libgomp/ompd-proc.c  | 132 +++
>  libgomp/ompd-types.h |  90 +
>  5 files changed, 237 insertions(+), 5 deletions(-)
>  create mode 100644 libgomp/ompd-proc.c
>  create mode 100644 libgomp/ompd-types.h
>
> diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
> index e15a838e55c..e5556ef61e5 100644
> --- a/libgomp/Makefile.am
> +++ b/libgomp/Makefile.am
> @@ -90,7 +90,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c
> critical.c env.c error.c \
> oacc-mem.c oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c
> \
> affinity-fmt.c teams.c allocator.c oacc-profiling.c oacc-target.c
>
> -libgompd_la_SOURCES = ompd-lib.c
> +libgompd_la_SOURCES = ompd-lib.c ompd-proc.c
>
>  include $(top_srcdir)/plugin/Makefrag.am
>
> diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
> index af897d6c6ba..f0f91761a0d 100644
> --- a/libgomp/Makefile.in
> +++ b/libgomp/Makefile.in
> @@ -235,7 +235,7 @@ am_libgomp_la_OBJECTS = alloc.lo atomic.lo barrier.lo
> critical.lo \
> $(am__objects_1)
>  libgomp_la_OBJECTS = $(am_libgomp_la_OBJECTS)
>  libgompd_la_LIBADD =
> -am_libgompd_la_OBJECTS = ompd-lib.lo
> +am_libgompd_la_OBJECTS = ompd-lib.lo ompd-proc.lo
>  libgompd_la_OBJECTS = $(am_libgompd_la_OBJECTS)
>  AM_V_P = $(am__v_P_@AM_V@)
>  am__v_P_ = $(am__v_P_@AM_DEFAULT_V@)
> @@ -574,10 +574,10 @@ nodist_toolexeclib_HEADERS = libgomp.spec
>  libgomp_version_info = -version-info $(libtool_VERSION)
>  libgompd_version_info = -version-info $(libtool_VERSION)
>  libgomp_la_LDFLAGS = $(libgomp_version_info) $(libgomp_version_script) \
> -$(lt_host_flags)
> +   $(lt_host_flags)
>
>  libgompd_la_LDFLAGS = $(libgompd_version_info) $(libgompd_version_script)
> \
> -$(lt_host_flags)
> +   $(lt_host_flags)
>
>  libgomp_la_DEPENDENCIES = $(libgomp_version_dep)
>  libgompd_la_DEPENDENCIES = $(libgompd_version_dep)
> @@ -592,7 +592,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c
> critical.c env.c \
> oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \
> affinity-fmt.c teams.c allocator.c oacc-profiling.c \
> oacc-target.c $(am__append_4)
> -libgompd_la_SOURCES = ompd-lib.c
> +libgompd_la_SOURCES = ompd-lib.c ompd-proc.c
>
>  # Nvidia PTX OpenACC plugin.
>  @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_version_info = -version-info
> $(libtool_VERSION)
> @@ -817,6 +817,7 @@ distclean-compile:
>  @AMDEP_TRUE@@am__include@ @am__quote@
> ./$(DEPDIR)/oacc-profiling.Plo@am__quote@
>  @AMDEP_TRUE@@am__include@ @am__quote@
> ./$(DEPDIR)/oacc-target.Plo@am__quote@
>  @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-lib.Plo@am__quote@
> +@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-proc.Plo@am__quote@
>  @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ordered.Plo@am__quote@
>  @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/parallel.Plo@am__quote@
>  @AMDEP_TRUE@@am__include@ @am__quote@
> ./$(DEPDIR)/priority_queue.Plo@am__quote@
> diff --git a/libgomp/libgompd.h b/libgomp/libgompd.h
> index 3a428e1c1e4..495995e00d3 100644
> --- a/libgomp/libgompd.h
> +++ b/libgomp/libgompd.h
> @@ -38,4 +38,13 @@
>
>  extern ompd_callbacks_t gompd_callbacks;
>
> +typedef struct _ompd_aspace_handle {
> +  ompd_address_space_context_t *context;
> +

[wwwdocs] Document new G++ features

2020-07-03 Thread Marek Polacek via Gcc-patches

Pushed.

commit c325a1addc0fc612a9db218be6abb41b2052a447
Author: Marek Polacek 
Date:   Fri Jul 3 14:14:34 2020 -0400

gcc-11/changes:: Document new C++ stuff.

diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html
index cea01a9c..287dc864 100644
--- a/htdocs/gcc-11/changes.html
+++ b/htdocs/gcc-11/changes.html
@@ -76,12 +76,21 @@ a work-in-progress.
 C++
 
   The default mode has been changed to -std=gnu++17.
+  Several C++20 features have been implemented:
+
+  the compiler now supports consteval virtual 
functions
+  P2082R1, Fixing CTAD for aggregates
+
+  
   Several C++ Defect Reports have been resolved, e.g.:
-  
-DR 1512, Pointer comparison vs qualification conversions
-DR 2289, Uniqueness of decomposition declaration names
-DR 2237, Can a template-id name a constructor?
-  
+
+  DR 1512, Pointer comparison vs qualification conversions
+  DR 2289, Uniqueness of decomposition declaration names
+  DR 2237, Can a template-id name a constructor?
+
+  
+  G++ now performs better access checking in templates
+  (https://gcc.gnu.org/PR41437;>PR41437).

[wwwdocs] Another update of the C++ DR table

2020-07-03 Thread Marek Polacek via Gcc-patches

Pushed.

commit 265e9e1ea8117f34f6f97b7eb7be6df119f2c646
Author: Marek Polacek 
Date:   Wed Jul 1 11:07:21 2020 -0400

More updates to the C++ DRs table.

diff --git a/htdocs/projects/cxx-dr-status.html 
b/htdocs/projects/cxx-dr-status.html
index 482b59b0..92d72121 100644
--- a/htdocs/projects/cxx-dr-status.html
+++ b/htdocs/projects/cxx-dr-status.html
@@ -520,7 +520,7 @@
   https://wg21.link/cwg71;>71
   NAD
   Incorrect cross reference
-  ?
+  N/A
   
 
 
@@ -2911,7 +2911,7 @@
   https://wg21.link/cwg412;>412
   NAD
   Can a replacement allocation function be inline?
-  ?
+  N/A
   
 
 
@@ -3254,7 +3254,7 @@
   https://wg21.link/cwg461;>461
   NAD
   Make asm conditionally-supported
-  ?
+  N/A
   
 
 
@@ -3989,7 +3989,7 @@
   https://wg21.link/cwg566;>566
   NAD
   Conversion of negative floating point values to integer type
-  ?
+  N/A
   
 
 
@@ -4115,7 +4115,7 @@
   https://wg21.link/cwg584;>584
   NAD
   Unions and aliasing
-  ?
+  N/A
   
 
 
@@ -6278,7 +6278,7 @@
   https://wg21.link/cwg893;>893
   NAD
   Brace syntax for enumerator-definitions
-  ?
+  N/A
   
 
 
@@ -6391,7 +6391,7 @@
   https://wg21.link/cwg909;>909
   NAD
   Old-style casts with conversion functions
-  ?
+  No
   https://gcc.gnu.org/PR77465;>PR77465
 
 
@@ -6734,7 +6734,7 @@
   https://wg21.link/cwg958;>958
   NAD
   Lambdas and decltype
-  ?
+  N/A
   
 
 
@@ -7210,7 +7210,7 @@
   https://wg21.link/cwg1026;>1026
   NAD
   Cv-qualified non-class rvalues
-  ?
+  N/A
   
 
 
@@ -7896,7 +7896,7 @@
   https://wg21.link/cwg1124;>1124
   NAD
   Error in description of value category of pointer-to-member 
expression
-  ?
+  N/A
   
 
 
@@ -9226,7 +9226,7 @@
   https://wg21.link/cwg1314;>1314
   NAD
   Pointer arithmetic within standard-layout objects
-  ?
+  N/A
   
 
 
@@ -9289,7 +9289,7 @@
   https://wg21.link/cwg1323;>1323
   NAD
   Nonexistent nonterminal in alignment-specifier grammar
-  ?
+  N/A
   
 
 
@@ -9303,7 +9303,7 @@
   https://wg21.link/cwg1325;>1325
   NAD
   Omitted declarator in friend declarations
-  ?
+  N/A
   
 
 
@@ -9366,7 +9366,7 @@
   https://wg21.link/cwg1334;>1334
   NAD
   Layout compatibility and cv-qualification
-  ?
+  N/A
   
 
 
@@ -9716,7 +9716,7 @@
   https://wg21.link/cwg1384;>1384
   NAD
   reinterpret_cast in constant expressions
-  ?
+  N/A
   
 
 
@@ -9828,7 +9828,7 @@
   https://wg21.link/cwg1400;>1400
   NAD
   Function pointer equality
-  ?
+  N/A
   
 
 
@@ -9877,7 +9877,7 @@
   https://wg21.link/cwg1407;>1407
   NAD
   Integral to bool conversion in converted constant 
expressions
-  ?
+  No
   
 
 
@@ -10167,8 +10167,8 @@
   https://wg21.link/cwg1448;>1448
   NAD
   Integral values of type bool
-  Yes
-  
+  N/A
+  see CWG 903
 
 
   https://wg21.link/cwg1449;>1449
@@ -10706,7 +10706,7 @@
   https://wg21.link/cwg1525;>1525
   NAD
   Array bound inference in temporary array
-  ?
+  No
   
 
 
@@ -10811,7 +10811,7 @@
   https://wg21.link/cwg1540;>1540
   NAD
   Use of address constants in constant expressions
-  ?
+  N/A
   
 
 
@@ -11127,7 +11127,7 @@
   https://wg21.link/cwg1585;>1585
   NAD
   Value category of member access of rvalue reference member
-  ?
+  N/A
   
 
 
@@ -11778,15 +11778,15 @@
   https://wg21.link/cwg1678;>1678
   NAD
   Naming the type of an array of runtime bound
-  ?
-  
+  N/A
+  VLAs removed
 
 
   https://wg21.link/cwg1679;>1679
   NAD
   Range-based for and array of runtime bound
-  ?
-  
+  N/A
+  VLAs removed
 
 
   https://wg21.link/cwg1680;>1680
@@ -11813,8 +11813,8 @@
   https://wg21.link/cwg1683;>1683
   CD4
   Incorrect example after constexpr changes
-  ?
-  
+  N/A
+  works as expected
 
 
   https://wg21.link/cwg1684;>1684
@@ -11827,7 +11827,7 @@
   https://wg21.link/cwg1685;>1685
   NAD
   Value category of noexcept expression
-  ?
+  N/A
   
 
 
@@ -12233,7 +12233,7 @@
   https://wg21.link/cwg1743;>1743
   NAD
   init-captures in nested lambdas
-  ?
+  N/A
   
 
 
@@ -12247,7 +12247,7 @@
   https://wg21.link/cwg1745;>1745
   NAD
   thread_local constexpr variable
-  ?
+  Yes
   
 
 
@@ -12814,7 +12814,7 @@
   https://wg21.link/cwg1826;>1826
   NAD
   const

[PATCH] libgomp: Add OMPD per-process functions.

2020-07-03 Thread y2s1982 via Gcc-patches

This patch adds OMPD process and device handling functions and related
data structures as described in the OMP 5.0 API documentation, section
5.5.2.

2020-07-03  Tony Sim  

libgomp/ChangeLog:

* Makefile.am (libgompd_la_SOURCES): Add ompd-proc.c
* Makefile.in: Regenerate.
* libgompd.h (ompd_address_space_handle_t): Add definition.
* ompd-proc.c: New file.
* ompd-types.h: New file.

---
 libgomp/Makefile.am  |   2 +-
 libgomp/Makefile.in  |   9 +--
 libgomp/libgompd.h   |   9 +++
 libgomp/ompd-proc.c  | 132 +++
 libgomp/ompd-types.h |  90 +
 5 files changed, 237 insertions(+), 5 deletions(-)
 create mode 100644 libgomp/ompd-proc.c
 create mode 100644 libgomp/ompd-types.h

diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index e15a838e55c..e5556ef61e5 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -90,7 +90,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c 
env.c error.c \
oacc-mem.c oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \
affinity-fmt.c teams.c allocator.c oacc-profiling.c oacc-target.c
 
-libgompd_la_SOURCES = ompd-lib.c
+libgompd_la_SOURCES = ompd-lib.c ompd-proc.c
 
 include $(top_srcdir)/plugin/Makefrag.am
 
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index af897d6c6ba..f0f91761a0d 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -235,7 +235,7 @@ am_libgomp_la_OBJECTS = alloc.lo atomic.lo barrier.lo 
critical.lo \
$(am__objects_1)
 libgomp_la_OBJECTS = $(am_libgomp_la_OBJECTS)
 libgompd_la_LIBADD =
-am_libgompd_la_OBJECTS = ompd-lib.lo
+am_libgompd_la_OBJECTS = ompd-lib.lo ompd-proc.lo
 libgompd_la_OBJECTS = $(am_libgompd_la_OBJECTS)
 AM_V_P = $(am__v_P_@AM_V@)
 am__v_P_ = $(am__v_P_@AM_DEFAULT_V@)
@@ -574,10 +574,10 @@ nodist_toolexeclib_HEADERS = libgomp.spec
 libgomp_version_info = -version-info $(libtool_VERSION)
 libgompd_version_info = -version-info $(libtool_VERSION)
 libgomp_la_LDFLAGS = $(libgomp_version_info) $(libgomp_version_script) \
-$(lt_host_flags)
+   $(lt_host_flags)
 
 libgompd_la_LDFLAGS = $(libgompd_version_info) $(libgompd_version_script) \
-$(lt_host_flags)
+   $(lt_host_flags)
 
 libgomp_la_DEPENDENCIES = $(libgomp_version_dep)
 libgompd_la_DEPENDENCIES = $(libgompd_version_dep)
@@ -592,7 +592,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c 
env.c \
oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \
affinity-fmt.c teams.c allocator.c oacc-profiling.c \
oacc-target.c $(am__append_4)
-libgompd_la_SOURCES = ompd-lib.c
+libgompd_la_SOURCES = ompd-lib.c ompd-proc.c
 
 # Nvidia PTX OpenACC plugin.
 @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_version_info = -version-info 
$(libtool_VERSION)
@@ -817,6 +817,7 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-profiling.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-target.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-lib.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-proc.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ordered.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/parallel.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/priority_queue.Plo@am__quote@
diff --git a/libgomp/libgompd.h b/libgomp/libgompd.h
index 3a428e1c1e4..495995e00d3 100644
--- a/libgomp/libgompd.h
+++ b/libgomp/libgompd.h
@@ -38,4 +38,13 @@
 
 extern ompd_callbacks_t gompd_callbacks;
 
+typedef struct _ompd_aspace_handle {
+  ompd_address_space_context_t *context;
+  ompd_device_t kind;
+  ompd_size_t sizeof_id;
+  void *id;
+  ompd_address_space_handle_t *process_reference;
+  ompd_size_t ref_count;
+} ompd_address_space_handle_t;
+
 #endif /* LIBGOMPD_H */
diff --git a/libgomp/ompd-proc.c b/libgomp/ompd-proc.c
new file mode 100644
index 000..39feba056f2
--- /dev/null
+++ b/libgomp/ompd-proc.c
@@ -0,0 +1,132 @@
+/* Copyright (C) 2020 Free Software Foundation, Inc.
+   Contributed by Yoosuk Sim .
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of

[PATCH] nvptx: : Add support for popcount and widening multiply instructions

2020-07-03 Thread Roger Sayle


The following patch adds support for three-input addition instructions to the 
nvptx backend.
The PTX ISA's "vadd.u32.u32.u32.add d, a, b, c" instruction effectively 
implements 32-bit d = a+b+c,
and the  "vsub.u32.u32.u32 d,a,b,c" instruction that provides 32-bit d = 
(a-b)+c.  The hope is that
these mnemonics help ptxas generate the low-level hardware's IADD3 instruction.

Tested by "make" and "make -k check" on --build=nvptx-none hosted on  
x86_64-pc-linux-gnu
with no new regressions.

[PATCH] nvptx: Add support for vadd.add and vsub.add instructions

2020-07-03  Roger Sayle  

gcc/ChangeLog:
* config/nvptx/nvptx.md (vadd_addsi4): New instruction.
(vsub_addsi4): New instruction.

gcc/testsuite/ChangeLog:
* gcc.target/nvptx/vadd_add.c: New test.
* gcc.target/nvptx/vsub_add.c: New test.


Hopefully, I've got the patch/diff file format correct this time.
Ok for mainline?

Thanks in advance,
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

-Original Message-
From: Tom de Vries  
Sent: 02 July 2020 14:29
To: Roger Sayle ; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] nvptx: : Add support for popcount and widening multiply 
instructions

On 7/1/20 3:06 PM, Roger Sayle wrote:
> 
> The following patch adds support for the popc and mul.wide instructions to 
> the nvptx backend.
> I've a follow-up patch for supporting mul.hi instructions, but those 
> changes require some minor tweaks to GCC's middle-end, so I'll submit those 
> pieces separately.
> 
> Tested by "make" and "make -k check" on --build=nvptx-none hosted on 
> x86_64-pc-linux-gnu with no new regressions.
> 
> 2020-07-01  Roger Sayle  
> 
> gcc/ChangeLog:
> * config/nvptx/nvptx.md (popcount2): New instructions.
> (mulhishi3, mulsidi3, umulhisi3, umulsidi3): New instructions.
> 
> gcc/testsuite/ChangeLog:
> * gcc.target/nvptx/popc-1.c: New test.
> * gcc.target/nvptx/popc-2.c: New test.
> * gcc.target/nvptx/popc-3.c: New test.
> * gcc.target/nvptx/mul-wide.c: New test.
> * gcc.target/nvptx/umul-wide.c: New test.
> 
> 
> Ok for mainline?
> 

Hi Roger,

LGTM, please apply.

[ Btw, can you next time add the new files to the patch.  That's somewhat more 
convenient to apply. ]

Thanks
- Tom

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 5ceeac7..11d1d35 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -373,6 +373,22 @@
   ""
   "%.\\tadd%t0\\t%0, %1, %2;")
 
+(define_insn "vadd_addsi4"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+(plus:SI (plus:SI (match_operand:SI 1 "nvptx_register_operand" "R")
+ (match_operand:SI 2 "nvptx_register_operand" "R"))
+(match_operand:SI 3 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tvadd%t0%t1%t2.add\\t%0, %1, %2, %3;")
+
+(define_insn "vsub_addsi4"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+(plus:SI (minus:SI (match_operand:SI 1 "nvptx_register_operand" "R")
+  (match_operand:SI 2 "nvptx_register_operand" "R"))
+(match_operand:SI 3 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tvsub%t0%t1%t2.add\\t%0, %1, %2, %3;")
+
 (define_insn "sub3"
   [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
(minus:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
diff --git a/gcc/testsuite/gcc.target/nvptx/vadd_add.c 
b/gcc/testsuite/gcc.target/nvptx/vadd_add.c
new file mode 100644
index 000..dcb2394
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/vadd_add.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int foo(int x, int y, int z)
+{
+  return x + y + z;
+}
+
+unsigned int bar(unsigned int x, unsigned int y, unsigned int z)
+{
+  return x + y + z;
+}
+
+/* { dg-final { scan-assembler-times "vadd.u32.u32.u32.add" 2 } } */
+
diff --git a/gcc/testsuite/gcc.target/nvptx/vsub_add.c 
b/gcc/testsuite/gcc.target/nvptx/vsub_add.c
new file mode 100644
index 000..3f632c9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/vsub_add.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int foo(int x, int y, int z)
+{
+  return (x - y) + z;
+}
+
+int bar(int x, int y, int z)
+{
+  return x + (y - z);
+}
+
+unsigned int ufoo(unsigned int x, unsigned int y, unsigned int z)
+{
+  return (x - y) + z;
+}
+
+unsigned int ubar(unsigned int x, unsigned int y, unsigned int z)
+{
+  return x + (y - z);
+}
+
+/* { dg-final { scan-assembler-times "vsub.u32.u32.u32.add" 4 } } */
+

Re: [PATCH] Add -fld-path= to specify an arbitrary executable as the linker

2020-07-03 Thread Fāng-ruì Sòng via Gcc-patches




On 2020-07-03, Martin Liška wrote:

On 7/2/20 9:34 PM, Fāng-ruì Sòng wrote:

On 2020-07-01, Fāng-ruì Sòng wrote:

On 2020-07-01, Martin Liška wrote:

On 6/30/20 5:32 PM, Fāng-ruì Sòng wrote:

There is some concern about clang's -fuse-ld=path
http://lists.llvm.org/pipermail/cfe-dev/2020-June/065710.html and use
of COMPILER_PATH vs PATH.
Shall we introduce another option like -fld-path=path (PATH is used,
COMPILER_PATH is not used)?


I would recommend first landing a patch to LLVM and then we can do
a corresponding change to GCC.

Martin


Thank a lot for you welcoming words! This is what I intend to add for clang: 
https://reviews.llvm.org/D83015

I'll create a GCC patch superseding this one later.


Attached the new patch.


Thank you for the update patch:


From e7f86cdcaf03e4ddb98d0df9d07894d9ffb7d91a Mon Sep 17 00:00:00 2001
From: Fangrui Song 
Date: Thu, 2 Jul 2020 12:26:09 -0700
Subject: [PATCH] Add -fld-path= to specify an arbitrary executable as the
linker

The value can be either a relative path (relative to a COMPILER_PATH
directory or a PATH directory) or an absolute path. -fld-path=
complements -fuse-ld={bfd,gold,lld} which specifies the linker flavor.

PR driver/93645
* common.opt (-fld-path=): Add -fld-path=
* opts.c (common_handle_option): Handle OPT_fld_path_.
* gcc.c (driver_handle_option): Likewise.
* collect2.c (main): Likewise.
* doc/invoke.texi: Document -fld-path=.
---
gcc/collect2.c  | 57 -
gcc/common.opt  |  4 
gcc/doc/invoke.texi |  6 +
gcc/gcc.c   |  2 +-
gcc/opts.c  |  1 +
5 files changed, 53 insertions(+), 17 deletions(-)

diff --git a/gcc/collect2.c b/gcc/collect2.c
index f8a5ce45994..efa652f7f82 100644
--- a/gcc/collect2.c
+++ b/gcc/collect2.c
@@ -844,6 +844,7 @@ main (int argc, char **argv)
  const char **ld1;
  bool use_plugin = false;
  bool use_collect_ld = false;
+  const char *ld_path = NULL;
  /* The kinds of symbols we will have to consider when scanning the
 outcome of a first pass link.  This is ALL to start with, then might
@@ -961,12 +962,21 @@ main (int argc, char **argv)
if (selected_linker == USE_DEFAULT_LD)
  selected_linker = USE_PLUGIN_LD;
  }
-   else if (strcmp (argv[i], "-fuse-ld=bfd") == 0)
- selected_linker = USE_BFD_LD;
-   else if (strcmp (argv[i], "-fuse-ld=gold") == 0)
- selected_linker = USE_GOLD_LD;
-   else if (strcmp (argv[i], "-fuse-ld=lld") == 0)
- selected_linker = USE_LLD_LD;
+   else if (strncmp (argv[i], "-fuse-ld=bfd", 9) == 0
+&& selected_linker != USE_LD_MAX)
+ {


This does not seem correct to me. You match -fuse-ld=bfd and then
test other option values in the following block.


This is correct but I probably should use:

- strncmp (argv[i], "-fuse-ld=bfd", 9) == 0
+ strncmp (argv[i], "-fuse-ld=", 9) == 0



+   if (strcmp (argv[i] + 9, "bfd") == 0)
+ selected_linker = USE_BFD_LD;
+   else if (strcmp (argv[i] + 9, "gold") == 0)
+ selected_linker = USE_GOLD_LD;
+   else if (strcmp (argv[i] + 9, "lld") == 0)
+ selected_linker = USE_LLD_LD;
+ }
+   else if (strncmp (argv[i], "-fld-path=", 10) == 0)
+ {
+   ld_path = argv[i] + 10;
+   selected_linker = USE_LD_MAX;
+ }
else if (strncmp (argv[i], "-o", 2) == 0)
  {
/* Parse the output filename if it's given so that we can make
@@ -1117,14 +1127,27 @@ main (int argc, char **argv)
  ld_file_name = find_a_file (, collect_ld_suffix, X_OK);
  use_collect_ld = ld_file_name != 0;
}
-  /* Search the compiler directories for `ld'.  We have protection against
- recursive calls in find_a_file.  */
-  if (ld_file_name == 0)
-ld_file_name = find_a_file (, ld_suffixes[selected_linker], X_OK);
-  /* Search the ordinary system bin directories
- for `ld' (if native linking) or `TARGET-ld' (if cross).  */
-  if (ld_file_name == 0)
-ld_file_name = find_a_file (, full_ld_suffixes[selected_linker], 
X_OK);
+  if (selected_linker == USE_LD_MAX)
+{
+  /* If -fld-path= does not contain a slash, search for the command using
+the PATH environment variable.  */


We also support file systems like Windows where the comment about a slash will 
be misleading.
You can just mention relative vs. absolute path.


The behavior is modeled after
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html 
Command Search and Execution

  e.  Otherwise, the command shall be searched for using the PATH environment 
variable as described in XBD Environment Variables :

is performed if "the command name does not contain any  characters".

For your suggestion, I think 'word' can mean a relative path as well, along 
with 'rel\path' and 'rel/path'.

Should I say 


  If -fld-path= does not contain a path component separator

Re: [PATCH 2/3] Don't copy back vars mapped with acc_map_data

2020-07-03 Thread Thomas Schwinge

Hi!

On 2020-01-17T13:18:20-0800, Julian Brown  wrote:
> This patch prevents "exit data" directives from copying back data that
> was mapped with an acc_map_data API call. This matches the behaviour
> expected by the pr92843-1.c test

> --- a/libgomp/oacc-mem.c
> +++ b/libgomp/oacc-mem.c
> @@ -1235,6 +1235,7 @@ goacc_exit_data_internal (struct gomp_device_descr 
> *acc_dev, size_t mapnum,
> n->refcount--;
>
>   if (copyfrom
> + && n->refcount != REFCOUNT_INFINITY
>   && (kind != GOMP_MAP_FROM || n->refcount == 0))
> gomp_copy_dev2host (acc_dev, aq, (void *) cur_node.host_start,
> (void *) (n->tgt->tgt_start + n->tgt_offset

Such a change -- re-posted as
,
and
<0585b4fda1ba5c76dce2ac5053a55e2ceef06041.1592343756.git.julian@codesourcery.com">http://mid.mail-archive.com/0585b4fda1ba5c76dce2ac5053a55e2ceef06041.1592343756.git.julian@codesourcery.com>
-- does fix the 'libgomp.oacc-c-c++-common/pr92843-1.c' test case, but I
don't agree that the change is correct.  Instead, I have pushed
"[OpenACC] Revert always-copyfrom behavior for 'GOMP_MAP_FORCE_FROM' in
'libgomp/oacc-mem.c:goacc_exit_data_internal'",
<87wo3ky5vn.fsf@euler.schwinge.homeip.net">http://mid.mail-archive.com/87wo3ky5vn.fsf@euler.schwinge.homeip.net>.
(This moves us past the point where it used to 'abort' before, but
doesn't resolve the XFAIL, yet, which needs changes from "[OpenACC]
Adjust dynamic reference count semantics",
<5e9472b80dc475214a4a082ef54ee919d7f9dcff.1592343756.git.julian@codesourcery.com">http://mid.mail-archive.com/5e9472b80dc475214a4a082ef54ee919d7f9dcff.1592343756.git.julian@codesourcery.com>.)


Grüße
 Thomas
-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter

Re: [PATCH 3/9] [OpenACC] Adjust dynamic reference count semantics

2020-07-03 Thread Thomas Schwinge

Hi Julian!

On 2020-06-30T15:51:14+0200, I wrote:
> On 2020-06-16T15:38:33-0700, Julian Brown  wrote:
>> This is a new version of the patch last sent here:
>>
>> https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546332.html
>>
>> Minus the bits that Thomas has committed already (thanks!), and with
>> adjustments to allow for GOMP_MAP_ATTACH being grouped together with a
>> preceding clause.
>>
>> OK?
>
> Please also update the "virtual refcount" comment in
> 'libgomp.oacc-c-c++-common/structured-dynamic-lifetimes-4.c'.
>
> Your patch now makes the 'libgomp.oacc-fortran/mdc-refcount-1-1-1.f90',
> 'libgomp.oacc-fortran/mdc-refcount-1-2-1.f90',
> 'libgomp.oacc-fortran/mdc-refcount-1-2-2.f90',
> 'libgomp.oacc-fortran/mdc-refcount-1-3-1.f90' test cases PASS (did you
> not see that?)

Ah, you said "Tested (as a series)", so that's probably why I saw this
intermediate step but you didn't.

> so we have to remove all XFAILing, 'print'/'dg-output'
> etc. from these, and it changes the error reporting in
> 'libgomp.oacc-fortran/mdc-refcount-1-4-1.f90', so we have to adjust that.
> See attached patch "into Adjust dynamic reference count semantics".

Given my recent "[OpenACC] Revert always-copyfrom behavior for
'GOMP_MAP_FORCE_FROM' in 'libgomp/oacc-mem.c:goacc_exit_data_internal'",
<87wo3ky5vn.fsf@euler.schwinge.homeip.net">http://mid.mail-archive.com/87wo3ky5vn.fsf@euler.schwinge.homeip.net>,
please also include the attached "into 'Adjust dynamic reference count
semantics': un-XFAIL 'libgomp.oacc-c-c++-common/pr92843-1.c'".


> Your patch regresses the attached
> 'libgomp.oacc-c-c++-common/struct-3-1-1.c'

That was confusing: that's a new test case, not yet in tree.

> which used to act like
> detailed in the file, but now does:
>
> CheCKpOInT1
> CheCKpOInT2
> a.out: 
> source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/struct-3-1-1.c:28: 
> main: Assertion `acc_is_present (, sizeof s.b)' failed.
> Aborted (core dumped)
>
> That means, after '#pragma acc enter data create(s.a)' we're no longer
> refusing '#pragma acc enter data create(s.b)', but then the
> 'acc_is_present' for 's.b' fails.  Is that a true regression introduced
> by your patch, or a separate issue (which before just worked by chance)?
> In the latter case, please file a PR.


Grüße
 Thomas


-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
>From 705898afc94c94545a2dd7ed9f451615c067385f Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 3 Jul 2020 16:58:34 +0200
Subject: [PATCH] into 'Adjust dynamic reference count semantics': un-XFAIL
 'libgomp.oacc-c-c++-common/pr92843-1.c'

---
 libgomp/testsuite/libgomp.oacc-c-c++-common/pr92843-1.c | 9 -
 1 file changed, 9 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr92843-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr92843-1.c
index 78fe1402ad46..db5b35b08d9f 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr92843-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr92843-1.c
@@ -4,7 +4,6 @@
 /* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
 
 #include 
-#include 
 #include 
 #include 
 
@@ -135,15 +134,7 @@ test_acc_data ()
 assert (acc_is_present (h, sizeof h));
 
 assign_array (h, N, c1);
-fprintf (stderr, "CheCKpOInT1\n");
-// { dg-output ".*CheCKpOInT1(\n|\r\n|\r)" }
 acc_copyout_finalize (h, sizeof h);
-//TODO goacc_exit_datum: Assertion `is_tgt_unmapped || num_mappings > 1' failed.
-//TODO { dg-output ".*\[Aa\]ssert.*is_tgt_unmapped" { target { ! openacc_host_selected } } } ! Scan for what we expect in the "XFAILed" case (without actually XFAILing).
-//TODO { dg-shouldfail "XFAILed" { ! openacc_host_selected } } ! ... instead of 'dg-xfail-run-if' so that 'dg-output' is evaluated at all.
-//TODO { dg-final { if { [dg-process-target { xfail { ! openacc_host_selected } }] == "F" } { xfail "[testname-for-summary] really is XFAILed" } } } ! ... so that we still get an XFAIL visible in the log.
-fprintf (stderr, "CheCKpOInT2\n");
-// { dg-output ".CheCKpOInT2(\n|\r\n|\r)" { target { openacc_host_selected } } }
 assert (acc_is_present (h, sizeof h));
 verify_array (h, N, c1);
 
-- 
2.27.0

[PATCH] ipa-sra: Avoid transitive splits with type mismatches (PR 96040)

2020-07-03 Thread Martin Jambor

Hi,

PR 96040 revealed IPA-SRA, when checking whether an intended split is
the same as the one in a called function does not also check if the
types match and the transformation code does not handle any resulting
type mismatches.  This patch simply avoids the the split in the case
of mismatches, so that we do not have to be careful about invalid
floating-point values being passed in floating point registers and
related issues.

Bootstrapped and tested on x86_64-linux, pre-approved by Richi on IRC, I
will push it to master in a moment and to the gcc-10 branch immediately
after bootstrap on top of it finishes.

Thanks,


Martin


gcc/ChangeLog:

2020-07-03  Martin Jambor  

PR ipa/96040
* ipa-sra.c (all_callee_accesses_present_p): Do not accept type
mismatched accesses.

gcc/testsuite/ChangeLog:

2020-07-03  Martin Jambor  

PR ipa/96040
* gcc.dg/ipa/pr96040.c: New test.
---
 gcc/ipa-sra.c  |  4 ++-
 gcc/testsuite/gcc.dg/ipa/pr96040.c | 57 ++
 2 files changed, 60 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr96040.c

diff --git a/gcc/ipa-sra.c b/gcc/ipa-sra.c
index c81e8869e7a..03e3fc55daf 100644
--- a/gcc/ipa-sra.c
+++ b/gcc/ipa-sra.c
@@ -3271,7 +3271,9 @@ all_callee_accesses_present_p (isra_param_desc 
*param_desc,
continue;
   param_access *pacc = find_param_access (param_desc, argacc->unit_offset,
  argacc->unit_size);
-  if (!pacc || !pacc->certain)
+  if (!pacc
+ || !pacc->certain
+ || !types_compatible_p (argacc->type, pacc->type))
return false;
 }
   return true;
diff --git a/gcc/testsuite/gcc.dg/ipa/pr96040.c 
b/gcc/testsuite/gcc.dg/ipa/pr96040.c
new file mode 100644
index 000..af7e9c4ed94
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr96040.c
@@ -0,0 +1,57 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -std=c99" } */
+
+
+int puts(const char *);
+int snprintf(char *, unsigned long, const char *, ...);
+unsigned long strspn(const char *, const char *);
+
+struct TValue {
+  union {
+long long i;
+double n;
+  } value_;
+  unsigned char tt_;
+};
+
+static int tostringbuff (struct TValue *num, char *str) {
+  int len;
+  if (num->tt_ == 3) {
+len = snprintf(str,50,"%lld",num->value_.i);
+  } else {
+len = snprintf(str,50,"%.14g",num->value_.n);
+if (str[strspn(str, "-0123456789")] == '\0') {
+  str[len++] = '.';
+  str[len++] = '0';
+}
+  }
+  return len;
+}
+
+void unused (int *buff, struct TValue *num) {
+  char junk[50];
+  *buff += tostringbuff(num, junk);
+}
+
+char space[400];
+
+void addnum2buff (int *buff, struct TValue *num) __attribute__((__noinline__));
+void addnum2buff (int *buff, struct TValue *num) {
+  *buff += tostringbuff(num, space);
+}
+
+int __attribute__((noipa)) check_space (char *s)
+{
+  return (s[0] == '1' && s[1] == '.' && s[2] =='0' && s[3] == '\0');
+}
+
+int main(void) {
+int buff = 0;
+struct TValue num;
+num.value_.n = 1.0;
+num.tt_ = 19;
+addnum2buff(, );
+if (!check_space(space))
+  __builtin_abort ();
+return 0;
+}
-- 
2.27.0

Re: [PATCH 02/13] OpenACC reference count overhaul

2020-07-03 Thread Thomas Schwinge

Hi!

To move us one small step forward:

On 2020-06-25T13:03:53+0200, I wrote:
> Ping, in particular my question about different 'GOMP_MAP_FORCE_FROM' vs.
> 'GOMP_MAP_FROM' handling.
>
> (I have not yet looked whether 'GOMP_MAP_ALWAYS_FROM' may be generate
> nowadays, given your pending front end/middle end patches.)

It isn't, at least not given the current test cases, and I'm not aware of
data movement in OpenACC with (OpenMP) "always" semantics.

> On 2020-05-19T17:58:16+0200, I wrote:
>> On 2019-12-17T22:02:27-0800, Julian Brown  wrote:
>>> --- a/libgomp/oacc-mem.c
>>> +++ b/libgomp/oacc-mem.c
>>
>> (Unhelpful diff trimmed.)
>>
>>> +/* Unmap variables for OpenACC "exit data", with optional finalization
>>> +   (affecting all mappings in this operation).  */
>>
>>> +static void
>>> +goacc_exit_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
>>> + void **hostaddrs, size_t *sizes,
>>> + unsigned short *kinds, bool finalize, goacc_aq aq)
>>> +{
>>> +  gomp_mutex_lock (_dev->lock);
>>
>>> +  for (size_t i = 0; i < mapnum; ++i)
>>>  {
>>
>>> +  unsigned char kind = kinds[i] & 0xff;
>>> +  bool copyfrom = false;
>>
>>> +  switch (kind)
>>
>>> +   case GOMP_MAP_FROM:
>>> +   case GOMP_MAP_FORCE_FROM:
>>> +   case GOMP_MAP_ALWAYS_FROM:
>>> + copyfrom = true;
>>> + /* Fallthrough.  */
>>
>> What is the case that a 'GOMP_MAP_ALWAYS_FROM' would be generated for
>> OpenACC code?  Putting an 'assert' here, it never triggers, given the
>> current set of libgomp test cases.  If there is such a case, we should
>> add a test case, otherwise, I suggest we do put an 'assert' here (whilst
>> leaving in the supposedly correct code, if you'd like), to document that
>> this not currently expected, and thus not tested?

Instead of keeping dead code, I decided it's better to just "[OpenACC]
Remove (unused) 'GOMP_MAP_ALWAYS_FROM' handling from
'libgomp/oacc-mem.c:goacc_exit_data_internal'"; pushed to master branch
in commit 995aba5867b1c64b2b56a200ef16b135effe85f7, and releases/gcc-10
branch in commit ddce10e77f04410c4ce376e6efdf520a7311a11b, see attached.
Should a 'GOMP_MAP_ALWAYS_FROM' now ever appear (I don't see how), it
will be diagnosed via the 'gomp_fatal' with 'UNHANDLED kind'.

>>> +
>>> +   case GOMP_MAP_TO_PSET:
>>> +   case GOMP_MAP_POINTER:
>>> +   case GOMP_MAP_DELETE:
>>> +   case GOMP_MAP_RELEASE:
>>> + {
>>> +   struct splay_tree_key_s cur_node;
>>> +   cur_node.host_start = (uintptr_t) hostaddrs[i];
>>> +   cur_node.host_end = cur_node.host_start
>>> +   + (kind == GOMP_MAP_POINTER
>>> +  ? sizeof (void *) : sizes[i]);
>>> +   splay_tree_key n
>>> + = splay_tree_lookup (_dev->mem_map, _node);
>>> +
>>> +   if (n == NULL)
>>> + continue;
>>> +
>>> +   if (finalize)
>>> + {
>>> +   if (n->refcount != REFCOUNT_INFINITY)
>>> + n->refcount -= n->virtual_refcount;
>>> +   n->virtual_refcount = 0;
>>> + }
>>> +
>>> +   if (n->virtual_refcount > 0)
>>> + {
>>> +   if (n->refcount != REFCOUNT_INFINITY)
>>> + n->refcount--;
>>> +   n->virtual_refcount--;
>>> + }
>>> +   else if (n->refcount > 0 && n->refcount != REFCOUNT_INFINITY)
>>> + n->refcount--;
>>> +
>>> +   if (copyfrom
>>> +   && (kind != GOMP_MAP_FROM || n->refcount == 0))
>>> + gomp_copy_dev2host (acc_dev, aq, (void *) cur_node.host_start,
>>> + (void *) (n->tgt->tgt_start + n->tgt_offset
>>> +   + cur_node.host_start
>>> +   - n->host_start),
>>> + cur_node.host_end - cur_node.host_start);
>>
>> That 'kind != GOMP_MAP_FROM' conditional looks wrong to me.  This should
>> instead be 'kind == GOMP_MAP_ALWAYS_FROM'?  Or, get removed, together
>> with the 'GOMP_MAP_ALWAYS_FROM' handling above?  But definitely
>> 'GOMP_MAP_FORCE_FROM' and 'GOMP_MAP_FROM' need to be handled the same, as
>> far as I can tell?

I've now pushed "[OpenACC] Revert always-copyfrom behavior for
'GOMP_MAP_FORCE_FROM' in 'libgomp/oacc-mem.c:goacc_exit_data_internal'"
to master branch in commit e7f3f7fe08bdd49367f682398e1d2f4e6b60ef84, and
releases/gcc-10 branch in commit
50666d23b52794774eefbeff046d5c3235db8b99, see attached.

>>> +
>>> +   if (n->refcount == 0)
>>> + gomp_remove_var_async (acc_dev, n, aq);
>>> + }
>>> + break;
>>> +   default:
>>> + gomp_fatal (" goacc_exit_data_internal UNHANDLED kind 0x%.2x",
>>> + kind);
>>> }
>>>  }
>>>
>>>gomp_mutex_unlock (_dev->lock);
>>
>>>  }


Grüße
 Thomas


-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
>From

Re: [Patch 2/3] aarch64: Introduce SLS mitigation for RET and BR instructions

2020-07-03 Thread Matthew Malcomson

With suggestions applied.
Testing with `-mabi=ilp32` found a bug around the trampoline
initialisation where the new larger size of the trampoline caused a
different execution path of `emit_block_move` which ICE'd on the
pre-existing `ptr_mode` address.


Commit Message
--

Instructions following RET or BR are not necessarily executed.  In order
to avoid speculation past RET and BR we can simply append a speculation
barrier.

Since these speculation barriers will not be architecturally executed,
they are not expected to add a high performance penalty.

The speculation barrier is to be SB when targeting architectures which
have this enabled, and DSB SY + ISB otherwise.

We add tests for each of the cases where such an instruction was seen.

This is implemented by modifying each machine description pattern that
emits either a RET or a BR instruction.  We choose not to use something
like `TARGET_ASM_FUNCTION_EPILOGUE` since it does not affect the
`indirect_jump`, `jump`, `sibcall_insn` and `sibcall_value_insn`
patterns and we find it preferable to implement the functionality in the
same way for every pattern.

There is one particular case which is slightly tricky.  The
implementation of TARGET_ASM_TRAMPOLINE_TEMPLATE uses a BR which needs
to be mitigated against.  The trampoline template is used *once* per
compilation unit, and the TRAMPOLINE_SIZE is exposed to the user via the
builtin macro __LIBGCC_TRAMPOLINE_SIZE__.
In the future we may implement function specific attributes to turn on
and off hardening on a per-function basis.
The fixed nature of the trampoline described above implies it will be
safer to ensure this speculation barrier is always used.

Testing:
  Bootstrap and regtest done on aarch64-none-linux
  Used a temporary hack(1) to use these options on every test in the
  testsuite and a script to check that the output never emitted an
  unmitigated RET or BR.


1) Temporary hack was a change to the testsuite to always use
`-save-temps` and run a script on the assembly output of those
compilations which produced one to ensure every RET or BR is immediately
followed by a speculation barrier.


gcc/ChangeLog:

2020-07-03  Matthew Malcomson  

* config/aarch64/aarch64-protos.h (aarch64_sls_barrier): New.
* config/aarch64/aarch64.c (aarch64_output_casesi): Emit
speculation barrier after BR instruction if needs be.
(aarch64_trampoline_init): Handle ptr_mode value & adjust size
of code copied.
(aarch64_sls_barrier): New.
(aarch64_asm_trampoline_template): Add needed barriers.
* config/aarch64/aarch64.h (AARCH64_ISA_SB): New.
(TARGET_SB): New.
(TRAMPOLINE_SIZE): Account for barrier.
* config/aarch64/aarch64.md (indirect_jump, *casesi_dispatch,
*do_return, simple_return, *sibcall_insn, *sibcall_value_insn):
Emit barrier if needs be, also account for possible barrier using
"sls_length" attribute.
(sls_length): New attribute.
(length): Determine default using any non-default sls_length
value.
* config/aarch64/aarch64.opt (-mharden-sls-retbr): Introduce new
option.

gcc/testsuite/ChangeLog:

2020-07-03  Matthew Malcomson  

* gcc.target/aarch64/sls-mitigation/sls-miti-retbr.c: New test.
* gcc.target/aarch64/sls-mitigation/sls-miti-retbr-pacret.c:
New test.
* gcc.target/aarch64/sls-mitigation/sls-mitigation.exp: New file.
* lib/target-supports.exp (check_effective_target_aarch64_asm_sb_ok):
New proc.



### Attachment also inlined for ease of reply###


diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 
8ca67d7e69edaf73c84f079e7e1c483009ad10c0..b035e4ec78e2ef1c9a931148dffacf6a50345b84
 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -780,6 +780,7 @@ extern const atomic_ool_names aarch64_ool_ldeor_names;
 
 tree aarch64_resolve_overloaded_builtin_general (location_t, tree, void *);
 
+const char *aarch64_sls_barrier (int);
 extern bool aarch64_harden_sls_retbr_p (void);
 extern bool aarch64_harden_sls_blr_p (void);
 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
2be52fd4d73def0007795159298e3d3e8fc4399d..d60d295830d3e422bb4267de275597d2087b99e6
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -281,6 +281,7 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_ISA_F32MM (aarch64_isa_flags & AARCH64_FL_F32MM)
 #define AARCH64_ISA_F64MM (aarch64_isa_flags & AARCH64_FL_F64MM)
 #define AARCH64_ISA_BF16  (aarch64_isa_flags & AARCH64_FL_BF16)
+#define AARCH64_ISA_SB(aarch64_isa_flags & AARCH64_FL_SB)
 
 /* Crypto is an optional extension to AdvSIMD.  */
 #define TARGET_CRYPTO (TARGET_SIMD && AARCH64_ISA_CRYPTO)
@@ -378,6 +379,9 @@ extern unsigned aarch64_architecture_version;
 #define

Re: [Patch 1/3] aarch64: New Straight Line Speculation (SLS) mitigation flags

2020-07-03 Thread Matthew Malcomson

With suggestions applied.


Here we introduce the flags that will be used for straight line speculation.

The new flag introduced is `-mharden-sls=`.
This flag can take arguments of `none`, `all`, or a comma seperated list of one
or more of `retbr` or `blr`.
`none` indicates no special mitigation of the straight line speculation
vulnerability.
`all` requests all mitigations currently implemented.
`retbr` requests that the RET and BR instructions have a speculation barrier
inserted after them.
`blr` requests that BLR instructions are replaced by a BL to a function stub
using a BR with a speculation barrier after it.

Setting this on a per-function basis using attributes or the like is not
enabled, but may be in the future.

gcc/ChangeLog:

2020-07-03  Matthew Malcomson  

* config/aarch64/aarch64-protos.h (aarch64_harden_sls_retbr_p):
New.
(aarch64_harden_sls_blr_p): New.
* config/aarch64/aarch64.c (enum aarch64_sls_hardening_type):
New.
(aarch64_harden_sls_retbr_p): New.
(aarch64_harden_sls_blr_p): New.
(aarch64_validate_sls_mitigation): New.
(aarch64_override_options): Parse options for SLS mitigation.
* config/aarch64/aarch64.opt (-mharden-sls): New option.
* doc/invoke.texi: Document new option.



### Attachment also inlined for ease of reply###


diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 
9e43adb7db0373df6cc5ef1d2b22f217aca2aad2..8ca67d7e69edaf73c84f079e7e1c483009ad10c0
 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -780,4 +780,7 @@ extern const atomic_ool_names aarch64_ool_ldeor_names;
 
 tree aarch64_resolve_overloaded_builtin_general (location_t, tree, void *);
 
+extern bool aarch64_harden_sls_retbr_p (void);
+extern bool aarch64_harden_sls_blr_p (void);
+
 #endif /* GCC_AARCH64_PROTOS_H */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
f3551a73d87c4e686540f39224985592c3c66fd1..b1a7c10c4eaadd78eb45926c23efc51a8272b5fd
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14502,6 +14502,80 @@ aarch64_validate_mcpu (const char *str, const struct 
processor **res,
   return false;
 }
 
+/* Straight line speculation indicators.  */
+enum aarch64_sls_hardening_type
+{
+  SLS_NONE = 0,
+  SLS_RETBR = 1,
+  SLS_BLR = 2,
+  SLS_ALL = 3,
+};
+static enum aarch64_sls_hardening_type aarch64_sls_hardening;
+
+/* Return whether we should mitigatate Straight Line Speculation for the RET
+   and BR instructions.  */
+bool
+aarch64_harden_sls_retbr_p (void)
+{
+  return aarch64_sls_hardening & SLS_RETBR;
+}
+
+/* Return whether we should mitigatate Straight Line Speculation for the BLR
+   instruction.  */
+bool
+aarch64_harden_sls_blr_p (void)
+{
+  return aarch64_sls_hardening & SLS_BLR;
+}
+
+/* As of yet we only allow setting these options globally, in the future we may
+   allow setting them per function.  */
+static void
+aarch64_validate_sls_mitigation (const char *const_str)
+{
+  char *token_save = NULL;
+  char *str = NULL;
+
+  aarch64_sls_hardening = SLS_NONE;
+  if (strcmp (const_str, "none") == 0)
+{
+  aarch64_sls_hardening = SLS_NONE;
+  return;
+}
+  if (strcmp (const_str, "all") == 0)
+{
+  aarch64_sls_hardening = SLS_ALL;
+  return;
+}
+
+  char *str_root = xstrdup (const_str);
+  str = strtok_r (str_root, ",", _save);
+  if (!str)
+error ("invalid argument given to %<-mharden-sls=%>");
+
+  int temp = SLS_NONE;
+  while (str)
+{
+  if (strcmp (str, "blr") == 0)
+   temp |= SLS_BLR;
+  else if (strcmp (str, "retbr") == 0)
+   temp |= SLS_RETBR;
+  else if (strcmp (str, "none") == 0 || strcmp (str, "all") == 0)
+   {
+ error ("%<%s%> must be by itself for %<-mharden-sls=%>", str);
+ break;
+   }
+  else
+   {
+ error ("invalid argument %<%s%> for %<-mharden-sls=%>", str);
+ break;
+   }
+  str = strtok_r (NULL, ",", _save);
+}
+  aarch64_sls_hardening = (aarch64_sls_hardening_type) temp;
+  free (str_root);
+}
+
 /* Parses CONST_STR for branch protection features specified in
aarch64_branch_protect_types, and set any global variables required.  
Returns
the parsing result and assigns LAST_STR to the last processed token from
@@ -14746,6 +14820,9 @@ aarch64_override_options (void)
   selected_arch = NULL;
   selected_tune = NULL;
 
+  if (aarch64_harden_sls_string)
+aarch64_validate_sls_mitigation (aarch64_harden_sls_string);
+
   if (aarch64_branch_protection_string)
 aarch64_validate_mbranch_protection (aarch64_branch_protection_string);
 
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 
d99d14c137d8774d3c8dab860d475f68c01a2817..5170361fd5e5721e044d1664e522b2718f654b8e
 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt

Re: Fortran : Fortran translation issues PR52279

2020-07-03 Thread David Edelsohn via Gcc-patches

Mark,

A full bootstrap is successful with the translation markers restored
and declaring hint as "const char *".

const char *hint = _(" [see %<-fno-allow-invalid-boz%>]");

I assume that the translation system works correctly for that style.

Do you want to apply the patch or do you want me to?

Thanks, David

On Thu, Jul 2, 2020 at 11:47 AM David Edelsohn  wrote:
>
> Mark,
>
> A quick test with
>
> const char hint [] = _(" [see %<-fno-allow-invalid-boz%>]");
>
> reproduces the failure.
>
> const char *hint = _(" [see %<-fno-allow-invalid-boz%>]");
>
> seems to work.  I will do a full bootstrap test with that change later today.
>
> Do you want me to commit it if it works or do you want me to report it to you?
>
> Thanks, David
>
> On Thu, Jul 2, 2020 at 9:38 AM David Edelsohn  wrote:
> >
> > On Thu, Jul 2, 2020 at 8:56 AM Mark Eggleston
> >  wrote:
> > >
> > > On 02/07/2020 13:25, David Edelsohn wrote:
> > > > On Thu, Jul 2, 2020 at 6:16 AM Mark Eggleston
> > > >  wrote:
> > > >> I've committed the change from array to pointer. Does this fix your 
> > > >> builds?
> > > >>
> > > >> On 02/07/2020 08:18, Mark Eggleston wrote:
> > > >>> On 01/07/2020 20:07, David Edelsohn wrote:
> > >  This patch breaks bootstrap.
> > > >>> Apologies. I didn't see this when I built the compiler the with
> > > >>> bootstrap on x86_64. I'll endevour to get it fixed as soon as 
> > > >>> possible.
> > > > Mark,
> > > >
> > > > Your change did nothing because I previously had removed the
> > > > translation markers to restore bootstrap.  You were not sufficiently
> > > > observant to check the ChangeLog or notice that the line already was
> > > > changed.
> > > >
> > > > Please stop making random, untested changes.  This is inappropriate
> > > > for GCC development.
> > > OK. I had no way of confirming as the bootstrap worked for me. I naively
> > > thought that it was covered as an "obvious" fix.
> > >
> > > I was concerned that I caused a breakage and wanted to fix it.
> > >
> > > Regarding the ChangeLog I was unfamiliar with this situation and will
> > > check them if this ever happens again (I hope not).  I'll also keep
> > > portability issues in mind.
> > >
> > > I see the issue of the incorrect hint text has been addressed. As I
> > > understand it the string as is will not make it to the pot file so will
> > > not be translated, is that acceptable? I would like to be able to close
> > > the PR.
> >
> > Mark,
> >
> > I reverted your change for that one line to restore bootstrap for
> > everyone because Trunk should not be broken for 6 hours and it was
> > trivial to revert that one, localized change without affecting the
> > rest of your patch.
> >
> > The translation message for hint should be in the pot file
> > translation.  This needs to be fixed correctly.
> >
> > I would suggest that you restore your original change and work to
> > reproduce the failure.  I don't know if you were not building with NLS
> > or had some other local configuration.  It seems that
> >
> > hint[] = _("...")
> >
> > cannot be valid C syntax unless _( ... ) is a no-op in your build and
> > the compiler sees a constant string.  The failure occurred for many
> > other developers.
> >
> > Thanks, David

Re: [PATCH] Add -fld-path= to specify an arbitrary executable as the linker

2020-07-03 Thread Martin Liška


On 7/2/20 9:34 PM, Fāng-ruì Sòng wrote:

On 2020-07-01, Fāng-ruì Sòng wrote:

On 2020-07-01, Martin Liška wrote:

On 6/30/20 5:32 PM, Fāng-ruì Sòng wrote:

There is some concern about clang's -fuse-ld=path
http://lists.llvm.org/pipermail/cfe-dev/2020-June/065710.html and use
of COMPILER_PATH vs PATH.
Shall we introduce another option like -fld-path=path (PATH is used,
COMPILER_PATH is not used)?


I would recommend first landing a patch to LLVM and then we can do
a corresponding change to GCC.

Martin


Thank a lot for you welcoming words! This is what I intend to add for clang: 
https://reviews.llvm.org/D83015

I'll create a GCC patch superseding this one later.


Attached the new patch.


Thank you for the update patch:


From e7f86cdcaf03e4ddb98d0df9d07894d9ffb7d91a Mon Sep 17 00:00:00 2001
From: Fangrui Song 
Date: Thu, 2 Jul 2020 12:26:09 -0700
Subject: [PATCH] Add -fld-path= to specify an arbitrary executable as the
 linker

The value can be either a relative path (relative to a COMPILER_PATH
directory or a PATH directory) or an absolute path. -fld-path=
complements -fuse-ld={bfd,gold,lld} which specifies the linker flavor.

PR driver/93645
* common.opt (-fld-path=): Add -fld-path=
* opts.c (common_handle_option): Handle OPT_fld_path_.
* gcc.c (driver_handle_option): Likewise.
* collect2.c (main): Likewise.
* doc/invoke.texi: Document -fld-path=.
---
 gcc/collect2.c  | 57 -
 gcc/common.opt  |  4 
 gcc/doc/invoke.texi |  6 +
 gcc/gcc.c   |  2 +-
 gcc/opts.c  |  1 +
 5 files changed, 53 insertions(+), 17 deletions(-)

diff --git a/gcc/collect2.c b/gcc/collect2.c
index f8a5ce45994..efa652f7f82 100644
--- a/gcc/collect2.c
+++ b/gcc/collect2.c
@@ -844,6 +844,7 @@ main (int argc, char **argv)
   const char **ld1;
   bool use_plugin = false;
   bool use_collect_ld = false;
+  const char *ld_path = NULL;
 
   /* The kinds of symbols we will have to consider when scanning the

  outcome of a first pass link.  This is ALL to start with, then might
@@ -961,12 +962,21 @@ main (int argc, char **argv)
if (selected_linker == USE_DEFAULT_LD)
  selected_linker = USE_PLUGIN_LD;
  }
-   else if (strcmp (argv[i], "-fuse-ld=bfd") == 0)
- selected_linker = USE_BFD_LD;
-   else if (strcmp (argv[i], "-fuse-ld=gold") == 0)
- selected_linker = USE_GOLD_LD;
-   else if (strcmp (argv[i], "-fuse-ld=lld") == 0)
- selected_linker = USE_LLD_LD;
+   else if (strncmp (argv[i], "-fuse-ld=bfd", 9) == 0
+&& selected_linker != USE_LD_MAX)
+ {


This does not seem correct to me. You match -fuse-ld=bfd and then
test other option values in the following block.


+   if (strcmp (argv[i] + 9, "bfd") == 0)
+ selected_linker = USE_BFD_LD;
+   else if (strcmp (argv[i] + 9, "gold") == 0)
+ selected_linker = USE_GOLD_LD;
+   else if (strcmp (argv[i] + 9, "lld") == 0)
+ selected_linker = USE_LLD_LD;
+ }
+   else if (strncmp (argv[i], "-fld-path=", 10) == 0)
+ {
+   ld_path = argv[i] + 10;
+   selected_linker = USE_LD_MAX;
+ }
else if (strncmp (argv[i], "-o", 2) == 0)
  {
/* Parse the output filename if it's given so that we can make
@@ -1117,14 +1127,27 @@ main (int argc, char **argv)
   ld_file_name = find_a_file (, collect_ld_suffix, X_OK);
   use_collect_ld = ld_file_name != 0;
 }
-  /* Search the compiler directories for `ld'.  We have protection against
- recursive calls in find_a_file.  */
-  if (ld_file_name == 0)
-ld_file_name = find_a_file (, ld_suffixes[selected_linker], X_OK);
-  /* Search the ordinary system bin directories
- for `ld' (if native linking) or `TARGET-ld' (if cross).  */
-  if (ld_file_name == 0)
-ld_file_name = find_a_file (, full_ld_suffixes[selected_linker], 
X_OK);
+  if (selected_linker == USE_LD_MAX)
+{
+  /* If -fld-path= does not contain a slash, search for the command using
+the PATH environment variable.  */


We also support file systems like Windows where the comment about a slash will 
be misleading.
You can just mention relative vs. absolute path.


+  if (lbasename (ld_path) == ld_path)


Does it really return the same pointer? Maybe strcmp can be needed?


+   ld_file_name = find_a_file (, ld_path, X_OK);
+  else if (file_exists (ld_path))
+   ld_file_name = ld_path;
+}
+  else
+{
+  /* Search the compiler directories for `ld'.  We have protection against
+recursive calls in find_a_file.  */
+  if (ld_file_name == 0)
+   ld_file_name = find_a_file (, ld_suffixes[selected_linker], X_OK);
+  /* Search the ordinary system bin directories
+for `ld' (if native linking) or `TARGET-ld' (if cross).  */
+  if (ld_file_name == 0)
+   ld_file_name =
+

Re: [PATCH][RFC] Do not stream all zeros for gcda files.

2020-07-03 Thread Gerald Pfeifer

On Fri, 3 Jul 2020, Rainer Orth wrote:
> I'm seeing the same on both i386-pc-solaris2.11 and
> sparc-sun-solaris2.11.  It's in stage2, so the bootstrap compiler 
> (gcc 8 in my case) should be immaterial.

Yes, that's what I meant (but did not articulate well).

> The following patch allowed bootstrap to succeed:

Thank you!  I was close to switching my nightly tester from i386
(really i686) to amd64 (= x86-64), but it appears it makes sense
to keep it for a while longer.

Gerald

Re: [PATCH] gcov: fix gcov-tool merge for TOPN counters

2020-07-03 Thread Martin Liška


On 7/2/20 9:03 PM, Gerald Pfeifer wrote:

On Tue, 16 Jun 2020, Martin Liška wrote:

libgcc/ChangeLog:

* libgcov-util.c (read_gcda_finalize): Remove const operator.
(merge_wrapper): Add both counts and use them properly.
(topn_to_memory_representation): New function.
(gcov_merge): Covert on disk representation to in memory
representation.


It may not have been this patch, but looking around the bootstrap
failure I reported earlier I noticed the following in stage 2:

   /scratch/tmp/gerald/GCC-HEAD/gcc/../libgcc/libgcov-util.c: In function
   'int gcov_profile_merge(gcov_info*, gcov_info*, int, int)':
   /scratch/tmp/gerald/GCC-HEAD/gcc/../libgcc/libgcov-util.c:703:12:
   warning: '*' may be used uninitialized [-Wmaybe-uninitialized]
   703 |   tgt_tail = tgt_infos[tgt_cnt - 1];
   |   ~^~~~

Do you also see this (or is this specific to my 32-bit tester)?


Hello.

Yes, I can also see it. But it's a false positive..

Thanks for heads up,
Martin



Gerald

Re: [PATCH][RFC] Do not stream all zeros for gcda files.

2020-07-03 Thread Martin Liška


Hi.

I'm going to install the following, tested with -m32.

Martin
>From 6c9e35a569f5a46fed7c8de6ac22545cb845a913 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Fri, 3 Jul 2020 13:45:45 +0200
Subject: [PATCH] gcov-dump: fix build for i386

gcc/ChangeLog:

	PR bootstrap/96046
	* gcov-dump.c (tag_function): Use gcov_position_t
	type.

Co-Authored-By: Rainer Orth 
---
 gcc/gcov-dump.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/gcov-dump.c b/gcc/gcov-dump.c
index 97ff27861c6..7e412c8d329 100644
--- a/gcc/gcov-dump.c
+++ b/gcc/gcov-dump.c
@@ -299,7 +299,7 @@ tag_function (const char *filename ATTRIBUTE_UNUSED,
 	  unsigned tag ATTRIBUTE_UNUSED, int length,
 	  unsigned depth ATTRIBUTE_UNUSED)
 {
-  long pos = gcov_position ();
+  gcov_position_t pos = gcov_position ();
 
   if (!length)
 printf (" placeholder");
@@ -309,7 +309,7 @@ tag_function (const char *filename ATTRIBUTE_UNUSED,
   printf (", lineno_checksum=0x%08x", gcov_read_unsigned ());
   printf (", cfg_checksum=0x%08x", gcov_read_unsigned ());
 
-  if (gcov_position () - pos < length)
+  if (gcov_position () - pos < (gcov_position_t) length)
 	{
 	  const char *name;
 	  
-- 
2.27.0

Re: [PATCH][RFC] Do not stream all zeros for gcda files.

2020-07-03 Thread Martin Liška


On 7/3/20 10:46 AM, Rainer Orth wrote:

Hi Gerald,


On Thu, 2 Jul 2020, Martin Liška wrote:

All right, you convinced me and I'm going to install the patch.


I'm fraid this may have broke i386-unknown-freebsd-11.4 (with clang 10.0
as bootstrap compiler, though that doesn't appear to be the trigger here):

/scratch/tmp/gerald/OBJ-0702-1130/./prev-gcc/xg++ ...
   -g -O2 -fno-checking -gtoggle -DIN_GCC -fno-exceptions -fno-rtti
   -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings
   -Wcast-qual -Wno-error=format-diag -Wmissing-format-attribute
   -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros
   -Wno-overlength-strings -Werror -fno-common ...
   -o gcov-dump.o -MT gcov-dump.o -MMD -MP -MF ./.deps/gcov-dump.TPo
   /scratch/tmp/gerald/GCC-HEAD/gcc/gcov-dump.c
/scratch/tmp/gerald/GCC-HEAD/gcc/gcov-dump.c: In function 'void 
tag_function(const char*, unsigned int, int, unsigned int)':
/scratch/tmp/gerald/GCC-HEAD/gcc/gcov-dump.c:312:34: error: comparison of 
integer expressions of different signedness: 'long unsigned int' and 'int' 
[-Werror=sign-compare]
   312 |   if (gcov_position () - pos < length)
   |   ~~~^~~~


I'm seeing the same on both i386-pc-solaris2.11 and
sparc-sun-solaris2.11.  It's in stage2, so the bootstrap compiler (gcc 8
in my case) should be immaterial.

The following patch allowed bootstrap to succeed:


Hello.

Thank you for the fix.

Please use '(gcov_position_t)' instread '(unsigned)'.
It's fine with the patch and please reference:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96046

Martin





Rainer

Re: [patch] Extend store merging to STRING_CST

2020-07-03 Thread Richard Biener via Gcc-patches

On Thu, Jul 2, 2020 at 9:08 PM Eric Botcazou  wrote:
>
> > So this variant combined with the rest of the patch is OK then.
>
> Thanks.  It occurred to me that using string_constant might be slightly better
> (iti is already used by gimple_fold_builtin_memchr in the same file).

OK.

Thanks,
Richard.

>
> * gimple-fold.c (gimple_fold_builtin_memory_op): Fold calls that were
> initially created for the assignment of a variable-sized destination
> and whose source is now a string constant.
>
> --
> Eric Botcazou

Re: [PATCH] OpenMP: Disable GPU threads when only teams are used

2020-07-03 Thread Jakub Jelinek via Gcc-patches

On Thu, Jul 02, 2020 at 10:16:25PM +0100, Andrew Stubbs wrote:
> On 02/07/2020 18:00, Jakub Jelinek wrote:
> > On Thu, Jul 02, 2020 at 05:15:20PM +0100, Andrew Stubbs wrote:
> > > This patch, originally by Kwok, auto-adjusts the default OpenMP target
> > > arguments to set num_threads(1) when there are no parallel regions. There
> > > may still be multiple teams in this case.
> > > 
> > > The result is that libgomp will not attempt to launch GPU threads that 
> > > will
> > > never get used.
> > > 
> > > OK to commit?
> > 
> > That doesn't look safe to me.
> > My understanding of the patch is that it looks for parallel construct
> > lexically in the target region, but that isn't sufficient, one can do that
> > only if the target region can't encounter a parallel construct in the target
> > region (i.e. the body and all functions that are called from it at runtime).
> 
> OpenMP is complicated. :-(

And it is and getting worse.

> Is it normally expected that the runtime will always launch the maximum
> number of threads, just in case?

That is an implementation detail, the OpenMP model doesn't require that.
The question is whether when encountering the parallel you can ask for more
threads or not.  E.g. on the host or in the host fallback, that is the case, we
can just pthread_create as many threads as needed, for PTX there is the
theoretical possibility to use dynamic parallelism, but I think it doesn't
really work well and there were major problems with that.

Anyway, I'd think OpenMP code that will only do teams and not parallel
paralelism will be very rare in practice, it is true that in our testsuite
we have probably a lot of tests for that but those are artificial tests.
If somebody wants to get as much as possible from the hw, one should use all
of teams, parallel and simd parallelism.

If the user put an explicit thread_limit clause, I'd just trust the user
what he is doing.  If not, it is implementation defined what the maximum
will be, but I'd say using a maximum of 1 if we don't find a parallel
construct lexically nested is not a good default, even when it can be
conforming.  Because a reasonable application will have the parallel
parallelism burried in one or more of the functions it calls, or if not,
will use explicit thread_limit(1).

If you want to perform some IPA analysis for this and tweak the default
thread_limit based on what it (conservatively) finds out, I have nothing
against that.

Jakub

[PATCH] tree-optimization/96037 - fix uninitialized use of slp_op

2020-07-03 Thread Richard Biener

The following avoids leaving slp_def as passed to vect_is_simple_use
by reference uninitialized.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2020-07-03  Richard Biener  

PR tree-optimization/96037
* tree-vect-stmts.c (vect_is_simple_use): Initialize *slp_def.
---
 gcc/tree-vect-stmts.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index d68547ed1b5..9fa8854028b 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -11259,6 +11259,7 @@ vect_is_simple_use (vec_info *vinfo, stmt_vec_info 
stmt, slp_tree slp_node,
 }
   else
 {
+  *slp_def = NULL;
   if (gassign *ass = dyn_cast  (stmt->stmt))
{
  if (gimple_assign_rhs_code (ass) == COND_EXPR
-- 
2.25.1

Re: [Patch][gcn, nvptx, offloading] mkoffload – handle -fpic/-fPIC

2020-07-03 Thread Tom de Vries

On 6/26/20 9:35 PM, Moore, Catherine wrote:
> Hi Tom,
> 
> It doesn't look like you were explicitly cc'd on this patch and probably 
> haven't seen it.  Would you mind taking a look and approving the nvptx 
> portions?
> 

[ thanks for the ping, I think was explicitly cc-ed though.  I'm usually
not too fast in reviewing, and this time a vacation added to that. ]

The patch looks good to me.

I tried out the patch with one test-case and -pie -fPIC/-fpic already
seems to works, so perhaps we could have at least one test-case
exercising this in libgomp?  That sounds easier to do than the
shared-lib test-case.

I think it's a good idea though to do fPIE/fpie as well, either in this
patch or as follow-up.

Thanks,
- Tom

> Thanks,
> Catherine
> 
>> -Original Message-
>> From: Gcc-patches [mailto:gcc-patches-boun...@gcc.gnu.org] On Behalf
>> Of Burnus, Tobias
>> Sent: Tuesday, June 23, 2020 11:21 AM
>> To: gcc-patches ; Jakub Jelinek
>> 
>> Cc: Stubbs, Andrew ; Schwinge, Thomas
>> 
>> Subject: [Patch][gcn, nvptx, offloading] mkoffload – handle -fpic/-fPIC
>>
>> If the offloading code is (only) in a library, one can come up
>> with the idea to build those parts as shared library – and link
>> it to the nonoffloading code.(*)
>>
>> Currently, this fails as the mkoffload calls the nonoffloading
>> compiler without the -fpic/-fPIC flags, even though the compiler
>> was originally invoked with those options. – And at some point,
>> the linker then complains.
>>
>> This patch simply adds -fpic/-fPIC to the calls to the nonoffloading
>> ("host") compiler, invoked from mkoffload, if they were present before.
>>
>> For the testcase at hand, this works with both AMDGCN and nvptx
>> with the attached patch.
>>
>> OK for the trunk?
>>
>> Tobias
>>
>> PS: I think as mid-/longterm project it would be nice to test this
>> in the testsuite, but that's unfortunately a larger task.
>>
>> (*) Thomas mentioned that this is supposed to work also in more
>> complex cases than the one I outlined, although, that is probably
>> currently the most common one.
>>
>> -
>> Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München /
>> Germany
>> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung,
>> Alexander Walter

[committed][OG10] amdgcn: Add fold_left_plus vector reductions

2020-07-03 Thread Andrew Stubbs


Now backported to OG10.

Andrew

On 03/07/2020 11:11, Andrew Stubbs wrote:
This patch implements a floating-point fold_left_plus vector pattern, 
which gives a significant speed-up in the BabelStream "dot" benchmark.


The GCN architecture can't actually do an in-order vector reduction any 
more efficiently than that equivalent scalar algorithm, so this is a bit 
of a cheat.  However, dividing the problem into threads using OpenACC or 
OpenMP has already broken the in-order semantics, so we may as well 
optimize the operation at the vector level too.


If the user has specifically sorted the input data in order to get a 
more correct FP result then using multiple threads is already the wrong 
thing to do. But, if the input data is in no particular numerical order 
then this optimization will give a correct answer much faster, albeit 
possibly a slightly different one each run.


Andrew

[committed] amdgcn: Add fold_left_plus vector reductions

2020-07-03 Thread Andrew Stubbs

This patch implements a floating-point fold_left_plus vector pattern, 
which gives a significant speed-up in the BabelStream "dot" benchmark.


The GCN architecture can't actually do an in-order vector reduction any 
more efficiently than that equivalent scalar algorithm, so this is a bit 
of a cheat.  However, dividing the problem into threads using OpenACC or 
OpenMP has already broken the in-order semantics, so we may as well 
optimize the operation at the vector level too.


If the user has specifically sorted the input data in order to get a 
more correct FP result then using multiple threads is already the wrong 
thing to do. But, if the input data is in no particular numerical order 
then this optimization will give a correct answer much faster, albeit 
possibly a slightly different one each run.


Andrew
amdgcn: Add fold_left_plus vector reductions

These aren't real in-order instructions, because the ISA can't do that
quickly, but a means to allow regular out-of-order reductions when that's
good enough, but the middle-end doesn't know so.

	gcc/
	* config/gcn/gcn-valu.md (fold_left_plus_): New.

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 6d7fecaa12c..26559ff765e 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -3076,6 +3076,26 @@ (define_expand "reduc__scal_"
 DONE;
   })
 
+;; Warning: This "-ffast-math" implementation converts in-order reductions
+;;  into associative reductions. It's also used where OpenMP or
+;;  OpenACC paralellization has already broken the in-order semantics.
+(define_expand "fold_left_plus_"
+ [(match_operand: 0 "register_operand")
+  (match_operand: 1 "gcn_alu_operand")
+  (match_operand:V_FP 2 "gcn_alu_operand")]
+  "can_create_pseudo_p ()
+   && (flag_openacc || flag_openmp
+   || flag_associative_math)"
+  {
+rtx dest = operands[0];
+rtx scalar = operands[1];
+rtx vector = operands[2];
+rtx tmp = gen_reg_rtx (mode);
+
+emit_insn (gen_reduc_plus_scal_ (tmp, vector));
+emit_insn (gen_add3 (dest, scalar, tmp));
+ DONE;
+   })
 
 (define_insn "*_dpp_shr_"
   [(set (match_operand:V_1REG 0 "register_operand"   "=v")

[PATCH] fix scalar BB vectorization costing

2020-07-03 Thread Richard Biener

We were costing the scalar pattern stmts rather than the scalar
original stmt and also not appropriately looking at the pattern
stmt for whether the stmt is vectorized.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

2020-07-03  Richard Biener  

* tree-vect-slp.c (vect_bb_slp_scalar_cost): Cost the
original non-pattern stmts, look at the pattern stmt
vectorization status.

* gcc.dg/vect/costmodel/x86_64/costmodel-vect-slp-2.c: New
testcase.
---
 .../costmodel/x86_64/costmodel-vect-slp-2.c   | 14 ++
 gcc/tree-vect-slp.c   | 26 ++-
 2 files changed, 28 insertions(+), 12 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-slp-2.c

diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-slp-2.c 
b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-slp-2.c
new file mode 100644
index 000..1b7ac34ccaa
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-slp-2.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fdump-tree-slp-details" } */
+
+int a[4], b[4];
+void foo()
+{
+  a[0] = b[0] / 7;
+  a[1] = b[1] / 7;
+  a[2] = b[2] / 7;
+  a[3] = b[3] / 7;
+}
+
+/* We should cost the original division stmt, not the scalar pattern stmts.  */
+/* { dg-final { scan-tree-dump-times " / 7 1 times scalar_stmt costs" 4 "slp2" 
} } */
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 727eba0b12f..33fc87a9f86 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -3039,7 +3039,6 @@ vect_bb_slp_scalar_cost (vec_info *vinfo,
 
   FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
 {
-  gimple *stmt = stmt_info->stmt;
   ssa_op_iter op_iter;
   def_operand_p def_p;
 
@@ -3051,7 +3050,9 @@ vect_bb_slp_scalar_cost (vec_info *vinfo,
 required defs in the SLP children in the scalar cost.  This
 way we make the vectorization more costly when compared to
 the scalar cost.  */
-  FOR_EACH_SSA_DEF_OPERAND (def_p, stmt, op_iter, SSA_OP_DEF)
+  stmt_vec_info orig_stmt_info = vect_orig_stmt (stmt_info);
+  gimple *orig_stmt = orig_stmt_info->stmt;
+  FOR_EACH_SSA_DEF_OPERAND (def_p, orig_stmt, op_iter, SSA_OP_DEF)
{
  imm_use_iterator use_iter;
  gimple *use_stmt;
@@ -3059,7 +3060,8 @@ vect_bb_slp_scalar_cost (vec_info *vinfo,
if (!is_gimple_debug (use_stmt))
  {
stmt_vec_info use_stmt_info = vinfo->lookup_stmt (use_stmt);
-   if (!use_stmt_info || !PURE_SLP_STMT (use_stmt_info))
+   if (!use_stmt_info
+   || !PURE_SLP_STMT (vect_stmt_to_vectorize (use_stmt_info)))
  {
(*life)[i] = true;
BREAK_FROM_IMM_USE_STMT (use_iter);
@@ -3070,23 +3072,23 @@ vect_bb_slp_scalar_cost (vec_info *vinfo,
continue;
 
   /* Count scalar stmts only once.  */
-  if (gimple_visited_p (stmt))
+  if (gimple_visited_p (orig_stmt))
continue;
-  gimple_set_visited (stmt, true);
+  gimple_set_visited (orig_stmt, true);
 
   vect_cost_for_stmt kind;
-  if (STMT_VINFO_DATA_REF (stmt_info))
-{
-  if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info)))
+  if (STMT_VINFO_DATA_REF (orig_stmt_info))
+   {
+ if (DR_IS_READ (STMT_VINFO_DATA_REF (orig_stmt_info)))
kind = scalar_load;
-  else
+ else
kind = scalar_store;
-}
-  else if (vect_nop_conversion_p (stmt_info))
+   }
+  else if (vect_nop_conversion_p (orig_stmt_info))
continue;
   else
kind = scalar_stmt;
-  record_stmt_cost (cost_vec, 1, kind, stmt_info, 0, vect_body);
+  record_stmt_cost (cost_vec, 1, kind, orig_stmt_info, 0, vect_body);
 }
 
   auto_vec subtree_life;
-- 
2.26.2

Re: [PATCH][RFC] Do not stream all zeros for gcda files.

2020-07-03 Thread Rainer Orth

Hi Gerald,

> On Thu, 2 Jul 2020, Martin Liška wrote:
>> All right, you convinced me and I'm going to install the patch.
>
> I'm fraid this may have broke i386-unknown-freebsd-11.4 (with clang 10.0 
> as bootstrap compiler, though that doesn't appear to be the trigger here):
>
> /scratch/tmp/gerald/OBJ-0702-1130/./prev-gcc/xg++ ...
>   -g -O2 -fno-checking -gtoggle -DIN_GCC -fno-exceptions -fno-rtti 
>   -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
>   -Wcast-qual -Wno-error=format-diag -Wmissing-format-attribute 
>   -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros
>   -Wno-overlength-strings -Werror -fno-common ...
>   -o gcov-dump.o -MT gcov-dump.o -MMD -MP -MF ./.deps/gcov-dump.TPo 
>   /scratch/tmp/gerald/GCC-HEAD/gcc/gcov-dump.c
> /scratch/tmp/gerald/GCC-HEAD/gcc/gcov-dump.c: In function 'void 
> tag_function(const char*, unsigned int, int, unsigned int)':
> /scratch/tmp/gerald/GCC-HEAD/gcc/gcov-dump.c:312:34: error: comparison of 
> integer expressions of different signedness: 'long unsigned int' and 'int' 
> [-Werror=sign-compare]
>   312 |   if (gcov_position () - pos < length)
>   |   ~~~^~~~

I'm seeing the same on both i386-pc-solaris2.11 and
sparc-sun-solaris2.11.  It's in stage2, so the bootstrap compiler (gcc 8
in my case) should be immaterial.

The following patch allowed bootstrap to succeed:

--- gcc/gcov-dump.c	2020-07-03 09:09:35.027662036 +
+++ gcc/gcov-dump.c.orig	2020-07-02 20:03:21.955097136 +
@@ -299,7 +299,7 @@ tag_function (const char *filename ATTRI
 	  unsigned tag ATTRIBUTE_UNUSED, int length,
 	  unsigned depth ATTRIBUTE_UNUSED)
 {
-  long pos = gcov_position ();
+  gcov_position_t pos = gcov_position ();
 
   if (!length)
 printf (" placeholder");
@@ -309,7 +309,7 @@ tag_function (const char *filename ATTRI
   printf (", lineno_checksum=0x%08x", gcov_read_unsigned ());
   printf (", cfg_checksum=0x%08x", gcov_read_unsigned ());
 
-  if (gcov_position () - pos < length)
+  if (gcov_position () - pos < (unsigned) length)
 	{
 	  const char *name;
 	  

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

[PATCH] refactor SLP constant insertion and provde entry insert helper

2020-07-03 Thread Richard Biener

This provides helpers to insert stmts on region entry abstracted
from loop/basic-block split out from vec_init_vector and used
from the SLP constant code generation path.  The SLP constant
code generation path is also changed to avoid needless SSA
copying since we can store VECTOR_CSTs directly in the vectorized
defs array, improving the IL from the vectorizer.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

2020-07-03  Richard Biener  

* tree-vectorizer.h (vec_info::insert_on_entry): New.
(vec_info::insert_seq_on_entry): Likewise.
* tree-vectorizer.c (vec_info::insert_on_entry): Implement.
(vec_info::insert_seq_on_entry): Likewise.
* tree-vect-stmts.c (vect_init_vector_1): Use
vec_info::insert_on_entry.
(vect_finish_stmt_generation): Set modified bit after
adjusting VUSE.
* tree-vect-slp.c (vect_create_constant_vectors): Simplify
by using vec_info::insert_seq_on_entry and bypassing
vec_init_vector.
(vect_schedule_slp_instance): Deal with all-constant
children later.
---
 gcc/tree-vect-slp.c   | 35 +++
 gcc/tree-vect-stmts.c | 25 ++---
 gcc/tree-vectorizer.c | 40 
 gcc/tree-vectorizer.h |  2 ++
 4 files changed, 59 insertions(+), 43 deletions(-)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index eff68f76bc3..727eba0b12f 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -3786,26 +3786,20 @@ vect_create_constant_vectors (vec_info *vinfo, slp_tree 
op_node)
  permute_results);
  vec_cst = permute_results[number_of_vectors - j - 1];
}
- tree init;
- if (insert_after)
+ if (!gimple_seq_empty_p (ctor_seq))
{
- gimple_stmt_iterator gsi = gsi_for_stmt (insert_after->stmt);
- /* vect_init_vector inserts before.  */
- gsi_next ();
- init = vect_init_vector (vinfo, NULL, vec_cst,
-  vector_type, );
-   }
- else
-   init = vect_init_vector (vinfo, NULL, vec_cst,
-vector_type, NULL);
- if (ctor_seq != NULL)
-   {
- gimple_stmt_iterator gsi
-   = gsi_for_stmt (SSA_NAME_DEF_STMT (init));
- gsi_insert_seq_before (, ctor_seq, GSI_SAME_STMT);
+ if (insert_after)
+   {
+ gimple_stmt_iterator gsi
+   = gsi_for_stmt (insert_after->stmt);
+ gsi_insert_seq_after (, ctor_seq,
+   GSI_CONTINUE_LINKING);
+   }
+ else
+   vinfo->insert_seq_on_entry (NULL, ctor_seq);
  ctor_seq = NULL;
}
- voprnds.quick_push (init);
+ voprnds.quick_push (vec_cst);
  insert_after = NULL;
   number_of_places_left_in_vector = nunits;
  constant_p = true;
@@ -4418,10 +4412,11 @@ vect_schedule_slp_instance (vec_info *vinfo,
  || vect_stmt_dominates_stmt_p (last_stmt, vstmt))
last_stmt = vstmt;
}
-   /* This can happen when all children are pre-existing vectors.  */
-   if (!last_stmt)
- last_stmt = vect_find_first_scalar_stmt_in_slp (node)->stmt;
  }
+  /* This can happen when all children are pre-existing vectors or
+constants.  */
+  if (!last_stmt)
+   last_stmt = vect_find_first_scalar_stmt_in_slp (node)->stmt;
   if (is_a  (last_stmt))
si = gsi_after_labels (gimple_bb (last_stmt));
   else
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index d68547ed1b5..9228f9cde4a 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1315,29 +1315,7 @@ vect_init_vector_1 (vec_info *vinfo, stmt_vec_info 
stmt_vinfo, gimple *new_stmt,
   if (gsi)
 vect_finish_stmt_generation (vinfo, stmt_vinfo, new_stmt, gsi);
   else
-{
-  loop_vec_info loop_vinfo = dyn_cast  (vinfo);
-
-  if (loop_vinfo)
-{
- class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
- basic_block new_bb;
- edge pe;
-
- if (stmt_vinfo && nested_in_vect_loop_p (loop, stmt_vinfo))
-   loop = loop->inner;
-
- pe = loop_preheader_edge (loop);
-  new_bb = gsi_insert_on_edge_immediate (pe, new_stmt);
-  gcc_assert (!new_bb);
-   }
-  else
-   {
-  bb_vec_info bb_vinfo = dyn_cast  (vinfo);
- gimple_stmt_iterator gsi_region_begin = bb_vinfo->region_begin;
- gsi_insert_before (_region_begin, new_stmt, GSI_SAME_STMT);
-   }
-}
+vinfo->insert_on_entry (stmt_vinfo,

Re: Inconsistencies with associative/unordered containers

2020-07-03 Thread François Dumont via Gcc-patches


Hi

    Here is the patch to fix the 2nd point of this mail below.

    I prefer to qualify _Rb_tree_impl move constructor based on 
std::is_nothrow_move_constructible<_Base_key_compare> so that the logic 
of copying _Compare rather than moving it stays an implementation detail 
of _Rb_tree_key_compare.


    libstdc++: Fix [multi]map/[multi]set move constructors noexcept 
qualification


    Container move constructors shall not consider their allocator move
    constructor qualification.

    libstdc++-v3/ChangeLog:

    * include/bits/stl_tree.h (_Rb_tree_impl(_Rb_tree_impl&&)): 
Add noexcept

    qualification based only on _Compare one.
    * 
testsuite/23_containers/map/cons/noexcept_move_construct.cc: Add

    static asserts.
    * 
testsuite/23_containers/multimap/cons/noexcept_move_construct.cc:

    Likewise.
    * 
testsuite/23_containers/multiset/cons/noexcept_move_construct.cc:

    Likewise.
    * 
testsuite/23_containers/set/cons/noexcept_move_construct.cc: Likewise.


New tests run under Linux x86_64, ok to commit after other tests complete ?

François

On 01/07/20 10:50 pm, Jonathan Wakely wrote:

On 01/07/20 22:20 +0200, François Dumont wrote:

On 01/07/20 9:20 pm, Jonathan Wakely wrote:

On 01/07/20 19:10 +0300, Никита Байков wrote:

Hello,

I'm trying to make sense of associative and unordered containers' 
behavior. Some of the implementation details seem a little 
inconsistent to me. Could you please take a look?


Most of my questions concern move constructors and assignments, and 
their noexcept specifications. To be more specific, let me list 
some examples.


1. Move constructor for unordered_set is explicitly defaulted on 
its first declaration:


   /// Move constructor.
   unordered_set(unordered_set&&) = default;

Since unordered_set has no base classes and has just one data 
member of type _Hashtable, its noexcept specification would be 
derived from the following:


  template   typename _H1, typename _H2, typename _Hash, typename 
_RehashPolicy,

   typename _Traits>
    _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
   _H1, _H2, _Hash, _RehashPolicy, _Traits>::
    _Hashtable(_Hashtable&& __ht) noexcept
    : __hashtable_base(__ht),
   /* ... the of the initializer list and the constructor body */

This move constructor is marked noexcept unconditionally despite 
the fact it has to copy hash function _H1 and function _Equal that 
compares the keys.


Looks like a bug. It was introduced by
https://gcc.gnu.org/g:0462b6aa20fd6734f7497f5eed9496d33701a952

This program exited normally with GCC 4.8 and terminates since 4.9.0:

#include 

bool boom = false;

struct Eq : std::equal_to
{
  Eq() { }
  Eq(const Eq&) { if (boom) throw 1; }
};
  int main()
{
  std::unordered_set, Eq> s;
  try {
    boom = true;
    auto s2 = std::move(s);
  } catch (int) {
  }
}


I had notice this one and submitted a patch as part of a series:

https://gcc.gnu.org/pipermail/libstdc++/2019-November/049597.html

I would prefer to commit in the submitted order but if you wish I can 
reorder those.


I'll try to review the series tomorrow. Thanks for the reminder.



At the same time, its move assignment operator is also explicitly 
defaulted:


   /// Move assignment operator
   unordered_set&
   operator=(unordered_set&&) = default;

and therefore is conditionally noexcept, since the corresponding 
_Hashtable::operator= is declared as


_Hashtable&
  operator=(_Hashtable&& __ht)
noexcept(__node_alloc_traits::_S_nothrow_move()
   && is_nothrow_move_assignable<_H1>::value
   && is_nothrow_move_assignable<_Equal>::value);

So, the first question is why the copy assignment of _H1 and _Equal 
is considered to never throw, while the move assignment is not. As 
a side


Looks like a bug.

question, why does the move constructor choose to copy _H1 and 
_Equal instead of moving them?


Howard Hinnant asked me about that recently, because we do the same
for maps and sets as well as for unordered maps and sets.

It's probably wrong, but I don't really want to change it until
https://cplusplus.github.io/LWG/issue2227 is resolved.

2. Let's assume that there is nothing wrong with noexcept 
specifications in unordered containers, i.e., all constructors and 
assignment operators are marked noexcept as they ought to be. Why 
then are the associative containers not implemented the same way? 
As far as I can see in bits/stl_set.h and bits/stl_tree.h,


   //  Move constructor
   set(set&&) = default;

   // The only data member of set is _Rb_tree object, whose move 
constructor is explicitly defaulted

   _Rb_tree(_Rb_tree&&) = default;

   // The only data member of _Rb_tree is _Rb_tree_impl<_Compare> 
object that is defined as

   template
  struct _Rb_tree_impl
  : public _Node_allocator
  , public _Rb_tree_key_compare<_Key_compare>
  , public _Rb_tree_header
 {
  _Rb_tree_impl()

[PATCH] libgomp: Add OMPD process functions and datatypes.

Re: [PATCH] c++: Make convert_like complain about bad ck_ref_bind again [PR95789]

Re: [PATCH] Add TARGET_LOWER_LOCAL_DECL_ALIGNMENT [PR95237]

Re: [PATCH] avoid -Wnonnull for lambda stubs (PR c++/95984)

[PATCH] PR fortran/95709 - [9/10/11 Regression] ICE in gfc_resolve_code, at fortran/resolve.c:11807

Re: [PATCH] libgomp: Add OMPD per-process functions.

[wwwdocs] Document new G++ features

[wwwdocs] Another update of the C++ DR table

[PATCH] libgomp: Add OMPD per-process functions.

[PATCH] nvptx: : Add support for popcount and widening multiply instructions

Re: [PATCH] Add -fld-path= to specify an arbitrary executable as the linker

Re: [PATCH 2/3] Don't copy back vars mapped with acc_map_data

Re: [PATCH 3/9] [OpenACC] Adjust dynamic reference count semantics

[PATCH] ipa-sra: Avoid transitive splits with type mismatches (PR 96040)

Re: [PATCH 02/13] OpenACC reference count overhaul

Re: [Patch 2/3] aarch64: Introduce SLS mitigation for RET and BR instructions

Re: [Patch 1/3] aarch64: New Straight Line Speculation (SLS) mitigation flags

Re: Fortran : Fortran translation issues PR52279

Re: [PATCH] Add -fld-path= to specify an arbitrary executable as the linker

Re: [PATCH][RFC] Do not stream all zeros for gcda files.

Re: [PATCH] gcov: fix gcov-tool merge for TOPN counters

Re: [PATCH][RFC] Do not stream all zeros for gcda files.

Re: [PATCH][RFC] Do not stream all zeros for gcda files.

Re: [patch] Extend store merging to STRING_CST

Re: [PATCH] OpenMP: Disable GPU threads when only teams are used

[PATCH] tree-optimization/96037 - fix uninitialized use of slp_op

Re: [Patch][gcn, nvptx, offloading] mkoffload – handle -fpic/-fPIC

[committed][OG10] amdgcn: Add fold_left_plus vector reductions

[committed] amdgcn: Add fold_left_plus vector reductions

[PATCH] fix scalar BB vectorization costing

Re: [PATCH][RFC] Do not stream all zeros for gcda files.

[PATCH] refactor SLP constant insertion and provde entry insert helper

Re: Inconsistencies with associative/unordered containers

33 matches

Site Navigation

Mail list logo

Footer information