Re: [PATCH] Fix symver attribute with LTO

2019-12-17 Thread Jan Hubicka
> Hi,
> with Jan's patch committed in r278878 we can use symver attribute for functions
> and variables.  The symver attribute is designed for replacing toplevel asm
> statements containing ".symver" which may be removed by LTO.  Unfortunately,
> a quick test showed GCC still generates a buggy .so file with LTO and the new
> symver attribute.

Thanks for looking into this.  It was on my TODO list to actually
convert some packages, so it is great you did that.
> 
> Two issues:
> 
> 1. The symver node in symtab is marked as PREVAILING_DEF_IRONLY (no EXP) and
>    then removed by LTO.

This is however wrong - linker should not mark it as
PREVAILING_DEF_IRONLY if it is used externally.  What linker do you use?
On my testcases this was working with
 GNU ld (GNU Binutils) 2.31.51.20181222
I could easily imagine that some linkers get it wrong, which should be
reported to binutils bugzilla, but it is also easy to work around as done
in your patch.

> 2. The actual function body implementing the symver-ed function is also marked
>    as PREVAILING_DEF_IRONLY and then removed or marked as local.  So no
>    ".globl" directive is emitted for it.

Here, is the symver-ed function exported from the DSO (or is it set
to have the hidden attribute)?
Again this was working for me, so it would be good to understand this
issue.

I was thinking of extending the patch to also use the name@@@nodename syntax
in case the target node is static.  This also needs a bit of extra work since
during LTO partitioning we cannot make such a symbol hidden and need to
introduce an additional attribute.  I can look into that tomorrow.

Honza
> 
> Both issues cause symbols with symver to be missing from the DSO (with LTO enabled).
> 
> I modified fuse-3.9.0 code to use new symver attribute and tried to build it
> with GCC trunk and LTO.  The result is a buggy DSO.  With this patch applied,
> fuse-3.9.0 can be built with LTO enabled and no problem.
> 
> I'll test symver patch and this patch with more packages.
> 
> Bootstrapped/regtested x86_64-linux.  I'm not a maintainer.
> 
> gcc/ChangeLog:
> 
> 2019-12-17  Xi Ruoyao  
> 
>   * cgraph.h (symtab_node::used_from_object_file_p): Symver nodes are
>   part of DSO ABI so always used by non-LTO object files.
>   * ipa-visibility.c (cgraph_externally_visible_p): Functions with symver
>   attributes should always be visible.
> 
> Index: gcc/cgraph.h
> ===
> --- gcc/cgraph.h  (revision 279452)
> +++ gcc/cgraph.h  (working copy)
> @@ -2682,7 +2682,7 @@ symtab_node::used_from_object_file_p (vo
>  {
>if (!TREE_PUBLIC (decl) || DECL_EXTERNAL (decl))
>  return false;
> -  if (resolution_used_from_other_file_p (resolution))
> +  if (symver || resolution_used_from_other_file_p (resolution))
>  return true;
>return false;
>  }
> Index: gcc/ipa-visibility.c
> ===
> --- gcc/ipa-visibility.c  (revision 279452)
> +++ gcc/ipa-visibility.c  (working copy)
> @@ -216,6 +216,8 @@ cgraph_externally_visible_p (struct cgra
>  return true;
>if (lookup_attribute ("noipa", DECL_ATTRIBUTES (node->decl)))
>  return true;
> +  if (lookup_attribute ("symver", DECL_ATTRIBUTES (node->decl)))
> +return true;
>if (TARGET_DLLIMPORT_DECL_ATTRIBUTES
>&& lookup_attribute ("dllexport",
>  DECL_ATTRIBUTES (node->decl)))
> -- 
> Xi Ruoyao 
> School of Aerospace Science and Technology, Xidian University
> 


Re: [patch] libgomp/openacc.f90 – clean-up public/private attributes

2019-12-17 Thread Tobias Burnus

Hi Thomas,

updated version committed (r279456) – which adds 'acc_device_gcn' to 
openacc_lib.h – besides the other cleanup.


On 12/16/19 9:41 PM, Thomas Schwinge wrote:
> I'll point out there also exists 'libgomp/config/accel/openacc.f90',

I have now updated that file in the same way – and added a cross-ref
comment to both libgomp/openacc.f90 and libgomp/openacc_lib.h.

> Ouch. Ah, no: you said "the default was already 'public'" -- so
> there's no need to backport that to gcc-9-branch


Yes, before, it was just a stylistic problem.

> so that 'acc_copyout_finalize', 'acc_delete_finalize' will be
> available for OpenACC Fortran users. (These functions are, what would
> you expect, not covered by any test case in 'libgomp.oacc-fortran/'...)


They are also not documented in libgomp.texi – nor are some of the 
_async_ ones. They are at least tested in C/C++ [via 
libgomp.oacc-c-c++-common/lib-32.c (both) and 
libgomp.oacc-c-c++-common/pr92843-1.c (acc_copyout_finalize, only)].



> This isn't "'module openmp'". ;-)

Is it not? ;-)

> Some vertical space before the "From openacc_kinds" comment, and
> before the 'acc_async_*' ones

I added one before – but not after, to make it visually belong to that block.

> public :: acc_update_device, acc_update_self, acc_is_present
> public :: acc_copyin_async, acc_create_async, acc_copyout_async
> public :: acc_delete_async, acc_update_device_async, acc_update_self_async
> +  public :: acc_copyout_finalize, acc_delete_finalize
>
> Put these into the place where they really belong, after 'acc_copyout'
> and 'acc_delete', respectively?


I left them there, for now. The _async variants are also in one 
block and not sorted after their respective  variants.


Thanks for the suggestions & hints!

Cheers,

Tobias

commit 3e1b818b7a653e1f12194ef1c75edf7594794b43
Author: burnus 
Date:   Tue Dec 17 11:19:32 2019 +

libgomp/openacc.f90 – clean-up public/private attributes

* config/accel/openacc.f90 (module openacc_kinds): Use 'PUBLIC' to mark
all symbols as public except for the 'use …, only' imported symbol,
which is private.
(module openacc): Default to 'PRIVATE' to exclude openacc_internal; mark
all symbols from module openacc_kinds as PUBLIC
* openacc.f90: Add comment with crossref to that file and openmp_lib.h;
fix comment typo.
* openacc_lib.h (acc_device_gcn): Add this PARAMETER.



git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@279456 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libgomp/ChangeLog b/libgomp/ChangeLog
index 6b16bf34b17..b46a68255df 100644
--- a/libgomp/ChangeLog
+++ b/libgomp/ChangeLog
@@ -1,3 +1,14 @@
+2019-12-17  Tobias Burnus  
+
+	* config/accel/openacc.f90 (module openacc_kinds): Use 'PUBLIC' to mark
+	all symbols as public except for the 'use …, only' imported symbol,
+	which is private.
+	(module openacc): Default to 'PRIVATE' to exclude openacc_internal; mark
+	all symbols from module openacc_kinds as PUBLIC
+	* openacc.f90: Add comment with crossref to that file and openmp_lib.h;
+	fix comment typo.
+	* openacc_lib.h (acc_device_gcn): Add this PARAMETER.
+
 2019-12-13  Julian Brown  
 
 	PR libgomp/92881
diff --git a/libgomp/config/accel/openacc.f90 b/libgomp/config/accel/openacc.f90
index 6a8c5e9cb3d..badf5e176dd 100644
--- a/libgomp/config/accel/openacc.f90
+++ b/libgomp/config/accel/openacc.f90
@@ -36,13 +36,12 @@ module openacc_kinds
   use iso_fortran_env, only: int32
   implicit none
 
+  public
   private :: int32
-  public :: acc_device_kind
 
-  integer, parameter :: acc_device_kind = int32
+  ! When adding items, also update 'public' setting in 'module openacc' below.
 
-  public :: acc_device_none, acc_device_default, acc_device_host
-  public :: acc_device_not_host, acc_device_nvidia
+  integer, parameter :: acc_device_kind = int32
 
   ! Keep in sync with include/gomp-constants.h.
   integer (acc_device_kind), parameter :: acc_device_none = 0
@@ -53,7 +52,7 @@ module openacc_kinds
   integer (acc_device_kind), parameter :: acc_device_nvidia = 5
   integer (acc_device_kind), parameter :: acc_device_gcn = 8
 
-end module
+end module openacc_kinds
 
 module openacc_internal
   use openacc_kinds
@@ -75,13 +74,20 @@ module openacc_internal
   integer (c_int), value :: d
 end function
   end interface
-end module
+end module openacc_internal
 
 module openacc
   use openacc_kinds
   use openacc_internal
   implicit none
 
+  private
+
+  ! From openacc_kinds
+  public :: acc_device_kind
+  public :: acc_device_none, acc_device_default, acc_device_host
+  public :: acc_device_not_host, acc_device_nvidia, acc_device_gcn
+
   public :: acc_on_device
 
   interface acc_on_device
diff --git a/libgomp/openacc.f90 b/libgomp/openacc.f90
index b37f1872d50..fb7fc6e6d77 100644
--- a/libgomp/openacc.f90
+++ b/libgomp/openacc.f90
@@ -27,6 +27,8 @@
 !  see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 !  

Re: [Patch 0/X] HWASAN v3

2019-12-17 Thread Matthew Malcomson
I've noticed a few minor problems with this patch series after I sent it 
out (mostly testcase stuff, one documentation tidy-up, but also that one 
patch didn't bootstrap due to something fixed in a later patch).

I also rely on a documentation change that isn't part of the series.

I figure I should make this easy on anyone that wants to try the patch 
series out, so I'm attaching a compressed tarfile containing the entire 
patch series plus the additional documentation patch so it can all be 
applied at once with `git apply *`.

It's attached.

Matthew.



On 12/12/2019 15:18, Matthew Malcomson wrote:
> Hello,
> 
> I've gone through the suggestions Martin made and implemented the ones I think
> I can implement for GCC10.
> 
> The two functionality changes in this version are:
> Added the --param options hwasan-instrument-reads, hwasan-instrument-writes,
> hwasan-instrument-allocas and hwasan-memintrin, i.e. those that asan has
> and that make sense for hwasan.
> 
> Avoided HWASAN_STACK_BACKGROUND in hwasan_increment_tag when using a
> deterministic tagging approach.
> 
> 
> There are a lot of extra comments and tests.
> 
> 
> Bootstrapped and regtested on x86_64 and AArch64.
> Bootstrapped with `--with-build-config=bootstrap-hwasan` on AArch64 and hwasan
> features tested there.
> Built the linux kernel using this feature and ran the test_kasan.ko testing to
> check that this works for the kernel.
> (NOTE: I actually did all the above testing before a search and replace of
> `memory_tagging_p` for `hwasan_sanitize_p` and fixing a typo in the
> `hwasan-instrument-allocas` parameter name, I will run all the tests again
> before committing but figure I'll send this out now since I fully expect the
> tests to still pass).
> 
> 
> I noticed one extra testsuite failure from those mentioned in the previous
> version emails: g++.dg/cpp2a/ucn2.C.
> I believe this is HWASAN correctly catching a problem in the compiler.
> I've logged the issue here https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92919 .
> 
> 
> I haven't gotten ASAN_MARK to print as HWASAN_MARK when using memory tagging,
> since I'm not sure the way I found to implement this would be acceptable.  The
> inlined patch below works but it requires a special declaration instead of
> just an ~#include~.
> 
> 
> diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
> index a1bc081..d81eb12 100644
> --- a/gcc/internal-fn.h
> +++ b/gcc/internal-fn.h
> @@ -101,10 +101,16 @@ extern void init_internal_fns ();
>   
>   extern const char *const internal_fn_name_array[];
>   
> +
> +extern bool hwasan_sanitize_p (void);
>   static inline const char *
>   internal_fn_name (enum internal_fn fn)
>   {
> -  return internal_fn_name_array[(int) fn];
> +  const char *ret = internal_fn_name_array[(int) fn];
> +  if (! strcmp (ret, "ASAN_MARK")
> +  && hwasan_sanitize_p ())
> +return "HWASAN_MARK";
> +  return ret;
>   }
>   
>   extern internal_fn lookup_internal_fn (const char *);
> 
> 
> Entire patch series attached to cover letter.
> 



all-patches.tar.gz
Description: all-patches.tar.gz


[PATCH] IPA-CP: Remove bogus static keyword (PR 92971)

2019-12-17 Thread Martin Jambor
Hi,

as reported in PR 92971, IPA-CP's
cgraph_edge_brings_all_agg_vals_for_node defines one local variable with
the static keyword, which is a clear mistake, probably a cut'n'paste
error when I originally wrote the code.

I'll commit the following as obvious after a round of bootstrap and
testing.  Early next year, I'll also commit it to all open release
branches.
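
To illustrate why a stray static on a loop-local variable is a bug, here is a
minimal C sketch.  It is not the ipa-cp.c code; the function names and the
"scratch length" counter are made up for illustration.  A static local is
initialised once and keeps its contents across iterations and across calls,
whereas an automatic local starts fresh every time.

```c
/* Hypothetical stand-in for a per-iteration scratch container: each loop
   iteration is supposed to start with an empty scratch area and add one
   entry to it.  The function returns the length seen in the last
   iteration.  */

int
entries_with_static (int count)
{
  int seen = 0;
  for (int i = 0; i < count; i++)
    {
      static int scratch_len = 0;  /* BUG: initialised once, never reset */
      scratch_len++;
      seen = scratch_len;
    }
  return seen;
}

int
entries_with_auto (int count)
{
  int seen = 0;
  for (int i = 0; i < count; i++)
    {
      int scratch_len = 0;         /* fresh for every iteration */
      scratch_len++;
      seen = scratch_len;
    }
  return seen;
}
```

With the automatic variable the result is always 1; with the static one the
stale state accumulates, so the result depends on how often the function ran
before.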

Thanks,

Martin


2019-12-17  Martin Jambor  

* ipa-cp.c (cgraph_edge_brings_all_agg_vals_for_node): Remove
static from local variable definition.
---
 gcc/ipa-cp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 1a80ccbde2d..6692eb7b878 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -5117,7 +5117,7 @@ cgraph_edge_brings_all_agg_vals_for_node (struct 
cgraph_edge *cs,
 
   for (i = 0; i < count; i++)
 {
-  static vec values = vNULL;
+  vec values = vNULL;
   class ipcp_param_lattices *plats;
   bool interesting = false;
   for (struct ipa_agg_replacement_value *av = aggval; av; av = av->next)
-- 
2.24.0



Re: Move -Wmaybe-uninitialized to -Wextra

2019-12-17 Thread Pedro Alves
On 12/16/19 2:45 PM, Martin Jambor wrote:
> On Sat, Dec 07 2019, Jeff Law wrote:
>> [...]

> I'm afraid that -Wmaybe-uninitialized is getting out of hand.  I bet
> that some users regularly get these warnings coming from c++ header
> "libraries" (like they sometimes come out of our vec.h which recently
> "broke" bootstrap) which they sometimes even cannot change and they then
> conclude that our -Wall is "unusable" and stop paying attention to all
> warnings.

-Wmaybe-uninitialized warnings that trigger in std::optional (and clones)
(PR80635 [1]) are particularly annoying, and there's no sane workaround the
user can apply.  You'll find quite a number of those just by googling for it:

  https://www.google.com/search?q=std+optional+"-Wmaybe-uninitialized"

We have a few of those in GDB, and because GDB uses -Wall + -Werror, GDB
nowadays builds with -Wno-error=maybe-uninitialized so that we see the
warnings but the build continues without error.  People still occasionally
get confused and waste time with those warnings, though.  Here, just
this week, point 5:

  https://sourceware.org/ml/gdb-patches/2019-12/msg00706.html

FWIW, I've considered completely disabling -Wmaybe-uninitialized in
GDB instead of downgrading it from -Werror to a warning with -Wno-error.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80635
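
For readers unfamiliar with the pattern, here is a minimal C sketch of an
"optional"-like value.  It is a hand-rolled illustration, not std::optional
and not code from GDB or PR80635; the type and function names are invented.
The payload is written only when the flag is set and read only when the flag
is set, so the code is correct, but proving that requires the compiler to
correlate the two conditions.  Whether a given GCC version actually warns here
depends on the version and optimization level, so no particular diagnostic is
claimed.

```c
#include <stdbool.h>

/* Hand-rolled "optional int": `value` is meaningful only if `engaged`.  */
struct opt_int
{
  bool engaged;
  int value;
};

struct opt_int
parse_digit (char c)
{
  struct opt_int r;
  r.engaged = (c >= '0' && c <= '9');
  if (r.engaged)
    r.value = c - '0';  /* value is deliberately left unset otherwise */
  return r;
}

int
digit_or (char c, int fallback)
{
  struct opt_int r = parse_digit (c);
  /* value is read only when engaged, so this is well-behaved; after
     inlining, the compiler may still be unable to prove that.  */
  return r.engaged ? r.value : fallback;
}
```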

-- 
Pedro Alves



[committed, amdgcn] Implement clz and ctz

2019-12-17 Thread Andrew Stubbs
This patch implements the count leading and trailing zeros instruction 
patterns in the AMD GCN backend.


This is a prerequisite for implementing the extract_last patterns.
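
As a rough scalar model of the "value at zero" convention the patch declares
(the comment in gcn.h says the instructions return -1 if no input bits are
set), here is a C sketch.  The function names are invented for illustration.
GCC's __builtin_clz and __builtin_ctz are undefined for a zero argument, so
the zero case is handled explicitly.

```c
#include <stdint.h>

/* Count leading zeros of a 32-bit value, returning -1 for zero input.  */
int
clz32_or_minus1 (uint32_t x)
{
  return x ? __builtin_clz (x) : -1;
}

/* Count trailing zeros of a 32-bit value, returning -1 for zero input.  */
int
ctz32_or_minus1 (uint32_t x)
{
  return x ? __builtin_ctz (x) : -1;
}
```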

Andrew Stubbs
Mentor Graphics / CodeSourcery
Add clz and ctz for amdgcn

2019-12-17  Andrew Stubbs  

	gcc/
	* config/gcn/gcn.h (CLZ_DEFINED_VALUE_AT_ZERO): Define.
	(CTZ_DEFINED_VALUE_AT_ZERO): Define.
	* config/gcn/gcn.md (s_mnemonic): Add clz and ctz.
	(expander): Likewise.
	(countzeros): New code iterator.
	(si2): New insn pattern.
	(di2): New insn pattern.

diff --git a/gcc/config/gcn/gcn.h b/gcc/config/gcn/gcn.h
index bdf7188b5ff..76b449ba5cf 100644
--- a/gcc/config/gcn/gcn.h
+++ b/gcc/config/gcn/gcn.h
@@ -644,6 +644,10 @@ enum gcn_builtin_codes
 /* This needs to match gcn_function_value.  */
 #define LIBCALL_VALUE(MODE) gen_rtx_REG (MODE, SGPR_REGNO (RETURN_VALUE_REG))
 
+/* The s_ff0 and s_flbit instructions return -1 if no input bits are set.  */
+#define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) ((VALUE) = -1, 2)
+#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) ((VALUE) = -1, 2)
+
 
 /* Costs.  */
 
diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index 36908ba45f6..b48af0dbde8 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -331,7 +331,9 @@
 
 (define_code_attr s_mnemonic
   [(not "not%b")
-   (popcount "bcnt1_i32%b")])
+   (popcount "bcnt1_i32%b")
+   (clz "flbit_i32%b")
+   (ctz "ff1_i32%b")])
 
 (define_code_attr revmnemonic
   [(minus "subrev%i")
@@ -356,7 +358,9 @@
(umin "umin")
(umax "umax")
(not "one_cmpl")
-   (popcount "popcount")])
+   (popcount "popcount")
+   (clz "clz")
+   (ctz "ctz")])
 
 ;; }}}
 ;; {{{ Miscellaneous instructions
@@ -1389,6 +1393,28 @@
   [(set_attr "type" "sop1,vop1")
(set_attr "length" "8")])
 
+(define_code_iterator countzeros [clz ctz])
+
+(define_insn "si2"
+  [(set (match_operand:SI 0 "register_operand"  "=Sg,Sg")
+(countzeros:SI
+	  (match_operand:SI 1 "gcn_alu_operand" "SgA, B")))]
+  ""
+  "s_1\t%0, %1"
+  [(set_attr "type" "sop1")
+   (set_attr "length" "4,8")])
+
+; The truncate ensures that a constant passed to operand 1 is treated as DImode
+(define_insn "di2"
+  [(set (match_operand:SI 0 "register_operand""=Sg,Sg")
+	(truncate:SI
+	  (countzeros:DI
+	(match_operand:DI 1 "gcn_alu_operand" "SgA, B"]
+  ""
+  "s_1\t%0, %1"
+  [(set_attr "type" "sop1")
+   (set_attr "length" "4,8")])
+
 ;; }}}
 ;; {{{ ALU: generic 32-bit binop
 


[committed, amdgcn] Implement extract_last and fold_extract_last

2019-12-17 Thread Andrew Stubbs
This patch implements the vector extract last instruction patterns in 
the AMD GCN backend.


This is both an optimization and a "fix" for pr92772, in which the 
conditional reduction algorithm is broken for architectures with masked 
vectors.


This fixes too many testcase failures in vect.exp to name them all 
individually, but includes vect-cond_reduc-* and pr65947-10.c.
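
The expanders in the patch compute the last active lane as 63 - clz (mask)
and, for the fold variant, fall back to a default value when the mask is zero.
A scalar C model of that logic might look as follows.  This is only a sketch:
the 64-lane layout with one mask bit per lane, the int element type and the
function names are assumptions made for illustration.

```c
#include <stdint.h>

#define LANES 64  /* assumption: one mask bit per vector lane */

/* Index of the highest set bit in MASK.  Precondition: mask != 0.  */
int
last_active_lane (uint64_t mask)
{
  return 63 - __builtin_clzll (mask);
}

/* Element of VEC in the last lane enabled by MASK (mask != 0).  */
int
extract_last (uint64_t mask, const int vec[LANES])
{
  return vec[last_active_lane (mask)];
}

/* As above, but yield DEFAULT_VALUE when no lane is enabled.  */
int
fold_extract_last (int default_value, uint64_t mask, const int vec[LANES])
{
  if (mask == 0)
    return default_value;
  return extract_last (mask, vec);
}
```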


Andrew Stubbs
Mentor Graphics / CodeSourcery
Add extract_last for amdgcn

2019-12-17  Andrew Stubbs  

	gcc/
	* config/gcn/gcn-valu.md (extract_last_): New expander.
	(fold_extract_last_): New expander.

	gcc/testsuite/
	* lib/target-supports.exp
	(check_effective_target_vect_fold_extract_last): Add amdgcn.

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 42604466161..3b3be8a9e36 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -591,6 +591,48 @@
(set_attr "exec" "none")
(set_attr "laneselect" "yes")])
 
+(define_expand "extract_last_"
+  [(match_operand: 0 "register_operand")
+   (match_operand:DI 1 "gcn_alu_operand")
+   (match_operand:VEC_ALLREG_MODE 2 "register_operand")]
+  "can_create_pseudo_p ()"
+  {
+rtx dst = operands[0];
+rtx mask = operands[1];
+rtx vect = operands[2];
+rtx tmpreg = gen_reg_rtx (SImode);
+
+emit_insn (gen_clzdi2 (tmpreg, mask));
+emit_insn (gen_subsi3 (tmpreg, GEN_INT (63), tmpreg));
+emit_insn (gen_vec_extract (dst, vect, tmpreg));
+DONE;
+  })
+
+(define_expand "fold_extract_last_"
+  [(match_operand: 0 "register_operand")
+   (match_operand: 1 "gcn_alu_operand")
+   (match_operand:DI 2 "gcn_alu_operand")
+   (match_operand:VEC_ALLREG_MODE 3 "register_operand")]
+  "can_create_pseudo_p ()"
+  {
+rtx dst = operands[0];
+rtx default_value = operands[1];
+rtx mask = operands[2];
+rtx vect = operands[3];
+rtx else_label = gen_label_rtx ();
+rtx end_label = gen_label_rtx ();
+
+rtx cond = gen_rtx_EQ (VOIDmode, mask, const0_rtx);
+emit_jump_insn (gen_cbranchdi4 (cond, mask, const0_rtx, else_label));
+emit_insn (gen_extract_last_ (dst, mask, vect));
+emit_jump_insn (gen_jump (end_label));
+emit_barrier ();
+emit_label (else_label);
+emit_move_insn (dst, default_value);
+emit_label (end_label);
+DONE;
+  })
+
 (define_expand "vec_init"
   [(match_operand:VEC_ALLREG_MODE 0 "register_operand")
(match_operand 1)]
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 80e9d6720bd..98f1141a8a4 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -6974,7 +6974,8 @@ proc check_effective_target_vect_logical_reduc { } {
 # Return 1 if the target supports the fold_extract_last optab.
 
 proc check_effective_target_vect_fold_extract_last { } {
-return [check_effective_target_aarch64_sve]
+return [expr { [check_effective_target_aarch64_sve]
+		   || [istarget amdgcn*-*-*] }]
 }
 
 # Return 1 if the target supports section-anchors


Re: [PATCH] Fix symver attribute with LTO

2019-12-17 Thread Xi Ruoyao
On 2019-12-17 09:32 +0100, Jan Hubicka wrote:
> > Hi,
> > with Jan's patch committed in r278878 we can use symver attribute for
> > functions
> > and variables.  The symver attribute is designed for replacing toplevel asm
> > statements containing ".symver" which may be removed by LTO.  Unfortunately,
> > a quick test showed GCC still generates a buggy .so file with LTO and the
> > new symver attribute.
> 
> Thanks for looking into this.  It was on my TODO list to actually
> convert some packages, so it is great you did that.
> > Two issues:
> > 
> > 1. The symver node in symtab is marked as PREVAILING_DEF_IRONLY (no EXP) and
> >    then removed by LTO.
> 
> This is however wrong - linker should not mark it as
> PREVAILING_DEF_IRONLY if it is used externally.  What linker do you use?
> On my testcases this was working with
>  GNU ld (GNU Binutils) 2.31.51.20181222
> I could easily imagine that some linkers get it wrong, which should be
> reported to binutils bugzilla, but it is also easy to work around as done
> in your patch.

Hi Jan,

I'm using GNU ld 2.33.1.

I'll attach a testcase simplified from fuse-3.9 code.  "local: *;" in the
versioning script triggers the issue.  Without it there would be no problem.

> > 2. The actual function body implementing the symver-ed function is also
> >    marked as PREVAILING_DEF_IRONLY and then removed or marked as local.
> >    So no ".globl" directive is emitted for it.
> 
> Here, is the symver-ed function exported from the DSO (or is it set
> to have the hidden attribute)?
> Again this was working for me, so it would be good to understand this
> issue.

It's also triggered by "local: *;".

Untar the attachment and use "make" to build it, then "make show-dynamic-syms"
to dump the dynamic symbol table.  I believe (with 99% chance) you'll see only
foo (VERS_1) and foo_v1 (because foo_v1 is marked as global in the version
script).  And foo (VERS_2) would be missing.  With this patch foo (VERS_2) would
show up.

We can't mark "foo_v2" to be "global" because it should not be a part of DSO
ABI.

The other 1% chance would be a regression in Binutils.
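
For readers without the attachment, the layout described above might look
roughly like the following C sketch.  It is not the attached testcase: the
function bodies, the USE_SYMVER macro and the version script in the comment
are assumptions for illustration.  Only the attribute spelling
symver ("name@VERSION") comes from the symver attribute itself.

```c
/* USE_SYMVER is a hypothetical macro: define it only when building a
   shared object together with a matching version script.  Without it
   the attribute is dropped and these are two ordinary functions.  */
#if defined (USE_SYMVER) && defined (__has_attribute)
# if __has_attribute (symver)
#  define SYMVER(spec) __attribute__ ((symver (spec)))
# endif
#endif
#ifndef SYMVER
# define SYMVER(spec)
#endif

/* Old implementation, exported as foo@VERS_1.  */
SYMVER ("foo@VERS_1")
int
foo_v1 (void)
{
  return 1;
}

/* New implementation, exported as the default version foo@@VERS_2.  */
SYMVER ("foo@@VERS_2")
int
foo_v2 (void)
{
  return 2;
}

/* Assumed version script, modelled on the description in this mail:

     VERS_1 { global: foo; foo_v1; local: *; };
     VERS_2 { global: foo; } VERS_1;
 */
```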
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


pr48200.tar.gz
Description: application/compressed-tar


[PATCH 3/4] Also propagate SRA accesses from LHS to RHS (PR 92706)

2019-12-17 Thread Martin Jambor
Hi,

the previous patch unfortunately does not fix the first testcase in PR
92706 and since I am afraid it might be the important one, I also
focused on that.  The issue here is again total scalarization accesses
clashing with those representing accesses in the IL - on another
aggregate but here the sides are switched.  Whereas in the previous
case the total scalarization accesses prevented propagation along
assignments, here we have the user accesses on the LHS, so even though
we do not create anything there, we totally scalarize the RHS and
again end up with assignments with different scalarizations leading to
bad code.

So we clearly need to propagate information about accesses from LHSs
to RHSs too, which the patch below does.  Because the intent is only
to prevent bad total scalarization - which the last patch now performs
late enough - and we do not care about the grp_write flag and so forth,
the propagation is a bit simpler and so I did not try to unify all of
the code for both directions.

I still think that even with this patch the total scalarization has to
follow the declared type of the aggregate and cannot be done using
integers of the biggest suitable power, at least in early SRA, because
these propagations of course do not work interprocedurally but
inlining can and does eventually bring accesses from two functions
together which could (and IMHO would) lead to same problems.

Bootstrapped, LTO-bootstrapped and tested on x86_64-linux, and
bootstrapped and tested on aarch64 and i686 (except that on i686
the testcase will need to be skipped because __int128_t is not
available there).  I expect that review will lead to requests to
change things but as far as I am concerned, this is ready for trunk
too.

Thanks,

Martin

2019-12-11  Martin Jambor  

PR tree-optimization/92706
* tree-sra.c (struct access): Fields first_link, last_link,
next_queued and grp_queued renamed to first_rhs_link, last_rhs_link,
next_rhs_queued and grp_rhs_queued respectively, new fields
first_lhs_link, last_lhs_link, next_lhs_queued and grp_lhs_queued.
(struct assign_link): Field next renamed to next_rhs, new field
next_lhs.  Updated comment.
(work_queue_head): Renamed to rhs_work_queue_head.
(lhs_work_queue_head): New variable.
(add_link_to_lhs): New function.
(relink_to_new_repr): Also relink LHS lists.
(add_access_to_work_queue): Renamed to add_access_to_rhs_work_queue.
(add_access_to_lhs_work_queue): New function.
(pop_access_from_work_queue): Renamed to
pop_access_from_rhs_work_queue.
(pop_access_from_lhs_work_queue): New function.
(build_accesses_from_assign): Also add links to LHS lists and to LHS
work_queue.
(child_would_conflict_in_lacc): Renamed to
child_would_conflict_in_acc.  Adjusted parameter names.
(create_artificial_child_access): New parameter set_grp_read, use it.
(subtree_mark_written_and_enqueue): Renamed to
subtree_mark_written_and_rhs_enqueue.
(propagate_subaccesses_across_link): Renamed to
propagate_subaccesses_from_rhs.
(propagate_subaccesses_from_lhs): New function.
(propagate_all_subaccesses): Also propagate subaccesses from LHSs to
RHSs.

testsuite/
* gcc.dg/tree-ssa/pr92706-1.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr92706-1.c |  17 ++
 gcc/tree-sra.c| 316 --
 2 files changed, 253 insertions(+), 80 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr92706-1.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr92706-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr92706-1.c
new file mode 100644
index 000..c36d103798e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr92706-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-esra-details" } */
+
+struct S { int i[4]; } __attribute__((aligned(128)));
+typedef __int128_t my_int128 __attribute__((may_alias));
+__int128_t load (void *p)
+{
+  struct S v;
+  __builtin_memcpy (&v, p, sizeof (struct S));
+  struct S u;
+  u = v;
+  struct S w;
+  w = u;
+  return *(my_int128 *)&w;
+}
+
+/* { dg-final { scan-tree-dump-not "Created a replacement for u offset: 
\[^0\]" "esra" } } */
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 0f28157456c..9f087e5c27a 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -167,11 +167,15 @@ struct access
   struct access *next_sibling;
 
   /* Pointers to the first and last element in the linked list of assign
- links.  */
-  struct assign_link *first_link, *last_link;
+ links for propagation from LHS to RHS.  */
+  struct assign_link *first_rhs_link, *last_rhs_link;
 
-  /* Pointer to the next access in the work queue.  */
-  struct access *next_queued;
+  /* Pointers to the first and last element in the linked list of assign
+ links for propagation from LHS to RHS.  */
+  struct 

[PATCH 4/4] Make total scalarization also copy padding (PR 92486)

2019-12-17 Thread Martin Jambor
Hi,

PR 92486 shows that DSE, when seeing a "normal" gimple aggregate
assignment coming from a C struct assignment and one representing a
folded memcpy, can kill the latter and keep in place only the former,
which does not copy padding - at least when SRA decides to totally
scalarize at least one of the aggregates (when not doing total
scalarization, SRA cares about padding).

Mind you, SRA would not totally scalarize an aggregate if it saw that
it takes part in a gimple assignment which is a folded memcpy (see how
type_changing_p is set in contains_vce_or_bfcref_p) but it doesn't
because of the DSE decisions.

I was asked to modify SRA to take padding into account - and to copy
it around - when totally scalarizing, which is what the patch below
does.
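
To make the padding issue concrete at the C level, here is a small sketch.
It is not the PR 92486 testcase; the struct and function names are invented.
It shows a struct that typically contains padding and checks that a memcpy
copies every byte of the object, padding included.  A plain struct assignment
(dst = src) gives no such guarantee for the padding bytes in C, which is why
the two kinds of copy are not interchangeable.

```c
#include <stddef.h>
#include <string.h>

struct padded
{
  char c;   /* usually followed by padding so that i is aligned */
  int i;
};

/* Number of bytes between the end of member c and the start of member i.  */
size_t
padding_after_c (void)
{
  return offsetof (struct padded, i) - sizeof (char);
}

/* Returns 1 if DST is byte-for-byte identical to SRC after a memcpy.  */
int
memcpy_preserves_all_bytes (void)
{
  struct padded src, dst;

  memset (&src, 0xAA, sizeof src);  /* fill the whole object first */
  src.c = 'x';
  src.i = 42;

  memset (&dst, 0, sizeof dst);
  memcpy (&dst, &src, sizeof dst);  /* copies members and padding alike */
  return memcmp (&dst, &src, sizeof dst) == 0;
}
```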

I believe the patch is correct in the sense that it will not cause
miscompilations but after I have seen inlining propagate the useless
(and small and ugly and certainly damaging) accesses far and wide, I
am more convinced than before that this is not the correct approach
and that DSE should simply be able to discern between the two
assignments too - and that the semantics of a "normal" gimple
assignment should not include copying of padding.

But if the decision is to make a gimple aggregate assignment always a
solid block copy, the patch can do it, and has passed bootstrap and
testing on x86_64-linux and a very similar one on aarch64-linux and
i686-linux.  I suppose that at least the way it figures out the type
for copying will need to change, but even then I'd rather not commit
it.
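
The way total_scalarization_fill_padding in the patch below covers a gap can
be modelled with a short C sketch.  It is a simplification under stated
assumptions: offsets and sizes are in bits, every power-of-two size up to
MAX_CHUNK_BITS is assumed to have a usable integer type (the real code asks
the language hook and halves the size until it finds one), and the struct and
function names are invented.

```c
#define MAX_CHUNK_BITS 64  /* assumption: widest usable integer type */

struct chunk
{
  long offset;  /* in bits */
  long size;    /* in bits */
};

/* Largest power of two that is <= X, for X >= 1.  */
static long
floor_pow2 (long x)
{
  long p = 1;
  while (p <= x / 2)
    p *= 2;
  return p;
}

/* Cover the gap [FROM, TO) with power-of-two sized chunks, largest
   first, writing at most MAX_OUT chunks to OUT.  Returns the number of
   chunks written.  */
int
fill_padding (long from, long to, struct chunk *out, int max_out)
{
  int n = 0;
  while (from < to && n < max_out)
    {
      long size = floor_pow2 (to - from);
      if (size > MAX_CHUNK_BITS)
        size = MAX_CHUNK_BITS;
      do
        {
          out[n].offset = from;
          out[n].size = size;
          n++;
          from += size;
        }
      while (to - from >= size && n < max_out);
    }
  return n;
}
```

For a 24-bit gap starting at bit 8 this yields a 16-bit chunk at offset 8
followed by an 8-bit chunk at offset 24.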

Thanks,

Martin

2019-12-13  Martin Jambor  

PR tree-optimization/92486
* tree-sra.c: Include langhooks.h.
(total_scalarization_fill_padding): New function.
(total_skip_all_accesses_until_pos): Also create accesses for padding.
(total_should_skip_creating_access): Pass new parameters to
total_skip_all_accesses_until_pos, update how much area is already
covered in cases of success.
(totally_scalarize_subtree): Track how much of an aggregate is
covered, create accesses for trailing padding.
---
 gcc/tree-sra.c | 102 -
 1 file changed, 92 insertions(+), 10 deletions(-)

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 9f087e5c27a..753bf63c33c 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -99,7 +99,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "dbgcnt.h"
 #include "builtins.h"
 #include "tree-sra.h"
-
+#include "langhooks.h"
 
 /* Enumeration of all aggregate reductions we can do.  */
 enum sra_mode { SRA_MODE_EARLY_IPA,   /* early call regularization */
@@ -2983,6 +2983,54 @@ create_total_scalarization_access (struct access 
*parent, HOST_WIDE_INT from,
   return access;
 }
 
+/* Create accesses to cover padding within PARENT, spanning from FROM to TO,
+   link it in between its children in between LAST_PTR and NEXT_SIBLING.  */
+
+static struct access **
+total_scalarization_fill_padding (struct access *parent, HOST_WIDE_INT from,
+ HOST_WIDE_INT to, struct access **last_ptr,
+ struct access *next_sibling)
+{
+  do
+{
+  /* Diff cannot be bigger than max_scalarization_size in
+analyze_all_variable_accesses.  */
+  HOST_WIDE_INT diff = to - from;
+  gcc_assert (diff >= BITS_PER_UNIT);
+  HOST_WIDE_INT stsz = 1 << floor_log2 (diff);
+  tree type;
+  scalar_int_mode mode;
+
+  while (true)
+   {
+ type = lang_hooks.types.type_for_size (stsz, 1);
+ if (type
+ && is_a <scalar_int_mode> (TYPE_MODE (type), &mode)
+ && GET_MODE_BITSIZE (mode) == stsz)
+   break;
+ stsz /= 2;
+ gcc_checking_assert (stsz >= BITS_PER_UNIT);
+   }
+
+  do {
+   tree expr = build_ref_for_offset (UNKNOWN_LOCATION, parent->base,
+ from, parent->reverse, type, NULL,
+ false);
+   struct access *access
+ = create_total_scalarization_access (parent, from, stsz, type,
+  expr, last_ptr, next_sibling);
+   access->grp_no_warning = 1;
+   last_ptr = &access->next_sibling;
+
+   from += stsz;
+  }
+  while (to - from >= stsz);
+  gcc_assert (from <= to);
+}
+  while (from < to);
+  return last_ptr;
+}
+
 static bool totally_scalarize_subtree (struct access *root);
 
 /* Move **LAST_PTR along the chain of siblings until it points to an access
@@ -2991,16 +3039,35 @@ static bool totally_scalarize_subtree (struct access 
*root);
false.  */
 
 static bool
-total_skip_all_accesses_until_pos (HOST_WIDE_INT pos, struct access 
***last_ptr)
+total_skip_all_accesses_until_pos (struct access *root, HOST_WIDE_INT pos,
+  HOST_WIDE_INT *covered,
+  struct access ***last_ptr)
 {
   struct access *next_child = 

[PATCH 2/4] SRA: Total scalarization after access propagation (PR 92706)

2019-12-17 Thread Martin Jambor
Hi,

this patch fixes the second testcase in PR 92706 by performing total
scalarization only quite a bit later, when we already have access
trees constructed and have even done propagation of accesses from RHSs
of assignments to LHSs.

The new code simultaneously traverses the existing access tree and the
declared variable type and adds artificial accesses whenever they can
fit in between the existing ones.  This prevents us from creating
accesses based on the type which then clash with those which have
propagated here from another access tree describing an aggregate on a
RHS of an assignment, which means that both sides of the assignment
will be scalarized differently, leading to bad code and the aggregate
most likely not going away.

Bootstrapped, LTO-bootstrapped and tested on x86_64-linux, where it
causes two new guality XPASSes.  Along with the next patch I also
bootstrapped and tested it on aarch64 and i686.  I expect that review
will lead to requests to change things but as far as I am concerned,
this is ready for trunk.

Thanks,

Martin


2019-12-12  Martin Jambor  

PR tree-optimization/92706
* tree-sra.c (struct access): Adjust comment of
grp_total_scalarization.
(find_access_in_subtree): Look for single children spanning an entire
access.
(scalarizable_type_p): Allow register accesses, adjust callers.
(completely_scalarize): Remove function.
(scalarize_elem): Likewise.
(create_total_scalarization_access): Likewise.
(sort_and_splice_var_accesses): Do not track total scalarization
flags.
(analyze_access_subtree): New parameter totally, adjust to new meaning
of grp_total_scalarization.
(analyze_access_trees): Pass new parameter to analyze_access_subtree.
(can_totally_scalarize_forest_p): New function.
(create_total_scalarization_access): Likewise.
(total_skip_all_accesses_until_pos): Likewise.
(access_and_field_type_match_p): Likewise.
(total_handle_child_matching_pos_size): Likewise.
(total_skip_children_over_scalar_type): Likewise.
(total_should_skip_creating_access): Likewise.
(totally_scalarize_subtree): Likewise.
(analyze_all_variable_accesses): Perform total scalarization after
subaccess propagation using the new functions above.
(initialize_constant_pool_replacements): Output initializers by
traversing the access tree.

testsuite/
* gcc.dg/tree-ssa/pr92706-2.c: New test.
* gcc.dg/guality/pr59776.c: Xfail tests for s2.g.
---
 gcc/testsuite/gcc.dg/guality/pr59776.c|   4 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr92706-2.c |  19 +
 gcc/tree-sra.c| 666 --
 3 files changed, 503 insertions(+), 186 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr92706-2.c

diff --git a/gcc/testsuite/gcc.dg/guality/pr59776.c b/gcc/testsuite/gcc.dg/guality/pr59776.c
index 382abb622bb..6c1c8165b70 100644
--- a/gcc/testsuite/gcc.dg/guality/pr59776.c
+++ b/gcc/testsuite/gcc.dg/guality/pr59776.c
@@ -12,11 +12,11 @@ foo (struct S *p)
   struct S s1, s2; /* { dg-final { gdb-test pr59776.c:17 "s1.f" "5.0" } } */
   s1 = *p; /* { dg-final { gdb-test pr59776.c:17 "s1.g" "6.0" } } */
   s2 = s1; /* { dg-final { gdb-test pr59776.c:17 "s2.f" "0.0" } } */
-  *(int *)  = 0;  /* { dg-final { gdb-test pr59776.c:17 "s2.g" "6.0" } } */
+  *(int *)  = 0;  /* { dg-final { gdb-test pr59776.c:17 "s2.g" "6.0" { xfail *-*-* } } } */
   asm volatile (NOP : : : "memory");   /* { dg-final { gdb-test pr59776.c:20 "s1.f" "5.0" } } */
   asm volatile (NOP : : : "memory");   /* { dg-final { gdb-test pr59776.c:20 "s1.g" "6.0" } } */
   s2 = s1; /* { dg-final { gdb-test pr59776.c:20 "s2.f" "5.0" } } */
-  asm volatile (NOP : : : "memory");   /* { dg-final { gdb-test pr59776.c:20 "s2.g" "6.0" } } */
+  asm volatile (NOP : : : "memory");   /* { dg-final { gdb-test pr59776.c:20 "s2.g" "6.0" { xfail *-*-* } } } */
   asm volatile (NOP : : : "memory");
 }
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr92706-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr92706-2.c
new file mode 100644
index 000..37ab9765db0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr92706-2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-esra" } */
+
+typedef __UINT64_TYPE__ uint64_t;
+typedef __UINT32_TYPE__ uint32_t;
+struct S { uint32_t i[2]; } __attribute__((aligned(__alignof__(uint64_t))));
+typedef uint64_t my_int64 __attribute__((may_alias));
+uint64_t load (void *p)
+{
+  struct S u, v, w;
+  uint64_t tem;
+  tem = *(my_int64 *)p;
+  *(my_int64 *)&v = tem;
+  u = v;
+  w = u;
+  return *(my_int64 *)&w;
+}
+
+/* { dg-final { scan-tree-dump "Created a replacement for v" "esra" } } */
diff --git a/gcc/tree-sra.c 

[PATCH 1/4] Add verification of SRA accesses

2019-12-17 Thread Martin Jambor
Hi,

because the follow-up patches perform some non-trivial operations on
SRA patches, I wrote myself a verifier.  And sure enough, it has
spotted two issues, one of which is fixed in this patch too - we did
not correctly set the parent link when creating artificial accesses
for propagation across assignments.  The second one is the (not)
setting of reverse flag when creating accesses for total scalarization
but since the following patch removes the offending function, this
patch does not fix it.
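
To make the kind of invariant being checked concrete, here is a minimal sketch on a toy tree.  It is not the verifier from the patch: struct toy_node and toy_verify are hypothetical names, the containment test is slightly stricter than the one in the patch, and none of the checks against get_ref_base_and_extent are modelled.

```c
#include <stddef.h>

/* Hypothetical, simplified access node: a bit range plus tree links.  */
struct toy_node
{
  long offset, size;
  struct toy_node *parent, *first_child, *next_sibling;
};

/* Check NODE, its following siblings and all their descendants.  PARENT is
   the node they are expected to hang off (NULL for roots).  Return 1 if
   every parent link is correct, every child lies within its parent and
   siblings are sorted and do not overlap; return 0 otherwise.  */
int
toy_verify (const struct toy_node *node, const struct toy_node *parent)
{
  for (; node; node = node->next_sibling)
    {
      if (node->parent != parent)
        return 0;
      if (parent
          && (node->offset < parent->offset
              || node->offset + node->size > parent->offset + parent->size))
        return 0;
      if (node->next_sibling
          && node->next_sibling->offset < node->offset + node->size)
        return 0;
      if (!toy_verify (node->first_child, node))
        return 0;
    }
  return 1;
}
```

A child whose parent field was never set, which is the first issue described above, makes toy_verify return 0 until the link is filled in.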

Bootstrapped and tested on x86_64, I consider this a pre-requisite for
the followup patches (and the parent link fix really is).

Thanks,

Martin

2019-12-10  Martin Jambor  

* tree-sra.c (verify_sra_access_forest): New function.
(verify_all_sra_access_forests): Likewise.
(create_artificial_child_access): Set parent.
(analyze_all_variable_accesses): Call the verifier.
---
 gcc/tree-sra.c | 86 ++
 1 file changed, 86 insertions(+)

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 87c156f2f54..e077a811da9 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -2321,6 +2321,88 @@ build_access_trees (struct access *access)
   return true;
 }
 
+/* Traverse the access forest where ROOT is the first root and verify that
+   various important invariants hold true.  */
+
+DEBUG_FUNCTION void
+verify_sra_access_forest (struct access *root)
+{
+  struct access *access = root;
+  tree first_base = root->base;
+  gcc_assert (DECL_P (first_base));
+  do
+{
+  gcc_assert (access->base == first_base);
+  if (access->parent)
+   gcc_assert (access->offset >= access->parent->offset
+   && access->size <= access->parent->size);
+  if (access->next_sibling)
+   gcc_assert (access->next_sibling->offset
+   >= access->offset + access->size);
+
+  poly_int64 poffset, psize, pmax_size;
+  bool reverse;
+  tree base = get_ref_base_and_extent (access->expr, &poffset, &psize,
+  &pmax_size, &reverse);
+  HOST_WIDE_INT offset, size, max_size;
+  if (!poffset.is_constant (&offset)
+ || !psize.is_constant (&size)
+ || !pmax_size.is_constant (&max_size))
+   gcc_unreachable ();
+  gcc_assert (base == first_base);
+  gcc_assert (offset == access->offset);
+  gcc_assert (access->grp_unscalarizable_region
+ || size == max_size);
+  gcc_assert (max_size == access->size);
+  gcc_assert (reverse == access->reverse);
+
+  if (access->first_child)
+   {
+ gcc_assert (access->first_child->parent == access);
+ access = access->first_child;
+   }
+  else if (access->next_sibling)
+   {
+ gcc_assert (access->next_sibling->parent == access->parent);
+ access = access->next_sibling;
+   }
+  else
+   {
+ while (access->parent && !access->next_sibling)
+   access = access->parent;
+ if (access->next_sibling)
+   access = access->next_sibling;
+ else
+   {
+ gcc_assert (access == root);
+ root = root->next_grp;
+ access = root;
+   }
+   }
+}
+  while (access);
+}
+
+/* Verify access forests of all candidates with accesses by calling
+   verify_sra_access_forest on each of them.  */
+
+DEBUG_FUNCTION void
+verify_all_sra_access_forests (void)
+{
+  bitmap_iterator bi;
+  unsigned i;
+  EXECUTE_IF_SET_IN_BITMAP (candidate_bitmap, 0, i, bi)
+{
+  tree var = candidate (i);
+  struct access *access = get_first_repr_for_decl (var);
+  if (access)
+   {
+ gcc_assert (access->base == var);
+ verify_sra_access_forest (access);
+   }
+}
+}
+
 /* Return true if expr contains some ARRAY_REFs into a variable bounded
array.  */
 
@@ -2566,6 +2648,7 @@ create_artificial_child_access (struct access *parent, struct access *model,
   access->offset = new_offset;
   access->size = model->size;
   access->type = model->type;
+  access->parent = parent;
   access->grp_write = set_grp_write;
   access->grp_read = false;
   access->reverse = model->reverse;
@@ -2850,6 +2933,9 @@ analyze_all_variable_accesses (void)
 
   propagate_all_subaccesses ();
 
+  if (flag_checking)
+verify_all_sra_access_forests ();
+
   bitmap_copy (tmp, candidate_bitmap);
   EXECUTE_IF_SET_IN_BITMAP (tmp, 0, i, bi)
 {
-- 
2.24.0



Re: [PATCH] OpenACC 2.6 manual deep copy support (attach/detach)

2019-12-17 Thread Thomas Schwinge
Hi Julian!

As a first step, can you please split out just the code required to make
the OpenACC 'acc_attach*', 'acc_detach*' runtime library routines work?

Assuming there were no other defects in libgomp, would this already make
the 'libgomp.oacc-c-c++-common/deep-copy-3.c',
'libgomp.oacc-c-c++-common/deep-copy-5.c' test cases work?


Regards
 Thomas




[committed, pr92772] Mention bug in comment

2019-12-17 Thread Andrew Stubbs

This patch only changes a comment, so I'm committing it as "obvious".

The point is that I don't intend to spend the time to fix the bug,
because implementing fold_extract_last is both a workaround and an
optimization, but future ports might encounter the same problem and
hopefully the pointer will save future readers some confusion.


Andrew
Add pointer to PR92772

2019-12-17  Andrew Stubbs  

	* tree-vect-loop.c (vect_create_epilog_for_reduction): Mention pr92772
	in the comments.

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 353a5ff06e1..68699f2d814 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -4534,7 +4534,10 @@ vect_create_epilog_for_reduction (stmt_vec_info stmt_info,
  containing the last time the condition passed for that vector lane.
  The first match will be a 1 to allow 0 to be used for non-matching
  indexes.  If there are no matches at all then the vector will be all
- zeroes.  */
+ zeroes.
+   
+ PR92772: This algorithm is broken for architectures that support
+ masked vectors, but do not provide fold_extract_last.  */
   if (STMT_VINFO_REDUC_TYPE (reduc_info) == COND_REDUCTION)
 {
   auto_vec, 2> ccompares;


Re: [PATCH] IPA-CP: Remove bogus static keyword (PR 92971)

2019-12-17 Thread Jakub Jelinek
On Tue, Dec 17, 2019 at 01:50:32PM +0100, Martin Jambor wrote:
> Hi,
> 
> as reported in PR 92971, IPA-CP's
> cgraph_edge_brings_all_agg_vals_for_node defines one local variable with
> the static keyword which is a clear mistake, probably a cut'n'paste
> error when I originally wrote the code.
> 
> I'll commit the following as obvious after a round of bootstrap and
> testing.  Early next year, I'll also commit it to all opened release
> branches.

Is that what you want to do though?
Because when it is an automatic variable (shouldn't it be auto_vec, btw),
then the first use of it doesn't make much sense:
  values = intersect_aggregates_with_edge (cs, i, values);
because it will always be (cs, i, vNULL).  So maybe the var should live
across the iterations or live in the caller that should pass a pointer (or
reference) to it?
With the patch, there will be leaks too, because the values vector is only
released if the function returns false and is not released otherwise.
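
The difference in lifetime can be shown with a small stand-alone sketch.  The function and variable names below are made up for illustration and have nothing to do with ipa-cp.c; only the storage class of the loop-local variable matters.

```c
/* Count the iterations that start with a leftover value from an earlier
   iteration when the loop-local variable is static.  */
int
toy_static_carry (int iterations)
{
  int carried = 0;
  for (int i = 0; i < iterations; i++)
    {
      static int values = 0;  /* Initialized once, persists afterwards.  */
      if (values != 0)
        carried++;
      values = i + 1;
    }
  return carried;
}

/* The same loop with an automatic variable: it is re-initialized on every
   iteration, so its first use always sees the initial value.  */
int
toy_auto_carry (int iterations)
{
  int carried = 0;
  for (int i = 0; i < iterations; i++)
    {
      int values = 0;         /* Fresh on every iteration.  */
      if (values != 0)
        carried++;
      values = i + 1;         /* Lost when the iteration ends.  */
    }
  return carried;
}
```

On its first call with three iterations the static variant sees a leftover value twice, while the automatic variant never does.  Note that the static variable also survives across calls, so a second call to toy_static_carry would see a leftover value in its very first iteration too.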

> 2019-12-17  Martin Jambor  
> 
>   * ipa-cp.c (cgraph_edge_brings_all_agg_vals_for_node): Remove
>   static from local variable definition.
> ---
>  gcc/ipa-cp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 1a80ccbde2d..6692eb7b878 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -5117,7 +5117,7 @@ cgraph_edge_brings_all_agg_vals_for_node (struct cgraph_edge *cs,
>  
>for (i = 0; i < count; i++)
>  {
> -  static vec values = vNULL;
> +  vec values = vNULL;
>class ipcp_param_lattices *plats;
>bool interesting = false;
>for (struct ipa_agg_replacement_value *av = aggval; av; av = av->next)
> -- 
> 2.24.0

Jakub



Re: [PATCH 2/2] [ARM] Add support for -mpure-code in thumb-1 (v6m)

2019-12-17 Thread Kyrill Tkachov

Hi Christophe,

On 11/18/19 9:00 AM, Christophe Lyon wrote:

On Wed, 13 Nov 2019 at 15:46, Christophe Lyon
 wrote:
>
> On Tue, 12 Nov 2019 at 12:13, Richard Earnshaw (lists)
>  wrote:
> >
> > On 18/10/2019 14:18, Christophe Lyon wrote:
> > > > +  bool not_supported = arm_arch_notm || flag_pic || TARGET_NEON;

> > >
> >
> > This is a poor name in the context of the function as a whole.  What's
> > not supported.  Please think of a better name so that I have some idea
> > what the intention is.
>
> That's to keep most of the code common when checking if -mpure-code
> and -mslow-flash-data are supported.
> These 3 cases are common to the two compilation flags, and
> -mslow-flash-data still needs to check TARGET_HAVE_MOVT in addition.
>
> Would "common_unsupported_modes" work better for you?
> Or I can duplicate the "arm_arch_notm || flag_pic || TARGET_NEON" in
> the two tests.
>

Hi,

Here is an updated version, using "common_unsupported_modes" instead
of "not_supported", and fixing the typo reported by Kyrill.
The ChangeLog is still the same.

OK?



The name looks ok to me. Richard had a concern about Armv8-M Baseline, 
but I do see it being supported as you pointed out.


So I believe all the concerns are addressed.

Thus the code is ok. However, please also update the documentation for 
-mpure-code in invoke.texi (it currently states that a MOVT instruction 
is needed).


Thanks,

Kyrill





Thanks,

Christophe

> Thanks,
>
> Christophe
>
> >
> > R.


[PATCH] Some compute_objsize/gimple_call_alloc_size/maybe_warn_overflow cleanups (PR tree-optimization/92868)

2019-12-17 Thread Jakub Jelinek
Hi!

While looking at the PR, I wrote a cleanup patch covering various things I had
noticed.  With Martin's latest changes half of them are no longer valid, but
I found further ones.
So, besides formatting fixes, this patch tries to make sure the rng1 ranges
are meaningful even in some corner cases.  The first hunk and a half concern
what rng1 will be for a single-argument attribute when an INTEGER_CST is
passed to the corresponding argument.  Vanilla trunk can put into rng1[0] ==
rng1[1] e.g. a negative value with whatever precision the argument has
(say, for an int argument with -23 passed to it, the function will return
(size_t) -23 but the range will be [-23, -23]), or e.g. for arguments wider
than size_t, like __int128, it could return numbers above SIZE_MAX.  Whatever
the argument type is, the attribute expects that value to be passed over to
malloc/calloc or similar functions, so it will be promoted or demoted to
size_t.  The
  rng1[0] = wi::zero (rng1[1].get_precision ());
line is to avoid weird ranges and (conservatively) cover the whole range
of sizes the allocator function can return.  E.g. if the ranges for the two
arguments are [2, SIZE_MAX] * [2, SIZE_MAX], the upper bound overflows and
gimple_call_alloc_size would return SIZE_MAX with [4, SIZE_MAX] in the rng1,
but that is not accurate, because due to the overflow also [0, 3] would be
possible.  Or for [SIZE_MAX - 2, SIZE_MAX] * [SIZE_MAX - 2, SIZE_MAX] where
the overflow is both on the low and upper bounds, the function would return
SIZE_MAX and set rng1 to [((__int128) SIZE_MAX - 2) * (SIZE_MAX - 2), SIZE_MAX]
i.e. range where low bound is much higher than upper bound (== invalid
range).  The patch will just use [0, SIZE_MAX] range in those cases.
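
The clamping described here can be sketched in plain C on size_t bounds.  This is only an illustration of the idea and not the wide_int code in builtins.c; struct toy_range and toy_mul_range are hypothetical names, and the sketch relies on the GCC/Clang __builtin_mul_overflow built-in.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical unsigned size range [lo, hi].  */
struct toy_range { size_t lo, hi; };

/* Multiply two ranges bound by bound.  If either product overflows size_t,
   the wrapped result could be anything, so fall back conservatively to the
   full range [0, SIZE_MAX].  */
struct toy_range
toy_mul_range (struct toy_range a, struct toy_range b)
{
  struct toy_range r;
  int lo_ovf = __builtin_mul_overflow (a.lo, b.lo, &r.lo);
  int hi_ovf = __builtin_mul_overflow (a.hi, b.hi, &r.hi);
  if (lo_ovf || hi_ovf)
    {
      r.lo = 0;
      r.hi = SIZE_MAX;
    }
  return r;
}
```

For [2, SIZE_MAX] * [2, SIZE_MAX] the sketch yields [0, SIZE_MAX] rather than [4, SIZE_MAX], and it does the same when both bounds overflow, so the low bound never ends up above the high bound.
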
The second hunk in tree-ssa-strlen.c fixes a pointer comparison of
INTEGER_CSTs.  We do cache INTEGER_CSTs, so two INTEGER_CSTs of the same type
with the same value and no overflow will compare equal, but in that function
it is actually quite unlikely that they have the same type: destsize is
most likely a sizetype integer, while len is most likely size_t (aka
size_type_node), but could be anything else that passes
useless_type_conversion_p.
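
Why pointer identity is the wrong test can be seen in a toy model of constant nodes.  The struct and helpers below are hypothetical and much simpler than GCC trees; in particular toy_same_value only compares the stored values and does not attempt to reproduce what operand_equal_p checks.

```c
/* Hypothetical constant node: a type tag plus a value.  Nodes with
   different type tags are distinct objects even when the values agree.  */
struct toy_cst { int type_id; unsigned long value; };

/* Identity comparison: true only for the very same node.  */
int
toy_same_node (const struct toy_cst *a, const struct toy_cst *b)
{
  return a == b;
}

/* Value comparison: true whenever the stored constants agree.  */
int
toy_same_value (const struct toy_cst *a, const struct toy_cst *b)
{
  return a->value == b->value;
}
```

Two nodes holding the value 8 under different type tags are different objects, so the identity test fails even though the sizes they describe are equal.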

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-12-17  Jakub Jelinek  

PR tree-optimization/92868
* builtins.c (gimple_call_alloc_size): If there is only one size
argument and INTEGER_CST is passed to it, call get_range only on the
converted value.  For overflow, set rng1[0] to zero.  Formatting fix.
(compute_objsize): Formatting fix.
* tree-ssa-strlen.c (maybe_warn_overflow): Remove spurious ; after }.
Use operand_equal_p instead of pointer comparison to compare
INTEGER_CSTs.  Formatting fixes.

--- gcc/builtins.c.jj   2019-12-14 23:19:40.861879033 +0100
+++ gcc/builtins.c  2019-12-16 10:00:27.858664110 +0100
@@ -3749,6 +3749,13 @@ gimple_call_alloc_size (gimple *stmt, wi
 }
 
   tree size = gimple_call_arg (stmt, argidx1);
+  if (argidx2 > nargs && TREE_CODE (size) == INTEGER_CST)
+{
+  size = fold_convert (sizetype, size);
+  if (rng1)
+   get_range (size, rng1, rvals);
+  return size;
+}
 
   wide_int rng1_buf[2];
   /* If RNG1 is not set, use the buffer.  */
@@ -3758,12 +3765,10 @@ gimple_call_alloc_size (gimple *stmt, wi
   if (!get_range (size, rng1, rvals))
 return NULL_TREE;
 
-  if (argidx2 > nargs && TREE_CODE (size) == INTEGER_CST)
-return fold_convert (sizetype, size);
-
   /* To handle ranges do the math in wide_int and return the product
  of the upper bounds as a constant.  Ignore anti-ranges.  */
-  tree n = argidx2 < nargs ? gimple_call_arg (stmt, argidx2) : integer_one_node;
+  tree n
+= argidx2 < nargs ? gimple_call_arg (stmt, argidx2) : integer_one_node;
   wide_int rng2[2];
   if (!get_range (n, rng2, rvals))
 return NULL_TREE;
@@ -3783,6 +3788,7 @@ gimple_call_alloc_size (gimple *stmt, wi
   if (wi::gtu_p (rng1[1], wi::to_wide (size_max, prec)))
 {
   rng1[1] = wi::to_wide (size_max);
+  rng1[0] = wi::zero (rng1[1].get_precision ());
   return size_max;
 }
 
@@ -3962,8 +3968,7 @@ compute_objsize (tree dest, int ostype,
   if (!ostype)
 return NULL_TREE;
 
-  if (TREE_CODE (dest) == ARRAY_REF
-  || TREE_CODE (dest) == MEM_REF)
+  if (TREE_CODE (dest) == ARRAY_REF || TREE_CODE (dest) == MEM_REF)
 {
   tree ref = TREE_OPERAND (dest, 0);
   tree off = TREE_OPERAND (dest, 1);
--- gcc/tree-ssa-strlen.c.jj2019-12-14 23:19:40.860879048 +0100
+++ gcc/tree-ssa-strlen.c   2019-12-16 10:14:51.061654913 +0100
@@ -2039,7 +2039,7 @@ maybe_warn_overflow (gimple *stmt, tree
 {
   sizrng[0] = wi::zero (siz_prec);
   sizrng[1] = wi::to_wide (TYPE_MAX_VALUE (sizetype));
-};
+}
 
   sizrng[0] = sizrng[0].from (sizrng[0], siz_prec, UNSIGNED);
   sizrng[1] = sizrng[1].from (sizrng[1], siz_prec, UNSIGNED);
@@ -2048,7 +2048,11 @@ maybe_warn_overflow (gimple *stmt, tree
  

Patch ping (was Re: [PATCH] Oprimize stack_protect_set_1_ followed by a move to the same register (PR target/92841))

2019-12-17 Thread Jakub Jelinek
Hi!

I'd like to ping this patch (with the sizeof (c) -> sizeof (c) / sizeof (c[0])
testsuite fix Andreas pointed out).
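
For readers unfamiliar with the idiom, the difference between the two expressions is the difference between a byte count and an element count.  The array below is a made-up example and not the one from the pr92841.c test.

```c
#include <stddef.h>

/* Hypothetical array; the element type is chosen only so that it is wider
   than one byte.  */
long toy_c[4];

/* Size of the whole array in bytes.  */
size_t
toy_bytes (void)
{
  return sizeof (toy_c);
}

/* Number of elements in the array.  */
size_t
toy_elems (void)
{
  return sizeof (toy_c) / sizeof (toy_c[0]);
}
```

A loop bounded by toy_bytes () would run past the end of the array whenever the element type is larger than one byte, whereas toy_elems () is always 4 here.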

Thanks!

On Tue, Dec 10, 2019 at 10:57:35AM +0100, Jakub Jelinek wrote:
> 2019-12-10  Jakub Jelinek  
> 
>   PR target/92841
>   * config/i386/i386.md (*stack_protect_set_2_,
>   *stack_protect_set_3): New define_insns and corresponding
>   define_peephole2s.
> 
>   * gcc.target/i386/pr92841.c: New test.

Jakub



Re: [PATCH, GCC/ARM, 4/10] Clear GPR with CLRM

2019-12-17 Thread Kyrill Tkachov

Hi Mihail,

On 12/16/19 6:29 PM, Mihail Ionescu wrote:

Hi Kyrill,

On 11/12/2019 09:55 AM, Kyrill Tkachov wrote:

Hi Mihail,

On 10/23/19 10:26 AM, Mihail Ionescu wrote:

[PATCH, GCC/ARM, 4/10] Clear GPR with CLRM

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to improve
code density of functions with the cmse_nonsecure_entry attribute and
when calling function with the cmse_nonsecure_call attribute by using
CLRM to do all the general purpose registers clearing as well as
clearing the APSR register.

=== Patch description ===

This patch adds a new pattern for the CLRM instruction and guards the
current clearing code in output_return_instruction() and thumb_exit()
on Armv8.1-M Mainline instructions not being present.
cmse_clear_registers () is then modified to use the new CLRM instruction
when targeting Armv8.1-M Mainline while keeping Armv8-M register
clearing code for VFP registers.

For the CLRM instruction, which does not mandate APSR in the register
list, checking whether it is the right volatile unspec or a clearing
register is done in clear_operation_p.

Note that load/store multiple were deemed sufficiently different in
terms of RTX structure compared to the CLRM pattern for a different
function to be used to validate the match_parallel.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu
2019-10-23  Thomas Preud'homme

    * config/arm/arm-protos.h (clear_operation_p): Declare.
    * config/arm/arm.c (clear_operation_p): New function.
    (cmse_clear_registers): Generate clear_multiple instruction pattern if
    targeting Armv8.1-M Mainline or successor.
    (output_return_instruction): Only output APSR register clearing if
    Armv8.1-M Mainline instructions not available.
    (thumb_exit): Likewise.
    * config/arm/predicates.md (clear_multiple_operation): New predicate.
    * config/arm/thumb2.md (clear_apsr): New define_insn.
    (clear_multiple): Likewise.
    * config/arm/unspecs.md (VUNSPEC_CLRM_APSR): New volatile unspec.

*** gcc/testsuite/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu
2019-10-23  Thomas Preud'homme

    * gcc.target/arm/cmse/bitfield-1.c: Add check for CLRM.
    * gcc.target/arm/cmse/bitfield-2.c: Likewise.
    * gcc.target/arm/cmse/bitfield-3.c: Likewise.
    * gcc.target/arm/cmse/struct-1.c: Likewise.
    * gcc.target/arm/cmse/cmse-14.c: Likewise.
    * gcc.target/arm/cmse/cmse-1.c: Likewise.  Restrict checks for Armv8-M
    GPR clearing when CLRM is not available.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/bitfield-9.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/hard/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/soft/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-13.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-5.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-7.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-8.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/union-1.c: Likewise.
    * gcc.target/arm/cmse/mainline/8_1m/union-2.c: Likewise.

Testing: bootstrapped on arm-linux-gnueabihf and testsuite shows no
regression.

Is this ok for trunk?

Best regards,

Mihail


### Attachment also inlined for ease of reply ###



diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index f995974f9bb89ab3c7ff0888c394b0dfaf7da60c..1a948d2c97526ad7e67e8d4a610ac74cfdb13882 100644

--- a/gcc/config/arm/arm-protos.h
+++ 

Re: [PATCH 2/2] [ARM] Add support for -mpure-code in thumb-1 (v6m)

2019-12-17 Thread Christophe Lyon
Ping?

On Wed, 11 Dec 2019 at 18:19, Christophe Lyon
 wrote:
>
> Ping?
>
> On Thu, 5 Dec 2019 at 11:13, Christophe Lyon wrote:
>>
>> ping?
>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01667.html
>>
>> Kyrill approved the previous version modulo a typo fix, but Richard
>> wanted a better name for a variable.
>> Is that version OK?
>>
>> Thanks,
>>
>> Christophe
>>
>>
>> On Tue, 26 Nov 2019 at 16:29, Christophe Lyon
>>  wrote:
>> >
>> > ping?
>> >
>> > On Mon, 18 Nov 2019 at 10:00, Christophe Lyon
>> >  wrote:
>> > >
>> > > On Wed, 13 Nov 2019 at 15:46, Christophe Lyon
>> > >  wrote:
>> > > >
>> > > > On Tue, 12 Nov 2019 at 12:13, Richard Earnshaw (lists)
>> > > >  wrote:
>> > > > >
>> > > > > On 18/10/2019 14:18, Christophe Lyon wrote:
>> > > > > > +  bool not_supported = arm_arch_notm || flag_pic || TARGET_NEON;
>> > > > > >
>> > > > >
>> > > > > This is a poor name in the context of the function as a whole.  What's
>> > > > > not supported.  Please think of a better name so that I have some idea
>> > > > > what the intention is.
>> > > >
>> > > > That's to keep most of the code common when checking if -mpure-code
>> > > > and -mslow-flash-data are supported.
>> > > > These 3 cases are common to the two compilation flags, and
>> > > > -mslow-flash-data still needs to check TARGET_HAVE_MOVT in addition.
>> > > >
>> > > > Would "common_unsupported_modes" work better for you?
>> > > > Or I can duplicate the "arm_arch_notm || flag_pic || TARGET_NEON" in
>> > > > the two tests.
>> > > >
>> > >
>> > > Hi,
>> > >
>> > > Here is an updated version, using "common_unsupported_modes" instead
>> > > of "not_supported", and fixing the typo reported by Kyrill.
>> > > The ChangeLog is still the same.
>> > >
>> > > OK?
>> > >
>> > > Thanks,
>> > >
>> > > Christophe
>> > >
>> > > > Thanks,
>> > > >
>> > > > Christophe
>> > > >
>> > > > >
>> > > > > R.


[patch] libstdc++/configure: strengthen the check for availability of pthread_rwlock_t

2019-12-17 Thread Jérôme Lambourg
Hello,

This patch to libstdc++ configure ensures that pthread_rwlock_t is used only
when pthread is used for gthreads implementation.

The original issue is that VxWorks comes with its native tasking layer and an
optional pthread layer built above it. As pthread is an optional feature of the
kernel it may not be available on the target board.

Now, even being optional, the headers are present in the development
environment, and are in particular partially available when including the main
vxWorks.h header, and so are detected automatically as available by libstdc++.

This patch will thus refine the check and will make configure try to use
pthread_rwlock_t only when the gthr library already relies on pthread.

This was of course tested with vxworks targets, and we also verified that this
does not impact Linux (that uses pthread) or Windows (that does not use
pthread).

Best regards,
- Jerome

2019-12-16  Jerome Lambourg  

libstdc++
* acinclude.m4 (_GLIBCXX_USE_PTHREAD_RWLOCK_T): Check that _PTHREADS
is defined after including gthr.h.
* configure: Regenerate.



pthread_rwlock_libstdcxx_configure.patch
Description: Binary data


[PATCH] Some x86 AMD -march= docs fixes + formatting fixes (PR target/92962)

2019-12-17 Thread Jakub Jelinek
Hi!

The bug report complained just about missing RDPID and WBNOINVD in znver2
description and double comma before CLWB, but reading the docs I found
various other nits and when trying to compare it with what the compiler
actually does, I found ugly formatting there too.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Note, it might be worth introducing PTA_BTVER1 etc. in i386.h, similarly to
how it is done for Intel CPUs, and building each new CPU's PTA_* mask
incrementally from those rather than always spelling out the whole set.  I
could do that as a follow-up if desired.

2019-12-17  Jakub Jelinek  

PR target/92962
* common/config/i386/i386-common.c (processor_alias_table): Formatting
fixes.
* doc/invoke.texi (bdver3, bdver4, znver1): Add missing closing paren.
(znver2): Likewise.  Add RDPID and WBNOINVD, remove spurious comma
before CLWB.

--- gcc/common/config/i386/i386-common.c.jj 2019-12-09 15:02:30.131287575 +0100
+++ gcc/common/config/i386/i386-common.c2019-12-16 22:26:44.477558339 +0100
@@ -1617,7 +1617,7 @@ const pta processor_alias_table[] =
   {"pentium-m", PROCESSOR_PENTIUMPRO, CPU_PENTIUMPRO,
 PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_FXSR},
   {"pentium4", PROCESSOR_PENTIUM4, CPU_NONE,
-PTA_MMX |PTA_SSE | PTA_SSE2 | PTA_FXSR},
+PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_FXSR},
   {"pentium4m", PROCESSOR_PENTIUM4, CPU_NONE,
 PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_FXSR},
   {"prescott", PROCESSOR_NOCONA, CPU_NONE,
@@ -1775,12 +1775,12 @@ const pta processor_alias_table[] =
   | PTA_SHA | PTA_LZCNT | PTA_POPCNT | PTA_CLWB | PTA_RDPID
   | PTA_WBNOINVD},
   {"btver1", PROCESSOR_BTVER1, CPU_GENERIC,
-PTA_64BIT | PTA_MMX |  PTA_SSE  | PTA_SSE2 | PTA_SSE3
-  | PTA_SSSE3 | PTA_SSE4A |PTA_ABM | PTA_CX16 | PTA_PRFCHW
+PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
+  | PTA_SSSE3 | PTA_SSE4A | PTA_ABM | PTA_CX16 | PTA_PRFCHW
   | PTA_FXSR | PTA_XSAVE},
   {"btver2", PROCESSOR_BTVER2, CPU_BTVER2,
-PTA_64BIT | PTA_MMX |  PTA_SSE  | PTA_SSE2 | PTA_SSE3
-  | PTA_SSSE3 | PTA_SSE4A |PTA_ABM | PTA_CX16 | PTA_SSE4_1
+PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
+  | PTA_SSSE3 | PTA_SSE4A | PTA_ABM | PTA_CX16 | PTA_SSE4_1
   | PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX
   | PTA_BMI | PTA_F16C | PTA_MOVBE | PTA_PRFCHW
   | PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT},
--- gcc/doc/invoke.texi.jj  2019-12-13 11:36:01.140666243 +0100
+++ gcc/doc/invoke.texi 2019-12-16 22:26:50.434467619 +0100
@@ -27767,35 +27767,38 @@ instruction set extensions.)
 CPUs based on AMD Family 15h cores with x86-64 instruction set support.  (This
 supersets FMA4, AVX, XOP, LWP, AES, PCLMUL, CX16, MMX, SSE, SSE2, SSE3, SSE4A,
 SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set extensions.)
+
 @item bdver2
 AMD Family 15h core based CPUs with x86-64 instruction set support.  (This
 supersets BMI, TBM, F16C, FMA, FMA4, AVX, XOP, LWP, AES, PCLMUL, CX16, MMX,
 SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set 
 extensions.)
+
 @item bdver3
 AMD Family 15h core based CPUs with x86-64 instruction set support.  (This
 supersets BMI, TBM, F16C, FMA, FMA4, FSGSBASE, AVX, XOP, LWP, AES, 
 PCLMUL, CX16, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and
-64-bit instruction set extensions.
+64-bit instruction set extensions.)
+
 @item bdver4
 AMD Family 15h core based CPUs with x86-64 instruction set support.  (This
 supersets BMI, BMI2, TBM, F16C, FMA, FMA4, FSGSBASE, AVX, AVX2, XOP, LWP, 
 AES, PCLMUL, CX16, MOVBE, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1,
-SSE4.2, ABM and 64-bit instruction set extensions.
+SSE4.2, ABM and 64-bit instruction set extensions.)
 
 @item znver1
 AMD Family 17h core based CPUs with x86-64 instruction set support.  (This
 supersets BMI, BMI2, F16C, FMA, FSGSBASE, AVX, AVX2, ADCX, RDSEED, MWAITX,
 SHA, CLZERO, AES, PCLMUL, CX16, MOVBE, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3,
 SSE4.1, SSE4.2, ABM, XSAVEC, XSAVES, CLFLUSHOPT, POPCNT, and 64-bit
-instruction set extensions.
+instruction set extensions.)
+
 @item znver2
 AMD Family 17h core based CPUs with x86-64 instruction set support. (This
-supersets BMI, BMI2, ,CLWB, F16C, FMA, FSGSBASE, AVX, AVX2, ADCX, RDSEED,
+supersets BMI, BMI2, CLWB, F16C, FMA, FSGSBASE, AVX, AVX2, ADCX, RDSEED,
 MWAITX, SHA, CLZERO, AES, PCLMUL, CX16, MOVBE, MMX, SSE, SSE2, SSE3, SSE4A,
-SSSE3, SSE4.1, SSE4.2, ABM, XSAVEC, XSAVES, CLFLUSHOPT, POPCNT, and 64-bit
-instruction set extensions.)
-
+SSSE3, SSE4.1, SSE4.2, ABM, XSAVEC, XSAVES, CLFLUSHOPT, POPCNT, RDPID,
+WBNOINVD, and 64-bit instruction set extensions.)
 
 @item btver1
 CPUs based on AMD Family 14h cores with x86-64 instruction set support.  (This

Jakub



Re: [PATCH] Add OpenACC 2.6 `acc_get_property' support

2019-12-17 Thread Martin Jambor
Hi,

On Tue, Dec 17 2019, Thomas Schwinge wrote:
> On 2019-11-14T16:35:31+0100, Frederik Harwath wrote:
>> this patch implements OpenACC 2.6 "acc_get_property" and related functions.
>
> [...]
>
>> --- a/libgomp/plugin/plugin-hsa.c
>> +++ b/libgomp/plugin/plugin-hsa.c
>> @@ -699,6 +699,32 @@ GOMP_OFFLOAD_get_num_devices (void)
>>return hsa_context.agent_count;
>>  }
>>  
>> +/* Part of the libgomp plugin interface.  Return the value of property
>> +   PROP of agent number N.  */
>> +
>> +union gomp_device_property_value
>> +GOMP_OFFLOAD_get_property (int n, int prop)
>> +{
>> +  union gomp_device_property_value nullval = { .val = 0 };
>> +
>> +  if (!init_hsa_context ())
>> +    return nullval;
>
> I'm not familiar with that code, but similar to other plugins,
> 'init_hsa_context' already is called via 'GOMP_OFFLOAD_get_num_devices'
> (and 'GOMP_OFFLOAD_init_device', hmm...), so probably don't need to call
> it here?

I assume you always want to get the number of devices before querying
their properties but OTOH there is no harm in calling the initialization
function.

>> +
>> +  switch (prop)
>> +    {
>> +    case GOMP_DEVICE_PROPERTY_VENDOR:
>> +      return (union gomp_device_property_value) { .ptr = "AMD" };
>> +    default:
>> +      return nullval;
>> +    }
>> +}
>
> Not sure if "AMD" is actually correct here -- isn't HSA a
> vendor-independent standard?
>

Yes, it is supposed to be.  I think HSA is the correct "vendor" too,
essentially an abbreviation for HSA Foundation.
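
For illustration only, a minimal sketch of what returning "HSA" as the vendor could look like. The union and enum below are hypothetical stand-ins, not the libgomp declarations; only the member names .val and .ptr are taken from the quoted patch, and DEVICE_PROPERTY_NAME is invented so that the default branch has something to handle.

```c
#include <stddef.h>

/* Hypothetical stand-in for union gomp_device_property_value; only the
   member names .val and .ptr follow the quoted patch.  */
union device_property_value
{
  const char *ptr;
  size_t val;
};

/* Hypothetical property identifiers, assumed for this sketch.  */
enum device_property
{
  DEVICE_PROPERTY_VENDOR,
  DEVICE_PROPERTY_NAME
};

/* Return the value of property PROP.  Properties that are not handled
   yield a zero value, as in the quoted plugin code.  */
static union device_property_value
get_property (int prop)
{
  union device_property_value nullval = { .val = 0 };

  switch (prop)
    {
    case DEVICE_PROPERTY_VENDOR:
      /* Vendor-neutral string, as suggested in the reply above.  */
      return (union device_property_value) { .ptr = "HSA" };
    default:
      return nullval;
    }
}
```

A caller would compare get_property (DEVICE_PROPERTY_VENDOR).ptr against the expected string and treat a zero .val as "property not available".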

Thanks,

Martin


Re: [PATCH] Add OpenACC 2.6 `acc_get_property' support

2019-12-17 Thread Andrew Stubbs

On 16/12/2019 23:00, Thomas Schwinge wrote:

>> There is no AMD GCN support yet. This will be added later on.
>
> ACK, just to note that there now is a 'libgomp/plugin/plugin-gcn.c' that
> at least needs to get a stub implementation (can mostly copy from
> 'libgomp/plugin/plugin-hsa.c'?) as otherwise the build will fail.


The code exists on the OG9 branch. It was omitted from the trunk 
submission because the other half of the properties support wasn't there 
yet.


Andrew


Re: [PATCH, GCC/ARM, 2/10] Add command line support for Armv8.1-M Mainline

2019-12-17 Thread Kyrill Tkachov

Hi Mihail,

On 12/16/19 6:28 PM, Mihail Ionescu wrote:

Hi Kyrill

On 11/06/2019 03:59 PM, Kyrill Tkachov wrote:

Hi Mihail,

On 11/4/19 4:49 PM, Kyrill Tkachov wrote:

Hi Mihail,

On 10/23/19 10:26 AM, Mihail Ionescu wrote:
> [PATCH, GCC/ARM, 2/10] Add command line support
>
> Hi,
>
> === Context ===
>
> This patch is part of a patch series to add support for Armv8.1-M
> Mainline Security Extensions architecture. Its purpose is to add
> command-line support for that new architecture.
>
> === Patch description ===
>
> Besides the expected enabling of the new value for the -march
> command-line option (-march=armv8.1-m.main) and its extensions (see
> below), this patch disables support of the Security Extensions for this
> newly added architecture. This is done both by not including the cmse
> bit in the architecture description and by throwing an error message
> when the user requests Armv8.1-M Mainline Security Extensions. Note that
> Armv8-M Baseline and Mainline Security Extensions are still enabled.
>
> Only extensions for already supported instructions are implemented in
> this patch. Other extensions (MVE integer and float) will be added in
> separate patches. The following configurations are allowed for Armv8.1-M
> Mainline with regards to FPU and implemented in this patch:
> + no FPU (+nofp)
> + single precision VFPv5 with FP16 (+fp)
> + double precision VFPv5 with FP16 (+fp.dp)
>
> ChangeLog entries are as follows:
>
> *** gcc/ChangeLog ***
>
> 2019-10-23  Mihail-Calin Ionescu 
> 2019-10-23  Thomas Preud'homme 
>
>     * config/arm/arm-cpus.in (armv8_1m_main): New feature.
>     (ARMv4, ARMv4t, ARMv5t, ARMv5te, ARMv5tej, ARMv6, ARMv6j, ARMv6k,
>     ARMv6z, ARMv6kz, ARMv6zk, ARMv6t2, ARMv6m, ARMv7, ARMv7a, ARMv7ve,
>     ARMv7r, ARMv7m, ARMv7em, ARMv8a, ARMv8_1a, ARMv8_2a, ARMv8_3a,
>     ARMv8_4a, ARMv8_5a, ARMv8m_base, ARMv8m_main, ARMv8r): Reindent.
>     (ARMv8_1m_main): New feature group.
>     (armv8.1-m.main): New architecture.
>     * config/arm/arm-tables.opt: Regenerate.
>     * config/arm/arm.c (arm_arch8_1m_main): Define and default
>     initialize.
>     (arm_option_reconfigure_globals): Initialize arm_arch8_1m_main.
>     (arm_options_perform_arch_sanity_checks): Error out when targeting
>     Armv8.1-M Mainline Security Extensions.
>     * config/arm/arm.h (arm_arch8_1m_main): Declare.
>
> *** gcc/testsuite/ChangeLog ***
>
> 2019-10-23  Mihail-Calin Ionescu 
> 2019-10-23  Thomas Preud'homme 
>
>     * lib/target-supports.exp
>     (check_effective_target_arm_arch_v8_1m_main_ok): Define.
>     (add_options_for_arm_arch_v8_1m_main): Likewise.
>     (check_effective_target_arm_arch_v8_1m_main_multilib): Likewise.
>
> Testing: bootstrapped on arm-linux-gnueabihf and arm-none-eabi; testsuite
> shows no regression.
>
> Is this ok for trunk?
>
Ok.



Something that I remembered last night upon reflection...

New command-line options (or arguments to them) need documentation in
invoke.texi.

Please add some either as part of this patch or as a separate patch
if you prefer.



I've added the missing cli options in invoke.texi.

Here's the updated ChangeLog:

2019-12-06  Mihail-Calin Ionescu  
2019-12-16  Thomas Preud'homme  

* config/arm/arm-cpus.in (armv8_1m_main): New feature.
(ARMv4, ARMv4t, ARMv5t, ARMv5te, ARMv5tej, ARMv6, ARMv6j, ARMv6k,
ARMv6z, ARMv6kz, ARMv6zk, ARMv6t2, ARMv6m, ARMv7, ARMv7a, ARMv7ve,
ARMv7r, ARMv7m, ARMv7em, ARMv8a, ARMv8_1a, ARMv8_2a, ARMv8_3a,
ARMv8_4a, ARMv8_5a, ARMv8m_base, ARMv8m_main, ARMv8r): Reindent.
(ARMv8_1m_main): New feature group.
(armv8.1-m.main): New architecture.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm.c (arm_arch8_1m_main): Define and default initialize.
(arm_option_reconfigure_globals): Initialize arm_arch8_1m_main.
(arm_options_perform_arch_sanity_checks): Error out when targeting
Armv8.1-M Mainline Security Extensions.
* config/arm/arm.h (arm_arch8_1m_main): Declare.
* doc/invoke.texi: Document armv8.1-m.main.

*** gcc/testsuite/ChangeLog ***

2019-12-16  Mihail-Calin Ionescu  
2019-12-16  Thomas Preud'homme  

* lib/target-supports.exp
(check_effective_target_arm_arch_v8_1m_main_ok): Define.
(add_options_for_arm_arch_v8_1m_main): Likewise.
(check_effective_target_arm_arch_v8_1m_main_multilib): Likewise.



Thanks, this is ok.

Kyrill




Regards,
Mihail


Thanks,

Kyrill



Thanks,

Kyrill


> Best regards,
>
> Mihail
>
>
> ### Attachment also inlined for ease of reply
> ###
>
>
> diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
> index f8a3b3db67a537163bfe787d78c8f2edc4253ab3..652f2a4be9388fd7a74f0ec4615a292fd1cfcd36 100644
> --- a/gcc/config/arm/arm-cpus.in
> +++ b/gcc/config/arm/arm-cpus.in
> @@ -126,6 +126,9 @@ define feature armv8_5
>  # M-Profile security extensions.
>  define feature cmse
>
> +# Architecture rel 8.1-M.
> +define 

Re: [PATCH, GCC/ARM, 3/10] Save/restore FPCXTNS in nsentry functions

2019-12-17 Thread Kyrill Tkachov

Hi Mihail,

On 12/16/19 6:29 PM, Mihail Ionescu wrote:


Hi Kyrill,

On 11/06/2019 04:12 PM, Kyrill Tkachov wrote:

Hi Mihail,

On 10/23/19 10:26 AM, Mihail Ionescu wrote:

[PATCH, GCC/ARM, 3/10] Save/restore FPCXTNS in nsentry functions

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to enable
saving/restoring of nonsecure FP context in function with the
cmse_nonsecure_entry attribute.

=== Motivation ===

In Armv8-M Baseline and Mainline, the FP context is cleared on return from
nonsecure entry functions. This means the FP context might change when
calling a nonsecure entry function. This patch uses the new VLDR and
VSTR instructions available in Armv8.1-M Mainline to save/restore the FP
context when calling a nonsecure entry function from nonsecure code.

=== Patch description ===

This patch consists mainly of creating 2 new instruction patterns to
push and pop special FP registers via vldm and vstr and using them in
prologue and epilogue. The patterns are defined as push/pop with an
unspecified operation on the memory accessed, with an unspecified
constant indicating what special FP register is being saved/restored.

Other aspects of the patch include:
  * defining the set of special registers that can be saved/restored and
    their name
  * reserving space in the stack frames for these push/pop
  * preventing return via pop
  * guarding the clearing of FPSCR to target architecture not having
    Armv8.1-M Mainline instructions.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * config/arm/arm.c (fp_sysreg_names): Declare and define.
    (use_return_insn): Also return false for Armv8.1-M Mainline.
    (output_return_instruction): Skip FPSCR clearing if Armv8.1-M
    Mainline instructions are available.
    (arm_compute_frame_layout): Allocate space in frame for FPCXTNS
    when targeting Armv8.1-M Mainline Security Extensions.
    (arm_expand_prologue): Save FPCXTNS if this is an Armv8.1-M
    Mainline entry function.
    (cmse_nonsecure_entry_clear_before_return): Clear IP and r4 if
    targeting Armv8.1-M Mainline or successor.
    (arm_expand_epilogue): Fix indentation of caller-saved register
    clearing.  Restore FPCXTNS if this is an Armv8.1-M Mainline
    entry function.
    * config/arm/arm.h (TARGET_HAVE_FP_CMSE): New macro.
    (FP_SYSREGS): Likewise.
    (enum vfp_sysregs_encoding): Define enum.
    (fp_sysreg_names): Declare.
    * config/arm/unspecs.md (VUNSPEC_VSTR_VLDR): New volatile unspec.
    * config/arm/vfp.md (push_fpsysreg_insn): New define_insn.
    (pop_fpsysreg_insn): Likewise.

*** gcc/testsuite/Changelog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * gcc.target/arm/cmse/bitfield-1.c: add checks for VSTR and VLDR.
    * gcc.target/arm/cmse/bitfield-2.c: Likewise.
    * gcc.target/arm/cmse/bitfield-3.c: Likewise.
    * gcc.target/arm/cmse/cmse-1.c: Likewise.
    * gcc.target/arm/cmse/struct-1.c: Likewise.
    * gcc.target/arm/cmse/cmse.exp: Run existing Armv8-M Mainline tests
    from mainline/8m subdirectory and new Armv8.1-M Mainline tests from
    mainline/8_1m subdirectory.
    * gcc.target/arm/cmse/mainline/bitfield-4.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-4.c: This.
    * gcc.target/arm/cmse/mainline/bitfield-5.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-5.c: This.
    * gcc.target/arm/cmse/mainline/bitfield-6.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-6.c: This.
    * gcc.target/arm/cmse/mainline/bitfield-7.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-7.c: This.
    * gcc.target/arm/cmse/mainline/bitfield-8.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-8.c: This.
    * gcc.target/arm/cmse/mainline/bitfield-9.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-9.c: This.
    * gcc.target/arm/cmse/mainline/bitfield-and-union-1.c: Move and rename
    into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-and-union.c: This.
    * gcc.target/arm/cmse/mainline/hard-sp/cmse-13.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/hard-sp/cmse-13.c: This.  Clean up
    dg-skip-if directive for float ABI.
    * gcc.target/arm/cmse/mainline/hard-sp/cmse-5.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/hard-sp/cmse-5.c: This.  Clean up
    dg-skip-if directive for float ABI.
    * gcc.target/arm/cmse/mainline/hard-sp/cmse-7.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/hard-sp/cmse-7.c: This.  Clean up
    dg-skip-if directive for float ABI.
    * gcc.target/arm/cmse/mainline/hard-sp/cmse-8.c: Move into ...
    * 

Re: [PATCH] Some x86 AMD -march= docs fixes + formatting fixes (PR target/92962)

2019-12-17 Thread Uros Bizjak
On Tue, Dec 17, 2019 at 10:09 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The bug report complained just about missing RDPID and WBNOINVD in znver2
> description and double comma before CLWB, but reading the docs I found
> various other nits and when trying to compare it with what the compiler
> actually does, I found ugly formatting there too.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Note, it might be worth to introduce PTA_BTVER1 etc. in i386.h similarly
> how it is done for Intel CPUs and set new CPU PTA_* masks incrementally from
> that rather than then always including the whole set, could do that
> incrementally if desired.
>
> 2019-12-17  Jakub Jelinek  
>
> PR target/92962
> * common/config/i386/i386-common.c (processor_alias_table): Formatting
> fixes.
> * doc/invoke.texi (bdver3, bdver4, znver1): Add missing closing paren.
> (znver2): Likewise.  Add RDPID and WBNOINVD, remove spurious comma
> before CLWB.

OK.

Thanks,
Uros.

> --- gcc/common/config/i386/i386-common.c.jj 2019-12-09 15:02:30.131287575 +0100
> +++ gcc/common/config/i386/i386-common.c 2019-12-16 22:26:44.477558339 +0100
> @@ -1617,7 +1617,7 @@ const pta processor_alias_table[] =
>{"pentium-m", PROCESSOR_PENTIUMPRO, CPU_PENTIUMPRO,
>  PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_FXSR},
>{"pentium4", PROCESSOR_PENTIUM4, CPU_NONE,
> -PTA_MMX |PTA_SSE | PTA_SSE2 | PTA_FXSR},
> +PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_FXSR},
>{"pentium4m", PROCESSOR_PENTIUM4, CPU_NONE,
>  PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_FXSR},
>{"prescott", PROCESSOR_NOCONA, CPU_NONE,
> @@ -1775,12 +1775,12 @@ const pta processor_alias_table[] =
>| PTA_SHA | PTA_LZCNT | PTA_POPCNT | PTA_CLWB | PTA_RDPID
>| PTA_WBNOINVD},
>{"btver1", PROCESSOR_BTVER1, CPU_GENERIC,
> -PTA_64BIT | PTA_MMX |  PTA_SSE  | PTA_SSE2 | PTA_SSE3
> -  | PTA_SSSE3 | PTA_SSE4A |PTA_ABM | PTA_CX16 | PTA_PRFCHW
> +PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
> +  | PTA_SSSE3 | PTA_SSE4A | PTA_ABM | PTA_CX16 | PTA_PRFCHW
>| PTA_FXSR | PTA_XSAVE},
>{"btver2", PROCESSOR_BTVER2, CPU_BTVER2,
> -PTA_64BIT | PTA_MMX |  PTA_SSE  | PTA_SSE2 | PTA_SSE3
> -  | PTA_SSSE3 | PTA_SSE4A |PTA_ABM | PTA_CX16 | PTA_SSE4_1
> +PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
> +  | PTA_SSSE3 | PTA_SSE4A | PTA_ABM | PTA_CX16 | PTA_SSE4_1
>| PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX
>| PTA_BMI | PTA_F16C | PTA_MOVBE | PTA_PRFCHW
>| PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT},
> --- gcc/doc/invoke.texi.jj  2019-12-13 11:36:01.140666243 +0100
> +++ gcc/doc/invoke.texi 2019-12-16 22:26:50.434467619 +0100
> @@ -27767,35 +27767,38 @@ instruction set extensions.)
>  CPUs based on AMD Family 15h cores with x86-64 instruction set support.  (This
>  supersets FMA4, AVX, XOP, LWP, AES, PCLMUL, CX16, MMX, SSE, SSE2, SSE3, SSE4A,
>  SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set extensions.)
> +
>  @item bdver2
>  AMD Family 15h core based CPUs with x86-64 instruction set support.  (This
>  supersets BMI, TBM, F16C, FMA, FMA4, AVX, XOP, LWP, AES, PCLMUL, CX16, MMX,
>  SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set
>  extensions.)
> +
>  @item bdver3
>  AMD Family 15h core based CPUs with x86-64 instruction set support.  (This
>  supersets BMI, TBM, F16C, FMA, FMA4, FSGSBASE, AVX, XOP, LWP, AES,
>  PCLMUL, CX16, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and
> -64-bit instruction set extensions.
> +64-bit instruction set extensions.)
> +
>  @item bdver4
>  AMD Family 15h core based CPUs with x86-64 instruction set support.  (This
>  supersets BMI, BMI2, TBM, F16C, FMA, FMA4, FSGSBASE, AVX, AVX2, XOP, LWP,
>  AES, PCLMUL, CX16, MOVBE, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1,
> -SSE4.2, ABM and 64-bit instruction set extensions.
> +SSE4.2, ABM and 64-bit instruction set extensions.)
>
>  @item znver1
>  AMD Family 17h core based CPUs with x86-64 instruction set support.  (This
>  supersets BMI, BMI2, F16C, FMA, FSGSBASE, AVX, AVX2, ADCX, RDSEED, MWAITX,
>  SHA, CLZERO, AES, PCLMUL, CX16, MOVBE, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3,
>  SSE4.1, SSE4.2, ABM, XSAVEC, XSAVES, CLFLUSHOPT, POPCNT, and 64-bit
> -instruction set extensions.
> +instruction set extensions.)
> +
>  @item znver2
>  AMD Family 17h core based CPUs with x86-64 instruction set support. (This
> -supersets BMI, BMI2, ,CLWB, F16C, FMA, FSGSBASE, AVX, AVX2, ADCX, RDSEED,
> +supersets BMI, BMI2, CLWB, F16C, FMA, FSGSBASE, AVX, AVX2, ADCX, RDSEED,
>  MWAITX, SHA, CLZERO, AES, PCLMUL, CX16, MOVBE, MMX, SSE, SSE2, SSE3, SSE4A,
> -SSSE3, SSE4.1, SSE4.2, ABM, XSAVEC, XSAVES, CLFLUSHOPT, POPCNT, and 64-bit
> -instruction set extensions.)
> -
> +SSSE3, SSE4.1, SSE4.2, ABM, XSAVEC, XSAVES, CLFLUSHOPT, POPCNT, RDPID,
> +WBNOINVD, and 64-bit instruction set extensions.)
>
>  @item btver1
>  CPUs based on AMD Family 14h 

[PATCH 1/1] Work around array out of bounds warning in mkdeps

2019-12-17 Thread Andreas Krebbel
This suppresses an array out of bounds warning in mkdeps.c as proposed
by Martin Sebor in the bugzilla.

array subscript 2 is outside array bounds of ‘const char [2]’

Since this warning does occur during bootstrap it currently breaks
werror builds on IBM Z.

The problem can be reproduced also on x86_64 by changing the inlining
threshold using: --param max-inline-insns-auto=80

Bootstrapped and regression tested on x86_64 and IBM Z.

Ok for mainline?

libcpp/ChangeLog:

2019-12-17  Andreas Krebbel  

PR tree-optimization/92176
* mkdeps.c (deps_add_default_target):
---
 libcpp/mkdeps.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libcpp/mkdeps.c b/libcpp/mkdeps.c
index 147aa909be7..d1001e30e19 100644
--- a/libcpp/mkdeps.c
+++ b/libcpp/mkdeps.c
@@ -268,7 +268,7 @@ deps_add_default_target (class mkdeps *d, const char *tgt)
     return;
 
   if (tgt[0] == '\0')
-    deps_add_target (d, "-", 1);
+    d->targets.push (xstrdup ("-"));
   else
     {
 #ifndef TARGET_OBJECT_SUFFIX
-- 
2.23.0



Re: [PATCH] Optimize stack_protect_set_1_<mode> followed by a move to the same register (PR target/92841)

2019-12-17 Thread Uros Bizjak
On Tue, Dec 10, 2019 at 10:57 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The stack_protect_set_1_<mode> pattern intentionally clears the register it
> used as a temporary to read the canary from the register and push it back
> on the stack for security reasons, to make sure the stack canary isn't
> spilled somewhere else etc.  On the following testcase, we end up with a
> weird:
>         movq    %fs:40, %rax
>         movq    %rax, 24(%rsp)
>         xorl    %eax, %eax
>         movl    $30, %eax
> sequence though, where the reporter rightfully complains it is a waste
> to clear the register and immediately set it to something else.
>
> We really don't want to split this into two patterns, because then the
> scheduler and whatever other post-RA passes could stick some further
> code in between and increase the lifetime of the security sensitive
> data in the register.
>
> So, the following patch uses peephole2 to merge the
> stack_protect_set_1_<mode> insn with following setter of the same register.
> Only SImode and for TARGET_64BIT DImode moves are considered, as QI/HImode
> (unless actually emitted as SImode) moves don't overwrite the whole
> register, and for simplicity only the most common cases (no XMM/MM etc.
> sources).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, additionally tested
> (just check-gcc check-c++-all check-target-libstdc++-v3 with
> --target_board=unix/-fstack-protector-strong).  The -fstack-protector-strong
> testing was done with additional logging whenever the peephole2's kick in.
> During the -m64 testing, it kicked in 17682 times, during -m32 testing
> 6344 times.  Ok for trunk?
>
> 2019-12-10  Jakub Jelinek  
>
> PR target/92841
> * config/i386/i386.md (*stack_protect_set_2_<mode>,
> *stack_protect_set_3): New define_insns and corresponding
> define_peephole2s.
>
> * gcc.target/i386/pr92841.c: New test.

OK with a couple of changes below.

Thanks,
Uros.

> --- gcc/config/i386/i386.md.jj  2019-12-05 10:04:05.35723 +0100
> +++ gcc/config/i386/i386.md 2019-12-09 19:29:31.578885594 +0100
> @@ -19732,6 +19732,98 @@ (define_insn "@stack_protect_set_1_"mov{}\t{%1, %2|%2, %1}\;mov{}\t{%2, %0|%0, 
> %2}\;xor{l}\t%k2, %k2"
>[(set_attr "type" "multi")])
>
> +;; Patterns and peephole2s to optimize stack_protect_set_1_<mode>
> +;; immediately followed by *mov{s,d}i_internal to the same register,
> +;; where we can avoid the xor{l} above.  We don't split this, so that
> +;; scheduling or anything else doesn't separate the *stack_protect_set*
> +;; pattern from the set of the register that overwrites the register
> +;; with a new value.
> +(define_insn "*stack_protect_set_2_<mode>"
> +  [(set (match_operand:PTR 0 "memory_operand" "=m")
> +   (unspec:PTR [(match_operand:PTR 3 "memory_operand" "m")]
> +   UNSPEC_SP_SET))
> +   (set (match_operand:SI 1 "register_operand" "=")
> +   (match_operand:SI 2 "general_operand" "g"))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "reload_completed
> +   && !reg_overlap_mentioned_p (operands[1], operands[2])"
> +{
> +  output_asm_insn ("mov{<imodesuffix>}\t{%3, %1|%1, %3}", operands);
> +  output_asm_insn ("mov{<imodesuffix>}\t{%1, %0|%0, %1}", operands);

This looks much better than the current asm template in
stack_protect_set_1_<mode> and stack_protect_test_1_<mode>. Can you
maybe also change templates there?

> +  if (pic_32bit_operand (operands[2], SImode)
> +  || ix86_use_lea_for_mov (insn, operands + 1))
> +return "lea{l}\t{%E2, %1|%1, %E2}";
> +  else
> +return "mov{l}\t{%2, %1|%1, %2}";
> +}
> +  [(set_attr "type" "multi")
> +   (set_attr "length" "24")])
> +
> +(define_peephole2
> + [(parallel [(set (match_operand:PTR 0 "memory_operand")
> + (unspec:PTR [(match_operand:PTR 1 "memory_operand")]
> + UNSPEC_SP_SET))
> +(set (match_operand:PTR 2 "general_reg_operand") (const_int 0))
> +(clobber (reg:CC FLAGS_REG))])
> +  (set (match_operand:SI 3 "general_reg_operand")
> +   (match_operand:SI 4 "general_operand"))]
> + "reload_completed
> +  && REGNO (operands[2]) == REGNO (operands[3])
> +  && !reg_overlap_mentioned_p (operands[3], operands[4])
> +  && (general_reg_operand (operands[4], SImode)
> +  || !register_operand (operands[4], SImode))"
> + [(parallel [(set (match_dup 0)
> + (unspec:PTR [(match_dup 1)] UNSPEC_SP_SET))
> +(set (match_dup 3) (match_dup 4))
> +(clobber (reg:CC FLAGS_REG))])])

No need for "reload_completed" in peephole2 patterns. Also (IIRC),
operand predicate is not needed for operand 4 when it is mentioned in
the insn condition.

> +(define_insn "*stack_protect_set_3"
> +  [(set (match_operand:DI 0 "memory_operand" "=m,m,m")
> +   (unspec:DI [(match_operand:DI 3 "memory_operand" "m,m,m")]
> +  UNSPEC_SP_SET))
> +   (set (match_operand:DI 1 "register_operand" "=,r,r")
> +   (match_operand:DI 2 "general_operand" "Z,rem,i"))
> +   (clobber (reg:CC FLAGS_REG))]
> +  

[PATCH] Handle aggregate pass-through for self-recursive call (PR ipa/92794)

2019-12-17 Thread Feng Xue OS
If the argument for a self-recursive call is a simple pass-through, the call
edge is also considered a source of any value originating from a
non-recursive call to the function. Scalar pass-through and full aggregate
pass-through due to pointer pass-through are already handled.
But we missed another kind of pass-through, partial aggregate pass-through,
as in the case below. This patch fixes that problem, which caused the ICE
in PR 92794.

  void foo(struct T *val_ptr)
  {
    struct T new_val;
    new_val.field = val_ptr->field;
    foo (&new_val);
    ...
  }
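
A self-contained variant of the snippet above, offered only as a sketch: the layout of struct T, the extra member "other", the depth parameter (added so that the recursion terminates) and the caller call_foo are all assumptions made for illustration and are not the pr92794.c testcase from the patch.

```c
/* Hypothetical layout; the snippet above only shows a member named
   "field".  */
struct T
{
  int field;
  int other;
};

/* DEPTH is an assumption added so that the recursion terminates.  */
static int
foo (struct T *val_ptr, int depth)
{
  struct T new_val;

  /* Partial aggregate pass-through: only "field" is copied unchanged
     from the incoming aggregate into the one passed on recursively.  */
  new_val.field = val_ptr->field;
  new_val.other = depth;

  if (depth == 0)
    return new_val.field;
  return foo (&new_val, depth - 1);
}

/* A non-recursive call site.  It is the only place where a concrete
   value for "field" enters foo.  */
static int
call_foo (void)
{
  struct T t = { 7, 0 };
  return foo (&t, 3);
}
```

In this sketch the recursive edge never changes "field", so the value seen inside foo is whatever the non-recursive caller supplied; that is the situation the placeholder value in the patch is meant to describe.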

Bootstrapped/regtested on x86_64-linux and aarch64-linux.

2019-12-17  Feng Xue  

PR ipa/92794
* ipa-cp.c (self_recursive_agg_pass_through_p): New function.
(intersect_with_plats): Use error_mark_node as place holder
when aggregate jump function is simple pass-through for
self-recursive call.
(intersect_with_agg_replacements): Likewise.
(intersect_aggregates_with_edge): Likewise.
(find_aggregate_values_for_callers_subset): Likewise.

Thanks,
Feng

From 42ba553ebf80eadb62619c5570a4b406f8c90c49 Mon Sep 17 00:00:00 2001
From: Feng Xue 
Date: Mon, 16 Dec 2019 20:33:36 +0800
Subject: [PATCH] Handle aggregate simple pass-through for self-recursive call

---
 gcc/ipa-cp.c   | 97 +-
 gcc/testsuite/gcc.dg/ipa/pr92794.c | 30 +
 2 files changed, 111 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr92794.c

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 1a80ccbde2d..0e17fedd649 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -4564,6 +4564,23 @@ self_recursive_pass_through_p (cgraph_edge *cs, ipa_jump_func *jfunc, int i)
   return false;
 }
 
+/* Return true, if JFUNC, which describes a part of an aggregate represented
+   or pointed to by the i-th parameter of call CS, is a simple no-operation
+   pass-through function to itself.  */
+
+static bool
+self_recursive_agg_pass_through_p (cgraph_edge *cs, ipa_agg_jf_item *jfunc,
+   int i)
+{
+  if (cs->caller == cs->callee->function_symbol ()
+  && jfunc->jftype == IPA_JF_LOAD_AGG
+  && jfunc->offset == jfunc->value.load_agg.offset
+  && jfunc->value.pass_through.operation == NOP_EXPR
+  && jfunc->value.pass_through.formal_id == i)
+return true;
+  return false;
+}
+
 /* Given a NODE, and a subset of its CALLERS, try to populate blanks slots in
KNOWN_CSTS with constants that are also known for all of the CALLERS.  */
 
@@ -4756,10 +4773,19 @@ intersect_with_plats (class ipcp_param_lattices *plats,
 	  if (aglat->offset - offset == item->offset)
 	{
 	  gcc_checking_assert (item->value);
-	  if (aglat->is_single_const ()
-		  && values_equal_for_ipcp_p (item->value,
-	  aglat->values->value))
-		found = true;
+	  if (aglat->is_single_const ())
+		{
+		  tree value = aglat->values->value;
+
+		  if (values_equal_for_ipcp_p (item->value, value))
+		found = true;
+		  else if (item->value == error_mark_node)
+		{
+		  /* Replace unknown place holder value with real one.  */
+		  item->value = value;
+		  found = true;
+		}
+		}
 	  break;
 	}
 	  aglat = aglat->next;
@@ -4827,6 +4853,12 @@ intersect_with_agg_replacements (struct cgraph_node *node, int index,
 	{
 	  if (values_equal_for_ipcp_p (item->value, av->value))
 		found = true;
+	  else if (item->value == error_mark_node)
+		{
+		  /* Replace place holder value with real one.  */
+		  item->value = av->value;
+		  found = true;
+		}
 	  break;
 	}
 	}
@@ -4931,17 +4963,31 @@ intersect_aggregates_with_edge (struct cgraph_edge *cs, int index,
 	for (unsigned i = 0; i < jfunc->agg.items->length (); i++)
 	  {
 	struct ipa_agg_jf_item *agg_item = &(*jfunc->agg.items)[i];
-	tree value = ipa_agg_value_from_node (caller_info, cs->caller,
-		  agg_item);
-	if (value)
+	struct ipa_agg_value agg_value;
+
+	if (self_recursive_agg_pass_through_p (cs, agg_item, index))
+	  {
+		/* For a self-recursive call, if aggregate jump function is a
+		   simple pass-through, the exact value it stands for is not
+		   known, which must comes from other call sites.  But we
+		   still need to add a place holder in value sets to indicate
+		   it, here we use error_mark_node to represent the special
+		   unknown value, which will be replaced with real one during
+		   later intersecting operations.  */
+		agg_value.value = error_mark_node;
+	  }
+	else
 	  {
-		struct ipa_agg_value agg_value;
+		tree value = ipa_agg_value_from_node (caller_info, cs->caller,
+		  agg_item);
+		if (!value)
+		  continue;
 
-		agg_value.offset = agg_item->offset;
 		agg_value.value = value;
-
-		inter.safe_push (agg_value);
 	  }
+
+	agg_value.offset = agg_item->offset;
+	inter.safe_push (agg_value);
 	  }
   else
 	FOR_EACH_VEC_ELT (inter, k, item)
@@ -4960,11 +5006,27 @@ intersect_aggregates_with_edge 

Re: [PATCH 2/2] [ARM] Add support for -mpure-code in thumb-1 (v6m)

2019-12-17 Thread Kyrill Tkachov



On 12/17/19 2:33 PM, Christophe Lyon wrote:

On Tue, 17 Dec 2019 at 11:34, Kyrill Tkachov wrote:

Hi Christophe,

On 11/18/19 9:00 AM, Christophe Lyon wrote:

On Wed, 13 Nov 2019 at 15:46, Christophe Lyon wrote:

On Tue, 12 Nov 2019 at 12:13, Richard Earnshaw (lists) wrote:

On 18/10/2019 14:18, Christophe Lyon wrote:

+  bool not_supported = arm_arch_notm || flag_pic || TARGET_NEON;

This is a poor name in the context of the function as a whole.  What's
not supported?  Please think of a better name so that I have some idea
what the intention is.

That's to keep most of the code common when checking if -mpure-code
and -mslow-flash-data are supported.
These 3 cases are common to the two compilation flags, and
-mslow-flash-data still needs to check TARGET_HAVE_MOVT in addition.

Would "common_unsupported_modes" work better for you?
Or I can duplicate the "arm_arch_notm || flag_pic || TARGET_NEON" in
the two tests.


Hi,

Here is an updated version, using "common_unsupported_modes" instead
of "not_supported", and fixing the typo reported by Kyrill.
The ChangeLog is still the same.

OK?


The name looks ok to me. Richard had a concern about Armv8-M Baseline,
but I do see it being supported as you pointed out.

So I believe all the concerns are addressed.

OK, thanks!


Thus the code is ok. However, please also update the documentation for
-mpure-code in invoke.texi (it currently states that a MOVT instruction
is needed).


I didn't think about this :(
It currently says: "This option is only available when generating
non-pic code for M-profile targets with the MOVT instruction."

I suggest to remove the "with the MOVT instruction" part. Is that OK
if I commit my patch and this doc change?


Yes, I think that is the simplest correct change to make.

Thanks,

Kyrill



Christophe


Thanks,

Kyrill




Thanks,

Christophe


Thanks,

Christophe


R.


Re: [GCC][testsuite][ARM][AArch64] Add ARM v8.6 effective target checks to target-supports.exp

2019-12-17 Thread Stam Markianos-Wright


On 12/13/19 11:15 AM, Richard Sandiford wrote:
> Stam Markianos-Wright  writes:
>> Hi all,
>>
>> This small patch adds support for the ARM v8.6 extensions +bf16 and
>> +i8mm to the testsuite. This will be tested through other upcoming
>> patches, which is why we are not providing any explicit tests here.
>>
>> Ok for trunk?
>>
>> Also I don't have commit rights, so if someone could commit on my
>> behalf, that would be great :)
>>
>> The functionality here depends on CLI patches:
>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02415.html
>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html
>>
>> but this patch applies cleanly without them, too.
>>
>> Cheers,
>> Stam
>>
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-12-11  Stam Markianos-Wright  
>>
>>  * lib/target-supports.exp
>>  (check_effective_target_arm_v8_2a_i8mm_ok_nocache): New.
>>  (check_effective_target_arm_v8_2a_i8mm_ok): New.
>>  (add_options_for_arm_v8_2a_i8mm): New.
>>  (check_effective_target_arm_v8_2a_bf16_neon_ok_nocache): New.
>>  (check_effective_target_arm_v8_2a_bf16_neon_ok): New.
>>  (add_options_for_arm_v8_2a_bf16_neon): New.
> 
> The new effective-target keywords need to be documented in
> doc/sourcebuild.texi.

Added in new diff :)

> 
> LGTM otherwise.  For:
> 
>> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
>> index 5b4cc02f921..36fb63e9929 100644
>> --- a/gcc/testsuite/lib/target-supports.exp
>> +++ b/gcc/testsuite/lib/target-supports.exp
>> @@ -4781,6 +4781,49 @@ proc add_options_for_arm_v8_2a_dotprod_neon { flags } {
>>   return "$flags $et_arm_v8_2a_dotprod_neon_flags"
>>   }
>>   
>> +# Return 1 if the target supports ARMv8.2+i8mm Adv.SIMD Dot Product
>> +# instructions, 0 otherwise.  The test is valid for ARM and for AArch64.
>> +# Record the command line options needed.
>> +
>> +proc check_effective_target_arm_v8_2a_i8mm_ok_nocache { } {
>> +global et_arm_v8_2a_i8mm_flags
>> +set et_arm_v8_2a_i8mm_flags ""
>> +
>> +if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
>> +return 0;
>> +}
>> +
>> +# Iterate through sets of options to find the compiler flags that
>> +# need to be added to the -march option.
>> +foreach flags {"" "-mfloat-abi=hard -mfpu=neon-fp-armv8" "-mfloat-abi=softfp -mfpu=neon-fp-armv8" } {
>> +if { [check_no_compiler_messages_nocache \
>> +  arm_v8_2a_i8mm_ok object {
>> +#include 
>> +#if !defined (__ARM_FEATURE_MATMUL_INT8)
>> +#error "__ARM_FEATURE_MATMUL_INT8 not defined"
>> +#endif
>> +} "$flags -march=armv8.2-a+i8mm"] } {
>> +set et_arm_v8_2a_i8mm_flags "$flags -march=armv8.2-a+i8mm"
>> +return 1
>> +}
>> +}
> 
> I wondered whether it would be better to add no options if testing
> with something that already supports i8mm (e.g. -march=armv8.6).
> That would mean trying:
> 
>"" "-march=armv8.2-a+i8mm" "-march=armv8.2-a+i8mm -mfloat-abi..." ...
> 
> instead.  But there are arguments both ways, and the above follows
> existing style, so OK.

Not quite sure if I'm understanding this right, but I think that's what 
the "" option in foreach flags{} is for?

i.e. currently what I'm seeing is:

+/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */
+/* { dg-add-options arm_v8_2a_i8mm }  */

will pull through the first option that compiles to object file with no 
errors (check_no_compiler_messages_nocache arm_v8_2a_i8mm_ok object).

So in a lot of cases it should just be fine for "" and only pull in 
-march=armv8.2-a+i8mm.

I think that's right? Lmk if I'm not reading it properly!
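
To make the "first option set that compiles wins" behaviour concrete, here is a minimal sketch in C rather than Tcl.  It is not the testsuite code: first_working_flags and the two probe functions are hypothetical names, and the probes only stand in for what check_no_compiler_messages_nocache would report on a given target.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the compile probe: returns nonzero if a
   test compile with FLAGS (plus the -march option) would succeed.  */
typedef int (*probe_fn) (const char *flags);

/* Return the first candidate for which PROBE succeeds, or NULL if
   none does.  Candidates are tried strictly in order, so an empty
   string listed first wins whenever no extra options are needed.  */
const char *
first_working_flags (const char *const *candidates, size_t n, probe_fn probe)
{
  assert (probe != NULL);
  for (size_t i = 0; i < n; i++)
    if (probe (candidates[i]))
      return candidates[i];
  return NULL;
}

/* Models a target where the bare -march option is already enough.  */
int
probe_always (const char *flags)
{
  (void) flags;
  return 1;
}

/* Models a target that only accepts the hard-float option set.  */
int
probe_needs_hard_float (const char *flags)
{
  return strstr (flags, "-mfloat-abi=hard") != NULL;
}
```

With probe_always the empty string is selected, so only the -march option ends up being added; with probe_needs_hard_float the second candidate is selected instead.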

Cheers,
Stam
> 
> Thanks,
> Richard
> 
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 85573a49a2b..73408d12cbe 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1877,6 +1877,18 @@ ARM target supports extensions to generate the @code{VFMAL} and @code{VFMLS}
 half-precision floating-point instructions available from ARMv8.2-A and
 onwards.  Some multilibs may be incompatible with these options.
 
+@item arm_v8_2a_bf16_neon_ok
+@anchor{arm_v8_2a_bf16_neon_ok}
+ARM target supports options to generate instructions from ARMv8.2-A with
+the BFloat16 extension (bf16). Some multilibs may be incompatible with these
+options.
+
+@item arm_v8_2a_i8mm_ok
+@anchor{arm_v8_2a_i8mm_ok}
+ARM target supports options to generate instructions from ARMv8.2-A with
+the 8-Bit Integer Matrix Multiply extension (i8mm). Some multilibs may be
+incompatible with these options.
+
 @item arm_prefer_ldrd_strd
 ARM target prefers @code{LDRD} and @code{STRD} instructions over
 @code{LDM} and @code{STM} instructions.
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 80e9d6720bd..5a8ec5dda1f 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4781,6 +4781,49 @@ proc 

Re: Fix partitioning ICE with external comdats

2019-12-17 Thread Andreas Schwab
On Dez 17 2019, Jan Hubicka wrote:

> Index: symtab.c
> ===
> --- symtab.c  (revision 279178)
> +++ symtab.c  (working copy)
> @@ -1952,6 +1952,11 @@ symtab_node::get_partitioning_class (voi
>if (DECL_EXTERNAL (decl))
>  return SYMBOL_EXTERNAL;
>  
> +  /* Even static aliases of external functions as external.  Those can happen

s/as/are/

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


In 'libgomp/target.c', 'struct splay_tree_key_s', use 'struct splay_tree_aux' for infrequently-used or API-specific data (was: [PATCH] OpenACC 2.6 manual deep copy support (attach/detach))

2019-12-17 Thread Thomas Schwinge
Hi Jakub!

On 2019-11-06T18:43:39+, Julian Brown  wrote:
> In particular, [a new big patch] incorporates the idea [...]
> relating to adding the new "attach_count" field to the memory-mapping
> splay tree key type without growing that structure in the common case
> (i.e. when structure components are not being mapped, or for OpenMP).
> In short, a new auxiliary structure is added containing the previous
> "link_key" and "attach_count" fields: so, you can either have both
> pointers (though of course one of them may be NULL), or in the common
> case no aux pointer at all, so no growth in the base struct size.

That was in response to:

> On Fri, 18 Oct 2019 18:47:08 +0200
> Thomas Schwinge  wrote:
>> While reviewing
>> 
>> "OpenACC reference count overhaul", I've just now stumbled over one
>> thing that originally was designed here:
>> 
>> On 2018-12-10T19:41:37+, Julian Brown 
>> wrote:
>> > On Fri, 7 Dec 2018 14:50:19 +0100
>> > Jakub Jelinek  wrote:
>> >  
>> >> On Fri, Nov 30, 2018 at 03:41:09AM -0800, Julian Brown wrote:  
>> >> > @@ -918,8 +920,13 @@ struct splay_tree_key_s {
>> >> >uintptr_t tgt_offset;
>> >> >/* Reference count.  */
>> >> >uintptr_t refcount;
>> >> > -  /* Dynamic reference count.  */
>> >> > -  uintptr_t dynamic_refcount;
>> >> > +  /* Reference counts beyond those that represent genuine references 
>> >> > in the
>> >> > + linked splay tree key/target memory structures, e.g. for multiple 
>> >> > OpenACC
>> >> > + "present increment" operations (via "acc enter data") refering to 
>> >> > the same
>> >> > + host-memory block.  */
>> >> > +  uintptr_t virtual_refcount;
>> >> > +  /* For a block with attached pointers, the attachment counters for 
>> >> > each.  */
>> >> > +  unsigned short *attach_count;
>> >> >/* Pointer to the original mapping of "omp declare target link" 
>> >> > object.  */
>> >> >splay_tree_key link_key;
>> >> >  };
>> >> 
>> >> This is something I'm worried about a lot, the nodes keep growing
>> >> way too much.  
>> 
>> Is that just a would-be-nice-to-avoid, or is it an actual problem?
>> 
>> If the latter, can we maybe move some data into on-the-side data
>> structures, say an associative array keyed by [something suitable]?  I
>> would assume that compared to actual host to/from device data movement
>> (or even lookup etc.), lookup of values from such an associative array
>> should be relatively cheap?
>
> I'd be extremely wary of adding a completely separate off-the-side
> structure to keep track of attachment counters: the reference-counting
> behaviour is already complicated enough, and the risk of messing things
> up with another indirectly-linked structure to keep track of is too
> high (never mind the extra runtime overhead).

(Well, ACK; it was just an idea to think in a different direction maybe.)

> With the approach in this
> patch, at least the extra info for link_key/attach_count is directly
> accessible from the splay tree key struct via pointer indirection.
>
> This version entails slight additional overhead (another malloc'd
> block and another pointer indirection) for the link_key field (and also
> for the attach_count pointer). I've not benchmarked memory use or
> performance though, so I'm not sure how much impact this has on real
> code.

I extracted the changes related to that from Julian's big patch, see
attached "In 'libgomp/target.c', 'struct splay_tree_key_s', use 'struct
splay_tree_aux' for infrequently-used or API-specific data ".  Is this OK
to commit?  If approving this patch, please respond with "Reviewed-by:
NAME " so that your effort will be recorded in the commit log, see
.


As part of the OpenACC manual deep copy changes, we'll then later apply:

 struct splay_tree_aux {
   /* Pointer to the original mapping of "omp declare target link" object.  */
   splay_tree_key link_key;
+  /* For a block with attached pointers, the attachment counters for each.
+ Only used for OpenACC.  */
+  uintptr_t *attach_count;
 };
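
For readers who want the shape of the idea without reading the patch, here is a minimal sketch of the "optional aux pointer" pattern.  It is not the libgomp code: struct key, struct key_aux and the helper functions are hypothetical names, and only the link_key/attach_count fields are taken from the discussion above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct key;

/* Infrequently-used data lives out of line (hypothetical layout).  */
struct key_aux {
  struct key *link_key;      /* may be NULL */
  uintptr_t *attach_count;   /* may be NULL */
};

/* Frequently-used data stays in the base struct, which grows by one
   pointer only.  */
struct key {
  uintptr_t host_start;
  uintptr_t host_end;
  uintptr_t refcount;
  struct key_aux *aux;       /* NULL in the common case */
};

/* Return K's aux block, allocating it on first use.  Returns NULL if
   the allocation fails.  */
struct key_aux *
key_get_aux (struct key *k)
{
  assert (k != NULL);
  if (k->aux == NULL)
    {
      k->aux = malloc (sizeof *k->aux);
      if (k->aux != NULL)
        {
          k->aux->link_key = NULL;
          k->aux->attach_count = NULL;
        }
    }
  return k->aux;
}

/* Read link_key without forcing an allocation.  */
struct key *
key_link (const struct key *k)
{
  return k->aux != NULL ? k->aux->link_key : NULL;
}

/* Free the aux block and anything it owns.  */
void
key_release_aux (struct key *k)
{
  if (k->aux != NULL)
    {
      free (k->aux->attach_count);
      free (k->aux);
      k->aux = NULL;
    }
}
```

The cost is the one Julian mentions: an extra malloc'd block and one more pointer indirection, paid only by keys that actually need link_key or attach_count.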


Grüße
 Thomas


From 6f81ae8189c5a53d9ab414363bfefd249b78e7c1 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 17 Dec 2019 16:11:40 +0100
Subject: [PATCH] In 'libgomp/target.c', 'struct splay_tree_key_s', use 'struct
 splay_tree_aux' for infrequently-used or API-specific data

---
 libgomp/libgomp.h | 10 --
 libgomp/target.c  | 23 ---
 2 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index b2cd07dfa67..d65a1fa250b 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -989,6 +989,13 @@ struct target_mem_desc {
 #define OFFSET_POINTER (~(uintptr_t) 1)
 #define OFFSET_STRUCT (~(uintptr_t) 2)
 
+/* Auxiliary structure for infrequently-used or API-specific data.  */
+
+struct splay_tree_aux {
+  /* Pointer to the original mapping of "omp 

C++ Patch Ping (was Re: [C++ PATCH] Improve C++ error recovery (PR c++/59655))

2019-12-17 Thread Jakub Jelinek
Hi!

On Tue, Dec 10, 2019 at 10:02:47PM +0100, Jakub Jelinek wrote:
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> Or do you want to use an additional bit for that?
> 
> 2019-12-10  Jakub Jelinek  
> 
>   PR c++/59655
>   * pt.c (push_tinst_level_loc): If limit_bad_template_recursion,
>   set TREE_NO_WARNING on tldcl.
>   * decl2.c (no_linkage_error): Treat templates with TREE_NO_WARNING
>   as defined during error recovery.
> 
>   * g++.dg/cpp0x/diag3.C: New test.

I'd like to ping this patch (or whether you want a special bit for it
in addition to TREE_NO_WARNING).

Thanks.

Jakub



Re: [PATCH] V10 patch #1, Use PLI to load up large DImode constants if -mcpu=future

2019-12-17 Thread Segher Boessenkool
Hi!

On Wed, Dec 11, 2019 at 07:12:23PM -0500, Michael Meissner wrote:
> --- gcc/config/rs6000/rs6000.c(revision 279141)
> +++ gcc/config/rs6000/rs6000.c(working copy)
> @@ -5541,6 +5541,10 @@ num_insns_constant_gpr (HOST_WIDE_INT va
>  && (value >> 31 == -1 || value >> 31 == 0))
>  return 1;
>  
> +  /* PADDI can support up to 34 bit signed integers.  */
> +  else if (TARGET_PREFIXED_ADDR && SIGNED_34BIT_OFFSET_P (value))
> +return 1;

Please follow up with a patch to not call random numbers "OFFSET".

Okay for trunk.  Thanks!
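
As an illustration of the range check the new comment describes, here is a minimal sketch of a plain-value predicate.  fits_signed_34bit is a hypothetical name, not the rs6000 macro, and the sketch assumes only what the patch comment states, namely that the immediate is a 34-bit signed integer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if VALUE is representable as a 34-bit two's-complement
   signed integer, i.e. lies in [-2^33, 2^33 - 1].  */
bool
fits_signed_34bit (int64_t value)
{
  const int64_t limit = INT64_C (1) << 33;
  return value >= -limit && value < limit;
}
```

Naming it after the value range rather than after "offset" is the kind of follow-up being asked for.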


Segher


Re: [PATCH] V10 patch #2, use PLI to load up large SImode constants if -mcpu=future

2019-12-17 Thread Segher Boessenkool
Hi!

On Wed, Dec 11, 2019 at 07:15:15PM -0500, Michael Meissner wrote:
> This patch adds an alternative to use PLI to load up large SImode constants if
> -mcpu=future is used.

> 
>   * config/rs6000/rs6000.md (movsi_internal1): Add alternative to
>   use PLI to load up 34-bit constants if -mcpu=future.

This is okay for trunk.  Thanks!


Segher


Re: [PATCH] V10 patch #3, Use PADDI to add large constants if -mcpu=future is used

2019-12-17 Thread Segher Boessenkool
On Wed, Dec 11, 2019 at 07:17:02PM -0500, Michael Meissner wrote:
> This patch adds an alternative to use PADDI to add large SImode and DImode
> constants if -mcpu=future is used.

> 2019-12-09  Michael Meissner  
> 
>   * config/rs6000/predicates.md (add_operand): Allow eI constants.
>   * config/rs6000/rs6000.md (add3): Add alternative to
>   generate PADDI for 34-bit constants if -mcpu=future.

This is fine.  Okay for trunk.  Thanks!


Segher


[PATCH, committed] Add myself to MAINTAINERS

2019-12-17 Thread Mihail Ionescu

Hi all,
I have committed the attached patch adding myself to the Write After
Approval section of the MAINTAINERS file.

ChangeLog:

2019-12-17  Mihail Ionescu  

* MAINTAINERS (write_after_approval): Add myself.


Regards,
Mihail
diff --git a/MAINTAINERS b/MAINTAINERS
index 78f17c35e9e1c4fa1abec149f6974d0874875545..5a2cf0a8ab125d6147b0601a0b0bff996359e7cb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -433,6 +433,7 @@ Naveen H.S	
 Roland Illig	
 Meador Inge	
 Bernardo Innocenti
+Mihail Ionescu	
 Vladislav Ivanishin
 Alexander Ivchenko
 Balaji V. Iyer	


Re: [PING 3][PATCH] track dynamic allocation in strlen (PR 91582)

2019-12-17 Thread Christophe Lyon
On Sat, 14 Dec 2019 at 22:35, Jeff Law  wrote:
>
> On Fri, 2019-12-13 at 17:55 -0700, Martin Sebor wrote:
> > After more testing by Jeff's buildbot and correcting the problems
> > it exposed I have committed the attached patch in r279392.
> And just to close the loop on this.  Your last version fixed all the
> issues I saw in the tester.
>

Hi,

On my side, I've noticed that r279392 caused regressions on arm.
On arm-none-linux-gnueabi
--with-mode arm
--with-cpu cortex-a9
I see
gcc.dg/strlenopt-8.c: pattern found 2 times
FAIL: gcc.dg/strlenopt-8.c scan-tree-dump-times strlen1 "strlen \\(" 0
FAIL: gcc.dg/tree-ssa/pr87022.c (test for excess errors)
Excess errors:
/gcc/testsuite/gcc.dg/tree-ssa/pr87022.c:26:19: warning: writing 1
byte into a region of size 0 [-Wstringop-overflow=]
/gcc/testsuite/gcc.dg/tree-ssa/pr87022.c:26:19: warning: writing 1
byte into a region of size 0 [-Wstringop-overflow=]
/gcc/testsuite/gcc.dg/tree-ssa/pr87022.c:26:19: warning: writing 1
byte into a region of size 0 [-Wstringop-overflow=]
/gcc/testsuite/gcc.dg/tree-ssa/pr87022.c:26:19: warning: writing 1
byte into a region of size 0 [-Wstringop-overflow=]
/gcc/testsuite/gcc.dg/tree-ssa/pr87022.c:26:19: warning: writing 1
byte into a region of size 0 [-Wstringop-overflow=]
/gcc/testsuite/gcc.dg/tree-ssa/pr87022.c:26:19: warning: writing 1
byte into a region of size 0 [-Wstringop-overflow=]

Christophe


> jeff
>
>


Fix partitioning ICE with external comdats

2019-12-17 Thread Jan Hubicka
Hi,
while hacking firefox to work around ABI compatibility issues with LLVM
I ran into an ICE where a comdat group was resolved externally but contains
a static alias (for a thunk). In this case the partitioner attempts to put
that static definition into a partition, which triggers an ICE.
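
A minimal model of the rule the patch adds, written as a sketch in plain C.  The types and names here (struct sym, partitioning_class, PART_*) are hypothetical and much simpler than the real symtab_node; the only thing carried over is the decision that an alias whose ultimate target is external is itself classified as external.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum part_class { PART_EXTERNAL, PART_PARTITION };

/* Hypothetical, heavily simplified symbol.  */
struct sym {
  bool external;          /* models DECL_EXTERNAL */
  struct sym *alias_of;   /* NULL if this symbol is not an alias */
};

/* Follow the alias chain to the symbol that is finally aliased.  */
const struct sym *
ultimate_target (const struct sym *s)
{
  assert (s != NULL);
  while (s->alias_of != NULL)
    s = s->alias_of;
  return s;
}

/* External symbols, and aliases (even static ones) of external
   symbols, must not be placed into a partition.  */
enum part_class
partitioning_class (const struct sym *s)
{
  if (s->external)
    return PART_EXTERNAL;
  if (s->alias_of != NULL && ultimate_target (s)->external)
    return PART_EXTERNAL;
  return PART_PARTITION;
}
```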

Bootstrapped/regtested x86_64-linux, committed.

* symtab.c (symtab_node::get_partitioning_class): Aliases of external
symbols are external.
Index: symtab.c
===
--- symtab.c(revision 279178)
+++ symtab.c(working copy)
@@ -1952,6 +1952,11 @@ symtab_node::get_partitioning_class (voi
   if (DECL_EXTERNAL (decl))
 return SYMBOL_EXTERNAL;
 
+  /* Even static aliases of external functions as external.  Those can happen
+ when COMDAT got resolved to non-IL implementation.  */
+  if (alias && DECL_EXTERNAL (ultimate_alias_target ()->decl))
+return SYMBOL_EXTERNAL;
+
   if (varpool_node *vnode = dyn_cast <varpool_node *> (this))
 {
   if (alias && definition && !ultimate_alias_target ()->definition)


Re: [PATCH] Fix symver attribute with LTO

2019-12-17 Thread Jan Hubicka
> Hi Jan,
> 
> I'm using GNU ld 2.33.1.
> 
> I'll attach a testcase simplified from fuse-3.9 code.  "local: *;" in the
> versioning script triggers the issue.  Without it there would be no problem.

Thanks.
You are right that I did not play with local:. Now I wonder what is the
intended behaviour here.

In resolution file I see:
1
foo.o 4
205 dc8dc21a4ac8d072 PREVAILING_DEF_IRONLY_EXP foo_v1
207 dc8dc21a4ac8d072 PREVAILING_DEF_IRONLY foo@VERS_1
216 dc8dc21a4ac8d072 PREVAILING_DEF_IRONLY foo_v2
218 dc8dc21a4ac8d072 PREVAILING_DEF_IRONLY foo@@VERS_2

If I link the DSO w/o -flto I get with objdump -T:
1100 gDF .text  0006 (VERS_1) foo
1100 gDF .text  0006  VERS_2  foo_v1
 gDO *ABS*    VERS_1  VERS_1
1110 gDF .text  0006  VERS_2  foo
 gDO *ABS*    VERS_2  VERS_2

So I think linker is right here that foo_v1 is exported.  I would have
expected PREVAILING_DEF_IRONLY_EXP for foo@VERS_1 and foo@@VERS_2 since
those symbols do get exported even though they land in a special way in the
DSO symbol table.  So I think meaningful behaviour would be
 1) make linker plugin interface to not annotate symbol version symbols
with any resolution at all, since they are "special"
 2) make them PREVAILING_DEF_IRONLY_EXP/PREVAILING_DEF since they always
get exported to non-LTO land.
We could work around that on the GCC side, but if you agree with this
understanding I would file a binutils PR.  Also, since we use resolution
info in many places, I would simply add logic working around this at the
time we read the resolution rather than in one of the places we use it.

Comparing to objdump -T

1100 gDF .text  0006  VERS_2  foo_v1
 gDO *ABS*    VERS_1  VERS_1
 gDO *ABS*    VERS_2  VERS_2

Why do we also miss
1100 gDF .text  0006 (VERS_1) foo
and what does it mean?

I am not too happy about forcing GCC to keep foo_v2 exported when it is
not.  For example, calls to foo_v2 will then not be optimized as they
could if GCC knew it is not used by non-LTO world (except for fact that
symver is bound to it).

Would it be equivalent to:
1) output foo_v2 local
2) producing static alias with local name (.L1)
3) do .symver .L1,foo@@@VERS_2
That is somewhat more systematic and would not lead to false
visibilities.

Honza

> 
> > > 2. The actual function body implementing the symver-ed function is also
> > > marked
> > >as PREVAILING_DEF_IRONLY and then removed or marked as local.  So no
> > > ".globl"
> > >directive is outputed for it.
> > 
> > Here is the symver-ed function exported from the DSO (or is it set
> > to have hidden attribute)?
> > Again this was working for me, so it would be good to understand this
> > issue.
> 
> It's also triggered by "local: *;".
> 
> Untar the attachment and use "make" to build it, then "make show-dynamic-syms"
> to dump the dynamic symbol table.  I believe (with 99% chance) you'll see only
> foo (VERS_1) and foo_v1 (because foo_v1 is marked as global in the version
> script).  And foo (VERS_2) would be missing.  With this patch foo (VERS_2) 
> would
> show up.
> 
> We can't mark "foo_v2" to be "global" because it should not be a part of DSO
> ABI.
> 
> The other 1% chance would be a regression in Binutils.
> -- 
> Xi Ruoyao 
> School of Aerospace Science and Technology, Xidian University




Re: [PATCH 2/2] [ARM] Add support for -mpure-code in thumb-1 (v6m)

2019-12-17 Thread Christophe Lyon
On Tue, 17 Dec 2019 at 16:31, Kyrill Tkachov
 wrote:
>
>
> On 12/17/19 2:33 PM, Christophe Lyon wrote:
> > On Tue, 17 Dec 2019 at 11:34, Kyrill Tkachov
> >  wrote:
> >> Hi Christophe,
> >>
> >> On 11/18/19 9:00 AM, Christophe Lyon wrote:
> >>> On Wed, 13 Nov 2019 at 15:46, Christophe Lyon
> >>>  wrote:
>  On Tue, 12 Nov 2019 at 12:13, Richard Earnshaw (lists)
>   wrote:
> > On 18/10/2019 14:18, Christophe Lyon wrote:
> >> +  bool not_supported = arm_arch_notm || flag_pic ||
> >>> TARGET_NEON;
> > This is a poor name in the context of the function as a whole.  What's
> > not supported.  Please think of a better name so that I have some idea
> > what the intention is.
>  That's to keep most of the code common when checking if -mpure-code
>  and -mslow-flash-data are supported.
>  These 3 cases are common to the two compilation flags, and
>  -mslow-flash-data still needs to check TARGET_HAVE_MOVT in addition.
> 
>  Would "common_unsupported_modes" work better for you?
>  Or I can duplicate the "arm_arch_notm || flag_pic || TARGET_NEON" in
>  the two tests.
> 
> >>> Hi,
> >>>
> >>> Here is an updated version, using "common_unsupported_modes" instead
> >>> of "not_supported", and fixing the typo reported by Kyrill.
> >>> The ChangeLog is still the same.
> >>>
> >>> OK?
> >>
> >> The name looks ok to me. Richard had a concern about Armv8-M Baseline,
> >> but I do see it being supported as you pointed out.
> >>
> >> So I believe all the concerns are addressed.
> > OK, thanks!
> >
> >> Thus the code is ok. However, please also update the documentation for
> >> -mpure-code in invoke.texi (it currently states that a MOVT instruction
> >> is needed).
> >>
> > I didn't think about this :(
> > It currently says: "This option is only available when generating
> > non-pic code for M-profile targets with the MOVT instruction."
> >
> > I suggest to remove the "with the MOVT instruction" part. Is that OK
> > if I commit my patch and this doc change?
>
> Yes, I think that is simplest correct change to make.
>
> Thanks,
>
Thanks, committed as r279463.

> Kyrill
>
>
> > Christophe
> >
> >> Thanks,
> >>
> >> Kyrill
> >>
> >>
> >>
> >>> Thanks,
> >>>
> >>> Christophe
> >>>
>  Thanks,
> 
>  Christophe
> 
> > R.


Re: [PATCH 2/2] [ARM] Add support for -mpure-code in thumb-1 (v6m)

2019-12-17 Thread Christophe Lyon
On Tue, 17 Dec 2019 at 11:34, Kyrill Tkachov
 wrote:
>
> Hi Christophe,
>
> On 11/18/19 9:00 AM, Christophe Lyon wrote:
> > On Wed, 13 Nov 2019 at 15:46, Christophe Lyon
> >  wrote:
> > >
> > > On Tue, 12 Nov 2019 at 12:13, Richard Earnshaw (lists)
> > >  wrote:
> > > >
> > > > On 18/10/2019 14:18, Christophe Lyon wrote:
> > > > > +  bool not_supported = arm_arch_notm || flag_pic ||
> > TARGET_NEON;
> > > > >
> > > >
> > > > This is a poor name in the context of the function as a whole.  What's
> > > > not supported.  Please think of a better name so that I have some idea
> > > > what the intention is.
> > >
> > > That's to keep most of the code common when checking if -mpure-code
> > > and -mslow-flash-data are supported.
> > > These 3 cases are common to the two compilation flags, and
> > > -mslow-flash-data still needs to check TARGET_HAVE_MOVT in addition.
> > >
> > > Would "common_unsupported_modes" work better for you?
> > > Or I can duplicate the "arm_arch_notm || flag_pic || TARGET_NEON" in
> > > the two tests.
> > >
> >
> > Hi,
> >
> > Here is an updated version, using "common_unsupported_modes" instead
> > of "not_supported", and fixing the typo reported by Kyrill.
> > The ChangeLog is still the same.
> >
> > OK?
>
>
> The name looks ok to me. Richard had a concern about Armv8-M Baseline,
> but I do see it being supported as you pointed out.
>
> So I believe all the concerns are addressed.
OK, thanks!

>
> Thus the code is ok. However, please also update the documentation for
> -mpure-code in invoke.texi (it currently states that a MOVT instruction
> is needed).
>
I didn't think about this :(
It currently says: "This option is only available when generating
non-pic code for M-profile targets with the MOVT instruction."

I suggest to remove the "with the MOVT instruction" part. Is that OK
if I commit my patch and this doc change?

Christophe

> Thanks,
>
> Kyrill
>
>
>
> >
> > Thanks,
> >
> > Christophe
> >
> > > Thanks,
> > >
> > > Christophe
> > >
> > > >
> > > > R.


[Patch, Fortran] PR92896 [10 Regression] Fix - Prevent character conversion in array constructor

2019-12-17 Thread Mark Eggleston

Prevent conversion of character data in array constructors.

Fix for PR fortran/92896 [10 Regression] [DEC] ICE in reduce_unary, at 
fortran/arith.c:1283.


This was caused by an unintended side effect of "Allow CHARACTER 
literals in assignments and data statements" (revision 277975). If the 
conversion occurs in an array constructor it is rejected.


Patch is attached.

OK for trunk?

Change logs:

gcc/fortran/ChangeLog

    Mark Eggleston  

    PR fortran/92896
    * array.c (walk_array_constructor): Replace call to gfc_convert_type
    with call to gfc_convert_type_warn with new argument set to true.
    (check_element_type): Replace call to gfc_convert_type with call to
    gfc_convert_type_warn with new argument set to true.
    * gfortran.h: Add argument "array" to gfc_convert_type_warn default
    value set to false.
    * intrinsic.c (gfc_convert_type_warn): Update description of arguments.
    Add new argument to argument list. Add check for conversion to numeric
    or logical from character and array set to true, i.e. if conversion
    from character is in an array constructor reject it, goto bad.

gcc/testsuite/ChangeLog

    Mark Eggleston  

    PR fortran/92896
    * gfortran.dg/no_char_conversion_in_array_constructor.f90: New test.

--
https://www.codethink.co.uk/privacy.html

From f2bd94410ad637444b0014a776ada2df98859cc5 Mon Sep 17 00:00:00 2001
From: Mark Eggleston 
Date: Tue, 17 Dec 2019 13:05:46 +
Subject: [PATCH] Prevent conversion of character data in array constructors.

Fix for PR fortran/92896 [10 Regression] [DEC] ICE in reduce_unary, at
fortran/arith.c:1283.

This was caused by an unintended side effect of "Allow CHARACTER literals
in assignments and data statements" (revision 277975). If the conversion
occurs in an array constructor it is rejected.
---
 gcc/fortran/array.c   |  7 ---
 gcc/fortran/gfortran.h|  3 ++-
 gcc/fortran/intrinsic.c   | 15 +--
 .../no_char_conversion_in_array_constructor.f90   | 10 ++
 4 files changed, 29 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/no_char_conversion_in_array_constructor.f90

diff --git a/gcc/fortran/array.c b/gcc/fortran/array.c
index 36223d2233d..88ee16669bf 100644
--- a/gcc/fortran/array.c
+++ b/gcc/fortran/array.c
@@ -1189,9 +1189,10 @@ walk_array_constructor (gfc_typespec *ts, gfc_constructor_base head)
 	  if (m == MATCH_ERROR)
 	return m;
 	}
-  else if (!gfc_convert_type (e, ts, 1) && e->ts.type != BT_UNKNOWN)
+  else if (!gfc_convert_type_warn (e, ts, 1, 1, true)
+	   && e->ts.type != BT_UNKNOWN)
 	return MATCH_ERROR;
-  }
+}
   return MATCH_YES;
 }
 
@@ -1390,7 +1391,7 @@ check_element_type (gfc_expr *expr, bool convert)
 return 0;
 
   if (convert)
-return gfc_convert_type(expr, &constructor_ts, 1) ? 0 : 1;
+return gfc_convert_type_warn (expr, &constructor_ts, 1, 1, true) ? 0 : 1;
 
   gfc_error ("Element in %s array constructor at %L is %s",
 	 gfc_typename (&constructor_ts), &expr->where,
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index f4a2b99bdc4..218daee0805 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -3188,7 +3188,8 @@ void gfc_intrinsic_done_1 (void);
 char gfc_type_letter (bt, bool logical_equals_int = false);
 gfc_symbol * gfc_get_intrinsic_sub_symbol (const char *);
 bool gfc_convert_type (gfc_expr *, gfc_typespec *, int);
-bool gfc_convert_type_warn (gfc_expr *, gfc_typespec *, int, int);
+bool gfc_convert_type_warn (gfc_expr *, gfc_typespec *, int, int,
+			bool array = false);
 bool gfc_convert_chartype (gfc_expr *, gfc_typespec *);
 int gfc_generic_intrinsic (const char *);
 int gfc_specific_intrinsic (const char *);
diff --git a/gcc/fortran/intrinsic.c b/gcc/fortran/intrinsic.c
index 76b53bb7117..c913f5ab152 100644
--- a/gcc/fortran/intrinsic.c
+++ b/gcc/fortran/intrinsic.c
@@ -5096,10 +5096,15 @@ gfc_convert_type (gfc_expr *expr, gfc_typespec *ts, int eflag)
  1 Generate a gfc_error()
  2 Generate a gfc_internal_error().
 
-   'wflag' controls the warning related to conversion.  */
+   'wflag' controls the warning related to conversion.
+
+   'array' indicates whether the conversion is in an array constructor.
+   Non-standard conversion from character to numeric not allowed if true.
+*/
 
 bool
-gfc_convert_type_warn (gfc_expr *expr, gfc_typespec *ts, int eflag, int wflag)
+gfc_convert_type_warn (gfc_expr *expr, gfc_typespec *ts, int eflag, int wflag,
+		   bool array)
 {
   gfc_intrinsic_sym *sym;
   gfc_typespec from_ts;
@@ -5142,6 +5147,12 @@ gfc_convert_type_warn (gfc_expr *expr, gfc_typespec *ts, int eflag, int wflag)
   && gfc_compare_types (&expr->ts, ts))
 return true;
 
+  /* If array is true then conversion is in an array constructor where
+ non-standard conversion is not allowed.  */
+  if (array && from_ts.type == BT_CHARACTER
+  && (gfc_numeric_ts (ts) || ts->type == 

Re: [PATCH][AArch64] Fixup core tunings

2019-12-17 Thread Wilco Dijkstra
Hi Richard,

> This changelog entry is inadequate.  It's also not in the correct style.
>
> It should say what has changed, not just that it has changed.

Sure, but there is often no useful space for that. We should auto generate
changelogs if they are deemed useful. I find the commit message a lot more
useful in general. Here is the updated version:


Several tuning settings in cores.def are not consistent.
Set the tuning for Cortex-A76AE and Cortex-A77 to neoversen1 so
it is the same as for Cortex-A76 and Neoverse N1.
Set the tuning for Neoverse E1 to cortexa73 so it's the same as for
Cortex-A65. Set the scheduler for Cortex-A65 and Cortex-A65AE to
cortexa53.

Bootstrap OK, OK for commit?

ChangeLog:
2019-12-17  Wilco Dijkstra  

* config/aarch64/aarch64-cores.def: 
("cortex-a76ae"): Use neoversen1 tuning.
("cortex-a77"): Likewise.
("cortex-a65"): Use cortexa53 scheduler.
("cortex-a65ae"): Likewise.
("neoverse-e1"): Use cortexa73 tuning.
--

diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index 053c6390e747cb9c818fe29a9b22990143b260ad..d170253c6eddca87f8b9f4f7fcc4692695ef83fb 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -101,13 +101,13 @@ AARCH64_CORE("thunderx2t99",  thunderx2t99,  
thunderx2t99, 8_1A,  AARCH64_FL_FOR
 AARCH64_CORE("cortex-a55",  cortexa55, cortexa53, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, 
cortexa53, 0x41, 0xd05, -1)
 AARCH64_CORE("cortex-a75",  cortexa75, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, 
cortexa73, 0x41, 0xd0a, -1)
 AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, 
neoversen1, 0x41, 0xd0b, -1)
-AARCH64_CORE("cortex-a76ae",  cortexa76ae, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD 
| AARCH64_FL_SSBS, cortexa72, 0x41, 0xd0e, -1)
-AARCH64_CORE("cortex-a77",  cortexa77, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD 
| AARCH64_FL_SSBS, cortexa72, 0x41, 0xd0d, -1)
-AARCH64_CORE("cortex-a65",  cortexa65, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD 
| AARCH64_FL_SSBS, cortexa73, 0x41, 0xd06, -1)
-AARCH64_CORE("cortex-a65ae",  cortexa65ae, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD 
| AARCH64_FL_SSBS, cortexa73, 0x41, 0xd43, -1)
+AARCH64_CORE("cortex-a76ae",  cortexa76ae, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD 
| AARCH64_FL_SSBS, neoversen1, 0x41, 0xd0e, -1)
+AARCH64_CORE("cortex-a77",  cortexa77, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD 
| AARCH64_FL_SSBS, neoversen1, 0x41, 0xd0d, -1)
+AARCH64_CORE("cortex-a65",  cortexa65, cortexa53, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD 
| AARCH64_FL_SSBS, cortexa73, 0x41, 0xd06, -1)
+AARCH64_CORE("cortex-a65ae",  cortexa65ae, cortexa53, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD 
| AARCH64_FL_SSBS, cortexa73, 0x41, 0xd43, -1)
 AARCH64_CORE("ares",  ares, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_PROFILE, 
neoversen1, 0x41, 0xd0c, -1)
 AARCH64_CORE("neoverse-n1",  neoversen1, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD 
| AARCH64_FL_PROFILE, neoversen1, 0x41, 0xd0c, -1)
-AARCH64_CORE("neoverse-e1",  neoversee1, cortexa53, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD 
| AARCH64_FL_SSBS, cortexa53, 0x41, 0xd4a, -1)
+AARCH64_CORE("neoverse-e1",  neoversee1, cortexa53, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD 
| AARCH64_FL_SSBS, cortexa73, 0x41, 0xd4a, -1)
 
 /* HiSilicon ('H') cores. */
 AARCH64_CORE("tsv110",  tsv110, tsv110, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,  
 0x48, 0xd01, -1)
@@ -127,6 +127,6 @@ AARCH64_CORE("cortex-a73.cortex-a53",  cortexa73cortexa53, 
cortexa53, 8A,  AARCH
 /* ARM DynamIQ big.LITTLE configurations.  */
 
 AARCH64_CORE("cortex-a75.cortex-a55",  cortexa75cortexa55, cortexa53, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, 
cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd0a, 0xd05), -1)
-AARCH64_CORE("cortex-a76.cortex-a55",  cortexa76cortexa55, cortexa53, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, 
cortexa72, 0x41, AARCH64_BIG_LITTLE (0xd0b, 0xd05), -1)
+AARCH64_CORE("cortex-a76.cortex-a55",  cortexa76cortexa55, 

Re: [PATCH] Some compute_objsize/gimple_call_alloc_size/maybe_warn_overflow cleanups (PR tree-optimization/92868)

2019-12-17 Thread Martin Sebor

On 12/17/19 1:58 AM, Jakub Jelinek wrote:

Hi!

When looking at the PR, I wrote a cleanup patch with various things I've
noticed, with latest Martin's changes half of them aren't valid anymore, but
I found further ones.
So, besides formatting fixes, this patch tries to make sure the rng1 ranges
are meaningful even in some corner cases.  The first hunk and half is about
what rng1 will be for single argument attribute where to the corresponding
argument INTEGER_CST is passed, vanilla trunk can stick into rng1[0] ==
rng1[1] e.g. negative value with whatever precision the argument has
(say for int argument and -23 passed to it will return (size_t) -23,
but range will be [-23, -23]) or e.g. for arguments wider than size_t like
__int128 could return numbers above SIZE_MAX.  Whatever the argument type
is, the attribute expects that value to be passed over to malloc/calloc or
similar functions and so it will be promoted or demoted to size_t.  The
   rng1[0] = wi::zero (rng1[1].get_precision ());
line is to avoid weird ranges and (conservatively) cover the whole range
of sizes the allocator function can return.  E.g. if the ranges for the two
arguments are [2, SIZE_MAX] * [2, SIZE_MAX], the upper bound overflows and
gimple_call_alloc_size would return SIZE_MAX with [4, SIZE_MAX] in the rng1,
but that is not accurate, because due to the overflow also [0, 3] would be
possible.


We want allocations to appear to have "saturating behavior" so that
code like the following, for example, is diagnosed:

  void* g (int n)
  {
    if (3 < n)
      n = 3;

    void *p = malloc (n);
    strcpy (p, "12345");   // almost certain buffer overflow
    return p;
  }

PR 92942 tracks the missing warning in this case.

What you're doing in the patch, while necessary for an optimization,
would keep us from detecting these kinds of bugs.


Or for [SIZE_MAX - 2, SIZE_MAX] * [SIZE_MAX - 2, SIZE_MAX] where
the overflow is both on the low and upper bounds, the function would return
SIZE_MAX and set rng1 to [((__int128) SIZE_MAX - 2) * (SIZE_MAX - 2), SIZE_MAX]
i.e. range where low bound is much higher than upper bound (== invalid
range).  The patch will just use [0, SIZE_MAX] range in those cases.
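
To make the two corner cases above concrete, here is a minimal sketch of the conservative behaviour described: multiply the bounds and, if either product wraps, fall back to [0, SIZE_MAX].  It assumes GCC's or Clang's __builtin_mul_overflow; SizeRange and mul_size_ranges are hypothetical names for illustration and are not the code in builtins.c.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the two-element rng1 array; not a GCC type.
struct SizeRange
{
  std::size_t lo;
  std::size_t hi;
};

// Multiply two size ranges bound by bound.  If either product wraps
// around size_t, the true result could be anything, so return the
// conservative range [0, SIZE_MAX] instead of an inverted or truncated one.
inline SizeRange mul_size_ranges (SizeRange a, SizeRange b)
{
  SizeRange r;
  bool lo_overflowed = __builtin_mul_overflow (a.lo, b.lo, &r.lo);
  bool hi_overflowed = __builtin_mul_overflow (a.hi, b.hi, &r.hi);
  if (lo_overflowed || hi_overflowed)
    return { 0, SIZE_MAX };
  return r;
}
```

With this model, [2, SIZE_MAX] * [2, SIZE_MAX] and [SIZE_MAX - 2, SIZE_MAX] * [SIZE_MAX - 2, SIZE_MAX] both come out as [0, SIZE_MAX], while small ranges multiply exactly.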


GCC diagnoses excessive allocations by -Walloc-size-larger-than
when the product exceeds PTRDIFF_MAX, so I'm less concerned about
diagnosing stores into them, but as above, the conservative approach
that's necessary for optimization is the opposite of what's needed
in order to detect bugs.


And the second hunk in tree-ssa-strlen.c is a fix for pointer comparison of
INTEGER_CSTs, while we cache INTEGER_CSTs and INTEGER_CSTs of the same type
with the same value and no overflow will compare equal, in that function, it
is actually quite unlikely they will have the same type, because destsize is
most likely a sizetype integer, while len is most likely size_t (aka
size_type_node), but could be anything else that passes
useless_type_conversion_p.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


I appreciate a cleanup but I don't have the impression this patch
does clean anything up.  Because of all the formatting changes and
no tests the effect of the changes isn't as clear as it should be.
(I wish you would resist the urge to reformat existing code on this
scale while also making changes with an observable effect in
the same diff.)  But thanks to the detailed explanation above
I think I can safely say that the builtins.c changes are not in
line with what I would like to see.

FWIW, a change I would welcome is replacing the *POFF argument to
compute_objsize with a range like gimple_call_alloc_size() takes.
That would let the function handle offsets in negative ranges and
detect more buffer overflows, such as the one in PR 92939.  (I plan
to fix that along with PR 92942 for GCC 11).

Martin




2019-12-17  Jakub Jelinek  

PR tree-optimization/92868
* builtins.c (gimple_call_alloc_size): If there is only one size argument
and INTEGER_CST is passed to it, call get_range only on the converted value.
For overflow, set rng1[0] to zero.  Formatting fix.
(compute_objsize): Formatting fix.
* tree-ssa-strlen.c (maybe_warn_overflow): Remove spurious ; after }.
Use operand_equal_p instead of pointer comparison to compare INTEGER_CSTs.
Formatting fixes.

--- gcc/builtins.c.jj   2019-12-14 23:19:40.861879033 +0100
+++ gcc/builtins.c  2019-12-16 10:00:27.858664110 +0100
@@ -3749,6 +3749,13 @@ gimple_call_alloc_size (gimple *stmt, wi
  }
  
tree size = gimple_call_arg (stmt, argidx1);

+  if (argidx2 > nargs && TREE_CODE (size) == INTEGER_CST)
+{
+  size = fold_convert (sizetype, size);
+  if (rng1)
+   get_range (size, rng1, rvals);
+  return size;
+}
  
wide_int rng1_buf[2];

/* If RNG1 is not set, use the buffer.  */
@@ -3758,12 +3765,10 @@ gimple_call_alloc_size (gimple *stmt, wi
if (!get_range (size, rng1, rvals))
  

Re: [Patch, Fortran] PR92896 [10 Regression] Fix - Prevent character conversion in array constructor

2019-12-17 Thread Steve Kargl
On Tue, Dec 17, 2019 at 03:41:41PM +, Mark Eggleston wrote:
> gcc/fortran/ChangeLog
> 
>      Mark Eggleston  
> 
>      PR fortran/92896
>      * array.c (walk_array_constructor): Replace call to cfg_convert_type

s/cfg_convert_type/gfc_convert_type

>      with call to gfc_convert_type_warn with new argument set to true.
>      (check_element_type): Replace call to cfg_convert_type with call to
>      gfc_convert_type_warn with new argument set to true.
>      * gfortran.h: Add argument "array" to gfc_convert_type_warn default
>      value set to false.

Do all current uses of gfc_convert_type_warn need to be updated
to account for the new parameter?  That is, doesn't this introduce
a mismatch in the prototype and existing code?

-- 
Steve


Re: [PATCH] V10 patch #5, Fix codegen bug with vector extracts using a variable offset & PC-relative address

2019-12-17 Thread Segher Boessenkool
Hi!

On Wed, Dec 11, 2019 at 07:48:39PM -0500, Michael Meissner wrote:
> This patch fixes a bug with vector extracts using a PC-relative address and a
> variable offset when using -mcpu=future.
> 
> Consider the code:
> 
>   #include <altivec.h>
> 
>   static vector double vd;
>   vector double *p = &vd;
> 
>   double get (unsigned int n)
>   {
> return vec_extract (vd, n);
>   }
> 
> If you compile this code with -O2 -mcpu=future -mpcrel you get:
> 
>   get:
>   pla 9,.LANCHOR0@pcrel
>   lfdx 1,9,9
>   blr
> 
> This is because there is only one base register temporary, and the current code
> tries to first create the offset and then use the same temporary to hold the
> address of the PC-relative value.
> 
> After combine the insn is:
> 
> (insn 14 9 15 2 (parallel [
> (set (reg/i:DF 33 1)
> (unspec:DF [
> (mem/c:V2DF (symbol_ref:DI ("*.LANCHOR0") [flags 
> 0x182]) [1 vd+0 S16 A128])
> (reg:DI 123 [ n ])
> ] UNSPEC_VSX_EXTRACT))
> (clobber (scratch:DI))
> (clobber (scratch:V2DI))
> ]) "foo.c":9:1 1314 {vsx_extract_v2df_var}

(After postreload as well, more to the point -- well, it has hard regs
there, of course).

> Split2 changes this to:

The vsx_extract_<mode>_var splitter does, yeah.

> (insn 20 8 21 2 (set (reg:DI 3 3 [orig:123 n ] [123])
> (and:DI (reg:DI 3 3 [orig:123 n ] [123])
> (const_int 1 [0x1]))) "foo.c":9:1 193 {anddi3_mask}
>  (nil))
> (insn 21 20 22 2 (set (reg:DI 9 9 [126])
> (ashift:DI (reg:DI 3 3 [orig:123 n ] [123])
> (const_int 3 [0x3]))) "foo.c":9:1 256 {ashldi3}
>  (nil))

These two are just  rlwinm 3,3,3,8  together, btw.  A good example why
splitters after reload are not great.
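
As a quick sanity check of that equivalence, here is a small sketch that models the two forms on 32-bit values.  The function names are made up for illustration, and the rotate-and-mask model is only an assumption about what the single rlwinm computes on the low word.

```cpp
#include <cstdint>

// The split sequence: mask the index down to one bit, then scale it by
// the 8-byte element size.
constexpr std::uint32_t and_then_shift (std::uint32_t n)
{
  return (n & 1u) << 3;
}

// Rotate left by 3 and keep only the bit with value 8, which is how a
// single rotate-and-mask instruction would produce the same offset.
constexpr std::uint32_t rotate_and_mask (std::uint32_t n)
{
  std::uint32_t rotated = (n << 3) | (n >> 29);
  return rotated & 8u;
}

// Compile-time spot checks on an odd and an even index.
static_assert (and_then_shift (5) == rotate_and_mask (5), "odd index");
static_assert (and_then_shift (6) == rotate_and_mask (6), "even index");
```

Bit 0 of the input ends up in bit 3 after the rotate, so masking with 8 gives the same result as the and followed by the shift.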

>  ;; Variable V2DI/V2DF extract
>  (define_insn_and_split "vsx_extract__var"
> -  [(set (match_operand: 0 "gpc_reg_operand" "=v,wa,r")
> - (unspec: [(match_operand:VSX_D 1 "input_operand" "v,m,m")
> -  (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
> - UNSPEC_VSX_EXTRACT))
> -   (clobber (match_scratch:DI 3 "=r,,"))
> -   (clobber (match_scratch:V2DI 4 "=,X,X"))]
> +  [(set (match_operand: 0 "gpc_reg_operand" "=v,wa,r,wa,r")
> + (unspec:
> +  [(match_operand:VSX_D 1 "input_operand" "v,em,em,ep,ep")
> +   (match_operand:DI 2 "gpc_reg_operand" "r,r,r,r,r")]
> +  UNSPEC_VSX_EXTRACT))
> +   (clobber (match_scratch:DI 3 "=r"))
> +   (clobber (match_scratch:V2DI 4 "=,X,X,X,X"))
> +   (clobber (match_scratch:DI 5 "=X,X,X,,"))]
>"VECTOR_MEM_VSX_P (mode) && TARGET_DIRECT_MOVE_64BIT"
>"#"
>"&& reload_completed"
>[(const_int 0)]
>  {
>rs6000_split_vec_extract_var (operands[0], operands[1], operands[2],
> - operands[3], operands[4]);
> + operands[3], operands[4], operands[5]);

This writes to operands[2], which does not match its constraint.

Same in the other splitters.


Segher


[patch] Use simple LRA algorithm at -O0

2019-12-17 Thread Eric Botcazou
Hi,

LRA has been getting measurably slower since GCC 8, at least on x86, and things are 
worsening since GCC 9.  While this might be legitimate when optimization is 
enabled, it's a pure waste of cycles at -O0 so the attached patch switches LRA 
over to using the simple algorithm when optimization is disabled.  The effect 
on code size is tiny (typically 0.2% on x86).

Tested on x86_64-suse-linux, OK for the mainline?


2019-12-17  Eric Botcazou  

* ira.c (ira): Use simple LRA algorithm when not optimizing.

-- 
Eric Botcazou
Index: ira.c
===
--- ira.c	(revision 279442)
+++ ira.c	(working copy)
@@ -5192,8 +5192,6 @@ ira (FILE *f)
   int ira_max_point_before_emit;
   bool saved_flag_caller_saves = flag_caller_saves;
   enum ira_region saved_flag_ira_region = flag_ira_region;
-  unsigned int i;
-  int num_used_regs = 0;
 
   clear_bb_flags ();
 
@@ -5207,18 +5205,28 @@ ira (FILE *f)
   /* Perform target specific PIC register initialization.  */
   targetm.init_pic_reg ();
 
-  ira_conflicts_p = optimize > 0;
+  if (optimize)
+{
+  ira_conflicts_p = true;
 
-  /* Determine the number of pseudos actually requiring coloring.  */
-  for (i = FIRST_PSEUDO_REGISTER; i < DF_REG_SIZE (df); i++)
-num_used_regs += !!(DF_REG_USE_COUNT (i) + DF_REG_DEF_COUNT (i));
-
-  /* If there are too many pseudos and/or basic blocks (e.g. 10K
- pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
- use simplified and faster algorithms in LRA.  */
-  lra_simple_p
-= (ira_use_lra_p
-   && num_used_regs >= (1 << 26) / last_basic_block_for_fn (cfun));
+  /* Determine the number of pseudos actually requiring coloring.  */
+  unsigned int num_used_regs = 0;
+  for (unsigned int i = FIRST_PSEUDO_REGISTER; i < DF_REG_SIZE (df); i++)
+	if (DF_REG_DEF_COUNT (i) || DF_REG_USE_COUNT (i))
+	  num_used_regs++;
+
+  /* If there are too many pseudos and/or basic blocks (e.g. 10K
+	 pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
+	 use simplified and faster algorithms in LRA.  */
+  lra_simple_p
+	= ira_use_lra_p
+	  && num_used_regs >= (1U << 26) / last_basic_block_for_fn (cfun);
+}
+  else
+{
+  ira_conflicts_p = false;
+  lra_simple_p = ira_use_lra_p;
+}
 
   if (lra_simple_p)
 {
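
For readers who want to see what the threshold in the hunk above amounts to, here is a standalone sketch of the decision.  The parameter names mirror the patch, but use_simple_lra itself is a hypothetical helper and not GCC code; it assumes num_blocks is nonzero.

```cpp
// Hypothetical standalone model of the choice made in ira ().
// Assumption: num_blocks > 0, as the division below requires.
inline bool use_simple_lra (bool optimize, bool ira_use_lra_p,
                            unsigned num_used_regs, unsigned num_blocks)
{
  // Not optimizing: always take the simple algorithm when LRA is in use.
  if (!optimize)
    return ira_use_lra_p;

  // Optimizing: only switch once pseudos * blocks reaches roughly 2^26.
  return ira_use_lra_p && num_used_regs >= (1U << 26) / num_blocks;
}
```

With 10K pseudos and 10K blocks the bound is 2^26 / 10000 = 6710 pseudos, and with 1K blocks it is 67108 pseudos, so both examples from the comment are over the limit.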


Re: [C++ PATCH] Improve C++ error recovery (PR c++/59655)

2019-12-17 Thread Jason Merrill

On 12/10/19 4:02 PM, Jakub Jelinek wrote:

Hi!

On the following testcase, we emit 2 errors and 1 warning, when the user
really should see one error.  The desirable error is static_assert failure,
the bogus error is during error recovery, complaining that a no_linkage
template isn't defined when it really is defined, but we haven't bothered
instantiating it, because limit_bad_template_recursion said so.
And finally a warning from the middle-end about function being used even
when it is not defined.

The last one already checks TREE_NO_WARNING on the function decl to avoid
the warning, so this patch just uses TREE_NO_WARNING to signal this case,
both to that warning and to no_linkage_error.

Now, I admit I'm not 100% sure if using TREE_NO_WARNING is the best thing
for no_linkage_error, another possibility might be adding some bit in
lang_decl_base or so and setting that bit in addition to TREE_NO_WARNING,
where that new bit would mean this decl might be defined if we didn't decide
not to instantiate it.  With the patch as is, there is a risk if
TREE_NO_WARNING is set for some other reason on a fndecl (or variable
template instantiation?) and some errors have been reported already that we
won't report another error for it when we should.  But perhaps that is
acceptable, once users fix the original errors even if TREE_NO_WARNING will
be set, errorcount + sorrycount will be zero and thus no_linkage_error will
still report it.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
Or do you want to use an additional bit for that?

2019-12-10  Jakub Jelinek  

PR c++/59655
* pt.c (push_tinst_level_loc): If limit_bad_template_recursion,
set TREE_NO_WARNING on tldcl.
* decl2.c (no_linkage_error): Treat templates with TREE_NO_WARNING
as defined during error recovery.

* g++.dg/cpp0x/diag3.C: New test.

--- gcc/cp/pt.c.jj  2019-12-10 00:52:39.017449262 +0100
+++ gcc/cp/pt.c 2019-12-10 19:20:11.046062705 +0100
@@ -10640,7 +10640,12 @@ push_tinst_level_loc (tree tldcl, tree t
   anything else.  Do allow deduction substitution and decls usable in
   constant expressions.  */
if (!targs && limit_bad_template_recursion (tldcl))
-return false;
+{
+  /* Avoid no_linkage_errors and unused function warnings for this
+decl.  */
+  TREE_NO_WARNING (tldcl) = 1;
+  return false;
+}
  
/* When not -quiet, dump template instantiations other than functions, since

   announce_function will take care of those.  */
--- gcc/cp/decl2.c.jj   2019-11-28 09:05:13.376262983 +0100
+++ gcc/cp/decl2.c  2019-12-10 19:33:05.555237052 +0100
@@ -4414,7 +4414,14 @@ decl_maybe_constant_var_p (tree decl)
  void
  no_linkage_error (tree decl)
  {
-  if (cxx_dialect >= cxx11 && decl_defined_p (decl))
+  if (cxx_dialect >= cxx11
+  && (decl_defined_p (decl)
+ /* Treat templates which limit_bad_template_recursion decided
+not to instantiate as if they were defined.  */
+ || (errorcount + sorrycount > 0
+ && DECL_LANG_SPECIFIC (decl)
+ && DECL_TEMPLATE_INFO (decl)
+ && TREE_NO_WARNING (decl


I'm not sure we need to check DECL_TEMPLATE_INFO; if we've seen errors 
they could have interfered with non-template definitions, too.  OK 
either way.


Jason



Re: [PATCH] V10 patch #4, Add new prefixed/non-prefixed memory constraints

2019-12-17 Thread Segher Boessenkool
Hi!

On Wed, Dec 11, 2019 at 07:29:05PM -0500, Michael Meissner wrote:
> +(define_memory_constraint "em"
> +  "A memory operand that does not contain a prefixed address."
> +  (and (match_code "mem")
> +   (match_operand 0 "non_prefixed_memory")))
> +
> +(define_memory_constraint "ep"
> +  "A memory operand that does contains a prefixed address."
> +  (and (match_code "mem")
> +   (match_operand 0 "prefixed_memory")))

"does contain".  Or maybe just say "with a non-prefixed address" and
"with a prefixed address"?

> +;; Return true if the operand is a valid memory address that does not use a
> +;; prefixed address.
> +(define_predicate "non_prefixed_memory"
> +  (match_code "mem")
> +{
> +  enum insn_form iform
> += address_to_insn_form (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
> +
> +  return (iform != INSN_FORM_BAD
> +  && iform != INSN_FORM_PREFIXED_NUMERIC
> +   && iform != INSN_FORM_PCREL_LOCAL
> +   && iform != INSN_FORM_PCREL_EXTERNAL);
> +})

Why can this not use just !address_is_prefixed?  Why is an
INSN_FORM_PCREL_EXTERNAL address neither prefixed nor non-prefixed?  What
does "BAD" mean, really?  Should that ever happen, should that not ICE?

It is very confusing if any valid memory is neither "prefixed_memory" nor
"non_prefixed_memory"!

> --- gcc/doc/md.texi   (revision 279182)
> +++ gcc/doc/md.texi   (working copy)
> @@ -3373,6 +3373,12 @@ asm ("st %1,%0" : "=m<>" (mem) : "r" (va
>  
>  is not.
>  
> +@item em
> +A memory operand that does not contain a prefixed address.
> +
> +@item ep
> +A memory operand that does contains a prefixed address.

Same comments as above.


Segher


Re: [Patch, Fortran] PR92896 [10 Regression] Fix - Prevent character conversion in array constructor

2019-12-17 Thread Steve Kargl
On Tue, Dec 17, 2019 at 05:28:05PM +, Mark Eggleston wrote:
> 
> On 17/12/2019 17:06, Steve Kargl wrote:
> > On Tue, Dec 17, 2019 at 03:41:41PM +, Mark Eggleston wrote:
> >> gcc/fortran/ChangeLog
> >>
> >>       Mark Eggleston  
> >>
> >>       PR fortran/92896
> >>       * array.c (walk_array_constructor): Replace call to cfg_convert_type
> > s/cfg_convert_type/gfc_convert_type
> >
> >>       with call to gfc_convert_type_warn with new argument set to true.
> >>       (check_element_type): Replace call to cfg_convert_type with call to
> >>       gfc_convert_type_warn with new argument set to true.
> >>       * gfortran.h: Add argument "array" to gfc_convert_type_warn default
> >>       value set to false.
> > Do all current uses of gfc_convert_type_warn need to be updated
> > to account for the new parameter?  That is, doesn't this introduce
> > a mismatch in the prototype and existing code?
> 
> I used a default value so all existing calls remain as they are and 
> default to false. So no mismatch.
> 

% cat a.h
#ifndef _STDBOOL_H_
#include <stdbool.h>
#endif
float foo(int, float, bool tmp = false);
% cat a.c
#include "a.h"
void
bar(float x)
{
  int n;
  n = 1;
  x = foo(n, x);
}
% /usr/home/sgk/work/x/bin/gcc -Wall -c a.c
In file included from a.c:2:
a.h:1:32: error: expected ';', ',' or ')' before '=' token
1 | float foo(int, float, bool tmp = false);
  |^
a.c: In function 'bar':
a.c:8:7: warning: implicit declaration of function 'foo' 
[-Wimplicit-function-declaration]
8 |   x = foo(n, x);
  |   ^~~

-- 
Steve


Re: [PATCH] Fix symver attribute with LTO

2019-12-17 Thread Jan Hubicka
> Would it be equivalent to:
> 1) output foo_v2 local
> 2) producing static alias with local name (.L1)
> 3) do .symver .L1,foo@@@VERS_2
> That is somewhat more systematic and would not lead to false
> visibilities.

I spent some time playing with this.  And in order to
1) be able to handle foo_v2 according to the resolution info
   (so it behaves like a regular symbol and can be called directly,
   localized and optimized)
2) get the intended objdump -T relocations
3) not pollute global symbol tables

I ended up with the following codegen:

	.type	foo_v2, @function
foo_v2:
.LFB1:
	.cfi_startproc
	movl	$2, %eax
	ret
	.cfi_endproc
.LFE1:
	.size	foo_v2, .-foo_v2
	.globl	.LSYMVER0
	.set	.LSYMVER0,foo_v2
	.symver	.LSYMVER0, foo@@@VERS_2

This uses the @@@ form of the gas .symver directive, which seems to have the
odd semantics of requiring a global symbol name, which it then takes away,
producing foo@@VERS_2.

So the nm output of the ltrans unit is:
 T foo_v1
0010 t foo_v2
 T foo@VERS_1
0010 T foo@@VERS_2

So the difference to your patch is that foo_v2 is static which enables
normal optimizations.

Since additional symbol alias is produced this would also make it
possible to attach multiple symver attributes with @@ string.

Does something like this make sense to you? Modulo the obvious buffer
overflow issue?
Honza

Index: lto/lto-common.c
===
--- lto/lto-common.c(revision 279178)
+++ lto/lto-common.c(working copy)
@@ -2818,6 +2818,10 @@ read_cgraph_and_symbols (unsigned nfiles
   IDENTIFIER_POINTER
 (DECL_ASSEMBLER_NAME (snode->decl)));
  }
+   /* Symbol versions are always used externally, but linker does not
+  report that correctly.  */
+   else if (snode->symver && *res == LDPR_PREVAILING_DEF_IRONLY)
+ snode->resolution = LDPR_PREVAILING_DEF_IRONLY_EXP;
else
  snode->resolution = *res;
   }
Index: varasm.c
===
--- varasm.c(revision 279178)
+++ varasm.c(working copy)
@@ -5970,9 +5970,47 @@ do_assemble_symver (tree decl, tree targ
   ultimate_transparent_alias_target ();
   ultimate_transparent_alias_target ();
 #ifdef ASM_OUTPUT_SYMVER_DIRECTIVE
-  ASM_OUTPUT_SYMVER_DIRECTIVE (asm_out_file,
-  IDENTIFIER_POINTER (target),
-  IDENTIFIER_POINTER (id));
+  if (TREE_PUBLIC (target) && DECL_VISIBILITY (target) == VISIBILITY_DEFAULT)
+ASM_OUTPUT_SYMVER_DIRECTIVE (asm_out_file,
+IDENTIFIER_POINTER
+  (DECL_ASSEMBLER_NAME (target)),
+IDENTIFIER_POINTER (id));
+  else
+{
+  int nameend;
+  for (nameend = 0; IDENTIFIER_POINTER (id)[nameend] != '@'; nameend++)
+   ;
+  if (IDENTIFIER_POINTER (id)[nameend + 1] != '@'
+ || IDENTIFIER_POINTER (id)[nameend + 2] == '@')
+   {
+ sorry_at (DECL_SOURCE_LOCATION (target),
+   "can not produce % of a symbol that is "
+   "not exported with default visibility");
+ return;
+   }
+  tree tmpdecl = copy_node (decl);
+  char buf[256];
+  static int symver_labelno;
+  targetm.asm_out.generate_internal_label (buf,
+  "LSYMVER", symver_labelno++);
+  SET_DECL_ASSEMBLER_NAME (tmpdecl, get_identifier (buf));
+  globalize_decl (tmpdecl);
+#ifdef ASM_OUTPUT_DEF_FROM_DECLS
+  ASM_OUTPUT_DEF_FROM_DECLS (asm_out_file, tmpdecl,
+DECL_ASSEMBLER_NAME (target));
+#else
+  ASM_OUTPUT_DEF (asm_out_file,
+ IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (tmpdecl)),
+ IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (target)));
+#endif
+  memcpy (buf, IDENTIFIER_POINTER (id), nameend + 2);
+  buf[nameend + 2] = '@';
+  strcpy (buf + nameend + 3, IDENTIFIER_POINTER (id) + nameend + 2);
+  ASM_OUTPUT_SYMVER_DIRECTIVE (asm_out_file,
+  IDENTIFIER_POINTER
+(DECL_ASSEMBLER_NAME (tmpdecl)),
+  buf);
+}
 #else
   error ("symver is only supported on ELF platforms");
 #endif
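
On the buffer overflow point: one way to build the name@@@nodename string without a fixed-size buffer is sketched below.  This is only an illustration using std::string; to_triple_at is a hypothetical helper, not something in varasm.c, and it assumes C++17 for std::optional.

```cpp
#include <optional>
#include <string>

// Turn "name@@VERSION" into "name@@@VERSION".  Any other form
// ("name@VERSION", "name@@@VERSION", or no '@' at all) yields
// std::nullopt, mirroring the cases the patch rejects with sorry_at.
inline std::optional<std::string> to_triple_at (const std::string &id)
{
  std::string::size_type at = id.find ('@');
  if (at == std::string::npos)
    return std::nullopt;

  // Accept exactly two '@' characters at the first '@'.
  bool exactly_two = id.compare (at, 2, "@@") == 0
                     && id.compare (at, 3, "@@@") != 0;
  if (!exactly_two)
    return std::nullopt;

  // Insert one more '@'; the string grows as needed, so there is no
  // fixed buffer to overrun.
  std::string out = id;
  out.insert (at, 1, '@');
  return out;
}
```

Unlike the loop in the patch, the search here also stops cleanly when the identifier contains no '@' at all.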


Re: [Patch, Fortran] PR92896 [10 Regression] Fix - Prevent character conversion in array constructor

2019-12-17 Thread Janne Blomqvist
On Tue, Dec 17, 2019 at 7:47 PM Steve Kargl
 wrote:
>
> On Tue, Dec 17, 2019 at 05:28:05PM +, Mark Eggleston wrote:
> >
> > On 17/12/2019 17:06, Steve Kargl wrote:
> > > On Tue, Dec 17, 2019 at 03:41:41PM +, Mark Eggleston wrote:
> > >> gcc/fortran/ChangeLog
> > >>
> > >>   Mark Eggleston  
> > >>
> > >>   PR fortran/92896
> > >>   * array.c (walk_array_constructor): Replace call to 
> > >> cfg_convert_type
> > > s/cfg_convert_type/gfc_convert_type
> > >
> > >>   with call to gfc_convert_type_warn with new argument set to true.
> > >>   (check_element_type): Replace call to cfg_convert_type with call to
> > >>   gfc_convert_type_warn with new argument set to true.
> > >>   * gfortran.h: Add argument "array" to gfc_convert_type_warn default
> > >>   value set to false.
> > > Do all current uses of gfc_convert_type_warn need to be updated
> > > to account for the new parameter?  That is, doesn't this introduce
> > > a mismatch in the prototype and existing code?
> >
> > I used a default value so all existing calls remain as they are and
> > default to false. So no mismatch.
> >
>
> % cat a.h
> #ifndef _STDBOOL_H_
> #include <stdbool.h>
> #endif
> float foo(int, float, bool tmp = false);
> % cat a.c
> #include "a.h"
> void
> bar(float x)
> {
>   int n;
>   n = 1;
>   x = foo(n, x);
> }
> % /usr/home/sgk/work/x/bin/gcc -Wall -c a.c
> In file included from a.c:2:
> a.h:1:32: error: expected ';', ',' or ')' before '=' token
> 1 | float foo(int, float, bool tmp = false);
>   |^
> a.c: In function 'bar':
> a.c:8:7: warning: implicit declaration of function 'foo' 
> [-Wimplicit-function-declaration]
> 8 |   x = foo(n, x);
>   |   ^~~

Well, frontends are nowadays C++, so

a) No need to include stdbool.h, bool is a builtin type.

b) optional arguments are a thing (they are also used elsewhere in the
Fortran frontend).
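
To illustrate point b), here is the same declaration shape compiled as C++ rather than C.  The body of foo is invented purely so the example is complete; only the signature comes from the test above.

```cpp
// Same prototype shape as in a.h; the body is a hypothetical stand-in.
float foo (int n, float x, bool tmp = false)
{
  return tmp ? -x : x * n;
}

// An existing two-argument call site: it keeps compiling unchanged and
// picks up tmp == false from the default argument.
float bar (float x)
{
  int n = 1;
  return foo (n, x);
}
```

Built with g++ instead of gcc, this compiles without the error shown above, so existing callers need no update.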


-- 
Janne Blomqvist


Re: [Patch, Fortran] PR92896 [10 Regression] Fix - Prevent character conversion in array constructor

2019-12-17 Thread Steve Kargl
On Tue, Dec 17, 2019 at 08:04:30PM +0200, Janne Blomqvist wrote:
> 
> Well, frontends are nowadays C++, so
> 
> a) No need to include stdbool.h, bool is a builtin type.
> 
> b) optional arguments are a thing (they are also used elsewhere in the
> Fortran frontend).
> 

Ah, yes.  The creep of C++ into gfortran code continues.

-- 
Steve
"All Things Must Pass", G. Harrison


[WIP] OpenACC 'acc_attach*', 'acc_detach*' runtime library routines (was: [PATCH] OpenACC 2.6 manual deep copy support (attach/detach))

2019-12-17 Thread Thomas Schwinge
Hi!

On 2019-12-17T12:28:32+0100, Thomas Schwinge  wrote:
> As a first step, can you please split out just the code required to make
> the OpenACC 'acc_attach*', 'acc_detach*' runtime library routines work?

I've now simply done this myself (that is, code extraction from Julian's
patch, not any development, mind you), see the attached "[WIP] OpenACC
'acc_attach*', 'acc_detach*' runtime library routines".  15 minutes of
work, for anyone curious.

> Assuming there were no other defects in libgomp, would this already make
> the 'libgomp.oacc-c-c++-common/deep-copy-3.c',
> 'libgomp.oacc-c-c++-common/deep-copy-5.c' test cases work?

That's indeed the case.  :-)

Now, to apply some review/polish.


Grüße
 Thomas


From 19321c3dc7b96a305a51941c0a485f814af84130 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 17 Dec 2019 17:57:36 +0100
Subject: [PATCH] [WIP] OpenACC 'acc_attach*', 'acc_detach*' runtime library
 routines

---
 libgomp/libgomp.h |  10 ++
 libgomp/libgomp.map   |  10 ++
 libgomp/oacc-mem.c|  85 
 libgomp/openacc.h |   6 +
 libgomp/target.c  | 130 ++
 .../libgomp.oacc-c-c++-common/deep-copy-3.c   |  34 +
 .../libgomp.oacc-c-c++-common/deep-copy-5.c   |  81 +++
 7 files changed, 356 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-3.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-5.c

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index d65a1fa250b..56225c1482b 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -994,6 +994,9 @@ struct target_mem_desc {
 struct splay_tree_aux {
   /* Pointer to the original mapping of "omp declare target link" object.  */
   splay_tree_key link_key;
+  /* For a block with attached pointers, the attachment counters for each.
+ Only used for OpenACC.  */
+  uintptr_t *attach_count;
 };
 
 struct splay_tree_key_s {
@@ -1155,6 +1158,13 @@ extern void gomp_copy_dev2host (struct gomp_device_descr *,
 struct goacc_asyncqueue *, void *, const void *,
 size_t);
 extern uintptr_t gomp_map_val (struct target_mem_desc *, void **, size_t);
+extern void gomp_attach_pointer (struct gomp_device_descr *,
+ struct goacc_asyncqueue *, splay_tree,
+ splay_tree_key, uintptr_t, size_t,
+ struct gomp_coalesce_buf *);
+extern void gomp_detach_pointer (struct gomp_device_descr *,
+ struct goacc_asyncqueue *, splay_tree_key,
+ uintptr_t, bool, struct gomp_coalesce_buf *);
 
 extern struct target_mem_desc *gomp_map_vars (struct gomp_device_descr *,
 	  size_t, void **, void **,
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index e9a0e059a30..1b7022b38c7 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -484,6 +484,16 @@ OACC_2.5.1 {
 	acc_register_library;
 } OACC_2.5;
 
+OACC_2.6 {
+  global:
+	acc_attach;
+	acc_attach_async;
+	acc_detach;
+	acc_detach_async;
+	acc_detach_finalize;
+	acc_detach_finalize_async;
+} OACC_2.5.1;
+
 GOACC_2.0 {
   global:
 	GOACC_data_end;
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 297a4e5806c..b76dfc44ca1 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -918,6 +918,91 @@ acc_update_self_async (void *h, size_t s, int async)
 }
 
 
+void
+acc_attach_async (void **hostaddr, int async)
+{
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
+  goacc_aq aq = get_goacc_asyncqueue (async);
+
+  struct splay_tree_key_s cur_node;
+  splay_tree_key n;
+
+  if (thr->dev->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
+return;
+
+  gomp_mutex_lock (&acc_dev->lock);
+
+  cur_node.host_start = (uintptr_t) hostaddr;
+  cur_node.host_end = cur_node.host_start + sizeof (void *);
+  n = splay_tree_lookup (&acc_dev->mem_map, &cur_node);
+
+  if (n == NULL)
+    gomp_fatal ("struct not mapped for acc_attach");
+
+  gomp_attach_pointer (acc_dev, aq, &acc_dev->mem_map, n, (uintptr_t) hostaddr,
+		   0, NULL);
+
+  gomp_mutex_unlock (&acc_dev->lock);
+}
+
+void
+acc_attach (void **hostaddr)
+{
+  acc_attach_async (hostaddr, acc_async_sync);
+}
+
+static void
+goacc_detach_internal (void **hostaddr, int async, bool finalize)
+{
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
+  struct splay_tree_key_s cur_node;
+  splay_tree_key n;
+  struct goacc_asyncqueue *aq = get_goacc_asyncqueue (async);
+
+  if (thr->dev->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
+return;
+
+  gomp_mutex_lock (&acc_dev->lock);
+
+  cur_node.host_start = (uintptr_t) hostaddr;
+  cur_node.host_end = cur_node.host_start + sizeof (void *);
+  n = splay_tree_lookup (&acc_dev->mem_map, &cur_node);
+
+  if (n == NULL)
+    gomp_fatal ("struct not mapped for acc_detach");
+
+  gomp_detach_pointer (acc_dev, aq, n, (uintptr_t) hostaddr, finalize, NULL);
+
+  gomp_mutex_unlock (&acc_dev->lock);
+}
+
+void

Re: [Patch] Add OpenACC 2.6's no_create

2019-12-17 Thread Tobias Burnus

Hi Thomas,

I am reasonably comfortable with the current patch (regarding your 
TODOs) – see attachment. It is the previous patch plus your changes plus 
one additional condition (see below) in target.c's first 
GOMP_MAP_IF_PRESENT handling.


I intend to re-test it tomorrow and then commit it, unless some other 
issues or comments come up.  See a bunch of comments below.


Cheers,

Tobias

On 12/3/19 4:16 PM, Thomas Schwinge wrote:

So that's specifically what you fixed above
(See previous reply in this email. Now added an acc_is_present check. 
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00156.html)
Another thing: I've added just another little bit of testsuite 
coverage, and another thing broke. See "TODO" in attached incremental 
patch. […]
Files included, the other issue was XFAILed by you (and hence passed). A 
fix for that issue is: 
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01135.html — and a 
completely separate issue. (That patch is small, very localized and 
orthogonal to this patch.)

The incremental Fortran test case changes have been done in a rush; not
sure if they make much sense, or should see some further work applied to
them.


I think one can do more, but they are fine. I am not 100% sure how to 
read the following:


  ! The no_create clause is meant for partially shared-memory machines.  This
  ! test is written to work on non-shared-memory machines, though this is not
  ! necessarily a useful way to use the no_create clause in practice.
  !$acc parallel !no_create (var)

First, why is 'no_create(var)' now commented out?  For this code, it should 
really work both ways, independent of whether the commented-out variant boils 
down to 'copy' (currently) or 'present' (with my other patch, linked above).



With these items considered/addressed as you feel comfortable, this is OK
for trunk.



My TODO items:

--- libgomp/target.c
+++ libgomp/target.c
@@ -671,6 +671,7 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
}
else if ((kind & typemask) == GOMP_MAP_IF_PRESENT)
{
+ //TODO TS is confused.  Handling this here, will inhibit 
'gomp_map_vars_existing' being used a bit further below.
  tgt->list[i].key = NULL;
  tgt->list[i].offset = 0;
  has_firstprivate = true;


True, but should it? The only effect seems to be that it bumps the ref 
count. (Should it or shouldn't it?) In any case, if the data is not 
present, it will fail in this section.


However, I think the following is missing before 'continue' – even 
though testing did not hit it:


  /* Handle the attach/pointer clause next to it later, together with
 GOMP_MAP_IF_PRESENT as the data might be not available.  */
  if (i + 1 < mapnum
  && ((typemask & get_kind (short_mapkind, kinds, i + 1))
  == GOMP_MAP_POINTER))
++i;


@@ -908,6 +910,7 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
  splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);
  if (n != NULL)
{
+ //TODO TS is confused.  Due to the way the handling of 
'GOMP_MAP_NO_ALLOC' is done in the first loop, we're here re-doing 
'gomp_map_vars_existing'?
  tgt->list[i].key = n;
  tgt->list[i].offset = cur_node.host_start - n->host_start;
  tgt->list[i].length = n->host_end - n->host_start;
Essentially, yes – except that we know here that the variable does exist 
– in the block above, it also works, but only if the variable has been 
mapped at some point.

@@ -917,6 +920,7 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
}
  else
{
+ //TODO This is basically 'GOMP_MAP_FIRSTPRIVATE_INT' 
handling?
  tgt->list[i].key = NULL;
  tgt->list[i].offset = OFFSET_INLINED;
  tgt->list[i].length = sizes[i];
Yes – but one could also call it 'hostaddrs[i] == NULL' handling, which 
makes more sense semantically.

@@ -928,6 +932,11 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
  switch (kind2 & typemask)
{
case GOMP_MAP_POINTER:
+ //TODO abort();
+ //TODO This code path is exercised by 
'libgomp.oacc-fortran/no_create-2.f90'.
+ //TODO TS does not yet understand why this is 
needed.
+ //TODO Is this somehow similar to 
'GOMP_MAP_TO_PSET' handling?
+
  /* The data is not present but we have an attach
 or pointer clause next.  Skip over it.  */
  i++;


Yes, as -fdump-tree-omplower shows, it is handled like a normal map, 
except that the variable itself gets a 'no_alloc'.



Re: [Patch, Fortran] PR92896 [10 Regression] Fix - Prevent character conversion in array constructor

2019-12-17 Thread Mark Eggleston



On 17/12/2019 17:06, Steve Kargl wrote:

On Tue, Dec 17, 2019 at 03:41:41PM +, Mark Eggleston wrote:

gcc/fortran/ChangeLog

      Mark Eggleston  

      PR fortran/92896
      * array.c (walk_array_constructor): Replace call to cfg_convert_type

s/cfg_convert_type/gfc_convert_type


      with call to gfc_convert_type_warn with new argument set to true.
      (check_element_type): Replace call to cfg_convert_type with call to
      gfc_convert_type_warn with new argument set to true.
      * gfortran.h: Add argument "array" to gfc_convert_type_warn default
      value set to false.

Do all current uses of gfc_convert_type_warn need to be updated
to account for the new parameter?  That is, doesn't this introduce
a mismatch in the prototype and existing code?


I used a default value so all existing calls remain as they are and 
default to false. So no mismatch.
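
The default-value approach described above can be sketched as follows. This is a minimal stand-alone illustration, not the gfortran code: `convert_type_warn`, its parameters and both callers are hypothetical stand-ins for `gfc_convert_type_warn` and its call sites.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a conversion routine that gained a trailing
// parameter.  The default value keeps every existing call site valid.
std::string convert_type_warn (int kind, bool array = false)
{
  assert (kind > 0);
  // Pretend array-constructor elements take a different conversion path.
  return (array ? "array:" : "scalar:") + std::to_string (kind);
}

// An existing caller, written before the parameter was added.
std::string old_caller ()
{
  return convert_type_warn (4);
}

// A new caller that opts in to the array behaviour.
std::string new_caller ()
{
  return convert_type_warn (4, true);
}
```

Only the declaration needs to carry the default; callers that omit the argument compile unchanged and get `false`.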


regards Mark

--
https://www.codethink.co.uk/privacy.html



Re: [PATCH] add -Wmismatched-tags (PR 61339)

2019-12-17 Thread Jason Merrill

On 12/16/19 6:31 PM, Martin Sebor wrote:

+  class_decl_loc_t *rdl = class2loc.get (type_decl);
+  if (!rdl)
+{
+  rdl = &class2loc.get_or_insert (type_decl);


I was thinking

class_decl_loc_t *rdl = &class2loc.get_or_insert (type_decl);

OK with that change.

Jason



[C++ PATCH] Avoid weird inform without previos error during SFINAE (PR c++/92965)

2019-12-17 Thread Jakub Jelinek
Hi!

On the following testcase, complain & tf_error is 0 during sfinae, so we
don't emit error, but we called structural_type_p with explain=true anyway,
which emitted the inform messages.
Fixed by doing it only when we emit the error.
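
A minimal sketch of the idea, assuming a toy diagnostics sink: none of the names below are GCC's, and the real check is of course far richer. It only shows that the explanatory notes are requested on the same path that emits the error, so a SFINAE probe stays silent.

```cpp
#include <string>
#include <vector>

// Hypothetical miniature of the diagnostic flow; these are not GCC's names.
enum complain_flags { tf_none = 0, tf_error = 1 };

struct diagnostics
{
  std::vector<std::string> messages;
};

// Returns whether the type is acceptable.  When EXPLAIN is true it also
// records a note saying why it is not.
bool is_structural (bool has_private_member, bool explain, diagnostics &out)
{
  if (!has_private_member)
    return true;
  if (explain)
    out.messages.push_back ("note: member is not public");
  return false;
}

// Returns true if the type is invalid.  Notes are only requested on the
// path that also emits the error.
bool invalid_parm_type_p (bool has_private_member, int complain,
                          diagnostics &out)
{
  if (is_structural (has_private_member, false, out))
    return false;
  if (complain & tf_error)
    {
      out.messages.push_back ("error: type is not structural");
      is_structural (has_private_member, true, out);
    }
  return true;
}
```

With `tf_none` the function still reports the type as invalid but records nothing; with `tf_error` it records the error followed by the note.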

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

BTW, is the testcase valid in C++17 mode?  GCC 7/8/9/trunk accept it,
but clang++ rejects it.

2019-12-17  Jakub Jelinek  

PR c++/92965
* pt.c (invalid_nontype_parm_type_p): Call structural_type_p with
explain=true only if emitting error.

* g++.dg/cpp2a/nontype-class27.C: New test.

--- gcc/cp/pt.c.jj  2019-12-11 18:19:03.188162534 +0100
+++ gcc/cp/pt.c 2019-12-17 14:21:48.903024760 +0100
@@ -25829,11 +25829,13 @@ invalid_nontype_parm_type_p (tree type,
return true;
   if (!structural_type_p (type))
{
- auto_diagnostic_group d;
  if (complain & tf_error)
-   error ("%qT is not a valid type for a template non-type parameter "
-  "because it is not structural", type);
- structural_type_p (type, true);
+   {
+ auto_diagnostic_group d;
+ error ("%qT is not a valid type for a template non-type "
+"parameter because it is not structural", type);
+ structural_type_p (type, true);
+   }
  return true;
}
   return false;
--- gcc/testsuite/g++.dg/cpp2a/nontype-class27.C.jj 2019-12-17 14:35:42.339473136 +0100
+++ gcc/testsuite/g++.dg/cpp2a/nontype-class27.C 2019-12-17 14:26:13.461040058 +0100
@@ -0,0 +1,15 @@
+// PR c++/92965
+// { dg-do compile { target c++2a } }
+
+template
+class TS {
+  int x;   // { dg-bogus "is not public" }
+public:
+  constexpr TS(int) {}
+};
+TS(int) -> TS<1>;
+
+template void foo() {}
+template void foo() {}
+
+void test() { foo<2>(); }

Jakub



[C++ PATCH] Fix bad defaulted comparison operator error recovery (PR c++/92966)

2019-12-17 Thread Jakub Jelinek
Hi!

When the prototype of a defaulted comparison operator is incorrect, we set
DECL_MAYBE_DELETED, but don't set DECL_DEFAULTED_FN and other flags, so we
ICE during synthesize_method.

It seems that marking as DECL_MAYBE_DELETED only those operators that we are
also going to mark DECL_DEFAULTED_FN results in better error recovery.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-12-17  Jakub Jelinek  

PR c++/92966
* method.c (early_check_defaulted_comparison): Don't set
DECL_MAYBE_DELETED when returning false.

* g++.dg/cpp2a/spaceship-eq8.C: New test.

--- gcc/cp/method.c.jj  2019-12-11 18:19:03.185162579 +0100
+++ gcc/cp/method.c 2019-12-17 15:28:57.819285145 +0100
@@ -1146,7 +1146,7 @@ early_check_defaulted_comparison (tree f
 }
 
   /* We still need to deduce deleted/constexpr/noexcept and maybe return. */
-  DECL_MAYBE_DELETED (fn) = true;
+  DECL_MAYBE_DELETED (fn) = ok;
 
   return ok;
 }
--- gcc/testsuite/g++.dg/cpp2a/spaceship-eq8.C.jj   2019-12-17 15:46:35.390322157 +0100
+++ gcc/testsuite/g++.dg/cpp2a/spaceship-eq8.C  2019-12-17 15:46:31.493380971 +0100
@@ -0,0 +1,8 @@
+// PR c++/92966
+// { dg-do compile { target c++2a } }
+
+struct S {
+  int operator==(const S&) const = default;// { dg-error "must return 'bool'" }
+  int s;   // { dg-message "declared here" "" { target *-*-* } .-1 }
+};
+static_assert(S{} == S{}); // { dg-error "" }

Jakub



[C++ PATCH] Disallow defaulted comparison operators in C++11-17 modes (PR c++/92973)

2019-12-17 Thread Jakub Jelinek
Hi!

As discussed on IRC, defaulted comparison operators were added only in
C++2a, so we shouldn't accept them in older standard modes.
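
For code that has to build in both older and newer modes, one option is to guard the defaulted form with the standard feature-test macro. This is only a sketch under that assumption; the struct `S` and the helper `same` are illustrative and not part of the patch.

```cpp
// Use the defaulted operator only when the compiler advertises support for
// C++20 defaulted comparisons; otherwise spell the comparison out by hand.
struct S
{
  int s;
#if defined(__cpp_impl_three_way_comparison) \
    && __cpp_impl_three_way_comparison >= 201907L
  bool operator== (const S &) const = default;
#else
  bool operator== (const S &other) const { return s == other.s; }
#endif
};

// Hypothetical helper so both variants are exercised the same way.
bool same (const S &a, const S &b)
{
  return a == b;
}
```

Both branches give member-wise equality for this struct, so callers see the same behaviour regardless of the selected standard mode.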

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-12-17  Jakub Jelinek  

PR c++/92973
* method.c (early_check_defaulted_comparison): For C++17 and earlier
diagnose defaulted comparison operators.

* g++.dg/cpp0x/spaceship-eq1.C: New test.

--- gcc/cp/method.c.jj  2019-12-17 15:28:57.819285145 +0100
+++ gcc/cp/method.c 2019-12-17 16:21:24.462870308 +0100
@@ -1092,6 +1092,13 @@ early_check_defaulted_comparison (tree f
 ctx = DECL_FRIEND_CONTEXT (fn);
   bool ok = true;
 
+  if (cxx_dialect < cxx2a)
+{
+  error_at (loc, "defaulted %qD only available with %<-std=c++2a%> or "
+"%<-std=gnu++2a%>", fn);
+  return false;
+}
+
   if (!DECL_OVERLOADED_OPERATOR_IS (fn, SPACESHIP_EXPR)
   && !same_type_p (TREE_TYPE (TREE_TYPE (fn)), boolean_type_node))
 {
--- gcc/testsuite/g++.dg/cpp0x/spaceship-eq1.C.jj   2019-12-17 16:24:55.303697061 +0100
+++ gcc/testsuite/g++.dg/cpp0x/spaceship-eq1.C  2019-12-17 16:24:29.070091886 +0100
@@ -0,0 +1,5 @@
+// PR c++/92973
+// { dg-do compile { target c++11 } }
+
+struct S { bool operator==(const S&) const = default; int s; };// { dg-error "only available with" "" { target c++17_down } }
+struct T { bool operator!=(const T&) const = default; int t; };// { dg-error "only available with" "" { target c++17_down } }

Jakub



Re: [modulo-sched][PATCH] Fix PR92591

2019-12-17 Thread Roman Zhuykov

Hello.


As pointed out in the PR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92591#c1, the test can be
fixed by DFA-checking more adjacent row sequences in the partial
schedule.
I've found that on powerpc64 the gcc.c-torture/execute/pr61682.c test
catches the same issue with -Os -fmodulo-sched-allow-regmoves and some
non-zero sms-dfa-history parameter values, so I added that test to the
patch as a second test using #include.
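
To make "checking more adjacent row sequences" concrete, here is a small stand-alone sketch that enumerates the cycle windows examined by the loop in the patch posted in this message. The function name `windows_to_check` is hypothetical; the real code calls ps_has_conflicts on each window instead of collecting them.

```cpp
#include <utility>
#include <vector>

// Enumerate the [from, to] cycle windows that would be DFA-checked after
// placing an instruction at cycle C, for history HISTORY and initiation
// interval II.  At most II windows are produced.
std::vector<std::pair<int, int>> windows_to_check (int c, int history, int ii)
{
  std::vector<std::pair<int, int>> windows;
  if (history <= 0)
    return windows;

  int first = c - history;
  int amount = 2 * history + 1;
  if (amount > ii)
    amount = ii;

  for (int i = first; i < first + amount; i++)
    windows.push_back ({i - history, i + history});
  return windows;
}
```

For c = 10, history = 1 and ii = 8 this yields the windows 8..10, 9..11 and 10..12, where the earlier code only looked at 9..11.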

Minor separate patch about modulo-sched parameters is also attached.
If there are no objections, I'll commit these two patches to trunk tomorrow
together with my PR90001 fix.

Trunk and the 8/9 branches were successfully regstrapped on x64, and
cross-compiler check-gcc was tested on ppc, ppc64, arm, aarch64, ia64 and
s390. A lot of testing was also done with the default
sms-dfa-history value changed to something other than zero.


I think this should be backported to the 9 and 8 branches, because the second 
example gives an ICE there.  But I'm not sure about backporting the 
sms-dfa-history upper bound (<=16) in params.def to the 
branches.  Compile time may grow dramatically for huge values like 1000, 
so we have to limit it.  Is it OK to limit the parameter, or would it be 
better to implement some "history=MIN(history, 16)" logic in 
modulo-sched.c ?
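
The "history=MIN(history, 16)" alternative could look roughly like this. It is a sketch only: `max_dfa_history` and `effective_history` are hypothetical names, and 16 is simply the bound proposed in this message.

```cpp
#include <algorithm>

// Hypothetical upper bound; 16 is the value proposed in this thread.
constexpr int max_dfa_history = 16;

// Clamp the user-requested history instead of rejecting large values.
int effective_history (int requested)
{
  return std::min (requested, max_dfa_history);
}
```

Compared with limiting the parameter in params.def, this silently accepts values such as 1000 and treats them as 16, which avoids changing which command lines are accepted on a release branch.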


I see that sometimes parameter limitation is backported, examples are:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80663
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79576

While at it, maybe you have some thoughts about the selected value of 16.  
The maximum reasonable value for the sms-dfa-history param seems to be the 
maximum latency between two insns on the target platform (calculated by the 
dep_cost function in haifa-sched.c).


I'm posting the full backport patch here; it suits the 8/9 branches. Jakub and 
Richard, is it OK ?


Roman

Backport from mainline
gcc/ChangeLog:

2019-12-17  Roman Zhuykov  

* modulo-sched.c (ps_add_node_check_conflicts): Improve checking
for history > 0 case.
* params.def (sms-dfa-history): Limit to 16.

gcc/testsuite/ChangeLog:

2019-12-17  Roman Zhuykov  

* gcc.dg/pr92951-1.c: New test.
* gcc.dg/pr92951-2.c: New test.


diff --git a/gcc/modulo-sched.c b/gcc/modulo-sched.c
--- a/gcc/modulo-sched.c
+++ b/gcc/modulo-sched.c
@@ -3209,7 +3209,7 @@ ps_add_node_check_conflicts (partial_schedule_ptr ps, int n,
 int c, sbitmap must_precede,
 sbitmap must_follow)
 {
-  int has_conflicts = 0;
+  int i, first, amount, has_conflicts = 0;
   ps_insn_ptr ps_i;

   /* First add the node to the PS, if this succeeds check for
@@ -3217,23 +3217,32 @@ ps_add_node_check_conflicts (partial_schedule_ptr ps, int n,
   if (! (ps_i = add_node_to_ps (ps, n, c, must_precede, must_follow)))
 return NULL; /* Failed to insert the node at the given cycle.  */

-  has_conflicts = ps_has_conflicts (ps, c, c)
- || (ps->history > 0
- && ps_has_conflicts (ps,
-  c - ps->history,
-  c + ps->history));
-
-  /* Try different issue slots to find one that the given node can be
- scheduled in without conflicts.  */
-  while (has_conflicts)
+  while (1)
 {
+  has_conflicts = ps_has_conflicts (ps, c, c);
+  if (ps->history > 0 && !has_conflicts)
+   {
+ /* Check all 2h+1 intervals, starting from c-2h..c up to c..2h,
+but not more than ii intervals.  */
+ first = c - ps->history;
+ amount = 2 * ps->history + 1;
+ if (amount > ps->ii)
+   amount = ps->ii;
+ for (i = first; i < first + amount; i++)
+   {
+ has_conflicts = ps_has_conflicts (ps,
+   i - ps->history,
+   i + ps->history);
+ if (has_conflicts)
+   break;
+   }
+   }
+  if (!has_conflicts)
+   break;
+  /* Try different issue slots to find one that the given node can be
+scheduled in without conflicts.  */
   if (! ps_insn_advance_column (ps, ps_i, must_follow))
break;
-  has_conflicts = ps_has_conflicts (ps, c, c)
- || (ps->history > 0
- && ps_has_conflicts (ps,
-  c - ps->history,
-  c + ps->history));
 }

   if (has_conflicts)
diff --git a/gcc/testsuite/gcc.dg/pr92951-1.c b/gcc/testsuite/gcc.dg/pr92951-1.c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr92951-1.c
@@ -0,0 +1,11 @@
+/* PR rtl-optimization/92591 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fmodulo-sched -fweb -fno-dce -fno-ivopts -fno-sched-pressure -fno-tree-loop-distribute-patterns --param sms-dfa-history=1" } */
+/* { dg-additional-options "-mcpu=e500mc" { target { powerpc-*-* } } } */
+
+void
+wf (char *mr, int tc)
+{
+  while (tc-- > 0)
+*mr++ = 0;

[PATCH] Avoid suspicious -Wduplicate-branches warning in lto-wrapper.c (PR lto/92972)

2019-12-17 Thread Jakub Jelinek
Hi!

big ? "-fno-pie" : "-fno-pie" doesn't make much sense.  Either we want to
use big ? "-fno-PIE" : "-fno-pie" or, as both mean the same thing, just
"-fno-pie", which I think is good enough.  Plus a few formatting nits and one
comment typo.
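
The simplification can be sketched in isolation as below. `canonical_pie_option` is a hypothetical helper written for illustration; the real code assigns to canonical_option[0] inside merge_and_complain.

```cpp
#include <string>

// Hypothetical helper mirroring the option-name selection.  BIG means that
// -fPIE rather than -fpie was requested.
std::string canonical_pie_option (bool enabled, bool big)
{
  if (enabled)
    return big ? "-fPIE" : "-fpie";
  // Both negated spellings mean the same thing, so no conditional is
  // needed here; a ternary with identical arms is what triggered
  // -Wduplicated-branches.
  return "-fno-pie";
}
```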

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-12-17  Jakub Jelinek  

PR lto/92972
* lto-wrapper.c (merge_and_complain): Use just "-fno-pie" instead of
big ? "-fno-pie" : "-fno-pie".  Formatting fixes.  Fix comment typo.

--- gcc/lto-wrapper.c.jj2019-09-11 13:36:14.057264373 +0200
+++ gcc/lto-wrapper.c   2019-12-17 12:28:36.135056568 +0100
@@ -408,7 +408,7 @@ merge_and_complain (struct cl_decoded_op
   /* Merge PIC options:
   -fPIC + -fpic = -fpic
   -fPIC + -fno-pic = -fno-pic
-  -fpic/-fPIC + nothin = nothing.  
+  -fpic/-fPIC + nothing = nothing.
  It is a common mistake to mix few -fPIC compiled objects into otherwise
  non-PIC code.  We do not want to build everything with PIC then.
 
@@ -438,9 +438,10 @@ merge_and_complain (struct cl_decoded_op
   && pie_option->opt_index == OPT_fPIE;
(*decoded_options)[j].opt_index = big ? OPT_fPIE : OPT_fpie;
if (pie_option->value)
- (*decoded_options)[j].canonical_option[0] = big ? "-fPIE" : "-fpie";
+ (*decoded_options)[j].canonical_option[0]
+   = big ? "-fPIE" : "-fpie";
else
- (*decoded_options)[j].canonical_option[0] = big ? "-fno-pie" : "-fno-pie";
+ (*decoded_options)[j].canonical_option[0] = "-fno-pie";
(*decoded_options)[j].value = pie_option->value;
j++;
  }
@@ -482,7 +483,7 @@ merge_and_complain (struct cl_decoded_op
  {
(*decoded_options)[j].opt_index = OPT_fpie;
(*decoded_options)[j].canonical_option[0]
-= pic_option->value ? "-fpie" : "-fno-pie";
+ = pic_option->value ? "-fpie" : "-fno-pie";
  }
else if (!pic_option->value)
  (*decoded_options)[j].canonical_option[0] = "-fno-pie";

Jakub



[C++ PATCH] * name-lookup.c (get_std_name_hint): Add std::byte.

2019-12-17 Thread Jason Merrill
I noticed we didn't have a hint for std::byte yet.
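
For readers unfamiliar with the hint table, here is a rough stand-alone sketch of the idea: a name maps to the header that declares it and to the earliest dialect that provides it. The types and the `lookup_hint` function are hypothetical and much simpler than what name-lookup.c actually does.

```cpp
#include <cstring>

// Hypothetical dialect ordering, oldest first.
enum dialect { cxx98, cxx11, cxx14, cxx17 };

struct std_name_hint
{
  const char *name;       // unqualified name within std
  const char *header;     // header that declares it
  dialect min_dialect;    // earliest dialect that provides it
};

static const std_name_hint hints[] = {
  {"byte", "<cstddef>", cxx17},
  {"deque", "<deque>", cxx98},
};

// Return the hint for NAME, or nullptr if there is none.
const std_name_hint *lookup_hint (const char *name)
{
  for (const std_name_hint &h : hints)
    if (std::strcmp (h.name, name) == 0)
      return &h;
  return nullptr;
}
```

A caller can then either suggest the header or, when the active dialect is older than `min_dialect`, mention the required standard instead, which is what the two dg-message lines in the new test check for.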

Tested x86_64-pc-linux-gnu, applying to trunk.

---
 gcc/cp/name-lookup.c| 2 ++
 gcc/testsuite/g++.dg/lookup/missing-std-include-9.C | 3 +++
 2 files changed, 5 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/lookup/missing-std-include-9.C

diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index e64cd9a9d66..181dad0e2f2 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -5641,6 +5641,8 @@ get_std_name_hint (const char *name)
 /* . */
 {"condition_variable", "", cxx11},
 {"condition_variable_any", "", cxx11},
+/* .  */
+{"byte", "", cxx17},
 /* .  */
 {"deque", "", cxx98},
 /* .  */
diff --git a/gcc/testsuite/g++.dg/lookup/missing-std-include-9.C b/gcc/testsuite/g++.dg/lookup/missing-std-include-9.C
new file mode 100644
index 000..f8e1e1dd8a7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/lookup/missing-std-include-9.C
@@ -0,0 +1,3 @@
+std::byte b;// { dg-error "byte" }
+// { dg-message "cstddef" "" { target c++17 } .-1 }
+// { dg-message "C..17" "" { target c++14_down } .-2 }

base-commit: ada5a6defe4bb8a68098bed51d0f22fc78d7efbc
-- 
2.18.1



[C++ PATCH] PR c++/79592 - missing explanation of invalid constexpr.

2019-12-17 Thread Jason Merrill
We changed months back to use the pre-generic form for constexpr evaluation,
but explain_invalid_constexpr_fn was still using DECL_SAVED_TREE.  This
mostly works, but misses some issues due to folding.  So with this patch we
save the pre-generic form of constexpr functions even when we know they
can't produce a constant result.

Tested x86_64-pc-linux-gnu, applying to trunk.

* constexpr.c (register_constexpr_fundef): Do store the body of a
template instantiation that is not potentially constant.
(explain_invalid_constexpr_fn): Look it up.
(cxx_eval_call_expression): Check fundef->result.
---
 gcc/cp/constexpr.c| 27 +--
 gcc/testsuite/g++.dg/cpp0x/constexpr-nsdmi1.C | 12 +
 2 files changed, 31 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-nsdmi1.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index f3f03e7d621..87d78d26728 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -885,16 +885,16 @@ register_constexpr_fundef (tree fun, tree body)
   return NULL;
 }
 
-  if (!potential_rvalue_constant_expression (massaged))
-{
-  if (!DECL_GENERATED_P (fun))
-   require_potential_rvalue_constant_expression (massaged);
-  return NULL;
-}
+  bool potential = potential_rvalue_constant_expression (massaged);
+  if (!potential && !DECL_GENERATED_P (fun))
+require_potential_rvalue_constant_expression (massaged);
 
   if (DECL_CONSTRUCTOR_P (fun)
   && cx_check_missing_mem_inits (DECL_CONTEXT (fun),
 massaged, !DECL_GENERATED_P (fun)))
+potential = false;
+
+  if (!potential && !DECL_GENERATED_P (fun))
 return NULL;
 
   /* Create the constexpr function table if necessary.  */
@@ -917,6 +917,12 @@ register_constexpr_fundef (tree fun, tree body)
   if (clear_ctx)
 DECL_CONTEXT (DECL_RESULT (fun)) = NULL_TREE;
 
+  if (!potential)
+/* For a template instantiation, we want to remember the pre-generic body
+   for explain_invalid_constexpr_fn, but do tell cxx_eval_call_expression
+   that it doesn't need to bother trying to expand the function.  */
+entry.result = error_mark_node;
+
   gcc_assert (*slot == NULL);
   *slot = ggc_alloc ();
   **slot = entry;
@@ -962,11 +968,15 @@ explain_invalid_constexpr_fn (tree fun)
 {
   /* Then if it's OK, the body.  */
   if (!DECL_DECLARED_CONSTEXPR_P (fun)
- && !LAMBDA_TYPE_P (CP_DECL_CONTEXT (fun)))
+ && DECL_DEFAULTED_FN (fun))
explain_implicit_non_constexpr (fun);
   else
{
- body = massage_constexpr_body (fun, DECL_SAVED_TREE (fun));
+ if (constexpr_fundef *fd = retrieve_constexpr_fundef (fun))
+   body = fd->body;
+ else
+   body = DECL_SAVED_TREE (fun);
+ body = massage_constexpr_body (fun, body);
  require_potential_rvalue_constant_expression (body);
  if (DECL_CONSTRUCTOR_P (fun))
cx_check_missing_mem_inits (DECL_CONTEXT (fun), body, true);
@@ -1919,6 +1929,7 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
 {
   new_call.fundef = retrieve_constexpr_fundef (fun);
   if (new_call.fundef == NULL || new_call.fundef->body == NULL
+ || new_call.fundef->result == error_mark_node
  || fun == current_function_decl)
 {
  if (!ctx->quiet)
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-nsdmi1.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-nsdmi1.C
new file mode 100644
index 000..b94cf3099f4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-nsdmi1.C
@@ -0,0 +1,12 @@
+// PR c++/79592
+// { dg-do compile { target c++11 } }
+
+struct pthread_mutex {
+  void *m_ptr;
+};
+
+struct M {
+  pthread_mutex m = { ((void *) 1LL) }; // { dg-error "reinterpret_cast" }
+};
+
+constexpr M m; // { dg-error "M::M" }

base-commit: 7484780e06a758b7a02f270ae8af30fd2c11fdef
-- 
2.18.1



[C++ PATCH] PR c++/92576 - redeclaration of variable template.

2019-12-17 Thread Jason Merrill
The variable templates patch way back when forgot to add handling here.  The
simplest answer seems to be recursing to the underlying declaration.

Tested x86_64-pc-linux-gnu, applying to trunk.

* decl.c (redeclaration_error_message): Recurse for variable
templates.
---
 gcc/cp/decl.c| 16 +---
 gcc/testsuite/g++.dg/cpp1y/var-templ32.C |  2 +-
 gcc/testsuite/g++.dg/cpp1y/var-templ65.C |  5 +
 3 files changed, 11 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/var-templ65.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 6dec5838303..86717dc8fed 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -2977,20 +2977,14 @@ redeclaration_error_message (tree newdecl, tree olddecl)
 {
   tree nt, ot;
 
-  if (TREE_CODE (DECL_TEMPLATE_RESULT (newdecl)) == TYPE_DECL)
-   {
- if (COMPLETE_TYPE_P (TREE_TYPE (newdecl))
- && COMPLETE_TYPE_P (TREE_TYPE (olddecl)))
-   return G_("redefinition of %q#D");
- return NULL;
-   }
-
   if (TREE_CODE (DECL_TEMPLATE_RESULT (newdecl)) == CONCEPT_DECL)
 return G_("redefinition of %q#D");
 
-  if (TREE_CODE (DECL_TEMPLATE_RESULT (newdecl)) != FUNCTION_DECL
- || (DECL_TEMPLATE_RESULT (newdecl)
- == DECL_TEMPLATE_RESULT (olddecl)))
+  if (TREE_CODE (DECL_TEMPLATE_RESULT (newdecl)) != FUNCTION_DECL)
+   return redeclaration_error_message (DECL_TEMPLATE_RESULT (newdecl),
+   DECL_TEMPLATE_RESULT (olddecl));
+
+  if (DECL_TEMPLATE_RESULT (newdecl) == DECL_TEMPLATE_RESULT (olddecl))
return NULL;
 
   nt = DECL_TEMPLATE_RESULT (newdecl);
diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ32.C b/gcc/testsuite/g++.dg/cpp1y/var-templ32.C
index 80077a16b56..6767ff1d9c6 100644
--- a/gcc/testsuite/g++.dg/cpp1y/var-templ32.C
+++ b/gcc/testsuite/g++.dg/cpp1y/var-templ32.C
@@ -4,4 +4,4 @@ template
 bool V1 = true;
 
 template
-bool V1 = false; // { dg-error "primary template|not deducible" }
+bool V1 = false; // { dg-error "primary template|redefinition|not deducible" }
diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ65.C b/gcc/testsuite/g++.dg/cpp1y/var-templ65.C
new file mode 100644
index 000..10398bb793f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/var-templ65.C
@@ -0,0 +1,5 @@
+// PR c++/84255
+// { dg-do compile { target c++14 } }
+
+template constexpr int var;
+template constexpr int var = 1; // { dg-error "redefinition" }

base-commit: adbad0a15e032b7be2d89f7bff0590714fe05476
-- 
2.18.1



Re: [PATCH] V10 patch #4, Add new prefixed/non-prefixed memory constraints

2019-12-17 Thread Michael Meissner
On Tue, Dec 17, 2019 at 11:15:29AM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Dec 11, 2019 at 07:29:05PM -0500, Michael Meissner wrote:
> > +(define_memory_constraint "em"
> > +  "A memory operand that does not contain a prefixed address."
> > +  (and (match_code "mem")
> > +   (match_operand 0 "non_prefixed_memory")))
> > +
> > +(define_memory_constraint "ep"
> > +  "A memory operand that does contains a prefixed address."
> > +  (and (match_code "mem")
> > +   (match_operand 0 "prefixed_memory")))
> 
> "does contain".  Or maybe just say "with a non-prefixed address" and
> "with a prefixed address"?

Ok.

> > +;; Return true if the operand is a valid memory address that does not use a
> > +;; prefixed address.
> > +(define_predicate "non_prefixed_memory"
> > +  (match_code "mem")
> > +{
> > +  enum insn_form iform
> > += address_to_insn_form (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
> > +
> > +  return (iform != INSN_FORM_BAD
> > +  && iform != INSN_FORM_PREFIXED_NUMERIC
> > + && iform != INSN_FORM_PCREL_LOCAL
> > + && iform != INSN_FORM_PCREL_EXTERNAL);
> > +})
> 
> Why can this not use just !address_is_prefixed?  Why is an
> INSN_FORM_PCREL_EXTERNAL address neither prefixed nor non-prefixed?  What
> does "BAD" mean, really?  Should that ever happen, should that not ICE?

You can't just use !address_is_prefixed, because it would allow things that
may not be valid memory addresses.

So we could just do:

{
  /* If the operand is not a valid memory operand even if it is not prefixed,
 do not return true.  */
  if (!memory_operand (op, mode))
return false;

  return !address_is_prefixed (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
}

It is important that the predicate not return true if the operand is NOT a
valid memory address.  If you allow non-valid memory addresses, the register
allocator will create things like:

(mem:MODE (plus:DI (reg:DI x)
                   (plus:DI (reg:DI y)
                            (const_int z))))

Or some such -- I forget the exact sequence it created.  A later pass would
then choke with bad insn.

INSN_FORM_BAD just means that the operand is not valid as a memory address.

> It is very confusing if any valid memory is neither "prefixed_memory" nor
> "non_prefixed_memory"!

The point was to make sure the memory is valid.  Once it is a valid memory
address, then just a simple !address_is_prefixed will work.
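
The ordering argued for here (validity first, then the prefixed test) can be shown with a toy model. Everything below is hypothetical: the enum, the `operand` struct and the predicate names only stand in for address_to_insn_form, memory_operand and address_is_prefixed.

```cpp
// Toy classification of an address; FORM_BAD means "not a valid address".
enum insn_form { FORM_BAD, FORM_PLAIN, FORM_PREFIXED };

struct operand
{
  insn_form form;
};

bool valid_memory_p (const operand &op)
{
  return op.form != FORM_BAD;
}

bool prefixed_p (const operand &op)
{
  return op.form == FORM_PREFIXED;
}

// "Not prefixed" on its own would also accept invalid operands, so the
// validity check has to come first.
bool non_prefixed_memory_p (const operand &op)
{
  if (!valid_memory_p (op))
    return false;
  return !prefixed_p (op);
}

bool prefixed_memory_p (const operand &op)
{
  return valid_memory_p (op) && prefixed_p (op);
}
```

With this shape every valid operand satisfies exactly one of the two predicates, and an invalid operand satisfies neither, which addresses the concern about a valid memory being neither "prefixed_memory" nor "non_prefixed_memory".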

> > --- gcc/doc/md.texi (revision 279182)
> > +++ gcc/doc/md.texi (working copy)
> > @@ -3373,6 +3373,12 @@ asm ("st %1,%0" : "=m<>" (mem) : "r" (va
> >  
> >  is not.
> >  
> > +@item em
> > +A memory operand that does not contain a prefixed address.
> > +
> > +@item ep
> > +A memory operand that does contains a prefixed address.
> 
> Same comments as above.

Ok.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: C++ PATCH for c++/88337 - Implement P1327R1: Allow dynamic_cast in constexpr

2019-12-17 Thread Marek Polacek
On Mon, Dec 16, 2019 at 04:00:14PM -0500, Jason Merrill wrote:
> On 12/16/19 3:55 PM, Jason Merrill wrote:
> > On 12/14/19 4:25 PM, Marek Polacek wrote:
> > > On Fri, Dec 13, 2019 at 05:56:57PM -0500, Jason Merrill wrote:
> > > > On 12/13/19 3:20 PM, Marek Polacek wrote:
> > > > > +  /* Given dynamic_cast(v),
> > > > > +
> > > > > + [expr.dynamic.cast] If C is the class type to which T
> > > > > points or refers,
> > > > > + the runtime check logically executes as follows:
> > > > > +
> > > > > + If, in the most derived object pointed (referred) to
> > > > > by v, v points
> > > > > + (refers) to a public base class subobject of a C
> > > > > object, and if only
> > > > > + one object of type C is derived from the subobject
> > > > > pointed (referred)
> > > > > + to by v the result points (refers) to that C object.
> > > > > +
> > > > > + In this case, HINT >= 0.  */
> > > > > +  if (hint >= 0)
> > > > 
> > > > Won't this code work for hint == -3 as well?
> > > 
> > > Yes, it does.  In fact, none of the tests was testing the hint == -3
> > > case, so
> > > I've fixed the code up and added constexpr-dynamic15.C to test it.
> > > 
> > > > > +    {
> > > > > +  /* Look for a component with type TYPE.  */
> > > > > +  obj = get_component_with_type (obj, type);
> > > > 
> > > > You don't seem to use mdtype at all in this case.  Shouldn't
> > > > get_component_with_type stop at mdtype if it hasn't found type yet?
> > > 
> > > It was used for diagnostics but not in get_component_with_type.  It makes
> > > sense to stop at MDTYPE; I've adjusted the code to do so.  E.g., if
> > > we have OBJ in the form of g.D.2121.D.2122.D.2123.D.2124, usually the
> > > component with the most derived type is "g", but in a 'tor, it can be
> > > a different component too.
> > > 
> > > > > +  /* If not found or not accessible, give an error.  */
> > > > > +  if (obj == NULL_TREE || obj == error_mark_node)
> > > > > +    {
> > > > > +  if (reference_p)
> > > > > +    {
> > > > > +  if (!ctx->quiet)
> > > > > +    {
> > > > > +  error_at (loc, "reference % failed");
> > > > > +  if (obj == NULL_TREE)
> > > > > +    inform (loc, "dynamic type %qT of its operand does not "
> > > > > +    "have an unambiguous public base class %qT",
> > > > > +    mdtype, type);
> > > > > +  else
> > > > > +    inform (loc, "static type %qT of its operand is a "
> > > > > +    "non-public base class of dynamic type %qT",
> > > > > +    objtype, type);
> > > > > +
> > > > > +    }
> > > > > +  *non_constant_p = true;
> > > > > +    }
> > > > > +  return integer_zero_node;
> > > > > +    }
> > > > > +  else
> > > > > +    /* The result points to the TYPE object.  */
> > > > > +    return cp_build_addr_expr (obj, complain);
> > > > > +    }
> > > > > +  /* Otherwise, if v points (refers) to a public base class
> > > > > subobject of the
> > > > > + most derived object, and the type of the most derived
> > > > > object has a base
> > > > > + class, of type C, that is unambiguous and public, the
> > > > > result points
> > > > > + (refers) to the C subobject of the most derived object.
> > > > > +
> > > > > + But it can also be an invalid case.  */
> > > > 
> > > > And I think we need to fall through to this code if the hint
> > > > turns out to be
> > > > wrong, i.e. V is a public base of C, but v is not that
> > > > subobject, but rather
> > > > a sibling base of C, like
> > > 
> > > True.  HINT is really just an optimization hint, nothing more.  I've
> > > adjusted
> > > the code to fall through to the normal processing if the HINT >= 0
> > > or -3 case
> > > doesn't succeed.
> > > 
> > > > struct A { virtual void f(); };
> > > > struct B1: A { };
> > > > struct B2: A { };
> > > > struct C: B1, B2 { };
> > > > int main()
> > > > {
> > > >    C c;
> > > >    A* ap = (B1*)c;
> > > >    constexpr auto p = dynamic_cast(ap); // should succeed
> > > > }
> > > 
> > > Whew, there's always One More Case. :/  New constexpr-dynamic16.c
> > > covers it.
> > > 
> > > > > --- /dev/null
> > > > > +++ gcc/testsuite/g++.dg/cpp2a/constexpr-dynamic11.C
> > > > > @@ -0,0 +1,34 @@
> > > > > +// PR c++/88337 - Implement P1327R1: Allow
> > > > > dynamic_cast/typeid in constexpr.
> > > > > +// { dg-do compile { target c++2a } }
> > > > > +
> > > > > +// dynamic_cast in a constructor.
> > > > > +
> > > > > +struct V {
> > > > > +  virtual void f();
> > > > > +};
> > > > > +
> > > > > +struct A : V { };
> > > > > +
> > > > > +struct B : V {
> > > > > +  constexpr B(V*, A*);
> > > > > +};
> > > > > +
> > > > > +struct D : A, B {
> > > > > +  constexpr D() : B((A*)this, this) { } // { dg-message "in
> > > > > 'constexpr' expansion of" }
> > > > > +};
> > > > > +
> > > > > +constexpr B::B(V* v, A* a)
> > > > > +{
> > > > > +  // well-defined: v of type V*, V base of B results in B*
> > > > > +  

[PATCH 4/4] analyzer: purge state for unknown function calls

2019-12-17 Thread David Malcolm
Whilst analyzing the reproducer for detecting CVE-2005-1689
(krb5-1.4.1's src/lib/krb5/krb/recvauth.c), the analyzer reports
a false double-free of the form:

  krb5_xfree(inbuf.data);
  krb5_read_message(..., &inbuf);
  krb5_xfree(inbuf.data); /* false diagnostic here.  */

where the call to krb5_read_message overwrites inbuf.data with
a freshly-malloced buffer.

This patch fixes the issue by purging state more thoroughly when
handling a call with unknown behavior, by walking the graph of
memory regions that are reachable from the call.
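
As an illustration of the reachability walk, here is a minimal worklist sketch over a toy region graph. It is not the analyzer's `reachable_regions` class; `region_graph` and `reachable_from` are hypothetical, and regions are plain indices.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical region graph: region I points to the regions in edges[I].
struct region_graph
{
  std::vector<std::vector<std::size_t>> edges;
};

// Return one flag per region saying whether it is reachable from any of
// ROOTS (for example, the regions passed to the unknown call).
std::vector<bool> reachable_from (const region_graph &g,
                                  const std::vector<std::size_t> &roots)
{
  std::vector<bool> seen (g.edges.size (), false);
  std::vector<std::size_t> worklist;

  for (std::size_t r : roots)
    if (r < seen.size () && !seen[r])
      {
        seen[r] = true;
        worklist.push_back (r);
      }

  while (!worklist.empty ())
    {
      std::size_t cur = worklist.back ();
      worklist.pop_back ();
      for (std::size_t next : g.edges[cur])
        if (next < seen.size () && !seen[next])
          {
            seen[next] = true;
            worklist.push_back (next);
          }
    }
  return seen;
}
```

Every region flagged as reachable would then have its tracked state reset to unknown, so a later free of a pointer that the callee may have replaced is no longer reported as a double-free.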

gcc/analyzer/ChangeLog:
* analyzer.h (fndecl_has_gimple_body_p): New decl.
* engine.cc (impl_region_model_context::on_unknown_change): New
function.
(fndecl_has_gimple_body_p): Make non-static.
(exploded_node::on_stmt): Treat __analyzer_dump_exploded_nodes as
known.  Track whether we have a call with unknown side-effects and
pass it to on_call_post.
* exploded-graph.h (impl_region_model_context::on_unknown_change):
New decl.
* program-state.cc (sm_state_map::on_unknown_change): New function.
* program-state.h (sm_state_map::on_unknown_change): New decl.
* region-model.cc: Include "bitmap.h".
(region_model::on_call_pre): Return a bool, capturing whether the
call has unknown side effects.
(region_model::on_call_post): Add arg "bool unknown_side_effects"
and if true, call handle_unrecognized_call.
(class reachable_regions): New class.
(region_model::handle_unrecognized_call): New function.
* region-model.h (region_model::on_call_pre): Return a bool.
(region_model::on_call_post): Add arg "bool unknown_side_effects".
(region_model::handle_unrecognized_call): New decl.
(region_model_context::on_unknown_change): New vfunc.
(test_region_model_context::on_unknown_change): New function.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/data-model-1.c: Remove xfail.
* gcc.dg/analyzer/data-model-5b.c: Likewise.
* gcc.dg/analyzer/data-model-5c.c: Likewise.
* gcc.dg/analyzer/setjmp-3.c: Mark "foo" as pure.
* gcc.dg/analyzer/setjmp-4.c: Likewise.
* gcc.dg/analyzer/setjmp-6.c: Likewise.
* gcc.dg/analyzer/setjmp-7.c: Likewise.
* gcc.dg/analyzer/setjmp-7a.c: Likewise.
* gcc.dg/analyzer/setjmp-8.c: Likewise.
* gcc.dg/analyzer/setjmp-9.c: Likewise.
* gcc.dg/analyzer/unknown-fns.c: New test.
---
 gcc/analyzer/analyzer.h   |   2 +
 gcc/analyzer/engine.cc|  28 ++-
 gcc/analyzer/exploded-graph.h |   2 +
 gcc/analyzer/program-state.cc |   8 +
 gcc/analyzer/program-state.h  |   2 +
 gcc/analyzer/region-model.cc  | 217 +-
 gcc/analyzer/region-model.h   |  16 +-
 gcc/testsuite/gcc.dg/analyzer/data-model-1.c  |   4 +-
 gcc/testsuite/gcc.dg/analyzer/data-model-5b.c |   3 +-
 gcc/testsuite/gcc.dg/analyzer/data-model-5c.c |  10 +-
 gcc/testsuite/gcc.dg/analyzer/setjmp-3.c  |   2 +-
 gcc/testsuite/gcc.dg/analyzer/setjmp-4.c  |   2 +-
 gcc/testsuite/gcc.dg/analyzer/setjmp-6.c  |   2 +-
 gcc/testsuite/gcc.dg/analyzer/setjmp-7.c  |   2 +-
 gcc/testsuite/gcc.dg/analyzer/setjmp-7a.c |   2 +-
 gcc/testsuite/gcc.dg/analyzer/setjmp-8.c  |   2 +-
 gcc/testsuite/gcc.dg/analyzer/setjmp-9.c  |   2 +-
 gcc/testsuite/gcc.dg/analyzer/unknown-fns.c   | 115 ++
 18 files changed, 383 insertions(+), 38 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/unknown-fns.c

diff --git a/gcc/analyzer/analyzer.h b/gcc/analyzer/analyzer.h
index 987e2fe43f4c..17a24d813c62 100644
--- a/gcc/analyzer/analyzer.h
+++ b/gcc/analyzer/analyzer.h
@@ -82,6 +82,8 @@ extern void register_analyzer_pass ();
 
 extern label_text make_label_text (bool can_colorize, const char *fmt, ...);
 
+extern bool fndecl_has_gimple_body_p (tree fndecl);
+
 /* An RAII-style class for pushing/popping cfun within a scope.
Doing so ensures we get "In function " announcements
from the diagnostics subsystem.  */
diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 93babf67a87b..162940a2bfa9 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -106,6 +106,15 @@ impl_region_model_context::on_svalue_purge (svalue_id first_unused_sid,
   return total;
 }
 
+void
+impl_region_model_context::on_unknown_change (svalue_id sid)
+{
+  int sm_idx;
+  sm_state_map *smap;
+  FOR_EACH_VEC_ELT (m_new_state->m_checker_states, sm_idx, smap)
+smap->on_unknown_change (sid);
+}
+
 /* class setjmp_svalue : public svalue.  */
 
 /* Compare the fields of this setjmp_svalue with OTHER, returning true
@@ -846,7 +855,7 @@ exploded_node::dump (const extrinsic_state &ext_state) const
 /* Return true if FNDECL has a gimple body.  */
 // TODO: is there a pre-canned way to do this?
 
-static bool
+bool
 

[PATCH 0/4] analyzer: Fixes for problems seen with CVE-2005-1689

2019-12-17 Thread David Malcolm
I attempted to use the analyzer to detect CVE-2005-1689, a double-free in
krb5-1.4.1's src/lib/krb5/krb/recvauth.c.

With v1-v4 of the analyzer, it emits 11 double-free warnings:
  https://dmalcolm.fedorapeople.org/gcc/2019-11-13/CVE-2005-1689.html
of which most were either false positives or duplicates.

With this patch kit, the analyzer emits just 2 double-free warnings,
both of which appear to be genuine problems:
  https://dmalcolm.fedorapeople.org/gcc/2019-12-17/CVE-2005-1689.html

(the output is still very verbose, but that can wait for a follow-up)

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

I've pushed these patches to dmalcolm/analyzer on the GCC git mirror.

David Malcolm (4):
  analyzer: add ChangeLog
  analyzer: better logging for dedupe_winners::add
  analyzer: fix dedupe issue seen with CVE-2005-1689
  analyzer: purge state for unknown function calls

 gcc/analyzer/ChangeLog|  10 +
 gcc/analyzer/analyzer.h   |   2 +
 gcc/analyzer/diagnostic-manager.cc|  37 ++-
 gcc/analyzer/diagnostic-manager.h |  13 +-
 gcc/analyzer/engine.cc|  28 ++-
 gcc/analyzer/exploded-graph.h |   2 +
 gcc/analyzer/pending-diagnostic.cc|   9 +
 gcc/analyzer/pending-diagnostic.h |   4 +
 gcc/analyzer/program-state.cc |   8 +
 gcc/analyzer/program-state.h  |   2 +
 gcc/analyzer/region-model.cc  | 217 +-
 gcc/analyzer/region-model.h   |  16 +-
 gcc/analyzer/sm-file.cc   |   2 +-
 gcc/analyzer/sm-malloc.cc |   8 +-
 gcc/analyzer/sm-pattern-test.cc   |   4 +-
 gcc/analyzer/sm-sensitive.cc  |   2 +-
 gcc/analyzer/sm-taint.cc  |   2 +-
 .../analyzer/CVE-2005-1689-dedupe-issue.c |  26 +++
 gcc/testsuite/gcc.dg/analyzer/data-model-1.c  |   4 +-
 gcc/testsuite/gcc.dg/analyzer/data-model-5b.c |   3 +-
 gcc/testsuite/gcc.dg/analyzer/data-model-5c.c |  10 +-
 gcc/testsuite/gcc.dg/analyzer/setjmp-3.c  |   2 +-
 gcc/testsuite/gcc.dg/analyzer/setjmp-4.c  |   2 +-
 gcc/testsuite/gcc.dg/analyzer/setjmp-6.c  |   2 +-
 gcc/testsuite/gcc.dg/analyzer/setjmp-7.c  |   2 +-
 gcc/testsuite/gcc.dg/analyzer/setjmp-7a.c |   2 +-
 gcc/testsuite/gcc.dg/analyzer/setjmp-8.c  |   2 +-
 gcc/testsuite/gcc.dg/analyzer/setjmp-9.c  |   2 +-
 gcc/testsuite/gcc.dg/analyzer/unknown-fns.c   | 115 ++
 29 files changed, 476 insertions(+), 62 deletions(-)
 create mode 100644 gcc/analyzer/ChangeLog
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/CVE-2005-1689-dedupe-issue.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/unknown-fns.c

-- 
2.21.0



[PATCH 1/4] analyzer: add ChangeLog

2019-12-17 Thread David Malcolm
gcc/analyzer/ChangeLog:
* ChangeLog: New file.
---
 gcc/analyzer/ChangeLog | 10 ++
 1 file changed, 10 insertions(+)
 create mode 100644 gcc/analyzer/ChangeLog

diff --git a/gcc/analyzer/ChangeLog b/gcc/analyzer/ChangeLog
new file mode 100644
index ..7144b69596e2
--- /dev/null
+++ b/gcc/analyzer/ChangeLog
@@ -0,0 +1,10 @@
+2019-12-13  David Malcolm  
+
+   * Initial creation
+
+
+Copyright (C) 2019 Free Software Foundation, Inc.
+
+Copying and distribution of this file, with or without modification,
+are permitted in any medium without royalty provided the copyright
+notice and this notice are preserved.
-- 
2.21.0



[PATCH 3/4] analyzer: fix dedupe issue seen with CVE-2005-1689

2019-12-17 Thread David Malcolm
Whilst analyzing the reproducer for detecting CVE-2005-1689
(krb5-1.4.1's src/lib/krb5/krb/recvauth.c), the analyzer reported
11 double-free diagnostics on lines of the form:

   krb5_xfree(inbuf.data);

with no deduplication occurring.

The root cause is that the diagnostics each have a COMPONENT_REF for
the inbuf.data, but they are different trees, and the de-duplication
logic was using pointer equality.

This patch replaces the pointer equality tests with calls to a new
pending_diagnostic::same_tree_p, implemented using simple_cst_equal.

With this patch, de-duplication occurs, and only 3 diagnostics are
reported.  The 11 diagnostics are partitioned into 3 dedupe keys,
2 with 2 duplicates and 1 with 7 duplicates.
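
To illustrate the difference between the two comparisons, here is a
minimal standalone sketch.  It does not use GCC's tree type: expr_node,
same_node_p and same_expr_p are hypothetical stand-ins, with same_expr_p
playing the role that simple_cst_equal plays inside same_tree_p.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a GCC tree node: an expression such as
// "inbuf.data" is modelled as an object name plus a field name.
struct expr_node
{
  std::string object;
  std::string field;
};

// Pointer equality: true only when both arguments are the very same node.
bool
same_node_p (const expr_node *a, const expr_node *b)
{
  return a == b;
}

// Structural equality: true when both nodes describe the same expression,
// even if they are distinct objects in memory.
bool
same_expr_p (const expr_node *a, const expr_node *b)
{
  if (a == b)
    return true;
  if (!a || !b)
    return false;
  return a->object == b->object && a->field == b->field;
}

// Build two distinct nodes for the same expression and compare them both
// ways.  Returns true if only the structural test treats them as equal.
bool
demo_dedupe_comparison ()
{
  expr_node first = { "inbuf", "data" };
  expr_node second = { "inbuf", "data" };
  return (!same_node_p (&first, &second)
	  && same_expr_p (&first, &second));
}
```

With pointer equality the two nodes look like different variables, so
each diagnostic ends up with its own key; with the structural test they
compare equal and can share one.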

gcc/analyzer/ChangeLog:
* diagnostic-manager.cc (saved_diagnostic::operator==): Move here
from header.  Replace pointer equality test on m_var with call to
pending_diagnostic::same_tree_p.
* diagnostic-manager.h (saved_diagnostic::operator==): Move to
diagnostic-manager.cc.
* pending-diagnostic.cc (pending_diagnostic::same_tree_p): New.
* pending-diagnostic.h (pending_diagnostic::same_tree_p): New.
* sm-file.cc (file_diagnostic::subclass_equal_p): Replace pointer
equality on m_arg with call to pending_diagnostic::same_tree_p.
* sm-malloc.cc (malloc_diagnostic::subclass_equal_p): Likewise.
(possible_null_arg::subclass_equal_p): Likewise.
(null_arg::subclass_equal_p): Likewise.
(free_of_non_heap::subclass_equal_p): Likewise.
* sm-pattern-test.cc (pattern_match::operator==): Likewise.
* sm-sensitive.cc (exposure_through_output_file::operator==):
Likewise.
* sm-taint.cc (tainted_array_index::operator==): Likewise.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/CVE-2005-1689-dedupe-issue.c: New test.
---
 gcc/analyzer/diagnostic-manager.cc| 14 ++
 gcc/analyzer/diagnostic-manager.h | 13 +-
 gcc/analyzer/pending-diagnostic.cc|  9 +++
 gcc/analyzer/pending-diagnostic.h |  4 +++
 gcc/analyzer/sm-file.cc   |  2 +-
 gcc/analyzer/sm-malloc.cc |  8 +++---
 gcc/analyzer/sm-pattern-test.cc   |  4 +--
 gcc/analyzer/sm-sensitive.cc  |  2 +-
 gcc/analyzer/sm-taint.cc  |  2 +-
 .../analyzer/CVE-2005-1689-dedupe-issue.c | 26 +++
 10 files changed, 63 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/CVE-2005-1689-dedupe-issue.c

diff --git a/gcc/analyzer/diagnostic-manager.cc b/gcc/analyzer/diagnostic-manager.cc
index 47d32f4f4c6c..5e4e44ecae73 100644
--- a/gcc/analyzer/diagnostic-manager.cc
+++ b/gcc/analyzer/diagnostic-manager.cc
@@ -65,6 +65,20 @@ saved_diagnostic::~saved_diagnostic ()
   delete m_d;
 }
 
+bool
+saved_diagnostic::operator== (const saved_diagnostic &other) const
+{
+  return (m_sm == other.m_sm
+ /* We don't compare m_enode.  */
+ && m_snode == other.m_snode
+ && m_stmt == other.m_stmt
+ /* We don't compare m_stmt_finder.  */
+ && pending_diagnostic::same_tree_p (m_var, other.m_var)
+ && m_state == other.m_state
+ && m_d->equal_p (*other.m_d)
+ && m_trailing_eedge == other.m_trailing_eedge);
+}
+
 /* class diagnostic_manager.  */
 
 /* diagnostic_manager's ctor.  */
diff --git a/gcc/analyzer/diagnostic-manager.h b/gcc/analyzer/diagnostic-manager.h
index 5490a2f4e233..a8f9eb898ba7 100644
--- a/gcc/analyzer/diagnostic-manager.h
+++ b/gcc/analyzer/diagnostic-manager.h
@@ -36,18 +36,7 @@ public:
pending_diagnostic *d);
   ~saved_diagnostic ();
 
-  bool operator== (const saved_diagnostic &other) const
-  {
-return (m_sm == other.m_sm
-   /* We don't compare m_enode.  */
-   && m_snode == other.m_snode
-   && m_stmt == other.m_stmt
-   /* We don't compare m_stmt_finder.  */
-   && m_var == other.m_var
-   && m_state == other.m_state
-   && m_d->equal_p (*other.m_d)
-   && m_trailing_eedge == other.m_trailing_eedge);
-  }
+  bool operator== (const saved_diagnostic &other) const;
 
   //private:
   const state_machine *m_sm;
diff --git a/gcc/analyzer/pending-diagnostic.cc b/gcc/analyzer/pending-diagnostic.cc
index ae61b47bef66..29fcd7dc7e0d 100644
--- a/gcc/analyzer/pending-diagnostic.cc
+++ b/gcc/analyzer/pending-diagnostic.cc
@@ -61,4 +61,13 @@ evdesc::event_desc::formatted_print (const char *fmt, ...) const
   return result;
 }
 
+/* Return true if T1 and T2 are "the same" for the purposes of
+   diagnostic deduplication.  */
+
+bool
+pending_diagnostic::same_tree_p (tree t1, tree t2)
+{
+  return simple_cst_equal (t1, t2) == 1;
+}
+
 #endif /* #if ENABLE_ANALYZER */
diff --git a/gcc/analyzer/pending-diagnostic.h b/gcc/analyzer/pending-diagnostic.h
index 15a1379e8fd1..494a6852d91a 100644
--- 

[PATCH 2/4] analyzer: better logging for dedupe_winners::add

2019-12-17 Thread David Malcolm
gcc/analyzer/ChangeLog:
* diagnostic-manager.cc (dedupe_winners::add): Add logging
of deduplication decisions made.
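
For readers who want the decision logic without the surrounding class,
here is a simplified sketch of the three outcomes that now get logged.
It is not the analyzer's code: candidate, winner_map and add_candidate
are hypothetical names, and a std::string stands in for dedupe_key.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical, simplified candidate: a path length plus the number of
// duplicates seen so far for its key.
struct candidate
{
  int length;
  int num_dupes;
};

// Keyed by a string here; the analyzer keys on a dedupe_key object.
typedef std::map<std::string, candidate> winner_map;

// Record a candidate of the given path LENGTH under KEY, keeping whichever
// candidate has the shorter path.  Returns a description of the decision.
std::string
add_candidate (winner_map &winners, const std::string &key, int length)
{
  winner_map::iterator it = winners.find (key);
  if (it == winners.end ())
    {
      candidate first = { length, 0 };
      winners.insert (std::make_pair (key, first));
      return "first candidate for this dedupe_key";
    }

  // The key is already known, so this is a duplicate either way.
  it->second.num_dupes++;

  if (length < it->second.length)
    {
      // Shorter path: take over the key, keeping the duplicate count.
      it->second.length = length;
      return "taking over this dedupe_key";
    }

  return "dropping this candidate";
}
```

Feeding lengths 5, 7 and 3 under the same key yields "first candidate",
"dropping" and "taking over" in that order, leaving one winner of length
3 with two duplicates recorded.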
---
 gcc/analyzer/diagnostic-manager.cc | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/gcc/analyzer/diagnostic-manager.cc b/gcc/analyzer/diagnostic-manager.cc
index b8aae115ae67..47d32f4f4c6c 100644
--- a/gcc/analyzer/diagnostic-manager.cc
+++ b/gcc/analyzer/diagnostic-manager.cc
@@ -299,12 +299,19 @@ public:
 dedupe_key *key = new dedupe_key (sd, dc->get_path ());
 if (dedupe_candidate **slot = m_map.get (key))
   {
+   if (logger)
+ logger->log ("already have this dedupe_key");
+
(*slot)->add_duplicate ();
 
if (dc->length () < (*slot)->length ())
  {
/* We've got a shorter path for the key; replace
   the current candidate.  */
+   if (logger)
+ logger->log ("length %i is better than existing length %i;"
+  " taking over this dedupe_key",
+  dc->length (), (*slot)->length ());
dc->m_num_dupes = (*slot)->get_num_dupes ();
delete *slot;
*slot = dc;
@@ -312,12 +319,22 @@ public:
else
  /* We haven't beaten the current best candidate;
 drop the new candidate.  */
- delete dc;
+ {
+   if (logger)
+ logger->log ("length %i isn't better than existing length %i;"
+  " dropping this candidate",
+  dc->length (), (*slot)->length ());
+   delete dc;
+ }
delete key;
   }
 else
-  /* This is the first candidate for this key.  */
-  m_map.put (key, dc);
+  {
+   /* This is the first candidate for this key.  */
+   m_map.put (key, dc);
+   if (logger)
+ logger->log ("first candidate for this dedupe_key");
+  }
   }
 
  /* Emit the simplest diagnostic within each set.  */
-- 
2.21.0



Re: [PATCH] V10 patch #4, Add new prefixed/non-prefixed memory constraints

2019-12-17 Thread Segher Boessenkool
On Tue, Dec 17, 2019 at 05:29:44PM -0500, Michael Meissner wrote:
> On Tue, Dec 17, 2019 at 11:15:29AM -0600, Segher Boessenkool wrote:
> > > +;; Return true if the operand is a valid memory address that does not use a
> > > +;; prefixed address.
> > > +(define_predicate "non_prefixed_memory"
> > > +  (match_code "mem")
> > > +{
> > > +  enum insn_form iform
> > > += address_to_insn_form (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
> > > +
> > > +  return (iform != INSN_FORM_BAD
> > > +  && iform != INSN_FORM_PREFIXED_NUMERIC
> > > +   && iform != INSN_FORM_PCREL_LOCAL
> > > +   && iform != INSN_FORM_PCREL_EXTERNAL);
> > > +})
> > 
> > Why can this not use just !address_is_prefixed?  Why is an
> > INSN_FORM_PCREL_EXTERNAL address neither prefixed nor non-prefixed?  What
> > does "BAD" mean, really?  Should that ever happen, should that not ICE?
> 
> You can't just invert !address_is_prefixed, because it would allow things that
> may not be valid memory addresses.

Yes, so test that *explicitly*, in the "prefixed_memory" predicate as
well please.  Make the two predicates as much the same as possible.

And what is with the INSN_FORM_PCREL_EXTERNAL?


Segher


Re: [PATCH] Fix attribute((section)) for templates

2019-12-17 Thread Nathan Sidwell

On 12/16/19 6:20 PM, Jason Merrill wrote:

On 11/29/19 6:23 PM, Strager Neds wrote:



How can we solve this problem? Some ideas (none of which I like):

* Disallow this code, possibly with an improved diagnostic.
* Silently make the section SECTION_LINKONCE if there is a conflict.
* Silently make the section not-SECTION_LINKONCE if there is a conflict.
* Silently make the section not-SECTION_LINKONCE unconditionally (even
   if there is no conflict).
* Make two sections with the same name, one with SECTION_LINKONCE and
   one with not-SECTION_LINKONCE. This is what Clang does. Clang seems to
   Do What I Mean for ELF; the .o file has one COMDAT section and another
   non-COMDAT section.
* Extend attribute((section())) to allow specifying different section
   names for different section flags.

Thanks in advance for your feedback!


s::var should probably go into its own COMDAT section named 
something like .gnu.linkonce.testsection.. Currently 
resolve_unique_section does nothing if DECL_SECTION_NAME is set.


Coincidentally I fell over this (again) recently.  I agree, [[gnu::section
("prefix")]] on a template should use that as a prefix to the comdat section for
the instantiations -- overriding the default of .rodata or .data for variable
instantiations, for example.


nathan

--
Nathan Sidwell


Re: [PATCH] V10 patch #4, Add new prefixed/non-prefixed memory constraints

2019-12-17 Thread Michael Meissner
On Tue, Dec 17, 2019 at 05:35:24PM -0600, Segher Boessenkool wrote:
> On Tue, Dec 17, 2019 at 05:29:44PM -0500, Michael Meissner wrote:
> > On Tue, Dec 17, 2019 at 11:15:29AM -0600, Segher Boessenkool wrote:
> > > > +;; Return true if the operand is a valid memory address that does not use a
> > > > +;; prefixed address.
> > > > +(define_predicate "non_prefixed_memory"
> > > > +  (match_code "mem")
> > > > +{
> > > > +  enum insn_form iform
> > > > += address_to_insn_form (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
> > > > +
> > > > +  return (iform != INSN_FORM_BAD
> > > > +  && iform != INSN_FORM_PREFIXED_NUMERIC
> > > > + && iform != INSN_FORM_PCREL_LOCAL
> > > > + && iform != INSN_FORM_PCREL_EXTERNAL);
> > > > +})
> > > 
> > > Why can this not use just !address_is_prefixed?  Why is an
> > > INSN_FORM_PCREL_EXTERNAL address neither prefixed nor non-prefixed?  What
> > > does "BAD" mean, really?  Should that ever happen, should that not ICE?
> > 
> > You can't just invert !address_is_prefixed, because it would allow things that
> > may not be valid memory addresses.
> 
> Yes, so test that *explicitly*, in the "prefixed_memory" predicate as
> well please.  Make the two predicates as much the same as possible.
> 
> And what is with the INSN_FORM_PCREL_EXTERNAL?

INSN_FORM_PCREL_EXTERNAL says that the operand is a reference to an external
symbol.  It cannot appear in actual memory insns in normal usage, but it
needs to be handled in several places:

1) pcrel_extern_addr needs to be able to load an external address into a GPR
register.

2) The prefixed insn attribute (and prefixed_paddi_p which it calls) needs to
recognize pcrel_extern_addr and note that it is prefixed.

3) The PCREL_OPT support will need to support it.  If you do the PCREL_OPT
support via combine and flow control passes, you will need to be able to handle
external references as addresses.

The function address_is_prefixed specifically does not return true for
external symbols, because you can't use them in a normal context.

In the context of the patch (vector extract), it needs to decide whether the
address is prefixed or not, in order to decide whether it needs a second base
register temporary.
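
To make the three-way split concrete, here is a small standalone sketch.
The four INSN_FORM_* names come from the quoted predicate;
INSN_FORM_OTHER_VALID is a hypothetical placeholder for any ordinary
non-prefixed form, and prefixed_form_p is only an assumption about what
address_is_prefixed accepts, based on the description above.

```cpp
#include <cassert>

// Enumerators other than INSN_FORM_OTHER_VALID are named in the quoted
// patch; INSN_FORM_OTHER_VALID is a hypothetical stand-in.
enum insn_form
{
  INSN_FORM_BAD,
  INSN_FORM_OTHER_VALID,
  INSN_FORM_PREFIXED_NUMERIC,
  INSN_FORM_PCREL_LOCAL,
  INSN_FORM_PCREL_EXTERNAL
};

// Mirrors the return expression of the quoted non_prefixed_memory
// predicate.
bool
non_prefixed_form_p (insn_form iform)
{
  return (iform != INSN_FORM_BAD
	  && iform != INSN_FORM_PREFIXED_NUMERIC
	  && iform != INSN_FORM_PCREL_LOCAL
	  && iform != INSN_FORM_PCREL_EXTERNAL);
}

// Assumption: models address_is_prefixed as described above, i.e. it does
// not return true for external symbols.
bool
prefixed_form_p (insn_form iform)
{
  return (iform == INSN_FORM_PREFIXED_NUMERIC
	  || iform == INSN_FORM_PCREL_LOCAL);
}

// True for forms that are neither prefixed nor non-prefixed memory.
bool
neither_form_p (insn_form iform)
{
  return !prefixed_form_p (iform) && !non_prefixed_form_p (iform);
}
```

Under these assumptions, neither_form_p holds for exactly two forms,
INSN_FORM_BAD and INSN_FORM_PCREL_EXTERNAL, which is why the two
predicates are not simple negations of each other.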

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


[PATCH] rs6000: Fix PR92923, __builtin_vec_xor() causes subregs to be used when not using V4SImode vectors

2019-12-17 Thread Peter Bergner
PR92923 shows a problem where builtin function usage is causing performance
problems due to unneeded subreg usage.  These are being caused by the front-
end emitting unneeded VIEW_CONVERT_EXPR's on the builtin functions operands.
These in tern where caused by a lack of overloaded builtin entries in the
rs6000 backend.  The following patch adds just enough new definitions to
match what our vector documentation says we must support.  I have also
added new test cases so that we will catch any regressions in this area.

This passed bootstrap and regression testing with no errors.  Ok for trunk?

This is a bug on the release branches too, but given how big this patch
ended up being, I don't think we want to backport this.

Peter


gcc/
PR target/92923
* config/rs6000/rs6000-builtin.def (VAND, VANDC, VNOR, VOR,
VXOR): Delete.
(EQV_V16QI_UNS, EQV_V8HI_UNS, EQV_V4SI_UNS, EQV_V2DI_UNS, EQV_V1TI_UNS,
NAND_V16QI_UNS, NAND_V8HI_UNS, NAND_V4SI_UNS, NAND_V2DI_UNS,
NAND_V1TI_UNS, ORC_V16QI_UNS, ORC_V8HI_UNS, ORC_V4SI_UNS, ORC_V2DI_UNS,
ORC_V1TI_UNS, VAND_V16QI_UNS, VAND_V16QI, VAND_V8HI_UNS, VAND_V8HI,
VAND_V4SI_UNS, VAND_V4SI, VAND_V2DI_UNS, VAND_V2DI, VAND_V4SF,
VAND_V2DF, VANDC_V16QI_UNS, VANDC_V16QI, VANDC_V8HI_UNS, VANDC_V8HI,
VANDC_V4SI_UNS, VANDC_V4SI, VANDC_V2DI_UNS, VANDC_V2DI, VANDC_V4SF,
VANDC_V2DF, VNOR_V16QI_UNS, VNOR_V16QI, VNOR_V8HI_UNS, VNOR_V8HI,
VNOR_V4SI_UNS, VNOR_V4SI, VNOR_V2DI_UNS, VNOR_V2DI, VNOR_V4SF,
VNOR_V2DF, VOR_V16QI_UNS, VOR_V16QI, VOR_V8HI_UNS, VOR_V8HI,
VOR_V4SI_UNS, VOR_V4SI, VOR_V2DI_UNS, VOR_V2DI, VOR_V4SF, VOR_V2DF,
VXOR_V16QI_UNS, VXOR_V16QI, VXOR_V8HI_UNS, VXOR_V8HI,
VXOR_V4SI_UNS, VXOR_V4SI, VXOR_V2DI_UNS, VXOR_V2DI, VXOR_V4SF,
VXOR_V2DF): Add definitions.
* config/rs6000/rs6000-call.c (altivec_overloaded_builtins)
: Remove.
: Add
definitions.
: Change unsigned usages to use the new *_UNS
definition names.
(rs6000_gimple_fold_builtin) : Use new definition names.
(builtin_function_type) : Handle unsigned
builtins.

   gcc/testsuite/
   PR target/92923
   * gcc.target/powerpc/pr92923-1.c: New test.
   * gcc.target/powerpc/pr92923-2.c: Likewise.

Index: gcc/config/rs6000/rs6000-builtin.def
===
--- gcc/config/rs6000/rs6000-builtin.def(revision 279479)
+++ gcc/config/rs6000/rs6000-builtin.def(working copy)
@@ -1000,8 +1000,26 @@ BU_ALTIVEC_2 (VADDUHS, "vadduhs",
 BU_ALTIVEC_2 (VADDSHS,   "vaddshs",CONST,  altivec_vaddshs)
 BU_ALTIVEC_2 (VADDUWS,   "vadduws",CONST,  altivec_vadduws)
 BU_ALTIVEC_2 (VADDSWS,   "vaddsws",CONST,  altivec_vaddsws)
-BU_ALTIVEC_2 (VAND,  "vand",   CONST,  andv4si3)
-BU_ALTIVEC_2 (VANDC, "vandc",  CONST,  andcv4si3)
+BU_ALTIVEC_2 (VAND_V16QI_UNS, "vand_v16qi_uns",CONST,  andv16qi3)
+BU_ALTIVEC_2 (VAND_V16QI, "vand_v16qi",CONST,  andv16qi3)
+BU_ALTIVEC_2 (VAND_V8HI_UNS,  "vand_v8hi_uns", CONST,  andv8hi3)
+BU_ALTIVEC_2 (VAND_V8HI,  "vand_v8hi", CONST,  andv8hi3)
+BU_ALTIVEC_2 (VAND_V4SI_UNS,  "vand_v4si_uns", CONST,  andv4si3)
+BU_ALTIVEC_2 (VAND_V4SI,  "vand_v4si", CONST,  andv4si3)
+BU_ALTIVEC_2 (VAND_V2DI_UNS,  "vand_v2di_uns", CONST,  andv2di3)
+BU_ALTIVEC_2 (VAND_V2DI,  "vand_v2di", CONST,  andv2di3)
+BU_ALTIVEC_2 (VAND_V4SF,  "vand_v4sf", CONST,  andv4sf3)
+BU_ALTIVEC_2 (VAND_V2DF,  "vand_v2df", CONST,  andv2df3)
+BU_ALTIVEC_2 (VANDC_V16QI_UNS,"vandc_v16qi_uns",CONST, andcv16qi3)
+BU_ALTIVEC_2 (VANDC_V16QI,"vandc_v16qi",   CONST,  andcv16qi3)
+BU_ALTIVEC_2 (VANDC_V8HI_UNS, "vandc_v8hi_uns",CONST,  andcv8hi3)
+BU_ALTIVEC_2 (VANDC_V8HI, "vandc_v8hi",CONST,  andcv8hi3)
+BU_ALTIVEC_2 (VANDC_V4SI_UNS, "vandc_v4si_uns",CONST,  andcv4si3)
+BU_ALTIVEC_2 (VANDC_V4SI, "vandc_v4si",CONST,  andcv4si3)
+BU_ALTIVEC_2 (VANDC_V2DI_UNS, "vandc_v2di_uns",CONST,  andcv2di3)
+BU_ALTIVEC_2 (VANDC_V2DI, "vandc_v2di",CONST,  andcv2di3)
+BU_ALTIVEC_2 (VANDC_V4SF, "vandc_v4sf",CONST,  andcv4sf3)
+BU_ALTIVEC_2 (VANDC_V2DF, "vandc_v2df",CONST,  andcv2df3)
 BU_ALTIVEC_2 (VAVGUB,"vavgub", CONST,  uavgv16qi3_ceil)
 BU_ALTIVEC_2 (VAVGSB,"vavgsb", CONST,  avgv16qi3_ceil)
 BU_ALTIVEC_2 (VAVGUH,"vavguh", CONST,  uavgv8hi3_ceil)
@@ -1057,8 +1075,27 @@ BU_ALTIVEC_2 (VMULOUH, "vmulouh",
 BU_ALTIVEC_2 (VMULOSH,   "vmulosh",CONST,  vec_widen_smult_odd_v8hi)
 BU_P8V_AV_2 (VMULOUW,"vmulouw",CONST,  vec_widen_umult_odd_v4si)
 BU_P8V_AV_2 (VMULOSW,"vmulosw",CONST,  vec_widen_smult_odd_v4si)
-BU_ALTIVEC_2 (VNOR,  "vnor",   CONST,  norv4si3)
-BU_ALTIVEC_2 (VOR,   

[PATCH 10/13] OpenACC 2.6 deep copy: Fortran front-end parts

2019-12-17 Thread Julian Brown
This patch has been broken out of the "OpenACC 2.6 manual deep copy
support" patch, last posted here:

  https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02376.html

This part contains the Fortran front-end support for parsing the new
attach and detach clauses, as well as derived-type members on other
data-movement clauses (copyin, copyout, etc.).

Tested alongside other patches in this series with offloading to
NVPTX. OK?

Thanks,

Julian

ChangeLog

gcc/fortran/
* gfortran.h (gfc_omp_map_op): Add OMP_MAP_ATTACH, OMP_MAP_DETACH.
* openmp.c (gfc_match_omp_variable_list): Add allow_derived parameter.
Parse derived-type member accesses if true.
(omp_mask2): Add OMP_CLAUSE_ATTACH and OMP_CLAUSE_DETACH.
(gfc_match_omp_map_clause): Add allow_derived parameter.  Pass to
gfc_match_omp_variable_list.
(gfc_match_omp_clauses): Support attach and detach.  Support derived
types for appropriate OpenACC directives.
(OACC_PARALLEL_CLAUSES, OACC_SERIAL_CLAUSES, OACC_KERNELS_CLAUSES,
OACC_DATA_CLAUSES, OACC_ENTER_DATA_CLAUSES): Add OMP_CLAUSE_ATTACH.
(OACC_EXIT_DATA_CLAUSES): Add OMP_CLAUSE_DETACH.
(check_symbol_not_pointer): Don't disallow pointer objects of derived
type.
(resolve_oacc_data_clauses): Don't disallow allocatable derived types.
(resolve_omp_clauses): Perform duplicate checking only for non-derived
type component accesses (plain variables and arrays or array sections).
Support component refs.
* trans-expr.c (gfc_conv_component_ref,
conv_parent_component_references): Make global.
(gfc_auto_dereference_var): New function, broken out of...
(gfc_conv_variable): ...here.  Call above function.
* trans-openmp.c (gfc_omp_privatize_by_reference): Support component
refs.
(gfc_trans_omp_array_section): New function, broken out of...
(gfc_trans_omp_clauses): ...here.  Support component refs/derived
types, attach and detach clauses.
* trans.h (gfc_conv_component_ref, conv_parent_component_references,
gfc_auto_dereference_var): Add prototypes.

gcc/testsuite/
* gfortran.dg/goacc/derived-types.f90: New test.
* gfortran.dg/goacc/derived-types-2.f90: New test.
* gfortran.dg/goacc/data-clauses.f95: Adjust for expected errors.
* gfortran.dg/goacc/enter-exit-data.f95: Likewise.
---
 gcc/fortran/gfortran.h|   2 +
 gcc/fortran/openmp.c  | 160 +++---
 gcc/fortran/trans-expr.c  | 184 +--
 gcc/fortran/trans-openmp.c| 287 ++
 gcc/fortran/trans.h   |   8 +
 .../gfortran.dg/goacc/data-clauses.f95|  38 +--
 .../gfortran.dg/goacc/derived-types-2.f90 |  14 +
 .../gfortran.dg/goacc/derived-types.f90   |  77 +
 .../gfortran.dg/goacc/enter-exit-data.f95 |  24 +-
 9 files changed, 561 insertions(+), 233 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/derived-types-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/derived-types.f90

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index f4a2b99bdc4..d3c0d0be217 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1192,10 +1192,12 @@ enum gfc_omp_depend_op
 enum gfc_omp_map_op
 {
   OMP_MAP_ALLOC,
+  OMP_MAP_ATTACH,
   OMP_MAP_TO,
   OMP_MAP_FROM,
   OMP_MAP_TOFROM,
   OMP_MAP_DELETE,
+  OMP_MAP_DETACH,
   OMP_MAP_FORCE_ALLOC,
   OMP_MAP_FORCE_TO,
   OMP_MAP_FORCE_FROM,
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index dc0521b40f0..d79f4a90271 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -233,7 +233,8 @@ static match
 gfc_match_omp_variable_list (const char *str, gfc_omp_namelist **list,
 bool allow_common, bool *end_colon = NULL,
 gfc_omp_namelist ***headp = NULL,
-bool allow_sections = false)
+bool allow_sections = false,
+bool allow_derived = false)
 {
   gfc_omp_namelist *head, *tail, *p;
   locus old_loc, cur_loc;
@@ -259,7 +260,8 @@ gfc_match_omp_variable_list (const char *str, gfc_omp_namelist **list,
case MATCH_YES:
  gfc_expr *expr;
  expr = NULL;
- if (allow_sections && gfc_peek_ascii_char () == '(')
+ if ((allow_sections && gfc_peek_ascii_char () == '(')
+ || (allow_derived && gfc_peek_ascii_char () == '%'))
{
  gfc_current_locus = cur_loc;
   m = gfc_match_variable (&expr, 0);
@@ -797,7 +799,7 @@ enum omp_mask1
   OMP_MASK1_LAST
 };
 
-/* OpenACC 2.0 specific clauses. */
+/* OpenACC 2.0+ specific clauses. */
 enum omp_mask2
 {
   OMP_CLAUSE_ASYNC,
@@ -823,6 +825,8 @@ enum omp_mask2
   OMP_CLAUSE_TILE,
   OMP_CLAUSE_IF_PRESENT,
   OMP_CLAUSE_FINALIZE,
+  

[PATCH 09/13] OpenACC 2.6 deep copy: C and C++ front-end parts

2019-12-17 Thread Julian Brown
This patch has been broken out of the "OpenACC 2.6 manual deep copy
support" patch, last posted here:

  https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02376.html

This part contains the C and C++ changes to parse attach and detach
clauses and struct member accesses via "." or "->" on other data-movement
clauses (copyin, copyout, etc.).
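
As a rough illustration of the kind of source this enables, here is a
small sketch of a manual deep copy using a struct member access in the
data clauses.  The names int_vec and double_in_place are hypothetical,
and the directive sequence is my reading of the OpenACC 2.6 text rather
than something taken from the new tests.  Built without -fopenacc the
pragmas are ignored and the loop simply runs on the host.

```cpp
#include <cassert>

// Hypothetical struct with a pointer member, as used in a manual deep
// copy.
struct int_vec
{
  int n;
  int *data;
};

// Double every element of BUF, which holds N ints.
void
double_in_place (int *buf, int n)
{
  int_vec v = { n, buf };

  // Copy the struct itself, then the array its member points to.  As I
  // read OpenACC 2.6, the second directive also attaches the device copy
  // of v.data to the device copy of the array.
#pragma acc enter data copyin(v)
#pragma acc enter data copyin(v.data[0:n])

#pragma acc parallel loop present(v)
  for (int i = 0; i < n; i++)
    v.data[i] *= 2;

  // Copy the array back (detaching the pointer), then release the struct.
#pragma acc exit data copyout(v.data[0:n])
#pragma acc exit data delete(v)
}

// Small host-side check of the result.
bool
check_double_in_place ()
{
  int buf[4] = { 1, 2, 3, 4 };
  double_in_place (buf, 4);
  return buf[0] == 2 && buf[1] == 4 && buf[2] == 6 && buf[3] == 8;
}
```

The explicit attach and detach clauses added by this patch cover the
cases where the pointer and its target are mapped separately and the
attachment has to be requested by hand.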

Tested alongside other patches in this series with offloading to
NVPTX. OK?

Thanks,

Julian

ChangeLog

gcc/c-family/
* c-common.h (c_omp_map_clause_name): Add prototype.
* c-omp.c (c_omp_map_clause_name): New function.
* c-pragma.h (pragma_omp_clause): Add PRAGMA_OACC_CLAUSE_ATTACH and
PRAGMA_OACC_CLAUSE_DETACH.

gcc/c/
* c-parser.c (c_parser_omp_clause_name): Add parsing of attach and
detach clauses.
(c_parser_omp_variable_list): Add ALLOW_DEREF optional parameter.
Allow deref (->) in variable lists if true.
(c_parser_omp_var_list_parens): Add ALLOW_DEREF optional parameter.
Pass to c_parser_omp_variable_list.
(c_parser_oacc_data_clause): Support attach and detach clauses.  Update
call to c_parser_omp_variable_list.
(c_parser_oacc_all_clauses): Support attach and detach clauses.
(OACC_DATA_CLAUSE_MASK, OACC_ENTER_DATA_CLAUSE_MASK,
OACC_KERNELS_CLAUSE_MASK, OACC_PARALLEL_CLAUSE_MASK,
OACC_SERIAL_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_ATTACH.
(OACC_EXIT_DATA_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_DETACH.
* c-typeck.c (handle_omp_array_sections_1): Reject subarrays for attach
and detach.  Support deref.
(handle_omp_array_sections): Use GOMP_MAP_ATTACH_DETACH instead of
GOMP_MAP_ALWAYS_POINTER for OpenACC.
(c_oacc_check_attachments): New function.
(c_finish_omp_clauses): Check attach/detach arguments for being
pointers using above.  Support deref.

gcc/cp/
* parser.c (cp_parser_omp_clause_name): Support attach and detach
clauses.
(cp_parser_omp_var_list_no_open): Add ALLOW_DEREF optional parameter.
Parse deref if true.
(cp_parser_omp_var_list): Add ALLOW_DEREF optional parameter.  Pass to
cp_parser_omp_var_list_no_open.
(cp_parser_oacc_data_clause): Support attach and detach clauses.
Update call to cp_parser_omp_var_list_no_open.
(cp_parser_oacc_all_clauses): Support attach and detach.
(OACC_DATA_CLAUSE_MASK, OACC_ENTER_DATA_CLAUSE_MASK,
OACC_KERNELS_CLAUSE_MASK, OACC_PARALLEL_CLAUSE_MASK,
OACC_SERIAL_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_ATTACH.
(OACC_EXIT_DATA_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_DETACH.
* semantics.c (handle_omp_array_sections_1): Reject subarrays for
attach and detach.
(handle_omp_array_sections): Use GOMP_MAP_ATTACH_DETACH instead of
GOMP_MAP_ALWAYS_POINTER for OpenACC.
(cp_oacc_check_attachments): New function.
(finish_omp_clauses): Use above function.  Allow structure fields and
class members to appear in OpenACC data clauses.  Support
GOMP_MAP_ATTACH_DETACH.  Support deref.

gcc/testsuite/
* c-c++-common/goacc/deep-copy-arrayofstruct.c: New test.
* c-c++-common/goacc/mdc-1.c: New test.
* c-c++-common/goacc/mdc-2.c: New test.
* g++.dg/goacc/mdc.C: New test.
---
 gcc/c-family/c-common.h   |  1 +
 gcc/c-family/c-omp.c  | 33 +++
 gcc/c-family/c-pragma.h   |  2 +
 gcc/c/c-parser.c  | 53 --
 gcc/c/c-typeck.c  | 76 +-
 gcc/cp/parser.c   | 56 +--
 gcc/cp/semantics.c| 98 ---
 .../goacc/deep-copy-arrayofstruct.c   | 84 
 gcc/testsuite/c-c++-common/goacc/mdc-1.c  | 55 +++
 gcc/testsuite/c-c++-common/goacc/mdc-2.c  | 62 
 gcc/testsuite/g++.dg/goacc/mdc.C  | 68 +
 11 files changed, 554 insertions(+), 34 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/deep-copy-arrayofstruct.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/mdc-1.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/mdc-2.c
 create mode 100644 gcc/testsuite/g++.dg/goacc/mdc.C

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 2bcb54f66b9..2d89451b693 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1205,6 +1205,7 @@ extern bool c_omp_predefined_variable (tree);
 extern enum omp_clause_default_kind c_omp_predetermined_sharing (tree);
 extern tree c_omp_check_context_selector (location_t, tree);
 extern void c_omp_mark_declare_variant (location_t, tree, tree);
+extern const char *c_omp_map_clause_name (tree, bool);
 
 /* Return next tree in the chain for chain_next walking of tree nodes.  */
 static inline tree
diff --git 

[PATCH 08/13] OpenACC 2.6 deep copy: middle-end parts

2019-12-17 Thread Julian Brown
This patch has been broken out of the "OpenACC 2.6 manual deep copy
support" patch, last posted here:

  https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02376.html

This part contains the middle-end support for OpenACC 2.6 attach and
detach operations, either as standalone clauses or as "attach/detach"
actions triggered by other (data movement) clauses, as detailed in the
specification.

Tested alongside other patches in this series with offloading to
NVPTX. OK?

Thanks,

Julian

ChangeLog

gcc/
* gimplify.c (gimplify_omp_var_data): Add GOVD_MAP_HAS_ATTACHMENTS.
(insert_struct_comp_map): Support derived-type member mappings
for arrays with descriptors which use GOMP_MAP_TO_PSET.  Support
GOMP_MAP_ATTACH_DETACH.
(gimplify_scan_omp_clauses): Tidy up OACC_ENTER_DATA/OACC_EXIT_DATA
mappings.  Handle attach/detach clauses and component references.
(gimplify_adjust_omp_clauses_1): Skip adjustments for explicit
attach/detach clauses.
(gimplify_omp_target_update): Handle finalize for detach.
* omp-low.c (lower_omp_target): Support GOMP_MAP_ATTACH,
GOMP_MAP_DETACH, GOMP_MAP_FORCE_DETACH.
* tree-pretty-print.c (dump_omp_clause): Likewise, plus
GOMP_MAP_ATTACH_DETACH.

include/
* gomp-constants.h (gomp_map_kind): Add GOMP_MAP_ATTACH_DETACH.
---
 gcc/gimplify.c   | 232 ++-
 gcc/omp-low.c|   3 +
 gcc/tree-pretty-print.c  |  18 +++
 include/gomp-constants.h |   6 +-
 4 files changed, 229 insertions(+), 30 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index e3088dcbe05..e3d5bc83c4f 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -123,6 +123,10 @@ enum gimplify_omp_var_data
   /* Flag for GOVD_REDUCTION: inscan seen in {in,ex}clusive clause.  */
   GOVD_REDUCTION_INSCAN = 0x200,
 
+  /* Flag for GOVD_MAP: (struct) vars that have pointer attachments for
+ fields.  */
+  GOVD_MAP_HAS_ATTACHMENTS = 8388608,
+
   GOVD_DATA_SHARE_CLASS = (GOVD_SHARED | GOVD_PRIVATE | GOVD_FIRSTPRIVATE
   | GOVD_LASTPRIVATE | GOVD_REDUCTION | GOVD_LINEAR
   | GOVD_LOCAL)
@@ -8209,20 +8213,33 @@ insert_struct_comp_map (enum tree_code code, tree c, tree struct_node,
tree prev_node, tree *scp)
 {
   enum gomp_map_kind mkind
-= code == OMP_TARGET_EXIT_DATA ? GOMP_MAP_RELEASE : GOMP_MAP_ALLOC;
+= (code == OMP_TARGET_EXIT_DATA || code == OACC_EXIT_DATA)
+  ? GOMP_MAP_RELEASE : GOMP_MAP_ALLOC;
 
   tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP);
   tree cl = scp ? prev_node : c2;
   OMP_CLAUSE_SET_MAP_KIND (c2, mkind);
   OMP_CLAUSE_DECL (c2) = unshare_expr (OMP_CLAUSE_DECL (c));
   OMP_CLAUSE_CHAIN (c2) = scp ? *scp : prev_node;
-  OMP_CLAUSE_SIZE (c2) = TYPE_SIZE_UNIT (ptr_type_node);
+  if (OMP_CLAUSE_CHAIN (prev_node) != c
+  && OMP_CLAUSE_CODE (OMP_CLAUSE_CHAIN (prev_node)) == OMP_CLAUSE_MAP
+  && (OMP_CLAUSE_MAP_KIND (OMP_CLAUSE_CHAIN (prev_node))
+ == GOMP_MAP_TO_PSET))
+OMP_CLAUSE_SIZE (c2) = OMP_CLAUSE_SIZE (OMP_CLAUSE_CHAIN (prev_node));
+  else
+OMP_CLAUSE_SIZE (c2) = TYPE_SIZE_UNIT (ptr_type_node);
   if (struct_node)
 OMP_CLAUSE_CHAIN (struct_node) = c2;
 
   /* We might need to create an additional mapping if we have a reference to a
- pointer (in C++).  */
-  if (OMP_CLAUSE_CHAIN (prev_node) != c)
+ pointer (in C++).  Don't do this if we have something other than a
+ GOMP_MAP_ALWAYS_POINTER though, i.e. a GOMP_MAP_TO_PSET.  */
+  if (OMP_CLAUSE_CHAIN (prev_node) != c
+  && OMP_CLAUSE_CODE (OMP_CLAUSE_CHAIN (prev_node)) == OMP_CLAUSE_MAP
+  && ((OMP_CLAUSE_MAP_KIND (OMP_CLAUSE_CHAIN (prev_node))
+  == GOMP_MAP_ALWAYS_POINTER)
+ || (OMP_CLAUSE_MAP_KIND (OMP_CLAUSE_CHAIN (prev_node))
+ == GOMP_MAP_ATTACH_DETACH)))
 {
   tree c4 = OMP_CLAUSE_CHAIN (prev_node);
   tree c3 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP);
@@ -8329,6 +8346,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
   struct gimplify_omp_ctx *ctx, *outer_ctx;
   tree c;
   hash_map<tree, tree> *struct_map_to_clause = NULL;
+  hash_set<tree> *struct_deref_set = NULL;
   tree *prev_list_p = NULL, *orig_list_p = list_p;
   int handled_depend_iterators = -1;
   int nowait = -1;
@@ -8731,8 +8749,6 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
case OMP_TARGET_DATA:
case OMP_TARGET_ENTER_DATA:
case OMP_TARGET_EXIT_DATA:
-   case OACC_ENTER_DATA:
-   case OACC_EXIT_DATA:
case OACC_HOST_DATA:
  if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FIRSTPRIVATE_POINTER
  || (OMP_CLAUSE_MAP_KIND (c)
@@ -8741,6 +8757,15 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
   mapped, but not the pointer to it.  */
remove = true;
 

[PATCH 11/13] OpenACC 2.6 deep copy: C and C++ execution tests

2019-12-17 Thread Julian Brown
This patch has been broken out of the "OpenACC 2.6 manual deep copy
support" patch, last posted here:

  https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02376.html

This part adds C and C++ execution tests to libgomp.

Tested alongside other patches in this series with offloading to
NVPTX. OK?

Thanks,

Julian

ChangeLog

libgomp/
* testsuite/libgomp.oacc-c-c++-common/deep-copy-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-2.c: New test.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-4.c: New test.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-6.c: New test.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-7.c: New test.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-8.c: New test.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-9.c: New test.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-10.c: New test.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-11.c: New test.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-14.c: New test.
* testsuite/libgomp.oacc-c++/deep-copy-12.C: New test.
* testsuite/libgomp.oacc-c++/deep-copy-13.C: New test.
---
 .../testsuite/libgomp.oacc-c++/deep-copy-12.C | 72 +++
 .../testsuite/libgomp.oacc-c++/deep-copy-13.C | 72 +++
 .../libgomp.oacc-c-c++-common/deep-copy-1.c   | 24 +
 .../libgomp.oacc-c-c++-common/deep-copy-10.c  | 53 +++
 .../libgomp.oacc-c-c++-common/deep-copy-11.c  | 72 +++
 .../libgomp.oacc-c-c++-common/deep-copy-14.c  | 63 ++
 .../libgomp.oacc-c-c++-common/deep-copy-2.c   | 29 +++
 .../libgomp.oacc-c-c++-common/deep-copy-4.c   | 87 +++
 .../libgomp.oacc-c-c++-common/deep-copy-6.c   | 59 +
 .../libgomp.oacc-c-c++-common/deep-copy-7.c   | 45 ++
 .../libgomp.oacc-c-c++-common/deep-copy-8.c   | 54 
 .../libgomp.oacc-c-c++-common/deep-copy-9.c   | 53 +++
 12 files changed, 683 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c++/deep-copy-12.C
 create mode 100644 libgomp/testsuite/libgomp.oacc-c++/deep-copy-13.C
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-10.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-11.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-14.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-2.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-4.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-6.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-7.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-8.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-9.c

diff --git a/libgomp/testsuite/libgomp.oacc-c++/deep-copy-12.C b/libgomp/testsuite/libgomp.oacc-c++/deep-copy-12.C
new file mode 100644
index 000..a512008685d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c++/deep-copy-12.C
@@ -0,0 +1,72 @@
+#include <stdlib.h>
+
+/* Test attach/detach with dereferences of reference to pointer to struct.  */
+
+typedef struct {
+  int *a;
+  int *b;
+  int *c;
+} mystruct;
+
+int main(int argc, char* argv[])
+{
+  const int N = 1024;
+  mystruct *m = (mystruct *) malloc (sizeof (*m));
+  mystruct *&mref = m;
+  int i;
+
+  mref->a = (int *) malloc (N * sizeof (int));
+  m->b = (int *) malloc (N * sizeof (int));
+  m->c = (int *) malloc (N * sizeof (int));
+
+  for (i = 0; i < N; i++)
+{
+  mref->a[i] = 0;
+  m->b[i] = 0;
+  m->c[i] = 0;
+}
+
+#pragma acc enter data copyin(m[0:1])
+
+  for (int i = 0; i < 99; i++)
+{
+  int j;
+#pragma acc parallel loop copy(mref->a[0:N])
+  for (j = 0; j < N; j++)
+   mref->a[j]++;
+#pragma acc parallel loop copy(mref->b[0:N], m->c[5:N-10])
+  for (j = 0; j < N; j++)
+   {
+ mref->b[j]++;
+ if (j > 5 && j < N - 5)
+   m->c[j]++;
+   }
+}
+
+#pragma acc exit data copyout(m[0:1])
+
+  for (i = 0; i < N; i++)
+{
+  if (m->a[i] != 99)
+   abort ();
+  if (m->b[i] != 99)
+   abort ();
+  if (i > 5 && i < N-5)
+   {
+ if (m->c[i] != 99)
+   abort ();
+   }
+  else
+   {
+ if (m->c[i] != 0)
+   abort ();
+   }
+}
+
+  free (m->a);
+  free (m->b);
+  free (m->c);
+  free (m);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c++/deep-copy-13.C b/libgomp/testsuite/libgomp.oacc-c++/deep-copy-13.C
new file mode 100644
index 000..a5194568603
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c++/deep-copy-13.C
@@ -0,0 +1,72 @@
+#include <stdlib.h>
+
+/* Test array slice with reference to pointer.  */
+
+typedef struct {
+  int *a;
+  int *b;
+  int *c;
+} mystruct;
+
+int main(int argc, char* argv[])
+{
+  const int N 

[PATCH] Fix redundant load missed by fre [tree-optimization 92980]

2019-12-17 Thread Hongtao Liu
Hi:
  This patch simplifies A * C + (-D) -> (A - D/C) * C when C is a
power of 2 and D mod C == 0.
  Bootstrap and make check are OK.
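
For reference, the identity behind the rewrite can be checked outside the
compiler.  What follows is a minimal standalone sketch, not part of the
patch: it assumes 32-bit unsigned wraparound arithmetic, and the helper
names (mul_then_sub, sub_then_mul, forms_agree) are made up for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Original form: a * c + (-d), where -d is the unsigned negation of d.  */
static uint32_t
mul_then_sub (uint32_t a, uint32_t c, uint32_t d)
{
  return a * c + (0u - d);
}

/* Rewritten form: (a - d / c) * c.  Only equivalent when d % c == 0,
   so that precondition is asserted here.  */
static uint32_t
sub_then_mul (uint32_t a, uint32_t c, uint32_t d)
{
  assert (c != 0 && d % c == 0);
  return (a - d / c) * c;
}

/* Return 1 if both forms agree for every a in [0, limit), else 0.  */
static int
forms_agree (uint32_t c, uint32_t d, uint32_t limit)
{
  for (uint32_t a = 0; a < limit; a++)
    if (mul_then_sub (a, c, d) != sub_then_mul (a, c, d))
      return 0;
  return 1;
}
```

For instance, forms_agree (8, 16, 1000) compares a * 8 - 16 against
(a - 2) * 8 for the first thousand values of a, including the values of a
for which the subtraction wraps around.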

changelog
gcc/
* gcc/match.pd (A * C + (-D) = (A - D/C) * C. when C is a
power of 2 and D mod C == 0): Add new simplification.

gcc/testsuite
* gcc.dg/pr92980.c: New test.

-- 
BR,
Hongtao
From 41f76f29f0070082e29082460efdb0bb9b9869f7 Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Fri, 13 Dec 2019 15:52:02 +0800
Subject: [PATCH] Simplify A * C + (-D) = (A - D/C) * C. when C is a power of 2
 and D mod C == 0.

gcc/
	* gcc/match.pd (A * C + (-D) = (A - D/C) * C. when C is a
	power of 2 and D mod C == 0): Add new simplification.

gcc/testsuite
	* gcc.dg/pr92980.c: New test.
---
 gcc/match.pd   | 20 
 gcc/testsuite/gcc.dg/pr92980.c | 43 ++
 2 files changed, 63 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr92980.c

diff --git a/gcc/match.pd b/gcc/match.pd
index dda86964b4c..a128733e2c3 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4297,6 +4297,26 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (tree_single_nonzero_warnv_p (@0, NULL))
{ constant_boolean_node (cmp == NE_EXPR, type); })))
 
+/* Simplify A * C + (-D) = (A - D/C) * C. when C is a power of 2
+   and D mod C == 0.  */
+(simplify
+ (plus (mult @0 integer_pow2p@1) INTEGER_CST@2)
+ (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
+ && TYPE_UNSIGNED (TREE_TYPE (@0))
+ && tree_fits_uhwi_p (@1)
+ && tree_fits_uhwi_p (@2))
+  (with
+   {
+ unsigned HOST_WIDE_INT c = tree_to_uhwi (@1);
+ unsigned HOST_WIDE_INT d = tree_to_uhwi (@2);
+ HOST_WIDE_INT neg_p = wi::sign_mask (d);
+ unsigned HOST_WIDE_INT negd = HOST_WIDE_INT_0U - d;
+ unsigned HOST_WIDE_INT modd = negd % c;
+ unsigned HOST_WIDE_INT divd = negd / c;
+}
+   (if (neg_p && modd == HOST_WIDE_INT_0U)
+(mult (minus @0 { build_int_cst (TREE_TYPE (@2), divd);}) @1)))))
+
 /* If we have (A & C) == C where C is a power of 2, convert this into
(A & C) != 0.  Similarly for NE_EXPR.  */
 (for cmp (eq ne)
diff --git a/gcc/testsuite/gcc.dg/pr92980.c b/gcc/testsuite/gcc.dg/pr92980.c
new file mode 100644
index 000..d7abf20788e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr92980.c
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre" }  */
+
+int f1(short *src1, int i, int k, int n)
+{
+  int j = k + n;
+  short sum = src1[j];
+  sum += src1[j-1];
+  if (i <= k)
+{
+  j+=2;
+  sum += src1[j-3];
+}
+  return sum + j;
+}
+
+int f2(int *src1, int i, int k, int n)
+{
+  int j = k + n;
+  int sum = src1[j];
+  sum += src1[j-1];
+  if (i <= k)
+{
+  j+=2;
+  sum += src1[j-3];
+}
+  return sum + j;
+}
+
+int f3(long long *src1, int i, int k, int n)
+{
+  int j = k + n;
+  long long sum = src1[j];
+  sum += src1[j-1];
+  if (i <= k)
+{
+  j+=2;
+  sum += src1[j-3];
+}
+  return sum + j;
+}
+
+/* { dg-final { scan-tree-dump-times "= \\*" 6 "fre1" } }  */
-- 
2.18.1



Re: [PATCH] Fix redundant load missed by fre [tree-optimization 92980]

2019-12-17 Thread Andrew Pinski
On Tue, Dec 17, 2019 at 6:33 PM Hongtao Liu  wrote:
>
> Hi:
>   This patch is to simplify A * C + (-D) -> (A - D/C) * C when C is a
> power of 2 and D mod C == 0.
>   bootstrap and make check is ok.

I don't see why D has to be negative here.


> TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
> + && TYPE_UNSIGNED (TREE_TYPE (@0))

This is the wrong check here.
Use INTEGRAL_TYPE_P.

> + (plus (mult @0 integer_pow2p@1) INTEGER_CST@2)

You might want a :s here for the mult and/or plus.

> unsigned HOST_WIDE_INT d = tree_to_uhwi (@2);
> ...

Maybe use wide_int math instead of HOST_WIDE_INT here; then you don't
need the tree_fits_uhwi_p check.

Add a testcase that tests the pattern directly rather than indirectly.

Also, we are in stage 3, which means bug fixes only, so this might/should
wait until stage 1.

Thanks,
Andrew Pinski

>
> changelog
> gcc/
> * gcc/match.pd (A * C + (-D) = (A - D/C) * C. when C is a
> power of 2 and D mod C == 0): Add new simplification.
>
> gcc/testsuite
> * gcc.dg/pr92980.c: New test.
>
> --
> BR,
> Hongtao


[PATCH 03/13] OpenACC reference count consistency checking

2019-12-17 Thread Julian Brown
This is a rebased version of the reference count consistency checking
patch last posted upstream here:

  https://gcc.gnu.org/ml/gcc-patches/2019-10/msg00239.html

This is not necessary for proper operation of the rest of the patches
in this series, but has proved useful in development.
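
As a rough illustration of what this kind of self-check does, here is a
toy model in plain C.  It is only a sketch under assumptions: the types
and the function (toy_block, toy_map_ref, toy_rc_check) are hypothetical
and much simpler than the libgomp structures, and the real checker in
target.c may count references differently.  The idea shown is just
"recompute the count from scratch, then compare with the stored one".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a mapped device block.  */
struct toy_block
{
  uintptr_t refcount;      /* Count maintained incrementally at run time.  */
  uintptr_t refcount_chk;  /* Count recomputed from scratch by the checker.  */
};

/* Hypothetical stand-in for a mapping holding one reference to a block.  */
struct toy_map_ref
{
  struct toy_block *block;
};

/* Recompute every block's reference count from the live references and
   compare it with the stored count.  Returns the number of blocks whose
   stored count disagrees with the recomputed one.  */
static size_t
toy_rc_check (struct toy_block *blocks, size_t nblocks,
              const struct toy_map_ref *refs, size_t nrefs)
{
  size_t bad = 0;

  /* Pass 1: clear the recomputed counts.  */
  for (size_t i = 0; i < nblocks; i++)
    blocks[i].refcount_chk = 0;

  /* Pass 2: count one reference per live mapping.  */
  for (size_t i = 0; i < nrefs; i++)
    if (refs[i].block != NULL)
      refs[i].block->refcount_chk++;

  /* Pass 3: verify stored against recomputed.  */
  for (size_t i = 0; i < nblocks; i++)
    if (blocks[i].refcount != blocks[i].refcount_chk)
      bad++;

  return bad;
}
```

A nonzero return value would point at a block whose incremental count has
drifted from what the live mappings imply.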

Tested (with RC_CHECKING enabled and alongside other patches in this
series) with offloading to NVPTX.

OK?

Julian

ChangeLog

libgomp/
* libgomp.h (RC_CHECKING): New macro, disabled by default, guarding all
hunks in this patch.
(target_mem_desc): Add refcount_chk, mark fields.
(splay_tree_key_s): Add refcount_chk field.
(dump_tgt, gomp_rc_check): Add prototypes.
* oacc-mem.c (GOACC_enter_exit_data): Add refcount self-check code.
* oacc-parallel.c (GOACC_parallel_keyed): Add refcount self-check code.
(GOACC_data_start, GOACC_data_end): Likewise.
* target.c (stdio.h): Include.
(dump_tgt, rc_check_clear, rc_check_count, rc_check_verify,
gomp_rc_check): New functions to consistency-check reference counts.
---
 libgomp/libgomp.h   |  18 
 libgomp/oacc-mem.c  |   6 ++
 libgomp/oacc-parallel.c |  27 ++
 libgomp/target.c| 181 
 4 files changed, 232 insertions(+)

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 865b9df2444..d20194bafbb 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -954,9 +954,17 @@ struct target_var_desc {
   uintptr_t length;
 };
 
+/* Uncomment to enable reference-count consistency checking (for development
+   use only).  */
+//#define RC_CHECKING 1
+
 struct target_mem_desc {
   /* Reference count.  */
   uintptr_t refcount;
+#ifdef RC_CHECKING
+  uintptr_t refcount_chk;
+  bool mark;
+#endif
   /* All the splay nodes allocated together.  */
   splay_tree_node array;
   /* Start of the target region.  */
@@ -1012,6 +1020,10 @@ struct splay_tree_key_s {
  "present increment" operations (via "acc enter data") referring to the same
  host-memory block.  */
   uintptr_t virtual_refcount;
+#ifdef RC_CHECKING
+  /* The recalculated reference count, for verification.  */
+  uintptr_t refcount_chk;
+#endif
   struct splay_tree_aux *aux;
 };
 
@@ -1158,6 +1170,12 @@ extern void gomp_copy_dev2host (struct gomp_device_descr *,
struct goacc_asyncqueue *, void *, const void *,
size_t);
 
+#ifdef RC_CHECKING
+extern void dump_tgt (const char *, struct target_mem_desc *);
+extern void gomp_rc_check (struct gomp_device_descr *,
+  struct target_mem_desc *);
+#endif
+
 extern struct target_mem_desc *gomp_map_vars (struct gomp_device_descr *,
  size_t, void **, void **,
  size_t *, void *, bool,
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 2a0e7236b92..4a2185c58ad 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -1147,4 +1147,10 @@ GOACC_enter_exit_data (int flags_m, size_t mapnum, void **hostaddrs,
   thr->prof_info = NULL;
   thr->api_info = NULL;
 }
+
+#ifdef RC_CHECKING
+  gomp_mutex_lock (&acc_dev->lock);
+  gomp_rc_check (acc_dev, thr->mapped_data);
+  gomp_mutex_unlock (&acc_dev->lock);
+#endif
 }
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index 5c13a7e4348..eb281db323c 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -301,6 +301,15 @@ GOACC_parallel_keyed (int flags_m, void (*fn) (void *),
 &api_info);
 }
   
+#ifdef RC_CHECKING
+  gomp_mutex_lock (&acc_dev->lock);
+  assert (tgt);
+  dump_tgt (__FUNCTION__, tgt);
+  tgt->prev = thr->mapped_data;
+  gomp_rc_check (acc_dev, tgt);
+  gomp_mutex_unlock (&acc_dev->lock);
+#endif
+
   devaddrs = gomp_alloca (sizeof (void *) * mapnum);
   for (i = 0; i < mapnum; i++)
 if (tgt->list[i].key != NULL)
@@ -351,6 +360,12 @@ GOACC_parallel_keyed (int flags_m, void (*fn) (void *),
   thr->prof_info = NULL;
   thr->api_info = NULL;
 }
+
+#ifdef RC_CHECKING
+  gomp_mutex_lock (&acc_dev->lock);
+  gomp_rc_check (acc_dev, thr->mapped_data);
+  gomp_mutex_unlock (&acc_dev->lock);
+#endif
 }
 
 /* Legacy entry point (GCC 5).  Only provide host fallback execution.  */
@@ -484,6 +499,12 @@ GOACC_data_start (int flags_m, size_t mapnum,
   thr->prof_info = NULL;
   thr->api_info = NULL;
 }
+
+#ifdef RC_CHECKING
+  gomp_mutex_lock (&acc_dev->lock);
+  gomp_rc_check (acc_dev, thr->mapped_data);
+  gomp_mutex_unlock (&acc_dev->lock);
+#endif
 }
 
 void
@@ -557,6 +578,12 @@ GOACC_data_end (void)
   thr->prof_info = NULL;
   thr->api_info = NULL;
 }
+
+#ifdef RC_CHECKING
+  gomp_mutex_lock (&thr->dev->lock);
+  gomp_rc_check (thr->dev, thr->mapped_data);
+  gomp_mutex_unlock (&thr->dev->lock);
+#endif
 }
 
 void
diff --git a/libgomp/target.c b/libgomp/target.c
index 23f9e1618ca..5712da5b64e 100644
--- a/libgomp/target.c
+++ 

[PATCH 05/13] Factor out duplicate code in gimplify_scan_omp_clauses

2019-12-17 Thread Julian Brown
This patch was previously posted here:

  https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00321.html

This version is the same as the last-posted version. The middle-end patch
later in the series depends on this. Tested alongside other patches in
this series with offloading to NVPTX. OK?

Thanks,

Julian

ChangeLog

gcc/
* gimplify.c (insert_struct_comp_map, extract_base_bit_offset): New.
(gimplify_scan_omp_clauses): Outline duplicated code into calls to
above two functions.
---
 gcc/gimplify.c | 290 ++---
 1 file changed, 157 insertions(+), 133 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 9073680cb31..e3088dcbe05 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -8186,6 +8186,138 @@ gimplify_omp_depend (tree *list_p, gimple_seq *pre_p)
   return 1;
 }
 
+/* Insert a GOMP_MAP_ALLOC or GOMP_MAP_RELEASE node following a
+   GOMP_MAP_STRUCT mapping.  C is an always_pointer mapping.  STRUCT_NODE is
+   the struct node to insert the new mapping after (when the struct node is
+   initially created).  PREV_NODE is the first of two or three mappings for a
+   pointer, and is either:
+ - the node before C, when a pair of mappings is used, e.g. for a C/C++
+   array section.
+ - not the node before C.  This is true when we have a reference-to-pointer
+   type (with a mapping for the reference and for the pointer), or for
+   Fortran derived-type mappings with a GOMP_MAP_TO_PSET.
+   If SCP is non-null, the new node is inserted before *SCP.
+   if SCP is null, the new node is inserted before PREV_NODE.
+   The return type is:
+ - PREV_NODE, if SCP is non-null.
+ - The newly-created ALLOC or RELEASE node, if SCP is null.
+ - The second newly-created ALLOC or RELEASE node, if we are mapping a
+   reference to a pointer.  */
+
+static tree
+insert_struct_comp_map (enum tree_code code, tree c, tree struct_node,
+   tree prev_node, tree *scp)
+{
+  enum gomp_map_kind mkind
+= code == OMP_TARGET_EXIT_DATA ? GOMP_MAP_RELEASE : GOMP_MAP_ALLOC;
+
+  tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP);
+  tree cl = scp ? prev_node : c2;
+  OMP_CLAUSE_SET_MAP_KIND (c2, mkind);
+  OMP_CLAUSE_DECL (c2) = unshare_expr (OMP_CLAUSE_DECL (c));
+  OMP_CLAUSE_CHAIN (c2) = scp ? *scp : prev_node;
+  OMP_CLAUSE_SIZE (c2) = TYPE_SIZE_UNIT (ptr_type_node);
+  if (struct_node)
+OMP_CLAUSE_CHAIN (struct_node) = c2;
+
+  /* We might need to create an additional mapping if we have a reference to a
+ pointer (in C++).  */
+  if (OMP_CLAUSE_CHAIN (prev_node) != c)
+{
+  tree c4 = OMP_CLAUSE_CHAIN (prev_node);
+  tree c3 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP);
+  OMP_CLAUSE_SET_MAP_KIND (c3, mkind);
+  OMP_CLAUSE_DECL (c3) = unshare_expr (OMP_CLAUSE_DECL (c4));
+  OMP_CLAUSE_SIZE (c3) = TYPE_SIZE_UNIT (ptr_type_node);
+  OMP_CLAUSE_CHAIN (c3) = prev_node;
+  if (!scp)
+   OMP_CLAUSE_CHAIN (c2) = c3;
+  else
+   cl = c3;
+}
+
+  if (scp)
+*scp = c2;
+
+  return cl;
+}
+
+/* Strip ARRAY_REFS or an indirect ref off BASE, find the containing object,
+   and set *BITPOSP and *POFFSETP to the bit offset of the access.
+   If BASE_REF is non-NULL and the containing object is a reference, set
+   *BASE_REF to that reference before dereferencing the object.
+   If BASE_REF is NULL, check that the containing object is a COMPONENT_REF or
+   has array type, else return NULL.  */
+
+static tree
+extract_base_bit_offset (tree base, tree *base_ref, poly_int64 *bitposp,
+poly_offset_int *poffsetp)
+{
+  tree offset;
+  poly_int64 bitsize, bitpos;
+  machine_mode mode;
+  int unsignedp, reversep, volatilep = 0;
+  poly_offset_int poffset;
+
+  if (base_ref)
+{
+  *base_ref = NULL_TREE;
+
+  while (TREE_CODE (base) == ARRAY_REF)
+   base = TREE_OPERAND (base, 0);
+
+  if (TREE_CODE (base) == INDIRECT_REF)
+   base = TREE_OPERAND (base, 0);
+}
+  else
+{
+  if (TREE_CODE (base) == ARRAY_REF)
+   {
+ while (TREE_CODE (base) == ARRAY_REF)
+   base = TREE_OPERAND (base, 0);
+ if (TREE_CODE (base) != COMPONENT_REF
+ || TREE_CODE (TREE_TYPE (base)) != ARRAY_TYPE)
+   return NULL_TREE;
+   }
+  else if (TREE_CODE (base) == INDIRECT_REF
+  && TREE_CODE (TREE_OPERAND (base, 0)) == COMPONENT_REF
+  && (TREE_CODE (TREE_TYPE (TREE_OPERAND (base, 0)))
+  == REFERENCE_TYPE))
+   base = TREE_OPERAND (base, 0);
+}
+
+  base = get_inner_reference (base, &bitsize, &bitpos, &offset, &mode,
+ &unsignedp, &reversep, &volatilep);
+
+  tree orig_base = base;
+
+  if ((TREE_CODE (base) == INDIRECT_REF
+   || (TREE_CODE (base) == MEM_REF
+  && integer_zerop (TREE_OPERAND (base, 1))))
+  && DECL_P (TREE_OPERAND (base, 0))
+  && TREE_CODE (TREE_TYPE (TREE_OPERAND (base, 0))) == 

[PATCH 04/13] Use gomp_map_val for OpenACC host-to-device address translation

2019-12-17 Thread Julian Brown
This patch was previously approved here, but I have not committed it yet
(without the other patches in this series):

  https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01156.html

Included for completeness. I will commit this alongside other patches
if they are approved (or it could probably go in by itself anyway).

Thanks,

Julian

ChangeLog

libgomp/
* libgomp.h (gomp_map_val): Add prototype.
* oacc-parallel.c (GOACC_parallel_keyed): Use gomp_map_val instead of
open-coding device-address calculation.
* target.c (gomp_map_val): Make global.
---
 libgomp/libgomp.h   | 1 +
 libgomp/oacc-parallel.c | 8 ++--
 libgomp/target.c| 2 +-
 3 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index d20194bafbb..248d8dc5d63 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -1169,6 +1169,7 @@ extern void gomp_copy_host2dev (struct gomp_device_descr *,
 extern void gomp_copy_dev2host (struct gomp_device_descr *,
struct goacc_asyncqueue *, void *, const void *,
size_t);
+extern uintptr_t gomp_map_val (struct target_mem_desc *, void **, size_t);
 
 #ifdef RC_CHECKING
 extern void dump_tgt (const char *, struct target_mem_desc *);
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index eb281db323c..4cc0636aae1 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -312,12 +312,8 @@ GOACC_parallel_keyed (int flags_m, void (*fn) (void *),
 
   devaddrs = gomp_alloca (sizeof (void *) * mapnum);
   for (i = 0; i < mapnum; i++)
-if (tgt->list[i].key != NULL)
-  devaddrs[i] = (void *) (tgt->list[i].key->tgt->tgt_start
- + tgt->list[i].key->tgt_offset
- + tgt->list[i].offset);
-else
-  devaddrs[i] = NULL;
+devaddrs[i] = (void *) gomp_map_val (tgt, hostaddrs, i);
+
   if (aq == NULL)
 acc_dev->openacc.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs, dims,
tgt);
diff --git a/libgomp/target.c b/libgomp/target.c
index 5712da5b64e..46b20c04865 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -673,7 +673,7 @@ gomp_map_fields_existing (struct target_mem_desc *tgt,
  (void *) cur_node.host_end);
 }
 
-static inline uintptr_t
+attribute_hidden uintptr_t
 gomp_map_val (struct target_mem_desc *tgt, void **hostaddrs, size_t i)
 {
   if (tgt->list[i].key != NULL)
-- 
2.23.0



[PATCH 06/13] OpenACC 2.6 deep copy: attach/detach API routines

2019-12-17 Thread Julian Brown
This patch has been broken out of the "OpenACC 2.6 manual deep copy
support" patch, last posted here:

  https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02376.html

It contains just the minimal libgomp bits necessary to support the OpenACC
runtime API routines (acc_attach, acc_detach and async/finalize versions
of same). This is essentially the same as the version posted by Thomas
here, modulo rebasing:

  https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01212.html
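
To make the role of the attach_count field a little more concrete, here is
a toy model of an attachment counter, written from my reading of the
OpenACC 2.6 attach/detach wording.  It is a sketch only: toy_slot,
toy_attach and toy_detach are hypothetical names, the "device" is just a
field in a struct, and libgomp's gomp_attach_pointer/gomp_detach_pointer
may well treat corner cases (such as detaching with a zero count)
differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one pointer field inside a mapped struct.  */
struct toy_slot
{
  uintptr_t host_value;    /* Pointer value stored on the host.  */
  uintptr_t device_value;  /* Value currently held by the device copy.  */
  uintptr_t attach_count;  /* Number of outstanding attachments.  */
};

/* Attach: on the first attachment, point the device copy at the device
   address of the target; always bump the counter.  */
static void
toy_attach (struct toy_slot *s, uintptr_t device_target)
{
  if (s->attach_count == 0)
    s->device_value = device_target;
  s->attach_count++;
}

/* Detach: drop one attachment (or all of them when FINALIZE is nonzero);
   once the counter reaches zero, restore the host pointer value.  In this
   model, detaching with a zero counter does nothing.  */
static void
toy_detach (struct toy_slot *s, int finalize)
{
  if (s->attach_count == 0)
    return;
  if (finalize)
    s->attach_count = 0;
  else
    s->attach_count--;
  if (s->attach_count == 0)
    s->device_value = s->host_value;
}
```

In this model the finalize variants differ from the plain ones only in
forcing the counter to zero in a single call.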

Tested alongside other patches in this series with offloading to
NVPTX. OK?

Thanks,

Julian

ChangeLog

libgomp/
* libgomp.h (struct splay_tree_aux): Add attach_count field.
(gomp_attach_pointer, gomp_detach_pointer): Add prototypes.
* libgomp.map (OACC_2.6): New section. Add acc_attach,
acc_attach_async, acc_detach, acc_detach_async, acc_detach_finalize,
acc_detach_finalize_async.
* oacc-mem.c (acc_attach_async, acc_attach, goacc_detach_internal,
acc_detach, acc_detach_async, acc_detach_finalize,
acc_detach_finalize_async): New functions.
* openacc.h (acc_attach, acc_attach_async, acc_detach,
acc_detach_async, acc_detach_finalize, acc_detach_finalize_async): Add
prototypes.
* target.c (gomp_attach_pointer, gomp_detach_pointer): New functions.
(gomp_remove_var_internal): Free attachment counts if present.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-3.c: New test.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-5.c: New test.
---
 libgomp/libgomp.h |  10 ++
 libgomp/libgomp.map   |  10 ++
 libgomp/oacc-mem.c|  84 +++
 libgomp/openacc.h |   6 +
 libgomp/target.c  | 130 ++
 .../libgomp.oacc-c-c++-common/deep-copy-3.c   |  34 +
 .../libgomp.oacc-c-c++-common/deep-copy-5.c   |  81 +++
 7 files changed, 355 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-3.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-5.c

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 248d8dc5d63..2017991b59c 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -1002,6 +1002,9 @@ struct target_mem_desc {
 struct splay_tree_aux {
   /* Pointer to the original mapping of "omp declare target link" object.  */
   splay_tree_key link_key;
+  /* For a block with attached pointers, the attachment counters for each.
+ Only used for OpenACC.  */
+  uintptr_t *attach_count;
 };
 
 struct splay_tree_key_s {
@@ -1170,6 +1173,13 @@ extern void gomp_copy_dev2host (struct gomp_device_descr *,
struct goacc_asyncqueue *, void *, const void *,
size_t);
 extern uintptr_t gomp_map_val (struct target_mem_desc *, void **, size_t);
+extern void gomp_attach_pointer (struct gomp_device_descr *,
+struct goacc_asyncqueue *, splay_tree,
+splay_tree_key, uintptr_t, size_t,
+struct gomp_coalesce_buf *);
+extern void gomp_detach_pointer (struct gomp_device_descr *,
+struct goacc_asyncqueue *, splay_tree_key,
+uintptr_t, bool, struct gomp_coalesce_buf *);
 
 #ifdef RC_CHECKING
 extern void dump_tgt (const char *, struct target_mem_desc *);
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index c79430f8d8d..63276f7d29b 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -484,6 +484,16 @@ OACC_2.5.1 {
acc_register_library;
 } OACC_2.5;
 
+OACC_2.6 {
+  global:
+   acc_attach;
+   acc_attach_async;
+   acc_detach;
+   acc_detach_async;
+   acc_detach_finalize;
+   acc_detach_finalize_async;
+} OACC_2.5.1;
+
 GOACC_2.0 {
   global:
GOACC_data_end;
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 4a2185c58ad..08507791399 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -866,6 +866,90 @@ acc_update_self_async (void *h, size_t s, int async)
   update_dev_host (0, h, s, async);
 }
 
+void
+acc_attach_async (void **hostaddr, int async)
+{
+  struct goacc_thread *thr = goacc_thread ();
+  struct gomp_device_descr *acc_dev = thr->dev;
+  goacc_aq aq = get_goacc_asyncqueue (async);
+
+  struct splay_tree_key_s cur_node;
+  splay_tree_key n;
+
+  if (thr->dev->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
+return;
+
+  gomp_mutex_lock (&acc_dev->lock);
+
+  cur_node.host_start = (uintptr_t) hostaddr;
+  cur_node.host_end = cur_node.host_start + sizeof (void *);
+  n = splay_tree_lookup (&acc_dev->mem_map, &cur_node);
+
+  if (n == NULL)
+gomp_fatal ("struct not mapped for acc_attach");
+
+  gomp_attach_pointer (acc_dev, aq, &acc_dev->mem_map, n, (uintptr_t) hostaddr,
+  0, NULL);
+
+  gomp_mutex_unlock (&acc_dev->lock);
+}
+
+void

[PATCH 07/13] OpenACC 2.6 deep copy: libgomp parts

2019-12-17 Thread Julian Brown
This patch has been broken out of the "OpenACC 2.6 manual deep copy
support" patch, last posted here:

  https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02376.html

This part contains the libgomp runtime support for the GOMP_MAP_ATTACH and
GOMP_MAP_DETACH mapping kinds (etc.), as introduced by the front-end
patches following in this series.
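
As a quick aid for reading the gomp-constants.h hunk below, the new kinds
can be evaluated by hand.  The following sketch copies the relevant flag
definitions from this patch into a standalone file and checks the
resulting numeric values at compile time; the enum tag toy_map_kind and
the helper toy_strip_force are hypothetical and exist only for this
illustration.

```c
#include <assert.h>

/* Flag values as they appear in include/gomp-constants.h in this patch.  */
#define GOMP_MAP_FLAG_SPECIAL_2 (1 << 4)
#define GOMP_MAP_FLAG_SPECIAL_4 (1 << 6)
#define GOMP_MAP_FLAG_FORCE (1 << 7)
#define GOMP_MAP_DEEP_COPY (GOMP_MAP_FLAG_SPECIAL_4 \
                            | GOMP_MAP_FLAG_SPECIAL_2)

/* Only the three kinds added by this patch.  */
enum toy_map_kind
{
  GOMP_MAP_ATTACH = (GOMP_MAP_DEEP_COPY | 0),
  GOMP_MAP_DETACH = (GOMP_MAP_DEEP_COPY | 1),
  GOMP_MAP_FORCE_DETACH = (GOMP_MAP_DEEP_COPY | GOMP_MAP_FLAG_FORCE | 1)
};

_Static_assert (GOMP_MAP_DEEP_COPY == 0x50, "bits 4 and 6");
_Static_assert (GOMP_MAP_ATTACH == 0x50, "deep copy, sub-kind 0");
_Static_assert (GOMP_MAP_DETACH == 0x51, "deep copy, sub-kind 1");
_Static_assert (GOMP_MAP_FORCE_DETACH == 0xd1, "detach plus force bit");

/* Hypothetical helper: clear the force bit from KIND.  */
static unsigned
toy_strip_force (unsigned kind)
{
  return kind & ~(unsigned) GOMP_MAP_FLAG_FORCE;
}
```

So GOMP_MAP_FORCE_DETACH is GOMP_MAP_DETACH with the force bit set, and
all three values fit in the low byte of a map kind.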

Tested alongside other patches in this series with offloading to
NVPTX. OK?

Thanks,

Julian

ChangeLog

include/
* gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_4, GOMP_MAP_DEEP_COPY):
Define.
(gomp_map_kind): Add GOMP_MAP_ATTACH, GOMP_MAP_DETACH,
GOMP_MAP_FORCE_DETACH.

libgomp/
* libgomp.h (struct target_var_desc): Add do_detach flag.
* oacc-init.c (acc_shutdown_1): Free aux block if present.
* oacc-mem.c (find_group_last): Add SIZES parameter. Support
struct components.  Tidy up and add some new checks.
(goacc_enter_data_internal): Update call to find_group_last.
(goacc_exit_data_internal): Support detach operations and
GOMP_MAP_STRUCT.
(GOACC_enter_exit_data): Handle initial GOMP_MAP_STRUCT or
GOMP_MAP_FORCE_PRESENT in finalization detection code.  Handle
attach/detach in enter/exit data detection code.
* target.c (gomp_map_vars_existing): Initialise do_detach field of
tgt_var_desc.
(gomp_map_vars_internal): Support attach.
(gomp_unmap_vars_internal): Support detach.
---
 include/gomp-constants.h |  10 
 libgomp/libgomp.h|   2 +
 libgomp/oacc-mem.c   | 121 +--
 libgomp/target.c |  51 -
 4 files changed, 166 insertions(+), 18 deletions(-)

diff --git a/include/gomp-constants.h b/include/gomp-constants.h
index 9e356cdfeec..e8bd52e81bd 100644
--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h
@@ -40,8 +40,11 @@
 #define GOMP_MAP_FLAG_SPECIAL_0(1 << 2)
 #define GOMP_MAP_FLAG_SPECIAL_1(1 << 3)
 #define GOMP_MAP_FLAG_SPECIAL_2(1 << 4)
+#define GOMP_MAP_FLAG_SPECIAL_4(1 << 6)
 #define GOMP_MAP_FLAG_SPECIAL  (GOMP_MAP_FLAG_SPECIAL_1 \
 | GOMP_MAP_FLAG_SPECIAL_0)
+#define GOMP_MAP_DEEP_COPY (GOMP_MAP_FLAG_SPECIAL_4 \
+| GOMP_MAP_FLAG_SPECIAL_2)
 /* Flag to force a specific behavior (or else, trigger a run-time error).  */
 #define GOMP_MAP_FLAG_FORCE(1 << 7)
 
@@ -127,6 +130,13 @@ enum gomp_map_kind
 /* Decrement usage count and deallocate if zero.  */
 GOMP_MAP_RELEASE = (GOMP_MAP_FLAG_SPECIAL_2
 | GOMP_MAP_DELETE),
+/* In OpenACC, attach a pointer to a mapped struct field.  */
+GOMP_MAP_ATTACH =  (GOMP_MAP_DEEP_COPY | 0),
+/* In OpenACC, detach a pointer to a mapped struct field.  */
+GOMP_MAP_DETACH =  (GOMP_MAP_DEEP_COPY | 1),
+/* In OpenACC, detach a pointer to a mapped struct field.  */
+GOMP_MAP_FORCE_DETACH =(GOMP_MAP_DEEP_COPY
+| GOMP_MAP_FLAG_FORCE | 1),
 
 /* Internal to GCC, not used in libgomp.  */
 /* Do not map, but pointer assign a pointer instead.  */
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 2017991b59c..6141cc117bc 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -948,6 +948,8 @@ struct target_var_desc {
   bool copy_from;
   /* True if data always should be copied from device to host at the end.  */
   bool always_copy_from;
+  /* True if variable should be detached at end of region.  */
+  bool do_detach;
   /* Relative offset against key host_start.  */
   uintptr_t offset;
   /* Actual length.  */
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 08507791399..ce9f2759dfa 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -956,33 +956,48 @@ acc_detach_finalize_async (void **hostaddr, int async)
mappings.  */
 
 static int
-find_group_last (int pos, size_t mapnum, unsigned short *kinds)
+find_group_last (int pos, size_t mapnum, size_t *sizes, unsigned short *kinds)
 {
   unsigned char kind0 = kinds[pos] & 0xff;
-  int first_pos = pos, last_pos = pos;
+  int first_pos = pos;
 
-  if (kind0 == GOMP_MAP_TO_PSET)
+  switch (kind0)
 {
+case GOMP_MAP_TO_PSET:
   while (pos + 1 < mapnum && (kinds[pos + 1] & 0xff) == GOMP_MAP_POINTER)
-   last_pos = ++pos;
+   pos++;
   /* We expect at least one GOMP_MAP_POINTER after a GOMP_MAP_TO_PSET.  */
-  assert (last_pos > first_pos);
-}
-  else
-{
+  assert (pos > first_pos);
+  break;
+
+case GOMP_MAP_STRUCT:
+  pos += sizes[pos];
+  break;
+
+case GOMP_MAP_POINTER:
+case GOMP_MAP_ALWAYS_POINTER:
+  /* These mappings are only expected after some other mapping.  If we
+see one by itself, something has gone wrong.  

[PATCH 01/13] Use aux struct in libgomp for infrequently-used/API-specific data

2019-12-17 Thread Julian Brown
This patch has been broken out of the "OpenACC 2.6 manual deep copy
support" patch, last posted here:

  https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02376.html

This part is included for completeness. It is the same as the patch
posted by Thomas here:

  https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01208.html
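
The pattern itself is simple enough to show in isolation.  Below is a
sketch of an on-demand auxiliary block hanging off a key, using
hypothetical toy types and helpers (toy_aux, toy_aux_key,
toy_key_get_aux, toy_key_link, toy_key_release_aux) and plain calloc in
place of gomp_malloc_cleared; it is not the libgomp code.

```c
#include <assert.h>
#include <stdlib.h>

/* Rarely-needed data lives in a separately allocated block.  */
struct toy_aux
{
  void *link_key;
};

/* The frequently-used key only pays for one pointer.  */
struct toy_aux_key
{
  unsigned long host_start;
  struct toy_aux *aux;   /* NULL until some aux data is needed.  */
};

/* Return the aux block for K, allocating a zeroed one on first use.  */
static struct toy_aux *
toy_key_get_aux (struct toy_aux_key *k)
{
  if (k->aux == NULL)
    {
      k->aux = calloc (1, sizeof (struct toy_aux));
      if (k->aux == NULL)
        abort ();
    }
  return k->aux;
}

/* Read link_key without forcing an allocation.  */
static void *
toy_key_link (const struct toy_aux_key *k)
{
  return k->aux ? k->aux->link_key : NULL;
}

/* Free the aux block, if any, when the key is removed.  */
static void
toy_key_release_aux (struct toy_aux_key *k)
{
  free (k->aux);
  k->aux = NULL;
}
```

Readers have to test the aux pointer before dereferencing it, which is the
same shape as the "k->aux && k->aux->link_key" check in the target.c hunk.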

Tested together with other patches in this series with offloading
to NVPTX.

OK for mainline?

Thanks,

Julian

ChangeLog

libgomp/
* libgomp.h (struct splay_tree_aux): New.
(struct splay_tree_key_s): Replace link_key field with aux pointer.
* target.c (gomp_map_vars_internal): Adjust for link_key being moved
to aux struct.
(gomp_remove_var_internal): Free aux block if present.
(gomp_load_image_to_device): Zero-initialise aux field instead of
link_key field.
(omp_target_associate_pointer): Zero-initialise aux field.
---
 libgomp/libgomp.h | 10 --
 libgomp/target.c  | 23 ---
 2 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 36dcca28353..0f1f11284d5 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -989,6 +989,13 @@ struct target_mem_desc {
 #define OFFSET_POINTER (~(uintptr_t) 1)
 #define OFFSET_STRUCT (~(uintptr_t) 2)
 
+/* Auxiliary structure for infrequently-used or API-specific data.  */
+
+struct splay_tree_aux {
+  /* Pointer to the original mapping of "omp declare target link" object.  */
+  splay_tree_key link_key;
+};
+
 struct splay_tree_key_s {
   /* Address of the host object.  */
   uintptr_t host_start;
@@ -1002,8 +1009,7 @@ struct splay_tree_key_s {
   uintptr_t refcount;
   /* Dynamic reference count.  */
   uintptr_t dynamic_refcount;
-  /* Pointer to the original mapping of "omp declare target link" object.  */
-  splay_tree_key link_key;
+  struct splay_tree_aux *aux;
 };
 
 /* The comparison function.  */
diff --git a/libgomp/target.c b/libgomp/target.c
index 82ed38c01ec..97c2b5c5e4d 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -907,13 +907,15 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
  kind & typemask, cbufp);
else
  {
-   k->link_key = NULL;
+   k->aux = NULL;
if (n && n->refcount == REFCOUNT_LINK)
  {
/* Replace target address of the pointer with target address
   of mapped object in the splay tree.  */
splay_tree_remove (mem_map, n);
-   k->link_key = n;
+   k->aux
+ = gomp_malloc_cleared (sizeof (struct splay_tree_aux));
+   k->aux->link_key = n;
  }
size_t align = (size_t) 1 << (kind >> rshift);
tgt->list[i].key = k;
@@ -1031,7 +1033,7 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
kind);
  }
 
-   if (k->link_key)
+   if (k->aux && k->aux->link_key)
  {
/* Set link pointer on target to the device address of the
   mapped object.  */
@@ -1146,8 +1148,14 @@ gomp_remove_var_internal (struct gomp_device_descr 
*devicep, splay_tree_key k,
 {
   bool is_tgt_unmapped = false;
   splay_tree_remove (&devicep->mem_map, k);
-  if (k->link_key)
-splay_tree_insert (&devicep->mem_map, (splay_tree_node) k->link_key);
+  if (k->aux)
+{
+  if (k->aux->link_key)
+   splay_tree_insert (&devicep->mem_map,
+  (splay_tree_node) k->aux->link_key);
+  free (k->aux);
+  k->aux = NULL;
+}
   if (aq)
 devicep->openacc.async.queue_callback_func (aq, gomp_unref_tgt_void,
(void *) k->tgt);
@@ -1366,7 +1374,7 @@ gomp_load_image_to_device (struct gomp_device_descr 
*devicep, unsigned version,
   k->tgt_offset = target_table[i].start;
   k->refcount = REFCOUNT_INFINITY;
   k->dynamic_refcount = 0;
-  k->link_key = NULL;
+  k->aux = NULL;
   array->left = NULL;
   array->right = NULL;
   splay_tree_insert (&devicep->mem_map, array);
@@ -1399,7 +1407,7 @@ gomp_load_image_to_device (struct gomp_device_descr 
*devicep, unsigned version,
   k->tgt_offset = target_var->start;
   k->refcount = target_size & link_bit ? REFCOUNT_LINK : REFCOUNT_INFINITY;
   k->dynamic_refcount = 0;
-  k->link_key = NULL;
+  k->aux = NULL;
   array->left = NULL;
   array->right = NULL;
   splay_tree_insert (&devicep->mem_map, array);
@@ -2661,6 +2669,7 @@ omp_target_associate_ptr (const void *host_ptr, const 
void *device_ptr,
   k->tgt_offset = (uintptr_t) device_ptr + device_offset;
   k->refcount = REFCOUNT_INFINITY;
   k->dynamic_refcount = 0;
+  k->aux = NULL;
   array->left = NULL;
   array->right = NULL;
   splay_tree_insert (&devicep->mem_map, array);
-- 
2.23.0
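
A minimal sketch of the pattern this patch applies, outside libgomp: rarely used fields move into a separately allocated block so that the common key stays small. The names key, key_aux, key_get_aux, key_link and key_release_aux are hypothetical, and calloc stands in for gomp_malloc_cleared.

```c
#include <assert.h>
#include <stdlib.h>

struct key;

/* Rarely used data (hypothetical stand-in for struct splay_tree_aux).  */
struct key_aux
{
  struct key *link_key;
};

struct key
{
  unsigned long host_start;
  /* NULL until one of the rarely used fields is actually needed.  */
  struct key_aux *aux;
};

/* Return K's aux block, allocating a zero-filled one on first use.  */
static struct key_aux *
key_get_aux (struct key *k)
{
  assert (k != NULL);
  if (k->aux == NULL)
    {
      k->aux = calloc (1, sizeof *k->aux);
      if (k->aux == NULL)
        abort ();
    }
  return k->aux;
}

/* Read the link key; both pointers must be checked, as in the patch.  */
static struct key *
key_link (const struct key *k)
{
  assert (k != NULL);
  return k->aux ? k->aux->link_key : NULL;
}

/* Free the aux block when the key is removed.  */
static void
key_release_aux (struct key *k)
{
  assert (k != NULL);
  free (k->aux);
  k->aux = NULL;
}
```

Every reader of link_key now has to test the aux pointer first, which is why the patch changes "if (k->link_key)" to "if (k->aux && k->aux->link_key)".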

[PATCH 02/13] OpenACC reference count overhaul

2019-12-17 Thread Julian Brown
This is a rebased version of the reference-count overhaul patch last
posted here:

  https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02235.html

This version omits parts of the above patch already committed upstream and
merges some recent REFCOUNT_INFINITY changes. This patch causes the newish
PR92843 test to fail, though IMO that test relies on behaviour arising
from a rather nuanced reading of the spec. Hopefully we can resolve that
problem as a follow-up.

Tested alongside other patches in this series with offloading to
NVPTX. OK?

Julian

2019-11-22  Julian Brown  
Thomas Schwinge  

libgomp/
* libgomp.h (struct splay_tree_key_s): Replace dynamic_refcount
field with virtual_refcount.
(enum gomp_map_vars_kind): Add GOMP_MAP_VARS_OPENACC_ENTER_DATA.
(gomp_free_memmap): Remove prototype.
* oacc-init.c (acc_shutdown_1): Iteratively call gomp_remove_var
instead of calling gomp_free_memmap.
* oacc-mem.c (acc_unmap_data): Open code instead of forcing
target_mem_desc's to_free NULL then calling gomp_unmap_vars.  Handle
REFCOUNT_INFINITY on target blocks.
(present_create_copy): Use virtual_refcount instead of
dynamic_refcount.  Re-do lookup for target pointer return value.
(delete_copyout): Update for virtual_refcount semantics.
(gomp_acc_insert_pointer, gomp_acc_remove_pointer, find_pointer):
Remove functions.
(find_group_last, goacc_enter_data_internal,
goacc_exit_data_internal): New functions.
(GOACC_enter_exit_data): Use goacc_enter_data_internal and
goacc_exit_data_internal helper functions.
* target.c (gomp_map_vars_internal): Handle
GOMP_MAP_VARS_OPENACC_ENTER_DATA.  Update for virtual_refcount
semantics.
(gomp_unmap_vars_internal): Update for virtual_refcount semantics.
(gomp_load_image_to_device, omp_target_associate_ptr): Zero-initialise
virtual_refcount field instead of dynamic_refcount.
(gomp_free_memmap): Remove function.
* testsuite/libgomp.oacc-c-c++-common/unmap-infinity-1.c: New test.
* testsuite/libgomp.c-c++-common/unmap-infinity-2.c: New test.
* testsuite/libgomp.oacc-c-c++-common/subset-subarray-mappings-1-r-p.c:
Remove PR92848 TODOs.
* testsuite/libgomp.oacc-c-c++-common/pr92843-1.c: Add XFAIL.
---
 libgomp/libgomp.h |   9 +-
 libgomp/oacc-init.c   |  10 +-
 libgomp/oacc-mem.c| 399 +++---
 libgomp/target.c  |  53 +--
 .../libgomp.c-c++-common/unmap-infinity-2.c   |  19 +
 .../libgomp.oacc-c-c++-common/pr92843-1.c |   1 +
 .../subset-subarray-mappings-1-r-p.c  |  16 -
 .../unmap-infinity-1.c|  17 +
 8 files changed, 228 insertions(+), 296 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/unmap-infinity-2.c
 create mode 100644 
libgomp/testsuite/libgomp.oacc-c-c++-common/unmap-infinity-1.c

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 0f1f11284d5..865b9df2444 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -1007,8 +1007,11 @@ struct splay_tree_key_s {
   uintptr_t tgt_offset;
   /* Reference count.  */
   uintptr_t refcount;
-  /* Dynamic reference count.  */
-  uintptr_t dynamic_refcount;
+  /* Reference counts beyond those that represent genuine references in the
+ linked splay tree key/target memory structures, e.g. for multiple OpenACC
+ "present increment" operations (via "acc enter data") referring to the 
same
+ host-memory block.  */
+  uintptr_t virtual_refcount;
   struct splay_tree_aux *aux;
 };
 
@@ -1139,6 +1142,7 @@ struct gomp_device_descr
 enum gomp_map_vars_kind
 {
   GOMP_MAP_VARS_OPENACC,
+  GOMP_MAP_VARS_OPENACC_ENTER_DATA,
   GOMP_MAP_VARS_TARGET,
   GOMP_MAP_VARS_DATA,
   GOMP_MAP_VARS_ENTER_DATA
@@ -1169,7 +1173,6 @@ extern void gomp_unmap_vars_async (struct target_mem_desc 
*, bool,
   struct goacc_asyncqueue *);
 extern void gomp_init_device (struct gomp_device_descr *);
 extern bool gomp_fini_device (struct gomp_device_descr *);
-extern void gomp_free_memmap (struct splay_tree_s *);
 extern void gomp_unload_device (struct gomp_device_descr *);
 extern bool gomp_remove_var (struct gomp_device_descr *, splay_tree_key);
 extern void gomp_remove_var_async (struct gomp_device_descr *, splay_tree_key,
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index a444c604d59..dd88b58a379 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -370,7 +370,15 @@ acc_shutdown_1 (acc_device_t d)
   if (walk->dev)
{
  gomp_mutex_lock (&walk->dev->lock);
- gomp_free_memmap (&walk->dev->mem_map);
+
+ while (walk->dev->mem_map.root)
+   {
+ splay_tree_key k = &walk->dev->mem_map.root->key;
+ if (k->aux)
+   

[PATCH 00/13] OpenACC 2.6 manual deep copy support

2019-12-17 Thread Julian Brown
Hi,

This patch series provides support for OpenACC 2.6's manual deep copy
(attach/detach) feature. Many of these patches have been submitted
previously, but this series has been rebased and the large deep-copy
part proper has been split into several pieces for ease of review.

Tested with offloading to NVPTX. Further commentary (together with
links to previous submissions) is provided alongside individual patches,
where relevant.

Thanks,

Julian

Julian Brown (13):
  Use aux struct in libgomp for infrequently-used/API-specific data
  OpenACC reference count overhaul
  OpenACC reference count consistency checking
  Use gomp_map_val for OpenACC host-to-device address translation
  Factor out duplicate code in gimplify_scan_omp_clauses
  OpenACC 2.6 deep copy: attach/detach API routines
  OpenACC 2.6 deep copy: libgomp parts
  OpenACC 2.6 deep copy: middle-end parts
  OpenACC 2.6 deep copy: C and C++ front-end parts
  OpenACC 2.6 deep copy: Fortran front-end parts
  OpenACC 2.6 deep copy: C and C++ execution tests
  OpenACC 2.6 deep copy: Fortran execution tests
  Fortran polymorphic class-type support for OpenACC

 gcc/c-family/c-common.h   |   1 +
 gcc/c-family/c-omp.c  |  33 ++
 gcc/c-family/c-pragma.h   |   2 +
 gcc/c/c-parser.c  |  53 +-
 gcc/c/c-typeck.c  |  76 ++-
 gcc/cp/parser.c   |  56 +-
 gcc/cp/semantics.c|  98 +++-
 gcc/fortran/gfortran.h|   2 +
 gcc/fortran/openmp.c  | 166 --
 gcc/fortran/trans-expr.c  | 184 +++---
 gcc/fortran/trans-openmp.c| 342 ---
 gcc/fortran/trans.h   |   8 +
 gcc/gimplify.c| 514 -
 gcc/omp-low.c |   3 +
 .../goacc/deep-copy-arrayofstruct.c   |  84 +++
 gcc/testsuite/c-c++-common/goacc/mdc-1.c  |  55 ++
 gcc/testsuite/c-c++-common/goacc/mdc-2.c  |  62 ++
 gcc/testsuite/g++.dg/goacc/mdc.C  |  68 +++
 .../gfortran.dg/goacc/data-clauses.f95|  38 +-
 .../gfortran.dg/goacc/derived-types-2.f90 |  14 +
 .../gfortran.dg/goacc/derived-types.f90   |  77 +++
 .../gfortran.dg/goacc/enter-exit-data.f95 |  24 +-
 gcc/tree-pretty-print.c   |  18 +
 include/gomp-constants.h  |  16 +-
 libgomp/libgomp.h |  50 +-
 libgomp/libgomp.map   |  10 +
 libgomp/oacc-init.c   |  10 +-
 libgomp/oacc-mem.c| 544 ++
 libgomp/oacc-parallel.c   |  35 +-
 libgomp/openacc.h |   6 +
 libgomp/target.c  | 440 --
 .../libgomp.c-c++-common/unmap-infinity-2.c   |  19 +
 .../testsuite/libgomp.oacc-c++/deep-copy-12.C |  72 +++
 .../testsuite/libgomp.oacc-c++/deep-copy-13.C |  72 +++
 .../libgomp.oacc-c-c++-common/deep-copy-1.c   |  24 +
 .../libgomp.oacc-c-c++-common/deep-copy-10.c  |  53 ++
 .../libgomp.oacc-c-c++-common/deep-copy-11.c  |  72 +++
 .../libgomp.oacc-c-c++-common/deep-copy-14.c  |  63 ++
 .../libgomp.oacc-c-c++-common/deep-copy-2.c   |  29 +
 .../libgomp.oacc-c-c++-common/deep-copy-3.c   |  34 ++
 .../libgomp.oacc-c-c++-common/deep-copy-4.c   |  87 +++
 .../libgomp.oacc-c-c++-common/deep-copy-5.c   |  81 +++
 .../libgomp.oacc-c-c++-common/deep-copy-6.c   |  59 ++
 .../libgomp.oacc-c-c++-common/deep-copy-7.c   |  45 ++
 .../libgomp.oacc-c-c++-common/deep-copy-8.c   |  54 ++
 .../libgomp.oacc-c-c++-common/deep-copy-9.c   |  53 ++
 .../libgomp.oacc-c-c++-common/pr92843-1.c |   1 +
 .../subset-subarray-mappings-1-r-p.c  |  16 -
 .../unmap-infinity-1.c|  17 +
 .../libgomp.oacc-fortran/class-ptr-param.f95  |  34 ++
 .../libgomp.oacc-fortran/classtypes-1.f95 |  48 ++
 .../libgomp.oacc-fortran/classtypes-2.f95 | 106 
 .../libgomp.oacc-fortran/deep-copy-1.f90  |  35 ++
 .../libgomp.oacc-fortran/deep-copy-2.f90  |  33 ++
 .../libgomp.oacc-fortran/deep-copy-3.f90  |  34 ++
 .../libgomp.oacc-fortran/deep-copy-4.f90  |  49 ++
 .../libgomp.oacc-fortran/deep-copy-5.f90  |  57 ++
 .../libgomp.oacc-fortran/deep-copy-6.f90  |  61 ++
 .../libgomp.oacc-fortran/deep-copy-7.f90  |  89 +++
 .../libgomp.oacc-fortran/deep-copy-8.f90  |  41 ++
 .../libgomp.oacc-fortran/derived-type-1.f90   |  28 +
 .../libgomp.oacc-fortran/derivedtype-1.f95|  30 +
 .../libgomp.oacc-fortran/derivedtype-2.f95|  41 ++
 .../libgomp.oacc-fortran/multidim-slice.f95   |  50 ++
 .../libgomp.oacc-fortran/update-2.f90 | 284 +
 65 files changed, 4225 insertions(+), 735 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/deep-copy-arrayofstruct.c
 create mode 100644 

[Ada] Wrong error on hidden must-override primitive

2019-12-17 Thread Pierre-Marie de Rodat
The compiler gave a wrong error about "must override" in the following
case.  A private type is completed with a derived type that inherits a
must-override function. Outside that package, a type extension of the
private type is declared.  The function on that type extension is not
visible, and is not must-override, so should not be in error. Indeed, it
cannot be overridden.  This patch fixes the bug, suppressing that error
message.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-12-18  Bob Duff  

gcc/ada/

* sem_ch3.adb (Derive_Subprogram): Do not set the
Requires_Overriding flag in the above-mentioned case.

--- gcc/ada/sem_ch3.adb
+++ gcc/ada/sem_ch3.adb
@@ -15606,7 +15606,8 @@ package body Sem_Ch3 is
  Set_Derived_Name;
 
   --  Otherwise, the type is inheriting a private operation, so enter it
-  --  with a special name so it can't be overridden.
+  --  with a special name so it can't be overridden. See also below, where
+  --  we check for this case, and if so avoid setting Requires_Overriding.
 
   else
  Set_Chars (New_Subp, New_External_Name (Chars (Parent_Subp), 'P'));
@@ -15786,7 +15787,15 @@ package body Sem_Ch3 is
or else Is_Abstract_Subprogram (Alias (New_Subp))
  then
 Set_Is_Abstract_Subprogram (New_Subp);
- else
+
+ --  If the Chars of the new subprogram is different from that of the
+ --  parent's one, it means that we entered it with a special name so
+ --  it can't be overridden (see above). In that case we had better not
+ --  *require* it to be overridden. This is the case where the parent
+ --  type inherited the operation privately, so there's no danger of
+ --  dangling dispatching.
+
+ elsif Chars (New_Subp) = Chars (Alias (New_Subp)) then
 Set_Requires_Overriding (New_Subp);
  end if;
 



[Ada] AI12-0282: shared variable control aspects on formal types

2019-12-17 Thread Pierre-Marie de Rodat
Ada202X allows some aspects related to shared variable control to appear
on formal type declarations. These aspects represent new enforceable
parts of the contract between generic units and instantiations.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-12-18  Ed Schonberg  

gcc/ada/

* par-ch12.adb (P_Formal_Derived_Type_Definition): In Ada_2020
the keyword WITH can indicate the start of aspect specifications
and not a private type extension.
* sem_ch12.adb (Analyze_Formal_Type): Indicate that it is a
first subtype.
(Instantiate_Type): New procedure
Check_Shared_Variable_Control_Aspects to verify matching rules
between formal and actual types. Note that an array type with
aspect Atomic_Components is considered compatible with an array
type whose component type is Atomic, even though the array types
do not carry the same aspect.
* sem_ch13.adb (Analyze_One_Aspect): Allow shared variable
control aspects to appear on formal types.
(Rep_Item_Too_Early): Exclude aspects on formal types.
* sem_prag.adb (Mark_Type): Handle properly pragmas that come
from aspects on formal types.
(Analyze_Pragma, case Atomic_Components): Handle formal types.

--- gcc/ada/par-ch12.adb
+++ gcc/ada/par-ch12.adb
@@ -971,9 +971,16 @@ package body Ch12 is
   end if;
 
   if Token = Tok_With then
- Scan; -- past WITH
- Set_Private_Present (Def_Node, True);
- T_Private;
+
+ if Ada_Version >= Ada_2020 and Token /= Tok_Private then
+--  Formal type has aspect specifications, parsed later.
+return Def_Node;
+
+ else
+Scan; -- past WITH
+Set_Private_Present (Def_Node, True);
+T_Private;
+ end if;
 
   elsif Token = Tok_Tagged then
  Scan;

--- gcc/ada/sem_ch12.adb
+++ gcc/ada/sem_ch12.adb
@@ -3410,7 +3410,11 @@ package body Sem_Ch12 is
 raise Program_Error;
   end case;
 
+  --  A formal type declaration declares a type and its first
+  --  subtype.
+
   Set_Is_Generic_Type (T);
+  Set_Is_First_Subtype (T);
 
   if Has_Aspects (N) then
  Analyze_Aspect_Specifications (N, T);
@@ -12178,6 +12182,10 @@ package body Sem_Ch12 is
   Loc: Source_Ptr;
   Subt   : Entity_Id;
 
+  procedure Check_Shared_Variable_Control_Aspects;
+  --  Ada_2020: Verify that shared variable control aspects (RM C.6)
+  --  that may be specified for a formal type are obeyed by the actual.
+
   procedure Diagnose_Predicated_Actual;
   --  There are a number of constructs in which a discrete type with
   --  predicates is illegal, e.g. as an index in an array type declaration.
@@ -12202,6 +12210,79 @@ package body Sem_Ch12 is
   --  Check that base types are the same and that the subtypes match
   --  statically. Used in several of the above.
 
+  
+  --  Check_Shared_Variable_Control_Aspects --
+  
+
+  --  Ada_2020: Verify that shared variable control aspects (RM C.6)
+  --  that may be specified for the formal are obeyed by the actual.
+
+  procedure Check_Shared_Variable_Control_Aspects is
+  begin
+ if Ada_Version >= Ada_2020 then
+if Is_Atomic (A_Gen_T) and then not Is_Atomic (Act_T) then
+   Error_Msg_NE
+  ("actual for& must be an atomic type", Actual, A_Gen_T);
+end if;
+
+if Is_Volatile (A_Gen_T) and then not Is_Volatile (Act_T) then
+   Error_Msg_NE
+  ("actual for& must be a Volatile type", Actual, A_Gen_T);
+end if;
+
+if
+  Is_Independent (A_Gen_T) and then not Is_Independent (Act_T)
+then
+   Error_Msg_NE
+ ("actual for& must be an Independent type", Actual, A_Gen_T);
+end if;
+
+--  We assume that an array type whose atomic component type
+--  is Atomic is equivalent to an array type with the explicit
+--  aspect Has_Atomic_Components. This is a reasonable inference
+--  from the intent of AI12-0282, and makes it legal to use an
+--  actual that does not have the identical aspect as the formal.
+
+if Has_Atomic_Components (A_Gen_T)
+   and then not Has_Atomic_Components (Act_T)
+then
+   if Is_Array_Type (Act_T)
+ and then Is_Atomic (Component_Type (Act_T))
+   then
+  null;
+
+   else
+  Error_Msg_NE
+("actual for& must have atomic components",
+   Actual, A_Gen_T);
+   end if;
+end if;
+
+if Has_Independent_Components (A_Gen_T)
+   

[Ada] Missing accessibility check on access discriminants

2019-12-17 Thread Pierre-Marie de Rodat
This patch fixes an issue whereby compile-time checks on return
aggregates with anonymous access discriminants were not performed when
several such discriminants were present, the aggregate was within an
extended return statement, or the aggregate was within a qualified
expression.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-12-18  Justin Squirek  

gcc/ada/

* sem_ch6.adb (Analyze_Function_Return): Modify handling of
extended return statements to check accessibility of access
discriminants.
(Check_Aggregate_Accessibility): Removed.
(Check_Return_Obj_Accessibility): Added to centralize checking
of return aggregates and subtype indications in the case of an
extended return statement.

--- gcc/ada/sem_ch6.adb
+++ gcc/ada/sem_ch6.adb
@@ -694,69 +694,199 @@ package body Sem_Ch6 is
   R_Type : constant Entity_Id := Etype (Scope_Id);
   --  Function result subtype
 
-  procedure Check_Aggregate_Accessibility (Aggr : Node_Id);
-  --  Apply legality rule of 6.5 (5.8) to the access discriminants of an
+  procedure Check_Return_Obj_Accessibility (Return_Stmt : Node_Id);
+  --  Apply legality rule of 6.5 (5.9) to the access discriminants of an
   --  aggregate in a return statement.
 
   procedure Check_Return_Subtype_Indication (Obj_Decl : Node_Id);
   --  Check that the return_subtype_indication properly matches the result
   --  subtype of the function, as required by RM-6.5(5.1/2-5.3/2).
 
-  ---
-  -- Check_Aggregate_Accessibility --
-  ---
+  
+  -- Check_Return_Obj_Accessibility --
+  
 
-  procedure Check_Aggregate_Accessibility (Aggr : Node_Id) is
- Typ   : constant Entity_Id := Etype (Aggr);
- Assoc : Node_Id;
- Discr : Entity_Id;
- Expr  : Node_Id;
- Obj   : Node_Id;
+  procedure Check_Return_Obj_Accessibility (Return_Stmt : Node_Id) is
+ Assoc : Node_Id;
+ Agg   : Node_Id := Empty;
+ Discr : Entity_Id;
+ Expr  : Node_Id;
+ Obj   : Node_Id;
+ Process_Exprs : Boolean := False;
+ Return_Obj: Node_Id;
 
   begin
- if Is_Record_Type (Typ) and then Has_Discriminants (Typ) then
-Discr := First_Discriminant (Typ);
-Assoc := First (Component_Associations (Aggr));
-while Present (Discr) loop
-   if Ekind (Etype (Discr)) = E_Anonymous_Access_Type then
+ --  Only perform checks on record types with access discriminants
+
+ if not Is_Record_Type (R_Type)
+   or else not Has_Discriminants (R_Type)
+ then
+return;
+ end if;
+
+ --  We are only interested in return statements
+
+ if not Nkind_In (Return_Stmt, N_Extended_Return_Statement,
+   N_Simple_Return_Statement)
+ then
+return;
+ end if;
+
+ --  Fetch the object from the return statement, in the case of a
+ --  simple return statement the expression is part of the node.
+
+ if Nkind (Return_Stmt) = N_Extended_Return_Statement then
+Return_Obj := Last (Return_Object_Declarations (Return_Stmt));
+
+--  We could be looking at something that's been expanded with
+--  an initialzation procedure which we can safely ignore.
+
+if Nkind (Return_Obj) /= N_Object_Declaration then
+   return;
+end if;
+ else
+Return_Obj := Return_Stmt;
+ end if;
+
+ --  We may need to check an aggregate or a subtype indication
+ --  depending on how the discriminants were specified and whether
+ --  we are looking at an extended return statement.
+
+ if Nkind (Return_Obj) = N_Object_Declaration
+   and then Nkind (Object_Definition (Return_Obj))
+  = N_Subtype_Indication
+ then
+Assoc := First (Constraints
+ (Constraint (Object_Definition (Return_Obj))));
+ else
+--  Qualified expressions may be nested
+
+Agg := Original_Node (Expression (Return_Obj));
+while Nkind (Agg) = N_Qualified_Expression loop
+   Agg := Original_Node (Expression (Agg));
+end loop;
+
+--  If we are looking at an aggregate instead of a function call we
+--  can continue checking accessibility for the supplied
+--  discriminant associations.
+
+if Nkind (Agg) = N_Aggregate then
+   if Present (Expressions (Agg)) then
+  Assoc := First (Expressions (Agg));
+  Process_Exprs := True;
+   else
+  Assoc := 

[Ada] Fix three-letter typos like "sss" in comments and docs

2019-12-17 Thread Pierre-Marie de Rodat
Fix three-letter typos like "alllowed" or "corrresponding". They can be
detected with this command:

  $ grep "[[:alpha:]]\([[:lower:]]\+\)\1\1" ...

but need to be manually filtered for things like "ieee", "dd-mm-" or
hexadecimal literals.
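
As an illustration only, the same check can be sketched in C with the POSIX regex API. The function name has_tripled_run is hypothetical, and the pattern spells GNU grep's \+ as the portable \{1,\}.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if TEXT contains a letter followed by a lowercase run that is
   repeated three times in a row, 0 if not, and -1 if the pattern fails
   to compile.  */
static int
has_tripled_run (const char *text)
{
  /* POSIX basic regular expression with two back-references.  */
  static const char pattern[]
    = "[[:alpha:]]\\([[:lower:]]\\{1,\\}\\)\\1\\1";
  regex_t re;
  int rc;

  assert (text != NULL);
  if (regcomp (&re, pattern, 0) != 0)
    return -1;
  rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

This flags "alllowed" and "corrresponding" but also "ieee", which is why the hits still need manual filtering.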

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-12-18  Piotr Trojanek  

gcc/ada/

* doc/gnat_rm/implementation_defined_pragmas.rst,
doc/gnat_rm/obsolescent_features.rst,
doc/gnat_ugn/gnat_and_program_execution.rst, exp_attr.adb,
exp_ch9.adb, init.c, libgnat/s-valrea.adb, par-ch6.adb,
sem_attr.adb, sem_ch4.adb, sem_util.ads: Fix trivial typos.
* gnat_rm.texi, gnat_ugn.texi: Regenerate.

--- gcc/ada/doc/gnat_rm/implementation_defined_pragmas.rst
+++ gcc/ada/doc/gnat_rm/implementation_defined_pragmas.rst
@@ -1344,7 +1344,7 @@ are equivalent to
 The precondition ensures that one and only one of the case guards is
 satisfied on entry to the subprogram.
 The postcondition ensures that for the case guard that was True on entry,
-the corrresponding consequence is True on exit. Other consequence expressions
+the corresponding consequence is True on exit. Other consequence expressions
 are not evaluated.
 
 A precondition ``P`` and postcondition ``Q`` can also be

--- gcc/ada/doc/gnat_rm/obsolescent_features.rst
+++ gcc/ada/doc/gnat_rm/obsolescent_features.rst
@@ -49,7 +49,7 @@ pragma Task_Info
 The functionality provided by pragma ``Task_Info`` is now part of the
 Ada language. The ``CPU`` aspect and the package
 ``System.Multiprocessors`` offer a less system-dependent way to specify
-task affinity or to query the number of processsors.
+task affinity or to query the number of processors.
 
 Syntax
 

--- gcc/ada/doc/gnat_ugn/gnat_and_program_execution.rst
+++ gcc/ada/doc/gnat_ugn/gnat_and_program_execution.rst
@@ -2964,7 +2964,7 @@ integer arithmetic package. The compiler will make calls
 to this package, though only in cases where it cannot be
 sure that ``Long_Long_Integer`` is sufficient to guard against
 intermediate overflows. This package does not use dynamic
-alllocation, but it does use the secondary stack, so an
+allocation, but it does use the secondary stack, so an
 appropriate secondary stack package must be present (this
 is always true for standard full Ada, but may require
 specific steps for restricted run times such as ZFP).

--- gcc/ada/exp_attr.adb
+++ gcc/ada/exp_attr.adb
@@ -5246,7 +5246,7 @@ package body Exp_Attr is
 Rep_To_Pos_Flag (Ptyp, Loc));
 
 else
-   --  Add Boolean parameter True, to request program errror if
+   --  Add Boolean parameter True, to request program error if
--  we have a bad representation on our hands. If checks are
--  suppressed, then add False instead
 
@@ -6216,7 +6216,7 @@ package body Exp_Attr is
 Make_Integer_Literal (Loc, 1))),
 Rep_To_Pos_Flag (Ptyp, Loc));
 else
-   --  Add Boolean parameter True, to request program errror if
+   --  Add Boolean parameter True, to request program error if
--  we have a bad representation on our hands. Add False if
--  checks are suppressed.
 

--- gcc/ada/exp_ch9.adb
+++ gcc/ada/exp_ch9.adb
@@ -363,7 +363,7 @@ package body Exp_Ch9 is
--  a null trailing statement with the given Loc (which is the sloc of
--  the accept, delay, or entry call statement). There might not be any
--  generated code for the accept, delay, or entry call itself (the effect
-   --  of these statements is part of the general processsing done for the
+   --  of these statements is part of the general processing done for the
--  enclosing selective accept, timed entry call, or asynchronous select),
--  and the null statement is there to carry the sloc of that statement to
--  the back-end for trace-based coverage analysis purposes.

--- gcc/ada/gnat_rm.texi
+++ gcc/ada/gnat_rm.texi
@@ -2751,7 +2751,7 @@ pragma Postcondition (if C2 then Pred2);
 The precondition ensures that one and only one of the case guards is
 satisfied on entry to the subprogram.
 The postcondition ensures that for the case guard that was True on entry,
-the corrresponding consequence is True on exit. Other consequence expressions
+the corresponding consequence is True on exit. Other consequence expressions
 are not evaluated.
 
 A precondition @code{P} and postcondition @code{Q} can also be
@@ -28804,7 +28804,7 @@ this kind of implementation dependent addition.
 The functionality provided by pragma @code{Task_Info} is now part of the
 Ada language. The @code{CPU} aspect and the package
 @code{System.Multiprocessors} offer a less system-dependent way to specify
-task affinity or to query the number of processsors.
+task affinity or to query the number of processors.
 
 Syntax
 

--- gcc/ada/gnat_ugn.texi
+++ 

[Ada] Missing accessibility actuals on calls to interface conversion functions

2019-12-17 Thread Pierre-Marie de Rodat
In certain cases of conversions to interface types, the compiler
generates a special function to handle the conversion. In cases where
such a function has an extra accessibility-level formal and the target
type of the conversion has a designated type that comes from a limited
view (via limited_with_clause), the resolution of the type conversion
wasn't retrieving the needed nonlimited view, which resulted in the call
to the interface conversion function not being expanded and hence not
being passed the needed actual for the accessibility level.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-12-18  Gary Dismukes  

gcc/ada/

* sem_res.adb (Resolve_Type_Conversion): Add handling for access
types with designated operand and target types that are
referenced in places that have a limited view of an interface
type by retrieving the nonlimited view when it exists.  Add ???
comments related to missing limited_with_clause handling for
Target (in the non-access case).

--- gcc/ada/sem_res.adb
+++ gcc/ada/sem_res.adb
@@ -11827,12 +11827,35 @@ package body Sem_Res is
Set_Etype (Expression (N), Opnd);
 end if;
 
+--  It seems that Non_Limited_View should also be applied for
+--  Target when it has a limited view, but that leads to missing
+--  error checks on interface conversions further below. ???
+
 if Is_Access_Type (Opnd) then
Opnd := Designated_Type (Opnd);
+
+   --  If the type of the operand is a limited view, use nonlimited
+   --  view when available. If it is a class-wide type, recover the
+   --  class-wide type of the nonlimited view.
+
+   if From_Limited_With (Opnd)
+ and then Has_Non_Limited_View (Opnd)
+   then
+  Opnd := Non_Limited_View (Opnd);
+   end if;
 end if;
 
 if Is_Access_Type (Target_Typ) then
Target := Designated_Type (Target);
+
+   --  If the target type is a limited view, use nonlimited view
+   --  when available.
+
+   if From_Limited_With (Target)
+ and then Has_Non_Limited_View (Target)
+   then
+  Target := Non_Limited_View (Target);
+   end if;
 end if;
 
 if Opnd = Target then
@@ -11840,6 +11863,10 @@ package body Sem_Res is
 
 --  Conversion from interface type
 
+--  It seems that it would be better for the error checks below
+--  to be performed as part of Validate_Conversion (and maybe some
+--  of the error checks above could be moved as well?). ???
+
 elsif Is_Interface (Opnd) then
 
--  Ada 2005 (AI-217): Handle entities from limited views



[Ada] Document the introduction of the Object_Size attribute in Ada 2020

2019-12-17 Thread Pierre-Marie de Rodat
This adds references to Ada 2020 in the section documenting the two size
attributes used by GNAT, namely Object_Size and Value_Size, as well as
in the head comment of Subtypes_Statically_Match.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-12-18  Eric Botcazou  

gcc/ada/

* einfo.ads (Handling of Type'Size Value): Add references to the
introduction of Object_Size in Ada 2020.
* sem_eval.adb (Subtypes_Statically_Match): Likewise.

--- gcc/ada/einfo.ads
+++ gcc/ada/einfo.ads
@@ -132,19 +132,23 @@ package Einfo is
 --  default size of objects, creates chaos, and major incompatibilities in
 --  existing code.
 
+--  The Ada 2020 RM acknowledges it and adopts GNAT's Object_Size attribute
+--  for determining the default size of objects, but stops short of applying
+--  it universally like GNAT. Indeed the notable exceptions are nonaliased
+--  stand-alone objects, which are not covered by Object_Size in Ada 2020.
+
 --  We proceed as follows, for discrete and fixed-point subtypes, we have
 --  two separate sizes for each subtype:
 
 --The Object_Size, which is used for determining the default size of
 --objects and components. This size value can be referred to using the
 --Object_Size attribute. The phrase "is used" here means that it is
---the basis of the determination of the size. The backend is free to
+--the basis of the determination of the size. The back end is free to
 --pad this up if necessary for efficiency, e.g. an 8-bit stand-alone
 --character might be stored in 32 bits on a machine with no efficient
 --byte access instructions such as the Alpha.
 
---The default rules for the value of Object_Size for fixed-point and
---discrete types are as follows:
+--The default rules for the value of Object_Size are as follows:
 
 --   The Object_Size for base subtypes reflect the natural hardware
 --   size in bits (see Ttypes and Cstand for integer types). For
@@ -158,9 +162,11 @@ package Einfo is
 --   base type, and the Object_Size of a derived first subtype is copied
 --   from the parent first subtype.
 
---The Value_Size which is the number of bits required to store a value
+--The Ada 2020 RM defined attribute Object_Size uses this implementation.
+
+--The Value_Size, which is the number of bits required to store a value
 --of the type. This size can be referred to using the Value_Size
---attribute. This value is used to determine how tightly to pack
+--attribute. This value is used for determining how tightly to pack
 --records or arrays with components of this type, and also affects
 --the semantics of unchecked conversion (unchecked conversions where
 --the Value_Size values differ generate a warning, and are potentially
@@ -182,7 +188,7 @@ package Einfo is
 --   dynamic bounds, it is assumed that the value can range down or up
 --   to the corresponding bound of the ancestor.
 
---The RM defined attribute Size corresponds to the Value_Size attribute.
+--The Ada 95 RM defined attribute Size is identified with Value_Size.
 
 --The Size attribute may be defined for a first-named subtype. This sets
 --the Value_Size of the first-named subtype to the given value, and the
@@ -194,14 +200,15 @@ package Einfo is
 --subtypes. The Value_Size of any other static subtypes is not affected.
 
 --Value_Size and Object_Size may be explicitly set for any subtype using
---an attribute definition clause. Note that the use of these attributes
---can cause the RM 13.1(14) rule to be violated. If two access types
---reference aliased objects whose subtypes have differing Object_Size
---values as a result of explicit attribute definition clauses, then it
---is erroneous to convert from one access subtype to the other.
-
---At the implementation level, Esize stores the Object_Size and the
---RM_Size field stores the Value_Size (and hence the value of the
+--an attribute definition clause. Note that the use of such a clause can
+--cause the RM 13.1(14) rule to be violated, in Ada 95 and 2020 for the
+--Value_Size attribute, but only in Ada 95 for the Object_Size attribute.
+--If access types reference aliased objects whose subtypes have differing
+--Object_Size values as a result of explicit attribute definition clauses,
+--then it is erroneous to convert from one access subtype to the other.
+
+--At the implementation level, the Esize field stores the Object_Size
+--and the RM_Size field stores the Value_Size (hence the value of the
 --Size attribute, which, as noted above, is equivalent to Value_Size).
 
 --  To get a feel for the difference, consider the following examples (note

--- gcc/ada/sem_eval.adb
+++ gcc/ada/sem_eval.adb
@@ -5905,7 +5905,8 @@ package body Sem_Eval is
--  In addition, in GNAT, the object size (Esize) values of the types must
--  match if they are set (unless 

[Ada] Bad "already use-visible" warning re: use in private part

2019-12-17 Thread Pierre-Marie de Rodat
This patch fixes a bug in which, if a parent package has a use clause in
its private part and a child of that parent has a use clause for the
same entity in its context clause, the compiler incorrectly warns that
the use clause in the child is redundant.
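
The scenario can be sketched with two small library units. This is a hypothetical reproducer, not the test case used for the patch: the names Parent, Parent.Child, Version, Max_Width and Log_File are assumptions made up for illustration, and whether a warning actually appears depends on which redundancy warnings are enabled.

```ada
--  Hypothetical reproducer: the unit and entity names below are
--  illustrative only and do not come from the patch.

--  File parent.ads
with Ada.Text_IO;
package Parent is
   Version : constant := 1;
private
   --  Use clause in the private part of the parent
   use Ada.Text_IO;
   Max_Width : constant Count := 80;
end Parent;

--  File parent-child.ads
--  Use clause for the same package in the context clause of the child
with Ada.Text_IO; use Ada.Text_IO;
package Parent.Child is
   --  The visible part of a child does not see the private part of its
   --  parent, so File_Type is use-visible here only through the context
   --  clause above. That use clause is therefore not redundant.
   subtype Log_File is File_Type;
end Parent.Child;
```

Removing the use clause from the child's context clause would make File_Type no longer directly visible in the child's visible part, which is why a redundancy warning on it is incorrect.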

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-12-18  Bob Duff  

gcc/ada/

* sem_ch8.adb (Note_Redundant_Use): It was already checking for
a use clause in the visible part of the child. Add an additional
check for a use clause in the context clause of the child.

--- gcc/ada/sem_ch8.adb
+++ gcc/ada/sem_ch8.adb
@@ -9607,15 +9607,16 @@ package body Sem_Ch8 is
   Par : constant Entity_Id := Defining_Entity (Parent (Decl));
   Spec : constant Node_Id  :=
Specification (Unit (Cunit (Current_Sem_Unit)));
-
+  Cur_List : constant List_Id := List_Containing (Cur_Use);
begin
   if Is_Compilation_Unit (Par)
 and then Par /= Cunit_Entity (Current_Sem_Unit)
-and then Parent (Cur_Use) = Spec
-and then List_Containing (Cur_Use) =
-   Visible_Declarations (Spec)
   then
- return;
+ if Cur_List = Context_Items (Cunit (Current_Sem_Unit))
+   or else Cur_List = Visible_Declarations (Spec)
+ then
+return;
+ end if;
   end if;
end;
 end if;
@@ -9629,7 +9630,6 @@ package body Sem_Ch8 is
  then
 Redundant := Clause;
 Prev_Use  := Cur_Use;
-
  end if;
 
  if Present (Redundant) and then Parent (Redundant) /= Prev_Use then


