[PATCH, testsuite] Fix PR92464 by adjusting test case loop bound

2019-11-12 Thread Kewen.Lin
Hi,

As PR92464 shows, the recent adjustment to the vectorization cost of
load insns is responsible for this regression: it lowers the minimum
profitable iteration count from 19 to 12, and the test case happens to
hit that threshold.  Actual runtime measurements show that the
vectorized version performs on par with the non-vectorized version
(i.e. the behavior before the change), so vectorizing at 12 iterations
is fine.  To keep the test case sensitive to high peeling cost, this
patch adjusts the loop bound from 16 to 14.
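To make the threshold arithmetic concrete, here is a minimal sketch. It assumes, without quoting the test source, that the loop runs from OFF to N (so it executes N - OFF iterations) and that a loop with a known trip count is vectorized only once that count reaches the minimum profitable iteration count. Both are simplifying assumptions, and `trip_count` and `vectorized` are hypothetical helpers, not GCC functions.

```cpp
#include <cassert>

// Hypothetical model of the profitability check; GCC's real cost model
// is far more involved.
constexpr int kOff = 4;  // OFF in costmodel-vect-76b.c

// Assumption: the test loop runs from OFF to N, i.e. N - OFF iterations.
constexpr int trip_count(int n) { return n - kOff; }

// Assumption: a loop with a known trip count is vectorized only if that
// count reaches the minimum profitable iteration count.
constexpr bool vectorized(int n, int min_profitable_iters) {
  return trip_count(n) >= min_profitable_iters;
}

// Before the load cost change (threshold 19): N = 16 is not vectorized.
static_assert(!vectorized(16, 19), "old threshold, old bound");
// After it (threshold 12): N = 16 hits the threshold exactly.
static_assert(vectorized(16, 12), "new threshold, old bound");
// With the adjusted bound N = 14 the loop stays below the threshold.
static_assert(!vectorized(14, 12), "new threshold, new bound");
```

Under these assumptions, N = 14 leaves two iterations of headroom below the new threshold, which is what keeps the test checking the "not profitable" path.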

Verified on ppc64-redhat-linux (BE P7) and powerpc64le-linux-gnu
(LE P8). 


BR,
Kewen

-

gcc/testsuite/ChangeLog

2019-11-13  Kewen Lin  

PR target/92464
* gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c: Adjust
loop bound due to load cost adjustment.


diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
index 4a7da2e..1bb064e 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
@@ -4,7 +4,7 @@
 #include 
 #include "../../tree-vect.h"

-#define N 16
+#define N 14
 #define OFF 4

 /* Check handling of accesses for which the "initial condition" -



[PATCH 3/3] Improve efficiency of copying section from another tree

2019-11-12 Thread Strager Neds
Several parts of GCC need to copy a section name from one tree (or
symtab_node) to another. Currently, this is implemented naively:

1. Query the source's section name
2. Hash the section name string
3. Find the section_hash_entry in the symbol table
4. Increment the section_hash_entry's reference count
5. Assign the destination's section to the section_hash_entry

Since we have the source's section_hash_entry, we can copy the section
name from one symtab_node to another efficiently with the following
algorithm:

1. Query the source's section_hash_entry
2. Increment the section_hash_entry's reference count
3. Assign the destination's section to the section_hash_entry

Implement this algorithm in the overload of symtab_node::set_section
which takes an existing symtab_node.
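The difference between the two algorithms can be sketched with a toy interning table. This is a minimal model, not GCC code: `Entry`, `Table` and `Node` are hypothetical stand-ins for section_hash_entry, the symbol table's section hash and symtab_node, and a counter records how many string hashes each path performs. For brevity the sketch assumes the destination has no section yet, so it skips releasing an old entry.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for section_hash_entry: interned name + refcount.
struct Entry {
  std::string name;
  int ref_count = 0;
};

// Hypothetical stand-in for the symbol table's section hash.
struct Table {
  std::unordered_map<std::string, std::unique_ptr<Entry>> entries;
  int hashes = 0;  // number of string lookups (hash computations) performed

  // Hash the name, find or create its entry, and take a reference.
  Entry *intern(const std::string &name) {
    ++hashes;
    std::unique_ptr<Entry> &slot = entries[name];
    if (!slot) {
      slot.reset(new Entry);
      slot->name = name;
    }
    ++slot->ref_count;
    return slot.get();
  }
};

struct Node {
  Entry *section = nullptr;

  // Naive steps 1-5: read the source's name, then look it up again.
  void copy_section_naive(Table &t, const Node &other) {
    assert(section == nullptr);  // sketch: destination starts empty
    if (other.section)
      section = t.intern(other.section->name);
  }

  // Steps 1-3: reuse the source's entry directly; no hashing at all.
  void copy_section_direct(const Node &other) {
    assert(section == nullptr);  // sketch: destination starts empty
    if (other.section)
      ++other.section->ref_count;
    section = other.section;
  }
};
```

Both paths end with the destination sharing the source's entry; only the naive one pays for a second hash and table probe.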

I did not measure the performance impact of this patch. In
particular, I do not know if this patch actually improves performance.

This patch should not change behavior.

Testing: Bootstrap on x86_64-linux-gnu with --disable-multilib
--enable-checking=release --enable-languages=c,c++. Observe no change in
test results.

2019-11-12  Matthew Glazar 

* gcc/cgraph.h (symtab_node::set_section_for_node): Declare new
overload.
(symtab_node::set_section_from_string): Rename from set_section.
(symtab_node::set_section_from_node): Declare.
* gcc/symtab.c (symtab_node::set_section_for_node): Define new
overload.
(symtab_node::set_section_from_string): Rename from set_section.
(symtab_node::set_section_from_node): Define.
(symtab_node::set_section): Call renamed set_section_from_string.
(symtab_node::set_section): Call new set_section_from_node.


diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 3b07258b31d..928a8bc2729 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -313,6 +313,10 @@ public:
  use set_section.  */
   void set_section_for_node (const char *section);

+  /* Like set_section_for_node, but copying the section name from another
+ node.  */
+  void set_section_for_node (const symtab_node );
+
   /* Set initialization priority to PRIORITY.  */
   void set_init_priority (priority_type priority);

@@ -627,8 +631,9 @@ protected:
   void *data,
   bool include_overwrite);
 private:
-  /* Worker for set_section.  */
-  static bool set_section (symtab_node *n, void *s);
+  /* Workers for set_section.  */
+  static bool set_section_from_string (symtab_node *n, void *s);
+  static bool set_section_from_node (symtab_node *n, void *o);

   /* Worker for symtab_resolve_alias.  */
   static bool set_implicit_section (symtab_node *n, void *);
diff --git a/gcc/symtab.c b/gcc/symtab.c
index a2aa519e760..40752addcb6 100644
--- a/gcc/symtab.c
+++ b/gcc/symtab.c
@@ -1596,15 +1596,37 @@ symtab_node::set_section_for_node (const char *section)
 }
 }

-/* Worker for set_section.  */
+void
+symtab_node::set_section_for_node (const symtab_node )
+{
+  if (x_section == other.x_section)
+return;
+  if (get_section () && other.get_section ())
+gcc_checking_assert (strcmp (get_section (), other.get_section ()) != 0);
+  release_section_hash_entry (x_section);
+  if (other.x_section)
+x_section = retain_section_hash_entry (other.x_section);
+  else
+x_section = NULL;
+}
+
+/* Workers for set_section.  */

 bool
-symtab_node::set_section (symtab_node *n, void *s)
+symtab_node::set_section_from_string (symtab_node *n, void *s)
 {
   n->set_section_for_node ((char *)s);
   return false;
 }

+bool
+symtab_node::set_section_from_node (symtab_node *n, void *o)
+{
+  const symtab_node  = *static_cast (o);
+  n->set_section_for_node (other);
+  return false;
+}
+
 /* Set section of symbol and its aliases.  */

 void
@@ -1612,15 +1634,14 @@ symtab_node::set_section (const char *section)
 {
   gcc_assert (!this->alias || !this->analyzed);
   call_for_symbol_and_aliases
-(symtab_node::set_section, const_cast(section), true);
+(symtab_node::set_section_from_string, const_cast(section), true);
 }

 void
 symtab_node::set_section (const symtab_node )
 {
-  const char *section = other.get_section ();
   call_for_symbol_and_aliases
-(symtab_node::set_section, const_cast(section), true);
+(symtab_node::set_section_from_node, const_cast(), true);
 }

 /* Return the initialization priority.  */


[PATCH 2/3] Refactor section name ref counting

2019-11-12 Thread Strager Neds
symtab_node::set_section_for_node manages the reference count of
section_hash_entry objects. I plan to add another function which needs
to manage the reference count of these objects. To avoid duplicating
code, factor the existing logic into reusable functions.
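A minimal sketch of the factored-out helpers, modeled on the diff below but using hypothetical standard-library stand-ins (`Entry`, `SectionTable`, `intern`) instead of GCC's GC-allocated hash table: retain bumps the count and returns the entry, release drops it and erases the entry once the count reaches zero, and a new entry starts at a count of 1 while an existing one is retained.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for section_hash_entry.
struct Entry {
  std::string name;
  int ref_count = 0;
};

// Hypothetical stand-in for symtab->section_hash.
using SectionTable = std::unordered_map<std::string, std::unique_ptr<Entry>>;

// Like retain_section_hash_entry: take a reference and return the entry.
Entry *retain(Entry *entry) {
  assert(entry != nullptr);
  ++entry->ref_count;
  return entry;
}

// Like release_section_hash_entry: accept null, and remove the entry from
// the table once its last reference is dropped.
void release(SectionTable &table, Entry *entry) {
  if (!entry)
    return;
  if (--entry->ref_count == 0) {
    // Copy the key first: erasing destroys the Entry that owns the name.
    const std::string name = entry->name;
    table.erase(name);
  }
}

// Find or create the entry for NAME, mirroring the patched
// set_section_for_node: new entries start at 1, existing ones are retained.
Entry *intern(SectionTable &table, const std::string &name) {
  std::unique_ptr<Entry> &slot = table[name];
  if (slot)
    return retain(slot.get());
  slot.reset(new Entry);
  slot->name = name;
  slot->ref_count = 1;
  return slot.get();
}
```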

This patch should not change behavior.

Testing: Bootstrap on x86_64-linux-gnu with --disable-multilib
--enable-checking=release --enable-languages=c,c++. Observe no change in
test results.

2019-11-12  Matthew Glazar 

* gcc/symtab.c (symtab_node::set_section_for_node): Extract reference
counting logic into ...
(retain_section_hash_entry): ... here (new function) and ...
(release_section_hash_entry): ... here (new function).


diff --git a/gcc/symtab.c b/gcc/symtab.c
index 84d17c36189..a2aa519e760 100644
--- a/gcc/symtab.c
+++ b/gcc/symtab.c
@@ -368,6 +368,30 @@ section_name_hasher::equal (section_hash_entry *n1, const char *name)
   return n1->name == name || !strcmp (n1->name, name);
 }

+static section_hash_entry *
+retain_section_hash_entry (section_hash_entry *entry)
+{
+  entry->ref_count++;
+  return entry;
+}
+
+static void
+release_section_hash_entry (section_hash_entry *entry)
+{
+  if (entry)
+{
+  entry->ref_count--;
+  if (!entry->ref_count)
+{
+  hashval_t hash = htab_hash_string (entry->name);
+  section_hash_entry **slot = symtab->section_hash->find_slot_with_hash (entry->name,
+hash, INSERT);
+  ggc_free (entry);
+  symtab->section_hash->clear_slot (slot);
+}
+}
+}
+
 /* Add node into symbol table.  This function is not used directly, but via
cgraph/varpool node creation routines.  */

@@ -1543,46 +1567,33 @@ void
 symtab_node::set_section_for_node (const char *section)
 {
   const char *current = get_section ();
-  section_hash_entry **slot;

   if (current == section
   || (current && section
   && !strcmp (current, section)))
 return;

-  if (current)
-{
-  x_section->ref_count--;
-  if (!x_section->ref_count)
-{
-  hashval_t hash = htab_hash_string (x_section->name);
-  slot = symtab->section_hash->find_slot_with_hash (x_section->name,
-hash, INSERT);
-  ggc_free (x_section);
-  symtab->section_hash->clear_slot (slot);
-}
-  x_section = NULL;
-}
+  release_section_hash_entry (x_section);
   if (!section)
 {
+  x_section = NULL;
   implicit_section = false;
   return;
 }
   if (!symtab->section_hash)
 symtab->section_hash = hash_table::create_ggc (10);
-  slot = symtab->section_hash->find_slot_with_hash (section,
-htab_hash_string (section),
-INSERT);
+  section_hash_entry **slot = symtab->section_hash->find_slot_with_hash
+(section, htab_hash_string (section), INSERT);
   if (*slot)
-x_section = (section_hash_entry *)*slot;
+x_section = retain_section_hash_entry (*slot);
   else
 {
   int len = strlen (section);
   *slot = x_section = ggc_cleared_alloc ();
+  x_section->ref_count = 1;
   x_section->name = ggc_vec_alloc (len + 1);
   memcpy (x_section->name, section, len + 1);
 }
-  x_section->ref_count++;
 }

 /* Worker for set_section.  */


[PATCH 1/3] Refactor copying decl section names

2019-11-12 Thread Strager Neds
Sometimes, we need to copy a section name from one decl or symtab node
to another. Currently, this is done by getting the source's section
name and setting the destination's section name. For example:

set_decl_section_name (dest, DECL_SECTION_NAME (source));
dest->set_section (source->get_section ());

This code could be more efficient. Section names are stored in an
interning hash table, but the current interfaces of
set_decl_section_name and symtab_node::set_section force unnecessary
indirections (to get the section name) and hashing (to find the section
name in the hash table).

Overload set_decl_section_name and symtab_node::set_section to accept an
existing symtab_node to copy the section name from:

set_decl_section_name (dest, source);
dest->set_section (*source);

For now, implement these new functions as a simple wrapper around the
existing functions. In the future, these functions can be implemented
using just a pointer copy and an increment (for the reference count).
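A minimal sketch of this first step, using a hypothetical `Node` class rather than GCC's symtab_node: the new overload takes the source node and, for now, just forwards its section name to the existing string-based setter, so behavior is unchanged while callers can already switch to the new form. `get_section` is const so it can be called on the source node, matching the "Constify" entry in the ChangeLog.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for symtab_node.
class Node {
 public:
  // Existing interface: set the section from a (possibly null) string.
  void set_section(const char *section) {
    has_section_ = section != nullptr;
    section_ = has_section_ ? section : "";
  }

  // New overload: copy the section from another node.  For now it is only
  // a wrapper; a later step can copy an interned pointer instead.
  void set_section(const Node &other) {
    if (&other != this)
      set_section(other.get_section());
  }

  // Const, so it can be queried on the (const) source node.
  const char *get_section() const {
    return has_section_ ? section_.c_str() : nullptr;
  }

 private:
  std::string section_;
  bool has_section_ = false;
};
```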

This patch should not change behavior.

Testing: Bootstrap on x86_64-linux-gnu with --disable-multilib
--enable-checking=release --enable-languages=c,c++. Observe no change in
test results.

2019-11-12  Matthew Glazar 

* gcc/cgraph.h (symtab_node::get_section): Constify.
(symtab_node::set_section): Declare new overload.
* gcc/symtab.c (symtab_node::set_section): Define new overload.
(symtab_node::copy_visibility_from): Use new overload of
symtab_node::set_section.
(symtab_node::resolve_alias): Same.
* gcc/tree.h (set_decl_section_name): Declare new overload.
* gcc/tree.c (set_decl_section_name): Define new overload.
* gcc/c/c-decl.c (merge_decls): Use new overload of
set_decl_section_name.
* gcc/cp/decl.c (duplicate_decls): Same.
* gcc/cp/method.c (use_thunk): Same.
* gcc/cp/optimize.c (maybe_clone_body): Same.
* gcc/d/decl.cc (finish_thunk): Same.
* gcc/tree-emutls.c (get_emutls_init_templ_addr): Same.
* gcc/cgraphclones.c (cgraph_node::create_virtual_clone): Use new
overload of symtab_node::set_section.
(cgraph_node::create_version_clone_with_body): Same.
* gcc/trans-mem.c (ipa_tm_create_version): Same.


diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 2841b4f5a77..366fbf2a28a 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -2845,7 +2845,7 @@ merge_decls (tree newdecl, tree olddecl, tree newtype, tree oldtype)
|| TREE_PUBLIC (olddecl)
|| TREE_STATIC (olddecl))
   && DECL_SECTION_NAME (newdecl) != NULL)
-set_decl_section_name (olddecl, DECL_SECTION_NAME (newdecl));
+set_decl_section_name (olddecl, newdecl);

   /* This isn't quite correct for something like
 int __thread x attribute ((tls_model ("local-exec")));
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 0abde3d8f91..3b07258b31d 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -246,7 +246,7 @@ public:
 }

   /* Return section as string.  */
-  const char * get_section ()
+  const char * get_section () const
 {
   if (!x_section)
 return NULL;
@@ -305,6 +305,9 @@ public:
   /* Set section for symbol and its aliases.  */
   void set_section (const char *section);

+  /* Like set_section, but copying the section name from another node.  */
+  void set_section (const symtab_node );
+
   /* Set section, do not recurse into aliases.
  When one wants to change section of symbol and its aliases,
  use set_section.  */
diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
index 41a600e64a5..0b1c93534f2 100644
--- a/gcc/cgraphclones.c
+++ b/gcc/cgraphclones.c
@@ -572,7 +572,7 @@ cgraph_node::create_virtual_clone (vec redirect_callers,
   set_new_clone_decl_and_node_flags (new_node);
   new_node->clone.tree_map = tree_map;
   if (!implicit_section)
-new_node->set_section (get_section ());
+new_node->set_section (*this);

   /* Clones of global symbols or symbols with unique names are unique.  */
   if ((TREE_PUBLIC (old_decl)
@@ -996,7 +996,7 @@ cgraph_node::create_version_clone_with_body
   new_version_node->local = 1;
   new_version_node->lowered = true;
   if (!implicit_section)
-new_version_node->set_section (get_section ());
+new_version_node->set_section (*this);
   /* Clones of global symbols or symbols with unique names are unique.  */
   if ((TREE_PUBLIC (old_decl)
&& !DECL_EXTERNAL (old_decl)
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 5c5a85e3221..ed4034f8e9d 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -2830,7 +2830,7 @@ duplicate_decls (tree newdecl, tree olddecl, bool newdecl_is_friend)
  done later in decl_attributes since we are called before attributes
  are assigned.  */
   if (DECL_SECTION_NAME (newdecl) != NULL)
-set_decl_section_name (olddecl, DECL_SECTION_NAME (newdecl));
+set_decl_section_name (olddecl, newdecl);

   if (DECL_ONE_ONLY (newdecl))
 {
diff --git a/gcc/cp/method.c b/gcc/cp/method.c
index 47441c10c52..d111792af5b 100644
--- a/gcc/cp/method.c
+++ b/gcc/cp/method.c
@@ 

Re: [golang-dev] [PATCH] libgo/test: Add flags to find libgcc_s in build-tree testing

2019-11-12 Thread Maciej W. Rozycki
On Mon, 11 Nov 2019, Ian Lance Taylor wrote:

> > gcc/testsuite/
> > * lib/go.exp (go_link_flags): Add `ld_library_path' setting to
> > find shared `libgcc_s' library at run time in build-tree
> > testing.
> 
> Is there similar code for other languages, such as Fortran?  I don't
> see why Go would be different here.

 An example simulator invocation in Fortran testing here looks like:

spawn qemu-riscv64 -E 
LD_LIBRARY_PATH=.:/scratch/macro/riscv-linux/obj/gcc/riscv64-linux-gnu/lib64/lp64d/libgfortran/.libs:/scratch/macro/riscv-linux/obj/gcc/riscv64-linux-gnu/lib64/lp64d/libgfortran/.libs:/scratch/macro/riscv-linux/obj/gcc/riscv64-linux-gnu/lib64/lp64d/libatomic/.libs:/scratch/macro/riscv-linux/obj/gcc/gcc:/scratch/macro/riscv-linux/obj/gcc/gcc/lib32/ilp32:/scratch/macro/riscv-linux/obj/gcc/gcc/lib32/ilp32d:/scratch/macro/riscv-linux/obj/gcc/gcc/lib64/lp64:/scratch/macro/riscv-linux/obj/gcc/gcc/lib64/lp64d:/scratch/macro/riscv-linux/obj/gcc/gcc:/scratch/macro/riscv-linux/obj/gcc/gcc/lib32/ilp32:/scratch/macro/riscv-linux/obj/gcc/gcc/lib32/ilp32d:/scratch/macro/riscv-linux/obj/gcc/gcc/lib64/lp64:/scratch/macro/riscv-linux/obj/gcc/gcc/lib64/lp64d
 ./alloc_comp_4.exe

and there are indeed copies of newly-built `libgcc_s' available in these 
directories; however, Fortran testing is different, as it is done as part 
of GCC proper (similarly to `check-gcc-go') rather than as a top-level 
library (libgfortran/ has no separate testsuite associated and 
`check-target-libgfortran' does nothing).

 I believe this is arranged by `gcc-set-multilib-library-path' in 
gcc/testsuite/lib/gcc-defs.exp, and this does get invoked from 
`go_link_flags' too; however, it has no chance to work, as the paths are 
only set if the `rootme' Tcl variable is.  And `rootme' is only set via 
site.exp in gcc/ AFAICT, so all the top-level lib*/ test suites that call 
`gcc-set-multilib-library-path' are busted (unless invoked standalone with 
a handcrafted site.exp or suchlike).  I guess no other one relies on 
`libgcc_s', at least not throughout, which I find plausible and which is 
why they appear to work just fine in my test environment (though I have to 
admit I haven't gone through all the testsuite failures yet, so maybe 
there is indeed a case or a dozen there that is broken).

 So I think you are right that libgo/ testing is not particularly 
different from other top-level libraries, and this requires a better 
clean-up, which would go into gcc/testsuite/lib/gcc-defs.exp instead.

 In particular I think the use of `exec' is unsafe as it has a 
ridiculously short executable path length limit imposed (which I have 
actually overrun in the past in my previous test environments) and does 
not work for a remote host (as already lossily guarded against).  Both 
issues are addressed with the use of `remote_exec host', as I did with my 
proposed code.

 Also, do we need to add non-selected multilib run-time load paths?  That 
seems to be the reason for the requirement to know `rootme' on one hand and 
of questionable use on the other; however, the history of the change that 
introduced it is too complicated for me to know the answer to the question 
offhand.  I'll have to spend some time looking into it and reading through 
past discussions, though it may take a couple of days as I have an unrelated 
change outstanding that is not a bug fix and which I therefore want to 
give priority and submit before stage 1 ends.

 Please consider this patch withdrawn then, and I'll propose a replacement 
change for gcc/testsuite/lib/gcc-defs.exp soon.  We can discuss the 
concerns there.

> >  NB as a heads-up numerous tests fail quietly (i.e. with no FAIL report
> > and no name of the test case given either) to link due to unsatisfied
> > symbol references, such as:
> >
> > .../bin/riscv64-linux-gnu-ld: _gotest_.o: in function 
> > `cmd..z2fgo..z2finternal..z2fcache.Cache.get':
> > .../riscv64-linux-gnu/libgo/gotest24771/test/cache.go:182: undefined 
> > reference to `cmd..z2fgo..z2finternal..z2frenameio.ReadFile'
> >
> > which I take is due to a reference to `libgotool.a' -- which is where the
> > required symbols are defined -- missing from the linker invocation.  I
> > don't know what's supposed to supply the library to the linker or whether
> > this indeed the actual cause; I find the way libgo tests have been wired
> > unusual and consequently hard to follow, so maybe someone more familiar
> > with this stuff will be able to tell what is going on here.  I'll be happy
> > to push any patches through testing.
> 
> (That is, of course, a libgo test failure, and as such is not affected
> by your patch to go.exp.)
> 
> In normal usage, that test is linked against libgotool.a because of
> the variable extra_check_libs_cmd_go_internal_cache in
> libgo/Makefile.am.  That variable is added to GOLIBS in the CHECK
> variable in libgo/Makefile.am.  Maybe the fix is for
> libgo/testsuite/lib/libgo.exp to use GOLIBS.

 Oh, I can see 

[PATCH] libgo/test: Pass $GOLIBS to compilation in DejaGNU testing

2019-11-12 Thread Maciej W. Rozycki
Pass $GOLIBS to compilation in DejaGNU testing like with direct compiler 
invocation from `libgo/testsuite/gotest', removing link problems in 
cross-toolchain testing like:

.../bin/riscv64-linux-gnu-ld: _gotest_.o: in function 
`cmd..z2fgo..z2finternal..z2fcache.Cache.get':
.../riscv64-linux-gnu/libgo/gotest24771/test/cache.go:182: undefined reference 
to `cmd..z2fgo..z2finternal..z2frenameio.ReadFile'

due to `libgotool.a' missing from the linker invocation command and 
improving overall test results for the `riscv64-linux-gnu' target (here 
with the `x86_64-linux-gnu' host and RISC-V QEMU in the Linux user 
emulation mode as the target board) from 133 PASSes and 26 FAILs to 145 
PASSes and 29 FAILs.
---
 libgo/testsuite/libgo.testmain/testmain.exp |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

gcc-test-libgo-dejagnu-golibs.diff
Index: gcc/libgo/testsuite/libgo.testmain/testmain.exp
===
--- gcc.orig/libgo/testsuite/libgo.testmain/testmain.exp
+++ gcc/libgo/testsuite/libgo.testmain/testmain.exp
@@ -47,7 +47,11 @@ if [info exists gluefile] {
 regsub $gluefile $object_files "" object_files
 }
 
-set comp_output [go_target_compile "$object_files _testmain.go" \
+set golibs ""
+if [info exists env(GOLIBS)] {
+set golibs "$env(GOLIBS)"
+}
+set comp_output [go_target_compile "$object_files _testmain.go $golibs" \
 "./a.exe" "executable" $options]
 if ![ string match "" $comp_output ] {
 verbose -log $comp_output


Re: Fix ICE in ipa-cp when mixing -O0 and -O2 code in LTO

2019-11-12 Thread Jan Hubicka
> > this fixes second ICE seen during profiledbootstrap with Ada enabled.
> > The problem is that Ada builds some object files with -O2 -O0 (not sure
> > why) and these functions get flag_ipa_cp but !optimize.   
> 
> # Compile s-excdeb.o without optimization and with debug info to let the
> # debugger set breakpoints and inspect subprogram parameters on exception
> # related events.
> 
> We could probably try with -Og these days.

That sounds like a good idea to me :)

Using -O0 and -fprofile-generate triggers yet another problem.  The
module gets a constructor function that registers the profile info, and
TARGET_OPTIMIZATION_NODE is not set for it (because it is built after
free lang data).  This eventually makes ipa-cp think that the function
is compiled with -O2 (because WPA is invoked with -O2), and it ICEs again.
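The failure mode can be modeled in a few lines.  In this hypothetical
sketch (`Options`, `Function` and `effective_optimize` are illustrative
names, not GCC's), each function may carry its own recorded options; when
it does not, the options of the current compilation stage are used, which
at WPA time means -O2.  Attaching the compile-time default options when
the constructor is built, which is what passing optimization_default_node
in the patch below appears to do, closes that gap.

```cpp
#include <cassert>

// Hypothetical model of per-function optimization settings.
struct Options {
  int optimize;  // 0 for -O0, 2 for -O2, ...
};

struct Function {
  const Options *opts = nullptr;  // null: no per-function options recorded
};

// Per-function options win; otherwise fall back to the options of the
// current compilation stage (for example, the WPA command line).
int effective_optimize(const Function &f, const Options &stage) {
  return f.opts ? f.opts->optimize : stage.optimize;
}
```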

I am testing the following:

Index: ipa.c
===
--- ipa.c   (revision 278094)
+++ ipa.c   (working copy)
@@ -914,7 +914,10 @@ cgraph_build_static_cdtor_1 (char which,
 void
 cgraph_build_static_cdtor (char which, tree body, int priority)
 {
-  cgraph_build_static_cdtor_1 (which, body, priority, false, NULL, NULL);
+  gcc_assert (!in_lto_p);
+  cgraph_build_static_cdtor_1 (which, body, priority, false,
+  optimization_default_node,
+  target_option_default_node);
 }
 
 /* When target does not have ctors and dtors, we call all constructor


Re: [PATCH 5/7 libgomp,amdgcn] Optimize GCN OpenMP malloc performance

2019-11-12 Thread Jakub Jelinek
On Tue, Nov 12, 2019 at 05:47:11PM +, Andrew Stubbs wrote:
> > Not really sure if it is a good idea to print anything, at least not when
> > in some debugging mode.  I mean, it is fairly easy to write code that will
> > trigger this.  And, what is the reason why you can't free the
> > gomp_malloced memory, like comparing if the team_freed pointer is in between
> > TEAM_ARENA_START and TEAM_ARENA_END or similar, don't do anything in that
> > case, otherwise use free?
> 
> Falling back to malloc is a big performance hit. There's a global lock
> affecting all teams in all running kernels. If we're running into this then
> a) I want to know about it so I can tune the arena size, and b) I want the
> user to know why performance is suddenly worse.
> 
> At least for now, I want to keep the message. I've updated the comment
> though.

At some point we'll need to pass some data from the host to the offloading
target, including ICVs etc. (at least for OpenMP 5.1), and at that point we
can also pass the GOMP_DEBUG state.

> I wanted to ensure that the loop would vectorize inline, but I don't think
> it was doing so anyway. I need to look at that, but how is this, for now?

Ok.

Jakub
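The alternative raised in the quoted review, checking whether the pointer
being freed lies inside the team arena and only calling free otherwise,
can be sketched as follows.  This is a hypothetical single-threaded model:
`Arena`, its bump allocator and the `fell_back` flag are illustrative
stand-ins, not libgomp's TEAM_ARENA_START/TEAM_ARENA_END handling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical bump-pointer arena with a malloc fallback.
class Arena {
 public:
  Arena(void *start, std::size_t size)
      : start_(static_cast<char *>(start)), end_(start_ + size), free_(start_) {}

  void *alloc(std::size_t size) {
    // Round sizes up to a multiple of 8 bytes.
    size = (size + 7) & ~static_cast<std::size_t>(7);
    if (static_cast<std::size_t>(end_ - free_) >= size) {
      void *p = free_;
      free_ += size;
      return p;
    }
    fell_back = true;  // the slow, globally locked path in the real runtime
    return std::malloc(size);
  }

  // The range check: arena memory is reclaimed wholesale later, so only
  // pointers that came from malloc are passed to free.
  void release(void *p) {
    if (!in_arena(p))
      std::free(p);
  }

  bool in_arena(const void *p) const {
    std::uintptr_t a = reinterpret_cast<std::uintptr_t>(p);
    return a >= reinterpret_cast<std::uintptr_t>(start_) &&
           a < reinterpret_cast<std::uintptr_t>(end_);
  }

  bool fell_back = false;  // set once any allocation missed the arena

 private:
  char *start_;
  char *end_;
  char *free_;
};
```

Keeping `fell_back` visible matches the concern in the thread: the fallback is legal but slow, so it is worth reporting.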



Re: Free ipa-prop edge summaries for inline calls

2019-11-12 Thread Martin Jambor
On Sun, Oct 27 2019, Jan Hubicka wrote:
> Hi,
> this patch makes ipa-prop free edge summaries (jump functions) for
> calls which have been inlined, because they are no longer useful.
> The main change is to change IPA_EDGE_REF from get_create to get,
> and thus we need to watch for missing summaries at some places so
> combining -O0 and -O2 code works.

So, I never quite liked the IPA_NODE_REF and IPA_EDGE_REF macros.
Perhaps now would be a good time to replace them everywhere with the get
(and get_create) methods of the summary holders?

Since ipa_node_params_sum->get might be a bit too long, perhaps we could
use ipcp_node_sum->get or something similar.  And similarly for edges.

What do you think?

Martin


>
> Bootstrapped/regtested x86_64-linux, committed.
>
>   * ipa-cp.c (propagate_constants_across_call): If args are not available
>   just drop everything to varying.
>   (find_aggregate_values_for_callers_subset): Watch for missing
>   edge summary.
>   (find_more_scalar_values_for_callers_subs): Likewise.
>   * ipa-prop.c (ipa_compute_jump_functions_for_edge,
>   update_jump_functions_after_inlining, propagate_controlled_uses):
>   Watch for missing summaries.
>   (ipa_propagate_indirect_call_infos): Remove summary after propagation
>   is finished.
>   (ipa_write_node_info): Watch for missing summaries.
>   (ipa_read_edge_info): Create new ref.
>   (ipa_edge_args_sum_t): Add remove.
>   (IPA_EDGE_REF_GET_CREATE): New macro.
>   * ipa-fnsummary.c (evaluate_properties_for_edge): Watch for missing
>   edge summary.
>   (remap_edge_change_prob): Likewise.
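The distinction between `get` and `get_create` that both the patch and
the review revolve around can be illustrated with a hypothetical summary
holder (`SummaryHolder`, `EdgeSummary` and integer edge ids are stand-ins,
not GCC's summary classes): `get` returns null when no summary exists,
`get_create` allocates one on demand, and `remove` drops it, for example
once a call has been inlined.  Code that moves from `get_create` to `get`
therefore has to handle the null case, which is what the "watch for
missing summaries" ChangeLog entries are about.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Hypothetical per-edge data, standing in for jump functions.
struct EdgeSummary {
  int jump_functions = 0;
};

// Hypothetical summary holder keyed by edge id.
class SummaryHolder {
 public:
  // Return the summary for EDGE, or nullptr if none exists.
  EdgeSummary *get(int edge) const {
    auto it = map_.find(edge);
    return it == map_.end() ? nullptr : it->second.get();
  }

  // Return the summary for EDGE, creating an empty one if needed.
  EdgeSummary *get_create(int edge) {
    std::unique_ptr<EdgeSummary> &slot = map_[edge];
    if (!slot)
      slot.reset(new EdgeSummary);
    return slot.get();
  }

  // Drop the summary, e.g. after the call has been inlined.
  void remove(int edge) { map_.erase(edge); }

 private:
  std::map<int, std::unique_ptr<EdgeSummary>> map_;
};

// A consumer written against get() must cope with a missing summary.
int count_jump_functions(const SummaryHolder &sums, int edge) {
  const EdgeSummary *s = sums.get(edge);
  return s ? s->jump_functions : 0;  // missing summary: nothing known
}
```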


[PATCH] rs6000: Use ULL on big hexadecimal literal

2019-11-12 Thread Segher Boessenkool
C++98 does not have long long int, and does not use (unsigned) long
long int for hexadecimal literals.  So let's use a ULL suffix here,
which is still not strict C++98, but which works with more compilers.
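The portability point can be checked with a small sketch.  It is C++11
rather than C++98, since it relies on `static_assert` and `<type_traits>`,
and the mask is an illustrative value, not necessarily the one in the
patch.  Without a suffix, the type of a large hexadecimal literal depends
on the host's integer widths (for example `unsigned long` on LP64 hosts),
whereas the `ULL` suffix pins it to `unsigned long long` everywhere.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Illustrative 64-bit mask that clears the low three bits.
constexpr auto kMask = 0xFFFFFFFFFFFFFFF8ULL;

// With the suffix the literal is unsigned long long on every host.
static_assert(std::is_same<decltype(0xFFFFFFFFFFFFFFF8ULL),
                           unsigned long long>::value,
              "ULL literal has type unsigned long long");

// Without it, the type is the first of int, unsigned int, long,
// unsigned long, long long, unsigned long long that can hold the value;
// portably we can only say it is at least 8 bytes wide.
static_assert(sizeof(0xFFFFFFFFFFFFFFF8) >= 8,
              "unsuffixed literal picked a 64-bit type");

std::uint64_t clear_low_bits(std::uint64_t v) { return v & kMask; }
```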

Tested etc.; committing to trunk.


Segher


2019-11-12  Segher Boessenkool  

* config/rs6000/rs6000.md (rs6000_set_fpscr_drn): Use ULL on big
hexadecimal literal.

---
 gcc/config/rs6000/rs6000.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 1327285..d165344 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6008,7 +6008,7 @@ (define_expand "rs6000_set_fpscr_drn"
   /* Insert new RN mode into FSCPR.  */
   emit_insn (gen_rs6000_mffs (tmp_df));
   tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);
-  emit_insn (gen_anddi3 (tmp_di, tmp_di, GEN_INT (0xFFF8)));
+  emit_insn (gen_anddi3 (tmp_di, tmp_di, GEN_INT (0xFFF8ULL)));
   emit_insn (gen_iordi3 (tmp_di, tmp_di, tmp_rn));
 
   /* Need to write to field 7.  The fields are [0:15].  The equation to
-- 
1.8.3.1



Re: [PATCH] Enable libsanitizer build on riscv64

2019-11-12 Thread Jim Wilson
On Mon, Nov 11, 2019 at 3:45 AM Andreas Schwab  wrote:
> Only ubsan is supported so far.  This has been tested on openSUSE
> Tumbleweed, there are no testsuite failures.
>
> * configure.tgt (riscv64-*-linux*): Enable build.

With a workaround for the ipc_perm/mode issue, I can reproduce the
same result on my Fedora rawhide machine.  This patch is OK.

Jim


[C++ PATCH] Merge some using-decl handling

2019-11-12 Thread Nathan Sidwell
We currently process member and nonmember using decls completely 
separately.  C++20 will have some new using decls for enums (see 
wg21.link/p1099 for details), and there will be some commonality -- 
member using decls can refer to non-member enumerators and non-member 
using-decls can refer to member enumerators.


It makes sense to handle the lookup of using decls in one place, rather 
than duplicate the handling.  This patch does that, with no change to 
semantics.  Some error messages change, that is all.
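Going by the visible part of the diff below, the merged lookup_using_decl 
first decides what kind of scope is being named and rejects mismatched 
contexts before doing any lookup.  A hypothetical summary of just those 
up-front checks (the enum and function names are illustrative; later 
checks such as destructor and constructor names are omitted):

```cpp
#include <cassert>
#include <string>

// Hypothetical classification of the SCOPE in "using SCOPE::NAME;".
enum class ScopeKind { Namespace, Enumeration, Class };

// Returns the diagnostic text used in the patch (without its operands),
// or an empty string if the declaration passes these initial checks.
std::string check_using_scope(ScopeKind scope, bool at_class_scope) {
  switch (scope) {
    case ScopeKind::Namespace:
      if (at_class_scope)
        return "using-declaration for non-member at class scope";
      return "";  // proceed with qualified namespace lookup
    case ScopeKind::Enumeration:
      return "using-declaration may not name enumerator";
    case ScopeKind::Class:
      if (!at_class_scope)
        return "using-declaration for member at non-class scope";
      return "";  // proceed with member checks and lookup
  }
  return "";  // not reached
}
```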


nathan
--
Nathan Sidwell
2019-11-12  Nathan Sidwell  

	gcc/cp/
	* name-lookup.c (lookup_using_decl): New function, merged from ...
	(do_class_using_decl): ... here.  Call it.  And ...
	(finish_nonmember_using_decl): ... here.  Call it.

	gcc/testsuite/
	* g++.dg/cpp0x/using-enum-2.C: Adjust expected error text.
	* g++.dg/cpp0x/using-enum-3.C: Likewise.
	* g++.dg/lookup/using4.C: Likewise.
	* g++.dg/lookup/using7.C: Likewise.
	* g++.dg/template/using12.C: Likewise.
	* g++.dg/template/using18.C: Likewise.
	* g++.dg/template/using22.C: Likewise.

Index: gcc/cp/name-lookup.c
===
--- gcc/cp/name-lookup.c	(revision 278096)
+++ gcc/cp/name-lookup.c	(working copy)
@@ -4585,100 +4585,164 @@ push_class_level_binding (tree name, tre
 }
 
-/* Process "using SCOPE::NAME" in a class scope.  Return the
-   USING_DECL created.  */
+/* Process and lookup a using decl SCOPE::lookup.name, filling in
+   lookup.values & lookup.type.  Return true if ok.  */
 
-tree
-do_class_using_decl (tree scope, tree name)
+static bool
+lookup_using_decl (tree scope, name_lookup )
 {
-  if (name == error_mark_node)
-return NULL_TREE;
+  tree current = current_scope ();
+  bool dependent_p = false;
 
-  if (!scope || !TYPE_P (scope))
+  if (TREE_CODE (scope) == NAMESPACE_DECL)
 {
-  error ("using-declaration for non-member at class scope");
-  return NULL_TREE;
-}
+  /* Naming a namespace member.  */
+  if (TYPE_P (current))
+	{
+	  error ("using-declaration for non-member at class scope");
+	  return false;
+	}
 
-  /* Make sure the name is not invalid */
-  if (TREE_CODE (name) == BIT_NOT_EXPR)
-{
-  error ("%<%T::%D%> names destructor", scope, name);
-  return NULL_TREE;
+  qualified_namespace_lookup (scope, );
 }
-
-  /* Using T::T declares inheriting ctors, even if T is a typedef.  */
-  if (MAYBE_CLASS_TYPE_P (scope)
-  && (name == TYPE_IDENTIFIER (scope)
-	  || constructor_name_p (name, scope)))
+  else if (TREE_CODE (scope) == ENUMERAL_TYPE)
 {
-  maybe_warn_cpp0x (CPP0X_INHERITING_CTORS);
-  name = ctor_identifier;
-  CLASSTYPE_NON_AGGREGATE (current_class_type) = true;
-  TYPE_HAS_USER_CONSTRUCTOR (current_class_type) = true;
+  error ("using-declaration may not name enumerator %<%E::%D%>",
+	 scope, lookup.name);
+  return false;
 }
-
-  /* Cannot introduce a constructor name.  */
-  if (constructor_name_p (name, current_class_type))
+  else
 {
-  error ("%<%T::%D%> names constructor in %qT",
-	 scope, name, current_class_type);
-  return NULL_TREE;
-}
+  /* Naming a class member.  */
+  if (!TYPE_P (current))
+	{
+	  error ("using-declaration for member at non-class scope");
+	  return false;
+	}
+
+  /* Make sure the name is not invalid */
+  if (TREE_CODE (lookup.name) == BIT_NOT_EXPR)
+	{
+	  error ("%<%T::%D%> names destructor", scope, lookup.name);
+	  return false;
+	}
 
-  /* From [namespace.udecl]:
+  /* Using T::T declares inheriting ctors, even if T is a typedef.  */
+  if (MAYBE_CLASS_TYPE_P (scope)
+	  && (lookup.name == TYPE_IDENTIFIER (scope)
+	  || constructor_name_p (lookup.name, scope)))
+	{
+	  maybe_warn_cpp0x (CPP0X_INHERITING_CTORS);
+	  lookup.name = ctor_identifier;
+	  CLASSTYPE_NON_AGGREGATE (current) = true;
+	  TYPE_HAS_USER_CONSTRUCTOR (current) = true;
+	}
 
-   A using-declaration used as a member-declaration shall refer to a
-   member of a base class of the class being defined.
+  /* Cannot introduce a constructor name.  */
+  if (constructor_name_p (lookup.name, current))
+	{
+	  error ("%<%T::%D%> names constructor in %qT",
+		 scope, lookup.name, current);
+	  return false;
+	}
 
- In general, we cannot check this constraint in a template because
- we do not know the entire set of base classes of the current
- class type. Morover, if SCOPE is dependent, it might match a
- non-dependent base.  */
+  /* Member using decls finish processing when completing the
+	 class.  */
+  /* From [namespace.udecl]:
 
-  tree decl = NULL_TREE;
-  if (!dependent_scope_p (scope))
-{
-  base_kind b_kind;
-  tree binfo = lookup_base (current_class_type, scope, ba_any, _kind,
-tf_warning_or_error);
-  if (b_kind < bk_proper_base)
+ A using-declaration used as a member-declaration shall refer
+ to a member of a base class of the 

Re: [Patch, RFC] PR81651/Fortran - Enhancement request: have f951 print out fully qualified module file name

2019-11-12 Thread Harald Anlauf
On 11/11/19 23:37, Janne Blomqvist wrote:
> On Mon, Nov 11, 2019 at 11:54 PM Harald Anlauf  wrote:
>>
>> Dear all,
>>
>> the attached patch prints the fully qualified path if an error occurs
>> during module read.  E.g., instead of a less helpful error message,
>>
>> pr81651.f90:2:6:
>>
>> 2 |   use netcdf
>>   |  1
>> Fatal Error: File 'netcdf.mod' opened at (1) is not a GNU Fortran module
>> file
>>
>> gfortran will print
>>
>> pr81651.f90:2:7:
>>
>> 2 |   use netcdf
>>   |   1
>> Fatal Error: File '/opt/pgi/pkg/netcdf/include/netcdf.mod' opened at (1)
>> is not a GNU Fortran module file
>>
>> Regtested on x86_64-pc-linux-gnu.
>>
>> I couldn't think of a sensible test for the testsuite, thus no testcase
>> provided.
>>
>> OK for trunk?
>>
>> Thanks,
>> Harald
>>
>> 2019-11-11  Harald Anlauf  
>>
>> PR fortran/81651
>> * module.c (gzopen_included_file, gzopen_included_file_1)
>> (gzopen_intrinsic_module, bad_module, gfc_use_module): Use fully
>> qualified module path for error reporting.
>
> Ok.
>

Committed as svn rev. 278105.

Thanks for the review!

Harald



Re: [PATCH 4/7] Remove gcc/params.* files.

2019-11-12 Thread Rainer Orth
Hi Martin,

> gcc/ChangeLog:
>
> 2019-11-06  Martin Liska  
>
>   * Makefile.in: Remove PARAMS_H and params.list
>   and params.options.

this has obviously not been tested properly: it completely broke
gcc.dg/params/params.exp:

+ERROR: couldn't open "/var/gcc/regression/trunk/11.5-gcc/build/gcc/testsuite/gcc/../../params.options": no such file or directory
[...]
+ERROR: tcl error sourcing /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/params/params.exp.
[...]

once for every instance of parallel testing.

Please fix and properly follow testing procedures in the future.  Just
looking for FAILs is not enough: e.g. comparing mail-report.log before
and after the patch would have immediately shown this.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH 2/4] MSP430: Disable exception handling by default for C++

2019-11-12 Thread Richard Sandiford
Jozef Lawrynowicz  writes:
> diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
> index 1df645e283c..1ce449cb935 100644
> --- a/gcc/testsuite/lib/gcc-dg.exp
> +++ b/gcc/testsuite/lib/gcc-dg.exp
> @@ -417,6 +417,16 @@ proc gcc-dg-prune { system text } {
>   return "::unsupported::large return values"
>  }
>  
> +# If exceptions are disabled, mark tests expecting exceptions to be enabled
> +# as unsupported.
> +if [regexp "(^|\n)\[^\n\]*: error: exception handling disabled" $text] {
> + return "::unsupported::exception handling disabled"
> +}

This is probably safe, but...

> +
> +if [regexp "(^|\n)\[^\n\]*: error: #error .__cpp_exceptions." $text] {
> + return "::unsupported::exception handling disabled"
> +}

...it looks like this would disable g++.dg/cpp1y/feat-neg.C for all
targets.  I assume this was motivated by g++.dg/cpp2a/feat-cxx2a.C,
but the kind of effective-target tests you had in the original patch
are probably better there.  It might then be more robust to test that
[check_effective_target_...] for the "exception handling disabled" case
above as well, so that other targets aren't affected accidentally.

It'd be good to test a target/multilib that has exceptions enabled by
default to make sure there's no change in the number of unsupported
tests (rather than just no new fails).

Thanks,
Richard


[PATCH 2/2] testsuite: Add testcases for PR92449

2019-11-12 Thread Segher Boessenkool
2019-11-12  Segher Boessenkool  

gcc/testsuite/
* gcc.c-torture/compile/pr92449.c: New test.
* gcc.target/powerpc/pr92449-1.c: New test.

---
 gcc/testsuite/gcc.c-torture/compile/pr92449.c | 7 +++
 gcc/testsuite/gcc.target/powerpc/pr92449-1.c  | 7 +++
 2 files changed, 14 insertions(+)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr92449.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr92449-1.c

diff --git a/gcc/testsuite/gcc.c-torture/compile/pr92449.c b/gcc/testsuite/gcc.c-torture/compile/pr92449.c
new file mode 100644
index 000..74e7377
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr92449.c
@@ -0,0 +1,7 @@
+/* PR target/92449 */
+/* { dg-additional-options "-ffast-math -fno-cx-limited-range" } */
+
+void do_div (_Complex double *a, _Complex double *b)
+{
+  *a = *b / (4.0 - 5.0fi);
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/pr92449-1.c b/gcc/testsuite/gcc.target/powerpc/pr92449-1.c
new file mode 100644
index 000..f9fcb84
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr92449-1.c
@@ -0,0 +1,7 @@
+/* { dg-options "-Ofast -mdejagnu-cpu=power9 " } */
+
+int
+compare_exponents_unordered (double exponent1, double exponent2)
+{
+  return __builtin_vec_scalar_cmp_exp_unordered (exponent1, exponent2);
+}
-- 
1.8.3.1



[PATCH 0/2] Fix PR 92449 (unordered with -ffast-math)

2019-11-12 Thread Segher Boessenkool
Patch 1 fixes the target builtin mentioned in PR 92449 comment 1, where
the user code asks us to generate code for unordered although we are
compiling without NaNs.  This fixes it by simply hardcoding the result
(false) for that case.

Patch 2 adds a testcase for this, and one for the tree-complex part that
Jakub fixed earlier.
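For illustration only: an unordered comparison is true exactly when at least one operand is a NaN, which is why hardcoding the result is sound once NaNs are not honoured. The sketch below models that predicate with the standard `isnan` macro; both function names are hypothetical, and this is not how the builtin or the vsx.md expander is implemented.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical model of an "unordered" comparison: true iff at least
   one operand is a NaN (the same meaning as C99 isunordered).  */
static bool
unordered_p (double a, double b)
{
  return isnan (a) || isnan (b);
}

/* When NaNs cannot occur (the !HONOR_NANS case), the predicate can
   never be true, so the only consistent answer is the constant 0.  */
static bool
unordered_p_assuming_no_nans (double a, double b)
{
  (void) a;
  (void) b;
  return false;
}
```

With ordinary finite inputs the two functions agree; they differ only for NaN operands, which -ffast-math tells the compiler it may assume never appear.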

Tested on powerpc64-linux {-m32,-m64}.  Committing to trunk.


Segher


 gcc/config/rs6000/vsx.md  | 12 
 gcc/testsuite/gcc.c-torture/compile/pr92449.c |  7 +++
 gcc/testsuite/gcc.target/powerpc/pr92449-1.c  |  7 +++
 3 files changed, 26 insertions(+)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr92449.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr92449-1.c

-- 
1.8.3.1



[PATCH 1/2] rs6000: Handle unordered for xscmpexp[dq]p without NaNs (PR92449)

2019-11-12 Thread Segher Boessenkool
2019-11-12  Segher Boessenkool  

* config/rs6000/vsx.md (xscmpexpdp_ for CMP_TEST): Handle
UNORDERED if !HONOR_NANS (DFmode).
(xscmpexpqp__ for CMP_TEST and IEEE128): Handle UNORDERED
if !HONOR_NANS (mode).

---
 gcc/config/rs6000/vsx.md | 12 
 1 file changed, 12 insertions(+)

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index aa13b20..3aa8e21 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4526,6 +4526,12 @@ (define_expand "xscmpexpdp_"
 (const_int 0)))]
   "TARGET_P9_VECTOR"
 {
+  if ( == UNORDERED && !HONOR_NANS (DFmode))
+{
+  emit_move_insn (operands[0], const0_rtx);
+  DONE;
+}
+
   operands[3] = gen_reg_rtx (CCFPmode);
 })
 
@@ -4554,6 +4560,12 @@ (define_expand "xscmpexpqp__"
 (const_int 0)))]
   "TARGET_P9_VECTOR"
 {
+  if ( == UNORDERED && !HONOR_NANS (mode))
+{
+  emit_move_insn (operands[0], const0_rtx);
+  DONE;
+}
+
   operands[3] = gen_reg_rtx (CCFPmode);
 })
 
-- 
1.8.3.1



Re: Fix ICE in ipa-cp when mixing -O0 and -O2 code in LTO

2019-11-12 Thread Eric Botcazou
> this fixes second ICE seen during profiledbootstrap with Ada enabled.
> The problem is that Ada builds some object files with -O2 -O0 (not sure
> why) and these functions get flag_ipa_cp but !optimize.   

# Compile s-excdeb.o without optimization and with debug info to let the
# debugger set breakpoints and inspect subprogram parameters on exception
# related events.

We could probably try with -Og these days.

-- 
Eric Botcazou


Re: [Patch] PR fortran/92470 Fixes for CFI_address

2019-11-12 Thread Tobias Burnus

Hi all,

On 11/12/19 3:42 PM, Tobias Burnus wrote:
(2) CFI_establish: For allocatables, it is clear – base_addr == NULL. 
For pointers, it is clear as well – it has to be '0' according to the 
standard. But for CFI_attribute_other … I have now asked at 
https://mailman.j3-fortran.org/pipermail/j3/2019-November/thread.html#11740 



While I still have trouble deciphering the standard, regarding
CFI_establish Steve L wrote:


"In the C descriptor world, arrays start at zero as they do in C. The 
only way they can become non-zero is through argument association, 
allocation or pointer association as specified in 18.5.3p3. For 
non-pointer, not-allocatable objects (this means "other"), the lower 
bounds are supposed to be always zero."


Hence, I now also set it for CFI_attribute_other to 0 – and check it in 
a test case (most users there have NULL as base_addr, hence, only a 
single assert is in that file).
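A minimal sketch of the rule quoted above, using cut-down stand-ins for the real descriptor types (the `toy_*` names are hypothetical, not the ISO_Fortran_binding.h API). Whatever the attribute, every dimension starts at lower bound 0, which is what the patched loop now does; the stride rule for dimensions above 0 (contiguous layout) is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for CFI_dim_t and CFI_cdesc_t.  */
enum toy_attribute { TOY_POINTER, TOY_ALLOCATABLE, TOY_OTHER };

#define TOY_MAX_RANK 15		/* Fortran 2008 and later allow rank 15.  */

struct toy_dim
{
  ptrdiff_t lower_bound;
  ptrdiff_t extent;
  ptrdiff_t sm;			/* Stride multiplier in bytes.  */
};

struct toy_desc
{
  size_t elem_len;
  int rank;
  enum toy_attribute attribute;
  struct toy_dim dim[TOY_MAX_RANK];
};

/* Describe a contiguous array.  The lower bound is 0 for every
   attribute; it only becomes nonzero later through argument
   association, allocation or pointer association.  */
static void
toy_establish (struct toy_desc *d, enum toy_attribute attr,
	       size_t elem_len, int rank, const ptrdiff_t *extents)
{
  assert (rank >= 0 && rank <= TOY_MAX_RANK);
  d->elem_len = elem_len;
  d->rank = rank;
  d->attribute = attr;
  for (int i = 0; i < rank; i++)
    {
      d->dim[i].lower_bound = 0;
      d->dim[i].extent = extents[i];
      /* Contiguous layout: each stride is the previous stride times
	 the previous extent.  */
      d->dim[i].sm = (i == 0
		      ? (ptrdiff_t) elem_len
		      : d->dim[i - 1].sm * d->dim[i - 1].extent);
    }
}
```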


Built on x86_64-gnu-linux.
OK for the trunk and GCC-9?

Tobias

2019-11-12  Tobias Burnus  

	libgfortran/
	PR fortran/92470
	* runtime/ISO_Fortran_binding.c (CFI_establish): Set lower_bound to 0
	also for CFI_attribute_other.

	gcc/testsuite/
	PR fortran/92470
	* gfortran.dg/ISO_Fortran_binding_1.c (establish_c): Add assert for
	lower_bound == 0.

diff --git a/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_1.c b/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_1.c
index 091e754d8f9..a5714593c52 100644
--- a/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_1.c
+++ b/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_1.c
@@ -109,6 +109,7 @@ int establish_c(CFI_cdesc_t * desc)
 		  CFI_attribute_pointer,
 		  CFI_type_struct,
 		  sizeof(t), 1, extent);
+  assert (desc->dim[0].lower_bound == 0);
   for (idx[0] = 0; idx[0] < extent[0]; idx[0]++)
 {
   res_addr = (t*)CFI_address (desc, idx);
diff --git a/libgfortran/runtime/ISO_Fortran_binding.c b/libgfortran/runtime/ISO_Fortran_binding.c
index 7ae2a9351da..91d9ae46d3d 100644
--- a/libgfortran/runtime/ISO_Fortran_binding.c
+++ b/libgfortran/runtime/ISO_Fortran_binding.c
@@ -387,13 +387,7 @@ int CFI_establish (CFI_cdesc_t *dv, void *base_addr, CFI_attribute_t attribute,
 
   for (int i = 0; i < rank; i++)
 	{
-	  /* If the C Descriptor is for a pointer then the lower bounds of every
-	   * dimension are set to zero. */
-	  if (attribute == CFI_attribute_pointer)
-	dv->dim[i].lower_bound = 0;
-	  else
-	dv->dim[i].lower_bound = 1;
-
+	  dv->dim[i].lower_bound = 0;
 	  dv->dim[i].extent = extents[i];
 	  if (i == 0)
 	dv->dim[i].sm = dv->elem_len;


Re: [PATCH] Set AVX128_OPTIMAL for all avx targets.

2019-11-12 Thread H.J. Lu
On Tue, Nov 12, 2019 at 2:48 AM Hongtao Liu  wrote:
>
> On Tue, Nov 12, 2019 at 4:41 PM Richard Biener
>  wrote:
> >
> > On Tue, Nov 12, 2019 at 9:29 AM Hongtao Liu  wrote:
> > >
> > > On Tue, Nov 12, 2019 at 4:19 PM Richard Biener
> > >  wrote:
> > > >
> > > > On Tue, Nov 12, 2019 at 8:36 AM Hongtao Liu  wrote:
> > > > >
> > > > > Hi:
> > > > >   This patch sets X86_TUNE_AVX128_OPTIMAL by default for all AVX
> > > > > targets because we found there is still a performance gap between
> > > > > 128-bit auto-vectorization and 256-bit auto-vectorization, even with
> > > > > the epilogue vectorized.
> > > > >   The performance impact of setting avx128_optimal by default on
> > > > > SPEC2017 with options "-march=native -funroll-loops -Ofast -flto" on
> > > > > CLX is as below:
> > > > >
> > > > > INT rate
> > > > > 500.perlbench_r     -0.32%
> > > > > 502.gcc_r           -1.32%
> > > > > 505.mcf_r           -0.12%
> > > > > 520.omnetpp_r       -0.34%
> > > > > 523.xalancbmk_r     -0.65%
> > > > > 525.x264_r           2.23%
> > > > > 531.deepsjeng_r      0.81%
> > > > > 541.leela_r         -0.02%
> > > > > 548.exchange2_r     10.89%  --> big improvement
> > > > > 557.xz_r             0.38%
> > > > > geomean for intrate  1.10%
> > > > >
> > > > > FP rate
> > > > > 503.bwaves_r         1.41%
> > > > > 507.cactuBSSN_r     -0.14%
> > > > > 508.namd_r           1.54%
> > > > > 510.parest_r        -0.87%
> > > > > 511.povray_r         0.28%
> > > > > 519.lbm_r            0.32%
> > > > > 521.wrf_r           -0.54%
> > > > > 526.blender_r        0.59%
> > > > > 527.cam4_r          -2.70%
> > > > > 538.imagick_r        3.92%
> > > > > 544.nab_r            0.59%
> > > > > 549.fotonik3d_r     -5.44%  -> regression
> > > > > 554.roms_r          -2.34%
> > > > > geomean for fprate  -0.28%
> > > > >
> > > > > The 10% improvement in 548.exchange2_r is because there is a 9-level
> > > > > nested loop whose innermost trip count is small (enough for 128-bit
> > > > > vectorization, but not for 256-bit vectorization).  Since the trip
> > > > > count cannot be determined statically, the vectorizer chooses 256-bit
> > > > > vectorization, whose vector loop is never actually entered.
> > > > > Vectorizing the epilogue introduces some extra instructions; normally
> > > > > that recovers some performance, but in a 9-level nested loop the cost
> > > > > of the extra instructions outweighs the gain.
> > > > >
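The small-trip-count effect can be worked through with a short sketch. Assuming 32-bit elements (an assumption; the element type in 548.exchange2_r is not stated above) and ignoring alignment peeling, a 128-bit vector holds 4 lanes and a 256-bit vector holds 8, so an illustrative loop of 6 iterations runs one vector iteration plus a 2-element epilogue at 128 bits, but zero vector iterations at 256 bits, leaving all the work to the epilogue.

```c
#include <assert.h>

/* Split a loop of N iterations into full vector iterations and a
   leftover epilogue, for vectors of VEC_BITS bits holding elements
   of ELEM_BITS bits each.  Alignment peeling is ignored.  */
struct split
{
  int lanes;		/* Elements per vector.  */
  int vector_iters;	/* Iterations of the main vector loop.  */
  int epilogue_iters;	/* Elements handled after the vector loop.  */
};

static struct split
split_loop (int n, int vec_bits, int elem_bits)
{
  struct split s;
  s.lanes = vec_bits / elem_bits;
  s.vector_iters = n / s.lanes;
  s.epilogue_iters = n % s.lanes;
  return s;
}
```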
> > > > > The 5.44% regression in 549.fotonik3d_r is because 256-bit
> > > > > vectorization is better than 128-bit vectorization there.  Generally,
> > > > > enabling 256-bit or 512-bit vectorization reduces the number of clock
> > > > > ticks spent executing instructions, but also lowers the core frequency.
> > > > > When the frequency reduction is smaller than the clock-tick reduction,
> > > > > the wider vector width is better than the narrower one; otherwise the
> > > > > opposite holds.  The regression in 549.fotonik3d_r is due to this, and
> > > > > the same is true for 554.roms_r and 527.cam4_r; for those 3 benchmarks,
> > > > > 512-bit vectorization is best.
> > > > >
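The trade-off described above is just runtime = cycles / frequency. A sketch with made-up ratios (not measurements from this thread): if a wider vector cuts the cycle count to 80% but drops the clock to 90%, the run takes 0.8 / 0.9, about 0.89 of the original time, a win; if cycles only fall to 95% with the same 90% clock, it takes about 1.06 of the original time, a loss.

```c
#include <assert.h>

/* Runtime of the wide-vector build relative to the narrow one.
   CYCLE_RATIO = cycles (wide) / cycles (narrow)
   FREQ_RATIO  = frequency (wide) / frequency (narrow)
   Since time = cycles / frequency, the time ratio is their quotient.  */
static double
relative_runtime (double cycle_ratio, double freq_ratio)
{
  return cycle_ratio / freq_ratio;
}

/* The wider vector pays off exactly when the cycle reduction is larger
   than the frequency reduction, i.e. the time ratio drops below 1.  */
static int
wider_vector_wins (double cycle_ratio, double freq_ratio)
{
  return relative_runtime (cycle_ratio, freq_ratio) < 1.0;
}
```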
> > > > > Bootstrap and regression test on i386 passed.
> > > > > Ok for trunk?
> > > >
> > > > I don't think 128_optimal does what you think it does.  If you want to
> > > > prefer 128bit AVX adjust the preference, but 128_optimal describes
> > > > a microarchitectural detail (AVX256 ops are split into two AVX128 ops)
> > > But it will set target_prefer_avx128 by default.
> > > 
> > >   /* Enable 128-bit AVX instruction generation
> > >      for the auto-vectorizer.  */
> > >   if (TARGET_AVX128_OPTIMAL
> > >       && (opts_set->x_prefer_vector_width_type == PVW_NONE))
> > >     opts->x_prefer_vector_width_type = PVW_AVX128;
> > > -
> > > And it may be too confusing to add another tuning flag.
> >
> > Well, it's confusing to mix two things - defaulting the vector width preference
> > and the architectural detail of Bulldozer and early Zen parts.  So please split
> > the tuning.  And then re-benchmark with _just_ changing the preference
> Actually, the results are similar; I've tested both (the patch using
> avx128_optimal, and trunk GCC with an additional
> -mprefer-vector-width=128).
> And I will run a test to see the effect of FDO.

It is hard to tell if 128-bit vector size or 256-bit vector size works better.
For SPEC CPU 2017, 128-bit vector size gives better overall scores.
One can always change vector size, even to 512-bit, as some workloads
are faster with 512-bit vector size.

-- 
H.J.


Re: [PATCH 0/6] Implement asm flag outputs for arm + aarch64

2019-11-12 Thread Richard Sandiford
Richard Henderson  writes:
> I've put the implementation into config/arm/aarch-common.c, so
> that it can be shared between the two targets.  This required
> a little bit of cleanup to the CC modes and constraints to get
> the two targets to match up.
>
> I really should have done more than just x86 years ago, so that
> it would be done now and I could just use it in the kernel...  ;-)

Thanks for doing this, looks great.

Apart from the vc/vs thing you mentioned in the follow-up for 4/6,
it looks like 4/6, 5/6 and 6/6 are missing "hs" and "lo".  OK for
aarch64 with those added.

Richard


Re: [PATCH] include size and offset in -Wstringop-overflow

2019-11-12 Thread Martin Sebor

On 11/12/19 10:54 AM, Jeff Law wrote:

On 11/12/19 1:15 AM, Richard Biener wrote:

On Tue, Nov 12, 2019 at 6:10 AM Jeff Law  wrote:


On 11/6/19 3:34 PM, Martin Sebor wrote:

On 11/6/19 2:06 PM, Martin Sebor wrote:

On 11/6/19 1:39 PM, Jeff Law wrote:

On 11/6/19 1:27 PM, Martin Sebor wrote:

On 11/6/19 11:55 AM, Jeff Law wrote:

On 11/6/19 11:00 AM, Martin Sebor wrote:

The -Wstringop-overflow warnings for single-byte and multi-byte
stores mention the amount of data being stored and the amount of
space remaining in the destination, such as:

warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=]

 123 |   *p = 0;
 |   ~~~^~~
note: destination object declared here
  45 |   char b[N];
 |^

A warning like this can take some time to analyze.  First, the size
of the destination isn't mentioned and may not be easy to tell from
the sources.  In the note above, when N's value is the result of
some non-trivial computation, chasing it down may be a small project
in and of itself.  Second, it's also not clear why the region size
is zero.  It could be because the offset is exactly N, or because
it's negative, or because it's in some range greater than N.
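The three cases can be made concrete with a small sketch. The helpers below are hypothetical (they are not the strlen pass code): they compute the space left after a constant offset, showing that an offset of exactly N, a negative offset, and an offset past N all report the same region size of 0, while the size/offset pair tells them apart.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: bytes remaining in an object of SIZE bytes
   when writing at byte offset OFF.  Offsets outside [0, SIZE) leave
   no room at all.  */
static long
remaining_space (long size, long off)
{
  if (off < 0 || off >= size)
    return 0;
  return size - off;
}

/* Report a write the way the enhanced note does, mentioning both the
   offset and the object size rather than only the remaining space.  */
static void
explain (const char *name, long size, long off)
{
  printf ("at offset %ld to object '%s' with size %ld: "
	  "%ld bytes remaining\n",
	  off, name, size, remaining_space (size, off));
}
```

For example, `explain ("b", 8, 11)` prints `at offset 11 to object 'b' with size 8: 0 bytes remaining`.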

Mentioning both the size of the destination object and the offset
makes the existing messages clearer, and will become essential when
GCC starts diagnosing overflow into allocated buffers (as my
follow-on patch does).

The attached patch enhances -Wstringop-overflow to do this by
letting compute_objsize return the offset to its caller, doing
something similar in get_stridx, and adding a new function to
the strlen pass to issue this enhanced warning (eventually, I'd
like the function to replace the -Wstringop-overflow handler in
builtins.c).  With the change, the note above might read something
like:

note: at offset 11 to object ‘b’ with size 8 declared here
  45 |   char b[N];
 |^

Tested on x86_64-linux.

Martin

gcc-store-offset.diff

gcc/ChangeLog:

  * builtins.c (compute_objsize): Add an argument and set it to offset
  into destination.
  * builtins.h (compute_objsize): Add an argument.
  * tree-object-size.c (addr_object_size): Add an argument and set it
  to offset into destination.
  (compute_builtin_object_size): Same.
  * tree-object-size.h (compute_builtin_object_size): Add an argument.
  * tree-ssa-strlen.c (get_addr_stridx): Add an argument and set it
  to offset into destination.
  (maybe_warn_overflow): New function.
  (handle_store): Call maybe_warn_overflow to issue warnings.

gcc/testsuite/ChangeLog:

  * c-c++-common/Wstringop-overflow-2.c: Adjust text of expected messages.
  * g++.dg/warn/Wstringop-overflow-3.C: Same.
  * gcc.dg/Wstringop-overflow-17.c: Same.




Index: gcc/tree-ssa-strlen.c
===
--- gcc/tree-ssa-strlen.c(revision 277886)
+++ gcc/tree-ssa-strlen.c(working copy)
@@ -189,6 +189,52 @@ struct laststmt_struct
static int get_stridx_plus_constant (strinfo *, unsigned HOST_WIDE_INT, tree);
static void handle_builtin_stxncpy (built_in_function, gimple_stmt_iterator *);
+/* Sets MINMAX to either the constant value or the range VAL is in
+   and returns true on success.  */
+
+static bool
+get_range (tree val, wide_int minmax[2], const vr_values *rvals = NULL)
+{
+  if (tree_fits_uhwi_p (val))
+{
+  minmax[0] = minmax[1] = wi::to_wide (val);
+  return true;
+}
+
+  if (TREE_CODE (val) != SSA_NAME)
+return false;
+
+  if (rvals)
+{
+  gimple *def = SSA_NAME_DEF_STMT (val);
+  if (gimple_assign_single_p (def)
+  && gimple_assign_rhs_code (def) == INTEGER_CST)
+{
+  /* get_value_range returns [0, N] for constant assignments.  */
+  val = gimple_assign_rhs1 (def);
+  minmax[0] = minmax[1] = wi::to_wide (val);
+  return true;
+}

Umm, something seems really off with this hunk.  If the SSA_NAME is set
via a simple constant assignment, then the range ought to be a singleton,
i.e. [CONST,CONST].  Is there a particular test where this is not true?

The only way offhand I could see this happening is if originally the RHS
wasn't a constant, but due to optimizations it either simplified into a
constant or a constant was propagated into an SSA_NAME appearing on the
RHS.  This would have to happen between the last range analysis and the
point where you're making this query.


Yes, I think that's right.  Here's an example where it happens:

void f (void)
{
  char s[] = "1234";
  unsigned n = strlen (s);
  char vla[n];   // or malloc (n)
  vla[n] = 0;// n = [4, 4]
  ...
}

The strlen call is folded to 4 but that's not propagated to
the access until sometime after the strlen pass is done.

Hmm.  Are we calling set_range_info in that case?  That goes behind the
back of pass instance of vr_values.  If so, that might argue we want to
be setting it in vr_values 

handle VIEW_CONVERT_EXPR in debug_node

2019-11-12 Thread Ulrich Drepper
I am using debug_node() to emit the tree of functions for later
processing.  For this I need all the information to be present.  So far
I came across one expression type that isn't handled correctly.  For
VIEW_CONVERT_EXPR only the type value is printed, not the first tree
operand.  The following patch fixes this.

OK?

2019-11-12  Ulrich Drepper  

* tree-dump.c (dequeue_and_dump): Print first tree operand
for VIEW_CONVERT_EXPR.

diff --git a/gcc/tree-dump.c b/gcc/tree-dump.c
index 51c0965861f..83eb29b7e2b 100644
--- a/gcc/tree-dump.c
+++ b/gcc/tree-dump.c
@@ -561,6 +561,7 @@ dequeue_and_dump (dump_info_p di)
 case ADDR_EXPR:
 case INDIRECT_REF:
 case CLEANUP_POINT_EXPR:
+case VIEW_CONVERT_EXPR:
 case SAVE_EXPR:
 case REALPART_EXPR:
 case IMAGPART_EXPR:





Fix ICE in ipa-cp when mixing -O0 and -O2 code in LTO

2019-11-12 Thread Jan Hubicka
Hi,
this fixes second ICE seen during profiledbootstrap with Ada enabled.
The problem is that Ada builds some object files with -O2 -O0 (not sure
why) and these functions get flag_ipa_cp but !optimize.   Because of -O0
ipa-cp pass is not run at compile time and summaries are never produced.
So In addition to !flag_ipa_cp functions we also need to punt on
!optimize.  This is something we may want to clean up next stage1.

Bootstrapped/regtesetd x86_64-linux, comited

* ipa-cp.c (ignore_edge_p): Also look for optimize flag.
(ipcp_verify_propagated_values): Likewise.
(propagate_constants_across_call): Likewise.
(propagate_constants_topo): Likewise.
(ipcp_propagate_stage): Likewise.
Index: ipa-cp.c
===
--- ipa-cp.c(revision 278094)
+++ ipa-cp.c(working copy)
@@ -816,6 +816,8 @@ ignore_edge_p (cgraph_edge *e)
 = e->callee->function_or_virtual_thunk_symbol (, e->caller);
 
   return (avail <= AVAIL_INTERPOSABLE
+ || !opt_for_fn (e->caller->decl, optimize)
+ || !opt_for_fn (ultimate_target->decl, optimize)
  || !opt_for_fn (e->caller->decl, flag_ipa_cp)
  || !opt_for_fn (ultimate_target->decl, flag_ipa_cp));
 }
@@ -1471,7 +1473,8 @@ ipcp_verify_propagated_values (void)
   FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
 {
   class ipa_node_params *info = IPA_NODE_REF (node);
-  if (!opt_for_fn (node->decl, flag_ipa_cp))
+  if (!opt_for_fn (node->decl, flag_ipa_cp)
+ || !opt_for_fn (node->decl, optimize))
continue;
   int i, count = ipa_get_param_count (info);
 
@@ -2316,7 +2319,9 @@ propagate_constants_across_call (struct
   parms_count = ipa_get_param_count (callee_info);
   if (parms_count == 0)
 return false;
-  if (!args)
+  if (!args
+  || !opt_for_fn (cs->caller->decl, flag_ipa_cp)
+  || !opt_for_fn (cs->caller->decl, optimize))
 {
   for (i = 0; i < parms_count; i++)
ret |= set_all_contains_variable (ipa_get_parm_lattices (callee_info,
@@ -3238,7 +3243,8 @@ propagate_constants_topo (class ipa_topo
   FOR_EACH_VEC_ELT (cycle_nodes, j, v)
if (v->has_gimple_body_p ())
  {
-   if (opt_for_fn (v->decl, flag_ipa_cp))
+   if (opt_for_fn (v->decl, flag_ipa_cp)
+   && opt_for_fn (v->decl, optimize))
  push_node_to_stack (topo, v);
/* When V is not optimized, we can not push it to stac, but
   still we need to set all its callees lattices to bottom.  */
@@ -3269,7 +3275,8 @@ propagate_constants_topo (class ipa_topo
 their topological sort.  */
   FOR_EACH_VEC_ELT (cycle_nodes, j, v)
if (v->has_gimple_body_p ()
-   && opt_for_fn (v->decl, flag_ipa_cp))
+   && opt_for_fn (v->decl, flag_ipa_cp)
+   && opt_for_fn (v->decl, optimize))
  {
struct cgraph_edge *cs;
 
@@ -3348,7 +3355,9 @@ ipcp_propagate_stage (class ipa_topo_inf
 
   FOR_EACH_DEFINED_FUNCTION (node)
   {
-if (node->has_gimple_body_p () && opt_for_fn (node->decl, flag_ipa_cp))
+if (node->has_gimple_body_p ()
+   && opt_for_fn (node->decl, flag_ipa_cp)
+   && opt_for_fn (node->decl, optimize))
   {
 class ipa_node_params *info = IPA_NODE_REF (node);
 determine_versionability (node, info);
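The condition this patch adds throughout ipa-cp.c can be summarised as a predicate over per-function options. The sketch below uses a hypothetical `fn_opts` struct in place of the opt_for_fn lookups and leaves out the availability check; it shows why a function compiled with -O2 -O0 (flag_ipa_cp still set, but optimize == 0) must now be treated like one compiled with -fno-ipa-cp.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-function option snapshot, standing in for
   opt_for_fn (decl, optimize) and opt_for_fn (decl, flag_ipa_cp).  */
struct fn_opts
{
  int optimize;			/* 0 for -O0.  */
  bool flag_ipa_cp;
};

/* IPA-CP summaries exist only for functions that were optimized with
   ipa-cp enabled, so both options must hold.  */
static bool
ipa_cp_usable (const struct fn_opts *f)
{
  return f->optimize && f->flag_ipa_cp;
}

/* Model of the patched ignore_edge_p, minus the availability test:
   ignore the edge if either end has no summaries.  */
static bool
ignore_edge (const struct fn_opts *caller, const struct fn_opts *callee)
{
  return !ipa_cp_usable (caller) || !ipa_cp_usable (callee);
}
```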


[PATCH] contrib/download_prerequisites: Use http instead of ftp

2019-11-12 Thread Janne Blomqvist
Convert the download_prerequisites script to use http instead of
ftp. This works better with firewalls, proxies, and so on. It's also
faster; a quick test on my system before the patch:

time contrib/download_prerequisites --directory=/tmp/foo --force
...
real    0m17,843s

After patch:

time contrib/download_prerequisites --directory=/tmp/foo --force
...
real    0m11,059s

(fastest of three runs)

Question: Should we in fact use https? I haven't used it, since
download_prerequisites checks that the sha512/md5 checksums match the
ones in the GCC tree, but maybe there are other reasons to? Even https
is in fact faster than ftp:

time contrib/download_prerequisites --directory=/tmp/foo --force
...
real    0m12,729s
---
 contrib/download_prerequisites | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/contrib/download_prerequisites b/contrib/download_prerequisites
index 72976c46c92..aa0356e6266 100755
--- a/contrib/download_prerequisites
+++ b/contrib/download_prerequisites
@@ -32,7 +32,7 @@ mpfr='mpfr-3.1.4.tar.bz2'
 mpc='mpc-1.0.3.tar.gz'
 isl='isl-0.18.tar.bz2'
 
-base_url='ftp://gcc.gnu.org/pub/gcc/infrastructure/'
+base_url='http://gcc.gnu.org/pub/gcc/infrastructure/'
 
 echo_archives() {
 echo "${gmp}"
@@ -58,7 +58,7 @@ esac
 if type wget > /dev/null ; then
   fetch='wget'
 else
-  fetch='curl -LO -u anonymous:'
+  fetch='curl -LO'
 fi
 chksum_extension='sha512'
 directory='.'
-- 
2.17.1



Watch for missing summaries in ipa-profile

2019-11-12 Thread Jan Hubicka
Hi,
this patch fixes an ICE when the callee summary is missing in ipa-profile
(and fixes a previously latent bug in the handling of thunks).

Bootstrapped/regtested x86_64-linux, committed.

Honza

PR ipa/92471
* ipa-profile.c (check_argument_count): Break out from ...;
watch for missing summaries.
(ipa_profile): Here.
Index: ipa-profile.c
===
--- ipa-profile.c   (revision 278094)
+++ ipa-profile.c   (working copy)
@@ -476,6 +476,27 @@ ipa_propagate_frequency (struct cgraph_n
   return changed;
 }
 
+/* Check that number of arguments of N agrees with E.
+   Be conservative when summaries are not present.  */
+
+static bool
+check_argument_count (struct cgraph_node *n, struct cgraph_edge *e)
+{
+  if (!ipa_node_params_sum || !ipa_edge_args_sum)
+return true;
+  class ipa_node_params *info = IPA_NODE_REF (n->function_symbol ());
+  if (!info)
+return true;
+  ipa_edge_args *e_info = IPA_EDGE_REF (e);
+  if (!e)
+return true;
+  if (ipa_get_param_count (info) != ipa_get_cs_argument_count (e_info)
+  && (ipa_get_param_count (info) >= ipa_get_cs_argument_count (e_info)
+ || !stdarg_p (TREE_TYPE (n->decl
+return false;
+  return true;
+}
+
 /* Simple ipa profile pass propagating frequencies across the callgraph.  */
 
 static unsigned int
@@ -599,14 +620,7 @@ ipa_profile (void)
 "Not speculating: target is overwritable "
 "and can be discarded.\n");
}
- else if (ipa_node_params_sum && ipa_edge_args_sum
-  && (!vec_safe_is_empty
-  (IPA_NODE_REF (n2)->descriptors))
-  && ipa_get_param_count (IPA_NODE_REF (n2))
- != ipa_get_cs_argument_count (IPA_EDGE_REF (e))
-   && (ipa_get_param_count (IPA_NODE_REF (n2))
-   >= ipa_get_cs_argument_count (IPA_EDGE_REF (e))
-   || !stdarg_p (TREE_TYPE (n2->decl
+ else if (check_argument_count (n2, e))
{
  nmismatch++;
  if (dump_file)
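Stripped of the summary lookups, the arity test in check_argument_count reduces to the rule below: a call matches if it passes exactly as many arguments as the callee has parameters, or if the callee is variadic and the call passes more. This is a hypothetical standalone model, with plain integers in place of ipa_get_param_count and ipa_get_cs_argument_count and a flag in place of stdarg_p; like check_argument_count, it returns true when the counts are compatible.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the arity test.
   PARAMS:   number of formal parameters of the callee.
   ARGS:     number of actual arguments at the call site.
   VARIADIC: whether the callee's type is stdarg (ends in "...").  */
static bool
argument_count_ok (int params, int args, bool variadic)
{
  if (params == args)
    return true;
  /* A variadic callee may receive extra arguments, but never fewer
     than its named parameters.  */
  if (variadic && params < args)
    return true;
  return false;
}
```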


Re: [PATCH 5/7] Remove last leftover usage of params* files.

2019-11-12 Thread Martin Liška

On 11/12/19 4:39 PM, Harwath, Frederik wrote:

Hi Martin,

On 06.11.19 13:40, Martin Liska wrote:


(finalize_options_struct): Remove.


This patch has been committed by now, but it seems that a single use of 
finalize_options_struct has been overlooked
in gcc/tree-streamer-in.c.


Thank you for the heads-up. I'll fix it tomorrow.

Martin



Best regards,
Frederik





Re: Teach ipa-cp to propagate value ranges over binary operations too

2019-11-12 Thread Jan Hubicka
> 
> well, it all relies on the simple fact that arithmetic jump function
> discovery does not cross NOP_EXPRs, or any chain of assignments, so they
> are only constructed from things like
> 
>   _2 = param_2(D) + 4;
>   bar (_2);
> 
> but not from
> 
>   _1 = (NOP_EXPR) param_2(D);
>   _2 =  _1 + 4;
>   bar (_2);
> 
> I understand that this is getting ever more fragile and that we may want
> to remove this limitation.  But there is of course the tradeoff of how
> many types we want to stream.  But currently there is no type2 or type3.

I see, so type2==type3==type1 and we need only one type conversion which
I do.

Note that since all types (modulo variadic ones, which tend to be
horribly broken with LTO) go into the global stream, streaming an extra
type basically costs you a pointer in memory and one index in the stream.

Honza
> 
> >> 
> >> Also I noticed that we use NOP_EXPR to convert from type1 all the way to type4
> >> while ipa-fnsummary uses VIEW_CONVERT_EXPR to convert type3 to type4 that seems
> >> more valid here.
> 
> Perhaps.
> 
> >> However, VR folders always return varying on VIEW_CONVERT_EXPR
> >> (which is probably something that can be fixed)
> >> 
> >> Bootstrapped/regtested x86_64-linux. Does this look OK?
> 
> Yes, it does.
> 
> >> 
> >> Honza
> >>* ipa-cp.c (propagate_vr_across_jump_function): Also propagate
> >>binary operations.
> >> 
> >> Index: ipa-cp.c
> >> ===
> >> --- ipa-cp.c   (revision 278094)
> >> +++ ipa-cp.c   (working copy)
> >> @@ -1974,23 +2039,51 @@ propagate_vr_across_jump_function (cgrap
> >>if (jfunc->type == IPA_JF_PASS_THROUGH)
> >>  {
> >>enum tree_code operation = ipa_get_jf_pass_through_operation (jfunc);
> >> +  class ipa_node_params *caller_info = IPA_NODE_REF (cs->caller);
> >> +  int src_idx = ipa_get_jf_pass_through_formal_id (jfunc);
> >> +  class ipcp_param_lattices *src_lats
> >> +  = ipa_get_parm_lattices (caller_info, src_idx);
> >> +  tree operand_type = ipa_get_type (caller_info, src_idx);
> >>  
> >> +  if (src_lats->m_value_range.bottom_p ())
> >> +  return dest_lat->set_to_bottom ();
> >> +
> >> +  value_range vr;
> >>if (TREE_CODE_CLASS (operation) == tcc_unary)
> >>{
> >> -class ipa_node_params *caller_info = IPA_NODE_REF (cs->caller);
> >> -int src_idx = ipa_get_jf_pass_through_formal_id (jfunc);
> >> -tree operand_type = ipa_get_type (caller_info, src_idx);
> >> -class ipcp_param_lattices *src_lats
> >> -  = ipa_get_parm_lattices (caller_info, src_idx);
> >> -
> >> -if (src_lats->m_value_range.bottom_p ())
> >> -  return dest_lat->set_to_bottom ();
> >> -value_range vr;
> >> -if (ipa_vr_operation_and_type_effects (,
> >> -   _lats->m_value_range.m_vr,
> >> -   operation, param_type,
> >> -   operand_type))
> >> -  return dest_lat->meet_with ();
> >> +ipa_vr_operation_and_type_effects (,
> >> +   _lats->m_value_range.m_vr,
> >> +   operation, param_type,
> >> +   operand_type);
> >> +  }
> >> +  /* A crude way to prevent unbounded number of value range updates
> >> +   in SCC components.  We should allow limited number of updates within
> >> +   SCC, too.  */
> >> +  else if (!ipa_edge_within_scc (cs))
> >> +  {
> >> +tree op = ipa_get_jf_pass_through_operand (jfunc);
> >> +value_range op_vr (op, op);
> >> +value_range op_res,res;
> >> +
> >
> > Do we really know operation is tcc_binary here?
> 
> it is either NOP_EXPR (which in jump functions is basically a marker
> that there really is no operation going on), or something coming from
> GIMPLE_UNARY_RHS - which hopefully is tcc_unary and here we can
> conveniently share the same path for NOP_EXPR too - or from
> GIMPLE_BINARY_RHS, which hopefully is tcc_binary.  But an assert may be
> reasonable here.
> 
> >
> >> +range_fold_binary_expr (_res, operation, operand_type,
> >> +_lats->m_value_range.m_vr, _vr);
> >> +ipa_vr_operation_and_type_effects (,
> >> +   _res,
> >> +   NOP_EXPR, param_type,
> >> +   operand_type);
> 
> Martin
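For the `_2 = param_2(D) + 4` case quoted above, propagating a value range across the binary operation amounts to folding the operation over the caller's range and the constant operand's singleton range (op_vr in the patch). A minimal sketch for adding a constant, with a hypothetical `irange` struct standing in for value_range; mapping possible overflow to "varying" is a simplifying assumption here, not how range_fold_binary_expr treats wrapping types.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical closed interval [lo, hi]; VARYING means nothing is
   known, playing the role of a varying value_range.  */
struct irange
{
  bool varying;
  long lo, hi;
};

/* Fold SRC + [C, C].  If either bound could overflow, give up and
   return VARYING rather than model wrap-around.  */
static struct irange
range_add_const (struct irange src, long c)
{
  struct irange res = { true, LONG_MIN, LONG_MAX };
  if (src.varying)
    return res;
  if ((c > 0 && src.hi > LONG_MAX - c)
      || (c < 0 && src.lo < LONG_MIN - c))
    return res;
  res.varying = false;
  res.lo = src.lo + c;
  res.hi = src.hi + c;
  return res;
}
```

So a caller range of [1, 10] for param_2 becomes [5, 14] for the argument passed to bar.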


Re: [PATCH] implement -Wrestrict for sprintf (PR 83688)

2019-11-12 Thread Martin Sebor

On 11/12/19 10:22 AM, Martin Sebor wrote:


Committed in r278098.


I thought I'd tested the kernel with the patch before and got
no warnings, so having rebuilt it again just now I'm surprised
to see the 16 instances below (7 of which are distinct).  I'm
happy to report that none looks like a false positive.

Martin

drivers/input/joystick/analog.c:428:3: warning: ‘snprintf’ argument 4 overlaps destination object ‘’ [-Wrestrict]

  428 |   snprintf(analog->name, sizeof(analog->name), "%s %d-hat",
  |   ^
  429 | analog->name, hweight16(analog->mask & ANALOG_HATS_ALL));
--
drivers/leds/led-class-flash.c:212:9: warning: ‘sprintf’ argument 3 overlaps destination object ‘buf’ [-Wrestrict]

  212 |  return sprintf(buf, "%s\n", buf);
  | ^
drivers/leds/led-class-flash.c:189:40: note: destination object referenced by ‘restrict’-qualified argument 1 was declared here

--
drivers/thunderbolt/xdomain.c:656:9: warning: ‘sprintf’ argument 3 overlaps destination object ‘buf’ [-Wrestrict]

  656 |  return sprintf(buf, "%s\n", buf);
  | ^
drivers/thunderbolt/xdomain.c:650:15: note: destination object referenced by ‘restrict’-qualified argument 1 was declared here

--
./drivers/staging/rtl8723bs/include/osdep_service.h:267:48: warning: ‘snprintf’ argument 4 overlaps destination object ‘thread_name’ [-Wrestrict]
  267 | #define rtw_sprintf(buf, size, format, arg...) snprintf(buf, size, format, ##arg)
  | ^~
drivers/staging/rtl8723bs/hal/rtl8723bs_xmit.c:486:2: note: in expansion 
of macro ‘rtw_sprintf’

--
drivers/net/wireless/ti/wl18xx/../wlcore/debugfs.h:86:9: warning: 
‘snprintf’ argument 4 overlaps destination object ‘buf’ [-Wrestrict]

   86 |   res = snprintf(buf, sizeof(buf), "%s[%d] = %d\n", \
  | ^
   87 |   buf, i, stats->sub.name[i]);  \
--
drivers/net/wireless/ti/wl18xx/../wlcore/debugfs.h:86:9: warning: 
‘snprintf’ argument 4 overlaps destination object ‘buf’ [-Wrestrict]

   86 |   res = snprintf(buf, sizeof(buf), "%s[%d] = %d\n", \
  | ^
   87 |   buf, i, stats->sub.name[i]);  \
--
drivers/net/wireless/ti/wl18xx/../wlcore/debugfs.h:86:9: warning: 
‘snprintf’ argument 4 overlaps destination object ‘buf’ [-Wrestrict]

   86 |   res = snprintf(buf, sizeof(buf), "%s[%d] = %d\n", \
  | ^
   87 |   buf, i, stats->sub.name[i]);  \
--
drivers/net/wireless/ti/wl18xx/../wlcore/debugfs.h:86:9: warning: 
‘snprintf’ argument 4 overlaps destination object ‘buf’ [-Wrestrict]

   86 |   res = snprintf(buf, sizeof(buf), "%s[%d] = %d\n", \
  | ^
   87 |   buf, i, stats->sub.name[i]);  \
--
drivers/net/wireless/ti/wl18xx/../wlcore/debugfs.h:86:9: warning: 
‘snprintf’ argument 4 overlaps destination object ‘buf’ [-Wrestrict]

   86 |   res = snprintf(buf, sizeof(buf), "%s[%d] = %d\n", \
  | ^
   87 |   buf, i, stats->sub.name[i]);  \
--
drivers/net/wireless/ti/wl18xx/../wlcore/debugfs.h:86:9: warning: 
‘snprintf’ argument 4 overlaps destination object ‘buf’ [-Wrestrict]

   86 |   res = snprintf(buf, sizeof(buf), "%s[%d] = %d\n", \
  | ^
   87 |   buf, i, stats->sub.name[i]);  \
--
drivers/net/wireless/ti/wl18xx/../wlcore/debugfs.h:86:9: warning: 
‘snprintf’ argument 4 overlaps destination object ‘buf’ [-Wrestrict]

   86 |   res = snprintf(buf, sizeof(buf), "%s[%d] = %d\n", \
  | ^
   87 |   buf, i, stats->sub.name[i]);  \
--
drivers/net/wireless/ti/wl18xx/../wlcore/debugfs.h:86:9: warning: 
‘snprintf’ argument 4 overlaps destination object ‘buf’ [-Wrestrict]

   86 |   res = snprintf(buf, sizeof(buf), "%s[%d] = %d\n", \
  | ^
   87 |   buf, i, stats->sub.name[i]);  \
--
drivers/net/wireless/ti/wl18xx/../wlcore/debugfs.h:86:9: warning: 
‘snprintf’ argument 4 overlaps destination object ‘buf’ [-Wrestrict]

   86 |   res = snprintf(buf, sizeof(buf), "%s[%d] = %d\n", \
  | ^
   87 |   buf, i, stats->sub.name[i]);  \
--
drivers/net/wireless/ti/wl18xx/../wlcore/debugfs.h:86:9: warning: 
‘snprintf’ argument 4 overlaps destination object ‘buf’ [-Wrestrict]

   86 |   res = snprintf(buf, sizeof(buf), "%s[%d] = %d\n", \
  | ^
   87 |   buf, i, stats->sub.name[i]);  \
--
drivers/net/wireless/ti/wlcore/boot.c:113:4: warning: ‘snprintf’ 
argument 4 overlaps destination object ‘min_fw_str’ [-Wrestrict]

  113 |
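The led-class-flash.c, xdomain.c and analog.c hits all pass the destination buffer again as a %s argument; C says that if copying takes place between overlapping objects in sprintf/snprintf, the behavior is undefined. A minimal sketch of one way to rewrite both shapes without the overlap. The helper names append_newline and append_hat_count are hypothetical (not kernel functions), and both assume the buffer already holds a NUL-terminated string:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical replacement for  return sprintf (buf, "%s\n", buf);
   BUF already holds the string, so append the newline in place instead
   of passing BUF as both destination and %s source.  Assumes BUF has
   room for strlen (buf) + 2 bytes.  */
static int
append_newline (char *buf)
{
  assert (buf != NULL);
  size_t len = strlen (buf);
  buf[len] = '\n';
  buf[len + 1] = '\0';
  return (int) (len + 1);
}

/* Hypothetical replacement for
     snprintf (name, size, "%s %d-hat", name, hats);
   Format only the suffix, writing just past the existing string, so no
   argument overlaps the destination.  Like snprintf, returns the length
   the full string would have had; returns -1 if strlen (name) >= size
   (no room even for the terminator).  */
static int
append_hat_count (char *name, size_t size, int hats)
{
  assert (name != NULL);
  size_t len = strlen (name);
  if (len >= size)
    return -1;
  int n = snprintf (name + len, size - len, " %d-hat", hats);
  return n < 0 ? n : (int) len + n;
}
```

Writing at `name + len` keeps snprintf's truncation semantics: with too little space the suffix is cut short, but the result is still NUL-terminated.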

Re: Teach ipa-cp to propagate value ranges over binary operations too

2019-11-12 Thread Martin Jambor
Hi,

On Tue, Nov 12 2019, Richard Biener wrote:
> On Tue, 12 Nov 2019, Jan Hubicka wrote:
>
>> Hi,
>> This patch adds propagation of value ranges through binary operations.
>> This is disabled for value ranges within an SCC to avoid an infinite loop during
>> propagation.  I am a bit worried about types here.  As far as I can tell we
>> have something like
>> 
>> VR in lattice of type1
>> foo (type1 param)
>> {
>>   bar ((type3)((type2)param+(type2)4))
>> }
>> bar (type4 param)
>> {
>>use param
>> }
>> 
>> Now in the code type1 is called "operand_type" and type4 is called param_type.
>> The arithmetic always happens in operand_type, but I do not see why these
>> necessarily have to be the same.  Anyway, this imitates what
>> constant jump functions do.

Well, it all relies on the simple fact that arithmetic jump function
discovery does not cross NOP_EXPRs, or any chain of assignments, so they
are only constructed from things like

  _2 = param_2(D) + 4;
  bar (_2);

but not from

  _1 = (NOP_EXPR) param_2(D);
  _2 =  _1 + 4;
  bar (_2);

I understand that this is getting ever more fragile and that we may want
to remove this limitation.  But there is of course the tradeoff of how
many types we want to stream.  But currently there is no type2 or type3.

>> 
>> Also I noticed that we use NOP_EXPR to convert from type1 all the way to type4,
>> while ipa-fnsummary uses VIEW_CONVERT_EXPR to convert type3 to type4, which seems
>> more valid here.

Perhaps.

>> However, the VR folders always return varying on VIEW_CONVERT_EXPR
>> (which is probably something that can be fixed).
>> 
>> Bootstrapped/regtested x86_64-linux. Does this look OK?

Yes, it does.

>> 
>> Honza
>>  * ipa-cp.c (propagate_vr_across_jump_function): Also propagate
>>  binary operations.
>> 
>> Index: ipa-cp.c
>> ===
>> --- ipa-cp.c (revision 278094)
>> +++ ipa-cp.c (working copy)
>> @@ -1974,23 +2039,51 @@ propagate_vr_across_jump_function (cgrap
>>if (jfunc->type == IPA_JF_PASS_THROUGH)
>>  {
>>enum tree_code operation = ipa_get_jf_pass_through_operation (jfunc);
>> +  class ipa_node_params *caller_info = IPA_NODE_REF (cs->caller);
>> +  int src_idx = ipa_get_jf_pass_through_formal_id (jfunc);
>> +  class ipcp_param_lattices *src_lats
>> += ipa_get_parm_lattices (caller_info, src_idx);
>> +  tree operand_type = ipa_get_type (caller_info, src_idx);
>>  
>> +  if (src_lats->m_value_range.bottom_p ())
>> +return dest_lat->set_to_bottom ();
>> +
>> +  value_range vr;
>>if (TREE_CODE_CLASS (operation) == tcc_unary)
>>  {
>> -  class ipa_node_params *caller_info = IPA_NODE_REF (cs->caller);
>> -  int src_idx = ipa_get_jf_pass_through_formal_id (jfunc);
>> -  tree operand_type = ipa_get_type (caller_info, src_idx);
>> -  class ipcp_param_lattices *src_lats
>> -= ipa_get_parm_lattices (caller_info, src_idx);
>> -
>> -  if (src_lats->m_value_range.bottom_p ())
>> -return dest_lat->set_to_bottom ();
>> -  value_range vr;
>> -  if (ipa_vr_operation_and_type_effects (&vr,
>> - &src_lats->m_value_range.m_vr,
>> - operation, param_type,
>> - operand_type))
>> -return dest_lat->meet_with (&vr);
>> +  ipa_vr_operation_and_type_effects (&vr,
>> + &src_lats->m_value_range.m_vr,
>> + operation, param_type,
>> + operand_type);
>> +}
>> +  /* A crude way to prevent unbounded number of value range updates
>> + in SCC components.  We should allow limited number of updates within
>> + SCC, too.  */
>> +  else if (!ipa_edge_within_scc (cs))
>> +{
>> +  tree op = ipa_get_jf_pass_through_operand (jfunc);
>> +  value_range op_vr (op, op);
>> +  value_range op_res,res;
>> +
>
> Do we really know operation is tcc_binary here?

It is either NOP_EXPR (which in jump functions is basically a marker
that there really is no operation going on), or something coming from
GIMPLE_UNARY_RHS - which hopefully is tcc_unary and here we can
conveniently share the same path for NOP_EXPR too - or from
GIMPLE_BINARY_RHS, which hopefully is tcc_binary.  But an assert may be
reasonable here.

>
>> +  range_fold_binary_expr (&op_res, operation, operand_type,
>> +  &src_lats->m_value_range.m_vr, &op_vr);
>> +  ipa_vr_operation_and_type_effects (&vr,
>> + &op_res,
>> + NOP_EXPR, param_type,
>> + operand_type);

Martin
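To make the binary-operation path concrete: for a call like bar (param + 4), the caller's range for param is folded through the operation in operand_type, and the result is then converted (the NOP_EXPR step) to the callee's param_type. A minimal sketch of that two-step computation on plain integer intervals, assuming a 32-bit signed operand_type and an 8-bit unsigned param_type. The interval struct and both helpers are hypothetical stand-ins, not GCC's value_range, range_fold_binary_expr or ipa_vr_operation_and_type_effects:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for a value range: the closed interval
   [min, max], or "varying" when nothing useful is known.  */
struct interval
{
  bool varying;
  long long min, max;
};

static const struct interval varying_range = { true, 0, 0 };

/* Fold A + B in a 32-bit signed operand type; give up (varying) if
   either input is varying or either bound could overflow.  Inputs are
   assumed to lie within the 32-bit range.  */
static struct interval
fold_plus (struct interval a, struct interval b)
{
  if (a.varying || b.varying)
    return varying_range;
  assert (a.min <= a.max && b.min <= b.max);
  long long lo = a.min + b.min, hi = a.max + b.max;
  if (lo < INT_MIN || hi > INT_MAX)
    return varying_range;
  struct interval r = { false, lo, hi };
  return r;
}

/* Convert to an 8-bit unsigned parameter type.  Conservatively, only
   ranges that fit unchanged survive; anything else becomes varying.  */
static struct interval
convert_to_uchar (struct interval a)
{
  if (a.varying || a.min < 0 || a.max > UCHAR_MAX)
    return varying_range;
  return a;
}
```

In the patch this binary path is only taken for edges outside an SCC, which keeps a range from being updated without bound around a cycle.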


Re: [PATCH] include size and offset in -Wstringop-overflow

2019-11-12 Thread Jeff Law
On 11/12/19 1:15 AM, Richard Biener wrote:
> On Tue, Nov 12, 2019 at 6:10 AM Jeff Law  wrote:
>>
>> On 11/6/19 3:34 PM, Martin Sebor wrote:
>>> On 11/6/19 2:06 PM, Martin Sebor wrote:
 On 11/6/19 1:39 PM, Jeff Law wrote:
> On 11/6/19 1:27 PM, Martin Sebor wrote:
>> On 11/6/19 11:55 AM, Jeff Law wrote:
>>> On 11/6/19 11:00 AM, Martin Sebor wrote:
 The -Wstringop-overflow warnings for single-byte and multi-byte
 stores mention the amount of data being stored and the amount of
 space remaining in the destination, such as:

 warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=]

 123 |   *p = 0;
 |   ~~~^~~
 note: destination object declared here
  45 |   char b[N];
 |^

 A warning like this can take some time to analyze.  First, the size
 of the destination isn't mentioned and may not be easy to tell from
 the sources.  In the note above, when N's value is the result of
 some non-trivial computation, chasing it down may be a small project
 in and of itself.  Second, it's also not clear why the region size
 is zero.  It could be because the offset is exactly N, or because
 it's negative, or because it's in some range greater than N.
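The ambiguity is easy to see by computing the remaining size directly: an offset equal to N, past N, or negative all leave zero bytes. A minimal sketch, assuming a simple (offset, object size) model. region_size and format_store_warning are hypothetical helpers, not the builtins.c or tree-ssa-strlen.c code; only the message wording follows the examples in this post:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: bytes left in an object of OBJSIZE bytes when
   storing at byte OFFSET.  Offsets at or past the end and negative
   offsets all give 0, which is why "a region of size 0" alone does not
   say what went wrong.  */
static unsigned long long
region_size (long long offset, unsigned long long objsize)
{
  if (offset < 0 || (unsigned long long) offset >= objsize)
    return 0;
  return objsize - (unsigned long long) offset;
}

/* Format the warning followed by a note that also names the offset and
   the object size, in the style of the proposed messages.  */
static int
format_store_warning (char *out, size_t outsize, const char *objname,
                      unsigned long long nbytes, long long offset,
                      unsigned long long objsize)
{
  assert (out != NULL && objname != NULL);
  return snprintf (out, outsize,
                   "writing %llu bytes into a region of size %llu; "
                   "note: at offset %lld to object '%s' with size %llu",
                   nbytes, region_size (offset, objsize), offset,
                   objname, objsize);
}
```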

 Mentioning both the size of the destination object and the offset
 makes the existing messages clearer, and will become essential when
 GCC starts diagnosing overflow into allocated buffers (as my
 follow-on patch does).

 The attached patch enhances -Wstringop-overflow to do this by
 letting compute_objsize return the offset to its caller, doing
 something similar in get_stridx, and adding a new function to
 the strlen pass to issue this enhanced warning (eventually, I'd
 like the function to replace the -Wstringop-overflow handler in
 builtins.c).  With the change, the note above might read something
 like:

 note: at offset 11 to object ‘b’ with size 8 declared here
  45 |   char b[N];
 |^

 Tested on x86_64-linux.

 Martin

 gcc-store-offset.diff

 gcc/ChangeLog:

  * builtins.c (compute_objsize): Add an argument and set it to
  offset into destination.
  * builtins.h (compute_objsize): Add an argument.
  * tree-object-size.c (addr_object_size): Add an argument and
  set it to offset into destination.
  (compute_builtin_object_size): Same.
  * tree-object-size.h (compute_builtin_object_size): Add an argument.
  * tree-ssa-strlen.c (get_addr_stridx): Add an argument and
  set it to offset into destination.
  (maybe_warn_overflow): New function.
  (handle_store): Call maybe_warn_overflow to issue warnings.

 gcc/testsuite/ChangeLog:

  * c-c++-common/Wstringop-overflow-2.c: Adjust text of expected messages.
  * g++.dg/warn/Wstringop-overflow-3.C: Same.
  * gcc.dg/Wstringop-overflow-17.c: Same.

>>>
 Index: gcc/tree-ssa-strlen.c
 ===
 --- gcc/tree-ssa-strlen.c(revision 277886)
 +++ gcc/tree-ssa-strlen.c(working copy)
 @@ -189,6 +189,52 @@ struct laststmt_struct
   static int get_stridx_plus_constant (strinfo *, unsigned HOST_WIDE_INT, tree);
   static void handle_builtin_stxncpy (built_in_function, gimple_stmt_iterator *);
 +/* Sets MINMAX to either the constant value or the range VAL is in
 +   and returns true on success.  */
 +
 +static bool
 +get_range (tree val, wide_int minmax[2], const vr_values *rvals = NULL)
 +{
 +  if (tree_fits_uhwi_p (val))
 +{
 +  minmax[0] = minmax[1] = wi::to_wide (val);
 +  return true;
 +}
 +
 +  if (TREE_CODE (val) != SSA_NAME)
 +return false;
 +
 +  if (rvals)
 +{
 +  gimple *def = SSA_NAME_DEF_STMT (val);
 +  if (gimple_assign_single_p (def)
 +  && gimple_assign_rhs_code (def) == INTEGER_CST)
 +{
 +  /* get_value_range returns [0, N] for constant assignments.  */
 +  val = gimple_assign_rhs1 (def);
 +  minmax[0] = minmax[1] = wi::to_wide (val);
 +  return true;
 +}
>>> Umm, something seems really off with this hunk.  If the SSA_NAME is set
>>> via a simple constant assignment, then the range ought to be a

Re: [8/n] Replace autovectorize_vector_sizes with autovectorize_vector_modes

2019-11-12 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, Oct 30, 2019 at 4:58 PM Richard Sandiford
>  wrote:
>>
>> Richard Biener  writes:
>> > On Fri, Oct 25, 2019 at 2:37 PM Richard Sandiford
>> >  wrote:
>> >>
>> >> This is another patch in the series to remove the assumption that
>> >> all modes involved in vectorisation have to be the same size.
>> >> Rather than have the target provide a list of vector sizes,
>> >> it makes the target provide a list of vector "approaches",
>> >> with each approach represented by a mode.
>> >>
>> >> A later patch will pass this mode to targetm.vectorize.related_mode
>> >> to get the vector mode for a given element mode.  Until then, the modes
>> >> simply act as an alternative way of specifying the vector size.
>> >
>> > Is there a restriction to use integer vector modes for the hook
>> > or would FP vector modes be OK as well?
>>
>> Conceptually, each mode returned by the hook represents a set of vector
>> modes, with the set containing one member for each supported element
>> type.  The idea is to represent the set using the member with the
>> smallest element type, preferring integer modes over floating-point
>> modes in the event of a tie.  So using a floating-point mode as the
>> representative mode is fine if floating-point elements are the smallest
>> (or only) supported element type.
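The selection rule described above can be sketched as follows. This is a hypothetical model for illustration only (the struct and function are not part of GCC's target hooks); it just encodes "smallest element type wins, and an integer mode beats a floating-point mode on a tie":

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one member of a set of same-sized vector
   modes, e.g. V16QI, V8HI, V4SI and V4SF for 128-bit vectors.  */
struct vec_mode
{
  const char *name;
  unsigned elt_bits;  /* size of one element in bits */
  bool is_float;      /* floating-point element type? */
};

/* Choose the member that represents the whole set: the one with the
   smallest element, preferring an integer mode over a floating-point
   mode when the element sizes tie.  */
static const struct vec_mode *
representative_mode (const struct vec_mode *modes, size_t n)
{
  assert (n > 0);
  const struct vec_mode *best = &modes[0];
  for (size_t i = 1; i < n; i++)
    {
      const struct vec_mode *m = &modes[i];
      if (m->elt_bits < best->elt_bits
          || (m->elt_bits == best->elt_bits && best->is_float && !m->is_float))
        best = m;
    }
  return best;
}
```

With only floating-point members (say V2DF and V4SF), the floating-point mode with the smallest element is chosen, matching the "smallest (or only) supported element type" case.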
>>
>> > Note that your x86 change likely disables word_mode vectorization with
>> > -mno-sse?
>>
>> No, that still works, because...
>>
>> > That is, how do we represent GPR vectorization "size" here?
>> > The preferred SIMD mode hook may return an integer mode,
>> > are non-vector modes OK for autovectorize_vector_modes?
>>
>> ...at least with all current targets, preferred_simd_mode is only
>> an integer mode if the target has no "real" vectorisation support
>> for that element type.  There's no need to handle that case in
>> autovectorize_vector_sizes/modes, and e.g. the x86 hook does nothing
>> when SSE is disabled.
>>
>> So while preferred_simd_mode can continue to return integer modes,
>> autovectorize_vector_modes always returns vector modes.
>
> Hmm, I see.  IIRC I was playing with a patch for x86 that
> enabled word-mode vectorization (64bits) for SSE before (I see
> we don't do that at the moment).  The MMX-with-SSE has made
> that somewhat moot but with iterating over modes we could
> even make MMX-with-SSE (MMX modes) and word-mode vectors
> coexist by allowing the hook to return V4SI, V2SI, DImode?
> Because MMX-with-SSE might be more costly than word-mode
> but can of course handle more cases.
>
> So you say the above isn't supported and cannot be made supported?

It isn't supported as things stand.  It shouldn't be hard to make
it work, but I'm not sure what the best semantics would be.

AIUI, before the series, returning word_mode from preferred_simd_mode
just means that vectors should have the same size as word_mode.  If the
target defines V2SI, we'll use that as the raw type mode for SI vectors,
regardless of whether V2SI is enabled.  If the mode *is* enabled,
the TYPE_MODE will also be V2SI and so returning word_mode from
preferred_simd_mode is equivalent to returning V2SImode.  If the mode
isn't enabled, the TYPE_MODE will be word_mode if that's suitable and
BLKmode otherwise.
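Put another way, returning word_mode from preferred_simd_mode only fixes the vector size; the mode for each element type then follows from dividing the word size by the element size. A minimal sketch of that arithmetic, assuming a 64-bit word and GCC's V<n><elt> mode naming. The helper is hypothetical and ignores whether the resulting mode is actually defined or enabled:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write into OUT the name of the vector mode whose
   total size is WORD_BITS for element mode ELT (e.g. "SI") of ELT_BITS
   bits, following GCC's V<nunits><elt> naming (V2SI, V4HI, ...).  */
static int
same_size_vector_mode (char *out, size_t outsize, unsigned word_bits,
                       const char *elt, unsigned elt_bits)
{
  assert (elt_bits != 0 && word_bits % elt_bits == 0);
  return snprintf (out, outsize, "V%u%s", word_bits / elt_bits, elt);
}
```

For the HImode/SImode mixture, this same-size rule pairs V4HI with V2SI, whereas matching on element count would pair V2SI with V2HI.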

The situation's similar for SF; if the target defines and supports V2SF,
returning word_mode would be equivalent to returning V2SFmode.

But it sounds like returning word_mode for the new hook would behave
differently, in that we'd force the raw type mode to be DImode even
if V2SImode is defined and supported.  So what should happen for float
types?  Should we reject those, or behave as above and apply the usual
mode_for_vector treatment for a word_mode-sized vector?

If code contains a mixture of HImode and SImode elements, should
we use DImode for both of them, or SImode for HImode elements?
Should the modes be passed to the target's related_vector_mode
hook in the same way as for vectors, or handled before then?

I could implement one of these.  I'm just not sure it'd turn out
to be the right one, once someone actually tries to use it. :-)

FWIW, another way of doing the same thing would be to define
emulated vector modes, e.g. EMUL_V2SI, giving them a lower
priority than the real V2SI.  This is already possible with
VECTOR_MODES_WITH_PREFIX.  Because these emulated modes would
be permanently unsupported, the associated TYPE_MODE would always
be the equivalent integer mode (if appropriate).  So we could force
integer modes that way too.  This has the advantage that we never lose
sight of what the element type is, and so can choose between pairing
EMUL_V2SI and EMUL_V4HI vs. pairing EMUL_V2SI and EMUL_V2HI,
just like we can for "real" vector modes.

Of course that's "a bit" of a hack.  But then so IMO is using integer
modes for this kind of choice. :-)

Another option I'd considered was having the hook return a list of
abstract identifiers that are only meaningful to the target, either
with accompanying 

Re: [PATCH 5/7 libgomp,amdgcn] Optimize GCN OpenMP malloc performance

2019-11-12 Thread Andrew Stubbs

On 12/11/2019 13:56, Jakub Jelinek wrote:

s/reeduced/reduced/


Done.


Not really sure if it is a good idea to print anything, at least not when
in some debugging mode.  I mean, it is fairly easy to write code that will
trigger this.  And, what is the reason why you can't free the
gomp_malloced memory, like comparing if the team_freed pointer is in between
TEAM_ARENA_START and TEAM_ARENA_END or similar, don't do anything in that
case, otherwise use free?


Falling back to malloc is a big performance hit. There's a global lock 
affecting all teams in all running kernels. If we're running into this 
then a) I want to know about it so I can tune the arena size, and b) I 
want the user to know why performance is suddenly worse.


At least for now, I want to keep the message. I've updated the comment 
though.
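Jakub's alternative, freeing only pointers that lie outside the arena, can be combined with the bump allocation the patch uses. A minimal single-threaded sketch, assuming the arena bounds live in an ordinary struct. The real code keeps them at fixed LDS offsets and must update the free pointer atomically, and the names arena, arena_malloc and arena_free are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical arena bounds; START is assumed to be 4-byte aligned.  */
struct arena
{
  char *start, *free, *end;
};

/* Bump-allocate SIZE bytes, rounded up to a multiple of 4 as in the
   patch.  When the arena is exhausted, fall back to malloc, which on
   GCN is the slow, globally locked path.  */
static void *
arena_malloc (struct arena *a, size_t size)
{
  assert (a->start <= a->free && a->free <= a->end);
  size = (size + 3) & ~(size_t) 3;
  if (size == 0)
    size = 4;  /* keep every arena block strictly inside [start, end) */
  if ((size_t) (a->end - a->free) >= size)
    {
      void *result = a->free;
      a->free += size;
      return result;
    }
  return malloc (size);
}

/* Arena memory dies with the team, so only release blocks that came
   from the malloc fallback.  Compare as integers, since relational
   comparison of pointers into different objects is undefined in C.  */
static void
arena_free (struct arena *a, void *p)
{
  uintptr_t u = (uintptr_t) p;
  if (u >= (uintptr_t) a->start && u < (uintptr_t) a->end)
    return;
  free (p);
}
```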



+  /* Clear the allocated memory.
+ This should vectorize.  The allocation has been rounded up to the next
+ 4-byte boundary, so this is safe.  */
+  for (int i = 0; i

Formatting (spaces around <, +=, +, between int and *.  Shouldn't 4 be
sizeof (int)?  And wouldn't memset (result, 0, size); do the same job?


I wanted to ensure that the loop would vectorize inline, but I don't 
think it was doing so anyway. I need to look at that, but how is this, 
for now?


Andrew
Optimize GCN OpenMP malloc performance

2019-11-12  Andrew Stubbs  

	libgomp/
	* config/gcn/team.c (gomp_gcn_enter_kernel): Set up the team arena
	and use team_malloc variants.
	(gomp_gcn_exit_kernel): Use team_free.
	* libgomp.h (TEAM_ARENA_SIZE): Define.
	(TEAM_ARENA_START): Define.
	(TEAM_ARENA_FREE): Define.
	(TEAM_ARENA_END): Define.
	(team_malloc): New function.
	(team_malloc_cleared): New function.
	(team_free): New function.
	* team.c (gomp_new_team): Initialize and use team_malloc.
	(free_team): Use team_free.
	(gomp_free_thread): Use team_free.
	(gomp_pause_host): Use team_free.
	* work.c (gomp_init_work_share): Use team_malloc.
	(gomp_fini_work_share): Use team_free.

diff --git a/libgomp/config/gcn/team.c b/libgomp/config/gcn/team.c
index c566482bda2..20d419198e0 100644
--- a/libgomp/config/gcn/team.c
+++ b/libgomp/config/gcn/team.c
@@ -57,16 +57,28 @@ gomp_gcn_enter_kernel (void)
   /* Starting additional threads is not supported.  */
   gomp_global_icv.dyn_var = true;
 
+  /* Initialize the team arena for optimized memory allocation.
+ The arena has been allocated on the host side, and the address
+ passed in via the kernargs.  Each team takes a small slice of it.  */
+  register void **kernargs asm("s8");
+  void *team_arena = (kernargs[4] + TEAM_ARENA_SIZE*teamid);
+  void * __lds *arena_start = (void * __lds *)TEAM_ARENA_START;
+  void * __lds *arena_free = (void * __lds *)TEAM_ARENA_FREE;
+  void * __lds *arena_end = (void * __lds *)TEAM_ARENA_END;
+  *arena_start = team_arena;
+  *arena_free = team_arena;
+  *arena_end = team_arena + TEAM_ARENA_SIZE;
+
   /* Allocate and initialize the team-local-storage data.  */
-  struct gomp_thread *thrs = gomp_malloc_cleared (sizeof (*thrs)
+  struct gomp_thread *thrs = team_malloc_cleared (sizeof (*thrs)
 		  * numthreads);
   set_gcn_thrs (thrs);
 
   /* Allocate and initailize a pool of threads in the team.
  The threads are already running, of course, we just need to manage
  the communication between them.  */
-  struct gomp_thread_pool *pool = gomp_malloc (sizeof (*pool));
-  pool->threads = gomp_malloc (sizeof (void *) * numthreads);
+  struct gomp_thread_pool *pool = team_malloc (sizeof (*pool));
+  pool->threads = team_malloc (sizeof (void *) * numthreads);
   for (int tid = 0; tid < numthreads; tid++)
 	pool->threads[tid] = &thrs[tid];
   pool->threads_size = numthreads;
@@ -91,7 +103,7 @@ void
 gomp_gcn_exit_kernel (void)
 {
   gomp_free_thread (gcn_thrs ());
-  free (gcn_thrs ());
+  team_free (gcn_thrs ());
 }
 
 /* This function contains the idle loop in which a thread waits
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 19e1241ee4c..bab733d2b2d 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -106,6 +106,69 @@ extern void gomp_aligned_free (void *);
GCC's builtin alloca().  */
 #define gomp_alloca(x)  __builtin_alloca(x)
 
+/* Optimized allocators for team-specific data that will die with the team.  */
+
+#ifdef __AMDGCN__
+/* The arena is initialized in config/gcn/team.c.  */
+#define TEAM_ARENA_SIZE  64*1024  /* Must match the value in plugin-gcn.c.  */
+#define TEAM_ARENA_START 16  /* LDS offset of start pointer.  */
+#define TEAM_ARENA_FREE  24  /* LDS offset of free pointer.  */
+#define TEAM_ARENA_END   32  /* LDS offset of end pointer.  */
+
+static inline void * __attribute__((malloc))
+team_malloc (size_t size)
+{
+  /* 4-byte align the size.  */
+  size = (size + 3) & ~3;
+
+  /* Allocate directly from the arena.
+ The compiler does not support DS atomics, yet. */
+  void *result;
+  asm 

Re: [PATCH][AArch64] Implement Armv8.5-A memory tagging (MTE) intrinsics

2019-11-12 Thread Dennis Zhang
Hi Kyrill,

On 12/11/2019 15:57, Kyrill Tkachov wrote:
> 
> On 11/12/19 3:50 PM, Dennis Zhang wrote:
>> Hi Kyrill,
>>
>> On 12/11/2019 09:40, Kyrill Tkachov wrote:
>>> Hi Dennis,
>>>
>>> On 11/7/19 1:48 PM, Dennis Zhang wrote:
 Hi Kyrill,

 I have rebased the patch on top of the current trunk.
 For resolve_overloaded, I redefined my memtag overloading function to
 fit the latest resolve_overloaded_builtin interface.

 Regression tested again and survived for aarch64-none-linux-gnu.
>>> Please reply inline rather than top-posting on gcc-patches.
>>>
>>>
 Cheers
 Dennis

 Changelog is updated as following:

 gcc/ChangeLog:

 2019-11-07  Dennis Zhang  

  * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add
  AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
  AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
  AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
  AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
  (aarch64_init_memtag_builtins): New.
  (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
  (aarch64_general_init_builtins): Call aarch64_init_memtag_builtins.
  (aarch64_expand_builtin_memtag): New.
  (aarch64_general_expand_builtin): Call aarch64_expand_builtin_memtag.
  (AARCH64_BUILTIN_SUBCODE): New macro.
  (aarch64_resolve_overloaded_memtag): New.
  (aarch64_resolve_overloaded_builtin_general): New hook. Call
  aarch64_resolve_overloaded_memtag to handle overloaded MTE builtins.
  * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
  __ARM_FEATURE_MEMORY_TAGGING when enabled.
  (aarch64_resolve_overloaded_builtin): Call
  aarch64_resolve_overloaded_builtin_general.
  * config/aarch64/aarch64-protos.h
  (aarch64_resolve_overloaded_builtin_general): New declaration.
  * config/aarch64/aarch64.h (AARCH64_ISA_MEMTAG): New macro.
  (TARGET_MEMTAG): Likewise.
  * config/aarch64/aarch64.md (define_c_enum "unspec"): Add
  UNSPEC_GEN_TAG, UNSPEC_GEN_TAG_RND, and UNSPEC_TAG_SPACE.
  (irg, gmi, subp, addg, ldg, stg): New instructions.
  * config/aarch64/arm_acle.h (__arm_mte_create_random_tag): New macro.
  (__arm_mte_exclude_tag, __arm_mte_increment_tag): Likewise.
  (__arm_mte_ptrdiff, __arm_mte_set_tag, __arm_mte_get_tag): Likewise.
  * config/aarch64/predicates.md (aarch64_memtag_tag_offset): New.
  (aarch64_granule16_uimm6, aarch64_granule16_simm9): New.
  * config/arm/types.md (memtag): New.
  * doc/invoke.texi (-memtag): Update description.

 gcc/testsuite/ChangeLog:

 2019-11-07  Dennis Zhang  

  * gcc.target/aarch64/acle/memtag_1.c: New test.
  * gcc.target/aarch64/acle/memtag_2.c: New test.
  * gcc.target/aarch64/acle/memtag_3.c: New test.


 On 04/11/2019 16:40, Kyrill Tkachov wrote:
> Hi Dennis,
>
> On 10/17/19 11:03 AM, Dennis Zhang wrote:
>> Hi,
>>
>> Arm Memory Tagging Extension (MTE) was introduced with Armv8.5-A.
>> It can be used for spatial and temporal memory safety detection and
>> as a lightweight lock-and-key system.
>>
>> This patch enables new intrinsics that use MTE instructions to
>> create, set, read, and manipulate tags.
>> The intrinsics are part of Arm ACLE extension:
>> https://developer.arm.com/docs/101028/latest/memory-tagging-intrinsics 
>>
>> The MTE ISA specification can be found at
>> https://developer.arm.com/docs/ddi0487/latest chapter D6.
>>
>> Bootstrapped and regtested on aarch64-none-linux-gnu.
>>
>> Please help to check if it's OK for trunk.
>>
> This looks mostly OK to me, but for further review this needs to be
> rebased on top of current trunk, as there are some conflicts with the
> SVE ACLE changes that recently went in. Most conflicts look trivial to
> resolve, but one that needs more attention is the definition of the
> TARGET_RESOLVE_OVERLOADED_BUILTIN hook.
>
> Thanks,
>
> Kyrill
>
>> Many Thanks
>> Dennis
>>
>> gcc/ChangeLog:
>>
>> 2019-10-16  Dennis Zhang  
>>
>>   * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add
>>   AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
>>   AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
>>   AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
>>   AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
>>   (aarch64_init_memtag_builtins): New.
>>   (AARCH64_INIT_MEMTAG_BUILTINS_DECL): 

Re: [PATCH][gcc] libgccjit: handle long literals in playback::context::new_string_literal

2019-11-12 Thread Andrea Corallo


Andrea Corallo writes:

> Andrea Corallo writes:
>
>> Andrea Corallo writes:
>>
>>> Hi all,
>>> Yesterday I found an interesting bug in libgccjit.
>>> It seems we have a hard limit of 200 characters for string literals.
>>> Attempting to create longer strings leads to an ICE during pass_expand
>>> while performing a sanity check in get_constant_size.
>>>
>>> Tracking down the issue, it seems our code was inspired by
>>> c-family/c-common.c:c_common_nodes_and_builtins, where array_domain_type
>>> is actually defined with a size of 200.
>>> The comment that follows that point sounded premonitory :) :)
>>>
>>> /* Make a type for arrays of characters.
>>>With luck nothing will ever really depend on the length of this
>>>array type.  */
>>>
>>> At least in the current implementation, the type is set by
>>> fix_string_type, where the actual string length is taken into account.
>>>
>>> I attach a patch updating the logic accordingly and a new testcase
>>>  for that.
>>>
>>> make check-jit is passing clean.
>>>
>>> Best Regards
>>>   Andrea
>>>
>>> gcc/jit/ChangeLog
>>> 2019-??-??  Andrea Corallo  
>>>
>>> * jit-playback.h
>>> (gcc::jit::recording::context m_recording_ctxt): Remove
>>> m_char_array_type_node field.
>>> * jit-playback.c
>>> (playback::context::context) Remove m_char_array_type_node from member
>>> initializer list.
>>> (playback::context::new_string_literal) Fix logic to handle string
>>> length > 200.
>>>
>>> gcc/testsuite/ChangeLog
>>> 2019-??-??  Andrea Corallo  
>>>
>>> * jit.dg/all-non-failing-tests.h: Add test-long-string-literal.c.
>>> * jit.dg/test-long-string-literal.c: New testcase.
>>> diff --git a/gcc/jit/jit-playback.c b/gcc/jit/jit-playback.c
>>> index 9eeb2a7..a26b8d3 100644
>>> --- a/gcc/jit/jit-playback.c
>>> +++ b/gcc/jit/jit-playback.c
>>> @@ -88,7 +88,6 @@ playback::context::context (recording::context *ctxt)
>>>: log_user (ctxt->get_logger ()),
>>>  m_recording_ctxt (ctxt),
>>>  m_tempdir (NULL),
>>> -m_char_array_type_node (NULL),
>>>  m_const_char_ptr (NULL)
>>>  {
>>>JIT_LOG_SCOPE (get_logger ());
>>> @@ -670,9 +669,12 @@ playback::rvalue *
>>>  playback::context::
>>>  new_string_literal (const char *value)
>>>  {
>>> -  tree t_str = build_string (strlen (value), value);
>>> -  gcc_assert (m_char_array_type_node);
>>> -  TREE_TYPE (t_str) = m_char_array_type_node;
>>> +  /* Compare with c-family/c-common.c: fix_string_type.  */
>>> +  size_t len = strlen (value);
>>> +  tree i_type = build_index_type (size_int (len));
>>> +  tree a_type = build_array_type (char_type_node, i_type);
>>> +  tree t_str = build_string (len, value);
>>> +  TREE_TYPE (t_str) = a_type;
>>>
>>>/* Convert to (const char*), loosely based on
>>>   c/c-typeck.c: array_to_pointer_conversion,
>>> @@ -2703,10 +2705,6 @@ playback::context::
>>>  replay ()
>>>  {
>>>JIT_LOG_SCOPE (get_logger ());
>>> -  /* Adapted from c-common.c:c_common_nodes_and_builtins.  */
>>> -  tree array_domain_type = build_index_type (size_int (200));
>>> -  m_char_array_type_node
>>> -= build_array_type (char_type_node, array_domain_type);
>>>
>>>m_const_char_ptr
>>>  = build_pointer_type (build_qualified_type (char_type_node,
>>> diff --git a/gcc/jit/jit-playback.h b/gcc/jit/jit-playback.h
>>> index d4b148e..801f610 100644
>>> --- a/gcc/jit/jit-playback.h
>>> +++ b/gcc/jit/jit-playback.h
>>> @@ -322,7 +322,6 @@ private:
>>>
>>>auto_vec m_functions;
>>>auto_vec m_globals;
>>> -  tree m_char_array_type_node;
>>>tree m_const_char_ptr;
>>>
>>>/* Source location handling.  */
>>> diff --git a/gcc/testsuite/jit.dg/all-non-failing-tests.h 
>>> b/gcc/testsuite/jit.dg/all-non-failing-tests.h
>>> index 0272e6f8..1b3d561 100644
>>> --- a/gcc/testsuite/jit.dg/all-non-failing-tests.h
>>> +++ b/gcc/testsuite/jit.dg/all-non-failing-tests.h
>>> @@ -220,6 +220,13 @@
>>>  #undef create_code
>>>  #undef verify_code
>>>
>>> +/* test-long-string-literal.c */
>>> +#define create_code create_code_long_string_literal
>>> +#define verify_code verify_code_long_string_literal
>>> +#include "test-long-string-literal.c"
>>> +#undef create_code
>>> +#undef verify_code
>>> +
>>>  /* test-sum-of-squares.c */
>>>  #define create_code create_code_sum_of_squares
>>>  #define verify_code verify_code_sum_of_squares
>>> diff --git a/gcc/testsuite/jit.dg/test-long-string-literal.c 
>>> b/gcc/testsuite/jit.dg/test-long-string-literal.c
>>> new file mode 100644
>>> index 000..882567c
>>> --- /dev/null
>>> +++ b/gcc/testsuite/jit.dg/test-long-string-literal.c
>>> @@ -0,0 +1,48 @@
>>> +#include 
>>> +#include 
>>> +#include 
>>> +
>>> +#include "libgccjit.h"
>>> +
>>> +#include "harness.h"
>>> +
>>> +const char very_long_string[] =
>>> +  
>>> "abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc"
>>> +  
>>> "abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc"
>>> +  
>>> 

Re: [PATCH] implement -Wrestrict for sprintf (PR 83688)

2019-11-12 Thread Martin Sebor

On 10/31/19 10:31 AM, Jeff Law wrote:

On 10/8/19 5:51 PM, Martin Sebor wrote:

Attached is a resubmission of the -Wrestrict implementation for
the sprintf family of functions.  The original patch was posted
in 2017 but never approved.  This revision makes only a few minor
changes to the original code, mostly necessitated by rebasing on
the top of trunk.

The description from the original posting still applies today:

   The enhancement works by first determining the base object (or
   pointer) from the destination of the sprintf call, the constant
   offset into the object (and subobject for struct members), and
   its size.  For each %s argument, it then computes the same data.
   If it determines that overlap between the two is possible it
   stores the data for the directive argument (including the size
   of the argument) for later processing.  After the whole call and
   format string have been processed, the code then iterates over
   the stored directives and their arguments and compares the size
   and length of the argument against the function's overall output.
   If they overlap it issues a warning.
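The overlap test at the heart of that description can be sketched with a simple model: each access is a byte range [offset, offset + size) in some base object, and two accesses conflict when they share a base and their ranges intersect. The region struct and both functions below are hypothetical illustrations, not the data structures in gimple-ssa-sprintf.c; the sizes include the terminating NUL:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the byte range [offset, offset + size) within the
   object whose address is BASE.  */
struct region
{
  const void *base;
  long long offset;
  long long size;
};

/* Two regions may overlap when they refer to the same base object and
   their half-open intervals intersect.  */
static bool
regions_overlap (struct region a, struct region b)
{
  if (a.base != b.base || a.size <= 0 || b.size <= 0)
    return false;
  return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}

/* A %s argument of length ARG_LEN at ARG conflicts with a call whose
   total output is OUTPUT_LEN characters written at DST (plus a NUL).  */
static bool
directive_overlaps (const void *dst_base, long long dst_off,
                    long long output_len, const void *arg_base,
                    long long arg_off, long long arg_len)
{
  assert (output_len >= 0 && arg_len >= 0);
  struct region dst = { dst_base, dst_off, output_len + 1 };
  struct region arg = { arg_base, arg_off, arg_len + 1 };
  return regions_overlap (dst, arg);
}
```

For sprintf (buf, "%s\n", buf) with strlen (buf) == 5, the output [0, 7) and the argument [0, 6) share bytes; for sprintf (buf, "%s", buf + 8) with a 3-character argument, [0, 4) and [8, 12) are disjoint.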

The solution is pretty simple.  The only details that might be
worth calling out are the addition of a few utility functions, some
of which at first glance look like they could be replaced by calls
to existing utilities:

  *  array_elt_at_offset
  *  field_at_offset
  *  get_origin_and_offset
  *  alias_offset

I'm a bit surprised we don't already have routines that perform these
functions.


I double-checked but couldn't find corresponding alternatives.
I might need these functions in other places in the future so
if/when I do I will move them somewhere more central (and
possibly also generalize them in the process).





Specifically, get_origin_and_offset looks like a dead ringer for
get_addr_base_and_unit_offset, except since the former is only
used for warnings it is less conservative.  It also works with
SSA_NAMEs.  This is also the function I expect to need to make
changes to (and fix bugs in).  The rest of the functions are
general utilities that could perhaps be moved to tree.c at some
point when there is a use for them elsewhere (I have some work
in progress that might need at least one of them).

Another likely question worth addressing is why the sprintf
overlap detection isn't handled in gimple-ssa-warn-restrict.c.
There is an opportunity for code sharing between the two "passes"
but it will require some fairly intrusive changes to the latter.
Those feel out of scope for the initial solution.

Finally, because of new dependencies between existing classes in
the file, some code had to be moved around within it a bit.  That
contributed to the size of the patch making the changes seem more
extensive than they really are.

Tested on x86_64-linux with Binutils/GDB and Glibc.

Martin

The original submission:
https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00036.html

gcc-83688.diff

PR middle-end/83688 - check if buffers may overlap when copying strings using sprintf

gcc/ChangeLog:

PR middle-end/83688
* gimple-ssa-sprintf.c (format_result::alias_info): New struct.
(directive::argno): New member.
(format_result::aliases, format_result::alias_count): New data members.
(format_result::append_alias): New member function.
(fmtresult::dst_offset): New data member.
(pass_sprintf_length::call_info::dst_origin): New data member.
(pass_sprintf_length::call_info::dst_field, dst_offset): Same.
(char_type_p, array_elt_at_offset, field_at_offset): New functions.
(get_origin_and_offset): Same.
(format_string): Call it.
(format_directive): Call append_alias and set directive argument
number.
(maybe_warn_overlap): New function.
(pass_sprintf_length::compute_format_length): Call it.
(pass_sprintf_length::handle_gimple_call): Initialize new members.
* gcc/tree-ssa-strlen.c (): Also enable when -Wrestrict is on.

gcc/testsuite/ChangeLog:

PR tree-optimization/35503
* gcc.dg/tree-ssa/builtin-sprintf-warn-23.c: New test.

diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index b11d7989d5e..b47ed019615 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c



+void
+format_result::append_alias (const directive , HOST_WIDE_INT off,
+const result_range )
+{
+  unsigned cnt = alias_count + 1;
+  alias_info *ar = XNEWVEC (alias_info, cnt);

So just a question.  Is there a particular reason why you're
self-managing the vector here?  Would it be easier to use the vec.h
capabilities?  I guess it's not that big of a deal since it's pretty
well isolated in here.


None of the vec templates is safely copyable or assignable (PR 90904),
so using vec here would make the code more cumbersome than managing
the array by hand.  (For comparison, see class args_loc_t in
gimple-ssa-isolate-paths.c, which uses vec.)
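The self-managed approach in the quoted hunk amounts to growing a plain heap array by one element per call. Below is a minimal C sketch of that pattern, assuming the old contents are copied and the old buffer released (the quoted hunk is cut off before those steps). `struct alias_rec`, `struct alias_list` and `alias_list_append` are hypothetical stand-ins, and `malloc` replaces libiberty's XNEWVEC, which does not return on allocation failure.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical element type standing in for format_result::alias_info.  */
struct alias_rec {
  int argno;
  long offset;
};

/* A raw, self-managed array: the pointer and its element count.  */
struct alias_list {
  struct alias_rec *items;
  unsigned count;
};

/* Grow by exactly one element: allocate COUNT + 1 slots, copy the old
   contents, append NEW_ITEM, then release the old buffer.  Returns 0 on
   success and -1 if allocation fails (the list is then unchanged).  */
int alias_list_append(struct alias_list *list, struct alias_rec new_item)
{
  unsigned cnt = list->count + 1;
  struct alias_rec *ar = malloc(cnt * sizeof *ar);
  if (ar == NULL)
    return -1;
  if (list->count)
    memcpy(ar, list->items, list->count * sizeof *ar);
  ar[list->count] = new_item;
  free(list->items);
  list->items = ar;
  list->count = cnt;
  return 0;
}
```

Copying the whole array on every append is quadratic in the number of entries, which is presumably acceptable here because a single sprintf call has only a handful of %s directives.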




@@ -2143,6 +2190,249 @@ 

Re: [patch, fortran] Load scalar intent-in variables at the beginning of procedures

2019-11-12 Thread Thomas König
Hi Tobias,

> On 11/12/19 1:42 PM, Thomas König wrote:
>>> Ah, of course. I should have said module procedures. Or even module 
>>> procedures without bind(C)?
>> It would probably be the latter. The change would actually be rather small: 
>> If conditions are met, just add attr.value for INTENT(IN). This is something 
>> we should probably do when we are forced into doing an ABI change by other 
>> circumstances.
> 
> Will this still work if one does:
> 
> module m
> contains
> integer function val(y)
>  integer, intent(in) :: y
>  val = 2*y
> end function val
> end module m
> 
> use m
> interface
>  integer function proc(z)
>integer, intent(in) :: z
>  end function proc
> end interface
> procedure(proc), pointer :: ff
> ff => val
> print *, ff(10)
> end

You are right, it would not work. So, scratch that idea. Maybe we should commit
this as a test case so nobody gets funny ideas in two years' time.

So, I think we can then discuss the original patch.

Regards

Thomas

[C++ PATCH] c++/89070 - bogus [[nodiscard]] warning in SFINAE.

2019-11-12 Thread Marek Polacek
This is a complaint that we issue a [[nodiscard]] warning even in SFINAE
contexts.  Here 'complain' is tf_decltype, but not tf_warning, so I guess
we can fix it as below.
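A minimal C model of the guard, assuming simplified flag values: the enumerators below only mirror the names tf_error, tf_warning and tf_decltype, and their bit values and the helper functions are hypothetical. The point is that a decltype (SFINAE) context passes tf_decltype without tf_warning, so the nodiscard check is skipped.

```c
/* Hypothetical stand-ins for GCC's tsubst_flags_t bits; the names mirror
   tf_error, tf_warning and tf_decltype, but the values are illustrative.  */
enum subst_flags {
  TF_NONE = 0,
  TF_ERROR = 1 << 0,
  TF_WARNING = 1 << 1,
  TF_DECLTYPE = 1 << 2
};

/* Counts how many [[nodiscard]] warnings the model would have issued.  */
int nodiscard_warnings;

static void maybe_warn_nodiscard_model(void)
{
  ++nodiscard_warnings;
}

/* Shape of the fix: consult COMPLAIN before warning, the same way the
   patch guards each maybe_warn_nodiscard call with tf_warning.  */
void convert_to_void_model(int complain)
{
  if (complain & TF_WARNING)
    maybe_warn_nodiscard_model();
}
```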

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2019-11-12  Marek Polacek  

PR c++/89070 - bogus [[nodiscard]] warning in SFINAE.
* cvt.c (convert_to_void): Guard maybe_warn_nodiscard calls with
tf_warning.

* g++.dg/cpp1z/nodiscard7.C: New test.

diff --git gcc/cp/cvt.c gcc/cp/cvt.c
index d41aeb8f1fc..23facb77634 100644
--- gcc/cp/cvt.c
+++ gcc/cp/cvt.c
@@ -1201,7 +1201,8 @@ convert_to_void (tree expr, impl_conv_void implicit, tsubst_flags_t complain)
if (DECL_DESTRUCTOR_P (fn))
  return expr;
 
-  maybe_warn_nodiscard (expr, implicit);
+  if (complain & tf_warning)
+   maybe_warn_nodiscard (expr, implicit);
   break;
 
 case INDIRECT_REF:
@@ -1357,7 +1358,8 @@ convert_to_void (tree expr, impl_conv_void implicit, tsubst_flags_t complain)
 && !is_reference)
   warning_at (loc, OPT_Wunused_value, "value computed is not used");
 expr = TREE_OPERAND (expr, 0);
-   if (TREE_CODE (expr) == CALL_EXPR)
+   if (TREE_CODE (expr) == CALL_EXPR
+   && (complain & tf_warning))
  maybe_warn_nodiscard (expr, implicit);
   }
 
@@ -1435,7 +1437,8 @@ convert_to_void (tree expr, impl_conv_void implicit, tsubst_flags_t complain)
   AGGR_INIT_EXPR_ARGP (init));
}
}
-  maybe_warn_nodiscard (expr, implicit);
+  if (complain & tf_warning)
+   maybe_warn_nodiscard (expr, implicit);
   break;
 
 default:;
diff --git gcc/testsuite/g++.dg/cpp1z/nodiscard7.C gcc/testsuite/g++.dg/cpp1z/nodiscard7.C
new file mode 100644
index 000..80dac63e41e
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp1z/nodiscard7.C
@@ -0,0 +1,18 @@
+// PR c++/89070 - bogus [[nodiscard]] warning in SFINAE.
+// { dg-do compile { target c++11 } }
+
+struct A
+{
+[[nodiscard]] static int match() { return 42; }
+};
+
+template <typename T>
+auto g() -> decltype( T::match(), bool() )
+{
+return T::match();
+}
+
+int main()
+{
+g<A>();
+}



Re: [PATCH] Fix slowness in demangler

2019-11-12 Thread Tim Rühsen
On 11/12/19 4:15 PM, Ian Lance Taylor wrote:
> On Tue, Nov 12, 2019 at 6:15 AM Tim Rühsen  wrote:
>>
>> this is a proposal to fix
>> https://sourceware.org/bugzilla/show_bug.cgi?id=25180
>>
>> In short:
>> cxxfilt
>> _ZZ1_DO1z1Dclaa1D1VEE1VE2zo
>>
>> takes several minutes with 100% CPU before it comes back with a result.
>>
>> With this patch the result is returned immediately. The test suite in
>> binutils-gdb/libiberty/ throws no error.
>>
>> I'd like to note that I am not subscribed to the list, so please add me
>> to CC when replying. Thanks in advance.
> 
> This is OK with an appropriate ChangeLog entry.

Thanks for feedback, Ian.

Attached is the patch with a ChangeLog entry.

Regards, Tim
From 1311f0499ff0a5353e3201587e1e50c9b9cc58c2 Mon Sep 17 00:00:00 2001
From: Tim Rühsen
Date: Tue, 12 Nov 2019 13:10:47 +0100
Subject: [PATCH] Fix demangler slowness issue

Fixes #25180 (binutils bugtracker)

The demangler works in two passes.  The first pass counts certain
items.  Unlike the second pass, it lacked protection against
traversing the same subtrees many times without ever reaching the
recursion limit, so crafted input could make it use excessive CPU.

The fix uses the same mechanism as pass 2 to counteract this kind
of (malicious) input.
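To illustrate why the missing guard matters, here is a hypothetical C model (not the libiberty code): when components are shared, the component tree is really a DAG, and an unguarded recursive walk revisits each shared subtree once per path that leads to it. The guarded walk follows the ChangeLog's description of d_count_templates_scopes, returning when the counter exceeds 1 and incrementing it otherwise. `struct node`, `count_naive`, `count_guarded` and `build_chain` are invented for the example.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct demangle_component: children may be
   shared, so the "tree" is really a DAG.  COUNTING plays the role of the
   new d_counting member and must start at zero.  */
struct node {
  struct node *left, *right;
  int counting;
};

/* Unprotected walk: a shared subtree is walked once per path leading to
   it, so the work can grow exponentially with the depth.  */
unsigned long count_naive(const struct node *n)
{
  if (n == NULL)
    return 0;
  return 1 + count_naive(n->left) + count_naive(n->right);
}

/* Protected walk: return once counting > 1, otherwise increment it, so
   each node is entered at most twice.  */
unsigned long count_guarded(struct node *n)
{
  if (n == NULL || n->counting > 1)
    return 0;
  ++n->counting;
  return 1 + count_guarded(n->left) + count_guarded(n->right);
}

/* Build a chain of LEN nodes in which both children of node I are node
   I + 1, the worst case for the unprotected walk.  */
void build_chain(struct node *nodes, int len)
{
  for (int i = 0; i < len; i++) {
    struct node *next = (i + 1 < len) ? &nodes[i + 1] : NULL;
    nodes[i].left = nodes[i].right = next;
    nodes[i].counting = 0;
  }
}
```

With a 20-node chain, count_naive enters 2^20 - 1 = 1048575 nodes, while count_guarded enters 1 + 2 * 19 = 39.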
---
 include/demangle.h  |  1 +
 libiberty/ChangeLog | 18 ++
 libiberty/cp-demangle.c | 15 +++
 libiberty/cp-demint.c   |  3 +++
 4 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/include/demangle.h b/include/demangle.h
index f5d9b9e8b5..3b00dbc31a 100644
--- a/include/demangle.h
+++ b/include/demangle.h
@@ -481,6 +481,7 @@ struct demangle_component
  Initialize to zero.  Private to d_print_comp.
  All other fields are final after initialization.  */
   int d_printing;
+  int d_counting;
 
   union
   {
diff --git a/libiberty/ChangeLog b/libiberty/ChangeLog
index 95cb1525f2..c86b06f0bf 100644
--- a/libiberty/ChangeLog
+++ b/libiberty/ChangeLog
@@ -1,3 +1,21 @@
+2019-11-12  Tim Ruehsen  
+
+	* ../include/demangle.h (struct demangle_component):
+	Add member d_counting.
+	* cp-demangle.c (d_print_init): Remove const from 4th param.
+	(cplus_demangle_fill_name): Initialize d->d_counting.
+	(cplus_demangle_fill_extended_operator): Likewise.
+	(cplus_demangle_fill_ctor): Likewise.
+	(cplus_demangle_fill_dtor): Likewise.
+	(d_make_empty): Likewise.
+	(d_count_templates_scopes): Remove const from 3rd param.
+	Return on dc->d_counting > 1.
+	Increment dc->d_counting.
+	* cp-demint.c (cplus_demangle_fill_component): Initialize d->d_counting.
+	(cplus_demangle_fill_builtin_type): Likewise.
+	(cplus_demangle_fill_operator): Likewise.
+	This fixes bug #25180 (binutils bugtracker).
+
 2019-08-08  Martin Liska  
 
 	PR bootstrap/91352
diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c
index aa78c86dd4..f7c4dbbd11 100644
--- a/libiberty/cp-demangle.c
+++ b/libiberty/cp-demangle.c
@@ -517,7 +517,7 @@ d_growable_string_callback_adapter (const char *, size_t, void *);
 
 static void
 d_print_init (struct d_print_info *, demangle_callbackref, void *,
-	  const struct demangle_component *);
+	  struct demangle_component *);
 
 static inline void d_print_error (struct d_print_info *);
 
@@ -864,6 +864,7 @@ cplus_demangle_fill_name (struct demangle_component *p, const char *s, int len)
   if (p == NULL || s == NULL || len <= 0)
 return 0;
   p->d_printing = 0;
+  p->d_counting = 0;
   p->type = DEMANGLE_COMPONENT_NAME;
   p->u.s_name.s = s;
   p->u.s_name.len = len;
@@ -880,6 +881,7 @@ cplus_demangle_fill_extended_operator (struct demangle_component *p, int args,
   if (p == NULL || args < 0 || name == NULL)
 return 0;
   p->d_printing = 0;
+  p->d_counting = 0;
   p->type = DEMANGLE_COMPONENT_EXTENDED_OPERATOR;
   p->u.s_extended_operator.args = args;
   p->u.s_extended_operator.name = name;
@@ -900,6 +902,7 @@ cplus_demangle_fill_ctor (struct demangle_component *p,
   || (int) kind > gnu_v3_object_ctor_group)
 return 0;
   p->d_printing = 0;
+  p->d_counting = 0;
   p->type = DEMANGLE_COMPONENT_CTOR;
   p->u.s_ctor.kind = kind;
   p->u.s_ctor.name = name;
@@ -920,6 +923,7 @@ cplus_demangle_fill_dtor (struct demangle_component *p,
   || (int) kind > gnu_v3_object_dtor_group)
 return 0;
   p->d_printing = 0;
+  p->d_counting = 0;
   p->type = DEMANGLE_COMPONENT_DTOR;
   p->u.s_dtor.kind = kind;
   p->u.s_dtor.name = name;
@@ -937,6 +941,7 @@ d_make_empty (struct d_info *di)
 return NULL;
   p = >comps[di->next_comp];
   p->d_printing = 0;
+  p->d_counting = 0;
   ++di->next_comp;
   return p;
 }
@@ -4068,11 +4073,13 @@ d_growable_string_callback_adapter (const char *s, size_t l, void *opaque)
 
 static void
 d_count_templates_scopes (struct d_print_info *dpi,
-			  const struct demangle_component *dc)
+			  struct demangle_component 

[C] Add a target hook that allows targets to verify type usage

2019-11-12 Thread Richard Sandiford
This patch adds a new target hook to check whether there are any
target-specific reasons why a type cannot be used in a certain
source-language context.  It works in a similar way to existing
hooks like TARGET_INVALID_CONVERSION and TARGET_INVALID_UNARY_OP.

The reason for adding the hook is to report invalid uses of SVE types.
Throughout a TU, the SVE vector and predicate types represent values
that can be stored in an SVE vector or predicate register.  At certain
points in the TU we might be able to generate code that assumes the
registers have a particular size, but often we can't.  In some cases
we might even make multiple different assumptions in the same TU
(e.g. when implementing an ifunc for multiple vector lengths).

But SVE types themselves are the same type throughout.  The register
size assumptions change how we generate code, but they don't change
the definition of the types.

This means that the types do not have a fixed size at the C level
even when -msve-vector-bits=N is in effect.  It also means that the
size does not work in the same way as for C VLAs, where the abstract
machine evaluates the size at a particular point and then carries that
size forward to later code.

The SVE ACLE deals with this by making it invalid to use C and C++
constructs that depend on the size or layout of SVE types.  The spec
refers to the types as "sizeless" types and defines their semantics as
edits to the standards.  See:

  https://gcc.gnu.org/ml/gcc-patches/2018-10/msg00868.html

for a fuller description and:

  https://gcc.gnu.org/ml/gcc/2019-11/msg00088.html

for a recent update on the status.

However, since all current sizeless types are target-specific built-in
types, there's no real reason for the frontends to handle them directly.
They can just hand off the checks to target code instead.  It's then
possible for the errors to refer to "SVE types" rather than "sizeless
types", which is likely to be more meaningful to users.
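A rough C sketch of this split between front end and target, under stated assumptions: the real hook is declared in target.def and operates on GCC trees, whereas `struct type_model`, the hook typedef, `verify_type_context_model` and `aarch64_model_verify` below are hypothetical simplifications. The error text is illustrative, not GCC's actual wording.

```c
#include <stdbool.h>
#include <stdio.h>

/* A subset of the contexts in the patch's enum type_context_kind.  */
enum type_context_kind {
  TCTX_SIZEOF,
  TCTX_ALIGNOF,
  TCTX_STATIC_STORAGE,
  TCTX_FIELD
};

/* Hypothetical, heavily simplified type representation.  */
struct type_model {
  const char *name;
  bool is_sve;   /* true for the SVE vector and predicate types */
};

/* Hypothetical hook signature: return true if TYPE may be used in CTX.  */
typedef bool (*verify_type_context_hook)(enum type_context_kind ctx,
                                         const struct type_model *type);

/* Installed by the target; NULL means "no target-specific restrictions".  */
verify_type_context_hook target_verify_type_context = NULL;

/* Front-end entry point: defer to the target, defaulting to "valid".  */
bool verify_type_context_model(enum type_context_kind ctx,
                               const struct type_model *type)
{
  return target_verify_type_context == NULL
         || target_verify_type_context(ctx, type);
}

/* Target side: in this model every listed context depends on size or
   layout, so all are rejected for SVE types, and the error talks about
   "SVE types" rather than "sizeless types".  */
bool aarch64_model_verify(enum type_context_kind ctx,
                          const struct type_model *type)
{
  (void) ctx;
  if (!type->is_sve)
    return true;
  fprintf(stderr, "error: SVE type '%s' cannot be used here\n", type->name);
  return false;
}
```

The front end never needs to know what "sizeless" means; it only asks the question, and a target without such types simply installs no hook.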

There is a slight overlap between the new tests and the ones for
gnu_vector_type_p in r277950, but here the emphasis is on testing
sizelessness.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2019-11-12  Richard Sandiford  

gcc/
* target.h (type_context_kind): New enum.
(verify_type_context): Declare.
* target.def (verify_type_context): New target hook.
* doc/tm.texi.in (TARGET_VERIFY_TYPE_CONTEXT): Likewise.
* doc/tm.texi: Regenerate.
* tree.c (verify_type_context): New function.
* config/aarch64/aarch64-protos.h (aarch64_sve::verify_type_context):
Declare.
* config/aarch64/aarch64-sve-builtins.cc (verify_type_context):
New function.
* config/aarch64/aarch64.c (aarch64_verify_type_context): Likewise.
(TARGET_VERIFY_TYPE_CONTEXT): Define.

gcc/c-family/
* c-common.c (pointer_int_sum): Use verify_type_context to check
whether the target allows pointer arithmetic for the types involved.
(c_sizeof_or_alignof_type, c_alignof_expr): Use verify_type_context
to check whether the target allows sizeof and alignof operations
for the types involved.

gcc/c/
* c-decl.c (start_decl): Allow initialization of variables whose
size is a POLY_INT_CST.
(finish_decl): Use verify_type_context to check whether the target
allows variables with a particular type to have static or thread-local
storage duration.  Don't raise a second error if such variables do
not have a constant size.
(grokdeclarator): Use verify_type_context to check whether the
target allows fields or array elements to have a particular type.
* c-typeck.c (pointer_diff): Use verify_type_context to test whether
the target allows pointer difference for the types involved.
(build_unary_op): Likewise for pointer increment and decrement.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/general-c/sizeless-1.c: New test.
* gcc.target/aarch64/sve/acle/general-c/sizeless-2.c: Likewise.

Index: gcc/target.h
===
--- gcc/target.h2019-11-08 08:31:17.0 +
+++ gcc/target.h2019-11-12 16:01:45.643584681 +
@@ -218,6 +218,35 @@ enum omp_device_kind_arch_isa {
   omp_device_isa
 };
 
+/* The contexts in which the use of a type T can be checked by
+   TARGET_VERIFY_TYPE_CONTEXT.  */
+enum type_context_kind {
+  /* Directly measuring the size of T.  */
+  TCTX_SIZEOF,
+
+  /* Directly measuring the alignment of T.  */
+  TCTX_ALIGNOF,
+
+  /* Creating objects of type T with static storage duration.  */
+  TCTX_STATIC_STORAGE,
+
+  /* Creating objects of type T with thread-local storage duration.  */
+  TCTX_THREAD_STORAGE,
+
+  /* Creating a field of type T.  */
+  TCTX_FIELD,
+
+  /* Creating an array with elements of type T.  */
+  TCTX_ARRAY_ELEMENT,
+
+  /* Adding to or subtracting from a pointer 

Re: [PATCH][AArch64] Implement Armv8.5-A memory tagging (MTE) intrinsics

2019-11-12 Thread Kyrill Tkachov



On 11/12/19 3:50 PM, Dennis Zhang wrote:

Hi Kyrill,

On 12/11/2019 09:40, Kyrill Tkachov wrote:

Hi Dennis,

On 11/7/19 1:48 PM, Dennis Zhang wrote:

Hi Kyrill,

I have rebased the patch on top of current trunk.
For resolve_overloaded, I redefined my memtag overloading function to
fit the latest resolve_overloaded_builtin interface.

Regression tested again and survived for aarch64-none-linux-gnu.

Please reply inline rather than top-posting on gcc-patches.



Cheers
Dennis

Changelog is updated as following:

gcc/ChangeLog:

2019-11-07  Dennis Zhang  

 * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add
 AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
 AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
 AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
 AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
 (aarch64_init_memtag_builtins): New.
 (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
 (aarch64_general_init_builtins): Call aarch64_init_memtag_builtins.
 (aarch64_expand_builtin_memtag): New.
 (aarch64_general_expand_builtin): Call aarch64_expand_builtin_memtag.
 (AARCH64_BUILTIN_SUBCODE): New macro.
 (aarch64_resolve_overloaded_memtag): New.
 (aarch64_resolve_overloaded_builtin_general): New hook. Call
 aarch64_resolve_overloaded_memtag to handle overloaded MTE builtins.
 * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
 __ARM_FEATURE_MEMORY_TAGGING when enabled.
 (aarch64_resolve_overloaded_builtin): Call
 aarch64_resolve_overloaded_builtin_general.
 * config/aarch64/aarch64-protos.h
 (aarch64_resolve_overloaded_builtin_general): New declaration.
 * config/aarch64/aarch64.h (AARCH64_ISA_MEMTAG): New macro.
 (TARGET_MEMTAG): Likewise.
 * config/aarch64/aarch64.md (define_c_enum "unspec"): Add
 UNSPEC_GEN_TAG, UNSPEC_GEN_TAG_RND, and UNSPEC_TAG_SPACE.
 (irg, gmi, subp, addg, ldg, stg): New instructions.
 * config/aarch64/arm_acle.h (__arm_mte_create_random_tag): New macro.
 (__arm_mte_exclude_tag, __arm_mte_increment_tag): Likewise.
 (__arm_mte_ptrdiff, __arm_mte_set_tag, __arm_mte_get_tag): Likewise.
 * config/aarch64/predicates.md (aarch64_memtag_tag_offset): New.
 (aarch64_granule16_uimm6, aarch64_granule16_simm9): New.
 * config/arm/types.md (memtag): New.
 * doc/invoke.texi (-memtag): Update description.

gcc/testsuite/ChangeLog:

2019-11-07  Dennis Zhang  

 * gcc.target/aarch64/acle/memtag_1.c: New test.
 * gcc.target/aarch64/acle/memtag_2.c: New test.
 * gcc.target/aarch64/acle/memtag_3.c: New test.


On 04/11/2019 16:40, Kyrill Tkachov wrote:

Hi Dennis,

On 10/17/19 11:03 AM, Dennis Zhang wrote:

Hi,

Arm Memory Tagging Extension (MTE) is published with Armv8.5-A.
It can be used for spatial and temporal memory safety detection and
as a lightweight lock-and-key system.

This patch enables new intrinsics leveraging MTE instructions to
create, set, read, and manipulate tags.
The intrinsics are part of Arm ACLE extension:
https://developer.arm.com/docs/101028/latest/memory-tagging-intrinsics
The MTE ISA specification can be found at
https://developer.arm.com/docs/ddi0487/latest chapter D6.

Bootstrapped and regtested for aarch64-none-linux-gnu.

Please help to check if it's OK for trunk.


This looks mostly ok to me but for further review this needs to be
rebased on top of current trunk as there are some conflicts with the SVE
ACLE changes that recently went in. Most conflicts look trivial to
resolve but one that needs more attention is the definition of the
TARGET_RESOLVE_OVERLOADED_BUILTIN hook.

Thanks,

Kyrill


Many Thanks
Dennis

gcc/ChangeLog:

2019-10-16  Dennis Zhang  

  * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add
  AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
  AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
  AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
  AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
  (aarch64_init_memtag_builtins): New.
  (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
  (aarch64_general_init_builtins): Call aarch64_init_memtag_builtins.
  (aarch64_expand_builtin_memtag): New.
  (aarch64_general_expand_builtin): Call aarch64_expand_builtin_memtag.
  (AARCH64_BUILTIN_SUBCODE): New macro.
  (aarch64_resolve_overloaded_memtag): New.
  (aarch64_resolve_overloaded_builtin): New hook. Call
  aarch64_resolve_overloaded_memtag to handle overloaded MTE builtins.
  * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
  __ARM_FEATURE_MEMORY_TAGGING when enabled.
  * config/aarch64/aarch64-protos.h (aarch64_resolve_overloaded_builtin):
  Add declaration.
  * 

Re: [PATCH][AArch64] Implement Armv8.5-A memory tagging (MTE) intrinsics

2019-11-12 Thread Dennis Zhang
Hi Kyrill,

On 12/11/2019 09:40, Kyrill Tkachov wrote:
> Hi Dennis,
> 
> On 11/7/19 1:48 PM, Dennis Zhang wrote:
>> Hi Kyrill,
>>
>> I have rebased the patch on top of current trunk.
>> For resolve_overloaded, I redefined my memtag overloading function to
>> fit the latest resolve_overloaded_builtin interface.
>>
>> Regression tested again and survived for aarch64-none-linux-gnu.
> 
> Please reply inline rather than top-posting on gcc-patches.
> 
> 
>> Cheers
>> Dennis
>>
>> Changelog is updated as following:
>>
>> gcc/ChangeLog:
>>
>> 2019-11-07  Dennis Zhang  
>>
>> * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add
>> AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
>> AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
>> AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
>> AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
>> (aarch64_init_memtag_builtins): New.
>> (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
>> (aarch64_general_init_builtins): Call aarch64_init_memtag_builtins.
>> (aarch64_expand_builtin_memtag): New.
>> (aarch64_general_expand_builtin): Call aarch64_expand_builtin_memtag.
>> (AARCH64_BUILTIN_SUBCODE): New macro.
>> (aarch64_resolve_overloaded_memtag): New.
>> (aarch64_resolve_overloaded_builtin_general): New hook. Call
>> aarch64_resolve_overloaded_memtag to handle overloaded MTE builtins.
>> * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
>> __ARM_FEATURE_MEMORY_TAGGING when enabled.
>> (aarch64_resolve_overloaded_builtin): Call
>> aarch64_resolve_overloaded_builtin_general.
>> * config/aarch64/aarch64-protos.h
>> (aarch64_resolve_overloaded_builtin_general): New declaration.
>> * config/aarch64/aarch64.h (AARCH64_ISA_MEMTAG): New macro.
>> (TARGET_MEMTAG): Likewise.
>> * config/aarch64/aarch64.md (define_c_enum "unspec"): Add
>> UNSPEC_GEN_TAG, UNSPEC_GEN_TAG_RND, and UNSPEC_TAG_SPACE.
>> (irg, gmi, subp, addg, ldg, stg): New instructions.
>> * config/aarch64/arm_acle.h (__arm_mte_create_random_tag): New macro.
>> (__arm_mte_exclude_tag, __arm_mte_increment_tag): Likewise.
>> (__arm_mte_ptrdiff, __arm_mte_set_tag, __arm_mte_get_tag): Likewise.
>> * config/aarch64/predicates.md (aarch64_memtag_tag_offset): New.
>> (aarch64_granule16_uimm6, aarch64_granule16_simm9): New.
>> * config/arm/types.md (memtag): New.
>> * doc/invoke.texi (-memtag): Update description.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-11-07  Dennis Zhang  
>>
>> * gcc.target/aarch64/acle/memtag_1.c: New test.
>> * gcc.target/aarch64/acle/memtag_2.c: New test.
>> * gcc.target/aarch64/acle/memtag_3.c: New test.
>>
>>
>> On 04/11/2019 16:40, Kyrill Tkachov wrote:
>>> Hi Dennis,
>>>
>>> On 10/17/19 11:03 AM, Dennis Zhang wrote:
 Hi,

 Arm Memory Tagging Extension (MTE) is published with Armv8.5-A.
 It can be used for spatial and temporal memory safety detection and
 as a lightweight lock-and-key system.

 This patch enables new intrinsics leveraging MTE instructions to
 create, set, read, and manipulate tags.
 The intrinsics are part of Arm ACLE extension:
 https://developer.arm.com/docs/101028/latest/memory-tagging-intrinsics
 The MTE ISA specification can be found at
 https://developer.arm.com/docs/ddi0487/latest chapter D6.

 Bootstrapped and regtested for aarch64-none-linux-gnu.

 Please help to check if it's OK for trunk.

>>> This looks mostly ok to me but for further review this needs to be
>>> rebased on top of current trunk as there are some conflicts with the SVE
>>> ACLE changes that recently went in. Most conflicts look trivial to
>>> resolve but one that needs more attention is the definition of the
>>> TARGET_RESOLVE_OVERLOADED_BUILTIN hook.
>>>
>>> Thanks,
>>>
>>> Kyrill
>>>
 Many Thanks
 Dennis

 gcc/ChangeLog:

 2019-10-16  Dennis Zhang  

  * config/aarch64/aarch64-builtins.c (enum aarch64_builtins): Add
  AARCH64_MEMTAG_BUILTIN_START, AARCH64_MEMTAG_BUILTIN_IRG,
  AARCH64_MEMTAG_BUILTIN_GMI, AARCH64_MEMTAG_BUILTIN_SUBP,
  AARCH64_MEMTAG_BUILTIN_INC_TAG, AARCH64_MEMTAG_BUILTIN_SET_TAG,
  AARCH64_MEMTAG_BUILTIN_GET_TAG, and AARCH64_MEMTAG_BUILTIN_END.
  (aarch64_init_memtag_builtins): New.
  (AARCH64_INIT_MEMTAG_BUILTINS_DECL): New macro.
  (aarch64_general_init_builtins): Call aarch64_init_memtag_builtins.
  (aarch64_expand_builtin_memtag): New.
  (aarch64_general_expand_builtin): Call aarch64_expand_builtin_memtag.
  (AARCH64_BUILTIN_SUBCODE): New macro.
  (aarch64_resolve_overloaded_memtag): New.
  (aarch64_resolve_overloaded_builtin): New hook. Call
   

Re: [committed] Handle POLY_INT_CST in copy_reference_ops_from_ref

2019-11-12 Thread Andreas Schwab
On Nov 12 2019, Richard Sandiford wrote:

> I'll try to make the tests ILP32 clean once we're in stage 3, including
> fixing the problems that Andreas pointed out.

Note that the massive testsuite failures cause the gcc-testresults mail
to become so huge (> 4Mb) that gcc.gnu.org rejects it.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH] include size and offset in -Wstringop-overflow

2019-11-12 Thread Martin Sebor

On 11/11/19 10:10 PM, Jeff Law wrote:

On 11/6/19 3:34 PM, Martin Sebor wrote:

On 11/6/19 2:06 PM, Martin Sebor wrote:

On 11/6/19 1:39 PM, Jeff Law wrote:

On 11/6/19 1:27 PM, Martin Sebor wrote:

On 11/6/19 11:55 AM, Jeff Law wrote:

On 11/6/19 11:00 AM, Martin Sebor wrote:

The -Wstringop-overflow warnings for single-byte and multi-byte
stores mention the amount of data being stored and the amount of
space remaining in the destination, such as:

warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=]

 123 |   *p = 0;
 |   ~~~^~~
note: destination object declared here
  45 |   char b[N];
 |^

A warning like this can take some time to analyze.  First, the size
of the destination isn't mentioned and may not be easy to tell from
the sources.  In the note above, when N's value is the result of
some non-trivial computation, chasing it down may be a small project
in and of itself.  Second, it's also not clear why the region size
is zero.  It could be because the offset is exactly N, or because
it's negative, or because it's in some range greater than N.

Mentioning both the size of the destination object and the offset
makes the existing messages clearer, and will become essential when
GCC starts diagnosing overflow into allocated buffers (as my
follow-on patch does).
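The arithmetic behind the ambiguity is easy to model. Here is a minimal C sketch with hypothetical helper names: `remaining_space` returns the bytes left after an offset, and `format_offset_note` builds a note in the style of the enhanced message (using ASCII quotes instead of the typographic quotes in the example note). Neither function is part of the patch.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: bytes left in an object of OBJSIZE bytes when an
   access starts at OFFSET.  An offset equal to the size, past the end, or
   negative all leave zero bytes, which is why "a region of size 0" by
   itself does not say which case occurred.  */
long remaining_space(long objsize, long offset)
{
  if (offset < 0 || offset >= objsize)
    return 0;
  return objsize - offset;
}

/* Format a note that names both the offset and the object size.
   Returns the snprintf result: the length the full note would have.  */
int format_offset_note(char *buf, size_t len, const char *name,
                       long offset, long objsize)
{
  return snprintf(buf, len,
                  "at offset %ld to object '%s' with size %ld declared here",
                  offset, name, objsize);
}
```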

The attached patch enhances -Wstringop-overflow to do this by
letting compute_objsize return the offset to its caller, doing
something similar in get_stridx, and adding a new function to
the strlen pass to issue this enhanced warning (eventually, I'd
like the function to replace the -Wstringop-overflow handler in
builtins.c).  With the change, the note above might read something
like:

note: at offset 11 to object ‘b’ with size 8 declared here
  45 |   char b[N];
 |^

Tested on x86_64-linux.

Martin

gcc-store-offset.diff

gcc/ChangeLog:

  * builtins.c (compute_objsize): Add an argument and set it to offset
  into destination.
  * builtins.h (compute_objsize): Add an argument.
  * tree-object-size.c (addr_object_size): Add an argument and set it
  to offset into destination.
  (compute_builtin_object_size): Same.
  * tree-object-size.h (compute_builtin_object_size): Add an argument.
  * tree-ssa-strlen.c (get_addr_stridx): Add an argument and set it
  to offset into destination.
  (maybe_warn_overflow): New function.
  (handle_store): Call maybe_warn_overflow to issue warnings.

gcc/testsuite/ChangeLog:

  * c-c++-common/Wstringop-overflow-2.c: Adjust text of expected messages.
  * g++.dg/warn/Wstringop-overflow-3.C: Same.
  * gcc.dg/Wstringop-overflow-17.c: Same.




Index: gcc/tree-ssa-strlen.c
===
--- gcc/tree-ssa-strlen.c    (revision 277886)
+++ gcc/tree-ssa-strlen.c    (working copy)
@@ -189,6 +189,52 @@ struct laststmt_struct
    static int get_stridx_plus_constant (strinfo *, unsigned HOST_WIDE_INT, tree);
    static void handle_builtin_stxncpy (built_in_function, gimple_stmt_iterator *);

+/* Sets MINMAX to either the constant value or the range VAL is in
+   and returns true on success.  */
+
+static bool
+get_range (tree val, wide_int minmax[2], const vr_values *rvals = NULL)
+{
+  if (tree_fits_uhwi_p (val))
+    {
+  minmax[0] = minmax[1] = wi::to_wide (val);
+  return true;
+    }
+
+  if (TREE_CODE (val) != SSA_NAME)
+    return false;
+
+  if (rvals)
+    {
+  gimple *def = SSA_NAME_DEF_STMT (val);
+  if (gimple_assign_single_p (def)
+  && gimple_assign_rhs_code (def) == INTEGER_CST)
+    {
+  /* get_value_range returns [0, N] for constant assignments.  */
+  val = gimple_assign_rhs1 (def);
+  minmax[0] = minmax[1] = wi::to_wide (val);
+  return true;
+    }

Umm, something seems really off with this hunk.  If the SSA_NAME is set
via a simple constant assignment, then the range ought to be a singleton,
i.e. [CONST,CONST].  Is there a particular test where this is not true?

The only way offhand I could see this happening is if originally the RHS
wasn't a constant, but due to optimizations it either simplified into a
constant or a constant was propagated into an SSA_NAME appearing on the
RHS.  This would have to happen between the last range analysis and the
point where you're making this query.


Yes, I think that's right.  Here's an example where it happens:

    void f (void)
    {
  char s[] = "1234";
  unsigned n = strlen (s);
  char vla[n];   // or malloc (n)
  vla[n] = 0;    // n = [4, 4]
  ...
    }

The strlen call is folded to 4 but that's not propagated to
the access until sometime after the strlen pass is done.

Hmm.  Are we calling set_range_info in that case?  That goes behind the
back of the pass's instance of vr_values.  If so, that might argue we want
to be setting it in vr_values too.


No, set_range_info is only called for ranges.  In this case,
handle_builtin_strlen 

Re: [PATCH 5/7] Remove last leftover usage of params* files.

2019-11-12 Thread Harwath, Frederik
Hi Martin,

On 06.11.19 13:40, Martin Liska wrote:

>   (finalize_options_struct): Remove.

This patch has been committed by now, but it seems that a single use of 
finalize_options_struct has been overlooked
in gcc/tree-streamer-in.c.

Best regards,
Frederik



Re: [PATCH] Free dominance info at the beginning of pass_jump_after_combine

2019-11-12 Thread Ilya Leoshkevich
> On 12.11.2019 at 15:32, Segher Boessenkool wrote:
> 
> Hi!
> 
> On Tue, Nov 12, 2019 at 03:11:05PM +0100, Ilya Leoshkevich wrote:
>> try_forward_edges does not update dominance info, and merge_blocks
>> relies on it being up-to-date.  In PR92430 stale dominance info makes
>> merge_blocks produce a loop in the dominator tree, which in turn makes
>> delete_basic_block loop forever.
>> 
>> Fix by freeing dominance info at the beginning of cleanup_cfg.
> 
>> --- a/gcc/cfgcleanup.c
>> +++ b/gcc/cfgcleanup.c
>> @@ -3312,6 +3312,9 @@ public:
>> unsigned int
>> pass_jump_after_combine::execute (function *)
>> {
>> +  /* Jump threading does not keep dominators up-to-date.  */
>> +  free_dominance_info (CDI_DOMINATORS);
>> +  free_dominance_info (CDI_POST_DOMINATORS);
>>   cleanup_cfg (flag_thread_jumps ? CLEANUP_THREADING : 0);
>>   return 0;
>> }
> 
> Why do you always free it, if it only gets invalidated if flag_thread_jumps?
> 
> It may be a good idea to throw away the dom info anyway, but the comment
> seems off then?

Hmm, come to think of it, it would make sense to make flag_thread_jumps
a gate for this pass, and then run free_dominance_info (CDI_DOMINATORS)
and cleanup_cfg (CLEANUP_THREADING) unconditionally. What do you think?

Best regards,
Ilya


Re: [PATCH] [MIPS] Sanitize the constant argument for rotr3

2019-11-12 Thread Jeff Law
On 11/12/19 7:56 AM, Dragan Mladjenovic wrote:
> From: "Dragan Mladjenovic" 
> 
> This was dormant for quite some time, but it started happening for me
> on gcc.c-torture/compile/pr65153.c sometime after r276645 for -mabi=32 linux
> runs.
> 
> The pattern accepts any SMALL_OPERAND constant value, while it asserts during
> final that the value is in the mode size range.  In this case,
> combine_and_move_insns during IRA happens to make a pattern with a negative
> "shift count", which fails at the final stage.
> 
> This simple fix just truncates the constant operand to the mode size, the
> same as the shift patterns do.
> 
> gcc/ChangeLog:
> 
> 2019-11-12  Dragan Mladjenovic  
> 
>   * config/mips/mips.md (rotr3): Sanitize the constant argument
>   instead of asserting its value.
> ---
> 
> Ok, for trunk and backport to gcc 9 and 8 branches?
OK.  But I'm not sure the formatting is right.  The bit-and operator
should be indented so that it lines up with the start of INTVAL (...).

jeff



Re: [C++] Fix interaction between aka changes and DR1558 (PR92206)

2019-11-12 Thread Jason Merrill

On 10/25/19 2:53 PM, Richard Sandiford wrote:

One of the changes in r277281 was to make the typedef variant
handling in strip_typedefs pass the raw DECL_ORIGINAL_TYPE to the
recursive call, instead of applying TYPE_MAIN_VARIANT first.
This PR shows that that interacts badly with the implementation
of DR1558, because we then refuse to strip aliases with dependent
template parameters and trip:

   gcc_assert (!typedef_variant_p (result)
  || ((flags & STF_USER_VISIBLE)
  && !user_facing_original_type_p (result)));

Keeping the current behaviour but suppressing the ICE leads to a
duplicate error (the dg-bogus in the first test), so that didn't
seem like a good fix.

I assume keeping the alias should never actually be necessary for
DECL_ORIGINAL_TYPEs, because it will already have been checked
somewhere, even for implicit TYPE_DECLs.  This patch therefore
passes a flag to say that we can assume the type is validated
elsewhere.

It seems a rather clunky fix, sorry, but restoring the
TYPE_MAIN_VARIANT (...) isn't compatible with the aka stuff.

Bootstrapped & regression-tested on aarch64-linux-gnu.  OK to install?



2019-10-25  Richard Sandiford  

gcc/cp/
PR c++/92206
* cp-tree.h (STF_ASSUME_VALID): New constant.


Let's call this STF_STRIP_DEPENDENT.


* tree.c (strip_typedefs): Add STF_ASSUME_VALID to the flags
when calling strip_typedefs recursively on a DECL_ORIGINAL_TYPE.
Don't apply the fix for DR1558 in that case; allow aliases with
dependent template parameters to be stripped instead.

gcc/testsuite/
PR c++/92206
* g++.dg/pr92206-1.C: New test.
* g++.dg/pr92206-2.C: Likewise.
* g++.dg/pr92206-3.C: Likewise.


Let's call these g++.dg/cpp0x/alias-decl-pr92206*.

OK with those changes.

Jason



Re: [PATCH] Fix slowness in demangler

2019-11-12 Thread Ian Lance Taylor via gcc-patches
On Tue, Nov 12, 2019 at 6:15 AM Tim Rühsen  wrote:
>
> this is a proposal to fix
> https://sourceware.org/bugzilla/show_bug.cgi?id=25180
>
> In short:
> cxxfilt
> _ZZ1_DO1z1Dclaa1D1VEE1VE2zo
>
> takes several minutes with 100% CPU before it comes back with a result.
>
> With this patch the result is returned immediately. The test suite in
> binutils-gdb/libiberty/ throws no error.
>
> I'd like to note that I am not subscribed to the list, so please add me
> to CC when replying. Thanks in advance.

This is OK with an appropriate ChangeLog entry.

Thanks.

Ian


Re: [committed] Handle POLY_INT_CST in copy_reference_ops_from_ref

2019-11-12 Thread Richard Sandiford
Christophe Lyon  writes:
> On Fri, 8 Nov 2019 at 10:44, Richard Sandiford
>  wrote:
>>
>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  Applied as obvious.
>>
>
> Hi Richard,
>
> The new deref_2.c test fails with -mabi=ilp32:
> FAIL: gcc.target/aarch64/sve/acle/general/deref_2.c
> -march=armv8.2-a+sve (test for excess errors)
> Excess errors:
> /gcc/testsuite/gcc.target/aarch64/sve/acle/general/deref_2.c:17:39:
> error: no matching function for call to 'svld1(svbool_t&, int32_t*&)'
> /gcc/testsuite/gcc.target/aarch64/sve/acle/general/deref_2.c:17:38:
> error: invalid conversion from 'int32_t*' {aka 'long int*'} to 'const
> int*' [-fpermissive]
> /gcc/testsuite/gcc.target/aarch64/sve/acle/general/deref_2.c:17:38:
> error: invalid conversion from 'int32_t*' {aka 'long int*'} to 'const
> unsigned int*' [-fpermissive]
> /gcc/testsuite/gcc.target/aarch64/sve/acle/general/deref_2.c:18:43:
> error: no matching function for call to 'svld1(svbool_t&, int32_t*&)'
> /gcc/testsuite/gcc.target/aarch64/sve/acle/general/deref_2.c:18:42:
> error: invalid conversion from 'int32_t*' {aka 'long int*'} to 'const
> int*' [-fpermissive]
> /gcc/testsuite/gcc.target/aarch64/sve/acle/general/deref_2.c:18:42:
> error: invalid conversion from 'int32_t*' {aka 'long int*'} to 'const
> unsigned int*' [-fpermissive]

Ugh.  This is because for -mabi=ilp32, newlib's stdint.h defines int32_t
to be long int rather than int.  It's easy to update the ACLE code to use
the right stdint.h type, but the problem is that for ILP32, the C++ code:

  int x;
  f();

does not resolve to any of the overloads:

  void f(int8_t *);
  void f(int16_t *);
  void f(int32_t *);
  void f(int64_t *);

Of course, if int32_t was defined to "int" then the above would fail for
"long int", but it seems especially surprising that this doesn't work
for int.  These days using long int directly is usually a mistake anyway.

I guess this int != int32_t thing is just something that users will have
to live with if they care about compatibility with ILP32 newlib.

I'll try to make the tests ILP32 clean once we're in stage 3, including
fixing the problems that Andreas pointed out.

Thanks,
Richard


Re: [Patch] PR fortran/92470 Fixes for CFI_address

2019-11-12 Thread Paul Richard Thomas
Hi Tobias,

Thanks for taking care of this so rapidly :-)

OK for trunk and for 9-branch.

Cheers

Paul

On Tue, 12 Nov 2019 at 14:42, Tobias Burnus  wrote:
>
> Regarding the uncontroversial part: CFI_address. This has been reported
> by Vipul Parekh a few hours ago and the problem is: The lower bounds
> stored in a bind(C) descriptor are either 0 – or, for
> pointer/allocatable arrays, the value used during allocation/pointer
> association (cf. F2018, 18.5.3, para 3, quoted in the PR).
>
> But CFI_address was always assuming 0.
>
> When fixing it, ISO_Fortran_binding_1.f90 started to fail, and looking
> through the code, I ran into two problems related to the "lower_bound"s:
>
> (1) CFI_section: Nothing in the standard states which 'lower_bound's
> shall be used for 'result'. Creating a section in Fortran always gives
> .true. for "any(lbound(array()) == 1)", and the CFI array
> descriptors often use '0' where Fortran has '1'. Another option would be
> to propagate the specified array section on to the CFI descriptor (i.e.
> the specified lower_bounds if not NULL, or the "source"'s lower bounds
> if lower_bound is NULL); gfortran does the latter.
>
> (2) CFI_establish: For allocatables, it is clear – base_addr == NULL.
> For pointers, it is clear as well – it has to be '0' according to the
> standard. But for CFI_attribute_other …
>
> I have now asked at
> https://mailman.j3-fortran.org/pipermail/j3/2019-November/thread.html#11740
> – Bob thinks there might be an issue for (2) but both Bob and Bill claim
> that it is well-defined for (1). But I am not convinced. However, as it
> is unclear, I have now reverted my local changes and only kept the non
> lower_bound changes for CFI_establish/CFI_section.
>
> Additionally, the 'dv' value of CFI_establish is some pointer to memory
> which can hold an array descriptor. This memory can contain any garbage
> (e.g. via dv = malloc(…) with glibc's MALLOC_PERTURB_ set). Hence, it
> does not make sense to check 'dv' for a certain value.
>
> Build + regtested on x86_64-gnu-linux.
> OK for the trunk? Should it be backported to GCC 9?
>
> Cheers,
>
> Tobias
>


-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein


[PATCH] [MIPS] Sanitize the constant argument for rotr3

2019-11-12 Thread Dragan Mladjenovic
From: "Dragan Mladjenovic" 

This was dormant for quite some time, but it started happening for me
on gcc.c-torture/compile/pr65153.c sometime after r276645 for -mabi=32 linux
runs.

The pattern accepts any SMALL_OPERAND constant value, while it asserts during
final that the value is in the mode size range.  In this case,
combine_and_move_insns during IRA happens to make a pattern with a negative
"shift count", which fails at the final stage.

This simple fix just truncates the constant operand to the mode size, the
same as the shift patterns do.
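The new pattern body reduces a constant count modulo the mode width instead of asserting that it is already in range, so a negative "shift count" such as -1 becomes 31 for a 32-bit mode. A minimal C sketch of that arithmetic, assuming 32- and 64-bit mode widths; the helpers sanitize_rotate_count and ror32 are hypothetical and model only the expression, not the .md pattern itself:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the patch's expression
   INTVAL (operands[2]) & (GET_MODE_BITSIZE (mode) - 1).
   BITSIZE must be a power of two (32 or 64 here).  */
static unsigned
sanitize_rotate_count (long count, unsigned bitsize)
{
  /* Converting to unsigned long wraps negative counts modulo 2^N,
     so -1 becomes all-ones and the mask keeps only the low bits.  */
  return (unsigned) ((unsigned long) count & (bitsize - 1));
}

/* Hypothetical 32-bit rotate right using the sanitized count.  */
static uint32_t
ror32 (uint32_t x, long count)
{
  unsigned n = sanitize_rotate_count (count, 32);
  /* Shifting a 32-bit value by 32 is undefined in C, so handle 0 apart.  */
  return n == 0 ? x : (x >> n) | (x << (32 - n));
}
```

Masking only equals reduction modulo the width because the widths are powers of two; for a rotate, a count of n and n mod width give the same result, so nothing is lost by truncating.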

gcc/ChangeLog:

2019-11-12  Dragan Mladjenovic  

* config/mips/mips.md (rotr3): Sanitize the constant argument
instead of asserting its value.
---

Ok, for trunk and backport to gcc 9 and 8 branches?

 gcc/config/mips/mips.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 658f5e6..1d63aca 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -5845,8 +5845,8 @@
   "ISA_HAS_ROR"
 {
   if (CONST_INT_P (operands[2]))
-gcc_assert (INTVAL (operands[2]) >= 0
-   && INTVAL (operands[2]) < GET_MODE_BITSIZE (mode));
+operands[2] = GEN_INT (INTVAL (operands[2])
+ & (GET_MODE_BITSIZE (mode) - 1));
 
   return "ror\t%0,%1,%2";
 }
-- 
1.9.1



Re: [PATCH][arm][2/X] Implement __qadd, __qsub, __qdbl intrinsics

2019-11-12 Thread Christophe Lyon
On Thu, 7 Nov 2019 at 11:27, Kyrill Tkachov  wrote:
>
> Hi all,
>
> This patch implements some more Q-bit-setting intrinsics from ACLE.
> With the plumbing from patch 1 in place they are a simple builtin->RTL
> affair.
>
> Bootstrapped and tested on arm-none-linux-gnueabihf.
>
> Committing to trunk.
> Thanks,
> Kyrill
>
> 2019-11-07  Kyrylo Tkachov  
>
>  * config/arm/arm.md (arm_): New define_expand.
>  (arm__insn): New define_insn.
>  * config/arm/arm_acle.h (__qadd, __qsub, __qdbl): Define.
>  * config/arm/arm_acle_builtins.def: Add builtins for qadd, qsub.
>  * config/arm/iterators.md (SSPLUSMINUS): New code iterator.
>  (ss_op): New code_attr.
>
> 2019-11-07  Kyrylo Tkachov  
>
>  * gcc.target/arm/acle/dsp_arith.c: New test.
>

Hi Kyrill,

This new test fails when gcc is configured --with-cpu=cortex-m3:
FAIL: gcc.target/arm/acle/dsp_arith.c   -O0  (test for excess errors)
Excess errors:
/gcc/testsuite/gcc.target/arm/acle/dsp_arith.c:10:10: warning:
implicit declaration of function '__qadd'
[-Wimplicit-function-declaration]
/gcc/testsuite/gcc.target/arm/acle/dsp_arith.c:16:10: warning:
implicit declaration of function '__qdbl'
[-Wimplicit-function-declaration]
/gcc/testsuite/gcc.target/arm/acle/dsp_arith.c:24:10: warning:
implicit declaration of function '__qsub'
[-Wimplicit-function-declaration]

The new intrinsics are defined under __ARM_FEATURE_DSP but the
arm_qbit_flags effective target passes with "" as flags.
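For reference, the three intrinsics are signed saturating 32-bit operations (cf. the SSPLUSMINUS iterator in the ChangeLog) that set the sticky Q flag when they saturate, with __qdbl (x) behaving like __qadd (x, x). A portable sketch of those semantics that does not depend on __ARM_FEATURE_DSP; q_flag, saturate32 and the model_* functions are hypothetical names, not arm_acle.h API:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the sticky Q bit: set on saturation and never
   cleared by the operations themselves.  */
static int q_flag;

/* Clamp a 64-bit intermediate result to int32_t, recording saturation.  */
static int32_t
saturate32 (int64_t r)
{
  if (r > INT32_MAX)
    {
      q_flag = 1;
      return INT32_MAX;
    }
  if (r < INT32_MIN)
    {
      q_flag = 1;
      return INT32_MIN;
    }
  return (int32_t) r;
}

static int32_t model_qadd (int32_t a, int32_t b) { return saturate32 ((int64_t) a + b); }
static int32_t model_qsub (int32_t a, int32_t b) { return saturate32 ((int64_t) a - b); }

/* Saturating doubling, i.e. model_qadd (x, x).  */
static int32_t model_qdbl (int32_t x) { return model_qadd (x, x); }
```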

Christophe


[Patch] PR fortran/92470 Fixes for CFI_address

2019-11-12 Thread Tobias Burnus
Regarding the uncontroversial part: CFI_address. This has been reported 
by Vipul Parekh a few hours ago and the problem is: The lower bounds 
stored in a bind(C) descriptor are either 0 – or, for 
pointer/allocatable arrays, the value used during allocation/pointer 
association (cf. F2018, 18.5.3, para 3, quoted in the PR).


But CFI_address was always assuming 0.
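In other words, CFI_address has to interpret each subscript relative to the descriptor's lower_bound rather than assuming 0. A minimal sketch of that address computation, using simplified stand-in types: mini_dim, mini_desc and mini_address are hypothetical (the real CFI_dim_t and CFI_cdesc_t in ISO_Fortran_binding.h have more fields), and the range check is an illustration choice, not a claim about libgfortran:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for CFI_dim_t and CFI_cdesc_t.  */
typedef struct
{
  ptrdiff_t lower_bound;  /* Fortran lower bound of this dimension */
  ptrdiff_t extent;       /* number of elements in this dimension */
  ptrdiff_t sm;           /* memory stride in bytes */
} mini_dim;

typedef struct
{
  void *base_addr;
  int rank;
  mini_dim dim[2];
} mini_desc;

/* Address of the element selected by SUBSCRIPTS, which are given in
   Fortran terms, i.e. relative to each dimension's lower_bound.
   Returns NULL for an out-of-range subscript.  */
static void *
mini_address (const mini_desc *d, const ptrdiff_t subscripts[])
{
  char *p = (char *) d->base_addr;
  for (int k = 0; k < d->rank; k++)
    {
      ptrdiff_t off = subscripts[k] - d->dim[k].lower_bound;
      if (off < 0 || off >= d->dim[k].extent)
        return NULL;
      p += off * d->dim[k].sm;
    }
  return p;
}
```

With all lower bounds 0, as in CFI_attribute_other descriptors, this reduces to the old zero-based computation, which is consistent with the problem only affecting pointer and allocatable arrays.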

When fixing it, ISO_Fortran_binding_1.f90 started to fail, and looking 
through the code, I ran into two problems related to the "lower_bound"s:


(1) CFI_section: Nothing in the standard states which 'lower_bound's 
shall be used for 'result'. Creating a section in Fortran always gives 
.true. for "any(lbound(array()) == 1)", and the CFI array 
descriptors often use '0' where Fortran has '1'. Another option would be 
to propagate the specified array section on to the CFI descriptor (i.e. 
the specified lower_bounds if not NULL, or the "source"'s lower bounds 
if lower_bound is NULL); gfortran does the latter.


(2) CFI_establish: For allocatables, it is clear – base_addr == NULL. 
For pointers, it is clear as well – it has to be '0' according to the 
standard. But for CFI_attribute_other …


I have now asked at 
https://mailman.j3-fortran.org/pipermail/j3/2019-November/thread.html#11740 
– Bob thinks there might be an issue for (2) but both Bob and Bill claim 
that it is well-defined for (1). But I am not convinced. However, as it 
is unclear, I have now reverted my local changes and only kept the non 
lower_bound changes for CFI_establish/CFI_section.


Additionally, the 'dv' value of CFI_establish is some pointer to memory 
which can hold an array descriptor. This memory can contain any garbage 
(e.g. via dv = malloc(…) with glibc's MALLOC_PERTURB_ set). Hence, it 
does not make sense to check 'dv' for a certain value.


Build + regtested on x86_64-gnu-linux.
OK for the trunk? Should it be backported to GCC 9?

Cheers,

Tobias

2019-12-11  Tobias Burnus  

	libgfortran/
	PR fortran/92470
	* runtime/ISO_Fortran_binding.c (CFI_address): Handle non-zero
	lower_bound; update error message.
	(CFI_allocate): Fix comment typo.
	(CFI_establish): Fix indentation, fix typos, don't check values of 'dv'
	argument.

	gcc/testsuite/
	PR fortran/92470
	* gfortran.dg/ISO_Fortran_binding_17.c: New.
	* gfortran.dg/ISO_Fortran_binding_17.f90: New.
	* gfortran.dg/ISO_Fortran_binding_1.c (elemental_mult_c, allocate_c,
	section_c, select_part_c): Update for CFI_{address} changes;
	add asserts.

 gcc/testsuite/gfortran.dg/ISO_Fortran_binding_1.c  | 56 
 gcc/testsuite/gfortran.dg/ISO_Fortran_binding_17.c | 25 +++
 .../gfortran.dg/ISO_Fortran_binding_17.f90 | 77 ++
 libgfortran/runtime/ISO_Fortran_binding.c  | 40 +--
 4 files changed, 160 insertions(+), 38 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_1.c b/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_1.c
index a6353c7cca6..091e754d8f9 100644
--- a/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_1.c
+++ b/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_1.c
@@ -1,6 +1,7 @@
 /* Test F2008 18.5: ISO_Fortran_binding.h functions.  */
 
 #include "../../../libgfortran/ISO_Fortran_binding.h"
+#include 
 #include 
 #include 
 #include 
@@ -33,13 +34,34 @@ int elemental_mult_c(CFI_cdesc_t * a_desc, CFI_cdesc_t * b_desc,
   || c_desc->rank != 2)
 return err;
 
-  for (idx[0] = 0; idx[0] < a_desc->dim[0].extent; idx[0]++)
-for (idx[1] = 0; idx[1] < a_desc->dim[1].extent; idx[1]++)
-  {
-	res_addr = CFI_address (a_desc, idx);
-	*res_addr = *(int*)CFI_address (b_desc, idx)
-		* *(int*)CFI_address (c_desc, idx);
-  }
+  if (a_desc->attribute == CFI_attribute_other)
+{
+  assert (a_desc->dim[0].lower_bound == 0);
+  assert (a_desc->dim[1].lower_bound == 0);
+  for (idx[0] = 0; idx[0] < a_desc->dim[0].extent; idx[0]++)
+	for (idx[1] = 0; idx[1] < a_desc->dim[1].extent; idx[1]++)
+	  {
+	res_addr = CFI_address (a_desc, idx);
+	*res_addr = *(int*)CFI_address (b_desc, idx)
+			* *(int*)CFI_address (c_desc, idx);
+	  }
+}
+  else
+{
+  assert (a_desc->attribute == CFI_attribute_allocatable
+	  || a_desc->attribute == CFI_attribute_pointer);
+  for (idx[0] = a_desc->dim[0].lower_bound;
+	   idx[0] < a_desc->dim[0].extent + a_desc->dim[0].lower_bound;
+	   idx[0]++)
+	for (idx[1] = a_desc->dim[1].lower_bound;
+	 idx[1] < a_desc->dim[1].extent + a_desc->dim[1].lower_bound;
+	 idx[1]++)
+	  {
+	res_addr = CFI_address (a_desc, idx);
+	*res_addr = *(int*)CFI_address (b_desc, idx)
+			* *(int*)CFI_address (c_desc, idx);
+	  }
+}
 
   return 0;
 }
@@ -57,15 +79,16 @@ int allocate_c(CFI_cdesc_t * da, CFI_index_t lower[], CFI_index_t upper[])
   CFI_index_t idx[2];
   int *res_addr;
 
+  if (da->attribute == CFI_attribute_other) return err;
   if (CFI_allocate(da, lower, upper, 0)) return err;
+  assert (da->dim[0].lower_bound == lower[0]);
+  assert 

Re: [PATCH 7/7 libgomp,amdgcn] GCN Libgomp Plugin

2019-11-12 Thread Andrew Stubbs

On 12/11/2019 14:01, Jakub Jelinek wrote:

On Tue, Nov 12, 2019 at 01:29:16PM +, Andrew Stubbs wrote:

2019-11-12  Andrew Stubbs  

libgomp/
* plugin/Makefrag.am: Add amdgcn plugin support.
* plugin/configfrag.ac: Likewise.
* plugin/plugin-gcn.c: New file.
* configure: Regenerate.


I'm a little bit worried about the elf.h include, not all targets might have
it, but perhaps that can be resolved incrementally if somebody reports it.


We only support running on x86_64 hosts that have the ROCm tools 
installed. Access to elf.h ought not to be a big deal.


When we move to HSACO v3 binaries then we should be able to drop the 
manual relocation handling, so this will go away then.


Thanks

Andrew


[PR c++/6936] Delete duplicate test

2019-11-12 Thread Nathan Sidwell

6936 and using38 are the same test.  Deleting one of them.

nathan
--
Nathan Sidwell
2019-11-12  Nathan Sidwell  

	* g++.dg/lookup/pr6936.C: Delete, identical to using38.C

Index: g++.dg/lookup/pr6936.C
===
--- g++.dg/lookup/pr6936.C	(revision 278094)
+++ g++.dg/lookup/pr6936.C	(working copy)
@@ -1,23 +0,0 @@
-// { dg-do compile }
-// PR c++/6936
-
-struct Baser
-{
-enum { j, i }; // { dg-message "declared" }
-};
-
-struct Base : Baser
-{
-static void j();
-static void i();
-};
-
-struct Derv : Base
-{
-  using Baser::j;
-private:
-  using Baser::i;
-};
-
-int k = Derv::j;
-int l = Derv::i; // { dg-error "context" }


[PATCH] PR90838: Support ctz idioms

2019-11-12 Thread Wilco Dijkstra
Hi,

Support common idioms for count trailing zeroes using an array lookup.
The canonical form is array[((x & -x) * C) >> SHIFT] where C is a magic
constant which when multiplied by a power of 2 contains a unique value
in the top 5 or 6 bits.  This is then indexed into a table which maps it
to the number of trailing zeroes.  When the table is valid, we emit a
sequence using the target defined value for ctz (0):

int ctz1 (unsigned x)
{
  static const char table[32] =
{
  0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
  31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};

  return table[((unsigned)((x & -x) * 0x077CB531U)) >> 27];
}

Is optimized to:

rbit w0, w0
clz w0, w0
and w0, w0, 31
ret
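The table-lookup idiom can be checked directly in portable C. The sketch below assumes a 32-bit unsigned int, reuses ctz1's table and magic constant, and adds a hypothetical checker, table_is_ctz, that loosely mirrors the validation loop in optimize_count_trailing_zeroes: for each power of two 1 << i, the top bits of (1 << i) * C must index an entry equal to i. For x == 0, x & -x is 0, so ctz1 returns table[0] (here 0), which appears consistent with the final "and w0, w0, 31" above mapping a 32-bit ctz (0) result of 32 back to 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Table and magic constant copied from ctz1 above.  */
static const char table[32] =
  {
    0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
    31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
  };

static int
ctz1 (unsigned x)
{
  /* x & -x isolates the lowest set bit; the multiply places a unique
     5-bit pattern in the top bits, which the shift extracts.  */
  return table[((unsigned) ((x & -x) * 0x077CB531U)) >> 27];
}

/* Hypothetical checker: true if TBL maps ((1 << i) * MAGIC) >> SHIFT back
   to i for every i in [0, 32).  SHIFT must be at least 27 so that every
   index stays within TBL's 32 entries.  */
static bool
table_is_ctz (const char *tbl, unsigned magic, unsigned shift)
{
  for (unsigned i = 0; i < 32; i++)
    {
      unsigned idx = (magic << i) >> shift;  /* (1u << i) * magic mod 2^32 */
      if (tbl[idx] != (char) i)
        return false;
    }
  return true;
}

/* Portable reference: count trailing zeros one bit at a time.  */
static int
ctz_ref (unsigned x)
{
  int n = 0;
  if (x == 0)
    return 32;
  while ((x & 1u) == 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}
```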

Bootstrapped on AArch64. OK for commit?

ChangeLog:

2019-11-12  Wilco Dijkstra  

PR tree-optimization/90838
* generic-match-head.c (optimize_count_trailing_zeroes):
Add stub function.
* gimple-match-head.c (gimple_simplify): Add support for ARRAY_REF.
(optimize_count_trailing_zeroes): Add new function.
* match.pd: Add matching for ctz idioms.
* testsuite/gcc.target/aarch64/pr90838.c: New test.

--

diff --git a/gcc/generic-match-head.c b/gcc/generic-match-head.c
index 
fdc603977fc5b03a843944f75ce262f5d2256308..5a38bd233585225d60f0159c9042a16d9fdc9d80
 100644
--- a/gcc/generic-match-head.c
+++ b/gcc/generic-match-head.c
@@ -88,3 +88,10 @@ optimize_successive_divisions_p (tree, tree)
 {
   return false;
 }
+
+static bool
+optimize_count_trailing_zeroes (tree type, tree array_ref, tree input,
+   tree mulc, tree shift, tree _val)
+{
+  return false;
+}
diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
index 
53278168a59f5ac10ce6760f04fd42589a0792e7..2d3b305f8ea54e4ca31c64994af30b34bb7eff09
 100644
--- a/gcc/gimple-match-head.c
+++ b/gcc/gimple-match-head.c
@@ -909,6 +909,24 @@ gimple_simplify (gimple *stmt, gimple_match_op *res_op, 
gimple_seq *seq,
res_op->set_op (TREE_CODE (op0), type, valueized);
return true;
  }
+   else if (code == ARRAY_REF)
+ {
+   tree rhs1 = gimple_assign_rhs1 (stmt);
+   tree op1 = TREE_OPERAND (rhs1, 1);
+   tree op2 = TREE_OPERAND (rhs1, 2);
+   tree op3 = TREE_OPERAND (rhs1, 3);
+   tree op0 = TREE_OPERAND (rhs1, 0);
+   bool valueized = false;
+
+   op0 = do_valueize (op0, top_valueize, valueized);
+   op1 = do_valueize (op1, top_valueize, valueized);
+
+   if (op2 && op3)
+ res_op->set_op (code, type, op0, op1, op2, op3);
+   else
+ res_op->set_op (code, type, op0, op1);
+   return gimple_resimplify4 (seq, res_op, valueize) || valueized;
+ }
break;
  case GIMPLE_UNARY_RHS:
{
@@ -1222,3 +1240,57 @@ optimize_successive_divisions_p (tree divisor, tree 
inner_div)
 }
   return true;
 }
+
+/* Recognize count trailing zeroes idiom.
+   The canonical form is array[((x & -x) * C) >> SHIFT] where C is a magic
+   constant which when multiplied by a power of 2 contains a unique value
+   in the top 5 or 6 bits.  This is then indexed into a table which maps it
+   to the number of trailing zeroes.  Array[0] is returned so the caller can
+   emit an appropriate sequence depending on whether ctz (0) is defined on
+   the target.  */
+static bool
+optimize_count_trailing_zeroes (tree type, tree array, tree x, tree mulc,
+   tree tshift, tree _val)
+{
+  gcc_assert (TREE_CODE (mulc) == INTEGER_CST);
+  gcc_assert (TREE_CODE (tshift) == INTEGER_CST);
+
+  tree input_type = TREE_TYPE (x);
+
+  if (!direct_internal_fn_supported_p (IFN_CTZ, input_type, OPTIMIZE_FOR_BOTH))
+return false;
+
+  unsigned HOST_WIDE_INT val = tree_to_uhwi (mulc);
+  unsigned shiftval = tree_to_uhwi (tshift);
+  unsigned input_bits = tree_to_shwi (TYPE_SIZE (input_type));
+
+  /* Check the array is not wider than integer type and the input is a 32-bit
+ or 64-bit type.  The shift should extract the top 5..7 bits.  */
+  if (TYPE_PRECISION (type) > 32)
+return false;
+  if (input_bits != 32 && input_bits != 64)
+return false;
+  if (shiftval < input_bits - 7 || shiftval > input_bits - 5)
+return false;
+
+  tree t = build4 (ARRAY_REF, type, array, size_int (0), NULL_TREE, NULL_TREE);
+  t = fold_const_aggregate_ref (t);
+  if (t == NULL)
+return false;
+
+  zero_val = build_int_cst (integer_type_node, tree_to_shwi (t));
+
+  for (unsigned i = 0; i < input_bits; i++, val <<= 1)
+{
+  if (input_bits == 32)
+   val &= 0x;
+  t = build4 (ARRAY_REF, type, array, size_int ((int)(val >> shiftval)),
+ NULL_TREE, NULL_TREE);
+  t = fold_const_aggregate_ref (t);
+  if (t == NULL || tree_to_shwi 

Re: [PATCH] Free dominance info at the beginning of pass_jump_after_combine

2019-11-12 Thread Segher Boessenkool
Hi!

On Tue, Nov 12, 2019 at 03:11:05PM +0100, Ilya Leoshkevich wrote:
> try_forward_edges does not update dominance info, and merge_blocks
> relies on it being up-to-date.  In PR92430 stale dominance info makes
> merge_blocks produce a loop in the dominator tree, which in turn makes
> delete_basic_block loop forever.
> 
> Fix by freeing dominance info at the beginning of cleanup_cfg.

> --- a/gcc/cfgcleanup.c
> +++ b/gcc/cfgcleanup.c
> @@ -3312,6 +3312,9 @@ public:
>  unsigned int
>  pass_jump_after_combine::execute (function *)
>  {
> +  /* Jump threading does not keep dominators up-to-date.  */
> +  free_dominance_info (CDI_DOMINATORS);
> +  free_dominance_info (CDI_POST_DOMINATORS);
>cleanup_cfg (flag_thread_jumps ? CLEANUP_THREADING : 0);
>return 0;
>  }

Why do you always free it, if it only gets invalidated if flag_thread_jumps?

It may be a good idea to throw away the dom info anyway, but the comment
seems off then?


Segher


Re: [patch, fortran] Load scalar intent-in variables at the beginning of procedures

2019-11-12 Thread Tobias Burnus

Hi Thomas,

On 11/12/19 1:42 PM, Thomas König wrote:

Ah, of course. I should have said module procedures. Or even module procedures 
without bind(C)?

It would probably be the latter. The change would actually be rather small: If 
conditions are met, just add attr.value for INTENT(IN). This is something we 
should probably do when we are forced into doing an ABI change by other 
circumstances.


Will this still work if one does:

module m
contains
integer function val(y)
  integer, intent(in) :: y
  val = 2*y
end function val
end module m

use m
interface
  integer function proc(z)
integer, intent(in) :: z
  end function proc
end interface
procedure(proc), pointer :: ff
ff => val
print *, ff(10)
end

Tobias



Re: [PATCH 4/7 libgomp,amdgcn] GCN libgomp port

2019-11-12 Thread Andrew Stubbs

On 12/11/2019 13:46, Jakub Jelinek wrote:

On Tue, Nov 12, 2019 at 01:29:13PM +, Andrew Stubbs wrote:

2019-11-12  Andrew Stubbs  

include/
* gomp-constants.h (GOMP_DEVICE_GCN): Define.
(GOMP_VERSION_GCN): Define.


Perhaps this could be 0, but not a big deal.


OG9 uses 0 and is not binary compatible; this was a deliberate bump.


libgomp/
* Makefile.am (libgomp_la_SOURCES): Add oacc-target.c.
* Makefile.in: Regenerate.
* config.h.in (PLUGIN_GCN): Add new undef.
* config/accel/openacc.f90 (acc_device_gcn): New parameter.
* config/gcn/affinity-fmt.c: New file.
* config/gcn/bar.c: New file.
* config/gcn/bar.h: New file.
* config/gcn/doacross.h: New file.
* config/gcn/icv-device.c: New file.
* config/gcn/oacc-target.c: New file.
* config/gcn/simple-bar.h: New file.
* config/gcn/target.c: New file.
* config/gcn/task.c: New file.
* config/gcn/team.c: New file.
* config/gcn/time.c: New file.
* configure.ac: Add amdgcn*-*-*.
* configure: Regenerate.
* configure.tgt: Add amdgcn*-*-*.
* libgomp-plugin.h (offload_target_type): Add OFFLOAD_TARGET_TYPE_GCN.
* libgomp.h (gcn_thrs): Add amdgcn variant.
(set_gcn_thrs): Likewise.
(gomp_thread): Likewise.
* oacc-int.h (goacc_thread): Likewise.
* oacc-target.c: New file.
* openacc.f90 (acc_device_gcn): New parameter.
* openacc.h (acc_device_t): Add acc_device_gcn.
* team.c (gomp_free_pool_helper): Add amdgcn support.


Ok, thanks.


Thanks.

Andrew


[PATCH] Fix slowness in demangler

2019-11-12 Thread Tim Rühsen
Hi,

this is a proposal to fix
https://sourceware.org/bugzilla/show_bug.cgi?id=25180

In short:
cxxfilt
_ZZ1_DO1z1Dclaa1D1VEE1VE2zo

takes several minutes with 100% CPU before it comes back with a result.

With this patch the result is returned immediately. The test suite in
binutils-gdb/libiberty/ throws no error.

I'd like to note that I am not subscribed to the list, so please add me
to CC when replying. Thanks in advance.

Regards, Tim
From 27f770b2d9ff4e381431612c41ce18d4b44a6667 Mon Sep 17 00:00:00 2001
From: Tim Rühsen
Date: Tue, 12 Nov 2019 13:10:47 +0100
Subject: [PATCH] [libiberty] Fix demangler slowness issue

Fixes #25180

The demangler works in two passes.  The first one counts certain items
(templates and scopes).  It was missing the protection against
traversing the same subtrees multiple times, which the recursion limit
alone does not catch.  The second pass already had this protection.
Without it, it was possible to craft input that used excessive CPU time.

The fix uses the same mechanism as pass 2 to counteract this kind
of (malicious) input.
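The guard added to d_count_templates_scopes caps how often any single component is revisited. Because the component tree can share subtrees, a naive walk can take time exponential in the input size while its depth stays small, so a recursion limit alone never triggers. The sketch below models this with a hypothetical struct node and a chain in which every node points to the next one twice; MAX_DEPTH is a hypothetical stand-in for MAX_RECURSION_COUNT, and depth is tracked with a parameter rather than dpi->recursion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type; shared children make this a DAG.  */
struct node
{
  struct node *left, *right;
  int counting;   /* plays the role of the new d_counting field */
};

#define MAX_DEPTH 1024   /* hypothetical stand-in for MAX_RECURSION_COUNT */

static unsigned long visits;   /* number of node visits performed */

static void
count_nodes (struct node *n, int depth)
{
  /* Same shape as the patched guard: stop on NULL, on nodes already
     visited twice, or when the recursion gets too deep.  */
  if (n == NULL || n->counting > 1 || depth > MAX_DEPTH)
    return;
  ++n->counting;
  ++visits;
  count_nodes (n->left, depth + 1);
  count_nodes (n->right, depth + 1);
}

/* Build a chain where each node points at the next node twice, so an
   unguarded walk would visit node i 2^i times.  Returns the number of
   visits the guarded walk makes.  */
static unsigned long
demo_chain (struct node *nodes, int len)
{
  for (int i = 0; i < len; i++)
    {
      nodes[i].left = nodes[i].right = (i + 1 < len) ? &nodes[i + 1] : NULL;
      nodes[i].counting = 0;
    }
  visits = 0;
  count_nodes (nodes, 0);
  return visits;
}
```

For a 40-node chain an unguarded walk would make 2^40 - 1 visits; with the counter each node is visited at most twice, giving 2 * len - 1 = 79 visits.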
---
 include/demangle.h  |  1 +
 libiberty/cp-demangle.c | 15 +++
 libiberty/cp-demint.c   |  3 +++
 3 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/include/demangle.h b/include/demangle.h
index f5d9b9e8b5..3b00dbc31a 100644
--- a/include/demangle.h
+++ b/include/demangle.h
@@ -481,6 +481,7 @@ struct demangle_component
  Initialize to zero.  Private to d_print_comp.
  All other fields are final after initialization.  */
   int d_printing;
+  int d_counting;
 
   union
   {
diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c
index aa78c86dd4..f7c4dbbd11 100644
--- a/libiberty/cp-demangle.c
+++ b/libiberty/cp-demangle.c
@@ -517,7 +517,7 @@ d_growable_string_callback_adapter (const char *, size_t, void *);
 
 static void
 d_print_init (struct d_print_info *, demangle_callbackref, void *,
-	  const struct demangle_component *);
+	  struct demangle_component *);
 
 static inline void d_print_error (struct d_print_info *);
 
@@ -864,6 +864,7 @@ cplus_demangle_fill_name (struct demangle_component *p, const char *s, int len)
   if (p == NULL || s == NULL || len <= 0)
 return 0;
   p->d_printing = 0;
+  p->d_counting = 0;
   p->type = DEMANGLE_COMPONENT_NAME;
   p->u.s_name.s = s;
   p->u.s_name.len = len;
@@ -880,6 +881,7 @@ cplus_demangle_fill_extended_operator (struct demangle_component *p, int args,
   if (p == NULL || args < 0 || name == NULL)
 return 0;
   p->d_printing = 0;
+  p->d_counting = 0;
   p->type = DEMANGLE_COMPONENT_EXTENDED_OPERATOR;
   p->u.s_extended_operator.args = args;
   p->u.s_extended_operator.name = name;
@@ -900,6 +902,7 @@ cplus_demangle_fill_ctor (struct demangle_component *p,
   || (int) kind > gnu_v3_object_ctor_group)
 return 0;
   p->d_printing = 0;
+  p->d_counting = 0;
   p->type = DEMANGLE_COMPONENT_CTOR;
   p->u.s_ctor.kind = kind;
   p->u.s_ctor.name = name;
@@ -920,6 +923,7 @@ cplus_demangle_fill_dtor (struct demangle_component *p,
   || (int) kind > gnu_v3_object_dtor_group)
 return 0;
   p->d_printing = 0;
+  p->d_counting = 0;
   p->type = DEMANGLE_COMPONENT_DTOR;
   p->u.s_dtor.kind = kind;
   p->u.s_dtor.name = name;
@@ -937,6 +941,7 @@ d_make_empty (struct d_info *di)
 return NULL;
   p = >comps[di->next_comp];
   p->d_printing = 0;
+  p->d_counting = 0;
   ++di->next_comp;
   return p;
 }
@@ -4068,11 +4073,13 @@ d_growable_string_callback_adapter (const char *s, size_t l, void *opaque)
 
 static void
 d_count_templates_scopes (struct d_print_info *dpi,
-			  const struct demangle_component *dc)
+			  struct demangle_component *dc)
 {
-  if (dc == NULL)
+  if (dc == NULL || dc->d_counting > 1 || dpi->recursion > MAX_RECURSION_COUNT)
 return;
 
+  ++ dc->d_counting;
+
   switch (dc->type)
 {
 case DEMANGLE_COMPONENT_NAME:
@@ -4202,7 +4209,7 @@ d_count_templates_scopes (struct d_print_info *dpi,
 
 static void
 d_print_init (struct d_print_info *dpi, demangle_callbackref callback,
-	  void *opaque, const struct demangle_component *dc)
+	  void *opaque, struct demangle_component *dc)
 {
   dpi->len = 0;
   dpi->last_char = '\0';
diff --git a/libiberty/cp-demint.c b/libiberty/cp-demint.c
index 950e4dc552..16bf1f8ce6 100644
--- a/libiberty/cp-demint.c
+++ b/libiberty/cp-demint.c
@@ -125,6 +125,7 @@ cplus_demangle_fill_component (struct demangle_component *p,
   p->u.s_binary.left = left;
   p->u.s_binary.right = right;
   p->d_printing = 0;
+  p->d_counting = 0;
 
   return 1;
 }
@@ -149,6 +150,7 @@ cplus_demangle_fill_builtin_type (struct demangle_component *p,
 	  p->type = DEMANGLE_COMPONENT_BUILTIN_TYPE;
 	  p->u.s_builtin.type = _demangle_builtin_types[i];
 	  p->d_printing = 0;
+	  p->d_counting = 0;
 	  return 1;
 	}
 }
@@ -176,6 +178,7 @@ cplus_demangle_fill_operator (struct demangle_component *p,
 	  p->type = 

Re: [PATCH] Free dominance info at the beginning of pass_jump_after_combine

2019-11-12 Thread Richard Biener
On Tue, 12 Nov 2019, Ilya Leoshkevich wrote:

> Bootstrapped and regtested on x86_64-redhat-linux, s390x-redhat-linux
> and ppc64le-redhat-linux.  OK for trunk and gcc-9-branch?
> 
> try_forward_edges does not update dominance info, and merge_blocks
> relies on it being up-to-date.  In PR92430 stale dominance info makes
> merge_blocks produce a loop in the dominator tree, which in turn makes
> delete_basic_block loop forever.
> 
> Fix by freeing dominance info at the beginning of cleanup_cfg.

You can omit freeing CDI_POST_DOMINATORS, those are never kept
across passes.

OK with that change.

Richard.

> gcc/ChangeLog:
> 
> 2019-11-12  Ilya Leoshkevich  
> 
>   PR rtl-optimization/92430
>   * cfgcleanup.c (pass_jump_after_combine::execute): Free
>   dominance info at the beginning.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-11-12  Ilya Leoshkevich  
> 
>   PR rtl-optimization/92430
>   * gcc.dg/pr92430.c: New test (from Arseny Solokha).
> ---
>  gcc/cfgcleanup.c   |  3 +++
>  gcc/testsuite/gcc.dg/pr92430.c | 25 +
>  2 files changed, 28 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/pr92430.c
> 
> diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c
> index 835f7d79ea4..20096de88b4 100644
> --- a/gcc/cfgcleanup.c
> +++ b/gcc/cfgcleanup.c
> @@ -3312,6 +3312,9 @@ public:
>  unsigned int
>  pass_jump_after_combine::execute (function *)
>  {
> +  /* Jump threading does not keep dominators up-to-date.  */
> +  free_dominance_info (CDI_DOMINATORS);
> +  free_dominance_info (CDI_POST_DOMINATORS);
>    cleanup_cfg (flag_thread_jumps ? CLEANUP_THREADING : 0);
>    return 0;
>  }
> diff --git a/gcc/testsuite/gcc.dg/pr92430.c b/gcc/testsuite/gcc.dg/pr92430.c
> new file mode 100644
> index 000..915606893ba
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr92430.c
> @@ -0,0 +1,25 @@
> +// PR rtl-optimization/92430
> +// { dg-do compile }
> +// { dg-options "-Os -fno-if-conversion -fno-tree-dce -fno-tree-loop-optimize -fno-tree-vrp" }
> +
> +int eb, ko;
> +
> +void
> +e9 (int pe, int lx)
> +{
> +  int ir;
> +
> +  for (ir = 0; ir < 1; ++ir)
> +    {
> +      for (ko = 0; ko < 1; ++ko)
> +        {
> +          for (eb = 0; eb < 1; ++eb)
> +            ko += pe;
> +
> +          for (ko = 0; ko < 1; ++ko)
> +            ;
> +        }
> +
> +      pe = ir = lx;
> +    }
> +}
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

[PATCH] Free dominance info at the beginning of pass_jump_after_combine

2019-11-12 Thread Ilya Leoshkevich
Bootstrapped and regtested on x86_64-redhat-linux, s390x-redhat-linux
and ppc64le-redhat-linux.  OK for trunk and gcc-9-branch?

try_forward_edges does not update dominance info, and merge_blocks
relies on it being up-to-date.  In PR92430 stale dominance info makes
merge_blocks produce a loop in the dominator tree, which in turn makes
delete_basic_block loop forever.

Fix by freeing dominance info at the beginning of pass_jump_after_combine, before cleanup_cfg runs.

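One way a loop in the dominator tree can turn into a hang is a walk up the immediate-dominator chain. The naive `while (b != root) b = idom[b];` never terminates if two blocks end up dominating each other. The sketch below uses a hypothetical array-based representation, not GCC's dominance API. It applies Floyd's cycle detection to tell a well-formed chain from a corrupted one. The patch itself takes the simpler route of discarding the stale information before cleanup_cfg runs.

```c
#include <assert.h>
#include <stdbool.h>

/* IDOM[i] is the immediate dominator of block i (hypothetical layout).
   Return true if following IDOM from BLOCK reaches ROOT, or false if
   the chain runs into a cycle that never reaches ROOT.  */
static bool
idom_chain_reaches_root (const int *idom, int block, int root)
{
  /* Floyd's cycle detection: SLOW advances one step, FAST two.  */
  int slow = block, fast = block;
  while (true)
    {
      if (fast == root || idom[fast] == root)
        return true;
      fast = idom[idom[fast]];
      slow = idom[slow];
      if (slow == fast)
        return false; /* corrupted tree: a loop that skips the root */
    }
}
```
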
gcc/ChangeLog:

2019-11-12  Ilya Leoshkevich  

PR rtl-optimization/92430
* cfgcleanup.c (pass_jump_after_combine::execute): Free
dominance info at the beginning.

gcc/testsuite/ChangeLog:

2019-11-12  Ilya Leoshkevich  

PR rtl-optimization/92430
* gcc.dg/pr92430.c: New test (from Arseny Solokha).
---
 gcc/cfgcleanup.c   |  3 +++
 gcc/testsuite/gcc.dg/pr92430.c | 25 +
 2 files changed, 28 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr92430.c

diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c
index 835f7d79ea4..20096de88b4 100644
--- a/gcc/cfgcleanup.c
+++ b/gcc/cfgcleanup.c
@@ -3312,6 +3312,9 @@ public:
 unsigned int
 pass_jump_after_combine::execute (function *)
 {
+  /* Jump threading does not keep dominators up-to-date.  */
+  free_dominance_info (CDI_DOMINATORS);
+  free_dominance_info (CDI_POST_DOMINATORS);
   cleanup_cfg (flag_thread_jumps ? CLEANUP_THREADING : 0);
   return 0;
 }
diff --git a/gcc/testsuite/gcc.dg/pr92430.c b/gcc/testsuite/gcc.dg/pr92430.c
new file mode 100644
index 000..915606893ba
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr92430.c
@@ -0,0 +1,25 @@
+// PR rtl-optimization/92430
+// { dg-do compile }
+// { dg-options "-Os -fno-if-conversion -fno-tree-dce -fno-tree-loop-optimize -fno-tree-vrp" }
+
+int eb, ko;
+
+void
+e9 (int pe, int lx)
+{
+  int ir;
+
+  for (ir = 0; ir < 1; ++ir)
+    {
+      for (ko = 0; ko < 1; ++ko)
+        {
+          for (eb = 0; eb < 1; ++eb)
+            ko += pe;
+
+          for (ko = 0; ko < 1; ++ko)
+            ;
+        }
+
+      pe = ir = lx;
+    }
+}
-- 
2.23.0



Re: [PATCH 7/7 libgomp,amdgcn] GCN Libgomp Plugin

2019-11-12 Thread Jakub Jelinek
On Tue, Nov 12, 2019 at 01:29:16PM +, Andrew Stubbs wrote:
> 2019-11-12  Andrew Stubbs  
> 
>   libgomp/
>   * plugin/Makefrag.am: Add amdgcn plugin support.
>   * plugin/configfrag.ac: Likewise.
>   * plugin/plugin-gcn.c: New file.
>   * configure: Regenerate.

I'm a little bit worried about the elf.h include: not all targets might have
it, but perhaps that can be resolved incrementally if somebody reports it.

Ok.

Jakub



Re: [PATCH 5/7 libgomp,amdgcn] Optimize GCN OpenMP malloc performance

2019-11-12 Thread Jakub Jelinek
On Tue, Nov 12, 2019 at 01:29:14PM +, Andrew Stubbs wrote:
> 2019-11-12  Andrew Stubbs  
> 
>   libgomp/
>   * config/gcn/team.c (gomp_gcn_enter_kernel): Set up the team arena
>   and use team_malloc variants.
>   (gomp_gcn_exit_kernel): Use team_free.
>   * libgomp.h (TEAM_ARENA_SIZE): Define.
>   (TEAM_ARENA_FREE): Define.
>   (TEAM_ARENA_END): Define.
>   (team_malloc): New function.
>   (team_malloc_cleared): New function.
>   (team_free): New function.
>   * team.c (gomp_new_team): Use team_malloc.
>   (free_team): Use team_free.
>   (gomp_free_thread): Use team_free.
>   (gomp_pause_host): Use team_free.
>   * work.c (gomp_init_work_share): Use team_malloc.
>   (gomp_fini_work_share): Use team_free.

> +  /* Handle OOM.  */
> +  if (result + size > *(void * __lds *)TEAM_ARENA_END)
> +{
> +  const char msg[] = "GCN team arena exhausted\n";
> +  write (2, msg, sizeof(msg)-1);
> +  /* It's better to continue with reeduced performance than abort.

s/reeduced/reduced/

I'm not really sure it is a good idea to print anything, at least outside of
some debugging mode.  I mean, it is fairly easy to write code that will
trigger this.  And what is the reason you can't free the gomp_malloc'ed
memory, e.g. by checking whether the pointer passed to team_free lies between
TEAM_ARENA_START and TEAM_ARENA_END, doing nothing in that case and calling
free otherwise?

> + Beware that this won't get freed, which might cause more problems.  */
> +  result = gomp_malloc (size);
> +}
> +  return result;
> +}
> +
> +static inline void * __attribute__((malloc)) __attribute__((optimize("-O3")))
> +team_malloc_cleared (size_t size)
> +{
> +  char *result = team_malloc (size);
> +
> +  /* Clear the allocated memory.
> + This should vectorize.  The allocation has been rounded up to the next
> + 4-byte boundary, so this is safe.  */
> +  for (int i = 0; i<size; i+=4)
> +    *(int*)(result+i) = 0;

Formatting (spaces around <, +=, +, between int and *.  Shouldn't 4 be
sizeof (int)?  And wouldn't memset (result, 0, size); do the same job?

Jakub
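
Jakub's two suggestions can be combined in one sketch. It is single-threaded and illustrative only. The `arena` buffer and the `*_sketch` functions are hypothetical stand-ins for the LDS-based TEAM_ARENA_* slots. This version also checks the bounds before bumping the free pointer, whereas the patch performs an atomic add first. In this sketch, team_free ignores pointers inside the arena, because that memory dies with the team, and frees anything else, since such memory can only have come from the heap fallback. The cleared variant simply uses memset:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one team's arena.  */
static _Alignas (8) char arena[256];
static char *arena_start = arena;
static char *arena_free = arena;
static char *arena_end = arena + sizeof arena;

static void *
team_malloc_sketch (size_t size)
{
  size = (size + 3) & ~(size_t) 3;  /* 4-byte align the size */
  if ((size_t) (arena_end - arena_free) < size)
    return malloc (size);           /* arena exhausted: heap fallback */
  void *result = arena_free;
  arena_free += size;
  return result;
}

static void *
team_malloc_cleared_sketch (size_t size)
{
  void *result = team_malloc_sketch (size);
  if (result != NULL)
    memset (result, 0, size);       /* instead of a hand-written loop */
  return result;
}

static void
team_free_sketch (void *ptr)
{
  /* Compare as integers: the pointers may not share an array.  */
  uintptr_t p = (uintptr_t) ptr;
  if (p >= (uintptr_t) arena_start && p < (uintptr_t) arena_end)
    return;                         /* arena memory dies with the team */
  free (ptr);
}
```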



Re: Teach ipa-cp to propagate value ranges over binary operations too

2019-11-12 Thread Jan Hubicka
> > + tree op = ipa_get_jf_pass_through_operand (jfunc);
> > + value_range op_vr (op, op);
> > + value_range op_res,res;
> > +
> 
> Do we really know operation is tcc_binary here?
Constant propagation already assumes that at the same spot:

  if (TREE_CODE_CLASS (opcode) == tcc_unary)
res = fold_unary (opcode, res_type, input);
  else
res = fold_binary (opcode, res_type, input,
   ipa_get_jf_pass_through_operand (jfunc));


> 
> > + range_fold_binary_expr (&op_res, operation, operand_type,
> > + &src_lats->m_value_range.m_vr, &op_vr);
> > + ipa_vr_operation_and_type_effects (&vr,
> > +&op_res,
> > +NOP_EXPR, param_type,
> > +operand_type);
> 
> I hope this one deals with undefined/varying/etc. properly.

ipa_vr_operation will return false if the result is varying/undefined, which
I ignore here, but then...
> 
> I'm also worried about types here - but I know too little about
> how we construct the jump function to say whether it's OK.
> 
> > +   }
> > +  if (!vr.undefined_p () && !vr.varying_p ())
> > +   {
> > + if (jfunc->m_vr)
> > +   {
> > + value_range jvr;
> > + if (ipa_vr_operation_and_type_effects (&jvr, jfunc->m_vr,
> > +NOP_EXPR,
> > +param_type,
> > +jfunc->m_vr->type ()))
> > +   vr.intersect (*jfunc->m_vr);
> > +   }
> > + return dest_lat->meet_with (&vr);
The return will not happen here, and we will end up on the same path as
before, assuming we know nothing.
> > }
> >  }
> >else if (jfunc->type == IPA_JF_CONST)
> > 
> 
> -- 
> Richard Biener 

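Jan's reply describes the following flow:

- Fold the caller's range with the constant operand.
- Drop a varying or undefined result.
- Otherwise, intersect the result with the jump function's own range before meeting.

That flow can be illustrated on plain integer intervals. This is a minimal sketch with hypothetical types and names (`struct irange`, `range_plus_const`, `range_intersect`), not GCC's value_range API. It models only a PLUS_EXPR with a constant operand:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical interval: VARYING means "anything", UNDEFINED is empty.  */
enum vr_kind { VR_UNDEFINED, VR_RANGE, VR_VARYING };

struct irange
{
  enum vr_kind kind;
  int lo, hi;
};

/* [lo, hi] + c; give up (VARYING) if a bound would overflow.  */
static struct irange
range_plus_const (struct irange r, int c)
{
  if (r.kind != VR_RANGE)
    return r;
  if ((c > 0 && r.hi > INT_MAX - c) || (c < 0 && r.lo < INT_MIN - c))
    return (struct irange) { VR_VARYING, INT_MIN, INT_MAX };
  return (struct irange) { VR_RANGE, r.lo + c, r.hi + c };
}

/* Intersection of two intervals; an empty result is UNDEFINED.  */
static struct irange
range_intersect (struct irange a, struct irange b)
{
  if (a.kind == VR_UNDEFINED || b.kind == VR_UNDEFINED)
    return (struct irange) { VR_UNDEFINED, 0, 0 };
  if (a.kind == VR_VARYING)
    return b;
  if (b.kind == VR_VARYING)
    return a;
  int lo = a.lo > b.lo ? a.lo : b.lo;
  int hi = a.hi < b.hi ? a.hi : b.hi;
  if (lo > hi)
    return (struct irange) { VR_UNDEFINED, 0, 0 };
  return (struct irange) { VR_RANGE, lo, hi };
}

/* Only a proper range is worth meeting into the callee's lattice.  */
static bool
range_useful_p (struct irange r)
{
  return r.kind == VR_RANGE;
}
```

For example, a caller range of [0, 10] plus 4 gives [4, 14]. Intersecting that with a jump-function range of [8, 100] leaves [8, 14].
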


Re: [PATCH 4/7 libgomp,amdgcn] GCN libgomp port

2019-11-12 Thread Jakub Jelinek
On Tue, Nov 12, 2019 at 01:29:13PM +, Andrew Stubbs wrote:
> 2019-11-12  Andrew Stubbs  
> 
>   include/
>   * gomp-constants.h (GOMP_DEVICE_GCN): Define.
>   (GOMP_VERSION_GCN): Define.

Perhaps this could be 0, but not a big deal.

>   libgomp/
>   * Makefile.am (libgomp_la_SOURCES): Add oacc-target.c.
>   * Makefile.in: Regenerate.
>   * config.h.in (PLUGIN_GCN): Add new undef.
>   * config/accel/openacc.f90 (acc_device_gcn): New parameter.
>   * config/gcn/affinity-fmt.c: New file.
>   * config/gcn/bar.c: New file.
>   * config/gcn/bar.h: New file.
>   * config/gcn/doacross.h: New file.
>   * config/gcn/icv-device.c: New file.
>   * config/gcn/oacc-target.c: New file.
>   * config/gcn/simple-bar.h: New file.
>   * config/gcn/target.c: New file.
>   * config/gcn/task.c: New file.
>   * config/gcn/team.c: New file.
>   * config/gcn/time.c: New file.
>   * configure.ac: Add amdgcn*-*-*.
>   * configure: Regenerate.
>   * configure.tgt: Add amdgcn*-*-*.
>   * libgomp-plugin.h (offload_target_type): Add OFFLOAD_TARGET_TYPE_GCN.
>   * libgomp.h (gcn_thrs): Add amdgcn variant.
>   (set_gcn_thrs): Likewise.
>   (gomp_thread): Likewise.
>   * oacc-int.h (goacc_thread): Likewise.
>   * oacc-target.c: New file.
>   * openacc.f90 (acc_device_gcn): New parameter.
>   * openacc.h (acc_device_t): Add acc_device_gcn.
>   * team.c (gomp_free_pool_helper): Add amdgcn support.

Ok, thanks.

Jakub



Re: [PATCH 3/7 libgomp,nvptx] Add device number to GOMP_OFFLOAD_openacc_async_construct

2019-11-12 Thread Jakub Jelinek
On Tue, Nov 12, 2019 at 01:29:12PM +, Andrew Stubbs wrote:
> 2019-11-12  Andrew Stubbs  
> 
>   libgomp/
>   * libgomp-plugin.h (GOMP_OFFLOAD_openacc_async_construct): Add int
>   parameter.
>   * oacc-async.c (lookup_goacc_asyncqueue): Pass device number to the
>   queue constructor.
>   * oacc-host.c (host_openacc_async_construct): Add device parameter.
>   * plugin/plugin-nvptx.c (GOMP_OFFLOAD_openacc_async_construct): Add
>   device parameter.

LGTM.

Jakub



Re: [PATCH 1/7 libgomp,nvptx] Move generic libgomp files from nvptx to accel

2019-11-12 Thread Jakub Jelinek
On Tue, Nov 12, 2019 at 01:29:10PM +, Andrew Stubbs wrote:
> 2019-11-12  Andrew Stubbs  
> 
>   libgomp/
>   * configure.tgt (nvptx*-*-*): Add "accel" directory.
>   * config/nvptx/libgomp-plugin.c: Move ...
>   * config/accel/libgomp-plugin.c: ... to here.
>   * config/nvptx/lock.c: Move ...
>   * config/accel/lock.c: ... to here.
>   * config/nvptx/mutex.c: Move ...
>   * config/accel/mutex.c: ... to here.
>   * config/nvptx/mutex.h: Move ...
>   * config/accel/mutex.h: ... to here.
>   * config/nvptx/oacc-async.c: Move ...
>   * config/accel/oacc-async.c: ... to here.
>   * config/nvptx/oacc-cuda.c: Move ...
>   * config/accel/oacc-cuda.c: ... to here.
>   * config/nvptx/oacc-host.c: Move ...
>   * config/accel/oacc-host.c: ... to here.
>   * config/nvptx/oacc-init.c: Move ...
>   * config/accel/oacc-init.c: ... to here.
>   * config/nvptx/oacc-mem.c: Move ...
>   * config/accel/oacc-mem.c: ... to here.
>   * config/nvptx/oacc-plugin.c: Move ...
>   * config/accel/oacc-plugin.c: ... to here.
>   * config/nvptx/omp-lock.h: Move ...
>   * config/accel/omp-lock.h: ... to here.
>   * config/nvptx/openacc.f90: Move ...
>   * config/accel/openacc.f90: ... to here.
>   * config/nvptx/pool.h: Move ...
>   * config/accel/pool.h: ... to here.
>   * config/nvptx/proc.c: Move ...
>   * config/accel/proc.c: ... to here.
>   * config/nvptx/ptrlock.c: Move ...
>   * config/accel/ptrlock.c: ... to here.
>   * config/nvptx/ptrlock.h: Move ...
>   * config/accel/ptrlock.h: ... to here.
>   * config/nvptx/sem.c: Move ...
>   * config/accel/sem.c: ... to here.
>   * config/nvptx/sem.h: Move ...
>   * config/accel/sem.h: ... to here.
>   * config/nvptx/thread-stacksize.h: Move ...
>   * config/accel/thread-stacksize.h: ... to here.

Ok, thanks.

Jakub



Re: [PATCH 2/2] gdbinit.in: fix wrong reference to function argument

2019-11-12 Thread Andreas Schwab
On Nov 12 2019, Konstantin Kharlamov wrote:

> Besides, I suspect, the number of actual users of this gdbinit is around
> zero, otherwise someone would have noticed the warning that gdb prints on
> every usage of these functions while the PATCH 1/2 is not applied.

It's easy to ignore it.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: Teach ipa-cp to propagate value ranges over binary operations too

2019-11-12 Thread Richard Biener
On Tue, 12 Nov 2019, Jan Hubicka wrote:

> Hi,
> this patch adds propagation of value ranges through binary operations.
> This is disabled for value ranges within SCCs to avoid an infinite loop during
> propagation.  I am a bit worried about types here.  As far as I can tell, we
> have something like
> 
> VR in lattice of type1
> foo (type1 param)
> {
>   bar ((type3)((type2)param+(type2)4))
> }
> bar (type4 param)
> {
>use param
> }
> 
> In the code, type1 is called "operand_type" and type4 is called param_type.
> The arithmetic always happens in operand_type, but I do not see why these
> necessarily need to be the same.  Anyway, this imitates what constant jump
> functions do.
> 
> I also noticed that we use NOP_EXPR to convert from type1 all the way to
> type4, while ipa-fnsummary uses VIEW_CONVERT_EXPR to convert type3 to type4,
> which seems more valid here.  However, the VR folders always return varying
> on VIEW_CONVERT_EXPR (which is probably something that can be fixed).
> 
> Bootstrapped/regtested x86_64-linux. Does this look OK?
> 
> Honza
>   * ipa-cp.c (propagate_vr_across_jump_function): Also propagate
>   binary operations.
> 
> Index: ipa-cp.c
> ===
> --- ipa-cp.c  (revision 278094)
> +++ ipa-cp.c  (working copy)
> @@ -1974,23 +2039,51 @@ propagate_vr_across_jump_function (cgrap
>if (jfunc->type == IPA_JF_PASS_THROUGH)
>  {
>enum tree_code operation = ipa_get_jf_pass_through_operation (jfunc);
> +  class ipa_node_params *caller_info = IPA_NODE_REF (cs->caller);
> +  int src_idx = ipa_get_jf_pass_through_formal_id (jfunc);
> +  class ipcp_param_lattices *src_lats
> + = ipa_get_parm_lattices (caller_info, src_idx);
> +  tree operand_type = ipa_get_type (caller_info, src_idx);
>  
> +  if (src_lats->m_value_range.bottom_p ())
> + return dest_lat->set_to_bottom ();
> +
> +  value_range vr;
>if (TREE_CODE_CLASS (operation) == tcc_unary)
>   {
> -   class ipa_node_params *caller_info = IPA_NODE_REF (cs->caller);
> -   int src_idx = ipa_get_jf_pass_through_formal_id (jfunc);
> -   tree operand_type = ipa_get_type (caller_info, src_idx);
> -   class ipcp_param_lattices *src_lats
> - = ipa_get_parm_lattices (caller_info, src_idx);
> -
> -   if (src_lats->m_value_range.bottom_p ())
> - return dest_lat->set_to_bottom ();
> -   value_range vr;
> -   if (ipa_vr_operation_and_type_effects (&vr,
> -  &src_lats->m_value_range.m_vr,
> -  operation, param_type,
> -  operand_type))
> - return dest_lat->meet_with (&vr);
> +   ipa_vr_operation_and_type_effects (&vr,
> +  &src_lats->m_value_range.m_vr,
> +  operation, param_type,
> +  operand_type);
> + }
> +  /* A crude way to prevent unbounded number of value range updates
> +  in SCC components.  We should allow limited number of updates within
> +  SCC, too.  */
> +  else if (!ipa_edge_within_scc (cs))
> + {
> +   tree op = ipa_get_jf_pass_through_operand (jfunc);
> +   value_range op_vr (op, op);
> +   value_range op_res,res;
> +

Do we really know operation is tcc_binary here?

> +   range_fold_binary_expr (&op_res, operation, operand_type,
> +   &src_lats->m_value_range.m_vr, &op_vr);
> +   ipa_vr_operation_and_type_effects (&vr,
> +  &op_res,
> +  NOP_EXPR, param_type,
> +  operand_type);

I hope this one deals with undefined/varying/etc. properly.

I'm also worried about types here - but I know too little about
how we construct the jump function to say whether it's OK.

> + }
> +  if (!vr.undefined_p () && !vr.varying_p ())
> + {
> +   if (jfunc->m_vr)
> + {
> +   value_range jvr;
> +   if (ipa_vr_operation_and_type_effects (&jvr, jfunc->m_vr,
> +  NOP_EXPR,
> +  param_type,
> +  jfunc->m_vr->type ()))
> + vr.intersect (*jfunc->m_vr);
> + }
> +   return dest_lat->meet_with (&vr);
>   }
>  }
>else if (jfunc->type == IPA_JF_CONST)
> 

-- 
Richard Biener 

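The type worries in this thread come down to one point. Converting a range to a narrower parameter type with a NOP_EXPR-style conversion is only exact when every value in the range fits. The following minimal sketch uses hypothetical `struct range64`/`struct range8` types to stand in for operand_type and param_type. When the bounds fit it keeps them, and otherwise it conservatively gives up. A real implementation might instead compute the wrapped range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ranges of a 64-bit and an 8-bit signed value.  */
struct range64 { bool varying; int64_t lo, hi; };
struct range8 { bool varying; int8_t lo, hi; };

/* Convert [lo, hi] from int64_t to int8_t.  If every value fits, the
   bounds carry over unchanged; otherwise some values would wrap, so
   this sketch returns VARYING.  */
static struct range8
convert_range_to_int8 (struct range64 r)
{
  struct range8 res = { true, INT8_MIN, INT8_MAX };
  if (!r.varying && r.lo >= INT8_MIN && r.hi <= INT8_MAX)
    {
      res.varying = false;
      res.lo = (int8_t) r.lo;
      res.hi = (int8_t) r.hi;
    }
  return res;
}
```
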
Re: [PATCH] Refactor tree-loop-distribution for thread safety

2019-11-12 Thread Giuliano Belinassi
Hi, Richard.

On 11/12, Richard Biener wrote:
> On Sat, Nov 9, 2019 at 3:26 PM Giuliano Belinassi
>  wrote:
> >
> > Hi all,
> >
> > This patch refactors tree-loop-distribution.c for thread safety without
> > use of C11 __thread feature. All global variables were moved to a struct
> > which is initialized at ::execute time.
> 
> Thanks for working on this.  I've been thinking about how to make this
> nicer which naturally leads to the use of C++ classes and member
> functions which get 'this' for free.  This means all functions that
> make use of 'priv' in your patch would need to become member
> functions of the class and pass_loop_distribution::execute would
> wrap it like

Wouldn't it require that we have one instance of the
`pass_loop_distribution` class for each thread? I don't know how
the pass manager would handle this in its current state, but I will
probably need to patch it in my branch.

> 
> unsigned int
> pass_loop_distribution::execute (function *fun)
> {
>   return priv_pass_vars().execute (fun);
> }


> 
> please find a better name for 'priv_pass_vars' since you can't
> reuse that name for other passes due to C++ ODR rules.
> I would suggest 'loop_distribution'.
> 
> Can you try if going this route works well?

Of course :)

> 
> Thanks,
> Richard.
> 
> > I can install this patch myself in trunk if it's OK.
> >
> > gcc/ChangeLog
> > 2019-11-09  Giuliano Belinassi  
> >
> > * cfgloop.c (get_loop_body_in_custom_order): New.
> > * cfgloop.h (get_loop_body_in_custom_order): New prototype.
> > * tree-loop-distribution.c (struct priv_pass_vars): New.
> > (bb_top_order_cmp_r): New.
> > (create_rdg_vertices): Update prototype.
> > (stmts_from_loop): Same as above.
> > (update_for_merge): Same as above.
> > (partition_merge_into): Same as above.
> > (get_data_dependence): Same as above.
> > (data_dep_in_cycle_p): Same as above.
> > (update_type_for_merge): Same as above.
> > (build_rdg_partition_for_vertex): Same as above.
> > (classify_builtin_ldst): Same as above.
> > (classify_partition): Same as above.
> > (share_memory_accesses): Same as above.
> > (rdg_build_partitions): Same as above.
> > (pg_add_dependence_edges): Same as above.
> > (build_partition_graph): Same as above.
> > (merge_dep_scc_partitions): Same as above.
> > (break_alias_scc_partitions): Same as above.
> > (finalize_partitions): Same as above.
> > (distribute_loop): Same as above.
> > (bb_top_order_init): New function.
> > (bb_top_order_destroy): New function.
> > (pass_loop_distribution::execute): Initialize struct priv.
> >
> > Thank you,
> > Giuliano.

Thank you,
Giuliano.

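The refactoring discussed here replaces file-scope globals with a state object that is created afresh for each execute call. Richard proposes doing this with a C++ class and member functions. The C sketch below shows the same idea with an explicit context pointer. `struct loop_dist_state`, `note_stmt`, `create_partition` and `execute_pass` are hypothetical and are not the tree-loop-distribution.c code. Because the state lives on the stack of each call, the pass object itself does not need to hold it, and separate invocations cannot see each other's leftovers:

```c
#include <assert.h>

/* Hypothetical pass state that previously lived in globals.  */
struct loop_dist_state
{
  int partitions_created;
  int stmts_seen;
};

static void
note_stmt (struct loop_dist_state *st)
{
  st->stmts_seen++;
}

static void
create_partition (struct loop_dist_state *st)
{
  st->partitions_created++;
}

/* Each call owns a fresh state object; helpers receive it explicitly,
   much as C++ member functions would receive 'this'.  */
static int
execute_pass (int nstmts)
{
  struct loop_dist_state st = { 0, 0 };
  for (int i = 0; i < nstmts; i++)
    {
      note_stmt (&st);
      if (i % 2 == 0)
        create_partition (&st);
    }
  return st.partitions_created;
}
```
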

[PATCH 6/7 amdgcn] Use a single worker for OpenACC on AMD GCN

2019-11-12 Thread Andrew Stubbs
This patch prevents the compiler from using multiple workers in a gang.  This
should be reverted when worker support is committed.

I will commit this with the rest of the series.

Andrew


2019-11-12  Andrew Stubbs  
Julian Brown  

gcc/
* config/gcn/gcn.c (gcn_goacc_validate_dims): Ensure
flag_worker_partitioning is not set.
(TARGET_GOACC_WORKER_PARTITIONING): Remove target hook definition.
* config/gcn/gcn.opt (macc-experimental-workers): Default to off.
---
 gcc/config/gcn/gcn.c   | 4 ++--
 gcc/config/gcn/gcn.opt | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index cdd24277cf6..1a69737f693 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -4695,6 +4695,8 @@ gcn_goacc_validate_dims (tree decl, int dims[], int fn_level,
   /* FIXME: remove -facc-experimental-workers when they're ready.  */
   int max_workers = flag_worker_partitioning ? 16 : 1;
 
+  gcc_assert (!flag_worker_partitioning);
+
   /* The vector size must appear to be 64, to the user, unless this is a
  SEQ routine.  The real, internal value is always 1, which means use
  autovectorization, but the user should not see that.  */
@@ -6073,8 +6075,6 @@ print_operand (FILE *file, rtx x, int code)
 #define TARGET_GOACC_REDUCTION gcn_goacc_reduction
 #undef  TARGET_GOACC_VALIDATE_DIMS
 #define TARGET_GOACC_VALIDATE_DIMS gcn_goacc_validate_dims
-#undef  TARGET_GOACC_WORKER_PARTITIONING
-#define TARGET_GOACC_WORKER_PARTITIONING true
 #undef  TARGET_HARD_REGNO_MODE_OK
 #define TARGET_HARD_REGNO_MODE_OK gcn_hard_regno_mode_ok
 #undef  TARGET_HARD_REGNO_NREGS
diff --git a/gcc/config/gcn/gcn.opt b/gcc/config/gcn/gcn.opt
index bdc878f35ad..402deb625bd 100644
--- a/gcc/config/gcn/gcn.opt
+++ b/gcc/config/gcn/gcn.opt
@@ -65,7 +65,7 @@ Target Report RejectNegative Var(flag_bypass_init_error)
 bool flag_worker_partitioning = false
 
 macc-experimental-workers
-Target Report Var(flag_worker_partitioning) Init(1)
+Target Report Var(flag_worker_partitioning) Init(0)
 
 int stack_size_opt = -1
 


[PATCH 5/7 libgomp,amdgcn] Optimize GCN OpenMP malloc performance

2019-11-12 Thread Andrew Stubbs
This patch implements a malloc optimization to reduce the startup and
shutdown overhead of each OpenMP team.

Two new allocation functions, "team_malloc" and "team_free", take memory
from a per-team memory arena provided by the plugin, rather than from the
shared heap space, which is slow and gets slower the more teams try to
allocate at once.

These new functions are used both in the gcn/team.c file and in selected
places elsewhere in libgomp.  Arena space is limited (and larger arenas
have greater overhead at launch time), so this should not be a global
search-and-replace.

Dummy pass-through definitions are provided for other targets.
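
The per-team arena can be pictured as one host-provided block sliced by team id, with a lock-free bump pointer per slice. In the patch, the slicing happens in gomp_gcn_enter_kernel and the bump is done by the ds_add_rtn_u64 inline asm in team_malloc. The following is a minimal host-side sketch using C11 atomics. `ARENA_SIZE`, `NUM_TEAMS`, `struct team_arena` and the function names are hypothetical. Each caller gets a distinct chunk via atomic_fetch_add. As in the patch, a failed allocation still advances the pointer, and the caller must fall back to something else:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE 1024 /* hypothetical per-team slice size */
#define NUM_TEAMS 4     /* hypothetical number of teams */

/* One large block, as if allocated on the host and sliced per team.  */
static _Alignas (8) unsigned char arena_block[ARENA_SIZE * NUM_TEAMS];

struct team_arena
{
  _Atomic uintptr_t next; /* next unallocated address */
  uintptr_t end;          /* one past the last usable byte */
};

static void
team_arena_init (struct team_arena *a, int teamid)
{
  uintptr_t start = (uintptr_t) (arena_block + (size_t) ARENA_SIZE * teamid);
  atomic_init (&a->next, start);
  a->end = start + ARENA_SIZE;
}

/* Lock-free bump allocation; NULL when the slice is exhausted.  */
static void *
team_arena_alloc (struct team_arena *a, size_t size)
{
  size = (size + 3) & ~(size_t) 3; /* 4-byte granularity */
  uintptr_t result = atomic_fetch_add (&a->next, size);
  if (result + size > a->end)
    return NULL;
  return (void *) result;
}
```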

OK to commit?

Thanks

Andrew


2019-11-12  Andrew Stubbs  

libgomp/
* config/gcn/team.c (gomp_gcn_enter_kernel): Set up the team arena
and use team_malloc variants.
(gomp_gcn_exit_kernel): Use team_free.
* libgomp.h (TEAM_ARENA_SIZE): Define.
(TEAM_ARENA_FREE): Define.
(TEAM_ARENA_END): Define.
(team_malloc): New function.
(team_malloc_cleared): New function.
(team_free): New function.
* team.c (gomp_new_team): Use team_malloc.
(free_team): Use team_free.
(gomp_free_thread): Use team_free.
(gomp_pause_host): Use team_free.
* work.c (gomp_init_work_share): Use team_malloc.
(gomp_fini_work_share): Use team_free.
---
 libgomp/config/gcn/team.c | 18 ++---
 libgomp/libgomp.h | 56 +++
 libgomp/team.c| 12 -
 libgomp/work.c|  4 +--
 4 files changed, 78 insertions(+), 12 deletions(-)

diff --git a/libgomp/config/gcn/team.c b/libgomp/config/gcn/team.c
index c566482bda2..063571fc751 100644
--- a/libgomp/config/gcn/team.c
+++ b/libgomp/config/gcn/team.c
@@ -57,16 +57,26 @@ gomp_gcn_enter_kernel (void)
   /* Starting additional threads is not supported.  */
   gomp_global_icv.dyn_var = true;
 
+  /* Initialize the team arena for optimized memory allocation.
+ The arena has been allocated on the host side, and the address
+ passed in via the kernargs.  Each team takes a small slice of it.  */
+  register void **kernargs asm("s8");
+  void *team_arena = (kernargs[4] + TEAM_ARENA_SIZE*teamid);
+  void * __lds *arena_free = (void * __lds *)TEAM_ARENA_FREE;
+  void * __lds *arena_end = (void * __lds *)TEAM_ARENA_END;
+  *arena_free = team_arena;
+  *arena_end = team_arena + TEAM_ARENA_SIZE;
+
   /* Allocate and initialize the team-local-storage data.  */
-  struct gomp_thread *thrs = gomp_malloc_cleared (sizeof (*thrs)
+  struct gomp_thread *thrs = team_malloc_cleared (sizeof (*thrs)
 		  * numthreads);
   set_gcn_thrs (thrs);
 
   /* Allocate and initailize a pool of threads in the team.
  The threads are already running, of course, we just need to manage
  the communication between them.  */
-  struct gomp_thread_pool *pool = gomp_malloc (sizeof (*pool));
-  pool->threads = gomp_malloc (sizeof (void *) * numthreads);
+  struct gomp_thread_pool *pool = team_malloc (sizeof (*pool));
+  pool->threads = team_malloc (sizeof (void *) * numthreads);
   for (int tid = 0; tid < numthreads; tid++)
	pool->threads[tid] = &thrs[tid];
   pool->threads_size = numthreads;
@@ -91,7 +101,7 @@ void
 gomp_gcn_exit_kernel (void)
 {
   gomp_free_thread (gcn_thrs ());
-  free (gcn_thrs ());
+  team_free (gcn_thrs ());
 }
 
 /* This function contains the idle loop in which a thread waits
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 19e1241ee4c..659aeb95ffe 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -106,6 +106,62 @@ extern void gomp_aligned_free (void *);
GCC's builtin alloca().  */
 #define gomp_alloca(x)  __builtin_alloca(x)
 
+/* Optimized allocators for team-specific data that will die with the team.  */
+
+#ifdef __AMDGCN__
+/* The arena is initialized in config/gcn/team.c.  */
+#define TEAM_ARENA_SIZE 64*1024  /* Must match the value in plugin-gcn.c.  */
+#define TEAM_ARENA_FREE 16  /* LDS offset of free pointer.  */
+#define TEAM_ARENA_END  24  /* LDS offset of end pointer.  */
+
+static inline void * __attribute__((malloc))
+team_malloc (size_t size)
+{
+  /* 4-byte align the size.  */
+  size = (size + 3) & ~3;
+
+  /* Allocate directly from the arena.
+ The compiler does not support DS atomics, yet. */
+  void *result;
+  asm ("ds_add_rtn_u64 %0, %1, %2\n\ts_waitcnt 0"
+   : "=v"(result) : "v"(TEAM_ARENA_FREE), "v"(size), "e"(1L) : "memory");
+
+  /* Handle OOM.  */
+  if (result + size > *(void * __lds *)TEAM_ARENA_END)
+{
+  const char msg[] = "GCN team arena exhausted\n";
+  write (2, msg, sizeof(msg)-1);
+  /* It's better to continue with reeduced performance than abort.
+ Beware that this won't get freed, which might cause more problems.  */
+  result = gomp_malloc (size);
+}
+  return 

[PATCH 7/7 libgomp,amdgcn] GCN Libgomp Plugin

2019-11-12 Thread Andrew Stubbs
This patch contributes the GCN libgomp plugin, with the various
configure and make bits to go with it.

This implementation is a much-cleaned-up version of the one present on the
openacc-gcc-9-branch.

OK to commit?

Thanks

Andrew


2019-11-12  Andrew Stubbs  

libgomp/
* plugin/Makefrag.am: Add amdgcn plugin support.
* plugin/configfrag.ac: Likewise.
* plugin/plugin-gcn.c: New file.
* configure: Regenerate.
---
 libgomp/plugin/Makefrag.am   |   14 +
 libgomp/plugin/configfrag.ac |   35 +
 libgomp/plugin/plugin-gcn.c  | 3985 ++
 3 files changed, 4034 insertions(+)
 create mode 100644 libgomp/plugin/plugin-gcn.c

diff --git a/libgomp/plugin/Makefrag.am b/libgomp/plugin/Makefrag.am
index 168ef59de41..45ed043e333 100644
--- a/libgomp/plugin/Makefrag.am
+++ b/libgomp/plugin/Makefrag.am
@@ -52,3 +52,17 @@ libgomp_plugin_hsa_la_LDFLAGS += $(PLUGIN_HSA_LDFLAGS)
 libgomp_plugin_hsa_la_LIBADD = libgomp.la $(PLUGIN_HSA_LIBS)
 libgomp_plugin_hsa_la_LIBTOOLFLAGS = --tag=disable-static
 endif
+
+if PLUGIN_GCN
+# AMD GCN plugin
+libgomp_plugin_gcn_version_info = -version-info $(libtool_VERSION)
+toolexeclib_LTLIBRARIES += libgomp-plugin-gcn.la
+libgomp_plugin_gcn_la_SOURCES = plugin/plugin-gcn.c
+libgomp_plugin_gcn_la_CPPFLAGS = $(AM_CPPFLAGS) $(PLUGIN_GCN_CPPFLAGS) \
+	-D_GNU_SOURCE
+libgomp_plugin_gcn_la_LDFLAGS = $(libgomp_plugin_gcn_version_info) \
+	$(lt_host_flags)
+libgomp_plugin_gcn_la_LDFLAGS += $(PLUGIN_GCN_LDFLAGS)
+libgomp_plugin_gcn_la_LIBADD = libgomp.la $(PLUGIN_GCN_LIBS)
+libgomp_plugin_gcn_la_LIBTOOLFLAGS = --tag=disable-static
+endif
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index 9718ac752e2..424ec6c96b2 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -137,6 +137,15 @@ AC_SUBST(PLUGIN_HSA_CPPFLAGS)
 AC_SUBST(PLUGIN_HSA_LDFLAGS)
 AC_SUBST(PLUGIN_HSA_LIBS)
 
+PLUGIN_GCN=0
+PLUGIN_GCN_CPPFLAGS=
+PLUGIN_GCN_LDFLAGS=
+PLUGIN_GCN_LIBS=
+AC_SUBST(PLUGIN_GCN)
+AC_SUBST(PLUGIN_GCN_CPPFLAGS)
+AC_SUBST(PLUGIN_GCN_LDFLAGS)
+AC_SUBST(PLUGIN_GCN_LIBS)
+
 # Parse '--enable-offload-targets', figure out the corresponding libgomp
 # plugins, and configure to find the corresponding offload compilers.
 # 'offload_plugins' and 'offload_targets' will be populated in the same order.
@@ -237,6 +246,29 @@ if test x"$enable_offload_targets" != x; then
 ;;
 esac
 ;;
+
+  amdgcn*)
+	case "${target}" in
+	  x86_64-*-*)
+	case " ${CC} ${CFLAGS} " in
+	  *" -m32 "*)
+		PLUGIN_GCN=0
+		;;
+	  *)
+		tgt_plugin=gcn
+		PLUGIN_GCN=$tgt
+		PLUGIN_GCN_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
+		PLUGIN_GCN_LDFLAGS="$HSA_RUNTIME_LDFLAGS"
+		PLUGIN_GCN_LIBS="-ldl"
+		PLUGIN_GCN=1
+		;;
+	  esac
+	;;
+	  *-*-*)
+	PLUGIN_GCN=0
+	 ;;
+	esac
+	;;
   *)
 	AC_MSG_ERROR([unknown offload target specified])
 	;;
@@ -275,6 +307,9 @@ AC_DEFINE_UNQUOTED([PLUGIN_NVPTX_DYNAMIC], [$PLUGIN_NVPTX_DYNAMIC],
 AM_CONDITIONAL([PLUGIN_HSA], [test $PLUGIN_HSA = 1])
 AC_DEFINE_UNQUOTED([PLUGIN_HSA], [$PLUGIN_HSA],
   [Define to 1 if the HSA plugin is built, 0 if not.])
+AM_CONDITIONAL([PLUGIN_GCN], [test $PLUGIN_GCN = 1])
+AC_DEFINE_UNQUOTED([PLUGIN_GCN], [$PLUGIN_GCN],
+  [Define to 1 if the GCN plugin is built, 0 if not.])
 
 if test "$HSA_RUNTIME_LIB" != ""; then
   HSA_RUNTIME_LIB="$HSA_RUNTIME_LIB/"
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
new file mode 100644
index 000..583916759a5
--- /dev/null
+++ b/libgomp/plugin/plugin-gcn.c
@@ -0,0 +1,3985 @@
+/* Plugin for AMD GCN execution.
+
+   Copyright (C) 2013-2019 Free Software Foundation, Inc.
+
+   Contributed by Mentor Embedded
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* {{{ Includes and defines  */
+
+#include "config.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "libgomp-plugin.h"
+#include "gomp-constants.h"
+#include 
+#include 

[PATCH 4/7 libgomp,amdgcn] GCN libgomp port

2019-11-12 Thread Andrew Stubbs
This patch contributes a libgomp implementation for AMD GCN, minus the
plugin, which comes later in this series.

GCN has been allocated device ID "8", even though IDs "6" and "7"
(formerly HSA and Intel MIC) are no longer present in every place where
the IDs are listed.

Most of these changes are simply based on the model already defined for
NVPTX, adapted for GCN as appropriate.  This assumes that the "accel"
patch, which moves the files that did not need to be adjusted, has
already been applied.

The "oacc-target.c" file is new.  This file allows new target-specific
symbols to be added to libgomp.  I couldn't find an existing way to do
this without also adding a new top-level file, so there is an empty
placeholder as well.  (The OG9 branch has this symbol in libgcc, but that
seems wrong.)

OK to commit?

Thanks

Andrew


2019-11-12  Andrew Stubbs  

include/
* gomp-constants.h (GOMP_DEVICE_GCN): Define.
(GOMP_VERSION_GCN): Define.

libgomp/
* Makefile.am (libgomp_la_SOURCES): Add oacc-target.c.
* Makefile.in: Regenerate.
* config.h.in (PLUGIN_GCN): Add new undef.
* config/accel/openacc.f90 (acc_device_gcn): New parameter.
* config/gcn/affinity-fmt.c: New file.
* config/gcn/bar.c: New file.
* config/gcn/bar.h: New file.
* config/gcn/doacross.h: New file.
* config/gcn/icv-device.c: New file.
* config/gcn/oacc-target.c: New file.
* config/gcn/simple-bar.h: New file.
* config/gcn/target.c: New file.
* config/gcn/task.c: New file.
* config/gcn/team.c: New file.
* config/gcn/time.c: New file.
* configure.ac: Add amdgcn*-*-*.
* configure: Regenerate.
* configure.tgt: Add amdgcn*-*-*.
* libgomp-plugin.h (offload_target_type): Add OFFLOAD_TARGET_TYPE_GCN.
* libgomp.h (gcn_thrs): Add amdgcn variant.
(set_gcn_thrs): Likewise.
(gomp_thread): Likewise.
* oacc-int.h (goacc_thread): Likewise.
* oacc-target.c: New file.
* openacc.f90 (acc_device_gcn): New parameter.
* openacc.h (acc_device_t): Add acc_device_gcn.
* team.c (gomp_free_pool_helper): Add amdgcn support.
---
 include/gomp-constants.h  |   2 +
 libgomp/Makefile.am   |   2 +-
 libgomp/config.h.in   |   3 +
 libgomp/config/accel/openacc.f90  |   1 +
 libgomp/config/gcn/affinity-fmt.c |  51 +++
 libgomp/config/gcn/bar.c  | 232 ++
 libgomp/config/gcn/bar.h  | 168 ++
 libgomp/config/gcn/doacross.h |  58 
 libgomp/config/gcn/icv-device.c   |  72 ++
 libgomp/config/gcn/oacc-target.c  |  31 
 libgomp/config/gcn/simple-bar.h   |  61 
 libgomp/config/gcn/target.c   |  67 +
 libgomp/config/gcn/task.c |  39 +
 libgomp/config/gcn/team.c | 202 ++
 libgomp/config/gcn/time.c |  52 +++
 libgomp/configure.ac  |   2 +-
 libgomp/configure.tgt |   4 +
 libgomp/libgomp-plugin.h  |   3 +-
 libgomp/libgomp.h |  18 +++
 libgomp/oacc-int.h|   9 +-
 libgomp/oacc-target.c |   1 +
 libgomp/openacc.f90   |   1 +
 libgomp/openacc.h |   1 +
 libgomp/team.c|   3 +
 26 files changed, 1189 insertions(+), 16 deletions(-)
 create mode 100644 libgomp/config/gcn/affinity-fmt.c
 create mode 100644 libgomp/config/gcn/bar.c
 create mode 100644 libgomp/config/gcn/bar.h
 create mode 100644 libgomp/config/gcn/doacross.h
 create mode 100644 libgomp/config/gcn/icv-device.c
 create mode 100644 libgomp/config/gcn/oacc-target.c
 create mode 100644 libgomp/config/gcn/simple-bar.h
 create mode 100644 libgomp/config/gcn/target.c
 create mode 100644 libgomp/config/gcn/task.c
 create mode 100644 libgomp/config/gcn/team.c
 create mode 100644 libgomp/config/gcn/time.c
 create mode 100644 libgomp/oacc-target.c

diff --git a/include/gomp-constants.h b/include/gomp-constants.h
index 82e9094c934..9e356cdfeec 100644
--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h
@@ -174,6 +174,7 @@ enum gomp_map_kind
 #define GOMP_DEVICE_NVIDIA_PTX		5
 #define GOMP_DEVICE_INTEL_MIC		6
 #define GOMP_DEVICE_HSA			7
+#define GOMP_DEVICE_GCN			8
 
 #define GOMP_DEVICE_ICV			-1
 #define GOMP_DEVICE_HOST_FALLBACK	-2
@@ -215,6 +216,7 @@ enum gomp_map_kind
 #define GOMP_VERSION_NVIDIA_PTX 1
 #define GOMP_VERSION_INTEL_MIC 0
 #define GOMP_VERSION_HSA 0
+#define GOMP_VERSION_GCN 1
 
 #define GOMP_VERSION_PACK(LIB, DEV) (((LIB) << 16) | (DEV))
 #define GOMP_VERSION_LIB(PACK) (((PACK) >> 16) & 0xffff)
diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index 7d36343a4be..669b9e4defd 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -65,7 +65,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c error.c \
 	proc.c sem.c bar.c ptrlock.c time.c fortran.c 

[PATCH 3/7 libgomp,nvptx] Add device number to GOMP_OFFLOAD_openacc_async_construct

2019-11-12 Thread Andrew Stubbs
This patch is preparatory for the amdgcn plugin.  The current
implementation was written for CUDA in which the device associated with
the queue is inferred by some hidden magic (which seems questionable to
me but then I don't fully understand it).  The GCN plugin needs to know
for which device the queue is intended, so this simply provides that
information to the queue constructor.
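A minimal sketch of the new calling convention follows. It uses a stand-in queue type and device structure; the `demo_*` names are hypothetical, and the real `struct goacc_asyncqueue` is opaque to libgomp and defined by each plugin. The constructor receives the device number explicitly, and queues are created lazily per async slot, following the shape of `lookup_goacc_asyncqueue` in the oacc-async.c hunk.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a plugin's queue type.  */
struct demo_asyncqueue
{
  int device;   /* Device the queue was created for.  */
};

/* Plugin-side constructor, modelled on the new
   GOMP_OFFLOAD_openacc_async_construct (int) signature.  */
static struct demo_asyncqueue *
demo_async_construct (int device)
{
  struct demo_asyncqueue *aq = malloc (sizeof *aq);
  if (aq)
    aq->device = device;
  return aq;
}

#define DEMO_NUM_QUEUES 4

/* Hypothetical per-device state: a construct hook plus lazily created
   queues.  */
struct demo_device
{
  int target_id;
  struct demo_asyncqueue *(*construct_func) (int);
  struct demo_asyncqueue *queues[DEMO_NUM_QUEUES];
};

static struct demo_asyncqueue *
demo_lookup_queue (struct demo_device *dev, int async)
{
  assert (async >= 0 && async < DEMO_NUM_QUEUES);
  if (!dev->queues[async])
    /* The device number is passed explicitly instead of being inferred
       from hidden per-thread state.  */
    dev->queues[async] = dev->construct_func (dev->target_id);
  return dev->queues[async];
}
```

A second lookup on the same slot returns the cached queue, so the constructor runs at most once per slot.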

OK to commit?

Thanks

Andrew


2019-11-12  Andrew Stubbs  

libgomp/
* libgomp-plugin.h (GOMP_OFFLOAD_openacc_async_construct): Add int
parameter.
* oacc-async.c (lookup_goacc_asyncqueue): Pass device number to the
queue constructor.
* oacc-host.c (host_openacc_async_construct): Add device parameter.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_openacc_async_construct): Add
device parameter.
---
 libgomp/libgomp-plugin.h  | 2 +-
 libgomp/oacc-async.c  | 3 ++-
 libgomp/oacc-host.c   | 2 +-
 libgomp/plugin/plugin-nvptx.c | 2 +-
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index 01483f27f4c..de969e1ba45 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -112,7 +112,7 @@ extern void GOMP_OFFLOAD_openacc_exec (void (*) (void *), size_t, void **,
    void **, unsigned *, void *);
 extern void *GOMP_OFFLOAD_openacc_create_thread_data (int);
 extern void GOMP_OFFLOAD_openacc_destroy_thread_data (void *);
-extern struct goacc_asyncqueue *GOMP_OFFLOAD_openacc_async_construct (void);
+extern struct goacc_asyncqueue *GOMP_OFFLOAD_openacc_async_construct (int);
 extern bool GOMP_OFFLOAD_openacc_async_destruct (struct goacc_asyncqueue *);
 extern int GOMP_OFFLOAD_openacc_async_test (struct goacc_asyncqueue *);
 extern bool GOMP_OFFLOAD_openacc_async_synchronize (struct goacc_asyncqueue *);
diff --git a/libgomp/oacc-async.c b/libgomp/oacc-async.c
index 1760e8c90c6..2b24ae7adc2 100644
--- a/libgomp/oacc-async.c
+++ b/libgomp/oacc-async.c
@@ -100,7 +100,8 @@ lookup_goacc_asyncqueue (struct goacc_thread *thr, bool create, int async)
 
   if (!dev->openacc.async.asyncqueue[async])
     {
-      dev->openacc.async.asyncqueue[async] = dev->openacc.async.construct_func ();
+      dev->openacc.async.asyncqueue[async]
+	= dev->openacc.async.construct_func (dev->target_id);
 
       if (!dev->openacc.async.asyncqueue[async])
 	{
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index 12299aee65d..cbcac9bf7b3 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -211,7 +211,7 @@ host_openacc_async_queue_callback (struct goacc_asyncqueue *aq
 }
 
 static struct goacc_asyncqueue *
-host_openacc_async_construct (void)
+host_openacc_async_construct (int device __attribute__((unused)))
 {
   /* Non-NULL 0x... value as opaque dummy.  */
   return (struct goacc_asyncqueue *) -1;
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 9e088612b44..911d0f66a6e 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1575,7 +1575,7 @@ GOMP_OFFLOAD_openacc_cuda_set_stream (struct goacc_asyncqueue *aq, void *stream)
 }
 
 struct goacc_asyncqueue *
-GOMP_OFFLOAD_openacc_async_construct (void)
+GOMP_OFFLOAD_openacc_async_construct (int device __attribute__((unused)))
 {
   CUstream stream = NULL;
   CUDA_CALL_ERET (NULL, cuStreamCreate, &stream, CU_STREAM_DEFAULT);


[PATCH 2/7 amdgcn] GCN mkoffload

2019-11-12 Thread Andrew Stubbs
This patch adds the mkoffload tool to the amdgcn backend.  It's similar to,
but not quite the same as, the one on the openacc-gcc-9-branch.
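The core idea of mkoffload is to turn a device binary into C source that the host compiler can build into the program. The sketch below illustrates that idea under stated assumptions; it is not the tool's actual output format. `emit_c_array` is a hypothetical helper, and it writes a byte-array initializer (which avoids escaping embedded NULs) rather than the string that the mkoffload.c header comment mentions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write DATA as a C array initializer called NAME,
   twelve bytes per line, so it can be compiled into the host program.  */
static void
emit_c_array (FILE *out, const char *name,
	      const unsigned char *data, size_t len)
{
  assert (out && name && strlen (name) > 0 && (data || len == 0));
  fprintf (out, "static const unsigned char %s[] = {", name);
  for (size_t i = 0; i < len; i++)
    {
      if (i % 12 == 0)
	fprintf (out, "\n  ");
      fprintf (out, "0x%02x,", data[i]);
    }
  fprintf (out, "\n};\n");
}
```

For the four bytes of an ELF magic number, this produces `static const unsigned char gcn_image[] = {`, then a line `  0x7f,0x45,0x4c,0x46,`, then `};`.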

I will commit this patch when the others in this series are approved.

Andrew


2019-11-12  Andrew Stubbs  

gcc/
* config/gcn/mkoffload.c: New file.
* config/gcn/offload.h: New file.
---
 gcc/config/gcn/mkoffload.c | 694 +
 gcc/config/gcn/offload.h   |  35 ++
 2 files changed, 729 insertions(+)
 create mode 100644 gcc/config/gcn/mkoffload.c
 create mode 100644 gcc/config/gcn/offload.h

diff --git a/gcc/config/gcn/mkoffload.c b/gcc/config/gcn/mkoffload.c
new file mode 100644
index 000..40b56375b75
--- /dev/null
+++ b/gcc/config/gcn/mkoffload.c
@@ -0,0 +1,694 @@
+/* Offload image generation tool for AMD GCN.
+
+   Copyright (C) 2014-2019 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   .  */
+
+/* Munges GCN assembly into a C source file defining the GCN code as a
+   string.
+
+   This is not a complete assembler.  We presume the source is well
+   formed from the compiler and can die horribly if it is not.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "obstack.h"
+#include "diagnostic.h"
+#include "intl.h"
+#include 
+#include "collect-utils.h"
+#include "gomp-constants.h"
+
+const char tool_name[] = "gcn mkoffload";
+
+#define COMMENT_PREFIX "#"
+
+struct id_map
+{
+  id_map *next;
+  char *gcn_name;
+};
+
+static id_map *func_ids, **funcs_tail = &func_ids;
+static id_map *var_ids, **vars_tail = &var_ids;
+
+/* Files to unlink.  */
+static const char *gcn_s1_name;
+static const char *gcn_s2_name;
+static const char *gcn_o_name;
+static const char *gcn_cfile_name;
+
+enum offload_abi offload_abi = OFFLOAD_ABI_UNSET;
+
+/* Delete tempfiles.  */
+
+void
+tool_cleanup (bool from_signal ATTRIBUTE_UNUSED)
+{
+  if (gcn_cfile_name)
+    maybe_unlink (gcn_cfile_name);
+  if (gcn_s1_name)
+    maybe_unlink (gcn_s1_name);
+  if (gcn_s2_name)
+    maybe_unlink (gcn_s2_name);
+  if (gcn_o_name)
+    maybe_unlink (gcn_o_name);
+}
+
+static void
+mkoffload_cleanup (void)
+{
+  tool_cleanup (false);
+}
+
+/* Unlink FILE unless requested otherwise.  */
+
+void
+maybe_unlink (const char *file)
+{
+  if (!save_temps)
+    {
+      if (unlink_if_ordinary (file) && errno != ENOENT)
+	fatal_error (input_location, "deleting file %s: %m", file);
+    }
+  else if (verbose)
+    fprintf (stderr, "[Leaving %s]\n", file);
+}
+
+/* Add or change the value of an environment variable, outputting the
+   change to standard error if in verbose mode.  */
+
+static void
+xputenv (const char *string)
+{
+  if (verbose)
+    fprintf (stderr, "%s\n", string);
+  putenv (CONST_CAST (char *, string));
+}
+
+/* Read the whole input file.  It will be NUL terminated (but
+   remember, there could be a NUL in the file itself).  */
+
+static const char *
+read_file (FILE *stream, size_t *plen)
+{
+  size_t alloc = 16384;
+  size_t base = 0;
+  char *buffer;
+
+  if (!fseek (stream, 0, SEEK_END))
+    {
+      /* Get the file size.  */
+      long s = ftell (stream);
+      if (s >= 0)
+	alloc = s + 100;
+      fseek (stream, 0, SEEK_SET);
+    }
+  buffer = XNEWVEC (char, alloc);
+
+  for (;;)
+    {
+      size_t n = fread (buffer + base, 1, alloc - base - 1, stream);
+
+      if (!n)
+	break;
+      base += n;
+      if (base + 1 == alloc)
+	{
+	  alloc *= 2;
+	  buffer = XRESIZEVEC (char, buffer, alloc);
+	}
+    }
+  buffer[base] = 0;
+  *plen = base;
+  return buffer;
+}
+
+/* Parse STR, saving found tokens into PVALUES and return their number.
+   Tokens are assumed to be delimited by ':'.  */
+
+static unsigned
+parse_env_var (const char *str, char ***pvalues)
+{
+  const char *curval, *nextval;
+  char **values;
+  unsigned num = 1, i;
+
+  curval = strchr (str, ':');
+  while (curval)
+    {
+      num++;
+      curval = strchr (curval + 1, ':');
+    }
+
+  values = (char **) xmalloc (num * sizeof (char *));
+  curval = str;
+  nextval = strchr (curval, ':');
+  if (nextval == NULL)
+    nextval = strchr (curval, '\0');
+
+  for (i = 0; i < num; i++)
+    {
+      int l = nextval - curval;
+      values[i] = (char *) xmalloc (l + 1);
+      memcpy (values[i], curval, l);
+      values[i][l] = 0;
+      curval = nextval + 1;
+      nextval = strchr (curval, ':');
+      if 

[PATCH 1/7 libgomp,nvptx] Move generic libgomp files from nvptx to accel

2019-11-12 Thread Andrew Stubbs
This patch shouldn't change anything much at all; it's just an internal
reorganization of files.

The idea is to move the files in the libgomp "nvptx" directory that have
nothing NVPTX-specific in them.  By placing them in a separate "accel"
directory, they can be shared with the GCN port, avoiding much of the
duplication.
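The effect of the move can be modelled as an ordered search over configuration directories. The first directory on a target's path that provides a file wins, and anything not found there falls back to the generic top-level libgomp sources. The sketch below assumes a path of "gcn" followed by "accel". The directory contents are illustrative, not the real file lists, and `resolve_config_file` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one configuration directory and the source
   files it provides.  */
struct config_dir
{
  const char *name;
  const char *const *files;   /* NULL-terminated list.  */
};

static bool
dir_has_file (const struct config_dir *dir, const char *file)
{
  for (const char *const *f = dir->files; *f; f++)
    if (strcmp (*f, file) == 0)
      return true;
  return false;
}

/* Return the first directory in PATH (searched in order) that provides
   FILE, or NULL if only the generic top-level version exists.  */
static const char *
resolve_config_file (const struct config_dir *path, size_t n,
		     const char *file)
{
  assert (path || n == 0);
  for (size_t i = 0; i < n; i++)
    if (dir_has_file (&path[i], file))
      return path[i].name;
  return NULL;
}
```

Because the target's own directory is searched first, it can still override any file that "accel" provides.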

OK to commit?

Thanks

Andrew

2019-11-12  Andrew Stubbs  

libgomp/
* configure.tgt (nvptx*-*-*): Add "accel" directory.
* config/nvptx/libgomp-plugin.c: Move ...
* config/accel/libgomp-plugin.c: ... to here.
* config/nvptx/lock.c: Move ...
* config/accel/lock.c: ... to here.
* config/nvptx/mutex.c: Move ...
* config/accel/mutex.c: ... to here.
* config/nvptx/mutex.h: Move ...
* config/accel/mutex.h: ... to here.
* config/nvptx/oacc-async.c: Move ...
* config/accel/oacc-async.c: ... to here.
* config/nvptx/oacc-cuda.c: Move ...
* config/accel/oacc-cuda.c: ... to here.
* config/nvptx/oacc-host.c: Move ...
* config/accel/oacc-host.c: ... to here.
* config/nvptx/oacc-init.c: Move ...
* config/accel/oacc-init.c: ... to here.
* config/nvptx/oacc-mem.c: Move ...
* config/accel/oacc-mem.c: ... to here.
* config/nvptx/oacc-plugin.c: Move ...
* config/accel/oacc-plugin.c: ... to here.
* config/nvptx/omp-lock.h: Move ...
* config/accel/omp-lock.h: ... to here.
* config/nvptx/openacc.f90: Move ...
* config/accel/openacc.f90: ... to here.
* config/nvptx/pool.h: Move ...
* config/accel/pool.h: ... to here.
* config/nvptx/proc.c: Move ...
* config/accel/proc.c: ... to here.
* config/nvptx/ptrlock.c: Move ...
* config/accel/ptrlock.c: ... to here.
* config/nvptx/ptrlock.h: Move ...
* config/accel/ptrlock.h: ... to here.
* config/nvptx/sem.c: Move ...
* config/accel/sem.c: ... to here.
* config/nvptx/sem.h: Move ...
* config/accel/sem.h: ... to here.
* config/nvptx/thread-stacksize.h: Move ...
* config/accel/thread-stacksize.h: ... to here.
---
 libgomp/config/{nvptx => accel}/libgomp-plugin.c   | 0
 libgomp/config/{nvptx => accel}/lock.c | 0
 libgomp/config/{nvptx => accel}/mutex.c| 0
 libgomp/config/{nvptx => accel}/mutex.h| 0
 libgomp/config/{nvptx => accel}/oacc-async.c   | 0
 libgomp/config/{nvptx => accel}/oacc-cuda.c| 0
 libgomp/config/{nvptx => accel}/oacc-host.c| 0
 libgomp/config/{nvptx => accel}/oacc-init.c| 0
 libgomp/config/{nvptx => accel}/oacc-mem.c | 0
 libgomp/config/{nvptx => accel}/oacc-plugin.c  | 0
 libgomp/config/{nvptx => accel}/omp-lock.h | 0
 libgomp/config/{nvptx => accel}/openacc.f90| 0
 libgomp/config/{nvptx => accel}/pool.h | 0
 libgomp/config/{nvptx => accel}/proc.c | 0
 libgomp/config/{nvptx => accel}/ptrlock.c  | 0
 libgomp/config/{nvptx => accel}/ptrlock.h  | 0
 libgomp/config/{nvptx => accel}/sem.c  | 0
 libgomp/config/{nvptx => accel}/sem.h  | 0
 libgomp/config/{nvptx => accel}/thread-stacksize.h | 0
 libgomp/configure.tgt  | 2 +-
 20 files changed, 1 insertion(+), 1 deletion(-)
 rename libgomp/config/{nvptx => accel}/libgomp-plugin.c (100%)
 rename libgomp/config/{nvptx => accel}/lock.c (100%)
 rename libgomp/config/{nvptx => accel}/mutex.c (100%)
 rename libgomp/config/{nvptx => accel}/mutex.h (100%)
 rename libgomp/config/{nvptx => accel}/oacc-async.c (100%)
 rename libgomp/config/{nvptx => accel}/oacc-cuda.c (100%)
 rename libgomp/config/{nvptx => accel}/oacc-host.c (100%)
 rename libgomp/config/{nvptx => accel}/oacc-init.c (100%)
 rename libgomp/config/{nvptx => accel}/oacc-mem.c (100%)
 rename libgomp/config/{nvptx => accel}/oacc-plugin.c (100%)
 rename libgomp/config/{nvptx => accel}/omp-lock.h (100%)
 rename libgomp/config/{nvptx => accel}/openacc.f90 (100%)
 rename libgomp/config/{nvptx => accel}/pool.h (100%)
 rename libgomp/config/{nvptx => accel}/proc.c (100%)
 rename libgomp/config/{nvptx => accel}/ptrlock.c (100%)
 rename libgomp/config/{nvptx => accel}/ptrlock.h (100%)
 rename libgomp/config/{nvptx => accel}/sem.c (100%)
 rename libgomp/config/{nvptx => accel}/sem.h (100%)
 rename libgomp/config/{nvptx => accel}/thread-stacksize.h (100%)

diff --git a/libgomp/config/nvptx/libgomp-plugin.c b/libgomp/config/accel/libgomp-plugin.c
similarity index 100%
rename from libgomp/config/nvptx/libgomp-plugin.c
rename to libgomp/config/accel/libgomp-plugin.c
diff --git a/libgomp/config/nvptx/lock.c b/libgomp/config/accel/lock.c
similarity index 100%
rename from libgomp/config/nvptx/lock.c
rename to libgomp/config/accel/lock.c
diff --git a/libgomp/config/nvptx/mutex.c b/libgomp/config/accel/mutex.c
similarity index 100%
rename from libgomp/config/nvptx/mutex.c
rename to 

[PATCH 0/7 libgomp,amdgcn] AMD GCN Offloading Support

2019-11-12 Thread Andrew Stubbs
Hi all,

This patch series contributes initial OpenMP and OpenACC support for AMD
GCN GPUs.

The test results are not yet perfect, but there are many more passes than
failures, so this is a good starting point.  The rest of the issues can
be addressed as bugs during stage 3.

I have another, unfinished, patch to massage the testsuite itself.  I'll
post this shortly, once I've finished checking the forward port is
appropriate.

This series implements only single-worker support for OpenACC.  Julian
Brown may post the multiple-worker support soon, if it isn't too
difficult to forward-port.  Otherwise that will have to wait for GCC 11.

Andrew

Andrew Stubbs (7):
  Move generic libgomp files from nvptx to accel
  GCN mkoffload
  Add device number to GOMP_OFFLOAD_openacc_async_construct
  GCN libgomp port
  Optimize GCN OpenMP malloc performance
  Use a single worker for OpenACC on AMD GCN
  GCN Libgomp Plugin

 gcc/config/gcn/gcn.c  |4 +-
 gcc/config/gcn/gcn.opt|2 +-
 gcc/config/gcn/mkoffload.c|  694 +++
 gcc/config/gcn/offload.h  |   35 +
 include/gomp-constants.h  |2 +
 libgomp/Makefile.am   |2 +-
 libgomp/Makefile.in   |   61 +-
 libgomp/config.h.in   |3 +
 .../config/{nvptx => accel}/libgomp-plugin.c  |0
 libgomp/config/{nvptx => accel}/lock.c|0
 libgomp/config/{nvptx => accel}/mutex.c   |0
 libgomp/config/{nvptx => accel}/mutex.h   |0
 libgomp/config/{nvptx => accel}/oacc-async.c  |0
 libgomp/config/{nvptx => accel}/oacc-cuda.c   |0
 libgomp/config/{nvptx => accel}/oacc-host.c   |0
 libgomp/config/{nvptx => accel}/oacc-init.c   |0
 libgomp/config/{nvptx => accel}/oacc-mem.c|0
 libgomp/config/{nvptx => accel}/oacc-plugin.c |0
 libgomp/config/{nvptx => accel}/omp-lock.h|0
 libgomp/config/{nvptx => accel}/openacc.f90   |1 +
 libgomp/config/{nvptx => accel}/pool.h|0
 libgomp/config/{nvptx => accel}/proc.c|0
 libgomp/config/{nvptx => accel}/ptrlock.c |0
 libgomp/config/{nvptx => accel}/ptrlock.h |0
 libgomp/config/{nvptx => accel}/sem.c |0
 libgomp/config/{nvptx => accel}/sem.h |0
 .../{nvptx => accel}/thread-stacksize.h   |0
 libgomp/config/gcn/affinity-fmt.c |   51 +
 libgomp/config/gcn/bar.c  |  232 +
 libgomp/config/gcn/bar.h  |  168 +
 libgomp/config/gcn/doacross.h |   58 +
 libgomp/config/gcn/icv-device.c   |   72 +
 libgomp/config/gcn/oacc-target.c  |   31 +
 libgomp/config/gcn/simple-bar.h   |   61 +
 libgomp/config/gcn/target.c   |   67 +
 libgomp/config/gcn/task.c |   39 +
 libgomp/config/gcn/team.c |  212 +
 libgomp/config/gcn/time.c |   52 +
 libgomp/configure |   61 +-
 libgomp/configure.ac  |2 +-
 libgomp/configure.tgt |6 +-
 libgomp/libgomp-plugin.h  |5 +-
 libgomp/libgomp.h |   74 +
 libgomp/oacc-async.c  |3 +-
 libgomp/oacc-host.c   |2 +-
 libgomp/oacc-int.h|9 +-
 libgomp/oacc-target.c |1 +
 libgomp/openacc.f90   |1 +
 libgomp/openacc.h |1 +
 libgomp/plugin/Makefrag.am|   14 +
 libgomp/plugin/configfrag.ac  |   35 +
 libgomp/plugin/plugin-gcn.c   | 3985 +
 libgomp/plugin/plugin-nvptx.c |2 +-
 libgomp/team.c|   15 +-
 libgomp/work.c|4 +-
 55 files changed, 6035 insertions(+), 32 deletions(-)
 create mode 100644 gcc/config/gcn/mkoffload.c
 create mode 100644 gcc/config/gcn/offload.h
 rename libgomp/config/{nvptx => accel}/libgomp-plugin.c (100%)
 rename libgomp/config/{nvptx => accel}/lock.c (100%)
 rename libgomp/config/{nvptx => accel}/mutex.c (100%)
 rename libgomp/config/{nvptx => accel}/mutex.h (100%)
 rename libgomp/config/{nvptx => accel}/oacc-async.c (100%)
 rename libgomp/config/{nvptx => accel}/oacc-cuda.c (100%)
 rename libgomp/config/{nvptx => accel}/oacc-host.c (100%)
 rename libgomp/config/{nvptx => accel}/oacc-init.c (100%)
 rename libgomp/config/{nvptx => accel}/oacc-mem.c (100%)
 rename libgomp/config/{nvptx => accel}/oacc-plugin.c (100%)
 rename libgomp/config/{nvptx => accel}/omp-lock.h (100%)
 rename libgomp/config/{nvptx => accel}/openacc.f90 (98%)
 rename libgomp/config/{nvptx => accel}/pool.h (100%)
 rename libgomp/config/{nvptx => accel}/proc.c (100%)
 rename libgomp/config/{nvptx => 

Re: Ping: [PATCH V6] Extend IPA-CP to support arithmetically-computed value-passing on by-ref argument (PR ipa/91682)

2019-11-12 Thread Jan Hubicka
> Hi,
> 
> On Tue, Nov 12 2019, Jan Hubicka wrote:
> > Also note that there is a long standing problem with inlining ipacp
> > clones.  This can be shown on the following example:
> >
> > struct a {int a;};
> > static int foo (struct a a)
> > {
> >   return a.a;
> > }
> > __attribute__ ((noinline))
> > static int bar (struct a a)
> > {
> >   return foo(a);
> > }
> > main()
> > {
> >   struct a a={1};
> >   return bar (a);
> > }
> >
> > Now if you compile it with -O2 -fno-early-inlining ipacp correctly
> > determines constants:
> >
> > Estimating effects for bar/1.
> >Estimating body: bar/1
> >Known to be false: 
> >size:6 time:14.00 nonspec time:14.00
> >  - context independent values, size: 6, time_benefit: 0.00
> >  Decided to specialize for all known contexts, code not going to grow.
> > Setting dest_lattice to bottom, because type of param 0 of foo is NULL or 
> > unsuitable for bits propagation
> >
> > Estimating effects for foo/0.
> >Estimating body: foo/0
> >Known to be false: op0[offset: 0] changed
> >size:3 time:2.00 nonspec time:3.00
> >  - context independent values, size: 3, time_benefit: 1.00
> >  Decided to specialize for all known contexts, code not going to grow.
> >
> >
> > Yet the intended transformation to "return 1" does not happen:
> >
> > __attribute__((noinline))
> > bar.constprop (struct a a)
> > {
> >   int a$a;
> >
> >[local count: 1073741824]:
> >   a$a_5 = a.a;
> >   return a$a_5;
> >
> > }
> >
> >
> >
> > ;; Function main (main, funcdef_no=2, decl_uid=1937, cgraph_uid=3, 
> > symbol_order=2) (executed once)
> >
> > main ()
> > {
> >   struct a a;
> >   int _3;
> >
> >[local count: 1073741824]:
> >   a.a = 1;
> >   _3 = bar.constprop (a); [tail call]
> >   a ={v} {CLOBBER};
> >   return _3;
> >
> > }
> >
> > The problem here is that foo gets inlined into bar and we never apply
> > the ipcp transform on foo, so a.a never gets constant propagated.
> 
> Ugh, we never... what?  That is quite bad, how come we don't have a PR
> about this?

I remember talking about it with you a few times, years ago :)
> 
> >
> > For value ranges this works since late passes are able to propagate
> > constants from value ranges we attach to the default def SSA names.  I
> 
> Well, there are no SSA names for parts of aggregates.

I think all we need is to make FRE's alias oracle walker, which is
responsible for propagating constants, check whether it hits the function
entry; if it does, check that the base is a parameter and look in the ipcp
transform summary to see whether a known value is there.
> 
> > think the correct answer here is to do no substitution in the ipa-prop.c
> > transform function.  Rather note the known values for late passes and
> > let FRE do its job.
> 
> And where would you like to save it?   Do a load at the beginning of the
> function?  My thinking was that it is better to modify the IL rather
> than storing stuff to ad-hoc on-the-side data structures.

It is already saved in the ipcp transform summary.  It is just a matter of
keeping it around while compiling the function and using it the same way
as we use, say, the results of ipa-reference analysis.
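Here is a minimal sketch of the lookup described above. It assumes a simplified summary that records, for each (parameter index, offset) pair, a constant known to be passed in an aggregate argument. `struct known_agg_value` and `lookup_known_agg_value` are hypothetical. The real data lives in the ipcp transformation summary, and the walk would happen inside FRE's alias oracle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified record of a constant known to be passed in
   part of an aggregate parameter.  */
struct known_agg_value
{
  int param_index;
  long offset;
  int value;
};

/* When a load reaches the function entry without an intervening store
   and its base is parameter PARAM, look for a recorded constant at
   OFFSET.  On success, store it in *VALUE so the load can be folded.  */
static bool
lookup_known_agg_value (const struct known_agg_value *summary, size_t n,
			int param, long offset, int *value)
{
  assert (summary || n == 0);
  for (size_t i = 0; i < n; i++)
    if (summary[i].param_index == param && summary[i].offset == offset)
      {
	*value = summary[i].value;
	return true;
      }
  return false;
}
```

For bar.constprop in the example, a summary entry of (param 0, offset 0, value 1) would let the load of a.a fold to 1.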

Honza
> 
> Martin


Re: [PATCH 2/2] gdbinit.in: fix wrong reference to function argument

2019-11-12 Thread Konstantin Kharlamov
On Tue, Nov 12, 2019 at 14:08, Andreas Schwab wrote:
> On Nov 12 2019, Konstantin Kharlamov wrote:
>> I'm definitely missing something. Who are these users, and how can they
>> make anything useful of these functions if they don't even pass an
>> argument?
>
> By printing the desired value.

Hah, okay. Well, in that case their workflow is now going to be twice as
simple, since they only have to type in one command instead of two :)

Besides, I suspect the number of actual users of this gdbinit is around
zero; otherwise someone would have noticed the warning that gdb prints on
every use of these functions while PATCH 1/2 is not applied.

Re: Ping: [PATCH V6] Extend IPA-CP to support arithmetically-computed value-passing on by-ref argument (PR ipa/91682)

2019-11-12 Thread Martin Jambor
Hi,

On Tue, Nov 12 2019, Jan Hubicka wrote:
> Also note that there is a long standing problem with inlining ipacp
> clones.  This can be shown on the following example:
>
> struct a {int a;};
> static int foo (struct a a)
> {
>   return a.a;
> }
> __attribute__ ((noinline))
> static int bar (struct a a)
> {
>   return foo(a);
> }
> main()
> {
>   struct a a={1};
>   return bar (a);
> }
>
> Now if you compile it with -O2 -fno-early-inlining ipacp correctly
> determines constants:
>
> Estimating effects for bar/1.
>Estimating body: bar/1
>Known to be false: 
>size:6 time:14.00 nonspec time:14.00
>  - context independent values, size: 6, time_benefit: 0.00
>  Decided to specialize for all known contexts, code not going to grow.
> Setting dest_lattice to bottom, because type of param 0 of foo is NULL or 
> unsuitable for bits propagation
>
> Estimating effects for foo/0.
>Estimating body: foo/0
>Known to be false: op0[offset: 0] changed
>size:3 time:2.00 nonspec time:3.00
>  - context independent values, size: 3, time_benefit: 1.00
>  Decided to specialize for all known contexts, code not going to grow.
>
>
> Yet the intended transformation to "return 1" does not happen:
>
> __attribute__((noinline))
> bar.constprop (struct a a)
> {
>   int a$a;
>
>[local count: 1073741824]:
>   a$a_5 = a.a;
>   return a$a_5;
>
> }
>
>
>
> ;; Function main (main, funcdef_no=2, decl_uid=1937, cgraph_uid=3, 
> symbol_order=2) (executed once)
>
> main ()
> {
>   struct a a;
>   int _3;
>
>[local count: 1073741824]:
>   a.a = 1;
>   _3 = bar.constprop (a); [tail call]
>   a ={v} {CLOBBER};
>   return _3;
>
> }
>
> The problem here is that foo gets inlined into bar and we never apply
> the ipcp transform on foo, so a.a never gets constant propagated.

Ugh, we never... what?  That is quite bad, how come we don't have a PR
about this?

>
> For value ranges this works since late passes are able to propagate
> constants from value ranges we attach to the default def SSA names.  I

Well, there are no SSA names for parts of aggregates.

> think the correct answer here is to do no substitution in the ipa-prop.c
> transform function.  Rather note the known values for late passes and
> let FRE do its job.

And where would you like to save it?   Do a load at the beginning of the
function?  My thinking was that it is better to modify the IL rather
than storing stuff to ad-hoc on-the-side data structures.

Martin


Re: [PATCH 2/2] gdbinit.in: fix wrong reference to function argument

2019-11-12 Thread Andreas Schwab
On Nov 12 2019, Konstantin Kharlamov wrote:

> I'm definitely missing something. Who are these users, and how can they
> make anything useful of these functions if they don't even pass an
> argument?

By printing the desired value.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Use known value ranges while evaluating ipa predicates

2019-11-12 Thread Jan Hubicka
Hi,
this implements the use of value ranges in ipa-predicates, so the inliner
knows when some tests are going to be removed (especially NULL pointer
checks).
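The kind of reasoning this enables can be sketched for a single predicate, `param != 0`, evaluated against a known integer range. This is a simplification, not GCC's implementation. `struct simple_range` and `eval_ne_zero` are hypothetical, and real value_range objects also carry anti-ranges (such as the ~[0, 0] used for non-null pointers), VARYING and UNDEFINED.

```c
#include <assert.h>

/* Hypothetical three-valued outcome of evaluating a predicate.  */
enum tri { TRI_FALSE, TRI_TRUE, TRI_UNKNOWN };

/* Simplified non-empty integer range [min, max].  */
struct simple_range
{
  long min, max;
};

/* Evaluate "param != 0" given that PARAM lies in R.  */
static enum tri
eval_ne_zero (struct simple_range r)
{
  assert (r.min <= r.max);
  if (r.min > 0 || r.max < 0)
    return TRI_TRUE;     /* Zero is excluded: the check always passes.  */
  if (r.min == 0 && r.max == 0)
    return TRI_FALSE;    /* Only zero is possible: the check always fails.  */
  return TRI_UNKNOWN;    /* Zero is possible but not certain.  */
}
```

When the result is known, the inliner can treat the guarded code as dead (or always executed) when it estimates the size and time of the inlined body.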

Bootstrapped/regtested x86_64-linux.  Martin, I would appreciate it if you
looked at the patch.

Honza

* ipa-cp.c (ipa_vr_operation_and_type_effects): Move up in file.
(ipa_value_range_from_jfunc): New function.
* ipa-fnsummary.c (evaluate_conditions_for_known_args): Add
known_value_ranges parameter; use it to evaluate conditions.
(evaluate_properties_for_edge): Compute known value ranges.
(ipa_fn_summary_t::duplicate): Update use of
evaluate_conditions_for_known_args.
(estimate_ipcp_clone_size_and_time): Likewise.
(ipa_merge_fn_summary_after_inlining): Likewise.
* ipa-prop.h (ipa_value_range_from_jfunc): Declare.
* gcc.dg/ipa/inline-9.c: New testcase.
Index: ipa-cp.c
===
--- ipa-cp.c(revision 278094)
+++ ipa-cp.c(working copy)
@@ -1459,6 +1459,87 @@ ipa_context_from_jfunc (ipa_node_params
   return ctx;
 }
 
+/* Emulate effects of unary OPERATION and/or conversion from SRC_TYPE to
+   DST_TYPE on value range in SRC_VR and store it to DST_VR.  Return true if
+   the result is a range or an anti-range.  */
+
+static bool
+ipa_vr_operation_and_type_effects (value_range *dst_vr,
+				   value_range *src_vr,
+				   enum tree_code operation,
+				   tree dst_type, tree src_type)
+{
+  range_fold_unary_expr (dst_vr, operation, dst_type, src_vr, src_type);
+  if (dst_vr->varying_p () || dst_vr->undefined_p ())
+    return false;
+  return true;
+}
+
+/* Determine value_range of JFUNC given that INFO describes the caller node or
+   the one it is inlined to, CS is the call graph edge corresponding to JFUNC
+   and PARM_TYPE of the parameter.  */
+
+value_range
+ipa_value_range_from_jfunc (ipa_node_params *info, cgraph_edge *cs,
+			    ipa_jump_func *jfunc, tree parm_type)
+{
+  value_range vr;
+  return vr;
+  if (jfunc->m_vr)
+    ipa_vr_operation_and_type_effects (&vr,
+				       jfunc->m_vr,
+				       NOP_EXPR, parm_type,
+				       jfunc->m_vr->type ());
+  if (vr.singleton_p ())
+    return vr;
+  if (jfunc->type == IPA_JF_PASS_THROUGH)
+    {
+      int idx;
+      ipcp_transformation *sum
+	= ipcp_get_transformation_summary (cs->caller->inlined_to
+					   ? cs->caller->inlined_to
+					   : cs->caller);
+      if (!sum || !sum->m_vr)
+	return vr;
+
+      idx = ipa_get_jf_pass_through_formal_id (jfunc);
+
+      if (!(*sum->m_vr)[idx].known)
+	return vr;
+      tree vr_type = ipa_get_type (info, idx);
+      value_range srcvr ((*sum->m_vr)[idx].type,
+			 wide_int_to_tree (vr_type, (*sum->m_vr)[idx].min),
+			 wide_int_to_tree (vr_type, (*sum->m_vr)[idx].max));
+
+      enum tree_code operation = ipa_get_jf_pass_through_operation (jfunc);
+
+      if (TREE_CODE_CLASS (operation) == tcc_unary)
+	{
+	  value_range res;
+
+	  if (ipa_vr_operation_and_type_effects (&res,
+						 &srcvr,
+						 operation, parm_type,
+						 vr_type))
+	    vr.intersect (res);
+	}
+      else
+	{
+	  value_range op_res, res;
+	  tree op = ipa_get_jf_pass_through_operand (jfunc);
+	  value_range op_vr (op, op);
+
+	  range_fold_binary_expr (&op_res, operation, vr_type, &srcvr, &op_vr);
+	  if (ipa_vr_operation_and_type_effects (&res,
+						 &op_res,
+						 NOP_EXPR, parm_type,
+						 vr_type))
+	    vr.intersect (res);
+	}
+    }
+  return vr;
+}
+
 /* If checking is enabled, verify that no lattice is in the TOP state, i.e. not
bottom, not containing a variable component and without any known value at
the same time.  */
@@ -1936,22 +2017,6 @@ propagate_bits_across_jump_function (cgr
 return dest_lattice->set_to_bottom ();
 }
 
-/* Emulate effects of unary OPERATION and/or conversion from SRC_TYPE to
-   DST_TYPE on value range in SRC_VR and store it to DST_VR.  Return true if
-   the result is a range or an anti-range.  */
-
-static bool
-ipa_vr_operation_and_type_effects (value_range *dst_vr,
-				   value_range *src_vr,
-				   enum tree_code operation,
-				   tree dst_type, tree src_type)
-{
-  range_fold_unary_expr (dst_vr, operation, dst_type, src_vr, src_type);
-  if (dst_vr->varying_p () || dst_vr->undefined_p ())
-    return 

Re: [PATCH 2/2] gdbinit.in: fix wrong reference to function argument

2019-11-12 Thread Konstantin Kharlamov
On Tue, Nov 12, 2019 at 13:50, Andreas Schwab wrote:
> On Nov 12 2019, Konstantin Kharlamov wrote:
>> Gdb documentation says that gdb-defined functions refer to their args as
>> $arg0…$argN. The just "$" that gdbinit is using here refers to something
>> else,
>
> That's the last thing printed.

Hmm, okay… Am I missing something…? As I read it, this looks definitely
wrong, because the "last thing printed" is just a random value. To
illustrate: I can do a calculation, like `p 2 + 2`, and then execute, say,
`pgg stmt`, and the debuggee crashes because it calls
`debug_gimple_stmt(4)`. That happened to me many times, until I figured
out that the problem is in gdbinit.

>> which results in gdb errors, or even crashes of debuggee. Let's
>> fix that.
>
> That breaks all users of these macros, since you are now required to
> pass an argument.

I'm definitely missing something. Who are these users, and how can they
make anything useful of these functions if they don't even pass an
argument?



Andreas.

--
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA 
B9D7

"And now for something completely different."





Teach ipa-cp to propagate value ranges over binary operations too

2019-11-12 Thread Jan Hubicka
Hi,
this patch adds propagation of value ranges through binary operations.
This is disabled for value ranges within an SCC to avoid an infinite loop
during propagation.  I am a bit worried about types here.  As far as I can
tell, we have something like

VR in lattice of type1
foo (type1 param)
{
  bar ((type3)((type2)param+(type2)4))
}
bar (type4 param)
{
   use param
}

In the code, type1 is called "operand_type" and type4 is called "param_type".
The arithmetic always happens in operand_type, but I do not see why these
need to be the same.  Anyway, this imitates what constant jump functions do.

Also, I noticed that we use NOP_EXPR to convert from type1 all the way to type4,
while ipa-fnsummary uses VIEW_CONVERT_EXPR to convert type3 to type4, which seems
more valid here.  However, the VR folders always return varying on
VIEW_CONVERT_EXPR (which is probably something that can be fixed).
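The propagation order the patch implements (fold the binary operation in the operand type, convert to the parameter type, then intersect with the jump function's own range) can be sketched in C as below. This is a toy model under loose assumptions: `toy_range` and every helper are hypothetical names, only signed types of up to 32 bits and PLUS with a constant are handled, and the conversion step conservatively gives VARYING instead of wrapping. Real GCC uses `value_range`, `range_fold_binary_expr` and `ipa_vr_operation_and_type_effects`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy value range: UNDEFINED, VARYING, or the closed interval [min, max].
   Hypothetical stand-in for GCC's value_range, for illustration only.  */
typedef enum { TOY_UNDEFINED, TOY_RANGE, TOY_VARYING } toy_kind;
typedef struct { toy_kind kind; long long min, max; } toy_range;

static toy_range toy_make (toy_kind kind, long long min, long long max)
{
  toy_range r = { kind, min, max };
  return r;
}

/* Bounds of a signed integer type with BITS bits (1 <= BITS <= 32).  */
static long long type_min (int bits)
{
  assert (bits >= 1 && bits <= 32);
  return -(1LL << (bits - 1));
}

static long long type_max (int bits)
{
  assert (bits >= 1 && bits <= 32);
  return (1LL << (bits - 1)) - 1;
}

/* Fold SRC + CST in the operand type (BITS bits; CST is assumed to fit).
   If either bound could overflow, give up and return VARYING.  */
static toy_range fold_plus_cst (toy_range src, long long cst, int bits)
{
  if (src.kind != TOY_RANGE)
    return src;
  long long lo = src.min + cst, hi = src.max + cst;
  if (lo < type_min (bits) || hi > type_max (bits))
    return toy_make (TOY_VARYING, 0, 0);
  return toy_make (TOY_RANGE, lo, hi);
}

/* The conversion step (NOP_EXPR in the patch): keep the range only if it
   fits in the parameter type, otherwise VARYING.  */
static toy_range convert_range (toy_range r, int bits)
{
  if (r.kind == TOY_RANGE
      && (r.min < type_min (bits) || r.max > type_max (bits)))
    return toy_make (TOY_VARYING, 0, 0);
  return r;
}

/* Intersection; an empty result is UNDEFINED.  */
static toy_range toy_intersect (toy_range a, toy_range b)
{
  if (a.kind == TOY_VARYING)
    return b;
  if (b.kind == TOY_VARYING)
    return a;
  if (a.kind == TOY_UNDEFINED || b.kind == TOY_UNDEFINED)
    return toy_make (TOY_UNDEFINED, 0, 0);
  long long lo = a.min > b.min ? a.min : b.min;
  long long hi = a.max < b.max ? a.max : b.max;
  return lo <= hi ? toy_make (TOY_RANGE, lo, hi)
                  : toy_make (TOY_UNDEFINED, 0, 0);
}

/* Range of the callee parameter for a pass-through "param + CST" jump
   function: skip SCC edges, fold, convert, then intersect with the range
   recorded on the jump function itself (JF_VR, may be NULL).  */
static toy_range propagate_plus (toy_range caller_vr, long long cst,
                                 int operand_bits, int param_bits,
                                 const toy_range *jf_vr, bool edge_in_scc)
{
  if (edge_in_scc)
    return toy_make (TOY_VARYING, 0, 0);  /* toy stand-in for giving up */
  toy_range r = convert_range (fold_plus_cst (caller_vr, cst, operand_bits),
                               param_bits);
  if (r.kind == TOY_RANGE && jf_vr != NULL)
    r = toy_intersect (r, *jf_vr);
  return r;
}
```

For example, a caller range [0, 10] passed as `param + 4` to an 8-bit parameter whose jump function records [0, 12] yields [4, 12]. The SCC check mirrors the crude guard in the patch: for a self-recursive call such as `f (x + 1)`, every round would widen the range again, so propagation would not settle without some limit.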

Bootstrapped/regtested x86_64-linux. Does this look OK?

Honza
* ipa-cp.c (propagate_vr_across_jump_function): Also propagate
binary operations.

Index: ipa-cp.c
===
--- ipa-cp.c(revision 278094)
+++ ipa-cp.c(working copy)
@@ -1974,23 +2039,51 @@ propagate_vr_across_jump_function (cgrap
   if (jfunc->type == IPA_JF_PASS_THROUGH)
 {
   enum tree_code operation = ipa_get_jf_pass_through_operation (jfunc);
+  class ipa_node_params *caller_info = IPA_NODE_REF (cs->caller);
+  int src_idx = ipa_get_jf_pass_through_formal_id (jfunc);
+  class ipcp_param_lattices *src_lats
+   = ipa_get_parm_lattices (caller_info, src_idx);
+  tree operand_type = ipa_get_type (caller_info, src_idx);
 
+  if (src_lats->m_value_range.bottom_p ())
+   return dest_lat->set_to_bottom ();
+
+  value_range vr;
   if (TREE_CODE_CLASS (operation) == tcc_unary)
{
- class ipa_node_params *caller_info = IPA_NODE_REF (cs->caller);
- int src_idx = ipa_get_jf_pass_through_formal_id (jfunc);
- tree operand_type = ipa_get_type (caller_info, src_idx);
- class ipcp_param_lattices *src_lats
-   = ipa_get_parm_lattices (caller_info, src_idx);
-
- if (src_lats->m_value_range.bottom_p ())
-   return dest_lat->set_to_bottom ();
- value_range vr;
- if (ipa_vr_operation_and_type_effects (&vr,
-&src_lats->m_value_range.m_vr,
-operation, param_type,
-operand_type))
-   return dest_lat->meet_with (&vr);
+ ipa_vr_operation_and_type_effects (&vr,
+&src_lats->m_value_range.m_vr,
+operation, param_type,
+operand_type);
+   }
+  /* A crude way to prevent unbounded number of value range updates
+in SCC components.  We should allow limited number of updates within
+SCC, too.  */
+  else if (!ipa_edge_within_scc (cs))
+   {
+ tree op = ipa_get_jf_pass_through_operand (jfunc);
+ value_range op_vr (op, op);
+ value_range op_res,res;
+
+ range_fold_binary_expr (&op_res, operation, operand_type,
+ &src_lats->m_value_range.m_vr, &op_vr);
+ ipa_vr_operation_and_type_effects (&vr,
+&op_res,
+NOP_EXPR, param_type,
+operand_type);
+   }
+  if (!vr.undefined_p () && !vr.varying_p ())
+   {
+ if (jfunc->m_vr)
+   {
+ value_range jvr;
+ if (ipa_vr_operation_and_type_effects (&jvr, jfunc->m_vr,
+NOP_EXPR,
+param_type,
+jfunc->m_vr->type ()))
+   vr.intersect (*jfunc->m_vr);
+   }
+ return dest_lat->meet_with (&vr);
}
 }
   else if (jfunc->type == IPA_JF_CONST)


Re: [PATCH 2/2] gdbinit.in: fix wrong reference to function argument

2019-11-12 Thread Andreas Schwab
On Nov 12 2019, Konstantin Kharlamov wrote:

> Gdb documentation says that gdb-defined functions refer to their args as
> $arg0…$argN. The just "$" that gdbinit is using here refers to something
> else,

That's the last thing printed.

> which results in gdb errors, or even crashes of debuggee. Let's
> fix that.

That breaks all users of these macros, since you are now required to
pass an argument.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [patch, fortran] Load scalar intent-in variables at the beginning of procedures

2019-11-12 Thread Thomas König
Hi Janne,

> Ah, of course. I should have said module procedures. Or even module
> procedures without bind(C)?

It would probably be the latter.

The change would actually be rather small: if the conditions are met, just
add attr.value for INTENT(IN).

This is something we should probably do when we are forced into an ABI
change by other circumstances.
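To make the ABI point concrete, here is a C analogy, assuming (as gfortran normally does) that a scalar dummy argument without VALUE is passed by reference, while adding attr.value makes it pass by value. The function names are hypothetical and this is not the code gfortran generates. Under C's rules the by-reference version must reload `*n` after the store through `acc`, because the pointers may alias; the aliased calls in the usage below show the two versions really differ. Loading an INTENT(IN) scalar once at entry relies on Fortran's argument-aliasing restrictions instead.

```c
#include <assert.h>
#include <stddef.h>

/* Roughly how a scalar dummy without VALUE is passed: by reference.
   Each use of *n is a load; the second load cannot be dropped under C
   rules, since the store through ACC may have changed *n.  */
static void add_twice_by_ref (const int *n, int *acc)
{
  assert (n != NULL && acc != NULL);
  *acc += *n;
  *acc += *n;
}

/* With attr.value (VALUE): the scalar is copied once at entry, which is
   the effect of loading an INTENT(IN) scalar at the start of the
   procedure.  Note the different calling convention (ABI).  */
static void add_twice_by_value (int n, int *acc)
{
  assert (acc != NULL);
  *acc += n;
  *acc += n;
}
```

Without aliasing both give the same result; with `add_twice_by_ref (&x, &x)` starting from `x == 1` the result is 4, while `add_twice_by_value (x, &x)` gives 3.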

Regards, Thomas

Re: [PATCH, OpenACC, v2] Non-contiguous array support for OpenACC data clauses

2019-11-12 Thread Chung-Lin Tang

Hi Thomas,
thanks for the first review. I'm still working on another revision,
but wanted to respond to some of the issues you raised first:

On 2019/11/7 8:48 AM, Thomas Schwinge wrote:

(1) The simplest solution: implement processing that searches for and reverts
such non-contiguous array map entries in GOACC_parallel_keyed.
(note: I have implemented this in the current attached "v2" patch)

(2) Make the GOACC_parallel_keyed code not take shortcuts for host modes,
i.e. still do the proper gomp_map_vars processing for all cases.

(3) Modify the non-contiguous array map conventions: a possible solution is to
use two maps placed together, one for the array pointer and another for the
array descriptor (as opposed to the current style of using only one map). This
needs more elaborate compiler/runtime work.

The first two options will pessimize host-mode performance somewhat. For the
third, I have some WIP patches, but they are still buggy ATM. Seeking your
opinion on what we should do.

I'll have to think about it some more, but variant (1) doesn't seem so
bad actually, for a first take.  While it's not nice to pessimize in
particular directives with 'if (false)' clauses, at least it does work,
the run-time overhead should not be too bad (also compared to variant
(2), I suppose), and variant (3) can still be implemented later.


The issue is that (1) and (2) vs. (3) have different binary interfaces, so a
decision has to be made first, lest we again have compatibility issues later.

Also, (1) and (2) may differ somewhat due to the memory-copying effects of
gomp_map_vars() (a possible semantic difference versus the usual shared-memory
expectations?)

I'm currently working on another way of implementing something similar to (3),
but using the variadic arguments of GOACC_parallel_keyed instead of maps, WDYT?


@@ -13238,6 +13247,7 @@ handle_omp_array_sections (tree c, enum c_omp_regi
unsigned int num = types.length (), i;
tree t, side_effects = NULL_TREE, size = NULL_TREE;
tree condition = NULL_TREE;
+  tree ncarray_dims = NULL_TREE;
  
if (int_size_in_bytes (TREE_TYPE (first)) <= 0)

maybe_zero_len = true;
@@ -13261,6 +13271,13 @@ handle_omp_array_sections (tree c, enum c_omp_regi
length = fold_convert (sizetype, length);
  if (low_bound == NULL_TREE)
low_bound = integer_zero_node;
+
+ if (non_contiguous)
+   {
+ ncarray_dims = tree_cons (low_bound, length, ncarray_dims);
+ continue;
+   }
+
  if (!maybe_zero_len && i > first_non_one)
{
  if (integer_nonzerop (low_bound))

I'm not at all familiar with this array sections code, will trust your
understanding that we don't need any of the processing that you're
skipping here ('continue'): 'TREE_SIDE_EFFECTS' handling for the length
expressions, and other things.


I will re-check on this.

Ditto for the other minor issues you raised.


  if (DECL_P (decl))
{
  if (DECL_SIZE (decl)
@@ -2624,6 +2830,14 @@ scan_omp_target (gomp_target *stmt, omp_context *o
gimple_omp_target_set_child_fn (stmt, ctx->cb.dst_fn);
  }

+  /* If is OpenACC construct, put non-contiguous array clauses (if any)
+ in front of clause chain. The runtime can then test the first to see
+ if the additional map processing for them is required.  */
+  if (is_gimple_omp_oacc (stmt))
+reorder_noncontig_array_clauses (gimple_omp_target_clauses_ptr (stmt));

Should that be deemed unsuitable for any reason, then add a new
'GOACC_FLAG_*' flag to indicate the existence of non-contiguous arrays.


I'm considering using that convention unconditionally, not sure if it's faster
though, since that means we can't do the 'early breaking' you mentioned when
scanning through maps looking for GOMP_MAP_NONCONTIG_ARRAY_P.
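The ordering trade-off can be sketched in C as follows. Only the `(1 << 5)` value of `GOMP_MAP_FLAG_SPECIAL_3` comes from the posted patch; the predicate and the counting helpers are hypothetical, and the real `GOMP_MAP_NONCONTIG_ARRAY_P` may test the kind differently. With the non-contiguous clauses reordered to the front, the scan stops at the first ordinary map; with only a `GOACC_FLAG_*` bit and no ordering, the runtime could skip the scan when the bit is clear but would have to examine every map when it is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bit value taken from the posted gomp-constants.h hunk.  */
#define GOMP_MAP_FLAG_SPECIAL_3 (1 << 5)

/* Hypothetical stand-in for GOMP_MAP_NONCONTIG_ARRAY_P.  */
static bool noncontig_kind_p (unsigned short kind)
{
  return (kind & GOMP_MAP_FLAG_SPECIAL_3) != 0;
}

/* Clauses reordered so non-contiguous maps come first: stop at the
   first ordinary map ("early breaking").  */
static size_t count_noncontig_sorted (const unsigned short *kinds, size_t n)
{
  assert (kinds != NULL || n == 0);
  size_t i = 0;
  while (i < n && noncontig_kind_p (kinds[i]))
    i++;
  return i;
}

/* No ordering convention: every map has to be examined.  */
static size_t count_noncontig_any (const unsigned short *kinds, size_t n)
{
  assert (kinds != NULL || n == 0);
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (noncontig_kind_p (kinds[i]))
      count++;
  return count;
}
```

The kind values used to exercise it (for example `0x21`, `0x01`) are arbitrary; note that the early-breaking variant is only correct if the compiler really guarantees the ordering.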


--- include/gomp-constants.h(revision 277827)
+++ include/gomp-constants.h(working copy)
@@ -40,6 +40,7 @@
  #define GOMP_MAP_FLAG_SPECIAL_0   (1 << 2)
  #define GOMP_MAP_FLAG_SPECIAL_1   (1 << 3)
  #define GOMP_MAP_FLAG_SPECIAL_2   (1 << 4)
+#define GOMP_MAP_FLAG_SPECIAL_3   (1 << 5)
  #define GOMP_MAP_FLAG_SPECIAL (GOMP_MAP_FLAG_SPECIAL_1 \
 | GOMP_MAP_FLAG_SPECIAL_0)
  /* Flag to force a specific behavior (or else, trigger a run-time error).  */
@@ -127,6 +128,26 @@ enum gomp_map_kind
  /* Decrement usage count and deallocate if zero.  */
  GOMP_MAP_RELEASE =(GOMP_MAP_FLAG_SPECIAL_2
 | GOMP_MAP_DELETE),
+/* Mapping kinds for non-contiguous arrays.  */
+GOMP_MAP_NONCONTIG_ARRAY = (GOMP_MAP_FLAG_SPECIAL_3),
+GOMP_MAP_NONCONTIG_ARRAY_TO =  (GOMP_MAP_NONCONTIG_ARRAY
+| GOMP_MAP_TO),
+GOMP_MAP_NONCONTIG_ARRAY_FROM =(GOMP_MAP_NONCONTIG_ARRAY

Re: Ping: [PATCH V6] Extend IPA-CP to support arithmetically-computed value-passing on by-ref argument (PR ipa/91682)

2019-11-12 Thread Jan Hubicka
> > +2019-11-05  Feng Xue  
> > +
> > +   PR ipa/91682
> > +   * ipa-prop.h (jump_func_type): New value IPA_JF_LOAD_AGG.
> > +   (ipa_load_agg_data, ipa_agg_value, ipa_agg_value_set): New structs.
> > +   (ipa_agg_jf_item): Add new field jftype and type, redefine field value.
> > +   (ipa_agg_jump_function): Remove member function equal_to.
> > +   (ipa_agg_jump_function_p): Remove typedef.
> > +   (ipa_copy_agg_values, ipa_release_agg_values): New functions.
> > +   * ipa-prop.c (ipa_print_node_jump_functions_for_edge): Dump
> > +   information for aggregate jump function.
> > +   (get_ssa_def_if_simple_copy): Add new parameter rhs_stmt to
> > +   record last definition statement.
> > +   (load_from_unmodified_param_or_agg): New function.
> > +   (ipa_known_agg_contents_list): Add new field type and value, remove
> > +   field constant.
> > +   (build_agg_jump_func_from_list): Rename parameter const_count to
> > +   value_count, build aggregate jump function from ipa_load_agg_data.
> > +   (analyze_agg_content_value): New function.
> > +   (extract_mem_content): Analyze memory store assignment to prepare
> > +   information for aggregate jump function generation.
> > +   (determine_known_aggregate_parts): Add new parameter fbi, remove
> > +   parameter aa_walk_budeget_p.
> > +   (update_jump_functions_after_inlining): Update aggregate jump function.
> > +   (ipa_find_agg_cst_for_param): Change type of parameter agg.
> > +   (try_make_edge_direct_simple_call): Add new parameter new_root.
> > +   (try_make_edge_direct_virtual_call): Add new parameter new_root and
> > +   new_root_info.
> > +   (update_indirect_edges_after_inlining): Pass new argument to
> > +   try_make_edge_direct_simple_call and try_make_edge_direct_virtual_call.
> > +   (ipa_write_jump_function): Write aggregate jump function to file.
> > +   (ipa_read_jump_function): Read aggregate jump function from file.
> > +   (ipa_agg_value::equal_to): Migrate from ipa_agg_jf_item::equal_to.
> > +   * ipa-cp.c (ipa_get_jf_arith_result): New function.
> > +   (ipa_agg_value_from_node): Likewise.
> > +   (ipa_agg_value_set_from_jfunc): Likewise.
> > +   (propagate_vals_across_arith_jfunc): Likewise.
> > +   (propagate_aggregate_lattice): Likewise.
> > +   (ipa_get_jf_pass_through_result): Call ipa_get_jf_arith_result.
> > +   (propagate_vals_across_pass_through): Call
> > +   propagate_vals_across_arith_jfunc.
> > +   (get_clone_agg_value): Move forward.
> > +   (propagate_aggs_across_jump_function): Handle aggregate jump function
> > +   propagation.
> > +   (agg_jmp_p_vec_for_t_vec): Remove.
> > +   (context_independent_aggregate_values): Change use of
> > +   vec to vec.
> > +   (copy_plats_to_inter, intersect_with_plats): Likewise.
> > +   (agg_replacements_to_vector, intersect_with_agg_replacements): Likewise.
> > +   (intersect_aggregate_with_edge): Likewise.
> > +   (find_aggregate_values_for_callers_subset): Likewise.
> > +   (cgraph_edge_brings_all_agg_vals_for_node): Likewise.
> > +   (estimate_local_effects): Change use of vec
> > +   and vec to vec.
> > +   (gather_context_independent_values): Likewise.
> > +   (perform_estimation_of_a_value, decide_whether_version_node): Likewise.
> > +   * ipa-fnsummary.c (evaluate_conditions_for_known_args): Change use of
> > +   vec to vec.
> > +   (evaluate_properties_for_edge): Likewise.
> > +   (estimate_edge_devirt_benefit): Likewise.
> > +   (estimate_edge_size_and_time):  Likewise.
> > +   (estimate_calls_size_and_time): Likewise.
> > +   (ipa_call_context::ipa_call_context): Likewise.
> > +   (estimate_ipcp_clone_size_and_time):  Likewise.
> > +   * ipa-fnsummary.h (ipa_call_context): Change use of
> > +   vec to vec.
> > +   * ipa-inline-analysis.c (do_estimate_edge_time): Change use of
> > +   vec to vec.
> > +   (do_estimate_edge_size): Likewise.
> > +   (do_estimate_edge_hints): Likewise.
> > +
> 
> OK, thanks - this looks like a very nice ipa-prop improvement.
Also note that there is a long-standing problem with inlining ipacp
clones.  It can be shown with the following example:

struct a {int a;};
static int foo (struct a a)
{
  return a.a;
}
__attribute__ ((noinline))
static int bar (struct a a)
{
  return foo(a);
}
main()
{
  struct a a={1};
  return bar (a);
}

Now, if you compile it with -O2 -fno-early-inlining, ipacp correctly
determines the constants:

Estimating effects for bar/1.
   Estimating body: bar/1
   Known to be false: 
   size:6 time:14.00 nonspec time:14.00
 - context independent values, size: 6, time_benefit: 0.00
 Decided to specialize for all known contexts, code not going to grow.
Setting dest_lattice to bottom, because type of param 0 of foo is NULL or unsuitable for bits propagation

Estimating effects for foo/0.
   Estimating body: foo/0
   Known to be false: op0[offset: 0] changed
   size:3 time:2.00 nonspec time:3.00
 - context independent values, size: 3, time_benefit: 1.00
 Decided to specialize for all known contexts, code not going 

Re: Ping: [PATCH V6] Extend IPA-CP to support arithmetically-computed value-passing on by-ref argument (PR ipa/91682)

2019-11-12 Thread Jan Hubicka
> +2019-11-05  Feng Xue  
> +
> + PR ipa/91682
> + * ipa-prop.h (jump_func_type): New value IPA_JF_LOAD_AGG.
> + (ipa_load_agg_data, ipa_agg_value, ipa_agg_value_set): New structs.
> + (ipa_agg_jf_item): Add new field jftype and type, redefine field value.
> + (ipa_agg_jump_function): Remove member function equal_to.
> + (ipa_agg_jump_function_p): Remove typedef.
> + (ipa_copy_agg_values, ipa_release_agg_values): New functions.
> + * ipa-prop.c (ipa_print_node_jump_functions_for_edge): Dump
> + information for aggregate jump function.
> + (get_ssa_def_if_simple_copy): Add new parameter rhs_stmt to
> + record last definition statement.
> + (load_from_unmodified_param_or_agg): New function.
> + (ipa_known_agg_contents_list): Add new field type and value, remove
> + field constant.
> + (build_agg_jump_func_from_list): Rename parameter const_count to
> + value_count, build aggregate jump function from ipa_load_agg_data.
> + (analyze_agg_content_value): New function.
> + (extract_mem_content): Analyze memory store assignment to prepare
> + information for aggregate jump function generation.
> + (determine_known_aggregate_parts): Add new parameter fbi, remove
> + parameter aa_walk_budeget_p.
> + (update_jump_functions_after_inlining): Update aggregate jump function.
> + (ipa_find_agg_cst_for_param): Change type of parameter agg.
> + (try_make_edge_direct_simple_call): Add new parameter new_root.
> + (try_make_edge_direct_virtual_call): Add new parameter new_root and
> + new_root_info.
> + (update_indirect_edges_after_inlining): Pass new argument to
> + try_make_edge_direct_simple_call and try_make_edge_direct_virtual_call.
> + (ipa_write_jump_function): Write aggregate jump function to file.
> + (ipa_read_jump_function): Read aggregate jump function from file.
> + (ipa_agg_value::equal_to): Migrate from ipa_agg_jf_item::equal_to.
> + * ipa-cp.c (ipa_get_jf_arith_result): New function.
> + (ipa_agg_value_from_node): Likewise.
> + (ipa_agg_value_set_from_jfunc): Likewise.
> + (propagate_vals_across_arith_jfunc): Likewise.
> + (propagate_aggregate_lattice): Likewise.
> + (ipa_get_jf_pass_through_result): Call ipa_get_jf_arith_result.
> + (propagate_vals_across_pass_through): Call
> + propagate_vals_across_arith_jfunc.
> + (get_clone_agg_value): Move forward.
> + (propagate_aggs_across_jump_function): Handle aggregate jump function
> + propagation.
> + (agg_jmp_p_vec_for_t_vec): Remove.
> + (context_independent_aggregate_values): Change use of
> + vec to vec.
> + (copy_plats_to_inter, intersect_with_plats): Likewise.
> + (agg_replacements_to_vector, intersect_with_agg_replacements): Likewise.
> + (intersect_aggregate_with_edge): Likewise.
> + (find_aggregate_values_for_callers_subset): Likewise.
> + (cgraph_edge_brings_all_agg_vals_for_node): Likewise.
> + (estimate_local_effects): Change use of vec
> + and vec to vec.
> + (gather_context_independent_values): Likewise.
> + (perform_estimation_of_a_value, decide_whether_version_node): Likewise.
> + * ipa-fnsummary.c (evaluate_conditions_for_known_args): Change use of
> + vec to vec.
> + (evaluate_properties_for_edge): Likewise.
> + (estimate_edge_devirt_benefit): Likewise.
> + (estimate_edge_size_and_time):  Likewise.
> + (estimate_calls_size_and_time): Likewise.
> + (ipa_call_context::ipa_call_context): Likewise.
> + (estimate_ipcp_clone_size_and_time):  Likewise.
> + * ipa-fnsummary.h (ipa_call_context): Change use of
> + vec to vec.
> + * ipa-inline-analysis.c (do_estimate_edge_time): Change use of
> + vec to vec.
> + (do_estimate_edge_size): Likewise.
> + (do_estimate_edge_hints): Likewise.
> +

OK, thanks - this looks like a very nice ipa-prop improvement.

Honza


[PATCH] Fix PR92460

2019-11-12 Thread Richard Biener


Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-11-12  Richard Biener  

PR tree-optimization/92460
* tree-vect-stmts.c (vectorizable_simd_clone_call): Unshare
expression before gimplifying.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 278081)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -4246,8 +4246,8 @@ vectorizable_simd_clone_call (stmt_vec_i
{
  gimple_seq stmts;
  arginfo[i].op
-   = force_gimple_operand (arginfo[i].op, &stmts, true,
-   NULL_TREE);
+   = force_gimple_operand (unshare_expr (arginfo[i].op),
+   &stmts, true, NULL_TREE);
  if (stmts != NULL)
{
  basic_block new_bb;
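Unsharing before gimplifying follows the general GCC rule that the gimplifier may rewrite an expression tree in place; if that tree is still referenced from somewhere else, the other user silently sees the rewritten form. A toy C model of the hazard and of the copy-first fix follows. The types and functions are hypothetical and only loosely mirror `unshare_expr` and `force_gimple_operand`; they are not GCC internals.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy expression tree, a hypothetical stand-in for GCC trees: either a
   leaf holding a value or an OP_PLUS node with two operands.  */
typedef enum { OP_LEAF, OP_PLUS } toy_op;
typedef struct toy_expr {
  toy_op op;
  int value;
  struct toy_expr *l, *r;
} toy_expr;

static toy_expr *toy_alloc (void)
{
  toy_expr *e = malloc (sizeof *e);
  if (e == NULL)
    abort ();
  return e;
}

static toy_expr *toy_leaf (int v)
{
  toy_expr *e = toy_alloc ();
  e->op = OP_LEAF; e->value = v; e->l = e->r = NULL;
  return e;
}

static toy_expr *toy_plus (toy_expr *l, toy_expr *r)
{
  toy_expr *e = toy_alloc ();
  e->op = OP_PLUS; e->value = 0; e->l = l; e->r = r;
  return e;
}

/* Deep copy, in the spirit of unshare_expr.  */
static toy_expr *toy_unshare (const toy_expr *e)
{
  toy_expr *c = toy_alloc ();
  *c = *e;
  if (e->op == OP_PLUS)
    {
      c->l = toy_unshare (e->l);
      c->r = toy_unshare (e->r);
    }
  return c;
}

/* Toy "gimplification" that rewrites E in place: an OP_PLUS node becomes
   a leaf holding the sum (standing in for a new temporary).  */
static void toy_lower_in_place (toy_expr *e)
{
  assert (e != NULL);
  if (e->op != OP_PLUS)
    return;
  toy_lower_in_place (e->l);
  toy_lower_in_place (e->r);
  e->value = e->l->value + e->r->value;
  free (e->l);
  free (e->r);
  e->op = OP_LEAF;
  e->l = e->r = NULL;
}

static void toy_free (toy_expr *e)
{
  assert (e != NULL);
  if (e->op == OP_PLUS)
    {
      toy_free (e->l);
      toy_free (e->r);
    }
  free (e);
}
```

Lowering a copy leaves the original `OP_PLUS` tree intact for its other users; lowering the shared node directly would change it for everyone holding a pointer to it.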


Re: [PATCH 3/X] [libsanitizer] Add option to bootstrap using HWASAN

2019-11-12 Thread Martin Liška

On 11/11/19 5:03 PM, Matthew Malcomson wrote:

Ah!
My apologies -- I sent up a series with a few documentation mistakes.
(the others were wording problems so less noticeable)


That's fine, I fixed that very easily.

Right now, I can confirm that an aarch64 KVM guest with the following Linux
kernel works: 5.4.0-rc6-3.g7068448-default. I haven't tried the HWASAN
bootstrap, but I can run almost all hwasan.exp tests.

There are 2 exceptions:

FAIL: gcc.dg/hwasan/stack-tagging-basic-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
FAIL: gcc.dg/hwasan/large-aligned-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test

These fail due to an unused return value of a function that returns int. The
attached patch fixes that.
I'm planning to make proper comments about the series starting next week.

In the meantime, I have some libsanitizer upstream suggestions
that you may want to discuss. They are mostly about
differences in the shadow memory dump between ASAN and HWASAN:

Let's consider one example:

$ cat malloc.c

#include 

int main(int argc, char **argv)
{
char *ptr = malloc (argc);
return ptr[1];
}

$ gcc malloc.c -fsanitize=address && ./a.out
=
==7319==ERROR: AddressSanitizer: heap-buffer-overflow on address 0xaca007b1 at pc 0x004007a0 bp 0xf26df150 sp 0xf26df168
READ of size 1 at 0xaca007b1 thread T0
#0 0x40079c in main (/home/marxin/Programming/gcc/a.out+0x40079c)
#1 0xb0d3d3e8 in __libc_start_main (/lib64/libc.so.6+0x243e8)
#2 0x400670  (/home/marxin/Programming/gcc/a.out+0x400670)

0xaca007b1 is located 0 bytes to the right of 1-byte region [0xaca007b0,0xaca007b1)
allocated by thread T0 here:
#0 0xb0f2bdbc in __interceptor_malloc ../../../../libsanitizer/asan/asan_malloc_linux.cpp:145
#1 0x400748 in main (/home/marxin/Programming/gcc/a.out+0x400748)
#2 0xb0d3d3e8 in __libc_start_main (/lib64/libc.so.6+0x243e8)
#3 0x400670  (/home/marxin/Programming/gcc/a.out+0x400670)

SUMMARY: AddressSanitizer: heap-buffer-overflow (/home/marxin/Programming/gcc/a.out+0x40079c) in main
Shadow bytes around the buggy address:
  0x200ff59400a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x200ff59400b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x200ff59400c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x200ff59400d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x200ff59400e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x200ff59400f0: fa fa fa fa fa fa[01]fa fa fa fa fa fa fa fa fa
  0x200ff5940100: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x200ff5940110: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x200ff5940120: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x200ff5940130: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x200ff5940140: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:     fa
  Freed heap region:     fd
  Stack left redzone:    f1
  Stack mid redzone:     f2
  Stack right redzone:   f3
  Stack after return:    f5
  Stack use after scope: f8
  Global redzone:        f9
  Global init order:     f6
  Poisoned by user:      f7
  Container overflow:    fc
  Array cookie:          ac
  Intra object redzone:  bb
  ASan internal:         fe
  Left alloca redzone:   ca
  Right alloca redzone:  cb
  Shadow gap:            cc
==7319==ABORTING
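In the ASan report above, the shadow byte `[01]` marks the granule holding the 1-byte allocation: only its first byte is addressable, so the read of `ptr[1]` is reported. A minimal C model of that encoding and of the fast-path check for small accesses follows. It is a simplified sketch written for illustration, not libsanitizer code; the shadow offset is a parameter, and `1 << 36` is assumed here to be the usual AArch64 Linux value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Classic ASan maps each 8-byte granule of application memory to one
   shadow byte at (addr >> 3) + offset; the offset is platform-specific.  */
static uint64_t asan_shadow_addr (uint64_t addr, uint64_t offset)
{
  return (addr >> 3) + offset;
}

/* Fast-path check for an access of SIZE bytes (1, 2 or 4, not crossing an
   8-byte granule) at ADDR, given that granule's shadow byte:
     0x00        whole granule addressable
     0x01..0x07  only the first k bytes addressable
     >= 0x80     poisoned (0xfa heap redzone, 0xfd freed, ...).  */
static bool asan_access_bad (uint64_t addr, size_t size, uint8_t shadow)
{
  assert (size == 1 || size == 2 || size == 4);
  if (shadow == 0)
    return false;
  if (shadow >= 0x80)
    return true;
  return (addr & 7) + size - 1 >= shadow;
}
```

With shadow byte `0x01`, a 1-byte read at granule offset 0 passes and one at offset 1 (the `ptr[1]` case) is reported; every magic value in the legend (`fa`, `fd`, `f1`, ..., `cc`) is at least `0x80`, so any access to such a granule is reported.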

$ gcc malloc.c -fsanitize=hwaddress && ./a.out
==7329==ERROR: HWAddressSanitizer: tag-mismatch on address 0xefdee001 at pc 0x804bbcd0
READ of size 1 at 0xefdee001 tags: 03/01 (ptr/mem) in thread T0
#0 0x804bbccc in SigTrap<0> ../../../../libsanitizer/hwasan/hwasan_checks.h:27
#1 0x804bbccc in CheckAddress<(__hwasan::ErrorAction)0, (__hwasan::AccessType)0, 0> ../../../../libsanitizer/hwasan/hwasan_checks.h:88
#2 0x804bbccc in __hwasan_load1 ../../../../libsanitizer/hwasan/hwasan.cpp:469
#3 0x4007d4 in main (/home/marxin/Programming/gcc/a.out+0x4007d4)
#4 0x8035e3e8 in __libc_start_main (/lib64/libc.so.6+0x243e8)
#5 0x4006b0  (/home/marxin/Programming/gcc/a.out+0x4006b0)

[0xefdee000,0xefdee020) is a small allocated heap chunk; size: 32 offset: 1
0xefdee001 is located 0 bytes to the right of 1-byte region [0xefdee000,0xefdee001)
allocated here:
#0 0x804bd81c in __sanitizer_malloc ../../../../libsanitizer/hwasan/hwasan_interceptors.cpp:169
#1 0x4007b8 in main (/home/marxin/Programming/gcc/a.out+0x4007b8)
#2 0x8035e3e8 in __libc_start_main (/lib64/libc.so.6+0x243e8)
#3 0x4006b0  (/home/marxin/Programming/gcc/a.out+0x4006b0)

Thread: T0 0xeffe2000 stack: [0xd63c2000,0xd6bc2000) sz: 8388608 tls: [0x80e25020,0x80e25790)
Memory tags around 

[PATCH] Fix PR92461

2019-11-12 Thread Richard Biener


Bootstrapped & tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-11-12  Richard Biener  

PR tree-optimization/92461
* tree-vect-loop.c (vect_create_epilog_for_reduction): Update
stmt after propagation.

* gcc.dg/torture/pr92461.c: New testcase.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 278081)
+++ gcc/tree-vect-loop.c(working copy)
@@ -5300,8 +5300,11 @@ vect_create_epilog_for_reduction (stmt_v
   orig_name = PHI_RESULT (exit_phi);
   scalar_result = scalar_results[k];
   FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, orig_name)
-FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
-  SET_USE (use_p, scalar_result);
+   {
+ FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
+   SET_USE (use_p, scalar_result);
+ update_stmt (use_stmt);
+   }
 }
 
   phis.release ();
Index: gcc/testsuite/gcc.dg/torture/pr92461.c
===
--- gcc/testsuite/gcc.dg/torture/pr92461.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr92461.c  (working copy)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-ftree-vectorize" } */
+
+short int zb;
+
+void
+gs (void)
+{
+  while (zb < 1)
+{
+  int at;
+
+  zb %= 1;
+
+  for (at = 0; at < 56; ++at)
+   zb += zb;
+
+  ++zb;
+}
+}


[PATCH 0/2] gdbinit.in fixes

2019-11-12 Thread Konstantin Kharlamov
This includes the previously posted, not yet merged patch about calling
functions with `call`. I made a typo in the commit message formatting, so
I fix it here.

I'd like to note that I am not subscribed to the list, so please add me
to CC when replying. Thanks in advance.

Konstantin Kharlamov (2):
  gdbinit.in: call a function with "call", not "set"
  gdbinit.in: fix wrong reference to function argument

 gcc/gdbinit.in | 84 +-
 1 file changed, 42 insertions(+), 42 deletions(-)

-- 
2.24.0



[PATCH 2/2] gdbinit.in: fix wrong reference to function argument

2019-11-12 Thread Konstantin Kharlamov
Gdb documentation says that gdb-defined functions refer to their args as
$arg0…$argN. The bare "$" that gdbinit uses here refers to something
else, which results in gdb errors or even crashes of the debuggee. Let's
fix that.

* (debug,debug_rtx,pr,prl,pt,pct,pgg,pgq,pgs,pge,pmz,ptc,
pdn,ptn,pdd,prc,pi,pbs,pbm,pel,pcfun,trt): replace $ with $arg0
---
 gcc/gdbinit.in | 82 +-
 1 file changed, 41 insertions(+), 41 deletions(-)

diff --git a/gcc/gdbinit.in b/gcc/gdbinit.in
index a933ddc6141..71a01edaa71 100644
--- a/gcc/gdbinit.in
+++ b/gcc/gdbinit.in
@@ -17,153 +17,153 @@
 # .
 
 define pp
-call debug ($)
+call debug ($arg0)
 end
 
 document pp
-Print a representation of the GCC data structure that is $.
+Print a representation of the GCC data structure that is $arg0.
 Works only when an inferior is executing.
 end
 
 define pr
-call debug_rtx ($)
+call debug_rtx ($arg0)
 end
 
 document pr
-Print the full structure of the rtx that is $.
+Print the full structure of the rtx that is $arg0.
 Works only when an inferior is executing.
 end
 
 define prl
-call debug_rtx_list ($, debug_rtx_count)
+call debug_rtx_list ($arg0, debug_rtx_count)
 end
 
 document prl
-Print the full structure of all rtx insns beginning at $.
+Print the full structure of all rtx insns beginning at $arg0.
 Works only when an inferior is executing.
 Uses variable debug_rtx_count to control number of insns printed:
-  debug_rtx_count > 0: print from $ on.
-  debug_rtx_count < 0: print a window around $.
+  debug_rtx_count > 0: print from $arg0 on.
+  debug_rtx_count < 0: print a window around $arg0.
 
 There is also debug_rtx_find (rtx, uid) that will scan a list for UID and print
 it using debug_rtx_list. Usage example: set $foo=debug_rtx_find(first, 42)
 end
 
 define pt
-call debug_tree ($)
+call debug_tree ($arg0)
 end
 
 document pt
-Print the full structure of the tree that is $.
+Print the full structure of the tree that is $arg0.
 Works only when an inferior is executing.
 end
 
 define pct
-call debug_c_tree ($)
+call debug_c_tree ($arg0)
 end
 
 document pct
-Print the tree that is $ in C syntax.
+Print the tree that is $arg0 in C syntax.
 Works only when an inferior is executing.
 end
 
 define pgg
-call debug_gimple_stmt ($)
+call debug_gimple_stmt ($arg0)
 end
 
 document pgg
-Print the Gimple statement that is $ in C syntax.
+Print the Gimple statement that is $arg0 in C syntax.
 Works only when an inferior is executing.
 end
 
 define pgq
-call debug_gimple_seq ($)
+call debug_gimple_seq ($arg0)
 end
 
 document pgq
-Print the Gimple sequence that is $ in C syntax.
+Print the Gimple sequence that is $arg0 in C syntax.
 Works only when an inferior is executing.
 end
 
 define pgs
-call debug_generic_stmt ($)
+call debug_generic_stmt ($arg0)
 end
 
 document pgs
-Print the statement that is $ in C syntax.
+Print the statement that is $arg0 in C syntax.
 Works only when an inferior is executing.
 end
 
 define pge
-call debug_generic_expr ($)
+call debug_generic_expr ($arg0)
 end
 
 document pge
-Print the expression that is $ in C syntax.
+Print the expression that is $arg0 in C syntax.
 Works only when an inferior is executing.
 end
 
 define pmz
-call mpz_out_str(stderr, 10, $)
+call mpz_out_str(stderr, 10, $arg0)
 end
 
 document pmz
-Print the mpz value that is $
+Print the mpz value that is $arg0
 Works only when an inferior is executing.
 end
 
 define ptc
-output (enum tree_code) $.base.code
+output (enum tree_code) $arg0.base.code
 echo \n
 end
 
 document ptc
-Print the tree-code of the tree node that is $.
+Print the tree-code of the tree node that is $arg0.
 end
 
 define pdn
-output $.decl_minimal.name->identifier.id.str
+output $arg0.decl_minimal.name->identifier.id.str
 echo \n
 end
 
 document pdn
-Print the name of the decl-node that is $.
+Print the name of the decl-node that is $arg0.
 end
 
 define ptn
-output $.type.name->decl_minimal.name->identifier.id.str
+output $arg0.type.name->decl_minimal.name->identifier.id.str
 echo \n
 end
 
 document ptn
-Print the name of the type-node that is $.
+Print the name of the type-node that is $arg0.
 end
 
 define pdd
-call debug_dwarf_die ($)
+call debug_dwarf_die ($arg0)
 end
 
 document pdd
-Print the dw_die_ref that is in $.
+Print the dw_die_ref that is in $arg0.
 end
 
 define prc
-output (enum rtx_code) $.code
+output (enum rtx_code) $arg0.code
 echo \ (
-output $.mode
+output $arg0.mode
 echo )\n
 end
 
 document prc
-Print the rtx-code and machine mode of the rtx that is $.
+Print the rtx-code and machine mode of the rtx that is $arg0.
 end
 
 define pi
-print $.u.fld[0].rt_rtx@7
+print $arg0.u.fld[0].rt_rtx@7
 end
 
 document pi
-Print the fields of an instruction that is $.
+Print the fields of an instruction that is $arg0.
 end
 
 define pbs
@@ -176,20 +176,20 @@ including the global binding level.
 end
 
 define pbm
-call bitmap_print (stderr, $, "", "\n")
+call bitmap_print 

[PATCH 1/2] gdbinit.in: call a function with "call", not "set"

2019-11-12 Thread Konstantin Kharlamov
The last time a command that calls a debuggee function with "set" was
added was in 2013. Apparently something has changed since then, because
doing "set foo()" in gdb to call "foo()" now results in an error.
Regardless, it looks wrong to call a function with "set". Let's use
"call" instead.

* (debug_rtx,debug_rtx_list,debug_tree,debug_c_tree,debug_gimple_stmt,
debug_gimple_seq,mpz_out_str,debug_dwarf_die,print_binding_stack,
bitmap_print): Replace "set" with "call"
---
 gcc/gdbinit.in | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/gcc/gdbinit.in b/gcc/gdbinit.in
index 42302aecfe3..a933ddc6141 100644
--- a/gcc/gdbinit.in
+++ b/gcc/gdbinit.in
@@ -26,7 +26,7 @@ Works only when an inferior is executing.
 end
 
 define pr
-set debug_rtx ($)
+call debug_rtx ($)
 end
 
 document pr
@@ -35,7 +35,7 @@ Works only when an inferior is executing.
 end
 
 define prl
-set debug_rtx_list ($, debug_rtx_count)
+call debug_rtx_list ($, debug_rtx_count)
 end
 
 document prl
@@ -50,7 +50,7 @@ it using debug_rtx_list. Usage example: set $foo=debug_rtx_find(first, 42)
 end
 
 define pt
-set debug_tree ($)
+call debug_tree ($)
 end
 
 document pt
@@ -59,7 +59,7 @@ Works only when an inferior is executing.
 end
 
 define pct
-set debug_c_tree ($)
+call debug_c_tree ($)
 end
 
 document pct
@@ -68,7 +68,7 @@ Works only when an inferior is executing.
 end
 
 define pgg
-set debug_gimple_stmt ($)
+call debug_gimple_stmt ($)
 end
 
 document pgg
@@ -77,7 +77,7 @@ Works only when an inferior is executing.
 end
 
 define pgq
-set debug_gimple_seq ($)
+call debug_gimple_seq ($)
 end
 
 document pgq
@@ -86,7 +86,7 @@ Works only when an inferior is executing.
 end
 
 define pgs
-set debug_generic_stmt ($)
+call debug_generic_stmt ($)
 end
 
 document pgs
@@ -95,7 +95,7 @@ Works only when an inferior is executing.
 end
 
 define pge
-set debug_generic_expr ($)
+call debug_generic_expr ($)
 end
 
 document pge
@@ -104,7 +104,7 @@ Works only when an inferior is executing.
 end
 
 define pmz
-set mpz_out_str(stderr, 10, $)
+call mpz_out_str(stderr, 10, $)
 end
 
 document pmz
@@ -140,7 +140,7 @@ Print the name of the type-node that is $.
 end
 
 define pdd
-set debug_dwarf_die ($)
+call debug_dwarf_die ($)
 end
 
 document pdd
@@ -167,7 +167,7 @@ Print the fields of an instruction that is $.
 end
 
 define pbs
-set print_binding_stack ()
+call print_binding_stack ()
 end
 
 document pbs
@@ -176,7 +176,7 @@ including the global binding level.
 end
 
 define pbm
-set bitmap_print (stderr, $, "", "\n")
+call bitmap_print (stderr, $, "", "\n")
 end
 
 document pbm
-- 
2.24.0



Re: [PATCH] Enable libsanitizer build on riscv64

2019-11-12 Thread Jakub Jelinek
On Tue, Nov 12, 2019 at 11:32:56AM +0100, Jakub Jelinek wrote:
> On Tue, Nov 12, 2019 at 10:56:21AM +0100, Andreas Schwab wrote:
> > On Nov 11 2019, Jim Wilson wrote:
> > 
> > > ../../../../gcc-git/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp:1136:1:
> > > note: in expansion of macro ‘CHECK_SIZE_AND_OFFSET’
> > >  1136 | CHECK_SIZE_AND_OFFSET(ipc_perm, mode);
> > >   | ^
> > 
> > Looks like you are using an unreleased version of glibc.  This works
> > correctly with glibc 2.30.
> > 
> > As you have noticed, this will need to be corrected for all
> > architectures where the ipc_perm structure has been changed in commit
> > 2f959dfe84, once glibc 2.31 has been released.  Care to file an llvm
> > issue about that?
> 
> We actually have a change cherry-picked from upstream for the
> https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=2f959dfe849e0646e27403f2e4091536496ac0f0
> glibc change (https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01554.html),
> but only for arm, while it apparently broke either all or many other
> architectures (at least x86_64 and riscv64 are now reported).

Of the Linux targets supported by GCC's libsanitizer, I think the affected
ones are sparc 32-bit, s390 31-bit (this one is even an ABI change: mode
not only changed size, but on big-endian its offset did not change, and
unfortunately libsanitizer intercepts shmctl), arm (again an ABI change on
big-endian, which libsanitizer's interception will not cope with, as it uses
dlsym rather than dlvsym and is not symbol-versioned), x86_64, i?86 and
riscv64.
So either we go for something like this (untested):
--- libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp	2019-11-07 17:56:23.551835239 +0100
+++ libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp	2019-11-12 12:14:24.216763190 +0100
@@ -1129,10 +1129,12 @@ CHECK_SIZE_AND_OFFSET(ipc_perm, gid);
 CHECK_SIZE_AND_OFFSET(ipc_perm, cuid);
 CHECK_SIZE_AND_OFFSET(ipc_perm, cgid);
 #if (!defined(__aarch64__) || !SANITIZER_LINUX || __GLIBC_PREREQ (2, 21)) && \
-!defined(__arm__)
+(!SANITIZER_LINUX || !__GLIBC_PREREQ (2, 30) || \
+ defined(__powerpc__) || (defined(__sparc__) && defined(__arch64__)) || \
+ defined(__mips__) || defined(__aarch64__) || defined(__s390x__))
 /* On aarch64 glibc 2.20 and earlier provided incorrect mode field.  */
-/* On Arm newer glibc provide a different mode field, it's hard to detect
-   so just disable the check.  */
+/* glibc 2.30 and earlier provided 16-bit mode field instead of 32-bit
+   on most architectures.  */
 CHECK_SIZE_AND_OFFSET(ipc_perm, mode);
 #endif
 
or, perhaps better, change sanitizer_platform_limits_posix.h to match the
glibc 2.31 definition and, similarly to aarch64, don't check mode for
!__GLIBC_PREREQ (2, 31); that would be something like this (untested):
--- libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h	2019-11-07 17:56:23.530835549 +0100
+++ libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h	2019-11-12 12:22:26.314511706 +0100
@@ -207,26 +207,13 @@ struct __sanitizer_ipc_perm {
   u64 __unused1;
   u64 __unused2;
 #elif defined(__sparc__)
-#if defined(__arch64__)
   unsigned mode;
-  unsigned short __pad1;
-#else
-  unsigned short __pad1;
-  unsigned short mode;
   unsigned short __pad2;
-#endif
   unsigned short __seq;
   unsigned long long __unused1;
   unsigned long long __unused2;
-#elif defined(__mips__) || defined(__aarch64__) || defined(__s390x__)
-  unsigned int mode;
-  unsigned short __seq;
-  unsigned short __pad1;
-  unsigned long __unused1;
-  unsigned long __unused2;
 #else
-  unsigned short mode;
-  unsigned short __pad1;
+  unsigned int mode;
   unsigned short __seq;
   unsigned short __pad2;
 #if defined(__x86_64__) && !defined(_LP64)
--- libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp	2019-11-07 17:56:23.551835239 +0100
+++ libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp	2019-11-12 12:23:42.959358844 +0100
@@ -1128,11 +1128,9 @@ CHECK_SIZE_AND_OFFSET(ipc_perm, uid);
 CHECK_SIZE_AND_OFFSET(ipc_perm, gid);
 CHECK_SIZE_AND_OFFSET(ipc_perm, cuid);
 CHECK_SIZE_AND_OFFSET(ipc_perm, cgid);
-#if (!defined(__aarch64__) || !SANITIZER_LINUX || __GLIBC_PREREQ (2, 21)) && \
-!defined(__arm__)
-/* On aarch64 glibc 2.20 and earlier provided incorrect mode field.  */
-/* On Arm newer glibc provide a different mode field, it's hard to detect
-   so just disable the check.  */
+#if !SANITIZER_LINUX || __GLIBC_PREREQ (2, 31)
+/* glibc 2.30 and earlier provided 16-bit mode field instead of 32-bit
+   on most architectures.  */
 CHECK_SIZE_AND_OFFSET(ipc_perm, mode);
 #endif
 
But I'm afraid I don't really have the cycles to test this on all targets,
nor does it fix the arm big-endian or s390 31-bit problem with shmctl.

Jakub


