Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization

2019-06-18 Thread luoxhu

Hi Martin,

On 2019/6/18 17:34, Martin Liška wrote:

On 6/18/19 11:02 AM, luoxhu wrote:

Hi,

On 2019/6/18 13:51, Martin Liška wrote:

On 6/18/19 3:45 AM, Xiong Hu Luo wrote:

Hello.

Thank you for the interest in the area.


This patch aims to fix PR69678, caused by PGO indirect call profiling bugs.
Currently the default instrumentation function can only find an indirect call
target that is called more than 50% of the time, and the count it returns is
incorrect.

Can you please explain what you mean by 'an incorrect count number returned'?


For the test case indir-call-topn.c, which makes indirect calls to two functions, "one" and "two", the profiling
data with trunk code is shown below. (Trunk includes your patch, which swaps count[0] and count[2]. count[0] is used
in ipa-profile, which only supports the top1 format; my patch adds support for the topn format. WITHOUT your patch
count[0] is 0; with your fix it is 35000, which is better but still not correct. In fact "one" runs 17500 times
and "two" runs the other 17500 times.)

indir-call-topn.gcda:   22:    01a9:  18:COUNTERS indirect_call 9 counts
indir-call-topn.gcda:   24:   0: *35000 1868707024 0* 0 0 0 0 0

Running with "--param indir-call-topn-profile=1" gives the profile data below.
My patch is based on this profile result and performs the optimization for multiple
indirect call targets; performance can improve significantly on this test case and on some
SPEC2017 benchmarks (LLVM has supported this for several years...).

indir-call-topn.gcda:   26:    01b1:  18:COUNTERS indirect_call_topn 9 counts
indir-call-topn.gcda:   28:   0: *0 969338501 17500 1868707024 17500* 0 0 0


Test case indir-call-topn.c:

#include <stdio.h>

typedef int (*fptr) (int);
int
one (int a)
{
  return 1;
}

int
two (int a)
{
  return 0;
}

fptr table[] = {&one, &two};

int
main()
{
  int i, x;
  fptr p = &one;

  one (3);

  for (i = 0; i < 35000; i++)
    {
      x = (*p) (3);
      p = table[x];
    }
  printf ("done:%d\n", x);
}


I've got it. So it's a situation where you have a 50%/50% distribution. Note that it's
the only valid situation where both edges will be >= 50%. That's the threshold
for which we speculatively devirtualize edges. That said, you probably don't need a
generic topn counter, only a top2 counter, which can be generalized from the
single-value counter type. I'm saying that because I removed the TOPN, mainly due to:
https://github.com/gcc-mirror/gcc/commit/5cb221f2b9c268df47c97b4837230b15e65f9c14#diff-d003c64ae14449d86df03508de98bde7L179

which is an over-complicated profiling function. And the changes that I've done
recently are motivated by preserving stable builds. That's achieved by detecting
when a single-value counter can't handle all the values it has seen.


Actually, the algorithm of function __gcov_one_value_profiler_body in
libgcc/libgcov-profiler.c has a functional issue when profiling the test case I
provided:


__gcov_one_value_profiler_body (gcov_type *counters, gcov_type value,
                                int use_atomic)
{
  if (value == counters[1])
    counters[2]++;
  else if (counters[2] == 0)
    {
      counters[2] = 1;
      counters[1] = value;
    }
  else
    counters[2]--;

  if (use_atomic)
    __atomic_fetch_add (&counters[0], 1, __ATOMIC_RELAXED);
  else
    counters[0]++;
}

Function "one" is recorded as 1868707024 and function "two" as 969338501. Tracing
the loop from 0 to 35000-1:


  value       counters[0]  counters[1]  counters[2]
  1868707024  1            1868707024   1
  969338501   2            1868707024   0
  1868707024  3            1868707024   1
  969338501   4            1868707024   0
  1868707024  5            1868707024   1
  ...
  969338501   35000        1868707024   0

At the end, counters[] holds [35000, 1868707024, 0].
In ipa-profile.c and value-prof.c, counters[0] is taken as the total number of
times the statement executed, and counters[2] as the number of times the target in
counters[1] was called, which is 0 here. counters[2] shouldn't be 0 in fact: it
makes the probability 0 (it was expected to be 50%, right?). This probability
causes ipa-profile to fail to create a speculative edge, so the call stays
indirect. I think this is the reason why topn was introduced by Rong Xu in 2014
(8ceaa1e) and later reimplemented in LLVM. There was definitely a bug here before
topn was re-enabled.
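To make the trace above easy to reproduce, here is a minimal C++ sketch, assuming a plain port of the quoted __gcov_one_value_profiler_body (atomic path dropped) and an alternating one/two call sequence as in the test case. For contrast it also includes a hypothetical two-slot counter based on the Misra-Gries frequent-items scheme; that counter is a swapped-in illustration, not GCC's removed TOPN profiler and not a proposed patch. The names one_value_profile, TwoSlot and replay are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

using std::int64_t;

// Port of the single-value profiler quoted above (atomic path omitted):
// counters[0] = total executions, counters[1] = current candidate value,
// counters[2] = confidence counter for that candidate.
void one_value_profile(int64_t counters[3], int64_t value) {
  if (value == counters[1])
    counters[2]++;
  else if (counters[2] == 0) {
    counters[2] = 1;
    counters[1] = value;
  } else
    counters[2]--;
  counters[0]++;
}

// Hypothetical two-slot Misra-Gries counter (NOT GCC's TOPN code):
// each slot holds a (value, count) pair.
struct TwoSlot {
  int64_t value[2] = {0, 0};
  int64_t count[2] = {0, 0};

  void add(int64_t v) {
    for (int i = 0; i < 2; ++i)  // value already tracked: bump its count
      if (count[i] != 0 && value[i] == v) {
        ++count[i];
        return;
      }
    for (int i = 0; i < 2; ++i)  // a free slot: claim it
      if (count[i] == 0) {
        value[i] = v;
        count[i] = 1;
        return;
      }
    --count[0];  // both slots busy with other values: decay both
    --count[1];
  }
};

// Replays the test case: the call target alternates one, two, one, ...
void replay(int iterations, int64_t counters[3], TwoSlot &top2) {
  const int64_t one = 1868707024, two = 969338501;  // values from the dump
  counters[0] = counters[1] = counters[2] = 0;
  for (int i = 0; i < iterations; ++i) {
    int64_t target = (i % 2 == 0) ? one : two;
    one_value_profile(counters, target);
    top2.add(target);
  }
}
```

With 35000 alternating calls, the single-value counters end at [35000, 1868707024, 0], matching the trace, while each of the two slots keeps its target with a count of 17500.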


dump-profile: indir-call-topn.fb.gcc.wpa.069i.profile_estimate
Histogram:5
  35001: time:2 (8.70) size:2 (8.00)
  35000: time:19 (91.30) size:7 (36.00)
  17500: time:4 (100.00) size:2 (44.00)
  1: time:0 (100.00) size:0 (44.00)
  0: time:37 (100.00) size:14 (100.00)
Determined min count: 17500

Go patch committed: Stack allocate buffers for non-escaping string ops

2019-06-18 Thread Ian Lance Taylor
This patch to the Go frontend by Cherry Zhang stack allocates a buffer
for non-escaping string operations.  For string concatenation, string
to/from byte or rune slice conversion, and int to string conversion,
if the result does not escape, we can allocate a small (32-element, or
4-byte for int to string) buffer on the stack, and pass it to the runtime
function.  If the result fits in the buffer, it doesn't need to do a
heap allocation.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.
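The idea can be sketched outside the Go runtime. The following C++ fragment is a hypothetical analogue, not the gccgo runtime code: the caller passes a pointer to a 32-byte buffer when the result does not escape (or nullptr when it does), and the concatenation routine writes into that buffer when the result fits, falling back to the heap otherwise. concat2, TmpBuf and Result are invented names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>

// Size of the caller-provided temporary buffer (32 elements, as described above).
constexpr std::size_t kTmpBufSize = 32;
using TmpBuf = std::array<char, kTmpBufSize>;

struct Result {
  const char *data;               // points into the caller's buffer or the heap
  std::size_t len;
  bool on_heap;
  std::unique_ptr<char[]> heap;   // owns the storage only when on_heap
};

// buf != nullptr means "result does not escape": use it if the result fits.
Result concat2(TmpBuf *buf, std::string_view a, std::string_view b) {
  Result r{nullptr, a.size() + b.size(), false, nullptr};
  char *dst;
  if (buf != nullptr && r.len <= buf->size()) {
    dst = buf->data();              // fits: no heap allocation needed
  } else {
    r.heap.reset(new char[r.len]);  // escapes or too large: allocate
    dst = r.heap.get();
    r.on_heap = true;
  }
  std::copy(b.begin(), b.end(), std::copy(a.begin(), a.end(), dst));
  r.data = dst;
  return r;
}
```

In the actual patch the compiler decides whether to pass a buffer based on escape analysis; in this sketch the caller simply chooses.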

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 272460)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-62d1b667f3e85f72a186b04aad36d701160a4611
+0e4aa31b26a20b6a6a2ca102b85ba8c8b8cdf876
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 272460)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -3739,8 +3739,11 @@ Type_conversion_expression::do_flatten(G
   this->expr_ = Expression::make_temporary_reference(temp, 
this->location());
 }
 
-  // For interface conversion, decide if we can allocate on stack.
-  if (this->type()->interface_type() != NULL)
+  // For interface conversion and string to/from slice conversions,
+  // decide if we can allocate on stack.
+  if (this->type()->interface_type() != NULL
+  || this->type()->is_string_type()
+  || this->expr_->type()->is_string_type())
 {
   Node* n = Node::make_node(this);
   if ((n->encoding() & ESCAPE_MASK) == Node::ESCAPE_NONE)
@@ -3984,9 +3987,21 @@ Type_conversion_expression::do_get_backe
  return se->get_backend(context);
}
 
+  Expression* buf;
+  if (this->no_escape_)
+{
+  Type* byte_type = Type::lookup_integer_type("uint8");
+  Expression* buflen =
+Expression::make_integer_ul(4, NULL, loc);
+  Type* array_type = Type::make_array_type(byte_type, buflen);
+  buf = Expression::make_allocation(array_type, loc);
+  buf->allocation_expression()->set_allocate_on_stack();
+  buf->allocation_expression()->set_no_zero();
+}
+  else
+buf = Expression::make_nil(loc);
   Expression* i2s_expr =
-  Runtime::make_call(Runtime::INTSTRING, loc, 2,
-Expression::make_nil(loc), this->expr_);
+Runtime::make_call(Runtime::INTSTRING, loc, 2, buf, this->expr_);
   return Expression::make_cast(type, i2s_expr, loc)->get_backend(context);
 }
   else if (type->is_string_type() && expr_type->is_slice_type())
@@ -4019,7 +4034,21 @@ Type_conversion_expression::do_get_backe
   go_assert(e->integer_type()->is_rune());
   code = Runtime::SLICERUNETOSTRING;
 }
-  return Runtime::make_call(code, loc, 2, Expression::make_nil(loc),
+
+  Expression* buf;
+  if (this->no_escape_)
+{
+  Type* byte_type = Type::lookup_integer_type("uint8");
+  Expression* buflen =
+Expression::make_integer_ul(tmp_string_buf_size, NULL, loc);
+  Type* array_type = Type::make_array_type(byte_type, buflen);
+  buf = Expression::make_allocation(array_type, loc);
+  buf->allocation_expression()->set_allocate_on_stack();
+  buf->allocation_expression()->set_no_zero();
+}
+  else
+buf = Expression::make_nil(loc);
+  return Runtime::make_call(code, loc, 2, buf,
this->expr_)->get_backend(context);
 }
   else if (type->is_slice_type() && expr_type->is_string_type())
@@ -4035,9 +4064,20 @@ Type_conversion_expression::do_get_backe
  go_assert(e->integer_type()->is_rune());
  code = Runtime::STRINGTOSLICERUNE;
}
-  Expression* s2a = Runtime::make_call(code, loc, 2,
-  Expression::make_nil(loc),
-  this->expr_);
+
+  Expression* buf;
+  if (this->no_escape_)
+{
+  Expression* buflen =
+Expression::make_integer_ul(tmp_string_buf_size, NULL, loc);
+  Type* array_type = Type::make_array_type(e, buflen);
+  buf = Expression::make_allocation(array_type, loc);
+  buf->allocation_expression()->set_allocate_on_stack();
+  buf->allocation_expression()->set_no_zero();
+}
+  else
+buf = Expression::make_nil(loc);
+  Expression* s2a = Runtime::make_call(code, loc, 2, buf, this->expr_);
   return Expression::make_unsafe_cast(type, s2a, 
loc)->get_backend(context);
 }
   else if (type->is_numeric_type())
@@ -7428,7 +7468,35 @@ String_concat_expression::do_flatten(Gog
 tce->set_no_copy(true);
 }
 
-  Expression* 

Re: [PATCH] Fix PR84521

2019-06-18 Thread Max Filippov
On Tue, Jun 18, 2019 at 4:53 PM Wilco Dijkstra  wrote:
> > It would work if a frame pointer was initialized in the function test, but
> > it wasn't:
>
> Right, because it unwinds, it needs a valid frame pointer since we no
> longer store the stack pointer. So xtensa_frame_pointer_required
> should do something like:
>
>   if (cfun->machine->accesses_prev_frame || cfun->has_nonlocal_label)
> return true;

You're right, with this change things are back to normal.

-- 
Thanks.
-- Max


Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)

2019-06-18 Thread Martin Sebor

On 6/14/19 2:59 PM, Jeff Law wrote:

On 6/4/19 1:40 PM, Martin Sebor wrote:

On 6/3/19 5:24 PM, Martin Sebor wrote:

On 5/31/19 2:46 PM, Jeff Law wrote:

On 5/22/19 3:34 PM, Martin Sebor wrote:

-Wreturn-local-addr detects a subset of instances of returning
the address of a local object from a function but the warning
doesn't try to handle alloca or VLAs, or some non-trivial cases
of ordinary automatic variables[1].

The attached patch extends the implementation of the warning to
detect those.  It still doesn't detect instances where the address
is the result of a built-in such as strcpy[2].

Tested on x86_64-linux.

Martin

[1] For example, this is only diagnosed with the patch:

    void* f (int i)
    {
  struct S { int a[2]; } s[2];
  return &s->a[i];
    }

[2] The following is not diagnosed even with the patch:

    void sink (void*);

    void* f (int i)
    {
  char a[6];
  char *p = __builtin_strcpy (a, "123");
  sink (p);
  return p;
    }

I would expect detecting to be possible and useful.  Maybe as
a follow-up.

gcc-71924.diff

PR middle-end/71924 - missing -Wreturn-local-addr returning alloca
result
PR middle-end/90549 - missing -Wreturn-local-addr maybe returning an
address of a local array plus offset

gcc/ChangeLog:

 PR c/71924
 * gimple-ssa-isolate-paths.c (is_addr_local): New function.
 (warn_return_addr_local_phi_arg, warn_return_addr_local): Same.
 (find_implicit_erroneous_behavior): Call
warn_return_addr_local_phi_arg.
 (find_explicit_erroneous_behavior): Call warn_return_addr_local.

gcc/testsuite/ChangeLog:

 PR c/71924
 * gcc.dg/Wreturn-local-addr-2.c: New test.
 * gcc.dg/Walloca-4.c: Prune expected warnings.
 * gcc.dg/pr41551.c: Same.
 * gcc.dg/pr59523.c: Same.
 * gcc.dg/tree-ssa/pr88775-2.c: Same.
 * gcc.dg/winline-7.c: Same.

diff --git a/gcc/gimple-ssa-isolate-paths.c
b/gcc/gimple-ssa-isolate-paths.c
index 33fe352bb23..2933ecf502e 100644
--- a/gcc/gimple-ssa-isolate-paths.c
+++ b/gcc/gimple-ssa-isolate-paths.c
@@ -341,6 +341,135 @@ stmt_uses_0_or_null_in_undefined_way (gimple
*stmt)
     return false;
   }
+/* Return true if EXPR is a expression of pointer type that refers
+   to the address of a variable with automatic storage duration.
+   If so, set *PLOC to the location of the object or the call that
+   allocated it (for alloca and VLAs).  When PMAYBE is non-null,
+   also consider PHI statements and set *PMAYBE when some but not
+   all arguments of such statements refer to local variables, and
+   to clear it otherwise.  */
+
+static bool
+is_addr_local (tree exp, location_t *ploc, bool *pmaybe = NULL,
+   hash_set *visited = NULL)
+{
+  if (TREE_CODE (exp) == SSA_NAME)
+    {
+  gimple *def_stmt = SSA_NAME_DEF_STMT (exp);
+  enum gimple_code code = gimple_code (def_stmt);
+
+  if (is_gimple_assign (def_stmt))
+    {
+  tree type = TREE_TYPE (gimple_assign_lhs (def_stmt));
+  if (POINTER_TYPE_P (type))
+    {
+  tree ptr = gimple_assign_rhs1 (def_stmt);
+  return is_addr_local (ptr, ploc, pmaybe, visited);
+    }
+  return false;
+    }

So this is going to recurse on the rhs1 of something like
POINTER_PLUS_EXPR, that's a good thing :-)   But isn't it non-selective
about the codes where we recurse?

Consider

    ptr = (cond) ? res1 : res2

I think we'll end up recursing on the condition rather than looking at
res1 and res2.


I suspect there are a very limited number of expression codes that
appear on the RHS where we'd want to recurse on one or both operands.

POINTER_PLUS_EXPR, NOP_EXPR, maybe COND_EXPR (where you have to recurse
on both and logically and the result), BIT_AND (maybe we masked off some
bits in an address).  That's probably about it :-)

Are there any other codes you've seen or think would be useful in
practice to recurse through?  I'd rather list them explicitly than
just recurse down through every rhs1 we encounter.
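A sketch of the explicit-list approach, on a toy expression model rather than GCC's tree/GIMPLE API (Expr, Code and refers_to_local are hypothetical): recursion happens only through pointer-preserving codes, COND_EXPR-style nodes recurse into both arms and AND the results, and any other code stops the walk. It assumes the pointer is the first operand of the PointerPlus, Nop and BitAnd nodes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy expression codes (not GCC's tree codes).
enum class Code { LocalAddr, OtherPtr, PointerPlus, Nop, BitAnd, Cond, Call };

struct Expr {
  Code code;
  std::vector<const Expr *> ops;  // operands
};

// Returns true only if every path through the listed codes ends at the
// address of a local.  For Cond, ops[0] is the condition (never inspected)
// and ops[1], ops[2] are the two arms.
bool refers_to_local(const Expr *e) {
  switch (e->code) {
  case Code::LocalAddr:
    return true;
  case Code::PointerPlus:  // pointer + offset
  case Code::Nop:          // conversion
  case Code::BitAnd:       // masked address (assumed pointer in ops[0])
    return refers_to_local(e->ops[0]);
  case Code::Cond:
    return refers_to_local(e->ops[1]) && refers_to_local(e->ops[2]);
  default:
    return false;  // unlisted code: do not look through it
  }
}
```

Because the condition operand of a Cond node is never inspected, the walk cannot mistake the condition for one of the result operands.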


I don't have a list of codes to test for.  I initially contemplated
enumerating them but in the end decided the pointer type check would
be sufficient.  I wouldn't expect a COND_EXPR here.  Don't they get
transformed into PHIs?  In all my tests they do, and running
the whole test suite with an assert that it doesn't come up doesn't
expose any either.  (I left the assert for COND_EXPR there.)  If
a COND_EXPR really can come up in a GIMPLE assignment here can you
please show me how so I can add a test for it?

A COND_EXPR on the RHS of an assignment is valid gimple.  That's what we
need to consider here -- what is and what is not valid gimple.  And it's
more likely that PHIs will be transformed into RHS COND_EXPRs -- that's
standard practice for if-conversion.

Gosh, how to get one?  It happens all the time :-)  Since I know DOM so
well, I just shoved a quick assert into optimize_stmt to abort if we
encounter a gimple assignment where the RHS is a COND_EXPR.  It blew up
instantly building libgcc :-)

Compile 

[PATCH] let hash-based containers work with non-trivial types (PR 90923)

2019-06-18 Thread Martin Sebor

Let me try that again to the right list.

On 6/18/19 9:14 PM, Martin Sebor wrote:

Bug 90923 shows that even though GCC hash-table based containers
like hash_map can be instantiated on types with user-defined ctors
and dtors, they invoke the dtors of such types without invoking
the corresponding ctors.

It was thanks to this bug that I spent a day debugging "interesting"
miscompilations during GCC bootstrap (in fairness, it was that and
bug 90904 about auto_vec copy assignment/construction also being
hosed even for POD types).

The attached patch corrects the hash_map and hash_set templates
to invoke the ctors of the elements they insert and makes them
(hopefully) safe to use with non-trivial user-defined types.
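The core of the fix is constructing the value in the slot's raw storage (placement new) when an element is inserted, so that the later destructor call is balanced. A hypothetical one-slot C++ container shows the pattern; Slot and Tracked are invented for illustration and are far simpler than hash_map. Tracked records its own address in its constructors and checks it in its destructor, so destroying an object that was never constructed would fail the assertion.

```cpp
#include <cassert>
#include <new>

// Counts constructions and destructions; the destructor asserts that the
// object was really constructed (self was set by a constructor).
struct Tracked {
  static int nctor, ndtor;
  Tracked() : self(this) { ++nctor; }
  Tracked(const Tracked &) : self(this) { ++nctor; }
  Tracked &operator=(const Tracked &) { return *this; }  // keeps self
  ~Tracked() { assert(self == this); ++ndtor; }
  Tracked *self;
};
int Tracked::nctor = 0;
int Tracked::ndtor = 0;

// Hypothetical single-slot container over raw storage.
template <typename Value>
class Slot {
public:
  Slot() = default;
  Slot(const Slot &) = delete;
  Slot &operator=(const Slot &) = delete;
  ~Slot() { remove(); }

  // Like a put: construct on insertion, assign if already present.
  // Returns true if a value already existed.
  bool put(const Value &v) {
    if (!m_full) {
      new (static_cast<void *>(m_storage)) Value(v);  // placement new
      m_full = true;
      return false;
    }
    *ptr() = v;
    return true;
  }

  // Destroy the element only if one was constructed.
  void remove() {
    if (m_full) {
      ptr()->~Value();
      m_full = false;
    }
  }

private:
  Value *ptr() { return reinterpret_cast<Value *>(m_storage); }
  alignas(Value) unsigned char m_storage[sizeof(Value)];
  bool m_full = false;
};
```

Every destructor call now has a matching constructor call, which is the invariant the hash_map and hash_set fix restores.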

Tested on x86_64-linux.

Martin


PR middle-end/90923 - hash_map destroys elements without constructing them

gcc/ChangeLog:

	PR middle-end/90923
	* hash-map.h (hash_map::put): On insertion invoke element ctor.
	(hash_map::get_or_insert): Same.  Reformat comment.
	* hash-set.h (hash_set::add): On insertion invoke element ctor.
	* hash-map-tests.c (test_map_of_type_with_ctor_and_dtor): New.
	* hash-set-tests.c (test_map_of_type_with_ctor_and_dtor): New.

diff --git a/gcc/hash-map-tests.c b/gcc/hash-map-tests.c
index b79c7821684..9b365ef1480 100644
--- a/gcc/hash-map-tests.c
+++ b/gcc/hash-map-tests.c
@@ -103,12 +103,98 @@ test_map_of_strings_to_int ()
   ASSERT_EQ (1, string_map.elements ());
 }
 
+typedef struct hash_map_test_val_t
+{
+  static int ndefault;
+  static int ncopy;
+  static int nassign;
+  static int ndtor;
+
+  hash_map_test_val_t ()
+: ptr ()
+  {
+++ndefault;
+  }
+
+  hash_map_test_val_t (const hash_map_test_val_t &)
+: ptr ()
+  {
+++ncopy;
+  }
+
+  hash_map_test_val_t& operator= (const hash_map_test_val_t &)
+{
+ ++nassign;
+ return *this;
+}
+
+  ~hash_map_test_val_t ()
+{
+ gcc_assert (ptr == );
+ ++ndtor;
+}
+
+  void *ptr;
+} val_t;
+
+int val_t::ndefault;
+int val_t::ncopy;
+int val_t::nassign;
+int val_t::ndtor;
+
+static void
+test_map_of_type_with_ctor_and_dtor ()
+{
+  typedef hash_map  Map;
+
+  {
+Map m;
+(void)
+  }
+
+  ASSERT_TRUE (val_t::ndefault == 0);
+  ASSERT_TRUE (val_t::ncopy == 0);
+  ASSERT_TRUE (val_t::nassign == 0);
+  ASSERT_TRUE (val_t::ndtor == 0);
+
+  {
+Map m;
+void *p = 
+m.get_or_insert (p);
+  }
+
+  ASSERT_TRUE (val_t::ndefault + val_t::ncopy == val_t::ndtor);
+
+  {
+Map m;
+void *p = , *q = 
+val_t  = m.get_or_insert (p);
+val_t  = m.get_or_insert (q);
+
+ASSERT_TRUE (v1.ptr ==  &&  == v2.ptr);
+  }
+
+  ASSERT_TRUE (val_t::ndefault + val_t::ncopy == val_t::ndtor);
+
+  {
+Map m;
+void *p = , *q = 
+m.get_or_insert (p);
+m.remove (p);
+m.get_or_insert (q);
+m.remove (q);
+
+ASSERT_TRUE (val_t::ndefault + val_t::ncopy == val_t::ndtor);
+  }
+}
+
 /* Run all of the selftests within this file.  */
 
 void
 hash_map_tests_c_tests ()
 {
   test_map_of_strings_to_int ();
+  test_map_of_type_with_ctor_and_dtor ();
 }
 
 } // namespace selftest
diff --git a/gcc/hash-map.h b/gcc/hash-map.h
index 588dfda04fa..71cc1dead1d 100644
--- a/gcc/hash-map.h
+++ b/gcc/hash-map.h
@@ -21,8 +21,12 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef hash_map_h
 #define hash_map_h
 
-template
+/* KeyId must be a trivial (POD) type.  Value may be non-trivial
+   (non-POD).  Ctors and dtors are invoked as necessary on
+   inserted and removed elements.  On hash_map destruction all
+   elements are removed.  */
+
+template
 class GTY((user)) hash_map
 {
   typedef typename Traits::key_type Key;
@@ -151,12 +155,16 @@ public:
 {
   hash_entry *e = m_table.find_slot_with_hash (k, Traits::hash (k),
 		   INSERT);
-  bool existed = !hash_entry::is_empty (*e);
-  if (!existed)
-	e->m_key = k;
+  bool ins = hash_entry::is_empty (*e);
+  if (ins)
+	{
+	  e->m_key = k;
+	  new ((void *) >m_value) Value (v);
+	}
+  else
+	e->m_value = v;
 
-  e->m_value = v;
-  return existed;
+  return !ins;
 }
 
   /* if the passed in key is in the map return its value otherwise NULL.  */
@@ -168,8 +176,8 @@ public:
 }
 
   /* Return a reference to the value for the passed in key, creating the entry
- if it doesn't already exist.  If existed is not NULL then it is set to false
- if the key was not previously in the map, and true otherwise.  */
+ if it doesn't already exist.  If existed is not NULL then it is set to
+ false if the key was not previously in the map, and true otherwise.  */
 
   Value _or_insert (const Key , bool *existed = NULL)
 {
@@ -177,7 +185,10 @@ public:
 		   INSERT);
   bool ins = Traits::is_empty (*e);
   if (ins)
-	e->m_key = k;
+	{
+	  e->m_key = k;
+	  new ((void *)>m_value) Value ();
+	}
 
   if (existed != NULL)
 	*existed = !ins;
diff --git a/gcc/hash-set-tests.c b/gcc/hash-set-tests.c
index e0d1d47805b..c96fe538d9f 100644
--- 

Re: *ping* Re: [PATCH] PR fortran/89103 - Allow blank format items in format strings

2019-06-18 Thread Jerry DeLisle

I will see if I can get this one.

On 6/17/19 6:37 AM, Mark Eggleston wrote:


On 12/06/2019 19:11, Steve Kargl wrote:

On Tue, Jun 11, 2019 at 11:50:40AM +0200, Jakub Jelinek wrote:

On Tue, Jun 11, 2019 at 10:30:59AM +0100, Mark Eggleston wrote:

 Jim MacArthur 
 Mark Eggleston 

Two spaces before < instead of one.

This is not a patch review, just comments:

Mark, do you plan to address any of Jakub's comments?
Do note, I just OK'd Jakub's patch that uses G_()
forms for the strings.


Now that Jakub's patch has been committed, please find attached an updated 
patch and updated change logs:


gcc/fortran

     Jim MacArthur  
     Mark Eggleston  

     PR fortran/89103
     * gfortran.texi: Add -fdec-blank-format-item
     * invoke.texi: Add option to list of options.
     * invoke.texi: Add to section on Commas in FORMAT specifications.
     * io.c (check_format): At FMT_RPAREN goto finished if
     -fdec-blank-format-item otherwise set error string.
     * lang.opt: Add new option.
     * options.c (set_dec_flags): Add SET_BITFLAG for
     flag_dec_format_defaults.

gcc/testsuite

     Jim MacArthur  
     Mark Eggleston  

     PR fortran/89103
     * gfortran.dg/dec_format_empty_item_1.f: New test.
     * gfortran.dg/dec_format_empty_item_2.f: New test.
     * gfortran.dg/dec_format_empty_item_3.f: New test.

as before... Please can someone commit this, as I do not have commit rights.



Also, do you have plans to contribute additional
patches (either for -fdec* extensions or preferably
to help with bug fixes and new features)?  It may be
advantageous for you to get a commit bit.
Yes, I do intend to contribute additional patches, mostly -fdec- patches, there 
are also some patches unrelated to -fdec* extensions.


[PATCH] PR fortran/69398 -- Duplicate dimensions and CLASS

2019-06-18 Thread Steve Kargl
The attached patch fixes an issue where a CLASS entity
is specified with a duplicate dimension.  See the testcase
for an example.  Regression tested on x86_64-*-freebsd.
OK to commit?

2019-06-18  Steven G. Kargl  

PR fortran/69398
* decl.c (attr_decl): Check for duplicate DIMENSION attribute for a
CLASS entity.
 
2019-06-18  Steven G. Kargl  

PR fortran/69398
* gfortran.dg/pr69398.f90: New test.


-- 
Steve

Go patch committed: Avoid copy for string([]byte) conversion in string concat

2019-06-18 Thread Ian Lance Taylor
This patch to the Go frontend by Cherry Zhang avoids a copy for a
string([]byte) conversion used in string concatenation.  If a
string([]byte) conversion is used immediately in a string
concatenation, we don't need to copy the backing store of the byte
slice, as the runtime function doesn't hold any reference to it.
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.
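Why skipping the copy is safe can be illustrated with a hypothetical C++ analogue (concat_strings, to_string_copy and to_string_nocopy are invented names, not gccgo runtime functions). The concatenation routine reads its inputs once and always copies them into a new result, so an input that aliases a byte vector's storage is fine. As the patch comment notes, this would not hold for a routine that can return one of its inputs unchanged, as the gc runtime does when all but one input is empty.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>
#include <string_view>
#include <vector>

// Analogue of a concat routine that keeps no reference to its inputs:
// it always builds a fresh result.
std::string concat_strings(std::initializer_list<std::string_view> parts) {
  std::size_t n = 0;
  for (std::string_view p : parts) n += p.size();
  std::string out;
  out.reserve(n);
  for (std::string_view p : parts) out.append(p.begin(), p.end());
  return out;
}

// Copying conversion, like string(b) used on its own.
std::string to_string_copy(const std::vector<char> &b) {
  return std::string(b.begin(), b.end());
}

// No-copy conversion: a view aliasing b's backing store.  Only safe when
// the consumer (here concat_strings) is done with it before b changes.
std::string_view to_string_nocopy(const std::vector<char> &b) {
  return std::string_view(b.data(), b.size());
}
```

Mutating the byte vector after the concatenation does not affect the result, because the result owns its own copy.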

Ian

2019-06-18  Cherry Zhang  

* go.dg/concatstring.go: New test.
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 272133)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-b1ae35965cadac235d7d218e689944286cccdd90
+62d1b667f3e85f72a186b04aad36d701160a4611
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 272133)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -7408,6 +7408,26 @@ String_concat_expression::do_flatten(Gog
 return this;
   Location loc = this->location();
   Type* type = this->type();
+
+  // Mark string([]byte) operands to reuse the backing store.
+  // runtime.concatstrings does not keep the reference.
+  //
+  // Note: in the gc runtime, if all but one inputs are empty,
+  // concatstrings returns the only nonempty input without copy.
+  // So it is not safe to reuse the backing store if it is a
+  // string([]byte) conversion. So the gc compiler does the
+  // no-copy optimization only when there is at least one
+  // constant nonempty input. Currently the gccgo runtime
+  // doesn't do this, so we don't do the check.
+  for (Expression_list::iterator p = this->exprs_->begin();
+   p != this->exprs_->end();
+   ++p)
+{
+  Type_conversion_expression* tce = (*p)->conversion_expression();
+  if (tce != NULL)
+tce->set_no_copy(true);
+}
+
   Expression* nil_arg = Expression::make_nil(loc);
   Expression* call;
   switch (this->exprs_->size())
Index: gcc/testsuite/go.dg/concatstring.go
===
--- gcc/testsuite/go.dg/concatstring.go (nonexistent)
+++ gcc/testsuite/go.dg/concatstring.go (working copy)
@@ -0,0 +1,8 @@
+// { dg-do compile }
+// { dg-options "-fgo-debug-optimization" }
+
+package p
+
+func F(b []byte, x string) string {
+   return "hello " + string(b) + x // { dg-error "no copy string\\(\\\[\\\]byte\\)" }
+}


Re: [PATCH] Fix PR84521

2019-06-18 Thread Wilco Dijkstra
Hi Max,

> It would work if a frame pointer was initialized in the function test, but
> it wasn't:

Right, because it unwinds, it needs a valid frame pointer since we no
longer store the stack pointer. So xtensa_frame_pointer_required
should do something like:

  if (cfun->machine->accesses_prev_frame || cfun->has_nonlocal_label)
return true;

Wilco

Re: [PATCH] Fix PR84521

2019-06-18 Thread Max Filippov
On Tue, Jun 18, 2019 at 10:07 AM Wilco Dijkstra  wrote:
> > The testcase from the patch passes with the trunk xtensa-linux-gcc
> > with windowed ABI. But with the changes in this patch a lot of tests
> > that use longjmp are failing on xtensa-linux.
>
> Interesting. I looked at the _xtensa_nonlocal_goto implementation in
> libgcc/config/xtensa/lib2funcs.S, and it should work fine given it already
> checks for the frame pointer to be within the bounds of a frame.

It would work if a frame pointer was initialized in the function test, but
it wasn't:

test:
        entry   sp, 64
        l32r    a2, .LC1
        memw
        l32i.n  a2, a2, 0
        memw
        s32i.n  a2, sp, 20
        s32i.n  a7, sp, 0    <
        l32r    a2, .LC2
        s32i.n  a2, sp, 4

The original version stored sp there (the store marked <).

-- 
Thanks.
-- Max


[committed] [PR90921] Fortran OpenACC 'declare' directive's module handling causes duplicate data clauses (was: [PATCH, OpenACC] Fortran "declare create"/allocate support for OpenACC)

2019-06-18 Thread Thomas Schwinge
Hi!

On Thu, 4 Oct 2018 14:04:13 +0100, Julian Brown  wrote:
> On Sun, 23 Sep 2018 10:48:52 +0200
> Bernhard Reutner-Fischer  wrote:
> 
> > On Sat, 22 Sep 2018 at 00:32, Julian Brown 
> > wrote:
> > 
> > @@ -6218,13 +6221,20 @@ add_clause (gfc_symbol *sym, gfc_omp_map_op
> > map_op) {
> >gfc_omp_namelist *n;
> > 
> > +  if (!module_oacc_clauses)
> > +module_oacc_clauses = gfc_get_omp_clauses ();
> > +
> > +  if (sym->backend_decl == NULL)
> > +gfc_get_symbol_decl (sym);
> > +
> > +  for (n = module_oacc_clauses->lists[OMP_LIST_MAP]; n != NULL; n =
> > n->next)
> > +if (n->sym->backend_decl == sym->backend_decl)
> > +  return;
> > +
> > 
> > Didn't look too close, but should this throw an error instead of
> > silently returning, or was the error emitted earlier?

Bernhard, thanks for pointing out this "smelly" code, and then Julian for
analyzing the actual issue:

> The purpose of this fragment seems not to have been to do with error
> reporting at all, but rather to do with de-duplicating symbols that
> are listed (once) in clauses of "declare" directives in module blocks.
> Variables that are listed twice are diagnosed elsewhere.
> 
> As for why the de-duplication is necessary, it seems to be because of
> the way that modules are instantiated in programs and in subroutines.
> E.g. in declare-allocatable-1.f90, we have something along the lines of:
> 
>   module vars
> implicit none
> integer, parameter :: n = 100
> real*8, allocatable :: b(:)
>!$acc declare create (b)
>   end module vars
> 
>   program test
> use vars
> ...
>   end program test
> 
>   subroutine sub1
> use vars
> ...
>   end subroutine sub1
> 
>   subroutine sub2
> use vars
> ...
>   end subroutine sub2
> 
> The function find_module_oacc_declare_clauses is called for each of
> 'test', 'sub1' and 'sub2'. But in trans-decl.c:finish_oacc_declare, the
> new declare clauses are only attached to the namespace for a FL_PROGRAM
> (i.e. 'test'), not for the subroutines. The module_oacc_clauses global
> variable is reset only after moving the clauses to a FL_PROGRAM's
> namespace, otherwise it accumulates.
> 
> Hence, with the above code, we'd scan 'test', find declare clauses, and
> attach them to the namespace for 'test'. We'd then reset
> module_oacc_clauses.
> 
> Then, we'd scan 'sub1', and accumulate declare clauses from 'vars' into
> a fresh module_oacc_clauses.
> 
> Then we'd scan 'sub2', and accumulate declare clauses from 'vars'
> again: this is why the de-duplication in the patch seemed to be
> necessary.
> 
> This seems wrong to me though, and admits the possibility of clauses
> instantiated in a subroutine "leaking" into a subsequent program block.
> As a tentative fix, I've tried resetting module_oacc_clauses before
> each time the find_module_oacc_declare_clauses traversal takes place,
> and removing the de-duplication code.
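The accumulation problem described above can be modeled in a few lines. This C++ sketch is hypothetical (Unit, finish_declare and g_clauses stand in for the namespace, finish_oacc_declare and the module_oacc_clauses global; none of it is gfortran code): with the original placement of the reset, clauses collected while scanning subroutines leak into the next PROGRAM unit, while resetting before each scan, as in the quoted diff, leaves only the clauses of the unit being scanned.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Flavor { Program, Subroutine };

struct Unit {
  Flavor flavor;
  std::vector<std::string> module_clauses;  // declare clauses seen via USE
  std::vector<std::string> attached;        // clauses attached to this unit
};

// Stand-in for the module_oacc_clauses global.
static std::vector<std::string> g_clauses;

void finish_declare(Unit &u, bool fixed) {
  if (fixed)
    g_clauses.clear();  // fix: start every scan from an empty list
  for (const std::string &c : u.module_clauses)
    g_clauses.push_back(c);
  if (u.flavor == Flavor::Program) {
    u.attached = g_clauses;  // only PROGRAM units get the clauses
    if (!fixed)
      g_clauses.clear();    // original: reset only after a PROGRAM
  }
}
```

Scanning sub1, sub2 and then a program without the fix attaches the same clause three times; with the fix it is attached once.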

So, that's clearly a separate bug from everything else discussed as part
of this patch submission;  "Fortran OpenACC
'declare' directive's module handling causes duplicate data clauses"
filed.

> This seems to work fine for the current tests in the testsuite, but I
> wonder why things weren't done like that to start
> with? The code dates back to 2015 (by James Norris):
> 
> https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02367.html

There remains a lot of mystery to be resolved regarding the OpenACC
'declare' implementation...  :-(

> --- a/gcc/fortran/trans-decl.c
> +++ b/gcc/fortran/trans-decl.c
> @@ -6272,6 +6275,8 @@ finish_oacc_declare (gfc_namespace *ns, gfc_symbol 
> *sym, bool block)
>gfc_omp_clauses *omp_clauses = NULL;
>gfc_omp_namelist *n, *p;
>  
> +  module_oacc_clauses = NULL;
> +
>gfc_traverse_ns (ns, find_module_oacc_declare_clauses);
>  
>if (module_oacc_clauses && sym->attr.flavor == FL_PROGRAM)
> @@ -6283,7 +6288,6 @@ finish_oacc_declare (gfc_namespace *ns, gfc_symbol 
> *sym, bool block)
>new_oc->clauses = module_oacc_clauses;
>  
>ns->oacc_declare = new_oc;
> -  module_oacc_clauses = NULL;
>  }
>  
>if (!ns->oacc_declare)

I cannot claim to understand this Fortran OpenACC 'declare' directive's
module handling here, but I can at least confirm via a test case that
I've added, that your change makes the duplicate data clauses go away;
committed to trunk in r272454 "[PR90921] Fortran OpenACC 'declare'
directive's module handling causes duplicate data clauses", see attached.


Regards
 Thomas


From 9f15ed31065cf6baaae9b3e0e4c16fb9e958fbd9 Mon Sep 17 00:00:00 2001
From: tschwinge 
Date: Tue, 18 Jun 2019 22:15:53 +
Subject: [PATCH] [PR90921] Fortran OpenACC 'declare' directive's module
 handling causes duplicate data clauses

	gcc/fortran/
	PR fortran/90921
	* trans-decl.c (finish_oacc_declare): Reset module_oacc_clauses
	before scanning each namespace.
	gcc/testsuite/
	PR fortran/90921
	* gfortran.dg/goacc/declare-3.f95: Update.

git-svn-id: 

Re: std::inclusive_scan

2019-06-18 Thread Jonathan Wakely

On 18/06/19 19:08 +0100, Dietmar Kuehl via libstdc++ wrote:

The first release shipping the parallel algorithms is gcc-9.1.0. Specifically,
std::inclusive_scan() *without* an execution policy seems to be missing, though. I
guess that’s because it was an added/renamed algorithm. The versions
taking an execution policy are present.


Right.

But as of a few minutes ago (and r272459 in subversion) the versions
without an execution policy are also present, thanks to this patch.

Tested x86_64-linux, committed to trunk.
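For reference, a minimal C++17 use of the serial overloads (no execution policy) from <numeric>. This assumes a libstdc++ that contains r272459 and compilation with -std=c++17; the wrapper function names are just for illustration.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Running sums: out[i] = in[0] + ... + in[i].
std::vector<int> prefix_sums(const std::vector<int> &in) {
  std::vector<int> out(in.size());
  std::inclusive_scan(in.begin(), in.end(), out.begin());
  return out;
}

// Exclusive running products starting at 1: out[i] = in[0] * ... * in[i-1].
std::vector<int> exclusive_products(const std::vector<int> &in) {
  std::vector<int> out(in.size());
  std::exclusive_scan(in.begin(), in.end(), out.begin(), 1,
                      std::multiplies<>());
  return out;
}

// Sum of all elements via std::reduce.
int total(const std::vector<int> &in) {
  return std::reduce(in.begin(), in.end());
}
```

For {1, 2, 3, 4} these give {1, 3, 6, 10}, {1, 1, 2, 6} and 10 respectively.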



On 18 Jun 2019, at 18:21, Bence Kodaj  wrote:

Dear libstdc++ list members,

Could you tell me whether std::inclusive_scan is supposed to be included in
the most recent version of libstdc++?

Apologies if this is a well-known issue - I did search the list archive for
inclusive_scan at https://gcc.gnu.org/ml/libstdc++/ , and I got no hits.

Why I'm asking: per
https://en.cppreference.com/w/cpp/algorithm/inclusive_scan , this algorithm
is supposed to be part of the STL since C++17, but apparently, it's missing
from the version of libstdc++ that shipped with gcc 8.1 (which I installed
a few months ago). I also looked in the git mirror of the libstdc++ source
code (https://github.com/gcc-mirror/gcc/tree/master/libstdc%2B%2B-v3), but
there's no inclusive_scan in any of the <numeric> headers I found
(include/std/numeric, include/experimental/numeric, include/ext/numeric,
include/parallel/numeric).

Am I missing something, or is std::inclusive_scan actually not part of the
most recent libstdc++?

Best regards,
Bence Kodaj


commit b93041f0d3c9a2fc64f0f5fb538e78d5e2001d32
Author: redi 
Date:   Tue Jun 18 23:01:16 2019 +

Implement new serial algorithms from Parallelism TS (P0024R2)

These new (non-parallel) algorithms were added to C++17 along with the
parallel algorithms, but were missing from libstdc++.

* include/bits/algorithmfwd.h: Change title of doc group.
* include/bits/stl_algo.h (for_each_n): Add new C++17 algorithm from
P0024R2.
* include/bits/stl_numeric.h: Define doc group and add algos to it.
* include/std/numeric (__is_random_access_iter): New internal trait.
(reduce, transform_reduce, exclusive_scan, inclusive_scan)
(transform_exclusive_scan, transform_inclusive_scan): Likewise.
* testsuite/25_algorithms/for_each/for_each_n.cc: New test.
* testsuite/26_numerics/exclusive_scan/1.cc: New test.
* testsuite/26_numerics/inclusive_scan/1.cc: New test.
* testsuite/26_numerics/reduce/1.cc: New test.
* testsuite/26_numerics/transform_exclusive_scan/1.cc: New test.
* testsuite/26_numerics/transform_inclusive_scan/1.cc: New test.
* testsuite/26_numerics/transform_reduce/1.cc: New test.
* testsuite/util/testsuite_iterators.h (test_container::size()): New
member function.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@272459 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/include/bits/algorithmfwd.h b/libstdc++-v3/include/bits/algorithmfwd.h
index 40e051aa9e3..5e47fffe86e 100644
--- a/libstdc++-v3/include/bits/algorithmfwd.h
+++ b/libstdc++-v3/include/bits/algorithmfwd.h
@@ -154,7 +154,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*/
 
   /**
-   * @defgroup set_algorithms Set Operation
+   * @defgroup set_algorithms Set Operations
* @ingroup sorting_algorithms
*
* These algorithms are common set operations performed on sequences
diff --git a/libstdc++-v3/include/bits/stl_algo.h b/libstdc++-v3/include/bits/stl_algo.h
index b50c642f0e6..ca957e0b9a7 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -3867,6 +3867,39 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
   return __f; // N.B. [alg.foreach] says std::move(f) but it's redundant.
 }
 
+#if __cplusplus >= 201703L
+  /**
+   *  @brief Apply a function to every element of a sequence.
+   *  @ingroup non_mutating_algorithms
+   *  @param  __first  An input iterator.
+   *  @param  __n  A value convertible to an integer.
+   *  @param  __f  A unary function object.
+   *  @return   `__first+__n`
+   *
+   *  Applies the function object `__f` to each element in the range
+   *  `[first, first+n)`.  `__f` must not modify the order of the sequence.
+   *  If `__f` has a return value it is ignored.
+  */
+  template<typename _InputIterator, typename _Size, typename _Function>
+_InputIterator
+for_each_n(_InputIterator __first, _Size __n, _Function __f)
+{
+  auto __n2 = std::__size_to_integer(__n);
+  using _Cat = typename iterator_traits<_InputIterator>::iterator_category;
+  if constexpr (is_base_of_v<random_access_iterator_tag, _Cat>)
+	return std::for_each(__first, __first + __n2, __f);
+  else
+	{
+	  while (__n2-->0)
+	{
+	  __f(*__first);
+	  ++__first;
+	}
+	  return __first;
+	}
+}
+#endif // C++17
+
   /**
*  @brief Find the first occurrence of a value in a sequence.
*  @ingroup 
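
For illustration, a small usage sketch of the new std::for_each_n, assuming C++17 and a standard library that provides it; double_first_n is a hypothetical helper. It uses std::list to show that a non-random-access iterator is accepted, and relies on the documented return value first+n (n must not exceed the sequence length):

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <list>

// Doubles the first n elements of 'l' in place and returns the distance
// from begin() to the iterator that std::for_each_n hands back.
// Precondition: n <= l.size().
std::ptrdiff_t double_first_n(std::list<int>& l, std::size_t n) {
  auto it = std::for_each_n(l.begin(), n, [](int& x) { x *= 2; });
  return std::distance(l.begin(), it);
}
```

With l = {1, 2, 3, 4}, double_first_n(l, 2) returns 2 and leaves l as {2, 4, 3, 4}.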

[committed] [PR85221] Set 'omp declare target', 'omp declare target link' attributes for Fortran OpenACC 'declare'd variables (was: [gomp4] Re: OpenACC declare directive updates)

2019-06-18 Thread Thomas Schwinge
Hi!

On Fri, 27 Nov 2015 12:37:23 +0100, I wrote:
> On Thu, 19 Nov 2015 10:22:16 -0600, James Norris  
> wrote:
> > [...]

> Merging your trunk r230722 and r230725 with the existing Fortran OpenACC
> declare implementation present on gomp-4_0-branch, I effectively applied
> the following to gomp-4_0-branch in 231002.  Please verify this.
> 
> Regarding my Fortran XFAIL comments in
> ,
> with some of my earlier changes "#if 0"ed in
> gcc/fortran/trans-decl.c:add_attributes_to_decl,
> libgomp.oacc-fortran/declare-3.f90 again PASSes.  But I don't understand
> (why something like) this code (isn't needed/done differently in C/C++).

There remains a lot of mystery to be resolved regarding the OpenACC
'declare' implementation...  :-(

> --- gcc/fortran/trans-decl.c
> +++ gcc/fortran/trans-decl.c
> @@ -1302,15 +1302,20 @@ add_attributes_to_decl (symbol_attribute sym_attr, tree list)
>}
>  
>if (sym_attr.omp_declare_target
> +#if 0 /* TODO */
>|| sym_attr.oacc_declare_create
>|| sym_attr.oacc_declare_copyin
>|| sym_attr.oacc_declare_deviceptr
> -  || sym_attr.oacc_declare_device_resident)
> +  || sym_attr.oacc_declare_device_resident
> +#endif
> +  )
>  list = tree_cons (get_identifier ("omp declare target"),
> NULL_TREE, list);
> +#if 0 /* TODO */
>if (sym_attr.oacc_declare_link)
>  list = tree_cons (get_identifier ("omp declare target link"),
> NULL_TREE, list);
> +#endif

As PR85221 "[openacc] ICE in install_var_field, at omp-low.c:657" tells
us, yes, these are actually necessary.

I'm confused why not all OpenACC 'declare' clauses are handled here, but
looking into that is for another day, or week.

Committed to trunk in r272453 "[PR85221] Set 'omp declare target', 'omp
declare target link' attributes for Fortran OpenACC 'declare'd
variables", see attached.
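
To summarize the logic of the committed hunk, here is a minimal C++ sketch that mirrors its if/else chain. SymAttr and declare_target_attribute are hypothetical stand-ins, not GCC's symbol_attribute or its tree-building code. Note that a 'link' clause takes precedence, because it is tested first:

```cpp
#include <string>

// Hypothetical mirror of the flags tested in add_attributes_to_decl
// (not the real symbol_attribute struct).
struct SymAttr {
  bool omp_declare_target = false;
  bool omp_declare_target_link = false;
  bool oacc_declare_create = false;
  bool oacc_declare_copyin = false;
  bool oacc_declare_deviceptr = false;
  bool oacc_declare_device_resident = false;
  bool oacc_declare_link = false;
};

// Returns the attribute name that would be attached, or "" if none.
std::string declare_target_attribute(const SymAttr& a) {
  if (a.omp_declare_target_link || a.oacc_declare_link)
    return "omp declare target link";
  if (a.omp_declare_target || a.oacc_declare_create || a.oacc_declare_copyin
      || a.oacc_declare_deviceptr || a.oacc_declare_device_resident)
    return "omp declare target";
  return "";
}
```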


Regards
 Thomas


From b7194d24d942998da2ab8f6f5dc080e3fff81972 Mon Sep 17 00:00:00 2001
From: tschwinge 
Date: Tue, 18 Jun 2019 22:15:43 +
Subject: [PATCH] [PR85221] Set 'omp declare target', 'omp declare target link'
 attributes for Fortran OpenACC 'declare'd variables

	gcc/fortran/
	PR fortran/85221
	* trans-decl.c (add_attributes_to_decl): Handle OpenACC 'declare'
	directive.
	gcc/testsuite/
	PR fortran/85221
	* gfortran.dg/goacc/declare-3.f95: New file.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@272453 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/fortran/ChangeLog |  6 +++
 gcc/fortran/trans-decl.c  |  9 +++-
 gcc/testsuite/ChangeLog   |  3 ++
 gcc/testsuite/gfortran.dg/goacc/declare-3.f95 | 47 +++
 4 files changed, 63 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/declare-3.f95

diff --git a/gcc/fortran/ChangeLog b/gcc/fortran/ChangeLog
index d30fa2e50a88..6fd97b61ce05 100644
--- a/gcc/fortran/ChangeLog
+++ b/gcc/fortran/ChangeLog
@@ -1,3 +1,9 @@
+2019-06-18  Thomas Schwinge  
+
+	PR fortran/85221
+	* trans-decl.c (add_attributes_to_decl): Handle OpenACC 'declare'
+	directive.
+
 2019-06-16  Thomas Koenig  
 
 	* dump_parse_tree (debug): Add verison for formal arglist.
diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index b8e07274febd..f504c279c31b 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -1432,10 +1432,15 @@ add_attributes_to_decl (symbol_attribute sym_attr, tree list)
   list = oacc_replace_fn_attrib_attr (list, dims);
 }
 
-  if (sym_attr.omp_declare_target_link)
+  if (sym_attr.omp_declare_target_link
+  || sym_attr.oacc_declare_link)
 list = tree_cons (get_identifier ("omp declare target link"),
 		  NULL_TREE, list);
-  else if (sym_attr.omp_declare_target)
+  else if (sym_attr.omp_declare_target
+	   || sym_attr.oacc_declare_create
+	   || sym_attr.oacc_declare_copyin
+	   || sym_attr.oacc_declare_deviceptr
+	   || sym_attr.oacc_declare_device_resident)
 list = tree_cons (get_identifier ("omp declare target"),
 		  clauses, list);
 
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 59d39e8c179a..552ccc6fbd68 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,5 +1,8 @@
 2019-06-18  Thomas Schwinge  
 
+	PR fortran/85221
+	* gfortran.dg/goacc/declare-3.f95: New file.
+
 	PR middle-end/90859
 	* c-c++-common/goacc/firstprivate-mappings-1.c: Update.
 
diff --git a/gcc/testsuite/gfortran.dg/goacc/declare-3.f95 b/gcc/testsuite/gfortran.dg/goacc/declare-3.f95
new file mode 100644
index ..ec5d4c5a062a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/declare-3.f95
@@ -0,0 +1,47 @@
+! Test valid usage of the OpenACC 'declare' directive.
+
+module mod_a
+  implicit none
+  integer :: a
+  !$acc declare create (a)
+end module
+
+module mod_b
+  implicit none
+  integer :: b
+  !$acc declare copyin (b)
+end module
+
+module mod_c
+  

[committed] [PR90859] Document status quo for "[OMP] Mappings for VLA different depending on 'target { c && { ! lp64 } }'" (was: [committed] Test cases to verify OpenACC 'firstprivate' mappings)

2019-06-18 Thread Thomas Schwinge
Hi!

On Wed, 19 Jun 2019 00:47:27 +0200, I wrote:
> I committed to trunk in r272451 a few "Test cases to verify OpenACC
> 'firstprivate' mappings"

> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/goacc/firstprivate-mappings-1.c

> +static void
> +vla (int array_li)
> +{
> +  _Complex double array[array_li];
> +  uint32_t array_so;
> +#pragma acc parallel \
> +  copyout (array_so)
> +  /* The gimplifier has created an implicit 'firstprivate' clause for the array
> + length.
> + { dg-final { scan-tree-dump {(?n)#pragma omp target oacc_parallel map\(from:array_so \[len: 4\]\) firstprivate\(array_li.[0-9]+\)} omplower { target { ! c++ } } } }
> + { dg-final { scan-tree-dump {(?n)#pragma omp target oacc_parallel map\(from:array_so \[len: 4\]\) firstprivate\(} omplower { target { c++ } } } }
> + (C++ computes an intermediate value, so can't scan for 'firstprivate(array_li)'.)  */
> +  {
> +array_so = sizeof array;
> +  }
> +  if (array_so != sizeof array)
> +__builtin_abort ();
> +}

It doesn't resolve PR90859, but at least in trunk r272452 we now
'Document status quo for "[OMP] Mappings for VLA different depending on
'target { c && { ! lp64 } }'"', see attached.


Regards
 Thomas


From 75fdd6636c07a400578f53fb9d87aa13274819c4 Mon Sep 17 00:00:00 2001
From: tschwinge 
Date: Tue, 18 Jun 2019 22:15:16 +
Subject: [PATCH] [PR90859] Document status quo for "[OMP] Mappings for VLA
 different depending on 'target { c && { ! lp64 } }'"

	gcc/testsuite/
	PR middle-end/90859
	* c-c++-common/goacc/firstprivate-mappings-1.c: Update.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@272452 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/testsuite/ChangeLog| 3 +++
 gcc/testsuite/c-c++-common/goacc/firstprivate-mappings-1.c | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index bffa899c7676..59d39e8c179a 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,5 +1,8 @@
 2019-06-18  Thomas Schwinge  
 
+	PR middle-end/90859
+	* c-c++-common/goacc/firstprivate-mappings-1.c: Update.
+
 	* c-c++-common/goacc/firstprivate-mappings-1.c: New file.
 	* g++.dg/goacc/firstprivate-mappings-1.C: Likewise.
 
diff --git a/gcc/testsuite/c-c++-common/goacc/firstprivate-mappings-1.c b/gcc/testsuite/c-c++-common/goacc/firstprivate-mappings-1.c
index c8270472a9c5..33576c50ecab 100644
--- a/gcc/testsuite/c-c++-common/goacc/firstprivate-mappings-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/firstprivate-mappings-1.c
@@ -426,6 +426,9 @@ vla (int array_li)
  { dg-final { scan-tree-dump {(?n)#pragma omp target oacc_parallel map\(from:array_so \[len: 4\]\) firstprivate\(array_li.[0-9]+\)} omplower { target { ! c++ } } } }
  { dg-final { scan-tree-dump {(?n)#pragma omp target oacc_parallel map\(from:array_so \[len: 4\]\) firstprivate\(} omplower { target { c++ } } } }
  (C++ computes an intermediate value, so can't scan for 'firstprivate(array_li)'.)  */
+  /* For C, non-LP64, the gimplifier has also created a mapping for the array
+ itself; PR90859.
+ { dg-final { scan-tree-dump {(?n)#pragma omp target oacc_parallel map\(from:array_so \[len: 4\]\) firstprivate\(array_li.[0-9]+\) map\(tofrom:\(\*array.[0-9]+\) \[len: D\.[0-9]+\]\) map\(firstprivate:array \[pointer assign, bias: 0\]\) \[} omplower { target { c && { ! lp64 } } } } } */
   {
 array_so = sizeof array;
   }
-- 
2.20.1



signature.asc
Description: PGP signature


Re: Review Hashtable extract node API

2019-06-18 Thread Jonathan Wakely

On 18/06/19 22:42 +0200, François Dumont wrote:

On 6/18/19 12:54 PM, Jonathan Wakely wrote:

On 18/06/19 07:52 +0200, François Dumont wrote:

A small regression noticed while merging.

We shouldn't keep on using a moved-from key_type instance.

OK to commit? Feel free to do it if you prefer; otherwise I'll do so at
the end of the European day.



diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index f5809c7443a..7e89e1b44c4 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -743,7 +743,8 @@ namespace __detail
std::tuple<>()
  };
  auto __pos
-    = __h->_M_insert_unique_node(__k, __bkt, __code, __node._M_node);
+    = __h->_M_insert_unique_node(__h->_M_extract()(__node._M_node->_M_v()),
+ __bkt, __code, __node._M_node);
  __node._M_node = nullptr;
  return __pos->second;
    }


I can't create an example where this causes a problem, because the key
passed to _M_insert_unique_node is never used. So it doesn't matter
that it's been moved from.

So I have to wonder why we just added the key parameter to that
function, if it's never used.
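
A minimal sketch of the general rule behind the original concern ("We shouldn't keep on using a moved-from key_type instance"), using hypothetical types rather than the real _Hashtable internals: once the key has been moved into the node, read it back from the node, which is what the proposed __h->_M_extract()(__node._M_node->_M_v()) does.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for a hashtable node holding a key/value pair.
struct Node {
  std::pair<const std::string, int> value;
};

// Moves 'key' into a new node, then returns the key as stored in the node.
// After the move, 'key' is valid but unspecified, so it must not be used
// to compute anything (for example, a bucket index).
std::string key_from_node(std::string key, int mapped) {
  Node node{{std::move(key), mapped}};
  return node.value.first;
}
```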


I think you were influenced by my patch. I was using a
"_NodeAccessor", which didn't give access to the node without taking
ownership, so I needed to pass the key explicitly to compute the new bucket
index in case of rehash.


But with your approach, this change to _M_insert_unique_node was
simply unnecessary, so here is a patch to clean up this part.


Ha! I see, thanks. So I should have removed that key_type parameter
again after removing the NodeAccessor stuff.



OK to commit?


No, because that would restore the original signature of the
_M_insert_unique_node function, but its contract has changed. Old
callers who expect that function to delete the node would now leak
memory if an exception is thrown.

If we change the contract of the function we need to change its
mangled name, so that callers expecting the old contract will not use
the new function.

I'll think about the best way to do that ...
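
As a generic C++ illustration of that point (not necessarily the approach libstdc++ will take), one way to give a function with a new contract a distinct mangled name is to add a tag parameter. Everything below is hypothetical; the bool 'fail' parameter merely simulates the failure path:

```cpp
#include <memory>
#include <utility>
#include <vector>

struct Node { int value; };

// Hypothetical tag type. Adding it to the parameter list changes the
// function's signature, and therefore its mangled name, so object files
// compiled against the old contract cannot silently call the new one.
struct caller_owns_node_t { explicit caller_owns_node_t() = default; };

using Container = std::vector<std::unique_ptr<Node>>;

// Old contract: the callee always takes ownership and frees the node on failure.
bool insert_node(Container& c, Node* n, bool fail) {
  std::unique_ptr<Node> owner(n);
  if (fail)
    return false;  // 'owner' deletes the node here
  c.push_back(std::move(owner));
  return true;
}

// New contract: on failure the node is not freed; the caller still owns it.
bool insert_node(Container& c, Node* n, bool fail, caller_owns_node_t) {
  if (fail)
    return false;  // caller must free 'n'
  c.emplace_back(n);
  return true;
}
```

A caller built against the old contract drops its pointer after a failed call; if that call silently reached the new function, the node would leak. With the tag, such a caller can only link against the old overload.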




[committed] Test cases to verify OpenACC 'firstprivate' mappings

2019-06-18 Thread Thomas Schwinge
Hi!

I committed to trunk in r272451 a few "Test cases to verify OpenACC
'firstprivate' mappings", see attached.  More to come, later on.
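
As a rough host-only analogy (an assumption of this note, not a full model of OpenACC semantics), a 'firstprivate' variable behaves like a by-value lambda capture marked mutable: the region starts from the host value and may modify its own copy, but nothing is written back, while a 'copyout' variable is like a by-reference capture. This mirrors the check done in the test's function p (spo = ++spi, then spo != spi + 1 on the host aborts):

```cpp
#include <utility>

// Host-only analogy (hypothetical helper, no OpenACC involved):
// 'in' plays the role of a firstprivate variable, 'out' of a copyout one.
std::pair<int, int> firstprivate_analogy(int in) {
  int out = 0;
  auto region = [in, &out]() mutable {
    out = ++in;  // modifies the region's private copy of 'in' only
  };
  region();
  return {out, in};  // 'in' here is still the original host value
}
```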


Regards
 Thomas


From 2f195960a11d9eb027e4abcfc5faaca2ff5fe9e3 Mon Sep 17 00:00:00 2001
From: tschwinge 
Date: Tue, 18 Jun 2019 22:15:03 +
Subject: [PATCH] Test cases to verify OpenACC 'firstprivate' mappings

	gcc/testsuite/
	* c-c++-common/goacc/firstprivate-mappings-1.c: New file.
	* g++.dg/goacc/firstprivate-mappings-1.C: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c++/firstprivate-mappings-1.C: New file.
	* testsuite/libgomp.oacc-c-c++-common/firstprivate-mappings-1.c:
	Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@272451 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/testsuite/ChangeLog   |   3 +
 .../goacc/firstprivate-mappings-1.c   | 533 ++
 .../g++.dg/goacc/firstprivate-mappings-1.C| 529 +
 libgomp/ChangeLog |   4 +
 .../firstprivate-mappings-1.C |   3 +
 .../firstprivate-mappings-1.c |   6 +
 6 files changed, 1078 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/firstprivate-mappings-1.c
 create mode 100644 gcc/testsuite/g++.dg/goacc/firstprivate-mappings-1.C
 create mode 100644 libgomp/testsuite/libgomp.oacc-c++/firstprivate-mappings-1.C
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-mappings-1.c

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 699a94b3ed40..bffa899c7676 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,5 +1,8 @@
 2019-06-18  Thomas Schwinge  
 
+	* c-c++-common/goacc/firstprivate-mappings-1.c: New file.
+	* g++.dg/goacc/firstprivate-mappings-1.C: Likewise.
+
 	PR testsuite/90861
 	* c-c++-common/goacc/declare-pr90861.c: New file.
 
diff --git a/gcc/testsuite/c-c++-common/goacc/firstprivate-mappings-1.c b/gcc/testsuite/c-c++-common/goacc/firstprivate-mappings-1.c
new file mode 100644
index ..c8270472a9c5
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/firstprivate-mappings-1.c
@@ -0,0 +1,533 @@
+/* Verify OpenACC 'firstprivate' mappings.  */
+
+/* This file is also sourced from
+   '../../../../libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-mappings-1.c'
+   as an execution test.  */
+
+/* See also '../../g++.dg/goacc/firstprivate-mappings-1.C'.  */
+
+/* { dg-additional-options "-fdump-tree-omplower" } */
+
+/* { dg-additional-options "-fext-numeric-literals" { target c++ } } */
+
+/* { dg-additional-options "-Wno-psabi" } as apparently we're doing funny
+   things with vector arguments.  */
+
+#include 
+#include 
+#include 
+
+
+#ifdef __SIZEOF_INT128__
+# define HAVE_INT128 1
+#else
+# define HAVE_INT128 0
+#endif
+
+
+/* The one is only relevant for offloading compilation; will always be enabled
+   when doing tree scanning.  */
+#ifdef ACC_DEVICE_TYPE_nvidia
+/* PR71064.  */
+# define DO_LONG_DOUBLE 0
+#else
+# define DO_LONG_DOUBLE 1
+#endif
+
+
+/* Simplify scanning for function names in tree dumps.  */
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+/* Inside the following OpenACC 'parallel' constructs' regions, we modify the
+   'firstprivate' variables, so that we can check that we don't copy these
+   back.  */
+
+
+static void
+p (short *spi)
+{
+  short *spo;
+#pragma acc parallel \
+  copyout (spo) \
+  firstprivate (spi)
+  {
+spo = ++spi;
+  }
+  if (spo != spi + 1)
+__builtin_abort ();
+}
+
+
+static void
+b (bool bi)
+{
+  bool bo;
+#pragma acc parallel \
+  copyout (bo) \
+  firstprivate (bi)
+  {
+bo = (bi = !bi);
+  }
+  if (bo != !bi)
+__builtin_abort ();
+}
+
+
+static void
+i (int8_t i8i,
+   uint8_t u8i,
+   int16_t i16i,
+   uint16_t u16i,
+   int32_t i32i,
+   uint32_t u32i,
+   int64_t i64i,
+   uint64_t u64i)
+{
+  int8_t i8o;
+  uint8_t u8o;
+  int16_t i16o;
+  uint16_t u16o;
+  int32_t i32o;
+  uint32_t u32o;
+  int64_t i64o;
+  uint64_t u64o;
+#pragma acc parallel \
+  copyout (i8o) \
+  firstprivate (i8i) \
+  copyout (u8o) \
+  firstprivate (u8i) \
+  copyout (i16o) \
+  firstprivate (i16i) \
+  copyout (u16o) \
+  firstprivate (u16i) \
+  copyout (i32o) \
+  firstprivate (i32i) \
+  copyout (u32o) \
+  firstprivate (u32i) \
+  copyout (i64o) \
+  firstprivate (i64i) \
+  copyout (u64o) \
+  firstprivate (u64i)
+  {
+i8o = --i8i;
+u8o = ++u8i;
+i16o = --i16i;
+u16o = ++u16i;
+i32o = --i32i;
+u32o = ++u32i;
+i64o = --i64i;
+u64o = ++u64i;
+  }
+  if (i8o != i8i - 1)
+__builtin_abort ();
+  if (u8o != u8i + 1)
+__builtin_abort ();
+  if (i16o != i16i - 1)
+__builtin_abort ();
+  if (u16o != u16i + 1)
+__builtin_abort ();
+  if (i32o != i32i - 1)
+__builtin_abort ();
+  if (u32o != u32i + 1)
+__builtin_abort ();
+  if (i64o != i64i - 1)
+__builtin_abort ();
+  if (u64o != u64i + 1)
+__builtin_abort ();
+}
+
+
+#if HAVE_INT128
+static void
+i128 (__int128 i128i, unsigned 

[committed] Fix description of 'GOMP_MAP_FIRSTPRIVATE' (was: [OpenACC] declare directive)

2019-06-18 Thread Thomas Schwinge
Hi!

On Mon, 23 Nov 2015 13:37:20 +0100, I wrote:
> Hi Jim!
> 
> A few things I noticed when working on merging your trunk r230275 into
> gomp-4_0-branch.  Please fix these (on trunk).

> | --- include/gomp-constants.h
> | +++ include/gomp-constants.h
> | @@ -72,6 +72,11 @@ enum gomp_map_kind
> | POINTER_SIZE_UNITS.  */
> |  GOMP_MAP_FORCE_DEVICEPTR = (GOMP_MAP_FLAG_SPECIAL_1 | 0),
> |  /* Do not map, copy bits for firstprivate instead.  */
> | +/* OpenACC device_resident.  */
> | +GOMP_MAP_DEVICE_RESIDENT = (GOMP_MAP_FLAG_SPECIAL_1 | 1),
> | +/* OpenACC link.  */
> | +GOMP_MAP_LINK =(GOMP_MAP_FLAG_SPECIAL_1 | 2),
> | +/* Allocate.  */
> |  GOMP_MAP_FIRSTPRIVATE =(GOMP_MAP_FLAG_SPECIAL | 0),
> |  /* Similarly, but store the value in the pointer rather than
> | pointed by the pointer.  */
> 
> I suspect the "Do not map, copy bits for firstprivate instead" comment
> still applies to GOMP_MAP_FIRSTPRIVATE only, which here (unintended?) got
> an "Allocate" comment added?

As obvious, now fixed on trunk in r272450 "Fix description of
'GOMP_MAP_FIRSTPRIVATE'", see attached.


Regards
 Thomas


From 2a0899eaf3a9aabe64cb0649fd7a7fa263ebfcaa Mon Sep 17 00:00:00 2001
From: tschwinge 
Date: Tue, 18 Jun 2019 22:14:52 +
Subject: [PATCH] Fix description of 'GOMP_MAP_FIRSTPRIVATE'

..., which got garbled in r230275.

	include/
	* gomp-constants.h (enum gomp_map_kind): Fix description of
	'GOMP_MAP_FIRSTPRIVATE'.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@272450 138bc75d-0d04-0410-961f-82ee72b054a4
---
 include/ChangeLog| 5 +
 include/gomp-constants.h | 3 +--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/ChangeLog b/include/ChangeLog
index 28f4664aa557..6d09a8d6e07b 100644
--- a/include/ChangeLog
+++ b/include/ChangeLog
@@ -1,3 +1,8 @@
+2019-06-18  Thomas Schwinge  
+
+	* gomp-constants.h (enum gomp_map_kind): Fix description of
+	'GOMP_MAP_FIRSTPRIVATE'.
+
 2019-06-10  Martin Liska  
 
 	* ansidecl.h (ATTRIBUTE_WARN_UNUSED_RESULT): New macro.
diff --git a/include/gomp-constants.h b/include/gomp-constants.h
index 8b93634f1b8b..82e9094c9342 100644
--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h
@@ -71,12 +71,11 @@ enum gomp_map_kind
 /* Is a device pointer.  OMP_CLAUSE_SIZE for these is unused; is implicitly
POINTER_SIZE_UNITS.  */
 GOMP_MAP_FORCE_DEVICEPTR =		(GOMP_MAP_FLAG_SPECIAL_1 | 0),
-/* Do not map, copy bits for firstprivate instead.  */
 /* OpenACC device_resident.  */
 GOMP_MAP_DEVICE_RESIDENT =		(GOMP_MAP_FLAG_SPECIAL_1 | 1),
 /* OpenACC link.  */
 GOMP_MAP_LINK =			(GOMP_MAP_FLAG_SPECIAL_1 | 2),
-/* Allocate.  */
+/* Do not map, copy bits for firstprivate instead.  */
 GOMP_MAP_FIRSTPRIVATE =		(GOMP_MAP_FLAG_SPECIAL | 0),
 /* Similarly, but store the value in the pointer rather than
pointed by the pointer.  */
-- 
2.20.1





[committed] Add missing results check in 'libgomp.fortran/allocatable3.f90' (was: [5/5] gomp-3_0-branch merge to trunk - OpenMP testsuite additions and fixes)

2019-06-18 Thread Thomas Schwinge
Hi!

On Thu, 5 Jun 2008 12:08:28 -0400, Jakub Jelinek  wrote:
> This patch contains the OpenMP specific testsuite additions and fixes.

>   * testsuite/libgomp.fortran/allocatable3.f90: New test.

As obvious (that is, as done for all the other test cases), I committed
to trunk in r272449 "Add missing results check in
'libgomp.fortran/allocatable3.f90'", see attached.


Regards
 Thomas


From 4173ac452932749a0f1d19228adc9a2284bc Mon Sep 17 00:00:00 2001
From: tschwinge 
Date: Tue, 18 Jun 2019 22:14:43 +
Subject: [PATCH] Add missing results check in
 'libgomp.fortran/allocatable3.f90'

	libgomp/
	* testsuite/libgomp.fortran/allocatable3.f90: Add missing results
	check.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@272449 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog  | 5 +
 libgomp/testsuite/libgomp.fortran/allocatable3.f90 | 1 +
 2 files changed, 6 insertions(+)

diff --git a/libgomp/ChangeLog b/libgomp/ChangeLog
index bd57e38ce19a..ef6397f1d28b 100644
--- a/libgomp/ChangeLog
+++ b/libgomp/ChangeLog
@@ -1,3 +1,8 @@
+2019-06-18  Thomas Schwinge  
+
+	* testsuite/libgomp.fortran/allocatable3.f90: Add missing results
+	check.
+
 2019-06-18  Cesar Philippidis  
 
 	* testsuite/libgomp.oacc-fortran/allocatable-array-1.f90: New
diff --git a/libgomp/testsuite/libgomp.fortran/allocatable3.f90 b/libgomp/testsuite/libgomp.fortran/allocatable3.f90
index 03ed1ac3f1ad..df69fff54919 100644
--- a/libgomp/testsuite/libgomp.fortran/allocatable3.f90
+++ b/libgomp/testsuite/libgomp.fortran/allocatable3.f90
@@ -18,4 +18,5 @@
   l = l.or.any (a.ne.0)
   deallocate (a)
 !$omp end parallel
+  if (l.or.allocated (a)) STOP 2
 end
-- 
2.20.1





[committed] Add 'libgomp.oacc-fortran/allocatable-array-1.f90' (was: [gomp4] Properly handle allocatable scalars in acc update)

2019-06-18 Thread Thomas Schwinge
Hi!

On Thu, 8 Jun 2017 14:40:29 -0700, Cesar Philippidis  
wrote:
> This patch fixes a bug I introduced while [...]

>   libgomp/
>   * testsuite/libgomp.oacc-fortran/allocatable-array-1.f90: New test.

This test case actually PASSes with current trunk, committed in r272448
"Add 'libgomp.oacc-fortran/allocatable-array-1.f90'", see attached.


Regards
 Thomas


From 601722d6802378ebca4b7be0d6c53d6e6541b64c Mon Sep 17 00:00:00 2001
From: tschwinge 
Date: Tue, 18 Jun 2019 22:14:34 +
Subject: [PATCH] Add 'libgomp.oacc-fortran/allocatable-array-1.f90'

	libgomp/
	* testsuite/libgomp.oacc-fortran/allocatable-array-1.f90: New
	file.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@272448 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog |  5 
 .../allocatable-array-1.f90   | 27 +++
 2 files changed, 32 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/allocatable-array-1.f90

diff --git a/libgomp/ChangeLog b/libgomp/ChangeLog
index 62c45828a009..bd57e38ce19a 100644
--- a/libgomp/ChangeLog
+++ b/libgomp/ChangeLog
@@ -1,3 +1,8 @@
+2019-06-18  Cesar Philippidis  
+
+	* testsuite/libgomp.oacc-fortran/allocatable-array-1.f90: New
+	file.
+
 2019-06-18  Thomas Schwinge  
 
 	PR fortran/90743
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/allocatable-array-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/allocatable-array-1.f90
new file mode 100644
index ..c9a76385d9f7
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/allocatable-array-1.f90
@@ -0,0 +1,27 @@
+! { dg-do run }
+
+program main
+  integer, parameter :: n = 40
+  integer, allocatable :: ar(:,:,:)
+  integer :: i
+
+  allocate (ar(1:n,0:n-1,0:n-1))
+  !$acc enter data copyin (ar)
+
+  !$acc update host (ar)
+
+  !$acc update device (ar)
+
+  call update_ar (ar, n)
+
+  !$acc exit data copyout (ar)
+end program main
+
+subroutine update_ar (ar, n)
+  integer :: n
+  integer, dimension (1:n,0:n-1,0:n-1) :: ar
+
+  !$acc update host (ar)
+
+  !$acc update device (ar)
+end subroutine update_ar
-- 
2.20.1





[committed] [PR90743] Fortran 'allocatable' with OpenACC data/OpenMP 'target' 'map' clauses

2019-06-18 Thread Thomas Schwinge
Hi!

On Wed, 5 Jun 2019 12:32:12 +0200, Jakub Jelinek  wrote:
> On Wed, Jun 05, 2019 at 12:26:32PM +0200, Thomas Schwinge wrote:
> > On Wed, 5 Jun 2019 12:00:25 +0200, Jakub Jelinek  wrote:
> > > On Wed, Jun 05, 2019 at 11:25:07AM +0200, Thomas Schwinge wrote:
> > > > +  !$omp target map(to: a) map(tofrom: b, c, d) map(from: e)
> > > > +  !$acc parallel copyin(a) copy(b, c, d) copyout(e)
> > > 
> > > Is mixing OpenMP and OpenACC construct this way defined at all?
> > 
> > It's not.  I'm using this just to avoid duplicating the test case file,
> > that is, '-fopenacc' and '-fopenmp' aren't enabled at the same time.
> 
> I think it is better to duplicate the test, it avoids confusion.

I committed to trunk in r272447 "[PR90743] Fortran 'allocatable' with
OpenACC data/OpenMP 'target' 'map' clauses", see attached.

That however doesn't resolve this topic: more test cases are needed, and
code changes, too, to support other clauses.
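
The libgomp part of the patch guards the device-address computation in GOACC_parallel_keyed against a NULL mapping key. A minimal sketch of that guard, with hypothetical simplified structures standing in for libgomp's (the real code indexes tgt->list[i] and stores void * values):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for libgomp's mapping structures.
struct TargetMem { std::uintptr_t tgt_start; };
struct SplayKey  { const TargetMem* tgt; std::uintptr_t tgt_offset; };
struct ListItem  { const SplayKey* key; std::uintptr_t offset; };

// Computes one device address per mapping entry. An entry without a key
// (nothing was mapped for it) yields 0 instead of dereferencing a null key.
std::vector<std::uintptr_t> device_addresses(const std::vector<ListItem>& list) {
  std::vector<std::uintptr_t> devaddrs(list.size());
  for (std::size_t i = 0; i < list.size(); ++i)
    devaddrs[i] = list[i].key != nullptr
                      ? list[i].key->tgt->tgt_start + list[i].key->tgt_offset
                            + list[i].offset
                      : 0;
  return devaddrs;
}
```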


Regards
 Thomas


From 561ffc69c504b2c897fc2991cf0bb99defa80efb Mon Sep 17 00:00:00 2001
From: tschwinge 
Date: Tue, 18 Jun 2019 22:14:24 +
Subject: [PATCH] [PR90743] Fortran 'allocatable' with OpenACC data/OpenMP
 'target' 'map' clauses

Test what OpenMP 5.0 has to say on this topic.  And, do the same for OpenACC.

	libgomp/
	PR fortran/90743
	* oacc-parallel.c (GOACC_parallel_keyed): Handle NULL mapping
	case.
	* testsuite/libgomp.fortran/target-allocatable-1-1.f90: New file.
	* testsuite/libgomp.fortran/target-allocatable-1-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/allocatable-1-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/allocatable-1-2.f90: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@272447 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog |  8 ++
 libgomp/oacc-parallel.c   |  9 +-
 .../target-allocatable-1-1.f90| 69 
 .../target-allocatable-1-2.f90| 82 +++
 .../libgomp.oacc-fortran/allocatable-1-1.f90  | 68 +++
 .../libgomp.oacc-fortran/allocatable-1-2.f90  | 81 ++
 6 files changed, 314 insertions(+), 3 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.fortran/target-allocatable-1-1.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/target-allocatable-1-2.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/allocatable-1-1.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/allocatable-1-2.f90

diff --git a/libgomp/ChangeLog b/libgomp/ChangeLog
index 1a0d363e4ba2..62c45828a009 100644
--- a/libgomp/ChangeLog
+++ b/libgomp/ChangeLog
@@ -1,5 +1,13 @@
 2019-06-18  Thomas Schwinge  
 
+	PR fortran/90743
+	* oacc-parallel.c (GOACC_parallel_keyed): Handle NULL mapping
+	case.
+	* testsuite/libgomp.fortran/target-allocatable-1-1.f90: New file.
+	* testsuite/libgomp.fortran/target-allocatable-1-2.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/allocatable-1-1.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/allocatable-1-2.f90: Likewise.
+
 	PR testsuite/90861
 	* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Update.
 
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index e56330f6226b..0c2cfa05a438 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -325,9 +325,12 @@ GOACC_parallel_keyed (int flags_m, void (*fn) (void *),
   
   devaddrs = gomp_alloca (sizeof (void *) * mapnum);
   for (i = 0; i < mapnum; i++)
-devaddrs[i] = (void *) (tgt->list[i].key->tgt->tgt_start
-			+ tgt->list[i].key->tgt_offset
-			+ tgt->list[i].offset);
+if (tgt->list[i].key != NULL)
+  devaddrs[i] = (void *) (tgt->list[i].key->tgt->tgt_start
+			  + tgt->list[i].key->tgt_offset
+			  + tgt->list[i].offset);
+else
+  devaddrs[i] = NULL;
   if (aq == NULL)
 acc_dev->openacc.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs, dims,
 tgt);
diff --git a/libgomp/testsuite/libgomp.fortran/target-allocatable-1-1.f90 b/libgomp/testsuite/libgomp.fortran/target-allocatable-1-1.f90
new file mode 100644
index ..429a855a20b2
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/target-allocatable-1-1.f90
@@ -0,0 +1,69 @@
+! Test 'allocatable' with OpenMP 'target' 'map' clauses.
+
+! See also '../libgomp.oacc-fortran/allocatable-1-1.f90'.
+
+! { dg-do run }
+! { dg-additional-options "-cpp" }
+! { dg-additional-options "-DMEM_SHARED" { target offload_device_shared_as } }
+
+program main
+  implicit none
+  integer, allocatable :: a, b, c, d, e
+
+  allocate (a)
+  a = 11
+
+  b = 25 ! Implicit allocation.
+
+  c = 52 ! Implicit allocation.
+
+  !No 'allocate (d)' here.
+
+  !No 'allocate (e)' here.
+
+  !$omp target map(to: a) map(tofrom: b, c, d) map(from: e)
+
+  if (.not. allocated (a)) stop 1
+  if (a .ne. 11) stop 2
+  a = 33
+
+  if (.not. allocated (b)) stop 3
+  if (b .ne. 25) stop 4
+
+  if (.not. allocated (c)) stop 5
+  if (c .ne. 52) stop 6
+  c = 10
+
+  if (allocated (d)) stop 7
+  d = 42 ! Implicit 

Re: [PATCH][gcc] libgccjit: check result_type in gcc_jit_context_new_binary_op

2019-06-18 Thread David Malcolm
On Mon, 2019-06-10 at 09:15 +, Andrea Corallo wrote:
> Hi all,
> I would like to propose this patch to check that the result type of
> binary operators is a numeric type.
> Without this check the compiler can crash in odd ways.
> 
> It introduces no regressions when running make check-jit.
> 
> OK for trunk?
> 
> Bests
>   Andrea
> 
> 2019-06-09  Andrea Corallo  andrea.cora...@arm.com
> 
> * libgccjit.c (gcc_jit_context_new_binary_op): Check result_type to
> be a numeric type.

Thanks for this patch.  Please can you add a test case that triggers
the error-handling case?  (I'm trying to imagine a situation where this
could have happened).

See gcc/testsuite/jit.dg/test-error-new-binary-op-bad-op.c or similar.

Dave
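
A minimal sketch of the kind of check the patch proposes, using a hypothetical type model rather than libgccjit's internal classes. Which kinds count as numeric here (integer and floating point) is an assumption, and the error text is illustrative only:

```cpp
#include <optional>
#include <string>

// Hypothetical type categories; libgccjit's real type model differs.
enum class TypeKind { Void, Integer, Floating, Pointer, Struct };

// Assumption for this sketch: only integer and floating-point kinds are numeric.
bool is_numeric(TypeKind k) {
  return k == TypeKind::Integer || k == TypeKind::Floating;
}

// Returns an error message if 'result_type' is unsuitable as the result of a
// binary operation, or std::nullopt if the check passes.
std::optional<std::string> check_binary_op_result_type(TypeKind result_type) {
  if (!is_numeric(result_type))
    return std::string("result_type is not a numeric type");
  return std::nullopt;
}
```

The real check would report the error through the jit context, and a test in the style of test-error-new-binary-op-bad-op.c would exercise that path.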


[committed] [PR90861] Document status quo for OpenACC 'declare' not cleaning up for VLAs

2019-06-18 Thread Thomas Schwinge
Hi!

This doesn't resolve PR90861, but at least in trunk r272446 we now
"Document status quo for OpenACC 'declare' not cleaning up for VLAs", see
attached.


Grüße
 Thomas


From 3f8b36838cd2aa34d59d867ed22fad054f489884 Mon Sep 17 00:00:00 2001
From: tschwinge 
Date: Tue, 18 Jun 2019 22:14:14 +
Subject: [PATCH] [PR90861] Document status quo for OpenACC 'declare' not
 cleaning up for VLAs

	gcc/testsuite/
	PR testsuite/90861
	* c-c++-common/goacc/declare-pr90861.c: New file.
	libgomp/
	PR testsuite/90861
	* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Update.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@272446 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/testsuite/ChangeLog   |  3 ++
 .../c-c++-common/goacc/declare-pr90861.c  | 21 +
 libgomp/ChangeLog |  3 ++
 .../libgomp.oacc-c-c++-common/declare-vla.c   | 47 +--
 4 files changed, 71 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/declare-pr90861.c

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 981055838ab6..699a94b3ed40 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,5 +1,8 @@
 2019-06-18  Thomas Schwinge  
 
+	PR testsuite/90861
+	* c-c++-common/goacc/declare-pr90861.c: New file.
+
 	PR testsuite/90868
 	* c-c++-common/goacc/declare-1.c: Update.
 	* c-c++-common/goacc/declare-2.c: Likewise.
diff --git a/gcc/testsuite/c-c++-common/goacc/declare-pr90861.c b/gcc/testsuite/c-c++-common/goacc/declare-pr90861.c
new file mode 100644
index ..7c905624f7a1
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/declare-pr90861.c
@@ -0,0 +1,21 @@
+/* Verify that OpenACC 'declare' cleans up for VLAs.  */
+
+/* { dg-additional-options "-fdump-tree-gimple" } */
+
+void f1 (void)
+{
+#define N_f1 1000
+  int A_f1[N_f1];
+#pragma acc declare copy(A_f1)
+  /* { dg-final { scan-tree-dump-times {#pragma omp target oacc_declare map\(to:A_f1} 1 gimple } }
+ { dg-final { scan-tree-dump-times {#pragma omp target oacc_declare map\(from:A_f1} 1 gimple } } */
+}
+
+void f2 (void)
+{
+  int N_f2 = 1000;
+  int A_f2[N_f2];
+#pragma acc declare copy(A_f2)
+  /* { dg-final { scan-tree-dump-times {#pragma omp target oacc_declare map\(to:\(\*A_f2} 1 gimple } }
+ { dg-final { scan-tree-dump-times {#pragma omp target oacc_declare map\(from:\(\*A_f2} 1 gimple { xfail *-*-* } } } TODO PR90861 */
+}
diff --git a/libgomp/ChangeLog b/libgomp/ChangeLog
index 06004aafde98..1a0d363e4ba2 100644
--- a/libgomp/ChangeLog
+++ b/libgomp/ChangeLog
@@ -1,5 +1,8 @@
 2019-06-18  Thomas Schwinge  
 
+	PR testsuite/90861
+	* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Update.
+
 	PR middle-end/90862
 	* testsuite/libgomp.oacc-c-c++-common/declare-1.c: Update.
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
index 3ea148ed40db..0f51badca42e 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
@@ -1,9 +1,10 @@
-/* Verify that acc declare accept VLA variables.  */
+/* Verify OpenACC 'declare' with VLAs.  */
 
 #include 
 
-int
-main ()
+
+void
+f (void)
 {
   int N = 1000;
   int i, A[N];
@@ -20,6 +21,46 @@ main ()
 
   for (i = 0; i < N; i++)
 assert (A[i] == i);
+}
+
+
+/* The same as 'f' but everything contained in an OpenACC 'data' construct.  */
+
+void
+f_data (void)
+{
+#pragma acc data
+  {
+int N = 1000;
+int i, A[N];
+# pragma acc declare copy(A)
+
+for (i = 0; i < N; i++)
+  A[i] = -i;
+
+# pragma acc kernels
+for (i = 0; i < N; i++)
+  A[i] = i;
+
+# pragma acc update host(A)
+
+for (i = 0; i < N; i++)
+  assert (A[i] == i);
+  }
+}
+
+
+int
+main ()
+{
+  f ();
+
+  f_data ();
 
   return 0;
 }
+
+
+/* { dg-xfail-run-if "TODO PR90861" { *-*-* } { "-DACC_MEM_SHARED=0" } }
+   This might XPASS if the compiler happens to put the two 'A' VLAs at the same
+   address.  */
-- 
2.20.1





[committed] [PR90868] Document status quo for duplicate OpenACC 'declare' directives for 'extern' variables

2019-06-18 Thread Thomas Schwinge
Hi!

This doesn't resolve PR90868, but at least in trunk r272445 we now
"Document status quo for duplicate OpenACC 'declare' directives for
'extern' variables", see attached.


Grüße
 Thomas


From 267951437cde77a09e62d9c151002eeed3cf457c Mon Sep 17 00:00:00 2001
From: tschwinge 
Date: Tue, 18 Jun 2019 22:14:04 +
Subject: [PATCH] [PR90868] Document status quo for duplicate OpenACC 'declare'
 directives for 'extern' variables

	gcc/testsuite/
	PR testsuite/90868
	* c-c++-common/goacc/declare-1.c: Update.
	* c-c++-common/goacc/declare-2.c: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@272445 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/testsuite/ChangeLog  |  4 +
 gcc/testsuite/c-c++-common/goacc/declare-1.c | 85 -
 gcc/testsuite/c-c++-common/goacc/declare-2.c | 99 
 3 files changed, 187 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 473fd66d39fd..981055838ab6 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,5 +1,9 @@
 2019-06-18  Thomas Schwinge  
 
+	PR testsuite/90868
+	* c-c++-common/goacc/declare-1.c: Update.
+	* c-c++-common/goacc/declare-2.c: Likewise.
+
 	PR middle-end/90862
 	* c-c++-common/goacc/declare-1.c: Update.
 	* c-c++-common/goacc/declare-2.c: Likewise.
diff --git a/gcc/testsuite/c-c++-common/goacc/declare-1.c b/gcc/testsuite/c-c++-common/goacc/declare-1.c
index 7c4380f4f041..46ee01b67595 100644
--- a/gcc/testsuite/c-c++-common/goacc/declare-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/declare-1.c
@@ -96,6 +96,84 @@ f (void)
 }
 
 
+/* The same as 'f'.  */
+
+void
+f_2 (void)
+{
+  int va0;
+#pragma acc declare create(va0)
+
+  int va1;
+#pragma acc declare copyin(va1)
+
+  int *va2;
+#pragma acc declare deviceptr(va2)
+
+  int va3;
+#pragma acc declare device_resident(va3)
+
+#ifndef __cplusplus
+  /* TODO PR90868
+
+ C: "error: variable '[...]' used more than once with '#pragma acc declare'".  */
+#else
+  extern int ve0;
+#pragma acc declare create(ve0)
+
+  extern int ve1;
+#pragma acc declare copyin(ve1)
+
+  extern int *ve2;
+#pragma acc declare deviceptr(ve2)
+
+  extern int ve3;
+#pragma acc declare device_resident(ve3)
+
+  extern int ve4;
+#pragma acc declare link(ve4)
+
+  extern int ve5;
+#pragma acc declare present_or_copyin(ve5)
+ 
+  extern int ve6;
+#pragma acc declare present_or_create(ve6)
+#endif
+
+  int va5;
+#pragma acc declare copy(va5)
+
+  int va6;
+#pragma acc declare copyout(va6)
+
+  int va7;
+#pragma acc declare present(va7)
+
+  int va8;
+#pragma acc declare present_or_copy(va8)
+
+  int va9;
+#pragma acc declare present_or_copyin(va9)
+
+  int va10;
+#pragma acc declare present_or_copyout(va10)
+
+  int va11;
+#pragma acc declare present_or_create(va11)
+
+ a:
+  {
+int va0;
+#pragma acc declare create(va0)
+if (v1)
+  goto a;
+else
+  goto b;
+  }
+ b:;
+}
+
+
 /* The same as 'f' but everything contained in an OpenACC 'data' construct.  */
 
 void
@@ -115,7 +193,12 @@ f_data (void)
 int va3;
 # pragma acc declare device_resident(va3)
 
-#if 0 /* TODO */
+#if 0
+/* TODO PR90868
+
+   C: "error: variable '[...]' used more than once with '#pragma acc declare'".
+   C++: ICE during gimplification.  */
+
 extern int ve0;
 # pragma acc declare create(ve0)
 
diff --git a/gcc/testsuite/c-c++-common/goacc/declare-2.c b/gcc/testsuite/c-c++-common/goacc/declare-2.c
index af43b6bc8162..e2e22be57e9e 100644
--- a/gcc/testsuite/c-c++-common/goacc/declare-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/declare-2.c
@@ -96,3 +96,102 @@ f_data (void)
 # pragma acc declare present (v2) /* { dg-error "invalid use of" } */
   }
 }
+
+
+/* Testing for PR90868 "Duplicate OpenACC 'declare' directives for 'extern'
+   variables".  */
+
+
+void
+f_pr90868 (void)
+{
+  extern int we0;
+#pragma acc declare create(we0)
+
+  extern int we1;
+#pragma acc declare copyin(we1)
+
+  extern int *we2;
+#pragma acc declare deviceptr(we2)
+
+  extern int we3;
+#pragma acc declare device_resident(we3)
+
+  extern int we4;
+#pragma acc declare link(we4)
+
+  extern int we5;
+#pragma acc declare present_or_copyin(we5)
+ 
+  extern int we6;
+#pragma acc declare present_or_create(we6)
+}
+
+
+/* The same as 'f_pr90868'.  */
+
+/* The errors are emitted for C only; for C++, the duplicate OpenACC 'declare'
+   directives for 'extern' variables are accepted.  */
+
+void
+f_pr90868_2 (void)
+{
+  extern int we0;
+#pragma acc declare create(we0) /* { dg-error "variable 'we0' used more than once with '#pragma acc declare'" "" { target c } } */
+
+  extern int we1;
+#pragma acc declare copyin(we1) /* { dg-error "variable 'we1' used more than once with '#pragma acc declare'" "" { target c } } */
+
+  extern int *we2;
+#pragma acc declare deviceptr(we2) /* { dg-error "variable 'we2' used more than once with '#pragma acc declare'" "" { target c } } */
+
+  extern int we3;
+#pragma acc declare device_resident(we3) /* { dg-error 

Re: [PATCH] xtensa: fix PR target/90922

2019-06-18 Thread Max Filippov
On Tue, Jun 18, 2019 at 3:09 PM augustine.sterl...@gmail.com
 wrote:
>
> On Tue, Jun 18, 2019 at 2:27 PM Max Filippov  wrote:
> >
> > Stack pointer adjustment code in prologue missed a case of no
> > callee-saved registers and a stack frame size bigger than 128 bytes.
> > Handle that case.
> >
> > This fixes the following gcc tests with call0 ABI:
> >   gcc.c-torture/execute/stdarg-2.c
> >   gcc.dg/torture/pr55882.c
> >   gcc.dg/torture/pr57569.c
>
> Approved, please apply.

Thanks. Applied to trunk.
I'll backport it later to gcc-7..9 branches.

-- 
Thanks.
-- Max


[committed] [PR90862] OpenACC 'declare' ICE when nested inside another construct (was: [OpenACC] declare directive)

2019-06-18 Thread Thomas Schwinge
Hi!

On Wed, 11 Nov 2015 19:07:58 -0600, James Norris  
wrote:
> [...]
> --- a/gcc/gimple.h
> +++ b/gcc/gimple.h
> @@ -170,6 +170,7 @@ enum gf_mask {
>  GF_OMP_TARGET_KIND_OACC_DATA = 7,
>  GF_OMP_TARGET_KIND_OACC_UPDATE = 8,
>  GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA = 9,
> +GF_OMP_TARGET_KIND_OACC_DECLARE = 10,
> [...]

This forgot to update 'check_omp_nesting_restrictions', giving rise to
PR90862 "OpenACC 'declare' ICE when nested inside another construct".
Now fixed on trunk in r272444, see attached.
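The failure mode is a common one: a switch maps each construct kind to a name for the diagnostic, and its `default:` is unreachable. The sketch below is a reduced C++ model of that lookup. The enum and function names are hypothetical stand-ins for the `GF_OMP_TARGET_KIND_OACC_*` values, and only the "update", "enter/exit data", "declare" and "host_data" strings come from the hunk. If the `Declare` case is removed, a `declare` nested in another construct falls through to the abort. That abort plays the role of `gcc_unreachable ()`, which is what surfaced as the ICE.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-ins for the GF_OMP_TARGET_KIND_OACC_* kinds.
enum class OaccKind { Parallel, Kernels, Data, Update, EnterExitData, Declare, HostData };

// Reduced model of the stmt_name lookup in check_omp_nesting_restrictions.
const char* oacc_stmt_name(OaccKind kind) {
  switch (kind) {
    case OaccKind::Parallel: return "parallel";
    case OaccKind::Kernels: return "kernels";
    case OaccKind::Data: return "data";
    case OaccKind::Update: return "update";
    case OaccKind::EnterExitData: return "enter/exit data";
    case OaccKind::Declare: return "declare";  // the case r272444 adds
    case OaccKind::HostData: return "host_data";
  }
  // Stands in for gcc_unreachable (): a kind with no case ends up here.
  std::abort();
}
```

This sketch's switch has no `default` label, so `-Wswitch` (part of `-Wall`) would flag any enumerator that lacks a case. GCC's version has `default: gcc_unreachable ();`, so a forgotten kind instead shows up as an ICE at run time, as happened here.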


Grüße
 Thomas


From acb4157074770f715968c3a9c1e6929f98fcddc8 Mon Sep 17 00:00:00 2001
From: tschwinge 
Date: Tue, 18 Jun 2019 22:13:54 +
Subject: [PATCH] [PR90862] OpenACC 'declare' ICE when nested inside another
 construct

	gcc/
	PR middle-end/90862
	* omp-low.c (check_omp_nesting_restrictions): Handle
	GF_OMP_TARGET_KIND_OACC_DECLARE.
	gcc/testsuite/
	PR middle-end/90862
	* c-c++-common/goacc/declare-1.c: Update.
	* c-c++-common/goacc/declare-2.c: Likewise.
	libgomp/
	PR middle-end/90862
	* testsuite/libgomp.oacc-c-c++-common/declare-1.c: Update.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@272444 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog |  6 ++
 gcc/omp-low.c |  1 +
 gcc/testsuite/ChangeLog   |  6 ++
 gcc/testsuite/c-c++-common/goacc/declare-1.c  | 82 +++-
 gcc/testsuite/c-c++-common/goacc/declare-2.c  | 35 ++-
 libgomp/ChangeLog |  5 +
 .../libgomp.oacc-c-c++-common/declare-1.c | 98 +--
 7 files changed, 223 insertions(+), 10 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index cbf6915c8286..43a0a232dc21 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2019-06-18  Thomas Schwinge  
+
+	PR middle-end/90862
+	* omp-low.c (check_omp_nesting_restrictions): Handle
+	GF_OMP_TARGET_KIND_OACC_DECLARE.
+
 2019-06-18  Uroš Bizjak  
 
 	* config/i386/i386.md (@cmp_1): Rename from cmp_1.
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 9df21a4d0466..b0f1d94abf73 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -3119,6 +3119,7 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
 	case GF_OMP_TARGET_KIND_OACC_UPDATE: stmt_name = "update"; break;
 	case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA:
 	  stmt_name = "enter/exit data"; break;
+	case GF_OMP_TARGET_KIND_OACC_DECLARE: stmt_name = "declare"; break;
 	case GF_OMP_TARGET_KIND_OACC_HOST_DATA: stmt_name = "host_data";
 	  break;
 	default: gcc_unreachable ();
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 2848d2ceecab..473fd66d39fd 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,9 @@
+2019-06-18  Thomas Schwinge  
+
+	PR middle-end/90862
+	* c-c++-common/goacc/declare-1.c: Update.
+	* c-c++-common/goacc/declare-2.c: Likewise.
+
 2019-06-18  Marek Polacek  
 
 	PR c++/84698
diff --git a/gcc/testsuite/c-c++-common/goacc/declare-1.c b/gcc/testsuite/c-c++-common/goacc/declare-1.c
index 35b1ccd367bd..7c4380f4f041 100644
--- a/gcc/testsuite/c-c++-common/goacc/declare-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/declare-1.c
@@ -1,5 +1,5 @@
-/* Test valid uses of declare directive.  */
-/* { dg-do compile } */
+/* Test valid use of the OpenACC 'declare' directive.  */
+
 
 int v0;
 #pragma acc declare create(v0)
@@ -25,6 +25,7 @@ int v9;
 int v10;
 #pragma acc declare present_or_create(v10)
 
+
 void
 f (void)
 {
@@ -93,3 +94,80 @@ f (void)
   }
  b:;
 }
+
+
+/* The same as 'f' but everything contained in an OpenACC 'data' construct.  */
+
+void
+f_data (void)
+{
+#pragma acc data
+  {
+int va0;
+# pragma acc declare create(va0)
+
+int va1;
+# pragma acc declare copyin(va1)
+
+int *va2;
+# pragma acc declare deviceptr(va2)
+
+int va3;
+# pragma acc declare device_resident(va3)
+
+#if 0 /* TODO */
+extern int ve0;
+# pragma acc declare create(ve0)
+
+extern int ve1;
+# pragma acc declare copyin(ve1)
+
+extern int *ve2;
+# pragma acc declare deviceptr(ve2)
+
+extern int ve3;
+# pragma acc declare device_resident(ve3)
+
+extern int ve4;
+# pragma acc declare link(ve4)
+
+extern int ve5;
+# pragma acc declare present_or_copyin(ve5)
+ 
+extern int ve6;
+# pragma acc declare present_or_create(ve6)
+#endif
+
+int va5;
+# pragma acc declare copy(va5)
+
+int va6;
+# pragma acc declare copyout(va6)
+
+int va7;
+# pragma acc declare present(va7)
+
+int va8;
+# pragma acc declare present_or_copy(va8)
+
+int va9;
+# pragma acc declare present_or_copyin(va9)
+
+int va10;
+# pragma acc declare present_or_copyout(va10)
+
+int va11;
+# pragma acc declare present_or_create(va11)
+
+  a:
+{
+  int va0;
+# pragma acc declare create(va0)
+  if (v1)
+	goto a;
+  else
+	goto b;
+}
+  b:;
+  }
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/declare-2.c b/gcc/testsuite/c-c++-common/goacc/declare-2.c

Re: [PATCH] xtensa: fix PR target/90922

2019-06-18 Thread augustine.sterl...@gmail.com
On Tue, Jun 18, 2019 at 2:27 PM Max Filippov  wrote:
>
> Stack pointer adjustment code in prologue missed a case of no
> callee-saved registers and a stack frame size bigger than 128 bytes.
> Handle that case.
>
> This fixes the following gcc tests with call0 ABI:
>   gcc.c-torture/execute/stdarg-2.c
>   gcc.dg/torture/pr55882.c
>   gcc.dg/torture/pr57569.c

Approved, please apply.


C++ PATCH to add test for c++/84698

2019-06-18 Thread Marek Polacek
Another test that got fixed by my fix for member friend templates with
noexcept (r270005).  It's short and valid, so I'll add it too.

Tested x86_64-linux, applying to trunk.

2019-06-18  Marek Polacek  

PR c++/84698
* g++.dg/cpp0x/noexcept42.C: New test.

diff --git gcc/testsuite/g++.dg/cpp0x/noexcept42.C gcc/testsuite/g++.dg/cpp0x/noexcept42.C
new file mode 100644
index 000..5d7218dd3e0
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp0x/noexcept42.C
@@ -0,0 +1,21 @@
+// PR c++/84698
+// { dg-do compile { target c++11 } }
+
+template
+struct X {
+  void swap(X& o) noexcept { }
+
+  template
+  friend void swap(X& a, X& b) noexcept(noexcept(a.swap(b)));
+};
+
+template
+inline void swap(X& a, X& b) noexcept(noexcept(a.swap(b)))
+{
+}
+
+int
+main ()
+{
+  X x;
+}


[PATCH] Add CXX_FOR_BUILD to HOST_EXPORTS

2019-06-18 Thread Michael Forney
gcc/configure needs this to generate auto-build.h using the right C++
compiler.

2019-06-18  Michael Forney  

* Makefile.tpl (HOST_EXPORTS): Add CXX_FOR_BUILD.
* Makefile.in: Regenerate.

---
I ran into this since I needed to pass some special flags to the build
C++ compiler for my system and was getting the mysterious error

make[3]: *** No rule to make target 'auto-build.h', needed by 'build/genmddeps.o'.  Stop.

It turns out the sub-configure run by gcc/configure was failing because
my CXX_FOR_BUILD was being ignored, and that failure is not treated as
a fatal error (#65794). The configure directory containing config.log
is then deleted, which makes it difficult to figure out what actually
went wrong.

 Makefile.in  | 1 +
 Makefile.tpl | 1 +
 2 files changed, 2 insertions(+)

diff --git a/Makefile.in b/Makefile.in
index 02cc7a39094..86c0c6d5b2d 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -198,6 +198,7 @@ HOST_EXPORTS = \
AR="$(AR)"; export AR; \
AS="$(AS)"; export AS; \
CC_FOR_BUILD="$(CC_FOR_BUILD)"; export CC_FOR_BUILD; \
+   CXX_FOR_BUILD="$(CXX_FOR_BUILD)"; export CXX_FOR_BUILD; \
DLLTOOL="$(DLLTOOL)"; export DLLTOOL; \
LD="$(LD)"; export LD; \
LDFLAGS="$(STAGE1_LDFLAGS) $(LDFLAGS)"; export LDFLAGS; \
diff --git a/Makefile.tpl b/Makefile.tpl
index 1cdc023c82f..efed1511750 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -201,6 +201,7 @@ HOST_EXPORTS = \
AR="$(AR)"; export AR; \
AS="$(AS)"; export AS; \
CC_FOR_BUILD="$(CC_FOR_BUILD)"; export CC_FOR_BUILD; \
+   CXX_FOR_BUILD="$(CXX_FOR_BUILD)"; export CXX_FOR_BUILD; \
DLLTOOL="$(DLLTOOL)"; export DLLTOOL; \
LD="$(LD)"; export LD; \
LDFLAGS="$(STAGE1_LDFLAGS) $(LDFLAGS)"; export LDFLAGS; \
-- 
2.20.1



[PATCH] xtensa: fix PR target/90922

2019-06-18 Thread Max Filippov
Stack pointer adjustment code in prologue missed a case of no
callee-saved registers and a stack frame size bigger than 128 bytes.
Handle that case.

This fixes the following gcc tests with call0 ABI:
  gcc.c-torture/execute/stdarg-2.c
  gcc.dg/torture/pr55882.c
  gcc.dg/torture/pr57569.c
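The hunk below only widens the condition that selects the a9-based stack pointer adjustment. The small C++ model that follows makes the affected frames explicit. The function names are hypothetical, and only the two predicates are taken from the diff. The rest of `xtensa_expand_prologue` is not modelled.

```cpp
#include <cassert>

// Old and new forms of the condition guarding the a9-based stack pointer
// adjustment in xtensa_expand_prologue (PR target/90922).
bool old_condition(int total_size, int /*callee_save_size*/) {
  return total_size > 1024;
}

bool new_condition(int total_size, int callee_save_size) {
  return total_size > 1024 || (callee_save_size == 0 && total_size > 128);
}

// True exactly for the frames whose prologue the fix changes.
bool fix_changes_outcome(int total_size, int callee_save_size) {
  return old_condition(total_size, callee_save_size)
         != new_condition(total_size, callee_save_size);
}
```

Under this model, the only frames affected are those with no callee-saved registers and a size in (128, 1024]. That matches the case the description says was missed.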

2019-06-18  Max Filippov  
gcc/
* config/xtensa/xtensa.c (xtensa_expand_prologue): Add stack
pointer adjustment for the case of no callee-saved registers and
stack frame bigger than 128 bytes.
---
 gcc/config/xtensa/xtensa.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/xtensa/xtensa.c b/gcc/config/xtensa/xtensa.c
index 19bd616d67f6..ee5612441e25 100644
--- a/gcc/config/xtensa/xtensa.c
+++ b/gcc/config/xtensa/xtensa.c
@@ -2865,7 +2865,8 @@ xtensa_expand_prologue (void)
gen_rtx_SET (mem, reg));
}
}
-  if (total_size > 1024)
+  if (total_size > 1024
+ || (!callee_save_size && total_size > 128))
{
  rtx tmp_reg = gen_rtx_REG (Pmode, A9_REG);
  emit_move_insn (tmp_reg, GEN_INT (total_size -
-- 
2.11.0



C++ PATCH to add test for c++/71548

2019-06-18 Thread Marek Polacek
Invalid, but we used to crash here, so let's make sure the ICE doesn't
creep back in.

Tested on x86_64-linux, applying to trunk.

2019-06-18  Marek Polacek  

PR c++/71548
* g++.dg/cpp0x/variadic177.C: New test.

diff --git gcc/testsuite/g++.dg/cpp0x/variadic177.C gcc/testsuite/g++.dg/cpp0x/variadic177.C
new file mode 100644
index 000..96736a0ac0a
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp0x/variadic177.C
@@ -0,0 +1,12 @@
+// PR c++/71548
+// { dg-do compile { target c++11 } }
+
+template class fl {};
+template class = fl>
+struct S {};
+template
+void f(S ) {}
+void lol() {
+S<> s;
+f(s); // { dg-error "no matching function for call to" }
+}


Re: Review Hashtable extract node API

2019-06-18 Thread François Dumont

On 6/18/19 12:54 PM, Jonathan Wakely wrote:

On 18/06/19 07:52 +0200, François Dumont wrote:

A small regression noticed while merging.

We shouldn't keep on using a moved-from key_type instance.

Ok to commit ? Feel free to do it if you prefer, I'll do so at end of 
Europe day otherwise.



diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index f5809c7443a..7e89e1b44c4 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -743,7 +743,8 @@ namespace __detail
std::tuple<>()
  };
  auto __pos
-    = __h->_M_insert_unique_node(__k, __bkt, __code, __node._M_node);
+    = __h->_M_insert_unique_node(__h->_M_extract()(__node._M_node->_M_v()),
+				 __bkt, __code, __node._M_node);
  __node._M_node = nullptr;
  return __pos->second;
    }


I can't create an example where this causes a problem, because the key
passed to _M_insert_unique_node is never used. So it doesn't matter
that it's been moved from.

So I have to wonder why we just added the key parameter to that
function, if it's never used.


I think you've been influenced by my patch. I was using a "_NodeAccessor"
which didn't give access to the node without taking ownership, so I
needed to pass the key explicitly to compute the new bucket index in case
of a rehash.


But with your approach this change to _M_insert_unique_node was
simply unnecessary, so here is a patch to clean up this part.


Ok to commit ?




As far as I can tell, it would only be used for a non-default range
hash function, and I don't care about that. Frankly I find the
policy-based _Hashtable completely unmaintainable and I'd gladly get
rid of all of it that isn't needed for the std::unordered_xxx
containers. The non-standard extensions are not used by anybody,
apparently not tested properly (or this regression should have been
noticed) and make the code too complicated.
I never considered removing this, but it would indeed make maintenance
easier. I'll add it to my TODO list.


We're adding new parameters that have to be passed around even though
they're never used by 99.999% of users. No wonder the code is only
fast at -O3.
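The cleanup relies on one point: once the node exists, everything needed to re-derive its bucket after a rehash can be reached from the node itself. A separate key argument is therefore redundant. Below is a toy C++ sketch of that idea. `Node`, `MiniTable` and their members are hypothetical, and this is not libstdc++'s `_Hashtable`. It assumes the bucket is derived from the cached hash code alone, as with plain modulo range hashing.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

struct Node {
  std::string key;       // the key lives inside the node
  std::size_t code = 0;  // cached hash code
  Node* next = nullptr;
};

class MiniTable {
 public:
  MiniTable() : buckets_(1, nullptr) {}
  ~MiniTable() {
    for (Node* head : buckets_)
      while (head) { Node* next = head->next; delete head; head = next; }
  }
  MiniTable(const MiniTable&) = delete;
  MiniTable& operator=(const MiniTable&) = delete;

  // Insert node n (its key assumed not yet present) into bucket bkt.  If the
  // table grows first, bkt is recomputed from the hash code; no key needed.
  Node* insert_unique_node(std::size_t bkt, std::size_t code, Node* n) {
    if (size_ + 1 > buckets_.size()) {  // toy policy: max load factor 1.0
      rehash(buckets_.size() * 2);
      bkt = bucket_index(code);
    }
    n->code = code;
    n->next = buckets_[bkt];
    buckets_[bkt] = n;
    ++size_;
    return n;
  }

  bool insert(const std::string& k) {
    std::size_t code = std::hash<std::string>()(k);
    std::size_t bkt = bucket_index(code);
    for (Node* p = buckets_[bkt]; p; p = p->next)
      if (p->code == code && p->key == k) return false;
    Node* n = new Node;
    n->key = k;
    insert_unique_node(bkt, code, n);
    return true;
  }

  bool contains(const std::string& k) const {
    std::size_t code = std::hash<std::string>()(k);
    for (Node* p = buckets_[bucket_index(code)]; p; p = p->next)
      if (p->code == code && p->key == k) return true;
    return false;
  }

  std::size_t size() const { return size_; }
  std::size_t bucket_count() const { return buckets_.size(); }

 private:
  std::size_t bucket_index(std::size_t code) const { return code % buckets_.size(); }

  // Relink every node into a larger bucket array using its cached code.
  void rehash(std::size_t nbkt) {
    std::vector<Node*> fresh(nbkt, nullptr);
    for (Node* head : buckets_)
      while (head) {
        Node* next = head->next;
        std::size_t b = head->code % nbkt;
        head->next = fresh[b];
        fresh[b] = head;
        head = next;
      }
    buckets_.swap(fresh);
  }

  std::vector<Node*> buckets_;
  std::size_t size_ = 0;
};
```

A ranged hash that needs the key itself would have to extract it from the node, which is what the patch does with `this->_M_extract()(__node->_M_v())`.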







diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index ab579a7059e..41edafaa2e3 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -693,11 +693,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __node_base*
   _M_get_previous_node(size_type __bkt, __node_base* __n);
 
-  // Insert node __n with key __k and hash code __code, in bucket __bkt
-  // if no rehash (assumes no element with same key already present).
+  // Insert node __n with hash code __code, in bucket __bkt if no
+  // rehash (assumes no element with same key already present).
   // Takes ownership of __n if insertion succeeds, throws otherwise.
   iterator
-  _M_insert_unique_node(const key_type& __k, size_type __bkt,
+  _M_insert_unique_node(size_type __bkt,
 			__hash_code __code, __node_type* __n,
 			size_type __n_elt = 1);
 
@@ -831,7 +831,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	else
 	  {
 		__ret.position
-		  = _M_insert_unique_node(__k, __bkt, __code, __nh._M_ptr);
+		  = _M_insert_unique_node(__bkt, __code, __nh._M_ptr);
 		__nh._M_ptr = nullptr;
 		__ret.inserted = true;
 	  }
@@ -918,8 +918,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  if (_M_find_node(__bkt, __k, __code) == nullptr)
 		{
 		  auto __nh = __src.extract(__pos);
-		  _M_insert_unique_node(__k, __bkt, __code, __nh._M_ptr,
-	__n_elt);
+		  _M_insert_unique_node(__bkt, __code, __nh._M_ptr, __n_elt);
 		  __nh._M_ptr = nullptr;
 		  __n_elt = 1;
 		}
@@ -1671,7 +1670,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  return std::make_pair(iterator(__p), false);
 
 	// Insert the node
-	auto __pos = _M_insert_unique_node(__k, __bkt, __code, __node._M_node);
+	auto __pos = _M_insert_unique_node(__bkt, __code, __node._M_node);
 	__node._M_node = nullptr;
 	return { __pos, true };
   }
@@ -1705,9 +1704,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 auto
 _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
 	   _H1, _H2, _Hash, _RehashPolicy, _Traits>::
-_M_insert_unique_node(const key_type& __k, size_type __bkt,
-			  __hash_code __code, __node_type* __node,
-			  size_type __n_elt)
+_M_insert_unique_node(size_type __bkt, __hash_code __code,
+			  __node_type* __node, size_type __n_elt)
 -> iterator
 {
   const __rehash_state& __saved_state = _M_rehash_policy._M_state();
@@ -1718,7 +1716,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   if (__do_rehash.first)
 	{
 	  _M_rehash(__do_rehash.second, __saved_state);
-	  __bkt = _M_bucket_index(__k, __code);
+	  __bkt = _M_bucket_index(this->_M_extract()(__node->_M_v()), __code);
 	}
 
   this->_M_store_code(__node, __code);
@@ -1804,7 +1802,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 

Re: [PATCH][PR89341]Fix ICE on function definition with weakref/alias attributes attached

2019-06-18 Thread Jeff Law
On 3/26/19 5:40 AM, JunMa wrote:
> Hi
> 
> According to the GNU documentation of function attributes, neither weakref
> nor alias can be attached to a function defined in the current translation
> unit. Although GCC checks the attributes in some circumstances, it still
> fails in some cases and even ICEs.
> 
> This patch checks whether an alias/weakref attribute is attached to a
> function declaration that has a body, and removes the attribute later.
> This also avoids the ICE.
> 
> Bootstrapped/regtested on x86_64-linux, ok for GCC10?
> 
> Regards
> JunMa
> 
> 
> gcc/ChangeLog
> 
> 2019-03-26  Jun Ma 
> 
>     PR89341
>     * cgraphunit.c (handle_alias_pairs): Check whether alias attribute
>     attaches on a function declaration which has body.
> 
> gcc/testsuite/ChangeLog
> 
> 2019-03-26  Jun Ma 
> 
>     PR89341
>     * gcc.dg/attr-alias-6.c: New test.
>     * gcc.dg/attr-weakref-5.c: Likewise.
Based on my reading of the BZ, this should result in a hard error,
rather than an "attribute ignored" warning.

Jeff


Re: [PATCH] [ARC] Fix PR89838

2019-06-18 Thread Jeff Law
On 6/6/19 1:32 AM, Claudiu Zissulescu wrote:
> Hi Andrew,
> 
> This is a proposed fix for bugzilla PR89838 issue. It also needs to be 
> backported to gcc9 and, eventually, gcc8 branches.
> 
> Ok to apply?
> Claudiu
> 
> gcc/
> -xx-xx  Claudiu Zissulescu  
> 
>   * config/arc/arc.c (arc_symbol_binds_local_p): New function.
>   (arc_legitimize_pic_address): Simplify and cleanup the function.
>   (SYMBOLIC_CONST): Remove.
>   (prepare_pic_move): Likewise.
>   (prepare_move_operands): Handle complex mov cases here.
>   (arc_legitimize_address_0): Remove call to
>   arc_legitimize_pic_address.
>   (arc_legitimize_address): Remove call to
>   arc_legitimize_tls_address.
>   * config/arc/arc.md (movqi_insn): Allow Cm3 match.
>   (movhi_insn): Likewise.
> 
> /gcc/testsuite
> -xx-xx  Claudiu Zissulescu  
> 
>   * gcc.target/arc/pr89838.c: New file.
OK.

The BZ mentions that this was found building a glibc test for ARC.  Is
there a glibc port for the ARC?  I don't see one in the glibc git repo.
 Are you aware of any plans to produce an official glibc port?

I believe building glibc is a hell of a lot better sniff test than building
newlib.  So if it's in the plan, I'd love to re-wire my tester to test
with glibc rather than newlib on the ARC port.

jeff


[PATCH] PR fortran/87907 -- Don't dereference a NULL pointer

2019-06-18 Thread Steve Kargl
The attached patch has been regression tested on x86_64-*-freebsd.
If the pointer is NULL, the function simply returns.  It seems
that gfortran then does the Right Thing.  OK to commit?

2019-06-18  Steven G. Kargl  

 PR fortran/87907
 * resolve.c (resolve_contained_fntype): Do not dereference a NULL
 pointer.

2019-06-18  Steven G. Kargl  

 PR fortran/87907
* gfortran.dg/pr87907.f90: New testcase.


-- 
Steve
Index: gcc/fortran/resolve.c
===
--- gcc/fortran/resolve.c	(revision 272432)
+++ gcc/fortran/resolve.c	(working copy)
@@ -583,6 +583,9 @@ resolve_contained_fntype (gfc_symbol *sym, gfc_namespa
   || sym->attr.entry_master)
 return;
 
+  if (!sym->result)
+return;
+
   /* Try to find out of what the return type is.  */
   if (sym->result->ts.type == BT_UNKNOWN && sym->result->ts.interface == NULL)
 {
Index: gcc/testsuite/gfortran.dg/pr87907.f90
===
--- gcc/testsuite/gfortran.dg/pr87907.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr87907.f90	(working copy)
@@ -0,0 +1,23 @@
+! { dg-do compile }
+! PR fortran/pr87907
+! Original testcase contributed by Gerhard Stienmetz 
+module m
+   interface
+  module function g(x) result(z)
+ integer, intent(in) :: x
+ integer, allocatable :: z
+  end
+   end interface
+end
+
+submodule(m) m2
+   contains
+  subroutine g(x)   ! { dg-error "mismatch in argument" }
+  end
+end
+
+program p
+   use m! { dg-error "has a type" }
+   integer :: x = 3
+   call g(x)! { dg-error "which is not consistent with" }
+end


Re: [PATCH,RFC,V2 2/3] Add CTF command line options : -gtLEVEL

2019-06-18 Thread Bernhard Reutner-Fischer
On 12 June 2019 20:00:09 CEST, Indu Bhagat  wrote:
>-gtLEVEL is used to request CTF debug information and also to specify
>how much CTF debug information.

The option name is way too generic IMO.
-gctfLEVEL or some such would at least indicate its intended purpose, fwiw.

thanks

>
>[Changes from V1]
>  None
>
>gcc/ChangeLog:
> 
>   * common.opt: Add CTF debug info options.
>   * doc/invoke.texi: Document the CTF debug info options.
>   * flag-types.h (enum ctf_debug_info_levels): New enum.
>   * opts.c (common_handle_option): New Function.
>   (set_ctf_debug_level): Handle the new CTF debug info options.



Re: [PATCH] implement -Wformat-diag, v2

2019-06-18 Thread Martin Sebor

On 6/18/19 12:59 PM, Jeff Law wrote:

On 5/22/19 10:42 AM, Martin Sebor wrote:

Attached is a revised implementation of the -Wformat-diag checker
incorporating the feedback I got on the first revision.

Martin

gcc-wformat-diag-checker.diff

gcc/c-family/ChangeLog:

* c-format.c (function_format_info::format_type): Adjust type.
(function_format_info::is_raw): New member.
(decode_format_type): Adjust signature.  Handle "raw" diag attributes.
(decode_format_attr): Adjust call to decode_format_type.
Avoid a redundant call to convert_format_name_to_system_name.
Avoid abbreviating the word "arguments" in a diagnostic.
(format_warning_substr): New function.
(avoid_dollar_number): Quote dollar sign in a diagnostic.
(finish_dollar_format_checking): Same.
(check_format_info): Same.
(struct baltoks_t): New.
(c_opers, c_keywords, cxx_keywords, badwords, contrs): New arrays.
(maybe_diag_unbalanced_tokens, check_tokens, check_plain): New
functions.
(check_format_info_main): Call check_plain.  Use baltoks_t.  Call
maybe_diag_unbalanced_tokens.
(handle_format_attribute): Spell out the word "arguments" in
a diagnostic.

gcc/testsuite/ChangeLog:
* gcc.dg/format/gcc_diag-11.c: New test.

High level comment.  This is painful to read.  But that's probably an
artifact of trying to cobble together C code to parse strings and codify
the conventions.  I.e., it's likely inherent in the problem you're
trying to solve.


It wasn't exactly a lot of fun to write either.  I suspect it
would have been even worse if I had used regular expressions.
It is more complicated than strictly necessary because it's
trying to balance usability of the warning with efficiency.
(Although I'm sure both could be improved.)


diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index a7f76c1c01d..713fce442c9 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -320,10 +352,8 @@ decode_format_attr (const_tree fntype, tree atname, tree args,
  {
const char *p = IDENTIFIER_POINTER (format_type_id);
  
-  p = convert_format_name_to_system_name (p);

+  info->format_type = decode_format_type (p, >is_raw);
  
-  info->format_type = decode_format_type (p);

-
if (!c_dialect_objc ()
   && info->format_type == gcc_objc_string_format_type)
{

Did you mean to drop the call to convert_format_name_to_system_name?


Yes, it's redundant (it's the first thing decode_format_type does).


@@ -2789,6 +2850,904 @@ check_argument_type (const format_char_info *fci,
return true;
  }
  
+

+/* Helper for initializing global token_t arrays below.  */
+#define NAME(name) { name, sizeof name - 1, NULL }
+
+/* C/C++ operators that are expected to be quoted within the format
+   string.  */
+
+static const token_t c_opers[] =
+  {
+   NAME ("!="), NAME ("%="),  NAME ("&&"),  NAME ("&="), NAME ("*="),
+   NAME ("++"), NAME ("+="),  NAME ("--"),  NAME ("-="), NAME ("->"),
+   NAME ("/="), NAME ("<<"),  NAME ("<<="), NAME ("<="), NAME ("=="),
+   NAME (">="), NAME (">>="), NAME (">>"),  NAME ("?:"),  NAME ("^="),
+   NAME ("|="), NAME ("||")
+  };

This clearly isn't a full list of operators.  Is there a particular
reason why we aren't diagnosing others?  I guess you're catching the
single character operators via the ISPUNCT checks?  That seems a little
loose (@ isn't an operator for example).  It may be OK in practice though.


Yes, it only handles two-character operators and its only purpose
is to make diagnostics more readable and less repetitive (otherwise
we'd get one for each occurrence of the punctuator). I think @ is
an operator in Objective-C (I only know this because I fixed a few
instances of it there).
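The sizeof-based NAME() initializer and the point about multi-character operators can be illustrated with a small, self-contained sketch. The op_token type, the OP() macro and match_operator() are hypothetical stand-ins, not the patch's token_t or check_tokens(); the sketch only shows why a lookup in such a table should prefer the longest entry, since "<<" is a prefix of "<<=".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token record; the patch's token_t has a third member
   (set to NULL by NAME) that this sketch leaves out.  */
typedef struct
{
  const char *name;
  size_t len;
} op_token;

/* sizeof on a string literal counts the terminating NUL, hence - 1.  */
#define OP(name) { name, sizeof name - 1 }

static const op_token ops[] =
  {
    OP ("!="), OP ("->"), OP ("<<"), OP ("<<="), OP ("<="), OP ("||")
  };

/* Return the length of the longest operator in OPS that STR starts
   with, or 0 if none does.  */
static size_t
match_operator (const char *str)
{
  assert (str != NULL);
  size_t best = 0;
  for (size_t i = 0; i != sizeof ops / sizeof ops[0]; ++i)
    if (ops[i].len > best && strncmp (str, ops[i].name, ops[i].len) == 0)
      best = ops[i].len;
  return best;
}
```

Storing the length next to the spelling lets the comparison use strncmp without a strlen call per entry.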


+
+  if (nchars)
+{
+  /* This is the most common problem: go the extra mile to decribe

s/decribe/describe/




+
+static void
+maybe_diag_unbalanced_tokens (location_t format_string_loc,
+ const char *orig_format_chars,
+ tree format_string_cst,
+ baltoks_t &baltoks)

Needs a function comment.



@@ -2828,10 +3789,26 @@ check_format_info_main (format_check_results *res,
  
init_dollar_format_checking (info->first_arg_num, first_fillin_param);
  
+  /* In GCC diagnostic functions check plain directives (substrings within
+ the format string that don't start with %) for quoting and punctuations
+ problems.  */
+  bool ck_plain = (!info->is_raw
+  && (info->format_type == gcc_diag_format_type
+  || info->format_type == gcc_tdiag_format_type
+  || info->format_type == gcc_cdiag_format_type
+  || info->format_type == gcc_cxxdiag_format_type));
+
while (*format_chars != 0)
  {
-  if (*format_chars++ != '%')
+  if (ck_plain)
+   format_chars = check_plain 

Re: [PATCH] implement -Wformat-diag, v2

2019-06-18 Thread Jeff Law
On 5/22/19 10:42 AM, Martin Sebor wrote:
> Attached is a revised implementation of the -Wformat-diag checker
> incorporating the feedback I got on the first revision.
> 
> Martin
> 
> gcc-wformat-diag-checker.diff
> 
> gcc/c-family/ChangeLog:
> 
>   * c-format.c (function_format_info::format_type): Adjust type.
>   (function_format_info::is_raw): New member.
>   (decode_format_type): Adjust signature.  Handle "raw" diag attributes.
>   (decode_format_attr): Adjust call to decode_format_type.
>   Avoid a redundant call to convert_format_name_to_system_name.
>   Avoid abbreviating the word "arguments" in a diagnostic.
>   (format_warning_substr): New function.
>   (avoid_dollar_number): Quote dollar sign in a diagnostic.
>   (finish_dollar_format_checking): Same.
>   (check_format_info): Same.
>   (struct baltoks_t): New.
>   (c_opers, c_keywords, cxx_keywords, badwords, contrs): New arrays.
>   (maybe_diag_unbalanced_tokens, check_tokens, check_plain): New
>   functions.
>   (check_format_info_main): Call check_plain.  Use baltoks_t.  Call
>   maybe_diag_unbalanced_tokens.
>   (handle_format_attribute): Spell out the word "arguments" in
>   a diagnostic.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.dg/format/gcc_diag-11.c: New test.
High level comment.  This is painful to read.  But that's probably an
artifact of trying to cobble together C code to parse strings and codify
the conventions; i.e., it's likely inherent in the problem you're
trying to solve.


> 
> diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
> index a7f76c1c01d..713fce442c9 100644
> --- a/gcc/c-family/c-format.c
> +++ b/gcc/c-family/c-format.c
> @@ -320,10 +352,8 @@ decode_format_attr (const_tree fntype, tree atname, tree args,
>  {
>const char *p = IDENTIFIER_POINTER (format_type_id);
>  
> -  p = convert_format_name_to_system_name (p);
> +  info->format_type = decode_format_type (p, &info->is_raw);
>  
> -  info->format_type = decode_format_type (p);
> -  
>if (!c_dialect_objc ()
>  && info->format_type == gcc_objc_string_format_type)
>   {
Did you mean to drop the call to convert_format_name_to_system_name?



> @@ -2789,6 +2850,904 @@ check_argument_type (const format_char_info *fci,
>return true;
>  }
>  
> +
> +/* Helper for initializing global token_t arrays below.  */
> +#define NAME(name) { name, sizeof name - 1, NULL }
> +
> +/* C/C++ operators that are expected to be quoted within the format
> +   string.  */
> +
> +static const token_t c_opers[] =
> +  {
> +   NAME ("!="), NAME ("%="),  NAME ("&&"),  NAME ("&="), NAME ("*="),
> +   NAME ("++"), NAME ("+="),  NAME ("--"),  NAME ("-="), NAME ("->"),
> +   NAME ("/="), NAME ("<<"),  NAME ("<<="), NAME ("<="), NAME ("=="),
> +   NAME (">="), NAME (">>="), NAME (">>"),  NAME ("?:"),  NAME ("^="),
> +   NAME ("|="), NAME ("||")
> +  };
This clearly isn't a full list of operators.  Is there a particular
reason why we aren't diagnosing others?  I guess you're catching the
single character operators via the ISPUNCT checks?  That seems a little
loose (@ isn't an operator for example).  It may be OK in practice though.



> +

> +
> +  if (nchars)
> +{
> +  /* This is the most common problem: go the extra mile to decribe
s/decribe/describe/



> +
> +static void
> +maybe_diag_unbalanced_tokens (location_t format_string_loc,
> +   const char *orig_format_chars,
> +   tree format_string_cst,
> +   baltoks_t &baltoks)
Needs a function comment.


> @@ -2828,10 +3789,26 @@ check_format_info_main (format_check_results *res,
>  
>init_dollar_format_checking (info->first_arg_num, first_fillin_param);
>  
> +  /* In GCC diagnostic functions check plain directives (substrings within
> + the format string that don't start with %) for quoting and punctuations
> + problems.  */
> +  bool ck_plain = (!info->is_raw
> +&& (info->format_type == gcc_diag_format_type
> +|| info->format_type == gcc_tdiag_format_type
> +|| info->format_type == gcc_cdiag_format_type
> +|| info->format_type == gcc_cxxdiag_format_type));
> +
>while (*format_chars != 0)
>  {
> -  if (*format_chars++ != '%')
> +  if (ck_plain)
> + format_chars = check_plain (format_string_loc,
> + format_string_cst,
> + orig_format_chars, format_chars,
> + baltoks);
> +
> +  if (*format_chars == 0 || *format_chars++ != '%')
>   continue;
> +
>if (*format_chars == 0)
Isn't the second test of *format_chars == 0 dead now?
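A simplified model of the loop shape in the hunk above may help with that question. It is a sketch under stated assumptions: skip_plain() is a hypothetical stand-in for check_plain() that only advances to the next '%' or the terminating NUL, and every character after '%' is treated as a one-character conversion. In this model the test after the increment is still reached when the format string ends in a lone '%'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for check_plain: skip a run of plain text and
   return a pointer to the next '%' or to the terminating NUL.  */
static const char *
skip_plain (const char *p)
{
  while (*p != '\0' && *p != '%')
    ++p;
  return p;
}

/* Count directives in FMT using the same loop shape as the patch.
   *TRAILING is set to 1 if FMT ends in a lone '%'.  */
static int
count_directives (const char *fmt, int *trailing)
{
  int n = 0;
  *trailing = 0;
  while (*fmt != 0)
    {
      fmt = skip_plain (fmt);
      if (*fmt == 0 || *fmt++ != '%')
        continue;
      /* Reached when '%' was the last character: FMT now points at the NUL.  */
      if (*fmt == 0)
        {
          *trailing = 1;
          break;
        }
      ++n;    /* Treat the character after '%' as the conversion.  */
      ++fmt;
    }
  return n;
}
```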

I'm going to throw this into my tester and see what, if anything, pops
out while you fix up the nits above.  Assuming there isn't anything
major, my inclination is to go forward.  We may over time improve the

Re: RFA: Synchronize top level files with binutils

2019-06-18 Thread Richard Earnshaw (lists)
On 18/06/2019 17:20, Nick Clifton wrote:
> Hi Richard,
> 
>>>   OK, here is a resubmission of my patch with just the addition of the
>>>   libctf patches this time.
> 
>   [Sorry - this one got put on a back burner].
> 
>> Would it be feasible to backport this to the other maintained branches
>> so that the option of using them with current binutils would be available?
> 
> Do you have any particular branches in mind?  There do seem to be quite a lot of them...
> 
> Cheers
>   Nick
> 
> 
Only the official branches: gcc-7, gcc-8 and gcc-9.  I would expect
branch owners to do any other branches as and when they might require it.

R.


Re: [PATCH] Fix PR84521

2019-06-18 Thread Wilco Dijkstra
Hi,

> > Is this test valid?  Can jmp buffer be allowed on stack?
> 
> Sure, the contents of the jmp buffer is only valid during the lifetime
>  of the call frame anyway.

Indeed. The issue with jmp buffer being on the stack causing incorrect
restore when doing longjmp has just been fixed (PR64242). 

Wilco
 

Re: [PATCH,RFC,V2 0/3] Support for CTF in GCC

2019-06-18 Thread Indu Bhagat
Hello,

PING, In case this patch series slipped your attention.

Thanks
Indu

On Wed, Jun 12, 2019 at 10:50 AM Indu Bhagat  wrote:
>
> Hello,
>
> Thanks for the feedback on the previous patch set.
>
> This is the second posting of the RFC patch for CTF support in GCC. This patch
> set does not rely on debug hooks, but it keeps CTF and DWARF debug info
> generation separated in the compiler.
>
> For CTF generation, callsites in symbol_table::finalize_compilation_unit and
> rest_of_decl_compilation are used. For CTF emission, callsite in
> symbol_table::finalize_compilation_unit is used.
>
> Summary of the GCC RFC V2 patch set :
> Patch 1 and Patch 2 have remained unchanged since V1.
> Patch 1 is a simple addition of a new function lang_GNU_GIMPLE to check for
> GIMPLE frontend.
> Patch 2 and Patch 3 set up the framework for CTF support in GCC :
> -- Patch 2 adds the new command line option for generating CTF. CTF generation
>is enabled in the compiler by specifying an explicit -gt or
>-gtLEVEL[LEVEL=1,2] :
>
> -gtLEVEL
>
> This is used to request CTF debug information and to specify how much CTF
> debug information to generate; LEVEL can be 0, 1 or 2.  If -gt is specified
> (with no LEVEL), the default value of LEVEL is 2.
>
> -gt0 (Level 0) produces no CTF debug information at all. Thus, -gt0
> negates -gt.
>
> -gt1 (Level 1) produces CTF information for tracebacks only. This includes
> CTF callsite information, but does not include type information for other
> entities.
>
> -gt2 (Level 2) produces type information for entities (functions, variables
> etc.) at file-scope or global-scope only. This level of information can be
> used by dynamic tracers like DTrace.
>
> --  Patch 3 initializes the CTF container if the user-level option for CTF
> generation is specified. CTF is generated for all to-be-emitted global
> decls if LEVEL 2 is specified (-gt2, or -gt with no LEVEL).
>
> Tested on x86_64-linux and sparc64-linux.
>
> Thanks
>
> Indu Bhagat (3):
>   Add new function lang_GNU_GIMPLE
>   Add CTF command line options : -gtLEVEL
>   Setup for CTF generation and emission
>
>  gcc/ChangeLog                                   |  27 ++
>  gcc/Makefile.in                                 |   3 +
>  gcc/cgraphunit.c                                |  12 +-
>  gcc/common.opt                                  |   9 +
>  gcc/ctfout.c                                    | 163
>  gcc/ctfout.h                                    |  52 +++
>  gcc/doc/invoke.texi                             |  16 +
>  gcc/flag-types.h                                |  13 +
>  gcc/gengtype.c                                  |   4 +-
>  gcc/langhooks.c                                 |   9 +
>  gcc/langhooks.h                                 |   1 +
>  gcc/opts.c                                      |  26 ++
>  gcc/passes.c                                    |   7 +-
>  gcc/testsuite/ChangeLog                         |   7 +
>  gcc/testsuite/gcc.dg/debug/ctf/ctf-1.c          |   6 +
>  gcc/testsuite/gcc.dg/debug/ctf/ctf-preamble-1.c |  11 +
>  gcc/testsuite/gcc.dg/debug/ctf/ctf.exp          |  41 ++
>  gcc/testsuite/gcc.dg/debug/dwarf2-ctf-1.c       |   7 +
>  gcc/toplev.c                                    |  18 +
>  include/ChangeLog                               |   4 +
>  include/ctf.h                                   | 483
>  21 files changed, 913 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/ctfout.c
>  create mode 100644 gcc/ctfout.h
>  create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-preamble-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf.exp
>  create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2-ctf-1.c
>  create mode 100644 include/ctf.h
>
> --
> 1.8.3.1
>
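The -gtLEVEL rules described in this posting (plain -gt means level 2, -gt0 negates -gt) can be modelled as below. The enum, parse_gt_option() and effective_ctf_level() are hypothetical names, not the series' flag-types.h or opts.c code, and the "last option wins" behaviour is an assumption chosen to match the statement that -gt0 negates -gt.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical level names; the series defines its own in flag-types.h.  */
enum ctf_level { CTF_LEVEL_NONE = 0, CTF_LEVEL_TERSE = 1, CTF_LEVEL_NORMAL = 2 };

/* Map one option spelling to a CTF level, or -1 if it is not one of
   -gt, -gt0, -gt1 or -gt2.  */
static int
parse_gt_option (const char *opt)
{
  assert (opt != NULL);
  if (strncmp (opt, "-gt", 3) != 0)
    return -1;
  const char *level = opt + 3;
  if (*level == '\0')
    return CTF_LEVEL_NORMAL;            /* Plain -gt defaults to level 2.  */
  if (level[1] == '\0' && level[0] >= '0' && level[0] <= '2')
    return level[0] - '0';
  return -1;
}

/* Assumed semantics: later options override earlier ones, so
   "-gt -gt0" ends up with no CTF.  */
static int
effective_ctf_level (const char *const *opts, int n)
{
  int level = CTF_LEVEL_NONE;
  for (int i = 0; i < n; ++i)
    {
      int l = parse_gt_option (opts[i]);
      if (l >= 0)
        level = l;
    }
  return level;
}
```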


Re: [PATCH] Fix PR84521

2019-06-18 Thread Andreas Schwab
On Jun 18 2019, "H.J. Lu"  wrote:

>> +void
>> +test (void)
>> +{
>> +  void *buf[5];
>> +  void *volatile q = p;
>> +
>> +  if (!__builtin_setjmp (buf))
>> +broken_longjmp (buf);
>
> Is this test valid?  Can jmp buffer be allowed on stack?

Sure, the contents of the jmp buffer is only valid during the lifetime
of the call frame anyway.
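The same point can be shown with standard C setjmp/longjmp (not the __builtin_setjmp/__builtin_longjmp pair used in the testcase, whose buffer is the five-word array seen above): a jmp_buf that is a local variable is fine as long as the longjmp happens while the frame that called setjmp is still active. The function names are illustrative only.

```c
#include <assert.h>
#include <setjmp.h>

static int calls;

/* Jump back into the caller's frame; valid because that frame is
   still active when this runs.  */
static void
jump_back (jmp_buf *env)
{
  ++calls;
  longjmp (*env, 1);
}

/* ENV is an ordinary local: it lives exactly as long as the frame that
   called setjmp, which is also the only window in which a longjmp to
   it is valid.  */
static int
run (void)
{
  jmp_buf env;
  volatile int stage = 0;   /* volatile: modified between setjmp and longjmp */

  if (setjmp (env) == 0)
    {
      stage = 1;
      jump_back (&env);
      stage = 99;           /* never reached */
    }
  return stage;
}
```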

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [PATCH], PowerPC PR90822 (cleanup lfiwax, lfiwzx generation)

2019-06-18 Thread Michael Meissner
On Tue, Jun 18, 2019 at 06:37:54AM -0500, Segher Boessenkool wrote:
> On Mon, Jun 17, 2019 at 05:24:37PM -0400, Michael Meissner wrote:
> > I wrote the code to generate LFIWAX and LFIWZX originally for the power7 in
> > the 2010 time frame.  At the time, we did not allow SImode to go into
> > floating point and vector registers.  As part of the power9 work, we now
> > allow SImode to go into FP/vector registers for 64-bit code targeting
> > -mcpu=power8 or higher.  But we never went back and tweaked the
> > LFIWAX/LFIWZX support.
> 
> Why do we allow it only in 64-bit mode?  I mean, it sounds like only
> handling 64-bit mode causes us to have more code and more complexity
> instead of less.

The main reason is extendsidi2 and zero_extendsidi2.  These are not enabled on
32-bit (due to the EXTSI mode iterator), so the common code is used to do
sign/zero extension.  And I felt that if you allowed it, the compiler would
move the extensions to the fp/vector unit.  Note that direct moves of 64-bit
items to/from the GPRs are somewhat messy.

> > I was writing code for a possible future PowerPC machine, and the new code
> > added an attribute that caused some of the -mno-vsx tests to fail.  This was
> > because the floatsi2_lfiwax and floatunssi2_lfiwzx patterns did not
> > have a non-VSX alternative, and the attribute processing needed to process
> > the alternatives before the first split pass.
> 
> I don't understand what you mean...  "attribute processing"?

In my current "prefixed" attribute support (that says whether a prefixed
instruction is used and changes the length), I eliminated the "maybe_prefix"
attribute, since you had complained about it.  This means that when the
"prefixed" attribute is checked, it has to figure out which alternative it
is, to see what the "type" attribute is.  The floatsi2_lfiwax and
floatunssi2_lfiwzx patterns do not have a non-VSX alternative.

In the past before the "isa" attribute enabled removing alternatives based on
the machine, this was harmless, because the split that those instructions do
(before register allocation) would create appropriate code.

Now that "isa" removes the alternatives, there are no alternatives that will
process the insn, so the compiler dies with insn not found.

Obviously, I can make a simpler patch just to fix that problem.

> > In general, the 32-bit code seems to generate far fewer instructions,
> > including fewer lfiwax/lfiwzx instructions.  On power8/power9 32-bit code,
> > there were more mtvsrwz/mtvsrwa instructions.
> 
> Interesting.  Is that caused by less register pressure?
> 
> > --- gcc/config/rs6000/rs6000.md (revision 272166)
> > +++ gcc/config/rs6000/rs6000.md (working copy)
> 
> This patch is very hard to read.  It mixes insertions and deletions of
> different definitions, where the only thing they have in common is some
> braces or parens or whitespace usually.

I was trying to move things so that related things were together (i.e. the
basic lfiwax and lfiwzx patterns and the two define_insn_and_splits that
generate it).  I tend to think that when you look at the code and not the
patches, that it makes more sense.

> 
> Maybe more context (-U) helps, maybe whole-function mode is better (-W),
> maybe something else.  It also sometimes helps to do things as a patch
> series instead of as one patch.  Please experiment.
> 
> > +; On 32-bit systems, we need to have special versions of LFIWAX and LFIWZX because
> > +; the sign/zero extend insns are not defined.
> 
> I don't understand what this means.

See above about EXTSI.  For reference here is the code from rs6000.md.

; Everything we can extend SImode to.
(define_mode_iterator EXTSI [(DI "TARGET_POWERPC64")])

; ...

(define_insn "zero_extendsi2"
  [(set (match_operand:EXTSI 0 "gpc_reg_operand" "=r,r,d,wa,wa,r,wa")
    (zero_extend:EXTSI (match_operand:SI 1 "reg_or_mem_operand" "m,r,Z,Z,r,wa,wa")))]
  ""
  "@
   lwz%U1%X1 %0,%1
   rldicl %0,%1,0,32
   lfiwzx %0,%y1
   lxsiwzx %x0,%y1
   mtvsrwz %x0,%1
   mfvsrwz %0,%x1
   xxextractuw %x0,%x1,4"
  [(set_attr "type" "load,shift,fpload,fpload,mffgpr,mftgpr,vecexts")
   (set_attr "isa" "*,*,p7,p8v,p8v,p8v,p9v")])

; and similar for extendsi2.
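For readers less familiar with these instructions: LFIWAX (Load Floating-Point as Integer Word Algebraic Indexed) loads a 32-bit word into a floating-point register sign-extended to 64 bits, and LFIWZX does the same with zero extension, typically so that a 64-bit integer-to-floating-point conversion can follow. The C sketch below only models the values that the signed (floatsi) and unsigned (floatunssi) paths compute; it does not model the instruction sequences the patterns emit, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 32-bit word to 64 bits without relying on the
   implementation-defined conversion of large unsigned values to int32_t.  */
static int64_t
sign_extend_word (uint32_t word)
{
  return (word & 0x80000000u) ? (int64_t) word - INT64_C (0x100000000)
                              : (int64_t) word;
}

/* Value produced by the signed path (sign-extending load, then convert).  */
static double
float_from_si (uint32_t word)
{
  return (double) sign_extend_word (word);
}

/* Value produced by the unsigned path (zero-extending load, then convert).  */
static double
float_from_unsigned_si (uint32_t word)
{
  uint64_t wide = word;   /* zero extension */
  return (double) wide;
}
```

The same bit pattern 0xFFFFFFFF becomes -1.0 on the signed path and 4294967295.0 on the unsigned one, which is why the two patterns cannot share a load.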

> 
> [ Deleted all "-" lines below, to make some sense of it. ]
> 
> 
> > +(define_insn_and_split "lfiwax"
> 
> This could use a better name?  Why is it separate from extendsidi2 anyway?

I was using the name that is currently in the code, i.e. the instruction.

> > +  [(set (match_operand:DI 0 "gpc_reg_operand" "=d,wa,wa,v,v")
> > +   (unspec:DI [(match_operand:SI 1 "reg_or_indexed_operand" "Z,Z,r,v,v")]
> >UNSPEC_LFIWAX))]
> >"TARGET_HARD_FLOAT && TARGET_LFIWAX"
> >"@
> > lfiwax %0,%y1
> > lxsiwax %x0,%y1
> > mtvsrwa %x0,%1
> > +   vextsw2d %0,%1
> > +   #"
> > +  "&& reload_completed && TARGET_P8_VECTOR && !TARGET_P9_VECTOR
> > +   && altivec_register_operand (operands[1], SImode)"
> 
> "&& reload_completed && 

Re: [C++ PATCH] PR c++/90875 - added -Wswitch-outside-range option.

2019-06-18 Thread Marek Polacek
On Tue, Jun 18, 2019 at 01:17:10PM -0400, Matthew Beliveau wrote:
> Hello,
> 
> This patch should change the formatting, and move the test files into
> the appropriate directory!

It doesn't address my other comments, though, so please send a new version
with that fixed.

Marek


Re: [C++ PATCH] PR c++/90875 - added -Wswitch-outside-range option.

2019-06-18 Thread Matthew Beliveau
Hello,

This patch should change the formatting, and move the test files into
the appropriate directory!

Thank you

On Mon, Jun 17, 2019 at 1:45 PM Marek Polacek  wrote:
>
> Thanks for the patch.
>
> On Mon, Jun 17, 2019 at 01:26:37PM -0400, Matthew Beliveau wrote:
> > --- gcc/c-family/c.opt
> > +++ gcc/c-family/c.opt
> > @@ -819,6 +819,10 @@ Wswitch-bool
> >  C ObjC C++ ObjC++ Var(warn_switch_bool) Warning Init(1)
> >  Warn about switches with boolean controlling expression.
> >
> > +Wswitch-outside-range
> > +C++ ObjC++ Var(warn_switch_outside_range) Warning Init(1)
>
> This warning is not C++-specific; it also applies to C and Obj-C.
>
> > +Warn about switch values that are outside of their type's range.
>
> This is slightly imprecise -- the values are outside of the type
> of the controlling expression of the switch.  So I'd say
> "that are outside of the switch's type range" or so.
>
> > diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi
> > index bf9da0f0a6e..c23496b2668 100644
> > --- gcc/doc/invoke.texi
> > +++ gcc/doc/invoke.texi
> > @@ -5390,6 +5390,12 @@ switch ((int) (a == 4))
> >  @end smallexample
> >  This warning is enabled by default for C and C++ programs.
> >
> > +@item -Wswitch-outside-range
> > +@opindex Wswitch-outside-range
> > +@opindex Wno-switch-outside-range
> > +Warn whenever a @code{switch} state has a value that is outside of it's
>
> "its"
>
> > +respective type range.
> > +
>
> This should also say
> "This warning is enabled by default for C and C++ programs."
>
> > diff --git gcc/testsuite/g++.dg/warn/Wswitch-outside-range-1.C gcc/testsuite/g++.dg/warn/Wswitch-outside-range-1.C
> > new file mode 100644
> > index 000..29e56f3ba2d
> > --- /dev/null
> > +++ gcc/testsuite/g++.dg/warn/Wswitch-outside-range-1.C
> > @@ -0,0 +1,8 @@
> > +// PR c++/90875
> > +
> > +void f(char c)
> > +{
> > +  switch (c)
> > +case 300: // { dg-warning "case label value exceeds maximum value for type" }
> > +case -300:; // { dg-warning "case label value is less than minimum value for type" }
> > +}
> > diff --git gcc/testsuite/g++.dg/warn/Wswitch-outside-range-2.C gcc/testsuite/g++.dg/warn/Wswitch-outside-range-2.C
> > new file mode 100644
> > index 000..20cc019b209
> > --- /dev/null
> > +++ gcc/testsuite/g++.dg/warn/Wswitch-outside-range-2.C
> > @@ -0,0 +1,9 @@
> > +// PR c++/90875
> > +// { dg-options -Wno-switch-outside-range }
> > +
> > +void f(char c)
> > +{
> > +  switch (c)
> > +case 300: //{ dg-bogus "case label value is less than minimum value for type" }
> > +case -300:; // { dg-bogus "case label value is less than minimum value for type" }
> > +}
> > diff --git gcc/testsuite/g++.dg/warn/Wswitch-outside-range-3.C gcc/testsuite/g++.dg/warn/Wswitch-outside-range-3.C
> > new file mode 100644
> > index 000..baf15561af0
> > --- /dev/null
> > +++ gcc/testsuite/g++.dg/warn/Wswitch-outside-range-3.C
> > @@ -0,0 +1,9 @@
> > +// PR c++/90875
> > +// { dg-options -Wno-pedantic }
> > +
> > +void f(char c)
> > +{
> > +  switch (c)
> > +
> > +case -300 ... 300:; // { dg-warning "lower value in case label range less than minimum value for type|upper value in case label range exceeds maximum value for type" }
> > +}
> > diff --git gcc/testsuite/g++.dg/warn/Wswitch-outside-range-4.C gcc/testsuite/g++.dg/warn/Wswitch-outside-range-4.C
> > new file mode 100644
> > index 000..d9bd756dc50
> > --- /dev/null
> > +++ gcc/testsuite/g++.dg/warn/Wswitch-outside-range-4.C
> > @@ -0,0 +1,9 @@
> > +// PR c++/90875
> > +// { dg-options "-Wno-pedantic -Wno-switch-outside-range" }
>
> (You can also use __extension__ so that you don't need -Wno-pedantic.)
>
> > +void f(char c)
> > +{
> > +  switch (c)
> > +
> > +case -300 ... 300:; // { dg-bogus "lower value in case label range less than minimum value for type|upper value in case label range exceeds maximum value for type" }
> > +}
>
> The tests belong to c-c++-common/ because we want to test both the C and
> C++ compilers.
>
> Please also fix the formatting as Jakub suggested.
>
> Thanks,
> --
> Marek Polacek • Red Hat, Inc. • 300 A St, Boston, MA
Bootstrapped/regtested on x86_64-linux, ok for trunk?

2019-06-18  Matthew Beliveau  

	PR c++/90875 - added -Wswitch-outside-range option
	* doc/invoke.texi (Wswitch-outside-range): Document.

	* c-warn.c (c_do_switch_warnings): Implemented new Wswitch-outside-range
	warning option.

	* c.opt (Wswitch-outside-range): Added new option.
	
	* c-c++-common/Wswitch-outside-range-1.C: New test.
	* c-c++-common/Wswitch-outside-range-2.C: New test.
	* c-c++-common/Wswitch-outside-range-3.C: New test.
	* c-c++-common/Wswitch-outside-range-4.C: New test.
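The comparison behind the diagnostics that the new option controls can be sketched as follows, based on the diagnostic texts used in the tests. classify_case_value(), classify_char_case() and their enum are hypothetical; the patch wires the option into c_do_switch_warnings, which works on GCC's tree constants rather than plain integers.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical classification of a case label against the range of
   the switch's controlling type.  */
enum case_range_status { CASE_IN_RANGE, CASE_BELOW_MIN, CASE_ABOVE_MAX };

static enum case_range_status
classify_case_value (long long value, long long type_min, long long type_max)
{
  assert (type_min <= type_max);
  if (value < type_min)
    return CASE_BELOW_MIN;   /* "case label value is less than minimum value for type" */
  if (value > type_max)
    return CASE_ABOVE_MAX;   /* "case label value exceeds maximum value for type" */
  return CASE_IN_RANGE;
}

/* The tests switch on a plain char, whose range is CHAR_MIN..CHAR_MAX
   (signed or unsigned depending on the target).  */
static enum case_range_status
classify_char_case (long long value)
{
  return classify_case_value (value, CHAR_MIN, CHAR_MAX);
}
```

Because 300 and -300 fall outside the char range whether char is signed or unsigned, the tests' expectations do not depend on the target's char signedness.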

diff --git gcc/c-family/c-warn.c gcc/c-family/c-warn.c
index 5941c10cddb..743099c75ca 100644
--- gcc/c-family/c-warn.c
+++ gcc/c-family/c-warn.c
@@ -1460,8 +1460,9 @@ c_do_switch_warnings (splay_tree cases, location_t switch_location,
    min_value) >= 0)
 	{

Re: [PATCH] Fix PR84521

2019-06-18 Thread Wilco Dijkstra
Hi Max,
 
> The testcase from the patch passes with the trunk xtensa-linux-gcc
> with windowed ABI. But with the changes in this patch a lot of tests
> that use longjmp are failing on xtensa-linux.

Interesting. I looked at the _xtensa_nonlocal_goto implementation in
libgcc/config/xtensa/lib2funcs.S, and it should work fine given it already
checks for the frame pointer to be within the bounds of a frame.
Whether we pass the virtual frame pointer or the hard frame pointer value
shouldn't matter as long as both are >= SP and < prev_SP. Maybe there
is an off by one issue in:

.Lfirstframe:
    l32i a7, a6, 4   /* a7 = next */
    bgeu a2, a7, .Lnextframe
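The bounds test described above (a frame pointer belongs to a frame when it is >= that frame's SP and < the previous, i.e. outer, SP) can be written as a half-open range search. This is a hypothetical C model, not the lib2funcs.S code: it assumes the stack grows down, that the frames' stack pointers are already collected innermost first, and that the register compared against a7 in the bgeu holds the frame pointer being searched for. Under the half-open convention a frame pointer exactly equal to the next SP is attributed to the outer frame, which is where an off-by-one would show up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SPS holds the stack pointers of N frames, innermost first, so the
   addresses increase.  Return the index of the frame whose half-open
   range [sps[i], sps[i + 1]) contains FP, or -1 if there is none.  */
static int
find_frame (const uintptr_t *sps, size_t n, uintptr_t fp)
{
  if (n == 0 || fp < sps[0])
    return -1;                     /* Below the innermost frame.  */
  for (size_t i = 0; i + 1 < n; ++i)
    if (fp < sps[i + 1])           /* Inverse of "fp >= next: try the next frame".  */
      return (int) i;
  return -1;                       /* At or above the outermost SP.  */
}
```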

> call0 and windowed ABI xtensa code are not supposed to work together.
 
OK so that's not an issue then.

Wilco
 

Re: [PATCH] Fix PR84521

2019-06-18 Thread H.J. Lu
On Tue, May 28, 2019 at 10:37 AM Wilco Dijkstra  wrote:
>
>
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr84521.c b/gcc/testsuite/gcc.c-torture/execute/pr84521.c
> new file mode 100644
> index 
> ..995a30223afd1401186c7eaf541f27606aed59b5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr84521.c
> @@ -0,0 +1,54 @@
> +/* { dg-require-effective-target indirect_jumps } */
> +/* { dg-additional-options "-fomit-frame-pointer -fno-inline" }  */
> +
> +extern void abort (void);
> +
> +void
> +broken_longjmp (void *p)
> +{
> +  __builtin_longjmp (p, 1);
> +}
> +
> +volatile int x = 256;
> +void *volatile p = (void*)&x;
> +void *volatile p1;
> +
> +void
> +test (void)
> +{
> +  void *buf[5];
> +  void *volatile q = p;
> +
> +  if (!__builtin_setjmp (buf))
> +broken_longjmp (buf);

Is this test valid?  Can jmp buffer be allowed on stack?

> +  /* Fails if stack pointer corrupted.  */
> +  if (p != q)
> +abort ();
> +}
> +
> +void
> +test2 (void)
> +{
> +  void *volatile q = p;
> +  p1 = __builtin_alloca (x);
> +  test ();
> +
> +  /* Fails if frame pointer corrupted.  */
> +  if (p != q)
> +abort ();
> +}
> +
> +int
> +main (void)
> +{
> +  void *volatile q = p;
> +  test ();
> +  test2 ();
> +  /* Fails if stack pointer corrupted.  */
> +  if (p != q)
> +abort ();
> +
> +  return 0;
> +}
> +



-- 
H.J.


[PATCH, i386]: Use parametrized pattern names some more

2019-06-18 Thread Uros Bizjak
2019-06-18  Uroš Bizjak  

* config/i386/i386.md (@cmp_1): Rename from cmp_1.
(@add3_carry): Rename from add3_carry.
(@sub3_carry_ccc): Rename from sub3_carry_ccc.
(@sub3_carry_ccgz): Rename from sub3_carry_ccgz.
(@copysign3_const): Rename from copysign3_const.
(@copysign3_var): Rename from copysign3_var.
(@xorsign3_1): Rename from xorsign3_1.
(@x86_shift_adj_1): Rename from x86_shift_adj_1.
(@x86_shift_adj_2): Rename from x86_shift_adj_2.
(@x86_shift_adj_3): Rename from x86_shift_adj_3.
(cmpstrnsi): Use gen_cmp_1.
(lwp_slwpcb): Use gen_lwp_slwpcb_1.
(@lwp_slwpcb_1): Rename from lwp_slwpcb_1.
(@umonitor_): Rename from umonitor_.
* config/i386/i386-expand.c (ix86_expand_copysign):
Use gen_copysign3_const and gen_copysign3_var.
(ix86_expand_xorsign): Use gen_xorsign3_1.
(ix86_expand_branch): Use gen_sub3_carry_ccc,
gen_sub3_carry_ccgz and gen_cmp_1.
(ix86_expand_int_addcc): Use gen_sub3_carry and gen_add3_carry.
(ix86_split_ashl): Use gen_x86_shift_adj_1 and gen_x86_shift_adj_2.
(ix86_split_ashr): Use gen_x86_shift_adj_1 and gen_x86_shift_adj_3.
(ix86_split_lshr): Ditto.
(ix86_expand_builtin) : Use gen_umonitor.

No functional changes.
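The mechanical change in this patch, replacing per-mode function-pointer selection with a single generator that takes the mode as its first argument, can be modelled in plain C. This is a toy: the enums below stand in for GCC's machine_mode and insn_code, and code_for_copysign3_const() only imitates the kind of mode-keyed dispatch that an '@'-prefixed pattern name asks the generator programs to emit.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins; not GCC's real machine_mode or insn_code.  */
typedef enum { SFmode, DFmode, TFmode } machine_mode;
typedef enum
{
  CODE_FOR_copysignsf3_const,
  CODE_FOR_copysigndf3_const,
  CODE_FOR_copysigntf3_const
} insn_code;

/* Old style: every caller chose a per-mode generator itself, e.g.
     if (mode == SFmode) copysign_insn = gen_copysignsf3_const; ...
   New style: one entry point keyed on the mode.  */
static insn_code
code_for_copysign3_const (machine_mode mode)
{
  switch (mode)
    {
    case SFmode:
      return CODE_FOR_copysignsf3_const;
    case DFmode:
      return CODE_FOR_copysigndf3_const;
    case TFmode:
      return CODE_FOR_copysigntf3_const;
    }
  abort ();   /* Plays the role of gcc_unreachable () for other modes.  */
}
```

The mode switch moves out of each expander into one generated function, which is what lets the if/else chains in ix86_expand_copysign and ix86_expand_xorsign collapse into a single call.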

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 770106162fb0..4acd7621cf2b 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -1850,7 +1850,7 @@ void
 ix86_expand_copysign (rtx operands[])
 {
   machine_mode mode, vmode;
-  rtx dest, op0, op1, mask, nmask;
+  rtx dest, op0, op1, mask;
 
   dest = operands[0];
   op0 = operands[1];
@@ -1862,13 +1862,15 @@ ix86_expand_copysign (rtx operands[])
 vmode = V4SFmode;
   else if (mode == DFmode)
 vmode = V2DFmode;
-  else
+  else if (mode == TFmode)
 vmode = mode;
+  else
+gcc_unreachable ();
+
+  mask = ix86_build_signbit_mask (vmode, 0, 0);
 
   if (CONST_DOUBLE_P (op0))
 {
-  rtx (*copysign_insn)(rtx, rtx, rtx, rtx);
-
   if (real_isneg (CONST_DOUBLE_REAL_VALUE (op0)))
op0 = simplify_unary_operation (ABS, mode, op0, mode);
 
@@ -1886,32 +1888,14 @@ ix86_expand_copysign (rtx operands[])
   else if (op0 != CONST0_RTX (mode))
op0 = force_reg (mode, op0);
 
-  mask = ix86_build_signbit_mask (vmode, 0, 0);
-
-  if (mode == SFmode)
-   copysign_insn = gen_copysignsf3_const;
-  else if (mode == DFmode)
-   copysign_insn = gen_copysigndf3_const;
-  else
-   copysign_insn = gen_copysigntf3_const;
-
-  emit_insn (copysign_insn (dest, op0, op1, mask));
+  emit_insn (gen_copysign3_const (mode, dest, op0, op1, mask));
 }
   else
 {
-  rtx (*copysign_insn)(rtx, rtx, rtx, rtx, rtx, rtx);
-
-  nmask = ix86_build_signbit_mask (vmode, 0, 1);
-  mask = ix86_build_signbit_mask (vmode, 0, 0);
-
-  if (mode == SFmode)
-   copysign_insn = gen_copysignsf3_var;
-  else if (mode == DFmode)
-   copysign_insn = gen_copysigndf3_var;
-  else
-   copysign_insn = gen_copysigntf3_var;
+  rtx nmask = ix86_build_signbit_mask (vmode, 0, 1);
 
-  emit_insn (copysign_insn (dest, NULL_RTX, op0, op1, nmask, mask));
+  emit_insn (gen_copysign3_var
+(mode, dest, NULL_RTX, op0, op1, nmask, mask));
 }
 }
 
@@ -2020,7 +2004,6 @@ ix86_split_copysign_var (rtx operands[])
 void
 ix86_expand_xorsign (rtx operands[])
 {
-  rtx (*xorsign_insn)(rtx, rtx, rtx, rtx);
   machine_mode mode, vmode;
   rtx dest, op0, op1, mask;
 
@@ -2031,21 +2014,15 @@ ix86_expand_xorsign (rtx operands[])
   mode = GET_MODE (dest);
 
   if (mode == SFmode)
-{
-  xorsign_insn = gen_xorsignsf3_1;
-  vmode = V4SFmode;
-}
+vmode = V4SFmode;
   else if (mode == DFmode)
-{
-  xorsign_insn = gen_xorsigndf3_1;
-  vmode = V2DFmode;
-}
+vmode = V2DFmode;
   else
 gcc_unreachable ();
 
   mask = ix86_build_signbit_mask (vmode, 0, 0);
 
-  emit_insn (xorsign_insn (dest, op0, op1, mask));
+  emit_insn (gen_xorsign3_1 (mode, dest, op0, op1, mask));
 }
 
 /* Deconstruct an xorsign operation into bit masks.  */
@@ -2224,22 +2201,9 @@ ix86_expand_branch (enum rtx_code code, rtx op0, rtx op1, rtx label)
 
  case LT: case LTU: case GE: case GEU:
{
- rtx (*cmp_insn) (rtx, rtx);
- rtx (*sbb_insn) (rtx, rtx, rtx);
  bool uns = (code == LTU || code == GEU);
-
- if (TARGET_64BIT)
-   {
- cmp_insn = gen_cmpdi_1;
- sbb_insn
-   = uns ? gen_subdi3_carry_ccc : gen_subdi3_carry_ccgz;
-   }
- else
-   {
- cmp_insn = gen_cmpsi_1;
- sbb_insn
-   = uns ? gen_subsi3_carry_ccc : gen_subsi3_carry_ccgz;
-   }
+ rtx (*sbb_insn) (machine_mode, rtx, rtx, 

PING 2: [PATCH] implement -Wformat-diag, v2

2019-06-18 Thread Martin Sebor

Ping: https://gcc.gnu.org/ml/gcc-patches/2019-05/msg01513.html

On 6/11/19 7:31 PM, Martin Sebor wrote:

Ping: https://gcc.gnu.org/ml/gcc-patches/2019-05/msg01513.html

On 5/22/19 10:42 AM, Martin Sebor wrote:

Attached is a revised implementation of the -Wformat-diag checker
incorporating the feedback I got on the first revision.

Martin






Re: [PATCH] Fix PR84521

2019-06-18 Thread Max Filippov
Hello,

On Mon, Jun 17, 2019 at 6:10 PM Jeff Law  wrote:
> On 6/17/19 6:58 PM, Wilco Dijkstra wrote:
> >> You mention that arm, mips and xtensa are still broken.  Are they worse
> >> after this patch?

The testcase from the patch passes with the trunk xtensa-linux-gcc
with windowed ABI. But with the changes in this patch a lot of tests
that use longjmp are failing on xtensa-linux.

> >> I think xtensa has two abis and the frame pointer is different across
> >> them.  Presumably a longjmp from one abi to the other isn't valid.

call0 and windowed ABI xtensa code are not supposed to work together.

-- 
Thanks.
-- Max


Re: RFA: Synchronize top level files with binutils

2019-06-18 Thread Nick Clifton
Hi Richard,

>>   OK, here is a resubmission of my patch with just the addition of the
>>   libctf patches this time.

  [Sorry - this one got put on a back burner].

> Would it be feasible to backport this to the other maintained branches
> so that the option of using them with current binutils would be available?

Do you have any particular branches in mind?  There do seem to be quite a lot of them...

Cheers
  Nick




Re: Use ODR for canonical types construction in LTO

2019-06-18 Thread Jan Hubicka
> 
> ICK.  Can you please solve the C++ issue differently?  The patch
> also seems to do many other things ...

If you are referring to the fact that C++ seems to create non-ODR types that
are structurally equivalent to ODR types, I tracked it down.  The testcase is
testsuite/g++.dg/lto/20080904_0.C
compiled with -O0 -flto.  Then we get a structural match of:

struct Base

struct 
{
  int (*__vtbl_ptr_type) () * _vptr.Base;
  char * _buf;
  unsigned int _len;
}

More precisely:

  constant 192>
unit-size  constant 24>
align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
0x76963a80
fields 
unsigned DI
size 
unit-size 
align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
0x76963540>
unsigned virtual DI a.C:5:7 size  
unit-size 
align:64 warn_if_not_align:0 offset_align 128
offset 
bit-offset  context 
chain 
used private unsigned nonlocal decl_3 DI a.C:15:10 size 
 unit-size 
align:64 warn_if_not_align:0 offset_align 128 offset  bit-offset  context 
 chain >> 
context 
pointer_to_this  reference_to_this 
>

  constant 160>
unit-size  constant 20>
align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
0x769783f0
fields 
unsigned DI
size 
unit-size 
align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
0x76963540>
unsigned virtual DI a.C:5:7 size  
unit-size 
align:64 warn_if_not_align:0 offset_align 128
offset 
bit-offset  context 
chain 
private unsigned nonlocal decl_3 DI a.C:15:10 size  unit-size 
align:64 warn_if_not_align:0 offset_align 128 offset  bit-offset  context 
 chain >> context 
 reference_to_this >

which differ in nothing except TYPE_NAME and TYPE_SIZE.
They are created here:

  if (CLASSTYPE_NON_LAYOUT_POD_P (t) || CLASSTYPE_EMPTY_P (t))
{
  /* T needs a different layout as a base (eliding virtual bases
 or whatever).  Create that version.  */
  tree base_t = make_node (TREE_CODE (t));

which makes sense, I suppose, but it confuses the ODR logic and also takes an
odd path through the canonical type calculations.  In non-LTO builds the alias
sets are the same because the C++ FE does:

cxx_get_alias_set (tree t)
{
  if (IS_FAKE_BASE_TYPE (t))
    /* The base variant of a type must be in the same alias set as the
       complete type.  */
    return get_alias_set (TYPE_CONTEXT (t));

I am not quite sure why it is not simply copying the TYPE_CANONICAL
when the type is created instead.
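For reference, the TYPE_SIZE difference in the dump (192 vs 160 bits, i.e. 24 vs 20 bytes) looks like ordinary tail padding: the complete type is padded out to its 8-byte alignment, while the as-base variant only needs the bytes that hold data, so that a derived class may reuse the padding.  A plain-struct sketch, assuming an LP64 target such as x86_64 and modeling the vtable pointer as an explicit member (the struct name is hypothetical):

```cpp
#include <cassert>
#include <cstddef>

// Plain-struct stand-in for struct Base from g++.dg/lto/20080904_0.C.
// The implicit vtable pointer is modeled as an explicit first member.
struct base_layout
{
  void *vptr;         // stands in for _vptr.Base
  char *buf;          // _buf
  unsigned int len;   // _len
};

// Size of a complete object, including the tail padding that keeps
// arrays of base_layout 8-byte aligned.
constexpr std::size_t complete_size = sizeof (base_layout);

// Bytes actually covered by data members: all a base-class subobject
// needs when a derived class may place its own members in the padding.
constexpr std::size_t data_size
  = offsetof (base_layout, len) + sizeof (unsigned int);

static_assert (data_size <= complete_size, "data never exceeds full size");
```

On x86_64 this gives 24 and 20 bytes, matching the two unit sizes in the dump.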

In LTO I think they may end up with different canonical types, and later
different alias sets, because they may have different virtual bases.

I am not sure why this does not seem to break TBAA, but I will try to
get a testcase where both types are structurally different (and do not
differ only by size).  Perhaps we want to expose IS_FAKE_BASE_TYPE to the
middle end to get things working more reasonably.

Honza


Re: [C++ PATCH] Avoid constexpr garbage for implicit conversion to void.

2019-06-18 Thread Jason Merrill
On Tue, Jun 11, 2019 at 12:28 AM Jason Merrill  wrote:
>
> On Fri, Jun 7, 2019 at 5:08 PM Jason Merrill  wrote:
> >
> > All expression statements and some other places express implicit conversion
> > to void with a CONVERT_EXPR.  There's no reason to build up a new one as part
> > of constexpr evaluation.
> >
> > The ADDR_EXPR change also avoids a bit of garbage by discarding an ADDR_EXPR
> > we just built but didn't end up using.
> >
> > The location wrapper change doesn't affect garbage, it's just a minor
> > optimization.
>
> More constexpr garbage reduction:

And more:

1) Avoid building anything when converting to reference and back to pointer.
2) Avoid building up new COMPONENT_REFs for store evaluation.
3) Share array index checking between ARRAY_REF and store evaluation.
4) Track where we're calling unshare_constructor from.
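Item 1 boils down to: when the operand of a conversion is itself a chain of no-op conversions whose innermost operand already has the target type, reuse that operand instead of building a new node.  A toy model of that check, assuming a hypothetical two-field node rather than GCC's tree, and assuming every conversion in the toy is value-preserving (GCC's tree_strip_nop_conversions only strips such conversions):

```cpp
#include <cassert>

// Toy expression node (hypothetical, not GCC's tree): a leaf when OPERAND
// is null, otherwise a value-preserving conversion of OPERAND to TYPE.
struct node
{
  int type;               // stand-in for a type tree
  const node *operand;    // converted operand, or nullptr for a leaf
};

// Walk down through conversion nodes to the innermost operand.
static const node *
strip_nop_conversions (const node *n)
{
  while (n->operand != nullptr)
    n = n->operand;
  return n;
}

// Fold a conversion of OP to TYPE without allocating when possible.
// Returns nullptr when the caller would have to build a new node.
const node *
fold_conversion (int type, const node *op)
{
  assert (op != nullptr);
  if (op->type == type)
    return op;                      // already has the requested type
  const node *sop = strip_nop_conversions (op);
  if (sop != op && sop->type == type)
    return sop;                     // converted away and back: reuse it
  return nullptr;                   // a new conversion node is needed
}
```

Converting a pointer to a reference and back then returns the original pointer node instead of allocating a third one.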

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 325be200c3c9a2fefed41f2adf874ccd3ca5b4cf
Author: Jason Merrill 
Date:   Sat Jun 15 07:45:01 2019 -0400

Handle constexpr conversion from and then to the same type.

* constexpr.c (cxx_eval_constant_expression): Handle conversion from
and then to the same type.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 22901f811f1..0f68a0c9fca 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -5034,6 +5034,10 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t,
if (*non_constant_p)
  return t;
tree type = TREE_TYPE (t);
+
+   if (VOID_TYPE_P (type))
+ return void_node;
+
if (TREE_CODE (op) == PTRMEM_CST
&& !TYPE_PTRMEM_P (type))
  op = cplus_expand_constant (op);
@@ -5094,14 +5098,18 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t,
 conversion.  */
  return fold (t);
 
+   tree sop;
+
/* Handle an array's bounds having been deduced after we built
   the wrapping expression.  */
if (same_type_ignoring_tlq_and_bounds_p (type, TREE_TYPE (op)))
  r = op;
+   else if (sop = tree_strip_nop_conversions (op),
+sop != op && (same_type_ignoring_tlq_and_bounds_p
+  (type, TREE_TYPE (sop))))
+ r = sop;
else if (tcode == UNARY_PLUS_EXPR)
  r = fold_convert (TREE_TYPE (t), op);
-   else if (VOID_TYPE_P (type))
- r = void_node;
else
  r = fold_build1 (tcode, type, op);
 
commit 30360c70d9d5bc5b680b6e274fa6aca6f2ee8137
Author: Jason Merrill 
Date:   Fri Jun 14 07:45:01 2019 -0400

* constexpr.c (cxx_eval_store_expression): Delay target evaluation.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 7c733d78b5b..22f4fa0d351 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -3747,22 +3747,18 @@ cxx_eval_store_expression (const constexpr_ctx *ctx, tree t,
   if (*non_constant_p)
return t;
 }
-  target = cxx_eval_constant_expression (ctx, target,
-true,
-non_constant_p, overflow_p);
-  if (*non_constant_p)
-return t;
 
-  if (!same_type_ignoring_top_level_qualifiers_p (TREE_TYPE (target), type))
+  bool evaluated = false;
+  if (lval)
 {
-  /* For initialization of an empty base, the original target will be
- *(base*)this, which the above evaluation resolves to the object
-argument, which has the derived type rather than the base type.  In
-this situation, just evaluate the initializer and return, since
-there's no actual data to store.  */
-  gcc_assert (is_empty_class (type));
-  return cxx_eval_constant_expression (ctx, init, false,
-  non_constant_p, overflow_p);
+  /* If we want to return a reference to the target, we need to evaluate it
+as a whole; otherwise, only evaluate the innermost piece to avoid
+building up unnecessary *_REFs.  */
+  target = cxx_eval_constant_expression (ctx, target, true,
+non_constant_p, overflow_p);
+  evaluated = true;
+  if (*non_constant_p)
+   return t;
 }
 
   /* Find the underlying variable.  */
@@ -3792,7 +3788,17 @@ cxx_eval_store_expression (const constexpr_ctx *ctx, tree t,
  break;
 
default:
- object = probe;
+ if (evaluated)
+   object = probe;
+ else
+   {
+ probe = cxx_eval_constant_expression (ctx, probe, true,
+   non_constant_p, overflow_p);
+ evaluated = true;
+ if (*non_constant_p)
+   return t;
+   }
+ break;
}
 }
 
@@ -3948,7 +3954,7 @@ cxx_eval_store_expression (const constexpr_ctx *ctx, tree t,
   new_ctx.object = target;
   init = cxx_eval_constant_expression (_ctx, init, false,
 

PING^2: [PATCH] i386: Generate standard floating point scalar operation patterns

2019-06-18 Thread H.J. Lu
On Mon, Jun 3, 2019 at 3:50 PM H.J. Lu  wrote:
>
> On Tue, May 21, 2019 at 8:54 AM H.J. Lu  wrote:
> >
> > On Wed, May 15, 2019 at 2:29 PM Richard Sandiford
> >  wrote:
> > >
> > > "H.J. Lu"  writes:
> > > > On Thu, Feb 7, 2019 at 9:49 AM H.J. Lu  wrote:
> > > >>
> > > >> Standard scalar operation patterns which preserve the rest of the vector
> > > >> look like
> > > >>
> > > >>  (vec_merge:V2DF
> > > >>(vec_duplicate:V2DF
> > > >>  (op:DF (vec_select:DF (reg/v:V2DF 85 [ x ])
> > > >> (parallel [ (const_int 0 [0])]))
> > > >>  (reg:DF 87))
> > > >>(reg/v:V2DF 85 [ x ])
> > > >>(const_int 1 [0x1])]))
> > > >>
> > > >> Add such patterns to the i386 backend and convert VEC_CONCAT patterns to
> > > >> standard scalar operation patterns.
> > >
> > > It looks like there's some variety in the patterns used, e.g.:
> > >
> > > (define_insn 
> > > "_vm3"
> > >   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
> > > (vec_merge:VF_128
> > >   (smaxmin:VF_128
> > > (match_operand:VF_128 1 "register_operand" "0,v")
> > > (match_operand:VF_128 2 "vector_operand" "xBm,"))
> > >  (match_dup 1)
> > >  (const_int 1)))]
> > >   "TARGET_SSE"
> > >   "@
> > >\t{%2, %0|%0, %2}
> > >
> > > v\t{%2, 
> > > %1, %0|%0, %1, 
> > > %2}"
> > >   [(set_attr "isa" "noavx,avx")
> > >(set_attr "type" "sse")
> > >(set_attr "btver2_sse_attr" "maxmin")
> > >(set_attr "prefix" "")
> > >(set_attr "mode" "")])
> > >
> > > makes the operand a full vector operation, which seems simpler.
> >
> > This pattern is used to implement scalar smaxmin intrinsics.
> >
> > > The above would then be:
> > >
> > >   (vec_merge:V2DF
> > > (op:V2DF
> > >   (reg:V2DF 85)
> > >   (vec_duplicate:V2DF (reg:DF 87)))
> > > (reg/v:V2DF 85 [ x ])
> > > (const_int 1 [0x1])]))
> > >
> > > I guess technically the two have different faulting behaviour though,
> > > since the smaxmin gets applied to all elements, not just element 0.
> >
> > This is the issue.   We don't use the correct mode for scalar instructions:
> >
> > ---
> > #include 
> >
> > __m128d
> > foo1 (__m128d x, double *p)
> > {
> >   __m128d y = _mm_load_sd (p);
> >   return _mm_max_pd (x, y);
> > }
> > ---
> >
> > movq (%rdi), %xmm1
> > maxpd %xmm1, %xmm0
> > ret
> >
> >
> > Here is the updated patch to add standard floating point scalar
> > operation patterns to the i386 backend.  Then we can do
> >
> > ---
> > #include 
> >
> > extern __inline __m128d __attribute__((__gnu_inline__,
> > __always_inline__, __artificial__))
> > _new_mm_max_pd (__m128d __A, __m128d __B)
> > {
> >   __A[0] = __A[0] > __B[0] ? __A[0] : __B[0];
> >   return __A;
> > }
> >
> > __m128d
> > foo2 (__m128d x, double *p)
> > {
> >   __m128d y = _mm_load_sd (p);
> >   return _new_mm_max_pd (x, y);
> > }
> >
> > maxsd (%rdi), %xmm0
> > ret
> >
> > We should use generic vector operations to implement i386 intrinsics
> > as much as we can.
> >
> > > The patch seems very specific.  E.g. why just PLUS, MINUS, MULT and DIV?
> >
> > This patch only adds +, -, *, /, > and <.  We can add more if there
> > are testcases for them.
> >
> > > Thanks,
> > > Richard
> > >
> > >
> > > >>
> > > >> gcc/
> > > >>
> > > >> PR target/54855
> > > >> * simplify-rtx.c (simplify_binary_operation_1): Convert
> > > >> VEC_CONCAT patterns to standard scalar operation
> > > >> patterns.
> > > >> * config/i386/sse.md (*_vm3): New.
> > > >> (*_vm3): Likewise.
> > > >>
> > > >> gcc/testsuite/
> > > >>
> > > >> PR target/54855
> > > >> * gcc.target/i386/pr54855-1.c: New test.
> > > >> * gcc.target/i386/pr54855-2.c: Likewise.
> > > >> * gcc.target/i386/pr54855-3.c: Likewise.
> > > >> * gcc.target/i386/pr54855-4.c: Likewise.
> > > >> * gcc.target/i386/pr54855-5.c: Likewise.
> > > >> * gcc.target/i386/pr54855-6.c: Likewise.
> > > >> * gcc.target/i386/pr54855-7.c: Likewise.
> > > >
> > > > PING:
> > > >
> > > > https://gcc.gnu.org/ml/gcc-patches/2019-02/msg00398.html
> >
> > Thanks.
> >
>
> PING:
>
> https://gcc.gnu.org/ml/gcc-patches/2019-05/msg01416.html
>

PING.

-- 
H.J.
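The lane semantics behind the quoted (vec_merge ... (const_int 1)) pattern, and behind the foo1/foo2 comparison, can be modeled on a plain two-element array.  This is a sketch with hypothetical helper names, not the backend patterns themselves; it assumes _mm_load_sd leaves the upper lane zeroed, which is its documented behaviour:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using v2df = std::array<double, 2>;

// Model of (vec_merge A B (const_int MASK)): lane I comes from A when bit I
// of MASK is set, otherwise from B.
v2df
vec_merge (const v2df &a, const v2df &b, unsigned mask)
{
  v2df r{};
  for (std::size_t i = 0; i < r.size (); ++i)
    r[i] = ((mask >> i) & 1u) ? a[i] : b[i];
  return r;
}

// Scalar max in the style of maxsd and _new_mm_max_pd: compute the
// operation on lane 0 only and merge it back into X with mask 1.
v2df
max_sd (const v2df &x, const v2df &y)
{
  v2df op = x;
  op[0] = x[0] > y[0] ? x[0] : y[0];
  return vec_merge (op, x, 1u);
}

// Packed max in the style of maxpd: every lane is recomputed.
v2df
max_pd (const v2df &x, const v2df &y)
{
  return {x[0] > y[0] ? x[0] : y[0], x[1] > y[1] ? x[1] : y[1]};
}
```

With x = {1.0, -5.0} and y = {3.0, 0.0}, the scalar form keeps lane 1 as -5.0 while the packed form replaces it with 0.0, which is why maxpd is not a drop-in replacement for maxsd here.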


PING^3: [PATCH] i386: Properly encode xmm16-xmm31/ymm16-ymm31 for vector move

2019-06-18 Thread H.J. Lu
On Fri, May 31, 2019 at 10:38 AM H.J. Lu  wrote:
>
> On Tue, May 21, 2019 at 2:43 PM H.J. Lu  wrote:
> >
> > On Fri, Feb 22, 2019 at 8:25 AM H.J. Lu  wrote:
> > >
> > > Hi Jan, Uros,
> > >
> > > This patch fixes the wrong code bug:
> > >
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89229
> > >
> > > Tested on AVX2 and AVX512 with and without --with-arch=native.
> > >
> > > OK for trunk?
> > >
> > > Thanks.
> > >
> > > H.J.
> > > --
> > > i386 backend has
> > >
> > > INT_MODE (OI, 32);
> > > INT_MODE (XI, 64);
> > >
> > > So, XI_MODE represents 64 INTEGER bytes = 64 * 8 = 512 bit operation,
> > > in case of const_1, all 512 bits set.
> > >
> > > We can load zeros with a narrower instruction (e.g. 256 bits by inherent
> > > zeroing of the high part in the case of a 128-bit xor), so TImode in this case.
> > >
> > > Some targets prefer V4SF mode, so they will emit float xorps for zeroing.
> > >
> > > sse.md has
> > >
> > > (define_insn "mov_internal"
> > >   [(set (match_operand:VMOVE 0 "nonimmediate_operand"
> > >  "=v,v ,v ,m")
> > > (match_operand:VMOVE 1 "nonimmediate_or_sse_const_operand"
> > >  " C,BC,vm,v"))]
> > > 
> > >   /* There is no evex-encoded vmov* for sizes smaller than 64-bytes
> > >  in avx512f, so we need to use workarounds, to access sse registers
> > >  16-31, which are evex-only. In avx512vl we don't need workarounds.  */
> > >   if (TARGET_AVX512F &&  < 64 && !TARGET_AVX512VL
> > >   && (EXT_REX_SSE_REG_P (operands[0])
> > >   || EXT_REX_SSE_REG_P (operands[1])))
> > > {
> > >   if (memory_operand (operands[0], mode))
> > > {
> > >   if ( == 32)
> > > return "vextract64x4\t{$0x0, %g1, %0|%0, %g1, 0x0}";
> > >   else if ( == 16)
> > > return "vextract32x4\t{$0x0, %g1, %0|%0, %g1, 0x0}";
> > >   else
> > > gcc_unreachable ();
> > > }
> > > ...
> > >
> > > However, since ix86_hard_regno_mode_ok has
> > >
> > >  /* TODO check for QI/HI scalars.  */
> > >   /* AVX512VL allows sse regs16+ for 128/256 bit modes.  */
> > >   if (TARGET_AVX512VL
> > >   && (mode == OImode
> > >   || mode == TImode
> > >   || VALID_AVX256_REG_MODE (mode)
> > >   || VALID_AVX512VL_128_REG_MODE (mode)))
> > > return true;
> > >
> > >   /* xmm16-xmm31 are only available for AVX-512.  */
> > >   if (EXT_REX_SSE_REGNO_P (regno))
> > > return false;
> > >
> > >   if (TARGET_AVX512F &&  < 64 && !TARGET_AVX512VL
> > >   && (EXT_REX_SSE_REG_P (operands[0])
> > >   || EXT_REX_SSE_REG_P (operands[1])))
> > >
> > > is dead code.
> > >
> > > Also for
> > >
> > > long long *p;
> > > volatile __m256i yy;
> > >
> > > void
> > > foo (void)
> > > {
> > >_mm256_store_epi64 (p, yy);
> > > }
> > >
> > > with AVX512VL, we should generate
> > >
> > > vmovdqa %ymm0, (%rax)
> > >
> > > not
> > >
> > > vmovdqa64   %ymm0, (%rax)
> > >
> > > All TYPE_SSEMOV vector moves are consolidated to ix86_output_ssemov:
> > >
> > > 1. If xmm16-xmm31/ymm16-ymm31 registers aren't used, SSE/AVX vector
> > > moves will be generated.
> > > 2. If xmm16-xmm31/ymm16-ymm31 registers are used:
> > >a. With AVX512VL, AVX512VL vector moves will be generated.
> > >b. Without AVX512VL, xmm16-xmm31/ymm16-ymm31 register to register
> > >   move will be done with zmm register move.
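The selection rules in the list above can be sketched as a small decision function.  This is a hypothetical model of the three cases only, not ix86_output_ssemov: it identifies registers by their architectural xmm index (-1 for a memory operand) and ignores the mode-dependent choice among the actual move mnemonics:

```cpp
#include <cassert>

enum class ssemov_kind { sse_avx, avx512vl, zmm_reg_move };

// True for xmm16-xmm31 / ymm16-ymm31, by architectural index (hypothetical
// helper; GCC itself tests hard register numbers).
constexpr bool
is_ext_rex_sse (int xmm_index)
{
  return xmm_index >= 16 && xmm_index <= 31;
}

// DST and SRC are architectural register indices, or -1 for memory.
ssemov_kind
choose_ssemov (int dst, int src, bool target_avx512vl)
{
  bool uses_ext = is_ext_rex_sse (dst) || is_ext_rex_sse (src);
  if (!uses_ext)
    return ssemov_kind::sse_avx;        // rule 1: plain SSE/AVX move
  if (target_avx512vl)
    return ssemov_kind::avx512vl;       // rule 2a: AVX512VL move
  // Rule 2b: per the description, only register-to-register moves are
  // expected here, and they are done as zmm register moves.
  assert (dst >= 0 && src >= 0);
  return ssemov_kind::zmm_reg_move;
}
```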
> > >
> > > ext_sse_reg_operand is removed since it is no longer needed.
> > >
> > > Tested on AVX2 and AVX512 with and without --with-arch=native.
> > >
> > > gcc/
> > >
> > > PR target/89229
> > > PR target/89346
> > > * config/i386/i386-protos.h (ix86_output_ssemov): New prototype.
> > > * config/i386/i386.c (ix86_get_ssemov): New function.
> > > (ix86_output_ssemov): Likewise.
> > > * config/i386/i386.md (*movxi_internal_avx512f): Call
> > > ix86_output_ssemov for TYPE_SSEMOV.
> > > (*movoi_internal_avx): Call ix86_output_ssemov for TYPE_SSEMOV.
> > > Remove ext_sse_reg_operand and TARGET_AVX512VL check.
> > > (*movti_internal): Likewise.
> > > (*movdi_internal): Call ix86_output_ssemov for TYPE_SSEMOV.
> > > Remove ext_sse_reg_operand check.
> > > (*movsi_internal): Likewise.
> > > (*movtf_internal): Call ix86_output_ssemov for TYPE_SSEMOV.
> > > (*movdf_internal): Call ix86_output_ssemov for TYPE_SSEMOV.
> > > Remove TARGET_AVX512F, TARGET_PREFER_AVX256, TARGET_AVX512VL
> > > and ext_sse_reg_operand check.
> > > (*movsf_internal_avx): Call ix86_output_ssemov for TYPE_SSEMOV.
> > > Remove TARGET_PREFER_AVX256, TARGET_AVX512VL and
> > > ext_sse_reg_operand check.
> > > * config/i386/mmx.md (MMXMODE:*mov_internal): Call
> > > ix86_output_ssemov for 

PING^1 [PATCH] Minimize number of operations between two areas of memory

2019-06-18 Thread H.J. Lu
On Tue, Jun 11, 2019 at 8:14 AM H.J. Lu  wrote:
>
> For op_by_pieces operations between two areas of memory, this patch adds
> -fminimize-op-by-pieces-run to minimize the number of operations.  When
> operating on LENGTH bytes of memory, it starts with the widest usable
> integer size, MAX_SIZE, for LENGTH bytes and finishes with the smallest
> usable integer size, MIN_SIZE, for the remaining bytes where MAX_SIZE
> >= MIN_SIZE.  If MIN_SIZE > the remaining bytes, the last operation is
> performed on MIN_SIZE bytes of overlapping memory from the previous
> operation.
>
> Tested on Linux/x86-64 with both -fminimize-op-by-pieces-run enabled
> and disabled by default.
>
> I will submit a separate patch to enable -fminimize-op-by-pieces-run for -Os.
>
> OK for trunk?
>
> gcc/
>
> PR middle-end/90773
> * common.opt (-fminimize-op-by-pieces-run): New.
> * expr.c (op_by_pieces_d): Add get_usable_mode.
> (op_by_pieces_d::get_usable_mode): New.
> (op_by_pieces_d::run): Use get_usable_mode to get the largest
> usable integer mode for -fminimize-op-by-pieces-run.
> * doc/invoke.texi: Document -fminimize-op-by-pieces-run.
>
> gcc/testsuite/
>
> PR middle-end/90773
> * gcc.target/i386/pr90773-1.c: New test.
> * gcc.target/i386/pr90773-2.c: Likewise.
> * gcc.target/i386/pr90773-3.c: Likewise.
> * gcc.target/i386/pr90773-4.c: Likewise.
> * gcc.target/i386/pr90773-5.c: Likewise.
>
> Thanks.
>

PING:

https://gcc.gnu.org/ml/gcc-patches/2019-06/msg00641.html

-- 
H.J.
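The overlapping-tail idea in the quoted description can be modeled as a planner that emits (offset, size) pieces.  This is a sketch of the general technique, not the expr.c code: the function and struct names are hypothetical, the tail is taken as the smallest power of two covering the remaining bytes (the patch's exact MIN_SIZE choice may differ), and it assumes that rewriting the same bytes twice is harmless, as for a copy between non-overlapping buffers or a store:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct piece { std::size_t offset, size; };

static std::size_t
largest_pow2_le (std::size_t n)
{
  std::size_t p = 1;
  while (p * 2 <= n)
    p *= 2;
  return p;
}

static std::size_t
smallest_pow2_ge (std::size_t n)
{
  std::size_t p = 1;
  while (p < n)
    p *= 2;
  return p;
}

// Plan pieces for LENGTH bytes with power-of-two sizes up to MAX_SIZE.
std::vector<piece>
plan_pieces (std::size_t length, std::size_t max_size)
{
  assert (max_size > 0 && (max_size & (max_size - 1)) == 0);
  std::vector<piece> pieces;
  std::size_t offset = 0;
  while (offset < length)
    {
      std::size_t remaining = length - offset;
      if (remaining >= max_size)
        {
          pieces.push_back ({offset, max_size});   // widest pieces first
          offset += max_size;
          continue;
        }
      std::size_t tail = smallest_pow2_ge (remaining);
      if (tail > remaining && offset >= tail - remaining)
        {
          // One wider piece ending at LENGTH, overlapping bytes that the
          // previous piece already handled.
          pieces.push_back ({length - tail, tail});
          break;
        }
      std::size_t size = largest_pow2_le (remaining);
      pieces.push_back ({offset, size});
      offset += size;
    }
  return pieces;
}
```

For 15 bytes with 8-byte pieces this plans two operations, (0, 8) and (7, 8), instead of the four (8 + 4 + 2 + 1) of a strictly non-overlapping split.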


C++ PATCH for c++/60223 - ICE with T{} in non-deduced context

2019-06-18 Thread Marek Polacek
Here we ICE on

  template // #1
  struct A {};

  template void foo(A) {}

  void bar() { foo(A()); }

when deducing T in the non-type template parameter, because unify didn't
expect a CONSTRUCTOR:

default:
  /* An unresolved overload is a nondeduced context.  */
  if (is_overloaded_fn (parm) || type_unknown_p (parm))
return unify_success (explain_p);
  gcc_assert (EXPR_P (parm) || TREE_CODE (parm) == TRAIT_EXPR);

This works if T{} is replaced with T() in #1 -- then unify gets a CAST_EXPR,
which is EXPR_P.

Since here we're in a non-deduced context, I think we should simply accept
the CONSTRUCTOR and return unify_success.

I've also updated a comment that had become obsolete.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2019-06-18  Marek Polacek  

PR c++/60223 - ICE with T{} in non-deduced context.
* pt.c (unify): Allow COMPOUND_LITERAL_P in a non-deduced context.
Update a comment.

* g++.dg/cpp0x/nondeduced1.C: New test.
* g++.dg/cpp0x/nondeduced2.C: New test.
* g++.dg/cpp0x/nondeduced3.C: New test.
* g++.dg/cpp0x/nondeduced4.C: New test.

diff --git gcc/cp/pt.c gcc/cp/pt.c
index 2a626526c6f..69de55369dd 100644
--- gcc/cp/pt.c
+++ gcc/cp/pt.c
@@ -22786,7 +22786,9 @@ unify (tree tparms, tree targs, tree parm, tree arg, int strict,
   /* An unresolved overload is a nondeduced context.  */
   if (is_overloaded_fn (parm) || type_unknown_p (parm))
return unify_success (explain_p);
-  gcc_assert (EXPR_P (parm) || TREE_CODE (parm) == TRAIT_EXPR);
+  gcc_assert (EXPR_P (parm)
+ || COMPOUND_LITERAL_P (parm)
+ || TREE_CODE (parm) == TRAIT_EXPR);
 expr:
   /* We must be looking at an expression.  This can happen with
 something like:
@@ -22794,15 +22796,19 @@ unify (tree tparms, tree targs, tree parm, tree arg, int strict,
   template 
   void foo(S, S);
 
-This is a "nondeduced context":
+or
+
+  template
+  void foo(A);
+
+This is a "non-deduced context":
 
   [deduct.type]
 
-  The nondeduced contexts are:
+  The non-deduced contexts are:
 
-  --A type that is a template-id in which one or more of
-the template-arguments is an expression that references
-a template-parameter.
+  --A non-type template argument or an array bound in which
+a subexpression references a template parameter.
 
 In these cases, we assume deduction succeeded, but don't
 actually infer any unifications.  */
diff --git gcc/testsuite/g++.dg/cpp0x/nondeduced1.C gcc/testsuite/g++.dg/cpp0x/nondeduced1.C
new file mode 100644
index 000..067079e50df
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp0x/nondeduced1.C
@@ -0,0 +1,16 @@
+// PR c++/60223
+// { dg-do compile { target c++11 } }
+
+template
+struct A { };
+
+template
+void foo(A a);
+
+void bar()
+{
+  foo(A());
+  foo(A());
+  foo<>(A());
+  foo<>(A());
+}
diff --git gcc/testsuite/g++.dg/cpp0x/nondeduced2.C gcc/testsuite/g++.dg/cpp0x/nondeduced2.C
new file mode 100644
index 000..3f96fe4e858
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp0x/nondeduced2.C
@@ -0,0 +1,14 @@
+// PR c++/60223
+// { dg-do compile { target c++11 } }
+
+template
+struct A { };
+
+template
+void foo(A);
+
+void bar()
+{
+  foo(A());
+  foo<>(A());
+}
diff --git gcc/testsuite/g++.dg/cpp0x/nondeduced3.C gcc/testsuite/g++.dg/cpp0x/nondeduced3.C
new file mode 100644
index 000..d943dceea4b
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp0x/nondeduced3.C
@@ -0,0 +1,16 @@
+// PR c++/60223
+// { dg-do compile { target c++11 } }
+
+template
+struct A { };
+
+template
+void foo(A a);
+
+void bar()
+{
+  foo(A());
+  foo(A());
+  foo<>(A());
+  foo<>(A());
+}
diff --git gcc/testsuite/g++.dg/cpp0x/nondeduced4.C gcc/testsuite/g++.dg/cpp0x/nondeduced4.C
new file mode 100644
index 000..818034c857c
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp0x/nondeduced4.C
@@ -0,0 +1,13 @@
+// PR c++/60223
+// { dg-do compile { target c++11 } }
+
+template
+struct A { };
+
+template
+void foo(A, T = T{});
+
+void bar()
+{
+  foo(A());
+}


[AArch64] Use scvtf fbits option where appropriate

2019-06-18 Thread Joel Hutton

On 18/06/2019 11:37, Richard Earnshaw (lists) wrote:
> Start sentences with a capital letter.  End them with a full stop.
> "inequal" isn't a word: you probably mean "unequal".

I've fixed this; the iterator is, however, defined as 'fcvt_iesize'
and described in the adjacent comment in iterators.md as 'inequal'.
I've addressed your other comments.

On 18/06/2019 13:30, Richard Sandiford wrote:
> Wilco Dijkstra  writes:
>>   > +/* If X is a positive CONST_DOUBLE with a value that is the reciprocal of a
>>   > +   power of 2 (i.e 1/2^n) return the number of float bits. e.g. for x==(1/2^n)
>>   > +   return n. Otherwise return -1.  */
>>   > +int
>>   > +aarch64_fpconst_pow2_recip (rtx x)
>>   > +{
>>   > +  REAL_VALUE_TYPE r0;
>>   > +
>>   > +  if (!CONST_DOUBLE_P (x))
>>   > +    return -1;
>>> CONST_DOUBLE can be used for things other than floating point.  You
>>> should really check that the mode on the double in is in class MODE_FLOAT.
>>   Several other functions (eg aarch64_fpconst_pow_of_2) do the same since
>> this function is only called with HF/SF/DF mode. We could add an assert for
>> SCALAR_FLOAT_MODE_P (but then aarch64_fpconst_pow_of_2 should do
>> the same).
> IMO we should leave it as-is.  aarch64.h has:
I've gone with the majority and left it as-is, but I don't have strong 
feelings on it.
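What the new combine patterns rely on: (mult (float x) c) with c == 1/2^n equals x * 2^-n, which is exactly the value SCVTF/UCVTF produce when x is read as a fixed-point number with n fractional bits.  A host-side sketch of the predicate and of that identity follows; it swaps the patch's exact_real_inverse on REAL_VALUE_TYPE for std::frexp on a double, and the function names are hypothetical:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Return n if X == 1/2^n with 1 <= n <= 31 (the range the patch accepts),
// otherwise -1.
int
fpconst_pow2_recip (double x)
{
  if (!(x > 0.0))                       // rejects zero, negatives and NaN
    return -1;
  int exp;
  double mant = std::frexp (x, &exp);   // x == mant * 2^exp, mant in [0.5, 1)
  if (mant != 0.5)                      // only exact powers of two qualify
    return -1;
  int n = 1 - exp;                      // x == 2^(exp - 1) == 1 / 2^n
  return n >= 1 && n <= 31 ? n : -1;
}

// Value of the signed integer RAW read as fixed point with FBITS
// fractional bits, i.e. what SCVTF with an fbits operand computes.
double
scvtf_fixed (std::int32_t raw, int fbits)
{
  assert (fbits >= 1 && fbits <= 32);
  return std::ldexp (static_cast<double> (raw), -fbits);
}
```

For example, 40 * 0.0625 and scvtf_fixed (40, 4) both give 2.5, and fpconst_pow2_recip (0.0625) returns 4, so the multiply can become a single conversion with #4.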
From 1e44ef7e999527a0b03316cf0ea002f8d4437052 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Thu, 13 Jun 2019 11:08:56 +0100
Subject: [PATCH] SCVTF fbits

---
 gcc/config/aarch64/aarch64-protos.h   |   1 +
 gcc/config/aarch64/aarch64.c  |  23 +++
 gcc/config/aarch64/aarch64.md |  39 +
 gcc/config/aarch64/constraints.md |   7 +
 gcc/config/aarch64/predicates.md  |   4 +
 gcc/testsuite/gcc.target/aarch64/fmul_scvtf.c | 140 ++
 6 files changed, 214 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmul_scvtf.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 1e3b1c91db1..ad1ba458a3f 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -494,6 +494,7 @@ enum aarch64_symbol_type aarch64_classify_tls_symbol (rtx);
 enum reg_class aarch64_regno_regclass (unsigned);
 int aarch64_asm_preferred_eh_data_format (int, int);
 int aarch64_fpconst_pow_of_2 (rtx);
+int aarch64_fpconst_pow2_recip (rtx);
 machine_mode aarch64_hard_regno_caller_save_mode (unsigned, unsigned,
 		   machine_mode);
 int aarch64_uxt_size (int, HOST_WIDE_INT);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9a035dd9ed8..028da32174d 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -18707,6 +18707,29 @@ aarch64_fpconst_pow_of_2 (rtx x)
   return exact_log2 (real_to_integer (r));
 }
 
+/* If X is a positive CONST_DOUBLE with a value that is the reciprocal of a
+   power of 2 (i.e 1/2^n) return the number of float bits. e.g. for x==(1/2^n)
+   return n. Otherwise return -1.  */
+
+int
+aarch64_fpconst_pow2_recip (rtx x)
+{
+  REAL_VALUE_TYPE r0;
+
+  if (!CONST_DOUBLE_P (x))
+return -1;
+
+  r0 = *CONST_DOUBLE_REAL_VALUE (x);
+  if (exact_real_inverse (DFmode, )
+  && !REAL_VALUE_NEGATIVE (r0))
+{
+	int ret = exact_log2 (real_to_integer ());
+	if (ret >= 1 && ret <= 31)
+	return ret;
+}
+  return -1;
+}
+
 /* If X is a vector of equal CONST_DOUBLE values and that value is
Y, return the aarch64_fpconst_pow_of_2 of Y.  Otherwise return -1.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 526c7fb0dab..c7c6a18b0ff 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -6016,6 +6016,44 @@
   [(set_attr "type" "f_cvtf2i")]
 )
 
+;; Equal width integer to fp combine.
+(define_insn "*aarch64_cvtf__2_mult"
+  [(set (match_operand:GPF 0 "register_operand" "=w,w")
+	(mult:GPF (FLOATUORS:GPF
+		   (match_operand: 1 "register_operand" "w,?r"))
+		   (match_operand:GPF 2 "aarch64_fp_pow2_recip" "Dt,Dt")))]
+  "TARGET_FLOAT"
+  {
+operands[2] = GEN_INT (aarch64_fpconst_pow2_recip (operands[2]));
+switch (which_alternative)
+{
+  case 0:
+	return "cvtf\t%0, %1, #%2";
+  case 1:
+	return "cvtf\t%0, %1, #%2";
+  default:
+	gcc_unreachable ();
+}
+  }
+  [(set_attr "type" "neon_int_to_fp_,f_cvti2f")
+   (set_attr "arch" "simd,fp")]
+)
+
+;; Unequal width integer to fp combine.
+(define_insn "*aarch64_cvtf__2_mult"
+  [(set (match_operand:GPF 0 "register_operand" "=w")
+	(mult:GPF (FLOATUORS:GPF
+		   (match_operand: 1 "register_operand" "r"))
+		   (match_operand:GPF 2 "aarch64_fp_pow2_recip" "Dt")))]
+  "TARGET_FLOAT"
+  {
+operands[2] = GEN_INT (aarch64_fpconst_pow2_recip (operands[2]));
+return "cvtf\t%0, %1, #%2";
+  }
+  [(set_attr "type" "f_cvti2f")]
+)
+
+;; Equal width integer to fp conversion.
 (define_insn "2"
   [(set (match_operand:GPF 0 "register_operand" 

Re: Do not give up early on access path oracle

2019-06-18 Thread Jan Hubicka
> >Well, me too :-)  I didn't really understand the choice of the original
> >condition above.  It seemed to be "return true if both access sizes are
> >variable", but the comment implied something else.
> 
> Sorry,! Must_eq is obviously fine. 

Thanks, good to know we are on the same page :)
After some thinking, however, I believe we could reorganize the oracle to be
faster.  What we do currently is

1) try to match base pointers, and if they match use only range_overlap_p
2) try to match base types, and if they match use only range_overlap_p
3) try nonoverlapping_component_refs_p
4) try aliasing_component_refs_p.

Now I think we could do

1) try to match base pointers; if they match and range_overlap_p is false,
   return false.  Otherwise remember that we cannot have a partial overlap
   of arrays, which we otherwise handle conservatively.
2) try to match base types.  If they match,
a) return false if range_overlap_p is false
b) do a variant of nonoverlapping_component_refs_of_decl_p starting from
   the known match of types and using the LTO-friendly
   same_types_for_tbaa_p compare.

   If it does not trip over array_range_refs or unions, I do not
   think we need to do sorting like nonoverlapping_component_refs_p
   does.  Perhaps the sorting path can then be dropped completely.
c) return true if that failed
3) continue by looking for a match of the base type of one path with an
   inner type of the other (aliasing_component_refs_p), and if a match is
   found repeat a), b), c)
4) do the access path continuation test.

I do not think this scheme should miss any cases where
nonoverlapping_component_refs_p matches, since if we have a match in the
middle of the access path then either the access paths are incompatible
(which will be ruled out by 4) or matching types are found in 2/3.

2) is kind of redundant with 3), but since a base type match is common and
it saves an extra walk, I think it makes sense to do it first and make
3) skip the outermost REF.

So I think I will drop this patch, fix divergences between
nonoverlapping_component_refs_p and nonoverlapping_component_refs_of_decl_p,
hopefully enable aliasing_component_refs and nonoverlapping
and then see if I can reorganize the code this way.

Honza



[committed][AArch64] Add a new CC mode for SVE conditions

2019-06-18 Thread Richard Sandiford
The SVE ACLE patches need to introduce a new CC_NZC mode for the
conditions that can be tested after a PTRUE.  In particular, LT needs
to map to "mi"/"first" and GE to "pl"/"nfrst", instead of the normal
CC mapping.

Another advantage of using a separate mode is that we can print the SVE
names of the conditions, which makes the output a bit easier to read.
It therefore seems like an independent improvement that can go in now.
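The condition mapping described above can be pictured as a small table from RTL comparison codes to the ordinary AArch64 suffix and its SVE alias.  This is a hypothetical sketch, not the layout of aarch64_sve_condition_codes: the LT, GE, LTU and GEU rows come from the description above, while EQ/"none" and NE/"any" are the standard SVE aliases from the Arm architecture (b.any is what while_1.c now checks for):

```cpp
#include <cassert>
#include <string>

// Hypothetical subset of RTL comparison codes.
enum class rtx_cond { eq, ne, lt, ge, ltu, geu };

struct cond_names
{
  const char *base;   // ordinary AArch64 condition suffix
  const char *sve;    // SVE alias for the same flags test
};

// Names for conditions tested in a CC_NZC-style mode.
cond_names
cc_nzc_names (rtx_cond c)
{
  switch (c)
    {
    case rtx_cond::eq:  return {"eq", "none"};
    case rtx_cond::ne:  return {"ne", "any"};
    case rtx_cond::lt:  return {"mi", "first"};
    case rtx_cond::ge:  return {"pl", "nfrst"};
    case rtx_cond::ltu: return {"cc", "last"};
    case rtx_cond::geu: return {"cs", "nlast"};
    }
  return {nullptr, nullptr};  // not reached for valid enumerators
}

// Branch mnemonic using the SVE name with a '.' separator, e.g. "b.any".
std::string
sve_branch_mnemonic (rtx_cond c)
{
  return std::string ("b.") + cc_nzc_names (c).sve;
}
```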

The patch also avoids using (compare X (const_int 0)), because that gets
folded away when used with LTU and GEU ("cc"/"last" and "cs"/"nlast").
Just using an unspec should be OK.

The full set of conditions can't be tested without other SVE ACLE patches.

Tested on aarch64-linux-gnu (with and without SVE).  Applied as r272427.

Richard


2019-06-18  Richard Sandiford  

gcc/
* config/aarch64/aarch64-modes.def (CC_NZC): New CC_MODE.
* config/aarch64/aarch64-sve.md (*3_cc)
(ptest_ptrue, while_ult)
(*while_ult_cc, *cmp)
(*cmp_ptest, *cmp_cc)
(*pred_cmp_combine, *pred_cmp)
(vec_cmp, vec_cmpu, cbranch4):
Use CC_NZC instead of CC.
* config/aarch64/aarch64.md (condjump): Print a '.' in SVE conditions.
* config/aarch64/aarch64.c (aarch64_sve_condition_codes): New variable.
(aarch64_print_operand): Handle E_CC_NZCmode.
(aarch64_emit_sve_ptrue_op_cc): Use gen_set_clobber_cc_nzc instead
of gen_set_clobber_cc.

gcc/testsuite/
* gcc.target/aarch64/sve/struct_vect_18.c: Allow branches to
contain dots.
* gcc.target/aarch64/sve/struct_vect_19.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_20.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_21.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_22.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_23.c: Likewise.
* gcc.target/aarch64/sve/unroll-1.c: Likewise.
* gcc.target/aarch64/sve/while_1.c: Check for b.any.

Index: gcc/config/aarch64/aarch64-modes.def
===
--- gcc/config/aarch64/aarch64-modes.def2019-03-08 18:15:38.220734572 
+
+++ gcc/config/aarch64/aarch64-modes.def2019-06-18 15:44:25.158766687 
+0100
@@ -33,6 +33,8 @@
 CC_MODE (CCFP);
 CC_MODE (CCFPE);
 CC_MODE (CC_SWP);
+CC_MODE (CC_NZC);   /* Only N, Z and C bits of condition flags are valid.
+  (Used with SVE predicate tests.)  */
 CC_MODE (CC_NZ);/* Only N and Z bits of condition flags are valid.  */
 CC_MODE (CC_Z); /* Only Z bit of condition flags is valid.  */
 CC_MODE (CC_C); /* C represents unsigned overflow of a simple addition.  */
Index: gcc/config/aarch64/aarch64-sve.md
===
--- gcc/config/aarch64/aarch64-sve.md   2019-06-18 15:43:09.591393527 +0100
+++ gcc/config/aarch64/aarch64-sve.md   2019-06-18 15:44:25.158766687 +0100
@@ -1172,16 +1172,15 @@ (define_insn "pred_3"
 ;; UNSPEC_PTEST_PTRUE is logically redundant, but means that the tested
 ;; value is structurally equivalent to rhs of the second set.
 (define_insn "*3_cc"
-  [(set (reg:CC CC_REGNUM)
-   (compare:CC
- (unspec:SI [(match_operand:PRED_ALL 1 "register_operand" "Upa")
- (and:PRED_ALL
-   (LOGICAL:PRED_ALL
- (match_operand:PRED_ALL 2 "register_operand" "Upa")
- (match_operand:PRED_ALL 3 "register_operand" "Upa"))
-   (match_dup 1))]
-UNSPEC_PTEST_PTRUE)
- (const_int 0)))
+  [(set (reg:CC_NZC CC_REGNUM)
+   (unspec:CC_NZC
+ [(match_operand:PRED_ALL 1 "register_operand" "Upa")
+  (and:PRED_ALL
+(LOGICAL:PRED_ALL
+  (match_operand:PRED_ALL 2 "register_operand" "Upa")
+  (match_operand:PRED_ALL 3 "register_operand" "Upa"))
+(match_dup 1))]
+ UNSPEC_PTEST_PTRUE))
(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
(and:PRED_ALL (LOGICAL:PRED_ALL (match_dup 2) (match_dup 3))
  (match_dup 1)))]
@@ -1320,12 +1319,11 @@ (define_expand "3"
 ;; the constant.  We would use a separate unspec code for PTESTs involving
 ;; GPs that might not be PTRUEs.
 (define_insn "ptest_ptrue"
-  [(set (reg:CC CC_REGNUM)
-   (compare:CC
- (unspec:SI [(match_operand:PRED_ALL 0 "register_operand" "Upa")
- (match_operand:PRED_ALL 1 "register_operand" "Upa")]
-UNSPEC_PTEST_PTRUE)
- (const_int 0)))]
+  [(set (reg:CC_NZC CC_REGNUM)
+   (unspec:CC_NZC
+ [(match_operand:PRED_ALL 0 "register_operand" "Upa")
+  (match_operand:PRED_ALL 1 "register_operand" "Upa")]
+ UNSPEC_PTEST_PTRUE))]
   "TARGET_SVE"
   "ptest\t%0, %1.b"
 )
@@ -1337,7 +1335,7 @@ (define_insn "while_ult, %1, %2"
 )
@@ -1346,15 +1344,14 @@ (define_insn "while_ult_cc"
-  [(set (reg:CC CC_REGNUM)
-  

[committed][AArch64] Tabify aarch64-sve.md

2019-06-18 Thread Richard Sandiford
Tested on aarch64-linux-gnu (with and without SVE).  Applied as r272426.

Richard


2019-06-18  Richard Sandiford  

gcc/
* config/aarch64/aarch64-sve.md: Tabify file.

Index: gcc/config/aarch64/aarch64-sve.md
===
--- gcc/config/aarch64/aarch64-sve.md   2019-06-18 15:42:40.859631868 +0100
+++ gcc/config/aarch64/aarch64-sve.md   2019-06-18 15:43:09.591393527 +0100
@@ -72,7 +72,7 @@ (define_expand "mov"
head of the file) and increases the addressing choices for
little-endian.  */
 if ((MEM_P (operands[0]) || MEM_P (operands[1]))
-&& can_create_pseudo_p ())
+   && can_create_pseudo_p ())
   {
aarch64_expand_sve_mem_move (operands[0], operands[1], mode);
DONE;
@@ -88,7 +88,7 @@ (define_expand "mov"
 /* Optimize subregs on big-endian targets: we can use REV[BHW]
instead of going through memory.  */
 if (BYTES_BIG_ENDIAN
-&& aarch64_maybe_expand_sve_subreg_move (operands[0], operands[1]))
+   && aarch64_maybe_expand_sve_subreg_move (operands[0], operands[1]))
   DONE;
   }
 )
@@ -100,7 +100,7 @@ (define_expand "mov"
 (define_insn_and_split "*aarch64_sve_mov_subreg_be"
   [(set (match_operand:SVE_ALL 0 "aarch64_sve_nonimmediate_operand" "=w")
(unspec:SVE_ALL
-  [(match_operand:VNx16BI 1 "register_operand" "Upl")
+ [(match_operand:VNx16BI 1 "register_operand" "Upl")
   (match_operand 2 "aarch64_any_register_operand" "w")]
  UNSPEC_REV_SUBREG))]
   "TARGET_SVE && BYTES_BIG_ENDIAN"
@@ -147,7 +147,7 @@ (define_insn "*aarch64_sve_mov_be"
 (define_expand "aarch64_sve_reload_be"
   [(parallel
  [(set (match_operand 0)
-   (match_operand 1))
+  (match_operand 1))
   (clobber (match_operand:VNx16BI 2 "register_operand" "=Upl"))])]
   "TARGET_SVE && BYTES_BIG_ENDIAN"
   {
@@ -1442,24 +1442,24 @@ (define_insn "*cmp_cc"
 (define_insn_and_split "*pred_cmp_combine"
   [(set (match_operand: 0 "register_operand" "=Upa, Upa")
(and:
- (unspec:
-   [(match_operand: 1)
-(SVE_INT_CMP:
-  (match_operand:SVE_I 2 "register_operand" "w, w")
-  (match_operand:SVE_I 3 "aarch64_sve_cmp__operand" ", w"))]
-   UNSPEC_MERGE_PTRUE)
- (match_operand: 4 "register_operand" "Upl, Upl")))
+(unspec:
+  [(match_operand: 1)
+   (SVE_INT_CMP:
+ (match_operand:SVE_I 2 "register_operand" "w, w")
+ (match_operand:SVE_I 3 "aarch64_sve_cmp__operand" ", w"))]
+  UNSPEC_MERGE_PTRUE)
+(match_operand: 4 "register_operand" "Upl, Upl")))
(clobber (reg:CC CC_REGNUM))]
   "TARGET_SVE"
   "#"
   "&& 1"
   [(parallel
  [(set (match_dup 0)
-  (and:
-(SVE_INT_CMP:
-  (match_dup 2)
-  (match_dup 3))
-(match_dup 4)))
+ (and:
+   (SVE_INT_CMP:
+ (match_dup 2)
+ (match_dup 3))
+   (match_dup 4)))
   (clobber (reg:CC CC_REGNUM))])]
 )
 
@@ -2730,8 +2730,8 @@ (define_expand "vec_unpack_flo
a ZIP whose first operand is zero.  */
 rtx temp = gen_reg_rtx (VNx4SImode);
 emit_insn ((
-   ? gen_aarch64_sve_zip2vnx4si
-   : gen_aarch64_sve_zip1vnx4si)
+   ? gen_aarch64_sve_zip2vnx4si
+   : gen_aarch64_sve_zip1vnx4si)
   (temp, operands[1], operands[1]));
 rtx ptrue = aarch64_ptrue_reg (VNx2BImode);
 emit_insn (gen_aarch64_sve_vnx4sivnx2df2 (operands[0],


[AArch64] Factor out pfalse predicate creation

2019-06-18 Thread Richard Sandiford
Following on from the previous ptrue patch.

Tested on aarch64-linux-gnu (with and without SVE).  Applied as r272425.

Richard


2019-06-18  Richard Sandiford  

gcc/
* config/aarch64/aarch64-protos.h (aarch64_pfalse_reg): Declare.
* config/aarch64/aarch64.c (aarch64_pfalse_reg): New function.
* config/aarch64/aarch64-sve.md: Use it.

Index: gcc/config/aarch64/aarch64-protos.h
===
--- gcc/config/aarch64/aarch64-protos.h 2019-06-18 15:42:18.535817057 +0100
+++ gcc/config/aarch64/aarch64-protos.h 2019-06-18 15:42:40.859631868 +0100
@@ -521,6 +521,7 @@ void aarch64_err_no_fpadvsimd (machine_m
 void aarch64_expand_epilogue (bool);
 void aarch64_expand_mov_immediate (rtx, rtx, rtx (*) (rtx, rtx) = 0);
 rtx aarch64_ptrue_reg (machine_mode);
+rtx aarch64_pfalse_reg (machine_mode);
 void aarch64_emit_sve_pred_move (rtx, rtx, rtx);
 void aarch64_expand_sve_mem_move (rtx, rtx, machine_mode);
 bool aarch64_maybe_expand_sve_subreg_move (rtx, rtx);
Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c    2019-06-18 15:42:18.539817024 +0100
+++ gcc/config/aarch64/aarch64.c    2019-06-18 15:42:40.863631835 +0100
@@ -2467,6 +2467,15 @@ aarch64_ptrue_reg (machine_mode mode)
   return force_reg (mode, CONSTM1_RTX (mode));
 }
 
+/* Return an all-false predicate register of mode MODE.  */
+
+rtx
+aarch64_pfalse_reg (machine_mode mode)
+{
+  gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL);
+  return force_reg (mode, CONST0_RTX (mode));
+}
+
 /* Return true if we can move VALUE into a register using a single
CNT[BHWD] instruction.  */
 
Index: gcc/config/aarch64/aarch64-sve.md
===
--- gcc/config/aarch64/aarch64-sve.md   2019-06-18 15:42:18.535817057 +0100
+++ gcc/config/aarch64/aarch64-sve.md   2019-06-18 15:42:40.859631868 +0100
@@ -488,7 +488,7 @@ (define_expand "vec_extract"
   {
/* The last element can be extracted with a LASTB and a false
   predicate.  */
-   rtx sel = force_reg (mode, CONST0_RTX (mode));
+   rtx sel = aarch64_pfalse_reg (mode);
emit_insn (gen_extract_last_ (operands[0], sel, operands[1]));
DONE;
   }
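The vec_extract hunk depends on how LASTB behaves when no predicate lane is active. The plain-C model below assumes the architectural definition of LASTB (extract the last active element, or the highest-numbered element if none is active). The arrays and the name `lastb_model` are hypothetical stand-ins for Z and P registers, not GCC code. Under that assumption it shows why an all-false predicate from aarch64_pfalse_reg extracts the final element:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of SVE LASTB: PRED and VEC stand in for a predicate
   register and a vector register with N lanes.  Returns the last lane
   whose predicate bit is set; if no bit is set, returns the
   highest-numbered lane, which is why an all-false predicate extracts
   the final element.  */
static int
lastb_model (const bool *pred, const int *vec, size_t n)
{
  assert (n > 0);
  size_t last = n - 1;  /* used when no lane is active */
  for (size_t i = 0; i < n; i++)
    if (pred[i])
      last = i;
  return vec[last];
}
```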


[committed][AArch64] Factor out ptrue predicate creation

2019-06-18 Thread Richard Sandiford
This is the first step to canonicalising predicate constants so that
they can be reused between modes.

Tested on aarch64-linux-gnu (with and without SVE).  Applied as 

Richard


2019-06-18  Richard Sandiford  

gcc/
* config/aarch64/aarch64-protos.h (aarch64_ptrue_reg): Declare.
* config/aarch64/aarch64.c (aarch64_ptrue_reg): New function.
(aarch64_expand_sve_widened_duplicate, aarch64_expand_sve_mem_move)
(aarch64_maybe_expand_sve_subreg_move, aarch64_evpc_rev_local)
(aarch64_expand_sve_vec_cmp_int): Use it.
(aarch64_expand_sve_vec_cmp_float): Likewise.
* config/aarch64/aarch64-sve.md: Likewise throughout.

Index: gcc/config/aarch64/aarch64-protos.h
===
--- gcc/config/aarch64/aarch64-protos.h 2019-06-07 08:39:40.998350935 +0100
+++ gcc/config/aarch64/aarch64-protos.h 2019-06-18 15:42:18.535817057 +0100
@@ -520,6 +520,7 @@ const char * aarch64_output_probe_sve_st
 void aarch64_err_no_fpadvsimd (machine_mode);
 void aarch64_expand_epilogue (bool);
 void aarch64_expand_mov_immediate (rtx, rtx, rtx (*) (rtx, rtx) = 0);
+rtx aarch64_ptrue_reg (machine_mode);
 void aarch64_emit_sve_pred_move (rtx, rtx, rtx);
 void aarch64_expand_sve_mem_move (rtx, rtx, machine_mode);
 bool aarch64_maybe_expand_sve_subreg_move (rtx, rtx);
Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c    2019-06-18 09:35:55.205867120 +0100
+++ gcc/config/aarch64/aarch64.c    2019-06-18 15:42:18.539817024 +0100
@@ -2458,6 +2458,15 @@ aarch64_force_temporary (machine_mode mo
 }
 }
 
+/* Return an all-true predicate register of mode MODE.  */
+
+rtx
+aarch64_ptrue_reg (machine_mode mode)
+{
+  gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL);
+  return force_reg (mode, CONSTM1_RTX (mode));
+}
+
 /* Return true if we can move VALUE into a register using a single
CNT[BHWD] instruction.  */
 
@@ -3187,7 +3196,7 @@ aarch64_expand_sve_widened_duplicate (rt
   machine_mode mode = GET_MODE (dest);
   unsigned int elem_bytes = GET_MODE_UNIT_SIZE (mode);
   machine_mode pred_mode = aarch64_sve_pred_mode (elem_bytes).require ();
-  rtx ptrue = force_reg (pred_mode, CONSTM1_RTX (pred_mode));
+  rtx ptrue = aarch64_ptrue_reg (pred_mode);
   src = gen_rtx_UNSPEC (mode, gen_rtvec (2, ptrue, src), UNSPEC_LD1RQ);
   emit_insn (gen_rtx_SET (dest, src));
   return true;
@@ -3448,7 +3457,7 @@ aarch64_emit_sve_pred_move (rtx dest, rt
 aarch64_expand_sve_mem_move (rtx dest, rtx src, machine_mode pred_mode)
 {
   machine_mode mode = GET_MODE (dest);
-  rtx ptrue = force_reg (pred_mode, CONSTM1_RTX (pred_mode));
+  rtx ptrue = aarch64_ptrue_reg (pred_mode);
   if (!register_operand (src, mode)
   && !register_operand (dest, mode))
 {
@@ -3512,7 +3521,7 @@ aarch64_maybe_expand_sve_subreg_move (rt
 return false;
 
   /* Generate *aarch64_sve_mov_subreg_be.  */
-  rtx ptrue = force_reg (VNx16BImode, CONSTM1_RTX (VNx16BImode));
+  rtx ptrue = aarch64_ptrue_reg (VNx16BImode);
   rtx unspec = gen_rtx_UNSPEC (GET_MODE (dest), gen_rtvec (2, ptrue, src),
   UNSPEC_REV_SUBREG);
   emit_insn (gen_rtx_SET (dest, unspec));
@@ -16753,7 +16762,7 @@ aarch64_evpc_rev_local (struct expand_ve
   rtx src = gen_rtx_UNSPEC (d->vmode, gen_rtvec (1, d->op0), unspec);
   if (d->vec_flags == VEC_SVE_DATA)
 {
-  rtx pred = force_reg (pred_mode, CONSTM1_RTX (pred_mode));
+  rtx pred = aarch64_ptrue_reg (pred_mode);
   src = gen_rtx_UNSPEC (d->vmode, gen_rtvec (2, pred, src),
UNSPEC_MERGE_PTRUE);
 }
@@ -17101,7 +17110,7 @@ aarch64_expand_sve_vec_cmp_int (rtx targ
   if (!aarch64_sve_cmp_operand_p (code, op1))
 op1 = force_reg (data_mode, op1);
 
-  rtx ptrue = force_reg (pred_mode, CONSTM1_RTX (pred_mode));
+  rtx ptrue = aarch64_ptrue_reg (pred_mode);
   rtx cond = gen_rtx_fmt_ee (code, pred_mode, op0, op1);
   aarch64_emit_sve_ptrue_op_cc (target, ptrue, cond);
 }
@@ -17160,7 +17169,7 @@ aarch64_expand_sve_vec_cmp_float (rtx ta
   machine_mode pred_mode = GET_MODE (target);
   machine_mode data_mode = GET_MODE (op0);
 
-  rtx ptrue = force_reg (pred_mode, CONSTM1_RTX (pred_mode));
+  rtx ptrue = aarch64_ptrue_reg (pred_mode);
   switch (code)
 {
 case UNORDERED:
Index: gcc/config/aarch64/aarch64-sve.md
===
--- gcc/config/aarch64/aarch64-sve.md   2019-06-18 15:41:32.604198094 +0100
+++ gcc/config/aarch64/aarch64-sve.md   2019-06-18 15:42:18.535817057 +0100
@@ -232,7 +232,7 @@ (define_expand "gather_load"
  UNSPEC_LD1_GATHER))]
   "TARGET_SVE"
   {
-operands[5] = force_reg (mode, CONSTM1_RTX (mode));
+operands[5] = aarch64_ptrue_reg (mode);
   }
 )
 
@@ -289,7 +289,7 @@ (define_expand "scatter_store"
  UNSPEC_ST1_SCATTER))]
   "TARGET_SVE"
   {
-operands[5] = force_reg (mode, 

[committed][AArch64] Simplify SVE IFN_COND patterns

2019-06-18 Thread Richard Sandiford
This patch makes the binary IFN_COND patterns use the same approach
as the ternary patterns, with one pattern handling the cases in
which the "else" value isn't tied to one of the other inputs.
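As a rough scalar model of what these patterns compute: an (unspec [PRED (op A B) ELSE] UNSPEC_SEL) gives the operation's result in active lanes and ELSE in inactive ones. The sketch below uses addition as the operation. The function name and array representation are hypothetical, not part of GCC. The comments map the two ELSE cases onto the "Dz" alternatives (MOVPRFX with /z) and the "0" alternative (MOVPRFX with /m). The final "w" alternative, where ELSE is an unrelated register, is split after reload and is not modelled separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scalar model of
     (unspec [PRED (plus A B) ELSE] UNSPEC_SEL)
   over N lanes.  Active lanes receive A + B; inactive lanes receive ELSE.
   An all-zero ELSE corresponds to the "Dz" alternatives (MOVPRFX .../z);
   an ELSE that is the destination's previous contents corresponds to the
   "0" alternative (MOVPRFX .../m).  Each lane of A, B and ELSE is read
   before DEST[i] is written, so DEST may alias any of them.  */
static void
cond_add_model (int *dest, const bool *pred, const int *a, const int *b,
                const int *else_val, size_t n)
{
  assert (dest && pred && a && b && else_val);
  for (size_t i = 0; i < n; i++)
    dest[i] = pred[i] ? a[i] + b[i] : else_val[i];
}
```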

Tested on aarch64-linux-gnu with and without SVE, applied as r272423.

Richard


2019-06-18  Richard Sandiford  
Kugan Vivekanandarajah  

gcc/
* config/aarch64/aarch64-sve.md (*cond__0): Delete.
(*cond__z): Fold into...
(*cond__any): ...here.  Also handle cases in which
operand 4 can be tied to operand 0 (either inherently or via RA).

Index: gcc/config/aarch64/aarch64-sve.md
===
--- gcc/config/aarch64/aarch64-sve.md   2019-06-18 09:35:55.197867186 +0100
+++ gcc/config/aarch64/aarch64-sve.md   2019-06-18 15:41:32.604198094 +0100
@@ -1868,41 +1868,6 @@ (define_expand "cond_"
   "TARGET_SVE"
 )
 
-;; Predicated integer operations with select matching the output operand.
-(define_insn "*cond__0"
-  [(set (match_operand:SVE_I 0 "register_operand" "+w, w, ?")
-   (unspec:SVE_I
- [(match_operand: 1 "register_operand" "Upl, Upl, Upl")
-  (SVE_INT_BINARY:SVE_I
-(match_operand:SVE_I 2 "register_operand" "0, w, w")
-(match_operand:SVE_I 3 "register_operand" "w, 0, w"))
-  (match_dup 0)]
- UNSPEC_SEL))]
-  "TARGET_SVE"
-  "@
-   \t%0., %1/m, %0., %3.
-   \t%0., %1/m, %0., %2.
-   movprfx\t%0, %1/m, %2\;\t%0., %1/m, %0., %3."
-  [(set_attr "movprfx" "*,*,yes")]
-)
-
-(define_insn "*cond__0"
-  [(set (match_operand:SVE_SDI 0 "register_operand" "+w, w, ?")
-   (unspec:SVE_SDI
- [(match_operand: 1 "register_operand" "Upl, Upl, Upl")
-  (SVE_INT_BINARY_SD:SVE_SDI
-(match_operand:SVE_SDI 2 "register_operand" "0, w, w")
-(match_operand:SVE_SDI 3 "register_operand" "w, 0, w"))
-  (match_dup 0)]
- UNSPEC_SEL))]
-  "TARGET_SVE"
-  "@
-   \t%0., %1/m, %0., %3.
-   \t%0., %1/m, %0., %2.
-   movprfx\t%0, %1/m, %2\;\t%0., %1/m, %0., %3."
-  [(set_attr "movprfx" "*,*,yes")]
-)
-
 ;; Predicated integer operations with select matching the first operand.
 (define_insn "*cond__2"
   [(set (match_operand:SVE_I 0 "register_operand" "=w, ?")
@@ -1969,78 +1934,64 @@ (define_insn "*cond__3"
   [(set_attr "movprfx" "*,yes")]
 )
 
-;; Predicated integer operations with select matching zero.
-(define_insn "*cond__z"
-  [(set (match_operand:SVE_I 0 "register_operand" "=")
-   (unspec:SVE_I
- [(match_operand: 1 "register_operand" "Upl")
-  (SVE_INT_BINARY:SVE_I
-(match_operand:SVE_I 2 "register_operand" "w")
-(match_operand:SVE_I 3 "register_operand" "w"))
-  (match_operand:SVE_I 4 "aarch64_simd_imm_zero")]
- UNSPEC_SEL))]
-  "TARGET_SVE"
-  "movprfx\t%0., %1/z, %2.\;\t%0., %1/m, %0., %3."
-  [(set_attr "movprfx" "yes")]
-)
-
-(define_insn "*cond__z"
-  [(set (match_operand:SVE_SDI 0 "register_operand" "=")
-   (unspec:SVE_SDI
- [(match_operand: 1 "register_operand" "Upl")
-  (SVE_INT_BINARY_SD:SVE_SDI
-(match_operand:SVE_SDI 2 "register_operand" "w")
-(match_operand:SVE_SDI 3 "register_operand" "w"))
-  (match_operand:SVE_SDI 4 "aarch64_simd_imm_zero")]
- UNSPEC_SEL))]
-  "TARGET_SVE"
-  "movprfx\t%0., %1/z, %2.\;\t%0., %1/m, %0., %3."
-  [(set_attr "movprfx" "yes")]
-)
-
-;; Synthetic predications with select unmatched.
+;; Predicated integer binary operations in which the values of inactive
+;; lanes are distinct from the other inputs.
 (define_insn_and_rewrite "*cond__any"
-  [(set (match_operand:SVE_I 0 "register_operand" "=")
+  [(set (match_operand:SVE_I 0 "register_operand" "=, , , , ?")
(unspec:SVE_I
- [(match_operand: 1 "register_operand" "Upl")
+ [(match_operand: 1 "register_operand" "Upl, Upl, Upl, Upl, Upl")
   (SVE_INT_BINARY:SVE_I
-(match_operand:SVE_I 2 "register_operand" "w")
-(match_operand:SVE_I 3 "register_operand" "w"))
-  (match_operand:SVE_I 4 "register_operand"   "w")]
+(match_operand:SVE_I 2 "register_operand" "0, w, w, w, w")
+(match_operand:SVE_I 3 "register_operand" "w, 0, w, w, w"))
+  (match_operand:SVE_I 4 "aarch64_simd_reg_or_zero" "Dz, Dz, Dz, 0, w")]
  UNSPEC_SEL))]
   "TARGET_SVE
-   && !(rtx_equal_p (operands[0], operands[4])
-|| rtx_equal_p (operands[2], operands[4])
-|| rtx_equal_p (operands[3], operands[4]))"
-  "#"
-  "&& reload_completed"
+   && !rtx_equal_p (operands[2], operands[4])
+   && !rtx_equal_p (operands[3], operands[4])"
+  "@
+   movprfx\t%0., %1/z, %0.\;\t%0., %1/m, %0., %3.
+   movprfx\t%0., %1/z, %0.\;\t%0., %1/m, %0., %2.
+   movprfx\t%0., %1/z, %2.\;\t%0., %1/m, %0., %3.
+   movprfx\t%0., %1/m, %2.\;\t%0., %1/m, %0., %3.
+   #"
+  "&& reload_completed
+   && register_operand (operands[4], mode)
+   && !rtx_equal_p 

Re: [PATCH][gcc] libgccjit: add bitfield support

2019-06-18 Thread Andrea Corallo

David Malcolm writes:


> Thanks for working on this; sorry for the delay in reviewing it.
>
> Overall, this looks close to being ready, but I have a few notes:
>
> [...]


Cool, thanks for the review.
I'll address your comments.
When you have time, it would be great if you could take a look at the other
pending patch I have (it is much smaller).

Bests
  Andrea


Re: [PATCH][gcc] libgccjit: add bitfield support

2019-06-18 Thread David Malcolm
On Mon, 2019-06-03 at 09:51 +, Andrea Corallo wrote:
> Hi all,
> I would like to submit this patch that aims to introduce bitfield
> support into libgccjit.
> 
> A new entry point gcc_jit_context_new_bitfield is added, plus a
> corresponding testcase.
> 
> Checked with make check-jit; it does not introduce regressions.
> 
> Feedback is very welcome.
> 
> Bests
> 
> Andrea
> 
> 2019-06-01  Andrea Corallo andrea.cora...@arm.com
> 
> * docs/topics/compatibility.rst (LIBGCCJIT_ABI_12): New ABI tag.
> * docs/topics/types.rst: Add gcc_jit_context_new_bitfield.
> * jit-common.h (namespace recording): Add class bitfield.
> * jit-playback.c: Include "c-family/c-common.h"
> (playback::context::new_bitfield): New method.
> (playback::compound_type::set_fields): Add bitfield support.
> (playback::lvalue::jit_mark_addressable): Make this a method of
> lvalue plus return a bool to communicate success.
> (playback::lvalue::get_address): Check for jit_mark_addressable
> return value.
> * jit-playback.h (new_bitfield): New method.
> (class bitfield): New class.
> (class lvalue): Add jit_mark_addressable method.
> * jit-recording.c (recording::context::new_bitfield): New method.
> (recording::bitfield::replay_into): New method.
> (recording::bitfield::write_to_dump): Likewise.
> (recording::bitfield::make_debug_string): Likewise.
> (recording::bitfield::write_reproducer): Likewise.
> * jit-recording.h (class context): Add new_bitfield method.
> (class field): Make it derivable by class bitfield.
> (class bitfield): Add new class.
> * libgccjit++.h (class context): Add new_bitfield method.
> * libgccjit.c (struct gcc_jit_bitfield): New structure.
> (gcc_jit_context_new_bitfield): New function.
> * libgccjit.h
> (LIBGCCJIT_HAVE_gcc_jit_context_new_bitfield): New macro.
> (gcc_jit_context_new_bitfield): New function.
> * libgccjit.map (LIBGCCJIT_ABI_12): New ABI tag.
> 
> 
> 2019-06-01  Andrea Corallo andrea.cora...@arm.com
> 
> * jit.dg/all-non-failing-tests.h: Add test-accessing-bitfield.c.
> * jit.dg/test-accessing-bitfield.c: New testcase.

Thanks for working on this; sorry for the delay in reviewing it.

Overall, this looks close to being ready, but I have a few notes:

[...]

> diff --git a/gcc/jit/jit-playback.c b/gcc/jit/jit-playback.c
> index b74495c..7676e55 100644
> --- a/gcc/jit/jit-playback.c
> +++ b/gcc/jit/jit-playback.c
> @@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "convert.h"
>  #include "stor-layout.h"
>  #include "print-tree.h"
> +#include "c-family/c-common.h"

Presumably this is for DECL_C_BIT_FIELD etc, and this would make
libgccjit piggy-back off of the C family's macros for using lang flag
4?

If so, I think it would be cleaner to instead take a copy of those
macros (at the top of jit-playback.c for now), renaming them to
  DECL_JIT_BIT_FIELD
etc, with a "compare with" comment referring to the C frontend macros.

That way libgccjit doesn't directly depend on implementation details of
the C-family of frontends, but it's easy to see where we took the code
from (I believe that libgccjit isn't currently making any use of lang
flags, so this would be the first).

>  #include "gimplify.h"
>  #include "gcc-driver-name.h"
>  #include "attribs.h"
> @@ -263,6 +264,48 @@ new_field (location *loc,
>return new field (decl);
>  }
>  
> +/* Construct a playback::bitfield instance (wrapping a tree).  */
> +
> +playback::field *
> +playback::context::
> +new_bitfield (location *loc,
> +   type *type,
> +   int width,
> +   const char *name)
> +{
> +  gcc_assert (type);
> +  gcc_assert (name);
> +  gcc_assert (width);
> +
> +  /* compare with c/c-decl.c:grokfield,  grokdeclarator and
> + check_bitfield_type_and_width.  */
> +
> +  tree tree_type = type->as_tree ();
> +  if (TREE_CODE (tree_type) != INTEGER_TYPE
> +  && TREE_CODE (tree_type) != BOOLEAN_TYPE)
> +{
> +  add_error (loc, "bit-field %s has invalid type", name);

Ideally this error message would identify the type, and be more precise
about what the problem with it is.

I initially thought something like:

  add_error (loc,
 "bit-field %s has invalid type %s (must be integer or boolean)",
 name, type->get_debug_string ());

would work, but type is a playback::type, rather than a
recording::type.

Is there a way to catch this problem earlier, before we reach playback?
Alternatively:

  add_error (loc,
 "bit-field %s has invalid type (must be integer or boolean)",
 name);

would at least be more precise about the problem.

It would be good to have test coverage for this; see e.g.
  gcc/testsuite/jit.dg/test-error-gcc_jit_context_new_field-opaque-struct.c
and similar, where testcases have the form:

test-error-API_ENTRYPOINT-WHAT-WENT-WRONG.c
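One way to act on the "catch this earlier" question is to validate on the recording side, where a debug name for the type is still available. The plain-C sketch below is hypothetical: `sketch_type`, `check_bitfield` and the message wording are illustrative, not libgccjit API. It reproduces the two checks in the quoted new_bitfield (integer or boolean type, width not exceeding the type's precision). It also rejects a non-positive width, an assumption standing in for the gcc_assert (width) in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the type information available before
   playback.  */
enum sketch_type_kind { SKETCH_INT, SKETCH_BOOL, SKETCH_FLOAT, SKETCH_STRUCT };

struct sketch_type
{
  enum sketch_type_kind kind;
  int precision;        /* value bits, e.g. 32 for int, 1 for _Bool */
  const char *name;     /* debug name, e.g. "int" */
};

/* Return 0 if NAME may be a WIDTH-bit bit-field of type TYPE; otherwise
   write a diagnostic naming the type into BUF and return -1.  */
static int
check_bitfield (const struct sketch_type *type, int width,
                const char *name, char *buf, size_t len)
{
  assert (type && name && buf);
  if (type->kind != SKETCH_INT && type->kind != SKETCH_BOOL)
    {
      snprintf (buf, len,
                "bit-field %s has invalid type %s"
                " (must be integer or boolean)",
                name, type->name);
      return -1;
    }
  if (width <= 0 || width > type->precision)
    {
      snprintf (buf, len,
                "width of bit-field %s (%d) must be in the range 1..%d",
                name, width, type->precision);
      return -1;
    }
  return 0;
}
```

Each failing branch is a natural candidate for its own test-error-gcc_jit_context_new_bitfield-*.c testcase, following the naming scheme mentioned above.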


> +  return NULL;
> +}
> +  tree tree_width = build_int_cst (integer_type_node, width);
> +  if (compare_tree_int (tree_width, TYPE_PRECISION (tree_type)) > 0)
> +{
> +  add_error 

RE: [patch][aarch64]: fix unrecognizable insn for ldr got in ilp32 tiny

2019-06-18 Thread Sylvia Taylor
Hi Wilco,

Combined them into one pattern. Updated the diff and the changelog is now:

gcc/ChangeLog:

2019-06-18  Sylvia Taylor  

* config/aarch64/aarch64.c
(aarch64_load_symref_appropriately): Change SYMBOL_TINY_GOT.
* config/aarch64/aarch64.md
(ldr_got_tiny_): New pattern.
(ldr_got_tiny_sidi): New pattern.

Cheers,
Syl

-Original Message-
From: Wilco Dijkstra  
Sent: 13 June 2019 18:42
To: Sylvia Taylor 
Cc: nd ; GCC Patches ; Richard Earnshaw ; James Greenhalgh
Subject: Re: [patch][aarch64]: fix unrecognizable insn for ldr got in ilp32 tiny

Hi Sylvia,

-(define_insn "ldr_got_tiny"
+(define_insn "ldr_got_tiny_di"
   [(set (match_operand:DI 0 "register_operand" "=r")
-   (unspec:DI [(match_operand:DI 1 "aarch64_valid_symref" "S")]
-  UNSPEC_GOTTINYPIC))]
+   (unspec:DI
+ [(match_operand:DI 1 "aarch64_valid_symref" "S")]
+   UNSPEC_GOTTINYPIC))]
   ""
   "ldr\\t%0, %L1"
   [(set_attr "type" "load_8")]
 )
 
+(define_insn "ldr_got_tiny_si"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+   (unspec:SI
+ [(match_operand:SI 1 "aarch64_valid_symref" "S")]
+   UNSPEC_GOTTINYPIC))]
+  "TARGET_ILP32"
+  "ldr\\t%0, %L1"
+  [(set_attr "type" "load_4")]
+)

These can be easily combined like the related ldr_got_small_.

Wilco

-Original Message-
From: Sylvia Taylor 
Sent: 11 June 2019 14:25
To: Richard Earnshaw ; James Greenhalgh ; Marcus Shawcroft ; gcc-patches@gcc.gnu.org
Cc: nd 
Subject: [patch][aarch64]: fix unrecognizable insn for ldr got in ilp32 tiny

Greetings,

This patch addresses a bug in ldr GOT for mcmodel=tiny in which this 
instruction is not generated for ilp32 modes.

Defined 2 new patterns for ldr_got_tiny. Added additional checks to use the 
appropriate rtl pattern for any of the modes.

Examples of previously unrecognized instructions:
ldr     x1, :got:_ZTIi      // [c=4 l=4]  ldr_got_tiny_si
ldr     x0, :got:global     // [c=4 l=4]  ldr_got_tiny_sidi

Bootstrapped and tested on aarch64-none-linux-gnu.
Bug fix tested with aarch64-none-elf-g++ -mabi=ilp32 -mcmodel=tiny -fpic.

The existing test now fixed is: testsuite/g++.dg/torture/stackalign/throw-1.C

OK for trunk? If so, since I don't have commit rights, could someone please
commit it on my behalf?

Cheers,
Syl

gcc/ChangeLog:

2019-06-11  Sylvia Taylor  

* config/aarch64/aarch64.c
(aarch64_load_symref_appropriately): Change SYMBOL_TINY_GOT.
* config/aarch64/aarch64.md
(ldr_got_tiny): Change to ldr_got_tiny_di.
(ldr_got_tiny_si): New pattern.
(ldr_got_tiny_sidi): New pattern.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b38505b0872688634b2d3f625ab8d313e89cfca0..26a8f91b4af53eb2301f27f82a164174c6ef7774 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2251,8 +2251,26 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
   }
 
 case SYMBOL_TINY_GOT:
-  emit_insn (gen_ldr_got_tiny (dest, imm));
-  return;
+  {
+   machine_mode mode = GET_MODE (dest);
+
+   if (mode == ptr_mode)
+ {
+   if (mode == DImode)
+ emit_insn (gen_ldr_got_tiny_di (dest, imm));
+   else
+ /* TARGET_ILP32.  */
+ emit_insn (gen_ldr_got_tiny_si (dest, imm));
+ }
+   else
+ {
+   /* TARGET_ILP32.  */
+   gcc_assert (mode == Pmode);
+   emit_insn (gen_ldr_got_tiny_sidi (dest, imm));
+ }
+
+   return;
+  }
 
 case SYMBOL_TINY_TLSIE:
   {
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index ff83974aeb0b1bf46415c29ba47ada74a79d7586..34a1c52777ed2533dc7f08491f5852138c0e1d00 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -6469,13 +6469,25 @@
   [(set_attr "type" "load_4")]
 )
 
-(define_insn "ldr_got_tiny"
-  [(set (match_operand:DI 0 "register_operand" "=r")
-   (unspec:DI [(match_operand:DI 1 "aarch64_valid_symref" "S")]
-  UNSPEC_GOTTINYPIC))]
+(define_insn "ldr_got_tiny_"
+  [(set (match_operand:PTR 0 "register_operand" "=r")
+   (unspec:PTR
+ [(match_operand:PTR 1 "aarch64_valid_symref" "S")]
+   UNSPEC_GOTTINYPIC))]
   ""
   "ldr\\t%0, %L1"
-  [(set_attr "type" "load_8")]
+  [(set_attr "type" "load_")]
+)
+
+(define_insn "ldr_got_tiny_sidi"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (zero_extend:DI
+ (unspec:SI
+   [(match_operand:DI 1 "aarch64_valid_symref" "S")]
+ UNSPEC_GOTTINYPIC)))]
+  "TARGET_ILP32"
+  "ldr\\t%0, %L1"
+  [(set_attr "type" "load_4")]
 )
 
 (define_insn "aarch64_load_tp_hard"
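The SYMBOL_TINY_GOT dispatch in the aarch64.c hunk chooses between three patterns depending on the destination mode. The sketch below reduces that choice to plain C. Machine modes are cut down to a hypothetical two-value enum, and the result is a hypothetical pattern tag rather than an emitted insn. It assumes ptr_mode is SImode under ILP32 and DImode under LP64, while Pmode is DImode under both; the gcc_assert (mode == Pmode) in the patch is consistent with that assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduction of the SYMBOL_TINY_GOT case.  */
enum sketch_mode { SKETCH_SI, SKETCH_DI };
enum got_tiny_pattern { LDR_GOT_TINY_DI, LDR_GOT_TINY_SI, LDR_GOT_TINY_SIDI };

static enum got_tiny_pattern
pick_ldr_got_tiny (enum sketch_mode dest_mode, bool ilp32)
{
  /* Assumption: ptr_mode is SImode for ILP32 and DImode for LP64;
     Pmode is DImode for both.  */
  enum sketch_mode ptr_mode = ilp32 ? SKETCH_SI : SKETCH_DI;
  const enum sketch_mode pmode = SKETCH_DI;

  if (dest_mode == ptr_mode)
    /* ldr_got_tiny_<mode>: a plain pointer-sized GOT load.  */
    return dest_mode == SKETCH_DI ? LDR_GOT_TINY_DI : LDR_GOT_TINY_SI;

  /* Only reachable for ILP32 with a DImode (Pmode) destination: the
     32-bit GOT entry is loaded and zero-extended.  */
  assert (ilp32 && dest_mode == pmode);
  return LDR_GOT_TINY_SIDI;
}
```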


Re: [PATCH] avoid ice due to inconsistent argument types to fold_build (PR 90662)

2019-06-18 Thread Christophe Lyon
On Tue, 18 Jun 2019 at 15:07, Martin Sebor  wrote:
>
> On 6/18/19 2:38 AM, Christophe Lyon wrote:
> > On Fri, 14 Jun 2019 at 03:35, Jeff Law  wrote:
> >>
> >> On 6/13/19 1:10 PM, Martin Sebor wrote:
> >>> Attached is a fix for the fold_build call with inconsistent
> >>> argument types introduced in a recent commit of mine.
> >>>
> >>> Tested on x86_64-linux.
> >>>
> >>> Martin
> >>>
> >>> gcc-90662.diff
> >>>
> >>> PR tree-optimization/90662 - strlen of a string in a vla plus offset not folded
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>>PR tree-optimization/90662
> >>>* tree-ssa-strlen.c (get_stridx): Convert fold_build2 operands
> >>>to the same type.
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>>PR tree-optimization/90662
> >>>* gcc.dg/pr90866-2.c: New test.
> >>>* gcc.dg/pr90866.c: Ditto.
> >> OK
> >> jeff
> >
> > Hi,
> >
> > I've noticed that pr90866-2.c fails on arm-none-eabi:
> > /gcc/testsuite/gcc.dg/pr90866-2.c:17:5: error: conflicting types for 'i'
> > /gcc/testsuite/gcc.dg/pr90866-2.c:17:1: note: an argument type that
> > has a default promotion cannot match an empty parameter name list
> > declaration
> > /gcc/testsuite/gcc.dg/pr90866-2.c:16:5: note: previous declaration of
> > 'i' was here
> > compiler exited with status 1
> > FAIL: gcc.dg/pr90866-2.c (test for excess errors)
> >
> > Removing 'int i();' makes the test pass, but I'm wondering why it
> > passes on other targets without this change?
>
> I'm sure it's because of the difference in type promotion rules
> on the target.
OK, but isn't it surprising that it works on arm-linux* and not on arm-none-eabi?

> The function declaration shouldn't have an impact
> on the test case so I have removed it in r272418.
OK thanks

>
> Thanks
> Martin


Re: [PATCH] avoid ice due to inconsistent argument types to fold_build (PR 90662)

2019-06-18 Thread Martin Sebor

On 6/18/19 2:38 AM, Christophe Lyon wrote:

On Fri, 14 Jun 2019 at 03:35, Jeff Law  wrote:


On 6/13/19 1:10 PM, Martin Sebor wrote:

Attached is a fix for the fold_build call with inconsistent
argument types introduced in a recent commit of mine.

Tested on x86_64-linux.

Martin

gcc-90662.diff

PR tree-optimization/90662 - strlen of a string in a vla plus offset not folded

gcc/ChangeLog:

   PR tree-optimization/90662
   * tree-ssa-strlen.c (get_stridx): Convert fold_build2 operands
   to the same type.

gcc/testsuite/ChangeLog:

   PR tree-optimization/90662
   * gcc.dg/pr90866-2.c: New test.
   * gcc.dg/pr90866.c: Ditto.

OK
jeff


Hi,

I've noticed that pr90866-2.c fails on arm-none-eabi:
/gcc/testsuite/gcc.dg/pr90866-2.c:17:5: error: conflicting types for 'i'
/gcc/testsuite/gcc.dg/pr90866-2.c:17:1: note: an argument type that
has a default promotion cannot match an empty parameter name list
declaration
/gcc/testsuite/gcc.dg/pr90866-2.c:16:5: note: previous declaration of
'i' was here
compiler exited with status 1
FAIL: gcc.dg/pr90866-2.c (test for excess errors)

Removing 'int i();' makes the test pass, but I'm wondering why it
passes on other targets without this change?


I'm sure it's because of the difference in type promotion rules
on the target.  The function declaration shouldn't have an impact
on the test case so I have removed it in r272418.

Thanks
Martin


Re: [PATCH] Replace std::to_string for integers with optimized version

2019-06-18 Thread Jonathan Wakely

On 18/06/19 13:58 +0200, Christophe Lyon wrote:

On Mon, 17 Jun 2019 at 15:21, Jonathan Wakely  wrote:


On 13/06/19 22:41 +0200, Christophe Lyon wrote:
>Hi,
>
>
>On Wed, 12 Jun 2019 at 16:54, Jonathan Wakely  wrote:
>>
>> The std::to_chars functions from C++17 can be used to implement
>> std::to_string with much better performance than calling snprintf. Only
>> the __detail::__to_chars_len and __detail::__to_chars_10 functions are
>> needed for to_string, because it always outputs base 10 representations.
>>
>> The return type of __detail::__to_chars_10 should not be declared before
>> C++17, so the function body is extracted into a new function that can be
>> reused by to_string and __detail::__to_chars_10.
>>
>> The existing tests for to_chars rely on to_string to check for correct
>> answers. Now that they use the same code that doesn't actually ensure
>> correctness, so add new tests for std::to_string that compare against
>> printf output.
>>
>> * include/Makefile.am: Add new  header.
>> * include/Makefile.in: Regenerate.
>> * include/bits/basic_string.h (to_string(int), to_string(unsigned))
>> (to_string(long), to_string(unsigned long), to_string(long long))
>> (to_string(unsigned long long)): Rewrite to use __to_chars_10_impl.
>> * include/bits/charconv.h: New header.
>> (__detail::__to_chars_len): Move here from .
>> (__detail::__to_chars_10_impl): New function extracted from
>> __detail::__to_chars_10.
>> * include/std/charconv (__cpp_lib_to_chars): Add, but comment out.
>> (__to_chars_unsigned_type): New class template that reuses
>> __make_unsigned_selector_base::__select to pick a type.
>> (__unsigned_least_t): Redefine as __to_chars_unsigned_type::type.
>> (__detail::__to_chars_len): Move to new header.
>> (__detail::__to_chars_10): Add inline specifier. Move code doing the
>> output to __detail::__to_chars_10_impl and call that.
>> * include/std/version (__cpp_lib_to_chars): Add, but comment out.
>> * testsuite/21_strings/basic_string/numeric_conversions/char/
>> to_string.cc: Fix reference in comment. Remove unused variable.
>> * testsuite/21_strings/basic_string/numeric_conversions/char/
>> to_string_int.cc: New test.
>>
>> Tested x86_64-linux, committed to trunk.
>>
>
>The new test to_string_int.cc fails on arm-none-eabi:
>PASS: 21_strings/basic_string/numeric_conversions/char/to_string_int.cc
>(test for excess errors)
>spawn 
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/utils/bin/qemu-wrapper.sh
>./to_string_int.exe
>/libstdc++-v3/testsuite/21_strings/basic_string/numeric_conversions/char/to_string_int.cc:105:
>void check_value(T) [with T = long long int]: Assertion 's ==
>expected' failed.
>FAIL: 21_strings/basic_string/numeric_conversions/char/to_string_int.cc
>execution test

Does the target support "%lld" in printf?

Does the .sum file show UNSUPPORTED for the existing
21_strings/basic_string/numeric_conversions/char/to_string.cc test?


No, it PASSes.


Does the attached patch make the test show what fails?

I didn't try because I realized that it might be a problem with how I
configure newlib in my validation system.

After I added the same flags I use when I build manually,
to_string_int.cc now passes.

I added:
   --enable-newlib-io-pos-args \
   --enable-newlib-io-c99-formats \
   --enable-newlib-io-long-long \
   --enable-newlib-io-long-double \
to newlib's configure flags.


I assume it's io-c99-formats that makes "%lld" work.
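The new test compares std::to_string against printf-family output, which is why the newlib "%lld" configuration matters. Below is a C sketch of the same idea, under stated assumptions. `chars_len_10`, `to_chars_10` and `format_ll` are hypothetical C stand-ins for `__detail::__to_chars_len` and `__detail::__to_chars_10_impl`. They use the common two-digits-per-division technique and are not copied from libstdc++. `matches_printf` mirrors the test's comparison. On a C library built without C99 format support, the `snprintf` side is presumably what goes wrong.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Number of decimal digits in VALUE (at least 1).  */
static unsigned
chars_len_10 (unsigned long long value)
{
  unsigned n = 1;
  while (value >= 10)
    {
      value /= 10;
      n++;
    }
  return n;
}

/* Write the LEN decimal digits of VALUE into FIRST (no terminator),
   producing two digits per division via a digit-pair table.  */
static void
to_chars_10 (char *first, unsigned len, unsigned long long value)
{
  static const char digits[201] =
    "0001020304050607080910111213141516171819"
    "2021222324252627282930313233343536373839"
    "4041424344454647484950515253545556575859"
    "6061626364656667686970717273747576777879"
    "8081828384858687888990919293949596979899";
  unsigned pos = len - 1;
  while (value >= 100)
    {
      unsigned num = (unsigned) (value % 100) * 2;
      value /= 100;
      first[pos] = digits[num + 1];
      first[pos - 1] = digits[num];
      pos -= 2;
    }
  if (value >= 10)
    {
      unsigned num = (unsigned) value * 2;
      first[1] = digits[num + 1];
      first[0] = digits[num];
    }
  else
    first[0] = (char) ('0' + value);
}

/* Format V into BUF, which must hold at least 21 bytes.  */
static void
format_ll (char *buf, long long v)
{
  assert (buf != NULL);
  /* Negate in unsigned arithmetic so LLONG_MIN does not overflow.  */
  unsigned long long uv = v < 0 ? 0ULL - (unsigned long long) v
                                : (unsigned long long) v;
  unsigned neg = v < 0;
  unsigned len = chars_len_10 (uv);
  if (neg)
    buf[0] = '-';
  to_chars_10 (buf + neg, len, uv);
  buf[neg + len] = '\0';
}

/* Return 1 if format_ll agrees with snprintf "%lld" for V.  */
static int
matches_printf (long long v)
{
  char mine[24], ref[24];
  format_ll (mine, v);
  snprintf (ref, sizeof ref, "%lld", v);
  return strcmp (mine, ref) == 0;
}
```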


However, this makes several other libstdc++ tests fail on aarch64:
   22_locale/money_get/get/char/5.cc execution test
   22_locale/money_get/get/wchar_t/5.cc execution test
   22_locale/num_get/get/char/39168.cc execution test
   22_locale/num_get/get/char/4.cc execution test
   22_locale/num_get/get/wchar_t/39168.cc execution test
   22_locale/num_get/get/wchar_t/4.cc execution test
   26_numerics/complex/inserters_extractors/char/1.cc execution test
   26_numerics/complex/inserters_extractors/wchar_t/1.cc execution test
   27_io/basic_istream/extractors_arithmetic/char/01.cc execution test
   27_io/basic_istream/extractors_arithmetic/char/12.cc execution test
   27_io/basic_istream/extractors_arithmetic/wchar_t/01.cc execution test
   27_io/basic_istream/extractors_arithmetic/wchar_t/12.cc execution test

but I guess that should be a separate bug report.

I'll add those flags to my automated tests.


Thanks.



Re: [AArch64] Use scvtf fbits option where appropriate

2019-06-18 Thread Richard Sandiford
Wilco Dijkstra  writes:
>  > +/* If X is a positive CONST_DOUBLE with a value that is the reciprocal of a
>  > +   power of 2 (i.e 1/2^n) return the number of float bits. e.g. for x==(1/2^n)
>  > +   return n. Otherwise return -1.  */
>  > +int
>  > +aarch64_fpconst_pow2_recip (rtx x)
>  > +{
>  > +  REAL_VALUE_TYPE r0;
>  > +
>  > +  if (!CONST_DOUBLE_P (x))
>  > +    return -1;
>  
>> CONST_DOUBLE can be used for things other than floating point.  You
>> should really check that the mode of the double is in class MODE_FLOAT.
>  
> Several other functions (eg aarch64_fpconst_pow_of_2) do the same since
> this function is only called with HF/SF/DF mode. We could add an assert for
> SCALAR_FLOAT_MODE_P (but then aarch64_fpconst_pow_of_2 should do
> the same).

IMO we should leave it as-is.  aarch64.h has:

#define TARGET_SUPPORTS_WIDE_INT 1

which makes it invalid to use CONST_DOUBLE for anything other than
floating-point constants.  The handling of CONST_DOUBLEs with integer
modes is effectively compiled out in key places so it would be very hard
to create one accidentally.  And even if somehow we did, it would fail
noisily in other ways.

So I think it would be redundant to assert that CONST_DOUBLE has a float
mode here, much like we (rightly) don't assert that CONST_VECTORs have
vector modes.

Thanks,
Richard


Re: [PATCH] PowerPC: Add 'prefix' to the 'isa' attribute

2019-06-18 Thread Segher Boessenkool
On Mon, Jun 17, 2019 at 08:04:42PM -0400, Michael Meissner wrote:
> --- gcc/config/rs6000/rs6000.md   (revision 272270)
> +++ gcc/config/rs6000/rs6000.md   (working copy)
> @@ -267,7 +267,9 @@ (define_attr "cpu"
>(const (symbol_ref "(enum attr_cpu) rs6000_tune")))
>  
>  ;; The ISA we implement.
> -(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9v,p9kf,p9tf" (const_string "any"))
> +(define_attr "isa"
> +  "any,p5,p6,p7,p7v,p8v,p9v,p9kf,p9tf,prefix"
> +  (const_string "any"))

Don't break the line unnecessarily.

"prefix" is not a good name.  Maybe use "futp" now, so it is in line with
the other isa values?  I.e. the "prefix" part of the "future" isa.  Or
"futx"?  Short names are important, keep it to 4 chars?

(The isa values are named after the processor that first implements them,
not the ISA versions, because 2.07 is much easier to get wrong than p8 is).


Segher


Re: [PATCH] True IPA reimplementation of IPA-SRA

2019-06-18 Thread Martin Liška
On 5/10/19 10:31 AM, Martin Jambor wrote:
> Thanks in advance for any questions, comments and suggestions,

Hi.

I have just one small note: I would appreciate a dbgcnt for the future.
I bet it will be useful for test-case reduction.

Thank you,
Martin


Re: [PATCH] Replace std::to_string for integers with optimized version

2019-06-18 Thread Christophe Lyon
On Mon, 17 Jun 2019 at 15:21, Jonathan Wakely  wrote:
>
> On 13/06/19 22:41 +0200, Christophe Lyon wrote:
> >Hi,
> >
> >
> >On Wed, 12 Jun 2019 at 16:54, Jonathan Wakely  wrote:
> >>
> >> The std::to_chars functions from C++17 can be used to implement
> >> std::to_string with much better performance than calling snprintf. Only
> >> the __detail::__to_chars_len and __detail::__to_chars_10 functions are
> >> needed for to_string, because it always outputs base 10 representations.
> >>
> >> The return type of __detail::__to_chars_10 should not be declared before
> >> C++17, so the function body is extracted into a new function that can be
> >> reused by to_string and __detail::__to_chars_10.
> >>
> >> The existing tests for to_chars rely on to_string to check for correct
> >> answers. Now that they use the same code, that doesn't actually ensure
> >> correctness, so add new tests for std::to_string that compare against
> >> printf output.
> >>
> >> * include/Makefile.am: Add new <bits/charconv.h> header.
> >> * include/Makefile.in: Regenerate.
> >> * include/bits/basic_string.h (to_string(int), to_string(unsigned))
> >> (to_string(long), to_string(unsigned long), to_string(long long))
> >> (to_string(unsigned long long)): Rewrite to use __to_chars_10_impl.
> >> * include/bits/charconv.h: New header.
> >> (__detail::__to_chars_len): Move here from <charconv>.
> >> (__detail::__to_chars_10_impl): New function extracted from
> >> __detail::__to_chars_10.
> >> * include/std/charconv (__cpp_lib_to_chars): Add, but comment out.
> >> (__to_chars_unsigned_type): New class template that reuses
> >> __make_unsigned_selector_base::__select to pick a type.
> >> (__unsigned_least_t): Redefine as __to_chars_unsigned_type::type.
> >> (__detail::__to_chars_len): Move to new header.
> >> (__detail::__to_chars_10): Add inline specifier. Move code doing the
> >> output to __detail::__to_chars_10_impl and call that.
> >> * include/std/version (__cpp_lib_to_chars): Add, but comment out.
> >> * testsuite/21_strings/basic_string/numeric_conversions/char/
> >> to_string.cc: Fix reference in comment. Remove unused variable.
> >> * testsuite/21_strings/basic_string/numeric_conversions/char/
> >> to_string_int.cc: New test.
> >>
> >> Tested x86_64-linux, committed to trunk.
> >>
> >
> >The new test to_string_int.cc fails on arm-none-eabi:
> >PASS: 21_strings/basic_string/numeric_conversions/char/to_string_int.cc (test for excess errors)
> >spawn /aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/utils/bin/qemu-wrapper.sh ./to_string_int.exe
> >/libstdc++-v3/testsuite/21_strings/basic_string/numeric_conversions/char/to_string_int.cc:105: void check_value(T) [with T = long long int]: Assertion 's == expected' failed.
> >FAIL: 21_strings/basic_string/numeric_conversions/char/to_string_int.cc execution test
>
> Does the target support "%lld" in printf?
>
> Does the .sum file show UNSUPPORTED for the existing
> 21_strings/basic_string/numeric_conversions/char/to_string.cc test?
>
No, it PASSes.

> Does the attached patch make the test show what fails?
I didn't try because I realized that it might be a problem with how I
configure newlib in my validation system.

After I added the same flags I use when I build manually,
to_string_int.cc now passes.

I added:
--enable-newlib-io-pos-args \
--enable-newlib-io-c99-formats \
--enable-newlib-io-long-long \
--enable-newlib-io-long-double \
to newlib's configure flags.

However, this makes several other libstdc++ tests fail on aarch64:
22_locale/money_get/get/char/5.cc execution test
22_locale/money_get/get/wchar_t/5.cc execution test
22_locale/num_get/get/char/39168.cc execution test
22_locale/num_get/get/char/4.cc execution test
22_locale/num_get/get/wchar_t/39168.cc execution test
22_locale/num_get/get/wchar_t/4.cc execution test
26_numerics/complex/inserters_extractors/char/1.cc execution test
26_numerics/complex/inserters_extractors/wchar_t/1.cc execution test
27_io/basic_istream/extractors_arithmetic/char/01.cc execution test
27_io/basic_istream/extractors_arithmetic/char/12.cc execution test
27_io/basic_istream/extractors_arithmetic/wchar_t/01.cc execution test
27_io/basic_istream/extractors_arithmetic/wchar_t/12.cc execution test

but I guess that should be a separate bug report.

I'll add those flags to my automated tests.

Thanks,

Christophe
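The approach described in the patch is to compute the number of decimal digits first, size the string once, and then write the digits from the end. It can be sketched as below. This is a simplified illustration with our own naming, not the libstdc++ code: `digits10` and `to_string_sketch` are hypothetical stand-ins for `__detail::__to_chars_len` and `__detail::__to_chars_10_impl`.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: number of base-10 digits in V (at least 1).
inline unsigned
digits10(unsigned long long v)
{
  unsigned n = 1;
  while (v >= 10)
    {
      v /= 10;
      ++n;
    }
  return n;
}

// Hypothetical sketch of to_string(long long) that never calls snprintf:
// allocate the exact length once, then fill the digits from the end.
inline std::string
to_string_sketch(long long val)
{
  const bool neg = val < 0;
  // Negate in unsigned arithmetic so that LLONG_MIN does not overflow.
  unsigned long long u = neg ? 0ULL - static_cast<unsigned long long>(val)
                             : static_cast<unsigned long long>(val);
  std::string s(static_cast<std::size_t>(neg) + digits10(u), '-');
  for (std::size_t pos = s.size(); pos > static_cast<std::size_t>(neg); u /= 10)
    s[--pos] = static_cast<char>('0' + u % 10);
  return s;  // s[0] keeps its '-' fill character when VAL is negative.
}
```

Checking such a function against `snprintf("%lld")`, as to_string_int.cc does, only works where the C library supports that conversion. That is the failure in the arm-none-eabi run reported in this thread.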

> I suspect the problem is that the test relies on snprintf to check the
> answers are correct, even though the actual library code doesn't need
> snprintf any longer. Previously the std::to_string functions were all
> guarded by _GLIBCXX_USE_C99_STDIO and so I'm guessing were not
> supported for arm-none-eabi. Now the overloads for integral types
> should work without any C library support, 

Re: [PATCH] Avoid undefined behaviour in std::byte operators (LWG 2950)

2019-06-18 Thread Jonathan Wakely

On 18/06/19 12:39 +0100, Jonathan Wakely wrote:

* include/c_global/cstddef (std::byte): Perform arithmetic operations
in unsigned int to avoid promotion (LWG 2950).

Tested x86_64-linux, committed to trunk.


I don't see any great need to backport this, because I don't think the
compiler or UBSan cares that the old definitions were technically
undefined.
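As we understand LWG 2950, the old wording had two problems. First, arithmetic on the unsigned char value happens after promotion to (signed) int. Second, an int result that does not fit in unsigned char (for example from ~ or a left shift) was converted directly to the enumeration, which was undefined under the rules of the time. This is a minimal sketch of the approach in the committed patch. It uses a hypothetical stand-in type, `sketch::byte`, so that it does not clash with `std::byte`. The sketch does the arithmetic in unsigned int, narrows to unsigned char, and only then converts to the enumeration:

```cpp
#include <cassert>
#include <type_traits>

namespace sketch
{
  // Stand-in for std::byte: scoped enum with a fixed underlying type.
  enum class byte : unsigned char {};

  template<typename I>
    using if_integral
      = typename std::enable_if<std::is_integral<I>::value, byte>::type;

  // Arithmetic in unsigned int (no promotion to signed int), then narrow to
  // unsigned char, then convert to the enum.  Shift counts must still be
  // smaller than the width of unsigned int.
  template<typename I>
    constexpr if_integral<I>
    operator<<(byte b, I shift) noexcept
    { return static_cast<byte>(static_cast<unsigned char>(static_cast<unsigned>(b) << shift)); }

  template<typename I>
    constexpr if_integral<I>
    operator>>(byte b, I shift) noexcept
    { return static_cast<byte>(static_cast<unsigned char>(static_cast<unsigned>(b) >> shift)); }

  constexpr byte
  operator|(byte l, byte r) noexcept
  { return static_cast<byte>(static_cast<unsigned char>(static_cast<unsigned>(l) | static_cast<unsigned>(r))); }

  constexpr byte
  operator~(byte b) noexcept
  { return static_cast<byte>(static_cast<unsigned char>(~static_cast<unsigned>(b))); }

  // Compound assignment written in terms of the binary operator.
  template<typename I>
    constexpr if_integral<I>&
    operator<<=(byte& b, I shift) noexcept
    { return b = b << shift; }
}
```

Defining the compound assignments via the binary operators, as the patch also does, keeps all the casts in one place.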



[PATCH] Avoid undefined behaviour in std::byte operators (LWG 2950)

2019-06-18 Thread Jonathan Wakely

* include/c_global/cstddef (std::byte): Perform arithmetic operations
in unsigned int to avoid promotion (LWG 2950).

Tested x86_64-linux, committed to trunk.


commit bfa356b2a9353d1f0b7ccc38f3787d5a4f3044ae
Author: redi 
Date:   Tue Jun 18 11:39:43 2019 +

Avoid undefined behaviour in std::byte operators (LWG 2950)

* include/c_global/cstddef (std::byte): Perform arithmetic operations
in unsigned int to avoid promotion (LWG 2950).

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@272415 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/include/c_global/cstddef b/libstdc++-v3/include/c_global/cstddef
index 8c779ec354d..c94c938f6f3 100644
--- a/libstdc++-v3/include/c_global/cstddef
+++ b/libstdc++-v3/include/c_global/cstddef
@@ -120,71 +120,53 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _IntegerType>
 using __byte_op_t = typename __byte_operand<_IntegerType>::__type;
 
-  template<typename _IntegerType>
-constexpr __byte_op_t<_IntegerType>&
-operator<<=(byte& __b, _IntegerType __shift) noexcept
-{ return __b = byte(static_cast(__b) << __shift); }
-
   template<typename _IntegerType>
 constexpr __byte_op_t<_IntegerType>
 operator<<(byte __b, _IntegerType __shift) noexcept
-{ return byte(static_cast(__b) << __shift); }
-
-  template<typename _IntegerType>
-constexpr __byte_op_t<_IntegerType>&
-operator>>=(byte& __b, _IntegerType __shift) noexcept
-{ return __b = byte(static_cast(__b) >> __shift); }
+{ return (byte)(unsigned char)((unsigned)__b << __shift); }
 
   template<typename _IntegerType>
 constexpr __byte_op_t<_IntegerType>
 operator>>(byte __b, _IntegerType __shift) noexcept
-{ return byte(static_cast(__b) >> __shift); }
-
-  constexpr byte&
-  operator|=(byte& __l, byte __r) noexcept
-  {
-return __l =
-  byte(static_cast(__l) | static_cast(__r));
-  }
+{ return (byte)(unsigned char)((unsigned)__b >> __shift); }
 
   constexpr byte
   operator|(byte __l, byte __r) noexcept
-  {
-return
-  byte(static_cast(__l) | static_cast(__r));
-  }
-
-  constexpr byte&
-  operator&=(byte& __l, byte __r) noexcept
-  {
-   return __l =
- byte(static_cast(__l) & static_cast(__r));
-  }
+  { return (byte)(unsigned char)((unsigned)__l | (unsigned)__r); }
 
   constexpr byte
   operator&(byte __l, byte __r) noexcept
-  {
-return
-  byte(static_cast(__l) & static_cast(__r));
-  }
-
-  constexpr byte&
-  operator^=(byte& __l, byte __r) noexcept
-  {
-return __l =
-  byte(static_cast(__l) ^ static_cast(__r));
-  }
+  { return (byte)(unsigned char)((unsigned)__l & (unsigned)__r); }
 
   constexpr byte
   operator^(byte __l, byte __r) noexcept
-  {
-return
-  byte(static_cast(__l) ^ static_cast(__r));
-  }
+  { return (byte)(unsigned char)((unsigned)__l ^ (unsigned)__r); }
 
   constexpr byte
   operator~(byte __b) noexcept
-  { return byte(~static_cast(__b)); }
+  { return (byte)(unsigned char)~(unsigned)__b; }
+
+  template<typename _IntegerType>
+constexpr __byte_op_t<_IntegerType>&
+operator<<=(byte& __b, _IntegerType __shift) noexcept
+{ return __b = __b << __shift; }
+
+  template<typename _IntegerType>
+constexpr __byte_op_t<_IntegerType>&
+operator>>=(byte& __b, _IntegerType __shift) noexcept
+{ return __b = __b >> __shift; }
+
+  constexpr byte&
+  operator|=(byte& __l, byte __r) noexcept
+  { return __l = __l | __r; }
+
+  constexpr byte&
+  operator&=(byte& __l, byte __r) noexcept
+  { return __l = __l & __r; }
+
+  constexpr byte&
+  operator^=(byte& __l, byte __r) noexcept
+  { return __l = __l ^ __r; }
 
  template<typename _IntegerType>
 constexpr _IntegerType


Re: [PATCH], PowerPC PR90822 (cleanup lfiwax, lfiwzx generation)

2019-06-18 Thread Segher Boessenkool
On Mon, Jun 17, 2019 at 05:24:37PM -0400, Michael Meissner wrote:
> I wrote the code to generate LFIWAX and LFIWZX originally for the power7 in the
> 2010 time frame.  At the time, we did not allow SImode to go into floating
> point and vector registers.  As part of the power9 work, we now allow SImode to
> go into FP/vector registers for 64-bit code targeting -mcpu=power8 or
> higher.  But we never went back and tweaked the LFIWAX/LFIWZX support.

Why do we allow it only in 64-bit mode?  I mean, it sounds like only
handling 64-bit mode causes us to have more code and more complexity
instead of less.

> I was writing code for a possible future PowerPC machine, and the new code
> added an attribute that caused some of the -mno-vsx tests to fail.  This was
> because the floatsi2_lfiwax and floatunssi2_lfiwzx patterns did not
> have a non-VSX alternative, and the attribute processing needed to process the
> alternatives before the first split pass.

I don't understand what you mean...  "attribute processing"?

> In general, the 32-bit code seems to generate a lot fewer instructions,
> including fewer lfiwax/lfiwzx instructions.  On power8/power9 32-bit code,
> there were more mtvsrwz/mtvsrwa instructions.

Interesting.  Is that caused by less register pressure?

> --- gcc/config/rs6000/rs6000.md   (revision 272166)
> +++ gcc/config/rs6000/rs6000.md   (working copy)

This patch is very hard to read.  It mixes insertions and deletions of
different definitions, where the only thing they have in common is some
braces or parens or whitespace usually.

Maybe more context (-U) helps, maybe whole-function mode is better (-W),
maybe something else.  It also sometimes helps to do things as a patch
series instead of as one patch.  Please experiment.

> +; On 32-bit systems, we need to have special versions of LFIWAX and LFIWZX because
> +; the sign/zero extend insns are not defined.

I don't understand what this means.

[ Deleted all "-" lines below, to make some sense of it. ]


> +(define_insn_and_split "lfiwax"

This could use a better name?  Why is it separate from extendsidi2 anyway?

> +  [(set (match_operand:DI 0 "gpc_reg_operand" "=d,wa,wa,v,v")
> + (unspec:DI [(match_operand:SI 1 "reg_or_indexed_operand" "Z,Z,r,v,v")]
>  UNSPEC_LFIWAX))]
>"TARGET_HARD_FLOAT && TARGET_LFIWAX"
>"@
> lfiwax %0,%y1
> lxsiwax %x0,%y1
> mtvsrwa %x0,%1
> +   vextsw2d %0,%1
> +   #"
> +  "&& reload_completed && TARGET_P8_VECTOR && !TARGET_P9_VECTOR
> +   && altivec_register_operand (operands[1], SImode)"

"&& reload_completed && which_alternative == 3" works fine for that; but
just "&& reload_completed" should work as well, this is the only alternative
with "#" template.

> +  [(const_int 0)]
>  {
>rtx dest = operands[0];
>rtx src = operands[1];
> +  int dest_regno = REGNO (dest);
> +  int src_regno = REGNO (src);
> +  rtx dest_v2di = gen_rtx_REG (V2DImode, dest_regno);
> +  rtx src_v4si = gen_rtx_REG (V4SImode, src_regno);
>  
> +  if (BYTES_BIG_ENDIAN)
>  {
> +  emit_insn (gen_altivec_vupkhsw (dest_v2di, src_v4si));
> +  emit_insn (gen_vsx_xxspltd_v2di (dest_v2di, dest_v2di, const1_rtx));
> +  DONE;
>  }
>else
> +{
> +  emit_insn (gen_altivec_vupklsw (dest_v2di, src_v4si));
> +  emit_insn (gen_vsx_xxspltd_v2di (dest_v2di, dest_v2di, const0_rtx));
> +  DONE;
> +}
>  }
> +  [(set_attr "type" "fpload,fpload,mffgpr,vecexts,vecexts")
> +   (set_attr "isa" "*,p8v,p8v,p9v,p8v")
> +   (set_attr "length" "*,*,*,*,8")])


> +;; Keep the SImode -> DImode conversion along with DImode -> SF/DFmode through
> +;; register allocation so that the register allocator generates a LFIWAX or
> +;; LXSIWAX instruction instead of a LWA instruction plus a MTVSRD* instruction
> +;; on power8 and LWA + STD + LFD on power7/power6 systems.
> +
> +;; LFIWAX LFIWAX LXSIWAX MTVSRWA VEXTSW2D VUPKLSW+SPLAT

Not sure what this line means?

> +;; The first alternative is to support -mno-vsx and -mcpu=power6.
> +(define_insn_and_split "floatsi2_lfiwax"
> +  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,wa,wa,wa,wa,wa")
> + (float:SFDF
> +  (match_operand:SI 1 "nonimmediate_operand" "Z,Z,Z,r,v,v")))
> +   (clobber (match_scratch:DI 2 "=d,d,v,wa,v,v"))]
> +  "TARGET_HARD_FLOAT && TARGET_LFIWAX && "
>"#"
> +  "&& reload_completed"
> +  [(match_dup 3)
> +   (set (match_dup 0)
> + (float:SFDF (match_dup 2)))]
>  {
>rtx src = operands[1];
> +  rtx tmp = operands[2];
>  
> +  operands[3] = (TARGET_DIRECT_MOVE_64BIT
> +  ? gen_extendsidi2 (tmp, src)
> +  : gen_lfiwax (tmp, src));
>  }
> +  [(set_attr "length" "8,8,8,8,8,12")
> +   (set_attr "type" "fpload,fpload,fpload,mffgpr,fp,fp")
> +   (set_attr "isa" "*,p7v,p8v,p8v,p9v,p8v")])

So this says to convert a SI to SF or DF, first sign-extend it to DImode
and then do the conversion?

> +;; LFIWZX LXSIWZX MTVSRWZ XXEXTRACTUW
> +;; The first alternative is to support 

Re: [AArch64] Use scvtf fbits option where appropriate

2019-06-18 Thread Wilco Dijkstra
Hi,

And a few more comments:

 > +/* If X is a positive CONST_DOUBLE with a value that is the reciprocal of a
 > +   power of 2 (i.e 1/2^n) return the number of float bits. e.g. for x==(1/2^n)
 > +   return n. Otherwise return -1.  */
 > +int
 > +aarch64_fpconst_pow2_recip (rtx x)
 > +{
 > +  REAL_VALUE_TYPE r0;
 > +
 > +  if (!CONST_DOUBLE_P (x))
 > +    return -1;
 
> CONST_DOUBLE can be used for things other than floating point.  You
> should really check that the mode of the double is in class MODE_FLOAT.
 
Several other functions (eg aarch64_fpconst_pow_of_2) do the same since
this function is only called with HF/SF/DF mode. We could add an assert for
SCALAR_FLOAT_MODE_P (but then aarch64_fpconst_pow_of_2 should do
the same).

 > +
 > +  r0 = *CONST_DOUBLE_REAL_VALUE (x);
 > +  if (exact_real_inverse (DFmode, &r0)
 > +  && !REAL_VALUE_NEGATIVE (r0))
 > +    {
 > + int ret = exact_log2 (real_to_integer (&r0));
 > + if (ret >= 1 && ret <= 31)
 > +   {
 > + return ret;
 > +   }

Redundant braces

 > + else
 > +   {
 > + return -1;
 > +   }

The else is redundant because...

 > +    }
 > +  return -1;

... of this.

 > +}
 > +
 >  /* If X is a vector of equal CONST_DOUBLE values and that value is
 > Y, return the aarch64_fpconst_pow_of_2 of Y.  Otherwise return -1.  */
 >  
 > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
 > index 526c7fb0dab..d9812aa238e 100644
 > --- a/gcc/config/aarch64/aarch64.md
 > +++ b/gcc/config/aarch64/aarch64.md
 > @@ -6016,6 +6016,44 @@
 >    [(set_attr "type" "f_cvtf2i")]
 >  )
 >  
 > +;; equal width integer to fp combine
 > +(define_insn "*aarch64_cvtf__2_mult"
 > +  [(set (match_operand:GPF 0 "register_operand" "=w,w")
 > + (mult:GPF (FLOATUORS:GPF
 > +    (match_operand: 1 "register_operand" "w,?r"))
 > +    (match_operand 2 "aarch64_fp_pow2_recip""Dt,Dt")))]
 
 > Missing mode on operand 2.  Missing white space between constraint and
 > predicate.

Yes, operand 2 should use GPF as well (it's odd that this doesn't at least give a warning).

Also, the indentation is off: the multiply operands should be indented to the
same level, so match_operand 1 should be indented further to the right.

Wilco
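Together, these review suggestions describe a simple check: accept only positive exact reciprocals of a power of two with n in 1..31, and return -1 directly on every other path, with no redundant braces or else. To keep it self-contained, the sketch below works on a host `double` via `frexp` rather than on an rtx with GCC's REAL_VALUE_TYPE routines. It illustrates the logic and is not a drop-in replacement. `fpconst_pow2_recip_sketch` is a hypothetical name. HF and SF constants are exactly representable as doubles, so the same test applies to them.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical host-side analogue of aarch64_fpconst_pow2_recip: return n if
   X == 1/2^n for some n in 1..31, otherwise -1.  */
static int
fpconst_pow2_recip_sketch (double x)
{
  int e;

  /* Reject zero, negative values, NaN and infinity up front.  */
  if (!(x > 0.0) || isinf (x))
    return -1;

  /* frexp splits X into m * 2^e with m in [0.5, 1); X is a power of two
     exactly when m == 0.5, and then X == 2^(e - 1) == 1/2^(1 - e).  */
  if (frexp (x, &e) != 0.5)
    return -1;

  int n = 1 - e;
  if (n >= 1 && n <= 31)
    return n;
  return -1;
}
```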

[RFC] Re: [RFC] operand_equal_p with valueization

2019-06-18 Thread Martin Liška
Hi.

It's been quite some time since the discussion started. Now it's time for me to refresh
IPA ICF, and I would like to integrate operand_equal_p with what I currently have in ICF
(::compare_operand).
I like the idea of a class that will provide both operand_equal_valueize and
hash_operand_valueize.
These will be implemented in func_checker and can provide basically the same thing
as what Honza suggested.

Reading the thread, I noticed Richi would prefer to use something like:

template <bool with_valueize>
int
operand_equal_p_1 (const_tree arg0, const_tree arg1, unsigned int flags,
   tree (*valueize)(tree))
{
#define VALUEIZE(op) (with_valueize && valueize) ? valueize (op) : op 
...
}

To be honest, it looks to me like merely an optimization that folds away the call
to valueize in the current operand_equal_p.

I'm sending a lightly tested pair of patches that factor out the abstraction
and adapt ICF to it.

I would appreciate feedback before I prepare a proper fix.
Thanks,
Martin
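For comparison with Richi's template-based proposal, the class-based design can be sketched on a toy expression type. In that design, func_checker overrides a virtual hook that the generic comparison consults. Everything here is hypothetical: `expr_node`, `operand_compare_sketch` and `icf_checker_sketch` are illustrative names. Returning -1 to mean "no decision, fall back to the structural comparison" is our assumption and is not taken from the patch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy expression: a declaration leaf ('d') or a binary '+' node.
struct expr_node
{
  char code;
  std::string name;                 // only used for 'd'
  const expr_node *op0 = nullptr;
  const expr_node *op1 = nullptr;
};

// Generic structural comparison with an overridable valueization hook.
class operand_compare_sketch
{
public:
  virtual ~operand_compare_sketch() = default;

  // Return 1 or 0 if the hook decides, -1 to use the default comparison.
  virtual int operand_equal_valueize(const expr_node *, const expr_node *)
  { return -1; }

  bool operand_equal_p(const expr_node *a, const expr_node *b)
  {
    if (a == b)
      return true;
    if (!a || !b || a->code != b->code)
      return false;
    int r = operand_equal_valueize(a, b);
    if (r != -1)
      return r == 1;
    if (a->code == 'd')
      return a->name == b->name;
    return operand_equal_p(a->op0, b->op0) && operand_equal_p(a->op1, b->op1);
  }
};

// ICF-like checker: two declarations match if they were paired beforehand
// (a stand-in for the declaration correspondence ICF builds).
class icf_checker_sketch : public operand_compare_sketch
{
public:
  std::map<std::string, std::string> decl_map;   // decl in f1 -> decl in f2

  int operand_equal_valueize(const expr_node *a, const expr_node *b) override
  {
    if (a->code != 'd')
      return -1;
    auto it = decl_map.find(a->name);
    return (it != decl_map.end() && it->second == b->name) ? 1 : 0;
  }
};
```

In this sketch the class-based form pays a virtual call for each operand pair visited. A `template <bool with_valueize>` variant would let the compiler drop the hook entirely when it is not wanted, which is the optimization referred to above.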
>From b864e44e14a86e9cc7ba494b7af687a0a3e74896 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Mon, 10 Jun 2019 14:34:15 +0200
Subject: [PATCH 2/2] Integrate that for IPA ICF.

---
 gcc/ipa-icf-gimple.c | 224 +--
 gcc/ipa-icf-gimple.h |   8 +-
 gcc/ipa-icf.c|   7 +-
 3 files changed, 79 insertions(+), 160 deletions(-)

diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c
index 0713e125898..6da3e19e317 100644
--- a/gcc/ipa-icf-gimple.c
+++ b/gcc/ipa-icf-gimple.c
@@ -324,6 +324,28 @@ func_checker::compare_memory_operand (tree t1, tree t2)
 /* Function compare for equality given trees T1 and T2 which
can be either a constant or a declaration type.  */
 
+bool
+func_checker::hash_operand_valueize (const_tree arg, inchash::hash ,
+ unsigned int flags)
+{
+  switch (TREE_CODE (arg))
+{
+case FUNCTION_DECL:
+case VAR_DECL:
+case LABEL_DECL:
+case PARM_DECL:
+case RESULT_DECL:
+case CONST_DECL:
+case SSA_NAME:
+  return true;
+
+default:
+  break;
+}
+
+  return false;
+}
+
 bool
 func_checker::compare_cst_or_decl (tree t1, tree t2)
 {
@@ -347,19 +369,6 @@ func_checker::compare_cst_or_decl (tree t1, tree t2)
   return true;
 case VAR_DECL:
   return return_with_debug (compare_variable_decl (t1, t2));
-case FIELD_DECL:
-  {
-	tree offset1 = DECL_FIELD_OFFSET (t1);
-	tree offset2 = DECL_FIELD_OFFSET (t2);
-
-	tree bit_offset1 = DECL_FIELD_BIT_OFFSET (t1);
-	tree bit_offset2 = DECL_FIELD_BIT_OFFSET (t2);
-
-	ret = compare_operand (offset1, offset2)
-	  && compare_operand (bit_offset1, bit_offset2);
-
-	return return_with_debug (ret);
-  }
 case LABEL_DECL:
   {
 	if (t1 == t2)
@@ -383,165 +392,66 @@ func_checker::compare_cst_or_decl (tree t1, tree t2)
 }
 }
 
-/* Function responsible for comparison of various operands T1 and T2.
-   If these components, from functions FUNC1 and FUNC2, are equal, true
-   is returned.  */
-
-bool
-func_checker::compare_operand (tree t1, tree t2)
+int
+func_checker::operand_equal_valueize (const_tree ct1, const_tree ct2, unsigned int)
 {
-  tree x1, x2, y1, y2, z1, z2;
-  bool ret;
-
-  if (!t1 && !t2)
-return true;
-  else if (!t1 || !t2)
-return false;
-
-  tree tt1 = TREE_TYPE (t1);
-  tree tt2 = TREE_TYPE (t2);
-
-  if (!func_checker::compatible_types_p (tt1, tt2))
-return false;
-
-  if (TREE_CODE (t1) != TREE_CODE (t2))
-return return_false ();
+  tree t1 = const_cast <tree> (ct1);
+  tree t2 = const_cast <tree> (ct2);
 
   switch (TREE_CODE (t1))
 {
-case CONSTRUCTOR:
+case FUNCTION_DECL:
+  /* All function decls are in the symbol table and known to match
+	 before we start comparing bodies.  */
+  return true;
+case VAR_DECL:
+  return return_with_debug (compare_variable_decl (t1, t2));
+case LABEL_DECL:
   {
-	unsigned length1 = CONSTRUCTOR_NELTS (t1);
-	unsigned length2 = CONSTRUCTOR_NELTS (t2);
-
-	if (length1 != length2)
-	  return return_false ();
-
-	for (unsigned i = 0; i < length1; i++)
-	  if (!compare_operand (CONSTRUCTOR_ELT (t1, i)->value,
-CONSTRUCTOR_ELT (t2, i)->value))
-	return return_false();
-
-	return true;
+	int *bb1 = m_label_bb_map.get (t1);
+	int *bb2 = m_label_bb_map.get (t2);
+	return return_with_debug (*bb1 == *bb2);
   }
-case ARRAY_REF:
-case ARRAY_RANGE_REF:
-  /* First argument is the array, second is the index.  */
-  x1 = TREE_OPERAND (t1, 0);
-  x2 = TREE_OPERAND (t2, 0);
-  y1 = TREE_OPERAND (t1, 1);
-  y2 = TREE_OPERAND (t2, 1);
-
-  if (!compare_operand (array_ref_low_bound (t1),
-			array_ref_low_bound (t2)))
-	return return_false_with_msg ("");
-  if (!compare_operand (array_ref_element_size (t1),
-			array_ref_element_size (t2)))
-	return return_false_with_msg ("");
-
-  if (!compare_operand (x1, x2))
-	return return_false_with_msg ("");
-  return compare_operand (y1, y2);
-case MEM_REF:
-  {
-	x1 = TREE_OPERAND (t1, 0);
-	

Re: Review Hashtable extract node API

2019-06-18 Thread Jonathan Wakely

On 18/06/19 07:52 +0200, François Dumont wrote:

A small regression noticed while merging.

We shouldn't keep on using a moved-from key_type instance.

OK to commit? Feel free to do it if you prefer; otherwise I'll do so at the end of the
European day.




diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index f5809c7443a..7e89e1b44c4 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -743,7 +743,8 @@ namespace __detail
std::tuple<>()
  };
  auto __pos
-   = __h->_M_insert_unique_node(__k, __bkt, __code, __node._M_node);
+   = __h->_M_insert_unique_node(__h->_M_extract()(__node._M_node->_M_v()),
+__bkt, __code, __node._M_node);
  __node._M_node = nullptr;
  return __pos->second;
}


I can't create an example where this causes a problem, because the key
passed to _M_insert_unique_node is never used. So it doesn't matter
that it's been moved from.

So I have to wonder why we just added the key parameter to that
function, if it's never used.

As far as I can tell, it would only be used for a non-default range
hash function, and I don't care about that. Frankly I find the
policy-based _Hashtable completely unmaintainable and I'd gladly get
rid of all of it that isn't needed for the std::unordered_xxx
containers. The non-standard extensions are not used by anybody,
apparently not tested properly (or this regression should have been
noticed) and make the code too complicated.

We're adding new parameters that have to be passed around even though
they're never used by 99.999% of users. No wonder the code is only
fast at -O3.
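The rule behind François's change is that once the key has been moved into the node, anything that still needs the key should read it back from the node. This is a minimal sketch of that pattern. `hnode`, `extract_key` and `make_node_and_hash` are hypothetical stand-ins for the _Hashtable node, `_M_extract()` and the insertion path, not libstdc++ code. As Jonathan points out, the key argument is never used in libstdc++, so this shows the principle rather than reproducing an observable bug.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a hashtable node holding a key/value pair.
struct hnode
{
  std::pair<const std::string, int> value;
};

// Hypothetical key extractor, playing the role of _M_extract().
inline const std::string&
extract_key(const hnode& n)
{ return n.value.first; }

struct node_and_hash
{
  std::unique_ptr<hnode> node;
  std::size_t code;
};

// Move the caller's key into a new node, then hash the key the node owns.
// Hashing K here instead would read a moved-from string, whose value is
// valid but unspecified.
inline node_and_hash
make_node_and_hash(std::string&& k, int mapped)
{
  std::unique_ptr<hnode> n(new hnode{{std::move(k), mapped}});
  const std::size_t code = std::hash<std::string>{}(extract_key(*n));
  return {std::move(n), code};
}
```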






Re: [PATCH] Handle '\0' in strcmp in RTL expansion (PR tree-optimization/90892).

2019-06-18 Thread Martin Liška
On 6/18/19 12:33 PM, Jakub Jelinek wrote:
> On Tue, Jun 18, 2019 at 12:27:31PM +0200, Martin Liška wrote:
>>> Oops.  The problematic case is then if the STRING_CST c_getstr finds
>>> is not NUL terminated (dunno if we ever construct that) or if
>>> string_size is smaller than string_length and there are no NULs in that
>>> size.
>>
>> The function always returns a null-terminated string:
>>
>> /* Return a pointer P to a NUL-terminated string representing the sequence
>>    of constant characters referred to by SRC (or a subsequence of such
>>    characters within it if SRC is a reference to a string plus some
>>    constant offset).  If STRLEN is non-null, store the number of bytes
>>    in the string constant including the terminating NUL char.  *STRLEN is
>>    typically strlen(P) + 1 in the absence of embedded NUL characters.  */
>>
>> const char *
>> c_getstr (tree src, unsigned HOST_WIDE_INT *strlen /* = NULL */)
>> {
>>   tree offset_node;
>>   tree mem_size;
>>
>> That said, the unconditional strnlen should be fine.
> 
> But *strlen it sets might be smaller.

No, for "A\0" you'll get *strlen == 3, but strlen (returned value) == 1.

> 
> I'd try say const char foo[5] = "foobar";
> or similar, or say stick gcc_assert in c_getstr where it is setting
> *strlen and gcc_assert (strnlen (to be returned value, *strlen) < *strlen);
> do a bootstrap/regtest with that and see if it ever triggers (or instead
> of assert failure log into a log file with "a" mode).
> 
> If not, there is no point to pass non-NULL second argument to c_getstr,
> you'd always just use strlen on the returned string.

There might be consumers of c_getstr (like folding of memcmp) that want to know
how long a string constant is, even though there's a NUL character in the middle
of the constant.

Martin

> 
>   Jakub
> 
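The point under discussion is the difference between the size of a string constant, which c_getstr reports through *STRLEN, and the C string length of the pointer it returns. Here is a small host-side illustration. `nul_within` is a hypothetical helper that tests the same condition as the proposed `strnlen (p, *strlen) < *strlen` check, using memchr:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: nonzero if the SIZE bytes at P contain a NUL,
   i.e. the same condition as strnlen (p, size) < size.  */
static int
nul_within (const char *p, size_t size)
{
  return memchr (p, '\0', size) != NULL;
}

/* "A\0" occupies 3 bytes (the 'A', the explicit NUL and the implicit one),
   so its size is 3 while strlen of it is 1.  */
static const char a_nul[] = "A\0";

/* Valid C (but not C++): when the array is exactly as long as the literal's
   characters, the terminating NUL is dropped, so this array contains none.  */
static const char no_nul[3] = "foo";
```

An array like `no_nul` has the shape of constant for which the bounded check would fail. The thread is trying to establish whether GCC ever hands such a STRING_CST to c_getstr.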



Re: [PATCH] aarch64: fix gcc.target/aarch64/pcs_attribute-2.c on non-gnu targets

2019-06-18 Thread Richard Earnshaw (lists)
On 07/06/2019 17:03, Szabolcs Nagy wrote:
> Move the ifunc symbol tests into a separate file with dg-require-ifunc,
> and add a base pcs ifunc symbol to the test for completeness.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-06-07  Szabolcs Nagy  
> 
>   * gcc.target/aarch64/pcs_attribute-2.c: Remove ifunc usage.
>   * gcc.target/aarch64/pcs_attribute-3.c: New test.
> 
> 

OK.

R.

> vpcsfix.diff
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/pcs_attribute-2.c b/gcc/testsuite/gcc.target/aarch64/pcs_attribute-2.c
> index d997f52921c..e85465f25fb 100644
> --- a/gcc/testsuite/gcc.target/aarch64/pcs_attribute-2.c
> +++ b/gcc/testsuite/gcc.target/aarch64/pcs_attribute-2.c
> @@ -53,21 +53,6 @@ ATTR void bar_def_vpcs (void)
>  {
>  }
>  
> -static void (*f_ifunc_resolver ()) (void)
> -{
> -  return (void (*)(void))f_local_vpcs;
> -}
> -
> -__attribute__ ((ifunc ("f_ifunc_resolver")))
> -ATTR void f_ifunc_vpcs (void);
> -
> -__attribute__ ((visibility ("hidden")))
> -__attribute__ ((ifunc ("f_ifunc_resolver")))
> -ATTR void f_hidden_ifunc_vpcs (void);
> -
> -__attribute__ ((ifunc ("f_ifunc_resolver")))
> -ATTR static void f_local_ifunc_vpcs (void);
> -
>  void (*refs_basepcs[]) (void) = {
>   f_undef_basepcs,
>   f_def_basepcs,
> @@ -86,9 +71,6 @@ void (*ATTR refs_vpcs[]) (void) = {
>   f_local_weakref_def_vpcs,
>   bar_undef_vpcs,
>   bar_def_vpcs,
> - f_ifunc_vpcs,
> - f_hidden_ifunc_vpcs,
> - f_local_ifunc_vpcs,
>  };
>  
>  /* Note: local symbols don't need .variant_pcs, but gcc generates it, so
> @@ -109,6 +91,3 @@ void (*ATTR refs_vpcs[]) (void) = {
>  /* { dg-final { scan-assembler-times {\.variant_pcs\tf_local_weakref_def_vpcs} 1 } } */
>  /* { dg-final { scan-assembler-times {\.variant_pcs\tf_undef_renamed_vpcs} 1 } } */
>  /* { dg-final { scan-assembler-times {\.variant_pcs\tf_def_renamed_vpcs} 1 } } */
> -/* { dg-final { scan-assembler-times {\.variant_pcs\tf_ifunc_vpcs} 1 } } */
> -/* { dg-final { scan-assembler-times {\.variant_pcs\tf_hidden_ifunc_vpcs} 1 } } */
> -/* { dg-final { scan-assembler-times {\.variant_pcs\tf_local_ifunc_vpcs} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/pcs_attribute-3.c b/gcc/testsuite/gcc.target/aarch64/pcs_attribute-3.c
> new file mode 100644
> index 000..8e306af660f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pcs_attribute-3.c
> @@ -0,0 +1,58 @@
> +/* { dg-do compile } */
> +/* { dg-require-ifunc "" } */
> +/* { dg-require-effective-target aarch64_variant_pcs } */
> +
> +/* Test that .variant_pcs is emitted for vector PCS symbol references.  */
> +
> +#define ATTR __attribute__ ((aarch64_vector_pcs))
> +
> +static void f_local_basepcs (void)
> +{
> +}
> +
> +static void (*f_ifunc_basepcs_resolver ()) (void)
> +{
> +  return (void (*)(void))f_local_basepcs;
> +}
> +
> +__attribute__ ((ifunc ("f_ifunc_basepcs_resolver")))
> +void f_ifunc_basepcs (void);
> +
> +ATTR static void f_local_vpcs (void)
> +{
> +}
> +
> +static void (*f_ifunc_vpcs_resolver ()) (void)
> +{
> +  return (void (*)(void))f_local_vpcs;
> +}
> +
> +__attribute__ ((ifunc ("f_ifunc_vpcs_resolver")))
> +ATTR void f_ifunc_vpcs (void);
> +
> +__attribute__ ((visibility ("hidden")))
> +__attribute__ ((ifunc ("f_ifunc_vpcs_resolver")))
> +ATTR void f_hidden_ifunc_vpcs (void);
> +
> +__attribute__ ((ifunc ("f_ifunc_vpcs_resolver")))
> +ATTR static void f_local_ifunc_vpcs (void);
> +
> +void (*refs_basepcs[]) (void) = {
> + f_ifunc_basepcs,
> +};
> +
> +void (*ATTR refs_vpcs[]) (void) = {
> + f_ifunc_vpcs,
> + f_hidden_ifunc_vpcs,
> + f_local_ifunc_vpcs,
> +};
> +
> +/* Note: local symbols don't need .variant_pcs, but gcc generates it, so
> +   we check them here.  */
> +
> +/* { dg-final { scan-assembler-not {\.variant_pcs\tf_local_basepcs} } } */
> +/* { dg-final { scan-assembler-not {\.variant_pcs\tf_ifunc_basepcs} } } */
> +/* { dg-final { scan-assembler-times {\.variant_pcs\tf_local_vpcs} 1 } } */
> +/* { dg-final { scan-assembler-times {\.variant_pcs\tf_ifunc_vpcs} 1 } } */
> +/* { dg-final { scan-assembler-times {\.variant_pcs\tf_hidden_ifunc_vpcs} 1 } } */
> +/* { dg-final { scan-assembler-times {\.variant_pcs\tf_local_ifunc_vpcs} 1 } } */
> 



Re: [AArch64] Use scvtf fbits option where appropriate

2019-06-18 Thread Richard Earnshaw (lists)
On 18/06/2019 10:11, Joel Hutton wrote:
> Hi,
> 
> On 13/06/2019 18:26, Wilco Dijkstra wrote:
>> Wouldn't it be easier to just do exact_log2 (real_to_integer (&r0))
>> and then check the range is in 1..31?
> I've revised this section.
>> --- a/gcc/config/aarch64/aarch64.md
>> +++ b/gcc/config/aarch64/aarch64.md
>> @@ -6016,6 +6016,40 @@
>> [(set_attr "type" "f_cvtf2i")]
>>   )
>>   
>> +(define_insn "*aarch64_cvtf__2_mult"
>> +  [(set (match_operand:GPF 0 "register_operand" "=w,w")
>> +(mult:GPF (FLOATUORS:GPF
>> +   (match_operand: 1 "register_operand" "w,?r"))
>> +   (match_operand 2 "aarch64_fp_pow2_recip""Dt,Dt")))]
>>
>> We should add a comment before both define_insn similar to the other
>> conversions, explaining what they do and why there are 2 separate patterns
>> (the default versions of the conversions appear to be missing a comment too).
> I've added comments to the new and existing patterns
> 
> 
> 0001-SCVTF-fbits.patch
> 
> From 5a9dfa6c6eb1c5b9c8c464780b7098058989d472 Mon Sep 17 00:00:00 2001
> From: Joel Hutton 
> Date: Thu, 13 Jun 2019 11:08:56 +0100
> Subject: [PATCH] SCVTF fbits
> 
> ---
>  gcc/config/aarch64/aarch64-protos.h   |   1 +
>  gcc/config/aarch64/aarch64.c  |  28 
>  gcc/config/aarch64/aarch64.md |  39 +
>  gcc/config/aarch64/constraints.md |   7 +
>  gcc/config/aarch64/predicates.md  |   4 +
>  gcc/testsuite/gcc.target/aarch64/fmul_scvtf.c | 140 ++
>  6 files changed, 219 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/fmul_scvtf.c
> 
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index 1e3b1c91db1..ad1ba458a3f 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -494,6 +494,7 @@ enum aarch64_symbol_type aarch64_classify_tls_symbol (rtx);
>  enum reg_class aarch64_regno_regclass (unsigned);
>  int aarch64_asm_preferred_eh_data_format (int, int);
>  int aarch64_fpconst_pow_of_2 (rtx);
> +int aarch64_fpconst_pow2_recip (rtx);
>  machine_mode aarch64_hard_regno_caller_save_mode (unsigned, unsigned,
>  machine_mode);
>  int aarch64_uxt_size (int, HOST_WIDE_INT);
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 9a035dd9ed8..424ca6c9932 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -18707,6 +18707,34 @@ aarch64_fpconst_pow_of_2 (rtx x)
>return exact_log2 (real_to_integer (r));
>  }
>  
> +/* If X is a positive CONST_DOUBLE with a value that is the reciprocal of a
> +   power of 2 (i.e 1/2^n) return the number of float bits. e.g. for x==(1/2^n)
> +   return n. Otherwise return -1.  */
> +int
> +aarch64_fpconst_pow2_recip (rtx x)
> +{
> +  REAL_VALUE_TYPE r0;
> +
> +  if (!CONST_DOUBLE_P (x))
> +    return -1;

CONST_DOUBLE can be used for things other than floating point.  You
should really check that the mode of the double is in class MODE_FLOAT.

> +
> +  r0 = *CONST_DOUBLE_REAL_VALUE (x);
> +  if (exact_real_inverse (DFmode, )
> +      && !REAL_VALUE_NEGATIVE (r0))
> +    {
> +      int ret = exact_log2 (real_to_integer ());
> +      if (ret >= 1 && ret <= 31)
> +        {
> +          return ret;
> +        }
> +      else
> +        {
> +          return -1;
> +        }
> +    }
> +  return -1;
> +}
> +
>  /* If X is a vector of equal CONST_DOUBLE values and that value is
> Y, return the aarch64_fpconst_pow_of_2 of Y.  Otherwise return -1.  */
>  
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 526c7fb0dab..d9812aa238e 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -6016,6 +6016,44 @@
>[(set_attr "type" "f_cvtf2i")]
>  )
>  
> +;; equal width integer to fp combine
> +(define_insn "*aarch64_cvtf__2_mult"
> +  [(set (match_operand:GPF 0 "register_operand" "=w,w")
> + (mult:GPF (FLOATUORS:GPF
> +(match_operand: 1 "register_operand" "w,?r"))
> +(match_operand 2 "aarch64_fp_pow2_recip""Dt,Dt")))]

Missing mode on operand 2.  Missing white space between constraint and
predicate.

> +  "TARGET_FLOAT"
> +  {
> +    operands[2] = GEN_INT (aarch64_fpconst_pow2_recip (operands[2]));
> +    switch (which_alternative)
> +    {
> +      case 0:
> +        return "cvtf\t%0, %1, #%2";
> +      case 1:
> +        return "cvtf\t%0, %1, #%2";
> +      default:
> +        gcc_unreachable();
> +    }
> +  }
> +  [(set_attr "type" "neon_int_to_fp_,f_cvti2f")
> +   (set_attr "arch" "simd,fp")]
> +)
> +
> +;; inequal width integer to fp combine
> +(define_insn "*aarch64_cvtf__2_mult"
> +  [(set (match_operand:GPF 0 "register_operand" "=w")
> + (mult:GPF (FLOATUORS:GPF
> +(match_operand: 1 "register_operand" "r"))
> +(match_operand 2 "aarch64_fp_pow2_recip" "Dt")))]

Likewise.

> +  "TARGET_FLOAT"

Re: [PATCH] Handle '\0' in strcmp in RTL expansion (PR tree-optimization/90892).

2019-06-18 Thread Jakub Jelinek
On Tue, Jun 18, 2019 at 12:27:31PM +0200, Martin Liška wrote:
> > Oops.  The problematic case is then if the STRING_CST c_getstr finds
> > is not NUL terminated (dunno if we ever construct that) or if
> > string_size is smaller than string_length and there are no NULs in that
> > size.
> 
> The function always returns a null-terminated string:
> 
> /* Return a pointer P to a NUL-terminated string representing the sequence
>    of constant characters referred to by SRC (or a subsequence of such
>    characters within it if SRC is a reference to a string plus some
>    constant offset).  If STRLEN is non-null, store the number of bytes
>    in the string constant including the terminating NUL char.  *STRLEN is
>    typically strlen(P) + 1 in the absence of embedded NUL characters.  */
>
> const char *
> c_getstr (tree src, unsigned HOST_WIDE_INT *strlen /* = NULL */)
> {
>   tree offset_node;
>   tree mem_size;
> 
> That said, the unconditional strnlen should be fine.

But *strlen it sets might be smaller.

I'd try, say, const char foo[5] = "foobar";
or similar, or say stick gcc_assert in c_getstr where it is setting
*strlen and gcc_assert (strnlen (to be returned value, *strlen) < *strlen);
do a bootstrap/regtest with that and see if it ever triggers (or instead
of assert failure log into a log file with "a" mode).

If not, there is no point to pass non-NULL second argument to c_getstr,
you'd always just use strlen on the returned string.

Jakub


Re: [PATCH] Handle '\0' in strcmp in RTL expansion (PR tree-optimization/90892).

2019-06-18 Thread Martin Liška
On 6/18/19 12:16 PM, Jakub Jelinek wrote:
> On Tue, Jun 18, 2019 at 11:56:41AM +0200, Martin Liška wrote:
>> On 6/18/19 10:23 AM, Martin Liška wrote:
>>> On 6/18/19 10:11 AM, Jakub Jelinek wrote:
 On Tue, Jun 18, 2019 at 10:07:50AM +0200, Martin Liška wrote:
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index 3463ffb1539..b58e1e58d4d 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -7142,6 +7142,20 @@ inline_expand_builtin_string_cmp (tree exp, rtx target)
>const char *src_str1 = c_getstr (arg1, );
>const char *src_str2 = c_getstr (arg2, );
>  
> +  if (src_str1 != NULL)
> +{
> +  unsigned HOST_WIDE_INT str_str1_strlen = strnlen (src_str1, len1);
> +  if (str_str1_strlen + 1 < len1)
> + len1 = str_str1_strlen + 1;

 You really don't need any of this after strnlen.  strnlen is already
 guaranteed to return a number from 0 to len1 inclusive, so you can really
 just do:
   if (src_str1 != NULL)
 len1 = strnlen (src_str1, len1);

Jakub

>>>
>>> Got it, I'm testing that.
>>>
>>> Martin
>>>
>>
>> Ok, there's an off-by-one error in the previous patch candidate.
>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>>
>> Ready to be installed?
>> Thanks,
>> Martin
> 
>> >From fe4ef7d43c506f450de802a7d8a602fab7da4545 Mon Sep 17 00:00:00 2001
>> From: Martin Liska 
>> Date: Mon, 17 Jun 2019 10:39:15 +0200
>> Subject: [PATCH] Handle '\0' in strcmp in RTL expansion (PR
>>  tree-optimization/90892).
>>
>> gcc/ChangeLog:
>>
>> 2019-06-17  Martin Liska  
>>
>>  PR tree-optimization/90892
>>  * builtins.c (inline_expand_builtin_string_cmp): Handle '\0'
>>  in string constants.
> 
> Oops.  The problematic case is then if the STRING_CST c_getstr finds
> is not NUL terminated (dunno if we ever construct that) or if
> string_size is smaller than string_length and there are no NULs in that
> size.

The function always returns a null-terminated string:

/* Return a pointer P to a NUL-terminated string representing the sequence
   of constant characters referred to by SRC (or a subsequence of such
   characters within it if SRC is a reference to a string plus some
   constant offset).  If STRLEN is non-null, store the number of bytes
   in the string constant including the terminating NUL char.  *STRLEN is
   typically strlen(P) + 1 in the absence of embedded NUL characters.  */

const char *
c_getstr (tree src, unsigned HOST_WIDE_INT *strlen /* = NULL */)
{
  tree offset_node;
  tree mem_size;

That said, the unconditional strnlen should be fine.

Martin

> With your patch that would mean setting len1 or len2 one larger than needed.
> 
> Looking at the function, I think we want to do more changes.
> 
> I think any such length changes should be moved after the two punt checks.
> Move also the len3 setting before the new checks (of course conditional on
> is_ncmp).
> Sorry for the earlier advice, you indeed should store the strnlen result
> into a different variable, and through that you can easily differentiate between
> when the string is NUL terminated and when it is not (if strnlen < lenN,
> then NUL terminated).
> If !is_ncmp, I think you should punt for not NUL terminated strings
> (return NULL_RTX).
> If is_ncmp and not NUL terminated, you should punt if len3 is bigger than
> len{1,2}.
> If NUL terminated, you should set lenN to the strnlen returned value + 1.
> 
> Does this sound reasonable?
> 
>   Jakub
> 



Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization

2019-06-18 Thread Martin Liška
On 6/18/19 3:45 AM, Xiong Hu Luo wrote:
> 6.2.  SPEC2017 peakrate:
> 523.xalancbmk_r (+4.87%); 538.imagick_r (+4.59%); 511.povray_r 
> (+13.33%);
> 525.x264_r (-5.29%).

Can you please elaborate on which key indirect call promotions are needed
to achieve such a significant speedup? Are we talking about calls to virtual
functions or C-style indirect calls?

Thanks,
Martin


Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization

2019-06-18 Thread Martin Liška
On 6/18/19 12:07 PM, Segher Boessenkool wrote:
> On Tue, Jun 18, 2019 at 11:34:03AM +0200, Martin Liška wrote:
>> I've got it. So it's a situation where you have a distribution equal to 50%
>> and 50%. Note that it's the only valid situation where both edges will be >= 50%.
>> That's the threshold for which we speculatively devirtualize edges.
> 
> But that 50% is a magic number, isn't it?

Yes :) Apparently LLVM does that for probability >= 30%:
https://code.woboq.org/llvm/llvm/lib/Analysis/IndirectCallPromotionAnalysis.cpp.html#36

>  Maybe 20% works better, and
> then you need a top5 (in the worst case).

I would then generalize to N, for now I'm waiting for Honza.

Martin

> 
> 
> Segher
> 



Re: [PATCH] Handle '\0' in strcmp in RTL expansion (PR tree-optimization/90892).

2019-06-18 Thread Jakub Jelinek
On Tue, Jun 18, 2019 at 11:56:41AM +0200, Martin Liška wrote:
> On 6/18/19 10:23 AM, Martin Liška wrote:
> > On 6/18/19 10:11 AM, Jakub Jelinek wrote:
> >> On Tue, Jun 18, 2019 at 10:07:50AM +0200, Martin Liška wrote:
> >>> diff --git a/gcc/builtins.c b/gcc/builtins.c
> >>> index 3463ffb1539..b58e1e58d4d 100644
> >>> --- a/gcc/builtins.c
> >>> +++ b/gcc/builtins.c
> >>> @@ -7142,6 +7142,20 @@ inline_expand_builtin_string_cmp (tree exp, rtx target)
> >>>const char *src_str1 = c_getstr (arg1, );
> >>>const char *src_str2 = c_getstr (arg2, );
> >>>  
> >>> +  if (src_str1 != NULL)
> >>> +{
> >>> +  unsigned HOST_WIDE_INT str_str1_strlen = strnlen (src_str1, len1);
> >>> +  if (str_str1_strlen + 1 < len1)
> >>> + len1 = str_str1_strlen + 1;
> >>
> >> You really don't need any of this after strnlen.  strnlen is already
> >> guaranteed to return a number from 0 to len1 inclusive, so you can really
> >> just do:
> >>   if (src_str1 != NULL)
> >> len1 = strnlen (src_str1, len1);
> >>
> >>Jakub
> >>
> > 
> > Got it, I'm testing that.
> > 
> > Martin
> > 
> 
> Ok, there's an off-by-one error in the previous patch candidate.
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin

> >From fe4ef7d43c506f450de802a7d8a602fab7da4545 Mon Sep 17 00:00:00 2001
> From: Martin Liska 
> Date: Mon, 17 Jun 2019 10:39:15 +0200
> Subject: [PATCH] Handle '\0' in strcmp in RTL expansion (PR
>  tree-optimization/90892).
> 
> gcc/ChangeLog:
> 
> 2019-06-17  Martin Liska  
> 
>   PR tree-optimization/90892
>   * builtins.c (inline_expand_builtin_string_cmp): Handle '\0'
>   in string constants.

Oops.  The problematic case is then if the STRING_CST c_getstr finds
is not NUL terminated (dunno if we ever construct that) or if
string_size is smaller than string_length and there are no NULs in that
size.
With your patch that would mean setting len1 or len2 one larger than needed.

Looking at the function, I think we want to do more changes.

I think any such length changes should be moved after the two punt checks.
Move also the len3 setting before the new checks (of course conditional on
is_ncmp).
Sorry for the earlier advice, you indeed should store the strnlen result
into a different variable, and through that you can easily differentiate between
when the string is NUL terminated and when it is not (if strnlen < lenN,
then NUL terminated).
If !is_ncmp, I think you should punt for not NUL terminated strings
(return NULL_RTX).
If is_ncmp and not NUL terminated, you should punt if len3 is bigger than
len{1,2}.
If NUL terminated, you should set lenN to the strnlen returned value + 1.

Does this sound reasonable?

Jakub


Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization

2019-06-18 Thread Segher Boessenkool
On Tue, Jun 18, 2019 at 11:34:03AM +0200, Martin Liška wrote:
> I've got it. So it's a situation where you have a distribution equal to 50%
> and 50%. Note that it's the only valid situation where both edges will be >= 50%.
> That's the threshold for which we speculatively devirtualize edges.

But that 50% is a magic number, isn't it?  Maybe 20% works better, and
then you need a top5 (in the worst case).


Segher


Re: [PATCH] Handle '\0' in strcmp in RTL expansion (PR tree-optimization/90892).

2019-06-18 Thread Martin Liška
On 6/18/19 10:23 AM, Martin Liška wrote:
> On 6/18/19 10:11 AM, Jakub Jelinek wrote:
>> On Tue, Jun 18, 2019 at 10:07:50AM +0200, Martin Liška wrote:
>>> diff --git a/gcc/builtins.c b/gcc/builtins.c
>>> index 3463ffb1539..b58e1e58d4d 100644
>>> --- a/gcc/builtins.c
>>> +++ b/gcc/builtins.c
>>> @@ -7142,6 +7142,20 @@ inline_expand_builtin_string_cmp (tree exp, rtx target)
>>>const char *src_str1 = c_getstr (arg1, );
>>>const char *src_str2 = c_getstr (arg2, );
>>>  
>>> +  if (src_str1 != NULL)
>>> +{
>>> +  unsigned HOST_WIDE_INT str_str1_strlen = strnlen (src_str1, len1);
>>> +  if (str_str1_strlen + 1 < len1)
>>> +   len1 = str_str1_strlen + 1;
>>
>> You really don't need any of this after strnlen.  strnlen is already
>> guaranteed to return a number from 0 to len1 inclusive, so you can really
>> just do:
>>   if (src_str1 != NULL)
>> len1 = strnlen (src_str1, len1);
>>
>>  Jakub
>>
> 
> Got it, I'm testing that.
> 
> Martin
> 

Ok, there's an off-by-one error in the previous patch candidate.
Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin
>From fe4ef7d43c506f450de802a7d8a602fab7da4545 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Mon, 17 Jun 2019 10:39:15 +0200
Subject: [PATCH] Handle '\0' in strcmp in RTL expansion (PR
 tree-optimization/90892).

gcc/ChangeLog:

2019-06-17  Martin Liska  

	PR tree-optimization/90892
	* builtins.c (inline_expand_builtin_string_cmp): Handle '\0'
	in string constants.

gcc/testsuite/ChangeLog:

2019-06-17  Martin Liska  

	PR tree-optimization/90892
	* gcc.dg/pr90892.c: New test.
---
 gcc/builtins.c |  6 ++
 gcc/testsuite/gcc.dg/pr90892.c | 14 ++
 2 files changed, 20 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr90892.c

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 3463ffb1539..78a4bec9bd0 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -7142,6 +7142,12 @@ inline_expand_builtin_string_cmp (tree exp, rtx target)
   const char *src_str1 = c_getstr (arg1, );
   const char *src_str2 = c_getstr (arg2, );
 
+  if (src_str1 != NULL)
+    len1 = strnlen (src_str1, len1) + 1;
+
+  if (src_str2 != NULL)
+    len2 = strnlen (src_str2, len2) + 1;
+
   /* If neither strings is constant string, the call is not qualify.  */
   if (!src_str1 && !src_str2)
 return NULL_RTX;
diff --git a/gcc/testsuite/gcc.dg/pr90892.c b/gcc/testsuite/gcc.dg/pr90892.c
new file mode 100644
index 000..e4b5310807a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr90892.c
@@ -0,0 +1,14 @@
+/* PR tree-optimization/90892 */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+const char *a = "A\0b";
+
+int
+main()
+{
+  if (__builtin_strncmp(a, "A\0", 2) != 0)
+    __builtin_abort ();
+
+  return 0;
+}
-- 
2.21.0



Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization

2019-06-18 Thread Martin Liška
On 6/18/19 11:02 AM, luoxhu wrote:
> Hi,
> 
> On 2019/6/18 13:51, Martin Liška wrote:
>> On 6/18/19 3:45 AM, Xiong Hu Luo wrote:
>>
>> Hello.
>>
>> Thank you for the interest in the area.
>>
>>> This patch aims to fix PR69678 caused by PGO indirect call profiling bugs.
>>> Currently the default instrument function can only find the indirect function
>>> that called more than 50% with an incorrect count number returned.
>> Can you please explain what you mean by 'an incorrect count number returned'?
> 
> For a test case indir-call-topn.c, it includes 2 indirect calls, "one" and
> "two". The profiling data is as below with trunk code (including your patch,
> count[0] and count[2] are switched by your code; count[0] is used in
> ipa-profile, which only supports the top1 format, and my patch adds support
> for the topn format). count[0] was incorrect: WITHOUT your patch it is 0, and
> things got better with your fix as count[0] is 35000, but it is still not
> correct. In fact, "one" is running 17500 times, and "two" is running the
> other 17500 times:
> 
> indir-call-topn.gcda:   22:    01a9:  18:COUNTERS indirect_call 9 counts
> indir-call-topn.gcda:   24:   0: *35000 1868707024 0* 0 0 0 0 0
> 
> Running with "--param indir-call-topn-profile=1" gives the profile data
> below. My patch is based on this profile result and does the optimization
> for multiple indirect targets; performance improves significantly on this
> testcase and on some SPEC2017 benchmarks (LLVM already supported this
> several years ago...).
> 
> indir-call-topn.gcda:   26:    01b1:  18:COUNTERS indirect_call_topn 9 counts
> indir-call-topn.gcda:   28:   0: *0 969338501 17500 1868707024 17500* 0 0 0
> 
> 
> test case indir-call-topn.c:
> 
> #include 
> 
> 
> typedef int (*fptr) (int);
> int
> one (int a)
> {
>   return 1;
> }
> 
> int
> two (int a)
> {
>   return 0;
> }
> 
> fptr table[] = {, };
> 
> int
> main()
> {
>   int i, x;
>   fptr p = 
> 
>   one (3);
> 
>   for (i = 0; i < 35000; i++)
>     {
>   x = (*p) (3);
>   p = table[x];
>     }
>   printf ("done:%d\n", x);
> }

I've got it. So it's a situation where you have a distribution equal to 50% and
50%. Note that it's the only valid situation where both edges will be >= 50%.
That's the threshold for which we speculatively devirtualize edges. That said,
you don't need a generic topn counter, but probably only a top2 counter, which
can be generalized from the single-value counter type. I'm saying that because
I removed the TOPN, mainly due to:
https://github.com/gcc-mirror/gcc/commit/5cb221f2b9c268df47c97b4837230b15e65f9c14#diff-d003c64ae14449d86df03508de98bde7L179

which is an over-complicated profiling function. And the changes that I've done
recently are motivated by preserving stable builds. That's achieved by noticing
that a single-value counter can't handle all
seen values.

> 
>>
>>> This patch leverages the "--param indir-call-topn-profile=1" and enables
>>> multiple indirect
>> Note that I removed indir-call-topn-profile last week, so the patch will not
>> apply on current trunk. However, I can help you adapt single-value counters
>> to support tracking of multiple values.
> 
> It will be very useful if you help me track multiple values similarly in
> trunk code. I will rebase onto your code once topn is ready again. Actually,
> topn is more general and includes top1; I thought that top1 should be
> removed instead of topn, though topn takes longer than top1 in
> profile-generate.

As mentioned earlier, I really don't want to put TOPN back. I can help you once
Honza agrees with the general IPA changes.

> 
>>
>>> targets profiling and use in LTO-WPA and LTO-LTRANS stage, as a result,
>>> function specialization, profiling, partial devirtualization, inlining and
>>> cloning could be done successfully based on it.
>> This decision is definitely big question for Honza?
>>
>>> Performance can get improved 3x (1.7 sec -> 0.4 sec) on simple tests.
>>> Details are:
>>>    1.  When do PGO with indir-call-topn-profile, the gcda data format is not
>>>    supported in ipa-profile pass,
>> If you take a look at gcc/ipa-profile.c:195 you can see how the probability
>> is propagated to IPA passes. Why is that not sufficient?
> 
> The current code only supports a single indirect target; I need to track
> multiple indirect targets and create multiple speculative edges on a single
> indirect call statement.
>
> What's more, many ICEs happened in later stages due to the single speculative
> target design; part of this patch solves the ICEs in handling multiple
> speculative target edges.

Well, to be honest I don't like the patch much. It brings another level of
complexity for a quite rare situation where one calls 2 functions via an
indirect call. And as mentioned, current IPA optimizations are not happy about
multiple indirect branches.

Martin

> 
> 

[nvptx, committed] Use define_insn parametrization

2019-06-18 Thread Tom de Vries
[ was: Re: [nvptx] Fix missing mode warnings in nvptx.md, omp part ]

On 17-06-19 17:09, Jakub Jelinek wrote:
> On Mon, Jun 17, 2019 at 04:53:24PM +0200, Tom de Vries wrote:
>> Updated accordingly, and committed as attached.
> 
> Note, current trunk allows one to define expanders that take mode as the
> first argument, so you could
> (define_insn "@set_softstack_"
>   [(unspec [(match_operand:P 0 "nvptx_register_operand" "R")]
>   UNSPEC_SET_SOFTSTACK)]
>   "TARGET_SOFT_STACK"
> ...
> and then just use gen_set_softstack (Pmode, arg).

Thanks, that's useful.

Committed as attached.

Thanks,
- Tom
[nvptx] Use define_insn parametrization

Parametrize some define_insn to simplify code in define_expands generating
those insns.

Build and reg-tested on x86_64 with nvptx accelerator.

2019-06-18  Tom de Vries  

	* config/nvptx/nvptx-protos.h (gen_set_softstack_insn): Remove.
	* config/nvptx/nvptx.c (gen_set_softstack_insn): Remove.
	* config/nvptx/nvptx.md (define_insn "set_softstack_"): Rename to ...
	(define_insn "@set_softstack_"): ... this.
	(define_insn "omp_simt_enter_"): Rename to ...
	(define_insn "@omp_simt_enter_"): ... this.
	(define_insn "omp_simt_exit_"): Rename to ...
	(define_insn "@omp_simt_exit_"): ... this.

---
 gcc/config/nvptx/nvptx-protos.h |  1 -
 gcc/config/nvptx/nvptx.c| 12 
 gcc/config/nvptx/nvptx.md   | 30 +-
 3 files changed, 9 insertions(+), 34 deletions(-)

diff --git a/gcc/config/nvptx/nvptx-protos.h b/gcc/config/nvptx/nvptx-protos.h
index 061897a3921..be09a15e49c 100644
--- a/gcc/config/nvptx/nvptx-protos.h
+++ b/gcc/config/nvptx/nvptx-protos.h
@@ -57,6 +57,5 @@ extern const char *nvptx_output_set_softstack (unsigned);
 extern const char *nvptx_output_simt_enter (rtx, rtx, rtx);
 extern const char *nvptx_output_simt_exit (rtx);
 extern const char *nvptx_output_red_partition (rtx, rtx);
-extern rtx gen_set_softstack_insn (rtx);
 #endif
 #endif
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index aa4a67fbead..c53a1ae9f26 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -112,18 +112,6 @@ enum nvptx_data_area
   DATA_AREA_MAX
 };
 
-rtx
-gen_set_softstack_insn (rtx op)
-{
-  gcc_assert (GET_MODE (op) == Pmode);
-  if (GET_MODE (op) == DImode)
-    return gen_set_softstack_di (op);
-  else if (GET_MODE (op) == SImode)
-    return gen_set_softstack_si (op);
-  else
-    gcc_unreachable ();
-}
-
 /*  We record the data area in the target symbol flags.  */
 #define SYMBOL_DATA_AREA(SYM) \
   (nvptx_data_area)((SYMBOL_REF_FLAGS (SYM) >> SYMBOL_FLAG_MACH_DEP_SHIFT) \
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 84c0ea45431..58a18fe21cf 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -1025,8 +1025,8 @@
   ""
 {
   if (TARGET_SOFT_STACK)
-emit_insn (gen_set_softstack_insn (gen_rtx_REG (Pmode,
-		SOFTSTACK_PREV_REGNUM)));
+emit_insn (gen_set_softstack (Pmode, gen_rtx_REG (Pmode,
+		  SOFTSTACK_PREV_REGNUM)));
   emit_jump_insn (gen_return ());
   DONE;
 })
@@ -1059,7 +1059,7 @@
 {
   emit_move_insn (stack_pointer_rtx,
 		  gen_rtx_MINUS (Pmode, stack_pointer_rtx, operands[1]));
-  emit_insn (gen_set_softstack_insn (stack_pointer_rtx));
+  emit_insn (gen_set_softstack (Pmode, stack_pointer_rtx));
   emit_move_insn (operands[0], virtual_stack_dynamic_rtx);
   DONE;
 }
@@ -1071,7 +1071,7 @@
   DONE;
 })
 
-(define_insn "set_softstack_"
+(define_insn "@set_softstack_"
   [(unspec [(match_operand:P 0 "nvptx_register_operand" "R")]
 	   UNSPEC_SET_SOFTSTACK)]
   "TARGET_SOFT_STACK"
@@ -1087,7 +1087,7 @@
   if (TARGET_SOFT_STACK)
 {
   emit_move_insn (operands[0], operands[1]);
-  emit_insn (gen_set_softstack_insn (operands[0]));
+  emit_insn (gen_set_softstack (Pmode, operands[0]));
 }
   DONE;
 })
@@ -1237,7 +1237,7 @@
 
 ;; Patterns for OpenMP SIMD-via-SIMT lowering
 
-(define_insn "omp_simt_enter_"
+(define_insn "@omp_simt_enter_"
   [(set (match_operand:P 0 "nvptx_register_operand" "=R")
 	(unspec_volatile:P [(match_operand:P 1 "nvptx_nonmemory_operand" "Ri")
 			(match_operand:P 2 "nvptx_nonmemory_operand" "Ri")]
@@ -1261,13 +1261,7 @@
   cfun->machine->simt_stack_align = MAX (UINTVAL (operands[2]),
 	 cfun->machine->simt_stack_align);
   cfun->machine->has_simtreg = true;
-  gcc_assert (GET_MODE (operands[0]) == Pmode);
-  if (GET_MODE (operands[0]) == DImode)
-    emit_insn (gen_omp_simt_enter_di (operands[0], operands[1], operands[2]));
-  else if (GET_MODE (operands[0]) == SImode)
-    emit_insn (gen_omp_simt_enter_si (operands[0], operands[1], operands[2]));
-  else
-    gcc_unreachable ();
+  emit_insn (gen_omp_simt_enter (Pmode, operands[0], operands[1], operands[2]));
   DONE;
 })
 
@@ -1275,17 +1269,11 @@
   [(match_operand 0 "nvptx_register_operand" "R")]
   ""
 {
-  gcc_assert (GET_MODE (operands[0]) == Pmode);
-  if (GET_MODE (operands[0]) == 

[committed][nvptx] Fix __main missing prototype warning in crt0.c

2019-06-18 Thread Tom de Vries
Hi,

Atm we see:
...
libgcc/config/nvptx/crt0.c:36:1: warning: no previous prototype for \
  ‘__main’ [-Wmissing-prototypes]
...

Fix this by adding the prototype.

Build and reg-tested on nvptx.
Build and reg-tested on x86_64 with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom

[nvptx] Fix __main missing prototype warning in crt0.c

2019-06-18  Tom de Vries  

* config/nvptx/crt0.c (__main): Declare.

---
 libgcc/config/nvptx/crt0.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/libgcc/config/nvptx/crt0.c b/libgcc/config/nvptx/crt0.c
index 097193c708e..b3bf1475fa3 100644
--- a/libgcc/config/nvptx/crt0.c
+++ b/libgcc/config/nvptx/crt0.c
@@ -32,7 +32,9 @@ void *__nvptx_stacks[32] __attribute__((shared,nocommon));
 /* Likewise for -muniform-simt.  */
 unsigned __nvptx_uni[32] __attribute__((shared,nocommon));
 
-void __attribute__((kernel))
+extern void __main (int *, int, void **) __attribute__((kernel));
+
+void
 __main (int *rval_ptr, int argc, void **argv)
 {
   __exitval_ptr = rval_ptr;


Re: Restore correct iv step for fully-masked loops

2019-06-18 Thread Richard Biener
On Tue, Jun 18, 2019 at 10:57 AM Richard Sandiford
 wrote:
>
> r272233 changed the handling of fully-masked loops so that the IV type
> can be wider than the comparison type.  However, it also hard-coded the
> IV step to VF, whereas for SLP groups it needs to be VF * group size.
> This introduced execution failures for gcc.target/aarch64/sve/slp_*_run.c
> (and over 100 other execution regressions, which at least gives some
> confidence that this has good coverage in the testsuite :-)).
>
> Also, iv_precision had type widest_int but only needs to be unsigned int.
> (This was an early review comment but I hadn't noticed that it was still
> widest_int in later versions.)
>
> Tested on aarch64-linux-gnu (with and without SVE) and x86_64-linux-gnu.
> OK to install?

OK.

Richard.

> Richard
>
>
> 2019-06-18  Richard Sandiford  
>
> gcc/
> * tree-vect-loop-manip.c (vect_set_loop_masks_directly): Remove
> vf parameter.  Restore the previous iv step of nscalars_step,
> but give it iv_type rather than compare_type.  Tweak code order
> to match the comments.
> (vect_set_loop_condition_masked): Update accordingly.
> * tree-vect-loop.c (vect_verify_full_masking): Use "unsigned int"
> for iv_precision.  Tweak comment formatting.
>
> Index: gcc/tree-vect-loop-manip.c
> ===
> --- gcc/tree-vect-loop-manip.c  2019-06-18 09:35:59.473831854 +0100
> +++ gcc/tree-vect-loop-manip.c  2019-06-18 09:36:03.301800224 +0100
> @@ -382,8 +382,7 @@ vect_maybe_permute_loop_masks (gimple_se
> Use LOOP_COND_GSI to insert code before the exit gcond.
>
> RGM belongs to loop LOOP.  The loop originally iterated NITERS
> -   times and has been vectorized according to LOOP_VINFO.  Each iteration
> -   of the vectorized loop handles VF iterations of the scalar loop.
> +   times and has been vectorized according to LOOP_VINFO.
>
> If NITERS_SKIP is nonnull, the first iteration of the vectorized loop
> starts with NITERS_SKIP dummy iterations of the scalar loop before
> @@ -410,8 +409,7 @@ vect_maybe_permute_loop_masks (gimple_se
>  vect_set_loop_masks_directly (struct loop *loop, loop_vec_info loop_vinfo,
>   gimple_seq *preheader_seq,
>   gimple_stmt_iterator loop_cond_gsi,
> - rgroup_masks *rgm, tree vf,
> - tree niters, tree niters_skip,
> + rgroup_masks *rgm, tree niters, tree niters_skip,
>   bool might_wrap_p)
>  {
>tree compare_type = LOOP_VINFO_MASK_COMPARE_TYPE (loop_vinfo);
> @@ -419,26 +417,28 @@ vect_set_loop_masks_directly (struct loo
>tree mask_type = rgm->mask_type;
>unsigned int nscalars_per_iter = rgm->max_nscalars_per_iter;
>poly_uint64 nscalars_per_mask = TYPE_VECTOR_SUBPARTS (mask_type);
> +  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>
>/* Calculate the maximum number of scalar values that the rgroup
>   handles in total, the number that it handles for each iteration
>   of the vector loop, and the number that it should skip during the
>   first iteration of the vector loop.  */
>tree nscalars_total = niters;
> -  tree nscalars_step = vf;
> +  tree nscalars_step = build_int_cst (iv_type, vf);
>tree nscalars_skip = niters_skip;
>if (nscalars_per_iter != 1)
>  {
>/* We checked before choosing to use a fully-masked loop that these
>  multiplications don't overflow.  */
> -  tree factor = build_int_cst (compare_type, nscalars_per_iter);
> +  tree compare_factor = build_int_cst (compare_type, nscalars_per_iter);
> +  tree iv_factor = build_int_cst (iv_type, nscalars_per_iter);
>nscalars_total = gimple_build (preheader_seq, MULT_EXPR, compare_type,
> -nscalars_total, factor);
> -  nscalars_step = gimple_build (preheader_seq, MULT_EXPR, compare_type,
> -   nscalars_step, factor);
> +nscalars_total, compare_factor);
> +  nscalars_step = gimple_build (preheader_seq, MULT_EXPR, iv_type,
> +   nscalars_step, iv_factor);
>if (nscalars_skip)
> nscalars_skip = gimple_build (preheader_seq, MULT_EXPR, compare_type,
> - nscalars_skip, factor);
> + nscalars_skip, compare_factor);
>  }
>
>/* Create an induction variable that counts the number of scalars
> @@ -447,15 +447,10 @@ vect_set_loop_masks_directly (struct loo
>gimple_stmt_iterator incr_gsi;
>bool insert_after;
>standard_iv_increment_position (loop, _gsi, _after);
> +  create_iv (build_int_cst (iv_type, 0), nscalars_step, NULL_TREE, loop,
> +_gsi, insert_after, _before_incr, _after_incr);
>
> -  tree zero_index = build_int_cst 

[AArch64] Use scvtf fbits option where appropriate

2019-06-18 Thread Joel Hutton
Hi,

On 13/06/2019 18:26, Wilco Dijkstra wrote:
> Wouldn't it be easier to just do exact_log2 (real_to_integer ())
> and then check the range is in 1..31?
I've revised this section.
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -6016,6 +6016,40 @@
> [(set_attr "type" "f_cvtf2i")]
>   )
>   
> +(define_insn "*aarch64_cvtf__2_mult"
> +  [(set (match_operand:GPF 0 "register_operand" "=w,w")
> + (mult:GPF (FLOATUORS:GPF
> +(match_operand: 1 "register_operand" "w,?r"))
> +(match_operand 2 "aarch64_fp_pow2_recip""Dt,Dt")))]
>
> We should add a comment before both define_insn similar to the other
> conversions, explaining what they do and why there are 2 separate patterns
> (the default versions of the conversions appear to be missing a comment too).
I've added comments to the new and existing patterns
From 5a9dfa6c6eb1c5b9c8c464780b7098058989d472 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Thu, 13 Jun 2019 11:08:56 +0100
Subject: [PATCH] SCVTF fbits

---
 gcc/config/aarch64/aarch64-protos.h   |   1 +
 gcc/config/aarch64/aarch64.c  |  28 
 gcc/config/aarch64/aarch64.md |  39 +
 gcc/config/aarch64/constraints.md |   7 +
 gcc/config/aarch64/predicates.md  |   4 +
 gcc/testsuite/gcc.target/aarch64/fmul_scvtf.c | 140 ++
 6 files changed, 219 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/fmul_scvtf.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 1e3b1c91db1..ad1ba458a3f 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -494,6 +494,7 @@ enum aarch64_symbol_type aarch64_classify_tls_symbol (rtx);
 enum reg_class aarch64_regno_regclass (unsigned);
 int aarch64_asm_preferred_eh_data_format (int, int);
 int aarch64_fpconst_pow_of_2 (rtx);
+int aarch64_fpconst_pow2_recip (rtx);
 machine_mode aarch64_hard_regno_caller_save_mode (unsigned, unsigned,
 		   machine_mode);
 int aarch64_uxt_size (int, HOST_WIDE_INT);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9a035dd9ed8..424ca6c9932 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -18707,6 +18707,34 @@ aarch64_fpconst_pow_of_2 (rtx x)
   return exact_log2 (real_to_integer (r));
 }
 
+/* If X is a positive CONST_DOUBLE with a value that is the reciprocal of a
+   power of 2 (i.e 1/2^n) return the number of float bits. e.g. for x==(1/2^n)
+   return n. Otherwise return -1.  */
+int
+aarch64_fpconst_pow2_recip (rtx x)
+{
+  REAL_VALUE_TYPE r0;
+
+  if (!CONST_DOUBLE_P (x))
+return -1;
+
+  r0 = *CONST_DOUBLE_REAL_VALUE (x);
+  if (exact_real_inverse (DFmode, &r0)
+  && !REAL_VALUE_NEGATIVE (r0))
+{
+	int ret = exact_log2 (real_to_integer (&r0));
+	if (ret >= 1 && ret <= 31)
+	  {
+	return ret;
+	  }
+	else
+	  {
+	return -1;
+	  }
+}
+  return -1;
+}
+
 /* If X is a vector of equal CONST_DOUBLE values and that value is
Y, return the aarch64_fpconst_pow_of_2 of Y.  Otherwise return -1.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 526c7fb0dab..d9812aa238e 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -6016,6 +6016,44 @@
   [(set_attr "type" "f_cvtf2i")]
 )
 
+;; equal width integer to fp combine
+(define_insn "*aarch64_cvtf__2_mult"
+  [(set (match_operand:GPF 0 "register_operand" "=w,w")
+	(mult:GPF (FLOATUORS:GPF
+		   (match_operand: 1 "register_operand" "w,?r"))
+		   (match_operand 2 "aarch64_fp_pow2_recip""Dt,Dt")))]
+  "TARGET_FLOAT"
+  {
+operands[2] = GEN_INT (aarch64_fpconst_pow2_recip (operands[2]));
+switch (which_alternative)
+{
+  case 0:
+	return "cvtf\t%0, %1, #%2";
+  case 1:
+	return "cvtf\t%0, %1, #%2";
+  default:
+	gcc_unreachable();
+}
+  }
+  [(set_attr "type" "neon_int_to_fp_,f_cvti2f")
+   (set_attr "arch" "simd,fp")]
+)
+
+;; inequal width integer to fp combine
+(define_insn "*aarch64_cvtf__2_mult"
+  [(set (match_operand:GPF 0 "register_operand" "=w")
+	(mult:GPF (FLOATUORS:GPF
+		   (match_operand: 1 "register_operand" "r"))
+		   (match_operand 2 "aarch64_fp_pow2_recip" "Dt")))]
+  "TARGET_FLOAT"
+  {
+operands[2] = GEN_INT (aarch64_fpconst_pow2_recip (operands[2]));
+return "cvtf\t%0, %1, #%2";
+  }
+  [(set_attr "type" "f_cvti2f")]
+)
+
+;; equal width integer to fp conversion
 (define_insn "2"
   [(set (match_operand:GPF 0 "register_operand" "=w,w")
 (FLOATUORS:GPF (match_operand: 1 "register_operand" "w,?r")))]
@@ -6027,6 +6065,7 @@
(set_attr "arch" "simd,fp")]
 )
 
+;; inequal width integer to fp conversions
 (define_insn "2"
   [(set (match_operand:GPF 0 "register_operand" "=w")
 (FLOATUORS:GPF (match_operand: 1 "register_operand" "r")))]
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index 

Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization

2019-06-18 Thread luoxhu

Hi,

On 2019/6/18 13:51, Martin Liška wrote:

On 6/18/19 3:45 AM, Xiong Hu Luo wrote:

Hello.

Thank you for the interest in the area.


This patch aims to fix PR69678, caused by PGO indirect call profiling bugs.
Currently the default instrumentation function can only find an indirect call
target that is called more than 50% of the time, with an incorrect count number returned.

Can you please explain what you mean by 'an incorrect count number returned'?


The test case indir-call-topn.c has 2 indirect call targets, "one" and
"two". The profiling data with trunk code is shown below. It includes your
patch, which swaps count[0] and count[2]; count[0] is used in ipa-profile,
which only supports the top1 format (my patch adds support for the topn
format). count[0] was incorrect: WITHOUT your patch it is 0. Things got
better with your fix, as count[0] is now 35000, but it is still not
correct; in fact, "one" runs 17500 times and "two" runs the other 17500
times:


indir-call-topn.gcda:   22:    01a9:  18:COUNTERS indirect_call 9 counts
indir-call-topn.gcda:   24:   0: *35000 1868707024 0* 0 0 0 0 0


Running with "--param indir-call-topn-profile=1" gives the profile data
below. My patch is based on this profile result and performs the
optimization for multiple indirect targets; performance improves
considerably on this test case and on some SPEC2017 benchmarks (LLVM has
supported this for several years).


indir-call-topn.gcda:   26:    01b1:  18:COUNTERS indirect_call_topn 9 counts
indir-call-topn.gcda:   28:   0: *0 969338501 17500 1868707024 17500* 0 0 0



test case indir-call-topn.c:

#include <stdio.h>


typedef int (*fptr) (int);
int
one (int a)
{
  return 1;
}

int
two (int a)
{
  return 0;
}

fptr table[] = {&one, &two};

int
main()
{
  int i, x;
  fptr p = &one;

  one (3);

  for (i = 0; i < 35000; i++)
    {
  x = (*p) (3);
  p = table[x];
    }
  printf ("done:%d\n", x);
}




This patch leverages the "--param indir-call-topn-profile=1" and enables multiple indirect

Note that I removed indir-call-topn-profile last week, so the patch will not
apply on current trunk. However, I can help you adapt single-value counters
to support tracking of multiple values.


It would be very useful if you could help me track multiple values in a
similar way on trunk. I will rebase onto your code once topn is ready again.
Actually topn is more general and includes top1, so I thought that top1
should be removed instead of topn, though topn takes longer than top1 in
profile-generate.





targets profiling and their use in the LTO-WPA and LTO-LTRANS stages. As a result, function
specialization, profiling, partial devirtualization, inlining and cloning can
be done successfully based on it.

This decision is definitely a big question for Honza.


Performance improves by roughly 4x (1.7 sec -> 0.4 sec) on simple tests.
Details are:
   1.  When doing PGO with indir-call-topn-profile, the gcda data format is not
   supported in the ipa-profile pass,

If you take a look at gcc/ipa-profile.c:195 you can see how the probability
is propagated to IPA passes. Why is that not sufficient?


The current code only supports a single indirect target; I need to track
multiple indirect targets and create multiple speculative edges on a single
indirect call statement.


What's more, many ICEs happened in later stages due to the single
speculative target design; part of this patch solves the ICEs in handling
multiple speculative target edges.



Thanks

Xionghu



Martin


so add variables to pass the information
   through the passes, and postpone gimple_ic to ipa-profile as done by
   default, since the inline pass will decide whether it is beneficial to
   transform the indirect call.
   2.  Enable analysis of multiple indirect call targets in the LTO WPA/LTRANS
   stages, for full profile support in IPA passes and cgraph_edge functions.
   3.  Fix various hidden speculative call ICEs exposed after enabling this
   feature when running SPEC2017.
   4.  Add 1 in-module test case and 2 cross-module test cases.
   5.  TODOs:
     5.1.  Some reference info is dropped between WPA and LTRANS, so
     reference checks are difficult in LTRANS; the strstr needs to be
     replaced with a reference comparison.
     5.2.  Some duplicate code needs to be removed, as top1 and topn share the
     same logic. Actually, the top1-related logic could be eliminated
     entirely, as topn includes it.
     5.3.  The patch may need to be split, as it is too big, but I'm not sure
     how many parts would be reasonable.
   6.  Performance results for ppc64le:
     6.1.  Representative test: indir-call-prof-topn.c runtime improved from
     1.7s to 0.4s.
     6.2.  SPEC2017 peakrate:
     523.xalancbmk_r (+4.87%); 538.imagick_r (+4.59%); 511.povray_r (+13.33%);
     525.x264_r (-5.29%).
     No big changes in other benchmarks.
     Option: -Ofast -mcpu=power8
     PASS1_OPTIMIZE: -fprofile-generate --param indir-call-topn-profile=1 -flto
 PASS2_OPTIMIZE: 

[PATCH] Add to same comdat group only if set (PR middle-end/90899)

2019-06-18 Thread Martin Liška
Hi.

The patch is quite obvious; it does the same as what we do in
other IPA passes.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

2019-06-18  Martin Liska  

PR middle-end/90899
* multiple_target.c (create_dispatcher_calls): Add to comdat
group only if set for ifunc.

gcc/testsuite/ChangeLog:

2019-06-18  Martin Liska  

PR middle-end/90899
* gcc.target/i386/pr90899.c: New test.
---
 gcc/multiple_target.c   | 3 ++-
 gcc/testsuite/gcc.target/i386/pr90899.c | 6 ++
 2 files changed, 8 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90899.c


diff --git a/gcc/multiple_target.c b/gcc/multiple_target.c
index fa194d416fe..7aea684a40a 100644
--- a/gcc/multiple_target.c
+++ b/gcc/multiple_target.c
@@ -158,7 +158,8 @@ create_dispatcher_calls (struct cgraph_node *node)
 	{
 	  symtab_node *source = ref->referring;
 	  source->create_reference (inode, IPA_REF_ALIAS);
-	  source->add_to_same_comdat_group (inode);
+	  if (inode->get_comdat_group ())
+		source->add_to_same_comdat_group (inode);
 	}
 	  else
 	gcc_unreachable ();
diff --git a/gcc/testsuite/gcc.target/i386/pr90899.c b/gcc/testsuite/gcc.target/i386/pr90899.c
new file mode 100644
index 000..e0e2d5ac6bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90899.c
@@ -0,0 +1,6 @@
+/* PR middle-end/90899 */
+/* { dg-do compile } */
+/* { dg-require-ifunc "" } */
+
+__attribute__ ((target_clones ("default", "arch=slm"))) static int f () { return 0; }
+__attribute__ ((alias ("f"))) __typeof (f) g;



Restore correct iv step for fully-masked loops

2019-06-18 Thread Richard Sandiford
r272233 changed the handling of fully-masked loops so that the IV type
can be wider than the comparison type.  However, it also hard-coded the
IV step to VF, whereas for SLP groups it needs to be VF * group size.
This introduced execution failures for gcc.target/aarch64/sve/slp_*_run.c
(and over 100 other execution regressions, which at least gives some
confidence that this has good coverage in the testsuite :-)).

Also, iv_precision had type widest_int but only needs to be unsigned int.
(This was an early review comment but I hadn't noticed that it was still
widest_int in later versions.)

Tested on aarch64-linux-gnu (with and without SVE) and x86_64-linux-gnu.
OK to install?

Richard


2019-06-18  Richard Sandiford  

gcc/
* tree-vect-loop-manip.c (vect_set_loop_masks_directly): Remove
vf parameter.  Restore the previous iv step of nscalars_step,
but give it iv_type rather than compare_type.  Tweak code order
to match the comments.
(vect_set_loop_condition_masked): Update accordingly.
* tree-vect-loop.c (vect_verify_full_masking): Use "unsigned int"
for iv_precision.  Tweak comment formatting.

Index: gcc/tree-vect-loop-manip.c
===
--- gcc/tree-vect-loop-manip.c  2019-06-18 09:35:59.473831854 +0100
+++ gcc/tree-vect-loop-manip.c  2019-06-18 09:36:03.301800224 +0100
@@ -382,8 +382,7 @@ vect_maybe_permute_loop_masks (gimple_se
Use LOOP_COND_GSI to insert code before the exit gcond.
 
RGM belongs to loop LOOP.  The loop originally iterated NITERS
-   times and has been vectorized according to LOOP_VINFO.  Each iteration
-   of the vectorized loop handles VF iterations of the scalar loop.
+   times and has been vectorized according to LOOP_VINFO.
 
If NITERS_SKIP is nonnull, the first iteration of the vectorized loop
starts with NITERS_SKIP dummy iterations of the scalar loop before
@@ -410,8 +409,7 @@ vect_maybe_permute_loop_masks (gimple_se
 vect_set_loop_masks_directly (struct loop *loop, loop_vec_info loop_vinfo,
  gimple_seq *preheader_seq,
  gimple_stmt_iterator loop_cond_gsi,
- rgroup_masks *rgm, tree vf,
- tree niters, tree niters_skip,
+ rgroup_masks *rgm, tree niters, tree niters_skip,
  bool might_wrap_p)
 {
   tree compare_type = LOOP_VINFO_MASK_COMPARE_TYPE (loop_vinfo);
@@ -419,26 +417,28 @@ vect_set_loop_masks_directly (struct loo
   tree mask_type = rgm->mask_type;
   unsigned int nscalars_per_iter = rgm->max_nscalars_per_iter;
   poly_uint64 nscalars_per_mask = TYPE_VECTOR_SUBPARTS (mask_type);
+  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
 
   /* Calculate the maximum number of scalar values that the rgroup
  handles in total, the number that it handles for each iteration
  of the vector loop, and the number that it should skip during the
  first iteration of the vector loop.  */
   tree nscalars_total = niters;
-  tree nscalars_step = vf;
+  tree nscalars_step = build_int_cst (iv_type, vf);
   tree nscalars_skip = niters_skip;
   if (nscalars_per_iter != 1)
 {
   /* We checked before choosing to use a fully-masked loop that these
 multiplications don't overflow.  */
-  tree factor = build_int_cst (compare_type, nscalars_per_iter);
+  tree compare_factor = build_int_cst (compare_type, nscalars_per_iter);
+  tree iv_factor = build_int_cst (iv_type, nscalars_per_iter);
   nscalars_total = gimple_build (preheader_seq, MULT_EXPR, compare_type,
-nscalars_total, factor);
-  nscalars_step = gimple_build (preheader_seq, MULT_EXPR, compare_type,
-   nscalars_step, factor);
+nscalars_total, compare_factor);
+  nscalars_step = gimple_build (preheader_seq, MULT_EXPR, iv_type,
+   nscalars_step, iv_factor);
   if (nscalars_skip)
nscalars_skip = gimple_build (preheader_seq, MULT_EXPR, compare_type,
- nscalars_skip, factor);
+ nscalars_skip, compare_factor);
 }
 
   /* Create an induction variable that counts the number of scalars
@@ -447,15 +447,10 @@ vect_set_loop_masks_directly (struct loo
   gimple_stmt_iterator incr_gsi;
   bool insert_after;
  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
+  create_iv (build_int_cst (iv_type, 0), nscalars_step, NULL_TREE, loop,
+&incr_gsi, insert_after, &index_before_incr, &index_after_incr);
 
-  tree zero_index = build_int_cst (iv_type, 0);
-  tree step = build_int_cst (iv_type,
-LOOP_VINFO_VECT_FACTOR (loop_vinfo));
-  /* Create IV of iv_type.  */
-  create_iv (zero_index, step, NULL_TREE, loop, &incr_gsi,
-insert_after, &index_before_incr, 

[Darwin, committed] The need for FDE symbols is dependent on linker used, not OS rev.

2019-06-18 Thread Iain Sandoe
For very old toolchains, the compiler generated extra symbols that mark the
start of each FDE since the linker couldn’t determine the boundaries otherwise.

We no longer need this (since the Xcode 3 era).  Since we have detection of the
linker version, we can use that directly to determine if support is needed.

Tested on i686/powerpc darwin9, x86_64 darwin10, 16, 18.
applied to mainline
thanks
Iain

2019-06-18  Iain Sandoe  

* config/darwin.c (darwin_emit_unwind_label): New default to false.
(darwin_override_options): Set darwin_emit_unwind_label as needed.

diff --git a/gcc/config/darwin.c b/gcc/config/darwin.c
index 00fa652..a40f532 100644
--- a/gcc/config/darwin.c
+++ b/gcc/config/darwin.c
@@ -99,6 +99,10 @@ int generating_for_darwin_version ;
   for weak or single-definition items.  */
static bool ld_uses_coal_sects = false;

+/* Very old (ld_classic) linkers need a symbol to mark the start of
+   each FDE.  */
+static bool ld_needs_eh_markers = false;
+
/* Section names.  */
section * darwin_sections[NUM_DARWIN_SECTIONS];

@@ -2080,11 +2084,11 @@ darwin_emit_unwind_label (FILE *file, tree decl, int for_eh, int empty)
  static int invok_count = 0;
  static tree last_fun_decl = NULL_TREE;

-  /* We use the linker to emit the .eh labels for Darwin 9 and above.  */
-  if (! for_eh || generating_for_darwin_version >= 9)
+  /* Modern linkers can produce distinct FDEs without compiler support.  */
+  if (! for_eh || ! ld_needs_eh_markers)
return;

-  /* FIXME: This only works when the eh for all sections of a function is 
+  /* FIXME: This only works when the eh for all sections of a function are 
 emitted at the same time.  If that changes, we would need to use a lookup
 table of some form to determine what to do.  Also, we should emit the
 unadorned label for the partition containing the public label for a
@@ -3257,16 +3261,29 @@ darwin_override_options (void)
 indirections and we no longer need to emit pic symbol stubs.
 However, if we are generating code for earlier ones (or for use in the 
 kernel) the stubs might still be required, and this will be set true.
- If the user sets it on or off - then that takes precedence. */
+ If the user sets it on or off - then that takes precedence.
+
+ Linkers that don't need stubs, don't need the EH symbol markers either.
+  */

  if (!global_options_set.x_darwin_picsymbol_stubs)
{
-  if (darwin_target_linker) {
-   if (strverscmp (darwin_target_linker, MIN_LD64_OMIT_STUBS) < 0)
+  if (darwin_target_linker) 
+   {
+ if (strverscmp (darwin_target_linker, MIN_LD64_OMIT_STUBS) < 0)
+   {
+ darwin_picsymbol_stubs = true;
+ ld_needs_eh_markers = true;
+   }
+   } 
+  else if (generating_for_darwin_version < 9)
+   {
+ /* If we don't know the linker version and we're targeting an old
+system, we know no better than to assume the use of an earlier 
+linker.  */
  darwin_picsymbol_stubs = true;
-  } else if (generating_for_darwin_version < 9)
-   /* We know no better than to assume the use of an earlier linker.  */
-   darwin_picsymbol_stubs = true;
+ ld_needs_eh_markers = true;
+   }
}
  else if (DARWIN_X86 && darwin_picsymbol_stubs && TARGET_64BIT)
{


[PATCH, testsuite]: Fix gcc.target/i386/pr81563.c scan-assembler failure

2019-06-18 Thread Uros Bizjak
2019-06-18  Uroš Bizjak  

* gcc.target/i386/pr81563.c (dg-final): Check that no
registers are restored from %esp.

Tested on x86_64-linux-gnu {,-m32}.

Will be committed later today.

Uros.
diff --git a/gcc/testsuite/gcc.target/i386/pr81563.c b/gcc/testsuite/gcc.target/i386/pr81563.c
index ebfd583daf5b..f0efcf913401 100644
--- a/gcc/testsuite/gcc.target/i386/pr81563.c
+++ b/gcc/testsuite/gcc.target/i386/pr81563.c
@@ -10,5 +10,4 @@ fn1 (long long int x)
   return x;
 }
 
-/* { dg-final { scan-assembler-times "movl\[\\t \]*-8\\(%ebp\\),\[\\t \]*%esi" 1 } } */
-/* { dg-final { scan-assembler-times "movl\[\\t \]*-4\\(%ebp\\),\[\\t \]*%edi" 1 } } */
+/* { dg-final { scan-assembler-not "movl\[ \\t\]+\[0-9]*\\(%esp\\)" } } */


Re: [PATCH] avoid ice due to inconsistent argument types to fold_build (PR 90662)

2019-06-18 Thread Christophe Lyon
On Fri, 14 Jun 2019 at 03:35, Jeff Law  wrote:
>
> On 6/13/19 1:10 PM, Martin Sebor wrote:
> > Attached is a fix for the fold_build call with inconsistent
> > argument types introduced in a recent commit of mine.
> >
> > Tested on x86_64-linux.
> >
> > Martin
> >
> > gcc-90662.diff
> >
> > PR tree-optimization/90662 - strlen of a string in a vla plus offset not folded
> >
> > gcc/ChangeLog:
> >
> >   PR tree-optimization/90662
> >   * tree-ssa-strlen.c (get_stridx): Convert fold_build2 operands
> >   to the same type.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   PR tree-optimization/90662
> >   * gcc.dg/pr90866-2.c: New test.
> >   * gcc.dg/pr90866.c: Ditto.
> OK
> jeff

Hi,

I've noticed that pr90866-2.c fails on arm-none-eabi:
/gcc/testsuite/gcc.dg/pr90866-2.c:17:5: error: conflicting types for 'i'
/gcc/testsuite/gcc.dg/pr90866-2.c:17:1: note: an argument type that
has a default promotion cannot match an empty parameter name list
declaration
/gcc/testsuite/gcc.dg/pr90866-2.c:16:5: note: previous declaration of
'i' was here
compiler exited with status 1
FAIL: gcc.dg/pr90866-2.c (test for excess errors)

Removing 'int i();' makes the test pass, but I'm wondering why it
passes on other targets without this change?

Christophe


  1   2   >