Re: [PATCH] combine: Allow combining two insns to two insns

2018-08-01 Thread Toon Moene

On 07/24/2018 07:18 PM, Segher Boessenkool wrote:


This patch allows combine to combine two insns into two.  This helps
in many cases, by reducing instruction path length, and also allowing
further combinations to happen.  PR85160 is a typical example of code
that it can improve.


I cannot state with certainty that the improvements to our most 
notorious routine between 8.2 and current trunk are solely due to this 
change, but the differences are telling (see attached Fortran code - the 
analysis is about the third loop).


Number of instructions for this loop (Skylake i9-7900).

gfortran82 -S -Ofast -march=native -mtune=native:

  458 verint.s.82.loop3

gfortran90 -S -Ofast -march=native -mtune=native:

  396 verint.s.90.loop3

But the most stunning difference is the use of the stack [ nn(rsp) ] - 
see the attached files ...


--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news
# 1 "/scratch/hirlam/hl_home/MPI/lib/src/grdy/verint.F"
# 1 ""
# 1 ""
# 1 "/scratch/hirlam/hl_home/MPI/lib/src/grdy/verint.F"
c Library:grdy $RCSfile$, $Revision: 7536 $
c checked in by $Author: ovignes $ at $Date: 2009-12-18 14:23:36 +0100 (Fri, 18 Dec 2009) $
c $State$, $Locker$
c $Log$
c Revision 1.3  1999/04/22 09:30:45  DagBjoerge
c MPP code
c
c Revision 1.2  1999/03/09 10:23:13  GerardCats
c Add SGI paralllellisation directives DOACROSS
c
c Revision 1.1  1996/09/06 13:12:18  GCats
c Created from grdy.apl, 1 version 2.6.1, by Gerard Cats
c
  SUBROUTINE VERINT (
 I   KLON   , KLAT   , KLEV   , KINT  , KHALO
 I , KLON1  , KLON2  , KLAT1  , KLAT2
 I , KP , KQ , KR
 R , PARG   , PRES
 R , PALFH  , PBETH
 R , PALFA  , PBETA  , PGAMA   )
C
C***
C
C  VERINT - THREE DIMENSIONAL INTERPOLATION
C
C  PURPOSE:
C
C  THREE DIMENSIONAL INTERPOLATION
C
C  INPUT PARAMETERS:
C
C  KLON  NUMBER OF GRIDPOINTS IN X-DIRECTION
C  KLAT  NUMBER OF GRIDPOINTS IN Y-DIRECTION
C  KLEV  NUMBER OF VERTICAL LEVELS
C  KINT  TYPE OF INTERPOLATION
C        = 1 - LINEAR
C        = 2 - QUADRATIC
C        = 3 - CUBIC
C        = 4 - MIXED CUBIC/LINEAR
C  KLON1 FIRST GRIDPOINT IN X-DIRECTION
C  KLON2 LAST  GRIDPOINT IN X-DIRECTION
C  KLAT1 FIRST GRIDPOINT IN Y-DIRECTION
C  KLAT2 LAST  GRIDPOINT IN Y-DIRECTION
C  KP    ARRAY OF INDEXES FOR HORIZONTAL DISPLACEMENTS
C  KQ    ARRAY OF INDEXES FOR HORIZONTAL DISPLACEMENTS
C  KR    ARRAY OF INDEXES FOR VERTICAL   DISPLACEMENTS
C  PARG  ARRAY OF ARGUMENTS
C  PALFH ALFA HAT
C  PBETH BETA HAT
C  PALFA ARRAY OF WEIGHTS IN X-DIRECTION
C  PBETA ARRAY OF WEIGHTS IN Y-DIRECTION
C  PGAMA ARRAY OF WEIGHTS IN VERTICAL DIRECTION
C
C  OUTPUT PARAMETERS:
C
C  PRES  INTERPOLATED FIELD
C
C  HISTORY:
C
C  J.E. HAUGEN   1  1992
C
C***
C
  IMPLICIT NONE
C
  INTEGER KLON   , KLAT   , KLEV   , KINT   , KHALO,
 I   KLON1  , KLON2  , KLAT1  , KLAT2
C
  INTEGER   KP(KLON,KLAT), KQ(KLON,KLAT), KR(KLON,KLAT)
  REAL   PARG(2-KHALO:KLON+KHALO-1,2-KHALO:KLAT+KHALO-1,KLEV)  ,
 R   PRES(KLON,KLAT) ,
 R   PALFH(KLON,KLAT) ,  PBETH(KLON,KLAT)  ,
 R   PALFA(KLON,KLAT,4)   ,  PBETA(KLON,KLAT,4),
 R   PGAMA(KLON,KLAT,4)
C
  INTEGER JX, JY, IDX, IDY, ILEV
  REAL Z1MAH, Z1MBH
C
  IF (KINT.EQ.1) THEN
C  LINEAR INTERPOLATION
C
  DO JY = KLAT1,KLAT2
  DO JX = KLON1,KLON2
 IDX  = KP(JX,JY)
 IDY  = KQ(JX,JY)
 ILEV = KR(JX,JY)
C
 PRES(JX,JY) = PGAMA(JX,JY,1)*(
C
 +   PBETA(JX,JY,1)*( PALFA(JX,JY,1)*PARG(IDX-1,IDY-1,ILEV-1)
 +  + PALFA(JX,JY,2)*PARG(IDX  ,IDY-1,ILEV-1) )
 + + PBETA(JX,JY,2)*( PALFA(JX,JY,1)*PARG(IDX-1,IDY  ,ILEV-1)
 +  + PALFA(JX,JY,2)*PARG(IDX  ,IDY  ,ILEV-1) ) )
C+
 +   + PGAMA(JX,JY,2)*(
C+
 +   PBETA(JX,JY,1)*( PALFA(JX,JY,1)*PARG(IDX-1,IDY-1,ILEV  )
 +  + PALFA(JX,JY,2)*PARG(IDX  ,IDY-1,ILEV  ) )
 + + PBETA(JX,JY,2)*( PALFA(JX,JY,1)*PARG(IDX-1,IDY  ,ILEV  )
 +  + PALFA(JX,JY,2)*PARG(IDX  ,IDY  ,ILEV  ) ) )
  ENDDO
  ENDDO
C
  ELSE
 +IF (KINT.EQ.2) THEN
C  QUADRATIC INTERPOLATION
C
  DO JY = KLAT1,KLAT2
  DO JX = KLON1,KLON2
 IDX  = KP(JX,JY)
 IDY  = KQ(JX,JY)
 ILEV = KR(JX,JY)
C
 PRES(JX,JY) = PGAMA(JX,JY,1)*(
C
 +   PBETA(JX,JY,1)*( PALFA(JX,JY,1)*PARG(IDX-1,IDY-1,ILEV-1)
 +  + PALFA(JX,JY,2)*PARG(IDX  ,IDY-1,ILEV-1)
 +  + PALFA(JX,JY,3)*PARG(IDX+1,IDY-1,ILEV-1) )
 + + PBETA(JX,JY,2)*( PALFA(JX,JY,1)*PARG(IDX-1,IDY  ,ILEV-1)
 +  + 

PING [PATCH] treat -Wxxx-larger-than=HWI_MAX special (PR 86631)

2018-08-01 Thread Martin Sebor

Richard, do you have any further comments or suggestions or is
the patch acceptable?

I realize it's not ideal but I don't see how to achieve the ideal
(understanding PTRDIFF_MAX) without deferring the processing of
these options until the back end has been initialized.  It would
still mean setting aside some special value, traversing options
again, looking for the Host_Wide_Int ones with the special value,
and replacing it with the value of PTRDIFF_MAX.  That doesn't seem
any cleaner than the current solution, just a lot more work (not
just to implement but for the compiler to go through at startup).

Joseph, can you think of anything better?

  https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01455.html

On 07/26/2018 02:52 PM, Martin Sebor wrote:

On 07/26/2018 08:58 AM, Martin Sebor wrote:

On 07/26/2018 02:38 AM, Richard Biener wrote:

On Wed, Jul 25, 2018 at 5:54 PM Martin Sebor  wrote:


On 07/25/2018 08:57 AM, Jakub Jelinek wrote:

On Wed, Jul 25, 2018 at 08:54:13AM -0600, Martin Sebor wrote:

I don't mean for the special value to be used except internally
for the defaults.  Otherwise, users wanting to override the default
will choose a value other than it.  I'm happy to document it in
the .opt file for internal users though.

-1 has the documented effect of disabling the warnings altogether
(-1 is SIZE_MAX) so while I agree that -1 looks better it doesn't
work.  (It would need more significant changes.)


The variable is signed, so -1 is not SIZE_MAX.  Even if -1 disables
it, you could use e.g. -2 or another negative value for the other
special case.


The -Wxxx-larger-than=N options distinguish three ranges of argument
values (treated as unsigned):

   1.  [0, HOST_WIDE_INT_MAX)
   2.  HOST_WIDE_INT_MAX
   3.  [HOST_WIDE_INT_MAX + 1, Infinity)
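
To make the three ranges concrete, here is a minimal sketch in C.  It is
not the code from the patch: the names `HWI_MAX`, `limit_kind` and
`classify_limit` are made up for illustration, and `HWI_MAX` assumes
a 64-bit HOST_WIDE_INT.  Treating range 3 as "disabled" follows the
remark earlier in the thread that very large values turn the warning off.

```c
#include <stdint.h>

/* Assumption: stand-in for HOST_WIDE_INT_MAX on a host with a 64-bit
   HOST_WIDE_INT.  Not the real GCC macro.  */
#define HWI_MAX INT64_MAX

/* Hypothetical names for the three ranges described in the thread.  */
enum limit_kind
{
  LIMIT_EXPLICIT,     /* 1.  [0, HOST_WIDE_INT_MAX)  */
  LIMIT_PLACEHOLDER,  /* 2.  HOST_WIDE_INT_MAX  */
  LIMIT_DISABLED      /* 3.  [HOST_WIDE_INT_MAX + 1, Infinity)  */
};

/* Classify an option argument, treated as unsigned.  */
enum limit_kind
classify_limit (uint64_t n)
{
  if (n < (uint64_t) HWI_MAX)
    return LIMIT_EXPLICIT;
  if (n == (uint64_t) HWI_MAX)
    return LIMIT_PLACEHOLDER;
  return LIMIT_DISABLED;
}
```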


But it doesn't make sense for those to be host dependent.


It isn't when the values are handled by each warning.  That's
also the point of this patch: to remove this (unintended)
dependency.


I think numerical user input should be limited to [0, ptrdiff_max]
and cases (1) and (2) should be simply merged, I see no value
in distinguishing them.  -Wxxx-larger-than should be aliased
to [0, ptrdiff_max], case (3) is achieved by -Wno-xxx-larger-than.


To be clear: this is also close to what this patch does.

The only wrinkle is that we don't know the value of PTRDIFF_MAX
either at the time the option initial value is set in the .opt
file or when the option is processed when it's specified either
on the command line or as an alias in the .opt file (as all
-Wno-xxx-larger-than options are).  Case (2) above is only
used by the implementation as a placeholder for PTRDIFF_MAX.
It's not part of the interface -- it's an internal workaround
for lack of a better word.
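
As a rough illustration of the placeholder idea (a sketch only, not what
the patch does internally): once the target's PTRDIFF_MAX is known, the
special value can be swapped for it and every other value passes through
unchanged.  `HWI_MAX_PLACEHOLDER` and `resolve_limit` are hypothetical
names, and the 64-bit placeholder value is an assumption.

```c
#include <stdint.h>

/* Assumption: the internal placeholder is HOST_WIDE_INT_MAX on a host
   with a 64-bit HOST_WIDE_INT.  */
#define HWI_MAX_PLACEHOLDER ((uint64_t) INT64_MAX)

/* Hypothetical helper: map the option value to the effective limit once
   the target's PTRDIFF_MAX is available.  */
uint64_t
resolve_limit (uint64_t option_value, uint64_t target_ptrdiff_max)
{
  if (option_value == HWI_MAX_PLACEHOLDER)
    return target_ptrdiff_max;
  return option_value;
}
```

With a 32-bit target, for example, the placeholder would resolve to
INT32_MAX while a user-supplied limit such as 100 stays 100.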

(There is an additional wrinkle in that -Walloca-larger-than=
has two modes, both of which are controlled by a single option:
(a) diagnose only allocations >= PTRDIFF_MAX (default), and
(b) diagnose allocations > limit plus also unbounded/unknown
allocations.  I think these modes should be split up and (b)
controlled by a separate option (say something like
-Walloca-may-be-unbounded).)


I think you are over-engineering this and the user-interface is
awful.


Thank you.

I agree that what you describe would be the ideal solution.
As I explained in the description of the patch, I did consider
handling PTRDIFF_MAX but the target-dependent value is not
available at the time the option argument is processed.  We
don't even know yet what the target data model is.

This is the best I came up with.  What do you suggest instead?

Martin






Re: [PATCH] Make strlen range computations more conservative

2018-08-01 Thread Martin Sebor

On 08/01/2018 01:19 AM, Richard Biener wrote:

On Tue, 31 Jul 2018, Martin Sebor wrote:


On 07/31/2018 09:48 AM, Jakub Jelinek wrote:

On Tue, Jul 31, 2018 at 09:17:52AM -0600, Martin Sebor wrote:

On 07/31/2018 12:38 AM, Jakub Jelinek wrote:

On Mon, Jul 30, 2018 at 09:45:49PM -0600, Martin Sebor wrote:

Even without _FORTIFY_SOURCE GCC diagnoses (some) writes past
the end of subobjects by string functions.  With _FORTIFY_SOURCE=2
it calls abort.  This is the default on popular distributions,


Note that _FORTIFY_SOURCE=2 is the mode that goes beyond what the
standard requires and imposes extra requirements.  So from what this
mode accepts or rejects we shouldn't determine what is or isn't
considered valid.


I'm not sure what the additional requirements are but the ones
I am referring to are the enforcing of struct member boundaries.
This is in line with the standard requirements of not accessing
[sub]objects via pointers derived from other [sub]objects.


In the middle-end the distinction between what was originally a reference
to subobjects and what was a reference to objects is quickly lost
(whether through SCCVN or other optimizations).
We've run into this many times with the __builtin_object_size already.
So, if e.g.
struct S { char a[3]; char b[5]; } s = { "abc", "defg" };
...
strlen ((char *) &s) is well defined but
strlen (s.a) is not in C, for the middle-end you might not figure out which
one is which.
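
A minimal sketch of the whole-object case, for reference.  The helper
name `whole_object_strlen` is made up here, and the result relies on the
assumption (checked by the static assertion) that there is no padding
between `a` and `b`.  Under that assumption the bytes of `s` are
'a' 'b' 'c' 'd' 'e' 'f' 'g' '\0', so the call yields 7.

```c
#include <stddef.h>
#include <string.h>

struct S { char a[3]; char b[5]; };

/* a holds 'a','b','c' with no NUL (valid in C, not in C++);
   b holds "defg" followed by its NUL.  */
static struct S s = { "abc", "defg" };

/* Hypothetical helper: take the length through a pointer to the whole
   object rather than to the member s.a.  */
size_t
whole_object_strlen (void)
{
  _Static_assert (offsetof (struct S, b) == 3,
                  "sketch assumes no padding between a and b");
  return strlen ((const char *) &s);
}
```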


Yes, I'm aware of the middle-end transformation to MEM_REF
-- it's one of the reasons why detecting invalid accesses
by the middle end warnings, including -Warray-bounds,
-Wformat-overflow, -Wsprintf-overflow, and even -Wrestrict,
is less than perfect.

But is strlen(s.a) also meant to be well-defined in the middle
end (with the semantics of computing the length of "abcdefg"?)


Yes.


And if so, what makes it well defined?


The fact that strlen takes a char * argument and thus inline-expansion
of a trivial implementation like

 int len = 0;
 for (; *p; ++p)
   ++len;

will have

 p = &s.a;

and the middle-end doesn't reconstruct s.a[..] from the pointer
access.
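
Written out as a complete function, the trivial implementation referred
to above might look like this.  It is only a sketch; `trivial_strlen` is
an illustrative name, not a GCC or libc function.  The point is that the
loop sees only a `const char *` and nothing in it records which
subobject the pointer was derived from.

```c
#include <stddef.h>

/* Hypothetical trivial strlen: walks the pointer until a NUL byte.  */
size_t
trivial_strlen (const char *p)
{
  size_t len = 0;
  for (; *p; ++p)
    ++len;
  return len;
}
```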



Certainly not every "strlen" has these semantics.  For example,
this open-coded one doesn't:

  int len = 0;
  for (int i = 0; s.a[i]; ++i)
    ++len;

It computes 2 (with no warning for the out-of-bounds access).


Yes.


If that's not a problem then why is it one when strlen() does
the same thing?  Presumably the answer is: "because here
the access is via array indexing and in strlen via pointer
dereferences."  (But in C there is no difference between
the two.  Also see below.)


So if the standard doesn't guarantee it and different kinds
of accesses behave differently, how do we explain what "works"
and what doesn't without relying on GCC implementation details?


In the middle-end accesses via pointers - accesses where the
access path is not visible in the access itself - are not
constrained by the "access" path of how the pointer was built.


I have seen and I think shown in this discussion examples
where this is not so.  For instance:

  struct S { char a[1], b[1]; };

  void f (struct S *s, int i)
  {
    char *p = &s->a[i];
    char *q = &s->b[0];

    char x = *p;
    *q = 11;

    if (x != *p)            // folded to false
      __builtin_abort ();   // eliminated
  }

Is this a bug?  (I hope not.)


If we can't then the only language we have in common with users
is the standard.  (This, by the way, is what the C memory model
group is trying to address -- the language or feature that's
missing from the standard that says when, if ever, these things
might be valid.)


Well, you simply have to not compare apples and oranges,
a strlen implementation that isn't a strlen implementation
and strlen.


As I'm sure you know, the C standard doesn't differentiate
between the semantics of array subscript expressions and
pointer dereferencing.  They both mean the same thing.
(Nothing prevents an implementation from defining strlen
as a macro that expands into a loop using array indices
for array arguments.)

But this, I suspect, might be behind the disagreement.  You
seem to think in terms of GIMPLE and GCC internals, and have
a clear idea in your head what's meant to be valid and what
isn't.  I suspect only a few GCC developers think this way.
Most of the rest of us think in terms of the language
specification. Not just because that's the contract between
programmers and the compiler, but also because it's the only
specification available (the GCC internals manual doesn't go
into nearly enough detail to even hint at what the answers
to some of these questions might be).

Martin



PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714)

2018-08-01 Thread Martin Sebor

Since the foundation of the patch is detecting and avoiding
the overly aggressive folding of unterminated char arrays,
besides issuing a warning for such arguments to strlen,
the patch also fixes pr86711 - wrong folding of memchr, and
pr86714 - tree-ssa-forwprop.c confused by too long initializer.

The substance of the attached updated patch is unchanged,
I have just added test cases for the two additional bugs.

Bernd, as I mentioned Wednesday, the patch supersedes
yours here:
https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01800.html

Martin

On 07/30/2018 01:17 PM, Martin Sebor wrote:

Attached is an updated version of the patch that handles more
instances of calling strlen() on a constant array that is not
a nul-terminated string.

No other functions except strlen are explicitly handled yet,
and neither are constant arrays with braced-initializer lists
like const char a[] = { 'a', 'b', 'c' };  I am testing
an independent solution for those (bug 86552).  Once those
are handled the warning will be able to detect those as well.
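
For readers unfamiliar with the problem, here is a run-time analogue of
what the warning looks for.  It is a sketch only: the patch does this
check on constants at compile time, and `has_nul_within` is a made-up
helper, not part of GCC.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if the first SIZE bytes of ARRAY contain
   a NUL, i.e. the array holds a properly terminated string.  */
bool
has_nul_within (const char *array, size_t size)
{
  return memchr (array, '\0', size) != NULL;
}
```

An array such as `const char a[] = { 'a', 'b', 'c' };` has no NUL within
its three bytes, so passing it to strlen reads past its end, whereas
`const char b[] = "abc";` has four bytes and is safe.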

Tested on x86_64-linux.

On 07/25/2018 05:38 PM, Martin Sebor wrote:

Ping: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01124.html

The fix for bug 86532 has been checked in so this enhancement
can now be applied on top of it (with only minor adjustments).

On 07/19/2018 02:08 PM, Martin Sebor wrote:

In the discussion of my patch for pr86532 Bernd noted that
GCC silently accepts constant character arrays with no
terminating nul as arguments to strlen (and other string
functions).

The attached patch is a first step in detecting these kinds
of bugs in strlen calls by issuing -Wstringop-overflow.
The next step is to modify all other handlers of built-in
functions to detect the same problem (not part of this patch).
Yet another step is to detect these problems in arguments
initialized using the non-string form:

  const char a[] = { 'a', 'b', 'c' };

This patch is meant to apply on top of the one for bug 86532
(I tested it with an earlier version of that patch so there
is code in the context that does not appear in the latest
version of the other diff).

Martin







PR tree-optimization/86714 - tree-ssa-forwprop.c confused by too long initializer
PR tree-optimization/86711 - wrong folding of memchr
PR tree-optimization/86552 - missing warning for reading past the end of non-string arrays

gcc/ChangeLog:

	PR tree-optimization/86714
	PR tree-optimization/86711
	PR tree-optimization/86552
	* builtins.h (warn_string_no_nul): Declare.
	(c_strlen): Add argument.
	* builtins.c (warn_string_no_nul): New function.
	(fold_builtin_strlen): Add argument.  Detect missing nul.
	(fold_builtin_1): Adjust.
	(string_length): Add argument and use it.
	(c_strlen): Same.
	(expand_builtin_strlen): Detect missing nul.
	* expr.c (string_constant): Add arguments.  Detect missing nul
	terminator and outermost declaration it's missing in.
	* expr.h (string_constant): Add argument.
	* fold-const.c (c_getstr): Change argument to bool*, rename
	other arguments.
	* fold-const-call.c (fold_const_call): Detect missing nul.
	* gimple-fold.c (get_range_strlen): Add argument.
	(get_maxval_strlen): Adjust.
	* gimple-fold.h (get_range_strlen): Add argument.

gcc/testsuite/ChangeLog:

	PR tree-optimization/86714
	PR tree-optimization/86711
	PR tree-optimization/86552
	* gcc.c-torture/execute/memchr-1.c: New test.
	* gcc.c-torture/execute/pr86714.c: New test.
	* gcc.dg/warn-strlen-no-nul.c: New test.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index aa3e0d8..f4924d5 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -150,7 +150,7 @@ static tree stabilize_va_list_loc (location_t, tree, int);
 static rtx expand_builtin_expect (tree, rtx);
 static tree fold_builtin_constant_p (tree);
 static tree fold_builtin_classify_type (tree);
-static tree fold_builtin_strlen (location_t, tree, tree);
+static tree fold_builtin_strlen (location_t, tree, tree, tree);
 static tree fold_builtin_inf (location_t, tree, int);
 static tree rewrite_call_expr (location_t, tree, int, tree, int, ...);
 static bool validate_arg (const_tree, enum tree_code code);
@@ -550,6 +550,36 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
   return n;
 }
 
+/* For a call expression EXP to a function that expects a string argument,
+   issue a diagnostic due to it being called with an argument NONSTR
+   that is a character array with no terminating NUL.  */
+
+void
+warn_string_no_nul (location_t loc, tree exp, tree fndecl, tree nonstr)
+{
+  loc = expansion_point_location_if_in_system_header (loc);
+
+  bool warned;
+  if (exp)
+{
+  if (!fndecl)
+	fndecl = get_callee_fndecl (exp);
+  warned = warning_at (loc, OPT_Wstringop_overflow_,
+			   "%K%qD argument missing terminating nul",
+			   exp, fndecl);
+}
+  else
+{
+  gcc_assert (fndecl);
+  warned = warning_at (loc, OPT_Wstringop_overflow_,
+			   "%qD argument missing terminating nul",
+			   fndecl);
+}
+
+  if (warned && DECL_P (nonstr))
+  

Re: [14/46] Make STMT_VINFO_VEC_STMT a stmt_vec_info

2018-08-01 Thread H.J. Lu
On Tue, Jul 24, 2018 at 2:58 AM, Richard Sandiford
 wrote:
> This patch changes STMT_VINFO_VEC_STMT from a gimple stmt to a
> stmt_vec_info and makes the vectorizable_* routines pass back
> a stmt_vec_info to vect_transform_stmt.
>
>
> 2018-07-24  Richard Sandiford  
>
> gcc/
> * tree-vectorizer.h (_stmt_vec_info::vectorized_stmt): Change from
> a gimple stmt to a stmt_vec_info.
> (vectorizable_condition, vectorizable_live_operation)
> (vectorizable_reduction, vectorizable_induction): Pass back the
> vectorized statement as a stmt_vec_info.
> * tree-vect-data-refs.c (vect_record_grouped_load_vectors): Update
> use of STMT_VINFO_VEC_STMT.
> * tree-vect-loop.c (vect_create_epilog_for_reduction): Likewise,
> accumulating the inner phis that feed the STMT_VINFO_VEC_STMT
> as stmt_vec_infos rather than gimple stmts.
> (vectorize_fold_left_reduction): Change vec_stmt from a gimple stmt
> to a stmt_vec_info.
> (vectorizable_live_operation): Likewise.
> (vectorizable_reduction, vectorizable_induction): Likewise,
> updating use of STMT_VINFO_VEC_STMT.
> * tree-vect-stmts.c (vect_get_vec_def_for_operand_1): Update use
> of STMT_VINFO_VEC_STMT.
> (vect_build_gather_load_calls, vectorizable_bswap, vectorizable_call)
> (vectorizable_simd_clone_call, vectorizable_conversion)
> (vectorizable_assignment, vectorizable_shift, vectorizable_operation)
> (vectorizable_store, vectorizable_load, vectorizable_condition)
> (vectorizable_comparison, can_vectorize_live_stmts): Change vec_stmt
> from a gimple stmt to a stmt_vec_info.
> (vect_transform_stmt): Update use of STMT_VINFO_VEC_STMT.  Pass a
> pointer to a stmt_vec_info to the vectorizable_* routines.
>

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86824

-- 
H.J.


Re: [PATCH] change %G argument from gcall* to gimple*

2018-08-01 Thread David Malcolm
On Wed, 2018-08-01 at 17:39 -0600, Martin Sebor wrote:
> > Please can you mail me when you commit, so I can rebase and retest
> > my
> > patch accordingly.
> > 
> > Thanks!
> 
> I've committed the patch in r263239.


Thanks!
Dave


Re: [PATCH] change %G argument from gcall* to gimple*

2018-08-01 Thread Martin Sebor

Please can you mail me when you commit, so I can rebase and retest my
patch accordingly.

Thanks!


I've committed the patch in r263239.

Thanks
Martin


Re: [PATCHv3 0/6] std::future::wait_* improvements

2018-08-01 Thread Jonathan Wakely

On 01/08/18 20:49 +0100, Jonathan Wakely wrote:

On 01/08/18 14:19 +0100, Mike Crowe wrote:

v2 of this series was originally posted back in January (see
https://gcc.gnu.org/ml/libstdc++/2018-01/msg00035.html )

Apart from minor log message tweaks, the changes since that version are:

* [1/6] Improve libstdc++-v3 async test

Speed up the tests at the risk of more sporadic failures on loaded
machines.

Use lambda rather than separate function for asynchronous routine.

* [2/6] libstdc++ futex: Use FUTEX_CLOCK_REALTIME for wait

Fall back to using gettimeofday and FUTEX_WAIT if FUTEX_WAIT_BITSET and
FUTEX_CLOCK_REALTIME are not available.

* [3/6] libstdc++ futex: Support waiting on std::chrono::steady_clock directly

Fall back to using clock_gettime (or the syscall directly if necessary)
and FUTEX_WAIT if FUTEX_WAIT_BITSET is unavailable.

* [4/6] libstdc++ atomic_futex: Use std::chrono::steady_clock as reference clock

No changes

* [5/6] libstdc++ futex: Loop when waiting against arbitrary clock

New patch. My work on std::condition_variable::wait_until made me realise
that there's a risk of indicating a timeout too early when using a
non-standard clock.

* [6/6] Extra async tests, not for merging

Use lambdas rather than separate functions for asynchronous routines.
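
The early-timeout risk described for [5/6] can be sketched in plain C as
follows.  This is not the libstdc++ code: it models the caller's clock
and the underlying relative wait as callbacks, and every name here
(`wait_until_on_clock`, `fake_clock`, `fake_now`, `fake_wait`) is
hypothetical.  The idea is that a relative wait timing out does not
prove the caller's clock has reached the deadline, so the loop checks
that clock again before reporting a timeout.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t (*clock_now_fn) (void *ctx);
/* Waits up to REL_NS; returns true if the awaited condition occurred.  */
typedef bool (*wait_for_fn) (void *ctx, int64_t rel_ns);

/* Returns true if the condition occurred, false on a genuine timeout
   as measured by the caller's clock.  */
bool
wait_until_on_clock (clock_now_fn now, wait_for_fn wait_for, void *ctx,
                     int64_t deadline_ns)
{
  for (;;)
    {
      int64_t t = now (ctx);
      if (t >= deadline_ns)
        return false;
      if (wait_for (ctx, deadline_ns - t))
        return true;
      /* The relative wait expired; loop and re-check the caller's clock
         instead of assuming the deadline has passed.  */
    }
}

/* Test double: a clock that advances by only about half of each
   requested wait, so a single wait always returns early.  */
struct fake_clock { int64_t now_ns; int waits; };

int64_t
fake_now (void *ctx)
{
  return ((struct fake_clock *) ctx)->now_ns;
}

bool
fake_wait (void *ctx, int64_t rel_ns)
{
  struct fake_clock *fc = ctx;
  fc->now_ns += rel_ns / 2 + 1;
  fc->waits++;
  return false;   /* The condition never occurs in this double.  */
}
```

With the fake clock, one wait is never enough to reach the deadline, so
a non-looping implementation would report the timeout too early.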


Torvald Riegel had some objections to my design, but did not respond when I
attempted to justify it and attempted to change my implementation based on
his suggestions (see
https://gcc.gnu.org/ml/libstdc++/2018-01/msg00071.html).

It looks like https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68519 could
apply equally well to __atomic_futex_unsigned::_M_load_when_equal_for.
I plan to look at that next.

I set up a Debian 4.0 "Etch" system (v2.6.18 kernel, glibc 2.3.6.)
Unfortunately, its GCC 4.1.2 is unable to compile GCC master (as of
5ba044fc3a443274462527eed385732f7ecee3a8) because hash-map.h appears to be
trying to put a reference in a std::pair.


That's https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86739


The above patches cherry-pick
cleanly back to GCC 7.3, and that version does build after adding a few
constants to the system headers. I've confirmed that the lack of support
for FUTEX_WAIT_BITSET is detected correctly and that the code falls back to
using FUTEX_WAIT. This test also shows that the fallback to calling the
clock_gettime system call directly is working too. The async.cc tests all
passed.

I haven't made any attempt to add entries to the .abilist files. I'm not
sure whether I'm supposed to do that as part of the patches, or not.


Changes to gnu.ver linker script are needed, which should be in the patch.


This is the change needed for 'make check-abi' to pass:

--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -1902,10 +1902,9 @@ GLIBCXX_3.4.21 {
_ZNSt7codecvtID[is]c*;
_ZT[ISV]St7codecvtID[is]c*E;

-extern "C++"
-{
-  std::__atomic_futex_unsigned_base*;
-};
+# std::__atomic_futex_unsigned_base members
+_ZNSt28__atomic_futex_unsigned_base19_M_futex_notify_all*;
+_ZNSt28__atomic_futex_unsigned_base19_M_futex_wait_until*;

# codecvt_utf8 etc.
_ZNKSt19__codecvt_utf8_base*;
@@ -2044,6 +2043,9 @@ GLIBCXX_3.4.26 {
_ZNSt3pmr20get_default_resourceEv;
_ZNSt3pmr20set_default_resourceEPNS_15memory_resourceE;

+# std::__atomic_futex_unsigned_base::_M_futex_wait_until_steady
+_ZNSt28__atomic_futex_unsigned_base26_M_futex_wait_until_steady*;
+
} GLIBCXX_3.4.25;

# Symbols in the support library (libsupc++) have their own tag.




Changes to the baseline_symbols.txt files should not be included in
the patch, those are regenerated later.




Mike Crowe (6):
Improve libstdc++-v3 async test
libstdc++ futex: Use FUTEX_CLOCK_REALTIME for wait
libstdc++ futex: Support waiting on std::chrono::steady_clock directly
libstdc++ atomic_futex: Use std::chrono::steady_clock as reference
  clock
libstdc++ futex: Loop when waiting against arbitrary clock
Extra async tests, not for merging


Most of these patches fail to apply because they use spaces where the
GCC code has tabs. I can fix it, but I'm not sure why the tabs aren't
in your patches.






Re: [PATCH] Handle overlength strings in the C FE

2018-08-01 Thread Joseph Myers
On Wed, 1 Aug 2018, Eric Gallager wrote:

> While modifying -Woverlength-strings, I think a good way to test it

The -Woverlength-strings option (warn about string constants longer than 
the minimum guaranteed by the C standard to be supported) has nothing 
whatever to do with the present patch discussion (which relates to string 
constants longer than the array they are used as an initializer for).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH,AIX] Optimize the time required for loading XCOFF data

2018-08-01 Thread Ian Lance Taylor
On Tue, Jul 31, 2018 at 8:12 AM, REIX, Tony  wrote:
>
> Description:
>  * This patch optimizes the time required for loading XCOFF data.
>
> Tests:
>  * AIX: Build: SUCCESS
>- build made by means of gmake on AIX.
>
> ChangeLog:
>   * xcoff.c: Optimize loading of XCOFF data.

Thanks, committed with this ChangeLog entry.

Ian



2018-08-01  Tony Reix  

* xcoff.c (struct xcoff_line, struct xcoff_line_vector): Remove.
(struct xcoff_func, struct xcoff_func_vector): New structs.
(xcoff_syminfo): Drop leading dot from symbol name.
(xcoff_line_compare, xcoff_line_search): Remove.
(xcoff_func_compare, xcoff_func_search): New static functions.
(xcoff_lookup_pc): Search function table.
(xcoff_add_line, xcoff_process_linenos): Remove.
(xcoff_initialize_fileline): Build function table.
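
For context, searching a function table by PC is typically a binary
search over entries sorted by start address.  The sketch below shows the
general technique only; it is not the committed xcoff.c code, and
`func_entry`, `func_search` and `lookup_pc` are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table entry: a function's start address, size and name.  */
struct func_entry
{
  uintptr_t pc;
  size_t size;
  const char *name;
};

/* bsearch callback: compare a PC key against one entry's address range.  */
static int
func_search (const void *vkey, const void *ventry)
{
  const uintptr_t *key = vkey;
  const struct func_entry *entry = ventry;

  if (*key < entry->pc)
    return -1;
  if (*key >= entry->pc + entry->size)
    return 1;
  return 0;
}

/* Find the entry containing PC in TABLE, which must be sorted by pc
   with non-overlapping ranges.  Returns NULL if no entry contains it.  */
const struct func_entry *
lookup_pc (const struct func_entry *table, size_t count, uintptr_t pc)
{
  return bsearch (&pc, table, count, sizeof *table, func_search);
}
```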


Re: Move all wide_int_range* functions into wide-int-range.[ch]

2018-08-01 Thread David Malcolm
On Wed, 2018-08-01 at 15:39 -0400, Aldy Hernandez wrote:
> This is actually an obvious patch, but I'm not committing it just in 
> case you'd prefer another name for the files.

BTW, is it our policy that new gcc C++ source files should have a .cc
extension?

> OK pending bootstrap?
> 
> Aldy


Re: [PATCH] change %G argument from gcall* to gimple*

2018-08-01 Thread David Malcolm
On Wed, 2018-08-01 at 13:53 -0600, Martin Sebor wrote:
> On 08/01/2018 08:33 AM, David Malcolm wrote:
> > On Tue, 2018-07-31 at 13:06 -0600, Martin Sebor wrote:
> > > The GCC internal %G directive takes a gcall* argument and prints
> > > the call's inlining stack in diagnostics.  The argument type
> > > makes
> > > it unsuitable for gimple expressions such as those diagnosed by
> > > -Warray-bounds.
> > > 
> > > As the first step in adding inlining context to -Warray-bounds
> > > warnings the attached patch changes the %G argument to accept
> > > gimple* instead of gcall*.  (More work is needed for %G to
> > > preserve the location range within diagnostics so this patch
> > > just implements the first step.)
> > 
> > Thanks for the patch.
> > 
> > I'm afraid I've been touching some of the same code recently (as
> > part
> > of my work on dumpfile.c), so I think this patch needs rebasing and
> > retesting (sorry!).
> 
> No problem.  I have seen some of your changes but didn't spot
> any serious conflicts.
> 
> > 
> > In particular, my r263181:
> >   https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01765.html
> >("c-family: clean up the data tables in c-format.c",
> > aka 98605dea9f97f74e6a5e75308774c117292b184e)
> > cleaned up part of c-format.c that your patch touches; I think your
> > patch is from before then.
> 
> Yes, I think the only adjustments to be made here should be
> to the gcall*/gimple* comments.
> 
> > 
> > Also, I noticed that your patch conflicts with my (not yet
> > approved)
> > patch here:
> > 
> >[PATCH 5/5] Formatted printing for dump_* in the middle-end
> >  https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01763.html
> > 
> > (which *adds* a usage of "gimple *" for a new pretty_printer
> > subclass,
> > whereas yours changes the "gcall *" usage to "gimple *"), so we
> > need to
> > sync up on that - I'm volunteering for me to wait for you (but
> > please
> > send me a heads-up email when you eventually commit).
> 
> I saw this patch yesterday but didn't notice a conflict.  Your
> changes are more extensive than mine so thanks for the offer to
> deal with any potential collisions.  If your patch is approved
> first it shouldn't be too hard for me to adjust to yours.  I
> think we can just wait and see.

I'll wait until yours is committed (see below).

> > > PR tree-optimization/86650 - -Warray-bounds missing inlining
> > > context
> > > 
> > > gcc/c/ChangeLog:
> > > 
> > >   PR tree-optimization/86650
> > >   * c-objc-common.c (c_tree_printer): Adjust.
> > 
> > I feel a bit hypocritical saying this, as I dislike writing
> > ChangeLog
> > entries, but I find this one too terse: I find myself asking "what
> > adjustment is being made, and why?"
> > 
> > How about something like:
> > 
> > gcc/c/ChangeLog:
> > 
> > PR tree-optimization/86650
> > * c-objc-common.c (c_tree_printer): Move usage of
> > EXPR_LOCATION (t) and TREE_BLOCK (t) from within
> > percent_K_format
> > to this callsite.
> > 
> > or somesuch (assuming that I've read the intent of the change
> > correctly); *is* that the intent of this part of the patch?
> 
> The intent is to be able to call percent_K_format() with location
> and block arguments that are different from those embedded in arg.
> This is relied on in percent_G_format(), otherwise the change has
> no effect on percent_K_format() or its other callers.
> 
> Most of the changes in this patch were mechanical, and this one
> is in my mind obvious: call the function with its new arguments.
> No functionality has been added or removed here.  But I agree
> that your description is more -- descriptive? -- so I've replaced
> the entry with the text as you suggest.

I'm of two minds as to the merits of ChangeLog entries, but I do like
to see some kind of high-level description of the proposed change
somewhere in the email, either in the "blurb" in the covering email, or
in the ChangeLog itself.

The thing that I felt was missing in the initial patch was a
description of the refactoring of percent_K_format's arguments; to me
it felt like something that should be captured *somewhere* (either in
the mailing list archive and/or in the ChangeLog; the latter feels like
a better place).

> > > gcc/c-family/ChangeLog:
> > > 
> > >   PR tree-optimization/86650
> > >   * c-format.c (local_gcall_ptr_node): Rename...
> > >(local_gimple_ptr_node): ...to this.
> > >   * c-format.h (T89_G): Adjust.
> > 
> > Likewise, I find this too terse, and it's incomplete: it's missing
> > the
> > changes to gcc_diag_char_table and to init_dynamic_diag_info.
> > 
> > How about something like this:
> > 
> > * c-format.c (local_gcall_ptr_node): Rename...
> > (local_gimple_ptr_node): ...to this.
> > (gcc_diag_char_table): Update comment for "%G".
> > (init_dynamic_diag_info): Update from "gcall *" to "gimple *".
> > * c-format.h (T89_G): Update to be "gimple *" rather than
> > "gcall *".
> 
> I've changed it to the above.  (I didn't know that 

Re: [PATCH] Handle overlength strings in the C FE

2018-08-01 Thread Eric Gallager
On 8/1/18, Martin Sebor  wrote:
> On 08/01/2018 05:20 AM, Bernd Edlinger wrote:
>> On 07/30/18 17:49, Joseph Myers wrote:
>>> On Mon, 30 Jul 2018, Bernd Edlinger wrote:
>>>
>>>> Hi,
>>>>
>>>> this is how I would like to handle the over length strings issue in the
>>>> C FE.
>>>> If the string constant is exactly the right length and ends in one
>>>> explicit
>>>> NUL character, shorten it by one character.
>>>
>>> I don't think shortening should be limited to that case.  I think the
>>> case
>>> where the constant is longer than that (and so gets an unconditional
>>> pedwarn) should also have it shortened - any constant that doesn't fit
>>> in
>>> the object being initialized should be shortened to fit, whether
>>> diagnosed
>>> or not, we should define GENERIC / GIMPLE to disallow too-large string
>>> constants in initializers, and should add an assertion somewhere in the
>>> middle-end that no too-large string constants reach it.
>>>
>>
>> Okay, there is an update following your suggestion.
>>
>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>> Is it OK for trunk?
>
> The ChangeLog description says:
>
>   * c-typeck.c (digest_init): Fix overlength strings.
>
> suggesting there is a bug but there is no test case.  If there
> is a bug in there that can be triggered by C code (valid or
> otherwise), it would be good to have a test case and a bug
> in Bugzilla.  If there is no bug and this is just cleanup,
> I would suggest to adjust the description.
>
> Other than that, while making improvements here, I think it
> would be helpful to also add more detail to the text of
> the warning:
>
> 1) mention the type of the array being initialized in case
> it's not obvious from the declaration (the array bound could
> be a symbol, not a literal, or the type could be a typedef)
>
> 2) mention the number of elements in the initializer in case
> it's a macro (such as __FILE__) whose definition isn't visible
> in the diagnostic
>
> 3) mention that the excess elements are ignored (since it's
> undefined in the standard, it will let users know what
> happens in GCC).
>
> Here's a test case and a suggested warning:
>
>#define S __FILE__ "\000"
>enum { N = sizeof __FILE__ };
>const char a[N] = S;
>
>warning: discarding 1 excess element from initializer-string for
> 'char[4]' [-Wc++-compat]
> #define S __FILE__ "\000"
>   ^~~~
>note: in expansion of macro ‘S’
> const char a[N] = S;
>   ^
> (Similarly for more than 1 excess element.)
>
> Martin
>

While modifying -Woverlength-strings, I think a good way to test it
would be to enable it during bootstrap. Even though it is currently
disabled, I have done bootstraps before while manually enabling it,
and they still succeeded. Plus there is documentation that says to
avoid overlength strings in GCC sources, so it would be good if this
were verified. I also brought this up in bug 80942:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80942

Eric


Re: [PATCH] Handle overlength strings in the C FE

2018-08-01 Thread Joseph Myers
On Wed, 1 Aug 2018, Marek Polacek wrote:

> I guess you want XALLOCAVAR or XNEWVAR.

Not XALLOCAVAR; GCC supports string constants up to 2 GB (minus one byte), 
which is far too much to put on the stack.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552)

2018-08-01 Thread Martin Sebor

On 08/01/2018 10:34 AM, Martin Sebor wrote:

If you care about detecting bugs I would expect you to be
supportive rather than dismissive of this work, and helpful
in bringing it to fruition rather than putting it down or
questioning my priorities.  Especially since the work was
prompted by your own (valid) complaint that GCC doesn't
diagnose them.



You don't really listen to what I am saying, I did not say
that we need another warning instead of fixing the wrong
optimization issue at hand.

But I am in good company, you don't listen to Jakub and Richi
either.


I certainly intend to fix bugs I'm responsible for introducing.
I always do if given the chance.  I assume you are referring
to bug 86711 (and 86714).  Fixing the underlying problem has
been on my mind since you first mentioned it, and on my to-do
list since last week (bug 86688).


I've started looking into fixing 86711 but as it turns out,
by avoiding folding non-nul-terminated strings, this patch
already fixes it as well as producing the output you expect
for the test case in 86714, and also fixes 86688.

So unless you intend to pursue your patch I will assign all
these bugs to myself, add the test cases to this patch, and
resubmit it.  (I would normally prefer to deal with each bug
independently, but since I already have a working patch that
does the right thing I'd just as soon save the time and effort
and not try to break it up).

Martin


Re: [PATCH] Handle overlength strings in the C FE

2018-08-01 Thread Marek Polacek
On Wed, Aug 01, 2018 at 08:06:53PM +, Bernd Edlinger wrote:
> On 08/01/18 18:04, Joseph Myers wrote:
> > On Wed, 1 Aug 2018, Bernd Edlinger wrote:
> > 
> >> On 07/30/18 17:49, Joseph Myers wrote:
> >>> On Mon, 30 Jul 2018, Bernd Edlinger wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> this is how I would like to handle the over length strings issue in the
> >>>> C FE.
> >>>> If the string constant is exactly the right length and ends in one
> >>>> explicit
> >>>> NUL character, shorten it by one character.
> >>>
> >>> I don't think shortening should be limited to that case.  I think the case
> >>> where the constant is longer than that (and so gets an unconditional
> >>> pedwarn) should also have it shortened - any constant that doesn't fit in
> >>> the object being initialized should be shortened to fit, whether diagnosed
> >>> or not, we should define GENERIC / GIMPLE to disallow too-large string
> >>> constants in initializers, and should add an assertion somewhere in the
> >>> middle-end that no too-large string constants reach it.
> >>>
> >>
> >> Okay, there is an update following your suggestion.
> > 
> > It seems odd to me to have two separate bits of code dealing with reducing
> > the length, rather than something like
> > 
> > if (too long)
> >{
> >  /* Decide whether to do a pedwarn_init, or a warn_cxx_compat warning,
> > or neither.  */
> >  /* Shorten string, in either case.  */
> >}
> > 
> > The memcmp with "\0\0\0\0" is introducing a hidden assumption that any
> > sort of character in strings is never more than four bytes.  It also seems
> > unnecessary, in that ultimately the over-long string should be shortened
> > regardless of whether what's being removed is a zero character or not.
> > 
> > It should not be possible to be over-long and fail tree_fits_uhwi_p
> > (TYPE_SIZE_UNIT (type)), simply because STRING_CST lengths are stored in
> > host int (even if, ideally, they'd use some other type to allow for
> > STRING_CSTs over 2GB in size).  (And I don't think GCC can represent
> > target type sizes that don't fit in unsigned HOST_WIDE_INT anyway; the
> > only way for a target type size in bytes to fail to be representable in
> > unsigned HOST_WIDE_INT should be if the size is not constant.)
> > 
> 
> Agreed.
> A new simplified version of the patch is attached.
> 
> Bootstrapped and reg-tested as usual.
> Is it OK for trunk?
> 
> 
> Thanks
> Bernd.

> 2018-08-01  Bernd Edlinger  
> 
>   * c-typeck.c (digest_init): Shorten overlength strings.
> 
> diff -pur gcc/c/c-typeck.c gcc/c/c-typeck.c
> --- gcc/c/c-typeck.c  2018-06-20 18:35:15.0 +0200
> +++ gcc/c/c-typeck.c  2018-07-31 18:49:50.757586625 +0200
> @@ -7435,19 +7435,17 @@ digest_init (location_t init_loc, tree type, tree
>   }
>   }
>  
> -   TREE_TYPE (inside_init) = type;
> if (TYPE_DOMAIN (type) != NULL_TREE
> && TYPE_SIZE (type) != NULL_TREE
> && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST)
>   {
> unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (inside_init);
> +   unsigned unit = TYPE_PRECISION (typ1) / BITS_PER_UNIT;
>  
> /* Subtract the size of a single (possibly wide) character
>because it's ok to ignore the terminating null char
>that is counted in the length of the constant.  */
> -   if (compare_tree_int (TYPE_SIZE_UNIT (type),
> - (len - (TYPE_PRECISION (typ1)
> - / BITS_PER_UNIT))) < 0)
> +   if (compare_tree_int (TYPE_SIZE_UNIT (type), len - unit) < 0)
>   pedwarn_init (init_loc, 0,
> ("initializer-string for array of chars "
>  "is too long"));
> @@ -7456,8 +7454,21 @@ digest_init (location_t init_loc, tree type, tree
>   warning_at (init_loc, OPT_Wc___compat,
>   ("initializer-string for array chars "
>"is too long for C++"));
> +   if (compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0)
> + {
> +   unsigned HOST_WIDE_INT size
> + = tree_to_uhwi (TYPE_SIZE_UNIT (type));
> +   const char *p = TREE_STRING_POINTER (inside_init);
> +   char *q = (char *)xmalloc (size + unit);

I guess you want XALLOCAVAR or XNEWVAR.

Marek


Re: [PATCH] Handle overlength strings in the C FE

2018-08-01 Thread Bernd Edlinger
On 08/01/18 18:04, Joseph Myers wrote:
> On Wed, 1 Aug 2018, Bernd Edlinger wrote:
> 
>> On 07/30/18 17:49, Joseph Myers wrote:
>>> On Mon, 30 Jul 2018, Bernd Edlinger wrote:
>>>
>>>> Hi,
>>>>
>>>> this is how I would like to handle the over length strings issue in the C
>>>> FE.
>>>> If the string constant is exactly the right length and ends in one explicit
>>>> NUL character, shorten it by one character.
>>>
>>> I don't think shortening should be limited to that case.  I think the case
>>> where the constant is longer than that (and so gets an unconditional
>>> pedwarn) should also have it shortened - any constant that doesn't fit in
>>> the object being initialized should be shortened to fit, whether diagnosed
>>> or not, we should define GENERIC / GIMPLE to disallow too-large string
>>> constants in initializers, and should add an assertion somewhere in the
>>> middle-end that no too-large string constants reach it.
>>>
>>
>> Okay, there is an update following your suggestion.
> 
> It seems odd to me to have two separate bits of code dealing with reducing
> the length, rather than something like
> 
> if (too long)
>{
>  /* Decide whether to do a pedwarn_init, or a warn_cxx_compat warning,
> or neither.  */
>  /* Shorten string, in either case.  */
>}
> 
> The memcmp with "\0\0\0\0" is introducing a hidden assumption that any
> sort of character in strings is never more than four bytes.  It also seems
> unnecessary, in that ultimately the over-long string should be shortened
> regardless of whether what's being removed is a zero character or not.
> 
> It should not be possible to be over-long and fail tree_fits_uhwi_p
> (TYPE_SIZE_UNIT (type)), simply because STRING_CST lengths are stored in
> host int (even if, ideally, they'd use some other type to allow for
> STRING_CSTs over 2GB in size).  (And I don't think GCC can represent
> target type sizes that don't fit in unsigned HOST_WIDE_INT anyway; the
> only way for a target type size in bytes to fail to be representable in
> unsigned HOST_WIDE_INT should be if the size is not constant.)
> 

Agreed.
A new simplified version of the patch is attached.

Bootstrapped and reg-tested as usual.
Is it OK for trunk?


Thanks
Bernd.
2018-08-01  Bernd Edlinger  

	* c-typeck.c (digest_init): Shorten overlength strings.

diff -pur gcc/c/c-typeck.c gcc/c/c-typeck.c
--- gcc/c/c-typeck.c	2018-06-20 18:35:15.0 +0200
+++ gcc/c/c-typeck.c	2018-07-31 18:49:50.757586625 +0200
@@ -7435,19 +7435,17 @@ digest_init (location_t init_loc, tree type, tree
 		}
 	}
 
-	  TREE_TYPE (inside_init) = type;
 	  if (TYPE_DOMAIN (type) != NULL_TREE
 	  && TYPE_SIZE (type) != NULL_TREE
 	  && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST)
 	{
 	  unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (inside_init);
+	  unsigned unit = TYPE_PRECISION (typ1) / BITS_PER_UNIT;
 
 	  /* Subtract the size of a single (possibly wide) character
 		 because it's ok to ignore the terminating null char
 		 that is counted in the length of the constant.  */
-	  if (compare_tree_int (TYPE_SIZE_UNIT (type),
-(len - (TYPE_PRECISION (typ1)
-	/ BITS_PER_UNIT))) < 0)
+	  if (compare_tree_int (TYPE_SIZE_UNIT (type), len - unit) < 0)
 		pedwarn_init (init_loc, 0,
 			  ("initializer-string for array of chars "
 			   "is too long"));
@@ -7456,8 +7454,21 @@ digest_init (location_t init_loc, tree type, tree
 		warning_at (init_loc, OPT_Wc___compat,
 			("initializer-string for array chars "
 			 "is too long for C++"));
+	  if (compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0)
+		{
+		  unsigned HOST_WIDE_INT size
+		= tree_to_uhwi (TYPE_SIZE_UNIT (type));
+		  const char *p = TREE_STRING_POINTER (inside_init);
+		  char *q = (char *)xmalloc (size + unit);
+
+		  memcpy (q, p, size);
+		  memset (q + size, 0, unit);
+		  inside_init = build_string (size + unit, q);
+		  free (q);
+		}
 	}
 
+	  TREE_TYPE (inside_init) = type;
 	  return inside_init;
 	}
   else if (INTEGRAL_TYPE_P (typ1))
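For readers without a GCC tree at hand, the shortening step in the patch above can be modelled in plain C. This is only a sketch under stated assumptions: shorten_initializer is a hypothetical stand-alone helper, not a GCC function, and it works on raw byte buffers instead of STRING_CST trees. It mirrors the patch in keeping the first "size" bytes and appending one zero character of "unit" bytes, where unit is 1 for char and larger for wide characters.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the shortening step; not GCC code.
   STR holds LEN bytes, including its terminating character of UNIT
   bytes.  The array being initialized is SIZE bytes.  If the string
   does not fit, return a heap copy cut down to SIZE bytes followed by
   one zero character of UNIT bytes, and store the new length in
   *NEW_LEN.  Otherwise return NULL and leave *NEW_LEN equal to LEN.
   The caller frees the result.  */
char *
shorten_initializer (const char *str, size_t len, size_t size,
                     size_t unit, size_t *new_len)
{
  assert (unit > 0);
  *new_len = len;
  if (len <= size)
    return NULL;

  /* Heap allocation: the string may be far too large for the stack.  */
  char *q = malloc (size + unit);
  if (q == NULL)
    abort ();
  memcpy (q, str, size);
  memset (q + size, 0, unit);
  *new_len = size + unit;
  return q;
}
```

The number of discarded bytes, which a more detailed warning could report, would then be len minus the new length.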


Re: [PATCHv3 2/6] libstdc++ futex: Use FUTEX_CLOCK_REALTIME for wait

2018-08-01 Thread Jonathan Wakely

On 01/08/18 14:19 +0100, Mike Crowe wrote:

The futex system call supports waiting for an absolute time if
FUTEX_WAIT_BITSET is used rather than FUTEX_WAIT. Doing so provides two
benefits:

1. The call to gettimeofday is not required in order to calculate a
  relative timeout.

2. If someone changes the system clock during the wait then the futex
  timeout will correctly expire earlier or later. Currently that only
  happens if the clock is changed prior to the call to gettimeofday.

According to futex(2), support for FUTEX_CLOCK_REALTIME was added in the
v2.6.28 Linux kernel and FUTEX_WAIT_BITSET was added in v2.6.25. To ensure
that the code still works correctly with earlier kernel versions, an ENOSYS
error from futex[1] results in the futex_clock_realtime_unavailable
flag being set. This flag is used to avoid the unnecessary unsupported
futex call in the future and to fall back to the previous gettimeofday and
relative time implementation.

glibc applied an equivalent switch in pthread_cond_timedwait to use
FUTEX_CLOCK_REALTIME and FUTEX_WAIT_BITSET rather than FUTEX_WAIT for
glibc-2.10 back in 2009. See glibc:cbd8aeb836c8061c23a5e00419e0fb25a34abee7

The futex_clock_realtime_unavailable flag is accessed using
std::memory_order_relaxed to stop it becoming a bottleneck. If the first
two calls to _M_futex_wait_until happen simultaneously then the
only consequence is that both will try to use FUTEX_CLOCK_REALTIME, both
risk discovering that it doesn't work and, if so, both set the flag.

[1] This is how glibc's nptl-init.c determines whether these flags are
   supported.
---
libstdc++-v3/src/c++11/futex.cc | 34 ++
1 file changed, 34 insertions(+)

diff --git a/libstdc++-v3/src/c++11/futex.cc b/libstdc++-v3/src/c++11/futex.cc
index 278a5a80902..72062a4285e 100644
--- a/libstdc++-v3/src/c++11/futex.cc
+++ b/libstdc++-v3/src/c++11/futex.cc
@@ -35,8 +35,16 @@

// Constants for the wait/wake futex syscall operations
const unsigned futex_wait_op = 0;
+const unsigned futex_wait_bitset_op = 9;
+const unsigned futex_clock_realtime_flag = 256;
+const unsigned futex_bitset_match_any = ~0;
const unsigned futex_wake_op = 1;

+namespace
+{
+  std::atomic<bool> futex_clock_realtime_unavailable;
+}
+
namespace std _GLIBCXX_VISIBILITY(default)
{
_GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -58,6 +66,32 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  }
else
  {
+   if (!futex_clock_realtime_unavailable.load(std::memory_order_relaxed))
+ {
+   struct timespec rt;
+   rt.tv_sec = __s.count();
+   rt.tv_nsec = __ns.count();
+   if (syscall (SYS_futex, __addr, futex_wait_bitset_op | futex_clock_realtime_flag, __val, &rt, nullptr, futex_bitset_match_any) == -1)
+ {
+   _GLIBCXX_DEBUG_ASSERT(errno == EINTR || errno == EAGAIN
+ || errno == ETIMEDOUT || errno == ENOSYS);


I've just replaced the use of _GLIBCXX_DEBUG_ASSERT in that file with
__glibcxx_assert, because the former is only used when Debug Mode is
active, and we'll never build libstdc++.so with Debug Mode.

The latter is enabled by -D_GLIBCXX_ASSERTIONS, which I've just added
to the default flags for the --enable-libstdcxx-debug option (which
builds a second copy of libstdc++.so without optimisations).

I can make the corresponding changes to your patches, so this is just
FYI.
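The fallback described in the patch (try the absolute-time wait once, remember an ENOSYS result in a relaxed atomic flag, then use the older relative wait) can be sketched without touching the kernel. The following is a minimal C model, not the libstdc++ code: wait_until, wait_fn and the three stub functions are hypothetical names, and the stubs stand in for the real futex syscalls by returning 0 or -1 with errno set.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a futex wait: returns 0 when woken,
   or -1 with errno set, like syscall().  */
typedef int (*wait_fn) (void);

/* Set once the absolute-time wait is known to be unsupported.  */
static atomic_bool realtime_unavailable;

/* Return false on timeout, true otherwise.  */
bool
wait_until (wait_fn absolute_wait, wait_fn relative_wait)
{
  if (!atomic_load_explicit (&realtime_unavailable, memory_order_relaxed))
    {
      if (absolute_wait () == 0)
        return true;
      if (errno == ETIMEDOUT)
        return false;
      if (errno != ENOSYS)
        return true;  /* EINTR or EAGAIN: let the caller re-check.  */
      /* Two racing threads may both store here; that is harmless.  */
      atomic_store_explicit (&realtime_unavailable, true,
                             memory_order_relaxed);
    }

  /* Legacy path: relative timeout.  */
  if (relative_wait () == -1 && errno == ETIMEDOUT)
    return false;
  return true;
}

/* Stubs for illustration only.  */
int absolute_calls;
int unsupported_wait (void) { ++absolute_calls; errno = ENOSYS; return -1; }
int timed_out_wait (void) { errno = ETIMEDOUT; return -1; }
int woken_wait (void) { return 0; }
```

With these stubs, a first call passing unsupported_wait sets the flag and falls through to the relative wait; later calls skip the unsupported operation entirely.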



[PATCH, rs6000] Correct descriptions of __builtin_bcdadd* and __builtin_bcdsub* functions

2018-08-01 Thread Kelvin Nilsen
Several errors were discovered in the descriptions of the __builtin_bcdadd, 
__builtin_bcdadd_lt, __builtin_bcdadd_eq, __builtin_bcdadd_gt, 
__builtin_bcdadd_ov, __builtin_bcdsub, __builtin_bcdsub_lt, 
__builtin_bcdsub_eq, __builtin_bcdsub_gt, and __builtin_bcdsub_ov functions.  
This patch corrects these documentation errors.

I have built the gcc.pdf file and reviewed the formatting, and all looks good.

Is this ok for trunk?

gcc/ChangeLog:

2018-08-01  Kelvin Nilsen  

* doc/extend.texi (PowerPC AltiVec Built-in Functions Available on
ISA 2.07): Correct spelling of bcdsub to be __builtin_bcdsub.  Add
third argument of type "const signed char" to descriptions of
__builtin_bcdadd, __builtin_bcdadd_lt, __builtin_bcdadd_eq,
__builtin_bcdadd_gt, __builtin_bcdadd_ov, __builtin_bcdsub,
__builtin_bcdsub_lt, __builtin_bcdsub_eq, __builtin_bcdsub_gt,
__builtin_bcdsub_ov functions.

Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi (revision 263068)
+++ gcc/doc/extend.texi (working copy)
@@ -18383,16 +18383,16 @@ vector __uint128 vec_vsubcuq (vector __uint128, ve
 __int128 vec_vsubuqm (__int128, __int128);
 __uint128 vec_vsubuqm (__uint128, __uint128);
 
-vector __int128 __builtin_bcdadd (vector __int128, vector __int128);
-int __builtin_bcdadd_lt (vector __int128, vector __int128);
-int __builtin_bcdadd_eq (vector __int128, vector __int128);
-int __builtin_bcdadd_gt (vector __int128, vector __int128);
-int __builtin_bcdadd_ov (vector __int128, vector __int128);
-vector __int128 bcdsub (vector __int128, vector __int128);
-int __builtin_bcdsub_lt (vector __int128, vector __int128);
-int __builtin_bcdsub_eq (vector __int128, vector __int128);
-int __builtin_bcdsub_gt (vector __int128, vector __int128);
-int __builtin_bcdsub_ov (vector __int128, vector __int128);
+vector __int128 __builtin_bcdadd (vector __int128, vector __int128, const signed char);
+int __builtin_bcdadd_lt (vector __int128, vector __int128, const signed char);
+int __builtin_bcdadd_eq (vector __int128, vector __int128, const signed char);
+int __builtin_bcdadd_gt (vector __int128, vector __int128, const signed char);
+int __builtin_bcdadd_ov (vector __int128, vector __int128, const signed char);
+vector __int128 __builtin_bcdsub (vector __int128, vector __int128, const signed char);
+int __builtin_bcdsub_lt (vector __int128, vector __int128, const signed char);
+int __builtin_bcdsub_eq (vector __int128, vector __int128, const signed char);
+int __builtin_bcdsub_gt (vector __int128, vector __int128, const signed char);
+int __builtin_bcdsub_ov (vector __int128, vector __int128, const signed char);
 @end smallexample
 
 @node PowerPC AltiVec Built-in Functions Available on ISA 3.0



Re: [PATCH] change %G argument from gcall* to gimple*

2018-08-01 Thread Martin Sebor

On 08/01/2018 08:33 AM, David Malcolm wrote:

On Tue, 2018-07-31 at 13:06 -0600, Martin Sebor wrote:

The GCC internal %G directive takes a gcall* argument and prints
the call's inlining stack in diagnostics.  The argument type makes
it unsuitable for gimple expressions such as those diagnosed by
-Warray-bounds.

As the first step in adding inlining context to -Warray-bounds
warnings the attached patch changes the %G argument to accept
gimple* instead of gcall*.  (More work is needed for %G to
preserve the location range within diagnostics so this patch
just implements the first step.)


Thanks for the patch.

I'm afraid I've been touching some of the same code recently (as part
of my work on dumpfile.c), so I think this patch needs rebasing and
retesting (sorry!).


No problem.  I have seen some of your changes but didn't spot
any serious conflicts.



In particular, my r263181:
  https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01765.html
   ("c-family: clean up the data tables in c-format.c",
aka 98605dea9f97f74e6a5e75308774c117292b184e)
cleaned up part of c-format.c that your patch touches; I think your
patch is from before then.


Yes, I think the only adjustments to be made here should be
to the gcall*/gimple* comments.



Also, I noticed that your patch conflicts with my (not yet approved)
patch here:

   [PATCH 5/5] Formatted printing for dump_* in the middle-end
 https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01763.html

(which *adds* a usage of "gimple *" for a new pretty_printer subclass,
whereas yours changes the "gcall *" usage to "gimple *"), so we need to
sync up on that - I'm volunteering for me to wait for you (but please
send me a heads-up email when you eventually commit).


I saw this patch yesterday but didn't notice a conflict.  Your
changes are more extensive than mine so thanks for the offer to
deal with any potential collisions.  If your patch is approved
first it shouldn't be too hard for me to adjust to yours.  I
think we can just wait and see.


PR tree-optimization/86650 - -Warray-bounds missing inlining context

gcc/c/ChangeLog:

PR tree-optimization/86650
* c-objc-common.c (c_tree_printer): Adjust.


I feel a bit hypocritical saying this, as I dislike writing ChangeLog
entries, but I find this one too terse: I find myself asking "what
adjustment is being made, and why?"

How about something like:

gcc/c/ChangeLog:

PR tree-optimization/86650
* c-objc-common.c (c_tree_printer): Move usage of
EXPR_LOCATION (t) and TREE_BLOCK (t) from within percent_K_format
to this callsite.

or somesuch (assuming that I've read the intent of the change
correctly); *is* that the intent of this part of the patch?


The intent is to be able to call percent_K_format() with location
and block arguments that are different from those embedded in arg.
This is relied on in percent_G_format(), otherwise the change has
no effect on percent_K_format() or its other callers.

Most of the changes in this patch were mechanical, and this one
is in my mind obvious: call the function with its new arguments.
No functionality has been added or removed here.  But I agree
that your description is more -- descriptive? -- so I've replaced
the entry with the text as you suggest.


gcc/c-family/ChangeLog:

PR tree-optimization/86650
* c-format.c (local_gcall_ptr_node): Rename...
 (local_gimple_ptr_node): ...to this.
* c-format.h (T89_G): Adjust.


Likewise, I find this too terse, and it's incomplete: it's missing the
changes to gcc_diag_char_table and to init_dynamic_diag_info.

How about something like this:

* c-format.c (local_gcall_ptr_node): Rename...
(local_gimple_ptr_node): ...to this.
(gcc_diag_char_table): Update comment for "%G".
(init_dynamic_diag_info): Update from "gcall *" to "gimple *".
* c-format.h (T89_G): Update to be "gimple *" rather than
"gcall *".


I've changed it to the above.  (I didn't know that trivial changes
like updating comments were expected to be documented in ChangeLogs,
but fine.)



FWIW I use this script to help write ChangeLog entries.
It saves a lot of gruntwork:

  
https://github.com/davidmalcolm/gcc-refactoring-scripts/blob/master/generate-changelog.py

(but the remaining work is still tedious, alas)


gcc/cp/ChangeLog:

PR tree-optimization/86650
* error.c (cp_printer): Adjust.


See the suggestion above for c-objc-common.c (c_tree_printer).


Ditto.




gcc/ChangeLog:

PR tree-optimization/86650
* gimple-pretty-print.c (percent_G_format): Simplify.
* tree-diagnostic.c (default_tree_printer): Adjust.
* tree-pretty-print.c (percent_K_format): Add argument.
* tree-pretty-print.h: Add argument.
* gimple-fold.c (gimple_fold_builtin_strncpy): Adjust.
* gimple-ssa-warn-restrict.h (check_bounds_or_overlap): Replace
gcall* argument with gimple*.
* gimple-ssa-warn-restrict.c 

[PATCH] Add -D_GLIBCXX_ASSERTIONS to DEBUG_FLAGS

2018-08-01 Thread Jonathan Wakely

Enable assertions in the extra debug library built when
--enable-libstdcxx-debug is used. Replace some Debug Mode assertions
in src/c++11/futex.cc with __glibcxx_assert, because the library will
never be built with Debug Mode.

* configure: Regenerate.
* configure.ac: Add -D_GLIBCXX_ASSERTIONS to default DEBUG_FLAGS.
* src/c++11/futex.cc: Use __glibcxx_assert instead of
_GLIBCXX_DEBUG_ASSERT.

Tested powerpc64le-linux, committed to trunk.


commit 1af9b89d2b7c7feefae208ad4ebbb4cafe9f3ce4
Author: Jonathan Wakely 
Date:   Wed Aug 1 18:28:06 2018 +0100

Add -D_GLIBCXX_ASSERTIONS to DEBUG_FLAGS

Enable assertions in the extra debug library built when
--enable-libstdcxx-debug is used. Replace some Debug Mode assertions
in src/c++11/futex.cc with __glibcxx_assert, because the library will
never be built with Debug Mode.

* configure: Regenerate.
* configure.ac: Add -D_GLIBCXX_ASSERTIONS to default DEBUG_FLAGS.
* src/c++11/futex.cc: Use __glibcxx_assert instead of
_GLIBCXX_DEBUG_ASSERT.

diff --git a/libstdc++-v3/configure.ac b/libstdc++-v3/configure.ac
index 1e0a33fb3ea..332af3706d3 100644
--- a/libstdc++-v3/configure.ac
+++ b/libstdc++-v3/configure.ac
@@ -168,7 +168,7 @@ GLIBCXX_ENABLE_LONG_LONG([yes])
 GLIBCXX_ENABLE_WCHAR_T([yes])
 GLIBCXX_ENABLE_C99([yes])
 GLIBCXX_ENABLE_CONCEPT_CHECKS([no])
-GLIBCXX_ENABLE_DEBUG_FLAGS(["-gdwarf-4 -g3 -O0"])
+GLIBCXX_ENABLE_DEBUG_FLAGS(["-gdwarf-4 -g3 -O0 -D_GLIBCXX_ASSERTIONS"])
 GLIBCXX_ENABLE_DEBUG([no])
 GLIBCXX_ENABLE_PARALLEL([yes])
 GLIBCXX_ENABLE_CXX_FLAGS
diff --git a/libstdc++-v3/src/c++11/futex.cc b/libstdc++-v3/src/c++11/futex.cc
index 278a5a80902..a5a8ec68c53 100644
--- a/libstdc++-v3/src/c++11/futex.cc
+++ b/libstdc++-v3/src/c++11/futex.cc
@@ -53,7 +53,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
// here on errors is abort.
int ret __attribute__((unused));
ret = syscall (SYS_futex, __addr, futex_wait_op, __val, nullptr);
-   _GLIBCXX_DEBUG_ASSERT(ret == 0 || errno == EINTR || errno == EAGAIN);
+   __glibcxx_assert(ret == 0 || errno == EINTR || errno == EAGAIN);
return true;
   }
 else
@@ -75,8 +75,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
if (syscall (SYS_futex, __addr, futex_wait_op, __val, &rt) == -1)
  {
-   _GLIBCXX_DEBUG_ASSERT(errno == EINTR || errno == EAGAIN
- || errno == ETIMEDOUT);
+   __glibcxx_assert(errno == EINTR || errno == EAGAIN
+|| errno == ETIMEDOUT);
if (errno == ETIMEDOUT)
  return false;
  }


Re: [PATCHv3 0/6] std::future::wait_* improvements

2018-08-01 Thread Jonathan Wakely

On 01/08/18 14:19 +0100, Mike Crowe wrote:

v2 of this series was originally posted back in January (see
https://gcc.gnu.org/ml/libstdc++/2018-01/msg00035.html )

Apart from minor log message tweaks, the changes since that version are:

* [1/6] Improve libstdc++-v3 async test

 Speed up the tests at the risk of more sporadic failures on loaded
 machines.

 Use lambda rather than separate function for asynchronous routine.

* [2/6] libstdc++ futex: Use FUTEX_CLOCK_REALTIME for wait

 Fall back to using gettimeofday and FUTEX_WAIT if FUTEX_WAIT_BITSET and
 FUTEX_CLOCK_REALTIME are not available.

* [3/6] libstdc++ futex: Support waiting on std::chrono::steady_clock directly

 Fall back to using clock_gettime (or the syscall directly if necessary)
 and FUTEX_WAIT if FUTEX_WAIT_BITSET is unavailable.

* [4/6] libstdc++ atomic_futex: Use std::chrono::steady_clock as reference clock

 No changes

* [5/6] libstdc++ futex: Loop when waiting against arbitrary clock

 New patch. My work on std::condition_variable::wait_until made me realise
 that there's a risk of indicating a timeout too early when using a
 non-standard clock.

* [6/6] Extra async tests, not for merging

 Use lambdas rather than separate functions for asynchronous routines.
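The idea behind [5/6] can be illustrated with a small model. When the caller's clock differs from the clock the underlying wait uses, a timeout reported by the underlying wait does not prove that the caller's deadline has passed, so the wait has to be retried until the caller's clock agrees. The sketch below is a guess at the general shape, not the code from the patch: wait_until_user_clock, clock_fn, wait_for_fn and the stubs are hypothetical, and time is counted in abstract ticks.

```c
#include <assert.h>
#include <stdbool.h>

typedef long ticks;
/* Reads the caller's (arbitrary) clock.  */
typedef ticks (*clock_fn) (void);
/* Blocks for up to D ticks of the reference clock; true if woken.  */
typedef bool (*wait_for_fn) (ticks d);

/* Return true if woken, false once the caller's clock has reached
   DEADLINE.  */
bool
wait_until_user_clock (clock_fn user_now, wait_for_fn wait_for,
                       ticks deadline)
{
  for (;;)
    {
      ticks now = user_now ();
      if (now >= deadline)
        return false;  /* Timed out according to the caller's clock.  */
      if (wait_for (deadline - now))
        return true;   /* Woken before the timeout.  */
      /* The underlying wait timed out; re-check the caller's clock.  */
    }
}

/* Stubs for illustration: the caller's clock runs at half the speed
   of the reference clock, and nobody ever wakes the waiter.  */
ticks user_time;
int waits;
ticks slow_clock (void) { return user_time; }
bool never_woken (ticks d) { ++waits; user_time += (d + 1) / 2; return false; }
```

With a deadline of 8 ticks the stubbed wait is retried with timeouts of 8, 4, 2 and 1 before the caller's clock reaches the deadline; returning after the first timeout would have reported the timeout too early.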


Torvald Riegel had some objections to my design, but did not respond when I
attempted to justify it and attempted to change my implementation based on
his suggestions (see https://gcc.gnu.org/ml/libstdc++/2018-01/msg00071.html
.)

It looks like https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68519 could
apply equally well to __atomic_futex_unsigned::_M_load_when_equal_for.
I plan to look at that next.

I set up a Debian 4.0 "Etch" system (v2.6.18 kernel, glibc 2.3.6.)
Unfortunately, its GCC 4.1.2 is unable to compile GCC master (as of
5ba044fc3a443274462527eed385732f7ecee3a8) because hash-map.h appears to be
trying to put a reference in a std::pair.)


That's https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86739


The above patches cherry-pick
cleanly back to GCC 7.3, and that version does build after adding a few
constants to the system headers. I've confirmed that the lack of support
for FUTEX_WAIT_BITSET is detected correctly and that the code falls back to
using FUTEX_WAIT. This test also shows that the fallback to calling the
clock_gettime system call directly is working too. The async.cc tests all
passed.

I haven't made any attempt to add entries to the .abilist files. I'm not
sure whether I'm supposed to do that as part of the patches, or not.


Changes to gnu.ver linker script are needed, which should be in the patch.

Changes to the baseline_symbols.txt files should not be included in
the patch, those are regenerated later.




Mike Crowe (6):
 Improve libstdc++-v3 async test
 libstdc++ futex: Use FUTEX_CLOCK_REALTIME for wait
 libstdc++ futex: Support waiting on std::chrono::steady_clock directly
 libstdc++ atomic_futex: Use std::chrono::steady_clock as reference
   clock
 libstdc++ futex: Loop when waiting against arbitrary clock
 Extra async tests, not for merging


Most of these patches fail to apply because they use spaces where the
GCC code has tabs. I can fix it, but I'm not sure why the tabs aren't
in your patches.






Move all wide_int_range* functions into wide-int-range.[ch]

2018-08-01 Thread Aldy Hernandez
This is actually an obvious patch, but I'm not committing it just in 
case you'd prefer another name for the files.


OK pending bootstrap?

Aldy
gcc/

	* Makefile.in (wide-int-range.o): New.
	* tree-vrp.c: Move all the wide_int_* functions to...
	* wide-int-range.c: ...here.
	* tree-vrp.h: Move all the wide_int_* prototypes to...
	* wide-int-range.h: ...here.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index b8716406533..e7d818d174c 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1601,6 +1601,7 @@ OBJS = \
 	web.o \
 	wide-int.o \
 	wide-int-print.o \
+	wide-int-range.o \
 	xcoffout.o \
 	$(out_object_file) \
 	$(EXTRA_OBJS) \
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index d18c72d0a02..e1875d8d46e 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -67,6 +67,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "attribs.h"
 #include "vr-values.h"
 #include "builtins.h"
+#include "wide-int-range.h"
 
 /* Set of SSA names found live during the RPO traversal of the function
for still active basic-blocks.  */
@@ -956,98 +957,7 @@ value_range_constant_singleton (value_range *vr)
   return NULL_TREE;
 }
 
-/* Wrapper around wide_int_binop that adjusts for overflow.
-
-   Return true if we can compute the result; i.e. if the operation
-   doesn't overflow or if the overflow is undefined.  In the latter
-   case (if the operation overflows and overflow is undefined), then
-   adjust the result to be -INF or +INF depending on CODE, VAL1 and
-   VAL2.  Return the value in *RES.
-
-   Return false for division by zero, for which the result is
-   indeterminate.  */
-
-static bool
-wide_int_binop_overflow (wide_int &res,
-			 enum tree_code code,
-			 const wide_int &w0, const wide_int &w1,
-			 signop sign, bool overflow_undefined)
-{
-  wi::overflow_type overflow;
-  if (!wide_int_binop (res, code, w0, w1, sign, &overflow))
-return false;
-
-  /* If the operation overflowed return -INF or +INF depending on the
- operation and the combination of signs of the operands.  */
-  if (overflow && overflow_undefined)
-{
-  switch (code)
-	{
-	case MULT_EXPR:
-	  /* For multiplication, the sign of the overflow is given
-	 by the comparison of the signs of the operands.  */
-	  if (sign == UNSIGNED || w0.sign_mask () == w1.sign_mask ())
-	res = wi::max_value (w0.get_precision (), sign);
-	  else
-	res = wi::min_value (w0.get_precision (), sign);
-	  return true;
-
-	case TRUNC_DIV_EXPR:
-	case FLOOR_DIV_EXPR:
-	case CEIL_DIV_EXPR:
-	case EXACT_DIV_EXPR:
-	case ROUND_DIV_EXPR:
-	  /* For division, the only case is -INF / -1 = +INF.  */
-	  res = wi::max_value (w0.get_precision (), sign);
-	  return true;
-
-	default:
-	  gcc_unreachable ();
-	}
-}
-  return !overflow;
-}
-
-/* For range [LB, UB] compute two wide_int bit masks.
-
-   In the MAY_BE_NONZERO bit mask, if some bit is unset, it means that
-   for all numbers in the range the bit is 0, otherwise it might be 0
-   or 1.
-
-   In the MUST_BE_NONZERO bit mask, if some bit is set, it means that
-   for all numbers in the range the bit is 1, otherwise it might be 0
-   or 1.  */
-
-void
-wide_int_set_zero_nonzero_bits (signop sign,
-const wide_int &lb, const wide_int &ub,
-wide_int &may_be_nonzero,
-wide_int &must_be_nonzero)
-{
-  may_be_nonzero = wi::minus_one (lb.get_precision ());
-  must_be_nonzero = wi::zero (lb.get_precision ());
-
-  if (wi::eq_p (lb, ub))
-{
-  may_be_nonzero = lb;
-  must_be_nonzero = may_be_nonzero;
-}
-  else if (wi::ge_p (lb, 0, sign) || wi::lt_p (ub, 0, sign))
-{
-  wide_int xor_mask = lb ^ ub;
-  may_be_nonzero = lb | ub;
-  must_be_nonzero = lb & ub;
-  if (xor_mask != 0)
-	{
-	  wide_int mask = wi::mask (wi::floor_log2 (xor_mask), false,
-may_be_nonzero.get_precision ());
-	  may_be_nonzero = may_be_nonzero | mask;
-	  must_be_nonzero = wi::bit_and_not (must_be_nonzero, mask);
-	}
-}
-}
-
-/* Value range wrapper for wide_int_set_zero_nonzero_bits.
+/* Value range wrapper for wide_int_range_set_zero_nonzero_bits.
 
Compute MAY_BE_NONZERO and MUST_BE_NONZERO bit masks for range in VR.
 
@@ -1066,9 +976,10 @@ vrp_set_zero_nonzero_bits (const tree expr_type,
   *must_be_nonzero = wi::zero (TYPE_PRECISION (expr_type));
   return false;
 }
-  wide_int_set_zero_nonzero_bits (TYPE_SIGN (expr_type),
- wi::to_wide (vr->min), wi::to_wide (vr->max),
- *may_be_nonzero, *must_be_nonzero);
+  wide_int_range_set_zero_nonzero_bits (TYPE_SIGN (expr_type),
+	wi::to_wide (vr->min),
+	wi::to_wide (vr->max),
+	*may_be_nonzero, *must_be_nonzero);
   return true;
 }
 
@@ -1114,516 +1025,6 @@ ranges_from_anti_range (value_range *ar,
   return vr0->type != VR_UNDEFINED;
 }
 
-/* Order 2 sets of wide int ranges (w0/w1, w2/w3) and set MIN/MAX
-   accordingly.  */
-
-static void
-wide_int_range_min_max (wide_int &min, wide_int &max,
-			wide_int &w0, wide_int &w1, wide_int &w2, wide_int &w3,
-			signop sign)
-{
-  /* Order pairs w0,w1 and w2,w3.  */
-  if 

Re: [PATCHv3 3/6] libstdc++ futex: Support waiting on std::chrono::steady_clock directly

2018-08-01 Thread Jonathan Wakely

On 01/08/18 14:19 +0100, Mike Crowe wrote:

The user-visible effect of this change is for std::future::wait_until to
use CLOCK_MONOTONIC when passed a timeout of std::chrono::steady_clock
type. This makes it immune to any changes made to the system clock
CLOCK_REALTIME.

Add an overload of __atomic_futex_unsigned::_M_load_and_test_until_impl
that accepts a std::chrono::steady_clock, and correctly passes this through
to __atomic_futex_unsigned_base::_M_futex_wait_until_steady which uses
CLOCK_MONOTONIC for the timeout within the futex system call. These
functions are mostly just copies of the std::chrono::system_clock versions
with small tweaks.

Prior to this commit, a std::chrono::steady_clock timeout would be converted via
std::chrono::system_clock which risks reducing or increasing the timeout if
someone changes CLOCK_REALTIME whilst the wait is happening. (The commit
immediately prior to this one increases the window of opportunity for that
from a short period during the calculation of a relative timeout, to the
entire duration of the wait.)

FUTEX_WAIT_BITSET was added in kernel v2.6.25. If futex reports ENOSYS to
indicate that this operation is not supported then the code falls back to
using clock_gettime(2) to calculate a relative time to wait for.

I believe that I've added this functionality in a way that it doesn't break
ABI compatibility, but that has made it more verbose and less type safe. I
believe that it would be better to maintain the timeout as an instance of
the correct clock type all the way down to a single _M_futex_wait_until
function with an overload for each clock. The current scheme of separating
out the seconds and nanoseconds early risks accidentally calling the wait
function for the wrong clock.


Surely that would just be a programming error? Users aren't calling
these functions, and we only call them in a very limited number of
places, so the risk of calling the wrong one seems manageable. Just
don't do that.



Unfortunately, doing this would break code
that compiled against the old header.


We could add the new functions taking steady_clock::time_point and
system_clock::time_point, and as long as the existing function is also
still exported from the library nothing would break, would it?

We could make the existing function call the new one, or vice versa,
to avoid duplicating code.

I'm not sure we actually want to do that, but I don't see why it
wouldn't be possible to do without breaking existing code.



diff --git a/libstdc++-v3/src/c++11/futex.cc b/libstdc++-v3/src/c++11/futex.cc
index 72062a4285e..5c02f0f55ed 100644
--- a/libstdc++-v3/src/c++11/futex.cc
+++ b/libstdc++-v3/src/c++11/futex.cc
@@ -118,6 +125,78 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  }
  }

+  bool
+  __atomic_futex_unsigned_base::_M_futex_wait_until_steady(unsigned *__addr,
+  unsigned __val,
+  bool __has_timeout, chrono::seconds __s, chrono::nanoseconds __ns)
+  {
+if (!__has_timeout)
+  {
+   // Ignore whether we actually succeeded to block because at worst,
+   // we will fall back to spin-waiting.  The only thing we could do
+   // here on errors is abort.
+   int ret __attribute__((unused));
+   ret = syscall (SYS_futex, __addr, futex_wait_op, __val, nullptr);
+   _GLIBCXX_DEBUG_ASSERT(ret == 0 || errno == EINTR || errno == EAGAIN);
+   return true;
+  }
+else
+  {
+   if (!futex_clock_monotonic_unavailable.load(std::memory_order_relaxed))
+ {
+   struct timespec rt;
+   rt.tv_sec = __s.count();
+   rt.tv_nsec = __ns.count();
+
+   if (syscall (SYS_futex, __addr, futex_wait_bitset_op | futex_clock_monotonic_flag, __val, &rt, nullptr, futex_bitset_match_any) == -1)
+ {
+   _GLIBCXX_DEBUG_ASSERT(errno == EINTR || errno == EAGAIN
+ || errno == ETIMEDOUT || errno == ENOSYS);
+   if (errno == ETIMEDOUT)
+ return false;
+   else if (errno == ENOSYS)
+ {
+   futex_clock_monotonic_unavailable.store(true, std::memory_order_relaxed);
+   // Fall through to legacy implementation if the system
+   // call is unavailable.
+ }
+   else
+ return true;
+ }
+ }
+
+   // We only get to here if futex_clock_realtime_unavailable was


This should say futex_clock_monotonic_unavailable.

I'm also replacing the _GLIBCXX_DEBUG_ASSERT checks in that file with
__glibcxx_assert checks, because that file is never going to be
compiled with Debug Mode.
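
The ENOSYS fallback described in the commit message turns an absolute CLOCK_MONOTONIC deadline into a relative time to wait. A minimal sketch of that arithmetic follows, assuming two hypothetical helpers, relative_timeout and monotonic_time_left; neither name comes from futex.cc, and the real code may differ in its details.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper, not part of libstdc++: given NOW and an absolute
   DEADLINE measured on the same clock, store the remaining time in *REL.
   Returns false when the deadline has already passed.  */
static bool
relative_timeout (const struct timespec *now,
		  const struct timespec *deadline,
		  struct timespec *rel)
{
  if (deadline->tv_sec < now->tv_sec
      || (deadline->tv_sec == now->tv_sec
	  && deadline->tv_nsec <= now->tv_nsec))
    return false;

  rel->tv_sec = deadline->tv_sec - now->tv_sec;
  rel->tv_nsec = deadline->tv_nsec - now->tv_nsec;
  if (rel->tv_nsec < 0)
    {
      /* Borrow one second so that tv_nsec stays in [0, 1e9).  */
      rel->tv_nsec += 1000000000L;
      rel->tv_sec -= 1;
    }
  return true;
}

/* Hypothetical wrapper: read CLOCK_MONOTONIC with clock_gettime(2) and
   compute the time left until DEADLINE.  */
static bool
monotonic_time_left (const struct timespec *deadline, struct timespec *rel)
{
  struct timespec now;
  if (clock_gettime (CLOCK_MONOTONIC, &now) != 0)
    return false;
  return relative_timeout (&now, deadline, rel);
}
```

Because both timestamps are read from CLOCK_MONOTONIC, an adjustment of CLOCK_REALTIME does not affect the computed interval.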




Re: [PATCH,nvptx] Use CUDA driver API to select default runtime launch, geometry

2018-08-01 Thread Cesar Philippidis
On 08/01/2018 07:12 AM, Tom de Vries wrote:

>>>> +gangs = grids * (blocks / warp_size);
>>>
>>> So, we launch with gangs == grids * workers ? Is that intentional?
>>
>> Yes. At least that's what I've been using in og8. Setting num_gangs =
>> grids alone caused significant slow downs.
>>
> 
> Well, what you're saying here is: increasing num_gangs increases
> performance.
> 
> You don't explain why you multiply with workers specifically.

I set it that way because I think the occupancy calculator is
determining the occupancy of a single multiprocessor unit, rather than
the entire GPU. Looking at the og8 code again, I had

   num_gangs = 2 * threads_per_sm / warp_size * dev_size

which corresponds to

   2 * grids * blocks / warp_size

Because blocks is generally smaller than threads_per_block, the driver
occupancy calculator ends up launching fewer gangs.

I don't have a firm position with this default behavior. Perhaps we
should just set

  gangs = grids

That's probably an improvement over what's there now.

Cesar
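
For comparison, the three candidate defaults discussed above can be written out as plain integer arithmetic. This is only a sketch: the function names are hypothetical, the parameters mirror the variables named in the message, and the sample figures in the note below are made up rather than taken from any particular GPU.

```c
/* Default from the patch under review: gangs = grids * (blocks / warp_size).  */
static int
gangs_from_patch (int grids, int blocks, int warp_size)
{
  return grids * (blocks / warp_size);
}

/* Default used on og8, per the message:
   num_gangs = 2 * threads_per_sm / warp_size * dev_size.  */
static int
gangs_from_og8 (int threads_per_sm, int warp_size, int dev_size)
{
  return 2 * threads_per_sm / warp_size * dev_size;
}

/* The simpler alternative floated at the end: gangs = grids.  */
static int
gangs_from_grids (int grids)
{
  return grids;
}
```

With illustrative inputs grids = 80, blocks = 1024, warp_size = 32, the patch formula yields 2560 gangs and the simple form yields 80; with threads_per_sm = 2048 and dev_size = 80, the og8 formula yields 10240.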


Re: [PATCH] Backport gettext fixes to get rid of warnings on macOS

2018-08-01 Thread Simon Marchi

On 2018-08-01 10:58, Simon Marchi wrote:
This patch was tested to build binutils-gdb on GNU/Linux and macOS.  It can be
applied to the gcc repo too, after fixing some trivial merge conflicts (someone
else will need to do it, as I don't have push access to gcc).  Although I think
it is relatively low-risk, building gcc on macOS was not tested with this
patch, so if somebody who already has a macOS build can do it, it would be
appreciated.


Actually it can be applied cleanly in gcc too, I just forgot I had some 
local commits touching the same spots.


Simon


Re: [PATCH][3/4] Use RPO VN from unrolling

2018-08-01 Thread Richard Biener
On August 1, 2018 4:58:09 PM GMT+02:00, Richard Sandiford wrote:
>Richard Biener  writes:
>> This should be 4/4 but I have the main patch on top, so...
>>
>> This uses the region-based VN from GIMPLE unrolling which means
>> we better approximate the effects of optimizations on unrolled inner
>> loops when evaluating whether to unroll outer ones.
>
>Great!  Sounds like it should also fix cases where missed value-numbering
>opportunities after unrolling prevented SLP vectorisation.  (Hit a few of
>those, but don't have the testcases to hand unfortunately.)

Yes though as before we do not value number the blocks if they are not 
contained inside a loop. That's of course easy to change. 

Richard. 

>Thanks,
>Richard
>
>>
>>  * tree-ssa-loop-ivcanon.c: Include tree-ssa-sccvn.h.
>>  (propagate_constants_for_unrolling): Remove.
>>  (tree_unroll_loops_completely): Perform value-numbering
>>  on the unrolled bodies loop parent.
>>
>>  * gfortran.dg/reassoc_4.f: Change max-completely-peeled-insns
>>  param to current default.
>> ---
>>  gcc/testsuite/gfortran.dg/reassoc_4.f |  2 +-
>>  gcc/tree-ssa-loop-ivcanon.c   | 57
>++-
>>  2 files changed, 10 insertions(+), 49 deletions(-)
>>
>> diff --git a/gcc/testsuite/gfortran.dg/reassoc_4.f
>b/gcc/testsuite/gfortran.dg/reassoc_4.f
>> index b155cba768c..07b4affb2a4 100644
>> --- a/gcc/testsuite/gfortran.dg/reassoc_4.f
>> +++ b/gcc/testsuite/gfortran.dg/reassoc_4.f
>> @@ -1,5 +1,5 @@
>>  ! { dg-do compile }
>> -! { dg-options "-O3 -ffast-math -fdump-tree-reassoc1 --param
>max-completely-peeled-insns=400" }
>> +! { dg-options "-O3 -ffast-math -fdump-tree-reassoc1 --param
>max-completely-peeled-insns=200" }
>>  ! { dg-additional-options "--param max-completely-peel-times=16" {
>target spu-*-* } }
>>subroutine anisonl(w,vo,anisox,s,ii1,jj1,weight)
>>integer ii1,jj1,i1,iii1,j1,jjj1,k1,l1,m1,n1
>> diff --git a/gcc/tree-ssa-loop-ivcanon.c
>b/gcc/tree-ssa-loop-ivcanon.c
>> index 326589f63c3..97c2ad94985 100644
>> --- a/gcc/tree-ssa-loop-ivcanon.c
>> +++ b/gcc/tree-ssa-loop-ivcanon.c
>> @@ -63,6 +63,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "tree-inline.h"
>>  #include "tree-cfgcleanup.h"
>>  #include "builtins.h"
>> +#include "tree-ssa-sccvn.h"
>>  
>>  /* Specifies types of loops that may be unrolled.  */
>>  
>> @@ -1318,50 +1319,6 @@ canonicalize_induction_variables (void)
>>return 0;
>>  }
>>  
>> -/* Propagate constant SSA_NAMEs defined in basic block BB.  */
>> -
>> -static void
>> -propagate_constants_for_unrolling (basic_block bb)
>> -{
>> -  /* Look for degenerate PHI nodes with constant argument.  */
>> -  for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi); )
>> -{
>> -  gphi *phi = gsi.phi ();
>> -  tree result = gimple_phi_result (phi);
>> -  tree arg = gimple_phi_arg_def (phi, 0);
>> -
>> -  if (! SSA_NAME_OCCURS_IN_ABNORMAL_PHI (result)
>> -  && gimple_phi_num_args (phi) == 1
>> -  && CONSTANT_CLASS_P (arg))
>> -{
>> -  replace_uses_by (result, arg);
>> -  gsi_remove (&gsi, true);
>> -  release_ssa_name (result);
>> -}
>> -  else
>> -gsi_next (&gsi);
>> -}
>> -
>> -  /* Look for assignments to SSA names with constant RHS.  */
>> -  for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p
>(gsi); )
>> -{
>> -  gimple *stmt = gsi_stmt (gsi);
>> -  tree lhs;
>> -
>> -  if (is_gimple_assign (stmt)
>> -  && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) ==
>tcc_constant
>> -  && (lhs = gimple_assign_lhs (stmt), TREE_CODE (lhs) == SSA_NAME)
>> -  && !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
>> -{
>> -  replace_uses_by (lhs, gimple_assign_rhs1 (stmt));
>> -  gsi_remove (&gsi, true);
>> -  release_ssa_name (lhs);
>> -}
>> -  else
>> -gsi_next (&gsi);
>> -}
>> -}
>> -
>>  /* Process loops from innermost to outer, stopping at the innermost
>> loop we unrolled.  */
>>  
>> @@ -1512,10 +1469,14 @@ tree_unroll_loops_completely (bool
>may_increase_size, bool unroll_outer)
>>EXECUTE_IF_SET_IN_BITMAP (fathers, 0, i, bi)
>>  {
>>loop_p father = get_loop (cfun, i);
>> -  basic_block *body = get_loop_body_in_dom_order (father);
>> -  for (unsigned j = 0; j < father->num_nodes; j++)
>> -propagate_constants_for_unrolling (body[j]);
>> -  free (body);
>> +  bitmap exit_bbs = BITMAP_ALLOC (NULL);
>> +  loop_exit *exit = father->exits->next;
>> +  while (exit->e)
>> +{
>> +  bitmap_set_bit (exit_bbs, exit->e->dest->index);
>> +  exit = exit->next;
>> +}
>> +  do_rpo_vn (cfun, loop_preheader_edge (father), exit_bbs);
>>  }
>>BITMAP_FREE (fathers);



[gomp5] depend/depobj adjustments

2018-08-01 Thread Jakub Jelinek
Hi!

A change is being voted into OpenMP 5.0, where the depend clause
modifiers are separated from dependence type with a comma rather than
colon (mostly for consistency with other clauses), and more importantly,
dependence type is no longer optional, the clauses previously without
dependence type should now use depobj dependence type (this is to make
parsing non-ambiguous).
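
As an illustration of the spelling change, here is a small sketch of a task whose depend clause uses an iterator modifier. The function name sum_with_task is hypothetical, and the "before" form in the comment is inferred from the description above rather than copied from the testsuite. A depend clause that previously had no dependence type would now be written with the depobj type, e.g. depend (depobj: obj).

```c
/* Hypothetical example: sum A[0..N-1] inside a task that declares an
   "in" dependence on every element through an iterator modifier.  */
static long
sum_with_task (const int *a, int n)
{
  long total = 0;

  /* Before this change (modifier and type separated by a colon):
       depend (iterator (i = 0:n) : in : a[i])
     After this change (comma after the modifier):  */
  #pragma omp task depend (iterator (i = 0:n), in : a[i]) shared (total)
  {
    for (int k = 0; k < n; k++)
      total += a[k];
  }

  /* Wait for the task before reading TOTAL.  */
  #pragma omp taskwait
  return total;
}
```

When built without OpenMP support the pragmas are ignored and the loop simply runs inline, so the result is the same either way.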

Tested on x86_64-linux, committed to gomp-5_0-branch.

2018-08-01  Jakub Jelinek  

* tree-core.h (enum omp_clause_depend_kind): Remove
OMP_CLAUSE_DEPEND_UNSPECIFIED, add OMP_CLAUSE_DEPEND_DEPOBJ.
* gimplify.c (gimplify_omp_depend): Handle OMP_CLAUSE_DEPEND_DEPOBJ
instead of OMP_CLAUSE_DEPEND_UNSPECIFIED.
* omp-low.c (lower_depend_clauses): Likewise.
* tree-pretty-print.c (dump_omp_clause): Likewise, print the
dependence type unconditionally.
gcc/c-family/
* c-omp.c (c_finish_omp_depobj): Test for OMP_CLAUSE_DEPEND_DEPOBJ
on clause instead of OMP_CLAUSE_DEPEND_UNSPECIFIED, adjust diagnostics
in that case.  Expect kind to be OMP_CLAUSE_DEPEND_SOURCE if clause
is specified, rather than OMP_CLAUSE_DEPEND_UNSPECIFIED.
gcc/c/
* c-parser.c (c_parser_omp_clause_depend): Adjust parsing for
dependence type to be no longer optional and dependence modifier
separated from dependence type by comma rather than colon.  Parse
depobj dependence type.
(c_parser_omp_depobj): Use OMP_CLAUSE_DEPEND_SOURCE instead of
OMP_CLAUSE_DEPEND_UNSPECIFIED.
* c-typeck.c (c_finish_omp_clauses): Handle OMP_CLAUSE_DEPEND_DEPOBJ
instead of OMP_CLAUSE_DEPEND_UNSPECIFIED, adjust diagnostics.
gcc/cp/
* parser.c (cp_parser_omp_clause_depend): Adjust parsing for
dependence type to be no longer optional and dependence modifier
separated from dependence type by comma rather than colon.  Parse
depobj dependence type.
(cp_parser_omp_depobj): Use OMP_CLAUSE_DEPEND_SOURCE instead of
OMP_CLAUSE_DEPEND_UNSPECIFIED.
* semantics.c (finish_omp_clauses): Handle OMP_CLAUSE_DEPEND_DEPOBJ
instead of OMP_CLAUSE_DEPEND_UNSPECIFIED, adjust diagnostics.
* pt.c (tsubst_expr): Use OMP_CLAUSE_DEPEND_SOURCE instead of
OMP_CLAUSE_DEPEND_UNSPECIFIED.
gcc/testsuite/
* c-c++-common/gomp/depend-iterator-1.c (foo, bar, baz): Separate
dependence modifier from type with comma instead of colon.
* c-c++-common/gomp/taskwait-depend-1.c (foo): Likewise.
* c-c++-common/gomp/depobj-1.c (f1, f2, f3): Likewise.  Add depobj: to
depend clauses without dependence type.  Add an extra test for depobj
construct with depobj: type on depend clause and omp_depend_t type of
the lvalue.
* c-c++-common/gomp/depend-iterator-2.c (f1, f2, f3): Separate
dependence modifier from type with comma instead of colon.  Adjust
diagnostics for dependence type no longer being optional.
* g++.dg/gomp/depend-iterator-1.C (foo, bar, baz): Separate
dependence modifier from type with comma instead of colon.
* g++.dg/gomp/depend-iterator-2.C (f1, f2, f3, f4): Likewise.  Adjust
diagnostics for dependence type no longer being optional.
* g++.dg/gomp/depobj-1.C (f1, f2, f4, f5): Separate dependence modifier
from type with comma instead of colon.  Add depobj: to depend clauses
without dependence type.  Add an extra test for depobj construct with
depobj: type on depend clause and omp_depend_t type of the lvalue.
libgomp/
* testsuite/libgomp.c-c++-common/depend-iterator-1.c (main): Separate
dependence modifier from type with comma instead of colon.
* testsuite/libgomp.c-c++-common/depend-iterator-2.c (foo): Likewise.
* testsuite/libgomp.c-c++-common/depobj-1.c (dep, dep2, dep3,
antidep): Add depobj: to depend clauses without dependence type.
* testsuite/libgomp.c++/depend-iterator-1.C (bar, baz): Separate
dependence modifier from type with comma instead of colon.
* testsuite/libgomp.c++/depobj-1.C (dep, dep2, dep3, antidep): Add
depobj: to depend clauses without dependence type.

--- gcc/tree-core.h.jj  2018-07-10 11:36:00.291380752 +0200
+++ gcc/tree-core.h 2018-08-01 16:36:02.709901571 +0200
@@ -1408,13 +1408,13 @@ struct GTY(()) tree_constructor {
 
 enum omp_clause_depend_kind
 {
-  OMP_CLAUSE_DEPEND_UNSPECIFIED,
   OMP_CLAUSE_DEPEND_IN,
   OMP_CLAUSE_DEPEND_OUT,
   OMP_CLAUSE_DEPEND_INOUT,
   OMP_CLAUSE_DEPEND_MUTEXINOUTSET,
   OMP_CLAUSE_DEPEND_SOURCE,
   OMP_CLAUSE_DEPEND_SINK,
+  OMP_CLAUSE_DEPEND_DEPOBJ,
   OMP_CLAUSE_DEPEND_LAST
 };
 
--- gcc/gimplify.c.jj   2018-08-01 14:34:53.945975952 +0200
+++ gcc/gimplify.c  2018-08-01 16:36:33.391000517 +0200
@@ -7547,7 +7547,7 @@ gimplify_omp_depend (tree *list_p, gimpl
  case OMP_CLAUSE_DEPEND_MUTEXINOUTSET:
i = 1;

Re: [PATCH] Handle overlength strings in the C FE

2018-08-01 Thread Bernd Edlinger
On 08/01/18 19:07, Martin Sebor wrote:
> On 08/01/2018 05:20 AM, Bernd Edlinger wrote:
>> On 07/30/18 17:49, Joseph Myers wrote:
>>> On Mon, 30 Jul 2018, Bernd Edlinger wrote:
>>>
>>>> Hi,
>>>>
>>>> this is how I would like to handle the over length strings issue in the C FE.
>>>> If the string constant is exactly the right length and ends in one explicit
>>>> NUL character, shorten it by one character.
>>>
>>> I don't think shortening should be limited to that case.  I think the case
>>> where the constant is longer than that (and so gets an unconditional
>>> pedwarn) should also have it shortened - any constant that doesn't fit in
>>> the object being initialized should be shortened to fit, whether diagnosed
>>> or not, we should define GENERIC / GIMPLE to disallow too-large string
>>> constants in initializers, and should add an assertion somewhere in the
>>> middle-end that no too-large string constants reach it.
>>>
>>
>> Okay, there is an update following your suggestion.
>>
>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>> Is it OK for trunk?
> 
> The ChangeLog description says:
> 
>  * c-typeck.c (digest_init): Fix overlength strings.
> 
> suggesting there is a bug but there is no test case.  If there
> is a bug in there that can be triggered by C code (valid or
> otherwise), it would be good to have a test case and a bug
> in Bugzilla.  If there is no bug and this is just cleanup,
> I would suggest to adjust the description.
> 

Yes, thanks for looking at this.  This is an attempt to
reduce the number of different encodings for otherwise
identical strings in the middle-end.

I could say "Shorten overlength strings." is that better?
There are already numerous test cases with overlength strings.
I verified that, because I have tested this patch on top
of https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00050.html

> Other than that, while making improvements here, I think it
> would be helpful to also add more detail to the text of
> the warning:
> 

Sure, but it is important to do only one thing at a time.
I see this as a preparation to fixing the remaining
string_constant folding issues which are potential wrong-code
issues.

> 1) mention the type of the array being initialized in case
> it's not obvious from the declaration (the array bound could
> be a symbol, not a literal, or the type could be a typedef)
> 
> 2) mention the number of elements in the initializer in case
> it's a macro (such as __FILE__) whose definition isn't visible
> in the diagnostic
> 
> 3) mention that the excess elements are ignored (since it's
> undefined in the standard, it will let users know what
> happens in GCC).
> 
> Here's a test case and a suggested warning:
> 
>    #define S __FILE__ "\000"
>    enum { N = sizeof __FILE__ };
>    const char a[N] = S;
> 
>    warning: discarding 1 excess element from initializer-string for 'char[4]' 
> [-Wc++-compat]
>     #define S __FILE__ "\000"
>   ^~~~
>    note: in expansion of macro ‘S’
>     const char a[N] = S;
>   ^
> (Similarly for more than 1 excess element.)
> 

Yes, definitely helpful, but not part of this patch.
Probably one of your next patches, I would assume.

Bernd.


Re: [PATCH][AArch64] Implement new intrinsics vabsd_s64 and vnegd_s64

2018-08-01 Thread James Greenhalgh
On Wed, Aug 01, 2018 at 07:13:53AM -0500, Vlad Lazar wrote:
> On 31/07/18 22:48, James Greenhalgh wrote:
> > On Fri, Jul 20, 2018 at 04:37:34AM -0500, Vlad Lazar wrote:
> >> Hi,
> >>
> >> The patch adds implementations for the NEON intrinsics vabsd_s64 and 
> >> vnegd_s64.
> >> (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/docs/ihi0073/latest/arm-neon-intrinsics-reference-architecture-specification)
> >>
> >> Bootstrapped and regtested on aarch64-none-linux-gnu and there are no 
> >> regressions.
> >>
> >> OK for trunk?
> >>
> >> +__extension__ extern __inline int64_t
> >> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> >> +vnegd_s64 (int64_t __a)
> >> +{
> >> +  return -__a;
> >> +}
> > 
> > Does this give the correct behaviour for the minimum value of int64_t? That
> > would be undefined behaviour in C, but well-defined under ACLE.
> > 
> > Thanks,
> > James
> > 
> 
> Hi. Thanks for the review.
> 
> For the minimum value of int64_t it behaves as the ACLE specifies:
> "The negative of the minimum (signed) value is itself."

What should happen in this testcase? The spoiler is below, but try to work out
what should happen and what goes wrong with your implementation.

  int foo (int64_t x)
  {
if (x < (int64_t) 0)
  return vnegd_s64(x) < (int64_t) 0;
else
  return 0;
  }
  
  
  int bar (void)
  {
return foo (INT64_MIN);
  }
 
Thanks,
James


-






INT64_MIN < 0 should be true, so we should return vnegd_s64(INT64_MIN) < 0.
vnegd_s64(INT64_MIN) is identity, so the return value should be
INT64_MIN < 0; i.e. True.

This isn't what the compiler thinks... The compiler makes use of the fact
that -INT64_MIN is undefined behaviour in C, and doesn't need to be considered
as a special case. The if statement gives you a range reduction to [-INF, -1],
negating that gives you a range [1, INF], and [1, INF] is never less than 0,
so the compiler folds the function to return false. We have a mismatch in
semantics
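
One common way to get a negation that wraps for the minimum value, so that range analysis cannot assume a positive result, is to negate in unsigned arithmetic and convert back. The sketch below uses a hypothetical function negate_wrapping; it is not the definition in arm_neon.h and not necessarily how the patch was eventually fixed. The conversion from uint64_t back to int64_t is implementation-defined in C for out-of-range values; GCC documents it as reduction modulo 2^64.

```c
#include <stdint.h>

/* Hypothetical stand-in for the intrinsic: negate in unsigned arithmetic,
   where wraparound is well defined, then convert back.  */
static inline int64_t
negate_wrapping (int64_t a)
{
  return (int64_t) (- (uint64_t) a);
}

/* Same shape as the testcase above, using the wrapping negate.  */
static int
foo (int64_t x)
{
  if (x < (int64_t) 0)
    return negate_wrapping (x) < (int64_t) 0;
  else
    return 0;
}
```

With this formulation foo (INT64_MIN) evaluates to 1, matching the ACLE wording that the negative of the minimum signed value is itself.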



Re: [PATCH,nvptx] Truncate config/nvptx/oacc-parallel.c

2018-08-01 Thread Jakub Jelinek
On Wed, Aug 01, 2018 at 10:13:06AM -0700, Nathan Sidwell wrote:
> On 08/01/2018 04:55 AM, Jakub Jelinek wrote:
> 
> > The ABI compatibility is mainly for libgomp.so which hasn't (ever) bumped
> > the soname and I don't plan to do that any time soon, but even for the
> > offloaded libgomp.a I guess one might compile with GCC 5 and link with GCC
> > 9 and expect things not to fail miserably.  This is a *.a library, can't you
> 
> I think it should fail horribly.  If it succeeded, the performance would
> suck.  Far better to shout at the user sooner.

Ok, I can live with that for OpenACC and GCC 5, just would not like to force
people to rebuild everything every time GCC is bumped even for OpenACC (and
once exported symbols from libgomp.so need to stay forever, unless soname is
bumped).

Jakub


Re: [PATCH, testsuite]: Fix PR 86153, test case g++.dg/pr83239.C fails

2018-08-01 Thread Martin Sebor

On 08/01/2018 04:34 AM, Uros Bizjak wrote:

Hello!

The testcase fails with:

FAIL: g++.dg/pr83239.C  -std=gnu++11  scan-tree-dump-not optimized
"_ZNSt6vectorIiSaIiEE17_M_default_appendEm"
FAIL: g++.dg/pr83239.C  -std=gnu++14  scan-tree-dump-not optimized
"_ZNSt6vectorIiSaIiEE17_M_default_appendEm"

the test depends on _M_default_append to be inlined, so it verifies
the inlining with:

// Verify that std::vector::_M_default_append() has been inlined
// (the absence of warnings depends on it).
// { dg-final { scan-tree-dump-not
"_ZNSt6vectorIiSaIiEE17_M_default_appendEm"  optimized } }
// { dg-final { scan-tree-dump-not
"_ZNSt6vectorIPvSaIS0_EE17_M_default_appendEm" optimized } }

However, this is not the case with the default -finline-limit, so
raise it to 500 (the same approach is taken in
g++.dg/tree-ssa/copyprop.C).

Unfortunately, the fixed testcase exposes some issue with -std=gnu++98:

FAIL: g++.dg/pr83239.C  -std=gnu++98 (test for excess errors)

In function 'void test_loop() [with T = int]':
cc1plus: warning: 'void* __builtin_memset(void*, int, long unsigned
int)' specified size 18446744073709551608 exceeds maximum object size
9223372036854775807 [-Wstringop-overflow=]
In function 'void test_if(std::vector&, int) [with T = long int]':
cc1plus: warning: 'void* __builtin_memset(void*, int, long unsigned
int)' specified size 18446744073709551600 exceeds maximum object size
9223372036854775807 [-Wstringop-overflow=]

2018-08-01  Uros Bizjak  

PR testsuite/86153
* g++.dg/pr83239.C (dg-options): Add -finline-limit=500.

OK for mainline and gcc-8 branch?


Thanks for spending time on this!  I just looked into it
earlier this week and was going to touch base with Jeff after
he comes back from PTO later this week to see what to do about
the test and the outstanding warning.  In comment #20 on
pr83239 Jeff said he has a patch for the missed optimization
that presumably is behind the false positive, so that should
presumably fix the other part of the problem here.

Martin


Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552)

2018-08-01 Thread Bernd Edlinger
On 08/01/18 18:34, Martin Sebor wrote:
>>> If you care about detecting bugs I would expect you to be
>>> supportive rather than dismissive of this work, and helpful
>>> in bringing it to fruition rather than putting it down or
>>> questioning my priorities.  Especially since the work was
>>> prompted by your own (valid) complaint that GCC doesn't
>>> diagnose them.
>>>
>>
>> You don't really listen to what I am saying, I did not say
>> that we need another warning instead of fixing the wrong
>> optimization issue at hand.
>>
>> But I am in good company, you don't listen to Jakub and Richi
>> either.
> 
> I certainly intend to fix bugs I'm responsible for introducing.
> I always do if given the chance.  I assume you are referring
> to bug 86711 (and 86714).  Fixing the underlying problem has
> been on my mind since you first mentioned it, and on my to-do
> list since last week (bug 86688).  You have now submitted
> a patch for both of the former, plus a follow-on patch, but
> you didn't assign either of the bugs to yourself, or indicate
> if the patch fixes 86688, or if you intend to work on it too.
> I haven't reviewed the patches in any detail except to note
> that they touch the same area as mine and likely conflict.
> I'm not sure what I should do now.  Work on fixing these bugs
> myself?  (I would prefer to.)  Try to rebase my work on top
> of yours to see what the conflicts are and try to resolve
> them in my ongoing work?  Or just keep working on my
> stuff and deal with the conflicts after your patches have
> been committed?  Or continue to debate conflicting priorities
> and try to resolve them first?
> 
> (Those are mostly rhetorical questions.)  The point is that
> if you would just let me fix my bugs we would not have this
> conundrum.  Your test cases are helpful.  But as I have said
> over and over, submitting patches for the same code at the same
> time and even undoing some prior work with no coordination is
> a recipe for confusion and conflict.  I don't recall this
> happening in the past and I don't really understand what
> triggered it in this case.  This isn't an area that normally
> sees a lot of activity.
> 

Martin,

I am truly sorry for this confusion.  I would ask you to
please do your work a bit more slowly, so that we can
talk over the direction in which we want to go.
For instance, at the moment we should not add so many new warnings
when we should actually be looking at correctness and reliability issues.
I definitely do not want to revert your work, but I will have
to hedge it where it goes too far; that does not mean that
it will be worthless.

What made my alarm bells ring is the speed at which new buggy
features are being implemented recently, while at the same time
several global reviewers raised concerns that were not
honored.  That is not a good thing.

To me it is a serious problem when those global reviewers
do not seem to agree on the way these features are implemented.

To be honest, I do not believe in democracy, or majority decisions.
But I always slow down when there is no consensus, and look for a
solution that is acceptable for all the key players.


Bernd.


Re: [PATCH,nvptx] Truncate config/nvptx/oacc-parallel.c

2018-08-01 Thread Nathan Sidwell

On 08/01/2018 04:55 AM, Jakub Jelinek wrote:


The ABI compatibility is mainly for libgomp.so which hasn't (ever) bumped
the soname and I don't plan to do that any time soon, but even for the
offloaded libgomp.a I guess one might compile with GCC 5 and link with GCC
9 and expect things not to fail miserably.  This is a *.a library, can't you


I think it should fail horribly.  If it succeeded, the performance would 
suck.  Far better to shout at the user sooner.


nathan

--
Nathan Sidwell


Re: [Patch][GCC] Document and fix -r (partial linking)

2018-08-01 Thread Allan Sandfeld Jensen
On Wednesday, 1 August 2018 18:32:30 CEST Joseph Myers wrote:
> On Wed, 1 Aug 2018, Allan Sandfeld Jensen wrote:
> > gcc/
> > 
> > * gcc.c: Correct default specs for -r
> 
> I don't follow why your changes (which would need describing for each
> individual spec changed) are corrections.
> 
> >  /* config.h can define LIB_SPEC to override the default libraries.  */
> >  #ifndef LIB_SPEC
> > 
> > -#define LIB_SPEC "%{!shared:%{g*:-lg} %{!p:%{!pg:-lc}}%{p:-lc_p}%{pg:-lc_p}}"
> > +#define LIB_SPEC "%{!shared|!r:%{g*:-lg} %{!p:%{!pg:-lc}}%{p:-lc_p}%{pg:-lc_p}}"
> >  #endif
> 
> '!' binds more closely than '|' in specs.  That is, !shared|!r means the
> following specs are used unless both -shared and -r are specified, which
> seems nonsensical to me.  I'd expect something more like "shared|r:;" to
> expand to nothing if either -shared or -r is passed and to what follows if
> neither is passed.
> 
> And that ignores that this LIB_SPEC value in gcc.c is largely irrelevant,
> as it's generally overridden by targets - and normally for targets using
> ELF shared libraries, for example, -lc *does* have to be used when linking
> with -shared.
> 
> I think you're changing the wrong place for this.  If you want -r to be
> usable with GCC without using -nostdlib (which is an interesting
> question), you actually need to change LINK_COMMAND_SPEC (also sometimes
> overridden for targets) to handle -r more like -nostdlib -nostartfiles.
> 
Ok, thanks for the information, I will investigate that. 

> > -#define LINK_PIE_SPEC "%{static|shared|r:;" PIE_SPEC ":" LD_PIE_SPEC "} "
> > +#define LINK_PIE_SPEC "%{static|shared|r|ar:;" PIE_SPEC ":" LD_PIE_SPEC "} "
> What's this "-ar" option you're handling here?

Dead code from a previous more ambitious version of the patch. I will remove.

Allan
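
The precedence point Joseph raises can be checked with ordinary boolean logic. The sketch below models the two spec conditions as hypothetical C predicates over whether -shared and -r were passed; it is not code from gcc.c, only a truth-table aid.

```c
#include <stdbool.h>

/* "%{!shared|!r:X}": '!' binds more tightly than '|', so X is used when
   -shared is absent OR -r is absent, i.e. unless both are given.  */
static bool
patch_condition (bool shared, bool r)
{
  return !shared || !r;
}

/* The form Joseph suggests ("shared|r:;" followed by a default branch):
   expand to nothing if either option is given, X only if neither is.  */
static bool
suggested_condition (bool shared, bool r)
{
  return !(shared || r);
}
```

The two predicates differ exactly when one of the options is given without the other: with -r alone, patch_condition is true (the libraries are still linked) while suggested_condition is false.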





Re: [PATCH] Handle overlength strings in the C FE

2018-08-01 Thread Martin Sebor

On 08/01/2018 05:20 AM, Bernd Edlinger wrote:

> On 07/30/18 17:49, Joseph Myers wrote:
> > On Mon, 30 Jul 2018, Bernd Edlinger wrote:
> >
> >> Hi,
> >>
> >> this is how I would like to handle the over length strings issue in the C FE.
> >> If the string constant is exactly the right length and ends in one explicit
> >> NUL character, shorten it by one character.
> >
> > I don't think shortening should be limited to that case.  I think the case
> > where the constant is longer than that (and so gets an unconditional
> > pedwarn) should also have it shortened - any constant that doesn't fit in
> > the object being initialized should be shortened to fit, whether diagnosed
> > or not, we should define GENERIC / GIMPLE to disallow too-large string
> > constants in initializers, and should add an assertion somewhere in the
> > middle-end that no too-large string constants reach it.
> >
>
> Okay, there is an update following your suggestion.
>
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?


The ChangeLog description says:

* c-typeck.c (digest_init): Fix overlength strings.

suggesting there is a bug but there is no test case.  If there
is a bug in there that can be triggered by C code (valid or
otherwise), it would be good to have a test case and a bug
in Bugzilla.  If there is no bug and this is just cleanup,
I would suggest to adjust the description.

Other than that, while making improvements here, I think it
would be helpful to also add more detail to the text of
the warning:

1) mention the type of the array being initialized in case
it's not obvious from the declaration (the array bound could
be a symbol, not a literal, or the type could be a typedef)

2) mention the number of elements in the initializer in case
it's a macro (such as __FILE__) whose definition isn't visible
in the diagnostic

3) mention that the excess elements are ignored (since it's
undefined in the standard, it will let users know what
happens in GCC).

Here's a test case and a suggested warning:

  #define S __FILE__ "\000"
  enum { N = sizeof __FILE__ };
  const char a[N] = S;

  warning: discarding 1 excess element from initializer-string for 'char[4]' [-Wc++-compat]

   #define S __FILE__ "\000"
 ^~~~
  note: in expansion of macro ‘S’
   const char a[N] = S;
 ^
(Similarly for more than 1 excess element.)
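
As an illustration of the three suggestions, here is a minimal C++ sketch of how such a message could be assembled from the array bound and the initializer length. The helpers `excess_elements` and `format_warning` are hypothetical names, not GCC functions, and the literal "t.c" only stands in for `__FILE__`; the real diagnostic would be emitted from the C front end, not built as a string like this.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of initializer elements (counting the implicit terminating NUL)
// that do not fit into an array of `bound` elements.
constexpr std::size_t excess_elements(std::size_t bound, std::size_t init_elems)
{
  return init_elems > bound ? init_elems - bound : 0;
}

// Build a message that names both the excess count and the array type.
// Hypothetical helper: it only models the wording proposed above.
std::string format_warning(std::size_t bound, std::size_t init_elems)
{
  assert(init_elems > bound);
  const std::size_t n = excess_elements(bound, init_elems);
  return "discarding " + std::to_string(n) + " excess element"
         + (n == 1 ? "" : "s") + " from initializer-string for 'char["
         + std::to_string(bound) + "]'";
}

// "t.c" stands in for __FILE__: sizeof "t.c" is 4, and appending "\000"
// gives a literal of 5 elements once the implicit NUL is counted.
static_assert(excess_elements(sizeof "t.c", sizeof("t.c" "\000")) == 1,
              "exactly one element does not fit");
```

With a bound of 4 and a 5-element initializer this yields the singular form shown in the suggested warning; larger initializers get the plural form.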

Martin


[PATCH] Fix TSan on ppc64le (PR libsanitizer/86759)

2018-08-01 Thread Marek Polacek
Cherry-pick compiler-rt revision 318044 and 319180.

[PowerPC][tsan] Update tsan to handle changed memory layouts in newer kernels

In more recent Linux kernels with 47 bit VMAs the layout of virtual memory
for powerpc64 changed causing the thread sanitizer to not work properly.  This
patch adds support for 47 bit VMA kernels for powerpc64.

Tested on several 4.x and 3.x kernel releases.

Regtested/bootstrapped on ppc64le-linux with kernel 4.14; applying to
trunk/8.3.
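
The dispatch pattern the patch extends can be sketched in a few lines of C++: one layout type per supported VMA size, and a runtime `switch` that selects the matching template instantiation. Everything below is a toy model under stated assumptions; `ToyMapping`, `IsBelowTopImpl` and `IsBelowTop` are hypothetical names, and each toy layout records only the top of an N-bit address space rather than the real shadow, heap and trace ranges.

```cpp
#include <cassert>
#include <cstdint>

using uptr = std::uint64_t;

// Toy layout: only the end of an N-bit user address space is modelled.
template <unsigned Bits>
struct ToyMapping {
  static constexpr uptr kHiAppMemEnd = uptr{1} << Bits;
};

// Layout-specific check, parameterised on the mapping type.
template <typename Mapping>
bool IsBelowTopImpl(uptr mem)
{
  return mem < Mapping::kHiAppMemEnd;
}

// Runtime dispatch on the detected VMA size, one case per known layout.
bool IsBelowTop(unsigned vmaSize, uptr mem)
{
  switch (vmaSize) {
    case 44: return IsBelowTopImpl<ToyMapping<44>>(mem);
    case 46: return IsBelowTopImpl<ToyMapping<46>>(mem);
    case 47: return IsBelowTopImpl<ToyMapping<47>>(mem);
  }
  return false;  // unsupported VMA size
}
```

Using a `switch` with an explicit fall-through result, instead of an `if`/`else` pair, means an unknown VMA size no longer silently picks the last layout.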

2018-08-01  Marek Polacek  

PR libsanitizer/86759
* tsan/tsan_platform.h: Cherry-pick compiler-rt revision 318044.
* tsan/tsan_platform_linux.cc: Cherry-pick compiler-rt revision
319180.

--- gcc/libsanitizer/tsan/tsan_platform.h
+++ gcc/libsanitizer/tsan/tsan_platform.h
@@ -301,6 +301,38 @@ struct Mapping46 {
   static const uptr kVdsoBeg       = 0x7800000000000000ull;
 };
 
+/*
+C/C++ on linux/powerpc64 (47-bit VMA)
+0000 0000 1000 - 0100 0000 0000: main binary
+0100 0000 0000 - 0200 0000 0000: -
+0100 0000 0000 - 1000 0000 0000: shadow
+1000 0000 0000 - 1000 0000 0000: -
+1000 0000 0000 - 2000 0000 0000: metainfo (memory blocks and sync objects)
+2000 0000 0000 - 2000 0000 0000: -
+2000 0000 0000 - 2200 0000 0000: traces
+2200 0000 0000 - 7d00 0000 0000: -
+7d00 0000 0000 - 7e00 0000 0000: heap
+7e00 0000 0000 - 7e80 0000 0000: -
+7e80 0000 0000 - 8000 0000 0000: modules and main thread stack
+*/
+struct Mapping47 {
+  static const uptr kMetaShadowBeg = 0x100000000000ull;
+  static const uptr kMetaShadowEnd = 0x200000000000ull;
+  static const uptr kTraceMemBeg   = 0x200000000000ull;
+  static const uptr kTraceMemEnd   = 0x220000000000ull;
+  static const uptr kShadowBeg     = 0x010000000000ull;
+  static const uptr kShadowEnd     = 0x100000000000ull;
+  static const uptr kHeapMemBeg    = 0x7d0000000000ull;
+  static const uptr kHeapMemEnd    = 0x7e0000000000ull;
+  static const uptr kLoAppMemBeg   = 0x000000001000ull;
+  static const uptr kLoAppMemEnd   = 0x010000000000ull;
+  static const uptr kHiAppMemBeg   = 0x7e8000000000ull;
+  static const uptr kHiAppMemEnd   = 0x800000000000ull; // 47 bits
+  static const uptr kAppMemMsk     = 0x7c0000000000ull;
+  static const uptr kAppMemXor     = 0x020000000000ull;
+  static const uptr kVdsoBeg       = 0x7800000000000000ull;
+};
+
 // Indicates the runtime will define the memory regions at runtime.
 #define TSAN_RUNTIME_VMA 1
 #endif
@@ -427,11 +459,13 @@ uptr MappingArchImpl(void) {
   DCHECK(0);
   return 0;
 #elif defined(__powerpc64__)
-  if (vmaSize == 44)
-    return MappingImpl<Mapping44, Type>();
-  else
-    return MappingImpl<Mapping46, Type>();
+  switch (vmaSize) {
+    case 44: return MappingImpl<Mapping44, Type>();
+    case 46: return MappingImpl<Mapping46, Type>();
+    case 47: return MappingImpl<Mapping47, Type>();
+  }
   DCHECK(0);
+  return 0;
 #else
   return MappingImpl();
 #endif
@@ -580,11 +614,13 @@ bool IsAppMem(uptr mem) {
   DCHECK(0);
   return false;
 #elif defined(__powerpc64__)
-  if (vmaSize == 44)
-    return IsAppMemImpl<Mapping44>(mem);
-  else
-    return IsAppMemImpl<Mapping46>(mem);
+  switch (vmaSize) {
+    case 44: return IsAppMemImpl<Mapping44>(mem);
+    case 46: return IsAppMemImpl<Mapping46>(mem);
+    case 47: return IsAppMemImpl<Mapping47>(mem);
+  }
   DCHECK(0);
+  return false;
 #else
   return IsAppMemImpl(mem);
 #endif
@@ -607,11 +643,13 @@ bool IsShadowMem(uptr mem) {
   DCHECK(0);
   return false;
 #elif defined(__powerpc64__)
-  if (vmaSize == 44)
-    return IsShadowMemImpl<Mapping44>(mem);
-  else
-    return IsShadowMemImpl<Mapping46>(mem);
+  switch (vmaSize) {
+    case 44: return IsShadowMemImpl<Mapping44>(mem);
+    case 46: return IsShadowMemImpl<Mapping46>(mem);
+    case 47: return IsShadowMemImpl<Mapping47>(mem);
+  }
   DCHECK(0);
+  return false;
 #else
   return IsShadowMemImpl(mem);
 #endif
@@ -634,11 +672,13 @@ bool IsMetaMem(uptr mem) {
   DCHECK(0);
   return false;
 #elif defined(__powerpc64__)
-  if (vmaSize == 44)
-    return IsMetaMemImpl<Mapping44>(mem);
-  else
-    return IsMetaMemImpl<Mapping46>(mem);
+  switch (vmaSize) {
+    case 44: return IsMetaMemImpl<Mapping44>(mem);
+    case 46: return IsMetaMemImpl<Mapping46>(mem);
+    case 47: return IsMetaMemImpl<Mapping47>(mem);
+  }
   DCHECK(0);
+  return false;
 #else
   return IsMetaMemImpl(mem);
 #endif
@@ -671,11 +711,13 @@ uptr MemToShadow(uptr x) {
   DCHECK(0);
   return 0;
 #elif defined(__powerpc64__)
-  if (vmaSize == 44)
-    return MemToShadowImpl<Mapping44>(x);
-  else
-    return MemToShadowImpl<Mapping46>(x);
+  switch (vmaSize) {
+    case 44: return MemToShadowImpl<Mapping44>(x);
+    case 46: return MemToShadowImpl<Mapping46>(x);
+    case 47: return MemToShadowImpl<Mapping47>(x);
+  }
   DCHECK(0);
+  return 0;
 #else
   return MemToShadowImpl(x);
 #endif
@@ -710,11 +752,13 @@ u32 *MemToMeta(uptr x) {
   DCHECK(0);
   return 0;
 #elif defined(__powerpc64__)
-  if (vmaSize == 44)
-    return MemToMetaImpl<Mapping44>(x);
-  else
-    return MemToMetaImpl<Mapping46>(x);
+  switch (vmaSize) {
+    case 44: return MemToMetaImpl<Mapping44>(x);
+    case 46: return MemToMetaImpl<Mapping46>(x);
+    case 47: return MemToMetaImpl<Mapping47>(x);
+  }
   DCHECK(0);
+  return 0;
 #else
   return MemToMetaImpl(x);
 #endif
@@ -762,11 +806,13 @@ uptr 

Re: [PATCH][x86] Match movss and movsd "blend" instructions

2018-08-01 Thread Marc Glisse

On Wed, 1 Aug 2018, Allan Sandfeld Jensen wrote:

 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_move_sd (__m128d __A, __m128d __B)
 {
-  return (__m128d) __builtin_ia32_movsd ((__v2df)__A, (__v2df)__B);
+  return __extension__ (__m128d)(__v2df){__B[0],__A[1]};
 }


If the goal is to have it represented as a VEC_PERM_EXPR internally, I 
wonder if we should be explicit and use __builtin_shuffle instead of 
relying on some forwprop pass to transform it. Maybe not, just asking. And 
the answer need not even be the same for _mm_move_sd and _mm_move_ss.


--
Marc Glisse


Re: [Patch][GCC] Document and fix -r (partial linking)

2018-08-01 Thread Rainer Orth
Hi Allan,

> The option has existed and been working for years,
> make sure it implies the right extra options, and list
> it in the documentation.

this is way incomplete: you are only fixing the default versions of the
various specs in gcc.c, while there are many others that also need
fixing in gcc/config.  Without this, the new documentation is completely
misleading.

> 2018-08-01 Allan Sandfeld Jensen 
>
> gcc/doc
>
> * invoke.texi: Document -r

Lacks which section you are changing.  Besides, this needs to end in a
full stop.

> gcc/
> * gcc.c: Correct default specs for -r

You need to include which macros you changed in what way.  Best read the
GNU coding standards for the full details.

Don't misunderstand me: I'd very much like to have gcc -r work as it
should, but misleading users into thinking it always does, when with
your current patch only a few targets work right, is ultimately a
disservice.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCHv3 1/6] Improve libstdc++-v3 async test

2018-08-01 Thread Jonathan Wakely

On 01/08/18 14:19 +0100, Mike Crowe wrote:

Add tests for waiting for the future using both std::chrono::steady_clock
and std::chrono::system_clock in preparation for dealing with those clocks
properly in futex.cc.
---
libstdc++-v3/testsuite/30_threads/async/async.cc | 33 
1 file changed, 33 insertions(+)

diff --git a/libstdc++-v3/testsuite/30_threads/async/async.cc b/libstdc++-v3/testsuite/30_threads/async/async.cc
index 4c2cdd1a534..015bcce0c2c 100644
--- a/libstdc++-v3/testsuite/30_threads/async/async.cc
+++ b/libstdc++-v3/testsuite/30_threads/async/async.cc
@@ -51,17 +51,50 @@ void test02()
  VERIFY( status == std::future_status::timeout );
  status = f1.wait_until(std::chrono::system_clock::now());
  VERIFY( status == std::future_status::timeout );
+  status = f1.wait_until(std::chrono::steady_clock::now());
+  VERIFY( status == std::future_status::timeout );
  l.unlock();  // allow async thread to proceed
  f1.wait();   // wait for it to finish
  status = f1.wait_for(std::chrono::milliseconds(0));
  VERIFY( status == std::future_status::ready );
  status = f1.wait_until(std::chrono::system_clock::now());
  VERIFY( status == std::future_status::ready );
+  status = f1.wait_until(std::chrono::steady_clock::now());
+  VERIFY( status == std::future_status::ready );
+}
+
+// This test is prone to failures if run on a loaded machine where the
+// kernel decides not to schedule us for several seconds. It also
+// assumes that no-one will warp CLOCK whilst the test is
+// running.
+template<typename CLOCK>
+void test03()
+{
+  auto const start = CLOCK::now();
+  future<void> f1 = async(launch::async, []() {
+  std::this_thread::sleep_for(std::chrono::seconds(2));


Thanks for reducing these times since the last patch. I think this
test improvement can go on trunk, and if we see too many failures we
can adjust the times.



Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552)

2018-08-01 Thread Martin Sebor

If you care about detecting bugs I would expect you to be
supportive rather than dismissive of this work, and helpful
in bringing it to fruition rather than putting it down or
questioning my priorities.  Especially since the work was
prompted by your own (valid) complaint that GCC doesn't
diagnose them.



You don't really listen to what I am saying, I did not say
that we need another warning instead of fixing the wrong
optimization issue at hand.

But I am in good company, you don't listen to Jakub and Richi
either.


I certainly intend to fix bugs I'm responsible for introducing.
I always do if given the chance.  I assume you are referring
to bug 86711 (and 86714).  Fixing the underlying problem has
been on my mind since you first mentioned it, and on my to-do
list since last week (bug 86688).  You have now submitted
a patch for both of the former, plus a follow-on patch, but
you didn't assign either of the bugs to yourself, or indicate
whether the patch fixes 86688, or whether you intend to work on it too.
I haven't reviewed the patches in any detail except to note
that they touch the same area as mine and likely conflict.
I'm not sure what I should do now.  Work on fixing these bugs
myself?  (I would prefer to.)  Try to rebase my work on top
of yours to see what the conflicts are and try to resolve
them in my ongoing work?  Or just keep working on my
stuff and deal with the conflicts after your patches have
been committed?  Or continue to debate conflicting priorities
and try to resolve them first?

(Those are mostly rhetorical questions.)  The point is that
if you would just let me fix my bugs we would not have this
conundrum.  Your test cases are helpful.  But as I have said
over and over, submitting patches for the same code at the same
time and even undoing some prior work with no coordination is
a recipe for confusion and conflict.  I don't recall this
happening in the past and I don't really understand what
triggered it in this case.  This isn't an area that normally
sees a lot of activity.

Martin


Re: [Patch][GCC] Document and fix -r (partial linking)

2018-08-01 Thread Joseph Myers
On Wed, 1 Aug 2018, Allan Sandfeld Jensen wrote:

> gcc/
> * gcc.c: Correct default specs for -r

I don't follow why your changes (which would need describing for each 
individual spec changed) are corrections.

>  /* config.h can define LIB_SPEC to override the default libraries.  */
>  #ifndef LIB_SPEC
> -#define LIB_SPEC "%{!shared:%{g*:-lg} %{!p:%{!pg:-lc}}%{p:-lc_p}%{pg:-lc_p}}"
> +#define LIB_SPEC "%{!shared|!r:%{g*:-lg} %{!p:%{!pg:-lc}}%{p:-lc_p}%{pg:-lc_p}}"
>  #endif

'!' binds more closely than '|' in specs.  That is, !shared|!r means the 
following specs are used unless both -shared and -r are specified, which 
seems nonsensical to me.  I'd expect something more like "shared|r:;" to 
expand to nothing if either -shared or -r is passed and to what follows if 
neither is passed.
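
The precedence point can be checked with a small truth-table model in C++. This is only a sketch: the two functions are hypothetical stand-ins for spec evaluation, with `X` denoting the body of the spec, and the second form assumes the suggested spec is written along the lines of "%{shared|r:;:X}".

```cpp
#include <cassert>

// Spec as posted, "%{!shared|!r:X}": read as (!shared) | (!r), so X is
// used unless BOTH -shared and -r are given.
constexpr bool uses_x_as_posted(bool shared, bool r)
{
  return !shared || !r;
}

// Spec as suggested, assumed here to be "%{shared|r:;:X}": expands to
// nothing when either option is given, and to X only when neither is.
constexpr bool uses_x_as_suggested(bool shared, bool r)
{
  return !(shared || r);
}

static_assert(uses_x_as_posted(false, true), "-r alone still gets X");
static_assert(!uses_x_as_suggested(false, true), "-r alone drops X");
```

The two forms agree when neither or both options are passed and differ exactly when one of them is passed alone, which is the case the patch was meant to handle.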

And that ignores that this LIB_SPEC value in gcc.c is largely irrelevant, 
as it's generally overridden by targets - and normally for targets using 
ELF shared libraries, for example, -lc *does* have to be used when linking 
with -shared.

I think you're changing the wrong place for this.  If you want -r to be 
usable with GCC without using -nostdlib (which is an interesting 
question), you actually need to change LINK_COMMAND_SPEC (also sometimes 
overridden for targets) to handle -r more like -nostdlib -nostartfiles.

> -#define LINK_PIE_SPEC "%{static|shared|r:;" PIE_SPEC ":" LD_PIE_SPEC "} "
> +#define LINK_PIE_SPEC "%{static|shared|r|ar:;" PIE_SPEC ":" LD_PIE_SPEC "} "

What's this "-ar" option you're handling here?

-- 
Joseph S. Myers
jos...@codesourcery.com


[Patch][GCC] Document and fix -r (partial linking)

2018-08-01 Thread Allan Sandfeld Jensen
The option has existed and been working for years,
make sure it implies the right extra options, and list
it in the documentation.

2018-08-01 Allan Sandfeld Jensen 

gcc/doc

* invoke.texi: Document -r

gcc/
* gcc.c: Correct default specs for -r
---
 gcc/doc/invoke.texi | 7 ++-
 gcc/gcc.c   | 6 +++---
 2 files changed, 9 insertions(+), 4 deletions(-)

From 638966e6c7e072ca46c6af0664fbd57bedbfff80 Mon Sep 17 00:00:00 2001
From: Allan Sandfeld Jensen 
Date: Wed, 1 Aug 2018 18:07:05 +0200
Subject: [PATCH] Fix and document -r option

The option has existed and been working for years,
make sure it implies the right extra options, and list
it in the documentation.

2018-07-29 Allan Sandfeld Jensen 

gcc/doc

* invoke.texi: Document -r

gcc/
* gcc.c: Correct default specs for -r
---
 gcc/doc/invoke.texi | 7 ++-
 gcc/gcc.c   | 6 +++---
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 6047d82065a..7da30bd9d99 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -518,7 +518,7 @@ Objective-C and Objective-C++ Dialects}.
 @xref{Link Options,,Options for Linking}.
 @gccoptlist{@var{object-file-name}  -fuse-ld=@var{linker}  -l@var{library} @gol
 -nostartfiles  -nodefaultlibs  -nolibc  -nostdlib @gol
--pie  -pthread  -rdynamic @gol
+-pie  -pthread  -r  -rdynamic @gol
 -s  -static -static-pie -static-libgcc  -static-libstdc++ @gol
 -static-libasan  -static-libtsan  -static-liblsan  -static-libubsan @gol
 -shared  -shared-libgcc  -symbolic @gol
@@ -12444,6 +12444,11 @@ x86 Cygwin and MinGW targets.  On some targets this option also sets
 flags for the preprocessor, so it should be used consistently for both
 compilation and linking.
 
+@item -r
+@opindex r
+Produce a relocatable object as output. This is also known as partial
+linking.
+
 @item -rdynamic
 @opindex rdynamic
 Pass the flag @option{-export-dynamic} to the ELF linker, on targets
diff --git a/gcc/gcc.c b/gcc/gcc.c
index 780d4859ef3..858a5600c14 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -675,7 +675,7 @@ proper position among the other output files.  */
 
 /* config.h can define LIB_SPEC to override the default libraries.  */
 #ifndef LIB_SPEC
-#define LIB_SPEC "%{!shared:%{g*:-lg} %{!p:%{!pg:-lc}}%{p:-lc_p}%{pg:-lc_p}}"
+#define LIB_SPEC "%{!shared|!r:%{g*:-lg} %{!p:%{!pg:-lc}}%{p:-lc_p}%{pg:-lc_p}}"
 #endif
 
 /* When using -fsplit-stack we need to wrap pthread_create, in order
@@ -797,7 +797,7 @@ proper position among the other output files.  */
 /* config.h can define STARTFILE_SPEC to override the default crt0 files.  */
 #ifndef STARTFILE_SPEC
 #define STARTFILE_SPEC  \
-  "%{!shared:%{pg:gcrt0%O%s}%{!pg:%{p:mcrt0%O%s}%{!p:crt0%O%s}}}"
+  "%{!shared|!r:%{pg:gcrt0%O%s}%{!pg:%{p:mcrt0%O%s}%{!p:crt0%O%s}}}"
 #endif
 
 /* config.h can define ENDFILE_SPEC to override the default crtn files.  */
@@ -936,7 +936,7 @@ proper position among the other output files.  */
 #else
 #define LD_PIE_SPEC ""
 #endif
-#define LINK_PIE_SPEC "%{static|shared|r:;" PIE_SPEC ":" LD_PIE_SPEC "} "
+#define LINK_PIE_SPEC "%{static|shared|r|ar:;" PIE_SPEC ":" LD_PIE_SPEC "} "
 #endif
 
 #ifndef LINK_BUILDID_SPEC
-- 
2.17.0



Re: [PATCH] Handle overlength strings in the C FE

2018-08-01 Thread Joseph Myers
On Wed, 1 Aug 2018, Bernd Edlinger wrote:

> On 07/30/18 17:49, Joseph Myers wrote:
> > On Mon, 30 Jul 2018, Bernd Edlinger wrote:
> > 
> >> Hi,
> >>
> >> this is how I would like to handle the over length strings issue in the C 
> >> FE.
> >> If the string constant is exactly the right length and ends in one explicit
> >> NUL character, shorten it by one character.
> > 
> > I don't think shortening should be limited to that case.  I think the case
> > where the constant is longer than that (and so gets an unconditional
> > pedwarn) should also have it shortened - any constant that doesn't fit in
> > the object being initialized should be shortened to fit, whether diagnosed
> > or not, we should define GENERIC / GIMPLE to disallow too-large string
> > constants in initializers, and should add an assertion somewhere in the
> > middle-end that no too-large string constants reach it.
> > 
> 
> Okay, there is an update following your suggestion.

It seems odd to me to have two separate bits of code dealing with reducing 
the length, rather than something like

if (too long)
  {
/* Decide whether to do a pedwarn_init, or a warn_cxx_compat warning,
   or neither.  */
/* Shorten string, in either case.  */
  }

The memcmp with "\0\0\0\0" is introducing a hidden assumption that any 
sort of character in strings is never more than four bytes.  It also seems 
unnecessary, in that ultimately the over-long string should be shortened 
regardless of whether what's being removed is a zero character or not.
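
The single-branch structure described above might look roughly like the following C++ model. It is a sketch under simplifying assumptions, not front-end code: `Diag` and `fit_initializer` are hypothetical names, the string is modelled as a `std::string` that includes its terminating NUL, and characters are assumed to be one byte wide.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class Diag { none, cxx_compat, pedwarn };

// `init` holds the string constant's bytes including its terminating NUL;
// `array_size` is the size in bytes of the char array being initialized.
Diag fit_initializer(std::string& init, std::size_t array_size)
{
  Diag diag = Diag::none;
  if (init.size() > array_size)
    {
      // Losing only the terminating NUL is valid C but not C++;
      // losing anything more is diagnosed unconditionally.
      diag = (init.size() - 1 == array_size) ? Diag::cxx_compat
                                              : Diag::pedwarn;
      // Shorten in either case, whatever the dropped bytes contain.
      init.resize(array_size);
    }
  return diag;
}
```

Because the truncation happens in one place and never inspects the removed bytes, no comparison against a fixed run of zero bytes is needed.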

It should not be possible to be over-long and fail tree_fits_uhwi_p 
(TYPE_SIZE_UNIT (type)), simply because STRING_CST lengths are stored in 
host int (even if, ideally, they'd use some other type to allow for 
STRING_CSTs over 2GB in size).  (And I don't think GCC can represent 
target type sizes that don't fit in unsigned HOST_WIDE_INT anyway; the 
only way for a target type size in bytes to fail to be representable in 
unsigned HOST_WIDE_INT should be if the size is not constant.)

-- 
Joseph S. Myers
jos...@codesourcery.com


[committed, AArch64] Update expected output for sve/var_stride_[24].c

2018-08-01 Thread Richard Sandiford
After Segher's recent combine change, these tests now use a single
instruction to do the "and" and "lsl 10".  This is a good thing,
so the patch updates the expected output accordingly.
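
For readers unfamiliar with the instructions in the new expected output, the following C++ sketch models what A64 UBFIZ and SBFIZ compute on 64-bit registers: take the low `width` bits of the source, zero- or sign-extend them, and shift left by `lsb`. The function names simply mirror the mnemonics; they are illustrative helpers, not intrinsics.

```cpp
#include <cassert>
#include <cstdint>

// Mask covering the low `width` bits (width in 1..64).
static std::uint64_t low_mask(unsigned width)
{
  return width == 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << width) - 1;
}

// UBFIZ Xd, Xn, #lsb, #width: zero-extended field, shifted left by lsb.
std::uint64_t ubfiz(std::uint64_t x, unsigned lsb, unsigned width)
{
  assert(width >= 1 && lsb + width <= 64);
  return (x & low_mask(width)) << lsb;
}

// SBFIZ Xd, Xn, #lsb, #width: sign-extended field, shifted left by lsb.
std::uint64_t sbfiz(std::uint64_t x, unsigned lsb, unsigned width)
{
  assert(width >= 1 && lsb + width <= 64);
  const std::uint64_t mask = low_mask(width);
  std::uint64_t field = x & mask;
  if ((field >> (width - 1)) & 1)
    field |= ~mask;  // replicate the sign bit into the upper bits
  return field << lsb;
}
```

So `ubfiz(x, 10, 16)` is the same as `(x & 0xffff) << 10`, which is the "and" plus "lsl 10" pair now emitted as one instruction.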

Tested on aarch64-linux-gnu and applied.

Richard


2018-08-01  Richard Sandiford  

gcc/testsuite/
* gcc.target/aarch64/sve/var_stride_2.c: Update expected form
of range check.
* gcc.target/aarch64/sve/var_stride_4.c: Likewise.

Index: gcc/testsuite/gcc.target/aarch64/sve/var_stride_2.c
===
--- gcc/testsuite/gcc.target/aarch64/sve/var_stride_2.c 2018-05-02 08:37:35.613731163 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/var_stride_2.c 2018-08-01 17:00:46.831168822 +0100
@@ -16,7 +16,8 @@ f (TYPE *x, TYPE *y, unsigned short n, u
 /* { dg-final { scan-assembler {\tldr\tw[0-9]+} } } */
 /* { dg-final { scan-assembler {\tstr\tw[0-9]+} } } */
 /* Should multiply by (257-1)*4 rather than (VF-1)*4.  */
-/* { dg-final { scan-assembler-times {\tadd\tx[0-9]+, x[0-9]+, x[0-9]+, lsl 10\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tubfiz\tx[0-9]+, x2, 10, 16\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tubfiz\tx[0-9]+, x3, 10, 16\n} 1 } } */
 /* { dg-final { scan-assembler-not {\tcmp\tx[0-9]+, 0} } } */
 /* { dg-final { scan-assembler-not {\tcmp\tw[0-9]+, 0} } } */
 /* { dg-final { scan-assembler-not {\tcsel\tx[0-9]+} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/var_stride_4.c
===
--- gcc/testsuite/gcc.target/aarch64/sve/var_stride_4.c 2018-05-02 08:37:35.629731011 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/var_stride_4.c 2018-08-01 17:00:46.831168822 +0100
@@ -16,7 +16,8 @@ f (TYPE *x, TYPE *y, int n, int m)
 /* { dg-final { scan-assembler {\tldr\tw[0-9]+} } } */
 /* { dg-final { scan-assembler {\tstr\tw[0-9]+} } } */
 /* Should multiply by (257-1)*4 rather than (VF-1)*4.  */
-/* { dg-final { scan-assembler-times {\tlsl\tx[0-9]+, x[0-9]+, 10\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tsbfiz\tx[0-9]+, x2, 10, 32\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tsbfiz\tx[0-9]+, x3, 10, 32\n} 1 } } */
 /* { dg-final { scan-assembler {\tcmp\tw2, 0} } } */
 /* { dg-final { scan-assembler {\tcmp\tw3, 0} } } */
 /* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+} 4 } } */


[committed, AArch64] XFAIL sve/vcond_[45].c tests

2018-08-01 Thread Richard Sandiford
See PR 86753 for details.

Tested on aarch64-linux-gnu and applied.

Richard


2018-08-01  Richard Sandiford  

gcc/testsuite/
PR target/86753
* gcc.target/aarch64/sve/vcond_4.c: XFAIL positive tests.
* gcc.target/aarch64/sve/vcond_5.c: Likewise.

Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_4.c
===
--- gcc/testsuite/gcc.target/aarch64/sve/vcond_4.c  2018-05-02 08:37:35.593731352 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_4.c  2018-08-01 16:56:53.757184070 +0100
@@ -88,52 +88,54 @@ TEST_CMP (nule)
 TEST_CMP (nuge)
 TEST_CMP (nugt)
 
-/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, #0\.0\n} 5 } } */
-/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 10 } } */
+/* See PR 86753 for the reason behind the XFAILs.  */
+
+/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, #0\.0\n} 5 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 10 { xfail *-*-* } } } */
 
 /* 5 for ne, 5 for ueq and 5 for nueq.  */
-/* { dg-final { scan-assembler-times {\tfcmne\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, #0\.0\n} 15 } } */
-/* { dg-final { scan-assembler-times {\tfcmne\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 30 } } */
+/* { dg-final { scan-assembler-times {\tfcmne\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, #0\.0\n} 15 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tfcmne\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 30 { xfail *-*-* } } } */
 
 /* 5 for lt, 5 for ult and 5 for nult.  */
-/* { dg-final { scan-assembler-times {\tfcmlt\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, #0\.0\n} 15 } } */
-/* { dg-final { scan-assembler-times {\tfcmlt\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 30 } } */
+/* { dg-final { scan-assembler-times {\tfcmlt\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, #0\.0\n} 15 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tfcmlt\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 30 { xfail *-*-* } } } */
 
 /* 5 for le, 5 for ule and 5 for nule.  */
-/* { dg-final { scan-assembler-times {\tfcmle\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, #0\.0\n} 15 } } */
-/* { dg-final { scan-assembler-times {\tfcmle\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 30 } } */
+/* { dg-final { scan-assembler-times {\tfcmle\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, #0\.0\n} 15 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tfcmle\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 30 { xfail *-*-* } } } */
 
 /* 5 for gt, 5 for ugt and 5 for nugt.  */
-/* { dg-final { scan-assembler-times {\tfcmgt\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, #0\.0\n} 15 } } */
-/* { dg-final { scan-assembler-times {\tfcmgt\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 30 } } */
+/* { dg-final { scan-assembler-times {\tfcmgt\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, #0\.0\n} 15 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tfcmgt\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 30 { xfail *-*-* } } } */
 
 /* 5 for ge, 5 for uge and 5 for nuge.  */
-/* { dg-final { scan-assembler-times {\tfcmge\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, #0\.0\n} 15 } } */
-/* { dg-final { scan-assembler-times {\tfcmge\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 30 } } */
+/* { dg-final { scan-assembler-times {\tfcmge\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, #0\.0\n} 15 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tfcmge\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 30 { xfail *-*-* } } } */
 
 /* { dg-final { scan-assembler-not {\tfcmuo\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, #0\.0\n} } } */
 /* 3 loops * 5 invocations for all 12 unordered comparisons.  */
-/* { dg-final { scan-assembler-times {\tfcmuo\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 180 } } */
+/* { dg-final { scan-assembler-times {\tfcmuo\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 180 { xfail *-*-* } } } */
 
-/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-9]+\.d, p[0-7]/z, z[0-9]+\.d, #0\.0\n} 7 } } */
-/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-9]+\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 14 } } */
+/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-9]+\.d, p[0-7]/z, z[0-9]+\.d, #0\.0\n} 7 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-9]+\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 14 { xfail *-*-* } } } */
 
-/* { dg-final { scan-assembler-times {\tfcmne\tp[0-9]+\.d, p[0-7]/z, z[0-9]+\.d, #0\.0\n} 21 } } */
-/* { dg-final { scan-assembler-times {\tfcmne\tp[0-9]+\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 42 } } */
+/* { dg-final { scan-assembler-times {\tfcmne\tp[0-9]+\.d, p[0-7]/z, z[0-9]+\.d, #0\.0\n} 21 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tfcmne\tp[0-9]+\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 42 { xfail *-*-* } } } */
 
-/* { dg-final { scan-assembler-times {\tfcmlt\tp[0-9]+\.d, 

[PATCH][x86] Match movss and movsd "blend" instructions

2018-08-01 Thread Allan Sandfeld Jensen
Adds the ability to match movss and movsd as blend patterns,
implemented in a new method to be able to match these before shuffles,
while keeping other blends after.
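
The permutation shape being matched can be modelled in portable C++. In a two-operand permute over `nelt` lanes, indices 0..nelt-1 select from the first operand and nelt..2*nelt-1 from the second; a movss/movsd-style blend takes lane 0 from one operand and all remaining lanes, in order, from the other. The function below is a hypothetical stand-alone sketch of that test (it mirrors the index check in the attached patch but uses a plain `std::vector` instead of `expand_vec_perm_d`).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true when `perm` changes only lane 0 relative to one operand.
bool is_movs_perm(const std::vector<unsigned>& perm)
{
  const unsigned nelt = static_cast<unsigned>(perm.size());
  if (nelt < 2)
    return false;
  // Lane 0 must be lane 0 of either operand.
  if (perm[0] != 0 && perm[0] != nelt)
    return false;
  // Lanes 1..nelt-1 must come, in order, from the other operand.
  for (unsigned i = 1; i < nelt; ++i)
    if (perm[i] != i + nelt - perm[0])
      return false;
  return true;
}
```

For four floats, `{4, 1, 2, 3}` corresponds to `{B[0], A[1], A[2], A[3]}`, the result of `_mm_move_ss (A, B)`; the identity permutation `{0, 1, 2, 3}` is rejected because nothing is blended.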

2018-07-29 Allan Sandfeld Jensen 

gcc/config/i386

* i386.cc (expand_vec_perm_movs): New method matching movs
patterns.
* i386.cc (expand_vec_perm_1): Try the new method.

gcc/testsuite

* gcc.target/i386/sse2-movs.c: New test.
---
 gcc/config/i386/emmintrin.h   |  2 +-
 gcc/config/i386/i386.c| 44 +++
 gcc/config/i386/xmmintrin.h   |  2 +-
 gcc/testsuite/gcc.target/i386/sse2-movs.c | 21 +++
 4 files changed, 67 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-movs.c
From e96b3aa9017ad0d19238c923146196405cc4e5af Mon Sep 17 00:00:00 2001
From: Allan Sandfeld Jensen 
Date: Wed, 9 May 2018 12:35:14 +0200
Subject: [PATCH] Match movss and movsd blends

Adds the ability to match movss and movsd as blend patterns,
implemented in a new method to be able to match these before shuffles,
while keeping other blends after.

2018-07-29 Allan Sandfeld Jensen 

gcc/config/i386

* i386.cc (expand_vec_perm_movs): New method matching movs
patterns.
* i386.cc (expand_vec_perm_1): Try the new method.

gcc/testsuite

* gcc.target/i386/sse2-movs.c: New test.
---
 gcc/config/i386/emmintrin.h   |  2 +-
 gcc/config/i386/i386.c| 44 +++
 gcc/config/i386/xmmintrin.h   |  2 +-
 gcc/testsuite/gcc.target/i386/sse2-movs.c | 21 +++
 4 files changed, 67 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-movs.c

diff --git a/gcc/config/i386/emmintrin.h b/gcc/config/i386/emmintrin.h
index b940a39d27b..1efd943bac4 100644
--- a/gcc/config/i386/emmintrin.h
+++ b/gcc/config/i386/emmintrin.h
@@ -113,7 +113,7 @@ _mm_setzero_pd (void)
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_move_sd (__m128d __A, __m128d __B)
 {
-  return (__m128d) __builtin_ia32_movsd ((__v2df)__A, (__v2df)__B);
+  return __extension__ (__m128d)(__v2df){__B[0],__A[1]};
 }
 
 /* Load two DPFP values from P.  The address must be 16-byte aligned.  */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ee409cfe7e4..2337ef5ea08 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -46143,6 +46143,46 @@ expand_vselect_vconcat (rtx target, rtx op0, rtx op1,
   return ok;
 }
 
+/* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
+   using movss or movsd.  */
+static bool
+expand_vec_perm_movs (struct expand_vec_perm_d *d)
+{
+  machine_mode vmode = d->vmode;
+  unsigned i, nelt = d->nelt;
+  rtx x;
+
+  if (d->one_operand_p)
+    return false;
+
+  if (TARGET_SSE2 && (vmode == V2DFmode || vmode == V4SFmode))
+    ;
+  else
+    return false;
+
+  /* Only the first element is changed. */
+  if (d->perm[0] != nelt && d->perm[0] != 0)
+    return false;
+  for (i = 1; i < nelt; ++i) {
+    {
+      if (d->perm[i] != i + nelt - d->perm[0])
+        return false;
+    }
+  }
+
+  if (d->testing_p)
+    return true;
+
+  if (d->perm[0] == nelt)
+    x = gen_rtx_VEC_MERGE (vmode, d->op1, d->op0, GEN_INT (1));
+  else
+    x = gen_rtx_VEC_MERGE (vmode, d->op0, d->op1, GEN_INT (1));
+
+  emit_insn (gen_rtx_SET (d->target, x));
+
+  return true;
+}
+
 /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
in terms of blendp[sd] / pblendw / pblendvb / vpblendd.  */
 
@@ -46885,6 +46925,10 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
 	}
 }
 
+  /* Try movss/movsd instructions.  */
+  if (expand_vec_perm_movs (d))
+    return true;
+
   /* Finally, try the fully general two operand permute.  */
   if (expand_vselect_vconcat (d->target, d->op0, d->op1, d->perm, nelt,
 			  d->testing_p))
diff --git a/gcc/config/i386/xmmintrin.h b/gcc/config/i386/xmmintrin.h
index f64f3f74a0b..699f681e054 100644
--- a/gcc/config/i386/xmmintrin.h
+++ b/gcc/config/i386/xmmintrin.h
@@ -1011,7 +1011,7 @@ _mm_storer_ps (float *__P, __m128 __A)
 extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_move_ss (__m128 __A, __m128 __B)
 {
-  return (__m128) __builtin_ia32_movss ((__v4sf)__A, (__v4sf)__B);
+  return __extension__ (__m128)(__v4sf){__B[0],__A[1],__A[2],__A[3]};
 }
 
 /* Extracts one of the four words of A.  The selector N must be immediate.  */
diff --git a/gcc/testsuite/gcc.target/i386/sse2-movs.c b/gcc/testsuite/gcc.target/i386/sse2-movs.c
new file mode 100644
index 000..79f486cfa82
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-movs.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2" } */
+/* { dg-require-effective-target sse2 } */
+/* { dg-final { scan-assembler "movss" } } */
+/* { dg-final { scan-assembler "movsd" } } */
+/* { dg-final { scan-assembler-not "unpcklps" } } */
+/* { dg-final { 

Re: [PATCH 2/2] condition_variable: Use steady_clock to implement wait_for

2018-08-01 Thread Jonathan Wakely

On 20/07/18 17:49 +0100, Mike Crowe wrote:

I believe[1][2] that the C++ standard says that
std::condition_variable::wait_for should be implemented to be equivalent
to:

return wait_until(lock, chrono::steady_clock::now() + rel_time);

But the existing implementation uses chrono::system_clock. Now that
wait_until has potentially-different behaviour for chrono::steady_clock,
let's at least try to wait using the correct clock.

[1] https://en.cppreference.com/w/cpp/thread/condition_variable/wait_for
[2] https://github.com/cplusplus/draft/blob/master/source/threads.tex
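
The rounding step in the patch below is easy to miss, so here is a self-contained C++ sketch of it. The helper names `round_up_to_steady` and `steady_deadline` are hypothetical; they only illustrate converting a relative timeout to the steady clock's native duration without ever shortening it, and then forming an absolute deadline on that clock.

```cpp
#include <cassert>
#include <chrono>

// Convert `rtime` to steady_clock's duration type, rounding up so that
// the result is never shorter than the timeout that was requested.
template <typename Rep, typename Period>
std::chrono::steady_clock::duration
round_up_to_steady(const std::chrono::duration<Rep, Period>& rtime)
{
  using dur = std::chrono::steady_clock::duration;
  auto reltime = std::chrono::duration_cast<dur>(rtime);
  if (reltime < rtime)  // duration_cast truncated toward zero
    ++reltime;
  return reltime;
}

// Absolute deadline measured on the steady clock.
template <typename Rep, typename Period>
std::chrono::steady_clock::time_point
steady_deadline(const std::chrono::duration<Rep, Period>& rtime)
{
  return std::chrono::steady_clock::now() + round_up_to_steady(rtime);
}
```

Because the deadline is computed from `steady_clock`, adjustments to the system time cannot lengthen or shorten the wait, which is the behaviour the quoted message argues the standard requires.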


Also committed to trunk. Thanks again.


---
libstdc++-v3/ChangeLog  | 3 +++
libstdc++-v3/include/std/condition_variable | 5 +++--
2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index 4657af7..432cb84 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,4 +1,7 @@
2018-07-20  Mike Crowe 
+   * include/std/condition_variable (wait_for): Use steady_clock.
+
+2018-07-20  Mike Crowe 
   * include/std/condition_variable (wait_until): Only report timeout
   if we really have timed out when measured against the
   caller-supplied clock.
diff --git a/libstdc++-v3/include/std/condition_variable b/libstdc++-v3/include/std/condition_variable
index a2d146a..ce58399 100644
--- a/libstdc++-v3/include/std/condition_variable
+++ b/libstdc++-v3/include/std/condition_variable
@@ -65,6 +65,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  class condition_variable
  {
typedef chrono::system_clock   __clock_t;
+typedef chrono::steady_clock   __steady_clock_t;
typedef __gthread_cond_t   __native_type;

#ifdef __GTHREAD_COND_INIT
@@ -142,11 +143,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  wait_for(unique_lock<mutex>& __lock,
  const chrono::duration<_Rep, _Period>& __rtime)
  {
-   using __dur = typename __clock_t::duration;
+   using __dur = typename __steady_clock_t::duration;
   auto __reltime = chrono::duration_cast<__dur>(__rtime);
   if (__reltime < __rtime)
 ++__reltime;
-   return wait_until(__lock, __clock_t::now() + __reltime);
+   return wait_until(__lock, __steady_clock_t::now() + __reltime);
  }

template
--
git-series 0.9.1
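
The rounding step in the wait_for hunk above (cast to the clock's duration, then add one tick if the cast truncated) can be shown in isolation. `round_up_duration` is a hypothetical name used only for this sketch.

```cpp
#include <cassert>
#include <chrono>

// Convert d to ToDuration, rounding up so that the result is never shorter
// than the requested duration (duration_cast alone truncates toward zero).
template <typename ToDuration, typename Rep, typename Period>
ToDuration round_up_duration(const std::chrono::duration<Rep, Period>& d)
{
  auto t = std::chrono::duration_cast<ToDuration>(d);
  if (t < d)
    ++t;
  return t;
}

inline void round_up_duration_demo()
{
  using namespace std::chrono;
  // 1500us truncates to 1ms, so it is bumped to 2ms.
  assert(round_up_duration<milliseconds>(microseconds(1500)) == milliseconds(2));
  // 2000us converts exactly, so no bump is applied.
  assert(round_up_duration<milliseconds>(microseconds(2000)) == milliseconds(2));
}
```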


Re: [PATCH 1/2] condition_variable: Report early wakeup of wait_until as no_timeout

2018-08-01 Thread Jonathan Wakely

On 20/07/18 17:49 +0100, Mike Crowe wrote:

As currently implemented, condition_variable always ultimately waits
against std::chrono::system_clock. This clock can be changed in arbitrary
ways by the user which may result in us waking up too early or too late
when measured against the caller-supplied clock.

We can't (yet) do much about waking up too late[1], but
if we wake up too early we must return cv_status::no_timeout to indicate a
spurious wakeup rather than incorrectly returning cv_status::timeout.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41861
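
The re-check described above can be sketched outside the library. This is a minimal illustration built on the public API, assuming the hypothetical names `wait_until_checked` and a `slow_clock` modelled on the one in the patch's test case; the committed change does the same thing inside condition_variable::wait_until.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Test clock that advances at one third of system_clock's rate.
struct slow_clock
{
  using rep = std::chrono::system_clock::rep;
  using period = std::chrono::system_clock::period;
  using duration = std::chrono::system_clock::duration;
  using time_point = std::chrono::time_point<slow_clock, duration>;
  static constexpr bool is_steady = false;

  static time_point now()
  {
    const auto real = std::chrono::system_clock::now();
    return time_point{real.time_since_epoch() / 3};
  }
};

template <typename Clock, typename Duration>
std::cv_status
wait_until_checked(std::condition_variable& cv,
                   std::unique_lock<std::mutex>& lock,
                   const std::chrono::time_point<Clock, Duration>& atime)
{
  // Translate the caller's deadline into a system_clock deadline.
  const auto c_entry = Clock::now();
  const auto s_entry = std::chrono::system_clock::now();
  const auto s_atime = s_entry + (atime - c_entry);

  if (cv.wait_until(lock, s_atime) == std::cv_status::no_timeout)
    return std::cv_status::no_timeout;
  // Timed out on system_clock; report it only if the caller's clock agrees.
  if (Clock::now() < atime)
    return std::cv_status::no_timeout;
  return std::cv_status::timeout;
}

inline void wait_until_checked_demo()
{
  std::mutex m;
  std::condition_variable cv;
  std::unique_lock<std::mutex> lock(m);
  const auto expire = slow_clock::now() + std::chrono::milliseconds(30);
  while (slow_clock::now() < expire)
    if (wait_until_checked(cv, lock, expire) == std::cv_status::timeout)
      assert(slow_clock::now() >= expire);
}
```

Because slow_clock lags behind system_clock, the first system_clock timeout arrives early from the caller's point of view, and the extra comparison turns it into no_timeout so that the caller's loop simply waits again.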


Committed to trunk, thanks very much.

I reformatted it slightly, to keep the line below 80 columns. The
version I committed is attached.

commit 79a8b4c1d70287ac0e668c4f2335d70d97c3002e
Author: redi 
Date:   Wed Aug 1 15:39:45 2018 +

Report early wakeup of condition_variable::wait_until as no_timeout

As currently implemented, condition_variable always ultimately waits
against std::chrono::system_clock. This clock can be changed in arbitrary
ways by the user which may result in us waking up too early or too late
when measured against the caller-supplied clock.

We can't (yet) do much about waking up too late (PR 41861), but
if we wake up too early we must return cv_status::no_timeout to indicate a
spurious wakeup rather than incorrectly returning cv_status::timeout.

2018-08-01  Mike Crowe  

* include/std/condition_variable (wait_until): Only report timeout
if we really have timed out when measured against the
caller-supplied clock.
* testsuite/30_threads/condition_variable/members/2.cc: Add test
case to confirm above behaviour.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@263224 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/include/std/condition_variable b/libstdc++-v3/include/std/condition_variable
index 3f690c81799..c00afa2b7ae 100644
--- a/libstdc++-v3/include/std/condition_variable
+++ b/libstdc++-v3/include/std/condition_variable
@@ -117,7 +117,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	const auto __delta = __atime - __c_entry;
 	const auto __s_atime = __s_entry + __delta;
 
-	return __wait_until_impl(__lock, __s_atime);
+	if (__wait_until_impl(__lock, __s_atime) == cv_status::no_timeout)
+	  return cv_status::no_timeout;
+	// We got a timeout when measured against __clock_t but
+	// we need to check against the caller-supplied clock
+	// to tell whether we should return a timeout.
+	if (_Clock::now() < __atime)
+	  return cv_status::no_timeout;
+	return cv_status::timeout;
   }
 
 template
diff --git a/libstdc++-v3/testsuite/30_threads/condition_variable/members/2.cc b/libstdc++-v3/testsuite/30_threads/condition_variable/members/2.cc
index 0a44c4fa7cf..09a717801e1 100644
--- a/libstdc++-v3/testsuite/30_threads/condition_variable/members/2.cc
+++ b/libstdc++-v3/testsuite/30_threads/condition_variable/members/2.cc
@@ -51,8 +51,60 @@ void test01()
 }
 }
 
+struct slow_clock
+{
+  using rep = std::chrono::system_clock::rep;
+  using period = std::chrono::system_clock::period;
+  using duration = std::chrono::system_clock::duration;
+  using time_point = std::chrono::time_point<slow_clock, duration>;
+  static constexpr bool is_steady = false;
+
+  static time_point now()
+  {
+auto real = std::chrono::system_clock::now();
+return time_point{real.time_since_epoch() / 3};
+  }
+};
+
+
+void test01_alternate_clock()
+{
+  try
+{
+  std::condition_variable c1;
+  std::mutex m;
+  std::unique_lock<std::mutex> l(m);
+  auto const expire = slow_clock::now() + std::chrono::seconds(1);
+
+  while (slow_clock::now() < expire)
+   {
+ auto const result = c1.wait_until(l, expire);
+
+ // If wait_until returns before the timeout has expired when
+ // measured against the supplied clock, then wait_until must
+ // return no_timeout.
+ if (slow_clock::now() < expire)
+   VERIFY(result == std::cv_status::no_timeout);
+
+ // If wait_until returns timeout then the timeout must have
+ // expired.
+ if (result == std::cv_status::timeout)
+   VERIFY(slow_clock::now() >= expire);
+   }
+}
+  catch (const std::system_error& e)
+{
+  VERIFY( false );
+}
+  catch (...)
+{
+  VERIFY( false );
+}
+}
+
 int main()
 {
   test01();
+  test01_alternate_clock();
   return 0;
 }


Re: [PATCH] Add malloc predictor (PR middle-end/83023).

2018-08-01 Thread Nathan Sidwell

On 08/01/2018 05:25 AM, Marc Glisse wrote:

Throwing new is returns_nonnull (errors are reported with exceptions) so 
that's fine, but non-throwing new is not:


int* p1 = new(std::nothrow) int;

Here errors are reported by returning 0, so it is common to test if p1 
is 0 and this is precisely the case that could benefit from a predictor 
but does not have the attribute to do so (there are also consequences on 
aliasing).


Agreed.  Both throwing and non-throwing operator new are malloc-like.
Placement new doesn't throw; it is explicitly defined to return the
passed-in pointer (and it's not replaceable by the user). So it may return
null, but I don't think any code (outside of a conformance testsuite)
would actually do that.
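
As a small illustration of the placement-new behaviour described above (the function names are hypothetical and exist only for this sketch):

```cpp
#include <cassert>
#include <new>

// Constructs an int in caller-provided storage and reports whether the
// placement form of operator new handed back exactly that storage.
inline bool placement_new_returns_argument(void* storage, int value)
{
  int* p = new (storage) int(value);
  return static_cast<void*>(p) == storage && *p == value;
}

inline void placement_new_demo()
{
  alignas(int) unsigned char buffer[sizeof(int)];
  assert(placement_new_returns_argument(buffer, 42));
}
```

Since the result is just the pointer that was passed in, it can alias existing objects, unlike the result of the replaceable allocation functions.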


I can't find words that specify that the return value of any allocation
function is unaliased with any existing object.   The closest it gets is
that the non-placement forms might be implemented via malloc and friends.


nathan

--
Nathan Sidwell


[GCC][PATCH v2][Aarch64] Exploiting BFXIL when OR-ing two AND-operations with appropriate bitmasks

2018-08-01 Thread Sam Tebbs

Hi all,

This patch adds an optimisation that exploits the AArch64 BFXIL
instruction when or-ing the result of two bitwise and operations
with non-overlapping bitmasks
(e.g. (a & 0x) | (b & 0x)).

Example:

unsigned long long combine(unsigned long long a, unsigned long
long b) {
  return (a & 0xll) | (b & 0xll);
}

void read(unsigned long long a, unsigned long long b, unsigned
long long *c) {
  *c = combine(a, b);
}

When compiled with -O2, read would result in:

read:
  and   x5, x1, #0x
  and   x4, x0, #0x
  orr   x4, x4, x5
  str   x4, [x2]
  ret

But with this patch it results in:

read:
  mov    x4, x0
  bfxil    x4, x1, 0, 32
  str    x4, [x2]
  ret

Bootstrapped and regtested on aarch64-none-linux-gnu and
aarch64-none-elf with no regressions.
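
The equivalence between the and/and/orr sequence and mov+bfxil can be checked with a small software model. Note that the mask constants are an assumption: the ones in the example above were lost in the archive, so the values below are chosen to match "bfxil x4, x1, 0, 32". The function names are likewise hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed masks (see the note above): high 32 bits from a, low 32 from b.
constexpr std::uint64_t high_mask = 0xffffffff00000000ull;
constexpr std::uint64_t low_mask  = 0x00000000ffffffffull;

constexpr std::uint64_t combine(std::uint64_t a, std::uint64_t b)
{
  return (a & high_mask) | (b & low_mask);
}

// Software model of "BFXIL dst, src, 0, width" for 0 < width < 64: copy the
// low `width` bits of src into the low bits of dst, keep the rest of dst.
constexpr std::uint64_t bfxil_from_bit0(std::uint64_t dst, std::uint64_t src,
                                        unsigned width)
{
  return (dst & ~((std::uint64_t{1} << width) - 1))
         | (src & ((std::uint64_t{1} << width) - 1));
}

static_assert(combine(0x1111111122222222ull, 0x3333333344444444ull)
              == bfxil_from_bit0(0x1111111122222222ull,
                                 0x3333333344444444ull, 32),
              "and/and/orr and mov+bfxil agree");

inline void combine_demo()
{
  assert(combine(0x1111111122222222ull, 0x3333333344444444ull)
         == 0x1111111144444444ull);
}
```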


gcc/
2018-08-01  Sam Tebbs

    PR target/85628
    * config/aarch64/aarch64.md (*aarch64_bfxil):
    Define.
    * config/aarch64/constraints.md (Ulc): Define
    * config/aarch64/aarch64-protos.h
(aarch64_is_left_consecutive): Define.
    * config/aarch64/aarch64.c (aarch64_is_left_consecutive):
New function.

gcc/testsuite
2018-08-01  Sam Tebbs

    PR target/85628
    * gcc.target/aarch64/combine_bfxil.c: New file.
    * gcc.target/aarch64/combine_bfxil_2.c: New file.
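
The mask test behind aarch64_is_left_consecutive, and the way the insn pattern derives the BFXIL width from a count of trailing zeros, can be tried out in isolation. This is a sketch on a fixed 64-bit unsigned type rather than HOST_WIDE_INT; `count_trailing_zeros` is a portable stand-in written for this example, not GCC's ctz_hwi.

```cpp
#include <cassert>
#include <cstdint>

// Same formula as the patch's predicate, in unsigned arithmetic so that the
// subtraction cannot overflow.
constexpr bool is_left_consecutive(std::uint64_t i)
{
  return (i | (i - 1)) == ~std::uint64_t{0};
}

// Portable count-trailing-zeros; returns 64 for x == 0.
constexpr unsigned count_trailing_zeros(std::uint64_t x)
{
  return x == 0 ? 64u : ((x & 1) ? 0u : 1u + count_trailing_zeros(x >> 1));
}

static_assert(is_left_consecutive(0xffffffff00000000ull), "ones from the MSB");
static_assert(!is_left_consecutive(0x00000000ffffffffull), "ones from the LSB");
static_assert(!is_left_consecutive(0xff00ff0000000000ull), "hole in the mask");
static_assert(count_trailing_zeros(0xffffffff00000000ull) == 32,
              "width of the low field that BFXIL inserts");

inline void is_left_consecutive_demo()
{
  // Edge cases of the formula as written: it also accepts 0 and all-ones.
  assert(is_left_consecutive(0));
  assert(is_left_consecutive(~std::uint64_t{0}));
}
```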

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index af5db9c595385f7586692258f750b6aceb3ed9c8..01d9e1bd634572fcfa60208ba4dc541805af5ccd 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -574,4 +574,6 @@ rtl_opt_pass *make_pass_fma_steering (gcc::context *ctxt);
 
 poly_uint64 aarch64_regmode_natural_size (machine_mode);
 
+bool aarch64_is_left_consecutive (HOST_WIDE_INT);
+
 #endif /* GCC_AARCH64_PROTOS_H */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index fa01475aa9ee579b6a3b2526295b622157120660..3cfa51b15af3e241672f1383cf881c12a44494a5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1454,6 +1454,14 @@ aarch64_hard_regno_caller_save_mode (unsigned regno, unsigned,
 return SImode;
 }
 
+/* Implement IS_LEFT_CONSECUTIVE.  Check if I's bits are consecutive
+   ones from the MSB.  */
+bool
+aarch64_is_left_consecutive (HOST_WIDE_INT i)
+{
+  return (i | (i - 1)) == HOST_WIDE_INT_M1;
+}
+
 /* Implement TARGET_CONSTANT_ALIGNMENT.  Make strings word-aligned so
that strcpy from constants will be faster.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index e9c16f9697b766a5c56b6269a83b7276654c5668..ff2db4af38e16630daeada79afc604c4696abf82 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5305,6 +5305,31 @@
   [(set_attr "type" "rev")]
 )
 
+(define_insn "*aarch64_bfxil"
+  [(set (match_operand:GPI 0 "register_operand" "=r,r")
+(ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "r,0")
+		(match_operand:GPI 3 "const_int_operand" "n, Ulc"))
+	(and:GPI (match_operand:GPI 2 "register_operand" "0,r")
		(match_operand:GPI 4 "const_int_operand" "Ulc, n"))))]
+  "(INTVAL (operands[3]) == ~INTVAL (operands[4]))
+  && (aarch64_is_left_consecutive (INTVAL (operands[3]))
+|| aarch64_is_left_consecutive (INTVAL (operands[4])))"
+  {
+switch (which_alternative)
+{
+  case 0:
+	operands[3] = GEN_INT (ctz_hwi (~INTVAL (operands[3])));
+	return "bfxil\\t%0, %1, 0, %3";
+  case 1:
+	operands[3] = GEN_INT (ctz_hwi (~INTVAL (operands[4])));
+	return "bfxil\\t%0, %2, 0, %3";
+  default:
+	gcc_unreachable ();
+}
+  }
+  [(set_attr "type" "bfm")]
+)
+
 ;; There are no canonicalisation rules for the position of the lshiftrt, ashift
 ;; operations within an IOR/AND RTX, therefore we have two patterns matching
 ;; each valid permutation.
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index 72cacdabdac52dcb40b480f7a5bfbf4997c742d8..5bae0b70bbd11013a9fb27ec19cf7467eb20135f 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -172,6 +172,13 @@
   A constraint that matches the immediate constant -1."
   (match_test "op == constm1_rtx"))
 
+(define_constraint "Ulc"
+ "@internal
+ A constraint that matches a constant integer whose bits are consecutive ones
+ from the MSB."
+ (and (match_code "const_int")
+  (match_test "aarch64_is_left_consecutive (ival)")))
+
 (define_constraint "Usv"
   "@internal
A constraint that matches a VG-based constant that can be loaded by
diff --git a/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
new file mode 100644
index ..3bc1dd5b216477efe7494dbcdac7a5bf465af218
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
@@ -0,0 +1,98 @@
+/* { dg-do run } */
+/* { dg-options "-O2 --save-temps" } */
+
+extern void abort(void);
+

Re: [AArch64] Generate load-pairs when the last load clobbers the address register [2/2]

2018-08-01 Thread Richard Earnshaw (lists)
On 19/07/18 11:03, Jackson Woodruff wrote:
> Hi Richard,
> 
> 
> On 07/12/2018 05:35 PM, Richard Earnshaw (lists) wrote:
>> On 11/07/18 17:48, Jackson Woodruff wrote:
>>> Hi Sudi,
>>>
>>> On 07/10/2018 02:29 PM, Sudakshina Das wrote:
>>>> Hi Jackson
>>>>
>>>>
>>>> On Tuesday 10 July 2018 09:37 AM, Jackson Woodruff wrote:
>>>>> Hi all,
>>>>>
>>>>> This patch resolves PR86014.  It does so by noticing that the last
>>>>> load may clobber the address register without issue (regardless of
>>>>> where it exists in the final ldp/stp sequence). That check has been
>>>>> changed so that the last register may be clobbered and the testcase
>>>>> (gcc.target/aarch64/ldp_stp_10.c) now passes.
>>>>>
>>>>> Bootstrap and regtest OK.
>>>>>
>>>>> OK for trunk?
>>>>>
>>>>> Jackson
>>>>>
>>>>> Changelog:
>>>>>
>>>>> gcc/
>>>>>
>>>>> 2018-06-25  Jackson Woodruff  
>>>>>
>>>>>  PR target/86014
>>>>>  * config/aarch64/aarch64.c
>>>>> (aarch64_operands_adjust_ok_for_ldpstp):
>>>>>  Remove address clobber check on last register.
>>>>>
>>>> This looks good to me but you will need a maintainer to approve it.
>>>> The only thing I would add is that if you could move the comment on top
>>>> of the for loop to this patch. That is, keep the original
>>>> /* Check if the addresses are clobbered by load.  */
>>>> in your [1/2] and make the comment change in [2/2].
>>> Thanks, change made.  OK for trunk?
>>>
>>> Thanks,
>>>
>>> Jackson
>>>
>>> pr86014.patch
>>>
>>>
>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>> index
>>> da44b33b2bc12f9aa2122cf5194e244437fb31a5..8a027974e9772cacf5f5cb8ec61e8ef62187e879
>>> 100644
>>> --- a/gcc/config/aarch64/aarch64.c
>>> +++ b/gcc/config/aarch64/aarch64.c
>>> @@ -17071,9 +17071,10 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx
>>> *operands, bool load,
>>>   return false;
>>>   }
>>>   -  /* Check if addresses are clobbered by load.  */
>>> +  /* Only the last register in the order in which they occur
>>> + may be clobbered by the load.  */
>>>     if (load)
>>> -    for (int i = 0; i < num_instructions; i++)
>>> +    for (int i = 0; i < num_instructions - 1; i++)
>>>     if (reg_mentioned_p (reg[i], mem[i]))
>>>   return false;
>>>  
>> Can we have a new test for this?
> I've added ldp_stp_13.c that tests for this.
>>
>> Also, if rclass (which you calculate later) is FP_REGS, then the test is
>> redundant since mems can never use FP registers as a base register.
> 
> Yes, makes sense.  I've flipped the logic around so that the rclass is
> calculated first and is then used to avoid the base register check if
> it is not GENERAL_REGS.
> 
> Re-bootstrapped and regtested.
> 
> Is this OK for trunk?
> 

OK.

R.
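
The logic approved here (compute the register class first, then run the clobber check only for general registers and only for loads other than the last) can be modelled in a few lines. This is a deliberately simplified sketch with hypothetical types: it treats "destination register is mentioned in the address" as "destination equals the base register", which is narrower than reg_mentioned_p.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class RegClass { General, Fp };

// One load in a candidate ldp sequence (simplified model).
struct Load
{
  int dest_reg;        // register being loaded
  RegClass dest_class; // class of dest_reg
  int base_reg;        // general register used as the address base
};

inline bool loads_ok_for_pairing(const std::vector<Load>& loads)
{
  if (loads.empty())
    return false;
  // All destinations must be in the same register class.
  const RegClass rclass = loads.front().dest_class;
  for (const Load& l : loads)
    if (l.dest_class != rclass)
      return false;
  // FP destinations can never be a base register, so skip the check there.
  // Otherwise only the last load may clobber its own base.
  if (rclass == RegClass::General)
    for (std::size_t i = 0; i + 1 < loads.size(); ++i)
      if (loads[i].dest_reg == loads[i].base_reg)
        return false;
  return true;
}

inline void loads_ok_for_pairing_demo()
{
  // ldr x1, [x0]; ldr x0, [x0, 8]: only the last load clobbers the base.
  const std::vector<Load> last_clobbers = {{1, RegClass::General, 0},
                                           {0, RegClass::General, 0}};
  // ldr x0, [x0]; ldr x1, [x0, 8]: the first load clobbers the base.
  const std::vector<Load> first_clobbers = {{0, RegClass::General, 0},
                                            {1, RegClass::General, 0}};
  assert(loads_ok_for_pairing(last_clobbers));
  assert(!loads_ok_for_pairing(first_clobbers));
}
```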

> Thanks,
> 
> Jackson
> 
>>
>> R.
> 
> 
> clobber.patch
> 
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> 1369704da3ed8094c0d4612643794e6392dce05a..3dd891ebd00f24ffa4187f0125b306a3c6671bef
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -17084,9 +17084,26 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx 
> *operands, bool load,
>   return false;
>  }
>  
> -  /* Check if addresses are clobbered by load.  */
> -  if (load)
> -for (int i = 0; i < num_insns; i++)
> +  /* Check if the registers are of same class.  */
> +  rclass = REG_P (reg[0]) && FP_REGNUM_P (REGNO (reg[0]))
> +? FP_REGS : GENERAL_REGS;
> +
> +  for (int i = 1; i < num_insns; i++)
> +if (REG_P (reg[i]) && FP_REGNUM_P (REGNO (reg[i])))
> +  {
> + if (rclass != FP_REGS)
> +   return false;
> +  }
> +else
> +  {
> + if (rclass != GENERAL_REGS)
> +   return false;
> +  }
> +
> +  /* Only the last register in the order in which they occur
> + may be clobbered by the load.  */
> +  if (rclass == GENERAL_REGS && load)
> +for (int i = 0; i < num_insns - 1; i++)
>if (reg_mentioned_p (reg[i], mem[i]))
>   return false;
>  
> @@ -17126,22 +17143,6 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx 
> *operands, bool load,
>&& MEM_ALIGN (mem[0]) < 8 * BITS_PER_UNIT)
>  return false;
>  
> -  /* Check if the registers are of same class.  */
> -  rclass = REG_P (reg[0]) && FP_REGNUM_P (REGNO (reg[0]))
> -? FP_REGS : GENERAL_REGS;
> -
> -  for (int i = 1; i < num_insns; i++)
> -if (REG_P (reg[i]) && FP_REGNUM_P (REGNO (reg[i])))
> -  {
> - if (rclass != FP_REGS)
> -   return false;
> -  }
> -else
> -  {
> - if (rclass != GENERAL_REGS)
> -   return false;
> -  }
> -
>return true;
>  }
>  
> diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_13.c 
> b/gcc/testsuite/gcc.target/aarch64/ldp_stp_13.c
> new file mode 100644
> index 
> ..9cc3942f153773e8ffe9bcaf07f6b32dc0d5f95e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_13.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 

Re: [PATCH] Add malloc predictor (PR middle-end/83023).

2018-08-01 Thread Marc Glisse

On Wed, 1 Aug 2018, Martin Liška wrote:


On 08/01/2018 02:25 PM, Marc Glisse wrote:

On Wed, 1 Aug 2018, Martin Liška wrote:


On 07/27/2018 02:38 PM, Marc Glisse wrote:

On Fri, 27 Jul 2018, Martin Liška wrote:


So answer is yes, the builtin can be then removed.


Good, thanks. While looking at how widely it is going to apply, I noticed that the 
default, throwing operator new has attribute malloc and everything, but the 
non-throwing variant declared in <new> doesn't, so it won't benefit from the 
new predictor. I don't know if there is a good reason for this disparity...



Well in case somebody uses operator new:

    int* p1 = new int;
    if (p1)
 delete p1;

we optimize that out to if (true), even when one has a user-defined
operator new. Thus it's probably OK.


Throwing new is returns_nonnull (errors are reported with exceptions) so that's 
fine, but non-throwing new is not:

int* p1 = new(std::nothrow) int;

Here errors are reported by returning 0, so it is common to test if p1 is 0 and 
this is precisely the case that could benefit from a predictor but does not 
have the attribute to do so (there are also consequences on aliasing).
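
The pattern being discussed looks like this in user code. This is only an illustrative sketch (the function name is hypothetical); it shows the null test that a malloc-style branch predictor would treat as rarely taken.

```cpp
#include <cassert>
#include <memory>
#include <new>

// Returns an owning pointer, or an empty one if allocation failed.
inline std::unique_ptr<int> make_int_or_null(int value)
{
  int* p = new (std::nothrow) int(value);
  if (p == nullptr)  // failure is reported by a null result, not an exception
    return nullptr;
  return std::unique_ptr<int>(p);
}

inline void make_int_or_null_demo()
{
  const std::unique_ptr<int> p = make_int_or_null(7);
  if (p)
    assert(*p == 7);
}
```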


Then it can be handled with DECL_IS_OPERATOR_NEW, for those we can also set the 
newly introduced predictor.


Independently of whether you extend the predictor to DECL_IS_OPERATOR_NEW, 
it would be good for this nothrow operator new to get the aliasing 
benefits of attribute malloc. I'll open a PR.



(Jan's remark about functions with an inferred malloc attribute reminds me that 
at some point, the code was adding attribute malloc for functions that always 
return 0...)


By inferred do you mean functions that are marked as malloc in IPA pure-const 
(propagate_malloc)?


Yes.


Example would be appreciated.


I used the past tense, I am not claiming this still happens.

--
Marc Glisse


Re: [PATCH] Handle overlength strings in C++ FE

2018-08-01 Thread Nathan Sidwell

On 08/01/2018 04:27 AM, Bernd Edlinger wrote:

Hi,

this shortens overlength string constants,
and fixes one place where a string constant is created
without a terminating NUL.  This is a cleanup in preparation
for a more thorough check on the STRING_CST objects
in the middle-end.

Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


ok.


--
Nathan Sidwell


[PATCH] Backport gettext fixes to get rid of warnings on macOS

2018-08-01 Thread Simon Marchi
This patch was tested by building binutils-gdb on GNU/Linux and macOS.  It can be
applied to the gcc repo too, after fixing some trivial merge conflicts (someone
else will need to do it, as I don't have push access to gcc).  Although I think
it is relatively low-risk, building gcc on macOS was not tested with this
patch, so if somebody who already has a macOS build set up could test it, that
would be appreciated.

Two fixes were committed recently to the gettext repo in order to make
gdb build warning-free on macOS.  This patch backports them both:

  - Make the format_arg attribute effective also in the case 
_INTL_REDIRECT_INLINE.
113893dce80358a4ae0d9463ce73c5670c81cf0c

http://git.savannah.gnu.org/cgit/gettext.git/commit/?id=113893dce80358a4ae0d9463ce73c5670c81cf0c

  - Enable the format_arg attribute also on clang on Mac OS X.
bd6a52241c7c83c90e043ace2082a2508d273f55

http://git.savannah.gnu.org/cgit/gettext.git/commit/?id=bd6a52241c7c83c90e043ace2082a2508d273f55
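
For readers unfamiliar with the attribute, here is a minimal sketch of what format_arg does on an inline wrapper. All `demo_`/`DEMO_` names are hypothetical stand-ins for the libintl ones, and the guard is simplified compared with the real macro.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

#if defined(__GNUC__) || defined(__clang__)
# define DEMO_MAY_RETURN_STRING_ARG(n) __attribute__ ((__format_arg__ (n)))
#else
# define DEMO_MAY_RETURN_STRING_ARG(n)
#endif

// Stand-in for libintl_gettext: a real implementation would look up a
// translation; this one returns the message id unchanged.
inline const char* demo_libintl_gettext(const char* msgid)
{
  return msgid;
}

// The attribute on the inline wrapper tells the compiler that the result may
// be the first argument, so format checking sees through the call.
static inline
DEMO_MAY_RETURN_STRING_ARG (1)
const char* demo_gettext(const char* msgid)
{
  return demo_libintl_gettext(msgid);
}

inline int format_count(char* buf, std::size_t size, int n)
{
  return std::snprintf(buf, size, demo_gettext("%d items"), n);
}

inline void format_count_demo()
{
  char buf[32];
  format_count(buf, sizeof buf, 3);
  assert(std::strcmp(buf, "3 items") == 0);
}
```

Without the attribute on the wrapper, the compiler sees a non-literal format string at the snprintf call and can warn about it, which is the kind of warning the backport is meant to silence.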

intl/ChangeLog:

* libgnuintl.h (_INTL_MAY_RETURN_STRING_ARG, gettext, dgettext,
dcgettext, ngettext, dngettext, dcngettext): Backport changes
from upstream gettext.
---
 intl/libgnuintl.h | 35 +++
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/intl/libgnuintl.h b/intl/libgnuintl.h
index acc9093..7616d6f 100644
--- a/intl/libgnuintl.h
+++ b/intl/libgnuintl.h
@@ -115,7 +115,7 @@ extern "C" {
 /* _INTL_MAY_RETURN_STRING_ARG(n) declares that the given function may return
its n-th argument literally.  This enables GCC to warn for example about
printf (gettext ("foo %y")).  */
-#if __GNUC__ >= 3 && !(__APPLE_CC__ > 1 && defined __cplusplus)
+#if defined __GNUC__ && __GNUC__ >= 3 && !(defined __APPLE_CC__ && __APPLE_CC__ > 1 && !(defined __clang__ && __clang__ && __clang_major__ >= 3) && defined __cplusplus)
 # define _INTL_MAY_RETURN_STRING_ARG(n) __attribute__ ((__format_arg__ (n)))
 #else
 # define _INTL_MAY_RETURN_STRING_ARG(n)
@@ -127,7 +127,9 @@ extern "C" {
 #ifdef _INTL_REDIRECT_INLINE
 extern char *libintl_gettext (const char *__msgid)
_INTL_MAY_RETURN_STRING_ARG (1);
-static inline char *gettext (const char *__msgid)
+static inline
+_INTL_MAY_RETURN_STRING_ARG (1)
+char *gettext (const char *__msgid)
 {
   return libintl_gettext (__msgid);
 }
@@ -145,7 +147,9 @@ extern char *gettext _INTL_PARAMS ((const char *__msgid))
 #ifdef _INTL_REDIRECT_INLINE
 extern char *libintl_dgettext (const char *__domainname, const char *__msgid)
_INTL_MAY_RETURN_STRING_ARG (2);
-static inline char *dgettext (const char *__domainname, const char *__msgid)
+static inline
+_INTL_MAY_RETURN_STRING_ARG (2)
+char *dgettext (const char *__domainname, const char *__msgid)
 {
   return libintl_dgettext (__domainname, __msgid);
 }
@@ -165,8 +169,9 @@ extern char *dgettext _INTL_PARAMS ((const char 
*__domainname,
 extern char *libintl_dcgettext (const char *__domainname, const char *__msgid,
int __category)
_INTL_MAY_RETURN_STRING_ARG (2);
-static inline char *dcgettext (const char *__domainname, const char *__msgid,
-  int __category)
+static inline
+_INTL_MAY_RETURN_STRING_ARG (2)
+char *dcgettext (const char *__domainname, const char *__msgid, int __category)
 {
   return libintl_dcgettext (__domainname, __msgid, __category);
 }
@@ -188,8 +193,10 @@ extern char *dcgettext _INTL_PARAMS ((const char 
*__domainname,
 extern char *libintl_ngettext (const char *__msgid1, const char *__msgid2,
   unsigned long int __n)
_INTL_MAY_RETURN_STRING_ARG (1) _INTL_MAY_RETURN_STRING_ARG (2);
-static inline char *ngettext (const char *__msgid1, const char *__msgid2,
- unsigned long int __n)
+static inline
+_INTL_MAY_RETURN_STRING_ARG (1) _INTL_MAY_RETURN_STRING_ARG (2)
+char *ngettext (const char *__msgid1, const char *__msgid2,
+   unsigned long int __n)
 {
   return libintl_ngettext (__msgid1, __msgid2, __n);
 }
@@ -210,8 +217,10 @@ extern char *ngettext _INTL_PARAMS ((const char *__msgid1,
 extern char *libintl_dngettext (const char *__domainname, const char *__msgid1,
const char *__msgid2, unsigned long int __n)
_INTL_MAY_RETURN_STRING_ARG (2) _INTL_MAY_RETURN_STRING_ARG (3);
-static inline char *dngettext (const char *__domainname, const char *__msgid1,
-  const char *__msgid2, unsigned long int __n)
+static inline
+_INTL_MAY_RETURN_STRING_ARG (2) _INTL_MAY_RETURN_STRING_ARG (3)
+char *dngettext (const char *__domainname, const char *__msgid1,
+const char *__msgid2, unsigned long int __n)
 {
   return libintl_dngettext (__domainname, __msgid1, __msgid2, __n);
 }
@@ -234,9 +243,11 @@ extern char *libintl_dcngettext (const char *__domainname,
 const char *__msgid1, const char *__msgid2,
 

Re: [PATCH][3/4] Use RPO VN from unrolling

2018-08-01 Thread Richard Sandiford
Richard Biener  writes:
> This should be 4/4 but I have the main patch on top, so...
>
> This uses the region-based VN from GIMPLE unrolling which means
> we better approximate the effects of optimizations on unrolled inner
> loops when evaluating whether to unroll outer ones.

Great!  Sounds like it should also fix cases where missed value-numbering
opportunities after unrolling prevented SLP vectorisation.  (Hit a few of
those, but don't have the testcases to hand unfortunately.)

Thanks,
Richard

>
>   * tree-ssa-loop-ivcanon.c: Include tree-ssa-sccvn.h.
>   (propagate_constants_for_unrolling): Remove.
>   (tree_unroll_loops_completely): Perform value-numbering
>   on the unrolled bodies loop parent.
>
>   * gfortran.dg/reassoc_4.f: Change max-completely-peeled-insns
>   param to current default.
> ---
>  gcc/testsuite/gfortran.dg/reassoc_4.f |  2 +-
>  gcc/tree-ssa-loop-ivcanon.c   | 57 
> ++-
>  2 files changed, 10 insertions(+), 49 deletions(-)
>
> diff --git a/gcc/testsuite/gfortran.dg/reassoc_4.f 
> b/gcc/testsuite/gfortran.dg/reassoc_4.f
> index b155cba768c..07b4affb2a4 100644
> --- a/gcc/testsuite/gfortran.dg/reassoc_4.f
> +++ b/gcc/testsuite/gfortran.dg/reassoc_4.f
> @@ -1,5 +1,5 @@
>  ! { dg-do compile }
> -! { dg-options "-O3 -ffast-math -fdump-tree-reassoc1 --param 
> max-completely-peeled-insns=400" }
> +! { dg-options "-O3 -ffast-math -fdump-tree-reassoc1 --param 
> max-completely-peeled-insns=200" }
>  ! { dg-additional-options "--param max-completely-peel-times=16" { target 
> spu-*-* } }
>subroutine anisonl(w,vo,anisox,s,ii1,jj1,weight)
>integer ii1,jj1,i1,iii1,j1,jjj1,k1,l1,m1,n1
> diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
> index 326589f63c3..97c2ad94985 100644
> --- a/gcc/tree-ssa-loop-ivcanon.c
> +++ b/gcc/tree-ssa-loop-ivcanon.c
> @@ -63,6 +63,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-inline.h"
>  #include "tree-cfgcleanup.h"
>  #include "builtins.h"
> +#include "tree-ssa-sccvn.h"
>  
>  /* Specifies types of loops that may be unrolled.  */
>  
> @@ -1318,50 +1319,6 @@ canonicalize_induction_variables (void)
>return 0;
>  }
>  
> -/* Propagate constant SSA_NAMEs defined in basic block BB.  */
> -
> -static void
> -propagate_constants_for_unrolling (basic_block bb)
> -{
> -  /* Look for degenerate PHI nodes with constant argument.  */
> -  for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi); )
> -{
> -  gphi *phi = gsi.phi ();
> -  tree result = gimple_phi_result (phi);
> -  tree arg = gimple_phi_arg_def (phi, 0);
> -
> -  if (! SSA_NAME_OCCURS_IN_ABNORMAL_PHI (result)
> -   && gimple_phi_num_args (phi) == 1
> -   && CONSTANT_CLASS_P (arg))
> - {
> -   replace_uses_by (result, arg);
> -   gsi_remove (&gsi, true);
> -   release_ssa_name (result);
> - }
> -  else
> - gsi_next (&gsi);
> -}
> -
> -  /* Look for assignments to SSA names with constant RHS.  */
> -  for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi); )
> -{
> -  gimple *stmt = gsi_stmt (gsi);
> -  tree lhs;
> -
> -  if (is_gimple_assign (stmt)
> -   && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_constant
> -   && (lhs = gimple_assign_lhs (stmt), TREE_CODE (lhs) == SSA_NAME)
> -   && !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
> - {
> -   replace_uses_by (lhs, gimple_assign_rhs1 (stmt));
> -   gsi_remove (&gsi, true);
> -   release_ssa_name (lhs);
> - }
> -  else
> - gsi_next (&gsi);
> -}
> -}
> -
>  /* Process loops from innermost to outer, stopping at the innermost
> loop we unrolled.  */
>  
> @@ -1512,10 +1469,14 @@ tree_unroll_loops_completely (bool may_increase_size, 
> bool unroll_outer)
> EXECUTE_IF_SET_IN_BITMAP (fathers, 0, i, bi)
>   {
> loop_p father = get_loop (cfun, i);
> -   basic_block *body = get_loop_body_in_dom_order (father);
> -   for (unsigned j = 0; j < father->num_nodes; j++)
> - propagate_constants_for_unrolling (body[j]);
> -   free (body);
> +   bitmap exit_bbs = BITMAP_ALLOC (NULL);
> +   loop_exit *exit = father->exits->next;
> +   while (exit->e)
> + {
> +   bitmap_set_bit (exit_bbs, exit->e->dest->index);
> +   exit = exit->next;
> + }
> +   do_rpo_vn (cfun, loop_preheader_edge (father), exit_bbs);
>   }
> BITMAP_FREE (fathers);


Re: [PATCH] change %G argument from gcall* to gimple*

2018-08-01 Thread David Malcolm
On Tue, 2018-07-31 at 13:06 -0600, Martin Sebor wrote:
> The GCC internal %G directive takes a gcall* argument and prints
> the call's inlining stack in diagnostics.  The argument type makes
> it unsuitable for gimple expressions such as those diagnosed by
> -Warray-bounds.
> 
> As the first step in adding inlining context to -Warray-bounds
> warnings the attached patch changes the %G argument to accept
> gimple* instead of gcall*.  (More work is needed for %G to
> preserve the location range within diagnostics so this patch
> just implements the first step.)

Thanks for the patch.

I'm afraid I've been touching some of the same code recently (as part
of my work on dumpfile.c), so I think this patch needs rebasing and
retesting (sorry!).

In particular, my r263181:
  https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01765.html
   ("c-family: clean up the data tables in c-format.c",
aka 98605dea9f97f74e6a5e75308774c117292b184e)
cleaned up part of c-format.c that your patch touches; I think your
patch is from before then.

Also, I noticed that your patch conflicts with my (not yet approved)
patch here:

   [PATCH 5/5] Formatted printing for dump_* in the middle-end
 https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01763.html

(which *adds* a usage of "gimple *" for a new pretty_printer subclass,
whereas yours changes the "gcall *" usage to "gimple *"), so we need to
sync up on that - I'm volunteering to wait for you (but please
send me a heads-up email when you eventually commit).

> PR tree-optimization/86650 - -Warray-bounds missing inlining context
> 
> gcc/c/ChangeLog:
> 
>   PR tree-optimization/86650
>   * c-objc-common.c (c_tree_printer): Adjust.

I feel a bit hypocritical saying this, as I dislike writing ChangeLog
entries, but I find this one too terse: I find myself asking "what
adjustment is being made, and why?"

How about something like:

gcc/c/ChangeLog:

PR tree-optimization/86650
* c-objc-common.c (c_tree_printer): Move usage of
EXPR_LOCATION (t) and TREE_BLOCK (t) from within percent_K_format
to this callsite.

or somesuch (assuming that I've read the intent of the change
correctly); *is* that the intent of this part of the patch?

> gcc/c-family/ChangeLog:
> 
>   PR tree-optimization/86650
>   * c-format.c (local_gcall_ptr_node): Rename...
>(local_gimple_ptr_node): ...to this.
>   * c-format.h (T89_G): Adjust.

Likewise, I find this too terse, and it's incomplete: it's missing the
changes to gcc_diag_char_table and to init_dynamic_diag_info.

How about something like this:

* c-format.c (local_gcall_ptr_node): Rename...
(local_gimple_ptr_node): ...to this.
(gcc_diag_char_table): Update comment for "%G".
(init_dynamic_diag_info): Update from "gcall *" to "gimple *".
* c-format.h (T89_G): Update to be "gimple *" rather than
"gcall *".

FWIW I use this script to help write ChangeLog entries.
It saves a lot of gruntwork:

  
https://github.com/davidmalcolm/gcc-refactoring-scripts/blob/master/generate-changelog.py

(but the remaining work is still tedious, alas)

> gcc/cp/ChangeLog:
> 
>   PR tree-optimization/86650
>   * error.c (cp_printer): Adjust.

See the suggestion above for c-objc-common.c (c_tree_printer).

> gcc/ChangeLog:
> 
>   PR tree-optimization/86650
>   * gimple-pretty-print.c (percent_G_format): Simplify.
>   * tree-diagnostic.c (default_tree_printer): Adjust.
>   * tree-pretty-print.c (percent_K_format): Add argument.
>   * tree-pretty-print.h: Add argument.
>   * gimple-fold.c (gimple_fold_builtin_strncpy): Adjust.
>   * gimple-ssa-warn-restrict.h (check_bounds_or_overlap): Replace
>   gcall* argument with gimple*.
>   * gimple-ssa-warn-restrict.c (check_call): Same.
>   (wrestrict_dom_walker::before_dom_children): Same.
>   (builtin_access::builtin_access): Same.
>   (check_bounds_or_overlap): Same.
>   * tree-ssa-ccp.c (pass_post_ipa_warn::execute): Adjust.
>   * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Adjust.

The filenames in this changelog entry aren't in alphabetical order. Was
that deliberate, or an accident of the way you generated them?  It
makes it harder to review the change, as the changes aren't in the
same order as the patch itself.  I think it can occasionally be useful
to do them out-of-order if it helps express the intent of the change,
but I don't think that's the case here; keeping them in alphabetical
order is probably best here.

Please can you provide a less terse ChangeLog describing the intent of
the changes.  I believe the two essential things in your patch are:

(a) moving the usage of EXPR_LOCATION (t) and TREE_BLOCK (t) from
within percent_K/G_format out to all of their callsites, and

(b) the change from gcall * to gimple *,

assuming I'm reading things right, but in my pre-caffeinated state I'd
greatly prefer the ChangeLog spell that out.

> gcc/testsuite/ChangeLog:
> 

[PATCH][3/4] Use RPO VN from unrolling

2018-08-01 Thread Richard Biener


This should be 4/4 but I have the main patch on top, so...

This uses the region-based VN from GIMPLE unrolling, which means
we better approximate the effects of optimizations on unrolled inner
loops when evaluating whether to unroll outer ones.

* tree-ssa-loop-ivcanon.c: Include tree-ssa-sccvn.h.
(propagate_constants_for_unrolling): Remove.
(tree_unroll_loops_completely): Perform value-numbering
on the unrolled bodies loop parent.

* gfortran.dg/reassoc_4.f: Change max-completely-peeled-insns
param to current default.
---
 gcc/testsuite/gfortran.dg/reassoc_4.f |  2 +-
 gcc/tree-ssa-loop-ivcanon.c   | 57 ++-
 2 files changed, 10 insertions(+), 49 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/reassoc_4.f b/gcc/testsuite/gfortran.dg/reassoc_4.f
index b155cba768c..07b4affb2a4 100644
--- a/gcc/testsuite/gfortran.dg/reassoc_4.f
+++ b/gcc/testsuite/gfortran.dg/reassoc_4.f
@@ -1,5 +1,5 @@
 ! { dg-do compile }
-! { dg-options "-O3 -ffast-math -fdump-tree-reassoc1 --param max-completely-peeled-insns=400" }
+! { dg-options "-O3 -ffast-math -fdump-tree-reassoc1 --param max-completely-peeled-insns=200" }
 ! { dg-additional-options "--param max-completely-peel-times=16" { target spu-*-* } }
   subroutine anisonl(w,vo,anisox,s,ii1,jj1,weight)
   integer ii1,jj1,i1,iii1,j1,jjj1,k1,l1,m1,n1
diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
index 326589f63c3..97c2ad94985 100644
--- a/gcc/tree-ssa-loop-ivcanon.c
+++ b/gcc/tree-ssa-loop-ivcanon.c
@@ -63,6 +63,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-inline.h"
 #include "tree-cfgcleanup.h"
 #include "builtins.h"
+#include "tree-ssa-sccvn.h"
 
 /* Specifies types of loops that may be unrolled.  */
 
@@ -1318,50 +1319,6 @@ canonicalize_induction_variables (void)
   return 0;
 }
 
-/* Propagate constant SSA_NAMEs defined in basic block BB.  */
-
-static void
-propagate_constants_for_unrolling (basic_block bb)
-{
-  /* Look for degenerate PHI nodes with constant argument.  */
-  for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi); )
-{
-  gphi *phi = gsi.phi ();
-  tree result = gimple_phi_result (phi);
-  tree arg = gimple_phi_arg_def (phi, 0);
-
-  if (! SSA_NAME_OCCURS_IN_ABNORMAL_PHI (result)
- && gimple_phi_num_args (phi) == 1
- && CONSTANT_CLASS_P (arg))
-   {
- replace_uses_by (result, arg);
- gsi_remove (&gsi, true);
- release_ssa_name (result);
-   }
-  else
-   gsi_next (&gsi);
-}
-
-  /* Look for assignments to SSA names with constant RHS.  */
-  for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi); )
-{
-  gimple *stmt = gsi_stmt (gsi);
-  tree lhs;
-
-  if (is_gimple_assign (stmt)
- && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_constant
- && (lhs = gimple_assign_lhs (stmt), TREE_CODE (lhs) == SSA_NAME)
- && !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
-   {
- replace_uses_by (lhs, gimple_assign_rhs1 (stmt));
- gsi_remove (&gsi, true);
- release_ssa_name (lhs);
-   }
-  else
-   gsi_next (&gsi);
-}
-}
-
 /* Process loops from innermost to outer, stopping at the innermost
loop we unrolled.  */
 
@@ -1512,10 +1469,14 @@ tree_unroll_loops_completely (bool may_increase_size, bool unroll_outer)
  EXECUTE_IF_SET_IN_BITMAP (fathers, 0, i, bi)
{
  loop_p father = get_loop (cfun, i);
- basic_block *body = get_loop_body_in_dom_order (father);
- for (unsigned j = 0; j < father->num_nodes; j++)
-   propagate_constants_for_unrolling (body[j]);
- free (body);
+ bitmap exit_bbs = BITMAP_ALLOC (NULL);
+ loop_exit *exit = father->exits->next;
+ while (exit->e)
+   {
+ bitmap_set_bit (exit_bbs, exit->e->dest->index);
+ exit = exit->next;
+   }
+ do_rpo_vn (cfun, loop_preheader_edge (father), exit_bbs);
}
  BITMAP_FREE (fathers);
 
-- 
2.13.7



[PATCH][2/4] Add rev_post_order_and_mark_dfs_back_seme

2018-08-01 Thread Richard Biener


This adds RPO finding on SEME regions, marking backedges in the
region on-the-fly.  RPO value-numbering uses this first and foremost
in region mode, but also for whole-function mode, since it has a way
to visit non-loop-exit edges first, leading to a more local
iteration order.
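
For readers who want a picture of the idea, here is a minimal sketch
of computing a reverse post-order over a single-entry region while
recording back edges.  It is not the code from this patch: the
adjacency-list CFG, the RegionOrder struct and the region_rpo name are
assumptions made up for illustration, and it returns the order front
to back rather than filling an array backwards.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical miniature CFG: blocks are indices, succs[b] lists successors.
struct RegionOrder
{
  std::vector<int> rpo;                      // blocks in reverse post-order
  std::set<std::pair<int, int>> back_edges;  // (src, dest) pairs
};

// Walk the region reachable from ENTRY without entering any block in EXITS.
RegionOrder
region_rpo (const std::vector<std::vector<int>> &succs, int entry,
            const std::set<int> &exits)
{
  enum { unvisited, on_stack, done };
  std::vector<int> state (succs.size (), unvisited);
  RegionOrder result;
  // Each stack entry is a block plus the index of its next successor.
  std::vector<std::pair<int, std::size_t>> stack;

  state[entry] = on_stack;
  stack.push_back ({entry, 0});
  while (!stack.empty ())
    {
      int bb = stack.back ().first;
      std::size_t idx = stack.back ().second;
      if (idx == succs[bb].size ())
        {
          // All successors handled: BB gets its post-order position.
          state[bb] = done;
          result.rpo.push_back (bb);
          stack.pop_back ();
          continue;
        }
      stack.back ().second++;
      int dest = succs[bb][idx];
      if (exits.count (dest))
        continue;                                // edge leaves the region
      if (state[dest] == on_stack)
        result.back_edges.insert ({bb, dest});   // DFS back edge
      else if (state[dest] == unvisited)
        {
          state[dest] = on_stack;
          stack.push_back ({dest, 0});
        }
    }
  // Post-order was collected above; reverse it to obtain RPO.
  std::reverse (result.rpo.begin (), result.rpo.end ());
  return result;
}
```

The sketch treats an edge as a back edge when its destination is still
on the DFS stack; the patch itself compares pre-order numbers and a
post_assigned flag instead.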

2018-07-04  Richard Biener  

* cfganal.h (rev_post_order_and_mark_dfs_back_seme): Declare.
* cfganal.c (rev_post_order_and_mark_dfs_back_seme): New function.
---
 gcc/cfganal.c | 113 ++
 gcc/cfganal.h |   2 ++
 2 files changed, 115 insertions(+)

diff --git a/gcc/cfganal.c b/gcc/cfganal.c
index b9944c6ef98..3cb608aefc0 100644
--- a/gcc/cfganal.c
+++ b/gcc/cfganal.c
@@ -1057,6 +1057,119 @@ pre_and_rev_post_order_compute (int *pre_order, int *rev_post_order,
   return pre_order_num;
 }
 
+/* Unlike pre_and_rev_post_order_compute we fill rev_post_order backwards
+   so iterating in RPO order needs to start with rev_post_order[n - 1]
+   going to rev_post_order[0].  If FOR_ITERATION is true then try to
+   make CFG cycles fit into small contiguous regions of the RPO order.
+   When FOR_ITERATION is true this requires up-to-date loop structures.  */
+
+int
+rev_post_order_and_mark_dfs_back_seme (struct function *fn, edge entry,
+  bitmap exit_bbs, bool for_iteration,
+  int *rev_post_order)
+{
+  int pre_order_num = 0;
+  int rev_post_order_num = 0;
+
+  /* Allocate stack for back-tracking up CFG.  Worst case we need
+ O(n^2) edges but the following should suffice in practice without
+ a need to re-allocate.  */
+  auto_vec stack (2 * n_basic_blocks_for_fn (fn));
+
+  int *pre = XNEWVEC (int, 2 * last_basic_block_for_fn (fn));
+  int *post = pre + last_basic_block_for_fn (fn);
+
+  /* BB flag to track nodes that have been visited.  */
+  auto_bb_flag visited (fn);
+  /* BB flag to track which nodes have post[] assigned to avoid
+ zeroing post.  */
+  auto_bb_flag post_assigned (fn);
+
+  /* Push the first edge on to the stack.  */
+  stack.quick_push (entry);
+
+  while (!stack.is_empty ())
+{
+  basic_block src;
+  basic_block dest;
+
+  /* Look at the edge on the top of the stack.  */
+  int idx = stack.length () - 1;
+  edge e = stack[idx];
+  src = e->src;
+  dest = e->dest;
+  e->flags &= ~EDGE_DFS_BACK;
+
+  /* Check if the edge destination has been visited yet.  */
+  if (! bitmap_bit_p (exit_bbs, dest->index)
+ && ! (dest->flags & visited))
+   {
+ /* Mark that we have visited the destination.  */
+ dest->flags |= visited;
+
+ pre[dest->index] = pre_order_num++;
+
+ if (EDGE_COUNT (dest->succs) > 0)
+   {
+ /* Since the DEST node has been visited for the first
+time, check its successors.  */
+ /* Push the edge vector in reverse to match previous behavior.  */
+ stack.reserve (EDGE_COUNT (dest->succs));
+ for (int i = EDGE_COUNT (dest->succs) - 1; i >= 0; --i)
+   stack.quick_push (EDGE_SUCC (dest, i));
+ /* Generalize to handle more successors?  */
+ if (for_iteration
+ && EDGE_COUNT (dest->succs) == 2)
+   {
+ edge &e1 = stack[stack.length () - 2];
+ if (loop_exit_edge_p (e1->src->loop_father, e1))
+   std::swap (e1, stack.last ());
+   }
+   }
+ else
+   {
+ /* There are no successors for the DEST node so assign
+its reverse completion number.  */
+ post[dest->index] = rev_post_order_num;
+ dest->flags |= post_assigned;
+ rev_post_order[rev_post_order_num] = dest->index;
+ rev_post_order_num++;
+   }
+   }
+  else
+   {
+ if (dest->flags & visited
+ && src != entry->src
+ && pre[src->index] >= pre[dest->index]
+ && !(dest->flags & post_assigned))
+   e->flags |= EDGE_DFS_BACK;
+
+ if (idx != 0 && stack[idx - 1]->src != src)
+   {
+ /* There are no more successors for the SRC node
+so assign its reverse completion number.  */
+ post[src->index] = rev_post_order_num;
+ src->flags |= post_assigned;
+ rev_post_order[rev_post_order_num] = src->index;
+ rev_post_order_num++;
+   }
+
+ stack.pop ();
+   }
+}
+
+  free (pre);
+
+  /* Clear the temporarily allocated flags.  */
+  for (int i = 0; i < rev_post_order_num; ++i)
+BASIC_BLOCK_FOR_FN (fn, rev_post_order[i])->flags
+  &= ~(post_assigned|visited);
+
+  return rev_post_order_num;
+}
+
+
+
 /* Compute the depth first search order on the _reverse_ graph and
store in the array DFS_ORDER, marking the nodes visited in VISITED.
Returns the number of nodes 

Re: [0/5] C-SKY port

2018-08-01 Thread 瞿仙淼


>>> We expect that
>>> C-SKY will also be providing a public link to the processor and ABI
>>> documentation at some point.
>> 
>> The ABI manual has been posted, but not the ISA documentation yet.  (I'd 
>> guess
>> that when it does show up it will be in the same place, though.)
>> 
>> https://github.com/c-sky/csky-doc
> 
> Could you provide the proposed GCC website changes for the port 
> (backends.html, readings.html, news item for index.html)?  readings.html, 
> in particular, would link to the ABI and ISA documentation, while 
> backends.html gives summary information about the properties of both the 
> architecture and the GCC port.
> 
> -- 
> Joseph S. Myers
> jos...@codesourcery.com

Hi,
The ISA documentation is now available from
https://github.com/c-sky/csky-doc

-Xianmiao

[PATCH][1/4] Add dynamic CFG flag allocation

2018-08-01 Thread Richard Biener


I've posted this previously and haven't changed it; the discussion
drifted into bikeshedding on C++.

* cfg.h (struct control_flow_graph): Add edge_flags_allocated and
bb_flags_allocated members.
(auto_flag): New RAII class for allocating flags.
(auto_edge_flag): New RAII class for allocating edge flags.
(auto_bb_flag): New RAII class for allocating bb flags.
* cfgloop.c (verify_loop_structure): Allocate temporary edge
flag dynamically.
* cfganal.c (dfs_enumerate_from): Remove use of visited sbitmap
in favor of temporarily allocated BB flag.
* hsa-brig.c: Re-order includes.
* hsa-dump.c: Likewise.
* hsa-regalloc.c: Likewise.
* print-rtl.c: Likewise.
* profile-count.c: Likewise.
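
As a rough illustration of the RAII idea behind auto_flag, the sketch
below claims the lowest clear bit of a flag word on construction and
releases it on destruction.  It is a simplified stand-in, not the class
from this patch: the scoped_flag name and the use of a plain unsigned
word are assumptions.

```cpp
#include <cassert>

// Hypothetical stand-in for auto_flag: claims the lowest clear bit in
// *STORAGE on construction and releases it again on destruction.
class scoped_flag
{
public:
  explicit scoped_flag (unsigned *storage) : m_storage (storage), m_mask (0)
  {
    // Find the first bit that is not yet allocated.
    for (unsigned bit = 1; bit != 0; bit <<= 1)
      if ((*storage & bit) == 0)
        {
          m_mask = bit;
          break;
        }
    assert (m_mask != 0 && "no free flag bit left");
    *m_storage |= m_mask;
  }
  ~scoped_flag () { *m_storage &= ~m_mask; }
  scoped_flag (const scoped_flag &) = delete;
  scoped_flag &operator= (const scoped_flag &) = delete;
  // The allocated bit is available as a mask with that single bit set.
  operator unsigned () const { return m_mask; }

private:
  unsigned *m_storage;
  unsigned m_mask;
};
```

As with the real classes, releasing the bit does not clear it from the
objects it was set on; the user still has to do that before the flag
goes out of scope.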
---
 gcc/cfg.c   |  2 ++
 gcc/cfg.h   | 60 +
 gcc/cfganal.c   | 37 -
 gcc/cfgloop.c   |  9 
 gcc/hsa-brig.c  |  2 +-
 gcc/hsa-dump.c  |  2 +-
 gcc/hsa-regalloc.c  |  4 ++--
 gcc/print-rtl.c |  2 +-
 gcc/profile-count.c |  2 +-
 9 files changed, 77 insertions(+), 43 deletions(-)

diff --git a/gcc/cfg.c b/gcc/cfg.c
index 6d55516adad..7be89d40604 100644
--- a/gcc/cfg.c
+++ b/gcc/cfg.c
@@ -79,6 +79,8 @@ init_flow (struct function *the_fun)
 = EXIT_BLOCK_PTR_FOR_FN (the_fun);
   EXIT_BLOCK_PTR_FOR_FN (the_fun)->prev_bb
 = ENTRY_BLOCK_PTR_FOR_FN (the_fun);
+  the_fun->cfg->edge_flags_allocated = EDGE_ALL_FLAGS;
+  the_fun->cfg->bb_flags_allocated = BB_ALL_FLAGS;
 }
 
 /* Helper function for remove_edge and clear_edges.  Frees edge structure
diff --git a/gcc/cfg.h b/gcc/cfg.h
index 0953456782b..9fff135d11f 100644
--- a/gcc/cfg.h
+++ b/gcc/cfg.h
@@ -74,6 +74,10 @@ struct GTY(()) control_flow_graph {
 
   /* Maximal count of BB in function.  */
   profile_count count_max;
+
+  /* Dynamically allocated edge/bb flags.  */
+  int edge_flags_allocated;
+  int bb_flags_allocated;
 };
 
 
@@ -121,4 +125,60 @@ extern basic_block get_bb_copy (basic_block);
 void set_loop_copy (struct loop *, struct loop *);
 struct loop *get_loop_copy (struct loop *);
 
+/* Generic RAII class to allocate a bit from storage of integer type T.
+   The allocated bit is accessible as mask with the single bit set
+   via the conversion operator to T.  */
+
+template <class T>
+class auto_flag
+{
+public:
+  /* static assert T is integer type of max HOST_WIDE_INT precision.  */
+  auto_flag (T *sptr)
+{
+  m_sptr = sptr;
+  int free_bit = ffs_hwi (~*sptr);
+  /* If there are no unset bits... */
+  if (free_bit == 0)
+   gcc_unreachable ();
+  m_flag = HOST_WIDE_INT_1U << (free_bit - 1);
+  /* ...or if T is signed and thus the complement is sign-extended,
+ check if we ran out of bits.  We could spare us this bit
+if we could use C++11 std::make_unsigned::type to pass
+~*sptr to ffs_hwi.  */
+  if (m_flag == 0)
+   gcc_unreachable ();
+  gcc_checking_assert ((*sptr & m_flag) == 0);
+  *sptr |= m_flag;
+}
+  ~auto_flag ()
+{
+  gcc_checking_assert ((*m_sptr & m_flag) == m_flag);
+  *m_sptr &= ~m_flag;
+}
+  operator T () const { return m_flag; }
+private:
+  T *m_sptr;
+  T m_flag;
+};
+
+/* RAII class to allocate an edge flag for temporary use.  You have
+   to clear the flag from all edges when you are finished using it.  */
+
+class auto_edge_flag : public auto_flag<int>
+{
+public:
+  auto_edge_flag (function *fun)
+: auto_flag<int> (&fun->cfg->edge_flags_allocated) {}
+};
+
+/* RAII class to allocate a bb flag for temporary use.  You have
+   to clear the flag from all basic blocks when you are finished using it.  */
+class auto_bb_flag : public auto_flag<int>
+{
+public:
+  auto_bb_flag (function *fun)
+: auto_flag<int> (&fun->cfg->bb_flags_allocated) {}
+};
+
 #endif /* GCC_CFG_H */
diff --git a/gcc/cfganal.c b/gcc/cfganal.c
index a901b3f3f2c..b9944c6ef98 100644
--- a/gcc/cfganal.c
+++ b/gcc/cfganal.c
@@ -1145,41 +1145,12 @@ dfs_enumerate_from (basic_block bb, int reverse,
 {
   basic_block *st, lbb;
   int sp = 0, tv = 0;
-  unsigned size;
 
-  /* A bitmap to keep track of visited blocks.  Allocating it each time
- this function is called is not possible, since dfs_enumerate_from
- is often used on small (almost) disjoint parts of cfg (bodies of
- loops), and allocating a large sbitmap would lead to quadratic
- behavior.  */
-  static sbitmap visited;
-  static unsigned v_size;
+  auto_bb_flag visited (cfun);
 
-#define MARK_VISITED(BB) (bitmap_set_bit (visited, (BB)->index))
-#define UNMARK_VISITED(BB) (bitmap_clear_bit (visited, (BB)->index))
-#define VISITED_P(BB) (bitmap_bit_p (visited, (BB)->index))
-
-  /* Resize the VISITED sbitmap if necessary.  */
-  size = last_basic_block_for_fn (cfun);
-  if (size < 10)
-size = 10;
-
-  if (!visited)
-{
-
-  visited = sbitmap_alloc (size);
-  bitmap_clear (visited);
-  v_size = size;
-}

[PATCH][0/4][RFC] RPO style value-numbering

2018-08-01 Thread Richard Biener


This rewrites the value-numbering algorithm used for FRE and PRE from
SSA SCC based to RPO based, thus switching from an algorithm that
handles SSA SCCs optimistically to one that handles CFG SCCs 
optimistically.

The main motivation for this, besides being more optimistic, was that
adding CFG context-sensitive info is easier in RPO style.  Tracking
availability also means expression simplification no longer has to be
based on values as it is with SCCVN, which allows us to remove
all the warts that scrap side-info we store on SSA names.  It also
fixes PR86554, which is another manifestation of the same issue.

Another motivation was that we need to apply value-numbering
on regions, for example when unrolling loops or as part of cleanup on code
generated by other passes like the vectorizer.  Thus this rewrite
makes sure that the value-numbering works efficiently on regions
(though in a non-iterative mode), avoiding work and space that is
on the order of the function size rather than the size of the region worked on.
So far the GIMPLE unroller makes use of this, scrapping its own
simple constant propagation engine.  I expect that DOM could get rid of
its value-numbering and instead use a non-iterative RPO-VN run as well.

The patch adds something called predication but it just implements
what I put on top of SCCVN to not regress in that area.

With more optimistic handling come compile-time regressions, and
without limiting I can observe, for example, an 8% compile-time regression
on 416.gamess, which contains loop depths exceeding 8.  The patch now
contains heuristics to selectively value-number backedges optimistically
or not and chooses to do so for the innermost 3 and the outermost loop
of a nest (controlled by --param rpo-vn-max-loop-depth).  I have not
yet played with other values of the param nor re-measured compile-time
for SPEC 2k6.

I've bootstrapped and tested the series on x86_64-unknown-linux-gnu
with bootstrap-O1 and regular bootstrap.

I plan to go forward with this for GCC 9.

Comments?

Thanks,
Richard.


Re: [PATCH] Avoid infinite loop with duplicate anonymous union fields

2018-08-01 Thread Bogdan Harjoc
On Wed, Aug 1, 2018 at 1:20 AM, Joseph Myers  wrote:
> On Wed, 1 Aug 2018, Bogdan Harjoc wrote:
>
>> So array[0] < component < array[2], which loops (I removed the gdb p
>> commands for field_array[1] and so on).
>
> Is the key thing here that you end up with DECL_NAME (field) == NULL_TREE,
> but DECL_NAME (field_array[bot]) != NULL_TREE - and in this particular
> case of a bad ordering only, it's possible to loop without either top or
> bot being changed?  (But other details of the DECL_NAME ordering are
> needed to actually get to that particular point.)

Yes, once it enters the "if DECL_NAME (field) == NULL_TREE" body, only
bot can change, and since "DECL_NAME (field_array[bot]) == NULL_TREE"
is false, the inner while never runs, so it skips directly to
"continue;" below with no changes to bot or top. So the function looks
correct, as long as field_array really is qsort'ed if
TYPE_LANG_SPECIFIC is set.
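
To make the sortedness dependency concrete, here is a small model of
the situation, written as a sketch under stated assumptions rather than
as the lookup_field code: fields are plain integer name ids, 0 stands
in for a DECL_NAME of NULL_TREE, and find_field is a made-up name.
Unlike the real function, this simplified binary search always
terminates, so on a damaged array it shows a missed field rather than
a hang.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: each field is a name id, with 0 standing in for a
// DECL_NAME of NULL_TREE.  Returns the index of KEY in FIELDS, or -1.
int
find_field (const std::vector<int> &fields, int key, bool may_be_unsorted)
{
  if (!may_be_unsorted)
    {
      // Binary search: only sound while FIELDS is sorted.
      std::size_t bot = 0, top = fields.size ();
      while (bot < top)
        {
          std::size_t half = bot + (top - bot) / 2;
          if (fields[half] == key)
            return static_cast<int> (half);
          if (fields[half] < key)
            bot = half + 1;
          else
            top = half;
        }
      return -1;
    }
  // Linear scan: slower, but independent of the element order.
  for (std::size_t i = 0; i < fields.size (); ++i)
    if (fields[i] == key)
      return static_cast<int> (i);
  return -1;
}
```

Passing true for may_be_unsorted plays the role of the seen_error ()
check in the patch: once an entry may have been nulled out, the order
can no longer be trusted and the linear path is the safe one.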

> seen_error () is the idiomatic way of testing whether an error has been
> reported.

The updated patch is attached and includes a test that passes with:

  make check-gcc RUNTESTFLAGS="dg.exp=union-duplicate-field.c"
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 90ae306c9..5fc62d84d 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -2209,7 +2209,11 @@ lookup_field (tree type, tree component)
  find the element.  Otherwise, do a linear search.  TYPE_LANG_SPECIFIC
  will always be set for structures which have many elements.  */
 
-  if (TYPE_LANG_SPECIFIC (type) && TYPE_LANG_SPECIFIC (type)->s)
+  /* Duplicate field checking replaces duplicates with NULL_TREE so
+ TYPE_LANG_SPECIFIC arrays are potentially no longer sorted. In that
+ case just iterate using DECL_CHAIN. */
+
+  if (TYPE_LANG_SPECIFIC (type) && TYPE_LANG_SPECIFIC (type)->s && !seen_error())
 {
   int bot, top, half;
   tree *field_array = &TYPE_LANG_SPECIFIC (type)->s->elts[0];
diff --git a/gcc/testsuite/gcc.dg/union-duplicate-field.c b/gcc/testsuite/gcc.dg/union-duplicate-field.c
new file mode 100644
index 0..da9a945d9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/union-duplicate-field.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-std=c99" } */
+
+int a0;
+
+struct S
+{
+int a1;
+union {
+int a0;
+int a1; /* { dg-error "duplicate member" } */
+int a2, a3, a4, a5, a6, a7, a8, a9;
+int a10, a11, a12, a13, a14, a15;
+};
+};
+
+int f()
+{
+struct S s;
+return s.a0;
+}


Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552)

2018-08-01 Thread Bernd Edlinger
On 07/31/18 05:51, Martin Sebor wrote:
> On 07/30/2018 03:11 PM, Bernd Edlinger wrote:
>> Hi,
>>
>>> @@ -621,6 +674,12 @@ c_strlen (tree src, int only_value)
>>> maxelts = maxelts / eltsize - 1;
>>>   }
>>>
>>> +  /* Unless the caller is prepared to handle it by passing in a non-null
>>> + ARR, fail if the terminating nul doesn't fit in the array the string
>>> + is stored in (as in const char a[3] = "123";  */
>>> +  if (!arr && maxelts < strelts)
>>> +    return NULL_TREE;
>>> +
>>
>> this is c_strlen, how is the caller ever supposed to handle non-zero 
>> terminated strings???
>> especially if you do this above?
> 
> Callers that pass in a non-null ARR handle them by issuing
> a warning.  The rest get back a null result.  It should be
> evident from the rest of the patch.  It can be debated what
> each caller should do when it detects such a missing nul
> where one is expected.  Different approaches may be more
> or less appropriate for different callers/functions (e.g.,
> strcpy vs strlen).
> 

Sorry, right in the beginning you have "if (!arr) arr = arrs;"

>>> +c_strlen (tree src, int only_value, tree *arr /* = NULL */)
>>> {
>>>   STRIP_NOPS (src);
>>> +
>>> +  /* Used to detect non-nul-terminated strings in subexpressions
>>> + of a conditional expression.  When ARR is null, point it at
>>> + one of the elements for simplicity.  */
>>> +  tree arrs[] = { NULL_TREE, NULL_TREE };
>>> +  if (!arr)
>>> +    arr = arrs;
>>
>>> @@ -11427,7 +11478,9 @@ string_constant (tree arg, tree *ptr_offset)
>>>   unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init);
>>>   length = string_length (TREE_STRING_POINTER (init), charsize,
>>>   length / charsize);
>>> -  if (compare_tree_int (array_size, length + 1) < 0)
>>> +  if (nulterm)
>>> +    *nulterm = array_elts > length;
>>> +  else if (array_elts <= length)
>>>     return NULL_TREE;
>>
>> I don't understand why you can't use
>> compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (init)), TREE_STRING_LENGTH 
>> (init))
>> instead of this convoluted code above ???
>>
>> Sorry, this patch does not look like it is ready any time soon.
> 
> I'm open to technical comments on the substance of my changes
> but I'm not interested in your opinion of the readiness of
> the patch (whatever that might mean), certainly not if you
> have formed it after skimming a random handful of lines out
> of a 600 line patch.
> 

Sorry, again.  I just meant you should fix the issues, and
maybe make the patch a bit smaller.

>> But actually I am totally puzzled by your priorities.
>> This is what I see right now:
>>
>> 1) We have missing warnings.
>> 2) We have wrong code bugs.
>> 3) We have apparently a specification error on the C Language standard (*)
>>
>>
>> Why are you prioritizing 1) over 2) thus blocking my attempts to fix a wrong
>> code issue, and why do you not tackle 3) in your WG14?
> 
> My priorities are none of your concern.
> 

Sorry, again, but your priorities seem to conflict with mine.

> Your "attempts to fix" issues interfere with my work on a number
> of projects.  You are not being helpful -- instead, by submitting
> changes that you know fully well conflict with mine, you are
> impeding and undermining my work.  That is why I object to them.
> 
>> (*) which means that GCC is currently removing code from assertions
>> as I pointed out here: 
>> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01695.html
>>
>> This happens because GCC follows the language standards literally right now.
>>
>> I would say too literally, and it proves that the language standard's logic
>> is flawed IMHO.
> 
> I have no idea what your point is about standards, but bugs
> like the one in the example, including those arising from
> uninitialized arrays, could be detected with only minor
> enhancements to the tree-ssa-strlen pass.  Implementing some
> of this is among the projects I'm expected and expecting to
> work on for GCC 9.  This patch is a small step in that
> direction.
> 
> If you care about detecting bugs I would expect you to be
> supportive rather than dismissive of this work, and helpful
> in bringing it to fruition rather than putting it down or
> questioning my priorities.  Especially since the work was
> prompted by your own (valid) complaint that GCC doesn't
> diagnose them.
> 

You don't really listen to what I am saying.  I did not say
that we need another warning instead of fixing the wrong
optimization issue at hand.

But I am in good company, you don't listen to Jakub and Richi
either.


Bernd.

> Martin
> 


[gomp5] Parse task modifier of reduction clauses

2018-08-01 Thread Jakub Jelinek
Hi!

This patch adds just the parsing and diagnostics of the task reduction modifier.
Such reductions then behave differently: when used on parallel or
for/sections, they act like a task_reduction clause on a taskgroup construct.
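
For illustration, the kind of source this is about looks roughly like
the sketch below.  The sum_to function is a made-up example, not one of
the new tests, and it assumes a compiler that accepts the OpenMP 5.0
task reduction-modifier; without OpenMP enabled the pragmas are ignored
and the loop simply runs serially.

```cpp
#include <cassert>

// Sums 1..N.  The task modifier on the reduction clause lets the
// explicit tasks created inside the loop take part in the reduction
// through their in_reduction clauses.
long
sum_to (int n)
{
  long sum = 0;
#pragma omp parallel for reduction(task, +: sum)
  for (int i = 1; i <= n; ++i)
    {
#pragma omp task in_reduction(+: sum)
      sum += i;
    }
  return sum;
}
```

Putting the modifier on a worksharing construct that also has a nowait
clause (outside a combined parallel) is what the new gimplify.c
diagnostic rejects.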

Tested on x86_64-linux, committed to gomp-5_0-branch.

2018-08-01  Jakub Jelinek  

* tree.h (OMP_CLAUSE_REDUCTION_TASK, OMP_CLAUSE_REDUCTION_INSCAN):
Define.
* tree-pretty-print.c (dump_omp_clause): Print reduction modifiers.
* gimplify.c (gimplify_scan_omp_clauses): Handle
OMP_CLAUSE_REDUCTION_TASK diagnostics.
gcc/c-family/
* c-omp.c (c_omp_split_clauses): Handle OMP_CLAUSE_REDUCTION_TASK.
gcc/c/
* c-parser.c (c_parser_omp_clause_reduction): Add IS_OMP argument,
parse reduction modifiers.
(c_parser_oacc_all_clauses, c_parser_omp_all_clauses): Adjust
c_parser_omp_clause_reduction callers.
gcc/cp/
* parser.c (cp_parser_omp_clause_reduction): Add IS_OMP argument,
parse reduction modifiers.
(cp_parser_oacc_all_clauses, cp_parser_omp_all_clauses): Adjust
cp_parser_omp_clause_reduction callers.
gcc/testsuite/
* c-c++-common/gomp/reduction-task-1.c: New test.
* c-c++-common/gomp/reduction-task-2.c: New test.

--- gcc/tree.h.jj   2018-07-17 17:24:39.972318592 +0200
+++ gcc/tree.h  2018-07-30 16:18:35.928699592 +0200
@@ -1614,6 +1614,14 @@ extern tree maybe_wrap_with_location (tr
   (OMP_CLAUSE_RANGE_CHECK (NODE, OMP_CLAUSE_REDUCTION, \
   OMP_CLAUSE_IN_REDUCTION)->base.public_flag)
 
+/* True if a REDUCTION clause has task reduction-modifier.  */
+#define OMP_CLAUSE_REDUCTION_TASK(NODE) \
+  TREE_PROTECTED (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_REDUCTION))
+
+/* True if a REDUCTION clause has inscan reduction-modifier.  */
+#define OMP_CLAUSE_REDUCTION_INSCAN(NODE) \
+  TREE_PRIVATE (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_REDUCTION))
+
 /* True if a LINEAR clause doesn't need copy in.  True for iterator vars which
are always initialized inside of the loop construct, false otherwise.  */
 #define OMP_CLAUSE_LINEAR_NO_COPYIN(NODE) \
--- gcc/tree-pretty-print.c.jj  2018-07-10 11:32:22.271156564 +0200
+++ gcc/tree-pretty-print.c 2018-07-30 18:22:56.159264665 +0200
@@ -477,6 +477,13 @@ dump_omp_clause (pretty_printer *pp, tre
   /* FALLTHRU */
 case OMP_CLAUSE_REDUCTION:
   pp_string (pp, "reduction(");
+  if (OMP_CLAUSE_CODE (clause) == OMP_CLAUSE_REDUCTION)
+   {
+ if (OMP_CLAUSE_REDUCTION_TASK (clause))
+   pp_string (pp, "task,");
+ else if (OMP_CLAUSE_REDUCTION_INSCAN (clause))
+   pp_string (pp, "inscan,");
+   }
   if (OMP_CLAUSE_REDUCTION_CODE (clause) != ERROR_MARK)
{
  pp_string (pp,
--- gcc/gimplify.c.jj   2018-07-25 17:40:05.967970906 +0200
+++ gcc/gimplify.c  2018-08-01 14:34:53.945975952 +0200
@@ -7960,6 +7960,7 @@ gimplify_scan_omp_clauses (tree *list_p,
   hash_map *struct_map_to_clause = NULL;
   tree *prev_list_p = NULL;
   int handled_depend_iterators = -1;
+  int nowait = -1;
 
   ctx = new_omp_context (region_type);
   outer_ctx = ctx->outer_context;
@@ -8113,6 +8114,32 @@ gimplify_scan_omp_clauses (tree *list_p,
}
  goto do_add;
case OMP_CLAUSE_REDUCTION:
+ if (OMP_CLAUSE_REDUCTION_TASK (c))
+   {
+ if (region_type == ORT_WORKSHARE)
+   {
+ if (nowait == -1)
+   nowait = omp_find_clause (*list_p,
+ OMP_CLAUSE_NOWAIT) != NULL_TREE;
+ if (nowait
+ && (outer_ctx == NULL
+ || outer_ctx->region_type != ORT_COMBINED_PARALLEL))
+   {
+ error_at (OMP_CLAUSE_LOCATION (c),
+   "%<task%> reduction modifier on a construct "
+   "with a %<nowait%> clause");
+ OMP_CLAUSE_REDUCTION_TASK (c) = 0;
+   }
+   }
+ else if ((region_type & ORT_PARALLEL) != ORT_PARALLEL)
+   {
+ error_at (OMP_CLAUSE_LOCATION (c),
+   "invalid %<task%> reduction modifier on construct "
+   "other than %<parallel%>, %<for%> or %<sections%>");
+ OMP_CLAUSE_REDUCTION_TASK (c) = 0;
+   }
+   }
+ /* FALLTHRU */
case OMP_CLAUSE_IN_REDUCTION:
case OMP_CLAUSE_TASK_REDUCTION:
  flags = GOVD_REDUCTION | GOVD_SEEN | GOVD_EXPLICIT;
@@ -9016,6 +9043,9 @@ gimplify_scan_omp_clauses (tree *list_p,
  break;
 
case OMP_CLAUSE_NOWAIT:
+ nowait = 1;
+ break;
+
case OMP_CLAUSE_ORDERED:
case OMP_CLAUSE_UNTIED:
case OMP_CLAUSE_COLLAPSE:
--- gcc/c-family/c-omp.c.jj 2018-07-17 17:24:39.973318593 +0200
+++ gcc/c-family/c-omp.c2018-08-01 14:04:30.178369000 +0200
@@ -1590,6 +1590,28 @@ 

Re: [PATCH,nvptx] Use CUDA driver API to select default runtime launch, geometry

2018-08-01 Thread Tom de Vries
On 08/01/2018 04:01 PM, Cesar Philippidis wrote:
> On 08/01/2018 03:18 AM, Tom de Vries wrote:
>> On 07/31/2018 04:58 PM, Cesar Philippidis wrote:
>>> The attached patch teaches libgomp how to use the CUDA thread occupancy
>>> calculator built into the CUDA driver. Despite both being based off the
>>> CUDA thread occupancy spreadsheet distributed with CUDA, the built in
>>> occupancy calculator differs from the occupancy calculator in og8 in two
>>> key ways. First, og8 launches twice the number of gangs as the driver
>>> thread occupancy calculator. This was my attempt at preventing threads
>>> from idling, and it operates on a principle similar to running 'make
>>> -jN', where N is twice the number of CPU threads.
>>
>> You're saying the two methods are different, and that the difference
>> between the two methods is a factor two, which is a heuristic you added
>> yourself on top of one of the methods, which implies that in fact the
>> two methods are identical. Is my understanding correct here?
> 
> With the exception being that og8 multiplies num_gangs by a factor of
> two, those two algorithms are identical, at least with respect to gangs.
> 
>>> Second, whereas og8
>>> always attempts to maximize the CUDA block size, the driver may select a
>>> smaller block, which effectively decreases num_workers.
>>>
>>
>> So, do I understand it correctly that using the function
>> cuOccupancyMaxPotentialBlockSize gives us "minimum block size that can
>> achieve the maximum occupancy" or some such and og8 gives us "maximum
>> block size"?
> 
> Correct.
> 
>>> In terms of performance, there really isn't that much of a difference
>>> between the CUDA driver's occupancy calculator and og8's. However, on
>>> the tests that are impacted, they are generally within a factor of two
>>> from one another, with some tests running faster with the driver
>>> occupancy calculator and others with og8's.
>>>
>>
>> Ack. Well, until we understand that in more detail, going with the
>> driver's occupancy calculator seems the right thing to do.
>>
>>> Unfortunately, support for the CUDA driver API isn't universal; it's
>>> only available in CUDA version 6.5 (or 6050) and newer. In this patch,
>>> I'm exploiting the fact that init_cuda_lib only checks for errors on the
>>> last library function initialized.
>>
>> That sounds incorrect to me. In init_cuda_lib I see:
>> ...
>> # define CUDA_ONE_CALL(call) CUDA_ONE_CALL_1 (call)
>> # define CUDA_ONE_CALL_1(call) \
>>   cuda_lib.call = dlsym (h, #call); \
>>   if (cuda_lib.call == NULL)\
>> return false;
>>   CUDA_CALLS
>> ...
>> so in fact every library function is checked. Have you tested this with
>> pre-6.5 CUDA?
> 
> I misread that. You're correct. So far, I've only tested this out with
> CUDA 9.
> 
>> I think we need to add and handle:
>> ...
>>   CUDA_ONE_CALL_MAYBE_NULL (cuOccupancyMaxPotentialBlockSize)
>> ...
>>
>>> Therefore it guards the usage of
>>>
>>>   cuOccupancyMaxPotentialBlockSizeWithFlags
>>>
>>> by checking driver_version.
>>
>> If we allow the cuOccupancyMaxPotentialBlockSize field to be NULL, we
>> can test for NULL, which seems a simpler solution than testing the version.
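
A minimal sketch of that NULL-test approach, under stated assumptions:
the driver_table, launch_dims, pick_dims and fake_occupancy names are
hypothetical and the function-pointer signature is simplified; none of
this is the plugin's real interface.

```cpp
#include <cassert>

// Hypothetical stand-ins: not the libgomp plugin's real types or names.
typedef int (*occupancy_fn) (int *min_grid, int *block);

struct driver_table
{
  occupancy_fn occupancy;   // may be null when the driver lacks the symbol
};

struct launch_dims
{
  int gangs;
  int block;
};

// Use the driver's occupancy calculator when the symbol was resolved,
// otherwise keep the existing defaults.
launch_dims
pick_dims (const driver_table &lib, launch_dims defaults)
{
  if (lib.occupancy == nullptr)
    return defaults;
  int grid = 0, block = 0;
  if (lib.occupancy (&grid, &block) != 0)
    return defaults;        // treat a failing call like a missing one
  return launch_dims { grid, block };
}

// Fake calculator used only to exercise the sketch.
int
fake_occupancy (int *min_grid, int *block)
{
  *min_grid = 40;
  *block = 256;
  return 0;
}
```

The point of the design is that the caller never needs to know which
driver version introduced the entry point; a null pointer is the only
signal it has to handle.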
>>
>>> If the driver occupancy calculator isn't
>>> available, it falls back to the existing defaults. Maybe the og8 thread
>>> occupancy would make a better default for older versions of CUDA, but
>>> that's a patch for another day.
>>>
>>
>> Agreed.
>>
>>> Is this patch OK for trunk?
>>
>> The patch doesn't build in a setup with
>> --enable-offload-targets=nvptx-none and without cuda, that enables usage
>> of plugin/cuda/cuda.h:
>> ...
>> /data/offload-nvptx/src/libgomp/plugin/plugin-nvptx.c:98:16: error:
>> ‘cuOccupancyMaxPotentialBlockSize’ undeclared here (not in a function);
>> did you mean ‘cuOccupancyMaxPotentialBlockSizeWithFlags’?
>>  CUDA_ONE_CALL (cuOccupancyMaxPotentialBlockSize) \
>> ...
>>
>>> @@ -1220,11 +1227,39 @@ nvptx_exec (void (*fn), size_t mapnum, void 
>>> **hostaddrs, void **devaddrs,
>>>  
>>>{
>>> bool default_dim_p[GOMP_DIM_MAX];
>>> +   int vectors = nvthd->ptx_dev->default_dims[GOMP_DIM_VECTOR];
>>> +   int workers = nvthd->ptx_dev->default_dims[GOMP_DIM_WORKER];
>>> +   int gangs = nvthd->ptx_dev->default_dims[GOMP_DIM_GANG];
>>> +
>>> +   /* The CUDA driver occupancy calculator is only available on
>>> +  CUDA version 6.5 (6050) and newer.  */
>>> +   if (nvthd->ptx_dev->driver_version > 6050)
>>> + {
>>> +   int grids, blocks;
>>> +   CUDA_CALL_ASSERT (cuOccupancyMaxPotentialBlockSize, &grids,
>>> + &blocks, function, NULL, 0,
>>> + dims[GOMP_DIM_WORKER] * dims[GOMP_DIM_VECTOR]);
>>> +   GOMP_PLUGIN_debug (0, "cuOccupancyMaxPotentialBlockSize: "
>>> +  "grid = %d, block = %d\n", grids, blocks);
>>> +
>>
>>
>>> +   if (GOMP_PLUGIN_acc_default_dim (GOMP_DIM_GANG) == 0)
>>
>> You should use gomp_openacc_dims[0].
>>
>>> + gangs = grids * (blocks 

Re: [ARM/FDPIC v2 00/21] FDPIC ABI for ARM

2018-08-01 Thread Christophe Lyon

Ping?


On 13/07/2018 18:10, christophe.l...@st.com wrote:

From: Christophe Lyon 

Hello,

This patch series implements the GCC contribution of the FDPIC ABI for
ARM targets.

This ABI makes it possible to run Linux on MMU-less ARM cores and supports
shared libraries to reduce the memory footprint.

Without an MMU, the relative distance between the text and data segments
differs from one process to another, hence the need for a dedicated FDPIC
register holding the start address of the data segment. One of the
side effects is that function pointers require two words to be
represented: the address of the code, and the data segment start
address. These two words are called a "Function Descriptor",
hence the "FD PIC" name.
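
To make the two-word representation concrete, here is a minimal sketch. The
struct and helper names are made up for illustration and are not taken from
the patches; the authoritative layout is the one given in the ABI document [1].

```cpp
#include <cassert>
#include <cstdint>

// Illustrative layout only (hypothetical names): a function pointer under
// FDPIC refers to a pair of words rather than to a bare code address.
struct function_descriptor
{
  std::uintptr_t code_address;       // address of the function's code
  std::uintptr_t data_segment_base;  // start address of the data segment
};

// Hypothetical helper: two descriptors name the same callable only if
// both the code address and the data segment start address agree.
bool
same_target (const function_descriptor &a, const function_descriptor &b)
{
  return a.code_address == b.code_address
	 && a.data_segment_base == b.data_segment_base;
}
```

The point of the second word is that the same code, shared between
processes, can be paired with a different data segment in each of them.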

On ARM, the FDPIC register is r9 [1], and the target name is
arm-uclinuxfdpiceabi. Note that arm-uclinux exists, but uses another
ABI and the BFLAT file format; it does not support code sharing.
The -mfdpic option is enabled by default, and -mno-fdpic should be
used to build the Linux kernel.

This work was developed some time ago by STMicroelectronics, and was
presented during Linaro Connect SFO15 (September 2015). You can watch
the discussion and read the slides [2].
This presentation was related to the toolchain published on github [3],
which is based on binutils-2.22, gcc-4.7, uclibc-0.9.33.2, gdb-7.5.1
and qemu-2.3.0, and for which pre-built binaries are available [3].

The ABI itself is described in detail in [1].

Our Linux kernel patches have been updated and committed by Nicolas
Pitre (Linaro) in July 2017. They are required so that the loader is
able to handle this new file type. Indeed, the ELF files are tagged
with ELFOSABI_ARM_FDPIC. This new tag has been allocated by ARM, as
well as the new relocations involved.

The binutils and QEMU patch series have been merged recently. [4][5]

To build such a toolchain, you'd also need to use my uClibc branch [6].
I have posted uclibc-ng patches for review [7].

I am currently working on updating the patches for the remaining
toolchain components: uclibc and gdb.

This series provides support for ARM v7 architecture and has been
tested on arm-linux-gnueabi without regression, as well as
arm-uclinuxfdpiceabi, using QEMU. arm-uclinuxfdpiceabi has more
failures than arm-linux-gnueabi, but is quite functional.

Are the GCC patches OK for inclusion in master?

Changes between v1 and v2:
- fix GNU coding style
- exit with an error for pre-Armv7
- use ACLE __ARM_ARCH and remove dead code for pre-Armv4
- remove unsupported attempts of pre-Armv7/thumb1 support
- add instructions in comments next to opcodes
- merge patches 11 and 13
- fixed protected visibility handling in patch 8
- merged legitimize_tls_address_fdpic and
   legitimize_tls_address_not_fdpic as requested

Thanks,

Christophe.


[1] https://github.com/mickael-guene/fdpic_doc/blob/master/abi.txt
[2] 
http://connect.linaro.org/resource/sfo15/sfo15-406-arm-fdpic-toolset-kernel-libraries-for-cortex-m-cortex-r-mmuless-cores/
[3] https://github.com/mickael-guene/fdpic_manifest
[4] 
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=f1ac0afe481e83c9a33f247b81fa7de789edc4d9
[5] 
https://git.qemu.org/?p=qemu.git;a=commit;h=e8fa72957419c11984608062c7dcb204a6003a06
[6] 
https://git.linaro.org/people/christophe.lyon/uclibc.git/log/?h=uClibc-0.9.33.2-fdpic-upstream
[7] https://mailman.uclibc-ng.org/pipermail/devel/2018-July/001705.html

Christophe Lyon (21):
   [ARM] FDPIC: Add -mfdpic option support
   [ARM] FDPIC: Handle arm*-*-uclinuxfdpiceabi in configure scripts
   [ARM] FDPIC: Force FDPIC related options unless -mno-fdpic is provided
   [ARM] FDPIC: Add support for FDPIC for arm architecture
   [ARM] FDPIC: Fix __do_global_dtors_aux and frame_dummy generation
   [ARM] FDPIC: Add support for c++ exceptions
   [ARM] FDPIC: Avoid saving/restoring r9 on stack since it is RO
   [ARM] FDPIC: Ensure local/global binding for function descriptors
   [ARM] FDPIC: Add support for taking address of nested function
   [ARM] FDPIC: Implement TLS support.
   [ARM] FDPIC: Add support to unwind FDPIC signal frame
   [ARM] FDPIC: Restore r9 after we call __aeabi_read_tp
   [ARM] FDPIC: Force LSB bit for PC in Cortex-M architecture
   [ARM][testsuite] FDPIC: Skip unsupported tests
   [ARM][testsuite] FDPIC: Adjust scan-assembler patterns.
   [ARM][testsuite] FDPIC: Skip v8-m and v6-m tests that currently
 produce an ICE
   [ARM][testsuite] FDPIC: Skip tests that don't work in PIC mode
   [ARM][testsuite] FDPIC: Handle *-*-uclinux*
   [ARM][testsuite] FDPIC: Enable tests on pie_enabled targets
   [ARM][testsuite] FDPIC: Adjust pr43698.c to avoid clash with uclibc.
   [ARM][testsuite] FDPIC: Skip tests using architecture older than v7

  config/futex.m4|   2 +-
  config/tls.m4  |   2 +-
  gcc/config.gcc |  13 +-
  gcc/config/arm/arm-c.c |   2 +
  

Re: [PATCH,nvptx] Use CUDA driver API to select default runtime launch, geometry

2018-08-01 Thread Cesar Philippidis
On 08/01/2018 03:18 AM, Tom de Vries wrote:
> On 07/31/2018 04:58 PM, Cesar Philippidis wrote:
>> The attached patch teaches libgomp how to use the CUDA thread occupancy
>> calculator built into the CUDA driver. Despite both being based off the
>> CUDA thread occupancy spreadsheet distributed with CUDA, the built in
>> occupancy calculator differs from the occupancy calculator in og8 in two
>> key ways. First, og8 launches twice as many gangs as the driver
>> thread occupancy calculator. This was my attempt at preventing threads
>> from idling, and it operates on a principle similar to running 'make
>> -jN', where N is twice the number of CPU threads.
> 
> You're saying the two methods are different, and that the difference
> between the two methods is a factor two, which is a heuristic you added
> yourself on top of one of the methods, which implies that in fact the
> two methods are identical. Is my understanding correct here?

With the exception that og8 multiplies num_gangs by a factor of
two, those two algorithms are identical, at least with respect to gangs.

>> Second, whereas og8
>> always attempts to maximize the CUDA block size, the driver may select a
>> smaller block, which effectively decreases num_workers.
>>
> 
> So, do I understand it correctly that using the function
> cuOccupancyMaxPotentialBlockSize gives us "minimum block size that can
> achieve the maximum occupancy" or some such and og8 gives us "maximum
> block size"?

Correct.

>> In terms of performance, there really isn't that much of a difference
>> between the CUDA driver's occupancy calculator and og8's. However, on
>> the tests that are impacted, they are generally within a factor of two
>> from one another, with some tests running faster with the driver
>> occupancy calculator and others with og8's.
>>
> 
> Ack. Well, until we understand that in more detail, going with the
> driver's occupancy calculator seems the right thing to do.
> 
>> Unfortunately, support for the CUDA driver API isn't universal; it's
>> only available in CUDA version 6.5 (or 6050) and newer. In this patch,
>> I'm exploiting the fact that init_cuda_lib only checks for errors on the
>> last library function initialized.
> 
> That sounds incorrect to me. In init_cuda_lib I see:
> ...
> # define CUDA_ONE_CALL(call) CUDA_ONE_CALL_1 (call)
> # define CUDA_ONE_CALL_1(call) \
>   cuda_lib.call = dlsym (h, #call); \
>   if (cuda_lib.call == NULL)\
> return false;
>   CUDA_CALLS
> ...
> so in fact every library function is checked. Have you tested this with
> pre-6.5 CUDA?

I misread that. You're correct. So far, I've only tested this out with
CUDA 9.

> I think we need to add and handle:
> ...
>   CUDA_ONE_CALL_MAYBE_NULL (cuOccupancyMaxPotentialBlockSize)
> ...
> 
>> Therefore it guards the usage of
>>
>>   cuOccupancyMaxPotentialBlockSizeWithFlags
>>
>> by checking driver_version.
> 
> If we allow the cuOccupancyMaxPotentialBlockSize field to be NULL, we
> can test for NULL, which seems a simpler solution than testing the version.
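
A minimal sketch of that idea, assuming nothing about the real plugin: the
resolver demo_lookup stands in for dlsym, and every name below (launch,
occupancy, DEMO_ONE_CALL_MAYBE_NULL, pick_block_size) is hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical entry point that every driver version is assumed to export.
static int launch_impl (int blocks) { return blocks; }

using generic_fn = void (*) ();

// Hypothetical stand-in for dlsym: it resolves "launch" but not
// "occupancy", imitating a driver that predates the occupancy API.
static generic_fn
demo_lookup (const char *name)
{
  if (std::strcmp (name, "launch") == 0)
    return reinterpret_cast<generic_fn> (&launch_impl);
  return nullptr;
}

struct demo_lib_s
{
  int (*launch) (int);
  int (*occupancy) ();	// optional: may stay null
};
static demo_lib_s demo_lib;

bool
init_demo_lib ()
{
  // Required entries fail initialisation when missing; optional ones
  // are simply left null for the caller to test.
#define DEMO_ONE_CALL_1(call, allow_null) \
  demo_lib.call \
    = reinterpret_cast<decltype (demo_lib.call)> (demo_lookup (#call)); \
  if (!(allow_null) && demo_lib.call == nullptr) \
    return false;
#define DEMO_ONE_CALL(call) DEMO_ONE_CALL_1 (call, false)
#define DEMO_ONE_CALL_MAYBE_NULL(call) DEMO_ONE_CALL_1 (call, true)
  DEMO_ONE_CALL (launch)
  DEMO_ONE_CALL_MAYBE_NULL (occupancy)
#undef DEMO_ONE_CALL_MAYBE_NULL
#undef DEMO_ONE_CALL
#undef DEMO_ONE_CALL_1
  return true;
}

// Use the optional entry point when present, else keep the default.
int
pick_block_size (int fallback)
{
  if (demo_lib.occupancy != nullptr)
    return demo_lib.occupancy ();
  return fallback;
}
```

In this sketch the caller never needs to know which driver version
introduced the function; the null test alone decides whether to fall back.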
> 
>> If the driver occupancy calculator isn't
>> available, it falls back to the existing defaults. Maybe the og8 thread
>> occupancy would make a better default for older versions of CUDA, but
>> that's a patch for another day.
>>
> 
> Agreed.
> 
>> Is this patch OK for trunk?
> 
> The patch doesn't build in a setup with
> --enable-offload-targets=nvptx-none and without cuda, that enables usage
> of plugin/cuda/cuda.h:
> ...
> /data/offload-nvptx/src/libgomp/plugin/plugin-nvptx.c:98:16: error:
> ‘cuOccupancyMaxPotentialBlockSize’ undeclared here (not in a function);
> did you mean ‘cuOccupancyMaxPotentialBlockSizeWithFlags’?
>  CUDA_ONE_CALL (cuOccupancyMaxPotentialBlockSize) \
> ...
> 
>> @@ -1220,11 +1227,39 @@ nvptx_exec (void (*fn), size_t mapnum, void 
>> **hostaddrs, void **devaddrs,
>>  
>>{
>>  bool default_dim_p[GOMP_DIM_MAX];
>> +int vectors = nvthd->ptx_dev->default_dims[GOMP_DIM_VECTOR];
>> +int workers = nvthd->ptx_dev->default_dims[GOMP_DIM_WORKER];
>> +int gangs = nvthd->ptx_dev->default_dims[GOMP_DIM_GANG];
>> +
>> +/* The CUDA driver occupancy calculator is only available on
>> +   CUDA version 6.5 (6050) and newer.  */
>> +if (nvthd->ptx_dev->driver_version > 6050)
>> +  {
>> +int grids, blocks;
>> +CUDA_CALL_ASSERT (cuOccupancyMaxPotentialBlockSize, &grids,
>> +  &blocks, function, NULL, 0,
>> +  dims[GOMP_DIM_WORKER] * dims[GOMP_DIM_VECTOR]);
>> +GOMP_PLUGIN_debug (0, "cuOccupancyMaxPotentialBlockSize: "
>> +   "grid = %d, block = %d\n", grids, blocks);
>> +
> 
> 
>> +if (GOMP_PLUGIN_acc_default_dim (GOMP_DIM_GANG) == 0)
> 
> You should use gomp_openacc_dims[0].
> 
>> +  gangs = grids * (blocks / warp_size);
> 
> So, we launch with gangs == grids * workers ? Is that intentional?

Yes. At least that's what I've been using in og8. Setting 

[PATCH] PR libstdc++/60555 std::system_category() should recognise POSIX errno values

2018-08-01 Thread Jonathan Wakely

PR libstdc++/60555
* src/c++11/system_error.cc
(system_error_category::default_error_condition): New override to
check for POSIX errno values.
* testsuite/19_diagnostics/error_category/generic_category.cc: New
* testsuite/19_diagnostics/error_category/system_category.cc: New
test.

Tested x86_64-linux and powerpc64le-linux, committed to trunk.

I want to backport this to all the active branches too. It only
changes an internal type that isn't exported, and the new libstdc++.so
will even fix existing code that currently gets the wrong
error_category for POSIX errno values.
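
As a rough illustration of the user-visible effect (a minimal sketch,
assuming a POSIX target and a libstdc++ that contains this fix; the helper
names maps_to_generic and is_missing_file are made up for the example):

```cpp
#include <cassert>
#include <cerrno>
#include <system_error>

// Hypothetical helper: reports whether an errno value wrapped in
// std::system_category() maps to a condition in std::generic_category().
bool
maps_to_generic (int errno_value)
{
  assert (errno_value > 0);  // errno macros expand to positive values
  std::error_code ec (errno_value, std::system_category ());
  return ec.default_error_condition ().category ()
	 == std::generic_category ();
}

// Hypothetical helper: the kind of comparison user code typically writes.
bool
is_missing_file (const std::error_code &ec)
{
  return ec == std::errc::no_such_file_or_directory;
}
```

With the new override both helpers are expected to return true for ENOENT.
Without it, the condition stays in the system category and the comparison
against std::errc fails.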



commit 6d40bd10ec7b8701c7299b84cb96832e010bc2fc
Author: Jonathan Wakely 
Date:   Wed Aug 1 01:10:33 2018 +0100

PR libstdc++/60555 std::system_category() should recognise POSIX errno 
values

PR libstdc++/60555
* src/c++11/system_error.cc
(system_error_category::default_error_condition): New override to
check for POSIX errno values.
* testsuite/19_diagnostics/error_category/generic_category.cc: New
* testsuite/19_diagnostics/error_category/system_category.cc: New
test.

diff --git a/libstdc++-v3/src/c++11/system_error.cc 
b/libstdc++-v3/src/c++11/system_error.cc
index c6549fcb4e0..82b4cb5f98c 100644
--- a/libstdc++-v3/src/c++11/system_error.cc
+++ b/libstdc++-v3/src/c++11/system_error.cc
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 #undef __sso_string
 
 namespace
@@ -65,6 +66,260 @@ namespace
   // _GLIBCXX_HAVE_STRERROR_L, strerror_l(i, cloc)
   return string(strerror(i));
 }
+
+virtual std::error_condition
+default_error_condition(int ev) const noexcept
+{
+  switch (ev)
+  {
+  // List of errno macros from [cerrno.syn].
+  // C11 only defines EDOM, EILSEQ and ERANGE, the rest are from POSIX.
+  // They expand to integer constant expressions with type int,
+  // and distinct positive values, suitable for use in #if directives.
+  // POSIX adds more macros (but they're not defined on all targets,
+  // see config/os/*/error_constants.h), and POSIX allows
+  // EAGAIN == EWOULDBLOCK and ENOTSUP == EOPNOTSUPP.
+
+#ifdef E2BIG
+  case E2BIG:
+#endif
+#ifdef EACCES
+  case EACCES:
+#endif
+#ifdef EADDRINUSE
+  case EADDRINUSE:
+#endif
+#ifdef EADDRNOTAVAIL
+  case EADDRNOTAVAIL:
+#endif
+#ifdef EAFNOSUPPORT
+  case EAFNOSUPPORT:
+#endif
+#ifdef EAGAIN
+  case EAGAIN:
+#endif
+#ifdef EALREADY
+  case EALREADY:
+#endif
+#ifdef EBADF
+  case EBADF:
+#endif
+#ifdef EBADMSG
+  case EBADMSG:
+#endif
+#ifdef EBUSY
+  case EBUSY:
+#endif
+#ifdef ECANCELED
+  case ECANCELED:
+#endif
+#ifdef ECHILD
+  case ECHILD:
+#endif
+#ifdef ECONNABORTED
+  case ECONNABORTED:
+#endif
+#ifdef ECONNREFUSED
+  case ECONNREFUSED:
+#endif
+#ifdef ECONNRESET
+  case ECONNRESET:
+#endif
+#ifdef EDEADLK
+  case EDEADLK:
+#endif
+#ifdef EDESTADDRREQ
+  case EDESTADDRREQ:
+#endif
+  case EDOM:
+#ifdef EEXIST
+  case EEXIST:
+#endif
+#ifdef EFAULT
+  case EFAULT:
+#endif
+#ifdef EFBIG
+  case EFBIG:
+#endif
+#ifdef EHOSTUNREACH
+  case EHOSTUNREACH:
+#endif
+#ifdef EIDRM
+  case EIDRM:
+#endif
+  case EILSEQ:
+#ifdef EINPROGRESS
+  case EINPROGRESS:
+#endif
+#ifdef EINTR
+  case EINTR:
+#endif
+#ifdef EINVAL
+  case EINVAL:
+#endif
+#ifdef EIO
+  case EIO:
+#endif
+#ifdef EISCONN
+  case EISCONN:
+#endif
+#ifdef EISDIR
+  case EISDIR:
+#endif
+#ifdef ELOOP
+  case ELOOP:
+#endif
+#ifdef EMFILE
+  case EMFILE:
+#endif
+#ifdef EMLINK
+  case EMLINK:
+#endif
+#ifdef EMSGSIZE
+  case EMSGSIZE:
+#endif
+#ifdef ENAMETOOLONG
+  case ENAMETOOLONG:
+#endif
+#ifdef ENETDOWN
+  case ENETDOWN:
+#endif
+#ifdef ENETRESET
+  case ENETRESET:
+#endif
+#ifdef ENETUNREACH
+  case ENETUNREACH:
+#endif
+#ifdef ENFILE
+  case ENFILE:
+#endif
+#ifdef ENOBUFS
+  case ENOBUFS:
+#endif
+#ifdef ENODATA
+  case ENODATA:
+#endif
+#ifdef ENODEV
+  case ENODEV:
+#endif
+#ifdef ENOENT
+  case ENOENT:
+#endif
+#ifdef ENOEXEC
+  case ENOEXEC:
+#endif
+#ifdef ENOLCK
+  case ENOLCK:
+#endif
+#ifdef ENOLINK
+  case ENOLINK:
+#endif
+#ifdef ENOMEM
+  case ENOMEM:
+#endif
+#ifdef ENOMSG
+  case ENOMSG:
+#endif
+#ifdef ENOPROTOOPT
+  case ENOPROTOOPT:
+#endif
+#ifdef ENOSPC
+  case ENOSPC:
+#endif
+#ifdef ENOSR
+  case ENOSR:
+#endif
+#ifdef ENOSTR
+  case ENOSTR:
+#endif
+#ifdef ENOSYS
+  case ENOSYS:
+#endif
+#ifdef ENOTCONN
+  case ENOTCONN:
+#endif
+#ifdef ENOTDIR
+  case ENOTDIR:
+#endif
+#ifdef ENOTEMPTY
+  case ENOTEMPTY:
+#endif
+#ifdef ENOTRECOVERABLE
+  case ENOTRECOVERABLE:
+#endif
+#ifdef ENOTSOCK
+  case ENOTSOCK:
+#endif
+#ifdef ENOTSUP
+  case ENOTSUP:
+#endif
+#ifdef ENOTTY
+  case ENOTTY:
+#endif
+#ifdef ENXIO
+  case ENXIO:

Re: [PATCH] Provide extension hint for aarch64 target (PR driver/83193).

2018-08-01 Thread Martin Liška
PING^1

On 07/18/2018 05:49 PM, Martin Liška wrote:
> Hi.
> 
> This patch improves aarch64 feature modifier hints.
> 
> May I please ask ARM folks to test the patch?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
> 2018-07-18  Martin Liska  
> 
> PR driver/83193
>   * common/config/aarch64/aarch64-common.c (aarch64_parse_extension):
> Set invalid_extension when there's any.
>   (aarch64_get_all_extension_candidates): New function.
>   (aarch64_rewrite_selected_cpu): Pass NULL as new argument.
>   * config/aarch64/aarch64-protos.h 
> (aarch64_get_all_extension_candidates):
> Declare new function.
>   * config/aarch64/aarch64.c (aarch64_parse_arch): Record
> invalid_feature.
>   (aarch64_parse_cpu): Likewise.
>   (aarch64_print_hint_for_feature_modifier): New.
>   (aarch64_validate_mcpu): Record invalid feature modifier
> and print hint for it.
>   (aarch64_validate_march): Likewise.
>   (aarch64_handle_attr_arch): Likewise.
>   (aarch64_handle_attr_cpu): Likewise.
>   (aarch64_handle_attr_isa_flags): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-07-18  Martin Liska  
> 
> PR driver/83193
>   * gcc.target/aarch64/spellcheck_7.c: New test.
>   * gcc.target/aarch64/spellcheck_8.c: New test.
> ---
>  gcc/common/config/aarch64/aarch64-common.c| 20 +-
>  gcc/config/aarch64/aarch64-protos.h   |  4 +-
>  gcc/config/aarch64/aarch64.c  | 67 +++
>  .../gcc.target/aarch64/spellcheck_7.c | 11 +++
>  .../gcc.target/aarch64/spellcheck_8.c | 12 
>  5 files changed, 97 insertions(+), 17 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/spellcheck_7.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/spellcheck_8.c
> 
> 



Re: [PATCH] Print default options selection for -march,-mcpu and -mtune for aarch64 (PR driver/83193).

2018-08-01 Thread Martin Liška
PING^1

On 07/18/2018 05:48 PM, Martin Liška wrote:
> Hi.
> 
> This is the aarch64 fix for PR83193. It sets the default options
> so that --help=target -Q prints the proper values.
> 
> This is what I now see on my cross-compiler:
> 
> --- /home/marxin/Downloads/options-2-before.txt   2018-07-18 
> 14:53:11.658146543 +0200
> +++ /home/marxin/Downloads/options-2.txt  2018-07-18 14:52:30.113274284 
> +0200
> @@ -1,10 +1,10 @@
>  The following options are target specific:
>-mabi=ABI  lp64
> -  -march=ARCH
> +  -march=armv8-a
>-mbig-endian   [disabled]
>-mbionic   [disabled]
>-mcmodel=  small
> -  -mcpu=CPU  
> +  -mcpu= generic
>-mfix-cortex-a53-835769[enabled]
>-mfix-cortex-a53-843419[enabled]
>-mgeneral-regs-only[disabled]
> @@ -19,7 +19,7 @@
>-msve-vector-bits=Nscalable
>-mtls-dialect= desc
>-mtls-size=24
> -  -mtune=CPU 
> +  -mtune=generic
>-muclibc   [disabled]
> 
> May I please ask ARM folks to test the patch?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
> 2018-07-18  Martin Liska  
> 
> PR driver/83193
>   * config/aarch64/aarch64.c (aarch64_override_options_internal):
> Set default values for x_aarch64_*_string strings.
>   * config/aarch64/aarch64.opt: Remove --{march,mcpu,mtune}==
> prefix.
> ---
>  gcc/config/aarch64/aarch64.c   | 7 +++
>  gcc/config/aarch64/aarch64.opt | 6 +++---
>  2 files changed, 10 insertions(+), 3 deletions(-)
> 
> 



[nvptx, committed] Define TARGET_HAVE_SPECULATION_SAFE_VALUE

2018-08-01 Thread Tom de Vries
Hi,

This defines the new target hook TARGET_HAVE_SPECULATION_SAFE_VALUE for nvptx.
Since AFAIK NVIDIA claims the related security issue does not exist on their
video hardware, we set it to speculation_safe_value_not_needed.

Built and reg-tested on x86_64 with nvptx accelerator.

Committed.

Thanks,
- Tom

[nvptx] Define TARGET_HAVE_SPECULATION_SAFE_VALUE

2018-08-01  Tom de Vries  

PR target/86800
* config/nvptx/nvptx.c (TARGET_HAVE_SPECULATION_SAFE_VALUE): Define to
speculation_safe_value_not_needed.

---
 gcc/config/nvptx/nvptx.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index c1946e75f42..c0b0a2ec3ab 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -6048,6 +6048,9 @@ nvptx_can_change_mode_class (machine_mode, machine_mode, 
reg_class_t)
 #undef TARGET_CAN_CHANGE_MODE_CLASS
 #define TARGET_CAN_CHANGE_MODE_CLASS nvptx_can_change_mode_class
 
+#undef TARGET_HAVE_SPECULATION_SAFE_VALUE
+#define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-nvptx.h"


[libgomp, nvptx, committed] Add cuda-lib.def

2018-08-01 Thread Tom de Vries
Hi,

This factors out the cuda library calls list into a separate .def file.

Built and reg-tested on x86_64 with nvptx accelerator.

Committed.

Thanks,
- Tom

[libgomp, nvptx] Add cuda-lib.def

2018-08-01  Tom de Vries  

* plugin/cuda-lib.def: New file.  Factor out of ...
* plugin/plugin-nvptx.c (CUDA_CALLS): ... here.
(struct cuda_lib_s, init_cuda_lib): Include cuda-lib.def instead of
using CUDA_CALLS.

---
 libgomp/plugin/cuda-lib.def   | 46 ++
 libgomp/plugin/plugin-nvptx.c | 51 ++-
 2 files changed, 48 insertions(+), 49 deletions(-)

diff --git a/libgomp/plugin/cuda-lib.def b/libgomp/plugin/cuda-lib.def
new file mode 100644
index 000..be8e3b3ec4d
--- /dev/null
+++ b/libgomp/plugin/cuda-lib.def
@@ -0,0 +1,46 @@
+CUDA_ONE_CALL (cuCtxCreate)
+CUDA_ONE_CALL (cuCtxDestroy)
+CUDA_ONE_CALL (cuCtxGetCurrent)
+CUDA_ONE_CALL (cuCtxGetDevice)
+CUDA_ONE_CALL (cuCtxPopCurrent)
+CUDA_ONE_CALL (cuCtxPushCurrent)
+CUDA_ONE_CALL (cuCtxSynchronize)
+CUDA_ONE_CALL (cuDeviceGet)
+CUDA_ONE_CALL (cuDeviceGetAttribute)
+CUDA_ONE_CALL (cuDeviceGetCount)
+CUDA_ONE_CALL (cuEventCreate)
+CUDA_ONE_CALL (cuEventDestroy)
+CUDA_ONE_CALL (cuEventElapsedTime)
+CUDA_ONE_CALL (cuEventQuery)
+CUDA_ONE_CALL (cuEventRecord)
+CUDA_ONE_CALL (cuEventSynchronize)
+CUDA_ONE_CALL (cuFuncGetAttribute)
+CUDA_ONE_CALL (cuGetErrorString)
+CUDA_ONE_CALL (cuInit)
+CUDA_ONE_CALL (cuLaunchKernel)
+CUDA_ONE_CALL (cuLinkAddData)
+CUDA_ONE_CALL (cuLinkComplete)
+CUDA_ONE_CALL (cuLinkCreate)
+CUDA_ONE_CALL (cuLinkDestroy)
+CUDA_ONE_CALL (cuMemAlloc)
+CUDA_ONE_CALL (cuMemAllocHost)
+CUDA_ONE_CALL (cuMemcpy)
+CUDA_ONE_CALL (cuMemcpyDtoDAsync)
+CUDA_ONE_CALL (cuMemcpyDtoH)
+CUDA_ONE_CALL (cuMemcpyDtoHAsync)
+CUDA_ONE_CALL (cuMemcpyHtoD)
+CUDA_ONE_CALL (cuMemcpyHtoDAsync)
+CUDA_ONE_CALL (cuMemFree)
+CUDA_ONE_CALL (cuMemFreeHost)
+CUDA_ONE_CALL (cuMemGetAddressRange)
+CUDA_ONE_CALL (cuMemHostGetDevicePointer)
+CUDA_ONE_CALL (cuModuleGetFunction)
+CUDA_ONE_CALL (cuModuleGetGlobal)
+CUDA_ONE_CALL (cuModuleLoad)
+CUDA_ONE_CALL (cuModuleLoadData)
+CUDA_ONE_CALL (cuModuleUnload)
+CUDA_ONE_CALL (cuStreamCreate)
+CUDA_ONE_CALL (cuStreamDestroy)
+CUDA_ONE_CALL (cuStreamQuery)
+CUDA_ONE_CALL (cuStreamSynchronize)
+CUDA_ONE_CALL (cuStreamWaitEvent)
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index b6ec5f88d59..83176ce07a0 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -52,57 +52,10 @@
 #if PLUGIN_NVPTX_DYNAMIC
 # include 
 
-# define CUDA_CALLS \
-CUDA_ONE_CALL (cuCtxCreate)\
-CUDA_ONE_CALL (cuCtxDestroy)   \
-CUDA_ONE_CALL (cuCtxGetCurrent)\
-CUDA_ONE_CALL (cuCtxGetDevice) \
-CUDA_ONE_CALL (cuCtxPopCurrent)\
-CUDA_ONE_CALL (cuCtxPushCurrent)   \
-CUDA_ONE_CALL (cuCtxSynchronize)   \
-CUDA_ONE_CALL (cuDeviceGet)\
-CUDA_ONE_CALL (cuDeviceGetAttribute)   \
-CUDA_ONE_CALL (cuDeviceGetCount)   \
-CUDA_ONE_CALL (cuEventCreate)  \
-CUDA_ONE_CALL (cuEventDestroy) \
-CUDA_ONE_CALL (cuEventElapsedTime) \
-CUDA_ONE_CALL (cuEventQuery)   \
-CUDA_ONE_CALL (cuEventRecord)  \
-CUDA_ONE_CALL (cuEventSynchronize) \
-CUDA_ONE_CALL (cuFuncGetAttribute) \
-CUDA_ONE_CALL (cuGetErrorString)   \
-CUDA_ONE_CALL (cuInit) \
-CUDA_ONE_CALL (cuLaunchKernel) \
-CUDA_ONE_CALL (cuLinkAddData)  \
-CUDA_ONE_CALL (cuLinkComplete) \
-CUDA_ONE_CALL (cuLinkCreate)   \
-CUDA_ONE_CALL (cuLinkDestroy)  \
-CUDA_ONE_CALL (cuMemAlloc) \
-CUDA_ONE_CALL (cuMemAllocHost) \
-CUDA_ONE_CALL (cuMemcpy)   \
-CUDA_ONE_CALL (cuMemcpyDtoDAsync)  \
-CUDA_ONE_CALL (cuMemcpyDtoH)   \
-CUDA_ONE_CALL (cuMemcpyDtoHAsync)  \
-CUDA_ONE_CALL (cuMemcpyHtoD)   \
-CUDA_ONE_CALL (cuMemcpyHtoDAsync)  \
-CUDA_ONE_CALL (cuMemFree)  \
-CUDA_ONE_CALL (cuMemFreeHost)  \
-CUDA_ONE_CALL (cuMemGetAddressRange)   \
-CUDA_ONE_CALL (cuMemHostGetDevicePointer)\
-CUDA_ONE_CALL (cuModuleGetFunction)\
-CUDA_ONE_CALL (cuModuleGetGlobal)  \
-CUDA_ONE_CALL (cuModuleLoad)   \
-CUDA_ONE_CALL (cuModuleLoadData)   \
-CUDA_ONE_CALL (cuModuleUnload) \
-CUDA_ONE_CALL (cuStreamCreate) \
-CUDA_ONE_CALL (cuStreamDestroy)\
-CUDA_ONE_CALL (cuStreamQuery)  \
-CUDA_ONE_CALL (cuStreamSynchronize)\
-CUDA_ONE_CALL (cuStreamWaitEvent)
 # define CUDA_ONE_CALL(call) \
   __typeof (call) *call;
 struct cuda_lib_s {
-  CUDA_CALLS
+#include "cuda-lib.def"
 } cuda_lib;
 
 /* -1 if init_cuda_lib has not been called yet, false
@@ -127,7 +80,7 @@ init_cuda_lib (void)
   cuda_lib.call = dlsym (h, #call);\
   if (cuda_lib.call == NULL)   \
 return false;
-  CUDA_CALLS
+#include "cuda-lib.def"
   cuda_lib_inited = true;
   return true;
 }
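
For readers unfamiliar with the pattern, the list in the patch above is
expanded twice with two different definitions of CUDA_ONE_CALL: once to
declare the struct members and once to resolve them. A single-file sketch
cannot include a separate .def file, so here the list lives in a macro, and
all names (demo_init, demo_sync, demo_lookup, the _ptr suffix) are
hypothetical. The _ptr suffix is only needed because C++, unlike C, rejects
a member that reuses the name of the function it was declared from.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-ins for library entry points.
static int demo_init () { return 1; }
static int demo_sync () { return 2; }

// In the patch this list lives in cuda-lib.def and is pulled in with
// #include; a macro keeps the sketch in one file.
#define DEMO_CALLS \
  DEMO_ONE_CALL (demo_init) \
  DEMO_ONE_CALL (demo_sync)

// First expansion: one function-pointer member per entry.
#define DEMO_ONE_CALL(call) decltype (&::call) call##_ptr;
struct demo_lib_s
{
  DEMO_CALLS
};
#undef DEMO_ONE_CALL

static demo_lib_s demo_lib;

using generic_fn = void (*) ();

// Hypothetical resolver standing in for dlsym.
static generic_fn
demo_lookup (const char *name)
{
  if (std::strcmp (name, "demo_init") == 0)
    return reinterpret_cast<generic_fn> (&demo_init);
  if (std::strcmp (name, "demo_sync") == 0)
    return reinterpret_cast<generic_fn> (&demo_sync);
  return nullptr;
}

// Second expansion: resolve every entry by name and check each one.
bool
init_demo_lib ()
{
#define DEMO_ONE_CALL(call) \
  demo_lib.call##_ptr \
    = reinterpret_cast<decltype (&::call)> (demo_lookup (#call)); \
  if (demo_lib.call##_ptr == nullptr) \
    return false;
  DEMO_CALLS
#undef DEMO_ONE_CALL
  return true;
}
```

Adding an entry point then means touching the list only; the struct and
the initialisation stay in sync automatically.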


Re: [PATCH,nvptx] Remove use of 'struct map' from plugin (nvptx)

2018-08-01 Thread Tom de Vries
On 08/01/2018 03:43 PM, Cesar Philippidis wrote:
> On 08/01/2018 04:01 AM, Tom de Vries wrote:
>> On 07/31/2018 05:12 PM, Cesar Philippidis wrote:
>>> This is an old patch which removes the struct map from the nvptx plugin.
>>> I believe at one point this was supposed to be used to manage async data
>>> mappings, but in practice that never worked out.
>>
>> I don't quite understand what rationale you're trying to present here.
>>
>> Is this dead code?
> 
> It's dead code.
> 

Then OK.

Thanks,
- Tom


Re: [PATCH,nvptx] Remove use of 'struct map' from plugin (nvptx)

2018-08-01 Thread Cesar Philippidis
On 08/01/2018 04:01 AM, Tom de Vries wrote:
> On 07/31/2018 05:12 PM, Cesar Philippidis wrote:
>> This is an old patch which removes the struct map from the nvptx plugin.
>> I believe at one point this was supposed to be used to manage async data
>> mappings, but in practice that never worked out.
> 
> I don't quite understand what rationale you're trying to present here.
> 
> Is this dead code?

It's dead code.

Cesar


[PATCHv3 1/6] Improve libstdc++-v3 async test

2018-08-01 Thread Mike Crowe
Add tests for waiting for the future using both std::chrono::steady_clock
and std::chrono::system_clock in preparation for dealing with those clocks
properly in futex.cc.
---
 libstdc++-v3/testsuite/30_threads/async/async.cc | 33 
 1 file changed, 33 insertions(+)

diff --git a/libstdc++-v3/testsuite/30_threads/async/async.cc 
b/libstdc++-v3/testsuite/30_threads/async/async.cc
index 4c2cdd1a534..015bcce0c2c 100644
--- a/libstdc++-v3/testsuite/30_threads/async/async.cc
+++ b/libstdc++-v3/testsuite/30_threads/async/async.cc
@@ -51,17 +51,50 @@ void test02()
   VERIFY( status == std::future_status::timeout );
   status = f1.wait_until(std::chrono::system_clock::now());
   VERIFY( status == std::future_status::timeout );
+  status = f1.wait_until(std::chrono::steady_clock::now());
+  VERIFY( status == std::future_status::timeout );
   l.unlock();  // allow async thread to proceed
   f1.wait();   // wait for it to finish
   status = f1.wait_for(std::chrono::milliseconds(0));
   VERIFY( status == std::future_status::ready );
   status = f1.wait_until(std::chrono::system_clock::now());
   VERIFY( status == std::future_status::ready );
+  status = f1.wait_until(std::chrono::steady_clock::now());
+  VERIFY( status == std::future_status::ready );
+}
+
+// This test is prone to failures if run on a loaded machine where the
+// kernel decides not to schedule us for several seconds. It also
+// assumes that no-one will warp CLOCK whilst the test is
+// running.
+template<typename CLOCK>
+void test03()
+{
+  auto const start = CLOCK::now();
+  future<void> f1 = async(launch::async, []() {
+  std::this_thread::sleep_for(std::chrono::seconds(2));
+});
+  std::future_status status;
+
+  status = f1.wait_for(std::chrono::milliseconds(500));
+  VERIFY( status == std::future_status::timeout );
+
+  status = f1.wait_until(start + std::chrono::seconds(1));
+  VERIFY( status == std::future_status::timeout );
+
+  status = f1.wait_until(start + std::chrono::seconds(5));
+  VERIFY( status == std::future_status::ready );
+
+  auto const elapsed = CLOCK::now() - start;
+  VERIFY( elapsed >= std::chrono::seconds(2) );
+  VERIFY( elapsed < std::chrono::seconds(5) );
 }

 int main()
 {
   test01();
   test02();
+  test03<std::chrono::system_clock>();
+  test03<std::chrono::steady_clock>();
   return 0;
 }
--
2.11.0

BrightSign considers your privacy to be very important. The emails you send to 
us will be protected and secured. Furthermore, we will only use your email and 
contact information for the reasons you sent them to us and for tracking how 
effectively we respond to your requests.


[PATCHv3 2/6] libstdc++ futex: Use FUTEX_CLOCK_REALTIME for wait

2018-08-01 Thread Mike Crowe
The futex system call supports waiting for an absolute time if
FUTEX_WAIT_BITSET is used rather than FUTEX_WAIT. Doing so provides two
benefits:

1. The call to gettimeofday is not required in order to calculate a
   relative timeout.

2. If someone changes the system clock during the wait then the futex
   timeout will correctly expire earlier or later. Currently that only
   happens if the clock is changed prior to the call to gettimeofday.

According to futex(2), support for FUTEX_CLOCK_REALTIME was added in the
v2.6.28 Linux kernel and FUTEX_WAIT_BITSET was added in v2.6.25. To ensure
that the code still works correctly with earlier kernel versions, an ENOSYS
error from futex[1] results in the futex_clock_realtime_unavailable flag
being set. This flag is used to avoid the unnecessary unsupported
futex call in the future and to fall back to the previous gettimeofday and
relative time implementation.
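
For reference, the absolute-timeout call described above can be sketched
outside libstdc++ roughly as follows. This is a Linux-only sketch under
stated assumptions: the helper name futex_wait_until_realtime is made up,
the constants come from <linux/futex.h> rather than the patch, and the
ENOSYS fallback is left out.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <ctime>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical helper: blocks while *addr == expected, until either a
// wake-up arrives or the absolute CLOCK_REALTIME deadline passes.
// Returns 0 if woken, otherwise the errno value reported by the kernel.
int
futex_wait_until_realtime (std::uint32_t *addr, std::uint32_t expected,
			   const struct timespec &deadline)
{
  long const rc = syscall (SYS_futex, addr,
			   FUTEX_WAIT_BITSET | FUTEX_CLOCK_REALTIME,
			   expected, &deadline, nullptr,
			   FUTEX_BITSET_MATCH_ANY);
  return rc == -1 ? errno : 0;
}
```

Because the deadline is absolute, no gettimeofday call is needed before
the wait, and a change to the system clock during the wait moves the
expiry accordingly.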

glibc applied an equivalent switch in pthread_cond_timedwait to use
FUTEX_CLOCK_REALTIME and FUTEX_WAIT_BITSET rather than FUTEX_WAIT for
glibc-2.10 back in 2009. See glibc:cbd8aeb836c8061c23a5e00419e0fb25a34abee7

The futex_clock_realtime_unavailable flag is accessed using
std::memory_order_relaxed to stop it becoming a bottleneck. If the first
two calls to _M_futex_wait_until happen to occur simultaneously then the
only consequence is that both will try to use FUTEX_CLOCK_REALTIME, both
risk discovering that it doesn't work and, if so, both set the flag.
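
The flag handling can be sketched in isolation like this. It is an
illustration of the pattern only, not the code in futex.cc: the name
call_with_fallback and the two callables are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>

namespace
{
  // Set once the fast path is known to be unsupported.
  std::atomic<bool> fast_path_unavailable{false};
}

// try_fast and slow are hypothetical callables that return 0 on success
// or an errno value.  ENOSYS from try_fast means "not supported here".
template<typename Fast, typename Slow>
int
call_with_fallback (Fast try_fast, Slow slow)
{
  if (!fast_path_unavailable.load (std::memory_order_relaxed))
    {
      int const err = try_fast ();
      if (err != ENOSYS)
	return err;
      // Racing threads may each reach this store; they all write the
      // same value, so relaxed ordering is sufficient.
      fast_path_unavailable.store (true, std::memory_order_relaxed);
    }
  return slow ();
}
```

The worst case of a race is one redundant attempt at the unsupported
call per racing thread, which is the trade-off described above.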

[1] This is how glibc's nptl-init.c determines whether these flags are
supported.
---
 libstdc++-v3/src/c++11/futex.cc | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/libstdc++-v3/src/c++11/futex.cc b/libstdc++-v3/src/c++11/futex.cc
index 278a5a80902..72062a4285e 100644
--- a/libstdc++-v3/src/c++11/futex.cc
+++ b/libstdc++-v3/src/c++11/futex.cc
@@ -35,8 +35,16 @@

 // Constants for the wait/wake futex syscall operations
 const unsigned futex_wait_op = 0;
+const unsigned futex_wait_bitset_op = 9;
+const unsigned futex_clock_realtime_flag = 256;
+const unsigned futex_bitset_match_any = ~0;
 const unsigned futex_wake_op = 1;

+namespace
+{
+  std::atomic<bool> futex_clock_realtime_unavailable;
+}
+
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -58,6 +66,32 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 else
   {
+   if (!futex_clock_realtime_unavailable.load(std::memory_order_relaxed))
+ {
+   struct timespec rt;
+   rt.tv_sec = __s.count();
+   rt.tv_nsec = __ns.count();
+   if (syscall (SYS_futex, __addr, futex_wait_bitset_op | 
futex_clock_realtime_flag, __val, &rt, nullptr, futex_bitset_match_any) == -1)
+ {
+   _GLIBCXX_DEBUG_ASSERT(errno == EINTR || errno == EAGAIN
+ || errno == ETIMEDOUT || errno == ENOSYS);
+   if (errno == ETIMEDOUT)
+ return false;
+   if (errno == ENOSYS)
+ {
+   futex_clock_realtime_unavailable.store(true, 
std::memory_order_relaxed);
+   // Fall through to legacy implementation if the system
+   // call is unavailable.
+ }
+   else
+ return true;
+ }
+   else
+ return true;
+ }
+
+   // We only get to here if futex_clock_realtime_unavailable was
+   // true or has just been set to true.
struct timeval tv;
gettimeofday (&tv, NULL);
// Convert the absolute timeout value to a relative timeout
--
2.11.0



[PATCHv3 3/6] libstdc++ futex: Support waiting on std::chrono::steady_clock directly

2018-08-01 Thread Mike Crowe
The user-visible effect of this change is for std::future::wait_until to
use CLOCK_MONOTONIC when passed a timeout of std::chrono::steady_clock
type. This makes it immune to any changes made to the system clock
CLOCK_REALTIME.

Add an overload of __atomic_futex_unsigned::_M_load_and_test_until_impl
that accepts a std::chrono::steady_clock, and correctly passes this through
to __atomic_futex_unsigned_base::_M_futex_wait_until_steady which uses
CLOCK_MONOTONIC for the timeout within the futex system call. These
functions are mostly just copies of the std::chrono::system_clock versions
with small tweaks.

Prior to this commit, a std::chrono::steady_clock timeout would be converted via
std::chrono::system_clock which risks reducing or increasing the timeout if
someone changes CLOCK_REALTIME whilst the wait is happening. (The commit
immediately prior to this one increases the window of opportunity for that
from a short period during the calculation of a relative timeout, to the
entire duration of the wait.)

FUTEX_WAIT_BITSET was added in kernel v2.6.25. If futex reports ENOSYS to
indicate that this operation is not supported then the code falls back to
using clock_gettime(2) to calculate a relative time to wait for.

I believe that I've added this functionality in a way that it doesn't break
ABI compatibility, but that has made it more verbose and less type safe. I
believe that it would be better to maintain the timeout as an instance of
the correct clock type all the way down to a single _M_futex_wait_until
function with an overload for each clock. The current scheme of separating
out the seconds and nanoseconds early risks accidentally calling the wait
function for the wrong clock. Unfortunately, doing this would break code
that compiled against the old header.
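
To make the type-safety concern concrete, here is a small sketch of the early split into seconds and nanoseconds. split_timeout is a hypothetical helper written for this illustration; it only mirrors the time_point_cast / duration_cast pair used in the header and is not part of the patch.

```cpp
#include <chrono>
#include <utility>

// Split an absolute time point into whole seconds since the clock's
// epoch plus the leftover nanoseconds. The result no longer records
// which clock the value was measured against.
template<typename Clock, typename Dur>
std::pair<std::chrono::seconds, std::chrono::nanoseconds>
split_timeout(const std::chrono::time_point<Clock, Dur>& atime)
{
  auto s = std::chrono::time_point_cast<std::chrono::seconds>(atime);
  auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(atime - s);
  return { s.time_since_epoch(), ns };
}
```

A steady_clock time point and a system_clock time point holding the same count produce identical pairs, so nothing stops the pair from one clock being handed to the wait function for the other. That is the risk the paragraph above describes.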
---
 libstdc++-v3/include/bits/atomic_futex.h | 67 ++-
 libstdc++-v3/src/c++11/futex.cc  | 79 
 2 files changed, 145 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/atomic_futex.h 
b/libstdc++-v3/include/bits/atomic_futex.h
index ecf5b02031a..47ecd329ea9 100644
--- a/libstdc++-v3/include/bits/atomic_futex.h
+++ b/libstdc++-v3/include/bits/atomic_futex.h
@@ -52,11 +52,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #if defined(_GLIBCXX_HAVE_LINUX_FUTEX) && ATOMIC_INT_LOCK_FREE > 1
   struct __atomic_futex_unsigned_base
   {
-// Returns false iff a timeout occurred.
+// __s and __ns are measured against CLOCK_REALTIME. Returns false
+// iff a timeout occurred.
 bool
 _M_futex_wait_until(unsigned *__addr, unsigned __val, bool __has_timeout,
chrono::seconds __s, chrono::nanoseconds __ns);

+// __s and __ns are measured against CLOCK_MONOTONIC. Returns
+// false iff a timeout occurred.
+bool
+_M_futex_wait_until_steady(unsigned *__addr, unsigned __val, bool __has_timeout,
+   chrono::seconds __s, chrono::nanoseconds __ns);
+
 // This can be executed after the object has been destroyed.
 static void _M_futex_notify_all(unsigned* __addr);
   };
@@ -86,6 +93,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 // value if equal is false.
 // The assumed value is the caller's assumption about the current value
 // when making the call.
+// __s and __ns are measured against CLOCK_REALTIME.
 unsigned
 _M_load_and_test_until(unsigned __assumed, unsigned __operand,
bool __equal, memory_order __mo, bool __has_timeout,
@@ -110,6 +118,36 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
}
 }

+// If a timeout occurs, returns a current value after the timeout;
+// otherwise, returns the operand's value if equal is true or a different
+// value if equal is false.
+// The assumed value is the caller's assumption about the current value
+// when making the call.
+// __s and __ns are measured against CLOCK_MONOTONIC.
+unsigned
+_M_load_and_test_until_steady(unsigned __assumed, unsigned __operand,
+   bool __equal, memory_order __mo, bool __has_timeout,
+   chrono::seconds __s, chrono::nanoseconds __ns)
+{
+  for (;;)
+   {
+ // Don't bother checking the value again because we expect the caller
+ // to have done it recently.
+ // memory_order_relaxed is sufficient because we can rely on just the
+ // modification order (store_notify uses an atomic RMW operation too),
+ // and the futex syscalls synchronize between themselves.
+ _M_data.fetch_or(_Waiter_bit, memory_order_relaxed);
+ bool __ret = _M_futex_wait_until_steady((unsigned*)(void*)&_M_data,
+  __assumed | _Waiter_bit,
+  __has_timeout, __s, __ns);
+ // Fetch the current value after waiting (clears _Waiter_bit).
+ __assumed = _M_load(__mo);
+ if (!__ret || ((__operand == __assumed) == __equal))
+   return __assumed;
+ // TODO adapt wait time
+   

[PATCHv3 5/6] libstdc++ futex: Loop when waiting against arbitrary clock

2018-08-01 Thread Mike Crowe
If std::future::wait_until is passed a time point measured against a clock
that is neither std::chrono::steady_clock nor std::chrono::system_clock
then the generic implementation of
__atomic_futex_unsigned::_M_load_when_equal_until is called which
calculates the timeout based on __clock_t and calls the
_M_load_when_equal_until method for that clock to perform the actual wait.

There's no guarantee that the caller's clock is running at the same speed as
__clock_t, so if the underlying wait times out we need to check the
timeout against the caller's clock again before potentially looping.

Also add two extra tests to the testsuite's async.cc:

* run test03 with steady_clock_copy, which behaves identically to
  std::chrono::steady_clock, but isn't std::chrono::steady_clock. This
  causes the overload of __atomic_futex_unsigned::_M_load_when_equal_until
  that takes an arbitrary clock to be called.

* invent test04 which uses a deliberately slow running clock in order to
  exercise the looping behaviour of
  __atomic_futex_unsigned::_M_load_when_equal_until described above.
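
The looping behaviour can be sketched outside libstdc++ as follows. This is an illustration under stated assumptions, not the library code: wait_until_any_clock and tenth_speed_clock are hypothetical names, and the underlying wait is passed in as a callable that returns false on timeout (mirroring the bool result of _M_load_when_equal_until).

```cpp
#include <chrono>
#include <thread>

// Hypothetical clock that advances at a tenth of steady_clock's speed.
struct tenth_speed_clock
{
  using duration = std::chrono::steady_clock::duration;
  using rep = duration::rep;
  using period = duration::period;
  using time_point = std::chrono::time_point<tenth_speed_clock, duration>;
  static constexpr bool is_steady = true;

  static time_point now()
  {
    auto steady = std::chrono::steady_clock::now();
    return time_point{steady.time_since_epoch() / 10};
  }
};

// Wait until atime, measured against Clock, using a wait primitive that
// only understands steady_clock. wait_on_steady returns true if the
// awaited event happened and false if its own deadline passed.
template<typename Clock, typename Dur, typename Wait>
bool wait_until_any_clock(const std::chrono::time_point<Clock, Dur>& atime,
                          Wait wait_on_steady)
{
  typename Clock::time_point c_entry = Clock::now();
  do
    {
      // Translate the remaining time on the caller's clock into a
      // deadline on the reference clock.
      const auto s_entry = std::chrono::steady_clock::now();
      const auto delta = atime - c_entry;
      const auto s_atime = s_entry + delta;
      if (wait_on_steady(s_atime))
        return true;
      // The reference clock says we timed out; ask the caller's clock
      // whether its deadline has really been reached.
      c_entry = Clock::now();
    }
  while (c_entry < atime);
  return false;
}
```

With a clock that runs slowly, the first translated deadline expires long before the caller's deadline, so a single wait would report a timeout too early. Re-reading the caller's clock and waiting again for the remainder is what prevents that.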
---
 libstdc++-v3/include/bits/atomic_futex.h | 15 --
 libstdc++-v3/testsuite/30_threads/async/async.cc | 69 
 2 files changed, 79 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/bits/atomic_futex.h 
b/libstdc++-v3/include/bits/atomic_futex.h
index a35020aef4f..b7ffb7fb191 100644
--- a/libstdc++-v3/include/bits/atomic_futex.h
+++ b/libstdc++-v3/include/bits/atomic_futex.h
@@ -229,11 +229,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _M_load_when_equal_until(unsigned __val, memory_order __mo,
  const chrono::time_point<_Clock, _Duration>& __atime)
   {
-   const typename _Clock::time_point __c_entry = _Clock::now();
-   const __clock_t::time_point __s_entry = __clock_t::now();
-   const auto __delta = __atime - __c_entry;
-   const auto __s_atime = __s_entry + __delta;
-   return _M_load_when_equal_until(__val, __mo, __s_atime);
+   typename _Clock::time_point __c_entry = _Clock::now();
+   do {
+ const __clock_t::time_point __s_entry = __clock_t::now();
+ const auto __delta = __atime - __c_entry;
+ const auto __s_atime = __s_entry + __delta;
+ if (_M_load_when_equal_until(__val, __mo, __s_atime))
+   return true;
+ __c_entry = _Clock::now();
+   } while (__c_entry < __atime);
+   return false;
   }

 // Returns false iff a timeout occurred.
diff --git a/libstdc++-v3/testsuite/30_threads/async/async.cc 
b/libstdc++-v3/testsuite/30_threads/async/async.cc
index 015bcce0c2c..755c95cbea6 100644
--- a/libstdc++-v3/testsuite/30_threads/async/async.cc
+++ b/libstdc++-v3/testsuite/30_threads/async/async.cc
@@ -63,6 +63,27 @@ void test02()
   VERIFY( status == std::future_status::ready );
 }

+// This clock is used to ensure that the
+// __atomic_futex_unsigned::_M_load_when_equal_until overload that
+// takes an arbitrary clock is exercised. Without it, only the
+// specific std::chrono::steady_clock and std::chrono::system_clock
+// overloads are exercised.
+
+struct steady_clock_copy
+{
+  using rep = std::chrono::steady_clock::rep;
+  using period = std::chrono::steady_clock::period;
+  using duration = std::chrono::steady_clock::duration;
+  using time_point = std::chrono::time_point<steady_clock_copy, duration>;
+  static constexpr bool is_steady = true;
+
+  static time_point now()
+  {
+auto steady = std::chrono::steady_clock::now();
+return time_point{steady.time_since_epoch()};
+  }
+};
+
 // This test is prone to failures if run on a loaded machine where the
 // kernel decides not to schedule us for several seconds. It also
 // assumes that no-one will warp CLOCK whilst the test is
@@ -90,11 +111,59 @@ void test03()
   VERIFY( elapsed < std::chrono::seconds(5) );
 }

+// This clock is supposed to run at a tenth of normal speed, but we
+// don't have to worry about rounding errors causing us to wake up
+// slightly too early below if we actually run it at an eleventh of
+// normal speed.
+struct slow_clock
+{
+  using rep = std::chrono::steady_clock::rep;
+  using period = std::chrono::steady_clock::period;
+  using duration = std::chrono::steady_clock::duration;
+  using time_point = std::chrono::time_point<slow_clock, duration>;
+  static constexpr bool is_steady = true;
+
+  static time_point now()
+  {
+auto steady = std::chrono::steady_clock::now();
+return time_point{steady.time_since_epoch() / 11};
+  }
+};
+
+void test04()
+{
+  auto const slow_start = slow_clock::now();
+  future<void> f1 = async(launch::async, []() {
+  std::this_thread::sleep_for(std::chrono::seconds(2));
+});
+
+  // Wait for ~1s
+  {
+auto const steady_begin = std::chrono::steady_clock::now();
+auto const status = f1.wait_until(slow_start + std::chrono::milliseconds(100));
+VERIFY(status == std::future_status::timeout);
+auto const elapsed = std::chrono::steady_clock::now() - steady_begin;
+VERIFY(elapsed >= 

[PATCHv3 6/6] Extra async tests, not for merging

2018-08-01 Thread Mike Crowe
These tests show that changing the system clock has an effect on
std::future::wait_until when using std::chrono::system_clock but not when
using std::chrono::steady_clock. Unfortunately these tests have a number of
downsides:

1. Nothing that is attempting to keep the clock set correctly (ntpd,
   systemd-timesyncd) can be running at the same time.

2. The test process requires the CAP_SYS_TIME capability (although, as it's
   written it checks for being root.)

3. Other processes running concurrently may misbehave when the clock darts
   back and forth.

4. They are slow to run.

As such, I don't think they are suitable for merging. I include them here
because I wanted to document how I had tested the changes in the previous
commits.
---
 libstdc++-v3/testsuite/30_threads/async/async.cc | 70 
 1 file changed, 70 insertions(+)

diff --git a/libstdc++-v3/testsuite/30_threads/async/async.cc 
b/libstdc++-v3/testsuite/30_threads/async/async.cc
index 755c95cbea6..4f547fb5a75 100644
--- a/libstdc++-v3/testsuite/30_threads/async/async.cc
+++ b/libstdc++-v3/testsuite/30_threads/async/async.cc
@@ -24,6 +24,7 @@

 #include 
 #include 
+#include 

 using namespace std;

@@ -157,6 +158,71 @@ void test04()
   }
 }

+void perturb_system_clock(const std::chrono::seconds &seconds)
+{
+  struct timeval tv;
+  if (gettimeofday(&tv, NULL))
+abort();
+
+  tv.tv_sec += seconds.count();
+  if (settimeofday(&tv, NULL))
+abort();
+}
+
+// Ensure that advancing CLOCK_REALTIME doesn't make any difference
+// when we're waiting on std::chrono::steady_clock.
+void test05()
+{
+  auto const start = chrono::steady_clock::now();
+  future<void> f1 = async(launch::async, []() {
+  std::this_thread::sleep_for(std::chrono::seconds(10));
+});
+
+  perturb_system_clock(chrono::seconds(20));
+
+  std::future_status status;
+  status = f1.wait_for(std::chrono::seconds(4));
+  VERIFY( status == std::future_status::timeout );
+
+  status = f1.wait_until(start + std::chrono::seconds(6));
+  VERIFY( status == std::future_status::timeout );
+
+  status = f1.wait_until(start + std::chrono::seconds(12));
+  VERIFY( status == std::future_status::ready );
+
+  auto const elapsed = chrono::steady_clock::now() - start;
+  VERIFY( elapsed >= std::chrono::seconds(10) );
+  VERIFY( elapsed < std::chrono::seconds(15) );
+
+  perturb_system_clock(chrono::seconds(-20));
+}
+
+// Ensure that advancing CLOCK_REALTIME does make a difference when
+// we're waiting on std::chrono::system_clock.
+void test06()
+{
+  auto const start = chrono::system_clock::now();
+  auto const start_steady = chrono::steady_clock::now();
+
+  future<void> f1 = async(launch::async, []() {
+  std::this_thread::sleep_for(std::chrono::seconds(5));
+  perturb_system_clock(chrono::seconds(60));
+  std::this_thread::sleep_for(std::chrono::seconds(5));
+});
+  future_status status;
+  status = f1.wait_until(start + std::chrono::seconds(60));
+  VERIFY( status == std::future_status::timeout );
+
+  auto const elapsed_steady = chrono::steady_clock::now() - start_steady;
+  VERIFY( elapsed_steady >= std::chrono::seconds(5) );
+  VERIFY( elapsed_steady < std::chrono::seconds(10) );
+
+  status = f1.wait_until(start + std::chrono::seconds(75));
+  VERIFY( status == std::future_status::ready );
+
+  perturb_system_clock(chrono::seconds(-60));
+}
+
 int main()
 {
   test01();
@@ -165,5 +231,9 @@ int main()
   test03();
   test03();
   test04();
+  if (geteuid() == 0) {
+test05();
+test06();
+  }
   return 0;
 }
--
2.11.0



[PATCHv3 4/6] libstdc++ atomic_futex: Use std::chrono::steady_clock as reference clock

2018-08-01 Thread Mike Crowe
The user-visible effect of this change is that std::future::wait_for now
uses std::chrono::steady_clock to determine the timeout. This makes it
immune to changes made to the system clock. It also means that anyone using
their own clock types with std::future::wait_until will have the timeout
converted to std::chrono::steady_clock rather than
std::chrono::system_clock.

Now that use of both std::chrono::steady_clock and
std::chrono::system_clock are correctly supported for the wait timeout, I
believe that std::chrono::steady_clock is a better choice for the reference
clock that all other clocks are converted to since it is guaranteed to
advance steadily. The previous behaviour of converting to
std::chrono::system_clock risks timeouts changing dramatically when the
system clock is changed.
---
 libstdc++-v3/include/bits/atomic_futex.h | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/include/bits/atomic_futex.h 
b/libstdc++-v3/include/bits/atomic_futex.h
index 47ecd329ea9..a35020aef4f 100644
--- a/libstdc++-v3/include/bits/atomic_futex.h
+++ b/libstdc++-v3/include/bits/atomic_futex.h
@@ -71,7 +71,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template 
   class __atomic_futex_unsigned : __atomic_futex_unsigned_base
   {
-typedef chrono::system_clock __clock_t;
+typedef chrono::steady_clock __clock_t;

 // This must be lock-free and at offset 0.
 atomic _M_data;
@@ -169,7 +169,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 unsigned
 _M_load_and_test_until_impl(unsigned __assumed, unsigned __operand,
bool __equal, memory_order __mo,
-   const chrono::time_point<__clock_t, _Dur>& __atime)
+   const chrono::time_point<std::chrono::system_clock, _Dur>& __atime)
 {
   auto __s = chrono::time_point_cast(__atime);
   auto __ns = chrono::duration_cast(__atime - __s);
@@ -229,7 +229,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _M_load_when_equal_until(unsigned __val, memory_order __mo,
  const chrono::time_point<_Clock, _Duration>& __atime)
   {
-   // DR 887 - Sync unknown clock to known clock.
const typename _Clock::time_point __c_entry = _Clock::now();
const __clock_t::time_point __s_entry = __clock_t::now();
const auto __delta = __atime - __c_entry;
@@ -241,7 +240,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 template
 _GLIBCXX_ALWAYS_INLINE bool
 _M_load_when_equal_until(unsigned __val, memory_order __mo,
-   const chrono::time_point<__clock_t, _Duration>& __atime)
+   const chrono::time_point<std::chrono::system_clock, _Duration>& __atime)
 {
   unsigned __i = _M_load(__mo);
   if ((__i & ~_Waiter_bit) == __val)
--
2.11.0



[PATCHv3 0/6] std::future::wait_* improvements

2018-08-01 Thread Mike Crowe
v2 of this series was originally posted back in January (see
https://gcc.gnu.org/ml/libstdc++/2018-01/msg00035.html )

Apart from minor log message tweaks, the changes since that version are:

* [1/6] Improve libstdc++-v3 async test

  Speed up the tests at the risk of more sporadic failures on loaded
  machines.

  Use lambda rather than separate function for asynchronous routine.

* [2/6] libstdc++ futex: Use FUTEX_CLOCK_REALTIME for wait

  Fall back to using gettimeofday and FUTEX_WAIT if FUTEX_WAIT_BITSET and
  FUTEX_CLOCK_REALTIME are not available.

* [3/6] libstdc++ futex: Support waiting on std::chrono::steady_clock directly

  Fall back to using clock_gettime (or the syscall directly if necessary)
  and FUTEX_WAIT if FUTEX_WAIT_BITSET is unavailable.

* [4/6] libstdc++ atomic_futex: Use std::chrono::steady_clock as reference clock

  No changes

* [5/6] libstdc++ futex: Loop when waiting against arbitrary clock

  New patch. My work on std::condition_variable::wait_until made me realise
  that there's a risk of indicating a timeout too early when using a
  non-standard clock.

* [6/6] Extra async tests, not for merging

  Use lambdas rather than separate functions for asynchronous routines.


Torvald Riegel had some objections to my design, but did not respond when I
attempted to justify it and attempted to change my implementation based on
his suggestions (see https://gcc.gnu.org/ml/libstdc++/2018-01/msg00071.html ).

It looks like https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68519 could
apply equally well to __atomic_futex_unsigned::_M_load_when_equal_for.
I plan to look at that next.

I set up a Debian 4.0 "Etch" system (v2.6.18 kernel, glibc 2.3.6.)
Unfortunately, its GCC 4.1.2 is unable to compile GCC master (as of
5ba044fc3a443274462527eed385732f7ecee3a8) because hash-map.h appears to be
trying to put a reference in a std::pair. The above patches cherry-pick
cleanly back to GCC 7.3, and that version does build after adding a few
constants to the system headers. I've confirmed that the lack of support
for FUTEX_WAIT_BITSET is detected correctly and that the code falls back to
using FUTEX_WAIT. This test also shows that the fallback to calling the
clock_gettime system call directly is working too. The async.cc tests all
passed.

I haven't made any attempt to add entries to the .abilist files. I'm not
sure whether I'm supposed to do that as part of the patches, or not.


Mike Crowe (6):
  Improve libstdc++-v3 async test
  libstdc++ futex: Use FUTEX_CLOCK_REALTIME for wait
  libstdc++ futex: Support waiting on std::chrono::steady_clock directly
  libstdc++ atomic_futex: Use std::chrono::steady_clock as reference
clock
  libstdc++ futex: Loop when waiting against arbitrary clock
  Extra async tests, not for merging

 libstdc++-v3/include/bits/atomic_futex.h |  89 ++--
 libstdc++-v3/src/c++11/futex.cc  | 113 +++
 libstdc++-v3/testsuite/30_threads/async/async.cc | 172 +++
 3 files changed, 364 insertions(+), 10 deletions(-)

--
2.11.0



Re: [PATCH] Make GO string literals properly NUL terminated

2018-08-01 Thread Bernd Edlinger


On 08/01/18 11:29, Richard Biener wrote:
> 
> Hmm.  I think it would be nice if TREE_STRING_LENGTH would
> match char[2] and TYPE_SIZE_UNIT even if that is inconvenient
> for your check above.  Because the '\0' doesn't belong to the
> string.  Then build_string internally appends a '\0' outside
> of TREE_STRING_LENGTH.
> 

Hmm. Yes, but the outside-0 byte is just one byte, not a wide
character.  There are STRING_CSTs which are not string literals,
for instance attribute tags, pragmas, asm constraints, etc.
They use the '\0' outside, and have probably no TREE_TYPE.

> 
>> So I would like to be able to assume that the STRING_CST objects
>> are internally always generated properly by the front end.
> 
> Yeah, I guess we need to define what "properly" is ;)
> 
Yes.

>> And that the ARRAY_TYPE of the string literal either has the
>> same length as the TREE_STRING_LENGTH or if it is shorter,
>> this is always exactly one (wide) character size less than TREE_STRING_LENGTH
> 
> I think it should be always the same...
> 

One could not differentiate between "\0" without zero-termination
and "" with zero-termination, theoretically.
We also have char x[100] = "ab";
that is TREE_STRING_LENGTH=3, and TYPE_SIZE_UNIT(TREE_TYPE(x)) = 100.
Of course one could create it with a TREE_STRING_LENGTH = 100,
but imagine char x[1000] = "ab"
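
The size mismatch can be seen from the language side with a short sketch. It assumes that sizeof of the literal and of the array correspond to the TREE_STRING_LENGTH and TYPE_SIZE_UNIT figures quoted above; it does not inspect GCC's internal trees, and tail_is_zero is a hypothetical helper. The C-only form char a[2] = "ab" (literal without room for the NUL) is not valid C++, so it is not shown.

```cpp
#include <cstddef>

// A 100-byte array initialised from a 3-byte literal ("ab" plus NUL).
char x[100] = "ab";

static_assert(sizeof("ab") == 3, "literal: two characters plus NUL");
static_assert(sizeof(x) == 100, "object: size of the declared array");
static_assert(sizeof("") == 1, "empty literal still carries a NUL");
static_assert(sizeof("\0") == 2, "explicit NUL plus the implicit one");
static_assert(sizeof(L"ab") == 3 * sizeof(wchar_t),
              "a wide NUL occupies a whole wchar_t, not one byte");

// True if every element of x past the literal's bytes is zero, i.e. the
// remaining 97 bytes were zero-filled rather than stored in the literal.
bool tail_is_zero()
{
  for (std::size_t i = sizeof("ab"); i < sizeof(x); ++i)
    if (x[i] != 0)
      return false;
  return true;
}
```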

>> The idea is to use this property of string literals where needed,
>> and check rigorously in varasm.c.
>>
>> Does that make sense?
> 
> So if it is not the same then the excess character needs to be
> a (wide) NUL in your model?  ISTR your varasm.c patch didn't verify
> that.
> 

I think it does.


Bernd.


Re: [08/11] Make hoist_defs_of_uses use vec_info::lookup_def

2018-08-01 Thread Richard Biener
On Mon, Jul 30, 2018 at 1:43 PM Richard Sandiford
 wrote:
>
> This patch makes hoist_defs_of_uses use vec_info::lookup_def instead of:
>
>   if (!gimple_nop_p (def_stmt)
>   && flow_bb_inside_loop_p (loop, gimple_bb (def_stmt)))
>
> to test whether a feeding scalar statement needs to be hoisted out
> of the vectorised loop.  It isn't worth doing in its own right,
> but it's a prerequisite for the next patch, which needs to update
> the stmt_vec_infos of the hoisted statements.

OK.

>
> 2018-07-30  Richard Sandiford  
>
> gcc/
> * tree-vect-stmts.c (hoist_defs_of_uses): Use vec_info::lookup_def
> instead of gimple_nop_p and flow_bb_inside_loop_p to decide
> whether a statement needs to be hoisted.
>
> Index: gcc/tree-vect-stmts.c
> ===
> *** gcc/tree-vect-stmts.c   2018-07-30 12:42:35.633169005 +0100
> --- gcc/tree-vect-stmts.c   2018-07-30 12:42:35.629169040 +0100
> *** permute_vec_elements (tree x, tree y, tr
> *** 7322,7370 
>   static bool
>   hoist_defs_of_uses (stmt_vec_info stmt_info, struct loop *loop)
>   {
> ssa_op_iter i;
> tree op;
> bool any = false;
>
> FOR_EACH_SSA_TREE_OPERAND (op, stmt_info->stmt, i, SSA_OP_USE)
> ! {
> !   gimple *def_stmt = SSA_NAME_DEF_STMT (op);
> !   if (!gimple_nop_p (def_stmt)
> ! && flow_bb_inside_loop_p (loop, gimple_bb (def_stmt)))
> !   {
> ! /* Make sure we don't need to recurse.  While we could do
> !so in simple cases when there are more complex use webs
> !we don't have an easy way to preserve stmt order to fulfil
> !dependencies within them.  */
> ! tree op2;
> ! ssa_op_iter i2;
> ! if (gimple_code (def_stmt) == GIMPLE_PHI)
> return false;
> ! FOR_EACH_SSA_TREE_OPERAND (op2, def_stmt, i2, SSA_OP_USE)
> !   {
> ! gimple *def_stmt2 = SSA_NAME_DEF_STMT (op2);
> ! if (!gimple_nop_p (def_stmt2)
> ! && flow_bb_inside_loop_p (loop, gimple_bb (def_stmt2)))
> !   return false;
> !   }
> ! any = true;
> !   }
> ! }
>
> if (!any)
>   return true;
>
> FOR_EACH_SSA_TREE_OPERAND (op, stmt_info->stmt, i, SSA_OP_USE)
> ! {
> !   gimple *def_stmt = SSA_NAME_DEF_STMT (op);
> !   if (!gimple_nop_p (def_stmt)
> ! && flow_bb_inside_loop_p (loop, gimple_bb (def_stmt)))
> !   {
> ! gimple_stmt_iterator gsi = gsi_for_stmt (def_stmt);
> ! gsi_remove (&gsi, false);
> ! gsi_insert_on_edge_immediate (loop_preheader_edge (loop), def_stmt);
> !   }
> ! }
>
> return true;
>   }
> --- 7322,7360 
>   static bool
>   hoist_defs_of_uses (stmt_vec_info stmt_info, struct loop *loop)
>   {
> +   vec_info *vinfo = stmt_info->vinfo;
> ssa_op_iter i;
> tree op;
> bool any = false;
>
> FOR_EACH_SSA_TREE_OPERAND (op, stmt_info->stmt, i, SSA_OP_USE)
> ! if (stmt_vec_info def_stmt_info = vinfo->lookup_def (op))
> !   {
> !   /* Make sure we don't need to recurse.  While we could do
> !  so in simple cases when there are more complex use webs
> !  we don't have an easy way to preserve stmt order to fulfil
> !  dependencies within them.  */
> !   tree op2;
> !   ssa_op_iter i2;
> !   if (gimple_code (def_stmt_info->stmt) == GIMPLE_PHI)
> ! return false;
> !   FOR_EACH_SSA_TREE_OPERAND (op2, def_stmt_info->stmt, i2, SSA_OP_USE)
> ! if (vinfo->lookup_def (op2))
> return false;
> !   any = true;
> !   }
>
> if (!any)
>   return true;
>
> FOR_EACH_SSA_TREE_OPERAND (op, stmt_info->stmt, i, SSA_OP_USE)
> ! if (stmt_vec_info def_stmt_info = vinfo->lookup_def (op))
> !   {
> !   gimple_stmt_iterator gsi = gsi_for_stmt (def_stmt_info->stmt);
> !   gsi_remove (&gsi, false);
> !   gsi_insert_on_edge_immediate (loop_preheader_edge (loop),
> ! def_stmt_info->stmt);
> !   }
>
> return true;
>   }


Re: C++ PATCH for c++/57891, narrowing conversions in non-type template arguments

2018-08-01 Thread Marek Polacek
Ping.

On Mon, Jul 23, 2018 at 04:49:12PM -0400, Marek Polacek wrote:
> On Tue, Jul 03, 2018 at 04:27:33PM -0400, Jason Merrill wrote:
> > On Tue, Jul 3, 2018 at 3:41 PM, Jason Merrill  wrote:
> > > On Tue, Jul 3, 2018 at 2:58 PM, Marek Polacek  wrote:
> > >> On Tue, Jul 03, 2018 at 12:40:51PM -0400, Jason Merrill wrote:
> > >>> On Fri, Jun 29, 2018 at 3:58 PM, Marek Polacek  
> > >>> wrote:
> > >>> > On Wed, Jun 27, 2018 at 07:35:15PM -0400, Jason Merrill wrote:
> > >>> >> On Wed, Jun 27, 2018 at 12:53 PM, Marek Polacek  
> > >>> >> wrote:
> > >>> >> > This PR complains about us accepting invalid code like
> > >>> >> >
> > >>> >> >   template struct A {};
> > >>> >> >   A<-1> a;
> > >>> >> >
> > >>> >> > Where we should detect the narrowing: [temp.arg.nontype] says
> > >>> >> > "A template-argument for a non-type template-parameter shall be a 
> > >>> >> > converted
> > >>> >> > constant expression ([expr.const]) of the type of the 
> > >>> >> > template-parameter."
> > >>> >> > and a converted constant expression can contain only
> > >>> >> > - integral conversions other than narrowing conversions,
> > >>> >> > - [...]."
> > >>> >> > It spurred e.g.
> > >>> >> > 
> > >>> >> > and has >=3 dups so it has some visibility.
> > >>> >> >
> > >>> >> > I think build_converted_constant_expr needs to set check_narrowing.
> > >>> >> > check_narrowing also always mentions that it's in { } but that is 
> > >>> >> > no longer
> > >>> >> > true; in the future it will also apply to <=>.  We'd probably have 
> > >>> >> > to add a new
> > >>> >> > flag to struct conversion if wanted to distinguish between these.
> > >>> >> >
> > >>> >> > This does not yet fix detecting narrowing in function templates 
> > >>> >> > (78244).
> > >>> >> >
> > >>> >> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> > >>> >> >
> > >>> >> > 2018-06-27  Marek Polacek  
> > >>> >> >
> > >>> >> > PR c++/57891
> > >>> >> > * call.c (build_converted_constant_expr): Set 
> > >>> >> > check_narrowing.
> > >>> >> > * decl.c (compute_array_index_type): Add warning sentinel. 
> > >>> >> >  Use
> > >>> >> > input_location.
> > >>> >> > * pt.c (convert_nontype_argument): Return NULL_TREE if any 
> > >>> >> > errors
> > >>> >> > were reported.
> > >>> >> > * typeck2.c (check_narrowing): Don't mention { } in 
> > >>> >> > diagnostic.
> > >>> >> >
> > >>> >> > * g++.dg/cpp0x/Wnarrowing6.C: New test.
> > >>> >> > * g++.dg/cpp0x/Wnarrowing7.C: New test.
> > >>> >> > * g++.dg/cpp0x/Wnarrowing8.C: New test.
> > >>> >> > * g++.dg/cpp0x/constexpr-data2.C: Add dg-error.
> > >>> >> > * g++.dg/init/new43.C: Adjust dg-error.
> > >>> >> > * g++.dg/other/fold1.C: Likewise.
> > >>> >> > * g++.dg/parse/array-size2.C: Likewise.
> > >>> >> > * g++.dg/other/vrp1.C: Add dg-error.
> > >>> >> > * g++.dg/template/char1.C: Likewise.
> > >>> >> > * g++.dg/ext/builtin12.C: Likewise.
> > >>> >> > * g++.dg/template/dependent-name3.C: Adjust dg-error.
> > >>> >> >
> > >>> >> > diff --git gcc/cp/call.c gcc/cp/call.c
> > >>> >> > index 209c1fd2f0e..956c7b149dc 100644
> > >>> >> > --- gcc/cp/call.c
> > >>> >> > +++ gcc/cp/call.c
> > >>> >> > @@ -4152,7 +4152,10 @@ build_converted_constant_expr (tree type, 
> > >>> >> > tree expr, tsubst_flags_t complain)
> > >>> >> >  }
> > >>> >> >
> > >>> >> >if (conv)
> > >>> >> > -expr = convert_like (conv, expr, complain);
> > >>> >> > +{
> > >>> >> > +  conv->check_narrowing = !processing_template_decl;
> > >>> >>
> > >>> >> Why !processing_template_decl?  This needs a comment.
> > >>> >
> > >>> > Otherwise we'd warn for e.g.
> > >>> >
> > >>> > template<int N> struct S { char a[N]; };
> > >>> > S<1> s;
> > >>> >
> > >>> > where compute_array_index_type will try to convert the size of the 
> > >>> > array (which
> > >>> > is a template_parm_index of type int when parsing the template) to 
> > >>> > size_type.
> > >>> > So I guess I can say that we need to wait for instantiation?
> > >>>
> > >>> We certainly shouldn't give a narrowing diagnostic about a
> > >>> value-dependent expression.  It probably makes sense to check that at
> > >>> the top of check_narrowing, with all the other early exit conditions.
> > >>> But if we do know the constant value in the template, it's good to
> > >>> complain then rather than wait for instantiation.
> > >>
> > >> Makes sense; how about this then?  (Regtest/bootstrap running.)
> > >>
> > >> 2018-07-03  Marek Polacek  
> > >>
> > >> PR c++/57891
> > >> * call.c (build_converted_constant_expr): Set check_narrowing.
> > >> * decl.c (compute_array_index_type): Add warning sentinel.  Use
> > >> input_location.
> > >> * pt.c (convert_nontype_argument): Return NULL_TREE if any errors
> > >> were 

Re: [07/11] Use single basic block array in loop_vec_info

2018-08-01 Thread Richard Biener
On Mon, Jul 30, 2018 at 1:42 PM Richard Sandiford
 wrote:
>
> _loop_vec_info::_loop_vec_info used get_loop_array to get the
> order of the blocks when creating stmt_vec_infos, but then used
> dfs_enumerate_from to get the order of the blocks that the rest
> of the vectoriser uses.  We should be able to use that order
> for creating stmt_vec_infos too.

OK.  Note I have rev_post_order_and_mark_dfs_back_seme for a patch I'm working
on (RPO order on a single-entry multiple-exit region).  I'll try to
remember that "fixme".

Richard.

>
> 2018-07-30  Richard Sandiford  
>
> gcc/
> * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Use the
> result of dfs_enumerate_from when constructing stmt_vec_infos,
> instead of additionally calling get_loop_body.
>
> Index: gcc/tree-vect-loop.c
> ===
> *** gcc/tree-vect-loop.c    2018-07-30 12:40:59.366015643 +0100
> --- gcc/tree-vect-loop.c    2018-07-30 12:40:59.362015678 +0100
> *** _loop_vec_info::_loop_vec_info (struct l
> *** 834,844 
>   scalar_loop (NULL),
>   orig_loop_info (NULL)
>   {
> !   /* Create/Update stmt_info for all stmts in the loop.  */
> !   basic_block *body = get_loop_body (loop);
> !   for (unsigned int i = 0; i < loop->num_nodes; i++)
>   {
> !   basic_block bb = body[i];
> gimple_stmt_iterator si;
>
> for (si = gsi_start_phis (bb); !gsi_end_p (si); gsi_next (&si))
> --- 834,851 
>   scalar_loop (NULL),
>   orig_loop_info (NULL)
>   {
> !   /* CHECKME: We want to visit all BBs before their successors (except for
> !  latch blocks, for which this assertion wouldn't hold).  In the simple
> !  case of the loop forms we allow, a dfs order of the BBs would the same
> !  as reversed postorder traversal, so we are safe.  */
> !
> !   unsigned int nbbs = dfs_enumerate_from (loop->header, 0, bb_in_loop_p,
> ! bbs, loop->num_nodes, loop);
> !   gcc_assert (nbbs == loop->num_nodes);
> !
> !   for (unsigned int i = 0; i < nbbs; i++)
>   {
> !   basic_block bb = bbs[i];
> gimple_stmt_iterator si;
>
> for (si = gsi_start_phis (bb); !gsi_end_p (si); gsi_next (&si))
> *** _loop_vec_info::_loop_vec_info (struct l
> *** 855,870 
>   add_stmt (stmt);
> }
>   }
> -   free (body);
> -
> -   /* CHECKME: We want to visit all BBs before their successors (except for
> -  latch blocks, for which this assertion wouldn't hold).  In the simple
> -  case of the loop forms we allow, a dfs order of the BBs would the same
> -  as reversed postorder traversal, so we are safe.  */
> -
> -   unsigned int nbbs = dfs_enumerate_from (loop->header, 0, bb_in_loop_p,
> - bbs, loop->num_nodes, loop);
> -   gcc_assert (nbbs == loop->num_nodes);
>   }
>
>   /* Free all levels of MASKS.  */
> --- 862,867 


Re: [06/11] Handle VMAT_INVARIANT separately

2018-08-01 Thread Richard Biener
On Mon, Jul 30, 2018 at 1:41 PM Richard Sandiford
 wrote:
>
> Invariant loads were handled as a variation on the code for contiguous
> loads.  We detected whether they were invariant or not as a byproduct of
> creating the vector pointer ivs: vect_create_data_ref_ptr passed back an
> inv_p to say whether the pointer was invariant.
>
> But vectorised invariant loads just keep the original scalar load,
> so this meant that detecting invariant loads had the side-effect of
> creating an unwanted vector pointer iv.  The placement of the code
> also meant that we'd create a vector load and then not use the result.
> In principle this is wrong code, since there's no guarantee that there's
> a vector's worth of accessible data at that address, but we rely on DCE
> to get rid of the load before any harm is done.
>
> E.g., for an invariant load in an inner loop (which seems like the more
> common use case for this code), we'd create:
>
>vectp_a.6_52 =  + 4;
>
># vectp_a.5_53 = PHI 
>
># vectp_a.5_55 = PHI 
>
>vect_next_a_11.7_57 = MEM[(int *)vectp_a.5_55];
>next_a_11 = a[_1];
>vect_cst__58 = {next_a_11, next_a_11, next_a_11, next_a_11};
>
>vectp_a.5_56 = vectp_a.5_55 + 4;
>
>vectp_a.5_54 = vectp_a.5_53 + 0;
>
> whereas all we want is:
>
>next_a_11 = a[_1];
>vect_cst__58 = {next_a_11, next_a_11, next_a_11, next_a_11};
>
> This patch moves the handling to its own block and makes
> vect_create_data_ref_ptr assert (when creating a full iv) that the
> address isn't invariant.
>
> The ncopies handling is unfortunate, but a preexisting issue.
> Richi's suggestion of using a vector of vector statements would
> let us reuse one statement for all copies.

OK.

Richard.
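
To make the "all we want" form concrete, here is a minimal sketch of a vectorised loop body that keeps the scalar load of the invariant element and only broadcasts it, so no vector-wide read of that address takes place. Vec4, splat and add_invariant are hypothetical names for illustration, the 4-lane width is an assumption taken from the dump in the quoted description, and this is not the code the vectoriser emits.

```cpp
#include <array>
#include <cstddef>

// Toy 4-lane vector; stands in for a real vector type.
using Vec4 = std::array<int, 4>;

// Broadcast one scalar into every lane, like the vect_cst__58 constructor.
Vec4 splat(int x)
{
  return {x, x, x, x};
}

// out[i] += a[k] for i in [0, n).  Assumes n is a multiple of 4 and that
// out does not overlap a[k], so the load really is loop-invariant.
void add_invariant(int *out, std::size_t n, const int *a, std::size_t k)
{
  for (std::size_t i = 0; i < n; i += 4)
    {
      int next_a = a[k];          // scalar load: touches one element only
      Vec4 cst = splat(next_a);   // no vector load from &a[k] is needed
      for (std::size_t lane = 0; lane < 4; ++lane)
        out[i + lane] += cst[lane];
    }
}
```

Because only a[k] is read, the sketch never depends on a full vector's worth of data being accessible at that address, which is the hazard the description points out.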

>
> 2018-07-30  Richard Sandiford  
>
> gcc/
> * tree-vectorizer.h (vect_create_data_ref_ptr): Remove inv_p
> parameter.
> * tree-vect-data-refs.c (vect_create_data_ref_ptr): Likewise.
> When creating an iv, assert that the step is not known to be zero.
> (vect_setup_realignment): Update call accordingly.
> * tree-vect-stmts.c (vectorizable_store): Likewise.
> (vectorizable_load): Likewise.  Handle VMAT_INVARIANT separately.
>
> Index: gcc/tree-vectorizer.h
> ===
> *** gcc/tree-vectorizer.h   2018-07-30 12:32:29.586506669 +0100
> --- gcc/tree-vectorizer.h   2018-07-30 12:40:13.0 +0100
> *** extern bool vect_analyze_data_refs (vec_
> *** 1527,1533 
>   extern void vect_record_base_alignments (vec_info *);
>   extern tree vect_create_data_ref_ptr (stmt_vec_info, tree, struct loop *, 
> tree,
>   tree *, gimple_stmt_iterator *,
> ! gimple **, bool, bool *,
>   tree = NULL_TREE, tree = NULL_TREE);
>   extern tree bump_vector_ptr (tree, gimple *, gimple_stmt_iterator *,
>  stmt_vec_info, tree);
> --- 1527,1533 
>   extern void vect_record_base_alignments (vec_info *);
>   extern tree vect_create_data_ref_ptr (stmt_vec_info, tree, struct loop *, 
> tree,
>   tree *, gimple_stmt_iterator *,
> ! gimple **, bool,
>   tree = NULL_TREE, tree = NULL_TREE);
>   extern tree bump_vector_ptr (tree, gimple *, gimple_stmt_iterator *,
>  stmt_vec_info, tree);
> Index: gcc/tree-vect-data-refs.c
> ===
> *** gcc/tree-vect-data-refs.c   2018-07-30 12:32:26.214536374 +0100
> --- gcc/tree-vect-data-refs.c   2018-07-30 12:32:32.546480596 +0100
> *** vect_create_addr_base_for_vector_ref (st
> *** 4674,4689 
>
> Return the increment stmt that updates the pointer in PTR_INCR.
>
> !3. Set INV_P to true if the access pattern of the data reference in the
> !   vectorized loop is invariant.  Set it to false otherwise.
> !
> !4. Return the pointer.  */
>
>   tree
>   vect_create_data_ref_ptr (stmt_vec_info stmt_info, tree aggr_type,
>   struct loop *at_loop, tree offset,
>   tree *initial_address, gimple_stmt_iterator *gsi,
> ! gimple **ptr_incr, bool only_init, bool *inv_p,
>   tree byte_offset, tree iv_step)
>   {
> const char *base_name;
> --- 4674,4686 
>
> Return the increment stmt that updates the pointer in PTR_INCR.
>
> !3. Return the pointer.  */
>
>   tree
>   vect_create_data_ref_ptr (stmt_vec_info stmt_info, tree aggr_type,
>   struct loop *at_loop, tree offset,
>   tree *initial_address, gimple_stmt_iterator *gsi,
> ! gimple **ptr_incr, bool only_init,
>   tree byte_offset, tree iv_step)
>   {
> const char 

Re: [05/11] Add a vect_stmt_to_vectorize helper function

2018-08-01 Thread Richard Biener
On Mon, Jul 30, 2018 at 1:39 PM Richard Sandiford
 wrote:
>
> This patch adds a helper that does the opposite of vect_orig_stmt:
> go from the original scalar statement to the statement that should
> actually be vectorised.
>
> The uses in the last two hunks of vectorizable_reduction are safe because
> reduc_stmt_info (first hunk) and stmt_info (second hunk) are already
> pattern statements if appropriate.

OK.

Richard.
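
The relationship between this helper and vect_orig_stmt can be sketched with a toy model. ToyStmt and its fields are hypothetical stand-ins for stmt_vec_info and the STMT_VINFO_* accessors; the sketch only assumes what the two helper comments say, namely that an original statement and its pattern replacement point at each other through a "related" link.

```cpp
// Toy stand-in for stmt_vec_info; the fields are hypothetical.
struct ToyStmt
{
  bool in_pattern = false;    // original stmt that has a pattern replacement
  bool is_pattern = false;    // the replacement (pattern) stmt itself
  ToyStmt *related = nullptr; // the other half of the pair, if any
};

// Original -> statement that should actually be vectorised.
ToyStmt *toy_stmt_to_vectorize(ToyStmt *s)
{
  return s->in_pattern ? s->related : s;
}

// Pattern statement -> original scalar statement.
ToyStmt *toy_orig_stmt(ToyStmt *s)
{
  return s->is_pattern ? s->related : s;
}
```

In this model the two functions undo each other on a linked pair, and both are the identity on a statement that takes no part in a pattern.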

>
> 2018-07-30  Richard Sandiford  
>
> gcc/
> * tree-vectorizer.h (vect_stmt_to_vectorize): New function.
> * tree-vect-loop.c (vect_update_vf_for_slp): Use it.
> (vectorizable_reduction): Likewise.
> * tree-vect-slp.c (vect_analyze_slp_instance): Likewise.
> (vect_detect_hybrid_slp_stmts): Likewise.
> * tree-vect-stmts.c (vect_is_simple_use): Likewise.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2018-07-30 12:32:26.218536339 +0100
> +++ gcc/tree-vectorizer.h   2018-07-30 12:32:29.586506669 +0100
> @@ -1131,6 +1131,17 @@ vect_orig_stmt (stmt_vec_info stmt_info)
>return stmt_info;
>  }
>
> +/* If STMT_INFO has been replaced by a pattern statement, return the
> +   replacement statement, otherwise return STMT_INFO itself.  */
> +
> +inline stmt_vec_info
> +vect_stmt_to_vectorize (stmt_vec_info stmt_info)
> +{
> +  if (STMT_VINFO_IN_PATTERN_P (stmt_info))
> +return STMT_VINFO_RELATED_STMT (stmt_info);
> +  return stmt_info;
> +}
> +
>  /* Return true if BB is a loop header.  */
>
>  static inline bool
> Index: gcc/tree-vect-loop.c
> ===
> --- gcc/tree-vect-loop.c2018-07-30 12:32:26.214536374 +0100
> +++ gcc/tree-vect-loop.c2018-07-30 12:32:29.586506669 +0100
> @@ -1424,9 +1424,7 @@ vect_update_vf_for_slp (loop_vec_info lo
>gsi_next ())
> {
>   stmt_vec_info stmt_info = loop_vinfo->lookup_stmt (gsi_stmt (si));
> - if (STMT_VINFO_IN_PATTERN_P (stmt_info)
> - && STMT_VINFO_RELATED_STMT (stmt_info))
> -   stmt_info = STMT_VINFO_RELATED_STMT (stmt_info);
> + stmt_info = vect_stmt_to_vectorize (stmt_info);
>   if ((STMT_VINFO_RELEVANT_P (stmt_info)
>|| VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE (stmt_info)))
>   && !PURE_SLP_STMT (stmt_info))
> @@ -6111,8 +6109,7 @@ vectorizable_reduction (stmt_vec_info st
> return true;
>
>stmt_vec_info reduc_stmt_info = STMT_VINFO_REDUC_DEF (stmt_info);
> -  if (STMT_VINFO_IN_PATTERN_P (reduc_stmt_info))
> -   reduc_stmt_info = STMT_VINFO_RELATED_STMT (reduc_stmt_info);
> +  reduc_stmt_info = vect_stmt_to_vectorize (reduc_stmt_info);
>
>if (STMT_VINFO_VEC_REDUCTION_TYPE (reduc_stmt_info)
>   == EXTRACT_LAST_REDUCTION)
> @@ -6145,8 +6142,7 @@ vectorizable_reduction (stmt_vec_info st
>if (ncopies > 1
>   && STMT_VINFO_RELEVANT (reduc_stmt_info) <= vect_used_only_live
>   && (use_stmt_info = loop_vinfo->lookup_single_use (phi_result))
> - && (use_stmt_info == reduc_stmt_info
> - || STMT_VINFO_RELATED_STMT (use_stmt_info) == reduc_stmt_info))
> + && vect_stmt_to_vectorize (use_stmt_info) == reduc_stmt_info)
> single_defuse_cycle = true;
>
>/* Create the destination vector  */
> @@ -6915,8 +6911,7 @@ vectorizable_reduction (stmt_vec_info st
>if (ncopies > 1
>&& (STMT_VINFO_RELEVANT (stmt_info) <= vect_used_only_live)
>&& (use_stmt_info = loop_vinfo->lookup_single_use (reduc_phi_result))
> -  && (use_stmt_info == stmt_info
> - || STMT_VINFO_RELATED_STMT (use_stmt_info) == stmt_info))
> +  && vect_stmt_to_vectorize (use_stmt_info) == stmt_info)
>  {
>single_defuse_cycle = true;
>epilog_copies = 1;
> Index: gcc/tree-vect-slp.c
> ===
> --- gcc/tree-vect-slp.c 2018-07-30 12:32:26.218536339 +0100
> +++ gcc/tree-vect-slp.c 2018-07-30 12:32:29.586506669 +0100
> @@ -1969,11 +1969,7 @@ vect_analyze_slp_instance (vec_info *vin
>/* Collect the stores and store them in SLP_TREE_SCALAR_STMTS.  */
>while (next_info)
>  {
> - if (STMT_VINFO_IN_PATTERN_P (next_info)
> - && STMT_VINFO_RELATED_STMT (next_info))
> -   scalar_stmts.safe_push (STMT_VINFO_RELATED_STMT (next_info));
> - else
> -   scalar_stmts.safe_push (next_info);
> + scalar_stmts.safe_push (vect_stmt_to_vectorize (next_info));
>   next_info = DR_GROUP_NEXT_ELEMENT (next_info);
>  }
>  }
> @@ -1983,11 +1979,7 @@ vect_analyze_slp_instance (vec_info *vin
>  SLP_TREE_SCALAR_STMTS.  */
>while (next_info)
>  {
> - if (STMT_VINFO_IN_PATTERN_P (next_info)
> - && STMT_VINFO_RELATED_STMT (next_info))
> -   scalar_stmts.safe_push 

Re: [04/11] Add a vect_orig_stmt helper function

2018-08-01 Thread Richard Biener
On Mon, Jul 30, 2018 at 1:38 PM Richard Sandiford
 wrote:
>
> This patch just adds a helper function for going from a potential
> pattern statement to the original scalar statement.

OK.

Richard.

>
> 2018-07-30  Richard Sandiford  
>
> gcc/
> * tree-vectorizer.h (vect_orig_stmt): New function.
> * tree-vect-data-refs.c (vect_preserves_scalar_order_p): Use it.
> * tree-vect-loop.c (vect_model_reduction_cost): Likewise.
> (vect_create_epilog_for_reduction): Likewise.
> (vectorizable_live_operation): Likewise.
> * tree-vect-slp.c (vect_find_last_scalar_stmt_in_slp): Likewise.
> (vect_detect_hybrid_slp_stmts, vect_schedule_slp): Likewise.
> * tree-vect-stmts.c (vectorizable_call): Likewise.
> (vectorizable_simd_clone_call, vect_remove_stores): Likewise.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2018-07-30 12:32:22.718567174 +0100
> +++ gcc/tree-vectorizer.h   2018-07-30 12:32:26.218536339 +0100
> @@ -1120,6 +1120,17 @@ is_pattern_stmt_p (stmt_vec_info stmt_in
>return stmt_info->pattern_stmt_p;
>  }
>
> +/* If STMT_INFO is a pattern statement, return the statement that it
> +   replaces, otherwise return STMT_INFO itself.  */
> +
> +inline stmt_vec_info
> +vect_orig_stmt (stmt_vec_info stmt_info)
> +{
> +  if (is_pattern_stmt_p (stmt_info))
> +return STMT_VINFO_RELATED_STMT (stmt_info);
> +  return stmt_info;
> +}
> +
>  /* Return true if BB is a loop header.  */
>
>  static inline bool
> Index: gcc/tree-vect-data-refs.c
> ===
> --- gcc/tree-vect-data-refs.c   2018-07-30 12:32:08.934688600 +0100
> +++ gcc/tree-vect-data-refs.c   2018-07-30 12:32:26.214536374 +0100
> @@ -214,10 +214,8 @@ vect_preserves_scalar_order_p (dr_vec_in
>   (but could happen later) while reads will happen no later than their
>   current position (but could happen earlier).  Reordering is therefore
>   only possible if the first access is a write.  */
> -  if (is_pattern_stmt_p (stmtinfo_a))
> -stmtinfo_a = STMT_VINFO_RELATED_STMT (stmtinfo_a);
> -  if (is_pattern_stmt_p (stmtinfo_b))
> -stmtinfo_b = STMT_VINFO_RELATED_STMT (stmtinfo_b);
> +  stmtinfo_a = vect_orig_stmt (stmtinfo_a);
> +  stmtinfo_b = vect_orig_stmt (stmtinfo_b);
>stmt_vec_info earlier_stmt_info = get_earlier_stmt (stmtinfo_a, 
> stmtinfo_b);
>return !DR_IS_WRITE (STMT_VINFO_DATA_REF (earlier_stmt_info));
>  }
> Index: gcc/tree-vect-loop.c
> ===
> --- gcc/tree-vect-loop.c2018-07-30 12:32:22.714567210 +0100
> +++ gcc/tree-vect-loop.c2018-07-30 12:32:26.214536374 +0100
> @@ -3814,10 +3814,7 @@ vect_model_reduction_cost (stmt_vec_info
>
>vectype = STMT_VINFO_VECTYPE (stmt_info);
>mode = TYPE_MODE (vectype);
> -  stmt_vec_info orig_stmt_info = STMT_VINFO_RELATED_STMT (stmt_info);
> -
> -  if (!orig_stmt_info)
> -orig_stmt_info = stmt_info;
> +  stmt_vec_info orig_stmt_info = vect_orig_stmt (stmt_info);
>
>code = gimple_assign_rhs_code (orig_stmt_info->stmt);
>
> @@ -4738,13 +4735,8 @@ vect_create_epilog_for_reduction (vec
>   Otherwise (it is a regular reduction) - the tree-code and scalar-def
>   are taken from STMT.  */
>
> -  stmt_vec_info orig_stmt_info = STMT_VINFO_RELATED_STMT (stmt_info);
> -  if (!orig_stmt_info)
> -{
> -  /* Regular reduction  */
> -  orig_stmt_info = stmt_info;
> -}
> -  else
> +  stmt_vec_info orig_stmt_info = vect_orig_stmt (stmt_info);
> +  if (orig_stmt_info != stmt_info)
>  {
>/* Reduction pattern  */
>gcc_assert (STMT_VINFO_IN_PATTERN_P (orig_stmt_info));
> @@ -5540,11 +5532,7 @@ vect_create_epilog_for_reduction (vec
>if (REDUC_GROUP_FIRST_ELEMENT (stmt_info))
>  {
>stmt_vec_info dest_stmt_info
> -   = SLP_TREE_SCALAR_STMTS (slp_node)[group_size - 1];
> -  /* Handle reduction patterns.  */
> -  if (STMT_VINFO_RELATED_STMT (dest_stmt_info))
> -   dest_stmt_info = STMT_VINFO_RELATED_STMT (dest_stmt_info);
> -
> +   = vect_orig_stmt (SLP_TREE_SCALAR_STMTS (slp_node)[group_size - 1]);
>scalar_dest = gimple_assign_lhs (dest_stmt_info->stmt);
>group_size = 1;
>  }
> @@ -7898,10 +7886,8 @@ vectorizable_live_operation (stmt_vec_in
>return true;
>  }
>
> -  /* If stmt has a related stmt, then use that for getting the lhs.  */
> -  gimple *stmt = (is_pattern_stmt_p (stmt_info)
> - ? STMT_VINFO_RELATED_STMT (stmt_info)->stmt
> - : stmt_info->stmt);
> +  /* Use the lhs of the original scalar statement.  */
> +  gimple *stmt = vect_orig_stmt (stmt_info)->stmt;
>
>lhs = (is_a  (stmt)) ? gimple_phi_result (stmt)
> : gimple_get_lhs (stmt);
> Index: gcc/tree-vect-slp.c
> ===
> 

Re: [03/11] Remove vect_transform_stmt grouped_store argument

2018-08-01 Thread Richard Biener
On Mon, Jul 30, 2018 at 1:38 PM Richard Sandiford
 wrote:
>
> Nothing now uses the grouped_store value passed back by
> vect_transform_stmt, so we might as well remove it.

OK.

>
> 2018-07-30  Richard Sandiford  
>
> gcc/
> * tree-vectorizer.h (vect_transform_stmt): Remove grouped_store
> argument.
> * tree-vect-stmts.c (vect_transform_stmt): Likewise.
> * tree-vect-loop.c (vect_transform_loop_stmt): Update call 
> accordingly.
> (vect_transform_loop): Likewise.
> * tree-vect-slp.c (vect_schedule_slp_instance): Likewise.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2018-07-30 12:32:19.366596715 +0100
> +++ gcc/tree-vectorizer.h   2018-07-30 12:32:22.718567174 +0100
> @@ -1459,7 +1459,7 @@ extern tree vect_init_vector (stmt_vec_i
>gimple_stmt_iterator *);
>  extern tree vect_get_vec_def_for_stmt_copy (vec_info *, tree);
>  extern bool vect_transform_stmt (stmt_vec_info, gimple_stmt_iterator *,
> - bool *, slp_tree, slp_instance);
> +slp_tree, slp_instance);
>  extern void vect_remove_stores (stmt_vec_info);
>  extern bool vect_analyze_stmt (stmt_vec_info, bool *, slp_tree, slp_instance,
>stmt_vector_for_cost *);
> Index: gcc/tree-vect-stmts.c
> ===
> --- gcc/tree-vect-stmts.c   2018-07-30 12:32:09.114687014 +0100
> +++ gcc/tree-vect-stmts.c   2018-07-30 12:32:22.718567174 +0100
> @@ -9662,8 +9662,7 @@ vect_analyze_stmt (stmt_vec_info stmt_in
>
>  bool
>  vect_transform_stmt (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> -bool *grouped_store, slp_tree slp_node,
> - slp_instance slp_node_instance)
> +slp_tree slp_node, slp_instance slp_node_instance)
>  {
>vec_info *vinfo = stmt_info->vinfo;
>bool is_store = false;
> @@ -9727,7 +9726,6 @@ vect_transform_stmt (stmt_vec_info stmt_
>  last store in the chain is reached.  Store stmts before the last
>  one are skipped, and there vec_stmt_info shouldn't be freed
>  meanwhile.  */
> - *grouped_store = true;
>   stmt_vec_info group_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
>   if (DR_GROUP_STORE_COUNT (group_info) == DR_GROUP_SIZE (group_info))
> is_store = true;
> Index: gcc/tree-vect-loop.c
> ===
> --- gcc/tree-vect-loop.c2018-07-30 12:32:16.190624704 +0100
> +++ gcc/tree-vect-loop.c2018-07-30 12:32:22.714567210 +0100
> @@ -8243,8 +8243,7 @@ vect_transform_loop_stmt (loop_vec_info
>if (dump_enabled_p ())
>  dump_printf_loc (MSG_NOTE, vect_location, "transform statement.\n");
>
> -  bool grouped_store = false;
> -  if (vect_transform_stmt (stmt_info, gsi, &grouped_store, NULL, NULL))
> +  if (vect_transform_stmt (stmt_info, gsi, NULL, NULL))
>  *seen_store = stmt_info;
>  }
>
> @@ -8425,7 +8424,7 @@ vect_transform_loop (loop_vec_info loop_
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location, "transform phi.\n");
> - vect_transform_stmt (stmt_info, NULL, NULL, NULL, NULL);
> + vect_transform_stmt (stmt_info, NULL, NULL, NULL);
> }
> }
>
> Index: gcc/tree-vect-slp.c
> ===
> --- gcc/tree-vect-slp.c 2018-07-30 12:32:19.366596715 +0100
> +++ gcc/tree-vect-slp.c 2018-07-30 12:32:22.714567210 +0100
> @@ -3853,7 +3853,6 @@ vect_transform_slp_perm_load (slp_tree n
>  vect_schedule_slp_instance (slp_tree node, slp_instance instance,
> scalar_stmts_to_slp_tree_map_t *bst_map)
>  {
> -  bool grouped_store;
>gimple_stmt_iterator si;
>stmt_vec_info stmt_info;
>unsigned int group_size;
> @@ -3945,11 +3944,11 @@ vect_schedule_slp_instance (slp_tree nod
>   vec v1;
>   unsigned j;
>   tree tmask = NULL_TREE;
> - vect_transform_stmt (stmt_info, &si, &grouped_store, node,
> instance);
> + vect_transform_stmt (stmt_info, &si, node, instance);
>   v0 = SLP_TREE_VEC_STMTS (node).copy ();
>   SLP_TREE_VEC_STMTS (node).truncate (0);
>   gimple_assign_set_rhs_code (stmt, ocode);
> - vect_transform_stmt (stmt_info, &si, &grouped_store, node,
> instance);
> + vect_transform_stmt (stmt_info, &si, node, instance);
>   gimple_assign_set_rhs_code (stmt, code0);
>   v1 = SLP_TREE_VEC_STMTS (node).copy ();
>   SLP_TREE_VEC_STMTS (node).truncate (0);
> @@ -3994,7 +3993,7 @@ vect_schedule_slp_instance (slp_tree nod
>   return;
> }
>  }
> -  vect_transform_stmt (stmt_info, &si, &grouped_store, node, instance);
> +  vect_transform_stmt 

Re: [02/11] Remove vect_schedule_slp return value

2018-08-01 Thread Richard Biener
On Mon, Jul 30, 2018 at 1:37 PM Richard Sandiford
 wrote:
>
> Nothing now uses the vect_schedule_slp return value, so it's not worth
> propagating the value through vect_schedule_slp_instance.

OK.

>
> 2018-07-30  Richard Sandiford  
>
> gcc/
> * tree-vectorizer.h (vect_schedule_slp): Return void.
> * tree-vect-slp.c (vect_schedule_slp_instance): Likewise.
> (vect_schedule_slp): Likewise.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2018-07-30 12:32:09.114687014 +0100
> +++ gcc/tree-vectorizer.h   2018-07-30 12:32:19.366596715 +0100
> @@ -1575,7 +1575,7 @@ extern bool vect_transform_slp_perm_load
>   gimple_stmt_iterator *, poly_uint64,
>   slp_instance, bool, unsigned *);
>  extern bool vect_slp_analyze_operations (vec_info *);
> -extern bool vect_schedule_slp (vec_info *);
> +extern void vect_schedule_slp (vec_info *);
>  extern bool vect_analyze_slp (vec_info *, unsigned);
>  extern bool vect_make_slp_decision (loop_vec_info);
>  extern void vect_detect_hybrid_slp (loop_vec_info);
> Index: gcc/tree-vect-slp.c
> ===
> --- gcc/tree-vect-slp.c 2018-07-30 12:32:09.026687790 +0100
> +++ gcc/tree-vect-slp.c 2018-07-30 12:32:19.366596715 +0100
> @@ -3849,11 +3849,11 @@ vect_transform_slp_perm_load (slp_tree n
>
>  /* Vectorize SLP instance tree in postorder.  */
>
> -static bool
> +static void
>  vect_schedule_slp_instance (slp_tree node, slp_instance instance,
> scalar_stmts_to_slp_tree_map_t *bst_map)
>  {
> -  bool grouped_store, is_store;
> +  bool grouped_store;
>gimple_stmt_iterator si;
>stmt_vec_info stmt_info;
>unsigned int group_size;
> @@ -3862,14 +3862,14 @@ vect_schedule_slp_instance (slp_tree nod
>slp_tree child;
>
>if (SLP_TREE_DEF_TYPE (node) != vect_internal_def)
> -return false;
> +return;
>
>/* See if we have already vectorized the same set of stmts and reuse their
>   vectorized stmts.  */
>if (slp_tree *leader = bst_map->get (SLP_TREE_SCALAR_STMTS (node)))
>  {
>SLP_TREE_VEC_STMTS (node).safe_splice (SLP_TREE_VEC_STMTS (*leader));
> -  return false;
> +  return;
>  }
>
>bst_map->put (SLP_TREE_SCALAR_STMTS (node).copy (), node);
> @@ -3991,11 +3991,10 @@ vect_schedule_slp_instance (slp_tree nod
> }
>   v0.release ();
>   v1.release ();
> - return false;
> + return;
> }
>  }
> -  is_store = vect_transform_stmt (stmt_info, &si, &grouped_store, node,
> - instance);
> +  vect_transform_stmt (stmt_info, &si, &grouped_store, node, instance);
>
>/* Restore stmt def-types.  */
>FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
> @@ -4005,8 +4004,6 @@ vect_schedule_slp_instance (slp_tree nod
> FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (child), j, child_stmt_info)
>   STMT_VINFO_DEF_TYPE (child_stmt_info) = vect_internal_def;
>}
> -
> -  return is_store;
>  }
>
>  /* Replace scalar calls from SLP node NODE with setting of their lhs to zero.
> @@ -4048,14 +4045,12 @@ vect_remove_slp_scalar_calls (slp_tree n
>
>  /* Generate vector code for all SLP instances in the loop/basic block.  */
>
> -bool
> +void
>  vect_schedule_slp (vec_info *vinfo)
>  {
>vec slp_instances;
>slp_instance instance;
>unsigned int i;
> -  bool is_store = false;
> -
>
>scalar_stmts_to_slp_tree_map_t *bst_map
>  = new scalar_stmts_to_slp_tree_map_t ();
> @@ -4063,8 +4058,8 @@ vect_schedule_slp (vec_info *vinfo)
>FOR_EACH_VEC_ELT (slp_instances, i, instance)
>  {
>/* Schedule the tree of INSTANCE.  */
> -  is_store = vect_schedule_slp_instance (SLP_INSTANCE_TREE (instance),
> - instance, bst_map);
> +  vect_schedule_slp_instance (SLP_INSTANCE_TREE (instance),
> + instance, bst_map);
>if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location,
>   "vectorizing stmts using SLP.\n");
> @@ -4099,6 +4094,4 @@ vect_schedule_slp (vec_info *vinfo)
>   vinfo->remove_stmt (store_info);
>  }
>  }
> -
> -  return is_store;
>  }


Re: [01/11] Schedule SLP earlier

2018-08-01 Thread Richard Biener
On Mon, Jul 30, 2018 at 1:37 PM Richard Sandiford
 wrote:
>
> vect_transform_loop used to call vect_schedule_slp lazily when it
> came across the first SLP statement, but it seems easier to do it
> before the main loop.

Indeed.

OK.
Richard.
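
A rough sketch of why the up-front form is easier to follow, using a toy model of the control flow only. SlpType, Counts and both functions are hypothetical; the model assumes that SLP instances exist exactly when some statement is marked as SLP, which the real vectoriser does not necessarily guarantee.

```cpp
#include <vector>

// Toy classification of a statement; not GCC's slp_vect_type.
enum class SlpType { loop_vect, pure_slp, hybrid };

struct Counts
{
  int schedule_calls = 0;   // stand-in for calls to vect_schedule_slp
  int loop_transforms = 0;  // statements handled by loop vectorization
};

// Old shape: schedule lazily when the first SLP statement is reached,
// tracking that with a running flag.
Counts transform_lazy(const std::vector<SlpType> &stmts)
{
  Counts c;
  bool slp_scheduled = false;
  for (SlpType t : stmts)
    {
      if (t != SlpType::loop_vect)
        {
          if (!slp_scheduled)
            {
              slp_scheduled = true;
              ++c.schedule_calls;
            }
          if (t == SlpType::pure_slp)
            continue;   // pure SLP needs no loop vectorization
        }
      ++c.loop_transforms;
    }
  return c;
}

// New shape: schedule once before the walk if there is any SLP work.
Counts transform_eager(const std::vector<SlpType> &stmts)
{
  Counts c;
  bool have_slp = false;   // stand-in for !slp_instances.is_empty ()
  for (SlpType t : stmts)
    have_slp |= (t != SlpType::loop_vect);
  if (have_slp)
    ++c.schedule_calls;
  for (SlpType t : stmts)
    if (t != SlpType::pure_slp)
      ++c.loop_transforms;
  return c;
}
```

Under that assumption both versions schedule at most once and hand the same statements to loop vectorization; the eager one simply no longer needs the flag threaded through the per-statement code.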

>
> 2018-07-30  Richard Sandiford  
>
> gcc/
> * tree-vect-loop.c (vect_transform_loop_stmt): Remove slp_scheduled
> argument.
> (vect_transform_loop): Update calls accordingly.  Schedule SLP
> instances before the main loop, if any exist.
>
> Index: gcc/tree-vect-loop.c
> ===
> --- gcc/tree-vect-loop.c2018-07-30 12:32:15.0 +0100
> +++ gcc/tree-vect-loop.c2018-07-30 12:32:16.190624704 +0100
> @@ -8199,14 +8199,12 @@ scale_profile_for_vect_loop (struct loop
>  }
>
>  /* Vectorize STMT_INFO if relevant, inserting any new instructions before 
> GSI.
> -   When vectorizing STMT_INFO as a store, set *SEEN_STORE to its 
> stmt_vec_info.
> -   *SLP_SCHEDULE is a running record of whether we have called
> -   vect_schedule_slp.  */
> +   When vectorizing STMT_INFO as a store, set *SEEN_STORE to its
> +   stmt_vec_info.  */
>
>  static void
>  vect_transform_loop_stmt (loop_vec_info loop_vinfo, stmt_vec_info stmt_info,
> - gimple_stmt_iterator *gsi,
> - stmt_vec_info *seen_store, bool *slp_scheduled)
> + gimple_stmt_iterator *gsi, stmt_vec_info 
> *seen_store)
>  {
>struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> @@ -8237,24 +8235,10 @@ vect_transform_loop_stmt (loop_vec_info
> dump_printf_loc (MSG_NOTE, vect_location, "multiple-types.\n");
>  }
>
> -  /* SLP.  Schedule all the SLP instances when the first SLP stmt is
> - reached.  */
> -  if (slp_vect_type slptype = STMT_SLP_TYPE (stmt_info))
> -{
> -
> -  if (!*slp_scheduled)
> -   {
> - *slp_scheduled = true;
> -
> - DUMP_VECT_SCOPE ("scheduling SLP instances");
> -
> - vect_schedule_slp (loop_vinfo);
> -   }
> -
> -  /* Hybrid SLP stmts must be vectorized in addition to SLP.  */
> -  if (slptype == pure_slp)
> -   return;
> -}
> +  /* Pure SLP statements have already been vectorized.  We still need
> + to apply loop vectorization to hybrid SLP statements.  */
> +  if (PURE_SLP_STMT (stmt_info))
> +return;
>
>if (dump_enabled_p ())
>  dump_printf_loc (MSG_NOTE, vect_location, "transform statement.\n");
> @@ -8284,7 +8268,6 @@ vect_transform_loop (loop_vec_info loop_
>tree niters_vector_mult_vf = NULL_TREE;
>poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>unsigned int lowest_vf = constant_lower_bound (vf);
> -  bool slp_scheduled = false;
>gimple *stmt;
>bool check_profitability = false;
>unsigned int th;
> @@ -8390,6 +8373,14 @@ vect_transform_loop (loop_vec_info loop_
>  /* This will deal with any possible peeling.  */
>  vect_prepare_for_masked_peels (loop_vinfo);
>
> +  /* Schedule the SLP instances first, then handle loop vectorization
> + below.  */
> +  if (!loop_vinfo->slp_instances.is_empty ())
> +{
> +  DUMP_VECT_SCOPE ("scheduling SLP instances");
> +  vect_schedule_slp (loop_vinfo);
> +}
> +
>/* FORNOW: the vectorizer supports only loops which body consist
>   of one basic block (header + empty latch). When the vectorizer will
>   support more involved loop forms, the order by which the BBs are
> @@ -8468,16 +8459,15 @@ vect_transform_loop (loop_vec_info loop_
>   stmt_vec_info pat_stmt_info
> = loop_vinfo->lookup_stmt (gsi_stmt (subsi));
>   vect_transform_loop_stmt (loop_vinfo, pat_stmt_info,
> -   , _store,
> -   _scheduled);
> +   , _store);
> }
>   stmt_vec_info pat_stmt_info
> = STMT_VINFO_RELATED_STMT (stmt_info);
>   vect_transform_loop_stmt (loop_vinfo, pat_stmt_info, 
> ,
> -   _store, _scheduled);
> +   _store);
> }
>   vect_transform_loop_stmt (loop_vinfo, stmt_info, ,
> -   _store, _scheduled);
> +   _store);
> }
>   gsi_next ();
>   if (seen_store)


Re: [PATCH] Add malloc predictor (PR middle-end/83023).

2018-08-01 Thread Martin Liška
On 08/01/2018 02:25 PM, Marc Glisse wrote:
> On Wed, 1 Aug 2018, Martin Liška wrote:
> 
>> On 07/27/2018 02:38 PM, Marc Glisse wrote:
>>> On Fri, 27 Jul 2018, Martin Liška wrote:
>>>
>>>> So answer is yes, the builtin can then be removed.
>>>
>>> Good, thanks. While looking at how widely it is going to apply, I noticed 
>>> that the default, throwing operator new has attribute malloc and 
>>> everything, but the non-throwing variant declared in  doesn't, so it 
>>> won't benefit from the new predictor. I don't know if there is a good 
>>> reason for this disparity...
>>>
>>
>> Well in case somebody uses operator new:
>>
>>     int* p1 = new int;
>>     if (p1)
>>  delete p1;
>>
>> we optimize that out to if (true), even when one has a user-defined
>> operator new. Thus it's probably OK.
> 
> Throwing new is returns_nonnull (errors are reported with exceptions) so 
> that's fine, but non-throwing new is not:
> 
> int* p1 = new(std::nothrow) int;
> 
> Here errors are reported by returning 0, so it is common to test if p1 is 0 
> and this is precisely the case that could benefit from a predictor but does 
> not have the attribute to do so (there are also consequences on aliasing).

Then it can be handled with DECL_IS_OPERATOR_NEW, for those we can also set the 
newly introduced predictor.

> 
> (Jan's remark about functions with an inferred malloc attribute reminds me 
> that at some point, the code was adding attribute malloc for functions that 
> always return 0...)
>

By inferred, do you mean functions that are marked as malloc in IPA pure-const 
(propagate_malloc)?
Example would be appreciated.

Martin



Re: [PATCH,nvptx] Truncate config/nvptx/oacc-parallel.c

2018-08-01 Thread Tom de Vries
On 08/01/2018 01:55 PM, Jakub Jelinek wrote:
> On Wed, Aug 01, 2018 at 01:33:09PM +0200, Tom de Vries wrote:
>> On 07/31/2018 05:55 PM, Cesar Philippidis wrote:
>>> Way back in the GCC 5 days when support for OpenACC was in its infancy,
>>> we used to rely on having various GOACC_ thread functions in the runtime
>>> to implement the execution model, or the lack thereof (that version of GCC
>>> only supported vector level parallelism). However, beginning with GCC 6,
>>> those external functions were replaced with internal functions that get
>>> expanded by the nvptx BE directly.
>>>
>>> This patch removes those stale libgomp functions from the nvptx libgomp
>>> target. Is this OK for trunk, or does libgomp still need to maintain
>>> backwards compatibility with GCC 5?
>>>
>>> This patch has been bootstrapped and regtested for x86_64 with nvptx
>>> offloading.
>>
>> AFAIU, if you use a GCC 5 nvptx offloading compiler that generates calls
>> to these GOACC_ thread functions, you're also expected to use the GCC 5
>> nvptx libgomp.a containing these functions, so I don't see any backwards
>> compatibility issues here.
>>
>> OK for me.
>>
>> Jakub, do you have an opinion here?
> 
> The ABI compatibility is mainly for libgomp.so which hasn't (ever) bumped
> the soname and I don't plan to do that any time soon, but even for the
> offloaded libgomp.a I guess one might compile with GCC 5 and link with GCC
> 9 and expect things not to fail miserably.  This is a *.a library, can't you
> e.g. move those functions to a separate *.c file so that they aren't linked
> in unless GCC 5 is really used?

You're describing the current situation: all those functions sit
together in config/nvptx/oacc-parallel.c. [ Besides, the nvptx .o and .a
files are marked up text files containing ptx functions, so I'm not sure
if the linker is obliged to pull in entire .o files. ]

Thanks,
- Tom


Re: [PATCH] Add malloc predictor (PR middle-end/83023).

2018-08-01 Thread Marc Glisse

On Wed, 1 Aug 2018, Martin Liška wrote:


> On 07/27/2018 02:38 PM, Marc Glisse wrote:
>> On Fri, 27 Jul 2018, Martin Liška wrote:
>>
>>> So answer is yes, the builtin can then be removed.
>>
>> Good, thanks. While looking at how widely it is going to apply, I noticed
>> that the default, throwing operator new has attribute malloc and
>> everything, but the non-throwing variant declared in <new> doesn't, so it
>> won't benefit from the new predictor. I don't know if there is a good
>> reason for this disparity...
>
> Well in case somebody uses operator new:
>
>     int* p1 = new int;
>     if (p1)
>      delete p1;
>
> we optimize that out to if (true), even when one has a user-defined
> operator new. Thus it's probably OK.


Throwing new is returns_nonnull (errors are reported with exceptions) so 
that's fine, but non-throwing new is not:


int* p1 = new(std::nothrow) int;

Here errors are reported by returning 0, so it is common to test if p1 is 
0 and this is precisely the case that could benefit from a predictor but 
does not have the attribute to do so (there are also consequences on 
aliasing).
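
As a minimal illustration of the non-throwing case, the sketch below allocates with new(std::nothrow) and tests the result, which is the branch a malloc-style predictor could apply to. make_buffer is a hypothetical helper, and nothing here claims how GCC currently predicts this branch.

```cpp
#include <cstddef>
#include <new>

// Allocate and zero n ints, returning nullptr if the allocation fails.
int *make_buffer(std::size_t n)
{
  int *p = new (std::nothrow) int[n];
  if (!p)            // failure is reported as a null pointer, not an exception
    return nullptr;
  for (std::size_t i = 0; i < n; ++i)
    p[i] = 0;
  return p;
}
```

The caller releases the buffer with delete[]. With the throwing form the null test would be dead code, since failure is reported by std::bad_alloc instead.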


(Jan's remark about functions with an inferred malloc attribute reminds me 
that at some point, the code was adding attribute malloc for functions 
that always return 0...)


--
Marc Glisse


Re: [PATCH] Introduce __builtin_expect_with_probability (PR target/83610).

2018-08-01 Thread Martin Liška
On 07/31/2018 11:24 AM, Jan Hubicka wrote:
>> Hi.
>>
>> This is an implementation of a new built-in that can be used for finer
>> tweaking of probability. A micro benchmark is attached to the PR.
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>
>> Ready to be installed?
> 
> It seems reasonable to me to add the feature. Years ago I made a similar
> patch, and at that time it did not go in, based on the argument that
> programmers are not good at guessing probabilities and that this is too
> much fine control when it should be done by profile feedback.
> 
> However, I guess it is better to have a way to specify the probability than
> to tweak the --param that controls the builtin_expect outcome globally.

I agree; that's why I implemented it.

> 
> What I think would be useful is to tie this to the code giving the loop trip
> estimate.  If you know that the loop iterates 100 times on average, you
> can specify a probability of 1%.  For this it seems more sensible to me to
> have the percentage parameter be a double rather than a long, so one can
> specify larger trip counts this way.

Makes perfect sense; please take a look at the attached updated patch.
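
For illustration, a small sketch of how the loop-trip idea might look in user code. It assumes the form of the builtin as it was eventually documented in GCC, where the third argument is a constant floating-point probability between 0.0 and 1.0; the version of the patch under review here may differ. The macro EXPECT_PROB and the function sum_until_zero are invented for this example, and the macro falls back to plain evaluation where the builtin cannot be detected.

```cpp
#include <cassert>

// Use the builtin only when the compiler reports it; otherwise evaluate
// the expression without any hint. EXPECT_PROB is a made-up name.
#if defined(__has_builtin)
#  if __has_builtin(__builtin_expect_with_probability)
#    define EXPECT_PROB(expr, value, prob) \
       __builtin_expect_with_probability((expr), (value), (prob))
#  endif
#endif
#ifndef EXPECT_PROB
#  define EXPECT_PROB(expr, value, prob) (expr)
#endif

// Hypothetical example: a zero-terminated array that is expected to hold
// about 100 elements, so the "keep going" test is true with probability
// 0.99 (equivalently, the exit is taken about 1% of the time).
long sum_until_zero(const int* data) {
    assert(data != nullptr);
    long sum = 0;
    while (EXPECT_PROB(*data != 0, 1, 0.99)) {
        sum += *data++;
    }
    return sum;
}
```

The hint affects only the compiler's branch probabilities; the result of the loop is the same with or without it.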

Martin

> 
> Honza
> 
> 
>> Martin
>>
>> gcc/ChangeLog:
>>
>> 2018-07-24  Martin Liska  
>>
>> PR target/83610
>>  * builtin-types.def (BT_FN_LONG_LONG_LONG_LONG): New type.
>>  * builtins.c (expand_builtin_expect_with_probability):
>> New function.
>>  (expand_builtin): Handle also BUILT_IN_EXPECT_WITH_PROBABILITY.
>>  (build_builtin_expect_predicate): Likewise.
>>  (fold_builtin_expect): Likewise.
>>  (fold_builtin_2): Likewise.
>>  (fold_builtin_3): Likewise.
>>  * builtins.def (BUILT_IN_EXPECT_WITH_PROBABILITY): Define new
>> builtin.
>>  * builtins.h (fold_builtin_expect): Add new argument
>> (probability).
>>  * doc/extend.texi: Document the new builtin.
>>  * doc/invoke.texi: Likewise.
>>  * gimple-fold.c (gimple_fold_call): Pass new argument.
>>  * ipa-fnsummary.c (find_foldable_builtin_expect):
>> Handle also BUILT_IN_EXPECT_WITH_PROBABILITY.
>>  * predict.c (expr_expected_value): Add new out argument which
>> is probability.
>>  (expr_expected_value_1): Likewise.
>>  (tree_predict_by_opcode): Predict edge based on
>> provided probability.
>>  (pass_strip_predict_hints::execute): Use newly added
>> DECL_BUILT_IN_P macro.
>>  * predict.def (PRED_BUILTIN_EXPECT_WITH_PROBABILITY):
>> Define new predictor.
>>  * tree.h (DECL_BUILT_IN_P): Define.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2018-07-24  Martin Liska  
>>
>>  * gcc.dg/predict-16.c: New test.
>> ---
>>  gcc/builtin-types.def             |  2 +
>>  gcc/builtins.c                    | 65 ---
>>  gcc/builtins.def                  |  1 +
>>  gcc/builtins.h                    |  2 +-
>>  gcc/doc/extend.texi               |  8 
>>  gcc/doc/invoke.texi               |  3 ++
>>  gcc/gimple-fold.c                 |  3 +-
>>  gcc/ipa-fnsummary.c               |  1 +
>>  gcc/predict.c                     | 61 ++---
>>  gcc/predict.def                   |  5 +++
>>  gcc/testsuite/gcc.dg/predict-16.c | 13 +++
>>  gcc/tree.h                        |  6 +++
>>  12 files changed, 140 insertions(+), 30 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/predict-16.c
>>
>>
> 
>> diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
>> index b01095c420f..6e87bcbbf1d 100644
>> --- a/gcc/builtin-types.def
>> +++ b/gcc/builtin-types.def
>> @@ -531,6 +531,8 @@ DEF_FUNCTION_TYPE_3 (BT_FN_ULONG_ULONG_ULONG_ULONG,
>>   BT_ULONG, BT_ULONG, BT_ULONG, BT_ULONG)
>>  DEF_FUNCTION_TYPE_3 (BT_FN_LONG_LONG_UINT_UINT,
>>   BT_LONG, BT_LONG, BT_UINT, BT_UINT)
>> +DEF_FUNCTION_TYPE_3 (BT_FN_LONG_LONG_LONG_LONG,
>> + BT_LONG, BT_LONG, BT_LONG, BT_LONG)
>>  DEF_FUNCTION_TYPE_3 (BT_FN_ULONG_ULONG_UINT_UINT,
>>   BT_ULONG, BT_ULONG, BT_UINT, BT_UINT)
>>  DEF_FUNCTION_TYPE_3 (BT_FN_STRING_CONST_STRING_CONST_STRING_INT,
>> diff --git a/gcc/builtins.c b/gcc/builtins.c
>> index 539a6d17688..29d77d3d83b 100644
>> --- a/gcc/builtins.c
>> +++ b/gcc/builtins.c
>> @@ -148,6 +148,7 @@ static rtx expand_builtin_unop (machine_mode, tree, rtx, rtx, optab);
>>  static rtx expand_builtin_frame_address (tree, tree);
>>  static tree stabilize_va_list_loc (location_t, tree, int);
>>  static rtx expand_builtin_expect (tree, rtx);
>> +static rtx expand_builtin_expect_with_probability (tree, rtx);
>>  static tree fold_builtin_constant_p (tree);
>>  static tree fold_builtin_classify_type (tree);
>>  static tree fold_builtin_strlen (location_t, tree, tree);
>> @@ -5237,6 +5238,27 @@ expand_builtin_expect (tree exp, rtx target)
>>return target;
>>  }
>>  
>> +/* Expand a call to __builtin_expect_with_probability.  We just return our
>> +   argument 

Re: Fold pointer range checks with equal spans

2018-08-01 Thread Marc Glisse

On Wed, 1 Aug 2018, Richard Sandiford wrote:


+/* For pointers @0 and @2 and nonnegative constant offset @1, look for
+   expressions like:
+
+   A: (@0 + @1 < @2) | (@2 + @1 < @0)
+   B: (@0 + @1 <= @2) | (@2 + @1 <= @0)


Once this is in, we may want to consider the opposite:

(@0 + @1 > @2) & (@2 + @1 > @0)
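
To make the shape of these checks concrete, here is a small sketch on plain integers standing in for addresses. It is only an illustration under the assumption that a + c and b + c do not wrap, not the fold as written in match.pd; all function names are invented. It shows that form A can be expressed as one unsigned range check on the difference and verifies the equivalence on a small grid.

```cpp
#include <cassert>
#include <cstdint>

// Form A on integer "addresses": true when the (c + 1)-byte ranges
// starting at a and b do not overlap. Assumes a + c and b + c do not wrap.
bool disjoint_two_compares(std::uint64_t a, std::uint64_t b, std::uint64_t c) {
    return (a + c < b) | (b + c < a);
}

// The same test as a single unsigned comparison: a - b lies in [-c, c]
// exactly when a + c - b, taken modulo 2^64, lies in [0, 2c].
bool disjoint_one_compare(std::uint64_t a, std::uint64_t b, std::uint64_t c) {
    return a + c - b > 2 * c;
}

// Exhaustively compare the two forms on a small grid of values.
void check_small_grid() {
    for (std::uint64_t a = 0; a < 64; ++a)
        for (std::uint64_t b = 0; b < 64; ++b)
            for (std::uint64_t c = 0; c < 16; ++c)
                assert(disjoint_two_compares(a, b, c) ==
                       disjoint_one_compare(a, b, c));
}
```

The opposite (overlap) test is simply the logical negation of either function, which for the single-comparison form means replacing > with <=.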


+ /* Always fails for negative values.  */
+ (if (wi::min_precision (rhs, UNSIGNED) <= TYPE_PRECISION (sizetype))


I guess we could simplify to 'true' in the 'else' case but that's less
interesting.


Turning multiple comparisons of the form P + cst CMP Q + cst into a
range check on P - Q sounds good (we don't really have to restrict to
the case where the range is symmetric). Actually, just turning P + cst
CMP Q + cst into P - Q CMP cst should do it, we should already have code
to handle range checking on integers (modulo the issue of CSE P-Q and
Q-P). But I don't know if a couple :s is sufficient to make this
transformation a good canonicalization.


Yeah.  Like you say, in the cases being handled by the patch, folding the
two comparisons separately and then folding the IOR would require either

(a) folding:
  P + cst < Q
   to the rather unnatural-looking:
  Q - P > -cst
   when tree_swap_operands_p (P, Q) or


Is it unnatural? If it helps other optimizations, it doesn't look that
bad to me ;-)

One issue is if we start mixing forms because only one is single_use:
@0 + @1 < @2 | @1 < @0 - @2


(b) making the range fold handle reversed pointer_diffs (which I guess
   makes more sense).


It would be interesting to have a way to write:

(plus @0 (opposite@1 @0))

which would match if @0 is a-b and @1 is b-a or anything similar (I
don't want to repeat all the cases of negate, minus, pointer_diff, etc
in each transformation), but

(match (opposite (minus @0 @1)) (minus @1 @0))

is not a valid syntax. More likely we would write

(plus @0 @1) (if (opposite_p (@0, @1))

with opposite_p defined outside of match.pd :-(

Maybe there is a way to simulate binary predicates on @0 @1 using unary
predicates on (artificial @0 @1).

(looks like I forgot to add pointer_diff to negate_expr_p)


If we start from a comparison of pointer_plus, I think it would make
sense to use pointer_diff.

I believe currently we try to use pointer operations (pointer_plus,
pointer_diff, lt) only for related pointers and we cast to some integer
type for wilder cases (implementation of std::less in C++ for instance).
On the other hand, in an alias check, the 2 pointers are possibly
unrelated, so maybe the code emitted for an alias check should be
changed.


If we can prove that the pointers are to different objects, it would
be valid to fold the check to "no alias", regardless of the constant
(although in practice we should have weeded out those cases before
generating the check).  In that sense, relying on the UBness of
comparing pointers to different objects would be fine.  If there's a
risk of the check being folded to "alias" when the pointers are known
to point to different objects then that would be more of a problem
(as a missed optimisation).


Assuming they are related, we are also assuming that pointer_diff will
not overflow. But for unrelated objects, in particular on 32bit targets,
pointer differences cannot all be represented by a 32 bit ptrdiff_t (the
differences live in a range twice that size), and doing comparisons on
it can have strange effects. In this particular case, I don't really see
a way things could break (you would somehow need one pointer close to 0
and the other close to 2^32 so they end up close modulo 2^32, but that
would require @2+@1 to overflow which means the loop will never run that
far anyway).
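
The representability problem can be sketched with 32-bit integers standing in for addresses on a 32-bit target. This is an illustration only, with invented function names: one "pointer" sits close to 0 and the other close to 2^32, so the exact difference does not fit in a signed 32-bit type, while the wrapped difference makes them look close together.

```cpp
#include <cassert>
#include <cstdint>

// Exact mathematical difference of two 32-bit addresses, computed in 64 bits.
std::int64_t exact_diff(std::uint32_t p, std::uint32_t q) {
    return static_cast<std::int64_t>(p) - static_cast<std::int64_t>(q);
}

// What a 32-bit ptrdiff_t would hold: the difference reduced modulo 2^32
// and then read as a signed value.
std::int32_t wrapped_diff(std::uint32_t p, std::uint32_t q) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(p - q));
}

// One address near 0, the other near 2^32.
void show_wraparound() {
    const std::uint32_t p = 0x00000010u;
    const std::uint32_t q = 0xFFFFFFF0u;
    assert(exact_diff(p, q) == -4294967264LL);  // far outside the int32 range
    assert(wrapped_diff(p, q) == 32);           // looks like a small distance
}
```

A comparison performed on the wrapped value would therefore treat the two addresses as 32 bytes apart, which is the kind of strange effect the paragraph above refers to.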

But we are still lying and taking a risk that some other optimization
will trust us. (I am ok with keeping the current alias code, just saying
that it involves a small, possibly negligible risk)

--
Marc Glisse


Re: [PATCH][AArch64] Implement new intrinsics vabsd_s64 and vnegd_s64

2018-08-01 Thread Vlad Lazar

On 31/07/18 22:48, James Greenhalgh wrote:

On Fri, Jul 20, 2018 at 04:37:34AM -0500, Vlad Lazar wrote:

Hi,

The patch adds implementations for the NEON intrinsics vabsd_s64 and vnegd_s64.
(https://developer.arm.com/products/architecture/cpu-architecture/a-profile/docs/ihi0073/latest/arm-neon-intrinsics-reference-architecture-specification)

Bootstrapped and regtested on aarch64-none-linux-gnu and there are no regressions.

OK for trunk?

+__extension__ extern __inline int64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vnegd_s64 (int64_t __a)
+{
+  return -__a;
+}


Does this give the correct behaviour for the minimum value of int64_t? That
would be undefined behaviour in C, but well-defined under ACLE.
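
The concern can be illustrated with a small sketch that negates with wrap-around by doing the arithmetic on the unsigned type. The helper wrapping_negate is hypothetical and is not how the intrinsic is implemented in the patch; it also assumes a two's complement conversion back to the signed type, which is guaranteed from C++20 onwards and is what GCC does in earlier modes.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not the intrinsic itself): negate modulo 2^64 so
// that the minimum value maps to itself instead of overflowing.
std::int64_t wrapping_negate(std::int64_t a) {
    return static_cast<std::int64_t>(std::uint64_t{0} -
                                     static_cast<std::uint64_t>(a));
}

void demo_wrapping_negate() {
    const std::int64_t lowest = std::numeric_limits<std::int64_t>::min();
    assert(wrapping_negate(lowest) == lowest);  // the minimum negates to itself
    assert(wrapping_negate(5) == -5);
}
```

Writing -a directly on the minimum value would be signed overflow in C and C++, so a compiler is free to assume it never happens.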

Thanks,
James



Hi. Thanks for the review.

For the minimum value of int64_t it behaves as the ACLE specifies:
"The negative of the minimum (signed) value is itself."

Thanks,
Vlad


Re: [Patch][Aarch64] Implement Aarch64 SIMD ABI and aarch64_vector_pcs attribute

2018-08-01 Thread Kyrill Tkachov

Hi Steve,

On 31/07/18 23:24, Steve Ellcey wrote:

Here is a new version of my patch to support the Aarch64 SIMD ABI [1]
in GCC.  I think this is complete enough to be considered for check
in.  I wrote a few new tests and put them in a new gcc.target/torture
directory so they would be run with multiple optimization options.  I
also verified that there are no regressions in the GCC testsuite.

The significant difference between the standard ARM ABI and the SIMD
ABI is that in the normal ABI a callee saves only the lower 64 bits of
registers V8-V15, whereas in the SIMD ABI the callee must save all 128
bits of registers V8-V23.
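
As a rough sketch of how the attribute named in the subject line might be applied in user code: the macro VECTOR_PCS and the function scale are made up for illustration, and the guard simply leaves the attribute out on targets where it is not expected to be understood.

```cpp
#include <cassert>

// Apply the attribute only on AArch64 with a GNU-compatible compiler;
// elsewhere the macro expands to nothing. VECTOR_PCS is an invented name.
#if defined(__aarch64__) && defined(__GNUC__)
#  define VECTOR_PCS __attribute__((aarch64_vector_pcs))
#else
#  define VECTOR_PCS
#endif

// Hypothetical function using the SIMD calling convention: under that ABI
// it must preserve all 128 bits of V8-V23 if it uses those registers.
VECTOR_PCS double scale(double x, double k) {
    return x * k;
}

void demo_scale() {
    assert(scale(2.0, 3.0) == 6.0);
}
```

The attribute changes only which registers the callee must preserve, not the result of the call.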

As I mentioned in my RFC, I intend to (eventually) follow this patch
with two more, one to define the TARGET_SIMD_CLONE* macros and one to
improve the GCC register allocation/usage when calling SIMD
functions.  Right now, a caller calling a SIMD function will save more
registers than it needs to because some of those registers will also be
saved by the callee.



Thanks for working on this!
A few comments on the patch inline.

Thanks,
Kyrill


Steve Ellcey
sell...@cavium.com

[1] https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi

Compiler ChangeLog:

2018-07-31  Steve Ellcey  

* config/aarch64/aarch64-protos.h (aarch64_use_simple_return_insn_p):
New prototype.
(aarch64_epilogue_uses): Ditto.
* config/aarch64/aarch64.c (aarch64_attribute_table): New array.
(aarch64_simd_decl_p): New function.
(aarch64_reg_save_mode): New function.
(aarch64_is_simd_call_p): New function.
(aarch64_function_ok_for_sibcall): Check for simd calls.
(aarch64_layout_frame): Check for simd function.
(aarch64_gen_storewb_pair): Handle E_TFmode.
(aarch64_push_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_loadwb_pair): Handle E_TFmode.
(aarch64_pop_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_store_pair): Handle E_TFmode.
(aarch64_gen_load_pair): Ditto.
(aarch64_save_callee_saves): Handle different mode sizes.
(aarch64_restore_callee_saves): Ditto.
(aarch64_components_for_bb): Check for simd function.
(aarch64_epilogue_uses): New function.
(aarch64_process_components): Ditto.
(aarch64_expand_prologue): Ditto.
(aarch64_expand_epilogue): Ditto.
(aarch64_expand_call): Ditto.
(TARGET_ATTRIBUTE_TABLE): New define.
* config/aarch64/aarch64.h (EPILOGUE_USES): Redefine.
(FP_SIMD_SAVED_REGNUM_P): New macro.
* config/aarch64/aarch64.md (V23_REGNUM): New constant.
(simple_return): New define_expand.
(load_pair_dw_tftf): New instruction.
(store_pair_dw_tftf): Ditto.
(loadwb_pair_): Ditto.
(storewb_pair_): Ditto.

Testsuite ChangeLog:

2018-07-31  Steve Ellcey  

* gcc.target/aarch64/torture/aarch64-torture.exp: New file.
* gcc.target/aarch64/torture/simd-abi-1.c: New test.
* gcc.target/aarch64/torture/simd-abi-2.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-3.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-4.c: Ditto.


gcc-vect-abi.patch


diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index af5db9c..99c962f 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -423,6 +423,7 @@ bool aarch64_split_dimode_const_store (rtx, rtx);
 bool aarch64_symbolic_address_p (rtx);
 bool aarch64_uimm12_shift (HOST_WIDE_INT);
 bool aarch64_use_return_insn_p (void);
+bool aarch64_use_simple_return_insn_p (void);
 const char *aarch64_mangle_builtin_type (const_tree);
 const char *aarch64_output_casesi (rtx *);
 
@@ -507,6 +508,8 @@ void aarch64_split_simd_move (rtx, rtx);

 /* Check for a legitimate floating point constant for FMOV.  */
 bool aarch64_float_const_representable_p (rtx);
 
+extern int aarch64_epilogue_uses (int);

+
 #if defined (RTX_CODE)
 void aarch64_gen_unlikely_cbranch (enum rtx_code, machine_mode cc_mode,
   rtx label_ref);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index fa01475..9e6827a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1027,6 +1027,15 @@ static const struct processor *selected_tune;
 /* The current tuning set.  */
 struct tune_params aarch64_tune_params = generic_tunings;
 
+/* Table of machine attributes.  */

+static const struct attribute_spec aarch64_attribute_table[] =
+{
+  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
+   affects_type_identity, handler, exclude } */
+  { "aarch64_vector_pcs", 0, 0, true,  false, false, false, NULL, NULL },
+  { NULL, 0, 0, false, false, false, false, NULL, NULL }
+};
+
 #define AARCH64_CPU_DEFAULT_FLAGS ((selected_cpu) ? selected_cpu->flags : 0)
 
 /* An ISA extension in the co-processor 

[C++ Patch, obvious] PR 86661

2018-08-01 Thread Paolo Carlini

Hi,

When I recently changed a couple of permerrors to permerror + warning with
an accurate location for the first call, I went for the simple choice of 
using DECL_SOURCE_LOCATION for the first call and keeping location_of in 
the second call. It turns out we consistently want location_of for both, 
because we may have to handle OVERLOADs. Tested x86_64-linux, committed 
to mainline.


Thanks, Paolo.

/

/cp
2018-08-01  Paolo Carlini  

PR c++/86661
* class.c (note_name_declared_in_class): Use location_of in permerror
instead of DECL_SOURCE_LOCATION (for OVERLOADs).

/testsuite
2018-08-01  Paolo Carlini  

PR c++/86661
* g++.dg/lookup/name-clash12.C: New.
Index: cp/class.c
===
--- cp/class.c  (revision 263197)
+++ cp/class.c  (working copy)
@@ -8285,7 +8285,7 @@ note_name_declared_in_class (tree name, tree decl)
 A name N used in a class S shall refer to the same declaration
 in its context and when re-evaluated in the completed scope of
 S.  */
-  if (permerror (DECL_SOURCE_LOCATION (decl),
+  if (permerror (location_of (decl),
 "declaration of %q#D changes meaning of %qD",
 decl, OVL_NAME (decl)))
inform (location_of ((tree) n->value),
Index: testsuite/g++.dg/lookup/name-clash12.C
===
--- testsuite/g++.dg/lookup/name-clash12.C  (nonexistent)
+++ testsuite/g++.dg/lookup/name-clash12.C  (working copy)
@@ -0,0 +1,9 @@
+// PR c++/86661
+
+typedef int a;  // { dg-message "declared here" }
+namespace {
+class b {
+  a c;
+  template  void a();  // { dg-error "changes meaning" }
+};
+}

