Re: [PATCH 1/4] regcprop: Avoid REG_CFA_REGISTER notes (PR85645)

2018-08-10 Thread Jeff Law
On 08/10/2018 02:55 PM, Segher Boessenkool wrote:
> On Mon, Jun 25, 2018 at 03:34:26PM -0600, Jeff Law wrote:
>> On 06/25/2018 05:53 AM, Segher Boessenkool wrote:
>>> Hi Eric,
>>>
>>> On Wed, May 09, 2018 at 09:22:47AM +0200, Eric Botcazou wrote:
>>>>> 2018-05-08  Segher Boessenkool  
>>>>>
>>>>> 	PR rtl-optimization/85645
>>>>> 	* regcprop.c (copyprop_hardreg_forward_1): Don't propagate into an
>>>>> 	insn that has a REG_CFA_REGISTER note.
>>>>
>>>> OK, thanks.
>>>
>>> Are these patches okay for backport to 8?  At least the first two.
>> Yes.
> 
> And for 7?  (Same two).
Yes.
jeff


Re: [PATCH 1/4] regcprop: Avoid REG_CFA_REGISTER notes (PR85645)

2018-08-10 Thread Segher Boessenkool
On Mon, Jun 25, 2018 at 03:34:26PM -0600, Jeff Law wrote:
> On 06/25/2018 05:53 AM, Segher Boessenkool wrote:
> > Hi Eric,
> > 
> > On Wed, May 09, 2018 at 09:22:47AM +0200, Eric Botcazou wrote:
> >>> 2018-05-08  Segher Boessenkool  
> >>>
> >>>   PR rtl-optimization/85645
> >>>   *  regcprop.c (copyprop_hardreg_forward_1): Don't propagate into an
> >>>   insn that has a REG_CFA_REGISTER note.
> >>
> >> OK, thanks.
> > 
> > Are these patches okay for backport to 8?  At least the first two.
> Yes.

And for 7?  (Same two).


Segher


[PATCH] Unconditionally use MAX_EXPR/MIN_EXPR for MAX/MIN intrinsics

2018-08-10 Thread Janne Blomqvist
For floating point types, the question is what MAX(a, NaN) or MIN(a,
NaN) should return (where "a" is a normal number).  There are valid
use cases for returning either one, but the Fortran standard doesn't
specify which one should be chosen.  Also, there is no consensus among
other tested compilers.  In short, it's a mess.  So let's just do
whatever is fastest, which is to use MAX_EXPR/MIN_EXPR, which are not
defined to do anything in particular if one of the operands is a NaN.

Regtested on x86_64-pc-linux-gnu, Ok for trunk?

gcc/fortran/ChangeLog:

2018-08-10  Janne Blomqvist  

* trans-intrinsic.c (gfc_conv_intrinsic_minmax): Use
MAX_EXPR/MIN_EXPR unconditionally for real arguments.

gcc/testsuite/ChangeLog:

2018-08-10  Janne Blomqvist  

* gfortran.dg/nan_1.f90: Remove tests that test MAX/MIN with NaNs.
---
 gcc/fortran/trans-intrinsic.c   | 55 ++---
 gcc/testsuite/gfortran.dg/nan_1.f90 | 35 --
 2 files changed, 10 insertions(+), 80 deletions(-)

diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index c9b5479740c..190fde66a8d 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -3914,8 +3914,6 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
   mvar = gfc_create_var (type, "M");
   gfc_add_modify (&se->pre, mvar, args[0]);
 
-  internal_fn ifn = op == GT_EXPR ? IFN_FMAX : IFN_FMIN;
-
   for (i = 1, argexpr = argexpr->next; i < nargs; i++, argexpr = argexpr->next)
 {
   tree cond = NULL_TREE;
@@ -3936,49 +3934,16 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
	val = gfc_evaluate_now (val, &se->pre);
 
   tree calc;
-  /* If we dealing with integral types or we don't care about NaNs
-just do a MIN/MAX_EXPR.  */
-  if (!HONOR_NANS (type) && !HONOR_SIGNED_ZEROS (type))
-   {
-
- tree_code code = op == GT_EXPR ? MAX_EXPR : MIN_EXPR;
- calc = fold_build2_loc (input_location, code, type,
- convert (type, val), mvar);
- tmp = build2_v (MODIFY_EXPR, mvar, calc);
-
-   }
-  /* If we care about NaNs and we have internal functions available for
-fmin/fmax to perform the comparison, use those.  */
-  else if (SCALAR_FLOAT_TYPE_P (type)
- && direct_internal_fn_supported_p (ifn, type, OPTIMIZE_FOR_SPEED))
-   {
- calc = build_call_expr_internal_loc (input_location, ifn, type,
-   2, mvar, convert (type, val));
- tmp = build2_v (MODIFY_EXPR, mvar, calc);
-
-   }
-  /* Otherwise expand to:
-   mvar = a1;
-   if (a2 .op. mvar || isnan (mvar))
- mvar = a2;
-   if (a3 .op. mvar || isnan (mvar))
- mvar = a3;
-   ...  */
-  else
-   {
- tree isnan = build_call_expr_loc (input_location,
-   builtin_decl_explicit (BUILT_IN_ISNAN),
-   1, mvar);
- tmp = fold_build2_loc (input_location, op, logical_type_node,
-convert (type, val), mvar);
-
- tmp = fold_build2_loc (input_location, TRUTH_OR_EXPR,
- logical_type_node, tmp,
- fold_convert (logical_type_node, isnan));
- tmp = build3_v (COND_EXPR, tmp,
- build2_v (MODIFY_EXPR, mvar, convert (type, val)),
- build_empty_stmt (input_location));
-   }
+  /* For floating point types, the question is what MAX(a, NaN) or
+MIN(a, NaN) should return (where "a" is a normal number).
+There are valid use cases for returning either one, but the
+Fortran standard doesn't specify which one should be chosen.
+Also, there is no consensus among other tested compilers.  In
+short, it's a mess.  So lets just do whatever is fastest.  */
+  tree_code code = op == GT_EXPR ? MAX_EXPR : MIN_EXPR;
+  calc = fold_build2_loc (input_location, code, type,
+ convert (type, val), mvar);
+  tmp = build2_v (MODIFY_EXPR, mvar, calc);
 
   if (cond != NULL_TREE)
tmp = build3_v (COND_EXPR, cond, tmp,
diff --git a/gcc/testsuite/gfortran.dg/nan_1.f90 b/gcc/testsuite/gfortran.dg/nan_1.f90
index e64b4ce65e1..1b39cc1f21c 100644
--- a/gcc/testsuite/gfortran.dg/nan_1.f90
+++ b/gcc/testsuite/gfortran.dg/nan_1.f90
@@ -66,35 +66,12 @@ program test
   if (isinf(-nan) .or. isinf(-large) .or. .not. isinf(-inf)) STOP 4
 
   ! Check that MIN and MAX behave correctly
-  if (max(2.0, nan) /= 2.0) STOP 5
-  if (min(2.0, nan) /= 2.0) STOP 6
-  if (max(nan, 2.0) /= 2.0) STOP 7
-  if (min(nan, 2.0) /= 2.0) STOP 8
-
-  if (max(2.d0, nan) /= 2.d0) STOP 9! { dg-warning "Extension: Different type kinds" }
-  if (min(2.d0, nan) /= 2.d0) STOP 10! { dg-warning "Extension: Different type kinds" }
-  

Re: [PATCH] rs6000: Fix vector homogeneous aggregates (PR86197)

2018-08-10 Thread Segher Boessenkool
On Mon, Jun 25, 2018 at 06:32:27AM -0500, Segher Boessenkool wrote:
> On Tue, Jun 19, 2018 at 10:45:59AM +, Segher Boessenkool wrote:
> > The existing code allows only 4 vectors worth of ieee128 homogeneous
> > aggregates, but it should be 8.  This happens because at one spot it
> > is mistakenly qualified as being passed in floating point registers.
> > 
> > This patch fixes it and makes the code easier to read.  Committing to
> > trunk; needs backports too.
> 
> Backported to 8 now.

And to 7 and 6.


Segher


[PATCH] PR libstdc++/68210 adjust operator new and delete for LWG 206

2018-08-10 Thread Jonathan Wakely

Ensure that nothrow versions of new and delete call the ordinary
versions of new or delete, instead of calling malloc or free directly.

These files are all compiled with -std=gnu++14 so can use noexcept and
nullptr to make the code more readable.
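As a sketch of the behaviour LWG 206 requires (hedged: `my_nothrow_new` is a stand-in name for illustration, not the libstdc++ code), the nothrow form forwards to the throwing form and converts any exception into a null pointer:

```cpp
#include <new>
#include <cstddef>

// Sketch of the LWG 206 requirement: the nothrow allocation function must
// obtain memory from the ordinary (throwing) operator new, so that a
// program replacing only the ordinary form still affects the nothrow form.
void *my_nothrow_new (std::size_t sz) noexcept
{
  try
    {
      return ::operator new (sz);
    }
  catch (...)
    {
      return nullptr;  // swallow bad_alloc and report failure as nullptr
    }
}
```

With this shape, a user-replaced `::operator new` automatically backs the nothrow form too, which is exactly the linkage the DR restores.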

PR libstdc++/68210
* doc/xml/manual/intro.xml: Document LWG 206 change.
* libsupc++/del_op.cc: Replace _GLIBCXX_USE_NOEXCEPT with noexcept.
* libsupc++/del_opa.cc: Likewise.
* libsupc++/del_opant.cc: Likewise.
* libsupc++/del_opnt.cc: Likewise. Call operator delete(ptr) instead
of free(ptr).
* libsupc++/del_ops.cc: Replace _GLIBCXX_USE_NOEXCEPT with noexcept.
* libsupc++/del_opsa.cc: Likewise.
* libsupc++/del_opva.cc: Likewise.
* libsupc++/del_opvant.cc: Likewise.
* libsupc++/del_opvnt.cc: Likewise. Call operator delete[](ptr)
instead of operator delete(ptr).
* libsupc++/del_opvs.cc: Replace _GLIBCXX_USE_NOEXCEPT with noexcept.
* libsupc++/del_opvsa.cc: Likewise.
* libsupc++/new_op.cc: Use __builtin_expect in check for zero size.
* libsupc++/new_opa.cc: Use nullptr instead of literal 0.
* libsupc++/new_opant.cc: Likewise. Replace _GLIBCXX_USE_NOEXCEPT
with noexcept.
* libsupc++/new_opnt.cc: Likewise. Call operator new(sz) instead of
malloc(sz).
* libsupc++/new_opvant.cc: Use nullptr and noexcept.
* libsupc++/new_opvnt.cc: Likewise. Call operator new[](sz) instead of
operator new(sz, nothrow).
* testsuite/18_support/new_nothrow.cc: New test.

Tested x86_64-linux, committed to trunk.


commit ef6c2cd63b0af7fa0823d81d0098361a46230b26
Author: Jonathan Wakely 
Date:   Fri Aug 10 19:59:00 2018 +0100

PR libstdc++/68210 adjust operator new and delete for LWG 206

Ensure that nothrow versions of new and delete call the ordinary
versions of new or delete, instead of calling malloc or free directly.

These files are all compiled with -std=gnu++14 so can use noexcept and
nullptr to make the code more readable.

PR libstdc++/68210
* doc/xml/manual/intro.xml: Document LWG 206 change.
* libsupc++/del_op.cc: Replace _GLIBCXX_USE_NOEXCEPT with noexcept.
* libsupc++/del_opa.cc: Likewise.
* libsupc++/del_opant.cc: Likewise.
* libsupc++/del_opnt.cc: Likewise. Call operator delete(ptr) instead
of free(ptr).
* libsupc++/del_ops.cc: Replace _GLIBCXX_USE_NOEXCEPT with noexcept.
* libsupc++/del_opsa.cc: Likewise.
* libsupc++/del_opva.cc: Likewise.
* libsupc++/del_opvant.cc: Likewise.
* libsupc++/del_opvnt.cc: Likewise. Call operator delete[](ptr)
instead of operator delete(ptr).
* libsupc++/del_opvs.cc: Replace _GLIBCXX_USE_NOEXCEPT with 
noexcept.
* libsupc++/del_opvsa.cc: Likewise.
* libsupc++/new_op.cc: Use __builtin_expect in check for zero size.
* libsupc++/new_opa.cc: Use nullptr instead of literal 0.
* libsupc++/new_opant.cc: Likewise. Replace _GLIBCXX_USE_NOEXCEPT
with noexcept.
* libsupc++/new_opnt.cc: Likewise. Call operator new(sz) instead of
malloc(sz).
* libsupc++/new_opvant.cc: Use nullptr and noexcept.
* libsupc++/new_opvnt.cc: Likewise. Call operator new[](sz) instead 
of
operator new(sz, nothrow).
* testsuite/18_support/new_nothrow.cc: New test.

diff --git a/libstdc++-v3/doc/xml/manual/intro.xml b/libstdc++-v3/doc/xml/manual/intro.xml
index fea07e2bb5f..cb187e1a2ed 100644
--- a/libstdc++-v3/doc/xml/manual/intro.xml
+++ b/libstdc++-v3/doc/xml/manual/intro.xml
@@ -440,6 +440,17 @@ requirements of the license of GCC.
 Yes, it can, specifically if EOF is reached while skipping 
whitespace.
 
 
+http://www.w3.org/1999/xlink; xlink:href="#206">206:
+   operator new(size_t, nothrow) may become
+  unlinked to ordinary operator new if ordinary
+  version replaced
+
+
+The nothrow forms of new and delete were
+  changed to call the throwing forms, handling any exception by
+  catching it and returning a null pointer.
+
+
 http://www.w3.org/1999/xlink; xlink:href="#211">211:
   operator>>(istream, string) doesn't set failbit
 
diff --git a/libstdc++-v3/libsupc++/del_op.cc b/libstdc++-v3/libsupc++/del_op.cc
index 9a5cb82fdf0..ab3b617afa7 100644
--- a/libstdc++-v3/libsupc++/del_op.cc
+++ b/libstdc++-v3/libsupc++/del_op.cc
@@ -44,7 +44,7 @@ _GLIBCXX_END_NAMESPACE_VERSION
 #pragma GCC diagnostic ignored "-Wsized-deallocation"
 
 _GLIBCXX_WEAK_DEFINITION void
-operator delete(void* ptr) _GLIBCXX_USE_NOEXCEPT
+operator delete(void* ptr) noexcept
 {
   std::free(ptr);
 }
diff --git a/libstdc++-v3/libsupc++/del_opa.cc b/libstdc++-v3/libsupc++/del_opa.cc
index 71f384df661..2a1f0aba3a1 100644

Re: [PATCH,nvptx] Use CUDA driver API to select default runtime launch, geometry

2018-08-10 Thread Cesar Philippidis
On 08/08/2018 08:19 AM, Tom de Vries wrote:
> On Wed, Aug 08, 2018 at 07:09:16AM -0700, Cesar Philippidis wrote:
>> On 08/07/2018 06:52 AM, Cesar Philippidis wrote:

Thanks for the review.  This version should address all of the following
remarks. However, one thing to note ...

>> [nvptx] Use CUDA driver API to select default runtime launch geometry
>>
>> 2018-08-YY  Cesar Philippidis  
>>
>>  libgomp/
>>  plugin/cuda/cuda.h (CUoccupancyB2DSize): New typedef.
>>  (cuDriverGetVersion): Declare.
>>  (cuOccupancyMaxPotentialBlockSizeWithFlags): Declare.
>>  plugin/plugin-nvptx.c (CUDA_ONE_CALL): Add entries for
>>  cuDriverGetVersion and cuOccupancyMaxPotentialBlockSize.
>>  (ptx_device): Add driver_version member.
>>  (nvptx_open_device): Initialize it.
>>  (nvptx_exec): Use cuOccupancyMaxPotentialBlockSize to set the
>>  default num_gangs and num_workers when the driver supports it.
>> ---
>>  libgomp/plugin/cuda-lib.def   |  2 ++
>>  libgomp/plugin/cuda/cuda.h|  4 
>>  libgomp/plugin/plugin-nvptx.c | 40 +++-
>>  3 files changed, 45 insertions(+), 1 deletion(-)
>>
>> diff --git a/libgomp/plugin/cuda-lib.def b/libgomp/plugin/cuda-lib.def
>> index be8e3b3..f2433e1 100644
>> --- a/libgomp/plugin/cuda-lib.def
>> +++ b/libgomp/plugin/cuda-lib.def
>> @@ -2,6 +2,7 @@ CUDA_ONE_CALL (cuCtxCreate)
>>  CUDA_ONE_CALL (cuCtxDestroy)
>>  CUDA_ONE_CALL (cuCtxGetCurrent)
>>  CUDA_ONE_CALL (cuCtxGetDevice)
>> +CUDA_ONE_CALL (cuDriverGetVersion)
> 
> Don't use cuDriverGetVersion.
> 
>>  CUDA_ONE_CALL (cuCtxPopCurrent)
>>  CUDA_ONE_CALL (cuCtxPushCurrent)
>>  CUDA_ONE_CALL (cuCtxSynchronize)
>> @@ -39,6 +40,7 @@ CUDA_ONE_CALL (cuModuleGetGlobal)
>>  CUDA_ONE_CALL (cuModuleLoad)
>>  CUDA_ONE_CALL (cuModuleLoadData)
>>  CUDA_ONE_CALL (cuModuleUnload)
>> +CUDA_ONE_CALL (cuOccupancyMaxPotentialBlockSize)
> 
> Use CUDA_ONE_CALL_MAYBE_NULL.
> 
>>  CUDA_ONE_CALL (cuStreamCreate)
>>  CUDA_ONE_CALL (cuStreamDestroy)
>>  CUDA_ONE_CALL (cuStreamQuery)
>> diff --git a/libgomp/plugin/cuda/cuda.h b/libgomp/plugin/cuda/cuda.h
>> index 4799825..3a790e6 100644
>> --- a/libgomp/plugin/cuda/cuda.h
>> +++ b/libgomp/plugin/cuda/cuda.h
>> @@ -44,6 +44,7 @@ typedef void *CUevent;
>>  typedef void *CUfunction;
>>  typedef void *CUlinkState;
>>  typedef void *CUmodule;
>> +typedef size_t (*CUoccupancyB2DSize)(int);
>>  typedef void *CUstream;
>>  
>>  typedef enum {
>> @@ -123,6 +124,7 @@ CUresult cuCtxSynchronize (void);
>>  CUresult cuDeviceGet (CUdevice *, int);
>>  CUresult cuDeviceGetAttribute (int *, CUdevice_attribute, CUdevice);
>>  CUresult cuDeviceGetCount (int *);
>> +CUresult cuDriverGetVersion(int *);
>>  CUresult cuEventCreate (CUevent *, unsigned);
>>  #define cuEventDestroy cuEventDestroy_v2
>>  CUresult cuEventDestroy (CUevent);
>> @@ -170,6 +172,8 @@ CUresult cuModuleGetGlobal (CUdeviceptr *, size_t *, 
>> CUmodule, const char *);
>>  CUresult cuModuleLoad (CUmodule *, const char *);
>>  CUresult cuModuleLoadData (CUmodule *, const void *);
>>  CUresult cuModuleUnload (CUmodule);
>> +CUresult cuOccupancyMaxPotentialBlockSize(int *, int *, CUfunction,
>> +  CUoccupancyB2DSize, size_t, int);
>>  CUresult cuStreamCreate (CUstream *, unsigned);
>>  #define cuStreamDestroy cuStreamDestroy_v2
>>  CUresult cuStreamDestroy (CUstream);
>> diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
>> index 825470a..b0ccf0b 100644
>> --- a/libgomp/plugin/plugin-nvptx.c
>> +++ b/libgomp/plugin/plugin-nvptx.c
>> @@ -376,6 +376,7 @@ struct ptx_device
>>int max_threads_per_block;
>>int max_threads_per_multiprocessor;
>>int default_dims[GOMP_DIM_MAX];
>> +  int driver_version;
>>  
>>struct ptx_image_data *images;  /* Images loaded on device.  */
>>pthread_mutex_t image_lock; /* Lock for above list.  */
>> @@ -687,6 +688,7 @@ nvptx_open_device (int n)
>>ptx_dev->ord = n;
>>ptx_dev->dev = dev;
>>ptx_dev->ctx_shared = false;
>> +  ptx_dev->driver_version = 0;
>>  
>>r = CUDA_CALL_NOCHECK (cuCtxGetDevice, _dev);
>>if (r != CUDA_SUCCESS && r != CUDA_ERROR_INVALID_CONTEXT)
>> @@ -780,6 +782,9 @@ nvptx_open_device (int n)
>>for (int i = 0; i != GOMP_DIM_MAX; i++)
>>  ptx_dev->default_dims[i] = 0;
>>  
>> +  CUDA_CALL_ERET (NULL, cuDriverGetVersion, &pi);
>> +  ptx_dev->driver_version = pi;
>> +
>>ptx_dev->images = NULL;
>>pthread_mutex_init (_dev->image_lock, NULL);
>>  
>> @@ -1173,11 +1178,44 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
>>  
>>{
>>  bool default_dim_p[GOMP_DIM_MAX];
>> +int vectors = nvthd->ptx_dev->default_dims[GOMP_DIM_VECTOR];
>> +int workers = nvthd->ptx_dev->default_dims[GOMP_DIM_WORKER];
>> +int gangs = nvthd->ptx_dev->default_dims[GOMP_DIM_GANG];
>> +

is that I modified the default value for vectors as follows

+   int vectors = default_dim_p[GOMP_DIM_VECTOR]
+ ? 

Re: [PATCH] Make strlen range computations more conservative

2018-08-10 Thread Martin Sebor

On 08/08/2018 11:36 PM, Jeff Law wrote:
> On 08/02/2018 09:42 AM, Martin Sebor wrote:
>> The warning bits are definitely not okay by me.  The purpose
>> of the warnings (-W{format,sprintf}-{overflow,truncation}) is
>> to detect buffer overflows.  When a warning doesn't have access
>> to string length information for dynamically created strings
>> (like the strlen pass does) it uses array sizes as a proxy.
>> This is useful both to detect possible buffer overflows and
>> to prevent false positives for overflows that cannot happen
>> in correctly written programs.
>
> So how much of this falling-back to array sizes as a proxy would become
> unnecessary if sprintf had access to the strlen pass as an analysis module?
>
> As you know we've been kicking that around and from my investigations
> that doesn't really look hard to do.  Encapsulate the data structures in
> a class, break up the statement handling into analysis and optimization
> and we should be good to go.  I'm hoping to start prototyping this week.
>
> If we think that has a reasonable chance to eliminate the array-size
> fallback, then that seems like the most promising path forward.


We discussed this idea this morning so let me respond here and
reiterate the answer.  Using the strlen data will help detect
buffer overflow where the array size isn't available but it
cannot replace the array size heuristic. Here's a simple
example:

  struct S { char a[8]; };

  char d[8];
  void f (struct S *s, int i)
  {
sprintf (d, "%s-%i",  s[i].a, i);
  }

We don't know the length of s->a but we do know that it can
be up to 7 bytes long (assuming it's nul-terminated of course)
so we know the sprintf call can overflow.  Conversely, if
the size of the destination is increased to 20, the sprintf
call cannot overflow so the diagnostic can be avoided.

Removing the array size heuristic would force us to either give
up on diagnosing the first case or issue false positives for
the second case.  I think the second alternative would make
the warning too noisy to be useful.

The strlen pass will help detect buffer overflows in cases
where the array size isn't known (e.g., with dynamically
allocated buffers) but where the length of the string store
in the array is known.  It will also help avoid false positives
in cases where the string stored in an array of known size is
known to be too short to cause an overflow.  For instance here:

  struct S { char a[8]; };

  char d[8];
  void f (struct S *s, int i)
  {
if (strlen (s->a) < 4 && i >= 0 && i < 100)
  sprintf (d, "%s-%i",  s->a, i);
  }

Martin



Re: [PATCH], Improve PowerPC switch behavior on medium code model system

2018-08-10 Thread Segher Boessenkool
On Tue, Jul 31, 2018 at 10:39:21AM -0400, Michael Meissner wrote:
> This patch adds an insn to load a LABEL_REF into a GPR.  This is needed so the
> FWPROP1 pass can convert the load of the label address from the TOC to a
> direct load to a GPR.

I don't see why you need a separate RTL insn for this.  It seems to me
that some more generic pattern should accept label_refs.

> While working on the patch, I discovered that the LWA instruction did not
> support indexed loads.  This was due to it using the 'Y' constraint, which
> accepts DS-form offsettable addresses, but not X-form indexed addresses.  I
> added the Z constraint so that the indexed form is accepted.

This part is fine.  Please split it out to a separate patch.

>   * config/rs6000/rs6000.md (extendsi2): Allow reg+reg indexed
>   addressing.

This should say it is changing the constraints.

>   (labelref): New insn to optimize loading a label address into
>   registers on a medium code system.

(*labelref) btw.


Segher


Re: [arm-8-branch] Add Linaro version string and macros

2018-08-10 Thread Yvan Roux
On Fri, 10 Aug 2018 at 17:01, Ramana Radhakrishnan
 wrote:
>
> On Fri, Aug 10, 2018 at 3:35 PM, Yvan Roux  wrote:
> > On Fri, 10 Aug 2018 at 14:31, Yvan Roux  wrote:
> >>
> >> On Fri, 10 Aug 2018 at 13:45, Ramana Radhakrishnan
> >>  wrote:
> >> >
> >> > On Fri, Aug 10, 2018 at 11:09 AM, Yvan Roux  wrote:
> >> > > Hi,
> >> > >
> >> > > This patch adds Linaro version string and release macros to ARM GCC 8
> >> > > vendor branch.
> >> > >
> >> > > Ok to commit?
> >> > >
> >> >
> >> >
> >> > Ok if no regressions. (I'm assuming you've built and eyeballed that
> >> > the pre-processor macros are being produced). (I have a patch to
> >> > www-docs for this branch that I'm writing up and should try and get
> >> > out today. Though it would be nice to have tests for these if
> >> > possible.
> >>
> >> I've not passed the regression testsuite since this patch is part of
> >> Linaro vendor branches for a long time, I've just built an x86_64 c
> >> compiler from arm-8-branch and verified the version string and macros:
> >>
> >> $ ~/wip/arm-8-install/bin/gcc --version
> >> gcc (Linaro GCC 8.2-2018.08~dev) 8.2.1 20180802
> >>
> >> $ ~/wip/arm-8-install/bin/gcc -E -dM - < /dev/null | grep LINARO
> >> #define __LINARO_SPIN__ 0
> >> #define __LINARO_RELEASE__ 201808
> >>
>> I can add some tests, but it will take me some time to remember how
>> this kind of thing is tested in the testsuite ;)
> >
> Updated version of the patch, with a test case for the Linaro macros.  The
> test is not strict with respect to the version or spin number, to avoid
> having to update it every release or re-spin; do you think it is sufficient?
> >
> > Testsuite ran with the included test PASS.
> >
> > gcc/ChangeLog
> > 2018-08-10  Yvan Roux  
> >
> > * LINARO-VERSION: New file.
> > * configure.ac: Add Linaro version string.
> > * configure: Regenerate.
> > * Makefile.in (LINAROVER, LINAROVER_C, LINAROVER_S): Define.
> > (CFLAGS-cppbuiltin.o): Add LINAROVER macro definition.
> > (cppbuiltin.o): Depend on $(LINAROVER).
> > * cppbuiltin.c (parse_linarover): New.
> > (define_GNUC__): Define __LINARO_RELEASE__ and __LINARO_SPIN__ macros.
> >
> > gcc/testsuite/ChangeLog
> > 2018-08-10  Yvan Roux  
> >
> > * gcc.dg/cpp/linaro-macros.c: New test.
>
>
> Looks good, thanks - can you put the ChangeLog in a Changelog.arm file?

Sure, does it need the FSF Copyright lines at the end like other ChangeLogs?

Thanks
Yvan

> regards
> Ramana


Re: [PATCH v2 3/4] vxworks: enable use of .init_array/.fini_array for cdtors

2018-08-10 Thread Olivier Hainque
Hello Rasmus,

> On 28 Jun 2018, at 10:43, Rasmus Villemoes  wrote:
> 
> Assume that if the user passed --enable-initfini-array when building
> gcc, the rest of the toolchain (including the link spec and linker
> script) is set up appropriately.
> 
> Note that configuring with --enable-initfini-array may prevent the -mrtp
> mode from working,

> due to the (unconditional) use of .init_array.*
> sections instead of .ctors.* -

Just spent some time verifying this, and indeed getting
an RTP to work with this setup is going to be tough, if at
all possible.

> however, that is the case regardless of this patch.

Right, though the situation becomes a bit different
with the patch as we now have references to the macro,
sort of implying it should work.

I still think it's fine, as what we do is simply honor
what the doc of --enable-initfini-array states: generate
constructors/destructors in init/fini_array sections,
nothing more, and the comments pretty clearly state that
it's the responsibility of the one who configured this
way to manage the constructors and fit them into the OS
runtime.

And we can revisit if we find a better way out.

So, as it allows at least a use case to operate smoothly
...

> 2018-06-04  Rasmus Villemoes  
> 
> gcc/
>   config/vxworks.c: Set targetm.have_ctors_dtors if 
> HAVE_INITFINI_ARRAY_SUPPORT.
>  config/vxworks.h: Set SUPPORTS_INIT_PRIORITY if HAVE_INITFINI_ARRAY_SUPPORT.

Ok, modulo ChangeLog reformatting:

* config/vxworks.c: Set targetm.have_ctors_dtors if
HAVE_INITFINI_ARRAY_SUPPORT.
* config/vxworks.h: Set SUPPORTS_INIT_PRIORITY if
HAVE_INITFINI_ARRAY_SUPPORT.

Thanks,

Olivier



Re: [RFC][PATCH][mid-end] Optimize immediate choice in comparisons.

2018-08-10 Thread Vlad Lazar

On 09/08/18 06:48, Jeff Law wrote:
> On 08/07/2018 02:11 PM, Richard Sandiford wrote:
>> Hi Vlad,
>>
>> Thanks for the patch.
>>
>> Vlad Lazar  writes:
>>> Hi.
>>>
>>> This patch optimises the choice of immediates in integer comparisons.
>>> Integer comparisons allow for different choices (e.g. a > b is
>>> equivalent to a >= b+1) and there are cases where an
>>> incremented/decremented immediate can be loaded into a register in
>>> fewer instructions.  The cases are as follows:
>>>
>>> i)   a >  b or a >= b + 1
>>> ii)  a <= b or a <  b + 1
>>> iii) a >= b or a >  b - 1
>>> iv)  a <  b or a <= b - 1
>>>
>>> For each comparison we check whether the equivalent can be performed
>>> in fewer instructions.  This is done on a statement by statement
>>> basis, right before the GIMPLE statement is expanded to RTL.
>>> Therefore, it requires a correct implementation of the
>>> TARGET_INSN_COST hook.  The change is general and it applies to any
>>> integer comparison, regardless of types or location.
>>>
>>> For example, on AArch64 for the following code:
>>>
>>> int foo (int x)
>>> {
>>> 	return x > 0xfefe;
>>> }
>>>
>>> it generates:
>>>
>>> 	mov	w1, -16777217
>>> 	cmp	w0, w1
>>> 	cset	w0, cs
>>>
>>> instead of:
>>>
>>> 	mov	w1, 65534
>>> 	movk	w1, 0xfeff, lsl 16
>>> 	cmp	w0, w1
>>> 	cset	w0, hi
>>>
>>> Bootstrapped and regtested on aarch64-none-linux-gnu,
>>> x86_64-pc-linux-gnu and there are no regressions.
>>
>> Looks like a useful feature.  I'm playing devil's advocate to some
>> extent here, but did you consider instead doing this during the
>> expansion functions themselves?  In some ways that feels more
>> natural because we're already operating on rtxes at that point.
>> It also has the advantage that it would trap comparisons that didn't
>> exist at the gimple level, such as when computing the carry for a
>> doubleword add/sub.
>
> I've got no strong opinions on doing it in cfgexpand vs the expansion
> functions themselves.  I'm happy to have you setting overall direction
> here Richard.
>
> I do worry about the amount of RTL we generate and throw away during
> cost computation.  Though it's just for comparisons, so it may not be
> terrible.
>
> I wouldn't be surprised if ports aren't particularly accurate in their
> costing computations for this kind of use -- but that's nothing new.  We
> run into it every time we use rtx costing in a new place.  I'm
> comfortable having targets fault in improvements for this kind of use.
>
> Jeff


Thank you for the feedback.

I agree with Richard's opinion that the change feels more natural in the
actual expanders.  Therefore, I've reimplemented the feature in the expansion
functions.  It's worth mentioning that it now also applies the optimization
to comparisons in ternary operators, and I added a test which reflects such a case.

Regarding the amount of RTL generated, I have tried to keep it at a minimum,
by only generating the immediate value moves.

I've bootstrapped and regtested again and there are no regressions.
See the updated patch below.

Thanks,
Vlad

---

This patch optimises the choice of immediates in integer comparisons. Integer
comparisons allow for different choices (e.g. a > b is equivalent to a >= b+1)
and there are cases where an incremented/decremented immediate can be loaded
into a register in fewer instructions. The cases are as follows:

i)   a >  b or a >= b + 1
ii)  a <= b or a <  b + 1
iii) a >= b or a >  b - 1
iv)  a <  b or a <= b - 1

For each comparison we check whether the equivalent can be performed in fewer
instructions. This is done in the gimple expanders. Therefore, it requires a
correct implementation of the TARGET_INSN_COST hook. The change is general and
it applies to any integer comparison, regardless of types or location.

For example, on AArch64 for the following code:

int foo (int x)
{
  return x > 0xfefe;
}

it generates:

	mov	w1, -16777217
	cmp	w0, w1
	cset	w0, cs

instead of:

	mov	w1, 65534
	movk	w1, 0xfeff, lsl 16
	cmp	w0, w1
	cset	w0, hi
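The cost-driven choice above can be modelled with a toy sketch (function names and the cost heuristic are invented for illustration; they are not the patch's code, and the real implementation must also guard against immediate overflow/wraparound):

```cpp
#include <cstdint>

enum Cmp { GT, GE, LT, LE };

// AArch64-flavoured stand-in cost: how many mov/movk (or movn/movk)
// instructions are needed to materialize a 32-bit immediate.
int imm_cost (std::uint32_t c)
{
  int nonzero = 0, nonones = 0;
  for (int i = 0; i < 32; i += 16)
    {
      std::uint32_t h = (c >> i) & 0xffff;
      nonzero += (h != 0);       // chunks a movz path must set
      nonones += (h != 0xffff);  // chunks a movn path must set
    }
  int movz_path = nonzero ? nonzero : 1;
  int movn_path = nonones ? nonones : 1;
  return movz_path < movn_path ? movz_path : movn_path;
}

// a > b <=> a >= b+1;  a <= b <=> a < b+1;  a >= b <=> a > b-1;
// a < b <=> a <= b-1.  Keep whichever immediate is cheaper to build.
Cmp canonicalize (Cmp code, std::uint32_t &imm)
{
  std::uint32_t alt = (code == GT || code == LE) ? imm + 1 : imm - 1;
  static const Cmp flipped[] = { GE, GT, LE, LT };  // GT<->GE, LT<->LE
  if (imm_cost (alt) < imm_cost (imm))
    {
      imm = alt;
      return flipped[code];
    }
  return code;
}
```

For instance, the mov/movk pair shown above builds 0xfefffffe (two chunks), while incrementing it to 0xfeffffff needs only a single movn of -16777217, so GT flips to GE.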

Bootstrapped and regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and 
there are no regressions.

Thanks,
Vlad

gcc/testsuite/ChangeLog
2018-08-10  Vlad Lazar  

	* gcc.target/aarch64/imm_choice_comparison.c: New test.

gcc/ChangeLog
2018-08-10  Vlad Lazar  

	* expmed.h (canonicalize_comparison, canonicalized_cmp_code): New
	declarations.
	* expmed.c (canonicalize_comparison, canonicalized_cmp_code): New
	implementations.
	(emit_store_flag_1): Add call to canonicalize_comparison.
	* optabs.c (prepare_cmp_insn): Likewise.

---

diff --git a/gcc/expmed.h b/gcc/expmed.h
index 2890d9c9bbd034f01030dd551d544bf73e73b784..86a32a643fdd0fc9f396bd2c7904244bd484df16 100644
--- a/gcc/expmed.h
+++ b/gcc/expmed.h
@@ -702,6 +702,10 @@ extern rtx emit_store_flag (rtx, enum rtx_code, rtx, rtx, machine_mode,
 extern rtx emit_store_flag_force (rtx, enum rtx_code, rtx, rtx,
  machine_mode, int, int);
 
+extern void canonicalize_comparison (machine_mode, enum rtx_code *, rtx *);

+
+extern enum rtx_code canonicalized_cmp_code 

Re: [PATCH][AARCH64] inline strlen for 8-bytes aligned strings

2018-08-10 Thread Anton Youdkevitch

Wilco,

On 10.08.2018 18:04, Wilco Dijkstra wrote:
> Hi,
>
> A quick benchmark shows it's faster up to about 10 bytes, but after that
> it becomes extremely slow.  At 16 bytes it's already 2.5 times slower and
> for larger sizes it's over 13 times slower than the GLIBC implementation...
>
>> The implementation falls back to the library call if the
>> string is not aligned.
>
> If it did that for larger sizes then it would be fine.  However a byte
> loop is unacceptably slow.
>
> Also given the large amount of inlined code, it would make sense to handle
> larger sizes than 8.  It may be worth comparing a loop doing 8 bytes per
> iteration with the GLIBC strlen, or just inline the first 16 bytes and
> then fall back to strlen.
> Also, if you have statistics that show tiny strlen sizes are much more
> common, then the strlen implementation could be further tuned for that.

Valid points, thanks.  I will consider that.
BTW, what HW did you use for the benchmarking?


Re: [PATCH][AARCH64] inline strlen for 8-bytes aligned strings

2018-08-10 Thread Wilco Dijkstra
Hi,

A quick benchmark shows it's faster up to about 10 bytes, but after that it
becomes extremely slow.  At 16 bytes it's already 2.5 times slower and for
larger sizes it's over 13 times slower than the GLIBC implementation...

> The implementation falls back to the library call if the
> string is not aligned. 

If it did that for larger sizes then it would be fine.  However a byte loop
is unacceptably slow.

Also given the large amount of inlined code, it would make sense to handle
larger sizes than 8.  It may be worth comparing a loop doing 8 bytes per
iteration with the GLIBC strlen, or just inline the first 16 bytes and then
fall back to strlen.

Also, if you have statistics that show tiny strlen sizes are much more common,
then the strlen implementation could be further tuned for that.
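The 8-bytes-per-iteration loop suggested here can be sketched with the classic zero-byte-in-word bit trick (an illustrative sketch, not the glibc or proposed GCC code; it assumes the input pointer is 8-byte aligned):

```cpp
#include <cstdint>
#include <cstddef>

// strlen for 8-byte-aligned input: scan one 64-bit word per iteration.
// (v - 0x01..01) & ~v & 0x80..80 is nonzero iff some byte of v is zero,
// so the final byte loop runs at most 7 times.
std::size_t strlen8 (const char *s)
{
  const std::uint64_t ones  = 0x0101010101010101ULL;
  const std::uint64_t highs = 0x8080808080808080ULL;
  const std::uint64_t *p = reinterpret_cast<const std::uint64_t *> (s);
  while (((*p - ones) & ~*p & highs) == 0)
    ++p;
  // The zero byte is somewhere in *p; find it byte by byte.
  const char *q = reinterpret_cast<const char *> (p);
  while (*q != '\0')
    ++q;
  return q - s;
}
```

The zero-detection test works on either endianness since the tail is a byte scan; a production version would additionally handle unaligned prefixes rather than calling the library.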

Wilco


Re: [arm-8-branch] Add Linaro version string and macros

2018-08-10 Thread Ramana Radhakrishnan
On Fri, Aug 10, 2018 at 3:35 PM, Yvan Roux  wrote:
> On Fri, 10 Aug 2018 at 14:31, Yvan Roux  wrote:
>>
>> On Fri, 10 Aug 2018 at 13:45, Ramana Radhakrishnan
>>  wrote:
>> >
>> > On Fri, Aug 10, 2018 at 11:09 AM, Yvan Roux  wrote:
>> > > Hi,
>> > >
>> > > This patch adds Linaro version string and release macros to ARM GCC 8
>> > > vendor branch.
>> > >
>> > > Ok to commit?
>> > >
>> >
>> >
>> > Ok if no regressions. (I'm assuming you've built and eyeballed that
>> > the pre-processor macros are being produced). (I have a patch to
>> > www-docs for this branch that I'm writing up and should try and get
>> > out today. Though it would be nice to have tests for these if
>> > possible.
>>
>> I've not passed the regression testsuite since this patch is part of
>> Linaro vendor branches for a long time, I've just built an x86_64 c
>> compiler from arm-8-branch and verified the version string and macros:
>>
>> $ ~/wip/arm-8-install/bin/gcc --version
>> gcc (Linaro GCC 8.2-2018.08~dev) 8.2.1 20180802
>>
>> $ ~/wip/arm-8-install/bin/gcc -E -dM - < /dev/null | grep LINARO
>> #define __LINARO_SPIN__ 0
>> #define __LINARO_RELEASE__ 201808
>>
>> I can add some tests, but it will take me some time to remember how
>> this kind of thing is tested in the testsuite ;)
>
> Updated version of the patch, with a test case for Linaro macros.  The
> test is not strict w.r.t. the version or spin number, to avoid having
> to update it every release or re-spin; do you think it is sufficient?
>
> Testsuite ran with the included test PASS.
>
> gcc/ChangeLog
> 2018-08-10  Yvan Roux  
>
> * LINARO-VERSION: New file.
> * configure.ac: Add Linaro version string.
> * configure: Regenerate.
> * Makefile.in (LINAROVER, LINAROVER_C, LINAROVER_S): Define.
> (CFLAGS-cppbuiltin.o): Add LINAROVER macro definition.
> (cppbuiltin.o): Depend on $(LINAROVER).
> * cppbuiltin.c (parse_linarover): New.
> (define_GNUC__): Define __LINARO_RELEASE__ and __LINARO_SPIN__ macros.
>
> gcc/testsuite/ChangeLog
> 2018-08-10  Yvan Roux  
>
> * gcc.dg/cpp/linaro-macros.c: New test.


Looks good, thanks - can you put the Changelog in a Changelog.arm file?

regards
Ramana


Re: [arm-8-branch] Add Linaro version string and macros

2018-08-10 Thread Yvan Roux
On Fri, 10 Aug 2018 at 14:31, Yvan Roux  wrote:
>
> On Fri, 10 Aug 2018 at 13:45, Ramana Radhakrishnan
>  wrote:
> >
> > On Fri, Aug 10, 2018 at 11:09 AM, Yvan Roux  wrote:
> > > Hi,
> > >
> > > This patch adds Linaro version string and release macros to ARM GCC 8
> > > vendor branch.
> > >
> > > Ok to commit?
> > >
> >
> >
> > Ok if no regressions. (I'm assuming you've built and eyeballed that
> > the pre-processor macros are being produced). (I have a patch to
> > www-docs for this branch that I'm writing up and should try and get
> > out today. Though it would be nice to have tests for these if
> > possible.
>
> I haven't run the regression testsuite, since this patch has been part of
> the Linaro vendor branches for a long time; I've just built an x86_64 C
> compiler from arm-8-branch and verified the version string and macros:
>
> $ ~/wip/arm-8-install/bin/gcc --version
> gcc (Linaro GCC 8.2-2018.08~dev) 8.2.1 20180802
>
> $ ~/wip/arm-8-install/bin/gcc -E -dM - < /dev/null | grep LINARO
> #define __LINARO_SPIN__ 0
> #define __LINARO_RELEASE__ 201808
>
> I can add some tests, but it will take me some time to remember how
> this kind of thing is tested in the testsuite ;)

Updated version of the patch, with a test case for Linaro macros.  The
test is not strict w.r.t. the version or spin number, to avoid having
to update it every release or re-spin; do you think it is sufficient?

Testsuite ran with the included test PASS.

gcc/ChangeLog
2018-08-10  Yvan Roux  

* LINARO-VERSION: New file.
* configure.ac: Add Linaro version string.
* configure: Regenerate.
* Makefile.in (LINAROVER, LINAROVER_C, LINAROVER_S): Define.
(CFLAGS-cppbuiltin.o): Add LINAROVER macro definition.
(cppbuiltin.o): Depend on $(LINAROVER).
* cppbuiltin.c (parse_linarover): New.
(define_GNUC__): Define __LINARO_RELEASE__ and __LINARO_SPIN__ macros.

gcc/testsuite/ChangeLog
2018-08-10  Yvan Roux  

* gcc.dg/cpp/linaro-macros.c: New test.
Index: gcc/LINARO-VERSION
===
--- gcc/LINARO-VERSION	(nonexistent)
+++ gcc/LINARO-VERSION	(working copy)
@@ -0,0 +1 @@
+8.2-2018.08~dev
Index: gcc/Makefile.in
===
--- gcc/Makefile.in	(revision 263464)
+++ gcc/Makefile.in	(working copy)
@@ -854,10 +854,12 @@
 DEVPHASE:= $(srcdir)/DEV-PHASE # experimental, prerelease, ""
 DATESTAMP   := $(srcdir)/DATESTAMP # MMDD or empty
 REVISION:= $(srcdir)/REVISION  # [BRANCH revision XX]
+LINAROVER   := $(srcdir)/LINARO-VERSION # M.x-.MM[-S][~dev]
 
 BASEVER_c   := $(shell cat $(BASEVER))
 DEVPHASE_c  := $(shell cat $(DEVPHASE))
 DATESTAMP_c := $(shell cat $(DATESTAMP))
+LINAROVER_c := $(shell cat $(LINAROVER))
 
 ifeq (,$(wildcard $(REVISION)))
 REVISION_c  :=
@@ -884,6 +886,7 @@
   "\"$(if $(DEVPHASE_c)$(filter-out 0,$(PATCHLEVEL_c)), $(DATESTAMP_c))\""
 PKGVERSION_s:= "\"@PKGVERSION@\""
 BUGURL_s:= "\"@REPORT_BUGS_TO@\""
+LINAROVER_s := "\"$(LINAROVER_c)\""
 
 PKGVERSION  := @PKGVERSION@
 BUGURL_TEXI := @REPORT_BUGS_TEXI@
@@ -2883,8 +2886,9 @@
   -DSTANDARD_EXEC_PREFIX=\"$(libdir)/gcc/\" \
   @TARGET_SYSTEM_ROOT_DEFINE@
 
-CFLAGS-cppbuiltin.o += $(PREPROCESSOR_DEFINES) -DBASEVER=$(BASEVER_s)
-cppbuiltin.o: $(BASEVER)
+CFLAGS-cppbuiltin.o += $(PREPROCESSOR_DEFINES) -DBASEVER=$(BASEVER_s) \
+	-DLINAROVER=$(LINAROVER_s)
+cppbuiltin.o: $(BASEVER) $(LINAROVER)
 
 CFLAGS-cppdefault.o += $(PREPROCESSOR_DEFINES)
 
Index: gcc/configure
===
--- gcc/configure	(revision 263464)
+++ gcc/configure	(working copy)
@@ -1726,7 +1726,8 @@
   --with-stabsarrange to use stabs instead of host debug format
   --with-dwarf2   force the default debug format to be DWARF 2
   --with-specs=SPECS  add SPECS to driver command-line processing
-  --with-pkgversion=PKG   Use PKG in the version string in place of "GCC"
+  --with-pkgversion=PKG   Use PKG in the version string in place of "Linaro
+  GCC `cat $srcdir/LINARO-VERSION`"
   --with-bugurl=URL   Direct users to URL to report a bug
   --with-multilib-listselect multilibs (AArch64, SH and x86-64 only)
   --with-gnu-ld   assume the C compiler uses GNU ld default=no
@@ -7649,7 +7650,7 @@
   *)   PKGVERSION="($withval) " ;;
  esac
 else
-  PKGVERSION="(GCC) "
+  PKGVERSION="(Linaro GCC `cat $srcdir/LINARO-VERSION`) "
 
 fi
 
@@ -18448,7 +18449,7 @@
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18451 "configure"
+#line 18452 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -18554,7 +18555,7 @@
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18557 "configure"
+#line 18558 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
Index: gcc/configure.ac

Fix even more merging PIC and PIE options

2018-08-10 Thread Jan Hubicka
Hi,
this patch should fix merging of PIC and PIE options so that we always resort
to the least common denominator of the object files being linked (i.e.
linking together -fpic and -fPIE will result in a -fpie binary).
Note that we also use the information about the output format passed by the
linker plugin, so we disable PIC/PIE when the resulting binary is not
relocatable.  For shared libraries and PIE, however, we want to pick the
mode that makes sense.

I wrote a simple script that tries all possible ways of combining two files.
The mode picked is specified in the output file.

To support targets that default to pic/pie well, I had to also hack
lto_write_options to always stream what mode was used in the given file.

lto-bootstrapped/regtested x86_64-linux, OK?

Honza

PR lto/86517
* lto-opts.c (lto_write_options): Always stream PIC/PIE mode.
* lto-wrapper.c (merge_and_complain): Fix merging of PIC/PIE
Index: lto-opts.c
===
--- lto-opts.c  (revision 263356)
+++ lto-opts.c  (working copy)
@@ -78,6 +78,22 @@ lto_write_options (void)
   && !global_options.x_flag_openacc)
 append_to_collect_gcc_options (_obstack, _p,
   "-fno-openacc");
+  /* Append PIC/PIE mode because its default depends on target and it is
+ subject of merging in lto-wrapper.  */
+  if ((!global_options_set.x_flag_pic || global_options.x_flag_pic == 0)
+  && !global_options_set.x_flag_pie)
+{
+   append_to_collect_gcc_options (_obstack, _p,
+ global_options.x_flag_pic == 2
+ ? "-fPIC"
+ : global_options.x_flag_pic == 1
+ ? "-fpic"
+ : global_options.x_flag_pie == 2
+ ? "-fPIE"
+ : global_options.x_flag_pie == 1
+ ? "-fpie"
+ : "-fno-pie");
+}
 
   /* Append options from target hook and store them to offload_lto section.  */
   if (lto_stream_offload_p)
Index: lto-wrapper.c
===
--- lto-wrapper.c   (revision 263356)
+++ lto-wrapper.c   (working copy)
@@ -408,6 +408,11 @@ merge_and_complain (struct cl_decoded_op
  It is a common mistake to mix few -fPIC compiled objects into otherwise
  non-PIC code.  We do not want to build everything with PIC then.
 
+ Similarly we merge PIE options, however in addition we keep
+  -fPIC + -fPIE = -fPIE
+  -fpic + -fPIE = -fpie
+  -fPIC/-fpic + -fpie = -fpie
+
  It would be good to warn on mismatches, but it is bit hard to do as
  we do not know what nothing translates to.  */
 
@@ -415,11 +420,38 @@ merge_and_complain (struct cl_decoded_op
 if ((*decoded_options)[j].opt_index == OPT_fPIC
 || (*decoded_options)[j].opt_index == OPT_fpic)
   {
-   if (!pic_option
-   || (pic_option->value > 0) != ((*decoded_options)[j].value > 0))
- remove_option (decoded_options, j, decoded_options_count);
-   else if (pic_option->opt_index == OPT_fPIC
-&& (*decoded_options)[j].opt_index == OPT_fpic)
+   /* -fno-pic in one unit implies -fno-pic everywhere.  */
+   if ((*decoded_options)[j].value == 0)
+ j++;
+   /* If we have no pic option or merge in -fno-pic, we still may turn
+  existing pic/PIC mode into pie/PIE if -fpie/-fPIE is present.  */
+   else if ((pic_option && pic_option->value == 0)
+|| !pic_option)
+ {
+   if (pie_option)
+ {
+   bool big = (*decoded_options)[j].opt_index == OPT_fPIC
+  && pie_option->opt_index == OPT_fPIE;
+   (*decoded_options)[j].opt_index = big ? OPT_fPIE : OPT_fpie;
+   if (pie_option->value)
+ (*decoded_options)[j].canonical_option[0] = big ? "-fPIE" : 
"-fpie";
+   else
+ (*decoded_options)[j].canonical_option[0] = big ? "-fno-pie" 
: "-fno-pie";
+   (*decoded_options)[j].value = pie_option->value;
+   j++;
+ }
+   else if (pic_option)
+ {
+   (*decoded_options)[j] = *pic_option;
+   j++;
+ }
+   /* We do not know if target defaults to pic or not, so just remove
+  option if it is missing in one unit but enabled in other.  */
+   else
+ remove_option (decoded_options, j, decoded_options_count);
+ }
+   else if (pic_option->opt_index == OPT_fpic
+&& (*decoded_options)[j].opt_index == OPT_fPIC)
  {
(*decoded_options)[j] = *pic_option;
j++;
@@ -430,11 +462,42 @@ merge_and_complain (struct cl_decoded_op
else if 

Re: [PATCH][AARCH64] inline strlen for 8-bytes aligned strings

2018-08-10 Thread Anton Youdkevitch

Richard,

On 10.08.2018 16:54, Richard Earnshaw (lists) wrote:

On 10/08/18 14:38, Anton Youdkevitch wrote:

The patch inlines strlen for 8-byte aligned strings on
AARCH64 like it's done on other platforms (power, s390).
The implementation falls back to the library call if the
string is not aligned. Synthetic testing on Cavium T88
and Cavium T99 showed the following performance gains:

T99: up to 8 bytes: +100%, 100+ bytes - +20%
T88: up to 8 bytes: +100%, 100+ bytes - 0%

which seems to be OK as most strings are short.

SPEC performance testing on T99 and T88 did not show any
statistically significant differences.

Bootstrapped and regression-tested on aarch64-linux-gnu.
No new failures found. OK for trunk?

2018-08-10 Anton Youdkevitch 

   * gcc/config/aarch64/aarch64.md (strlen): New pattern.
   (UNSPEC_BUILTIN_STRLEN): Define.
   * gcc/config/aarch64/aarch64.c (aarch64_expand_strlen):
   Expand only in 8-byte aligned case, do not attempt to
   adjust address.
   * gcc/config/aarch64/aarch64-protos.h
   (aarch64_expand_strlen): Declare.
   * gcc/testsuite/gcc.target/aarch64/strlen_aligned.c: New test.



I'm not generally convinced by this kind of optimization.  It can be a
win if the string is short, but it can be a major loss if the string is
long, because the library function can do a number of tricks to improve
performance significantly, such as switching opportunistically to Neon
or SVE, or even just unwinding the loop further.  We might end up with
the bizarre situation where misaligned strings are faster to handle than
aligned strings.

So before even considering this, I'd like to see it benchmarked against
a good library implementation across a range of string lengths.  It is
NOT sufficient to show that it is a win for short strings.

Having said that, a quick dynamic test for the first 8 bytes containing
a NULL byte and calling the library if that is not the case could be a
significant win with minimal overhead for the fail case.

Thank you for the feedback.

I had the version you suggested in mind. I'll get back with the
updated patch.




R.



---

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index cda2895..9beb289 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -358,6 +358,7 @@ bool aarch64_emit_approx_div (rtx, rtx, rtx);
  bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
  void aarch64_expand_call (rtx, rtx, bool);
  bool aarch64_expand_movmem (rtx *);
+void aarch64_expand_strlen (rtx *);
  bool aarch64_float_const_zero_rtx_p (rtx);
  bool aarch64_float_const_rtx_p (rtx);
  bool aarch64_function_arg_regno_p (unsigned);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 4b5183b..d12fb6b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -16107,6 +16107,81 @@ aarch64_expand_movmem (rtx *operands)
return true;
  }
  
+/* Emit code to perform a strlen.

+
+   OPERANDS[0] is the destination.
+   OPERANDS[1] is the string.
+   OPERANDS[2] is the char to search.
+   OPERANDS[3] is the alignment.  */
+
+void aarch64_expand_strlen (rtx* operands) {
+  rtx result = operands[0];
+  rtx src = operands[1];
+  rtx loop_label = gen_label_rtx ();
+  rtx end_label = gen_label_rtx ();
+  rtx end_loop_label = gen_label_rtx ();
+  rtx preloop_label = gen_label_rtx ();
+  rtx str = gen_reg_rtx (DImode);
+  rtx addr = force_reg (DImode, XEXP (src, 0));
+  rtx start_addr = gen_reg_rtx(DImode);
+  rtx tmp1 = gen_reg_rtx (DImode);
+  rtx tmp2 = gen_reg_rtx (DImode);
+  rtx tmp3 = gen_reg_rtx (DImode);
+  rtx mask1 = gen_reg_rtx (DImode);
+  rtx mask2 = gen_reg_rtx (DImode);
+  rtx x;
+  rtx mem;
+
+  emit_insn (gen_rtx_SET (start_addr, addr));
+  emit_insn (gen_anddi3 (tmp1, addr, GEN_INT (4096 - 1)));
+  /* if less than 16 bytes left till the end of the page */
+  x = gen_rtx_GT (DImode, tmp1, GEN_INT (4096 - 16));
+  x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
+gen_rtx_LABEL_REF (Pmode, preloop_label), pc_rtx);
+
+  emit_move_insn (str, gen_rtx_MEM (DImode, addr));
+  emit_insn (gen_rtx_SET (mask1, GEN_INT (0x0101010101010101)));
+  emit_insn (gen_rtx_SET (mask2, GEN_INT (0x7f7f7f7f7f7f7f7f)));
+
+  /* process the chunk */
+  emit_insn (gen_subdi3 (tmp1, str, mask1));
+  emit_insn (gen_iordi3 (tmp2, str, mask2));
+  emit_insn (gen_rtx_SET (tmp2, gen_rtx_NOT (DImode, tmp2)));
+  emit_insn (gen_anddi3 (tmp3, tmp1, tmp2));
+
+
+  /* if NULL found jump to calculate its exact position */
+  x = gen_rtx_NE (DImode, tmp3, GEN_INT (0));
+  x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
+gen_rtx_LABEL_REF (Pmode, end_loop_label), pc_rtx);
+  emit_jump_insn (gen_rtx_SET (pc_rtx, x));
+
+  emit_insn (gen_adddi3 (addr, addr, GEN_INT (8)));
+  emit_label (preloop_label);
+  mem = gen_rtx_POST_MODIFY (DImode, addr, plus_constant (DImode, addr, 1));
+
+  /* simple byte loop */
+  emit_label 

Re: [RFC][PATCH v2] PR preprocessor/83173: Additional check before decrementing highest_location

2018-08-10 Thread Mike Gulick
On 05/29/2018 11:25 AM, Mike Gulick wrote:
> On 03/04/2018 02:27 PM, Mike Gulick wrote:
>>
>>
>> On 02/09/2018 05:54 PM, Mike Gulick wrote:
>>> Hi David,
>>>
>>> Here is a new version of the linemap patch (see my earlier emails for an 
>>> updated
>>> version of the test code).
>>
>> 
>>
>> Hi David,
>>
>> Just wondering if you have had a chance to look at these updated patches
>> yet.  Let me know if you have any questions I can answer, or if there is
>> anything you would like me to do that would make reviewing them easier
>> (reposting, rebasing, refactoring the bug fix from the diagnostics
>> change in the last patch).
>>
>> The most recent postings are:
>> Bug fix patch: https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00557.html
>> Test case: https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01993.html
>>
>> Thanks,
>> Mike
>>
> 
> Hi David,
> 
> Now that gcc 8.1 is out the door, would you have any time to review these
> patches?  I re-tested them after rebasing on the latest git master, and
> everything still behaved as expected.  I can post the rebased patches if you
> would like, but it was a trivial merge with no conflicts.
> 
> Thanks,
> Mike
> 

I'd still like to get this bug fixed.  I don't think I need the [RFC] tag any
more on these patches.  Should I post a new thread to this list to get the ball
rolling again?



Re: [RFC,PATCH] Introduce -msdata=explicit for powerpc

2018-08-10 Thread Segher Boessenkool
Hi Alexandre,

On Thu, Aug 09, 2018 at 03:23:12AM -0300, Alexandre Oliva wrote:
> On Aug  8, 2018, Segher Boessenkool  wrote:
> 
> > Then you get sdata2 used (via srodata in generic code), and it is accessed
> > via GPR2 (via the sda21 reloc and linker magic).  It is hard to trace down 
> > :-)
> 
> Aah, it didn't occur to me that the r2 uses could be introduced by the
> linker.  I was cutting corners and looking at the asm output, after
> building only gcc, without binutils.  Thanks for the explanation.  Mind
> if I add a comment about that next to... hmm...  how about the
> SMALL_DATA_* macros?

This is fine, please commit to trunk.  Thanks!


Segher


Re: [PATCH][AARCH64] inline strlen for 8-bytes aligned strings

2018-08-10 Thread Richard Earnshaw (lists)
On 10/08/18 14:38, Anton Youdkevitch wrote:
> The patch inlines strlen for 8-byte aligned strings on
> AARCH64 like it's done on other platforms (power, s390).
> The implementation falls back to the library call if the
> string is not aligned. Synthetic testing on Cavium T88
> and Cavium T99 showed the following performance gains:
> 
> T99: up to 8 bytes: +100%, 100+ bytes - +20%
> T88: up to 8 bytes: +100%, 100+ bytes - 0%
> 
> which seems to be OK as most strings are short.
> 
> SPEC performance testing on T99 and T88 did not show any
> statistically significant differences.
> 
> Bootstrapped and regression-tested on aarch64-linux-gnu.
> No new failures found. OK for trunk?
> 
> 2018-08-10 Anton Youdkevitch 
> 
>   * gcc/config/aarch64/aarch64.md (strlen): New pattern.
>   (UNSPEC_BUILTIN_STRLEN): Define.
>   * gcc/config/aarch64/aarch64.c (aarch64_expand_strlen):
>   Expand only in 8-byte aligned case, do not attempt to
>   adjust address.
>   * gcc/config/aarch64/aarch64-protos.h
>   (aarch64_expand_strlen): Declare.
>   * gcc/testsuite/gcc.target/aarch64/strlen_aligned.c: New test.


I'm not generally convinced by this kind of optimization.  It can be a
win if the string is short, but it can be a major loss if the string is
long, because the library function can do a number of tricks to improve
performance significantly, such as switching opportunistically to Neon
or SVE, or even just unwinding the loop further.  We might end up with
the bizarre situation where misaligned strings are faster to handle than
aligned strings.

So before even considering this, I'd like to see it benchmarked against
a good library implementation across a range of string lengths.  It is
NOT sufficient to show that it is a win for short strings.

Having said that, a quick dynamic test for the first 8 bytes containing
a NULL byte and calling the library if that is not the case could be a
significant win with minimal overhead for the fail case.

R.

> 
> ---
> 
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index cda2895..9beb289 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -358,6 +358,7 @@ bool aarch64_emit_approx_div (rtx, rtx, rtx);
>  bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
>  void aarch64_expand_call (rtx, rtx, bool);
>  bool aarch64_expand_movmem (rtx *);
> +void aarch64_expand_strlen (rtx *);
>  bool aarch64_float_const_zero_rtx_p (rtx);
>  bool aarch64_float_const_rtx_p (rtx);
>  bool aarch64_function_arg_regno_p (unsigned);
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 4b5183b..d12fb6b 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -16107,6 +16107,81 @@ aarch64_expand_movmem (rtx *operands)
>return true;
>  }
>  
> +/* Emit code to perform a strlen.
> +
> +   OPERANDS[0] is the destination.
> +   OPERANDS[1] is the string.
> +   OPERANDS[2] is the char to search.
> +   OPERANDS[3] is the alignment.  */
> +
> +void aarch64_expand_strlen (rtx* operands) {
> +  rtx result = operands[0];
> +  rtx src = operands[1];
> +  rtx loop_label = gen_label_rtx ();
> +  rtx end_label = gen_label_rtx ();
> +  rtx end_loop_label = gen_label_rtx ();
> +  rtx preloop_label = gen_label_rtx ();
> +  rtx str = gen_reg_rtx (DImode);
> +  rtx addr = force_reg (DImode, XEXP (src, 0));
> +  rtx start_addr = gen_reg_rtx(DImode);
> +  rtx tmp1 = gen_reg_rtx (DImode);
> +  rtx tmp2 = gen_reg_rtx (DImode);
> +  rtx tmp3 = gen_reg_rtx (DImode);
> +  rtx mask1 = gen_reg_rtx (DImode);
> +  rtx mask2 = gen_reg_rtx (DImode);
> +  rtx x;
> +  rtx mem;
> +
> +  emit_insn (gen_rtx_SET (start_addr, addr));
> +  emit_insn (gen_anddi3 (tmp1, addr, GEN_INT (4096 - 1)));
> +  /* if less than 16 bytes left till the end of the page */
> +  x = gen_rtx_GT (DImode, tmp1, GEN_INT (4096 - 16));
> +  x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
> +gen_rtx_LABEL_REF (Pmode, preloop_label), 
> pc_rtx);
> +
> +  emit_move_insn (str, gen_rtx_MEM (DImode, addr));
> +  emit_insn (gen_rtx_SET (mask1, GEN_INT (0x0101010101010101)));
> +  emit_insn (gen_rtx_SET (mask2, GEN_INT (0x7f7f7f7f7f7f7f7f)));
> +
> +  /* process the chunk */
> +  emit_insn (gen_subdi3 (tmp1, str, mask1));
> +  emit_insn (gen_iordi3 (tmp2, str, mask2));
> +  emit_insn (gen_rtx_SET (tmp2, gen_rtx_NOT (DImode, tmp2)));
> +  emit_insn (gen_anddi3 (tmp3, tmp1, tmp2));
> +
> +
> +  /* if NULL found jump to calculate its exact position */
> +  x = gen_rtx_NE (DImode, tmp3, GEN_INT (0));
> +  x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
> +gen_rtx_LABEL_REF (Pmode, end_loop_label), 
> pc_rtx);
> +  emit_jump_insn (gen_rtx_SET (pc_rtx, x));
> +
> +  emit_insn (gen_adddi3 (addr, addr, GEN_INT (8)));
> +  emit_label (preloop_label);
> +  mem = gen_rtx_POST_MODIFY (DImode, addr, plus_constant (DImode, addr, 1));
> +
> +  /* simple byte loop */
> +  

[PATCH][AARCH64] inline strlen for 8-bytes aligned strings

2018-08-10 Thread Anton Youdkevitch
The patch inlines strlen for 8-byte aligned strings on
AARCH64 like it's done on other platforms (power, s390).
The implementation falls back to the library call if the
string is not aligned. Synthetic testing on Cavium T88
and Cavium T99 showed the following performance gains:

T99: up to 8 bytes: +100%, 100+ bytes - +20%
T88: up to 8 bytes: +100%, 100+ bytes - 0%

which seems to be OK as most strings are short.

SPEC performance testing on T99 and T88 did not show any
statistically significant differences.

Bootstrapped and regression-tested on aarch64-linux-gnu.
No new failures found. OK for trunk?

2018-08-10 Anton Youdkevitch 

  * gcc/config/aarch64/aarch64.md (strlen): New pattern.
  (UNSPEC_BUILTIN_STRLEN): Define.
  * gcc/config/aarch64/aarch64.c (aarch64_expand_strlen):
  Expand only in 8-byte aligned case, do not attempt to
  adjust address.
  * gcc/config/aarch64/aarch64-protos.h
  (aarch64_expand_strlen): Declare.
  * gcc/testsuite/gcc.target/aarch64/strlen_aligned.c: New test.

---

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index cda2895..9beb289 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -358,6 +358,7 @@ bool aarch64_emit_approx_div (rtx, rtx, rtx);
 bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
 void aarch64_expand_call (rtx, rtx, bool);
 bool aarch64_expand_movmem (rtx *);
+void aarch64_expand_strlen (rtx *);
 bool aarch64_float_const_zero_rtx_p (rtx);
 bool aarch64_float_const_rtx_p (rtx);
 bool aarch64_function_arg_regno_p (unsigned);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 4b5183b..d12fb6b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -16107,6 +16107,81 @@ aarch64_expand_movmem (rtx *operands)
   return true;
 }
 
+/* Emit code to perform a strlen.
+
+   OPERANDS[0] is the destination.
+   OPERANDS[1] is the string.
+   OPERANDS[2] is the char to search.
+   OPERANDS[3] is the alignment.  */
+
+void aarch64_expand_strlen (rtx* operands) {
+  rtx result = operands[0];
+  rtx src = operands[1];
+  rtx loop_label = gen_label_rtx ();
+  rtx end_label = gen_label_rtx ();
+  rtx end_loop_label = gen_label_rtx ();
+  rtx preloop_label = gen_label_rtx ();
+  rtx str = gen_reg_rtx (DImode);
+  rtx addr = force_reg (DImode, XEXP (src, 0));
+  rtx start_addr = gen_reg_rtx(DImode);
+  rtx tmp1 = gen_reg_rtx (DImode);
+  rtx tmp2 = gen_reg_rtx (DImode);
+  rtx tmp3 = gen_reg_rtx (DImode);
+  rtx mask1 = gen_reg_rtx (DImode);
+  rtx mask2 = gen_reg_rtx (DImode);
+  rtx x;
+  rtx mem;
+
+  emit_insn (gen_rtx_SET (start_addr, addr));
+  emit_insn (gen_anddi3 (tmp1, addr, GEN_INT (4096 - 1)));
+  /* if less than 16 bytes left till the end of the page */
+  x = gen_rtx_GT (DImode, tmp1, GEN_INT (4096 - 16));
+  x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
+gen_rtx_LABEL_REF (Pmode, preloop_label), pc_rtx);
+
+  emit_move_insn (str, gen_rtx_MEM (DImode, addr));
+  emit_insn (gen_rtx_SET (mask1, GEN_INT (0x0101010101010101)));
+  emit_insn (gen_rtx_SET (mask2, GEN_INT (0x7f7f7f7f7f7f7f7f)));
+
+  /* process the chunk */
+  emit_insn (gen_subdi3 (tmp1, str, mask1));
+  emit_insn (gen_iordi3 (tmp2, str, mask2));
+  emit_insn (gen_rtx_SET (tmp2, gen_rtx_NOT (DImode, tmp2)));
+  emit_insn (gen_anddi3 (tmp3, tmp1, tmp2));
+
+
+  /* if NULL found jump to calculate its exact position */
+  x = gen_rtx_NE (DImode, tmp3, GEN_INT (0));
+  x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
+gen_rtx_LABEL_REF (Pmode, end_loop_label), pc_rtx);
+  emit_jump_insn (gen_rtx_SET (pc_rtx, x));
+
+  emit_insn (gen_adddi3 (addr, addr, GEN_INT (8)));
+  emit_label (preloop_label);
+  mem = gen_rtx_POST_MODIFY (DImode, addr, plus_constant (DImode, addr, 1));
+
+  /* simple byte loop */
+  emit_label (loop_label);
+  emit_move_insn (str, gen_rtx_ZERO_EXTEND (DImode, gen_rtx_MEM (QImode, 
mem)));
+  x = gen_rtx_NE (SImode, str, GEN_INT(0));
+  x = gen_rtx_IF_THEN_ELSE (VOIDmode, x, gen_rtx_LABEL_REF (Pmode, 
loop_label), pc_rtx);
+  emit_jump_insn (gen_rtx_SET (pc_rtx, x));
+
+  emit_insn (gen_subdi3 (result, addr, start_addr));
+  /* adjusting after the last post-increment */
+  emit_insn (gen_adddi3 (result, result, GEN_INT (-1)));
+  emit_jump_insn (gen_jump (end_label));
+  emit_barrier ();
+
+  emit_label (end_loop_label);
+  emit_insn (gen_bswapdi2 (tmp3, tmp3));
+  emit_insn (gen_clzdi2 (tmp3, tmp3));
+  emit_insn (gen_ashrdi3 (tmp3, tmp3, GEN_INT (3)));
+  emit_move_insn (result, tmp3);
+
+  emit_label(end_label);
+}
+
 /* Split a DImode store of a CONST_INT SRC to MEM DST as two
SImode stores.  Handle the case when the constant has identical
bottom and top halves.  This is beneficial when the two stores can be
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 10fcde6..7c60b69 100644
--- a/gcc/config/aarch64/aarch64.md
+++ 

Re: [arm-8-branch] Add Linaro version string and macros

2018-08-10 Thread Yvan Roux
On Fri, 10 Aug 2018 at 13:45, Ramana Radhakrishnan
 wrote:
>
> On Fri, Aug 10, 2018 at 11:09 AM, Yvan Roux  wrote:
> > Hi,
> >
> > This patch adds Linaro version string and release macros to ARM GCC 8
> > vendor branch.
> >
> > Ok to commit?
> >
>
>
> Ok if no regressions. (I'm assuming you've built and eyeballed that
> the pre-processor macros are being produced). (I have a patch to
> www-docs for this branch that I'm writing up and should try and get
> out today. Though it would be nice to have tests for these if
> possible.

I haven't run the regression testsuite, since this patch has been part of
the Linaro vendor branches for a long time; I've just built an x86_64 C
compiler from arm-8-branch and verified the version string and macros:

$ ~/wip/arm-8-install/bin/gcc --version
gcc (Linaro GCC 8.2-2018.08~dev) 8.2.1 20180802

$ ~/wip/arm-8-install/bin/gcc -E -dM - < /dev/null | grep LINARO
#define __LINARO_SPIN__ 0
#define __LINARO_RELEASE__ 201808

I can add some tests, but it will take me some time to remember how
this kind of thing is tested in the testsuite ;)

> regards
> Ramana
>
>
> > Thanks
> > Yvan
> >
> > gcc/ChangeLog
> > 2018-08-10  Yvan Roux  
> >
> > * LINARO-VERSION: New file.
> > * configure.ac: Add Linaro version string.
> > * configure: Regenerate.
> > * Makefile.in (LINAROVER, LINAROVER_C, LINAROVER_S): Define.
> > (CFLAGS-cppbuiltin.o): Add LINAROVER macro definition.
> > (cppbuiltin.o): Depend on $(LINAROVER).
> > * cppbuiltin.c (parse_linarover): New.
> > (define_GNUC__): Define __LINARO_RELEASE__ and __LINARO_SPIN__ macros.


Re: [arm-8-branch] Add Linaro version string and macros

2018-08-10 Thread Ramana Radhakrishnan
On Fri, Aug 10, 2018 at 11:09 AM, Yvan Roux  wrote:
> Hi,
>
> This patch adds Linaro version string and release macros to ARM GCC 8
> vendor branch.
>
> Ok to commit?
>


Ok if no regressions. (I'm assuming you've built and eyeballed that
the pre-processor macros are being produced). (I have a patch to
www-docs for this branch that I'm writing up and should try and get
out today. Though it would be nice to have tests for these if
possible.

regards
Ramana


> Thanks
> Yvan
>
> gcc/ChangeLog
> 2018-08-10  Yvan Roux  
>
> * LINARO-VERSION: New file.
> * configure.ac: Add Linaro version string.
> * configure: Regenerate.
> * Makefile.in (LINAROVER, LINAROVER_C, LINAROVER_S): Define.
> (CFLAGS-cppbuiltin.o): Add LINAROVER macro definition.
> (cppbuiltin.o): Depend on $(LINAROVER).
> * cppbuiltin.c (parse_linarover): New.
> (define_GNUC__): Define __LINARO_RELEASE__ and __LINARO_SPIN__ macros.


[RFC PATCH, i386]: Deprecate -mmitigate-rop

2018-08-10 Thread Uros Bizjak
This option is fairly ineffective, and in the light of CET, nobody
seems interested in improving it. Deprecate the option, so it won't lure
developers to the land of false security.

2018-08-10  Uros Bizjak  

* config/i386/i386.opt (mmitigate-rop): Mark as deprecated.
* doc/invoke.texi (mmitigate-rop): Remove.
* config/i386/i386.c: Do not include "regrename.h".
(ix86_rop_should_change_byte_p, reg_encoded_number)
(ix86_get_modrm_for_rop, set_rop_modrm_reg_bits, ix86_mitigate_rop):
Remove.
(ix86_reorg): Remove call to ix86_mitigate_rop.
* config/i386/i386.md (attr "modrm_class"): Remove.
(cmp_ccno_1, mov_xor, movstrict_xor,
x86_movcc_0_m1, x86_movcc_0_m1_se)
(x86_movcc_0_m1_neg): Remove modrm_class attribute override.

testsuite/ChangeLog:

2018-08-10  Uros Bizjak  

* gcc.target/i386/rop1.c: Remove.
* gcc.target/i386/pr83554 (dg-options): Remove -mmitigate-rop.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Any opinion against the deprecation?

Uros.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7554fd1f6599..d70e92124ddc 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -75,7 +75,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-iterator.h"
 #include "dbgcnt.h"
 #include "case-cfn-macros.h"
-#include "regrename.h"
 #include "dojump.h"
 #include "fold-const-call.h"
 #include "tree-vrp.h"
@@ -3135,15 +3134,6 @@ ix86_debug_options (void)
   return;
 }
 
-/* Return true if T is one of the bytes we should avoid with
-   -mmitigate-rop.  */
-
-static bool
-ix86_rop_should_change_byte_p (int t)
-{
-  return t == 0xc2 || t == 0xc3 || t == 0xca || t == 0xcb;
-}
-
 static const char *stringop_alg_names[] = {
 #define DEF_ENUM
 #define DEF_ALG(alg, name) #name,
@@ -29110,98 +29100,6 @@ ix86_instantiate_decls (void)
   instantiate_decl_rtl (s->rtl);
 }
 
-/* Return the number used for encoding REG, in the range 0..7.  */
-
-static int
-reg_encoded_number (rtx reg)
-{
-  unsigned regno = REGNO (reg);
-  switch (regno)
-{
-case AX_REG:
-  return 0;
-case CX_REG:
-  return 1;
-case DX_REG:
-  return 2;
-case BX_REG:
-  return 3;
-case SP_REG:
-  return 4;
-case BP_REG:
-  return 5;
-case SI_REG:
-  return 6;
-case DI_REG:
-  return 7;
-default:
-  break;
-}
-  if (IN_RANGE (regno, FIRST_STACK_REG, LAST_STACK_REG))
-return regno - FIRST_STACK_REG;
-  if (IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG))
-return regno - FIRST_SSE_REG;
-  if (IN_RANGE (regno, FIRST_MMX_REG, LAST_MMX_REG))
-return regno - FIRST_MMX_REG;
-  if (IN_RANGE (regno, FIRST_REX_SSE_REG, LAST_REX_SSE_REG))
-return regno - FIRST_REX_SSE_REG;
-  if (IN_RANGE (regno, FIRST_REX_INT_REG, LAST_REX_INT_REG))
-return regno - FIRST_REX_INT_REG;
-  if (IN_RANGE (regno, FIRST_MASK_REG, LAST_MASK_REG))
-return regno - FIRST_MASK_REG;
-  return -1;
-}
-
-/* Given an insn INSN with NOPERANDS OPERANDS, return the modr/m byte used
-   in its encoding if it could be relevant for ROP mitigation, otherwise
-   return -1.  If POPNO0 and POPNO1 are nonnull, store the operand numbers
-   used for calculating it into them.  */
-
-static int
-ix86_get_modrm_for_rop (rtx_insn *insn, rtx *operands, int noperands,
-   int *popno0 = 0, int *popno1 = 0)
-{
-  if (asm_noperands (PATTERN (insn)) >= 0)
-return -1;
-  int has_modrm = get_attr_modrm (insn);
-  if (!has_modrm)
-return -1;
-  enum attr_modrm_class cls = get_attr_modrm_class (insn);
-  rtx op0, op1;
-  switch (cls)
-{
-case MODRM_CLASS_OP02:
-  gcc_assert (noperands >= 3);
-  if (popno0)
-   {
- *popno0 = 0;
- *popno1 = 2;
-   }
-  op0 = operands[0];
-  op1 = operands[2];
-  break;
-case MODRM_CLASS_OP01:
-  gcc_assert (noperands >= 2);
-  if (popno0)
-   {
- *popno0 = 0;
- *popno1 = 1;
-   }
-  op0 = operands[0];
-  op1 = operands[1];
-  break;
-default:
-  return -1;
-}
-  if (REG_P (op0) && REG_P (op1))
-{
-  int enc0 = reg_encoded_number (op0);
-  int enc1 = reg_encoded_number (op1);
-  return 0xc0 + (enc1 << 3) + enc0;
-}
-  return -1;
-}
-
 /* Check whether x86 address PARTS is a pc-relative address.  */
 
 bool
@@ -42215,215 +42113,6 @@ ix86_seh_fixup_eh_fallthru (void)
 }
 }
 
-/* Given a register number BASE, the lowest of a group of registers, update
-   regsets IN and OUT with the registers that should be avoided in input
-   and output operands respectively when trying to avoid generating a modr/m
-   byte for -mmitigate-rop.  */
-
-static void
-set_rop_modrm_reg_bits (int base, HARD_REG_SET &in, HARD_REG_SET &out)
-{
-  SET_HARD_REG_BIT (out, base);
-  SET_HARD_REG_BIT (out, base + 1);
-  SET_HARD_REG_BIT (in, base + 2);
-  SET_HARD_REG_BIT (in, base + 3);
-}
-
-/* Called if -mmitigate-rop is in effect.  Try to rewrite 

Re: [PATCH][OBVIOUS] Remove extra line in MAINTAINERS.

2018-08-10 Thread Martin Liška
On 08/10/2018 01:12 PM, Paolo Carlini wrote:
> Hi,
> 
> On 10/08/2018 12:44, Martin Liška wrote:
>> On 08/10/2018 12:38 PM, Paolo Carlini wrote:
>>> Hi,
>>>
>>> On 10/08/2018 12:35, Martin Liška wrote:
 Hi.

 This removes one extra line that's a typo.
>>> It's not ;) The complete line is "c++ runtime libs special modes". See? 
>>> Admittedly, we may want to have something clearer, but just removing that 
>>> line isn't the way to go.
>>>
>>> Paolo.
>> Oup, I was too eager.
>> So then what about putting "c++ runtime libs special modes" on one line?
> Yes, that would be nice, but I think Francois didn't manage to do that while 
> keeping a consistent formatting, that line is *long*!
> 
> Paolo.

Then I'm going to install following patch.

Martin
>From c72c46fce810b63e162ebcce8abc2fd1d44e2181 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 10 Aug 2018 13:36:53 +0200
Subject: [PATCH] Fix wrongly removed line.

ChangeLog:

2018-08-10  Martin Liska  

	* MAINTAINERS: Revert change in previous commit and
join lines.
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8d42c91b2d7..a9f20d73afe 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -188,7 +188,7 @@ c++ runtime libs	Paolo Carlini		
 c++ runtime libs	Ulrich Drepper		
 c++ runtime libs	Benjamin De Kosnik	
 c++ runtime libs	Jonathan Wakely		
-special modes		François Dumont		
+c++ runtime libs special modes		François Dumont		
 fixincludes		Bruce Korb		
 *gimpl*			Jakub Jelinek		
 *gimpl*			Aldy Hernandez		
-- 
2.18.0



Re: Improve safe iterator move semantic

2018-08-10 Thread Ville Voutilainen
On 10 August 2018 at 13:47, Jonathan Wakely  wrote:
> Doing a test like this with TSan should be the absolute minimum
> required for any change to the mutex locking policy.

Agreed. Concurrency code is something that our test suite is not
well-equipped to test (because
it doesn't support TSan and such yet), so it would seem prudent to
stress-test such patches
via testsuite-external means.

> I'm not aware of people complaining about the performance of debug
> mode anyway. Everybody I speak to is happy to accept a performance hit
> in order to get checking.

Yep; while it's nice to have performance improvements in debug mode,
there are probably more
important and significant ways to improve it..

> I think https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86843 would be
> more helpful to our users, as it would allow some Debug Mode checks to
> be enabled in programs that can't currently use it (because
> recompiling the entire program is not possible).

..like this one, which would be a major usability improvement.


Re: [PATCH][OBVIOUS] Remove extra line in MAINTAINERS.

2018-08-10 Thread Paolo Carlini

Hi,

On 10/08/2018 12:44, Martin Liška wrote:

On 08/10/2018 12:38 PM, Paolo Carlini wrote:

Hi,

On 10/08/2018 12:35, Martin Liška wrote:

Hi.

This removes one extra line that's a typo.

It's not ;) The complete line is "c++ runtime libs special modes". See? 
Admittedly, we may want to have something clearer, but just removing that line isn't the 
way to go.

Paolo.

Oup, I was too eager.
So then what about putting "c++ runtime libs special modes" on one line?
Yes, that would be nice, but I think Francois didn't manage to do that 
while keeping a consistent formatting, that line is *long*!


Paolo.


Re: [PATCH] Remove not needed __builtin_expect due to malloc predictor.

2018-08-10 Thread Jonathan Wakely

On 10/08/18 12:32 +0200, Martin Liška wrote:

Hi.

After we introduced new non-NULL malloc predictor, we can remove these 
__builtin_expects.
Predictors will change in following way:

Before:

Predictions for bb 5
 first match heuristics: 10.00%
 combined heuristics: 10.00%
 __builtin_expect heuristics of edge 5->6: 10.00%
 call heuristics of edge 5->6 (ignored): 33.00%
 loop exit heuristics of edge 5->9 (ignored): 5.50%

After:

Predictions for bb 5
 first match heuristics: 0.04%
 combined heuristics: 0.04%
 pointer (on trees) heuristics of edge 5->6 (ignored): 30.00%
 malloc returned non-NULL heuristics of edge 5->6: 0.04%
 call heuristics of edge 5->6 (ignored): 33.00%
 loop exit heuristics of edge 5->9 (ignored): 5.50%


Maybe there are similar allocation-related expects, but I haven't found them.
Ready after it survives regression tests?


OK for trunk - thanks!



Martin

libstdc++-v3/ChangeLog:

2018-08-10  Martin Liska  

* libsupc++/new_op.cc (new): Remove __builtin_expect as malloc
   predictor can handle that.
* libsupc++/new_opa.cc: Likewise.
* libsupc++/new_opnt.cc (new): Likewise.
---
libstdc++-v3/libsupc++/new_op.cc   | 2 +-
libstdc++-v3/libsupc++/new_opa.cc  | 2 +-
libstdc++-v3/libsupc++/new_opnt.cc | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)





diff --git a/libstdc++-v3/libsupc++/new_op.cc b/libstdc++-v3/libsupc++/new_op.cc
index 3a1e38d9df7..3caa0bab2ea 100644
--- a/libstdc++-v3/libsupc++/new_op.cc
+++ b/libstdc++-v3/libsupc++/new_op.cc
@@ -47,7 +47,7 @@ operator new (std::size_t sz) _GLIBCXX_THROW (std::bad_alloc)
  if (sz == 0)
sz = 1;

-  while (__builtin_expect ((p = malloc (sz)) == 0, false))
+  while ((p = malloc (sz)) == 0)
{
  new_handler handler = std::get_new_handler ();
  if (! handler)
diff --git a/libstdc++-v3/libsupc++/new_opa.cc 
b/libstdc++-v3/libsupc++/new_opa.cc
index 68eac5b8ceb..a27ff843ca1 100644
--- a/libstdc++-v3/libsupc++/new_opa.cc
+++ b/libstdc++-v3/libsupc++/new_opa.cc
@@ -126,7 +126,7 @@ operator new (std::size_t sz, std::align_val_t al)
#endif

  using __gnu_cxx::aligned_alloc;
-  while (__builtin_expect ((p = aligned_alloc (align, sz)) == 0, false))
+  while ((p = aligned_alloc (align, sz)) == 0)
{
  new_handler handler = std::get_new_handler ();
  if (! handler)
diff --git a/libstdc++-v3/libsupc++/new_opnt.cc 
b/libstdc++-v3/libsupc++/new_opnt.cc
index a2dc33ad4d3..faab44e66c2 100644
--- a/libstdc++-v3/libsupc++/new_opnt.cc
+++ b/libstdc++-v3/libsupc++/new_opnt.cc
@@ -40,7 +40,7 @@ operator new (std::size_t sz, const std::nothrow_t&) 
_GLIBCXX_USE_NOEXCEPT
  if (sz == 0)
sz = 1;

-  while (__builtin_expect ((p = malloc (sz)) == 0, false))
+  while ((p = malloc (sz)) == 0)
{
  new_handler handler = std::get_new_handler ();
  if (! handler)





Re: Improve safe iterator move semantic

2018-08-10 Thread Jonathan Wakely

On 10/08/18 11:00 +0100, Jonathan Wakely wrote:

This valid program shows data races with -fsanitize=thread after your
patch is applied:

#define _GLIBCXX_DEBUG
#include 
#include 

void thrash(std::vector::iterator& iter)
{
for (int i = 0; i < 1000; ++i)
{
  auto jiter = std::move(iter);
  iter = std::move(jiter);
  jiter = std::move(iter);
  iter = std::move(jiter);
}
}

int main()
{
std::vector v{1, 2, 3};
auto it1 = v.begin();
auto it2 = v.begin();
std::thread t1(thrash, std::ref(it1));
thrash(it2);
t1.join();
}


Doing a test like this with TSan should be the absolute minimum
required for any change to the mutex locking policy.

You want to reduce the locks on move operations of iterators? OK,
perform lots of move operations on two iterators from the same
sequence and see if ThreadSanitizer shows errors.

If TSan doesn't show errors, adjust the test and try harder to prove
it's correct. Maybe your test was flawed. For example, my first
attempt to prove the data race passed the iterators by value not by
reference. That meant there were more than two debug iterators in the
linked list, and so the two iterators being tested were not adjacent
in the list, and didn't conflict. Because I'd already proved there was
a bug by inspecting the code I knew my test was flawed, so I changed
it. But even if I hadn't already proved a bug, I would have kept
testing different variations to prove whether the code was correct or
not.

Our debug iterators make "non-local" modifications to other objects
from the same sequence. That needs careful synchronisation. If that
makes the code a bit slower, then we just have to live with it.
Correct and slow is better than fast and wrong.

I'm not aware of people complaining about the performance of debug
mode anyway. Everybody I speak to is happy to accept a performance hit
in order to get checking.

I think https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86843 would be
more helpful to our users, as it would allow some Debug Mode checks to
be enabled in programs that can't currently use it (because
recompiling the entire program is not possible).




Re: [PATCH][OBVIOUS] Remove extra line in MAINTAINERS.

2018-08-10 Thread Martin Liška
On 08/10/2018 12:38 PM, Paolo Carlini wrote:
> Hi,
> 
> On 10/08/2018 12:35, Martin Liška wrote:
>> Hi.
>>
>> This removes one extra line that's a typo.
> It's not ;) The complete line is "c++ runtime libs special modes". See? 
> Admittedly, we may want to have something clearer, but just removing that 
> line isn't the way to go.
> 
> Paolo.

Oup, I was too eager.
So then what about putting "c++ runtime libs special modes" on one line?

Martin


Re: [PATCH][OBVIOUS] Remove extra line in MAINTAINERS.

2018-08-10 Thread Paolo Carlini

Hi,

On 10/08/2018 12:35, Martin Liška wrote:

Hi.

This removes one extra line that's a typo.
It's not ;) The complete line is "c++ runtime libs special modes". See? 
Admittedly, we may want to have something clearer, but just removing 
that line isn't the way to go.


Paolo.


[PATCH][OBVIOUS] Remove extra line in MAINTAINERS.

2018-08-10 Thread Martin Liška
Hi.

This removes one extra line that's a typo.

Martin

ChangeLog:

2018-08-10  Martin Liska  

* MAINTAINERS: Remove extra line.
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)


diff --git a/MAINTAINERS b/MAINTAINERS
index f96ab622146..8d42c91b2d7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -188,7 +188,6 @@ c++ runtime libs	Paolo Carlini		
 c++ runtime libs	Ulrich Drepper		
 c++ runtime libs	Benjamin De Kosnik	
 c++ runtime libs	Jonathan Wakely		
-c++ runtime libs
 special modes		François Dumont		
 fixincludes		Bruce Korb		
 *gimpl*			Jakub Jelinek		



[PATCH] Remove not needed __builtin_expect due to malloc predictor.

2018-08-10 Thread Martin Liška
Hi.

After we introduced new non-NULL malloc predictor, we can remove these 
__builtin_expects.
Predictors will change in following way:

Before:

Predictions for bb 5
  first match heuristics: 10.00%
  combined heuristics: 10.00%
  __builtin_expect heuristics of edge 5->6: 10.00%
  call heuristics of edge 5->6 (ignored): 33.00%
  loop exit heuristics of edge 5->9 (ignored): 5.50%

After:

Predictions for bb 5
  first match heuristics: 0.04%
  combined heuristics: 0.04%
  pointer (on trees) heuristics of edge 5->6 (ignored): 30.00%
  malloc returned non-NULL heuristics of edge 5->6: 0.04%
  call heuristics of edge 5->6 (ignored): 33.00%
  loop exit heuristics of edge 5->9 (ignored): 5.50%


Maybe there are similar allocation-related expects, but I haven't found them.
Ready after it survives regression tests?

Martin

libstdc++-v3/ChangeLog:

2018-08-10  Martin Liska  

* libsupc++/new_op.cc (new): Remove __builtin_expect as malloc
predictor can handle that.
* libsupc++/new_opa.cc: Likewise.
* libsupc++/new_opnt.cc (new): Likewise.
---
 libstdc++-v3/libsupc++/new_op.cc   | 2 +-
 libstdc++-v3/libsupc++/new_opa.cc  | 2 +-
 libstdc++-v3/libsupc++/new_opnt.cc | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)


diff --git a/libstdc++-v3/libsupc++/new_op.cc b/libstdc++-v3/libsupc++/new_op.cc
index 3a1e38d9df7..3caa0bab2ea 100644
--- a/libstdc++-v3/libsupc++/new_op.cc
+++ b/libstdc++-v3/libsupc++/new_op.cc
@@ -47,7 +47,7 @@ operator new (std::size_t sz) _GLIBCXX_THROW (std::bad_alloc)
   if (sz == 0)
 sz = 1;
 
-  while (__builtin_expect ((p = malloc (sz)) == 0, false))
+  while ((p = malloc (sz)) == 0)
 {
   new_handler handler = std::get_new_handler ();
   if (! handler)
diff --git a/libstdc++-v3/libsupc++/new_opa.cc b/libstdc++-v3/libsupc++/new_opa.cc
index 68eac5b8ceb..a27ff843ca1 100644
--- a/libstdc++-v3/libsupc++/new_opa.cc
+++ b/libstdc++-v3/libsupc++/new_opa.cc
@@ -126,7 +126,7 @@ operator new (std::size_t sz, std::align_val_t al)
 #endif
 
   using __gnu_cxx::aligned_alloc;
-  while (__builtin_expect ((p = aligned_alloc (align, sz)) == 0, false))
+  while ((p = aligned_alloc (align, sz)) == 0)
 {
   new_handler handler = std::get_new_handler ();
   if (! handler)
diff --git a/libstdc++-v3/libsupc++/new_opnt.cc b/libstdc++-v3/libsupc++/new_opnt.cc
index a2dc33ad4d3..faab44e66c2 100644
--- a/libstdc++-v3/libsupc++/new_opnt.cc
+++ b/libstdc++-v3/libsupc++/new_opnt.cc
@@ -40,7 +40,7 @@ operator new (std::size_t sz, const std::nothrow_t&) _GLIBCXX_USE_NOEXCEPT
   if (sz == 0)
 sz = 1;
 
-  while (__builtin_expect ((p = malloc (sz)) == 0, false))
+  while ((p = malloc (sz)) == 0)
 {
   new_handler handler = std::get_new_handler ();
   if (! handler)



[arm-8-branch] Add Linaro version string and macros

2018-08-10 Thread Yvan Roux
Hi,

This patch adds Linaro version string and release macros to ARM GCC 8
vendor branch.

Ok to commit?

Thanks
Yvan

gcc/ChangeLog
2018-08-10  Yvan Roux  

* LINARO-VERSION: New file.
* configure.ac: Add Linaro version string.
* configure: Regenerate.
* Makefile.in (LINAROVER, LINAROVER_C, LINAROVER_S): Define.
(CFLAGS-cppbuiltin.o): Add LINAROVER macro definition.
(cppbuiltin.o): Depend on $(LINAROVER).
* cppbuiltin.c (parse_linarover): New.
(define_GNUC__): Define __LINARO_RELEASE__ and __LINARO_SPIN__ macros.
Index: gcc/LINARO-VERSION
===
--- gcc/LINARO-VERSION	(nonexistent)
+++ gcc/LINARO-VERSION	(working copy)
@@ -0,0 +1 @@
+8.2-2018.08~dev
Index: gcc/Makefile.in
===
--- gcc/Makefile.in	(revision 263464)
+++ gcc/Makefile.in	(working copy)
@@ -854,10 +854,12 @@
 DEVPHASE    := $(srcdir)/DEV-PHASE # experimental, prerelease, ""
 DATESTAMP   := $(srcdir)/DATESTAMP # YYYYMMDD or empty
 REVISION    := $(srcdir)/REVISION  # [BRANCH revision XX]
+LINAROVER   := $(srcdir)/LINARO-VERSION # M.x-YYYY.MM[-S][~dev]
 
 BASEVER_c   := $(shell cat $(BASEVER))
 DEVPHASE_c  := $(shell cat $(DEVPHASE))
 DATESTAMP_c := $(shell cat $(DATESTAMP))
+LINAROVER_c := $(shell cat $(LINAROVER))
 
 ifeq (,$(wildcard $(REVISION)))
 REVISION_c  :=
@@ -884,6 +886,7 @@
   "\"$(if $(DEVPHASE_c)$(filter-out 0,$(PATCHLEVEL_c)), $(DATESTAMP_c))\""
 PKGVERSION_s:= "\"@PKGVERSION@\""
 BUGURL_s:= "\"@REPORT_BUGS_TO@\""
+LINAROVER_s := "\"$(LINAROVER_c)\""
 
 PKGVERSION  := @PKGVERSION@
 BUGURL_TEXI := @REPORT_BUGS_TEXI@
@@ -2883,8 +2886,9 @@
   -DSTANDARD_EXEC_PREFIX=\"$(libdir)/gcc/\" \
   @TARGET_SYSTEM_ROOT_DEFINE@
 
-CFLAGS-cppbuiltin.o += $(PREPROCESSOR_DEFINES) -DBASEVER=$(BASEVER_s)
-cppbuiltin.o: $(BASEVER)
+CFLAGS-cppbuiltin.o += $(PREPROCESSOR_DEFINES) -DBASEVER=$(BASEVER_s) \
+	-DLINAROVER=$(LINAROVER_s)
+cppbuiltin.o: $(BASEVER) $(LINAROVER)
 
 CFLAGS-cppdefault.o += $(PREPROCESSOR_DEFINES)
 
Index: gcc/configure
===
--- gcc/configure	(revision 263464)
+++ gcc/configure	(working copy)
@@ -1726,7 +1726,8 @@
   --with-stabs            arrange to use stabs instead of host debug format
   --with-dwarf2   force the default debug format to be DWARF 2
   --with-specs=SPECS  add SPECS to driver command-line processing
-  --with-pkgversion=PKG   Use PKG in the version string in place of "GCC"
+  --with-pkgversion=PKG   Use PKG in the version string in place of "Linaro
+  GCC `cat $srcdir/LINARO-VERSION`"
   --with-bugurl=URL   Direct users to URL to report a bug
   --with-multilib-list    select multilibs (AArch64, SH and x86-64 only)
   --with-gnu-ld   assume the C compiler uses GNU ld default=no
@@ -7649,7 +7650,7 @@
   *)   PKGVERSION="($withval) " ;;
  esac
 else
-  PKGVERSION="(GCC) "
+  PKGVERSION="(Linaro GCC `cat $srcdir/LINARO-VERSION`) "
 
 fi
 
@@ -18448,7 +18449,7 @@
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18451 "configure"
+#line 18452 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -18554,7 +18555,7 @@
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18557 "configure"
+#line 18558 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
Index: gcc/configure.ac
===
--- gcc/configure.ac	(revision 263464)
+++ gcc/configure.ac	(working copy)
@@ -929,7 +929,7 @@
 )
 AC_SUBST(CONFIGURE_SPECS)
 
-ACX_PKGVERSION([GCC])
+ACX_PKGVERSION([Linaro GCC `cat $srcdir/LINARO-VERSION`])
 ACX_BUGURL([https://gcc.gnu.org/bugs/])
 
 # Sanity check enable_languages in case someone does not run the toplevel
Index: gcc/cppbuiltin.c
===
--- gcc/cppbuiltin.c	(revision 263464)
+++ gcc/cppbuiltin.c	(working copy)
@@ -53,18 +53,41 @@
 *patchlevel = s_patchlevel;
 }
 
+/* Parse a LINAROVER version string of the format "M.m-year.month[-spin][~dev]"
+   to create Linaro release number YYYYMM and spin version.  */
+static void
+parse_linarover (int *release, int *spin)
+{
+  static int s_year = -1, s_month, s_spin;
 
+  if (s_year == -1)
+if (sscanf (LINAROVER, "%*[^-]-%d.%d-%d", &s_year, &s_month, &s_spin) != 3)
+  {
+	sscanf (LINAROVER, "%*[^-]-%d.%d", &s_year, &s_month);
+	s_spin = 0;
+  }
+
+  if (release)
+*release = s_year * 100 + s_month;
+
+  if (spin)
+*spin = s_spin;
+}
+
 /* Define __GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__ and __VERSION__.  */
 static void
 define__GNUC__ (cpp_reader *pfile)
 {
-  int major, minor, patchlevel;
+  int major, minor, patchlevel, linaro_release, linaro_spin;
 
   parse_basever (&major, &minor, &patchlevel);
+  parse_linarover (&linaro_release, 

Re: Improve safe iterator move semantic

2018-08-10 Thread Jonathan Wakely

On 09/08/18 20:41 +0200, François Dumont wrote:

    Here is a patch to improve Debug mode safe iterator move semantic.

    To summarize: where we used to have N mutex locks we now have N - 
1. For instance move constructor used to lock mutex twice, now it only 
does it once. Note that move constructor or move assignment operator 
are currently more expensive than their copy counterparts!


Yes, because they have to mutate two objects, not just one. We could
just remove the move operations, and so all "moves" just do copies
instead. That would not allow Debug Mode to assert on operations on
moved-from iterators. Do we care about that?

Do any of our (non-debug) container iterators actually become invalid
after a move, or are they all safe to keep using?



+#if __cplusplus >= 201103L
+_Safe_iterator_base(_Safe_iterator_base&&) = default;
+
+_Safe_iterator_base&
+operator=(_Safe_iterator_base&&) = default;
+#endif


This is very counterintuitive. The type has pointer members, so moving
doesn't set them to null, it's just a copy. In the derived moves we do
an explicit _M_reset() later, to make it more "move-like", but the
base operation is still a copy.

We would need **at least** a comment on the defaulted move operations
saying "these just copy the members, but then the derived class clears
them after adjusting pointers for the sequence's other iterators".

But there's another counterintuitive problem: in C++98 mode _Safe_base
has an implicit (public) copy constructor and copy assignment
operator, but in C++11 mode those are suppressed by the (protected)
move constructor and move assignment operator. Either we should
declare private+undefined copy ops in C++98 mode (to make them
consistently unavailable), or just change the defaulted move into
defaulted copies (since that's what they do anyway).

Or alternatively, don't define a move constructor and move assignment
operator at all, and delete copies and moves. The derived
_Safe_iterator move ctor and move assignment can both use the same
function _M_move to do the moving and the resetting all at once. See
the attached patch (relative to yours) to see what I mean.

BUT, I think the whole concept of reducing the number of mutex locks
is flawed. See below.



@@ -148,11 +161,12 @@ namespace __gnu_debug

/** Reset all member variables */
void
-_M_reset() throw ();
+_M_reset() _GLIBCXX_USE_NOEXCEPT
+{ __builtin_memset(this, 0, sizeof(_Safe_iterator_base)); }


This is undefined behaviour, because _Safe_iterator_base is not
trivially copyable. Adding a static assertion shows:

In file included from 
/home/jwakely/src/gcc/gcc/libstdc++-v3/src/c++11/debug.cc:29:
/home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_base.h:179:17:
 error: static assertion failed
  static_assert(std::is_trivially_copyable<_Safe_iterator_base>::value, "");
^~~



+  inline void
+  _Safe_local_iterator_base::_M_move_to(_Safe_iterator_base* __x,
+   bool __constant)
+  {
+__gnu_cxx::__scoped_lock __l(this->_M_get_mutex());
+if (_M_prior)
+  _M_prior->_M_next = this;
+else
+  {
+   // No prior, we are at the beginning of the linked list.
+   auto& __its = __constant
+ ? _M_get_container()->_M_const_local_iterators
+ : _M_get_container()->_M_local_iterators;
+   if (__its == __x)
+ __its = this;
+  }
+
+if (_M_next)
+  _M_next->_M_prior = this;
+  }


This is not thread-safe. What if another thread is modifying
*__x._M_prior at the same time? It will lock
__x._M_prior->_M_get_mutex(), and then try to update
*__x._M_prior->_M_next, which is __x.

But nothing locks __x._M_get_mutex() to make that safe.

This valid program shows data races with -fsanitize=thread after your
patch is applied:

#define _GLIBCXX_DEBUG
#include 
#include 

void thrash(std::vector::iterator& iter)
{
 for (int i = 0; i < 1000; ++i)
 {
   auto jiter = std::move(iter);
   iter = std::move(jiter);
   jiter = std::move(iter);
   iter = std::move(jiter);
 }
}

int main()
{
 std::vector v{1, 2, 3};
 auto it1 = v.begin();
 auto it2 = v.begin();
 std::thread t1(thrash, std::ref(it1));
 thrash(it2);
 t1.join();
}

(It also shows data races with my version of the patch attached to
this mail, because my patch only refactors yours a bit, it doesn't
change how the mutexes are locked).

I think this idea is fundamentally flawed.


diff --git a/libstdc++-v3/include/debug/safe_base.h b/libstdc++-v3/include/debug/safe_base.h
index c276a1883a1..aa6fc76ec42 100644
--- a/libstdc++-v3/include/debug/safe_base.h
+++ b/libstdc++-v3/include/debug/safe_base.h
@@ -97,13 +97,6 @@ namespace __gnu_debug
 : _M_sequence(0), _M_version(0), _M_prior(0), _M_next(0)
 { this->_M_attach(__x._M_sequence, __constant); }
 
-#if __cplusplus >= 201103L
-_Safe_iterator_base(_Safe_iterator_base&&) = default;
-
-_Safe_iterator_base&
-

Re: [C++ Patch] Tweak check_previous_goto_1 to emit hard errors instead of permerrors in some cases

2018-08-10 Thread Paolo Carlini
.. an additional clarification (I told you that over the years we 
changed this code quite a bit...): I originally added the testcase that 
I'm adjusting here, I did that when, back in 2014, I worked on 63558: 
the test uses -fpermissive -w and was meant to check, as requested by 
Manuel in the bug, that we didn't issue further unsuppressible 
explanatory diagnostic after the primary permerror. I would argue that 
that first clean-up back 2014 was an improvement but note that, in 
practice, before it we rejected (with a confusing permerror + error) 
this kind of broken code and we don't anymore. That's bad. We don't 
really emit sensible assembly for it, as I already explained.


Paolo.