Re: [PATCH 1/4] regcprop: Avoid REG_CFA_REGISTER notes (PR85645)
On 08/10/2018 02:55 PM, Segher Boessenkool wrote:
> On Mon, Jun 25, 2018 at 03:34:26PM -0600, Jeff Law wrote:
>> On 06/25/2018 05:53 AM, Segher Boessenkool wrote:
>>> Hi Eric,
>>>
>>> On Wed, May 09, 2018 at 09:22:47AM +0200, Eric Botcazou wrote:
>>>>> 2018-05-08  Segher Boessenkool
>>>>>
>>>>> 	PR rtl-optimization/85645
>>>>> 	* regcprop.c (copyprop_hardreg_forward_1): Don't propagate into an
>>>>> 	insn that has a REG_CFA_REGISTER note.
>>>>
>>>> OK, thanks.
>>>
>>> Are these patches okay for backport to 8?  At least the first two.
>> Yes.
>
> And for 7?  (Same two).

Yes.

jeff
Re: [PATCH 1/4] regcprop: Avoid REG_CFA_REGISTER notes (PR85645)
On Mon, Jun 25, 2018 at 03:34:26PM -0600, Jeff Law wrote:
> On 06/25/2018 05:53 AM, Segher Boessenkool wrote:
> > Hi Eric,
> >
> > On Wed, May 09, 2018 at 09:22:47AM +0200, Eric Botcazou wrote:
> >>> 2018-05-08  Segher Boessenkool
> >>>
> >>> 	PR rtl-optimization/85645
> >>> 	* regcprop.c (copyprop_hardreg_forward_1): Don't propagate into an
> >>> 	insn that has a REG_CFA_REGISTER note.
> >>
> >> OK, thanks.
> >
> > Are these patches okay for backport to 8?  At least the first two.
> Yes.

And for 7?  (Same two).


Segher
[PATCH] Unconditionally use MAX_EXPR/MIN_EXPR for MAX/MIN intrinsics
For floating point types, the question is what MAX(a, NaN) or MIN(a, NaN)
should return (where "a" is a normal number).  There are valid use cases
for returning either one, but the Fortran standard doesn't specify which
one should be chosen.  Also, there is no consensus among other tested
compilers.  In short, it's a mess.  So let's just do whatever is fastest,
which is using MAX_EXPR/MIN_EXPR, which are not defined to do anything in
particular if one of the operands is a NaN.

Regtested on x86_64-pc-linux-gnu, Ok for trunk?

gcc/fortran/ChangeLog:

2018-08-10  Janne Blomqvist

	* trans-intrinsic.c (gfc_conv_intrinsic_minmax): Use
	MAX_EXPR/MIN_EXPR unconditionally for real arguments.

gcc/testsuite/ChangeLog:

2018-08-10  Janne Blomqvist

	* gfortran.dg/nan_1.f90: Remove tests that test MAX/MIN with NaNs.
---
 gcc/fortran/trans-intrinsic.c       | 55 ++---
 gcc/testsuite/gfortran.dg/nan_1.f90 | 35 --
 2 files changed, 10 insertions(+), 80 deletions(-)

diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index c9b5479740c..190fde66a8d 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -3914,8 +3914,6 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
   mvar = gfc_create_var (type, "M");
   gfc_add_modify (&se->pre, mvar, args[0]);
 
-  internal_fn ifn = op == GT_EXPR ? IFN_FMAX : IFN_FMIN;
-
   for (i = 1, argexpr = argexpr->next; i < nargs; i++, argexpr = argexpr->next)
     {
       tree cond = NULL_TREE;
@@ -3936,49 +3934,16 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
       val = gfc_evaluate_now (val, &se->pre);
 
       tree calc;
-      /* If we dealing with integral types or we don't care about NaNs
-	 just do a MIN/MAX_EXPR.  */
-      if (!HONOR_NANS (type) && !HONOR_SIGNED_ZEROS (type))
-	{
-
-	  tree_code code = op == GT_EXPR ? MAX_EXPR : MIN_EXPR;
-	  calc = fold_build2_loc (input_location, code, type,
-				  convert (type, val), mvar);
-	  tmp = build2_v (MODIFY_EXPR, mvar, calc);
-
-	}
-      /* If we care about NaNs and we have internal functions available for
-	 fmin/fmax to perform the comparison, use those.  */
-      else if (SCALAR_FLOAT_TYPE_P (type)
	       && direct_internal_fn_supported_p (ifn, type, OPTIMIZE_FOR_SPEED))
-	{
-	  calc = build_call_expr_internal_loc (input_location, ifn, type,
-					       2, mvar, convert (type, val));
-	  tmp = build2_v (MODIFY_EXPR, mvar, calc);
-
-	}
-      /* Otherwise expand to:
-	 mvar = a1;
-	 if (a2 .op. mvar || isnan (mvar))
-	   mvar = a2;
-	 if (a3 .op. mvar || isnan (mvar))
-	   mvar = a3;
-	 ...  */
-      else
-	{
-	  tree isnan = build_call_expr_loc (input_location,
-					    builtin_decl_explicit (BUILT_IN_ISNAN),
-					    1, mvar);
-	  tmp = fold_build2_loc (input_location, op, logical_type_node,
-				 convert (type, val), mvar);
-
-	  tmp = fold_build2_loc (input_location, TRUTH_OR_EXPR,
-				 logical_type_node, tmp,
-				 fold_convert (logical_type_node, isnan));
-	  tmp = build3_v (COND_EXPR, tmp,
-			  build2_v (MODIFY_EXPR, mvar, convert (type, val)),
-			  build_empty_stmt (input_location));
-	}
+      /* For floating point types, the question is what MAX(a, NaN) or
+	 MIN(a, NaN) should return (where "a" is a normal number).
+	 There are valid usecase for returning either one, but the
+	 Fortran standard doesn't specify which one should be chosen.
+	 Also, there is no consensus among other tested compilers.  In
+	 short, it's a mess.  So lets just do whatever is fastest.  */
+      tree_code code = op == GT_EXPR ? MAX_EXPR : MIN_EXPR;
+      calc = fold_build2_loc (input_location, code, type,
+			      convert (type, val), mvar);
+      tmp = build2_v (MODIFY_EXPR, mvar, calc);
 
       if (cond != NULL_TREE)
	tmp = build3_v (COND_EXPR, cond, tmp,
diff --git a/gcc/testsuite/gfortran.dg/nan_1.f90 b/gcc/testsuite/gfortran.dg/nan_1.f90
index e64b4ce65e1..1b39cc1f21c 100644
--- a/gcc/testsuite/gfortran.dg/nan_1.f90
+++ b/gcc/testsuite/gfortran.dg/nan_1.f90
@@ -66,35 +66,12 @@ program test
   if (isinf(-nan) .or. isinf(-large) .or. .not. isinf(-inf)) STOP 4
 
   ! Check that MIN and MAX behave correctly
-  if (max(2.0, nan) /= 2.0) STOP 5
-  if (min(2.0, nan) /= 2.0) STOP 6
-  if (max(nan, 2.0) /= 2.0) STOP 7
-  if (min(nan, 2.0) /= 2.0) STOP 8
-
-  if (max(2.d0, nan) /= 2.d0) STOP 9! { dg-warning "Extension: Different type kinds" }
-  if (min(2.d0, nan) /= 2.d0) STOP 10! { dg-warning "Extension: Different type kinds" }
-
Re: [PATCH] rs6000: Fix vector homogeneous aggregates (PR86197)
On Mon, Jun 25, 2018 at 06:32:27AM -0500, Segher Boessenkool wrote:
> On Tue, Jun 19, 2018 at 10:45:59AM +, Segher Boessenkool wrote:
> > The existing code allows only 4 vectors worth of ieee128 homogeneous
> > aggregates, but it should be 8.  This happens because at one spot it
> > is mistakenly qualified as being passed in floating point registers.
> >
> > This patch fixes it and makes the code easier to read.  Committing to
> > trunk; needs backports too.
>
> Backported to 8 now.

And to 7 and 6.


Segher
[PATCH] PR libstdc++/68210 adjust operator new and delete for LWG 206
Ensure that nothrow versions of new and delete call the ordinary versions
of new or delete, instead of calling malloc or free directly.

These files are all compiled with -std=gnu++14 so can use noexcept and
nullptr to make the code more readable.

	PR libstdc++/68210
	* doc/xml/manual/intro.xml: Document LWG 206 change.
	* libsupc++/del_op.cc: Replace _GLIBCXX_USE_NOEXCEPT with noexcept.
	* libsupc++/del_opa.cc: Likewise.
	* libsupc++/del_opant.cc: Likewise.
	* libsupc++/del_opnt.cc: Likewise. Call operator delete(ptr)
	instead of free(ptr).
	* libsupc++/del_ops.cc: Replace _GLIBCXX_USE_NOEXCEPT with noexcept.
	* libsupc++/del_opsa.cc: Likewise.
	* libsupc++/del_opva.cc: Likewise.
	* libsupc++/del_opvant.cc: Likewise.
	* libsupc++/del_opvnt.cc: Likewise. Call operator delete[](ptr)
	instead of operator delete(ptr).
	* libsupc++/del_opvs.cc: Replace _GLIBCXX_USE_NOEXCEPT with noexcept.
	* libsupc++/del_opvsa.cc: Likewise.
	* libsupc++/new_op.cc: Use __builtin_expect in check for zero size.
	* libsupc++/new_opa.cc: Use nullptr instead of literal 0.
	* libsupc++/new_opant.cc: Likewise. Replace _GLIBCXX_USE_NOEXCEPT
	with noexcept.
	* libsupc++/new_opnt.cc: Likewise. Call operator new(sz) instead of
	malloc(sz).
	* libsupc++/new_opvant.cc: Use nullptr and noexcept.
	* libsupc++/new_opvnt.cc: Likewise. Call operator new[](sz) instead
	of operator new(sz, nothrow).
	* testsuite/18_support/new_nothrow.cc: New test.

Tested x86_64-linux, committed to trunk.

commit ef6c2cd63b0af7fa0823d81d0098361a46230b26
Author: Jonathan Wakely
Date:   Fri Aug 10 19:59:00 2018 +0100

    PR libstdc++/68210 adjust operator new and delete for LWG 206

diff --git a/libstdc++-v3/doc/xml/manual/intro.xml b/libstdc++-v3/doc/xml/manual/intro.xml
index fea07e2bb5f..cb187e1a2ed 100644
--- a/libstdc++-v3/doc/xml/manual/intro.xml
+++ b/libstdc++-v3/doc/xml/manual/intro.xml
@@ -440,6 +440,17 @@ requirements of the license of GCC.
     Yes, it can, specifically if EOF is reached while skipping whitespace.
 
+    http://www.w3.org/1999/xlink; xlink:href="#206">206:
+       operator new(size_t, nothrow) may become
+       unlinked to ordinary operator new if ordinary
+       version replaced
+
+    The nothrow forms of new and delete were
+      changed to call the throwing forms, handling any exception by
+      catching it and returning a null pointer.
+
+
     http://www.w3.org/1999/xlink; xlink:href="#211">211:
       operator>>(istream, string) doesn't set failbit
 
diff --git a/libstdc++-v3/libsupc++/del_op.cc b/libstdc++-v3/libsupc++/del_op.cc
index 9a5cb82fdf0..ab3b617afa7 100644
--- a/libstdc++-v3/libsupc++/del_op.cc
+++ b/libstdc++-v3/libsupc++/del_op.cc
@@ -44,7 +44,7 @@ _GLIBCXX_END_NAMESPACE_VERSION
 #pragma GCC diagnostic ignored "-Wsized-deallocation"
 
 _GLIBCXX_WEAK_DEFINITION void
-operator delete(void* ptr) _GLIBCXX_USE_NOEXCEPT
+operator delete(void* ptr) noexcept
 {
   std::free(ptr);
 }
diff --git a/libstdc++-v3/libsupc++/del_opa.cc b/libstdc++-v3/libsupc++/del_opa.cc
index 71f384df661..2a1f0aba3a1 100644
Re: [PATCH,nvptx] Use CUDA driver API to select default runtime launch, geometry
On 08/08/2018 08:19 AM, Tom de Vries wrote:
> On Wed, Aug 08, 2018 at 07:09:16AM -0700, Cesar Philippidis wrote:
>> On 08/07/2018 06:52 AM, Cesar Philippidis wrote:

Thanks for review. This version should address all of the following
remarks. However, one thing to note ...

>> [nvptx] Use CUDA driver API to select default runtime launch geometry
>>
>> 2018-08-YY  Cesar Philippidis
>>
>> 	libgomp/
>> 	plugin/cuda/cuda.h (CUoccupancyB2DSize): New typedef.
>> 	(cuDriverGetVersion): Declare.
>> 	(cuOccupancyMaxPotentialBlockSizeWithFlags): Declare.
>> 	plugin/plugin-nvptx.c (CUDA_ONE_CALL): Add entries for
>> 	cuDriverGetVersion and cuOccupancyMaxPotentialBlockSize.
>> 	(ptx_device): Add driver_version member.
>> 	(nvptx_open_device): Initialize it.
>> 	(nvptx_exec): Use cuOccupancyMaxPotentialBlockSize to set the
>> 	default num_gangs and num_workers when the driver supports it.
>> ---
>>  libgomp/plugin/cuda-lib.def   |  2 ++
>>  libgomp/plugin/cuda/cuda.h    |  4 ++++
>>  libgomp/plugin/plugin-nvptx.c | 40 +++-
>>  3 files changed, 45 insertions(+), 1 deletion(-)
>>
>> diff --git a/libgomp/plugin/cuda-lib.def b/libgomp/plugin/cuda-lib.def
>> index be8e3b3..f2433e1 100644
>> --- a/libgomp/plugin/cuda-lib.def
>> +++ b/libgomp/plugin/cuda-lib.def
>> @@ -2,6 +2,7 @@ CUDA_ONE_CALL (cuCtxCreate)
>>  CUDA_ONE_CALL (cuCtxDestroy)
>>  CUDA_ONE_CALL (cuCtxGetCurrent)
>>  CUDA_ONE_CALL (cuCtxGetDevice)
>> +CUDA_ONE_CALL (cuDriverGetVersion)
>
> Don't use cuDriverGetVersion.
>
>>  CUDA_ONE_CALL (cuCtxPopCurrent)
>>  CUDA_ONE_CALL (cuCtxPushCurrent)
>>  CUDA_ONE_CALL (cuCtxSynchronize)
>> @@ -39,6 +40,7 @@ CUDA_ONE_CALL (cuModuleGetGlobal)
>>  CUDA_ONE_CALL (cuModuleLoad)
>>  CUDA_ONE_CALL (cuModuleLoadData)
>>  CUDA_ONE_CALL (cuModuleUnload)
>> +CUDA_ONE_CALL (cuOccupancyMaxPotentialBlockSize)
>
> Use CUDA_ONE_CALL_MAYBE_NULL.
>
>>  CUDA_ONE_CALL (cuStreamCreate)
>>  CUDA_ONE_CALL (cuStreamDestroy)
>>  CUDA_ONE_CALL (cuStreamQuery)
>> diff --git a/libgomp/plugin/cuda/cuda.h b/libgomp/plugin/cuda/cuda.h
>> index 4799825..3a790e6 100644
>> --- a/libgomp/plugin/cuda/cuda.h
>> +++ b/libgomp/plugin/cuda/cuda.h
>> @@ -44,6 +44,7 @@ typedef void *CUevent;
>>  typedef void *CUfunction;
>>  typedef void *CUlinkState;
>>  typedef void *CUmodule;
>> +typedef size_t (*CUoccupancyB2DSize)(int);
>>  typedef void *CUstream;
>>
>>  typedef enum {
>> @@ -123,6 +124,7 @@ CUresult cuCtxSynchronize (void);
>>  CUresult cuDeviceGet (CUdevice *, int);
>>  CUresult cuDeviceGetAttribute (int *, CUdevice_attribute, CUdevice);
>>  CUresult cuDeviceGetCount (int *);
>> +CUresult cuDriverGetVersion(int *);
>>  CUresult cuEventCreate (CUevent *, unsigned);
>>  #define cuEventDestroy cuEventDestroy_v2
>>  CUresult cuEventDestroy (CUevent);
>> @@ -170,6 +172,8 @@ CUresult cuModuleGetGlobal (CUdeviceptr *, size_t *, CUmodule, const char *);
>>  CUresult cuModuleLoad (CUmodule *, const char *);
>>  CUresult cuModuleLoadData (CUmodule *, const void *);
>>  CUresult cuModuleUnload (CUmodule);
>> +CUresult cuOccupancyMaxPotentialBlockSize(int *, int *, CUfunction,
>> +					  CUoccupancyB2DSize, size_t, int);
>>  CUresult cuStreamCreate (CUstream *, unsigned);
>>  #define cuStreamDestroy cuStreamDestroy_v2
>>  CUresult cuStreamDestroy (CUstream);
>> diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
>> index 825470a..b0ccf0b 100644
>> --- a/libgomp/plugin/plugin-nvptx.c
>> +++ b/libgomp/plugin/plugin-nvptx.c
>> @@ -376,6 +376,7 @@ struct ptx_device
>>    int max_threads_per_block;
>>    int max_threads_per_multiprocessor;
>>    int default_dims[GOMP_DIM_MAX];
>> +  int driver_version;
>>
>>    struct ptx_image_data *images;  /* Images loaded on device.  */
>>    pthread_mutex_t image_lock;     /* Lock for above list.  */
>> @@ -687,6 +688,7 @@ nvptx_open_device (int n)
>>    ptx_dev->ord = n;
>>    ptx_dev->dev = dev;
>>    ptx_dev->ctx_shared = false;
>> +  ptx_dev->driver_version = 0;
>>
>>    r = CUDA_CALL_NOCHECK (cuCtxGetDevice, &ctx_dev);
>>    if (r != CUDA_SUCCESS && r != CUDA_ERROR_INVALID_CONTEXT)
>> @@ -780,6 +782,9 @@ nvptx_open_device (int n)
>>    for (int i = 0; i != GOMP_DIM_MAX; i++)
>>      ptx_dev->default_dims[i] = 0;
>>
>> +  CUDA_CALL_ERET (NULL, cuDriverGetVersion, &pi);
>> +  ptx_dev->driver_version = pi;
>> +
>>    ptx_dev->images = NULL;
>>    pthread_mutex_init (&ptx_dev->image_lock, NULL);
>>
>> @@ -1173,11 +1178,44 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
>>
>>    {
>>      bool default_dim_p[GOMP_DIM_MAX];
>> +    int vectors = nvthd->ptx_dev->default_dims[GOMP_DIM_VECTOR];
>> +    int workers = nvthd->ptx_dev->default_dims[GOMP_DIM_WORKER];
>> +    int gangs = nvthd->ptx_dev->default_dims[GOMP_DIM_GANG];
>> +

is that I modified the default value for vectors as follows

+      int vectors = default_dim_p[GOMP_DIM_VECTOR]
+	?
Re: [PATCH] Make strlen range computations more conservative
On 08/08/2018 11:36 PM, Jeff Law wrote:
> On 08/02/2018 09:42 AM, Martin Sebor wrote:
>> The warning bits are definitely not okay by me.  The purpose of the
>> warnings (-W{format,sprintf}-{overflow,truncation}) is to detect
>> buffer overflows.  When a warning doesn't have access to string
>> length information for dynamically created strings (like the strlen
>> pass does) it uses array sizes as a proxy.  This is useful both to
>> detect possible buffer overflows and to prevent false positives for
>> overflows that cannot happen in correctly written programs.
>
> So how much of this falling-back to array sizes as a proxy would
> become unnecessary if sprintf had access to the strlen pass as an
> analysis module?  As you know we've been kicking that around and from
> my investigations that doesn't really look hard to do.  Encapsulate
> the data structures in a class, break up the statement handling into
> analysis and optimization and we should be good to go.  I'm hoping to
> start prototyping this week.
>
> If we think that has a reasonable chance to eliminate the array-size
> fallback, then that seems like the most promising path forward.

We discussed this idea this morning so let me respond here and
reiterate the answer.

Using the strlen data will help detect buffer overflow where the array
size isn't available, but it cannot replace the array size heuristic.
Here's a simple example:

  struct S { char a[8]; };
  char d[8];

  void f (struct S *s, int i)
  {
    sprintf (d, "%s-%i", s[i].a, i);
  }

We don't know the length of s->a but we do know that it can be up to
7 bytes long (assuming it's nul-terminated, of course) so we know the
sprintf call can overflow.  Conversely, if the size of the destination
is increased to 20 the sprintf call cannot overflow, so the diagnostic
can be avoided.

Removing the array size heuristic would force us to either give up on
diagnosing the first case or issue false positives for the second
case.  I think the second alternative would make the warning too noisy
to be useful.

The strlen pass will help detect buffer overflows in cases where the
array size isn't known (e.g., with dynamically allocated buffers) but
where the length of the string stored in the array is known.  It will
also help avoid false positives in cases where the string stored in an
array of known size is known to be too short to cause an overflow.
For instance here:

  struct S { char a[8]; };
  char d[8];

  void f (struct S *s, int i)
  {
    if (strlen (s->a) < 4 && i >= 0 && i < 100)
      sprintf (d, "%s-%i", s->a, i);
  }

Martin
Re: [PATCH], Improve PowerPC switch behavior on medium code model system
On Tue, Jul 31, 2018 at 10:39:21AM -0400, Michael Meissner wrote:
> This patch adds an insn to load a LABEL_REF into a GPR.  This is needed
> so the FWPROP1 pass can convert the load of the label address from the
> TOC to a direct load to a GPR.

I don't see why you need a separate RTL insn for this.  It seems to me
that some more generic pattern should accept label_refs.

> While working on the patch, I discovered that the LWA instruction did
> not support indexed loads.  This was due to it using the 'Y' constraint,
> which accepts DS-form offsettable addresses, but not X-form indexed
> addresses.  I added the Z constraint so that the indexed form is
> accepted.

This part is fine.  Please split it out to a separate patch.

> 	* config/rs6000/rs6000.md (extendsi2): Allow reg+reg indexed
> 	addressing.

This should say it is changing the constraints.

> 	(labelref): New insn to optimize loading a label address into
> 	registers on a medium code system.

(*labelref) btw.


Segher
Re: [arm-8-branch] Add Linaro version string and macros
On Fri, 10 Aug 2018 at 17:01, Ramana Radhakrishnan wrote:
>
> On Fri, Aug 10, 2018 at 3:35 PM, Yvan Roux wrote:
> > On Fri, 10 Aug 2018 at 14:31, Yvan Roux wrote:
> >>
> >> On Fri, 10 Aug 2018 at 13:45, Ramana Radhakrishnan
> >> wrote:
> >> >
> >> > On Fri, Aug 10, 2018 at 11:09 AM, Yvan Roux wrote:
> >> > > Hi,
> >> > >
> >> > > This patch adds Linaro version string and release macros to ARM GCC 8
> >> > > vendor branch.
> >> > >
> >> > > Ok to commit?
> >> >
> >> > Ok if no regressions. (I'm assuming you've built and eyeballed that
> >> > the pre-processor macros are being produced). (I have a patch to
> >> > www-docs for this branch that I'm writing up and should try and get
> >> > out today. Though it would be nice to have tests for these if
> >> > possible.
> >>
> >> I've not passed the regression testsuite since this patch is part of
> >> Linaro vendor branches for a long time, I've just built an x86_64 c
> >> compiler from arm-8-branch and verified the version string and macros:
> >>
> >> $ ~/wip/arm-8-install/bin/gcc --version
> >> gcc (Linaro GCC 8.2-2018.08~dev) 8.2.1 20180802
> >>
> >> $ ~/wip/arm-8-install/bin/gcc -E -dM - < /dev/null | grep LINARO
> >> #define __LINARO_SPIN__ 0
> >> #define __LINARO_RELEASE__ 201808
> >>
> >> I can add some tests, but it will take some time to remember me how
> >> these kind of thing is tested in the testsuite ;)
> >
> > Updated version of the patch, with a test case for Linaro macros. The
> > test is not strict w/r to the version or spin number to avoid having
> > to update it every release or re-spin, do you think it is sufficient ?
> >
> > Testsuite ran with the included test PASS.
> >
> > gcc/ChangeLog
> > 2018-08-10  Yvan Roux
> >
> > * LINARO-VERSION: New file.
> > * configure.ac: Add Linaro version string.
> > * configure: Regenerate.
> > * Makefile.in (LINAROVER, LINAROVER_C, LINAROVER_S): Define.
> > (CFLAGS-cppbuiltin.o): Add LINAROVER macro definition.
> > (cppbuiltin.o): Depend on $(LINAROVER).
> > * cppbuiltin.c (parse_linarover): New.
> > (define_GNUC__): Define __LINARO_RELEASE__ and __LINARO_SPIN__ macros.
> >
> > gcc/testsuite/ChangeLog
> > 2018-08-10  Yvan Roux
> >
> > * gcc.dg/cpp/linaro-macros.c: New test.
>
> Looks good, thanks - can you put the Changelog in a Changelog.arm file ?

Sure, does it need the FSF Copyright lines at the end like other
ChangeLogs ?

Thanks
Yvan

> regards
> Ramana
Re: [PATCH v2 3/4] vxworks: enable use of .init_array/.fini_array for cdtors
Hello Rasmus,

> On 28 Jun 2018, at 10:43, Rasmus Villemoes wrote:
>
> Assume that if the user passed --enable-initfini-array when building
> gcc, the rest of the toolchain (including the link spec and linker
> script) is set up appropriately.
>
> Note that configuring with --enable-initfini-array may prevent the
> -mrtp mode from working, due to the (unconditional) use of
> .init_array.* sections instead of .ctors.*

Just spent some time verifying this, and, indeed, getting a RTP to work
with this setup is going to be tough, if at all possible.

> however, that is the case regardless of this patch.

Right, though the situation becomes a bit different with the patch as
we now have references to the macro, sort of implying it should work.

I still think it's fine, as what we do is simply honor what the doc of
--enable-initfini-array states: generate constructors/destructors in
init/fini_array sections, nothing more, and the comments pretty clearly
state that it's the responsibility of the one who configured this way
to manage the constructors and fit them into the OS runtime.  And we
can revisit if we find a better way out.

So, as it allows at least a use case to operate smoothly ...

> 2018-06-04  Rasmus Villemoes
>
> gcc/
> 	config/vxworks.c: Set targetm.have_ctors_dtors if
> 	HAVE_INITFINI_ARRAY_SUPPORT.
> 	config/vxworks.h: Set SUPPORTS_INIT_PRIORITY if
> 	HAVE_INITFINI_ARRAY_SUPPORT.

Ok, modulo ChangeLog reformatting:

	* config/vxworks.c: Set targetm.have_ctors_dtors if
	HAVE_INITFINI_ARRAY_SUPPORT.
	* config/vxworks.h: Set SUPPORTS_INIT_PRIORITY if
	HAVE_INITFINI_ARRAY_SUPPORT.

Thanks,

Olivier
Re: [RFC][PATCH][mid-end] Optimize immediate choice in comparisons.
On 09/08/18 06:48, Jeff Law wrote:
> On 08/07/2018 02:11 PM, Richard Sandiford wrote:
>> Hi Vlad,
>>
>> Thanks for the patch.
>>
>> Vlad Lazar writes:
>>> Hi.
>>>
>>> This patch optimises the choice of immediates in integer comparisons.
>>> Integer comparisons allow for different choices (e.g. a > b is
>>> equivalent to a >= b+1) and there are cases where an
>>> incremented/decremented immediate can be loaded into a register in
>>> fewer instructions.  The cases are as follows:
>>>
>>> i)   a >  b or a >= b + 1
>>> ii)  a <= b or a <  b + 1
>>> iii) a >= b or a >  b - 1
>>> iv)  a <  b or a <= b - 1
>>>
>>> For each comparison we check whether the equivalent can be performed
>>> in fewer instructions.  This is done on a statement by statement
>>> basis, right before the GIMPLE statement is expanded to RTL.
>>> Therefore, it requires a correct implementation of the
>>> TARGET_INSN_COST hook.  The change is general and it applies to any
>>> integer comparison, regardless of types or location.
>>>
>>> For example, on AArch64 for the following code:
>>>
>>> int foo (int x)
>>> {
>>>   return x > 0xfefe;
>>> }
>>>
>>> it generates:
>>>
>>> mov	w1, -16777217
>>> cmp	w0, w1
>>> cset	w0, cs
>>>
>>> instead of:
>>>
>>> mov	w1, 65534
>>> movk	w1, 0xfeff, lsl 16
>>> cmp	w0, w1
>>> cset	w0, hi
>>>
>>> Bootstrapped and regtested on aarch64-none-linux-gnu,
>>> x86_64-pc-linux-gnu and there are no regressions.
>>
>> Looks like a useful feature.  I'm playing devil's advocate to some
>> extent here, but did you consider instead doing this during the
>> expansion functions themselves?  In some ways that feels more natural
>> because we're already operating on rtxes at that point.  It also has
>> the advantage that it would trap comparisons that didn't exist at the
>> gimple level, such as when computing the carry for a doubleword
>> add/sub.
>
> I've got no strong opinions on doing it in cfgexpand vs the expansion
> functions themselves.  I'm happy to have you setting overall direction
> here Richard.
>
> I do worry about the amount of RTL we generate and throw away during
> cost computation.  Though it's just for comparisons, so it may not be
> terrible.
>
> I wouldn't be surprised if ports aren't particularly accurate in their
> costing computations for this kind of use -- but that's nothing new.
> We run into it every time we use rtx costing in a new place.  I'm
> comfortable having targets fault in improvements for this kind of use.
>
> Jeff

Thank you for the feedback.  I agree with Richard's opinion that the
change feels more natural in the actual expanders.  Therefore, I've
reimplemented the feature in the expansion functions.  It's worth
mentioning that it now also applies the optimization to comparisons in
ternary operators, and I added a test which reflects such a case.
Regarding the amount of RTL generated, I have tried to keep it at a
minimum, by only generating the immediate value moves.

I've bootstrapped and regtested again and there are no regressions.
See the updated patch below.

Thanks,
Vlad

---

This patch optimises the choice of immediates in integer comparisons.
Integer comparisons allow for different choices (e.g. a > b is
equivalent to a >= b+1) and there are cases where an
incremented/decremented immediate can be loaded into a register in
fewer instructions.  The cases are as follows:

i)   a >  b or a >= b + 1
ii)  a <= b or a <  b + 1
iii) a >= b or a >  b - 1
iv)  a <  b or a <= b - 1

For each comparison we check whether the equivalent can be performed in
fewer instructions.  This is done in the gimple expanders.  Therefore,
it requires a correct implementation of the TARGET_INSN_COST hook.  The
change is general and it applies to any integer comparison, regardless
of types or location.

For example, on AArch64 for the following code:

int foo (int x)
{
  return x > 0xfefe;
}

it generates:

mov	w1, -16777217
cmp	w0, w1
cset	w0, cs

instead of:

mov	w1, 65534
movk	w1, 0xfeff, lsl 16
cmp	w0, w1
cset	w0, hi

Bootstrapped and regtested on aarch64-none-linux-gnu,
x86_64-pc-linux-gnu and there are no regressions.

Thanks,
Vlad

gcc/testsuite/
Changelog for gcc/testsuite/Changelog
2018-08-10  Vlad Lazar

	* gcc.target/aarch64/imm_choice_comparison.c: New.

gcc/
Changelog for gcc/Changelog
2018-08-10  Vlad Lazar

	* expmed.h (canonicalize_comparison, canonicalized_cmp_code): New
	declarations.
	* expmed.c (canonicalize_comparison, canonicalized_cmp_code): New
	implementations.
	* expmed.c (emit_store_flag_1): Add call to canonicalize_comparison.
	* optabs.c (prepare_cmp_insn): Likewise.

---

diff --git a/gcc/expmed.h b/gcc/expmed.h
index 2890d9c9bbd034f01030dd551d544bf73e73b784..86a32a643fdd0fc9f396bd2c7904244bd484df16 100644
--- a/gcc/expmed.h
+++ b/gcc/expmed.h
@@ -702,6 +702,10 @@ extern rtx emit_store_flag (rtx, enum rtx_code, rtx, rtx, machine_mode,
 extern rtx emit_store_flag_force (rtx, enum rtx_code, rtx, rtx,
				  machine_mode, int, int);
 
+extern void canonicalize_comparison (machine_mode, enum rtx_code *, rtx *);
+
+extern enum rtx_code canonicalized_cmp_code
Re: [PATCH][AARCH64] inline strlen for 8-bytes aligned strings
Wilco,

On 10.08.2018 18:04, Wilco Dijkstra wrote:
> Hi,
>
> A quick benchmark shows it's faster up to about 10 bytes, but after
> that it becomes extremely slow.  At 16 bytes it's already 2.5 times
> slower and for larger sizes it's over 13 times slower than the GLIBC
> implementation...
>
>> The implementation falls back to the library call if the
>> string is not aligned.
>
> If it did that for larger sizes then it would be fine.  However a byte
> loop is unacceptably slow.  Also given the large amount of inlined
> code, it would make sense to handle larger sizes than 8.
>
> It may be worth comparing a loop doing 8 bytes per iteration with the
> GLIBC strlen, or just inline the first 16 bytes and then fall back to
> strlen.  Also if you have statistics that show tiny strlen sizes are
> much more common then the strlen implementation could be further tuned
> for that.

Valid points, thanks.  I will consider that.

BTW, what HW did you use for the benchmarking?
Re: [PATCH][AARCH64] inline strlen for 8-bytes aligned strings
Hi,

A quick benchmark shows it's faster up to about 10 bytes, but after that
it becomes extremely slow.  At 16 bytes it's already 2.5 times slower
and for larger sizes it's over 13 times slower than the GLIBC
implementation...

> The implementation falls back to the library call if the
> string is not aligned.

If it did that for larger sizes then it would be fine.  However a byte
loop is unacceptably slow.  Also given the large amount of inlined code,
it would make sense to handle larger sizes than 8.

It may be worth comparing a loop doing 8 bytes per iteration with the
GLIBC strlen, or just inline the first 16 bytes and then fall back to
strlen.  Also if you have statistics that show tiny strlen sizes are
much more common then the strlen implementation could be further tuned
for that.

Wilco
Re: [arm-8-branch] Add Linaro version string and macros
On Fri, Aug 10, 2018 at 3:35 PM, Yvan Roux wrote:
> On Fri, 10 Aug 2018 at 14:31, Yvan Roux wrote:
>>
>> On Fri, 10 Aug 2018 at 13:45, Ramana Radhakrishnan
>> wrote:
>> >
>> > On Fri, Aug 10, 2018 at 11:09 AM, Yvan Roux wrote:
>> > > Hi,
>> > >
>> > > This patch adds Linaro version string and release macros to ARM GCC 8
>> > > vendor branch.
>> > >
>> > > Ok to commit?
>> >
>> > Ok if no regressions. (I'm assuming you've built and eyeballed that
>> > the pre-processor macros are being produced). (I have a patch to
>> > www-docs for this branch that I'm writing up and should try and get
>> > out today. Though it would be nice to have tests for these if
>> > possible.
>>
>> I've not passed the regression testsuite since this patch is part of
>> Linaro vendor branches for a long time, I've just built an x86_64 c
>> compiler from arm-8-branch and verified the version string and macros:
>>
>> $ ~/wip/arm-8-install/bin/gcc --version
>> gcc (Linaro GCC 8.2-2018.08~dev) 8.2.1 20180802
>>
>> $ ~/wip/arm-8-install/bin/gcc -E -dM - < /dev/null | grep LINARO
>> #define __LINARO_SPIN__ 0
>> #define __LINARO_RELEASE__ 201808
>>
>> I can add some tests, but it will take some time to remember me how
>> these kind of thing is tested in the testsuite ;)
>
> Updated version of the patch, with a test case for Linaro macros. The
> test is not strict w/r to the version or spin number to avoid having
> to update it every release or re-spin, do you think it is sufficient ?
>
> Testsuite ran with the included test PASS.
>
> gcc/ChangeLog
> 2018-08-10  Yvan Roux
>
> * LINARO-VERSION: New file.
> * configure.ac: Add Linaro version string.
> * configure: Regenerate.
> * Makefile.in (LINAROVER, LINAROVER_C, LINAROVER_S): Define.
> (CFLAGS-cppbuiltin.o): Add LINAROVER macro definition.
> (cppbuiltin.o): Depend on $(LINAROVER).
> * cppbuiltin.c (parse_linarover): New.
> (define_GNUC__): Define __LINARO_RELEASE__ and __LINARO_SPIN__ macros.
>
> gcc/testsuite/ChangeLog
> 2018-08-10  Yvan Roux
>
> * gcc.dg/cpp/linaro-macros.c: New test.

Looks good, thanks - can you put the Changelog in a Changelog.arm file ?

regards
Ramana
Re: [arm-8-branch] Add Linaro version string and macros
On Fri, 10 Aug 2018 at 14:31, Yvan Roux wrote:
>
> On Fri, 10 Aug 2018 at 13:45, Ramana Radhakrishnan
> wrote:
> >
> > On Fri, Aug 10, 2018 at 11:09 AM, Yvan Roux wrote:
> > > Hi,
> > >
> > > This patch adds Linaro version string and release macros to ARM GCC 8
> > > vendor branch.
> > >
> > > Ok to commit?
> >
> > Ok if no regressions. (I'm assuming you've built and eyeballed that
> > the pre-processor macros are being produced). (I have a patch to
> > www-docs for this branch that I'm writing up and should try and get
> > out today. Though it would be nice to have tests for these if
> > possible.
>
> I've not passed the regression testsuite since this patch is part of
> Linaro vendor branches for a long time, I've just built an x86_64 c
> compiler from arm-8-branch and verified the version string and macros:
>
> $ ~/wip/arm-8-install/bin/gcc --version
> gcc (Linaro GCC 8.2-2018.08~dev) 8.2.1 20180802
>
> $ ~/wip/arm-8-install/bin/gcc -E -dM - < /dev/null | grep LINARO
> #define __LINARO_SPIN__ 0
> #define __LINARO_RELEASE__ 201808
>
> I can add some tests, but it will take some time to remember me how
> these kind of thing is tested in the testsuite ;)

Updated version of the patch, with a test case for Linaro macros.  The
test is not strict w/r to the version or spin number to avoid having
to update it every release or re-spin, do you think it is sufficient ?

Testsuite ran with the included test PASS.

gcc/ChangeLog
2018-08-10  Yvan Roux

	* LINARO-VERSION: New file.
	* configure.ac: Add Linaro version string.
	* configure: Regenerate.
	* Makefile.in (LINAROVER, LINAROVER_C, LINAROVER_S): Define.
	(CFLAGS-cppbuiltin.o): Add LINAROVER macro definition.
	(cppbuiltin.o): Depend on $(LINAROVER).
	* cppbuiltin.c (parse_linarover): New.
	(define_GNUC__): Define __LINARO_RELEASE__ and __LINARO_SPIN__ macros.

gcc/testsuite/ChangeLog
2018-08-10  Yvan Roux

	* gcc.dg/cpp/linaro-macros.c: New test.
Index: gcc/LINARO-VERSION === --- gcc/LINARO-VERSION (nonexistent) +++ gcc/LINARO-VERSION (working copy) @@ -0,0 +1 @@ +8.2-2018.08~dev Index: gcc/Makefile.in === --- gcc/Makefile.in (revision 263464) +++ gcc/Makefile.in (working copy) @@ -854,10 +854,12 @@ DEVPHASE:= $(srcdir)/DEV-PHASE # experimental, prerelease, "" DATESTAMP := $(srcdir)/DATESTAMP # MMDD or empty REVISION:= $(srcdir)/REVISION # [BRANCH revision XX] +LINAROVER := $(srcdir)/LINARO-VERSION # M.x-.MM[-S][~dev] BASEVER_c := $(shell cat $(BASEVER)) DEVPHASE_c := $(shell cat $(DEVPHASE)) DATESTAMP_c := $(shell cat $(DATESTAMP)) +LINAROVER_c := $(shell cat $(LINAROVER)) ifeq (,$(wildcard $(REVISION))) REVISION_c := @@ -884,6 +886,7 @@ "\"$(if $(DEVPHASE_c)$(filter-out 0,$(PATCHLEVEL_c)), $(DATESTAMP_c))\"" PKGVERSION_s:= "\"@PKGVERSION@\"" BUGURL_s:= "\"@REPORT_BUGS_TO@\"" +LINAROVER_s := "\"$(LINAROVER_c)\"" PKGVERSION := @PKGVERSION@ BUGURL_TEXI := @REPORT_BUGS_TEXI@ @@ -2883,8 +2886,9 @@ -DSTANDARD_EXEC_PREFIX=\"$(libdir)/gcc/\" \ @TARGET_SYSTEM_ROOT_DEFINE@ -CFLAGS-cppbuiltin.o += $(PREPROCESSOR_DEFINES) -DBASEVER=$(BASEVER_s) -cppbuiltin.o: $(BASEVER) +CFLAGS-cppbuiltin.o += $(PREPROCESSOR_DEFINES) -DBASEVER=$(BASEVER_s) \ + -DLINAROVER=$(LINAROVER_s) +cppbuiltin.o: $(BASEVER) $(LINAROVER) CFLAGS-cppdefault.o += $(PREPROCESSOR_DEFINES) Index: gcc/configure === --- gcc/configure (revision 263464) +++ gcc/configure (working copy) @@ -1726,7 +1726,8 @@ --with-stabsarrange to use stabs instead of host debug format --with-dwarf2 force the default debug format to be DWARF 2 --with-specs=SPECS add SPECS to driver command-line processing - --with-pkgversion=PKG Use PKG in the version string in place of "GCC" + --with-pkgversion=PKG Use PKG in the version string in place of "Linaro + GCC `cat $srcdir/LINARO-VERSION`" --with-bugurl=URL Direct users to URL to report a bug --with-multilib-listselect multilibs (AArch64, SH and x86-64 only) --with-gnu-ld assume the C compiler uses GNU ld default=no @@ -7649,7 
+7650,7 @@ *) PKGVERSION="($withval) " ;; esac else - PKGVERSION="(GCC) " + PKGVERSION="(Linaro GCC `cat $srcdir/LINARO-VERSION`) " fi @@ -18448,7 +18449,7 @@ lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 18451 "configure" +#line 18452 "configure" #include "confdefs.h" #if HAVE_DLFCN_H @@ -18554,7 +18555,7 @@ lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 18557 "configure" +#line 18558 "configure" #include "confdefs.h" #if HAVE_DLFCN_H Index: gcc/configure.ac
Fix even more merging PIC and PIE options
Hi, this patch should fix the merging of PIC and PIE options so that we always resort to the least common denominator of the object files compiled (i.e. linking together -fpic and -fPIE will result in a -fpie binary). Note that we also use the information about the output format passed by the linker plugin, so we will disable pic/pie when the resulting binary is not relocatable. However, for shared libraries and PIE we want to pick the right mode. I wrote a simple script that tries all possible combinations of options for two files; the mode picked is specified in the output file. To support targets that default to pic/pie properly, I also had to hack lto_write_options to always stream the mode that was used in the given file. lto-bootstrapped/regtested x86_64-linux, OK? Honza PR lto/86517 * lto-opts.c (lto_write_options): Always stream PIC/PIE mode. * lto-wrapper.c (merge_and_complain): Fix merging of PIC/PIE. Index: lto-opts.c === --- lto-opts.c (revision 263356) +++ lto-opts.c (working copy) @@ -78,6 +78,22 @@ lto_write_options (void) && !global_options.x_flag_openacc) append_to_collect_gcc_options (_obstack, _p, "-fno-openacc"); + /* Append PIC/PIE mode because its default depends on target and it is + subject of merging in lto-wrapper. */ + if ((!global_options_set.x_flag_pic || global_options.x_flag_pic == 0) + && !global_options_set.x_flag_pie) +{ + append_to_collect_gcc_options (_obstack, _p, + global_options.x_flag_pic == 2 + ? "-fPIC" + : global_options.x_flag_pic == 1 + ? "-fpic" + : global_options.x_flag_pie == 2 + ? "-fPIE" + : global_options.x_flag_pie == 1 + ? "-fpie" + : "-fno-pie"); +} /* Append options from target hook and store them to offload_lto section. */ if (lto_stream_offload_p) Index: lto-wrapper.c === --- lto-wrapper.c (revision 263356) +++ lto-wrapper.c (working copy) @@ -408,6 +408,11 @@ merge_and_complain (struct cl_decoded_op It is a common mistake to mix few -fPIC compiled objects into otherwise non-PIC code. We do not want to build everything with PIC then.
+ Similarly we merge PIE options, however in addition we keep + -fPIC + -fPIE = -fPIE + -fpic + -fPIE = -fpie + -fPIC/-fpic + -fpie = -fpie + It would be good to warn on mismatches, but it is bit hard to do as we do not know what nothing translates to. */ @@ -415,11 +420,38 @@ merge_and_complain (struct cl_decoded_op if ((*decoded_options)[j].opt_index == OPT_fPIC || (*decoded_options)[j].opt_index == OPT_fpic) { - if (!pic_option - || (pic_option->value > 0) != ((*decoded_options)[j].value > 0)) - remove_option (decoded_options, j, decoded_options_count); - else if (pic_option->opt_index == OPT_fPIC -&& (*decoded_options)[j].opt_index == OPT_fpic) + /* -fno-pic in one unit implies -fno-pic everywhere. */ + if ((*decoded_options)[j].value == 0) + j++; + /* If we have no pic option or merge in -fno-pic, we still may turn + existing pic/PIC mode into pie/PIE if -fpie/-fPIE is present. */ + else if ((pic_option && pic_option->value == 0) +|| !pic_option) + { + if (pie_option) + { + bool big = (*decoded_options)[j].opt_index == OPT_fPIC + && pie_option->opt_index == OPT_fPIE; + (*decoded_options)[j].opt_index = big ? OPT_fPIE : OPT_fpie; + if (pie_option->value) + (*decoded_options)[j].canonical_option[0] = big ? "-fPIE" : "-fpie"; + else + (*decoded_options)[j].canonical_option[0] = big ? "-fno-pie" : "-fno-pie"; + (*decoded_options)[j].value = pie_option->value; + j++; + } + else if (pic_option) + { + (*decoded_options)[j] = *pic_option; + j++; + } + /* We do not know if target defaults to pic or not, so just remove + option if it is missing in one unit but enabled in other. */ + else + remove_option (decoded_options, j, decoded_options_count); + } + else if (pic_option->opt_index == OPT_fpic +&& (*decoded_options)[j].opt_index == OPT_fPIC) { (*decoded_options)[j] = *pic_option; j++; @@ -430,11 +462,42 @@ merge_and_complain (struct cl_decoded_op else if
Re: [PATCH][AARCH64] inline strlen for 8-bytes aligned strings
Richard, On 10.08.2018 16:54, Richard Earnshaw (lists) wrote: On 10/08/18 14:38, Anton Youdkevitch wrote: The patch inlines strlen for 8-byte aligned strings on AARCH64 like it's done on other platforms (power, s390). The implementation falls back to the library call if the string is not aligned. Synthetic testing on Cavium T88 and Cavium T99 showed the following performance gains: T99: up to 8 bytes: +100%, 100+ bytes - +20% T88: up 8 bytes: +100%, 100+ bytes - 0% which seems to be OK as most of the string are short strings. SPEC performance testing on T99 and T88 did not show any statistically significant differences. Bootstrapped and regression-tested on aarch64-linux-gnu. No new failures found. OK for trunk? 2016-08-10 Anton Youdkevitch * gcc/config/aarch64/aarch64.md (strlen) New pattern. (UNSPEC_BUILTIN_STRLEN): Define. * gcc/config/aarch64/aarch64.c (aarch64_expand_strlen): Expand only in 8-byte aligned case, do not attempt to adjust address * gcc/config/aarch64/aarch64-protos.h (aarch64_expand_strlen): Declare. * gcc/testsuite/gcc.target/aarch64/strlen_aligned.c: New I'm not generally convinced by this kind of optimization. It can be a win if the string is short, but it can be a major loss if the string is long, because the library function can do a number of tricks to improve performance significantly, such as switching opportunistically to Neon or SVE, or even just unwinding the loop further. We might end up with the bizarre situation where misaligned strings are faster to handle than aligned strings. So before even considering this, I'd like to see it benchmarked against a good library implementation across a range of string lengths. It is NOT sufficient to show that it is a win for short strings. Having said that, a quick dynamic test for the first 8 bytes containing a NULL byte and calling the library if that is not the case could be a significant win with minimal overhead for the fail case. Thank you for the feedback. 
I'd had the version your suggested in mind. Will get back with the updated patch. R. --- diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index cda2895..9beb289 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -358,6 +358,7 @@ bool aarch64_emit_approx_div (rtx, rtx, rtx); bool aarch64_emit_approx_sqrt (rtx, rtx, bool); void aarch64_expand_call (rtx, rtx, bool); bool aarch64_expand_movmem (rtx *); +void aarch64_expand_strlen (rtx *); bool aarch64_float_const_zero_rtx_p (rtx); bool aarch64_float_const_rtx_p (rtx); bool aarch64_function_arg_regno_p (unsigned); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 4b5183b..d12fb6b 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -16107,6 +16107,81 @@ aarch64_expand_movmem (rtx *operands) return true; } +/* Emit code to perform a strlen. + + OPERANDS[0] is the destination. + OPERANDS[1] is the string. + OPERANDS[2] is the char to search. + OPERANDS[3] is the alignment. 
*/ + +void aarch64_expand_strlen (rtx* operands) { + rtx result = operands[0]; + rtx src = operands[1]; + rtx loop_label = gen_label_rtx (); + rtx end_label = gen_label_rtx (); + rtx end_loop_label = gen_label_rtx (); + rtx preloop_label = gen_label_rtx (); + rtx str = gen_reg_rtx (DImode); + rtx addr = force_reg (DImode, XEXP (src, 0)); + rtx start_addr = gen_reg_rtx(DImode); + rtx tmp1 = gen_reg_rtx (DImode); + rtx tmp2 = gen_reg_rtx (DImode); + rtx tmp3 = gen_reg_rtx (DImode); + rtx mask1 = gen_reg_rtx (DImode); + rtx mask2 = gen_reg_rtx (DImode); + rtx x; + rtx mem; + + emit_insn (gen_rtx_SET (start_addr, addr)); + emit_insn (gen_anddi3 (tmp1, addr, GEN_INT (4096 - 1))); + /* if less than 16 bytes left till the end of the page */ + x = gen_rtx_GT (DImode, tmp1, GEN_INT (4096 - 16)); + x = gen_rtx_IF_THEN_ELSE (VOIDmode, x, +gen_rtx_LABEL_REF (Pmode, preloop_label), pc_rtx); + + emit_move_insn (str, gen_rtx_MEM (DImode, addr)); + emit_insn (gen_rtx_SET (mask1, GEN_INT (0x0101010101010101))); + emit_insn (gen_rtx_SET (mask2, GEN_INT (0x7f7f7f7f7f7f7f7f))); + + /* process the chunk */ + emit_insn (gen_subdi3 (tmp1, str, mask1)); + emit_insn (gen_iordi3 (tmp2, str, mask2)); + emit_insn (gen_rtx_SET (tmp2, gen_rtx_NOT (DImode, tmp2))); + emit_insn (gen_anddi3 (tmp3, tmp1, tmp2)); + + + /* if NULL found jump to calculate it's exact position */ + x = gen_rtx_NE (DImode, tmp3, GEN_INT (0)); + x = gen_rtx_IF_THEN_ELSE (VOIDmode, x, +gen_rtx_LABEL_REF (Pmode, end_loop_label), pc_rtx); + emit_jump_insn (gen_rtx_SET (pc_rtx, x)); + + emit_insn (gen_adddi3 (addr, addr, GEN_INT (8))); + emit_label (preloop_label); + mem = gen_rtx_POST_MODIFY (DImode, addr, plus_constant (DImode, addr, 1)); + + /* simple byte loop */ + emit_label
Re: [RFC][PATCH v2] PR preprocessor/83173: Additional check before decrementing highest_location
On 05/29/2018 11:25 AM, Mike Gulick wrote: > On 03/04/2018 02:27 PM, Mike Gulick wrote: >> >> >> On 02/09/2018 05:54 PM, Mike Gulick wrote: >>> Hi David, >>> >>> Here is a new version of the linemap patch (see my earlier emails for an >>> updated >>> version of the test code). >> >> >> >> Hi David, >> >> Just wondering if you have had a chance to look at these updated patches >> yet. Let me know if you have any questions I can answer, or if there is >> anything you would like me to do that would make reviewing them easier >> (reposting, rebasing, refactoring the bug fix from the diagnostics >> change in the last patch). >> >> The most recent postings are: >> Bug fix patch: https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00557.html >> Test case: https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01993.html >> >> Thanks, >> Mike >> > > Hi David, > > Now that gcc 8.1 is out the door, would you have any time to review these > patches? I re-tested them after rebasing on the latest git master, and > everything still behaved as expected. I can post the rebased patches if you > would like, but it was a trivial merge with no conflicts. > > Thanks, > Mike > I'd still like to get this bug fixed. I don't think I need the [RFC] tag any more on these patches. Should I post a new thread to this list to get the ball rolling again?
Re: [RFC,PATCH] Introduce -msdata=explicit for powerpc
Hi Alexandre, On Thu, Aug 09, 2018 at 03:23:12AM -0300, Alexandre Oliva wrote: > On Aug 8, 2018, Segher Boessenkool wrote: > > > Then you get sdata2 used (via srodata in generic code), and it is accessed > > via GPR2 (via the sda21 reloc and linker magic). It is hard to trace down > > :-) > > Aah, it didn't occur to me that the r2 uses could be introduced by the > linker. I was cutting corners and looking at the asm output, after > building only gcc, without binutils. Thanks for the explanation. Mind > if I add a comment about that next to... hmm... how about the > SMALL_DATA_* macros? This is fine, please commit to trunk. Thanks! Segher
Re: [PATCH][AARCH64] inline strlen for 8-bytes aligned strings
On 10/08/18 14:38, Anton Youdkevitch wrote: > The patch inlines strlen for 8-byte aligned strings on > AARCH64 like it's done on other platforms (power, s390). > The implementation falls back to the library call if the > string is not aligned. Synthetic testing on Cavium T88 > and Cavium T99 showed the following performance gains: > > T99: up to 8 bytes: +100%, 100+ bytes - +20% > T88: up 8 bytes: +100%, 100+ bytes - 0% > > which seems to be OK as most of the string are short strings. > > SPEC performance testing on T99 and T88 did not show any > statistically significant differences. > > Bootstrapped and regression-tested on aarch64-linux-gnu. > No new failures found. OK for trunk? > > 2016-08-10 Anton Youdkevitch > > * gcc/config/aarch64/aarch64.md (strlen) New pattern. > (UNSPEC_BUILTIN_STRLEN): Define. > * gcc/config/aarch64/aarch64.c (aarch64_expand_strlen): > Expand only in 8-byte aligned case, do not attempt to > adjust address > * gcc/config/aarch64/aarch64-protos.h > (aarch64_expand_strlen): Declare. > * gcc/testsuite/gcc.target/aarch64/strlen_aligned.c: New I'm not generally convinced by this kind of optimization. It can be a win if the string is short, but it can be a major loss if the string is long, because the library function can do a number of tricks to improve performance significantly, such as switching opportunistically to Neon or SVE, or even just unwinding the loop further. We might end up with the bizarre situation where misaligned strings are faster to handle than aligned strings. So before even considering this, I'd like to see it benchmarked against a good library implementation across a range of string lengths. It is NOT sufficient to show that it is a win for short strings. Having said that, a quick dynamic test for the first 8 bytes containing a NULL byte and calling the library if that is not the case could be a significant win with minimal overhead for the fail case. R. 
> > --- > > diff --git a/gcc/config/aarch64/aarch64-protos.h > b/gcc/config/aarch64/aarch64-protos.h > index cda2895..9beb289 100644 > --- a/gcc/config/aarch64/aarch64-protos.h > +++ b/gcc/config/aarch64/aarch64-protos.h > @@ -358,6 +358,7 @@ bool aarch64_emit_approx_div (rtx, rtx, rtx); > bool aarch64_emit_approx_sqrt (rtx, rtx, bool); > void aarch64_expand_call (rtx, rtx, bool); > bool aarch64_expand_movmem (rtx *); > +void aarch64_expand_strlen (rtx *); > bool aarch64_float_const_zero_rtx_p (rtx); > bool aarch64_float_const_rtx_p (rtx); > bool aarch64_function_arg_regno_p (unsigned); > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index 4b5183b..d12fb6b 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -16107,6 +16107,81 @@ aarch64_expand_movmem (rtx *operands) >return true; > } > > +/* Emit code to perform a strlen. > + > + OPERANDS[0] is the destination. > + OPERANDS[1] is the string. > + OPERANDS[2] is the char to search. > + OPERANDS[3] is the alignment. 
*/ > + > +void aarch64_expand_strlen (rtx* operands) { > + rtx result = operands[0]; > + rtx src = operands[1]; > + rtx loop_label = gen_label_rtx (); > + rtx end_label = gen_label_rtx (); > + rtx end_loop_label = gen_label_rtx (); > + rtx preloop_label = gen_label_rtx (); > + rtx str = gen_reg_rtx (DImode); > + rtx addr = force_reg (DImode, XEXP (src, 0)); > + rtx start_addr = gen_reg_rtx(DImode); > + rtx tmp1 = gen_reg_rtx (DImode); > + rtx tmp2 = gen_reg_rtx (DImode); > + rtx tmp3 = gen_reg_rtx (DImode); > + rtx mask1 = gen_reg_rtx (DImode); > + rtx mask2 = gen_reg_rtx (DImode); > + rtx x; > + rtx mem; > + > + emit_insn (gen_rtx_SET (start_addr, addr)); > + emit_insn (gen_anddi3 (tmp1, addr, GEN_INT (4096 - 1))); > + /* if less than 16 bytes left till the end of the page */ > + x = gen_rtx_GT (DImode, tmp1, GEN_INT (4096 - 16)); > + x = gen_rtx_IF_THEN_ELSE (VOIDmode, x, > +gen_rtx_LABEL_REF (Pmode, preloop_label), > pc_rtx); > + > + emit_move_insn (str, gen_rtx_MEM (DImode, addr)); > + emit_insn (gen_rtx_SET (mask1, GEN_INT (0x0101010101010101))); > + emit_insn (gen_rtx_SET (mask2, GEN_INT (0x7f7f7f7f7f7f7f7f))); > + > + /* process the chunk */ > + emit_insn (gen_subdi3 (tmp1, str, mask1)); > + emit_insn (gen_iordi3 (tmp2, str, mask2)); > + emit_insn (gen_rtx_SET (tmp2, gen_rtx_NOT (DImode, tmp2))); > + emit_insn (gen_anddi3 (tmp3, tmp1, tmp2)); > + > + > + /* if NULL found jump to calculate it's exact position */ > + x = gen_rtx_NE (DImode, tmp3, GEN_INT (0)); > + x = gen_rtx_IF_THEN_ELSE (VOIDmode, x, > +gen_rtx_LABEL_REF (Pmode, end_loop_label), > pc_rtx); > + emit_jump_insn (gen_rtx_SET (pc_rtx, x)); > + > + emit_insn (gen_adddi3 (addr, addr, GEN_INT (8))); > + emit_label (preloop_label); > + mem = gen_rtx_POST_MODIFY (DImode, addr, plus_constant (DImode, addr, 1)); > + > + /* simple byte loop */ > +
[PATCH][AARCH64] inline strlen for 8-bytes aligned strings
The patch inlines strlen for 8-byte aligned strings on AARCH64 like it's done on other platforms (power, s390). The implementation falls back to the library call if the string is not aligned. Synthetic testing on Cavium T88 and Cavium T99 showed the following performance gains: T99: up to 8 bytes: +100%, 100+ bytes - +20% T88: up to 8 bytes: +100%, 100+ bytes - 0% which seems to be OK as most of the strings are short strings. SPEC performance testing on T99 and T88 did not show any statistically significant differences. Bootstrapped and regression-tested on aarch64-linux-gnu. No new failures found. OK for trunk? 2018-08-10 Anton Youdkevitch * gcc/config/aarch64/aarch64.md (strlen): New pattern. (UNSPEC_BUILTIN_STRLEN): Define. * gcc/config/aarch64/aarch64.c (aarch64_expand_strlen): Expand only in 8-byte aligned case, do not attempt to adjust address. * gcc/config/aarch64/aarch64-protos.h (aarch64_expand_strlen): Declare. * gcc/testsuite/gcc.target/aarch64/strlen_aligned.c: New --- diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index cda2895..9beb289 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -358,6 +358,7 @@ bool aarch64_emit_approx_div (rtx, rtx, rtx); bool aarch64_emit_approx_sqrt (rtx, rtx, bool); void aarch64_expand_call (rtx, rtx, bool); bool aarch64_expand_movmem (rtx *); +void aarch64_expand_strlen (rtx *); bool aarch64_float_const_zero_rtx_p (rtx); bool aarch64_float_const_rtx_p (rtx); bool aarch64_function_arg_regno_p (unsigned); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 4b5183b..d12fb6b 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -16107,6 +16107,81 @@ aarch64_expand_movmem (rtx *operands) return true; } +/* Emit code to perform a strlen. + + OPERANDS[0] is the destination. + OPERANDS[1] is the string. + OPERANDS[2] is the char to search. + OPERANDS[3] is the alignment. 
*/ + +void aarch64_expand_strlen (rtx* operands) { + rtx result = operands[0]; + rtx src = operands[1]; + rtx loop_label = gen_label_rtx (); + rtx end_label = gen_label_rtx (); + rtx end_loop_label = gen_label_rtx (); + rtx preloop_label = gen_label_rtx (); + rtx str = gen_reg_rtx (DImode); + rtx addr = force_reg (DImode, XEXP (src, 0)); + rtx start_addr = gen_reg_rtx(DImode); + rtx tmp1 = gen_reg_rtx (DImode); + rtx tmp2 = gen_reg_rtx (DImode); + rtx tmp3 = gen_reg_rtx (DImode); + rtx mask1 = gen_reg_rtx (DImode); + rtx mask2 = gen_reg_rtx (DImode); + rtx x; + rtx mem; + + emit_insn (gen_rtx_SET (start_addr, addr)); + emit_insn (gen_anddi3 (tmp1, addr, GEN_INT (4096 - 1))); + /* if less than 16 bytes left till the end of the page */ + x = gen_rtx_GT (DImode, tmp1, GEN_INT (4096 - 16)); + x = gen_rtx_IF_THEN_ELSE (VOIDmode, x, +gen_rtx_LABEL_REF (Pmode, preloop_label), pc_rtx); + + emit_move_insn (str, gen_rtx_MEM (DImode, addr)); + emit_insn (gen_rtx_SET (mask1, GEN_INT (0x0101010101010101))); + emit_insn (gen_rtx_SET (mask2, GEN_INT (0x7f7f7f7f7f7f7f7f))); + + /* process the chunk */ + emit_insn (gen_subdi3 (tmp1, str, mask1)); + emit_insn (gen_iordi3 (tmp2, str, mask2)); + emit_insn (gen_rtx_SET (tmp2, gen_rtx_NOT (DImode, tmp2))); + emit_insn (gen_anddi3 (tmp3, tmp1, tmp2)); + + + /* if NULL found jump to calculate it's exact position */ + x = gen_rtx_NE (DImode, tmp3, GEN_INT (0)); + x = gen_rtx_IF_THEN_ELSE (VOIDmode, x, +gen_rtx_LABEL_REF (Pmode, end_loop_label), pc_rtx); + emit_jump_insn (gen_rtx_SET (pc_rtx, x)); + + emit_insn (gen_adddi3 (addr, addr, GEN_INT (8))); + emit_label (preloop_label); + mem = gen_rtx_POST_MODIFY (DImode, addr, plus_constant (DImode, addr, 1)); + + /* simple byte loop */ + emit_label (loop_label); + emit_move_insn (str, gen_rtx_ZERO_EXTEND (DImode, gen_rtx_MEM (QImode, mem))); + x = gen_rtx_NE (SImode, str, GEN_INT(0)); + x = gen_rtx_IF_THEN_ELSE (VOIDmode, x, gen_rtx_LABEL_REF (Pmode, loop_label), pc_rtx); + emit_jump_insn 
(gen_rtx_SET (pc_rtx, x)); + + emit_insn (gen_subdi3 (result, addr, start_addr)); + /* adjusting after the last post-decrement */ + emit_insn (gen_adddi3 (result, result, GEN_INT (-1))); + emit_jump_insn (gen_jump (end_label)); + emit_barrier (); + + emit_label (end_loop_label); + emit_insn (gen_bswapdi2 (tmp3, tmp3)); + emit_insn (gen_clzdi2 (tmp3, tmp3)); + emit_insn (gen_ashrdi3 (tmp3, tmp3, GEN_INT (3))); + emit_move_insn (result, tmp3); + + emit_label(end_label); +} + /* Split a DImode store of a CONST_INT SRC to MEM DST as two SImode stores. Handle the case when the constant has identical bottom and top halves. This is beneficial when the two stores can be diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 10fcde6..7c60b69 100644 --- a/gcc/config/aarch64/aarch64.md +++
Re: [arm-8-branch] Add Linaro version string and macros
On Fri, 10 Aug 2018 at 13:45, Ramana Radhakrishnan wrote: > > On Fri, Aug 10, 2018 at 11:09 AM, Yvan Roux wrote: > > Hi, > > > > This patch adds Linaro version string and release macros to ARM GCC 8 > > vendor branch. > > > > Ok to commit? > > > > > Ok if no regressions. (I'm assuming you've built and eyeballed that > the pre-processor macros are being produced). (I have a patch to > www-docs for this branch that I'm writing up and should try and get > out today. Though it would be nice to have tests for these if > possible. I've not passed the regression testsuite since this patch is part of Linaro vendor branches for a long time, I've just built an x86_64 c compiler from arm-8-branch and verified the version string and macros: $ ~/wip/arm-8-install/bin/gcc --version gcc (Linaro GCC 8.2-2018.08~dev) 8.2.1 20180802 $ ~/wip/arm-8-install/bin/gcc -E -dM - < /dev/null | grep LINARO #define __LINARO_SPIN__ 0 #define __LINARO_RELEASE__ 201808 I can add some tests, but it will take some time to remember me how these kind of thing is tested in the testsuite ;) > regards > Ramana > > > > Thanks > > Yvan > > > > gcc/ChangeLog > > 2018-08-10 Yvan Roux > > > > * LINARO-VERSION: New file. > > * configure.ac: Add Linaro version string. > > * configure: Regenerate. > > * Makefile.in (LINAROVER, LINAROVER_C, LINAROVER_S): Define. > > (CFLAGS-cppbuiltin.o): Add LINAROVER macro definition. > > (cppbuiltin.o): Depend on $(LINAROVER). > > * cppbuiltin.c (parse_linarover): New. > > (define_GNUC__): Define __LINARO_RELEASE__ and __LINARO_SPIN__ macros.
Re: [arm-8-branch] Add Linaro version string and macros
On Fri, Aug 10, 2018 at 11:09 AM, Yvan Roux wrote: > Hi, > > This patch adds Linaro version string and release macros to ARM GCC 8 > vendor branch. > > Ok to commit? > Ok if no regressions. (I'm assuming you've built and eyeballed that the pre-processor macros are being produced). (I have a patch to www-docs for this branch that I'm writing up and should try and get out today. Though it would be nice to have tests for these if possible. regards Ramana > Thanks > Yvan > > gcc/ChangeLog > 2018-08-10 Yvan Roux > > * LINARO-VERSION: New file. > * configure.ac: Add Linaro version string. > * configure: Regenerate. > * Makefile.in (LINAROVER, LINAROVER_C, LINAROVER_S): Define. > (CFLAGS-cppbuiltin.o): Add LINAROVER macro definition. > (cppbuiltin.o): Depend on $(LINAROVER). > * cppbuiltin.c (parse_linarover): New. > (define_GNUC__): Define __LINARO_RELEASE__ and __LINARO_SPIN__ macros.
[RFC PATCH, i386]: Deprecate -mmitigate-rop
This option is fairly ineffective, and in the light of CET, nobody seems interested to improve it. Deprecate the option, so it won't lure developers to the land of false security. 2018-08-10 Uros Bizjak * config/i386/i386.opt (mmitigate-rop): Mark as deprecated. * doc/invoke.texi (mmitigate-rop): Remove. * config/i386/i386.c: Do not include "regrename.h". (ix86_rop_should_change_byte_p, reg_encoded_number) (ix86_get_modrm_for_rop, set_rop_modrm_reg_bits, ix86_mitigate_rop): Remove. (ix86_reorg): Remove call to ix86_mitigate_rop. * config/i386/i386.md (attr "modrm_class"): Remove. (cmp_ccno_1, mov_xor, movstrict_xor, x86_movcc_0_m1. x86_movcc_0_m1_se) (x86_movcc_0_m1_neg): Remove modrm_class attribute override. testsuite/Changelog: 2018-08-10 Uros Bizjak * gcc.target/i386/rop1.c: Remove. * gcc.target/i386/pr83554 (dg-options): Remove -mmitigate-rop. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Any opinion against the deprecation? Uros. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 7554fd1f6599..d70e92124ddc 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -75,7 +75,6 @@ along with GCC; see the file COPYING3. If not see #include "tree-iterator.h" #include "dbgcnt.h" #include "case-cfn-macros.h" -#include "regrename.h" #include "dojump.h" #include "fold-const-call.h" #include "tree-vrp.h" @@ -3135,15 +3134,6 @@ ix86_debug_options (void) return; } -/* Return true if T is one of the bytes we should avoid with - -mmitigate-rop. */ - -static bool -ix86_rop_should_change_byte_p (int t) -{ - return t == 0xc2 || t == 0xc3 || t == 0xca || t == 0xcb; -} - static const char *stringop_alg_names[] = { #define DEF_ENUM #define DEF_ALG(alg, name) #name, @@ -29110,98 +29100,6 @@ ix86_instantiate_decls (void) instantiate_decl_rtl (s->rtl); } -/* Return the number used for encoding REG, in the range 0..7. 
*/ - -static int -reg_encoded_number (rtx reg) -{ - unsigned regno = REGNO (reg); - switch (regno) -{ -case AX_REG: - return 0; -case CX_REG: - return 1; -case DX_REG: - return 2; -case BX_REG: - return 3; -case SP_REG: - return 4; -case BP_REG: - return 5; -case SI_REG: - return 6; -case DI_REG: - return 7; -default: - break; -} - if (IN_RANGE (regno, FIRST_STACK_REG, LAST_STACK_REG)) -return regno - FIRST_STACK_REG; - if (IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)) -return regno - FIRST_SSE_REG; - if (IN_RANGE (regno, FIRST_MMX_REG, LAST_MMX_REG)) -return regno - FIRST_MMX_REG; - if (IN_RANGE (regno, FIRST_REX_SSE_REG, LAST_REX_SSE_REG)) -return regno - FIRST_REX_SSE_REG; - if (IN_RANGE (regno, FIRST_REX_INT_REG, LAST_REX_INT_REG)) -return regno - FIRST_REX_INT_REG; - if (IN_RANGE (regno, FIRST_MASK_REG, LAST_MASK_REG)) -return regno - FIRST_MASK_REG; - return -1; -} - -/* Given an insn INSN with NOPERANDS OPERANDS, return the modr/m byte used - in its encoding if it could be relevant for ROP mitigation, otherwise - return -1. If POPNO0 and POPNO1 are nonnull, store the operand numbers - used for calculating it into them. 
*/ - -static int -ix86_get_modrm_for_rop (rtx_insn *insn, rtx *operands, int noperands, - int *popno0 = 0, int *popno1 = 0) -{ - if (asm_noperands (PATTERN (insn)) >= 0) -return -1; - int has_modrm = get_attr_modrm (insn); - if (!has_modrm) -return -1; - enum attr_modrm_class cls = get_attr_modrm_class (insn); - rtx op0, op1; - switch (cls) -{ -case MODRM_CLASS_OP02: - gcc_assert (noperands >= 3); - if (popno0) - { - *popno0 = 0; - *popno1 = 2; - } - op0 = operands[0]; - op1 = operands[2]; - break; -case MODRM_CLASS_OP01: - gcc_assert (noperands >= 2); - if (popno0) - { - *popno0 = 0; - *popno1 = 1; - } - op0 = operands[0]; - op1 = operands[1]; - break; -default: - return -1; -} - if (REG_P (op0) && REG_P (op1)) -{ - int enc0 = reg_encoded_number (op0); - int enc1 = reg_encoded_number (op1); - return 0xc0 + (enc1 << 3) + enc0; -} - return -1; -} - /* Check whether x86 address PARTS is a pc-relative address. */ bool @@ -42215,215 +42113,6 @@ ix86_seh_fixup_eh_fallthru (void) } } -/* Given a register number BASE, the lowest of a group of registers, update - regsets IN and OUT with the registers that should be avoided in input - and output operands respectively when trying to avoid generating a modr/m - byte for -mmitigate-rop. */ - -static void -set_rop_modrm_reg_bits (int base, HARD_REG_SET , HARD_REG_SET ) -{ - SET_HARD_REG_BIT (out, base); - SET_HARD_REG_BIT (out, base + 1); - SET_HARD_REG_BIT (in, base + 2); - SET_HARD_REG_BIT (in, base + 3); -} - -/* Called if -mmitigate-rop is in effect. Try to rewrite
Re: [PATCH][OBVIOUS] Remove extra line in MAINTAINERS.
On 08/10/2018 01:12 PM, Paolo Carlini wrote: > Hi, > > On 10/08/2018 12:44, Martin Liška wrote: >> On 08/10/2018 12:38 PM, Paolo Carlini wrote: >>> Hi, >>> >>> On 10/08/2018 12:35, Martin Liška wrote: Hi. This removes one extra line that's a typo. >>> It's not ;) The complete line is "c++ runtime libs special modes". See? >>> Admittedly, we may want to have something clearer, but just removing that >>> line isn't the way to go. >>> >>> Paolo. >> Oup, I was too eager. >> So then what about putting "c++ runtime libs special modes" on one line? > Yes, that would be nice, but I think Francois didn't manage to do that while > keeping a consistent formatting, that line is *long*! > > Paolo. Then I'm going to install following patch. Martin >From c72c46fce810b63e162ebcce8abc2fd1d44e2181 Mon Sep 17 00:00:00 2001 From: marxin Date: Fri, 10 Aug 2018 13:36:53 +0200 Subject: [PATCH] Fix wrongly removed line. ChangeLog: 2018-08-10 Martin Liska * MAINTAINERS: Revert change in previous commit and join lines. --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 8d42c91b2d7..a9f20d73afe 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -188,7 +188,7 @@ c++ runtime libs Paolo Carlini c++ runtime libs Ulrich Drepper c++ runtime libs Benjamin De Kosnik c++ runtime libs Jonathan Wakely -special modes François Dumont +c++ runtime libs special modes François Dumont fixincludes Bruce Korb *gimpl* Jakub Jelinek *gimpl* Aldy Hernandez -- 2.18.0
Re: Improve safe iterator move semantic
On 10 August 2018 at 13:47, Jonathan Wakely wrote: > Doing a test like this with TSan should be the absolute minimum > required for any change to the mutex locking policy. Agreed. Concurrency code is something that our test suite is not well-equipped to test (because it doesn't support TSan and such yet), so it would seem prudent to stress-test such patches via testsuite-external means. > I'm not aware of people complaining about the performance of debug > mode anyway. Everybody I speak to is happy to accept a performance hit > in order to get checking. Yep; while it's nice to have performance improvements in debug mode, there are probably more important and significant ways to improve it.. > I think https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86843 would be > more helpful to our users, as it would allow some Debug Mode checks to > be enabled in programs that can't currently use it (because > recompiling the entire program is not possible). ..like this one, which would be a major usability improvement.
Re: [PATCH][OBVIOUS] Remove extra line in MAINTAINERS.
Hi,

On 10/08/2018 12:44, Martin Liška wrote:
> On 08/10/2018 12:38 PM, Paolo Carlini wrote:
>> Hi,
>>
>> On 10/08/2018 12:35, Martin Liška wrote:
>>> Hi.
>>>
>>> This removes one extra line that's a typo.
>> It's not ;) The complete line is "c++ runtime libs special modes". See?
>> Admittedly, we may want to have something clearer, but just removing that
>> line isn't the way to go.
>>
>> Paolo.
> Oup, I was too eager.
> So then what about putting "c++ runtime libs special modes" on one line?

Yes, that would be nice, but I think Francois didn't manage to do that while keeping a consistent formatting, that line is *long*!

Paolo.
Re: [PATCH] Remove not needed __builtin_expect due to malloc predictor.
On 10/08/18 12:32 +0200, Martin Liška wrote:
> Hi.
>
> After we introduced the new non-NULL malloc predictor, we can remove these
> __builtin_expects.
>
> Predictors will change in the following way:
>
> Before:
>   Predictions for bb 5
>     first match heuristics: 10.00%
>     combined heuristics: 10.00%
>     __builtin_expect heuristics of edge 5->6: 10.00%
>     call heuristics of edge 5->6 (ignored): 33.00%
>     loop exit heuristics of edge 5->9 (ignored): 5.50%
>
> After:
>   Predictions for bb 5
>     first match heuristics: 0.04%
>     combined heuristics: 0.04%
>     pointer (on trees) heuristics of edge 5->6 (ignored): 30.00%
>     malloc returned non-NULL heuristics of edge 5->6: 0.04%
>     call heuristics of edge 5->6 (ignored): 33.00%
>     loop exit heuristics of edge 5->9 (ignored): 5.50%
>
> Maybe there are similar allocation-related expects, but I haven't found them.
>
> Ready after it survives regression tests?

OK for trunk - thanks!

> Martin
>
> libstdc++-v3/ChangeLog:
>
> 2018-08-10  Martin Liska
>
> 	* libsupc++/new_op.cc (new): Remove __builtin_expect as malloc
> 	predictor can handle that.
> 	* libsupc++/new_opa.cc: Likewise.
> 	* libsupc++/new_opnt.cc (new): Likewise.
> ---
>  libstdc++-v3/libsupc++/new_op.cc   | 2 +-
>  libstdc++-v3/libsupc++/new_opa.cc  | 2 +-
>  libstdc++-v3/libsupc++/new_opnt.cc | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/libstdc++-v3/libsupc++/new_op.cc b/libstdc++-v3/libsupc++/new_op.cc
> index 3a1e38d9df7..3caa0bab2ea 100644
> --- a/libstdc++-v3/libsupc++/new_op.cc
> +++ b/libstdc++-v3/libsupc++/new_op.cc
> @@ -47,7 +47,7 @@ operator new (std::size_t sz) _GLIBCXX_THROW (std::bad_alloc)
>    if (sz == 0)
>      sz = 1;
>
> -  while (__builtin_expect ((p = malloc (sz)) == 0, false))
> +  while ((p = malloc (sz)) == 0)
>      {
>        new_handler handler = std::get_new_handler ();
>        if (! handler)
> diff --git a/libstdc++-v3/libsupc++/new_opa.cc b/libstdc++-v3/libsupc++/new_opa.cc
> index 68eac5b8ceb..a27ff843ca1 100644
> --- a/libstdc++-v3/libsupc++/new_opa.cc
> +++ b/libstdc++-v3/libsupc++/new_opa.cc
> @@ -126,7 +126,7 @@ operator new (std::size_t sz, std::align_val_t al)
>  #endif
>    using __gnu_cxx::aligned_alloc;
>
> -  while (__builtin_expect ((p = aligned_alloc (align, sz)) == 0, false))
> +  while ((p = aligned_alloc (align, sz)) == 0)
>      {
>        new_handler handler = std::get_new_handler ();
>        if (! handler)
> diff --git a/libstdc++-v3/libsupc++/new_opnt.cc b/libstdc++-v3/libsupc++/new_opnt.cc
> index a2dc33ad4d3..faab44e66c2 100644
> --- a/libstdc++-v3/libsupc++/new_opnt.cc
> +++ b/libstdc++-v3/libsupc++/new_opnt.cc
> @@ -40,7 +40,7 @@ operator new (std::size_t sz, const std::nothrow_t&) _GLIBCXX_USE_NOEXCEPT
>    if (sz == 0)
>      sz = 1;
>
> -  while (__builtin_expect ((p = malloc (sz)) == 0, false))
> +  while ((p = malloc (sz)) == 0)
>      {
>        new_handler handler = std::get_new_handler ();
>        if (! handler)
Re: Improve safe iterator move semantic
On 10/08/18 11:00 +0100, Jonathan Wakely wrote:
> This valid program shows data races with -fsanitize=thread after your
> patch is applied:
>
> #define _GLIBCXX_DEBUG
> #include <vector>
> #include <thread>
>
> void thrash(std::vector<int>::iterator& iter)
> {
>   for (int i = 0; i < 1000; ++i)
>   {
>     auto jiter = std::move(iter);
>     iter = std::move(jiter);
>     jiter = std::move(iter);
>     iter = std::move(jiter);
>   }
> }
>
> int main()
> {
>   std::vector<int> v{1, 2, 3};
>   auto it1 = v.begin();
>   auto it2 = v.begin();
>   std::thread t1(thrash, std::ref(it1));
>   thrash(it2);
>   t1.join();
> }

Doing a test like this with TSan should be the absolute minimum required for any change to the mutex locking policy. You want to reduce the locks on move operations of iterators? OK, perform lots of move operations on two iterators from the same sequence and see if ThreadSanitizer shows errors. If TSan doesn't show errors, adjust the test and try harder to prove it's correct.

Maybe your test was flawed. For example, my first attempt to prove the data race passed the iterators by value, not by reference. That meant there were more than two debug iterators in the linked list, and so the two iterators being tested were not adjacent in the list and didn't conflict. Because I'd already proved there was a bug by inspecting the code, I knew my test was flawed, so I changed it. But even if I hadn't already proved a bug, I would have kept testing different variations to prove whether the code was correct or not.

Our debug iterators make "non-local" modifications to other objects from the same sequence. That needs careful synchronisation. If that makes the code a bit slower, then we just have to live with it. Correct and slow is better than fast and wrong.

I'm not aware of people complaining about the performance of debug mode anyway. Everybody I speak to is happy to accept a performance hit in order to get checking.

I think https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86843 would be more helpful to our users, as it would allow some Debug Mode checks to be enabled in programs that can't currently use it (because recompiling the entire program is not possible).
Re: [PATCH][OBVIOUS] Remove extra line in MAINTAINERS.
On 08/10/2018 12:38 PM, Paolo Carlini wrote:
> Hi,
>
> On 10/08/2018 12:35, Martin Liška wrote:
>> Hi.
>>
>> This removes one extra line that's a typo.
> It's not ;) The complete line is "c++ runtime libs special modes". See?
> Admittedly, we may want to have something clearer, but just removing that
> line isn't the way to go.
>
> Paolo.

Oup, I was too eager.
So then what about putting "c++ runtime libs special modes" on one line?

Martin
Re: [PATCH][OBVIOUS] Remove extra line in MAINTAINERS.
Hi,

On 10/08/2018 12:35, Martin Liška wrote:
> Hi.
>
> This removes one extra line that's a typo.

It's not ;) The complete line is "c++ runtime libs special modes". See? Admittedly, we may want to have something clearer, but just removing that line isn't the way to go.

Paolo.
[PATCH][OBVIOUS] Remove extra line in MAINTAINERS.
Hi.

This removes one extra line that's a typo.

Martin

ChangeLog:

2018-08-10  Martin Liska

	* MAINTAINERS: Remove extra line.
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index f96ab622146..8d42c91b2d7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -188,7 +188,6 @@
 c++ runtime libs	Paolo Carlini
 c++ runtime libs	Ulrich Drepper
 c++ runtime libs	Benjamin De Kosnik
 c++ runtime libs	Jonathan Wakely
-c++ runtime libs special modes	François Dumont
 fixincludes		Bruce Korb
 *gimpl*			Jakub Jelinek
[PATCH] Remove not needed __builtin_expect due to malloc predictor.
Hi.

After we introduced the new non-NULL malloc predictor, we can remove these __builtin_expects.

Predictors will change in the following way:

Before:
  Predictions for bb 5
    first match heuristics: 10.00%
    combined heuristics: 10.00%
    __builtin_expect heuristics of edge 5->6: 10.00%
    call heuristics of edge 5->6 (ignored): 33.00%
    loop exit heuristics of edge 5->9 (ignored): 5.50%

After:
  Predictions for bb 5
    first match heuristics: 0.04%
    combined heuristics: 0.04%
    pointer (on trees) heuristics of edge 5->6 (ignored): 30.00%
    malloc returned non-NULL heuristics of edge 5->6: 0.04%
    call heuristics of edge 5->6 (ignored): 33.00%
    loop exit heuristics of edge 5->9 (ignored): 5.50%

Maybe there are similar allocation-related expects, but I haven't found them.

Ready after it survives regression tests?

Martin

libstdc++-v3/ChangeLog:

2018-08-10  Martin Liska

	* libsupc++/new_op.cc (new): Remove __builtin_expect as malloc
	predictor can handle that.
	* libsupc++/new_opa.cc: Likewise.
	* libsupc++/new_opnt.cc (new): Likewise.
---
 libstdc++-v3/libsupc++/new_op.cc   | 2 +-
 libstdc++-v3/libsupc++/new_opa.cc  | 2 +-
 libstdc++-v3/libsupc++/new_opnt.cc | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/libsupc++/new_op.cc b/libstdc++-v3/libsupc++/new_op.cc
index 3a1e38d9df7..3caa0bab2ea 100644
--- a/libstdc++-v3/libsupc++/new_op.cc
+++ b/libstdc++-v3/libsupc++/new_op.cc
@@ -47,7 +47,7 @@ operator new (std::size_t sz) _GLIBCXX_THROW (std::bad_alloc)
   if (sz == 0)
     sz = 1;

-  while (__builtin_expect ((p = malloc (sz)) == 0, false))
+  while ((p = malloc (sz)) == 0)
     {
       new_handler handler = std::get_new_handler ();
       if (! handler)
diff --git a/libstdc++-v3/libsupc++/new_opa.cc b/libstdc++-v3/libsupc++/new_opa.cc
index 68eac5b8ceb..a27ff843ca1 100644
--- a/libstdc++-v3/libsupc++/new_opa.cc
+++ b/libstdc++-v3/libsupc++/new_opa.cc
@@ -126,7 +126,7 @@ operator new (std::size_t sz, std::align_val_t al)
 #endif
   using __gnu_cxx::aligned_alloc;

-  while (__builtin_expect ((p = aligned_alloc (align, sz)) == 0, false))
+  while ((p = aligned_alloc (align, sz)) == 0)
     {
       new_handler handler = std::get_new_handler ();
       if (! handler)
diff --git a/libstdc++-v3/libsupc++/new_opnt.cc b/libstdc++-v3/libsupc++/new_opnt.cc
index a2dc33ad4d3..faab44e66c2 100644
--- a/libstdc++-v3/libsupc++/new_opnt.cc
+++ b/libstdc++-v3/libsupc++/new_opnt.cc
@@ -40,7 +40,7 @@ operator new (std::size_t sz, const std::nothrow_t&) _GLIBCXX_USE_NOEXCEPT
   if (sz == 0)
     sz = 1;

-  while (__builtin_expect ((p = malloc (sz)) == 0, false))
+  while ((p = malloc (sz)) == 0)
     {
       new_handler handler = std::get_new_handler ();
       if (! handler)
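The loop the patch touches can be illustrated outside of libstdc++. Below is a minimal standalone sketch (`sketch_operator_new` is a hypothetical name, not the library symbol): allocation is retried while malloc returns null and a new-handler is installed. The annotation being removed is the `__builtin_expect(..., false)` around the null test; with the "malloc returned non-NULL" predictor, the plain comparison is already predicted cold.

```cpp
#include <cstdlib>
#include <new>

// Standalone sketch of the operator-new retry loop (not libstdc++ source).
// The patch drops the __builtin_expect wrapper around the `== 0` test,
// because GCC's malloc predictor now assumes the null branch is unlikely.
void* sketch_operator_new(std::size_t sz)
{
  if (sz == 0)
    sz = 1;

  void* p;
  // was: while (__builtin_expect ((p = malloc (sz)) == 0, false))
  while ((p = std::malloc(sz)) == nullptr)
    {
      // On failure, give the installed new-handler a chance to free memory,
      // then retry; with no handler, allocation fails for good.
      std::new_handler handler = std::get_new_handler();
      if (!handler)
        throw std::bad_alloc();
      handler();
    }
  return p;
}
```

The generated code is the same loop either way; only the static branch-probability hint changes.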
[arm-8-branch] Add Linaro version string and macros
Hi,

This patch adds Linaro version string and release macros to the ARM GCC 8 vendor branch. Ok to commit?

Thanks
Yvan

gcc/ChangeLog
2018-08-10  Yvan Roux

	* LINARO-VERSION: New file.
	* configure.ac: Add Linaro version string.
	* configure: Regenerate.
	* Makefile.in (LINAROVER, LINAROVER_C, LINAROVER_S): Define.
	(CFLAGS-cppbuiltin.o): Add LINAROVER macro definition.
	(cppbuiltin.o): Depend on $(LINAROVER).
	* cppbuiltin.c (parse_linarover): New.
	(define__GNUC__): Define __LINARO_RELEASE__ and __LINARO_SPIN__
	macros.

Index: gcc/LINARO-VERSION
===================================================================
--- gcc/LINARO-VERSION	(nonexistent)
+++ gcc/LINARO-VERSION	(working copy)
@@ -0,0 +1 @@
+8.2-2018.08~dev
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 263464)
+++ gcc/Makefile.in	(working copy)
@@ -854,10 +854,12 @@
 DEVPHASE  := $(srcdir)/DEV-PHASE # experimental, prerelease, ""
 DATESTAMP := $(srcdir)/DATESTAMP # YYYYMMDD or empty
 REVISION  := $(srcdir)/REVISION  # [BRANCH revision XX]
+LINAROVER := $(srcdir)/LINARO-VERSION # M.x-YYYY.MM[-S][~dev]

 BASEVER_c   := $(shell cat $(BASEVER))
 DEVPHASE_c  := $(shell cat $(DEVPHASE))
 DATESTAMP_c := $(shell cat $(DATESTAMP))
+LINAROVER_c := $(shell cat $(LINAROVER))

 ifeq (,$(wildcard $(REVISION)))
 REVISION_c :=
@@ -884,6 +886,7 @@
   "\"$(if $(DEVPHASE_c)$(filter-out 0,$(PATCHLEVEL_c)), $(DATESTAMP_c))\""
 PKGVERSION_s := "\"@PKGVERSION@\""
 BUGURL_s     := "\"@REPORT_BUGS_TO@\""
+LINAROVER_s  := "\"$(LINAROVER_c)\""

 PKGVERSION  := @PKGVERSION@
 BUGURL_TEXI := @REPORT_BUGS_TEXI@
@@ -2883,8 +2886,9 @@
 	-DSTANDARD_EXEC_PREFIX=\"$(libdir)/gcc/\" \
 	@TARGET_SYSTEM_ROOT_DEFINE@

-CFLAGS-cppbuiltin.o += $(PREPROCESSOR_DEFINES) -DBASEVER=$(BASEVER_s)
-cppbuiltin.o: $(BASEVER)
+CFLAGS-cppbuiltin.o += $(PREPROCESSOR_DEFINES) -DBASEVER=$(BASEVER_s) \
+	-DLINAROVER=$(LINAROVER_s)
+cppbuiltin.o: $(BASEVER) $(LINAROVER)

 CFLAGS-cppdefault.o += $(PREPROCESSOR_DEFINES)
Index: gcc/configure
===================================================================
--- gcc/configure	(revision 263464)
+++ gcc/configure	(working copy)
@@ -1726,7 +1726,8 @@
   --with-stabs            arrange to use stabs instead of host debug format
   --with-dwarf2           force the default debug format to be DWARF 2
   --with-specs=SPECS      add SPECS to driver command-line processing
-  --with-pkgversion=PKG   Use PKG in the version string in place of "GCC"
+  --with-pkgversion=PKG   Use PKG in the version string in place of "Linaro
+                          GCC `cat $srcdir/LINARO-VERSION`"
   --with-bugurl=URL       Direct users to URL to report a bug
   --with-multilib-list    select multilibs (AArch64, SH and x86-64 only)
   --with-gnu-ld           assume the C compiler uses GNU ld default=no
@@ -7649,7 +7650,7 @@
   *)   PKGVERSION="($withval) " ;;
  esac
 else
-  PKGVERSION="(GCC) "
+  PKGVERSION="(Linaro GCC `cat $srcdir/LINARO-VERSION`) "
 fi
@@ -18448,7 +18449,7 @@
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18451 "configure"
+#line 18452 "configure"
 #include "confdefs.h"

 #if HAVE_DLFCN_H
@@ -18554,7 +18555,7 @@
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18557 "configure"
+#line 18558 "configure"
 #include "confdefs.h"

 #if HAVE_DLFCN_H
Index: gcc/configure.ac
===================================================================
--- gcc/configure.ac	(revision 263464)
+++ gcc/configure.ac	(working copy)
@@ -929,7 +929,7 @@
 )
 AC_SUBST(CONFIGURE_SPECS)

-ACX_PKGVERSION([GCC])
+ACX_PKGVERSION([Linaro GCC `cat $srcdir/LINARO-VERSION`])
 ACX_BUGURL([https://gcc.gnu.org/bugs/])

 # Sanity check enable_languages in case someone does not run the toplevel
Index: gcc/cppbuiltin.c
===================================================================
--- gcc/cppbuiltin.c	(revision 263464)
+++ gcc/cppbuiltin.c	(working copy)
@@ -53,18 +53,41 @@
   *patchlevel = s_patchlevel;
 }

+/* Parse a LINAROVER version string of the format
+   "M.m-year.month[-spin][~dev]" to create Linaro release number YYYYMM
+   and spin version.  */
+static void
+parse_linarover (int *release, int *spin)
+{
+  static int s_year = -1, s_month, s_spin;
+
+  if (s_year == -1)
+    if (sscanf (LINAROVER, "%*[^-]-%d.%d-%d", &s_year, &s_month, &s_spin) != 3)
+      {
+	sscanf (LINAROVER, "%*[^-]-%d.%d", &s_year, &s_month);
+	s_spin = 0;
+      }
+
+  if (release)
+    *release = s_year * 100 + s_month;
+
+  if (spin)
+    *spin = s_spin;
+}
+
 /* Define __GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__ and __VERSION__.  */
 static void
 define__GNUC__ (cpp_reader *pfile)
 {
-  int major, minor, patchlevel;
+  int major, minor, patchlevel, linaro_release, linaro_spin;

   parse_basever (&major, &minor, &patchlevel);
+  parse_linarover (&linaro_release,
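The sscanf logic in parse_linarover can be exercised on its own. Here is a minimal sketch of the same parsing (the name `parse_linaro` and its signature are made up for the illustration; the real function caches its result in statics and reads the LINAROVER macro): the first sscanf only matches three fields when a "-spin" component is present, otherwise the fallback call parses year and month and the spin defaults to 0.

```cpp
#include <cstdio>

// Sketch of the parse_linarover parsing from the patch, taking the
// version string as a parameter instead of the LINAROVER macro.
// "8.2-2018.08-1~dev" -> release 201808, spin 1
// "8.2-2018.08~dev"   -> release 201808, spin 0 (first sscanf matches < 3)
static void parse_linaro(const char* ver, int* release, int* spin)
{
  int year = 0, month = 0, s = 0;
  // %*[^-] skips the "M.m" prefix up to the first '-'.
  if (std::sscanf(ver, "%*[^-]-%d.%d-%d", &year, &month, &s) != 3)
    {
      std::sscanf(ver, "%*[^-]-%d.%d", &year, &month);
      s = 0;
    }
  *release = year * 100 + month;
  *spin = s;
}
```

Note the encoding choice: year * 100 + month packs "2018.08" into the single integer 201808 exposed as __LINARO_RELEASE__.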
Re: Improve safe iterator move semantic
On 09/08/18 20:41 +0200, François Dumont wrote:
> Here is a patch to improve Debug mode safe iterator move semantic.
>
> To summarize where we used to have N mutex locks we now have N - 1.
> For instance move constructor used to lock mutex twice, now it only
> does it once.
>
> Note that move constructor or move assignment operator are currently
> more expensive than their copy counterparts !

Yes, because they have to mutate two objects, not just one.

We could just remove the move operations, and so all "moves" would just do copies instead. That would not allow Debug Mode to assert on operations on moved-from iterators. Do we care about that? Do any of our (non-debug) container iterators actually become invalid after a move, or are they all safe to keep using?

> +#if __cplusplus >= 201103L
> +    _Safe_iterator_base(_Safe_iterator_base&&) = default;
> +
> +    _Safe_iterator_base&
> +    operator=(_Safe_iterator_base&&) = default;
> +#endif

This is very counterintuitive. The type has pointer members, so moving doesn't set them to null, it's just a copy. In the derived moves we do an explicit _M_reset() later, to make it more "move-like", but the base operation is still a copy. We would need **at least** a comment on the defaulted move operations saying "these just copy the members, but then the derived class clears them after adjusting pointers for the sequence's other iterators".

But there's another counterintuitive problem: in C++98 mode _Safe_base has an implicit (public) copy constructor and copy assignment operator, but in C++11 mode those are suppressed by the (protected) move constructor and move assignment operator. Either we should declare private+undefined copy ops in C++98 mode (to make them consistently unavailable), or just change the defaulted moves into defaulted copies (since that's what they do anyway). Or alternatively, don't define a move constructor and move assignment operator at all, and delete copies and moves.

The derived _Safe_iterator move ctor and move assignment can both use the same function _M_move to do the moving and the resetting all at once. See the attached patch (relative to yours) to see what I mean.

BUT, I think the whole concept of reducing the number of mutex locks is flawed. See below.

> @@ -148,11 +161,12 @@ namespace __gnu_debug
>      /** Reset all member variables */
>      void
> -    _M_reset() throw ();
> +    _M_reset() _GLIBCXX_USE_NOEXCEPT
> +    { __builtin_memset(this, 0, sizeof(_Safe_iterator_base)); }

This is undefined behaviour, because _Safe_iterator_base is not trivially copyable. Adding a static assertion shows:

In file included from /home/jwakely/src/gcc/gcc/libstdc++-v3/src/c++11/debug.cc:29:
/home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_base.h:179:17: error: static assertion failed
 static_assert(std::is_trivially_copyable<_Safe_iterator_base>::value, "");
               ^~~

> +  inline void
> +  _Safe_local_iterator_base::_M_move_to(_Safe_iterator_base* __x,
> +					bool __constant)
> +  {
> +    __gnu_cxx::__scoped_lock __l(this->_M_get_mutex());
> +    if (_M_prior)
> +      _M_prior->_M_next = this;
> +    else
> +      {
> +	// No prior, we are at the beginning of the linked list.
> +	auto& __its = __constant
> +	  ? _M_get_container()->_M_const_local_iterators
> +	  : _M_get_container()->_M_local_iterators;
> +	if (__its == __x)
> +	  __its = this;
> +      }
> +
> +    if (_M_next)
> +      _M_next->_M_prior = this;
> +  }

This is not thread-safe. What if another thread is modifying *__x._M_prior at the same time? It will lock __x._M_prior->_M_get_mutex(), and then try to update *__x._M_prior->_M_next, which is __x. But nothing locks __x._M_get_mutex() to make that safe.

This valid program shows data races with -fsanitize=thread after your patch is applied:

#define _GLIBCXX_DEBUG
#include <vector>
#include <thread>

void thrash(std::vector<int>::iterator& iter)
{
  for (int i = 0; i < 1000; ++i)
  {
    auto jiter = std::move(iter);
    iter = std::move(jiter);
    jiter = std::move(iter);
    iter = std::move(jiter);
  }
}

int main()
{
  std::vector<int> v{1, 2, 3};
  auto it1 = v.begin();
  auto it2 = v.begin();
  std::thread t1(thrash, std::ref(it1));
  thrash(it2);
  t1.join();
}

(It also shows data races with my version of the patch attached to this mail, because my patch only refactors yours a bit, it doesn't change how the mutexes are locked).

I think this idea is fundamentally flawed.

diff --git a/libstdc++-v3/include/debug/safe_base.h b/libstdc++-v3/include/debug/safe_base.h
index c276a1883a1..aa6fc76ec42 100644
--- a/libstdc++-v3/include/debug/safe_base.h
+++ b/libstdc++-v3/include/debug/safe_base.h
@@ -97,13 +97,6 @@ namespace __gnu_debug
     : _M_sequence(0), _M_version(0), _M_prior(0), _M_next(0)
     { this->_M_attach(__x._M_sequence, __constant); }

-#if __cplusplus >= 201103L
-    _Safe_iterator_base(_Safe_iterator_base&&) = default;
-
-    _Safe_iterator_base&
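For illustration, the hazard described above (a relink that writes to a neighbouring node while holding only its own lock) can be avoided in a generic setting by acquiring both locks before touching either node. This is only a sketch under assumed per-node mutexes, with made-up names (`Node`, `link_after`, `stress`); it is not the libstdc++ design, which keys its mutexes differently.

```cpp
#include <mutex>
#include <thread>

// Hypothetical per-node locking sketch, not libstdc++ code.
struct Node {
  std::mutex m;
  Node* next = nullptr;
  int hits = 0;
};

// Locks BOTH nodes before writing to either. std::scoped_lock acquires
// the two mutexes deadlock-free even when another thread names them in
// the opposite order -- the guarantee that is missing when the operation
// only holds this->_M_get_mutex().
void link_after(Node& a, Node& b)
{
  std::scoped_lock lock(a.m, b.m);
  a.next = &b;
  ++a.hits;
  ++b.hits;
}

// Two threads hammer the same pair of nodes in opposite orders; with the
// double lock every increment is synchronised, so the counts come out
// exact instead of racing.
bool stress()
{
  Node x, y;
  std::thread t1([&] { for (int i = 0; i < 1000; ++i) link_after(x, y); });
  std::thread t2([&] { for (int i = 0; i < 1000; ++i) link_after(y, x); });
  t1.join();
  t2.join();
  return x.hits == 2000 && y.hits == 2000;
}
```

The cost, of course, is two lock acquisitions per operation, which is exactly the overhead the reviewed patch was trying to shed.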
Re: [C++ Patch] Tweak check_previous_goto_1 to emit hard errors instead of permerrors in some cases
.. an additional clarification (I told you that over the years we changed this code quite a bit...): I originally added the testcase that I'm adjusting here; I did that when, back in 2014, I worked on 63558. The test uses -fpermissive -w and was meant to check, as requested by Manuel in the bug, that we didn't issue further unsuppressible explanatory diagnostics after the primary permerror.

I would argue that that first clean-up back in 2014 was an improvement, but note that, in practice, before it we rejected (with a confusing permerror + error) this kind of broken code, and we don't anymore. That's bad. We don't really emit sensible assembly for it, as I already explained.

Paolo.