Re: [PATCH 1/3][middle-end]PR78809 (Inline strcmp with small constant strings)

2017-11-17 Thread Paolo Carlini
Hi, On 17/11/2017 06:29, Jeff Law wrote: OK. I'll go ahead and commit for you. Beautiful. Thanks Jeff. I think this patch is small enough to not require a copyright assignment. However further work likely will. I don't offhand know if Oracle has a blanket assignment in place. Can you work

[3/7] Split mask checking out of vectorizable_mask_load_store

2017-11-17 Thread Richard Sandiford
This patch splits the mask argument checking out of vectorizable_mask_load_store, so that a later patch can use it in both vectorizable_load and vectorizable_store. It also adds dump messages for false returns. This is mostly useful for the TYPE_VECTOR_SUBPARTS check, which can fail if pattern

[2/7] Make vect_model_store_cost take a vec_load_store_type

2017-11-17 Thread Richard Sandiford
This patch makes vect_model_store_cost take a vec_load_store_type instead of a vect_def_type. It's a wash on its own, but it helps with later patches. Richard 2017-11-17 Richard Sandiford gcc/ * tree-vectorizer.h (vec_load_store_type): Moved from

Re: [PATCH] Use bswap framework in store-merging (PR tree-optimization/78821)

2017-11-17 Thread Thomas Preudhomme
Hi Jakub, On 16/11/17 17:06, Jakub Jelinek wrote: Hi! This patch uses the bswap pass framework inside of the store merging pass to handle adjacent stores which produce together a 16/32/64 bit store of bswapped value (loaded or from SSA_NAME) or identity (usually only from SSA_NAME, the code

Re: [PATCH][PR c++/82888] smarter code for default initialization of scalar arrays

2017-11-17 Thread Richard Biener
On Thu, Nov 16, 2017 at 5:21 PM, Nathan Froyd wrote: > Default-initialization of scalar arrays in C++ member initialization > lists produced rather slow code, laboriously setting each element of the > array to zero. It would be much faster to block-initialize the array, >

[PATCH] Remove some useless work in PRE

2017-11-17 Thread Richard Biener
VN already sees if an expression is fully constant so there's no reason to duplicate that work during PHI translation. I've verified with an assert the paths are indeed unreachable. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2017-11-17 Richard Biener

[6/7] Split gather load handling out of vectorizable_{mask_load_store,load}

2017-11-17 Thread Richard Sandiford
vectorizable_mask_load_store and vectorizable_load used the same code to build a gather load call, except that the former also vectorised a mask argument and used it for both the src and mask inputs. The latter instead used a src input of zero and a mask input of all-ones. This patch splits the
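
The two call shapes described in this snippet can be sketched with a scalar model. This is only an illustration, assuming the usual masked-gather semantics (active lanes load base[index], inactive lanes pass through the src input); VL, gather_model, gather_masked and gather_unconditional are hypothetical names, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define VL 4  /* assumed lane count, chosen for illustration only */

/* Hypothetical scalar model of a masked gather: active lanes load
   base[idx[lane]], inactive lanes pass through src[lane].  */
void gather_model(int out[VL], const int src[VL], const int *base,
                  const size_t idx[VL], const int mask[VL])
{
    for (size_t lane = 0; lane < VL; lane++)
        out[lane] = mask[lane] ? base[idx[lane]] : src[lane];
}

/* Conditional shape: the mask doubles as the src input, so inactive
   lanes end up holding the (zero) mask value.  */
void gather_masked(int out[VL], const int *base, const size_t idx[VL],
                   const int mask[VL])
{
    gather_model(out, mask, base, idx, mask);
}

/* Unconditional shape: src of zero, mask of all-ones.  */
void gather_unconditional(int out[VL], const int *base,
                          const size_t idx[VL])
{
    const int zero_src[VL] = {0, 0, 0, 0};
    const int all_ones[VL] = {1, 1, 1, 1};
    gather_model(out, zero_src, base, idx, all_ones);
}
```

With an all-ones mask the src input never reaches the result, which is why a zero src is a harmless choice in the unconditional case.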

[5/7] Split out gather load mask building

2017-11-17 Thread Richard Sandiford
This patch splits out the code to build an all-bits-one or all-bits-zero input to a gather load. The catch is that both masks can have floating-point type, in which case they are implicitly treated in the same way as an integer bitmask. Richard 2017-11-17 Richard Sandiford

[4/7] Split rhs checking out of vectorizable_{,mask_load_}store

2017-11-17 Thread Richard Sandiford
This patch splits out the rhs checking code that's common to both vectorizable_mask_load_store and vectorizable_store. Richard 2017-11-17 Richard Sandiford gcc/ * tree-vect-stmts.c (vect_check_store_rhs): New function, split out from...

[patch][x86] skylake costs

2017-11-17 Thread Koval, Julia
Hi, this patch introduces separate cost model for skylake-avx512. Ok for trunk? gcc/ * config/i386/i386.c (processor_target_table): Add skylake_cost for skylake-avx512. * config/i386/x86-tune-costs.h (skylake_memcpy, skylake_memset, skylake_cost): New. Thanks,

Re: [patch][x86] skylake costs

2017-11-17 Thread Uros Bizjak
On Fri, Nov 17, 2017 at 10:18 AM, Koval, Julia wrote: > Hi, this patch introduces separate cost model for skylake-avx512. Ok for > trunk? > > gcc/ > * config/i386/i386.c (processor_target_table): Add skylake_cost for > skylake-avx512. > *

[PATCH] Ability to remap file names in __FILE__, etc (PR other/70268)

2017-11-17 Thread Boris Kolpackov
The below patch adds the -fmacro-prefix-map option that allows remapping of file names in __FILE__, __BASE_FILE__, and __builtin_FILE(), similar to how -fdebug-prefix-map does the same for debug information. Additionally, the patch adds -ffile-prefix-map which can be used to specify both

Add support for bitwise reductions

2017-11-17 Thread Richard Sandiford
This patch adds support for the SVE bitwise reduction instructions (ANDV, ORV and EORV). It's a fairly mechanical extension of existing REDUC_* operators. Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu and powerpc64le-linux-gnu. Richard 2017-11-17 Richard Sandiford

Re: [PATCH] Fix PR83017 (fortran part)

2017-11-17 Thread Janne Blomqvist
On Fri, Nov 17, 2017 at 11:13 AM, Richard Biener wrote: > This patch changes the Fortran frontend to annotate DO CONCURRENT > with parallel instead of ivdep. > > The patch is not enough to enable a runtime benefit because of > some autopar costing issues but for other cases it

Re: [PATCH, AArch64] Adjust tuning parameters for Falkor

2017-11-17 Thread Kyrill Tkachov
Hi Luis, [cc'ing aarch64 maintainers, it's quicker to get review that way] On 15/11/17 03:00, Luis Machado wrote: > I think the best thing is to leave this tuning structure in place and > just change default_opt_level to -1 to disable it at -O3. > > Thanks, > Andrew > Indeed that seems to

Re: Add support for masked load/store_lanes

2017-11-17 Thread Richard Sandiford
Richard Sandiford writes: > This patch adds support for vectorising groups of IFN_MASK_LOADs > and IFN_MASK_STOREs using conditional load/store-lanes instructions. > This requires new internal functions to represent the result > (IFN_MASK_{LOAD,STORE}_LANES), as well

Re: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]

2017-11-17 Thread Richard Biener
On Thu, 16 Nov 2017, Tamar Christina wrote: > Hi Richard, > > > > > I'd have made it > > > > if { ([is-effective-target non_strict_align] > > && ! ( [istarget ...] || )) > > > > thus default it to 1 for non-strict-align targets. > > > > Fair, I've switched it to a black list

Re: [PATCH][ARM] Fix test armv8_2-fp16-move-1.c

2017-11-17 Thread Kyrill Tkachov
On 17/11/17 10:45, Sudi Das wrote: Hi Kyrill Thanks I have made the change. Thanks Sudi, I've committed this on your behalf with r254863. Kyrill Sudi From: Kyrill Tkachov Sent: Thursday, November 16, 2017 5:03 PM To: Sudi Das; gcc-patches@gcc.gnu.org Cc:

Re: [PATCH] Disable -ftrapping-math by default

2017-11-17 Thread Richard Biener
On Fri, Nov 17, 2017 at 12:10 AM, Marc Glisse wrote: > On Thu, 16 Nov 2017, Richard Biener wrote: > >> On Thu, Nov 16, 2017 at 3:33 PM, Wilco Dijkstra >> wrote: >>> >>> GCC currently defaults to -ftrapping-math. This is supposed to generate >>> code

[0/7] Fold vectorizable_mask_load_store into vectorizable_load/store

2017-11-17 Thread Richard Sandiford
The vectoriser uses vectorizable_mask_load_store to handle conditional loads and stores (IFN_MASK_LOAD and IFN_MASK_STORE) but uses vectorizable_load and vectorizable_store for unconditional loads and stores. vectorizable_mask_load_store shares a lot of code with the other two routines, and this

[1/7] Move code that stubs out IFN_MASK_LOADs

2017-11-17 Thread Richard Sandiford
vectorizable_mask_load_store replaces scalar IFN_MASK_LOAD calls with dummy assignments, so that they never survive vectorisation. This patch moves the code to vect_transform_loop instead, so that we only change the scalar statements once all of them have been vectorised. This makes it easier to

SLP reductions with variable-length vectors

2017-11-17 Thread Richard Sandiford
Two things stopped us using SLP reductions with variable-length vectors: (1) We didn't have a way of constructing the initial vector. This patch does it by creating a vector full of the neutral identity value and then using a shift-and-insert function to insert any non-identity inputs

Re: [PATCH 7/7]: Enable clobber high for tls descs on Aarch64

2017-11-17 Thread Andrew Pinski
On Fri, Nov 17, 2017 at 12:21 AM, Alan Hayward wrote: > >> On 16 Nov 2017, at 19:32, Andrew Pinski wrote: >> >> On Thu, Nov 16, 2017 at 4:35 AM, Alan Hayward wrote: >>> This final patch adds the clobber high expressions to tls_desc

RFC: libgomp target plugins and atexit

2017-11-17 Thread Tom de Vries
Hi, I wrote a patch that called some function in the common libgomp code from GOMP_OFFLOAD_fini_device, and found that it hung due to the fact that: - gomp_target_fini locks devices[*].lock while calling GOMP_OFFLOAD_fini_device, and - the function call that I added also locked that same

[patch] Add support for #pragma GCC unroll

2017-11-17 Thread Eric Botcazou
Hi, this is a cleaned up and updated revision of Mike's latest posted patch implementing #pragma GCC unroll in the C and C++ compilers. To be honest, we're not so much interested in the front-end bits as in the middle-end bits, because the latter would at last make the Ada version of the

Re: [PATCH] [BRIGFE] Reduce the number of type conversions due to the untyped HSAIL regs

2017-11-17 Thread Rainer Orth
Hi Pekka, > Instead of always representing the HSAIL's untyped registers as > unsigned int, the gccbrig now pre-analyzes the BRIG code and > builds the register variables as a type used the most when storing > or reading data to/from each register. This reduces the total > conversions which

Re: [PATCH] Improve -Wmaybe-uninitialized documentation

2017-11-17 Thread Jonathan Wakely
On 16/11/17 10:59 -0700, Jeff Law wrote: On 11/16/2017 03:49 AM, Jonathan Wakely wrote: On 15/11/17 20:28 -0700, Martin Sebor wrote: On 11/15/2017 07:31 AM, Jonathan Wakely wrote: The docs for -Wmaybe-uninitialized have some issues: - That first sentence is looong. - Apparently some C++ 

[PATCH] Fix PR83017 (fortran part)

2017-11-17 Thread Richard Biener
This adds a new ANNOTATE_EXPR kind, annot_expr_parallel_kind, which is stronger than ivdep which maps semantically to safelen=INT_MAX which alone doesn't tell us enough to auto-parallelize anything. annot_expr_parallel_kind can map to the already existing loop flag can_be_parallel which can be

RE: [PATCH] [ARC] update GLIBC_DYNAMIC_LINKER per glibc upstreaming review comments

2017-11-17 Thread Claudiu Zissulescu
Hi, > gcc/ > * config/arc/linux.h: GLIBC_DYNAMIC_LINKER update per glibc > upstreaming review comments > Accepted and committed. Thank you for your contribution, Claudiu

Re: [PATCH] Improve -Wmaybe-uninitialized documentation

2017-11-17 Thread Jonathan Wakely
On 16/11/17 09:18 -0700, Martin Sebor wrote: On 11/16/2017 03:49 AM, Jonathan Wakely wrote: On 15/11/17 20:28 -0700, Martin Sebor wrote: On 11/15/2017 07:31 AM, Jonathan Wakely wrote: The docs for -Wmaybe-uninitialized have some issues: - That first sentence is looong. - Apparently some

Re: [PATCH 7/7]: Enable clobber high for tls descs on Aarch64

2017-11-17 Thread Alan Hayward
> On 16 Nov 2017, at 19:32, Andrew Pinski wrote: > > On Thu, Nov 16, 2017 at 4:35 AM, Alan Hayward wrote: >> This final patch adds the clobber high expressions to tls_desc for aarch64. >> It also adds three tests. >> >> In addition I also tested by

[PATCH, libgomp, openacc] Use GOMP_ASYNC_SYNC in GOACC_declare

2017-11-17 Thread Tom de Vries
Hi, GOACC_enter_exit_data has this prototype: ... void GOACC_enter_exit_data (int device, size_t mapnum, void **hostaddrs, size_t *sizes, unsigned short *kinds, int async, int num_waits, ...) ... And GOACC_declare calls

[7/7] Make vectorizable_load/store handle IFN_MASK_LOAD/STORE

2017-11-17 Thread Richard Sandiford
After the previous patches, it's easier to see that the remaining inlined transform code in vectorizable_mask_load_store is just a cut-down version of the VMAT_CONTIGUOUS handling in vectorizable_load and vectorizable_store. This patch therefore makes those functions handle masked loads and

Re: [PATCH][ARM] Fix test armv8_2-fp16-move-1.c

2017-11-17 Thread Sudi Das
Hi Kyrill Thanks I have made the change. Sudi From: Kyrill Tkachov Sent: Thursday, November 16, 2017 5:03 PM To: Sudi Das; gcc-patches@gcc.gnu.org Cc: nd; Ramana Radhakrishnan; Richard Earnshaw Subject: Re: [PATCH][ARM] Fix test armv8_2-fp16-move-1.c   Hi

Re: [RFTesting] New POINTER_DIFF_EXPR

2017-11-17 Thread Richard Biener
On Sat, Nov 11, 2017 at 12:44 AM, Marc Glisse wrote: > Adding some random cc: to people who might be affected. Hopefully I am not > breaking any of your stuff... > > Ulrich Weigand (address space) > Ilya Enkovich (pointer bound check) > DJ Delorie (target with 24-bit partial

Re: [PATCH] Fix PR83017 (fortran part)

2017-11-17 Thread Richard Biener
On Fri, 17 Nov 2017, Janne Blomqvist wrote: > On Fri, Nov 17, 2017 at 3:03 PM, Richard Biener wrote: > > On Fri, 17 Nov 2017, Janne Blomqvist wrote: > > > >> On Fri, Nov 17, 2017 at 11:13 AM, Richard Biener wrote: > >> > This patch changes the Fortran

Re: [PATCH 1/3][middle-end]PR78809 (Inline strcmp with small constant strings)

2017-11-17 Thread Qing Zhao
Thanks Jeff and Paolo. I really appreciate all the help so far. Qing > On Nov 17, 2017, at 3:17 AM, Paolo Carlini wrote: > > Hi, > > On 17/11/2017 06:29, Jeff Law wrote: >> OK. I'll go ahead and commit for you. > Beautiful. Thanks Jeff. >> I think this patch is

Add support for vectorising live-out values using SVE LASTB

2017-11-17 Thread Richard Sandiford
This patch uses the SVE LASTB instruction to optimise cases in which a value produced by the final scalar iteration of a vectorised loop is live outside the loop. Previously this situation would stop us from using a fully-masked loop. Tested on aarch64-linux-gnu (with and without SVE),

Allow single-element interleaving for non-power-of-2 strides

2017-11-17 Thread Richard Sandiford
This allows LD3 to be used for isolated a[i * 3] accesses, in a similar way to the current a[i * 2] and a[i * 4] for LD2 and LD4 respectively. Given the problems with the cost model underestimating the cost of elementwise accesses, the patch continues to reject the VMAT_ELEMENTWISE cases that are

[PATCH, libgomp, openacc] Factor out async argument utility functions

2017-11-17 Thread Tom de Vries
Hi, I've factored out 3 new functions to test properties of enum acc_async_t: ... typedef enum acc_async_t { /* Keep in sync with include/gomp-constants.h. */ acc_async_noval = -1, acc_async_sync = -2 } acc_async_t; ... In order to understand what this means: ... if (async <

Re: [RFA][PATCH] patch 6/n Refactoring evrp

2017-11-17 Thread Richard Biener
On Fri, Nov 17, 2017 at 8:18 AM, Jeff Law wrote: > > As I've stated several times one of the goals here is to provide a > little range analysis module that we can embed & reuse. > > To accomplish that I need to break down the evrp class. > > This patch does the bulk of the real

[PATCH GCC]Support load in CT_STORE_STORE chain if dominated by store in the same loop iteration

2017-11-17 Thread Bin Cheng
Hi, I previously introduced CT_STORE_STORE chains in predcom. This patch further supports load reference in CT_STORE_STORE chain if the load is dominated by a store reference in the same loop iteration. So example as in added test case: for (i = 0; i < len; i++) { a[i] = t1;

Re: [PATCH][PR c++/82888] smarter code for default initialization of scalar arrays

2017-11-17 Thread Jason Merrill
On Thu, Nov 16, 2017 at 11:21 AM, Nathan Froyd wrote: > Default-initialization of scalar arrays in C++ member initialization > lists produced rather slow code, laboriously setting each element of the > array to zero. It would be much faster to block-initialize the array, >

[PATCH] combine: Don't split insns if half is unused (PR82621)

2017-11-17 Thread Segher Boessenkool
If we have a PARALLEL of two SETs, and one half is unused, we currently happily split that into two instructions (albeit the unused one is useless). Worse, as PR82621 shows, combine will happily merge this insn into I3 even if some intervening insn sets the same register again, which is wrong.

Add support for fully-predicated loops

2017-11-17 Thread Richard Sandiford
This patch adds support for using a single fully-predicated loop instead of a vector loop and a scalar tail. An SVE WHILELO instruction generates the predicate for each iteration of the loop, given the current scalar iv value and the loop bound. This operation is wrapped up in a new internal
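
As a rough scalar sketch of the idea in this snippet: each iteration computes a predicate from the current scalar iv value and the loop bound, and only active lanes do work, so no scalar tail is needed. The lane count VL and the names whilelo_model and add_predicated are assumptions made for illustration; the model also ignores the index-overflow cases a real implementation has to consider.

```c
#include <assert.h>
#include <stddef.h>

#define VL 4  /* assumed lane count; real SVE vector lengths vary */

/* Hypothetical model of the predicate: lane i is active iff
   iv + i < bound.  */
void whilelo_model(size_t iv, size_t bound, int pred[VL])
{
    for (size_t lane = 0; lane < VL; lane++)
        pred[lane] = (iv + lane < bound);
}

/* dst[i] = a[i] + b[i] for i < n, using one predicated loop instead
   of a vector loop followed by a scalar tail.  */
void add_predicated(int *dst, const int *a, const int *b, size_t n)
{
    int pred[VL];
    for (size_t iv = 0; iv < n; iv += VL) {
        whilelo_model(iv, n, pred);
        for (size_t lane = 0; lane < VL; lane++)
            if (pred[lane])
                dst[iv + lane] = a[iv + lane] + b[iv + lane];
    }
}
```

For n = 6 the second iteration runs with only its first two lanes active, so elements 6 and 7 are never touched.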

Use single-iteration epilogues when peeling for gaps

2017-11-17 Thread Richard Sandiford
This patch adds support for fully-masking loops that require peeling for gaps. It peels exactly one scalar iteration and uses the masked loop to handle the rest. Previously we would fall back on using a standard unmasked loop instead. Tested on aarch64-linux-gnu (with and without SVE),

Add an "early rematerialisation" pass

2017-11-17 Thread Richard Sandiford
This patch looks for pseudo registers that are live across a call and for which no call-preserved hard registers exist. It then recomputes the pseudos as necessary to ensure that they are no longer live across a call. The comment at the head of the file describes the approach. A new target hook

Re: [patch] Add support for #pragma GCC unroll

2017-11-17 Thread Richard Biener
On Fri, Nov 17, 2017 at 11:23 AM, Eric Botcazou wrote: > Hi, > > this is a cleaned up and updated revision of Mike's latest posted patch > implementing #pragma GCC unroll in the C and C++ compilers. To be honest, > we're not so much interested in the front-end bits as in

[PATCH][AArch64] Set SLOW_BYTE_ACCESS

2017-11-17 Thread Wilco Dijkstra
Contrary to all documentation, SLOW_BYTE_ACCESS simply means accessing bitfields by their declared type, which results in better code generation on practically any target. I'm thinking we should completely remove all trace of SLOW_BYTE_ACCESS from GCC as it's confusing and useless. OK for commit

Re: [PATCH][GCC][ARM] Implement "arch" GCC pragma and "+" attributes [Patch (2/3)]

2017-11-17 Thread Kyrill Tkachov
On 15/11/17 15:59, Tamar Christina wrote: -Original Message- From: Kyrill Tkachov [mailto:kyrylo.tkac...@foss.arm.com] Sent: Wednesday, November 15, 2017 10:11 To: Tamar Christina ; Sandra Loosemore ; gcc-patches@gcc.gnu.org Cc: nd

Re: [PATCH 1/3][middle-end]PR78809 (Inline strcmp with small constant strings)

2017-11-17 Thread Jeff Law
On 11/17/2017 02:17 AM, Paolo Carlini wrote: > Hi, > > On 17/11/2017 06:29, Jeff Law wrote: >> OK. I'll go ahead and commit for you. > Beautiful. Thanks Jeff. >> I think this patch is small enough to not require a copyright >> assignment.  However further work likely will.  I don't offhand know

Re: [PATCH] [BRIGFE] Reduce the number of type conversions due to the untyped HSAIL regs

2017-11-17 Thread Pekka Jääskeläinen
Hi Rainer, On Fri, Nov 17, 2017 at 1:32 PM, Rainer Orth wrote: > Please fix. Fixed in r254870. BR, Pekka

[PATCH] rs6000: Fix for altivec-macros.c

2017-11-17 Thread Segher Boessenkool
This fixes the altivec-macros.c testcase; we now need to explicitly say "no column number" for messages without one. Tested on powerpc64-linux {-m32,-m64}; committing to trunk. Segher 2017-11-17 Segher Boessenkool gcc/testsuite/ *

Re: [RFA][PATCH] patch 5/n Cleaning up evrp

2017-11-17 Thread Pedro Alves
On 11/17/2017 04:49 AM, Jeff Law wrote: > + /* We do not allow copying this object or initializing one from another. > */ > + evrp_dom_walker (const evrp_dom_walker &); > + evrp_dom_walker& operator= (const evrp_dom_walker &); > + Note you can use include/ansidecl.h's

[PATCH][AArch64] Remove remaining uses of * in patterns

2017-11-17 Thread Wilco Dijkstra
Remove the remaining uses of '*' from aarch64.md. Using '*' in alternatives is typically incorrect as it tells the register allocator to ignore those alternatives. Also add a missing '?' so we prefer a floating point register for same-size int<->fp conversions. Passes regress & bootstrap, OK for

Add an empty_mask_is_expensive hook

2017-11-17 Thread Richard Sandiford
This patch adds a hook to control whether we avoid executing masked (predicated) stores when the mask is all false. We don't want to do that by default for SVE. Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu and powerpc64le-linux-gnu. OK to install? Richard 2017-11-17

Add support for conditional reductions using SVE CLASTB

2017-11-17 Thread Richard Sandiford
This patch uses SVE CLASTB to optimise conditional reductions. It means that we no longer need to maintain a separate index vector to record the most recent valid value, and no longer need to worry about overflow cases. Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu and

Re: [PATCH][AArch64] Set SLOW_BYTE_ACCESS

2017-11-17 Thread James Greenhalgh
On Fri, Nov 17, 2017 at 03:21:31PM +, Wilco Dijkstra wrote: > Contrary to all documentation, SLOW_BYTE_ACCESS simply means accessing > bitfields by their declared type, which results in better code generation on > practically > any target. > > I'm thinking we should completely remove all

Rework the legitimize_address_displacement hook

2017-11-17 Thread Richard Sandiford
This patch: - tweaks the handling of legitimize_address_displacement so that it gets called before rather than after the address has been expanded. This means that we're no longer at the mercy of LRA being able to interpret the expanded instructions. - passes the original offset to

Re: [RFA][PATCH] patch 7/n Introduce evrp_range_analyzer class

2017-11-17 Thread Richard Biener
On Fri, Nov 17, 2017 at 8:41 AM, Jeff Law wrote: > This patch introduces the evrp_range_analyzer class. This is the class > we're going to be able to embed into existing dominator walkers to > provide them with context sensitive range analysis. > > The bulk of the class is

Add unroll-and-jam pass v2

2017-11-17 Thread Michael Matz
Hi, so I've dusted off and improved the implementation of unroll-and-jam from last year. The changes relative to last submission are: * corrected feasibility of the transform (i.e. that dependency directions are correctly retained, the last submission was wrong). * added profitability

Re: [RFTesting] New POINTER_DIFF_EXPR

2017-11-17 Thread Jason Merrill
On Fri, Nov 17, 2017 at 7:56 AM, Richard Biener wrote: > On Sat, Nov 11, 2017 at 12:44 AM, Marc Glisse wrote: >> Adding some random cc: to people who might be affected. Hopefully I am not >> breaking any of your stuff... >> >> Ulrich Weigand

[C++ Patch, V2] PR 82593 ("Internal compiler error: in process_init_constructor_array, at cp/typeck2.c:1294")

2017-11-17 Thread Paolo Carlini
Hi again, I managed to spend much more time on the issue and I'm starting a new thread with a mature - IMHO - proposal: the big thing is the use of the existing check_array_designated_initializer in process_init_constructor_array,  which calls maybe_constant_value, as we want, and covers all

Re: [PATCH, AArch64] Adjust tuning parameters for Falkor

2017-11-17 Thread Luis Machado
On 11/17/2017 07:25 AM, Kyrill Tkachov wrote: Hi Luis, [cc'ing aarch64 maintainers, it's quicker to get review that way] On 15/11/17 03:00, Luis Machado wrote: > I think the best thing is to leave this tuning structure in place and > just change default_opt_level   to -1 to disable it at -O3.

[PATCH Obvious]Remove redundant check on component distance

2017-11-17 Thread Bin Cheng
Hi, This is an obvious patch removing a redundant check on component distance in tree-predcom.c. Bootstrap and test along with next patch. Is it OK? Thanks, bin 2017-11-15 Bin Cheng * tree-predcom.c (add_ref_to_chain): Remove check on distance. From

Re: [PATCH] Fix PR83017 (fortran part)

2017-11-17 Thread Richard Biener
On Fri, 17 Nov 2017, Janne Blomqvist wrote: > On Fri, Nov 17, 2017 at 11:13 AM, Richard Biener wrote: > > This patch changes the Fortran frontend to annotate DO CONCURRENT > > with parallel instead of ivdep. > > > > The patch is not enough to enable a runtime benefit because

[PATCH] PR83017, parloops part

2017-11-17 Thread Richard Biener
This makes the minimum number of iterations per thread a --param instead of a magic define and handles loop->can_be_parallel independent of whether flag_loop_parallelize_all was enabled (and thus also handle loops our own dependence analysis can analyze but graphites could not). It also adjusts

Re: [PATCH] Fix PR83017 (fortran part)

2017-11-17 Thread Janne Blomqvist
On Fri, Nov 17, 2017 at 3:03 PM, Richard Biener wrote: > On Fri, 17 Nov 2017, Janne Blomqvist wrote: > >> On Fri, Nov 17, 2017 at 11:13 AM, Richard Biener wrote: >> > This patch changes the Fortran frontend to annotate DO CONCURRENT >> > with parallel

Re: [RFA][PATCH] patch 4/n Refactor bits of vrp_visit_assignment_or_call -- correct patch attached this time

2017-11-17 Thread Richard Biener
On Fri, Nov 17, 2017 at 5:17 AM, Jeff Law wrote: > No nyquil tonight, so the proper patch is attached this time... > > -- > > > > So the next group of changes is focused on breaking down evrp into an > analysis engine and the actual optimization pass. The analysis engine > can

Re: [PATCH] Add _Float/_FloatX rounding built-ins & improve gimple optimization of _Float/_FloatX built-in functions

2017-11-17 Thread Segher Boessenkool
Hi! On Fri, Nov 17, 2017 at 12:04:45AM -0500, Michael Meissner wrote: > This patch is an enhancement of a previous patch that never got approved. > https://gcc.gnu.org/ml/gcc-patches/2017-10/threads.html#02124 > > In the original patch, I added support to the machine independent > infrastructure

Make ivopts handle calls to internal functions

2017-11-17 Thread Richard Sandiford
ivopts previously treated pointer arguments to internal functions like IFN_MASK_LOAD and IFN_MASK_STORE as normal gimple values. This patch makes it treat them as addresses instead. This makes a significant difference to the code quality for SVE loops, since we can then use loads and stores with

Handle peeling for alignment with masking

2017-11-17 Thread Richard Sandiford
This patch adds support for aligning vectors by using a partial first iteration. E.g. if the start pointer is 3 elements beyond an aligned address, the first iteration will have a mask in which the first three elements are false. On SVE, the optimisation is only useful for vector-length-specific
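
The example in this snippet (a start pointer 3 elements past an aligned address) can be modelled in scalar code. The sketch below uses buffer indices as a stand-in for addresses and assumes an 8-lane vector; VL, first_iteration_mask and increment_range are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

#define VL 8  /* assumed lane count, used here as the alignment unit */

/* Mask for the partial first iteration: the first `misalign` lanes
   are false, the rest are true.  */
void first_iteration_mask(size_t misalign, int mask[VL])
{
    for (size_t lane = 0; lane < VL; lane++)
        mask[lane] = (lane >= misalign);
}

/* Add 1 to buf[start..end-1].  Every "vector" iteration starts on a
   multiple of VL; lanes before `start` or at/after `end` are masked
   off instead of being peeled into separate scalar loops.  */
void increment_range(int *buf, size_t start, size_t end)
{
    size_t misalign = start % VL;
    for (size_t iv = start - misalign; iv < end; iv += VL)
        for (size_t lane = 0; lane < VL; lane++) {
            size_t i = iv + lane;
            int active = (i >= start) && (i < end);
            if (active)
                buf[i] += 1;
        }
}
```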

Re: [PATCH, GCC/ARM] Do no clobber r4 in Armv8-M nonsecure call

2017-11-17 Thread Kyrill Tkachov
Hi Thomas, On 15/11/17 17:14, Thomas Preudhomme wrote: Hi, Expanders for Armv8-M nonsecure call unnecessarily clobber r4 despite the libcall they perform not writing to r4. Furthermore, the requirement for the branch target address to be in r4 as expected by the libcall is modeled in a

Re: [PATCH 7/7]: Enable clobber high for tls descs on Aarch64

2017-11-17 Thread Szabolcs Nagy
On 17/11/17 08:42, Andrew Pinski wrote: > On Fri, Nov 17, 2017 at 12:21 AM, Alan Hayward wrote: >> >>> On 16 Nov 2017, at 19:32, Andrew Pinski wrote: >>> >>> On Thu, Nov 16, 2017 at 4:35 AM, Alan Hayward wrote: This final patch

Re: [PATCH, AArch64] Adjust tuning parameters for Falkor

2017-11-17 Thread James Greenhalgh
On Wed, Nov 15, 2017 at 03:00:53AM +, Luis Machado wrote: > > I think the best thing is to leave this tuning structure in place and > > just change default_opt_level to -1 to disable it at -O3. > > > > Thanks, > > Andrew > > > > Indeed that seems to be more appropriate if re-enabling

Re: [PR tree-optimization/83022] malloc/memset->calloc too aggressive

2017-11-17 Thread Jeff Law
On 11/17/2017 09:07 AM, Nathan Sidwell wrote: > We currently optimize a malloc/memset pair into a calloc call (when the > values match, of course).  This turns out to be a pessimization for > mysql 5.6, where the allocator looks like: > > void *ptr = malloc (size); > if (ptr && other_condition) >
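
For readers unfamiliar with the transformation under discussion, the basic malloc/memset pair and the calloc call it can be folded into look roughly like this. The sketch covers only the simple null-checked form; the extra condition in the quoted allocator is truncated in this listing and is deliberately not reconstructed. The function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: the malloc/memset pair as written in source.  */
void *zeroed_with_memset(size_t size)
{
    void *ptr = malloc(size);
    if (ptr)
        memset(ptr, 0, size);
    return ptr;
}

/* Hypothetical helper: the single call the pair can be folded into.  */
void *zeroed_with_calloc(size_t size)
{
    return calloc(1, size);
}

/* Returns 1 if the first n bytes at p are all zero, 0 otherwise.  */
int all_zero(const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}
```

Both helpers hand back zero-filled memory; the thread is about when replacing the first form with the second stops being a win.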

Re: [1/7] Move code that stubs out IFN_MASK_LOADs

2017-11-17 Thread Jeff Law
On 11/17/2017 02:17 AM, Richard Sandiford wrote: > vectorizable_mask_load_store replaces scalar IFN_MASK_LOAD calls with > dummy assignments, so that they never survive vectorisation. This patch > moves the code to vect_transform_loop instead, so that we only change > the scalar statements once

Re: [PR tree-optimization/83022] malloc/memset->calloc too aggressive

2017-11-17 Thread Jeff Law
On 11/17/2017 11:57 AM, Nathan Sidwell wrote: > On 11/17/2017 01:37 PM, Jeff Law wrote: > >> ISTM the better way to drive this is to query the branch probabilities. >> It'd probably be simpler too.  Is there some reason that's not a good >> solution? > > (a) I'd have to learn how to do that Yea,

[PATCH] handle non-constant offsets in -Wstringop-overflow (PR 77608)

2017-11-17 Thread Martin Sebor
The attached patch enhances -Wstringop-overflow to detect more instances of buffer overflow at compile time by handling non- constant offsets into the destination object that are known to be in some range. The solution could be improved by handling even more cases (e.g., anti-ranges or offsets

Silence overactive assert in ipa-inline

2017-11-17 Thread Jan Hubicka
Hi, with frequencies not being capped by 100 it is easy to run into roundoff errors that are more than 1. Maybe I will need to give up on this assert (which would be a pity as it is useful) but for now I just made it a bit more tolerant. Bootstrapped/regtested x86_64-linux. Honza *

Disturb profile less in tree-tailcall

2017-11-17 Thread Jan Hubicka
Hi, with tail recursion and accumulation it is quite a common case that the profile is unrealistic and the recursive call is triggered more often than the entry block. This patch prevents tailcall from dropping entry block profile to 0 (and making it very cold) in this case. Bootstrapped/regtested

Re: [PATCH] MicroBlaze use default ident output generation

2017-11-17 Thread Michael Eager
On 11/15/2017 10:58 PM, Nathan Rossi wrote: Remove the MicroBlaze specific TARGET_ASM_OUTPUT_IDENT definition, and use the default. This resolves issues associated with the use of the .sdata2 operation in cases where emitted assembly after the ident output is incorrectly in the .sdata2 section

Re: [4/7] Split rhs checking out of vectorizable_{,mask_load_}store

2017-11-17 Thread Jeff Law
On 11/17/2017 02:18 AM, Richard Sandiford wrote: > This patch splits out the rhs checking code that's common to both > vectorizable_mask_load_store and vectorizable_store. > > Richard > > > 2017-11-17 Richard Sandiford > > gcc/ > * tree-vect-stmts.c

Re: [PR tree-optimization/83022] malloc/memset->calloc too aggressive

2017-11-17 Thread Nathan Sidwell
On 11/17/2017 01:37 PM, Jeff Law wrote: ISTM the better way to drive this is to query the branch probabilities. It'd probably be simpler too. Is there some reason that's not a good solution? (a) I'd have to learn how to do that (b) in the case where the condition is just a null check,

Re: [7/7] Make vectorizable_load/store handle IFN_MASK_LOAD/STORE

2017-11-17 Thread Jeff Law
On 11/17/2017 02:21 AM, Richard Sandiford wrote: > After the previous patches, it's easier to see that the remaining > inlined transform code in vectorizable_mask_load_store is just a > cut-down version of the VMAT_CONTIGUOUS handling in vectorizable_load > and vectorizable_store. This patch

Re: [Patch, fortran] PR78990 [5/6/7 Regression] ICE when assigning polymorphic array function result

2017-11-17 Thread Paul Richard Thomas
Hi Dominique, Quite suddenly, I am seeing fault too. I don't know what has changed. I'm on to it. Thanks Paul On 15 November 2017 at 11:40, Dominique d'Humières wrote: > Hi Paul, > > Your patch fixes the ICE and pass the tests. However I see > > At line 22 of file

Increase precision of static profiles

2017-11-17 Thread Jan Hubicka
Hi, this patch makes static profile to be in range 0...2^30 rather than 0...1. This is safe now as profile-counts are taking care of possible overflow when the profile ends up accumulating too high after inlining. There are two testcases that need adjusting. dump-2.c simply checks for

Re: [PATCH, AArch64] Adjust tuning parameters for Falkor

2017-11-17 Thread Luis Machado
On 11/17/2017 01:48 PM, James Greenhalgh wrote: On Wed, Nov 15, 2017 at 03:00:53AM +, Luis Machado wrote: I think the best thing is to leave this tuning structure in place and just change default_opt_level to -1 to disable it at -O3. Thanks, Andrew Indeed that seems to be more

Remove unnecessary temporary in tree-if-conv.c

2017-11-17 Thread Richard Sandiford
The call to ifc_temp_var in predicate_mem_writes became redundant in r230099. Before that point the mask was calculated using fold_build_*s, but now it's calculated by gimple_build and so is already a valid gimple value. As it stands, the call forces an SSA_NAME-to-SSA_NAME copy to be created,

Fix x86 vectorization cost wrt unsupported 8bit and 64bit integer ops

2017-11-17 Thread Jan Hubicka
Hi, as discussed on IRC, currently the vectorizer cost model ignores the fact that not all vector operations are supported. In particular when vectorizing byte and 64bit integer loops we quite often end up producing a slower vector sequence by believing that we can use vector operations which do not

Add support for in-order addition reduction using SVE FADDA

2017-11-17 Thread Richard Sandiford
This patch adds support for in-order floating-point addition reductions, which are suitable even in strict IEEE mode. Previously vect_is_simple_reduction would reject any cases that forbid reassociation. The idea is instead to tentatively accept them as "FOLD_LEFT_REDUCTIONs" and only fail later
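
As a rough illustration of why reassociation matters here, the sketch below contrasts a strict left-to-right sum with a sum that uses two partial accumulators, which is roughly the shape a reassociating vectorizer would produce. Both function names are hypothetical and the code is plain scalar C. It does not use SVE or reflect the patch's implementation.

```c
#include <stddef.h>

/* In-order (fold-left) reduction: ((a[0] + a[1]) + a[2]) + ...
   This is the evaluation order that strict IEEE semantics require. */
double fold_left_sum(const double *a, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += a[i];
    return acc;
}

/* Reassociated reduction: even and odd elements are accumulated
   separately and combined at the end.  The additions happen in a
   different order, so the rounding can differ from fold_left_sum. */
double two_lane_sum(const double *a, size_t n)
{
    double lane0 = 0.0, lane1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        lane0 += a[i];
        lane1 += a[i + 1];
    }
    if (i < n)
        lane0 += a[i];   /* leftover element when n is odd */
    return lane0 + lane1;
}
```

For inputs whose partial sums are all exactly representable the two functions agree. For inputs that mix very large and very small magnitudes they may not, which is why a reduction that forbids reassociation has to be performed in order.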

Re: [PATCH] Improve -Wmaybe-uninitialized documentation

2017-11-17 Thread Jeff Law
On 11/17/2017 05:40 AM, Jonathan Wakely wrote: > On 16/11/17 09:18 -0700, Martin Sebor wrote: >> On 11/16/2017 03:49 AM, Jonathan Wakely wrote: >>> On 15/11/17 20:28 -0700, Martin Sebor wrote: On 11/15/2017 07:31 AM, Jonathan Wakely wrote: > The docs for -Wmaybe-uninitialized have some

Re: [PATCH Obvious]Remove redundant check on component distance

2017-11-17 Thread Jeff Law
On 11/17/2017 06:48 AM, Bin Cheng wrote: > Hi, > This is an obvious patch removing redundant check on component distance in > tree-predcom.c Bootstrap and test along with next patch. Is it OK? > > Thanks, > bin > 2017-11-15 Bin Cheng > > * tree-predcom.c

Re: [2/7] Make vect_model_store_cost take a vec_load_store_type

2017-11-17 Thread Jeff Law
On 11/17/2017 02:17 AM, Richard Sandiford wrote: > This patch makes vect_model_store_cost take a vec_load_store_type > instead of a vect_def_type. It's a wash on its own, but it helps > with later patches. > > Richard > > > 2017-11-17 Richard Sandiford > >

Re: [6/7] Split gather load handling out of vectorizable_{mask_load_store,load}

2017-11-17 Thread Jeff Law
On 11/17/2017 02:20 AM, Richard Sandiford wrote: > vectorizable_mask_load_store and vectorizable_load used the same > code to build a gather load call, except that the former also > vectorised a mask argument and used it for both the src and mask > inputs. The latter instead used a src input of

[RFC][PATCH] Remove SLOW_BYTE_ACCESS

2017-11-17 Thread Wilco Dijkstra
Remove SLOW_BYTE_ACCESS given it's confusing, badly named, badly documented and used incorrectly. Although most targets define it as 1, there are several targets which confuse it (based on comments next to it) and set it to 0 since the name obviously implies it should be 0 when byte accesses are

[RFC][PATCH] Change default to -fno-common

2017-11-17 Thread Wilco Dijkstra
GCC currently defaults to -fcommon. This is an optional C feature dating back to early C implementations. On many targets this means global variable accesses have an unnecessary codesize and performance penalty in C code (the same source generates better code when built as C++). Given there

Fix profile updating in ipa-cp and fixup_cfg

2017-11-17 Thread Jan Hubicka
Hi, this patch makes ipa-cp not drop the profile to 0 when cloning across all active paths and makes tree-cfg do the same profile updating as tree-inline. I will factor out the common code as a followup. Bootstrapped/regtested x86_64-linux. * ipa-cp.c (update_profiling_info): Handle conversion

Re: [patch, fortran] Fix PR 83012, rejects-valid regression with contiguous pointer

2017-11-17 Thread Paul Richard Thomas
Hi Thomas, This is OK. Thanks Paul On 17 November 2017 at 17:38, Thomas Koenig wrote: > Hello world, > > the attached patch fixes the PR by looking at the function interface if > one exists. > > Regression-tested. OK for trunk? > > Regards > > Thomas > >

Re: Allow single-element interleaving for non-power-of-2 strides

2017-11-17 Thread Jeff Law
On 11/17/2017 08:33 AM, Richard Sandiford wrote: > This allows LD3 to be used for isolated a[i * 3] accesses, in a similar > way to the current a[i * 2] and a[i * 4] for LD2 and LD4 respectively. > Given the problems with the cost model underestimating the cost of > elementwise accesses, the patch
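
For context, an isolated a[i * 3] access of the kind described above looks like the loop below: only one element out of every group of three is used. This is a minimal scalar sketch with a hypothetical function name. Whether a given compiler actually selects LD3 for it depends on the target and the cost model.

```c
#include <stddef.h>

/* Copy every third element of a into out.  The caller must provide
   at least 3 * (n - 1) + 1 readable elements in a when n > 0.
   (Hypothetical helper, for illustration only.) */
void take_every_third(int *out, const int *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i * 3];
}
```

The a[i * 2] and a[i * 4] cases mentioned in the message have the same shape with the stride changed to 2 or 4.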
