Hi,
On 17/11/2017 06:29, Jeff Law wrote:
OK. I'll go ahead and commit for you.
Beautiful. Thanks Jeff.
I think this patch is small enough to not require a copyright
assignment. However further work likely will. I don't offhand know if
Oracle has a blanket assignment in place. Can you work
This patch splits the mask argument checking out of
vectorizable_mask_load_store, so that a later patch can use it in both
vectorizable_load and vectorizable_store. It also adds dump messages
for false returns. This is mostly useful for the TYPE_VECTOR_SUBPARTS
check, which can fail if pattern
This patch makes vect_model_store_cost take a vec_load_store_type
instead of a vect_def_type. It's a wash on its own, but it helps
with later patches.
Richard
2017-11-17 Richard Sandiford
gcc/
* tree-vectorizer.h (vec_load_store_type): Moved from
Hi Jakub,
On 16/11/17 17:06, Jakub Jelinek wrote:
Hi!
This patch uses the bswap pass framework inside of the store merging
pass to handle adjacent stores which produce together a 16/32/64 bit
store of bswapped value (loaded or from SSA_NAME) or identity (usually
only from SSA_NAME, the code
On Thu, Nov 16, 2017 at 5:21 PM, Nathan Froyd wrote:
> Default-initialization of scalar arrays in C++ member initialization
> lists produced rather slow code, laboriously setting each element of the
> array to zero. It would be much faster to block-initialize the array,
>
VN already sees if an expression is fully constant so there's no reason
to duplicate that work during PHI translation. I've verified with
an assert the paths are indeed unreachable.
Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
Richard.
2017-11-17 Richard Biener
vectorizable_mask_load_store and vectorizable_load used the same
code to build a gather load call, except that the former also
vectorised a mask argument and used it for both the src and mask
inputs. The latter instead used a src input of zero and a mask
input of all-ones.
This patch splits the
This patch splits out the code to build an all-bits-one or all-bits-zero
input to a gather load. The catch is that both masks can have
floating-point type, in which case they are implicitly treated in
the same way as an integer bitmask.
Richard
2017-11-17 Richard Sandiford
This patch splits out the rhs checking code that's common to both
vectorizable_mask_load_store and vectorizable_store.
Richard
2017-11-17 Richard Sandiford
gcc/
* tree-vect-stmts.c (vect_check_store_rhs): New function,
split out from...
Hi, this patch introduces a separate cost model for skylake-avx512. Ok for trunk?
gcc/
* config/i386/i386.c (processor_target_table): Add skylake_cost for
skylake-avx512.
* config/i386/x86-tune-costs.h (skylake_memcpy, skylake_memset,
skylake_cost): New.
Thanks,
On Fri, Nov 17, 2017 at 10:18 AM, Koval, Julia wrote:
> Hi, this patch introduces a separate cost model for skylake-avx512. Ok for
> trunk?
>
> gcc/
> * config/i386/i386.c (processor_target_table): Add skylake_cost for
> skylake-avx512.
> *
The below patch adds the -fmacro-prefix-map option that allows remapping
of file names in __FILE__, __BASE_FILE__, and __builtin_FILE(), similar
to how -fdebug-prefix-map allows doing the same for debug information.
Additionally, the patch adds -ffile-prefix-map which can be used to
specify both
This patch adds support for the SVE bitwise reduction instructions
(ANDV, ORV and EORV). It's a fairly mechanical extension of existing
REDUC_* operators.
Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
and powerpc64le-linux-gnu.
Richard
2017-11-17 Richard Sandiford
On Fri, Nov 17, 2017 at 11:13 AM, Richard Biener wrote:
> This patch changes the Fortran frontend to annotate DO CONCURRENT
> with parallel instead of ivdep.
>
> The patch is not enough to enable a runtime benefit because of
> some autopar costing issues but for other cases it
Hi Luis,
[cc'ing aarch64 maintainers, it's quicker to get review that way]
On 15/11/17 03:00, Luis Machado wrote:
> I think the best thing is to leave this tuning structure in place and
> just change default_opt_level to -1 to disable it at -O3.
>
> Thanks,
> Andrew
>
Indeed that seems to
Richard Sandiford writes:
> This patch adds support for vectorising groups of IFN_MASK_LOADs
> and IFN_MASK_STOREs using conditional load/store-lanes instructions.
> This requires new internal functions to represent the result
> (IFN_MASK_{LOAD,STORE}_LANES), as well
On Thu, 16 Nov 2017, Tamar Christina wrote:
> Hi Richard,
>
> >
> > I'd have made it
> >
> > if { ([is-effective-target non_strict_align]
> > && ! ( [istarget ...] || ))
> >
> > thus default it to 1 for non-strict-align targets.
> >
>
> Fair, I've switched it to a black list
On 17/11/17 10:45, Sudi Das wrote:
Hi Kyrill
Thanks I have made the change.
Thanks Sudi, I've committed this on your behalf with r254863.
Kyrill
Sudi
From: Kyrill Tkachov
Sent: Thursday, November 16, 2017 5:03 PM
To: Sudi Das; gcc-patches@gcc.gnu.org
Cc:
On Fri, Nov 17, 2017 at 12:10 AM, Marc Glisse wrote:
> On Thu, 16 Nov 2017, Richard Biener wrote:
>
>> On Thu, Nov 16, 2017 at 3:33 PM, Wilco Dijkstra
>> wrote:
>>>
>>> GCC currently defaults to -ftrapping-math. This is supposed to generate
>>> code
The vectoriser uses vectorizable_mask_load_store to handle conditional
loads and stores (IFN_MASK_LOAD and IFN_MASK_STORE) but uses
vectorizable_load and vectorizable_store for unconditional loads
and stores. vectorizable_mask_load_store shares a lot of code
with the other two routines, and this
vectorizable_mask_load_store replaces scalar IFN_MASK_LOAD calls with
dummy assignments, so that they never survive vectorisation. This patch
moves the code to vect_transform_loop instead, so that we only change
the scalar statements once all of them have been vectorised.
This makes it easier to
Two things stopped us using SLP reductions with variable-length vectors:
(1) We didn't have a way of constructing the initial vector.
This patch does it by creating a vector full of the neutral
identity value and then using a shift-and-insert function
to insert any non-identity inputs
On Fri, Nov 17, 2017 at 12:21 AM, Alan Hayward wrote:
>
>> On 16 Nov 2017, at 19:32, Andrew Pinski wrote:
>>
>> On Thu, Nov 16, 2017 at 4:35 AM, Alan Hayward wrote:
>>> This final patch adds the clobber high expressions to tls_desc
Hi,
I wrote a patch that called some function in the common libgomp code
from GOMP_OFFLOAD_fini_device, and found that it hung due to the fact that:
- gomp_target_fini locks devices[*].lock while calling
GOMP_OFFLOAD_fini_device, and
- the function call that I added also locked that same
Hi,
this is a cleaned up and updated revision of Mike's latest posted patch
implementing #pragma GCC unroll in the C and C++ compilers. To be honest,
we're not so much interested in the front-end bits as in the middle-end bits,
because the latter would at last make the Ada version of the
Hi Pekka,
> Instead of always representing the HSAIL's untyped registers as
> unsigned int, the gccbrig now pre-analyzes the BRIG code and
> builds the register variables as the type used most when storing
> or reading data to/from each register. This reduces the total number of
> conversions which
On 16/11/17 10:59 -0700, Jeff Law wrote:
On 11/16/2017 03:49 AM, Jonathan Wakely wrote:
On 15/11/17 20:28 -0700, Martin Sebor wrote:
On 11/15/2017 07:31 AM, Jonathan Wakely wrote:
The docs for -Wmaybe-uninitialized have some issues:
- That first sentence is looong.
- Apparently some C++
This adds a new ANNOTATE_EXPR kind, annot_expr_parallel_kind, which
is stronger than ivdep which maps semantically to safelen=INT_MAX
which alone doesn't tell us enough to auto-parallelize anything.
annot_expr_parallel_kind can map to the already existing loop
flag can_be_parallel which can be
Hi,
> gcc/
> * config/arc/linux.h: GLIBC_DYNAMIC_LINKER update per glibc
> upstreaming review comments
>
Accepted and committed. Thank you for your contribution,
Claudiu
On 16/11/17 09:18 -0700, Martin Sebor wrote:
On 11/16/2017 03:49 AM, Jonathan Wakely wrote:
On 15/11/17 20:28 -0700, Martin Sebor wrote:
On 11/15/2017 07:31 AM, Jonathan Wakely wrote:
The docs for -Wmaybe-uninitialized have some issues:
- That first sentence is looong.
- Apparently some
> On 16 Nov 2017, at 19:32, Andrew Pinski wrote:
>
> On Thu, Nov 16, 2017 at 4:35 AM, Alan Hayward wrote:
>> This final patch adds the clobber high expressions to tls_desc for aarch64.
>> It also adds three tests.
>>
>> In addition I also tested by
Hi,
GOACC_enter_exit_data has this prototype:
...
void
GOACC_enter_exit_data (int device, size_t mapnum,
void **hostaddrs, size_t *sizes,
unsigned short *kinds,
int async, int num_waits, ...)
...
And GOACC_declare calls
After the previous patches, it's easier to see that the remaining
inlined transform code in vectorizable_mask_load_store is just a
cut-down version of the VMAT_CONTIGUOUS handling in vectorizable_load
and vectorizable_store. This patch therefore makes those functions
handle masked loads and
Hi Kyrill
Thanks I have made the change.
Sudi
From: Kyrill Tkachov
Sent: Thursday, November 16, 2017 5:03 PM
To: Sudi Das; gcc-patches@gcc.gnu.org
Cc: nd; Ramana Radhakrishnan; Richard Earnshaw
Subject: Re: [PATCH][ARM] Fix test armv8_2-fp16-move-1.c
Hi
On Sat, Nov 11, 2017 at 12:44 AM, Marc Glisse wrote:
> Adding some random cc: to people who might be affected. Hopefully I am not
> breaking any of your stuff...
>
> Ulrich Weigand (address space)
> Ilya Enkovich (pointer bound check)
> DJ Delorie (target with 24-bit partial
On Fri, 17 Nov 2017, Janne Blomqvist wrote:
> On Fri, Nov 17, 2017 at 3:03 PM, Richard Biener wrote:
> > On Fri, 17 Nov 2017, Janne Blomqvist wrote:
> >
> >> On Fri, Nov 17, 2017 at 11:13 AM, Richard Biener wrote:
> >> > This patch changes the Fortran
thanks Jeff and Paolo.
really appreciate for all the help so far.
Qing
> On Nov 17, 2017, at 3:17 AM, Paolo Carlini wrote:
>
> Hi,
>
> On 17/11/2017 06:29, Jeff Law wrote:
>> OK. I'll go ahead and commit for you.
> Beautiful. Thanks Jeff.
>> I think this patch is
This patch uses the SVE LASTB instruction to optimise cases in which
a value produced by the final scalar iteration of a vectorised loop is
live outside the loop. Previously this situation would stop us from
using a fully-masked loop.
Tested on aarch64-linux-gnu (with and without SVE),
This allows LD3 to be used for isolated a[i * 3] accesses, in a similar
way to the current a[i * 2] and a[i * 4] for LD2 and LD4 respectively.
Given the problems with the cost model underestimating the cost of
elementwise accesses, the patch continues to reject the VMAT_ELEMENTWISE
cases that are
Hi,
I've factored out 3 new functions to test properties of enum acc_async_t:
...
typedef enum acc_async_t {
/* Keep in sync with include/gomp-constants.h. */
acc_async_noval = -1,
acc_async_sync = -2
} acc_async_t;
...
In order to understand what this means:
...
if (async <
On Fri, Nov 17, 2017 at 8:18 AM, Jeff Law wrote:
>
> As I've stated several times one of the goals here is to provide a
> little range analysis module that we can embed & reuse.
>
> To accomplish that I need to break down the evrp class.
>
> This patch does the bulk of the real
Hi,
I previously introduced CT_STORE_STORE chains in predcom. This patch further
supports a load reference in a CT_STORE_STORE chain if the load is dominated
by a store reference in the same loop iteration. For example, as in the added
test case:
for (i = 0; i < len; i++)
{
a[i] = t1;
On Thu, Nov 16, 2017 at 11:21 AM, Nathan Froyd wrote:
> Default-initialization of scalar arrays in C++ member initialization
> lists produced rather slow code, laboriously setting each element of the
> array to zero. It would be much faster to block-initialize the array,
>
If we have a PARALLEL of two SETs, and one half is unused, we currently
happily split that into two instructions (albeit the unused one is
useless). Worse, as PR82621 shows, combine will happily merge this
insn into I3 even if some intervening insn sets the same register
again, which is wrong.
This patch adds support for using a single fully-predicated loop instead
of a vector loop and a scalar tail. An SVE WHILELO instruction generates
the predicate for each iteration of the loop, given the current scalar
iv value and the loop bound. This operation is wrapped up in a new internal
This patch adds support for fully-masking loops that require peeling
for gaps. It peels exactly one scalar iteration and uses the masked
loop to handle the rest. Previously we would fall back on using a
standard unmasked loop instead.
Tested on aarch64-linux-gnu (with and without SVE),
This patch looks for pseudo registers that are live across a call
and for which no call-preserved hard registers exist. It then
recomputes the pseudos as necessary to ensure that they are no
longer live across a call. The comment at the head of the file
describes the approach.
A new target hook
On Fri, Nov 17, 2017 at 11:23 AM, Eric Botcazou wrote:
> Hi,
>
> this is a cleaned up and updated revision of Mike's latest posted patch
> implementing #pragma GCC unroll in the C and C++ compilers. To be honest,
> we're not so much interested in the front-end bits as in
Contrary to all documentation, SLOW_BYTE_ACCESS simply means accessing
bitfields by their declared type, which results in better code generation on
practically any target.
I'm thinking we should completely remove all trace of SLOW_BYTE_ACCESS
from GCC as it's confusing and useless.
OK for commit
On 15/11/17 15:59, Tamar Christina wrote:
-Original Message-
From: Kyrill Tkachov [mailto:kyrylo.tkac...@foss.arm.com]
Sent: Wednesday, November 15, 2017 10:11
To: Tamar Christina ; Sandra Loosemore
; gcc-patches@gcc.gnu.org
Cc: nd
On 11/17/2017 02:17 AM, Paolo Carlini wrote:
> Hi,
>
> On 17/11/2017 06:29, Jeff Law wrote:
>> OK. I'll go ahead and commit for you.
> Beautiful. Thanks Jeff.
>> I think this patch is small enough to not require a copyright
>> assignment. However further work likely will. I don't offhand know
Hi Rainer,
On Fri, Nov 17, 2017 at 1:32 PM, Rainer Orth
wrote:
> Please fix.
Fixed in r254870.
BR,
Pekka
This fixes the altivec-macros.c testcase; we now need to explicitly
say "no column number" for messages without one.
Tested on powerpc64-linux {-m32,-m64}; committing to trunk.
Segher
2017-11-17 Segher Boessenkool
gcc/testsuite/
*
On 11/17/2017 04:49 AM, Jeff Law wrote:
> + /* We do not allow copying this object or initializing one from another.
> */
> + evrp_dom_walker (const evrp_dom_walker &);
> + evrp_dom_walker& operator= (const evrp_dom_walker &);
> +
Note you can use include/ansidecl.h's
Remove the remaining uses of '*' from aarch64.md.
Using '*' in alternatives is typically incorrect as it tells the register
allocator to ignore those alternatives. Also add a missing '?' so we
prefer a floating point register for same-size int<->fp conversions.
Passes regress & bootstrap, OK for
This patch adds a hook to control whether we avoid executing masked
(predicated) stores when the mask is all false. We don't want to do
that by default for SVE.
Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
and powerpc64le-linux-gnu. OK to install?
Richard
2017-11-17
This patch uses SVE CLASTB to optimise conditional reductions. It means
that we no longer need to maintain a separate index vector to record
the most recent valid value, and no longer need to worry about overflow
cases.
Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
and
On Fri, Nov 17, 2017 at 03:21:31PM +, Wilco Dijkstra wrote:
> Contrary to all documentation, SLOW_BYTE_ACCESS simply means accessing
> bitfields by their declared type, which results in better code generation on
> practically any target.
>
> I'm thinking we should completely remove all
This patch:
- tweaks the handling of legitimize_address_displacement
so that it gets called before rather than after the address has
been expanded. This means that we're no longer at the mercy
of LRA being able to interpret the expanded instructions.
- passes the original offset to
On Fri, Nov 17, 2017 at 8:41 AM, Jeff Law wrote:
> This patch introduces the evrp_range_analyzer class. This is the class
> we're going to be able to embed into existing dominator walkers to
> provide them with context sensitive range analysis.
>
> The bulk of the class is
Hi,
so I've dusted off and improved the implementation of unroll-and-jam from
last year. The changes relative to last submission are:
* corrected feasibility of the transform (i.e. that dependency directions
are correctly retained, the last submission was wrong).
* added profitability
On Fri, Nov 17, 2017 at 7:56 AM, Richard Biener
wrote:
> On Sat, Nov 11, 2017 at 12:44 AM, Marc Glisse wrote:
>> Adding some random cc: to people who might be affected. Hopefully I am not
>> breaking any of your stuff...
>>
>> Ulrich Weigand
Hi again,
I managed to spend much more time on the issue and I'm starting a new
thread with a mature - IMHO - proposal: the big thing is the use of the
existing check_array_designated_initializer in
process_init_constructor_array, which calls maybe_constant_value, as we
want, and covers all
On 11/17/2017 07:25 AM, Kyrill Tkachov wrote:
Hi Luis,
[cc'ing aarch64 maintainers, it's quicker to get review that way]
On 15/11/17 03:00, Luis Machado wrote:
> I think the best thing is to leave this tuning structure in place and
> just change default_opt_level to -1 to disable it at -O3.
Hi,
This is an obvious patch removing a redundant check on component distance in
tree-predcom.c. Bootstrapped and tested along with the next patch. Is it OK?
Thanks,
bin
2017-11-15 Bin Cheng
* tree-predcom.c (add_ref_to_chain): Remove check on distance.
On Fri, 17 Nov 2017, Janne Blomqvist wrote:
> On Fri, Nov 17, 2017 at 11:13 AM, Richard Biener wrote:
> > This patch changes the Fortran frontend to annotate DO CONCURRENT
> > with parallel instead of ivdep.
> >
> > The patch is not enough to enable a runtime benefit because
This makes the minimum number of iterations per thread a --param instead
of a magic define and handles loop->can_be_parallel independent of
whether flag_loop_parallelize_all was enabled (and thus also handle
loops our own dependence analysis can analyze but graphites could not).
It also adjusts
On Fri, Nov 17, 2017 at 3:03 PM, Richard Biener wrote:
> On Fri, 17 Nov 2017, Janne Blomqvist wrote:
>
>> On Fri, Nov 17, 2017 at 11:13 AM, Richard Biener wrote:
>> > This patch changes the Fortran frontend to annotate DO CONCURRENT
>> > with parallel
On Fri, Nov 17, 2017 at 5:17 AM, Jeff Law wrote:
> No nyquil tonight, so the proper patch is attached this time...
>
> --
>
>
>
> So the next group of changes is focused on breaking down evrp into an
> analysis engine and the actual optimization pass. The analysis engine
> can
Hi!
On Fri, Nov 17, 2017 at 12:04:45AM -0500, Michael Meissner wrote:
> This patch is an enhancement of a previous patch that never got approved.
> https://gcc.gnu.org/ml/gcc-patches/2017-10/threads.html#02124
>
> In the original patch, I added support to the machine independent
> infrastructure
ivopts previously treated pointer arguments to internal functions
like IFN_MASK_LOAD and IFN_MASK_STORE as normal gimple values.
This patch makes it treat them as addresses instead. This makes
a significant difference to the code quality for SVE loops,
since we can then use loads and stores with
This patch adds support for aligning vectors by using a partial
first iteration. E.g. if the start pointer is 3 elements beyond
an aligned address, the first iteration will have a mask in which
the first three elements are false.
On SVE, the optimisation is only useful for vector-length-specific
Hi Thomas,
On 15/11/17 17:14, Thomas Preudhomme wrote:
Hi,
Expanders for Armv8-M nonsecure call unnecessarily clobber r4 despite
the libcall they perform not writing to r4. Furthermore, the
requirement for the branch target address to be in r4 as expected by
the libcall is modeled in a
On 17/11/17 08:42, Andrew Pinski wrote:
> On Fri, Nov 17, 2017 at 12:21 AM, Alan Hayward wrote:
>>
>>> On 16 Nov 2017, at 19:32, Andrew Pinski wrote:
>>>
>>> On Thu, Nov 16, 2017 at 4:35 AM, Alan Hayward wrote:
This final patch
On Wed, Nov 15, 2017 at 03:00:53AM +, Luis Machado wrote:
> > I think the best thing is to leave this tuning structure in place and
> > just change default_opt_level to -1 to disable it at -O3.
> >
> > Thanks,
> > Andrew
> >
>
> Indeed that seems to be more appropriate if re-enabling
On 11/17/2017 09:07 AM, Nathan Sidwell wrote:
> We currently optimize a malloc/memset pair into a calloc call (when the
> values match, of course). This turns out to be a pessimization for
> mysql 5.6, where the allocator looks like:
>
> void *ptr = malloc (size);
> if (ptr && other_condition)
>
On 11/17/2017 02:17 AM, Richard Sandiford wrote:
> vectorizable_mask_load_store replaces scalar IFN_MASK_LOAD calls with
> dummy assignments, so that they never survive vectorisation. This patch
> moves the code to vect_transform_loop instead, so that we only change
> the scalar statements once
On 11/17/2017 11:57 AM, Nathan Sidwell wrote:
> On 11/17/2017 01:37 PM, Jeff Law wrote:
>
>> ISTM the better way to drive this is to query the branch probabilities.
>> It'd probably be simpler too. Is there some reason that's not a good
>> solution?
>
> (a) I'd have to learn how to do that
Yea,
The attached patch enhances -Wstringop-overflow to detect more
instances of buffer overflow at compile time by handling non-
constant offsets into the destination object that are known to
be in some range. The solution could be improved by handling
even more cases (e.g., anti-ranges or offsets
Hi,
with frequencies not being capped by 100 it is easy to run into roundoff errors
that are more than 1. Maybe I will need to give up on this assert (which would
be a pity as it is useful) but for now I just made it a bit more tolerant.
Bootstrapped/regtested x86_64-linux.
Honza
*
Hi,
with tail recursion and accumulation it is quite a common case that the profile
is unrealistic and the recursive call is triggered more often than the
entry block. This patch prevents tailcall from dropping the entry block profile
to 0 (and making it very cold) in this case.
Bootstrapped/regtested
On 11/15/2017 10:58 PM, Nathan Rossi wrote:
Remove the MicroBlaze specific TARGET_ASM_OUTPUT_IDENT definition, and
use the default.
This resolves issues associated with the use of the .sdata2 operation in
cases where emitted assembly after the ident output is incorrectly in
the .sdata2 section
On 11/17/2017 02:18 AM, Richard Sandiford wrote:
> This patch splits out the rhs checking code that's common to both
> vectorizable_mask_load_store and vectorizable_store.
>
> Richard
>
>
> 2017-11-17 Richard Sandiford
>
> gcc/
> * tree-vect-stmts.c
On 11/17/2017 01:37 PM, Jeff Law wrote:
ISTM the better way to drive this is to query the branch probabilities.
It'd probably be simpler too. Is there some reason that's not a good
solution?
(a) I'd have to learn how to do that
(b) in the case where the condition is just a null check,
On 11/17/2017 02:21 AM, Richard Sandiford wrote:
> After the previous patches, it's easier to see that the remaining
> inlined transform code in vectorizable_mask_load_store is just a
> cut-down version of the VMAT_CONTIGUOUS handling in vectorizable_load
> and vectorizable_store. This patch
Hi Dominique,
Quite suddenly, I am seeing the fault too. I don't know what has changed.
I'm on to it.
Thanks
Paul
On 15 November 2017 at 11:40, Dominique d'Humières wrote:
> Hi Paul,
>
> Your patch fixes the ICE and pass the tests. However I see
>
> At line 22 of file
Hi,
this patch makes the static profile be in the range 0...2^30 rather than
0...1. This is safe now as profile counts take care of
possible overflow when the profile ends up accumulating too high after
inlining.
There are two testcases that need adjusting. dump-2.c simply checks
for
On 11/17/2017 01:48 PM, James Greenhalgh wrote:
On Wed, Nov 15, 2017 at 03:00:53AM +, Luis Machado wrote:
I think the best thing is to leave this tuning structure in place and
just change default_opt_level to -1 to disable it at -O3.
Thanks,
Andrew
Indeed that seems to be more
The call to ifc_temp_var in predicate_mem_writes became redundant
in r230099. Before that point the mask was calculated using
fold_build_*s, but now it's calculated by gimple_build and so
is already a valid gimple value.
As it stands, the call forces an SSA_NAME-to-SSA_NAME copy
to be created,
Hi,
as discussed on IRC, currently the vectorizer cost model ignores the fact that not
all vector operations are supported. In particular, when vectorizing byte and
64-bit integer loops we quite often end up producing a slower vector sequence by
believing that we can use vector operations which do not
This patch adds support for in-order floating-point addition reductions,
which are suitable even in strict IEEE mode.
Previously vect_is_simple_reduction would reject any cases that forbid
reassociation. The idea is instead to tentatively accept them as
"FOLD_LEFT_REDUCTIONs" and only fail later
On 11/17/2017 05:40 AM, Jonathan Wakely wrote:
> On 16/11/17 09:18 -0700, Martin Sebor wrote:
>> On 11/16/2017 03:49 AM, Jonathan Wakely wrote:
>>> On 15/11/17 20:28 -0700, Martin Sebor wrote:
On 11/15/2017 07:31 AM, Jonathan Wakely wrote:
> The docs for -Wmaybe-uninitialized have some
On 11/17/2017 06:48 AM, Bin Cheng wrote:
> Hi,
> This is an obvious patch removing redundant check on component distance in
> tree-predcom.c Bootstrap and test along with next patch. Is it OK?
>
> Thanks,
> bin
> 2017-11-15 Bin Cheng
>
> * tree-predcom.c
On 11/17/2017 02:17 AM, Richard Sandiford wrote:
> This patch makes vect_model_store_cost take a vec_load_store_type
> instead of a vect_def_type. It's a wash on its own, but it helps
> with later patches.
>
> Richard
>
>
> 2017-11-17 Richard Sandiford
>
>
On 11/17/2017 02:20 AM, Richard Sandiford wrote:
> vectorizable_mask_load_store and vectorizable_load used the same
> code to build a gather load call, except that the former also
> vectorised a mask argument and used it for both the src and mask
> inputs. The latter instead used a src input of
Remove SLOW_BYTE_ACCESS given it's confusing, badly named,
badly documented and used incorrectly. Although most targets
define it as 1, there are several targets which confuse it
(based on comments next to it) and set it to 0 since the name
obviously implies it should be 0 when byte accesses are
GCC currently defaults to -fcommon. This is an optional C feature dating
back to early C implementations. On many targets this means global variable
accesses having an unnecessary codesize and performance penalty in C code
(the same source generates better code when built as C++). Given there
Hi,
this patch makes ipa-cp not drop the profile to 0 when cloning across
all active paths, and tree-cfg do the same profile updating as tree-inline.
I will factor out the common code as a followup.
Bootstrapped/regtested x86_64-linux.
* ipa-cp.c (update_profiling_info): Handle conversion
Hi Thomas,
This is OK.
Thanks
Paul
On 17 November 2017 at 17:38, Thomas Koenig wrote:
> Hello world,
>
> the attached patch fixes the PR by looking at the function interface if
> one exists.
>
> Regression-tested. OK for trunk?
>
> Regards
>
> Thomas
>
>
On 11/17/2017 08:33 AM, Richard Sandiford wrote:
> This allows LD3 to be used for isolated a[i * 3] accesses, in a similar
> way to the current a[i * 2] and a[i * 4] for LD2 and LD4 respectively.
> Given the problems with the cost model underestimating the cost of
> elementwise accesses, the patch