for commit?
ChangeLog:
2014-11-14 Wilco Dijkstra wdijk...@arm.com
* gcc/config/aarch64/aarch64.c (generic_regmove_cost):
Increase FP move cost.
---
gcc/config/aarch64/aarch64.c | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/gcc/config/aarch64/aarch64.c b/gcc
Hi Jiong,
Can you commit this please?
2014-11-19 Wilco Dijkstra wdijk...@arm.com
* gcc/config/aarch64/aarch64.c (generic_regmove_cost):
Increase FP move cost (PR61915).
---
gcc/config/aarch64/aarch64.c | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git
, so I'll leave it at 1 for now. The patch is the same
as last time: it just sets integer to 2, and uses the same settings for all
CPUs.
OK for commit?
ChangeLog:
2014-11-24 Wilco Dijkstra wdijk...@arm.com
* gcc/config/aarch64/aarch64-protos.h (tune_params):
Add reassociation
Here is a new rematerialization sub-pass of LRA.
I've tested and benchmarked the sub-pass on x86-64 and ARM. The
sub-pass generates smaller code on average on both
architectures (although the improvement is not significant), adds 0.4%
additional compilation time in -O2 mode of
Vladimir Makarov wrote:
On SPECINT2k performance is ~0.5% worse (5.5% regression on perlbmk), and
SPECFP is ~0.2% faster.
Thanks for reporting this. It is important for me as I have no aarch64
machine for benchmarking.
Perlbmk performance degradation is too big and I'll definitely look
Wilco Dijkstra wrote:
Vladimir Makarov wrote:
On SPECINT2k performance is ~0.5% worse (5.5% regression on perlbmk), and
SPECFP is ~0.2% faster.
Thanks for reporting this. It is important for me as I have no aarch64
machine for benchmarking.
Perlbmk performance degradation is too
Wilco Dijkstra wdijk...@arm.com
* gcc/config/aarch64/aarch64-protos.h (tune_params):
Add reassociation tuning parameters.
* gcc/config/aarch64/aarch64.c (TARGET_SCHED_REASSOCIATION_WIDTH):
Define. (aarch64_reassociation_width): New function.
(generic_tunings
,
however it is the right thing to do for any constant, including constants in
literal pools (which are
typically not legitimate). Also use ALL_REGS rather than GENERAL_REGS as
ALL_REGS has the correct
floating point register costs.
ChangeLog:
2014-10-29 Wilco Dijkstra wdijk...@arm.com
* gcc
Wilco Dijkstra wdijk...@arm.com
* gcc/ree.c (combine_reaching_defs):
Ensure inserted copy writes a single register.
---
gcc/ree.c | 8 +++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/gcc/ree.c b/gcc/ree.c
index 856745f..9aa1e36 100644
--- a/gcc/ree.c
+++ b/gcc
Hi,
This is a set of patches improving register costs on AArch64. The first fixes
aarch64_register_move_cost() to support CALLER_SAVE_REGS and POINTER_REGS so
costs are calculated
correctly in the register allocator.
ChangeLog:
2014-09-04 Wilco Dijkstra wdijk...@arm.com
* gcc/config
This patch fixes a bug in aarch64_register_move_cost(): GET_MODE_SIZE is in
bytes not bits. As a
result the FP2FP cost doesn't need to be set to 4 to catch the special case for
Q register moves.
ChangeLog:
2014-09-04 Wilco Dijkstra wdijk...@arm.com
* gcc/config/aarch64/aarch64.c
Cleanup inconsistent use of __extension__.
ChangeLog:
2014-09-04 Wilco Dijkstra wdijk...@arm.com
* gcc/config/aarch64/aarch64.c: Cleanup use of __extension__.
---
gcc/config/aarch64/aarch64.c | 38 +++---
1 file changed, 11 insertions(+), 27 deletions
://gcc.gnu.org/ml/gcc-patches/2014-09/msg00356.html).
OK for commit?
Wilco
ChangeLog:
2014-09-04 Wilco Dijkstra wdijk...@arm.com
* gcc/config/aarch64/aarch64.c:
Add cortexa57_regmove_cost and cortexa53_regmove_cost to avoid
spilling from integer to FP registers.
---
gcc
From: Marcus Shawcroft [mailto:marcus.shawcr...@gmail.com]
- NAMED_PARAM (FP2FP, 4)
+ NAMED_PARAM (FP2FP, 2)
This is not directly related to the change below and it is missing
from the ChangeLog. Originally this number had to be 2 in order
for secondary reload to kick in. See the
Thanks! Jakub noticed a potential problem in this area a while back,
but I never came up with any code to trigger and have kept that issue on
my todo list ever since.
Rather than ensuring the inserted copy writes a single register, it seems
to me we're better off ensuring that the number of
Patch attached for commit as I don't have write access.
-Original Message-
From: Marcus Shawcroft [mailto:marcus.shawcr...@gmail.com]
Sent: 04 September 2014 16:23
To: Wilco Dijkstra
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH 1/4] AArch64: Fix register_move_cost
On 4
Patch attached for commit as I don't have write access.
ChangeLog:
2014-09-11 Wilco Dijkstra wdijk...@arm.com
* gcc/config/aarch64/aarch64.c (aarch64_register_move_cost):
Fix Q register move handling. (generic_regmove_cost): Undo raised
FP2FP move cost as Q register
OK, I'll skip this patch for now as HAVE_DESIGNATED_INITIALIZERS should
always be false, so there is no point in cleaning it up.
-Original Message-
From: Marcus Shawcroft [mailto:marcus.shawcr...@gmail.com]
Sent: 04 September 2014 16:42
To: Wilco Dijkstra
Cc: gcc-patches@gcc.gnu.org
I've kept the integer move costs at 1 - patch attached for commit as I don't
have write access.
ChangeLog:
2014-09-11 Wilco Dijkstra wdijk...@arm.com
* gcc/config/aarch64/aarch64.c:
(cortexa57_regmove_cost): New cost table for A57.
(cortexa53_regmove_cost): New cost
Hi,
This patch improves the register move costs for 128-bit types.
OK for commit?
ChangeLog:
2014-09-15 Wilco Dijkstra wdijk...@arm.com
* gcc/config/aarch64/aarch64.c (aarch64_register_move_cost):
Add register move costs for 128-bit types.
---
gcc/config/aarch64/aarch64.c
Jiong Wang wrote:
when generating instructions to access a local variable, for example a local
array,
if the array size is very big, then we need a temp reg to keep the intermediate
index,
then use that temp reg as base reg, so that ldr is capable of indexing the
element.
while this
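The situation Jiong describes can be reproduced with a large local array; a minimal C sketch (the array size and function name are illustrative, not taken from the patch):

```c
#include <assert.h>

/* Illustrative only: a local array large enough that element offsets
   exceed the scaled immediate range of ldr/str, so the compiler must
   first materialize the offset in a temporary register and use it as
   the base register for the load. */
int read_big_local(int i) {
  volatile int big[70000];   /* ~280 KB frame */
  big[i] = i;                /* roughly: add tmp, sp, #off; str w, [tmp, ...] */
  return big[i];
}
```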
This patch makes -fsched-pressure the default on AArch64, like on ARM. This
improves performance and
reduces codesize due to fewer unnecessary spills.
OK for commit?
ChangeLog:
2014-09-19 Wilco Dijkstra wdijk...@arm.com
* gcc/common/config/aarch64/aarch64-common.c
Attached. Jiong, can you commit this for me please?
-Original Message-
From: Marcus Shawcroft [mailto:marcus.shawcr...@gmail.com]
Sent: 23 September 2014 11:52
To: Wilco Dijkstra
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] AArch64: Improve regmove_costs for 128-bit types
Jeff Law wrote:
Do you have a testcase that shows the expected improvements from this
change? It's OK if it's specific to a target.
Have you bootstrapped and regression tested this change?
With a test for the testsuite and assuming it passes bootstrap and
regression testing, this will
Jeff Law wrote:
OK with the appropriate ChangeLog entries. The original for
ira-costs.c was fine, so you just need the trivial one for the testcase.
ChangeLog below - Jiong, could you commit for me please?
2014-12-02 Wilco Dijkstra wdijk...@arm.com
* gcc/ira-costs.c
Marcus Shawcroft wrote:
OK for commit?
ChangeLog:
2014-11-24 Wilco Dijkstra wdijk...@arm.com
* gcc/config/aarch64/aarch64-protos.h (tune_params):
Add reassociation tuning parameters.
* gcc/config/aarch64/aarch64.c (TARGET_SCHED_REASSOCIATION_WIDTH
be trivially
found with an assert? Also would it not be a good idea to have a single
register copy function that
ensures all data is copied?
ChangeLog: 2014-12-09 Wilco Dijkstra wdijk...@arm.com
* gcc/ira-emit.c (ira_create_new_reg): Copy preference classes.
---
gcc/ira-emit.c | 11
Jeff Law wrote:
On 12/09/14 12:21, Wilco Dijkstra wrote:
With the fix it uses a floating point register as expected. Given a similar
issue in
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02253.html, would it not be
better to change
the
initialization values of reg_pref to illegal
This patch generalizes the code alignment and lets each CPU set function, jump
and loop alignment
independently. The defaults for A53/A57 are based on the original patch by James
Greenhalgh.
OK for trunk?
ChangeLog:
2014-12-13 Wilco Dijkstra wdijk...@arm.com
* gcc/config/aarch64/aarch64
Add an override for TARGET_MIN_DIVISIONS_FOR_RECIP_MUL and set the minimum
number of divisions to 2.
This gives ~0.5% speedup on SPECFP2000/2006.
OK for trunk?
ChangeLog:
2014-12-13 Wilco Dijkstra wdijk...@arm.com
* gcc/config/aarch64/aarch64.c (TARGET_MIN_DIVISIONS_FOR_RECIP_MUL
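The transformation the hook controls can be sketched in plain C (a hand-written equivalent for illustration, not GCC output): with -ffast-math, once a block contains at least the minimum number of divisions by a common divisor, GCC computes the reciprocal once and multiplies.

```c
#include <math.h>

/* With min_divisions_for_recip_mul set to 2, two divisions by the
   same divisor d ... */
void scale_div(double *a, double *b, double d) {
  *a /= d;
  *b /= d;
}

/* ... are rewritten like this: one (expensive) divide and two
   (cheap) multiplies.  Hand-written equivalent for illustration. */
void scale_recip(double *a, double *b, double d) {
  double r = 1.0 / d;
  *a *= r;
  *b *= r;
}
```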
into gcc.target/i386/
I've moved it and changed the compile condition:
/* { dg-do compile { target { ! { ia32 } } } } */
Jiong, can you commit this please?
Wilco
2015-03-06 Wilco Dijkstra wdijk...@arm.com
* gcc/tree-ssa-phiopt.c (neg_replacement): Remove.
(tree_ssa_phiopt_worker
sxtw x0, w0
eor x1, x0, x0, asr 63
sub x1, x1, x0, asr 63
mov x0, x1
ret
After:
adds w0, w0, 1
csneg w0, w0, w0, pl
ret
ChangeLog:
2015-03-03 Wilco Dijkstra wdijk...@arm.com
* gcc/config/aarch64/aarch64.md
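In C, the two expansions being compared look roughly like this (hand-written 32-bit equivalents for illustration; the function names are not from the patch):

```c
/* The old branchless abs expansion: xor with the sign mask, then
   subtract it.  Relies on arithmetic right shift of a negative int,
   which GCC guarantees. */
int abs_branchless(int x) {
  int s = x >> 31;           /* all-ones if x is negative, else zero */
  return (x ^ s) - s;
}

/* The direct compare-and-conditional-negate form the patch switches
   to; on AArch64 this becomes cmp + csneg. */
int abs_csneg(int x) {
  return x < 0 ? -x : x;
}
```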
Richard Biener wrote:
On Thu, Feb 26, 2015 at 11:20 PM, Jeff Law l...@redhat.com wrote:
On 02/26/15 10:30, Wilco Dijkstra wrote:
Several GCC versions ago a conditional negate optimization was introduced
as a workaround for
PR45685. However the branchless expansion for conditional
Andrew Pinski wrote:
On Tue, Mar 3, 2015 at 10:06 AM, Wilco Dijkstra wdijk...@arm.com wrote:
This patch makes aarch64_min_divisions_for_recip_mul configurable for float
and double. This
allows
CPUs with really fast or multiple dividers to return 3 (or even 4) if that
happens
This patch makes aarch64_min_divisions_for_recip_mul configurable for float and
double. This allows
CPUs with really fast or multiple dividers to return 3 (or even 4) if that
happens to be faster
overall. No code generation change - bootstrap regression OK.
ChangeLog:
2015-03-03 Wilco
Include the cost of op0 and op1 in all cases in PLUS and MINUS in
aarch64_rtx_costs.
Bootstrap regression OK.
ChangeLog:
2015-03-04 Wilco Dijkstra wdijk...@arm.com
* gcc/config/aarch64/aarch64.c (aarch64_rtx_costs):
Calculate cost of op0 and op1 in PLUS and MINUS cases
Jeff Law wrote:
On 02/26/15 10:30, Wilco Dijkstra wrote:
Several GCC versions ago a conditional negate optimization was introduced
as a workaround
for
PR45685. However the branchless expansion for conditional negate is
extremely inefficient on
most
targets (5 sequentially dependent
Maxim Kuvyrkov wrote:
You are removing the 2nd alternative that generates abs with your patch.
While I agree that
using csneg is faster on all implementations, can you say the same for
abs? Especially
given the fact that csneg requires 4 operands instead of abs's 2?
Yes, given that
Maxim Kuvyrkov wrote:
On Mar 4, 2015, at 3:30 PM, Wilco Dijkstra wdijk...@arm.com wrote:
Maxim Kuvyrkov wrote:
You are removing the 2nd alternative that generates abs with your patch.
While I agree
that
using csneg is faster on all implementations, can you say the same for
abs
This patch fixes the shift costs for Cortex-A53 so they are more accurate -
immediate shifts use
SBFM/UBFM which takes 2 cycles, register controlled shifts take 1 cycle.
Bootstrap and regression
OK.
ChangeLog:
2015-03-05 Wilco Dijkstra wdijk...@arm.com
* gcc/config/arm/aarch-cost
,%rdi), %eax
ret
After:
cmp w0, 4
csneg w0, w0, w0, lt
ret
movl %edi, %edx
movl %edi, %eax
negl %edx
cmpl $4, %edi
cmovge %edx, %eax
ret
ChangeLog:
2015-02-26 Wilco Dijkstra wdijk...@arm.com
-27 Wilco Dijkstra wdijk...@arm.com
* gcc/config/aarch64/aarch64.md (aarch64_ashl_sisd_or_int_<mode>3):
Place integer variant first.
---
gcc/config/aarch64/aarch64.md | 14 +++---
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/gcc/config/aarch64/aarch64.md b
ping
-Original Message-
From: Wilco Dijkstra [mailto:wdijk...@arm.com]
Sent: 05 March 2015 14:49
To: gcc-patches@gcc.gnu.org
Subject: [PATCH][AArch64] Fix Cortex-A53 shift costs
This patch fixes the shift costs for Cortex-A53 so they are more accurate -
immediate shifts
use
ping
-Original Message-
From: Wilco Dijkstra [mailto:wdijk...@arm.com]
Sent: 03 March 2015 18:06
To: GCC Patches
Subject: [PATCH][AArch64] Make aarch64_min_divisions_for_recip_mul
configurable
This patch makes aarch64_min_divisions_for_recip_mul configurable for float
ping
-Original Message-
From: Wilco Dijkstra [mailto:wdijk...@arm.com]
Sent: 04 March 2015 15:38
To: GCC Patches
Subject: [PATCH][AArch64] Fix aarch64_rtx_costs of PLUS/MINUS
Include the cost of op0 and op1 in all cases in PLUS and MINUS in
aarch64_rtx_costs.
Bootstrap
ping
-Original Message-
From: Wilco Dijkstra [mailto:wdijk...@arm.com]
Sent: 03 March 2015 16:19
To: GCC Patches
Subject: [PATCH][AArch64] Use conditional negate for abs expansion
Expand abs into a compare and conditional negate. This is the most obvious
expansion, enables
Jeff Law wrote:
On 12/10/14 06:26, Wilco Dijkstra wrote:
If recomputing is best does that mean that record_reg_classes should not
give a boost to the preferred class in the 2nd pass?
Perhaps. I haven't looked deeply at this part of IRA. I was relaying
my experiences with (ab)using
James Greenhalgh wrote:
On Mon, Apr 27, 2015 at 02:42:36PM +0100, Wilco Dijkstra wrote:
-Original Message-
From: Wilco Dijkstra [mailto:wdijk...@arm.com]
Sent: 03 March 2015 16:19
To: GCC Patches
Subject: [PATCH][AArch64] Use conditional negate for abs expansion
Marcus Shawcroft wrote:
On 27 April 2015 at 14:43, Wilco Dijkstra wdijk...@arm.com wrote:
static unsigned int
-aarch64_min_divisions_for_recip_mul (enum machine_mode mode
ATTRIBUTE_UNUSED)
+aarch64_min_divisions_for_recip_mul (enum machine_mode mode)
{
- return 2
Marcus Shawcroft wrote:
On 5 March 2015 at 14:49, Wilco Dijkstra wdijk...@arm.com wrote:
This patch fixes the shift costs for Cortex-A53 so they are more accurate -
immediate shifts
use
SBFM/UBFM which takes 2 cycles, register controlled shifts take 1 cycle.
Bootstrap and
regression
Marcus Shawcroft wrote:
On 1 May 2015 at 12:26, Wilco Dijkstra wdijk...@arm.com wrote:
Marcus Shawcroft wrote:
On 27 April 2015 at 14:43, Wilco Dijkstra wdijk...@arm.com wrote:
static unsigned int
-aarch64_min_divisions_for_recip_mul (enum machine_mode mode
ATTRIBUTE_UNUSED
James Greenhalgh wrote:
On Mon, Apr 27, 2015 at 05:57:26PM +0100, Wilco Dijkstra wrote:
James Greenhalgh wrote:
On Mon, Apr 27, 2015 at 02:42:36PM +0100, Wilco Dijkstra wrote:
-Original Message-
From: Wilco Dijkstra [mailto:wdijk...@arm.com]
Sent: 03 March 2015 16:19
Richard Henderson wrote:
However, the way that aarch64 and alpha have done it hasn't
been ideal, in that there's a fairly costly search that must
be done every time. I've thought before about changing this
so that we would be able to cache results, akin to how we do
it in expmed.c for
ping
-Original Message-
From: Wilco Dijkstra [mailto:wdijk...@arm.com]
Sent: 27 April 2015 14:37
To: GCC Patches
Subject: [PATCH][AArch64] Improve spill code - swap order in shl pattern
Various instructions are supported as integer operations as well as SIMD on
AArch64. When
to the extra int-FP
moves. Placing the
integer variant first in the shr pattern generates far more optimal spill code.
2015-07-27 Wilco Dijkstra wdijk...@arm.com
* gcc/config/aarch64/aarch64.md (aarch64_lshr_sisd_or_int_<mode>3):
Place integer variant first
Richard Henderson wrote:
On 08/12/2015 08:59 AM, Wilco Dijkstra wrote:
I looked at the statistics of AArch64 immediate generation a while ago.
The interesting thing is ~95% of calls are queries, and the same query is on
average repeated 10 times in a row. So (a) it is not important
Several instructions accidentally emit wzr/xzr even when the pattern specifies
an immediate. Fix
this by removing the register specifier in patterns that emit immediates.
Passes regression tests. OK for commit?
ChangeLog:
2015-10-28 Wilco Dijkstra <wdijk...@arm.com>
* gcc/
This patch adds CCMP selection based on rtx costs. This is based on Jiong's
already approved patch
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01434.html with some minor
refactoring and the tests updated.
OK for commit?
ChangeLog:
2015-11-13 Jiong Wang
gcc/
*
This patch adds support for rtx costing of CCMP. The cost is the same as
int/FP compare, however comparisons with zero get a slightly larger cost.
This means we prefer emitting compares with zero so they can be merged with
ALU operations.
OK for commit?
ChangeLog:
2015-11-13 Wilco Dijkstra
a compare with zero can be merged into an ALU
operation:
int
f (int a, int b)
{
a += b;
return a == 0 || a == 3;
}
f:
adds w0, w0, w1
ccmp w0, 3, 4, ne
cset w0, eq
ret
Passes GCC regression tests. OK for commit?
ChangeLog:
2015-11-13 Wilco Dijkstra
This patch adds support for FCCMP. This is trivial with the new CCMP
representation - remove the restriction of FP in ccmp.c and add FCCMP
patterns. Add a test to ensure FCCMP/FCCMPE are emitted as expected.
OK for commit?
ChangeLog:
2015-11-13 Wilco Dijkstra <wdijk...@arm.com>
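A source-level example of the kind of condition that maps to FCCMP (illustrative; the actual test added by the patch is not shown here):

```c
/* A chained floating-point condition of this shape is what the FCCMP
   patterns target: the second compare executes conditionally on the
   first, so on AArch64 it can become fcmp + fccmp + cset instead of
   two compares and a branch. */
int both_positive(double a, double b) {
  return a > 0.0 && b > 0.0;
}
```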
> Evandro Menezes wrote:
> Hi, Wilco.
>
> It looks good to me, but FCMP is quite different from FCCMP on Exynos M1,
> so it'd be helpful to have distinct types for them. Say, "fcmp{s,d}"
> and "fccmp{s,d}". Would it be acceptable to add this with this patch or
> later?
It would be easy to add
of reg_pref to illegal register classes so this kind
of issue can be trivially
found with an assert? Also would it not be a good idea to have a single
register copy function that
ensures all data is copied?
ChangeLog: 2014-12-09 Wilco Dijkstra wdijk...@arm.com
* gcc/ira-emit.c
of
the register. This results in better register allocation overall, fewer
spills and reduced codesize - particularly in SPEC2006 gamess.
GCC regression passes with several minor fixes.
OK for commit?
ChangeLog:
2015-11-06 Wilco Dijkstra <wdijk...@arm.com>
* gcc/config/aarch64/aar
This patch improves support for instructions that allow FP zero immediate. All
FP compares generated
by various patterns should use aarch64_fp_compare_operand. LDP/STP uses
aarch64_reg_or_fp_zero.
Passes regression on AArch64.
OK for commit?
ChangeLog:
2015-10-08 Wilco Dijkstra <wd
Enable instruction fusion of dependent AESE; AESMC and AESD; AESIMC pairs. This
can give up to 2x
speedup on many AArch64 implementations. Also model the crypto instructions on
Cortex-A57 according
to the Optimization Guide.
Passes regression tests.
ChangeLog:
2015-10-14 Wilco Dijkstra
Bernd Schmidt wrote:
> Sent: 17 November 2015 22:16
> To: Wilco Dijkstra; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH 1/4][AArch64] Generalize CCMP support
>
> On 11/13/2015 05:02 PM, Wilco Dijkstra wrote:
> > * gcc/ccmp.c (expand_ccmp_expr): Extract cmp_cod
(v2 version removes 4 enums)
This patch adds support for FCCMP. This is trivial with the new CCMP
representation - remove the restriction of FP in ccmp.c and add FCCMP
patterns. Add a test to ensure FCCMP/FCCMPE are emitted as expected.
OK for commit?
ChangeLog:
2015-11-18 Wilco Dijkstra
Wang <jiong.w...@arm.com>
2015-11-18 Wilco Dijkstra <wdijk...@arm.com>
gcc/
* ccmp.c (expand_ccmp_expr_1): Cost the instruction sequences
generated from different expand order. Cleanup enum use.
gcc/testsuite/
* gcc.target/aarch64/ccmp_1.c: Update test
> Segher Boessenkool wrote:
> On Thu, Sep 03, 2015 at 12:43:34PM +0100, Wilco Dijkstra wrote:
> > > > Combine canonicalizes certain AND masks in a comparison with zero into
> > > > extracts of the
> > > widest
> > > > register t
Hi,
Combine canonicalizes certain AND masks in a comparison with zero into extracts
of the widest
register type. During matching these are expanded into a very inefficient
sequence that fails to
match. For example (x & 2) == 0 is matched in combine like this:
Failed to match this instruction:
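The canonicalization in question turns a single-bit AND test into a one-bit extract; at the source level the two equivalent forms are (illustrative, not the RTL):

```c
/* (x & 2) == 0 tests bit 1; combine canonicalizes it as a one-bit
   zero_extract, which corresponds to shifting the bit down and
   masking.  Both forms are equivalent. */
int test_and(int x)     { return (x & 2) == 0; }
int test_extract(int x) { return ((x >> 1) & 1) == 0; }
```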
> Oleg Endo wrote:
> On 04 Sep 2015, at 01:54, Segher Boessenkool
> wrote:
>
> > On Thu, Sep 03, 2015 at 05:25:43PM +0100, Kyrill Tkachov wrote:
> >>> void g(void);
> >>> void f(int *x) { if (*x & 2) g(); }
> >
> >> A testcase I was looking at is:
> >> int
> >> foo
> Segher Boessenkool wrote:
> On Thu, Sep 03, 2015 at 10:09:36AM -0600, Jeff Law wrote:
> > >>You will end up with a *lot* of target hooks like this. It will also
> > >>make testing harder (less coverage). I am not sure that is a good idea.
> > >
> > >We certainly need a lot more target hooks in
> Kyrill Tkachov wrote:
> A testcase I was looking at is:
> int
> foo (int a)
> {
>return (a & 7) != 0;
> }
>
> For me this generates:
> and w0, w0, 7
> cmp w0, wzr
> cset w0, ne
> ret
>
> when it could be:
> tst w0, 7
>
> pins...@gmail.com wrote:
> > On Sep 7, 2015, at 7:22 PM, Kugan <kugan.vivekanandara...@linaro.org> wrote:
> >
> >
> >
> > On 07/09/15 20:46, Wilco Dijkstra wrote:
> >>> Kugan wrote:
> >>> 2. vector-compare-1.c from c-c++-common/tortu
> Kugan wrote:
> 2. vector-compare-1.c from c-c++-common/torture fails to assemble with
> -O3 -g Error: unaligned opcodes detected in executable segment. It works
> fine if I remove the -g. I am looking into it and needs to be fixed as well.
This is a known assembler bug I found a while back,
> Renlin Li wrote:
> Hi Andrew,
>
> Previously, there is a discussion thread in binutils mailing list:
>
> https://sourceware.org/ml/binutils/2015-04/msg00032.html
>
> Nick proposed a way to fix, Richard Henderson hold similar opinion as you.
Both Nick and Richard H seem to think it is an
> Segher Boessenkool wrote:
> Hi Wilco,
>
> On Wed, Sep 02, 2015 at 06:09:24PM +0100, Wilco Dijkstra wrote:
> > Combine canonicalizes certain AND masks in a comparison with zero into
> > extracts of the
> widest
> > register type. During matching these are
Remove aarch64_bitmasks, aarch64_build_bitmask_table and aarch64_bitmasks_cmp
as they are no longer
used by the immediate generation code.
No change in generated code, passes GCC regression tests/bootstrap.
ChangeLog:
2015-09-02 Wilco Dijkstra <wdijk...@arm.com>
* gcc/config/a
used
instead of add/sub (codesize remains the same).
ChangeLog:
2015-09-02 Wilco Dijkstra <wdijk...@arm.com>
* gcc/config/aarch64/aarch64.c (aarch64_internal_mov_immediate):
Remove redundant immediate generation code.
---
gcc/config/aarch64/aarch64.
and
checks the mask is repeated across the full 64 bits. Native performance is 5-6x
faster on typical
queries.
No change in generated code, passes GCC regression/bootstrap.
ChangeLog:
2015-09-02 Wilco Dijkstra <wdijk...@arm.com>
* gcc/config/aarch64/aarch64.c (aarch64_bitma
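The repetition check mentioned above can be sketched as follows (an illustrative reimplementation, not the actual GCC code): an AArch64 logical immediate must be a repeating pattern, so the core query is whether the 64-bit mask repeats at a given element width.

```c
#include <stdint.h>

/* Sketch: test whether a 64-bit mask is a repetition of its low
   `width` bits, for width in {2, 4, 8, 16, 32, 64}. */
static int repeats_at_width(uint64_t mask, int width) {
  uint64_t m = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
  uint64_t lo = mask & m;
  for (int i = width; i < 64; i += width)
    if (((mask >> i) & m) != lo)
      return 0;                /* element differs: no repetition */
  return 1;
}
```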
tests/bootstrap.
ChangeLog:
2015-09-02 Wilco Dijkstra <wdijk...@arm.com>
* gcc/config/aarch64/aarch64.c (aarch64_internal_mov_immediate):
Replace slow immediate matching loops with a faster algorithm.
---
gcc/config/aarch64/aarch64.
This is a set of patches to reduce the compile-time overhead of immediate
generation on AArch64.
There have been discussions and investigations into reducing the overhead of
immediate generation
using various caching strategies. However the statistics showed some of the
expensive immediate
in generated code for some
special cases but
codesize is identical.
ChangeLog:
2015-09-02 Wilco Dijkstra <wdijk...@arm.com>
* gcc/config/aarch64/aarch64.c (aarch64_internal_mov_immediate):
Cleanup immediate generation code.
---
gcc/config/aarch64/aarch64.c
sted on AArch64.
OK for commit?
ChangeLog:
2015-09-25 Wilco Dijkstra <wdijk...@arm.com>
* gcc/config/aarch64/aarch64.md (add<mode>3):
Block early expansion into 2 add instructions.
(add<mode>3_pluslong): New pattern to combine complex
immediates into 2 additions.
---
Adding Bernd - would you mind reviewing the ccmp.c change please?
> -Original Message-
> From: James Greenhalgh [mailto:james.greenha...@arm.com]
> Sent: 15 December 2015 16:42
> To: Wilco Dijkstra
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH 2/4 v2][AArch64] Add
-11-12 Wilco Dijkstra <wdijk...@arm.com>
* gcc/target.def (gen_ccmp_first): Update documentation.
(gen_ccmp_next): Likewise.
* gcc/doc/tm.texi (gen_ccmp_first): Update documentation.
(gen_ccmp_next): Likewise.
* gcc/ccmp.c (expand_ccmp_expr): E
ping
> -Original Message-
> From: Wilco Dijkstra [mailto:wdijk...@arm.com]
> Sent: 28 October 2015 17:33
> To: GCC Patches
> Subject: [PATCH][AArch64] Avoid emitting zero immediate as zero register
>
> Several instructions accidentally emit wzr/xzr even when the
ping
> -Original Message-
> From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com]
> Sent: 17 November 2015 18:36
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH 2/4 v2][AArch64] Add support for FCCMP
>
> (v2 version removes 4 enums)
>
> This patch adds support
ping
> -Original Message-
> From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com]
> Sent: 13 November 2015 16:03
> To: 'gcc-patches@gcc.gnu.org'
> Subject: [PATCH 3/4][AArch64] Add CCMP to rtx costs
>
> This patch adds support for rtx costing of CCMP. The cost is
ping
> -Original Message-
> From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com]
> Sent: 13 November 2015 16:03
> To: 'gcc-patches@gcc.gnu.org'
> Subject: [PATCH 4/4][AArch64] Cost CCMP instruction sequences to choose
> better expand order
>
> This patch adds CCM
Kyrill Tkachov wrote:
> On 14/10/15 13:30, Wilco Dijkstra wrote:
> > Enable instruction fusion of dependent AESE; AESMC and AESD; AESIMC pairs.
> > This can give up to 2x
> > speedup on many AArch64 implementations. Also model the crypto instructions
> &
C16
eor v1.16b, v1.16b, v0.16b
eor v0.16b, v1.16b, v0.16b
eor v1.16b, v1.16b, v0.16b
tbl v0.16b, {v0.16b - v1.16b}, v5.16b
Regress passes. This fixes regressions that were introduced recently, so OK for
commit?
ChangeLog:
2015-12-15 Wilco Dijkstra <
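For reference, the three chained eor instructions in the quoted sequence implement an XOR swap of the two vector registers; a scalar C equivalent:

```c
#include <stdint.h>

/* Scalar equivalent of the three chained eor instructions:
   an XOR swap of two registers without a temporary. */
static void xor_swap(uint64_t *a, uint64_t *b) {
  *b ^= *a;   /* eor v1.16b, v1.16b, v0.16b */
  *a ^= *b;   /* eor v0.16b, v1.16b, v0.16b */
  *b ^= *a;   /* eor v1.16b, v1.16b, v0.16b */
}
```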
ping
> -Original Message-
> From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com]
> Sent: 06 November 2015 20:06
> To: 'gcc-patches@gcc.gnu.org'
> Subject: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
>
> Th
ping
> -Original Message-
> From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com]
> Sent: 19 November 2015 18:12
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH][ARM] Enable fusion of AES instructions
>
> Enable instruction fusion of AES instructions on ARM for Cor
James Greenhalgh wrote:
> On Wed, Dec 16, 2015 at 01:05:21PM +0000, Wilco Dijkstra wrote:
> > James Greenhalgh wrote:
> > > On Tue, Dec 15, 2015 at 10:54:49AM +0000, Wilco Dijkstra wrote:
> > > > ping
> > > >
> > > > > -Original Messag
Richard Biener wrote:
> On Wed, Dec 16, 2015 at 10:32 AM, James Greenhalgh
> <james.greenha...@arm.com> wrote:
> > On Tue, Dec 15, 2015 at 11:35:45AM +0000, Wilco Dijkstra wrote:
> >>
> >> Add support for vector permute cost since various permutes can expan
James Greenhalgh wrote:
> On Tue, Dec 15, 2015 at 10:54:49AM +0000, Wilco Dijkstra wrote:
> > ping
> >
> > > -Original Message-
> > > From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com]
> > > Sent: 06 November 2015 20:06
> > > To: 'gcc-
Hi Evandro,
> Here's what I had in mind when I inquired about distinguishing FCMP from
> FCCMP. As you can see in the patch, Exynos is the only target that
> cares about it, but I wonder if ThunderX or Xgene would too.
>
> What do you think?
The new attributes look fine (I've got a similar
> James Greenhalgh wrote:
> > Could you please repost this with the word-wrapping issues fixed.
> > I can't apply it to my tree for review or to commit it on your behalf in
> > the current form.
So it looks like Outlook no longer supports sending emails without wrapping and
the
maximum is only