[PATCH][AArch64] Adjust generic move costs

2014-11-14 Thread Wilco Dijkstra
for commit? ChangeLog: 2014-11-14 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64.c (generic_regmove_cost): Increase FP move cost. --- gcc/config/aarch64/aarch64.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/gcc/config/aarch64/aarch64.c b/gcc

RE: [PATCH][AArch64] Adjust generic move costs

2014-11-19 Thread Wilco Dijkstra
Hi Jiong, Can you commit this please? 2014-11-19 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64.c (generic_regmove_cost): Increase FP move cost (PR61915). --- gcc/config/aarch64/aarch64.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git

RE: [PATCH] AArch64: Add TARGET_SCHED_REASSOCIATION_WIDTH

2014-11-24 Thread Wilco Dijkstra
, so I'll leave it at 1 for now. The patch is the same as last time: it just sets integer to 2 and uses the same settings for all CPUs. OK for commit? ChangeLog: 2014-11-24 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64-protos.h (tune-params): Add reassociation
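As a rough illustration of what an integer reassociation width of 2 allows (a hand-written sketch, not GCC output; the function names are made up for this example), a serial chain of additions can be rewritten as two independent chains that a core able to issue two ALU operations per cycle can overlap:

```c
#include <stdint.h>

/* Serial form: every addition depends on the result of the previous one. */
uint32_t sum_serial(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    return ((a + b) + c) + d;
}

/* Width-2 form: (a + b) and (c + d) have no dependency on each other,
   so the dependency chain is two additions long instead of three. */
uint32_t sum_width2(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    uint32_t lo = a + b;
    uint32_t hi = c + d;
    return lo + hi;
}
```

Unsigned integer addition wraps, so both forms always return the same value. Floating-point reassociation can change results, which is why it is only done when the relevant unsafe-math options permit it.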

RE: New rematerialization sub-pass in LRA

2014-10-13 Thread Wilco Dijkstra
Here is a new rematerialization sub-pass of LRA. I've tested and benchmarked the sub-pass on x86-64 and ARM. The sub-pass generates smaller code on average on both architectures (although the improvement is not significant), adds 0.4% additional compilation time in -O2 mode of

RE: New rematerialization sub-pass in LRA

2014-10-14 Thread Wilco Dijkstra
Vladimir Makarov wrote: On SPECINT2k performance is ~0.5% worse (5.5% regression on perlbmk), and SPECFP is ~0.2% faster. Thanks for reporting this. It is important for me as I have no aarch64 machine for benchmarking. Perlbmk performance degradation is too big and I'll definitely look

RE: New rematerialization sub-pass in LRA

2014-10-14 Thread Wilco Dijkstra
Wilco Dijkstra wrote: Vladimir Makarov wrote: On SPECINT2k performance is ~0.5% worse (5.5% regression on perlbmk), and SPECFP is ~0.2% faster. Thanks for reporting this. It is important for me as I have no aarch64 machine for benchmarking. Perlbmk performance degradation is too

[PATCH] AArch64: Add TARGET_SCHED_REASSOCIATION_WIDTH

2014-10-29 Thread Wilco Dijkstra
Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64-protos.h (tune-params): Add reassociation tuning parameters. * gcc/config/aarch64/aarch64.c (TARGET_SCHED_REASSOCIATION_WIDTH): Define. (aarch64_reassociation_width): New function. (generic_tunings

[PATCH] Improve spillcost of literal pool loads

2014-10-29 Thread Wilco Dijkstra
, however it is right thing to do for any constant, including constants in literal pools (which are typically not legitimate). Also use ALL_REGS rather than GENERAL_REGS as ALL_REGS has the correct floating point register costs. ChangeLog: 2014-10-29 Wilco Dijkstra wdijk...@arm.com * gcc

[PATCH] Fix register corruption bug in ree

2014-09-04 Thread Wilco Dijkstra
Wilco Dijkstra wdijk...@arm.com * gcc/ree.c (combine_reaching_defs): Ensure inserted copy writes a single register. --- gcc/ree.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/gcc/ree.c b/gcc/ree.c index 856745f..9aa1e36 100644 --- a/gcc/ree.c +++ b/gcc

[PATCH 1/4] AArch64: Fix register_move_cost

2014-09-04 Thread Wilco Dijkstra
Hi, This is a set of patches improving register costs on AArch64. The first fixes aarch64_register_move_cost() to support CALLER_SAVE_REGS and POINTER_REGS so costs are calculated correctly in the register allocator. ChangeLog: 2014-09-04 Wilco Dijkstra wdijk...@arm.com * gcc/config

[PATCH 2/4] AArch64: Fix cost for Q register moves

2014-09-04 Thread Wilco Dijkstra
This patch fixes a bug in aarch64_register_move_cost(): GET_MODE_SIZE is in bytes not bits. As a result the FP2FP cost doesn't need to be set to 4 to catch the special case for Q register moves. ChangeLog: 2014-09-04 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64.c
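The class of bug described here can be sketched as follows. This is only an illustration: the excerpt does not show the actual condition in aarch64_register_move_cost(), and the helper names and enum values below are hypothetical stand-ins for a GET_MODE_SIZE result, which is a size in bytes.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for mode sizes in bytes. */
enum { DI_BYTES = 8, TI_BYTES = 16 };

/* Buggy check: treats a byte count as if it were a bit count, so a
   16-byte (128-bit) mode is never recognised as a Q register move. */
bool is_q_reg_move_buggy(unsigned size_bytes)
{
    return size_bytes == 128;
}

/* Corrected check: 128 bits is 16 bytes. */
bool is_q_reg_move_fixed(unsigned size_bytes)
{
    return size_bytes == 16;
}
```

With the buggy form the special case never fires, which is consistent with the message's remark that a raised FP2FP cost had been needed to compensate.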

[PATCH 3/4] AArch64: Cleanup inconsistent use of __extension__

2014-09-04 Thread Wilco Dijkstra
Cleanup inconsistent use of __extension__. ChangeLog: 2014-09-04 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64.c: Cleanup use of __extension__. --- gcc/config/aarch64/aarch64.c | 38 +++--- 1 file changed, 11 insertions(+), 27 deletions

[PATCH 4/4] AArch64: Add regmove_costs for Cortex-A57 and A53

2014-09-04 Thread Wilco Dijkstra
://gcc.gnu.org/ml/gcc-patches/2014-09/msg00356.html). OK for commit? Wilco ChangeLog: 2014-09-04 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64.c: Add cortexa57_regmove_cost and cortexa53_regmove_cost to avoid spilling from integer to FP registers. --- gcc

RE: [PATCH 2/4] AArch64: Fix cost for Q register moves

2014-09-04 Thread Wilco Dijkstra
From: Marcus Shawcroft [mailto:marcus.shawcr...@gmail.com] - NAMED_PARAM (FP2FP, 4) + NAMED_PARAM (FP2FP, 2) This is not directly related to the change below and it is missing from the ChangeLog. Originally this number had to be 2 in order for secondary reload to kick in. See the

RE: [PATCH] Fix register corruption bug in ree

2014-09-08 Thread Wilco Dijkstra
Thanks! Jakub noticed a potential problem in this area a while back, but I never came up with any code to trigger it and have kept that issue on my todo list ever since. Rather than ensuring the inserted copy writes a single register, it seems to me we're better off ensuring that the number of

RE: [PATCH 1/4] AArch64: Fix register_move_cost

2014-09-11 Thread Wilco Dijkstra
Patch attached for commit as I don't have write access. -Original Message- From: Marcus Shawcroft [mailto:marcus.shawcr...@gmail.com] Sent: 04 September 2014 16:23 To: Wilco Dijkstra Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH 1/4] AArch64: Fix register_move_cost On 4

RE: [PATCH 2/4] AArch64: Fix cost for Q register moves

2014-09-11 Thread Wilco Dijkstra
Patch attached for commit as I don't have write access. ChangeLog: 2014-09-11 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64.c (aarch64_register_move_cost): Fix Q register move handling. (generic_regmove_cost): Undo raised FP2FP move cost as Q register

RE: [PATCH 3/4] AArch64: Cleanup inconsistent use of __extension__

2014-09-11 Thread Wilco Dijkstra
OK, I'll skip this patch for now as HAVE_DESIGNATED_INITIALIZERS should always be false, so there is no point in cleaning it up. -Original Message- From: Marcus Shawcroft [mailto:marcus.shawcr...@gmail.com] Sent: 04 September 2014 16:42 To: Wilco Dijkstra Cc: gcc-patches@gcc.gnu.org

RE: [PATCH 4/4] AArch64: Add regmove_costs for Cortex-A57 and A53

2014-09-11 Thread Wilco Dijkstra
I've kept the integer move costs at 1 - patch attached for commit as I don't have write access. ChangeLog: 2014-09-11 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64.c: (cortexa57_regmove_cost): New cost table for A57. (cortexa53_regmove_cost): New cost

[PATCH] AArch64: Improve regmove_costs for 128-bit types

2014-09-15 Thread Wilco Dijkstra
Hi, This patch improves the register move costs for 128-bit types. OK for commit? ChangeLog: 2014-09-15 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64.c (aarch64_register_move_cost): Add register move costs for 128-bit types. --- gcc/config/aarch64/aarch64.c

RE: [PATCH][AArch64] Fix PR63293

2014-09-19 Thread Wilco Dijkstra
Jiong Wang wrote: when generating instructions to access a local variable, for example a local array, if the array size is very big, then we need a temp reg to keep the intermediate index, then use that temp reg as base reg, so that ldr is capable of indexing the element. while this

[PATCH] AArch64: Default to -fsched-pressure

2014-09-19 Thread Wilco Dijkstra
This patch makes -fsched-pressure the default on AArch64, like on ARM. This improves performance and reduces codesize due to fewer unnecessary spills. OK for commit? ChangeLog: 2014-09-19 Wilco Dijkstra wdijk...@arm.com * gcc/common/config/aarch64/aarch64-common.c

RE: [PATCH] AArch64: Improve regmove_costs for 128-bit types

2014-09-24 Thread Wilco Dijkstra
Attached. Jiong, can you commit this for me please? -Original Message- From: Marcus Shawcroft [mailto:marcus.shawcr...@gmail.com] Sent: 23 September 2014 11:52 To: Wilco Dijkstra Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH] AArch64: Improve regmove_costs for 128-bit types

RE: [PATCH] Improve spillcost of literal pool loads

2014-11-28 Thread Wilco Dijkstra
Jeff Law wrote: Do you have a testcase that shows the expected improvements from this change? It's OK if it's specific to a target. Have you bootstrapped and regression tested this change? With a test for the testsuite and assuming it passes bootstrap and regression testing, this will

RE: [PATCH] Improve spillcost of literal pool loads

2014-12-02 Thread Wilco Dijkstra
Jeff Law wrote: OK with the appropriate ChangeLog entries. The original for ira-costs.c was fine, so you just need the trivial one for the testcase. ChangeLog below - Jiong, could you commit for me please? 2014-12-02 Wilco Dijkstra wdijk...@arm.com * gcc/ira-costs.c

RE: [PATCH] AArch64: Add TARGET_SCHED_REASSOCIATION_WIDTH

2014-12-09 Thread Wilco Dijkstra
Marcus Shawcroft wrote: OK for commit? ChangeLog: 2014-11-24 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64-protos.h (tune-params): Add reassociation tuning parameters. * gcc/config/aarch64/aarch64.c (TARGET_SCHED_REASSOCIATION_WIDTH

[PATCH] Fix IRA register preferencing

2014-12-09 Thread Wilco Dijkstra
be trivially found with an assert? Also would it not be a good idea to have a single register copy function that ensures all data is copied? ChangeLog: 2014-12-09 Wilco Dijkstra wdijk...@arm.com * gcc/ira-emit.c (ira_create_new_reg): Copy preference classes. --- gcc/ira-emit.c | 11

RE: [PATCH] Fix IRA register preferencing

2014-12-10 Thread Wilco Dijkstra
Jeff Law wrote: On 12/09/14 12:21, Wilco Dijkstra wrote: With the fix it uses a floating point register as expected. Given a similar issue in https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02253.html, would it not be better to change the initialization values of reg_pref to illegal

[PATCH][AArch64] Generalize code alignment

2014-12-12 Thread Wilco Dijkstra
This patch generalizes the code alignment and lets each CPU set function, jump and loop alignment independently. The defaults for A53/A57 are based on the original patch by James Greenhalgh. OK for trunk? ChangeLog: 2014-12-13 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64

[PATCH][AArch64] Add TARGET_MIN_DIVISIONS_FOR_RECIP_MUL

2014-12-12 Thread Wilco Dijkstra
Add an override for TARGET_MIN_DIVISIONS_FOR_RECIP_MUL and set the minimum number of divisions to 2. This gives ~0.5% speedup on SPECFP2000/2006. OK for trunk? ChangeLog: 2014-12-13 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64.c (TARGET_MIN_DIVISIONS_FOR_RECIP_MUL
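For readers unfamiliar with the hook, the transformation it gates looks roughly like the sketch below. The function names are made up for this example; GCC performs the rewrite only when reciprocal math is allowed (for example with -freciprocal-math, which -ffast-math enables) and at least the configured number of divisions share a divisor.

```c
/* Two divisions by the same divisor. */
void scale_div(double a, double b, double d, double out[2])
{
    out[0] = a / d;
    out[1] = b / d;
}

/* One division plus two multiplications: the shape the compiler can
   produce once the minimum number of divisions (2 here) is reached. */
void scale_recip(double a, double b, double d, double out[2])
{
    double r = 1.0 / d;
    out[0] = a * r;
    out[1] = b * r;
}
```

The two forms can differ in the last bit of the result, because 1.0 / d is rounded before the multiplication. That is why the rewrite is restricted to unsafe-math modes.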

RE: [PATCH] Remove inefficient branchless conditional negate optimization

2015-03-06 Thread Wilco Dijkstra
into gcc.target/i386/ I've moved it and changed the compile condition: /* { dg-do compile { target { ! { ia32 } } } } */ Jiong, can you commit this please? Wilco 2015-03-06 Wilco Dijkstra wdijk...@arm.com * gcc/tree-ssa-phiopt.c (neg_replacement): Remove. (tree_ssa_phiopt_worker

[PATCH][AArch64] Use conditional negate for abs expansion

2015-03-03 Thread Wilco Dijkstra
sxtw x0, w0 eor x1, x0, x0, asr 63 sub x1, x1, x0, asr 63 mov x0, x1 ret After: adds w0, w0, 1 csneg w0, w0, w0, pl ret ChangeLog: 2015-03-03 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64.md
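The two expansions of abs being compared can be written out in C as below. This is a sketch with made-up function names; the source function behind the quoted assembly is not shown in the excerpt. The first form mirrors the eor/sub sequence with an asr 63 mask, the second is the compare and conditional negate that maps onto csneg.

```c
#include <stdint.h>

/* Branchless xor/subtract form: mask is 0 for non-negative x and -1 for
   negative x.  Assumes >> on a negative int64_t is an arithmetic shift
   (implementation-defined in C, but what GCC provides). */
int64_t abs_xor_sub(int64_t x)
{
    int64_t mask = x >> 63;
    return (x ^ mask) - mask;
}

/* Compare and conditional negate form. */
int64_t abs_cond_negate(int64_t x)
{
    return x >= 0 ? x : -x;
}
```

Both functions overflow for INT64_MIN, just as abs itself does, so that input is outside the scope of the sketch.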

RE: [PATCH] Remove inefficient branchless conditional negate optimization

2015-02-27 Thread Wilco Dijkstra
Richard Biener wrote: On Thu, Feb 26, 2015 at 11:20 PM, Jeff Law l...@redhat.com wrote: On 02/26/15 10:30, Wilco Dijkstra wrote: Several GCC versions ago a conditional negate optimization was introduced as a workaround for PR45685. However the branchless expansion for conditional

RE: [PATCH][AArch64] Make aarch64_min_divisions_for_recip_mul configurable

2015-03-03 Thread Wilco Dijkstra
Andrew Pinski wrote: On Tue, Mar 3, 2015 at 10:06 AM, Wilco Dijkstra wdijk...@arm.com wrote: This patch makes aarch64_min_divisions_for_recip_mul configurable for float and double. This allows CPUs with really fast or multiple dividers to return 3 (or even 4) if that happens

[PATCH][AArch64] Make aarch64_min_divisions_for_recip_mul configurable

2015-03-03 Thread Wilco Dijkstra
This patch makes aarch64_min_divisions_for_recip_mul configurable for float and double. This allows CPUs with really fast or multiple dividers to return 3 (or even 4) if that happens to be faster overall. No code generation change - bootstrap regression OK. ChangeLog: 2015-03-03 Wilco

[PATCH][AArch64] Fix aarch64_rtx_costs of PLUS/MINUS

2015-03-04 Thread Wilco Dijkstra
Include the cost of op0 and op1 in all cases in PLUS and MINUS in aarch64_rtx_costs. Bootstrap regression OK. ChangeLog: 2015-03-04 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64.c (aarch64_rtx_costs): Calculate cost of op0 and op1 in PLUS and MINUS cases

RE: [PATCH] Remove inefficient branchless conditional negate optimization

2015-03-04 Thread Wilco Dijkstra
Jeff Law wrote: On 02/26/15 10:30, Wilco Dijkstra wrote: Several GCC versions ago a conditional negate optimization was introduced as a workaround for PR45685. However the branchless expansion for conditional negate is extremely inefficient on most targets (5 sequentially dependent

RE: [PATCH][AArch64] Use conditional negate for abs expansion

2015-03-04 Thread Wilco Dijkstra
Maxim Kuvyrkov wrote: You are removing the 2nd alternative that generates abs with your patch. While I agree that using csneg is faster on all implementations, can you say the same for abs? Especially given the fact that csneg requires 4 operands instead of abs'es 2? Yes, given that

RE: [PATCH][AArch64] Use conditional negate for abs expansion

2015-03-04 Thread Wilco Dijkstra
Maxim Kuvyrkov wrote: On Mar 4, 2015, at 3:30 PM, Wilco Dijkstra wdijk...@arm.com wrote: Maxim Kuvyrkov wrote: You are removing the 2nd alternative that generates abs with your patch. While I agree that using csneg is faster on all implementations, can you say the same for abs

[PATCH][AArch64] Fix Cortex-A53 shift costs

2015-03-05 Thread Wilco Dijkstra
This patch fixes the shift costs for Cortex-A53 so they are more accurate - immediate shifts use SBFM/UBFM which takes 2 cycles, register controlled shifts take 1 cycle. Bootstrap and regression OK. ChangeLog: 2015-03-05 Wilco Dijkstra wdijk...@arm.com * gcc/config/arm/aarch-cost

[PATCH] Remove inefficient branchless conditional negate optimization

2015-02-26 Thread Wilco Dijkstra
,%rdi), %eax ret After: cmp w0, 4 csneg w0, w0, w0, lt ret movl %edi, %edx movl %edi, %eax negl %edx cmpl $4, %edi cmovge %edx, %eax ret ChangeLog: 2015-02-26 Wilco Dijkstra wdijk...@arm.com

[PATCH][AArch64] Improve spill code - swap order in shl pattern

2015-04-27 Thread Wilco Dijkstra
-27 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64.md (aarch64_ashl_sisd_or_int_mode3): Place integer variant first. --- gcc/config/aarch64/aarch64.md | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/gcc/config/aarch64/aarch64.md b

RE: [PATCH][AArch64] Fix Cortex-A53 shift costs

2015-04-27 Thread Wilco Dijkstra
ping -Original Message- From: Wilco Dijkstra [mailto:wdijk...@arm.com] Sent: 05 March 2015 14:49 To: gcc-patches@gcc.gnu.org Subject: [PATCH][AArch64] Fix Cortex-A53 shift costs This patch fixes the shift costs for Cortex-A53 so they are more accurate - immediate shifts use

RE: [PATCH][AArch64] Make aarch64_min_divisions_for_recip_mul configurable

2015-04-27 Thread Wilco Dijkstra
ping -Original Message- From: Wilco Dijkstra [mailto:wdijk...@arm.com] Sent: 03 March 2015 18:06 To: GCC Patches Subject: [PATCH][AArch64] Make aarch64_min_divisions_for_recip_mul configurable This patch makes aarch64_min_divisions_for_recip_mul configurable for float

RE: [PATCH][AArch64] Fix aarch64_rtx_costs of PLUS/MINUS

2015-04-27 Thread Wilco Dijkstra
ping -Original Message- From: Wilco Dijkstra [mailto:wdijk...@arm.com] Sent: 04 March 2015 15:38 To: GCC Patches Subject: [PATCH][AArch64] Fix aarch64_rtx_costs of PLUS/MINUS Include the cost of op0 and op1 in all cases in PLUS and MINUS in aarch64_rtx_costs. Bootstrap

RE: [PATCH][AArch64] Use conditional negate for abs expansion

2015-04-27 Thread Wilco Dijkstra
ping -Original Message- From: Wilco Dijkstra [mailto:wdijk...@arm.com] Sent: 03 March 2015 16:19 To: GCC Patches Subject: [PATCH][AArch64] Use conditional negate for abs expansion Expand abs into a compare and conditional negate. This is the most obvious expansion, enables

RE: [PATCH] Fix IRA register preferencing

2015-04-27 Thread Wilco Dijkstra
Jeff Law wrote: On 12/10/14 06:26, Wilco Dijkstra wrote: If recomputing is best does that mean that record_reg_classes should not give a boost to the preferred class in the 2nd pass? Perhaps. I haven't looked deeply at this part of IRA. I was relaying my experiences with (ab)using

RE: [PATCH][AArch64] Use conditional negate for abs expansion

2015-04-27 Thread Wilco Dijkstra
James Greenhalgh wrote: On Mon, Apr 27, 2015 at 02:42:36PM +0100, Wilco Dijkstra wrote: -Original Message- From: Wilco Dijkstra [mailto:wdijk...@arm.com] Sent: 03 March 2015 16:19 To: GCC Patches Subject: [PATCH][AArch64] Use conditional negate for abs expansion

RE: [PATCH][AArch64] Make aarch64_min_divisions_for_recip_mul configurable

2015-05-01 Thread Wilco Dijkstra
Marcus Shawcroft wrote: On 27 April 2015 at 14:43, Wilco Dijkstra wdijk...@arm.com wrote: static unsigned int -aarch64_min_divisions_for_recip_mul (enum machine_mode mode ATTRIBUTE_UNUSED) +aarch64_min_divisions_for_recip_mul (enum machine_mode mode) { - return 2

RE: [PATCH][AArch64] Fix Cortex-A53 shift costs

2015-05-01 Thread Wilco Dijkstra
Marcus Shawcroft wrote: On 5 March 2015 at 14:49, Wilco Dijkstra wdijk...@arm.com wrote: This patch fixes the shift costs for Cortex-A53 so they are more accurate - immediate shifts use SBFM/UBFM which takes 2 cycles, register controlled shifts take 1 cycle. Bootstrap and regression

RE: [PATCH][AArch64] Make aarch64_min_divisions_for_recip_mul configurable

2015-05-01 Thread Wilco Dijkstra
Marcus Shawcroft wrote: On 1 May 2015 at 12:26, Wilco Dijkstra wdijk...@arm.com wrote: Marcus Shawcroft wrote: On 27 April 2015 at 14:43, Wilco Dijkstra wdijk...@arm.com wrote: static unsigned int -aarch64_min_divisions_for_recip_mul (enum machine_mode mode ATTRIBUTE_UNUSED

RE: [PATCH][AArch64] Use conditional negate for abs expansion

2015-05-14 Thread Wilco Dijkstra
James Greenhalgh wrote: On Mon, Apr 27, 2015 at 05:57:26PM +0100, Wilco Dijkstra wrote: James Greenhalgh wrote: On Mon, Apr 27, 2015 at 02:42:36PM +0100, Wilco Dijkstra wrote: -Original Message- From: Wilco Dijkstra [mailto:wdijk...@arm.com] Sent: 03 March 2015 16:19

Re: [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation

2015-08-12 Thread Wilco Dijkstra
Richard Henderson wrote: However, the way that aarch64 and alpha have done it hasn't been ideal, in that there's a fairly costly search that must be done every time. I've thought before about changing this so that we would be able to cache results, akin to how we do it in expmed.c for

RE: [PATCH][AArch64] Improve spill code - swap order in shl pattern

2015-07-27 Thread Wilco Dijkstra
ping -Original Message- From: Wilco Dijkstra [mailto:wdijk...@arm.com] Sent: 27 April 2015 14:37 To: GCC Patches Subject: [PATCH][AArch64] Improve spill code - swap order in shl pattern Various instructions are supported as integer operations as well as SIMD on AArch64. When

[PATCH][AArch64] Improve spill code - swap order in shr patterns

2015-07-27 Thread Wilco Dijkstra
to the extra int-FP moves. Placing the integer variant first in the shr pattern generates far more optimal spill code. 2015-07-27 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64.md (aarch64_lshr_sisd_or_int_mode3): Place integer variant first

RE: [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation

2015-08-25 Thread Wilco Dijkstra
Richard Henderson wrote: On 08/12/2015 08:59 AM, Wilco Dijkstra wrote: I looked at the statistics of AArch64 immediate generation a while ago. The interesting thing is ~95% of calls are queries, and the same query is on average repeated 10 times in a row. So (a) it is not important

[PATCH][AArch64] Avoid emitting zero immediate as zero register

2015-10-28 Thread Wilco Dijkstra
Several instructions accidentally emit wzr/xzr even when the pattern specifies an immediate. Fix this by removing the register specifier in patterns that emit immediates. Passes regression tests. OK for commit? ChangeLog: 2015-10-28 Wilco Dijkstra <wdijk...@arm.com> * gcc/

[PATCH 4/4][AArch64] Cost CCMP instruction sequences to choose better expand order

2015-11-13 Thread Wilco Dijkstra
This patch adds CCMP selection based on rtx costs. This is based on Jiong's already approved patch https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01434.html with some minor refactoring and the tests updated. OK for commit? ChangeLog: 2015-11-13 Jiong Wang gcc/ *

[PATCH 3/4][AArch64] Add CCMP to rtx costs

2015-11-13 Thread Wilco Dijkstra
This patch adds support for rtx costing of CCMP. The cost is the same as int/FP compare, however comparisons with zero get a slightly larger cost. This means we prefer emitting compares with zero so they can be merged with ALU operations. OK for commit? ChangeLog: 2015-11-13 Wilco Dijkstra

[PATCH 1/4][AArch64] Generalize CCMP support

2015-11-13 Thread Wilco Dijkstra
a compare with zero can be merged into an ALU operation: int f (int a, int b) { a += b; return a == 0 || a == 3; } f: adds w0, w0, w1 ccmp w0, 3, 4, ne cset w0, eq ret Passes GCC regression tests. OK for commit? ChangeLog: 2015-11-13 Wilco Dijkstra
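To make the quoted sequence easier to follow, here is the C function from the message next to a hand-written model of the three instructions that tracks only the Z flag. The model (f_model) is an illustration written for this listing, not part of the patch.

```c
#include <stdbool.h>

/* The function from the message. */
int f(int a, int b)
{
    a += b;
    return a == 0 || a == 3;
}

/* Model of: adds w0, w0, w1 ; ccmp w0, 3, 4, ne ; cset w0, eq */
int f_model(int a, int b)
{
    /* adds: w0 = a + b, and Z is set if the result is zero. */
    int w0 = (int)((unsigned)a + (unsigned)b);
    bool z = (w0 == 0);

    /* ccmp w0, 3, 4, ne: if NE holds (Z clear), compare w0 with 3;
       otherwise load NZCV with 0b0100, which sets Z. */
    if (!z)
        z = (w0 == 3);
    else
        z = true;

    /* cset w0, eq: result is 1 when Z is set. */
    return z ? 1 : 0;
}
```

The point of the example in the message is the first instruction: because adds already sets the flags, no separate compare with zero is needed before the ccmp.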

[PATCH 2/4][AArch64] Add support for FCCMP

2015-11-13 Thread Wilco Dijkstra
This patch adds support for FCCMP. This is trivial with the new CCMP representation - remove the restriction of FP in ccmp.c and add FCCMP patterns. Add a test to ensure FCCMP/FCCMPE are emitted as expected. OK for commit? ChangeLog: 2015-11-13 Wilco Dijkstra <wdijk...@arm.com>

RE: [PATCH 2/4][AArch64] Add support for FCCMP

2015-11-13 Thread Wilco Dijkstra
> Evandro Menezes wrote: > Hi, Wilco. > > It looks good to me, but FCMP is quite different from FCCMP on Exynos M1, > so it'd be helpful to have distinct types for them. Say, "fcmp{s,d}" > and "fccmp{s,d}". Would it be acceptable to add this with this patch or > later? It would be easy to add

[PATCH] Fix IRA register preferencing

2015-11-10 Thread Wilco Dijkstra
of reg_pref to illegal register classes so this kind of issue can be trivially found with an assert? Also would it not be a good idea to have a single register copy function that ensures all data is copied? ChangeLog: 2014-12-09 Wilco Dijkstra wdijk...@arm.com * gcc/ira-emit.c

[PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS

2015-11-06 Thread Wilco Dijkstra
of the register. This results in better register allocation overall, fewer spills and reduced codesize - particularly in SPEC2006 gamess. GCC regression passes with several minor fixes. OK for commit? ChangeLog: 2015-11-06 Wilco Dijkstra <wdijk...@arm.com> * gcc/config/aarch64/aar

[PATCH][AArch64] Update patterns to support FP zero

2015-10-08 Thread Wilco Dijkstra
This patch improves support for instructions that allow FP zero immediate. All FP compares generated by various patterns should use aarch64_fp_compare_operand. LDP/STP uses aarch64_reg_or_fp_zero. Passes regression on AArch64. OK for commit? ChangeLog: 2015-10-08 Wilco Dijkstra <wd

[PATCH][AArch64] Enable fusion of AES instructions

2015-10-14 Thread Wilco Dijkstra
Enable instruction fusion of dependent AESE; AESMC and AESD; AESIMC pairs. This can give up to 2x speedup on many AArch64 implementations. Also model the crypto instructions on Cortex-A57 according to the Optimization Guide. Passes regression tests. ChangeLog: 2015-10-14 Wilco Dijkstra

RE: [PATCH 1/4 v2][AArch64] Generalize CCMP support

2015-11-18 Thread Wilco Dijkstra
Bernd Schmidt wrote: > Sent: 17 November 2015 22:16 > To: Wilco Dijkstra; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH 1/4][AArch64] Generalize CCMP support > > On 11/13/2015 05:02 PM, Wilco Dijkstra wrote: > > * gcc/ccmp.c (expand_ccmp_expr): Extract cmp_cod

[PATCH 2/4 v2][AArch64] Add support for FCCMP

2015-11-18 Thread Wilco Dijkstra
(v2 version removes 4 enums) This patch adds support for FCCMP. This is trivial with the new CCMP representation - remove the restriction of FP in ccmp.c and add FCCMP patterns. Add a test to ensure FCCMP/FCCMPE are emitted as expected. OK for commit? ChangeLog: 2015-11-18 Wilco Dijkstra

[PATCH 4/4 v2][AArch64] Cost CCMP instruction sequences to choose better expand order

2015-11-18 Thread Wilco Dijkstra
Wang <jiong.w...@arm.com> 2015-11-18 Wilco Dijkstra <wdijk...@arm.com> gcc/ * ccmp.c (expand_ccmp_expr_1): Cost the instruction sequences generated from different expand order. Cleanup enum use. gcc/testsuite/ * gcc.target/aarch64/ccmp_1.c: Update test

RE: RFC: Combine of compare & and oddity

2015-09-03 Thread Wilco Dijkstra
> Segher Boessenkool wrote: > On Thu, Sep 03, 2015 at 12:43:34PM +0100, Wilco Dijkstra wrote: > > > > Combine canonicalizes certain AND masks in a comparison with zero into > > > > extracts of the > > > widest > > > > register t

RFC: Combine of compare & and oddity

2015-09-02 Thread Wilco Dijkstra
Hi, Combine canonicalizes certain AND masks in a comparison with zero into extracts of the widest register type. During matching these are expanded into a very inefficient sequence that fails to match. For example (x & 2) == 0 is matched in combine like this: Failed to match this instruction:
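At the source level, the two representations under discussion compute the same thing. The sketch below uses made-up function names and says nothing about the RTL combine actually builds: testing a single-bit AND mask against zero is equivalent to extracting that bit and testing it.

```c
#include <stdbool.h>

/* AND-mask form: (x & 2) == 0. */
bool bit1_clear_and(unsigned x)
{
    return (x & 2u) == 0;
}

/* Bit-extract form: pull out bit 1 as a 1-bit field, then test it. */
bool bit1_clear_extract(unsigned x)
{
    return ((x >> 1) & 1u) == 0;
}
```

The concern raised in the thread is that a target with a direct test-under-mask instruction wants to see the first form, so canonicalizing to the second makes matching harder.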

RE: RFC: Combine of compare & and oddity

2015-09-03 Thread Wilco Dijkstra
> Oleg Endo wrote: > On 04 Sep 2015, at 01:54, Segher Boessenkool > wrote: > > > On Thu, Sep 03, 2015 at 05:25:43PM +0100, Kyrill Tkachov wrote: > >>> void g(void); > >>> void f(int *x) { if (*x & 2) g(); } > > > >> A testcase I was looking at is: > >> int > >> foo

RE: RFC: Combine of compare & and oddity

2015-09-03 Thread Wilco Dijkstra
> Segher Boessenkool wrote: > On Thu, Sep 03, 2015 at 10:09:36AM -0600, Jeff Law wrote: > > >>You will end up with a *lot* of target hooks like this. It will also > > >>make testing harder (less coverage). I am not sure that is a good idea. > > > > > >We certainly need a lot more target hooks in

RE: RFC: Combine of compare & and oddity

2015-09-03 Thread Wilco Dijkstra
> Kyrill Tkachov wrote: > A testcase I was looking at is: > int > foo (int a) > { > return (a & 7) != 0; > } > > For me this generates: > and w0, w0, 7 > cmp w0, wzr > cset w0, ne > ret > > when it could be: > tst w0, 7 >

RE: [0/7] Type promotion pass and elimination of zext/sext

2015-09-07 Thread Wilco Dijkstra
> pins...@gmail.com wrote: > > On Sep 7, 2015, at 7:22 PM, Kugan <kugan.vivekanandara...@linaro.org> wrote: > > > > > > > > On 07/09/15 20:46, Wilco Dijkstra wrote: > >>> Kugan wrote: > >>> 2. vector-compare-1.c from c-c++-common/tortu

RE: [0/7] Type promotion pass and elimination of zext/sext

2015-09-07 Thread Wilco Dijkstra
> Kugan wrote: > 2. vector-compare-1.c from c-c++-common/torture fails to assemble with > -O3 -g Error: unaligned opcodes detected in executable segment. It works > fine if I remove the -g. I am looking into it and needs to be fixed as well. This is a known assembler bug I found a while back,

RE: [0/7] Type promotion pass and elimination of zext/sext

2015-09-08 Thread Wilco Dijkstra
> Renlin Li wrote: > Hi Andrew, > > Previously, there was a discussion thread in the binutils mailing list: > > https://sourceware.org/ml/binutils/2015-04/msg00032.html > > Nick proposed a way to fix it; Richard Henderson holds a similar opinion to yours. Both Nick and Richard H seem to think it is an

RE: RFC: Combine of compare & and oddity

2015-09-03 Thread Wilco Dijkstra
> Segher Boessenkool wrote: > Hi Wilco, > > On Wed, Sep 02, 2015 at 06:09:24PM +0100, Wilco Dijkstra wrote: > > Combine canonicalizes certain AND masks in a comparison with zero into > > extracts of the > widest > > register type. During matching these are

[PATCH][AArch64][3/5] Improve immediate generation

2015-09-02 Thread Wilco Dijkstra
Remove aarch64_bitmasks, aarch64_build_bitmask_table and aarch64_bitmasks_cmp as they are no longer used by the immediate generation code. No change in generated code, passes GCC regression tests/bootstrap. ChangeLog: 2015-09-02 Wilco Dijkstra <wdijk...@arm.com> * gcc/config/a

[PATCH][AArch64][4/5] Improve immediate generation

2015-09-02 Thread Wilco Dijkstra
used instead of add/sub (codesize remains the same). ChangeLog: 2015-09-02 Wilco Dijkstra <wdijk...@arm.com> * gcc/config/aarch64/aarch64.c (aarch64_internal_mov_immediate): Remove redundant immediate generation code. --- gcc/config/aarch64/aarch64.

[PATCH][AArch64][1/5] Improve immediate generation

2015-09-02 Thread Wilco Dijkstra
and checks the mask is repeated across the full 64 bits. Native performance is 5-6x faster on typical queries. No change in generated code, passes GCC regression/bootstrap. ChangeLog: 2015-09-02 Wilco Dijkstra <wdijk...@arm.com> * gcc/config/aarch64/aarch64.c (aarch64_bitma
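For context, an AArch64 logical (bitmask) immediate is a rotated run of ones within an element of 2, 4, 8, 16, 32 or 64 bits, with that element repeated across the register. The sketch below is one straightforward way to test for such a value. It is an assumption-laden illustration with made-up names and is not the algorithm from the patch, whose body is not shown in the excerpt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Count set bits with a portable loop. */
static unsigned count_bits(uint64_t v)
{
    unsigned n = 0;
    while (v) {
        v &= v - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Return true if val is a rotated run of ones in an element of size
   2..64 bits that is repeated across all 64 bits. */
bool is_bitmask_imm_sketch(uint64_t val)
{
    if (val == 0 || val == ~UINT64_C(0))
        return false;   /* all-zeros and all-ones are not encodable */

    /* Find the smallest element size at which the value repeats. */
    unsigned size = 64;
    while (size > 2) {
        unsigned half = size / 2;
        uint64_t half_mask = (UINT64_C(1) << half) - 1;
        if ((val & half_mask) != ((val >> half) & half_mask))
            break;
        size = half;
    }

    uint64_t mask = (size == 64) ? ~UINT64_C(0) : (UINT64_C(1) << size) - 1;
    uint64_t elt = val & mask;

    /* Rotate the element right by one within its own width.  A single
       circular run of ones has exactly two 0/1 boundaries. */
    uint64_t rot = ((elt >> 1) | (elt << (size - 1))) & mask;
    return count_bits(elt ^ rot) == 2;
}
```

For example, 0x00FF00FF00FF00FF is accepted (a run of eight ones in a 16-bit element), while 0xF00F is rejected because its 64-bit element contains two separate runs.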

[PATCH][AArch64][2/5] Improve immediate generation

2015-09-02 Thread Wilco Dijkstra
tests/bootstrap. ChangeLog: 2015-09-02 Wilco Dijkstra <wdijk...@arm.com> * gcc/config/aarch64/aarch64.c (aarch64_internal_mov_immediate): Replace slow immediate matching loops with a faster algorithm. --- gcc/config/aarch64/aarch64.

[PATCH][AArch64][0/5] Improve immediate generation

2015-09-02 Thread Wilco Dijkstra
This is a set of patches to reduce the compile-time overhead of immediate generation on AArch64. There have been discussions and investigations into reducing the overhead of immediate generation using various caching strategies. However the statistics showed some of the expensive immediate

[PATCH][AArch64][5/5] Improve immediate generation

2015-09-02 Thread Wilco Dijkstra
in generated code for some special cases but codesize is identical. ChangeLog: 2015-09-02 Wilco Dijkstra <wdijk...@arm.com> * gcc/config/aarch64/aarch64.c (aarch64_internal_mov_immediate): Cleanup immediate generation code. --- gcc/config/aarch64/aarch64.c

[PATCH][AArch64] Improve add immediate expansion

2015-09-25 Thread Wilco Dijkstra
sted on AArch64. OK for commit? ChangeLog: 2015-09-25 Wilco Dijkstra <wdijk...@arm.com> * gcc/config/aarch64/aarch64.md (add3): Block early expansion into 2 add instructions. (add3_pluslong): New pattern to combine complex immediates into 2 additions. ---
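The idea of combining a complex immediate into two additions rests on the AArch64 ADD immediate encoding, which takes a 12-bit value optionally shifted left by 12. A minimal sketch of the split follows; the helper names are hypothetical and the code is not taken from the add3_pluslong pattern itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if imm fits a single ADD immediate: a 12-bit value, either
   unshifted or shifted left by 12. */
bool is_single_add_imm(uint32_t imm)
{
    return (imm & ~0xfffu) == 0 || (imm & ~0xfff000u) == 0;
}

/* Split an immediate of up to 24 bits into a low and a high 12-bit
   part, so that x + imm == (x + lo) + hi.  Returns false if imm needs
   more than 24 bits. */
bool split_add_imm(uint32_t imm, uint32_t *lo, uint32_t *hi)
{
    if (imm > 0xffffffu)
        return false;
    *lo = imm & 0xfffu;
    *hi = imm & 0xfff000u;
    return true;
}
```

For 0x123456 this gives lo = 0x456 and hi = 0x123000, each of which is a valid single ADD immediate.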

RE: [PATCH 2/4 v2][AArch64] Add support for FCCMP

2015-12-15 Thread Wilco Dijkstra
Adding Bernd - would you mind reviewing the ccmp.c change please? > -Original Message- > From: James Greenhalgh [mailto:james.greenha...@arm.com] > Sent: 15 December 2015 16:42 > To: Wilco Dijkstra > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH 2/4 v2][AArch64] Add

Re: [PATCH 1/4 v2][AArch64] Generalize CCMP support

2015-12-15 Thread Wilco Dijkstra
-11-12 Wilco Dijkstra <wdijk...@arm.com> * gcc/target.def (gen_ccmp_first): Update documentation. (gen_ccmp_next): Likewise. * gcc/doc/tm.texi (gen_ccmp_first): Update documentation. (gen_ccmp_next): Likewise. * gcc/ccmp.c (expand_ccmp_expr): E

RE: [PATCH][AArch64] Avoid emitting zero immediate as zero register

2015-12-15 Thread Wilco Dijkstra
ping > -Original Message- > From: Wilco Dijkstra [mailto:wdijk...@arm.com] > Sent: 28 October 2015 17:33 > To: GCC Patches > Subject: [PATCH][AArch64] Avoid emitting zero immediate as zero register > > Several instructions accidentally emit wzr/xzr even when the

RE: [PATCH 2/4 v2][AArch64] Add support for FCCMP

2015-12-15 Thread Wilco Dijkstra
ping > -Original Message- > From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com] > Sent: 17 November 2015 18:36 > To: gcc-patches@gcc.gnu.org > Subject: [PATCH 2/4 v2][AArch64] Add support for FCCMP > > (v2 version removes 4 enums) > > This patch adds support

RE: [PATCH 3/4][AArch64] Add CCMP to rtx costs

2015-12-15 Thread Wilco Dijkstra
ping > -Original Message- > From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com] > Sent: 13 November 2015 16:03 > To: 'gcc-patches@gcc.gnu.org' > Subject: [PATCH 3/4][AArch64] Add CCMP to rtx costs > > This patch adds support for rtx costing of CCMP. The cost is

RE: [PATCH 4/4][AArch64] Cost CCMP instruction sequences to choose better expand order

2015-12-15 Thread Wilco Dijkstra
ping > -Original Message- > From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com] > Sent: 13 November 2015 16:03 > To: 'gcc-patches@gcc.gnu.org' > Subject: [PATCH 4/4][AArch64] Cost CCMP instruction sequences to choose > better expand order > > This patch adds CCM

RE: [PATCH][AArch64] Enable fusion of AES instructions

2015-12-15 Thread Wilco Dijkstra
Kyrill Tkachov wrote: > On 14/10/15 13:30, Wilco Dijkstra wrote: > > Enable instruction fusion of dependent AESE; AESMC and AESD; AESIMC pairs. > > This can give up to 2x > > speedup on many AArch64 implementations. Also model the crypto instructions > &

[PATCH][AArch64] Add vector permute cost

2015-12-15 Thread Wilco Dijkstra
C16 eor v1.16b, v1.16b, v0.16b eor v0.16b, v1.16b, v0.16b eor v1.16b, v1.16b, v0.16b tbl v0.16b, {v0.16b - v1.16b}, v5.16b Regress passes. This fixes regressions that were introduced recently, so OK for commit? ChangeLog: 2015-12-15 Wilco Dijkstra <

RE: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS

2015-12-15 Thread Wilco Dijkstra
ping > -Original Message- > From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com] > Sent: 06 November 2015 20:06 > To: 'gcc-patches@gcc.gnu.org' > Subject: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS > > Th

RE: [PATCH][ARM] Enable fusion of AES instructions

2015-12-15 Thread Wilco Dijkstra
ping > -Original Message- > From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com] > Sent: 19 November 2015 18:12 > To: gcc-patches@gcc.gnu.org > Subject: [PATCH][ARM] Enable fusion of AES instructions > > Enable instruction fusion of AES instructions on ARM for Cor

RE: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS

2015-12-17 Thread Wilco Dijkstra
James Greenhalgh wrote: > On Wed, Dec 16, 2015 at 01:05:21PM +0000, Wilco Dijkstra wrote: > > James Greenhalgh wrote: > > > On Tue, Dec 15, 2015 at 10:54:49AM +0000, Wilco Dijkstra wrote: > > > > ping > > > > > > > > > -Original Messag

RE: [PATCH][AArch64] Add vector permute cost

2015-12-16 Thread Wilco Dijkstra
Richard Biener wrote: > On Wed, Dec 16, 2015 at 10:32 AM, James Greenhalgh > <james.greenha...@arm.com> wrote: > > On Tue, Dec 15, 2015 at 11:35:45AM +0000, Wilco Dijkstra wrote: > >> > >> Add support for vector permute cost since various permutes can expan

RE: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS

2015-12-16 Thread Wilco Dijkstra
James Greenhalgh wrote: > On Tue, Dec 15, 2015 at 10:54:49AM +0000, Wilco Dijkstra wrote: > > ping > > > > > -Original Message- > > > From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com] > > > Sent: 06 November 2015 20:06 > > > To: 'gcc-

Re: [PATCH 2/4 v2][AArch64] Add support for FCCMP

2016-01-06 Thread Wilco Dijkstra
Hi Evandro, > Here's what I had in mind when I inquired about distinguishing FCMP from > FCCMP. As you can see in the patch, Exynos is the only target that > cares about it, but I wonder if ThunderX or Xgene would too. > > What do you think? The new attributes look fine (I've got a similar

RE: [PATCH 1/4 v2][AArch64] Generalize CCMP support

2015-11-27 Thread Wilco Dijkstra
> James Greenhalgh wrote: > > Could you please repost this with the word-wrapping issues fixed. > > I can't apply it to my tree for review or to commit it on your behalf in > > the current form. So it looks like Outlook no longer supports sending emails without wrapping and the maximum is only