[PATCH][AArch64] Use LDP/STP in shrinkwrapping

2018-01-05 Thread Wilco Dijkstra
x23, x24, [sp,#32] ldp x25, x26, [sp,#48] ldp x27, x28, [sp,#64] ldr x30, [sp,#80] ldp x19, x20, [sp],#96 ret Passes bootstrap, OK for commit (and backport to GCC7)? ChangeLog: 2018-01-05 Wilco Dijkstra * config/aarch64/aarch64.c (aarch64_components_for_bb

Re: [PATCH][AArch64] Improve register allocation of fma

2018-01-05 Thread Wilco Dijkstra
Andrew Pinski wrote: > Seems like you should do something similar to the integer madd/msub > instructions too (aarch64_mla is already correct but aarch64_mla_elt > needs this too). Integer madd/msub may benefit too, however it wouldn't make a difference for a 3-operand mla since the register allo

Re: [PATCH][AArch64] Use LDP/STP in shrinkwrapping

2018-01-08 Thread Wilco Dijkstra
Segher Boessenkool wrote: > On Fri, Jan 05, 2018 at 12:22:44PM +0000, Wilco Dijkstra wrote: >> An example epilog in a shrinkwrapped function before: >> >> ldp    x21, x22, [sp,#16] >> ldr    x23, [sp,#32] >> ldr    x24, [sp,#40] >> ldp    x25, x26, [sp,#48

Re: [PATCH][AArch64] Use LDP/STP in shrinkwrapping

2018-01-08 Thread Wilco Dijkstra
Segher Boessenkool wrote: > On Mon, Jan 08, 2018 at 01:27:24PM +0000, Wilco Dijkstra wrote: > >> Peepholing is very conservative about instructions using SP and won't touch >> anything frame related. If this was working better then the backend could >> just >>

Re: [PATCH][AArch64] Fix aarch64_ira_change_pseudo_allocno_class

2018-05-25 Thread Wilco Dijkstra
explicitly checking for a subset of GENERAL_REGS and FP_REGS. Add a missing ? to aarch64_get_lane to fix a failure in the testsuite. Passes regress, OK for commit? Since it is a regression introduced in GCC8, OK to backport to GCC8? ChangeLog: 2018-05-25 Wilco Dijkstra * config/aarch64/

Re: [PATCH][AArch64] Fix aarch64_ira_change_pseudo_allocno_class

2018-05-29 Thread Wilco Dijkstra
James Greenhalgh wrote: > > Add a missing ? to aarch64_get_lane to fix a failure in the testsuite. > > > I'd prefer more detail than this for a workaround; which test, why did it > > start to fail, why is this the right solution, etc. It was gcc.target/aarch64/vect_copy_lane_1.c generating: test

Re: [PATCH][AArch64] Fix aarch64_ira_change_pseudo_allocno_class

2018-05-30 Thread Wilco Dijkstra
Richard Sandiford wrote: > The "?" change seems to make intrinsic sense given the extra cost of the > GPR alternative.  But I think the real reason for this failure is that > we define no V1DF patterns, and target-independent code falls back to > using moves in the corresponding *integer* mode.  So for

Re: [PATCH][AArch64] Fix aarch64_ira_change_pseudo_allocno_class

2018-05-31 Thread Wilco Dijkstra
Richard Sandiford wrote: >> This has probably been reported elsewhere already but I can't find >> such a report, so sorry for possible duplicate, >> but this patch is causing ICEs on aarch64 >> FAIL:    gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve >> (internal compiler error) >> FAIL:   

[COMMITTED][testsuite] Remove xfail from vect-abs-compile.c

2018-06-18 Thread Wilco Dijkstra
Since PR64946 has been fixed, we can remove the xfail from this test. Committed as obvious. ChangeLog: 2018-06-18 Wilco Dijkstra PR tree-optimization/64946 * gcc.target/aarch64/vect-abs-compile.c: Remove xfail. -- diff --git a/gcc/testsuite/gcc.target/aarch64/vect-abs

[COMMITTED][testsuite] Add target pthread to pr86076.c

2018-06-18 Thread Wilco Dijkstra
Add missing target pthread to ensure test doesn't fail on bare-metal targets. Committed as obvious. ChangeLog: 2018-06-18 Wilco Dijkstra PR tree-optimization/86076 * gcc.dg/pr86076.c: Add target pthread for bare-metal targets. -- diff --git a/gcc/testsuite/gcc.dg/pr8607

[PATCH v3] Change default to -fno-math-errno

2018-06-18 Thread Wilco Dijkstra
: f: str x30, [sp, -16]! bl lroundf add x0, x0, 1 ldr x30, [sp], 16 ret With -fno-math-errno: f: fcvtas x0, s0 add x0, x0, 1 ret Passes regress on AArch64. OK for commit? ChangeLog: 2018-06-18 Wilco Dijkstra

Re: [PATCH v3] Change default to -fno-math-errno

2018-06-19 Thread Wilco Dijkstra
Richard Biener wrote: > There are a number of regression tests that check for errno handling > (I added some to avoid aliasing for example).  Please make sure to > add explicit -fmath-errno to those that do not already have it set > (I guess such patch would be obvious and independent of this one)

Re: [PATCH v3] Change default to -fno-math-errno

2018-06-26 Thread Wilco Dijkstra
Joseph Myers wrote: > On Thu, 21 Jun 2018, Jeff Law wrote: > > > I think all this implies that the setting of -fno-math-errno by default > > really depends on the math library in use since it's the library that > > has to arrange for either errno to get set or for an exception to be raised. > > If

Re: [PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp

2018-06-27 Thread Wilco Dijkstra
Eric Botcazou wrote: > > The AArch64 parts are OK. I've been holding off approving the patch while > > I wait for Eric to reply on the x86_64 fails with your new testcase. > > The test is not portable in any case since it uses the "optimize" attribute > so > I'd just make it specific to Aarch64

Re: [PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp

2018-06-27 Thread Wilco Dijkstra
Eric Botcazou wrote: >> This test can easily be changed not to use optimize since it doesn't look >> like it needs it. We really need to test these builtins properly, >> otherwise they will continue to fail on most targets. > > As far as I can see PR target/84521 has been reported only for Aarch6

Re: [PATCH v3] Change default to -fno-math-errno

2018-06-27 Thread Wilco Dijkstra
Joseph Myers wrote: > On Tue, 26 Jun 2018, Wilco Dijkstra wrote: > > That looks incorrect indeed but that's mostly a problem with -fmath-errno > > as it > > would result in GCC assuming the function is const/pure when in fact it > > isn't. > > Does

[COMMITTED][testsuite] Fix f16_mov_immediate_3.c

2018-06-28 Thread Wilco Dijkstra
Fix and simplify the testcase so it generates dup even on latest trunk. This fixes the failure reported in: https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01799.html Committed as obvious. ChangeLog: 2018-06-28 Wilco Dijkstra * gcc.target/aarch64/f16_mov_immediate_3.c: Fix testcase

Re: [PATCH][AArch64] Use LDP/STP in shrinkwrapping

2018-01-09 Thread Wilco Dijkstra
Segher Boessenkool wrote: > On Mon, Jan 08, 2018 at 0:25:47PM +0000, Wilco Dijkstra wrote: >> > Always pairing two registers together *also* degrades code quality. >> >> No, while it's not optimal, it means smaller code and fewer memory accesses. > > It means

Re: [PATCH] Simplify floating point comparisons

2018-01-09 Thread Wilco Dijkstra
Richard Biener wrote: >On Thu, Jan 4, 2018 at 10:27 PM, Marc Glisse wrote: >> I don't understand how the check you added helps. It simply blocks the transformation for infinity: + (if (!REAL_VALUE_ISINF (TREE_REAL_CST (@0))) + (switch + (if (real_less (&dconst0, TREE_REAL_CST_P

Re: [PATCH][AArch64] Use LDP/STP in shrinkwrapping

2018-01-11 Thread Wilco Dijkstra
Segher Boessenkool wrote:   > Of course I see that ldp is useful.  I don't think that this particular > way of forcing more pairs is a good idea.  Needs testing / benchmarking / > instrumentation, and we haven't seen any of that. I wouldn't propose a patch if it caused slowdowns. In fact I am see

Re: [PATCH] Simplify floating point comparisons

2018-01-12 Thread Wilco Dijkstra
>= and <= for now since C / x can underflow if C is small. Simplify (x * C1) > C2 into x > (C2 / C1) with -funsafe-math-optimizations. If C1 is negative the comparison is reversed. OK for commit? ChangeLog 2018-01-10 Wilco Dijkstra Jackson Woodruff gcc/
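The multiplication rule can be illustrated with concrete constants. This is only a sketch of the source-level effect, not the match.pd pattern from the patch; the function names and the constants 2.0, -2.0 and 8.0 are chosen here for illustration:

```c
#include <stdbool.h>

/* (x * C1) > C2 with C1 = 2.0 and C2 = 8.0. */
bool mul_gt(double x)
{
    return x * 2.0 > 8.0;
}

/* Simplified form: x > C2 / C1, i.e. x > 4.0. */
bool mul_gt_simplified(double x)
{
    return x > 4.0;
}

/* Negative C1 = -2.0: dividing by a negative constant reverses
   the comparison, so x * -2.0 > 8.0 becomes x < -4.0. */
bool neg_mul_gt(double x)
{
    return x * -2.0 > 8.0;
}

bool neg_mul_gt_simplified(double x)
{
    return x < -4.0;
}
```

The two forms agree for ordinary inputs such as these, but not necessarily when the multiplication overflows or the division is inexact, which is why the transformation is restricted to -funsafe-math-optimizations.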

Re: [PATCH v2] Change default to -fno-math-errno

2018-01-12 Thread Wilco Dijkstra
as x0, s0 4: d65f03c0 ret With -fno-math-errno: f: fcvtas x0, s0 add x0, x0, 1 ret OK for commit? 2018-01-12 Wilco Dijkstra * common.opt (fmath-errno): Change default to 0. * opts.c (set_fast_math_flags): Force -fno-ma

[PATCH] PR82964: Fix 128-bit immediate ICEs

2018-01-15 Thread Wilco Dijkstra
ommit? ChangeLog: 2018-01-15 Wilco Dijkstra Richard Sandiford gcc/ PR target/82964 * config/aarch64/aarch64.md (movti_aarch64): Use Uti constraint. * config/aarch64/aarch64.c (aarch64_mov128_immediate): New function. (aarch64_legitimate_constant_p):

Re: [PATCH v2] Change default to -fno-math-errno

2018-01-16 Thread Wilco Dijkstra
Joseph Myers wrote: > Another question to consider: what about configurations (mostly > soft-float) where floating-point exceptions are not supported?  (glibc > wrongly defines math_errhandling to include MATH_ERREXCEPT there, but the > only option actually permitted by C99 in that case would b

[PATCH v2][AArch64] Remove remaining uses of * in patterns

2018-01-16 Thread Wilco Dijkstra
egister for same-size int<->fp conversions. Passes regress & bootstrap, OK for commit? ChangeLog: 2018-01-16 Wilco Dijkstra * config/aarch64/aarch64.md (mov): Remove '*' in alternatives. (movsi_aarch64): Likewise. (load_pairsi): Likewise. (load_p

Re: [PING][PATCH, AArch64] Disable reg offset in quad-word store for Falkor

2018-01-17 Thread Wilco Dijkstra
Hi, In general I think the best way to achieve this would be to use the existing cost models which are there for exactly this purpose. If this doesn't work well enough then we should fix those. As is, this patch disables a whole class of instructions for a specific target rather than simply tellin

Re: [PING][PATCH, AArch64] Disable reg offset in quad-word store for Falkor

2018-01-17 Thread Wilco Dijkstra
(finished version this time, somehow Outlook loves to send emails early...) Hi, In general I think the best way to achieve this would be to use the existing cost models which are there for exactly this purpose. If this doesn't work well enough then we should fix those. As is, this patch disables

Re: [PING][PATCH, AArch64] Disable reg offset in quad-word store for Falkor

2018-01-17 Thread Wilco Dijkstra
Siddhesh Poyarekar wrote:   > The current cost model will disable reg offset for loads as well as > stores, which doesn't work well since loads with reg offset are faster > for falkor. Why is that a bad thing? With the patch as is, the testcase generates: .L4: ldr q0, [x2, x3]

Re: [PATCH] PR82964: Fix 128-bit immediate ICEs

2018-01-17 Thread Wilco Dijkstra
James Greenhalgh wrote: > -  /* Do not allow wide int constants - this requires support in movti.  */ > +  /* Only allow simple 128-bit immediates.  */ >    if (CONST_WIDE_INT_P (x)) > -    return false; > +    return aarch64_mov128_immediate (x); > I can see why this could be correct, but it is

Re: [PING][PATCH, AArch64] Disable reg offset in quad-word store for Falkor

2018-01-17 Thread Wilco Dijkstra
Siddhesh Poyarekar wrote: On Wednesday 17 January 2018 08:31 PM, Wilco Dijkstra wrote: > Why is that a bad thing? With the patch as is, the testcase generates: > > .L4: >    ldr q0, [x2, x3] >    add x5, x1, x3 >    add x3, x3, 16 >    cmp   

[Committed][AArch64] Fix fp16 test failures after PR82964 fix

2018-01-18 Thread Wilco Dijkstra
s the failures and has no effect otherwise. Committed as trivial fix. ChangeLog: 2018-01-18 Wilco Dijkstra gcc/ PR target/82964 * config/aarch64/aarch64.c (aarch64_legitimate_constant_p): Use GET_MODE_CLASS for scalar floating point. -- diff --git a/gcc/config/aarch64/aarc

Re: [PATCH] PR82964: Fix 128-bit immediate ICEs

2018-01-18 Thread Wilco Dijkstra
Christophe Lyon wrote: > After this patch (r256800), I have noticed new failures on aarch64: >    gcc.target/aarch64/f16_mov_immediate_1.c scan-assembler-times > mov\tw[0-9]+, #?19520 3 (found 0 times) Thanks for spotting these, the scripts appear to have missed those (contrib/dg-cmp-results.sh s

[PATCH][AArch64] PR79262: Adjust vector cost

2018-01-22 Thread Wilco Dijkstra
- libquantum and SPECv6 performance improves. OK for commit? ChangeLog: 2018-01-22 Wilco Dijkstra PR target/79262 * config/aarch64/aarch64.c (generic_vector_cost): Adjust vec_to_scalar_cost. -- diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index

[PATCH] PR84068: Fix sort order of SCHED_PRESSURE_MODEL

2018-01-31 Thread Wilco Dijkstra
sorted using the pressure model, and instructions outside it will use RFS_DEP_COUNT and/or RFS_TIE for their order. Bootstrap OK on AArch64, OK for commit? ChangeLog: 2018-01-31 Wilco Dijkstra PR rtl-optimization/84068 * haifa-sched.c (rank_for_schedule): Fix SCHED_PRESSURE_MODEL

Re: [PATCH] PR84068: Fix sort order of SCHED_PRESSURE_MODEL

2018-01-31 Thread Wilco Dijkstra
Richard Sandiford wrote: > This was the original intent, but was changed in r213708.  TBH I'm not > sure what the second hunk in that revision fixed, since model_index is > supposed to return an index greater than all valid indices when passed > an instruction outside the current block.  Maxim, do

Re: [PATCH v2][AArch64] Remove remaining uses of * in patterns

2018-02-01 Thread Wilco Dijkstra
James Greenhalgh wrote: > Please queue for GCC 9. OK when trunk is back open for new code. This fixes the regressions introduced by the SVE merge conflicts and the failures of aarch64/pr62178.c, both of which are new regressions, so we should fix these now. Wilco

Re: [PATCH] PR84068: Fix sort order of SCHED_PRESSURE_MODEL

2018-02-01 Thread Wilco Dijkstra
Richard Sandiford wrote: > But why wasn't the index 0 as expected for the insns outside of the block? Well it seems it checks for index 0 and sets the model_index as the current maximum model_index count. This means the target_bb check isn't strictly required - I build all of SPECINT2017 using t

Re: [PATCH] PR84068: Fix sort order of SCHED_PRESSURE_MODEL

2018-02-02 Thread Wilco Dijkstra
sort on model_index. If the model_index is the same we defer to RFS_DEP_COUNT and/or RFS_TIE. Bootstrap OK, OK for commit? ChangeLog: 2018-02-02 Wilco Dijkstra PR rtl-optimization/84068 * haifa-sched.c (rank_for_schedule): Fix SCHED_PRESSURE_MODEL sorting. PR rtl
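The ordering described here (primary key model_index, later keys only on a tie) can be sketched as an ordinary comparator. The struct, its fields and the direction of the tie-break are hypothetical stand-ins, not the data structures or heuristics in haifa-sched.c:

```c
#include <stdlib.h>

/* Hypothetical stand-in for a scheduling candidate. */
struct cand {
    int model_index; /* position in the pressure model's order */
    int dep_count;   /* illustrative secondary key */
};

/* Sort on model_index first; only when it is equal, defer to the
   secondary key (here: the candidate with more dependents first). */
static int rank_cand(const void *pa, const void *pb)
{
    const struct cand *a = pa;
    const struct cand *b = pb;

    if (a->model_index != b->model_index)
        return (a->model_index > b->model_index)
               - (a->model_index < b->model_index);
    return (b->dep_count > a->dep_count) - (b->dep_count < a->dep_count);
}

void sort_cands(struct cand *v, size_t n)
{
    qsort(v, n, sizeof *v, rank_cand);
}
```

Returning a consistent result for every pair matters because qsort requires a consistent ordering from its comparator.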

Re: [RFC] Tree loop unroller pass

2018-02-13 Thread Wilco Dijkstra
Hi Kugan, > Based on the previous discussions, I tried to implement a tree loop > unroller for partial unrolling. I would like to queue this RFC patches > for next stage1 review. This is a great plan - GCC urgently requires a good unroller! > * Cost-model for selecting the loop uses the same par

Re: [PATCH v6] aarch64: Add split-stack support

2018-02-13 Thread Wilco Dijkstra
Hi Adhemerval, A few comments on the assembly code: +# This function is called with non-standard calling convention: on entry +# x10 is the requested stack pointer, x11 is previous stack pointer (if +# functions has stacked arguments which needs to be restored), and x12 is +# the caller link reg

Re: [PING][PATCH v3] Disable reg offset in quad-word store for Falkor

2018-02-15 Thread Wilco Dijkstra
Hi Siddhesh, I still don't like the idea of disabling a whole class of instructions in the md file. It seems much better to adjust the costs here so that you get most of the improvement now, and fine tune it once we can differentiate between loads and stores. Taking your example, adding -funroll

Re: [RFC] Tree loop unroller pass

2018-02-16 Thread Wilco Dijkstra
Richard Biener wrote: >> This is a great plan - GCC urgently requires a good unroller! > > How so? I thought it is well-known for many years that the rtl unroller doesn't work properly. In practically all cases where LLVM beats GCC, it is due to unrolling small loops. You may have noticed how p

Re: [RFC] Tree loop unroller pass

2018-02-16 Thread Wilco Dijkstra
Richard Biener wrote: > With Ooo CPUs speculatively executing the next iterations I very much doubt > that. OoO execution is like really dumb loop unrolling, you still have all the dependencies between iterations, all the branches, all the pointer increments etc. Optimizing those reduces instr
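The per-iteration overhead mentioned here can be made concrete with a hand-unrolled reduction. This is an illustrative sketch, not output of any GCC pass; the function names and the unroll factor of four are assumptions made for the example:

```c
#include <stddef.h>

/* Rolled loop: one add, one index increment and one branch per element,
   with every add depending on the previous one. */
long sum_rolled(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled by four: the index increment and branch are paid once per
   four elements, and two accumulators shorten the dependency chain. */
long sum_unrolled4(const int *a, size_t n)
{
    long s0 = 0, s1 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += (long)a[i] + a[i + 1];
        s1 += (long)a[i + 2] + a[i + 3];
    }
    for (; i < n; i++) /* remainder elements */
        s0 += a[i];
    return s0 + s1;
}
```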

Re: [PING][PATCH v3] Disable reg offset in quad-word store for Falkor

2018-02-19 Thread Wilco Dijkstra
Siddhesh Poyarekar wrote: > On Thursday 15 February 2018 07:50 PM, Wilco Dijkstra wrote: >> So it seems to me using existing cost mechanisms is always preferable, even >> if you >> currently can't differentiate between loads and stores. > > Luis is working on addr

[PATCH][AArch64] PR84114: Avoid reassociating FMA

2018-02-22 Thread Wilco Dijkstra
testcase and gives 1% speedup on SPECFP2017, fixing the performance regression. OK for commit? ChangeLog: 2018-02-23 Wilco Dijkstra PR tree-optimization/84114 * config/aarch64/aarch64.c (aarch64_reassociation_width): Avoid reassociation of FLOAT_MODE addition. -- diff

Re: [PATCH][AArch64] PR84114: Avoid reassociating FMA

2018-02-27 Thread Wilco Dijkstra
Richard Biener wrote: > It happens that on some targets doing two FMAs in parallel and one > non-FMA operation merging them is faster than chaining three FMAs... Like I mentioned in the PR, long chains should be broken, but for that we need a new parameter to state how long a chain may be before it is

Re: [PATCH][AArch64] Remove aarch64_frame_pointer_required

2018-03-01 Thread Wilco Dijkstra
Richard Sandiford wrote: > But there's the third question of whether the frame pointer is available > for general allocation.  By removing frame_pointer_required, we're saying > that the frame pointer is always available for general use.  Unlike on ARM/Thumb-2, the frame pointer is unfortunately

Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-04-12 Thread Wilco Dijkstra
Jakub Jelinek wrote: > On Thu, Apr 12, 2018 at 03:52:09PM +0200, Richard Biener wrote: >> Not sure if I missed some important part of the discussion but >> for the testcase we want to preserve the tailcall, right? So >> it would be enough to set avoid_libcall to >> endp != 0 && CALL_EXPR_TAILCALL

Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-04-12 Thread Wilco Dijkstra
Jakub Jelinek wrote: > On Thu, Apr 12, 2018 at 03:53:13PM +0000, Wilco Dijkstra wrote: >> The tailcall issue is just a distraction. Historically the handling of >> mempcpy  >> has been horribly inefficient in both GCC and GLIBC for practically all >> targets. >>

Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

2018-04-12 Thread Wilco Dijkstra
Jakub Jelinek wrote: >On Thu, Apr 12, 2018 at 04:30:07PM +0000, Wilco Dijkstra wrote: >> Jakub Jelinek wrote: >> Frankly I don't see why it is a P1 regression. Do you have a benchmark that > >That is how regression priorities are defined. How can one justify consider

Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657, take 2)

2018-04-13 Thread Wilco Dijkstra
Jakub Jelinek wrote:  >On Thu, Apr 12, 2018 at 05:29:35PM +0000, Wilco Dijkstra wrote: >> > Depending on what you mean old, I see e.g. in 2010 power7 mempcpy got >> > added, >> > in 2013 other power versions, in 2016 s390*, etc.  Doing a decent mempcpy >> >

Re: [PATCH] Frame pointer for arm with THUMB2 mode

2018-09-05 Thread Wilco Dijkstra
Hi Denis, > We are working on applying Address/LeakSanitizer for the full Tizen OS > distribution. It's about ~1000 packages, ASan/LSan runtime is installed > to ld.so.preload. As we know ASan/LSan has interceptors for > allocators/deallocators such as (malloc/realloc/calloc/free) and so on. > O

Re: [PATCH] Frame pointer for arm with THUMB2 mode

2018-09-05 Thread Wilco Dijkstra
Hi Denis, >> Adding support for a frame chain would require an ABI change. It > would have to > > work across GCC, LLVM, Arm, Thumb-1 and Thumb-2 - not a trivial amount of > > effort. > Clang already works that way. No, that's incorrect like Richard pointed out. Only a single register can be u

[PATCH][AArch64] Support zero-extended move to FP register

2018-09-27 Thread Wilco Dijkstra
.8b fmov w0, s0 ret After: fmov s0, w0 cnt v0.8b, v0.8b addv b0, v0.8b fmov w0, s0 ret Passes regress on AArch64, OK for commit? ChangeLog: 2018-09-27 Wilco Dijkstra gcc/ * config/aarch64/aarch64.md (zero_extendsidi2
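The cnt/addv sequence in the listing is the usual AArch64 expansion of a population count, so the example is presumably a function along the following lines. This is an assumption; the testcase itself is not visible in this preview, and the name popcount32 is made up here:

```c
/* Counts the set bits of a 32-bit value. On AArch64 without a scalar
   popcount instruction, the value is moved to an FP/SIMD register,
   counted per byte with cnt, summed with addv and moved back. */
int popcount32(unsigned int x)
{
    return __builtin_popcount(x);
}
```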

Re: [PATCH][GCC][AARCH64] Add even-pair register classes

2018-09-28 Thread Wilco Dijkstra
Matthew wrote: > The canonical way to require even-odd pairs of registers to implement a TImode > pseudo register as mentioned in the documentation is to limit *all* TImode > registers to being even-odd by using the TARGET_HARD_REGNO_MODE_OK hook. And that is the best approach for cases like this

Re: [PATCH][AArch64] Support zero-extended move to FP register

2018-09-28 Thread Wilco Dijkstra
cnt v0.8b, v0.8b addv b0, v0.8b fmov w0, s0 ret After: fmov s0, w0 cnt v0.8b, v0.8b addv b0, v0.8b fmov w0, s0 ret Passes regress on AArch64, OK for commit? ChangeLog: 2018-09-28 Wilco Dijkstra gcc/ * conf

Re: [PATCH][AArch64] Support zero-extended move to FP register

2018-09-28 Thread Wilco Dijkstra
Richard Henderson wrote: > If you're going to add moves r->w, why not also go ahead and add w->r. > There are also HImode fmov zero-extensions, fwiw. Well in principle it would be possible to support all 8/16/32-bit zero extensions for all combinations of int and fp registers. However I prefer t

Re: [PATCH][AArch64] Support zero-extended move to FP register

2018-10-03 Thread Wilco Dijkstra
or commit? ChangeLog: 2018-10-03 Wilco Dijkstra gcc/ * config/aarch64/aarch64.md (zero_extendsidi2_aarch64): Add alternatives to zero-extend between int and floating-point registers. (load_pair_zero_extendsidi2_aarch64): Add alternative to emit zero-extended ldp in

Re: [PATCH v3] Change default to -fno-math-errno

2018-10-11 Thread Wilco Dijkstra
Hi Jeff, > So I went back and reviewed all the discussion around this.  I'm still > having trouble getting comfortable with flipping the default -- unless > we know ahead of time that the target runtime doesn't set errno on any > of the math routines.  That implies a target hook to describe the >

Re: [PATCH][tree-complex.c] PR tree-optimization/70291: Inline floating-point complex multiplication more aggressively

2018-05-02 Thread Wilco Dijkstra
Richard Biener wrote: > why use BUILT_IN_ISUNORDERED but not a GIMPLE_COND with > UNORDERED_EXPR? Note again that might trap/throw with -fsignalling-nans > so better avoid this transform for flag_signalling_nans as well... Both currently trap on signalling NaNs due to the implementation of the C

Re: [PATCH v2][AArch64] Remove remaining uses of * in patterns

2018-05-14 Thread Wilco Dijkstra
James Greenhalgh wrote: > On Tue, Jan 16, 2018 at 04:32:36PM +0000, Wilco Dijkstra wrote: >> v2: Rebased after the big SVE commits >> >> Remove the remaining uses of '*' from aarch64.md. >> Using '*' in alternatives is typically incorrect as it

Re: [PATCH][AArch64] Improve register allocation of fma

2018-05-15 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 04 January 2018 17:46 To: GCC Patches Cc: nd Subject: [PATCH][AArch64] Improve register allocation of fma   This patch improves register allocation of fma by preferring to update the accumulator register.  This is done by adding fma insns with operand 1 as the

Re: [PATCH][AArch64] Simplify frame pointer logic

2018-05-15 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 25 October 2017 16:29 To: GCC Patches Cc: nd Subject: [PATCH][AArch64] Simplify frame pointer logic   Simplify frame pointer logic based on review comments here (https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01727.html). This patch incrementally adds to these

Re: [PATCH][AArch64] Set SLOW_BYTE_ACCESS

2018-05-15 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 17 November 2017 15:21 To: GCC Patches Cc: nd Subject: [PATCH][AArch64] Set SLOW_BYTE_ACCESS   Contrary to all documentation, SLOW_BYTE_ACCESS simply means accessing bitfields by their declared type, which results in better codegeneration on practically any

Re: [PATCH][AArch64] Set SLOW_BYTE_ACCESS

2018-05-15 Thread Wilco Dijkstra
Hi, > I see nothing about you addressing James' comment from 17th November... I addressed that in a separate patch, see https://patchwork.ozlabs.org/patch/839126/ Wilco

Re: [PATCH][AArch64] Set SLOW_BYTE_ACCESS

2018-05-15 Thread Wilco Dijkstra
Hi, > Which doesn't appear to have been approved.  Did you follow up with Jeff? I'll get back to that one at some point - it'll take some time to agree on a way forward with the callback. Wilco

Re: [PATCH][AArch64] Improve register allocation of fma

2018-05-15 Thread Wilco Dijkstra
Hi, James Greenhalgh wrote: > > This seems like a fairly horrible hack around the register allocator > behaviour. That is why I proposed to improve the register allocator so one can explicitly specify the copy preference in the md syntax. However that wasn't accepted, so we'll have to use a hack

Re: [PATCH][AArch64] Set SLOW_BYTE_ACCESS

2018-05-16 Thread Wilco Dijkstra
Richard Earnshaw wrote: >>> Which doesn't appear to have been approved.  Did you follow up with Jeff? >> >> I'll get back to that one at some point - it'll take some time to agree on a >> way >> forward with the callback. >> >> Wilco >> >> > > So it seems to me that this should then be q

Re: [PATCH][AArch64] Unify vec_set patterns, support floating-point vector modes properly

2018-05-17 Thread Wilco Dijkstra
Kyrill Tkachov wrote: > That patch would look like the attached. Is this preferable? > For the above example it generates the desired: > foo_v4sf: >   ldr s0, [x0] >   ldr s1, [x1, 8] >   ins v0.s[1], v1.s[0] >   ld1 {v0.s}[2], [x2] >   ld1 {v0.s}[3], [x3] >

Re: [PATCH][AArch64] Simplify frame pointer logic

2018-05-22 Thread Wilco Dijkstra
James Greenhalgh wrote: > +/* Determine whether a frame chain needs to be generated.  */ > +static bool > +aarch64_needs_frame_chain (void) > +{ > +  /* Force a frame chain for EH returns so the return address is at FP+8.  */ > +  if (frame_pointer_needed || crtl->calls_eh_return) > +    return tr

[PATCH][AArch64] Fix aarch64_ira_change_pseudo_allocno_class

2018-05-22 Thread Wilco Dijkstra
ND_FP_REGS register class which is now used instead of ALL_REGS. Add a missing ? to aarch64_get_lane to fix a failure in the testsuite. Passes regress, OK for commit? Since it is a regression introduced in GCC8, OK to backport to GCC8? ChangeLog: 2018-05-22 Wilco Dijkstra * config/aarch64

Re: [PATCH][AArch64] Fix aarch64_ira_change_pseudo_allocno_class

2018-05-23 Thread Wilco Dijkstra
Richard Sandiford wrote: > -  if (allocno_class != ALL_REGS) > +  if (allocno_class != POINTER_AND_FP_REGS) >  return allocno_class; >  > -  if (best_class != ALL_REGS) > +  if (best_class != POINTER_AND_FP_REGS) >  return best_class; >  >    mode = PSEUDO_REGNO_MODE (regno); > I think

[PATCH] Fix register corruption bug in ree

2014-09-04 Thread Wilco Dijkstra
Wilco Dijkstra * gcc/ree.c (combine_reaching_defs): Ensure inserted copy writes a single register. --- gcc/ree.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/gcc/ree.c b/gcc/ree.c index 856745f..9aa1e36 100644 --- a/gcc/ree.c +++ b/gcc/ree.c @@ -743,6

[PATCH 1/4] AArch64: Fix register_move_cost

2014-09-04 Thread Wilco Dijkstra
Hi, This is a set of patches improving register costs on AArch64. The first fixes aarch64_register_move_cost() to support CALLER_SAVE_REGS and POINTER_REGS so costs are calculated correctly in the register allocator. ChangeLog: 2014-09-04 Wilco Dijkstra * gcc/config/aarch64/aarch64

[PATCH 2/4] AArch64: Fix cost for Q register moves

2014-09-04 Thread Wilco Dijkstra
This patch fixes a bug in aarch64_register_move_cost(): GET_MODE_SIZE is in bytes not bits. As a result the FP2FP cost doesn't need to be set to 4 to catch the special case for Q register moves. ChangeLog: 2014-09-04 Wilco Dijkstra * gcc/config/aarch64/aarc
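The bytes-versus-bits confusion can be shown in isolation. The sketch below assumes the faulty check compared a mode size against 128; the exact condition in aarch64_register_move_cost() is not visible in this preview, and all names here are hypothetical:

```c
#include <stdbool.h>

/* A Q register is 128 bits wide, which is 16 bytes. */
enum { Q_REG_BITS = 128, Q_REG_BYTES = 16 };

/* Buggy form: the size is in bytes, so comparing it against a bit
   count never matches a Q register mode. */
bool is_q_move_buggy(unsigned mode_size_bytes)
{
    return mode_size_bytes == Q_REG_BITS;
}

/* Fixed form: compare bytes against bytes. */
bool is_q_move_fixed(unsigned mode_size_bytes)
{
    return mode_size_bytes == Q_REG_BYTES;
}
```

Once the special case actually triggers for 16-byte modes, a workaround elsewhere (such as an inflated FP2FP cost) is no longer needed to reach it, which matches the reasoning in the message.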

[PATCH 3/4] AArch64: Cleanup inconsistent use of __extension__

2014-09-04 Thread Wilco Dijkstra
Cleanup inconsistent use of __extension__. ChangeLog: 2014-09-04 Wilco Dijkstra * gcc/config/aarch64/aarch64.c: Cleanup use of __extension__. --- gcc/config/aarch64/aarch64.c | 38 +++--- 1 file changed, 11 insertions(+), 27 deletions(-) diff --git a

[PATCH 4/4] AArch64: Add regmove_costs for Cortex-A57 and A53

2014-09-04 Thread Wilco Dijkstra
://gcc.gnu.org/ml/gcc-patches/2014-09/msg00356.html). OK for commit? Wilco ChangeLog: 2014-09-04 Wilco Dijkstra * gcc/config/aarch64/aarch64.c: Add cortexa57_regmove_cost and cortexa53_regmove_cost to avoid spilling from integer to FP registers. --- gcc/config/aarch64

RE: [PATCH 2/4] AArch64: Fix cost for Q register moves

2014-09-04 Thread Wilco Dijkstra
> From: Marcus Shawcroft [mailto:marcus.shawcr...@gmail.com] > > - NAMED_PARAM (FP2FP, 4) > > + NAMED_PARAM (FP2FP, 2) > > This is not directly related to the change below and it is missing > from the ChangeLog. Originally this number had to be > 2 in order > for secondary reload to kick in.

RE: [PATCH] Fix register corruption bug in ree

2014-09-08 Thread Wilco Dijkstra
> Thanks! Jakub noticed a potential problem in this area a while back, > but I never came up with any code to trigger and have kept that issue on > my todo list ever since. > > Rather than ensuring the inserted copy write a single register, it seems > to me we're better off ensuring that the numb

RE: New rematerialization sub-pass in LRA

2014-10-13 Thread Wilco Dijkstra
> Here is a new rematerialization sub-pass of LRA. > > I've tested and benchmarked the sub-pass on x86-64 and ARM. The > sub-pass permits to generate a smaller code in average on both > architecture (although improvement no-significant), adds < 0.4% > additional compilation time in -O2 mode o

RE: New rematerialization sub-pass in LRA

2014-10-14 Thread Wilco Dijkstra
> Vladimir Makarov wrote: > > On SPECINT2k performance is ~0.5% worse (5.5% regression on perlbmk), and > > SPECFP is ~0.2% faster. > Thanks for reporting this. It is important for me as I have no aarch64 > machine for benchmarking. > > Perlbmk performance degradation is too big and I'll definite

RE: New rematerialization sub-pass in LRA

2014-10-14 Thread Wilco Dijkstra
> Wilco Dijkstra wrote: > > Vladimir Makarov wrote: > > > On SPECINT2k performance is ~0.5% worse (5.5% regression on perlbmk), and > > > SPECFP is ~0.2% faster. > > Thanks for reporting this. It is important for me as I have no aarch64 > > machine for be

RE: [PATCH][AArch64] Use conditional negate for abs expansion

2015-05-14 Thread Wilco Dijkstra
> James Greenhalgh wrote: > On Mon, Apr 27, 2015 at 05:57:26PM +0100, Wilco Dijkstra wrote: > > > James Greenhalgh wrote: > > > On Mon, Apr 27, 2015 at 02:42:36PM +0100, Wilco Dijkstra wrote: > > > > > -Original Message- > > > >

[PATCH][AArch64] Improve spill code - swap order in shl pattern

2015-04-27 Thread Wilco Dijkstra
illing. 2015-04-27 Wilco Dijkstra * gcc/config/aarch64/aarch64.md (aarch64_ashl_sisd_or_int_3): Place integer variant first. --- gcc/config/aarch64/aarch64.md | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/gcc/config/aarch64/aarch64.md b/gc

RE: [PATCH][AArch64] Use conditional negate for abs expansion

2015-04-27 Thread Wilco Dijkstra
ping > -Original Message- > From: Wilco Dijkstra [mailto:wdijk...@arm.com] > Sent: 03 March 2015 16:19 > To: GCC Patches > Subject: [PATCH][AArch64] Use conditional negate for abs expansion > > Expand abs into a compare and conditional negate. This is the most
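At the source level, "compare and conditional negate" corresponds to the following shape. This is a sketch of the idea only, not the RTL expander from the patch, and the name abs_cneg is made up here:

```c
/* abs as a compare followed by a conditional negate: the value is
   compared with zero and negated only when negative. Like abs(), the
   result is undefined for INT_MIN. */
int abs_cneg(int x)
{
    return x < 0 ? -x : x;
}
```

On AArch64 this maps naturally onto a cmp followed by a conditional negate instruction, which avoids a branch.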

RE: [PATCH][AArch64] Make aarch64_min_divisions_for_recip_mul configurable

2015-04-27 Thread Wilco Dijkstra
ping > -Original Message- > From: Wilco Dijkstra [mailto:wdijk...@arm.com] > Sent: 03 March 2015 18:06 > To: GCC Patches > Subject: [PATCH][AArch64] Make aarch64_min_divisions_for_recip_mul > configurable > > This patch makes aarch64_min_divisions_for_recip_mu

RE: [PATCH][AArch64] Fix aarch64_rtx_costs of PLUS/MINUS

2015-04-27 Thread Wilco Dijkstra
ping > -Original Message- > From: Wilco Dijkstra [mailto:wdijk...@arm.com] > Sent: 04 March 2015 15:38 > To: GCC Patches > Subject: [PATCH][AArch64] Fix aarch64_rtx_costs of PLUS/MINUS > > Include the cost of op0 and op1 in all cases in PLUS and MINUS in > aarch6

RE: [PATCH][AArch64] Fix Cortex-A53 shift costs

2015-04-27 Thread Wilco Dijkstra
ping > -Original Message- > From: Wilco Dijkstra [mailto:wdijk...@arm.com] > Sent: 05 March 2015 14:49 > To: gcc-patches@gcc.gnu.org > Subject: [PATCH][AArch64] Fix Cortex-A53 shift costs > > This patch fixes the shift costs for Cortex-A53 so they are more accurate

RE: [PATCH] Fix IRA register preferencing

2015-04-27 Thread Wilco Dijkstra
> Jeff Law wrote: > On 12/10/14 06:26, Wilco Dijkstra wrote: > > > > If recomputing is best does that mean that record_reg_classes should not > > give a boost to the preferred class in the 2nd pass? > Perhaps. I haven't looked deeply at this part of IRA. I was re

RE: [PATCH][AArch64] Use conditional negate for abs expansion

2015-04-27 Thread Wilco Dijkstra
> James Greenhalgh wrote: > On Mon, Apr 27, 2015 at 02:42:36PM +0100, Wilco Dijkstra wrote: > > > -Original Message----- > > > From: Wilco Dijkstra [mailto:wdijk...@arm.com] > > > Sent: 03 March 2015 16:19 > > > To: GCC Patches > > > Subje

RE: [PATCH][AArch64] Make aarch64_min_divisions_for_recip_mul configurable

2015-05-01 Thread Wilco Dijkstra
> Marcus Shawcroft wrote: > On 27 April 2015 at 14:43, Wilco Dijkstra wrote: > > >> static unsigned int > >> -aarch64_min_divisions_for_recip_mul (enum machine_mode mode > >> ATTRIBUTE_UNUSED) > >> +aarch64_min_divisions_for_recip_mul (enu

RE: [PATCH][AArch64] Fix Cortex-A53 shift costs

2015-05-01 Thread Wilco Dijkstra
> Marcus Shawcroft wrote: > On 5 March 2015 at 14:49, Wilco Dijkstra wrote: > > This patch fixes the shift costs for Cortex-A53 so they are more accurate - > > immediate shifts use SBFM/UBFM which takes 2 cycles, register controlled shifts take 1 cycle. >

RE: [PATCH][AArch64] Make aarch64_min_divisions_for_recip_mul configurable

2015-05-01 Thread Wilco Dijkstra
> Marcus Shawcroft wrote: > On 1 May 2015 at 12:26, Wilco Dijkstra wrote: > > > > > >> Marcus Shawcroft wrote: > >> On 27 April 2015 at 14:43, Wilco Dijkstra wrote: > >> > >> >> static unsigned int > >> >> -aarch64_mi

[PATCH][AArch64] Adjust generic move costs

2014-11-14 Thread Wilco Dijkstra
lls. OK for commit? ChangeLog: 2014-11-14 Wilco Dijkstra * gcc/config/aarch64/aarch64.c (generic_regmove_cost): Increase FP move cost. --- gcc/config/aarch64/aarch64.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/gcc/config/aarch64/aarch64.c

RE: [PATCH][AArch64] Adjust generic move costs

2014-11-19 Thread Wilco Dijkstra
Hi Jiong, Can you commit this please? 2014-11-19 Wilco Dijkstra * gcc/config/aarch64/aarch64.c (generic_regmove_cost): Increase FP move cost (PR61915). --- gcc/config/aarch64/aarch64.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/gcc/config

RE: [PATCH] AArch64: Add TARGET_SCHED_REASSOCIATION_WIDTH

2014-11-24 Thread Wilco Dijkstra
to 2 did give an improvement, but vector had no effect, so I'll leave to 1 for now. The patch is the same as last time, it just sets integer to 2, and uses the same settings for all CPUs. OK for commit? ChangeLog: 2014-11-24 Wilco Dijkstra * gcc/config/aarch64/aarch64-protos.h

RE: [PATCH] Improve spillcost of literal pool loads

2014-11-28 Thread Wilco Dijkstra
> Jeff Law wrote: > Do you have a testcase that shows the expected improvements from this > change? It's OK if it's specific to a target. > > Have you bootstrapped and regression tested this change? > > With a test for the testsuite and assuming it passes bootstrap and > regression testing, this

RE: [PATCH] Improve spillcost of literal pool loads

2014-12-02 Thread Wilco Dijkstra
> Jeff Law wrote: > OK with the appropriate ChangeLog entries. The original for > ira-costs.c was fine, so you just need the trivial one for the testcase. ChangeLog below - Jiong, could you commit for me please? 2014-12-02 Wilco Dijkstra * gcc/ira-costs.c (scan_one_insn)

[PATCH] Remove inefficient branchless conditional negate optimization

2015-02-26 Thread Wilco Dijkstra
rcx,%rdi), %eax ret After: cmp w0, 4 csneg w0, w0, w0, lt ret movl %edi, %edx movl %edi, %eax negl %edx cmpl $4, %edi cmovge %edx, %eax ret ChangeLog: 2015-02-26 Wilco Dijkstra wdijk...@arm.
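The "After" sequences above (cmp/csneg on AArch64, cmp/cmovge on x86-64) both compute a conditional negate of the argument. The original test source is not shown in this snippet, so the following C function is a guess reconstructed from the assembly; the name `cond_negate` is hypothetical.

```c
/* Hypothetical reconstruction from the "After" assembly: keep x when it
   is less than 4, otherwise return its negation.  Corresponds to
   "cmp w0, 4; csneg w0, w0, w0, lt" on AArch64. */
int cond_negate(int x)
{
    return x < 4 ? x : -x;
}
```

For example, `cond_negate(3)` yields 3 while `cond_negate(4)` yields -4, matching the `lt` condition used by `csneg` and the `cmovge` select in the x86-64 sequence.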

RE: [PATCH] Remove inefficient branchless conditional negate optimization

2015-02-27 Thread Wilco Dijkstra
> Richard Biener wrote: > On Thu, Feb 26, 2015 at 11:20 PM, Jeff Law wrote: > > On 02/26/15 10:30, Wilco Dijkstra wrote: > >> > >> Several GCC versions ago a conditional negate optimization was introduced > >> as a workaround for > >> PR45685.
