[PATCH][AArch64] Use conditional negate for abs expansion

2015-03-03 Thread Wilco Dijkstra
0, 1 sxtw x0, w0 eor x1, x0, x0, asr 63 sub x1, x1, x0, asr 63 mov x0, x1 ret After: adds w0, w0, 1 csneg w0, w0, w0, pl ret ChangeLog: 2015-03-03 Wilco Dijkstra * gcc/config/aarch64/aarch64.md (absdi2): opti
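The "before" code is the classic sign-mask absolute value. CSNEG is a conditional negate: it keeps the value when the flags say "pl" and negates it otherwise. The sketch below writes both formulations in C. The function names are illustrative and do not come from the patch. It assumes GCC/Clang semantics for right-shifting a negative value, which is an arithmetic shift. INT64_MIN is excluded because negating it overflows.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-mask abs, mirroring the eor/sub ... asr 63 sequence:
   mask is 0 for x >= 0 and all ones for x < 0, so (x ^ mask) - mask
   yields x or ~x + 1 == -x. Undefined for x == INT64_MIN. */
static int64_t abs_signmask(int64_t x)
{
    int64_t mask = x >> 63;   /* arithmetic shift assumed (GCC/Clang) */
    return (x ^ mask) - mask;
}

/* Conditional-negate abs: test the sign, then keep or negate.
   This is the shape that maps onto a compare plus CSNEG. */
static int64_t abs_condneg(int64_t x)
{
    return x >= 0 ? x : -x;
}
```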

[PATCH][AArch64] Make aarch64_min_divisions_for_recip_mul configurable

2015-03-03 Thread Wilco Dijkstra
This patch makes aarch64_min_divisions_for_recip_mul configurable for float and double. This allows CPUs with really fast or multiple dividers to return 3 (or even 4) if that happens to be faster overall. No code generation change - bootstrap & regression OK. ChangeLog: 2015-03-03 W
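The threshold sets how many divisions by the same divisor must be present before the compiler replaces them with one reciprocal followed by multiplies. The sketch below shows both forms at the source level and assumes a threshold of 3. The rewrite can change the last bit of a result, so GCC only performs it under -freciprocal-math (for example via -ffast-math). The function names are hypothetical.

```c
#include <assert.h>

/* Three divisions by the same divisor: three divide instructions. */
static void div3(double a, double b, double c, double d, double out[3])
{
    out[0] = a / d;
    out[1] = b / d;
    out[2] = c / d;
}

/* Reciprocal form: one divide, then three (cheaper) multiplies.
   Results may differ from div3 in the last ulp. */
static void div3_recip(double a, double b, double c, double d, double out[3])
{
    double r = 1.0 / d;
    out[0] = a * r;
    out[1] = b * r;
    out[2] = c * r;
}
```

On a core with fast or multiple dividers, the three divides may already be competitive. That is the case the configurable threshold is meant to cover.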

RE: [PATCH][AArch64] Make aarch64_min_divisions_for_recip_mul configurable

2015-03-03 Thread Wilco Dijkstra
> Andrew Pinski wrote: > On Tue, Mar 3, 2015 at 10:06 AM, Wilco Dijkstra wrote: > > This patch makes aarch64_min_divisions_for_recip_mul configurable for float > > and double. This > allows > > CPUs with really fast or multiple dividers to return 3 (or even 4) if that

RE: [PATCH][AArch64] Use conditional negate for abs expansion

2015-03-04 Thread Wilco Dijkstra
> Maxim Kuvyrkov wrote: > > You are removing the 2nd alternative that generates "abs" with your patch. > While I agree that > using "csneg" is faster on all implementations, can you say the same for > "abs"? Especially > given the fact that csneg requires 4 operands instead of abs'es 2? Yes, g

RE: [PATCH][AArch64] Use conditional negate for abs expansion

2015-03-04 Thread Wilco Dijkstra
> Maxim Kuvyrkov wrote: > > On Mar 4, 2015, at 3:30 PM, Wilco Dijkstra wrote: > > > >> Maxim Kuvyrkov wrote: > >> > >> You are removing the 2nd alternative that generates "abs" with your patch. > >> While I agree > that > >

[PATCH][AArch64] Fix aarch64_rtx_costs of PLUS/MINUS

2015-03-04 Thread Wilco Dijkstra
Include the cost of op0 and op1 in all cases in PLUS and MINUS in aarch64_rtx_costs. Bootstrap & regression OK. ChangeLog: 2015-03-04 Wilco Dijkstra * gcc/config/aarch64/aarch64.c (aarch64_rtx_costs): Calculate cost of op0 and op1 in PLUS and MINUS cases. --- gcc/co

RE: [PATCH] Remove inefficient branchless conditional negate optimization

2015-03-04 Thread Wilco Dijkstra
> Jeff Law wrote: > On 02/26/15 10:30, Wilco Dijkstra wrote: > > Several GCC versions ago a conditional negate optimization was introduced > > as a workaround > for > > PR45685. However the branchless expansion for conditional negate is > > extremely ine
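For context, the sketch below shows a common branchless formulation of `if (c) x = -x;` next to the plain select form. The snippet does not show the exact expansion that neg_replacement produced, so the arithmetic form here is an assumption. Even so, it shows why such a sequence costs more than a compare and a single conditional negate such as AArch64's CSNEG.

```c
#include <assert.h>

/* Plain conditional negate: a compare plus a select/negate, which a
   target with a conditional-negate instruction does in one step. */
static int cond_neg(int x, int c)
{
    return c ? -x : x;
}

/* Branchless arithmetic form (assumed shape): with c in {0, 1},
   -c is 0 or all ones, so x ^ -c is x or ~x, and adding c turns ~x
   into ~x + 1 == -x. It needs c materialized as 0/1 plus a negate,
   an XOR and an add. Like -x, it overflows for x == INT_MIN. */
static int cond_neg_branchless(int x, int c)
{
    return (x ^ -c) + c;
}
```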

[PATCH][AArch64] Fix Cortex-A53 shift costs

2015-03-05 Thread Wilco Dijkstra
This patch fixes the shift costs for Cortex-A53 so they are more accurate: immediate shifts use SBFM/UBFM, which take 2 cycles, while register-controlled shifts take 1 cycle. Bootstrap and regression OK. ChangeLog: 2015-03-05 Wilco Dijkstra * gcc/config/arm/aarch-cost-tables.h

RE: [PATCH] Remove inefficient branchless conditional negate optimization

2015-03-06 Thread Wilco Dijkstra
> So, OK with the testcase moved into gcc.target/i386/ I've moved it and changed the compile condition: /* { dg-do compile { target { ! { ia32 } } } } */ Jiong, can you commit this please? Wilco 2015-03-06 Wilco Dijkstra * gcc/tree-ssa-phiopt.c (neg_replacement): Remove.

[PATCH] AArch64: Add TARGET_SCHED_REASSOCIATION_WIDTH

2014-10-29 Thread Wilco Dijkstra
Wilco Dijkstra * gcc/config/aarch64/aarch64-protos.h (tune-params): Add reassociation tuning parameters. * gcc/config/aarch64/aarch64.c (TARGET_SCHED_REASSOCIATION_WIDTH): Define. (aarch64_reassociation_width): New function. (generic_tunings): Add reassociation
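TARGET_SCHED_REASSOCIATION_WIDTH tells the tree reassociation pass how many operations of a kind the target can execute in parallel. The pass then uses that figure to rebalance a linear chain. The sketch below illustrates the idea for eight addends. The balanced tree is only an example of the wider shape; the pass's exact output for a given width may differ. With integers both forms give the same result. For floating point, reassociation changes rounding and requires -fassociative-math.

```c
#include <assert.h>

/* Width 1: a linear chain, 7 dependent adds (critical path of 7). */
static long chain_serial(const long v[8])
{
    return ((((((v[0] + v[1]) + v[2]) + v[3]) + v[4]) + v[5]) + v[6]) + v[7];
}

/* Wider: the same 7 adds as a balanced tree (critical path of 3),
   so independent adds can issue in the same cycle. */
static long chain_tree(const long v[8])
{
    return ((v[0] + v[1]) + (v[2] + v[3])) + ((v[4] + v[5]) + (v[6] + v[7]));
}
```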

[PATCH] Improve spillcost of literal pool loads

2014-10-29 Thread Wilco Dijkstra
, however it is right thing to do for any constant, including constants in literal pools (which are typically not legitimate). Also use ALL_REGS rather than GENERAL_REGS as ALL_REGS has the correct floating point register costs. ChangeLog: 2014-10-29 Wilco Dijkstra * gcc/ira-costs.c

[PATCH][AArch64] Fix PR81800

2019-05-28 Thread Wilco Dijkstra
is to disable lrint/llrint on double if the size of a long is smaller (i.e. ilp32). Passes regress and bootstrap on AArch64. OK for commit? ChangeLog 2018-11-13 Wilco Dijkstra gcc/ PR target/81800 * gcc/config/aarch64/aarch64.md (lrint): Disable lrint pattern i

[PATCH][AArch64] Fix symbol offset limit

2019-05-28 Thread Wilco Dijkstra
, OK for commit? ChangeLog: 2018-11-09 Wilco Dijkstra gcc/ * config/aarch64/aarch64.c (aarch64_classify_symbol): Apply reasonable limit to symbol offsets. testsuite/ * gcc.target/aarch64/symbol-range.c (foo): Set new limit. * gcc.target/aarch64/symbol-r

[PATCH v2] Fix PR64242

2019-05-28 Thread Wilco Dijkstra
on AArch64 and x86-64. Inspected x86, Arm, Thumb-1 and Thumb-2 assembler which looks correct. ChangeLog: 2018-12-07 Wilco Dijkstra gcc/ PR middle-end/64242 * builtins.c (expand_builtin_longjmp): Add frame clobbers and schedule block. (expand_builtin_nonlocal_

[PATCH] Fix PR84521

2019-05-28 Thread Wilco Dijkstra
seems incorrect since the helper function moves the frame pointer value into the static chain register (so this patch does nothing to make it better or worse). AArch64 bootstrap OK, new test passes on AArch64, x86-64 and Arm. ChangeLog: 2018-12-13 Wilco Dijkstra gcc/ PR middle-end/

[PATCH] Fix alignment option parser (PR90684)

2019-05-30 Thread Wilco Dijkstra
Fix the alignment option parser to always allow up to 4 alignments. Now -falign-functions=16:8:8:8 no longer reports an error. OK for commit (and backport to GCC9)? ChangeLog: 2019-05-30 Wilco Dijkstra PR driver/90684 * gcc/opts.c (parse_and_check_align_values): Allow 4
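The -falign-* options accept up to four colon-separated values (n:m:n2:m2, per the GCC documentation). Below is a stand-alone sketch of a parser that accepts one to four fields and rejects anything longer. `parse_align` is a hypothetical name, and this is not the code in gcc/opts.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse "n[:m[:n2[:m2]]]" into vals. Returns the number of values
   (1..4), or -1 for an empty field, a negative value, trailing junk,
   or more than four fields. Hypothetical helper, not GCC's parser. */
static int parse_align(const char *s, int vals[4])
{
    int count = 0;
    for (;;) {
        char *end;
        long v = strtol(s, &end, 10);
        if (end == s || v < 0 || count == 4)
            return -1;
        vals[count++] = (int) v;
        if (*end == '\0')
            return count;
        if (*end != ':')
            return -1;
        s = end + 1;   /* continue after the ':' separator */
    }
}
```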

[PATCH][AArch64] Increase default function alignment

2019-05-31 Thread Wilco Dijkstra
With -mcpu=generic the function alignment is currently 8, however almost all supported cores prefer 16 or higher, so increase the default to 16:12. This gives ~0.2% performance increase on SPECINT2017, while codesize is 0.12% larger. ChangeLog: 2019-05-31 Wilco Dijkstra * config

Re: [PATCH][AArch64] Increase default function alignment

2019-05-31 Thread Wilco Dijkstra
Hi Steve, > I have no objection to the change but could the commit message and/or > comments be expanded to explain the ':12' part of this value.  I > couldn't find an explanation for it in the code and I don't understand > what it does. See https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.ht
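The linked GCC documentation says that -falign-functions=n:m aligns to n only if doing so skips at most m-1 bytes. Under that rule, 16:12 pads a function to a 16-byte boundary only when 11 bytes of padding or fewer are needed. The worked sketch below applies the rule and ignores the optional n2:m2 pair. `align_with_skip` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Start address of a function placed at addr under -falign-functions=n:m
   (n a power of two): align to n only if that needs <= m-1 padding bytes. */
static uint64_t align_with_skip(uint64_t addr, uint64_t n, uint64_t m)
{
    uint64_t pad = (n - (addr & (n - 1))) & (n - 1);  /* bytes to next n boundary */
    return pad <= m - 1 ? addr + pad : addr;
}
```

AArch64 instructions are 4-byte aligned. A function that would start at offset 4 modulo 16 needs 12 bytes of padding, so it is left where it is. Offsets 8 and 12 are padded to the next 16-byte boundary.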

Re: [AArch64] Use scvtf fbits option where appropriate

2019-06-13 Thread Wilco Dijkstra
Hi Joel, A few comments below: +/* If X is a positive CONST_DOUBLE with a value that is the reciprocal of a + power of 2 (i.e 1/2^n) return the number of float bits. e.g. for x==(1/2^n) + return log2 (n). Otherwise return 0. */ +int +aarch64_fpconst_pow2_recip (rtx x) +{ + REAL_VALUE_TYPE r

Re: [patch][aarch64]: fix unrecognizable insn for ldr got in ilp32 tiny

2019-06-13 Thread Wilco Dijkstra
Hi Sylvia, -(define_insn "ldr_got_tiny" +(define_insn "ldr_got_tiny_di" [(set (match_operand:DI 0 "register_operand" "=r") - (unspec:DI [(match_operand:DI 1 "aarch64_valid_symref" "S")] - UNSPEC_GOTTINYPIC))] + (unspec:DI + [(match_operand:DI 1 "aarch64_vali

[COMMITTED] Improve PR64242 testcase

2019-06-17 Thread Wilco Dijkstra
Clear the input array to avoid the testcase accidentally passing with an incorrect frame pointer. Committed as obvious. ChangeLog: 2019-06-17 Wilco Dijkstra testsuite/ PR middle-end/64242 * gcc.c-torture/execute/pr64242.c: Improve test. -- diff --git a/gcc/testsuite/gcc.c

Re: [PATCH] Fix PR84521

2019-06-17 Thread Wilco Dijkstra
Hi Jeff, > So I like the significant simplification here.  My worry is whether or > not this is, in effect, an ABI change.  ie, would we be able to mix and > match .o files from before/after this change which used the builtin > setjmp/longjmp bits? No it's not an ABI change. It does affect the va

Re: [AArch64] Use scvtf fbits option where appropriate

2019-06-18 Thread Wilco Dijkstra
Hi, And a few more comments: > +/* If X is a positive CONST_DOUBLE with a value that is the reciprocal of a > +   power of 2 (i.e 1/2^n) return the number of float bits. e.g. for > x==(1/2^n) > +   return n. Otherwise return -1.  */ > +int > +aarch64_fpconst_pow2_recip (rtx x) > +{ > + 
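To illustrate the behaviour the revised comment describes ("return n, otherwise return -1"), the sketch below is a stand-alone analogue that works on a plain `double` instead of an rtx/REAL_VALUE_TYPE. `pow2_recip_bits` is a hypothetical name. The sketch assumes IEEE-754 binary64 and restricts n to 1..64, which is the range of SCVTF's fbits immediate for a 64-bit source register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return n if x == 1/2^n with 1 <= n <= 64, else -1.
   Hypothetical double-only analogue, not the GCC helper itself. */
static int pow2_recip_bits(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);                  /* IEEE-754 binary64 assumed */
    uint64_t sign = bits >> 63;
    int exp = (int) ((bits >> 52) & 0x7ff);          /* biased exponent */
    uint64_t frac = bits & ((UINT64_C(1) << 52) - 1);
    if (sign || exp == 0 || exp == 0x7ff || frac != 0)
        return -1;   /* negative, zero/subnormal, inf/NaN, or not a power of 2 */
    int n = 1023 - exp;                              /* x == 2^(exp-1023) == 1/2^n */
    return (n >= 1 && n <= 64) ? n : -1;
}
```

With n = 3, an expression such as `(double) i * 0.125` can be emitted as `scvtf d0, x0, #3`. That instruction converts x0 as a fixed-point value with 3 fraction bits, which divides it by 8.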

Re: [PATCH] Fix PR84521

2019-06-18 Thread Wilco Dijkstra
Hi Max, > The testcase from the patch passes with the trunk xtensa-linux-gcc > with windowed ABI. But with the changes in this patch a lot of tests > that use longjmp are failing on xtensa-linux. Interesting. I looked at the _xtensa_nonlocal_goto implementation in libgcc/config/xtensa/lib2funcs.

Re: [PATCH] Fix PR84521

2019-06-18 Thread Wilco Dijkstra
Hi, > > Is this test valid?  Can jmp buffer be allowed on stack? > > Sure, the contents of the jmp buffer is only valid during the lifetime > of the call frame anyway. Indeed. The issue with jmp buffer being on the stack causing incorrect restore when doing longjmp has just been fixed (PR64242)

Re: [PATCH] Fix PR84521

2019-06-18 Thread Wilco Dijkstra
Hi Max, > It would work if a frame pointer was initialized in the function test, but > it wasn't: Right, because it unwinds, it needs a valid frame pointer since we no longer store the stack pointer. So xtensa_frame_pointer_required should do something like: if (cfun->machine->accesses_prev_fr

Re: [PATCH] Fix PR84521

2019-06-19 Thread Wilco Dijkstra
Hi Max, > On Tue, Jun 18, 2019 at 4:53 PM Wilco Dijkstra wrote: > > > It would work if a frame pointer was initialized in the function test, but > > > it wasn't: > > > > Right, because it unwinds, it needs a valid frame pointer since we no

Re: [PATCH] Adding RBIT gcc builtin for ARM

2019-06-20 Thread Wilco Dijkstra
Hi Ayan, Have you seen https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481? Adding support for a generic bitreverse builtin would be very useful since LLVM already supports this. Wilco
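For reference, Arm's RBIT reverses the bit order of a register in one instruction, and Clang exposes a generic form as `__builtin_bitreverse32`. The sketch below is a portable C equivalent of what a generic bit-reverse builtin would compute. `bitreverse32` is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the 32 bits of x: swap adjacent bits, then 2-bit pairs,
   nibbles, bytes and finally the two half-words. */
static uint32_t bitreverse32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
    return (x >> 16) | (x << 16);
}
```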

Re: [RFC][AArch64] Add support for system register based stack protector canary access

2018-12-03 Thread Wilco Dijkstra
Hi, Florian wrote: > For userland, I would like to eventually copy the OpenBSD approach for > architectures which have some form of PC-relative addressing: we can > have multiple random canaries in (RELRO) .rodata in sufficiently close > to the code that needs them (assuming that we have split .ro

[PATCH v2] Fix PR64242

2018-12-07 Thread Wilco Dijkstra
Log: 2018-12-07 Wilco Dijkstra gcc/ PR middle-end/64242 * builtins.c (expand_builtin_longjmp): Add frame clobbers and schedule block. (expand_builtin_nonlocal_goto): Likewise. testsuite/ PR middle-end/64242 * gcc.c-torture/execute/pr64242.c: Update test. --

Re: [PATCH v2] Fix PR64242

2018-12-07 Thread Wilco Dijkstra
Hi, Jakub Jelinek wrote: > On Fri, Dec 07, 2018 at 02:52:48PM +0000, Wilco Dijkstra wrote: >> -  struct __attribute__((aligned (32))) S { int a[4]; } s;    >>

Re: [PATCH v2] Fix PR64242

2018-12-07 Thread Wilco Dijkstra
Hi, Jakub Jelinek wrote: On Fri, Dec 07, 2018 at 04:19:22PM +, Wilco Dijkstra wrote: >> The test case doesn't need an aligned object to fail, so why did you add it? > > It needed it on i686, because otherwise it happened to see the value it > wanted in the caller's

Re: [RFA] [target/87369] Prefer "bit" over "bfxil"

2018-12-07 Thread Wilco Dijkstra
Hi, >> Ultimately, the best solution here will probably depend on which we >> think is more likely, copysign or the example I give above. > I'd tend to suspect we'd see more pure integer bit twiddling than the > copysign stuff. All we need to do is to clearly separate the integer and FP/SIMD case
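Both cases in this thread are masked merges, which take some bits from one value and the rest from another. On FP/SIMD registers that maps to BIT/BSL. On general registers, when the mask is a contiguous low-bit field, it maps to BFXIL. The sketch below shows an integer merge and copysign written as the same merge on the IEEE-754 bit pattern. The helper names are illustrative and binary64 doubles are assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bits of a where mask is 1, bits of b elsewhere. With mask == (1 << k) - 1
   this is a low-bit insert (BFXIL-like); in general it is BIT/BSL. */
static uint64_t merge_bits(uint64_t a, uint64_t b, uint64_t mask)
{
    return (a & mask) | (b & ~mask);
}

/* copysign as a masked merge: sign bit from y, magnitude bits from x. */
static double copysign_bits(double x, double y)
{
    uint64_t bx, by, r;
    double out;
    memcpy(&bx, &x, sizeof bx);
    memcpy(&by, &y, sizeof by);
    r = merge_bits(by, bx, UINT64_C(1) << 63);
    memcpy(&out, &r, sizeof out);
    return out;
}
```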

Re: [ping] allow target configurations to state R18 as reserved on arrch64

2018-12-12 Thread Wilco Dijkstra
Hi Oliver, +#define FIXED_R18 0    {                            \ 0, 0, 0, 0,   0, 0, 0, 0,    /* R0 - R7 */        \ 0, 0, 0, 0,   0, 0, 0, 0,    /* R8 - R15 */        \ -    0, 0, 0, 0,   0, 0, 0, 0,    /* R16 - R23 */        \ +    0, 0, FIXED_R18, 0, 0, 0, 0, 0,    /* R16 - R23 */  
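The quoted diff appears to replace the literal 0 for R18 in the fixed-register table with a FIXED_R18 macro. A configuration that reserves x18, which AAPCS64 designates as the platform register, can then mark it as fixed. The sketch below shows one way such a macro could default to "not fixed". The `#ifndef` default and the abbreviated array `fixed_regs_sketch` (R0 to R23 only) are assumptions for illustration and are not taken from the patch.

```c
#include <assert.h>

/* A target configuration that reserves x18 would define FIXED_R18 to 1
   before this point; otherwise x18 stays allocatable (assumed default). */
#ifndef FIXED_R18
#define FIXED_R18 0
#endif

/* Abbreviated stand-in for the first rows of the fixed-register table:
   1 = fixed (never allocated), 0 = available to the allocator. */
static const int fixed_regs_sketch[24] = {
    0, 0, 0, 0, 0, 0, 0, 0,          /* R0 - R7 */
    0, 0, 0, 0, 0, 0, 0, 0,          /* R8 - R15 */
    0, 0, FIXED_R18, 0, 0, 0, 0, 0,  /* R16 - R23 */
};
```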

Re: [ping] Change static chain to r11 on aarch64

2018-12-12 Thread Wilco Dijkstra
Hi, >> On 12 Dec 2018, at 18:21, Richard Earnshaw (lists) >> wrote: > >>  However, that introduces an issue that that >> code is potentially used across multiple versions of gcc, with >> potentially different choices of the static chain register.  Hmm, this >> might need some more careful though

Re: [ping] Change static chain to r11 on aarch64

2018-12-12 Thread Wilco Dijkstra
Hi Martin, > Does a non-executable stack actually improve security? Absolutely, it's like closing your front door rather than just leave it open for anyone. > For the alternative implementation using (custom) function > descriptors (-fno-trampolines) the static chain becomes > part of the ABI or

Re: [ping] Change static chain to r11 on aarch64

2018-12-13 Thread Wilco Dijkstra
Hi Martin, Uecker, Martin wrote: >Am Mittwoch, den 12.12.2018, 22:04 + schrieb Wilco Dijkstra: >> Hi Martin, >> >> > Does a non-executable stack actually improve security? >> >> Absolutely, it's like closing your front door rather than just leave i

Re: [ping] Change static chain to r11 on aarch64

2018-12-13 Thread Wilco Dijkstra
Hi Martin, > One could also argue that it creates a false sense of security > and diverts resources from properly fixing the real problems > i.e. the buffer overflows which lets an attacker write to the > stack in the first place. A program without buffer overflows > is secure even without an exec

[PATCH] Fix PR84521

2018-12-14 Thread Wilco Dijkstra
seems incorrect since the helper function moves the frame pointer value into the static chain register (so this patch does nothing to make it better or worse). AArch64 bootstrap OK, new test passes on AArch64, x86-64 and Arm. ChangeLog: 2018-12-13 Wilco Dijkstra gcc/ PR middle-end/

Re: [ping] Change static chain to r11 on aarch64

2018-12-17 Thread Wilco Dijkstra
Hi Hans-Peter, > While the choice of static-chain register does not affect the > ABI, it's the other way round: the choice of static-chain > register matters, specifically its call-clobberedness. Agreed. > It looks like the current aarch64 static-chain register R18 is > call-saved but without s

Re: [PATCH v4][C][ADA] use function descriptors instead of trampolines in C

2018-12-19 Thread Wilco Dijkstra
Hi, Jakub Jelinek wrote: > On Wed, Dec 19, 2018 at 07:53:48PM +, Uecker, Martin wrote: >> What do you think about making the trampoline a single call >> instruction and have a large memory region which is the same >> page mapped many times? This sounds like a good idea, but given a function d

Re: [PATCH v4][C][ADA] use function descriptors instead of trampolines in C

2018-12-20 Thread Wilco Dijkstra
Hi Martin, > There is a similar mechanism for pointer-to-member-functions > used by C++. Is this correct on aarch64? /* By default, the C++ compiler will use the lowest bit of the pointer    to function to indicate a pointer-to-member-function points to a    virtual member function.  However, if

Re: [ping] Change static chain to r11 on aarch64

2018-12-21 Thread Wilco Dijkstra
Hi Olivier, > I'm experimenting with the idea of adjusting the > stack probing code using r9 today, to see if it could > save/restore that reg if it happens to be the static chain > as well. > > If that can be made to work, maybe that would be a better > alternative than just swapping and have the

Re: [PATCH][GCC][Aarch64] Change expected bfxil count in gcc.target/aarch64/combine_bfxil.c to 18 (PR/87763)

2019-01-04 Thread Wilco Dijkstra
Hi Sam, This is a trivial test fix, so it falls under the obvious rule and can be committed without approval - https://www.gnu.org/software/gcc/svnwrite.html Cheers, Wilco

Re: [PATCH][ARM] Fix low reg issue in Thumb-2 movsi patterns

2019-07-25 Thread Wilco Dijkstra
Hi Richard, > I think this should be "lk*r", not "l*rk".  SP is only going to crop up > in rare circumstances, but we are always going to need this pattern if > it does and hiding this from register preferencing is pointless.  It's > not like the compiler is going to start allocating SP in the

[PATCH][ARM] Switch to default sched pressure algorithm

2019-07-29 Thread Wilco Dijkstra
ortex-a57 ChangeLog: 2019-07-29 Wilco Dijkstra * config/arm/arm.c (arm_option_override): Don't override sched pressure algorithm. -- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515

Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-07-30 Thread Wilco Dijkstra
Hi all, >On 30/07/2019 10:31, Ramana Radhakrishnan wrote: >> On 30/07/2019 10:08, Christophe Lyon wrote: >>> Hi Wilco, >>> >>> Do you know which benchmarks were used when this was checked-in? >>> It isn't clear from >>> https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html >> >> It

[COMMITTED][ARM] Adjust literal pool offset in Thumb-2 movsi patterns

2019-07-30 Thread Wilco Dijkstra
K on arm-none-linux-gnueabihf --with-cpu=cortex-a57, committed as obvious. [1] https://gcc.gnu.org/ml/gcc-patches/2019-07/msg01579.html ChangeLog: 2019-07-30 Wilco Dijkstra * config/arm/thumb2.md (thumb2_movsi_insn): Adjust literal offset. * config/arm/vfp.md (thumb2_movsi

Re: [PATCH][ARM] Cleanup logical DImode operations

2019-07-31 Thread Wilco Dijkstra
7-18  Wilco Dijkstra      * config/arm/arm.md (split and/eor/ior): Remove Neon check.     (split not): Add DImode not splitter.     (anddi3): Remove pattern.     (anddi3_insn): Likewise.     (anddi_zesidi_di): Likewise.     (anddi_sesdi_di): Likewise.     (anddi_notd
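On 32-bit Arm, a DImode value lives in a pair of SImode registers. Bitwise operations never move bits between the two halves, so splitting AND/IOR/XOR/NOT into two independent 32-bit operations is always exact. The sketch below models that split in C. The `di_pair` type and the helper names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* A DImode value as the low/high SImode register pair used on 32-bit Arm. */
typedef struct { uint32_t lo, hi; } di_pair;

static di_pair from_u64(uint64_t v) { return (di_pair){ (uint32_t) v, (uint32_t) (v >> 32) }; }
static uint64_t to_u64(di_pair p) { return ((uint64_t) p.hi << 32) | p.lo; }

/* Each 64-bit logical op becomes two independent 32-bit instructions. */
static di_pair di_and(di_pair a, di_pair b) { return (di_pair){ a.lo & b.lo, a.hi & b.hi }; }
static di_pair di_ior(di_pair a, di_pair b) { return (di_pair){ a.lo | b.lo, a.hi | b.hi }; }
static di_pair di_xor(di_pair a, di_pair b) { return (di_pair){ a.lo ^ b.lo, a.hi ^ b.hi }; }
static di_pair di_not(di_pair a) { return (di_pair){ ~a.lo, ~a.hi }; }
```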

Re: [PATCH][ARM] Cleanup DImode shifts

2019-07-31 Thread Wilco Dijkstra
01301.html ChangeLog: 2019-07-19  Wilco Dijkstra      * config/arm/iterators.md (qhs_extenddi_cstr): Update.     (qhs_extenddi_cstr): Likewise.     * config/arm/arm.md (ashldi3): Always expand early.     (ashlsi3): Likewise.     (ashrsi3): Likewise.     (zero_extenddi2): R
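Shifts are the DImode operations where the two halves do interact. The sketch below shows how a variable 64-bit shift decomposes into 32-bit operations. Arm register-controlled shifts by 32 or more yield 0, and a real expansion can exploit that. In C such shifts are undefined, so the sketch handles n == 0 and n >= 32 explicitly. All names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t lo, hi; } di_pair;   /* low/high SImode halves */

static di_pair from_u64(uint64_t v) { return (di_pair){ (uint32_t) v, (uint32_t) (v >> 32) }; }
static uint64_t to_u64(di_pair p) { return ((uint64_t) p.hi << 32) | p.lo; }

/* 64-bit left shift from 32-bit pieces: bits leaving lo enter hi. */
static di_pair di_shl(di_pair a, unsigned n)
{
    assert(n < 64);
    if (n == 0)
        return a;
    if (n >= 32)
        return (di_pair){ 0, a.lo << (n - 32) };
    return (di_pair){ a.lo << n, (a.hi << n) | (a.lo >> (32 - n)) };
}

/* 64-bit logical right shift: bits leaving hi enter lo. */
static di_pair di_lshr(di_pair a, unsigned n)
{
    assert(n < 64);
    if (n == 0)
        return a;
    if (n >= 32)
        return (di_pair){ a.hi >> (n - 32), 0 };
    return (di_pair){ (a.lo >> n) | (a.hi << (32 - n)), a.hi >> n };
}
```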

Re: [PATCH][ARM] Remove remaining Neon DImode support

2019-07-31 Thread Wilco Dijkstra
removed. Code generation is improved in all cases, saving another 400-500 instructions from the PR77308 testcase (total improvement is over 1700 instructions with -mcpu=cortex-a57 -O2). Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57 ChangeLog: 2019-07-19  W

Re: [PATCH][AArch64] Increase default function alignment

2019-07-31 Thread Wilco Dijkstra
ping   With -mcpu=generic the function alignment is currently 8, however almost all supported cores prefer 16 or higher, so increase the default to 16:12. This gives ~0.2% performance increase on SPECINT2017, while codesize is 0.12% larger. ChangeLog: 2019-05-31  Wilco Dijkstra

Re: [PATCH][AArch64] Fix symbol offset limit

2019-07-31 Thread Wilco Dijkstra
ch64, passes regress, OK for commit? ChangeLog: 2018-11-09  Wilco Dijkstra      gcc/     * config/aarch64/aarch64.c (aarch64_classify_symbol):     Apply reasonable limit to symbol offsets.     testsuite/     * gcc.target/aarch64/symbol-range.c (foo): Set new l

Re: [PATCH][AArch64] Fix PR81800

2019-07-31 Thread Wilco Dijkstra
The fix is to disable lrint/llrint on double if the size of a long is smaller (i.e. ilp32). Passes regress and bootstrap on AArch64. OK for commit? ChangeLog 2018-11-13  Wilco Dijkstra        gcc/     PR target/81800     * gcc/config/aarch64/aarch64.md (lrint): Disable

[COMMITTED] Fix pr89330_0.C test

2019-08-01 Thread Wilco Dijkstra
Fix pr89330_0.C test by adding missing effective target shared. Committed as obvious. ChangeLog: 2019-08-01 Wilco Dijkstra * gcc/testsuite/g++.dg/lto/pr89330_0.C: Add effective-target shared. -- diff --git a/gcc/testsuite/g++.dg/lto/pr89330_0.C b/gcc/testsuite/g++.dg/lto/pr89330_0.C

[PATCH] Add missing popcount simplifications (PR90693)

2019-08-13 Thread Wilco Dijkstra
Add simplifications for popcount (x) > 1 to (x & (x-1)) != 0 and popcount (x) == 1 into (x-1) gcc/ PR middle-end/90693 * match.pd: Add popcount simplifications. testsuite/ PR middle-end/90693 * gcc.dg/fold-popcount-5.c: Add new test. --- diff --git a/gcc/match.p
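The first identity holds because `x & (x-1)` clears the lowest set bit, so anything left over means at least two bits were set. The sketch below checks it exhaustively over a range of values against a reference popcount. For `popcount (x) == 1` it uses the familiar power-of-two test. The right-hand side in the snippet is truncated, so this is not necessarily the form the patch uses. Function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Reference popcount: each iteration clears the lowest set bit. */
static int popcount32(uint32_t x)
{
    int n = 0;
    for (; x; x &= x - 1)
        n++;
    return n;
}

/* popcount (x) > 1  <=>  something remains after clearing the lowest set bit. */
static int more_than_one_bit(uint32_t x)
{
    return (x & (x - 1)) != 0;
}

/* popcount (x) == 1  <=>  x is a nonzero power of two (one possible form). */
static int exactly_one_bit(uint32_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}
```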

Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-08-19 Thread Wilco Dijkstra
hf --with-cpu=cortex-a57 ChangeLog: 2019-07-29  Wilco Dijkstra      * config/arm/arm.c (arm_option_override): Don't override sched     pressure algorithm. -- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 81286cadf32f908e045d704128

Re: [PATCH][AArch64] Fix symbol offset limit

2019-08-19 Thread Wilco Dijkstra
nces.    Bootstrapped on AArch64, passes regress, OK for commit?    ChangeLog:  2018-11-09  Wilco Dijkstra    gcc/ * config/aarch64/aarch64.c (aarch64_classify_symbol): Apply reasonable limit to symbol offsets.   testsuite/ * gcc.target/aarch64/symbol-range.c

Re: [PATCH][ARM] Cleanup logical DImode operations

2019-08-19 Thread Wilco Dijkstra
mmit?    ChangeLog:  2019-07-18  Wilco Dijkstra    * config/arm/arm.md (split and/eor/ior): Remove Neon check. (split not): Add DImode not splitter. (anddi3): Remove pattern. (anddi3_insn): Likewise. (anddi_zesidi_di): Likewise. (anddi_sesd

Re: [PATCH][ARM] Cleanup DImode shifts

2019-08-19 Thread Wilco Dijkstra
org/ml/gcc-patches/2019-07/msg01301.html    ChangeLog:  2019-07-19  Wilco Dijkstra    * config/arm/iterators.md (qhs_extenddi_cstr): Update. (qhs_extenddi_cstr): Likewise. * config/arm/arm.md (ashldi3): Always expand early. (ashlsi3): Likewise.    

Re: [PATCH][ARM] Remove remaining Neon DImode support

2019-08-19 Thread Wilco Dijkstra
eLog:  2019-07-19  Wilco Dijkstra    * config/arm/arm.md (neon_for_64bits): Remove. (avoid_neon_for_64bits): Remove. (arm_adddi3): Always split early. (arm_subdi3): Always split early. (negdi2): Remove Neon expansion. (split zero_ex
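When arm_adddi3 and arm_subdi3 are split early, a 64-bit add becomes a low-half ADDS that produces a carry and a high-half ADC that consumes it, and subtraction uses SUBS/SBC the same way. The sketch below models that carry chain in C. The types and names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t lo, hi; } di_pair;   /* low/high SImode halves */

static di_pair from_u64(uint64_t v) { return (di_pair){ (uint32_t) v, (uint32_t) (v >> 32) }; }
static uint64_t to_u64(di_pair p) { return ((uint64_t) p.hi << 32) | p.lo; }

/* ADDS lo; ADC hi: the low-half wrap-around is the carry into the high half. */
static di_pair di_add(di_pair a, di_pair b)
{
    uint32_t lo = a.lo + b.lo;
    uint32_t carry = lo < a.lo;          /* unsigned wrap means carry out */
    return (di_pair){ lo, a.hi + b.hi + carry };
}

/* SUBS lo; SBC hi: a borrow from the low half is subtracted from the high half. */
static di_pair di_sub(di_pair a, di_pair b)
{
    uint32_t borrow = a.lo < b.lo;
    return (di_pair){ a.lo - b.lo, a.hi - b.hi - borrow };
}
```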

Re: [PATCH] Add missing popcount simplifications (PR90693)

2019-08-21 Thread Wilco Dijkstra
Hi Richard, > > > > I think this should be in expand stage where there could be comparison > > of the cost of the RTLs. > > I tend to agree here, if not then for the reason the "simplified" variants > have more GIMPLE stmts which means they are not "simpler".  In > fact I'd argue for canonicaliza

[PATCH][ARM] Deprecate -mneon-for-64bits

2019-08-23 Thread Wilco Dijkstra
-08-22 Wilco Dijkstra * gcc/config/arm/arm.opt (mneon-for-64bits): Deprecate. * gcc/config/arm/arm.h (TARGET_PREFER_NEON_64BITS): Remove. (prefer_neon_for_64bits): Remove. * gcc/config/arm/arm.c (prefer_neon_for_64bits): Remove. (tune_params): Remove

Re: [PATCH][ARM] Deprecate -mneon-for-64bits

2019-08-23 Thread Wilco Dijkstra
s option is deprecated and has no effect. @item -mslow-flash-data @opindex mslow-flash-data Updated patch: Deprecate -mneon-for-64bits since it no longer has any effect after the DImode codegen improvements. OK for commit? ChangeLog: 2019-08-23 Wilco Dijkstra * gcc/doc/invoke.

Re: [PATCH][ARM] Cleanup logical DImode operations

2019-08-28 Thread Wilco Dijkstra
Hi Christophe, > After this was committed (r274823), I've noticed 2 regressions on arm*: > FAIL: gcc.target/arm/pr53447-5.c scan-assembler-times (ldrd|vldr\\.64) 20 > FAIL: gcc.target/arm/pr53447-5.c scan-assembler-times (strd|vstr\\.64) 18 > > Does this test still pass for you? You're right, t

[PATCH][ARM] Add logical DImode expanders

2019-08-29 Thread Wilco Dijkstra
memory operands and immediates are handled more efficiently. Bootstrap OK on armhf, regress passes. ChangeLog: 2019-08-29 Wilco Dijkstra * config/arm/arm.md (anddi3): Expand explicitly. (iordi3): Likewise. (xordi3): Likewise. (one_cmpldi2): Likewise

Re: [PR91598] Improve autoprefetcher heuristic in haifa-sched.c

2019-08-29 Thread Wilco Dijkstra
Hi Maxim, >  It appears that cores with autoprefetcher hardware prefer loads and stores >bundled together, not interspersed with > other instructions to occupy the >rest of CPU units.   I don't believe it is as simple as that - modern cores have multiple prefetchers but won't prefer bund

Re: [PR91598] Improve autoprefetcher heuristic in haifa-sched.c

2019-08-29 Thread Wilco Dijkstra
Hi Alexander, > So essentially the main issue is not a hardware peculiarity, but rather the > bad schedule being totally wrong (it could only make sense if loads had > 1-cycle > latency, which they do not). The scheduling is only bad because the specific intrinsics used are mapped onto asm stat

Re: [PATCH][AArch64] Fix symbol offset limit

2019-09-02 Thread Wilco Dijkstra
nces.      Bootstrapped on AArch64, passes regress, OK for commit?      ChangeLog:   2018-11-09  Wilco Dijkstra       gcc/   * config/aarch64/aarch64.c (aarch64_classify_symbol):   Apply reasonable limit to symbol offsets.      testsuite/   * gcc.target/aar

Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-09-02 Thread Wilco Dijkstra
one-linux-gnueabihf --with-cpu=cortex-a57    ChangeLog:  2019-07-29  Wilco Dijkstra    * config/arm/arm.c (arm_option_override): Don't override sched pressure algorithm.    --    diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c  index 81286cadf32f908e045d704128

[PATCH][ARM] Remove support for MULS

2019-09-03 Thread Wilco Dijkstra
Remove various MULS/MLAS patterns which are enabled when optimizing for size. However the codesize gain from these patterns is so minimal that there is no point in keeping them. Bootstrap OK on armhf, regress passes. ChangeLog: 2019-09-03 Wilco Dijkstra * config/arm/arm.md

[PATCH][ARM] Cleanup multiply patterns

2019-09-03 Thread Wilco Dijkstra
Cleanup the 32-bit multiply patterns. Merge the pre-Armv6 with the Armv6 patterns, remove useless alternatives and order the accumulator operands to prefer MLA Ra, Rb, Rc, Ra whenever feasible. Bootstrap OK on armhf, regress passes. ChangeLog: 2019-09-03 Wilco Dijkstra * config/arm

[PATCH][ARM] Cleanup highpart multiply patterns

2019-09-03 Thread Wilco Dijkstra
Cleanup the various highpart multiply patterns using iterators. As a result the signed and unsigned variants and the pre-Armv6 multiply operand constraints are all handled in a single pattern and simple expander. Bootstrap OK on armhf, regress passes. ChangeLog: 2019-09-03 Wilco Dijkstra
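A highpart multiply returns the upper 32 bits of the full 64-bit product. On 32-bit Arm that is the high destination of SMULL or UMULL, with the low half discarded. The sketch below gives C equivalents of the signed and unsigned variants that the iterators merge. The names are illustrative, and GCC/Clang semantics are assumed for right-shifting a negative int64_t.

```c
#include <assert.h>
#include <stdint.h>

/* Signed highpart: upper 32 bits of the signed 64-bit product (SMULL hi). */
static int32_t smul_highpart(int32_t a, int32_t b)
{
    return (int32_t) (((int64_t) a * b) >> 32);   /* arithmetic shift assumed */
}

/* Unsigned highpart: upper 32 bits of the unsigned 64-bit product (UMULL hi). */
static uint32_t umul_highpart(uint32_t a, uint32_t b)
{
    return (uint32_t) (((uint64_t) a * b) >> 32);
}
```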

[PATCH][ARM] Cleanup 64-bit multiplies

2019-09-03 Thread Wilco Dijkstra
other DImode operations splitting early. Bootstrap OK on armhf, regress passes. ChangeLog: 2019-09-03 Wilco Dijkstra * config/arm/arm.md (maddsidi4): Remove expander. (mulsidi3adddi): Remove pattern. (mulsidi3adddi_v6): Likewise. (mulsidi3_nov6): Likewise
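The patterns this ChangeLog removes or merges describe widening 32x32->64 multiplies, with or without a 64-bit accumulator. On Arm these correspond to SMULL/UMULL and SMLAL/UMLAL. The sketch below gives their C semantics. The function names echo the standard pattern names for readability, but they are plain C helpers and not GCC code.

```c
#include <assert.h>
#include <stdint.h>

/* mulsidi3-style: signed 32x32 -> 64 multiply (SMULL). */
static int64_t mulsidi3(int32_t a, int32_t b)
{
    return (int64_t) a * b;
}

/* maddsidi4-style: 64-bit accumulate of a widening signed multiply (SMLAL). */
static int64_t maddsidi4(int32_t a, int32_t b, int64_t acc)
{
    return acc + (int64_t) a * b;
}

/* Unsigned counterpart (UMLAL). */
static uint64_t umaddsidi4(uint32_t a, uint32_t b, uint64_t acc)
{
    return acc + (uint64_t) a * b;
}
```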

Re: [PR91598] Improve autoprefetcher heuristic in haifa-sched.c

2019-09-03 Thread Wilco Dijkstra
Hi Maxim, >  > Autoprefetching heuristic is enabled only for cores that support it, and > isn't active for by default. >   > It's enabled on most cores, including the default (generic). So we do have to > be > careful that this doesn't regress any other benchmarks or do worse on modern > cores

[COMMITTED][GCC8][GCC9][AArch64] Backport PR81800

2019-09-04 Thread Wilco Dijkstra
nce llrint now also ignores FE_INVALID exceptions! The fix is to disable lrint/llrint on double if the size of a long is smaller (i.e. ilp32). ChangeLog 2018-11-13 Wilco Dijkstra gcc/ PR target/81800 * gcc/config/aarch64/aarch64.md (lrint): Disable lrint pattern if GPF

Re: [PATCH, AArch64, v3 0/6] LSE atomics out-of-line

2019-09-05 Thread Wilco Dijkstra
Hi Richard, >What I have not done, but is now a possibility, is to use a custom >calling convention for the out-of-line routines. I now only clobber >2 (or 3, for TImode) temp regs and set a return value. This would be a great feature to have since it reduces the overhead of outlinin

Re: [PATCH] PR tree-optimization/90836 Missing popcount pattern matching

2019-09-06 Thread Wilco Dijkstra
Hi, +(simplify + (convert +(rshift + (mult > is the outer convert really necessary? That is, if we change > the simplification result to Indeed that should be "convert?" to make it optional. > Is the Hamming weight popcount > faster than the libgcc table-based approach? I wonder if
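The quoted match.pd fragment, `(convert (rshift (mult ...`, appears to target the last step of the well-known SWAR ("Hamming weight") popcount: summing the four byte counts by multiplying by 0x01010101 and keeping the top byte. The sketch below gives a plain 32-bit C version of that idiom for reference. It is not taken from the patch or its testcase.

```c
#include <assert.h>
#include <stdint.h>

/* SWAR popcount: count bits in 2-, 4- and 8-bit fields, then add the
   four byte counts into the top byte with a single multiply. */
static uint32_t popcount_swar(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                  /* 2-bit field counts */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);  /* 4-bit field counts */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                  /* per-byte counts */
    return (x * 0x01010101u) >> 24;                    /* sum of bytes in top byte */
}
```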

[PATCH][ARM] Enable arm_legitimize_address for Thumb-2

2019-09-09 Thread Wilco Dijkstra
PECFP improves 0.2%. Bootstrap OK, OK for commit? ChangeLog: 2019-09-09 Wilco Dijkstra * config/arm/arm.c (arm_legitimize_address): Remove Thumb-2 bailout. -- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index a5a6a0fab1b4b7ef07931522e7d47e59842

[PATCH][ARM] Tweak HONOR_REG_ALLOC_ORDER

2019-09-09 Thread Wilco Dijkstra
? ChangeLog: 2019-09-09 Wilco Dijkstra * config/arm/arm.h (HONOR_REG_ALLOC_ORDER): Set when optimizing for size. -- diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index 8d023389eec469ad9c8a4e88edebdad5f3c23769..e3473e29fbbb964ff1136c226fbe30d35dbf7b39 100644 --- a/gcc/config/arm

Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-09-09 Thread Wilco Dijkstra
ess OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57      ChangeLog:   2019-07-29  Wilco Dijkstra       * config/arm/arm.c (arm_option_override): Don't override sched   pressure algorithm.      --      diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/

Re: [PATCH][ARM] Add logical DImode expanders

2019-09-09 Thread Wilco Dijkstra
which ensure memory operands and immediates are handled more efficiently. Bootstrap OK on armhf, regress passes. ChangeLog: 2019-08-29  Wilco Dijkstra      * config/arm/arm.md (anddi3): Expand explicitly.     (iordi3): Likewise.     (xordi3): Likewise.     (one_cmpldi2

Re: [PATCH][AArch64] Fix symbol offset limit

2019-09-09 Thread Wilco Dijkstra
and its references.        Bootstrapped on AArch64, passes regress, OK for commit?        ChangeLog:    2018-11-09  Wilco Dijkstra         gcc/    * config/aarch64/aarch64.c (aarch64_classify_symbol):    Apply reasonable limit to symbol offsets.        tests

Re: [PATCH][ARM] Remove support for MULS

2019-09-09 Thread Wilco Dijkstra
ping   Remove various MULS/MLAS patterns which are enabled when optimizing for size.  However the codesize gain from these patterns is so minimal that there is no point in keeping them. Bootstrap OK on armhf, regress passes. ChangeLog: 2019-09-03  Wilco Dijkstra      * config

Re: [PATCH][ARM] Cleanup multiply patterns

2019-09-09 Thread Wilco Dijkstra
ping   Cleanup the 32-bit multiply patterns.  Merge the pre-Armv6 with the Armv6 patterns, remove useless alternatives and order the accumulator operands to prefer MLA Ra, Rb, Rc, Ra whenever feasible. Bootstrap OK on armhf, regress passes. ChangeLog: 2019-09-03  Wilco Dijkstra

Re: [PATCH][ARM] Cleanup highpart multiply patterns

2019-09-09 Thread Wilco Dijkstra
  Wilco Dijkstra      * config/arm/arm.md (smulsi3_highpart): Use and iterators.     (smulsi3_highpart_nov6): Remove pattern.     (smulsi3_highpart_v6): Likewise.     (umulsi3_highpart): Likewise.     (umulsi3_highpart_nov6): Likewise.     (umulsi3_highpart_v6

Re: [PATCH][ARM] Cleanup 64-bit multiplies

2019-09-09 Thread Wilco Dijkstra
subreg issues due to other DImode operations splitting early. Bootstrap OK on armhf, regress passes. ChangeLog: 2019-09-03  Wilco Dijkstra      * config/arm/arm.md (maddsidi4): Remove expander.     (mulsidi3adddi): Remove pattern.     (mulsidi3adddi_v6): Likewise

[PATCH][ARM] Correctly set SLOW_BYTE_ACCESS

2019-09-11 Thread Wilco Dijkstra
or commit? ChangeLog: 2019-09-11 Wilco Dijkstra * config/arm/arm.h (SLOW_BYTE_ACCESS): Set to 1. -- diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index 8b92c830de09a3ad49420fdfacde02d8efc2a89b..11212d988a0f56299c2266bace80170d074be56c 100644 --- a/gcc/config/arm/arm.h

[PATCH][ARM] Enable code hoisting with -Os (PR80155)

2019-09-11 Thread Wilco Dijkstra
While code hoisting generally improves codesize, it can affect performance negatively. Benchmarking shows it doesn't help SPEC and negatively affects embedded benchmarks, so only enable code hoisting with -Os on Arm. Bootstrap OK, OK for commit? ChangeLog: 2019-09-11 Wilco Dij

Re: [PATCH] Fix PR 91708

2019-09-11 Thread Wilco Dijkstra
Hi Jeff, Jeff wrote: > Just to make sure I understand. Are you saying the addresses for the > MEMs are equal or the contents of the memory location are equal. > > For the former the alignment has to be the same, plain and simple, even > if GCC isn't aware the alignments have to be the same. > > F

Re: [PATCH][ARM] Correctly set SLOW_BYTE_ACCESS

2019-09-11 Thread Wilco Dijkstra
Hi Paul, > > On Sep 11, 2019, at 11:48 AM, Wilco Dijkstra wrote: > > > > Contrary to all documentation, SLOW_BYTE_ACCESS simply means accessing > > bitfields by their declared type, which results in better code generation > > on practically any target.  So set it

Re: [PATCH] Fix PR 91708

2019-09-11 Thread Wilco Dijkstra
Hi Jeff, > We're talking about two instructions where if the first executes, then > the second also executes.  If the memory addresses are the same, then > their alignment is the same. > > In your case the two instructions are on different execution paths and > are in fact mutually exclusive. S

Re: [PATCH][ARM] Enable code hoisting with -Os (PR80155)

2019-09-12 Thread Wilco Dijkstra
Hi Richard, > Do we document target specific deviations from "default" behavior somewhere? Not as far as I know. The other option changes in arm-common.c are not mentioned anywhere, neither is any of arm_option_override_internal. If we want to keep documentation useful, we shouldn't clutter the

Re: [PATCH, AArch64, v3 0/6] LSE atomics out-of-line

2019-09-16 Thread Wilco Dijkstra
Hi Richard, >> So what is the behaviour when you explicitly select a specific CPU? > > Selecting a specific cpu selects the specific architecture that the cpu > supports, does it not?  Thus the architecture example above still applies. > > Unless I don't understand what distinction that you're mak

Re: [PATCH][ARM] Enable code hoisting with -Os (PR80155)

2019-09-16 Thread Wilco Dijkstra
Hi Prathamesh, > My only concern with the patch is that the issue isn't specific to > code-hoisting. > For this particular case (reproducible with pr77445-2.c), disabling > jump threading > doesn't cause the register spill with hoisting enabled. > Likewise disabling forwprop3 and forwprop4 prevent

Re: [PATCH, AArch64, v3 0/6] LSE atomics out-of-line

2019-09-17 Thread Wilco Dijkstra
Hi Kyrill, >> When you select a CPU the goal is that we optimize and schedule for that >> specific microarchitecture. That implies using atomics that work best for >> that core rather than outlining them. > > I think we want to go ahead with this framework to enable the portable > deployment of L

Re: [ARM/FDPIC v6 13/24] [ARM] FDPIC: Force LSB bit for PC in Cortex-M architecture

2019-09-17 Thread Wilco Dijkstra
Hi Christophe, Can you explain this in more detail - it doesn't make sense to me to force the Thumb bit during unwinding since it should already be correct, even on a Thumb-only CPU. Perhaps the kernel code that pushes an incorrect address on the stack could be fixed instead? > Without this, when

Re: [PATCH][ARM] Enable code hoisting with -Os (PR80155)

2019-09-17 Thread Wilco Dijkstra
Hi Richard, > The issue with the bugzilla is that it lacked appropriate testcase(s) and thus > it is now a mess.  There are clear testcases (maybe not in the benchmarks you Agreed - it's not clear whether any of the proposed changes would actually help the original issue. My patch absolutely does

Re: [PATCH][ARM] Cleanup multiply patterns

2019-09-18 Thread Wilco Dijkstra
Hi Kyrill, >>  + (mult:SI (match_operand:SI 3 "s_register_operand" "r") >>  +  (match_operand:SI 2 "s_register_operand" "r"] > > Looks like we'll want to mark operand 2 here with '%' as well? That doesn't make any difference since both operands are identical. It only h

Re: [PATCH][ARM] Add logical DImode expanders

2019-09-18 Thread Wilco Dijkstra
Hi Kyrill, > We should be able to "compress" the above 3 patterns into one using code > iterators. Good point, that makes sense. I've committed this: ChangeLog: 2019-09-18 Wilco Dijkstra PR target/91738 * config/arm/arm.md (di3): Expand explicitly.
