Re: [PATCH v3] Optimize strchr to strlen

2016-09-28 Thread Wilco Dijkstra
les), but this simple patch fixes it: If strchr can't be folded in gimple-fold.c, break so folding code in builtins.c is also called. OK for commit? 2016-09-28 Wilco Dijkstra <wdijk...@arm.com> * gimple-fold.c (gimple_fold_builtin): After failing to fold strchr, also try the generi

Re: [PATCH] Increase lto-min-partition

2016-09-26 Thread Wilco Dijkstra
Prathamesh Kulkarni wrote: > Hi Wilco, > I am working on LTO varpool partitioning to improve performance for > section anchors. > I posted a preliminary patch at: > https://gcc.gnu.org/ml/gcc/2016-07/msg00033.html > Unfortunately I haven't yet been able to

Re: [PATCH] Increase lto-min-partition

2016-09-26 Thread Wilco Dijkstra
Markus Trippelsdorf wrote: > On 2016.09.26 at 09:42 +0200, Richard Biener wrote: > > On Sat, Sep 24, 2016 at 10:52 AM, Markus Trippelsdorf > > wrote: > > > On 2016.09.23 at 15:29 +0200, Richard Biener wrote: > > > If for example you limit partitions to 4 on a 4-core

Re: [PATCH] Increase lto-min-partition

2016-09-23 Thread Wilco Dijkstra
Markus Trippelsdorf wrote: > What I wanted to point out is that you of course lose the speedup you'll > get from parallel running backends with only a single partition. Absolutely. For every possible value of min-lto-partition you can find an application that will build with more parallelism if

Re: [PATCH] Increase lto-min-partition

2016-09-23 Thread Wilco Dijkstra
Richard Biener wrote: >On Fri, Sep 23, 2016 at 3:02 PM, Markus Trippelsdorf >wrote: > > And tramp3d only uses ten partitions (lto-min-partition=1). > > With lto-min-partition=5 (current patch) this decreases to only two > > partitions. As a result we lose the

[PATCH v3] Optimize strchr to strlen

2016-09-23 Thread Wilco Dijkstra
: 2016-09-23 Wilco Dijkstra <wdijk...@arm.com> gcc/ * gcc/gimple-fold.c (gimple_fold_builtin_strchr): New function to optimize strchr (s, 0) to strlen. (gimple_fold_builtin): Add BUILT_IN_STRCHR case. testsuite/ * gcc/testsuite/gcc.dg/strlenopt-20.c: Updat

[PATCH] Increase lto-min-partition

2016-09-22 Thread Wilco Dijkstra
ARM server. ChangeLog: 2016-09-22 Wilco Dijkstra <wdijk...@arm.com> gcc/ * params.def (MIN_PARTITION_SIZE): Increase to 5. -- diff --git a/gcc/params.def b/gcc/params.def index 79b7dd4cca9ec1bb67a64725fb1a596b6e937419..da8fd1825e15f2aa800b1c8b680985776c1080ed 100644 ---

Re: [PATCH v2][AArch64] Fix symbol offset limit

2016-09-21 Thread Wilco Dijkstra
ChangeLog: 2016-09-12  Wilco Dijkstra  <wdijk...@arm.com>     gcc/     * config/aarch64/aarch64.c (aarch64_classify_symbol):     Apply reasonable limit to symbol offsets.     testsuite/     * gcc.target/aarch64/symbol-range.c (foo): Set new limit.     * gcc.target/aarch64/symb

Re: [PATCH][AArch64] Improve stack adjustment

2016-09-21 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 23 August 2016 15:49 To: Richard Earnshaw; GCC Patches Cc: nd Subject: Re: [PATCH][AArch64] Improve stack adjustment   ping     Richard Earnshaw wrote: > I see you've added a default argument for your new parameter.  I think > doing that is fine, but

Re: [PATCH][AArch64] Align FP callee-saves

2016-09-21 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 08 September 2016 14:35 To: GCC Patches Cc: nd Subject: [PATCH][AArch64] Align FP callee-saves   If the number of integer callee-saves is odd, the FP callee-saves use 8-byte aligned LDP/STP.  Since 16-byte alignment may be faster on some CPUs, align the FP

Re: [PATCH][AArch64 - v3] Simplify eh_return implementation

2016-09-21 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 02 September 2016 12:31 To: Ramana Radhakrishnan; GCC Patches Cc: nd Subject: Re: [PATCH][AArch64 - v3] Simplify eh_return implementation   Ramana Radhakrishnan wrote: > Can you please file a PR for this and add some testcases ?  This sounds like > a s

Re: [PATCH] Optimize strchr (s, 0) to strlen

2016-09-16 Thread Wilco Dijkstra
Bernd Schmidt wrote: > On 09/15/2016 03:38 PM, Wilco Dijkstra wrote: > > __rawmemchr is not the fastest on any target I tried, including x86, > > Interesting. Care to share your test program? I just looked at the libc > sources and strlen/rawmemchr are practically identical cod

Re: [PATCH] Optimize strchr (s, 0) to strlen

2016-09-15 Thread Wilco Dijkstra
Jakub Jelinek > > Those are the generic definitions, all targets that care about performance > obviously should replace them with assembly code. No, that's exactly my point, it is not true that it is always best to write assembly code. For example there is absolutely no

Re: [PATCH] Optimize strchr (s, 0) to strlen

2016-09-15 Thread Wilco Dijkstra
From: Jakub Jelinek <ja...@redhat.com> > On Thu, Sep 15, 2016 at 02:55:48PM +0000, Wilco Dijkstra wrote: >> stpcpy is not conceptually the same, but for mempcpy, yes. By default >> it's converted into memcpy in the GLIBC headers and the generic >> implementation

Re: [PATCH] Optimize strchr (s, 0) to strlen

2016-09-15 Thread Wilco Dijkstra
Jakub Jelinek wrote: On Thu, Sep 15, 2016 at 03:16:52PM +0100, Szabolcs Nagy wrote: > > > > from libc point of view, rawmemchr is a rarely used > > nonstandard function that should be optimized for size. > > (glibc does not do this now, but it should in my opinion.) > > rawmemchr with 0 is to

Re: [PATCH] Optimize strchr (s, 0) to strlen

2016-09-15 Thread Wilco Dijkstra
Richard Biener wrote: > On Wed, Sep 14, 2016 at 3:45 PM, Jakub Jelinek wrote: > > On Wed, Sep 14, 2016 at 03:41:33PM +0200, Richard Biener wrote: > >> > We've seen several different proposals for where/how to do this > >> > simplification, why did you > >> > say strlenopt is

Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-15 Thread Wilco Dijkstra
Tamar Christina wrote: > On 13/09/16 13:43, Joseph Myers wrote: > > On Tue, 13 Sep 2016, Tamar Christina wrote: >> > >> On 12/09/16 23:33, Joseph Myers wrote: > >>> Why is this boolean false for ieee_quad_format, mips_quad_format and > >>> ieee_half_format?  They should meet your description (even

Re: [PATCH] Optimize strchr (s, 0) to strlen

2016-09-13 Thread Wilco Dijkstra
Richard Biener <richard.guent...@gmail.com> wrote: > On Wed, May 18, 2016 at 2:29 PM, Wilco Dijkstra <wilco.dijks...@arm.com> > wrote: >> Richard Biener wrote: >> >>> Yeah ;) I'm currently bootstrapping/testing the patch that makes it >>> poss

Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-13 Thread Wilco Dijkstra
Jakub wrote: > On Mon, Sep 12, 2016 at 04:19:32PM +, Tamar Christina wrote: > > This patch adds an optimized route to the fpclassify builtin > > for floating point numbers which are similar to IEEE-754 in format. > > > > The goal is to make it faster by: > > 1. Trying to determine the most

Re: [PATCH v2][AArch64] Fix symbol offset limit

2016-09-12 Thread Wilco Dijkstra
016-09-12 Wilco Dijkstra <wdijk...@arm.com> gcc/ * config/aarch64/aarch64.c (aarch64_classify_symbol): Apply reasonable limit to symbol offsets. testsuite/ * gcc.target/aarch64/symbol-range.c (foo): Set new limit. * gcc.target/aarch64/symbol-ra

Re: [PATCH][AArch64 - v3] Simplify eh_return implementation

2016-09-12 Thread Wilco Dijkstra
s various bugs in aarch64_final_eh_return_addr, which does not work with -fomit-frame-pointer, alloca or outgoing arguments. Bootstrap OK, GCC Regression OK, OK for trunk? Would it be useful to backport this to GCC6.x? ChangeLog: 2016-09-02  Wilco Dijkstra  <wdijk...@arm.com>

[PATCH][AArch64] Align FP callee-saves

2016-09-08 Thread Wilco Dijkstra
callee-saves, the generated code doesn't change. Bootstrap and regression pass, OK for commit? ChangeLog: 2016-09-08 Wilco Dijkstra <wdijk...@arm.com> * config/aarch64/aarch64.c (aarch64_layout_frame): Align FP callee-saves. -- diff --git a/gcc/config/aarch64/aarch64.c

Re: [PATCH][AArch64] Improve legitimize_address

2016-09-08 Thread Wilco Dijkstra
Christophe Lyon wrote: > After this patch, I've noticed a regression: > FAIL: gcc.target/aarch64/ldp_vec_64_1.c scan-assembler ldp\td[0-9]+, d[0-9] > You probably need to adjust the scan pattern. The original code was better and what we should generate. IVOpt chooses to use indexing even though it

[PATCH][AArch64] Improve legitimize_address

2016-09-06 Thread Wilco Dijkstra
] ldr w0, [x0, 3424] add w0, w1, w0 ret OK for trunk? ChangeLog: 2016-09-06 Wilco Dijkstra <wdijk...@arm.com> gcc/ * config/aarch64/aarch64.c (aarch64_legitimize_address): Avoid use of base_offset if offset already in range. -- diff --git a/gcc/

Re: [PATCH][AArch64] Improve stack adjustment

2016-09-06 Thread Wilco Dijkstra
ion using simple wrapper functions and no default parameters: ChangeLog: 2016-08-10  Wilco Dijkstra  <wdijk...@arm.com> gcc/     * config/aarch64/aarch64.c (aarch64_add_constant_internal):     Add extra argument to allow emitting the move immediate.     Use add/su

Re: [PATCH][AArch64 - v3] Simplify eh_return implementation

2016-09-02 Thread Wilco Dijkstra
ious bugs in aarch64_final_eh_return_addr, which does not work with -fomit-frame-pointer, alloca or outgoing arguments. Bootstrap OK, GCC Regression OK, OK for trunk? Would it be useful to backport this to GCC6.x? ChangeLog: 2016-09-02 Wilco Dijkstra <wdijk...@arm.com> PR77455 gcc/

Re: [PATCH][AArch64 - v2] Simplify eh_return implementation

2016-09-01 Thread Wilco Dijkstra
or outgoing arguments. Bootstrap OK, GCC Regression OK, OK for trunk? Would it be useful to backport this to GCC6.x? ChangeLog: 2016-08-10  Wilco Dijkstra  <wdijk...@arm.com> gcc/     * config/aarch64/aarch64.md (eh_return): Remove pattern and splitter.     * config/aarch64/aar

Re: [PATCH][AArch64] Fix symbol offset limit

2016-08-26 Thread Wilco Dijkstra
mostly QoI, what we have here is a case where legal code fails to link due to optimization. The original example is from GCC itself, the fixed_regs array is small but due to optimization we can end up with _regs + 0x. Wilco > ChangeLog: > 2016-08-23  Wilco Dijkstra  &

Re: [PATCH][AArch64 - v2] Simplify eh_return implementation

2016-08-26 Thread Wilco Dijkstra
Jiong Wang wrote: > The -fomit-frame-pointer is really broken on aarch64_find_eh_return_addr Yes, that's a good conclusion. However even with the frame pointer there are cases that fail, for example it will access LR off SP even after alloca. In fact we're lucky it

Re: [PATCH][AArch64] Add legitimize_address_displacement hook

2016-08-23 Thread Wilco Dijkstra
trap, GCC regression OK. ChangeLog: 2016-08-10  Wilco Dijkstra  <wdijk...@arm.com>     gcc/     * config/aarch64/aarch64.c (aarch64_legitimize_address_displacement):     New function.     (TARGET_LEGITIMIZE_ADDRESS_DISPLACEMENT): Define. -- diff --git a/gcc/config/aarch64/aarch64

Re: [PATCH][AArch64] Improve stack adjustment

2016-08-23 Thread Wilco Dijkstra
ion using simple wrapper functions and no default parameters: ChangeLog: 2016-08-10  Wilco Dijkstra  <wdijk...@arm.com> gcc/     * config/aarch64/aarch64.c (aarch64_add_constant_internal):     Add extra argument to allow emitting the move immediate.     Use add/su

Re: [PATCH][AArch64 - v2] Simplify eh_return implementation

2016-08-23 Thread Wilco Dijkstra
arguments. Bootstrap OK, GCC Regression OK, OK for trunk? Would it be useful to backport this to GCC6.x? ChangeLog: 2016-08-10 Wilco Dijkstra <wdijk...@arm.com> gcc/ * config/aarch64/aarch64.md (eh_return): Remove pattern and splitter. * config/aarch64/aar

[PATCH][AArch64] Fix symbol offset limit

2016-08-23 Thread Wilco Dijkstra
failing with symbol out of range. OK for commit? As this is a latent bug, OK to backport to GCC6.x? ChangeLog: 2016-08-23 Wilco Dijkstra <wdijk...@arm.com> gcc/ * config/aarch64/aarch64.c (aarch64_classify_symbol): Apply reasonable limit to symbol offsets. tes

Re: [PATCH][AArch64] Add legitimize_address_displacement hook

2016-08-10 Thread Wilco Dijkstra
egression OK. ChangeLog: 2016-08-10 Wilco Dijkstra <wdijk...@arm.com> gcc/ * config/aarch64/aarch64.c (aarch64_legitimize_address_displacement): New function. (TARGET_LEGITIMIZE_ADDRESS_DISPLACEMENT): Define. -- diff --git a/gcc/config/aarch64/aarch64.c b/gcc

Re: [PATCH][AArch64 - v2] Simplify eh_return implementation

2016-08-10 Thread Wilco Dijkstra
arguments. Bootstrap OK, GCC Regression OK, OK for trunk? Would it be useful to backport this to GCC6.x? ChangeLog: 2016-08-10 Wilco Dijkstra <wdijk...@arm.com> gcc/ * config/aarch64/aarch64.md (eh_return): Remove pattern and splitter. * config/aarch64/aar

Re: [PATCH][AArch64] Improve stack adjustment

2016-08-10 Thread Wilco Dijkstra
mple wrapper functions and no default parameters: ChangeLog: 2016-08-10 Wilco Dijkstra <wdijk...@arm.com> gcc/ * config/aarch64/aarch64.c (aarch64_add_constant_internal): Add extra argument to allow emitting the move immediate. Use add/sub with positive immediate

[PATCH][AArch64] Simplify eh_return implementation

2016-08-04 Thread Wilco Dijkstra
implementation also fixes various bugs in aarch64_final_eh_return_addr, which does not work with -fomit-frame-pointer, alloca or outgoing arguments. Bootstrap OK, GCC Regression OK, OK for trunk? Would it be useful to backport this to GCC6.x? ChangeLog: 2016-08-04 Wilco Dijkstra <wdijk...@arm.

[PATCH][AArch64] Improve stack adjustment

2016-08-04 Thread Wilco Dijkstra
ret Passes GCC regression tests. ChangeLog: 2016-08-04 Wilco Dijkstra <wdijk...@arm.com> gcc/ * config/aarch64/aarch64.c (aarch64_add_constant): Add extra argument to allow emitting the move immediate. Use add/sub with positive imm

[PATCH][AArch64] Add legitimize_address_displacement hook

2016-08-04 Thread Wilco Dijkstra
add x1, sp, x1 str wzr, [x1] add x1, sp, x2 str wzr, [x1] add x1, sp, x3 str wzr, [x1] ldr w0, [sp, w0, sxtw 2] add sp, sp, 32768 ret Bootstrap, GCC regression OK. ChangeLog: 2016-08-04 Wilco Dijkstra

[PATCH][AArch64] Optimize prolog/epilog

2016-07-29 Thread Wilco Dijkstra
(if frame_pointer_needed) 4. stp reg3, reg4, [sp, callee_offset + N*16] (store remaining callee-saves) 5. sub sp, sp, final_adjust The epilog reverses this, and may omit step 3 if alloca wasn't used. Bootstrap, GCC & gdb regression OK. ChangeLog: 2016-07-29 Wilco Dijk

[PATCH][AArch64] Cleanup frame push/pop code

2016-07-26 Thread Wilco Dijkstra
This patch improves the readability of the prolog and epilog code by moving some code into separate functions. There is no difference in generated code. OK for commit? ChangeLog: 2016-07-26 Wilco Dijkstra <wdijk...@arm.com> gcc/ * config/aarch64/aarch64.c (aarch64_pushwb_pa

Re: [PATCH 2/3][AArch64] Improve zero extend

2016-07-20 Thread Wilco Dijkstra
Richard Earnshaw wrote: > So under what circumstances does it lead to sub-optimal code? If the cost is incorrect Combine can make the wrong decision, for example whether to emit a multiply-add or not. I'm not sure whether this still happens as Kyrill fixed several issues in Combine since this

Re: [PATCH 2/3][AArch64] Improve zero extend

2016-07-20 Thread Wilco Dijkstra
Richard Earnshaw wrote: > Both of which look reasonable to me. Yes the code we generate for these examples is fine, I don't believe this example ever went bad. It's just the cost calculation that is incorrect with the outer check. Wilco

Re: [PATCH 2/3][AArch64] Improve zero extend

2016-07-20 Thread Wilco Dijkstra
Richard Earnshaw wrote: > Why does combine care what the cost is if the instruction isn't valid? No idea. Combine does lots of odd things that don't make sense to me. Unfortunately the costs we give for cases like this need to be accurate or they negatively affect code quality. The reason for

Re: [PATCH 2/3][AArch64] Improve zero extend

2016-07-20 Thread Wilco Dijkstra
Richard Earnshaw wrote: > I'm not sure about this, while rtx_cost is called recursively as it > walks the RTL, I'd normally expect the outer levels of the recursion to > catch the cases where zero-extend is folded into a more complex > operation.  Hitting a case like this suggests that something

[PATCH 3/3][AArch64] Improve zero extend

2016-07-19 Thread Wilco Dijkstra
where UBFM has the same performance as AND, and minor speedups across several benchmarks on an implementation where UBFM is slower than AND. Bootstrapped and tested on aarch64-none-elf. 2016-07-19 Kristina Martsenko <kristina.martse...@arm.com> 2016-07-19 Wilco Dijkstra <wdijk..

[PATCH 2/3][AArch64] Improve zero extend

2016-07-19 Thread Wilco Dijkstra
When zero extending a 32-bit value to 64 bits, there should always be a SET operation on the outside, according to the patterns in aarch64.md. However, the mid-end can also ask for the cost of a made-up instruction, where the zero-extend is part of another operation, not SET. In this case we

[PATCH 1/3][AArch64] Improve zero extend

2016-07-19 Thread Wilco Dijkstra
This patchset improves zero extend costs and code generation. When zero extending a 32-bit register, we emit a "mov", but currently report the cost of the "mov" incorrectly. In terms of speed, we currently say the cost is that of an extend operation. But the cost of a "mov" is the cost of 1
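As a rough illustration of the operation being costed here: in C, a 32-bit to 64-bit zero extension is just a widening conversion of an unsigned value. On AArch64, writing a 32-bit W register clears the upper 32 bits of the corresponding X register, which is why a plain 32-bit "mov" is enough. The name zext32 is made up for this sketch and does not come from the patch.

```c
#include <stdint.h>

/* Hypothetical helper (not from the patch): widen a 32-bit unsigned
   value to 64 bits.  The upper 32 bits of the result are always zero. */
uint64_t zext32(uint32_t x)
{
    return (uint64_t)x;
}
```

The instruction a compiler actually selects for this depends on the target and options; the sketch only shows the source-level operation.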

[PATCH][ARM][Testsuite] Fix prototype in vst1Q_laneu64-1.c

2016-07-06 Thread Wilco Dijkstra
Fix prototype in vst1Q_laneu64-1.c to unsigned char* so it passes. Committed as trivial fix. ChangeLog 2016-07-06 Wilco Dijkstra <wdijk...@arm.com> gcc/testsuite/ * gcc.target/arm/vst1Q_laneu64-1.c (foo): Use unsigned char*. --- diff --git a/gcc/testsuite/gcc.targ

[PATCH][AArch64] Improve Cortex-A53 integer scheduler

2016-07-05 Thread Wilco Dijkstra
This patch improves the accuracy of the Cortex-A53 integer scheduler, resulting in performance gains across a wide range of benchmarks. OK for commit? ChangeLog: 2016-07-05 Wilco Dijkstra <wdijk...@arm.com> * config/arm/cortex-a53.md: Use final_presence_set for in

Re: [PATCH][AArch64] Increase code alignment

2016-06-30 Thread Wilco Dijkstra
Evandro Menezes wrote: On 06/29/16 07:59, James Greenhalgh wrote: > On Tue, Jun 21, 2016 at 02:39:23PM +0100, Wilco Dijkstra wrote: >> ping >> >> >> From: Wilco Dijkstra >> Sent: 03 June 2016 11:51 >> To: GCC Patches >> Cc: nd; philipp.toms...@theobrom

[PATCH][AArch64] Canonicalize Cortex core tunings

2016-06-29 Thread Wilco Dijkstra
code is now more similar as well as more optimal across Cortex cores. Regress passes, OK for commit? ChangeLog: 2016-06-29 Wilco Dijkstra <wdijk...@arm.com> * config/aarch64/aarch64.c (cortexa35_tunings): Enable AES fusion. Use cortexa57_branch_cost. (cortexa53_t

[PATCH] Fix big-endian bswap

2016-06-23 Thread Wilco Dijkstra
number never matches in big-endian. The test gcc.dg/optimize-bswapsi-4.c now passes on AArch64, no other changes. OK for commit? ChangeLog: 2016-06-23 Wilco Dijkstra <wdijk...@arm.com> * gcc/tree-ssa-math-opts.c (find_bswap_or_nop_1): Adjust bitnumbering for big-

[Committed][testsuite] Ensure vrnd* tests run on ARMv8 cores

2016-06-21 Thread Wilco Dijkstra
to return true for AArch64 so these tests are run on AArch64 too. Committed as trivial patch in r237653. ChangeLog: 2016-06-21 Wilco Dijkstra <wdijk...@arm.com> gcc/testsuite/ * gcc.target/aarch64/advsimd-intrinsics/vrnd.c (dg-require-effective-target): Use arm_v8_n

[Committed][testsuite] Fix tree-ssa/attr-hotcold-2.c failures

2016-06-21 Thread Wilco Dijkstra
Fix tree-ssa/attr-hotcold-2.c failures now that the test runs. GCC dumps the blocks 3 times so update count to 3 and the test passes. ChangeLog: 2016-06-21  Wilco Dijkstra  <wdijk...@arm.com>     gcc/testsuite/     * gcc.dg/tree-ssa/attr-hotcold-2.c (scan-tree-dump-times):     Set to 3 s

[Committed][testsuite] Fix vect-8.f90 test

2016-06-21 Thread Wilco Dijkstra
Due to recent improvements to the vectorizer, the number of vectorized loops needs to be increased to 21 in gfortran.dg/vect/vect-8.f90. Confirmed this test now passes on AArch64. Committed as trivial patch in r237650. ChangeLog: 2016-06-21 Wilco Dijkstra <wdijk...@arm.

Re: [PATCH][AArch64] Increase code alignment

2016-06-21 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 03 June 2016 11:51 To: GCC Patches Cc: nd; philipp.toms...@theobroma-systems.com; pins...@gmail.com; jim.wil...@linaro.org; benedikt.hu...@theobroma-systems.com; Evandro Menezes Subject: [PATCH][AArch64] Increase code alignment   Increase loop alignment

[PATCH][AArch64] Cleanup -mpc-relative-loads

2016-06-03 Thread Wilco Dijkstra
This patch cleans up the -mpc-relative-loads option processing. Rename to avoid the "no*" name and confusing !no* expressions. Fix the option processing code to implement -mno-pc-relative-loads rather than ignore it. OK for commit? ChangeLog: 2016-06-03 Wilco Dijkstra <wdi

[PATCH][AArch64] Increase code alignment

2016-06-03 Thread Wilco Dijkstra
agree on alignment of 16 for function, and 8 for loops and branches, so we should change -mcpu=generic as well if there is no disagreement - feedback welcome. OK for commit? ChangeLog: 2016-05-03 Wilco Dijkstra <wdijk...@arm.com> * gcc/config/aarch64/aarch64.c (cortexa53_t

Re: [PATCH][AArch64] Improve aarch64_modes_tieable_p

2016-06-02 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 17 May 2016 17:08 To: James Greenhalgh Cc: gcc-patches@gcc.gnu.org; nd Subject: Re: [PATCH][AArch64] Improve aarch64_modes_tieable_p James Greenhalgh wrote: > It would be handy if you could raise something in bugzi

[PATCH][AArch64] Add missing fcsel in Cortex-A57 scheduler

2016-06-02 Thread Wilco Dijkstra
The Cortex-A57 scheduler is missing fcsel, so add it. OK for commit? ChangeLog: 2016-06-02 Wilco Dijkstra <wdijk...@arm.com> * config/arm/cortex-a57.md (cortex_a57_fp_cpys): Add fcsel. --- diff --git a/gcc/config/arm/cortex-a57.md b/gcc/config/arm/cortex-a57.md

Re: [PATCH][AArch64] Remove aarch64_cannot_change_mode_class

2016-05-27 Thread Wilco Dijkstra
other fixes in that bug. Sure. > > ChangeLog: > 2016-05-19 Wilco Dijkstra <wdijk...@arm.com> > > * gcc/config/aarch64/aarch64.h > (CANNOT_CHANGE_MODE_CLASS): Remove. > * gcc/config/aarch64/aarch64.c > (aarch64_cannot_change_mode_class): Remove fun

Re: [PATCH][AArch64] Adjust SIMD integer preference

2016-05-26 Thread Wilco Dijkstra
James Greenhalgh wrote: > I really don't like [1][2][3] this technique of attempting to work around > register allocator issues using the disparaging mechanisms. I don't see the issue as it is a standard mechanism to describe higher cost to the register allocator. On the other hand the use of '*'

Re: [PATCH][AArch64] Improve aarch64_case_values_threshold setting

2016-05-24 Thread Wilco Dijkstra
Jim Wilson wrote: > It looks like a slight loss on qdf24xx on SPEC CPU2006 at -O3. I see > about a 0.37% loss on the integer benchmarks, and no significant > change on the FP benchmarks. The integer loss is mainly due to > 458.sjeng which drops 2%. We had tried various values for >

[PATCH][AArch64] Remove aarch64_cannot_change_mode_class

2016-05-19 Thread Wilco Dijkstra
Remove aarch64_cannot_change_mode_class as the underlying issue (PR67609) has been resolved. This avoids a few unnecessary lane widening operations like: faddp d18, v18.2d mov d18, v18.d[0] Passes regress, OK for commit? ChangeLog: 2016-05-19 Wilco Dijkstra <wdijk...@arm.

Re: [PATCH] Optimize strchr (s, 0) to strlen

2016-05-18 Thread Wilco Dijkstra
Richard Biener wrote: > > Yeah ;) I'm currently bootstrapping/testing the patch that makes it possible > to > write all this in match.pd. So what was the conclusion? Improving match.pd to be able to handle more cases like this seems like a nice thing. Wilco

Re: [PATCH][AArch64] Improve aarch64_modes_tieable_p

2016-05-17 Thread Wilco Dijkstra
James Greenhalgh wrote: > It would be handy if you could raise something in bugzilla for the > register allocator deficiency. The register allocation issues are well known and we have multiple workarounds for this in place. When you allow modes to be tieable the workarounds are not as effective.

Re: [PATCH][AArch64] Adjust SIMD integer preference

2016-05-17 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 22 April 2016 16:35 To: gcc-patches@gcc.gnu.org Cc: nd Subject: [PATCH][AArch64] Adjust SIMD integer preference SIMD operations like combine prefer to have their operands in FP registers, so increase the cost of integer

Re: [PATCH][AArch64] Improve aarch64_case_values_threshold setting

2016-05-16 Thread Wilco Dijkstra
James Greenhalgh wrote: > As this change will change code generation for all cores (except > Exynos-M1), I'd like to hear from those with more detailed knowledge of > ThunderX, X-Gene and qdf24xx before I take this patch. > > Let's give it another week or so for comments, and expand the CC list.

Re: [PATCH][AArch64] Improve aarch64_case_values_threshold setting

2016-05-16 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 22 April 2016 17:15 To: gcc-patches@gcc.gnu.org Cc: nd Subject: [PATCH][AArch64] Improve aarch64_case_values_threshold setting GCC expands switch statements in a very simplistic way and tries to use a table expansion even

Re: [PATCH][AArch64] print_operand should not fallthrough from register operand into generic operand

2016-05-16 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 27 April 2016 17:39 To: James Greenhalgh Cc: gcc-patches@gcc.gnu.org; nd Subject: Re: [PATCH][AArch64] print_operand should not fallthrough from register operand into generic operand James Greenhalgh wrote: > So the p

Re: [PATCH 3/3] shrink-wrap: Remove complicated simple_return manipulations

2016-05-10 Thread Wilco Dijkstra
>> The new version does not seem better, as it adds a branch on the path >> and it is not smaller. > > That looks like bb-reorder isn't doing its job? Maybe it thinks that > pop is too expensive to copy? It relies on static branch probabilities, which are set completely wrong in GCC, so it ends

Re: Enabling -frename-registers?

2016-05-05 Thread Wilco Dijkstra
Ramana Radhakrishnan wrote: > > Can you file a bugzilla entry with a testcase that folks can look at please ? I created https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70961. Unfortunately I don't have a simple testcase that I can share. Wilco

Re: Enabling -frename-registers?

2016-05-04 Thread Wilco Dijkstra
Bernd Schmidt wrote: > On 05/04/2016 03:25 PM, Ramana Radhakrishnan wrote: >> On ARM / AArch32 I haven't seen any performance data yet - the one place we >> are concerned >> about the impact is on Thumb2 code size as regrename may end up >> inadvertently putting more >> things in high

Re: [PATCH][AArch64] Improve aarch64_case_values_threshold setting

2016-04-28 Thread Wilco Dijkstra
Kyrill Tkachov wrote: > On 25/04/16 20:21, Wilco Dijkstra wrote: > > The GCC switch expansion is awful, so > > even with a good indirect predictor it is better to use conditional > > branches. > > In what way is it awful? If there's something we can do better at

Re: [PATCH][AArch64] print_operand should not fallthrough from register operand into generic operand

2016-04-27 Thread Wilco Dijkstra
James Greenhalgh wrote: > So the part of this patch removing the fallthrough to general operand > is not OK for trunk. > > The other parts look reasonable to me, please resubmit just those. Right, I removed the removal of the fallthrough. Here is the revised version: ChangeLog: 2016-

Re: [AArch64] Emit division using the Newton series

2016-04-27 Thread Wilco Dijkstra
James Greenhalgh wrote: > So this is off for all cores currently supported by GCC? > > I'm not sure I understand why we should take this if it will immediately > be dead code? I presume it was meant to have the vector variants enabled with -mcpu=exynos-m1 as that is where you can get a good gain

Re: [PATCH][AArch64] Improve aarch64_case_values_threshold setting

2016-04-26 Thread Wilco Dijkstra
Evandro Menezes wrote: > > True, but the results when running on A53 could be quite different. GCC is ~1.2% faster on Cortex-A53 built for generic, but there is no difference in perlbench. Wilco

Re: [PATCH][AArch64] Replace insn to zero up SIMD registers

2016-04-26 Thread Wilco Dijkstra
Evandro Menezes wrote: >On 03/10/16 10:37, James Greenhalgh wrote: >> Thanks for sticking with it. This is OK for GCC 7 when development >> opens. >> >> Remember to mention the most recent changes in your Changelog entry >> (Remove "fp" attribute from *movhf_aarch64 and *movtf_aarch64). > > > OK

Re: [PATCH][AArch64] Improve aarch64_case_values_threshold setting

2016-04-25 Thread Wilco Dijkstra
Evandro Menezes wrote: > I agree with your assessment, but I'm more curious to understand how > this change affects code built with the default -mcpu=generic when run > on both A53 and A57, the typical configuration of big.LITTLE machines. I wouldn't expect the result to be any different as the

Re: [PATCH][AArch64] Improve aarch64_case_values_threshold setting

2016-04-25 Thread Wilco Dijkstra
Evandro Menezes wrote: > I assume that you mean that such improvements are true for > -mcpu=generic, yes? On which target, A53 or A57 or other? It's true for any CPU setting. The SPEC results are for Cortex-A57; however, I wrote a microbenchmark that shows improvements on all targets I have

Re: [PATCH][doc] Update documentation of AArch64 options

2016-04-22 Thread Wilco Dijkstra
ex that automatically does the right thing? >> +Enable PC relative literal loads. With this option literal pools are Fixed, new version below: 2016-04-22 Wilco Dijkstra <wdijk...@arm.com> gcc/ * gcc/doc/invoke.texi (AArch64 Options): Update. -- diff --git a/gcc/doc/invok

[PATCH][AArch64] Improve aarch64_case_values_threshold setting

2016-04-22 Thread Wilco Dijkstra
in performance (1-2%) as well as size (0.5-1% smaller). OK for trunk? ChangeLog: 2016-04-22 Wilco Dijkstra <wdijk...@arm.com> gcc/ * config/aarch64/aarch64.c (aarch64_case_values_threshold): Return a better case_values_threshold when optimizing. -- diff --git a/gcc/config/a

[PATCH][AArch64] Adjust SIMD integer preference

2016-04-22 Thread Wilco Dijkstra
SIMD operations like combine prefer to have their operands in FP registers, so increase the cost of integer registers slightly to avoid unnecessary int<->FP moves. This improves register allocation of scalar SIMD operations. OK for trunk? ChangeLog: 2016-04-22 Wilco Dijkstra <wdijk..

[PATCH][AArch64] print_operand should not fallthrough from register operand into generic operand

2016-04-22 Thread Wilco Dijkstra
://gcc.gnu.org/ml/gcc-patches/2016-04/msg01265.html to be committed first) ChangeLog: 2016-04-22 Wilco Dijkstra <wdijk...@arm.com> gcc/ * config/aarch64/aarch64.md (add3_compareC_cconly_imm): Remove use of %w for immediate. (add3_compareC_imm): Likewise. (si

[PATCH][AArch64] Fix shift attributes

2016-04-22 Thread Wilco Dijkstra
This patch fixes the attributes of integer immediate shifts which were incorrectly modelled as register controlled shifts. Also change EXTR attribute to being a rotate. OK for trunk? ChangeLog: 2016-04-22 Wilco Dijkstra <wdijk...@arm.com> * gcc/config/aarch64/aarc

[PATCH][AArch64] Improve aarch64_modes_tieable_p

2016-04-22 Thread Wilco Dijkstra
orr v0.8b, v1.8b, v0.8b OK for trunk? ChangeLog: 2016-04-22 Wilco Dijkstra <wdijk...@arm.com> * gcc/config/aarch64/aarch64.c (aarch64_modes_tieable_p): Allow scalar/single vector modes to be tieable. --- gcc/config/aarch64/aarch64.c | 18 ++

[PATCH][doc] Update documentation of AArch64 options

2016-04-21 Thread Wilco Dijkstra
Update documentation of AArch64 options for GCC6 to be more accurate, fix a few minor mistakes and remove some duplication. Tested with "make info dvi pdf html" and checked resulting PDF is as expected. OK for trunk and backport to GCC6.1 branch? ChangeLog: 2016-04-21 Wilco Dijkst

Re: [PATCH] Optimize strchr (s, 0) to strlen

2016-04-20 Thread Wilco Dijkstra
Jakub Jelinek wrote: > On Wed, Apr 20, 2016 at 11:17:06AM +0000, Wilco Dijkstra wrote: >> Can you quantify "don't like"? I benchmarked rawmemchr on a few targets >> and it's slower than strlen, so it's hard to guess what you don't like about >> it. > > This is

Re: [PATCH] Optimize strchr (s, 0) to strlen

2016-04-20 Thread Wilco Dijkstra
Richard Biener wrote: > Better - comments below. Jakub's objections to the usefulness of the transform > remain - we do have the strlen pass that uses some global knowledge to decide > on profitability. One could argue that for -Os doing the reverse transform is > profitable? In what way would it

Re: [PATCH] Optimize strchr (s, 0) to strlen

2016-04-20 Thread Wilco Dijkstra
Jakub Jelinek wrote: > I still don't like this transformation and would very much prefer to see > using rawmemchr instead on targets that provide it, and also this is > something that IMHO should be done in the tree-ssa-strlen.c pass together > with the other optimizations in there. Similarly to

Re: [PATCH] Optimize strchr (s, 0) to strlen

2016-04-19 Thread Wilco Dijkstra
this kind of thing either. Wilco ChangeLog: 2016-04-19 Wilco Dijkstra <wdijk...@arm.com> gcc/ * gcc/gimple-fold.c (gimple_fold_builtin_strchr): New function to optimize strchr (s, 0) to strlen. (gimple_fold_builtin): Add BUILT_IN_STRCHR case. testsuite/

Re: [PATCH] Optimize strchr (s, 0) to strlen

2016-04-18 Thread Wilco Dijkstra
Jakub Jelinek <ja...@redhat.com> wrote: > On Mon, Apr 18, 2016 at 05:00:45PM +0000, Wilco Dijkstra wrote: >> Optimize strchr (s, 0) to s + strlen (s). strchr (s, 0) appears a common >> idiom for finding the end of a string, however it is not a very efficient >> way of
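For readers skimming the thread, the transform under discussion can be sketched at the source level as follows. This is only an illustration of the equivalence, not the patch's folding code; the helper name `string_end` is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not from the patch): returns a pointer to the
   terminating NUL of s, i.e. the same address strchr (s, 0) returns,
   but computed as s + strlen (s).  */
static char *string_end(const char *s)
{
    assert(s != NULL);
    return (char *)s + strlen(s);
}
```

Both forms yield the same pointer for any valid C string; the thread is about which library call is cheaper and where in the compiler the rewrite should happen.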

[PATCH] Optimize strchr (s, 0) to strlen

2016-04-18 Thread Wilco Dijkstra
twice as fast as strchr on strings of 1KB). OK for trunk? ChangeLog: 2016-04-18 Wilco Dijkstra <wdijk...@arm.com> gcc/ * gcc/builtins.c (fold_builtin_strchr): Optimize strchr (s, 0) into strlen. testsuite/ * gcc/testsuite/gcc.dg/strlenopt-20.c: Update test.

Re: [AArch64] Add more precision choices for the reciprocal square root approximation

2016-04-01 Thread Wilco Dijkstra
Evandro Menezes wrote: > > I hope that this gets in the ballpark of what's been discussed previously. Yes that's very close to what I had in mind. A minor issue is that the vector modes cannot work as they start at MAX_MODE_FLOAT (which is > 32): +/* Control approximate alternatives to certain

Re: [AArch64] Emit division using the Newton series

2016-04-01 Thread Wilco Dijkstra
Evandro Menezes wrote: > However, I don't think that there's the need to handle any special case > for division. The only case when the approximation differs from > division is when the numerator is infinity and the denominator, zero, > when the approximation returns infinity and the division,

Re: [AArch64] Emit division using the Newton series

2016-04-01 Thread Wilco Dijkstra
Evandro Menezes wrote: > > The division variant should use the same latency reduction trick I > > mentioned for sqrt. > > I don't think that it applies here, since it doesn't have to deal with > special cases. No it applies as it's exactly the same calculation: x * rsqrt(y) and x * recip(y). In

Re: [AArch64] Emit division using the Newton series

2016-04-01 Thread Wilco Dijkstra
Evandro Menezes wrote: On 03/23/16 11:24, Evandro Menezes wrote: > On 03/17/16 15:09, Evandro Menezes wrote: >> This patch implements FP division by an approximation using the Newton >> series. >> >> With this patch, DF division is sped up by over 100% and SF division, >> zilch, both on A57 and on
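As background for this thread, a Newton iteration for division can be sketched in portable C as below. It is a minimal sketch under stated assumptions, not the patch's AArch64 code: the function names `approx_recip` and `approx_div`, the linear initial estimate, and the fixed count of four steps are all choices made for this illustration.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: approximate 1/d for finite, nonzero d.
   Each Newton step x = x * (2 - m * x) squares the relative error.  */
static double approx_recip(double d)
{
    assert(d != 0.0 && isfinite(d));

    int exp;
    double m = frexp(fabs(d), &exp);           /* |d| = m * 2^exp, m in [0.5, 1) */
    double x = 48.0 / 17.0 - 32.0 / 17.0 * m;  /* assumed linear estimate of 1/m */

    for (int i = 0; i < 4; i++)
        x = x * (2.0 - m * x);                 /* Newton step towards 1/m */

    x = ldexp(x, -exp);                        /* undo the scaling: 1/|d| */
    return d < 0.0 ? -x : x;
}

/* Hypothetical helper: approximate n / d as n * (1/d).  */
static double approx_div(double n, double d)
{
    return n * approx_recip(d);
}
```

The sketch deliberately rejects zero and non-finite denominators; how such special cases behave is one of the points debated in the replies.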

Re: [AArch64] Add precision choices for the reciprocal square root approximation

2016-04-01 Thread Wilco Dijkstra
Evandro Menezes wrote: > > Ping^1 I haven't seen a newer version that incorporates my feedback. To recap what I'd like to see is a more general way to select approximations based on mode. I don't believe that looking at the inner mode works in general, and it doesn't make sense to add internal

Re: [AArch64] Add precision choices for the reciprocal square root approximation

2016-03-19 Thread Wilco Dijkstra
Hi Evandro, > For example, though this approximation improves the performance > noticeably for DF on A57, for SF, not so much, if at all. I'm still skeptical that you ever can get any gain on scalars. I bet the only gain is on 4x vectorized floats. So what I would like to see is this
