Re: [PATCH v2] ARM: Block predication on atomics [PR111235]

2023-10-02 Thread Wilco Dijkstra
Hi Ramana, >> I used --target=arm-none-linux-gnueabihf --host=arm-none-linux-gnueabihf >> --build=arm-none-linux-gnueabihf --with-float=hard. However it seems that the >> default armhf settings are incorrect. I shouldn't need the --with-float=hard >> since >> that is obviously implied by armhf, a

[PATCH v2] AArch64: Add inline memmove expansion

2023-10-16 Thread Wilco Dijkstra
v2: further cleanups, improved comments Add support for inline memmove expansions. The generated code is the same as for memcpy, except that all loads are emitted before stores rather than being interleaved. The maximum size is 256 bytes which requires at most 16 registers. Passes regress/boot
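A minimal sketch of why emitting all loads before any store makes a fixed-size copy safe for overlapping buffers. This is plain C for illustration, not the AArch64 expander itself; the function name move32 and the 32-byte size are assumptions chosen for the example.

```c
#include <stdint.h>
#include <string.h>

/* Move exactly 32 bytes from src to dst; the regions may overlap.
   Hypothetical helper, not part of GCC.  */
void move32 (void *dst, const void *src)
{
  const unsigned char *s = src;
  unsigned char *d = dst;
  uint64_t t0, t1, t2, t3;

  /* All loads first: the whole source is captured in temporaries
     before any destination byte is written.  */
  memcpy (&t0, s, 8);
  memcpy (&t1, s + 8, 8);
  memcpy (&t2, s + 16, 8);
  memcpy (&t3, s + 24, 8);

  /* Then all stores: overlap can no longer corrupt unread source data.  */
  memcpy (d, &t0, 8);
  memcpy (d + 8, &t1, 8);
  memcpy (d + 16, &t2, 8);
  memcpy (d + 24, &t3, 8);
}
```

An interleaved load/store sequence, as used for memcpy, could overwrite source bytes before they are read when dst and src overlap, which is why the ordering differs.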

Re: [PATCH v2] AArch64: Fix strict-align cpymem/setmem [PR103100]

2023-10-16 Thread Wilco Dijkstra
ping   v2: Use UINTVAL, rename max_mops_size. The cpymemdi/setmemdi implementation doesn't fully support strict alignment. Block the expansion if the alignment is less than 16 with STRICT_ALIGNMENT. Clean up the condition when to use MOPS.     Passes regress/bootstrap, OK for commit?     gcc/Cha

Re: [PATCH] libatomic: Enable lock-free 128-bit atomics on AArch64 [PR110061]

2023-10-16 Thread Wilco Dijkstra
  ping From: Wilco Dijkstra Sent: 02 June 2023 18:28 To: GCC Patches Cc: Richard Sandiford ; Kyrylo Tkachov Subject: [PATCH] libatomic: Enable lock-free 128-bit atomics on AArch64 [PR110061]   Enable lock-free 128-bit atomics on AArch64.  This is backwards compatible with existing binaries

Re: [PATCH] libatomic: Improve ifunc selection on AArch64

2023-10-16 Thread Wilco Dijkstra
  ping From: Wilco Dijkstra Sent: 04 August 2023 16:05 To: GCC Patches ; Richard Sandiford Cc: Kyrylo Tkachov Subject: [PATCH] libatomic: Improve ifunc selection on AArch64   Add support for ifunc selection based on CPUID register.  Neoverse N1 supports atomic 128-bit load/store, so use

Re: [PATCH] AArch64: Fix __sync_val_compare_and_swap [PR111404]

2023-10-16 Thread Wilco Dijkstra
ping   __sync_val_compare_and_swap may be used on 128-bit types and either calls the outline atomic code or uses an inline loop.  On AArch64 LDXP is only atomic if the value is stored successfully using STXP, but the current implementations do not perform the store if the comparison fails.  In thi

Re: [PATCH] AArch64: Fix __sync_val_compare_and_swap [PR111404]

2023-10-16 Thread Wilco Dijkstra
Hi Ramana, > I remember this to be the previous discussions and common understanding. > > https://gcc.gnu.org/legacy-ml/gcc/2016-06/msg00017.html > > and here > > https://gcc.gnu.org/legacy-ml/gcc-patches/2017-02/msg00168.html > > Can you point any discussion recently that shows this has changed

[PATCH] AArch64: Improve immediate generation

2023-10-19 Thread Wilco Dijkstra
Further improve immediate generation by adding support for 2-instruction MOV/EOR bitmask immediates. This reduces the number of 3/4-instruction immediates in SPECCPU2017 by ~2%. Passes regress, OK for commit? gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_internal_mov_immediate)

[PATCH] AArch64: Cleanup memset expansion

2023-10-19 Thread Wilco Dijkstra
Cleanup memset implementation. Similar to memcpy/memmove, use an offset and bytes throughout. Simplify the complex calculations when optimizing for size by using a fixed limit. Passes regress/bootstrap, OK for commit? gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_progress_poin

Re: RFC: Patch to implement Aarch64 SIMD ABI

2018-07-19 Thread Wilco Dijkstra
Hi Steve, > This patch checks for SIMD functions and saves the extra registers when > needed. It does not change the caller behaviour, so with just this patch > there may be values saved by both the caller and callee. This is not > efficient, but it is correct code. I tried a few simple test cas

Re: RFC: Patch to implement Aarch64 SIMD ABI

2018-07-20 Thread Wilco Dijkstra
Steve Ellcey wrote: > Yes, I see where I missed this in aarch64_push_regs > and aarch64_pop_regs.  I think that is why the second of > Wilco's two examples (f2) is wrong.  I am unclear about > exactly what is meant by writeback and why we have it and > how that and callee_adjust are used.  Any cha

Re: [Patch-86512]: Subnormal float support in armv7(with -msoft-float) for intrinsics

2018-07-20 Thread Wilco Dijkstra
Hi Umesh, Looking at your patch, this would break all results which need to be normalized. Index: libgcc/config/arm/ieee754-df.S === --- libgcc/config/arm/ieee754-df.S (revision 262850) +++ libgcc/config/arm/ieee754-df.S (

Re: [Patch-86512]: Subnormal float support in armv7(with -msoft-float) for intrinsics

2018-07-20 Thread Wilco Dijkstra
Umesh Kalappa wrote: > We tried some of the normalisation numbers and the fix works and please > could you help us with the input, where if you see that fix breaks down. Well try any set of inputs which require normalisation. You'll find these no longer get normalised and so will get incorrect r

Re: [Patch-86512]: Subnormal float support in armv7(with -msoft-float) for intrinsics

2018-07-23 Thread Wilco Dijkstra
Umesh Kalappa wrote: > We tested on the SP and yes the problem persists on the SP too and > attached patch will fix both the SP and DP issues for the denormal > resultant. The patch now looks correct to me (but I can't approve). > We bootstrapped the compiler, looks ok to us with minimal testing

Re: RFC: Patch to implement Aarch64 SIMD ABI

2018-07-23 Thread Wilco Dijkstra
Steve Ellcey wrote: > OK, I think I understand this a bit better now.  I think my main > problem is with the  term 'writeback' which I am not used to seeing. > But if I understand things correctly we are saving one or two registers > and (possibly) updating the stack pointer using auto-increment/a

Re: [Patch-86512]: Subnormal float support in armv7(with -msoft-float) for intrinsics

2018-07-27 Thread Wilco Dijkstra
Hi Nicolas, I think your patch doesn't quite work as expected: @@ -238,9 +238,10 @@ LSYM(Lad_a): movs ip, ip, lsl #1 adcs xl, xl, xl adc xh, xh, xh - tst xh, #0x0010 - sub r4, r4, #1 - bne LSYM(Lad_e) + subs r4, r4, #1 +

Re: [Patch-86512]: Subnormal float support in armv7(with -msoft-float) for intrinsics

2018-07-27 Thread Wilco Dijkstra
Nicolas Pitre wrote: >> However if r4 is non-zero, the carry will be set, and the tsths will be >> executed. This >> clears the carry and sets the Z flag based on bit 20. > > No, not at all. The carry is not affected. And that's the point of the > tst instruction here rather than a cmp: it sets

[PATCH v2] AArch64: Improve immediate generation

2023-10-24 Thread Wilco Dijkstra
v2: Use check-function-bodies in tests Further improve immediate generation by adding support for 2-instruction MOV/EOR bitmask immediates. This reduces the number of 3/4-instruction immediates in SPECCPU2017 by ~2%. Passes regress, OK for commit? gcc/ChangeLog: * config/aarch64/aarch64

Re: [PATCH v2] AArch64: Fix strict-align cpymem/setmem [PR103100]

2023-11-06 Thread Wilco Dijkstra
ping   v2: Use UINTVAL, rename max_mops_size. The cpymemdi/setmemdi implementation doesn't fully support strict alignment. Block the expansion if the alignment is less than 16 with STRICT_ALIGNMENT. Clean up the condition when to use MOPS.     Passes regress/bootstrap, OK for commit?     gcc/Ch

Re: [PATCH v2] AArch64: Add inline memmove expansion

2023-11-06 Thread Wilco Dijkstra
ping   v2: further cleanups, improved comments Add support for inline memmove expansions.  The generated code is the same as for memcpy, except that all loads are emitted before stores rather than being interleaved.  The maximum size is 256 bytes which requires at most 16 registers. Passes regre

Re: [PATCH] AArch64: Cleanup memset expansion

2023-11-06 Thread Wilco Dijkstra
ping   Cleanup memset implementation.  Similar to memcpy/memmove, use an offset and bytes throughout.  Simplify the complex calculations when optimizing for size by using a fixed limit. Passes regress/bootstrap, OK for commit?     gcc/ChangeLog:     * config/aarch64/aarch64.cc (aarch64_progre

Re: [PATCH] AArch64: Fix __sync_val_compare_and_swap [PR111404]

2023-11-06 Thread Wilco Dijkstra
  ping   __sync_val_compare_and_swap may be used on 128-bit types and either calls the outline atomic code or uses an inline loop.  On AArch64 LDXP is only atomic if the value is stored successfully using STXP, but the current implementations do not perform the store if the comparison fails.  In

Re: [PATCH] libatomic: Improve ifunc selection on AArch64

2023-11-06 Thread Wilco Dijkstra
  ping From: Wilco Dijkstra Sent: 04 August 2023 16:05 To: GCC Patches ; Richard Sandiford Cc: Kyrylo Tkachov Subject: [PATCH] libatomic: Improve ifunc selection on AArch64   Add support for ifunc selection based on CPUID register.  Neoverse N1 supports atomic 128-bit load/store, so use

Re: [PATCH] libatomic: Enable lock-free 128-bit atomics on AArch64 [PR110061]

2023-11-06 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 02 June 2023 18:28 To: GCC Patches Cc: Richard Sandiford ; Kyrylo Tkachov Subject: [PATCH] libatomic: Enable lock-free 128-bit atomics on AArch64 [PR110061]   Enable lock-free 128-bit atomics on AArch64.  This is backwards compatible with existing binaries

[PATCH] AArch64: Enable fast shifts on Neoverse N1

2020-09-14 Thread Wilco Dijkstra
, regress pass, OK for commit? ChangeLog: 2020-09-11 Wilco Dijkstra * config/aarch64/aarch64.c (neoversen1_tunings): Enable AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND. --- diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index

[PATCH 1/2] AArch64: Cleanup CPU option processing code

2020-09-14 Thread Wilco Dijkstra
ommit? ChangeLog: 2020-09-03 Wilco Dijkstra * config.gcc (aarch64*-*-*): Simplify --with-cpu and --with-arch processing. Add support for architectural extensions. * config/aarch64/aarch64.h (TARGET_CPU_DEFAULT): Remove AARCH64_CPU_DEFAULT_FLAGS. * config/aa

[PATCH 2/2] AArch64: Add support for --with-tune

2020-09-14 Thread Wilco Dijkstra
e, so explicitly allow that. Co-authored-by: Delia Burduv Bootstrap OK, regress pass, OK to commit? ChangeLog 2020-09-03 Wilco Dijkstra * config.gcc (aarch64*-*-*): Add --with-tune. Support --with-cpu=native. * config/aarch64/aarch64.h (OPTION_DEFAULT_SPECS): Add -

Re: [PATCH 1/2] AArch64: Cleanup CPU option processing code

2020-09-14 Thread Wilco Dijkstra
Hi Richard, >On 14/09/2020 15:19, Wilco Dijkstra wrote: >> The --with-cpu/--with-arch configure option processing not only checks valid >> arguments >> but also sets TARGET_CPU_DEFAULT with a CPU and extension bitmask.  This >> isn't used >> however since a

[PATCH] PR85678: Change default to -fno-common

2019-10-25 Thread Wilco Dijkstra
-fcommon. It is about time to change the default. OK for commit? ChangeLog 2019-10-25 Wilco Dijkstra PR85678 * common.opt (fcommon): Change init to 1. doc/ * invoke.texi (-fcommon): Update documentation. --- diff --git a/gcc/common.opt b/gcc/common.opt index

Re: [PATCH] PR85678: Change default to -fno-common

2019-10-28 Thread Wilco Dijkstra
Hi Jeff, > Has this been bootstrapped and regression tested? Yes, it bootstraps OK of course. I ran regression over the weekend, there are a few minor regressions in lto due to relying on tentative definitions and a few latent bugs. I'd expect there will be a few similar failures on other targets

Re: [PATCH] PR85678: Change default to -fno-common

2019-10-28 Thread Wilco Dijkstra
Hi, >> I suppose targets can override this decision. > I think they probably could via the override_options mechanism. Yes, it's trivial to add this to target_option_override(): if (!global_options_set.x_flag_no_common) flag_no_common = 0; Cheers, Wilco

Re: [PATCH] PR85678: Change default to -fno-common

2019-10-29 Thread Wilco Dijkstra
Hi Iain, > for the record,  Darwin bootstraps OK with the change (which is to be > expected, > since the preferred setting for it is -fno-common). That's good to hear. > Testsuite fails are order “a few hundred” mostly seem to be related to > tree-prof > and vector tests (plus the anticipated

[PATCH v2] PR85678: Change default to -fno-common

2019-10-29 Thread Wilco Dijkstra
to C code only, C++ code is not affected by -fcommon. It is about time to change the default. Bootstrap OK, passes testsuite on AArch64. OK for commit? ChangeLog 2019-10-29 Wilco Dijkstra PR85678 * common.opt (fcommon): Change init to 1. doc/ * invoke.texi (-fcommon

Re: [PATCH v2] PR85678: Change default to -fno-common

2019-10-30 Thread Wilco Dijkstra
Hi Richard, > Please don't add -fcommon in lto.exp. So what is the best way to add an extra option to lto.exp? Note dg-lto-options completely overrides the options from lto.exp, so I can't use that except in tests which already use it. Cheers, Wilco

Re: [PATCH v2] PR85678: Change default to -fno-common

2019-11-04 Thread Wilco Dijkstra
Hi Richard, >> > Please don't add -fcommon in lto.exp. >> >> So what is the best way to add an extra option to lto.exp? >> Note dg-lto-options completely overrides the options from lto.exp, so I can't >> use that except in tests which already use it. > > On what testcases do you need it at all? T

Re: [PATCH v3] PR85678: Change default to -fno-common

2019-11-05 Thread Wilco Dijkstra
by -fcommon. It is about time to change the default. Passes bootstrap and regress on AArch64 and x64. OK for commit? ChangeLog 2019-11-05 Wilco Dijkstra PR85678 * common.opt (fcommon): Change init to 1. doc/ * invoke.texi (-fcommon): Update documentation. testsuite/

[PATCH][Arm] Only enable fsched-pressure with Ofast

2019-11-06 Thread Wilco Dijkstra
ating point code is generally beneficial (more registers and higher latencies), only enable the pressure scheduler with -Ofast. On Cortex-A57 this gives a 0.7% performance gain on SPECINT2006 as well as a 0.2% codesize reduction. Bootstrapped on armhf. OK for commit? ChangeLog: 2019-11-06

[PATCH][ARM] Improve max_cond_insns setting for Cortex cores

2019-11-06 Thread Wilco Dijkstra
BLOCK. Also use the CPU tuning setting when a CPU/tune is selected if -mrestrict-it is not explicitly set. On Cortex-A57 this gives 1.1% performance gain on SPECINT2006 as well as a 0.4% codesize reduction. Bootstrapped on armhf. OK for commit? ChangeLog: 2019-08-19 Wilco Dijkstra

[PATCH] PR90838: Support ctz idioms

2019-11-12 Thread Wilco Dijkstra
18, 6, 11, 5, 10, 9 }; return table[((unsigned)((x & -x) * 0x077CB531U)) >> 27]; } Is optimized to: rbit w0, w0 clz w0, w0 and w0, w0, 31 ret Bootstrapped on AArch64. OK for commit? ChangeLog: 2019-11-12 Wilco Dijkstra
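For readers unfamiliar with the idiom in the snippet above, here is a self-contained sketch of the classic de Bruijn count-trailing-zeros lookup. The table is the widely published one for the multiplier 0x077CB531; the function and table names are illustrative and are not taken from the patch or its tests.

```c
#include <stdint.h>

/* Widely published de Bruijn table for the multiplier 0x077CB531.  */
const int ctz_table[32] = {
  0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
  31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};

/* Count trailing zeros of a non-zero 32-bit value.
   For x == 0 this returns ctz_table[0], i.e. 0, which is not a
   meaningful count.  */
int ctz32_debruijn (uint32_t x)
{
  uint32_t lowest = x & -x;   /* isolate the lowest set bit */
  /* Multiplying by the de Bruijn constant moves a unique 5-bit
     pattern into the top bits; use it as the table index.  */
  return ctz_table[(uint32_t) (lowest * 0x077CB531U) >> 27];
}
```

A compiler that recognises this pattern can replace the multiply and table load with a native count-trailing-zeros sequence, as the RBIT/CLZ output quoted in the message shows.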

Re: [PATCH] PR90838: Support ctz idioms

2019-11-13 Thread Wilco Dijkstra
Hi Segher, > Out of interest, what uses this? I have never seen it before. It's used in sjeng in SPEC and gives a 2% speedup on Cortex-A57. Tricks like this used to be very common 20 years ago since a loop or binary search is way too slow and few CPUs supported fast clz/ctz instructions. It's o

Re: [PATCH] Further bootstrap unbreak (was Re: [PATCH] PR90838: Support ctz idioms)

2020-01-13 Thread Wilco Dijkstra
Hi Jakub, On Sat, Jan 11, 2020 at 05:30:52PM +0100, Jakub Jelinek wrote: > On Sat, Jan 11, 2020 at 05:24:19PM +0100, Andreas Schwab wrote: > > ../../gcc/tree-ssa-forwprop.c: In function 'bool > > simplify_count_trailing_zeroes(gimple_stmt_iterator*)': > > ../../gcc/tree-ssa-forwprop.c:1925:23: er

[PATCH] Fix ctz issues (PR93231)

2020-01-13 Thread Wilco Dijkstra
returns 0 or 1. Add extra test cases. (note the diff uses the old tree and includes Jakub's bootstrap fixes) Bootstrap OK on AArch64 and x64. ChangeLog: 2020-01-13 Wilco Dijkstra PR tree-optimization/93231 * tree-ssa-forwprop.c (optimize_count_trailing_zeroes)

Re: [PATCH] Fix ctz issues (PR93231)

2020-01-15 Thread Wilco Dijkstra
on negative shift counts or multiply constants. Check the type is a char type for the string constant case to avoid accidentally matching a wide STRING_CST. Add a tree_expr_nonzero_p check to allow the optimization even if CTZ_DEFINED_VALUE_AT_ZERO returns 0 or 1. Add extra test cases. Bootstrap OK on

[PATCH][AArch64] Fix shrinkwrapping interactions with atomics (PR92692)

2020-01-16 Thread Wilco Dijkstra
this fixes the failure you were getting? ChangeLog: 2020-01-16 Wilco Dijkstra PR target/92692 * config/aarch64/aarch64.c (aarch64_split_compare_and_swap) Add assert to ensure prolog has been emitted. (aarch64_split_atomic_op): Likewise. * config/aarch64

Re: [PATCH][AARCH64] Set jump-align=4 for neoversen1

2020-01-16 Thread Wilco Dijkstra
ping Testing shows the setting of 32:16 for jump alignment has a significant codesize cost, however it doesn't make a difference in performance. So set jump-align to 4 to get 1.6% codesize improvement. OK for commit? ChangeLog 2019-12-24 Wilco Dijkstra * config/aarch64/aarc

Re: [PATCH][AARCH64] Enable compare branch fusion

2020-01-16 Thread Wilco Dijkstra
ping Enable the most basic form of compare-branch fusion since various CPUs support it. This has no measurable effect on cores which don't support branch fusion, but increases fusion opportunities on cores which do. Bootstrapped on AArch64, OK for commit? ChangeLog: 2019-12-24 Wilco Dij

Re: [PATCH][Arm] Only enable fsched-pressure with Ofast

2020-01-16 Thread Wilco Dijkstra
uling floating point code is generally beneficial (more registers and higher latencies), only enable the pressure scheduler with -Ofast. On Cortex-A57 this gives a 0.7% performance gain on SPECINT2006 as well as a 0.2% codesize reduction. Bootstrapped on armhf. OK for commit? ChangeLog: 2019-11-06

Re: [PATCH][AARCH64] Enable compare branch fusion

2020-01-17 Thread Wilco Dijkstra
Hi Richard, > If you're able to say for the record which cores you tested, then that'd > be good. I've mostly checked it on Cortex-A57 - if there is any effect, it would be on older cores. > OK, thanks.  I agree there doesn't seem to be an obvious reason why this > would pessimise any cores sign

Re: [PATCH][AARCH64] Set jump-align=4 for neoversen1

2020-01-17 Thread Wilco Dijkstra
Hi Kyrill & Richard, > I was leaving this to others in case it was obvious to them. On the > basis that silence suggests it wasn't, :-) could you go into more details? > Is it expected on first principles that jump alignment doesn't matter > for Neoverse N1, or is this purely based on experimenta

Re: [PATCH 3/4 GCC11] IVOPTs Consider cost_step on different forms during unrolling

2020-01-20 Thread Wilco Dijkstra
Hi Kewen, Would it not make more sense to use the TARGET_ADDRESS_COST hook to return different costs for immediate offset and register offset addressing, and ensure IVOpts correctly takes this into account? On AArch64 we've defined different costs for immediate offset, register offset, register o

Re: [PATCH][ARM] Correctly set SLOW_BYTE_ACCESS

2020-01-21 Thread Wilco Dijkstra
r3, r2, r3 add r0, r0, r3 bx lr Bootstrap OK, OK for commit? ChangeLog: 2019-09-11 Wilco Dijkstra * config/arm/arm.h (SLOW_BYTE_ACCESS): Set to 1. -- diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index e07cf03538c5bb23e3285859b9e44a6

Re: [PATCH v2][ARM] Disable code hoisting with -O3 (PR80155)

2020-01-21 Thread Wilco Dijkstra
code hoisting for -O3 and higher. OK for commit? ChangeLog: 2019-11-26 Wilco Dijkstra PR tree-optimization/80155 * common/config/arm/arm-common.c (arm_option_optimization_table): Disable -fcode-hoisting with -O3. -- diff --git a/gcc/common/config/arm/arm-common.c b/gcc/c

Re: [PATCH][AArch64] Fix shrinkwrapping interactions with atomics (PR92692)

2020-01-27 Thread Wilco Dijkstra
Hi Segher, > On Thu, Jan 16, 2020 at 12:50:14PM +0000, Wilco Dijkstra wrote: >> The separate shrinkwrapping pass may insert stores in the middle >> of atomics loops which can cause issues on some implementations. >> Avoid this by delaying splitting of atomic patterns until a

[PATCH][AArch64] Improve popcount expansion

2020-02-03 Thread Wilco Dijkstra
expansion is now: fmov s0, w0 cnt v0.8b, v0.8b addv b0, v0.8b fmov w0, s0 Bootstrap OK, passes regress. ChangeLog 2020-02-02 Wilco Dijkstra gcc/ * config/aarch64/aarch64.md (popcount<mode>2): Improve expansion. * config/aarch64/aarch64-simd.md
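The instruction sequence above counts bits per byte lane (CNT) and then sums the lanes (ADDV). A scalar C sketch of the same two-step idea, assuming a 32-bit input; the function names are hypothetical and this is not the code GCC emits.

```c
#include <stdint.h>

/* Count the set bits in one byte (the per-lane step).  */
unsigned byte_popcount (uint8_t b)
{
  unsigned n = 0;
  while (b)
    {
      n += b & 1u;
      b >>= 1;
    }
  return n;
}

/* Population count of a 32-bit value: count each byte separately,
   then add the per-byte counts together (the horizontal-add step).  */
unsigned popcount32_bytewise (uint32_t x)
{
  unsigned sum = 0;
  for (int i = 0; i < 4; i++)
    sum += byte_popcount ((uint8_t) (x >> (8 * i)));
  return sum;
}
```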

[PATCH][AArch64] Improve clz patterns

2020-02-04 Thread Wilco Dijkstra
Wilco Dijkstra * config/aarch64/aarch64.md (clz<mode>2): Mask the clz result. (clrsb<mode>2): Likewise. (ctz<mode>2): Likewise. -- diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 5edc76ee14b55b2b4323530e10bd22b3ffca483e

Re: [PATCH][AArch64] Improve popcount expansion

2020-02-04 Thread Wilco Dijkstra
Hi Andrew, > You might want to add a testcase that the autovectorizers too. > > Currently we get also: > >    ldr q0, [x0] >    addv    b0, v0.16b >    umov    w0, v0.b[0] >    ret My patch doesn't change this case on purpose - there are also many intrinsics which generate re

Re: [PATCH][ARM] Correctly set SLOW_BYTE_ACCESS

2020-02-04 Thread Wilco Dijkstra
r3, r2, r3 add r0, r0, r3 bx lr Bootstrap OK, OK for commit? ChangeLog: 2019-09-11 Wilco Dijkstra * config/arm/arm.h (SLOW_BYTE_ACCESS): Set to 1. -- diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index e07cf03538c5bb23e3285859b9e44a6

Re: [PATCH][Arm] Only enable fsched-pressure with Ofast

2020-02-04 Thread Wilco Dijkstra
uling floating point code is generally beneficial (more registers and higher latencies), only enable the pressure scheduler with -Ofast. On Cortex-A57 this gives a 0.7% performance gain on SPECINT2006 as well as a 0.2% codesize reduction. Bootstrapped on armhf. OK for commit? ChangeLog: 2019-11-06

Re: [PATCH][ARM] Improve max_cond_insns setting for Cortex cores

2020-02-04 Thread Wilco Dijkstra
s have max_cond_insns set to 5 due to historical reasons. Benchmarking shows that max_cond_insns=2 is fastest on modern Cortex-A cores, so change it to 2. Set it to 4 on older in-order cores as that is the MAX_INSN_PER_IT_BLOCK limit for Thumb-2. Bootstrapped on armhf. OK for commit? ChangeLo

Re: [PATCH][ARM] Remove support for MULS

2020-02-04 Thread Wilco Dijkstra
Any further comments? Note GCC doesn't support S/UMULLS either since it is equally useless. It's no surprise that Thumb-2 removed support for flag-setting 64-bit multiplies, while AArch64 didn't add flag-setting multiplies. So there is no argument that these instructions are in any way useful to

Re: [PATCH][AArch64] Improve clz patterns

2020-02-04 Thread Wilco Dijkstra
range of clz/ctz/cls results, Combine sometimes behaves oddly and duplicates ctz to remove an unnecessary sign extension. Avoid this by adding an explicit AND with 127 in the patterns. Deepsjeng performance improves by ~0.6%. Bootstrap OK. ChangeLog: 2020-02-04 Wilco Dijkstra PR rtl-o

Re: [PATCH][AARCH64] Fix for PR86901

2020-02-05 Thread Wilco Dijkstra
Hi Modi, Thanks for your patch! > Adding support for extv and extzv on aarch64 as described in > PR86901. I also changed > extract_bit_field_using_extv to use gen_lowpart_if_possible instead of > gen_lowpart directly. Using > gen_lowpart directly will fail with an ICE in building libgcc when t

Re: [PATCH][AARCH64] Fix for PR86901

2020-02-07 Thread Wilco Dijkstra
Hi, Richard wrote: > However, inside the compiler we really want to represent this as a >shift. ... > Ideally this would be handled inside the mid-end expansion of an > extract, but in the absence of that I think this is best done inside the > extv expansion so that we never end up with a real

Re: [PATCH][AArch64] Improve clz patterns

2020-02-12 Thread Wilco Dijkstra
mance improves by ~0.6%. Bootstrap OK. ChangeLog: 2020-02-12 Wilco Dijkstra PR rtl-optimization/93565 * config/aarch64/aarch64.c (aarch64_rtx_costs): Add CTZ costs. * gcc.target/aarch64/pr93565.c: New test. -- diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aa

Re: [PATCH][AArch64] Improve clz patterns

2020-02-12 Thread Wilco Dijkstra
Hi Richard, See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93565#c8 - the problem is more generic like I suspected and it's easy to create similar examples. So while this turned out to be an easy workaround for ctz, the general case is harder to avoid since you still want to allow beneficial

Re: [PATCH][AArch64] Improve clz patterns

2020-02-12 Thread Wilco Dijkstra
Hi Andrew, > Yes I agree a better cost model for CTZ/CLZ is the right solution but > I disagree with 2 ALU instruction as the cost.  It should either be > the same cost as a multiply or have its own cost entry. > For an example on OcteonTX (and ThunderX1), the cost of CLS/CLZ is 4 > cycles, the sa

Re: [PATCH] PR85678: Change default to -fno-common

2019-12-04 Thread Wilco Dijkstra
Hi Jeff, >> I've noticed quite significant package failures caused by the revision. >> Would you please consider documenting this change in porting_to.html >> (and in changes.html) for GCC 10 release? > > I'm not in the office right now, but figured I'd chime in.  I'd estimate > 400-500 packages a

[wwwdocs] Document -fcommon default change

2019-12-05 Thread Wilco Dijkstra
Hi, Add entries for the default change in changes.html and porting_to.html. Passes the W3 validator. Cheers, Wilco --- diff --git a/htdocs/gcc-10/changes.html b/htdocs/gcc-10/changes.html index e02966460450b7aad884b2d45190b9ecd8c7a5d8..304e1e8ccd38795104156e86b92062696fa5aa8b 100644 --- a/htd

Re: [PATCH] PR85678: Change default to -fno-common

2019-12-05 Thread Wilco Dijkstra
Hi, I have updated the documentation patch here and added relevant maintainers so hopefully this can go in soon: https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00311.html I moved the paragraph in changes.html to the C section like you suggested. Would it make sense to link to the porting_to entry

Re: [PATCH v2 2/2][ARM] Improve max_cond_insns setting for Cortex cores

2019-12-06 Thread Wilco Dijkstra
Hi Christophe, > This patch (r278968) is causing regressions when building GCC > --target arm-none-linux-gnueabihf > --with-mode thumb > --with-cpu cortex-a57 > --with-fpu crypto-neon-fp-armv8 > because the assembler (gas version 2.33.1) complains: > /ccc7z5eW.s:4267: IT blocks containing more tha

Re: [PATCH v2 2/2][ARM] Improve max_cond_insns setting for Cortex cores

2019-12-06 Thread Wilco Dijkstra
Hi Christophe, I've added an option to allow the warning to be enabled/disabled: https://sourceware.org/ml/binutils/2019-12/msg00093.html Cheers, Wilco

Re: [PATCH v2 2/2][ARM] Improve max_cond_insns setting for Cortex cores

2019-12-06 Thread Wilco Dijkstra
Hi Christophe, > In practice, how do you activate it when running the GCC testsuite? Do > you plan to send a GCC patch to enable this assembler flag, or do you > locally enable that option by default in your binutils? The warning is off by default so there is no need to do anything in the testsu

Re: [PATCH v2 2/2][ARM] Improve max_cond_insns setting for Cortex cores

2019-12-09 Thread Wilco Dijkstra
Hi Christophe, >> The warning is off by default so there is no need to do anything in the >> testsuite, >> you just need a fixed binutils. >> > > Don't we want to fix GCC to stop generating the offending sequence? Why? All ARMv8 implementations have to support it, and despite the warning code a

Re: [PATCH] PR90838: Support ctz idioms

2019-12-11 Thread Wilco Dijkstra
d)((x & -x) * 0x077CB531U)) >> 27]; } Is optimized to: rbit w0, w0 clz w0, w0 and w0, w0, 31 ret Bootstrapped on AArch64. OK for commit? ChangeLog: 2019-12-11 Wilco Dijkstra PR tree-optimization/90838 * tree-ssa-forwprop.c

[PATCH][AArch64] Fixup core tunings

2019-12-13 Thread Wilco Dijkstra
ortex-A65AE to cortexa53. Bootstrap OK, OK for commit? ChangeLog: 2019-12-11 Wilco Dijkstra * config/aarch64/aarch64-cores.def: Update settings for cortex-a76ae, cortex-a77, cortex-a65, cortex-a65ae, neoverse-e1, cortex-a76, cortex-a55. -- diff --git a/gcc/config/aarch64/aa

Re: [PATCH][AArch64] Fixup core tunings

2019-12-17 Thread Wilco Dijkstra
's the same as for Cortex-A65. Set the scheduler for Cortex-A65 and Cortex-A65AE to cortexa53. Bootstrap OK, OK for commit? ChangeLog: 2019-12-17 Wilco Dijkstra * config/aarch64/aarch64-cores.def: ("cortex-a76ae"): Use neoversen1 tuning. ("cortex-a77")

Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-12-19 Thread Wilco Dijkstra
Hi, >> I've noticed that your patch caused a regression: >> FAIL: gcc.dg/tree-prof/pr77698.c scan-rtl-dump-times alignments >> "internal loop alignment added" 1 I've created https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93007 Cheers, Wilco

[PATCH][AARCH64] Enable compare branch fusion

2019-12-24 Thread Wilco Dijkstra
Enable the most basic form of compare-branch fusion since various CPUs support it. This has no measurable effect on cores which don't support branch fusion, but increases fusion opportunities on cores which do. Bootstrapped on AArch64, OK for commit? ChangeLog: 2019-12-24 Wilco Dij

[PATCH][AARCH64] Set jump-align=4 for neoversen1

2019-12-24 Thread Wilco Dijkstra
Testing shows the setting of 32:16 for jump alignment has a significant codesize cost, however it doesn't make a difference in performance. So set jump-align to 4 to get 1.6% codesize improvement. OK for commit? ChangeLog 2019-12-24 Wilco Dijkstra * config/aarch64/aarc

Re: [wwwdocs] Document -fcommon default change

2020-01-07 Thread Wilco Dijkstra
Hi, >On 1/6/20 7:10 AM, Jonathan Wakely wrote: >> GCC now defaults to -fno-common.  As a result, global >> variable accesses are more efficient on various targets.  In C, global >> variables with multiple tentative definitions will result in linker >> errors. > > This is better.  I'd also s/will/n
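As a rough illustration of the porting issue discussed in this thread: with -fcommon, two source files that each contain a tentative definition such as `int shared_counter;` are merged by the linker, whereas with -fno-common the same code fails with a multiple-definition error. The usual fix is one declaration in a header and exactly one definition. The sketch below shows that layout in a single file; the file names in the comments and the identifiers are hypothetical.

```c
/* Single-file illustration; in a real project the three parts would
   live in counter.h, counter.c and user.c (hypothetical names).  */

/* counter.h: declaration only, safe to include from many files.  */
extern int shared_counter;

/* counter.c: the one and only definition.  */
int shared_counter = 0;

/* user.c: uses the variable through the header's declaration.  */
int bump_counter (void)
{
  return ++shared_counter;
}
```

Code that cannot be changed immediately can still be built by passing -fcommon explicitly.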

[COMMITTED] ARM: Fix builtin-bswap-1.c test [PR113915]

2024-03-08 Thread Wilco Dijkstra
On Thumb-2 the use of CBZ blocks conditional execution, so change the test to compare with a non-zero value. gcc/testsuite/ChangeLog: PR target/113915 * gcc.target/arm/builtin-bswap.x: Fix test to avoid emitting CBZ. --- diff --git a/gcc/testsuite/gcc.target/arm/builtin-bswap.x

Re: [PATCH] libatomic: Fix build for --disable-gnu-indirect-function [PR113986]

2024-03-26 Thread Wilco Dijkstra
Hi Richard, > This description is too brief for me.  Could you say in detail how the > new scheme works?  E.g. the description doesn't explain: > > -if ARCH_AARCH64_HAVE_LSE128 > -AM_CPPFLAGS   = -DHAVE_FEAT_LSE128 > -endif That is not needed because we can include auto-config.h in atomic_16.

[PATCH] libatomic: Cleanup macros in atomic_16.S

2024-03-26 Thread Wilco Dijkstra
As mentioned in https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648397.html , do some additional cleanup of the macros and aliases: Cleanup the macros to add the libat_ prefixes in atomic_16.S. Emit the alias to __atomic_ when ifuncs are not enabled in the ENTRY macro. Passes regress and

[PATCH] libgcc: Add missing HWCAP entries to aarch64/cpuinfo.c

2024-04-02 Thread Wilco Dijkstra
A few HWCAP entries are missing from aarch64/cpuinfo.c. This results in build errors on older machines. This counts as a trivial build fix, but since it's late in stage 4 I'll let maintainers chip in. OK for commit? libgcc/ * config/aarch64/cpuinfo.c: Add HWCAP_EVTSTRM, HWCAP_CRC32, HWC

[PATCH] AArch64: memcpy/memset expansions should not emit LDP/STP [PR113618]

2024-02-01 Thread Wilco Dijkstra
The new RTL introduced for LDP/STP results in regressions due to use of UNSPEC. Given the new LDP fusion pass is good at finding LDP opportunities, change the memcpy, memmove and memset expansions to emit single vector loads/stores. This fixes the regression and enables more RTL optimization on th

[PATCH] ARM: Fix conditional execution [PR113915]

2024-02-21 Thread Wilco Dijkstra
By default most patterns can be conditionalized on Arm targets. However Thumb-2 predication requires the "predicable" attribute be explicitly set to "yes". Most patterns are shared between Arm and Thumb(-2) and are marked with "predicable". Given this sharing, it does not make sense to use a di

Re: [PATCH] AArch64: memcpy/memset expansions should not emit LDP/STP [PR113618]

2024-02-22 Thread Wilco Dijkstra
Hi Richard, > It looks like this is really doing two things at once: disabling the > direct emission of LDP/STP Qs, and switching the GPR handling from using > pairs of DImode moves to single TImode moves.  At least, that seems to be > the effect of... No it still uses TImode for the !TARGET_SIMD

Re: [PATCH] ARM: Fix conditional execution [PR113915]

2024-02-23 Thread Wilco Dijkstra
Hi Richard, > This bit isn't.  The correct fix here is to fix the pattern(s) concerned to > add the missing predicate. > > Note that builtin-bswap.x explicitly mentions predicated mnemonics in the > comments. I fixed the patterns in v2. There are likely some more, plus we could likely merge ma

[PATCH] libatomic: Fix build for --disable-gnu-indirect-function [PR113986]

2024-02-23 Thread Wilco Dijkstra
Fix libatomic build to support --disable-gnu-indirect-function on AArch64. Always build atomic_16.S and add aliases to the __atomic_* functions if !HAVE_IFUNC. Passes regress and bootstrap, OK for commit? libatomic: PR target/113986 * Makefile.in: Regenerated. * Makefile.
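The aliasing idea can be sketched in C, with the caveat that the actual patch works in assembly (atomic_16.S) and on 16-byte operations; this is only an analogy. The names demo_libat_load_8 and demo_atomic_load_8 are hypothetical stand-ins, HAVE_IFUNC is assumed to come from the build configuration (an undefined macro evaluates to 0 in #if), and the alias attribute assumes GCC or Clang on an ELF target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an internal, libat_-prefixed entry point.  */
uint64_t
demo_libat_load_8 (uint64_t *ptr)
{
  assert (ptr != 0);
  return __atomic_load_n (ptr, __ATOMIC_SEQ_CST);
}

#if !HAVE_IFUNC
/* Without ifuncs there is no runtime selection, so the public name is
   exported as a plain alias of the single implementation.  */
uint64_t demo_atomic_load_8 (uint64_t *ptr)
  __attribute__ ((alias ("demo_libat_load_8")));
#endif
```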

Re: [PATCH] ARM: Fix conditional execution [PR113915]

2024-02-26 Thread Wilco Dijkstra
Hi Richard, > Did you test this on a thumb1 target?  It seems to me that the target parts > that you've > removed were likely related to that.  In fact, I don't see why this test > would need to be changed at all. The testcase explicitly forces a Thumb-2 target (arm_arch_v6t2). The patterns wer

[PATCH] AArch64: Reassociate CONST in address expressions [PR112573]

2024-01-10 Thread Wilco Dijkstra
GCC tends to optimistically create CONST of globals with an immediate offset. However it is almost always better to CSE addresses of globals and add immediate offsets separately (the offset could be merged later in single-use cases). Splitting CONST expressions with an index in aarch64_legitimize_

Re: [PATCH] AArch64: Reassociate CONST in address expressions [PR112573]

2024-01-16 Thread Wilco Dijkstra
Hi Richard, >> +  rtx base = strip_offset_and_salt (XEXP (x, 1), &offset); > > This should be just strip_offset, so that we don't lose the salt > during optimisation. Fixed. > + > +  if (offset.is_constant ()) > I'm not sure this is really required.  Logically the same thing > would app

[PATCH] AArch64: Add -mcpu=cobalt-100

2024-01-16 Thread Wilco Dijkstra
Add support for -mcpu=cobalt-100 (Neoverse N2 with a different implementer ID). Passes regress, OK for commit? gcc/ChangeLog: * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add 'cobalt-100' CPU. * config/aarch64/aarch64-tune.md: Regenerated. * doc/invoke.texi (-mcpu):

Re: [PATCH] AArch64: Add -mcpu=cobalt-100

2024-01-25 Thread Wilco Dijkstra
Hi, >> Add support for -mcpu=cobalt-100 (Neoverse N2 with a different implementer >> ID). >> >> Passes regress, OK for commit? > > Ok. Also OK to backport to GCC 13, 12 and 11? Cheers, Wilco

[PATCH] AArch64: Remove AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS

2024-01-30 Thread Wilco Dijkstra
(follow-on based on review comments on https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641913.html) Remove the tune AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS since it is only used by an old core and doesn't properly support -Os. SPECINT_2017 shows that removing it has no performance difference

Re: [PATCH v4] AArch64: Cleanup memset expansion

2024-01-30 Thread Wilco Dijkstra
Hi Richard, >> That tune is only used by an obsolete core. I ran the memcpy and memset >> benchmarks from Optimized Routines on xgene-1 with and without LDP/STP. >> There is no measurable penalty for using LDP/STP. I'm not sure why it was >> ever added given it does not do anything useful. I'll po

Re: [PATCH v3] AArch64: Cleanup memset expansion

2023-12-22 Thread Wilco Dijkstra
v3: rebased to latest trunk Clean up the memset implementation. Similar to memcpy/memmove, use an offset and bytes throughout. Simplify the complex calculations when optimizing for size by using a fixed limit. Passes regress & bootstrap. gcc/ChangeLog: * config/aarch64/aarch64.h (MAX_SET_SI

Re: [PATCH v3 2/3] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2024-01-08 Thread Wilco Dijkstra
Hi, >> Is there no benefit to using SWPPL for RELEASE here?  Similarly for the >> others. > > We started off implementing all possible memory orderings available. > Wilco saw value in merging less restricted orderings into more > restricted ones - mainly to reduce codesize in less frequently use

Re: [PATCH v3 2/3] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2024-01-08 Thread Wilco Dijkstra
Hi Richard, >> Benchmarking showed that LSE and LSE2 RMW atomics have similar performance >> once >> the atomic is acquire, release or both. Given there is already a significant >> overhead due >> to the function call, PLT indirection and argument setup, it doesn't make >> sense to add >> extra

Re: [PATCH v4] AArch64: Cleanup memset expansion

2024-01-09 Thread Wilco Dijkstra
Hi Richard, >> +#define MAX_SET_SIZE(speed) (speed ? 256 : 96) > > Since this isn't (AFAIK) a standard macro, there doesn't seem to be > any need to put it in the header file.  It could just go at the head > of aarch64.cc instead. Sure, I've moved it in v4. >> +  if (len <= 24 || (aarch64_tune_p
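A small sketch of how a limit like the quoted MAX_SET_SIZE could gate inline expansion. The macro values are taken from the quoted line; the extra parentheses around the argument, the helper memset_fits_inline, and the inclusive comparison are assumptions for illustration and are not taken from aarch64.cc.

```c
#include <assert.h>
#include <stddef.h>

/* Limit from the quoted patch line: 256 bytes when optimizing for
   speed, 96 bytes when optimizing for size.  The argument is
   parenthesized here as a defensive tweak.  */
#define MAX_SET_SIZE(speed) ((speed) ? 256 : 96)

/* Hypothetical helper: return nonzero if a memset of LEN bytes is
   within the inline-expansion limit for the given optimization goal.  */
int
memset_fits_inline (size_t len, int speed)
{
  assert (speed == 0 || speed == 1);
  return len <= (size_t) MAX_SET_SIZE (speed);
}
```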
