[PATCH][AArch64] Improve popcount expansion

2020-02-03 Thread Wilco Dijkstra
expansion is now: fmov s0, w0; cnt v0.8b, v0.8b; addv b0, v0.8b; fmov w0, s0. Bootstrap OK, passes regress. ChangeLog 2020-02-02 Wilco Dijkstra gcc/ * config/aarch64/aarch64.md (popcount2): Improve expansion. * config/aarch64/aarch64-simd.md
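
For context, here is a minimal C sketch of the kind of source code such a popcount expansion applies to. It assumes GCC's `__builtin_popcount`; the function names are illustrative and do not come from the patch. The loop version is only a portable reference for checking results.

```c
#include <stdint.h>

/* Illustrative wrapper (name is hypothetical): GCC expands the builtin
   through the target's popcount pattern when one is available. */
unsigned popcount32(uint32_t x)
{
    return (unsigned)__builtin_popcount(x);
}

/* Portable reference: clear the lowest set bit until none remain. */
unsigned popcount32_ref(uint32_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;   /* drops the lowest set bit */
        n++;
    }
    return n;
}
```

Compiling `popcount32` for AArch64 at -O2 and inspecting the assembly is one way to see which instruction sequence a given compiler version emits.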

Re: [PATCH][AArch64] Fix shrinkwrapping interactions with atomics (PR92692)

2020-01-27 Thread Wilco Dijkstra
Hi Segher, > On Thu, Jan 16, 2020 at 12:50:14PM +0000, Wilco Dijkstra wrote: >> The separate shrinkwrapping pass may insert stores in the middle >> of atomics loops which can cause issues on some implementations. >> Avoid this by delaying splitting of atomic patterns until a

Re: [PATCH v2][ARM] Disable code hoisting with -O3 (PR80155)

2020-01-21 Thread Wilco Dijkstra
hoisting for -O3 and higher. OK for commit? ChangeLog: 2019-11-26 Wilco Dijkstra PR tree-optimization/80155 * common/config/arm/arm-common.c (arm_option_optimization_table): Disable -fcode-hoisting with -O3. -- diff --git a/gcc/common/config/arm/arm-common.c b/gcc/common

Re: [PATCH][ARM] Correctly set SLOW_BYTE_ACCESS

2020-01-21 Thread Wilco Dijkstra
r3, r2, r3 add r0, r0, r3 bx lr Bootstrap OK, OK for commit? ChangeLog: 2019-09-11 Wilco Dijkstra * config/arm/arm.h (SLOW_BYTE_ACCESS): Set to 1. -- diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index e07cf03538c5bb23e3285859b9e44a6

Re: [PATCH 3/4 GCC11] IVOPTs Consider cost_step on different forms during unrolling

2020-01-20 Thread Wilco Dijkstra
Hi Kewen, Would it not make more sense to use the TARGET_ADDRESS_COST hook to return different costs for immediate offset and register offset addressing, and ensure IVOpts correctly takes this into account? On AArch64 we've defined different costs for immediate offset, register offset, register

Re: [PATCH][AARCH64] Set jump-align=4 for neoversen1

2020-01-17 Thread Wilco Dijkstra
Hi Kyrill & Richard, > I was leaving this to others in case it was obvious to them. On the > basis that silence suggests it wasn't, :-) could you go into more details? > Is it expected on first principles that jump alignment doesn't matter > for Neoverse N1, or is this purely based on

Re: [PATCH][AARCH64] Enable compare branch fusion

2020-01-17 Thread Wilco Dijkstra
Hi Richard, > If you're able to say for the record which cores you tested, then that'd > be good. I've mostly checked it on Cortex-A57 - if there is any effect, it would be on older cores. > OK, thanks.  I agree there doesn't seem to be an obvious reason why this > would pessimise any cores

Re: [PATCH][Arm] Only enable fsched-pressure with Ofast

2020-01-16 Thread Wilco Dijkstra
floating point code is generally beneficial (more registers and higher latencies), only enable the pressure scheduler with -Ofast. On Cortex-A57 this gives a 0.7% performance gain on SPECINT2006 as well as a 0.2% codesize reduction. Bootstrapped on armhf. OK for commit? ChangeLog: 2019-11-06 Wilco

Re: [PATCH][AARCH64] Enable compare branch fusion

2020-01-16 Thread Wilco Dijkstra
ping Enable the most basic form of compare-branch fusion since various CPUs support it. This has no measurable effect on cores which don't support branch fusion, but increases fusion opportunities on cores which do. Bootstrapped on AArch64, OK for commit? ChangeLog: 2019-12-24 Wilco Dijkstra

Re: [PATCH][AARCH64] Set jump-align=4 for neoversen1

2020-01-16 Thread Wilco Dijkstra
ping Testing shows the setting of 32:16 for jump alignment has a significant codesize cost, however it doesn't make a difference in performance. So set jump-align to 4 to get 1.6% codesize improvement. OK for commit? ChangeLog 2019-12-24 Wilco Dijkstra * config/aarch64/aarch64.c

[PATCH][AArch64] Fix shrinkwrapping interactions with atomics (PR92692)

2020-01-16 Thread Wilco Dijkstra
this fixes the failure you were getting? ChangeLog: 2020-01-16 Wilco Dijkstra PR target/92692 * config/aarch64/aarch64.c (aarch64_split_compare_and_swap) Add assert to ensure prolog has been emitted. (aarch64_split_atomic_op): Likewise. * config/aarch64

Re: [PATCH] Fix ctz issues (PR93231)

2020-01-15 Thread Wilco Dijkstra
ants. Check the type is a char type for the string constant case to avoid accidentally matching a wide STRING_CST. Add a tree_expr_nonzero_p check to allow the optimization even if CTZ_DEFINED_VALUE_AT_ZERO returns 0 or 1. Add extra test cases. Bootstrap OK on AArch64 and x64. ChangeLog: 2020-01-15 Wil

[PATCH] Fix ctz issues (PR93231)

2020-01-13 Thread Wilco Dijkstra
returns 0 or 1. Add extra test cases. (note the diff uses the old tree and includes Jakub's bootstrap fixes) Bootstrap OK on AArch64 and x64. ChangeLog: 2020-01-13 Wilco Dijkstra PR tree-optimization/93231 * tree-ssa-forwprop.c (optimize_count_trailing_zeroes): Use

Re: [PATCH] Further bootstrap unbreak (was Re: [PATCH] PR90838: Support ctz idioms)

2020-01-13 Thread Wilco Dijkstra
Hi Jakub, On Sat, Jan 11, 2020 at 05:30:52PM +0100, Jakub Jelinek wrote: > On Sat, Jan 11, 2020 at 05:24:19PM +0100, Andreas Schwab wrote: > > ../../gcc/tree-ssa-forwprop.c: In function 'bool > > simplify_count_trailing_zeroes(gimple_stmt_iterator*)': > > ../../gcc/tree-ssa-forwprop.c:1925:23:

Re: [wwwdocs] Document -fcommon default change

2020-01-07 Thread Wilco Dijkstra
Hi, >On 1/6/20 7:10 AM, Jonathan Wakely wrote: >> GCC now defaults to -fno-common.  As a result, global >> variable accesses are more efficient on various targets.  In C, global >> variables with multiple tentative definitions will result in linker >> errors. > > This is better.  I'd also
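
As a rough illustration of the usual fix for code that relied on tentative definitions, the sketch below collapses a hypothetical header, its one defining source file, and a user into a single listing. The file names and the `request_count` variable are assumptions made for the example, not taken from the thread.

```c
/* shared.h (hypothetical): declare the variable, do not define it.
   Without 'extern', every file including the header would get its own
   tentative definition, which -fno-common turns into a link error. */
extern int request_count;

/* shared.c (hypothetical): the single real definition. */
int request_count = 0;

/* user.c (hypothetical): any file may use the variable after
   including the header. */
int bump_request_count(void)
{
    return ++request_count;
}
```

Projects that cannot be fixed right away can still be built with an explicit -fcommon.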

[PATCH][AARCH64] Set jump-align=4 for neoversen1

2019-12-24 Thread Wilco Dijkstra
Testing shows the setting of 32:16 for jump alignment has a significant codesize cost, however it doesn't make a difference in performance. So set jump-align to 4 to get 1.6% codesize improvement. OK for commit? ChangeLog 2019-12-24 Wilco Dijkstra * config/aarch64/aarch64.c

[PATCH][AARCH64] Enable compare branch fusion

2019-12-24 Thread Wilco Dijkstra
Enable the most basic form of compare-branch fusion since various CPUs support it. This has no measurable effect on cores which don't support branch fusion, but increases fusion opportunities on cores which do. Bootstrapped on AArch64, OK for commit? ChangeLog: 2019-12-24 Wilco Dijkstra

Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-12-19 Thread Wilco Dijkstra
Hi, >> I've noticed that your patch caused a regression: >> FAIL: gcc.dg/tree-prof/pr77698.c scan-rtl-dump-times alignments >> "internal loop alignment added" 1 I've created https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93007 Cheers, Wilco

Re: [PATCH][AArch64] Fixup core tunings

2019-12-17 Thread Wilco Dijkstra
the same as for Cortex-A65. Set the scheduler for Cortex-A65 and Cortex-A65AE to cortexa53. Bootstrap OK, OK for commit? ChangeLog: 2019-12-17 Wilco Dijkstra * config/aarch64/aarch64-cores.def: ("cortex-a76ae"): Use neoversen1 tuning. ("cortex-a77"): Likewise

[PATCH][AArch64] Fixup core tunings

2019-12-13 Thread Wilco Dijkstra
-A65AE to cortexa53. Bootstrap OK, OK for commit? ChangeLog: 2019-12-11 Wilco Dijkstra * config/aarch64/aarch64-cores.def: Update settings for cortex-a76ae, cortex-a77, cortex-a65, cortex-a65ae, neoverse-e1, cortex-a76.cortex-a55. -- diff --git a/gcc/config/aarch64/aarch64

Re: [PATCH] PR90838: Support ctz idioms

2019-12-11 Thread Wilco Dijkstra
rbit w0, w0; clz w0, w0; and w0, w0, 31; ret. Bootstrapped on AArch64. OK for commit? ChangeLog: 2019-12-11 Wilco Dijkstra PR tree-optimization/90838 * tree-ssa-forwprop.c (check_ctz_array): Add new function. (check_ctz_string): Likew

Re: [PATCH v2 2/2][ARM] Improve max_cond_insns setting for Cortex cores

2019-12-09 Thread Wilco Dijkstra
Hi Christophe, >> The warning is off by default so there is no need to do anything in the >> testsuite, >> you just need a fixed binutils. >> > > Don't we want to fix GCC to stop generating the offending sequence? Why? All ARMv8 implementations have to support it, and despite the warning code

Re: [PATCH v2 2/2][ARM] Improve max_cond_insns setting for Cortex cores

2019-12-06 Thread Wilco Dijkstra
Hi Christophe, > In practice, how do you activate it when running the GCC testsuite? Do > you plan to send a GCC patch to enable this assembler flag, or do you > locally enable that option by default in your binutils? The warning is off by default so there is no need to do anything in the

Re: [PATCH v2 2/2][ARM] Improve max_cond_insns setting for Cortex cores

2019-12-06 Thread Wilco Dijkstra
Hi Christophe, I've added an option to allow the warning to be enabled/disabled: https://sourceware.org/ml/binutils/2019-12/msg00093.html Cheers, Wilco

Re: [PATCH v2 2/2][ARM] Improve max_cond_insns setting for Cortex cores

2019-12-06 Thread Wilco Dijkstra
Hi Christophe, > This patch (r278968) is causing regressions when building GCC > --target arm-none-linux-gnueabihf > --with-mode thumb > --with-cpu cortex-a57 > --with-fpu crypto-neon-fp-armv8 > because the assembler (gas version 2.33.1) complains: > /ccc7z5eW.s:4267: IT blocks containing more

Re: [PATCH] PR85678: Change default to -fno-common

2019-12-05 Thread Wilco Dijkstra
Hi, I have updated the documentation patch here and added relevant maintainers so hopefully this can go in soon: https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00311.html I moved the paragraph in changes.html to the C section like you suggested. Would it make sense to link to the porting_to

[wwwdocs] Document -fcommon default change

2019-12-05 Thread Wilco Dijkstra
Hi, Add entries for the default change in changes.html and porting_to.html. Passes the W3 validator. Cheers, Wilco --- diff --git a/htdocs/gcc-10/changes.html b/htdocs/gcc-10/changes.html index e02966460450b7aad884b2d45190b9ecd8c7a5d8..304e1e8ccd38795104156e86b92062696fa5aa8b 100644 ---

Re: [PATCH] PR85678: Change default to -fno-common

2019-12-04 Thread Wilco Dijkstra
Hi Jeff, >> I've noticed quite significant package failures caused by the revision. >> Would you please consider documenting this change in porting_to.html >> (and in changes.html) for GCC 10 release? > > I'm not in the office right now, but figured I'd chime in.  I'd estimate > 400-500 packages

Re: [PATCH][AArch64] Add support for fused compare and branch

2019-12-03 Thread Wilco Dijkstra
ith branch. Rename the existing AARCH64_FUSE_CMP_BRANCH to ALU_BRANCH, and AARCH64_FUSE_ALU_BRANCH to ALU_CBZ to make it clear what is being fused. AArch64 bootstrap OK, OK to commit? ChangeLog: 2019-12-03 Wilco Dijkstra * config/aarch64/aarch64.c (thunderxt88_tunin

[PATCH v2 2/2][ARM] Improve max_cond_insns setting for Cortex cores

2019-12-03 Thread Wilco Dijkstra
on SPECINT2006 as well as a 0.4% codesize reduction. Bootstrapped on armhf. OK for commit? ChangeLog: 2019-12-03 Wilco Dijkstra * config/arm/arm.c (arm_option_override_internal): Use max_cond_insns from CPU tuning unless -mrestrict-it is used. -- diff --git a/gcc/config/arm/arm.c b

Re: [PATCH][ARM] Improve max_cond_insns setting for Cortex cores

2019-12-03 Thread Wilco Dijkstra
to 5 due to historical reasons. Benchmarking shows that max_cond_insns=2 is fastest on modern Cortex-A cores, so change it to 2. Set it to 4 on older in-order cores as that is the MAX_INSN_PER_IT_BLOCK limit for Thumb-2. Bootstrapped on armhf. OK for commit? ChangeLog: 2019-12-03 Wilco Dijkstra

[PATCH][GCC8][AArch64] Backport Cortex-A76, Ares and Neoverse N1 cpu names

2019-12-02 Thread Wilco Dijkstra
Add support for Cortex-A76, Ares and Neoverse N1 cpu names in GCC8 branch. 2019-11-29 Wilco Dijkstra * config/aarch64/aarch64-cores.def (ares): Define. (cortex-a76): Likewise. (neoverse-n1): Likewise. * config/aarch64/aarch64-tune.md: Regenerate. * doc

[COMMITTED][GCC8] Backport driver/89014 Use-after-free in aarch64 -march=native

2019-11-29 Thread Wilco Dijkstra
Hi, I've backported r268189 to GCC8: aarch64: fix use-after-free in -march=native (PR driver/89014) Running: $ valgrind ./xgcc -B. -c test.c -march=native on aarch64 shows a use-after-free in host_detect_local_cpu due to the std::string result of aarch64_get_extension_string_for_isa_flags

[PATCH][AArch64] Add support for fused compare and branch

2019-11-29 Thread Wilco Dijkstra
Hi, Add support for fused compare with branch. Rename the existing AARCH64_FUSE_CMP_BRANCH to ALU_BRANCH, and AARCH64_FUSE_ALU_BRANCH to ALU_CBZ to make it clear what is being fused. AArch64 bootstrap OK, OK to commit? ChangeLog: 2019-11-29 Wilco Dijkstra * config/aarch64/aarch64

Re: [PATCH] PR85678: Change default to -fno-common

2019-11-29 Thread Wilco Dijkstra
Hi Martin, > I've noticed quite significant package failures caused by the revision. How significant? Is it mostly the common mistake of forgetting extern? > Would you please consider documenting this change in porting_to.html > (and in changes.html) for GCC 10 release? Sure, I already had a

Re: [PATCH] PR90838: Support ctz idioms

2019-11-28 Thread Wilco Dijkstra
ch64. OK for commit? ChangeLog: 2019-11-15 Wilco Dijkstra PR tree-optimization/90838 * tree-ssa-forwprop.c (optimize_count_trailing_zeroes): Add new function. (simplify_count_trailing_zeroes): Add new function. (pass_forwprop::execute): Try ctz simplif

Re: [PATCH, GCC, AArch64] Fix PR88398 for AArch64

2019-11-27 Thread Wilco Dijkstra
Hi Richard, >> Yes so it does the insane "fully unrolled trailing loop before the unrolled >> loop" thing. One always does the trailing loop last (and typically as an >> actual loop of course) and then the code ends up much faster, close to >> the ideal version shown in the PR. > > Well, you

Re: [PATCH v2][ARM] Disable code hoisting with -O3 (PR80155)

2019-11-26 Thread Wilco Dijkstra
Hi Christophe, > Some time ago, you proposed to enable code hoisting for -Os instead, > and this is the approach that was chosen > in arm-9-branch. Why are you proposing a different setting for trunk? Like I said in my message, I've now done more detailed benchmarking which shows it affects -O3

[PATCH v2][ARM] Disable code hoisting with -O3 (PR80155)

2019-11-26 Thread Wilco Dijkstra
for -O3 and higher. OK for commit? ChangeLog: 2019-11-26 Wilco Dijkstra PR tree-optimization/80155 * common/config/arm/arm-common.c (arm_option_optimization_table): Disable -fcode-hoisting with -O3. -- diff --git a/gcc/common/config/arm/arm-common.c b/gcc/common/config

Re: [PATCH/AARCH64] Generate FRINTZ for (double)(long) under -ffast-math on aarch64

2019-11-26 Thread Wilco Dijkstra
Hi Andrew, Could you repost your patch please to make review easier/quicker? It's no longer linked... Cheers, Wilco

Re: [PATCH] Fix libstdc++ compiling for an aarch64 multilib with big-endian.

2019-11-26 Thread Wilco Dijkstra
Hi Andrew, > Hi if we have a aarch64 compiler that has a big-endian > multi-lib, it fails to compile libstdc++ because > simd_fast_mersenne_twister_engine is only defined for little-endian > in ext/random but ext/opt_random.h thinks it is defined always. > > OK? Built an aarch64-elf toolchain

[COMMITTED] Fix global_vars_f90_init test failure

2019-11-21 Thread Wilco Dijkstra
Add a missing extern to ensure the test passes with -fno-common. Committed as obvious. ChangeLog: 2019-11-21 Wilco Dijkstra testsuite/ * gfortran.dg/global_vars_f90_init_driver.c: Add missing extern. -- diff --git a/gcc/testsuite/gfortran.dg/global_vars_f90_init_driver.c b/gcc

Re: [PATCH] Fix libgo build (was Re: [PATCH v3] PR85678: Change default to -fno-common)

2019-11-21 Thread Wilco Dijkstra
Hi Rainer, >> ld: warning: symbol 'err' has differing types: >> (file /var/tmp//ccWQCyMc.o type=OBJT; file /lib/libc.so type=FUNC); >> /var/tmp//ccWQCyMc.o definition taken So are glob and err somehow exported as globals by your GLIBC? I don't think those are standard

[COMMITTED][AArch64] Fix vrbit_1.c test failure

2019-11-20 Thread Wilco Dijkstra
The vrbit_1 test was missing a flag to disable code sharing. Committed as obvious. ChangeLog: 2019-11-20 Wilco Dijkstra testsuite/ * gcc.target/aarch64/simd/vrbit_1.c: Add -fno-ipa-icf. -- diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vrbit_1.c b/gcc/testsuite/gcc.target

Re: [PATCH][AArch64] PR79262: Adjust vector cost

2019-11-19 Thread Wilco Dijkstra
Hi Richard, > I acked this here: > https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01229.html Thanks - I missed your email, but it's committed now. Yes we will need to look at the vector costs again and retune them based on recent vectorizer improvements and latest microarchitectures. Cheers,

Re: [PATCH][AArch64] PR79262: Adjust vector cost

2019-11-19 Thread Wilco Dijkstra
the testcase - libquantum and SPECv6 performance improves. OK for commit? ChangeLog: 2018-01-22 Wilco Dijkstra PR target/79262 * config/aarch64/aarch64.c (generic_vector_cost): Adjust vec_to_scalar_cost. -- diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c

Re: [PATCH][Arm] Only enable fsched-pressure with Ofast

2019-11-19 Thread Wilco Dijkstra
floating point code is generally beneficial (more registers and higher latencies), only enable the pressure scheduler with -Ofast. On Cortex-A57 this gives a 0.7% performance gain on SPECINT2006 as well as a 0.2% codesize reduction. Bootstrapped on armhf. OK for commit? ChangeLog: 2019-11-06 Wilco

Re: [PATCH][ARM] Improve max_cond_insns setting for Cortex cores

2019-11-19 Thread Wilco Dijkstra
to that by MAX_INSN_PER_IT_BLOCK. Also use the CPU tuning setting when a CPU/tune is selected if -mrestrict-it is not explicitly set. On Cortex-A57 this gives 1.1% performance gain on SPECINT2006 as well as a 0.4% codesize reduction. Bootstrapped on armhf. OK for commit? ChangeLog: 2019-08-19 Wilco Dijkstra

[PATCH][Arm] Set Armv7-A tune to Cortex-A53

2019-11-18 Thread Wilco Dijkstra
, and codesize reduces by 0.2%. OK for commit? ChangeLog: 2019-11-15 Wilco Dijkstra * config/arm/arm-cpus.in (armv7): Set tune to Cortex-A53. (armv7-a): Likewise. (armv7ve): Likewise. --- diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in index

Re: [PATCH, GCC, AArch64] Fix PR88398 for AArch64

2019-11-15 Thread Wilco Dijkstra
Hi Richard, > So what do we actually do unpatched with -funroll-loops here? Yes so it does the insane "fully unrolled trailing loop before the unrolled loop" thing. One always does the trailing loop last (and typically as an actual loop of course) and then the code ends up much faster, close to

Re: [PATCH] PR90838: Support ctz idioms

2019-11-15 Thread Wilco Dijkstra
4. OK for commit? ChangeLog: 2019-11-15 Wilco Dijkstra PR tree-optimization/90838 * tree-ssa-forwprop.c (optimize_count_trailing_zeroes): Add new function. (simplify_count_trailing_zeroes): Add new function. (pass_forwprop::execute): Try ctz simplification.

Re: [PATCH] PR90838: Support ctz idioms

2019-11-13 Thread Wilco Dijkstra
Hi Segher, > Out of interest, what uses this? I have never seen it before. It's used in sjeng in SPEC and gives a 2% speedup on Cortex-A57. Tricks like this used to be very common 20 years ago since a loop or binary search is way too slow and few CPUs supported fast clz/ctz instructions. It's

[PATCH] PR90838: Support ctz idioms

2019-11-12 Thread Wilco Dijkstra
18, 6, 11, 5, 10, 9 }; return table[((unsigned)((x & -x) * 0x077CB531U)) >> 27]; } Is optimized to: rbit w0, w0; clz w0, w0; and w0, w0, 31; ret. Bootstrapped on AArch64. OK for commit? ChangeLog: 2019-11-12 Wilco Dijkstra
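
Since the snippet above is cut off, here is a self-contained sketch of the same de Bruijn count-trailing-zeros idiom. It is not the test case from the patch: the function names are made up, and the lookup table is derived from the constant 0x077CB531U at run time rather than hard-coded.

```c
#include <stdint.h>

static int debruijn_table[32];

/* Build the table from the de Bruijn constant: shifting it left by i
   leaves a distinct 5-bit pattern in the top bits for each i in 0..31. */
void init_debruijn_table(void)
{
    for (int i = 0; i < 32; i++)
        debruijn_table[(uint32_t)(0x077CB531U << i) >> 27] = i;
}

/* x & -x isolates the lowest set bit, so the multiply acts as a left
   shift by the trailing-zero count; the top 5 bits index the table.
   For x == 0 the index is 0 and the function returns 0. */
int ctz_debruijn(uint32_t x)
{
    return debruijn_table[(uint32_t)((x & -x) * 0x077CB531U) >> 27];
}
```

Call `init_debruijn_table()` once before the first lookup. For nonzero inputs the result can be compared against `__builtin_ctz`.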

[PATCH][ARM] Improve max_cond_insns setting for Cortex cores

2019-11-06 Thread Wilco Dijkstra
. Also use the CPU tuning setting when a CPU/tune is selected if -mrestrict-it is not explicitly set. On Cortex-A57 this gives 1.1% performance gain on SPECINT2006 as well as a 0.4% codesize reduction. Bootstrapped on armhf. OK for commit? ChangeLog: 2019-08-19 Wilco Dijkstra * gcc

[PATCH][Arm] Only enable fsched-pressure with Ofast

2019-11-06 Thread Wilco Dijkstra
point code is generally beneficial (more registers and higher latencies), only enable the pressure scheduler with -Ofast. On Cortex-A57 this gives a 0.7% performance gain on SPECINT2006 as well as a 0.2% codesize reduction. Bootstrapped on armhf. OK for commit? ChangeLog: 2019-11-06 Wilco

Re: [PATCH v3] PR85678: Change default to -fno-common

2019-11-05 Thread Wilco Dijkstra
by -fcommon. It is about time to change the default. Passes bootstrap and regress on AArch64 and x64. OK for commit? ChangeLog 2019-11-05 Wilco Dijkstra PR85678 * common.opt (fcommon): Change init to 1. doc/ * invoke.texi (-fcommon): Update documentation. testsuite/

Re: [PATCH v2] PR85678: Change default to -fno-common

2019-11-04 Thread Wilco Dijkstra
Hi Richard, >> > Please don't add -fcommon in lto.exp. >> >> So what is the best way to add an extra option to lto.exp? >> Note dg-lto-options completely overrides the options from lto.exp, so I can't >> use that except in tests which already use it. > > On what testcases do you need it at all?

Re: [PATCH v2] PR85678: Change default to -fno-common

2019-10-30 Thread Wilco Dijkstra
Hi Richard, > Please don't add -fcommon in lto.exp. So what is the best way to add an extra option to lto.exp? Note dg-lto-options completely overrides the options from lto.exp, so I can't use that except in tests which already use it. Cheers, Wilco

[PATCH v2] PR85678: Change default to -fno-common

2019-10-29 Thread Wilco Dijkstra
to C code only, C++ code is not affected by -fcommon. It is about time to change the default. Bootstrap OK, passes testsuite on AArch64. OK for commit? ChangeLog 2019-10-29 Wilco Dijkstra PR85678 * common.opt (fcommon): Change init to 1. doc/ * invoke.texi (-fcommon

Re: [PATCH] PR85678: Change default to -fno-common

2019-10-29 Thread Wilco Dijkstra
Hi Iain, > for the record,  Darwin bootstraps OK with the change (which is to be > expected, > since the preferred setting for it is -fno-common). That's good to hear. > Testsuite fails are order “a few hundred” mostly seem to be related to > tree-prof > and vector tests (plus the anticipated

Re: [PATCH] PR85678: Change default to -fno-common

2019-10-28 Thread Wilco Dijkstra
Hi, >> I suppose targets can override this decision. > I think they probably could via the override_options mechanism. Yes, it's trivial to add this to target_option_override(): if (!global_options_set.x_flag_no_common) flag_no_common = 0; Cheers, Wilco

Re: [PATCH] PR85678: Change default to -fno-common

2019-10-28 Thread Wilco Dijkstra
Hi Jeff, > Has this been bootstrapped and regression tested? Yes, it bootstraps OK of course. I ran regression over the weekend, there are a few minor regressions in lto due to relying on tentative definitions and a few latent bugs. I'd expect there will be a few similar failures on other

[PATCH] PR85678: Change default to -fno-common

2019-10-25 Thread Wilco Dijkstra
by -fcommon. It is about time to change the default. OK for commit? ChangeLog 2019-10-25 Wilco Dijkstra PR85678 * common.opt (fcommon): Change init to 1. doc/ * invoke.texi (-fcommon): Update documentation. --- diff --git a/gcc/common.opt b/gcc/common.opt index

Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-10-16 Thread Wilco Dijkstra
Hi Christophe, > I've noticed that your patch caused a regression: > FAIL: gcc.dg/tree-prof/pr77698.c scan-rtl-dump-times alignments > "internal loop alignment added" 1 That's just a testism - it only tests for loop alignment and doesn't consider the possibility of the loop being jumped into

Re: [PATCH][AArch64] Fix symbol offset limit

2019-10-15 Thread Wilco Dijkstra
Hi Richard, > Sure, the "extern array of unknown size" case isn't about section anchors. > But this part of my message (snipped above) was about the other case > (objects of known size), and applied to individual objects as well as > section anchors. > > What I was trying to say is: yes, we need

Re: [PATCH][AArch64] Fix symbol offset limit

2019-10-14 Thread Wilco Dijkstra
Hi Richard, >> No - the testcases fail with that. > > Hmm, OK. Could you give more details? What does the motivating case > actually look like? Well it's now a very long time ago since I first posted this patch but the failure was in SPEC. It did something like [0xff000 - x], presumably

Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-10-11 Thread Wilco Dijkstra
Hi, > the defaults for v7-a are still to use the > Cortex-A8 scheduler I missed that part, but that's a serious bug btw - Cortex-A8 is 15 years old now so way beyond obsolete. Even Cortex-A53 is ancient now, but it has an accurate scheduler that performs surprisingly well on both in-order

Re: [PATCH][AArch64] Fix symbol offset limit

2019-10-11 Thread Wilco Dijkstra
Hi Richard, > If global_char really is a char then isn't that UB? No why? We can do all kinds of arithmetic based on pointers, either using pointer types or converted to uintptr_t. Note that the optimizer actually creates these expressions, for example arr[N-x] can be evaluated as ([0] + N) -

Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-10-11 Thread Wilco Dijkstra
Hi Ramana, > Can you see what happens with the Cortex-A8 or Cortex-A9 schedulers to > spread the range across some v7-a CPUs as well ? While they aren't that > popular today I > would suggest you look at them because the defaults for v7-a are still to use > the > Cortex-A8 scheduler and the

Re: [PATCH][ARM] Enable arm_legitimize_address for Thumb-2

2019-10-11 Thread Wilco Dijkstra
Hi Ramana, >On Mon, Sep 9, 2019 at 6:03 PM Wilco Dijkstra wrote: >> >> Currently arm_legitimize_address doesn't handle Thumb-2 at all, resulting in >> inefficient code. Since Thumb-2 supports similar address offsets use the Arm >> legitimization code for Thumb-2

Re: [PATCH][ARM] Tweak HONOR_REG_ALLOC_ORDER

2019-10-11 Thread Wilco Dijkstra
Hi Ramana, > My only question would be whether it's more suitable to use > optimize_function_for_size_p(cfun) instead as IIRC that gives us a > chance with lto rather than the global optimize_size. Yes that is even better and that defaults to optimize_size if cfun isn't set. I've committed this:

Re: [PATCH][AArch64] PR79262: Adjust vector cost

2019-10-10 Thread Wilco Dijkstra
the testcase - libquantum and SPECv6 performance improves. OK for commit? ChangeLog: 2018-01-22 Wilco Dijkstra PR target/79262 * config/aarch64/aarch64.c (generic_vector_cost): Adjust vec_to_scalar_cost. -- diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c

Re: [PATCH][ARM] Enable arm_legitimize_address for Thumb-2

2019-10-10 Thread Wilco Dijkstra
SPECFP improves 0.2%. Bootstrap OK, OK for commit? ChangeLog: 2019-09-09 Wilco Dijkstra * config/arm/arm.c (arm_legitimize_address): Remove Thumb-2 bailout. -- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index a5a6a0fab1b4b7ef07931522e7d47e59842d7f27

Re: [PATCH][ARM] Tweak HONOR_REG_ALLOC_ORDER

2019-10-10 Thread Wilco Dijkstra
? ChangeLog: 2019-09-09 Wilco Dijkstra * config/arm/arm.h (HONOR_REG_ALLOC_ORDER): Set when optimizing for size. -- diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index 8d023389eec469ad9c8a4e88edebdad5f3c23769..e3473e29fbbb964ff1136c226fbe30d35dbf7b39 100644 --- a/gcc

Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-10-10 Thread Wilco Dijkstra
inux-gnueabihf --with-cpu=cortex-a57 ChangeLog: 2019-07-29 Wilco Dijkstra * config/arm/arm.c (arm_option_override): Don't override sched pressure algorithm. -- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c i

Re: [PATCH][AArch64] Fix symbol offset limit

2019-10-10 Thread Wilco Dijkstra
. Bootstrapped on AArch64, passes regress, OK for commit? ChangeLog: 2018-11-09 Wilco Dijkstra gcc/ * config/aarch64/aarch64.c (aarch64_classify_symbol): Apply reasonable limit to symbol offsets. testsuite/ * gcc.target/aarch64

Re: [PATCH][AArch64] Set SLOW_BYTE_ACCESS

2019-10-10 Thread Wilco Dijkstra
for commit until we get rid of it? ChangeLog: 2017-11-17 Wilco Dijkstra gcc/ * config/aarch64/aarch64.h (SLOW_BYTE_ACCESS): Set to 1. -- diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 056110afb228fb919e837c04aa5e5552a4868ec3

Re: [PATCH][ARM] Correctly set SLOW_BYTE_ACCESS

2019-10-10 Thread Wilco Dijkstra
ap OK, OK for commit? ChangeLog: 2019-09-11 Wilco Dijkstra * config/arm/arm.h (SLOW_BYTE_ACCESS): Set to 1. -- diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index 8b92c830de09a3ad49420fdfacde02d8efc2a89b..11212d988a0f56299c2266bace80170d074be56c 100644 --- a/gcc/config/arm/ar

Re: [PATCH][ARM] Remove support for MULS

2019-10-10 Thread Wilco Dijkstra
Any further comments? Note GCC doesn't support S/UMULLS either since it is equally useless. It's no surprise that Thumb-2 removed support for flag-setting 64-bit multiplies, while AArch64 didn't add flag-setting multiplies. So there is no argument that these instructions are in any way useful

Re: [PATCH][ARM] Add logical DImode expanders

2019-09-19 Thread Wilco Dijkstra
Hi Richard, > Please reformat this as one mapping per line.  Over time I expect this > is only going to grow. Sure, I've committed it reformatted as r275970. Wilco

Re: [PATCH][ARM] Remove support for MULS

2019-09-19 Thread Wilco Dijkstra
Hi Richard, Kyrill, >> I disagree. If they still trigger and generate better code than without >> we should keep them. > >> What kind of code is *common* varies greatly from user to user. Not really - doing a multiply and checking whether the result is zero is exceedingly rare. I found only 3

Re: [PATCH][ARM] Add logical DImode expanders

2019-09-19 Thread Wilco Dijkstra
sting optab one. Here is what I did: [PATCH][ARM] Simplify logical DImode iterators Further simplify the logical DImode expander using code iterator and optab attributes. This avoids adding unnecessary code_attr entries. ChangeLog: 2019-09-19 Wilco Dijkstra * config/arm/

Re: [PATCH][ARM] Add logical DImode expanders

2019-09-18 Thread Wilco Dijkstra
Hi Kyrill, > We should be able to "compress" the above 3 patterns into one using code > iterators. Good point, that makes sense. I've committed this: ChangeLog: 2019-09-18 Wilco Dijkstra PR target/91738 * config/arm/arm.md (di3): Expand explicitly.

Re: [PATCH][ARM] Cleanup multiply patterns

2019-09-18 Thread Wilco Dijkstra
Hi Kyrill, >>  + (mult:SI (match_operand:SI 3 "s_register_operand" "r") >>  +  (match_operand:SI 2 "s_register_operand" "r"] > > Looks like we'll want to mark operand 2 here with '%' as well? That doesn't make any difference since both operands are identical. It only

Re: [PATCH][ARM] Enable code hoisting with -Os (PR80155)

2019-09-17 Thread Wilco Dijkstra
Hi Richard, > The issue with the bugzilla is that it lacked appropriate testcase(s) and thus > it is now a mess.  There are clear testcases (maybe not in the benchmarks you Agreed - it's not clear whether any of the proposed changes would actually help the original issue. My patch absolutely

Re: [ARM/FDPIC v6 13/24] [ARM] FDPIC: Force LSB bit for PC in Cortex-M architecture

2019-09-17 Thread Wilco Dijkstra
Hi Christophe, Can you explain this in more detail - it doesn't make sense to me to force the Thumb bit during unwinding since it should already be correct, even on a Thumb-only CPU. Perhaps the kernel code that pushes an incorrect address on the stack could be fixed instead? > Without this,

Re: [PATCH, AArch64, v3 0/6] LSE atomics out-of-line

2019-09-17 Thread Wilco Dijkstra
Hi Kyrill, >> When you select a CPU the goal is that we optimize and schedule for that >> specific microarchitecture. That implies using atomics that work best for >> that core rather than outlining them. > > I think we want to go ahead with this framework to enable the portable > deployment of

Re: [PATCH][ARM] Enable code hoisting with -Os (PR80155)

2019-09-16 Thread Wilco Dijkstra
Hi Prathamesh, > My only concern with the patch is that the issue isn't specific to > code-hoisting. > For this particular case (reproducible with pr77445-2.c), disabling > jump threading > doesn't cause the register spill with hoisting enabled. > Likewise disabling forwprop3 and forwprop4

Re: [PATCH, AArch64, v3 0/6] LSE atomics out-of-line

2019-09-16 Thread Wilco Dijkstra
Hi Richard, >> So what is the behaviour when you explicitly select a specific CPU? > > Selecting a specific cpu selects the specific architecture that the cpu > supports, does it not?  Thus the architecture example above still applies. > > Unless I don't understand what distinction that you're

Re: [PATCH][ARM] Enable code hoisting with -Os (PR80155)

2019-09-12 Thread Wilco Dijkstra
Hi Richard, > Do we document target specific deviations from "default" behavior somewhere? Not as far as I know. The other option changes in arm-common.c are not mentioned anywhere, neither is any of arm_option_override_internal. If we want to keep documentation useful, we shouldn't clutter

Re: [PATCH] Fix PR 91708

2019-09-11 Thread Wilco Dijkstra
Hi Jeff, > We're talking about two instructions where if the first executes, then > the second also executes.  If the memory addresses are the same, then > their alignment is the same. > > In your case the two instructions are on different execution paths and > are in fact mutually exclusive.

Re: [PATCH][ARM] Correctly set SLOW_BYTE_ACCESS

2019-09-11 Thread Wilco Dijkstra
Hi Paul, > > On Sep 11, 2019, at 11:48 AM, Wilco Dijkstra wrote: > > > > Contrary to all documentation, SLOW_BYTE_ACCESS simply means accessing > > bitfields by their declared type, which results in better codegeneration > > on practically any target.  S
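
To make the bitfield remark above concrete, here is a minimal C sketch of what "accessing a bitfield by its declared type" can look like. It is an illustration only, not code from the patch: the struct `flags`, the helpers `read_b` and `extract_b`, and the assumption that field `b` sits in bits 3..7 of the word are all hypothetical, since real bitfield layout is implementation-defined.

```c
#include <stdint.h>

/* Hypothetical example struct: every field is declared as unsigned int,
   so the declared type of each bitfield is a 32-bit word on typical
   Arm ABIs. */
struct flags {
    unsigned int a : 3;
    unsigned int b : 5;
    unsigned int c : 24;
};

/* Ordinary bitfield read; the compiler chooses the access width. */
unsigned int read_b(const struct flags *f)
{
    return f->b;
}

/* Model of a declared-type access: the whole 32-bit word has already been
   loaded, and the field is then isolated with a shift and a mask.
   Assumes (hypothetically) that b occupies bits 3..7 of the word. */
uint32_t extract_b(uint32_t word)
{
    return (word >> 3) & 0x1fu;
}
```

The second helper is only a model of the word-sized access pattern; it does not claim to reproduce the instruction sequence GCC emits for any particular target.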

Re: [PATCH] Fix PR 91708

2019-09-11 Thread Wilco Dijkstra
Hi Jeff, Jeff wrote: > Just to make sure I understand. Are you saying the addresses for the > MEMs are equal or the contents of the memory location are equal. > > For the former the alignment has to be the same, plain and simple, even > if GCC isn't aware the alignments have to be the same. > >

[PATCH][ARM] Enable code hoisting with -Os (PR80155)

2019-09-11 Thread Wilco Dijkstra
While code hoisting generally improves codesize, it can affect performance negatively. Benchmarking shows it doesn't help SPEC and negatively affects embedded benchmarks, so only enable code hoisting with -Os on Arm. Bootstrap OK, OK for commit? ChangeLog: 2019-09-11 Wilco Dijkstra
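
As a rough illustration of the transformation being discussed, the sketch below shows code hoisting done by hand in C. It is not taken from the patch or from GCC output, and the function names `select_unhoisted` and `select_hoisted` are hypothetical. The multiplication that appears on both branches is moved above the branch, so it is emitted once instead of twice.

```c
#include <stdint.h>

/* Before hoisting: a * b is computed separately on each path. */
int32_t select_unhoisted(int32_t a, int32_t b, int32_t c, int flag)
{
    if (flag)
        return a * b + c;
    else
        return a * b - c;
}

/* After hoisting (written by hand here): the common subexpression is
   computed once before the branch. The code is smaller, but prod is now
   live across the branch, which can add register pressure in larger
   functions. */
int32_t select_hoisted(int32_t a, int32_t b, int32_t c, int flag)
{
    int32_t prod = a * b;
    return flag ? prod + c : prod - c;
}
```

Both functions return the same values; the only difference is where the multiply is placed, which is the size-versus-performance trade-off the message describes.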

[PATCH][ARM] Correctly set SLOW_BYTE_ACCESS

2019-09-11 Thread Wilco Dijkstra
or commit? ChangeLog: 2019-09-11 Wilco Dijkstra * config/arm/arm.h (SLOW_BYTE_ACCESS): Set to 1. -- diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index 8b92c830de09a3ad49420fdfacde02d8efc2a89b..11212d988a0f56299c2266bace80170d074be56c 100644 --- a/gcc/config/arm/arm.h

Re: [PATCH][ARM] Cleanup 64-bit multiplies

2019-09-09 Thread Wilco Dijkstra
subreg issues due to other DImode operations splitting early. Bootstrap OK on armhf, regress passes. ChangeLog: 2019-09-03 Wilco Dijkstra * config/arm/arm.md (maddsidi4): Remove expander. (mulsidi3adddi): Remove pattern. (mulsidi3adddi_v6): Likewise

Re: [PATCH][ARM] Cleanup highpart multiply patterns

2019-09-09 Thread Wilco Dijkstra
Wilco Dijkstra * config/arm/arm.md (smulsi3_highpart): Use and iterators. (smulsi3_highpart_nov6): Remove pattern. (smulsi3_highpart_v6): Likewise. (umulsi3_highpart): Likewise. (umulsi3_highpart_nov6): Likewise. (umulsi3_highpart_v6

Re: [PATCH][ARM] Cleanup multiply patterns

2019-09-09 Thread Wilco Dijkstra
ping Cleanup the 32-bit multiply patterns. Merge the pre-Armv6 with the Armv6 patterns, remove useless alternatives and order the accumulator operands to prefer MLA Ra, Rb, Rc, Ra whenever feasible. Bootstrap OK on armhf, regress passes. ChangeLog: 2019-09-03 Wilco Dijkstra

Re: [PATCH][ARM] Remove support for MULS

2019-09-09 Thread Wilco Dijkstra
ping Remove various MULS/MLAS patterns which are enabled when optimizing for size. However the codesize gain from these patterns is so minimal that there is no point in keeping them. Bootstrap OK on armhf, regress passes. ChangeLog: 2019-09-03 Wilco Dijkstra * config

Re: [PATCH][AArch64] Fix symbol offset limit

2019-09-09 Thread Wilco Dijkstra
references. Bootstrapped on AArch64, passes regress, OK for commit? ChangeLog: 2018-11-09 Wilco Dijkstra gcc/ * config/aarch64/aarch64.c (aarch64_classify_symbol): Apply reasonable limit to symbol offsets. testsuite
