expansion is now:
fmov s0, w0
cnt v0.8b, v0.8b
addv b0, v0.8b
fmov w0, s0
Bootstrap OK, passes regress.
ChangeLog
2020-02-02 Wilco Dijkstra
gcc/
* config/aarch64/aarch64.md (popcount2): Improve expansion.
* config/aarch64/aarch64-simd.md
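For readers unfamiliar with the sequence above, here is a minimal C sketch of
what it computes, assuming the usual reading of the instructions: cnt counts
the set bits in each byte lane and addv sums the lanes. The function names
are hypothetical and the code only models the result; it is not the
expander from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Count the set bits in one byte (models one lane of "cnt v0.8b"). */
static unsigned byte_popcount(uint8_t b)
{
    unsigned n = 0;
    while (b) {
        b &= (uint8_t)(b - 1);  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Sum the per-byte counts (models "addv b0, v0.8b").
   Hypothetical helper name, for illustration only. */
unsigned popcount32_bytewise(uint32_t x)
{
    unsigned total = 0;
    for (int i = 0; i < 4; i++)
        total += byte_popcount((uint8_t)(x >> (8 * i)));
    return total;
}

/* Small usage check. */
void popcount32_selfcheck(void)
{
    assert(popcount32_bytewise(0x0000000Fu) == 4);
    assert(popcount32_bytewise(0xFFFFFFFFu) == 32);
}
```

Only four byte lanes are summed here because the input is 32 bits wide; the
remaining lanes of the 8-byte vector are assumed to be zero after the fmov.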
Hi Segher,
> On Thu, Jan 16, 2020 at 12:50:14PM +0000, Wilco Dijkstra wrote:
>> The separate shrinkwrapping pass may insert stores in the middle
>> of atomics loops which can cause issues on some implementations.
>> Avoid this by delaying splitting of atomic patterns until a
hoisting for -O3 and higher.
OK for commit?
ChangeLog:
2019-11-26 Wilco Dijkstra
PR tree-optimization/80155
* common/config/arm/arm-common.c (arm_option_optimization_table):
Disable -fcode-hoisting with -O3.
--
diff --git a/gcc/common/config/arm/arm-common.c
b/gcc/common
r3, r2, r3
add r0, r0, r3
bx lr
Bootstrap OK, OK for commit?
ChangeLog:
2019-09-11 Wilco Dijkstra
* config/arm/arm.h (SLOW_BYTE_ACCESS): Set to 1.
--
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index
e07cf03538c5bb23e3285859b9e44a6
Hi Kewen,
Would it not make more sense to use the TARGET_ADDRESS_COST hook
to return different costs for immediate offset and register offset addressing,
and ensure IVOpts correctly takes this into account?
On AArch64 we've defined different costs for immediate offset, register offset,
register
Hi Kyrill & Richard,
> I was leaving this to others in case it was obvious to them. On the
> basis that silence suggests it wasn't, :-) could you go into more details?
> Is it expected on first principles that jump alignment doesn't matter
> for Neoverse N1, or is this purely based on
Hi Richard,
> If you're able to say for the record which cores you tested, then that'd
> be good.
I've mostly checked it on Cortex-A57 - if there is any effect, it would be on
older cores.
> OK, thanks. I agree there doesn't seem to be an obvious reason why this
> would pessimise any cores
floating point
code is generally beneficial (more registers and higher latencies), only enable
the pressure scheduler with -Ofast.
On Cortex-A57 this gives a 0.7% performance gain on SPECINT2006 as well
as a 0.2% codesize reduction.
Bootstrapped on armhf. OK for commit?
ChangeLog:
2019-11-06 Wilco
ping
Enable the most basic form of compare-branch fusion since various CPUs
support it. This has no measurable effect on cores which don't support
branch fusion, but increases fusion opportunities on cores which do.
Bootstrapped on AArch64, OK for commit?
ChangeLog:
2019-12-24 Wilco Dijkstra
ping
Testing shows the setting of 32:16 for jump alignment has a significant codesize
cost but makes no difference to performance. So set jump-align
to 4 to get a 1.6% codesize improvement.
OK for commit?
ChangeLog
2019-12-24 Wilco Dijkstra
* config/aarch64/aarch64.c
this fixes the failure you were getting?
ChangeLog:
2020-01-16 Wilco Dijkstra
PR target/92692
* config/aarch64/aarch64.c (aarch64_split_compare_and_swap)
Add assert to ensure prolog has been emitted.
(aarch64_split_atomic_op): Likewise.
* config/aarch64
ants. Check the type is a char type for the
string constant case to avoid accidentally matching a wide STRING_CST.
Add a tree_expr_nonzero_p check to allow the optimization even if
CTZ_DEFINED_VALUE_AT_ZERO returns 0 or 1. Add extra test cases.
Bootstrap OK on AArch64 and x64.
ChangeLog:
2020-01-15 Wil
returns 0 or 1. Add extra test cases.
(note the diff uses the old tree and includes Jakub's bootstrap fixes)
Bootstrap OK on AArch64 and x64.
ChangeLog:
2020-01-13 Wilco Dijkstra
PR tree-optimization/93231
* tree-ssa-forwprop.c
(optimize_count_trailing_zeroes): Use
Hi Jakub,
On Sat, Jan 11, 2020 at 05:30:52PM +0100, Jakub Jelinek wrote:
> On Sat, Jan 11, 2020 at 05:24:19PM +0100, Andreas Schwab wrote:
> > ../../gcc/tree-ssa-forwprop.c: In function 'bool
> > simplify_count_trailing_zeroes(gimple_stmt_iterator*)':
> > ../../gcc/tree-ssa-forwprop.c:1925:23:
Hi,
>On 1/6/20 7:10 AM, Jonathan Wakely wrote:
>> GCC now defaults to -fno-common. As a result, global
>> variable accesses are more efficient on various targets. In C, global
>> variables with multiple tentative definitions will result in linker
>> errors.
>
> This is better. I'd also
Testing shows the setting of 32:16 for jump alignment has a significant codesize
cost but makes no difference to performance. So set jump-align
to 4 to get a 1.6% codesize improvement.
OK for commit?
ChangeLog
2019-12-24 Wilco Dijkstra
* config/aarch64/aarch64.c
Enable the most basic form of compare-branch fusion since various CPUs
support it. This has no measurable effect on cores which don't support
branch fusion, but increases fusion opportunities on cores which do.
Bootstrapped on AArch64, OK for commit?
ChangeLog:
2019-12-24 Wilco Dijkstra
Hi,
>> I've noticed that your patch caused a regression:
>> FAIL: gcc.dg/tree-prof/pr77698.c scan-rtl-dump-times alignments
>> "internal loop alignment added" 1
I've created https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93007
Cheers,
Wilco
the same as for
Cortex-A65. Set the scheduler for Cortex-A65 and Cortex-A65AE to
cortexa53.
Bootstrap OK, OK for commit?
ChangeLog:
2019-12-17 Wilco Dijkstra
* config/aarch64/aarch64-cores.def:
("cortex-a76ae"): Use neoversen1 tuning.
("cortex-a77"): Likewise
-A65AE to
cortexa53.
Bootstrap OK, OK for commit?
ChangeLog:
2019-12-11 Wilco Dijkstra
* config/aarch64/aarch64-cores.def: Update settings for
cortex-a76ae, cortex-a77, cortex-a65, cortex-a65ae, neoverse-e1,
cortex-a76.cortex-a55.
--
diff --git a/gcc/config/aarch64/aarch64
rbit w0, w0
clz w0, w0
and w0, w0, 31
ret
Bootstrapped on AArch64. OK for commit?
ChangeLog:
2019-12-11 Wilco Dijkstra
PR tree-optimization/90838
* tree-ssa-forwprop.c (check_ctz_array): Add new function.
(check_ctz_string): Likew
Hi Christophe,
>> The warning is off by default so there is no need to do anything in the
>> testsuite,
>> you just need a fixed binutils.
>>
>
> Don't we want to fix GCC to stop generating the offending sequence?
Why? All ARMv8 implementations have to support it, and despite the warning
code
Hi Christophe,
> In practice, how do you activate it when running the GCC testsuite? Do
> you plan to send a GCC patch to enable this assembler flag, or do you
> locally enable that option by default in your binutils?
The warning is off by default so there is no need to do anything in the
Hi Christophe,
I've added an option to allow the warning to be enabled/disabled:
https://sourceware.org/ml/binutils/2019-12/msg00093.html
Cheers,
Wilco
Hi Christophe,
> This patch (r278968) is causing regressions when building GCC
> --target arm-none-linux-gnueabihf
> --with-mode thumb
> --with-cpu cortex-a57
> --with-fpu crypto-neon-fp-armv8
> because the assembler (gas version 2.33.1) complains:
> /ccc7z5eW.s:4267: IT blocks containing more
Hi,
I have updated the documentation patch here and added relevant maintainers
so hopefully this can go in soon:
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00311.html
I moved the paragraph in changes.html to the C section like you suggested. Would
it make sense to link to the porting_to
Hi,
Add entries for the default change in changes.html and porting_to.html.
Passes the W3 validator.
Cheers,
Wilco
---
diff --git a/htdocs/gcc-10/changes.html b/htdocs/gcc-10/changes.html
index
e02966460450b7aad884b2d45190b9ecd8c7a5d8..304e1e8ccd38795104156e86b92062696fa5aa8b
100644
---
Hi Jeff,
>> I've noticed quite significant package failures caused by the revision.
>> Would you please consider documenting this change in porting_to.html
>> (and in changes.html) for GCC 10 release?
>
> I'm not in the office right now, but figured I'd chime in. I'd estimate
> 400-500 packages
ith branch. Rename the existing
AARCH64_FUSE_CMP_BRANCH to ALU_BRANCH, and AARCH64_FUSE_ALU_BRANCH
to ALU_CBZ to make it clear what is being fused.
AArch64 bootstrap OK, OK to commit?
ChangeLog:
2019-12-03 Wilco Dijkstra
* config/aarch64/aarch64.c
(thunderxt88_tunin
on SPECINT2006 as well as a
0.4% codesize reduction.
Bootstrapped on armhf. OK for commit?
ChangeLog:
2019-12-03 Wilco Dijkstra
* config/arm/arm.c (arm_option_override_internal):
Use max_cond_insns from CPU tuning unless -mrestrict-it is used.
--
diff --git a/gcc/config/arm/arm.c b
to 5 due to historical reasons.
Benchmarking shows that max_cond_insns=2 is fastest on modern Cortex-A
cores, so change it to 2. Set it to 4 on older in-order cores as that is
the MAX_INSN_PER_IT_BLOCK limit for Thumb-2.
Bootstrapped on armhf. OK for commit?
ChangeLog:
2019-12-03 Wilco Dijkstra
Add support for Cortex-A76, Ares and Neoverse N1 cpu names in GCC8 branch.
2019-11-29 Wilco Dijkstra
* config/aarch64/aarch64-cores.def (ares): Define.
(cortex-a76): Likewise.
(neoverse-n1): Likewise.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc
Hi,
I've backported r268189 to GCC8:
aarch64: fix use-after-free in -march=native (PR driver/89014)
Running:
$ valgrind ./xgcc -B. -c test.c -march=native
on aarch64 shows a use-after-free in host_detect_local_cpu due
to the std::string result of aarch64_get_extension_string_for_isa_flags
Hi,
Add support for fused compare with branch. Rename the existing
AARCH64_FUSE_CMP_BRANCH to ALU_BRANCH, and AARCH64_FUSE_ALU_BRANCH
to ALU_CBZ to make it clear what is being fused.
AArch64 bootstrap OK, OK to commit?
ChangeLog:
2019-11-29 Wilco Dijkstra
* config/aarch64/aarch64
Hi Martin,
> I've noticed quite significant package failures caused by the revision.
How significant? Is it mostly the common mistake of forgetting extern?
> Would you please consider documenting this change in porting_to.html
> (and in changes.html) for GCC 10 release?
Sure, I already had a
ch64. OK for commit?
ChangeLog:
2019-11-15 Wilco Dijkstra
PR tree-optimization/90838
* tree-ssa-forwprop.c (optimize_count_trailing_zeroes):
Add new function.
(simplify_count_trailing_zeroes): Add new function.
(pass_forwprop::execute): Try ctz simplif
Hi Richard,
>> Yes so it does the insane "fully unrolled trailing loop before the unrolled
>> loop" thing. One always does the trailing loop last (and typically as an
>> actual loop of course) and then the code ends up much faster, close to
>> the ideal version shown in the PR.
>
> Well, you
Hi Christophe,
> Some time ago, you proposed to enable code hoisting for -Os instead,
> and this is the approach that was chosen
> in arm-9-branch. Why are you proposing a different setting for trunk?
Like I said in my message, I've now done more detailed benchmarking which
shows it affects -O3
for -O3 and higher.
OK for commit?
ChangeLog:
2019-11-26 Wilco Dijkstra
PR tree-optimization/80155
* common/config/arm/arm-common.c (arm_option_optimization_table):
Disable -fcode-hoisting with -O3.
--
diff --git a/gcc/common/config/arm/arm-common.c
b/gcc/common/config
Hi Andrew,
Could you repost your patch please to make review easier/quicker? It's no
longer linked...
Cheers,
Wilco
Hi Andrew,
> Hi if we have a aarch64 compiler that has a big-endian
> multi-lib, it fails to compile libstdc++ because
> simd_fast_mersenne_twister_engine is only defined for little-endian
> in ext/random but ext/opt_random.h thinks it is defined always.
>
> OK? Built an aarch64-elf toolchain
Add a missing extern to ensure the test passes with -fno-common.
Committed as obvious.
ChangeLog:
2019-11-21 Wilco Dijkstra
testsuite/
* gfortran.dg/global_vars_f90_init_driver.c: Add missing extern.
--
diff --git a/gcc/testsuite/gfortran.dg/global_vars_f90_init_driver.c
b/gcc
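As an illustration of the kind of fix described above, here is a minimal C
sketch of the declaration/definition split that -fno-common requires. The
variable and function names are hypothetical and are not taken from the
test being patched. With -fno-common, a plain "int request_count;" placed in
a header and included from two source files gives two definitions and a
link error; the header should carry only the extern declaration.

```c
#include <assert.h>

/* What a shared header should contain: a declaration only. */
extern int request_count;

/* Exactly one source file provides the definition. */
int request_count = 0;

/* Hypothetical user of the global, for illustration only. */
int record_request(void)
{
    return ++request_count;
}

/* Small usage check. */
void request_selfcheck(void)
{
    int before = request_count;
    int after = record_request();
    assert(after == before + 1);
}
```

Both parts are shown in one file so the sketch compiles on its own; in a
real project the extern line lives in the header and the definition in a
single .c file.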
Hi Rainer,
>> ld: warning: symbol 'err' has differing types:
>> (file /var/tmp//ccWQCyMc.o type=OBJT; file /lib/libc.so type=FUNC);
>> /var/tmp//ccWQCyMc.o definition taken
So are glob and err somehow exported as globals by your GLIBC? I don't think
those are standard
The vrbit_1 test was missing a flag to disable code sharing.
Committed as obvious.
ChangeLog:
2019-11-20 Wilco Dijkstra
testsuite/
* gcc.target/aarch64/simd/vrbit_1.c: Add -fno-ipa-icf.
--
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vrbit_1.c
b/gcc/testsuite/gcc.target
Hi Richard,
> I acked this here:
> https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01229.html
Thanks - I missed your email, but it's committed now. Yes we will
need to look at the vector costs again and retune them based on
recent vectorizer improvements and latest microarchitectures.
Cheers,
the testcase - libquantum and SPECv6
performance improves.
OK for commit?
ChangeLog:
2018-01-22 Wilco Dijkstra
PR target/79262
* config/aarch64/aarch64.c (generic_vector_cost): Adjust
vec_to_scalar_cost.
--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
floating point
code is generally beneficial (more registers and higher latencies), only enable
the pressure scheduler with -Ofast.
On Cortex-A57 this gives a 0.7% performance gain on SPECINT2006 as well
as a 0.2% codesize reduction.
Bootstrapped on armhf. OK for commit?
ChangeLog:
2019-11-06 Wilco
to that by
MAX_INSN_PER_IT_BLOCK. Also use the CPU tuning setting when a CPU/tune
is selected if -mrestrict-it is not explicitly set.
On Cortex-A57 this gives 1.1% performance gain on SPECINT2006 as well
as a 0.4% codesize reduction.
Bootstrapped on armhf. OK for commit?
ChangeLog:
2019-08-19 Wilco Dijkstra
,
and codesize reduces by 0.2%.
OK for commit?
ChangeLog:
2019-11-15 Wilco Dijkstra
* config/arm/arm-cpus.in (armv7): Set tune to Cortex-A53.
(armv7-a): Likewise.
(armv7ve): Likewise.
---
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index
Hi Richard,
> So what do we actually do unpatched with -funroll-loops here?
Yes so it does the insane "fully unrolled trailing loop before the unrolled
loop" thing. One always does the trailing loop last (and typically as an
actual loop of course) and then the code ends up much faster, close to
4. OK for commit?
ChangeLog:
2019-11-15 Wilco Dijkstra
PR tree-optimization/90838
* tree-ssa-forwprop.c (optimize_count_trailing_zeroes):
Add new function.
(simplify_count_trailing_zeroes): Add new function.
(pass_forwprop::execute): Try ctz simplification.
Hi Segher,
> Out of interest, what uses this? I have never seen it before.
It's used in sjeng in SPEC and gives a 2% speedup on Cortex-A57.
Tricks like this used to be very common 20 years ago since a loop or binary
search is way too slow and few CPUs supported fast clz/ctz instructions. It's
18, 6, 11, 5, 10, 9
};
return table[((unsigned)((x & -x) * 0x077CB531U)) >> 27];
}
Is optimized to:
rbit w0, w0
clz w0, w0
and w0, w0, 31
ret
Bootstrapped on AArch64. OK for commit?
ChangeLog:
2019-11-12 Wilco Dijkstra
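The table-lookup idiom quoted above is cut off in this snippet, so here is a
self-contained C sketch of the well-known de Bruijn count-trailing-zeros
trick it appears to be. The table below is the commonly published one for
the multiplier 0x077CB531; its last six entries match the fragment shown,
but that it is the same table as in the original message is an assumption.
The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Commonly published de Bruijn lookup table for 0x077CB531. */
static const unsigned char ctz_table[32] = {
    0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
    31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};

/* Count trailing zeros of a nonzero 32-bit value.
   For x == 0 this returns 0, like the table-based original would. */
unsigned ctz32_debruijn(uint32_t x)
{
    uint32_t lowest = x & (0u - x);   /* isolate the lowest set bit */
    return ctz_table[(uint32_t)(lowest * 0x077CB531u) >> 27];
}

/* Straightforward reference loop, used only for checking. */
unsigned ctz32_loop(uint32_t x)
{
    unsigned n = 0;
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Compare both versions on every single-bit input. */
void ctz32_selfcheck(void)
{
    for (unsigned bit = 0; bit < 32; bit++) {
        uint32_t x = (uint32_t)1 << bit;
        assert(ctz32_debruijn(x) == bit);
        assert(ctz32_loop(x) == bit);
    }
}
```

The reference loop must not be called with zero, since it would never
terminate; the table version has no such restriction.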
. Also use the CPU tuning setting when a CPU/tune
is selected if -mrestrict-it is not explicitly set.
On Cortex-A57 this gives 1.1% performance gain on SPECINT2006 as well
as a 0.4% codesize reduction.
Bootstrapped on armhf. OK for commit?
ChangeLog:
2019-08-19 Wilco Dijkstra
* gcc
point
code is generally beneficial (more registers and higher latencies), only enable
the pressure scheduler with -Ofast.
On Cortex-A57 this gives a 0.7% performance gain on SPECINT2006 as well
as a 0.2% codesize reduction.
Bootstrapped on armhf. OK for commit?
ChangeLog:
2019-11-06 Wilco
by -fcommon. It is about
time to change the default.
Passes bootstrap and regress on AArch64 and x64. OK for commit?
ChangeLog
2019-11-05 Wilco Dijkstra
PR85678
* common.opt (fcommon): Change init to 1.
doc/
* invoke.texi (-fcommon): Update documentation.
testsuite/
Hi Richard,
>> > Please don't add -fcommon in lto.exp.
>>
>> So what is the best way to add an extra option to lto.exp?
>> Note dg-lto-options completely overrides the options from lto.exp, so I can't
>> use that except in tests which already use it.
>
> On what testcases do you need it at all?
Hi Richard,
> Please don't add -fcommon in lto.exp.
So what is the best way to add an extra option to lto.exp?
Note dg-lto-options completely overrides the options from lto.exp, so I can't
use that except in tests which already use it.
Cheers,
Wilco
to C code only, C++ code is not affected by -fcommon. It is about
time to change the default.
Bootstrap OK, passes testsuite on AArch64. OK for commit?
ChangeLog
2019-10-29 Wilco Dijkstra
PR85678
* common.opt (fcommon): Change init to 1.
doc/
* invoke.texi (-fcommon
Hi Iain,
> for the record, Darwin bootstraps OK with the change (which is to be
> expected,
> since the preferred setting for it is -fno-common).
That's good to hear.
> Testsuite fails are order “a few hundred” mostly seem to be related to
> tree-prof
> and vector tests (plus the anticipated
Hi,
>> I suppose targets can override this decision.
> I think they probably could via the override_options mechanism.
Yes, it's trivial to add this to target_option_override():
  if (!global_options_set.x_flag_no_common)
    flag_no_common = 0;
Cheers,
Wilco
Hi Jeff,
> Has this been bootstrapped and regression tested?
Yes, it bootstraps OK of course. I ran the regression tests over the weekend;
there are a few minor regressions in lto due to relying on tentative definitions
and a few latent bugs. I'd expect there will be a few similar failures on
other
by -fcommon. It is about
time to change the default.
OK for commit?
ChangeLog
2019-10-25 Wilco Dijkstra
PR85678
* common.opt (fcommon): Change init to 1.
doc/
* invoke.texi (-fcommon): Update documentation.
---
diff --git a/gcc/common.opt b/gcc/common.opt
index
Hi Christophe,
> I've noticed that your patch caused a regression:
> FAIL: gcc.dg/tree-prof/pr77698.c scan-rtl-dump-times alignments
> "internal loop alignment added" 1
That's just a testism - it only tests for loop alignment and doesn't
consider the possibility of the loop being jumped into
Hi Richard,
> Sure, the "extern array of unknown size" case isn't about section anchors.
> But this part of my message (snipped above) was about the other case
> (objects of known size), and applied to individual objects as well as
> section anchors.
>
> What I was trying to say is: yes, we need
Hi Richard,
>> No - the testcases fail with that.
>
> Hmm, OK. Could you give more details? What does the motivating case
> actually look like?
Well, it's now a very long time since I first posted this patch, but the
failure was in SPEC. It did something like [0xff000 - x], presumably
Hi,
> the defaults for v7-a are still to use the
> Cortex-A8 scheduler
I missed that part, but that's a serious bug btw - Cortex-A8 is 15 years old
now, so way beyond obsolete. Even Cortex-A53 is ancient now, but it has an
accurate scheduler that performs surprisingly well on both in-order
Hi Richard,
> If global_char really is a char then isn't that UB?
No, why? We can do all kinds of arithmetic based on pointers, either using
pointer types or converted to uintptr_t. Note that the optimizer actually
creates these expressions, for example arr[N-x] can be evaluated as ([0] + N) -
Hi Ramana,
> Can you see what happens with the Cortex-A8 or Cortex-A9 schedulers to
> spread the range across some v7-a CPUs as well ? While they aren't that
> popular today I
> would suggest you look at them because the defaults for v7-a are still to use
> the
> Cortex-A8 scheduler and the
Hi Ramana,
>On Mon, Sep 9, 2019 at 6:03 PM Wilco Dijkstra wrote:
>>
>> Currently arm_legitimize_address doesn't handle Thumb-2 at all, resulting in
>> inefficient code. Since Thumb-2 supports similar address offsets use the Arm
>> legitimization code for Thumb-2
Hi Ramana,
> My only question would be whether it's more suitable to use
> optimize_function_for_size_p(cfun) instead as IIRC that gives us a
> chance with lto rather than the global optimize_size.
Yes that is even better and that defaults to optimize_size if cfun isn't
set. I've committed this:
the testcase - libquantum and SPECv6
performance improves.
OK for commit?
ChangeLog:
2018-01-22 Wilco Dijkstra
PR target/79262
* config/aarch64/aarch64.c (generic_vector_cost): Adjust
vec_to_scalar_cost.
--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
SPECFP improves 0.2%.
Bootstrap OK, OK for commit?
ChangeLog:
2019-09-09 Wilco Dijkstra
* config/arm/arm.c (arm_legitimize_address): Remove Thumb-2 bailout.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index
a5a6a0fab1b4b7ef07931522e7d47e59842d7f27
?
ChangeLog:
2019-09-09 Wilco Dijkstra
* config/arm/arm.h (HONOR_REG_ALLOC_ORDER): Set when optimizing for
size.
--
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index
8d023389eec469ad9c8a4e88edebdad5f3c23769..e3473e29fbbb964ff1136c226fbe30d35dbf7b39
100644
--- a/gcc
inux-gnueabihf --with-cpu=cortex-a57
ChangeLog:
2019-07-29 Wilco Dijkstra
* config/arm/arm.c (arm_option_override): Don't override sched
pressure algorithm.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
i
.
Bootstrapped on AArch64, passes regress, OK for commit?
ChangeLog:
2018-11-09 Wilco Dijkstra
gcc/
* config/aarch64/aarch64.c (aarch64_classify_symbol):
Apply reasonable limit to symbol offsets.
testsuite/
* gcc.target/aarch64
for commit until we get rid of it?
ChangeLog:
2017-11-17 Wilco Dijkstra
gcc/
* config/aarch64/aarch64.h (SLOW_BYTE_ACCESS): Set to 1.
--
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index
056110afb228fb919e837c04aa5e5552a4868ec3
ap OK, OK for commit?
ChangeLog:
2019-09-11 Wilco Dijkstra
* config/arm/arm.h (SLOW_BYTE_ACCESS): Set to 1.
--
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index
8b92c830de09a3ad49420fdfacde02d8efc2a89b..11212d988a0f56299c2266bace80170d074be56c
100644
--- a/gcc/config/arm/ar
Any further comments? Note GCC doesn't support S/UMULLS either since it is
equally useless. It's no surprise that Thumb-2 removed support for flag-setting
64-bit multiplies, while AArch64 didn't add flag-setting multiplies. So there is
no argument that these instructions are in any way useful
Hi Richard,
> Please reformat this as one mapping per line. Over time I expect this
> is only going to grow.
Sure, I've committed it reformatted as r275970.
Wilco
Hi Richard, Kyrill,
>> I disagree. If they still trigger and generate better code than without
>> we should keep them.
>
>> What kind of code is *common* varies greatly from user to user.
Not really - doing a multiply and checking whether the result is zero is
exceedingly rare. I found only 3
sting
optab one. Here is what I did:
[PATCH][ARM] Simplify logical DImode iterators
Further simplify the logical DImode expander using code iterator and
optab attributes. This avoids adding unnecessary code_attr entries.
ChangeLog:
2019-09-19 Wilco Dijkstra
* config/arm/
Hi Kyrill,
> We should be able to "compress" the above 3 patterns into one using code
> iterators.
Good point, that makes sense. I've committed this:
ChangeLog:
2019-09-18 Wilco Dijkstra
PR target/91738
* config/arm/arm.md (di3): Expand explicitly.
Hi Kyrill,
>> + (mult:SI (match_operand:SI 3 "s_register_operand" "r")
>> + (match_operand:SI 2 "s_register_operand" "r"]
>
> Looks like we'll want to mark operand 2 here with '%' as well?
That doesn't make any difference since both operands are identical.
It only
Hi Richard,
> The issue with the bugzilla is that it lacked appropriate testcase(s) and thus
> it is now a mess. There are clear testcases (maybe not in the benchmarks you
Agreed - it's not clear whether any of the proposed changes would actually
help the original issue. My patch absolutely
Hi Christophe,
Can you explain this in more detail - it doesn't make sense to me to force the
Thumb bit during unwinding since it should already be correct, even on a
Thumb-only CPU. Perhaps the kernel code that pushes an incorrect address on
the stack could be fixed instead?
> Without this,
Hi Kyrill,
>> When you select a CPU the goal is that we optimize and schedule for that
>> specific microarchitecture. That implies using atomics that work best for
>> that core rather than outlining them.
>
> I think we want to go ahead with this framework to enable the portable
> deployment of
Hi Prathamesh,
> My only concern with the patch is that the issue isn't specific to
> code-hoisting.
> For this particular case (reproducible with pr77445-2.c), disabling
> jump threading
> doesn't cause the register spill with hoisting enabled.
> Likewise disabling forwprop3 and forwprop4
Hi Richard,
>> So what is the behaviour when you explicitly select a specific CPU?
>
> Selecting a specific cpu selects the specific architecture that the cpu
> supports, does it not? Thus the architecture example above still applies.
>
> Unless I don't understand what distinction that you're
Hi Richard,
> Do we document target specific deviations from "default" behavior somewhere?
Not as far as I know. The other option changes in arm-common.c are not mentioned
anywhere, neither is any of arm_option_override_internal.
If we want to keep documentation useful, we shouldn't clutter
Hi Jeff,
> We're talking about two instructions where if the first executes, then
> the second also executes. If the memory addresses are the same, then
> their alignment is the same.
>
> In your case the two instructions are on different execution paths and
> are in fact mutually exclusive.
Hi Paul,
> > On Sep 11, 2019, at 11:48 AM, Wilco Dijkstra wrote:
> >
> > Contrary to all documentation, SLOW_BYTE_ACCESS simply means accessing
> > bitfields by their declared type, which results in better codegeneration
> > on practically any target. S
Hi Jeff,
Jeff wrote:
> Just to make sure I understand. Are you saying the addresses for the
> MEMs are equal or the contents of the memory location are equal.
>
> For the former the alignment has to be the same, plain and simple, even
> if GCC isn't aware the alignments have to be the same.
>
>
While code hoisting generally improves codesize, it can affect performance
negatively. Benchmarking shows it doesn't help SPEC and negatively affects
embedded benchmarks, so only enable code hoisting with -Os on Arm.
Bootstrap OK, OK for commit?
ChangeLog:
2019-09-11 Wilco Dijkstra
or commit?
ChangeLog:
2019-09-11 Wilco Dijkstra
* config/arm/arm.h (SLOW_BYTE_ACCESS): Set to 1.
--
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index
8b92c830de09a3ad49420fdfacde02d8efc2a89b..11212d988a0f56299c2266bace80170d074be56c
100644
--- a/gcc/config/arm/arm.h
subreg issues due to other DImode operations splitting early.
Bootstrap OK on armhf, regress passes.
ChangeLog:
2019-09-03 Wilco Dijkstra
* config/arm/arm.md (maddsidi4): Remove expander.
(mulsidi3adddi): Remove pattern.
(mulsidi3adddi_v6): Likewise
Wilco Dijkstra
* config/arm/arm.md (smulsi3_highpart): Use and iterators.
(smulsi3_highpart_nov6): Remove pattern.
(smulsi3_highpart_v6): Likewise.
(umulsi3_highpart): Likewise.
(umulsi3_highpart_nov6): Likewise.
(umulsi3_highpart_v6
ping
Clean up the 32-bit multiply patterns. Merge the pre-Armv6 with the Armv6
patterns, remove useless alternatives and order the accumulator operands
to prefer MLA Ra, Rb, Rc, Ra whenever feasible.
Bootstrap OK on armhf, regress passes.
ChangeLog:
2019-09-03 Wilco Dijkstra
ping
Remove various MULS/MLAS patterns which are enabled when optimizing for
size. The codesize gain from these patterns is so minimal that
there is no point in keeping them.
Bootstrap OK on armhf, regress passes.
ChangeLog:
2019-09-03 Wilco Dijkstra
* config
references.
Bootstrapped on AArch64, passes regress, OK for commit?
ChangeLog:
2018-11-09 Wilco Dijkstra
gcc/
* config/aarch64/aarch64.c (aarch64_classify_symbol):
Apply reasonable limit to symbol offsets.
testsuite