https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110619
Peter Cordes changed:
What|Removed |Added
CC||peter at cordes dot ca
--- Comment #7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108441
--- Comment #4 from Peter Cordes ---
This is already fixed in current trunk; sorry I forgot to check that before
recommending to report this store-coalescing bug.
# https://godbolt.org/z/j3MdWrcWM
# GCC nightly -O3 (tune=generic) and GCC11
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688
--- Comment #27 from Peter Cordes ---
(In reply to Alexander Monakov from comment #26)
> Sure, the right course of action seems to be to simply document that atomic
> types and built-ins are meant to be used on "common" (writeback) memory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688
--- Comment #25 from Peter Cordes ---
(In reply to Alexander Monakov from comment #24)
>
> I think it's possible to get UC/WC mappings via a graphics/compute API (e.g.
> OpenGL, Vulkan, OpenCL, CUDA) on any OS if you get a mapping to device
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688
Peter Cordes changed:
What|Removed |Added
CC||peter at cordes dot ca
--- Comment #23
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106138
Peter Cordes changed:
What|Removed |Added
CC||peter at cordes dot ca
--- Comment #3
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---
Target: arm64-*-*
void foo(unsigned long *p) {
*p
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105904
Bug ID: 105904
Summary: Predicated mov r0, #1 with opposite conditions could
be hoisted, between 1 and 1<
#include <bit>  // using the libstdc++ header
unsigned roundup(unsigned x){
return std::bit_ceil(x);
}
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105596
--- Comment #1 from Peter Cordes ---
https://godbolt.org/z/aoG55T5Yq
gcc -O3 -m32 has the same problem with unsigned long long total and unsigned
i.
Pretty much identical instruction sequences in the loop for all 3 versions,
doing add/adc to
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---
For total *= i with a u128 total and a u32 loop counter, GCC pessimizes by
widening i and doing a full 128x128 => 128-
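A minimal sketch of the loop shape being described (my reconstruction; the
actual testcase is truncated above):

// Hedged reconstruction: 128-bit running product, 32-bit counter.
// Only a widening 64x64 (or 128x64) multiply per iteration is needed,
// but GCC widens i to 128 bits and does the full multiply.
unsigned __int128 fact(unsigned n) {
    unsigned __int128 total = 1;
    for (unsigned i = 2; i <= n; i++)
        total *= i;
    return total;
}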
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65146
--- Comment #25 from Peter Cordes ---
(In reply to CVS Commits from comment #24)
> The master branch has been updated by Jakub Jelinek :
>
> https://gcc.gnu.org/g:04df5e7de2f3dd652a9cddc1c9adfbdf45947ae6
>
> commit
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82261
--- Comment #4 from Peter Cordes ---
GCC will emit SHLD / SHRD as part of shifting an integer that's two registers
wide.
Hironori Bono proposed the following functions as a workaround for this missed
optimization
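(Bono's functions are truncated above; as a hedged illustration, this is the
kind of two-register-wide shift where GCC does emit SHLD/SHRD on x86-64:)

// Hedged example: unsigned __int128 lives in two 64-bit registers, so a
// shift crosses registers and GCC uses SHLD as part of the sequence.
unsigned __int128 shl128(unsigned __int128 x, unsigned n) {
    return x << (n & 127);   // double-width shift: shld + shl
}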
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105066
--- Comment #5 from Peter Cordes ---
> pextrw requires sse4.1 for mem operands.
You're right! I didn't double-check the asm manual for PEXTRW when writing up
the initial report, and had never realized that PINSRW wasn't symmetric with
it.
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*
With PR105066 fixed, we do _mm_loadu_si16
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508
--- Comment #17 from Peter Cordes ---
(In reply to Andrew Pinski from comment #16)
> >According to Intel (
> > https://software.intel.com/sites/landingpage/IntrinsicsGuide), there are no
> > alignment requirements for _mm_load_sd, _mm_store_sd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508
Peter Cordes changed:
What|Removed |Added
CC||peter at cordes dot ca
--- Comment #14
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99754
--- Comment #6 from Peter Cordes ---
Looks good to me, thanks for taking care of this quickly, hopefully we can get
this backported to the GCC11 series to limit the damage for people using these
newish intrinsics. I'd love to recommend them for
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*
PR99754 fixed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99754
--- Comment #3 from Peter Cordes ---
Wait a minute, the current implementation of _mm_loadu_si32 isn't
strict-aliasing or alignment safe!!! That defeats the purpose for its
existence as something to use instead of _mm_cvtsi32_si128( *(int*)p
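For reference, a hedged sketch of the aliasing- and alignment-safe semantics
_mm_loadu_si32 is supposed to have (illustrative, not GCC's actual
emmintrin.h definition; the function name is mine):

#include <emmintrin.h>
#include <string.h>
__m128i loadu_si32_safe(const void *p) {
    int tmp;
    memcpy(&tmp, p, 4);              // unaligned- and aliasing-safe load
    return _mm_cvtsi32_si128(tmp);   // movd: zero-extend into an XMM reg
}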
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99754
Peter Cordes changed:
What|Removed |Added
CC||peter at cordes dot ca
--- Comment #2
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*, arm-*-*
std::bit_ceil(x) involves if(x == 0 || x == 1) return 1;
and 1u << (32-c
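A minimal sketch of that pattern, assuming the usual count-leading-zeros
formulation (illustrative, not the exact libstdc++ source):

unsigned bit_ceil_sketch(unsigned x) {
    if (x == 0 || x == 1) return 1;            // bit_ceil(0) == bit_ceil(1) == 1
    // valid for x <= 2^31; larger x isn't representable, like std::bit_ceil
    return 1u << (32 - __builtin_clz(x - 1));  // round up to a power of 2
}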
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97759
Peter Cordes changed:
What|Removed |Added
CC||peter at cordes dot ca
--- Comment #14
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494
--- Comment #11 from Peter Cordes ---
Also, horizontal byte sums are generally best done with VPSADBW against a zero
vector, even if that means some fiddling to flip to unsigned first and then
undo the bias.
simde_vaddlv_s8:
    vpxor   xmm0,
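A hedged intrinsics sketch of the VPSADBW approach described above (function
name is mine; 16-byte version for illustration):

#include <emmintrin.h>
int hsum_i8(__m128i v) {
    __m128i u = _mm_xor_si128(v, _mm_set1_epi8(-128));   // s8 -> u8 (+128 bias)
    __m128i sad = _mm_sad_epu8(u, _mm_setzero_si128());  // two 64-bit partial sums
    int sum = _mm_cvtsi128_si32(sad) + _mm_extract_epi16(sad, 4);
    return sum - 16 * 128;                               // undo the per-byte bias
}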
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494
Peter Cordes changed:
What|Removed |Added
CC||peter at cordes dot ca
--- Comment #10
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80570
--- Comment #3 from Peter Cordes ---
(In reply to Andrew Pinski from comment #2)
> Even on aarch64:
>
> .L2:
> ldr q0, [x1], 16
> sxtl  v1.2d, v0.2s
> sxtl2 v0.2d, v0.4s
> scvtf v1.2d, v1.2d
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91103
--- Comment #9 from Peter Cordes ---
Thanks for implementing my idea :)
(In reply to Hongtao.liu from comment #6)
> For elements located above 128bits, it seems always better(?) to use
> valign{d,q}
TL:DR:
I think we should still use
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309
--- Comment #37 from Peter Cordes ---
Correction, PR82666 is that the cmov on the critical path happens even at -O2
(with GCC7 and later). Not just with -O3 -fno-tree-vectorize.
Anyway, that's related, but probably separate from choosing to do
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309
Peter Cordes changed:
What|Removed |Added
CC||peter at cordes dot ca
--- Comment #36
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=15533
Peter Cordes changed:
What|Removed |Added
CC||peter at cordes dot ca
--- Comment #5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82940
Peter Cordes changed:
What|Removed |Added
CC||peter at cordes dot ca
--- Comment #6
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100922
--- Comment #2 from Peter Cordes ---
Possibly also related:
With different surrounding code, this loop can compile to asm which has two
useless movz / mov register copies in the loop at -O2
(https://godbolt.org/z/PTcqzM6q7). (To set up for
Version: 12.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---
Created attachment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88770
Peter Cordes changed:
What|Removed |Added
CC||peter at cordes dot ca
--- Comment #2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80636
Peter Cordes changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42587
Peter Cordes changed:
What|Removed |Added
CC||peter at cordes dot ca
--- Comment #12
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98801
Peter Cordes changed:
What|Removed |Added
CC||peter at cordes dot ca
--- Comment #5
Version: 11.0
Status: UNCONFIRMED
Keywords: missed-optimization, ssemmx
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97366
--- Comment #1 from Peter Cordes ---
Forgot to include https://godbolt.org/z/q44r13
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---
When you use the same _mm_load_si128 or _mm256_load_si256 result twice,
sometimes GCC loads
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39942
Peter Cordes changed:
What|Removed |Added
CC||peter at cordes dot ca
--- Comment #53
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93141
Peter Cordes changed:
What|Removed |Added
CC||peter at cordes dot ca
--- Comment #2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
Peter Cordes changed:
What|Removed |Added
CC||peter at cordes dot ca
--- Comment #91
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89346
Peter Cordes changed:
What|Removed |Added
CC||peter at cordes dot ca
--- Comment #1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82459
Peter Cordes changed:
What|Removed |Added
See Also||https://gcc.gnu.org/bugzill
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92244
--- Comment #4 from Peter Cordes ---
(In reply to Andrew Pinski from comment #3)
> (In reply to Peter Cordes from comment #1)
> > On AArch64 (with gcc8.2), we see a similar effect, more instructions in the
> > loop. And an indexed addressing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92246
--- Comment #1 from Peter Cordes ---
And BTW, GCC *does* use vpermd (not vpermt2d) for swapt = int or long. This
problem only applies to char and short. Possibly because AVX2 includes vpermd
ymm.
Apparently CannonLake has 1 uop vpermb
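For context, a hedged sketch of the kind of loop involved, where element
width decides the shuffle (my example, not the exact testcase):

// With 32-bit elements GCC uses vpermd; the char/short versions are
// where it falls back to vpermt2w / vpermt2b.
void reverse_int(int *dst, const int *src, int n) {
    for (int i = 0; i < n; i++)
        dst[i] = src[n - 1 - i];
}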
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*
typedef short
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92244
Peter Cordes changed:
What|Removed |Added
Summary|extra sub inside vectorized |vectorized loop updating 2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92244
--- Comment #1 from Peter Cordes ---
On AArch64 (with gcc8.2), we see a similar effect, more instructions in the
loop. And an indexed addressing mode.
https://godbolt.org/z/6ZVWY_
# strrev_explicit -O3 -mcpu=cortex-a53
...
.L4:
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---
We get a redundant instruction inside the vectorized loop here. But it's
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92243
--- Comment #1 from Peter Cordes ---
Forgot to mention, this probably applies to other ISAs with GP-integer
byte-reverse instructions and efficient unaligned loads.
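A hedged sketch of that strategy using GCC builtins (names and tail handling
are mine):

#include <stdint.h>
#include <string.h>
// Reverse 8 bytes at a time: unaligned load + one BSWAP/REV per chunk.
// Assumes dst and src don't overlap; the n % 8 tail is omitted here.
void rev_chunks(char *dst, const char *src, size_t n) {
    for (size_t i = 0; i + 8 <= n; i += 8) {
        uint64_t v;
        memcpy(&v, src + n - 8 - i, 8);   // unaligned load from the tail
        v = __builtin_bswap64(v);          // one byte-reverse per 8 bytes
        memcpy(dst + i, &v, 8);
    }
}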
Version: 10.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---
Targ
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82887
--- Comment #5 from Peter Cordes ---
Reported bug 92080 for the missed CSE
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*
As a workaround for PR 82887 some code (e.g. a memset
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82887
Peter Cordes changed:
What|Removed |Added
CC||peter at cordes dot ca
--- Comment #4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91515
Peter Cordes changed:
What|Removed |Added
CC||peter at cordes dot ca
--- Comment #1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91398
Peter Cordes changed:
What|Removed |Added
CC||peter at cordes dot ca
--- Comment #4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91026
Peter Cordes changed:
What|Removed |Added
CC||peter at cordes dot ca
--- Comment #3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91103
--- Comment #4 from Peter Cordes ---
We should not put any stock in what ICC does for GNU C native vector indexing.
I think it doesn't know how to optimize that because it *always* spills/reloads
even for `vec[0]` which could be a no-op. And
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*
GCC9.1 and current trunk
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---
void protect_me() {
volatile int buf[2];
buf[1] = 3;
}
https://godbolt.org/z/xdlr5w
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90568
--- Comment #5 from Peter Cordes ---
And BTW, this only helps if the SUB and JNE are consecutive, which GCC
(correctly) doesn't currently optimize for with XOR.
If this sub/jne is different from a normal sub/branch and won't already get
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90568
--- Comment #3 from Peter Cordes ---
(In reply to Jakub Jelinek from comment #2)
> The xor there is intentional, for security reasons we do not want the stack
> canary to stay in the register afterwards, because then it could be later
> spilled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90568
--- Comment #1 from Peter Cordes ---
https://godbolt.org/z/hHCVTc
Forgot to mention, stack-protector also disables use of the red-zone for no
apparent reason, so that's another missed optimization. (Perhaps rarely
relevant; probably most
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*
cmp/jne is always at least as efficient as xor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88809
Peter Cordes changed:
What|Removed |Added
CC||peter at cordes dot ca
--- Comment #4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
--- Comment #22 from Peter Cordes ---
Nice, that's exactly the kind of thing I suggested in bug 80571. If this
covers
* vsqrtss/sd  (mem), %merge_into, %xmm
* vpcmpeqd    %same, %same, %dest    # false dep on KNL / Silvermont
* vcmptrueps
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80571
--- Comment #2 from Peter Cordes ---
I think hjl's patch for PR 89071 / PR 87007 fixes (most of?) this, at least for
AVX.
If register pressure is an issue, using a reg holding an arbitrary constant
(instead of xor-zeroed) is a valid option, as
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38959
--- Comment #4 from Peter Cordes ---
The bug I mentioned in my previous comment (__builtin_ia32_rdpmc being
treated as a pure function) is already reported and fixed (in gcc9 only): bug
87550. It was present since at least gcc 5.0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38959
Peter Cordes changed:
What|Removed |Added
CC||peter at cordes dot ca
--- Comment #3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
--- Comment #15 from Peter Cordes ---
(In reply to Uroš Bizjak from comment #13)
> I assume that memory inputs are not problematic for SSE/AVX {R,}SQRT, RCP
> and ROUND instructions. Contrary to CVTSI2S{S,D}, CVTSS2SD and CVTSD2SS, we
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88494
--- Comment #6 from Peter Cordes ---
Oops, these were SD not SS. Getting sleepy >.<. Still, my optimization
suggestion for doing both compares in one masked SUB of +-PBCx applies equally.
And I think my testing with VBLENDVPS should apply
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88494
--- Comment #5 from Peter Cordes ---
IF ( xij.GT.+HALf ) xij = xij - PBCx
IF ( xij.LT.-HALf ) xij = xij + PBCx
For code like this, *if we can prove only one of the IF() conditions will be
true*, we can implement it
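A hedged SSE sketch of that idea (assuming the two conditions are mutually
exclusive, as stated above; HALF/PBCX stand in for the Fortran variables):

#include <xmmintrin.h>
__m128 wrap4(__m128 x, __m128 half, __m128 pbcx) {
    __m128 neg_pbcx = _mm_sub_ps(_mm_setzero_ps(), pbcx);
    __m128 gt = _mm_cmpgt_ps(x, half);                               // xij >  +HALF
    __m128 lt = _mm_cmplt_ps(x, _mm_sub_ps(_mm_setzero_ps(), half)); // xij < -HALF
    // the masks are mutually exclusive, so one masked SUB covers both IFs
    __m128 delta = _mm_or_ps(_mm_and_ps(gt, pbcx), _mm_and_ps(lt, neg_pbcx));
    return _mm_sub_ps(x, delta);
}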
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88494
--- Comment #4 from Peter Cordes ---
I suspect dep-chains are the problem, and branching to skip work is a Good
Thing when it's predictable.
(In reply to Richard Biener from comment #2)
> On Skylake it's better (1uop, 1 cycle latency) while on
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
--- Comment #10 from Peter Cordes ---
(In reply to Uroš Bizjak from comment #9)
> There was similar patch for sqrt [1], I think that the approach is
> straightforward, and could be applied to other reg->reg scalar insns as
> well, independently
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
--- Comment #8 from Peter Cordes ---
Created attachment 45544
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45544&action=edit
testloop-cvtss2sd.asm
(In reply to H.J. Lu from comment #7)
> I fixed assembly codes and run it on different AVX
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
--- Comment #6 from Peter Cordes ---
(In reply to Peter Cordes from comment #5)
> But whatever the effect is, it's totally unrelated to what you were *trying*
> to test. :/
After adding a `ret` to each AVX function, all 5 are basically the same
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
--- Comment #5 from Peter Cordes ---
(In reply to H.J. Lu from comment #4)
> (In reply to Peter Cordes from comment #2)
> > Can you show some
> > asm where this performs better?
>
> Please try cvtsd2ss branch at:
>
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
--- Comment #3 from Peter Cordes ---
(In reply to H.J. Lu from comment #1)
> I have a patch for PR 87007:
>
> https://gcc.gnu.org/ml/gcc-patches/2019-01/msg00298.html
>
> which inserts a vxorps at the last possible position. vxorps
> will be
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
--- Comment #2 from Peter Cordes ---
(In reply to H.J. Lu from comment #1)
> But
>
> vxorps %xmm0, %xmm0, %xmm0
> vcvtsd2ss %xmm1, %xmm0, %xmm0
>
> are faster than both.
On Skylake-client (i7-6700k), I can't reproduce this
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80586
Peter Cordes changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---
float cvt(double unused, double xmm1) { return xmm1; }
g++ (GCC-Explorer-Build)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89063
Peter Cordes changed:
What|Removed |Added
CC||peter at cordes dot ca
--- Comment #1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82459
--- Comment #4 from Peter Cordes ---
The VPAND instructions in the 256-bit version are a missed-optimization.
I had another look at this with current trunk. Code-gen is similar to before
with -march=skylake-avx512 -mprefer-vector-width=512.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82459
--- Comment #3 from Peter Cordes ---
I had another look at this with current trunk. Code-gen is similar to before
with -march=skylake-avx512 -mprefer-vector-width=512. (If we improve code-gen
for that choice, it will make it a win in more
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*
The wrong-code bug 86314 also
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80820
--- Comment #5 from Peter Cordes ---
AVX512F with merge-masking for integer->vector broadcasts gives us a
single-uop replacement for vpinsrq/d, which is 2 uops on Intel/AMD.
See my answer on
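A hedged intrinsics sketch of that replacement (my naming; needs AVX512VL):

#include <immintrin.h>
// vpbroadcastq xmm{k}, r64 with merge-masking: 1 uop, vs. 2 for vpinsrq.
__m128i insert_q1(__m128i v, long long x) {
    return _mm_mask_set1_epi64(v, 1 << 1, x);   // replace element 1 only
}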
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80833
--- Comment #14 from Peter Cordes ---
I happened to look at this old bug again recently.
re: extracting the low two 32-bit elements:
(In reply to Uroš Bizjak from comment #11)
> > Or without SSE4 -mtune=sandybridge (anything that excluded
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69615
--- Comment #5 from Peter Cordes ---
Update: https://godbolt.org/g/ZQDY1G
gcc7/8 optimizes this to and / cmp / jb, while gcc6.3 doesn't.
void rangecheck_var(int64_t x, int64_t lim2) {
//lim2 >>= 60;
lim2 &= 0xf; // let the compiler figure
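The idiom in question, sketched by hand (hedged reconstruction; the testcase
body is truncated above):

#include <stdint.h>
// 0 <= x && x < lim2 folds to one unsigned compare once the compiler
// knows lim2 is non-negative (here, from lim2 &= 0xf).
_Bool in_range(int64_t x, int64_t lim2) {
    lim2 &= 0xf;                 // value range known: 0..15
    return x >= 0 && x < lim2;   // folds to and / cmp / jb
}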
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011
--- Comment #13 from Peter Cordes ---
(In reply to Jakub Jelinek from comment #10)
> ?? That is the task for the linker SHF_MERGE|SHF_STRINGS handling.
> Why should gcc duplicate that?
Because gcc would benefit from knowing if merging makes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011
--- Comment #12 from Peter Cordes ---
(In reply to Jakub Jelinek from comment #10)
> (In reply to Peter Cordes from comment #9)
> > gcc already totally misses optimizations here where one string is a suffix
> > of another. "mii" could just be a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011
Peter Cordes changed:
What|Removed |Added
CC||peter at cordes dot ca
--- Comment #9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85585
--- Comment #1 from Peter Cordes ---
By comparison, the no-PIE table of pointers only needs one instruction:
movq    CSWTCH.4(,%rdi,8), %rax
So all my suggestions cost 1 extra instruction on x86 in no-PIE mode, but at a
massive savings
Reporter: peter at cordes dot ca
Target Milestone: ---
Bug 84011 shows some really silly code-gen for PIC code and discussion
suggested using a table of offsets instead of a table of actual pointers, so
you just need one base address.
A further optimization is possible when the strings
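A hedged sketch of the offsets-from-one-base layout (illustrative names and
data):

// One base symbol + small offsets: PIC needs a single relocation
// instead of a pointer (and a relocation) per table entry.
static const char strs[] = "foo\0barbaz\0quux";
static const unsigned short offs[] = { 0, 4, 11 };
const char *lookup(unsigned i) {
    return strs + offs[i];
}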
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81274
--- Comment #2 from Peter Cordes ---
The stray LEA bug seems to be fixed in current trunk (9.0.0 20180429), at least
for this testcase. Gcc's stack-alignment strategy seems to be improved overall
(not copying the return address when not
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69560
Peter Cordes changed:
What|Removed |Added
CC||peter at cordes dot ca
--- Comment #23
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81274
Peter Cordes changed:
What|Removed |Added
CC||peter at cordes dot ca
--- Comment #1
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at cordes dot ca
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*
From
https
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85038
--- Comment #1 from Peter Cordes ---
Correction for AArch64: it supports addressing modes with a 64-bit base
register + 32-bit index register with zero or sign extension for the 32-bit
index. But not 32-bit base registers.
As a hack that's
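A hedged example of code that can use that addressing mode directly:

// AArch64 folds the sign-extension of a 32-bit index into the load:
//   ldr w0, [x0, w1, sxtw 2]
int load_idx(const int *base, int idx) {
    return base[idx];
}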