[Bug target/98891] [10/11 regression] Neon logical operations not vectorized in DImode since g:cdfc0e863a03698a80c74896cbdc9f5c8c652e64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98891
Wilco changed:
What|Removed |Added
Resolution|--- |WONTFIX
Status|NEW |RESOLVED
--- Comment #6 from Wilco ---
Current codegen is more optimal (there is no gain from using Neon for 64-bit
types in general), so closing.
[Bug target/98891] [10/11 regression] Neon logical operations not vectorized in DImode since g:cdfc0e863a03698a80c74896cbdc9f5c8c652e64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98891
Richard Biener changed:
What|Removed |Added
Target Milestone|10.3 |10.4
--- Comment #5 from Richard Biener ---
GCC 10.3 is being released, retargeting bugs to GCC 10.4.
[Bug target/98891] [10/11 regression] Neon logical operations not vectorized in DImode since g:cdfc0e863a03698a80c74896cbdc9f5c8c652e64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98891
--- Comment #4 from Wilco ---
(In reply to Jakub Jelinek from comment #1)
> Reduced testcase:
> extern unsigned long long a, b, c;
>
> void
> foo (void)
> {
> a = b | ~c;
> }
>
> Seems this is the usual dilemma between splitting double-word operations early
> vs. splitting them late; each has its advantages and serious disadvantages.
> By splitting early, combiner can't really do much with it, it is split into
> loads, not, or and store of the halves separately and combiner doesn't see
> the two halves together, one would need essentially vectorization on RTL to
> match that.
Splitting early is required since it results in much more efficient code.
However the real underlying problem is the concept that a type can map to
different register files. Generally a compiler must decide the register file
for each operand before register allocation, but GCC does this during register
allocation. And it does it badly with incomplete knowledge and way too many
costing hacks. To get decent code for AArch64 we had to add special hooks to
force the allocator to strongly prefer allocating integer types to integer
registers and FP/SIMD types to FP/SIMD registers.
[Bug target/98891] [10/11 regression] Neon logical operations not vectorized in DImode since g:cdfc0e863a03698a80c74896cbdc9f5c8c652e64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98891
--- Comment #3 from Wilco ---
Older GCCs only ever did this for vorn, not for other operations like
add/sub/and/orr/eor, so current behaviour is now fully consistent, and I don't
consider it a bug.
One could argue these intrinsics should always map to Neon instructions rather
than being optimized into 64-bit integer operations. However GCC never did
support this except for vorn, so it's not clear whether there is an advantage
in changing this now.
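For readers without the original report at hand, the kind of source that comment #3 is talking about can be sketched as below. This is only an illustration under assumptions: the original testcase is not quoted in this thread, the function name orn_u64 is made up here, and the scalar fallback exists only so the snippet also builds on targets without Neon. The intrinsics vcreate_u64, vorn_u64 and vget_lane_u64 are the standard ones from arm_neon.h; whether the compiler emits a Neon vorn or plain integer instructions for them is exactly the choice discussed above.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not from the bug report): computes a | ~b on a
   single 64-bit lane.  */
static uint64_t
orn_u64 (uint64_t a, uint64_t b)
{
#if defined(__ARM_NEON)
  /* 64-bit Neon form: one-lane vectors and the vorn intrinsic.  */
  uint64x1_t va = vcreate_u64 (a);
  uint64x1_t vb = vcreate_u64 (b);
  return vget_lane_u64 (vorn_u64 (va, vb), 0);
#else
  /* Scalar fallback for targets without Neon; same result.  */
  return a | ~b;
#endif
}

/* Small self-check: both paths must agree with the scalar definition.  */
static void
orn_u64_check (void)
{
  assert (orn_u64 (0, 0) == UINT64_MAX);
  assert (orn_u64 (0, UINT64_MAX) == 0);
  assert (orn_u64 (0xF0u, 0xFFFFFFFFFFFFFF00u) == 0xFFu);
}
```

Comparing the assembly for orn_u64 across GCC versions (for example with -O2 -S on an Arm target with Neon enabled) should show which of the two instruction choices was made.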
[Bug target/98891] [10/11 regression] Neon logical operations not vectorized in DImode since g:cdfc0e863a03698a80c74896cbdc9f5c8c652e64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98891
--- Comment #2 from Jakub Jelinek ---
E.g. x86_64 (both -m32 and -m64) keeps the double-word logicals in the IL, then
has its machine dependent stv pass that promotes some sets of operations into
SIMD ones and finally (admittedly, clearly too late) splits the double-word
operations into the operations on halves when SIMD wasn't beneficial.
[Bug target/98891] [10/11 regression] Neon logical operations not vectorized in DImode since g:cdfc0e863a03698a80c74896cbdc9f5c8c652e64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98891
Jakub Jelinek changed:
What|Removed |Added
Status|UNCONFIRMED |NEW
Ever confirmed|0 |1
CC||jakub at gcc dot gnu.org,
||wilco at gcc dot gnu.org
Last reconfirmed||2021-03-17
--- Comment #1 from Jakub Jelinek ---
Reduced testcase:
extern unsigned long long a, b, c;

void
foo (void)
{
  a = b | ~c;
}

Seems this is the usual dilemma between splitting double-word operations early
vs. splitting them late; each has its advantages and serious disadvantages.
By splitting early, combiner can't really do much with it, it is split into
loads, not, or and store of the halves separately and combiner doesn't see the
two halves together, one would need essentially vectorization on RTL to match
that.
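To make the "split into halves" point concrete, here is a rough source-level model of what early splitting does to the testcase on a 32-bit target. It is a sketch only: the real split happens on RTL inside the compiler, and the names orn64, orn64_split and orn64_split_check are invented for this illustration. Once the operation is written as two independent 32-bit computations, nothing in the code ties the low and high halves together any more, which is why a later pass would need something like vectorization to recombine them into one 64-bit SIMD operation.

```c
#include <assert.h>
#include <stdint.h>

/* Double-word form: a single 64-bit OR-NOT, as in the testcase.  */
static uint64_t
orn64 (uint64_t b, uint64_t c)
{
  return b | ~c;
}

/* Hand-written model of the early-split form: each 32-bit half is
   computed on its own and the halves are reassembled at the end.  */
static uint64_t
orn64_split (uint64_t b, uint64_t c)
{
  uint32_t b_lo = (uint32_t) b, b_hi = (uint32_t) (b >> 32);
  uint32_t c_lo = (uint32_t) c, c_hi = (uint32_t) (c >> 32);
  uint32_t a_lo = b_lo | ~c_lo;   /* low half: not + or  */
  uint32_t a_hi = b_hi | ~c_hi;   /* high half: not + or  */
  return ((uint64_t) a_hi << 32) | a_lo;
}

/* Self-check: the split form must match the double-word form.  */
static void
orn64_split_check (void)
{
  assert (orn64_split (0, 0) == orn64 (0, 0));
  assert (orn64_split (0x00000000FFFFFFFFu, 0xFFFFFFFF00000000u)
          == orn64 (0x00000000FFFFFFFFu, 0xFFFFFFFF00000000u));
  assert (orn64_split (0x123456789ABCDEF0u, 0x0F0F0F0FF0F0F0F0u)
          == orn64 (0x123456789ABCDEF0u, 0x0F0F0F0FF0F0F0F0u));
}
```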
[Bug target/98891] [10/11 regression] Neon logical operations not vectorized in DImode since g:cdfc0e863a03698a80c74896cbdc9f5c8c652e64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98891
Richard Biener changed:
What|Removed |Added
Target Milestone|11.0 |10.3
Summary|[11 regression] Neon logical operations not vectorized in DImode since g:cdfc0e863a03698a80c74896cbdc9f5c8c652e64 |[10/11 regression] Neon logical operations not vectorized in DImode since g:cdfc0e863a03698a80c74896cbdc9f5c8c652e64
Priority|P3 |P2
