[Bug middle-end/88487] union prevents autovectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88487

--- Comment #6 from Daniel Fruzynski ---
Not good. Fortunately I found a workaround. This is probably the best one can get:
[code]
#include <stdint.h>
#include <stddef.h>

template<typename Type>
struct TypeHelper
{
    constexpr unsigned offset();

    // Recover the pointer stored in the neighbouring member by stepping
    // back offset() bytes from this helper object.
    operator Type&()
    {
        uint8_t*__restrict p = (uint8_t*__restrict)this - offset();
        Type*__restrict pt = (Type*__restrict)p;
        return *pt;
    }
};

struct S
{
    struct Union
    {
        void*__restrict*__restrict ptr;
        TypeHelper<double*__restrict*__restrict> d;
    } u;
};

template<>
constexpr unsigned TypeHelper<double*__restrict*__restrict>::offset()
{
    return offsetof(S::Union, d) - offsetof(S::Union, ptr);
}

void test(S* __restrict s1, S* __restrict s2)
{
    for (int n = 0; n < 2; ++n)
    {
        s1->u.d[n][0] = s2->u.d[n][0];
        s1->u.d[n][1] = s2->u.d[n][1];
    }
}
[/code]
[Bug middle-end/88487] union prevents autovectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88487

--- Comment #5 from rguenther at suse dot de ---
On Fri, 14 Dec 2018, bugzi...@poradnik-webmastera.com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88487
>
> --- Comment #4 from Daniel Fruzynski ---
> OK, I see. Is there any workaround for this?

Only not using a union ...

> I tried to assign pointer to local
> variable directly and with intermediate casting via void*, but it did not
> help.
> Casting S1* to S2* also does not work.

Yes, that doesn't work by design.
[Bug middle-end/88487] union prevents autovectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88487

--- Comment #4 from Daniel Fruzynski ---
OK, I see. Is there any workaround for this? I tried assigning the pointer to a local variable, both directly and with an intermediate cast via void*, but it did not help. Casting S1* to S2* also does not work.
[Bug middle-end/88487] union prevents autovectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88487

Richard Biener changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Keywords|                              |missed-optimization
             Status|UNCONFIRMED                   |ASSIGNED
   Last reconfirmed|                              |2018-12-14
             Blocks|                              |49774
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
     Ever confirmed|0                             |1

--- Comment #3 from Richard Biener ---
t.c:22:1: note: can't determine dependence between MEM[(double *)_28 clique 1 base 0] and MEM[(double *)_25 + 8B clique 1 base 0]
t.c:22:1: note: removing SLP instance operations starting from: MEM[(double *)_28 clique 1 base 0] = _29;
t.c:22:1: note: can't determine dependence between MEM[(double *)_43 clique 1 base 0] and MEM[(double *)_40 + 8B clique 1 base 0]
t.c:22:1: note: removing SLP instance operations starting from: MEM[(double *)_43 clique 1 base 0] = _44;

  # PT = nonlocal escaped null
  _25 = MEM[(double * restrict *)_21 clique 1 base 0];
  # PT = nonlocal escaped null
  _28 = MEM[(double * restrict *)_26 clique 1 base 0];
..
  MEM[(double *)_28 clique 1 base 0] = _29;
  _31 = MEM[(double *)_25 + 8B clique 1 base 0];

while w/o unions we have

  # PT = null { D.2686 } (nonlocal, restrict)
  _25 = MEM[(double * restrict *)_21 clique 1 base 1];
..
  _29 = MEM[(double *)_25 clique 1 base 2];

that is, the indirect loads from non-union members produce restricted pointers while those from union members do not. The reason for this is that points-to analysis doesn't handle unions in field-sensitive analysis and thus the restrict code doesn't apply.

This can probably be fixed in a reasonable manner in push_fields_onto_fieldstack by initializing only_restrict_pointers appropriately for the UNION case. Not really my top-priority though.

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49774
[Bug 49774] [meta-bug] restrict qualification aliasing issues
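The two cases contrasted in comment #3 (pointer loaded from a union member versus from a plain struct member) can be sketched roughly as below. The thread names S1 and S2 but does not show their definitions, so the layouts here, the `raw` member, and the `copy_union`, `copy_plain` and `run_demo` names are assumptions made for illustration only. Both loops do the same work; the reported difference is in how the optimizer sees the loaded pointers, not in the result.

```cpp
#include <cassert>

// Assumed layouts: illustrative only, not taken from the bug report.
struct S1 {
    union {
        double* __restrict__ * __restrict__ d;
        void* raw;  // hypothetical second member so that this is a real union
    };
};

struct S2 {
    double* __restrict__ * __restrict__ d;
};

// Row pointers are loaded through a union member. Per comment #3, such
// loads are not treated as restrict-qualified by points-to analysis.
void copy_union(S1* __restrict__ s1, S1* __restrict__ s2) {
    for (int n = 0; n < 2; ++n) {
        s1->d[n][0] = s2->d[n][0];
        s1->d[n][1] = s2->d[n][1];
    }
}

// Same loop, but the row pointers come from a plain struct member.
void copy_plain(S2* __restrict__ s1, S2* __restrict__ s2) {
    for (int n = 0; n < 2; ++n) {
        s1->d[n][0] = s2->d[n][0];
        s1->d[n][1] = s2->d[n][1];
    }
}

// Hypothetical driver: checks that both variants copy the same 2x2 data.
bool run_demo() {
    double src0[2] = {1.0, 2.0}, src1[2] = {3.0, 4.0};
    double a0[2] = {}, a1[2] = {}, b0[2] = {}, b1[2] = {};

    // Separate row tables per struct, so no restrict-qualified object is shared.
    double* __restrict__ src_rows_u[2] = {src0, src1};
    double* __restrict__ src_rows_p[2] = {src0, src1};
    double* __restrict__ a_rows[2] = {a0, a1};
    double* __restrict__ b_rows[2] = {b0, b1};

    S1 u_dst, u_src;
    u_dst.d = a_rows;
    u_src.d = src_rows_u;
    copy_union(&u_dst, &u_src);

    S2 p_dst, p_src;
    p_dst.d = b_rows;
    p_src.d = src_rows_p;
    copy_plain(&p_dst, &p_src);

    return a0[0] == 1.0 && a0[1] == 2.0 && a1[0] == 3.0 && a1[1] == 4.0 &&
           b0[0] == 1.0 && b0[1] == 2.0 && b1[0] == 3.0 && b1[1] == 4.0;
}
```

Calling `run_demo()` from a small `main` only confirms that the two functions behave identically. To compare the generated code, one option is to compile with `g++ -O3 -fopt-info-vec-missed` and look for notes like the ones quoted in comment #3.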
[Bug middle-end/88487] union prevents autovectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88487

--- Comment #2 from Daniel Fruzynski ---
I spotted that test3 in the previous comment uses structure S2, which does not have a union inside. When I changed it to use S1, I got non-vectorized code. So this workaround does not work.
[Bug middle-end/88487] union prevents autovectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88487

--- Comment #1 from Daniel Fruzynski ---
Update: when pointers to data are copied to local variables like below, autovectorization starts working again.
[code]
void test3(S2* __restrict__ s1, S2* __restrict__ s2)
{
    double* __restrict__ * __restrict__ d1 = s1->d;
    double* __restrict__ * __restrict__ d2 = s2->d;

    for (int n = 0; n < 2; ++n)
    {
        d1[n][0] = d2[n][0];
        d1[n][1] = d2[n][1];
    }
}
[/code]
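The snippet in comment #1 is not compilable on its own because S2 is not defined in this thread. A minimal self-contained version might look like the sketch below, where the S2 layout and the `check_test3` driver are assumptions added for illustration. Note that comment #2 later reports that the same pattern stops vectorizing once the struct contains a union.

```cpp
#include <cassert>

// Assumed definition: the thread uses S2 without showing it.
struct S2 {
    double* __restrict__ * __restrict__ d;
};

// test3 as posted in comment #1: the member pointers are first copied
// into restrict-qualified locals, then used inside the loop.
void test3(S2* __restrict__ s1, S2* __restrict__ s2) {
    double* __restrict__ * __restrict__ d1 = s1->d;
    double* __restrict__ * __restrict__ d2 = s2->d;

    for (int n = 0; n < 2; ++n) {
        d1[n][0] = d2[n][0];
        d1[n][1] = d2[n][1];
    }
}

// Hypothetical driver: builds two non-overlapping 2x2 tables and checks
// that test3 copies the source into the destination.
bool check_test3() {
    double src0[2] = {1.5, 2.5}, src1[2] = {3.5, 4.5};
    double dst0[2] = {}, dst1[2] = {};

    double* __restrict__ src_rows[2] = {src0, src1};
    double* __restrict__ dst_rows[2] = {dst0, dst1};

    S2 dst, src;
    dst.d = dst_rows;
    src.d = src_rows;
    test3(&dst, &src);

    return dst0[0] == 1.5 && dst0[1] == 2.5 &&
           dst1[0] == 3.5 && dst1[1] == 4.5;
}
```

Building this with something like `g++ -O3 -fopt-info-vec-optimized -c` is one way to see whether GCC reports the stores as vectorized. The exact output depends on the compiler version and target.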