[Bug tree-optimization/113678] SLP misses up vec_concat
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113678 --- Comment #3 from Andrew Pinski --- Note the SLP that happens in connection with the loop vectorizer actually does a decent job ...
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113678

--- Comment #2 from Andrew Pinski ---
Noticed the same with:
```
void f(unsigned char *a, unsigned char *b, unsigned char *c)
{
  unsigned char t[8];
  t[0] = a[0];
  t[1] = a[1];
  t[2] = a[2];
  t[3] = a[3];
  t[4] = b[0];
  t[5] = b[1];
  t[6] = b[2];
  t[7] = b[3];
  c[0] = t[0];
  c[1] = t[1];
  c[2] = t[2];
  c[3] = t[3];
  c[4] = t[4];
  c[5] = t[5];
  c[6] = t[6];
  c[7] = t[7];
}
```
Adding `-fno-tree-vectorize` even gives the best code.
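For illustration only, the shape one would hope to get for `f` can be written by hand as two 4-byte loads concatenated into one 8-byte store. This is a minimal sketch, not GCC output: the name `f_concat` is hypothetical, and whether each `memcpy` becomes a single load or store depends on the target and compiler version.

```c
#include <string.h>

/* Hand-written "vec_concat" shape for f: two 4-byte loads joined into one
   8-byte value that is stored once. f_concat is a hypothetical name used
   only for this sketch. All loads happen before the store, as in the
   original f, so the result is the same even if c overlaps a or b. */
void f_concat(const unsigned char *a, const unsigned char *b,
              unsigned char *c)
{
    unsigned char t[8];

    memcpy(t, a, 4);      /* low half:  a[0..3] */
    memcpy(t + 4, b, 4);  /* high half: b[0..3] */
    memcpy(c, t, 8);      /* single 8-byte store to c */
}
```

Only `a[0..3]` and `b[0..3]` are read, so no access goes past what the scalar code touches.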
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113678

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2024-01-31
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener ---
I think the SLP tree we discover is sound:

t2.c:11:14: note: node 0x5db76f0 (max_nunits=8, refcnt=2) vector(8) char
t2.c:11:14: note: op template: *a_7(D) = _1;
t2.c:11:14: note: stmt 0 *a_7(D) = _1;
t2.c:11:14: note: stmt 1 MEM[(char *)a_7(D) + 1B] = _2;
t2.c:11:14: note: stmt 2 MEM[(char *)a_7(D) + 2B] = _3;
t2.c:11:14: note: stmt 3 MEM[(char *)a_7(D) + 3B] = _4;
t2.c:11:14: note: stmt 4 MEM[(char *)a_7(D) + 4B] = _1;
t2.c:11:14: note: stmt 5 MEM[(char *)a_7(D) + 5B] = _2;
t2.c:11:14: note: stmt 6 MEM[(char *)a_7(D) + 6B] = _3;
t2.c:11:14: note: stmt 7 MEM[(char *)a_7(D) + 7B] = _4;
t2.c:11:14: note: children 0x5db7778
t2.c:11:14: note: node 0x5db7778 (max_nunits=8, refcnt=2) vector(8) char
t2.c:11:14: note: op template: _1 = *b_6(D);
t2.c:11:14: note: stmt 0 _1 = *b_6(D);
t2.c:11:14: note: stmt 1 _2 = MEM[(char *)b_6(D) + 1B];
t2.c:11:14: note: stmt 2 _3 = MEM[(char *)b_6(D) + 2B];
t2.c:11:14: note: stmt 3 _4 = MEM[(char *)b_6(D) + 3B];
t2.c:11:14: note: stmt 4 _1 = *b_6(D);
t2.c:11:14: note: stmt 5 _2 = MEM[(char *)b_6(D) + 1B];
t2.c:11:14: note: stmt 6 _3 = MEM[(char *)b_6(D) + 2B];
t2.c:11:14: note: stmt 7 _4 = MEM[(char *)b_6(D) + 3B];
t2.c:11:14: note: load permutation { 0 1 2 3 0 1 2 3 }

The issue is, as so often:

t2.c:11:14: note: ==> examining statement: _1 = *b_6(D);
t2.c:11:14: missed: BB vectorization with gaps at the end of a load is not supported
t2.c:3:19: missed: not vectorized: relevant stmt not supported: _1 = *b_6(D);
t2.c:11:14: note: Building vector operands of 0x5db7778 from scalars instead

where we do not apply much non-ad-hoc work to deal with those "out-of-bound" accesses. The obvious choice here would be to do a single vector(4) load instead.
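To make the dump easier to follow, here is a sketch of the scalar pattern it describes, reconstructed from the SLP nodes rather than taken from the reporter's test case (which is not quoted here), next to a hand-written version of the single vector(4) load suggested above. The function names `dup4_scalar` and `dup4_concat` are hypothetical, and the statement order in the real source may differ.

```c
#include <string.h>

/* Scalar pattern implied by the dump: four byte loads from b, each feeding
   two of the eight byte stores to a (load permutation { 0 1 2 3 0 1 2 3 }).
   Reconstructed from the SLP nodes; the original source may differ. */
void dup4_scalar(char *a, const char *b)
{
    char t0 = b[0], t1 = b[1], t2 = b[2], t3 = b[3];

    a[0] = t0; a[1] = t1; a[2] = t2; a[3] = t3;
    a[4] = t0; a[5] = t1; a[6] = t2; a[7] = t3;
}

/* Hand-written alternative: one 4-byte load from b, concatenated with
   itself, then one 8-byte store. Only b[0..3] is read, so there is no
   access in the "gap" b[4..7] that a full vector(8) load would touch. */
void dup4_concat(char *a, const char *b)
{
    char lo[4];
    char v[8];

    memcpy(lo, b, 4);      /* single vector(4)-sized load */
    memcpy(v, lo, 4);      /* low half of the result */
    memcpy(v + 4, lo, 4);  /* high half: the same four bytes again */
    memcpy(a, v, 8);       /* single 8-byte store */
}
```

A vector(8) load from `b` would read four bytes the scalar code never accesses, which is why the basic-block vectorizer rejects it and falls back to building the operand from scalars.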