[Bug c++/86619] Missed optimization opportunity with array aliasing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86619 --- Comment #5 from Andrew Pinski --- I noticed that clang, ICC, and MSVC do not handle this either. Note that GCC is the only one which handles:

    int f(std::array<int, 2>& a, std::array<int, 3>& b)
    {
        a[0] = 1;
        b[0] = 1;
        return a[0];
    }
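The return value above can be folded to 1 regardless of aliasing, because both stores write 1. A minimal sketch (the function name `f_same` is hypothetical) of the hazard that forces a reload when the stored values differ and the arrays have the same type:

```cpp
#include <array>
#include <cassert>

// Sketch: same-sized std::arrays, so the compiler must assume the two
// references can alias.  If they do alias, the store through b clobbers
// a[0] and the reload yields 2, not 1.
int f_same(std::array<int, 2>& a, std::array<int, 2>& b) {
    a[0] = 1;
    b[0] = 2;
    return a[0];  // cannot be folded to a constant without alias info
}
```

At runtime the two outcomes are observable: distinct arrays return 1, while passing the same array for both parameters returns 2.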
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86619 --- Comment #4 from Michael Veksler --- It is interesting to check the impact on numerical C++ benchmarks. Fortran has a conceptual restrict on all its parameter arrays, since aliasing is not allowed. This C function:

    void f(int * __restrict__ v1, int * __restrict__ v2, int n)
    {
        for (int i = 0; i < n; i++)
            v1[0] += v2[i];
    }

and this Fortran subroutine:

    subroutine f(v1, v2, n)
       integer :: v1(100)
       integer :: v2(100)
       integer :: n
       DO i = 1, n
          v1(1) = v1(1) + v2(i)
       END DO
    end subroutine f

generate the same loop:

    .L3:
        addl    (%rdx), %eax
        addq    $4, %rdx
        cmpq    %rdx, %r8
        jne     .L3

But without restrict, as expected, g++ generates:

    .L8:
        addl    (%rdx), %eax
        addq    $4, %rdx
        cmpq    %r8, %rdx
        movl    %eax, (%rcx)
        jne     .L8

Running both variants from a loop (in a separate translation unit, without whole-program optimization) (g++ 7.2.0 with -O2 on 64-bit cygwin):

    #include <ctime>
    #include <iostream>

    void f(int * __restrict__ v1, int * __restrict__ v2, int SIZE);
    void g(int * v1, int * v2, int SIZE);

    constexpr int SIZE = 1'000'000;
    int v2[SIZE];

    int main()
    {
        int v1;
        f(&v1, v2, SIZE); // Warm up cache
        auto start = std::clock();
        constexpr int TIMES = 10'000;
        for (int i = 0; i < TIMES; ++i) {
            v1 = 0;
            f(&v1, v2, SIZE);
        }
        auto t1 = std::clock();
        for (int i = 0; i < TIMES; ++i) {
            v1 = 0;
            g(&v1, v2, SIZE);
        }
        auto t2 = std::clock();
        std::cout << "with restrict:    " << double(t1 - start) / CLOCKS_PER_SEC << " sec\n";
        std::cout << "without restrict: " << double(t2 - t1) / CLOCKS_PER_SEC << " sec\n";
    }

And the results are:

    with restrict:    4.477 sec
    without restrict: 5.756 sec

This clearly demonstrates the impact of good alias analysis. With plain C pointers this price is unavoidable, but unfortunately it is also paid when passing pointers or references to arrays of different sizes, or when inheriting two different types from std::array in order to mark the parameters as non-aliasing.
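For reference, a self-contained sketch of the two kernels being timed above (their definitions are assumed from the comment's description; they are placed in one file here only to check behavior, since the timing comparison requires keeping them in a separate translation unit):

```cpp
#include <cassert>

// The restrict-qualified kernel: v1[0] may be kept in a register for the
// whole loop, since v1 and v2 are promised not to overlap.
void f(int* __restrict__ v1, int* __restrict__ v2, int n) {
    for (int i = 0; i < n; i++)
        v1[0] += v2[i];
}

// The plain-pointer kernel: the compiler must store v1[0] back on every
// iteration, because a write through v1 could change some *v2 element.
void g(int* v1, int* v2, int n) {
    for (int i = 0; i < n; i++)
        v1[0] += v2[i];
}
```

Both compute the same sum for non-overlapping arguments; only the generated code differs, as the two assembly loops above show.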
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86619 --- Comment #3 from rguenther at suse dot de --- On Mon, 23 Jul 2018, mickey.veksler at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86619
>
> --- Comment #2 from Michael Veksler ---
> >> type-based alias analysis doesn't distinguish between int[2] and int[3].
>
> Is it just the way GCC implements type-based alias analysis,
> or is it defined that way in the C and C++ standards?

It's the way GCC implements it.

> I suspect that the weaker alias analysis of arrays (int [size] and
> std::array) is one of the things that make C++ slower than
> Fortran on some benchmarks.

Not sure - Fortran shares the restriction and also uses pointer-based accesses. Fortran is just more constrained, so it can put __restrict on its arrays as an implementation detail very aggressively.
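The point about Fortran can be sketched in C++ terms: because Fortran forbids the caller from passing overlapping actual arguments when one of them is modified, a Fortran processor may lower dummy arrays to restrict-qualified pointers. A hypothetical lowering of the subroutine from comment #4 (the trailing-underscore name and by-reference n mimic common Fortran ABIs, but both vary by compiler):

```cpp
#include <cassert>

// Hypothetical lowering of "subroutine f(v1, v2, n)": the dummy arrays
// become restrict-qualified pointers - the annotation a C or C++
// programmer would have to write by hand.  n is passed by reference,
// as Fortran compilers commonly do.
extern "C" void f_(int* __restrict__ v1, const int* __restrict__ v2,
                   const int* n) {
    for (int i = 0; i < *n; i++)
        v1[0] += v2[i];
}
```

Calling it with non-overlapping arguments is the only usage the Fortran rules permit, which is what makes the aggressive __restrict annotation safe.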
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86619 --- Comment #2 from Michael Veksler ---

>> type-based alias analysis doesn't distinguish between int[2] and int[3].

Is it just the way GCC implements type-based alias analysis, or is it defined that way in the C and C++ standards? I suspect that the weaker alias analysis of arrays (int[size] and std::array) is one of the things that makes C++ slower than Fortran on some benchmarks.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86619

Richard Biener changed:

               What |Removed     |Added
    -----------------------------------------------------------
           Keywords |            |alias, missed-optimization
             Status |UNCONFIRMED |NEW
   Last reconfirmed |            |2018-07-23
                 CC |            |rguenth at gcc dot gnu.org
     Ever confirmed |0           |1

--- Comment #1 from Richard Biener --- type-based alias analysis doesn't distinguish between int[2] and int[3]. The issue with operator[] is that the FE produces

    ;; Function T& ar<T, size>::operator[](size_t) [with T = int; long unsigned int size = 2; size_t = long unsigned int] (null)
    ;; enabled by -tree-original
      return <retval> = (int &) &((struct ar *) this)->ar[offset];

and

    <<cleanup_point (void) (*ar<int, 2>::operator[] ((struct ar *) a, 0) = 1)>>;
    <<cleanup_point (void) (*ar<int, 3>::operator[] ((struct ar *) b, 0) = 2)>>;
    <<return <retval> = *ar<int, 2>::operator[] ((struct ar *) a, 0)>>;

which after inlining is

      int & _6;
      int & _9;

      <bb 2>:
      _6 = &_5(D)->ar[0];
      *_6 = 1;
      _9 = &_8(D)->ar[0];
      *_9 = 2;
      _11 = &_5(D)->ar[0];
      _12 = *_11;
      return _12;

compared to

    f1 (struct ar & a, struct ar & b)
    {
      int _6;

      <bb 2>:
      a_2(D)->ar[0] = 1;
      b_4(D)->ar[0] = 2;
      _6 = a_2(D)->ar[0];
      return _6;
    }

Here TBAA only sees int & accesses, which do conflict, and points-to analysis is TBAA-agnostic and cannot disambiguate _5(D) and _8(D). For f1, TBAA sees structure accesses and can disambiguate. C++ abstraction makes it harder to optimize here: you get two accesses of effective type int vs. one of ar<int, 2> and one of ar<int, 3>. Way in the past, points-to had some bits of TBAA; eventually we can re-introduce bits here, but the TBAA bits did not play well with the points-to solver and created wrong code. Note there isn't really a way to tell the middle-end that a pointed-to object is of a specific dynamic type. Eventually we can play leeway and make REFERENCE_TYPE parameters behave that way.
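The contrast in this comment can be reproduced with a minimal std::array-like template in the spirit of the report's `ar` (the member name `elems` is illustrative; the dump above calls the field `ar`):

```cpp
#include <cassert>
#include <cstddef>

// Minimal std::array-like template; member name "elems" is illustrative.
template <class T, std::size_t size>
struct ar {
    T elems[size];
    T& operator[](std::size_t i) { return elems[i]; }
};

// Direct member accesses: TBAA sees ar<int,2> vs ar<int,3> structure
// accesses and can disambiguate, so the return may fold to 1.
int f1(ar<int, 2>& a, ar<int, 3>& b) {
    a.elems[0] = 1;
    b.elems[0] = 2;
    return a.elems[0];
}

// Through operator[]: after inlining only int& loads and stores remain,
// so TBAA sees conflicting int accesses and a[0] must be reloaded.
int f2(ar<int, 2>& a, ar<int, 3>& b) {
    a[0] = 1;
    b[0] = 2;
    return a[0];
}
```

At the source level both functions compute the same value for distinct objects; the difference is only in what the optimizer can prove, as the two GIMPLE dumps above show.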