https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104195
Bug ID: 104195
Summary: Fails to optimize nested array indexing p[i/N].data[i%N]
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: arthur.j.odwyer at gmail dot com
Target Milestone: ---

GCC seems to be unable to optimize some nested array accesses of the form
p[i/N].data[i%N] into a simple ((T*)p)[i]. The C test case is

    struct CChunk {
        int data[4];
    };

    int f(struct CChunk *p, unsigned long long i) {
        return p[i/4].data[i%4];
    }

gcc -O2 currently produces:

    movq %rsi, %rax
    andl $3, %esi
    shrq $2, %rax
    salq $4, %rax
    addq %rax, %rdi
    movl (%rdi,%rsi,4), %eax
    ret

but I would prefer it to produce:

    movl (%rdi,%rsi,4), %eax
    retq

A more exhaustive C++ test follows. GCC can optimize a few of these, but not
all. (Clang can't optimize any of these; I've just filed
https://github.com/llvm/llvm-project/issues/53367 about that.)

https://godbolt.org/z/3E1e6c5e3

    template<class T, int N>
    struct Chunk {
        T data[N];
    };

    template<class T, int N, class IndexType>
    int f(Chunk<T, N> *p, IndexType i) {
        return p[i/N].data[i%N];
    }

    template int f(Chunk<char,2>*, unsigned long long);  // GCC wins
    template int f(Chunk<char,4>*, unsigned long long);  // GCC wins
    template int f(Chunk<char,2>*, unsigned);
    template int f(Chunk<char,4>*, unsigned);
    template int f(Chunk<int,2>*, unsigned long long);  // GCC wins
    template int f(Chunk<int,4>*, unsigned long long);
    template int f(Chunk<int,2>*, unsigned);
    template int f(Chunk<int,4>*, unsigned);
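
For readers who want to check the identity behind the requested transformation, here is a minimal sketch in C. It is not part of the original test case. It assumes that struct CChunk has no padding, so sizeof(struct CChunk) == 4 * sizeof(int), and it compares byte addresses rather than loaded values so that it stays within well-defined pointer arithmetic. The names addr_nested, addr_flat and forms_agree are hypothetical helpers introduced only for this illustration. The reasoning is that i == 4*(i/4) + i%4, so the byte offset (i/4)*16 + (i%4)*4 equals 4*i.

```c
#include <assert.h>
#include <stddef.h>

struct CChunk { int data[4]; };

/* Assumption of this sketch: the struct is exactly four ints, no padding. */
_Static_assert(sizeof(struct CChunk) == 4 * sizeof(int),
               "sketch assumes struct CChunk has no padding");

/* Address computed by the nested form from the report: p[i/4].data[i%4]. */
static const unsigned char *addr_nested(const struct CChunk *p,
                                        unsigned long long i) {
    return (const unsigned char *)&p[i / 4].data[i % 4];
}

/* Address computed by the flat form: base + i * sizeof(int). */
static const unsigned char *addr_flat(const struct CChunk *p,
                                      unsigned long long i) {
    return (const unsigned char *)p + i * sizeof(int);
}

/* Hypothetical helper: returns 1 if both forms yield the same address for
   every valid index into an array of 8 chunks, and 0 otherwise. */
static int forms_agree(void) {
    static struct CChunk chunks[8];
    unsigned long long n = sizeof chunks / sizeof(int);  /* total int count */
    for (unsigned long long i = 0; i < n; ++i) {
        if (addr_nested(chunks, i) != addr_flat(chunks, i))
            return 0;
    }
    return 1;
}
```

Calling forms_agree() from a small main is enough to exercise the check. It only covers the unsigned long long index case from the C test; the narrower index types in the C++ list would need the index zero-extended before scaling, which this sketch does not attempt to model.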