[Bug tree-optimization/102391] Failure to optimize adjacent 8-bit loads into a single bigger load
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102391

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #4 from Andrew Pinski ---
Dup of bug 98953.

*** This bug has been marked as a duplicate of bug 98953 ***
[Bug tree-optimization/102391] Failure to optimize adjacent 8-bit loads into a single bigger load
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102391

--- Comment #3 from Richard Biener ---
the bswap pass is in principle able to handle these but it sees

  _1 = (sizetype) offset_12(D);
  _2 = RomHeader_13(D) + _1;
  _3 = *_2;
  _4 = (signed short) _3;
  _5 = _1 + 1;
  _6 = RomHeader_13(D) + _5;
  _7 = *_6;

so the constant offset is not forwarded to the MEM_REFs (int vs. size_t issue)
and the bswap pass doesn't perform any fancy dataref analysis to spot constant
offsetted same bases (it could simply use split_constant_offset on the found
base I guess or invoke DR analysis in BB mode).
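
To illustrate the point about constant offsets, here is a minimal sketch of a source-level restructuring. It assumes that forming the base pointer once makes both loads share a base with small constant offsets. The function names, the 16-bit return type, and the const qualifiers are hypothetical, and this thread does not confirm that any particular GCC version then merges the loads.

```c
#include <stdint.h>

/* Indexed form: each load address is rom + (sizetype)offset + constant,
   so the two loads do not visibly share one base pointer. */
static uint16_t read_u16le_indexed(int offset, const uint8_t *rom)
{
    return (uint16_t)(rom[offset] | (rom[offset + 1] << 8));
}

/* Hypothetical rewrite: compute rom + offset once, then index with
   constants 0 and 1 from that single base. */
static uint16_t read_u16le_based(int offset, const uint8_t *rom)
{
    const uint8_t *p = rom + offset;
    return (uint16_t)(p[0] | (p[1] << 8));
}
```

Both functions return the same value for any in-bounds offset; only the shape of the address computation differs.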
[Bug tree-optimization/102391] Failure to optimize adjacent 8-bit loads into a single bigger load
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102391

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-09-18
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Andrew Pinski ---
GCC can figure out the case where offset = 0.
There might be a dup of this one too.
[Bug tree-optimization/102391] Failure to optimize adjacent 8-bit loads into a single bigger load
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102391

Gabriel Ravier changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Failure to optimize 2 8-bit |Failure to optimize
                   |loads into a single 16-bit  |adjacent 8-bit loads into a
                   |load                        |single bigger load

--- Comment #1 from Gabriel Ravier ---
Note: this also equivalently works on bigger sizes:

uint32_t HeaderReadU32LE(int offset, uint8_t *RomHeader)
{
    return RomHeader[offset] | (RomHeader[offset + 1] << 8) |
           (RomHeader[offset + 2] << 16) | (RomHeader[offset + 3] << 24);
}

On AMD64, GCC outputs this:

HeaderReadU32LE:
  movsx rdi, edi
  movzx eax, BYTE PTR [rsi+1+rdi]
  movzx edx, BYTE PTR [rsi+2+rdi]
  sal eax, 8
  sal edx, 16
  or eax, edx
  movzx edx, BYTE PTR [rsi+rdi]
  or eax, edx
  movzx edx, BYTE PTR [rsi+3+rdi]
  sal edx, 24
  or eax, edx
  ret

LLVM manages this:

HeaderReadU32LE:
  movsxd rax, edi
  mov eax, dword ptr [rsi + rax]
  ret
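
For reference, the single 32-bit load that LLVM emits corresponds to something like the memcpy-based read sketched below. This is an illustration rather than code from the report: the function names are hypothetical, the uint32_t casts in the byte-wise version are an addition (so that the shift by 24 never reaches the sign bit of int), and the memcpy version matches the byte-wise one only on a little-endian host.

```c
#include <stdint.h>
#include <string.h>

/* Byte-wise little-endian read, as in the report, with uint32_t casts
   added here so every shift is performed on an unsigned 32-bit value. */
static uint32_t read_u32le_bytes(int offset, const uint8_t *p)
{
    return (uint32_t)p[offset]
         | ((uint32_t)p[offset + 1] << 8)
         | ((uint32_t)p[offset + 2] << 16)
         | ((uint32_t)p[offset + 3] << 24);
}

/* Hypothetical single-load form: copies four bytes in host byte order,
   so it equals the function above only on little-endian targets. */
static uint32_t read_u32le_memcpy(int offset, const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p + offset, sizeof v);
    return v;
}
```

The byte-wise version stays portable across byte orders, which is why recognizing it in the compiler is preferable to rewriting the source.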