[Bug tree-optimization/102391] Failure to optimize adjacent 8-bit loads into a single bigger load

2021-12-15 pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102391

Andrew Pinski  changed:

           What    |Removed                     |Added

             Status|NEW                         |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #4 from Andrew Pinski  ---
Dup of bug 98953.

*** This bug has been marked as a duplicate of bug 98953 ***

2021-09-20 rguenth at gcc dot gnu.org via Gcc-bugs

--- Comment #3 from Richard Biener  ---
The bswap pass is in principle able to handle these, but it sees

  _1 = (sizetype) offset_12(D);
  _2 = RomHeader_13(D) + _1;
  _3 = *_2;
  _4 = (signed short) _3;
  _5 = _1 + 1;
  _6 = RomHeader_13(D) + _5;
  _7 = *_6;

so the constant offset is not forwarded to the MEM_REFs (an int vs. size_t
issue), and the bswap pass doesn't perform any fancy data-reference analysis
to spot accesses to the same base at constant offsets (it could simply use
split_constant_offset on the found base, I guess, or invoke DR analysis in
BB mode).
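As a sketch of a possible source-level workaround (an assumption, not something proposed in this thread), the base address can be computed once so that every byte load is a constant offset from a single pointer. That is the same shape as the offset = 0 case GCC is reported to handle. `HeaderReadU32LE_base` is a hypothetical name for a variant of the 32-bit reader from comment #1; whether a given GCC version actually merges these loads into one 32-bit load is not confirmed here, so check the generated assembly.

```c
#include <stdint.h>

/* Hypothetical variant (not from the bug report): do the pointer
   arithmetic once, so the four byte loads below are constant offsets
   (0..3) from the same base pointer p, like the offset == 0 case.
   The (uint32_t) casts keep the << 24 shift out of signed int. */
uint32_t HeaderReadU32LE_base(int offset, const uint8_t *RomHeader)
{
    const uint8_t *p = RomHeader + offset;
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}
```

For example, with the bytes `AA 78 56 34 12`, `HeaderReadU32LE_base(1, rom)` returns `0x12345678`.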

2021-09-17 pinskia at gcc dot gnu.org via Gcc-bugs

Andrew Pinski  changed:

           What    |Removed                     |Added

           Severity|normal                      |enhancement
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-09-18
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Andrew Pinski  ---
GCC can figure out the case where offset = 0.

There might be a dup of this one too.

2021-09-17 gabravier at gmail dot com via Gcc-bugs

Gabriel Ravier  changed:

           What    |Removed                     |Added

            Summary|Failure to optimize 2 8-bit |Failure to optimize
                   |loads into a single 16-bit  |adjacent 8-bit loads into a
                   |load                        |single bigger load

--- Comment #1 from Gabriel Ravier  ---
Note: the same missed optimization also occurs with bigger sizes:

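#include <stdint.h>

uint32_t HeaderReadU32LE(int offset, uint8_t *RomHeader)
{
    /* Cast each byte to uint32_t so "<< 24" is not done in signed int,
       which would overflow for bytes >= 0x80. */
    return (uint32_t)RomHeader[offset] |
           ((uint32_t)RomHeader[offset + 1] << 8) |
           ((uint32_t)RomHeader[offset + 2] << 16) |
           ((uint32_t)RomHeader[offset + 3] << 24);
}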

On AMD64, GCC outputs this:

HeaderReadU32LE:
  movsx rdi, edi
  movzx eax, BYTE PTR [rsi+1+rdi]
  movzx edx, BYTE PTR [rsi+2+rdi]
  sal eax, 8
  sal edx, 16
  or eax, edx
  movzx edx, BYTE PTR [rsi+rdi]
  or eax, edx
  movzx edx, BYTE PTR [rsi+3+rdi]
  sal edx, 24
  or eax, edx
  ret

LLVM manages this:

HeaderReadU32LE:
  movsxd rax, edi
  mov eax, dword ptr [rsi + rax]
  ret
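LLVM's single `mov` is a valid replacement because AMD64 is little-endian: the byte at the lowest address lands in the lowest 8 bits of the loaded value, which is exactly what the shift-or expression builds. The sketch below checks that equivalence at run time. `LoadU32Host` and `HostIsLittleEndian` are hypothetical helper names, and the `memcpy`-based load is one common portable way to express an unaligned 4-byte load in C, not code from the bug report. On a big-endian host the two results differ, so the comparison only holds on little-endian hosts.

```c
#include <stdint.h>
#include <string.h>

/* The reader from comment #1, with casts so << 24 is done in uint32_t. */
uint32_t HeaderReadU32LE(int offset, const uint8_t *RomHeader)
{
    return (uint32_t)RomHeader[offset] |
           ((uint32_t)RomHeader[offset + 1] << 8) |
           ((uint32_t)RomHeader[offset + 2] << 16) |
           ((uint32_t)RomHeader[offset + 3] << 24);
}

/* Hypothetical helper: a plain 4-byte load in host byte order, roughly
   what LLVM's single mov does. memcpy sidesteps alignment and
   strict-aliasing problems. */
uint32_t LoadU32Host(int offset, const uint8_t *RomHeader)
{
    uint32_t v;
    memcpy(&v, RomHeader + offset, sizeof v);
    return v;
}

/* Hypothetical helper: 1 if the lowest-addressed byte of the integer 1
   is 1, i.e. the host is little-endian. */
int HostIsLittleEndian(void)
{
    const uint32_t one = 1;
    return *(const unsigned char *)&one == 1;
}
```

With the bytes `AA 78 56 34 12`, both functions return `0x12345678` for offset 1 on a little-endian host such as AMD64.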