[Bug target/70557] uint64_t zeroing on 32-bit hardware
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70557

Jeffrey A. Law changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2020-04-18
     Ever confirmed|0                           |1
                 CC|                            |law at redhat dot com

--- Comment #8 from Jeffrey A. Law ---
I believe you can see the poor code generation when using the -m5200 option.

--- Comment #7 from Andreas Schwab ---
When compiling for m68k the compiler already generates the latter.

--- Comment #6 from Jakub Jelinek ---
In that case it is a backend enhancement request. Backends have many ways to deal with this: specialized patterns, the lower-subreg passes, their own splitters, and so on. Many of the actively maintained backends handle this correctly.

--- Comment #5 from Albert Cahalan ---
This example shows the simplest form of the problem:

unsigned long long ull;
void simple64(void){
        ull = 0;
}

NOTE: In the assembly below, I might have missing/excess parentheses. Assembler syntax varies.

gcc generates:

        clr.L %d0
        clr.L %d1
        move.L %d0,ull
        move.L %d1,ull+4

As you can see, two registers are set to the same value. It's better to set just one, and even better to clear memory directly with a clr.L instruction. Also, given that this code was optimized for size and there was an address register free, gcc should have put the address of ull into a register and then used that, preferably with autoincrement addressing.

I'd like to see something like this:

        movea.L #ull,%a0
        clr.L (%a0)+
        clr.L (%a0)

When optimizing for speed and registers are not available, maybe this:

        clr.L ull
        clr.L ull+4

(The code is larger with those 6-byte instructions though, and it might actually run slower, especially considering the small cache.)
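
For anyone who wants to compare the two store shapes at the source level, here is a minimal sketch. The names u64_halves, scratch64, clear_whole and clear_halves are made up for illustration and are not from the report. It only pins down that a single 64-bit zero store and two 32-bit zero stores through a post-incremented pointer leave the same value in memory. What a given GCC version emits for either function depends on the target and options.

```c
#include <stdint.h>

/* Hypothetical view of one 64-bit object as two 32-bit halves.
   Reading a union member other than the one last written is
   permitted in C. */
typedef union {
    uint64_t whole;
    uint32_t half[2];
} u64_halves;

/* Hypothetical global standing in for `ull` from the report. */
u64_halves scratch64;

/* One 64-bit store, the shape used in the report's simple64(). */
void clear_whole(void)
{
    scratch64.whole = 0;
}

/* Two 32-bit stores through one pointer that is advanced between
   them, the C analogue of clr.L (%a0)+ followed by clr.L (%a0). */
void clear_halves(void)
{
    uint32_t *p = scratch64.half;
    *p++ = 0;
    *p = 0;
}
```

Compiling both functions with -Os -S for the target in question and comparing the output is one way to see whether the backend treats them differently.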

--- Comment #4 from Albert Cahalan ---
Mostly it's more like PR58741 because of the long long issue.

PR22141 (and PR23684, which is a better match) is about merging small things. Two of the six examples here show that problem, those being the ones with a loop over char.

The problem that prompted this bug report and determined the bug title is different. In some ways it's the opposite. When I ask gcc to store a 64-bit zero value, gcc makes a 64-bit zero value in registers (two identical 32-bit halves in a pair of 32-bit registers) and then stores that to memory. There are many ways that this is wrong, and I worry that fixing one problem may hide the other problems. Depending on compiler internals that I don't understand, this could perhaps be four bugs:

1. When the two halves of a 64-bit value are identical, there is no need to load values into two different registers. This is true for many constant values, though obviously -1 and 0 would be most popular. Other popular values would be the constants for computing a Hamming weight. AFAIK, this optimization should apply whenever dealing with values that are larger than registers, such as 128-bit values on 64-bit platforms.

2. When the address is to be encoded in the instruction that writes to memory, it is best to directly clear the memory without first generating the constant in registers. AFAIK, this optimization should apply to most CISC machines. The fact that there is a special instruction for storing a 0 makes the optimization more important.

3. When the address is to be encoded in an instruction, sometimes it is best to place the address in a register and then use that register to supply the address for storing to memory. This tends to apply when doing lots of writes, when an address register happens to be available, and when optimizing for size. AFAIK this optimization applies to most machines.

4. When using an address register to supply the location for storing, often it is best to use autoincrement addressing instead of distinct offsets. This usually generates smaller code. AFAIK this applies to many machines, including at least: arm, m68k, and ppc.

(and also the store-merge issue, which makes 5)
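
The condition behind item 1 can be stated as a small predicate. This is only a sketch: halves_identical is a hypothetical name, and nothing here reflects how GCC actually represents or tests constants internally. It just shows which 64-bit constants could be stored from a single 32-bit register on a 32-bit target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when the upper and lower 32 bits of v
   are equal, so one 32-bit register could supply both stores. */
bool halves_identical(uint64_t v)
{
    return (uint32_t)(v >> 32) == (uint32_t)v;
}
```

The values 0 and -1 satisfy it, as do the usual Hamming-weight masks such as 0x5555555555555555 and 0x3333333333333333. A value like 1 does not.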

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek ---
PR22141? We really should tackle that as a late GIMPLE pass for GCC 7.

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
          Component|other                       |target

--- Comment #2 from Richard Biener ---
This is (partly) a known bug: GCC currently has no way to combine small stores into a larger one (apart from BB vectorization if the result fits a vector store). Too lazy to find the duplicate but you can search for it yourself. Possibly there's an m68k target piece left, so keeping this open as a target bug.
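
As an illustration of what "combining small stores into a larger one" means, here is a hedged sketch with hypothetical function names zero_bytes and zero_wide. It shows eight separate byte stores next to the single wide store they are equivalent to. It does not claim anything about what a particular GCC version generates for either function.

```c
#include <stdint.h>
#include <string.h>

/* Eight separate one-byte stores, as in the "loop over char"
   examples mentioned in this thread. */
void zero_bytes(unsigned char *p)
{
    for (int i = 0; i < 8; i++)
        p[i] = 0;
}

/* The merged form: one 64-bit zero written in a single operation.
   memcpy keeps this valid for any alignment of p. */
void zero_wide(unsigned char *p)
{
    uint64_t zero = 0;
    memcpy(p, &zero, sizeof zero);
}
```

Both functions leave the same eight bytes cleared. The difference is only in how many store operations the source asks for.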