[Bug target/70557] uint64_t zeroing on 32-bit hardware

2020-04-18 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70557

Jeffrey A. Law  changed:

            What|Removed     |Added
----------------+------------+----------------------
          Status|UNCONFIRMED |NEW
Last reconfirmed|            |2020-04-18
  Ever confirmed|0           |1
              CC|            |law at redhat dot com

--- Comment #8 from Jeffrey A. Law  ---
I believe you can see the poor code generation when using the -m5200 option.

[Bug target/70557] uint64_t zeroing on 32-bit hardware

2016-09-13 Thread sch...@linux-m68k.org

--- Comment #7 from Andreas Schwab  ---
When compiling for m68k the compiler already generates the latter.

[Bug target/70557] uint64_t zeroing on 32-bit hardware

2016-04-06 Thread jakub at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek  ---
In that case it is a backend enhancement request.  Backends have many ways to
deal with this: specialized patterns, the lower-subreg pass, their own
splitters, etc.  Many of the actively maintained backends already handle this
correctly.

[Bug target/70557] uint64_t zeroing on 32-bit hardware

2016-04-06 Thread acahalan at gmail dot com

--- Comment #5 from Albert Cahalan  ---
This example shows the simplest form of the problem:

unsigned long long ull;
void simple64(void){
    ull = 0;
}

NOTE: In the assembly below, I might have missing/excess parentheses. Assembler
syntax varies.

gcc generates:

clr.L %d0
clr.L %d1
move.L %d0,ull
move.L %d1,ull+4

As you can see, two registers are set to the same value. It's better to set
just one, and even better to directly address memory with a clr.L instruction.

Also, given that this code was optimized for size and there was an address
register free, gcc should have put the address of ull into a register and then
used that, preferably with autoincrement addressing.

I'd like to see something like this:

lea ull, %a0
clr.L (%a0)+
clr.L (%a0)

When optimizing for speed and registers are not available, maybe this:

clr.L ull
clr.L ull+4

(the code is larger with those 6-byte instructions though, and it might
actually run slower especially considering the small cache)
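
For anyone wanting to reproduce the listing above, a minimal test file might look like the following sketch. The second function is a hypothetical extra case that is not part of the report. The command line is also an assumption: something like "m68k-linux-gnu-gcc -Os -m5200 -S test.c", where the cross-compiler name depends on the installation and -Os is inferred from the remark that the code was optimized for size.

```c
#include <stdint.h>

/* 64-bit global, playing the same role as `ull` in the report. */
uint64_t ull;

/* The case from the report: store a 64-bit zero. */
void simple64(void)
{
    ull = 0;
}

/* Hypothetical extra case (not in the report): both 32-bit halves are
   0xFFFFFFFF, so a single register could feed both 32-bit stores. */
void ones64(void)
{
    ull = UINT64_MAX;
}
```

Comparing the assembly for the two functions should show whether the compiler materializes each half in its own register or reuses one.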

[Bug target/70557] uint64_t zeroing on 32-bit hardware

2016-04-06 Thread acahalan at gmail dot com

--- Comment #4 from Albert Cahalan  ---
Mostly it's more like PR58741 because of the long long issue.

PR22141 (and PR23684 which is a better match) is about merging small things.
Two of the six examples here show that problem, those being the ones with a
loop over char.

The problem that prompted this bug report and determined the bug title is
different. It's in some way the opposite. When I ask gcc to store a 64-bit zero
value, gcc builds a 64-bit zero value in registers (two identical 32-bit halves
in a pair of 32-bit registers) and then stores that to memory.

There are many ways that this is wrong, and I worry that fixing one problem may
hide the other problems. Depending on compiler internals that I don't
understand, this could perhaps be four bugs:

1. When the two halves of a 64-bit value are identical, there is no need to
load values into two different registers. This is true for many constant
values, though obviously -1 and 0 would be most popular. Other popular values
would be the constants for computing a Hamming weight. AFAIK, this optimization
should apply whenever dealing with values that are larger than registers, such
as 128-bit values on 64-bit platforms.

2. When the address is to be encoded in the instruction that writes to memory,
it is best to directly clear the memory without first generating the constant
in registers. AFAIK, this optimization should apply to most CISC machines. The
fact that there is a special instruction for storing a 0 makes the optimization
more important.

3. When the address is to be encoded in an instruction, sometimes it is best to
place the address in a register and then use that register to supply the
address for storing to memory. This tends to apply when doing lots of writes,
when an address register happens to be available, and when optimizing for size.
AFAIK this optimization applies to most machines.

4. When using an address register to supply the location for storing, often it
is best to use autoincrement addressing instead of distinct offsets. This
usually generates smaller code. AFAIK this applies to many machines, including
at least: arm, m68k, and ppc.

(and also the store-merge issue, which makes 5)
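
Items 1 and 4 can be sketched at the C level as follows. This is only an illustration of the idea, not how GCC implements anything; the function names halves_identical and store64_words are made up for this example, and the high-word-first order is chosen to match big-endian m68k.

```c
#include <stdint.h>

/* Item 1: nonzero when both 32-bit halves of v are the same, so a single
   32-bit register could supply both halves of the store. */
int halves_identical(uint64_t v)
{
    return (uint32_t)v == (uint32_t)(v >> 32);
}

/* Items 1 and 4: write v as two 32-bit words, high word first (the
   big-endian order m68k uses), through one advancing pointer. */
void store64_words(uint32_t dst[2], uint64_t v)
{
    uint32_t *p = dst;
    uint32_t hi = (uint32_t)(v >> 32);

    if (halves_identical(v)) {
        *p++ = hi;              /* one value reused for both stores */
        *p   = hi;
    } else {
        *p++ = hi;              /* distinct halves need two values */
        *p   = (uint32_t)v;
    }
}
```

The check holds for 0, for -1, and for the usual Hamming-weight masks such as 0x5555555555555555, which is why those constants are mentioned in item 1.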

[Bug target/70557] uint64_t zeroing on 32-bit hardware

2016-04-06 Thread jakub at gcc dot gnu.org

Jakub Jelinek  changed:

What|Removed |Added
----+--------+------------------------
  CC|        |jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek  ---
PR22141?  We really should tackle that as a late GIMPLE pass for GCC 7.

[Bug target/70557] uint64_t zeroing on 32-bit hardware

2016-04-06 Thread rguenth at gcc dot gnu.org

Richard Biener  changed:

     What|Removed |Added
---------+--------+-------------------
 Keywords|        |missed-optimization
Component|other   |target

--- Comment #2 from Richard Biener  ---
This is a known bug (partly) as GCC currently has no way to combine small
stores into a larger one (apart from BB vectorization if the result fits a
vector store).

Too lazy to find the duplicate but you can search for it yourself.

There may still be an m68k target piece left, so keeping this open as a target
bug.
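
The store-combining limitation mentioned in this comment can be illustrated with a small sketch. Both functions are hypothetical and are not the test cases from the original report: the first issues eight byte stores in a loop, and the second shows the single wide store they are equivalent to.

```c
#include <stdint.h>
#include <string.h>

/* Eight separate byte stores: the kind of pattern a store-merging
   pass would want to combine. */
void clear_bytes(unsigned char *p)
{
    for (int i = 0; i < 8; i++)
        p[i] = 0;
}

/* What the merged form amounts to: one 64-bit store.  memcpy is used
   so the sketch stays free of alignment and aliasing problems. */
void clear_merged(unsigned char *p)
{
    uint64_t zero = 0;
    memcpy(p, &zero, sizeof zero);
}
```

Both functions leave the same eight bytes zeroed; the question in the bug is only whether the compiler produces the second form from the first.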