[Bug target/66511] [avr] whole-byte shifts not optimized away for uint64_t

2023-04-16 Thread klaus.doldinger64 at googlemail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66511

Wilhelm M  changed:

   What|Removed |Added

 CC||klaus.doldinger64@googlemail.com

--- Comment #5 from Wilhelm M  ---
(In reply to Roger Sayle from comment #4)
> Created attachment 54871 [details]
> proposed patch
> 
> Proposed patch, using a peephole2 in avr-dimode.md to inline calls to
> __lshrdi3 that require only a single instruction or two (due to truncation).
> For truncations to char, this is smaller and faster, and for truncations to
> unsigned short this is the same size, but faster.  The drawback is that
> performing this late (in peephole2) doesn't eliminate the stack frame
> prolog/epilog.  Thoughts?

Looks good to me.
Many thanks!

2023-04-16 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66511

Roger Sayle  changed:

   What|Removed |Added

 CC||roger at nextmovesoftware dot com

--- Comment #4 from Roger Sayle  ---
Created attachment 54871
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54871&action=edit
proposed patch

Proposed patch, using a peephole2 in avr-dimode.md to inline calls to __lshrdi3
that require only a single instruction or two (due to truncation).  For
truncations to char, this is smaller and faster, and for truncations to
unsigned short this is the same size, but faster.  The drawback is that
performing this late (in peephole2) doesn't eliminate the stack frame
prolog/epilog.  Thoughts?
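For illustration, the functions below are hypothetical examples (the names are made up, not taken from the report) of the pattern the patch targets: a constant whole-byte right shift of a `uint64_t` whose result is immediately truncated to 8 or 16 bits. Each result is just one or two fixed bytes of the 64-bit value (on AVR, one or two of the eight registers holding it), which is why a call to `__lshrdi3` is unnecessary in principle.

```c
#include <stdint.h>

/* Hypothetical examples of the pattern: a whole-byte right shift of a
   uint64_t followed by truncation.  Each result is simply the byte (or
   byte pair) at a fixed position in the 64-bit value. */
uint8_t top_byte(uint64_t x)
{
    return (uint8_t)(x >> 56);      /* byte 7 (most significant) */
}

uint8_t byte_3(uint64_t x)
{
    return (uint8_t)(x >> 24);      /* byte 3 */
}

uint16_t top_half_word(uint64_t x)
{
    return (uint16_t)(x >> 48);     /* bytes 6..7 */
}
```

Compiling these with avr-gcc and inspecting the assembly (for example with `-S`) shows whether a call to `__lshrdi3` is still emitted for a given compiler version.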

2015-08-13 Thread gjl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66511

--- Comment #3 from Georg-Johann Lay gjl at gcc dot gnu.org ---
(In reply to Matthijs Kooijman from comment #2)
> So, IIUC, this is quite hard to fix? Either you use lib functions, which
> prevents the optimizer from just relabeling or copying registers to apply
> shifting, or you don't and then more complex operations will become very
> verbose and messy?

avr-gcc does use lib functions, but not as libcalls; it emits them as
transparent calls instead.  You'll find the implementation in avr-dimode.md.
avr-gcc has no proper DImode support.  For the reasoning, see the comments at
the head of that file.


2015-08-02 Thread matthijs at stdin dot nl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66511

--- Comment #2 from Matthijs Kooijman matthijs at stdin dot nl ---
So, IIUC, this is quite hard to fix? Either you use lib functions, which
prevents the optimizer from just relabeling or copying registers to apply
shifting, or you don't and then more complex operations will become very
verbose and messy?

Would it make sense (and be possible) to add a special case that avoids lib
functions for shifts by a constant number of bits that is also a multiple of 8?
At first glance, that would make many common cases (where an integer is
decomposed into separate bytes or other parts) a lot faster, while still
keeping the lib functions for more complex operations?
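A minimal sketch of the suggested special case in portable C, assuming the `uint64_t` is modeled as the eight 8-bit registers it occupies on AVR: a right shift by `8*k` bits then needs no bit shifting at all, only byte moves. `shr_whole_bytes` is a hypothetical helper, not anything in GCC; the split and reassembly loops exist only to model the registers in C, and the middle loop is the part that corresponds to "relabeling or copying registers".

```c
#include <stdint.h>

/* Hypothetical helper: right shift of x by 8*k bits (k in 0..8) done
   purely by moving bytes, as a compiler could do with registers. */
uint64_t shr_whole_bytes(uint64_t x, unsigned k)
{
    uint8_t in[8], out[8];          /* index 0 = least significant byte */
    uint64_t r = 0;
    unsigned i;

    /* Model the eight registers holding x. */
    for (i = 0; i < 8; i++)
        in[i] = (uint8_t)(x >> (8 * i));

    /* "Relabel": result byte i is input byte i+k; vacated upper bytes
       become zero.  No per-byte bit shifting happens here. */
    for (i = 0; i < 8; i++)
        out[i] = (i + k < 8) ? in[i + k] : 0;

    /* Reassemble the modeled registers into a uint64_t. */
    for (i = 0; i < 8; i++)
        r |= (uint64_t)out[i] << (8 * i);
    return r;
}
```

Since the shift count is a compile-time constant in the cases discussed, the byte mapping is fixed and, in principle, reduces to a handful of register moves plus clearing the upper registers.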


2015-06-27 Thread gjl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66511

Georg-Johann Lay gjl at gcc dot gnu.org changed:

   What|Removed |Added

   Keywords||missed-optimization
 Target||avr
   Priority|P3  |P5
 CC||gjl at gcc dot gnu.org
   Severity|normal  |enhancement


2015-06-27 Thread gjl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66511

--- Comment #1 from Georg-Johann Lay gjl at gcc dot gnu.org ---
(In reply to Matthijs Kooijman from comment #0)
> I haven't found a readily available 5.x package yet to test.

It's the same with 5.x.

> As you can see, the versions operating on 64 bit values preserve the
> 8-bit shift (which is very inefficient on AVR), while the versions
> running on 32 bit values simply copy the right registers.

Lib functions are used because users complained about bloated 64-bit
arithmetic.

Notice that these 64-bit shift functions themselves use byte shifts internally.
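As a rough sketch (not the actual libgcc source for `__lshrdi3`), a general 64-bit logical right shift on an 8-bit machine can be split into a whole-byte move for `n / 8` followed by `n % 8` one-bit shifts across the byte chain, each byte's low bit moving into the top of the byte below, similar in effect to an `LSR`/`ROR` sequence. `lshr64_bytewise` is a hypothetical name used only for illustration.

```c
#include <stdint.h>

/* Hypothetical sketch: 64-bit logical right shift by n (0..63) done as
   an 8-bit CPU would, i.e. a whole-byte move for n / 8, then n % 8
   single-bit shifts across the byte array. */
uint64_t lshr64_bytewise(uint64_t x, unsigned n)
{
    uint8_t b[8];                   /* b[0] = least significant byte */
    unsigned i;
    unsigned bytes = n / 8;
    unsigned bits = n % 8;
    uint64_t r = 0;

    for (i = 0; i < 8; i++)
        b[i] = (uint8_t)(x >> (8 * i));

    /* Whole-byte part: pure data movement with zero fill.  Ascending i
       reads b[i + bytes] before that element is overwritten. */
    for (i = 0; i < 8; i++)
        b[i] = (i + bytes < 8) ? b[i + bytes] : 0;

    /* Residual 0..7 bits: one pass per bit.  Ascending i reads the
       not-yet-updated b[i + 1] to get the bit carried down. */
    while (bits-- > 0) {
        for (i = 0; i < 8; i++) {
            uint8_t from_above = (i + 1 < 8) ? (uint8_t)(b[i + 1] & 1u) : 0;
            b[i] = (uint8_t)((b[i] >> 1) | (from_above << 7));
        }
    }

    for (i = 0; i < 8; i++)
        r |= (uint64_t)b[i] << (8 * i);
    return r;
}
```

When `n` is a multiple of 8, the bit loop does nothing and only byte moves remain, which is the case the thread wants handled without a call.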

> The foo32_16 function still has some useless instructions (r27 and r26
> are not part of the return value, not sure why these are set) but that
> is probably an unrelated problem.

Yes.

> I've marked this with component target, since I think these
> optimizations are avr-specific (or at least not applicable to bigger
> architectures).