[Bug target/66511] [avr] whole-byte shifts not optimized away for uint64_t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66511

Wilhelm M changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |klaus.doldinger64@googlemail.com

--- Comment #5 from Wilhelm M ---
(In reply to Roger Sayle from comment #4)
> Created attachment 54871 [details]
> proposed patch
>
> Proposed patch, using a peephole2 in avr-dimode.md to inline calls to
> __lshrdi3 that require only a single instruction or two (due to truncation).
> For truncations to char, this is smaller and faster, and for truncations to
> unsigned short this is the same size, but faster. The drawback is that
> performing this late (in peephole2) doesn't eliminate the stack frame
> prolog/epilog. Thoughts?

Looks good to me. Many thanks!
[Bug target/66511] [avr] whole-byte shifts not optimized away for uint64_t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66511

Roger Sayle changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |roger at nextmovesoftware dot com

--- Comment #4 from Roger Sayle ---
Created attachment 54871
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54871&action=edit
proposed patch

Proposed patch, using a peephole2 in avr-dimode.md to inline calls to
__lshrdi3 that require only a single instruction or two (due to truncation).
For truncations to char, this is smaller and faster, and for truncations to
unsigned short this is the same size, but faster. The drawback is that
performing this late (in peephole2) doesn't eliminate the stack frame
prolog/epilog. Thoughts?
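For readers without the attachment at hand, the source pattern that comment #4 appears to target can be sketched in C as a 64-bit right shift by a whole number of bytes whose result is immediately truncated. The function names and shift counts below are hypothetical and are not taken from the patch or its testcase; whether the peephole2 matches exactly these cases is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example: shift a 64-bit value right by 7 whole bytes and
   keep only the low byte.  Only one byte of the input can reach the result. */
uint8_t top_byte(uint64_t x)
{
    return (uint8_t)(x >> 56);
}

/* Hypothetical example: shift right by 6 whole bytes and keep 16 bits.
   Only two bytes of the input can reach the result. */
uint16_t top_word(uint64_t x)
{
    return (uint16_t)(x >> 48);
}

/* Small self-check showing which input bytes each function selects. */
void check_truncating_shifts(void)
{
    uint64_t x = 0x0123456789ABCDEFULL;
    assert(top_byte(x) == 0x01);
    assert(top_word(x) == 0x0123);
}
```

In both functions the result depends on only one or two bytes of the argument, which is presumably why a call to the generic shift routine can be replaced by a register copy or two.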
[Bug target/66511] [avr] whole-byte shifts not optimized away for uint64_t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66511

--- Comment #3 from Georg-Johann Lay gjl at gcc dot gnu.org ---
(In reply to Matthijs Kooijman from comment #2)
> So, IIUC, this is quite hard to fix? Either you use lib functions, which
> prevents the optimizer from just relabeling or copying registers to apply
> shifting, or you don't and then more complex operations will become very
> verbose and messy?

avr-gcc does use lib functions, though as transparent calls rather than
libcalls. You'll find the implementation in avr-dimode.md.

avr-gcc has no proper DImode support. For the reasoning, cf. the comments at
the head of that file.
[Bug target/66511] [avr] whole-byte shifts not optimized away for uint64_t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66511

--- Comment #2 from Matthijs Kooijman matthijs at stdin dot nl ---
So, IIUC, this is quite hard to fix? Either you use lib functions, which
prevents the optimizer from just relabeling or copying registers to apply
shifting, or you don't and then more complex operations will become very
verbose and messy?

Would it make sense (and be possible) to add a special case to not use lib
functions for shifts by a constant number of bits that is also a multiple of
8? At first glance, that would make a lot of common cases (where an integer is
decomposed into separate bytes or other parts) a lot faster, while still
keeping the lib functions for more complex operations.
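As an illustration of the "common case" mentioned in comment #2, here is a minimal C sketch that decomposes a 64-bit integer into bytes using only constant shifts that are multiples of 8. The function names are hypothetical and do not come from the bug report; the sketch only shows the source pattern, not what avr-gcc generates for it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: split a 64-bit value into 8 bytes, least
   significant first.  Every shift count is a constant multiple of 8, so
   each line just selects one byte of the input. */
void split_u64(uint64_t value, uint8_t out[8])
{
    out[0] = (uint8_t)value;
    out[1] = (uint8_t)(value >> 8);
    out[2] = (uint8_t)(value >> 16);
    out[3] = (uint8_t)(value >> 24);
    out[4] = (uint8_t)(value >> 32);
    out[5] = (uint8_t)(value >> 40);
    out[6] = (uint8_t)(value >> 48);
    out[7] = (uint8_t)(value >> 56);
}

/* Hypothetical inverse: reassemble the value from its bytes, again using
   only whole-byte shifts. */
uint64_t join_u64(const uint8_t in[8])
{
    return (uint64_t)in[0]
         | (uint64_t)in[1] << 8
         | (uint64_t)in[2] << 16
         | (uint64_t)in[3] << 24
         | (uint64_t)in[4] << 32
         | (uint64_t)in[5] << 40
         | (uint64_t)in[6] << 48
         | (uint64_t)in[7] << 56;
}
```

Since each statement picks out exactly one byte, code like this could in principle be compiled to plain register or memory moves, which is the special case the comment asks about.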
[Bug target/66511] [avr] whole-byte shifts not optimized away for uint64_t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66511

Georg-Johann Lay gjl at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|                            |avr
           Priority|P3                          |P5
                 CC|                            |gjl at gcc dot gnu.org
           Severity|normal                      |enhancement
[Bug target/66511] [avr] whole-byte shifts not optimized away for uint64_t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66511

--- Comment #1 from Georg-Johann Lay gjl at gcc dot gnu.org ---
(In reply to Matthijs Kooijman from comment #0)
> I haven't found a readily available 5.x package yet to test.

It's the same.

> As you can see, the versions operating on 64 bit values preserve the 8-bit
> shift (which is very inefficient on AVR), while the versions running on 32
> bit values simply copy the right registers.

Lib functions are used because users complained about bloated 64-bit
arithmetic. Notice that inside these 64-bit shift functions byte-shifts are
used.

> The foo32_16 function still has some useless instructions (r27 and r26 are
> not part of the return value, not sure why these are set) but that is
> probably an unrelated problem.

Yes.

> I've marked this with component target, since I think these optimizations
> are avr-specific (or at least not applicable to bigger architectures).