[Bug target/29560] [avr] Poor optimization for byte shifts
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29560

--- Comment #10 from Georg-Johann Lay <gjl at gcc dot gnu.org> 2011-08-10 08:58:07 UTC ---
Author: gjl
Date: Wed Aug 10 08:58:03 2011
New Revision: 177616

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=177616
Log:
	PR target/29560
	* config/avr/avr.md (*ashlhiqi3): New insn-and-split.
	(*ashl<extend_prefix>qihiqi3): New insn-and-splits.
	(*ashl<extend_prefix>qihiqi3.mem): New insn-and-splits.
	Add peephole2 to map ashlhi3 to ashlqi3 if high part of
	shift target is unused.

Modified:
	trunk/gcc/ChangeLog
	trunk/gcc/config/avr/avr.md
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29560

Georg-Johann Lay <gjl at gcc dot gnu.org> changed:

           What    |Removed     |Added
----------------------------------------------------------------------------
             Status|NEW         |RESOLVED
         Resolution|            |FIXED
      Known to fail|4.7.0       |

--- Comment #11 from Georg-Johann Lay <gjl at gcc dot gnu.org> 2011-08-10 09:19:49 UTC ---
Closed as FIXED for 4.7.0 trunk with patch from comment #10.

The test case from attachment 24816 compiles with -Os -dp to:

shift1:
	rjmp 2f		;  22	*ashlqi3/1	[length = 5]
1:	lsl r24
2:	dec r22
	brpl 1b
	ret		;  27	return	[length = 1]

shift2:
	mov r25,r24	;  19	*movqi/1	[length = 1]
	rjmp 2f		;  15	*ashlqi3/1	[length = 5]
1:	lsl r25
2:	dec r22
	brpl 1b
	sts y,r25	;  16	*movqi/3	[length = 2]
	ret		;  22	return	[length = 1]

setpin:
	in r25,44-0x20	;  11	*movqi/4	[length = 1]
	ldi r18,lo8(1)	;  13	*movhi/4	[length = 2]
	ldi r19,hi8(1)
	mov r0,r24	;  49	*ashlqi3/1	[length = 5]
	rjmp 2f
1:	lsl r18
2:	dec r0
	brpl 1b
	cpi r22,lo8(1)	;  7	*cmpqi/3	[length = 1]
	brne .L4	;  8	branch	[length = 1]
	or r25,r18	;  15	iorqi3/1	[length = 1]
	out 44-0x20,r25	;  17	*movqi/3	[length = 1]
	ret		;  46	return	[length = 1]
.L4:
	com r18		;  27	one_cmplqi2	[length = 1]
	and r18,r25	;  28	andqi3/1	[length = 1]
	out 44-0x20,r18	;  30	*movqi/3	[length = 1]
	ret		;  48	return	[length = 1]

Notice the *ashlqi3 insns.

The fix uses combine patterns and an RTL peephole to transform a 16-bit
shift into an 8-bit shift where applicable. There will still be situations
where optimization opportunities go unnoticed, e.g. if a move occurs after
the shift, so that the move gets a REG_UNUSED note but the shift loop
itself does not. The fix just replaces the 16-bit loop with an 8-bit loop;
there is no back-propagation of the information.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29560

--- Comment #9 from Georg-Johann Lay <gjl at gcc dot gnu.org> 2011-07-25 12:48:29 UTC ---
Created attachment 24827
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24827
Fix PR29560 by adding peephole2 pattern.

	PR target/29560
	* config/avr/avr.md: Add peephole2 to map ashlhi3 to
	ashlqi3 if high part of shift target is unused.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29560

--- Comment #7 from Georg-Johann Lay <gjl at gcc dot gnu.org> 2011-07-23 14:50:58 UTC ---
Created attachment 24816
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24816
Test case for 16-bit shifts that use low part only

Reopened this optimization issue.

For the attached test case, there are sometimes REG_UNUSED notes for the
high part, like with avr-gcc-4.6.1 -Os -dP:

shift1:
;  (insn 8 4 17 (set (reg:HI 24 r24 [50])
;        (ashift:HI (reg:HI 24 r24 [ x ])
;            (reg/v:QI 22 r22 [orig:47 s ] [47]))) foo.c:3 68 {ashlhi3}
;     (expr_list:REG_DEAD (reg/v:QI 22 r22 [orig:47 s ] [47])
;        (expr_list:REG_UNUSED (reg:QI 25 r25)
;            (nil))))
	rjmp 2f		;  8	ashlhi3/1	[length = 6]
1:	lsl r24
	rol r25
2:	dec r22
	brpl 1b

So there is a way to map ashlhi3 to ashlqi3, i.e. a 16-bit shift to an
8-bit shift. Unfortunately, these notes are not always present, like in
shift2:

shift2:
;  (insn 15 4 8 (set (reg:HI 18 r18)
;        (reg:HI 24 r24 [ x ])) foo.c:10 10 {*movhi}
;     (nil))
	movw r18,r24	;  15	*movhi/1	[length = 1]
;  (insn 8 15 9 (set (reg:HI 18 r18)
;        (ashift:HI (reg:HI 18 r18)
;            (reg/v:QI 22 r22 [orig:46 s ] [46]))) foo.c:10 68 {ashlhi3}
;     (expr_list:REG_DEAD (reg/v:QI 22 r22 [orig:46 s ] [46])
;        (nil)))
	rjmp 2f		;  8	ashlhi3/1	[length = 6]
1:	lsl r18
	rol r19
2:	dec r22
	brpl 1b

The notes are not present in pass .split2, so a split won't help. The
notes appear to be available in .peephole2, so that could be a fix.

Moreover, the notes won't be back-propagated, so an optimization will
cover at most one insn: the shift insn itself.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29560

Georg-Johann Lay <gjl at gcc dot gnu.org> changed:

           What    |Removed     |Added
----------------------------------------------------------------------------
             Status|RESOLVED    |NEW
         Resolution|INVALID     |
   Target Milestone|---         |4.7.0

--- Comment #8 from Georg-Johann Lay <gjl at gcc dot gnu.org> 2011-07-23 14:53:52 UTC ---
Set to NEW as explained in previous post.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29560

Georg-Johann Lay <gjl at gcc dot gnu.org> changed:

           What    |Removed     |Added
----------------------------------------------------------------------------
             Status|NEW         |RESOLVED
         Resolution|            |INVALID

--- Comment #6 from Georg-Johann Lay <gjl at gcc dot gnu.org> 2011-06-21 18:24:44 UTC ---
Closing this PR as invalid.

As Andy already explained, for shift offsets > 7 and <= 15 there has to be
a defined result because int for avr-gcc is 16 bits, and such a shift is
actually *not* a byte shift. With alternative code that tested for these
offsets at runtime, jumped around and loaded 0 if appropriate, you would
not be better off than with the current code -- on the contrary.

Note that gcc deflates your function calls to just one instruction in
main. So maybe what you really want here is to make these functions static
inline so that you no longer see the code in the functions. Presumably you
always call these functions with compile-time constants.

If you really need to squeeze out the last tick and call with
non-constants, you could use inline assembler with an instruction sequence
similar to (1 << (R24 & 7)), perhaps together with __builtin_constant_p
magic:

	LDI  R24, 1
	SBRC R24, 1
	LDI  R24, 4
	SBRC R24, 0
	LSL  R24
	SBRC R24, 2
	SWAP R24

Note that with -mint8, int is 8 bits. That option is still present, but it
is no longer supported and perhaps non-functional.

--

Just for reference, here is a source without external #include:

#define PORTC (*((unsigned char volatile*) 53))

void setpin (unsigned char pin, unsigned char val)
{
    if (val == 1)
        PORTC |= (1 << pin);
    else
        PORTC &= ~(1 << pin);
}

void setpin1 (unsigned char pin, unsigned char val)
{
    const unsigned char one = 1;

    if (val == 1)
        PORTC |= (one << pin);
    else
        PORTC &= ~(one << pin);
}

int main (void)
{
    setpin (3, 1);
    setpin1 (3, 1);

    return 0;
}
--- Comment #5 from Rudolf dot Leitgeb at gmx dot at  2010-02-18 13:28 ---
Right away: I am NOT an expert in compiler or optimizer design, so please
bear with me.

I understand that (((unsigned char)1) << ((unsigned char)something)) may
need more than 8 bits for its representation and that gcc's default int
size on the ATmega platform is 16 bits. But the result is assigned to an
8-bit value. I take it that there is no way to back-propagate this
potential optimization from the assignment to the shifting step.

If you look at my assembly code, r25 is computed with great effort yet
never transferred anywhere.

The trick with & 7 is nice but introduces another unnecessary AND
operation. And it is only correct if the input number is guaranteed to be
between 0 and 7.

Am I correct that this whole optimization issue comes from the fact that
ATmega is an 8-bit architecture, yet gcc's int on that platform is 16
bits?


-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29560
--- Comment #4 from hutchinsonandy at gcc dot gnu dot org  2009-12-31 03:16 ---
The same occurs in 4.5.

I think the bug is invalid.

The expression 1 << pin will be promoted. This produces a defined result
if pin > 7 and <= 15. The expression cannot then be lowered to an 8-bit
shift -- since an 8-bit shift by > 7 is undefined.

If you use pin & 7, the shift is indeed an 8-bit shift. Though it still
loads an int (HImode) value of 1 that should have been reduced to QImode.

setpinx:
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
	mov r25,r24
	andi r25,lo8(7)
	ldi r18,lo8(1)
	ldi r19,hi8(1)
	mov r24,r18
	rjmp 2f
1:	lsl r24
2:	dec r25
	brpl 1b
	cpi r22,lo8(1)
	brne .L7
	in r25,53-0x20
	or r25,r24
	out 53-0x20,r25
	ret
.L7:
	in r25,53-0x20
	com r24
	and r24,r25
	out 53-0x20,r24
	ret


-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29560
--- Comment #3 from eric dot weddington at atmel dot com  2009-08-24 17:18 ---
Confirmed on 4.3.2.


-- 

eric dot weddington at atmel dot com changed:

           What    |Removed                    |Added
----------------------------------------------------------------------------
           Severity|normal                     |enhancement
             Status|UNCONFIRMED                |NEW
     Ever Confirmed|0                          |1
  GCC build triplet|powerpc-apple-darwin7.9.0  |
   GCC host triplet|powerpc-apple-darwin7.9.0  |
           Keywords|                           |missed-optimization
      Known to fail|                           |4.1.1 4.3.2
   Last reconfirmed|-00-00 00:00:00            |2009-08-24 17:18:30 date
            Summary|Poor optimization for      |[avr] Poor optimization for
                   |character shifts on Atmel  |byte shifts
                   |AVR                        |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29560