[Bug target/11180] [avr-gcc] Optimization decrease performance of struct assignment.
--- Additional Comments From andrewhutchinson at cox dot net 2005-03-27 14:33 ---
The problem here is that gcc is using a DImode register to handle a 6 byte (int+long) structure. Why, I have no idea!

Since the target has no insn for a DImode move, gcc turns this into individual QImode byte moves (subregs all over the place!).

The 'stacked' 6 byte structure is 'popped' into a DImode register (6 bytes). Two other byte registers are explicitly cleared (making up our 8 byte DImode register).

What then follows is a large amount of shuffling, i.e. moving from the intermediate virtual DImode register (8 bytes) into the correct place for a 6 byte return. This seems to surpass the abilities of the register allocator (the DImode and return registers overlap).

Smaller structures (<=4 bytes) are optimally handled. Larger structures (>8 bytes) are also much better since they are returned in memory.

So in summary, it would appear that the root cause is the allocation of a DImode register for structures >4 and <=8 bytes. A secondary factor is the use of QImode moves (when SImode and HImode moves are available and more efficient).

The problem can be partially alleviated by defining DImode moves (that is a hell of a change though). Poor code still remains, for example clearing unused padding bytes and extra register usage.

PS -fpack-struct does not change this bug.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11180
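The size classes described in the comment above can be sketched in C. This is only an illustration: the PR's own testcase is not quoted here, so `struct pair6`, `make_pair6` and `classify_return` are invented names, and `int16_t`/`int32_t` stand in for AVR's 2 byte int and 4 byte long so that the arithmetic holds on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the (int + long) structure: on AVR an int is
   2 bytes and a long is 4 bytes, modelled here with fixed-width types. */
struct pair6 {
    int16_t a;   /* AVR int  */
    int32_t b;   /* AVR long */
};

/* Payload bytes, ignoring any padding the host ABI may add. */
enum { PAIR6_PAYLOAD = sizeof(int16_t) + sizeof(int32_t) };

/* Returning the structure by value is the operation under discussion. */
struct pair6 make_pair6(int16_t a, int32_t b)
{
    struct pair6 r;
    r.a = a;
    r.b = b;
    return r;
}

/* Size classes as reported in the comment (descriptive labels only,
   not GCC identifiers):
   0: <=4 bytes, handled well
   1: >4 and <=8 bytes, goes through a DImode register (the slow case)
   2: >8 bytes, returned in memory */
int classify_return(size_t bytes)
{
    assert(bytes > 0);
    if (bytes <= 4)
        return 0;
    if (bytes <= 8)
        return 1;
    return 2;
}
```

With these definitions, `classify_return(PAIR6_PAYLOAD)` lands in class 1, the range the comment identifies as the problem.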
[Bug bootstrap/20452] New: HEAD ICE during make install
gcc build fails with:

gcc -c -g -O2 -DIN_GCC -DCROSS_COMPILE -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -fno-common -DHAVE_CONFIG_H -I. -I. -I../../gcc/gcc -I../../gcc/gcc/. -I../../gcc/gcc/../include -I../../gcc/gcc/../libcpp/include ../../gcc/gcc/c-lex.c -o c-lex.o
../../gcc/gcc/c-lex.c: In function `c_lex_with_flags':
../../gcc/gcc/c-lex.c:428: error: too many arguments to function `cpp_spell_token'
make[1]: *** [c-lex.o] Error 1
make[1]: Leaving directory `/home/cvsroot/awhconf/gcc'
make: *** [all-gcc] Error 2

Apparently due to:

*cpp_spell_token (parse_in, tok, name, true) = 0;

in c-lex.c, whereas cpp_spell_token now appears to take only 3 arguments elsewhere.

Configured using:

$ ../gcc/configure --prefix=/avrdev --enable-languages=c,c++ --target=avr --disable-nls

--
Summary: HEAD ICE during make install
Product: gcc
Version: 4.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: bootstrap
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: andrewhutchinson at cox dot net
CC: gcc-bugs at gcc dot gnu dot org
GCC host triplet: i686-pc-cygwin
GCC target triplet: avr

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20452
[Bug target/18251] unable to find a register to spill in class `POINTER_REGS'
--- Additional Comments From andrewhutchinson at cox dot net 2005-03-12 21:20 ---
(In reply to comment #18)
> (In reply to comment #17)

I think it is always true, but the original used the same predicate and test (so I played safe).

The pattern only helps if the count is a constant. I also thought it should handle variable block size. However, I found gcc already produces optimal code for that case without any help.

> > Marek, can you review this bug, the attached patches, and possibly approve committing the fix?
>
> I'm looking into it right now. I'm not sure about one thing: should movmemhi handle only constant block sizes, or variable block sizes too? If variable - is it safe to assume nonzero? (now 0 means 65536)
>
> Operand 2 (block size) has the const_int_operand predicate - doesn't this mean that (GET_CODE(operands[2]) == CONST_INT) is always true?
>
> Thanks,
> Marek

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18251
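The remark "now 0 means 65536" follows from testing a 16 bit counter after the decrement. The C sketch below models that loop shape on a host; it is not the backend code, and `copy_loop16` and `copy_guarded` are names made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Byte copy with a 16 bit down-counter that is tested after the
   decrement. Returns the number of bytes actually copied. */
uint32_t copy_loop16(uint8_t *dst, const uint8_t *src, uint16_t count)
{
    uint32_t copied = 0;
    assert(dst != 0 && src != 0);
    do {
        *dst++ = *src++;
        copied++;
        count--;              /* a count of 0 wraps to 0xFFFF here */
    } while (count != 0);
    return copied;
}

/* Guarded variant: a zero count copies nothing. */
uint32_t copy_guarded(uint8_t *dst, const uint8_t *src, uint16_t count)
{
    if (count == 0)
        return 0;
    return copy_loop16(dst, src, count);
}
```

Calling `copy_loop16` with a count of 0 walks the full 65536 bytes, so both buffers must be that large; the guarded variant shows the extra test a variable count would need.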
[Bug target/18251] unable to find a register to spill in class `POINTER_REGS'
--- Additional Comments From andrewhutchinson at cox dot net 2005-03-13 01:19 ---
Subject: Re: unable to find a register to spill in class `POINTER_REGS'

The concerns have merit but can be discounted.

The reload problem occurs because the original pattern demands two pointers in parallel existence (actually two pointers and a counter!). The current register allocation is imperfect and, with this constraint (and only 3 pointers including the frame pointer), fails to find a solution.

The new RTL expansion does not demand both pointer registers at the same time; indeed they could be the same register in extreme circumstances. Breaking up the RTL reveals this to GCC and allows the register allocator to find a solution. So that is why it works.

The SAME result can also be realised by deleting the offending pattern. In this situation GCC generates its own solution, which happens to be identical RTL to the proposed solution (with a 16 bit counter). And indeed there is no reload failure.

Since the proposed pattern FAILs in the variable case, we will still end up with GCC's solution, and we can conclude there will be no hidden reload issue. (It should also be noted that a variable count is less restrictive on hard register use than a constant.)

Now here is the neat bit! Since the GCC middle end generates the detailed RTL loop for a variable count, we can and should rely on it to consider any restriction on the variable (i.e. variable count=0). If it does not, it's very, very broken.

I was very tempted to submit a patch that just deletes the pattern; however, that would produce worse code for the very common case where fixed count<255.

I hope this clarifies things.

marekm at amelek dot gda dot pl wrote:

> --- Additional Comments From marekm at amelek dot gda dot pl 2005-03-13 00:30 ---
> Subject: Re: unable to find a register to spill in class `POINTER_REGS'
>
> On Sat, Mar 12, 2005 at 09:20:18PM -, andrewhutchinson at cox dot net wrote:
>
> > The pattern only helps if it is a constant.
> > I also thought it should handle variable block size. However, I found gcc already produces optimal code for that case without any help.
>
> See below for revised patch (currently for mainline):
> - FAIL if count is not a CONST_INT
> - handle count == 0 (nothing to do)
> - handle count > 32767 (negative in RTL, mask with 0x)
> - minor formatting fixes
>
> But, I'm still concerned a little about the variable block size:
> - __tmp_reg__ will not be used (some other register will)
> - more importantly, can the problem from this PR (unable to find a register to spill in class POINTER_REGS) still occur in the variable size case? (only with a different, not yet known test case - this means we are perhaps trying to hide the real bug instead of fixing it...)
>
> If we have to handle the variable count case too, one more insn will be needed (initially jump to decrementing the counter; test for carry instead of zero). Some other targets handle this by calling a subroutine in libgcc.S - smaller (but slower) than generating the loop inline.
>
> Marek
>
> 2005-03-12 Andy Hutchinson [EMAIL PROTECTED]
>
>     PR target/18251
>     * config/avr/avr.md (movmemhi): Rewrite as RTL loop.
>     (*movmemqi_insn): Delete.
>     (*movmemhi): Delete.
>
> Index: avr.md
> ===
> RCS file: /cvs/gcc/gcc/gcc/config/avr/avr.md,v
> retrieving revision 1.50
> diff -c -3 -p -r1.50 avr.md
> *** avr.md 6 Mar 2005 21:50:36 - 1.50
> --- avr.md 12 Mar 2005 23:51:57 -
> ***
> *** 346,421
>   ;;=
>   ;; move string (like memcpy)
>
>   (define_expand "movmemhi"
>     [(parallel [(set (match_operand:BLK 0 "memory_operand" "")
> !                  (match_operand:BLK 1 "memory_operand" ""))
> !             (use (match_operand:HI 2 "const_int_operand" ""))
> !             (use (match_operand:HI 3 "const_int_operand" ""))
> !             (clobber (match_scratch:HI 4 ""))
> !             (clobber (match_scratch:HI 5 ""))
> !             (clobber (match_dup 6))])]
>     ""
>   {
> !   rtx addr0, addr1;
> !   int cnt8;
>     enum machine_mode mode;
>
>     if (GET_CODE (operands[2]) != CONST_INT)
>       FAIL;
> -   cnt8 = byte_immediate_operand (operands[2], GET_MODE (operands[2]));
> -   mode = cnt8 ? QImode : HImode;
> -   operands[6] = gen_rtx_SCRATCH (mode);
> -   operands[2] = copy_to_mode_reg (mode,
> -                                   gen_int_mode (INTVAL (operands[2]), mode));
> -   addr0 = copy_to_mode_reg (Pmode, XEXP (operands[0], 0));
> -   addr1 = copy_to_mode_reg (Pmode, XEXP (operands[1], 0));
>
> !   operands[0] = gen_rtx_MEM (BLKmode, addr0);
> !   operands[1] = gen_rtx_MEM (BLKmode, addr1);
>   })
>
> - (define_insn "*movmemqi_insn"
> -   [(set (mem:BLK (match_operand:HI 0 "register_operand" "e"))
> -         (mem:BLK (match_operand:HI 1 "register_operand" "e")))
> -    (use (match_operand:QI 2 "register_operand" "r"))
> -    (use (match_operand:QI 3 "const_int_operand" "i
[Bug target/18251] unable to find a register to spill in class `POINTER_REGS'
--- Additional Comments From andrewhutchinson at cox dot net 2005-03-13 02:44 ---
Subject: Re: unable to find a register to spill in class `POINTER_REGS'

This is a define_expand. Predicates (such as const_int_operand) and the pattern have no effect at all on generated code or matching. This pattern always ends in DONE or FAIL. That is why you need to test the operand in the body.

ANDing with 0x is wrong, as is trying to handle the variable count case. That is fixing something that is not broken.

So it looks like my patch is ok?

schlie at comcast dot net wrote:

> --- Additional Comments From schlie at comcast dot net 2005-03-13 02:06 ---
> (In reply to comment #20)
>
> With reference to the most recent patch:
>
> - anding with 0x may turn negatives positive, so it seems wrong.
> - there's no need to limit byte counts to below 0x100 for bytes, as 0xFF is a count, as long as it was originally verified that the integer value is positive.
>
> And just as a heads up: when I was fooling with a different variant, I discovered that (use (match_operand:HI 2 "const_int_operand" "")) apparently also matches variable operands when compiling avrlibc, apparently failing as no code is generated:
>
> ../../../../libc/stdlib/realloc.c:154: error: unrecognizable insn:
> (insn 235 232 236 31 ../../../../libc/stdlib/realloc.c:151 (parallel [
>             (set (mem:BLK (reg/v/f:HI 49 [ memp ]) [0 A8])
>                 (mem:BLK (reg/v/f:HI 60 [ ptr ]) [0 A8]))
>             (use (reg:HI 81 [ variable.sz ]))
>             (use (const_int 1 [0x1]))
>         ]) -1 (insn_list:REG_DEP_TRUE 232 (nil))
>     (expr_list:REG_DEAD (reg:HI 81 [ variable.sz ])
>         (expr_list:REG_DEAD (reg/v/f:HI 49 [ memp ])
>             (nil
>
> Following is yet another version of Andy's patch (and for the hell of it, enclosed at the end, a version which attempts to handle variable counts, but I couldn't figure out how to get the conditional insertion of a forward branch label generated correctly):
>
> - won't emit code unless (count > 0).
> - removes code for non-constant count moves, as it would have generated incorrect code for move count = 0.
> - allocates a temporary, rather than presuming r0 is safe to use (and seems to generate just as good code, as a step to freeing r0).
>
> -- def --
>
> ;; move string (like memcpy)
> ;; implement as RTL loop
>
> (define_expand "movmemhi"
>   [(parallel [(set (match_operand:BLK 0 "memory_operand" "")
>                    (match_operand:BLK 1 "memory_operand" ""))
>               (use (match_operand:HI 2 "const_int_operand" ""))
>               (use (match_operand:HI 3 "const_int_operand" ""))])]
>   ""
> {
>   int cnt8, prob;
>   enum machine_mode mode;
>   rtx loop_reg;
>   rtx label = gen_label_rtx ();
>
>   /* Copy pointers into new pseudos - they will be changed. */
>   rtx addr0 = copy_to_mode_reg (Pmode, XEXP (operands[0], 0));
>   rtx addr1 = copy_to_mode_reg (Pmode, XEXP (operands[1], 0));
>
>   /* If loop count is constant, try to use QImode counter. */
>   if ((GET_CODE (operands[2]) == CONST_INT) && (INTVAL (operands[2]) > 0))
>     {
>       /* See if constant fits in 8 bits. */
>       cnt8 = byte_immediate_operand (operands[2], GET_MODE (operands[2]));
>       mode = cnt8 ? QImode : HImode;
>
>       /* Create loop counter register. */
>       loop_reg = copy_to_mode_reg (mode, gen_int_mode (INTVAL (operands[2]), mode));
>
>       /* Create RTL code for move loop, with label at top of loop. */
>       emit_label (label);
>
>       /* Move one byte into scratch and inc pointer. */
>       rtx tmp_reg = copy_to_mode_reg (QImode, gen_rtx_MEM (QImode, addr1));
>       emit_move_insn (addr1, gen_rtx_PLUS (Pmode, addr1, const1_rtx));
>
>       /* Move scratch into mem, and inc other pointer. */
>       emit_move_insn (gen_rtx_MEM (QImode, addr0), tmp_reg);
>       emit_move_insn (addr0, gen_rtx_PLUS (Pmode, addr0, const1_rtx));
>
>       /* Decrement count. */
>       emit_move_insn (loop_reg, gen_rtx_PLUS (mode, loop_reg, constm1_rtx));
>
>       /* Compare with zero and jump if not equal. */
>       emit_cmp_and_jump_insns (loop_reg, const0_rtx, NE, NULL_RTX, mode, 1, label);
>
>       /* Set jump probability based on loop count. */
>       rtx jump = get_last_insn ();
>       prob = REG_BR_PROB_BASE - (REG_BR_PROB_BASE / INTVAL (operands[2]));
>       REG_NOTES (jump) = gen_rtx_EXPR_LIST (REG_BR_PROB, GEN_INT (prob), REG_NOTES (jump));
>
>       DONE;
>     }
> })
>
> This time attempting to handle variable counts:
>
> ;; move string (like memcpy)
> ;; implement as RTL loop
>
> (define_expand "movmemhi"
>   [(parallel [(set (match_operand:BLK 0 "memory_operand" "")
>                    (match_operand:BLK 1 "memory_operand" ""))
>               (use (match_operand:HI 2 "const_int_operand" ""))
>               (use (match_operand:HI 3 "const_int_operand" ""))])]
>   ""
> {
>   enum machine_mode mode = HImode;
>   int prob = (REG_BR_PROB_BASE * 95) / 100;
>   rtx test_label = 0; /* Initial no-test value. */
>
>   /* Specify default variable loop count initial value. */
>   rtx loop_cnt
[Bug target/18251] unable to find a register to spill in class `POINTER_REGS'
--- Additional Comments From andrewhutchinson at cox dot net 2005-03-13 04:05 ---
Subject: Re: unable to find a register to spill in class `POINTER_REGS'

You answered your own question.

GCC handles variable moves just like anything else: it deals with the range of possible values, the size etc. and constructs appropriate RTL. GCC does not need this backend define or expand. It is quite happy working out moves by itself. The pattern is *only* defined when the target can do a better job - i.e. when we have a constant byte count - but not otherwise.

I have found it's a really bad idea to second guess compiler optimisations.

> - how about in the case of a variable count = 0 ? (or since only constants are handled, it falls back to letting gcc figure it out?)

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18251
[Bug target/19684] avr-gcc 4.0 (and 3.3.4): wrong size in asm comment
--- Additional Comments From andrewhutchinson at cox dot net 2005-03-03 01:57 ---
This is almost certainly caused by code peepholes doing last minute optimisation of the code just before the assembler is generated.

Prior to that, all RTL instructions have a length (in 16 bit words) that is *solely* used to select the appropriate jump and branch instructions. The size comments are generated from those lengths.

If a peephole does indeed change some instructions, the lengths will quite likely overestimate the final size (as is the case presented here).

The length (and so the size) is fine if it overestimates the actual size, since jumps/branches can always work over shorter distances. The actual jump displacements are based on labels, so they are unaffected.

There may be some other areas of the backend that apply worst-case estimates of asm instruction size to avoid the complexity of calculating every situation. However, peepholes definitely do this!

I would suggest this is a non-bug, as the size is an internal compiler debug comment and there is no regression, misoptimisation or similar downside.

If the size ever under-estimates the true size THAT IS A BUG!

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19684
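The asymmetry argued above (an overestimate is harmless, an underestimate is a bug) can be shown with a small C model. It is a sketch under stated assumptions, not avr.c code: the function names are invented, only forward branches are modelled, and the reach of 63 words is taken from the 7 bit signed offset of AVR's conditional branches.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative forward reach of a short conditional branch, in 16 bit
   words. An assumption of this sketch, not the backend's definition. */
enum { SHORT_BRANCH_MAX_WORDS = 63 };

/* Sum the per-insn length estimates between a branch and its target. */
int span_words(const int *lengths, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        assert(lengths[i] >= 0);
        total += lengths[i];
    }
    return total;
}

/* Pick the short form only when the estimated span fits. */
bool use_short_branch(int estimated_span)
{
    return estimated_span <= SHORT_BRANCH_MAX_WORDS;
}

/* The choice is safe if the long form was chosen, or if the real span
   is within reach of the short form. */
bool choice_is_safe(int estimated_span, int actual_span)
{
    return !use_short_branch(estimated_span)
           || actual_span <= SHORT_BRANCH_MAX_WORDS;
}
```

Whenever the estimate is at least the actual span, `choice_is_safe` holds; it can only fail when the estimate is too small.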
[Bug c/20222] New: [AVR] Double load of volatile operand
007CCBD0 i1>) [2 i1+0 S2 A8])) -1 (nil) (nil))

(insn 14 13 15 (set (reg:HI 43) (mem/v/f:HI (symbol_ref:HI ("i1") [flags 0x40] <var_decl 007CCBD0 i1>) [2 i1+0 S2 A8])) -1 (nil) (nil))

(insn 15 14 16 (set (cc0) (reg:HI 43)) -1 (nil) (nil))

(jump_insn 16 15 17 (set (pc) (if_then_else (ge (cc0) (const_int 0 [0x0])) (label_ref 18) (pc))) -1 (nil) (nil))

(insn 17 16 18 (set (reg:HI 43) (neg:HI (reg:HI 43))) -1 (nil) (nil))

(code_label 18 17 19 3 "" [0 uses])

(insn 19 18 20 (parallel [ (set (cc0) (compare (reg:HI 43) (const_int 1 [0x1]))) (clobber (scratch:QI)) ]) -1 (nil) (nil))

(jump_insn 20 19 21 (set (pc) (if_then_else (eq (cc0) (const_int 0 [0x0])) (label_ref 27) (pc))) -1 (nil) (nil))

(note 21 20 22 00729208 NOTE_INSN_BLOCK_BEG)

(note 22 21 23 NOTE_INSN_DELETED)

(note 23 22 24 ("testabs-2.c") 12)

(call_insn 24 23 25 (call (mem:HI (symbol_ref:HI ("abort") [flags 0x41] <function_decl 00779690 abort>) [0 S2 A8]) (const_int 0 [0x0])) -1 (nil) (expr_list:REG_NORETURN (const_int 0 [0x0]) (expr_list:REG_EH_REGION (const_int 0 [0x0]) (nil))) (nil))

(barrier 25 24 26)

(note 26 25 27 00729208 NOTE_INSN_BLOCK_END)

(code_label 27 26 28 2 "" [0 uses])

(note 28 27 29 00729230 NOTE_INSN_BLOCK_END)

(note 29 28 30 00729280 NOTE_INSN_BLOCK_BEG)

(note 30 29 31 NOTE_INSN_DELETED)

(note 31 30 32 ("testabs-2.c") 13)

(insn 32 31 33 (set (reg:HI 44) (mem/f:HI (symbol_ref:HI ("xi1") [flags 0x40] <var_decl 007CCAF0 xi1>) [2 xi1+0 S2 A8])) -1 (nil) (nil))

(insn 33 32 34 (set (reg:HI 45) (mem/f:HI (symbol_ref:HI ("xi1") [flags 0x40] <var_decl 007CCAF0 xi1>) [2 xi1+0 S2 A8])) -1 (nil) (nil))

(insn 34 33 35 (set (cc0) (reg:HI 45)) -1 (nil) (nil))

(jump_insn 35 34 36 (set (pc) (if_then_else (ge (cc0) (const_int 0 [0x0])) (label_ref 37) (pc))) -1 (nil) (nil))

(insn 36 35 37 (set (reg:HI 45) (neg:HI (reg:HI 45))) -1 (nil) (nil))

(code_label 37 36 38 5 "" [0 uses])

(insn 38 37 39 (parallel [ (set (cc0) (compare (reg:HI 45) (const_int 1 [0x1]))) (clobber (scratch:QI)) ]) -1 (nil) (nil))

(jump_insn 39 38 40 (set (pc) (if_then_else (eq (cc0) (const_int 0 [0x0])) (label_ref 46) (pc))) -1 (nil) (nil))

(note 40 39 41 00729258 NOTE_INSN_BLOCK_BEG)

(note 41 40 42 NOTE_INSN_DELETED)

(note 42 41 43 ("testabs-2.c") 14)

(call_insn 43 42 44 (call (mem:HI (symbol_ref:HI ("abort") [flags 0x41] <function_decl 00779690 abort>) [0 S2 A8]) (const_int 0 [0x0])) -1 (nil) (expr_list:REG_NORETURN (const_int 0 [0x0]) (expr_list:REG_EH_REGION (const_int 0 [0x0]) (nil))) (nil))

(barrier 44 43 45)

(note 45 44 46 00729258 NOTE_INSN_BLOCK_END)

(code_label 46 45 47 4 "" [0 uses])

(note 47 46 48 00729280 NOTE_INSN_BLOCK_END)

(note 48 47 49 007292A8 NOTE_INSN_BLOCK_END)

(note 49 48 50 NOTE_INSN_FUNCTION_END)

(note 50 49 51 ("testabs-2.c") 16)

(code_label 51 50 0 1 "" [0 uses])

--
Summary: [AVR] Double load of volatile operand
Product: gcc
Version: 3.4.3
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: andrewhutchinson at cox dot net
CC: gcc-bugs at gcc dot gnu dot org
GCC target triplet: avr

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20222
[Bug target/18251] unable to find a register to spill in class `POINTER_REGS'
--- Additional Comments From andrewhutchinson at cox dot net 2005-02-22 12:31 ---
Subject: Re: unable to find a register to spill in class `POINTER_REGS'

If you can wait 12 hrs I'll create a 3.4 version. Alternatively, cut and paste from a 4.0 avr.md; the change is local to one area.

dieterbmeier at yahoo dot com wrote:

> --- Additional Comments From dieterbmeier at yahoo dot com 2005-02-22 10:32 ---
> Andy's patch works great for HEAD, but I get
>
> patching file avr.md
> Hunk #1 FAILED at 344.
> 1 out of 1 hunk FAILED -- saving rejects to file avr.md.rej
>
> when patching 3_4 branch.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18251
[Bug target/18251] unable to find a register to spill in class `POINTER_REGS'
--- Additional Comments From andrewhutchinson at cox dot net 2005-02-12 13:50 ---
A sub-optimal fix is to disable the movmemhi expansion. Either delete it, or use the less draconian:

(define_expand "movmemhi"
  [(parallel [(set (match_operand:BLK 0 "memory_operand" "")
                   (match_operand:BLK 1 "memory_operand" ""))
              (use (match_operand:HI 2 "const_int_operand" ""))
              (use (match_operand:HI 3 "const_int_operand" ""))
              (clobber (match_scratch:HI 4 ""))
              (clobber (match_scratch:HI 5 ""))
              (clobber (match_dup 6))])]
  "(0)"
  etc

A better solution is currently being tested.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18251
[Bug c/19924] New: [AVR] MODES_TIEABLE incorrect
MODES_TIEABLE_P in avr is set such that access to subregs is prevented. This results in significantly sub-optimal code.

The attached patch changes this. No regressions have been found with the test suite - indeed things got better!

Note this is related to PR/19815 (documentation error).

--
Summary: [AVR] MODES_TIEABLE incorrect
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: andrewhutchinson at cox dot net
CC: gcc-bugs at gcc dot gnu dot org
GCC target triplet: avr

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19924
[Bug c/19924] [AVR] MODES_TIEABLE incorrect
--- Additional Comments From andrewhutchinson at cox dot net 2005-02-12 15:35 ---
Created an attachment (id=8186)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=8186&action=view)
Patch to change MODES_TIEABLE

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19924
[Bug target/19636] Can't compile ethernut OS (avr-gcc)
--- Additional Comments From andrewhutchinson at cox dot net 2005-02-13 02:07 ---
Try the patch attached to PR 18251. There is a good chance it will fix this. If not, pass me the source and I will take a look.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19636
[Bug target/19636] Can't compile ethernut OS (avr-gcc)
--
What |Removed |Added
CC   |        |andrewhutchinson at cox dot net

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19636
[Bug c/19835] New: [AVR] Loop variable gets widened to LONG instead of int
GNU C version 4.0.0 20041205 (experimental) (avr)

Loop variable gets widened to LONG instead of unsigned int (or perhaps even int). Seems we forgot how big the target is?

Testcase:

struct S19 { unsigned char i[19]; };

void init (struct S19 *p, int i)
{
  int j;
  for (j = 0; j < 19; j++)
    p->i[j] = i + j;
}

tree dump:

;; Function init (init)

init (p, i)
{
  long unsigned int ivtmp.3;

<bb 0>:
  ivtmp.3 = 0;

<L0>:;
  ((unsigned char *) ivtmp.3 + p->i[0])->i[0] = (unsigned char) ivtmp.3 + (unsigned char) (signed char) i;
  ivtmp.3 = ivtmp.3 + 1;
  if (ivtmp.3 != 19) goto <L0>; else goto <L2>;

<L2>:;
  return;
}

Not surprisingly, the backend code reflects a long (SImode) decrement and compare:

/* prologue: frame size=0 */
/* prologue end (size=0) */
        movw r26,r24
        ldi r18,lo8(0)
        ldi r19,hi8(0)
        ldi r20,hlo8(0)
        ldi r21,hhi8(0)
.L2:
        movw r30,r26
        add r30,r18
        adc r31,r19
        mov r24,r18
        add r24,r22
        st Z,r24
        subi r18,lo8(-(1))
        sbci r19,hi8(-(1))
        sbci r20,hlo8(-(1))
        sbci r21,hhi8(-(1))
        cpi r18,lo8(19)
        cpc r19,__zero_reg__
        cpc r20,__zero_reg__
        cpc r21,__zero_reg__
        brne .L2
/* epilogue: frame size=0 */
        ret

--
Summary: [AVR] Loop variable gets widened to LONG instead of int
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: andrewhutchinson at cox dot net
CC: gcc-bugs at gcc dot gnu dot org
GCC host triplet: cygwin
GCC target triplet: avr

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19835
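That a narrow induction variable is sufficient for this testcase can be checked on a host with the C sketch below. It is an illustration only: `init_int` mirrors the report's testcase, while `init_narrow_iv` and `same_result` are invented for the comparison, and `unsigned short` merely models a 16 bit counter.

```c
#include <assert.h>
#include <string.h>

struct S19 { unsigned char i[19]; };

/* The testcase as written: an int loop variable. */
void init_int(struct S19 *p, int i)
{
    int j;
    assert(p != 0);
    for (j = 0; j < 19; j++)
        p->i[j] = (unsigned char) (i + j);
}

/* Same shape as the tree dump (one induction variable, byte-wide
   arithmetic), but with the counter kept narrow instead of long. */
void init_narrow_iv(struct S19 *p, int i)
{
    unsigned short iv;
    assert(p != 0);
    for (iv = 0; iv != 19; iv++)
        p->i[iv] = (unsigned char) ((unsigned char) iv + (unsigned char) i);
}

/* Returns nonzero when both forms fill the array identically. */
int same_result(int i)
{
    struct S19 a, b;
    init_int(&a, i);
    init_narrow_iv(&b, i);
    return memcmp(&a, &b, sizeof a) == 0;
}
```

Since the loop only counts to 19 and the stored values are truncated to a byte, nothing in the computation depends on the counter being 32 bits wide.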
[Bug tree-optimization/19686] [4.0 Regression] loop performance decrease, not comparing against 0
--- Additional Comments From andrewhutchinson at cox dot net 2005-02-08 01:48 ---
I ran the testcase with the proposed avr_costs patch applied. The result is unchanged.

The initially generated RTL is unfortunately beyond what can be fixed by the backend. I don't think this problem is avr specific; it should appear on other targets.

I have attached the initially generated RTL. It is alarmingly complex given the starting point. (This is not so apparent in the assembler, as the backend has done a rather good job of tidying up what it can.)

Perhaps somebody could glance at this to see what exactly went off the rails. It might just be a manifestation of a known problem.

Wish I could help more - but trees are beyond me at the moment.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19686
[Bug tree-optimization/18219] [4.0 Regression] gcc-4.0.0 bloats code by 31%
--- Additional Comments From andrewhutchinson at cox dot net 2005-02-08 02:12 ---
Bad post, ignore.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18219
[Bug other/19815] New: Documentation change - GCC Internals MODES_TIEABLE_P
Documentation change - GCC Internals

The definition of MODES_TIEABLE_P is incorrect and has resulted in reduced optimisation for the avr target (and perhaps others).

The definition is currently:

  A C expression that is nonzero if a value of mode mode1 is accessible in mode mode2 without copying.

This part would be ok, but it is then detailed as:

  If HARD_REGNO_MODE_OK (r, mode1) and HARD_REGNO_MODE_OK (r, mode2) are always the same for any r, then MODES_TIEABLE_P (mode1, mode2) should be nonzero. If they differ for any r, you should define this macro to return zero unless some other mechanism ensures the accessibility of the value in a narrower mode.

This second paragraph is too restrictive. MODES_TIEABLE_P may also be nonzero if r is accessible in any SMALLER mode.

In the particular example of the avr target, word or larger registers are assigned even numbered registers ONLY. Byte registers have no such restriction. Because this does indeed fail the second paragraph's criterion, MODES_TIEABLE_P has been set to 0 (FALSE), preventing byte operations on the word register and causing unneeded register moves. It should have been set TRUE.

I was tempted to report this as an AVR target bug - but the code is not really the problem.

Note that the definition is often included in target header files as well as in the gcc internals manual.

--
Summary: Documentation change - GCC Internals MODES_TIEABLE_P
Product: gcc
Version: 3.4.3
Status: UNCONFIRMED
Severity: minor
Priority: P2
Component: other
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: andrewhutchinson at cox dot net
CC: gcc-bugs at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19815
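The difference between the manual's criterion and the relaxed reading argued for above can be made concrete with a toy C model. Everything here is an assumption for illustration: `mode_ok`, `tieable_strict` and `tieable_relaxed` are invented helpers, not GCC macros, and the register file is reduced to 32 byte-wide registers where any mode wider than one byte must start at an even register.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy register file: 32 byte-wide registers. */
enum { NUM_REGS = 32 };

/* Stand-in for HARD_REGNO_MODE_OK: a mode of mode_bytes bytes may live
   in regno if it fits, and wide modes need an even start register. */
bool mode_ok(int regno, int mode_bytes)
{
    assert(regno >= 0 && regno < NUM_REGS && mode_bytes > 0);
    if (regno + mode_bytes > NUM_REGS)
        return false;
    return mode_bytes == 1 || (regno % 2) == 0;
}

/* The manual's second paragraph: tie only if mode_ok agrees for all r. */
bool tieable_strict(int bytes1, int bytes2)
{
    for (int r = 0; r < NUM_REGS; r++)
        if (mode_ok(r, bytes1) != mode_ok(r, bytes2))
            return false;
    return true;
}

/* Relaxed reading: tie if every register that can hold the wider mode
   can also hold the narrower one. */
bool tieable_relaxed(int bytes1, int bytes2)
{
    int wide = bytes1 > bytes2 ? bytes1 : bytes2;
    int narrow = bytes1 > bytes2 ? bytes2 : bytes1;
    for (int r = 0; r < NUM_REGS; r++)
        if (mode_ok(r, wide) && !mode_ok(r, narrow))
            return false;
    return true;
}
```

In this model a 1 byte mode and a 2 byte mode fail the strict test (odd registers accept only the byte mode) but pass the relaxed one, which is the situation the report describes for avr.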
[Bug tree-optimization/19686] [4.0 Regression] loop performance decrease, not comparing against 0
--- Additional Comments From andrewhutchinson at cox dot net 2005-02-06 23:06 ---
Take X as the initial value of x on function entry.

The loop is defined as i=X to 0, step -1, which is a simple do loop.

It gets optimized as i=0 to -X, step -1 (which is something bizarre!).

The code increase is due to 1) the computation of -X and 2) the compare against said -X.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19686
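The two loop forms described above can be written out in C to show that they iterate the same number of times while the second needs the extra negation and a compare against a non-zero bound. This is a sketch: the PR's testcase is not quoted here, so the function names are invented and inclusive bounds are assumed.

```c
#include <assert.h>

/* Source form: i runs from X down to 0, step -1; exit test against 0. */
int iterations_source_form(int x)
{
    int n = 0;
    assert(x >= 0);
    for (int i = x; i >= 0; i--)
        n++;
    return n;
}

/* Transformed form as described: i runs from 0 down to -X, step -1. */
int iterations_transformed_form(int x)
{
    int n = 0;
    int bound;
    assert(x >= 0);
    bound = -x;                        /* 1) extra computation of -X */
    for (int i = 0; i >= bound; i--)   /* 2) compare against -X      */
        n++;
    return n;
}
```

Both functions return X+1 for any non-negative X, so the transformation buys nothing for this loop.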
[Bug c/19703] New: Poor optimisation of loop test
Missed optimization

gcc version 4.0.0 20041205 (experimental)

4.0.0/cc1.exe -quiet -v looprv.c -quiet -dumpbase looprv.c -mmcu=atmega169 -auxbase looprv -Os -Wall -version -funsigned-char -funsigned-bitfields -fpack-struct -fshort-enums -o looprv.s

A down counting loop uses an expensive compare EQ (-n) instead of a compare >=0.

Testcase as follows:

volatile char value6;
extern void foo6(char);

void testloop6(void)
{
  int i;
  for (i=100;i>= 0;i-=10)
  {
    if (!value6)
    {
      foo6(i);
    }
  }
}

The loop test in RTL is compare NE -10. It should be compare GE 0 - which is (generally) free.

The first dump of expanded RTL shows the compare.

--
Summary: Poor optimisation of loop test
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: minor
Priority: P2
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: andrewhutchinson at cox dot net
CC: gcc-bugs at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19703
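For this testcase the two exit tests are interchangeable, which is why the cheaper one could have been chosen. The C sketch below counts iterations under each test; the function names are invented for the illustration and the loop body is reduced to a counter.

```c
#include <assert.h>

/* Source-level exit test: continue while i >= 0. */
int count_with_ge0(void)
{
    int n = 0;
    for (int i = 100; i >= 0; i -= 10)
        n++;
    return n;
}

/* Exit test as found in the RTL: decrement, then continue while
   i != -10. */
int count_with_ne_minus10(void)
{
    int n = 0;
    int i = 100;
    do {
        n++;
        i -= 10;
    } while (i != -10);
    assert(i == -10);
    return n;
}
```

Both forms visit i = 100, 90, ..., 0, so the comparison against -10 is not required for correctness here.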
[Bug tree-optimization/19703] Poor optimisation of loop test
--- Additional Comments From andrewhutchinson at cox dot net 2005-01-30 04:58 ---
Subject: Re: Poor optimisation of loop test

I am not sure what makes you think that.

Compare with ZERO is invariably cheaper than compare with n. The former is the free sign status following any condition-setting instruction - like subtract! It's even the sign bit of the result!

Compare with n:

        subi r28,10
        cpi r28, -10
        brpl looptop

Compare with zero:

        subi r28,10
        brpl looptop

Or did I miss something?

pinskia at gcc dot gnu dot org wrote:

> --- Additional Comments From pinskia at gcc dot gnu dot org 2005-01-30 03:17 ---
> Hmm, on most targets it is true that != is the same case as >=.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19703
[Bug c/19676] New: Loop optimizer fails to reverse simple loop
AVR Target 20041205 snapshot

gcc version 4.0.0 20041205 (experimental)

/avrdev/libexec/gcc/avr/4.0.0/cc1.exe -quiet -v looprv.c -quiet -dumpbase looprv.c -mmcu=atmega169 -auxbase looprv -Os -Wall -version -funsigned-char -funsigned-bitfields -fpack-struct -fshort-enums -o looprv.s

The loop optimiser fails to reverse a simple loop. For example,

void testloop5(void)
{
  int i;
  for (i=0;i<100;i++)
  {
    if (!value)
    {
      foo();
    }
  }
}

generates RTL setting the index to 100, then using decrement/branch at the end of the loop, as expected.

However, adding any kind of while/for loop inside the outer loop leaves the index unoptimised. For example,

void testloop3(void)
{
  int i;
  for (i=0;i<100;i++)
  {
    while (!value)
    {
      foo();
    }
  }
}

Here the index starts at 0 and increments to 99.

The problem seems to be related to maybe_multiple being set in the loop scan. However, since 'i' is never used inside the loop, there would seem to be no need to check for multiple setting.

This was tested with the AVR target but looks like it will affect any target - I can provide RTL etc. on demand.

--
Summary: Loop optimizer fails to reverse simple loop
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: andrewhutchinson at cox dot net
CC: gcc-bugs at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19676
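What "reversing" the loop means can be shown by hand in C. The sketch below is only an illustration of the transformation the report expects, not of how GCC performs it: `run_forward` and `run_reversed` are invented names, and `value` and `foo` are local stand-ins for the testcase's globals.

```c
#include <assert.h>

static int calls;             /* counts invocations of foo() */
static volatile char value;   /* stand-in for the testcase's flag */

static void foo(void)
{
    calls++;
}

/* testloop5 as written: the index counts up from 0 to 99. */
int run_forward(void)
{
    calls = 0;
    for (int i = 0; i < 100; i++)
        if (!value)
            foo();
    return calls;
}

/* Hand-reversed equivalent: the index starts at 100 and is decremented
   and tested against zero at the bottom of the loop. This is valid
   because i is never used inside the body. */
int run_reversed(void)
{
    int i = 100;
    calls = 0;
    do {
        if (!value)
            foo();
    } while (--i != 0);
    assert(i == 0);
    return calls;
}
```

With `value` left at zero, both versions call `foo` exactly 100 times; the reversed one ends with a decrement followed by a test against zero, which is the form the report says is generated for testloop5.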
[Bug c/19676] Loop optimizer fails to reverse simple loop
--- Additional Comments From andrewhutchinson at cox dot net 2005-01-28 19:12 ---
Created an attachment (id=8092)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=8092&action=view)
Testcase c source

Testloop3() is NOT reversed. The others, included for reference, are.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19676
[Bug c/19676] Loop optimizer fails to reverse simple loop
--- Additional Comments From andrewhutchinson at cox dot net 2005-01-28 19:13 ---
Created an attachment (id=8093)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=8093&action=view)
Expanded RTL

Expanded RTL from looprv testcase source

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19676
[Bug target/19676] Loop optimizer fails to reverse simple loop
--- Additional Comments From andrewhutchinson at cox dot net 2005-01-28 19:14 ---
Created an attachment (id=8094)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=8094&action=view)
Optimised RTL

Final optimised RTL before asm code generation.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19676
[Bug tree-optimization/19676] Loop optimizer fails to reverse simple loop
--- Additional Comments From andrewhutchinson at cox dot net 2005-01-28 20:15 ---
Subject: Re: Loop optimizer fails to reverse simple loop

GCC 3.3.1 did reverse testloop3() but not testloop2() or testloop4(). So 4.0 gets 4/5 right and 3.3.1 gets 3/5 right.

It's complicated by other optimisations on my inner loop code, though, so I could not say if testloop3 is a regression. I'm trying to get some results from 3.4.x.

The only issue with inconsistent patterns is that it makes matching backend patterns more likely to fail, as we need to catch GE 0 or EQ -1 after the decrement for almost identical code structure. I am not sure if this is a real problem or one that gcc will take care of by alternate pattern substitutions.

pinskia at gcc dot gnu dot org wrote:

> --- Additional Comments From pinskia at gcc dot gnu dot org 2005-01-28 19:54 ---
> Confirmed, we should be able to do this on the tree level but don't for testloop2, testloop3, testloop4.
>
> To answer this question:
> * - why is gcc inconsistent in loop reversal bounds
> Because sometimes we do loop reversal on the tree level or the rtl level. See above about where we don't do it on the tree level.
>
> Do you know if all of these loops were loop reversal for say 3.4.0?

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19676
[Bug rtl-optimization/14151] [new-ra] new-ra get frame size incorrect
--- Additional Comments From andrewhutchinson at cox dot net 2004-12-25 02:33 ---
Problem still present on gcc (GCC) 4.0.0 20041205 (experimental) SNAPSHOT

*sigh*

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14151