[avr-gcc-list] Re: loop deleted using optimization
> Why does avr-gcc delete my empty for loops if I compile with optimization on? I am able to preserve the loop if I add a NOP in the loop, but that will eat up one clock cycle. Is there a way to preserve the empty loops without adding any NOP clock cycles?

It would probably work to add an empty asm, as long as you used the __volatile attribute to scare gcc away from reordering code in the loop. So:

    for (...) { __asm __volatile ("") ; }

(I note I wasn't able to get gcc 3.3 to optimize away the loop on i386.)

-- Ben Jackson AD7GD [EMAIL PROTECTED] http://www.ben.com/

___ AVR-GCC-list mailing list AVR-GCC-list@nongnu.org http://lists.nongnu.org/mailman/listinfo/avr-gcc-list
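A minimal sketch of the empty-asm suggestion, written so it also builds with a hosted gcc and can be checked off-target. The function name `spin` and its returned iteration count are hypothetical, added only so the loop is observable in a test; `__asm__ __volatile__` is gcc's alternate spelling of `__asm __volatile`. The loop counter and branch still cost cycles, and that cost varies with compiler version and options, so this is not a calibrated delay.

```c
#include <stdint.h>

/* Busy-wait for n iterations. The empty asm emits no instructions, but
   because it is volatile gcc must assume it has side effects, so it may
   not delete it and must execute it on every iteration of the loop. */
static uint16_t spin(uint16_t n)
{
    uint16_t i;
    for (i = 0; i < n; i++) {
        __asm__ __volatile__ ("");
    }
    return i; /* number of iterations actually run */
}
```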
Re: [avr-gcc-list] Using PORTB with variables, PORTB = var
    void change() { var <<= 1; }

That's a shift, not a rotation, so var is going to become 0.

    while (1) {
        int var = 1;
        while (var != 0) {
            PORTB = var;
            var <<= 1;
        }
    }

Also, 'int var' is going to be 16 bits wide, so this version, which walks the bit through all 16 positions before starting over, is going to spend half its time with the bit in the high byte, and PORTB is going to be 0.
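If the goal is to walk a single lit bit around an 8-bit port indefinitely, a sketch that rotates instead of shifting, and uses an 8-bit type so the bit never leaves PORTB's width, might look like this. `rotl8` is a hypothetical helper name; the port write appears only in a comment because PORTB comes from avr-libc's <avr/io.h> and exists only on the target.

```c
#include <stdint.h>

/* Rotate an 8-bit value left by one: bit 7 wraps around to bit 0,
   so a single set bit never disappears. */
static uint8_t rotl8(uint8_t v)
{
    return (uint8_t)((v << 1) | (v >> 7));
}

/* On the target (with <avr/io.h> included) the loop would be:
 *
 *     uint8_t var = 1;
 *     for (;;) {
 *         PORTB = var;
 *         var = rotl8(var);
 *     }
 */
```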
Re: [avr-gcc-list] multiply by constant always expands to int
On Tue, Dec 06, 2005 at 07:33:51PM +, Paulo Marques wrote:
> I thought this would just happen with any multiply operation, and tried to build a simple test case, and to my surprise everything was just fine.

I just did the same thing. It turns out there are at least two things required for my version: one of the operands has to be `const', and you must use `-Os':

    void g(void)
    {
        extern char f(char);
        const char y = 20;
        char x, z;
        z = f(x * y);
    }

Turn off `const' or `-Os' and you get __mulqi3; otherwise you get __mulhi3. You can probably also just substitute the literal 20 for y.

-- Ben Jackson [EMAIL PROTECTED] http://www.ben.com/
[avr-gcc-list] multiply by constant always expands to int
Using 3.4.4 (I'm off 4.0.1 for now because I haven't had time to figure out why all loop variables are promoted to ints), I had a strange experience where code like this:

    char x = 10, y;
    loop { f(x * y); }

got *longer* when I made it 'const char x = 10' or just hardcoded 10. It turns out that's because mulqi was replaced with mulhi, which is longer. Now, I'm not 100% sure what the integral type promotion rules say for multiply, but I'd be surprised if they were different for const char vs. char, especially if the *const* version widens *more*.

Where's most of the development going on now, 3.x or 4.x? I guess I need to buckle down, read the gcc internals docs, and start submitting patches.

-- Ben Jackson [EMAIL PROTECTED] http://www.ben.com/
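On the promotion question: C's usual arithmetic conversions promote both char operands to int whether or not one of them is const, so the language itself does not treat the two cases differently. Because the product is immediately narrowed back to char for f(), only its low 8 bits are observable, which is what makes an 8-bit multiply a legal choice. The host-side sketch below checks that equivalence exhaustively for unsigned 8-bit values. `mul8` and `mul_promoted` are hypothetical names; `mul8` is a generic shift-and-add routine illustrating 8-bit-only arithmetic, not the source of libgcc's __mulqi3.

```c
#include <stdint.h>

/* Shift-and-add multiply that never keeps more than 8 bits of state.
   Every intermediate is reduced mod 256, so the result equals the low
   byte of the full product. */
static uint8_t mul8(uint8_t a, uint8_t b)
{
    uint8_t result = 0;
    while (b != 0) {
        if (b & 1)
            result = (uint8_t)(result + a);
        a = (uint8_t)(a << 1);
        b >>= 1;
    }
    return result;
}

/* What C specifies: both operands are promoted to int, multiplied in int,
   and the product is narrowed when converted to an 8-bit type. */
static uint8_t mul_promoted(uint8_t a, uint8_t b)
{
    int wide = a * b;      /* int arithmetic after promotion */
    return (uint8_t)wide;  /* keep only the low 8 bits */
}
```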
[avr-gcc-list] Saving space in interrupt handler prologues
I don't know if this has been covered already, but one way to save space in interrupt handlers is to inline all the functions they call. Obviously it is not always practical, but if the functions are split out mainly for clarity, this can be a big win. The main reason is that if the interrupt handler doesn't call *any* other functions, it only has to save the caller-saved registers it actually uses, instead of all of them.

Also, I think someone else wanted to know why things sometimes did or did not get inlined. gcc tries to be clever about this, so if you want the final say, try this:

    #define NOINLINE  __attribute__ ((__noinline__))
    #define YESINLINE inline __attribute__ ((__always_inline__))

Make sure to declare the YESINLINE functions static if you don't want a callable copy emitted for external references.

Also, as far as general prologue bloat goes: I may have mentioned this before, but I think a lot of it comes from the fact that AVR GCC generates 16- and 32-bit math in parallel instead of in series. For example, something like:

    int32 a, b, c;
    a = b | c;

becomes:

    lds r24,b
    lds r25,(b)+1
    lds r26,(b)+2
    lds r27,(b)+3
    lds r18,c
    lds r19,(c)+1
    lds r20,(c)+2
    lds r21,(c)+3
    or  r24,r18
    or  r25,r19
    or  r26,r20
    or  r27,r21
    sts a,r24
    sts (a)+1,r25
    sts (a)+2,r26
    sts (a)+3,r27

when it could be:

    lds r24,b
    lds r25,c
    or  r24,r25
    sts a,r24
    ...

using only 2 registers instead of 8. None of the load/store instructions set flag bits, so even math with carry can be done this way. I don't think it's a legitimate peephole optimization (though I haven't pondered it deeply), since qualifiers like 'volatile' would prevent the reordering of the memory accesses.

-- Ben Jackson [EMAIL PROTECTED] http://www.ben.com/
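A small sketch of the two macros in use. The names `ticks`, `count_tick`, `read_ticks` and `handler_body` are hypothetical. It builds with a host gcc so it can be tested; on an AVR, the body of `handler_body` would sit inside an `ISR(...)` handler from avr-libc's <avr/interrupt.h>. Whether the handler's prologue actually shrinks is worth confirming with avr-objdump -d for your compiler version.

```c
#include <stdint.h>

#define NOINLINE  __attribute__ ((__noinline__))
#define YESINLINE inline __attribute__ ((__always_inline__))

static volatile uint16_t ticks;   /* hypothetical state updated by the ISR */

/* static + always_inline: folded into every caller; when all calls are
   inlined, gcc normally emits no separate callable copy. */
static YESINLINE void count_tick(void)
{
    ticks++;
}

/* Deliberately kept out of line, e.g. for code the ISR never calls. */
static NOINLINE uint16_t read_ticks(void)
{
    return ticks;
}

/* Stand-in for an interrupt handler body. After inlining it contains no
   calls, which is the condition described above for a smaller prologue. */
static void handler_body(void)
{
    count_tick();
}
```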
[avr-gcc-list] Zeroing longs: use memset for -Os?
Due to the size of sts instructions (4 bytes each), it actually takes fewer bytes to memset(&somelong, 0, sizeof(long)). The memset is 12 bytes (count, hi, lo, st, dec, loop: six instructions of 2 bytes each) vs. 16 for four straight sts instructions. If you're zeroing more than one variable, of course, the savings are bigger.

-- Ben Jackson [EMAIL PROTECTED] http://www.ben.com/
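A sketch of the idea with hypothetical variable names. Note that memset takes a pointer, so the variable's address is passed. Grouping variables that are always cleared together into one struct lets a single memset cover all of them. Whether the memset form really comes out smaller depends on compiler version and options (gcc can expand a small constant-size memset inline by itself), so compare both forms with avr-size or avr-objdump.

```c
#include <stdint.h>
#include <string.h>

static int32_t somelong;

/* Hypothetical group of counters that are always reset together. */
struct counters {
    uint32_t overflows;
    uint32_t edges;
    uint16_t ticks;
};
static struct counters cnt;

/* Clear a single 32-bit variable through its address. */
static void clear_long(void)
{
    memset(&somelong, 0, sizeof somelong);
}

/* One call clears every member, instead of separate stores per member. */
static void reset_counters(void)
{
    memset(&cnt, 0, sizeof cnt);
}
```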
[avr-gcc-list] Shift by multiple of 8 is really shifting...
Here I'm doing something like:

    l = ((ulong)i << 16) + ...;

It's actually loading i, zero-extending it, then shifting it. Once again, I'm sure I've seen gcc do better (on i386 or PPC, I'm not sure which). If it's not useful for me to point out these problems as I run across them, feel free to tell me to shut up. :-) Better yet, give me an idea of where in gcc I should fix this (clearly it could be fixed with a peephole rule of some kind, but I bet it's also possible to just generate better code in the first place).

BTW, this is not purely academic for me: I'm working on an LC meter that greatly benefits from the float math, etc., and I've managed to get the float-using version down from 2046 to 1662, which is going to leave space to finish the code. This asm got generated when I started tracking timer0/1 overflows and adding them in, scaled, at the end. It's still a win, but it'd be even better if both copies of this saved 4 more instructions!

      b2: 20 91 6a 00   lds r18, 0x006A
      b6: 30 91 6b 00   lds r19, 0x006B
      ba: 44 27         eor r20, r20
      bc: 55 27         eor r21, r21
      be: 53 2f         mov r21, r19
      c0: 42 2f         mov r20, r18
      c2: 33 27         eor r19, r19
      c4: 22 27         eor r18, r18

-- Ben Jackson [EMAIL PROTECTED] http://www.ben.com/
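Because the shift amount is a multiple of 8, no bits move within a byte: the two bytes of i simply land in the upper half of the result, and the lower half is zero. With the register assignment in the listing above (r18 least significant, r21 most significant), that could be four instructions: lds r20, 0x006A; lds r21, 0x006B; eor r18, r18; eor r19, r19. The C sketch below (hypothetical helper `hi16_to_u32`; it models the byte placement, not gcc's code generator) builds the result byte by byte and is checked against the plain shift.

```c
#include <stdint.h>

/* Build ((uint32_t)i << 16) by placing bytes, mirroring what the
   registers in the listing should end up holding. The shifts in the
   return expression only select byte positions. */
static uint32_t hi16_to_u32(uint16_t i)
{
    uint8_t out[4];
    out[0] = 0;                    /* r18: zero                */
    out[1] = 0;                    /* r19: zero                */
    out[2] = (uint8_t)(i & 0xFF);  /* r20: low byte of i       */
    out[3] = (uint8_t)(i >> 8);    /* r21: high byte of i      */
    return (uint32_t)out[0]
         | ((uint32_t)out[1] << 8)
         | ((uint32_t)out[2] << 16)
         | ((uint32_t)out[3] << 24);
}
```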
[avr-gcc-list] Don't use gcc 3.4.4, use 4.0.1
When I built gcc for AVR, I wasn't sure what the Right version of GCC to use was. I've been using a cross-compiling GCC 3.4.4 for another platform with success, so I went with that. However, even with -morder1 and -fnew-ra, its AVR code is not nearly as good as that of 4.0.1. The .text of my main test file's .o (at the moment mostly poking bits and doing integer math) got 12% smaller with -Os, due in no small part to 4.0.1 being smarter with registers: less spilling, and therefore no need for the prologue/epilogue callouts.

Of course, I've got nearly zero experience with either version, so I welcome dissenting opinions, but I wanted to put the recommendation out there for any other newbies.

-- Ben Jackson [EMAIL PROTECTED] http://www.ben.com/
[avr-gcc-list] Should -mtiny-stack be the default for small devices?
The AT90S2313, for example, has only 128 bytes of SRAM. Is there any reason why -mtiny-stack shouldn't be the default for these devices? It looks like there are half a dozen ATtiny devices with more than 0 and at most 256 bytes of SRAM, plus several of the old AT90S* devices.

-- Ben Jackson [EMAIL PROTECTED] http://www.ben.com/