Re: [avr-gcc-list] avr-gcc and char strings
Could you post the assembler output? IIRC --save-temps or -S should be added to the gcc command line.

Wouter

On 3 Dec 2014 at 22:12, Andreas Höschler <ahoe...@smartsoft.de> wrote:

> Hi all,
>
> I am close to tearing my hair out. After setting up the AVR toolchain I tried a very simple C program (see below) on a SainSmart Mega2560 board, programmed into the chip with /Applications/Arduino.app//Contents/Resources/Java/hardware/tools/avr/bin/avrdude:
>
>     #define F_CPU 16000000UL /* 16 MHz CPU clock */
>
>     #include "Global.h"
>     #include <util/delay.h>
>     #include <avr/io.h>
>     #include <avr/interrupt.h>
>     #include <inttypes.h>
>
>     char String[] = "Hello world!!";
>
>     void USARTInit0(uint16_t baud)
>     {
>         // Set baud rate
>         int value = (F_CPU / 16 / baud) - 1;
>         UBRR0H = (uint8_t)(value >> 8);
>         UBRR0L = (uint8_t)value;
>         // 8N1
>         UCSR0C = 0x06; // (3<<UCSZ00);
>         // Enable receiver and transmitter
>         UCSR0B = (1<<RXEN0) | (1<<TXEN0);
>     }
>
>     void TxByte0 (uint8_t data)
>     {
>         // Wait for empty transmit buffer
>         while ( !(UCSR0A & (1 << UDRE0)) );
>         // Putting data into the buffer forces transmission
>         UDR0 = data;
>     }
>
>     int main (void)
>     {
>         DDRB = 0xff; // all outputs
>         USARTInit0(38400); // 9600 19200 38400
>         while (1)
>         {
>             TxByte0('A');
>             TxByte0('B');
>             TxByte0('C');
>             TxByte0('\n');
>             char *s = String;
>             while (*s != 0)
>             {
>                 TxByte0(*s);
>                 s++;
>             }
>             _delay_ms(500);
>         }
>         return (0);
>     }
>
> This program produces the following output
>
>     ...
>     ÿÿABC
>     ÿÿABC
>     ÿÿABC
>     ...
>
> telling me that sending single chars works but sending strings fails (this does not seem to have anything to do with the serial communication, but rather looks like some kind of memory management problem!??).
>
> To be sure the problem is not caused by my own gcc build I changed my Makefile to
>
>     # AVR-GCC Makefile
>     PROJECT=toggle_led
>     SOURCES=main.c
>     HEADERS=
>     CC=/Applications/Arduino.app//Contents/Resources/Java/hardware/tools/avr/bin/avr-gcc
>     OBJCOPY=avr-objcopy
>     MMCU=atmega2560
>     CFLAGS=-mmcu=$(MMCU) -Wall -O2 -I /usr/local/avr/include
>
>     $(PROJECT).hex: $(PROJECT).out
>             $(OBJCOPY) -j .text -O ihex $(PROJECT).out $(PROJECT).hex
>
>     $(PROJECT).out: $(SOURCES)
>             $(CC) $(CFLAGS) -I./ -o $(PROJECT).out $(SOURCES)
>
>     program: $(PROJECT).hex
>             avrdude -p $(MMCU) -c avrispmkII -P usb -e -U flash:w:$(PROJECT).hex
>
>     clean:
>             rm -f $(PROJECT).out
>             rm -f $(PROJECT).hex
>
> and thus built the program with a gcc from a respected source (the avr-gcc that comes with Arduino.app). But the problem persists!?
>
> I already discussed this problem on the avrfreaks forum. All agree that the above code should work, but it does not!?? :-(
>
> Could this be a problem of the board/chip? I am using a SainSmart Mega2560 board (which seems to be a clone of the original Arduino product)! Or am I doing anything wrong?? Clueless!?? :-(
>
> Hints are greatly appreciated!!
>
> Thanks a lot,
> Andreas

___ AVR-GCC-list mailing list AVR-GCC-list@nongnu.org https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
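The baud-rate arithmetic in USARTInit0() can be checked on a host machine. The following is a minimal sketch, assuming normal-speed asynchronous mode (the divide-by-16 formula used in the quoted code) and the 16 MHz clock named in the F_CPU comment; the helper names ubrr_for and actual_baud are made up for this illustration and are not part of avr-libc.

```c
#include <stdint.h>

/* Hypothetical helper: same integer arithmetic as USARTInit0(),
 * UBRR = f_cpu / 16 / baud - 1 (normal-speed asynchronous mode). */
uint16_t ubrr_for(uint32_t f_cpu, uint32_t baud)
{
    return (uint16_t)(f_cpu / 16 / baud - 1);
}

/* Hypothetical helper: baud rate actually produced for a given
 * UBRR value, rounded down to a whole number. */
uint32_t actual_baud(uint32_t f_cpu, uint16_t ubrr)
{
    return f_cpu / (16UL * ((uint32_t)ubrr + 1));
}
```

For 16 MHz and 38400 baud this gives UBRR = 25, which corresponds to roughly 38461 baud, about 0.16% fast. That is consistent with the single characters 'A', 'B', 'C' arriving intact, so the garbled string is unlikely to be a baud-rate problem.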
Re: [avr-gcc-list] Pointer register allocation optimizer
Georg-Johann Lay wrote on 2012-07-27 16:06:
> Wouter van Gulik wrote:
>> Hi list,
>>
>> This code:
>>
>>     char* f(char* p) { p++; return p; }
>>
>> Results in:
>>
>>     mov r18,r24
>>     mov r19,r25
>>     subi r18,lo8(-(1))
>>     sbci r19,hi8(-(1))
>>     mov r24,r18
>>     mov r25,r19
>>     ret
>>
>> When compiling with avr-gcc -O[23s] -mmcu=avr5 -S main.c

Oops, copy-paste error; for avr5, movw is used to move the pointer registers. Still, it does a useless move.

> Looks very much like PR52278, which is still open.

I think this is the same. For sanity I also checked with int and long (against my Ubuntu gcc-avr 4.5.3) and it yields the same result: first move the register, then the add, then move it back.

>> What I wonder: why is r18 picked? Clearly r26 or r30 are way better choices. Maybe this is fixed in 4.7.1 already; I don't have a 4.7+ at the moment.
>
> According to Vladimir, the register allocator (RA) should work smoothly with SUBREGs, but obviously it does not.

Clearly.

> You can try -fno-split-wide-types, but that might have other disadvantages.

That gives the expected result.

HTH,
Wouter
Re: [avr-gcc-list] Stack use - possible bug
Jan Waclawek wrote:
> Yes, I can agree to that, but I don't know if I could fully characterize this as a bug.

Considering this code:

    #include <avr/io.h>
    #include <stdio.h>

    static void test0(void) { char buf[100]; puts(buf);}
    static void test1(void) { char buf[110]; puts(buf);}
    static void test2(void) { char buf[120]; puts(buf);}
    static void test3(void) { char buf[130]; puts(buf);}
    static void test4(void) { char buf[140]; puts(buf);}
    static void test5(void) { char buf[150]; puts(buf);}
    static void test6(void) { char buf[160]; puts(buf);}
    static void test7(void) { char buf[170]; puts(buf);}
    static void test8(void) { char buf[180]; puts(buf);}
    static void test9(void) { char buf[190]; puts(buf);}

    int main(void)
    {
        test0(); test1(); test2(); test3(); test4();
        test5(); test6(); test7(); test8(); test9();
    }

Running on an atmega48 with only 512 bytes of RAM, this will work (at first glance): no single buffer is larger than 190 bytes. But the optimizer inlines all ten functions into main and creates this:

    push r29
    push r28
    in r28,__SP_L__
    in r29,__SP_H__
    subi r28,lo8(-(-1450))
    sbci r29,hi8(-(-1450))
    in __tmp_reg__,__SREG__
    cli
    out __SP_H__,r29
    out __SREG__,__tmp_reg__
    out __SP_L__,r28

Which is just bogus, since a 1450-byte frame is beyond memory. I would consider this a bug.

Interestingly, when all functions use buf[100], the optimizer gets smart and only allocates 100 bytes once:

    push r16
    push r17
    push r29
    push r28
    in r28,__SP_L__
    in r29,__SP_H__
    subi r28,lo8(-(-100))
    sbci r29,hi8(-(-100))
    in __tmp_reg__,__SREG__
    cli
    out __SP_H__,r29
    out __SREG__,__tmp_reg__
    out __SP_L__,r28

Here my knowledge of GCC stack optimization stops.

HTH,
Wouter
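The 1450 in the listing is simply the sum of the ten buffer sizes, which can be confirmed with a few lines of host-side C. This is only a worked check of the arithmetic; the function names are invented for the illustration, and 512 is the RAM size quoted in the message.

```c
/* Sum of the ten buffer sizes from test0()..test9(): 100, 110, ..., 190. */
unsigned total_frame_bytes(void)
{
    unsigned total = 0;
    for (unsigned size = 100; size <= 190; size += 10)
        total += size;
    return total;
}

/* Nonzero if a stack frame of `frame` bytes cannot fit in `sram` bytes. */
int frame_exceeds_sram(unsigned frame, unsigned sram)
{
    return frame > sram;
}
```

The total is 1450 bytes, almost three times the 512 bytes available, whereas the largest single buffer (190 bytes) would fit comfortably if the frames were allocated one at a time.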
Re: [avr-gcc-list] Re: C aliasing rules
Hi all,

This might be due to bug http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39635 or the still open http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39386. It is about an error involving shifting, inlining and return values; this code might trigger the same bug. Although I thought it was fixed, the comments say it was not fixed in 4.4.

HTH,
Wouter

On 19/05/10 14:10, Lars Noschinski wrote:
> Hello!
>
> * David Brown <david.br...@hesbynett.no> [10-05-18 21:01]:
>
> Thanks for your answer, David.
>
>> Lars Noschinski wrote:
>>> I'm trying to debug a strange problem, which depends on whether a function is inlined (then it's broken) or not (then it's ok). Can someone tell me if the following code snippet violates the C aliasing rules for b1 (declared as uint8_t*, written as uint32_t* by xteaDecrypt)?
>> [...]
>> It's not impossible that this is a bug when inlining such complex code (and 32-bit code like this is complex on an 8-bit micro). It's difficult to tell without a compilable code snippet and some indication of the expected results. If you can, you should look at the generated assembly to see if you can figure out what is going wrong. Also try to simplify or remove parts of the code until you have a minimal example of the problem.
>>
>> While it would be useful to find out about the problem (especially if it is a bug that is not already known), code like this does not benefit much from being inlined. It's too complex and requires too many registers, so the function call overhead is minimal. You could improve the results somewhat by manual restructuring (perhaps eliminating the memcpy calls), but unless XTEA_ROUNDS is very small, the loop there will dominate everything.
>
> XTEA_ROUNDS is 32 and this code is far from being performance critical, but code breaking with optimization always makes me nervous ;)
>
> Reading more about strict aliasing issues, especially
>
>   http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html
>   http://davmac.wordpress.com/2010/02/26/c99-revisited/
>
> I'm fairly sure that accessing a uint8_t[] via a uint32_t* constitutes undefined behaviour; the same holds for converting the pointer by use of a union { uint32_t[2]; uint8_t[8] }, according to the gcc (4.3.4) documentation for -fstrict-aliasing:
>
> | Similarly, access by taking the address, casting the resulting
> | pointer and dereferencing the result has undefined behavior, even
> | if the cast uses a union type, e.g.:
> |     int f() {
> |       double d = 3.0;
> |       return ((union a_union *) &d)->i;
> |     }
>
> So it seems the only correct way is either changing the declaration of xteaDecryptCbc (i.e. use uint32_t from the beginning) or using memcpy. Or maybe some playing around with __attribute__((may_alias)).
>
> OTOH, this problem also occurs with -fno-strict-aliasing, so maybe there is some real bug down there in the compiler. I'll try analysing it later.
>
>>> If I read http://mail-index.netbsd.org/tech-kern/2003/08/11/0001.html correctly, it should violate the rules?
>>>
>>>     // ---
>>>     void xteaDecrypt(uint32_t v[2], uint32_t const k[4])
>>>     {
>>>         uint32_t v0=v[0], v1=v[1], delta=0x9E3779B9, sum=delta*XTEA_ROUNDS;
>>>         for (uint8_t i=0; i < XTEA_ROUNDS; i++) {
>>>             v1 -= (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + k[(sum>>11) & 3]);
>>>             sum -= delta;
>>>             v0 -= (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + k[sum & 3]);
>>>         }
>>>         v[0]=v0; v[1]=v1;
>>>     }
>>>
>>>     void xteaDecryptCbc(uint8_t v[8], uint8_t cb[8], uint8_t const k[16])
>>>     {
>>>         static uint8_t tmpbuf[8];
>>>         memcpy(tmpbuf, v, 8);
>>>         xteaDecrypt((uint32_t*)v, (uint32_t*)k);
>>>         for (uint8_t i=0; i < 8; i++)
>>>             v[i] ^= cb[i];
>>>         memcpy(cb, tmpbuf, 8);
>>>     }
>>>
>>>     int main(void)
>>>     {
>>>         uint8_t b1[8], b2[8], b3[16];
>>>         xteaDecryptCbc(b1, b2, b3);
>>>     }
>>>     // ---
>
> -- Lars
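The memcpy route mentioned in the thread can be sketched as follows. This is an illustration rather than Lars's actual fix: the byte-level wrappers copy into properly typed uint32_t arrays instead of casting the uint8_t pointers, so no uint8_t object is ever accessed through a uint32_t lvalue. The names xtea_decrypt_bytes and xtea_encrypt_bytes are hypothetical, and the encryption routine is included only so that a round trip can be checked.

```c
#include <stdint.h>
#include <string.h>

#define XTEA_ROUNDS 32

/* Word-level decryption, same arithmetic as xteaDecrypt() in the thread. */
static void xtea_decrypt_words(uint32_t v[2], const uint32_t k[4])
{
    const uint32_t delta = 0x9E3779B9u;
    uint32_t v0 = v[0], v1 = v[1], sum = delta * XTEA_ROUNDS;
    for (uint8_t i = 0; i < XTEA_ROUNDS; i++) {
        v1 -= (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + k[(sum >> 11) & 3]);
        sum -= delta;
        v0 -= (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + k[sum & 3]);
    }
    v[0] = v0; v[1] = v1;
}

/* Matching encryption, present only to allow a round-trip check. */
static void xtea_encrypt_words(uint32_t v[2], const uint32_t k[4])
{
    const uint32_t delta = 0x9E3779B9u;
    uint32_t v0 = v[0], v1 = v[1], sum = 0;
    for (uint8_t i = 0; i < XTEA_ROUNDS; i++) {
        v0 += (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + k[sum & 3]);
        sum += delta;
        v1 += (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + k[(sum >> 11) & 3]);
    }
    v[0] = v0; v[1] = v1;
}

/* Byte-buffer wrapper: memcpy into uint32_t arrays, no pointer casts. */
void xtea_decrypt_bytes(uint8_t v[8], const uint8_t k[16])
{
    uint32_t vw[2], kw[4];
    memcpy(vw, v, sizeof vw);   /* bytes -> words */
    memcpy(kw, k, sizeof kw);
    xtea_decrypt_words(vw, kw);
    memcpy(v, vw, sizeof vw);   /* words -> bytes */
}

void xtea_encrypt_bytes(uint8_t v[8], const uint8_t k[16])
{
    uint32_t vw[2], kw[4];
    memcpy(vw, v, sizeof vw);
    memcpy(kw, k, sizeof kw);
    xtea_encrypt_words(vw, kw);
    memcpy(v, vw, sizeof vw);
}
```

The copies cost 24 bytes of stack and a few dozen cycles per block, which is small next to the 32-round loop. Note that this only removes the aliasing question; since the thread reports the failure also with -fno-strict-aliasing, it may not by itself cure the miscompilation being discussed.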
Re: [avr-gcc-list] in-line assembler
Hi,

I don't know what you intended to do, but I guess this is more like it (read the value and write it back). I personally prefer the %[] construction.

    uint8_t get_ram_byte(uint16_t ram_address)
    {
        uint8_t byte;
        asm ("ld %[reg], %[adr]"  "\n\t"
             "sts %[adr], %[reg]" "\n\t"
             : [reg] "=r" (byte)
             : [adr] "e" (ram_address));
        return byte;
    }

Your construction of sts %0, __tmp_reg__ is not correct. GCC substitutes r24 for %0, so the assembler sees sts r24, ... which is invalid: the first operand of sts must be a memory address, not a register. You are feeding it the uninitialized variable 'byte', which is also allocated to R24.

HTH,
Wouter

Robert von Knobloch wrote:
> Hello,
>
> I've been trying to decipher the intricacies of in-line assembler (using the Inline Assembler Cookbook as my guide). I have a very simple application that I cannot seem to realise. I want a C function that will return the contents of the RAM address that I give it as argument. My assembler-based function looks like this (file is hex.c):
>
>     uint8_t get_ram_byte(uint16_t ram_address)
>     {
>         uint8_t byte;
>         asm ("ld __tmp_reg__, %a1" "\n\t"
>              "sts %0, __tmp_reg__" "\n\t"
>              : "=r" (byte)
>              : "e" (ram_address));
>         return byte;
>     }
>
> and is called from
>
>     rambyte = get_ram_byte(i);
>     u_hex8out(rambyte); // Print byte as 8-bit hex.
>
> Trying to compile this results in
>
>     ~/Monitor/hex.c:5: undefined reference to `r24'
>
> If I comment out the line "sts %0, __tmp_reg__\n\t" then it compiles and I see that the parameter is passed in R24,25, copied to R30,31 [Z], and the value is read into R0 [__tmp_reg__]. I cannot see what is wrong with the sts command or why R24 is mentioned.
>
> Can anybody help me?
>
> Many thanks,
> Robert von Knobloch.
Re: [avr-gcc-list] in-line assembler
OK, I had some more coffee:

    uint8_t get_ram_byte(uint16_t ram_address)
    {
        uint8_t byte;
        asm ("ld %[reg], %a[adr]" "\n\t"
             "st %a[adr], %[reg]" "\n\t"
             : [reg] "=r" (byte)
             : [adr] "e" (ram_address));
        return byte;
    }

My previous version I had only compiled, not assembled. This one compiles and assembles. Note the extra 'a' in front of adr, which prints the pointer register as X, Y or Z, and the st instead of sts.

HTH,
Wouter
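For comparison, the same byte read can usually be written in plain C through a volatile pointer, which leaves the choice of pointer register and load instruction to the compiler. This is only a sketch and not something proposed in the thread: the name read_ram_byte is hypothetical, and uintptr_t is used instead of uint16_t so that the snippet also builds on a host machine.

```c
#include <stdint.h>

/* Hypothetical plain-C counterpart of get_ram_byte(): read one byte
 * from a RAM address. The volatile qualifier keeps the compiler from
 * caching or removing the access. */
uint8_t read_ram_byte(uintptr_t ram_address)
{
    return *(const volatile uint8_t *)ram_address;
}
```

Inline assembler is still the right tool when a specific instruction sequence is required; for a simple memory dump like the one Robert describes, the C version avoids the operand-constraint pitfalls altogether.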
Re: [Fwd: Re: [Fwd: Re: [avr-gcc-list] in-line assembler]]
Robert von Knobloch wrote:
> Thanks Jan, I have reached this conclusion too. I didn't understand the compiler/assembler interaction (and still don't fully; I can't get an sts var, Y to work, but I'll work at it).

Is it me, or are you looking for st var, Y (note the missing trailing s)?

HTH
Wouter
Re: [avr-gcc-list] AVR simulators
Schwichtenberg, Knut wrote:
> I'm aware of several AVR simulators running on different OSes: VMLAB, AVRStudio, avrora, simulavr*, and there was one mentioned here specially developed for the gcc regression tests. Could anyone forward the URL for this simulator, please?

It is called avrtest and currently lives in the WinAVR repo:
http://winavr.cvs.sourceforge.net/viewvc/winavr/avrtest/

HTH
Wouter
Re: [avr-gcc-list] Howto I/O in asm instructions?
Ruud Vlaming wrote:
> Hi,
>
> There are a couple of ways to use I/O addresses in assembly. Below I used some of them:
>
>     #define _EEARL_ 0x1E
>
>     uint8_t portFSReadByte(unsigned char * pAddress)
>     {
>         uint8_t result;
>         asm volatile (
>             "in r26, __SREG__ \n\t"
>             "cli \n\t"
>             "out %2, %A0 \n\t"
>             "out _EEARL_, %B0 \n\t"
>             "sbi __EECR__, 0 \n\t"
>             "in %A0, 0x1D \n\t"
>             "out __SREG__, r26 \n\t"
>             : "=r" (result)
>             : "0" (pAddress), "I" (_SFR_IO_ADDR(EEARH))
>             : "r26" );
>         return result;
>     }
>
> (1) You can define a constant, like I did for _EEARL_. This is nice but not so portable.
> (2) You can use the asm parameter list, like for EEARH. Also nice, but error-prone, since I keep making mistakes in the numbering, especially when there are many parameters and something has to change.
> (3) The best thing to have would be something like __EECR__, resembling the definition for __SREG__, but that does not compile right now.
>
> Is the latter possible somehow? Or are there other solutions?

You could go for option 2 if you use syntax like this (taken from eeprom.h):

    __asm__ __volatile__ (
        "/* START EEPROM WRITE CRITICAL SECTION */\n\t"
        "in r0, %[__sreg] \n\t"
        "cli \n\t"
        "sbi %[__eecr], %[__eemwe] \n\t"
        "sbi %[__eecr], %[__eewe] \n\t"
        "out %[__sreg], r0 \n\t"
        "/* END EEPROM WRITE CRITICAL SECTION */"
        :
        : [__eecr]  "i" (_SFR_IO_ADDR(EECR)),
          [__sreg]  "i" (_SFR_IO_ADDR(SREG)),
          [__eemwe] "i" (EEMWE),
          [__eewe]  "i" (EEWE)
        : "r0" );

Named operands remove the numbering problem, which makes the code far more readable IMHO. For more info on the %[ ] notation take a look at: http://gcc.gnu.org/onlinedocs/gcc-4.3.0/gcc/Extended-Asm.html

HTH
Wouter
Re: [avr-gcc-list] Howto I/O in asm instructions?
Ruud Vlaming wrote:
> On Thursday 03 July 2008 08:43, Wouter van Gulik wrote:
>> Ruud Vlaming wrote:
>>> (2) You can use the asm parameter list, like for EEARH ...
>> You could go for option 2 if you use syntax like this
>> http://gcc.gnu.org/onlinedocs/gcc-4.3.0/gcc/Extended-Asm.html
>
> That is indeed a good solution, which I was not aware of. Thank you for the tip; I will use this until I have found an answer to the question below.
>
> Most beautiful would be if you could somehow define __EECR__ 'in the background' once, so it is available in every asm routine you write, the way __SREG__ is defined. Do you know if this is possible at all? It seems avr-libc does not do so by itself, and it is a little less simple than just defining the values somewhere. It must be (automatically) architecture dependent and globally visible.

You could pass -D__EECR__=<value of EECR> on the command line, but having a lot of defines and different architectures is not a good idea IMHO. The __SREG__ defines are built into GCC.

HTH
Wouter
Re: [avr-gcc-list] Checking carry flag and Rotate operation in C.
Jonathan Blanchard wrote:
> Funny enough, I was just debugging some error in my inline assembly code. I was pretty amazed that GCC can actually transform ADC R0, R0 to ROL R0.

In binary it is the same instruction. ROL Rx does not exist as a separate opcode; it is just a short form of ADC Rx, Rx. Likewise:

    LSL Rx  is  ADD Rx, Rx
    CLR Rx  is  EOR Rx, Rx
    SER Rx  is  LDI Rx, 0xFF
    TST Rx  is  AND Rx, Rx

I once found a website with (almost) all duplicates, but I can't find it anymore.

HTH
Wouter

> On Mon, Jun 16, 2008 at 9:18 PM, Andy H [EMAIL PROTECTED] wrote:
>> Internally gcc understands rotate. So I looked up how gcc might expect rotate to be expressed. It is
>>
>>     unsigned char a;
>>     return (a << cx) | (a >> cy);
>>
>> where the two constants cx, cy add up to the size of the mode (1+7 = 8 bits).
>>
>> However, since we have not defined AVR instruction patterns for rotate to gcc, it will produce code using shifts. I think this is worthy of a bug report, or at least a place on the TODO list.
>>
>> Andy
>>
>> Andy H wrote:
>>> The simple answer is that you can't. Though we could do with this in the library and/or gcc patterns (builtin rotate).
>>>
>>> This is close:
>>>
>>>     unsigned char foo(unsigned char b)
>>>     {
>>>         if (b & 128) {
>>>             b <<= 1;
>>>             b ^= 0b00011101;
>>>         } else {
>>>             b <<= 1;
>>>         }
>>>         return b;
>>>     }
>>>
>>>     unsigned char bar(unsigned char b)
>>>     {
>>>         if (b & 128) {
>>>             b ^= 0b0001110;
>>>             b <<= 1;
>>>             b |= 1;
>>>         } else {
>>>             b <<= 1;
>>>         }
>>>         return b;
>>>     }
>>>
>>> Jonathan Blanchard wrote:
>>>> Hi,
>>>>
>>>> I have two questions about programming with AVR-GCC. Both are related to finding a way to generate a specific output in assembler.
>>>>
>>>> First, how do you create the rotate operation in C? Specifically, how can ROL and ROR be generated?
>>>>
>>>> Secondly, I have this piece of code where b is a 16-bit unsigned integer:
>>>>
>>>>     b = b << 1;
>>>>     if( b & 256 )
>>>>         b = b ^ 0b100011101;
>>>>
>>>> To optimize that I only need to check whether b overflows at the left shift operation by checking the carry flag. I'm trying to find a way to do that in C.
>>>>
>>>> Right now I'm using the following piece of inline assembly to do the trick:
>>>>
>>>>     asm volatile(
>>>>         "LSL %0"     "\n\t"
>>>>         "BRCC 1f"    "\n\t"
>>>>         "EOR %0, %1" "\n\t"
>>>>         "1:"         "\n\t"
>>>>         : "+d" (b)
>>>>         : "r" (PPoly)
>>>>     );
>>>>
>>>> In this last piece of code b is an 8-bit unsigned integer and PPoly is an 8-bit unsigned integer with the value 0b00011101.
>>>>
>>>> I'm just curious to know if it's possible to achieve the same result using only C.
>>>>
>>>> Jonathan Blanchard
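Both idioms from this thread can be written as small, portable C functions. The sketch below is an illustration only: rol8 and shift_xor are names invented here, and whether avr-gcc turns them into ROL or an LSL/BRCC pair depends on the compiler version and options (the thread notes that rotate patterns were not defined for AVR at the time).

```c
#include <stdint.h>

/* Rotate an 8-bit value left by n bits (1 <= n <= 7). The two shift
 * counts add up to the width of the type, as Andy describes. */
uint8_t rol8(uint8_t a, unsigned n)
{
    return (uint8_t)((a << n) | (a >> (8 - n)));
}

/* Shift left by one and XOR in the polynomial when the top bit was set.
 * Testing bit 7 before the shift stands in for checking the carry flag
 * after LSL, which plain C cannot observe directly. */
uint8_t shift_xor(uint8_t b, uint8_t poly)
{
    uint8_t carry = b & 0x80;
    b = (uint8_t)(b << 1);
    if (carry)
        b ^= poly;
    return b;
}
```

With poly = 0x1D (0b00011101), shift_xor computes the same result as the foo() function quoted above and as the 8-bit inline assembly with PPoly.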
RE: [avr-gcc-list] Updates needed to AVR test files
Hi Andy / list,

I was just curious: how many issues are still left? IIRC the last time you posted results you had ~50 torture execution and ~500 compile issues.

IIUC this should ideally be patched into the board file in the WinAVR CVS repository, or is there another place where AVR board files live these days (in a GCC repository)?

Wouter

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:avr-gcc-[EMAIL PROTECTED]] On behalf of Andy H
> Sent: Sunday 1 June 2008 1:35
> To: AVR-GCC; Paulo Marques; Mike Stein; Weddington, Eric; Anatoly Sokolov
> Subject: Re: [avr-gcc-list] Updates needed to AVR test files
>
> Small typo in COMPLEX_INT. It should be
>
>     set COMPAT_SKIPS [list {VA} {COMPLEX_INT}]
>
> not plural. Duh!
>
> Andy
>
> Andy H wrote:
>> BTW you can also just define these in the environment and leave the board file unchanged:
>>
>>     set COMPAT_SKIPS [list {VA} {COMPLEX_INTS}]
>>     set COMPAT_OPTIONS [list [list {-Os -mcall-prologues} {-Os -mcall-prologues}]]
>>
>> Andy
>>
>> Andy H wrote:
>>> There are a couple of changes needed to the AVR test files to pass a few tests.
>>>
>>> Compatibility tests default to no optimization and the maximum number of tests; this can easily overflow the 128K code area. Add these lines to the end of the board file (mine is called atmega128-simnew.exp). They set environment variables that control these tests and get many more to work. (Some still need other fixes.)
>>>
>>>     # Restrict compatibility tests. And optimise to reduce size.
>>>     set COMPAT_SKIPS [list {VA} {COMPLEX_INTS}]
>>>     set COMPAT_OPTIONS [list [list {-Os -mcall-prologues} {-Os -mcall-prologues}]]
>>>
>>> The dummy io/exit/abort file exit.c has an unused parameter, stream. The resulting warning causes a failure in some tests. Hack as follows to create a dummy reference to stream, thus removing the warning:
>>>
>>>     int putchar_exit_c(char c, FILE *stream)
>>>     {
>>>         *((volatile unsigned char *) STDIO_PORT) = c;
>>>         stream = NULL;
>>>         return 0;
>>>     }
>>>
>>> best regards
>>> Andy
Re: [avr-gcc-list] Add builtins in avr target.
Anatoly Sokolov wrote:
> Hello.
>
> I have considered all proposals on changing the '__builtin_avr_delay_cycles' builtin. I have also added '__builtin_avr_fmul*' builtins.

There is one thing that crosses my mind. A user would not only want

    void    interrupt_enable (void)
    uint8_t interrupt_disable (void)

but also

    void    interrupt_restore (uint8_t)

In 99% of the cases where interrupt_disable is used, the user needs a way to restore the interrupt flag to its previous state. This feature would also make GCC's set of atomic access builtins reachable. See http://gcc.gnu.org/onlinedocs/gcc-4.3.0/gcc/Atomic-Builtins.html#Atomic-Builtins for more detail.

HTH
Wouter
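Why a restore function matters becomes clear with nested critical sections. The following is a host-side simulation, not the proposed builtins: the status register is an ordinary variable, all sim_* names are hypothetical, and the only AVR fact assumed is that the global interrupt enable flag is bit 7 of SREG.

```c
#include <stdint.h>

#define SREG_I 0x80u                /* global interrupt enable, bit 7 of SREG */

static uint8_t sim_sreg = SREG_I;   /* simulated SREG, interrupts initially on */

void sim_interrupt_enable(void)
{
    sim_sreg |= SREG_I;
}

/* Disable interrupts and return the previous register contents. */
uint8_t sim_interrupt_disable(void)
{
    uint8_t saved = sim_sreg;
    sim_sreg &= (uint8_t)~SREG_I;
    return saved;
}

/* Put back whatever state was in effect before the matching disable. */
void sim_interrupt_restore(uint8_t saved)
{
    sim_sreg = saved;
}

int sim_interrupts_enabled(void)
{
    return (sim_sreg & SREG_I) != 0;
}
```

If the inner section called an unconditional enable instead of restore, interrupts would come back on while the outer section still expects them to be off. Restoring the saved value keeps nesting correct.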
RE: [avr-gcc-list] Illegal opcode errors building gcc-4.2.3
> -----Original Message-----
> From: Rick Mann [mailto:[EMAIL PROTECTED]]
> Sent: Saturday 19 April 2008 19:49
> To: Wouter van Gulik
> Subject: Re: [avr-gcc-list] Illegal opcode errors building gcc-4.2.3
>
> On Apr 19, 2008, at 7:28 AM, Wouter van Gulik wrote:
>> This is because binutils 2.18 does not support avr architecture 35. Use binutils 2.18.5 or more recent. Search the gcc mailing list for some more info.
>>
>> If you are building for Linux, maybe you should take a look at the Linux build scripts sticky post on http://www.avrfreaks.net/index.php?name=PNphpBB2&file=viewtopic&t=42631
>> You need to be logged in to actually download the scripts. These scripts are for 4.2.2.
>
> I finally realized I had to be logged in to find those. I build for Mac OS X, using a script I made myself, but it does not apply patches (it makes the mistaken assumption that recent GCC versions actually work out-of-the-box for AVR). My script also builds GDB and AVaRICE.
>
> I don't see a binutils-2.18.5 where I normally look: http://ftp.gnu.org/gnu/binutils/
> Is there another place I should be looking?

ftp://sourceware.org/pub/binutils/snapshots

It is the snapshot. I do not know if the current version is broken for the AVR.

> I finally used your scripts and got a toolchain working that supports the ATmega324P (the need that triggered all this). Thank you for those.

The scripts are not mine. That would be too much credit. But it is good to know your setup works.

HTH,
Wouter

PS: please use reply-all next time, so it gets to the list, just in case someone else has the same problem.
Re: [avr-gcc-list] Add builtins in avr target.
Anatoly Sokolov wrote:
> Hi.
>
> I wish to add:
> ..
> 3. builtin similarly to IAR '__delay_cycles';
> ..
>
> This unfinished patch adds the '__builtin_avr_delay_cycles(long delay)' builtin to the avr backend. The 'delay' parameter should be constant. If 'delay' is 1 or 2 then one or two 'nop' instructions are generated.

For a 2-cycle delay an rjmp can be used. Saves an instruction!

> If 'delay' is from 3 to 756 then code:
>
>         ldi rX, (delay/3)
>     1:  dec rX
>         brne 1b
>
> is generated. The 'ldi' instruction can be removed by the optimizer.
>
> For 'delay' from 757 to 196605 the loop is:
>
>     1:  sbiw Rx,1
>         brne 1b
>
> For 'delay' from 196606 to 83886075 the loop is:
>
>     1:  subi %0,1
>         sbci %B0,0
>         sbci %C0,0
>         brne 1b
>
> And for 'delay' from 83886076 to 0x the loop is:
>
>     1:  subi %0,1
>         sbci %B0,0
>         sbci %C0,0
>         sbci %D0,0
>         brne 1b

That is high register usage. Four registers used just for burning cycles? On the other hand, burning cycles this way will probably never be used in real code.

> Adding the '__builtin_avr_delay_cycles' builtin will allow removing restrictions on the maximum possible parameter values for the '_delay_us' macro and reduce code size for long delays with the '_delay_ms' macro. It will also simplify porting code from IAR C, if '__delay_cycles' is defined as '__builtin_avr_delay_cycles'.
>
> Do you consider this builtin useful?

Yes, it is useful, especially the really short ones.

Wouter
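The selection logic described in the patch summary can be written down as a small classifier. This is a sketch of the ranges exactly as quoted in the message, not the code from Anatoly's patch; the enum and function names are hypothetical, and the treatment of a zero delay is an assumption since the message does not mention it.

```c
#include <stdint.h>

typedef enum {
    DELAY_NONE,    /* 0 cycles: nothing emitted (assumption, not in the message) */
    DELAY_NOPS,    /* 1..2: one or two nop instructions */
    DELAY_LOOP8,   /* 3..756: ldi / dec / brne, one register */
    DELAY_LOOP16,  /* 757..196605: sbiw / brne, register pair */
    DELAY_LOOP24,  /* 196606..83886075: subi / sbci / sbci / brne */
    DELAY_LOOP32   /* 83886076 and up: subi / sbci x3 / brne */
} delay_kind;

/* Pick the code shape for a constant delay, using the thresholds
 * quoted in the patch description. */
delay_kind classify_delay(uint32_t cycles)
{
    if (cycles == 0)          return DELAY_NONE;
    if (cycles <= 2)          return DELAY_NOPS;
    if (cycles <= 756)        return DELAY_LOOP8;
    if (cycles <= 196605UL)   return DELAY_LOOP16;
    if (cycles <= 83886075UL) return DELAY_LOOP24;
    return DELAY_LOOP32;
}
```

Seen this way, Wouter's remark about register pressure applies only to the last two classes; delays up to 756 cycles need a single register and delays up to 196605 need one register pair.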
Re: [avr-gcc-list] RE: Patch Fix PR35013, PR27192
Andy H wrote:
> I think I have found a simple fix.
>
> I changed gcc so that offsets added to assembler symbols are doubled. So in C, when we use foo+2, this gets sent to the assembler/linker as gs(foo+4). This has the effect that offsets or arithmetic are consistently in words, on a word pointer (which makes more sense).
>
> Now it does not matter if optimisation creates p=foo+2 OR p=foo, p=p+2, as the result will be the same.
>
> I attach the test program I used to check several variants, and it worked, apart from the normal warning messages about linker stubs. There are also lst and lss files in which you can look at what gets sent to the assembler and the code produced.

It looks OK to me (just looking at the lss, not rebuilding gcc), but the code is not optimal. It is moving to r18, doing the operation, and then moving it back to r24. Is this because of your patch? Or something else?

HTH,
Wouter
Re: [avr-gcc-list] optimization flags causes problems??
Ramazan Kerek wrote:
> Hello,
>
> I have started using the AT90CAN128 with WinAVR-20080402. I am having problems with optimization flags.

Do not use 20080402, use 20080411.

HTH
Wouter
Re: [avr-gcc-list] [AVR gcc link error] relocation truncated to fit: R_AVR_13_PCREL ...
Emmanuel Bourien wrote:
> Hello,
>
> I downloaded and installed the latest version of WinAVR (April 08, GCC 4.3.0). I get this "relocation truncated to fit: R_AVR_13_PCREL" error during the link operation. I'm working with AVR Studio 4.12. I've read I should add -lm to the linker command to avoid this... but no success :(
>
> So I need your help to avoid this problem! Excuse my English ;) Thanks for your help.
>
> Best regards
> emmanuel

You are almost there. You need to add -lm at the end of the linker step. See http://www.nongnu.org/avr-libc/user-manual/FAQ.html#faq_libm

HTH,
Wouter
Re: [avr-gcc-list] Re: Patch Fix PR35013, PR27192
Andy H wrote:
> RFC
>
> A problem has come up trying to fix function pointer arithmetic bugs.
>
>   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35013
>   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27192
>
> I created a patch to solve this, but Anatoly has found a problem.
>
> Without the patch we had func returning a word address and (func + 2) returning a byte address. This occurred because the AVR backend did not recognise (func+2) as program memory and did not use the linker directive gs(func+2).
>
> With the patch we get func and func+2 both returning word addresses, which solved the bugs reported. Now if such a word pointer undergoes further arithmetic in C, then it will, of course, be adjusted by words.
>
> The problem that Anatoly discovered is that optimization can break this. His example involved volatile pointers, but it will happen in more realistic cases. For example, if we create pointers to Foo, Foo+2, Foo+4, optimization will try to use one pointer to create the values of the other pointers. So we will get the word address of Foo offset by words, or the word address of Foo offset by bytes! This just depends on whether the offset calculation is done in gcc or the linker. Ag!

Just for my understanding: the program pointer calculation is thus translated to a data pointer calculation? There probably is no way of telling GCC we are dealing with different pointer types here, since GCC cannot handle different memory types at all.

> To fix this is not simple. The following are ideas only; please add some more.
>
> 1) One way is to get the linker to perform maths on gs() using word offsets. gs(foo+2) would be the word address of foo + 2 words. Then it would not matter if gcc or the linker performed the maths. I do not know if this is possible or what problems this might create.
> 2) We could avoid using gs() and get the function address as byte address + byte offset. This would require extra run-time code for function pointer calls (to divide by two). It is useful in that the same pointers could be used to read bytes from program memory.
> 3) Give up and don't permit any arithmetic. (Perhaps an error or warning would still be needed.)
> 4) Like (1), but use a new directive gsw() to permit this method?
> 5) Like (2), but use an attribute to permit this method?
> 6) Get gcc to recognize constant pointer maths and exclude it from linker gs().
> 7) Get gcc to recognize constant pointer maths and pass it to the linker as gs(Foo) + n instead of gs(Foo+n), if this is possible.
>
> Please add to the discussion.

So the main point is to keep the knowledge that the add is in words, not bytes. Right? Is it not possible to do all program space addition on byte level until the linker, which then converts it to words? E.g. foo+2 becomes foo+linker_know_this_is_2_words.

Wouter
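The mismatch Andy describes can be reproduced with plain integer arithmetic. This sketch makes a simplifying assumption: gs() is modelled as "add the offset in bytes, then halve to get a word address", which ignores anything else the linker may do (such as inserting stubs). The function names and the address 0x100 are made up for the illustration.

```c
#include <stdint.h>

/* Model of gs(symbol + offset): the linker adds a byte offset to the
 * symbol's byte address, then converts the result to a word address. */
uint16_t gs_word_addr(uint16_t byte_addr, uint16_t byte_offset)
{
    return (uint16_t)((byte_addr + byte_offset) / 2);
}

/* Arithmetic performed by gcc on a pointer that already holds a word
 * address: the offset is simply added. */
uint16_t word_ptr_add(uint16_t word_addr, uint16_t offset)
{
    return (uint16_t)(word_addr + offset);
}
```

For a function at byte address 0x100 (word address 0x80), gs(foo+2) yields 0x81 while gcc computing foo and then adding 2 yields 0x82, so the two routes to "foo+2" disagree. Doubling the offset before it reaches the linker, i.e. gs(foo+4), gives 0x82 on both routes.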
RE: [avr-gcc-list] testsuite saga continues
> The logging version will always be slower. This is not just a matter of outputting the log, it is also a matter of building the log. We can avoid the output cost by only printing the last N lines, but we cannot avoid the build cost. The code to do this was there at some point, but I decided to remove it, because under Linux you can probably do the same by running "avrtest_log test_program | tail -n N" and it should run almost as fast as a native solution.

The only information you print is register info, right? Since the parsing is so heavy, does it make sense to save the complete register file (up to SPH) and then parse afterwards? It is just 96 bytes. The only info missing would be addresses to/from memory, but that could be ignored, since it's only load/store. When the address is absolutely relevant, just re-run using the log version. Just thinking out loud, it is probably nasty to create and the gain is almost nothing...

> So, I can add a --tail option to the log version, but the naked version will never be able to print any log at all, so that it runs as fast as possible. Remember that the main purpose of avrtest is to run gcc's testsuite. While running the testsuite, having a log is useless, but speed is important.

Yes you are right, you can't have it all.

> BTW, I've done some more optimizations and the version I have now is almost twice as fast as the one on CVS, doing 30 P4 clocks per AVR clock, i.e., on my P4 3GHz I can simulate a 100MHz avr :)

> I don't have those numbers right now, but since there are tests that don't even fit in 128Kb of flash, there are probably some more that don't fit on 8Kb.

Aha, well it is going to be hard to test the 8Kb then. Do these tests exceed 128KB even in an optimized build? I can imagine them not fitting when using -O0.

>> Do you already have a format for doing this? XML based?
> Nope. I haven't even started to think about the details. I would give my full support to anyone trying to set up a benchmarking framework, though ;)

Hmm, nope, never done such a thing before. Let me first try to get gcc compiling on my Windows machine. It is a lot faster compared to my Linux machine (DualCore 2.1GHz vs Duron 1.3 GHz)

Wouter
___ AVR-GCC-list mailing list AVR-GCC-list@nongnu.org http://lists.nongnu.org/mailman/listinfo/avr-gcc-list
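The "only print the last N lines" idea discussed above can be sketched as a small ring buffer. This is only an illustration, not avrtest's code: the names tail_log, TAIL_LINES and LINE_MAX_LEN are made up here. It also shows the point made in the thread: every line still has to be built and copied, so only the output cost is saved.

```c
#include <stdio.h>
#include <string.h>

#define TAIL_LINES   4    /* hypothetical N: how many log lines to keep */
#define LINE_MAX_LEN 80   /* hypothetical maximum length of one log line */

typedef struct {
    char lines[TAIL_LINES][LINE_MAX_LEN];
    unsigned long count;  /* total number of lines ever added */
} tail_log;

/* Store one line, overwriting the oldest once the buffer is full. */
void tail_log_add(tail_log *log, const char *line)
{
    char *slot = log->lines[log->count % TAIL_LINES];
    strncpy(slot, line, LINE_MAX_LEN - 1);
    slot[LINE_MAX_LEN - 1] = '\0';
    log->count++;
}

/* Number of lines currently held (at most TAIL_LINES). */
unsigned tail_log_size(const tail_log *log)
{
    return log->count < TAIL_LINES ? (unsigned)log->count : TAIL_LINES;
}

/* Return the i-th retained line, oldest first (0 <= i < tail_log_size). */
const char *tail_log_get(const tail_log *log, unsigned i)
{
    unsigned long first = log->count - tail_log_size(log);
    return log->lines[(first + i) % TAIL_LINES];
}

/* Print the retained lines, e.g. at program exit or abort. */
void tail_log_dump(const tail_log *log, FILE *out)
{
    for (unsigned i = 0; i < tail_log_size(log); i++)
        fprintf(out, "%s\n", tail_log_get(log, i));
}
```

Calling tail_log_dump() once at exit would give roughly what piping the full log through "tail -n N" gives, without writing the discarded lines.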
RE: [avr-gcc-list] testsuite saga continues
> Just a little note to let you know that the above change and other small cleanups have been committed to cvs. From the changelog:
> - give more information at program exit
> - cleanup a lot of #ifdef's
> - change the timeout from cycles to instructions, because the simulator runs slightly faster this way
> - add a barrier for the stack at 0x60, that makes avrtest abort with stack overflow when crossed

Yes I have seen it, and used it. My clz no longer passes the test. It bails on stack overflow. But if I comment out the long long parts, all is ok.

> The next step will definitely be ELF loading support. With ELF loading, I can decode symbols like __bss_end to know where the stack overflows exactly or use __stack to know where the stack underflows. I can also do a more symbolic log, by decoding addresses to their symbol names.

That would be very cool! Although a dump log at an abort might also be useful when debugging testcases.

Some more thoughts about the smaller avr's. I did not intend to catch wrong instructions; I was aiming at finding bugs that do not apply to the mega series. If there are bugs in the less capable devices, they are very likely to be in the avr backend, which is easier to fix. Can't avrtest/gcc fake an avr2 device (e.g. at90s8535) with tons of flash and ram? Just like you now fake huge amounts of external memory?

Thanks for the good work!

Wouter
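As an illustration of the stack barrier from the changelog, here is a minimal sketch of a simulated push that refuses to cross 0x60. It is not avrtest's implementation: sim_cpu, sim_push and RAM_SIZE are hypothetical names, and the only facts taken from the thread are the barrier address and that crossing it is treated as a stack overflow.

```c
#include <stdint.h>
#include <stdbool.h>

#define STACK_BARRIER 0x60u   /* from the changelog: barrier at 0x60 */
#define RAM_SIZE      0x1000u /* hypothetical size of the simulated data space */

typedef struct {
    uint8_t  data[RAM_SIZE];  /* simulated data space (registers, I/O, SRAM) */
    uint16_t sp;              /* simulated stack pointer */
    bool     overflow;        /* set once the stack crosses the barrier */
} sim_cpu;

/* Simulate an AVR push: store at SP, then decrement SP.
   The write is refused and overflow is flagged if SP has dropped below
   the barrier, so a runaway stack cannot scribble over the register
   file or the I/O locations. */
bool sim_push(sim_cpu *cpu, uint8_t value)
{
    if (cpu->sp < STACK_BARRIER || cpu->sp >= RAM_SIZE) {
        cpu->overflow = true;
        return false;
    }
    cpu->data[cpu->sp] = value;
    cpu->sp--;
    return true;
}
```

A simulator main loop could check the overflow flag after each instruction and abort with a "stack overflow" message instead of continuing with corrupted I/O state.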
Re: [avr-gcc-list] testsuite saga continues
Andrew Hutchinson wrote:
> One might argue that carry is the result of a compare with the largest integer value (255 for bytes). But these situations do not directly arise in C - or I assume any other supported language - so it is not considered. (Though the ability to propagate carry would indeed help create mode independent arithmetic operations.)

Having carry as a condition code indeed does not seem very useful. But the biggest benefit of teaching gcc about the carry is the propagation of the carry; that is my main concern. Is it not possible to create a special register for carry (not in cc0), just for doing arithmetic using carry? This would let sub/add/shift/cmp(?) be expanded into simple byte patterns, giving gcc much more knowledge of what's going on. This is close to what Dave has suggested in the other thread. I have too little knowledge of gcc's further internals to oversee all consequences; I guess there are very good reasons not to do this.

Wouter
FW: [avr-gcc-list] testsuite saga continues
On behalf of Andy I am forwarding this:

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Subject: Re: [avr-gcc-list] testsuite saga continues

Wouter,

You are correct. gcc will treat any structure that is 8 bytes or less as a single register and likely have the same problem. This would be structures. I am trying to come up with fixes to several bugs and hopefully this one sometime.

avr-gcc is odd in that it accepts 8 bytes as maximum - rather than 4. I am not sure why it was needed without also all the other parts for long long that must work. However, perhaps this was intended for double support.

To get long long to work, it would be wise for all other patterns to be reduced to byte level expansion (where possible); this would exclude shift/compare/add etc. that need carry operations. That would minimise the amount of work required. New patterns would include move DI (8 bytes) and perhaps add/sub/compare.

Also, it would be wise to change the priority of register allocation (which I have a fix for), since the current allocation does not easily permit 8 contiguous registers to be allocated (the first is r24!). So you get a mess of stack operations and register moves.

Of course it will still be slow!

I do not intend to fix problems for long long - unless they also fix problems in other areas.

Please post to list - this email will not work for me.

-Original Message-
From: Wouter van Gulik [EMAIL PROTECTED]
To: Paulo Marques [EMAIL PROTECTED]
Cc: avr-gcc-list@nongnu.org
Sent: Wed, 30 Jan 2008 3:56 am
Subject: Re: [avr-gcc-list] testsuite saga continues

>> Do you have a clue on why the tests fail? There is an ugly bug concerning stack allocation and 64 bit variables, maybe that is the evil one. See: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27386 for details.
> All the failed tests I've seen so far do in fact pass long long arguments to functions together with a bunch of other arguments (sometimes using va_args, too). In one of the cases (gcc.c-torture/execute/20030307-1.c), the test only fails at -O0 and -O1, but passes with other optimization levels because the functions get inlined and disappear completely, so the argument passing problem disappears too. So, I would say that it is very likely the same bug...

So this means that 64 bit is mostly supported; it is only the stack allocation bug that makes it hard (if not impossible) to use. We should really try to find someone who can fix this nasty bug. Note that all sorts of stack-passed arguments can go wrong, so it's not only 64 bit.

Wouter
RE: [avr-gcc-list] testsuite saga continues
> In the meantime, I tried to not mark long long as unsupported, with similar results. Without no_long_long there are 32 more tests that fail, but 556 fewer tests are marked as unsupported. Which means that 64 bit long long's are mostly supported in fact. The question is: are long long's officially supported? Should we be running the tests that use them? BTW: 64 bit long long is really hard for an 8 bit microcontroller. At least one of the tests (with -O0 optimization) was initially failing from timeout, which means that it was taking more than 500 million cycles to execute. Increasing the timeout to 2 billion cycles solved the problem, though.

Well today I have found out why this could be. I am testing a new version of the clz fixes and I also implemented some DI versions (DI = double int = 64 bit in gcc's internal terms). To my surprise some options did not change a thing in cpu cycles, while the program got much shorter... So I took another look at it, and guess what... The stack usage was too much, so it was now pushing its values into I/O memory, including the special exit code memory. The program now exited successfully on a push r15 :D

Can you make avrtest check on stack overflow?

Wouter
Re: [avr-gcc-list] testsuite saga continues
Paulo Marques wrote:
> The program used more than 4k of stack? Yikes!

Well I think it's the 64 bit stack bug... if anything goes wrong with the stack you might end up having a huge stack. It's a bug in the program.

>> Can you make avrtest check on stack overflow?
> I can, especially if I start accepting command line arguments to define memory regions, so that I also know where the stack really ends. I'll post a new version as soon as I have this. In the meanwhile, you can work around that specific problem by switching the addresses of the exit and the abort ports, so that the abort port is hit first ;)

Yes I already thought about doing so. Could you then also print the real flash address of the exit, just like you do with the log? And the total number of cycles passed.

Thanks in advance,
Wouter
Re: [avr-gcc-list] testsuite saga continues
> I have not dug enough into the details of gcc, but I thought that flags were only visible at a low level, such as in the avr.md file, where you are defining the assembly code sequences for different effects. Thus it is possible to define a 16-bit addition instruction with an add, adc sequence - but you can't really make use of the carry flag after that.

Yes this is exactly what I wanted to point out. The carry is now only used in handwritten assembler (in avr.md). GCC's RTL knows nothing about the carry bit: not that it is available, nor when it is set, cleared or clobbered.

HTH,
Wouter
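To make the add, adc sequence concrete, here is a small host-side C model of a 16-bit addition built from two byte additions with an explicit carry. It is only a sketch of the arithmetic; the function names add8 and add16 are invented for this example and have nothing to do with the patterns in avr.md.

```c
#include <stdint.h>

/* 8-bit add with carry in and carry out, mimicking AVR add/adc. */
uint8_t add8(uint8_t a, uint8_t b, unsigned carry_in, unsigned *carry_out)
{
    unsigned sum = (unsigned)a + b + carry_in;
    *carry_out = sum >> 8;          /* 1 if the result did not fit in 8 bits */
    return (uint8_t)sum;
}

/* 16-bit add expressed as a low-byte add followed by a high-byte adc. */
uint16_t add16(uint16_t a, uint16_t b, unsigned *carry_out)
{
    unsigned c;
    uint8_t lo = add8((uint8_t)(a & 0xFF), (uint8_t)(b & 0xFF), 0, &c);    /* add */
    uint8_t hi = add8((uint8_t)(a >> 8), (uint8_t)(b >> 8), c, carry_out); /* adc */
    return (uint16_t)(((unsigned)hi << 8) | lo);
}
```

The carry that leaves add16 is exactly the information the thread says is lost: inside a fixed add/adc pattern it is used once, but nothing outside the pattern can consume it.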
RE: [avr-gcc-list] (no subject)
> Try casting the expression (foo + 1) to a function pointer. This may prove the point and correct the code in table - and elsewhere. Arithmetic with function pointers is probably not well standardized - if at all.

I agree. Although giving different results for the same statement is very nasty. I tried casting but I could not get any difference. When I tried to cast both operands to a function pointer I got a message that + is not a legal operation. Which seems ok and logical to me. Is there any way we can make this warning an error (by default)?

    main.c:14: warning: internal error: out of range error

So the user knows something is broken in his code. Thanks for the help

Wouter

> Andy
>
> Wouter van Gulik wrote:
>>> http://gcc.gnu.org/onlinedocs/gcc-4.2.2/gcc/Pointer-Arith.html#Pointer-Arith
>>> The elements to which function (and void) pointers refer are assumed to be size 1 byte. So if you really want to mess with these pointers, you must treat them as byte addresses.
>> Exactly what I thought. If I do nasty things I should know what I am doing.
>>> That does not explain the other problems reported directly. However given foo is a function pointer, what is the type of the expression foo + 1? Perhaps gcc treats such an expression as void?
>> Exactly, the table is still messed up by gcc. Any idea how I/we can test this? I looked at the -da output but I could not find anything related to the table. Is the avr backend involved in generating the correct function pointer addresses? Where is this gs() coming from? I searched through the as documentation but I could not find it. Is it from ld? Are there other platforms supported by gcc having the same strange non equal data/program space? I know that the TI C54x series have a 8 bit program space, and a 16 bit data space. sizeof(char) == sizeof(int), both 16-bit!, but instructions and function addresses are in bytes. Maybe we can find some hints there?
>> HTH Wouter
>>> Andy
>>>
>>> Andrew Hutchinson wrote:
>>>> I think you highlight the problem for gcc. We have to treat program memory as byte addressable to support LPM. Direct function calls only want a word address to form the correct opcode. But we use byte address labels and the assembler removes the redundant bit to form the correct opcode. Indirect (icall) functions show up the anomaly as these are formed outside of the assembler. Gcc is assuming that the item that a function pointer points to is size 1, when in fact it is size 2. This is similar to having a pointer to some other object such as long: long *ptr; x = ptr+1; /* x will be assigned byte address ptr+4 */ So if we can correct that mistake, I believe the problem is resolved. Now, I am not sure how gcc determines that size! So I will look.
>>>> Andy
[avr-gcc-list] (no subject)
Compiling the following program ends up in

    main.c:(.text+0x2): warning: internal error: out of range error

main.c:

    //Dummy func
    void foo(void) {}

    //Table with address manipulation
    void (* const pFuncTable[]) (void) = {
        foo + 0,
        foo + 1, //need odd offset
    };

    int main(int argc, char* argv[])
    {
        //Call table
        pFuncTable[1]();
        return 1;
    }

Looking into the generated assembler gives:

    pFuncTable:
        .word gs(foo)
        .word foo+1

Which is wrong. It should have been gs(foo + 1) or perhaps gs(foo)+1. But the truly wrong thing is that gcc outsmarts the table (since it's const) and directly does: call foo+1. This gives the internal error. Even worse is that the compiler does not stop!! IMHO it should stop here; instead it generates this final assembly:

    00a6 main:
      a6: 0e 94 00 00   call 0      ; 0x0 __vectors
      aa: 81 e0         ldi r24, 0x01 ; 1
      ac: 90 e0         ldi r25, 0x00 ; 0
      ae: 08 95         ret

Before I post a note to the existing bug report (it's probably related to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27192 ) I want to know what foo + 1 is supposed to do. GCC seems to mix up byte addresses (for lpm) and word addresses (for ijmp/jmp/icall/call). Is it supposed to increment the byte address or the word address? I guess byte addresses are what it's supposed to be, since calling foo + 2 ends up in calling foo + 2 bytes. That leaves foo + 1 as an illegal address.

And I just found another nasty error:

    //Dummy func
    void foo(void) {}

    //Table with address manipulation
    void (* const pFuncTable[]) (void) = {
        foo + 4, //need odd offset
    };

    int main(int argc, char* argv[])
    {
        //Call table
        pFuncTable[0]();
        return 1;
    }

This will generate a correct call (4 bytes after foo) but the value in the table is not left shifted! Meaning that a call via the table will generate a call to the wrong address, while the original call is ok.

Wouter
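The byte/word confusion described above can be illustrated with a few helper functions. This is a host-side sketch only: the helper names and the example address 0xA0 for foo are assumptions made for illustration, not anything gcc or binutils provides.

```c
#include <stdint.h>
#include <stdbool.h>

/* AVR instructions are 16 bits wide, so a code address expressed in
   bytes must be even to point at an instruction boundary. */
bool is_valid_code_byte_addr(uint32_t byte_addr)
{
    return (byte_addr & 1u) == 0;
}

/* Byte address (as used by lpm and by assembler labels) to
   word address (as used by the program counter, ijmp and icall). */
uint32_t byte_to_word_addr(uint32_t byte_addr)
{
    return byte_addr >> 1;
}

/* Word address back to a byte address. */
uint32_t word_to_byte_addr(uint32_t word_addr)
{
    return word_addr << 1;
}
```

With a hypothetical foo at byte address 0xA0, foo + 4 is byte address 0xA4, which is word address 0x52. Storing 0xA4 unconverted in a table that icall reads as word addresses would jump to byte address 0x148 instead, which is the kind of mismatch the second example in the message shows.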
[avr-gcc-list] Small bug in avrtest
Hi list/Paulo,

I just wanted to let you know before someone else also spends an evening searching for this bug :D The file is opened with the "rt" option. I don't know what it's supposed to do, but it makes my MinGW Windows compiled version read only part of the file (about half). This leads to "illegal pc out of bounds" errors. I changed it to "rb" and all was good.

HTH,
Wouter
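A minimal sketch of loading a raw binary image with "rb" follows. The function name load_image is made up for this example and is not avrtest's loader. The likely cause of the short read, stated here as an assumption about the Windows C runtime rather than something confirmed in the thread, is that text mode translates CR LF pairs and treats a Ctrl-Z byte (0x1A) as end of file, both of which occur in arbitrary binary data.

```c
#include <stdio.h>
#include <stddef.h>

/* Read up to max bytes of a raw binary image into buf.
   Returns the number of bytes read, or 0 if the file cannot be opened. */
size_t load_image(const char *path, unsigned char *buf, size_t max)
{
    /* "b" asks for binary mode: no newline translation and no special
       end-of-file byte. On POSIX systems it is accepted and has no effect. */
    FILE *f = fopen(path, "rb");
    if (!f)
        return 0;
    size_t n = fread(buf, 1, max, f);
    fclose(f);
    return n;
}
```

Comparing the returned count with the expected file size is a cheap way to catch a truncated load before the simulator starts executing garbage.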
Re: [avr-gcc-list] Optimisation help
Magnus Johansson wrote:
> I totally get the second and third reads. But the first one, just moving r24 to r17, will only work if r24 is only 0x00 or 0x01, not otherwise...? What should I do?

Well I can't see all the assembler so this is a bit of a guess. GCC is probably going to do a conditional load. Since result/r17 could be loaded after the first call it will do so. So it will probably generate a clr and a conditional ldi r17, 1 or something alike. It would help if you provided all of the assembler between the first and the second call.

HTH
Wouter
Re: [avr-gcc-list] How to get low byte off a function address?
Erik Christiansen wrote:
> On Sat, Jan 19, 2008 at 04:15:35PM +0100, Wouter van Gulik wrote:
>> How do I do such a thing? Using the lower 8 bits is possible when loading a register so why not in a table?
> In the past, we've encountered other relocations that aren't handled by the avr port of binutils. It does look like this is another case that avr-ld hasn't been tweaked to handle. It sounds like you've tried one work-around, i.e. loading a register, then writing to the table, now necessarily in RAM. That's workable, code space allowing.

Ehm no, not exactly. I am wondering why something like this works:

    ldi r31, lo8(gs(foo))

and this not:

    .byte lo8(gs(foo))

Why does as or ld (?) state that the latter is not constant, while the first is no problem? After some more testing I found out that constructions like:

    .byte lo8(1024)

are not allowed. Is this a bug?

> The learning curve for binutils internals being a bit too steep for a quick toolchain tweak, I'd alternatively be tempted to invoke a few lines of awk (from the makefile) to snaffle the absolute addresses from the map file, insert them in the table, reassemble that file, and link again. (Pre-existing dummy .byte lines would ensure addresses don't move in flash.) That's perhaps worthwhile if you're chasing this either because a RAM-resident table, or the copy loop, is intolerable in the tiny bootloader. Granted, this comes close to winning an ugly contest, but it pretty much has to work(tm). If the file with the function pointer table is linked last, then the others can be incrementally linked, and the table file linked after being awked.

I am not afraid of winning the contest. As long as it saves flash I am in for it :D That would be another option, yes.

> An afterthought: You could alternatively put the foo() functions into a separate output section, allowing the linker script to place the block of them at a fixed address. (Each could in fact be placed in an individual section.) The table could in the latter case be filled with constants. (It's not real pretty either, is it?)

Well, the whole idea was to have it constant. I wanted to reduce the big cpi/brne tree and so I came up with this reduced-size jump table idea. The table in the real application should also contain an opcode. So the idea was to check against the opcode and then ijmp/icall to the correct function. After I wrote the assembler it turned out to be 2 bytes shorter than my cpi/brne... But I thought it reads better and it is easier to extend the table. So I kept on trying, but no success yet.

>> Hope you can help
> Hope some of the above does, at least a little. :-)

Thanks anyway!
Wouter
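The opcode-plus-handler table idea can be sketched in portable C with full-width function pointers. This sidesteps the 8-bit relocation problem rather than solving it, and every name here (command_entry, dispatch, the opcodes 0x10 and 0x20) is hypothetical. On a real AVR the table would presumably live in flash and be read with lpm, which this host-side sketch does not attempt.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

typedef void (*handler_fn)(void);

typedef struct {
    uint8_t    opcode;    /* command byte to match */
    handler_fn handler;   /* function to call for that command */
} command_entry;

static int last_called;   /* demo state so the effect is observable */

static void cmd_read(void)  { last_called = 1; }
static void cmd_write(void) { last_called = 2; }

/* Hypothetical command set; extending it means adding one table row. */
static const command_entry command_table[] = {
    { 0x10, cmd_read  },
    { 0x20, cmd_write },
};

/* Look up the opcode and call its handler (the C equivalent of comparing
   against each table entry and then doing an icall).
   Returns false if the opcode is unknown. */
bool dispatch(uint8_t opcode)
{
    for (size_t i = 0; i < sizeof command_table / sizeof command_table[0]; i++) {
        if (command_table[i].opcode == opcode) {
            command_table[i].handler();
            return true;
        }
    }
    return false;
}
```

This trades the cpi/brne chain for one loop plus data, which is where the readability and extensibility argument in the message comes from; whether it is actually smaller depends on the number of entries.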
[avr-gcc-list] How to get low byte off a function address?
Dear list,

How do I do this:

main.c:

    void foo(void)
    {
    }
    char table[2] = {
        (foo),
        (foo),
    };

    int main(int argc, char* argv[])
    {
        int adr = table[argc]+0x3F00;
        ((void (*) (void))adr)();
        return 0;
    }

It gives:

    main.c:5: warning: initialization makes integer from pointer without a cast
    main.c:5: error: initializer element is not computable at load time
    main.c:5: error: (near initialization for 'table[0]')
    main.c:6: warning: initialization makes integer from pointer without a cast
    main.c:6: error: initializer element is not computable at load time
    main.c:6: error: (near initialization for 'table[1]')

Why is it not computable? Why is it computable when I make it 16 bits? I first wanted to implement this in assembler. But I got all sorts of errors, mainly gas refusing to see that I only want 8 bits, not 16. So I figured I would go and try this in C, but I can't get it done.

You might wonder why on earth I would want to do such a strange thing. Well, I have a very small bootloader, so I know all my functions are within the 512 byte/256 word boundary. I therefore have no need to store the full 16-bit address (in order to keep my bootloader small). I just need the low byte of the address. How do I do such a thing? Using the lower 8 bits is possible when loading a register so why not in a table?

In assembler I tried this:

    .byte lo8(foo)
    .byte pm_lo8(foo)
    .byte lo8(pm(foo))

But all are with the same result:

    Error: illegal relocation size: 1
    Error: junk at end of line, first unrecognized character is `('

Hope you can help

Wouter
Re: [avr-gcc-list] '-morder' option with Avr-libc: comparison table
Hi all.

If I run -morder1 and -frename-registers on my test programs, they grow in size. The difference is probably that my test programs (sorry, I can't release sources) are not using any 32 bit variables and hardly any 16 bit ones.

Test summary:

          | -morder1 + -frename-registers | -morder1
    test1 | bigger                        | smaller
    test2 | bigger                        | smaller

Test1 grows from 9886 bytes to 9930 bytes, an increase of 44 bytes. Most of it (38 bytes) is in one file. There is not one file that gets smaller. In Test2 there are several files that get a little bit smaller. But again there are 2 that get larger, including one interrupt routine that now uses an extra register and thus requires an extra push/pop. So I think -frename-registers works very well for 32 bit variables but not for applications not using any 32 bit variable.

HTH,
Wouter

> Hi. Summary results for Avr-libc CVS HEAD 2008-01-13, only C-functions. Values (base variant) are slightly different from ones of 10 Jan, due to bug #21995 is fixed. GCC 4.3.X is 4.3-20080104 snapshot.
Re: [avr-gcc-list] Missed optimisations
David Brown wrote:
> The code is basically good (the swap instruction is used for the shifts, which is very nice - a big improvement over the older 3.4 gcc), but there are a few missed optimisations shown here that are probably quite common in other code. Why is the address of crcTable8n loaded into r18:r19 first, before being copied into r30:r31 for the address calculation? It seems that this happens when the address is reused - if it is not reused, then r30:r31 are loaded directly. However, the reuse does not benefit from having the address in a register - the add r30, r18 and adc r31, r19 on lines 68 and 69 could be replaced with subi and sbci instructions to save space and time, and to free registers r18:r19. On most RISC cpus, storing the address in a register for reuse would be a benefit, which is probably why this code is generated - on the AVR, it is not helpful (at least, not here).

I don't know. But it happens more often that registers are not re-used when they could have been. Maybe because lpm is in a macro. Try replacing it with a normal table index. If that helps, write the ld r??, Z in an assembler macro to be sure.

> Secondly, the (data & 0x0f) clause generates messy 16-bit code. I realise C requires integer promotion in such cases, but it's important to try to remove unnecessary code such as loading the high register with zero, then anding it with zero, then eoring it. gcc version 3.4.6 was sometimes marginally better at such code. It should be noted that the quality of the generated code depends very much on the exact expression - the original [(crc >> 4) ^ (data & 0x0f)] generates poor code, while the equivalent [((crc >> 4) ^ data) & 0x0f] generates tight code.

Hmm, yes it really gets messy on r31/r23:

    62 0020 F0E0   ldi r31,lo8(0)   ; load with 0
    63 0022 70E0   ldi r23,lo8(0)   ; load with 0
    64 0024 6F70   andi r22,lo8(15) ;
    65 0026 7070   andi r23,hi8(15) ; re-load R23 with 0
    66 0028 E627   eor r30,r22      ;
    67 002a F727   eor r31,r23      ; zero XOR zero == 0
    68 002c E20F   add r30,r18      ;
    69 002e F31F   adc r31,r19      ;

This is a known feature. The patches Andrew Hutchinson is working (?) on are supposed to improve this. I am wondering why the load of r31 and r23 is done before the operations. It seems like gcc 4.2.x is moving the loading of the variables a little further away from their use, but this does not benefit the AVR.

HTH,
Wouter
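The two table-index expressions are only interchangeable while crc is an 8-bit value, because then crc >> 4 already fits in the low nibble. The sketch below checks that exhaustively on the host. It assumes the operators in the quoted expressions are >> and &, and the function names are invented for this example.

```c
#include <stdint.h>

/* Form reported to generate poor code: mask data first, then XOR. */
unsigned index_mask_first(uint8_t crc, uint8_t data)
{
    return (unsigned)((crc >> 4) ^ (data & 0x0f));
}

/* Form reported to generate tight code: XOR first, then mask the result. */
unsigned index_mask_last(uint8_t crc, uint8_t data)
{
    return (unsigned)(((crc >> 4) ^ data) & 0x0f);
}

/* Compare both forms over every 8-bit crc and data value.
   Returns 1 if they always agree, 0 otherwise. */
int forms_agree(void)
{
    for (unsigned crc = 0; crc < 256; crc++)
        for (unsigned data = 0; data < 256; data++)
            if (index_mask_first((uint8_t)crc, (uint8_t)data) !=
                index_mask_last((uint8_t)crc, (uint8_t)data))
                return 0;
    return 1;
}
```

If crc were wider than 8 bits, crc >> 4 could have bits above the low nibble and the two forms would no longer match, so the rewrite is a safe manual optimisation only for a byte-wide CRC.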
Re: [Fwd: Re: [avr-gcc-list] GCC-AVR Register optimisations]
Andrew Hutchinson wrote:
> PS Please report as a bug - gcc should be better than this.

I did, it got number 34737: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34737 I hope all info is ok. I wanted to add a link to your e-mail, but it's not in the list archives yet.

Wouter
Re: [avr-gcc-list] GCC-AVR Register optimisations
> Registers 17 downwards are call saved and push/popped in prescribed order by prolog/epilog functions. Also R28,29 is the potential frame pointer and so that is best left alone. So the key registers are: R18-R27, R30,31

Note that in some cases it could be very interesting to use r27, or Y, register. Consider this example:

    char *x;
    volatile int y;

    void foo(char *p) { y += *p; }

    void main(void)
    {
        char *p1 = x;
        foo(p1++); foo(p1++); foo(p1++); foo(p1++); foo(p1++);
        foo(p1++); foo(p1++); foo(p1++); foo(p1++); foo(p1++);
    }

This will generate very bad code:

    /* prologue: frame size=0 */
        push r14
        push r15
        push r16
        push r17
    /* prologue end (size=4) */
        lds r24,x
        lds r25,(x)+1
        movw r16,r24
        subi r16,lo8(-(1))
        sbci r17,hi8(-(1))
        call foo
        movw r14,r16
        sec
        adc r14,__zero_reg__
        adc r15,__zero_reg__
        movw r24,r16
        call foo
        movw r16,r14
        subi r16,lo8(-(1))
        sbci r17,hi8(-(1))
        movw r24,r14
        call foo
        movw r14,r16
        sec
        adc r14,__zero_reg__
        adc r15,__zero_reg__
        movw r24,r16
        call foo
        movw r16,r14
        subi r16,lo8(-(1))
        sbci r17,hi8(-(1))
        movw r24,r14
        call foo
        etc..

A more optimal scheme would be:

        call foo
        movw r24, r16
        adiw r24, 1
        movw r16, r24
        call foo
        etc..

using the r24 capability to do a 16 bit increment. But in this special case there is no frame pointer. So we could use R28 to store instead of R16. Then we can add on r28 and do something like this:

        call foo
        adiw r28, 1
        movw r24, r28
        call foo

So yes, using R28 as last resort looks like a sane thing. Unless there is no frame pointer at all, and there is a need for 16 (or 32 bit) arithmetic on saved registers. This is probably incredibly difficult. But I thought to mention it anyway.

HTH,
Wouter

ps. Writing it like foo(p); p++; will produce better code?!? I will file a bug report for this.

> With the order, there are several problems:
>
> 1) Initial register allocation fragments the register set. For example, allocating r25 will prevent R24-25 being used for a 16bit register and prevent R22-25 and R24-27 being used as 32 bit registers. The gcc register allocator does not seem to overcome this fragmentation.
>
> 2) The situation is made worse by the order of 16bit+ registers used for call and return values - which are allocated in reverse order, eg R24-R25, R22-R25, R18-R25. This means that the function parameters or return values are rarely in the right place - except for 16bit values.
>
> 3) Allocating a byte to an odd numbered register precludes it being extended to a 16bit value without a move.
>
> So, I tried creating an order which would preserve the contiguous register space and avoid the above issues as much as possible. This is what I ended up with:
>
>     R18,26,22,30,20,24,19,21,23,25,27,31,28,29, \
>     17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,\
>
> The result is a 1.25% saving in code size for a simple mixed application. Pretty good for such a simple change! For more floating point, the saving might well be higher as it demands more contiguous 32 bit registers.
>
> On the same basis, the current order of call saved registers R2-R17, dictated by the (mcall) prolog, limits further improvement and is clearly imperfect. These are used less frequently, though their cost is much higher. So it's difficult to gauge the impact. I might take a look at some intense floating point functions to see if it is worth pursuing reordering these too.
>
> Andy
Re: [avr-gcc-list] GCC-AVR Register optimisations
Wouter van Gulik wrote:
> Note that in some cases it could be very interesting to use r27, or Y, register.

Should have written R28 of course.

Since gcc seems down at the moment I did some more testing. Now consider this example:

    void main(void)
    {
        char *p = x;
        foo(p); p+=65;
        foo(p); p+=65;
        foo(p); p+=65;
        foo(p); p+=65;
        foo(p); p+=65;
        foo(p); p+=65;
        foo(p); p+=65;
        foo(p); p+=65;
        foo(p); p+=65;
        foo(p); p+=65;
    }

This must be done using a subi/sbci pair. But the compiler now seems to realize that p is at a constant offset from x. So we now get:

    main:
    /* prologue: frame size=0 */
        push r16
        push r17
    /* prologue end (size=2) */
        lds r16,x
        lds r17,(x)+1
        movw r24,r16
        call foo
        movw r24,r16
        subi r24,lo8(-(65))
        sbci r25,hi8(-(65))
        call foo
        movw r24,r16
        subi r24,lo8(-(130))
        sbci r25,hi8(-(130))

Here x is stored in r16 and the cumulative offset is added to R24. But if the compiler can realize this, then why not do this for adds within the adiw range?!? So for p++/p+=1 we would get something like:

        movw r24, r16
        adiw r24, 1
        call foo
        movw r24, r16
        adiw r24, 2
        etc..

This is just as small as the earlier suggested use of R28!

Wouter
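The two call styles compared in this thread, foo(p++) and foo(p); p++;, are equivalent at the C level, so any difference in the generated code comes from the compiler. A minimal host-side sketch of that equivalence follows; the accumulator sum_y (standing in for the volatile y) and the helper names walk_postinc and walk_separate are assumptions made for illustration and do not appear in the original posts.

```c
#include <stddef.h>

/* Stand-in for the volatile global y from the example in this thread. */
static volatile int sum_y;

static void foo(const char *p) { sum_y += *p; }

/* Style 1: increment folded into the call argument. */
int walk_postinc(const char *p, size_t n)
{
    sum_y = 0;
    for (size_t i = 0; i < n; i++)
        foo(p++);
    return sum_y;
}

/* Style 2: call first, then increment as a separate statement. */
int walk_separate(const char *p, size_t n)
{
    sum_y = 0;
    for (size_t i = 0; i < n; i++) {
        foo(p);
        p++;
    }
    return sum_y;
}
```

Both functions visit the same bytes in the same order. Comparing the avr-gcc -S output of the two is an easy way to check whether a given compiler version treats them differently.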
RE: [avr-gcc-list] Tablejumps - needless run time conversion to byteaddress
I can make GCC use a jump table using this code:

test.c
===
    volatile int x;
    volatile int y;

    void foo (void) { x++; }

    void main(void)
    {
        switch(y) {
        case 0 : foo();
        case 1 : foo();
        case 2 : foo();
        case 3 : foo();
        case 4 : foo();
        case 5 : foo();
        case 6 : foo();
        case 7 : foo();
        case 8 : foo();
        case 9 : foo();
        case 10 : foo();
        case 11 : foo();
        case 12 : foo();
        case 13 : foo();
        case 14 : foo();
        case 15 : foo();
        case 16 : foo();
        }
    }
===

Compiling using:

    avr-gcc -g -Os -Wall -mmcu=atmega16 -fno-inline test.c

(-fno-inline is used to keep the disassembly small.) gcc version 4.2.2 gives:

===
    main:
    .LFB3:
    .LM3:
    /* prologue: frame size=0 */
    /* prologue end (size=0) */
    .LM4:
        lds r30,y
        lds r31,(y)+1
        cpi r30,17
        cpc r31,__zero_reg__
        brsh .L23
    .LM5:
        subi r30,lo8(-(gs(.L22)))
        sbci r31,hi8(-(gs(.L22)))
        lsl r30
        rol r31
        lpm __tmp_reg__,Z+
        lpm r31,Z
        mov r30,__tmp_reg__
        ijmp
        .data
        .section .progmem.gcc_sw_table, "a", @progbits
        .p2align 1
    .L22:
        .data
        .section .progmem.gcc_sw_table, "a", @progbits
        .p2align 1
        .word gs(.L5)
        .word gs(.L6)
        .word gs(.L7)
        .word gs(.L8)
        .word gs(.L9)
        .word gs(.L10)
        .word gs(.L11)
        .word gs(.L12)
        .word gs(.L13)
        .word gs(.L14)
        .word gs(.L15)
        .word gs(.L16)
        .word gs(.L17)
        .word gs(.L18)
        .word gs(.L19)
        .word gs(.L20)
        .word gs(.L21)
        .text
    .L5:
    .LM6:
        call foo
    <snip> etc...
===

Some interesting notes: it works only from 17 cases and up. For smaller devices (e.g. atmega8) it already works from 3 cases, but then an rjmp table is used. Why is GCC not using this rjmp scheme for the atmega16? Is it too difficult to predict that it will not pass the 4k boundary?

HTH, Wouter

Hi,

Does anyone have some code that makes avr-gcc create a tablejump? This is where gcc creates a table instead of a long line of if-then-else tests. I can't seem to create enough switch cases to force one!

I have been looking at the compilation patterns and noticed that the gcc address is multiplied by 2 to form the address for LPM (the table being in ROM). LPM needs a byte address and gcc has a word address.
        lsl r30
        rol r31
        lpm __tmp_reg__,Z+
        lpm r31,Z
        mov r30,__tmp_reg__
        ijmp

The asm pattern currently expects the value to be in R30. However, it would appear that this would be better with a symbol rather than a value in a register, thus providing a means to multiply that value by 2 at compile time. (And I can't see any reason it would be called with anything other than a constant address in ROM.)

Obviously, I'd like to test it.

Andy
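The address arithmetic discussed in this thread can be tried out on a host machine. The following is a minimal sketch, assuming a flash image modelled as a plain byte array; the helper names word_to_byte_addr and read_table_entry are hypothetical, and this is not how GCC itself implements the tablejump pattern.

```c
#include <stdint.h>

/* Convert a flash word address (the unit used by ijmp and by gs()) into the
 * byte address that lpm expects in Z. This has the same effect as the
 * "lsl r30 / rol r31" pair in the listing. */
uint16_t word_to_byte_addr(uint16_t word_addr)
{
    return (uint16_t)(word_addr << 1);
}

/* Read one little-endian 16-bit table entry from a byte-addressed image,
 * low byte first, mirroring "lpm __tmp_reg__,Z+" followed by "lpm r31,Z". */
uint16_t read_table_entry(const uint8_t *flash, uint16_t byte_addr)
{
    uint8_t lo = flash[byte_addr];
    uint8_t hi = flash[byte_addr + 1];
    return (uint16_t)(lo | ((uint16_t)hi << 8));
}
```

If the table base is a link-time constant, the doubling in word_to_byte_addr could in principle be folded at build time instead of being executed on every switch, which is the point Andy raises.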
Re: [avr-gcc-list] invalid ram address
andi wrote:
> Hi,
> I made a program for the atmega32 and compiled it using WinAVR version 20070525. But when I want to simulate it in AVRStudio, the variables that I declare are at invalid locations. I checked the SRAM address and it is outside the maximum address (example: 0xA64).
> Is it a bug? Or maybe I have to configure something?

Please provide more info. State the code you are using and the compile options. Otherwise it's impossible to help.

Wouter
Re: [avr-gcc-list] Intel hex record types
Scott Morken wrote:
> What do we do for AVR architecture? I would also be interested in any other possible record types output by AVRGCC if anyone knows.

Take a look at SRecord (srec_cat); it comes with the WinAVR releases. It can convert almost anything to anything.

HTH Wouter
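For reference, the Intel HEX format itself defines six record types (00 to 05), and the type is the fourth byte of each record. The sketch below validates a single record line and returns its type. The function name ihex_record_type and the enum names are assumptions made for illustration, and the sketch makes no claim about which of these types a particular avr-objcopy build emits.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Record types defined by the Intel HEX format. */
enum ihex_type {
    IHEX_DATA               = 0x00,
    IHEX_EOF                = 0x01,
    IHEX_EXT_SEGMENT_ADDR   = 0x02,
    IHEX_START_SEGMENT_ADDR = 0x03,
    IHEX_EXT_LINEAR_ADDR    = 0x04,
    IHEX_START_LINEAR_ADDR  = 0x05
};

static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Returns the record type if the line (without trailing newline) is well
 * formed and its checksum is valid, or -1 otherwise. */
int ihex_record_type(const char *line)
{
    size_t len = strlen(line);
    /* Shortest record: ':' + count(2) + address(4) + type(2) + checksum(2). */
    if (len < 11 || line[0] != ':' || ((len - 1) % 2) != 0)
        return -1;

    size_t nbytes = (len - 1) / 2;
    unsigned sum = 0;
    int count = -1;
    int type = -1;

    for (size_t i = 0; i < nbytes; i++) {
        int hi = hex_nibble(line[1 + 2 * i]);
        int lo = hex_nibble(line[2 + 2 * i]);
        if (hi < 0 || lo < 0)
            return -1;
        unsigned byte = (unsigned)((hi << 4) | lo);
        if (i == 0) count = (int)byte;   /* number of data bytes */
        if (i == 3) type = (int)byte;    /* record type */
        sum += byte;
    }
    /* The count covers data bytes only; 5 framing bytes surround them. */
    if ((size_t)count + 5 != nbytes)
        return -1;
    /* All bytes, checksum included, must sum to 0 modulo 256. */
    if ((sum & 0xFF) != 0)
        return -1;
    return type;
}
```

Running each line of a .hex file through such a check and counting the returned types is a quick way to see which record types a given toolchain actually produces.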
[avr-gcc-list] RE: [avrdude-dev] Re: How to talk to second device in JTAG chain?
> The fallback plan is to have a header with TRST, TCK and TDI pins shared, with separate TDO and TMS pins for each device. :/

That's a good idea anyway: since all debugging instructions must go through the other chip, you get an extra delay per cycle. For a few bytes this is OK, but when (down)loading several Kbytes this starts to become uncomfortable.

HTH, Wouter
Re: [avr-gcc-list] Problem with delay loop
Royce Pereira wrote:
> Hi all,
> In the latest WinAVR (avr-gcc (GCC) 4.1.2 (WinAVR 20070525)) I found this. Check this out:
>
> //==
> void delay(unsigned del_cnt)
> {
>     while(del_cnt--);
>     return;
> }
> //===

Well, writing your own delay loops is not recommended, because the optimiser might optimise your loop away. Use util/delay.h instead. Please note that the functions in delay.h might not work as intended when compiling without optimisation (but then again, your loop will not be gone).

HTH, Wouter
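If a hand-written loop is kept anyway, one common way to stop the optimiser from deleting it is to make the counter volatile. This is a different technique from the util/delay.h route recommended above, and the resulting delay is not calibrated: it still depends on the compiler version and flags. A minimal sketch, with the function name delay_volatile and the returned iteration count being assumptions added purely so the behaviour can be checked:

```c
/* Busy-wait that the optimiser may not delete: every read and write of the
 * volatile counter is an observable side effect and must be performed.
 * Returns the number of loop iterations executed. */
unsigned delay_volatile(unsigned del_cnt)
{
    volatile unsigned n = del_cnt;
    unsigned iterations = 0;

    while (n--)
        iterations++;

    return iterations;
}
```

For real timing, _delay_ms() and _delay_us() from util/delay.h remain the better choice; they need F_CPU to be defined and optimisation to be enabled.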
Re: [avr-gcc-list] 8-bit return values again
Dusan Ferbas wrote:
> Hi guys,
> when I searched this list for 8-bit return values, I found 2 threads. The snippets described there seem to me to be more about switch/case expression optimization:
> http://lists.gnu.org/archive/html/avr-gcc-list/2003-06/msg0.html
> http://lists.gnu.org/archive/html/avr-gcc-list/2003-06/msg5.html
>
> I want to solve the case where a function is declared as returning u_char (char, int8, etc.). It is compiled in such a way that it returns a value in the R24,R25 register pair. This is true not only with literals (see example below), but also with byte variables. The R25 value is never used in the calling code (see assembler listing below).
> Any idea? Any plans for resolving this?

It seems that (at least some of) this is fixed in gcc 4.3.0. I currently don't have access to 4.3.0, but Eric Weddington has, and his assembler output shows no clr r25. See: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33050

I don't know if this is always the case or just a lucky example.

HTH Wouter
Re: [avr-gcc-list] Inversion of logic improves size & speed
Anatoly Sokolov wrote:
> Hi.
> This patch optimizes logic left shift of unsigned char by 4, 5, and 6, excluding double 'andi' instructions in some cases.
> <snip>
> Now:
>
> 0092 <getBit4InvShift>:
>   92: 82 95  swap r24
>   94: 81 70  andi r24, 0x01 ; 1
>   96: 08 95  ret
>
> 0098 <getBit5InvShift>:
>   98: 82 95  swap r24
>   9a: 86 95  lsr r24
>   9c: 81 70  andi r24, 0x01 ; 1
>   9e: 08 95  ret
>
> 00a0 <getBit6InvShift>:
>   a0: 82 95  swap r24
>   a2: 86 95  lsr r24
>   a4: 86 95  lsr r24
>   a6: 81 70  andi r24, 0x01 ; 1
>   a8: 08 95  ret

That's good news! No more clr r25 and no more double and! Does this fix the double and in more situations? Is this because the swap&and is now exposed to the upper layers?

One thing: the patch is not in this e-mail (on the list), and I did not receive your e-mail at my private address. Maybe it was filtered. I will check my junk folder.

Thanks, Wouter
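The instruction sequences in the listing can be modelled in portable C to see why they are correct: swapping the nibbles brings bit 4 down to bit 0, and each following lsr selects the next higher bit. The sketch below is only a host-side model, with hypothetical function names; it is not the patch itself.

```c
#include <stdint.h>

/* C model of the AVR "swap" instruction: exchange the two nibbles. */
uint8_t swap_nibbles(uint8_t v)
{
    return (uint8_t)((v << 4) | (v >> 4));
}

/* Bit 4: swap, andi 1 */
uint8_t bit4_via_swap(uint8_t v)
{
    return swap_nibbles(v) & 0x01;
}

/* Bit 5: swap, lsr, andi 1 */
uint8_t bit5_via_swap(uint8_t v)
{
    return (uint8_t)(swap_nibbles(v) >> 1) & 0x01;
}

/* Bit 6: swap, lsr, lsr, andi 1 */
uint8_t bit6_via_swap(uint8_t v)
{
    return (uint8_t)(swap_nibbles(v) >> 2) & 0x01;
}
```

Each function gives the same result as (v >> n) & 1 for n = 4, 5, 6, using at most four single-cycle instructions on the target instead of a shift loop.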
Re: [avr-gcc-list] Inversion of logic improves size & speed
Eric Weddington wrote:
> Patch was not attached to email. However, Anatoly attached the patch to the bug report.

What bug report? I looked at:

Non optimal bit extraction: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33049
No register save: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33050
Double and: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11259

I can't find the patch there, or I need some more coffee... it is, after all, still Monday ;)

Wouter
[avr-gcc-list] Another missed optimization
Hi list,

OK, I'll admit this one is rare, but it is a really annoying one. Since my application is all in one file, I try to optimise the code (and especially my ISRs) by making heavily used variables reside in the lower registers. This reduces size a whole lot and speeds things up a good bit. I know that some instructions (such as ldi and cpi) can only handle r16..r31, but this example should not suffer from that.

Why is 0xA loaded again into r24? Strangely enough, gcc does optimise away the extra ldi when r is not an explicit register. So it seems that the logic for writing register 15 and below is not optimal? I've seen misses when doing additions as well (I've not tried to reproduce that yet; I will give it another try later).

I used winavr-20070525 (GCC 4.1.2) and the following compile options:

    avr-gcc -S -Os -mmcu=atmega644 test.c

OK, this is the C snippet:

    // C
    register unsigned char r asm("r2"); //use only r2..r15
    volatile unsigned char dummy;       //give the optimizer something to keep

    int main(void)
    {
        unsigned char localDummy = dummy;
        if(localDummy == 0xA) {
            r = localDummy;
        }
    }

The ASM output:

    // ASM
    main:
    /* prologue: frame size=0 */
    /* prologue end (size=0) */
        lds r24,dummy      <-- load to localDummy
        cpi r24,lo8(10)    <-- compare against 0xA
        brne .L5           <-- branch
        ldi r24,lo8(10)    <-- WHY??? it's there already!
        mov r2,r24         <-- mov
    .L5:
    /* epilogue: frame size=0 */
        ret
[avr-gcc-list] Explicitly using lower half registers gives non optimal code
OK, I tried adding as well. It's fine when writing something like this:

    // C
    register unsigned char r asm("r2");

    void foo(unsigned char in)
    {
        if(in == 0xA) {
            r += in + 1;
        }
    }

You get neat code (but that's because the optimizer sees that the increment is constant):

    // ASM
        cpi r24,lo8(10)
        brne .L4
        ldi r24,lo8(11)
        add r2,r24
    .L4:
        ret

But now try the example below. It looks as if the compiler thinks r2 is in RAM? I cannot see why r2 should be loaded into r24. Does making a variable a register give extra constraints on manipulation (apart from fewer available instructions)? I tried different rewrites but they all end up the same. I can imagine the optimizer incrementing in before adding it to r, but I can't make sense of this.

    // C
    register unsigned char r asm("r2");

    void foo(unsigned char in, unsigned char in2)
    {
        if(in == in2) {
            r += in + 1;
        }
    }

    // ASM
        mov r25,r24          ; WHY?
        cp r24,r22
        brne .L4
        mov r24,r2           ; WHY?? (tmp = r2)
        subi r24,lo8(-(1))   ; tmp++
        mov r2,r25           ; r2 = in1
        add r2,r24           ; r2 += tmp
    .L4:
        ret

This last part could have been:

        brne .L4
        inc r24 (or r2)
        add r2, r24

Greetings
Wouter
Re: [avr-gcc-list] Optimiser bloats code
Paulo Marques wrote:
> Not really a better idea for 3 bits, but it would be for 4:
>
>     prog_uint8_t inv_table[8]={0,4,2,6,1,5,3,7};
>
>     unsigned char inv_test(void)
>     {
>         return pgm_read_byte(&inv_table[PORTB & 0x3]);
>     }

Ah yes, of course, a table!

> The output from gcc 4.2.0:
>
>     byte inv_test(void)
>     {
>         return pgm_read_byte(&inv_table[PORTB & 0x3]);
>       96: e8 b3  in r30, 0x18 ; 24
>       98: ff 27  eor r31, r31
>       9a: e3 70  andi r30, 0x03 ; 3
>       9c: f0 70  andi r31, 0x00 ; 0
>       9e: ec 5a  subi r30, 0xAC ; 172
>       a0: ff 4f  sbci r31, 0xFF ; 255
>       a2: e4 91  lpm r30, Z
>     }
>       a4: 8e 2f  mov r24, r30
>       a6: 99 27  eor r25, r25
>       a8: 08 95  ret
>
> If not for the redundant andi r31, 0x00 (when r31 has just been zeroed by the eor r31,r31) it would give the same number of instructions as your code. The nice thing about this approach is that it works the same for 4 or more bits (up to 8).

Yes, but the table would grow large for an 8-bit inversion :D. And I'm more afraid of running out of space than of running out of time.

Thanks for the help,
Wouter
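The table approach is easy to check on a host. The sketch below keeps the table from the quoted post but stores it in an ordinary const array rather than in flash (on the AVR it would be read with pgm_read_byte), and it masks with 0x07 so that all eight entries are reachable, whereas the quoted code masks with 0x3. The names reverse3_table and reverse3_loop are hypothetical.

```c
#include <stdint.h>

/* 3-bit reversal table: index b2 b1 b0 maps to b0 b1 b2. */
static const uint8_t inv_table[8] = {0, 4, 2, 6, 1, 5, 3, 7};

/* Table lookup: one masked index, one load. */
uint8_t reverse3_table(uint8_t v)
{
    return inv_table[v & 0x07];
}

/* Reference implementation: shift bits out of v and into r. */
uint8_t reverse3_loop(uint8_t v)
{
    uint8_t r = 0;
    for (int i = 0; i < 3; i++) {
        r = (uint8_t)((r << 1) | (v & 1));
        v >>= 1;
    }
    return r;
}
```

The table needs 2^n bytes for n bits, so 8 bytes here and 256 bytes for a full 8-bit reversal, which is the size concern raised in the reply.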
Re: [avr-gcc-list] Inversion of logic improves size & speed
Anatoly Sokolov wrote:
> Hi,
> Bug #11259 [avr] gcc Double 'andi' missed optimization: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11259
> Bug #29560 Poor optimization for character shifts on Atmel AVR: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29560

Bug #29560 seems to be a little different. That bug report is about shifting with a variable shift count, where the loop for doing the shift is not optimal (a high byte shift because of int promotion or something alike), while my example works with fixed shifts. Actually, it's bit extraction implemented as shifting.

My concern is that when rewriting/inverting my logic I get much better (optimal in most cases) results. So it seems the compiler has not chosen the optimal path. It seems like it has two ways of doing the shifting? Maybe it's some hidden 8-bit/16-bit variable difference?

> Testcase: <snip>
> There are two 'and' insns (#24 and #12), but they are not optimized yet. Why? The probable reason: the 'lshiftrt' insn is split into 'rotate' and 'and' insns in the 'pass_split_after_reload' pass of the compiler, but the optimization passes (combine and cse) which could merge the two 'and' insns are run earlier.

I see, too bad...

> It is possible to add a peephole to merge the two 'and' insns. But I do not think that this is the optimum decision.

Why not? I agree it's not solving the root of the problem, but it helps anyway. I am a total noob on GCC internals, so this might be a stupid question...

Thanks for all the explanations! Really interesting stuff.

Greetings,
Wouter
[avr-gcc-list] Inversion of logic improves size & speed
Hi list,

After some testing I found out that inverting the order of the shift and the and operation can significantly improve speed and size. In the first case the compiler misses that it can optimise the shifts for bits 4..7 by first swapping nibbles, which it does figure out when the code is rewritten as in the lower part. Is this a (known?) bug or am I missing something?

Wouter

    /* This results in shifting instructions */
    uint8_t getBit0(uint8_t temp) { uint8_t r = 0; if(temp&(1<<0)) r|=0x1; return r; }
    uint8_t getBit1(uint8_t temp) { uint8_t r = 0; if(temp&(1<<1)) r|=0x1; return r; }
    uint8_t getBit2(uint8_t temp) { uint8_t r = 0; if(temp&(1<<2)) r|=0x1; return r; }
    uint8_t getBit3(uint8_t temp) { uint8_t r = 0; if(temp&(1<<3)) r|=0x1; return r; }
    uint8_t getBit4(uint8_t temp) { uint8_t r = 0; if(temp&(1<<4)) r|=0x1; return r; }
    uint8_t getBit5(uint8_t temp) { uint8_t r = 0; if(temp&(1<<5)) r|=0x1; return r; }
    uint8_t getBit6(uint8_t temp) { uint8_t r = 0; if(temp&(1<<6)) r|=0x1; return r; }
    uint8_t getBit7(uint8_t temp) { uint8_t r = 0; if(temp&(1<<7)) r|=0x1; return r; }

    /* This results in better shifting instructions */
    uint8_t getBit0InvShift(uint8_t temp) { uint8_t r = 0; if((temp>>0)&1) r|=0x1; return r; }
    uint8_t getBit1InvShift(uint8_t temp) { uint8_t r = 0; if((temp>>1)&1) r|=0x1; return r; }
    uint8_t getBit2InvShift(uint8_t temp) { uint8_t r = 0; if((temp>>2)&1) r|=0x1; return r; }
    uint8_t getBit3InvShift(uint8_t temp) { uint8_t r = 0; if((temp>>3)&1) r|=0x1; return r; }
    uint8_t getBit4InvShift(uint8_t temp) { uint8_t r = 0; if((temp>>4)&1) r|=0x1; return r; }
    uint8_t getBit5InvShift(uint8_t temp) { uint8_t r = 0; if((temp>>5)&1) r|=0x1; return r; }
    uint8_t getBit6InvShift(uint8_t temp) { uint8_t r = 0; if((temp>>6)&1) r|=0x1; return r; }
    uint8_t getBit7InvShift(uint8_t temp) { uint8_t r = 0; if((temp>>7)&1) r|=0x1; return r; }

This results in:

    /* This results in shifting instructions */
    uint8_t getBit0(uint8_t temp) { uint8_t r = 0; if(temp&(1<<0)) r|=0x1; return r; }
      ae: 81 70  andi r24, 0x01 ; 1
      b0: 99 27  eor r25, r25
      b2: 08 95  ret

    00b4 <getBit1>:
    uint8_t getBit1(uint8_t temp) { uint8_t r = 0; if(temp&(1<<1)) r|=0x1; return r; }
      b4: 99 27  eor r25, r25
      b6: 96 95  lsr r25
      b8: 87 95  ror r24
      ba: 81 70  andi r24, 0x01 ; 1
      bc: 90 70  andi r25, 0x00 ; 0
      be: 08 95  ret

    00c0 <getBit2>:
    uint8_t getBit2(uint8_t temp) { uint8_t r = 0; if(temp&(1<<2)) r|=0x1; return r; }
      c0: 99 27  eor r25, r25
      c2: 96 95  lsr r25
      c4: 87 95  ror r24
      c6: 96 95  lsr r25
      c8: 87 95  ror r24
      ca: 81 70  andi r24, 0x01 ; 1
      cc: 90 70  andi r25, 0x00 ; 0
      ce: 08 95  ret

    00d0 <getBit3>:
    uint8_t getBit3(uint8_t temp) { uint8_t r = 0; if(temp&(1<<3)) r|=0x1; return r; }
      d0: 99 27  eor r25, r25
      d2: 43 e0  ldi r20, 0x03 ; 3
      d4: 96 95  lsr r25
      d6: 87 95  ror r24
      d8: 4a 95  dec r20
      da: e1 f7  brne .-8 ; 0xd4 <getBit3+0x4>
      dc: 81 70  andi r24, 0x01 ; 1
      de: 90 70  andi r25, 0x00 ; 0
      e0: 08 95  ret

    00e2 <getBit4>:
    uint8_t getBit4(uint8_t temp) { uint8_t r = 0; if(temp&(1<<4)) r|=0x1; return r; }
      e2: 99 27  eor r25, r25
      e4: 54 e0  ldi r21, 0x04 ; 4
      e6: 96 95  lsr r25
      e8: 87 95  ror r24
      ea: 5a 95  dec r21
      ec: e1 f7  brne .-8 ; 0xe6 <getBit4+0x4>
      ee: 81 70  andi r24, 0x01 ; 1
      f0: 90 70  andi r25, 0x00 ; 0
      f2: 08 95  ret

    00f4 <getBit5>:
    uint8_t getBit5(uint8_t temp) { uint8_t r = 0; if(temp&(1<<5)) r|=0x1; return r; }
      f4: 99 27  eor r25, r25
      f6: 65 e0  ldi r22, 0x05 ; 5
      f8: 96 95  lsr r25
      fa: 87 95  ror r24
      fc: 6a 95  dec r22
      fe: e1 f7  brne .-8 ; 0xf8 <getBit5+0x4>
     100: 81 70  andi r24, 0x01 ; 1
     102: 90 70  andi r25, 0x00 ; 0
     104: 08 95  ret

    0106 <getBit6>:
    uint8_t getBit6(uint8_t temp) { uint8_t r = 0; if(temp&(1<<6)) r|=0x1; return r; }
     106: 99 27  eor r25, r25
     108: 76 e0  ldi r23, 0x06 ; 6
     10a: 96 95  lsr r25
     10c: 87 95  ror r24
     10e: 7a 95  dec r23
     110: e1 f7  brne .-8 ; 0x10a <getBit6+0x4>
     112: 81 70
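The two source forms compared in this post compute the same value for every input, so the difference in the listings is purely a code-generation matter. A compact host-side sketch of that equivalence follows, with the bit number passed as a parameter instead of eight separate functions. The names get_bit_mask and get_bit_shift are hypothetical, and the operators are written out on the assumption that the original code used & together with << and >>.

```c
#include <stdint.h>

/* Form 1: mask first, then test (the getBitN style). */
uint8_t get_bit_mask(uint8_t temp, uint8_t n)
{
    uint8_t r = 0;
    if (temp & (1u << n))
        r |= 0x1;
    return r;
}

/* Form 2: shift first, then mask (the getBitNInvShift style). */
uint8_t get_bit_shift(uint8_t temp, uint8_t n)
{
    uint8_t r = 0;
    if ((temp >> n) & 1)
        r |= 0x1;
    return r;
}
```

Because the results are identical for all 256 inputs and all 8 bit positions, a compiler is free to pick the cheaper instruction sequence for either form.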
Re: [avr-gcc-list] Optimiser bloats code
Return values are promoted to an int. Why? Is this a bug or a feature? Am I doing something wrong, or is a u08 return always promoted to an int?

You probably already know this, but you could also do:

    return PINB >> 5;

which returns the same answer using the following:

        in r24,54-0x20
        swap r24
        lsr r24
        andi r24,0x7
        clr r25
        ret

Yes, I know (it is written above my example). I wanted to point out how bad the result is when the compiler starts to optimise this.

Just curious: is there any faster way to do the bit inversion in my foo3 example (see below)? It now takes 9 instructions, which is good, but fewer is always better. A loop requires more instructions and is much slower. Does anyone have an idea for a smaller bit inversion of just 3 bits? Because if this is the smallest way, you can't tell the compiler to do so :(

HTH Wouter

    //Force the compiler and voila! Optimal!
    //Not bit inverted or bit inverted, the result is the same
    uint8_t foo3(void)
    { //good
      e0: 88 27  eor r24, r24
        uint8_t temp = 0;
        asm volatile("clr %0" : "=r" (temp) :);
        if(PINB & (1<<5)) temp |= (1<<2);
      e2: 1d 99  sbic 0x03, 5 ; 3
      e4: 84 60  ori r24, 0x04 ; 4
        if(PINB & (1<<6)) temp |= (1<<1);
      e6: 1e 99  sbic 0x03, 6 ; 3
      e8: 82 60  ori r24, 0x02 ; 2
        if(PINB & (1<<7)) temp |= (1<<0);
      ea: 1f 99  sbic 0x03, 7 ; 3
      ec: 81 60  ori r24, 0x01 ; 1
        return temp;
    }
      ee: 99 27  eor r25, r25
      f0: 08 95  ret
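The mapping that foo3 performs can be checked on a host by passing the port value in as a parameter instead of reading PINB. The function name reverse_top3 and the parameter are assumptions made for illustration; on the target, each test corresponds to one sbic/ori pair, as in the listing.

```c
#include <stdint.h>

/* Map bits 5, 6, 7 of a port value onto bits 2, 1, 0 (bit order reversed). */
uint8_t reverse_top3(uint8_t pin)
{
    uint8_t temp = 0;

    if (pin & (1u << 5)) temp |= (1u << 2);
    if (pin & (1u << 6)) temp |= (1u << 1);
    if (pin & (1u << 7)) temp |= (1u << 0);

    return temp;
}
```

The three tests are independent, so the order in which they are written does not change the result, which matches the comment in the listing that the inverted and non-inverted variants compile to the same size.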