Re: [avr-gcc-list] avr-gcc and char strings

2014-12-03 Thread Wouter van Gulik
Could you post the assembler output? IIRC --save-temps or -S should be
added to the gcc command line.

Wouter
On 3 Dec 2014 at 22:12, Andreas Höschler ahoe...@smartsoft.de wrote:

 Hi all,

 I am close to tearing my hair out. After setting up the AVR tool chain I
 tried out a very simple C program (see below) on a SainSmart Mega2560
 board, programmed into the chip using
 /Applications/Arduino.app//Contents/Resources/Java/hardware/tools/avr/bin/avrdude:

 #define F_CPU 16000000UL  /* 16 MHz CPU clock */
 #include "Global.h"
 #include <util/delay.h>
 #include <avr/io.h>
 #include <avr/interrupt.h>
 #include <inttypes.h>


 char String[]  = "Hello world!!";


 void USARTInit0(uint16_t baud)
 {
     // Set baud rate
     int value = (F_CPU / 16 / baud) - 1;
     UBRR0H = (uint8_t)(value >> 8);
     UBRR0L = (uint8_t)value;

     // 8N1
     UCSR0C = 0x06; // (3 << UCSZ00);

     // Enable receiver and transmitter
     UCSR0B = (1 << RXEN0) | (1 << TXEN0);
 }


 void TxByte0 (uint8_t data)
 {
     // Wait for empty transmit buffer
     while ( !(UCSR0A & (1 << UDRE0)) );
     // Putting data into the buffer forces transmission
     UDR0 = data;
 }


 int main (void)
 {
     DDRB = 0xff;  // all outputs

     USARTInit0(38400);   // 9600 19200 38400

     while (1)
     {
         TxByte0('A');
         TxByte0('B');
         TxByte0('C');
         TxByte0('\n');

         char *s = String;
         while (*s != 0)
         {
             TxByte0(*s);
             s++;
         }

         _delay_ms(500);
     }
     return (0);
 }


 This program produces the following output

 ...
 ÿÿABC
 ÿÿABC
 ÿÿABC
 ...



 telling me that sending single chars works but sending strings fails (does
 not seem to have anything to do with the serial communication but rather be
 some kind of memory management problem!??).


 To be sure the problem is not caused by my own gcc build I changed my
 Makefile to


 # AVR-GCC Makefile
 PROJECT=toggle_led
 SOURCES=main.c
 HEADERS=

 CC=/Applications/Arduino.app//Contents/Resources/Java/hardware/tools/avr/bin/avr-gcc
 OBJCOPY=avr-objcopy
 MMCU=atmega2560


 CFLAGS=-mmcu=$(MMCU) -Wall -O2 -I /usr/local/avr/include

 $(PROJECT).hex: $(PROJECT).out
 	$(OBJCOPY) -j .text -O ihex $(PROJECT).out $(PROJECT).hex


 $(PROJECT).out: $(SOURCES)
 	$(CC) $(CFLAGS) -I./ -o $(PROJECT).out $(SOURCES)


 program: $(PROJECT).hex
 	avrdude  -p $(MMCU) -c avrispmkII -P usb -e -U flash:w:$(PROJECT).hex

 clean:
 	rm -f $(PROJECT).out
 	rm -f $(PROJECT).hex



 and thus built the program with a gcc from a respected source (avr-gcc
 coming with Arduino.app). But the problem persists!? I already discussed
 this problem on avrfreaks forum. All agree that the above code should work
 but it does not!?? :-()

 Could this be a problem of the board/chip? I am using a SainSmart Mega2560
 board (which seems to be a clone of the original Arduino product)! Or am I
 doing anything wrong??


 Clueless!?? :-( Hints are greatly appreciated!!


 Thanks a lot,


  Andreas
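 The baud-rate divisor computed in USARTInit0 can be checked on a host
 machine. The following is a minimal sketch, assuming the 16 MHz clock named
 in the F_CPU comment and the divide-by-16 formula from the posted code;
 ubrr_for, ubrr_high and ubrr_low are hypothetical helper names, not part of
 avr-libc.

```c
#include <stdint.h>

/* Hypothetical helper: UBRR value using the same formula as USARTInit0,
 * F_CPU / 16 / baud - 1. The arithmetic is done in 32 bits because a
 * 16 MHz clock value does not fit in the 16-bit int of avr-gcc. */
uint16_t ubrr_for(uint32_t f_cpu, uint32_t baud)
{
    return (uint16_t)(f_cpu / 16UL / baud - 1UL);
}

/* The two bytes that would be written to UBRR0H and UBRR0L. */
uint8_t ubrr_high(uint16_t ubrr) { return (uint8_t)(ubrr >> 8); }
uint8_t ubrr_low(uint16_t ubrr)  { return (uint8_t)ubrr; }
```

 For 16 MHz and 38400 baud this gives 25, so the high byte is 0 and the low
 byte is 25.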





 ___
 AVR-GCC-list mailing list
 AVR-GCC-list@nongnu.org
 https://lists.nongnu.org/mailman/listinfo/avr-gcc-list



Re: [avr-gcc-list] Pointer register allocation optimizer

2012-07-27 Thread Wouter van Gulik

Georg-Johann Lay wrote on 2012-07-27 16:06:

Wouter van Gulik wrote:

Hi list,
This code:
char* f(char* p)
{
  p++;
  return p;
}
Results in:

mov r18,r24
mov r19,r25
subi r18,lo8(-(1))
sbci r19,hi8(-(1))
mov r24,r18
mov r25,r19
ret
When compiling with avr-gcc -O[23s] -mmcu=avr5 -S main.c


Oops, copy paste error; for avr5 movw is used to move the pointer 
registers. Still it does a useless move.




Looks very much like PR52278, which is still open.



I think this is the same, for sanity I also checked with int and long 
(against my Ubuntu gcc-avr 4.5.3) and it yields the same result; first 
move the register then the add, then move it back.


What I wonder: why is r18 picked? Clearly r26, or r30 are way better 
choices. Maybe this is fixed in 4.7.1 already, don't have a 4.7+ at the 
moment.



According to Vladimir, the register allocator (RA) should work smoothly
with SUBREGs, but obviously it does not.



Clearly.

You can try -fno-split-wide-types, but that might have other 
disadvantages.




That gives the expected result.


HTH,

Wouter



Re: [avr-gcc-list] Stack use - possible bug

2010-06-09 Thread Wouter van Gulik

Jan Waclawek wrote:

Yes, I can agree to that, but I don't know if I could fully characterize
this as a bug.




Considering this code:

#include <avr/io.h>
#include <stdio.h>

static void test0(void) { char buf[100]; puts(buf);}
static void test1(void) { char buf[110]; puts(buf);}
static void test2(void) { char buf[120]; puts(buf);}
static void test3(void) { char buf[130]; puts(buf);}
static void test4(void) { char buf[140]; puts(buf);}
static void test5(void) { char buf[150]; puts(buf);}
static void test6(void) { char buf[160]; puts(buf);}
static void test7(void) { char buf[170]; puts(buf);}
static void test8(void) { char buf[180]; puts(buf);}
static void test9(void) { char buf[190]; puts(buf);}

int main(void)
{
test0();
test1();
test2();
test3();
test4();
test5();
test6();
test7();
test8();
test9();
}

Running on an ATmega48 with only 512 bytes of RAM, this will work (at first
glance). But the optimizer creates this:


push r29
push r28
in r28,__SP_L__
in r29,__SP_H__
subi r28,lo8(-(-1450))
sbci r29,hi8(-(-1450))
in __tmp_reg__,__SREG__
cli
out __SP_H__,r29
out __SREG__,__tmp_reg__
out __SP_L__,r28

Which is just bogus, since the 1450 bytes it reserves are beyond the
available memory.
I would consider this a bug.

Interestingly, when all functions use buf[100] the optimizer gets smart
and reserves the 100 bytes only once:


push r16
push r17
push r29
push r28
in r28,__SP_L__
in r29,__SP_H__
subi r28,lo8(-(-100))
sbci r29,hi8(-(-100))
in __tmp_reg__,__SREG__
cli
out __SP_H__,r29
out __SREG__,__tmp_reg__
out __SP_L__,r28

Here my knowledge of GCC stack optimization stops.
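
The 1450 in the first prologue matches the sum of the ten buffers, as if
every inlined function kept its own stack slot; if the slots were shared, the
largest buffer (190 bytes) would be enough. A small host-side sketch of that
arithmetic, where frame_if_separate and frame_if_shared are hypothetical
names used only for illustration:

```c
#include <stddef.h>

/* Buffer sizes of test0() .. test9() from the example above. */
static const size_t buf_sizes[] = {100, 110, 120, 130, 140,
                                   150, 160, 170, 180, 190};
#define NUM_BUFS (sizeof buf_sizes / sizeof buf_sizes[0])

/* Frame size if every inlined buffer gets its own stack slot. */
size_t frame_if_separate(void)
{
    size_t total = 0;
    for (size_t i = 0; i < NUM_BUFS; i++)
        total += buf_sizes[i];
    return total;
}

/* Frame size if the buffers, which are never live at the same time,
 * share one slot: the largest one decides. */
size_t frame_if_shared(void)
{
    size_t max = 0;
    for (size_t i = 0; i < NUM_BUFS; i++)
        if (buf_sizes[i] > max)
            max = buf_sizes[i];
    return max;
}
```

The first function returns 1450, well above the 512 bytes mentioned above,
while the second returns 190.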

HTH,

Wouter




Re: [avr-gcc-list] Re: C aliasing rules

2010-05-19 Thread Wouter van Gulik

Hi all,

This might be due to bug http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39635
or the still open http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39386.
It is about an error involving shifting, inlining and return values; it might
be that this code triggers the same bug.

Although I thought it was fixed, the comments say it was not fixed in 4.4.

HTH,

Wouter

On 19/05/10 14:10, Lars Noschinski wrote:

Hello!

* David Brown david.br...@hesbynett.no  [10-05-18 21:01]:

Thanks for your answer, David.

   

Lars Noschinski wrote:
 

I'm trying to debug a strange problem, which depends on whether a
function is inlined (then it's broken) or not (then it's ok). Can
someone tell me if the following code snippet violates the C aliasing
rules for b1 (declared as uint8_t*, written as uint32_t* by
xteaDecrypt)?
   

[...]
   

It's not impossible that this is a bug when inlining such complex
code (and 32-bit code like this is complex on an 8-bit micro).  It's
difficult to tell without a compilable code snippet, and some
indication of the expected results.  If you can, you should look at
the generated assembly to see if you can figure out what is going
wrong.  Also try to simplify or remove parts of the code until you
have a minimal example of the problem.

While it would be useful to find out about the problem (especially
if it is a bug that is not already known), code like this does not
benefit much from being inlined.  It's too complex, and requires too
many registers - the function call overhead is therefore minimal.
You could improve the results somewhat by manual restructuring
(perhaps eliminating the memcpy calls), but unless XTEA_ROUNDS is
very small, the loop there will dominate everything.
 

XTEA_ROUNDS is 32 and this code is far from being performance critical,
but code breaking with optimization always makes me nervous ;)

Reading more about strict aliasing issues, especially

 
http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html
 http://davmac.wordpress.com/2010/02/26/c99-revisited/

I'm fairly sure, that accessing a uint8_t[] via a uint32_t* constitutes
undefined behaviour; the same holds for converting the pointer by use of an
union { uint32_t[2]; uint8_t[8] }; by the gcc (4.3.4) documentation for
-fstrict-aliasing:

| Similarly, access by taking the address, casting the resulting
| pointer and dereferencing the result has undefined behavior, even
| if the cast uses a union type, e.g.:
|  int f() {
|    double d = 3.0;
|    return ((union a_union *) &d)->i;
|  }

So it seems the only correct way is either changing the declaration of
xteaDecryptCbc (i.e. use uint32_t from the beginning) or using memcpy.
Or maybe some playing around with __attribute__((may_alias)).
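
A minimal sketch of the memcpy route, assuming XTEA_ROUNDS is 32 as stated
above. xteaDecryptBytes is a hypothetical wrapper name, and xteaDecrypt is
repeated from the snippet below only to keep the example self-contained. The
byte buffers are copied into properly typed uint32_t arrays, so no uint8_t
object is accessed through a uint32_t pointer.

```c
#include <stdint.h>
#include <string.h>

#define XTEA_ROUNDS 32

void xteaDecrypt(uint32_t v[2], uint32_t const k[4])
{
    uint32_t v0 = v[0], v1 = v[1], delta = 0x9E3779B9, sum = delta * XTEA_ROUNDS;
    for (uint8_t i = 0; i < XTEA_ROUNDS; i++) {
        v1 -= (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + k[(sum >> 11) & 3]);
        sum -= delta;
        v0 -= (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + k[sum & 3]);
    }
    v[0] = v0; v[1] = v1;
}

/* Hypothetical wrapper: decrypt an 8-byte block in place without casting
 * the uint8_t buffers to uint32_t pointers. */
void xteaDecryptBytes(uint8_t v[8], uint8_t const k[16])
{
    uint32_t vw[2], kw[4];
    memcpy(vw, v, sizeof vw);   /* bytes -> words, native byte order */
    memcpy(kw, k, sizeof kw);
    xteaDecrypt(vw, kw);
    memcpy(v, vw, sizeof vw);   /* words -> bytes */
}
```

The copies cost a few cycles, but as noted above this code is not
performance critical.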

OTOH, this problem also occurs with -fno-strict-aliasing, so maybe there
is some real bug down there in the compiler. I'll try analysing it
later.

   

If I read http://mail-index.netbsd.org/tech-kern/2003/08/11/0001.html
correctly, it should violate the rules?

// ---
void xteaDecrypt(uint32_t v[2], uint32_t const k[4]) {
    uint32_t v0=v[0], v1=v[1], delta=0x9E3779B9, sum=delta*XTEA_ROUNDS;
    for (uint8_t i=0; i < XTEA_ROUNDS; i++) {
        v1 -= (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + k[(sum>>11) & 3]);
        sum -= delta;
        v0 -= (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + k[sum & 3]);
    }
    v[0]=v0; v[1]=v1;
}

void xteaDecryptCbc(uint8_t v[8], uint8_t cb[8], uint8_t const k[16]) {
    static uint8_t tmpbuf[8];
    memcpy(tmpbuf, v, 8);
    xteaDecrypt((uint32_t*)v, (uint32_t*)k);
    for (uint8_t i=0; i < 8; i++)
        v[i] ^= cb[i];
    memcpy(cb, tmpbuf, 8);
}

int main(void) {
    uint8_t b1[8], b2[8], b3[16];
    xteaDecryptCbc(b1, b2, b3);
}
// ---
   

   -- Lars



Re: [avr-gcc-list] in-line assembler

2009-08-11 Thread Wouter van Gulik

Hi,

I don't know what you intended to do but I guess this is more like it 
(read the value and write it back):

I personally prefer the %[] construction.

uint8_t get_ram_byte(uint16_t ram_address)
{
    uint8_t byte;

    asm ("ld  %[reg], %[adr]  \n\t"
         "sts %[adr], %[reg]  \n\t"
         : [reg] "=r" (byte)
         : [adr] "e"  (ram_address));
    return byte;
}

Your construction of "sts %0, __tmp_reg__" is not correct. GCC substitutes
r24 as the first argument of sts, which is invalid: sts expects a constant
address there. You are feeding it the uninitialized variable 'byte', which
is also allocated to R24.


HTH,

Wouter


Robert von Knobloch wrote:

Hello,
I've been trying to decipher the intricacies of in-line assembler (using
the Inline Assembler Cookbook as my guide).

I have a very simple application that I cannot seem to realise.

I want a C function that will return the contents of the RAM address
that I give it as argument.

My assembler-based function looks like this:

file is hex.c
=
uint8_t get_ram_byte(uint16_t ram_address)
{
    uint8_t byte;

    asm ("ld __tmp_reg__, %a1\n\t"
         "sts %0, __tmp_reg__\n\t"
         : "=r" (byte) : "e" (ram_address));
    return byte;
}

and is called from

rambyte = get_ram_byte(i );
u_hex8out(rambyte); // Print byte as 8-bit hex.

Trying to compile this results in "~/Monitor/hex.c:5: undefined
reference to `r24'".
If I comment out the line "sts %0, __tmp_reg__\n\t" then it
compiles and I see that the parameter is passed in R24,25, copied to
R30,31[Z] and the value is read into R0 [__tmp_reg__].
I cannot see what is wrong with the sts command or why R24 is mentioned.

Can anybody help me ?

Many thanks,

Robert von Knobloch.





Re: [avr-gcc-list] in-line assembler

2009-08-11 Thread Wouter van Gulik

OK, I had some more coffee:

uint8_t get_ram_byte(uint16_t ram_address)
{
    uint8_t byte;

    asm ("ld  %[reg], %a[adr]  \n\t"
         "st  %a[adr], %[reg]  \n\t"
         : [reg] "=r" (byte)
         : [adr] "e"  (ram_address));
    return byte;
}

I only compiled, not assembled...
This compiles and assembles. Note the extra 'a' in front of adr and the 
st instead of sts
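
For comparison, a plain read needs no inline assembler at all: a volatile
pointer dereference leaves the choice of load instruction to the compiler.
A minimal sketch, assuming only the read is wanted. The parameter is widened
to uintptr_t (an assumption, so the example also runs on a host where
addresses do not fit in 16 bits), and get_ram_byte_c is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical plain-C variant of get_ram_byte(): read one byte from the
 * given address. volatile keeps the compiler from optimising the access
 * away. */
uint8_t get_ram_byte_c(uintptr_t ram_address)
{
    return *(volatile uint8_t *)ram_address;
}
```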


HTH,

Wouter




Re: [Fwd: Re: [Fwd: Re: [avr-gcc-list] in-line assembler]]

2009-08-11 Thread Wouter van Gulik

Robert von Knobloch wrote:

Thanks Jan,
I have reached this conclusion too, I didn't understand the
compiler/assembler interaction (and still don't fully, I can't get an
sts var, Y to work, but I'll work at it).


Is it me or are you looking for st var, Y (note the missing trailing s).

HTH

Wouter




Re: [avr-gcc-list] AVR simulators

2009-04-02 Thread Wouter van Gulik

Schwichtenberg, Knut wrote:

I'm aware of several AVR simulators running on different OSes:
VMLAB, AVRStudio, avrora, simulavr*, and there was one mentioned here specially
developed for the gcc regression tests. Could anyone forward the URL for this
simulator, please?



It is called avrtest and currently is in the winavr repo.

http://winavr.cvs.sourceforge.net/viewvc/winavr/avrtest/

HTH

Wouter




Re: [avr-gcc-list] Howto I/O in asm instructions?

2008-07-03 Thread Wouter van Gulik

Ruud Vlaming wrote:

Hi,

There are a couple of ways to use I/O addresses in assembly.
Below I used some of them:

#define _EEARL_ 0x1E
uint8_t portFSReadByte(unsigned char * pAddress)
{ uint8_t result;
  asm volatile (
   "in   r26, __SREG__  \n\t"
   "cli                 \n\t"
   "out  %2, %A0        \n\t"
   "out  _EEARL_, %B0   \n\t"
   "sbi  __EECR__, 0    \n\t"
   "in   %A0, 0x1D      \n\t"
   "out __SREG__, r26   \n\t"
   :"=r" (result) :"0" (pAddress), "I" (_SFR_IO_ADDR(EEARH)) : "r26" );

  return result; }

(1) You can define a constant, like I did for _EEARL_.
  This is nice but not so portable.

(2) You can use the asm parameter list, like for EEARH.
  Also nice, but error-prone, since I keep making mistakes in
  numbering, especially when there are many parameters and
  something has to change.

(3) The best thing to have would be something like __EECR__,
  resembling the definition for __SREG__, but that does not
  compile right now.

Is the latter possible somehow? Or are there other solutions?


You could go for option 2 if you use syntax like this (took this from 
eeprom.h)


__asm__ __volatile__ (
    "/* START EEPROM WRITE CRITICAL SECTION */\n\t"
    "in    r0, %[__sreg]           \n\t"
    "cli                           \n\t"
    "sbi   %[__eecr], %[__eemwe]   \n\t"
    "sbi   %[__eecr], %[__eewe]    \n\t"
    "out   %[__sreg], r0           \n\t"
    "/* END EEPROM WRITE CRITICAL SECTION */"
    :
    : [__eecr]  "i" (_SFR_IO_ADDR(EECR)),
      [__sreg]  "i" (_SFR_IO_ADDR(SREG)),
      [__eemwe] "i" (EEMWE),
      [__eewe]  "i" (EEWE)
    : "r0"
);

This makes the code far more readable IMHO. For more info on the %[ ]
notation take a look at:
http://gcc.gnu.org/onlinedocs/gcc-4.3.0/gcc/Extended-Asm.html


HTH

Wouter





Re: [avr-gcc-list] Howto I/O in asm instructions?

2008-07-03 Thread Wouter van Gulik

Ruud Vlaming wrote:

On Thursday 03 July 2008 08:43, Wouter van Gulik wrote:

Ruud Vlaming wrote:

(2) You can use the asm parameter list, like for EEARH ...



You could go for option 2 if you use syntax like this
http://gcc.gnu.org/onlinedocs/gcc-4.3.0/gcc/Extended-Asm.html


That is indeed a good solution, which I was not aware of.
Thank you for the tip; I will use this until I find an answer
to the question below.

Most beautiful would be if you could somehow define __EECR__
'in the background' once, so it is available in every asm routine you
write, like __SREG__ is defined.  Do you know if this is possible at all?
It seems avr-libc does not do so by itself, and it is a little less simple
than just defining the values somewhere. It must be (automatically)
architecture dependent and globally visible.



You could pass -D__EECR__=<value of EECR> on the command line, but having
a lot of defines for different architectures is not a good idea IMHO.

The __SREG__ defines are built into GCC.

HTH

Wouter




Re: [avr-gcc-list] Checking carry flag and Rotate operation in C.

2008-06-17 Thread Wouter van Gulik

Jonathan Blanchard wrote:

Funny enough I was just debugging some error in my inline assembly
code. I was pretty amazed that GCC can actually transform ADC R0 R0 to
ROL R0.



In binary it is the same instruction. ROL Rx does not exist as a separate
opcode; it is just a short form for ADC Rx, Rx. Likewise:

LSL Rx is just ADD Rx, Rx
CLR Rx equals EOR Rx, Rx
SER Rx is LDI Rx, 0xFF and
TST Rx is AND Rx, Rx.
I once found a website with (almost) all duplicates, but I can't find it
anymore.


HTH

Wouter


Jonathan Blanchard
[EMAIL PROTECTED]


On Mon, Jun 16, 2008 at 9:18 PM, Andy H [EMAIL PROTECTED] wrote:

Internally gcc understands rotate.

So I looked up how gcc might expect rotate to be expressed.

It is
  unsigned char a;
  return (a<<cx) | (a>>cy);

- where the two constants cx, cy add up to the size of the mode (1+7=8 bits)

However, since we have not defined AVR instruction patterns to gcc for
rotate, it will produce code using shifts. I think this is worthy of a bug
report or at least a place on the TODO list.
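
Written out for an 8-bit value, the pattern described above looks like the
following. This is a minimal sketch; rotl8 and rotr8 are hypothetical helper
names, and whether a given avr-gcc version turns them into rotate
instructions rather than shifts depends on the compiler, as noted above.

```c
#include <stdint.h>

/* Rotate left by n (taken modulo 8); n == 0 returns the value unchanged. */
uint8_t rotl8(uint8_t a, unsigned n)
{
    n &= 7;
    return (uint8_t)((a << n) | (a >> ((8 - n) & 7)));
}

/* Rotate right by n (taken modulo 8). */
uint8_t rotr8(uint8_t a, unsigned n)
{
    n &= 7;
    return (uint8_t)((a >> n) | (a << ((8 - n) & 7)));
}
```

The two shift counts always add up to 8 (or are both 0), which is the
"size of mode" condition from the message.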

Andy


Andy H wrote:

The simple answer is that you can't. Though we could do with this in
the library and/or gcc patterns (builtin rotate).

This is close:


unsigned char foo(unsigned char b)
{
    if (b & 128)
    {
        b <<= 1;
        b ^= 0b00011101;
    }
    else
    {
        b <<= 1;
    }
    return b;
}


unsigned char bar(unsigned char b)
{
    if (b & 128)
    {
        b ^= 0b0001110;
        b <<= 1;
        b |= 1;
    }
    else
    {
        b <<= 1;
    }
    return b;
}

Jonathan Blanchard wrote:

Hi,

I got two question about programming with AVR-GCC. Both are related to
finding a way to generate a specific output in assembler.

First, how do you create the rotate operation in C? Specifically, how
can ROL and ROR be generated?

Secondly I have this piece of code where b is a 16 bit unsigned integer :

   b = b << 1;
   if( b & 256 )
   b = b ^ 0b100011101;

To optimize that I only need to check if b overflows at the left shift
operation by checking the carry flag. I'm trying to find a way to do
that in C. Right now I'm using the following piece of inline assembly
to do the trick :

   asm volatile(
       "LSL %0     \n\t"
       "BRCC 1f    \n\t"
       "EOR %0, %1 \n\t"
       "1:         \n\t"
       :"+d" (b)
       :"r"  (PPoly)
   );

In this last piece of code b is an 8 bit unsigned integer and PPoly is
an 8 bit unsigned integer with the value 0b00011101. I'm just curious
to know if it's possible to achieve the same result only by using C.

Jonathan Blanchard




RE: [avr-gcc-list] Updates needed to AVR test files

2008-06-01 Thread Wouter van Gulik
Hi Andy / list,

I was just curious, how many issues are still left? IIRC the last time you
posted some results you had ~50 torture execution and ~500 compile issues.

IIUC this should ideally be patched into the board file in the WinAVR CVS
repository, or is there another place where AVR board files live these days
(in a GCC repository)?

Wouter

 -----Original Message-----
 From: [EMAIL PROTECTED] [mailto:avr-gcc-
 [EMAIL PROTECTED] On Behalf Of Andy H
 Sent: Sunday 1 June 2008 1:35
 To: AVR-GCC; Paulo Marques; Mike Stein; Weddington, Eric; Anatoly Sokolov
 Subject: Re: [avr-gcc-list] Updates needed to AVR test files
 
 Small typo - in COMPLEX_INT
 
 Should be
 
 set COMPAT_SKIPS [list {VA} {COMPLEX_INT}]
 
 not plural
 
 Duh!
 Andy
 
 
 Andy H wrote:
  BTW you can also just define these in environment and leave board file
  unchanged
 
  set COMPAT_SKIPS [list {VA} {COMPLEX_INTS}]
  set COMPAT_OPTIONS [list [list {-Os -mcall-prologues} {-Os
  -mcall-prologues}]]
 
  Andy
 
  Andy H wrote:
  There are a couple of changes needed to AVR test files to pass a few
  tests.
 
  Compatibility tests default to no optimization and maximum tests -
  this can easily overflow 128K code area.
  Add these lines to end board file (mine is called
  atmega128-simnew.exp). They set environment vars that control these
  tests and get many more to work. (Some still need other fixes).
 
  # Restrict compatibility tests. And optimise to reduce size.
  set COMPAT_SKIPS [list {VA} {COMPLEX_INTS}]
  set COMPAT_OPTIONS [list [list {-Os -mcall-prologues} {-Os
  -mcall-prologues}]]
 
  Dummy io/exit/abort  file exit.c  has unused parameter stream. The
  warning created  then causes a failure in some tests. Hack as follows
  to create
  dummy reference to stream, thus removing the warning.
 
  int putchar_exit_c(char c, FILE *stream)
  {
 *((volatile unsigned char *) STDIO_PORT) = c;
 stream = NULL;
 return 0;
  }
 
 
  best regards
  Andy
 
 
 
 
 
 


Re: [avr-gcc-list] Add builtins in avr target.

2008-04-22 Thread Wouter van Gulik

Anatoly Sokolov wrote:

Hello.

I have considered all proposals on changing '__builtin_avr_delay_cycles' 
builtin. Also has  added '__builtin_avr_fmul*' builtins.



There is one thing that crosses my mind. A user would not only want a

void    interrupt_enable  (void)
uint8_t interrupt_disable (void)

but also a

void interrupt_restore  (uint8_t)

In 99% of the cases where interrupt_disable is used, the user needs an
option to restore the interrupt flag to its previous state.
This feature would also make the atomic access builtin set of GCC 
functions reachable. See 
http://gcc.gnu.org/onlinedocs/gcc-4.3.0/gcc/Atomic-Builtins.html#Atomic-Builtins

for more detail.
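
The reason a restore call matters is nesting: a function that disables
interrupts may itself be called with interrupts already disabled, and must
not re-enable them on exit. The sketch below only simulates this on a host
with a fake status register; fake_sreg, SREG_I, counter_increment and the
function bodies are stand-ins made up for illustration, not the proposed
builtins.

```c
#include <stdint.h>

#define SREG_I 0x80u                 /* global interrupt enable bit (bit 7) */
static uint8_t fake_sreg = SREG_I;   /* simulated status register */

void interrupt_enable(void) { fake_sreg |= SREG_I; }

/* Disable interrupts and return the previous state. */
uint8_t interrupt_disable(void)
{
    uint8_t old = fake_sreg;
    fake_sreg &= (uint8_t)~SREG_I;
    return old;
}

/* Put the interrupt flag back to what interrupt_disable() returned. */
void interrupt_restore(uint8_t old)
{
    fake_sreg = (uint8_t)((fake_sreg & (uint8_t)~SREG_I) | (old & SREG_I));
}

static uint16_t counter;

/* Safe to call with interrupts enabled or disabled: the caller's state
 * is preserved either way. */
void counter_increment(void)
{
    uint8_t state = interrupt_disable();
    counter++;
    interrupt_restore(state);
}

int interrupts_enabled(void) { return (fake_sreg & SREG_I) != 0; }
```

With only enable and disable available, counter_increment would have to
re-enable interrupts unconditionally, which breaks a caller that had them
disabled.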

HTH

Wouter





RE: [avr-gcc-list] Illegal opcode errors building gcc-4.2.3

2008-04-20 Thread Wouter van Gulik


 -----Original Message-----
 From: Rick Mann [mailto:[EMAIL PROTECTED]
 Sent: Saturday 19 April 2008 19:49
 To: Wouter van Gulik
 Subject: Re: [avr-gcc-list] Illegal opcode errors building gcc-4.2.3
 
 
 On Apr 19, 2008, at 7:28 AM, Wouter van Gulik wrote:
 
  This is because binutils 2.18 does not support avr architecture 35.
  Use binutils 2.18.5 or more recent. Search the gcc mailing list for
  some more info.
  If you are building for Linux, maybe you should take a look at the
  Linux build scripts sticky post on
  http://www.avrfreaks.net/index.php?name=PNphpBB2&file=viewtopic&t=42631
  You need to be logged in to actually download the scripts. These
  scripts are for 4.2.2.
 
 
 I finally realized I had to be logged in to find those. I build for
 Mac OS X, using a script I made myself, but it does not apply patches
 (it makes the mistaken assumption that recent GCC versions actually
 work out-of-the-box for AVR). My script also builds GDB and AVaRICE.
 
 I don't see a binutils-2.18.5 where I normally look:
 
   http://ftp.gnu.org/gnu/binutils/
 
 Is there another place I should be looking?
 

ftp://sourceware.org/pub/binutils/snapshots It is the snapshot. I do not
know if the current version is broken for the AVR.

 I finally used your scripts and got a toolchain working that supports
 the ATmega324P (the need that triggered all this). Thank you for those.
 

The scripts are not mine. That would be too much credit. But it is good to
know your setup works.

HTH,

Wouter

PS: please use reply-all next time, so it gets to the list, just in case
someone else has the same problem.





Re: [avr-gcc-list] Add builtins in avr target.

2008-04-17 Thread Wouter van Gulik

Anatoly Sokolov wrote:

Hi.


  I wish to add:

..

3. builtin similarly to IAR '__delay_cycles';

..

This unfinished patch adds the '__builtin_avr_delay_cycles(long delay)' builtin to
the avr backend. The 'delay' parameter should be constant.

If 'delay' is 1 or 2 then one or two 'nop' instructions are generated.



For a 2-cycle delay an rjmp can be used. Saves an instruction!


If 'delay' is from 3 to 756 then code:
        ldi rX, (delay/3)
    1:  dec rX
        brne 1b
is generated. 'ldi' instruction can be removed by optimizer.

For 'delay' from 757 to 196605 loop is:
    1:  sbiw Rx,1
        brne 1b

For 'delay' from 196606 to 83886075 loop is:
    1:  subi %0,1
        sbci %B0,0
        sbci %C0,0
        brne 1b

And for 'delay' from 83886076 to 0x loop is:
    1:  subi %0,1
        sbci %B0,0
        sbci %C0,0
        sbci %D0,0
        brne 1b



That is high register usage: 4 registers used just for burning cycles?
On the other hand burning cycles this way will probably never be used in 
real code.
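
The ranges quoted above amount to a simple selection on the constant
argument. A host-side sketch of that selection, using only the limits given
in the message; the enum and delay_loop_kind are hypothetical names, and the
actual patch may organise this differently.

```c
#include <stdint.h>

enum delay_kind {
    DELAY_NOPS,      /* up to 2 cycles: one or two nop */
    DELAY_LOOP8,     /* 3..756: ldi / dec / brne */
    DELAY_LOOP16,    /* 757..196605: sbiw / brne */
    DELAY_LOOP24,    /* 196606..83886075: subi / sbci / sbci / brne */
    DELAY_LOOP32     /* anything larger: subi / sbci / sbci / sbci / brne */
};

/* Pick the loop variant for a constant cycle count, following the
 * thresholds listed in the message. */
enum delay_kind delay_loop_kind(uint32_t cycles)
{
    if (cycles <= 2UL)        return DELAY_NOPS;
    if (cycles <= 756UL)      return DELAY_LOOP8;
    if (cycles <= 196605UL)   return DELAY_LOOP16;
    if (cycles <= 83886075UL) return DELAY_LOOP24;
    return DELAY_LOOP32;
}
```

The number of counter registers grows by one with each step, which is the
register usage questioned above.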



Adding '__builtin_avr_delay_cycles' builtin will allow to remove restrictions 
on max possible values of parameter for '_delay_us' macro and reduce code size 
for long delay of  '_delay_ms' macro.

Also it will simplify porting code from IAR C, if define  '__delay_cycles' as 
'__builtin_avr_delay_cycles'.

Do you think this builtin will be useful?



Yes it is useful, especially the really short ones.

Wouter





Re: [avr-gcc-list] RE: Patch Fix PR35013, PR27192

2008-04-17 Thread Wouter van Gulik

Andy H wrote:
 I think I have found a simple fix.

 I changed gcc so that offsets added to assembler symbols are doubled. So
 in C when we use foo+2 this gets sent to the assembler/linker as
 gs(foo+4).


 This has the effect that offsets or arithmetic are consistently in words
 - on a word pointer. (which makes more sense)

 Now it does not matter if optimisation  creates  p=foo+2  OR p=foo,
 p=p+2 as the result will be the same.
 I attach test program I used to check several variant and it worked.
 Apart from normal warning messages about linker stubs. There also lst
 and lss files you can look at what gets send to assembler and code
 produced.


It looks ok to me (just looking at the lss, not rebuilding gcc)
but the code is not optimal. It is moving to r18 doing the operation and 
then moving it back to r24. Is this because of your patch? Or something 
else?


HTH,

Wouter




Re: [avr-gcc-list] optimization flags causes problems??

2008-04-17 Thread Wouter van Gulik

Ramazan Kerek wrote:

Hello,
I have started using AT90CAN128 with WinAVR-20080402.
I am having problems with optimization flags.



Do not use 20080402, use 20080411.

HTH

Wouter





Re: [avr-gcc-list] [AVR gcc link error] relocation truncated to fit: R_AVR_13_PCREL ...

2008-04-16 Thread Wouter van Gulik

Emmanuel Bourien wrote:

Hello,
I got and installed the latest version of WinAVR (April 08 - GCC 4.3.0).
I get this "relocation truncated to fit: R_AVR_13_PCREL" error during
the link operation.

I'm working with AVR Studio 4.12

I've read i should add -lm to the linker command to avoid this...but no 
success :(

So i need your help to avoid this problem!

Excuse my english ;)
thanks for your help. Best regards
emmanuel



You are almost there.
You need to add -lm at the end of the linker command. See
http://www.nongnu.org/avr-libc/user-manual/FAQ.html#faq_libm


HTH,

Wouter




Re: [avr-gcc-list] Re: Patch Fix PR35013, PR27192

2008-04-16 Thread Wouter van Gulik

Andy H wrote:

RFC

A problem has come up trying to fix function pointer arithmetic bugs.


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35013
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27192


I created patch to solve this, but Anatoly has found a problem.

Without patch we had  func returning word address and (func + 2) 
returning byte address. This occurred because AVR backend did not 
recognise (func+2) as program memory and did not use linker directive 
gs(func+2).


With patch we get  func and func+2  both returning word address. Which 
solved the bugs reported.


Now if such a word pointer undergoes further arithmetic in c,  then it  
will, of course, be adjusting it by words.


The problem that Anatoly discovered is that optimization can  break 
this. His example involved volatile pointers but it will happen on more 
realistic cases. For example if we create pointers to Foo, Foo+2,  
Foo+4, optimization will try and use one pointer  to create values of 
other pointers.  So  we will get word address of Foo offset by  words - 
or word address of Foo offset by bytes!  This just depends if the offset 
calculation is done in gcc or the linker. Ag!




Just for my understanding. The programpointer calculation is thus 
translated to datapointer calculation?


There probably is no way of telling GCC we are dealing with different 
pointer types here? Since GCC can not handle different memory types at all.



Fixing this is not simple. The following are ideas only - please add some more.

1) One way is to get the linker to perform maths on gs() using word offsets.
gs(foo+2) would be the word address of foo + 2 words. Then it would not
matter whether gcc or the linker performed the maths. I do not know if this is
possible or what problems this might create.


2) We could avoid using gs() and get the function address as byte address +
byte offset.  This would require extra run-time code for function
pointer calls (to divide by two). It is useful in that the same pointers
could be used to read bytes from program memory.


3) Give up and don't permit any arithmetic. (Perhaps an error or warning
would still be needed.)


4) Like (1) but use new directive gsw() to permit this method?

5) Like (2) but use attribute to permit this method?

6) Get gcc to recognize constant pointer maths and exclude it from 
linker gs()


7) Get gcc to recognize constant pointer maths and pass to linker as 
gs(Foo) + n instead of gs(Foo+n) - if this is possible.



Please add to discussion.




So the main point is to keep the knowledge that the add is in words, not
bytes. Right?
Is it not possible to do all program-space addition at byte level until
the linker, which then converts it to words? E.g. foo+2 becomes
foo+linker_know_this_is_2_words.


Wouter




RE: [avr-gcc-list] testsuite saga continues

2008-02-07 Thread Wouter van Gulik
 The logging version will always be slower. This is not just a matter of
 outputting the log, it is also a matter of building the log.
 
 We can avoid the output cost by only printing the last N lines, but we
 can not avoid the build cost. The code to do this was there at some
 point, but I decided to remove it, because under Linux you probably can
 do the same by running avrtest_log test_program | tail -n N and it
 should run almost as fast as a native solution.
 

The only information you print is register info, right? Since the parsing is
so heavy, would it make sense to save the complete register file (up to SPH)
and parse it afterwards? It is just 96 bytes.
The only info missing would be the addresses of memory accesses, but that
could be ignored, since it's only load/store. When the address is really
relevant, just re-run using the log version.
Just thinking out loud; it is probably nasty to create and the gain is
almost nothing...

 So, I can add a --tail option to the log version, but the naked
 version will never be able to print any log at all, so that it runs as
 fast as possible.
 
 Remember that the main purpose of avrtest is to run gcc's testsuite.
 While running the testsuite, having a log is useless, but speed is
 important.
 

Yes you are right, you can't have it all.

 BTW, I've done some more optimizations and the version I have now is
 almost twice as fast as the one on CVS, doing 30 P4 clocks per AVR
 clock, i.e., on my P4 3GHz I can simulate a 100MHz avr :)
 
 
 I don't have those numbers right now, but since there are tests that
 don't even fit in 128Kb of flash, there are probably some more that
 don't fit on 8Kb.
 

Aha, well, it is going to be hard to test the 8Kb parts then. Are these over
128Kb even in an optimized build? I can imagine them not fitting when using -O0.

 
  Do you already have a format for doing this? XML based?
 
 Nop. I haven't even started to think about the details.
 
 I would give my full support to anyone trying to setup a benchmarking
 framework, though ;)
 

Hmm, nope, I have never done such a thing before. Let me first try to get gcc
compiling on my Windows machine. It is a lot faster than my Linux
machine (dual-core 2.1 GHz vs Duron 1.3 GHz).

Wouter





RE: [avr-gcc-list] testsuite saga continues

2008-02-04 Thread Wouter van Gulik
 
 Just a little note to let you know that the above change and other small
 cleanups have been committed to cvs.
 
  From the changelog:
 
 - give more information at program exit
 - cleanup a lot of #ifdef's
 - change the timeout from cycles to instructions, because the simulator
 runs slightly faster this way
 - add a barrier for the stack at 0x60, that makes avrtest abort with
 stack overflow when crossed
 

Yes I have seen it, and used it. My clz no longer passes the test. It bails
on stack overflow. But if I comment out the long long parts, all is ok.

 The next step will definitely be ELF loading support. With ELF loading,
 I can decode symbols like __bss_end to know where the stack overflows
 exactly or use __stack to know where the stack underflows. I can also
 do a more symbolic log, by decoding addresses to their symbol names.
 

That would be very cool! Although a dump log at an abort might also be
useful when debugging testcases.

Some more thoughts about the smaller AVRs. I did not intend to catch wrong
instructions; I was aiming at finding bugs that do not apply to the
mega series. Because if there are bugs in the less capable devices, they are
very likely to be in the avr backend, which is easier to fix.

Can't avrtest/gcc fake an avr2 device (e.g. at90s8535) with tons of flash and
RAM? Just like you now fake huge amounts of external memory?

Thanks for the good work!

Wouter





Re: [avr-gcc-list] testsuite saga continues

2008-01-31 Thread Wouter van Gulik

Andrew Hutchinson wrote:
One might argue that carry is the result of a compare with the largest
integer value (255 for bytes). But these situations do not directly arise
in C - or I assume any other supported language - so it is not
considered. (Though the ability to propagate carry would indeed help
create mode-independent arithmetic operations.)




Having carry as a condition code indeed does not seem very useful.
But the biggest benefit of teaching gcc about the carry is the
propagation of the carry; that is my main concern. Is it not possible to
create a special register for carry (not in cc0), just for doing
arithmetic using carry? This would lead to an expansion of
sub/add/shift/cmp(?) into simple byte patterns.

This would give gcc much more knowledge of what's going on.

This is close to what Dave has suggested in the other thread.

I have too little knowledge of gcc's internals to oversee all the
consequences; I guess there are very good reasons not to do this.


Wouter






FW: [avr-gcc-list] testsuite saga continues

2008-01-31 Thread Wouter van Gulik
On behalf of Andy I am forwarding this:


From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Subject: Re: [avr-gcc-list] testsuite saga continues

Wouter,
 
You are correct. gcc will treat any structure that is 8 bytes or less as a
single register, so such structures will likely have the same problem.

I am trying to come up with fixes to several bugs and hopefully this
one sometime.

avr-gcc is odd in that it accepts 8 bytes as the maximum rather than 4. I am
not sure why this was needed without all the other parts for long long that
must also work. However, perhaps this was intended for double support.

To get long long to work, it would be wise for all other patterns to be
reduced to byte-level expansion (where possible). This would exclude
shift/compare/add etc. that need carry operations. That would minimise the
amount of work required.

New patterns would include move DI (8 bytes) and perhaps add/sub/compare.

Also, it would be wise to change the priority of register allocation (which I
have a fix for), since the current allocation does not easily permit 8
contiguous registers to be allocated (the first is r24!). So you get a mess of
stack operations and register moves.

Of course it will still be slow!

I do not intend to fix problems for long long - unless they also fix
problems in other areas.

Please post to list - this email will not work for me.




-Original Message-
From: Wouter van Gulik [EMAIL PROTECTED]
To: Paulo Marques [EMAIL PROTECTED]
Cc: avr-gcc-list@nongnu.org
Sent: Wed, 30 Jan 2008 3:56 am
Subject: Re: [avr-gcc-list] testsuite saga continues
 Do you have a clue on why the tests fail? There is an ugly bug concerning
 stack allocation and 64 bit variables, maybe that is the evil one. See:
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27386 for details.

 All the failed tests I've seen so far do in fact pass long long
 arguments to functions together with a bunch of other arguments (sometimes
 using va_args, too).

 In one of the cases (gcc.c-torture/execute/20030307-1.c), the test only
 fails at -O0 and -O1, but passes with other optimization levels because
 the functions get inlined and disappear completely, so the argument
 passing problem disappears too.

 So, I would say that it is very likely the same bug...
  
So this means that 64 bit is mostly supported; it is only the stack
allocation bug that makes it hard (if not impossible) to use.
We should really try to find someone who can fix this nasty bug.
 
Note that all sorts of stack-passed arguments can go wrong, so it's not only
64 bit.
 
Wouter 





RE: [avr-gcc-list] testsuite saga continues

2008-01-31 Thread Wouter van Gulik

 In the meantime, I tried to not mark long long as unsupported, with
 similar results.
 
 Without no_long_long 32 more tests fail, but 556 fewer
 tests are marked as unsupported. Which means that 64 bit long long's
 are mostly supported in fact.
 
 The question is: are long long's officially supported? Should we be
 running the tests that use them?
 
 BTW: 64 bit long long is really hard for a 8 bit microcontroller. At
 least one of the tests (with -O0 optimization) was initially failing
 from timeout, which means that it was taking more than 500 million
 cycles to execute. Increasing the timeout to 2 billion cycles solved the
 problem, though.
 

Well, today I found out why this could be. I am testing a new version of
the clz fixes and I also implemented some DI versions (DI = double int = 64
bit in gcc's internal terms).

To my surprise some options did not change a thing in cpu cycles, while the
program got much shorter... So I took another look at it, and guess what...
The stack usage was too high, so it was now pushing its values into I/O
memory, including the special exit code memory.
The program now exited successfully on a push r15 :D

Can you make avrtest check for stack overflow?

Wouter





Re: [avr-gcc-list] testsuite saga continues

2008-01-31 Thread Wouter van Gulik

Paulo Marques wrote:



The program used more than 4k of stack? Yikes!



Well, I think it's the 64 bit stack bug... if anything goes wrong with
the stack you might end up having a huge stack. It's a bug in the program.



Can you make avrtest check on stack overflow?


I can, specially if I start accepting command line arguments to define 
memory regions, so that I also know where the stack really ends.


I'll post a new version as soon as I have this. In the meanwhile, you 
can work around that specific problem, by switching the addresses of the 
exit and the abort ports, so that the abort port is hit first ;)




Yes I already thought about doing so.

Could you then also print the real flash address of the exit, just like
you do with the log?

And the total number of cycles passed?

Thanks in advance,

Wouter





Re: [avr-gcc-list] testsuite saga continues

2008-01-30 Thread Wouter van Gulik


I have not dug enough into the details of gcc, but I thought that flags 
were only visible at a low level, such as in the avr.md file, where you 
are defining the assembly code sequences for different effects.  Thus it 
is possible to define a 16-bit addition instruction with an add, adc 
sequence - but you can't really make use of the carry flag after that. 


Yes this is exactly what I wanted to point out. The carry is now only 
used in handwritten assembler (in avr.md). GCC's RTL does not know 
anything about the carry bit being available when it's set/cleared and 
when it's clobbered.


HTH,

Wouter





RE: [avr-gcc-list] (no subject)

2008-01-28 Thread Wouter van Gulik
 
 Try casting the expression (foo +1) to a function pointer. This may
 prove the point and correct the code in table - and elsewhere.
 
 Arithmetic with function pointers is probably not well standardized - if
 at all.
 

I agree. Although giving different results for the same statement is very
nasty.

I tried casting but I could not get any difference. When I tried to cast
both operands to a function pointer I got a message that + is not a legal
operation. Which seems ok and logical to me.

Is there any way we can make this warning an error (by default)?
main.c:14: warning: internal error: out of range error

So the user knows something is broken in his code.

Thanks for the help

Wouter

 Andy
 
 

RE: [avr-gcc-list] (no subject)

2008-01-28 Thread Wouter van Gulik
 http://gcc.gnu.org/onlinedocs/gcc-4.2.2/gcc/Pointer-Arith.html#Pointer-Arith
 
 The elements to which function (and void) pointers refer are assumed to
 be size 1 byte.
 
 So if you really want to mess with these pointers, you must treat them as
 byte addresses.
 

Exactly what I thought. If I do nasty things I should know what I am doing. 

 That does not explain the other problems reported directly. However,
 given foo is a function pointer, what is the type of the expression foo
 + 1? Perhaps gcc treats such an expression as void?
 

Exactly, the table is still messed up by gcc.
Any idea how I/we can test this? I looked at the -da output but I could not
find anything related to the table.

Is the avr backend involved in generating the correct function pointer
addresses? Where is this gs() coming from? I searched through the as
documentation but I could not find it. Is it from ld?

Are there other platforms supported by gcc that have the same strange unequal
data/program spaces?
I know that the TI C54x series has an 8 bit program space and a 16 bit data
space: sizeof(char) == sizeof(int), both 16-bit!, but instructions and
function addresses are in bytes.
Maybe we can find some hints there?

HTH

Wouter

 Andy
 
 
 
 Andrew Hutchinson wrote:
  I think you highlight the problem for gcc.
 
  We have to treat program memory as byte addressable to support LPM.
 
  Direct function calls only want a word address to form the correct
  opcode. But we use byte address labels and the assembler removes the
  redundant bit to form the correct opcode.
 
  Indirect (icall) function calls show up the anomaly, as these are formed
  outside of the assembler.
 
  Gcc is assuming that the item that a function pointer points to is
  size 1, when in fact it is size 2.
 
  This is similar to having a pointer to some other object such as long:
 
  long *ptr;
  x = ptr+1;  /* x will be assigned byte address ptr+4 */
 
  So if we can correct that mistake, I believe the problem is resolved.
  Now, I am not sure how gcc determines that size! So I will look.
 
  Andy
 
 
 


[avr-gcc-list] (no subject)

2008-01-27 Thread Wouter van Gulik
Compiling the following program ends up in 
main.c:(.text+0x2): warning: internal error: out of range error

===== main.c =====

//Dummy func
void foo(void) {}

//Table with address manipulation
void (* const pFuncTable[]) (void) = {
foo + 0,
foo + 1, //need odd offset
};  

int main(int argc, char* argv[]) {
//Call table
pFuncTable[1]();

return 1;
}   

Looking into the generated assembler gives:

pFuncTable:
.word   gs(foo)
.word   foo+1

Which is wrong. It should have been gs(foo + 1) or perhaps gs(foo)+1

But the truly wrong thing is that gcc outsmarts the table (since it's const)
and directly does: call foo+1. This gives the internal error.
Even worse is that the compiler does not stop!! IMHO it should stop here;
instead it generates this final assembly:
00a6 <main>:
  a6:   0e 94 00 00     call    0       ; 0x0 <__vectors>
  aa:   81 e0           ldi     r24, 0x01       ; 1
  ac:   90 e0           ldi     r25, 0x00       ; 0
  ae:   08 95           ret


Before I post a note to the existing bug report (it's probably related to
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27192 ) I want to know what foo
+ 1 is supposed to do. GCC seems to mix up byte addresses (for lpm) and word
addresses (for ijmp/jmp/icall/call).
Is it supposed to increment the byte address or the word address?
I guess byte addresses are what it's supposed to be, since calling foo + 2
ends up calling foo + 2 bytes, leaving foo + 1 as an illegal address.

And I just found another nasty error:

//Dummy func
void foo(void) {}

//Table with address manipulation
void (* const pFuncTable[]) (void) = {
foo + 4, //even offset this time
};  

int main(int argc, char* argv[]) {
//Call table
pFuncTable[0]();

return 1;
}   

This will generate a correct call (4 bytes after foo) but the value in the
table is not left shifted! Meaning that a call via the table will generate a
call to the wrong address, while the original call is ok.

Wouter





[avr-gcc-list] Small bug in avrtest

2008-01-25 Thread Wouter van Gulik

Hi list/Paulo,

I just wanted to let you know before someone else also spends an evening 
searching this bug :D


The file is opened with the "rt" option. I don't know what it's supposed
to do, but it makes my MinGW Windows build read only about half of each
file, leading to illegal "pc out of bounds" errors.


I changed it to "rb" and all was good.

HTH,

Wouter




Re: [avr-gcc-list] Optimisation help

2008-01-21 Thread Wouter van Gulik

Magnus Johansson wrote:


I totally get the second and third reads. But the first one, just moving 
 r24 to r17 will only work if r24 is only 0x00 or 0x01 not otherwise...?


What should I do?



Well, I can't see all of the assembler, so this is a bit of a guess. GCC is
probably going to do a conditional load. Since result/r17 could be
loaded after the first call, it will do so. So it will probably generate
a clr and a conditional ldi r17, 1 or something similar.


It would help if you provided all of the assembler between the first and
the second call.


HTH

Wouter






Re: [avr-gcc-list] How to get low byte off a function address?

2008-01-21 Thread Wouter van Gulik

Erik Christiansen wrote:

On Sat, Jan 19, 2008 at 04:15:35PM +0100, Wouter van Gulik wrote:

How do I do such a thing? Using the lower 8 bits is possible when loading a
register so why not in a table?


In the past, we've encountered other relocations that aren't handled by
the avr port of binutils. It does look like this is another case that
avr-ld hasn't been tweaked to handle.

It sounds like you've tried one work-around, i.e. loading a register,
then writing to the table, now necessarily in RAM. That's workable, code
space allowing.



Ehm, no, not exactly. I am wondering why something like this works:
ldi r31, lo8(gs(foo))

and this does not:
.byte lo8(gs(foo))

Why does as or ld (?) state that the latter is not constant, while the
former is no problem?


After some more testing I found out that even constructions like:

.byte lo8(1024)

are not allowed.

Is this a bug?


The learning curve for binutils internals being a bit too steep for a
quick toolchain tweak, I'd alternatively be tempted to invoke a few
lines of awk (from the makefile) to snaffle the absolute addresses from
the map file, insert them in the table, reassemble that file, and link
again. (Pre-existing dummy .byte lines would ensure addresses don't move
in flash.) That's perhaps worthwhile if you're chasing this either
because a RAM-resident table, or the copy loop, is intolerable in the
tiny bootloader. Granted, this comes close to winning an ugly contest,
but it pretty much has to work(tm).

If the file with the function pointer table is linked last, then the
others can be incrementally linked, and the table file linked after
being awked.



I am not afraid of winning the contest. As long as it saves flash I am in
for it :D

That would be another option, yes.



An afterthought: You could alternatively put the foo() functions into a
separate output section, allowing the linker script to place the block
of them at a fixed address. (Each could in fact be placed in an
individual section.) The table could in the latter case be filled with
constants. (It's not real pretty either, is it?)



Well, the whole idea was to have it constant. I wanted to reduce the big
cpi/brne tree, and so I came up with this reduced-size jump table idea.
The table in the real application should also contain an opcode.
So the idea was to check against opc and then ijmp/icall to the correct
function. After I wrote the assembler it turned out to be 2 bytes
shorter than my cpi/brne... But I thought it reads better and it is
easier to extend the table. So I kept on trying, but no success yet.



Hope you can help


Hope some of the above does, at least a little. :-)


Thanks anyway!

Wouter




[avr-gcc-list] How to get low byte off a function address?

2008-01-19 Thread Wouter van Gulik
Dear list,

How do I do this:

main.c
=====
void foo(void) {
}

char table[2] = {
    (foo),
    (foo),
};

int main(int argc, char* argv[])
{
    int adr = table[argc]+0x3F00;
    ((void (*) (void))adr)();
    return 0;
}
=====

It gives:

main.c:5: warning: initialization makes integer from pointer without a cast
main.c:5: error: initializer element is not computable at load time
main.c:5: error: (near initialization for 'table[0]')
main.c:6: warning: initialization makes integer from pointer without a cast
main.c:6: error: initializer element is not computable at load time
main.c:6: error: (near initialization for 'table[1]')

Why is it not computable? And why is it computable when I make it 16 bits?

I first wanted to implement this in assembler.
But I got all sorts of errors, mainly gas refusing to see that I only want 8
bits, not 16.
So I figured I would try this in C, but I can't get it done.

You might wonder why on earth I would want to do such a strange thing.
Well, I have a very small bootloader, so I know all my functions are within
the 512-byte/256-word boundary. Therefore I have no need to store the
full 16-bit address (in order to keep my bootloader small).
I just need the low byte of the address.

How do I do such a thing? Using the lower 8 bits is possible when loading a
register so why not in a table?

In assembler I tried this:
   .byte lo8(foo)
 .byte pm_lo8(foo)
 .byte lo8(pm(foo))

But all give the same result:
Error: illegal relocation size: 1  
Error: junk at end of line, first unrecognized character is `('

Hope you can help

Wouter





Re: [avr-gcc-list] '-morder' option with Avr-libc: comparison table

2008-01-14 Thread Wouter van Gulik

Hi all.

If I run -morder1 and -frename-registers on my test programs, they
grow in size. The difference is probably that my test programs (sorry, I
can't release the sources) are not using any 32 bit variables and hardly any
16 bit ones.


Test summary:

      | -morder1 + -frename-registers | -morder1
test1 | bigger                        | smaller
test2 | bigger                        | smaller

Test1 grows from 9886 bytes to 9930 bytes, an increase of 44 bytes. Most
of it (38 bytes) is in one file. There is not one file that gets smaller.


In Test2 there are several files that get a little bit smaller. But
again there are 2 that get larger,


including one interrupt routine that now uses an extra register and thus
requires an extra push/pop.


So I think -frename-registers works very well for 32 bit variables but
not for applications not using any 32 bit variables.


HTH,

Wouter


Hi.

Summary results for Avr-libc CVS HEAD 2008-01-13, only C-functions.
Values (base variant) are slightly different from ones of 10 Jan,
due to bug #21995 is fixed.  GCC 4.3.X is 4.3-20080104 snapshot.










Re: [avr-gcc-list] Missed optimisations

2008-01-14 Thread Wouter van Gulik

David Brown wrote:


The code is basically good (the swap instruction is used for the 
shifts, which is very nice - a big improvement over the older 3.4 gcc), 
but there are a few missed optimisations shown here that are probably 
quite common in other code.


Why is the address of crcTable8n loaded into r18:r19 first, before being
copied into r30:r31 for the address calculation?  It seems that this
happens when the address is reused - if it is not reused, then r30:r31
are loaded directly.  However, the reuse does not benefit from having
the address in a register - the "add r30, r18" and "adc r31, r19" on
lines 68 and 69 could be replaced with subi and sbci instructions to
save space and time, and to free registers r18:r19.  On most RISC cpus,
storing the address in a register for reuse would be a benefit, which is
probably why this code is generated - on the AVR, it is not helpful (at
least, not here).




I don't know. But it happens quite often that registers are not re-used 
when they could have been.
Maybe it is because lpm is in a macro. Try replacing it with a normal table 
index. If that helps, write the ld r??, Z in an assembler macro to be 
sure.


Secondly, the (data & 0x0f) clause generates messy 16-bit code.  I 
realise C requires integer promotion in such cases, but it's important 
to try to remove unnecessary code such as loading the high register with 
zero, then anding it with zero, then eoring it.  gcc version 3.4.6 was 
sometimes marginally better at such code.  It should be noted that the 
quality of the generated code depends very much on the exact expression 
- the original [(crc >> 4) ^ (data & 0x0f)] generates poor code, while 
the equivalent [((crc >> 4) ^ data) & 0x0f] generates tight code.
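The claim that the two index expressions are equivalent can be checked exhaustively on a host compiler. The sketch below is only an illustration: the function names are made up here, and it assumes that crc and data are both 8-bit unsigned values, which the quoted text does not show.

```c
#include <stdint.h>

/* Original spelling: mask data first, then XOR with the high nibble of crc. */
uint8_t index_mask_first(uint8_t crc, uint8_t data)
{
    return (uint8_t)((crc >> 4) ^ (data & 0x0f));
}

/* Rewritten spelling: XOR first, then mask the result once. */
uint8_t index_mask_last(uint8_t crc, uint8_t data)
{
    return (uint8_t)(((crc >> 4) ^ data) & 0x0f);
}

/* Returns the number of (crc, data) pairs for which the two spellings differ. */
unsigned count_index_mismatches(void)
{
    unsigned mismatches = 0;
    for (unsigned crc = 0; crc < 256; crc++)
        for (unsigned data = 0; data < 256; data++)
            if (index_mask_first((uint8_t)crc, (uint8_t)data) !=
                index_mask_last((uint8_t)crc, (uint8_t)data))
                mismatches++;
    return mismatches;
}
```

The equivalence relies on crc >> 4 already fitting in four bits for an 8-bit crc, so masking before or after the XOR cannot change the low nibble.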




Hmm, yes it really gets messy on r31/r23:

  62 0020 F0E0  ldi r31,lo8(0)  ; load with 0
  63 0022 70E0  ldi r23,lo8(0)  ; load with 0
  64 0024 6F70  andi r22,lo8(15);
  65 0026 7070  andi r23,hi8(15); re-load R23 with 0
  66 0028 E627  eor r30,r22 ;
  67 002a F727  eor r31,r23 ; zero XOR zero == 0
  68 002c E20F  add r30,r18 ;
  69 002e F31F  adc r31,r19 ;

This is a known feature. The patches Andrew Hutchinson is working (?) 
on are supposed to improve this.


I am wondering why the load of r31 and r23 is done before the 
operations. It seems that gcc 4.2.x moves the loading of 
variables a little further away from their use, but this does not 
benefit the AVR.


HTH,

Wouter





Re: [Fwd: Re: [avr-gcc-list] GCC-AVR Register optimisations]

2008-01-11 Thread Wouter van Gulik

Andrew Hutchinson wrote:


PS Please report as a bug - gcc should be better than this.



I did, it got number 34737.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34737

I hope all info is ok.
I wanted to add a link to your e-mail, but it's not in the list archives 
yet.


Wouter





Re: [avr-gcc-list] GCC-AVR Register optimisations

2008-01-10 Thread Wouter van Gulik

Registers 17 downwards are  call saved and push/popped in prescribed
order by prolog/epilog functions. Also R28,29 is potential frame pointer
and so that is best left alone. So the key registers are: R18-R27   R30,31



Note that in some cases it could be very interesting to use r27, or Y, 
register.


Consider this example:

char *x;
volatile int y;

void foo(char *p)
{
    y += *p;
}

void main(void)
{
    char *p1 = x;
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
}


This will generate very bad code.
/* prologue: frame size=0 */
push r14
push r15
push r16
push r17
/* prologue end (size=4) */
lds r24,x
lds r25,(x)+1
movw r16,r24
subi r16,lo8(-(1))
sbci r17,hi8(-(1))
call foo
movw r14,r16
sec
adc r14,__zero_reg__
adc r15,__zero_reg__
movw r24,r16
call foo
movw r16,r14
subi r16,lo8(-(1))
sbci r17,hi8(-(1))
movw r24,r14
call foo
movw r14,r16
sec
adc r14,__zero_reg__
adc r15,__zero_reg__
movw r24,r16
call foo
movw r16,r14
subi r16,lo8(-(1))
sbci r17,hi8(-(1))
movw r24,r14
call foo
etc..

A more optimal scheme would be
call foo
movw r24, r16
adiw r24, 1
movw r16, r24
call foo
etc..
This uses the ability of r24 (adiw) to do a 16-bit increment.

But in this special case there is no frame pointer. So we could use 
R28 instead of R16 to hold the pointer. Then we can increment r28 
directly and do something like this:


 call foo
 adiw r28, 1
 movw r24, r28
 call foo

So yes, using R28 as a last resort looks like a sane thing, unless 
there is no frame pointer at all and there is a need for 16-bit (or 
32-bit) arithmetic on saved registers. This is probably incredibly 
difficult, but I thought I would mention it anyway.


HTH,

Wouter

ps.

Writing it as foo(p); p++; produces better code?!? I will file a 
bug report for this.
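The two spellings compared in the postscript can be put side by side as below. This is a shortened sketch (three calls instead of ten); walk_postinc and walk_separate are hypothetical names, and the point is only that both forms visit the same bytes, so any difference in the generated code comes from the compiler and not from the semantics.

```c
/* Same foo() as in the message: accumulate the pointed-to byte into y. */
volatile int y;

void foo(char *p)
{
    y += *p;
}

/* Spelling 1: post-increment inside the call argument. */
void walk_postinc(char *p1)
{
    foo(p1++);
    foo(p1++);
    foo(p1++);
}

/* Spelling 2: call first, then increment in a separate statement. */
void walk_separate(char *p)
{
    foo(p); p++;
    foo(p); p++;
    foo(p); p++;
}
```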



With the order, there are several problems:

1) Initial register  allocation fragments the register set. For example,
allocating r25 will prevent R24-25 being used for 16bit register  and
prevent R22-25 and R24-27 being used as 32 bit registers. gcc register
allocator does not seem to overcome this fragmentation.

2) The situation is made worse by the order of  16bit+ register used for
call and return values - which are allocated in reverse order. eg
R24-R25, R22-24, R18-24.  This means that the function parameters or
return values are rarely  in the right place - except for 16bit values.

3) Allocating a byte to an odd-numbered register precludes it being extended
to a 16bit value without a move.

So, I tried creating an order which would preserve the contiguous
register space and avoid the above issues as much as possible.
This is what I ended up with:

R18,26,22,30,20,24,19,21,23,25,27,31,28,29, \
   17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,\


The result is a 1.25% saving in code size for a simple mixed
application. Pretty good for such a simple change!

For more floating point, the saving might well be higher as it demands
more contiguous 32 bit registers.

On the same basis, the current order of call-saved registers R2-R17,
dictated by the (mcall) prolog, is clearly imperfect and limits further
improvement.  These are used less frequently, though their cost is much
higher. So it's difficult to gauge the impact. I might take a look at some
intense floating point functions to see if it is worth pursuing
reordering these too.


Andy











Re: [avr-gcc-list] GCC-AVR Register optimisations

2008-01-10 Thread Wouter van Gulik

Wouter van Gulik wrote:



Note that in some cases it could be very interesting to use r27, or Y, 
register.




Should have written R28 of course.

Since gcc seems down at the moment I did some more testing.

Now consider this example:
void main(void)
{
    char *p = x;
    foo(p); p+=65;
    foo(p); p+=65;
    foo(p); p+=65;
    foo(p); p+=65;
    foo(p); p+=65;
    foo(p); p+=65;
    foo(p); p+=65;
    foo(p); p+=65;
    foo(p); p+=65;
    foo(p); p+=65;
}
This must be done using a subi/sbci pair.

But the compiler now seems to realize that p is a constant offset to x. 
So we now get:


main:
/* prologue: frame size=0 */
push r16
push r17
/* prologue end (size=2) */
lds r16,x
lds r17,(x)+1
movw r24,r16
call foo
movw r24,r16
subi r24,lo8(-(65))
sbci r25,hi8(-(65))
call foo
movw r24,r16
subi r24,lo8(-(130))
sbci r25,hi8(-(130))

Here x is stored in r16 and the cumulative offset is added to R24

But if the compiler can realize this... Then why not do this for adds 
within the adiw range?!?

So for p++/p+=1 we would get something like:

movw r24, r16
adiw r24, 1
call foo
movw r24, r16
adiw r24, 2
etc..

This is just as small as the earlier suggested use of R28!

Wouter





RE: [avr-gcc-list] Tablejumps - needless run time conversion to byteaddress

2008-01-05 Thread Wouter van Gulik

I can make GCC use a jumptable using this code:

test.c
===
volatile int x;
volatile int y;

void foo(void) {
    x++;
}

void main(void)
{
    switch(y)
    {
        case 0  : foo();
        case 1  : foo();
        case 2  : foo();
        case 3  : foo();
        case 4  : foo();
        case 5  : foo();
        case 6  : foo();
        case 7  : foo();
        case 8  : foo();
        case 9  : foo();
        case 10 : foo();
        case 11 : foo();
        case 12 : foo();
        case 13 : foo();
        case 14 : foo();
        case 15 : foo();
        case 16 : foo();
    }
}
===
Compiling using:

avr-gcc -g -Os -Wall -mmcu=atmega16 -fno-inline test.c
(Using no inline to keep disassembly small)

gcc version  4.2.2

Gives:
===
main:
.LFB3:
.LM3:
/* prologue: frame size=0 */
/* prologue end (size=0) */
.LM4:
lds r30,y
lds r31,(y)+1
cpi r30,17
cpc r31,__zero_reg__
brsh .L23
.LM5:
subi r30,lo8(-(gs(.L22)))
sbci r31,hi8(-(gs(.L22)))
lsl r30
rol r31
lpm __tmp_reg__,Z+
lpm r31,Z
mov r30,__tmp_reg__
ijmp
.data
.section .progmem.gcc_sw_table, "a", @progbits
.p2align 1
.L22:
.data
.section .progmem.gcc_sw_table, "a", @progbits
.p2align 1
.word gs(.L5)
.word gs(.L6)
.word gs(.L7)
.word gs(.L8)
.word gs(.L9)
.word gs(.L10)
.word gs(.L11)
.word gs(.L12)
.word gs(.L13)
.word gs(.L14)
.word gs(.L15)
.word gs(.L16)
.word gs(.L17)
.word gs(.L18)
.word gs(.L19)
.word gs(.L20)
.word gs(.L21)
.text
.L5:
.LM6:
call foo
snip
etc...
==

Some interesting notes:
It only works from 17 cases upwards.

For smaller devices (e.g. atmega8)
it already works from 3 cases. But then an rjmp table is used.

Why is GCC not using this rjmp scheme for the atmega16? Is it too difficult
to predict that it will not cross the 4k boundary?
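For comparison, the same kind of dispatch can be written by hand in C as an array of function pointers, which is roughly what a tablejump does at the machine level: range check, indexed load, indirect jump. This is a sketch with made-up names (handlers, dispatch), not what GCC emits, and unlike the switch in test.c it does not fall through from one case to the next.

```c
volatile int x;

void handler_inc(void)  { x++; }
void handler_dec(void)  { x--; }
void handler_zero(void) { x = 0; }

/* Hypothetical dispatch table: one entry per selector value. */
void (*const handlers[])(void) = { handler_inc, handler_dec, handler_zero };

/* Returns 1 if a handler was called, 0 if sel was out of range. */
int dispatch(unsigned sel)
{
    if (sel >= sizeof handlers / sizeof handlers[0])
        return 0;        /* same role as the cpi/cpc/brsh range check */
    handlers[sel]();     /* indexed load followed by an indirect call */
    return 1;
}
```

Note that a plain const array like this is not automatically placed in flash by avr-gcc; the compiler-generated table lives in the .progmem.gcc_sw_table section shown above.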

HTH,

Wouter

 
 Hi
 
 Does anyone have some code that creates a tablejump in avr-gcc? This is
 where gcc will create a table instead of a long line of if-then-else tests.
 
 I can't seem to create enough switch cases to force one!
 
 I have been looking at compilation patterns and noticed that the gcc address
 is multiplied by 2 to form the address for LPM (the table being in ROM). LPM
 needs a byte address and gcc has a word address.
 
  lsl r30
rol r31
lpm __tmp_reg__,Z+
lpm r31,Z
mov r30,__tmp_reg__
ijmp
 
 The asm pattern currently expects the value to be in R30. However, it would
 appear that this would be better with a symbol rather than a value in a
 register - thus providing a means to multiply that value by 2 at compile
 time. (And I can't see any reason it would be called with anything other
 than a constant address in ROM.)
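The arithmetic behind this can be sketched on the host. The function names below are hypothetical, and the addresses are plain integers rather than real linker symbols. The first function mirrors the generated sequence (add the index to the table's word address, then double the sum with lsl/rol); the second mirrors the idea of having the table's byte address available as a constant.

```c
#include <stdint.h>

/* What the generated code does: add the index to the table's word address,
   then double the sum at run time (the lsl/rol pair). */
uint32_t entry_addr_runtime(uint16_t table_word_addr, uint16_t index)
{
    return ((uint32_t)table_word_addr + index) * 2u;
}

/* With the table's byte address known as a constant, only the index
   (one 16-bit entry per case) still has to be doubled. */
uint32_t entry_addr_precomputed(uint32_t table_byte_addr, uint16_t index)
{
    return table_byte_addr + (uint32_t)index * 2u;
}
```

Both give the same byte address; whether the second form actually saves instructions depends on how the index is handled, which the message does not spell out.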
 
 Obviously, I'd like to test it.
 
 Andy
 
 
 
 


Re: [avr-gcc-list] invalid ram address

2007-12-13 Thread Wouter van Gulik

andi wrote:

Hi ,
 
I made a program for the atmega32, and I compile it using WinAVR version 
20070525. But when I want to simulate it in AVRStudio, the variables that I 
declare are at invalid locations. I checked the SRAM addresses and they are 
outside the maximum address (example: 0xA64).

Is it a bug? Or maybe I have to configure something?
 


Please provide more info. State the code you are using and the compile 
options. Otherwise it's impossible to help.


Wouter




Re: [avr-gcc-list] Intel hex record types

2007-11-23 Thread Wouter van Gulik

Scott Morken wrote:


What do we do for AVR architecture?  I would also be interested in any 
other possible record types output by AVRGCC if anyone knows.




Take a look at s-rec or s-record; it comes with the WinAVR releases. It 
can convert almost anything to anything.


HTH

Wouter





[avr-gcc-list] RE: [avrdude-dev] Re: How to talk to second device in JTAG chain?

2007-11-07 Thread Wouter van Gulik
 
 The fallback plan is to have a header with TRST, TCK and TDI
 pins shared, with separate TDO and TMS pins for each device. :/
 

That's a good idea anyway: since all debugging instructions must go through
the other chip, you get an extra delay per cycle. For a few bytes this is
ok, but when (down)loading several Kbytes this starts to be uncomfortable.

HTH,

Wouter






Re: [avr-gcc-list] Problem with delay loop

2007-09-28 Thread Wouter van Gulik

Royce Pereira wrote:

Hi all,

In the latest WinAVR (avr-gcc (GCC) 4.1.2 (WinAVR 20070525)) I found this.

Check this out:
//==
void delay(unsigned del_cnt)
{
   while(del_cnt--);

   return;
}
//===



Well, writing your own delay loops is not recommended, because the 
optimiser might optimise your loop away. Use <util/delay.h> instead.


Please note that delay.h might not work if compiling without the optimiser 
(but then again, your loop will not be optimised away either).
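If a hand-written loop is kept anyway, one common way to stop the optimiser from deleting it is to make the counter volatile. The sketch below is only an illustration with a hypothetical name; the time per iteration is not calibrated, so util/delay.h remains the better choice when an actual duration is needed.

```c
/* Busy-wait sketch: the volatile counter forces every decrement to be a
   real memory access, so the loop cannot be removed as dead code. */
unsigned delay_volatile(unsigned del_cnt)
{
    volatile unsigned n = del_cnt;
    unsigned iterations = 0;

    while (n--)
        iterations++;

    return iterations;   /* equals del_cnt; returned only so it can be checked */
}
```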


HTH,

Wouter




Re: [avr-gcc-list] 8-bit return values again

2007-09-27 Thread Wouter van Gulik

Dusan Ferbas wrote:

Hi guys,

when I searched this list for 8-bit return values, I found 2 threads. 
The snippets described there seem to me to be more about switch/case 
expression optimization:

http://lists.gnu.org/archive/html/avr-gcc-list/2003-06/msg0.html
http://lists.gnu.org/archive/html/avr-gcc-list/2003-06/msg5.html

--
I want to solve the case where a function is declared as returning u_char 
(char, int8, etc.). It is compiled in such a way that it returns a value 
in the R24,R25 register pair. This is true not only with literals (see 
example below), but also with byte variables. The R25 value is never used 
in the calling code (see assembler listing below).


Any idea ? Any plans for resolving this ?



It seems that (at least some of) this is fixed in gcc 4.3.0.
I currently don't have access to 4.3.0 but Eric Weddington has, and his 
assembler output shows no clr r25. See:


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33050

I don't know if this is always the case or just a lucky example.

HTH

Wouter




Re: [avr-gcc-list] Inversion of logic improves size speed

2007-08-27 Thread Wouter van Gulik

Anatoly Sokolov wrote:

Hi.

This patch optimizes logic left shift of unsigned char by 4, 5, and 6, 
excluding double 'andi' instructions in some cases.




snip




Now:

0092 getBit4InvShift:
  92:   82 95   swap r24
  94:   81 70   andi r24, 0x01  ; 1
  96:   08 95   ret

0098 getBit5InvShift:
  98:   82 95   swap r24
  9a:   86 95   lsr r24
  9c:   81 70   andi r24, 0x01  ; 1
  9e:   08 95   ret

00a0 getBit6InvShift:
  a0:   82 95   swap r24
  a2:   86 95   lsr r24
  a4:   86 95   lsr r24
  a6:   81 70   andi r24, 0x01  ; 1
  a8:   08 95   ret




That's good news! No more clr r25 and no more double 'and'!
Does this fix the double 'and' in more situations? Is this because the 
swap+and is now exposed to the upper layers?


One thing: the patch is not in this e-mail (on the list). And I did not 
receive your e-mail at my private address. Maybe it was filtered. I will 
check my junk folder.


Thanks,

Wouter




Re: [avr-gcc-list] Inversion of logic improves size speed

2007-08-27 Thread Wouter van Gulik

Eric Weddington wrote:


Patch was not attached to email. However, Anatoly attached the patch to the
bug report.



What bug report?
I looked at:

Non optimal bit extraction
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33049

No register save:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33050

Double and:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11259

I can't find it there, or I need some more coffee... it is, after all, 
still Monday ;)


Wouter





[avr-gcc-list] Another missed optimization

2007-08-09 Thread Wouter van Gulik

Hi list,

Ok, I'll admit this one is rare, but it is a really annoying one. Since my 
application is all in one file, I try to optimise the code (and 
especially my ISRs) by making heavily used variables reside in the lower 
half of the registers. This reduces size a whole lot and speeds things up 
a good bit.
I know that some instructions are restricted to handling r16..r31 only. 
But this example should not suffer from this.


Why is 0xA loaded again into r24?
Strangely enough, gcc does optimise away the extra ldi when r is not an 
explicit register.


So it seems that the logic for writing register 15 and below is not 
optimal? I've seen misses when adding as well (I've not tried to 
reproduce it yet, will give it another try later).


I used winavr-20070525 (GCC4.1.2) and the following compile options

avr-gcc -S -Os -mmcu=atmega644 test.c

Ok this is the c snippet:

// C 
register unsigned char r asm("r2"); //use only r2..r15
volatile unsigned char dummy; //give the optimizer something to keep
int main(void) {
    unsigned char localDummy = dummy;
    if(localDummy == 0xA) {
        r = localDummy;
    }
}

// ASM 
The ASM output:

main:
/* prologue: frame size=0 */
/* prologue end (size=0) */
lds r24,dummy       <-- load to localDummy
cpi r24,lo8(10)     <-- compare against 0xA
brne .L5            <-- branch
ldi r24,lo8(10)     <-- WHY??? it's there already!
mov r2,r24          <-- mov
.L5:
/* epilogue: frame size=0 */
ret






[avr-gcc-list] Explicitly using lower half registers gives non optimal code

2007-08-09 Thread Wouter van Gulik

Ok I tried adding as well. It's ok when writing something like this:

// C ///
register unsigned char r asm("r2");
void foo(unsigned char in) {

    if(in == 0xA) {
        r += in + 1;
    }
}

You get neat code (but that's because of the optimizer seeing that the 
increment is constant):

// ASM ///
cpi r24,lo8(10)
brne .L4
ldi r24,lo8(11)
add r2,r24
.L4:
ret



But now try this:
It looks as if the compiler thinks r2 is in RAM?
I cannot see why r2 should be loaded into r24. Does making a variable a 
register put extra constraints on manipulating it (apart from fewer 
available instructions)?
I tried different rewrites but they all end up the same. I can imagine the 
optimizer incrementing "in" before adding it to r. But I can't make 
sense of this.


// C ///
register unsigned char r asm("r2");
void foo(unsigned char in, unsigned char in2) {

    if(in == in2) {
        r += in + 1;
    }
}

// ASM ///
mov r25,r24          <-- WHY?
cp r24,r22
brne .L4
mov r24,r2           <-- WHY?? (tmp = r2)
subi r24,lo8(-(1))   <-- tmp++
mov r2,r25           <-- r2 = in1
add r2,r24           <-- r2 += tmp
.L4:
ret

This last part could have been:
brne .L4
inc r24 (or r2)
add r2, r24


Greetings


Wouter







Re: [avr-gcc-list] Optimiser bloats code

2007-08-07 Thread Wouter van Gulik

Paulo Marques wrote:

 Not really a better idea for 3 bits, but it would be for 4:

prog_uint8_t inv_table[8]={0,4,2,6,1,5,3,7};

unsigned char inv_test(void)
{
  return pgm_read_byte(&inv_table[PORTB & 0x3]);
}


Ah yes, of course a table!



The output from gcc 4.2.0:

byte inv_test(void)
{
return pgm_read_byte(&inv_table[PORTB & 0x3]);
  96:   e8 b3   in   r30, 0x18   ; 24
  98:   ff 27   eor  r31, r31
  9a:   e3 70   andi r30, 0x03   ; 3
  9c:   f0 70   andi r31, 0x00   ; 0
  9e:   ec 5a   subi r30, 0xAC   ; 172
  a0:   ff 4f   sbci r31, 0xFF   ; 255
  a2:   e4 91   lpm  r30, Z
}
  a4:   8e 2f   mov  r24, r30
  a6:   99 27   eor  r25, r25
  a8:   08 95   ret

If not for the redundant andi r31, 0x00 (when r31 has just been zeroed
by the eor r31,r31) it would give the same number of instructions as
your code.

The nice thing about this approach is that it works the same for 4 or
more bits (up to 8).



Yes, but the table would grow large for 8-bit inversion :D. And I'm more 
afraid of running out of space than of running out of time.
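The space/time trade-off can be seen by putting the table lookup next to a bit-by-bit version. This is a host-side sketch, not the code from the thread: the table is kept in ordinary memory (no PROGMEM or pgm_read_byte) so that it builds anywhere, the function names are made up, and the mask is 0x7 to cover all three bits.

```c
#include <stdint.h>

/* Same table contents as in the quoted message, in ordinary memory. */
const uint8_t inv_table[8] = { 0, 4, 2, 6, 1, 5, 3, 7 };

/* Table lookup: reverse the low three bits of v. */
uint8_t reverse3_table(uint8_t v)
{
    return inv_table[v & 0x7];
}

/* Bit-by-bit version, for comparison: bit 0 <-> bit 2, bit 1 stays. */
uint8_t reverse3_bits(uint8_t v)
{
    uint8_t r = 0;
    if (v & (1 << 0)) r |= (1 << 2);
    if (v & (1 << 1)) r |= (1 << 1);
    if (v & (1 << 2)) r |= (1 << 0);
    return r;
}
```

The table costs 2^n bytes for n bits (8 bytes here, 256 for a full byte), while the bit-by-bit form grows by one test per bit.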


Thanks for the help,

Wouter





Re: [avr-gcc-list] Inversion of logic improves size speed

2007-08-07 Thread Wouter van Gulik

Anatoly Sokolov wrote:

Hi,

Bug #11259 [avr] gcc Double 'andi' missed optimization:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11259

Bug #29560 Poor optimization for character shifts on Atmel AVR:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29560




Bug #29560 seems to be a little different. That bug report is about 
shifting with a variable shift count, where the loop doing the shift is 
not optimal (high byte shift because of int promotion or something like 
that).


My example, on the other hand, works with fixed shifts. Actually, it's bit 
extraction implemented as shifting.
My concern is that when rewriting/inverting my logic I get much better 
(optimal in most cases) results. So it seems the compiler has not chosen 
the best path. It seems like it has two ways of doing the 
shifting? Maybe it's some hidden 8-bit/16-bit variable difference?
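The two spellings being compared can be sketched generically as follows. The bit number is a parameter here for brevity, whereas the functions in the original test case use a fixed constant per function; the names are hypothetical. Both forms return the same value for every input, so any difference in the output is purely a matter of code generation.

```c
#include <stdint.h>

/* Mask form: test the bit in place with a shifted mask. */
uint8_t get_bit_mask(uint8_t temp, uint8_t n)
{
    uint8_t r = 0;
    if (temp & (1 << n)) r |= 0x1;
    return r;
}

/* Shift form: move the bit down to position 0, then mask with 1. */
uint8_t get_bit_shift(uint8_t temp, uint8_t n)
{
    uint8_t r = 0;
    if ((temp >> n) & 1) r |= 0x1;
    return r;
}
```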



Testcase:


snip



  There are two 'and' insns (#24 and #12), but they are not optimized yet. Why?
The probable reason: the 'lshiftrt' insn is split into 'rotate' and 'and' insns in
the 'pass_split_after_reload' pass of the compiler, but the optimization passes
(combine and cse) that could merge the two 'and' insns are run earlier.



I see, too bad...


It is possible to add a peephole to merge the two 'and' insns. But I do not think
that this solution is optimal.



Why not? I agree it's not solving the root of the problem but it helps 
anyway. I am a total noob on GCC internals so this might be a stupid 
question...


Thanks for all the explanations! Really interesting stuff.

Greetings,

Wouter




[avr-gcc-list] Inversion of logic improves size speed

2007-08-05 Thread Wouter van Gulik
Hi list,

After some testing I found out that swapping the order of the shift and the
AND can significantly improve speed and size.
In the first case the compiler misses that it can optimise the shifts for
bits 4..7 by first swapping nibbles,
which it does figure out when the code is rewritten as in the lower part.

Is this a (known?) bug or am I missing something?

Wouter

/* This results in shifting instructions */

uint8_t getBit0(uint8_t temp) { uint8_t r = 0; if(temp&(1<<0)) r|=0x1;
return r; } 
uint8_t getBit1(uint8_t temp) { uint8_t r = 0; if(temp&(1<<1)) r|=0x1;
return r; } 
uint8_t getBit2(uint8_t temp) { uint8_t r = 0; if(temp&(1<<2)) r|=0x1;
return r; } 
uint8_t getBit3(uint8_t temp) { uint8_t r = 0; if(temp&(1<<3)) r|=0x1;
return r; } 
uint8_t getBit4(uint8_t temp) { uint8_t r = 0; if(temp&(1<<4)) r|=0x1;
return r; } 
uint8_t getBit5(uint8_t temp) { uint8_t r = 0; if(temp&(1<<5)) r|=0x1;
return r; } 
uint8_t getBit6(uint8_t temp) { uint8_t r = 0; if(temp&(1<<6)) r|=0x1;
return r; } 
uint8_t getBit7(uint8_t temp) { uint8_t r = 0; if(temp&(1<<7)) r|=0x1;
return r; } 
 

/* This results in better shifting instructions */

uint8_t getBit0InvShift(uint8_t temp) { uint8_t r = 0; if((temp>>0)&1)
r|=0x1; return r; } 
uint8_t getBit1InvShift(uint8_t temp) { uint8_t r = 0; if((temp>>1)&1)
r|=0x1; return r; } 
uint8_t getBit2InvShift(uint8_t temp) { uint8_t r = 0; if((temp>>2)&1)
r|=0x1; return r; } 
uint8_t getBit3InvShift(uint8_t temp) { uint8_t r = 0; if((temp>>3)&1)
r|=0x1; return r; } 
uint8_t getBit4InvShift(uint8_t temp) { uint8_t r = 0; if((temp>>4)&1)
r|=0x1; return r; } 
uint8_t getBit5InvShift(uint8_t temp) { uint8_t r = 0; if((temp>>5)&1)
r|=0x1; return r; } 
uint8_t getBit6InvShift(uint8_t temp) { uint8_t r = 0; if((temp>>6)&1)
r|=0x1; return r; } 
uint8_t getBit7InvShift(uint8_t temp) { uint8_t r = 0; if((temp>>7)&1)
r|=0x1; return r; } 
 

This results in:
/* This results in shifting instructions */
uint8_t getBit0(uint8_t temp) { uint8_t r = 0; if(temp&(1<<0)) r|=0x1;
return r; }
  ae:   81 70   andi r24, 0x01  ; 1
  b0:   99 27   eor  r25, r25
  b2:   08 95   ret

00b4 getBit1:
uint8_t getBit1(uint8_t temp) { uint8_t r = 0; if(temp&(1<<1)) r|=0x1;
return r; }
  b4:   99 27   eor  r25, r25
  b6:   96 95   lsr  r25
  b8:   87 95   ror  r24
  ba:   81 70   andi r24, 0x01  ; 1
  bc:   90 70   andi r25, 0x00  ; 0
  be:   08 95   ret

00c0 getBit2:
uint8_t getBit2(uint8_t temp) { uint8_t r = 0; if(temp&(1<<2)) r|=0x1;
return r; }
  c0:   99 27   eor  r25, r25
  c2:   96 95   lsr  r25
  c4:   87 95   ror  r24
  c6:   96 95   lsr  r25
  c8:   87 95   ror  r24
  ca:   81 70   andi r24, 0x01  ; 1
  cc:   90 70   andi r25, 0x00  ; 0
  ce:   08 95   ret

00d0 getBit3:
uint8_t getBit3(uint8_t temp) { uint8_t r = 0; if(temp&(1<<3)) r|=0x1;
return r; }
  d0:   99 27   eor  r25, r25
  d2:   43 e0   ldi  r20, 0x03  ; 3
  d4:   96 95   lsr  r25
  d6:   87 95   ror  r24
  d8:   4a 95   dec  r20
  da:   e1 f7   brne .-8        ; 0xd4 getBit3+0x4
  dc:   81 70   andi r24, 0x01  ; 1
  de:   90 70   andi r25, 0x00  ; 0
  e0:   08 95   ret

00e2 getBit4:
uint8_t getBit4(uint8_t temp) { uint8_t r = 0; if(temp&(1<<4)) r|=0x1;
return r; }
  e2:   99 27   eor  r25, r25
  e4:   54 e0   ldi  r21, 0x04  ; 4
  e6:   96 95   lsr  r25
  e8:   87 95   ror  r24
  ea:   5a 95   dec  r21
  ec:   e1 f7   brne .-8        ; 0xe6 getBit4+0x4
  ee:   81 70   andi r24, 0x01  ; 1
  f0:   90 70   andi r25, 0x00  ; 0
  f2:   08 95   ret

00f4 getBit5:
uint8_t getBit5(uint8_t temp) { uint8_t r = 0; if(temp&(1<<5)) r|=0x1;
return r; }
  f4:   99 27   eor  r25, r25
  f6:   65 e0   ldi  r22, 0x05  ; 5
  f8:   96 95   lsr  r25
  fa:   87 95   ror  r24
  fc:   6a 95   dec  r22
  fe:   e1 f7   brne .-8        ; 0xf8 getBit5+0x4
 100:   81 70   andi r24, 0x01  ; 1
 102:   90 70   andi r25, 0x00  ; 0
 104:   08 95   ret

0106 getBit6:
uint8_t getBit6(uint8_t temp) { uint8_t r = 0; if(temp&(1<<6)) r|=0x1;
return r; }
 106:   99 27   eor  r25, r25
 108:   76 e0   ldi  r23, 0x06  ; 6
 10a:   96 95   lsr  r25
 10c:   87 95   ror  r24
 10e:   7a 95   dec  r23
 110:   e1 f7   brne .-8        ; 0x10a getBit6+0x4
 112:   81 70   

Re: [avr-gcc-list] Optimiser bloats code

2007-08-01 Thread Wouter van Gulik
 Return values are promoted to an int.


Why? Is this a bug or a feature? Am I doing something wrong or is a u08
return always promoted to an int?

 You probably already know this, but you could also do:

 return PINB >> 5;

 which returns the same answer using the following:

 in r24,54-0x20
 swap r24
 lsr r24
 andi r24,0x7
 clr r25
 ret


Yes I know (it is written above my example). I wanted to point out how bad
the result is when the compiler starts to optimise this.

Just curious: is there any faster way to bit-invert as in my foo3 example
(see below)? It now takes 9 instructions, which is good, but fewer is always
better.
A loop requires more instructions and is much slower. Does anyone have an
idea for a smaller bit inversion for just 3 bits? Because if this is the
smallest way, you can't tell the compiler to do so :(

HTH

Wouter

//Force the compiler and voila! Optimal!
//Not bit inverted or bit inverted, the result is the same
uint8_t foo3(void) { //good
  e0:   88 27   eor r24, r24

  uint8_t temp = 0;
  asm volatile("clr %0" : "=r" (temp) :);
  if(PINB & (1<<5)) temp |= (1<<2);
  e2:   1d 99   sbic 0x03, 5    ; 3
  e4:   84 60   ori  r24, 0x04  ; 4
  if(PINB & (1<<6)) temp |= (1<<1);
  e6:   1e 99   sbic 0x03, 6    ; 3
  e8:   82 60   ori  r24, 0x02  ; 2
  if(PINB & (1<<7)) temp |= (1<<0);
  ea:   1f 99   sbic 0x03, 7    ; 3
  ec:   81 60   ori  r24, 0x01  ; 1

  return temp;
}
  ee:   99 27   eor r25, r25
  f0:   08 95   ret



