[avr-gcc-list] Re: avr-gcc optimisation

2009-02-28 Thread Nicholas Vinen
avr-gcc-list-requ...@nongnu.org wrote:
>>> OK, I only spent a few minutes looking at old code and I found some 
>>> obviously sub-optimal results. It distills down to this:
>>>
>>> #include 
>>>
>>> int main(void) {
>>>   unsigned long packet = 0;
>>>
>>>   while(1) {
>>> if( !(PINC & _BV(PC2)) ) {
>>>   packet = (packet<<1)|(((unsigned char)PINC>>1)&1);
>>> }
>>> PORTB = packet;
>>>   }
>>> }
>>>   
>> Did you write the code like this just to test the optimiser?  It 
>> 
>
> As far as I understand, it's a stripped down example to demonstrate the 
> code bloat in a reproducable way (combileable source).
>
>   
It originally came from a program that did something sensible. This was
the part of the code which received data from another microcontroller
one bit at a time into a 32 bit buffer (packet) MSB first, and then when
it's full it does various things depending on what's in the packet. This
seemed like the simplest way to do it, and I think it should have
compiled something compact. I was surprised when I looked at the assembly.

Here's another example of a couple of inefficiencies which caused me
trouble in the same application. Take these two bits of code.

a)

#define PINC  (*((unsigned char volatile*) 0x20))
#define PORTB (*((unsigned char volatile*) 0x22))

void foo ()
{
unsigned char a, b, c;

while(1)
{
a = PINC;
b = PINC;
c = ((unsigned short)a*(unsigned short)b)>>8;
PORTB = c;
}
}

b)

#define PINC  (*((unsigned short volatile*) 0x20))
#define PORTB (*((unsigned short volatile*) 0x22))

void foo ()
{
unsigned short a, b, c;

while(1)
{
a = PINC;
b = PINC;
c = ((unsigned long)a*(unsigned long)b)>>16;
PORTB = c;
}
}


Now, interestingly, in the first case the multiply gets inlined and this
saves it having to cross-multiply the top half of the two shorts which
will always be zero, as well as removing the function call overhead etc.
(this is using -O3):

.L2:
.stabn  68,0,11,.LM1-.LFBB1
.LM1:
in r18,32-0x20   ;  a,   ;  7   *movqi/4[length = 1]
.stabn  68,0,12,.LM2-.LFBB1
.LM2:
in r24,32-0x20   ;  b,   ;  9   *movqi/4[length = 1]
.stabn  68,0,13,.LM3-.LFBB1
.LM3:
mul r24,r18  ;  b, a ;  10  umulqihi3   [length = 3]
movw r24,r0  ;  tmp46
clr r1
.stabn  68,0,14,.LM4-.LFBB1
.LM4:
out 34-0x20,r25  ; , ;  14  *movqi/3[length = 1]
rjmp .L2 ;   ;  35  jump[length = 1]


It's still doing the final shift very inefficiently - all it has to do
is output the top half of the result, but instead it twiddles the
registers around to do the shift first, then outputs the lower half.
Kind of silly, but overall, not too horrible. The second case is not as
good (also using -O3):

.L2:
.stabn  68,0,11,.LM1-.LFBB1
.LM1:
in r18,32-0x20   ;  a,   ;  7   *movhi/2[length = 2]
in r19,(32)+1-0x20   ;  a,
.stabn  68,0,12,.LM2-.LFBB1
.LM2:
in r22,32-0x20   ;  b,   ;  9   *movhi/2[length = 2]
in r23,(32)+1-0x20   ;  b,
.stabn  68,0,13,.LM3-.LFBB1
.LM3:
ldi r24,lo8(0)   ;  b,   ;  37  *movhi/4[length = 2]
ldi r25,hi8(0)   ;  b,
ldi r20,lo8(0)   ;  a,   ;  39  *movhi/4[length = 2]
ldi r21,hi8(0)   ;  a,
rcall __mulsi3   ;  14  *mulsi3_call[length = 1]
movw r22,r24 ;  tmp49, tmp48 ;  42  *lshrsi3_const/3
[length = 3]
clr r24  ;  tmp49
clr r25  ;  tmp49
.stabn  68,0,14,.LM4-.LFBB1
.LM4:
out (34)+1-0x20,r23  ; , tmp49   ;  19  *movhi/3[length 
= 2]
out 34-0x20,r22  ; , tmp49
rjmp .L2 ;   ;  43  jump[length = 1]


Still the same pointless twiddling at the end, but this time it's
actually loading zeroes into the top 16 bits of each of the two inputs
and calling __mulsi3. This, I'm fairly sure, is going to do a whole
bunch of useless multiplies with zero and adds of zero. Is there a
reason it won't inline and optimise __mulsi3? Perhaps the 8*8=>16 case
is handled especially but the 16*16=>32 isn't, or perhaps I need to add
some command line to avr-gcc to make it understand that inlining a
larger intrinsic is worthwhile. I would have thought -O3 would do this
already, especially since it should result in a reduction in code size
because __mulsi3 is only called once and inlining it could remove a
bunch of overhead and dead instructions.

I guess what I'm doing here is not very common, but the performance
difference between this and the hand-written assembly is going to be
pretty drastic. Inline assembly can help a lot in this case (i.e. create
a 16x16=>32 function), but it would be nice if that wasn't necessary.

Am I asking too much from the compiler?



Nicholas

___
AVR-GCC-list maili

RE: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Weddington, Eric
 

> -Original Message-
> From: 
> avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.org 
> [mailto:avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.
> org] On Behalf Of Vincent Trouilliez
> Sent: Saturday, February 28, 2009 7:41 PM
> To: avr-gcc-list@nongnu.org
> Subject: Re: [avr-gcc-list] Re: C vs. assembly performance
> 
> 
> I switch on an unsigned byte, with contiguous values (0-24).
> A Function table sounds elegant to my ear, but it would mean 25
> functions ! In my case the work is done in-situ, within the switch
> statement, as it's takes only 2 or 3 statements to process a givne
> "case". Using functions would be both overkill and overwhelming to
> manage I think !! ;-)
> 

It is no more overkill and overwhelming than dealing with a single contiguous 
switch statement with 25 cases. I think it's just a matter of perspective. The 
one good thing is that each function really encapsulates a single idea and has 
nothing else, which *may* make maintenance easier. Implementing it this way 
really separates the ideas of 'making some choice' from the 'implementation of 
a single choice', rather than conflating the two together in a single massive 
switch statement.

Yes, your switch sounds like a candidate for a function table. However, like 
everything in engineering, there are trade-offs. The trade off here is the 
overhead for each function, plus the slight overhead for checking the range of 
the value used to index into the table. To be fair, you would have to write it 
both ways and see which produces the smaller code.

However, the other advantage for the function table is that you can guarantee 
that it will take the same amount of time, for each value, to start executing 
its associated function. With an if-else-if structure, the longer down the 
if-else-if chain you go then the longer it takes to execute the associated 
code. For a switch statement, it all depends on how the compiler generates code 
for it, which can change as you add more cases or remove cases from the switch. 
This leaves you out of control with regards to timing.


___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


Re: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Vincent Trouilliez
On Sat, 28 Feb 2009 19:24:38 -0700
"Weddington, Eric"  wrote:
> You wouldn't need *nested* ifs, but an if-else-if structure, or better yet, a 
> table of function pointers, also known as a dispatch table. Each method 
> depends on the type of data that you're switching on.

I switch on an unsigned byte, with contiguous values (0-24).
A Function table sounds elegant to my ear, but it would mean 25
functions ! In my case the work is done in-situ, within the switch
statement, as it's takes only 2 or 3 statements to process a givne
"case". Using functions would be both overkill and overwhelming to
manage I think !! ;-)

I pasted my switch statement below for the curious.


--
Vince



void var_format(char *buff, uint8_t var_num)
{
uint16_t tmp16;
uint32_t tmp32;
int16_t tmpS16;

char Yes[] ="  Yes";
char No[] = "   No";
char Unknown[] ="-\?\?\?-";

switch (var_num)
{
case ECU_COOLANT: 
{// (N * 0.75) - 40 DB41
-40 to +150 °C
tmpS16 = (int16_t)(KLineFrameM1[41]) * 3 / 4 - 40;
var_format_S16(buff, tmpS16, 0);
break;
}
case ECU_ENGINE_SPEED: 
{// MSB:LSB DB12:13 
0 to    RPM
ATOMIC_BLOCK(ATOMIC_FORCEON)
{
tmp16 = ((uint16_t)KLineFrameM1[12] << 8) + 
(uint16_t)KLineFrameM1[13];
}
var_format_S16(buff, (int16_t)tmp16, 0);
break;
}
case ECU_ROAD_SPEED: 
{// DB14uint8_t MPH
var_format_byte(buff, KLineFrameM1[8]);
break;
}
case ECU_BARO_AIR_PRESSURE: 
{// ((N-128)/100)+1 DB24
-0.28 to +2.27 Bar
tmpS16 = (int16_t)(KLineFrameM1[24]) - 28;
var_format_S16(buff, tmpS16, 2);
break;
}
case ECU_MAP_PRESSURE: 
{// ((N-130)/100)+1  DB25   -0.30 
to +2.25 Bar
tmpS16 = (int16_t)(KLineFrameM1[25]) - 30;
var_format_S16(buff, tmpS16, 2);
break;
}
case ECU_MAT_TEMP: 
{// (N * 0.75) - 40 DB42
-40 to +150 °C
tmpS16 = (int16_t)(KLineFrameM1[42]) * 3 / 4 - 40;
var_format_S16(buff, tmpS16, 0);
break;
}
case ECU_THROTTLE_POSITION: 
{// N / 2.55DB27
0 to 100 %
tmp16 = (uint16_t)KLineFrameM1[27] * 100 / 255;
var_format_byte(buff, (uint8_t)tmp16);
break;
}
case ECU_ENGINE_LOAD: 
{// DB360 - 100 %
var_format_byte(buff, KLineFrameM1[36]);
break;
}
case ECU_KNOCK_COUNT: 
{// DB43uint8_t
var_format_byte(buff, KLineFrameM1[43]);
break;
}
case ECU_KNOCK_RETARD: 
{// (N * 45) / 255  DB44
0 to 45 Deg
tmp16 = (uint16_t)KLineFrameM1[44] * 90 / 51;
var_format_S16(buff, (int16_t)tmp16, 1);
break;
}
case ECU_SPARK_ADVANCE: 
{// (N * 9000)/256   MSB:LSB   DB39:400.00 Degrees
ATOMIC_BLOCK(ATOMIC_FORCEON)
{
tmp16 = ((uint16_t)KLineFrameM1[39] << 8) + 
(uint16_t)KLineFrameM1[40];
}
tmp32 = (uint32_t)tmp16 * 9000 / 256;
var_format_S16(buff, (int16_t)tmp32, 2);
break;
}
case ECU_BOOST_DC: 
{// DB31uint8_t or %, don't know
var_format_byte(buff, KLineFrameM1[31]);
break;
}
case ECU_MAIN_INJ_DC: 
{// DB45uint8_t
var_format_byte(buff, KLineFrameM1[45]);
break;
}
case ECU_SECONDARY_INJ_DC: 
{// DB37   

RE: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Weddington, Eric
 

> -Original Message-
> From: 
> avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.org 
> [mailto:avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.
> org] On Behalf Of Vincent Trouilliez
> Sent: Saturday, February 28, 2009 7:20 PM
> To: avr-gcc-list@nongnu.org
> Subject: Re: [avr-gcc-list] Re: C vs. assembly performance
> 
> On Sat, 28 Feb 2009 19:09:13 -0700
> "Weddington, Eric"  wrote:
> 
> > So in application code I tend to avoid switch statements 
> for embedded systems, unless I'm writing throw-away code or 
> the application is trivial.
> 
> Oh no ! ;-)
> I have only recently got round to using switch statements, to improve
> code legibility. In my current/first embedded project, I 
> happen to have
> a very long (25 cases, 160 lines long) switch statement.. I dread to
> think what it would like if I had to replace it (what else 
> with ?) with
> nested if's !
> How readable would that be... not to mention that with indentation, 25
> levels of nesting would mean the last case would be 3 meters 
> on the far
> right... ;-)
> 
> Any coding tips to make all this look about readable by human
> beings ?! ;-/

You wouldn't need *nested* ifs, but an if-else-if structure, or better yet, a 
table of function pointers, also known as a dispatch table. Each method depends 
on the type of data that you're switching on.


___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


Re: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Vincent Trouilliez
On Sat, 28 Feb 2009 19:09:13 -0700
"Weddington, Eric"  wrote:

> So in application code I tend to avoid switch statements for embedded 
> systems, unless I'm writing throw-away code or the application is trivial.

Oh no ! ;-)
I have only recently got round to using switch statements, to improve
code legibility. In my current/first embedded project, I happen to have
a very long (25 cases, 160 lines long) switch statement.. I dread to
think what it would like if I had to replace it (what else with ?) with
nested if's !
How readable would that be... not to mention that with indentation, 25
levels of nesting would mean the last case would be 3 meters on the far
right... ;-)

Any coding tips to make all this look about readable by human
beings ?! ;-/

--
Vince, catastrophed...


___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


RE: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Weddington, Eric
 

> -Original Message-
> From: 
> avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.org 
> [mailto:avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.
> org] On Behalf Of Bob Paddock
> Sent: Saturday, February 28, 2009 6:40 PM
> To: avr-gcc-list@nongnu.org
> Subject: Re: [avr-gcc-list] Re: C vs. assembly performance
> 
> >  In practice, the experienced programmer can do a lot to help
> > the tools. avr-gcc *does* do a good job with most code - I do
> > much less re-structuring of my source code for avr-gcc than I do
> > for most other compilers (I use a lot of compilers for a 
> lot of >different targets).
> 
> Something I always found amusing/depressing is that some
> compilers generate smaller code for ++i than i++ everything
> being equal.  Then other compilers generate smaller code for i++ than
> ++i.  So in the embedded space you have to know what your tools are
> doing. Sadly it should not be this way.
> 
> avr-gcc generates the same size code in any case that I've looked at.
> 

Along the same lines of "you should know what your compiler generates" is the 
use of switch statements. They can be implemented (in the code generation) in 
many different ways, and it is based on heuristics. These heuristics are not 
always tuned the best way for the target.

So in application code I tend to avoid switch statements for embedded systems, 
unless I'm writing throw-away code or the application is trivial.


___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


Re: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Bob Paddock
>  In practice, the experienced programmer can do a lot to help
> the tools. avr-gcc *does* do a good job with most code - I do
> much less re-structuring of my source code for avr-gcc than I do
> for most other compilers (I use a lot of compilers for a lot of >different 
> targets).

Something I always found amusing/depressing is that some
compilers generate smaller code for ++i than i++ everything
being equal.  Then other compilers generate smaller code for i++ than
++i.  So in the embedded space you have to know what your tools are
doing. Sadly it should not be this way.

avr-gcc generates the same size code in any case that I've looked at.


___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


[avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread David Brown

Georg-Johann Lay wrote:

David Brown schrieb:

Nicholas Vinen wrote:

OK, I only spent a few minutes looking at old code and I found some 
obviously sub-optimal results. It distills down to this:


#include 

int main(void) {
  unsigned long packet = 0;

  while(1) {
if( !(PINC & _BV(PC2)) ) {
  packet = (packet<<1)|(((unsigned char)PINC>>1)&1);
}
PORTB = packet;
  }
}


Did you write the code like this just to test the optimiser?  It 


As far as I understand, it's a stripped down example to demonstrate the 
code bloat in a reproducable way (combileable source).




Yes, I understand - it's just bad luck that it happens to be 
particularly tough code for the optimiser.


However, avr-gcc constantly surprises me in the quality of its code 
generation - it really is very good, and it has got steadily better 
through the years.  Sometimes it pays to think a bit about the way 
your source code is structured, and maybe test out different 
arrangements.


Source code structure is a concern of the project, not of the compiler.
Even for braindead code that comes from a code generator a compiler is 
supposed to yield good results.




That's true in theory - but embedded programmers are used to the 
difference between theory and practice (there's an interesting 
discussion about the theory and practice of "volatile" on 
comp.arch.embedded at the moment).  In theory, the compiler should 
generate good code no matter how the source code is structured.  In 
practice, the experienced programmer can do a lot to help the tools. 
avr-gcc *does* do a good job with most code - I do much less 
re-structuring of my source code for avr-gcc than I do for most other 
compilers (I use a lot of compilers for a lot of different targets).


I am inspecting the produced asm in some of my AVR projects with hard 
realtime requirements, too. But I would not encourage anyone to dig in 
the generated asm and try to get best code by re-arranging it or trying 
to find other algebraic representations. That takes a lot of time, and a 
compiler should care for the sources it gets, not the other way round. 
And if your code is intended to be cross-platform, you are stuck. If 
your code changes some 100 source lines away from the critical code, the 
inefficient code can return and you have to rewrite your code again to 
find another representation that avoids the bad code.




It is certainly true that you want to keep such compiler-helpful 
structuring to a minimum.  But if you are trying to write efficient code 
(rather than emphasising portability or development speed or other 
priorities), you *must* be familiar with your compiler and the types of 
code it generates for particular sequences of input.  You can very 
quickly learn some basic tricks that can make a great difference to the 
generated code with very little re-structuring of the source code.  A 
prime example is to use 8-bit data rather than traditional C "int" where 
possible.  Another case in point is to prefer explicit "if" conditionals 
rather than trying to calculate a conditional expression, such as was 
done here (if you are using a heavily pipelined processor, the opposite 
is true).


But I fully agree that you should not be hand-optimising all your source 
code and studying the generated assembly - the readability of the source 
code is more important than the tightness of the generated code in all 
but the most time-critical sections (there's no point in writing fast 
code if you can't be sure it's correct!).


However, in this case, I believe that my re-write is better source code, 
although I'm aware that's a personal preference.  I think it is much 
clearer what the code is doing, and it is far more obvious which pins 
are being used - it would also be much easier for proper code (rather 
than this example code) in which the pins would normally have defined 
symbolic names rather than "magic numbers" in the code.






___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


Re: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Georg-Johann Lay

David Brown schrieb:

Nicholas Vinen wrote:

OK, I only spent a few minutes looking at old code and I found some 
obviously sub-optimal results. It distills down to this:


#include 

int main(void) {
  unsigned long packet = 0;

  while(1) {
if( !(PINC & _BV(PC2)) ) {
  packet = (packet<<1)|(((unsigned char)PINC>>1)&1);
}
PORTB = packet;
  }
}


Did you write the code like this just to test the optimiser?  It 


As far as I understand, it's a stripped down example to demonstrate the 
code bloat in a reproducable way (combileable source).


However, avr-gcc constantly surprises me in the quality of its code 
generation - it really is very good, and it has got steadily better 
through the years.  Sometimes it pays to think a bit about the way your 
source code is structured, and maybe test out different arrangements.


Source code structure is a concern of the project, not of the compiler.
Even for braindead code that comes from a code generator a compiler is 
supposed to yield good results.


I am inspecting the produced asm in some of my AVR projects with hard 
realtime requirements, too. But I would not encourage anyone to dig in 
the generated asm and try to get best code by re-arranging it or trying 
to find other algebraic representations. That takes a lot of time, and a 
compiler should care for the sources it gets, not the other way round. 
And if your code is intended to be cross-platform, you are stuck. If 
your code changes some 100 source lines away from the critical code, the 
inefficient code can return and you have to rewrite your code again to 
find another representation that avoids the bad code.


Georg-Johann


___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


Re: [avr-gcc-list] Re: sprintf

2009-02-28 Thread David VanHorn
>
> Unless your "big iron" guys are similarly inexperienced (being a beginner
> is a good excuse for many mistakes), then they are incompetent.  There is
> *no* excuse for a knowledgeable programmer using an inefficient and unsafe
> function in such a horribly unclear manner



I disagree, but I'm closer to the situation.  They are under a lot of
pressure, and helping me out is something they do in odd moments.  I'm happy
for what I can get.


> Having lots of memory and processor resources available is not a reason to
> write rubbish.  It *is* a good reason for prioritising development time over
> run-time (that's why I often write PC software in Python, not C).  If you
> want to go to the shop two miles away, you might take your car - it's faster
> and easier, even if it costs more than walking.  But you don't get your car
> out the garage to drive to your post box at the bottom of the garden -
> that's the equivalent of using sprintf in this case.


I agree.  Sprintf certainly wasn't optimal, but it did WORK, and that was my
main concern at the moment.


> I will admit to being ignorant in C, but I'm not stupid.  :)
> >
>
> Don't worry, ignorance is curable :-)


Yup.. Working on it. :)
___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


[avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread David Brown

Nicholas Vinen wrote:
OK, I only spent a few minutes looking at old code and I found some 
obviously sub-optimal results. It distills down to this:


#include 

int main(void) {
  unsigned long packet = 0;

  while(1) {
if( !(PINC & _BV(PC2)) ) {
  packet = (packet<<1)|(((unsigned char)PINC>>1)&1);
}
PORTB = packet;
  }
}



Did you write the code like this just to test the optimiser?  It 
certainly gives it more of a challenge than most code, since it contains 
32-bit data (the compiler writers will place more emphasis on getting 
good code for far more common 8-bit and 16-bit data), and the compiler 
must combat the C rules for integer promotion to generate ideal code.


Try re-writing your code like this (which I think is clearer anyway):

int main(void) {
  unsigned long packet = 0;

  while (1) {
if (!(PINC & _BV(PC2))) {
  packet <<= 1;
  if (PINC & 0x02) {
  packet |= 0x01;
  };
}
PORTB = packet;
  }
}

This generates:

  77main:
  78/* prologue: frame size=0 */
  79/* prologue end (size=0) */
  80 0032 80E0  ldi r24,lo8(0)   ;  packet,
  81 0034 90E0  ldi r25,hi8(0)   ;  packet,
  82 0036 A0E0  ldi r26,hlo8(0)  ;  packet,
  83 0038 B0E0  ldi r27,hhi8(0)  ;  packet,
  84.L7:
  85 003a 9A99  sbic 51-0x20,2   ; ,
  86 003c 00C0  rjmp .L8 ;
  87 003e 880F  lsl r24  ;  packet
  88 0040 991F  rol r25  ;  packet
  89 0042 AA1F  rol r26  ;  packet
  90 0044 BB1F  rol r27  ;  packet
  91 0046   sbic 51-0x20,1   ; ,
  92 0048 8160  ori r24,lo8(1)   ;  packet,
  93.L8:
  94 004a 88BB  out 56-0x20,r24  ; , packet
  95 004c 00C0  rjmp .L7 ;

You may note that this code is in fact one instruction and one cycle 
shorter than your hand-written assembly...



I'm not disputing the fact that avr-gcc's "optimiser" does not always 
generate "optimal" code.  And there are certainly types of code which 
can be written smaller and faster in assembly than using any realistic 
compiler, simply because you can use techniques that are virtually 
impossible in C or which would require a totally different way of 
compiling code (using dedicated registers is a prime example).


However, avr-gcc constantly surprises me in the quality of its code 
generation - it really is very good, and it has got steadily better 
through the years.  Sometimes it pays to think a bit about the way your 
source code is structured, and maybe test out different arrangements.


mvh.,

David



___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


[avr-gcc-list] Re: sprintf

2009-02-28 Thread David Brown

David VanHorn wrote:


 > has exactly the same effect as:
 >
 >   A_String[i] = 0xff;
 >   A_String[i + 1] = 0x00;
 >   delay_about_1000_processor_cycles();
 >   waste_about_2000_bytes_of_code_space();

ROTF ! :-

As was earlier, and more gently pointed out by Dave, Sprintf was pretty 
wasteful.

It worked though, and that's what the programmers here suggested I use.
They are "big iron" guys, so less sensitive to the costs.
Also, given that I had plenty of code space and cycles, the use of 
sprintf was "good enough", in that it allowed me to demonstrate the 
system, and show live data from my sensors.
 
Dropping the data into the array is unarguably more efficient, and I 
shall use that approach in the future.
 


Unless your "big iron" guys are similarly inexperienced (being a 
beginner is a good excuse for many mistakes), then they are incompetent. 
 There is *no* excuse for a knowledgeable programmer using an 
inefficient and unsafe function in such a horribly unclear manner.


sprintf is not a function that should be much used in real world code - 
snprintf (or asprintf on "big" systems with dynamic memory) is much 
safer for general use (you don't get buffer overflows).


Secondly, using pointer arithmetic when array access is the logically 
correct method is very bad practice.  Again, it is hard to read and 
understand, and it stops the compiler from being able to check for 
mistakes (out of bounds accesses).  If that was *your* mistake, that's 
fair enough - making mistakes is part of learning.  If that was the "big 
iron" guys' mistake, they are in the wrong job.


Having lots of memory and processor resources available is not a reason 
to write rubbish.  It *is* a good reason for prioritising development 
time over run-time (that's why I often write PC software in Python, not 
C).  If you want to go to the shop two miles away, you might take your 
car - it's faster and easier, even if it costs more than walking.  But 
you don't get your car out the garage to drive to your post box at the 
bottom of the garden - that's the equivalent of using sprintf in this case.


> I will admit to being ignorant in C, but I'm not stupid.  :)
>

Don't worry, ignorance is curable :-)



___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


RE: [avr-gcc-list] Re: sprintf

2009-02-28 Thread Weddington, Eric
 

> -Original Message-
> From: 
> avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.org 
> [mailto:avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.
> org] On Behalf Of David Brown
> Sent: Saturday, February 28, 2009 12:49 PM
> To: avr-gcc-list@nongnu.org
> Subject: [avr-gcc-list] Re: sprintf
> 
> I didn't mean to imply that there is a problem with sprintf itself - 
> I've had occasion to use it in my own programs.  The point is 
> that it is 
> totally inappropriate in this case - not only is direct access to the 
> array hugely smaller and faster, but it is very much clearer to write 
> and easier to understand.
> 
> When you want to use the features of sprintf (and friends) 
> for text and 
> number formatting, then it is much easier and faster than 
> writing your 
> own code.

Which can be summed up in the saying: "Use the right tool for the job." ;-)


___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


[avr-gcc-list] Re: sprintf

2009-02-28 Thread David Brown

Daniel O'Connor wrote:

On Saturday 28 February 2009 09:00:50 David Kelly wrote:

On Fri, Feb 27, 2009 at 10:37:30PM +0100, Vincent Trouilliez wrote:

On Fri, 27 Feb 2009 22:10:16 +0100

David Brown  wrote:

sprintf((A_String + i), "%c", 0xff)

has exactly the same effect as:

A_String[i] = 0xff;
A_String[i + 1] = 0x00;
delay_about_1000_processor_cycles();
waste_about_2000_bytes_of_code_space();

ROTF ! :-

sprintf() only costs 2kB? Sure enough, this little program is 2132 bytes
after strip:


Depends which vprintf you ask for, see 
http://www.nongnu.org/avr-libc/user-manual/group__avr__stdio.html#ga3b98c0d17b35642c0f3e4649092b9f1


Also, while it IS a pig (relative to the micro flash size and CPU) it is
very powerful and for a lot of applications the speed is not critical.



I didn't mean to imply that there is a problem with sprintf itself - 
I've had occasion to use it in my own programs.  The point is that it is 
totally inappropriate in this case - not only is direct access to the 
array hugely smaller and faster, but it is very much clearer to write 
and easier to understand.


When you want to use the features of sprintf (and friends) for text and 
number formatting, then it is much easier and faster than writing your 
own code.




___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


Re: [avr-gcc-list] just 2 bytes allocated for uint32_t?

2009-02-28 Thread Radoslav Kolev
Hi Eric,

thanks for the extremely fast replies!

On Sat, Feb 28, 2009 at 9:09 PM, Weddington, Eric
 wrote:
> Used this command line:
> avr-gcc -Os -mmcu=atmega8 -c ATmegaBOOT.c -DF_CPU=800UL -o ATmegaBOOT.o
> No warnings, no errors.
> Using avr-gcc 4.3.2.

I have found the problem in my Makefile. Due to a wrong variable
setting I was including the default linux i386 inttypes.h istead of
the correct avr/include one.

I'm attaching the fixed Makefile, just for reference. F_CPU and
BAUD_RATE are defined there.

All works perfectly now!

Thanks again,
Radoslav Kolev


Makefile
Description: Binary data
___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


RE: [avr-gcc-list] just 2 bytes allocated for uint32_t?

2009-02-28 Thread Weddington, Eric
 

> -Original Message-
> From: 
> avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.org 
> [mailto:avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.
> org] On Behalf Of Radoslav Kolev
> Sent: Saturday, February 28, 2009 11:55 AM
> To: avr-gcc-list@nongnu.org
> Subject: [avr-gcc-list] just 2 bytes allocated for uint32_t?
> 
> On Sat, Feb 28, 2009 at 8:09 PM, Weddington, Eric
>  wrote:
> > Are you including  or ?
> 
> Yes,  is included. I'm attaching the complete .c file.
> 
> > And please stop using inb() and outb(). They have been 
> removed from avr-libc and are not necessary. Just use a 
> regular assignment.
> 
> I know. I didn't write this file, it's from the Arduino
> (http://arduino.cc) platform that's getting pretty popular lately. It
> hasn't been touched for some time as I can see from the svn logs. I
> just wanted to recompile it to run on a lower frequency crystal.
> 
> I noticed now, that there is even a warning displayed at compile time:
> 
> ATmegaBOOT.c:470: warning: comparison is always false due to limited
> range of data type
> 

- Change  to 
- define BAUD_RATE somewhere
- define F_CPU somewhere (like on the command line below)

Used this command line:
avr-gcc -Os -mmcu=atmega8 -c ATmegaBOOT.c -DF_CPU=800UL -o ATmegaBOOT.o

No warnings, no errors.

Using avr-gcc 4.3.2.


___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


RE: [avr-gcc-list] just 2 bytes allocated for uint32_t?

2009-02-28 Thread Weddington, Eric
 

> -Original Message-
> From: 
> avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.org 
> [mailto:avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.
> org] On Behalf Of Radoslav Kolev
> Sent: Saturday, February 28, 2009 11:55 AM
> To: avr-gcc-list@nongnu.org
> Subject: [avr-gcc-list] just 2 bytes allocated for uint32_t?
> 
> On Sat, Feb 28, 2009 at 8:09 PM, Weddington, Eric
>  wrote:
> > Are you including  or ?
> 
> Yes,  is included. I'm attaching the complete .c file.
> 
> > And please stop using inb() and outb(). They have been 
> removed from avr-libc and are not necessary. Just use a 
> regular assignment.
> 
> I know. I didn't write this file, it's from the Arduino
> (http://arduino.cc) platform that's getting pretty popular lately. It
> hasn't been touched for some time as I can see from the svn logs. I
> just wanted to recompile it to run on a lower frequency crystal.
> 
> I noticed now, that there is even a warning displayed at compile time:
> 
> ATmegaBOOT.c:470: warning: comparison is always false due to limited
> range of data type
> 

And what is the command line used to compile this file?


___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


[avr-gcc-list] just 2 bytes allocated for uint32_t?

2009-02-28 Thread Radoslav Kolev
On Sat, Feb 28, 2009 at 8:09 PM, Weddington, Eric
 wrote:
> Are you including  or ?

Yes,  is included. I'm attaching the complete .c file.

> And please stop using inb() and outb(). They have been removed from avr-libc 
> and are not necessary. Just use a regular assignment.

I know. I didn't write this file, it's from the Arduino
(http://arduino.cc) platform that's getting pretty popular lately. It
hasn't been touched for some time as I can see from the svn logs. I
just wanted to recompile it to run on a lower frequency crystal.

I noticed now, that there is even a warning displayed at compile time:

ATmegaBOOT.c:470: warning: comparison is always false due to limited
range of data type

Regards,
Rado
/**/
/* Serial Bootloader for Atmel mega8 AVR Controller   */
/**/
/* ATmegaBOOT.c   */
/**/
/* Copyright (c) 2003, Jason P. Kyle  */
/**/
/* Hacked by DojoCorp - ZGZ - MMX - IVR   */
/* Hacked by David A. Mellis  */
/**/
/* This program is free software; you can redistribute it */
/* and/or modify it under the terms of the GNU General*/
/* Public License as published by the Free Software   */
/* Foundation; either version 2 of the License, or*/
/* (at your option) any later version.*/
/**/
/* This program is distributed in the hope that it will   */
/* be useful, but WITHOUT ANY WARRANTY; without even the  */
/* implied warranty of MERCHANTABILITY or FITNESS FOR A   */
/* PARTICULAR PURPOSE.  See the GNU General Public*/
/* License for more details.  */
/**/
/* You should have received a copy of the GNU General */
/* Public License along with this program; if not, write  */
/* to the Free Software Foundation, Inc., */
/* 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA */
/**/
/* Licence can be viewed at   */
/* http://www.fsf.org/licenses/gpl.txt*/
/**/
/* Target = Atmel AVR m8  */
/**/

#include 
#include 
#include 
#include 
#include 
#include 

//#define F_CPU			1600

/* We, Malmoitians, like slow interaction
 * therefore the slow baud rate ;-)
 */
//#define BAUD_RATE		9600

/* 6.000.000 is more or less 8 seconds at the
 * speed configured here
 */
//#define MAX_TIME_COUNT	600
#define MAX_TIME_COUNT (F_CPU>>1)
///#define MAX_TIME_COUNT_MORATORY	160

/* SW_MAJOR and MINOR needs to be updated from time to time to avoid warning message from AVR Studio */
#define HW_VER	 0x02
#define SW_MAJOR 0x01
#define SW_MINOR 0x12

// AVR-GCC compiler compatibility
// avr-gcc compiler v3.1.x and older doesn't support outb() and inb()
//  if necessary, convert outb and inb to outp and inp
#ifndef outb
	#define outb(sfr,val)  (_SFR_BYTE(sfr) = (val))
#endif
#ifndef inb
	#define inb(sfr) _SFR_BYTE(sfr)
#endif

/* defines for future compatibility */
#ifndef cbi
	#define cbi(sfr, bit) (_SFR_BYTE(sfr) &= ~_BV(bit))
#endif
#ifndef sbi
	#define sbi(sfr, bit) (_SFR_BYTE(sfr) |= _BV(bit))
#endif

/* Adjust to suit whatever pin your hardware uses to enter the bootloader */
#define eeprom_rb(addr)   eeprom_read_byte ((uint8_t *)(addr))
#define eeprom_rw(addr)   eeprom_read_word ((uint16_t *)(addr))
#define eeprom_wb(addr, val)   eeprom_write_byte ((uint8_t *)(addr), (uint8_t)(val))

/* Onboard LED is connected to pin PB5 */
#define LED_DDR  DDRB
#define LED_PORT PORTB
#define LED_PIN  PINB
#define LED  PINB5


#define SIG1	0x1E	// Yep, Atmel is the only manufacturer of AVR micros.  Single source :(
#define SIG2	0x93
#define SIG3	0x07
#define PAGE_SIZE	0x20U	//32 words


void putch(char);
char getch(void);
void getNch(uint8_t);
void byte_response(uint8_t);
void nothing_response(void);

union address_union {
  uint16_t word;
  uint8_t  byte[2];
} address;

union length_union {
  uint16_t word;
  uint8_t  byte[2];
} length;

struct flags_struct {
  unsigned eeprom : 1;
  unsigned rampz  : 1;
} flags;

uint8_t buff[256];
//uint8_t address_high;

uint8_t pagesz=0x80;

uint8_t i;
//uint8_t bootuart0=0,bootuart1=0;


void (*app_start)(void) = 0x;

int main(void)
{
  uint8_t ch,ch2;
  uint16_t w;

  //cbi(BL_DDR,BL);
  //sbi(BL_PORT,BL);

  asm volatile("nop\n\t");

  /* check if flash is programmed already, if not start bootloader anyway */
  //if(pgm_read_byte_near(0x) != 0xFF) 

RE: [avr-gcc-list] just 2 bytes allocated for uint32_t?

2009-02-28 Thread Weddington, Eric
 

> -Original Message-
> From: 
> avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.org 
> [mailto:avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.
> org] On Behalf Of Radoslav Kolev
> Sent: Saturday, February 28, 2009 10:51 AM
> To: avr-gcc-list@nongnu.org
> Subject: [avr-gcc-list] just 2 bytes allocated for uint32_t?
> 
> 
> The problem is the compiler inlining functions more than once,

Known issue.

> 
> char getch(void)
> {
>   /* m8 */
> uint32_t count = 0;
>   while(!(inb(UCSRA) & _BV(RXC))) {
> /* HACKME:: here is a good place to count times*/
> count++;
> if (count > MAX_TIME_COUNT)
> app_start();
>   }
>   return (inb(UDR));
> }
> 
> The complete code is available here:
> 
> When I changed the type of the count variable from uint32_t to
> unsigned long things started working correctly.
> 
> But aren't uint32_t and unsigned long supposed to be the same and both
> 4 bytes long in this case?
> 

Are you including  or ?

And please stop using inb() and outb(). They have been removed from avr-libc 
and are not necessary. Just use a regular assignment.


___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


[avr-gcc-list] just 2 bytes allocated for uint32_t?

2009-02-28 Thread Radoslav Kolev
Hello!

I wanted to rebuild the Arduino Atmega8 bootloader using avr-gcc
4.2.2, installed from the Ubuntu 8.04.1 packages.

At first I run against the problem of 4.x gcc versions producing
bigger code, compared to 3.x versions that doesn't fit in the 1k boot
loader space. The bug report is here
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30908

The problem is the compiler inlining functions more than once,
resulting in bigger code, even when the -0s option is specified. I was
able to work around this issue by passing the -finline-limit=1
parameter. The code compiles and fits in the bootloader space. But
after testing on one Atmega8 it didn't work right. The MCU gets stuck
in the bootloader forever, and never gets to executing the main
program.

I tracked it down to the following part:

char getch(void)
{
  /* m8 */
uint32_t count = 0;
  while(!(inb(UCSRA) & _BV(RXC))) {
/* HACKME:: here is a good place to count times*/
count++;
if (count > MAX_TIME_COUNT)
app_start();
  }
  return (inb(UDR));
}

The complete code is available here:
http://code.google.com/p/arduino/source/browse/#svn/trunk/hardware/bootloaders/atmega8

So if there is no data coming in through the USART a counter is
incremented. After it reaches some value the main program is started.
The MAX_TIME_COUNT constant is frequency dependent and is in the range
of 100-1600. In my case the count variable was rolling over
back to zero at 65535, so never reaching the MAX_TIME_COUNT value and
the MCU is forever stuck in the bootloader.

I suspect this, because setting MAX_TIME_COUNT to 65534 gives the
expected very short delay, but setting it to 65535 gives an infinite
loop.

When I changed the type of the count variable from uint32_t to
unsigned long things started working correctly.

But aren't uint32_t and unsigned long supposed to be the same and both
4 bytes long in this case?

Can someone give me a hint or explanation of the above behaviour?

Best regards,
Radoslav Kolev


___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


RE: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Weddington, Eric
 

> -Original Message-
> From: Georg-Johann Lay [mailto:a...@gjlay.de] 
> Sent: Saturday, February 28, 2009 10:09 AM
> To: Weddington, Eric
> Cc: Nicholas Vinen; avr-gcc-list@nongnu.org
> Subject: Re: [avr-gcc-list] Re: C vs. assembly performance
> 
> Weddington, Eric schrieb:
> 
> BTW, is there a reason why there is more than one bug list?

Yes, because there is more than one project. ;-)

GNU Binutils
GCC
Avr-libc

Each of those projects are separate and they each have their own bug list.

WinAVR has its own bug list for 2 reasons:
- There may be bugs to the installation itself, which has nothing to do with 
the underlying projects
- It is used as a "catch-all" starting point for users who are not used to 
filing bug reports with open source projects. It is easier to point them there 
first, then then bugs can be analysed and reported to upstream projects.

Although, I admit that I haven't gone through the WinAVR bug list with the 
intent to move bugs upstream in some time. I'm planning on doing that after the 
WinAVR release, to clean up a bit.

I keep track of the AVR specific bugs in binutils, gcc, gdb here:

And that page has links to the other projects' bug lists.

Eric


___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


Re: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Georg-Johann Lay

Weddington, Eric schrieb:
 




-Original Message-
From: 
avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.org 
[mailto:avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.

org] On Behalf Of Georg-Johann Lay
Sent: Saturday, February 28, 2009 9:00 AM
To: Nicholas Vinen
Cc: avr-gcc-list@nongnu.org
Subject: Re: [avr-gcc-list] Re: C vs. assembly performance


Note that this may partially be covered by report 145284 
(which I cannot 
find, maybe Eric has closed/removed it)


Sorry, my mistake. I wrote the patch because I saw gcc 4 making bad code 
(campared with 3.4.6) and read some bad-optimization reports that 
address the subject on similar code.


It was not initiated by a specific report. However, several bug reports 
on "performance regression" will be fixed/get some release by that patch.


> I'm sorry, but on which project?

BTW, is there a reason why there is more than one bug list? I saw a 
third, but cannot find it again (maybe just a view on the sourceforge list).


http://www.nongnu.org/avr-libc/bugs.html
http://sourceforge.net/tracker2/?group_id=68108&atid=520074

Georg-Johann



___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


RE: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Weddington, Eric
 

> -Original Message-
> From: 
> avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.org 
> [mailto:avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.
> org] On Behalf Of Georg-Johann Lay
> Sent: Saturday, February 28, 2009 9:00 AM
> To: Nicholas Vinen
> Cc: avr-gcc-list@nongnu.org
> Subject: Re: [avr-gcc-list] Re: C vs. assembly performance
> 
> 
> Note that this may partially be covered by report 145284 
> (which I cannot 
> find, maybe Eric has closed/removed it)
> 

I'm sorry, but on which project?


___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


Re: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Georg-Johann Lay

Nicholas Vinen schrieb:

Georg-Johann Lay wrote:


If you are sure it is really some performance issue/regression and not
due to some language standard implication, you can add a report to
   http://sourceforge.net/tracker/?group_id=68108

so that the subject won't be forgotten. Also mind
   http://gcc.gnu.org/bugs.html

And of course, you can ask questions here. In that case it is helpful
if you can manage to simplify the source to a small piece of code that
triggers the problem and allows others to reproduce the problem. (i.e.
no #include in the code, no ... (except for varargs), a.s.o).

Snippets of .s may point to the problem when you add -dp -fverbose-asm

And there are lots of places where avr-gcc produces suboptimal or even
bad code, so feedback is welcome.

But note that just a few guys are working on the AVR part of gcc.
I would do more if I had the time (and the support of some gurus to
ask questions on internals then and when...)


OK, I only spent a few minutes looking at old code and I found some
obviously sub-optimal results. It distills down to this:

#include 

int main(void) {
  unsigned long packet = 0;

  while(1) {
if( !(PINC & _BV(PC2)) ) {
  packet = (packet<<1)|(((unsigned char)PINC>>1)&1);
}
PORTB = packet;
  }
}

avr-gcc is: avr-gcc (Gentoo 4.3.3 p1.0, pie-10.1.5) 4.3.3

The avr/io stuff is just so that it won't optimise the code away to nothing.



Please avoid the #include stuff. You can use source like this:

#define PINC  (*((unsigned char volatile*) 0x20))
#define PORTB (*((unsigned char volatile*) 0x21))

void foo ()
{
unsigned long packet = 0;

while(1)
{
if (!(PINC & (1 << 2)))
{
packet = (packet<<1)|(((unsigned char)PINC>>1)&1);
}
PORTB = packet;
}
}


I tried compiling it with both -Os and -O2:

avr-gcc -g -dp -fverbose-asm -Os -S -mmcu=atmega48 -o test test.c

The result includes this:

lsl r18  ;  packet   ;  50  *ashlsi3_const/2[length = 4]
rol r19  ;  packet
rol r20  ;  packet
rol r21  ;  packet
in r24,38-0x20   ;  D.1214,  ;  16  *movqi/4[length = 1]
lsr r24  ;  D.1214   ;  17  lshrqi3/3   [length = 1]
ldi r25,lo8(0)   ; , ;  48  *movqi/2[length = 1]
ldi r26,lo8(0)   ; , ;  46  *movhi/4[length = 2]
ldi r27,hi8(0)   ; ,
andi r24,lo8(1)  ;  tmp52,   ;  19  andsi3/2[length = 4]
andi r25,hi8(1)  ;  tmp52,
andi r26,hlo8(1) ;  tmp52,
andi r27,hhi8(1) ;  tmp52,
or r18,r24   ;  packet, tmp52;  20  iorsi3/1[length 
= 4]
or r19,r25   ;  packet, tmp52
or r20,r26   ;  packet, tmp52
or r21,r27   ;  packet, tmp52

The problem, it seems, is that the compiler doesn't realize that the
right hand side of the expression can only have any non-zero values in
the bottom 8 bits, since it's an unsigned char which is being implicitly
expanded to 32 bits for the or operation. In fact, it's only the bottom
bit that's ever non-zero. As a result it's spending a number of cycles
and registers doing useless things. I'll copy a report to the locations
you mention in your e-mail.


Note that this may partially be covered by report 145284 (which I cannot 
find, maybe Eric has closed/removed it)


I already filed a patch for that in
   http://lists.gnu.org/archive/html/avr-gcc-list/2008-12/msg00019.html
that covers your issue to some extent or maybe almost complete: The new 
pattern "*ior2_bit0" would match some parts of your code.



There are probably ways to work around this, such as making "packet" a
union of an unsigned char and a long, then shifting the long and only
ORing in the unsigned char. I'll note that there's also an optimization
to be had with the right hand side of the expression. I would write the
assembly something like this:


>lsl r18
>rol r19
>rol r20
>rol r21
>in r24,38-0x20
>bst r24, 1
>bld r18, 0

The result of the above patch should lead to something like
lsl r18
rol r19
rol r20
rol r21
in r24,38-0x20
bst r24, 1
sbrs r18, 0
bld r18, 0
The SBRS is necessary, because the pattern is not aware of the fact that 
r18.0 is 0. Maybe the optimization is even better (or waeker); I am not 
using that patch at the moment and can just estimate its effect when 
peeking into rtl dumps of an unpatched gcc.


Concerning the patch itself, I don't know anything about its fate and if 
it will ever make its way into gcc because of administrative obstacles 
and the technique I used.


I don't like the technique I am using because it leads to complex 
patterns that are hard to understand an test and will become useless if 
the middleend decides to represent the stuff in a slightly different way...


Georg-Johann



___
AVR-GCC-list mailing lis

RE: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Weddington, Eric
 

> -Original Message-
> From: 
> avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.org 
> [mailto:avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.
> org] On Behalf Of Nicholas Vinen
> Sent: Saturday, February 28, 2009 6:21 AM
> To: partha_nay...@yahoo.com
> Cc: avr-gcc-list@nongnu.org
> Subject: Re: [avr-gcc-list] Re: C vs. assembly performance
> 
> 
> Now that I've signed up to this list, if and when I come 
> across avr-gcc missing obvious optimisations I'll report them.
> 

We certainly appreciate bug reports. However, before you report them, please 
make sure that they haven't been reported already. An AVR toolchain bug list is 
kept here:



There are already a number of missed optimization gcc bugs reported on that 
list. Some have even been fixed already, though they haven't been released 
through a toolchain distribution.


___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


Re: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Nicholas Vinen
Georg-Johann Lay wrote:
> If you are sure it is really some performance issue/regression and not
> due to some language standard implication, you can add a report to
> http://sourceforge.net/tracker/?group_id=68108
>
> so that the subject won't be forgotten. Also mind
> http://gcc.gnu.org/bugs.html
>
> And of course, you can ask questions here. In that case it is helpful
> if you can manage to simplify the source to a small piece of code that
> triggers the problem and allows others to reproduce the problem. (i.e.
> no #include in the code, no ... (except for varargs), a.s.o).
>
> Snippets of .s may point to the problem when you add -dp -fverbose-asm
>
> And there are lots of places where avr-gcc produces suboptimal or even
> bad code, so feedback is welcome.
>
> But note that just a few guys are working on the AVR part of gcc.
> I would do more if I had the time (and the support of some gurus to
> ask questions on internals then and when...)
OK, I only spent a few minutes looking at old code and I found some
obviously sub-optimal results. It distills down to this:

#include 

int main(void) {
  unsigned long packet = 0;

  while(1) {
if( !(PINC & _BV(PC2)) ) {
  packet = (packet<<1)|(((unsigned char)PINC>>1)&1);
}
PORTB = packet;
  }
}


avr-gcc is: avr-gcc (Gentoo 4.3.3 p1.0, pie-10.1.5) 4.3.3

The avr/io stuff is just so that it won't optimise the code away to nothing.

I tried compiling it with both -Os and -O2:

avr-gcc -g -dp -fverbose-asm -Os -S -mmcu=atmega48 -o test test.c

The result includes this:

lsl r18  ;  packet   ;  50  *ashlsi3_const/2[length = 4]
rol r19  ;  packet
rol r20  ;  packet
rol r21  ;  packet
in r24,38-0x20   ;  D.1214,  ;  16  *movqi/4[length = 1]
lsr r24  ;  D.1214   ;  17  lshrqi3/3   [length = 1]
ldi r25,lo8(0)   ; , ;  48  *movqi/2[length = 1]
ldi r26,lo8(0)   ; , ;  46  *movhi/4[length = 2]
ldi r27,hi8(0)   ; ,
andi r24,lo8(1)  ;  tmp52,   ;  19  andsi3/2[length = 4]
andi r25,hi8(1)  ;  tmp52,
andi r26,hlo8(1) ;  tmp52,
andi r27,hhi8(1) ;  tmp52,
or r18,r24   ;  packet, tmp52;  20  iorsi3/1[length 
= 4]
or r19,r25   ;  packet, tmp52
or r20,r26   ;  packet, tmp52
or r21,r27   ;  packet, tmp52



The problem, it seems, is that the compiler doesn't realize that the
right hand side of the expression can only have any non-zero values in
the bottom 8 bits, since it's an unsigned char which is being implicitly
expanded to 32 bits for the or operation. In fact, it's only the bottom
bit that's ever non-zero. As a result it's spending a number of cycles
and registers doing useless things. I'll copy a report to the locations
you mention in your e-mail.


There are probably ways to work around this, such as making "packet" a
union of an unsigned char and a long, then shifting the long and only
ORing in the unsigned char. I'll note that there's also an optimization
to be had with the right hand side of the expression. I would write the
assembly something like this:

lsl r18
rol r19
rol r20
rol r21
in r24,38-0x20
bst r24, 1
bld r18, 0


I'm sure I can find other examples of poor code generation in this
particular file, since I remember coming across many cases where I
replaced the generated code with inline assembly when I was originally
working on it, but that will have to wait for later.


Thanks for your help, I appreciate it. As I said avr-gcc is pretty good,
but I would love it if it could get even better :)



Nicholas

___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


Re: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Nicholas Vinen
Georg-Johann Lay wrote:
> compiling the following code
> unsigned char sh4 (unsigned char x)
> {
> return x >> 4;
> }
>
> unsigned char sh8 (unsigned short x)
> {
> return x >> 8;
> }
>
> with avr-gcc 4.3.2 and -Os yields (non-code stripped)
>
> sh4:
> swap r24 ;
> andi r24,lo8(15) ; ,
> ret
>
> sh8:
> mov r24,r25 ; , x
> ret
Interesting. It may be that either I was using an earlier version which
missed these optimisations, or else it was because my code was much more
complex and the optimiser therefore missed them. I suppose I can go back
and find the old code, compile it, and see what comes out now. I forgot
about "andi", that makes it an even better optimisation, half as many
cycles and instructions.
>
>> the compiler doing silly things like this, is it worth me posting the
>> code & assembly output to this list?
> If you are sure it is really some performance issue/regression and not
> due to some language standard implication, you can add a report to
> http://sourceforge.net/tracker/?group_id=68108
>
> so that the subject won't be forgotten. Also mind
> http://gcc.gnu.org/bugs.html
>
> And of course, you can ask questions here. In that case it is helpful
> if you can manage to simplify the source to a small piece of code that
> triggers the problem and allows others to reproduce the problem. (i.e.
> no #include in the code, no ... (except for varargs), a.s.o).
>
> Snippets of .s may point to the problem when you add -dp -fverbose-asm
>
> And there are lots of places where avr-gcc produces suboptimal or even
> bad code, so feedback is welcome.
>
> But note that just a few guys are working on the AVR part of gcc.
> I would do more if I had the time (and the support of some gurus to
> ask questions on internals then and when...)
Yeah, this is one reason I haven't complained loudly in the past,
avr-gcc is already pretty good and I didn't want to apply a lot of
pressure to fix every little missed optimisation. However, it sure would
be nice. I'll see if I can dig up some of my old code now, before I
rewrote it in assembly. If it's still doing things the slow way I'll
point it out at the places you mention.

Thanks!


Nicholas



___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


Re: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Nicholas Vinen
Parthasaradhi Nayani wrote:
>   
>> From: Nicholas Vinen 
>> 
>
>  For example, things like "unsigned char x, y;
>   
>> x = y>>4" could
>> use the nibble swap instruction rather than four shifts,
>> and things like
>> 
>
> Shifting a byte or int right or left must push in 00s from the other side so 
> swapping a nibble is not the right thing to do. So is the case with other 
> examples. Correct me if I am wrong.
>
> Nayani
>
>   
Yes, it has to blank the top 4 bits, but I believe it's still faster to
swap the nibble and do that than four shifts. Something like:

SWAP r1
LDI $15, r2
AND r2, r1

This is three instructions and three cycles, as opposed to:

LSR r1
LSR r1
LSR r1
LSR r1


which is four instructions and cycles. The former requires a spare
register but that generally isn't a problem.

This is just an example. I didn't note them down at the time but I saw
the compiler doing a lot of things the "long way" when there was a
simple, faster, smaller way to do it. The case of accessing some of the
bytes in a larger type via shifting was particularly annoying. Perhaps a
union would have solved that, but it seems silly to have to resort to
doing it that way.

Now that I've signed up to this list, if and when I come across avr-gcc
missing obvious optimisations I'll report them.



Nicholas

___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


Re: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Georg-Johann Lay

Nicholas Vinen schrieb:

* On the other hand, it would be great if avr-gcc could perform some
basic optimisations that even a fairly inexperienced amateur could
manage. For example, things like "unsigned char x, y; x = y>>4" could
use the nibble swap instruction rather than four shifts, and things like


The compiler uses swap if it can. Maybe there are some effects of 
implicit type conversion like

   unsigend char x, y, z; z = (x+y) >> 4;
which is quite different from
   z = x >> 4;


"unsigned short a; unsigned char b; b = a>>8" could pretty much just be
a single instruction. These are examples of things I've seen the
compiler waste cycles on, that are fairly obvious. In future if I see


compiling the following code
unsigned char sh4 (unsigned char x)
{
return x >> 4;
}

unsigned char sh8 (unsigned short x)
{
return x >> 8;
}

with avr-gcc 4.3.2 and -Os yields (non-code stripped)

sh4:
swap r24 ;
andi r24,lo8(15) ; ,
ret

sh8:
mov r24,r25  ; , x
ret


the compiler doing silly things like this, is it worth me posting the
code & assembly output to this list?
If you are sure it is really some performance issue/regression and not 
due to some language standard implication, you can add a report to

http://sourceforge.net/tracker/?group_id=68108

so that the subject won't be forgotten. Also mind
http://gcc.gnu.org/bugs.html

And of course, you can ask questions here. In that case it is helpful if 
you can manage to simplify the source to a small piece of code that 
triggers the problem and allows others to reproduce the problem. (i.e. 
no #include in the code, no ... (except for varargs), a.s.o).


Snippets of .s may point to the problem when you add -dp -fverbose-asm

And there are lots of places where avr-gcc produces suboptimal or even 
bad code, so feedback is welcome.


But note that just a few guys are working on the AVR part of gcc.
I would do more if I had the time (and the support of some gurus to ask 
questions on internals then and when...)




Nicholas


Georg-Johann


___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


Re: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Parthasaradhi Nayani


> From: Nicholas Vinen 

 For example, things like "unsigned char x, y;
> x = y>>4" could
> use the nibble swap instruction rather than four shifts,
> and things like

Shifting a byte or int right or left must push in 00s from the other side so 
swapping a nibble is not the right thing to do. So is the case with other 
examples. Correct me if I am wrong.

Nayani


  


___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


[avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Nicholas Vinen

> If I wanted fast and small, I'd have done it in ASM. But half of the point
> of this exercise was to get my feet wet with C.
> So, both points accomplished. I have GCC up under AVR studio, a working
> project, and I feel reasonably confident with the code.
>   
Despite avr-gcc being fairly dim about optimisations*, the results can
be pretty reasonable. I had to write one program for a Mega48 in
assembly language to get it fast enough. The goal was to linearly
interpolate 16 bit values from a table stored in flash fast enough to
output a 192kHz signal with the processor running at 20MHz. That gives
just over 100 cycles to fetch the values, do the interpolation, output
the value and update the pointers. I couldn't do it in C, even with
inline assembly, but managed it in assembly. Part of the trick was
storing everything in registers, which JUST fit for two channels.
However, the C compiler was able to do it in something like 160-180
cycles. So I wouldn't discount C performance being quite acceptable if
you are careful.

* On the other hand, it would be great if avr-gcc could perform some
basic optimisations that even a fairly inexperienced amateur could
manage. For example, things like "unsigned char x, y; x = y>>4" could
use the nibble swap instruction rather than four shifts, and things like
"unsigned short a; unsigned char b; b = a>>8" could pretty much just be
a single instruction. These are examples of things I've seen the
compiler waste cycles on, that are fairly obvious. In future if I see
the compiler doing silly things like this, is it worth me posting the
code & assembly output to this list?



Nicholas



___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list