
[avr-gcc-list] Re: C vs. assembly performance

2009-03-02 Thread David Brown


Georg-Johann Lay wrote:

David Brown schrieb:

Georg-Johann Lay wrote:
regardless if optimization is on or not. In fact I would guess that 
it is a policy that the code *must* be the same regardless what debug 
level (if any) or debug format is used, and code being dependent on 
debug level/format is worth a bug report.


uaaa, typo devil above. I meant debugging info enabled instead of 
optimization in the first line. With optimization my statement is 
obvious nonsense. blush.


That is certainly not true.  Enabling debug information will disable 
or limit some optimisations.  gcc in general is pretty good at optimising 


Can you make that explicit with an example? With code I mean code that 
will end up in the target, i.e. no code that lives in some .stab* or 
.debug* section.


But I must admit that I don't debug my code and consequently have debug 
info turned off, so I am not familiar with debugging info and maybe 
fundamentally wrong on gcc policies concerning that topic.


Turning on -g3 as a trial in my current AVR project (12k .text) does not 
change the size of .text, and static RAM usage is exactly the same as 
without -g3. The .i, .s, .o and .elf files grow because of the 
debug info, but .text and .data et al. should not change, neither in 
size nor in content.




A few quick tests here confirm what you've said.  It looks like my 
understanding of debug data generation is a bit outdated.


mvh.,

David




___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list

Re: [avr-gcc-list] Re: C vs. assembly performance

2009-03-01 Thread Georg-Johann Lay


Vincent Trouilliez schrieb:

On Sat, 28 Feb 2009 19:09:13 -0700
Weddington, Eric ewedding...@cso.atmel.com wrote:


So in application code I tend to avoid switch statements for embedded systems, 
unless I'm writing throw-away code or the application is trivial.


Oh no ! ;-)
I have only recently got round to using switch statements, to improve
code legibility. In my current/first embedded project, I happen to have
a very long (25 cases, 160 lines long) switch statement. I dread to
think what it would look like if I had to replace it (with what else?) with
nested if's !
How readable would that be... not to mention that with indentation, 25
levels of nesting would mean the last case would be 3 meters on the far
right... ;-)

Any coding tips to make all this look about readable by human
beings ?! ;-/

You can write

if ()
{}
else if ()
{}
else if ()
{}
...
else
{}

instead of

if ()
{}
else
if ()
{}
else
if ()
{}
   ...
   else
   {}

(Optionally with one more level of {} in each but the last else)
In fact, editors like emacs will indent the two sources differently, 
i.e. just in the way I indicated, even though there is absolutely no 
difference in semantics. And the first complies with coding standards 
like gnu and gcc, and maybe even others like misra etc.


Georg-Johann



Re: [avr-gcc-list] Re: C vs. assembly performance

2009-03-01 Thread Bob Paddock

 And the first complies with coding standards like gnu and gcc,
 and maybe even others like misra etc.

MISRA doesn't say a lot about style, that is how pretty
the code looks.  It is implicit that ugly looking code
is bad, and probably buggy.  If your code looks
like a IOCCC entry, your code would not be MISRA
complaint.
http://www.ioccc.org/

Per MISRA, in a nested if/else tree all brackets are required, and
there must be a final else{} as you showed.

As far as I can tell MISRA-2004 is silent on a preference
between switch(), nested if/else() and function pointers.
Personally I prefer tables of function pointers.  Makes
code looking like a threaded language like Forth at times.
In any case what can get hairy fast is conditionals containing
conditionals, which makes testing all execution paths problematic.

Note that technically no AVR-LibC based project complies
with MISRA due to rule #3.6 that says any library used,
including the GCC libraries, must be complaint with MISRA [IEC 61508
Part 3].  I tend to follow the MISRA Guidelines anyway as it shows an
attempt at Due Diligence.


-- 
http://www.wearablesmartsensors.com/
http://www.softwaresafety.net/
http://www.designer-iii.com/
http://www.unusualresearch.com/



[avr-gcc-list] Re: C vs. assembly performance

2009-03-01 Thread David Brown


Vincent Trouilliez wrote:

On Sat, 28 Feb 2009 19:24:38 -0700 Weddington, Eric
ewedding...@cso.atmel.com wrote:

You wouldn't need *nested* ifs, but an if-else-if structure, or
better yet, a table of function pointers, also known as a dispatch
table. Each method depends on the type of data that you're
switching on.


I switch on an unsigned byte, with contiguous values (0-24). A
function table sounds elegant to my ear, but it would mean 25 
functions ! In my case the work is done in situ, within the switch 
statement, as it takes only 2 or 3 statements to process a given 
case. Using functions would be both overkill and overwhelming to 
manage I think !! ;-)


I pasted my switch statement below for the curious.



These sorts of decisions depend very much on the circumstances.  I am 
not nearly as anti-switch as Eric - I would use a switch in this case. 
I agree with him that many switches could be better expressed as 
if-trees or function tables (though the compiler can often do a better 
job at optimising switches than function tables...).
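
For readers who have not used one, here is a minimal sketch of the function-table (dispatch table) approach discussed above. The handler names, the three-entry table and the scalings are hypothetical and are not taken from Vincent's code; the point is only that a contiguous selector can index an array of function pointers, with a range check standing in for the switch's default case.

```c
#include <stdint.h>

/* Hypothetical handlers: each converts one kind of raw data byte. */
typedef int16_t (*handler_fn)(uint8_t raw);

static int16_t handle_temperature(uint8_t raw) { return (int16_t)raw * 3 / 4 - 40; }
static int16_t handle_percent(uint8_t raw)     { return (int16_t)raw * 100 / 255; }
static int16_t handle_identity(uint8_t raw)    { return (int16_t)raw; }

/* The table index plays the role of the switch value (contiguous from 0). */
static const handler_fn dispatch[] = {
    handle_temperature,
    handle_percent,
    handle_identity,
};

#define DISPATCH_COUNT (sizeof dispatch / sizeof dispatch[0])

/* Returns 0 on success, -1 if the selector is out of range
   (the equivalent of a switch's default case). */
int dispatch_value(uint8_t selector, uint8_t raw, int16_t *out)
{
    if (selector >= DISPATCH_COUNT) {
        return -1;
    }
    *out = dispatch[selector](raw);
    return 0;
}
```

Whether this beats a switch depends on the case bodies: with only two or three statements per case, as in Vincent's code, the call overhead may well outweigh the benefit.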


As Eric says, if's give you better control of your timing and priority, 
in case that is useful.


An interesting structure for replacing switches is a binary if tree:

// switch (x) for x = 0 .. 7
if (!(x & 0x04)) {
    if (!(x & 0x02)) {
        if (!(x & 0x01)) {
            // case 0: ...
        } else {
            // case 1:
        }
    } else {
        if (!(x & 0x01)) {
            // case 2: ...
        } else {
            // case 3:
        }
    }
} else {
    ...
}

This lets you avoid the long delays you get with a flatter if structure 
if you have lots of cases.  Unfortunately, this doesn't seem to generate 
good code according to my very brief testing - ideally, we should see a 
series of sbrs or sbrc instructions, but in practice C's irritating 
int-promotion feature is getting in the way.
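
As a complete, compilable version of the binary if tree, the following sketch fills in all eight leaves. The function name classify and the returned case numbers are placeholders for the real per-case work; every leaf is reached after exactly three bit tests, which is the property the structure is meant to provide.

```c
#include <stdint.h>

/* Binary decision tree over the three low bits of x (values 0..7).
   The returned numbers stand in for the per-case work. */
uint8_t classify(uint8_t x)
{
    if (!(x & 0x04)) {
        if (!(x & 0x02)) {
            if (!(x & 0x01)) { return 0; } else { return 1; }
        } else {
            if (!(x & 0x01)) { return 2; } else { return 3; }
        }
    } else {
        if (!(x & 0x02)) {
            if (!(x & 0x01)) { return 4; } else { return 5; }
        } else {
            if (!(x & 0x01)) { return 6; } else { return 7; }
        }
    }
}
```

This says nothing about the quality of the generated AVR code, which, as noted above, did not look good in brief testing.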



Apart from that, I've a couple of other comments on your code.  The 
variable names tmp16, tmp32 and tmpS16 are truly awful.  It is 
also (usually) best to declare such temporaries in as small a block as 
possible.  Thus they should not be at the start of the function, but 
instead make your cases like this:


{   // (N * 0.75) - 40    DB41    -40 to +150 °C
    int16_t temp = (int16_t)(KLineFrameM1[41]) * 3 / 4 - 40;
    var_format_S16(buff, temp, 0);
    break;
}

That will give clearer code and let the optimiser be more flexible in 
its register choices.


Also avoid local automatic constant arrays (like Yes in your example) 
- they must be built on the stack each time the function is called, 
whether they are used or not.  By making these static, they will have 
local names but be built once at the start of the program.  If you are 
short on ram space, you might also want to make them PROGMEM.





Re: [avr-gcc-list] Re: C vs. assembly performance

2009-03-01 Thread Vincent Trouilliez


Thanks all for the input on switch statements, I appreciate it.


On Sun, 01 Mar 2009 14:37:57 +0100
David Brown david.br...@hesbynett.no wrote:

 Apart from that, I've a couple of other comments on your code.  The 
 variable names tmp16, tmp32 and tmpS16 are truly awful.

Oh :-) I was just trying to give them meaningful names.
They are just that: temporary/scratch integers used only to
process the raw data byte before it can be sent to the function
var_format_S16() which will convert it to ASCII.

 It is also (usually) best to declare such temporaries in as small a block as 
 possible.  Thus they should not be at the start of the function, but 
 instead make your cases like this:
 
 {   // (N * 0.75) - 40    DB41    -40 to +150 °C
   int16_t temp = (int16_t)(KLineFrameM1[41]) * 3 / 4 - 40;
   var_format_S16(buff, temp, 0);
   break;
 }
   

Hmm, actually I didn't even know one could declare local variables
in another place than the start of the function, I take note.

 That will give clearer code

I find it just adds bloat and confusion: I have to declare the variable
25 times, every 5 lines of code, that's overwhelming, plus it makes the
statement harder to parse (my brain is easily overloaded ;-), and also,
it is a little confusing/harder to understand: if the variable is
declared 25 times, one could think: well if it's a different variable
every time, there must be something special/obscure that makes it
impossible to use the same temp/scratch integer. This leaves a false
impression that the processing of the 25 individual data bytes is more
complex than it really is, when a single temp integer could
perfectly well be used.

Well that's just my view as a beginner.. maybe with time I will
progressively start to see things the way you do ! ;-)

 Also avoid local automatic constant arrays (like Yes in your example)

Yes I don't like them, but since they just that, local, I didn't want
to declare them global, to avoid throwing bloat and confusion among all
the variables declaration which are truly/genuinely global.

 - they must be built on the stack each time the function is called, 
 whether they are used or not.

Yes, I was conscious of this but it was trade-off between wasting time
initialising them each time I enter the function.. and degrading code
clarity by defining local variables among truly global ones. Doesn't
make me happy.. like most compromises I suppose ! ;-)

  By making these static, they will have 
 local names but be built once at the start of the program.

He ? Wow, so I can get the best of both worlds, just by qualifying them
static ?! Yahoo ! :o)
Thanks David, you just made me and my code that little bit better ! :-)

  If you are short on ram space, you might also want to make them PROGMEM.

Yeah I do do this at times, even though I have ample available RAM..
just because I don't like wasting things, be it MCU resources or money
or anything, it's just a general state of mind ;-)

The "Yes" and "No" strings I accepted to waste RAM on because they are
very short. But other strings in the program, which are 20 characters
long and used several times, I declared global to save memory.


Thanks for your comment, it made my program better and me a little bit
smarter/more skilled ;-)


--
Vince



Re: [avr-gcc-list] Re: C vs. assembly performance

2009-03-01 Thread Georg-Johann Lay


David Brown schrieb:
variable names tmp16, tmp32 and tmpS16 are truly awful.  It is 
also (usually) best to declare such temporaries in as small a block as 
possible.  Thus they should not be at the start of the function, but 
instead make your cases like this:


{   // (N * 0.75) - 40    DB41    -40 to +150 °C
    int16_t temp = (int16_t)(KLineFrameM1[41]) * 3 / 4 - 40;
    var_format_S16(buff, temp, 0);
    break;
}

That will give clearer code and let the optimiser be more flexible in 
its register choices.


As far as the optimizer of gcc is concerned, that makes no difference. 
It knows exactly what register contains what value and is aware of the 
place where a register dies, i.e. the register can be reused for 
whatever other stuff. Anyway, even if just one temp variable is used, gcc 
will produce a new (pseudo) register for every result like moves, 
arithmetic, etc. These pseudos may or may not end up in the same machine 
register. On that level, blocks are just syntactic sugar (if they are 
not used to hide visibility, e.g. like in int tmp=0; {int tmp = 1;} )


To get a notion of the various machine independent transformations, have 
a glance at gcc's output with -fdump-tree-all, and for the machine 
dependent ones it is -fdump-rtl-all. They make clear that do-while, while, 
for and if-goto are just flavours of the same sugar.
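
To make the "same sugar" point concrete at the source level, here is a small sketch with four functions that sum 0 .. n-1 using for, while, do-while and if-goto. The function names are made up for illustration. The code only shows that the four forms are interchangeable in C; whether the dumps for them look alike is something to check with the flags above on your own gcc version.

```c
#include <stdint.h>

/* Four ways to sum 0 .. n-1; all compute the same value. */
uint16_t sum_for(uint8_t n)
{
    uint16_t s = 0;
    for (uint8_t i = 0; i < n; i++) {
        s += i;
    }
    return s;
}

uint16_t sum_while(uint8_t n)
{
    uint16_t s = 0;
    uint8_t i = 0;
    while (i < n) {
        s += i;
        i++;
    }
    return s;
}

uint16_t sum_do_while(uint8_t n)
{
    uint16_t s = 0;
    uint8_t i = 0;
    if (i < n) {            /* guard: do-while always runs at least once */
        do {
            s += i;
            i++;
        } while (i < n);
    }
    return s;
}

uint16_t sum_if_goto(uint8_t n)
{
    uint16_t s = 0;
    uint8_t i = 0;
loop:
    if (i < n) {
        s += i;
        i++;
        goto loop;
    }
    return s;
}
```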


Georg-Johann



RE: [avr-gcc-list] Re: C vs. assembly performance

2009-03-01 Thread Weddington, Eric

 -Original Message-
 From: 
 avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.org 
 [mailto:avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.
 org] On Behalf Of Bob Paddock
 Sent: Sunday, March 01, 2009 6:22 AM
 To: avr-gcc-list@nongnu.org
 Subject: Re: [avr-gcc-list] Re: C vs. assembly performance

  And the first complies with coding standards like gnu and gcc,
  and maybe even others like misra etc.

 MISRA doesn't say a lot about style, that is how pretty
 the code looks.  It is implicit that ugly looking code
 is bad, and probably buggy.  If your code looks
 like a IOCCC entry, your code would not be MISRA
 complaint.
 http://www.ioccc.org/

 Per MISRA, in a nested if/else tree all brackets are required, and
 there must be a final else{} as you showed.

 As far as I can tell MISRA-2004 is silent on a preference
 between switch(), nested if/else() and function pointers.
 Personally I prefer tables of function pointers.  Makes
 code looking like a threaded language like Forth at times.
 In any case what can get hairy fast is conditionals containing
 conditionals, which makes testing all execution paths problematic.

 Note that technically no AVR-LibC based project complies
 with MISRA due to rule #3.6 that says any library used,
 including the GCC libraries, must be complaint with MISRA [IEC 61508
 Part 3].  I tend to follow the MISRA Guidelines anyway as it shows an
 attempt at Due Diligence.

Bob,

s/complaint/compliant/g
They're two different words. ;-)

I would really like gcc to have a -wmisra switch someday, so we can test if an 
application, or library such as avr-libc is compliant. Not that I like MISRA 
anyway. I think it's a brain-dead standard.

But do you have some tool that you have used to check avr-libc against MISRA? 
If so, do you have a list of issues that it found?

Eric


[avr-gcc-list] Re: C vs. assembly performance

2009-03-01 Thread David Brown


Vincent Trouilliez wrote:

Thanks all for the input on switch statements, I appreciate it.


On Sun, 01 Mar 2009 14:37:57 +0100
David Brown david.br...@hesbynett.no wrote:

Apart from that, I've a couple of other comments on your code.  The 
variable names tmp16, tmp32 and tmpS16 are truly awful.


Oh :-) I was just trying to give them meaningful names.
They are just that: temporary/scratch integers used only to
process the raw data byte before it can be sent to the function
var_format_S16() which will convert it to ASCII.

It is also (usually) best to declare such temporaries in as small a block as 
possible.  Thus they should not be at the start of the function, but 
instead make your cases like this:


{   // (N * 0.75) - 40    DB41    -40 to +150 °C
    int16_t temp = (int16_t)(KLineFrameM1[41]) * 3 / 4 - 40;
    var_format_S16(buff, temp, 0);
    break;
}



Hmm, actually I didn't even know one could declare local variables
in another place than the start of the function, I take note.



Even with older C standards, you can declare variables at the beginning 
of any block.  With C99, you can declare them almost anywhere.



That will give clearer code


I find it just adds bloat and confusion: I have to declare the variable
25 times, every 5 lines of code, that's overwhelming, plus it makes the
statement harder to parse (my brain is easily overloaded ;-), and also,
it is a little confusing/harder to understand: if the variable is
declared 25 times, one could think: well if it's a different variable
every time, there must be something special/obscure that makes it
impossible to use the same temp/scratch integer. This leaves a false
impression that the processing of the 25 individual data bytes is more
complex than it really is, when a single temp integer could
perfectly well be used.



It doesn't add bloat to your source code - it's an extra type 
declaration at the start of each use of your temporaries.  You can 
change lines such as "tmp32 = " into "int32_t tmp = ", which is hardly 
bloat.  And it will reduce bloat in the generated code.


Each clause in your switch uses the variables in a slightly different 
way - there is no benefit to your source code in trying to manually 
force the compiler to re-use these variables.  They are different uses, 
so you can declare them individually.


Also remember that the smaller the scope of an identifier, the shorter 
the name you can use while keeping the code equally clear.  A variable 
called "tmpS16" is a meaningless name over a span of 100 lines - a 
variable called "t" is perfectly clear over a two-line lifespan.
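
A short sketch of what per-case temporaries look like in practice. The function decode, the frame layout and the second scaling are hypothetical and only serve to show two cases that need temporaries of different widths; each case opens its own block, so the short name t is unambiguous within its two-line lifespan.

```c
#include <stdint.h>

/* Hypothetical decoder: 'frame' and the two scalings are illustrative only. */
int16_t decode(uint8_t which, const uint8_t *frame)
{
    int16_t result = 0;

    switch (which) {
    case 0: {   /* (N * 0.75) - 40 : fits comfortably in 16 bits */
        int16_t t = (int16_t)frame[0] * 3 / 4 - 40;
        result = t;
        break;
    }
    case 1: {   /* N * 40000 / 1000 : the intermediate needs 32 bits */
        int32_t t = (int32_t)frame[1] * 40000L / 1000;
        result = (int16_t)t;
        break;
    }
    default:
        break;
    }
    return result;
}
```

The braces after each case label are what make the declarations legal in C89/C90 as well, since a declaration may open any block.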



Well that's just my view as a beginner.. maybe with time I will
progressively start to see things the way you do ! ;-)


Also avoid local automatic constant arrays (like Yes in your example)


Yes I don't like them, but since they just that, local, I didn't want
to declare them global, to avoid throwing bloat and confusion among all
the variables declaration which are truly/genuinely global.



That's what static within a function is for (well, one of its uses). 
Being static it exists for the lifetime of the program, being locally 
declared within a function limits its scope to that function.  So 
although it *exists* globally, it is only *visible* locally.
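
A minimal sketch of a function-local static table, assuming a hypothetical helper yes_no that is not part of Vincent's program. Because the arrays have static storage duration, returning a pointer to them is safe; with automatic arrays the same return statement would hand out a pointer to storage that no longer exists after the call.

```c
#include <stdint.h>

/* 'static' gives the arrays program lifetime; declaring them inside the
   function keeps their names invisible everywhere else. */
const char *yes_no(uint8_t flag)
{
    static const char yes[] = "Yes";
    static const char no[]  = "No";
    return flag ? yes : no;   /* safe: static storage outlives the call */
}
```

On an AVR these strings still occupy RAM (they are copied from flash at startup) unless they are also placed in program memory.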


- they must be built on the stack each time the function is called, 
whether they are used or not.


Yes, I was conscious of this but it was trade-off between wasting time
initialising them each time I enter the function.. and degrading code
clarity by defining local variables among truly global ones. Doesn't
make me happy.. like most compromises I suppose ! ;-)



A good compromise makes everyone equally unhappy.

 By making these static, they will have 
local names but be built once at the start of the program.


He ? Wow, so I can get the best of both worlds, just by qualifying them
static ?! Yahoo ! :o)
Thanks David, you just made me and my code that little bit better ! :-)


 If you are short on ram space, you might also want to make them PROGMEM.


Yeah I do do this at times, even though I have ample available RAM..
just because I don't like wasting things, be it MCU resources or money
or anything, it's just a general state of mind ;-)



That's a good attitude in general, but for small strings it's often 
easier to be a little wasteful.  One of the weak points of the AVR is 
that you can't mix pointers to ram and pointers to flash, so using data 
in flash is a bit of a pain.



The "Yes" and "No" strings I accepted to waste RAM on because they are
very short. But other strings in the program, which are 20 characters
long and used several times, I declared global to save memory.



Make sure they are PROGMEM - that's what saves ram space, not the global 
or local declarations.
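
A hedged sketch of what putting such a string in flash can look like with avr-libc, using PROGMEM and strcpy_P from <avr/pgmspace.h>. The message text and the function name load_ready_message are made up for illustration. The #else branch is only a host-side fallback so the sketch compiles off-target; it is an assumption of this example, since a PC has no separate flash address space.

```c
#include <string.h>

#ifdef __AVR__
#include <avr/pgmspace.h>
#else
/* Host fallback for illustration only: no separate flash address space. */
#define PROGMEM
#define strcpy_P(dst, src) strcpy((dst), (src))
#endif

/* Hypothetical 20-character message kept in flash rather than RAM. */
static const char msg_ready[] PROGMEM = "System ready, idle..";

/* Copy the flash string into a caller-supplied RAM buffer.
   The buffer must hold at least sizeof msg_ready bytes. */
void load_ready_message(char *buf)
{
    strcpy_P(buf, msg_ready);
}
```

The explicit copy is the "bit of a pain" mentioned above: an ordinary pointer dereference would read from RAM at the same numeric address, so flash data has to go through the _P functions or pgm_read_byte.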




Thanks for your comment, it made my program better and me a little bit
smarter/more skilled ;-)


--
Vince





[avr-gcc-list] Re: C vs. assembly performance

2009-03-01 Thread David Brown


Georg-Johann Lay wrote:

David Brown schrieb:
variable names tmp16, tmp32 and tmpS16 are truly awful.  It is 
also (usually) best to declare such temporaries in as small a block as 
possible.  Thus they should not be at the start of the function, but 
instead make your cases like this:


{   // (N * 0.75) - 40    DB41    -40 to +150 °C
    int16_t temp = (int16_t)(KLineFrameM1[41]) * 3 / 4 - 40;
    var_format_S16(buff, temp, 0);
    break;
}
That will give clearer code and let the optimiser be more flexible 
in its register choices.


As far as the optimizer of gcc is concerned, that makes no difference. 
It knows exactly what register contains what value and is aware of the 
place where a register dies, i.e. the register can be reused for 
whatever other stuff. Anyway, even if just one temp variable is used, gcc 
will produce a new (pseudo) register for every result like moves, 
arithmetic, etc. These pseudos may or may not end up in the same machine 
register. On that level, blocks are just syntactic sugar (if they are 
not used to hide visibility, e.g. like in int tmp=0; {int tmp = 1;} )




I haven't looked at code generated for such switches (there is often so 
much of it), so I admit to having guessed a little.  I was thinking 
especially of when you have debug information enabled - that can force 
the compiler to keep variables in separate registers.


To get a notion of the various machine independent transformations, have 
a glance at gcc's output with -fdump-tree-all, and for the machine 
dependent ones it is -fdump-rtl-all. They make clear that do-while, while, 
for and if-goto are just flavours of the same sugar.




And here was me thinking the generated source code was sometimes a bit 
big to wade through...  Sometime I must look at this in more detail.


mvh.,

David




RE: [avr-gcc-list] Re: C vs. assembly performance

2009-03-01 Thread Weddington, Eric

 -Original Message-
 From: Bob Paddock [mailto:bob.padd...@gmail.com] 
 Sent: Sunday, March 01, 2009 9:59 AM
 To: Weddington, Eric
 Cc: avr-gcc-list@nongnu.org
 Subject: Re: [avr-gcc-list] Re: C vs. assembly performance

 Would be nice.  There are probably legal issues with doing that.
 MISRA is one of those things you must buy.  There are not Open Source
 versions, as MISRA does not allow the distribution of the 
 rules without
 proper licensing.

I wasn't aware of that. How disgusting.

 MISRA aside I think everyone should invest in their own copy of Lint,
 especially people new to the language.  One of the few tools 
 priced within
 the range of an individual.  There is the Open Source Splint:
 http://www.splint.org/ , however I'm not very familiar with it.

I've looked briefly into building splint and redistributing it with WinAVR. 
IIRC, I had problems with building it. But I still think it's a good idea.

 After a while you get to the point of writing your code where 
 you end up with
 a dialog going on in your head: Lint is going to complain 
 about that, do
 it this way instead.  For example while(1){} per Lint and MISRA 13.7,
 should be for(;;){} to prevent constant value Boolean error.

That is totally stupid.


Re: [avr-gcc-list] Re: C vs. assembly performance

2009-03-01 Thread Georg-Johann Lay


David Brown schrieb:

Georg-Johann Lay wrote:



Source code structure is a concern of the project, not of the compiler.
Even for braindead code that comes from a code generator a compiler is 
supposed to yield good results.


That's true in theory - but embedded programmers are used to the 
difference between theory and practice (there's an interesting 
discussion about the theory and practice of volatile on 
comp.arch.embedded at the moment).  In theory, the compiler should 
generate good code no matter how the source code is structured.  In 
practice, the experienced programmer can do a lot to help the tools. 
avr-gcc *does* do a good job with most code - I do much less 
re-structuring of my source code for avr-gcc than I do for most other 
compilers (I use a lot of compilers for a lot of different targets).


Yes, I agree with you. You can help the tools a lot. At the moment I am 
in the strange and interesting situation just the other way round: Not 
tweaking a compiler to please a hardware, but to give a recommendation 
for a new ISA design to please a compiler (GCC).


I am inspecting the produced asm in some of my AVR projects with hard 
realtime requirements, too. But I would not encourage anyone to dig in 
the generated asm and try to get best code by re-arranging it or 
trying to find other algebraic representations. That takes a lot of 
time, and a compiler should care for the sources it gets, not the 
other way round. And if your code is intended to be cross-platform, 
you are stuck. If your code changes some 100 source lines away from 
the critical code, the inefficient code can return and you have to 
rewrite your code again to find another representation that avoids the 
bad code.
It is certainly true that you want to keep such compiler-helpful 
structuring to a minimum.  But if you are trying to write efficient code 
(rather than emphasising portability or development speed or other 
priorities), you *must* be familiar with your compiler and the types of 
code it generates for particular sequences of input.  You can very 
quickly learn some basic tricks that can make a great difference to the 
generated code with very little re-structuring of the source code.  A 
prime example is to use 8-bit data rather than traditional C int where 
possible.  Another case in point is to prefer explicit if conditionals 
rather than trying to calculate a conditional expression, such as was 
done here (if you are using a heavily pipelined processor, the opposite 
is true).
GCC will more and more transform and canonicalize such stuff, i.e. it 
will turn if-else into flat (with respect to code flow) algebraic code. 
The backend can fix that, but the jumps then are implicit, no more 
explicit. And I was astonished to see examples where int instead of char 
yield better code. The cases are rare, but they exist.


Georg-Johann



Re: [avr-gcc-list] Re: C vs. assembly performance

2009-03-01 Thread Bob Paddock

 Would be nice.  There are probably legal issues with doing that.
 MISRA is one of those things you must buy.  There are not Open Source
 versions, as MISRA does not allow the distribution of the
 rules without proper licensing.

 I wasn't aware of that. How disgusting.

Yes.  Standards are largely a scam to make money for those
that wrote them.  They make sense to a degree, but the cost
can be so exorbitant to an individual you could not afford
them.  UL is great for this.   They make you comply to a standard
like 913, then obsolete it so you must buy, and recertify to
a new edition.  This onerous regulatory burden kills, or prevents,
the small business.

 For example while(1){} per Lint and MISRA 13.7,
 should be for(;;){} to prevent constant value Boolean error.

 That is totally stupid.

Welcome to standards.  The first thing you have to realize when
working with standards, in particular government standards,
is that logic does not apply.

http://www.gimpel.com/html/pub90/msg.txt see #716.

The Gimpel Lint error message list is a free download, and quite educational to read.



-- 
http://www.wearablesmartsensors.com/
http://www.softwaresafety.net/
http://www.designer-iii.com/
http://www.unusualresearch.com/



[avr-gcc-list] Re: C vs. assembly performance

2009-03-01 Thread David Brown


Georg-Johann Lay wrote:

David Brown schrieb:

Georg-Johann Lay wrote:


As far as the optimizer of gcc is concerned, that makes no 
difference.  It knows exactly what register contains what value and 
is aware of the place where a register dies, i.e. the register can 
be reused for whatever other stuff. Anyway, even if just one temp 
variable is used, gcc will produce a new (pseudo) register for every 
result like moves, arithmetic, etc. These pseudos may or may not end 
up in the same machine register. On that level, blocks are just 
syntactic sugar (if they are not used to hide visibility, e.g. like 
in int tmp=0; {int tmp = 1;} )




I haven't looked at code generated for such switches (there is often 
so much of it), so I admit to having guessed a little.  I was thinking 
especially of when you have debug information enabled - that can force 
the compiler to keep variables in separate registers.


Are you really sure? As far as I know gcc produces the same code 


No, I'm not sure in this case (as I said, I haven't checked it).

regardless if optimization is on or not. In fact I would guess that it 
is a policy that the code *must* be the same regardless what debug level 
(if any) or debug format is used, and code being dependent on debug 
level/format is worth a bug report.




That is certainly not true.  Enabling debug information will disable or 
limit some optimisations.  gcc in general is pretty good at optimising 
code even when debugging is enabled (compared to many other compilers), 
but debugging formats are limited and that limits the compiler.  For 
example, most debugging formats are happy with a local variable being 
assigned to a register, but can't describe situations where the 
variable's register moves around.  Even the most sophisticated debugging 
formats can't cope with transforms such as "for (x = 0; x < 10; x++) ..." 
being transformed into "x = 10; while (--x) ...", which will often 
be smaller and faster.
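
A small sketch of the count-up versus count-down idea. The function names are hypothetical, and the loop bodies just count iterations so the two forms can be compared. Note one deliberate choice that is not from the message above: the count-down version uses do ... while (--x) so that the body runs exactly ten times, matching the for loop. The transformation is only valid when the body does not depend on the value of x.

```c
#include <stdint.h>

/* Count-up form: the body runs for x = 0 .. 9. */
uint8_t count_up(void)
{
    uint8_t iterations = 0;
    for (uint8_t x = 0; x < 10; x++) {
        iterations++;
    }
    return iterations;
}

/* Count-down form: the loop test is "did the decrement reach zero",
   which on many targets needs no separate compare instruction. */
uint8_t count_down(void)
{
    uint8_t iterations = 0;
    uint8_t x = 10;
    do {
        iterations++;
    } while (--x);
    return iterations;
}
```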


To get a notion of the various machine independent transformations, 
have a glance at gcc's output with -fdump-tree-all, and for the 
machine dependent ones it is -fdump-rtl-all. They make clear that 
do-while, while, for and if-goto are just flavours of the same sugar.




And here was me thinking the generated source code was sometimes a bit 
big to wade through...  Sometime I must look at this in more detail.


Yes, of course, that example is much too complex. But for small 
examples it is very interesting to track how gcc is transforming and 
kneading and stirring the code again and again beyond recognition.




I was trying with a small example!  But as you say, it is interesting to 
look at this output, and I plan to do so for some code samples when I 
get the chance - thanks for the tip on these flags.





Re: [avr-gcc-list] Re: C vs. assembly performance

2009-03-01 Thread Georg-Johann Lay


David Brown schrieb:

Georg-Johann Lay wrote:
regardless if optimization is on or not. In fact I would guess that it 
is a policy that the code *must* be the same regardless what debug 
level (if any) or debug format is used, and code being dependent on 
debug level/format is worth a bug report.


uaaa, typo devil above. I meant debugging info enabled instead of 
optimization in the first line. With optimization my statement is 
obvious nonsense. blush.


That is certainly not true.  Enabling debug information will disable or 
limit some optimisations.  gcc in general is pretty good at optimising 


Can you make that explicit with an example? With code I mean code that 
will end up in the target, i.e. no code that lives in some .stab* or 
.debug* section.


But I must admit that I don't debug my code and consequently have debug 
info turned off, so I am not familiar with debugging info and maybe 
fundamentally wrong on gcc policies concerning that topic.


Turning on -g3 as a trial in my current AVR project (12k .text) does not 
change the size of .text, and static RAM usage is exactly the same as 
without -g3. The .i, .s, .o and .elf files grow because of the 
debug info, but .text and .data et al. should not change, neither in 
size nor in content.


Georg-Johann




Re: [avr-gcc-list] Re: C vs. assembly performance

2009-03-01 Thread Georg-Johann Lay


David Brown schrieb:

I haven't looked at code generated for such switches (there is often so 
much of it), so I admit to having guessed a little.  I was thinking 
especially of when you have debug information enabled - that can force 
the compiler to keep variables in separate registers.


According to the following mail, not for GCC:

   http://gcc.gnu.org/ml/gcc-help/2009-03/msg00011.html






[avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Nicholas Vinen


 If I wanted fast and small, I'd have done it in ASM. But half of the point
 of this exercise was to get my feet wet with C.
 So, both points accomplished. I have GCC up under AVR studio, a working
 project, and I feel reasonably confident with the code.
   
Despite avr-gcc being fairly dim about optimisations*, the results can
be pretty reasonable. I had to write one program for a Mega48 in
assembly language to get it fast enough. The goal was to linearly
interpolate 16 bit values from a table stored in flash fast enough to
output a 192kHz signal with the processor running at 20MHz. That gives
just over 100 cycles to fetch the values, do the interpolation, output
the value and update the pointers. I couldn't do it in C, even with
inline assembly, but managed it in assembly. Part of the trick was
storing everything in registers, which JUST fit for two channels.
However, the C compiler was able to do it in something like 160-180
cycles. So I wouldn't discount C performance being quite acceptable if
you are careful.
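
For reference, 20 MHz / 192 kHz comes to roughly 104 cycles per sample. The interpolation step itself might look like the following sketch. This is not the code from the project described above: the function name lerp16 and the 8-bit fraction are assumptions made purely for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: interpolate between two adjacent 16-bit table
 * entries a and b. frac is the position between them in 1/256 steps
 * (0 returns a, 255 returns almost b). */
uint16_t lerp16(uint16_t a, uint16_t b, uint8_t frac)
{
    /* Use a signed 32-bit intermediate: (b - a) may be negative and the
     * product with frac does not fit in 16 bits. */
    int32_t diff = (int32_t)b - (int32_t)a;
    return (uint16_t)((int32_t)a + (diff * frac) / 256);
}
```

On an 8-bit AVR the 32-bit multiply and divide are exactly the kind of operations that eat into such a cycle budget, which is why a hand-tuned version may be needed at this sample rate.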

* On the other hand, it would be great if avr-gcc could perform some
basic optimisations that even a fairly inexperienced amateur could
manage. For example, things like unsigned char x, y; x = y >> 4 could
use the nibble swap instruction rather than four shifts, and things like
unsigned short a; unsigned char b; b = a >> 8 could pretty much just be
a single instruction. These are examples of things I've seen the
compiler waste cycles on, that are fairly obvious. In future if I see
the compiler doing silly things like this, is it worth me posting the
code & assembly output to this list?



Nicholas




Re: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Parthasaradhi Nayani



 From: Nicholas Vinen h...@x256.org

 For example, things like unsigned char x, y;
 x = y >> 4 could
 use the nibble swap instruction rather than four shifts,
 and things like

Shifting a byte or int right or left must push in 00s from the other side so 
swapping a nibble is not the right thing to do. So is the case with other 
examples. Correct me if I am wrong.

Nayani


  



Re: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Georg-Johann Lay


Nicholas Vinen schrieb:

* On the other hand, it would be great if avr-gcc could perform some
basic optimisations that even a fairly inexperienced amateur could
manage. For example, things like unsigned char x, y; x = y >> 4 could
use the nibble swap instruction rather than four shifts, and things like


The compiler uses swap if it can. Maybe there are some effects of 
implicit type conversion like

   unsigned char x, y, z; z = (x+y) >> 4;
which is quite different from
   z = x >> 4;
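
The difference comes from C's integer promotion rules: x + y is computed as an int, so a carry out of bit 7 is still present when the shift happens. A small sketch of the effect (the function names are made up for illustration):

```c
#include <stdint.h>

/* The sum is computed as int (integer promotion), so a carry out of
 * bit 7 survives and is shifted down into the result. */
uint8_t shift_sum_promoted(uint8_t x, uint8_t y)
{
    return (uint8_t)((x + y) >> 4);
}

/* Casting the sum back to 8 bits first discards the carry, which
 * leaves a plain byte shift. */
uint8_t shift_sum_truncated(uint8_t x, uint8_t y)
{
    return (uint8_t)((uint8_t)(x + y) >> 4);
}
```

With x = 0xF0 and y = 0x20 the first function returns 0x11 and the second 0x01, so the compiler cannot treat the first form as a simple 8-bit shift.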


unsigned short a; unsigned char b; b = a >> 8 could pretty much just be
a single instruction. These are examples of things I've seen the
compiler waste cycles on, that are fairly obvious. In future if I see


compiling the following code

unsigned char sh4 (unsigned char x)
{
    return x >> 4;
}

unsigned char sh8 (unsigned short x)
{
    return x >> 8;
}

with avr-gcc 4.3.2 and -Os yields (non-code stripped)

sh4:
swap r24 ;
andi r24,lo8(15) ; ,
ret

sh8:
mov r24,r25  ; , x
ret


the compiler doing silly things like this, is it worth me posting the
code & assembly output to this list?

If you are sure it is really some performance issue/regression and not 
due to some language standard implication, you can add a report to

http://sourceforge.net/tracker/?group_id=68108

so that the subject won't be forgotten. Also mind
http://gcc.gnu.org/bugs.html

And of course, you can ask questions here. In that case it is helpful if 
you can manage to simplify the source to a small piece of code that 
triggers the problem and allows others to reproduce it (i.e. no 
#include in the code, no ... (except for varargs), and so on).


Snippets of .s may point to the problem when you add -dp -fverbose-asm

And there are lots of places where avr-gcc produces suboptimal or even 
bad code, so feedback is welcome.


But note that just a few guys are working on the AVR part of gcc.
I would do more if I had the time (and the support of some gurus to ask 
questions on internals now and then...)




Nicholas


Georg-Johann



Re: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Nicholas Vinen

Parthasaradhi Nayani wrote:
   
 From: Nicholas Vinen h...@x256.org
 

  For example, things like unsigned char x, y;
   
 x = y >> 4 could
 use the nibble swap instruction rather than four shifts,
 and things like
 

 Shifting a byte or int right or left must push in 00s from the other side so 
 swapping a nibble is not the right thing to do. So is the case with other 
 examples. Correct me if I am wrong.

 Nayani

   
Yes, it has to blank the top 4 bits, but I believe it's still faster to
swap the nibble and do that than four shifts. Something like:

SWAP r16
LDI r17, 0x0F
AND r16, r17

This is three instructions and three cycles, as opposed to:

LSR r16
LSR r16
LSR r16
LSR r16


which is four instructions and cycles. The former requires a spare
register but that generally isn't a problem.
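
To see why swap-and-mask gives the same result as four right shifts, the two steps can be modelled in plain C. This is only a host-side model of what the instructions do, with function names invented for the sketch:

```c
#include <stdint.h>

/* Model of the AVR SWAP instruction: exchange the high and low nibble. */
uint8_t swap_nibbles(uint8_t x)
{
    return (uint8_t)((x << 4) | (x >> 4));
}

/* SWAP followed by masking with 0x0F: the old high nibble ends up in
 * the low nibble and the top four bits are cleared, exactly like
 * x >> 4 on an unsigned byte. */
uint8_t shift4_via_swap(uint8_t x)
{
    return swap_nibbles(x) & 0x0F;
}
```

The mask is what supplies the zeros that a logical right shift must push in from the left.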

This is just an example. I didn't note them down at the time but I saw
the compiler doing a lot of things the long way when there was a
simple, faster, smaller way to do it. The case of accessing some of the
bytes in a larger type via shifting was particularly annoying. Perhaps a
union would have solved that, but it seems silly to have to resort to
doing it that way.

Now that I've signed up to this list, if and when I come across avr-gcc
missing obvious optimisations I'll report them.



Nicholas


Re: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Nicholas Vinen

Georg-Johann Lay wrote:
 compiling the following code
 unsigned char sh4 (unsigned char x)
 {
     return x >> 4;
 }

 unsigned char sh8 (unsigned short x)
 {
     return x >> 8;
 }

 with avr-gcc 4.3.2 and -Os yields (non-code stripped)

 sh4:
 swap r24 ;
 andi r24,lo8(15) ; ,
 ret

 sh8:
 mov r24,r25 ; , x
 ret
Interesting. It may be that either I was using an earlier version which
missed these optimisations, or else it was because my code was much more
complex and the optimiser therefore missed them. I suppose I can go back
and find the old code, compile it, and see what comes out now. I forgot
about andi, that makes it an even better optimisation, half as many
cycles and instructions.

 the compiler doing silly things like this, is it worth me posting the
 code & assembly output to this list?
 If you are sure it is really some performance issue/regression and not
 due to some language standard implication, you can add a report to
 http://sourceforge.net/tracker/?group_id=68108

 so that the subject won't be forgotten. Also mind
 http://gcc.gnu.org/bugs.html

 And of course, you can ask questions here. In that case it is helpful
 if you can manage to simplify the source to a small piece of code that
 triggers the problem and allows others to reproduce the problem. (i.e.
 no #include in the code, no ... (except for varargs), a.s.o).

 Snippets of .s may point to the problem when you add -dp -fverbose-asm

 And there are lots of places where avr-gcc produces suboptimal or even
 bad code, so feedback is welcome.

 But note that just a few guys are working on the AVR part of gcc.
 I would do more if I had the time (and the support of some gurus to
 ask questions on internals then and when...)
Yeah, this is one reason I haven't complained loudly in the past,
avr-gcc is already pretty good and I didn't want to apply a lot of
pressure to fix every little missed optimisation. However, it sure would
be nice. I'll see if I can dig up some of my old code now, before I
rewrote it in assembly. If it's still doing things the slow way I'll
point it out at the places you mention.

Thanks!


Nicholas




Re: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Nicholas Vinen

Georg-Johann Lay wrote:
 If you are sure it is really some performance issue/regression and not
 due to some language standard implication, you can add a report to
 http://sourceforge.net/tracker/?group_id=68108

 so that the subject won't be forgotten. Also mind
 http://gcc.gnu.org/bugs.html

 And of course, you can ask questions here. In that case it is helpful
 if you can manage to simplify the source to a small piece of code that
 triggers the problem and allows others to reproduce the problem. (i.e.
 no #include in the code, no ... (except for varargs), a.s.o).

 Snippets of .s may point to the problem when you add -dp -fverbose-asm

 And there are lots of places where avr-gcc produces suboptimal or even
 bad code, so feedback is welcome.

 But note that just a few guys are working on the AVR part of gcc.
 I would do more if I had the time (and the support of some gurus to
 ask questions on internals then and when...)
OK, I only spent a few minutes looking at old code and I found some
obviously sub-optimal results. It distills down to this:

#include <avr/io.h>

int main(void) {
  unsigned long packet = 0;

  while(1) {
    if( !(PINC & _BV(PC2)) ) {
      packet = (packet<<1)|(((unsigned char)PINC>>1)&1);
    }
    PORTB = packet;
  }
}


avr-gcc is: avr-gcc (Gentoo 4.3.3 p1.0, pie-10.1.5) 4.3.3

The avr/io stuff is just so that it won't optimise the code away to nothing.

I tried compiling it with both -Os and -O2:

avr-gcc -g -dp -fverbose-asm -Os -S -mmcu=atmega48 -o test test.c

The result includes this:

lsl r18          ;  packet          ;  50  *ashlsi3_const/2  [length = 4]
rol r19          ;  packet
rol r20          ;  packet
rol r21          ;  packet
in r24,38-0x20   ;  D.1214,         ;  16  *movqi/4          [length = 1]
lsr r24          ;  D.1214          ;  17  lshrqi3/3         [length = 1]
ldi r25,lo8(0)   ; ,                ;  48  *movqi/2          [length = 1]
ldi r26,lo8(0)   ; ,                ;  46  *movhi/4          [length = 2]
ldi r27,hi8(0)   ; ,
andi r24,lo8(1)  ;  tmp52,          ;  19  andsi3/2          [length = 4]
andi r25,hi8(1)  ;  tmp52,
andi r26,hlo8(1) ;  tmp52,
andi r27,hhi8(1) ;  tmp52,
or r18,r24       ;  packet, tmp52   ;  20  iorsi3/1          [length = 4]
or r19,r25       ;  packet, tmp52
or r20,r26       ;  packet, tmp52
or r21,r27       ;  packet, tmp52



The problem, it seems, is that the compiler doesn't realize that the
right hand side of the expression can only have any non-zero values in
the bottom 8 bits, since it's an unsigned char which is being implicitly
expanded to 32 bits for the or operation. In fact, it's only the bottom
bit that's ever non-zero. As a result it's spending a number of cycles
and registers doing useless things. I'll copy a report to the locations
you mention in your e-mail.


There are probably ways to work around this, such as making packet a
union of an unsigned char and a long, then shifting the long and only
ORing in the unsigned char. I'll note that there's also an optimization
to be had with the right hand side of the expression. I would write the
assembly something like this:

lsl r18
rol r19
rol r20
rol r21
in r24,38-0x20
bst r24, 1
bld r18, 0
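
The union workaround mentioned above could look roughly like the sketch below. It assumes a little-endian target (which AVR is), so that the byte member aliases the least significant byte of the 32-bit value; the type and function names are hypothetical, and the pin value is passed as a parameter instead of being read from PINC.

```c
#include <stdint.h>

/* Assumption: little-endian layout, so 'low' overlays the least
 * significant byte of 'whole'. */
typedef union {
    uint32_t whole;
    uint8_t  low;
} packet_t;

/* Shift the 32-bit packet left by one and OR bit 1 of 'pin' into
 * bit 0, touching only the low byte for the OR. */
uint32_t shift_in_bit(uint32_t packet, uint8_t pin)
{
    packet_t p;
    p.whole = packet << 1;
    p.low |= (pin >> 1) & 1;
    return p.whole;
}
```

Whether this actually produces better code depends on the compiler version, so it is worth checking the generated assembly before relying on it.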


I'm sure I can find other examples of poor code generation in this
particular file, since I remember coming across many cases where I
replaced the generated code with inline assembly when I was originally
working on it, but that will have to wait for later.


Thanks for your help, I appreciate it. As I said avr-gcc is pretty good,
but I would love it if it could get even better :)



Nicholas


RE: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Weddington, Eric

 -Original Message-
 From: 
 avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.org 
 [mailto:avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.
 org] On Behalf Of Nicholas Vinen
 Sent: Saturday, February 28, 2009 6:21 AM
 To: partha_nay...@yahoo.com
 Cc: avr-gcc-list@nongnu.org
 Subject: Re: [avr-gcc-list] Re: C vs. assembly performance

 Now that I've signed up to this list, if and when I come 
 across avr-gcc missing obvious optimisations I'll report them.

We certainly appreciate bug reports. However, before you report them, please 
make sure that they haven't been reported already. An AVR toolchain bug list is 
kept here:

http://www.nongnu.org/avr-libc/bugs.html

There are already a number of missed optimization gcc bugs reported on that 
list. Some have even been fixed already, though they haven't been released 
through a toolchain distribution.


Re: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Georg-Johann Lay


Nicholas Vinen schrieb:

Georg-Johann Lay wrote:


If you are sure it is really some performance issue/regression and not
due to some language standard implication, you can add a report to
   http://sourceforge.net/tracker/?group_id=68108

so that the subject won't be forgotten. Also mind
   http://gcc.gnu.org/bugs.html

And of course, you can ask questions here. In that case it is helpful
if you can manage to simplify the source to a small piece of code that
triggers the problem and allows others to reproduce the problem. (i.e.
no #include in the code, no ... (except for varargs), a.s.o).

Snippets of .s may point to the problem when you add -dp -fverbose-asm

And there are lots of places where avr-gcc produces suboptimal or even
bad code, so feedback is welcome.

But note that just a few guys are working on the AVR part of gcc.
I would do more if I had the time (and the support of some gurus to
ask questions on internals then and when...)


OK, I only spent a few minutes looking at old code and I found some
obviously sub-optimal results. It distills down to this:

#include <avr/io.h>

int main(void) {
  unsigned long packet = 0;

  while(1) {
    if( !(PINC & _BV(PC2)) ) {
      packet = (packet<<1)|(((unsigned char)PINC>>1)&1);
    }
    PORTB = packet;
  }
}

avr-gcc is: avr-gcc (Gentoo 4.3.3 p1.0, pie-10.1.5) 4.3.3

The avr/io stuff is just so that it won't optimise the code away to nothing.



Please avoid the #include stuff. You can use source like this:

#define PINC  (*((unsigned char volatile*) 0x20))
#define PORTB (*((unsigned char volatile*) 0x21))

void foo ()
{
    unsigned long packet = 0;

    while(1)
    {
        if (!(PINC & (1 << 2)))
        {
            packet = (packet<<1)|(((unsigned char)PINC>>1)&1);
        }
        PORTB = packet;
    }
}


I tried compiling it with both -Os and -O2:

avr-gcc -g -dp -fverbose-asm -Os -S -mmcu=atmega48 -o test test.c

The result includes this:

lsl r18          ;  packet          ;  50  *ashlsi3_const/2  [length = 4]
rol r19          ;  packet
rol r20          ;  packet
rol r21          ;  packet
in r24,38-0x20   ;  D.1214,         ;  16  *movqi/4          [length = 1]
lsr r24          ;  D.1214          ;  17  lshrqi3/3         [length = 1]
ldi r25,lo8(0)   ; ,                ;  48  *movqi/2          [length = 1]
ldi r26,lo8(0)   ; ,                ;  46  *movhi/4          [length = 2]
ldi r27,hi8(0)   ; ,
andi r24,lo8(1)  ;  tmp52,          ;  19  andsi3/2          [length = 4]
andi r25,hi8(1)  ;  tmp52,
andi r26,hlo8(1) ;  tmp52,
andi r27,hhi8(1) ;  tmp52,
or r18,r24       ;  packet, tmp52   ;  20  iorsi3/1          [length = 4]
or r19,r25       ;  packet, tmp52
or r20,r26       ;  packet, tmp52
or r21,r27       ;  packet, tmp52

The problem, it seems, is that the compiler doesn't realize that the
right hand side of the expression can only have any non-zero values in
the bottom 8 bits, since it's an unsigned char which is being implicitly
expanded to 32 bits for the or operation. In fact, it's only the bottom
bit that's ever non-zero. As a result it's spending a number of cycles
and registers doing useless things. I'll copy a report to the locations
you mention in your e-mail.


Note that this may partially be covered by report 145284 (which I cannot 
find, maybe Eric has closed/removed it)


I already filed a patch for that in
   http://lists.gnu.org/archive/html/avr-gcc-list/2008-12/msg00019.html
that covers your issue to some extent or maybe almost completely: The new 
pattern *iorMODE2_MODEbit0 would match some parts of your code.



There are probably ways to work around this, such as making packet a
union of an unsigned char and a long, then shifting the long and only
ORing in the unsigned char. I'll note that there's also an optimization
to be had with the right hand side of the expression. I would write the
assembly something like this:


lsl r18
rol r19
rol r20
rol r21
in r24,38-0x20
bst r24, 1
bld r18, 0


The result of the above patch should lead to something like

lsl r18
rol r19
rol r20
rol r21
in r24,38-0x20
bst r24, 1
sbrs r18, 0
bld r18, 0

The SBRS is necessary, because the pattern is not aware of the fact that 
r18.0 is 0. Maybe the optimization is even better (or weaker); I am not 
using that patch at the moment and can just estimate its effect when 
peeking into rtl dumps of an unpatched gcc.


Concerning the patch itself, I don't know anything about its fate and if 
it will ever make its way into gcc because of administrative obstacles 
and the technique I used.


I don't like the technique I am using because it leads to complex 
patterns that are hard to understand and test and will become useless if 
the middleend decides to represent the stuff in a slightly different way...


Georg-Johann




RE: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Weddington, Eric

 

 -Original Message-
 From: 
 avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.org 
 [mailto:avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.
 org] On Behalf Of Georg-Johann Lay
 Sent: Saturday, February 28, 2009 9:00 AM
 To: Nicholas Vinen
 Cc: avr-gcc-list@nongnu.org
 Subject: Re: [avr-gcc-list] Re: C vs. assembly performance
 
 
 Note that this may partially be covered by report 145284 
 (which I cannot 
 find, maybe Eric has closed/removed it)
 

I'm sorry, but on which project?



Re: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Georg-Johann Lay


Weddington, Eric schrieb:
 




-Original Message-
From: 
avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.org 
[mailto:avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.

org] On Behalf Of Georg-Johann Lay
Sent: Saturday, February 28, 2009 9:00 AM
To: Nicholas Vinen
Cc: avr-gcc-list@nongnu.org
Subject: Re: [avr-gcc-list] Re: C vs. assembly performance


Note that this may partially be covered by report 145284 
(which I cannot 
find, maybe Eric has closed/removed it)


Sorry, my mistake. I wrote the patch because I saw gcc 4 making bad code 
(compared with 3.4.6) and read some bad-optimization reports that 
address the subject on similar code.


It was not initiated by a specific report. However, several bug reports 
on performance regression will be fixed/get some relief by that patch.


 I'm sorry, but on which project?

BTW, is there a reason why there is more than one bug list? I saw a 
third, but cannot find it again (maybe just a view on the sourceforge list).


http://www.nongnu.org/avr-libc/bugs.html
http://sourceforge.net/tracker2/?group_id=68108&atid=520074

Georg-Johann




RE: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Weddington, Eric

 -Original Message-
 From: Georg-Johann Lay [mailto:a...@gjlay.de] 
 Sent: Saturday, February 28, 2009 10:09 AM
 To: Weddington, Eric
 Cc: Nicholas Vinen; avr-gcc-list@nongnu.org
 Subject: Re: [avr-gcc-list] Re: C vs. assembly performance

 Weddington, Eric schrieb:

 BTW, is there a reason why there is more than one bug list?

Yes, because there is more than one project. ;-)

GNU Binutils
GCC
Avr-libc

Each of those projects are separate and they each have their own bug list.

WinAVR has its own bug list for 2 reasons:
- There may be bugs to the installation itself, which has nothing to do with 
the underlying projects
- It is used as a catch-all starting point for users who are not used to 
filing bug reports with open source projects. It is easier to point them there 
first, then the bugs can be analysed and reported to upstream projects.

Although, I admit that I haven't gone through the WinAVR bug list with the 
intent to move bugs upstream in some time. I'm planning on doing that after the 
WinAVR release, to clean up a bit.

I keep track of the AVR specific bugs in binutils, gcc, gdb here:
http://www.nongnu.org/avr-libc/bugs.html
And that page has links to the other projects' bug lists.

Eric


[avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread David Brown


Nicholas Vinen wrote:
OK, I only spent a few minutes looking at old code and I found some 
obviously sub-optimal results. It distills down to this:


#include <avr/io.h>

int main(void) {
  unsigned long packet = 0;

  while(1) {
    if( !(PINC & _BV(PC2)) ) {
      packet = (packet<<1)|(((unsigned char)PINC>>1)&1);
    }
    PORTB = packet;
  }
}



Did you write the code like this just to test the optimiser?  It 
certainly gives it more of a challenge than most code, since it contains 
32-bit data (the compiler writers will place more emphasis on getting 
good code for far more common 8-bit and 16-bit data), and the compiler 
must combat the C rules for integer promotion to generate ideal code.


Try re-writing your code like this (which I think is clearer anyway):

int main(void) {
  unsigned long packet = 0;

  while (1) {
    if (!(PINC & _BV(PC2))) {
      packet <<= 1;
      if (PINC & 0x02) {
        packet |= 0x01;
      }
    }
    PORTB = packet;
  }
}

This generates:

main:
/* prologue: frame size=0 */
/* prologue end (size=0) */
 0032 80E0  ldi r24,lo8(0)   ;  packet,
 0034 90E0  ldi r25,hi8(0)   ;  packet,
 0036 A0E0  ldi r26,hlo8(0)  ;  packet,
 0038 B0E0  ldi r27,hhi8(0)  ;  packet,
.L7:
 003a 9A99  sbic 51-0x20,2   ; ,
 003c 00C0  rjmp .L8         ;
 003e 880F  lsl r24          ;  packet
 0040 991F  rol r25          ;  packet
 0042 AA1F  rol r26          ;  packet
 0044 BB1F  rol r27          ;  packet
 0046       sbic 51-0x20,1   ; ,
 0048 8160  ori r24,lo8(1)   ;  packet,
.L8:
 004a 88BB  out 56-0x20,r24  ; , packet
 004c 00C0  rjmp .L7         ;

You may note that this code is in fact one instruction and one cycle 
shorter than your hand-written assembly...
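
The two source forms compute the same value for a given pin reading, which can be checked on a host with a sketch like this one. The function names are invented here, and the port value is passed as a parameter rather than read from the volatile PINC register:

```c
#include <stdint.h>

/* Single-expression form: the byte is promoted and ORed into the full
 * 32-bit value. */
uint32_t step_expression(uint32_t packet, uint8_t pin)
{
    return (packet << 1) | (((unsigned char)pin >> 1) & 1);
}

/* Branching form: shift, then conditionally set bit 0. This maps
 * naturally onto a skip-if-bit instruction plus a one-byte OR. */
uint32_t step_branch(uint32_t packet, uint8_t pin)
{
    packet <<= 1;
    if (pin & 0x02) {
        packet |= 0x01;
    }
    return packet;
}
```

One difference to keep in mind on real hardware: both versions in this thread read PINC twice, so the pin may change between the test and the shift-in.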



I'm not disputing the fact that avr-gcc's optimiser does not always 
generate optimal code.  And there are certainly types of code which 
can be written smaller and faster in assembly than using any realistic 
compiler, simply because you can use techniques that are virtually 
impossible in C or which would require a totally different way of 
compiling code (using dedicated registers is a prime example).


However, avr-gcc constantly surprises me in the quality of its code 
generation - it really is very good, and it has got steadily better 
through the years.  Sometimes it pays to think a bit about the way your 
source code is structured, and maybe test out different arrangements.


mvh.,

David




Re: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Georg-Johann Lay


David Brown schrieb:

Nicholas Vinen wrote:

OK, I only spent a few minutes looking at old code and I found some 
obviously sub-optimal results. It distills down to this:


#include <avr/io.h>

int main(void) {
  unsigned long packet = 0;

  while(1) {
    if( !(PINC & _BV(PC2)) ) {
      packet = (packet<<1)|(((unsigned char)PINC>>1)&1);
    }
    PORTB = packet;
  }
}


Did you write the code like this just to test the optimiser?  It 


As far as I understand, it's a stripped down example to demonstrate the 
code bloat in a reproducible way (compilable source).


However, avr-gcc constantly surprises me in the quality of its code 
generation - it really is very good, and it has got steadily better 
through the years.  Sometimes it pays to think a bit about the way your 
source code is structured, and maybe test out different arrangements.


Source code structure is a concern of the project, not of the compiler.
Even for braindead code that comes from a code generator a compiler is 
supposed to yield good results.


I am inspecting the produced asm in some of my AVR projects with hard 
realtime requirements, too. But I would not encourage anyone to dig in 
the generated asm and try to get best code by re-arranging it or trying 
to find other algebraic representations. That takes a lot of time, and a 
compiler should care for the sources it gets, not the other way round. 
And if your code is intended to be cross-platform, you are stuck. If 
your code changes some 100 source lines away from the critical code, the 
inefficient code can return and you have to rewrite your code again to 
find another representation that avoids the bad code.


Georg-Johann



RE: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Weddington, Eric

 -Original Message-
 From: 
 avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.org 
 [mailto:avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.
 org] On Behalf Of Bob Paddock
 Sent: Saturday, February 28, 2009 6:40 PM
 To: avr-gcc-list@nongnu.org
 Subject: Re: [avr-gcc-list] Re: C vs. assembly performance

   In practice, the experienced programmer can do a lot to help
  the tools. avr-gcc *does* do a good job with most code - I do
  much less re-structuring of my source code for avr-gcc than I do
  for most other compilers (I use a lot of compilers for a 
 lot of different targets).

 Something I always found amusing/depressing is that some
 compilers generate smaller code for ++i than i++ everything
 being equal.  Then other compilers generate smaller code for i++ than
 ++i.  So in the embedded space you have to know what your tools are
 doing. Sadly it should not be this way.

 avr-gcc generates the same size code in any case that I've looked at.

Along the same lines of "you should know what your compiler generates" is the 
use of switch statements. They can be implemented (in the code generation) in 
many different ways, and the choice is based on heuristics. These heuristics 
are not always tuned the best way for the target.

So in application code I tend to avoid switch statements for embedded systems, 
unless I'm writing throw-away code or the application is trivial.


Re: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Vincent Trouilliez

On Sat, 28 Feb 2009 19:09:13 -0700
Weddington, Eric ewedding...@cso.atmel.com wrote:

 So in application code I tend to avoid switch statements for embedded 
 systems, unless I'm writing throw-away code or the application is trivial.

Oh no ! ;-)
I have only recently got round to using switch statements, to improve
code legibility. In my current/first embedded project, I happen to have
a very long (25 cases, 160 lines long) switch statement.. I dread to
think what it would look like if I had to replace it (what else with ?) with
nested if's !
How readable would that be... not to mention that with indentation, 25
levels of nesting would mean the last case would be 3 meters on the far
right... ;-)

Any coding tips to make all this look about readable by human
beings ?! ;-/

--
Vince, catastrophed...



RE: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Weddington, Eric

 -Original Message-
 From: 
 avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.org 
 [mailto:avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.
 org] On Behalf Of Vincent Trouilliez
 Sent: Saturday, February 28, 2009 7:20 PM
 To: avr-gcc-list@nongnu.org
 Subject: Re: [avr-gcc-list] Re: C vs. assembly performance

 On Sat, 28 Feb 2009 19:09:13 -0700
 Weddington, Eric ewedding...@cso.atmel.com wrote:

  So in application code I tend to avoid switch statements 
 for embedded systems, unless I'm writing throw-away code or 
 the application is trivial.

 Oh no ! ;-)
 I have only recently got round to using switch statements, to improve
 code legibility. In my current/first embedded project, I 
 happen to have
 a very long (25 cases, 160 lines long) switch statement.. I dread to
 think what it would like if I had to replace it (what else 
 with ?) with
 nested if's !
 How readable would that be... not to mention that with indentation, 25
 levels of nesting would mean the last case would be 3 meters 
 on the far
 right... ;-)

 Any coding tips to make all this look about readable by human
 beings ?! ;-/

You wouldn't need *nested* ifs, but an if-else-if structure, or better yet, a 
table of function pointers, also known as a dispatch table. Each method depends 
on the type of data that you're switching on.
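
A dispatch table for a small, contiguous range of selector values might be sketched as follows. Everything here is hypothetical: the function names, the three entries and the conversion formulas are placeholders, not taken from any real project.

```c
#include <stddef.h>
#include <stdint.h>

/* Each handler converts one raw byte into a displayable value. */
typedef int16_t (*convert_fn)(uint8_t raw);

int16_t convert_temperature(uint8_t raw) { return (int16_t)raw * 3 / 4 - 40; }
int16_t convert_pressure(uint8_t raw)    { return (int16_t)raw - 28; }
int16_t convert_identity(uint8_t raw)    { return raw; }

/* The selector value is the index into the table. */
static const convert_fn dispatch[] = {
    convert_temperature,   /* selector 0 */
    convert_pressure,      /* selector 1 */
    convert_identity,      /* selector 2 */
};

/* Returns 0 on success, -1 if var_num has no table entry. */
int convert(uint8_t var_num, uint8_t raw, int16_t *out)
{
    if (var_num >= sizeof dispatch / sizeof dispatch[0]) {
        return -1;
    }
    *out = dispatch[var_num](raw);
    return 0;
}
```

The range check takes over the role of the switch's default case. On AVR such a table would normally be placed in flash rather than RAM, which needs target-specific attributes that are left out of this sketch.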


Re: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Vincent Trouilliez

On Sat, 28 Feb 2009 19:24:38 -0700
Weddington, Eric ewedding...@cso.atmel.com wrote:
 You wouldn't need *nested* ifs, but an if-else-if structure, or better yet, a 
 table of function pointers, also known as a dispatch table. Each method 
 depends on the type of data that you're switching on.

I switch on an unsigned byte, with contiguous values (0-24).
A function table sounds elegant to my ear, but it would mean 25
functions ! In my case the work is done in-situ, within the switch
statement, as it takes only 2 or 3 statements to process a given
case. Using functions would be both overkill and overwhelming to
manage I think !! ;-)

I pasted my switch statement below for the curious.


--
Vince



void var_format(char *buff, uint8_t var_num)
{
    uint16_t tmp16;
    uint32_t tmp32;
    int16_t tmpS16;

    char Yes[] = "Yes";
    char No[] = "No";
    char Unknown[] = "-\?\?\?-";

    switch (var_num)
    {
        case ECU_COOLANT:
        {   // (N * 0.75) - 40   DB41   -40 to +150 °C
            tmpS16 = (int16_t)(KLineFrameM1[41]) * 3 / 4 - 40;
            var_format_S16(buff, tmpS16, 0);
            break;
        }
        case ECU_ENGINE_SPEED:
        {   // MSB:LSB   DB12:13   0 to    RPM
            ATOMIC_BLOCK(ATOMIC_FORCEON)
            {
                tmp16 = ((uint16_t)KLineFrameM1[12] << 8) +
                        (uint16_t)KLineFrameM1[13];
            }
            var_format_S16(buff, (int16_t)tmp16, 0);
            break;
        }
        case ECU_ROAD_SPEED:
        {   // DB14   uint8_t MPH
            var_format_byte(buff, KLineFrameM1[8]);
            break;
        }
        case ECU_BARO_AIR_PRESSURE:
        {   // ((N-128)/100)+1   DB24   -0.28 to +2.27 Bar
            tmpS16 = (int16_t)(KLineFrameM1[24]) - 28;
            var_format_S16(buff, tmpS16, 2);
            break;
        }
        case ECU_MAP_PRESSURE:
        {   // ((N-130)/100)+1   DB25   -0.30 to +2.25 Bar
            tmpS16 = (int16_t)(KLineFrameM1[25]) - 30;
            var_format_S16(buff, tmpS16, 2);
            break;
        }
        case ECU_MAT_TEMP:
        {   // (N * 0.75) - 40   DB42   -40 to +150 °C
            tmpS16 = (int16_t)(KLineFrameM1[42]) * 3 / 4 - 40;
            var_format_S16(buff, tmpS16, 0);
            break;
        }
        case ECU_THROTTLE_POSITION:
        {   // N / 2.55   DB27   0 to 100 %
            tmp16 = (uint16_t)KLineFrameM1[27] * 100 / 255;
            var_format_byte(buff, (uint8_t)tmp16);
            break;
        }
        case ECU_ENGINE_LOAD:
        {   // DB36   0 - 100 %
            var_format_byte(buff, KLineFrameM1[36]);
            break;
        }
        case ECU_KNOCK_COUNT:
        {   // DB43   uint8_t
            var_format_byte(buff, KLineFrameM1[43]);
            break;
        }
        case ECU_KNOCK_RETARD:
        {   // (N * 45) / 255   DB44   0 to 45 Deg
            tmp16 = (uint16_t)KLineFrameM1[44] * 90 / 51;
            var_format_S16(buff, (int16_t)tmp16, 1);
            break;
        }
        case ECU_SPARK_ADVANCE:
        {   // (N * 9000)/256   MSB:LSB   DB39:40   0.00 Degrees
            ATOMIC_BLOCK(ATOMIC_FORCEON)
            {
                tmp16 = ((uint16_t)KLineFrameM1[39] << 8) +
                        (uint16_t)KLineFrameM1[40];
            }
            tmp32 = (uint32_t)tmp16 * 9000 / 256;
            var_format_S16(buff, (int16_t)tmp32, 2);
            break;
        }
        case ECU_BOOST_DC:
        {   // DB31   uint8_t or %, don't know
            var_format_byte(buff, KLineFrameM1[31]);
            break;
        }
        case ECU_MAIN_INJ_DC:
        {   // DB45   uint8_t
            var_format_byte(buff, KLineFrameM1[45]);
            break;
        }
        case ECU_SECONDARY_INJ_DC:
        {   // DB37

RE: [avr-gcc-list] Re: C vs. assembly performance

2009-02-28 Thread Weddington, Eric

 -Original Message-
 From: 
 avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.org 
 [mailto:avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.
 org] On Behalf Of Vincent Trouilliez
 Sent: Saturday, February 28, 2009 7:41 PM
 To: avr-gcc-list@nongnu.org
 Subject: Re: [avr-gcc-list] Re: C vs. assembly performance

 I switch on an unsigned byte, with contiguous values (0-24).
 A function table sounds elegant to my ear, but it would mean 25
 functions! In my case the work is done in situ, within the switch
 statement, as it takes only 2 or 3 statements to process a given
 case. Using functions would be both overkill and overwhelming to
 manage, I think!! ;-)

It is no more overkill or overwhelming than dealing with a single contiguous 
switch statement with 25 cases. I think it's just a matter of perspective. The 
one good thing is that each function encapsulates a single idea and nothing 
else, which *may* make maintenance easier. Implementing it this way 
separates the idea of 'making some choice' from the 'implementation of 
a single choice', rather than conflating the two in a single massive 
switch statement.

Yes, your switch sounds like a candidate for a function table. However, like 
everything in engineering, there are trade-offs. The trade-off here is the 
call overhead for each function, plus the slight overhead of checking the range 
of the value used to index into the table. To be fair, you would have to write 
it both ways and see which produces the smaller code.
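
A minimal sketch of the function-table approach described above, written as portable C rather than AVR-specific code. The names frame, format_fn, format_table, var_format_dispatch and the VAR_* constants are hypothetical stand-ins, not from the original post, and sprintf replaces the poster's var_format_S16/var_format_byte helpers, whose definitions are not shown. Only three of the 25 cases are included.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the K-line frame buffer in the original post. */
static uint8_t frame[64];

/* All handlers share one signature so they can sit in a single table. */
typedef void (*format_fn)(char *buff);

static void fmt_coolant(char *buff)
{
    /* (N * 0.75) - 40, same arithmetic as the ECU_COOLANT case */
    int16_t t = (int16_t)frame[41] * 3 / 4 - 40;
    sprintf(buff, "%d", (int)t);
}

static void fmt_engine_load(char *buff)
{
    sprintf(buff, "%u", (unsigned)frame[36]);
}

static void fmt_knock_count(char *buff)
{
    sprintf(buff, "%u", (unsigned)frame[43]);
}

/* Contiguous indices starting at 0; VAR_COUNT is the table size. */
enum { VAR_COOLANT, VAR_ENGINE_LOAD, VAR_KNOCK_COUNT, VAR_COUNT };

static const format_fn format_table[VAR_COUNT] = {
    [VAR_COOLANT]     = fmt_coolant,
    [VAR_ENGINE_LOAD] = fmt_engine_load,
    [VAR_KNOCK_COUNT] = fmt_knock_count,
};

/* Returns 0 on success, -1 if var_num is outside the table. */
int var_format_dispatch(char *buff, uint8_t var_num)
{
    if (var_num >= VAR_COUNT)       /* the range check mentioned above */
        return -1;
    format_table[var_num](buff);    /* one indexed load, one indirect call */
    return 0;
}
```

On an AVR the table would normally be placed in flash with PROGMEM and read with pgm_read_word; that is left out here to keep the sketch portable.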

However, the other advantage of the function table is that you can guarantee 
that it will take the same amount of time, for each value, to start executing 
its associated function. With an if-else-if structure, the further down the 
chain you go, the longer it takes to reach the associated 
code. For a switch statement, it all depends on how the compiler generates code 
for it, which can change as you add or remove cases. 
This leaves the timing outside your control.
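
To make the timing argument concrete, here is a rough model, not a measurement: it only counts comparisons and ignores how a real compiler would lay out the code. The function names chain_comparisons and table_comparisons are hypothetical, and the model assumes an if-else-if chain that tests the values 0, 1, 2, ... in order.

```c
#include <stdint.h>

/* Number of "if (value == c)" tests an ordered if-else-if chain over
 * 0..n_cases-1 performs before it reaches the branch for `value`.
 * If value is not in range, every test is performed. */
unsigned chain_comparisons(uint8_t value, uint8_t n_cases)
{
    unsigned count = 0;
    for (uint8_t c = 0; c < n_cases; c++) {
        count++;                /* one equality test */
        if (value == c)
            break;              /* matching branch found */
    }
    return count;
}

/* A function table needs a single range check, whatever the value. */
unsigned table_comparisons(uint8_t value, uint8_t n_cases)
{
    (void)value;
    (void)n_cases;
    return 1;
}
```

Under this model, with 25 cases the first value costs one comparison and the last costs 25, while the table costs one range check for every value.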

