[avr-gcc-list] Re: C vs. assembly performance
Georg-Johann Lay wrote:
> David Brown wrote:
>> Georg-Johann Lay wrote:
>>> regardless if optimization is on or not. In fact I would guess that it is a policy that the code *must* be the same regardless what debug level (if any) or debug format is used, and code being dependent on debug level/format is worth a bug report.
>
> Uaaa, typo devil above. I meant "debugging info enabled" instead of "optimization" in the first line. With "optimization" my statement is obvious nonsense. *blush*
>
>> That is certainly not true. Enabling debug information will disable or limit some optimisations. gcc in general is pretty good at optimising
>
> Can you make that explicit with an example? By "code" I mean code that will end up in the target, i.e. not code that lives in some .stab* or .debug* section. But I must admit that I don't debug my code and consequently have debug info turned off, so I am not familiar with debugging info and may be fundamentally wrong on gcc policies concerning that topic.
>
> Turning on -g3 for a try in my current AVR project (12k .text) does not show other sizes for .text, and static ram usage is exactly the same as without -g3. The size of .i, .s, .o and .elf files will increase because of debug info, but .text and .data et al. should not change, neither in size nor in content.

A few quick tests here confirm what you've said. It looks like my understanding of debug data generation is a bit outdated.

Best regards,
David

___ AVR-GCC-list mailing list AVR-GCC-list@nongnu.org http://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Re: [avr-gcc-list] Re: C vs. assembly performance
Vincent Trouilliez wrote:
> On Sat, 28 Feb 2009 19:09:13 -0700 Weddington, Eric ewedding...@cso.atmel.com wrote:
>> So in application code I tend to avoid switch statements for embedded systems, unless I'm writing throw-away code or the application is trivial.
>
> Oh no ! ;-) I have only recently got round to using switch statements, to improve code legibility. In my current/first embedded project, I happen to have a very long (25 cases, 160 lines long) switch statement.. I dread to think what it would look like if I had to replace it (what else with ?) with nested if's ! How readable would that be... not to mention that with indentation, 25 levels of nesting would mean the last case would be 3 meters on the far right... ;-) Any coding tips to make all this look about readable by human beings ?! ;-/

You can write

    if ()
    {}
    else if ()
    {}
    else if ()
    {}
    ...
    else
    {}

instead of

    if ()
    {}
    else
        if ()
        {}
        else
            if ()
            {}
            ...
            else
            {}

(Optionally with one more level of {} in each but the last else)

In fact, editors like emacs will indent the two sources differently, i.e. just in the way I indicated, even though there is absolutely no difference in semantics. And the first complies with coding standards like gnu and gcc, and maybe even others like misra etc.

Georg-Johann
Re: [avr-gcc-list] Re: C vs. assembly performance
And the first complies with coding standards like gnu and gcc, and maybe even others like misra etc. MISRA doesn't say a lot about style, that is how pretty the code looks. It is implicit that ugly looking code is bad, and probably buggy. If your code looks like a IOCCC entry, your code would not be MISRA complaint. http://www.ioccc.org/ Per MISRA, in a nested if/else tree all brackets are required, and there must be a final else{} as you showed. As far as I can tell MISRA-2004 is silent on a preference between switch(), nested if/else() and function pointers. Personally I prefer tables of function pointers. Makes code looking like a threaded language like Forth at times. In any case what can get hairy fast is conditionals containing conditionals, which makes testing all execution paths problematic. Note that technically no AVR-LibC based project complies with MISRA due to rule #3.6 that says any library used, including the GCC libraries, must be complaint with MISRA [IEC 61508 Part 3]. I tend to follow the MISRA Guidelines anyway as it shows an attempt at Due Diligence. -- http://www.wearablesmartsensors.com/ http://www.softwaresafety.net/ http://www.designer-iii.com/ http://www.unusualresearch.com/ ___ AVR-GCC-list mailing list AVR-GCC-list@nongnu.org http://lists.nongnu.org/mailman/listinfo/avr-gcc-list
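The "table of function pointers" mentioned in this message can be sketched roughly as follows. This is only an illustration for a host compiler, not code from the thread: the handler functions, the `handler_fn` type and the `dispatch()` helper are all hypothetical names.

```c
#include <stdint.h>

/* Hypothetical handler type: every case receives the raw byte and
   returns a processed value. */
typedef int16_t (*handler_fn)(uint8_t raw);

/* Hypothetical handlers, one per case label of an equivalent switch. */
static int16_t handle_passthrough(uint8_t raw) { return (int16_t)raw; }
static int16_t handle_negate(uint8_t raw)      { return (int16_t)(-(int16_t)raw); }
static int16_t handle_double(uint8_t raw)      { return (int16_t)(2 * (int16_t)raw); }

/* The table index plays the role of the case value (0, 1, 2, ...). */
static const handler_fn handlers[] = {
    handle_passthrough,   /* case 0 */
    handle_negate,        /* case 1 */
    handle_double,        /* case 2 */
};

#define HANDLER_COUNT (sizeof handlers / sizeof handlers[0])

/* Look up and call the handler. Out-of-range selectors return `fallback`,
   which stands in for the default: branch of a switch. */
int16_t dispatch(uint8_t selector, uint8_t raw, int16_t fallback)
{
    if (selector >= HANDLER_COUNT) {
        return fallback;
    }
    return handlers[selector](raw);
}
```

A call such as `dispatch(1, 7, -1)` selects the second handler. The range check matters: without it, an unexpected selector would index past the end of the table.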
[avr-gcc-list] Re: C vs. assembly performance
Vincent Trouilliez wrote:
> On Sat, 28 Feb 2009 19:24:38 -0700 Weddington, Eric ewedding...@cso.atmel.com wrote:
>> You wouldn't need *nested* ifs, but an if-else-if structure, or better yet, a table of function pointers, also known as a dispatch table. Each method depends on the type of data that you're switching on.
>
> I switch on an unsigned byte, with contiguous values (0-24). A function table sounds elegant to my ear, but it would mean 25 functions ! In my case the work is done in-situ, within the switch statement, as it takes only 2 or 3 statements to process a given case. Using functions would be both overkill and overwhelming to manage I think !! ;-) I pasted my switch statement below for the curious.

These sorts of decisions depend very much on the circumstances. I am not nearly as anti-switch as Eric - I would use a switch in this case. I agree with him that many switches could be better expressed as if-trees or function tables (though the compiler can often do a better job at optimising switches than function tables...). As Eric says, if's give you better control of your timing and priority, in case that is useful.

An interesting structure for replacing switches is a binary if tree:

    // switch (x) for x = 0 .. 7
    if (!(x & 0x04)) {
        if (!(x & 0x02)) {
            if (!(x & 0x01)) {
                // case 0: ...
            } else {
                // case 1:
            }
        } else {
            if (!(x & 0x01)) {
                // case 2: ...
            } else {
                // case 3:
            }
        }
    } else {
        ...
    }

This lets you avoid the long delays you get with a flatter if structure if you have lots of cases. Unfortunately, this doesn't seem to generate good code according to my very brief testing - ideally, we should see a series of sbrs or sbrc instructions, but in practice C's irritating int-promotion feature is getting in the way.

Apart from that, I've a couple of other comments on your code. The variable names tmp16, tmp32 and tmpS16 are truly awful. It is also (usually) best to declare such temporaries in as small a block as possible. Thus they should not be at the start of the function, but instead make your cases like this:

    {   // (N * 0.75) - 40    DB41    -40 to +150 °C
        int16_t temp = (int16_t)(KLineFrameM1[41]) * 3 / 4 - 40;
        var_format_S16(buff, temp, 0);
        break;
    }

That will give clearer code and let the optimiser be more flexible in its register choices.

Also avoid local automatic constant arrays (like Yes in your example) - they must be built on the stack each time the function is called, whether they are used or not. By making these static, they will have local names but be built once at the start of the program. If you are short on ram space, you might also want to make them PROGMEM.
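As a worked check of the (N * 0.75) - 40 formula in the case block quoted in this message, here is a small host-side sketch. The function name `decode_temperature` is hypothetical; the arithmetic is the same `* 3 / 4 - 40` expression. Multiplying before dividing keeps the result as precise as integer arithmetic allows, and the largest intermediate value (255 * 3 = 765) still fits in a 16-bit int.

```c
#include <stdint.h>

/* Hypothetical wrapper around the formula from the message:
   (N * 0.75) - 40, computed in integers as N * 3 / 4 - 40. */
int16_t decode_temperature(uint8_t raw)
{
    /* The temporary is declared in the smallest block that needs it. */
    int16_t temp = (int16_t)((int16_t)raw * 3 / 4 - 40);
    return temp;
}
```

For example, a raw byte of 100 gives 100 * 3 / 4 - 40 = 35, and a raw byte of 0 gives the lower limit of -40.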
Re: [avr-gcc-list] Re: C vs. assembly performance
Thanks all for the input on the switch statement, I appreciate it.

On Sun, 01 Mar 2009 14:37:57 +0100 David Brown david.br...@hesbynett.no wrote:
> Apart from that, I've a couple of other comments on your code. The variable names tmp16, tmp32 and tmpS16 are truly awful.

Oh :-) I was just trying to give them meaningful names. They are just that: temporary/scratch integers used only to process the raw data byte before it can be sent to the function var_format_S16() which will convert it to ASCII.

> It is also (usually) best to declare such temporaries in as small a block as possible. Thus they should not be at the start of the function, but instead make your cases like this:
>
>     {   // (N * 0.75) - 40    DB41    -40 to +150 °C
>         int16_t temp = (int16_t)(KLineFrameM1[41]) * 3 / 4 - 40;
>         var_format_S16(buff, temp, 0);
>         break;
>     }

Hmm, actually I didn't even know one could declare local variables in another place than the start of the function, I take note.

> That will give clearer code

I find it just adds bloat and confusion: I have to declare the variable 25 times, every 5 lines of code, that's overwhelming, plus it makes the statement harder to parse (my brain is easily overloaded ;-), and also, it is a little confusing/harder to understand: if the variable is declared 25 times, one could think: well if it's a different variable every time, there must be something special/obscure that makes it impossible to use the same temp/scratch integer. This leaves a false impression that the processing of the 25 individual data bytes is more complex than it really is, when a single temp integer could perfectly well be used. Well that's just my view as a beginner.. maybe with time I will progressively start to see things the way you do ! ;-)

> Also avoid local automatic constant arrays (like Yes in your example)

Yes I don't like them, but since they are just that, local, I didn't want to declare them global, to avoid throwing bloat and confusion among all the variable declarations which are truly/genuinely global.

> - they must be built on the stack each time the function is called, whether they are used or not.

Yes, I was conscious of this but it was a trade-off between wasting time initialising them each time I enter the function.. and degrading code clarity by defining local variables among truly global ones. Doesn't make me happy.. like most compromises I suppose ! ;-)

> By making these static, they will have local names but be built once at the start of the program.

Huh ? Wow, so I can get the best of both worlds, just by qualifying them static ?! Yahoo ! :o) Thanks David, you just made me and my code that little bit better ! :-)

> If you are short on ram space, you might also want to make them PROGMEM.

Yeah I do do this at times, even though I have ample available RAM.. just because I don't like wasting things, be it MCU resources or money or anything, it's just a general state of mind ;-) The Yes and No strings I accepted to waste RAM on because they are very short. But other strings in the program, which are 20 characters long and used several times, I declared global to save memory.

Thanks for your comment, it made my program better and me a little bit smarter/more skilled ;-)

-- Vince
Re: [avr-gcc-list] Re: C vs. assembly performance
David Brown wrote:
> variable names tmp16, tmp32 and tmpS16 are truly awful. It is also (usually) best to declare such temporaries in as small a block as possible. Thus they should not be at the start of the function, but instead make your cases like this:
>
>     {   // (N * 0.75) - 40    DB41    -40 to +150 °C
>         int16_t temp = (int16_t)(KLineFrameM1[41]) * 3 / 4 - 40;
>         var_format_S16(buff, temp, 0);
>         break;
>     }
>
> That will give clearer code and let the optimiser be more flexible in its register choices.

As far as the optimizer of gcc is concerned, that makes no difference. It knows exactly which register contains which value and is aware of the place where a register dies, i.e. where the register can be reused for other stuff. Anyway, even if just one temp variable is used, gcc will produce a new (pseudo) register for every result like moves, arithmetic, etc. These pseudos may or may not end up in the same machine register. On that level, blocks are just syntactic sugar (if they are not used to hide visibility, e.g. like in int tmp=0; {int tmp = 1;} )

To get a notion of the various machine-independent transformations, have a glance at gcc's output with -fdump-tree-all, and for the machine-dependent ones it is -fdump-rtl-all. They make clear that do-while, while, for and if-goto are just flavours of the same sugar.

Georg-Johann
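The visibility-hiding case mentioned in passing (`int tmp=0; {int tmp = 1;}`) is the one situation where a block is more than syntactic sugar. A minimal sketch, with a hypothetical function name, shows that the inner declaration is a separate object:

```c
/* Returns the outer `tmp` after an inner block has declared and modified
   its own `tmp`. The inner declaration shadows the outer one. */
int outer_after_inner_block(void)
{
    int tmp = 0;
    {
        int tmp = 1;   /* a different object; hides the outer tmp here */
        tmp += 41;     /* modifies only the inner tmp */
        (void)tmp;     /* silence "set but not used" style warnings */
    }
    return tmp;        /* the outer tmp was never touched */
}
```

Compilers accept this silently by default; gcc reports it only when `-Wshadow` is enabled.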
RE: [avr-gcc-list] Re: C vs. assembly performance
-Original Message- From: avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.org [mailto:avr-gcc-list-bounces+eweddington=cso.atmel@nongnu. org] On Behalf Of Bob Paddock Sent: Sunday, March 01, 2009 6:22 AM To: avr-gcc-list@nongnu.org Subject: Re: [avr-gcc-list] Re: C vs. assembly performance And the first complies with coding standards like gnu and gcc, and maybe even others like misra etc. MISRA doesn't say a lot about style, that is how pretty the code looks. It is implicit that ugly looking code is bad, and probably buggy. If your code looks like a IOCCC entry, your code would not be MISRA complaint. http://www.ioccc.org/ Per MISRA, in a nested if/else tree all brackets are required, and there must be a final else{} as you showed. As far as I can tell MISRA-2004 is silent on a preference between switch(), nested if/else() and function pointers. Personally I prefer tables of function pointers. Makes code looking like a threaded language like Forth at times. In any case what can get hairy fast is conditionals containing conditionals, which makes testing all execution paths problematic. Note that technically no AVR-LibC based project complies with MISRA due to rule #3.6 that says any library used, including the GCC libraries, must be complaint with MISRA [IEC 61508 Part 3]. I tend to follow the MISRA Guidelines anyway as it shows an attempt at Due Diligence. Bob, s/complaint/compliant/g They're two different words. ;-) I would really like gcc to have a -wmisra switch someday, so we can test if an application, or library such as avr-libc is compliant. Not that I like MISRA anyway. I think it's a brain-dead standard. But do you have some tool that you have used to check avr-libc against MISRA? If so, do you have a list of issues that it found? Eric ___ AVR-GCC-list mailing list AVR-GCC-list@nongnu.org http://lists.nongnu.org/mailman/listinfo/avr-gcc-list
[avr-gcc-list] Re: C vs. assembly performance
Vincent Trouilliez wrote:
> Thanks all for the input on the switch statement, I appreciate it.
>
> On Sun, 01 Mar 2009 14:37:57 +0100 David Brown david.br...@hesbynett.no wrote:
>> Apart from that, I've a couple of other comments on your code. The variable names tmp16, tmp32 and tmpS16 are truly awful.
>
> Oh :-) I was just trying to give them meaningful names. They are just that: temporary/scratch integers used only to process the raw data byte before it can be sent to the function var_format_S16() which will convert it to ASCII.
>
>> It is also (usually) best to declare such temporaries in as small a block as possible. Thus they should not be at the start of the function, but instead make your cases like this:
>>
>>     {   // (N * 0.75) - 40    DB41    -40 to +150 °C
>>         int16_t temp = (int16_t)(KLineFrameM1[41]) * 3 / 4 - 40;
>>         var_format_S16(buff, temp, 0);
>>         break;
>>     }
>
> Hmm, actually I didn't even know one could declare local variables in another place than the start of the function, I take note.

Even with older C standards, you can declare variables at the beginning of any block. With C99, you can declare them almost anywhere.

>> That will give clearer code
>
> I find it just adds bloat and confusion: I have to declare the variable 25 times, every 5 lines of code, that's overwhelming, plus it makes the statement harder to parse (my brain is easily overloaded ;-), and also, it is a little confusing/harder to understand: if the variable is declared 25 times, one could think: well if it's a different variable every time, there must be something special/obscure that makes it impossible to use the same temp/scratch integer. This leaves a false impression that the processing of the 25 individual data bytes is more complex than it really is, when a single temp integer could perfectly well be used.

It doesn't add bloat to your source code - it's an extra type declaration at the start of each use of your temporaries. You can change lines such as "tmp32 = " into "int32_t tmp = ", which is hardly bloat. And it will reduce bloat in the generated code. Each clause in your switch uses the variables in a slightly different way - there is no benefit to your source code in trying to manually force the compiler to re-use these variables. They are different uses, so you can declare them individually.

Also remember that the smaller the scope of an identifier, the shorter the name you can use while keeping the code equally clear. A variable called tmpS16 is a meaningless name over a span of 100 lines - a variable called t is perfectly clear over a two-line lifespan.

> Well that's just my view as a beginner.. maybe with time I will progressively start to see things the way you do ! ;-)
>
>> Also avoid local automatic constant arrays (like Yes in your example)
>
> Yes I don't like them, but since they are just that, local, I didn't want to declare them global, to avoid throwing bloat and confusion among all the variable declarations which are truly/genuinely global.

That's what static within a function is for (well, one of its uses). Being static, it exists for the lifetime of the program; being locally declared within a function limits its scope to that function. So although it *exists* globally, it is only *visible* locally.

>> - they must be built on the stack each time the function is called, whether they are used or not.
>
> Yes, I was conscious of this but it was a trade-off between wasting time initialising them each time I enter the function.. and degrading code clarity by defining local variables among truly global ones. Doesn't make me happy.. like most compromises I suppose ! ;-)

A good compromise makes everyone equally unhappy.

>> By making these static, they will have local names but be built once at the start of the program.
>
> Huh ? Wow, so I can get the best of both worlds, just by qualifying them static ?! Yahoo ! :o) Thanks David, you just made me and my code that little bit better ! :-)
>
>> If you are short on ram space, you might also want to make them PROGMEM.
>
> Yeah I do do this at times, even though I have ample available RAM.. just because I don't like wasting things, be it MCU resources or money or anything, it's just a general state of mind ;-)

That's a good attitude in general, but for small strings it's often easier to be a little wasteful. One of the weak points of the AVR is that you can't mix pointers to ram and pointers to flash, so using data in flash is a bit of a pain.

> The Yes and No strings I accepted to waste RAM on because they are very short. But other strings in the program, which are 20 characters long and used several times, I declared global to save memory.

Make sure they are PROGMEM - that's what saves ram space, not the global or local declarations.

> Thanks for your comment, it made my program better and me a little bit smarter/more skilled ;-)
>
> -- Vince
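The "exists globally, visible locally" point about static can be illustrated with a short sketch. The helper name `yes_no` is hypothetical, and PROGMEM is deliberately left out because it needs the AVR-specific `<avr/pgmspace.h>` header and flash read helpers.

```c
#include <stdint.h>

/* Hypothetical helper: map a flag to a display string. */
const char *yes_no(uint8_t flag)
{
    /* static: stored once for the lifetime of the program rather than
       rebuilt on the stack at every call. The names `yes` and `no` are
       still visible only inside this function. */
    static const char yes[] = "Yes";
    static const char no[]  = "No";
    return flag ? yes : no;
}
```

Because the arrays have static storage duration, returning a pointer to them is safe; doing the same with automatic arrays would return a dangling pointer.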
[avr-gcc-list] Re: C vs. assembly performance
Georg-Johann Lay wrote:
> David Brown wrote:
>> variable names tmp16, tmp32 and tmpS16 are truly awful. It is also (usually) best to declare such temporaries in as small a block as possible. Thus they should not be at the start of the function, but instead make your cases like this:
>>
>>     {   // (N * 0.75) - 40    DB41    -40 to +150 °C
>>         int16_t temp = (int16_t)(KLineFrameM1[41]) * 3 / 4 - 40;
>>         var_format_S16(buff, temp, 0);
>>         break;
>>     }
>>
>> That will give clearer code and let the optimiser be more flexible in its register choices.
>
> As far as the optimizer of gcc is concerned, that makes no difference. It knows exactly which register contains which value and is aware of the place where a register dies, i.e. where the register can be reused for other stuff. Anyway, even if just one temp variable is used, gcc will produce a new (pseudo) register for every result like moves, arithmetic, etc. These pseudos may or may not end up in the same machine register. On that level, blocks are just syntactic sugar (if they are not used to hide visibility, e.g. like in int tmp=0; {int tmp = 1;} )

I haven't looked at code generated for such switches (there is often so much of it), so I admit to having guessed a little. I was thinking especially of when you have debug information enabled - that can force the compiler to keep variables in separate registers.

> To get a notion of the various machine-independent transformations, have a glance at gcc's output with -fdump-tree-all, and for the machine-dependent ones it is -fdump-rtl-all. They make clear that do-while, while, for and if-goto are just flavours of the same sugar.

And here was me thinking the generated source code was sometimes a bit big to wade through... Sometime I must look at this in more detail.

Best regards,
David
RE: [avr-gcc-list] Re: C vs. assembly performance
-Original Message- From: Bob Paddock [mailto:bob.padd...@gmail.com] Sent: Sunday, March 01, 2009 9:59 AM To: Weddington, Eric Cc: avr-gcc-list@nongnu.org Subject: Re: [avr-gcc-list] Re: C vs. assembly performance Would be nice. There are probably legal issues with doing that. MISRA is one of those things you must buy. There are not Open Source versions, as MISRA does not allow the distribution of the rules without proper licensing. I wasn't aware of that. How disgusting. MISRA aside I think everyone should invest in their own copy of Lint, especially people new to the language. One of the few tools priced within the range of an individual. There is the Open Source Splint: http://www.splint.org/ , however I'm not very familiar with it. I've looked briefly into building splint and redistributing it with WinAVR. IIRC, I had problems with building it. But I still think it's a good idea. After a while you get to the point of writing your code where you end up with a dialog going on in your head: Lint is going to complain about that, do it this way instead. For example while(1){} per Lint and MISRA 13.7, should be for(;;){} to prevent constant value Boolean error. That is totally stupid. ___ AVR-GCC-list mailing list AVR-GCC-list@nongnu.org http://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Re: [avr-gcc-list] Re: C vs. assembly performance
David Brown schrieb: Georg-Johann Lay wrote: Source code structure is a concern of the project, not of the compiler. Even for braindead code that comes from a code generator a compiler is supposed to yield good results. That's true in theory - but embedded programmers are used to the difference between theory and practice (there's an interesting discussion about the theory and practice of volatile on comp.arch.embedded at the moment). In theory, the compiler should generate good code no matter how the source code is structured. In practice, the experienced programmer can do a lot to help the tools. avr-gcc *does* do a good job with most code - I do much less re-structuring of my source code for avr-gcc than I do for most other compilers (I use a lot of compilers for a lot of different targets). Yes, I agree with you. You can help the tools a lot. At the moment I am in the strange and interesting situation just the other way round: Not tweaking a compiler to please a hardware, but to give a recommendation for a new ISA design to please a compiler (GCC). I am inspecting the produced asm in some of my AVR projects with hard realtime requirements, too. But I would not encourage anyone to dig in the generated asm and try to get best code by re-arranging it or trying to find other algebraic representations. That takes a lot of time, and a compiler should care for the sources it gets, not the other way round. And if your code is intended to be cross-platform, you are stuck. If your code changes some 100 source lines away from the critical code, the inefficient code can return and you have to rewrite your code again to find another representation that avoids the bad code. It is certainly true that you want to keep such compiler-helpful structuring to a minimum. 
But if you are trying to write efficient code (rather than emphasising portability or development speed or other priorities), you *must* be familiar with your compiler and the types of code it generates for particular sequences of input. You can very quickly learn some basic tricks that can make a great difference to the generated code with very little re-structuring of the source code. A prime example is to use 8-bit data rather than traditional C int where possible. Another case in point is to prefer explicit if conditionals rather than trying to calculate a conditional expression, such as was done here (if you are using a heavily pipelined processor, the opposite is true). GCC will more and more transform and canonicalize such stuff, i.e. it will turn if-else into flat (with respect to code flow) algebraic code. The backend can fix that, but the jumps then are implicit, no more explicit. And I was astonished to see examples where int instead of char yield better code. The cases are rare, but they exist. Georg-Johann ___ AVR-GCC-list mailing list AVR-GCC-list@nongnu.org http://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Re: [avr-gcc-list] Re: C vs. assembly performance
>> Would be nice. There are probably legal issues with doing that. MISRA is one of those things you must buy. There are not Open Source versions, as MISRA does not allow the distribution of the rules without proper licensing.
>
> I wasn't aware of that. How disgusting.

Yes. Standards are largely a scam to make money for those that wrote them. They make sense to a degree, but the cost can be so exorbitant that an individual could not afford them. UL is great for this. They make you comply with a standard like 913, then obsolete it so you must buy, and recertify to, a new edition. This onerous regulatory burden kills, or prevents, the small business.

>> For example while(1){} per Lint and MISRA 13.7, should be for(;;){} to prevent constant value Boolean error.
>
> That is totally stupid.

Welcome to standards. The first thing you have to realize when working with standards, in particular government standards, is that logic does not apply.

http://www.gimpel.com/html/pub90/msg.txt see #716. The Gimpel Lint error message list is a free download, and quite educational to read.

-- http://www.wearablesmartsensors.com/ http://www.softwaresafety.net/ http://www.designer-iii.com/ http://www.unusualresearch.com/
[avr-gcc-list] Re: C vs. assembly performance
Georg-Johann Lay wrote:
> David Brown wrote:
>> Georg-Johann Lay wrote:
>>> As far as the optimizer of gcc is concerned, that makes no difference. It knows exactly which register contains which value and is aware of the place where a register dies, i.e. where the register can be reused for other stuff. Anyway, even if just one temp variable is used, gcc will produce a new (pseudo) register for every result like moves, arithmetic, etc. These pseudos may or may not end up in the same machine register. On that level, blocks are just syntactic sugar (if they are not used to hide visibility, e.g. like in int tmp=0; {int tmp = 1;} )
>>
>> I haven't looked at code generated for such switches (there is often so much of it), so I admit to having guessed a little. I was thinking especially of when you have debug information enabled - that can force the compiler to keep variables in separate registers.
>
> Are you really sure? As far as I know gcc produces the same code

No, I'm not sure in this case (as I said, I haven't checked it).

> regardless if optimization is on or not. In fact I would guess that it is a policy that the code *must* be the same regardless what debug level (if any) or debug format is used, and code being dependent on debug level/format is worth a bug report.

That is certainly not true. Enabling debug information will disable or limit some optimisations. gcc in general is pretty good at optimising code even when debugging is enabled (compared to many other compilers), but debugging formats are limited and that limits the compiler. For example, most debugging formats are happy with a local variable being assigned to a register, but can't describe situations where the variable's register moves around. Even the most sophisticated debugging formats can't cope with transforms such as

    for (x = 0; x < 10; x++) ...

being transformed into

    x = 10; while (--x) ...

which will often be smaller and faster.

>>> To get a notion of the various machine-independent transformations, have a glance at gcc's output with -fdump-tree-all, and for the machine-dependent ones it is -fdump-rtl-all. They make clear that do-while, while, for and if-goto are just flavours of the same sugar.
>>
>> And here was me thinking the generated source code was sometimes a bit big to wade through... Sometime I must look at this in more detail.
>
> Yes, of course, that example is much too complex. But for small examples it is very interesting to track how gcc is transforming and kneading and stirring the code again and again beyond recognition.

I was trying with a small example! But as you say, it is interesting to look at this output, and I plan to do so for some code samples when I get the chance - thanks for the tip on these flags.
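The count-up to count-down loop transformation mentioned in this message can be checked with a small sketch. The function names are hypothetical. Note one assumption: the count-down version here is written as `do { ... } while (--x)`, because a plain `while (--x)` taken literally would decrement before the first pass and run the body only nine times.

```c
/* Count iterations of the straightforward up-counting loop. */
unsigned count_up(void)
{
    unsigned iterations = 0;
    for (unsigned char x = 0; x < 10; x++) {
        iterations++;
    }
    return iterations;
}

/* Count iterations of a down-counting loop that tests against zero,
   the form a compiler may prefer when x itself is not used in the body. */
unsigned count_down(void)
{
    unsigned iterations = 0;
    unsigned char x = 10;
    do {
        iterations++;
    } while (--x);
    return iterations;
}
```

Both functions run their body ten times. The down-counting form ends on a comparison with zero, which many instruction sets get for free from the decrement.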
Re: [avr-gcc-list] Re: C vs. assembly performance
David Brown wrote:
> Georg-Johann Lay wrote:
>> regardless if optimization is on or not. In fact I would guess that it is a policy that the code *must* be the same regardless what debug level (if any) or debug format is used, and code being dependent on debug level/format is worth a bug report.

Uaaa, typo devil above. I meant "debugging info enabled" instead of "optimization" in the first line. With "optimization" my statement is obvious nonsense. *blush*

> That is certainly not true. Enabling debug information will disable or limit some optimisations. gcc in general is pretty good at optimising

Can you make that explicit with an example? By "code" I mean code that will end up in the target, i.e. not code that lives in some .stab* or .debug* section. But I must admit that I don't debug my code and consequently have debug info turned off, so I am not familiar with debugging info and may be fundamentally wrong on gcc policies concerning that topic.

Turning on -g3 for a try in my current AVR project (12k .text) does not show other sizes for .text, and static ram usage is exactly the same as without -g3. The size of .i, .s, .o and .elf files will increase because of debug info, but .text and .data et al. should not change, neither in size nor in content.

Georg-Johann
Re: [avr-gcc-list] Re: C vs. assembly performance
David Brown schrieb: I haven't looked at code generated for such switches (there is often so much of it), so I admit to having guessed a little. I was thinking especially of when you have debug information enabled - that can force the compiler to keep variables in separate registers. According to the following mail, not for GCC: http://gcc.gnu.org/ml/gcc-help/2009-03/msg00011.html ___ AVR-GCC-list mailing list AVR-GCC-list@nongnu.org http://lists.nongnu.org/mailman/listinfo/avr-gcc-list
[avr-gcc-list] Re: C vs. assembly performance
If I wanted fast and small, I'd have done it in ASM. But half of the point of this exercise was to get my feet wet with C. So, both points accomplished. I have GCC up under AVR studio, a working project, and I feel reasonably confident with the code.

Despite avr-gcc being fairly dim about optimisations*, the results can be pretty reasonable. I had to write one program for a Mega48 in assembly language to get it fast enough. The goal was to linearly interpolate 16 bit values from a table stored in flash fast enough to output a 192kHz signal with the processor running at 20MHz. That gives just over 100 cycles to fetch the values, do the interpolation, output the value and update the pointers. I couldn't do it in C, even with inline assembly, but managed it in assembly. Part of the trick was storing everything in registers, which JUST fit for two channels. However, the C compiler was able to do it in something like 160-180 cycles. So I wouldn't discount C performance being quite acceptable if you are careful.

* On the other hand, it would be great if avr-gcc could perform some basic optimisations that even a fairly inexperienced amateur could manage. For example, things like

    unsigned char x, y;
    x = y >> 4

could use the nibble swap instruction rather than four shifts, and things like

    unsigned short a;
    unsigned char b;
    b = a >> 8

could pretty much just be a single instruction. These are examples of things I've seen the compiler waste cycles on, that are fairly obvious. In future if I see the compiler doing silly things like this, is it worth me posting the code and assembly output to this list?

Nicholas
Re: [avr-gcc-list] Re: C vs. assembly performance
From: Nicholas Vinen h...@x256.org For example, things like unsigned char x, y; x = y >> 4 could use the nibble swap instruction rather than four shifts, and things like

Shifting a byte or int right or left must push in 0s from the other side, so swapping a nibble is not the right thing to do. The same is the case with the other examples. Correct me if I am wrong.

Nayani
Re: [avr-gcc-list] Re: C vs. assembly performance
Nicholas Vinen wrote: * On the other hand, it would be great if avr-gcc could perform some basic optimisations that even a fairly inexperienced amateur could manage. For example, things like unsigned char x, y; x = y >> 4 could use the nibble swap instruction rather than four shifts, and things like

The compiler uses swap if it can. Maybe there are some effects of implicit type conversion like

    unsigned char x, y, z;
    z = (x+y) >> 4;

which is quite different from

    z = x >> 4;

unsigned short a; unsigned char b; b = a >> 8 could pretty much just be a single instruction. These are examples of things I've seen the compiler waste cycles on, that are fairly obvious. In future if I see

Compiling the following code

    unsigned char sh4 (unsigned char x)
    {
        return x >> 4;
    }

    unsigned char sh8 (unsigned short x)
    {
        return x >> 8;
    }

with avr-gcc 4.3.2 and -Os yields (non-code stripped)

    sh4:
        swap r24         ;
        andi r24,lo8(15) ; ,
        ret
    sh8:
        mov r24,r25      ; , x
        ret

the compiler doing silly things like this, is it worth me posting the code assembly output to this list?

If you are sure it is really some performance issue/regression and not due to some language standard implication, you can add a report to http://sourceforge.net/tracker/?group_id=68108 so that the subject won't be forgotten. Also mind http://gcc.gnu.org/bugs.html

And of course, you can ask questions here. In that case it is helpful if you can manage to simplify the source to a small piece of code that triggers the problem and allows others to reproduce the problem (i.e. no #include in the code, no ... (except for varargs), and so on). Snippets of .s may point to the problem when you add -dp -fverbose-asm

And there are lots of places where avr-gcc produces suboptimal or even bad code, so feedback is welcome. But note that just a few guys are working on the AVR part of gcc. I would do more if I had the time (and the support of some gurus to ask questions on internals now and then...)

Nicholas

Georg-Johann
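The implicit type conversion mentioned above can be made concrete with a small sketch. It assumes an ordinary C compiler where int is wider than 8 bits (true for avr-gcc and for a desktop compiler); the function names are invented for illustration and do not come from the thread. In (x+y) >> 4 both operands are promoted to int before the addition, so the carry out of bit 7 is still present when the shift happens, and the expression is not a plain 8-bit nibble shift.

```c
#include <stdint.h>

/* Hypothetical helper names, used only for this illustration. */

/* The sum is computed as int, so the carry out of bit 7 is kept
   and ends up in bit 4 of the result. */
uint8_t shift_promoted_sum(uint8_t x, uint8_t y)
{
    return (uint8_t)((x + y) >> 4);
}

/* The sum is truncated to 8 bits first, so the carry is dropped
   and the shift really is an 8-bit operation. */
uint8_t shift_truncated_sum(uint8_t x, uint8_t y)
{
    return (uint8_t)((uint8_t)(x + y) >> 4);
}
```

With x = 0xF0 and y = 0x20 the two functions give different results, which is why the compiler is not allowed to treat the first form as a simple swap-and-mask.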
Re: [avr-gcc-list] Re: C vs. assembly performance
Parthasaradhi Nayani wrote: From: Nicholas Vinen h...@x256.org For example, things like unsigned char x, y; x = y >> 4 could use the nibble swap instruction rather than four shifts, and things like Shifting a byte or int right or left must push in 00s from the other side so swapping a nibble is not the right thing to do. So is the case with other examples. Correct me if I am wrong. Nayani

Yes, it has to blank the top 4 bits, but I believe it's still faster to swap the nibble and do that than four shifts. Something like:

    SWAP r1
    LDI $15, r2
    AND r2, r1

This is three instructions and three cycles, as opposed to:

    LSR r1
    LSR r1
    LSR r1
    LSR r1

which is four instructions and cycles. The former requires a spare register but that generally isn't a problem.

This is just an example. I didn't note them down at the time but I saw the compiler doing a lot of things the long way when there was a simple, faster, smaller way to do it. The case of accessing some of the bytes in a larger type via shifting was particularly annoying. Perhaps a union would have solved that, but it seems silly to have to resort to doing it that way.

Now that I've signed up to this list, if and when I come across avr-gcc missing obvious optimisations I'll report them.

Nicholas
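For readers who want to convince themselves that swap-then-mask really equals a right shift by four, here is a portable C sketch. The function names are made up for this note; swap_nibbles only models what the AVR SWAP instruction does and is not how the compiler is told to emit it.

```c
#include <stdint.h>

/* Models the AVR SWAP instruction: exchange the two nibbles of a byte. */
uint8_t swap_nibbles(uint8_t x)
{
    return (uint8_t)((x << 4) | (x >> 4));
}

/* x >> 4 written as "swap, then clear the upper nibble".
   The mask is 15 decimal (0x0F), matching lo8(15) in the compiler output. */
uint8_t shift_right_4_via_swap(uint8_t x)
{
    return (uint8_t)(swap_nibbles(x) & 0x0F);
}
```

The mask is what answers the objection about zeros having to be shifted in: after the swap, the old low nibble sits in the upper four bits and must be cleared.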
Re: [avr-gcc-list] Re: C vs. assembly performance
Georg-Johann Lay wrote: compiling the following code

    unsigned char sh4 (unsigned char x)
    {
        return x >> 4;
    }

    unsigned char sh8 (unsigned short x)
    {
        return x >> 8;
    }

with avr-gcc 4.3.2 and -Os yields (non-code stripped)

    sh4:
        swap r24         ;
        andi r24,lo8(15) ; ,
        ret
    sh8:
        mov r24,r25      ; , x
        ret

Interesting. It may be that either I was using an earlier version which missed these optimisations, or else it was because my code was much more complex and the optimiser therefore missed them. I suppose I can go back and find the old code, compile it, and see what comes out now. I forgot about andi; that makes it an even better optimisation, half as many cycles and instructions.

the compiler doing silly things like this, is it worth me posting the code assembly output to this list? If you are sure it is really some performance issue/regression and not due to some language standard implication, you can add a report to http://sourceforge.net/tracker/?group_id=68108 so that the subject won't be forgotten. Also mind http://gcc.gnu.org/bugs.html And of course, you can ask questions here. In that case it is helpful if you can manage to simplify the source to a small piece of code that triggers the problem and allows others to reproduce the problem (i.e. no #include in the code, no ... (except for varargs), and so on). Snippets of .s may point to the problem when you add -dp -fverbose-asm And there are lots of places where avr-gcc produces suboptimal or even bad code, so feedback is welcome. But note that just a few guys are working on the AVR part of gcc. I would do more if I had the time (and the support of some gurus to ask questions on internals now and then...)

Yeah, this is one reason I haven't complained loudly in the past: avr-gcc is already pretty good and I didn't want to apply a lot of pressure to fix every little missed optimisation. However, it sure would be nice. I'll see if I can dig up some of my old code now, from before I rewrote it in assembly. If it's still doing things the slow way I'll point it out at the places you mention. Thanks!

Nicholas
Re: [avr-gcc-list] Re: C vs. assembly performance
Georg-Johann Lay wrote: If you are sure it is really some performance issue/regression and not due to some language standard implication, you can add a report to http://sourceforge.net/tracker/?group_id=68108 so that the subject won't be forgotten. Also mind http://gcc.gnu.org/bugs.html And of course, you can ask questions here. In that case it is helpful if you can manage to simplify the source to a small piece of code that triggers the problem and allows others to reproduce the problem (i.e. no #include in the code, no ... (except for varargs), and so on). Snippets of .s may point to the problem when you add -dp -fverbose-asm And there are lots of places where avr-gcc produces suboptimal or even bad code, so feedback is welcome. But note that just a few guys are working on the AVR part of gcc. I would do more if I had the time (and the support of some gurus to ask questions on internals now and then...)

OK, I only spent a few minutes looking at old code and I found some obviously sub-optimal results. It distills down to this:

    #include <avr/io.h>

    int main(void)
    {
        unsigned long packet = 0;
        while(1) {
            if( !(PINC & _BV(PC2)) ) {
                packet = (packet<<1)|(((unsigned char)PINC>>1)&1);
            }
            PORTB = packet;
        }
    }

avr-gcc is: avr-gcc (Gentoo 4.3.3 p1.0, pie-10.1.5) 4.3.3

The avr/io stuff is just so that it won't optimise the code away to nothing. I tried compiling it with both -Os and -O2:

    avr-gcc -g -dp -fverbose-asm -Os -S -mmcu=atmega48 -o test test.c

The result includes this:

    lsl r18          ; packet         ; 50 *ashlsi3_const/2 [length = 4]
    rol r19          ; packet
    rol r20          ; packet
    rol r21          ; packet
    in r24,38-0x20   ; D.1214,        ; 16 *movqi/4 [length = 1]
    lsr r24          ; D.1214         ; 17 lshrqi3/3 [length = 1]
    ldi r25,lo8(0)   ; ,              ; 48 *movqi/2 [length = 1]
    ldi r26,lo8(0)   ; ,              ; 46 *movhi/4 [length = 2]
    ldi r27,hi8(0)   ; ,
    andi r24,lo8(1)  ; tmp52,         ; 19 andsi3/2 [length = 4]
    andi r25,hi8(1)  ; tmp52,
    andi r26,hlo8(1) ; tmp52,
    andi r27,hhi8(1) ; tmp52,
    or r18,r24       ; packet, tmp52  ; 20 iorsi3/1 [length = 4]
    or r19,r25       ; packet, tmp52
    or r20,r26       ; packet, tmp52
    or r21,r27       ; packet, tmp52

The problem, it seems, is that the compiler doesn't realize that the right hand side of the expression can only have non-zero values in the bottom 8 bits, since it's an unsigned char which is being implicitly expanded to 32 bits for the or operation. In fact, it's only the bottom bit that's ever non-zero. As a result it's spending a number of cycles and registers doing useless things. I'll copy a report to the locations you mention in your e-mail.

There are probably ways to work around this, such as making packet a union of an unsigned char and a long, then shifting the long and only ORing in the unsigned char.

I'll note that there's also an optimization to be had with the right hand side of the expression. I would write the assembly something like this:

    lsl r18
    rol r19
    rol r20
    rol r21
    in r24,38-0x20
    bst r24, 1
    bld r18, 0

I'm sure I can find other examples of poor code generation in this particular file, since I remember coming across many cases where I replaced the generated code with inline assembly when I was originally working on it, but that will have to wait for later.

Thanks for your help, I appreciate it. As I said avr-gcc is pretty good, but I would love it if it could get even better :)

Nicholas
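The union workaround mentioned in this message could look roughly like the sketch below. It is only an illustration under stated assumptions: the type and function names are invented, a fixed-width uint32_t stands in for unsigned long, and the byte overlay relies on little-endian byte order (as on AVR), so that bytes[0] is the least significant byte.

```c
#include <stdint.h>

/* Assumption: little-endian byte order, so bytes[0] overlays the
   least significant byte of 'whole'. Names are hypothetical. */
typedef union {
    uint32_t whole;
    uint8_t  bytes[4];
} packet_t;

/* Shift the 32-bit value left by one, then OR the new bit into the
   low byte only, so that no 32-bit OR is needed. Only bit 0 of 'bit'
   is used. */
void packet_push_bit(packet_t *p, uint8_t bit)
{
    p->whole <<= 1;
    p->bytes[0] |= (uint8_t)(bit & 1u);
}
```

Whether this actually produces shorter code than the single expression depends on the compiler version, so the generated assembly would still need to be checked.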
RE: [avr-gcc-list] Re: C vs. assembly performance
-Original Message- From: avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.org [mailto:avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.org] On Behalf Of Nicholas Vinen Sent: Saturday, February 28, 2009 6:21 AM To: partha_nay...@yahoo.com Cc: avr-gcc-list@nongnu.org Subject: Re: [avr-gcc-list] Re: C vs. assembly performance

Now that I've signed up to this list, if and when I come across avr-gcc missing obvious optimisations I'll report them.

We certainly appreciate bug reports. However, before you report them, please make sure that they haven't been reported already. An AVR toolchain bug list is kept here: http://www.nongnu.org/avr-libc/bugs.html There are already a number of missed optimization gcc bugs reported on that list. Some have even been fixed already, though they haven't been released through a toolchain distribution.
Re: [avr-gcc-list] Re: C vs. assembly performance
Nicholas Vinen wrote: Georg-Johann Lay wrote: If you are sure it is really some performance issue/regression and not due to some language standard implication, you can add a report to http://sourceforge.net/tracker/?group_id=68108 so that the subject won't be forgotten. Also mind http://gcc.gnu.org/bugs.html And of course, you can ask questions here. In that case it is helpful if you can manage to simplify the source to a small piece of code that triggers the problem and allows others to reproduce the problem (i.e. no #include in the code, no ... (except for varargs), and so on). Snippets of .s may point to the problem when you add -dp -fverbose-asm And there are lots of places where avr-gcc produces suboptimal or even bad code, so feedback is welcome. But note that just a few guys are working on the AVR part of gcc. I would do more if I had the time (and the support of some gurus to ask questions on internals now and then...)

OK, I only spent a few minutes looking at old code and I found some obviously sub-optimal results. It distills down to this:

    #include <avr/io.h>

    int main(void)
    {
        unsigned long packet = 0;
        while(1) {
            if( !(PINC & _BV(PC2)) ) {
                packet = (packet<<1)|(((unsigned char)PINC>>1)&1);
            }
            PORTB = packet;
        }
    }

avr-gcc is: avr-gcc (Gentoo 4.3.3 p1.0, pie-10.1.5) 4.3.3 The avr/io stuff is just so that it won't optimise the code away to nothing.

Please avoid the #include stuff. You can use source like this:

    #define PINC  (*((unsigned char volatile*) 0x20))
    #define PORTB (*((unsigned char volatile*) 0x21))

    void foo ()
    {
        unsigned long packet = 0;
        while(1) {
            if (!(PINC & (1 << 2))) {
                packet = (packet<<1)|(((unsigned char)PINC>>1)&1);
            }
            PORTB = packet;
        }
    }

I tried compiling it with both -Os and -O2: avr-gcc -g -dp -fverbose-asm -Os -S -mmcu=atmega48 -o test test.c The result includes this:

    lsl r18          ; packet         ; 50 *ashlsi3_const/2 [length = 4]
    rol r19          ; packet
    rol r20          ; packet
    rol r21          ; packet
    in r24,38-0x20   ; D.1214,        ; 16 *movqi/4 [length = 1]
    lsr r24          ; D.1214         ; 17 lshrqi3/3 [length = 1]
    ldi r25,lo8(0)   ; ,              ; 48 *movqi/2 [length = 1]
    ldi r26,lo8(0)   ; ,              ; 46 *movhi/4 [length = 2]
    ldi r27,hi8(0)   ; ,
    andi r24,lo8(1)  ; tmp52,         ; 19 andsi3/2 [length = 4]
    andi r25,hi8(1)  ; tmp52,
    andi r26,hlo8(1) ; tmp52,
    andi r27,hhi8(1) ; tmp52,
    or r18,r24       ; packet, tmp52  ; 20 iorsi3/1 [length = 4]
    or r19,r25       ; packet, tmp52
    or r20,r26       ; packet, tmp52
    or r21,r27       ; packet, tmp52

The problem, it seems, is that the compiler doesn't realize that the right hand side of the expression can only have any non-zero values in the bottom 8 bits, since it's an unsigned char which is being implicitly expanded to 32 bits for the or operation. In fact, it's only the bottom bit that's ever non-zero. As a result it's spending a number of cycles and registers doing useless things. I'll copy a report to the locations you mention in your e-mail.

Note that this may partially be covered by report 145284 (which I cannot find, maybe Eric has closed/removed it). I already filed a patch for that in http://lists.gnu.org/archive/html/avr-gcc-list/2008-12/msg00019.html that covers your issue to some extent or maybe almost completely: the new pattern *iorMODE2_MODEbit0 would match some parts of your code.

There are probably ways to work around this, such as making packet a union of an unsigned char and a long, then shifting the long and only ORing in the unsigned char. I'll note that there's also an optimization to be had with the right hand side of the expression. I would write the assembly something like this:

    lsl r18
    rol r19
    rol r20
    rol r21
    in r24,38-0x20
    bst r24, 1
    bld r18, 0

The result of the above patch should lead to something like

    lsl r18
    rol r19
    rol r20
    rol r21
    in r24,38-0x20
    bst r24, 1
    sbrs r18, 0
    bld r18, 0

The SBRS is necessary because the pattern is not aware of the fact that r18.0 is 0. Maybe the optimization is even better (or weaker); I am not using that patch at the moment and can only estimate its effect by peeking into RTL dumps of an unpatched gcc.

Concerning the patch itself, I don't know anything about its fate or whether it will ever make its way into gcc, because of administrative obstacles and the technique I used. I don't like the technique I am using because it leads to complex patterns that are hard to understand and test, and that will become useless if the middle end decides to represent the stuff in a slightly different way...

Georg-Johann
RE: [avr-gcc-list] Re: C vs. assembly performance
-Original Message- From: avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.org [mailto:avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.org] On Behalf Of Georg-Johann Lay Sent: Saturday, February 28, 2009 9:00 AM To: Nicholas Vinen Cc: avr-gcc-list@nongnu.org Subject: Re: [avr-gcc-list] Re: C vs. assembly performance

Note that this may partially be covered by report 145284 (which I cannot find, maybe Eric has closed/removed it)

I'm sorry, but on which project?
Re: [avr-gcc-list] Re: C vs. assembly performance
Weddington, Eric wrote: -Original Message- From: avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.org [mailto:avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.org] On Behalf Of Georg-Johann Lay Sent: Saturday, February 28, 2009 9:00 AM To: Nicholas Vinen Cc: avr-gcc-list@nongnu.org Subject: Re: [avr-gcc-list] Re: C vs. assembly performance Note that this may partially be covered by report 145284 (which I cannot find, maybe Eric has closed/removed it)

Sorry, my mistake. I wrote the patch because I saw gcc 4 making bad code (compared with 3.4.6) and read some bad-optimization reports that address the subject on similar code. It was not initiated by a specific report. However, several bug reports on performance regression will be fixed, or at least eased, by that patch.

I'm sorry, but on which project?

BTW, is there a reason why there is more than one bug list? I saw a third, but cannot find it again (maybe just a view on the sourceforge list).

http://www.nongnu.org/avr-libc/bugs.html
http://sourceforge.net/tracker2/?group_id=68108&atid=520074

Georg-Johann
RE: [avr-gcc-list] Re: C vs. assembly performance
-Original Message- From: Georg-Johann Lay [mailto:a...@gjlay.de] Sent: Saturday, February 28, 2009 10:09 AM To: Weddington, Eric Cc: Nicholas Vinen; avr-gcc-list@nongnu.org Subject: Re: [avr-gcc-list] Re: C vs. assembly performance Weddington, Eric wrote: BTW, is there a reason why there is more than one bug list?

Yes, because there is more than one project. ;-)

GNU Binutils
GCC
Avr-libc

Each of those projects is separate and each has its own bug list.

WinAVR has its own bug list for 2 reasons:
- There may be bugs in the installation itself, which have nothing to do with the underlying projects
- It is used as a catch-all starting point for users who are not used to filing bug reports with open source projects. It is easier to point them there first; then the bugs can be analysed and reported to upstream projects.

Although, I admit that I haven't gone through the WinAVR bug list with the intent to move bugs upstream in some time. I'm planning on doing that after the WinAVR release, to clean up a bit.

I keep track of the AVR specific bugs in binutils, gcc, gdb here: http://www.nongnu.org/avr-libc/bugs.html And that page has links to the other projects' bug lists.

Eric
[avr-gcc-list] Re: C vs. assembly performance
Nicholas Vinen wrote: OK, I only spent a few minutes looking at old code and I found some obviously sub-optimal results. It distills down to this:

    #include <avr/io.h>

    int main(void)
    {
        unsigned long packet = 0;
        while(1) {
            if( !(PINC & _BV(PC2)) ) {
                packet = (packet<<1)|(((unsigned char)PINC>>1)&1);
            }
            PORTB = packet;
        }
    }

Did you write the code like this just to test the optimiser? It certainly gives it more of a challenge than most code, since it contains 32-bit data (the compiler writers will place more emphasis on getting good code for far more common 8-bit and 16-bit data), and the compiler must combat the C rules for integer promotion to generate ideal code. Try re-writing your code like this (which I think is clearer anyway):

    int main(void)
    {
        unsigned long packet = 0;
        while (1) {
            if (!(PINC & _BV(PC2))) {
                packet <<= 1;
                if (PINC & 0x02) {
                    packet |= 0x01;
                };
            }
            PORTB = packet;
        }
    }

This generates:

    77 main:
    78 /* prologue: frame size=0 */
    79 /* prologue end (size=0) */
    80 0032 80E0 ldi r24,lo8(0)   ; packet,
    81 0034 90E0 ldi r25,hi8(0)   ; packet,
    82 0036 A0E0 ldi r26,hlo8(0)  ; packet,
    83 0038 B0E0 ldi r27,hhi8(0)  ; packet,
    84 .L7:
    85 003a 9A99 sbic 51-0x20,2   ; ,
    86 003c 00C0 rjmp .L8         ;
    87 003e 880F lsl r24          ; packet
    88 0040 991F rol r25          ; packet
    89 0042 AA1F rol r26          ; packet
    90 0044 BB1F rol r27          ; packet
    91 0046      sbic 51-0x20,1   ; ,
    92 0048 8160 ori r24,lo8(1)   ; packet,
    93 .L8:
    94 004a 88BB out 56-0x20,r24  ; , packet
    95 004c 00C0 rjmp .L7         ;

You may note that this code is in fact one instruction and one cycle shorter than your hand-written assembly...

I'm not disputing the fact that avr-gcc's optimiser does not always generate optimal code. And there are certainly types of code which can be written smaller and faster in assembly than using any realistic compiler, simply because you can use techniques that are virtually impossible in C or which would require a totally different way of compiling code (using dedicated registers is a prime example).

However, avr-gcc constantly surprises me in the quality of its code generation - it really is very good, and it has got steadily better through the years. Sometimes it pays to think a bit about the way your source code is structured, and maybe test out different arrangements.

mvh., David
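To check that the restructured source computes the same thing as the single-expression version, the two updates can be written as plain functions and compared on a desktop compiler. This is only a sketch: the function names are invented, the port value is passed in as an ordinary parameter instead of being read from the volatile PINC register, and uint32_t stands in for unsigned long.

```c
#include <stdint.h>

/* Single-expression form: shift, then OR in a promoted 8-bit value. */
uint32_t update_expr(uint32_t packet, uint8_t pinc)
{
    return (packet << 1) | (uint32_t)((pinc >> 1) & 1);
}

/* Restructured form: shift, then conditionally set bit 0. */
uint32_t update_branch(uint32_t packet, uint8_t pinc)
{
    packet <<= 1;
    if (pinc & 0x02) {
        packet |= 0x01;
    }
    return packet;
}
```

One difference this sketch hides: both original versions read the volatile port twice, so the pin could change between the two reads. That is the same in either arrangement and is not affected by the rewrite.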
Re: [avr-gcc-list] Re: C vs. assembly performance
David Brown wrote: Nicholas Vinen wrote: OK, I only spent a few minutes looking at old code and I found some obviously sub-optimal results. It distills down to this:

    #include <avr/io.h>

    int main(void)
    {
        unsigned long packet = 0;
        while(1) {
            if( !(PINC & _BV(PC2)) ) {
                packet = (packet<<1)|(((unsigned char)PINC>>1)&1);
            }
            PORTB = packet;
        }
    }

Did you write the code like this just to test the optimiser? It

As far as I understand, it's a stripped down example to demonstrate the code bloat in a reproducible way (compilable source).

However, avr-gcc constantly surprises me in the quality of its code generation - it really is very good, and it has got steadily better through the years. Sometimes it pays to think a bit about the way your source code is structured, and maybe test out different arrangements.

Source code structure is a concern of the project, not of the compiler. Even for braindead code that comes from a code generator, a compiler is supposed to yield good results.

I am inspecting the produced asm in some of my AVR projects with hard realtime requirements, too. But I would not encourage anyone to dig into the generated asm and try to get the best code by re-arranging the source or trying to find other algebraic representations. That takes a lot of time, and a compiler should care for the sources it gets, not the other way round. And if your code is intended to be cross-platform, you are stuck. If your code changes some 100 source lines away from the critical code, the inefficient code can return and you have to rewrite your code again to find another representation that avoids the bad code.

Georg-Johann
RE: [avr-gcc-list] Re: C vs. assembly performance
-Original Message- From: avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.org [mailto:avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.org] On Behalf Of Bob Paddock Sent: Saturday, February 28, 2009 6:40 PM To: avr-gcc-list@nongnu.org Subject: Re: [avr-gcc-list] Re: C vs. assembly performance

In practice, the experienced programmer can do a lot to help the tools. avr-gcc *does* do a good job with most code - I do much less re-structuring of my source code for avr-gcc than I do for most other compilers (I use a lot of compilers for a lot of different targets).

Something I always found amusing/depressing is that some compilers generate smaller code for ++i than i++, everything else being equal. Then other compilers generate smaller code for i++ than ++i. So in the embedded space you have to know what your tools are doing. Sadly it should not be this way.

avr-gcc generates the same size code in any case that I've looked at.

Along the same lines of "you should know what your compiler generates" is the use of switch statements. They can be implemented (in the code generation) in many different ways, and the choice is based on heuristics. These heuristics are not always tuned the best way for the target. So in application code I tend to avoid switch statements for embedded systems, unless I'm writing throw-away code or the application is trivial.
Re: [avr-gcc-list] Re: C vs. assembly performance
On Sat, 28 Feb 2009 19:09:13 -0700 Weddington, Eric ewedding...@cso.atmel.com wrote: So in application code I tend to avoid switch statements for embedded systems, unless I'm writing throw-away code or the application is trivial.

Oh no ! ;-) I have only recently got round to using switch statements, to improve code legibility. In my current/first embedded project, I happen to have a very long (25 cases, 160 lines long) switch statement.. I dread to think what it would look like if I had to replace it (what else with ?) with nested if's ! How readable would that be... not to mention that with indentation, 25 levels of nesting would mean the last case would be 3 meters on the far right... ;-)

Any coding tips to make all this look about readable by human beings ?! ;-/

-- Vince, catastrophed...
RE: [avr-gcc-list] Re: C vs. assembly performance
-Original Message- From: avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.org [mailto:avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.org] On Behalf Of Vincent Trouilliez Sent: Saturday, February 28, 2009 7:20 PM To: avr-gcc-list@nongnu.org Subject: Re: [avr-gcc-list] Re: C vs. assembly performance On Sat, 28 Feb 2009 19:09:13 -0700 Weddington, Eric ewedding...@cso.atmel.com wrote: So in application code I tend to avoid switch statements for embedded systems, unless I'm writing throw-away code or the application is trivial.

Oh no ! ;-) I have only recently got round to using switch statements, to improve code legibility. In my current/first embedded project, I happen to have a very long (25 cases, 160 lines long) switch statement.. I dread to think what it would look like if I had to replace it (what else with ?) with nested if's ! How readable would that be... not to mention that with indentation, 25 levels of nesting would mean the last case would be 3 meters on the far right... ;-) Any coding tips to make all this look about readable by human beings ?! ;-/

You wouldn't need *nested* ifs, but an if-else-if structure, or better yet, a table of function pointers, also known as a dispatch table. Which method fits depends on the type of data that you're switching on.
Re: [avr-gcc-list] Re: C vs. assembly performance
On Sat, 28 Feb 2009 19:24:38 -0700 Weddington, Eric ewedding...@cso.atmel.com wrote: You wouldn't need *nested* ifs, but an if-else-if structure, or better yet, a table of function pointers, also known as a dispatch table. Each method depends on the type of data that you're switching on.

I switch on an unsigned byte, with contiguous values (0-24). A function table sounds elegant to my ear, but it would mean 25 functions ! In my case the work is done in-situ, within the switch statement, as it takes only 2 or 3 statements to process a given case. Using functions would be both overkill and overwhelming to manage I think !! ;-)

I pasted my switch statement below for the curious.

-- Vince

    void var_format(char *buff, uint8_t var_num)
    {
        uint16_t tmp16;
        uint32_t tmp32;
        int16_t tmpS16;
        char Yes[] = "Yes";
        char No[] = "No";
        char Unknown[] = "-\?\?\?-";

        switch (var_num)
        {
        case ECU_COOLANT:
            { // (N * 0.75) - 40   DB41   -40 to +150 °C
                tmpS16 = (int16_t)(KLineFrameM1[41]) * 3 / 4 - 40;
                var_format_S16(buff, tmpS16, 0);
                break;
            }
        case ECU_ENGINE_SPEED:
            { // MSB:LSB   DB12:13   0 to RPM
                ATOMIC_BLOCK(ATOMIC_FORCEON)
                {
                    tmp16 = ((uint16_t)KLineFrameM1[12] << 8) + (uint16_t)KLineFrameM1[13];
                }
                var_format_S16(buff, (int16_t)tmp16, 0);
                break;
            }
        case ECU_ROAD_SPEED:
            { // DB14   uint8_t   MPH
                var_format_byte(buff, KLineFrameM1[8]);
                break;
            }
        case ECU_BARO_AIR_PRESSURE:
            { // ((N-128)/100)+1   DB24   -0.28 to +2.27 Bar
                tmpS16 = (int16_t)(KLineFrameM1[24]) - 28;
                var_format_S16(buff, tmpS16, 2);
                break;
            }
        case ECU_MAP_PRESSURE:
            { // ((N-130)/100)+1   DB25   -0.30 to +2.25 Bar
                tmpS16 = (int16_t)(KLineFrameM1[25]) - 30;
                var_format_S16(buff, tmpS16, 2);
                break;
            }
        case ECU_MAT_TEMP:
            { // (N * 0.75) - 40   DB42   -40 to +150 °C
                tmpS16 = (int16_t)(KLineFrameM1[42]) * 3 / 4 - 40;
                var_format_S16(buff, tmpS16, 0);
                break;
            }
        case ECU_THROTTLE_POSITION:
            { // N / 2.55   DB27   0 to 100 %
                tmp16 = (uint16_t)KLineFrameM1[27] * 100 / 255;
                var_format_byte(buff, (uint8_t)tmp16);
                break;
            }
        case ECU_ENGINE_LOAD:
            { // DB36   0 - 100 %
                var_format_byte(buff, KLineFrameM1[36]);
                break;
            }
        case ECU_KNOCK_COUNT:
            { // DB43   uint8_t
                var_format_byte(buff, KLineFrameM1[43]);
                break;
            }
        case ECU_KNOCK_RETARD:
            { // (N * 45) / 255   DB44   0 to 45 Deg
                tmp16 = (uint16_t)KLineFrameM1[44] * 90 / 51;
                var_format_S16(buff, (int16_t)tmp16, 1);
                break;
            }
        case ECU_SPARK_ADVANCE:
            { // (N * 9000)/256   MSB:LSB   DB39:40   0.00 Degrees
                ATOMIC_BLOCK(ATOMIC_FORCEON)
                {
                    tmp16 = ((uint16_t)KLineFrameM1[39] << 8) + (uint16_t)KLineFrameM1[40];
                }
                tmp32 = (uint32_t)tmp16 * 9000 / 256;
                var_format_S16(buff, (int16_t)tmp32, 2);
                break;
            }
        case ECU_BOOST_DC:
            { // DB31   uint8_t or %, don't know
                var_format_byte(buff, KLineFrameM1[31]);
                break;
            }
        case ECU_MAIN_INJ_DC:
            { // DB45   uint8_t
                var_format_byte(buff, KLineFrameM1[45]);
                break;
            }
        case ECU_SECONDARY_INJ_DC:
            { // DB37
RE: [avr-gcc-list] Re: C vs. assembly performance
-Original Message- From: avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.org [mailto:avr-gcc-list-bounces+eweddington=cso.atmel@nongnu.org] On Behalf Of Vincent Trouilliez Sent: Saturday, February 28, 2009 7:41 PM To: avr-gcc-list@nongnu.org Subject: Re: [avr-gcc-list] Re: C vs. assembly performance

I switch on an unsigned byte, with contiguous values (0-24). A function table sounds elegant to my ear, but it would mean 25 functions ! In my case the work is done in-situ, within the switch statement, as it takes only 2 or 3 statements to process a given case. Using functions would be both overkill and overwhelming to manage I think !! ;-)

It is no more overkill and overwhelming than dealing with a single contiguous switch statement with 25 cases. I think it's just a matter of perspective. The one good thing is that each function really encapsulates a single idea and has nothing else, which *may* make maintenance easier. Implementing it this way really separates the idea of 'making some choice' from the 'implementation of a single choice', rather than conflating the two in a single massive switch statement.

Yes, your switch sounds like a candidate for a function table. However, like everything in engineering, there are trade-offs. The trade-off here is the overhead for each function, plus the slight overhead for checking the range of the value used to index into the table. To be fair, you would have to write it both ways and see which produces the smaller code.

However, the other advantage of the function table is that you can guarantee that it will take the same amount of time, for each value, to start executing its associated function. With an if-else-if structure, the further down the if-else-if chain you go, the longer it takes to reach the associated code. For a switch statement, it all depends on how the compiler generates code for it, which can change as you add more cases or remove cases from the switch. This leaves you out of control with regards to timing.
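A minimal sketch of the dispatch-table idea described above, with the range check done before the indexed call. Everything here is hypothetical: the handler names, the three-entry table and the dispatch signature are invented for illustration, and the conversion formulas are borrowed from the comments in the posted switch statement only to give the handlers something to do. On an AVR one would probably also want to keep the table in flash, which this portable sketch leaves out.

```c
#include <stdint.h>

/* Hypothetical handlers; each one encapsulates a single case. */
int handle_coolant(uint8_t raw)  { return raw * 3 / 4 - 40; }
int handle_baro(uint8_t raw)     { return raw - 28; }
int handle_throttle(uint8_t raw) { return raw * 100 / 255; }

typedef int (*handler_fn)(uint8_t raw);

/* The index into this table plays the role of the switch value. */
static const handler_fn handlers[] = {
    handle_coolant,
    handle_baro,
    handle_throttle,
};

#define HANDLER_COUNT (sizeof handlers / sizeof handlers[0])

/* Range check first, then one indexed call. The time to reach any
   handler is the same regardless of var_num.
   Returns 0 on success, -1 if var_num has no handler. */
int dispatch(uint8_t var_num, uint8_t raw, int *out)
{
    if (var_num >= HANDLER_COUNT) {
        return -1;
    }
    *out = handlers[var_num](raw);
    return 0;
}
```

The cost is the one mentioned in the message: a call and return per handler plus the bounds check. Whether that ends up smaller than the switch would have to be measured by building both versions.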