RE: Simulator for GCC Testing [was: RE: [Fwd: Re: [avr-gcc-list]GCC-AVR Register optimisations]]

2008-01-15 Thread William Rivet
For what it is worth, I would prefer that simulavrxx proper could be
used, even if it was just built as a separate executable along with the
full-up code. This is one point of view of course. 

I also however know that libbfd is a pain for us the way we use it
because over time it changes in ways we often don't care about, but
causes trouble for our simulavrxx users who have to cause it to be built
and installed...then simulavrxx has to find and use it x-p

I'm pretty sure one of my build clean-up activities should include just
including a suitable version of libbfd sources in simulavrxx and
dispense with the special build requirements we have today. Hence I'm
actually contemplating doing just what you did. (and I've been told this
is a wrong approach too ;-p )

So in the end I say, more power to you. Thanks for posting. The free and
open communication certainly is in the spirit of FSF and OSS. It's all
good. 

BTW: Where would you host your new tool? For my own information, how do
you use it in conjunction with the GCC testsuite? Feel free to take this
part offline or ignore if you prefer. 

:-)

On Sun, 2008-01-13 at 23:15 +, Paulo Marques wrote:

 Now, I don't mind at all discussing technical merits of the idea, 
 especially if I can show my own code to use as a counter argument. So, 
 I was trying to delay my replies (including the reply to Joerg Wunsch) 
 to a point where I could show some code instead of the natural 
 handwaving that these kinds of discussions inevitably degenerate into.




___
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list


RE: Simulator for GCC Testing [was: RE: [Fwd: Re: [avr-gcc-list]GCC-AVR Register optimisations]]

2008-01-15 Thread Paulo Marques

Quoting William Rivet [EMAIL PROTECTED]:


For what it is worth, I would prefer that simulavrxx proper could be
used, even if it was just built as a separate executable along with the
full-up code. This is one point of view of course.


Maybe I gave the wrong impression, but I did look into simulavrxx 
before taking on this task, because that was my initial thought too.


The thing is, I think of avrtest as more of a test tool than a 
simulator: a test tool that we can tweak in any way that simplifies the 
setup needed to run the testsuite.



I also however know that libbfd is a pain for us the way we use it
because over time it changes in ways we often don't care about, but
causes trouble for our simulavrxx users who have to cause it to be built
and installed...then simulavrxx has to find and use it x-p

I'm pretty sure one of my build clean-up activities should include just
including a suitable version of libbfd sources in simulavrxx and
dispense with the special build requirements we have today. Hence I'm
actually contemplating doing just what you did. (and I've been told this
is a wrong approach too ;-p )


Please note that I took just a small function from simulavrxx, and one 
that I probably still want to re-write someday, anyway. Most of the 
code is written from scratch to be much simpler than the simulavrxx 
version (just compare the almost 5000 lines of code for just the 
decode.* part of simulavrxx vs 1400 for the complete avrtest).


I'm not going to tell you that it is the wrong approach, but you should 
look at avrtest, too ;)



So in the end I say, more power to you. Thanks for posting. The free and
open communication certainly is in the spirit of FSF and OSS. It's all
good.


Thanks :)


BTW: Where would you host your new tool?


I haven't looked into it yet, but my idea was to host it in a place 
that gave the idea of "this is what you need to run the gcc testsuite 
for gcc" and not so much "this is where you can find yet another avr 
simulator". I was just trying to make it work myself before thinking 
about an official release.



For my own information, how do
you use it in conjunction with the GCC testsuite? Feel free to take this
part offline or ignore if you prefer.


At this point I'm still reading dejagnu documentation and trying to 
figure out how everything fits together. From what I've already seen it 
looks like avrtest can indeed be very helpful, in terms of simplifying 
the total setup, increasing execution speed and improving portability.


As soon as I get some concrete results, I'll post them on the list, 
together with the steps needed to reproduce them. Just give me a few 
more days.


--
Paulo Marques



This message was sent using IMP, the Internet Messaging Program.





Re: Simulator for GCC Testing [was: RE: [Fwd: Re:[avr-gcc-list]GCC-AVR Register optimisations]]

2008-01-15 Thread Andrew Hutchinson

FYI changing simulators is very easy.

Getting testsuite cases to pass is another thing!

I'm trying avrora - just for speed right now. I'll compare when I've got 
my build under control.



Weddington, Eric wrote:
 

  

-Original Message-
From: Paulo Marques [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, January 15, 2008 5:01 PM

To: [EMAIL PROTECTED]
Cc: Weddington, Eric; Andrew Hutchinson; [EMAIL PROTECTED]; 
avr-gcc-list@nongnu.org; KlausRudolph
Subject: RE: Simulator for GCC Testing [was: RE: [Fwd: 
Re:[avr-gcc-list]GCC-AVR Register optimisations]]


Quoting William Rivet [EMAIL PROTECTED]:




BTW: Where would you host your new tool?
  
I haven't looked into it yet, but my idea was to host it in a place 
that gave the idea of "this is what you need to run the gcc testsuite 
for gcc" and not so much "this is where you can find yet another avr 
simulator". I was just trying to make it work myself before thinking 
about an official release.





FWIW, I'm willing to host it on the WinAVR CVS repository.

Thanks,
Eric Weddington


  





Re: Simulator for GCC Testing [was: RE: [Fwd: Re: [avr-gcc-list]GCC-AVR Register optimisations]]

2008-01-15 Thread Klaus Rudolph

Hi all,


I also however know that libbfd is a pain for us the way we use it
because over time it changes in ways we often don't care about, but
causes trouble for our simulavrxx users who have to cause it to be built
and installed...then simulavrxx has to find and use it x-p

Nothing has changed that was a problem for simulavrxx. The only 
trouble comes from mixing headers with different libs.


I'm pretty sure one of my build clean-up activities should include just
including a suitable version of libbfd sources in simulavrxx

Oh no! We have discussed that many times, and I will not include any 
sources from foreign projects. Not TCL/TK and not libbfd. There is no 
reason for it! The only requirement is that we have the bfd.h that 
corresponds to the libbfd compiled for avr. Nothing more is needed.


For a gcc test suite, simulavrxx and simulavr are nearly the same. The 
decoder uses the same instruction set (sleep is not supported, but this 
instruction will not be part of any standard C code; writing flash is 
also not included, but this is not what the compiler is interested in :-).



I have not understood why we need a reduced, smaller simulavrxx for a 
test suite. Is the size a problem?


Back to the point of build tools:
Bill has done a lot of work on building the tool on different platforms 
and on tracking down all the dependencies. But all that work has 
actually resulted in a very complex build system. As Knut already 
mentioned, we now have a build tool chain that needs python to build the 
tcl examples. That sounds terrible to me and makes things much too 
complex.


From my point of view:
I use my own old Makefile with one config file, which contains 2 lines 
of information: the path to libbfd and the path to tcl/tk. That's all. 
No need for any kind of external tooling (autotools and so on). Making 
things automated and simple can end up producing complex results :-)


For a gcc test suite there is also no need for having tcl/tk or python 
or any other scripting language available. Simply read the elf-files and 
watch the results of simulation with some environment for automation. If 
there is a need for a more elaborate solution: let me know! Maybe I can 
do that for you!
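
As a rough illustration of "some environment for automation": a driver can run each ELF file under a simulator and turn the exit status into a verdict. This is only a sketch under stated assumptions, not how simulavrxx, avrtest or the DejaGnu harness actually work. The simulator command line is a placeholder, the convention that exit status 0 means "pass" is an assumption, and the names classify and run_test are made up for the example.

```python
import subprocess


def classify(returncode):
    """Map a simulator exit status to a verdict (0 = pass, by assumption)."""
    return "PASS" if returncode == 0 else "FAIL"


def run_test(sim_cmd, elf_path, timeout_s=10):
    """Run one ELF file under a simulator and return a verdict string.

    sim_cmd is a list such as ["some-simulator"]; it is a placeholder,
    not the command line of any real simulator.
    """
    try:
        proc = subprocess.run(sim_cmd + [elf_path], timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The program under test never terminated within the limit.
        return "TIMEOUT"
    return classify(proc.returncode)
```

The timeout matters in practice: a miscompiled test may loop forever, and the driver should report that rather than hang.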


Bye
 Klaus








Re: AVR Benchmark Test Suite [was: RE: [avr-gcc-list] GCC-AVR Register optimisations]

2008-01-14 Thread Joerg Wunsch
Dave N6NZ [EMAIL PROTECTED] wrote:

 the Atmel 802.15.4 MAC,

 Need to check license on that one -- but a good choice otherwise

BSD-style.

 If it is desired to have it in a more neutral place, such as
 avr-libc, I'm open to that too, if Joerg Wunsch is willing.

 Seems to me that as long as they are publicly available under an
 appropriate license, it doesn't really matter much who backs them up
 :)

Agreed, I think both locations (sf.net, or savannah.nongnu.org) would
do fine.

-- 
cheers, Jorg   .-.-.   --... ...--   -.. .  DL8DTL

http://www.sax.de/~joerg/                        NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)




RE: Simulator for GCC Testing [was: RE: [Fwd: Re: [avr-gcc-list]GCC-AVR Register optimisations]]

2008-01-13 Thread Weddington, Eric
 

 -Original Message-
 From: Paulo Marques [mailto:[EMAIL PROTECTED] 
 Sent: Saturday, January 12, 2008 8:21 AM
 To: Weddington, Eric
 Cc: Andrew Hutchinson; [EMAIL PROTECTED]; 
 avr-gcc-list@nongnu.org; [EMAIL PROTECTED]; Klaus Rudolph
 Subject: Re: Simulator for GCC Testing [was: RE: [Fwd: Re: 
 [avr-gcc-list]GCC-AVR Register optimisations]]
 
 Quoting Weddington, Eric [EMAIL PROTECTED]:
 
  I strongly recommend that the wheel not be reinvented. If people are
  interested in running the GCC Regression Test Suite, I would recommend
  using available tools, and improving the available tools rather than
  inventing new ones.
 
 I've looked at the code for both simulavr and simulavrxx. It seems to 
 me that these are more geared towards people trying to debug problems 
 with their own projects, and not so much automate compiler tests. (more 
 like AVR studio, too)

The goal is for both. Simulavr is already being used for testing the
compiler and for avr-libc.
 
 Most of the code there is to handle all sorts of peripherals that can 
 be found on avr microcontrollers, as is to be expected from full 
 emulators. However, my idea is much simpler: it is probably just the 
 size of decoder.cpp of simulavrxx, re-written in plain C. This should 
 make it really easy to port it to any platform (cygwin, etc.).

So now we're talking about simulavr again. Simulavr is written in C and
can at least be built for Cygwin. But it's unmaintained. The goal is to
get simulavrxx working for Cygwin AND for running the GCC test suite.
 
 The major advantage is that we are _not_ trying to emulate a specific 
 avr model, and as such we can do all sorts of hacks to help the test / 
 benchmark suite as best as we can.
 
 We can allow the program to write to files on the host. We can measure 
 accurate cycle timings and dump the results in a convenient way for the 
 benchmark suite. We can emulate an AVR with 8Mb of flash and 2Mb of 
 RAM. We can force the start/stop cycle counter instructions to use 
 zero cycles, so that they don't interfere with the counts themselves. 
 We can report exit codes from the avr code, so that the test suite can 
 use them to determine success/failure in some of the tests. Etc., etc.
 
 So, I don't think I'm reinventing the wheel here. This is getting to a 
 point where I'm very tempted to just do it (it seems so simple) and 
 publish it so that I can show what I mean...

And this will further split the community.

Why can't we work together, instead of always separately?

Eric Weddington




Re: AVR Benchmark Test Suite [was: RE: [avr-gcc-list] GCC-AVR Register optimisations]

2008-01-13 Thread John Regehr
I'll just plug Avrora again:

  http://compilers.cs.ucla.edu/avrora/

It runs on many platforms (written all in Java), is quite fast, and is 
well designed.  Best of all it is easy to extend, you just add monitors 
that can be configured to receive a wide variety of callbacks about 
program events such as memory operations, I/O operations, execution of 
different kinds of instructions, interrupts, etc.

 focus
 on AVR-specific code, and GCC-specific AVR code at that.

Definitely.  If people want to test avr-gcc against other compilers, or 
compare AVR to other architectures, that's a separate exercise.

MiBench is an aging but useful collection of embedded C codes:

  http://www.eecs.umich.edu/mibench/

 John, I would welcome publicly available code from TinyOS, but I would
 need to be already compiled with nesc, so that way we just have straight
 C that we can feed into avr-gcc.

Sure, this is easy.  It'll target ATmega128 only, however.

Re. floating point I believe that the papabench codes do a lot of this:

http://www.irit.fr/recherches/ARCHI/MARCH/rubrique.php3?id_rubrique=97

This is code extracted from the Paparazzi UAV project, which uses an 
ATmega for onboard flight control.

 There needs to be some consensus on what we measure, how we measure it,
 what output files we want generated, and hopefully some way to
 automatically generate composite results. I'm certainly open to anything
 in this area.

Code size and static RAM consumption are obvious.  Some sort of throughput 
metric is useful.  For interrupt-driven codes, my group often uses 
processor duty cycle as a measure of efficiency.  This is the % of time 
that the CPU is not in a sleep mode.  Dynamic stack memory consumption is 
good, though this is not a very consistent metric for interrupt-driven 
codes since in a short simulation run the worst-case stack usage is 
unlikely to be encountered.  Perhaps adding up the stack memory usage of 
main + all interrupts would be better.
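
Two of the metrics above reduce to simple arithmetic once a simulator reports the raw numbers. The sketch below assumes the simulator can report active and sleeping cycle counts and a per-context maximum stack depth; the function names and the sample figures are invented for illustration and do not come from Avrora or any other tool.

```python
def duty_cycle(active_cycles, sleep_cycles):
    """Percentage of time the CPU is not in a sleep mode."""
    total = active_cycles + sleep_cycles
    if total == 0:
        raise ValueError("no cycles were simulated")
    return 100.0 * active_cycles / total


def stack_bound(main_depth, isr_depths):
    """Conservative stack estimate: main plus every interrupt handler.

    This adds the worst observed depth of each context, which is the
    'main + all interrupts' figure suggested above; it assumes each
    handler is active at most once at a time.
    """
    return main_depth + sum(isr_depths)


# Invented sample figures: 250 active cycles, 750 asleep.
print(duty_cycle(250, 750))        # 25.0
print(stack_bound(40, [12, 20]))   # 72
```

The summed bound is deliberately pessimistic: it will usually exceed anything seen in a short run, which is the point for interrupt-driven code.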

John Regehr




RE: Simulator for GCC Testing [was: RE: [Fwd: Re: [avr-gcc-list]GCC-AVR Register optimisations]]

2008-01-13 Thread Paulo Marques

Quoting Weddington, Eric [EMAIL PROTECTED]:

-Original Message-
[...]
I've looked at the code for both simulavr and simulavrxx. It seems to
me that these are more geared towards people trying to debug problems
with their own projects, and not so much automate compiler tests. (more
like AVR studio, too)


The goal is for both. Simulavr is already being used for testing the
compiler and for avr-libc.


Yes, but simulavr isn't being maintained any more, right?

What's more, one of my points is that a single piece of software 
handling both cases might be harder to maintain than a simple simulator 
and a full-featured hardware emulator as separate projects. This is not 
my strongest point, though.



Most of the code there is to handle all sorts of peripherals that can
be found on avr microcontrollers, as is to be expected from full
emulators. However, my idea is much simpler: it is probably just the
size of decoder.cpp of simulavrxx, re-written in plain C. This should
make it really easy to port it to any platform (cygwin, etc.).


So now we're talking about simulavr again. Simulavr is written in C and
can at least be built for Cygwin. But it's unmaintained. The goal is to
get simulavrxx working for Cygwin AND for running the GCC test suite.


No, simulavrxx has a _lot_ of code to handle peripherals. In fact it is 
the majority of the code of simulavrxx. The CPU part is just 4897 
lines out of a total of 20586.



[...]

And this will further split the community.


You make it sound like a bad thing.


Why can't we work together, instead of always separately?


Because it's not the way open source works. (or at least not the way it 
works best)


Open source works the other way: projects blossom or die by natural 
selection, with the advantage that we can pick the best parts of each 
project and mix and match as we please (doing a bit of artificial 
selection in the process).


So, I don't like the way simulavrxx works. The pre-decoding of flash 
into instances of opcode classes doesn't seem like a good idea to me, 
and it is a fundamental concept of simulavrxx. That _is_ my personal 
opinion and everyone is entitled to his own.
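
For readers unfamiliar with the distinction: the alternative to pre-decoding flash into opcode objects is to decode each 16-bit word at the moment it is fetched. The toy interpreter below illustrates that style only. It is not taken from avrtest or simulavrxx, it handles just NOP, LDI, ADD and BREAK, it ignores the status flags, and treating BREAK as "stop the run" is an assumption made for the example. The function name run is made up.

```python
def run(flash, max_steps=1000):
    """Execute 16-bit opcode words from flash; return (registers, cycles)."""
    regs = [0] * 32
    pc = 0
    cycles = 0
    for _ in range(max_steps):
        op = flash[pc]          # fetch
        pc += 1
        if op == 0x0000:                      # NOP
            cycles += 1
        elif (op & 0xF000) == 0xE000:         # LDI Rd,K: 1110 KKKK dddd KKKK
            rd = 16 + ((op >> 4) & 0x0F)      # LDI only reaches r16..r31
            k = ((op >> 4) & 0xF0) | (op & 0x0F)
            regs[rd] = k
            cycles += 1
        elif (op & 0xFC00) == 0x0C00:         # ADD Rd,Rr: 0000 11rd dddd rrrr
            rd = (op >> 4) & 0x1F
            rr = ((op >> 5) & 0x10) | (op & 0x0F)
            regs[rd] = (regs[rd] + regs[rr]) & 0xFF   # flags not modelled
            cycles += 1
        elif op == 0x9598:                    # BREAK: used here to stop
            break
        else:
            raise ValueError(f"unhandled opcode 0x{op:04X} at word {pc - 1}")
    return regs, cycles


# ldi r16,5 ; ldi r17,7 ; add r16,r17 ; break
regs, cycles = run([0xE005, 0xE017, 0x0F01, 0x9598])
print(regs[16], cycles)   # 12 3
```

The trade-off is the usual one: decoding on every fetch repeats work for code in loops, while pre-decoding costs memory and start-up time and adds a layer of objects between the flash image and the executor.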


Now, I don't mind at all discussing technical merits of the idea, 
especially if I can show my own code to use as a counter argument. So, 
I was trying to delay my replies (including the reply to Joerg Wunsch) 
to a point where I could show some code instead of the natural 
handwaving that these kinds of discussions inevitably degenerate into.


Attached is a beta version of avrtest. It also has a small "Hello, 
World!" demo that actually runs under avrtest and produces the expected 
output.


It is a single file of C code, 1391 lines long. It must still have some 
bugs in there, but I was actually quite surprised when, after writing 
the whole thing, it ran "Hello, World!" on the first attempt.


I used the lookup_opcode function from simulavrxx to save some time, 
but if the author has a problem with me using it (although I'm legally 
entitled to use it), I'll write my own too, because I believe that 
respecting the author's wishes is more important than respecting the 
actual license.


I'll be polishing it up a bit over the next few days because it still 
lacks a few things:

- a few opcodes are still not implemented at all
- RAMPx registers still need to be handled in a few places
- a few hidden bugs (hopefully) that need to be chased down and shot

So, please, instead of just dismissing this project as a "vanity 
hacker's project" or "NIH syndrome", just take a look at it. If you 
still don't like it, that's fine. But at least now we can talk more 
about technical merits and do less handwaving.


At least, I bet you can easily compile it under cygwin ;)

--
Paulo Marques






avrtest.tar.gz
Description: GNU Zip compressed data


Re: AVR Benchmark Test Suite [was: RE: [avr-gcc-list] GCC-AVR Register optimisations]

2008-01-13 Thread Dave N6NZ



Weddington, Eric wrote:


Hi John, Dave, others,

Here are some random thoughts about a benchmark test suite:

- GCC has a page on benchmarks:
http://gcc.gnu.org/benchmarks/
However all of those are geared towards larger processors and host
systems. There is a link to a benchmark that focuses on code size,
CSiBE, http://www.inf.u-szeged.hu/csibe/. Again, that benchmark is
geared towards larger processors.

This creates a need to have a benchmark that is geared towards 8-bit
microcontroller environments in general, and specifically for the AVR.

What would we like to test?

Code size for sure. Everyone always seems to be interested in code size.
There is an interest in seeing how the GCC compiler performs from one
version to the next, to see if optimizations have improved or if they
have regressed.


Which I would call regression tests, not benchmarks, per se.  Of 
performance regressions, I would guess that code size regressions under 
-Os are the #1 priority for the typical user.  (A friend is currently 
tearing his hair out over a code size regression in a commercial PIC C 
compiler -- he needs to release a minor firmware update to the field... 
but not even the original code fits his flash any more...)


It's worth drawing a distinction between benchmarks and regression 
tests.  They need to be written differently.  A regression test needs to 
sensitize a particular condition, and needs to be small enough to be 
debuggable. A benchmark needs to be realistic, which often makes them 
harder to debug. I say we need both.  The performance regression tests 
can easily roll into release criteria.  A suite of performance 
benchmarks is more useful as a confirmatory measure of goodness -- but 
actual mysteries in the aggregate score will most likely be chased with 
smaller tests.


My guess is that existing tests may help us a lot in the benchmark 
category, but the regression tests will require some elbow grease on our 
part to get a good set.  There's a good chance we can extract good 
regression tests from existing benchmark-sized tests.


A semi-related question is how many of these tests can be pushed 
upstream?  If we could get a handful of uCtlr-oriented code size 
regression tests packaged up so that the developers of the generic 
optimizer could run them as release criteria, it would, I would think, 
improve the overall quality of gcc for all uCtlr targets.




There is also an interest in comparing AVR compilers, such as how GCC
compares to IAR, Codevision or ImageCraft compilers.


Who is interested? gcc developers, as a means to keep gcc competitive? 
Or potential users?  The former is benchmarking, the latter is moving 
towards bench-marketing. Not that marketing is bad, but that sort of 
thing can be a distraction.  In any case, the tests that are meaningful 
here are the benchmark overall goodness test suite, not the targeted 
test suite.




And sometimes there is an interest in comparing AVR against other
microcontrollers, notably Microchip's PIC and TI's MSP430.


Different processor with same compiler?  Different processor with best 
compiler? -- Now this is beginning to sound like SPEC.




Because there are these different interests, it is challenging to come
up with appropriate code samples to showcase and benchmark these
different issues. But we could also implement this in stages, and focus
on AVR-specific code, and GCC-specific AVR code at that.


Clarity of classification is important.  Different buckets for different 
issues.




If we are going to put together a benchmark test suite, like other
benchmarks for GCC (for larger processors), then I would think that it
would be better to model it somewhat after those other benchmarks. I see
that they tend to use publicly available code, and a variety of
different types of applications.


For benchmarking, and bench-marketing, that's a good approach.  I'll be 
redundant and say those are probably not what you want to be debugging. 
It would make sense for what I'll call an "avr-gcc dashboard".  I see a 
web page with a bunch of bar graphs on it.  A summary bar at the top 
that is the weighted sum of individual test bars.  As an avr-gcc user, 
that kind of summary page would be very useful from one release to the 
next for setting expectations regarding performance on your own 
application. As an avr-gcc release master, it's a good dashboard for 
tracking progress and release-worthiness.
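
The summary bar described above is a weighted sum, which might be computed along these lines. This is a sketch only: the benchmark names, the ratios (new code size divided by a baseline size) and the weights are invented, and summary_score is a made-up name, not part of any existing tool.

```python
def summary_score(ratios, weights):
    """Weighted sum of per-benchmark ratios (e.g. new size / baseline size)."""
    missing = set(ratios) - set(weights)
    if missing:
        raise KeyError(f"no weight for: {sorted(missing)}")
    return sum(weights[name] * ratio for name, ratio in ratios.items())


# Invented figures: one benchmark unchanged, one at half its baseline size.
ratios = {"freertos": 1.0, "uip": 0.5}
weights = {"freertos": 0.5, "uip": 0.5}
print(summary_score(ratios, weights))   # 0.75
```

Using ratios against a fixed baseline, rather than raw byte counts, keeps one large benchmark from dominating the summary; the weights then express how much each application matters.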



We should have something similar. Some
suggested projects: FreeRTOS (for the AVR),

Sounds good.

uIP (however, we need to
pick a specific implementation of it for the AVR; I have a copy of
uIP-Crumb644),

Another good one.


the Atmel 802.15.4 MAC,

Need to check license on that one -- but a good choice otherwise


and the GCC version of the
Butterfly firmware. I also have a copy of the TI Competitive
Benchmark, which they, and other semiconductor companies, have used to
do comparisons between processors.
Not familiar with it.  Also, check the license. 

Re: AVR Benchmark Test Suite [was: RE: [avr-gcc-list] GCC-AVR Register optimisations]

2008-01-13 Thread John Regehr
 (A friend is currently tearing his hair out
 over a code size regression in a commercial PIC C compiler -- he needs to
 release a minor firmware update to the field... but not even the original code
 fits his flash any more...)

Embedded compiler rule #1: If you find a version of the compiler that 
works, keep a copy around for the life of the product.
 
John




Re: Simulator for GCC Testing [was: RE: [Fwd: Re: [avr-gcc-list] GCC-AVR Register optimisations]]

2008-01-13 Thread William Rivet
Eric, please share what info you have with me. I wouldn't mind running
whatever works through simulavr to see what it is about (I'm new to the
regression suite...I've only hacked GCC for its C++ front-end bits, and
certainly broke many things when I did :-p )

Given the recent interest, I'm trying to make some time to improve what I
can on simulavrxx since it was supposed to supersede the old simulavr.
If you CC the simulavr list, I'll also pick up on relevant threads a bit
quicker...I'm bad about following the avr list :-)

Cheers,

Bill



On Fri, 2008-01-11 at 20:14 -0700, Weddington, Eric wrote:
 
  -Original Message-
  From: 
  [EMAIL PROTECTED] 
  [mailto:[EMAIL PROTECTED]
  org] On Behalf Of Andrew Hutchinson
  Sent: Thursday, January 10, 2008 6:27 PM
  To: [EMAIL PROTECTED]
  Cc: avr-gcc-list@nongnu.org
  Subject: Re: [Fwd: Re: [avr-gcc-list] GCC-AVR Register optimisations]
  
  Here my input:
  
  For starters, gcc has a testsuite that can be used. It's not perfect but 
  it's quite demanding - even if we can't do all the tests.
  
  Do we have info on setting this up with a simulator? I did have some 
  instructions - once!
  
  After that I suggest some benchmark that would produce more normal 
  code and also give qualitative indications of performance 
  (size is easy, speed would be nice).
  
  Finally, regression tests using testcases and bug reports.
  
 
 Hi All,
 
 Some points:
 
 - Yes, GCC does have a Regression Test Suite, and it can execute for the
 AVR using the SimulAVR simulator. There are many, many tests that pass
 for the AVR. There are quite a few that don't, but for most of those
 failures that I have looked at, either the test needs fixing (because it
 assumes a 32-bit processor), or the tests don't apply to the AVR. Some
 work needs to be done to get the Regression Test Suite in shape for the
 AVR.
 
 - As mentioned, simulavr is known to work as a simulator for the GCC
 test suite. However, simulavr is not really maintained anymore. At the
 simulavr project on Savannah, there is a new code base called simulavrxx
 which is based on C++. This is maintained, but it could use help: It
 doesn't run on Cygwin yet, and AFAIK it cannot run the GCC Test Suite
 yet. Any help on this is deeply appreciated.
 https://savannah.nongnu.org/projects/simulavr
 
 I strongly recommend that the wheel not be reinvented. If people are
 interested in running the GCC Regression Test Suite, I would recommend
 using available tools, and improving the available tools rather than
 inventing new ones.
 
 I have instructions on running the GCC Regression Test Suite (from
 Bjoern Haase, IIRC). I have yet to run it myself, but others have done
 so successfully. However, there are reports about difficulties on
 running the test on Cygwin. I have heard that it is successful on Linux.
 There is a person from Belgium, Mike Stein, who has been running the GCC
 Test Suite for the AVR pretty much on a daily basis and he has been
 posting the results regularly on the gcc-testresults mailing list:
 http://gcc.gnu.org/ml/gcc-testresults/. Just search for "avr" to see
 the results. (It looks like he last did it in December.) I'll be
 attempting to run the GCC Test Suite probably sometime in Q1 2008. (I'm
 going to be busy in January and February.) Just email me if anyone is
 interested in the instructions.
 
 Note that running the GCC Test Suite is imperative for anyone who works
 on GCC, because in order to submit any patches to the GCC project, they
 require that the patch is tested with the Test Suite and that there are
 no new regressions. That's the main purpose of the test suite.
 
 I'll start a new thread about a benchmark suite...
 
 Thanks,
 Eric Weddington
 
 





RE: AVR Benchmark Test Suite [was: RE: [avr-gcc-list] GCC-AVR Register optimisations]

2008-01-13 Thread Weddington, Eric
 

 -Original Message-
 From: Dave N6NZ [mailto:[EMAIL PROTECTED] 
 Sent: Sunday, January 13, 2008 4:19 PM
 To: Weddington, Eric
 Cc: John Regehr; avr-gcc-list@nongnu.org
 Subject: Re: AVR Benchmark Test Suite [was: RE: 
 [avr-gcc-list] GCC-AVR Register optimisations]
 
 
 
 Weddington, Eric wrote:
 
 
 It's worth drawing a distinction between benchmarks and regression 
 tests.  They need to be written differently.  A regression test needs to 
 sensitize a particular condition, and needs to be small enough to be 
 debuggable. A benchmark needs to be realistic, which often makes them 
 harder to debug. I say we need both.  The performance regression tests 
 can easily roll into release criteria.  A suite of performance 
 benchmarks is more useful as a confirmatory measure of goodness -- but 
 actual mysteries in the aggregate score will most likely be chased with 
 smaller tests.

Ok. Regression tests should really fit within the GCC Regression Test
framework. I would rather not duplicate the work that they have there.
So I'm really looking for benchmark tests, under your definition. That's
not to say I want to ignore the regression tests. I just want to fill in
a gap that's missing for the AVR.

 
 A semi-related question is how many of these tests can be pushed 
 upstream?  If we could get a handful of uCtlr-oriented code size 
 regression tests packaged up so that the developers of the generic 
 optimizer could run them as release criteria, it would, I would think, 
 improve the overall quality of gcc for all uCtlr targets.

Nothing can be pushed upstream right now. As I mentioned in another post
in this thread, the AVR target is not that important in the eyes of the
overall members of the GCC project. I'm working diligently to change
that. But it's one of those "if we want something done, do it
ourselves" situations.

 
  
  There is also an interest in comparing AVR compilers, such as how GCC
  compares to IAR, Codevision or ImageCraft compilers.
 
 Who is interested? gcc developers, as a means to keep gcc competitive? 
 Or potential users?  The former is benchmarking, the latter is moving 
 towards bench-marketing. Not that marketing is bad, but that sort of 
 thing can be a distraction.  In any case, the tests that are meaningful 
 here are the benchmark overall goodness test suite, not the targeted 
 test suite.

As a gcc developer, I am interested in some kind of metric to keep gcc
competitive with other AVR compilers. Honestly, it seems that it is an
urban myth that IAR optimizes better than GCC. Is that really true?
For what applications? For what compiler switches? Eventually I'd like
to have something definitive to combat any FUD.

I don't want to get into bench-marketing. I would really like to have
something of value and meaningful, and not have to tweak numbers to
arrive at good results to show off. If AVR GCC sucks in an area, I don't
want to paper over it. I want to show it so we know what needs
improvement.

 
  
  And sometimes there is an interest in comparing AVR against other
  microcontrollers, notably Microchip's PIC and TI's MSP430.
 
 Different processor with same compiler?  Different processor with best 
 compiler? -- Now this is beginning to sound like SPEC.

Well, lofty goals for sure. I don't want to get outside of the 8-bit
microcontroller realm. I certainly want to do first things first. But I
think it might be interesting, at some point in the future, if some of
those things could be achieved.
 
  
  If we are going to put together a benchmark test suite, like other
  benchmarks for GCC (for larger processors), then I would think that it
  would be better to model it somewhat after those other benchmarks. I see
  that they tend to use publicly available code, and a variety of
  different types of applications.
 
 For benchmarking, and bench-marketing, that's a good 
 approach.  I'll be 
 redundant and say those are probably not what you want to be 
 debugging. 
 It would make sense for what I'll call a avr-gcc dashboard. 
  I see a 
 web page with a bunch of bar graphs on it.  A summary bar at the top 
 that is the weighted sum of individual test bars.  As an 
 avr-gcc user, 
 that kind of summary page would be very useful from one 
 release to the 
 next for setting expectations regarding performance on your own 
 application. As an avr-gcc release master, it's a good dashboard for 
 tracking progress and release worthy-ness.

That's definitely the idea.
 
 
  the Atmel 802.15.4 MAC,
 Need to check license on that one -- but a good choice otherwise

:-)
 
  and the GCC version of the
  Butterfly firmware. I also have a copy of the TI Competitive
  Benchmark, which they, and other semiconductor companies, have used to
  do comparisons between processors.
 Not familiar with it.  Also, check the license.  Processor manufacturers
 (like, oh, for instance, *all* the several I have worked for) are very
 touchy about benchmarks and benchmark

Re: Simulator for GCC Testing [was: RE: [Fwd: Re: [avr-gcc-list] GCC-AVR Register optimisations]]

2008-01-12 Thread Paulo Marques

Quoting Weddington, Eric [EMAIL PROTECTED]:

[...]
Hi All,


Hi,


Some points:

- Yes, GCC does have a Regression Test Suite, and it can execute for the
AVR using the SimulAVR simulator. There are many, many tests that pass
for the AVR. There are quite a few that don't, but for most of the
failures that I have looked at, either the test needs fixing (because it
assumes a 32-bit processor) or the test doesn't apply to the AVR. Some
work needs to be done to get the Regression Test Suite in shape for the
AVR.

- As mentioned, simulavr is known to work as a simulator for the GCC
test suite. However, simulavr is not really maintained anymore. At the
simulavr project on Savannah, there is a new code base called simulavrxx
which is based on C++. This is maintained, but it could use help: It
doesn't run on Cygwin yet, and AFAIK it cannot run the GCC Test Suite
yet. Any help on this is deeply appreciated.
https://savannah.nongnu.org/projects/simulavr

I strongly recommend that the wheel not be reinvented. If people are
interested in running the GCC Regression Test Suite, I would recommend
using available tools, and improving the available tools rather than
inventing new ones.


I've looked at the code for both simulavr and simulavrxx. It seems to 
me that these are more geared towards people trying to debug problems 
with their own projects, and not so much automate compiler tests. (more 
like AVR studio, too)


Most of the code there is to handle all sorts of peripherals that can 
be found on avr microcontrollers, as is to be expected from full 
emulators. However, my idea is much simpler: it is probably just the 
size of decoder.cpp of simulavrxx, re-written in plain C. This should 
make it really easy to port it to any platform (cygwin, etc.).


The major advantage is that we are _not_ trying to emulate a specific 
avr model, and as such we can do all sorts of hacks to help the test / 
benchmark suite as best as we can.


We can allow the program to write to files on the host. We can measure 
accurate cycle timings and dump the results in a convenient way for the 
benchmark suite. We can emulate an AVR with 8Mb of flash and 2Mb of 
RAM. We can force the start/stop cycle counter instructions to use 
zero cycles, so that they don't interfere with the counts themselves. 
We can report exit codes from the avr code, so that the test suite can 
use them to determine success/failure in some of the tests. Etc., etc.


So, I don't think I'm reinventing the wheel here. This is getting to a 
point where I'm very tempted to just do it (it seems so simple) and 
publish it so that I can show what I mean...


--
Paulo Marques





Re: Simulator for GCC Testing [was: RE: [Fwd: Re: [avr-gcc-list] GCC-AVR Register optimisations]]

2008-01-12 Thread Gre7g Luterman
Writing this sounds intriguing and amusing to me.  If
you need more coders, add me to the list.  I'd be
interested in contributing.

Gre7g

--- Paulo Marques [EMAIL PROTECTED] wrote:

 Quoting Weddington, Eric [EMAIL PROTECTED]:
 So, I don't think I'm reinventing the wheel here. This is getting to a
 point where I'm very tempted to just do it (it seems so simple) and
 publish it so that I can show what I mean...




Re: Simulator for GCC Testing [was: RE: [Fwd: Re: [avr-gcc-list] GCC-AVR Register optimisations]]

2008-01-12 Thread Joerg Wunsch
Paulo Marques [EMAIL PROTECTED] wrote:

 I've looked at the code for both simulavr and simulavrxx. It seems
 to me that these are more geared towards people trying to debug
 problems with their own projects, and not so much automate compiler
 tests. (more like AVR studio, too)

No, that's one of their points but not the only one.  simulavr is
already in use by the avr-libc project for their own testsuite, and as
Eric told you, it used to be in use for the GCC testsuite as well.
This is done by using the simulator on a standalone basis (as opposed
to coupling it to GDB, as is the case for interactive debugging).

I don't think another simulator will do any good.  My opinion is that
simulavrxx particularly needs help on the documentation front (the
original author isn't very fluent in writing English), and it might
need some help in upgrading some of the implemented features for more
recent AVRs (but except for adding the ATmega256x architecture, this
is not of much relevance for plain compiler or library tests).

Another simulator will only further split the development forces
rather than bundle them, it will further confuse the users about why
there's a multitude of simulators (with none of them being really
good), it will introduce further bugs rather than fixing those that
are already there.  Some of the CPU details are not so obvious from
the datasheet, so since a further simulator is likely to ignore all
the experience the other simulator writers have already collected, it
stands a good chance of simulating wrong in some situations.

Excuse me, but it sounds much more like a vanity hacker's project to me
than any serious help.  NIH, so to speak.

-- 
cheers, Jorg   .-.-.   --... ...--   -.. .  DL8DTL

http://www.sax.de/~joerg/                        NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)




Re: [Fwd: Re: [avr-gcc-list] GCC-AVR Register optimisations]

2008-01-11 Thread Wouter van Gulik

Andrew Hutchinson schreef:


PS Please report as a bug - gcc should be better than this.



I did, it got number 34737.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34737

I hope all info is ok.
I wanted to add a link to your e-mail, but it's not in the list archives 
yet.


Wouter





Re: [Fwd: Re: [avr-gcc-list] GCC-AVR Register optimisations]

2008-01-11 Thread Paulo Marques

Dave N6NZ wrote:

[...]
The test matrix deserves some thought.  The same test might have 
different pass/fail criteria under different options.  For example, 
imagine a suite of 10 tests.  You might say 10/10 must show zero code 
growth under -Os, 7/10 show no speed degradation under -Os, the same 10 
tests must show zero slow down for 10/10 under -O3, 6/10 no code growth 
under -O3.  (Just an example, may not be realistic.)
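
The per-option pass counts in the example above could be checked mechanically. What follows is only a minimal C sketch, assuming a hypothetical `bench_result` record that holds code size and cycle counts for a baseline compiler and for the compiler under test. The type, the function names and the thresholds are illustrative assumptions, not part of any existing test harness.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one test built with one option set (e.g. -Os). */
typedef struct {
    unsigned long size_before;    /* code size in bytes, baseline compiler */
    unsigned long size_after;     /* code size in bytes, compiler under test */
    unsigned long cycles_before;  /* cycle count, baseline compiler */
    unsigned long cycles_after;   /* cycle count, compiler under test */
} bench_result;

/* Number of tests whose code size did not grow. */
size_t count_no_growth(const bench_result *r, size_t n)
{
    size_t ok = 0;
    assert(r != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (r[i].size_after <= r[i].size_before)
            ok++;
    return ok;
}

/* Number of tests whose cycle count did not increase. */
size_t count_no_slowdown(const bench_result *r, size_t n)
{
    size_t ok = 0;
    assert(r != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (r[i].cycles_after <= r[i].cycles_before)
            ok++;
    return ok;
}

/* Returns 1 when at least min_size tests show no code growth and at
 * least min_speed tests show no slowdown; otherwise returns 0. */
int criteria_met(const bench_result *r, size_t n,
                 size_t min_size, size_t min_speed)
{
    return count_no_growth(r, n) >= min_size &&
           count_no_slowdown(r, n) >= min_speed;
}
```

With the example numbers above, the -Os row would be checked as `criteria_met(os_results, 10, 10, 7)` and the -O3 row as `criteria_met(o3_results, 10, 6, 10)`, where the two arrays are assumed to come from whatever tool collects the measurements.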


One thing that might help us a lot here, is a cooperative simulator.

Something along these lines:

 - the simulator would be used to test gcc, so most hardware simulation 
could be ignored. The test code could be written to only use what the 
simulator offered.


 - the simulator could have special ports to control output and 
generate statistics to test regressions. Something along these lines 
(for instance):


   - mem address 0xFF is the control port. Writing 0xEn starts cycle 
counter 'n' and writing 0xFn stops cycle counter 'n' and outputs the result


   - writing a value less than 0x20 to the control port stops the 
emulator and returns that as the emulator exit code


   - writing to address 0xFE sends data to standard output

   - reading from address 0xFE reads one byte from standard input

The possibilities for commands and emulator control from the actual code 
being executed are endless, and these are just a few ideas off the top of 
my head.
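
To make the port idea concrete, here is a rough C sketch of how a simulator's store handler might decode those writes. It is an illustration under stated assumptions, not code from any existing simulator: `sim_state`, `sim_init` and `sim_store` are hypothetical names, the report format is made up, and the read side of address 0xFE (standard input) is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define CTRL_PORT    0xFFu  /* control port address, as proposed above */
#define STDIO_PORT   0xFEu  /* a write here goes to standard output */
#define NUM_COUNTERS 16     /* 'n' is the low nibble of the command byte */

/* Hypothetical simulator state; the CPU core is assumed to advance 'cycles'. */
typedef struct {
    uint64_t cycles;               /* cycles executed so far */
    uint64_t start[NUM_COUNTERS];  /* cycle count when counter n was started */
    uint64_t last_elapsed;         /* result of the most recent counter stop */
    int      running;              /* cleared when the program asks to exit */
    int      exit_code;            /* exit code requested by the program */
    FILE    *out;                  /* where program output and reports go */
} sim_state;

void sim_init(sim_state *s, FILE *out)
{
    assert(s != NULL && out != NULL);
    s->cycles = 0;
    for (int i = 0; i < NUM_COUNTERS; i++)
        s->start[i] = 0;
    s->last_elapsed = 0;
    s->running = 1;
    s->exit_code = 0;
    s->out = out;
}

/* Handle a data-space store. Returns 1 if the store hit a special port,
 * 0 if the caller should treat it as an ordinary RAM write. */
int sim_store(sim_state *s, uint16_t addr, uint8_t value)
{
    assert(s != NULL);
    if (addr == STDIO_PORT) {
        fputc(value, s->out);
        return 1;
    }
    if (addr != CTRL_PORT)
        return 0;

    unsigned n = value & 0x0Fu;
    if ((value & 0xF0u) == 0xE0u) {           /* 0xEn: start counter n */
        s->start[n] = s->cycles;
    } else if ((value & 0xF0u) == 0xF0u) {    /* 0xFn: stop counter n, report */
        s->last_elapsed = s->cycles - s->start[n];
        fprintf(s->out, "counter %u: %llu cycles\n",
                n, (unsigned long long)s->last_elapsed);
    } else if (value < 0x20u) {               /* below 0x20: stop, exit code */
        s->running = 0;
        s->exit_code = value;
    }
    /* Other values are ignored in this sketch. */
    return 1;
}
```

The main loop of such a simulator would keep executing instructions while `running` is non-zero and then return `exit_code` to the host, which is what lets a test script treat the AVR program like an ordinary process.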


Writing a simulator like this is pretty easy (I've written one for the 
CPU on my watch [1]) because most of the work in doing an emulator is 
writing the hardware emulation part. The CPU is the easy part, 
especially with a CPU with such a regular instruction set.


With this emulator we could build test scripts that would run the 
generated code under the emulator and could compare cycle counts, code 
size, return values, etc.


The idea is to allow avr code to run almost as any other unix process. 
What do you guys think? Is it worth doing? (and I'm volunteering for the 
initial work, at least)


--
Paulo Marques
Software Development Department - Grupo PIE, S.A.
Phone: +351 252 290600, Fax: +351 252 290601
Web: www.grupopie.com

Anything is possible, unless it's not.

[1] http://sourceforge.net/projects/virtualdatalink/ and
http://tech.groups.yahoo.com/group/timexdatalinkusbdevelop/





Simulator for GCC Testing [was: RE: [Fwd: Re: [avr-gcc-list] GCC-AVR Register optimisations]]

2008-01-11 Thread Weddington, Eric
 

 -Original Message-
 From: 
 [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED]
 org] On Behalf Of Andrew Hutchinson
 Sent: Thursday, January 10, 2008 6:27 PM
 To: [EMAIL PROTECTED]
 Cc: avr-gcc-list@nongnu.org
 Subject: Re: [Fwd: Re: [avr-gcc-list] GCC-AVR Register optimisations]
 
 Here my input:
 
 For starters gcc has a testsuite that can be used. It's not perfect but
 it's quite demanding - even if we can't do all the tests.
 
 Do we have info on setting this up with a simulator? I did have some
 instructions - once!
 
 After that I suggest some benchmark that would produce more normal
 code and also give qualitative indications of performance (size is easy,
 speed would be nice).
 
 Finally, regression tests using testcases and bug reports.
 

Hi All,

Some points:

- Yes, GCC does have a Regression Test Suite, and it can execute for the
AVR using the SimulAVR simulator. There are many, many tests that pass
for the AVR. There are quite a few that don't, but for most of the
failures that I have looked at, either the test needs fixing (because it
assumes a 32-bit processor) or the test doesn't apply to the AVR. Some
work needs to be done to get the Regression Test Suite in shape for the
AVR.

- As mentioned, simulavr is known to work as a simulator for the GCC
test suite. However, simulavr is not really maintained anymore. At the
simulavr project on Savannah, there is a new code base called simulavrxx
which is based on C++. This is maintained, but it could use help: It
doesn't run on Cygwin yet, and AFAIK it cannot run the GCC Test Suite
yet. Any help on this is deeply appreciated.
https://savannah.nongnu.org/projects/simulavr

I strongly recommend that the wheel not be reinvented. If people are
interested in running the GCC Regression Test Suite, I would recommend
using available tools, and improving the available tools rather than
inventing new ones.

I have instructions on running the GCC Regression Test Suite (from
Bjoern Haase, IIRC). I have yet to run it myself, but others have done
so successfully. However, there are reports of difficulties running
the tests on Cygwin. I have heard that it is successful on Linux.
There is a person from Belgium, Mike Stein, who has been running the GCC
Test Suite for the AVR pretty much on a daily basis and he has been
posting the results regularly on the gcc-testresults mailing list:
http://gcc.gnu.org/ml/gcc-testresults/. Just search for avr to see
the results. (It looks like he last did it in December.) I'll be
attempting to run the GCC Test Suite probably sometime in Q1 2008. (I'm
going to be busy in January and February.) Just email me if anyone is
interested in the instructions.

Note that running the GCC Test Suite is imperative for anyone who works
on GCC, because in order to submit any patches to the GCC project, they
require that the patch is tested with the Test Suite and that there are
no new regressions. That's the main purpose of the test suite.

I'll start a new thread about a benchmark suite...

Thanks,
Eric Weddington






AVR Benchmark Test Suite [was: RE: [avr-gcc-list] GCC-AVR Register optimisations]

2008-01-11 Thread Weddington, Eric
 -Original Message-
 From: Dave N6NZ [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, January 10, 2008 1:35 PM
 To: gEDA user mailing list (by accident)
 Subject: Re: [avr-gcc-list] GCC-AVR Register optimisations
 
 
 Hi,
 Over the course of years I have worked in and managed various
 different hardware and software validation teams.  This is probably a
 good way for me to contribute, since I'm not a compiler back-end guru,
 but I have relevant experience writing and reviewing test plans.  Some
 of those test plans even targeted C compilers :)
 
 -dave
  

 -Original Message-
 From: 
 [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED]
 org] On Behalf Of John Regehr
 Sent: Thursday, January 10, 2008 4:34 PM
 To: avr-gcc-list@nongnu.org
 Subject: RE: [avr-gcc-list] GCC-AVR Register optimisations
 
  - I, and others, are very interested in what you are using to test your
  proposed changes. I have plans to put together an AVR Benchmark Suite,
  consisting of a variety of publicly available programs that can be used
  to test the compiler performance over time. It definitely needs to have
  different types of programs, and publicly available programs so there
  are no issues with distributing such a Benchmark Suite. I welcome any
  collaboration on this.
 
 This is excellent and much needed, Eric.
 
 I can help with some TinyOS applications.
 
 Hopefully the benchmarks can be packaged up with a simulator in order to
 get dynamic information as well as static.  I use Avrora (first google hit
 for avrora) for everything, though it supports relatively few chips.
 

Hi John, Dave, others,

Here are some random thoughts about a benchmark test suite:

- GCC has a page on benchmarks:
http://gcc.gnu.org/benchmarks/
However all of those are geared towards larger processors and host
systems. There is a link to a benchmark that focuses on code size,
CSiBE, http://www.inf.u-szeged.hu/csibe/. Again, that benchmark is
geared towards larger processors.

This creates a need to have a benchmark that is geared towards 8-bit
microcontroller environments in general, and specifically for the AVR.

What would we like to test?

Code size for sure. Everyone always seems to be interested in code size.
There is an interest in seeing how the GCC compiler performs from one
version to the next, to see if optimizations have improved or if they
have regressed.

There is also an interest in comparing AVR compilers, such as how GCC
compares to IAR, Codevision or ImageCraft compilers.

And sometimes there is an interest in comparing AVR against other
microcontrollers, notably Microchip's PIC and TI's MSP430.

Because there are these different interests, it is challenging to come
up with appropriate code samples to showcase and benchmark these
different issues. But we could also implement this in stages, and focus
on AVR-specific code, and GCC-specific AVR code at that.

If we are going to put together a benchmark test suite, like other
benchmarks for GCC (for larger processors), then I would think that it
would be better to model it somewhat after those other benchmarks. I see
that they tend to use publicly available code, and a variety of
different types of applications. We should have something similar. Some
suggested projects: FreeRTOS (for the AVR), uIP (however, we need to
pick a specific implementation of it for the AVR; I have a copy of
uIP-Crumb644), the Atmel 802.15.4 MAC, and the GCC version of the
Butterfly firmware. I also have a copy of the TI Competitive
Benchmark, which they, and other semiconductor companies, have used to
do comparisons between processors. I also have a copy of some Atmel
internal AES code. I believe that this code is publicly available in
some kits that Atmel offers, but I need to do some double-checking to
make sure that it is public. 

John, I would welcome publicly available code from TinyOS, but it would
need to be already compiled with nesC, so that we just have straight
C that we can feed into avr-gcc. I've been aware of Avrora for several
years. If it is useful for this kind of work, I'm open-minded about it. I
just want to make sure that it works and is somewhat easy to use.

Does anyone have other suggestions on projects to include in the
Benchmark? One area that seems to be lacking is some application that
uses floating point. Any help finding an application in this area
would be much appreciated.

There needs to be some consensus on what we measure, how we measure it,
what output files we want generated, and hopefully some way to
automatically generate composite results. I'm certainly open to anything
in this area. I would think that we need to be as open as possible on
this, with documentation (minimal, it can be a text file) on what are
our methods, how the results were arrived at, but importantly that the
secondary/generated files be available for others to review and verify
the results.

On practicalities: I am certainly willing to host the benchmark test

Re: [avr-gcc-list] GCC-AVR Register optimisations

2008-01-10 Thread Wouter van Gulik

Registers 17 downwards are  call saved and push/popped in prescribed
order by prolog/epilog functions. Also R28,29 is potential frame pointer
and so that is best left alone. So the key registers are: R18-R27   R30,31



Note that in some cases it could be very interesting to use r27, or Y, 
register.


Consider this example:

char *x;
volatile int y;

void foo(char *p)
{
    y += *p;
}

void main(void)
{
    char *p1 = x;
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
    foo(p1++);
}


This will generate very bad code.
/* prologue: frame size=0 */
push r14
push r15
push r16
push r17
/* prologue end (size=4) */
lds r24,x
lds r25,(x)+1
movw r16,r24
subi r16,lo8(-(1))
sbci r17,hi8(-(1))
call foo
movw r14,r16
sec
adc r14,__zero_reg__
adc r15,__zero_reg__
movw r24,r16
call foo
movw r16,r14
subi r16,lo8(-(1))
sbci r17,hi8(-(1))
movw r24,r14
call foo
movw r14,r16
sec
adc r14,__zero_reg__
adc r15,__zero_reg__
movw r24,r16
call foo
movw r16,r14
subi r16,lo8(-(1))
sbci r17,hi8(-(1))
movw r24,r14
call foo
etc..

A more optimal scheme would be
call foo
movw r24, r16
adiw r24, 1
movw r16, r24
call foo
etc..
Using the r24 capability to do a 16 bit increment

But in this special case there is no frame pointer. So we could use 
R28 to store instead of R16. Then we can add on r28 and do something 
like this:


 call foo
 adiw r28, 1
 movw r24, r28
 call foo

So yes using R28 as last resort looks like a sane thing.
Unless there is no frame pointer at all, and there is a need for 16 (or 
32 bit) arithmetic on saved registers. This is probably incredibly 
difficult. But I thought to mention it anyway


HTH,

Wouter

ps.

Writing it like foo(p); p++; will produce better code?!? I will file a 
bug report for this.



With the order, there are several problems:

1) Initial register allocation fragments the register set. For example,
allocating R25 will prevent R24-25 being used as a 16-bit register and
prevent R22-25 and R24-27 being used as 32-bit registers. The gcc register
allocator does not seem to overcome this fragmentation.

2) The situation is made worse by the order of the 16-bit+ registers used for
call and return values, which are allocated in reverse order, e.g.
R24-25, R22-25, R18-25.  This means that the function parameters or
return values are rarely in the right place, except for 16-bit values.

3) Allocating a byte to an odd-numbered register precludes it being extended
to a 16-bit value without a move.
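
The fragmentation described in point 1 can be shown with a small bitmask model. This is only an illustrative C sketch, not code from gcc: it assumes that multi-byte values start at an even register number (consistent with the R22-25 and R24-27 example), and the names `pair_free`, `quad_free` and `count_free_pairs` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit r of 'used' is set when register Rr has already been allocated. */

/* 1 if the 16-bit pair R(base+1):R(base) is entirely free; base must be even. */
int pair_free(uint32_t used, unsigned base)
{
    assert(base % 2 == 0 && base <= 30);
    return (used & (UINT32_C(3) << base)) == 0;
}

/* 1 if the four registers R(base)..R(base+3) are entirely free; base must be even. */
int quad_free(uint32_t used, unsigned base)
{
    assert(base % 2 == 0 && base <= 28);
    return (used & (UINT32_C(0xF) << base)) == 0;
}

/* Count the free 16-bit pairs whose even base lies in [lo, hi]. */
unsigned count_free_pairs(uint32_t used, unsigned lo, unsigned hi)
{
    unsigned n = 0;
    assert(lo % 2 == 0 && hi <= 30);
    for (unsigned b = lo; b <= hi; b += 2)
        if (pair_free(used, b))
            n++;
    return n;
}
```

Marking only R25 as used (`UINT32_C(1) << 25`) makes `pair_free(used, 24)`, `quad_free(used, 22)` and `quad_free(used, 24)` all return 0: a single byte allocation removes one 16-bit slot and two 32-bit slots from consideration.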

So, I tried creating an order which would preserve the contiguous
register space and avoid the above issues as much as possible.
This is what I ended up with:

R18,26,22,30,20,24,19,21,23,25,27,31,28,29, \
   17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,\


The result is a 1.25% saving in code size for a simple mixed
application. Pretty good for such a simple change!

For more floating point, the saving might well be higher as it demands
more contiguous 32 bit registers.

On the same basis, the current order of the call-saved registers R2-R17,
which is dictated by the (-mcall-prologues) prologue and limits further
improvement, is clearly imperfect.  These are used less frequently, though
their cost is much higher. So it's difficult to gauge the impact. I might
take a look at some intense floating point functions to see if it is worth
pursuing reordering these too.


Andy











Re: [avr-gcc-list] GCC-AVR Register optimisations

2008-01-10 Thread Wouter van Gulik

Wouter van Gulik schreef:



Note that in some cases it could be very interesting to use r27, or Y, 
register.




Should have written R28 of course.

Since gcc seems down at the moment I did some more testing.

Now consider this example:
void main(void)
{
    char *p = x;
    foo(p); p+=65;
    foo(p); p+=65;
    foo(p); p+=65;
    foo(p); p+=65;
    foo(p); p+=65;
    foo(p); p+=65;
    foo(p); p+=65;
    foo(p); p+=65;
    foo(p); p+=65;
    foo(p); p+=65;
}
This must be done using a subi/sbci pair.

But the compiler now seems to realize that p is a constant offset to x. 
So we now get:


main:
/* prologue: frame size=0 */
push r16
push r17
/* prologue end (size=2) */
lds r16,x
lds r17,(x)+1
movw r24,r16
call foo
movw r24,r16
subi r24,lo8(-(65))
sbci r25,hi8(-(65))
call foo
movw r24,r16
subi r24,lo8(-(130))
sbci r25,hi8(-(130))

Here x is stored in r16 and the cumulative offset is added to R24

But if the compiler can realize this... Then why not do this for adds 
within the adiw range?!?

So for p++/p+=1 we would get something like:

movw r24, r16
adiw r24, 1
call foo
movw r24, r16
adiw r24, 2
etc..

This is just as small as the earlier suggested use of R28!
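
The choice between the two forms follows from the operand limits of the instructions: adiw only accepts the pairs starting at R24, R26, R28 and R30 with a constant from 0 to 63, while subi/sbci accept any constant but only R16-R31. The C sketch below spells out that rule. It is a model for discussion, not how gcc's AVR back end is written; `add16_form` and `choose_add16` are hypothetical names.

```c
#include <assert.h>

typedef enum {
    ADD_ADIW,       /* one instruction: adiw Rd, K */
    ADD_SUBI_SBCI,  /* two instructions: subi/sbci with the negated constant */
    ADD_OTHER       /* neither immediate form applies (low registers) */
} add16_form;

/* Choose how to add constant k to the 16-bit pair R(reg+1):R(reg). */
add16_form choose_add16(unsigned reg, long k)
{
    assert(reg % 2 == 0 && reg <= 30);
    /* adiw: only R24, R26, R28, R30 and only 0 <= K <= 63 */
    if ((reg == 24 || reg == 26 || reg == 28 || reg == 30) &&
        k >= 0 && k <= 63)
        return ADD_ADIW;
    /* subi/sbci take an immediate only on R16..R31 */
    if (reg >= 16)
        return ADD_SUBI_SBCI;
    return ADD_OTHER;
}
```

For the examples in this thread, `choose_add16(24, 1)` gives `ADD_ADIW`, `choose_add16(24, 65)` gives `ADD_SUBI_SBCI`, and `choose_add16(14, 1)` gives `ADD_OTHER`, which corresponds to the sec/adc sequence seen on R14 in the first listing.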

Wouter





Re: [avr-gcc-list] GCC-AVR Register optimisations

2008-01-10 Thread andrewhutchinson
Thanks for feedback!

I will try your example later today and see what I get. The change in register 
allocation order allows gcc to fix some other things.

Part of the problem in your example is the strange move away from R16 to R14:

movw r14,r16 
sec 
adc r14,__zero_reg__ 
adc r15,__zero_reg__ 
movw r24,r16 

It is not obvious why this is not optimised out (unless optimisation was 
disabled or restricted). This normally only happens when all the higher numbered 
registers are used up - or it needs to preserve the result across a call.

It should have picked a higher numbered register (R24 or even R28) - as I would 
expect these to be unused - and obviously ahead of R14 in the allocation order.

If the move had been made to R16 or higher, then the addition would have been 
simpler - or even as simple as your example.

However, often looking at intermediate RTL gives some clue why.

Can you tell me what optimisation setting was used?

Andy


RE: [avr-gcc-list] GCC-AVR Register optimisations

2008-01-10 Thread Weddington, Eric
 

 -Original Message-
 From: 
 [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED]
 org] On Behalf Of Andrew Hutchinson
 Sent: Wednesday, January 09, 2008 6:52 PM
 To: avr-gcc-list@nongnu.org
 Subject: [avr-gcc-list] GCC-AVR Register optimisations
 
 Hi all,
 
 just spent some days going over gcc-avr and missed optimizations.
 
 One area I looked at was register allocation
snip
 So is there a better order?

I certainly appreciate all your effort in looking at missed
optimizations. I know that not very many people are able to look into
this area in AVR GCC right now.

However, I'd like to bring up a few points:

- Changing the register order, while it seems promising, introduces a
major backwards incompatibility. Avr-libc is written in mostly
hand-optimized assembly, which means that C functions in the application
call assembly routines in avr-libc. Changing the register order means a
complete overhaul of avr-libc; something that is not likely to happen
quickly or without a lot of effort. Would you be prepared to help take
this on?

- There are many missed optimization bugs in the bug database that
probably could be fixed without resorting to changing the register
order. These are definitely real world problems that need to be fixed.
http://www.nongnu.org/avr-libc/bugs.html

- I, and others, are very interested in what you are using to test your
proposed changes. I have plans to put together an AVR Benchmark Suite,
consisting of a variety of publicly available programs that can be used
to test the compiler performance over time. It definitely needs to have
different types of programs, and publicly available programs so there
are no issues with distributing such a Benchmark Suite. I welcome any
collaboration on this.

- The bug list http://www.nongnu.org/avr-libc/bugs.html has a number
of bugs that are wrong-code bugs or bugs that generate an internal
compiler error on valid code (ICE-on-valid-code). These bugs are much
more important to fix right now than tackling the various missed
optimization bugs. These higher priority bugs show where the compiler,
or AVR back end of the compiler, is *failing*. Any help in fixing these
would be very much appreciated. IMHO, after these high-priority bugs get
fixed, then it would be worthwhile to start looking at fixing missed
optimizations.

Eric Weddington




RE: [avr-gcc-list] GCC-AVR Register optimisations

2008-01-10 Thread andrewhutchinson

 Weddington wrote: 
  
 
  -Original Message-
  From: 
  [EMAIL PROTECTED] 
  [mailto:[EMAIL PROTECTED]
  org] On Behalf Of Andrew Hutchinson
  Sent: Wednesday, January 09, 2008 6:52 PM
  To: avr-gcc-list@nongnu.org
  Subject: [avr-gcc-list] GCC-AVR Register optimisations
  
  Hi all,
  
  just spent some days going over gcc-avr and missed optimizations.
  
  One area I looked at was register allocation
 snip
  So is there a better order?
 
 I certainly appreciate all your effort in looking at missed
 optimizations. I know that not very many people are able to look into
 this area in AVR GCC right now.
 
 However, I'd like to bring up a few points:
 
 - Changing the register order, while it seems promising, introduces a
 major backwards incompatibility. Avr-libc is written in mostly
 hand-optimized assembly, which means that C functions in the application
 call assembly routines in avr-libc. Changing the register order means a
 complete overhaul of avr-libc; something that is not likely to happen
 quickly or without a lot of effort. Would you be prepared to help take
 this on?
 
No! Changing the order has no effect on the registers used to call functions 
or return values.  They are separately controlled in the back end. The allocation 
order refers to the order in which registers are used for intermediate values 
or locals. So even if the order starts with R18, functions will still expect and 
return an int in R24-25.

Changing the order of the lower (call-saved) registers does introduce a libgcc 
incompatibility, in that the prolog/epilog routines invoked by 
-mcall-prologues assume that these registers are pushed/popped in a contiguous 
sequence starting with R17. So changing to a different allocation order may 
well incur more pushes/pops than needed - unless the prolog/epilog push/pop 
order was changed to match. For this reason, I have left that alone 
(although it's not a big deal). The prolog/epilog routines are part of gcc and 
are not replaced by libc, so even that change would leave libc untouched.

Of course, any C code still used in libc would benefit from recompilation.


 - There are many missed optimization bugs in the bug database that
 probably could be fixed without resorting to changing the register
 order. These are definitely real world problems that need to be fixed.
 http://www.nongnu.org/avr-libc/bugs.html
 
Yes, I have tried to look at the underlying problems rather than concentrate on 
specifics. That way, you fix more problems.


 - I, and others, are very interested in what you are using to test your
 proposed changes. I have plans to put together an AVR Benchmark Suite,
 consisting of a variety of publicly available programs that can be used
 to test the compiler performance over time. It definitely needs to have
 different types of programs, and publicly available programs so there
 are no issues with distributing such a Benchmark Suite. I welcome any
 collaboration on this.

Absolutely!


 
 - The bug list http://www.nongnu.org/avr-libc/bugs.html has a number
 of bugs that are wrong-code bugs or bugs that generate an internal
 compiler error on valid code (ICE-on-valid-code). These bugs are much
 more important to fix right now than tackling the various missed
 optimization bugs. These higher priority bugs show where the compiler,
 or AVR back end of the compiler, is *failing*. Any help in fixing these
 would be very much appreciated. IMHO, after these high-priority bugs get
 fixed, then it would be worthwhile to start looking at fixing missed
 optimizations.

I have not ignored the higher priority bugs. Indeed you have my patch for 
register spill. The register allocation order is an offshoot from this, to 
cover the possibility that the patch would produce less optimal code.

Some of the other bugs are less easy to reproduce on newer versions of gcc; 
also, fixing one problem often prevents another from occurring. And having 
multiple gcc/WinAVR versions is tricky enough with two.

I have some WIP for other bugs - but have not posted any resolution yet.

 
 Eric Weddington





RE: [avr-gcc-list] GCC-AVR Register optimisations

2008-01-10 Thread Weddington, Eric
 

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, January 10, 2008 11:48 AM
 To: avr-gcc-list@nongnu.org; Weddington, Eric
 Subject: RE: [avr-gcc-list] GCC-AVR Register optimisations
 
 
  
  - Changing the register order, while it seems promising, introduces a
  major backwards incompatibility. Avr-libc is written in mostly
  hand-optimized assembly, which means that C functions in the application
  call assembly routines in avr-libc. Changing the register order means a
  complete overhaul of avr-libc; something that is not likely to happen
  quickly or without a lot of effort. Would you be prepared to help take
  this on?
  
 No! Changing the order has no effect on the registers used to pass
 function arguments or return values. Those are controlled separately in
 the back end. The allocation order refers to the order in which registers
 are used for intermediate values or locals. So even if the order starts
 with R18, functions will still expect and return an int in R24-25.
 
 Changing the order of the lower (call-saved) registers does introduce a
 libgcc incompatibility, in that the prolog/epilog routines invoked by
 -mcall-prologues assume that these registers are pushed/popped in a
 contiguous sequence starting with R17. So changing to a different
 allocation order may well incur more pushes/pops than needed - unless the
 prolog/epilog push/pop order was changed to match. For this reason, I
 have left that alone (although it's not a big deal). The prolog/epilog
 routines are part of gcc and are not replaced by libc, so even that
 change would leave libc untouched.
 
 Of course, any C code still used in libc would benefit from
 recompilation.

Thanks for the clarification. It certainly helps that the call order
essentially won't change. I'm still not fond of the idea of having to
change libgcc. It brings up a whole host of issues of synchronizing
these changes and introducing them to the end user.

  - The bug list http://www.nongnu.org/avr-libc/bugs.html has a number
  of bugs that are wrong-code bugs or bugs that generate an internal
  compiler error on valid code (ICE-on-valid-code). These bugs are much
  more important to fix right now than tackling the various missed
  optimization bugs. These higher priority bugs show where the compiler,
  or AVR back end of the compiler, is *failing*. Any help in fixing these
  would be very much appreciated. IMHO, after these high-priority bugs get
  fixed, then it would be worthwhile to start looking at fixing missed
  optimizations.
 
 I have not ignored the higher priority bugs. Indeed you have 
 my patch for register spill. 

Thanks again! I hope that your patch can be reviewed soon. :-)

 The register allocation order is an offshoot from this, to cover the
 possibility that the patch would produce less optimal code.
 
 Some of the other bugs are less easy to reproduce on newer versions of
 gcc; also, fixing one problem often prevents another from occurring. And
 having multiple gcc/WinAVR versions is tricky enough with two.
 
 I have some WIP for other bugs - but have not posted any resolution yet.
 

I look forward to seeing your work when it's ready! :-)

Eric Weddington




Re: [avr-gcc-list] GCC-AVR Register optimisations

2008-01-10 Thread andrewhutchinson
OK, I checked the instruction patterns in GCC's avr.md, and the use of ADIW 
registers is marked "!" for 16-bit ADD, 32-bit ADD and 16-bit TEST.

This means that alternative will not be used by reload and will be a second or 
third choice elsewhere. Which seems to match your observations!

It will also push allocation away from R24-R30 - which might explain why R14 
was getting used.

I looked back through the change history and this has been there since the 
original version.

It could be that it fixes a problem that no longer exists. For sure it will 
produce poor code as you describe.

It so happens I noticed this the other day and removed it from my working copy 
(to see if anything bad happened, and also to smoke test my patch for the 
BASE_POINTER register spill, since I wanted to force more use of pointer 
registers).

Nothing bad has happened so far.

I will post results later.







 Wouter van Gulik [EMAIL PROTECTED] wrote: 
 Wouter van Gulik schreef:
 
  
  Note that in some cases it could be very interesting to use r27, or Y, 
  register.
  
 
 Should have written R28 of course.
 
 Since gcc seems down at the moment I did some more testing.
 
 Now consider this example:
 void main(void)
 {
   char *p = x;
   foo(p); p+=65;
   foo(p); p+=65;
   foo(p); p+=65;
   foo(p); p+=65;
   foo(p); p+=65;
   foo(p); p+=65;
   foo(p); p+=65;
   foo(p); p+=65;
   foo(p); p+=65;
   foo(p); p+=65;
 }
 This must be done using a subi/sbci pair.
 
 But the compiler now seems to realize that p is a constant offset to x. 
 So we now get:
 
 main:
 /* prologue: frame size=0 */
   push r16
   push r17
 /* prologue end (size=2) */
   lds r16,x
   lds r17,(x)+1
   movw r24,r16
   call foo
   movw r24,r16
   subi r24,lo8(-(65))
   sbci r25,hi8(-(65))
   call foo
   movw r24,r16
   subi r24,lo8(-(130))
   sbci r25,hi8(-(130))
 
 Here x is stored in r16 and the cumulative offset is added to R24
 
 But if the compiler can realize this... Then why not do this for adds 
 within the adiw range?!?
 So for p++/p+=1 we would get something like:
 
   movw r24, r16
   adiw r24, 1
   call foo
   movw r24, r16
   adiw r24, 2
 etc..
 
 This is just as small as the earlier suggested use of R28!
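
The arithmetic behind the two sequences above can be checked on a host machine. The following is only a sketch in plain C, not AVR code: `lo8`/`hi8` mirror the assembler operators, while `subi_sbci_add` and `fits_adiw` are hypothetical helper names made up for this illustration. It relies on the fact that ADIW encodes a 6-bit immediate (0..63), which is why an increment of 65 needs the subi/sbci pair while 1 or 2 would fit in a single adiw.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model of the AVR sequences quoted above; not AVR code. */

/* Same meaning as the assembler's lo8()/hi8() operators. */
static uint8_t lo8(int value) { return (uint8_t)((unsigned)value & 0xFFu); }
static uint8_t hi8(int value) { return (uint8_t)(((unsigned)value >> 8) & 0xFFu); }

/* Model "subi lo,lo8(-(k)) ; sbci hi,hi8(-(k))": subtracting -k adds k. */
uint16_t subi_sbci_add(uint16_t reg, int k)
{
    assert(k >= 0 && k <= 255);             /* small positive addend only */

    uint8_t lo = (uint8_t)(reg & 0xFFu);
    uint8_t hi = (uint8_t)(reg >> 8);
    uint8_t imm_lo = lo8(-k);
    uint8_t imm_hi = hi8(-k);

    unsigned borrow = lo < imm_lo;          /* carry flag left by subi */
    lo = (uint8_t)(lo - imm_lo);
    hi = (uint8_t)(hi - imm_hi - borrow);   /* sbci consumes the carry */
    return (uint16_t)(((unsigned)hi << 8) | lo);
}

/* ADIW has a 6-bit immediate, so only 0..63 can be added directly. */
int fits_adiw(int k) { return k >= 0 && k <= 63; }
```

For example, `subi_sbci_add(0x0100, 65)` models the first subi/sbci pair in the listing applied to a pointer value of 0x0100, and `fits_adiw(65)` shows why the compiler cannot use adiw there.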
 
 Wouter
 





[Fwd: Re: [avr-gcc-list] GCC-AVR Register optimisations]

2008-01-10 Thread Dave N6NZ
Well, after spamming the wrong list with this, I hope I've got the right 
place now :(


-dave
---BeginMessage---


Weddington, Eric wrote:

- I, and others, are very interested in what you are using to test your
proposed changes. I have plans to put together an AVR Benchmark Suite,
consisting of a variety of publicly available programs that can be used
to test the compiler performance over time. It definitely needs to have
different types of programs, and publicly available programs so there
are no issues with distributing such a Benchmark Suite. I welcome any
collaboration on this.


Hi,
  Over the course of years I have worked in and managed various 
different hardware and software validation teams.  This is probably a 
good way for me to contribute, since I'm not a compiler back-end guru, 
but I have relevant experience writing and reviewing test plans.  Some 
of those test plans even targeted C compilers :)


-dave

---End Message---


Re: [Fwd: Re: [avr-gcc-list] GCC-AVR Register optimisations]

2008-01-10 Thread Andrew Hutchinson

Here's my input:

For starters, gcc has a testsuite that can be used. It's not perfect but 
it's quite demanding - even if we can't do all the tests.


Do we have info on setting this up with a simulator? I did have some 
instructions - once!


After that I suggest some benchmark that would produce more normal 
code and also give qualitative indications of performance (size is easy, 
speed would be nice).


Finally, regression tests using testcases and bug reports.

Andy


Dave N6NZ wrote:
Well, after spamming the wrong list with this, I hope I've got the 
right place now :(


-dave



Subject:
Re: [avr-gcc-list] GCC-AVR Register optimisations
From:
Dave N6NZ [EMAIL PROTECTED]
Date:
Thu, 10 Jan 2008 12:35:10 -0800
To:
gEDA user mailing list [EMAIL PROTECTED]




Weddington, Eric wrote:

- I, and others, are very interested in what you are using to test your
proposed changes. I have plans to put together an AVR Benchmark Suite,
consisting of a variety of publicly available programs that can be used
to test the compiler performance over time. It definitely needs to have
different types of programs, and publicly available programs so there
are no issues with distributing such a Benchmark Suite. I welcome any
collaboration on this.


Hi,
  Over the course of years I have worked in and managed various 
different hardware and software validation teams.  This is probably a 
good way for me to contribute, since I'm not a compiler back-end guru, 
but I have relevant experience writing and reviewing test plans.  Some 
of those test plans even targeted C compilers :)


-dave





[Fwd: Re: [avr-gcc-list] GCC-AVR Register optimisations]

2008-01-10 Thread Andrew Hutchinson


I tried the earlier example:

  char *p1 = x;
  foo(p1++);
  foo(p1++);
  foo(p1++);

etc

Even with a different register allocation order, the result is bad.

a) The basic flow that gcc creates is something like:

b=a+1;
R24=a;
foo(R24);
c=b+1;
R24=b;
foo(R24);

This should need only 1 variable to be saved across each call. But because 
of the overlapping lifetimes, gcc creates 2: both a and b must exist at the 
same time.

If the ordering were reversed, it would not need this. For example:

R24=a;
b=a+1;
foo(R24);
R24=b;
c=b+1;
foo(R24);

With this reordering, once b is created, a is dead. So we only need 1 register.

I am not experienced enough to know why gcc cannot optimise this case. 
But it looks like a weakness in gcc (not gcc-avr).


b) The register costs used to set allocation preferences are all equal for 
AVR, so there is no preference for ADIW registers (even when I removed the !w).


So the back end does not indicate a preference between R16=R16+1, R14=R14+1 
or R28=R28+1.


In current gcc, the frame pointer (R28-29) does not get used for register 
allocation. Clearly that would be the best call-saved register, since it 
could use ADIW and avoid moves.
It looks like allocation is made with the frame pointer reserved. Then, if 
the frame pointer turns out not to be required, R28-29 are simply left 
unused; gcc does not retry allocation without the frame pointer.


I tried the improved form foo(p); p++; and it produces much better code. It 
is still not using R28 (for the same reason).
In this case, the increment is specified after the function call, so we don't 
have overlapping register lifetimes - only one register is then used and it 
all becomes simple.
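
For reference, the two source forms being compared can be written side by side. This is a sketch only: the recording `foo`, the helpers `reset_log`, `calls_logged` and `logged_arg`, and the buffer `demo_buf` are hypothetical stand-ins so the example can run on a host. It shows that both forms pass the same sequence of pointers to `foo`, so the difference discussed above is purely in how the compiler schedules the increment relative to the call.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the external foo(): records each pointer it receives. */
#define MAX_CALLS 8
static char *seen[MAX_CALLS];
static size_t n_seen;

char demo_buf[4];                 /* hypothetical data for the demo */

void reset_log(void) { n_seen = 0; }
size_t calls_logged(void) { return n_seen; }
char *logged_arg(size_t i) { assert(i < n_seen); return seen[i]; }

void foo(char *p)
{
    assert(n_seen < MAX_CALLS);
    seen[n_seen++] = p;
}

/* Form 1: increment folded into the call. The old value is the argument,
   so the old and new values of p1 both exist just before each call. */
void increment_in_call(char *x)
{
    char *p1 = x;
    foo(p1++);
    foo(p1++);
    foo(p1++);
}

/* Form 2: increment written after the call, so only one value of p
   has to survive across each call. */
void increment_after_call(char *x)
{
    char *p = x;
    foo(p); p++;
    foo(p); p++;
    foo(p); p++;
}
```

Compiling both functions with avr-gcc and comparing the generated assembly is one way to reproduce the difference described in the message.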


Andy

PS Please report as a bug - gcc should be better than this.



Wouter van Gulik wrote:

The RTL dump will tell me why it chose R14 before.




What do you mean with RTL dump exactly? I tried looking through some dumps
but I could not make sense of it. I used -dP and --save-temps. But all
looked the same to me.

  

I recollect there were some odd ! markers that stop ADIW registers
being used by reload for certain operations.

That might be reason. If so I'll have to dig out why they were put in -
maybe to fix some other problem.




Well for all possible ADIW uses (addsi, addhi) it's a !w.
If this could be undone much pointer arithmetic could be done better I
guess/hope.

Any clue on why foo(p++) gives even poorer code compared to foo(p); p++?

HTH,

Wouter

  




Re: [Fwd: Re: [avr-gcc-list] GCC-AVR Register optimisations]

2008-01-10 Thread Dave N6NZ



Andrew Hutchinson wrote:

Here's my input:

For starters, gcc has a testsuite that can be used. It's not perfect but 
it's quite demanding - even if we can't do all the tests.


Yes, we could probably pull a subset of meaningful tests from that, which 
would give us a great start.  The hard/tedious part will be generating 
our expected results.




After that I suggest some benchmark that would produce more normal 
code and also give qualitative indications of performance (size is easy, 
speed would be nice).


Finally, regression tests using testcases and bug reports.


Performance regressions are particularly nasty to test. First you have 
to have code that can sensitize an optimization that you want.  Then, 
you need to have some expected results.  Comparing actual to 
expected is difficult -- what are you measuring?  And how close is good 
enough?  Exact clock count?  Clock count within a guard band?  A 
specific assembler sequence that must happen (but actual registers used 
don't matter)?  Or do some registers matter?
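
As an illustration of the "exact clock count" versus "guard band" question, here is a minimal C sketch of the two comparisons. The function names and the percentage-based band are assumptions made for this example; nothing here comes from an existing test framework.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Strict criterion: the measured cycle count must equal the baseline. */
bool exact_match(uint32_t baseline, uint32_t measured)
{
    return baseline == measured;
}

/* Relaxed criterion: pass if the measured cycle count is within
   +/- band_percent of the baseline. */
bool within_guard_band(uint32_t baseline, uint32_t measured,
                       uint32_t band_percent)
{
    assert(baseline > 0);

    uint64_t diff = measured > baseline ? (uint64_t)measured - baseline
                                        : (uint64_t)baseline - measured;

    /* Integer form of: diff / baseline <= band_percent / 100. */
    return diff * 100u <= (uint64_t)baseline * band_percent;
}
```

A symmetric band also flags unexpected speed-ups, which may be useful given the concern that a "great improvement" can be the symptom of a miscompilation.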


Then you have the test matrix... what is the expected result with -Os 
versus -O3?  etc, etc, etc.


All this presumes a test framework that can capture all the required 
outputs to check against expected results: .S, clock count, code size, 
exit code.  Oh, and getting the right numerical (or whatever) 
answer... and are we getting it for the right reason?  The compiler 
might show a great speed and code size improvement by mistakenly 
ignoring the volatile keyword in some optimization, and a simulator 
might never show a wrong answer :(


Having been purely a gcc and avr-gcc user, I don't have any idea if any 
parts of this framework already exist in a form that addresses these 
problems.


Anyway, the test matrix is huge, and a simulator is going to limit the 
ultimate performance of the test rig.  So we won't be able to do 
everything we can think of.


On the plus side of the ledger: Our audience, being embedded developers, 
have very specific needs, so that can help prioritize.


OK, this e-mail was a shotgun blast of issues completely devoid of 
specifics :) :)




Andy


-dave





Re: [Fwd: Re: [avr-gcc-list] GCC-AVR Register optimisations]

2008-01-10 Thread Andrew Hutchinson


Your input is valued.

For speed/size I was thinking we need to have sanity checks, as gcc versions 
have been known to suddenly dump a bunch of unexpected data in the 
middle of a linked program.
Of course all the functional tests still give correct results and all 
looks good!


The second use would be evaluating the impact of a patch - so we can see 
whether it causes significant degradations in other areas. Qualitative 
perhaps - but the alternative is nothing.


The gcc test suite has a full set of answers and a framework. We may not pass 
all the tests marked applicable, due to limited memory or unimplemented 
functions. Yet even so, functional regressions can easily be seen.


Andy






Re: [Fwd: Re: [avr-gcc-list] GCC-AVR Register optimisations]

2008-01-10 Thread Dave N6NZ


Andrew Hutchinson wrote:



For speed/size I was thinking we need to have sanity checks, as gcc versions 
have been known to suddenly dump a bunch of unexpected data in the 
middle of a linked program.
Of course all the functional tests still give correct results and all 
looks good!


Yup.  A common compiler testing problem.  Optimizations are always a 
balancing act, since some that improve speed increase code size and so 
forth.




The second use would be evaluating the impact of a patch - so we can see 
whether it causes significant degradations in other areas. Qualitative 
perhaps - but the alternative is nothing.


I agree.  If there were a good test set that could be applied to a patch 
in isolation it would be very useful.  Even just a qualitative dashboard 
report reveals good information, especially after everyone gets 
used to seeing the same data presented in the same format every time.


The test matrix deserves some thought.  The same test might have 
different pass/fail criteria under different options.  For example, 
imagine a suite of 10 tests.  You might say 10/10 must show zero code 
growth under -Os, 7/10 show no speed degradation under -Os, the same 10 
tests must show zero slow down for 10/10 under -O3, 6/10 no code growth 
under -O3.  (Just an example, may not be realistic.)
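
The per-option pass/fail criteria described above could be expressed roughly as follows. This is a sketch under stated assumptions: the `result` and `criteria` structures, the function names, and the three-entry `sample` data are all invented for illustration, and the two threshold constants simply restate the example numbers from the paragraph above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One test's measurements before and after a patch, at one -O level. */
struct result {
    unsigned size_before, size_after;       /* bytes of flash */
    unsigned cycles_before, cycles_after;   /* simulated clock cycles */
};

/* Minimum number of tests that must show no growth / no slowdown. */
struct criteria {
    size_t min_no_growth;
    size_t min_no_slowdown;
};

/* The example thresholds from the message, for a 10-test suite. */
const struct criteria CRITERIA_OS = { 10, 7 };   /* under -Os */
const struct criteria CRITERIA_O3 = { 6, 10 };   /* under -O3 */

/* Made-up sample data: three tests, one of which got slower. */
const struct result sample[3] = {
    { 272, 270, 526, 520 },
    { 540, 540, 222, 229 },
    { 1000, 998, 1197, 1197 },
};

size_t count_no_growth(const struct result *r, size_t n)
{
    size_t ok = 0;
    for (size_t i = 0; i < n; i++)
        if (r[i].size_after <= r[i].size_before)
            ok++;
    return ok;
}

size_t count_no_slowdown(const struct result *r, size_t n)
{
    size_t ok = 0;
    for (size_t i = 0; i < n; i++)
        if (r[i].cycles_after <= r[i].cycles_before)
            ok++;
    return ok;
}

bool suite_passes(const struct result *r, size_t n, struct criteria c)
{
    assert(c.min_no_growth <= n && c.min_no_slowdown <= n);
    return count_no_growth(r, n) >= c.min_no_growth &&
           count_no_slowdown(r, n) >= c.min_no_slowdown;
}
```

Keeping the thresholds in data rather than in code makes it straightforward to hold one table of criteria per optimization level, which is the point of the test matrix.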


W.r.t. options, we might also want to have a depth-first test set and 
a breadth-first test set.  Depth-first means going after specific test 
points with targeted tests, and breadth-first means running a few tests 
under a broad range of conditions.




gcc test suite has full set of answers and framework. We may not pass 
all the marked applicable tests due to limited memory or unimplemented 
functions. Yet even with this functional regression can easily be seen.


I'll have to go take a look at the framework.  So far, I've been a 
trusting user: ./configure; make all; and get a cup of tea.


-dave





Re: [Fwd: Re: [avr-gcc-list] GCC-AVR Register optimisations]

2008-01-10 Thread Dmitry K.
On Friday 11 January 2008 11:38, Andrew Hutchinson wrote:
 I tried the earlier example:

char *p1 = x;
foo(p1++);
foo(p1++);
foo(p1++);
[...]
 I am not experienced enough to know why gcc cannot optimise this case.
 But it looks like a weakness with gcc (not gcc-avr)

Possibly, yes.
Some other targets with GCC 4.2.2:
  pdp11   -- ugly
  arm/thumb   -- ugly
  arm/arm -- OK

Regards,
Dmitry





[avr-gcc-list] GCC-AVR Register optimisations

2008-01-09 Thread Andrew Hutchinson

Hi all,

I just spent some days going over gcc-avr and missed optimizations.

One area I looked at was register allocation - this is not gcc's strong
point. However, the current settings we use are making life more
difficult than it needs to be.

The current order is:

   R24,25,\
   18,19,\
   20,21,\
   22,23,\
   30,31,\
   26,27,\
   28,29,\
   17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,\

You can tweak it with the gcc-avr/WinAVR compile option -morder1 to get a
better result (or -morder2 to get a much worse one!)

So is there a better order?

Registers 17 downwards are call-saved and are pushed/popped in a prescribed
order by the prolog/epilog functions. Also, R28,29 is the potential frame
pointer and so is best left alone. So the key registers are: R18-R27 and
R30,31.

With the current order, there are several problems:

1) Initial register allocation fragments the register set. For example,
allocating R25 will prevent R24-25 being used as a 16-bit register and
prevent R22-25 and R24-27 being used as 32-bit registers. The gcc register
allocator does not seem to overcome this fragmentation.

2) The situation is made worse by the order of the 16-bit and wider
registers used for call and return values, which are allocated in reverse
order, e.g. R24-R25, R22-R25, R18-R25. This means that function parameters
or return values are rarely in the right place - except for 16-bit values.

3) Allocating a byte to an odd-numbered register precludes it being extended
to a 16-bit value without a move.
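
The fragmentation in point 1 can be illustrated with a small bitmask model of the register file. This is only a host-side sketch; the `regmask` type and the `allocate` and `block_is_free` helpers are hypothetical names, not part of gcc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit n set means register Rn is already allocated. */
typedef uint32_t regmask;

/* Return the mask with register `reg` marked as in use. */
regmask allocate(regmask used, unsigned reg)
{
    assert(reg < 32);
    return used | ((regmask)1 << reg);
}

/* True if `count` consecutive registers starting at `base` are all free.
   count is 2 for a 16-bit value and 4 for a 32-bit value. */
bool block_is_free(regmask used, unsigned base, unsigned count)
{
    assert(count >= 1 && count <= 8 && base + count <= 32);
    regmask block = (((regmask)1 << count) - 1u) << base;
    return (used & block) == 0;
}
```

With only R25 allocated, `block_is_free(..., 24, 2)`, `block_is_free(..., 22, 4)` and `block_is_free(..., 24, 4)` all report false, matching the example in point 1, while R22-23 and R18-21 remain available.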

So, I tried creating an order which would preserve the contiguous
register space and avoid the above issues as much as possible.
This is what I ended up with:

R18,26,22,30,20,24,19,21,23,25,27,31,28,29, \
   17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,\
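
One property of the proposed order can be checked mechanically: how many entries at the front are even-numbered registers, so that a byte placed there can later be widened to 16 bits in place (point 3 above). The sketch below copies the two orders as quoted in this message (R2 to R31 only; whatever follows the trailing backslash is not shown here and is left out). The function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* The two orders as quoted in the message (R2..R31 only). */
const unsigned char current_order[] = {
    24, 25, 18, 19, 20, 21, 22, 23, 30, 31, 26, 27, 28, 29,
    17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2
};
const unsigned char proposed_order[] = {
    18, 26, 22, 30, 20, 24, 19, 21, 23, 25, 27, 31, 28, 29,
    17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2
};
#define ORDER_LEN (sizeof current_order / sizeof current_order[0])

/* Number of leading entries that are even-numbered registers, i.e. how
   many single bytes can be placed before one lands in the odd half of
   a register pair. */
size_t leading_even(const unsigned char *order, size_t n)
{
    size_t i = 0;
    while (i < n && order[i] % 2 == 0)
        i++;
    return i;
}

/* Sanity check: both arrays must name the same set of registers. */
int same_register_set(const unsigned char *a, const unsigned char *b,
                      size_t n)
{
    unsigned long seen_a = 0, seen_b = 0;
    for (size_t i = 0; i < n; i++) {
        assert(a[i] < 32 && b[i] < 32);
        seen_a |= 1ul << a[i];
        seen_b |= 1ul << b[i];
    }
    return seen_a == seen_b;
}
```

Under this model the current order hands out an odd register (R25) as its second choice, whereas the proposed order offers six even registers first.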


The result is a 1.25% saving in code size for a simple mixed
application. Pretty good for such a simple change!

For code with more floating point, the saving might well be higher, as it
demands more contiguous 32-bit registers.

On the same basis, the current order of the call-saved registers R2-R17,
which is dictated by the (mcall) prolog routines and limits further
improvement, is clearly imperfect. These registers are used less
frequently, though their cost is much higher, so it is difficult to gauge
the impact. I might take a look at some intense floating point functions to
see if it is worth reordering these too.


Andy











Re: [avr-gcc-list] GCC-AVR Register optimisations

2008-01-09 Thread Dave N6NZ



Andrew Hutchinson wrote:

Hi all,

I just spent some days going over gcc-avr and missed optimizations.

Which is a huge bunch of work! Thanks!
snip



The result is a 1.25% saving in code size for a simple mixed
application. Pretty good for such a simple change!


Very good!  What were you using as test cases?

A 1.25% code size reduction is very significant, since it translates into a 
speed improvement as well; a 1.25% reduction in an inner loop can buy even 
more of a speed-up.




For more floating point, the saving might well be higher as it demands
more contiguous 32 bit registers.


That makes sense.  I don't have any good test cases for 32 bit AVR code. 
But actually, simply measuring libm is probably a very good test case in 
a practical sense.


-dave




Re: [avr-gcc-list] GCC-AVR Register optimisations

2008-01-09 Thread Dmitry K.
Very interesting!

I will try this new order with Avr-libc's C functions
(probably this coming weekend).

Today (with default order) the results are:

AVR:   at90s8515                            atmega8
GCC:   3.3.6 3.4.6 4.0.4 4.1.2 4.2.2 4.3.X  3.3.6 3.4.6 4.0.4 4.1.2 4.2.2 4.3.X
---
bsearch(z,s,sizeof(s),1,cmp)
  Flh:   272   270   266   266   266   268    214   212   208   208   208   204
  Stk:    16    16    16    16    16    16     16    16    16    16    16    16
  Tim:   526   533   530   530   530   530    327   334   331   331   331   327
---
dtostre(1.2345,s,6,0)
  Flh:  1000   998  1104  1194  1184  1174    932   930  1020  1094  1088  1086
  Stk:    15    15    15    17    17    17     15    15    15    17    17    17
  Tim:  1197  1197  1285  1313  1313  1290   1058  1058  1119  1152  1152  1143
---
dtostrf(1.2345,15,6,s)
  Flh:  1666  1688  1696  1668  1676  1690   1512  1528  1568  1544  1548  1566
  Stk:    35    38    38    38    36    39     35    38    38    38    36    39
  Tim:  1670  1621  1667  1607  1608  1618   1480  1437  1493  1444  1443  1456
---
free(p)
  Flh:   540   548   544   544   556   568    486   494   500   500   508   512
  Stk:     4     4     4     4     4     4      4     4     4     4     4     4
  Tim:   222   229   229   229   231   229    201   208   211   211   212   210
---
malloc(1)
  Flh:   540   548   544   544   556   568    486   494   500   500   508   512
  Stk:     2     4     4     4     4     4      2     4     4     4     4     4
  Tim:   186   193   195   195   197   195    167   174   178   178   179   177
---
qsort(s,sizeof(s),1,cmp)
  Flh:  1314  1302  1220  1222  1242  1496   1086  1078   994   996  1008  1268
  Stk:    36    36    36    36    38    40     36    36    36    36    38    40
  Tim: 21915 21896 20182 20474 20914 21091  16976 16964 16002 16294 16678 16926
---
rand()
  Flh:   548   548   498   492   508   498    508   508   478   480   484   456
  Stk:    18    18    18    18    18    18     18    18    18    18    18    18
  Tim:  1505  1505  1484  1484  1488  1484   1497  1497  1482  1482  1484  1475
---
realloc((void*)0,1)
  Flh:  1162  1170  1156  1130  1156  1166   1036  1046  1044  1032  1046  1046
  Stk:    18    20    20    18    20    22     18    20    20    18    20    22
  Tim:   293   300   302   294   304   311    269   276   280   272   281   288
---
sprintf_min(s,%d,12345)
  Flh:  1306  1272  1292  1210  1216  1274   1158  1138  1160  1082  1086  1142
  Stk:    55    55    54    53    59    54     55    55    54    53    59    54
  Tim:  1847  1841  1805  1811  1846  1801   1706  1703  1673  1678  1711  1666
---
sprintf(s,%d,12345)
  Flh:  1720  1696  1704  1642  1674  1608   1534  1514  1524  1462  1498  1422
  Stk:    54    54    57    57    58    57     54    54    57    57    58    57
  Tim:  1633  1627  1639  1618  1610  1623   1545  1543  1555  1536  1528  1537
---
sprintf_flt(s,%e,1.2345)
  Flh:  3422  3372  3300  3262  3334  3320   3130  3088  3006  2968  3040  2966
  Stk:    61    61    63    64    66    66     61    61    63    64    66    67
  Tim:  2503  2496  2500  2482  2513  2492   2281  2277  2282  2263  2297  2302
---
sscanf_min(12345,%d,i)
  Flh:  1468  1466  1486  1484  1498  1504   1334  1334  1350  1352  1360  1364
  Stk:    49    49    53    53    59    55     49    49    53    53    59    55
  Tim:  1681  1672  1640  1638  1643  1685   1371  1367  1359  1357  1356  1392
---
sscanf(12345,%d,i)
  Flh:  1792  1784  1908  1874  1848  1876   1624  1616  1700  1674  1662  1668
  Stk:    50    50    54    54    61    56     50    50    54    54    61    56
  Tim:  1715  1736  1714  1694  1749  1751   1413  1434  1432  1417  1461  1463
---
sscanf_flt(1.2345,%e,x)
  Flh:  4134  4094  4190  4114  4220  4472   3802  3762  3808  3772  3856  4086
  Stk:   124   124   126   128   140   132