On Sun, 10 Mar 2002, Joaquim Carvalho wrote:
> I agree that Assembly should replace C code only when there are obvious
> and consistent performance gains.
>
> Of course crappy assembly code is not good in terms of performance or
> program simplicity. Assembly must be written by passionate people who will
> gladly spend a few days writing and redesigning a short section of time critical
> code so that it runs as fast as can be.
Agreed. Perhaps I have simply met too many programmers over the years who
weren't willing to take the time to do the job right, which may be why I
tend to be a little hard-line on this issue.
> When you say that performance gains are rare I cannot agree. Just compare
> the frame rate of MAME and SPARCADE when running, for instance, PacMan
> and you'll get an idea of what can be done in assembly. Both emulators are
> highly optimized but SPARCADE is several times faster than MAME on every
> processor.
I can't speak to this directly since I am not familiar with either of
these, and it is not my contention that assembly will not be faster, but
in every case that I have had time to look at for 32 bit processors, this
was an apples and oranges comparison. The problem with these comparisons
is that someone who spends days working on a piece of assembly will often
come up with a better algorithm for the code, but this is pretty much
never back-ported into the C code for a valid comparison. Ultimately, a
better algorithm almost always beats better optimization, which is why C
code often performs much better than assembly in commercial programming
environments under tight deadlines: the C programmers have more time to
devote to improving the design and algorithms. In the course
of my career I have repeatedly demonstrated one to two orders of magnitude
performance increases by throwing out carefully optimized assembly code
and replacing it with a different algorithm which in most cases could be
implemented in C with virtually identical performance to the best assembly
had to offer. The original assembly in some cases was very well written,
but had they not spent the time on optimizing the code in assembly and
instead spent it looking for a better algorithm they would have been way
ahead. Sometimes the most trivial of code changes can give an enormous
performance increase. Years ago, when I was playing with "Little Smalltalk",
I found a two-character code change in the C source which increased its
performance by over 30% (replacing one divide with a right shift), and a
half dozen other trivial changes which boosted it by another 15%. These
changes could have been done in assembly as well, but the performance
boost would not have been from the use of assembly, but rather from the
slight changes in the algorithms.
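
To illustrate the kind of divide-to-shift change I mean (the actual Little
Smalltalk line isn't reproduced here, so the function names and the
divide-by-two case below are made up purely for illustration), a minimal
sketch in C might look like this:

```c
#include <stdint.h>

/* Hypothetical example, not the Little Smalltalk source:
 * halve a value using the division operator. */
static uint32_t half_by_divide(uint32_t n)
{
    return n / 2u;
}

/* Same result using a right shift. This is only equivalent because
 * n is unsigned; see the note below about signed values. */
static uint32_t half_by_shift(uint32_t n)
{
    return n >> 1;
}
```

Both functions return the same value for every unsigned input. For signed
negative values they can differ: C99 division truncates toward zero, while
right-shifting a negative value is implementation-defined. Compilers today
will often make this substitution on their own for unsigned operands, so
whether it still pays off by hand is something to measure rather than assume.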
Yes, with the latest processors where the compilers haven't caught up, for
some sections of code, massive performance gains can occur with assembly,
but at this stage, do we really need to be worrying about optimizing the
code for the processors that are already going to run it the fastest? My
attitude is let the compiler writers do their job while we get it stable,
and by the time we are ready, the performance boost may be as simple as
specifying the correct target processor.
> ..and plex86 needs a speed improvement. Have you tried VMWARE or
> Win4Lin? VMWARE runs well though it's quite memory hungry. Win4Lin
> runs Windows 98 actually faster than native speed. Why is plex86 behind?
> Isn't the aim of this project to get more people to run Linux by giving them
> a relaxed way of breaking free from their Windows dependency?
I agree here that plex86 is behind, but if people are going to break free
using plex86, it will be based more on stability than performance. I'm
not saying that performance isn't necessary, but the mainstream users are
far more likely to go with a slower environment that works than a fast
one that crashes.
Again, my contention is not that assembly should not be used, but rather
that before going to assembly:
1 - The system should be more complete and more stable.
2 - The algorithms should be optimized.
3 - The C code should be optimized.
I didn't mean for this to turn into a big discussion (I know I'm talking
more than anyone :-). I just thought we should hold off on the optimizations,
but if people want to spend the time on it now, that is their call (or our
project leader's call, but certainly not mine). Just keep in mind that (at
least from my perspective) anything you do should not harm the support
for other processors or other host operating systems, for example by
making it faster for the Pentium IV but slower for the K6, or
making it faster for Linux but breaking support for BSD or Windows.
On a related note, has anyone actually tried doing a major profiling run
on the code to establish where the bottlenecks actually are? It often
seems to me that the performance problems are never where you expect them
(or at least never where I expect them :-)
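
For a real profiling run a proper profiler (gprof, for instance) is the right
tool, but as a rough first check on a single suspect routine, something like
the following sketch could be used. The helper name cpu_seconds and the
stand-in workload sample_work are hypothetical and are not part of plex86:

```c
#include <time.h>

/* Hypothetical helper: call fn() `iterations` times and return the
 * CPU time consumed, in seconds, as reported by clock(). */
static double cpu_seconds(void (*fn)(void), long iterations)
{
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        fn();
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Stand-in workload; `volatile` keeps the compiler from
 * optimizing the loop away. */
static volatile unsigned long sink;

static void sample_work(void)
{
    for (unsigned long i = 0; i < 1000; i++)
        sink += i;
}
```

Calling cpu_seconds(sample_work, 100000) before and after a change gives a
crude comparison. clock() has coarse resolution on many systems, so the
iteration count needs to be large enough for the result to mean anything.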
FWIW.
Shannon C. Dealy      |               DeaTech Research Inc.
[EMAIL PROTECTED]     |          - Custom Software Development -
                      |    Embedded Systems, Real-time, Device Drivers
Phone: (800) 467-5820 | Networking, Scientific & Engineering Applications
   or: (541) 451-5177 |                  www.deatech.com