RE: Multi-CPU Compilers and OS

Aaron Ardiri Wed, 13 Feb 2002 11:33:58 -0800

On Wed, 13 Feb 2002, Peter Epstein wrote:
> I'm a little confused by your comments. I'd expect a custom optimized
> blitter you wrote for a 68K based Palm device to be slower than the OS
> blitter in Palm OS 5 because the former will be emulated while the latter
> will be ARM native. What was once faster is now slower. The reason the
> emulator is indeed fast is that most of the time an application is running
> it's in an OS routine. All the OS routines are now ARM native, so they run
> nice and fast.


  well, the Win* copying routines are terribly slow, and you need to
  make two API calls (mask + overlay) to draw a sprite. they also do
  clip region checking, bounds checking, bit depth checking, blah blah
  blah.. to the point where it's 80% checking, 20% copying.

  our own routines do the mask + overlay together, and don't need
  to do all the checking, because we design around such limitations..
  hence the "speed factor"..
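
  a minimal sketch of the "mask + overlay together" idea, not our actual
  routines: it assumes an 8-bit-per-pixel buffer, a mask byte of 0xFF
  where the background is kept and 0x00 where the sprite shows, and sprite
  pixels that are 0 wherever the mask is 0xFF. the names Sprite and
  sprite_blit are hypothetical. the caller guarantees the sprite fits
  inside the destination, so there is no clipping or depth check in the
  loop; the assert only documents that contract in debug builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sprite layout (an assumption for this sketch):
 * one byte per pixel, mask 0xFF = keep background, 0x00 = draw sprite. */
typedef struct {
    const uint8_t *pixels;  /* sprite data, 0 where the mask is 0xFF */
    const uint8_t *mask;    /* same dimensions as pixels */
    size_t width;
    size_t height;
} Sprite;

/* Apply mask and overlay in a single pass over the destination.
 * No clipping, bounds or bit depth checks: the caller must guarantee
 * that the sprite lies completely inside the destination buffer. */
void sprite_blit(uint8_t *dst, size_t dst_stride,
                 const Sprite *s, size_t x, size_t y)
{
    assert(dst != NULL && s != NULL);

    for (size_t row = 0; row < s->height; row++) {
        uint8_t *d = dst + (y + row) * dst_stride + x;
        const uint8_t *p = s->pixels + row * s->width;
        const uint8_t *m = s->mask + row * s->width;

        for (size_t col = 0; col < s->width; col++) {
            /* AND punches the hole, OR drops the sprite pixel in. */
            d[col] = (uint8_t)((d[col] & m[col]) | p[col]);
        }
    }
}
```

  with the API route the same result takes two separate copy calls, each
  paying for its own round of checks.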

  just imagine our routines in native ARM? :)

> If you've got a processing bottleneck in your application (such as computing
> values in the Mandelbrot Set, as John Marshall did), these will have enough
> work to do that even when running ARM native, they'll still swamp the
> overhead of getting to and from ARM native code. If instead you have
> something that gets run a huge number of times, and is a bottleneck only
> because of that, then porting it to ARM won't help you. You need to get that
> inner loop into ARM native code!

  what we have is an FPS counter in our app. i have compiler switches
  that allow us to use the APIs or our own custom routines.. quite nice
  for testing.. we also test C versus optimized assembler :)

// #define USE_PALMOS_WINAPI       1   // use the Win* API for sprites
// #define PORTABLE                1   // do not use any m68k asm

    API      = 10 fps
    m68k c   = 23 fps
    m68k asm = 26 fps

  these were stats on the 75MHz ARM boards in the labs :) back in
  september when we ran these tests we were also shocked, but, that's
  the way it is i guess :)
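
  a rough sketch of how two switches like these can select one of three
  blitter builds, plus the arithmetic behind an FPS counter. the switch
  names come from the lines above; BlitBackend, blit_backend and
  fps_from_ticks are hypothetical names, and the tick source is left to
  the caller rather than tied to any particular OS call.

```c
/* Leave both switches undefined to get the m68k assembler build. */
/* #define USE_PALMOS_WINAPI 1 */   /* use the Win* API for sprites */
/* #define PORTABLE          1 */   /* do not use any m68k asm */

typedef enum {
    BACKEND_WINAPI,    /* OS blitter via the Win* calls */
    BACKEND_M68K_C,    /* custom routines in plain C */
    BACKEND_M68K_ASM   /* custom routines in m68k assembler */
} BlitBackend;

/* Report which build the compile-time switches selected. */
BlitBackend blit_backend(void)
{
#if defined(USE_PALMOS_WINAPI)
    return BACKEND_WINAPI;
#elif defined(PORTABLE)
    return BACKEND_M68K_C;
#else
    return BACKEND_M68K_ASM;
#endif
}

/* Frames per second from a frame count and an elapsed tick count.
 * Returns 0 when no time has elapsed, to avoid dividing by zero. */
unsigned long fps_from_ticks(unsigned long frames,
                             unsigned long elapsed_ticks,
                             unsigned long ticks_per_second)
{
    if (elapsed_ticks == 0)
        return 0;
    return (frames * ticks_per_second) / elapsed_ticks;
}
```

  for example, 46 frames drawn over 200 ticks at 100 ticks per second
  works out to 23 fps.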

> Don't trust your intuition when it comes to optimizing code. Run a profiler
> such as the one provided in POSE, and examine the results carefully. When
> you do find a bottleneck and optimize it, another bottleneck will be
> exposed. You can keep doing this, but after a few iterations you'll tend to
> have no one big bottleneck, but rather a bunch of separate things that each
> take 20% of the time. That's a good time to stop ;-)

  it all has to do with circumstances, yes. profiling is very important,
  and we found that the blitter APIs did more checking than actual
  work, so we just pulled out the checking :) take a look at the OS
  source for WinCopyRectangle() or WinDrawBitmap() and you'll see what
  i mean :)

  of course, if we wrote some of these routines in ARM, they would definitely
  speed up - but having our m68k asm run faster than the ARM API is more
  than enough for now.. and we maintain compatibility with older units..
  we will of course, one day, write them in ARM too :) just to boost that
  frame rate up 3x :P

  in general, the exact same code ported to ARM will run faster than
  the m68k version running under emulation, which is obvious :)

// az
http://www.ardiri.com/

