Because all the implementations were probably already fast enough, I didn't
even bother talking about the efficiency of the compiler. With both tweaked to
the optimum, I'd still say that the table lookup will be about twice as fast
as the looped and/or method. Of course, the code snippet below will run faster,
but since it doesn't include the 'andi' or 'or' instruction, the results
will be a little different <grin>...
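
For anyone who wants to compare the two approaches side by side, here is a rough C sketch. It assumes the routine under discussion reverses the bits of a byte, which is what the quoted assembly appears to be attempting. The function names are made up for illustration, and the 16-entry nibble table is just one way to do the lookup (a 256-entry table would need only a single access).

```c
#include <assert.h>
#include <stdint.h>

/* Looped and/or method: test each input bit with AND,
 * and set the mirrored output bit with OR. */
static uint8_t reverse_bits_loop(uint8_t in)
{
    uint8_t out = 0;
    for (int i = 0; i < 8; i++) {
        if (in & (1u << i))
            out |= (uint8_t)(0x80u >> i);
    }
    return out;
}

/* Table lookup: nibble_rev[n] is the 4-bit value n with its bits reversed. */
static const uint8_t nibble_rev[16] = {
    0x0, 0x8, 0x4, 0xC, 0x2, 0xA, 0x6, 0xE,
    0x1, 0x9, 0x5, 0xD, 0x3, 0xB, 0x7, 0xF
};

/* Reverse each nibble through the table and swap the two halves. */
static uint8_t reverse_bits_table(uint8_t in)
{
    return (uint8_t)((nibble_rev[in & 0x0F] << 4) | nibble_rev[in >> 4]);
}

/* Sanity check: both methods must agree for every possible byte. */
static void check_agree(void)
{
    for (unsigned v = 0; v < 256; v++)
        assert(reverse_bits_loop((uint8_t)v) == reverse_bits_table((uint8_t)v));
}
```

The loop version does eight test-and-set steps per byte, while the table version does two lookups, a shift, and an OR. How that translates into actual cycles depends on the compiler and the processor, so treat it as an illustration rather than a benchmark.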

Long ago, a great programmer named Paul Rother was looking over some Z80
assembly code I had written. It was a 4MHz Z80 doing vector graphics, with
3-axis rotation, etc. XY points were clocked out at somewhere around 20 kHz
(about 70 us per pair). The same processor was also doing a terminal-based
UI and reading a pulse-based timecode from tape. It was pretty hairy,
constant cycle-counting-in-the-comments stuff. I had gotten something to
fit that was supposed to be too complicated, and I was feeling very cocky.
Paul was looking over the code and suddenly said 'Why are you doing this?'
'Uh, I need the register cleared...' I replied. He flipped back 2 or 3
pages (it was all fanfold paper back then) and said 'But you cleared it back
here... You're wasting cycles.'

No matter how hard I worked at a chunk of code, no matter how fast I thought
I had something, Paul could *always* tweak it to run faster. The most
important thing I learned was that his big gains almost always came from the
algorithms he would choose, not his gift for cycle skrunching.

-jjf

-----Original Message-----
From: Roger Chaplin [mailto:[EMAIL PROTECTED]]
Sent: Friday, May 12, 2000 6:53 PM
To: Palm Developer Forum
Subject: RE: Byte order
> > Here's where the in-line assembler pays off:
> >
> > link a6,#0
> > move.l d3,-(sp)
> > move.b 8(a6),d1
> > moveq #0,d0
> > moveq #7,d3
> > 1$ lsr.b d1
> > lsl.b d0
> > dbra.b d3,1$
> > move.l (sp)+,d3
> > unlk a6
> > rts
> >
> > The lsr, lsl, and dbra instructions all fit into the instruction fetch
> > registers, and so once they are fetched from memory the first time, the
> > remainder of the loop runs with NO memory cycles!
>
> D2 is also a scratch register. Use it instead of D3, and you don't have to save
> or restore D3. And as long as you're using inline assembly, you might as well
> get rid of those LINK and UNLK statements.
True. I left those in because I didn't want to bother with
recalculating the stack references in my head, just for the sake of the
example.
> Not that any program would be better for these changes... :-)
Maybe not better from a performance or maintenance point of view, but
it sure is prettier :-)
--
Roger Chaplin
<[EMAIL PROTECTED]>
--
For information on using the Palm Developer Forums, or to unsubscribe,
please see http://www.palmos.com/dev/tech/support/forums/