Because all the implementations were probably already fast enough, I didn't
even bother talking about the efficiency of the compiler.  With both tweaked
to the optimum, I'd still say that the table lookup will be about twice as
fast as the looped and/or method.  Of course, the code snippet below will run
faster, but since it doesn't include the 'andi' or 'or' instruction, the
results will be a little different <grin>...
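
For reference, here is a minimal C sketch of the two approaches being
compared, assuming the task is reversing the bit order of a single byte.
The function names and the 16-entry nibble table are illustrative
assumptions, not code from this thread; a full 256-entry table would
replace the two lookups with one at the cost of more memory.

```c
/* Looped and/or method: move one bit per iteration from b into r. */
unsigned char reverse_loop(unsigned char b)
{
    unsigned char r = 0;
    int i;

    for (i = 0; i < 8; i++) {
        /* Shift the result left and OR in the current low bit of b. */
        r = (unsigned char)((r << 1) | (b & 1));
        b >>= 1;
    }
    return r;
}

/* Table lookup: each entry is the 4-bit reversal of its index. */
static const unsigned char nibble_rev[16] = {
    0x0, 0x8, 0x4, 0xC, 0x2, 0xA, 0x6, 0xE,
    0x1, 0x9, 0x5, 0xD, 0x3, 0xB, 0x7, 0xF
};

unsigned char reverse_table(unsigned char b)
{
    /* The reversed low nibble becomes the high nibble, and vice versa. */
    return (unsigned char)((nibble_rev[b & 0x0F] << 4) | nibble_rev[b >> 4]);
}
```

The loop version makes the role of the 'and' and 'or' steps visible: without
them, nothing ever carries a bit from the source into the result.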

Long ago, a great programmer named Paul Rother was looking over some Z80
assembly code I had written.  It was a 4 MHz Z80 doing vector graphics, with
3-axis rotation, etc.  XY points were clocked out at somewhere around 20 kHz
(about 70 us per pair).  The same processor was also doing a terminal-based
UI and reading a pulse-based timecode from tape.  It was pretty hairy,
constant cycle-counting-in-the-comments stuff.  I had gotten something to
fit that was supposed to be too complicated and I was feeling very cocky.
Paul was looking over the code and suddenly said 'Why are you doing this?'
'Uh, I need the register cleared...' I replied.  He flipped back 2 or 3
pages (it was all fanfold paper back then) and said 'But you cleared it back
here...  You're wasting cycles.'

No matter how hard I worked at a chunk of code, no matter how fast I thought
I had something, Paul could *always* tweak it to run faster.  The most
important thing I learned was that his big gains almost always came from the
algorithms he would choose, not his gift for cycle skrunching.

-jjf

-----Original Message-----
From: Roger Chaplin [mailto:[EMAIL PROTECTED]]
Sent: Friday, May 12, 2000 6:53 PM
To: Palm Developer Forum
Subject: RE: Byte order


> > Here's where the in-line assembler pays off:
> >
> >     link        a6,#0
> >     move.l      d3,-(sp)
> >     move.b      8(a6),d1
> >     moveq       #0,d0
> >     moveq       #7,d3
> > 1$  lsr.b       d1
> >     lsl.b       d0
> >     dbra.b      d3,1$
> >     move.l      (sp)+,d3
> >     unlk        a6
> >     rts
> >
> > The lsr, lsl, and dbra instructions all fit into the instruction fetch
> > registers, and so once they fetched from memory the first time, the
> > remainder of the loop runs with NO memory cycles!
> 
> D2 is also a scratch register.  Use it instead of D3, and you don't have
> to save or restore D3.  And as long as you're using inline assembly, you
> might as well get rid of those LINK and UNLK statements.

True. I left those in because I didn't want to bother with 
recalculating the stack references in my head, just for the sake of the 
example.

> Not that any program would be better for these changes... :-)

Maybe not better from a performance or maintenance point of view, but 
it sure is prettier :-)

--
Roger Chaplin
<[EMAIL PROTECTED]>

-- 
For information on using the Palm Developer Forums, or to unsubscribe,
please see http://www.palmos.com/dev/tech/support/forums/
