You bothered to do it...  Wow.

I can't recall all the 68K execution times off the top of my head, but they
vary widely.  At a glance, I'd say the lookup snippet is probably twice as
fast, not 3 times as fast.

But, the loop requires no heap space.  If you want to swap the bytes in an
interrupt handler or in response to certain launch codes, you have to either
take steps to find your data segment or put the table on the stack (remove
the 'static').  If the table initialization code is included, the speed
benefit vanishes.

In this case, the table lookup is clear cut since the offset calculation is
an add.  Sometimes, when the elements are odd sized (and larger than one),
the address math can really add up.  The worst I've seen on 68K is an old
Mark Williams C.  It just blindly used unsigned multiplies for array
indexing instead of adds and shifts.  I had a sin table that seemed WAY too
slow...

As for 'choose', well one is faster, one uses fewer system resources and is
context safe.  I'd pick the approach that best fit my current problem...
<grin>

Best Regards,
-jjf

-----Original Message-----
From: Robert Herold [mailto:[EMAIL PROTECTED]]
Sent: Friday, May 12, 2000 3:36 PM
To: Palm Developer Forum
Subject: RE: Byte order


Here we go:

The loop version (size = 0x2c bytes, 8 passes thru loop, 2 memory write, 3
memory reads)

10071E02: 4E560000        link     a6,#0
10071E06: 2F05            move.l   d5,-(sp)
10071E08: 1A2E0008        move.b   8(a6),d5
10071E0C: 7200            moveq    #0,d1
10071E0E: 7400            moveq    #0,d2
10071E10: 600E            bra.s    *+16                    ; 0x10071e20
10071E12: D201            add.b    d1,d1
10071E14: 1005            move.b   d5,d0
10071E16: 02000001        andi.b   #0x1,d0
10071E1A: 8200            or.b     d0,d1
10071E1C: E20D            lsr.b    #1,d5
10071E1E: 5242            addq.w   #1,d2
10071E20: 0C420008        cmpi.w   #8,d2
10071E24: 6DEC            blt.s    *-18                    ; 0x10071e12
10071E26: 1001            move.b   d1,d0
10071E28: 2A1F            move.l   (sp)+,d5
10071E2A: 4E5E            unlk     a6
10071E2C: 4E75            rts


The lookup version (size = 0x2e bytes, plus 0x10 for static data, no loop, 1
memory write, 6 memory reads).

10071E40: 4E560000        link     a6,#0
10071E44: 102E0008        move.b   8(a6),d0
10071E48: 024000F0        andi.w   #0xf0,d0
10071E4C: E840            asr.w    #4,d0
10071E4E: 41EDE8F2        lea      -5902(a5),a0
10071E52: 7200            moveq    #0,d1
10071E54: 12300000        move.b   (a0,d0.w),d1
10071E58: E841            asr.w    #4,d1
10071E5A: 102E0008        move.b   8(a6),d0
10071E5E: 0240000F        andi.w   #0xf,d0
10071E62: 7400            moveq    #0,d2
10071E64: 14300000        move.b   (a0,d0.w),d2
10071E68: D441            add.w    d1,d2
10071E6A: 1002            move.b   d2,d0
10071E6C: 4E5E            unlk     a6
10071E6E: 4E75            rts

The stats:

        codesize  #OpcodesRun  #datareads #datawrites

loop      0x2c       74            3           2
lookup    0x2e       16            6           1


Code size is virtually identical, so we have to decide based on speed.  I
guess it depends how many cycles per instruction, and how fast the memory
system is.  Since its the 68k (CISC), lets say 2 cycles/instruction, and
since its a Palm, lets say a slow memory system (5 cycles/references).  Of
course, the best thing to do would be to just measure how long it took to
run each one a million times...

Using the guestimates, we then have:

loop:   74*2 + 5*5 = 173 cycles
lookup: 16*2 + 7*5 = 68 cycles

I like the lookup version!

-- bob


-----Original Message-----
From: Fitzpatrick, Joe [mailto:[EMAIL PROTECTED]]
Sent: Friday, May 12, 2000 1:37 PM
To: Palm Developer Forum
Subject: RE: Byte order


Usually a lookup table is by far the fastest algorithm.  But, on the 68K a
loop of shifts and ors might be as fast.  It might be interesting to compare
the code that CW generates for your function with the one below.

-jjf

Byte SwapByte(Byte oldbyte)
{
   int i;
   Byte newbyte = 0;

   for (i=0; i<8 ; i++)
   {
      newbyte <<= 1;
      newbyte |= (oldbyte & 1);
      oldbyte >>= 1;
   }

   return newbyte;
}

-----Original Message-----
From: Robert Herold [mailto:[EMAIL PROTECTED]]
Sent: Friday, May 12, 2000 1:10 PM
To: Palm Developer Forum
Subject: RE: Byte order


Or how about this:

Byte swap_byte(Byte b)
{
    static Byte swapped_nibble[16] = { 
        0x00, 0x80, 0x40, 0xC0, 
        0x20, 0xA0, 0x60, 0xe0,
        0x10, 0x90, 0x50, 0xd0,
        0x30, 0xb0, 0x70, 0xf0
    };

    return swapped_nibble[b & 0x0f] + (swapped_nibble[(b & 0xf0) >> 4] >>
4);
}

-- bob


====================
Robert Herold           (650) 627-3302
Webvan Group, Inc       (650) 627-3948 (fax)
310 Lakeside Drive      [EMAIL PROTECTED]
Foster City, CA  94404  www.webvan.com


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Friday, May 12, 2000 12:00 PM
To: Palm Developer Forum
Subject: Re: Byte order


This function seems like its doing a lot of work, what about something like 
this instead:

Byte swap_byte(Byte byOld)
{
    Byte    byNew = 0;

    if (byOld & 0x01) { byNew |= 0x80; }
    if (byOld & 0x02) { byNew |= 0x40; }
    if (byOld & 0x04) { byNew |= 0x20; }
    if (byOld & 0x08) { byNew |= 0x10; }
    if (byOld & 0x10) { byNew |= 0x08; }
    if (byOld & 0x20) { byNew |= 0x04; }
    if (byOld & 0x40) { byNew |= 0x02; }
    if (byOld & 0x80) { byNew |= 0x01; }

    return byNew;
}

In a message dated 5/12/00 12:32:49 PM Eastern Daylight Time, 
[EMAIL PROTECTED] writes:

<< Byte swap_byte(Byte old_byte){
 
   Byte new_byte;
   Byte working_byte[8];
   int  i;
 
 
   new_byte=0x00; // zero out new_byte
 
   for(i=0;i<8;i++)
     working_byte[i]=0x00;  // zero out working bytes
 
   working_byte[0]=(old_byte << 7); // bit 1
 
   working_byte[1]=(old_byte >> 1); // bit 2
   working_byte[1]=(working_byte[1] << 7);
   working_byte[1]=(working_byte[1] >> 1);
 
   working_byte[2]=(old_byte >> 2); // bit 3
   working_byte[2]=(working_byte[2] << 7);
   working_byte[2]=(working_byte[2] >> 2);
 
   working_byte[3]=(old_byte >> 3); // bit 4
   working_byte[3]=(working_byte[3] << 7);
   working_byte[3]=(working_byte[3] >> 3);
 
   working_byte[4]=(old_byte >> 4); // bit 5
   working_byte[4]=(working_byte[4] << 7);
   working_byte[4]=(working_byte[4] >> 4);
 
   working_byte[5]=(old_byte >> 5); // bit 6
   working_byte[5]=(working_byte[5] << 7);
   working_byte[5]=(working_byte[5] >> 5);
 
   working_byte[6]=(old_byte >> 6); // bit 7
   working_byte[6]=(working_byte[6] << 7);
   working_byte[6]=(working_byte[6] >> 6);
 
   working_byte[7]=(old_byte >> 7); // bit 8
 
 
 
   for(i=0;i<8;i++){
 
      new_byte=(new_byte | working_byte[i]);
 
   }  // end for loop
 
   return(new_byte);
 
 } // end function >>

-- 
For information on using the Palm Developer Forums, or to unsubscribe,
please see http://www.palmos.com/dev/tech/support/forums/

-- 
For information on using the Palm Developer Forums, or to unsubscribe,
please see http://www.palmos.com/dev/tech/support/forums/

-- 
For information on using the Palm Developer Forums, or to unsubscribe,
please see http://www.palmos.com/dev/tech/support/forums/

-- 
For information on using the Palm Developer Forums, or to unsubscribe,
please see http://www.palmos.com/dev/tech/support/forums/

-- 
For information on using the Palm Developer Forums, or to unsubscribe, please see 
http://www.palmos.com/dev/tech/support/forums/

Reply via email to