>> extern inline void *memcpy(void *dst, const void *src, size_t num)
>> {
>> 	char *d = (char *) dst;
>> 	const char *s = (const char *) src;
>> 	while (num--)
>> 		*d++ = *s++;
>> 	return dst;
>> }
>>
>> This will run faster but might increase the size somewhat.
>
>Take care with this. I tried various scenarios and found that the toolbox
>call "MemMove" is faster than the code above in most cases, except when you
>are moving very small amounts of memory (< 16 bytes).
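That crossover suggests a hybrid copier: an inline byte loop below a small cutoff, and the toolbox call above it. A minimal portable sketch, with the standard C memmove standing in for MemMove and the 16-byte cutoff taken from the observation above (the name hybrid_copy is mine, not a toolbox routine):

```c
#include <stddef.h>
#include <string.h>

/* Below the ~16-byte crossover observed above, a simple byte loop
 * beats the call overhead; above it, defer to memmove (standing in
 * here for the toolbox MemMove). */
static void *hybrid_copy(void *dst, const void *src, size_t num)
{
    if (num < 16) {
        char *d = (char *) dst;
        const char *s = (const char *) src;
        while (num--)
            *d++ = *s++;
        return dst;
    }
    return memmove(dst, src, num);
}
```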
Here's a copy of my email from 19Feb98 on the topic of DmWrite vs MemMove
vs compiler-generated code.
a. It's a bit dated, since the OS has changed some, but MemMove still
moves a word at a time.
b. The compiler code will only move longs at a time if it is copying a
word-aligned structure, since then it knows the size of the data.
c. If you really wanted to get max speed for moving data, you'd use the
movem.l instruction to suck & spit 48 bytes at a time, then fall into an
unrolled move.l sequence.
d. All of these optimizations are only interesting if you're slamming
around big chunks of data, which is (or should be) a very uncommon
situation on the device.
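The shape of the movem.l approach in (c) can be sketched in portable C: big bursts first, then long-sized moves, then a byte tail. This is only an illustration of the structure, not an actual 68K routine; a real one would use movem.l to load 12 registers (48 bytes) per burst:

```c
#include <stddef.h>
#include <string.h>

/* Structural sketch of "48-byte bursts, then longs, then tail bytes".
 * memcpy stands in for the movem.l load/store pair. Assumes the
 * regions don't overlap. */
static void fast_copy(void *dst, const void *src, size_t num)
{
    char *d = (char *) dst;
    const char *s = (const char *) src;

    while (num >= 48) {           /* the movem.l "suck & spit" bursts */
        memcpy(d, s, 48);
        d += 48; s += 48; num -= 48;
    }
    while (num >= 4) {            /* the move.l cleanup sequence */
        memcpy(d, s, 4);
        d += 4; s += 4; num -= 4;
    }
    while (num--)                 /* remaining odd bytes */
        *d++ = *s++;
}
```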
-- Ken
========================================================================
Performance of DmWrite
========================================================================
I thought the list might find it interesting to review the results of some
timing tests I did. In the past there have been discussions on this list
about the performance cost of keeping dynamic data in a storage heap -
since setting the value of a variable located in a storage heap chunk
requires a call to DmWrite. The advantage, of course, is that you have a
lot more memory available, since the dynamic heap space is rather limited.
This first table has the time (in seconds) to set 1, 10, 100, and 1000
bytes of data using the compiler (inline code generated by the compiler to
copy an N-byte record), MemMove, and DmWrite. These times are based on
moving the indicated amount of data 10,000 times on a PalmPilot Pro (1MB).
Bytes Compiler MemMove DmWrite
1 00.06 00.76 06.10
10 00.14 00.88 06.22
100 00.93 02.29 07.70
1000 08.46 16.35 22.28
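The measurement pattern behind these tables can be reproduced on a desktop in standard C, with struct assignment playing the "compiler" copy and memmove standing in for MemMove (the function name and out-parameters below are mine; the absolute numbers will of course bear no resemblance to a DragonBall):

```c
#include <string.h>
#include <time.h>

#define REPS 10000

struct rec100 { char b[100]; };   /* assigning this is the "compiler" copy */

/* Time REPS copies of a 100-byte record both ways, returning clock
 * ticks through the out-parameters. memmove stands in for the
 * toolbox MemMove. */
static void time_copies(long *compiler_ticks, long *memmove_ticks)
{
    static struct rec100 src, dst;
    clock_t t;
    int i;

    t = clock();
    for (i = 0; i < REPS; i++)
        dst = src;                          /* compiler-generated copy */
    *compiler_ticks = (long) (clock() - t);

    t = clock();
    for (i = 0; i < REPS; i++)
        memmove(&dst, &src, sizeof dst);    /* MemMove stand-in */
    *memmove_ticks = (long) (clock() - t);
}
```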
1. As you can see, for small variables (1 to 4 bytes) there is a
significant cost to DmWrite - it's on the order of 100x slower than a
direct set (6.10 seconds versus 0.06 seconds).
2. On the other hand, you could still set the value of a small variable
1,640 times in one second using DmWrite (10000 / 6.10). Which means that if
you're not doing this in the middle of a tight loop, the time hit to use
DmWrite isn't significant.
3. An interesting side note is that the compiler-generated code is
significantly faster than using MemMove (roughly 2x even after factoring
out the overhead of the trap call). Examining the code shows that the
compiler is generating a move.l loop (4 bytes copied each loop), while the
MemMove code is moving words (2 bytes copied each loop). Seems like the
MemMove routine could use a little optimization - even if it didn't use the
movem.l instruction for optimal speed, it could still be made almost twice
as fast with a little work.
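The difference in point 3 is just loop granularity. Sketched in C (on the 68K the compiler emits a move.l loop like the first, and MemMove's inner loop looks like the second; both names here are mine):

```c
#include <stddef.h>
#include <stdint.h>

/* Long loop: 4 bytes per iteration, as the compiler generates for an
 * aligned struct copy. Assumes 4-byte-aligned pointers and that num
 * is a multiple of 4. */
static void copy_longs(void *dst, const void *src, size_t num)
{
    uint32_t *d = (uint32_t *) dst;
    const uint32_t *s = (const uint32_t *) src;
    while (num >= 4) {
        *d++ = *s++;
        num -= 4;
    }
}

/* Word loop: 2 bytes per iteration, roughly what MemMove was doing.
 * Half the bytes per trip around the loop, hence the ~2x gap. */
static void copy_words(void *dst, const void *src, size_t num)
{
    uint16_t *d = (uint16_t *) dst;
    const uint16_t *s = (const uint16_t *) src;
    while (num >= 2) {
        *d++ = *s++;
        num -= 2;
    }
}
```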
4. Just for grins, I also ran the same test on my PalmPilot 5000 (512K).
You can see the difference that the extra 512K (and the Pro's faster
memory accesses) makes:
Bytes Compiler MemMove DmWrite
1 00.10 01.09 08.96
10 00.32 01.27 09.15
100 01.93 03.20 11.14
1000 17.74 22.57 31.13
5. And finally, for completeness I ran the test on my Duo 2300c (117MHz
603e). Since there's no trap dispatcher overhead, and DmWrite is
essentially the same as MemMove, the times are much closer. Ergo if you are
using DmWrite, and most of your development is done on a Mac using the
simulator, then you'll want to carefully check performance on the device.
Bytes Compiler MemMove DmWrite
1 00.00 00.02 00.06
10 00.00 00.02 00.06
100 00.03 00.06 00.10
1000 00.26 00.45 00.49
Yours in tick counting,
Ken Krugler
TransPac Software, Inc.
<http://www.transpac.com>
+1 530-470-9200 (direct) +1 408-261-7550 (main)