Jan Slodicka wrote:
> I recently compared the speed of an identical algorithm (loading a JPEG) on
> several platforms - PC (1.6 GHz), Win CE (about 400 MHz, I forgot now the
> device type but it should not matter) and Palm m515. The results really
> surprised me.
>
> m515 run the program (approx.) 1000x slower than the PC whereas the
> processor speed differs by a factor 50 only.
> CE machine was only 2 times slower than the processor speed difference would
> suggest.
>
> In other words, after linear compensation for different processor speeds CE
> machine seems to be 10 times "faster" than Palm.
> This is for me incredibly bad result for Palm. Naturally I tried to find
> some excuses.
>
As far as I know, devices running WinCE have a RISC-like processor WITH an FPU. The m515, with its Motorola 68K processor, has NO FPU. Well, there is software floating point support through some libraries ... but, ahem, no comment. :)

Furthermore, the 68K (and its derivatives, like the Dragonball in 'earlier' Palm PDAs such as the m515) is a CISC processor with a clock frequency roughly 10 times lower than that of the devices running WinCE. Plus there is no cache at all in the 68K (I don't know about the Dragonball 68K clone, but I'm quite sure it doesn't have a cache either).

Therefore, if you compile a JPEG decoder whose DCT decoding uses floating point math ... surprise, surprise ... software FPU emulation is for sure A LOT slower than a hardware FPU.

Even DCT decoding using integer math is much, much faster on a RISC-type processor like the ARM, because of its powerful instruction set. Just look at ARM: multiply AND accumulate in just 1 instruction ... mla Rd,Rs1,Rs2,Rs3 (Rd := Rs1*Rs2 + Rs3) ... simply wonderful! :) Not to mention the barrel shifter, ALL registers being general purpose registers, etc. etc. Ahh yes, I just forgot about the instruction pipelining on RISC processors too.

Another fact: the 'official' JPEG decoding source has, in my opinion, too much overhead. You could write your own image compression code based on DCT -> quantization -> compression of the quantized output (Huffman + RLE, or LZ77).

After all, comparing the Dragonball 68K devices to ARM (or any other RISC processor) driven PDAs is like comparing apples with bananas. Plus, comparing just the processor speeds is senseless too: just imagine a 1000 MHz processor with a bus width of 1 bit ;) and then compare the results to an 8 MHz processor with a 32 bit wide bus ... :)

[I don't know if all facts are true, but I hope it helped a bit. ;) ]

Best regards,
Carsten.

well..
http://userpages.umbc.edu/~zding1/cmsc611/report.pdf
compares RISC and CISC (not very extensively)

http://www.ecs.umass.edu/ece/wolf/papers/npw2002.pdf
(although it's about network processors, it's quite interesting and has some important formulas about processor architecture; you can easily see in this paper that a processor isn't measured just by its clock frequency)

... you should also look for some papers by the gurus of processor architecture: Hennessy & Patterson
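
To make the integer-math DCT point a bit more concrete, here is a minimal sketch in C of how one DCT coefficient can be computed with fixed-point multiply-accumulate instead of floating point. It is not taken from any particular JPEG decoder: the Q13 format, the FIX macro and the names dct_row0, dct_row4, dct_coef_fixed and dct_coef_float are assumptions made up for this illustration. The float version is only there for comparison; on a CPU without an FPU every multiply and add in it becomes a library call.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed format: Q13 fixed point, i.e. real value = integer / 8192. */
#define FIX_BITS 13
#define FIX_ONE  (1L << FIX_BITS)
/* Converts a positive constant at compile time, so no float code runs
   on the device. */
#define FIX(x)   ((int32_t)((x) * FIX_ONE + 0.5))

/* 1/sqrt(8), the magnitude of every entry in rows 0 and 4 of the
   orthonormal 8-point DCT-II basis. */
#define C4 FIX(0.353553391)

/* Two basis rows as Q13 constants (hypothetical names). */
static const int32_t dct_row0[8] = { C4,  C4,  C4, C4, C4,  C4,  C4, C4 };
static const int32_t dct_row4[8] = { C4, -C4, -C4, C4, C4, -C4, -C4, C4 };

/* One output coefficient = sum of sample * basis entry, integers only. */
static int32_t dct_coef_fixed(const int16_t s[8], const int32_t row[8])
{
    int32_t acc = 0;
    for (int i = 0; i < 8; i++) {
        assert(row[i] >= -FIX_ONE && row[i] <= FIX_ONE); /* |basis| <= 1 */
        acc += (int32_t)s[i] * row[i];   /* the multiply-accumulate step */
    }
    /* Round to nearest, then drop the fraction bits. Assumes an
       arithmetic right shift for negative sums, which common compilers
       provide. */
    return (int32_t)((acc + (FIX_ONE >> 1)) >> FIX_BITS);
}

/* Same sum in floating point, for comparison only. */
static float dct_coef_float(const int16_t s[8], const int32_t row[8])
{
    float acc = 0.0f;
    for (int i = 0; i < 8; i++)
        acc += (float)s[i] * ((float)row[i] / (float)FIX_ONE);
    return acc;
}
```

The inner loop of dct_coef_fixed is exactly the pattern a multiply-accumulate instruction handles well, and for eight samples of value 100 it yields a DC term of 283 where the exact value is about 282.84. A real decoder would unroll this and use a fast factorization of the DCT, but the rounding and shifting idea stays the same.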
