I have already explained this situation in concept, but now I am going
to present hard numbers measured on a REAL Palm IIIc.  (And yes, I'll
eventually have data on all Palms, but at this phase my target is the
IIIc.)

For those already doing game programming, this is basic stuff you
already know, and I won't bore you.

But I am going to post a bunch of real world numbers because:

(1) I've never seen ANYONE post real world bandwidth information on the
Palm, and as a game programmer this is very useful information.
(2) I wanted to again demonstrate that it really is useful to have a
special, official, optimized CopyScreenBufferToVRAM function from Palm
to make things legal and unified across different device graphics
architectures.

To recap, if you compose your frame directly in VRAM, you save the time
required to copy an image buffer to the VRAM each frame.  So there is
certainly a huge savings in drawing directly to VRAM.  Also, no matter
what, your graphics must end up on the screen.  So why not just draw
direct to screen?  (e.g., using WinScreenLock())

Without restating issues in terms of bus speeds, wait states, and raw
throughput, let's just use a real world example:

I'm copying a full 8-bit screen from a source to a destination.  Fetching
the copy loop's instructions adds about 7K of bus traffic to the BLiT.
A real game has to do more than simply copy a screen, but this is a
revealing example.

REMEMBER that a screen copy uses two bus operations, a read and write:

Copying from Database to a RAM buffer:     110-160 frames per second !!!
Copying from a RAM buffer direct to VRAM:  35-38 frames per second
Copying from VRAM to VRAM:                 20 frames per second
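
To put those frame rates on a more familiar scale, here is a small
sketch that converts a full-screen copy rate into bytes per second.  It
assumes the IIIc's 160x160 screen at one byte per pixel (25,600 bytes)
and counts only the bytes landing in the destination, ignoring the
instruction overhead.  The function and constant names are mine, purely
for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Palm IIIc screen: 160 x 160 pixels, 8 bits (1 byte) per pixel. */
enum {
    SCREEN_WIDTH  = 160,
    SCREEN_HEIGHT = 160,
    SCREEN_BYTES  = SCREEN_WIDTH * SCREEN_HEIGHT   /* 25,600 bytes */
};

/* Bytes written to the destination per second when copying
 * frames_per_second full 8-bit screens (illustrative helper). */
uint32_t copy_bytes_per_second(uint32_t frames_per_second)
{
    /* Guard against overflow of the 32-bit result. */
    assert(frames_per_second <= UINT32_MAX / (uint32_t)SCREEN_BYTES);
    return frames_per_second * (uint32_t)SCREEN_BYTES;
}
```

By this reckoning, 35 frames per second into VRAM is about 896,000
bytes per second, and 20 frames per second VRAM-to-VRAM is about
512,000 bytes per second.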

The bottom line is that the Epson 1375 graphics bus is roughly 8 times
slower than RAM access on a Palm IIIc.  If Afterburner is on, this
difference nearly doubles.

So accessing the VRAM is slow, but you HAVE to do it.  So where's the
break even point?

I've found that (on the IIIc) if your typical frame draw requires
accessing about 20% MORE than one full screen of data (count both
writes AND reads for masking/alpha), then it becomes faster to draw
everything to a RAM buffer and then copy the RAM to VRAM.
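
A back-of-the-envelope model lands in the same neighborhood.  This is
only a sketch under my own simplifying assumptions, not a measurement:
every VRAM access costs k times a RAM access, drawing direct costs
n * k (n = screens of data accessed per frame), and drawing buffered
costs n for the RAM work plus 1 + k for the final copy (one RAM read
and one VRAM write per pixel).  Setting the two equal gives
n = (k + 1) / (k - 1).  The function names are hypothetical.

```c
#include <assert.h>

/* Break-even point, in full screens of data accessed per frame,
 * under the simple cost model described above (an assumption):
 *   direct   = n * k
 *   buffered = n + 1 + k
 * where k is how many times slower a VRAM access is than a RAM access. */
double break_even_screens(double vram_cost_ratio)
{
    assert(vram_cost_ratio > 1.0);   /* model only makes sense if VRAM is slower */
    return (vram_cost_ratio + 1.0) / (vram_cost_ratio - 1.0);
}

/* Returns 1 if composing in RAM and copying is the cheaper choice. */
int buffering_is_faster(double screens_accessed, double vram_cost_ratio)
{
    return screens_accessed > break_even_screens(vram_cost_ratio);
}
```

With the roughly 8x ratio from above, the model's break-even comes out
near 1.29 screens, reasonably close to the 20% figure I measured.  A
real frame has per-pixel logic the model ignores, so treat it as a
sanity check only.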

Now, if you're writing a simple game with a fixed background and sprites
running around, you are probably rarely drawing even one full screen at
a time.  But if you have a scrolling game, you need to draw the
background each frame PLUS sprites, and this 1.2X screen limit is
reached very quickly.  Move up to a rich, layered game, and you're way
above this limit - to the point where you CAN'T compose directly to VRAM
and still maintain gaming speeds.

As a side note, you can also take advantage of slow VRAM access to do
some complex processing transparent to the user.
For example, in my current application, I operate in full color
internally (my target audience is color).  For gray Palms, rather than
write special code, I translate the color buffer to gray while copying
the RAM buffer to VRAM.  That sounds insane, but because the VRAM bus is
slow and a 4-bit screen needs only half the VRAM, I still get 38-48
frames per second, even with all the bit shifting logic!  (You'd do
even better, and use less RAM, in native 4-bit mode, but in my case
this wouldn't pay.)
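
For the curious, the translate-while-copying idea can be sketched
roughly like this.  It is not my actual code, just a minimal portable C
illustration under stated assumptions: the source is 8-bit palette
indices, gray_lut is a hypothetical 256-entry table you build once from
your palette (mapping each index to a gray level 0..15), and the left
pixel of each pair goes in the high nibble of the destination byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy an 8-bit color buffer to a 4-bit gray destination, converting
 * on the fly.  Two source pixels are packed into each destination byte,
 * so the destination needs only pixel_count / 2 bytes.
 *
 * gray_lut: hypothetical table, palette index -> gray level (0..15).
 * Nibble order (left pixel in the high nibble) is an assumption. */
void copy_color_to_gray4(const uint8_t *src, uint8_t *dst,
                         size_t pixel_count, const uint8_t gray_lut[256])
{
    size_t i;

    assert(pixel_count % 2 == 0);   /* pixels are packed in pairs */

    for (i = 0; i < pixel_count; i += 2) {
        uint8_t left  = gray_lut[src[i]]     & 0x0F;
        uint8_t right = gray_lut[src[i + 1]] & 0x0F;
        *dst++ = (uint8_t)((left << 4) | right);  /* one write per two pixels */
    }
}
```

The point of the table is that the only work per pixel is a lookup, a
shift, and an OR, and there is just one slow destination write for
every two pixels read from RAM.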

Well, I hope this was informative and not merely a waste of bandwidth.
If you're thinking I need a good flaming, just remember that I mean
well.  :)

Happy Gaming!
- Jeff


