template< > void SurfaceDid<16>::Fill ( U32 col )
{
    U16 c = (U16)col;
    U8 *line=PixelPtr( 0, 0 ), *pixel=line;
    for ( S16 y = 0; y < 320; y++, line+=640, pixel=line )
        for ( S16 x =0; x < 320; x++, pixel+=2 )
            *(U16*)pixel=c;
}

The main points are: don't do anything that's potentially tricky from the compiler's perspective; simplify the code as much as you can; and unroll the innermost loop. I passed up one significant optimization: you can write two pixels at a time by storing longs instead of shorts. Here's my optimized version, to demonstrate the ideas.

template< > void SurfaceDid<16>::Fill ( U32 col )
{
    register U16 c = (U16)col;  // On poor compilers, the 'register' keyword
                                // is helpful.
    register U16 *pixel=(U16*)PixelPtr( 0, 0 ); // Start at the first pixel and
                                                // step through the surface as U16s
    register int x;     // 'int' is the preferred type, even though its
                        // size is not explicit
    int y;
    for ( y=0; y < 320; y++ )   // Moved extra stuff out of incrementor
    {
        for ( x=0; x<(320/8); x++ ) // This unrolls naturally. Sizes not
                                    // a convenient multiple are harder.
        {
            *(pixel++)=c;   // This naturally uses the (An)+ addressing
                            // mode, so make it obvious to the compiler
            *(pixel++)=c;
            *(pixel++)=c;   // The compiler *might* do this, but this way
            *(pixel++)=c;   // it *must*.
            *(pixel++)=c;
            *(pixel++)=c;
            *(pixel++)=c;
            *(pixel++)=c;
        }
        // pixel += 0;  // Rather than going forwards and back a line at a
                        // time, skip over the space from the end of one line
                        // to the start of the next explicitly. As it so
                        // happens in this case, the box is full-width so
                        // this is unneeded.
    }
}
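
For reference, the two-pixels-at-a-time idea I passed up could look roughly like the sketch below. This is only an illustration, not part of the surface class: FillLongs is a hypothetical standalone helper, the typedefs stand in for the Palm U16/U32 types, and it assumes the destination is 4-byte aligned and the pixel count is a multiple of 16 (true for a full 320x320 fill, which is 102400 pixels).

```cpp
#include <cstddef>
#include <cstdint>

typedef std::uint16_t U16;  // stand-ins for the Palm OS types (assumption)
typedef std::uint32_t U32;

// Hypothetical helper: fill 'pixelCount' 16-bit pixels by storing longs.
// Assumes 'dst' is 4-byte aligned and 'pixelCount' is a multiple of 16.
void FillLongs ( U32 *dst, std::size_t pixelCount, U16 c )
{
    // Put the same pixel in both halves of the long. Because both halves
    // are identical, the result does not depend on byte order.
    U32 cc = ( (U32)c << 16 ) | c;

    // 8 long stores per iteration = 16 pixels per iteration.
    std::size_t n = pixelCount / 16;
    while ( n-- )
    {
        *(dst++)=cc;
        *(dst++)=cc;
        *(dst++)=cc;
        *(dst++)=cc;
        *(dst++)=cc;
        *(dst++)=cc;
        *(dst++)=cc;
        *(dst++)=cc;
    }
}
```

This halves the number of stores compared with the U16 version. If the fill is not full-width, or the start address is not long-aligned, you would need to handle the odd leading and trailing pixels with short writes, which this sketch does not do.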
