Pavel wrote:
template< > void SurfaceDid<16>::Fill ( U32 col )
{
    U16 c = (U16)col;
    U8 *line=PixelPtr( 0, 0 ), *pixel=line;

    for ( S16 y = 0; y < 320; y++, line+=640, pixel=line )
        for ( S16 x =0; x < 320; x++, pixel+=2 )
            *(U16*)pixel=c;
}

The main points are: don't do anything that's potentially tricky from the compiler's perspective; simplify the code as much as you can; and unroll the innermost loop. I passed up one significant optimization: you can fill two pixels at a time by writing longs instead of shorts. Here's my optimized version, to demonstrate the ideas.


template< > void SurfaceDid<16>::Fill ( U32 col )
{
    register U16 c = (U16)col; // On poor compilers, the 'register' keyword
                               // is helpful.
    register U16 *pixel=(U16*)PixelPtr( 0, 0 ); // Start of the surface, as
                                                // 'line' was in the original.
    register int x; // 'int' is the preferred type, even though its size
                    // is not explicit
    int y;
        
    for ( y=0; y < 320; y++ ) // Moved extra stuff out of incrementor
    {
        for ( x=0; x<(320/8); x++ )  // This unrolls naturally. Sizes not
                                     // a convenient multiple are harder.
        {
            *(pixel++)=c;    // This naturally uses the (an)+ addressing
                             // mode, so make it obvious to the compiler
            *(pixel++)=c;
            *(pixel++)=c;    // The compiler *might* do this, but this way
            *(pixel++)=c;    // it *must*.
            *(pixel++)=c;
            *(pixel++)=c;
            *(pixel++)=c;
            *(pixel++)=c;
        }

        // pixel += 0;// Rather than going forwards and back a line at a
                      // time, skip over the space from the end of one line
                      // to the start of the next explicitly. As it so
                      // happens in this case, the box is full-width so
                      // this is unneeded.
    }
}
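
The two-pixels-at-a-time idea mentioned above could look roughly like the sketch below. This is only an illustration under stated assumptions: FillPairs is a hypothetical stand-alone helper, not part of SurfaceDid; it assumes the buffer is 32-bit aligned (hence the U32 pointer) and that the pixel count is even. The typedefs use <cstdint> so the block compiles on its own; in the real code you'd use the existing U16/U32 types.

```cpp
#include <cstddef>
#include <cstdint>

typedef std::uint16_t U16;
typedef std::uint32_t U32;

// Hypothetical helper (not from the original code): fill 'pixelCount'
// 16-bit pixels with color 'c', two pixels per 32-bit store.
// Assumes 'dst' is 32-bit aligned and 'pixelCount' is even.
void FillPairs( U32 *dst, std::size_t pixelCount, U16 c )
{
    // Same color in both halves of the long, so byte order doesn't matter.
    const U32 cc = ( (U32)c << 16 ) | c;
    U32 *end = dst + pixelCount / 2;

    while ( dst != end )
        *(dst++) = cc;
}
```

For the full 320 x 320 surface (102400 pixels) this does 51200 stores instead of 102400. The inner loop can still be unrolled the same way as above.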

