On Tue, 2010-03-16 at 19:22 +0100, Alexander Larsson wrote:
> On Tue, 2010-03-16 at 20:17 +0200, Siarhei Siamashka wrote:
> > On Tuesday 16 March 2010, Siarhei Siamashka wrote:
> > > On Tuesday 16 March 2010, Alexander Larsson wrote:
> > > > On Tue, 2010-03-16 at 16:51 +0200, Siarhei Siamashka wrote:
> > > > > Regarding the alex's branch and performance, I already mentioned that
> > > > > it was much slower for over_8888_0565 case in my benchmark when compared
> > > > > against my branch on ARM Cortex-A8 (the other cases of scaling are ok).
> > > > > I'm using the following test program for benchmarking these optimizations:
> > > > >
> > > > > http://cgit.freedesktop.org/~siamashka/pixman/commit/?h=test-n-bench&id=93ec60149cb3535f70a9e285de0b359ff444f26e
> > > > >
> > > > > The test program tries to benchmark scaling of when source and destination
> > > > > image sizes are approximately the same (and the performance can be more or
> > > > > less directly compared to the simple nonscaled blit).
> > > > >
> > > > > The results are (variance is only in the last digit):
> > > > >
> > > > > op=3, src_fmt=20028888, dst_fmt=10020565, speed=5.06 MPix/s (1.21 FPS)
> > > > > vs.
> > > > > op=3, src_fmt=20028888, dst_fmt=10020565, speed=8.72 MPix/s (2.08 FPS)
> > > > >
> > > > > which is quite a lot.
> > > >
> > > > Can you retry with my new branch:
> > > > http://cgit.freedesktop.org/~alexl/pixman/log/?h=alex-scaler2
> > >
> > > Now it is:
> > > op=3, src_fmt=20028888, dst_fmt=10020565, speed=5.16 MPix/s (1.23 FPS)
> > >
> > > A little bit better, but still not good.
> > 
> > Found the problem, it's here:
> > > + SIMPLE_NEAREST_FAST_PATH (OVER, a8b8g8r8, r5g6b5, 8888_565),
> > This should have a8r8g8b8 instead of a8b8g8r8. So this fast path just
> > was not run at all. Once fixed, it shows the expected performance.
> > 
> > 
> > Also 'alex-scaler2' branch is substantially slower than 'alex-scaler' for
> > normal repeat:
> > 
> > == nearest tiled SRC (alex-scaler) ==
> > op=1, src_fmt=20028888, dst_fmt=20028888, speed=90.91 MPix/s (21.67 FPS)
> > op=1, src_fmt=20028888, dst_fmt=10020565, speed=63.82 MPix/s (15.22 FPS)
> > op=1, src_fmt=10020565, dst_fmt=10020565, speed=92.16 MPix/s (21.97 FPS)
> > 
> > == nearest tiled SRC (alex-scaler2) ==
> > op=1, src_fmt=20028888, dst_fmt=20028888, speed=76.54 MPix/s (18.25 FPS)
> > op=1, src_fmt=20028888, dst_fmt=10020565, speed=50.44 MPix/s (12.03 FPS)
> > op=1, src_fmt=10020565, dst_fmt=10020565, speed=67.14 MPix/s (16.01 FPS)
> 
> This may well be the change from open-coding the repeat to using the
> repeat() inline function.

It is. The following patch, which undoes the switch to repeat(), changes the results from:

== nearest tiled SRC ==
op=1, src_fmt=20028888, dst_fmt=20028888, speed=349.27 MPix/s (83.27 FPS)
op=1, src_fmt=20028888, dst_fmt=10020565, speed=137.86 MPix/s (32.87 FPS)
op=1, src_fmt=10020565, dst_fmt=10020565, speed=364.15 MPix/s (86.82 FPS)

to:

== nearest tiled SRC ==
op=1, src_fmt=20028888, dst_fmt=20028888, speed=556.82 MPix/s (132.76 FPS)
op=1, src_fmt=20028888, dst_fmt=10020565, speed=136.19 MPix/s (32.47 FPS)
op=1, src_fmt=10020565, dst_fmt=10020565, speed=471.76 MPix/s (112.48 FPS)

That is nothing to sneeze at.

diff --git a/pixman/pixman-fast-path.c b/pixman/pixman-fast-path.c
index 6607a47..b6c8f7c 100644
--- a/pixman/pixman-fast-path.c
+++ b/pixman/pixman-fast-path.c
@@ -1474,7 +1474,12 @@ fast_composite_scaled_nearest_ ## scale_func_name ## _ ## OP (pixman_implementat
        y = vy >> 16;                                                          \
        vy += unit_y;                                                          \
        if (do_repeat)                                                         \
-           repeat (PIXMAN_REPEAT_NORMAL, &vy, max_vy);                        \
+       {                                                                      \
+           if (unit_y >= 0)                                                   \
+               while (vy >= max_vy) vy -= max_vy;                             \
+           else                                                               \
+               while (vy < 0) vy += max_vy;                                   \
+       }                                                                      \
                                                                               \
        src = src_first_line + src_stride * y;                                 \
                                                                               \
@@ -1485,13 +1490,23 @@ fast_composite_scaled_nearest_ ## scale_func_name ## _ ## OP (pixman_implementat
            x1 = vx >> 16;                                                     \
            vx += unit_x;                                                      \
            if (do_repeat)                                                     \
-               repeat (PIXMAN_REPEAT_NORMAL, &vx, max_vx);                    \
+           {                                                                  \
+               if (unit_x >= 0)                                               \
+                   while (vx >= max_vx) vx -= max_vx;                         \
+               else                                                           \
+                   while (vx < 0) vx += max_vx;                               \
+           }                                                                  \
            s1 = src[x1];                                                      \
                                                                               \
            x2 = vx >> 16;                                                     \
            vx += unit_x;                                                      \
            if (do_repeat)                                                     \
-               repeat (PIXMAN_REPEAT_NORMAL, &vx, max_vx);                    \
+           {                                                                  \
+               if (unit_x >= 0)                                               \
+                   while (vx >= max_vx) vx -= max_vx;                         \
+               else                                                           \
+                   while (vx < 0) vx += max_vx;                               \
+           }                                                                  \
            s2 = src[x2];                                                      \
                                                                               \
            if (PIXMAN_OP_ ## OP == PIXMAN_OP_OVER)                            \
@@ -1537,7 +1552,12 @@ fast_composite_scaled_nearest_ ## scale_func_name ## _ ## OP (pixman_implementat
            x1 = vx >> 16;                                                     \
            vx += unit_x;                                                      \
            if (do_repeat)                                                     \
-               repeat (PIXMAN_REPEAT_NORMAL, &vx, max_vx);                    \
+           {                                                                  \
+               if (unit_x >= 0)                                               \
+                   while (vx >= max_vx) vx -= max_vx;                         \
+               else                                                           \
+                   while (vx < 0) vx += max_vx;                               \
+           }                                                                  \
            s1 = src[x1];                                                      \
                                                                               \
            if (PIXMAN_OP_ ## OP == PIXMAN_OP_OVER)                            \


Further simplifying by removing support for unit_x < 0 (i.e. mirrored
scaling) gives:

== nearest tiled SRC ==
op=1, src_fmt=20028888, dst_fmt=20028888, speed=655.11 MPix/s (156.19 FPS)
op=1, src_fmt=20028888, dst_fmt=10020565, speed=136.02 MPix/s (32.43 FPS)
op=1, src_fmt=10020565, dst_fmt=10020565, speed=619.16 MPix/s (147.62 FPS)

It might be interesting to duplicate the inner loop, once for each sign of
unit_x, so that we always get this performance increase.


-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Alexander Larsson                                            Red Hat, Inc 
       [email protected]            [email protected] 
He's a bookish moralistic cyborg plagued by the memory of his family's brutal 
murder. She's a sarcastic punk stripper with her own daytime radio talk show. 
They fight crime! 

_______________________________________________
Pixman mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/pixman
