On Tuesday 16 March 2010, Siarhei Siamashka wrote:
> On Tuesday 16 March 2010, Alexander Larsson wrote:
> > On Tue, 2010-03-16 at 16:51 +0200, Siarhei Siamashka wrote:
> > > Regarding the alex's branch and performance, I already mentioned that it was
> > > much slower for over_8888_0565 case in my benchmark when compared against my
> > > branch on ARM Cortex-A8 (the other cases of scaling are ok). I'm using the
> > > following test program for benchmarking these optimizations:
> > > http://cgit.freedesktop.org/~siamashka/pixman/commit/?h=test-n-bench&id=93ec60149cb3535f70a9e285de0b359ff444f26e
> > >
> > > The test program tries to benchmark scaling of when source and destination
> > > image sizes are approximately the same (and the performance can be more or
> > > less directly compared to the simple nonscaled blit).
> > >
> > > The results are (variance is only in the last digit):
> > >
> > > op=3, src_fmt=20028888, dst_fmt=10020565, speed=5.06 MPix/s (1.21 FPS)
> > > vs.
> > > op=3, src_fmt=20028888, dst_fmt=10020565, speed=8.72 MPix/s (2.08 FPS)
> > >
> > > which is quite a lot.
> >
> > Can you retry with my new branch:
> > http://cgit.freedesktop.org/~alexl/pixman/log/?h=alex-scaler2
>
> Now it is:
> op=3, src_fmt=20028888, dst_fmt=10020565, speed=5.16 MPix/s (1.23 FPS)
>
> A little bit better, but still not good.

Found the problem, it's here:

> + SIMPLE_NEAREST_FAST_PATH (OVER, a8b8g8r8, r5g6b5, 8888_565),

This should have a8r8g8b8 instead of a8b8g8r8, so this fast path was simply
never run. Once fixed, it shows the expected performance.

Also, the 'alex-scaler2' branch is substantially slower than 'alex-scaler' for
normal repeat:

== nearest tiled SRC (alex-scaler) ==
op=1, src_fmt=20028888, dst_fmt=20028888, speed=90.91 MPix/s (21.67 FPS)
op=1, src_fmt=20028888, dst_fmt=10020565, speed=63.82 MPix/s (15.22 FPS)
op=1, src_fmt=10020565, dst_fmt=10020565, speed=92.16 MPix/s (21.97 FPS)

== nearest tiled SRC (alex-scaler2) ==
op=1, src_fmt=20028888, dst_fmt=20028888, speed=76.54 MPix/s (18.25 FPS)
op=1, src_fmt=20028888, dst_fmt=10020565, speed=50.44 MPix/s (12.03 FPS)
op=1, src_fmt=10020565, dst_fmt=10020565, speed=67.14 MPix/s (16.01 FPS)

One more anomaly is that the 16bpp case somehow managed to get slower than the
32bpp one for normal repeat on ARM Cortex-A8. I'm checking what's wrong here.

-- 
Best regards,
Siarhei Siamashka
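
For illustration, here is a minimal sketch of why the a8b8g8r8 entry never
fired. It is not pixman's actual lookup code: the struct, the tables and
lookup_fast_path() are hypothetical names made up for this example, and the
function pointer is replaced by a string. The format constants assume the
src_fmt/dst_fmt values printed by the benchmark are hexadecimal pixman format
codes (0x20028888 for a8r8g8b8, 0x10020565 for r5g6b5), and that op=3 is OVER.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed format codes, matching the hex values printed by the benchmark. */
#define FMT_a8r8g8b8 0x20028888u
#define FMT_a8b8g8r8 0x20038888u
#define FMT_r5g6b5   0x10020565u

#define OP_SRC  1
#define OP_OVER 3

/* Hypothetical fast path entry: 'name' stands in for the function pointer. */
typedef struct {
    int         op;
    uint32_t    src_format;
    uint32_t    dst_format;
    const char *name;
} fast_path_entry_t;

/* Table with the typo: the source format is a8b8g8r8. */
static const fast_path_entry_t broken_table[] = {
    { OP_OVER, FMT_a8b8g8r8, FMT_r5g6b5, "scaled_nearest_8888_565_OVER" },
    { 0, 0, 0, NULL } /* terminator */
};

/* Table with the corrected source format a8r8g8b8. */
static const fast_path_entry_t fixed_table[] = {
    { OP_OVER, FMT_a8r8g8b8, FMT_r5g6b5, "scaled_nearest_8888_565_OVER" },
    { 0, 0, 0, NULL } /* terminator */
};

/* Return the name of the first entry matching (op, src, dst) exactly,
 * or NULL when nothing matches and the caller must use the general path. */
static const char *
lookup_fast_path (const fast_path_entry_t *table,
                  int op, uint32_t src, uint32_t dst)
{
    assert (table != NULL);

    for (; table->name != NULL; table++)
    {
        if (table->op == op &&
            table->src_format == src &&
            table->dst_format == dst)
        {
            return table->name;
        }
    }
    return NULL;
}
```

With the benchmark's request (OP_OVER, FMT_a8r8g8b8, FMT_r5g6b5), the lookup
in broken_table returns NULL, so the operation silently takes the slow general
path; the same lookup in fixed_table finds the entry.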
