On Tuesday 14 December 2010 18:32:04 Siarhei Siamashka wrote:

Well, it appears that some updates may be useful/necessary.

> * Because of the software prefetch, skipping read of the destination image
> pixels does not bring much improvement, moreover there is just no
> improvement at all on the newest OMAP3 and OMAP4 devices. This happens
> because we are prefetching data into cache far ahead and just skipping
> read instruction is not providing memory bandwidth saving (the data was
> already fetched into cache by this time).
I forgot to mention that there is one interesting case here. Because the PLD
instruction only prefetches on a TLB hit, there may actually be a
performance improvement when skipping both the reads and the writes of pixels
in the destination image (fully transparent source image). But this can happen
only if whole pages of the destination buffer are left untouched, which is
another constraint.
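
To make the "whole pages" constraint concrete, here is a rough C model (not
pixman code). It counts the destination pages for which every corresponding
source pixel is zero, since only those pages are never read or written. The
function name, the flat one-row layout and the page-aligned destination are
assumptions made for the illustration; with 4 KiB pages and 32bpp pixels,
pixels_per_page would be 1024.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: count destination pages that would never be touched
 * when blocks with an all-zero source are skipped entirely.
 * Assumes the destination starts on a page boundary and that source pixel i
 * maps to destination pixel i.
 */
size_t count_untouched_pages(const uint32_t *src, size_t n_pixels,
                             size_t pixels_per_page)
{
    size_t untouched = 0;

    assert(pixels_per_page > 0);

    for (size_t start = 0; start < n_pixels; start += pixels_per_page) {
        size_t end = start + pixels_per_page;
        uint32_t acc = 0;

        if (end > n_pixels)
            end = n_pixels;

        /* OR all source pixels of this page together */
        for (size_t i = start; i < end; i++)
            acc |= src[i];

        /* a single non-zero pixel forces an access to the page */
        if (acc == 0)
            untouched++;
    }
    return untouched;
}
```

A single non-zero source pixel anywhere in a page is enough to make that page
count as touched, which is why the saving is expected only for large fully
transparent areas.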
> index 91ec27d..f2fc5fb 100644
> --- a/pixman/pixman-arm-neon-asm.S
> +++ b/pixman/pixman-arm-neon-asm.S
> @@ -536,14 +536,38 @@ generate_composite_function \
>
> /******************************************************************************/
>
> +.macro pixman_composite_add_8888_8888_init
> + add DUMMY, sp, #ARGS_STACK_OFFSET
> + vpush {d8-d15}
> + vld1.32 {d11[0]}, [DUMMY]
> + vdup.8 d8, d11[0]
> + vdup.8 d9, d11[1]
> + vdup.8 d10, d11[2]
> + vdup.8 d11, d11[3]
> +.endm
> +
> +.macro pixman_composite_add_8888_8888_cleanup
> + vpop {d8-d15}
> +.endm
> +
> +
> .macro pixman_composite_add_8888_8888_process_pixblock_tail_head
> fetch_src_pixblock
> + PF vorr.u8 q12, q0, q1
> + PF vorr.u8 d24, d24, d25
> + PF vcnt.u8 d24, d24
> + PF ldr DUMMY, [sp]!
> + PF vpadd.u8 d24, d24, d24
> + PF vst1.32 d24[0], [sp]
> PF add PF_X, PF_X, #8
> PF tst PF_CTL, #0xF
> vld1.32 {d4, d5, d6, d7}, [DST_R, :128]!
> PF addne PF_X, PF_X, #8
> PF subne PF_CTL, PF_CTL, #1
> - vst1.32 {d28, d29, d30, d31}, [DST_W, :128]!
> + PF cmp DUMMY, #0
> + PF beq 5f
> + vst1.32 {d28, d29, d30, d31}, [DST_W, :128]
> +5: add DST_W, DST_W, #32
> PF cmp PF_X, ORIG_W
> PF pld, [PF_SRC, PF_X, lsl #src_bpp_shift]
> PF pld, [PF_DST, PF_X, lsl #dst_bpp_shift]
As it happens, I messed up and attached the wrong work-in-progress patch
here (not the one that was intended) :) Naturally, the dangling 'init' and
'cleanup' macros don't affect anything, and the 'head' macro needs to be
updated too in order to put the correct initial value on the stack; otherwise
the outcome of the first branch is undefined. Anyway, I think I just need to
provide a final patch a bit later and be done with it.
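
For illustration, the control flow of the quoted 'tail_head' code can be
modelled in C roughly as below. This is only a sketch under assumptions, not
the final patch: the function names are made up, the ADD operation is assumed
to be a per-channel saturating add, and a block is assumed to be 8 pixels. The
point it shows is that the flag tested in each iteration belongs to the
previous block, so the 'head' step has to initialise it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_PIXELS 8 /* assumed pixblock size: 8 pixels of 32 bits */

/* Per-channel saturating add of two 32bpp pixels (assumed ADD semantics). */
static uint32_t add_sat_8888(uint32_t a, uint32_t b)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t s = ((a >> shift) & 0xff) + ((b >> shift) & 0xff);
        if (s > 0xff)
            s = 0xff;
        out |= s << shift;
    }
    return out;
}

/* OR of all pixels in one source block; zero means there is nothing to add. */
static uint32_t block_or(const uint32_t *src)
{
    uint32_t acc = 0;
    for (int i = 0; i < BLOCK_PIXELS; i++)
        acc |= src[i];
    return acc;
}

static void store_block(uint32_t *dst, const uint32_t *src)
{
    for (int i = 0; i < BLOCK_PIXELS; i++)
        dst[i] = add_sat_8888(dst[i], src[i]);
}

/* Returns the number of destination blocks actually written. */
size_t add_8888_8888_skip_zero(uint32_t *dst, const uint32_t *src,
                               size_t n_blocks)
{
    size_t stores = 0;
    uint32_t flag;

    if (n_blocks == 0)
        return 0;
    assert(dst != NULL && src != NULL);

    /* 'head': the flag for block 0 must be set before the first test */
    flag = block_or(src);

    for (size_t b = 1; b < n_blocks; b++) {
        uint32_t prev_flag = flag;               /* flag of block b - 1 */
        flag = block_or(src + b * BLOCK_PIXELS); /* flag of block b */
        if (prev_flag != 0) {                    /* skip store if all zero */
            store_block(dst + (b - 1) * BLOCK_PIXELS,
                        src + (b - 1) * BLOCK_PIXELS);
            stores++;
        }
    }

    /* 'tail': handle the last block */
    if (flag != 0) {
        store_block(dst + (n_blocks - 1) * BLOCK_PIXELS,
                    src + (n_blocks - 1) * BLOCK_PIXELS);
        stores++;
    }
    return stores;
}
```

If the initial assignment to flag were missing, the first comparison would
read an indeterminate value, which corresponds to the undefined first branch
mentioned above.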
--
Best regards,
Siarhei Siamashka
_______________________________________________
Pixman mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/pixman
