Re: [libav-devel] [PATCH] checkasm: add HEVC MC tests
Minor nits: +#define randomize_buffers(buf, size, depth) s/buffers/buffer/ since you're only randomizing a single one at a time. +static const char *interp_names[2][2] = { { pixels, h }, { v, hv } }; const char * const Otherwise lgtm. ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] [PATCH] hevcdsp: add x86 SIMD for MC
On 21/08/15 4:19 AM, Anton Khirnov wrote: +%macro PUT_WEIGHTED_PRED 3 +%if %1 +cglobal hevc_put_weighted_pred_avg_ %+ %2 %+ _ %+ %3, 11, 11, 8, denom, weight0, weight1, offset0, offset1, dst, dststride, src0, src1, srcstride, height +%else +cglobal hevc_put_weighted_pred_ %+ %2 %+ _ %+ %3, 8, 8, 8, denom, weight0, offset0, dst, dststride, src0, srcstride, height +%endif +and heightq,0x7fff + +add denomq, 14 + %1 - %3 +movqm0, denomq demon is an uint8_t. This should be add denomd, 14 + %1 - %3 movdm0, denomd I don't think doing a movzx denomd, denomb to clear bits 9 to 31 is necessary, so the above should suffice. ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] [PATCH] hevcdsp: add x86 SIMD for MC
On 22/08/15 1:16 PM, Anton Khirnov wrote: +%macro QPEL_8 2 +%if %2 +%define postfixv +%define mvfrac myq Same here and below the else, rename this to mvfracq and add a mvfracd. +%define pixstride srcstrideq +%define pixstride3 sstride3q +%define src_m3 srcm3q +%else +%define postfixh +%define mvfrac mxq +%define pixstride 1 +%define pixstride3 3 +%define src_m3 (srcq - 3) +%endif + +cglobal hevc_qpel_ %+ postfix %+ _ %+ %1 %+ _8, 8, 10, 7, dst, dststride, src, srcstride, height, mx, my, sstride3, srcm3, coeffsreg This should be 7, 10, 7, Otherwise you're loading sstride3 from stack as if it were a function argument. Ideally though, for vertical you'd use 5, 9, 7 then manually load either mx or my instead of both, saving one register, or even 5, 8, 7, since coeffsreg and mvfrac are only used during init, and you can easily reuse one of those two registers for sstride3 or srcm3. You can also push it down to 4, 7, 7 if you manually load height before or after the SPLATWs and reuse the regs for coeffsreg and mvfrac. As a plus, this would make the functions work with x86_32. For horizontal you don't even need sstride3 or srcm3, so you definitely should declare and use less registers. Didn't check other functions but I'm sure similar optimizations can be done. +%if %2 +and mvfrac, 0x3 +%endif +dec mvfrac +shl mvfrac, 4 Use mvfracd on these three, it will clear the high bits for the mova below. anding the whole register with 3/7 should also work fine, with less clutter. and mvfrac, 0x3 is only in ff_hevc_qpel_v_* functions, but not ff_hevc_qpel_h_*. It's the same with the and mvfrac, 0x7 cases below. You need to use the d suffix instead of q on the register names to make sure the high bits are cleared. ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] [PATCH] hevcdsp: add x86 SIMD for MC
Quoting James Almer (2015-08-21 19:10:19) On 21/08/15 4:19 AM, Anton Khirnov wrote: + +add dstq, dststrideq +add srcq, srcstrideq + +%assign i (i + 1) +%endrep + +dec heightq This and every other case should be heightd. There's no guarantee the high bits will be zero on every x86_64 target. This is the source of the crashes i was getting. Right, fixed. +jg .loop +RET +%endmacro + +INIT_XMM sse2 +GET_PIXELS 4, 8, 1 +GET_PIXELS 8, 8, 1 +GET_PIXELS 12, 8, 3 +GET_PIXELS 16, 8, 2 +GET_PIXELS 24, 8, 3 +GET_PIXELS 32, 8, 3 +GET_PIXELS 48, 8, 3 +GET_PIXELS 64, 8, 3 + +GET_PIXELS 4, 10, 1 +GET_PIXELS 8, 10, 1 +GET_PIXELS 12, 10, 3 +GET_PIXELS 16, 10, 2 +GET_PIXELS 24, 10, 3 +GET_PIXELS 32, 10, 3 +GET_PIXELS 48, 10, 3 +GET_PIXELS 64, 10, 3 + +; hevc_qpel_h/v_w_8(int16_t *dst, ptrdiff_t dststride, +; uint8_t *src, ptrdiff_t srcstride, +; int height, int mx, int my, int *mcbuffer) + +; 8-bit qpel interpolation +; %1: block width +; %2: 0 - horizontal; 1 - vertical +%macro QPEL_8 2 +%if %2 +%define postfixv +%define mvfrac myq Same here and below the else, rename this to mvfracq and add a mvfracd. +%define pixstride srcstrideq +%define pixstride3 sstride3q +%define src_m3 srcm3q +%else +%define postfixh +%define mvfrac mxq +%define pixstride 1 +%define pixstride3 3 +%define src_m3 (srcq - 3) +%endif + +cglobal hevc_qpel_ %+ postfix %+ _ %+ %1 %+ _8, 8, 10, 7, dst, dststride, src, srcstride, height, mx, my, sstride3, srcm3, coeffsreg +%if %2 +and mvfrac, 0x3 +%endif +dec mvfrac +shl mvfrac, 4 Use mvfracd on these three, it will clear the high bits for the mova below. anding the whole register with 3/7 should also work fine, with less clutter. -- Anton Khirnov ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel