Re: [x265] Fwd: [PATCH Only Review, don't merge] Assembly routine for filterHorizontal_p_p() for 4 tap filter

chen Fri, 20 Sep 2013 18:56:23 -0700

At 2013-09-21 02:35:59,"Jason Garrett-Glaser" <[email protected]> wrote:
>
>> To implement this change, we need to modify the HM code.
>> [MC] We can define the table in the asm file, but we have to modify HM. Of
>> course, that part is easy.
>
>You don't have to, of course (you know the code better than I and
>whether or not it's a good idea to change it).


If we don't modify the code, we can't know which coefficient group the caller wants;
HEVC has 4 coefficient groups for qpel, which is different from H.264.
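For illustration, here is a minimal C reference of what passing a coefficient index (instead of the coefficients themselves) could look like. It assumes the standard HEVC chroma 4-tap table and 8-bit pixel-to-pixel output with shift 6 and rounding offset 32. The function name `filter_horizontal_4tap_p_p` and its signature are hypothetical, not the HM or x265 interface. An asm version would keep the same table in its data section and index it by `coeff_idx`. The 4-tap chroma filter has eight fractional positions. The luma qpel filter that the "4 groups" refers to uses 8-tap rows, but the indexing idea is the same.

```c
#include <stdint.h>

/* HEVC chroma 4-tap interpolation coefficients, one row per 1/8-pel
   fractional position; row 0 is the integer position. Each row sums to 64. */
static const int16_t chroma_filter[8][4] = {
    {  0, 64,  0,  0 },
    { -2, 58, 10, -2 },
    { -4, 54, 16, -2 },
    { -6, 46, 28, -4 },
    { -4, 36, 36, -4 },
    { -4, 28, 46, -6 },
    { -2, 16, 54, -4 },
    { -2, 10, 58, -2 },
};

static uint8_t clip_pixel(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Hypothetical 8-bit pixel-to-pixel horizontal filter. The caller passes
   coeff_idx, so the routine looks the coefficients up in its own table.
   Each row of src needs 1 valid pixel to the left and 2 to the right. */
static void filter_horizontal_4tap_p_p(const uint8_t *src, intptr_t src_stride,
                                       uint8_t *dst, intptr_t dst_stride,
                                       int width, int height, int coeff_idx)
{
    const int16_t *c = chroma_filter[coeff_idx];
    const int shift = 6;                 /* filter gain is 64 = 1 << 6 */
    const int offset = 1 << (shift - 1); /* rounding offset: 32 */

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            const uint8_t *s = src + x - 1; /* taps at x-1 .. x+2 */
            int sum = c[0] * s[0] + c[1] * s[1] + c[2] * s[2] + c[3] * s[3];
            dst[x] = clip_pixel((sum + offset) >> shift);
        }
        src += src_stride;
        dst += dst_stride;
    }
}
```

With `coeff_idx` 0 the output equals the input. With `coeff_idx` 4 (the half-pel position), a linear ramp in steps of 10 comes out shifted up by 5.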
 
>>> +
>>> +    mov         tmp,        offset2
>>> +    movd        sumOffset,  tmp
>>> +    pshufd      sumOffset,  sumOffset,  0
>>
>> You can movd directly from memory; going through a register is much
>> slower, especially on AMD machines.
>> [MC] Do you mean we put the constant into memory and load it once?
>
>movd sumOffset, offset2

I looked at the documentation before; I don't think there is an instruction
that supports 'movd reg, constant' on Intel CPUs.
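Below is a sketch, using SSE2 intrinsics, of two ways to get the rounding offset into every dword lane. `movd` has no immediate form, but it does accept a 32-bit memory operand. That is presumably what `movd sumOffset, offset2` means when `offset2` lives in memory. A constant that is already broadcast to 16 bytes also avoids the `pshufd`. The name `pd_32` and the value 32 (8-bit pixel-to-pixel rounding) are assumptions. The code needs an x86 target.

```c
#include <emmintrin.h> /* SSE2 intrinsics (x86 only) */
#include <stdint.h>

/* Path used in the patch: the offset goes through a general-purpose register
   into the low dword, then pshufd copies it to all four lanes. */
static __m128i broadcast_via_movd(int32_t offset)
{
    __m128i v = _mm_cvtsi32_si128(offset); /* movd xmm, r32 */
    return _mm_shuffle_epi32(v, 0);        /* pshufd xmm, xmm, 0 */
}

/* Alternative: keep the offset in memory already broadcast (hypothetical
   name pd_32) and load the whole vector once, outside the filter loop. */
static const int32_t pd_32[4] = { 32, 32, 32, 32 };

static __m128i broadcast_from_memory(void)
{
    /* movdqu; a 16-byte-aligned table would allow movdqa */
    return _mm_loadu_si128((const __m128i *)pd_32);
}
```

Both return the same vector. The second form needs no general-purpose register round trip and no shuffle.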
 
>> [MC] No way, the x264 macro has a bug here. You can remove the 'reduce x2' and check
>> the output; the implicit xmm0 mask seems to be an Intel limitation.
>
>That makes sense, I don't think the x264 macro was ever designed to
>support non-AVX pblendvb.  I don't recommend non-AVX pblendvb anyways
>as it's a lot slower because of the extra register dependency (it's
>like 3 uops or something).

Replacing it with 'pand + pandn + por' costs 3 uops but has less dependency.
In Agner's documents, pblendvb is listed as 2 uops, latency 2, and throughput 1
on Sandy Bridge (my machine), so I chose it.
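To make the trade-off concrete, here is a scalar C model of the two blend forms. This is not SIMD code. It models non-VEX `pblendvb xmm1, xmm2, <xmm0>`, which takes each byte from the second operand when the top bit of the matching xmm0 byte is set. It compares that with the bitwise select that `pand + pandn + por` computes. The two agree whenever every mask byte is 0x00 or 0xFF, which is what a packed compare produces. The function names are hypothetical.

```c
#include <stdint.h>

/* Model of pblendvb: byte i comes from b if bit 7 of mask[i] is set,
   otherwise from a. */
static void blendv_bytes(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                         const uint8_t *mask, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = (mask[i] & 0x80) ? b[i] : a[i];
}

/* Model of pand + pandn + por: per-bit select, (b & mask) | (a & ~mask). */
static void select_bits(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                        const uint8_t *mask, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = (uint8_t)((b[i] & mask[i]) | (a[i] & (uint8_t)~mask[i]));
}
```

With a mask byte of 0x80 the two differ. `pblendvb` only looks at the top bit, while the and/andn/or sequence selects bit by bit.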
 
Of course, this branch is a bad one; the code is for the testbench only.
In the real world, the minimum block is 4x8, so the width is 4 and movd is enough.
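This sketch shows why width 4 removes the need for the blend. With 8-bit pixels, any width that is a multiple of 4 splits into full 16-, 8- and 4-byte stores (movdqu, movq, movd). A masked write would only be needed for a 1 to 3 byte tail, and only arbitrary testbench widths produce one. `plan_row_stores` and `store_plan` are hypothetical helpers, not x265 code.

```c
#include <stdbool.h>

/* Number of 16-, 8- and 4-byte stores needed for one row. */
typedef struct {
    int n16, n8, n4;
} store_plan;

/* Split an 8-bit row of `width` pixels into movdqu/movq/movd-sized stores.
   Returns false when a 1..3 byte tail remains, i.e. when a masked write
   would still be needed. */
static bool plan_row_stores(int width, store_plan *p)
{
    p->n16 = width / 16;
    int rest = width % 16;
    p->n8 = rest / 8;
    rest %= 8;
    p->n4 = rest / 4;
    rest %= 4;
    return rest == 0;
}
```

For width 4 this gives a single 4-byte store, which is the movd case.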

>Jason
>_______________________________________________
>x265-devel mailing list
>[email protected]
>https://mailman.videolan.org/listinfo/x265-devel
