On Fri, Sep 20, 2013 at 8:55 PM, chen <[email protected]> wrote:
> At 2013-09-21 02:35:59,"Jason Garrett-Glaser" <[email protected]> wrote:
> >
> >
> >> To implement this change , we need to modify HM code.
> >> [MC] we can define the table in asm file, but we have to modify HM. of
> >> course, it is easy things
> >
> >You don't have to, of course (you know the code better than I and
> >whether or not it's a good idea to change it).
> If we don't modify code, we can't know which coef group they want,
> HEVC have 4 group to qpel, it is different to h264
>
> >>> +
> >>> + mov tmp, offset2
> >>> + movd sumOffset, tmp
> >>> + pshufd sumOffset, sumOffset, 0
> >>
> >> You can movd directly from memory; going through a register is much
> >> slower, especially on AMD machines.
> >> [MC] are you means, we put constant into memory and load it once?
> >
> >movd sumOffset, offset2
> I look the document before, I think there haven't instruction support
> ' movd reg, constant ' on Intel CPU
> >
> >> [MC] no way, x264 macro have a bug here, you can remove reduce x2 and check
> >> the output, the xmm0 seems Intel limit
> >
> >That makes sense, I don't think the x264 macro was ever designed to
> >support non-AVX pblendvb. I don't recommend non-AVX pblendvb anyways
> >as it's a lot slower because of the extra register dependency (it's
> >like 3 uops or something).
> replace by 'pand + pandn + por' is 3 uops but less dependency,
> in Agner's documents, he said pblendvb is 2-uops, 2-latency and 1-through
> on my Sandy, so I select it.
>
> Of course, this is a bad branch, the code for testbench only.
> in really world, the minimum block is 4x8, the width is 4, movd is enough.
>
This sounds like a testbench bug then. Let's not keep dead code in the primitive just because the testbench covers unrealistic block sizes.

Cheers

--
Steve Borho
_______________________________________________
x265-devel mailing list
[email protected]
https://mailman.videolan.org/listinfo/x265-devel
