2017-12-17 19:41 GMT+01:00 Henrik Gramner :
> On Thu, Dec 14, 2017 at 11:16 AM, Martin Vignali
> wrote:
> > 2017-12-13 17:37 GMT+01:00 Henrik Gramner :
> >> You could also do vextracti128 + 128-bit packuswb instead of 256-bit
> >> packuswb + vpermq.
> >>
> > Sorry, I don't understand this part.
> > d
On Thu, Dec 14, 2017 at 11:16 AM, Martin Vignali
wrote:
> 2017-12-13 17:37 GMT+01:00 Henrik Gramner :
>> You could also do vextracti128 + 128-bit packuswb instead of 256-bit
>> packuswb + vpermq.
>>
> Sorry, I don't understand this part.
> Do you mean 128-bit packuswb + movh for each lane?
> or somet
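
For reference, the two pack sequences being compared can be modelled in plain C. This is only a scalar sketch, not code from the patch: the helper names are made up for illustration, the starting layout (16 words spread over both lanes of one ymm register) is assumed from the discussion, and the vpermq immediate 0xD8 is an assumption since the thread does not state which one is used.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Unsigned saturation of a signed word to a byte, as packuswb does. */
static uint8_t sat_u8(int16_t v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* 128-bit packuswb: low 8 bytes come from a, high 8 bytes from b. */
static void packuswb128(uint8_t dst[16], const int16_t a[8], const int16_t b[8])
{
    for (int i = 0; i < 8; i++) {
        dst[i]     = sat_u8(a[i]);
        dst[i + 8] = sat_u8(b[i]);
    }
}

/* 256-bit vpackuswb: the 128-bit operation applied to each lane separately. */
static void packuswb256(uint8_t dst[32], const int16_t a[16], const int16_t b[16])
{
    packuswb128(dst,      a,     b);
    packuswb128(dst + 16, a + 8, b + 8);
}

/* vpermq: each 2-bit field of imm selects the source quadword. */
static void vpermq(uint8_t dst[32], const uint8_t src[32], unsigned imm)
{
    uint8_t tmp[32];
    for (int i = 0; i < 4; i++)
        memcpy(tmp + 8 * i, src + 8 * ((imm >> (2 * i)) & 3), 8);
    memcpy(dst, tmp, 32);
}

/* Variant A: 256-bit packuswb m0, m0 then vpermq (0xD8 is assumed here). */
static void pack_permute(uint8_t out[16], const int16_t w[16])
{
    uint8_t packed[32];
    packuswb256(packed, w, w);
    vpermq(packed, packed, 0xD8);
    memcpy(out, packed, 16);          /* result lives in the low 128 bits */
}

/* Variant B: vextracti128 xm1, m0, 1 then 128-bit packuswb xm0, xm1. */
static void extract_pack(uint8_t out[16], const int16_t w[16])
{
    const int16_t *hi = w + 8;        /* vextracti128: upper lane */
    packuswb128(out, w, hi);
}

/* Both variants should leave the same 16 bytes in the low half. */
static void check_pack_variants(const int16_t w[16])
{
    uint8_t a[16], b[16];
    pack_permute(a, w);
    extract_pack(b, w);
    assert(memcmp(a, b, 16) == 0);
}
```

The model only shows that the two sequences produce the same 16 output bytes; it says nothing about which one is faster on a given CPU.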
2017-12-13 17:37 GMT+01:00 Henrik Gramner :
> On Sat, Dec 9, 2017 at 1:11 PM, Martin Vignali
> wrote:
> > the idea in AVX2 is to load 128 bits of data (2x 64 bits),
> > then shuffle across lanes, putting the two 64-bit halves in the low part of each lane, to
> > keep the rest of the process similar
> > to the
On Sat, Dec 9, 2017 at 1:11 PM, Martin Vignali wrote:
> the idea in AVX2 is to load 128 bits of data (2x 64 bits),
> then shuffle across lanes, putting the two 64-bit halves in the low part of each lane, to
> keep the rest of the process similar
> to the SSE version
What about using pmovzxbw instead of movu + vp
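
The suggestion above is cut off after "vp", so the sketch below does not try to guess the rest. It only models, in scalar C, why a single pmovzxbw can stand in for the load-and-shuffle scheme described in the quoted text. The helper names are hypothetical, and the per-lane widening step is assumed to be punpcklbw against a zero register, which the thread does not confirm.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* vpmovzxbw ymm, xmm/m128: zero-extend 16 bytes to 16 words, in order. */
static void pmovzxbw256(uint16_t dst[16], const uint8_t src[16])
{
    for (int i = 0; i < 16; i++)
        dst[i] = src[i];
}

/* Scheme from the quoted text: load 128 bits, move each 64-bit half to the
 * low part of a lane, then widen per lane (assumed: punpcklbw with zero). */
static void load_shuffle_unpack(uint16_t dst[16], const uint8_t src[16])
{
    uint8_t reg[32] = {0};

    /* load + cross-lane shuffle: one 64-bit half at the bottom of each lane */
    memcpy(reg,      src,     8);
    memcpy(reg + 16, src + 8, 8);

    /* per-lane punpcklbw with zero: each low byte, interleaved with a zero
     * byte, forms a zero-extended little-endian word */
    for (int lane = 0; lane < 2; lane++)
        for (int i = 0; i < 8; i++)
            dst[8 * lane + i] = reg[16 * lane + i];
}

/* Both paths should produce the same 16 words. */
static void check_widen_variants(const uint8_t src[16])
{
    uint16_t a[16], b[16];
    pmovzxbw256(a, src);
    load_shuffle_unpack(b, src);
    assert(memcmp(a, b, sizeof(a)) == 0);
}
```

Under those assumptions the output layout is identical, so the rest of the 16-bit processing would not need to change.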
Hello,
Attached are patches adding an AVX2 version of each 8-bit function (except divide):
001 : avutil : add ABS2 for avx2
002 : avfilter : add AVX2 version
For most of the functions, the AVX2 version is a simple modification
(VBROADCASTi128 for constant loading)
when the process stays in 8 bits.
when the process use inter