On Sat, Aug 5, 2017 at 9:10 PM, Ivan Kalvachev <ikalvac...@gmail.com> wrote:
> +%macro VBROADCASTSS 2 ; dst xmm/ymm, src m32/xmm
> +%if cpuflag(avx2)
> + vbroadcastss %1, %2 ; ymm, xmm
> +%elif cpuflag(avx)
> + %ifnum sizeof%2 ; avx1 register
> + vpermilps xmm%1, xmm%2, q0000 ; xmm, xmm, imm || ymm, ymm, imm

Nit: Use shufps instead of vpermilps; it's one byte shorter but otherwise identical in this case:

c5 e8 c6 ca 00       vshufps xmm1,xmm2,xmm2,0x0
c4 e3 79 04 ca 00    vpermilps xmm1,xmm2,0x0

> +%macro BLENDVPS 3 ; dst/src_a, src_b, mask
> +%if cpuflag(avx)
> + blendvps %1, %1, %2, %3
> +%elif cpuflag(sse4)
> + %if notcpuflag(avx)
> + %ifnidn %3,xmm0
> + %error sse41 blendvps uses xmm0 as default 3d operand, you used %3
> + %endif
> + %endif

The notcpuflag(avx) check is redundant: it's always true there, since AVX already takes the first branch.

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
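As a rough cross-check of the shufps nit above, here is a scalar C sketch modelling only the 128-bit (xmm) forms of both instructions; the function names are hypothetical. SHUFPS takes its low two lanes from the first source and its high two from the second, while VPERMILPS takes all four lanes from its single source. So when both SHUFPS sources are the same register, the two agree for every imm8, and q0000 (imm 0) broadcasts lane 0.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of SHUFPS dst, a, b, imm8 (xmm form):
 * lanes 0-1 are selected from a, lanes 2-3 from b, two imm bits per lane. */
static void model_shufps(float dst[4], const float a[4], const float b[4],
                         unsigned imm)
{
    float r[4];
    assert(imm <= 0xff); /* imm8 */
    r[0] = a[(imm >> 0) & 3];
    r[1] = a[(imm >> 2) & 3];
    r[2] = b[(imm >> 4) & 3];
    r[3] = b[(imm >> 6) & 3];
    memcpy(dst, r, sizeof r); /* safe even if dst aliases a or b */
}

/* Hypothetical model of VPERMILPS dst, src, imm8 (xmm form):
 * every lane is selected from src, two imm bits per lane. */
static void model_vpermilps(float dst[4], const float src[4], unsigned imm)
{
    float r[4];
    assert(imm <= 0xff); /* imm8 */
    for (int i = 0; i < 4; i++)
        r[i] = src[(imm >> (2 * i)) & 3];
    memcpy(dst, r, sizeof r);
}

/* Returns 1 if shufps(x, x, imm) == vpermilps(x, imm) for all 256 imm8 values. */
static int shufps_matches_vpermilps(const float x[4])
{
    for (unsigned imm = 0; imm < 256; imm++) {
        float s[4], p[4];
        model_shufps(s, x, x, imm);
        model_vpermilps(p, x, imm);
        if (memcmp(s, p, sizeof s) != 0)
            return 0;
    }
    return 1;
}
```

For example, with x = {1, 2, 3, 4}, model_shufps(b, x, x, 0x00) leaves b = {1, 1, 1, 1}, and shufps_matches_vpermilps(x) returns 1. The model says nothing about encoding size; the one-byte saving comes from vshufps fitting the 2-byte VEX prefix (c5) while vpermilps needs the 3-byte one (c4), as the disassembly above shows.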
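To illustrate what the quoted BLENDVPS macro computes, here is a minimal C sketch of the per-lane selection, assuming the macro's operand order (dst doubles as src_a). The name model_blendvps is hypothetical. Only the sign bit of each mask lane matters. The model takes the mask as an explicit argument, like the AVX form; the SSE4.1 encoding reads it implicitly from xmm0, which is why the macro raises %error for any other mask register.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical model of BLENDVPS dst/src_a, src_b, mask (xmm form):
 * lane i takes src_b[i] if the sign bit of mask[i] is set,
 * otherwise it keeps dst[i] (i.e. src_a[i]). */
static void model_blendvps(float dst[4], const float src_b[4],
                           const float mask[4])
{
    for (int i = 0; i < 4; i++) {
        if (signbit(mask[i])) /* true for -0.0f as well */
            dst[i] = src_b[i];
    }
}
```

For instance, a mask of {-1.0f, 1.0f, -0.0f, 0.0f} replaces lanes 0 and 2 and keeps lanes 1 and 3, since -0.0f also has its sign bit set.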