On Tue, Mar 17, 2015 at 01:08:06AM -0300, James Almer wrote:
> Signed-off-by: James Almer <jamr...@gmail.com>
> ---
> GCC apparently can't generate a bzhi instruction on its own from the c
> version, so here's a custom implementation.
>
> Before:
>
> gcc -O3
> <av_zhb_c>:
>    0:   89 f1                   mov    ecx,esi
>    2:   ba 01 00 00 00          mov    edx,0x1
>    7:   d3 e2                   shl    edx,cl
>    9:   83 ea 01                sub    edx,0x1
>    c:   89 d0                   mov    eax,edx
>    e:   21 f8                   and    eax,edi
>   10:   c3                      ret
>
> gcc -mbmi2 -O3
> <av_zhb_c>:
>    0:   ba 01 00 00 00          mov    edx,0x1
>    5:   c4 e2 49 f7 d2          shlx   edx,edx,esi
>    a:   8d 42 ff                lea    eax,[rdx-0x1]
>    d:   21 f8                   and    eax,edi
>    f:   c3                      ret
>
> After:
>
> gcc -mbmi2 -O3
> <av_zhb_bmi2>:
>    0:   c4 e2 48 f5 c7          bzhi   eax,edi,esi
>    5:   c3                      ret
>
> The non-bmi2 example is a bit bloated with movs to have values in ecx
> (needed for shl) and eax (ret value) since, unlike the actual function,
> it was not inlined.
> Still, best case scenario is mov + shl + sub/dec/lea + and versus a
> single bzhi when p is not a constant.
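
For readers following along, here is a minimal sketch of the two forms being compared. The portable body is inferred from the disassembly above (mov 1, shl, sub 1, and), not copied from the patch, and the names zhb_c and zhb are hypothetical. The BMI2 path uses the compiler intrinsic _bzhi_u32 from <immintrin.h> purely for illustration; the patch itself is not shown here and may well use a different mechanism. The sketch assumes p < 32: shifting a 32-bit value by 32 or more is undefined in C, whereas bzhi leaves the source unchanged for such indices.

```c
#include <stdint.h>

#if defined(__BMI2__)
#include <immintrin.h>   /* _bzhi_u32; only available when built with -mbmi2 */
#endif

/* Portable form: keep the low p bits of a and zero the rest.
 * Assumption: p < 32 (a larger shift count would be undefined behaviour). */
static inline uint32_t zhb_c(uint32_t a, unsigned p)
{
    return a & ((UINT32_C(1) << p) - 1);
}

/* Hypothetical wrapper: a single bzhi when BMI2 is enabled at compile time,
 * otherwise fall back to the portable form. */
static inline uint32_t zhb(uint32_t a, unsigned p)
{
#if defined(__BMI2__)
    return _bzhi_u32(a, p);
#else
    return zhb_c(a, p);
#endif
}
```

Building with gcc -mbmi2 -O3 selects the intrinsic path, while a plain gcc -O3 build uses the shift-and-mask fallback. Both should agree for every p in the range 0..31.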

Orthogonal to this patch, you or someone else might want to submit a patch
to GCC so that it generates this optimization automatically.

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The bravest are surely those who have the clearest vision of what is before
them, glory and danger alike, and yet notwithstanding go out to meet it.
-- Thucydides