Re: [FFmpeg-devel] [PATCH V1 1/3] lavu: Add alpha blending API based on row.
On Wed, Sep 26, 2018 at 6:58 AM Marton Balint wrote:
>
> On Tue, 25 Sep 2018, Jun Zhao wrote:
>
> > Add alpha blending API based on row, support global alpha blending/
> > per-pixel blending, and add SSSE3/AVX2 optimizations of the functions.
>
> You might want to take a look at
> libavfilter/vf_framerate.c and libavfilter/x86/vf_framerate.asm as well,
> they do something similar. Maybe you should factorize that instead.

Yep, this is a good suggestion. I think we can factor this part out and
supply a public 8-bit/16-bit blend API with SSSE3/AVX2 optimization, then
use that API in vf_framerate, vf_blend (blend_normal_8bit/16bit), and
vf_minterpolate (blend mode).

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
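For illustration, here is a minimal scalar sketch of the kind of 8-bit row blend such a shared API would wrap. It uses the same rounding formula as the C fallback in the patch under review. The name blend_row_8bit_sketch and its exact signature are hypothetical and not part of any FFmpeg API; a 16-bit variant would need wider intermediates and a different scale.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an FFmpeg API): blend two 8-bit rows with a
 * single global alpha, using the formula from the patch's C fallback:
 *     dst = (src0 * a + src1 * (255 - a) + 255) >> 8
 * The largest intermediate value is 255 * 255 + 255, which fits in int. */
static void blend_row_8bit_sketch(const uint8_t *src0, const uint8_t *src1,
                                  uint8_t alpha, uint8_t *dst, size_t width)
{
    for (size_t x = 0; x < width; x++)
        dst[x] = (uint8_t)((src0[x] * alpha +
                            src1[x] * (255 - alpha) + 255) >> 8);
}
```

With this formula, alpha = 255 reproduces src0 exactly and alpha = 0 reproduces src1 exactly, which is the property the "+ 255" bias is there to preserve.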
Re: [FFmpeg-devel] [PATCH V1 1/3] lavu: Add alpha blending API based on row.
On 9/25/2018 10:45 PM, myp...@gmail.com wrote:
> On Wed, Sep 26, 2018 at 3:55 AM Rostislav Pehlivanov wrote:
>>
>> On 25 September 2018 at 16:27, Jun Zhao wrote:
>>
>>> Add alpha blending API based on row, support global alpha blending/
>>> per-pixel blending, and add SSSE3/AVX2 optimizations of the functions.
>>
>> We don't use inline asm on x86 and we don't use global contexts. Look at
>> how float_dsp is done.
>
> I guess you precisely mean "prefer NASM assembler over inline asm on x86". :)
> In fact, I know of some x86 inline asm in FFmpeg, e.g.
> libavcodec/x86/h264_cabac. (grep "__asm__ volatile" finds more x86
> inline asm.) Do we need to update the inline asm on x86 rule in
> https://github.com/FFmpeg/FFmpeg/blob/master/doc/optimization.txt?

Yes, we still have some inline asm, either because nobody has gotten
around to porting it to NASM syntax after the project moved to it, or
because, as with CABAC and some single-instruction functions in
libavutil, it makes sense for it to be inline since the call overhead
would kill performance.

That document could use some polishing, but in any case, as stated in
the "Inline asm vs. external asm" section, we have for several years
required new code that calls external functions to be written in NASM
syntax, as is the case with this patchset.

>
> Thanks.
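As a rough illustration of the "no global contexts, look at float_dsp" advice quoted above: float_dsp exposes a context struct of function pointers that is filled in once at init time, with per-architecture init code replacing the C defaults. A minimal sketch of that pattern applied to row blending might look like the following. All names here (BlendDSPSketch, blend_dsp_sketch_init, blend_row_c_sketch) are hypothetical and do not exist in FFmpeg; the x86 hook is only indicated by a comment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context of function pointers, modelled loosely on the
 * float_dsp approach; not an actual FFmpeg structure. */
typedef struct BlendDSPSketch {
    void (*blend_row)(const uint8_t *src0, const uint8_t *src1,
                      uint8_t alpha, uint8_t *dst, size_t width);
} BlendDSPSketch;

/* Portable C fallback using the formula from the patch. */
static void blend_row_c_sketch(const uint8_t *src0, const uint8_t *src1,
                               uint8_t alpha, uint8_t *dst, size_t width)
{
    for (size_t x = 0; x < width; x++)
        dst[x] = (uint8_t)((src0[x] * alpha +
                            src1[x] * (255 - alpha) + 255) >> 8);
}

/* Fill in the context once; callers keep it and reuse the pointer. */
static void blend_dsp_sketch_init(BlendDSPSketch *dsp)
{
    dsp->blend_row = blend_row_c_sketch;
    /* Assumption: a real implementation would call an arch-specific
     * init here that overrides blend_row with an external NASM routine
     * when the CPU flags allow it. */
}
```

The point of the design is that the implementation is selected once per context, rather than being looked up again inside every call as av_global_blend_row() does in the patch.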
Re: [FFmpeg-devel] [PATCH V1 1/3] lavu: Add alpha blending API based on row.
On Wed, Sep 26, 2018 at 3:55 AM Rostislav Pehlivanov wrote:
>
> On 25 September 2018 at 16:27, Jun Zhao wrote:
>
> > Add alpha blending API based on row, support global alpha blending/
> > per-pixel blending, and add SSSE3/AVX2 optimizations of the functions.
>
> We don't use inline asm on x86 and we don't use global contexts. Look at
> how float_dsp is done.

I guess you precisely mean "prefer NASM assembler over inline asm on x86". :)
In fact, I know of some x86 inline asm in FFmpeg, e.g.
libavcodec/x86/h264_cabac. (grep "__asm__ volatile" finds more x86
inline asm.) Do we need to update the inline asm on x86 rule in
https://github.com/FFmpeg/FFmpeg/blob/master/doc/optimization.txt?

Thanks.
Re: [FFmpeg-devel] [PATCH V1 1/3] lavu: Add alpha blending API based on row.
On Tue, 25 Sep 2018, Jun Zhao wrote:

> Add alpha blending API based on row, support global alpha blending/
> per-pixel blending, and add SSSE3/AVX2 optimizations of the functions.

You might want to take a look at libavfilter/vf_framerate.c and
libavfilter/x86/vf_framerate.asm as well, they do something similar.
Maybe you should factorize that instead.

Regards,
Marton
Re: [FFmpeg-devel] [PATCH V1 1/3] lavu: Add alpha blending API based on row.
On 25 September 2018 at 16:27, Jun Zhao wrote:

> Add alpha blending API based on row, support global alpha blending/
> per-pixel blending, and add SSSE3/AVX2 optimizations of the functions.
>
> Signed-off-by: Jun Zhao
> ---
>  libavutil/Makefile         |   2 +
>  libavutil/blend.c          | 101
>  libavutil/blend.h          |  47 ++
>  libavutil/x86/Makefile     |   3 +-
>  libavutil/x86/blend.h      |  32
>  libavutil/x86/blend_init.c | 369 ++++
>  6 files changed, 553 insertions(+), 1 deletions(-)
>  create mode 100644 libavutil/blend.c
>  create mode 100644 libavutil/blend.h
>  create mode 100644 libavutil/x86/blend.h
>  create mode 100644 libavutil/x86/blend_init.c
>
> diff --git a/libavutil/Makefile b/libavutil/Makefile
> index 9ed24cf..f1c06e4 100644
> --- a/libavutil/Makefile
> +++ b/libavutil/Makefile
> @@ -10,6 +10,7 @@ HEADERS = adler32.h \
>            avstring.h \
>            avutil.h \
>            base64.h \
> +          blend.h \
>            blowfish.h \
>            bprint.h \
>            bswap.h \
> @@ -95,6 +96,7 @@ OBJS = adler32.o \
>         audio_fifo.o \
>         avstring.o \
>         base64.o \
> +       blend.o \
>         blowfish.o \
>         bprint.o \
>         buffer.o \
> diff --git a/libavutil/blend.c b/libavutil/blend.c
> new file mode 100644
> index 000..e28efa0
> --- /dev/null
> +++ b/libavutil/blend.c
> @@ -0,0 +1,101 @@
> +/*
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
> + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
> + */
> +
> +#include "libavutil/attributes.h"
> +#include "libavutil/cpu.h"
> +#include "libavutil/mem.h"
> +#include "libavutil/x86/asm.h"
> +#include "libavutil/blend.h"
> +
> +#include "libavutil/x86/blend.h"
> +
> +static void ff_global_blend_row_c(const uint8_t *src0,
> +                                  const uint8_t *src1,
> +                                  const uint8_t *alpha, /* XXX: only use alpha[0] */
> +                                  uint8_t *dst,
> +                                  int width)
> +{
> +    int x;
> +    for (x = 0; x < width - 1; x += 2) {
> +        dst[0] = (src0[0] * alpha[0] + src1[0] * (255 - alpha[0]) + 255) >> 8;
> +        dst[1] = (src0[1] * alpha[0] + src1[1] * (255 - alpha[0]) + 255) >> 8;
> +        src0 += 2;
> +        src1 += 2;
> +        dst += 2;
> +    }
> +    if (width & 1) {
> +        dst[0] = (src0[0] * alpha[0] + src1[0] * (255 - alpha[0]) + 255) >> 8;
> +    }
> +}
> +
> +void av_global_blend_row(const uint8_t *src0,
> +                         const uint8_t *src1,
> +                         const uint8_t *alpha,
> +                         uint8_t *dst,
> +                         int width)
> +{
> +    blend_row blend_row_fn = NULL;
> +
> +#if ARCH_X86
> +    blend_row_fn = ff_blend_row_init_x86(1);
> +#endif
> +
> +    if (!blend_row_fn)
> +        blend_row_fn = ff_global_blend_row_c;
> +
> +    blend_row_fn(src0, src1, alpha, dst, width);
> +}
> +
> +static void ff_per_pixel_blend_row_c(const uint8_t *src0,
> +                                     const uint8_t *src1,
> +                                     const uint8_t *alpha,
> +                                     uint8_t *dst,
> +                                     int width)
> +{
> +    int x;
> +    for (x = 0; x < width - 1; x += 2) {
> +        dst[0] = (src0[0] * alpha[0] + src1[0] * (255 - alpha[0]) + 255) >> 8;
> +        dst[1] = (src0[1] * alpha[0] + src1[1] * (255 - alpha[0]) +