Re: [FFmpeg-devel] [PATCH 2/2] libswscale/x86/yuv2rgb: add ssse3 version
> -Original Message-
> From: ffmpeg-devel On Behalf Of Michael Niedermayer
> Sent: Friday, November 29, 2019 04:51 AM
> To: FFmpeg development discussions and patches
> Subject: Re: [FFmpeg-devel] [PATCH 2/2] libswscale/x86/yuv2rgb: add ssse3 version
>
> On Thu, Nov 28, 2019 at 02:07:08PM +0800, Ting Fu wrote:
> > Signed-off-by: Ting Fu
> > ---
> > libswscale/x86/yuv2rgb.c | 5 +
> > libswscale/x86/yuv2rgb_template.c | 58 ++-
> > libswscale/x86/yuv_2_rgb.asm | 163 +++---
> > 3 files changed, 208 insertions(+), 18 deletions(-)
>
> breaks build on x86-32
> make
> X86ASM libswscale/x86/yuv_2_rgb.o
> src/libswscale/x86/yuv_2_rgb.asm:400: error: parser: instruction expected
> src/libswscale/x86/yuv_2_rgb.asm:331: ... from macro `yuv2rgb_fn' defined here
> src/libswscale/x86/yuv_2_rgb.asm:400: error: parser: instruction expected
> src/libswscale/x86/yuv_2_rgb.asm:332: ... from macro `yuv2rgb_fn' defined here
> src/libswscale/x86/yuv_2_rgb.asm:400: error: parser: instruction expected
> src/libswscale/x86/yuv_2_rgb.asm:333: ... from macro `yuv2rgb_fn' defined here
> src/libswscale/x86/yuv_2_rgb.asm:401: error: label `BROADCAST' inconsistently redefined
> src/libswscale/x86/yuv_2_rgb.asm:331: ... from macro `yuv2rgb_fn' defined here
> src/libswscale/x86/yuv_2_rgb.asm:400: note: label `BROADCAST' originally defined here
> src/libswscale/x86/yuv_2_rgb.asm:331: ... from macro `yuv2rgb_fn' defined here
> src/libswscale/x86/yuv_2_rgb.asm:401: error: parser: instruction expected
> src/libswscale/x86/yuv_2_rgb.asm:331: ... from macro `yuv2rgb_fn' defined here
> src/libswscale/x86/yuv_2_rgb.asm:401: error: parser: instruction expected
> src/libswscale/x86/yuv_2_rgb.asm:332: ... from macro `yuv2rgb_fn' defined here
> src/libswscale/x86/yuv_2_rgb.asm:401: error: parser: instruction expected
> src/libswscale/x86/yuv_2_rgb.asm:333: ... from macro `yuv2rgb_fn' defined here
> make: *** [libswscale/x86/yuv_2_rgb.o] Error
>

Hi Michael,

This error occurs because the BROADCAST macro is defined only under ARCH_X86_64. I have replaced it with VBROADCASTSD (defined in x86util.asm) in PATCH V2. In addition, the md5 test passed on linux32/64 and windows64.

Thank you for the review.
Ting Fu

>
> [...]
> --
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> I do not agree with what you have to say, but I'll defend to the death your right
> to say it. -- Voltaire
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
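The failure mode described in the reply above (a helper macro that exists only under one architecture guard but is used unconditionally) can be sketched in C. This is only an analogue of the asm situation, not the code from the patch: DEMO_ARCH_X86_64, DEMO_BROADCAST, demo_broadcast and demo_all_equal are hypothetical names invented for illustration, and the real fix lives in the asm file via VBROADCASTSD from x86util.asm.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Failure mode (shown as a comment, not compiled). DEMO_ARCH_X86_64 is a
 * hypothetical stand-in for ARCH_X86_64:
 *
 *   #if DEMO_ARCH_X86_64
 *   #define DEMO_BROADCAST(dst, v) memset((dst), (v), 16)
 *   #endif
 *   ...
 *   DEMO_BROADCAST(buf, 7);   // undefined when DEMO_ARCH_X86_64 is 0
 *
 * On the build where the guard is false, the name is unknown at the point
 * of use, which is the same class of error as in the x86-32 log above.
 */

/* Fix, by analogy: use one helper that is defined on every architecture. */
static void demo_broadcast(uint8_t dst[16], uint8_t value)
{
    assert(dst != NULL);
    memset(dst, value, 16);   /* replicate the byte into all 16 lanes */
}

/* Returns 1 if all 16 bytes of buf equal value, 0 otherwise. */
static int demo_all_equal(const uint8_t buf[16], uint8_t value)
{
    for (int i = 0; i < 16; i++)
        if (buf[i] != value)
            return 0;
    return 1;
}
```

Because the helper is no longer tied to a single architecture guard, both the 32-bit and 64-bit builds see the same definition.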
Re: [FFmpeg-devel] [PATCH 2/2] libswscale/x86/yuv2rgb: add ssse3 version
On Thu, Nov 28, 2019 at 02:07:08PM +0800, Ting Fu wrote:
> Signed-off-by: Ting Fu
> ---
> libswscale/x86/yuv2rgb.c | 5 +
> libswscale/x86/yuv2rgb_template.c | 58 ++-
> libswscale/x86/yuv_2_rgb.asm | 163 +++---
> 3 files changed, 208 insertions(+), 18 deletions(-)

breaks build on x86-32
make
X86ASM libswscale/x86/yuv_2_rgb.o
src/libswscale/x86/yuv_2_rgb.asm:400: error: parser: instruction expected
src/libswscale/x86/yuv_2_rgb.asm:331: ... from macro `yuv2rgb_fn' defined here
src/libswscale/x86/yuv_2_rgb.asm:400: error: parser: instruction expected
src/libswscale/x86/yuv_2_rgb.asm:332: ... from macro `yuv2rgb_fn' defined here
src/libswscale/x86/yuv_2_rgb.asm:400: error: parser: instruction expected
src/libswscale/x86/yuv_2_rgb.asm:333: ... from macro `yuv2rgb_fn' defined here
src/libswscale/x86/yuv_2_rgb.asm:401: error: label `BROADCAST' inconsistently redefined
src/libswscale/x86/yuv_2_rgb.asm:331: ... from macro `yuv2rgb_fn' defined here
src/libswscale/x86/yuv_2_rgb.asm:400: note: label `BROADCAST' originally defined here
src/libswscale/x86/yuv_2_rgb.asm:331: ... from macro `yuv2rgb_fn' defined here
src/libswscale/x86/yuv_2_rgb.asm:401: error: parser: instruction expected
src/libswscale/x86/yuv_2_rgb.asm:331: ... from macro `yuv2rgb_fn' defined here
src/libswscale/x86/yuv_2_rgb.asm:401: error: parser: instruction expected
src/libswscale/x86/yuv_2_rgb.asm:332: ... from macro `yuv2rgb_fn' defined here
src/libswscale/x86/yuv_2_rgb.asm:401: error: parser: instruction expected
src/libswscale/x86/yuv_2_rgb.asm:333: ... from macro `yuv2rgb_fn' defined here
make: *** [libswscale/x86/yuv_2_rgb.o] Error

[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I do not agree with what you have to say, but I'll defend to the death your right to say it.
-- Voltaire
Re: [FFmpeg-devel] [PATCH 2/2] libswscale/x86/yuv2rgb: add ssse3 version
> -Original Message-
> From: ffmpeg-devel On Behalf Of Carl Eugen Hoyos
> Sent: Thursday, November 28, 2019 02:29 PM
> To: FFmpeg development discussions and patches
> Subject: Re: [FFmpeg-devel] [PATCH 2/2] libswscale/x86/yuv2rgb: add ssse3 version
>
> > On 28.11.2019 at 07:07, Ting Fu wrote:
> >
> > +#if HAVE_SSSE3
> > +#define COMPILE_TEMPLATE_SSSE3 1
> > +#endif
>
> Please add a line about performance to the commit message.
>
> Carl Eugen

Hi Carl,

Sorry for the missing performance info. I tested the patch with a raw YUV video, using this command:

./ffmpeg -pix_fmt yuv420p -s 1920*1080 -i input.yuv -vcodec rawvideo -s 1920*1080 -pix_fmt rgb24 -f null /dev/null

The results on my local machine are as follows:

output fmt RGB24:  mmx: 337fps  ssse3: 634fps
output fmt RGB32:  mmx: 375fps  ssse3: 653fps
output fmt RGB555: mmx: 427fps  ssse3: 917fps

I will add this information in PATCH V2.

Thank you,
Ting Fu
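For a commit message it can help to state the relative gain as well as the raw frame rates. The following is a minimal sketch that derives the speedups from the fps figures reported in the message above; the struct, table and function names are illustrative and not part of the patch.

```c
#include <assert.h>
#include <stdio.h>

/* One row of the benchmark quoted above (fps as reported in the mail). */
struct bench_row {
    const char *fmt;
    double mmx_fps;
    double ssse3_fps;
};

static const struct bench_row bench_rows[] = {
    { "RGB24",  337.0, 634.0 },
    { "RGB32",  375.0, 653.0 },
    { "RGB555", 427.0, 917.0 },
};

/* Speedup of the new path relative to the baseline (ratio of frame rates). */
static double speedup(double baseline_fps, double new_fps)
{
    assert(baseline_fps > 0.0);
    return new_fps / baseline_fps;
}

/* Print one summary line per output format. */
static void print_speedups(void)
{
    size_t n = sizeof(bench_rows) / sizeof(bench_rows[0]);
    for (size_t i = 0; i < n; i++) {
        const struct bench_row *r = &bench_rows[i];
        printf("%-6s mmx %.0f fps, ssse3 %.0f fps, speedup %.2fx\n",
               r->fmt, r->mmx_fps, r->ssse3_fps,
               speedup(r->mmx_fps, r->ssse3_fps));
    }
}
```

Calling print_speedups() from a main function reports ratios of about 1.88x for RGB24, 1.74x for RGB32 and 2.15x for RGB555.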
Re: [FFmpeg-devel] [PATCH 2/2] libswscale/x86/yuv2rgb: add ssse3 version
> On 28.11.2019 at 07:07, Ting Fu wrote:
>
> +#if HAVE_SSSE3
> +#define COMPILE_TEMPLATE_SSSE3 1
> +#endif

Please add a line about performance to the commit message.

Carl Eugen
[FFmpeg-devel] [PATCH 2/2] libswscale/x86/yuv2rgb: add ssse3 version
Signed-off-by: Ting Fu
---
 libswscale/x86/yuv2rgb.c          |   5 +
 libswscale/x86/yuv2rgb_template.c |  58 ++-
 libswscale/x86/yuv_2_rgb.asm      | 163 +++---
 3 files changed, 208 insertions(+), 18 deletions(-)

diff --git a/libswscale/x86/yuv2rgb.c b/libswscale/x86/yuv2rgb.c
index 70412a3914..d983934762 100644
--- a/libswscale/x86/yuv2rgb.c
+++ b/libswscale/x86/yuv2rgb.c
@@ -61,6 +61,11 @@ DECLARE_ASM_CONST(8, uint64_t, pb_07) = 0x0707070707070707ULL;
 #define COMPILE_TEMPLATE_MMXEXT 1
 #endif /* HAVE_MMXEXT */
 
+//SSSE3 versions
+#if HAVE_SSSE3
+#define COMPILE_TEMPLATE_SSSE3 1
+#endif
+
 #include "yuv2rgb_template.c"
 
 av_cold SwsFunc ff_yuv2rgb_init_x86(SwsContext *c)
diff --git a/libswscale/x86/yuv2rgb_template.c b/libswscale/x86/yuv2rgb_template.c
index efe6356f30..fe586047f0 100644
--- a/libswscale/x86/yuv2rgb_template.c
+++ b/libswscale/x86/yuv2rgb_template.c
@@ -40,6 +40,30 @@
         const uint8_t *pv = src[2] + (y >> vshift) * srcStride[2]; \
         x86_reg index = -h_size / 2; \
 
+extern void ff_yuv_420_rgb24_ssse3(x86_reg index, uint8_t *image, const uint8_t *pu_index,
+                                   const uint8_t *pv_index, const uint8_t *pointer_c_dither,
+                                   const uint8_t *py_2index);
+extern void ff_yuv_420_bgr24_ssse3(x86_reg index, uint8_t *image, const uint8_t *pu_index,
+                                   const uint8_t *pv_index, const uint8_t *pointer_c_dither,
+                                   const uint8_t *py_2index);
+extern void ff_yuv_420_rgb15_ssse3(x86_reg index, uint8_t *image, const uint8_t *pu_index,
+                                   const uint8_t *pv_index, const uint8_t *pointer_c_dither,
+                                   const uint8_t *py_2index);
+extern void ff_yuv_420_rgb16_ssse3(x86_reg index, uint8_t *image, const uint8_t *pu_index,
+                                   const uint8_t *pv_index, const uint8_t *pointer_c_dither,
+                                   const uint8_t *py_2index);
+extern void ff_yuv_420_rgb32_ssse3(x86_reg index, uint8_t *image, const uint8_t *pu_index,
+                                   const uint8_t *pv_index, const uint8_t *pointer_c_dither,
+                                   const uint8_t *py_2index);
+extern void ff_yuv_420_bgr32_ssse3(x86_reg index, uint8_t *image, const uint8_t *pu_index,
+                                   const uint8_t *pv_index, const uint8_t *pointer_c_dither,
+                                   const uint8_t *py_2index);
+extern void ff_yuva_420_rgb32_ssse3(x86_reg index, uint8_t *image, const uint8_t *pu_index,
+                                    const uint8_t *pv_index, const uint8_t *pointer_c_dither,
+                                    const uint8_t *py_2index, const uint8_t *pa_2index);
+extern void ff_yuva_420_bgr32_ssse3(x86_reg index, uint8_t *image, const uint8_t *pu_index,
+                                    const uint8_t *pv_index, const uint8_t *pointer_c_dither,
+                                    const uint8_t *py_2index, const uint8_t *pa_2index);
 extern void ff_yuv_420_rgb24_mmxext(x86_reg index, uint8_t *image, const uint8_t *pu_index,
                                     const uint8_t *pv_index, const uint8_t *pointer_c_dither,
                                     const uint8_t *py_2index);
@@ -84,7 +108,12 @@ static inline int yuv420_rgb15(SwsContext *c, const uint8_t *src[],
         c->greenDither = ff_dither8[y & 1];
         c->redDither = ff_dither8[(y + 1) & 1];
 #endif
+
+#if COMPILE_TEMPLATE_SSSE3
+        ff_yuv_420_rgb15_ssse3(index, image, pu - index, pv - index, &(c->redDither), py - 2 * index);
+#else
         ff_yuv_420_rgb15_mmx(index, image, pu - index, pv - index, &(c->redDither), py - 2 * index);
+#endif
     }
     return srcSliceH;
 }
@@ -102,7 +131,12 @@ static inline int yuv420_rgb16(SwsContext *c, const uint8_t *src[],
         c->greenDither = ff_dither4[y & 1];
         c->redDither = ff_dither8[(y + 1) & 1];
 #endif
+
+#if COMPILE_TEMPLATE_SSSE3
+        ff_yuv_420_rgb16_ssse3(index, image, pu - index, pv - index, &(c->redDither), py - 2 * index);
+#else
         ff_yuv_420_rgb16_mmx(index, image, pu - index, pv - index, &(c->redDither), py - 2 * index);
+#endif
     }
     return srcSliceH;
 }
@@ -115,7 +149,9 @@ static inline int yuv420_rgb24(SwsContext *c, const uint8_t *src[],
     int y, h_size, vshift;
 
     YUV2RGB_LOOP(3)
-#if COMPILE_TEMPLATE_MMXEXT
+#if COMPILE_TEMPLATE_SSSE3
+        ff_yuv_420_rgb24_ssse3(index, image, pu - index, pv - index, &(c->redDither), py - 2 * index);
+#elif COMPILE_TEMPLATE_MMXEXT
         ff_yuv_420_rgb24_mmxext(index, image, pu - index, pv - index, &(c->redDither), py - 2 * index);
 #else
         ff_yuv_420_rgb24_mmx(index, image, pu - index, pv - index, &(c->redDither), py - 2 * index);
@@ -132,7 +168,9 @@ static inline int yuv420_bgr24(SwsContext *c, const uint8_t *src[],
     int y, h_size,