Re: [FFmpeg-devel] [PATCH 2/2] swscale/arm/yuv2rgb: add ff_yuv420p_to_{argb, rgba, abgr, bgra}_neon_{16, 32}
On Sat, Dec 19, 2015 at 06:08:10PM +0100, Michael Niedermayer wrote: > On Sat, Dec 19, 2015 at 01:28:06PM +0100, Matthieu Bouron wrote: > > On Sat, Dec 19, 2015 at 11:50:21AM +0100, Michael Niedermayer wrote: > > > On Sat, Dec 19, 2015 at 10:56:26AM +0100, Matthieu Bouron wrote: > > > > On Sat, Dec 19, 2015 at 03:24:17AM +0100, Michael Niedermayer wrote: > > > > > On Fri, Dec 18, 2015 at 04:33:37PM +0100, Matthieu Bouron wrote: > > > > > > On Fri, Dec 18, 2015 at 3:59 PM, Matthieu Bouron > > > > > >> > > > > > wrote: > > > > > > > > > > > > > From: Matthieu Bouron > > > > > > > > > > > > > > --- > > > > > > > libswscale/arm/swscale_unscaled.c | 52 +++--- > > > > > > > libswscale/arm/yuv2rgb_neon.S | 77 > > > > > > > --- > > > > > > > 2 files changed, 118 insertions(+), 11 deletions(-) > > > > > > > > > > > > > > diff --git a/libswscale/arm/swscale_unscaled.c > > > > > > > b/libswscale/arm/swscale_unscaled.c > > > > > > > index 8aa6432..dce987e 100644 > > > > > > > --- a/libswscale/arm/swscale_unscaled.c > > > > > > > +++ b/libswscale/arm/swscale_unscaled.c > > > > > > > @@ -63,6 +63,50 @@ static int > > > > > > > rgbx_to_nv12_neon_16_wrapper(SwsContext > > > > > > > *context, const uint8_t *src[ > > > > > > > } > > > > > > > #endif > > > > > > > > > > > > > > +#define YUV_TO_RGB_TABLE(precision) > > > > > > >\ > > > > > > > +c->yuv2rgb_v2r_coeff / ((precision) == 16 ? 1 << 7 : 1), > > > > > > > \ > > > > > > > +c->yuv2rgb_u2g_coeff / ((precision) == 16 ? 1 << 7 : 1), > > > > > > > \ > > > > > > > +c->yuv2rgb_v2g_coeff / ((precision) == 16 ? 1 << 7 : 1), > > > > > > > \ > > > > > > > +c->yuv2rgb_u2b_coeff / ((precision) == 16 ? 1 << 7 : 1), > > > > > > > \ > > > > > > > + > > > > > > > +#define DECLARE_FF_YUV420P_TO_RGBX_FUNCS(ofmt, precision) > > > > > > >\ > > > > > > > +int ff_yuv420p_to_##ofmt##_neon_##precision(int w, int h, > > > > > > >\ > > > > > > > + uint8_t *dst, int linesize, > > > > > > > \ > > > > > > > + const uint8_t *srcY, int > > > > > > > linesizeY, > > > > > > > \ > > > > > > > + const uint8_t *srcU, int > > > > > > > linesizeU, > > > > > > > \ > > > > > > > + const uint8_t *srcV, int > > > > > > > linesizeV, > > > > > > > \ > > > > > > > + const int16_t *table, > > > > > > > \ > > > > > > > + int y_offset, > > > > > > > \ > > > > > > > + int y_coeff); > > > > > > > \ > > > > > > > + > > > > > > > \ > > > > > > > +static int > > > > > > > yuv420p_to_##ofmt##_neon_wrapper_##precision(SwsContext *c, > > > > > > > const uint8_t *src[],\ > > > > > > > + int srcStride[], int > > > > > > > srcSliceY, int srcSliceH, \ > > > > > > > + uint8_t *dst[], int > > > > > > > dstStride[]) { \ > > > > > > > +const int16_t yuv2rgb_table[] = { > > > > > > > YUV_TO_RGB_TABLE(precision) }; > > > > > > > \ > > > > > > > + > > > > > > > \ > > > > > > > +ff_yuv420p_to_##ofmt##_neon_##precision(c->srcW, srcSliceH, > > > > > > >\ > > > > > > > + dst[0] + srcSliceY * > > > > > > > dstStride[0], > > > > > > > dstStride[0], \ > > > > > > > + src[0] + srcSliceY * > > > > > > > srcStride[0], > > > > > > > srcStride[0], \ > > > > > > > + src[1] + (srcSliceY / 2) * > > > > > > > srcStride[1], > > > > > > > srcStride[1], \ > > > > > > > + src[2] + (srcSliceY / 2) * > > > > > > > srcStride[2], > > > > > > > srcStride[2], \ > > > > > > > + yuv2rgb_table, > > > > > > >\ > > > > > > > + c->yuv2rgb_y_offset >> 9, > > > > > > > \ > > > > > > > + c->yuv2rgb_y_coeff / > > > > > > > ((precision) == 16 > > > > > > > ? 1 << 7 : 1));\ > > > > > > > + > > > > > > > \ > > > > > > > +return 0; > > > > > > >\ > > > > > > > +} > > > > > > >\ > > > > > > > + > > > > > > > +#define DECLARE_FF_YUV420P_TO_ALL_RGBX_FUNCS(precision) > > > > > > >\ > > > > > > > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(argb, precision) > > > > > > >\ > > > > >
Re: [FFmpeg-devel] [PATCH 2/2] swscale/arm/yuv2rgb: add ff_yuv420p_to_{argb, rgba, abgr, bgra}_neon_{16, 32}
On Sat, Dec 19, 2015 at 01:28:06PM +0100, Matthieu Bouron wrote: > On Sat, Dec 19, 2015 at 11:50:21AM +0100, Michael Niedermayer wrote: > > On Sat, Dec 19, 2015 at 10:56:26AM +0100, Matthieu Bouron wrote: > > > On Sat, Dec 19, 2015 at 03:24:17AM +0100, Michael Niedermayer wrote: > > > > On Fri, Dec 18, 2015 at 04:33:37PM +0100, Matthieu Bouron wrote: > > > > > On Fri, Dec 18, 2015 at 3:59 PM, Matthieu Bouron > > > > >> > > > > wrote: > > > > > > > > > > > From: Matthieu Bouron > > > > > > > > > > > > --- > > > > > > libswscale/arm/swscale_unscaled.c | 52 +++--- > > > > > > libswscale/arm/yuv2rgb_neon.S | 77 > > > > > > --- > > > > > > 2 files changed, 118 insertions(+), 11 deletions(-) > > > > > > > > > > > > diff --git a/libswscale/arm/swscale_unscaled.c > > > > > > b/libswscale/arm/swscale_unscaled.c > > > > > > index 8aa6432..dce987e 100644 > > > > > > --- a/libswscale/arm/swscale_unscaled.c > > > > > > +++ b/libswscale/arm/swscale_unscaled.c > > > > > > @@ -63,6 +63,50 @@ static int > > > > > > rgbx_to_nv12_neon_16_wrapper(SwsContext > > > > > > *context, const uint8_t *src[ > > > > > > } > > > > > > #endif > > > > > > > > > > > > +#define YUV_TO_RGB_TABLE(precision) > > > > > >\ > > > > > > +c->yuv2rgb_v2r_coeff / ((precision) == 16 ? 1 << 7 : 1), > > > > > > \ > > > > > > +c->yuv2rgb_u2g_coeff / ((precision) == 16 ? 1 << 7 : 1), > > > > > > \ > > > > > > +c->yuv2rgb_v2g_coeff / ((precision) == 16 ? 1 << 7 : 1), > > > > > > \ > > > > > > +c->yuv2rgb_u2b_coeff / ((precision) == 16 ? 1 << 7 : 1), > > > > > > \ > > > > > > + > > > > > > +#define DECLARE_FF_YUV420P_TO_RGBX_FUNCS(ofmt, precision) > > > > > >\ > > > > > > +int ff_yuv420p_to_##ofmt##_neon_##precision(int w, int h, > > > > > >\ > > > > > > + uint8_t *dst, int linesize, > > > > > > \ > > > > > > + const uint8_t *srcY, int > > > > > > linesizeY, > > > > > > \ > > > > > > + const uint8_t *srcU, int > > > > > > linesizeU, > > > > > > \ > > > > > > + const uint8_t *srcV, int > > > > > > linesizeV, > > > > > > \ > > > > > > + const int16_t *table, > > > > > > \ > > > > > > + int y_offset, > > > > > > \ > > > > > > + int y_coeff); > > > > > > \ > > > > > > + > > > > > > \ > > > > > > +static int yuv420p_to_##ofmt##_neon_wrapper_##precision(SwsContext > > > > > > *c, > > > > > > const uint8_t *src[],\ > > > > > > + int srcStride[], int > > > > > > srcSliceY, int srcSliceH, \ > > > > > > + uint8_t *dst[], int > > > > > > dstStride[]) { \ > > > > > > +const int16_t yuv2rgb_table[] = { YUV_TO_RGB_TABLE(precision) > > > > > > }; > > > > > > \ > > > > > > + > > > > > > \ > > > > > > +ff_yuv420p_to_##ofmt##_neon_##precision(c->srcW, srcSliceH, > > > > > >\ > > > > > > + dst[0] + srcSliceY * > > > > > > dstStride[0], > > > > > > dstStride[0], \ > > > > > > + src[0] + srcSliceY * > > > > > > srcStride[0], > > > > > > srcStride[0], \ > > > > > > + src[1] + (srcSliceY / 2) * > > > > > > srcStride[1], > > > > > > srcStride[1], \ > > > > > > + src[2] + (srcSliceY / 2) * > > > > > > srcStride[2], > > > > > > srcStride[2], \ > > > > > > + yuv2rgb_table, > > > > > >\ > > > > > > + c->yuv2rgb_y_offset >> 9, > > > > > > \ > > > > > > + c->yuv2rgb_y_coeff / ((precision) > > > > > > == 16 > > > > > > ? 1 << 7 : 1));\ > > > > > > + > > > > > > \ > > > > > > +return 0; > > > > > >\ > > > > > > +} > > > > > >\ > > > > > > + > > > > > > +#define DECLARE_FF_YUV420P_TO_ALL_RGBX_FUNCS(precision) > > > > > >\ > > > > > > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(argb, precision) > > > > > >\ > > > > > > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(rgba, precision) > > > > > >\ > > > > > > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(abgr, precision) > > > > > >\ > > > > > > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(bgra, precision) > > > > > >\ >
Re: [FFmpeg-devel] [PATCH 2/2] swscale/arm/yuv2rgb: add ff_yuv420p_to_{argb, rgba, abgr, bgra}_neon_{16, 32}
On Sat, Dec 19, 2015 at 03:24:17AM +0100, Michael Niedermayer wrote: > On Fri, Dec 18, 2015 at 04:33:37PM +0100, Matthieu Bouron wrote: > > On Fri, Dec 18, 2015 at 3:59 PM, Matthieu Bouron> > wrote: > > > > > From: Matthieu Bouron > > > > > > --- > > > libswscale/arm/swscale_unscaled.c | 52 +++--- > > > libswscale/arm/yuv2rgb_neon.S | 77 > > > --- > > > 2 files changed, 118 insertions(+), 11 deletions(-) > > > > > > diff --git a/libswscale/arm/swscale_unscaled.c > > > b/libswscale/arm/swscale_unscaled.c > > > index 8aa6432..dce987e 100644 > > > --- a/libswscale/arm/swscale_unscaled.c > > > +++ b/libswscale/arm/swscale_unscaled.c > > > @@ -63,6 +63,50 @@ static int rgbx_to_nv12_neon_16_wrapper(SwsContext > > > *context, const uint8_t *src[ > > > } > > > #endif > > > > > > +#define YUV_TO_RGB_TABLE(precision) > > >\ > > > +c->yuv2rgb_v2r_coeff / ((precision) == 16 ? 1 << 7 : 1), > > > \ > > > +c->yuv2rgb_u2g_coeff / ((precision) == 16 ? 1 << 7 : 1), > > > \ > > > +c->yuv2rgb_v2g_coeff / ((precision) == 16 ? 1 << 7 : 1), > > > \ > > > +c->yuv2rgb_u2b_coeff / ((precision) == 16 ? 1 << 7 : 1), > > > \ > > > + > > > +#define DECLARE_FF_YUV420P_TO_RGBX_FUNCS(ofmt, precision) > > >\ > > > +int ff_yuv420p_to_##ofmt##_neon_##precision(int w, int h, > > >\ > > > + uint8_t *dst, int linesize, > > > \ > > > + const uint8_t *srcY, int linesizeY, > > > \ > > > + const uint8_t *srcU, int linesizeU, > > > \ > > > + const uint8_t *srcV, int linesizeV, > > > \ > > > + const int16_t *table, > > > \ > > > + int y_offset, > > > \ > > > + int y_coeff); > > > \ > > > + > > > \ > > > +static int yuv420p_to_##ofmt##_neon_wrapper_##precision(SwsContext *c, > > > const uint8_t *src[],\ > > > + int srcStride[], int > > > srcSliceY, int srcSliceH, \ > > > + uint8_t *dst[], int > > > dstStride[]) { \ > > > +const int16_t yuv2rgb_table[] = { YUV_TO_RGB_TABLE(precision) }; > > > \ > > > + > > > \ > > > +ff_yuv420p_to_##ofmt##_neon_##precision(c->srcW, srcSliceH, > > >\ > > > + dst[0] + srcSliceY * dstStride[0], > > > dstStride[0], \ > > > + src[0] + srcSliceY * srcStride[0], > > > srcStride[0], \ > > > + src[1] + (srcSliceY / 2) * srcStride[1], > > > srcStride[1], \ > > > + src[2] + (srcSliceY / 2) * srcStride[2], > > > srcStride[2], \ > > > + yuv2rgb_table, > > >\ > > > + c->yuv2rgb_y_offset >> 9, > > > \ > > > + c->yuv2rgb_y_coeff / ((precision) == 16 > > > ? 1 << 7 : 1));\ > > > + > > > \ > > > +return 0; > > >\ > > > +} > > >\ > > > + > > > +#define DECLARE_FF_YUV420P_TO_ALL_RGBX_FUNCS(precision) > > >\ > > > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(argb, precision) > > >\ > > > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(rgba, precision) > > >\ > > > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(abgr, precision) > > >\ > > > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(bgra, precision) > > >\ > > > + > > > +#define DECLARE_FF_YUV420P_TO_ALL_RGBX_ALL_PRECISION_FUNCS > > > \ > > > +DECLARE_FF_YUV420P_TO_ALL_RGBX_FUNCS(16) > > > \ > > > + > > > +DECLARE_FF_YUV420P_TO_ALL_RGBX_ALL_PRECISION_FUNCS > > > + > > > #define DECLARE_FF_NVX_TO_RGBX_FUNCS(ifmt, ofmt, precision) > > >\ > > > int ff_##ifmt##_to_##ofmt##_neon_##precision(int w, int h, > > > \ > > > uint8_t *dst, int linesize, > > > \ > > > @@ -75,12 +119,7 @@ int ff_##ifmt##_to_##ofmt##_neon_##precision(int w, > > > int h, > > > static int ifmt##_to_##ofmt##_neon_wrapper_##precision(SwsContext *c, > > > const uint8_t *src[], \ > > > int srcStride[], int > > > srcSliceY, int srcSliceH, \ > > > uint8_t *dst[], int > > > dstStride[]) { \ > >
Re: [FFmpeg-devel] [PATCH 2/2] swscale/arm/yuv2rgb: add ff_yuv420p_to_{argb, rgba, abgr, bgra}_neon_{16, 32}
On Sat, Dec 19, 2015 at 10:56:26AM +0100, Matthieu Bouron wrote: > On Sat, Dec 19, 2015 at 03:24:17AM +0100, Michael Niedermayer wrote: > > On Fri, Dec 18, 2015 at 04:33:37PM +0100, Matthieu Bouron wrote: > > > On Fri, Dec 18, 2015 at 3:59 PM, Matthieu Bouron > > >> > > wrote: > > > > > > > From: Matthieu Bouron > > > > > > > > --- > > > > libswscale/arm/swscale_unscaled.c | 52 +++--- > > > > libswscale/arm/yuv2rgb_neon.S | 77 > > > > --- > > > > 2 files changed, 118 insertions(+), 11 deletions(-) > > > > > > > > diff --git a/libswscale/arm/swscale_unscaled.c > > > > b/libswscale/arm/swscale_unscaled.c > > > > index 8aa6432..dce987e 100644 > > > > --- a/libswscale/arm/swscale_unscaled.c > > > > +++ b/libswscale/arm/swscale_unscaled.c > > > > @@ -63,6 +63,50 @@ static int rgbx_to_nv12_neon_16_wrapper(SwsContext > > > > *context, const uint8_t *src[ > > > > } > > > > #endif > > > > > > > > +#define YUV_TO_RGB_TABLE(precision) > > > >\ > > > > +c->yuv2rgb_v2r_coeff / ((precision) == 16 ? 1 << 7 : 1), > > > > \ > > > > +c->yuv2rgb_u2g_coeff / ((precision) == 16 ? 1 << 7 : 1), > > > > \ > > > > +c->yuv2rgb_v2g_coeff / ((precision) == 16 ? 1 << 7 : 1), > > > > \ > > > > +c->yuv2rgb_u2b_coeff / ((precision) == 16 ? 1 << 7 : 1), > > > > \ > > > > + > > > > +#define DECLARE_FF_YUV420P_TO_RGBX_FUNCS(ofmt, precision) > > > >\ > > > > +int ff_yuv420p_to_##ofmt##_neon_##precision(int w, int h, > > > >\ > > > > + uint8_t *dst, int linesize, > > > > \ > > > > + const uint8_t *srcY, int linesizeY, > > > > \ > > > > + const uint8_t *srcU, int linesizeU, > > > > \ > > > > + const uint8_t *srcV, int linesizeV, > > > > \ > > > > + const int16_t *table, > > > > \ > > > > + int y_offset, > > > > \ > > > > + int y_coeff); > > > > \ > > > > + > > > > \ > > > > +static int yuv420p_to_##ofmt##_neon_wrapper_##precision(SwsContext *c, > > > > const uint8_t *src[],\ > > > > + int srcStride[], int > > > > srcSliceY, int srcSliceH, \ > > > > + uint8_t *dst[], int > > > > dstStride[]) { \ > > > > +const int16_t yuv2rgb_table[] = { YUV_TO_RGB_TABLE(precision) }; > > > > \ > > > > + > > > > \ > > > > +ff_yuv420p_to_##ofmt##_neon_##precision(c->srcW, srcSliceH, > > > >\ > > > > + dst[0] + srcSliceY * > > > > dstStride[0], > > > > dstStride[0], \ > > > > + src[0] + srcSliceY * > > > > srcStride[0], > > > > srcStride[0], \ > > > > + src[1] + (srcSliceY / 2) * > > > > srcStride[1], > > > > srcStride[1], \ > > > > + src[2] + (srcSliceY / 2) * > > > > srcStride[2], > > > > srcStride[2], \ > > > > + yuv2rgb_table, > > > >\ > > > > + c->yuv2rgb_y_offset >> 9, > > > > \ > > > > + c->yuv2rgb_y_coeff / ((precision) == > > > > 16 > > > > ? 1 << 7 : 1));\ > > > > + > > > > \ > > > > +return 0; > > > >\ > > > > +} > > > >\ > > > > + > > > > +#define DECLARE_FF_YUV420P_TO_ALL_RGBX_FUNCS(precision) > > > >\ > > > > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(argb, precision) > > > >\ > > > > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(rgba, precision) > > > >\ > > > > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(abgr, precision) > > > >\ > > > > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(bgra, precision) > > > >\ > > > > + > > > > +#define DECLARE_FF_YUV420P_TO_ALL_RGBX_ALL_PRECISION_FUNCS > > > > \ > > > > +DECLARE_FF_YUV420P_TO_ALL_RGBX_FUNCS(16) > > > > \ > > > > + > > > > +DECLARE_FF_YUV420P_TO_ALL_RGBX_ALL_PRECISION_FUNCS > > > > + > > > > #define DECLARE_FF_NVX_TO_RGBX_FUNCS(ifmt, ofmt, precision) > > > >\ > > > > int ff_##ifmt##_to_##ofmt##_neon_##precision(int w, int h, > > > > \ > > > > uint8_t *dst, int linesize, > > > > \ > > > > @@ -75,12 +119,7 @@ int ff_##ifmt##_to_##ofmt##_neon_##precision(int
Re: [FFmpeg-devel] [PATCH 2/2] swscale/arm/yuv2rgb: add ff_yuv420p_to_{argb, rgba, abgr, bgra}_neon_{16, 32}
On Thu, Dec 17, 2015 at 07:47:08PM +0100, Michael Niedermayer wrote: > On Thu, Dec 17, 2015 at 04:54:31PM +0100, Matthieu Bouron wrote: > > On Tue, Dec 15, 2015 at 06:22:43PM +0100, Michael Niedermayer wrote: > > > On Tue, Dec 15, 2015 at 05:46:09PM +0100, Matthieu Bouron wrote: > > > > From: Matthieu Bouron> > > > > > > > --- > > > > > > > > Hi, > > > > > > > > This commit is likely to break fate on arm since the current C code path > > > > seems to use less precision. > > > > > > > > How should I proceed to fix it ? > > > > > > hmm > > > can the precission of the C code path and any asm impl of it under > > > bitexact (if they exist), be changed to higher precission without > > > speedloss? > > > if so that would be an option > > > > We are currently facing 4 cases (with this patch applied) > > > > * [1] ARM +ACCURATE_RND: uses neon, 13bit coefficients and 32bit > > precision overall > > * [2] ARM -ACCURATE_RND: uses neon, 6bit coefficients and 16bit > > precision overall > > > * [3] X86 +ACCURATE_RND: uses a C code path with lookup tables > > which LUT do you mean here ? The table filled by ff_yuv2rgb_c_init_tables. Not sure if it's used though, I haven't looked at the C code that actually does the conversion (yet). > > > > * [4] X86 -ACCURATE_RND: uses MMX+MMXEXT with apparently 13bit > > coefficients (libswscale/yuv2rgb.c around line 800). > > > > Notes: > > * The 4 outputs are different with [3] being ugly (ghosting/non-sharp > > edges). > > > > * [1] and [4] (13bit coefficient accuracy) should be the same but have > > slight differences. > > > > Questions: > > > > > * What is the meaning of the ACCURATE_RND flag ? > > it should enable accurate rounding > > > > * Does [3] use some kind of interpolation instead of duplicating > > chroma lines ? Its output seems inferior to the other code paths. > > are you sure that is true for real images? > its easy to end up with wrong conclusions with synthetic inputs > unless you want to use the scaler only for such inputs. > > either way line duplication is likely not optimal for real images > iam not made of constant color blocks that are aligned to some cammeras > 2x2 samples > > > > * Is [3] the output that should be taken as reference ? > > id say, the reference is reality, making the output as close as a > image of the new resolution would be if it had been taken that way > > > > * Should we use BITEXACT instead of ACCURATE_RND to determine the > > precision used ? > > BITEXACT is to avoid platform differences and allow regression tests > > if all else is equal it would be best if C and asm matches, and if > C is bad then it should be improved Here are the C, MMX and NEON outputs from a photo: http://0x5c.me/yuv2rgb/photos The C and NEON outputs look almost the same. The MMX one have slightly different "colors" overall. Since figuring out what the C code is actually doing and have the neon asm matches its output is likely to take some time. Would you be OK if, on the ARM platform, +ACCURATE_RND uses the C code path (so fate is not broken), and -ACCURATE_RND uses the neon code path with a precision of 16bit (IMHO, speed is preferred over the slight quality gain of the 32bit version on this platform) ? This behaviour will affect yuv420p but also nv12 and nv21 inputs. This is a kind of a temporary (and lame) solution until I have some time to work on it. Matthieu [...] ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 2/2] swscale/arm/yuv2rgb: add ff_yuv420p_to_{argb, rgba, abgr, bgra}_neon_{16, 32}
On Fri, Dec 18, 2015 at 3:59 PM, Matthieu Bouronwrote: > From: Matthieu Bouron > > --- > libswscale/arm/swscale_unscaled.c | 52 +++--- > libswscale/arm/yuv2rgb_neon.S | 77 > --- > 2 files changed, 118 insertions(+), 11 deletions(-) > > diff --git a/libswscale/arm/swscale_unscaled.c > b/libswscale/arm/swscale_unscaled.c > index 8aa6432..dce987e 100644 > --- a/libswscale/arm/swscale_unscaled.c > +++ b/libswscale/arm/swscale_unscaled.c > @@ -63,6 +63,50 @@ static int rgbx_to_nv12_neon_16_wrapper(SwsContext > *context, const uint8_t *src[ > } > #endif > > +#define YUV_TO_RGB_TABLE(precision) >\ > +c->yuv2rgb_v2r_coeff / ((precision) == 16 ? 1 << 7 : 1), > \ > +c->yuv2rgb_u2g_coeff / ((precision) == 16 ? 1 << 7 : 1), > \ > +c->yuv2rgb_v2g_coeff / ((precision) == 16 ? 1 << 7 : 1), > \ > +c->yuv2rgb_u2b_coeff / ((precision) == 16 ? 1 << 7 : 1), > \ > + > +#define DECLARE_FF_YUV420P_TO_RGBX_FUNCS(ofmt, precision) >\ > +int ff_yuv420p_to_##ofmt##_neon_##precision(int w, int h, >\ > + uint8_t *dst, int linesize, > \ > + const uint8_t *srcY, int linesizeY, > \ > + const uint8_t *srcU, int linesizeU, > \ > + const uint8_t *srcV, int linesizeV, > \ > + const int16_t *table, > \ > + int y_offset, > \ > + int y_coeff); > \ > + > \ > +static int yuv420p_to_##ofmt##_neon_wrapper_##precision(SwsContext *c, > const uint8_t *src[],\ > + int srcStride[], int > srcSliceY, int srcSliceH, \ > + uint8_t *dst[], int > dstStride[]) { \ > +const int16_t yuv2rgb_table[] = { YUV_TO_RGB_TABLE(precision) }; > \ > + > \ > +ff_yuv420p_to_##ofmt##_neon_##precision(c->srcW, srcSliceH, >\ > + dst[0] + srcSliceY * dstStride[0], > dstStride[0], \ > + src[0] + srcSliceY * srcStride[0], > srcStride[0], \ > + src[1] + (srcSliceY / 2) * srcStride[1], > srcStride[1], \ > + src[2] + (srcSliceY / 2) * srcStride[2], > srcStride[2], \ > + yuv2rgb_table, >\ > + c->yuv2rgb_y_offset >> 9, > \ > + c->yuv2rgb_y_coeff / ((precision) == 16 > ? 1 << 7 : 1));\ > + > \ > +return 0; >\ > +} >\ > + > +#define DECLARE_FF_YUV420P_TO_ALL_RGBX_FUNCS(precision) >\ > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(argb, precision) >\ > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(rgba, precision) >\ > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(abgr, precision) >\ > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(bgra, precision) >\ > + > +#define DECLARE_FF_YUV420P_TO_ALL_RGBX_ALL_PRECISION_FUNCS > \ > +DECLARE_FF_YUV420P_TO_ALL_RGBX_FUNCS(16) > \ > + > +DECLARE_FF_YUV420P_TO_ALL_RGBX_ALL_PRECISION_FUNCS > + > #define DECLARE_FF_NVX_TO_RGBX_FUNCS(ifmt, ofmt, precision) >\ > int ff_##ifmt##_to_##ofmt##_neon_##precision(int w, int h, > \ > uint8_t *dst, int linesize, > \ > @@ -75,12 +119,7 @@ int ff_##ifmt##_to_##ofmt##_neon_##precision(int w, > int h, > static int ifmt##_to_##ofmt##_neon_wrapper_##precision(SwsContext *c, > const uint8_t *src[], \ > int srcStride[], int > srcSliceY, int srcSliceH, \ > uint8_t *dst[], int > dstStride[]) { \ > -const int16_t yuv2rgb_table[] = { >\ > -c->yuv2rgb_v2r_coeff / ((precision) == 16 ? 1 << 7 : 1), > \ > -c->yuv2rgb_u2g_coeff / ((precision) == 16 ? 1 << 7 : 1), > \ > -c->yuv2rgb_v2g_coeff / ((precision) == 16 ? 1 << 7 : 1), > \ > -c->yuv2rgb_u2b_coeff / ((precision) == 16 ? 1 << 7 : 1), > \ > -}; > \ > +const int16_t yuv2rgb_table[] = { YUV_TO_RGB_TABLE(precision) }; > \ > >\ >
Re: [FFmpeg-devel] [PATCH 2/2] swscale/arm/yuv2rgb: add ff_yuv420p_to_{argb, rgba, abgr, bgra}_neon_{16, 32}
On Fri, Dec 18, 2015 at 04:33:37PM +0100, Matthieu Bouron wrote: > On Fri, Dec 18, 2015 at 3:59 PM, Matthieu Bouron> wrote: > > > From: Matthieu Bouron > > > > --- > > libswscale/arm/swscale_unscaled.c | 52 +++--- > > libswscale/arm/yuv2rgb_neon.S | 77 > > --- > > 2 files changed, 118 insertions(+), 11 deletions(-) > > > > diff --git a/libswscale/arm/swscale_unscaled.c > > b/libswscale/arm/swscale_unscaled.c > > index 8aa6432..dce987e 100644 > > --- a/libswscale/arm/swscale_unscaled.c > > +++ b/libswscale/arm/swscale_unscaled.c > > @@ -63,6 +63,50 @@ static int rgbx_to_nv12_neon_16_wrapper(SwsContext > > *context, const uint8_t *src[ > > } > > #endif > > > > +#define YUV_TO_RGB_TABLE(precision) > >\ > > +c->yuv2rgb_v2r_coeff / ((precision) == 16 ? 1 << 7 : 1), > > \ > > +c->yuv2rgb_u2g_coeff / ((precision) == 16 ? 1 << 7 : 1), > > \ > > +c->yuv2rgb_v2g_coeff / ((precision) == 16 ? 1 << 7 : 1), > > \ > > +c->yuv2rgb_u2b_coeff / ((precision) == 16 ? 1 << 7 : 1), > > \ > > + > > +#define DECLARE_FF_YUV420P_TO_RGBX_FUNCS(ofmt, precision) > >\ > > +int ff_yuv420p_to_##ofmt##_neon_##precision(int w, int h, > >\ > > + uint8_t *dst, int linesize, > > \ > > + const uint8_t *srcY, int linesizeY, > > \ > > + const uint8_t *srcU, int linesizeU, > > \ > > + const uint8_t *srcV, int linesizeV, > > \ > > + const int16_t *table, > > \ > > + int y_offset, > > \ > > + int y_coeff); > > \ > > + > > \ > > +static int yuv420p_to_##ofmt##_neon_wrapper_##precision(SwsContext *c, > > const uint8_t *src[],\ > > + int srcStride[], int > > srcSliceY, int srcSliceH, \ > > + uint8_t *dst[], int > > dstStride[]) { \ > > +const int16_t yuv2rgb_table[] = { YUV_TO_RGB_TABLE(precision) }; > > \ > > + > > \ > > +ff_yuv420p_to_##ofmt##_neon_##precision(c->srcW, srcSliceH, > >\ > > + dst[0] + srcSliceY * dstStride[0], > > dstStride[0], \ > > + src[0] + srcSliceY * srcStride[0], > > srcStride[0], \ > > + src[1] + (srcSliceY / 2) * srcStride[1], > > srcStride[1], \ > > + src[2] + (srcSliceY / 2) * srcStride[2], > > srcStride[2], \ > > + yuv2rgb_table, > >\ > > + c->yuv2rgb_y_offset >> 9, > > \ > > + c->yuv2rgb_y_coeff / ((precision) == 16 > > ? 1 << 7 : 1));\ > > + > > \ > > +return 0; > >\ > > +} > >\ > > + > > +#define DECLARE_FF_YUV420P_TO_ALL_RGBX_FUNCS(precision) > >\ > > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(argb, precision) > >\ > > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(rgba, precision) > >\ > > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(abgr, precision) > >\ > > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(bgra, precision) > >\ > > + > > +#define DECLARE_FF_YUV420P_TO_ALL_RGBX_ALL_PRECISION_FUNCS > > \ > > +DECLARE_FF_YUV420P_TO_ALL_RGBX_FUNCS(16) > > \ > > + > > +DECLARE_FF_YUV420P_TO_ALL_RGBX_ALL_PRECISION_FUNCS > > + > > #define DECLARE_FF_NVX_TO_RGBX_FUNCS(ifmt, ofmt, precision) > >\ > > int ff_##ifmt##_to_##ofmt##_neon_##precision(int w, int h, > > \ > > uint8_t *dst, int linesize, > > \ > > @@ -75,12 +119,7 @@ int ff_##ifmt##_to_##ofmt##_neon_##precision(int w, > > int h, > > static int ifmt##_to_##ofmt##_neon_wrapper_##precision(SwsContext *c, > > const uint8_t *src[], \ > > int srcStride[], int > > srcSliceY, int srcSliceH, \ > > uint8_t *dst[], int > > dstStride[]) { \ > > -const int16_t yuv2rgb_table[] = { > >\ > > -c->yuv2rgb_v2r_coeff / ((precision) == 16 ? 1 << 7 : 1), > > \ > > -c->yuv2rgb_u2g_coeff / ((precision) == 16 ? 1 << 7 : 1), > > \ > > -c->yuv2rgb_v2g_coeff / ((precision) ==
Re: [FFmpeg-devel] [PATCH 2/2] swscale/arm/yuv2rgb: add ff_yuv420p_to_{argb, rgba, abgr, bgra}_neon_{16, 32}
On Fri, Dec 18, 2015 at 11:44:27AM +0100, Matthieu Bouron wrote: > On Thu, Dec 17, 2015 at 07:47:08PM +0100, Michael Niedermayer wrote: > > On Thu, Dec 17, 2015 at 04:54:31PM +0100, Matthieu Bouron wrote: > > > On Tue, Dec 15, 2015 at 06:22:43PM +0100, Michael Niedermayer wrote: > > > > On Tue, Dec 15, 2015 at 05:46:09PM +0100, Matthieu Bouron wrote: > > > > > From: Matthieu Bouron> > > > > > > > > > --- > > > > > > > > > > Hi, > > > > > > > > > > This commit is likely to break fate on arm since the current C code > > > > > path > > > > > seems to use less precision. > > > > > > > > > > How should I proceed to fix it ? > > > > > > > > hmm > > > > can the precission of the C code path and any asm impl of it under > > > > bitexact (if they exist), be changed to higher precission without > > > > speedloss? > > > > if so that would be an option > > > > > > We are currently facing 4 cases (with this patch applied) > > > > > > * [1] ARM +ACCURATE_RND: uses neon, 13bit coefficients and 32bit > > > precision overall > > > * [2] ARM -ACCURATE_RND: uses neon, 6bit coefficients and 16bit > > > precision overall > > > > > * [3] X86 +ACCURATE_RND: uses a C code path with lookup tables > > > > which LUT do you mean here ? > > The table filled by ff_yuv2rgb_c_init_tables. Not sure if it's used > though, I haven't looked at the C code that actually does the conversion > (yet). > > > > > > > > * [4] X86 -ACCURATE_RND: uses MMX+MMXEXT with apparently 13bit > > > coefficients (libswscale/yuv2rgb.c around line 800). > > > > > > Notes: > > > * The 4 outputs are different with [3] being ugly (ghosting/non-sharp > > > edges). > > > > > > * [1] and [4] (13bit coefficient accuracy) should be the same but have > > > slight differences. > > > > > > Questions: > > > > > > > > * What is the meaning of the ACCURATE_RND flag ? > > > > it should enable accurate rounding > > > > > > > * Does [3] use some kind of interpolation instead of duplicating > > > chroma lines ? Its output seems inferior to the other code paths. > > > > are you sure that is true for real images? > > its easy to end up with wrong conclusions with synthetic inputs > > unless you want to use the scaler only for such inputs. > > > > either way line duplication is likely not optimal for real images > > iam not made of constant color blocks that are aligned to some cammeras > > 2x2 samples > > > > > > > * Is [3] the output that should be taken as reference ? > > > > id say, the reference is reality, making the output as close as a > > image of the new resolution would be if it had been taken that way > > > > > > > * Should we use BITEXACT instead of ACCURATE_RND to determine the > > > precision used ? > > > > BITEXACT is to avoid platform differences and allow regression tests > > > > if all else is equal it would be best if C and asm matches, and if > > C is bad then it should be improved > > Here are the C, MMX and NEON outputs from a photo: > http://0x5c.me/yuv2rgb/photos > > The C and NEON outputs look almost the same. > The MMX one have slightly different "colors" overall. > > Since figuring out what the C code is actually doing and have the neon asm > matches its output is likely to take some time. Would you be OK if, on the > ARM platform, +ACCURATE_RND uses the C code path (so fate is not broken), > and -ACCURATE_RND uses the neon code path with a precision of 16bit (IMHO, > speed is preferred over the slight quality gain of the 32bit version on > this platform) ? > > This behaviour will affect yuv420p but also nv12 and nv21 inputs. > > This is a kind of a temporary (and lame) solution until I have some time > to work on it. no objections thanks > > Matthieu > [...] > ___ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB During times of universal deceit, telling the truth becomes a revolutionary act. -- George Orwell signature.asc Description: Digital signature ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 2/2] swscale/arm/yuv2rgb: add ff_yuv420p_to_{argb, rgba, abgr, bgra}_neon_{16, 32}
On Tue, Dec 15, 2015 at 06:22:43PM +0100, Michael Niedermayer wrote: > On Tue, Dec 15, 2015 at 05:46:09PM +0100, Matthieu Bouron wrote: > > From: Matthieu Bouron> > > > --- > > > > Hi, > > > > This commit is likely to break fate on arm since the current C code path > > seems to use less precision. > > > > How should I proceed to fix it ? > > hmm > can the precission of the C code path and any asm impl of it under > bitexact (if they exist), be changed to higher precission without > speedloss? > if so that would be an option We are currently facing 4 cases (with this patch applied) * [1] ARM +ACCURATE_RND: uses neon, 13bit coefficients and 32bit precision overall * [2] ARM -ACCURATE_RND: uses neon, 6bit coefficients and 16bit precision overall * [3] X86 +ACCURATE_RND: uses a C code path with lookup tables * [4] X86 -ACCURATE_RND: uses MMX+MMXEXT with apparently 13bit coefficients (libswscale/yuv2rgb.c around line 800). Notes: * The 4 outputs are different with [3] being ugly (ghosting/non-sharp edges). * [1] and [4] (13bit coefficient accuracy) should be the same but have slight differences. Questions: * What is the meaning of the ACCURATE_RND flag ? * Does [3] use some kind of interpolation instead of duplicating chroma lines ? Its output seems inferior to the other code paths. * Is [3] the output that should be taken as reference ? * Should we use BITEXACT instead of ACCURATE_RND to determine the precision used ? The different outputs of testsrc2,format=yuv420,format=rgba can be found here: http://0x5c.me/yuv2rgb Matthieu [...] ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 2/2] swscale/arm/yuv2rgb: add ff_yuv420p_to_{argb, rgba, abgr, bgra}_neon_{16, 32}
On Thu, Dec 17, 2015 at 04:54:31PM +0100, Matthieu Bouron wrote: > On Tue, Dec 15, 2015 at 06:22:43PM +0100, Michael Niedermayer wrote: > > On Tue, Dec 15, 2015 at 05:46:09PM +0100, Matthieu Bouron wrote: > > > From: Matthieu Bouron> > > > > > --- > > > > > > Hi, > > > > > > This commit is likely to break fate on arm since the current C code path > > > seems to use less precision. > > > > > > How should I proceed to fix it ? > > > > hmm > > can the precission of the C code path and any asm impl of it under > > bitexact (if they exist), be changed to higher precission without > > speedloss? > > if so that would be an option > > We are currently facing 4 cases (with this patch applied) > > * [1] ARM +ACCURATE_RND: uses neon, 13bit coefficients and 32bit > precision overall > * [2] ARM -ACCURATE_RND: uses neon, 6bit coefficients and 16bit > precision overall > * [3] X86 +ACCURATE_RND: uses a C code path with lookup tables which LUT do you mean here ? > * [4] X86 -ACCURATE_RND: uses MMX+MMXEXT with apparently 13bit > coefficients (libswscale/yuv2rgb.c around line 800). > > Notes: > * The 4 outputs are different with [3] being ugly (ghosting/non-sharp > edges). > > * [1] and [4] (13bit coefficient accuracy) should be the same but have > slight differences. > > Questions: > > * What is the meaning of the ACCURATE_RND flag ? it should enable accurate rounding > * Does [3] use some kind of interpolation instead of duplicating > chroma lines ? Its output seems inferior to the other code paths. are you sure that is true for real images? its easy to end up with wrong conclusions with synthetic inputs unless you want to use the scaler only for such inputs. either way line duplication is likely not optimal for real images iam not made of constant color blocks that are aligned to some cammeras 2x2 samples > * Is [3] the output that should be taken as reference ? id say, the reference is reality, making the output as close as a image of the new resolution would be if it had been taken that way > * Should we use BITEXACT instead of ACCURATE_RND to determine the > precision used ? BITEXACT is to avoid platform differences and allow regression tests if all else is equal it would be best if C and asm matches, and if C is bad then it should be improved > > The different outputs of testsrc2,format=yuv420,format=rgba can > be found here: > http://0x5c.me/yuv2rgb > > Matthieu > [...] > ___ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB Its not that you shouldnt use gotos but rather that you should write readable code and code with gotos often but not always is less readable signature.asc Description: Digital signature ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 2/2] swscale/arm/yuv2rgb: add ff_yuv420p_to_{argb, rgba, abgr, bgra}_neon_{16, 32}
On Tue, Dec 15, 2015 at 05:46:09PM +0100, Matthieu Bouron wrote: > From: Matthieu Bouron> > --- > > Hi, > > This commit is likely to break fate on arm since the current C code path > seems to use less precision. > > How should I proceed to fix it ? hmm can the precission of the C code path and any asm impl of it under bitexact (if they exist), be changed to higher precission without speedloss? if so that would be an option [...] -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB Democracy is the form of government in which you can choose your dictator signature.asc Description: Digital signature ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel