Re: [FFmpeg-devel] [PATCH 2/2] swscale/arm/yuv2rgb: add ff_yuv420p_to_{argb, rgba, abgr, bgra}_neon_{16, 32}

2015-12-19 Thread Matthieu Bouron
On Sat, Dec 19, 2015 at 06:08:10PM +0100, Michael Niedermayer wrote:
> On Sat, Dec 19, 2015 at 01:28:06PM +0100, Matthieu Bouron wrote:
> > On Sat, Dec 19, 2015 at 11:50:21AM +0100, Michael Niedermayer wrote:
> > > On Sat, Dec 19, 2015 at 10:56:26AM +0100, Matthieu Bouron wrote:
> > > > On Sat, Dec 19, 2015 at 03:24:17AM +0100, Michael Niedermayer wrote:
> > > > > On Fri, Dec 18, 2015 at 04:33:37PM +0100, Matthieu Bouron wrote:
> > > > > > On Fri, Dec 18, 2015 at 3:59 PM, Matthieu Bouron 
> > > > > > 
> > > > > > wrote:
> > > > > > 
> > > > > > > From: Matthieu Bouron 
> > > > > > >
> > > > > > > ---
> > > > > > >  libswscale/arm/swscale_unscaled.c | 52 +++---
> > > > > > >  libswscale/arm/yuv2rgb_neon.S | 77
> > > > > > > ---
> > > > > > >  2 files changed, 118 insertions(+), 11 deletions(-)
> > > > > > >
> > > > > > > diff --git a/libswscale/arm/swscale_unscaled.c
> > > > > > > b/libswscale/arm/swscale_unscaled.c
> > > > > > > index 8aa6432..dce987e 100644
> > > > > > > --- a/libswscale/arm/swscale_unscaled.c
> > > > > > > +++ b/libswscale/arm/swscale_unscaled.c
> > > > > > > @@ -63,6 +63,50 @@ static int 
> > > > > > > rgbx_to_nv12_neon_16_wrapper(SwsContext
> > > > > > > *context, const uint8_t *src[
> > > > > > >  }
> > > > > > >  #endif
> > > > > > >
> > > > > > > +#define YUV_TO_RGB_TABLE(precision)
> > > > > > >\
> > > > > > > +c->yuv2rgb_v2r_coeff / ((precision) == 16 ? 1 << 7 : 1),
> > > > > > >   \
> > > > > > > +c->yuv2rgb_u2g_coeff / ((precision) == 16 ? 1 << 7 : 1),
> > > > > > >   \
> > > > > > > +c->yuv2rgb_v2g_coeff / ((precision) == 16 ? 1 << 7 : 1),
> > > > > > >   \
> > > > > > > +c->yuv2rgb_u2b_coeff / ((precision) == 16 ? 1 << 7 : 1),
> > > > > > >   \
> > > > > > > +
> > > > > > > +#define DECLARE_FF_YUV420P_TO_RGBX_FUNCS(ofmt, precision)
> > > > > > >\
> > > > > > > +int ff_yuv420p_to_##ofmt##_neon_##precision(int w, int h,
> > > > > > >\
> > > > > > > + uint8_t *dst, int linesize,
> > > > > > >   \
> > > > > > > + const uint8_t *srcY, int 
> > > > > > > linesizeY,
> > > > > > >   \
> > > > > > > + const uint8_t *srcU, int 
> > > > > > > linesizeU,
> > > > > > >   \
> > > > > > > + const uint8_t *srcV, int 
> > > > > > > linesizeV,
> > > > > > >   \
> > > > > > > + const int16_t *table,
> > > > > > >   \
> > > > > > > + int y_offset,
> > > > > > >   \
> > > > > > > + int y_coeff);
> > > > > > >   \
> > > > > > > +
> > > > > > >   \
> > > > > > > +static int 
> > > > > > > yuv420p_to_##ofmt##_neon_wrapper_##precision(SwsContext *c,
> > > > > > > const uint8_t *src[],\
> > > > > > > +   int srcStride[], int
> > > > > > > srcSliceY, int srcSliceH,   \
> > > > > > > +   uint8_t *dst[], int
> > > > > > > dstStride[]) {   \
> > > > > > > +const int16_t yuv2rgb_table[] = { 
> > > > > > > YUV_TO_RGB_TABLE(precision) };
> > > > > > >   \
> > > > > > > +
> > > > > > >   \
> > > > > > > +ff_yuv420p_to_##ofmt##_neon_##precision(c->srcW, srcSliceH,
> > > > > > >\
> > > > > > > + dst[0] +  srcSliceY  * 
> > > > > > > dstStride[0],
> > > > > > > dstStride[0], \
> > > > > > > + src[0] +  srcSliceY  * 
> > > > > > > srcStride[0],
> > > > > > > srcStride[0], \
> > > > > > > + src[1] + (srcSliceY / 2) * 
> > > > > > > srcStride[1],
> > > > > > > srcStride[1], \
> > > > > > > + src[2] + (srcSliceY / 2) * 
> > > > > > > srcStride[2],
> > > > > > > srcStride[2], \
> > > > > > > + yuv2rgb_table,
> > > > > > >\
> > > > > > > + c->yuv2rgb_y_offset >> 9,
> > > > > > >   \
> > > > > > > + c->yuv2rgb_y_coeff / 
> > > > > > > ((precision) == 16
> > > > > > > ? 1 << 7 : 1));\
> > > > > > > +
> > > > > > >   \
> > > > > > > +return 0;
> > > > > > >\
> > > > > > > +}
> > > > > > >\
> > > > > > > +
> > > > > > > +#define DECLARE_FF_YUV420P_TO_ALL_RGBX_FUNCS(precision)
> > > > > > >\
> > > > > > > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(argb, precision)
> > > > > > >\
> > > > > 

Re: [FFmpeg-devel] [PATCH 2/2] swscale/arm/yuv2rgb: add ff_yuv420p_to_{argb, rgba, abgr, bgra}_neon_{16, 32}

2015-12-19 Thread Michael Niedermayer
On Sat, Dec 19, 2015 at 01:28:06PM +0100, Matthieu Bouron wrote:
> On Sat, Dec 19, 2015 at 11:50:21AM +0100, Michael Niedermayer wrote:
> > On Sat, Dec 19, 2015 at 10:56:26AM +0100, Matthieu Bouron wrote:
> > > On Sat, Dec 19, 2015 at 03:24:17AM +0100, Michael Niedermayer wrote:
> > > > On Fri, Dec 18, 2015 at 04:33:37PM +0100, Matthieu Bouron wrote:
> > > > > On Fri, Dec 18, 2015 at 3:59 PM, Matthieu Bouron 
> > > > > 
> > > > > wrote:
> > > > > 
> > > > > > From: Matthieu Bouron 
> > > > > >
> > > > > > ---
> > > > > >  libswscale/arm/swscale_unscaled.c | 52 +++---
> > > > > >  libswscale/arm/yuv2rgb_neon.S | 77
> > > > > > ---
> > > > > >  2 files changed, 118 insertions(+), 11 deletions(-)
> > > > > >
> > > > > > diff --git a/libswscale/arm/swscale_unscaled.c
> > > > > > b/libswscale/arm/swscale_unscaled.c
> > > > > > index 8aa6432..dce987e 100644
> > > > > > --- a/libswscale/arm/swscale_unscaled.c
> > > > > > +++ b/libswscale/arm/swscale_unscaled.c
> > > > > > @@ -63,6 +63,50 @@ static int 
> > > > > > rgbx_to_nv12_neon_16_wrapper(SwsContext
> > > > > > *context, const uint8_t *src[
> > > > > >  }
> > > > > >  #endif
> > > > > >
> > > > > > +#define YUV_TO_RGB_TABLE(precision)
> > > > > >\
> > > > > > +c->yuv2rgb_v2r_coeff / ((precision) == 16 ? 1 << 7 : 1),
> > > > > >   \
> > > > > > +c->yuv2rgb_u2g_coeff / ((precision) == 16 ? 1 << 7 : 1),
> > > > > >   \
> > > > > > +c->yuv2rgb_v2g_coeff / ((precision) == 16 ? 1 << 7 : 1),
> > > > > >   \
> > > > > > +c->yuv2rgb_u2b_coeff / ((precision) == 16 ? 1 << 7 : 1),
> > > > > >   \
> > > > > > +
> > > > > > +#define DECLARE_FF_YUV420P_TO_RGBX_FUNCS(ofmt, precision)
> > > > > >\
> > > > > > +int ff_yuv420p_to_##ofmt##_neon_##precision(int w, int h,
> > > > > >\
> > > > > > + uint8_t *dst, int linesize,
> > > > > >   \
> > > > > > + const uint8_t *srcY, int 
> > > > > > linesizeY,
> > > > > >   \
> > > > > > + const uint8_t *srcU, int 
> > > > > > linesizeU,
> > > > > >   \
> > > > > > + const uint8_t *srcV, int 
> > > > > > linesizeV,
> > > > > >   \
> > > > > > + const int16_t *table,
> > > > > >   \
> > > > > > + int y_offset,
> > > > > >   \
> > > > > > + int y_coeff);
> > > > > >   \
> > > > > > +
> > > > > >   \
> > > > > > +static int yuv420p_to_##ofmt##_neon_wrapper_##precision(SwsContext 
> > > > > > *c,
> > > > > > const uint8_t *src[],\
> > > > > > +   int srcStride[], int
> > > > > > srcSliceY, int srcSliceH,   \
> > > > > > +   uint8_t *dst[], int
> > > > > > dstStride[]) {   \
> > > > > > +const int16_t yuv2rgb_table[] = { YUV_TO_RGB_TABLE(precision) 
> > > > > > };
> > > > > >   \
> > > > > > +
> > > > > >   \
> > > > > > +ff_yuv420p_to_##ofmt##_neon_##precision(c->srcW, srcSliceH,
> > > > > >\
> > > > > > + dst[0] +  srcSliceY  * 
> > > > > > dstStride[0],
> > > > > > dstStride[0], \
> > > > > > + src[0] +  srcSliceY  * 
> > > > > > srcStride[0],
> > > > > > srcStride[0], \
> > > > > > + src[1] + (srcSliceY / 2) * 
> > > > > > srcStride[1],
> > > > > > srcStride[1], \
> > > > > > + src[2] + (srcSliceY / 2) * 
> > > > > > srcStride[2],
> > > > > > srcStride[2], \
> > > > > > + yuv2rgb_table,
> > > > > >\
> > > > > > + c->yuv2rgb_y_offset >> 9,
> > > > > >   \
> > > > > > + c->yuv2rgb_y_coeff / ((precision) 
> > > > > > == 16
> > > > > > ? 1 << 7 : 1));\
> > > > > > +
> > > > > >   \
> > > > > > +return 0;
> > > > > >\
> > > > > > +}
> > > > > >\
> > > > > > +
> > > > > > +#define DECLARE_FF_YUV420P_TO_ALL_RGBX_FUNCS(precision)
> > > > > >\
> > > > > > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(argb, precision)
> > > > > >\
> > > > > > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(rgba, precision)
> > > > > >\
> > > > > > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(abgr, precision)
> > > > > >\
> > > > > > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(bgra, precision)
> > > > > >\
> 

Re: [FFmpeg-devel] [PATCH 2/2] swscale/arm/yuv2rgb: add ff_yuv420p_to_{argb, rgba, abgr, bgra}_neon_{16, 32}

2015-12-19 Thread Matthieu Bouron
On Sat, Dec 19, 2015 at 03:24:17AM +0100, Michael Niedermayer wrote:
> On Fri, Dec 18, 2015 at 04:33:37PM +0100, Matthieu Bouron wrote:
> > On Fri, Dec 18, 2015 at 3:59 PM, Matthieu Bouron 
> > wrote:
> > 
> > > From: Matthieu Bouron 
> > >
> > > ---
> > >  libswscale/arm/swscale_unscaled.c | 52 +++---
> > >  libswscale/arm/yuv2rgb_neon.S | 77
> > > ---
> > >  2 files changed, 118 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/libswscale/arm/swscale_unscaled.c
> > > b/libswscale/arm/swscale_unscaled.c
> > > index 8aa6432..dce987e 100644
> > > --- a/libswscale/arm/swscale_unscaled.c
> > > +++ b/libswscale/arm/swscale_unscaled.c
> > > @@ -63,6 +63,50 @@ static int rgbx_to_nv12_neon_16_wrapper(SwsContext
> > > *context, const uint8_t *src[
> > >  }
> > >  #endif
> > >
> > > +#define YUV_TO_RGB_TABLE(precision)
> > >\
> > > +c->yuv2rgb_v2r_coeff / ((precision) == 16 ? 1 << 7 : 1),
> > >   \
> > > +c->yuv2rgb_u2g_coeff / ((precision) == 16 ? 1 << 7 : 1),
> > >   \
> > > +c->yuv2rgb_v2g_coeff / ((precision) == 16 ? 1 << 7 : 1),
> > >   \
> > > +c->yuv2rgb_u2b_coeff / ((precision) == 16 ? 1 << 7 : 1),
> > >   \
> > > +
> > > +#define DECLARE_FF_YUV420P_TO_RGBX_FUNCS(ofmt, precision)
> > >\
> > > +int ff_yuv420p_to_##ofmt##_neon_##precision(int w, int h,
> > >\
> > > + uint8_t *dst, int linesize,
> > >   \
> > > + const uint8_t *srcY, int linesizeY,
> > >   \
> > > + const uint8_t *srcU, int linesizeU,
> > >   \
> > > + const uint8_t *srcV, int linesizeV,
> > >   \
> > > + const int16_t *table,
> > >   \
> > > + int y_offset,
> > >   \
> > > + int y_coeff);
> > >   \
> > > +
> > >   \
> > > +static int yuv420p_to_##ofmt##_neon_wrapper_##precision(SwsContext *c,
> > > const uint8_t *src[],\
> > > +   int srcStride[], int
> > > srcSliceY, int srcSliceH,   \
> > > +   uint8_t *dst[], int
> > > dstStride[]) {   \
> > > +const int16_t yuv2rgb_table[] = { YUV_TO_RGB_TABLE(precision) };
> > >   \
> > > +
> > >   \
> > > +ff_yuv420p_to_##ofmt##_neon_##precision(c->srcW, srcSliceH,
> > >\
> > > + dst[0] +  srcSliceY  * dstStride[0],
> > > dstStride[0], \
> > > + src[0] +  srcSliceY  * srcStride[0],
> > > srcStride[0], \
> > > + src[1] + (srcSliceY / 2) * srcStride[1],
> > > srcStride[1], \
> > > + src[2] + (srcSliceY / 2) * srcStride[2],
> > > srcStride[2], \
> > > + yuv2rgb_table,
> > >\
> > > + c->yuv2rgb_y_offset >> 9,
> > >   \
> > > + c->yuv2rgb_y_coeff / ((precision) == 16
> > > ? 1 << 7 : 1));\
> > > +
> > >   \
> > > +return 0;
> > >\
> > > +}
> > >\
> > > +
> > > +#define DECLARE_FF_YUV420P_TO_ALL_RGBX_FUNCS(precision)
> > >\
> > > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(argb, precision)
> > >\
> > > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(rgba, precision)
> > >\
> > > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(abgr, precision)
> > >\
> > > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(bgra, precision)
> > >\
> > > +
> > > +#define DECLARE_FF_YUV420P_TO_ALL_RGBX_ALL_PRECISION_FUNCS
> > >   \
> > > +DECLARE_FF_YUV420P_TO_ALL_RGBX_FUNCS(16)
> > >   \
> > > +
> > > +DECLARE_FF_YUV420P_TO_ALL_RGBX_ALL_PRECISION_FUNCS
> > > +
> > >  #define DECLARE_FF_NVX_TO_RGBX_FUNCS(ifmt, ofmt, precision)
> > >\
> > >  int ff_##ifmt##_to_##ofmt##_neon_##precision(int w, int h,
> > >   \
> > >   uint8_t *dst, int linesize,
> > >   \
> > > @@ -75,12 +119,7 @@ int ff_##ifmt##_to_##ofmt##_neon_##precision(int w,
> > > int h,
> > >  static int ifmt##_to_##ofmt##_neon_wrapper_##precision(SwsContext *c,
> > > const uint8_t *src[], \
> > > int srcStride[], int
> > > srcSliceY, int srcSliceH,   \
> > > uint8_t *dst[], int
> > > dstStride[]) {   \
> > 

Re: [FFmpeg-devel] [PATCH 2/2] swscale/arm/yuv2rgb: add ff_yuv420p_to_{argb, rgba, abgr, bgra}_neon_{16, 32}

2015-12-19 Thread Michael Niedermayer
On Sat, Dec 19, 2015 at 10:56:26AM +0100, Matthieu Bouron wrote:
> On Sat, Dec 19, 2015 at 03:24:17AM +0100, Michael Niedermayer wrote:
> > On Fri, Dec 18, 2015 at 04:33:37PM +0100, Matthieu Bouron wrote:
> > > On Fri, Dec 18, 2015 at 3:59 PM, Matthieu Bouron 
> > > 
> > > wrote:
> > > 
> > > > From: Matthieu Bouron 
> > > >
> > > > ---
> > > >  libswscale/arm/swscale_unscaled.c | 52 +++---
> > > >  libswscale/arm/yuv2rgb_neon.S | 77
> > > > ---
> > > >  2 files changed, 118 insertions(+), 11 deletions(-)
> > > >
> > > > diff --git a/libswscale/arm/swscale_unscaled.c
> > > > b/libswscale/arm/swscale_unscaled.c
> > > > index 8aa6432..dce987e 100644
> > > > --- a/libswscale/arm/swscale_unscaled.c
> > > > +++ b/libswscale/arm/swscale_unscaled.c
> > > > @@ -63,6 +63,50 @@ static int rgbx_to_nv12_neon_16_wrapper(SwsContext
> > > > *context, const uint8_t *src[
> > > >  }
> > > >  #endif
> > > >
> > > > +#define YUV_TO_RGB_TABLE(precision)
> > > >\
> > > > +c->yuv2rgb_v2r_coeff / ((precision) == 16 ? 1 << 7 : 1),
> > > >   \
> > > > +c->yuv2rgb_u2g_coeff / ((precision) == 16 ? 1 << 7 : 1),
> > > >   \
> > > > +c->yuv2rgb_v2g_coeff / ((precision) == 16 ? 1 << 7 : 1),
> > > >   \
> > > > +c->yuv2rgb_u2b_coeff / ((precision) == 16 ? 1 << 7 : 1),
> > > >   \
> > > > +
> > > > +#define DECLARE_FF_YUV420P_TO_RGBX_FUNCS(ofmt, precision)
> > > >\
> > > > +int ff_yuv420p_to_##ofmt##_neon_##precision(int w, int h,
> > > >\
> > > > + uint8_t *dst, int linesize,
> > > >   \
> > > > + const uint8_t *srcY, int linesizeY,
> > > >   \
> > > > + const uint8_t *srcU, int linesizeU,
> > > >   \
> > > > + const uint8_t *srcV, int linesizeV,
> > > >   \
> > > > + const int16_t *table,
> > > >   \
> > > > + int y_offset,
> > > >   \
> > > > + int y_coeff);
> > > >   \
> > > > +
> > > >   \
> > > > +static int yuv420p_to_##ofmt##_neon_wrapper_##precision(SwsContext *c,
> > > > const uint8_t *src[],\
> > > > +   int srcStride[], int
> > > > srcSliceY, int srcSliceH,   \
> > > > +   uint8_t *dst[], int
> > > > dstStride[]) {   \
> > > > +const int16_t yuv2rgb_table[] = { YUV_TO_RGB_TABLE(precision) };
> > > >   \
> > > > +
> > > >   \
> > > > +ff_yuv420p_to_##ofmt##_neon_##precision(c->srcW, srcSliceH,
> > > >\
> > > > + dst[0] +  srcSliceY  * 
> > > > dstStride[0],
> > > > dstStride[0], \
> > > > + src[0] +  srcSliceY  * 
> > > > srcStride[0],
> > > > srcStride[0], \
> > > > + src[1] + (srcSliceY / 2) * 
> > > > srcStride[1],
> > > > srcStride[1], \
> > > > + src[2] + (srcSliceY / 2) * 
> > > > srcStride[2],
> > > > srcStride[2], \
> > > > + yuv2rgb_table,
> > > >\
> > > > + c->yuv2rgb_y_offset >> 9,
> > > >   \
> > > > + c->yuv2rgb_y_coeff / ((precision) == 
> > > > 16
> > > > ? 1 << 7 : 1));\
> > > > +
> > > >   \
> > > > +return 0;
> > > >\
> > > > +}
> > > >\
> > > > +
> > > > +#define DECLARE_FF_YUV420P_TO_ALL_RGBX_FUNCS(precision)
> > > >\
> > > > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(argb, precision)
> > > >\
> > > > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(rgba, precision)
> > > >\
> > > > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(abgr, precision)
> > > >\
> > > > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(bgra, precision)
> > > >\
> > > > +
> > > > +#define DECLARE_FF_YUV420P_TO_ALL_RGBX_ALL_PRECISION_FUNCS
> > > >   \
> > > > +DECLARE_FF_YUV420P_TO_ALL_RGBX_FUNCS(16)
> > > >   \
> > > > +
> > > > +DECLARE_FF_YUV420P_TO_ALL_RGBX_ALL_PRECISION_FUNCS
> > > > +
> > > >  #define DECLARE_FF_NVX_TO_RGBX_FUNCS(ifmt, ofmt, precision)
> > > >\
> > > >  int ff_##ifmt##_to_##ofmt##_neon_##precision(int w, int h,
> > > >   \
> > > >   uint8_t *dst, int linesize,
> > > >   \
> > > > @@ -75,12 +119,7 @@ int ff_##ifmt##_to_##ofmt##_neon_##precision(int 

Re: [FFmpeg-devel] [PATCH 2/2] swscale/arm/yuv2rgb: add ff_yuv420p_to_{argb, rgba, abgr, bgra}_neon_{16, 32}

2015-12-18 Thread Matthieu Bouron
On Thu, Dec 17, 2015 at 07:47:08PM +0100, Michael Niedermayer wrote:
> On Thu, Dec 17, 2015 at 04:54:31PM +0100, Matthieu Bouron wrote:
> > On Tue, Dec 15, 2015 at 06:22:43PM +0100, Michael Niedermayer wrote:
> > > On Tue, Dec 15, 2015 at 05:46:09PM +0100, Matthieu Bouron wrote:
> > > > From: Matthieu Bouron 
> > > > 
> > > > ---
> > > > 
> > > > Hi,
> > > > 
> > > > This commit is likely to break fate on arm since the current C code path
> > > > seems to use less precision.
> > > > 
> > > > How should I proceed to fix it ?
> > > 
> > > hmm
> > > can the precission of the C code path and any asm impl of it under
> > > bitexact (if they exist), be changed to higher precission without
> > > speedloss?
> > > if so that would be an option
> > 
> > We are currently facing 4 cases (with this patch applied)
> > 
> >   * [1] ARM +ACCURATE_RND: uses neon, 13bit coefficients and 32bit
> >   precision overall
> >   * [2] ARM -ACCURATE_RND: uses neon, 6bit coefficients and 16bit
> >   precision overall
> 
> >   * [3] X86 +ACCURATE_RND: uses a C code path with lookup tables
> 
> which LUT do you mean here ?

The table filled by ff_yuv2rgb_c_init_tables. Not sure if it's used
though, I haven't looked at the C code that actually does the conversion
(yet).

> 
> 
> >   * [4] X86 -ACCURATE_RND: uses MMX+MMXEXT with apparently 13bit
> >   coefficients (libswscale/yuv2rgb.c around line 800).
> > 
> > Notes:
> >   * The 4 outputs are different with [3] being ugly (ghosting/non-sharp
> >   edges).
> > 
> >   * [1] and [4] (13bit coefficient accuracy) should be the same but have
> >   slight differences.
> > 
> > Questions:
> > 
> 
> >   * What is the meaning of the ACCURATE_RND flag ?
> 
> it should enable accurate rounding
> 
> 
> >   * Does [3] use some kind of interpolation instead of duplicating
> >   chroma lines ? Its output seems inferior to the other code paths.
> 
> are you sure that is true for real images?
> its easy to end up with wrong conclusions with synthetic inputs
> unless you want to use the scaler only for such inputs.
> 
> either way line duplication is likely not optimal for real images
> iam not made of constant color blocks that are aligned to some cammeras
> 2x2 samples
> 
> 
> >   * Is [3] the output that should be taken as reference ?
> 
> id say, the reference is reality, making the output as close as a
> image of the new resolution would be if it had been taken that way
> 
> 
> >   * Should we use BITEXACT instead of ACCURATE_RND to determine the
> >   precision used ?
> 
> BITEXACT is to avoid platform differences and allow regression tests
> 
> if all else is equal it would be best if C and asm matches, and if
> C is bad then it should be improved

Here are the C, MMX and NEON outputs from a photo:
http://0x5c.me/yuv2rgb/photos

The C and NEON outputs look almost the same.
The MMX one have slightly different "colors" overall.

Since figuring out what the C code is actually doing and have the neon asm
matches its output is likely to take some time. Would you be OK if, on the
ARM platform, +ACCURATE_RND uses the C code path (so fate is not broken),
and -ACCURATE_RND uses the neon code path with a precision of 16bit (IMHO,
speed is preferred over the slight quality gain of the 32bit version on
this platform) ?

This behaviour will affect yuv420p but also nv12 and nv21 inputs.

This is a kind of a temporary (and lame) solution until I have some time
to work on it.

Matthieu
[...]
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 2/2] swscale/arm/yuv2rgb: add ff_yuv420p_to_{argb, rgba, abgr, bgra}_neon_{16, 32}

2015-12-18 Thread Matthieu Bouron
On Fri, Dec 18, 2015 at 3:59 PM, Matthieu Bouron 
wrote:

> From: Matthieu Bouron 
>
> ---
>  libswscale/arm/swscale_unscaled.c | 52 +++---
>  libswscale/arm/yuv2rgb_neon.S | 77
> ---
>  2 files changed, 118 insertions(+), 11 deletions(-)
>
> diff --git a/libswscale/arm/swscale_unscaled.c
> b/libswscale/arm/swscale_unscaled.c
> index 8aa6432..dce987e 100644
> --- a/libswscale/arm/swscale_unscaled.c
> +++ b/libswscale/arm/swscale_unscaled.c
> @@ -63,6 +63,50 @@ static int rgbx_to_nv12_neon_16_wrapper(SwsContext
> *context, const uint8_t *src[
>  }
>  #endif
>
> +#define YUV_TO_RGB_TABLE(precision)
>\
> +c->yuv2rgb_v2r_coeff / ((precision) == 16 ? 1 << 7 : 1),
>   \
> +c->yuv2rgb_u2g_coeff / ((precision) == 16 ? 1 << 7 : 1),
>   \
> +c->yuv2rgb_v2g_coeff / ((precision) == 16 ? 1 << 7 : 1),
>   \
> +c->yuv2rgb_u2b_coeff / ((precision) == 16 ? 1 << 7 : 1),
>   \
> +
> +#define DECLARE_FF_YUV420P_TO_RGBX_FUNCS(ofmt, precision)
>\
> +int ff_yuv420p_to_##ofmt##_neon_##precision(int w, int h,
>\
> + uint8_t *dst, int linesize,
>   \
> + const uint8_t *srcY, int linesizeY,
>   \
> + const uint8_t *srcU, int linesizeU,
>   \
> + const uint8_t *srcV, int linesizeV,
>   \
> + const int16_t *table,
>   \
> + int y_offset,
>   \
> + int y_coeff);
>   \
> +
>   \
> +static int yuv420p_to_##ofmt##_neon_wrapper_##precision(SwsContext *c,
> const uint8_t *src[],\
> +   int srcStride[], int
> srcSliceY, int srcSliceH,   \
> +   uint8_t *dst[], int
> dstStride[]) {   \
> +const int16_t yuv2rgb_table[] = { YUV_TO_RGB_TABLE(precision) };
>   \
> +
>   \
> +ff_yuv420p_to_##ofmt##_neon_##precision(c->srcW, srcSliceH,
>\
> + dst[0] +  srcSliceY  * dstStride[0],
> dstStride[0], \
> + src[0] +  srcSliceY  * srcStride[0],
> srcStride[0], \
> + src[1] + (srcSliceY / 2) * srcStride[1],
> srcStride[1], \
> + src[2] + (srcSliceY / 2) * srcStride[2],
> srcStride[2], \
> + yuv2rgb_table,
>\
> + c->yuv2rgb_y_offset >> 9,
>   \
> + c->yuv2rgb_y_coeff / ((precision) == 16
> ? 1 << 7 : 1));\
> +
>   \
> +return 0;
>\
> +}
>\
> +
> +#define DECLARE_FF_YUV420P_TO_ALL_RGBX_FUNCS(precision)
>\
> +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(argb, precision)
>\
> +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(rgba, precision)
>\
> +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(abgr, precision)
>\
> +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(bgra, precision)
>\
> +
> +#define DECLARE_FF_YUV420P_TO_ALL_RGBX_ALL_PRECISION_FUNCS
>   \
> +DECLARE_FF_YUV420P_TO_ALL_RGBX_FUNCS(16)
>   \
> +
> +DECLARE_FF_YUV420P_TO_ALL_RGBX_ALL_PRECISION_FUNCS
> +
>  #define DECLARE_FF_NVX_TO_RGBX_FUNCS(ifmt, ofmt, precision)
>\
>  int ff_##ifmt##_to_##ofmt##_neon_##precision(int w, int h,
>   \
>   uint8_t *dst, int linesize,
>   \
> @@ -75,12 +119,7 @@ int ff_##ifmt##_to_##ofmt##_neon_##precision(int w,
> int h,
>  static int ifmt##_to_##ofmt##_neon_wrapper_##precision(SwsContext *c,
> const uint8_t *src[], \
> int srcStride[], int
> srcSliceY, int srcSliceH,   \
> uint8_t *dst[], int
> dstStride[]) {   \
> -const int16_t yuv2rgb_table[] = {
>\
> -c->yuv2rgb_v2r_coeff / ((precision) == 16 ? 1 << 7 : 1),
>   \
> -c->yuv2rgb_u2g_coeff / ((precision) == 16 ? 1 << 7 : 1),
>   \
> -c->yuv2rgb_v2g_coeff / ((precision) == 16 ? 1 << 7 : 1),
>   \
> -c->yuv2rgb_u2b_coeff / ((precision) == 16 ? 1 << 7 : 1),
>   \
> -};
>   \
> +const int16_t yuv2rgb_table[] = { YUV_TO_RGB_TABLE(precision) };
>   \
>
>\
>  

Re: [FFmpeg-devel] [PATCH 2/2] swscale/arm/yuv2rgb: add ff_yuv420p_to_{argb, rgba, abgr, bgra}_neon_{16, 32}

2015-12-18 Thread Michael Niedermayer
On Fri, Dec 18, 2015 at 04:33:37PM +0100, Matthieu Bouron wrote:
> On Fri, Dec 18, 2015 at 3:59 PM, Matthieu Bouron 
> wrote:
> 
> > From: Matthieu Bouron 
> >
> > ---
> >  libswscale/arm/swscale_unscaled.c | 52 +++---
> >  libswscale/arm/yuv2rgb_neon.S | 77
> > ---
> >  2 files changed, 118 insertions(+), 11 deletions(-)
> >
> > diff --git a/libswscale/arm/swscale_unscaled.c
> > b/libswscale/arm/swscale_unscaled.c
> > index 8aa6432..dce987e 100644
> > --- a/libswscale/arm/swscale_unscaled.c
> > +++ b/libswscale/arm/swscale_unscaled.c
> > @@ -63,6 +63,50 @@ static int rgbx_to_nv12_neon_16_wrapper(SwsContext
> > *context, const uint8_t *src[
> >  }
> >  #endif
> >
> > +#define YUV_TO_RGB_TABLE(precision)
> >\
> > +c->yuv2rgb_v2r_coeff / ((precision) == 16 ? 1 << 7 : 1),
> >   \
> > +c->yuv2rgb_u2g_coeff / ((precision) == 16 ? 1 << 7 : 1),
> >   \
> > +c->yuv2rgb_v2g_coeff / ((precision) == 16 ? 1 << 7 : 1),
> >   \
> > +c->yuv2rgb_u2b_coeff / ((precision) == 16 ? 1 << 7 : 1),
> >   \
> > +
> > +#define DECLARE_FF_YUV420P_TO_RGBX_FUNCS(ofmt, precision)
> >\
> > +int ff_yuv420p_to_##ofmt##_neon_##precision(int w, int h,
> >\
> > + uint8_t *dst, int linesize,
> >   \
> > + const uint8_t *srcY, int linesizeY,
> >   \
> > + const uint8_t *srcU, int linesizeU,
> >   \
> > + const uint8_t *srcV, int linesizeV,
> >   \
> > + const int16_t *table,
> >   \
> > + int y_offset,
> >   \
> > + int y_coeff);
> >   \
> > +
> >   \
> > +static int yuv420p_to_##ofmt##_neon_wrapper_##precision(SwsContext *c,
> > const uint8_t *src[],\
> > +   int srcStride[], int
> > srcSliceY, int srcSliceH,   \
> > +   uint8_t *dst[], int
> > dstStride[]) {   \
> > +const int16_t yuv2rgb_table[] = { YUV_TO_RGB_TABLE(precision) };
> >   \
> > +
> >   \
> > +ff_yuv420p_to_##ofmt##_neon_##precision(c->srcW, srcSliceH,
> >\
> > + dst[0] +  srcSliceY  * dstStride[0],
> > dstStride[0], \
> > + src[0] +  srcSliceY  * srcStride[0],
> > srcStride[0], \
> > + src[1] + (srcSliceY / 2) * srcStride[1],
> > srcStride[1], \
> > + src[2] + (srcSliceY / 2) * srcStride[2],
> > srcStride[2], \
> > + yuv2rgb_table,
> >\
> > + c->yuv2rgb_y_offset >> 9,
> >   \
> > + c->yuv2rgb_y_coeff / ((precision) == 16
> > ? 1 << 7 : 1));\
> > +
> >   \
> > +return 0;
> >\
> > +}
> >\
> > +
> > +#define DECLARE_FF_YUV420P_TO_ALL_RGBX_FUNCS(precision)
> >\
> > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(argb, precision)
> >\
> > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(rgba, precision)
> >\
> > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(abgr, precision)
> >\
> > +DECLARE_FF_YUV420P_TO_RGBX_FUNCS(bgra, precision)
> >\
> > +
> > +#define DECLARE_FF_YUV420P_TO_ALL_RGBX_ALL_PRECISION_FUNCS
> >   \
> > +DECLARE_FF_YUV420P_TO_ALL_RGBX_FUNCS(16)
> >   \
> > +
> > +DECLARE_FF_YUV420P_TO_ALL_RGBX_ALL_PRECISION_FUNCS
> > +
> >  #define DECLARE_FF_NVX_TO_RGBX_FUNCS(ifmt, ofmt, precision)
> >\
> >  int ff_##ifmt##_to_##ofmt##_neon_##precision(int w, int h,
> >   \
> >   uint8_t *dst, int linesize,
> >   \
> > @@ -75,12 +119,7 @@ int ff_##ifmt##_to_##ofmt##_neon_##precision(int w,
> > int h,
> >  static int ifmt##_to_##ofmt##_neon_wrapper_##precision(SwsContext *c,
> > const uint8_t *src[], \
> > int srcStride[], int
> > srcSliceY, int srcSliceH,   \
> > uint8_t *dst[], int
> > dstStride[]) {   \
> > -const int16_t yuv2rgb_table[] = {
> >\
> > -c->yuv2rgb_v2r_coeff / ((precision) == 16 ? 1 << 7 : 1),
> >   \
> > -c->yuv2rgb_u2g_coeff / ((precision) == 16 ? 1 << 7 : 1),
> >   \
> > -c->yuv2rgb_v2g_coeff / ((precision) == 

Re: [FFmpeg-devel] [PATCH 2/2] swscale/arm/yuv2rgb: add ff_yuv420p_to_{argb, rgba, abgr, bgra}_neon_{16, 32}

2015-12-18 Thread Michael Niedermayer
On Fri, Dec 18, 2015 at 11:44:27AM +0100, Matthieu Bouron wrote:
> On Thu, Dec 17, 2015 at 07:47:08PM +0100, Michael Niedermayer wrote:
> > On Thu, Dec 17, 2015 at 04:54:31PM +0100, Matthieu Bouron wrote:
> > > On Tue, Dec 15, 2015 at 06:22:43PM +0100, Michael Niedermayer wrote:
> > > > On Tue, Dec 15, 2015 at 05:46:09PM +0100, Matthieu Bouron wrote:
> > > > > From: Matthieu Bouron 
> > > > > 
> > > > > ---
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > This commit is likely to break fate on arm since the current C code 
> > > > > path
> > > > > seems to use less precision.
> > > > > 
> > > > > How should I proceed to fix it ?
> > > > 
> > > > hmm
> > > > can the precission of the C code path and any asm impl of it under
> > > > bitexact (if they exist), be changed to higher precission without
> > > > speedloss?
> > > > if so that would be an option
> > > 
> > > We are currently facing 4 cases (with this patch applied)
> > > 
> > >   * [1] ARM +ACCURATE_RND: uses neon, 13bit coefficients and 32bit
> > >   precision overall
> > >   * [2] ARM -ACCURATE_RND: uses neon, 6bit coefficients and 16bit
> > >   precision overall
> > 
> > >   * [3] X86 +ACCURATE_RND: uses a C code path with lookup tables
> > 
> > which LUT do you mean here ?
> 
> The table filled by ff_yuv2rgb_c_init_tables. Not sure if it's used
> though, I haven't looked at the C code that actually does the conversion
> (yet).
> 
> > 
> > 
> > >   * [4] X86 -ACCURATE_RND: uses MMX+MMXEXT with apparently 13bit
> > >   coefficients (libswscale/yuv2rgb.c around line 800).
> > > 
> > > Notes:
> > >   * The 4 outputs are different with [3] being ugly (ghosting/non-sharp
> > >   edges).
> > > 
> > >   * [1] and [4] (13bit coefficient accuracy) should be the same but have
> > >   slight differences.
> > > 
> > > Questions:
> > > 
> > 
> > >   * What is the meaning of the ACCURATE_RND flag ?
> > 
> > it should enable accurate rounding
> > 
> > 
> > >   * Does [3] use some kind of interpolation instead of duplicating
> > >   chroma lines ? Its output seems inferior to the other code paths.
> > 
> > are you sure that is true for real images?
> > its easy to end up with wrong conclusions with synthetic inputs
> > unless you want to use the scaler only for such inputs.
> > 
> > either way line duplication is likely not optimal for real images
> > iam not made of constant color blocks that are aligned to some cammeras
> > 2x2 samples
> > 
> > 
> > >   * Is [3] the output that should be taken as reference ?
> > 
> > id say, the reference is reality, making the output as close as a
> > image of the new resolution would be if it had been taken that way
> > 
> > 
> > >   * Should we use BITEXACT instead of ACCURATE_RND to determine the
> > >   precision used ?
> > 
> > BITEXACT is to avoid platform differences and allow regression tests
> > 
> > if all else is equal it would be best if C and asm matches, and if
> > C is bad then it should be improved
> 
> Here are the C, MMX and NEON outputs from a photo:
> http://0x5c.me/yuv2rgb/photos
> 
> The C and NEON outputs look almost the same.
> The MMX one have slightly different "colors" overall.
> 
> Since figuring out what the C code is actually doing and have the neon asm
> matches its output is likely to take some time. Would you be OK if, on the
> ARM platform, +ACCURATE_RND uses the C code path (so fate is not broken),
> and -ACCURATE_RND uses the neon code path with a precision of 16bit (IMHO,
> speed is preferred over the slight quality gain of the 32bit version on
> this platform) ?
> 
> This behaviour will affect yuv420p but also nv12 and nv21 inputs.
> 
> This is a kind of a temporary (and lame) solution until I have some time
> to work on it.

no objections

thanks

>
> Matthieu
> [...]
> ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

During times of universal deceit, telling the truth becomes a
revolutionary act. -- George Orwell


signature.asc
Description: Digital signature
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 2/2] swscale/arm/yuv2rgb: add ff_yuv420p_to_{argb, rgba, abgr, bgra}_neon_{16, 32}

2015-12-17 Thread Matthieu Bouron
On Tue, Dec 15, 2015 at 06:22:43PM +0100, Michael Niedermayer wrote:
> On Tue, Dec 15, 2015 at 05:46:09PM +0100, Matthieu Bouron wrote:
> > From: Matthieu Bouron 
> > 
> > ---
> > 
> > Hi,
> > 
> > This commit is likely to break fate on arm since the current C code path
> > seems to use less precision.
> > 
> > How should I proceed to fix it ?
> 
> hmm
> can the precission of the C code path and any asm impl of it under
> bitexact (if they exist), be changed to higher precission without
> speedloss?
> if so that would be an option

We are currently facing 4 cases (with this patch applied)

  * [1] ARM +ACCURATE_RND: uses neon, 13bit coefficients and 32bit
  precision overall
  * [2] ARM -ACCURATE_RND: uses neon, 6bit coefficients and 16bit
  precision overall
  * [3] X86 +ACCURATE_RND: uses a C code path with lookup tables
  * [4] X86 -ACCURATE_RND: uses MMX+MMXEXT with apparently 13bit
  coefficients (libswscale/yuv2rgb.c around line 800).

Notes:
  * The 4 outputs are different with [3] being ugly (ghosting/non-sharp
  edges).

  * [1] and [4] (13bit coefficient accuracy) should be the same but have
  slight differences.

Questions:

  * What is the meaning of the ACCURATE_RND flag ?
  * Does [3] use some kind of interpolation instead of duplicating
  chroma lines ? Its output seems inferior to the other code paths.
  * Is [3] the output that should be taken as reference ?
  * Should we use BITEXACT instead of ACCURATE_RND to determine the
  precision used ?

The different outputs of testsrc2,format=yuv420,format=rgba can
be found here:
http://0x5c.me/yuv2rgb

Matthieu
[...]
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 2/2] swscale/arm/yuv2rgb: add ff_yuv420p_to_{argb, rgba, abgr, bgra}_neon_{16, 32}

2015-12-17 Thread Michael Niedermayer
On Thu, Dec 17, 2015 at 04:54:31PM +0100, Matthieu Bouron wrote:
> On Tue, Dec 15, 2015 at 06:22:43PM +0100, Michael Niedermayer wrote:
> > On Tue, Dec 15, 2015 at 05:46:09PM +0100, Matthieu Bouron wrote:
> > > From: Matthieu Bouron 
> > > 
> > > ---
> > > 
> > > Hi,
> > > 
> > > This commit is likely to break fate on arm since the current C code path
> > > seems to use less precision.
> > > 
> > > How should I proceed to fix it ?
> > 
> > hmm
> > can the precission of the C code path and any asm impl of it under
> > bitexact (if they exist), be changed to higher precission without
> > speedloss?
> > if so that would be an option
> 
> We are currently facing 4 cases (with this patch applied)
> 
>   * [1] ARM +ACCURATE_RND: uses neon, 13bit coefficients and 32bit
>   precision overall
>   * [2] ARM -ACCURATE_RND: uses neon, 6bit coefficients and 16bit
>   precision overall

>   * [3] X86 +ACCURATE_RND: uses a C code path with lookup tables

which LUT do you mean here ?


>   * [4] X86 -ACCURATE_RND: uses MMX+MMXEXT with apparently 13bit
>   coefficients (libswscale/yuv2rgb.c around line 800).
> 
> Notes:
>   * The 4 outputs are different with [3] being ugly (ghosting/non-sharp
>   edges).
> 
>   * [1] and [4] (13bit coefficient accuracy) should be the same but have
>   slight differences.
> 
> Questions:
> 

>   * What is the meaning of the ACCURATE_RND flag ?

it should enable accurate rounding


>   * Does [3] use some kind of interpolation instead of duplicating
>   chroma lines ? Its output seems inferior to the other code paths.

are you sure that is true for real images?
its easy to end up with wrong conclusions with synthetic inputs
unless you want to use the scaler only for such inputs.

either way line duplication is likely not optimal for real images
iam not made of constant color blocks that are aligned to some cammeras
2x2 samples


>   * Is [3] the output that should be taken as reference ?

id say, the reference is reality, making the output as close as a
image of the new resolution would be if it had been taken that way


>   * Should we use BITEXACT instead of ACCURATE_RND to determine the
>   precision used ?

BITEXACT is to avoid platform differences and allow regression tests

if all else is equal it would be best if C and asm matches, and if
C is bad then it should be improved


> 
> The different outputs of testsrc2,format=yuv420,format=rgba can
> be found here:
> http://0x5c.me/yuv2rgb
> 
> Matthieu
> [...]
> ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Its not that you shouldnt use gotos but rather that you should write
readable code and code with gotos often but not always is less readable


signature.asc
Description: Digital signature
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 2/2] swscale/arm/yuv2rgb: add ff_yuv420p_to_{argb, rgba, abgr, bgra}_neon_{16, 32}

2015-12-15 Thread Michael Niedermayer
On Tue, Dec 15, 2015 at 05:46:09PM +0100, Matthieu Bouron wrote:
> From: Matthieu Bouron 
> 
> ---
> 
> Hi,
> 
> This commit is likely to break fate on arm since the current C code path
> seems to use less precision.
> 
> How should I proceed to fix it ?

hmm
can the precission of the C code path and any asm impl of it under
bitexact (if they exist), be changed to higher precission without
speedloss?
if so that would be an option

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Democracy is the form of government in which you can choose your dictator


signature.asc
Description: Digital signature
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel