date:20220331

Re: [FFmpeg-devel] [PATCH v11 1/5] avcodec/jpegxl: add Jpeg XL image codec and parser

2022-03-31 Thread Lynne

1 Apr 2022, 04:13 by d...@lynne.ee:

> 1 Apr 2022, 02:20 by leo.i...@gmail.com:
>
>> This commit adds support to libavcodec to read and parse
>> encoded Jpeg XL images. Jpeg XL is intended to be an
>> extended-life replacement to legacy mjpeg.
>> ---
>>  MAINTAINERS|   2 +
>>  libavcodec/Makefile|   1 +
>>  libavcodec/codec_desc.c|   9 +
>>  libavcodec/codec_id.h  |   1 +
>>  libavcodec/jpegxl.h|  43 ++
>>  libavcodec/jpegxl_parser.c | 951 + 
>> +}
>> +}
>> +if (header->color_space == FF_JPEGXL_CS_GRAY) {
>> +if (header->bits_per_sample <= 8)
>> +return alpha ? AV_PIX_FMT_YA8 : AV_PIX_FMT_GRAY8;
>> +if (header->bits_per_sample > 16 || header->exp_bits_per_sample)
>> +return alpha ? AV_PIX_FMT_NONE : AV_PIX_FMT_GRAYF32;
>> +return alpha ? AV_PIX_FMT_YA16 : AV_PIX_FMT_GRAY16;
>> +} else if (header->color_space == FF_JPEGXL_CS_RGB
>> +|| header->color_space == FF_JPEGXL_CS_XYB) {
>> +if (header->bits_per_sample <= 8)
>> +return alpha ? AV_PIX_FMT_RGBA : AV_PIX_FMT_RGB24;
>> +if (header->bits_per_sample > 16 || header->exp_bits_per_sample)
>> +return alpha ? AV_PIX_FMT_GBRAPF32 : AV_PIX_FMT_GBRPF32;
>> +return alpha ? AV_PIX_FMT_RGBA64 : AV_PIX_FMT_RGB48;
>> +}
>> +return AV_PIX_FMT_NONE;
>>
>
> YUV is supported, via the do_YCbCr flag in the spec. I think
> we ought to set the pixel format to YUV444P/16 in that case,
> as the codec requires YUV to be upsampled during decoding,
> and doing unnecessary colorspace conversions inside
> decoders is something we don't want.
> Decoders are free to change what the parser sets, so if
> users link to libjxl, RGB will be reported, since lavf
> decodes the first frame during probing.
> Otherwise, the native decoder would match what the parser
> reports and output YUV444 frames when signalled.
>

Come to think of it, we had better output XYB instead of RGB.
But that can be changed later; for now it's fine if the parser
always reports either RGB or Gray, so this is fine as-is.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH v11 1/5] avcodec/jpegxl: add Jpeg XL image codec and parser

2022-03-31 Thread Lynne

1 Apr 2022, 02:20 by leo.i...@gmail.com:

> This commit adds support to libavcodec to read and parse
> encoded Jpeg XL images. Jpeg XL is intended to be an
> extended-life replacement to legacy mjpeg.
> ---
>  MAINTAINERS|   2 +
>  libavcodec/Makefile|   1 +
>  libavcodec/codec_desc.c|   9 +
>  libavcodec/codec_id.h  |   1 +
>  libavcodec/jpegxl.h|  43 ++
>  libavcodec/jpegxl_parser.c | 951 + 
> +}
> +}
> +if (header->color_space == FF_JPEGXL_CS_GRAY) {
> +if (header->bits_per_sample <= 8)
> +return alpha ? AV_PIX_FMT_YA8 : AV_PIX_FMT_GRAY8;
> +if (header->bits_per_sample > 16 || header->exp_bits_per_sample)
> +return alpha ? AV_PIX_FMT_NONE : AV_PIX_FMT_GRAYF32;
> +return alpha ? AV_PIX_FMT_YA16 : AV_PIX_FMT_GRAY16;
> +} else if (header->color_space == FF_JPEGXL_CS_RGB
> +|| header->color_space == FF_JPEGXL_CS_XYB) {
> +if (header->bits_per_sample <= 8)
> +return alpha ? AV_PIX_FMT_RGBA : AV_PIX_FMT_RGB24;
> +if (header->bits_per_sample > 16 || header->exp_bits_per_sample)
> +return alpha ? AV_PIX_FMT_GBRAPF32 : AV_PIX_FMT_GBRPF32;
> +return alpha ? AV_PIX_FMT_RGBA64 : AV_PIX_FMT_RGB48;
> +}
> +return AV_PIX_FMT_NONE;
>

YUV is supported, via the do_YCbCr flag in the spec. I think
we ought to set the pixel format to YUV444P/16 in that case,
as the codec requires YUV to be upsampled during decoding,
and doing unnecessary colorspace conversions inside
decoders is something we don't want.
Decoders are free to change what the parser sets, so if
users link to libjxl, RGB will be reported, since lavf
decodes the first frame during probing.
Otherwise, the native decoder would match what the parser
reports and output YUV444 frames when signalled.
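
A rough sketch of the suggestion above, for illustration only. It assumes a
hypothetical do_ycbcr field on the parsed header and uses stand-in enum
values instead of the real AV_PIX_FMT_* constants from libavutil/pixfmt.h;
alpha, grayscale and float samples are left out to keep it short.

```c
/* Stand-in values for illustration; the real code would use the
 * AV_PIX_FMT_* enumerators from libavutil/pixfmt.h. */
enum SketchPixFmt {
    SKETCH_FMT_NONE,
    SKETCH_FMT_RGB24,
    SKETCH_FMT_RGB48,
    SKETCH_FMT_YUV444P,
    SKETCH_FMT_YUV444P16,
};

/* Hypothetical subset of the parsed header; do_ycbcr is an assumed name. */
struct SketchHeader {
    int bits_per_sample;
    int do_ycbcr;   /* nonzero when the stream signals YCbCr */
};

static enum SketchPixFmt sketch_pick_format(const struct SketchHeader *h)
{
    /* Float and >16-bit paths are omitted from this sketch. */
    if (h->bits_per_sample <= 0 || h->bits_per_sample > 16)
        return SKETCH_FMT_NONE;
    /* Report planar 4:4:4 YUV when YCbCr is signalled, so no
     * colorspace conversion is implied by the parser's choice. */
    if (h->do_ycbcr)
        return h->bits_per_sample <= 8 ? SKETCH_FMT_YUV444P
                                       : SKETCH_FMT_YUV444P16;
    return h->bits_per_sample <= 8 ? SKETCH_FMT_RGB24 : SKETCH_FMT_RGB48;
}
```

With this shape, a decoder wrapping libjxl could still override the reported
format after decoding the first frame, as described above.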

Re: [FFmpeg-devel] [PATCH] MAINTAINERS: add myself as maintainer for libsrt protocol

2022-03-31 Thread lance . lmwang

On Wed, Mar 30, 2022 at 09:44:08PM +0200, Marton Balint wrote:
> 
> 
> On Fri, 25 Mar 2022, Zhao Zhili wrote:
> 
> > Signed-off-by: Zhao Zhili 
> > ---
> > MAINTAINERS | 1 +
> > 1 file changed, 1 insertion(+)
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 931cf4bd2c..5daa6f8e03 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -516,6 +516,7 @@ Protocols:
> >   bluray.c  Petri Hintukainen
> >   ftp.c Lukasz Marek
> >   http.cRonald S. Bultje
> > +  libsrt.c  Zhao Zhili
> >   libssh.c  Lukasz Marek
> >   libzmq.c  Andriy Gelman
> >   mms*.cRonald S. Bultje
> 
> LGTM, thanks.

Applied, thanks.

> 
> Marton

-- 
Thanks,
Limin Wang

Re: [FFmpeg-devel] [PATCH v11 1/1] avformat: Add IPFS protocol support.

2022-03-31 Thread Mark Gaiser

On Fri, Apr 1, 2022 at 2:09 AM Mark Gaiser  wrote:

> This patch adds support for:
> - ffplay ipfs://
> - ffplay ipns://
>
> IPFS data can be played from so-called "ipfs gateways".
> A gateway is essentially a web server that gives access to the
> distributed IPFS network.
>
> This protocol support (ipfs and ipns) therefore translates
> ipfs:// and ipns:// to an http:// URL. The resulting URL is
> then handled by the http protocol. It could also be https,
> depending on the gateway provided.
>
> To use this protocol, a gateway must be provided.
> If you do nothing it will try to find it in your
> $HOME/.ipfs/gateway file. The ways to set it manually are:
> 1. Define a -gateway  to the gateway.
> 2. Define $IPFS_GATEWAY with the full http link to the gateway.
> 3. Define $IPFS_PATH and point it to the IPFS data path.
> 4. Have IPFS running in your local user folder (under $HOME/.ipfs).
>
> Signed-off-by: Mark Gaiser 
> ---
>  configure |   2 +
>  doc/protocols.texi|  30 
>  libavformat/Makefile  |   2 +
>  libavformat/ipfsgateway.c | 328 ++
>  libavformat/protocols.c   |   2 +
>  5 files changed, 364 insertions(+)
>  create mode 100644 libavformat/ipfsgateway.c
>
> diff --git a/configure b/configure
> index e4d36aa639..55af90957a 100755
> --- a/configure
> +++ b/configure
> @@ -3579,6 +3579,8 @@ udp_protocol_select="network"
>  udplite_protocol_select="network"
>  unix_protocol_deps="sys_un_h"
>  unix_protocol_select="network"
> +ipfs_protocol_select="https_protocol"
> +ipns_protocol_select="https_protocol"
>
>  # external library protocols
>  libamqp_protocol_deps="librabbitmq"
> diff --git a/doc/protocols.texi b/doc/protocols.texi
> index d207df0b52..7c9c0a4808 100644
> --- a/doc/protocols.texi
> +++ b/doc/protocols.texi
> @@ -2025,5 +2025,35 @@ decoding errors.
>
>  @end table
>
> +@section ipfs
> +
> +InterPlanetary File System (IPFS) protocol support. One can access files stored
> +on the IPFS network through so-called gateways. Those are http(s) endpoints.
> +This protocol wraps the IPFS native protocols (ipfs:// and ipns://) to be sent
> +to such a gateway. Users can (and should) host their own node which means this
> +protocol will use your local machine gateway to access files on the IPFS network.
> +
> +If a user doesn't have a node of their own then the public gateway dweb.link is
> +used by default.
> +
> +You can use this protocol in 2 ways. Using IPFS:
> +@example
> +ffplay ipfs://QmbGtJg23skhvFmu9mJiePVByhfzu5rwo74MEkVDYAmF5T
> +@end example
> +
> +Or the IPNS protocol (IPNS is mutable IPFS):
> +@example
> +ffplay ipns://QmbGtJg23skhvFmu9mJiePVByhfzu5rwo74MEkVDYAmF5T
> +@end example
> +
> +You can also change the gateway to be used:
> +
> +@table @option
> +
> +@item gateway
> +Defines the gateway to use. When nothing is provided the protocol will first try
> +your local gateway. If that fails dweb.link will be used.
> +
> +@end table
>
>  @c man end PROTOCOLS
> diff --git a/libavformat/Makefile b/libavformat/Makefile
> index d7182d6bd8..e3233fd7ac 100644
> --- a/libavformat/Makefile
> +++ b/libavformat/Makefile
> @@ -660,6 +660,8 @@ OBJS-$(CONFIG_SRTP_PROTOCOL) += srtpproto.o srtp.o
>  OBJS-$(CONFIG_SUBFILE_PROTOCOL)  += subfile.o
>  OBJS-$(CONFIG_TEE_PROTOCOL)  += teeproto.o tee_common.o
>  OBJS-$(CONFIG_TCP_PROTOCOL)  += tcp.o
> +OBJS-$(CONFIG_IPFS_PROTOCOL) += ipfsgateway.o
> +OBJS-$(CONFIG_IPNS_PROTOCOL) += ipfsgateway.o
>  TLS-OBJS-$(CONFIG_GNUTLS)+= tls_gnutls.o
>  TLS-OBJS-$(CONFIG_LIBTLS)+= tls_libtls.o
>  TLS-OBJS-$(CONFIG_MBEDTLS)   += tls_mbedtls.o
> diff --git a/libavformat/ipfsgateway.c b/libavformat/ipfsgateway.c
> new file mode 100644
> index 00..725cc5e474
> --- /dev/null
> +++ b/libavformat/ipfsgateway.c
> @@ -0,0 +1,328 @@
> +/*
> + * IPFS and IPNS protocol support through IPFS Gateway.
> + * Copyright (c) 2022 Mark Gaiser
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include "libavutil/avstring.h"
> +#include "libavutil/opt.h"
> +#include "url.h"
> +#include 
> +
> +typedef struct IPFSGatewayContext {
> +

[FFmpeg-devel] [PATCH v11 3/5] avcodec/libjxl: add Jpeg XL encoding via libjxl

2022-03-31 Thread Leo Izen

This commit adds encoding support to libavcodec
for Jpeg XL images via the external library libjxl.
---
 configure  |   3 +-
 libavcodec/Makefile|   1 +
 libavcodec/allcodecs.c |   1 +
 libavcodec/libjxlenc.c | 379 +
 4 files changed, 383 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/libjxlenc.c

diff --git a/configure b/configure
index 969b13eba3..85a1a8b53c 100755
--- a/configure
+++ b/configure
@@ -240,7 +240,7 @@ External library support:
   --enable-libiec61883 enable iec61883 via libiec61883 [no]
   --enable-libilbc enable iLBC de/encoding via libilbc [no]
   --enable-libjack enable JACK audio sound server [no]
-  --enable-libjxl  enable JPEG XL decoding via libjxl [no]
+  --enable-libjxl  enable JPEG XL de/encoding via libjxl [no]
   --enable-libklvanc   enable Kernel Labs VANC processing [no]
   --enable-libkvazaar  enable HEVC encoding via libkvazaar [no]
   --enable-liblensfun  enable lensfun lens correction [no]
@@ -3332,6 +3332,7 @@ libgsm_ms_encoder_deps="libgsm"
 libilbc_decoder_deps="libilbc"
 libilbc_encoder_deps="libilbc"
 libjxl_decoder_deps="libjxl libjxl_threads"
+libjxl_encoder_deps="libjxl libjxl_threads"
 libkvazaar_encoder_deps="libkvazaar"
 libmodplug_demuxer_deps="libmodplug"
 libmp3lame_encoder_deps="libmp3lame"
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index c00b0d3246..b208cc0097 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -1061,6 +1061,7 @@ OBJS-$(CONFIG_LIBGSM_MS_ENCODER)  += libgsmenc.o
 OBJS-$(CONFIG_LIBILBC_DECODER)+= libilbc.o
 OBJS-$(CONFIG_LIBILBC_ENCODER)+= libilbc.o
 OBJS-$(CONFIG_LIBJXL_DECODER) += libjxldec.o libjxl.o
+OBJS-$(CONFIG_LIBJXL_ENCODER) += libjxlenc.o libjxl.o
 OBJS-$(CONFIG_LIBKVAZAAR_ENCODER) += libkvazaar.o
 OBJS-$(CONFIG_LIBMP3LAME_ENCODER) += libmp3lame.o
 OBJS-$(CONFIG_LIBOPENCORE_AMRNB_DECODER)  += libopencore-amr.o
diff --git a/libavcodec/allcodecs.c b/libavcodec/allcodecs.c
index a9cd69dfce..db92fb7af5 100644
--- a/libavcodec/allcodecs.c
+++ b/libavcodec/allcodecs.c
@@ -750,6 +750,7 @@ extern const FFCodec ff_libgsm_ms_decoder;
 extern const FFCodec ff_libilbc_encoder;
 extern const FFCodec ff_libilbc_decoder;
 extern const FFCodec ff_libjxl_decoder;
+extern const FFCodec ff_libjxl_encoder;
 extern const FFCodec ff_libmp3lame_encoder;
 extern const FFCodec ff_libopencore_amrnb_encoder;
 extern const FFCodec ff_libopencore_amrnb_decoder;
diff --git a/libavcodec/libjxlenc.c b/libavcodec/libjxlenc.c
new file mode 100644
index 00..deacc0f1f8
--- /dev/null
+++ b/libavcodec/libjxlenc.c
@@ -0,0 +1,379 @@
+/*
+ * JPEG XL encoding support via libjxl
+ * Copyright (c) 2021 Leo Izen 
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * JPEG XL encoder using libjxl
+ */
+
+#include "libavutil/avutil.h"
+#include "libavutil/error.h"
+#include "libavutil/frame.h"
+#include "libavutil/libm.h"
+#include "libavutil/opt.h"
+#include "libavutil/pixdesc.h"
+#include "libavutil/pixfmt.h"
+#include "libavutil/version.h"
+
+#include "avcodec.h"
+#include "codec_internal.h"
+
+#include 
+#include 
+#include "libjxl.h"
+
+typedef struct LibJxlEncodeContext {
+AVClass *class;
+void *runner;
+JxlEncoder *encoder;
+JxlEncoderFrameSettings *options;
+int effort;
+float distance;
+int modular;
+uint8_t *buffer;
+size_t buffer_size;
+} LibJxlEncodeContext;
+
+/**
+ * Map a quality setting for -qscale roughly from libjpeg
+ * quality numbers to libjxl's butteraugli distance for
+ * photographic content.
+ *
+ * Setting distance explicitly is preferred, but this will
+ * allow qscale to be used as a fallback.
+ *
+ * This function is continuous and injective on [0, 100] which
+ * makes it monotonic.
+ *
+ * @param  quality 0.0 to 100.0 quality setting, libjpeg quality
+ * @return Butteraugli distance between 0.0 and 15.0
+ */
+static float quality_to_distance(float quality)
+{
+if (quality >= 100.0)
+return 0.0;
+else if (quality >= 90.0)
+return (100.0 - quality) * 0.10;
+else if (quality >= 30.0)
+return 0.1 + (100.0 - quality) * 0.09;
+
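
The continuity claim in the doc comment can be checked at the one breakpoint
visible in the excerpt above. The sketch below copies only the branches shown
(quality >= 30) and is not the full function; the name sketch_distance and
the -1 return for lower qualities are placeholders of this sketch, since the
remainder of the mapping is cut off here.

```c
/* Only the branches visible in the excerpt; qualities below 30 are
 * outside this sketch and are reported as -1. */
static float sketch_distance(float quality)
{
    if (quality >= 100.0f)
        return 0.0f;
    if (quality >= 90.0f)
        return (100.0f - quality) * 0.10f;        /* 90 maps to 1.0 */
    if (quality >= 30.0f)
        return 0.1f + (100.0f - quality) * 0.09f; /* the limit at 90 is also 1.0 */
    return -1.0f;
}
```

Both formulas give a distance of 1.0 at quality 90 (10 * 0.10 and
0.1 + 10 * 0.09), so the two visible segments join without a jump, and the
second segment reaches 0.1 + 70 * 0.09 = 6.4 at quality 30.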

[FFmpeg-devel] [PATCH v11 1/5] avcodec/jpegxl: add Jpeg XL image codec and parser

2022-03-31 Thread Leo Izen

This commit adds support to libavcodec to read and parse
encoded Jpeg XL images. Jpeg XL is intended to be an
extended-life replacement to legacy mjpeg.
---
 MAINTAINERS|   2 +
 libavcodec/Makefile|   1 +
 libavcodec/codec_desc.c|   9 +
 libavcodec/codec_id.h  |   1 +
 libavcodec/jpegxl.h|  43 ++
 libavcodec/jpegxl_parser.c | 951 +
 libavcodec/parsers.c   |   1 +
 libavcodec/version.h   |   2 +-
 8 files changed, 1009 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/jpegxl.h
 create mode 100644 libavcodec/jpegxl_parser.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 76e1332ad8..9ab08bad8e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -188,6 +188,7 @@ Codecs:
   interplayvideo.c  Mike Melanson
   jni*, ffjni*  Matthieu Bouron
   jpeg2000* Nicolas Bertrand
+  jpegxl.h, jpegxl_parser.c Leo Izen
   jvdec.c   Peter Ross
   lcl*.cRoberto Togni, Reimar Doeffinger
   libcelt_dec.c Nicolas George
@@ -617,6 +618,7 @@ Haihao Xiang (haihao) 1F0C 31E8 B4FE F7A4 4DC1 DC99 E0F5 76D4 76FC 437F
 Jaikrishnan Menon 61A1 F09F 01C9 2D45 78E1 C862 25DC 8831 AF70 D368
 James Almer   7751 2E8C FD94 A169 57E6 9A7A 1463 01AD 7376 59E0
 Jean Delvare  7CA6 9F44 60F1 BDC4 1FD2 C858 A552 6B9B B3CD 4E6A
+Leo Izen (thebombzen) B6FD 3CFC 7ACF 83FC 9137 6945 5A71 C331 FD2F A19A
 Loren Merritt ABD9 08F4 C920 3F65 D8BE 35D7 1540 DAA7 060F 56DE
 Lynne FE50 139C 6805 72CA FD52 1F8D A2FE A5F0 3F03 4464
 Michael Niedermayer   9FF2 128B 147E F673 0BAD F133 611E C787 040B 0FAB
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index fb8b0e824b..3723601b3d 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -44,6 +44,7 @@ OBJS = ac3_parser.o \
dv_profile.o \
encode.o \
imgconvert.o \
+   jpegxl_parser.o  \
jni.o\
mathtables.o \
mediacodec.o \
diff --git a/libavcodec/codec_desc.c b/libavcodec/codec_desc.c
index 81f3b3c640..1b82870aaa 100644
--- a/libavcodec/codec_desc.c
+++ b/libavcodec/codec_desc.c
@@ -1863,6 +1863,15 @@ static const AVCodecDescriptor codec_descriptors[] = {
 .long_name = NULL_IF_CONFIG_SMALL("GEM Raster image"),
 .props = AV_CODEC_PROP_LOSSY,
 },
+{
+.id= AV_CODEC_ID_JPEGXL,
+.type  = AVMEDIA_TYPE_VIDEO,
+.name  = "jpegxl",
+.long_name = NULL_IF_CONFIG_SMALL("JPEG XL"),
+.props = AV_CODEC_PROP_INTRA_ONLY | AV_CODEC_PROP_LOSSY |
+ AV_CODEC_PROP_LOSSLESS,
+.mime_types= MT("image/jxl"),
+},
 
 /* various PCM "codecs" */
 {
diff --git a/libavcodec/codec_id.h b/libavcodec/codec_id.h
index 3ffb9bd22e..dbc4f3a208 100644
--- a/libavcodec/codec_id.h
+++ b/libavcodec/codec_id.h
@@ -308,6 +308,7 @@ enum AVCodecID {
 AV_CODEC_ID_SIMBIOSIS_IMX,
 AV_CODEC_ID_SGA_VIDEO,
 AV_CODEC_ID_GEM,
+AV_CODEC_ID_JPEGXL,
 
 /* various PCM "codecs" */
 AV_CODEC_ID_FIRST_AUDIO = 0x1, ///< A dummy id pointing at the start of audio codecs
diff --git a/libavcodec/jpegxl.h b/libavcodec/jpegxl.h
new file mode 100644
index 00..a0f266c4ff
--- /dev/null
+++ b/libavcodec/jpegxl.h
@@ -0,0 +1,43 @@
+/*
+ * JPEG XL header
+ * Copyright (c) 2021 Leo Izen 
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * JPEG XL header
+ */
+
+#ifndef AVCODEC_JPEGXL_H
+#define AVCODEC_JPEGXL_H
+
+#include 
+
+/* these are also used in avformat/img2dec.c */
+#define FF_JPEGXL_CODESTREAM_SIGNATURE_LE 0x0aff
+#define

[FFmpeg-devel] [PATCH v11 0/5] Jpeg XL Patch Set

2022-03-31 Thread Leo Izen

This patchset adds the Jpeg XL Image format and a parser for this format,
as well as a decoder and encoder for it based on the external reference
implementation library, libjxl.

Changes:
v11:
 - Fix regression I introduced in v10 with skipping boxes
v10:
 - Make changes requested by Andreas Rheinhardt from v9
v9:
 - v8 with a typo fix
v8:
 - v7, but with stylistic changes as requested by Lynne and others on IRC
v7:
 - Fully implement the parser and test it against the conformance samples

Leo Izen (5):
  avcodec/jpegxl: add Jpeg XL image codec and parser
  avcodec/libjxl: add Jpeg XL decoding via libjxl
  avcodec/libjxl: add Jpeg XL encoding via libjxl
  avformat/image2: add Jpeg XL as image2 format
  fate/jpegxl: add Jpeg XL demux and parse FATE test

 MAINTAINERS |   3 +
 configure   |   6 +
 doc/general_contents.texi   |   7 +
 libavcodec/Makefile |   3 +
 libavcodec/allcodecs.c  |   2 +
 libavcodec/codec_desc.c |   9 +
 libavcodec/codec_id.h   |   1 +
 libavcodec/jpegxl.h |  43 ++
 libavcodec/jpegxl_parser.c  | 951 
 libavcodec/libjxl.c |  70 ++
 libavcodec/libjxl.h |  48 ++
 libavcodec/libjxldec.c  | 301 +
 libavcodec/libjxlenc.c  | 379 +++
 libavcodec/parsers.c|   1 +
 libavcodec/version.h|   2 +-
 libavformat/allformats.c|   1 +
 libavformat/img2.c  |   1 +
 libavformat/img2dec.c   |  21 +
 libavformat/img2enc.c   |   6 +-
 libavformat/mov.c   |   1 +
 libavformat/version.h   |   4 +-
 tests/fate/image.mak|  10 +
 tests/ref/fate/jxl-parse-codestream |   6 +
 tests/ref/fate/jxl-parse-container  |   6 +
 24 files changed, 1876 insertions(+), 6 deletions(-)
 create mode 100644 libavcodec/jpegxl.h
 create mode 100644 libavcodec/jpegxl_parser.c
 create mode 100644 libavcodec/libjxl.c
 create mode 100644 libavcodec/libjxl.h
 create mode 100644 libavcodec/libjxldec.c
 create mode 100644 libavcodec/libjxlenc.c
 create mode 100644 tests/ref/fate/jxl-parse-codestream
 create mode 100644 tests/ref/fate/jxl-parse-container

-- 
2.35.1


[FFmpeg-devel] [PATCH v11 5/5] fate/jpegxl: add Jpeg XL demux and parse FATE test

2022-03-31 Thread Leo Izen

Add a fate test for the Jpeg XL parser in libavcodec and
its image2 wrapper inside libavformat.
---
 tests/fate/image.mak| 10 ++
 tests/ref/fate/jxl-parse-codestream |  6 ++
 tests/ref/fate/jxl-parse-container  |  6 ++
 3 files changed, 22 insertions(+)
 create mode 100644 tests/ref/fate/jxl-parse-codestream
 create mode 100644 tests/ref/fate/jxl-parse-container

diff --git a/tests/fate/image.mak b/tests/fate/image.mak
index 573d398915..15b6145c58 100644
--- a/tests/fate/image.mak
+++ b/tests/fate/image.mak
@@ -357,6 +357,16 @@ FATE_JPEGLS-$(call DEMDEC, IMAGE2, JPEGLS) += $(FATE_JPEGLS)
 FATE_IMAGE += $(FATE_JPEGLS-yes)
 fate-jpegls: $(FATE_JPEGLS-yes)
 
+FATE_JPEGXL += fate-jxl-parse-codestream
+fate-jxl-parse-codestream: CMD = framecrc -i $(TARGET_SAMPLES)/jxl/belgium.jxl -c:v copy
+
+FATE_JPEGXL += fate-jxl-parse-container
+fate-jxl-parse-container: CMD = framecrc -i $(TARGET_SAMPLES)/jxl/lenna-256.jxl -c:v copy
+
+FATE_JPEGXL-$(call DEMDEC, IMAGE2) += $(FATE_JPEGXL)
+FATE_IMAGE += $(FATE_JPEGXL-yes)
+fate-jxl: $(FATE_JPEGXL-yes)
+
 FATE_IMAGE-$(call DEMDEC, IMAGE2, QDRAW) += fate-pict
 fate-pict: CMD = framecrc -i $(TARGET_SAMPLES)/quickdraw/TRU256.PCT -pix_fmt rgb24
 
diff --git a/tests/ref/fate/jxl-parse-codestream b/tests/ref/fate/jxl-parse-codestream
new file mode 100644
index 00..b2fe5035ac
--- /dev/null
+++ b/tests/ref/fate/jxl-parse-codestream
@@ -0,0 +1,6 @@
+#tb 0: 1/25
+#media_type 0: video
+#codec_id 0: jpegxl
+#dimensions 0: 768x512
+#sar 0: 0/1
+0,  0,  0,1,   32, 0xa2930a20
diff --git a/tests/ref/fate/jxl-parse-container b/tests/ref/fate/jxl-parse-container
new file mode 100644
index 00..99233d612a
--- /dev/null
+++ b/tests/ref/fate/jxl-parse-container
@@ -0,0 +1,6 @@
+#tb 0: 1/25
+#media_type 0: video
+#codec_id 0: jpegxl
+#dimensions 0: 256x256
+#sar 0: 0/1
+0,  0,  0,1, 8088, 0xbbfea9bd
-- 
2.35.1


[FFmpeg-devel] [PATCH v11 4/5] avformat/image2: add Jpeg XL as image2 format

2022-03-31 Thread Leo Izen

This commit adds support to libavformat for muxing
and demuxing Jpeg XL images as image2 streams.
---
 libavformat/allformats.c |  1 +
 libavformat/img2.c   |  1 +
 libavformat/img2dec.c| 21 +
 libavformat/img2enc.c|  6 +++---
 libavformat/mov.c|  1 +
 libavformat/version.h|  4 ++--
 6 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/libavformat/allformats.c b/libavformat/allformats.c
index 587ad59b3c..941f3643f8 100644
--- a/libavformat/allformats.c
+++ b/libavformat/allformats.c
@@ -510,6 +510,7 @@ extern const AVInputFormat  ff_image_gif_pipe_demuxer;
 extern const AVInputFormat  ff_image_j2k_pipe_demuxer;
 extern const AVInputFormat  ff_image_jpeg_pipe_demuxer;
 extern const AVInputFormat  ff_image_jpegls_pipe_demuxer;
+extern const AVInputFormat  ff_image_jpegxl_pipe_demuxer;
 extern const AVInputFormat  ff_image_pam_pipe_demuxer;
 extern const AVInputFormat  ff_image_pbm_pipe_demuxer;
 extern const AVInputFormat  ff_image_pcx_pipe_demuxer;
diff --git a/libavformat/img2.c b/libavformat/img2.c
index 4153102c92..13b1b997b8 100644
--- a/libavformat/img2.c
+++ b/libavformat/img2.c
@@ -87,6 +87,7 @@ const IdStrMap ff_img_tags[] = {
 { AV_CODEC_ID_GEM,"img"  },
 { AV_CODEC_ID_GEM,"ximg" },
 { AV_CODEC_ID_GEM,"timg" },
+{ AV_CODEC_ID_JPEGXL, "jxl"  },
 { AV_CODEC_ID_NONE,   NULL   }
 };
 
diff --git a/libavformat/img2dec.c b/libavformat/img2dec.c
index b9c06c5b54..32cadacb9d 100644
--- a/libavformat/img2dec.c
+++ b/libavformat/img2dec.c
@@ -32,6 +32,7 @@
 #include "libavutil/parseutils.h"
 #include "libavutil/intreadwrite.h"
 #include "libavcodec/gif.h"
+#include "libavcodec/jpegxl.h"
 #include "avformat.h"
 #include "avio_internal.h"
 #include "internal.h"
@@ -836,6 +837,25 @@ static int jpegls_probe(const AVProbeData *p)
 return 0;
 }
 
+static int jpegxl_probe(const AVProbeData *p)
+{
+const uint8_t *b = p->buf;
+
+/* ISOBMFF-based container */
+/* 0x4a584c20 == "JXL " */
+if (AV_RL64(b) == FF_JPEGXL_CONTAINER_SIGNATURE_LE)
+return AVPROBE_SCORE_EXTENSION + 1;
+#if CONFIG_JPEGXL_PARSER
+/* Raw codestreams all start with 0xff0a */
+if (AV_RL16(b) != FF_JPEGXL_CODESTREAM_SIGNATURE_LE)
+return 0;
+if (avpriv_jpegxl_verify_codestream_header(NULL, p->buf, p->buf_size) == 0)
+return AVPROBE_SCORE_MAX - 2;
+#endif
+return 0;
+}
+
+
 static int pcx_probe(const AVProbeData *p)
 {
 const uint8_t *b = p->buf;
@@ -1165,6 +1185,7 @@ IMAGEAUTO_DEMUXER(gif,   GIF)
 IMAGEAUTO_DEMUXER_EXT(j2k,   JPEG2000, J2K)
 IMAGEAUTO_DEMUXER_EXT(jpeg,  MJPEG, JPEG)
 IMAGEAUTO_DEMUXER(jpegls,JPEGLS)
+IMAGEAUTO_DEMUXER(jpegxl,JPEGXL)
 IMAGEAUTO_DEMUXER(pam,   PAM)
 IMAGEAUTO_DEMUXER(pbm,   PBM)
 IMAGEAUTO_DEMUXER(pcx,   PCX)
diff --git a/libavformat/img2enc.c b/libavformat/img2enc.c
index 9b3b8741c8..e6ec6a50aa 100644
--- a/libavformat/img2enc.c
+++ b/libavformat/img2enc.c
@@ -263,9 +263,9 @@ static const AVClass img2mux_class = {
 const AVOutputFormat ff_image2_muxer = {
 .name   = "image2",
 .long_name  = NULL_IF_CONFIG_SMALL("image2 sequence"),
-.extensions = "bmp,dpx,exr,jls,jpeg,jpg,ljpg,pam,pbm,pcx,pfm,pgm,pgmyuv,png,"
-  "ppm,sgi,tga,tif,tiff,jp2,j2c,j2k,xwd,sun,ras,rs,im1,im8,im24,"
-  "sunras,xbm,xface,pix,y",
+.extensions = "bmp,dpx,exr,jls,jpeg,jpg,jxl,ljpg,pam,pbm,pcx,pfm,pgm,pgmyuv,"
+  "png,ppm,sgi,tga,tif,tiff,jp2,j2c,j2k,xwd,sun,ras,rs,im1,im8,"
+  "im24,sunras,xbm,xface,pix,y",
 .priv_data_size = sizeof(VideoMuxData),
 .video_codec= AV_CODEC_ID_MJPEG,
 .write_header   = write_header,
diff --git a/libavformat/mov.c b/libavformat/mov.c
index 6c847de164..c4b8873b0a 100644
--- a/libavformat/mov.c
+++ b/libavformat/mov.c
@@ -7697,6 +7697,7 @@ static int mov_probe(const AVProbeData *p)
 if (tag == MKTAG('f','t','y','p') &&
(   AV_RL32(p->buf + offset + 8) == MKTAG('j','p','2',' ')
 || AV_RL32(p->buf + offset + 8) == MKTAG('j','p','x',' ')
+|| AV_RL32(p->buf + offset + 8) == MKTAG('j','x','l',' ')
 )) {
 score = FFMAX(score, 5);
 } else {
diff --git a/libavformat/version.h b/libavformat/version.h
index f4a26c2870..683184d5da 100644
--- a/libavformat/version.h
+++ b/libavformat/version.h
@@ -31,8 +31,8 @@
 
 #include "version_major.h"
 
-#define LIBAVFORMAT_VERSION_MINOR  20
-#define LIBAVFORMAT_VERSION_MICRO 101
+#define LIBAVFORMAT_VERSION_MINOR  21
+#define LIBAVFORMAT_VERSION_MICRO 100
 
 #define LIBAVFORMAT_VERSION_INT AV_VERSION_INT(LIBAVFORMAT_VERSION_MAJOR, \
LIBAVFORMAT_VERSION_MINOR, \
-- 
2.35.1
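
For readers following the probe logic in img2dec.c above, here is a minimal
standalone sketch of the two signature checks. It is not the patch's code:
the function name and return codes are made up for illustration, it compares
the full 12-byte container signature box rather than the first 8 bytes that
AV_RL64 reads, and it skips the codestream header verification that the
patch delegates to avpriv_jpegxl_verify_codestream_header().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 2 for the ISOBMFF-style container signature, 1 for a raw
 * codestream marker, 0 otherwise. These codes are this sketch's own. */
static int sketch_jxl_kind(const uint8_t *buf, size_t size)
{
    /* Signature box: size 12, type "JXL ", payload 0d 0a 87 0a. */
    static const uint8_t container_sig[12] = {
        0x00, 0x00, 0x00, 0x0c, 'J', 'X', 'L', ' ', 0x0d, 0x0a, 0x87, 0x0a
    };

    if (size >= sizeof(container_sig) &&
        !memcmp(buf, container_sig, sizeof(container_sig)))
        return 2;
    /* Raw codestreams begin with bytes ff 0a (0x0aff when read as a
     * little-endian 16-bit value). */
    if (size >= 2 && buf[0] == 0xff && buf[1] == 0x0a)
        return 1;
    return 0;
}
```

Two bytes are a weak signature on their own, which is presumably why the
patch only returns a high probe score after the header also verifies.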


[FFmpeg-devel] [PATCH v11 2/5] avcodec/libjxl: add Jpeg XL decoding via libjxl

2022-03-31 Thread Leo Izen

This commit adds decoding support to libavcodec
for Jpeg XL images via the external library libjxl.
---
 MAINTAINERS   |   1 +
 configure |   5 +
 doc/general_contents.texi |   7 +
 libavcodec/Makefile   |   1 +
 libavcodec/allcodecs.c|   1 +
 libavcodec/libjxl.c   |  70 +
 libavcodec/libjxl.h   |  48 ++
 libavcodec/libjxldec.c| 301 ++
 8 files changed, 434 insertions(+)
 create mode 100644 libavcodec/libjxl.c
 create mode 100644 libavcodec/libjxl.h
 create mode 100644 libavcodec/libjxldec.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 9ab08bad8e..fd79234d23 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -195,6 +195,7 @@ Codecs:
   libcodec2.c   Tomas Härdin
   libdirac* David Conrad
   libdavs2.cHuiwen Ren
+  libjxl*.c, libjxl.h   Leo Izen
   libgsm.c  Michel Bardiaux
   libkvazaar.c  Arttu Ylä-Outinen
   libopenh264enc.c  Martin Storsjo, Linjie Fu
diff --git a/configure b/configure
index e4d36aa639..969b13eba3 100755
--- a/configure
+++ b/configure
@@ -240,6 +240,7 @@ External library support:
   --enable-libiec61883 enable iec61883 via libiec61883 [no]
   --enable-libilbc enable iLBC de/encoding via libilbc [no]
   --enable-libjack enable JACK audio sound server [no]
+  --enable-libjxl  enable JPEG XL decoding via libjxl [no]
   --enable-libklvanc   enable Kernel Labs VANC processing [no]
   --enable-libkvazaar  enable HEVC encoding via libkvazaar [no]
   --enable-liblensfun  enable lensfun lens correction [no]
@@ -1833,6 +1834,7 @@ EXTERNAL_LIBRARY_LIST="
 libiec61883
 libilbc
 libjack
+libjxl
 libklvanc
 libkvazaar
 libmodplug
@@ -3329,6 +3331,7 @@ libgsm_ms_decoder_deps="libgsm"
 libgsm_ms_encoder_deps="libgsm"
 libilbc_decoder_deps="libilbc"
 libilbc_encoder_deps="libilbc"
+libjxl_decoder_deps="libjxl libjxl_threads"
 libkvazaar_encoder_deps="libkvazaar"
 libmodplug_demuxer_deps="libmodplug"
 libmp3lame_encoder_deps="libmp3lame"
@@ -6541,6 +6544,8 @@ enabled libgsm&& { for gsm_hdr in "gsm.h" "gsm/gsm.h"; do
check_lib libgsm "${gsm_hdr}" gsm_create -lgsm && break;
done || die "ERROR: libgsm not found"; }
 enabled libilbc   && require libilbc ilbc.h WebRtcIlbcfix_InitDecode -lilbc $pthreads_extralibs
+enabled libjxl&& require_pkg_config libjxl "libjxl >= 0.7.0" jxl/decode.h JxlDecoderVersion &&
+ require_pkg_config libjxl_threads "libjxl_threads >= 0.7.0" jxl/thread_parallel_runner.h JxlThreadParallelRunner
 enabled libklvanc && require libklvanc libklvanc/vanc.h klvanc_context_create -lklvanc
 enabled libkvazaar&& require_pkg_config libkvazaar "kvazaar >= 0.8.1" kvazaar.h kvz_api_get
 enabled liblensfun&& require_pkg_config liblensfun lensfun lensfun.h lf_db_new
diff --git a/doc/general_contents.texi b/doc/general_contents.texi
index fcd9da1b34..a893347fbe 100644
--- a/doc/general_contents.texi
+++ b/doc/general_contents.texi
@@ -171,6 +171,13 @@ Go to @url{https://github.com/TimothyGu/libilbc} and follow the instructions for
 installing the library. Then pass @code{--enable-libilbc} to configure to
 enable it.
 
+@section libjxl
+
+JPEG XL is an image format intended to fully replace legacy JPEG for an extended
+period of life. See @url{https://jpegxl.info/} for more information, and see
+@url{https://github.com/libjxl/libjxl} for the library source. You can pass
+@code{--enable-libjxl} to configure in order to enable the libjxl wrapper.
+
 @section libvpx
 
 FFmpeg can make use of the libvpx library for VP8/VP9 decoding and encoding.
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 3723601b3d..c00b0d3246 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -1060,6 +1060,7 @@ OBJS-$(CONFIG_LIBGSM_MS_DECODER)  += libgsmdec.o
 OBJS-$(CONFIG_LIBGSM_MS_ENCODER)  += libgsmenc.o
 OBJS-$(CONFIG_LIBILBC_DECODER)+= libilbc.o
 OBJS-$(CONFIG_LIBILBC_ENCODER)+= libilbc.o
+OBJS-$(CONFIG_LIBJXL_DECODER) += libjxldec.o libjxl.o
 OBJS-$(CONFIG_LIBKVAZAAR_ENCODER) += libkvazaar.o
 OBJS-$(CONFIG_LIBMP3LAME_ENCODER) += libmp3lame.o
 OBJS-$(CONFIG_LIBOPENCORE_AMRNB_DECODER)  += libopencore-amr.o
diff --git a/libavcodec/allcodecs.c b/libavcodec/allcodecs.c
index 22d56760ec..a9cd69dfce 100644
--- a/libavcodec/allcodecs.c
+++ b/libavcodec/allcodecs.c
@@ -749,6 +749,7 @@ extern const FFCodec ff_libgsm_ms_encoder;
 extern const FFCodec ff_libgsm_ms_decoder;
 extern const FFCodec ff_libilbc_encoder;
 extern const FFCodec ff_libilbc_decoder;
+extern const FFCodec ff_libjxl_decoder;
 extern const FFCodec ff_libmp3lame_encoder;
 extern const FFCodec

[FFmpeg-devel] [PATCH v11 1/1] avformat: Add IPFS protocol support.

2022-03-31 Thread Mark Gaiser

This patch adds support for:
- ffplay ipfs://
- ffplay ipns://

IPFS data can be played from so called "ipfs gateways".
A gateway is essentially a webserver that gives access to the
distributed IPFS network.

This protocol support (ipfs and ipns) therefore translates
ipfs:// and ipns:// to a http:// url. This resulting url is
then handled by the http protocol. It could also be https
depending on the gateway provided.

To use this protocol, a gateway must be provided.
If you do nothing it will try to find it in your
$HOME/.ipfs/gateway file. The ways to set it manually are:
1. Define a -gateway  to the gateway.
2. Define $IPFS_GATEWAY with the full http link to the gateway.
3. Define $IPFS_PATH and point it to the IPFS data path.
4. Have IPFS running in your local user folder (under $HOME/.ipfs).
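
For illustration only, the lookup described above could be sketched in C as
follows. This is not the code from ipfsgateway.c. The names choose_gateway,
gw_source and join are made up, treating the numbered list as a priority order
is an assumption, the file name "gateway" inside $IPFS_PATH is inferred from
the $HOME/.ipfs/gateway case, and the https:// scheme for the dweb.link
fallback is assumed as well.

```c
#include <stdio.h>

/* Hypothetical result codes: where the gateway candidate came from. */
enum gw_source { GW_ERROR = -1, GW_OPTION, GW_ENV, GW_FILE, GW_FALLBACK };

/* Concatenate base and suffix into out; returns 1 if it fit, 0 otherwise. */
static int join(char *out, size_t out_size, const char *base, const char *suffix)
{
    int ret = snprintf(out, out_size, "%s%s", base, suffix);
    return ret >= 0 && (size_t)ret < out_size;
}

/* Pick the first available source, in the (assumed) order of the list.
 * For GW_FILE, "out" receives the path of the file that would have to be
 * read; for every other source it receives the gateway URL itself. */
static enum gw_source choose_gateway(const char *opt, const char *env_gateway,
                                     const char *ipfs_path, const char *home,
                                     char *out, size_t out_size)
{
    enum gw_source src;
    int ok;

    if (opt && *opt) {                          /* 1. -gateway option   */
        src = GW_OPTION;
        ok  = join(out, out_size, opt, "");
    } else if (env_gateway && *env_gateway) {   /* 2. $IPFS_GATEWAY     */
        src = GW_ENV;
        ok  = join(out, out_size, env_gateway, "");
    } else if (ipfs_path && *ipfs_path) {       /* 3. $IPFS_PATH        */
        src = GW_FILE;
        ok  = join(out, out_size, ipfs_path, "/gateway");
    } else if (home && *home) {                 /* 4. $HOME/.ipfs       */
        src = GW_FILE;
        ok  = join(out, out_size, home, "/.ipfs/gateway");
    } else {                                    /* assumed fallback     */
        src = GW_FALLBACK;
        ok  = join(out, out_size, "https://dweb.link", "");
    }
    return ok ? src : GW_ERROR;
}
```

A caller would pass the option value together with getenv("IPFS_GATEWAY"),
getenv("IPFS_PATH") and getenv("HOME"), and would still have to read and
validate the file when GW_FILE is returned.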

Signed-off-by: Mark Gaiser 
---
 configure |   2 +
 doc/protocols.texi|  30 
 libavformat/Makefile  |   2 +
 libavformat/ipfsgateway.c | 328 ++
 libavformat/protocols.c   |   2 +
 5 files changed, 364 insertions(+)
 create mode 100644 libavformat/ipfsgateway.c

diff --git a/configure b/configure
index e4d36aa639..55af90957a 100755
--- a/configure
+++ b/configure
@@ -3579,6 +3579,8 @@ udp_protocol_select="network"
 udplite_protocol_select="network"
 unix_protocol_deps="sys_un_h"
 unix_protocol_select="network"
+ipfs_protocol_select="https_protocol"
+ipns_protocol_select="https_protocol"
 
 # external library protocols
 libamqp_protocol_deps="librabbitmq"
diff --git a/doc/protocols.texi b/doc/protocols.texi
index d207df0b52..7c9c0a4808 100644
--- a/doc/protocols.texi
+++ b/doc/protocols.texi
@@ -2025,5 +2025,35 @@ decoding errors.
 
 @end table
 
+@section ipfs
+
+InterPlanetary File System (IPFS) protocol support. One can access files 
stored 
+on the IPFS network through so called gateways. Those are http(s) endpoints.
+This protocol wraps the IPFS native protocols (ipfs:// and ipns://) to be sent 
+to such a gateway. Users can (and should) host their own node which means this 
+protocol will use your local machine gateway to access files on the IPFS 
network.
+
+If a user doesn't have a node of their own then the public gateway dweb.link 
is 
+used by default.
+
+You can use this protocol in 2 ways. Using IPFS:
+@example
+ffplay ipfs://QmbGtJg23skhvFmu9mJiePVByhfzu5rwo74MEkVDYAmF5T
+@end example
+
+Or the IPNS protocol (IPNS is mutable IPFS):
+@example
+ffplay ipns://QmbGtJg23skhvFmu9mJiePVByhfzu5rwo74MEkVDYAmF5T
+@end example
+
+You can also change the gateway to be used:
+
+@table @option
+
+@item gateway
+Defines the gateway to use. When nothing is provided the protocol will first 
try 
+your local gateway. If that fails dweb.link will be used.
+
+@end table
 
 @c man end PROTOCOLS
diff --git a/libavformat/Makefile b/libavformat/Makefile
index d7182d6bd8..e3233fd7ac 100644
--- a/libavformat/Makefile
+++ b/libavformat/Makefile
@@ -660,6 +660,8 @@ OBJS-$(CONFIG_SRTP_PROTOCOL) += srtpproto.o 
srtp.o
 OBJS-$(CONFIG_SUBFILE_PROTOCOL)  += subfile.o
 OBJS-$(CONFIG_TEE_PROTOCOL)  += teeproto.o tee_common.o
 OBJS-$(CONFIG_TCP_PROTOCOL)  += tcp.o
+OBJS-$(CONFIG_IPFS_PROTOCOL) += ipfsgateway.o
+OBJS-$(CONFIG_IPNS_PROTOCOL) += ipfsgateway.o
 TLS-OBJS-$(CONFIG_GNUTLS)+= tls_gnutls.o
 TLS-OBJS-$(CONFIG_LIBTLS)+= tls_libtls.o
 TLS-OBJS-$(CONFIG_MBEDTLS)   += tls_mbedtls.o
diff --git a/libavformat/ipfsgateway.c b/libavformat/ipfsgateway.c
new file mode 100644
index 00..725cc5e474
--- /dev/null
+++ b/libavformat/ipfsgateway.c
@@ -0,0 +1,328 @@
+/*
+ * IPFS and IPNS protocol support through IPFS Gateway.
+ * Copyright (c) 2022 Mark Gaiser
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/avstring.h"
+#include "libavutil/opt.h"
+#include "url.h"
+#include 
+
+typedef struct IPFSGatewayContext {
+AVClass *class;
+URLContext *inner;
+// Is filled by the -gateway argument and not changed after.
+char *gateway;
+// If the above gateway is non null, it will be copied into this buffer.
+// Else this buffer will contain the auto detected gateway.
+// In either case, the gateway to use

[FFmpeg-devel] [PATCH v11 0/1] Add IPFS protocol support.

2022-03-31 Thread Mark Gaiser

Hi,

This patch series adds support for IPFS.
V11:
- Cleaned up the headers. What's there is actually needed now.
- Some more strict checking (namely on fgets)
- Merged long log in one log entry.
- Another allocation check (this time for "fulluri")
- Lots of formatting changes (not visual) to be more in line with the soft 
  80 char limit.
V10:
- Removed free on c->gateway in ipfs_close to fix a double free.
V9:
- dweb.link as fallback gateway. This is managed by Protocol Labs (like IPFS).
- Change all errors to warnings as not having a gateway still gives you a 
  working video playback.
- Changed the console output to be more clear.
V8:
- Removed unnecessary change to set the first gateway_buffer character to 0.
  It made no sense as the buffer is always overwritten in the function context.
- Change %li to %zu (it's intended to print the sizeof in all cases)
V7:
- Removed sanitize_ipfs_gateway. Only the http/https check stayed and that's
  now in translate_ipfs_to_http.
- Added a check for ipfs_cid. It's only to show an error if someone happens to
  provide `ffplay ipfs://` without a cid.
- All snprintf usages are now checked.
- Adding a / to a gateway if it didn't end with it is now done in the same line
  that composes the full resulting url.
- And a couple more minor things.
V6:
- Moved the gateway buffer (now called gateway_buffer) to IPFSGatewayContext
- Changed nearly all PATH_MAX uses to sizeof(...) uses for future flexibility
- The rest is relatively minor feedback changes
V5:
- "c->gateway" is now not modified anymore
- Moved most variables to the stack
- Even more strict checks with the auto detection logic
- Errors are now AVERROR :)
- Added more logging and changed some debug ones to info ones as they are 
  valuable to aid debugging as a user when something goes wrong.
V3 (V4):
- V4: title issue from V3..
- A lot of style changes
- Made url checks a lot more strict
- av_asprintf leak fixes
- So many changes that a diff to v2 is again not sensible.
V2:
- Squashed and changed so much that a diff to v1 was not sensible.

The following is a short summary. In the IPFS ecosystem you access its content
by a "Content IDentifier" (CID). This CID is, in simplified terms, a hash of 
the content. IPFS itself is a distributed network where any user can run a node
to be part of the network and access files by their CID. If any reachable node 
within that network has the CID, you can get it.

IPFS (as a technology) has two protocols, ipfs and ipns.
The ipfs protocol is the immutable way to access content.
The ipns protocol is a mutable layer on top of it. It's essentially a new CID 
that points to an ipfs CID. This "pointer", if you will, can be changed to point 
to something else.
Much more information on how this technology works can be found here [1].

This patch series allows interacting natively with IPFS. That means being able
to access files like:
- ffplay ipfs://
- ffplay ipns://

There are multiple ways to access files on the IPFS network. This patch series
uses the gateway driven way. An IPFS node - by default - exposes a local 
gateway (say http://localhost:8080) which is then used to get content from IPFS.
The gateway functionality on the IPFS side contains optimizations to make it
as well suited to streaming data as it can be. The http protocol in ffmpeg has
streaming optimizations of its own, which are reused for free in this approach.

A note on other "more appropriate" ways, as I received some feedback on that.
For ffmpeg purposes the gateway approach is ideal! There is a "libipfs" but
that would spin up an ipfs node with the overhead of:
- bootstrapping
- connecting to nodes
- finding other nodes to connect to
- finally finding your file

This alternative approach could take minutes before a file is played. The
gateway approach connects immediately to an already running node and thus
delivers the file fastest.

Much of the logic in this patch series is to find that gateway and essentially 
rewrite:

"ipfs://"

to:

"http://localhost:8080/ipfs/"

Once that's found it's forwarded to the protocol handler where eventually the
http protocol is going to handle it. Note that it could also be https. There's 
enough flexibility in the implementation to allow the user to provide a 
gateway. There are also public https gateways which can be used just as well.
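
As a rough sketch of that rewrite step (an illustration, not the code from
ipfsgateway.c): the helper name ipfs_uri_to_gateway_url and its signature are
hypothetical. It assumes the gateway is already known, accepts only http(s)
gateways, rejects a URI without a CID, adds the "/" between gateway and path
when it is missing, and checks the snprintf result.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rewrites "ipfs://<cid>" into "<gateway>/ipfs/<cid>"
 * and "ipns://<cid>" into "<gateway>/ipns/<cid>".
 * Returns 0 on success and -1 on any error. */
static int ipfs_uri_to_gateway_url(const char *gateway, const char *uri,
                                   char *out, size_t out_size)
{
    const char *kind, *cid;
    size_t glen;
    int ret;

    if (!strncmp(uri, "ipfs://", 7))
        kind = "ipfs";
    else if (!strncmp(uri, "ipns://", 7))
        kind = "ipns";
    else
        return -1;

    cid = uri + 7;
    if (!*cid)
        return -1;              /* "ipfs://" without a CID */

    /* Only http:// and https:// gateways are accepted. */
    if (strncmp(gateway, "http://", 7) && strncmp(gateway, "https://", 8))
        return -1;

    /* glen is at least 7 here, so gateway[glen - 1] is valid. */
    glen = strlen(gateway);
    ret  = snprintf(out, out_size, "%s%s%s/%s", gateway,
                    gateway[glen - 1] == '/' ? "" : "/", kind, cid);
    if (ret < 0 || (size_t)ret >= out_size)
        return -1;              /* output error or truncated result */
    return 0;
}
```

With the gateway "http://localhost:8080" and the URI "ipfs://abc" this yields
"http://localhost:8080/ipfs/abc"; the resulting string is what would then be
handed to the http(s) protocol.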

After this patch is accepted, I'll work on getting IPFS supported in:
- mpv (requires this ffmpeg patch)
- vlc (prefers this patch but can be made to work without this patch)
- kodi (requires this ffmpeg patch)

Best regards,
Mark Gaiser

[1] https://docs.ipfs.io/concepts/

Mark Gaiser (1):
  avformat: Add IPFS protocol support.

 configure |   2 +
 doc/protocols.texi|  30 
 libavformat/Makefile  |   2 +
 libavformat/ipfsgateway.c | 328 ++
 libavformat/protocols.c   |   2 +
 5 files changed, 364 insertions(+)
 create mode 100644 libavformat/ipfsgateway.c

-- 
2.35.1

Re: [FFmpeg-devel] [PATCH v10 1/1] avformat: Add IPFS protocol support.

2022-03-31 Thread Mark Gaiser

On Fri, Apr 1, 2022 at 1:01 AM Andreas Rheinhardt <
andreas.rheinha...@outlook.com> wrote:

> Mark Gaiser:
> > On Thu, Mar 31, 2022 at 11:44 PM Andreas Rheinhardt <
> > andreas.rheinha...@outlook.com> wrote:
> >
> >> Mark Gaiser:
> >>> On Wed, Mar 30, 2022 at 3:57 PM Andreas Rheinhardt <
> >>> andreas.rheinha...@outlook.com> wrote:
> >>>
>  Mark Gaiser:
> > On Wed, Mar 30, 2022 at 2:21 PM Andreas Rheinhardt <
> > andreas.rheinha...@outlook.com> wrote:
> >
> >> Mark Gaiser:
> >>> This patch adds support for:
> >>> - ffplay ipfs://
> >>> - ffplay ipns://
> >>>
> >>> IPFS data can be played from so called "ipfs gateways".
> >>> A gateway is essentially a webserver that gives access to the
> >>> distributed IPFS network.
> >>>
> >>> This protocol support (ipfs and ipns) therefore translates
> >>> ipfs:// and ipns:// to a http:// url. This resulting url is
> >>> then handled by the http protocol. It could also be https
> >>> depending on the gateway provided.
> >>>
> >>> To use this protocol, a gateway must be provided.
> >>> If you do nothing it will try to find it in your
> >>> $HOME/.ipfs/gateway file. The ways to set it manually are:
> >>> 1. Define a -gateway  to the gateway.
> >>> 2. Define $IPFS_GATEWAY with the full http link to the gateway.
> >>> 3. Define $IPFS_PATH and point it to the IPFS data path.
> >>> 4. Have IPFS running in your local user folder (under $HOME/.ipfs).
> >>>
> >>> Signed-off-by: Mark Gaiser 
> >>> ---
> >>>  configure |   2 +
> >>>  doc/protocols.texi|  30 
> >>>  libavformat/Makefile  |   2 +
> >>>  libavformat/ipfsgateway.c | 309
> >> ++
> >>>  libavformat/protocols.c   |   2 +
> >>>  5 files changed, 345 insertions(+)
> >>>  create mode 100644 libavformat/ipfsgateway.c
> >>>
> >>> diff --git a/configure b/configure
> >>> index e4d36aa639..55af90957a 100755
> >>> --- a/configure
> >>> +++ b/configure
> >>> @@ -3579,6 +3579,8 @@ udp_protocol_select="network"
> >>>  udplite_protocol_select="network"
> >>>  unix_protocol_deps="sys_un_h"
> >>>  unix_protocol_select="network"
> >>> +ipfs_protocol_select="https_protocol"
> >>> +ipns_protocol_select="https_protocol"
> >>>
> >>>  # external library protocols
> >>>  libamqp_protocol_deps="librabbitmq"
> >>> diff --git a/doc/protocols.texi b/doc/protocols.texi
> >>> index d207df0b52..7c9c0a4808 100644
> >>> --- a/doc/protocols.texi
> >>> +++ b/doc/protocols.texi
> >>> @@ -2025,5 +2025,35 @@ decoding errors.
> >>>
> >>>  @end table
> >>>
> >>> +@section ipfs
> >>> +
> >>> +InterPlanetary File System (IPFS) protocol support. One can access
> >> files stored
> >>> +on the IPFS network through so called gateways. Those are http(s)
> >> endpoints.
> >>> +This protocol wraps the IPFS native protocols (ipfs:// and
> ipns://)
> >> to
> >> be send
> >>> +to such a gateway. Users can (and should) host their own node
> which
> >> means this
> >>> +protocol will use your local machine gateway to access files on
> the
> >> IPFS network.
> >>> +
> >>> +If a user doesn't have a node of their own then the public gateway
> >> dweb.link is
> >>> +used by default.
> >>> +
> >>> +You can use this protocol in 2 ways. Using IPFS:
> >>> +@example
> >>> +ffplay ipfs://QmbGtJg23skhvFmu9mJiePVByhfzu5rwo74MEkVDYAmF5T
> >>> +@end example
> >>> +
> >>> +Or the IPNS protocol (IPNS is mutable IPFS):
> >>> +@example
> >>> +ffplay ipns://QmbGtJg23skhvFmu9mJiePVByhfzu5rwo74MEkVDYAmF5T
> >>> +@end example
> >>> +
> >>> +You can also change the gateway to be used:
> >>> +
> >>> +@table @option
> >>> +
> >>> +@item gateway
> >>> +Defines the gateway to use. When nothing is provided the protocol
> >> will
> >> first try
> >>> +your local gateway. If that fails dweb.link will be used.
> >>> +
> >>> +@end table
> >>>
> >>>  @c man end PROTOCOLS
> >>> diff --git a/libavformat/Makefile b/libavformat/Makefile
> >>> index d7182d6bd8..e3233fd7ac 100644
> >>> --- a/libavformat/Makefile
> >>> +++ b/libavformat/Makefile
> >>> @@ -660,6 +660,8 @@ OBJS-$(CONFIG_SRTP_PROTOCOL) +=
> >> srtpproto.o srtp.o
> >>>  OBJS-$(CONFIG_SUBFILE_PROTOCOL)  += subfile.o
> >>>  OBJS-$(CONFIG_TEE_PROTOCOL)  += teeproto.o
> tee_common.o
> >>>  OBJS-$(CONFIG_TCP_PROTOCOL)  += tcp.o
> >>> +OBJS-$(CONFIG_IPFS_PROTOCOL) += ipfsgateway.o
> >>> +OBJS-$(CONFIG_IPNS_PROTOCOL) += ipfsgateway.o
> >>>  TLS-OBJS-$(CONFIG_GNUTLS)+= tls_gnutls.o
> >>>  TLS-OBJS-$(CONFIG_LIBTLS)+= tls_libtls.o
> >>>

Re: [FFmpeg-devel] [PATCH v5 2/2] lavf/mpegenc: fix termination on error conditions

2022-03-31 Thread Andreas Rheinhardt

Nicolas Gaullier:
> Avoid an infinite 'retry' loop in output_packet when flushing.
> 
> Signed-off-by: Nicolas Gaullier 
> ---
>  libavformat/mpegenc.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/libavformat/mpegenc.c b/libavformat/mpegenc.c
> index e0955a7d33..e113a42867 100644
> --- a/libavformat/mpegenc.c
> +++ b/libavformat/mpegenc.c
> @@ -1002,7 +1002,7 @@ static int output_packet(AVFormatContext *ctx, int 
> flush)
>  MpegMuxContext *s = ctx->priv_data;
>  AVStream *st;
>  StreamInfo *stream;
> -int i, avail_space = 0, es_size, trailer_size;
> +int i, has_avail_data = 0, avail_space = 0, es_size, trailer_size;
>  int best_i = -1;
>  int best_score = INT_MIN;
>  int ignore_constraints = 0;
> @@ -1028,6 +1028,7 @@ retry:
>  if (avail_data == 0)
>  continue;
>  av_assert0(avail_data > 0);
> +has_avail_data = 1;
>  
>  if (space < s->packet_size && !ignore_constraints)
>  continue;
> @@ -1048,6 +1049,8 @@ retry:
>  int64_t best_dts = INT64_MAX;
>  int has_premux = 0;
>  
> +if (!has_avail_data)
> +return 0;
>  for (i = 0; i < ctx->nb_streams; i++) {
>  AVStream *st = ctx->streams[i];
>  StreamInfo *stream = st->priv_data;

in case of errors, the context is left in an inconsistent state: The
PacketDesc linked-list claims that there is data in the FIFO although
this is wrong. I always prefer avoiding such scenarios over fixing them
later on. In this case, fixing them would mean growing the FIFO before
allocating the new PacketDesc (if the FIFO needs growing at all). Shall
I do this or do you want to?
(In any case, thanks for reporting this issue.)
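
To illustrate the ordering meant here, a self-contained sketch with made-up
types (Queue and Desc are stand-ins, not the muxer's actual FIFO or PacketDesc
structures): the buffer is grown first, the descriptor is allocated second, and
nothing is linked or counted until both steps have succeeded, so a failure
leaves the list and the fill level consistent.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for a packet descriptor list and its FIFO. */
typedef struct Desc {
    struct Desc *next;
    size_t size;
} Desc;

typedef struct Queue {
    unsigned char *buf;   /* payload storage */
    size_t cap, used;     /* capacity and bytes claimed by descriptors */
    Desc *head, **tail;   /* descriptor list */
} Queue;

static void queue_init(Queue *q)
{
    q->buf  = NULL;
    q->cap  = q->used = 0;
    q->head = NULL;
    q->tail = &q->head;
}

/* Returns 0 on success, -1 on allocation failure. On failure neither the
 * descriptor list nor "used" has been modified. */
static int queue_push(Queue *q, const unsigned char *data, size_t size)
{
    Desc *d;

    /* Step 1: grow the buffer if the payload does not fit. */
    if (q->cap - q->used < size) {
        size_t new_cap = q->used + size;
        unsigned char *tmp = realloc(q->buf, new_cap);
        if (!tmp)
            return -1;
        q->buf = tmp;
        q->cap = new_cap;
    }

    /* Step 2: allocate the descriptor. If this fails, the buffer may be
     * larger than before, but no descriptor claims data that is absent. */
    d = malloc(sizeof(*d));
    if (!d)
        return -1;

    /* Step 3: nothing can fail any more, so commit payload and descriptor. */
    d->next = NULL;
    d->size = size;
    memcpy(q->buf + q->used, data, size);
    q->used += size;
    *q->tail = d;
    q->tail  = &d->next;
    return 0;
}
```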

- Andreas
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH v10 1/1] avformat: Add IPFS protocol support.

2022-03-31 Thread Andreas Rheinhardt

Mark Gaiser:
> On Thu, Mar 31, 2022 at 11:44 PM Andreas Rheinhardt <
> andreas.rheinha...@outlook.com> wrote:
> 
>> Mark Gaiser:
>>> On Wed, Mar 30, 2022 at 3:57 PM Andreas Rheinhardt <
>>> andreas.rheinha...@outlook.com> wrote:
>>>
 Mark Gaiser:
> On Wed, Mar 30, 2022 at 2:21 PM Andreas Rheinhardt <
> andreas.rheinha...@outlook.com> wrote:
>
>> Mark Gaiser:
>>> This patch adds support for:
>>> - ffplay ipfs://
>>> - ffplay ipns://
>>>
>>> IPFS data can be played from so called "ipfs gateways".
>>> A gateway is essentially a webserver that gives access to the
>>> distributed IPFS network.
>>>
>>> This protocol support (ipfs and ipns) therefore translates
>>> ipfs:// and ipns:// to a http:// url. This resulting url is
>>> then handled by the http protocol. It could also be https
>>> depending on the gateway provided.
>>>
>>> To use this protocol, a gateway must be provided.
>>> If you do nothing it will try to find it in your
>>> $HOME/.ipfs/gateway file. The ways to set it manually are:
>>> 1. Define a -gateway  to the gateway.
>>> 2. Define $IPFS_GATEWAY with the full http link to the gateway.
>>> 3. Define $IPFS_PATH and point it to the IPFS data path.
>>> 4. Have IPFS running in your local user folder (under $HOME/.ipfs).
>>>
>>> Signed-off-by: Mark Gaiser 
>>> ---
>>>  configure |   2 +
>>>  doc/protocols.texi|  30 
>>>  libavformat/Makefile  |   2 +
>>>  libavformat/ipfsgateway.c | 309
>> ++
>>>  libavformat/protocols.c   |   2 +
>>>  5 files changed, 345 insertions(+)
>>>  create mode 100644 libavformat/ipfsgateway.c
>>>
>>> diff --git a/configure b/configure
>>> index e4d36aa639..55af90957a 100755
>>> --- a/configure
>>> +++ b/configure
>>> @@ -3579,6 +3579,8 @@ udp_protocol_select="network"
>>>  udplite_protocol_select="network"
>>>  unix_protocol_deps="sys_un_h"
>>>  unix_protocol_select="network"
>>> +ipfs_protocol_select="https_protocol"
>>> +ipns_protocol_select="https_protocol"
>>>
>>>  # external library protocols
>>>  libamqp_protocol_deps="librabbitmq"
>>> diff --git a/doc/protocols.texi b/doc/protocols.texi
>>> index d207df0b52..7c9c0a4808 100644
>>> --- a/doc/protocols.texi
>>> +++ b/doc/protocols.texi
>>> @@ -2025,5 +2025,35 @@ decoding errors.
>>>
>>>  @end table
>>>
>>> +@section ipfs
>>> +
>>> +InterPlanetary File System (IPFS) protocol support. One can access
>> files stored
>>> +on the IPFS network through so called gateways. Those are http(s)
>> endpoints.
>>> +This protocol wraps the IPFS native protocols (ipfs:// and ipns://)
>> to
>> be send
>>> +to such a gateway. Users can (and should) host their own node which
>> means this
>>> +protocol will use your local machine gateway to access files on the
>> IPFS network.
>>> +
>>> +If a user doesn't have a node of their own then the public gateway
>> dweb.link is
>>> +used by default.
>>> +
>>> +You can use this protocol in 2 ways. Using IPFS:
>>> +@example
>>> +ffplay ipfs://QmbGtJg23skhvFmu9mJiePVByhfzu5rwo74MEkVDYAmF5T
>>> +@end example
>>> +
>>> +Or the IPNS protocol (IPNS is mutable IPFS):
>>> +@example
>>> +ffplay ipns://QmbGtJg23skhvFmu9mJiePVByhfzu5rwo74MEkVDYAmF5T
>>> +@end example
>>> +
>>> +You can also change the gateway to be used:
>>> +
>>> +@table @option
>>> +
>>> +@item gateway
>>> +Defines the gateway to use. When nothing is provided the protocol
>> will
>> first try
>>> +your local gateway. If that fails dweb.link will be used.
>>> +
>>> +@end table
>>>
>>>  @c man end PROTOCOLS
>>> diff --git a/libavformat/Makefile b/libavformat/Makefile
>>> index d7182d6bd8..e3233fd7ac 100644
>>> --- a/libavformat/Makefile
>>> +++ b/libavformat/Makefile
>>> @@ -660,6 +660,8 @@ OBJS-$(CONFIG_SRTP_PROTOCOL) +=
>> srtpproto.o srtp.o
>>>  OBJS-$(CONFIG_SUBFILE_PROTOCOL)  += subfile.o
>>>  OBJS-$(CONFIG_TEE_PROTOCOL)  += teeproto.o tee_common.o
>>>  OBJS-$(CONFIG_TCP_PROTOCOL)  += tcp.o
>>> +OBJS-$(CONFIG_IPFS_PROTOCOL) += ipfsgateway.o
>>> +OBJS-$(CONFIG_IPNS_PROTOCOL) += ipfsgateway.o
>>>  TLS-OBJS-$(CONFIG_GNUTLS)+= tls_gnutls.o
>>>  TLS-OBJS-$(CONFIG_LIBTLS)+= tls_libtls.o
>>>  TLS-OBJS-$(CONFIG_MBEDTLS)   += tls_mbedtls.o
>>> diff --git a/libavformat/ipfsgateway.c b/libavformat/ipfsgateway.c
>>> new file mode 100644
>>> index 00..1a039589c0
>>> --- /dev/null
>>> +++ b/libavformat/ipfsgateway.c
>>> @@ -0,0 +1,309 @@
>>> +/*
>>> + * IPFS and IPNS protocol support through IPFS Gateway.

Re: [FFmpeg-devel] [PATCH v10 1/1] avformat: Add IPFS protocol support.

2022-03-31 Thread Mark Gaiser

On Fri, Apr 1, 2022 at 12:17 AM Mark Gaiser  wrote:

> On Thu, Mar 31, 2022 at 11:44 PM Andreas Rheinhardt <
> andreas.rheinha...@outlook.com> wrote:
>
>> Mark Gaiser:
>> > On Wed, Mar 30, 2022 at 3:57 PM Andreas Rheinhardt <
>> > andreas.rheinha...@outlook.com> wrote:
>> >
>> >> Mark Gaiser:
>> >>> On Wed, Mar 30, 2022 at 2:21 PM Andreas Rheinhardt <
>> >>> andreas.rheinha...@outlook.com> wrote:
>> >>>
>>  Mark Gaiser:
>> > This patch adds support for:
>> > - ffplay ipfs://
>> > - ffplay ipns://
>> >
>> > IPFS data can be played from so called "ipfs gateways".
>> > A gateway is essentially a webserver that gives access to the
>> > distributed IPFS network.
>> >
>> > This protocol support (ipfs and ipns) therefore translates
>> > ipfs:// and ipns:// to a http:// url. This resulting url is
>> > then handled by the http protocol. It could also be https
>> > depending on the gateway provided.
>> >
>> > To use this protocol, a gateway must be provided.
>> > If you do nothing it will try to find it in your
>> > $HOME/.ipfs/gateway file. The ways to set it manually are:
>> > 1. Define a -gateway  to the gateway.
>> > 2. Define $IPFS_GATEWAY with the full http link to the gateway.
>> > 3. Define $IPFS_PATH and point it to the IPFS data path.
>> > 4. Have IPFS running in your local user folder (under $HOME/.ipfs).
>> >
>> > Signed-off-by: Mark Gaiser 
>> > ---
>> >  configure |   2 +
>> >  doc/protocols.texi|  30 
>> >  libavformat/Makefile  |   2 +
>> >  libavformat/ipfsgateway.c | 309
>> ++
>> >  libavformat/protocols.c   |   2 +
>> >  5 files changed, 345 insertions(+)
>> >  create mode 100644 libavformat/ipfsgateway.c
>> >
>> > diff --git a/configure b/configure
>> > index e4d36aa639..55af90957a 100755
>> > --- a/configure
>> > +++ b/configure
>> > @@ -3579,6 +3579,8 @@ udp_protocol_select="network"
>> >  udplite_protocol_select="network"
>> >  unix_protocol_deps="sys_un_h"
>> >  unix_protocol_select="network"
>> > +ipfs_protocol_select="https_protocol"
>> > +ipns_protocol_select="https_protocol"
>> >
>> >  # external library protocols
>> >  libamqp_protocol_deps="librabbitmq"
>> > diff --git a/doc/protocols.texi b/doc/protocols.texi
>> > index d207df0b52..7c9c0a4808 100644
>> > --- a/doc/protocols.texi
>> > +++ b/doc/protocols.texi
>> > @@ -2025,5 +2025,35 @@ decoding errors.
>> >
>> >  @end table
>> >
>> > +@section ipfs
>> > +
>> > +InterPlanetary File System (IPFS) protocol support. One can access
>>  files stored
>> > +on the IPFS network through so called gateways. Those are http(s)
>>  endpoints.
>> > +This protocol wraps the IPFS native protocols (ipfs:// and
>> ipns://) to
>>  be send
>> > +to such a gateway. Users can (and should) host their own node which
>>  means this
>> > +protocol will use your local machine gateway to access files on the
>>  IPFS network.
>> > +
>> > +If a user doesn't have a node of their own then the public gateway
>>  dweb.link is
>> > +used by default.
>> > +
>> > +You can use this protocol in 2 ways. Using IPFS:
>> > +@example
>> > +ffplay ipfs://QmbGtJg23skhvFmu9mJiePVByhfzu5rwo74MEkVDYAmF5T
>> > +@end example
>> > +
>> > +Or the IPNS protocol (IPNS is mutable IPFS):
>> > +@example
>> > +ffplay ipns://QmbGtJg23skhvFmu9mJiePVByhfzu5rwo74MEkVDYAmF5T
>> > +@end example
>> > +
>> > +You can also change the gateway to be used:
>> > +
>> > +@table @option
>> > +
>> > +@item gateway
>> > +Defines the gateway to use. When nothing is provided the protocol
>> will
>>  first try
>> > +your local gateway. If that fails dweb.link will be used.
>> > +
>> > +@end table
>> >
>> >  @c man end PROTOCOLS
>> > diff --git a/libavformat/Makefile b/libavformat/Makefile
>> > index d7182d6bd8..e3233fd7ac 100644
>> > --- a/libavformat/Makefile
>> > +++ b/libavformat/Makefile
>> > @@ -660,6 +660,8 @@ OBJS-$(CONFIG_SRTP_PROTOCOL) +=
>>  srtpproto.o srtp.o
>> >  OBJS-$(CONFIG_SUBFILE_PROTOCOL)  += subfile.o
>> >  OBJS-$(CONFIG_TEE_PROTOCOL)  += teeproto.o tee_common.o
>> >  OBJS-$(CONFIG_TCP_PROTOCOL)  += tcp.o
>> > +OBJS-$(CONFIG_IPFS_PROTOCOL) += ipfsgateway.o
>> > +OBJS-$(CONFIG_IPNS_PROTOCOL) += ipfsgateway.o
>> >  TLS-OBJS-$(CONFIG_GNUTLS)+= tls_gnutls.o
>> >  TLS-OBJS-$(CONFIG_LIBTLS)+= tls_libtls.o
>> >  TLS-OBJS-$(CONFIG_MBEDTLS)   += tls_mbedtls.o
>> > diff --git a/libavformat/ipfsgateway.c b/libavformat/ipfsgateway.c
>> > new file mode 100644
>> > index 00..1a039589c0
>> >

Re: [FFmpeg-devel] [PATCH v10 1/1] avformat: Add IPFS protocol support.

2022-03-31 Thread Mark Gaiser

On Thu, Mar 31, 2022 at 11:44 PM Andreas Rheinhardt <
andreas.rheinha...@outlook.com> wrote:

> Mark Gaiser:
> > On Wed, Mar 30, 2022 at 3:57 PM Andreas Rheinhardt <
> > andreas.rheinha...@outlook.com> wrote:
> >
> >> Mark Gaiser:
> >>> On Wed, Mar 30, 2022 at 2:21 PM Andreas Rheinhardt <
> >>> andreas.rheinha...@outlook.com> wrote:
> >>>
>  Mark Gaiser:
> > This patch adds support for:
> > - ffplay ipfs://
> > - ffplay ipns://
> >
> > IPFS data can be played from so called "ipfs gateways".
> > A gateway is essentially a webserver that gives access to the
> > distributed IPFS network.
> >
> > This protocol support (ipfs and ipns) therefore translates
> > ipfs:// and ipns:// to a http:// url. This resulting url is
> > then handled by the http protocol. It could also be https
> > depending on the gateway provided.
> >
> > To use this protocol, a gateway must be provided.
> > If you do nothing it will try to find it in your
> > $HOME/.ipfs/gateway file. The ways to set it manually are:
> > 1. Define a -gateway  to the gateway.
> > 2. Define $IPFS_GATEWAY with the full http link to the gateway.
> > 3. Define $IPFS_PATH and point it to the IPFS data path.
> > 4. Have IPFS running in your local user folder (under $HOME/.ipfs).
> >
> > Signed-off-by: Mark Gaiser 
> > ---
> >  configure |   2 +
> >  doc/protocols.texi|  30 
> >  libavformat/Makefile  |   2 +
> >  libavformat/ipfsgateway.c | 309
> ++
> >  libavformat/protocols.c   |   2 +
> >  5 files changed, 345 insertions(+)
> >  create mode 100644 libavformat/ipfsgateway.c
> >
> > diff --git a/configure b/configure
> > index e4d36aa639..55af90957a 100755
> > --- a/configure
> > +++ b/configure
> > @@ -3579,6 +3579,8 @@ udp_protocol_select="network"
> >  udplite_protocol_select="network"
> >  unix_protocol_deps="sys_un_h"
> >  unix_protocol_select="network"
> > +ipfs_protocol_select="https_protocol"
> > +ipns_protocol_select="https_protocol"
> >
> >  # external library protocols
> >  libamqp_protocol_deps="librabbitmq"
> > diff --git a/doc/protocols.texi b/doc/protocols.texi
> > index d207df0b52..7c9c0a4808 100644
> > --- a/doc/protocols.texi
> > +++ b/doc/protocols.texi
> > @@ -2025,5 +2025,35 @@ decoding errors.
> >
> >  @end table
> >
> > +@section ipfs
> > +
> > +InterPlanetary File System (IPFS) protocol support. One can access
>  files stored
> > +on the IPFS network through so called gateways. Those are http(s)
>  endpoints.
> > +This protocol wraps the IPFS native protocols (ipfs:// and ipns://)
> to
>  be send
> > +to such a gateway. Users can (and should) host their own node which
>  means this
> > +protocol will use your local machine gateway to access files on the
>  IPFS network.
> > +
> > +If a user doesn't have a node of their own then the public gateway
>  dweb.link is
> > +used by default.
> > +
> > +You can use this protocol in 2 ways. Using IPFS:
> > +@example
> > +ffplay ipfs://QmbGtJg23skhvFmu9mJiePVByhfzu5rwo74MEkVDYAmF5T
> > +@end example
> > +
> > +Or the IPNS protocol (IPNS is mutable IPFS):
> > +@example
> > +ffplay ipns://QmbGtJg23skhvFmu9mJiePVByhfzu5rwo74MEkVDYAmF5T
> > +@end example
> > +
> > +You can also change the gateway to be used:
> > +
> > +@table @option
> > +
> > +@item gateway
> > +Defines the gateway to use. When nothing is provided the protocol
> will
>  first try
> > +your local gateway. If that fails dweb.link will be used.
> > +
> > +@end table
> >
> >  @c man end PROTOCOLS
> > diff --git a/libavformat/Makefile b/libavformat/Makefile
> > index d7182d6bd8..e3233fd7ac 100644
> > --- a/libavformat/Makefile
> > +++ b/libavformat/Makefile
> > @@ -660,6 +660,8 @@ OBJS-$(CONFIG_SRTP_PROTOCOL) +=
>  srtpproto.o srtp.o
> >  OBJS-$(CONFIG_SUBFILE_PROTOCOL)  += subfile.o
> >  OBJS-$(CONFIG_TEE_PROTOCOL)  += teeproto.o tee_common.o
> >  OBJS-$(CONFIG_TCP_PROTOCOL)  += tcp.o
> > +OBJS-$(CONFIG_IPFS_PROTOCOL) += ipfsgateway.o
> > +OBJS-$(CONFIG_IPNS_PROTOCOL) += ipfsgateway.o
> >  TLS-OBJS-$(CONFIG_GNUTLS)+= tls_gnutls.o
> >  TLS-OBJS-$(CONFIG_LIBTLS)+= tls_libtls.o
> >  TLS-OBJS-$(CONFIG_MBEDTLS)   += tls_mbedtls.o
> > diff --git a/libavformat/ipfsgateway.c b/libavformat/ipfsgateway.c
> > new file mode 100644
> > index 00..1a039589c0
> > --- /dev/null
> > +++ b/libavformat/ipfsgateway.c
> > @@ -0,0 +1,309 @@
> > +/*
> > + * IPFS and IPNS protocol support through IPFS Gateway.
> > + * Copyright

Re: [FFmpeg-devel] [PATCH v3 00/10] avcodec/vc1: Arm optimisations

2022-03-31 Thread Martin Storsjö


On Thu, 31 Mar 2022, Ben Avison wrote:


The VC1 decoder was missing lots of important fast paths for Arm, especially
for 64-bit Arm. This submission fills in implementations for all functions
where a fast path already existed and the fallback C implementation was
taking 1% or more of the runtime, and adds a new fast path to permit
vc1_unescape_buffer() to be overridden.

I've measured the playback speed on a 1.5 GHz Cortex-A72 (Raspberry Pi 4)
using `ffmpeg -i  -f null -` for a couple of example streams:

Architecture:  AArch32  AArch32  AArch64  AArch64
Stream:        1        2        1        2
Before speed:  1.22x    0.82x    1.00x    0.67x
After speed:   1.31x    0.98x    1.39x    1.06x
Improvement:   7.4%     20%      39%      58%
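
The improvement row appears to follow from the two speed rows as the relative
change in playback speed. A minimal sketch of that arithmetic in C (the
function name is made up for illustration):

```c
/* Relative speed-up in percent, given the playback speed factors
 * (e.g. 1.22 for "1.22x") before and after the change. */
static double improvement_percent(double before, double after)
{
    return (after / before - 1.0) * 100.0;
}
```

For AArch32 stream 1, improvement_percent(1.22, 1.31) comes out at roughly
7.4, which matches the table.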

`make fate` passes on both AArch32 and AArch64.

Changes in v2:

* Refactor checkasm tests to convert some macros into functions.
* Remove cast-to-void of checked_call.
* Limit 16-bit values in idctdsp checkasm test to +/-0x100.
* Reinstate ff_add_pixels_clamped_arm.
* Adapt vc1 deblocking filters to specify stride as ptrdiff_t.
* Add align specifiers to a few VLD/VST instructions for AArch32 deblocking
 filter, and adapt checkasm test not to test with tighter alignment than is
 encountered in normal use.
* Correct unescape buffer memcmp length.
* Update benchmarks for AArch64 idctdsp.


Thanks! From a quick readthrough, this version of the patchset seems good 
to me! I'll run it through some more testing, and push it if everything 
seems to work fine (tomorrow or so).


// Martin

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH v10 1/1] avformat: Add IPFS protocol support.

2022-03-31 Thread Andreas Rheinhardt

Mark Gaiser:
> On Wed, Mar 30, 2022 at 3:57 PM Andreas Rheinhardt <
> andreas.rheinha...@outlook.com> wrote:
> 
>> Mark Gaiser:
>>> On Wed, Mar 30, 2022 at 2:21 PM Andreas Rheinhardt <
>>> andreas.rheinha...@outlook.com> wrote:
>>>
 Mark Gaiser:
> This patch adds support for:
> - ffplay ipfs://
> - ffplay ipns://
>
> IPFS data can be played from so called "ipfs gateways".
> A gateway is essentially a webserver that gives access to the
> distributed IPFS network.
>
> This protocol support (ipfs and ipns) therefore translates
> ipfs:// and ipns:// to a http:// url. This resulting url is
> then handled by the http protocol. It could also be https
> depending on the gateway provided.
>
> To use this protocol, a gateway must be provided.
> If you do nothing it will try to find it in your
> $HOME/.ipfs/gateway file. The ways to set it manually are:
> 1. Define a -gateway  to the gateway.
> 2. Define $IPFS_GATEWAY with the full http link to the gateway.
> 3. Define $IPFS_PATH and point it to the IPFS data path.
> 4. Have IPFS running in your local user folder (under $HOME/.ipfs).
>
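
The translation described above (ipfs:// or ipns:// rewritten to an http(s) URL on a gateway) can be sketched roughly as follows. This is not the code from the patch: the function name and the exact URL layout (`<gateway>/ipfs/<id>`, the usual path-style gateway form) are assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rewrite "ipfs://<id>" or "ipns://<id>" into
 * "<gateway>/ipfs/<id>" or "<gateway>/ipns/<id>".
 * Returns 0 on success, -1 on an unsupported URI or a too-small buffer. */
static int gateway_url(const char *gateway, const char *uri,
                       char *out, size_t out_size)
{
    const char *kind, *id;
    size_t glen = strlen(gateway);
    int n;

    if (!strncmp(uri, "ipfs://", 7))
        kind = "ipfs";
    else if (!strncmp(uri, "ipns://", 7))
        kind = "ipns";
    else
        return -1;
    id = uri + 7;

    /* Avoid a doubled slash if the gateway already ends in one. */
    if (glen && gateway[glen - 1] == '/')
        glen--;

    n = snprintf(out, out_size, "%.*s/%s/%s", (int)glen, gateway, kind, id);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

With the gateway set to https://dweb.link/, an input of ipfs://abc would become https://dweb.link/ipfs/abc, which the http(s) protocol can then open.
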
> Signed-off-by: Mark Gaiser 
> ---
>  configure |   2 +
>  doc/protocols.texi|  30 
>  libavformat/Makefile  |   2 +
>  libavformat/ipfsgateway.c | 309 ++
>  libavformat/protocols.c   |   2 +
>  5 files changed, 345 insertions(+)
>  create mode 100644 libavformat/ipfsgateway.c
>
> diff --git a/configure b/configure
> index e4d36aa639..55af90957a 100755
> --- a/configure
> +++ b/configure
> @@ -3579,6 +3579,8 @@ udp_protocol_select="network"
>  udplite_protocol_select="network"
>  unix_protocol_deps="sys_un_h"
>  unix_protocol_select="network"
> +ipfs_protocol_select="https_protocol"
> +ipns_protocol_select="https_protocol"
>
>  # external library protocols
>  libamqp_protocol_deps="librabbitmq"
> diff --git a/doc/protocols.texi b/doc/protocols.texi
> index d207df0b52..7c9c0a4808 100644
> --- a/doc/protocols.texi
> +++ b/doc/protocols.texi
> @@ -2025,5 +2025,35 @@ decoding errors.
>
>  @end table
>
> +@section ipfs
> +
> +InterPlanetary File System (IPFS) protocol support. One can access files stored
> +on the IPFS network through so called gateways. Those are http(s) endpoints.
> +This protocol wraps the IPFS native protocols (ipfs:// and ipns://) to be send
> +to such a gateway. Users can (and should) host their own node which means this
> +protocol will use your local machine gateway to access files on the IPFS network.
> +
> +If a user doesn't have a node of their own then the public gateway dweb.link is
> +used by default.
> +
> +You can use this protocol in 2 ways. Using IPFS:
> +@example
> +ffplay ipfs://QmbGtJg23skhvFmu9mJiePVByhfzu5rwo74MEkVDYAmF5T
> +@end example
> +
> +Or the IPNS protocol (IPNS is mutable IPFS):
> +@example
> +ffplay ipns://QmbGtJg23skhvFmu9mJiePVByhfzu5rwo74MEkVDYAmF5T
> +@end example
> +
> +You can also change the gateway to be used:
> +
> +@table @option
> +
> +@item gateway
> +Defines the gateway to use. When nothing is provided the protocol will first try
> +your local gateway. If that fails dweb.link will be used.
> +
> +@end table
>
>  @c man end PROTOCOLS
> diff --git a/libavformat/Makefile b/libavformat/Makefile
> index d7182d6bd8..e3233fd7ac 100644
> --- a/libavformat/Makefile
> +++ b/libavformat/Makefile
> @@ -660,6 +660,8 @@ OBJS-$(CONFIG_SRTP_PROTOCOL) += srtpproto.o srtp.o
>  OBJS-$(CONFIG_SUBFILE_PROTOCOL)  += subfile.o
>  OBJS-$(CONFIG_TEE_PROTOCOL)  += teeproto.o tee_common.o
>  OBJS-$(CONFIG_TCP_PROTOCOL)  += tcp.o
> +OBJS-$(CONFIG_IPFS_PROTOCOL) += ipfsgateway.o
> +OBJS-$(CONFIG_IPNS_PROTOCOL) += ipfsgateway.o
>  TLS-OBJS-$(CONFIG_GNUTLS)+= tls_gnutls.o
>  TLS-OBJS-$(CONFIG_LIBTLS)+= tls_libtls.o
>  TLS-OBJS-$(CONFIG_MBEDTLS)   += tls_mbedtls.o
> diff --git a/libavformat/ipfsgateway.c b/libavformat/ipfsgateway.c
> new file mode 100644
> index 0000000000..1a039589c0
> --- /dev/null
> +++ b/libavformat/ipfsgateway.c
> @@ -0,0 +1,309 @@
> +/*
> + * IPFS and IPNS protocol support through IPFS Gateway.
> + * Copyright (c) 2022 Mark Gaiser
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option)

Re: [FFmpeg-devel] [PATCH 08/10] avcodec/idctdsp: Arm 64-bit NEON block add and clamp fast paths

2022-03-31 Thread Martin Storsjö


On Thu, 31 Mar 2022, Ben Avison wrote:


On 30/03/2022 15:14, Martin Storsjö wrote:

On Fri, 25 Mar 2022, Ben Avison wrote:

+// Clamp 16-bit signed block coefficients to signed 8-bit (biased by 128)
+// On entry:
+//   x0 -> array of 64x 16-bit coefficients
+//   x1 -> 8-bit results
+//   x2 = row stride for results, bytes
+function ff_put_signed_pixels_clamped_neon, export=1
+    ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [x0], #64
+    movi    v4.8b, #128
+    ld1 {v16.16b, v17.16b, v18.16b, v19.16b}, [x0]
+    sqxtn   v0.8b, v0.8h
+    sqxtn   v1.8b, v1.8h
+    sqxtn   v2.8b, v2.8h
+    sqxtn   v3.8b, v3.8h
+    sqxtn   v5.8b, v16.8h
+    add v0.8b, v0.8b, v4.8b


Here you could save 4 add instructions with sqxtn2 and adding .16b vectors, 
but I'm not sure if it's worthwhile. (It reduces the checkasm numbers by 0.7 
for Cortex A72, by 0.3 for A73, but increases the runtime by 1.0 on A53.) 
Strangely enough, I get much smaller numbers on my A72 than you got.
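
For readers following the assembly, a scalar sketch of what the quoted routine computes may help: each 16-bit coefficient is saturated to the signed 8-bit range (the sqxtn step) and then biased by 128 (the add of the #128 vector). The function name below is made up for this note, and the code is an illustration rather than FFmpeg's C reference.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar sketch: saturate each 16-bit coefficient to [-128, 127],
 * then bias by 128 so the result fits an unsigned byte. */
static void put_signed_pixels_clamped_ref(const int16_t *block,
                                          uint8_t *pixels, ptrdiff_t stride)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            int v = block[y * 8 + x];

            if (v < -128) v = -128;          /* saturating narrow, as sqxtn does per lane */
            if (v >  127) v =  127;
            pixels[x] = (uint8_t)(v + 128);  /* bias, as the vector add of 128 does */
        }
        pixels += stride;
    }
}
```

The sqxtn2 suggestion does not change this arithmetic; it only packs two narrowed rows into one 128-bit register so that a single .16b add covers both.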


That's weird. As you say, it should be independent of clock-frequency. FWIW, 
I'm benchmarking on a Raspberry Pi 4; I'd assume all its board variants' 
Cortex-A72 cores are of identical revision.


Now I run it again, I'm getting these figures:

idctdsp.add_pixels_clamped_c: 313.3
idctdsp.add_pixels_clamped_neon: 24.3
idctdsp.put_pixels_clamped_c: 220.3
idctdsp.put_pixels_clamped_neon: 15.5
idctdsp.put_signed_pixels_clamped_c: 210.5
idctdsp.put_signed_pixels_clamped_neon: 19.5

which is more in line with what you see! I am getting a lot of variability 
between runs though - from a small sample, I'm seeing add_pixels_clamped_neon 
coming out as anything from 21 to 30, which is well above the sort of 
differences you're seeing between alternate implementations.


That's indeed weird. I don't have a Raspberry Pi 4 myself though, but for 
functions in this size range on the devboards I test on, I get essentially 
perfectly stable numbers each time - which is great for empirically 
testing different implementation strategies.


This sort of case is always going to be difficult to schedule optimally for 
multiple cores - factors like how much dual-issuing is possible, latency 
before values can be used, load speed and the granularity of scoreboarding 
parts of vectors, all vary widely.


Yup, indeed. In most cases, an implementation that is good for one core is 
usually decent for the other ones as well, but sometimes it ends up a 
compromise, where optimizing for one makes things worse for another one. 
As long as the chosen implementation isn't very suboptimal for some common 
cores, it probably doesn't matter much though.


// Martin

Re: [FFmpeg-devel] [PATCH 07/10] avcodec/vc1: Arm 64-bit NEON inverse transform fast paths

2022-03-31 Thread Martin Storsjö


On Thu, 31 Mar 2022, Ben Avison wrote:


On 30/03/2022 14:49, Martin Storsjö wrote:
Looks generally reasonable. Is it possible to factorize out the individual 
transforms (so that you'd e.g. invoke the same macro twice in the 8x8 and 
4x4 functions) without too much loss?


There is a close analogy here with the vertical/horizontal deblocking 
filters, because while there are similarities between the two matrix 
multiplications within a transform, one of them follows a series of loads and 
the other follows a matrix transposition.


If you look for example at ff_vc1_inv_trans_8x8_neon, you'll see I was able 
to do a fair amount of overlap between sections of the function - 
particularly between the transpose and the second matrix multiplication, but 
to a lesser extent between the loads and the first matrix multiplication and 
between the second multiplication and the stores. This sort of overlapping is 
tricky to maintain when using macros. Also, it means the order of 
operations within each matrix multiply ended up quite different.


At first sight, you might think that the multiplies from the 8x8 function 
(which you might also view as kind of 8-tap filter) would be re-usable for 
the size-8 multiplies in the 8x4 or 4x8 function. Yes, the instructions are 
similar, save for using .4h elements rather than .8h elements, but that has 
significant impacts on scheduling. For example, the Cortex-A72, which is my 
primary target, can only do NEON bit-shifts in one pipeline at once, 
irrespective of whether the vectors are 64-bit or 128-bit long, while other 
instructions don't have such restrictions.


So while in theory you could factor some of this code out more, I suspect any 
attempt to do so would have a detrimental effect on performance.


Ok, fair enough. Yes, it's always a trade off between code simplicity and 
getting the optimal interleaving. As you've spent the effort on making it 
efficient with respect to that, let's go with that then!


(FWIW, for future endeavours, having the checkasm tests in place while 
developing/tuning the implementation does allow getting good empirical 
data on how much you gain from different alternative scheduling choices. I 
usually don't follow the optimization guides for any specific core, but 
track the benchmark numbers for a couple different cores and try to pick a 
scheduling that is a decent compromise for all of them.)


Also, for future work - if you have checkasm tests in place while working 
on the assembly, I usually amend the test with debug printouts that 
visualize the output of the reference and the tested function, and a map 
showing which elements differ - which makes tracking down issues a whole 
lot easier. I don't think any of the checkasm tests in ffmpeg have such 
printouts though, but within e.g. the dav1d project, the checkasm tool is 
extended with helpers for comparing and printing such debug aids.
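
A debug printout of the kind described above could look roughly like the sketch below. It is not taken from checkasm in either ffmpeg or dav1d; the function name and output format are assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical debug aid: print a w x h map marking where the tested
 * output differs from the reference ('x') or matches ('.').
 * Returns the number of mismatching elements. */
static int print_diff_map(FILE *f, const uint8_t *ref, const uint8_t *tst,
                          int w, int h, ptrdiff_t stride)
{
    int mismatches = 0;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int differ = ref[y * stride + x] != tst[y * stride + x];

            mismatches += differ;
            fputc(differ ? 'x' : '.', f);
        }
        fputc('\n', f);
    }
    return mismatches;
}
```

Calling this only when the normal comparison fails keeps passing runs quiet, while a failing run shows at a glance whether the errors cluster in a row, a column or a corner of the block.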


// Martin

[FFmpeg-devel] [PATCH v2 1/8] fate/filter-refcmp-*: make refcmp_metadata fail on empty or truncated input

2022-03-31 Thread Marton Balint

On empty input the awk script was always successful which caused the
filter-refcmp tests to always succeed.

Also fix the command lines for refcmp_metadata compare function because it
needs auto conversion filters, and update reference of test
filter-refcmp-psnr-rgb because it was missed in
a7fc78c1a638a32c3695c06f727774c740d675c2 but was never noticed due to the
original issue...

Signed-off-by: Marton Balint 
---
 tests/fate-run.sh |  2 +-
 tests/ref/fate/filter-refcmp-psnr-rgb | 80 +--
 tests/refcmp-metadata.awk |  5 +-
 3 files changed, 45 insertions(+), 42 deletions(-)

diff --git a/tests/fate-run.sh b/tests/fate-run.sh
index fbfc0a925d..5e8d607d88 100755
--- a/tests/fate-run.sh
+++ b/tests/fate-run.sh
@@ -377,7 +377,7 @@ refcmp_metadata(){
 refcmp=$1
 pixfmt=$2
 fuzz=${3:-0.001}
-ffmpeg $FLAGS $ENC_OPTS \
+ffmpeg -auto_conversion_filters $FLAGS $ENC_OPTS \
 -lavfi "testsrc2=size=300x200:rate=1:duration=5,format=${pixfmt},split[ref][tmp];[tmp]avgblur=4[enc];[enc][ref]${refcmp},metadata=print:file=-" \
 -f null /dev/null | awk -v ref=${ref} -v fuzz=${fuzz} -f ${base}/refcmp-metadata.awk -
 }
diff --git a/tests/ref/fate/filter-refcmp-psnr-rgb b/tests/ref/fate/filter-refcmp-psnr-rgb
index f06db575ac..20abd3dc5a 100644
--- a/tests/ref/fate/filter-refcmp-psnr-rgb
+++ b/tests/ref/fate/filter-refcmp-psnr-rgb
@@ -1,45 +1,45 @@
 frame:0    pts:0       pts_time:0
-lavfi.psnr.mse.r=1381.80
-lavfi.psnr.psnr.r=16.73
-lavfi.psnr.mse.g=896.00
-lavfi.psnr.psnr.g=18.61
-lavfi.psnr.mse.b=277.38
-lavfi.psnr.psnr.b=23.70
-lavfi.psnr.mse_avg=851.73
-lavfi.psnr.psnr_avg=18.83
+lavfi.psnr.mse.r=1367.642090
+lavfi.psnr.psnr.r=16.771078
+lavfi.psnr.mse.g=885.804382
+lavfi.psnr.psnr.g=18.657425
+lavfi.psnr.mse.b=274.825073
+lavfi.psnr.psnr.b=23.740240
+lavfi.psnr.mse_avg=842.757202
+lavfi.psnr.psnr_avg=18.873779
 frame:1    pts:1       pts_time:1
-lavfi.psnr.mse.r=1380.37
-lavfi.psnr.psnr.r=16.73
-lavfi.psnr.mse.g=975.91
-lavfi.psnr.psnr.g=18.24
-lavfi.psnr.mse.b=435.72
-lavfi.psnr.psnr.b=21.74
-lavfi.psnr.mse_avg=930.67
-lavfi.psnr.psnr_avg=18.44
+lavfi.psnr.mse.r=1356.681152
+lavfi.psnr.psnr.r=16.806026
+lavfi.psnr.mse.g=958.161560
+lavfi.psnr.psnr.g=18.316416
+lavfi.psnr.mse.b=428.238312
+lavfi.psnr.psnr.b=21.813948
+lavfi.psnr.mse_avg=914.360352
+lavfi.psnr.psnr_avg=18.519630
 frame:2    pts:2       pts_time:2
-lavfi.psnr.mse.r=1403.20
-lavfi.psnr.psnr.r=16.66
-lavfi.psnr.mse.g=954.05
-lavfi.psnr.psnr.g=18.34
-lavfi.psnr.mse.b=494.22
-lavfi.psnr.psnr.b=21.19
-lavfi.psnr.mse_avg=950.49
-lavfi.psnr.psnr_avg=18.35
+lavfi.psnr.mse.r=1387.254883
+lavfi.psnr.psnr.r=16.709242
+lavfi.psnr.mse.g=939.230957
+lavfi.psnr.psnr.g=18.403080
+lavfi.psnr.mse.b=493.913757
+lavfi.psnr.psnr.b=21.194292
+lavfi.psnr.mse_avg=940.133179
+lavfi.psnr.psnr_avg=18.398911
 frame:3    pts:3       pts_time:3
-lavfi.psnr.mse.r=1452.80
-lavfi.psnr.psnr.r=16.51
-lavfi.psnr.mse.g=1001.02
-lavfi.psnr.psnr.g=18.13
-lavfi.psnr.mse.b=557.39
-lavfi.psnr.psnr.b=20.67
-lavfi.psnr.mse_avg=1003.74
-lavfi.psnr.psnr_avg=18.11
+lavfi.psnr.mse.r=1433.291260
+lavfi.psnr.psnr.r=16.567459
+lavfi.psnr.mse.g=990.005859
+lavfi.psnr.psnr.g=18.174425
+lavfi.psnr.mse.b=550.512329
+lavfi.psnr.psnr.b=20.723133
+lavfi.psnr.mse_avg=991.269836
+lavfi.psnr.psnr_avg=18.168884
 frame:4    pts:4       pts_time:4
-lavfi.psnr.mse.r=1401.25
-lavfi.psnr.psnr.r=16.67
-lavfi.psnr.mse.g=1009.80
-lavfi.psnr.psnr.g=18.09
-lavfi.psnr.mse.b=602.42
-lavfi.psnr.psnr.b=20.33
-lavfi.psnr.mse_avg=1004.49
-lavfi.psnr.psnr_avg=18.11
+lavfi.psnr.mse.r=1385.949341
+lavfi.psnr.psnr.r=16.713329
+lavfi.psnr.mse.g=997.065796
+lavfi.psnr.psnr.g=18.143566
+lavfi.psnr.mse.b=601.962952
+lavfi.psnr.psnr.b=20.335106
+lavfi.psnr.mse_avg=994.992676
+lavfi.psnr.psnr_avg=18.152605
diff --git a/tests/refcmp-metadata.awk b/tests/refcmp-metadata.awk
index fa21aad0e0..850aaac5a3 100644
--- a/tests/refcmp-metadata.awk
+++ b/tests/refcmp-metadata.awk
@@ -50,13 +50,16 @@ BEGIN {
 }
 
 END {
+result = result && (NR == ref_nr);
 if (result) {
 for (i = 1; i <= ref_nr; i++)
 print ref_lines[i];
 } else {
 for (i = 1; i <= NR; i++)
 print cmp_lines[i];
-if (NR != ref_nr)
+if (NR == 0)
+print "[refcmp] no input" > "/dev/stderr";
+else if (NR != ref_nr)
 print "[refcmp] lines: " NR " != " ref_nr > "/dev/stderr";
 if (delta_max >= fuzz)
 print "[refcmp] delta_max: " delta_max " >= " fuzz > "/dev/stderr";
-- 
2.31.1

Re: [FFmpeg-devel] [PATCH 05/10] avcodec/vc1: Arm 64-bit NEON deblocking filter fast paths

2022-03-31 Thread Martin Storsjö


On Thu, 31 Mar 2022, Ben Avison wrote:


On 30/03/2022 13:35, Martin Storsjö wrote:

Overall, the code looks sensible to me.

Would it make sense to share the core of the filter between the 
horizontal/vertical cases with e.g. a macro? (I didn't check in detail if 
there's much differences in the core of the filter. At most some 
differences in condition registers for partial writeout in the horizontal 
forms?)


Well, looking at the comments at the right-hand side of the source, which 
give the logical meaning of the results of each instruction, I admit there's 
a resemblance in the middle of the 8-pixel-pair function.


Actually, I didn't try to follow/compare it to that level, I just assumed 
them to be similar.


However, the physical register assignments are quite different, and 
attempting to reassign the registers in one to match the other isn't a 
trivial task. It's hard enough when you start register assignment from 
the top of a function and work your way down, as I have done here.


In the 16-pixel-pair case, the fact that the input values arrive in a 
different order as the result of them, in one case, being loaded in 
regularly-increasing address order, and in the other, falling out of a matrix 
transposition, has resulted in even the logical order of instructions being 
quite different in the two cases.


In the 4-pixel-pair case, the values are packed differently into registers in 
the two cases, because in the v case, we're loading 4 pixels between 
row-strides, which means it's easy to place each row in its own vector, 
whereas in the h case we load 4 rows of 8 pixels each and transpose, which 
leaves the values in 4 vectors rather than 8. Some of the filtering steps can 
be performed with the data packed in this way (calculating a1 and a2) while 
waiting for it to be restructured in order to calculate the other metrics, 
but it's not worth packing the data together in this way in the v case given 
that it starts off already separated. So the two implementations end up quite 
different in the operations they perform, not just the scheduling of 
instructions and in register assignment terms.


Some background: as you may have guessed, I didn't start out writing these 
functions as they currently appear. Prototype versions didn't care much for 
scheduling or keeping to a small number of registers. They were primarily for 
checking the correctness of the mathematics, and they'd use all available 
vectors, sometimes shuffling values between registers or to the stack to make 
room. Once I'd verified correctness, I then reworked them to keep to a 
minimal number of registers and to minimise stalls as far as possible.


I'm targeting the Cortex-A72, since that's what the Raspberry Pi 4 uses and 
it's on the cusp of having enough power to decode VC-1 BluRay streams, so I 
deliberately didn't take too much consideration of the requirements of 
earlier cores. Yes, it's an out-of-order core, but I reckoned there are 
probably limits to how wisely it can select instructions to execute (there 
have got to be limits to instruction queue lengths, for example). So based on 
the pipeline structure documented in Arm's Cortex-A72 software optimization 
guide, I arranged the instructions to best keep all pipelines busy as much as 
possible, then assigned registers to keep the instructions in this order.


For the most part, I was able to keep the number of vectors used low enough 
that no callee-saving was required - or failing that, at least avoiding 
having to spill values to the stack mid-function. But it came pretty close at 
times - witness for example the peculiar order in which vectors had to be 
loaded in the AArch32 version of ff_vc1_h_loop_filter16_neon. There's reason 
behind that!


In short, I'd really rather not tamper with these larger assembly functions 
any more unless I really have to.


Ok, fair enough.

FWIW, my point of view was from implementing the loop filters for VP9 and 
AV1, where I did the core filter as one shared implementation for both 
variants, and where the frontend functions just load (and transpose) data 
into the registers used as input for the common core filter, and vice 
versa.
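
The shared-core structure described above can be illustrated in C with a toy example. The filter maths here is a plain average, chosen only to keep the sketch short; it is not the VC-1, VP9 or AV1 filter, and all names are hypothetical. In real assembly the front ends would load (and transpose) data into registers rather than pass strides.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy core: average the two pixels straddling an edge. `across` is the
 * distance between the two sides of the edge, `along` the distance
 * between successive positions on the edge. */
static void toy_filter_core(uint8_t *p, ptrdiff_t across, ptrdiff_t along,
                            int len)
{
    for (int i = 0; i < len; i++) {
        uint8_t *q = p + i * along;
        int avg = (q[-across] + q[0] + 1) >> 1;

        q[-across] = q[0] = (uint8_t)avg;
    }
}

/* Front end for a horizontal edge: neighbours are one row apart. */
static void toy_v_loop_filter(uint8_t *src, ptrdiff_t stride, int len)
{
    toy_filter_core(src, stride, 1, len);
}

/* Front end for a vertical edge: neighbours are one byte apart. */
static void toy_h_loop_filter(uint8_t *src, ptrdiff_t stride, int len)
{
    toy_filter_core(src, 1, stride, len);
}
```

The trade-off discussed in this thread is visible even here: the shared core is simple to maintain, but it cannot exploit the different data layouts that each direction naturally produces.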


But I presume that a custom implementation for each of them can be more 
optimal, at the cost of more code to maintain (but if there are no bugs, 
it usually doesn't need maintenance either).


Thus - fair enough, this code probably is ok then.

// Martin

[FFmpeg-devel] [PATCH 2/2] avcodec/vp9_raw_reorder_bsf: Merge close and flush

2022-03-31 Thread Andreas Rheinhardt

Also mark the function as av_cold while at it.

Signed-off-by: Andreas Rheinhardt 
---
 libavcodec/vp9_raw_reorder_bsf.c | 16 +++-
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/libavcodec/vp9_raw_reorder_bsf.c b/libavcodec/vp9_raw_reorder_bsf.c
index 368dcb26c2..d36093316c 100644
--- a/libavcodec/vp9_raw_reorder_bsf.c
+++ b/libavcodec/vp9_raw_reorder_bsf.c
@@ -390,7 +390,7 @@ fail:
 return err;
 }
 
-static void vp9_raw_reorder_flush(AVBSFContext *bsf)
+static av_cold void vp9_raw_reorder_flush_close(AVBSFContext *bsf)
 {
 VP9RawReorderContext *ctx = bsf->priv_data;
 
@@ -400,16 +400,6 @@ static void vp9_raw_reorder_flush(AVBSFContext *bsf)
 ctx->sequence = 0;
 }
 
-static void vp9_raw_reorder_close(AVBSFContext *bsf)
-{
-VP9RawReorderContext *ctx = bsf->priv_data;
-int s;
-
-for (s = 0; s < FRAME_SLOTS; s++)
-vp9_raw_reorder_clear_slot(ctx, s);
-vp9_raw_reorder_frame_free(&ctx->next_frame);
-}
-
 static const enum AVCodecID vp9_raw_reorder_codec_ids[] = {
 AV_CODEC_ID_VP9, AV_CODEC_ID_NONE,
 };
@@ -418,7 +408,7 @@ const FFBitStreamFilter ff_vp9_raw_reorder_bsf = {
 .p.name = "vp9_raw_reorder",
 .p.codec_ids= vp9_raw_reorder_codec_ids,
 .priv_data_size = sizeof(VP9RawReorderContext),
-.close  = &vp9_raw_reorder_close,
-.flush  = &vp9_raw_reorder_flush,
 .filter = &vp9_raw_reorder_filter,
+.flush  = &vp9_raw_reorder_flush_close,
+.close  = &vp9_raw_reorder_flush_close,
 };
-- 
2.32.0

[FFmpeg-devel] [PATCH 1/2] avcodec/vp9_raw_reorder_bsf: Fix leak of cached packet

2022-03-31 Thread Andreas Rheinhardt

In case the BSF has not been drained before flushing/closing,
the context's next_frame might be set; yet it is not freed
in flush or close. The former only zeroes it (which automatically
causes a leak in case it was set). So do this when closing
and flushing.
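
The difference between zeroing and freeing the cached pointer can be shown with a small stand-alone sketch. The types and functions below are hypothetical stand-ins, not the BSF's own; a counter of live allocations makes the leak observable.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the BSF's cached frame. */
typedef struct Frame {
    void *payload;
} Frame;

static int live_frames; /* counts allocations not yet freed */

static Frame *frame_alloc(void)
{
    Frame *f = calloc(1, sizeof(*f));

    if (f)
        live_frames++;
    return f;
}

/* Free *pf (if set) and reset the caller's pointer. */
static void frame_free(Frame **pf)
{
    if (*pf) {
        free(*pf);
        live_frames--;
    }
    *pf = NULL;
}

typedef struct Ctx {
    Frame *next_frame;
} Ctx;

/* Flush that only zeroes the pointer: any cached frame is leaked. */
static void flush_leaky(Ctx *ctx)
{
    ctx->next_frame = NULL;
}

/* Flush that frees the cached frame before resetting the pointer. */
static void flush_fixed(Ctx *ctx)
{
    frame_free(&ctx->next_frame);
}
```

After flush_leaky() the context looks clean, but the allocation is still live and no longer reachable from the context; flush_fixed() leaves the same context state without the leak.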

Signed-off-by: Andreas Rheinhardt 
---
 libavcodec/vp9_raw_reorder_bsf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libavcodec/vp9_raw_reorder_bsf.c b/libavcodec/vp9_raw_reorder_bsf.c
index e7d301cb85..368dcb26c2 100644
--- a/libavcodec/vp9_raw_reorder_bsf.c
+++ b/libavcodec/vp9_raw_reorder_bsf.c
@@ -396,7 +396,7 @@ static void vp9_raw_reorder_flush(AVBSFContext *bsf)
 
 for (int s = 0; s < FRAME_SLOTS; s++)
 vp9_raw_reorder_clear_slot(ctx, s);
-ctx->next_frame = NULL;
+vp9_raw_reorder_frame_free(&ctx->next_frame);
 ctx->sequence = 0;
 }
 
@@ -407,6 +407,7 @@ static void vp9_raw_reorder_close(AVBSFContext *bsf)
 
 for (s = 0; s < FRAME_SLOTS; s++)
 vp9_raw_reorder_clear_slot(ctx, s);
+vp9_raw_reorder_frame_free(&ctx->next_frame);
 }
 
 static const enum AVCodecID vp9_raw_reorder_codec_ids[] = {
-- 
2.32.0

Re: [FFmpeg-devel] [PATCH] avcodec/libvpxenc: enable dynamic max quantizer parameter reconfiguration

2022-03-31 Thread James Zern

On Wed, Mar 30, 2022 at 11:25 AM Danil Chapovalov
 wrote:
>
> ---
>  doc/encoders.texi  | 3 +++
>  libavcodec/libvpxenc.c | 6 ++
>  2 files changed, 9 insertions(+)
>

lgtm. I'll submit this with a patch version bump soon if there aren't
any further comments.

[FFmpeg-devel] [PATCH v3 10/10] avcodec/vc1: Arm 32-bit NEON unescape fast path

2022-03-31 Thread Ben Avison

checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows.

vc1dsp.vc1_unescape_buffer_c: 918624.7
vc1dsp.vc1_unescape_buffer_neon: 142958.0
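
For context, the operation being accelerated removes the escape byte 0x03 from every 00 00 03 0x sequence (x <= 3) in the bitstream. A byte-at-a-time sketch of that behaviour is given below; it is written for this note, is not the C fallback from libavcodec, and its function name is an assumption.

```c
#include <stdint.h>

/* Scalar sketch: copy src to dst, dropping the 0x03 of every
 * 00 00 03 0x sequence (x <= 3). Returns the number of bytes written. */
static int unescape_ref(const uint8_t *src, int size, uint8_t *dst)
{
    int dsize = 0;

    for (int i = 0; i < size; i++) {
        if (i >= 2 && i + 1 < size &&
            src[i] == 3 && !src[i - 1] && !src[i - 2] && src[i + 1] <= 3)
            continue;   /* skip the escape byte itself */
        dst[dsize++] = src[i];
    }
    return dsize;
}
```

Escape sequences are typically rare in the data, which is why the patch spends its effort on the escape-free inner loop and leaves the removal itself to a C wrapper.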

Signed-off-by: Ben Avison 
---
 libavcodec/arm/vc1dsp_init_neon.c |  61 +++
 libavcodec/arm/vc1dsp_neon.S  | 118 ++
 2 files changed, 179 insertions(+)

diff --git a/libavcodec/arm/vc1dsp_init_neon.c b/libavcodec/arm/vc1dsp_init_neon.c
index f5f5c702d7..48cb816b70 100644
--- a/libavcodec/arm/vc1dsp_init_neon.c
+++ b/libavcodec/arm/vc1dsp_init_neon.c
@@ -19,6 +19,7 @@
 #include 
 
 #include "libavutil/attributes.h"
+#include "libavutil/intreadwrite.h"
 #include "libavcodec/vc1dsp.h"
 #include "vc1dsp.h"
 
@@ -84,6 +85,64 @@ void ff_put_vc1_chroma_mc4_neon(uint8_t *dst, uint8_t *src, ptrdiff_t stride,
 void ff_avg_vc1_chroma_mc4_neon(uint8_t *dst, uint8_t *src, ptrdiff_t stride,
 int h, int x, int y);
 
+int ff_vc1_unescape_buffer_helper_neon(const uint8_t *src, int size, uint8_t *dst);
+
+static int vc1_unescape_buffer_neon(const uint8_t *src, int size, uint8_t *dst)
+{
+/* Dealing with starting and stopping, and removing escape bytes, are
+ * comparatively less time-sensitive, so are more clearly expressed using
+ * a C wrapper around the assembly inner loop. Note that we assume a
+ * little-endian machine that supports unaligned loads. */
+int dsize = 0;
+while (size >= 4)
+{
+int found = 0;
+while (!found && (((uintptr_t) dst) & 7) && size >= 4)
+{
+found = (AV_RL32(src) &~ 0x03000000) == 0x00030000;
+if (!found)
+{
+*dst++ = *src++;
+--size;
+++dsize;
+}
+}
+if (!found)
+{
+int skip = size - ff_vc1_unescape_buffer_helper_neon(src, size, dst);
+dst += skip;
+src += skip;
+size -= skip;
+dsize += skip;
+while (!found && size >= 4)
+{
+found = (AV_RL32(src) &~ 0x03000000) == 0x00030000;
+if (!found)
+{
+*dst++ = *src++;
+--size;
+++dsize;
+}
+}
+}
+if (found)
+{
+*dst++ = *src++;
+*dst++ = *src++;
+++src;
+size -= 3;
+dsize += 2;
+}
+}
+while (size > 0)
+{
+*dst++ = *src++;
+--size;
+++dsize;
+}
+return dsize;
+}
+
 #define FN_ASSIGN(X, Y) \
 dsp->put_vc1_mspel_pixels_tab[0][X+4*Y] = ff_put_vc1_mspel_mc##X##Y##_16_neon; \
 dsp->put_vc1_mspel_pixels_tab[1][X+4*Y] = ff_put_vc1_mspel_mc##X##Y##_neon
@@ -130,4 +189,6 @@ av_cold void ff_vc1dsp_init_neon(VC1DSPContext *dsp)
 dsp->avg_no_rnd_vc1_chroma_pixels_tab[0] = ff_avg_vc1_chroma_mc8_neon;
 dsp->put_no_rnd_vc1_chroma_pixels_tab[1] = ff_put_vc1_chroma_mc4_neon;
 dsp->avg_no_rnd_vc1_chroma_pixels_tab[1] = ff_avg_vc1_chroma_mc4_neon;
+
+dsp->vc1_unescape_buffer = vc1_unescape_buffer_neon;
 }
diff --git a/libavcodec/arm/vc1dsp_neon.S b/libavcodec/arm/vc1dsp_neon.S
index ba54221ef6..96014fbebc 100644
--- a/libavcodec/arm/vc1dsp_neon.S
+++ b/libavcodec/arm/vc1dsp_neon.S
@@ -1804,3 +1804,121 @@ function ff_vc1_h_loop_filter16_neon, export=1
 4:  vpop {d8-d15}
 pop {r4-r6,pc}
 endfunc
+
+@ Copy at most the specified number of bytes from source to destination buffer,
+@ stopping at a multiple of 16 bytes, none of which are the start of an escape sequence
+@ On entry:
+@   r0 -> source buffer
+@   r1 = max number of bytes to copy
+@   r2 -> destination buffer, optimally 8-byte aligned
+@ On exit:
+@   r0 = number of bytes not copied
+function ff_vc1_unescape_buffer_helper_neon, export=1
+@ Offset by 48 to screen out cases that are too short for us to handle,
+@ and also make it easy to test for loop termination, or to determine
+@ whether we need an odd number of half-iterations of the loop.
+subs            r1, r1, #48
+bmi             90f
+
+@ Set up useful constants
+vmov.i32        q0, #0x3000000
+vmov.i32        q1, #0x30000
+
+tst             r1, #16
+bne             1f
+
+  vld1.8          {q8, q9}, [r0]!
+  vbic            q12, q8, q0
+  vext.8          q13, q8, q9, #1
+  vext.8          q14, q8, q9, #2
+  vext.8          q15, q8, q9, #3
+  veor            q12, q12, q1
+  vbic            q13, q13, q0
+  vbic            q14, q14, q0
+  vbic            q15, q15, q0
+  vceq.i32        q12, q12, #0
+  veor            q13, q13, q1
+  veor            q14, q14, q1
+  veor            q15, q15, q1
+  vceq.i32        q13, q13, #0
+  vceq.i32        q14, q14, #0
+  vceq.i32        q15,

[FFmpeg-devel] [PATCH v3 09/10] avcodec/vc1: Arm 64-bit NEON unescape fast path

2022-03-31 Thread Ben Avison

checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows.

vc1dsp.vc1_unescape_buffer_c: 655617.7
vc1dsp.vc1_unescape_buffer_neon: 118237.0
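
The assembly in this patch sets up the constants 3 << 24 and 3 << 16, which suggests a word-wise test for the escape pattern: load four bytes little-endian, clear the low two bits of the last byte, and compare with 0x00030000. The sketch below checks that this agrees with the byte-wise definition. The rl32() helper is a portable stand-in written for this note (the patch itself uses AV_RL32), and the function names are assumptions.

```c
#include <stdint.h>

/* Little-endian 32-bit load from possibly unaligned memory
 * (a portable stand-in for what AV_RL32 provides). */
static uint32_t rl32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Word-wise test: bytes 0 and 1 are zero, byte 2 is 3, byte 3 is 0..3. */
static int is_escape_word(const uint8_t *p)
{
    return (rl32(p) & ~UINT32_C(0x03000000)) == UINT32_C(0x00030000);
}

/* Byte-wise equivalent, for comparison. */
static int is_escape_bytes(const uint8_t *p)
{
    return p[0] == 0 && p[1] == 0 && p[2] == 3 && p[3] <= 3;
}
```

Masking with ~0x03000000 discards exactly the two bits in which the permitted values 0..3 of the fourth byte can differ, so a single comparison covers all four cases.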

Signed-off-by: Ben Avison 
---
 libavcodec/aarch64/vc1dsp_init_aarch64.c |  61 
 libavcodec/aarch64/vc1dsp_neon.S | 176 +++
 2 files changed, 237 insertions(+)

diff --git a/libavcodec/aarch64/vc1dsp_init_aarch64.c b/libavcodec/aarch64/vc1dsp_init_aarch64.c
index e0eb52dd63..a7976fd596 100644
--- a/libavcodec/aarch64/vc1dsp_init_aarch64.c
+++ b/libavcodec/aarch64/vc1dsp_init_aarch64.c
@@ -21,6 +21,7 @@
 #include "libavutil/attributes.h"
 #include "libavutil/cpu.h"
 #include "libavutil/aarch64/cpu.h"
+#include "libavutil/intreadwrite.h"
 #include "libavcodec/vc1dsp.h"
 
 #include "config.h"
@@ -51,6 +52,64 @@ void ff_put_vc1_chroma_mc4_neon(uint8_t *dst, uint8_t *src, ptrdiff_t stride,
 void ff_avg_vc1_chroma_mc4_neon(uint8_t *dst, uint8_t *src, ptrdiff_t stride,
 int h, int x, int y);
 
+int ff_vc1_unescape_buffer_helper_neon(const uint8_t *src, int size, uint8_t *dst);
+
+static int vc1_unescape_buffer_neon(const uint8_t *src, int size, uint8_t *dst)
+{
+/* Dealing with starting and stopping, and removing escape bytes, are
+ * comparatively less time-sensitive, so are more clearly expressed using
+ * a C wrapper around the assembly inner loop. Note that we assume a
+ * little-endian machine that supports unaligned loads. */
+int dsize = 0;
+while (size >= 4)
+{
+int found = 0;
+while (!found && (((uintptr_t) dst) & 7) && size >= 4)
+{
+found = (AV_RL32(src) &~ 0x03000000) == 0x00030000;
+if (!found)
+{
+*dst++ = *src++;
+--size;
+++dsize;
+}
+}
+if (!found)
+{
+int skip = size - ff_vc1_unescape_buffer_helper_neon(src, size, dst);
+dst += skip;
+src += skip;
+size -= skip;
+dsize += skip;
+while (!found && size >= 4)
+{
+found = (AV_RL32(src) &~ 0x03000000) == 0x00030000;
+if (!found)
+{
+*dst++ = *src++;
+--size;
+++dsize;
+}
+}
+}
+if (found)
+{
+*dst++ = *src++;
+*dst++ = *src++;
+++src;
+size -= 3;
+dsize += 2;
+}
+}
+while (size > 0)
+{
+*dst++ = *src++;
+--size;
+++dsize;
+}
+return dsize;
+}
+
 av_cold void ff_vc1dsp_init_aarch64(VC1DSPContext *dsp)
 {
 int cpu_flags = av_get_cpu_flags();
@@ -76,5 +135,7 @@ av_cold void ff_vc1dsp_init_aarch64(VC1DSPContext *dsp)
 dsp->avg_no_rnd_vc1_chroma_pixels_tab[0] = ff_avg_vc1_chroma_mc8_neon;
 dsp->put_no_rnd_vc1_chroma_pixels_tab[1] = ff_put_vc1_chroma_mc4_neon;
 dsp->avg_no_rnd_vc1_chroma_pixels_tab[1] = ff_avg_vc1_chroma_mc4_neon;
+
+dsp->vc1_unescape_buffer = vc1_unescape_buffer_neon;
 }
 }
diff --git a/libavcodec/aarch64/vc1dsp_neon.S b/libavcodec/aarch64/vc1dsp_neon.S
index 0201db4f78..9a96c2523c 100644
--- a/libavcodec/aarch64/vc1dsp_neon.S
+++ b/libavcodec/aarch64/vc1dsp_neon.S
@@ -1368,3 +1368,179 @@ function ff_vc1_h_loop_filter16_neon, export=1
 st2 {v2.b, v3.b}[7], [x6]
 4:  ret
 endfunc
+
+// Copy at most the specified number of bytes from source to destination buffer,
+// stopping at a multiple of 32 bytes, none of which are the start of an escape sequence
+// On entry:
+//   x0 -> source buffer
+//   w1 = max number of bytes to copy
+//   x2 -> destination buffer, optimally 8-byte aligned
+// On exit:
+//   w0 = number of bytes not copied
+function ff_vc1_unescape_buffer_helper_neon, export=1
+// Offset by 80 to screen out cases that are too short for us to handle,
+// and also make it easy to test for loop termination, or to determine
+// whether we need an odd number of half-iterations of the loop.
+subs            w1, w1, #80
+b.mi            90f
+
+// Set up useful constants
+movi            v20.4s, #3, lsl #24
+movi            v21.4s, #3, lsl #16
+
+tst             w1, #32
+b.ne            1f
+
+  ld1 {v0.16b, v1.16b, v2.16b}, [x0], #48
+  ext v25.16b, v0.16b, v1.16b, #1
+  ext v26.16b, v0.16b, v1.16b, #2
+  ext v27.16b, v0.16b, v1.16b, #3
+  ext v29.16b, v1.16b, v2.16b, #1
+  ext v30.16b, v1.16b, v2.16b, #2
+  ext v31.16b, v1.16b, v2.16b, #3
+  bic v24.16b, v0.16b, v20.16b
+  bic v25.16b, v25.16b, v20.16b
+  bic v26.16b, v26.16b, v20.16b
+  bic

[FFmpeg-devel] [PATCH v3 08/10] avcodec/idctdsp: Arm 64-bit NEON block add and clamp fast paths

2022-03-31 Thread Ben Avison

checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows.

idctdsp.add_pixels_clamped_c: 313.3
idctdsp.add_pixels_clamped_neon: 24.3
idctdsp.put_pixels_clamped_c: 220.3
idctdsp.put_pixels_clamped_neon: 15.5
idctdsp.put_signed_pixels_clamped_c: 210.5
idctdsp.put_signed_pixels_clamped_neon: 19.5
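
For readers unfamiliar with these entry points, here is a rough scalar sketch of what "add and clamp" over an 8x8 block amounts to. It is an illustration only, not the C reference in libavcodec; the names clip_u8, add_pixels_clamped_sketch and add_clamp_demo are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturate an int to the 0..255 range of an 8-bit pixel. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Add an 8x8 block of 16-bit residuals to 8-bit pixels, clamping each sum.
 * stride is the distance in bytes between rows of the destination. */
static void add_pixels_clamped_sketch(const int16_t *block, uint8_t *pixels,
                                      ptrdiff_t stride)
{
    assert(block && pixels);
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            pixels[x] = clip_u8(pixels[x] + block[x]);
        block  += 8;
        pixels += stride;
    }
}

/* Small self-check: sums below 0 clamp to 0, sums above 255 clamp to 255. */
static void add_clamp_demo(void)
{
    int16_t block[64];
    uint8_t pixels[64];
    for (int i = 0; i < 64; i++) {
        block[i]  = (i & 1) ? 300 : -300;
        pixels[i] = 100;
    }
    add_pixels_clamped_sketch(block, pixels, 8);
    assert(pixels[0] == 0 && pixels[1] == 255);
}
```

Going by the comment on ff_put_pixels_clamped_neon in the patch, the put variant would store the clamped coefficient directly instead of adding it to the existing pixel.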

Signed-off-by: Ben Avison 
---
 libavcodec/aarch64/Makefile   |   3 +-
 libavcodec/aarch64/idctdsp_init_aarch64.c |  26 +++--
 libavcodec/aarch64/idctdsp_neon.S | 130 ++
 3 files changed, 150 insertions(+), 9 deletions(-)
 create mode 100644 libavcodec/aarch64/idctdsp_neon.S

diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile
index 5b25e4dfb9..c8935f205e 100644
--- a/libavcodec/aarch64/Makefile
+++ b/libavcodec/aarch64/Makefile
@@ -44,7 +44,8 @@ NEON-OBJS-$(CONFIG_H264PRED)+= aarch64/h264pred_neon.o
 NEON-OBJS-$(CONFIG_H264QPEL)+= aarch64/h264qpel_neon.o \
    aarch64/hpeldsp_neon.o
 NEON-OBJS-$(CONFIG_HPELDSP) += aarch64/hpeldsp_neon.o
-NEON-OBJS-$(CONFIG_IDCTDSP) += aarch64/simple_idct_neon.o
+NEON-OBJS-$(CONFIG_IDCTDSP) += aarch64/idctdsp_neon.o \
+   aarch64/simple_idct_neon.o
 NEON-OBJS-$(CONFIG_MDCT)+= aarch64/mdct_neon.o
 NEON-OBJS-$(CONFIG_MPEGAUDIODSP)+= aarch64/mpegaudiodsp_neon.o
 NEON-OBJS-$(CONFIG_PIXBLOCKDSP) += aarch64/pixblockdsp_neon.o
diff --git a/libavcodec/aarch64/idctdsp_init_aarch64.c b/libavcodec/aarch64/idctdsp_init_aarch64.c
index 742a3372e3..eec21aa5a2 100644
--- a/libavcodec/aarch64/idctdsp_init_aarch64.c
+++ b/libavcodec/aarch64/idctdsp_init_aarch64.c
@@ -27,19 +27,29 @@
 #include "libavcodec/idctdsp.h"
 #include "idct.h"
 
+void ff_put_pixels_clamped_neon(const int16_t *, uint8_t *, ptrdiff_t);
+void ff_put_signed_pixels_clamped_neon(const int16_t *, uint8_t *, ptrdiff_t);
+void ff_add_pixels_clamped_neon(const int16_t *, uint8_t *, ptrdiff_t);
+
 av_cold void ff_idctdsp_init_aarch64(IDCTDSPContext *c, AVCodecContext *avctx,
  unsigned high_bit_depth)
 {
 int cpu_flags = av_get_cpu_flags();
 
-if (have_neon(cpu_flags) && !avctx->lowres && !high_bit_depth) {
-if (avctx->idct_algo == FF_IDCT_AUTO ||
-avctx->idct_algo == FF_IDCT_SIMPLEAUTO ||
-avctx->idct_algo == FF_IDCT_SIMPLENEON) {
-c->idct_put  = ff_simple_idct_put_neon;
-c->idct_add  = ff_simple_idct_add_neon;
-c->idct  = ff_simple_idct_neon;
-c->perm_type = FF_IDCT_PERM_PARTTRANS;
+if (have_neon(cpu_flags)) {
+if (!avctx->lowres && !high_bit_depth) {
+if (avctx->idct_algo == FF_IDCT_AUTO ||
+avctx->idct_algo == FF_IDCT_SIMPLEAUTO ||
+avctx->idct_algo == FF_IDCT_SIMPLENEON) {
+c->idct_put  = ff_simple_idct_put_neon;
+c->idct_add  = ff_simple_idct_add_neon;
+c->idct  = ff_simple_idct_neon;
+c->perm_type = FF_IDCT_PERM_PARTTRANS;
+}
 }
+
+c->add_pixels_clamped= ff_add_pixels_clamped_neon;
+c->put_pixels_clamped= ff_put_pixels_clamped_neon;
+c->put_signed_pixels_clamped = ff_put_signed_pixels_clamped_neon;
 }
 }
diff --git a/libavcodec/aarch64/idctdsp_neon.S b/libavcodec/aarch64/idctdsp_neon.S
new file mode 100644
index 00..7f47611206
--- /dev/null
+++ b/libavcodec/aarch64/idctdsp_neon.S
@@ -0,0 +1,130 @@
+/*
+ * IDCT AArch64 NEON optimisations
+ *
+ * Copyright (c) 2022 Ben Avison 
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/aarch64/asm.S"
+
+// Clamp 16-bit signed block coefficients to unsigned 8-bit
+// On entry:
+//   x0 -> array of 64x 16-bit coefficients
+//   x1 -> 8-bit results
+//   x2 = row stride for results, bytes
+function ff_put_pixels_clamped_neon, export=1
+ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [x0], #64
+ld1 {v4.16b, v5.16b, v6.16b, v7.16b}, [x0]
+sqxtun  v0.8b, v0.8h
+sqxtun  v1.8b, v1.8h
+sqxtun  v2.8b, v2.8h
+

[FFmpeg-devel] [PATCH v3 07/10] avcodec/vc1: Arm 64-bit NEON inverse transform fast paths

2022-03-31 Thread Ben Avison

checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows.

vc1dsp.vc1_inv_trans_4x4_c: 158.2
vc1dsp.vc1_inv_trans_4x4_neon: 65.7
vc1dsp.vc1_inv_trans_4x4_dc_c: 86.5
vc1dsp.vc1_inv_trans_4x4_dc_neon: 26.5
vc1dsp.vc1_inv_trans_4x8_c: 335.2
vc1dsp.vc1_inv_trans_4x8_neon: 106.2
vc1dsp.vc1_inv_trans_4x8_dc_c: 151.2
vc1dsp.vc1_inv_trans_4x8_dc_neon: 25.5
vc1dsp.vc1_inv_trans_8x4_c: 365.7
vc1dsp.vc1_inv_trans_8x4_neon: 97.2
vc1dsp.vc1_inv_trans_8x4_dc_c: 139.7
vc1dsp.vc1_inv_trans_8x4_dc_neon: 16.5
vc1dsp.vc1_inv_trans_8x8_c: 547.7
vc1dsp.vc1_inv_trans_8x8_neon: 137.0
vc1dsp.vc1_inv_trans_8x8_dc_c: 268.2
vc1dsp.vc1_inv_trans_8x8_dc_neon: 30.5

Signed-off-by: Ben Avison 
---
 libavcodec/aarch64/vc1dsp_init_aarch64.c |  19 +
 libavcodec/aarch64/vc1dsp_neon.S | 678 +++
 2 files changed, 697 insertions(+)

diff --git a/libavcodec/aarch64/vc1dsp_init_aarch64.c b/libavcodec/aarch64/vc1dsp_init_aarch64.c
index 8f96e4802d..e0eb52dd63 100644
--- a/libavcodec/aarch64/vc1dsp_init_aarch64.c
+++ b/libavcodec/aarch64/vc1dsp_init_aarch64.c
@@ -25,6 +25,16 @@
 
 #include "config.h"
 
+void ff_vc1_inv_trans_8x8_neon(int16_t *block);
+void ff_vc1_inv_trans_8x4_neon(uint8_t *dest, ptrdiff_t stride, int16_t *block);
+void ff_vc1_inv_trans_4x8_neon(uint8_t *dest, ptrdiff_t stride, int16_t *block);
+void ff_vc1_inv_trans_4x4_neon(uint8_t *dest, ptrdiff_t stride, int16_t *block);
+
+void ff_vc1_inv_trans_8x8_dc_neon(uint8_t *dest, ptrdiff_t stride, int16_t *block);
+void ff_vc1_inv_trans_8x4_dc_neon(uint8_t *dest, ptrdiff_t stride, int16_t *block);
+void ff_vc1_inv_trans_4x8_dc_neon(uint8_t *dest, ptrdiff_t stride, int16_t *block);
+void ff_vc1_inv_trans_4x4_dc_neon(uint8_t *dest, ptrdiff_t stride, int16_t *block);
+
 void ff_vc1_v_loop_filter4_neon(uint8_t *src, ptrdiff_t stride, int pq);
 void ff_vc1_h_loop_filter4_neon(uint8_t *src, ptrdiff_t stride, int pq);
 void ff_vc1_v_loop_filter8_neon(uint8_t *src, ptrdiff_t stride, int pq);
@@ -46,6 +56,15 @@ av_cold void ff_vc1dsp_init_aarch64(VC1DSPContext *dsp)
 int cpu_flags = av_get_cpu_flags();
 
 if (have_neon(cpu_flags)) {
+dsp->vc1_inv_trans_8x8 = ff_vc1_inv_trans_8x8_neon;
+dsp->vc1_inv_trans_8x4 = ff_vc1_inv_trans_8x4_neon;
+dsp->vc1_inv_trans_4x8 = ff_vc1_inv_trans_4x8_neon;
+dsp->vc1_inv_trans_4x4 = ff_vc1_inv_trans_4x4_neon;
+dsp->vc1_inv_trans_8x8_dc = ff_vc1_inv_trans_8x8_dc_neon;
+dsp->vc1_inv_trans_8x4_dc = ff_vc1_inv_trans_8x4_dc_neon;
+dsp->vc1_inv_trans_4x8_dc = ff_vc1_inv_trans_4x8_dc_neon;
+dsp->vc1_inv_trans_4x4_dc = ff_vc1_inv_trans_4x4_dc_neon;
+
 dsp->vc1_v_loop_filter4  = ff_vc1_v_loop_filter4_neon;
 dsp->vc1_h_loop_filter4  = ff_vc1_h_loop_filter4_neon;
 dsp->vc1_v_loop_filter8  = ff_vc1_v_loop_filter8_neon;
diff --git a/libavcodec/aarch64/vc1dsp_neon.S b/libavcodec/aarch64/vc1dsp_neon.S
index 1ea9fa75ff..0201db4f78 100644
--- a/libavcodec/aarch64/vc1dsp_neon.S
+++ b/libavcodec/aarch64/vc1dsp_neon.S
@@ -22,7 +22,685 @@
 
 #include "libavutil/aarch64/asm.S"
 
+// VC-1 8x8 inverse transform
+// On entry:
+//   x0 -> array of 16-bit inverse transform coefficients, in column-major order
+// On exit:
+//   array at x0 updated to hold transformed block; also now held in row-major order
+function ff_vc1_inv_trans_8x8_neon, export=1
+ld1 {v1.16b, v2.16b}, [x0], #32
+ld1 {v3.16b, v4.16b}, [x0], #32
+ld1 {v5.16b, v6.16b}, [x0], #32
+shl v1.8h, v1.8h, #2// 8/2 * src[0]
+sub x1, x0, #3*32
+ld1 {v16.16b, v17.16b}, [x0]
+shl v7.8h, v2.8h, #4//  16 * src[8]
+shl v18.8h, v2.8h, #2   //   4 * src[8]
+shl v19.8h, v4.8h, #4   // 16 * src[24]
+ldr d0, .Lcoeffs_it8
+shl v5.8h, v5.8h, #2    // 8/2 * src[32]
+shl v20.8h, v6.8h, #4   // 16 * src[40]
+shl v21.8h, v6.8h, #2   // 4 * src[40]
+shl v22.8h, v17.8h, #4  // 16 * src[56]
+ssra    v20.8h, v19.8h, #2  // 4 * src[24] + 16 * src[40]
+mul v23.8h, v3.8h, v0.h[0]  // 6/2 * src[16]
+sub v19.8h, v19.8h, v21.8h  // 16 * src[24] - 4 * src[40]
+ssra    v7.8h, v22.8h, #2   // 16 * src[8] + 4 * src[56]
+sub v18.8h, v22.8h, v18.8h  // -4 * src[8] + 16 * src[56]
+shl v3.8h, v3.8h, #3//

[FFmpeg-devel] [PATCH v3 04/10] avcodec/vc1: Introduce fast path for unescaping bitstream buffer

2022-03-31 Thread Ben Avison

Includes a checkasm test.
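
For context, a minimal scalar sketch of the unescaping rule as I understand it: a 0x03 byte that follows two zero bytes and precedes a byte no greater than 3 is dropped, and everything else is copied. This is an illustration under that assumption, not the code in vc1_common.h; unescape_sketch and unescape_demo are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Copy src to dst, dropping each escape byte (the 0x03 in 00 00 03 0x,
 * where x <= 3). Returns the number of bytes written to dst.
 * dst must have room for at least size bytes. */
static int unescape_sketch(const uint8_t *src, int size, uint8_t *dst)
{
    int dsize = 0;
    assert(src && dst && size >= 0);
    for (int i = 0; i < size; i++) {
        if (i >= 2 && i + 1 < size &&
            src[i] == 0x03 && src[i - 1] == 0x00 && src[i - 2] == 0x00 &&
            src[i + 1] <= 0x03)
            continue;               /* skip the escape byte itself */
        dst[dsize++] = src[i];
    }
    return dsize;
}

/* Small self-check on a five-byte input containing one escape sequence. */
static void unescape_demo(void)
{
    const uint8_t in[] = { 0x00, 0x00, 0x03, 0x01, 0xAA };
    uint8_t out[sizeof(in)];
    int n = unescape_sketch(in, (int)sizeof(in), out);
    assert(n == 4);
    assert(out[0] == 0x00 && out[1] == 0x00 && out[2] == 0x01 && out[3] == 0xAA);
}
```

A byte-at-a-time loop like this is the kind of thing a SIMD fast path can replace by scanning many bytes at once for a possible escape sequence and bulk-copying the rest.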

Signed-off-by: Ben Avison 
---
 libavcodec/vc1dec.c | 20 ++--
 libavcodec/vc1dsp.c |  2 ++
 libavcodec/vc1dsp.h |  3 ++
 tests/checkasm/vc1dsp.c | 67 +
 4 files changed, 82 insertions(+), 10 deletions(-)

diff --git a/libavcodec/vc1dec.c b/libavcodec/vc1dec.c
index e279ffd1c1..0426e8a752 100644
--- a/libavcodec/vc1dec.c
+++ b/libavcodec/vc1dec.c
@@ -491,7 +491,7 @@ static av_cold int vc1_decode_init(AVCodecContext *avctx)
 size = next - start - 4;
 if (size <= 0)
 continue;
-buf2_size = vc1_unescape_buffer(start + 4, size, buf2);
+buf2_size = v->vc1dsp.vc1_unescape_buffer(start + 4, size, buf2);
 init_get_bits(&gb, buf2, buf2_size * 8);
 switch (AV_RB32(start)) {
 case VC1_CODE_SEQHDR:
@@ -681,7 +681,7 @@ static int vc1_decode_frame(AVCodecContext *avctx, void *data,
 case VC1_CODE_FRAME:
 if (avctx->hwaccel)
 buf_start = start;
-buf_size2 = vc1_unescape_buffer(start + 4, size, buf2);
+buf_size2 = v->vc1dsp.vc1_unescape_buffer(start + 4, size, buf2);
 break;
 case VC1_CODE_FIELD: {
 int buf_size3;
@@ -698,8 +698,8 @@ static int vc1_decode_frame(AVCodecContext *avctx, void *data,
 ret = AVERROR(ENOMEM);
 goto err;
 }
-buf_size3 = vc1_unescape_buffer(start + 4, size,
-slices[n_slices].buf);
+buf_size3 = v->vc1dsp.vc1_unescape_buffer(start + 4, size,
+  slices[n_slices].buf);
 init_get_bits(&slices[n_slices].gb, slices[n_slices].buf,
   buf_size3 << 3);
 slices[n_slices].mby_start = avctx->coded_height + 31 >> 5;
@@ -710,7 +710,7 @@ static int vc1_decode_frame(AVCodecContext *avctx, void *data,
 break;
 }
 case VC1_CODE_ENTRYPOINT: /* it should be before frame data */
-buf_size2 = vc1_unescape_buffer(start + 4, size, buf2);
+buf_size2 = v->vc1dsp.vc1_unescape_buffer(start + 4, size, buf2);
 init_get_bits(&s->gb, buf2, buf_size2 * 8);
 ff_vc1_decode_entry_point(avctx, v, &s->gb);
 break;
@@ -727,8 +727,8 @@ static int vc1_decode_frame(AVCodecContext *avctx, void *data,
 ret = AVERROR(ENOMEM);
 goto err;
 }
-buf_size3 = vc1_unescape_buffer(start + 4, size,
-slices[n_slices].buf);
+buf_size3 = v->vc1dsp.vc1_unescape_buffer(start + 4, size,
+  slices[n_slices].buf);
 init_get_bits(&slices[n_slices].gb, slices[n_slices].buf,
   buf_size3 << 3);
 slices[n_slices].mby_start = get_bits(&slices[n_slices].gb, 9);
@@ -762,7 +762,7 @@ static int vc1_decode_frame(AVCodecContext *avctx, void *data,
 ret = AVERROR(ENOMEM);
 goto err;
 }
-buf_size3 = vc1_unescape_buffer(divider + 4, buf + buf_size - divider - 4, slices[n_slices].buf);
+buf_size3 = v->vc1dsp.vc1_unescape_buffer(divider + 4, buf + buf_size - divider - 4, slices[n_slices].buf);
 init_get_bits(&slices[n_slices].gb, slices[n_slices].buf,
   buf_size3 << 3);
 slices[n_slices].mby_start = s->mb_height + 1 >> 1;
@@ -771,9 +771,9 @@ static int vc1_decode_frame(AVCodecContext *avctx, void *data,
 n_slices1 = n_slices - 1;
 n_slices++;
 }
-buf_size2 = vc1_unescape_buffer(buf, divider - buf, buf2);
+buf_size2 = v->vc1dsp.vc1_unescape_buffer(buf, divider - buf, buf2);
 } else {
-buf_size2 = vc1_unescape_buffer(buf, buf_size, buf2);
+buf_size2 = v->vc1dsp.vc1_unescape_buffer(buf, buf_size, buf2);
 }
 init_get_bits(&s->gb, buf2, buf_size2*8);
 } else{
diff --git a/libavcodec/vc1dsp.c b/libavcodec/vc1dsp.c
index f651d7d461..f1b7bb2397 100644
--- a/libavcodec/vc1dsp.c
+++ b/libavcodec/vc1dsp.c
@@ -34,6 +34,7 @@
 #include "rnd_avg.h"
 #include "vc1dsp.h"
 #include "startcode.h"
+#include "vc1_common.h"
 
 /* Apply overlap transform to horizontal edge */
 static void vc1_v_overlap_c(uint8_t *src, ptrdiff_t stride)
@@ -1030,6 +1031,7 @@ av_cold void ff_vc1dsp_init(VC1DSPContext *dsp)
 #endif /* CONFIG_WMV3IMAGE_DECODER || CONFIG_VC1IMAGE_DECODER */

[FFmpeg-devel] [PATCH v3 06/10] avcodec/vc1: Arm 32-bit NEON deblocking filter fast paths

2022-03-31 Thread Ben Avison

checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows. Note that the C
version can still outperform the NEON version in specific cases. The balance
between different code paths is stream-dependent, but in practice the best
case happens about 5% of the time, the worst case happens about 40% of the
time, and the complexity of the remaining cases falls somewhere in between.
Therefore, taking the average of the best and worst case timings is
probably a conservative estimate of the degree by which the NEON code
improves performance.

vc1dsp.vc1_h_loop_filter4_bestcase_c: 19.0
vc1dsp.vc1_h_loop_filter4_bestcase_neon: 48.5
vc1dsp.vc1_h_loop_filter4_worstcase_c: 144.7
vc1dsp.vc1_h_loop_filter4_worstcase_neon: 76.2
vc1dsp.vc1_h_loop_filter8_bestcase_c: 41.0
vc1dsp.vc1_h_loop_filter8_bestcase_neon: 75.0
vc1dsp.vc1_h_loop_filter8_worstcase_c: 294.0
vc1dsp.vc1_h_loop_filter8_worstcase_neon: 102.7
vc1dsp.vc1_h_loop_filter16_bestcase_c: 54.7
vc1dsp.vc1_h_loop_filter16_bestcase_neon: 130.0
vc1dsp.vc1_h_loop_filter16_worstcase_c: 569.7
vc1dsp.vc1_h_loop_filter16_worstcase_neon: 186.7
vc1dsp.vc1_v_loop_filter4_bestcase_c: 20.2
vc1dsp.vc1_v_loop_filter4_bestcase_neon: 47.2
vc1dsp.vc1_v_loop_filter4_worstcase_c: 164.2
vc1dsp.vc1_v_loop_filter4_worstcase_neon: 68.5
vc1dsp.vc1_v_loop_filter8_bestcase_c: 43.5
vc1dsp.vc1_v_loop_filter8_bestcase_neon: 55.2
vc1dsp.vc1_v_loop_filter8_worstcase_c: 316.2
vc1dsp.vc1_v_loop_filter8_worstcase_neon: 72.7
vc1dsp.vc1_v_loop_filter16_bestcase_c: 62.2
vc1dsp.vc1_v_loop_filter16_bestcase_neon: 103.7
vc1dsp.vc1_v_loop_filter16_worstcase_c: 646.5
vc1dsp.vc1_v_loop_filter16_worstcase_neon: 110.7
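
To make the "average of best and worst case" estimate concrete, here is a small sketch that turns one row group of the figures above into a single speedup factor. The helper name midpoint_speedup is hypothetical, and the equal weighting is just the simplification described above, not a measured distribution.

```c
#include <assert.h>

/* Ratio of the C midpoint to the NEON midpoint. Averaging both sides and
 * dividing is the same as dividing the sums, so the 1/2 factors cancel. */
static double midpoint_speedup(double best_c, double worst_c,
                               double best_neon, double worst_neon)
{
    assert(best_neon + worst_neon > 0.0);
    return (best_c + worst_c) / (best_neon + worst_neon);
}

/* Self-check using the h_loop_filter4 and v_loop_filter16 rows above. */
static void midpoint_demo(void)
{
    double h4  = midpoint_speedup(19.0, 144.7, 48.5, 76.2);
    double v16 = midpoint_speedup(62.2, 646.5, 103.7, 110.7);
    assert(h4  > 1.30 && h4  < 1.32);   /* roughly 1.3x */
    assert(v16 > 3.30 && v16 < 3.31);   /* roughly 3.3x */
}
```

By this measure the smallest filter gains the least and the 16-pixel filters the most, which matches what the raw timings suggest.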

Signed-off-by: Ben Avison 
---
 libavcodec/arm/vc1dsp_init_neon.c |  14 +
 libavcodec/arm/vc1dsp_neon.S  | 643 ++
 2 files changed, 657 insertions(+)

diff --git a/libavcodec/arm/vc1dsp_init_neon.c b/libavcodec/arm/vc1dsp_init_neon.c
index 2cca784f5a..f5f5c702d7 100644
--- a/libavcodec/arm/vc1dsp_init_neon.c
+++ b/libavcodec/arm/vc1dsp_init_neon.c
@@ -32,6 +32,13 @@ void ff_vc1_inv_trans_4x8_dc_neon(uint8_t *dest, ptrdiff_t stride, int16_t *bloc
 void ff_vc1_inv_trans_8x4_dc_neon(uint8_t *dest, ptrdiff_t stride, int16_t *block);
 void ff_vc1_inv_trans_4x4_dc_neon(uint8_t *dest, ptrdiff_t stride, int16_t *block);
 
+void ff_vc1_v_loop_filter4_neon(uint8_t *src, int stride, int pq);
+void ff_vc1_h_loop_filter4_neon(uint8_t *src, int stride, int pq);
+void ff_vc1_v_loop_filter8_neon(uint8_t *src, int stride, int pq);
+void ff_vc1_h_loop_filter8_neon(uint8_t *src, int stride, int pq);
+void ff_vc1_v_loop_filter16_neon(uint8_t *src, int stride, int pq);
+void ff_vc1_h_loop_filter16_neon(uint8_t *src, int stride, int pq);
+
 void ff_put_pixels8x8_neon(uint8_t *block, const uint8_t *pixels,
ptrdiff_t line_size, int rnd);
 
@@ -92,6 +99,13 @@ av_cold void ff_vc1dsp_init_neon(VC1DSPContext *dsp)
 dsp->vc1_inv_trans_8x4_dc = ff_vc1_inv_trans_8x4_dc_neon;
 dsp->vc1_inv_trans_4x4_dc = ff_vc1_inv_trans_4x4_dc_neon;
 
+dsp->vc1_v_loop_filter4  = ff_vc1_v_loop_filter4_neon;
+dsp->vc1_h_loop_filter4  = ff_vc1_h_loop_filter4_neon;
+dsp->vc1_v_loop_filter8  = ff_vc1_v_loop_filter8_neon;
+dsp->vc1_h_loop_filter8  = ff_vc1_h_loop_filter8_neon;
+dsp->vc1_v_loop_filter16 = ff_vc1_v_loop_filter16_neon;
+dsp->vc1_h_loop_filter16 = ff_vc1_h_loop_filter16_neon;
+
 dsp->put_vc1_mspel_pixels_tab[1][ 0] = ff_put_pixels8x8_neon;
 FN_ASSIGN(1, 0);
 FN_ASSIGN(2, 0);
diff --git a/libavcodec/arm/vc1dsp_neon.S b/libavcodec/arm/vc1dsp_neon.S
index 93f043bf08..ba54221ef6 100644
--- a/libavcodec/arm/vc1dsp_neon.S
+++ b/libavcodec/arm/vc1dsp_neon.S
@@ -1161,3 +1161,646 @@ function ff_vc1_inv_trans_4x4_dc_neon, export=1
 vst1.32 {d1[1]},  [r0,:32]
 bx  lr
 endfunc
+
+@ VC-1 in-loop deblocking filter for 4 pixel pairs at boundary of vertically-neighbouring blocks
+@ On entry:
+@   r0 -> top-left pel of lower block
+@   r1 = row stride, bytes
+@   r2 = PQUANT bitstream parameter
+function ff_vc1_v_loop_filter4_neon, export=1
+sub r3, r0, r1, lsl #2
+vldr    d0, .Lcoeffs
+vld1.32 {d1[0]}, [r0], r1   @ P5
+vld1.32 {d2[0]}, [r3], r1   @ P1
+vld1.32 {d3[0]}, [r3], r1   @ P2
+vld1.32 {d4[0]}, [r0], r1   @ P6
+vld1.32 {d5[0]}, [r3], r1   @ P3
+vld1.32 {d6[0]}, [r0], r1   @ P7
+vld1.32 {d7[0]}, [r3]   @ P4
+vld1.32 {d16[0]}, [r0]  @ P8
+vshll.u8    q9, d1, #1  @ 2*P5
+vdup.16 d17, r2 @ pq
+vshll.u8    q10, d2, #1 @ 2*P1
+vmovl.u8    q11, d3 @ P2
+vmovl.u8    q1, d4  @ P6
+vmovl.u8    q12, d5 @ P3
+vmls.i16    d20, d22, d0[1]

[FFmpeg-devel] [PATCH v3 03/10] checkasm: Add idctdsp add/put-pixels-clamped tests

2022-03-31 Thread Ben Avison

Signed-off-by: Ben Avison 
---
 tests/checkasm/Makefile   |  1 +
 tests/checkasm/checkasm.c |  3 ++
 tests/checkasm/checkasm.h |  1 +
 tests/checkasm/idctdsp.c  | 98 +++
 tests/fate/checkasm.mak   |  1 +
 5 files changed, 104 insertions(+)
 create mode 100644 tests/checkasm/idctdsp.c

diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
index 7133a6ee66..f6b1008855 100644
--- a/tests/checkasm/Makefile
+++ b/tests/checkasm/Makefile
@@ -9,6 +9,7 @@ AVCODECOBJS-$(CONFIG_G722DSP)   += g722dsp.o
 AVCODECOBJS-$(CONFIG_H264DSP)   += h264dsp.o
 AVCODECOBJS-$(CONFIG_H264PRED)  += h264pred.o
 AVCODECOBJS-$(CONFIG_H264QPEL)  += h264qpel.o
+AVCODECOBJS-$(CONFIG_IDCTDSP)   += idctdsp.o
 AVCODECOBJS-$(CONFIG_LLVIDDSP)  += llviddsp.o
 AVCODECOBJS-$(CONFIG_LLVIDENCDSP)   += llviddspenc.o
 AVCODECOBJS-$(CONFIG_VC1DSP)+= vc1dsp.o
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index c2efd81b6d..57134f96ea 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -123,6 +123,9 @@ static const struct {
 #if CONFIG_HUFFYUV_DECODER
 { "huffyuvdsp", checkasm_check_huffyuvdsp },
 #endif
+#if CONFIG_IDCTDSP
+{ "idctdsp", checkasm_check_idctdsp },
+#endif
 #if CONFIG_JPEG2000_DECODER
 { "jpeg2000dsp", checkasm_check_jpeg2000dsp },
 #endif
diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
index 52ab18a5b1..a86db140e3 100644
--- a/tests/checkasm/checkasm.h
+++ b/tests/checkasm/checkasm.h
@@ -64,6 +64,7 @@ void checkasm_check_hevc_idct(void);
 void checkasm_check_hevc_pel(void);
 void checkasm_check_hevc_sao(void);
 void checkasm_check_huffyuvdsp(void);
+void checkasm_check_idctdsp(void);
 void checkasm_check_jpeg2000dsp(void);
 void checkasm_check_llviddsp(void);
 void checkasm_check_llviddspenc(void);
diff --git a/tests/checkasm/idctdsp.c b/tests/checkasm/idctdsp.c
new file mode 100644
index 00..02724536a7
--- /dev/null
+++ b/tests/checkasm/idctdsp.c
@@ -0,0 +1,98 @@
+/*
+ * Copyright (c) 2022 Ben Avison
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include 
+
+#include "checkasm.h"
+
+#include "libavcodec/idctdsp.h"
+
+#include "libavutil/common.h"
+#include "libavutil/internal.h"
+#include "libavutil/intreadwrite.h"
+#include "libavutil/mem_internal.h"
+
+#define IDCTDSP_TEST(func) { #func, offsetof(IDCTDSPContext, func) },
+
+typedef struct {
+const char *name;
+size_t offset;
+} test;
+
+#define RANDOMIZE_BUFFER16(name, size)  \
+do {\
+int i;  \
+for (i = 0; i < size; ++i) {\
+uint16_t r = rnd() % 0x201 - 0x100; \
+AV_WN16A(name##0 + i, r);   \
+AV_WN16A(name##1 + i, r);   \
+}   \
+} while (0)
+
+#define RANDOMIZE_BUFFER8(name, size) \
+do {  \
+int i;\
+for (i = 0; i < size; ++i) {  \
+uint8_t r = rnd();\
+name##0[i] = r;   \
+name##1[i] = r;   \
+} \
+} while (0)
+
+static void check_add_put_clamped(void)
+{
+/* Source buffers are only as big as needed, since any over-read won't affect results */
+LOCAL_ALIGNED_16(int16_t, src0, [64]);
+LOCAL_ALIGNED_16(int16_t, src1, [64]);
+/* Destination buffers have borders of one row above/below and 8 columns left/right to catch overflows */
+LOCAL_ALIGNED_8(uint8_t, dst0, [10 * 24]);
+LOCAL_ALIGNED_8(uint8_t, dst1, [10 * 24]);
+
+AVCodecContext avctx = { 0 };
+IDCTDSPContext h;
+
+const test tests[] = {
+IDCTDSP_TEST(add_pixels_clamped)
+IDCTDSP_TEST(put_pixels_clamped)
+IDCTDSP_TEST(put_signed_pixels_clamped)
+};
+
+ff_idctdsp_init(&h, &avctx);
+
+for (size_t t = 0; t < FF_ARRAY_ELEMS(tests); ++t) {
+void (*func)(const int16_t *, uint8_t *, ptrdiff_t) = *(void **)((intptr_t) &h + tests[t].offset);
+if (check_func(func, "idctdsp.%s",

[FFmpeg-devel] [PATCH v3 05/10] avcodec/vc1: Arm 64-bit NEON deblocking filter fast paths

2022-03-31 Thread Ben Avison

checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows. Note that the C
version can still outperform the NEON version in specific cases. The balance
between different code paths is stream-dependent, but in practice the best
case happens about 5% of the time, the worst case happens about 40% of the
time, and the complexity of the remaining cases falls somewhere in between.
Therefore, taking the average of the best and worst case timings is
probably a conservative estimate of the degree by which the NEON code
improves performance.

vc1dsp.vc1_h_loop_filter4_bestcase_c: 10.7
vc1dsp.vc1_h_loop_filter4_bestcase_neon: 43.5
vc1dsp.vc1_h_loop_filter4_worstcase_c: 184.5
vc1dsp.vc1_h_loop_filter4_worstcase_neon: 73.7
vc1dsp.vc1_h_loop_filter8_bestcase_c: 31.2
vc1dsp.vc1_h_loop_filter8_bestcase_neon: 62.2
vc1dsp.vc1_h_loop_filter8_worstcase_c: 358.2
vc1dsp.vc1_h_loop_filter8_worstcase_neon: 88.2
vc1dsp.vc1_h_loop_filter16_bestcase_c: 51.0
vc1dsp.vc1_h_loop_filter16_bestcase_neon: 107.7
vc1dsp.vc1_h_loop_filter16_worstcase_c: 722.7
vc1dsp.vc1_h_loop_filter16_worstcase_neon: 140.5
vc1dsp.vc1_v_loop_filter4_bestcase_c: 9.7
vc1dsp.vc1_v_loop_filter4_bestcase_neon: 43.0
vc1dsp.vc1_v_loop_filter4_worstcase_c: 178.7
vc1dsp.vc1_v_loop_filter4_worstcase_neon: 69.0
vc1dsp.vc1_v_loop_filter8_bestcase_c: 30.2
vc1dsp.vc1_v_loop_filter8_bestcase_neon: 50.7
vc1dsp.vc1_v_loop_filter8_worstcase_c: 353.0
vc1dsp.vc1_v_loop_filter8_worstcase_neon: 69.2
vc1dsp.vc1_v_loop_filter16_bestcase_c: 60.0
vc1dsp.vc1_v_loop_filter16_bestcase_neon: 90.0
vc1dsp.vc1_v_loop_filter16_worstcase_c: 714.2
vc1dsp.vc1_v_loop_filter16_worstcase_neon: 97.2

Signed-off-by: Ben Avison 
---
 libavcodec/aarch64/Makefile  |   1 +
 libavcodec/aarch64/vc1dsp_init_aarch64.c |  14 +
 libavcodec/aarch64/vc1dsp_neon.S | 692 +++
 3 files changed, 707 insertions(+)
 create mode 100644 libavcodec/aarch64/vc1dsp_neon.S

diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile
index 954461f81d..5b25e4dfb9 100644
--- a/libavcodec/aarch64/Makefile
+++ b/libavcodec/aarch64/Makefile
@@ -48,6 +48,7 @@ NEON-OBJS-$(CONFIG_IDCTDSP) += aarch64/simple_idct_neon.o
 NEON-OBJS-$(CONFIG_MDCT)+= aarch64/mdct_neon.o
 NEON-OBJS-$(CONFIG_MPEGAUDIODSP)+= aarch64/mpegaudiodsp_neon.o
 NEON-OBJS-$(CONFIG_PIXBLOCKDSP) += aarch64/pixblockdsp_neon.o
+NEON-OBJS-$(CONFIG_VC1DSP)  += aarch64/vc1dsp_neon.o
 NEON-OBJS-$(CONFIG_VP8DSP)  += aarch64/vp8dsp_neon.o
 
 # decoders/encoders
diff --git a/libavcodec/aarch64/vc1dsp_init_aarch64.c b/libavcodec/aarch64/vc1dsp_init_aarch64.c
index 13dfd74940..8f96e4802d 100644
--- a/libavcodec/aarch64/vc1dsp_init_aarch64.c
+++ b/libavcodec/aarch64/vc1dsp_init_aarch64.c
@@ -25,6 +25,13 @@
 
 #include "config.h"
 
+void ff_vc1_v_loop_filter4_neon(uint8_t *src, ptrdiff_t stride, int pq);
+void ff_vc1_h_loop_filter4_neon(uint8_t *src, ptrdiff_t stride, int pq);
+void ff_vc1_v_loop_filter8_neon(uint8_t *src, ptrdiff_t stride, int pq);
+void ff_vc1_h_loop_filter8_neon(uint8_t *src, ptrdiff_t stride, int pq);
+void ff_vc1_v_loop_filter16_neon(uint8_t *src, ptrdiff_t stride, int pq);
+void ff_vc1_h_loop_filter16_neon(uint8_t *src, ptrdiff_t stride, int pq);
+
 void ff_put_vc1_chroma_mc8_neon(uint8_t *dst, uint8_t *src, ptrdiff_t stride,
 int h, int x, int y);
 void ff_avg_vc1_chroma_mc8_neon(uint8_t *dst, uint8_t *src, ptrdiff_t stride,
@@ -39,6 +46,13 @@ av_cold void ff_vc1dsp_init_aarch64(VC1DSPContext *dsp)
 int cpu_flags = av_get_cpu_flags();
 
 if (have_neon(cpu_flags)) {
+dsp->vc1_v_loop_filter4  = ff_vc1_v_loop_filter4_neon;
+dsp->vc1_h_loop_filter4  = ff_vc1_h_loop_filter4_neon;
+dsp->vc1_v_loop_filter8  = ff_vc1_v_loop_filter8_neon;
+dsp->vc1_h_loop_filter8  = ff_vc1_h_loop_filter8_neon;
+dsp->vc1_v_loop_filter16 = ff_vc1_v_loop_filter16_neon;
+dsp->vc1_h_loop_filter16 = ff_vc1_h_loop_filter16_neon;
+
 dsp->put_no_rnd_vc1_chroma_pixels_tab[0] = ff_put_vc1_chroma_mc8_neon;
 dsp->avg_no_rnd_vc1_chroma_pixels_tab[0] = ff_avg_vc1_chroma_mc8_neon;
 dsp->put_no_rnd_vc1_chroma_pixels_tab[1] = ff_put_vc1_chroma_mc4_neon;
diff --git a/libavcodec/aarch64/vc1dsp_neon.S b/libavcodec/aarch64/vc1dsp_neon.S
new file mode 100644
index 00..1ea9fa75ff
--- /dev/null
+++ b/libavcodec/aarch64/vc1dsp_neon.S
@@ -0,0 +1,692 @@
+/*
+ * VC1 AArch64 NEON optimisations
+ *
+ * Copyright (c) 2022 Ben Avison 
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY

[FFmpeg-devel] [PATCH v3 02/10] checkasm: Add vc1dsp inverse transform tests

2022-03-31 Thread Ben Avison

This test deliberately doesn't exercise the full range of inputs described in
the committee draft VC-1 standard. It says:

input coefficients in frequency domain, D, satisfy       -2048 <= D < 2047
intermediate coefficients, E, satisfy                    -4096 <= E < 4095
fully inverse-transformed coefficients, R, satisfy        -512 <= R <  511

For one thing, the inequalities look odd. Did they mean them to go the
other way round? That would make more sense because the equations generally
both add and subtract coefficients multiplied by constants, including powers
of 2. Requiring the most-negative values to be valid extends the number of
bits to represent the intermediate values just for the sake of that one case!
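
One way to see why the most-negative value is awkward, as a sketch only: the largest magnitude constant in the 8-point transform matrix used by this test is 16, and scaling -2048 by -16 lands just outside a signed 16-bit element, while the same scaling of 2047 or -2047 stays inside. The helper names below are hypothetical, and the real transform combines several such terms, so this shows only the single-product case.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if coeff * factor is representable in a signed 16-bit element. */
static int fits_int16(int coeff, int factor)
{
    long product = (long)coeff * factor;
    return product >= INT16_MIN && product <= INT16_MAX;
}

/* Self-check of the boundary cases discussed above. */
static void range_demo(void)
{
    assert(!fits_int16(-2048, -16));  /* +32768: one past INT16_MAX  */
    assert(fits_int16(-2048,  16));   /* -32768: exactly INT16_MIN   */
    assert(fits_int16( 2047,  16));   /* +32752                      */
    assert(fits_int16(-2047, -16));   /* +32752                      */
}
```

With a symmetric input range, every single product by a constant of magnitude 16 or less would fit in 16 bits.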

For another thing, the extreme values don't appear to occur in real streams -
both in my experience and as supported by the following comment in the AArch32
decoder:

tNhalf is half of the value of tN (as described in vc1_inv_trans_8x8_c).
This is done because sometimes files have input that causes tN + tM to
overflow. To avoid this overflow, we compute tNhalf, then compute
tNhalf + tM (which doesn't overflow), and then we use vhadd to compute
(tNhalf + (tNhalf + tM)) >> 1 which does not overflow because it is
one instruction.

My AArch64 decoder goes further than this. It calculates tNhalf and tM,
then uses an SRA (essentially a fused halve and add) to compute
(tN + tM) >> 1 without ever having to hold (tNhalf + tM) in a 16-bit
element, where it might overflow. It only encounters difficulties if either
tNhalf or tM overflows in isolation.
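
As a rough scalar model of the halve-and-accumulate idea (not the NEON code itself): since tN is 2 * tNhalf, (tN + tM) >> 1 equals tNhalf + (tM >> 1), and neither term needs more than 16 bits on its own. The names ssra16, wide_ref and ssra_demo are hypothetical, and the sketch assumes right-shifting a negative value is an arithmetic shift, as it is on the usual compilers.

```c
#include <assert.h>
#include <stdint.h>

/* One 16-bit lane of a shift-right-and-accumulate: acc + (x >> shift). */
static int16_t ssra16(int16_t acc, int16_t x, int shift)
{
    assert(shift >= 1 && shift <= 15);
    return (int16_t)(acc + (x >> shift));
}

/* Reference computed in 32 bits: (2 * tNhalf + tM) >> 1. */
static int32_t wide_ref(int16_t tNhalf, int16_t tM)
{
    return (2 * (int32_t)tNhalf + tM) >> 1;
}

/* Self-check: the intermediate tNhalf + tM would not fit in 16 bits here,
 * yet the fused form still matches the wide reference. */
static void ssra_demo(void)
{
    int16_t tNhalf = 14000, tM = 20000;
    assert((int32_t)tNhalf + tM > INT16_MAX);          /* 34000 overflows */
    assert(ssra16(tNhalf, tM, 1) == wide_ref(tNhalf, tM));
    assert(ssra16(-5, -3, 1) == wide_ref(-5, -3));     /* odd, negative   */
}
```

The identity is exact because tN is even, so flooring the half of tM alone loses nothing relative to flooring the half of the full sum.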

I haven't had sight of the final standard, so it's possible that these
issues were dealt with during finalisation, which could explain the lack
of usage of extreme inputs in real streams. Or a preponderance of decoders
that only support 16-bit intermediate values in their inverse transforms
might have caused encoders to steer clear of such cases.

I have effectively followed this approach in the test, and limited the
scale of the coefficients enough that both the existing AArch32 decoder
and my new AArch64 decoder pass.

Signed-off-by: Ben Avison 
---
 tests/checkasm/vc1dsp.c | 283 
 1 file changed, 283 insertions(+)

diff --git a/tests/checkasm/vc1dsp.c b/tests/checkasm/vc1dsp.c
index 2fd6c74d6c..7d4457306f 100644
--- a/tests/checkasm/vc1dsp.c
+++ b/tests/checkasm/vc1dsp.c
@@ -30,12 +30,208 @@
 #include "libavutil/mem_internal.h"
 
 #define VC1DSP_TEST(func) { #func, offsetof(VC1DSPContext, func) },
+#define VC1DSP_SIZED_TEST(func, width, height) { #func, offsetof(VC1DSPContext, func), width, height },
 
 typedef struct {
 const char *name;
 size_t offset;
+int width;
+int height;
 } test;
 
+typedef struct matrix {
+size_t width;
+size_t height;
+float d[];
+} matrix;
+
+static const matrix T8 = { 8, 8, {
+12,  12,  12,  12,  12,  12,  12,  12,
+16,  15,   9,   4,  -4,  -9, -15, -16,
+16,   6,  -6, -16, -16,  -6,   6,  16,
+15,  -4, -16,  -9,   9,  16,   4, -15,
+12, -12, -12,  12,  12, -12, -12,  12,
+ 9, -16,   4,  15, -15,  -4,  16,  -9,
+ 6, -16,  16,  -6,  -6,  16, -16,   6,
+ 4,  -9,  15, -16,  16, -15,   9,  -4
+} };
+
+static const matrix T4 = { 4, 4, {
+17,  17,  17,  17,
+22,  10, -10, -22,
+17, -17, -17,  17,
+10, -22,  22, -10
+} };
+
+static const matrix T8t = { 8, 8, {
+12,  16,  16,  15,  12,   9,   6,   4,
+12,  15,   6,  -4, -12, -16, -16,  -9,
+12,   9,  -6, -16, -12,   4,  16,  15,
+12,   4, -16,  -9,  12,  15,  -6, -16,
+12,  -4, -16,   9,  12, -15,  -6,  16,
+12,  -9,  -6,  16, -12,  -4,  16, -15,
+12, -15,   6,   4, -12,  16, -16,   9,
+12, -16,  16, -15,  12,  -9,   6,  -4
+} };
+
+static const matrix T4t = { 4, 4, {
+17,  22,  17,  10,
+17,  10, -17, -22,
+17, -10, -17,  22,
+17, -22,  17, -10
+} };
+
+static matrix *new_matrix(size_t width, size_t height)
+{
+matrix *out = av_mallocz(sizeof (matrix) + height * width * sizeof (float));
+if (out == NULL) {
+fprintf(stderr, "Memory allocation failure\n");
+exit(EXIT_FAILURE);
+}
+out->width = width;
+out->height = height;
+return out;
+}
+
+static matrix *multiply(const matrix *a, const matrix *b)
+{
+matrix *out;
+if (a->width != b->height) {
+fprintf(stderr, "Incompatible multiplication\n");
+exit(EXIT_FAILURE);
+}
+out = new_matrix(b->width, a->height);
+for (int j = 0; j < out->height; ++j)
+for (int i = 0; i < out->width; ++i) {
+float sum = 0;
+for (int k = 0; k < a->width; ++k)
+sum += a->d[j * a->width + k] * b->d[k * b->width + i];
+out->d[j * out->width + i] = sum;
+}
+return out;
+}
+
+static void normalise(matrix *a)
+{
+for (int j = 0; j < a->height; ++j)

[FFmpeg-devel] [PATCH v3 01/10] checkasm: Add vc1dsp in-loop deblocking filter tests

2022-03-31 Thread Ben Avison

Note that the benchmarking results for these functions are highly dependent
upon the input data. Therefore, each function is benchmarked twice,
corresponding to the best and worst case complexity of the reference C
implementation. The performance of a real stream decode will fall somewhere
between these two extremes.

Signed-off-by: Ben Avison 
---
 tests/checkasm/Makefile   |   1 +
 tests/checkasm/checkasm.c |   3 ++
 tests/checkasm/checkasm.h |   1 +
 tests/checkasm/vc1dsp.c   | 102 ++
 tests/fate/checkasm.mak   |   1 +
 5 files changed, 108 insertions(+)
 create mode 100644 tests/checkasm/vc1dsp.c

diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
index f768b1144e..7133a6ee66 100644
--- a/tests/checkasm/Makefile
+++ b/tests/checkasm/Makefile
@@ -11,6 +11,7 @@ AVCODECOBJS-$(CONFIG_H264PRED)  += h264pred.o
 AVCODECOBJS-$(CONFIG_H264QPEL)  += h264qpel.o
 AVCODECOBJS-$(CONFIG_LLVIDDSP)  += llviddsp.o
 AVCODECOBJS-$(CONFIG_LLVIDENCDSP)   += llviddspenc.o
+AVCODECOBJS-$(CONFIG_VC1DSP)+= vc1dsp.o
 AVCODECOBJS-$(CONFIG_VP8DSP)+= vp8dsp.o
 AVCODECOBJS-$(CONFIG_VIDEODSP)  += videodsp.o
 
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 748d6a9f3a..c2efd81b6d 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -147,6 +147,9 @@ static const struct {
 #if CONFIG_V210_ENCODER
 { "v210enc", checkasm_check_v210enc },
 #endif
+#if CONFIG_VC1DSP
+{ "vc1dsp", checkasm_check_vc1dsp },
+#endif
 #if CONFIG_VP8DSP
 { "vp8dsp", checkasm_check_vp8dsp },
 #endif
diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
index c3192d8c23..52ab18a5b1 100644
--- a/tests/checkasm/checkasm.h
+++ b/tests/checkasm/checkasm.h
@@ -78,6 +78,7 @@ void checkasm_check_sw_scale(void);
 void checkasm_check_utvideodsp(void);
 void checkasm_check_v210dec(void);
 void checkasm_check_v210enc(void);
+void checkasm_check_vc1dsp(void);
 void checkasm_check_vf_eq(void);
 void checkasm_check_vf_gblur(void);
 void checkasm_check_vf_hflip(void);
diff --git a/tests/checkasm/vc1dsp.c b/tests/checkasm/vc1dsp.c
new file mode 100644
index 00..2fd6c74d6c
--- /dev/null
+++ b/tests/checkasm/vc1dsp.c
@@ -0,0 +1,102 @@
+/*
+ * Copyright (c) 2022 Ben Avison
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include 
+
+#include "checkasm.h"
+
+#include "libavcodec/vc1dsp.h"
+
+#include "libavutil/common.h"
+#include "libavutil/internal.h"
+#include "libavutil/intreadwrite.h"
+#include "libavutil/mem_internal.h"
+
+#define VC1DSP_TEST(func) { #func, offsetof(VC1DSPContext, func) },
+
+typedef struct {
+const char *name;
+size_t offset;
+} test;
+
+#define RANDOMIZE_BUFFER8_MID_WEIGHTED(name, size)  \
+do {\
+uint8_t *p##0 = name##0, *p##1 = name##1;   \
+int i = (size); \
+while (i-- > 0) {   \
+int x = 0x80 | (rnd() & 0x7F);  \
+x >>= rnd() % 9;\
+if (rnd() & 1)  \
+x = -x; \
+*p##1++ = *p##0++ = 0x80 + x;   \
+}   \
+} while (0)
+
+static void check_loop_filter(void)
+{
+/* Deblocking filter buffers are big enough to hold a 16x16 block,
+ * plus 16 columns left and 4 rows above to hold filter inputs
+ * (depending on whether v or h neighbouring block edge, oversized
+ * horizontally to maintain 16-byte alignment) plus 16 columns and
+ * 4 rows below to catch write overflows */
+LOCAL_ALIGNED_16(uint8_t, filter_buf0, [24 * 48]);
+LOCAL_ALIGNED_16(uint8_t, filter_buf1, [24 * 48]);
+
+VC1DSPContext h;
+
+const test tests[] = {
+VC1DSP_TEST(vc1_v_loop_filter4)
+VC1DSP_TEST(vc1_h_loop_filter4)
+VC1DSP_TEST(vc1_v_loop_filter8)
+VC1DSP_TEST(vc1_h_loop_filter8)
+VC1DSP_TEST(vc1_v_loop_filter16)
+VC1DSP_TEST(vc1_h_loop_filter16)
+};
+
+ff_vc1dsp_init();
+
+for (size_t t = 0; t < FF_ARRAY_ELEMS(tests); ++t) {
+void

[FFmpeg-devel] [PATCH v3 00/10] avcodec/vc1: Arm optimisations

2022-03-31 Thread Ben Avison

The VC1 decoder was missing lots of important fast paths for Arm, especially
for 64-bit Arm. This submission fills in implementations for all functions
where a fast path already existed and the fallback C implementation was
taking 1% or more of the runtime, and adds a new fast path to permit
vc1_unescape_buffer() to be overridden.

I've measured the playback speed on a 1.5 GHz Cortex-A72 (Raspberry Pi 4)
using `ffmpeg -i  -f null -` for a couple of example streams:

Architecture:  AArch32  AArch32  AArch64  AArch64
Stream:        1        2        1        2
Before speed:  1.22x    0.82x    1.00x    0.67x
After speed:   1.31x    0.98x    1.39x    1.06x
Improvement:   7.4%     20%      39%      58%
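
The "Improvement" row appears to be the ratio of the two speed rows expressed as a percentage gain. A minimal sketch of that arithmetic follows; the helper names are illustrative and not part of the patch set.

```c
#include <stddef.h>
#include <stdio.h>

/* Relative gain, in percent, of `after` over `before`. */
static double improvement_percent(double before, double after)
{
    return (after / before - 1.0) * 100.0;
}

/* Print the gain for each column of the table, using the speeds quoted there. */
static void print_improvements(void)
{
    static const struct {
        const char *label;
        double before, after;
    } runs[] = {
        { "AArch32, stream 1", 1.22, 1.31 },
        { "AArch32, stream 2", 0.82, 0.98 },
        { "AArch64, stream 1", 1.00, 1.39 },
        { "AArch64, stream 2", 0.67, 1.06 },
    };

    for (size_t i = 0; i < sizeof(runs) / sizeof(runs[0]); i++)
        printf("%-18s %5.1f%%\n", runs[i].label,
               improvement_percent(runs[i].before, runs[i].after));
}
```

Calling print_improvements() should give figures that round to the 7.4%, 20%, 39% and 58% quoted in the table.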

`make fate` passes on both AArch32 and AArch64.

Changes in v2:

* Refactor checkasm tests to convert some macros into functions.
* Remove cast-to-void of checked_call.
* Limit 16-bit values in idctdsp checkasm test to +/-0x100.
* Reinstate ff_add_pixels_clamped_arm.
* Adapt vc1 deblocking filters to specify stride as ptrdiff_t.
* Add align specifiers to a few VLD/VST instructions for AArch32 deblocking
  filter, and adapt checkasm test not to test with tighter alignment than is
  encountered in normal use.
* Correct unescape buffer memcmp length.
* Update benchmarks for AArch64 idctdsp.

Ben Avison (10):
  checkasm: Add vc1dsp in-loop deblocking filter tests
  checkasm: Add vc1dsp inverse transform tests
  checkasm: Add idctdsp add/put-pixels-clamped tests
  avcodec/vc1: Introduce fast path for unescaping bitstream buffer
  avcodec/vc1: Arm 64-bit NEON deblocking filter fast paths
  avcodec/vc1: Arm 32-bit NEON deblocking filter fast paths
  avcodec/vc1: Arm 64-bit NEON inverse transform fast paths
  avcodec/idctdsp: Arm 64-bit NEON block add and clamp fast paths
  avcodec/vc1: Arm 64-bit NEON unescape fast path
  avcodec/vc1: Arm 32-bit NEON unescape fast path

 libavcodec/aarch64/Makefile   |4 +-
 libavcodec/aarch64/idctdsp_init_aarch64.c |   26 +-
 libavcodec/aarch64/idctdsp_neon.S |  130 ++
 libavcodec/aarch64/vc1dsp_init_aarch64.c  |   94 ++
 libavcodec/aarch64/vc1dsp_neon.S  | 1546 +
 libavcodec/arm/vc1dsp_init_neon.c |   75 +
 libavcodec/arm/vc1dsp_neon.S  |  761 ++
 libavcodec/vc1dec.c   |   20 +-
 libavcodec/vc1dsp.c   |2 +
 libavcodec/vc1dsp.h   |3 +
 tests/checkasm/Makefile   |2 +
 tests/checkasm/checkasm.c |6 +
 tests/checkasm/checkasm.h |2 +
 tests/checkasm/idctdsp.c  |   98 ++
 tests/checkasm/vc1dsp.c   |  452 ++
 tests/fate/checkasm.mak   |2 +
 16 files changed, 3204 insertions(+), 19 deletions(-)
 create mode 100644 libavcodec/aarch64/idctdsp_neon.S
 create mode 100644 libavcodec/aarch64/vc1dsp_neon.S
 create mode 100644 tests/checkasm/idctdsp.c
 create mode 100644 tests/checkasm/vc1dsp.c

-- 
2.25.1

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH 08/10] avcodec/idctdsp: Arm 64-bit NEON block add and clamp fast paths

2022-03-31 Thread Ben Avison


On 30/03/2022 15:14, Martin Storsjö wrote:

On Fri, 25 Mar 2022, Ben Avison wrote:
+// Clamp 16-bit signed block coefficients to signed 8-bit (biased by 128)

+// On entry:
+//   x0 -> array of 64x 16-bit coefficients
+//   x1 -> 8-bit results
+//   x2 = row stride for results, bytes
+function ff_put_signed_pixels_clamped_neon, export=1
+    ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [x0], #64
+    movi    v4.8b, #128
+    ld1 {v16.16b, v17.16b, v18.16b, v19.16b}, [x0]
+    sqxtn   v0.8b, v0.8h
+    sqxtn   v1.8b, v1.8h
+    sqxtn   v2.8b, v2.8h
+    sqxtn   v3.8b, v3.8h
+    sqxtn   v5.8b, v16.8h
+    add v0.8b, v0.8b, v4.8b


Here you could save 4 add instructions with sqxtn2 and adding .16b 
vectors, but I'm not sure if it's worthwhile. (It reduces the checkasm 
numbers by 0.7 for Cortex A72, by 0.3 for A73, but increases the runtime 
by 1.0 on A53.) Strangely enough, I get much smaller numbers on my A72 
than you got.


That's weird. As you say, it should be independent of clock-frequency. 
FWIW, I'm benchmarking on a Raspberry Pi 4; I'd assume all its board 
variants' Cortex-A72 cores are of identical revision.


Now I run it again, I'm getting these figures:

idctdsp.add_pixels_clamped_c: 313.3
idctdsp.add_pixels_clamped_neon: 24.3
idctdsp.put_pixels_clamped_c: 220.3
idctdsp.put_pixels_clamped_neon: 15.5
idctdsp.put_signed_pixels_clamped_c: 210.5
idctdsp.put_signed_pixels_clamped_neon: 19.5

which is more in line with what you see! I am getting a lot of 
variability between runs though - from a small sample, I'm seeing 
add_pixels_clamped_neon coming out as anything from 21 to 30, which is 
well above the sort of differences you're seeing between alternate 
implementations.


This sort of case is always going to be difficult to schedule optimally 
for multiple cores - factors like how much dual-issuing is possible, 
latency before values can be used, load speed and the granularity of 
scoreboarding parts of vectors, all vary widely.


In the case of the Cortex-A72, the critical path goes
ld1 of first 16 bytes -> sqxtn:  5 cycles
sqxtn -> add:4 cycles
add -> st1 of first 8 bytes: 3 cycles

It then bangs out one store per cycle, a total of 8. Everything else can 
largely be fitted in around this - so for example, other than I-cache 
usage, there shouldn't be a disadvantage to the adds being non-Q-form as 
they should dual-issue with the sqxtns and st2s - you'll notice I have 
them alternating.


I'd have expected anything interfering with this (such as by updating 
half the vector input required by any Q-form add) to slow things down.
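
For readers following the instruction sequence discussed above, here is a rough scalar model of what the routine computes, assuming the behaviour described in the quoted comment: each 16-bit coefficient is saturated to signed 8-bit (the sqxtn step) and then biased by 128 (the add step). The function names are made up for illustration and are not the FFmpeg C reference.

```c
#include <stddef.h>
#include <stdint.h>

/* Saturate a 16-bit coefficient to the signed 8-bit range,
 * which is what sqxtn does for each lane. */
static int8_t saturate_s8(int16_t v)
{
    if (v < -128) return -128;
    if (v >  127) return  127;
    return (int8_t)v;
}

/* Scalar model: 64 coefficients in, 8 rows of 8 biased pixels out.
 * `stride` is the distance in bytes between output rows. */
static void put_signed_pixels_clamped_model(const int16_t *block,
                                            uint8_t *pixels,
                                            ptrdiff_t stride)
{
    for (int row = 0; row < 8; row++) {
        for (int col = 0; col < 8; col++)
            pixels[col] = (uint8_t)(saturate_s8(block[col]) + 128);
        block  += 8;
        pixels += stride;
    }
}
```

The bias maps the signed range -128..127 onto 0..255, so the 8-bit add in the NEON version can wrap without losing information.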


Ben

Re: [FFmpeg-devel] [PATCH v10 1/1] avformat: Add IPFS protocol support.

2022-03-31 Thread Mark Gaiser

On Wed, Mar 30, 2022 at 5:16 PM Mark Gaiser  wrote:

>
>
> On Wed, Mar 30, 2022 at 3:57 PM Andreas Rheinhardt <
> andreas.rheinha...@outlook.com> wrote:
>
>> Mark Gaiser:
>> > On Wed, Mar 30, 2022 at 2:21 PM Andreas Rheinhardt <
>> > andreas.rheinha...@outlook.com> wrote:
>> >
>> >> Mark Gaiser:
>> >>> This patch adds support for:
>> >>> - ffplay ipfs://
>> >>> - ffplay ipns://
>> >>>
>> >>> IPFS data can be played from so called "ipfs gateways".
>> >>> A gateway is essentially a webserver that gives access to the
>> >>> distributed IPFS network.
>> >>>
>> >>> This protocol support (ipfs and ipns) therefore translates
>> >>> ipfs:// and ipns:// to a http:// url. This resulting url is
>> >>> then handled by the http protocol. It could also be https
>> >>> depending on the gateway provided.
>> >>>
>> >>> To use this protocol, a gateway must be provided.
>> >>> If you do nothing it will try to find it in your
>> >>> $HOME/.ipfs/gateway file. The ways to set it manually are:
>> >>> 1. Define a -gateway  to the gateway.
>> >>> 2. Define $IPFS_GATEWAY with the full http link to the gateway.
>> >>> 3. Define $IPFS_PATH and point it to the IPFS data path.
>> >>> 4. Have IPFS running in your local user folder (under $HOME/.ipfs).
>> >>>
>> >>> Signed-off-by: Mark Gaiser 
>> >>> ---
>> >>>  configure |   2 +
>> >>>  doc/protocols.texi|  30 
>> >>>  libavformat/Makefile  |   2 +
>> >>>  libavformat/ipfsgateway.c | 309
>> ++
>> >>>  libavformat/protocols.c   |   2 +
>> >>>  5 files changed, 345 insertions(+)
>> >>>  create mode 100644 libavformat/ipfsgateway.c
>> >>>
>> >>> diff --git a/configure b/configure
>> >>> index e4d36aa639..55af90957a 100755
>> >>> --- a/configure
>> >>> +++ b/configure
>> >>> @@ -3579,6 +3579,8 @@ udp_protocol_select="network"
>> >>>  udplite_protocol_select="network"
>> >>>  unix_protocol_deps="sys_un_h"
>> >>>  unix_protocol_select="network"
>> >>> +ipfs_protocol_select="https_protocol"
>> >>> +ipns_protocol_select="https_protocol"
>> >>>
>> >>>  # external library protocols
>> >>>  libamqp_protocol_deps="librabbitmq"
>> >>> diff --git a/doc/protocols.texi b/doc/protocols.texi
>> >>> index d207df0b52..7c9c0a4808 100644
>> >>> --- a/doc/protocols.texi
>> >>> +++ b/doc/protocols.texi
>> >>> @@ -2025,5 +2025,35 @@ decoding errors.
>> >>>
>> >>>  @end table
>> >>>
>> >>> +@section ipfs
>> >>> +
>> >>> +InterPlanetary File System (IPFS) protocol support. One can access
>> >> files stored
>> >>> +on the IPFS network through so called gateways. Those are http(s)
>> >> endpoints.
>> >>> +This protocol wraps the IPFS native protocols (ipfs:// and ipns://)
>> to
>> >> be send
>> >>> +to such a gateway. Users can (and should) host their own node which
>> >> means this
>> >>> +protocol will use your local machine gateway to access files on the
>> >> IPFS network.
>> >>> +
>> >>> +If a user doesn't have a node of their own then the public gateway
>> >> dweb.link is
>> >>> +used by default.
>> >>> +
>> >>> +You can use this protocol in 2 ways. Using IPFS:
>> >>> +@example
>> >>> +ffplay ipfs://QmbGtJg23skhvFmu9mJiePVByhfzu5rwo74MEkVDYAmF5T
>> >>> +@end example
>> >>> +
>> >>> +Or the IPNS protocol (IPNS is mutable IPFS):
>> >>> +@example
>> >>> +ffplay ipns://QmbGtJg23skhvFmu9mJiePVByhfzu5rwo74MEkVDYAmF5T
>> >>> +@end example
>> >>> +
>> >>> +You can also change the gateway to be used:
>> >>> +
>> >>> +@table @option
>> >>> +
>> >>> +@item gateway
>> >>> +Defines the gateway to use. When nothing is provided the protocol
>> will
>> >> first try
>> >>> +your local gateway. If that fails dweb.link will be used.
>> >>> +
>> >>> +@end table
>> >>>
>> >>>  @c man end PROTOCOLS
>> >>> diff --git a/libavformat/Makefile b/libavformat/Makefile
>> >>> index d7182d6bd8..e3233fd7ac 100644
>> >>> --- a/libavformat/Makefile
>> >>> +++ b/libavformat/Makefile
>> >>> @@ -660,6 +660,8 @@ OBJS-$(CONFIG_SRTP_PROTOCOL) +=
>> >> srtpproto.o srtp.o
>> >>>  OBJS-$(CONFIG_SUBFILE_PROTOCOL)  += subfile.o
>> >>>  OBJS-$(CONFIG_TEE_PROTOCOL)  += teeproto.o tee_common.o
>> >>>  OBJS-$(CONFIG_TCP_PROTOCOL)  += tcp.o
>> >>> +OBJS-$(CONFIG_IPFS_PROTOCOL) += ipfsgateway.o
>> >>> +OBJS-$(CONFIG_IPNS_PROTOCOL) += ipfsgateway.o
>> >>>  TLS-OBJS-$(CONFIG_GNUTLS)+= tls_gnutls.o
>> >>>  TLS-OBJS-$(CONFIG_LIBTLS)+= tls_libtls.o
>> >>>  TLS-OBJS-$(CONFIG_MBEDTLS)   += tls_mbedtls.o
>> >>> diff --git a/libavformat/ipfsgateway.c b/libavformat/ipfsgateway.c
>> >>> new file mode 100644
>> >>> index 00..1a039589c0
>> >>> --- /dev/null
>> >>> +++ b/libavformat/ipfsgateway.c
>> >>> @@ -0,0 +1,309 @@
>> >>> +/*
>> >>> + * IPFS and IPNS protocol support through IPFS Gateway.
>> >>> + * Copyright (c) 2022 Mark Gaiser
>> >>> + *
>> >>> + * This file is part of FFmpeg.
>> >>> + *
>> >>> + * FFmpeg is free software; you can redistribute it and/or
>> >>> + * modify it

Re: [FFmpeg-devel] [PATCH 07/10] avcodec/vc1: Arm 64-bit NEON inverse transform fast paths

2022-03-31 Thread Ben Avison


On 30/03/2022 14:49, Martin Storsjö wrote:
Looks generally reasonable. Is it possible to factorize out the 
individual transforms (so that you'd e.g. invoke the same macro twice in 
the 8x8 and 4x4 functions) without too much loss?


There is a close analogy here with the vertical/horizontal deblocking 
filters, because while there are similarities between the two matrix 
multiplications within a transform, one of them follows a series of 
loads and the other follows a matrix transposition.


If you look for example at ff_vc1_inv_trans_8x8_neon, you'll see I was 
able to do a fair amount of overlap between sections of the function - 
particularly between the transpose and the second matrix multiplication, 
but to a lesser extent between the loads and the first matrix 
multiplication and between the second multiplication and the stores. 
This sort of overlapping is tricky to maintain when using macros. Also, 
it means the order of operations within each matrix multiply ended 
up quite different.


At first sight, you might think that the multiplies from the 8x8 
function (which you might also view as a kind of 8-tap filter) would be 
re-usable for the size-8 multiplies in the 8x4 or 4x8 function. Yes, the 
instructions are similar, save for using .4h elements rather than .8h 
elements, but that has significant impacts on scheduling. For example, 
the Cortex-A72, which is my primary target, can only do NEON bit-shifts 
in one pipeline at once, irrespective of whether the vectors are 64-bit 
or 128-bit long, while other instructions don't have such restrictions.


So while in theory you could factor some of this code out more, I 
suspect any attempt to do so would have a detrimental effect on performance.


Ben

Re: [FFmpeg-devel] [PATCH] libavutil/hwcontext_vaapi: Re-enable support for libva v1

2022-03-31 Thread Xiang, Haihao

On Thu, 2022-03-31 at 14:58 +, Xiang, Haihao wrote:
> On Tue, 2022-03-29 at 14:37 +, Xiang, Haihao wrote:
> > On Fri, 2022-03-11 at 13:24 +0100, Ingo Brückl wrote:
> > > Commit e050959103f375e6494937fa28ef2c4d2d15c9ef implemented passing in
> > > modifiers by using the PRIME_2 memory type, which only exists in v2 of
> > > the library.
> > > 
> > > To still support v1 of the library, conditionally compile using
> > > VA_CHECK_VERSION() for both the new code and the old code before
> > > the commit.
> > > ---
> > >  libavutil/hwcontext_vaapi.c | 57 -
> > >  1 file changed, 56 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/libavutil/hwcontext_vaapi.c b/libavutil/hwcontext_vaapi.c
> > > index 994b744e4d..799490442e 100644
> > > --- a/libavutil/hwcontext_vaapi.c
> > > +++ b/libavutil/hwcontext_vaapi.c
> > > @@ -1026,7 +1026,12 @@ static void vaapi_unmap_from_drm(AVHWFramesContext
> > > *dst_fc,
> > >  static int vaapi_map_from_drm(AVHWFramesContext *src_fc, AVFrame *dst,
> > >const AVFrame *src, int flags)
> > >  {
> > > +#if VA_CHECK_VERSION(2, 0, 0)
> > >  VAAPIFramesContext *src_vafc = src_fc->internal->priv;
> > > +int use_prime2;
> > > +#else
> > > +int k;
> > > +#endif
> > >  AVHWFramesContext  *dst_fc =
> > >  (AVHWFramesContext*)dst->hw_frames_ctx->data;
> > >  AVVAAPIDeviceContext  *dst_dev = dst_fc->device_ctx->hwctx;
> > > @@ -1034,10 +1039,28 @@ static int vaapi_map_from_drm(AVHWFramesContext
> > > *src_fc, AVFrame *dst,
> > >  const VAAPIFormatDescriptor *format_desc;
> > >  VASurfaceID surface_id;
> > >  VAStatus vas = VA_STATUS_SUCCESS;
> > > -int use_prime2;
> > >  uint32_t va_fourcc;
> > >  int err, i, j;
> > >  
> > > +#if !VA_CHECK_VERSION(2, 0, 0)
> > > +unsigned long buffer_handle;
> > > +VASurfaceAttribExternalBuffers buffer_desc;
> > > +VASurfaceAttrib attrs[2] = {
> > > +{
> > > +.type  = VASurfaceAttribMemoryType,
> > > +.flags = VA_SURFACE_ATTRIB_SETTABLE,
> > > +.value.type= VAGenericValueTypeInteger,
> > > +.value.value.i = VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME,
> > > +},
> > > +{
> > > +.type  = VASurfaceAttribExternalBufferDescriptor,
> > > +.flags = VA_SURFACE_ATTRIB_SETTABLE,
> > > +.value.type= VAGenericValueTypePointer,
> > > +.value.value.p = &buffer_desc,
> > > +}
> > > +};
> > > +#endif
> > > +
> > >  desc = (AVDRMFrameDescriptor*)src->data[0];
> > >  
> > >  if (desc->nb_objects != 1) {
> > > @@ -1072,6 +1095,7 @@ static int vaapi_map_from_drm(AVHWFramesContext
> > > *src_fc,
> > > AVFrame *dst,
> > >  format_desc = vaapi_format_from_fourcc(va_fourcc);
> > >  av_assert0(format_desc);
> > >  
> > > +#if VA_CHECK_VERSION(2, 0, 0)
> > >  use_prime2 = !src_vafc->prime_2_import_unsupported &&
> > >   desc->objects[0].format_modifier !=
> > > DRM_FORMAT_MOD_INVALID;
> > >  if (use_prime2) {
> > > @@ -1183,6 +1207,37 @@ static int vaapi_map_from_drm(AVHWFramesContext
> > > *src_fc, AVFrame *dst,
> > > &surface_id, 1,
> > > buffer_attrs,
> > > FF_ARRAY_ELEMS(buffer_attrs));
> > >  }
> > > +#else
> > > +buffer_handle = desc->objects[0].fd;
> > > +buffer_desc.pixel_format = va_fourcc;
> > > +buffer_desc.width= src_fc->width;
> > > +buffer_desc.height   = src_fc->height;
> > > +buffer_desc.data_size= desc->objects[0].size;
> > > +buffer_desc.buffers  = &buffer_handle;
> > > +buffer_desc.num_buffers  = 1;
> > > +buffer_desc.flags= 0;
> > > +
> > > +k = 0;
> > > +for (i = 0; i < desc->nb_layers; i++) {
> > > +for (j = 0; j < desc->layers[i].nb_planes; j++) {
> > > +buffer_desc.pitches[k] = desc->layers[i].planes[j].pitch;
> > > +buffer_desc.offsets[k] = desc->layers[i].planes[j].offset;
> > > +++k;
> > > +}
> > > +}
> > > +buffer_desc.num_planes = k;
> > > +
> > > +if (format_desc->chroma_planes_swapped &&
> > > +buffer_desc.num_planes == 3) {
> > > +FFSWAP(uint32_t, buffer_desc.pitches[1], buffer_desc.pitches[2]);
> > > +FFSWAP(uint32_t, buffer_desc.offsets[1], buffer_desc.offsets[2]);
> > > +}
> > > +
> > > +vas = vaCreateSurfaces(dst_dev->display, format_desc->rt_format,
> > > +   src->width, src->height,
> > > +   &surface_id, 1,
> > > +   attrs, FF_ARRAY_ELEMS(attrs));
> > > +#endif
> > >  if (vas != VA_STATUS_SUCCESS) {
> > >  av_log(dst_fc, AV_LOG_ERROR, "Failed to create surface from DRM "
> > > "object: %d (%s).\n", vas, vaErrorStr(vas));
> > 
> > LGTM, will apply
> 
> I'm sorry I didn't notice you are using `VA_CHECK_VERSION(2, 0, 0)`. PRIME_2
> is
>

Re: [FFmpeg-devel] [PATCH 05/10] avcodec/vc1: Arm 64-bit NEON deblocking filter fast paths

2022-03-31 Thread Ben Avison


On 30/03/2022 13:35, Martin Storsjö wrote:

Overall, the code looks sensible to me.

Would it make sense to share the core of the filter between the 
horizontal/vertical cases with e.g. a macro? (I didn't check in detail 
if there's much differences in the core of the filter. At most some 
differences in condition registers for partial writeout in the 
horizontal forms?)


Well, looking at the comments at the right-hand side of the source, 
which give the logical meaning of the results of each instruction, I 
admit there's a resemblance in the middle of the 8-pixel-pair function. 
However, the physical register assignments are quite different, and 
attempting to reassign the registers in one to match the other isn't a 
trivial task. It's hard enough when you start register assignment from 
the top of a function and work your way down, as I have done here.


In the 16-pixel-pair case, the fact that the input values arrive in a 
different order as the result of them, in one case, being loaded in 
regularly-increasing address order, and in the other, falling out of a 
matrix transposition, has resulted in even the logical order of 
instructions being quite different in the two cases.


In the 4-pixel-pair case, the values are packed differently into 
registers in the two cases, because in the v case, we're loading 4 
pixels between row-strides, which means it's easy to place each row in 
its own vector, whereas in the h case we load 4 rows of 8 pixels each 
and transpose, which leaves the values in 4 vectors rather than 8. Some 
of the filtering steps can be performed with the data packed in this way 
(calculating a1 and a2) while waiting for it to be restructured in order 
to calculate the other metrics, but it's not worth packing the data 
together in this way in the v case given that it starts off already 
separated. So the two implementations end up quite different in the 
operations they perform, not just the scheduling of instructions and in 
register assignment terms.


Some background: as you may have guessed, I didn't start out writing 
these functions as they currently appear. Prototype versions didn't care 
much for scheduling or keeping to a small number of registers. They were 
primarily for checking the correctness of the mathematics, and they'd 
use all available vectors, sometimes shuffling values between registers 
or to the stack to make room. Once I'd verified correctness, I then 
reworked them to keep to a minimal number of registers and to minimise 
stalls as far as possible.


I'm targeting the Cortex-A72, since that's what the Raspberry Pi 4 uses 
and it's on the cusp of having enough power to decode VC-1 BluRay 
streams, so I deliberately didn't take too much consideration of the 
requirements of earlier cores. Yes, it's an out-of-order core, but I 
reckoned there are probably limits to how wisely it can select 
instructions to execute (there have got to be limits to instruction 
queue lengths, for example). So based on the pipeline structure 
documented in Arm's Cortex-A72 software optimization guide, I arranged 
the instructions to best keep all pipelines busy as much as possible, 
then assigned registers to keep the instructions in this order.


For the most part, I was able to keep the number of vectors used low 
enough that no callee-saving was required - or failing that, at least 
avoiding having to spill values to the stack mid-function. But it came 
pretty close at times - witness for example the peculiar order in which 
vectors had to be loaded in the AArch32 version of 
ff_vc1_h_loop_filter16_neon. There's reason behind that!


In short, I'd really rather not tamper with these larger assembly 
functions any more unless I really have to.


Ben

Re: [FFmpeg-devel] [PATCH] libavutil/hwcontext_vaapi: Re-enable support for libva v1

2022-03-31 Thread Xiang, Haihao

On Tue, 2022-03-29 at 14:37 +, Xiang, Haihao wrote:
> On Fri, 2022-03-11 at 13:24 +0100, Ingo Brückl wrote:
> > Commit e050959103f375e6494937fa28ef2c4d2d15c9ef implemented passing in
> > modifiers by using the PRIME_2 memory type, which only exists in v2 of
> > the library.
> > 
> > To still support v1 of the library, conditionally compile using
> > VA_CHECK_VERSION() for both the new code and the old code before
> > the commit.
> > ---
> >  libavutil/hwcontext_vaapi.c | 57 -
> >  1 file changed, 56 insertions(+), 1 deletion(-)
> > 
> > diff --git a/libavutil/hwcontext_vaapi.c b/libavutil/hwcontext_vaapi.c
> > index 994b744e4d..799490442e 100644
> > --- a/libavutil/hwcontext_vaapi.c
> > +++ b/libavutil/hwcontext_vaapi.c
> > @@ -1026,7 +1026,12 @@ static void vaapi_unmap_from_drm(AVHWFramesContext
> > *dst_fc,
> >  static int vaapi_map_from_drm(AVHWFramesContext *src_fc, AVFrame *dst,
> >const AVFrame *src, int flags)
> >  {
> > +#if VA_CHECK_VERSION(2, 0, 0)
> >  VAAPIFramesContext *src_vafc = src_fc->internal->priv;
> > +int use_prime2;
> > +#else
> > +int k;
> > +#endif
> >  AVHWFramesContext  *dst_fc =
> >  (AVHWFramesContext*)dst->hw_frames_ctx->data;
> >  AVVAAPIDeviceContext  *dst_dev = dst_fc->device_ctx->hwctx;
> > @@ -1034,10 +1039,28 @@ static int vaapi_map_from_drm(AVHWFramesContext
> > *src_fc, AVFrame *dst,
> >  const VAAPIFormatDescriptor *format_desc;
> >  VASurfaceID surface_id;
> >  VAStatus vas = VA_STATUS_SUCCESS;
> > -int use_prime2;
> >  uint32_t va_fourcc;
> >  int err, i, j;
> >  
> > +#if !VA_CHECK_VERSION(2, 0, 0)
> > +unsigned long buffer_handle;
> > +VASurfaceAttribExternalBuffers buffer_desc;
> > +VASurfaceAttrib attrs[2] = {
> > +{
> > +.type  = VASurfaceAttribMemoryType,
> > +.flags = VA_SURFACE_ATTRIB_SETTABLE,
> > +.value.type= VAGenericValueTypeInteger,
> > +.value.value.i = VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME,
> > +},
> > +{
> > +.type  = VASurfaceAttribExternalBufferDescriptor,
> > +.flags = VA_SURFACE_ATTRIB_SETTABLE,
> > +.value.type= VAGenericValueTypePointer,
> > +.value.value.p = &buffer_desc,
> > +}
> > +};
> > +#endif
> > +
> >  desc = (AVDRMFrameDescriptor*)src->data[0];
> >  
> >  if (desc->nb_objects != 1) {
> > @@ -1072,6 +1095,7 @@ static int vaapi_map_from_drm(AVHWFramesContext
> > *src_fc,
> > AVFrame *dst,
> >  format_desc = vaapi_format_from_fourcc(va_fourcc);
> >  av_assert0(format_desc);
> >  
> > +#if VA_CHECK_VERSION(2, 0, 0)
> >  use_prime2 = !src_vafc->prime_2_import_unsupported &&
> >   desc->objects[0].format_modifier !=
> > DRM_FORMAT_MOD_INVALID;
> >  if (use_prime2) {
> > @@ -1183,6 +1207,37 @@ static int vaapi_map_from_drm(AVHWFramesContext
> > *src_fc, AVFrame *dst,
> > &surface_id, 1,
> > buffer_attrs, FF_ARRAY_ELEMS(buffer_attrs));
> >  }
> > +#else
> > +buffer_handle = desc->objects[0].fd;
> > +buffer_desc.pixel_format = va_fourcc;
> > +buffer_desc.width= src_fc->width;
> > +buffer_desc.height   = src_fc->height;
> > +buffer_desc.data_size= desc->objects[0].size;
> > +buffer_desc.buffers  = &buffer_handle;
> > +buffer_desc.num_buffers  = 1;
> > +buffer_desc.flags= 0;
> > +
> > +k = 0;
> > +for (i = 0; i < desc->nb_layers; i++) {
> > +for (j = 0; j < desc->layers[i].nb_planes; j++) {
> > +buffer_desc.pitches[k] = desc->layers[i].planes[j].pitch;
> > +buffer_desc.offsets[k] = desc->layers[i].planes[j].offset;
> > +++k;
> > +}
> > +}
> > +buffer_desc.num_planes = k;
> > +
> > +if (format_desc->chroma_planes_swapped &&
> > +buffer_desc.num_planes == 3) {
> > +FFSWAP(uint32_t, buffer_desc.pitches[1], buffer_desc.pitches[2]);
> > +FFSWAP(uint32_t, buffer_desc.offsets[1], buffer_desc.offsets[2]);
> > +}
> > +
> > +vas = vaCreateSurfaces(dst_dev->display, format_desc->rt_format,
> > +   src->width, src->height,
> > +   &surface_id, 1,
> > +   attrs, FF_ARRAY_ELEMS(attrs));
> > +#endif
> >  if (vas != VA_STATUS_SUCCESS) {
> >  av_log(dst_fc, AV_LOG_ERROR, "Failed to create surface from DRM "
> > "object: %d (%s).\n", vas, vaErrorStr(vas));
> 
> LGTM, will apply

I'm sorry I didn't notice you are using `VA_CHECK_VERSION(2, 0, 0)`. PRIME_2 is
available since VA-API 1.0, so you should use `VA_CHECK_VERSION(1, 0, 0)`, could
you please update your patch ?
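
To illustrate why the threshold matters, here is a minimal sketch of a version-gated build, assuming the check macro compares against the API version exposed by the headers. All DEMO_* names are hypothetical stand-ins, not the real libva macros, and the version numbers are chosen only for the example.

```c
/* Hypothetical stand-ins for the version macros a library header provides. */
#define DEMO_MAJOR_VERSION 1
#define DEMO_MINOR_VERSION 4
#define DEMO_MICRO_VERSION 0

/* True when the header's version is at least major.minor.micro. */
#define DEMO_CHECK_VERSION(major, minor, micro)                          \
    (DEMO_MAJOR_VERSION > (major) ||                                     \
     (DEMO_MAJOR_VERSION == (major) && DEMO_MINOR_VERSION > (minor)) ||  \
     (DEMO_MAJOR_VERSION == (major) && DEMO_MINOR_VERSION == (minor) &&  \
      DEMO_MICRO_VERSION >= (micro)))

/* Gated on 1.0.0: the newer import path is compiled in. */
static const char *import_path(void)
{
#if DEMO_CHECK_VERSION(1, 0, 0)
    return "prime2";
#else
    return "legacy";
#endif
}

/* Gated on 2.0.0: with 1.x headers the newer path is silently dropped. */
static const char *import_path_if_gated_on_2_0_0(void)
{
#if DEMO_CHECK_VERSION(2, 0, 0)
    return "prime2";
#else
    return "legacy";
#endif
}
```

With the example version 1.4.0, the first function selects the newer path while the second falls back to the legacy one, which is the situation the requested change is meant to avoid.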

Thanks
Haihao




Re: [FFmpeg-devel] [PATCH 04/10] avcodec/vc1: Introduce fast path for unescaping bitstream buffer

2022-03-31 Thread Martin Storsjö


On Thu, 31 Mar 2022, Ben Avison wrote:


On 29/03/2022 21:37, Martin Storsjö wrote:

On Fri, 25 Mar 2022, Ben Avison wrote:
As with the rest of the checkasm tests - please unmacro most things where 
possible (except for the RANDOMIZE_* macros, those are ok to keep macroed 
if you want to).


In the case of TEST_UNESCAPE, I think it has to remain as a macro, otherwise 
the next function up ends up with a declare_func_emms() and a bench_new() but 
no call_ref() or call_new(), which means some builds end up with an unused 
function warning.


Oh, right - yes, call_ref and call_new need to be in the same scope as 
declare_func, yes.


I can, however, split all the unescape tests out of checkasm_check_vc1dsp 
into a separate function (and separate functions for inverse-transform and 
deblocking tests).


Awesome, thanks!

// Martin

Re: [FFmpeg-devel] [PATCH 04/10] avcodec/vc1: Introduce fast path for unescaping bitstream buffer

2022-03-31 Thread Ben Avison


On 29/03/2022 21:37, Martin Storsjö wrote:

On Fri, 25 Mar 2022, Ben Avison wrote:
+#define TEST_UNESCAPE                                                    \
+    do {                                                                 \
+        for (int count = 100; count > 0; --count) {                      \
+            escaped_offset = rnd() & 7;                                  \
+            unescaped_offset = rnd() & 7;                                \
+            escaped_len = (1u << (rnd() % 8) + 3) - (rnd() & 7);         \
+            RANDOMIZE_BUFFER8(unescaped, UNESCAPE_BUF_SIZE);             \


The output buffer will be overwritten in the end, but I guess this 
initialization is useful for making sure that the test doesn't 
accidentally rely on the output from the previous iteration, right?


The main idea was to catch examples of writing to the buffer beyond the 
length reported (and less likely, writes before the start of the 
buffer). I suppose it's possible that someone might want to deliberately 
overwrite in specific conditions, but the test could always be loosened 
up at that point once those conditions become clearer.


+            len0 = call_ref(escaped0 + escaped_offset, escaped_len, unescaped0 + unescaped_offset); \
+            len1 = call_new(escaped1 + escaped_offset, escaped_len, unescaped1 + unescaped_offset); \
+            if (len0 != len1 || memcmp(unescaped0, unescaped1, len0))    \


Don't you need to include unescaped_offset here too? Otherwise you're 
just checking areas of the buffer that weren't necessarily written.


I realise I should have made the memcmp length UNESCAPE_BUF_SIZE here to 
achieve what I intended. Testing len0 bytes from the start of the buffer 
neither checks all the written bytes nor checks the byte after those 
written :-$
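As a rough illustration of the point above, here is a minimal sketch of why comparing the whole buffer catches an overrun that a len0-byte comparison from the buffer start misses. Everything in it (DEMO_BUF_SIZE, the demo_* writers and helpers) is hypothetical and stands in for the real checkasm macros and the unescape functions; it is not the test from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BUF_SIZE 64 /* hypothetical stand-in for UNESCAPE_BUF_SIZE */

/* Hypothetical writer: stores bytes at dst and returns the length it reports. */
typedef size_t (*demo_writer)(uint8_t *dst, size_t len);

/* Well-behaved writer: touches exactly len bytes. */
static size_t demo_write_exact(uint8_t *dst, size_t len)
{
    memset(dst, 0xAA, len);
    return len;
}

/* Faulty writer: touches one byte more than it reports. */
static size_t demo_write_overrun(uint8_t *dst, size_t len)
{
    memset(dst, 0xAA, len + 1);
    return len;
}

/* Give both buffers the same deterministic pre-fill. */
static void demo_prefill(uint8_t *buf0, uint8_t *buf1)
{
    for (size_t i = 0; i < DEMO_BUF_SIZE; i++)
        buf0[i] = buf1[i] = (uint8_t)(i * 37 + 11);
}

/* Comparison as in the quoted patch: len0 bytes from the buffer start. */
static int demo_match_prefix(demo_writer ref, demo_writer new_fn,
                             size_t offset, size_t len)
{
    uint8_t buf0[DEMO_BUF_SIZE], buf1[DEMO_BUF_SIZE];
    size_t len0, len1;

    demo_prefill(buf0, buf1);
    len0 = ref(buf0 + offset, len);
    len1 = new_fn(buf1 + offset, len);
    return len0 == len1 && !memcmp(buf0, buf1, len0);
}

/* Comparison over the full buffer, as proposed in the reply. */
static int demo_match_whole(demo_writer ref, demo_writer new_fn,
                            size_t offset, size_t len)
{
    uint8_t buf0[DEMO_BUF_SIZE], buf1[DEMO_BUF_SIZE];
    size_t len0, len1;

    demo_prefill(buf0, buf1);
    len0 = ref(buf0 + offset, len);
    len1 = new_fn(buf1 + offset, len);
    return len0 == len1 && !memcmp(buf0, buf1, DEMO_BUF_SIZE);
}
```

With an offset of 3 and a length of 16, the prefix comparison only looks at bytes 0..15 and so accepts the overrunning writer, while the whole-buffer comparison sees the extra byte at index 19 and rejects it.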


As with the rest of the checkasm tests - please unmacro most things 
where possible (except for the RANDOMIZE_* macros, those are ok to keep 
macroed if you want to).


In the case of TEST_UNESCAPE, I think it has to remain as a macro, 
otherwise the next function up ends up with a declare_func_emms() and a 
bench_new() but no call_ref() or call_new(), which means some builds end 
up with an unused function warning.


I can, however, split all the unescape tests out of 
checkasm_check_vc1dsp into a separate function (and separate functions 
for inverse-transform and deblocking tests).


Ben

Re: [FFmpeg-devel] [PATCH 2/4 v2] ffmpeg: ensure a keyframe was not seen before skipping packets

2022-03-31 Thread James Almer





On 3/31/2022 8:47 AM, Anton Khirnov wrote:

Quoting James Almer (2022-02-23 16:03:53)

A keyframe could be buffered in the bsf and not be output until more packets
had been fed to it.

Signed-off-by: James Almer 
---
Changed the check from pkt to !eof, since a packet is always provided.

  fftools/ffmpeg.c | 4 +++-
  fftools/ffmpeg.h | 1 +
  2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/fftools/ffmpeg.c b/fftools/ffmpeg.c
index 44043ef203..2b61c0d5aa 100644
--- a/fftools/ffmpeg.c
+++ b/fftools/ffmpeg.c
@@ -890,6 +890,8 @@ static void output_packet(OutputFile *of, AVPacket *pkt,
  
  /* apply the output bitstream filters */

  if (ost->bsf_ctx) {
+if (!eof && pkt->flags & AV_PKT_FLAG_KEY)
+ost->seen_kf = 1;


Shouldn't this also be set when no bsfs are used?


Afaict only in streamcopy with bsfs scenarios can packets be temporarily 
withheld, so it's not necessary.
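A toy model of the situation described here, assuming a filter that always holds back exactly one packet. DemoPacket, DemoStream and demo_filter_send are invented for illustration and are not ffmpeg.c types or functions; the only idea taken from the patch is recording the keyframe flag when the packet goes into the filter rather than when it comes out.

```c
/* Hypothetical packet: is_key mirrors AV_PKT_FLAG_KEY, valid marks real data. */
typedef struct DemoPacket {
    int is_key;
    int valid;
} DemoPacket;

/* Hypothetical per-stream state with a one-packet delay line. */
typedef struct DemoStream {
    int        seen_kf; /* set as soon as a keyframe enters the filter */
    DemoPacket held;    /* packet currently withheld by the filter */
} DemoStream;

/* Feed one packet in; get back the packet that was withheld before it.
 * The returned packet has valid == 0 if nothing was buffered yet. */
static DemoPacket demo_filter_send(DemoStream *st, DemoPacket in)
{
    DemoPacket out = st->held;

    if (in.is_key)
        st->seen_kf = 1; /* remember the keyframe even though it is delayed */

    in.valid = 1;
    st->held = in;
    return out;
}
```

In this model, feeding a keyframe first yields no output packet, yet seen_kf is already set, so a later check that would skip leading non-keyframes can tell that a keyframe is still on its way out of the filter.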


Re: [FFmpeg-devel] [PATCH 2/4 v2] ffmpeg: ensure a keyframe was not seen before skipping packets

2022-03-31 Thread Anton Khirnov

Quoting James Almer (2022-02-23 16:03:53)
> A keyframe could be buffered in the bsf and not be output until more packets
> had been fed to it.
> 
> Signed-off-by: James Almer 
> ---
> Changed the check from pkt to !eof, since a packet is always provided.
> 
>  fftools/ffmpeg.c | 4 +++-
>  fftools/ffmpeg.h | 1 +
>  2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/fftools/ffmpeg.c b/fftools/ffmpeg.c
> index 44043ef203..2b61c0d5aa 100644
> --- a/fftools/ffmpeg.c
> +++ b/fftools/ffmpeg.c
> @@ -890,6 +890,8 @@ static void output_packet(OutputFile *of, AVPacket *pkt,
>  
>  /* apply the output bitstream filters */
>  if (ost->bsf_ctx) {
> +if (!eof && pkt->flags & AV_PKT_FLAG_KEY)
> +ost->seen_kf = 1;

Shouldn't this also be set when no bsfs are used?

-- 
Anton Khirnov

Re: [FFmpeg-devel] [PATCH 6/7] avcodec: Make avcodec_decoder_subtitles2 accept a const AVPacket*

2022-03-31 Thread Anton Khirnov

Quoting Andreas Rheinhardt (2022-03-31 00:49:57)
> From: Andreas Rheinhardt 
> 
> Signed-off-by: Andreas Rheinhardt 
> ---
>  doc/APIchanges| 3 +++
>  fftools/ffmpeg.c  | 4 ++--
>  fftools/ffprobe.c | 2 +-
>  libavcodec/avcodec.h  | 3 +--
>  libavcodec/decode.c   | 9 -
>  libavcodec/version.h  | 2 +-
>  tools/target_dec_fuzzer.c | 4 ++--
>  7 files changed, 14 insertions(+), 13 deletions(-)
> 
> diff --git a/doc/APIchanges b/doc/APIchanges
> index 1a9f0a303e..326a3c721c 100644
> --- a/doc/APIchanges
> +++ b/doc/APIchanges
> @@ -14,6 +14,9 @@ libavutil: 2021-04-27
>  
>  API changes, most recent first:
>  
> +2022-03-30 - xx - lavc 59.26.100 - avcodec.h
> +  avcodec_decode_subtitle2() now accepts const AVPacket*.

I vaguely recall C++ having a problem with such changes. Does anybody
remember the details? Do we care?
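For plain C callers, at least, the change looks source compatible, since a pointer to a non-const object converts implicitly to a pointer to const. A minimal sketch with hypothetical names (DemoPacket, demo_decode, demo_caller), not the real avcodec API:

```c
/* Hypothetical packet type standing in for AVPacket. */
typedef struct DemoPacket {
    int size;
} DemoPacket;

/* Callee after the change: takes a pointer to const. */
static int demo_decode(const DemoPacket *pkt)
{
    return pkt->size;
}

/* Existing caller: still passes a non-const object, unchanged. */
static int demo_caller(void)
{
    DemoPacket pkt = { 42 };
    return demo_decode(&pkt); /* DemoPacket * converts to const DemoPacket * */
}
```

This only covers direct calls. Code that stores the function in a function pointer declared with the old, non-const parameter type would see a different type, and the sketch does not settle the C++ question raised above.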

-- 
Anton Khirnov

[FFmpeg-devel] [PATCH v2] doc/filters: document vf_libplacebo

2022-03-31 Thread Niklas Haas

From: Niklas Haas 

Signed-off-by: Niklas Haas 
---
Changes in v2:
- expand documentation of tone mapping curves
- slight rewording of some sections
- add more examples
---
 doc/filters.texi | 494 +++
 1 file changed, 494 insertions(+)

diff --git a/doc/filters.texi b/doc/filters.texi
index 1d56d24819..a6f2f1397e 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -14793,6 +14793,500 @@ ffmpeg -i input.mov -vf lensfun=make=Canon:model="Canon EOS 100D":lens_model="Ca
 
 @end itemize
 
+@section libplacebo
+
+Flexible GPU-accelerated processing filter based on libplacebo
+(@url{https://code.videolan.org/videolan/libplacebo}). Note that this filter
+currently only accepts Vulkan input frames.
+
+@subsection Options
+
+The options for this filter are divided into the following sections:
+
+@subsubsection Output mode
+These options control the overall output mode. By default, libplacebo will try
+to preserve the source colorimetry and size as best as it can, but it will
+apply any embedded film grain, dolby vision metadata or anamorphic SAR present
+in source frames.
+@table @option
+@item w
+@item h
+Set the output video dimension expression. Default value is the input dimension.
+
+Allows for the same expressions as the @ref{scale} filter.
+
+@item format
+Set the output format override. If unset (the default), frames will be output
+in the same format as the respective input frames. Otherwise, format conversion
+will be performed.
+
+@item force_original_aspect_ratio
+@item force_divisible_by
+Work the same as the identical @ref{scale} filter options.
+
+@item normalize_sar
+If enabled (the default), output frames will always have a pixel aspect ratio
+of 1:1. If disabled, any aspect ratio mismatches, including those from e.g.
+anamorphic video sources, are forwarded to the output pixel aspect ratio.
+
+@item pad_crop_ratio
+Specifies a ratio (between @code{0.0} and @code{1.0}) between padding and
+cropping when the input aspect ratio does not match the output aspect ratio and
+@option{normalize_sar} is in effect. The default of @code{0.0} always pads the
+content with black borders, while a value of @code{1.0} always crops off parts
+of the content. Intermediate values are possible, leading to a mix of the two
+approaches.
+
+@item colorspace
+@item color_primaries
+@item color_trc
+@item range
+Configure the colorspace that output frames will be delivered in. The default
+value of @code{auto} outputs frames in the same format as the input frames,
+leading to no change. For any other value, conversion will be performed.
+
+See the @ref{setparams} filter for a list of possible values.
+
+@item apply_filmgrain
+Apply film grain (e.g. AV1 or H.274) if present in source frames, and strip
+it from the output. Enabled by default.
+
+@item apply_dolbyvision
+Apply Dolby Vision RPU metadata if present in source frames, and strip it from
+the output. Enabled by default. Note that Dolby Vision will always output
+BT.2020+PQ, overriding the usual input frame metadata. These will also be
+picked as the values of @code{auto} for the respective frame output options.
+@end table
+
+@subsubsection Scaling
+The options in this section control how libplacebo performs upscaling and (if
+necessary) downscaling. Note that libplacebo will always internally operate on
+4:4:4 content, so any sub-sampled chroma formats such as @code{yuv420p} will
+necessarily be upsampled and downsampled as part of the rendering process. That
+means scaling might be in effect even if the source and destination resolution
+are the same.
+@table @option
+@item upscaler
+@item downscaler
+Configure the filter kernel used for upscaling and downscaling. The respective
+defaults are @code{spline36} and @code{mitchell}. For a full list of possible
+values, pass @code{help} to these options. The most important values are:
+@table @samp
+
+@item none
+Forces the use of built-in GPU texture sampling (typically bilinear). Extremely
+fast but poor quality, especially when downscaling.
+
+@item bilinear
+Bilinear interpolation. Can generally be done for free on GPUs, except when
+doing so would lead to aliasing. Fast and low quality.
+
+@item nearest
+Nearest-neighbour interpolation. Sharp but highly aliasing.
+
+@item oversample
+Algorithm that looks visually similar to nearest-neighbour interpolation but
+tries to preserve pixel aspect ratio. Good for pixel art, since it results in
+minimal distortion of the artistic appearance.
+
+@item lanczos
+Standard sinc-sinc interpolation kernel.
+
+@item spline36
+Cubic spline approximation of lanczos. No difference in performance, but has
+very slightly less ringing.
+
+@item ewa_lanczos
+Elliptically weighted average version of lanczos, based on a jinc-sinc kernel.
+This is also popularly referred to as just "Jinc scaling". Slow but very high
+quality.
+
+@item gaussian
+Gaussian kernel. Has certain ideal mathematical properties, but subjectively
+very blurry.
+

Re: [FFmpeg-devel] [PATCH v2] avfilter/vf_libplacebo: update for new tone mapping API

2022-03-31 Thread Niklas Haas

Applied as e301a24fa191ad19574289b765ff1946b23c03f3

On Fri, 25 Mar 2022 16:11:19 +0100 Niklas Haas  wrote:
> From: Niklas Haas 
> 
> Upstream gained a new tone-mapping API, which we never switched to. We
> don't need a version bump for this because it was included as part of
> the v4.192 release we currently already depend on.
> 
> Some of the old options can be moderately approximated with the new API,
> but specifically "desaturation_base" and "max_boost" cannot. Remove
> these entirely, rather than deprecating them. They have actually been
> non-functional for a while as a result of the upstream deprecation.
> 
> Signed-off-by: Niklas Haas 
> ---
> Changes in v2:
> - Avoid use of strings in favor of replicating the enum values
> - Fix two wrong enum option value ranges
> - Simplify the option setting code again slightly
> ---
>  libavfilter/vf_libplacebo.c | 112 
>  1 file changed, 89 insertions(+), 23 deletions(-)
> 
> diff --git a/libavfilter/vf_libplacebo.c b/libavfilter/vf_libplacebo.c
> index 31ae28ac38..8ce6462c66 100644
> --- a/libavfilter/vf_libplacebo.c
> +++ b/libavfilter/vf_libplacebo.c
> @@ -26,6 +26,33 @@
>  #include 
>  #include 
>  
> +enum {
> +TONE_MAP_AUTO,
> +TONE_MAP_CLIP,
> +TONE_MAP_BT2390,
> +TONE_MAP_BT2446A,
> +TONE_MAP_SPLINE,
> +TONE_MAP_REINHARD,
> +TONE_MAP_MOBIUS,
> +TONE_MAP_HABLE,
> +TONE_MAP_GAMMA,
> +TONE_MAP_LINEAR,
> +TONE_MAP_COUNT,
> +};
> +
> +static const struct pl_tone_map_function * const tonemapping_funcs[TONE_MAP_COUNT] = {
> +[TONE_MAP_AUTO] = &pl_tone_map_auto,
> +[TONE_MAP_CLIP] = &pl_tone_map_clip,
> +[TONE_MAP_BT2390]   = &pl_tone_map_bt2390,
> +[TONE_MAP_BT2446A]  = &pl_tone_map_bt2446a,
> +[TONE_MAP_SPLINE]   = &pl_tone_map_spline,
> +[TONE_MAP_REINHARD] = &pl_tone_map_reinhard,
> +[TONE_MAP_MOBIUS]   = &pl_tone_map_mobius,
> +[TONE_MAP_HABLE]= &pl_tone_map_hable,
> +[TONE_MAP_GAMMA]= &pl_tone_map_gamma,
> +[TONE_MAP_LINEAR]   = &pl_tone_map_linear,
> +};
> +
>  typedef struct LibplaceboContext {
>  /* lavfi vulkan*/
>  FFVulkanContext vkctx;
> @@ -91,12 +118,16 @@ typedef struct LibplaceboContext {
>  
>  /* pl_color_map_params */
>  int intent;
> +int gamut_mode;
>  int tonemapping;
>  float tonemapping_param;
> +int tonemapping_mode;
> +int inverse_tonemapping;
> +float crosstalk;
> +int tonemapping_lut_size;
> +/* for backwards compatibility */
>  float desat_str;
>  float desat_exp;
> -float desat_base;
> -float max_boost;
>  int gamut_warning;
>  int gamut_clipping;
>  
> @@ -281,6 +312,8 @@ static int process_frames(AVFilterContext *avctx, AVFrame *out, AVFrame *in)
>  int err = 0, ok;
>  LibplaceboContext *s = avctx->priv;
>  struct pl_render_params params;
> +enum pl_tone_map_mode tonemapping_mode = s->tonemapping_mode;
> +enum pl_gamut_mode gamut_mode = s->gamut_mode;
>  struct pl_frame image, target;
>  ok = pl_map_avframe_ex(s->gpu, , pl_avframe_params(
>  .frame= in,
> @@ -305,6 +338,24 @@ static int process_frames(AVFilterContext *avctx, AVFrame *out, AVFrame *in)
>  pl_rect2df_aspect_set(, aspect, s->pad_crop_ratio);
>  }
>  
> +/* backwards compatibility with older API */
> +if (!tonemapping_mode && (s->desat_str >= 0.0f || s->desat_exp >= 0.0f)) {
> +float str = s->desat_str < 0.0f ? 0.9f : s->desat_str;
> +float exp = s->desat_exp < 0.0f ? 0.2f : s->desat_exp;
> +if (str >= 0.9f && exp <= 0.1f) {
> +tonemapping_mode = PL_TONE_MAP_RGB;
> +} else if (str > 0.1f) {
> +tonemapping_mode = PL_TONE_MAP_HYBRID;
> +} else {
> +tonemapping_mode = PL_TONE_MAP_LUMA;
> +}
> +}
> +
> +if (s->gamut_warning)
> +gamut_mode = PL_GAMUT_WARN;
> +if (s->gamut_clipping)
> +gamut_mode = PL_GAMUT_DESATURATE;
> +
>  /* Update render params */
>  params = (struct pl_render_params) {
>  PL_RENDER_DEFAULTS
> @@ -338,14 +389,13 @@ static int process_frames(AVFilterContext *avctx, AVFrame *out, AVFrame *in)
>  
>  .color_map_params = pl_color_map_params(
>  .intent = s->intent,
> -.tone_mapping_algo = s->tonemapping,
> +.gamut_mode = gamut_mode,
> +.tone_mapping_function = tonemapping_funcs[s->tonemapping],
>  .tone_mapping_param = s->tonemapping_param,
> -.desaturation_strength = s->desat_str,
> -.desaturation_exponent = s->desat_exp,
> -.desaturation_base = s->desat_base,
> -.max_boost = s->max_boost,
> -.gamut_warning = s->gamut_warning,
> -.gamut_clipping = s->gamut_clipping,
> +.tone_mapping_mode = tonemapping_mode,
> +.inverse_tone_mapping = s->inverse_tonemapping,
> +.tone_mapping_crosstalk = s->crosstalk,
> +

Re: [FFmpeg-devel] [PATCH 8/8] avcodec/codec_internal: Include codec_tags only when they are needed

2022-03-31 Thread Andreas Rheinhardt

Andreas Rheinhardt:
> They are only needed for the fuzzer, so check for CONFIG_OSSFUZZ.
> This decreases sizeof(FFCodec), which is important given that
> FFCodecs reside in .data.rel.ro in case of ELF with
> position-independent code which is always loaded and can't be shared
> between processes.
> 
> Signed-off-by: Andreas Rheinhardt 
> ---
>  libavcodec/bitpacked_dec.c  |  5 +
>  libavcodec/codec_internal.h | 10 ++
>  libavcodec/hapdec.c | 13 +
>  tools/target_dec_fuzzer.c   |  2 ++
>  4 files changed, 18 insertions(+), 12 deletions(-)
> 
> diff --git a/libavcodec/bitpacked_dec.c b/libavcodec/bitpacked_dec.c
> index 419550dfe0..b62d88fa8f 100644
> --- a/libavcodec/bitpacked_dec.c
> +++ b/libavcodec/bitpacked_dec.c
> @@ -151,9 +151,6 @@ const FFCodec ff_bitpacked_decoder = {
>  .init = bitpacked_init_decoder,
>  .decode = bitpacked_decode,
>  .p.capabilities = AV_CODEC_CAP_FRAME_THREADS,
> -.codec_tags = (const uint32_t []){
> -MKTAG('U', 'Y', 'V', 'Y'),
> -FF_CODEC_TAGS_END,
> -},
>  .caps_internal  = FF_CODEC_CAP_INIT_THREADSAFE,
> +FF_CODEC_TAGS(MKTAG('U', 'Y', 'V', 'Y'))
>  };
> diff --git a/libavcodec/codec_internal.h b/libavcodec/codec_internal.h
> index 596cdbebd2..b6b5b05b44 100644
> --- a/libavcodec/codec_internal.h
> +++ b/libavcodec/codec_internal.h
> @@ -21,6 +21,7 @@
>  
>  #include 
>  
> +#include "config.h"
>  #include "libavutil/attributes.h"
>  #include "codec.h"
>  
> @@ -74,10 +75,16 @@
>   */
>  #define FF_CODEC_CAP_SETS_FRAME_PROPS   (1 << 8)
>  
> +#if CONFIG_OSSFUZZ
>  /**
>   * FFCodec.codec_tags termination value
>   */
>  #define FF_CODEC_TAGS_END -1
> +#define FF_CODEC_TAGS(...) \
> +.codec_tags = (const uint32_t[]){ __VA_ARGS__, FF_CODEC_TAGS_END },
> +#else
> +#define FF_CODEC_TAGS(...)
> +#endif
>  
>  typedef struct FFCodecDefault {
>  const char *key;
> @@ -196,10 +203,13 @@ typedef struct FFCodec {
>   */
>  const struct AVCodecHWConfigInternal *const *hw_configs;
>  
> +#if CONFIG_OSSFUZZ
>  /**
>   * List of supported codec_tags, terminated by FF_CODEC_TAGS_END.
> + * Should be defined with the FF_CODEC_TAGS() macro.
>   */
>  const uint32_t *codec_tags;
> +#endif
>  } FFCodec;
>  
>  static av_always_inline const FFCodec *ffcodec(const AVCodec *codec)
> diff --git a/libavcodec/hapdec.c b/libavcodec/hapdec.c
> index 4a7ac15a8e..72f922bc5b 100644
> --- a/libavcodec/hapdec.c
> +++ b/libavcodec/hapdec.c
> @@ -486,12 +486,9 @@ const FFCodec ff_hap_decoder = {
>AV_CODEC_CAP_DR1,
>  .caps_internal  = FF_CODEC_CAP_INIT_THREADSAFE |
>FF_CODEC_CAP_INIT_CLEANUP,
> -.codec_tags = (const uint32_t []){
> -MKTAG('H','a','p','1'),
> -MKTAG('H','a','p','5'),
> -MKTAG('H','a','p','Y'),
> -MKTAG('H','a','p','A'),
> -MKTAG('H','a','p','M'),
> -FF_CODEC_TAGS_END,
> -},
> +FF_CODEC_TAGS(MKTAG('H','a','p','1'),
> +  MKTAG('H','a','p','5'),
> +  MKTAG('H','a','p','Y'),
> +  MKTAG('H','a','p','A'),
> +  MKTAG('H','a','p','M'))
>  };
> diff --git a/tools/target_dec_fuzzer.c b/tools/target_dec_fuzzer.c
> index 288aa63313..77f4bb8dd8 100644
> --- a/tools/target_dec_fuzzer.c
> +++ b/tools/target_dec_fuzzer.c
> @@ -279,12 +279,14 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
>  ctx->sample_rate= bytestream2_get_le32() & 0x7FFF;
>  ctx->ch_layout.nb_channels  = (unsigned)bytestream2_get_le32() % FF_SANE_NB_CHANNELS;
>  ctx->block_align= bytestream2_get_le32() & 0x7FFF;
> +#if CONFIG_OSSFUZZ
>  ctx->codec_tag  = bytestream2_get_le32();
>  if (c->codec_tags) {
>  int n;
>  for (n = 0; c->codec_tags[n] != FF_CODEC_TAGS_END; n++);
>  ctx->codec_tag = c->codec_tags[ctx->codec_tag % n];
>  }
> +#endif
>  keyframes   = bytestream2_get_le64();
>  request_channel_layout  = bytestream2_get_le64();
>  

Will apply tomorrow unless there are objections.
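For readers skimming the patch, the pattern it relies on can be sketched in isolation: a struct member and the macro that initializes it are compiled in or out together, so codec definitions stay unchanged either way. DEMO_WITH_TAGS, DEMO_TAGS, DEMO_TAGS_END, DemoCodec and demo_count_tags are hypothetical names chosen for this sketch; the real code uses CONFIG_OSSFUZZ, FF_CODEC_TAGS and FFCodec.

```c
#include <stddef.h>
#include <stdint.h>

#ifndef DEMO_WITH_TAGS
#define DEMO_WITH_TAGS 1 /* hypothetical switch standing in for CONFIG_OSSFUZZ */
#endif

#define DEMO_TAGS_END 0xFFFFFFFFu /* list terminator */

#if DEMO_WITH_TAGS
/* Expands to a designated initializer, including the trailing comma. */
#define DEMO_TAGS(...) \
    .tags = (const uint32_t[]){ __VA_ARGS__, DEMO_TAGS_END },
#else
/* Expands to nothing, so the member does not need to exist. */
#define DEMO_TAGS(...)
#endif

typedef struct DemoCodec {
    const char *name;
#if DEMO_WITH_TAGS
    const uint32_t *tags; /* only present when the switch is enabled */
#endif
} DemoCodec;

/* The definition reads the same whether or not tags are compiled in. */
static const DemoCodec demo_codec = {
    .name = "demo",
    DEMO_TAGS(0x31706148u, 0x35706148u)
};

/* Count the tags of a codec; returns 0 when tags are compiled out. */
static size_t demo_count_tags(const DemoCodec *c)
{
#if DEMO_WITH_TAGS
    size_t n = 0;
    if (!c->tags)
        return 0;
    while (c->tags[n] != DEMO_TAGS_END)
        n++;
    return n;
#else
    (void)c;
    return 0;
#endif
}
```

Because the macro carries its own trailing comma and must come last in the initializer, the disabled variant leaves no stray comma behind, which matches how the patch places FF_CODEC_TAGS at the end of each codec definition.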

- Andreas

[FFmpeg-devel] [PATCH] libavutil/hwcontext_qsv: Align width and heigh when download qsv frame

2022-03-31 Thread Wenbin Chen

The width and height for qsv frame to download need to be
aligned with 16. Add the alignment operation.
Now the following command works:
ffmpeg -hwaccel qsv -f rawvideo -s 1920x1080 -pix_fmt yuv420p -i \
input.yuv -vf "hwupload=extra_hw_frames=16,format=qsv,hwdownload, \
format=nv12" -f null -

Signed-off-by: Wenbin Chen 
---
 libavutil/hwcontext_qsv.c | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/libavutil/hwcontext_qsv.c b/libavutil/hwcontext_qsv.c
index 95f8071abe..1e7c065902 100644
--- a/libavutil/hwcontext_qsv.c
+++ b/libavutil/hwcontext_qsv.c
@@ -1063,6 +1063,40 @@ static int qsv_transfer_data_from(AVHWFramesContext *ctx, AVFrame *dst,
 if (ret < 0)
 return ret;
 
+/* According to MSDK spec for mfxframeinfo, "Width must be a multiple of 16.
+ * Height must be a multiple of 16 for progressive frame sequence and a
+ * multiple of 32 otherwise.", so align all frames to 16 before downloading. */
+if (ctx->height & 15 || dst->linesize[0] & 15) {
+AVFrame *tmp_frame;
+tmp_frame = av_frame_alloc();
+if (!tmp_frame)
+return AVERROR(ENOMEM);
+ret = av_frame_ref(tmp_frame, dst);
+if (ret < 0) {
+av_frame_free(&tmp_frame);
+return ret;
+}
+av_frame_unref(dst);
+
+dst->width  = FFALIGN(tmp_frame->width, 16);
+dst->height = FFALIGN(ctx->height, 16);
+dst->format = tmp_frame->format;
+ret = av_frame_get_buffer(dst, 0);
+if (ret < 0) {
+av_frame_free(&tmp_frame);
+return ret;
+}
+
+dst->width = tmp_frame->width;
+dst->height = tmp_frame->height;
+ret = av_frame_copy_props(dst, tmp_frame);
+if (ret < 0) {
+av_frame_free(&tmp_frame);
+return ret;
+}
+av_frame_free(&tmp_frame);
+}
+
 if (!s->session_download) {
 if (s->child_frames_ref)
 return qsv_transfer_data_child(ctx, dst, src);
-- 
2.32.0
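
The alignment arithmetic used by the patch can be checked in isolation. The sketch below assumes a power-of-two alignment; DEMO_ALIGN mirrors the usual round-up-and-mask idiom behind FFALIGN, and demo_needs_aligned_copy is a hypothetical helper that restates the condition tested in the patch (ctx->height & 15 || dst->linesize[0] & 15).

```c
/* Round x up to the next multiple of a; a must be a power of two. */
#define DEMO_ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Nonzero when either dimension is not already a multiple of 16. */
static int demo_needs_aligned_copy(int height, int linesize)
{
    return (height & 15) || (linesize & 15);
}
```

For the 1920x1080 input from the commit message, the width is already a multiple of 16, while the height of 1080 rounds up to 1088, which is why the download destination has to be reallocated with padded dimensions.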


Re: [FFmpeg-devel] [PATCH 1/8] fate/filter-refcmp-*: make refcmp_metadata fail on empty input

2022-03-31 Thread Tobias Rapp


On 30/03/2022 22:31, Marton Balint wrote:

On empty input the awk script was always successful which caused the
filter-refcmp tests to always succeed.

Also fix the command lines for refcmp_metadata compare function because it
needs auto conversion filters, and update reference of test
filter-refcmp-psnr-rgb because it was missed in
a7fc78c1a638a32c3695c06f727774c740d675c2 but was never noticed due to the
original issue...

Signed-off-by: Marton Balint 
---
  tests/fate-run.sh |  2 +-
  tests/ref/fate/filter-refcmp-psnr-rgb | 80 +--
  tests/refcmp-metadata.awk |  3 +
  3 files changed, 44 insertions(+), 41 deletions(-)

diff --git a/tests/fate-run.sh b/tests/fate-run.sh
index fbfc0a925d..5e8d607d88 100755
--- a/tests/fate-run.sh
+++ b/tests/fate-run.sh
@@ -377,7 +377,7 @@ refcmp_metadata(){
  refcmp=$1
  pixfmt=$2
  fuzz=${3:-0.001}
-ffmpeg $FLAGS $ENC_OPTS \
+ffmpeg -auto_conversion_filters $FLAGS $ENC_OPTS \
  -lavfi "testsrc2=size=300x200:rate=1:duration=5,format=${pixfmt},split[ref][tmp];[tmp]avgblur=4[enc];[enc][ref]${refcmp},metadata=print:file=-" \
  -f null /dev/null | awk -v ref=${ref} -v fuzz=${fuzz} -f ${base}/refcmp-metadata.awk -
  }
diff --git a/tests/ref/fate/filter-refcmp-psnr-rgb b/tests/ref/fate/filter-refcmp-psnr-rgb
index f06db575ac..20abd3dc5a 100644
--- a/tests/ref/fate/filter-refcmp-psnr-rgb
+++ b/tests/ref/fate/filter-refcmp-psnr-rgb
@@ -1,45 +1,45 @@
  frame:0pts:0   pts_time:0
-lavfi.psnr.mse.r=1381.80
-lavfi.psnr.psnr.r=16.73
-lavfi.psnr.mse.g=896.00
-lavfi.psnr.psnr.g=18.61
-lavfi.psnr.mse.b=277.38
-lavfi.psnr.psnr.b=23.70
-lavfi.psnr.mse_avg=851.73
-lavfi.psnr.psnr_avg=18.83
+lavfi.psnr.mse.r=1367.642090
+lavfi.psnr.psnr.r=16.771078
+lavfi.psnr.mse.g=885.804382
+lavfi.psnr.psnr.g=18.657425
+lavfi.psnr.mse.b=274.825073
+lavfi.psnr.psnr.b=23.740240
+lavfi.psnr.mse_avg=842.757202
+lavfi.psnr.psnr_avg=18.873779
  frame:1pts:1   pts_time:1
-lavfi.psnr.mse.r=1380.37
-lavfi.psnr.psnr.r=16.73
-lavfi.psnr.mse.g=975.91
-lavfi.psnr.psnr.g=18.24
-lavfi.psnr.mse.b=435.72
-lavfi.psnr.psnr.b=21.74
-lavfi.psnr.mse_avg=930.67
-lavfi.psnr.psnr_avg=18.44
+lavfi.psnr.mse.r=1356.681152
+lavfi.psnr.psnr.r=16.806026
+lavfi.psnr.mse.g=958.161560
+lavfi.psnr.psnr.g=18.316416
+lavfi.psnr.mse.b=428.238312
+lavfi.psnr.psnr.b=21.813948
+lavfi.psnr.mse_avg=914.360352
+lavfi.psnr.psnr_avg=18.519630
  frame:2pts:2   pts_time:2
-lavfi.psnr.mse.r=1403.20
-lavfi.psnr.psnr.r=16.66
-lavfi.psnr.mse.g=954.05
-lavfi.psnr.psnr.g=18.34
-lavfi.psnr.mse.b=494.22
-lavfi.psnr.psnr.b=21.19
-lavfi.psnr.mse_avg=950.49
-lavfi.psnr.psnr_avg=18.35
+lavfi.psnr.mse.r=1387.254883
+lavfi.psnr.psnr.r=16.709242
+lavfi.psnr.mse.g=939.230957
+lavfi.psnr.psnr.g=18.403080
+lavfi.psnr.mse.b=493.913757
+lavfi.psnr.psnr.b=21.194292
+lavfi.psnr.mse_avg=940.133179
+lavfi.psnr.psnr_avg=18.398911
  frame:3pts:3   pts_time:3
-lavfi.psnr.mse.r=1452.80
-lavfi.psnr.psnr.r=16.51
-lavfi.psnr.mse.g=1001.02
-lavfi.psnr.psnr.g=18.13
-lavfi.psnr.mse.b=557.39
-lavfi.psnr.psnr.b=20.67
-lavfi.psnr.mse_avg=1003.74
-lavfi.psnr.psnr_avg=18.11
+lavfi.psnr.mse.r=1433.291260
+lavfi.psnr.psnr.r=16.567459
+lavfi.psnr.mse.g=990.005859
+lavfi.psnr.psnr.g=18.174425
+lavfi.psnr.mse.b=550.512329
+lavfi.psnr.psnr.b=20.723133
+lavfi.psnr.mse_avg=991.269836
+lavfi.psnr.psnr_avg=18.168884
  frame:4pts:4   pts_time:4
-lavfi.psnr.mse.r=1401.25
-lavfi.psnr.psnr.r=16.67
-lavfi.psnr.mse.g=1009.80
-lavfi.psnr.psnr.g=18.09
-lavfi.psnr.mse.b=602.42
-lavfi.psnr.psnr.b=20.33
-lavfi.psnr.mse_avg=1004.49
-lavfi.psnr.psnr_avg=18.11
+lavfi.psnr.mse.r=1385.949341
+lavfi.psnr.psnr.r=16.713329
+lavfi.psnr.mse.g=997.065796
+lavfi.psnr.psnr.g=18.143566
+lavfi.psnr.mse.b=601.962952
+lavfi.psnr.psnr.b=20.335106
+lavfi.psnr.mse_avg=994.992676
+lavfi.psnr.psnr_avg=18.152605
diff --git a/tests/refcmp-metadata.awk b/tests/refcmp-metadata.awk
index fa21aad0e0..e7ed5ae809 100644
--- a/tests/refcmp-metadata.awk
+++ b/tests/refcmp-metadata.awk
@@ -50,12 +50,15 @@ BEGIN {
  }
  
  END {

+result = result && (NR != 0);


Checking for "NR == ref_nr" would additionally catch truncated input.


  if (result) {
  for (i = 1; i <= ref_nr; i++)
  print ref_lines[i];
  } else {
  for (i = 1; i <= NR; i++)
  print cmp_lines[i];
+if (NR == 0)
+print "[refcmp] no input";


Output should go to stderr here.


  if (NR != ref_nr)
  print "[refcmp] lines: " NR " != " ref_nr > "/dev/stderr";


Maybe add an "else" before the "if" to avoid that both lines are printed 
for empty input.



  if (delta_max >= fuzz)


Otherwise looks good to me. Thanks for catching the issue!

Regards,
Tobias

Re: [FFmpeg-devel] [PATCH 1/2] avutil/hwcontext_videotoolbox: create real buffer pool

2022-03-31 Thread zhilizhao(赵志立)

Ping for patch 1/2. rcombs has reviewed the patch on IRC. I decided to
drop patch 2/2.
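
For context, the idea behind the quoted patch is that buffer creation belongs in the pool's allocation callback, so the getter only ever asks the pool. A generic sketch of that division of labour follows; DemoBuf, DemoPool and the demo_* functions are made up for this illustration and are unrelated to the real AVBufferPool or CoreVideo APIs.

```c
#include <stdlib.h>

/* Hypothetical pooled buffer; next links free buffers together. */
typedef struct DemoBuf {
    struct DemoBuf *next;
} DemoBuf;

/* Allocation callback the pool invokes when its free list is empty. */
typedef DemoBuf *(*demo_alloc_fn)(void *opaque);

typedef struct DemoPool {
    DemoBuf      *free_list;
    demo_alloc_fn alloc;
    void         *opaque;
} DemoPool;

/* Example callback: creates a buffer and counts how often it ran. */
static DemoBuf *demo_counting_alloc(void *opaque)
{
    int *count = opaque;
    DemoBuf *buf = calloc(1, sizeof(*buf));
    if (buf)
        (*count)++;
    return buf;
}

/* Getter: reuse a returned buffer if there is one, else call the callback. */
static DemoBuf *demo_pool_get(DemoPool *pool)
{
    DemoBuf *buf = pool->free_list;
    if (buf) {
        pool->free_list = buf->next;
        buf->next = NULL;
        return buf;
    }
    return pool->alloc(pool->opaque);
}

/* Return a buffer to the pool for later reuse. */
static void demo_pool_put(DemoPool *pool, DemoBuf *buf)
{
    buf->next = pool->free_list;
    pool->free_list = buf;
}
```

A get, put, get sequence hands back the same buffer and runs the callback only once, which is the behaviour a caller would expect from a real pool as opposed to a getter that creates a fresh buffer on every call.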


> 11:05 rcombs: quink_: seems reasonable to me
> 11:06 quink_: rcombs: thanks : )
> 11:06 rcombs: not entirely sure what the deal with the second commit is but ¯\_(ツ)_/¯ it's harmless so w/e


> On Mar 10, 2022, at 12:37 PM, Zhao Zhili  wrote:
> 
> vt_get_buffer shouldn't do buffer pool's job.
> ---
> libavutil/hwcontext_videotoolbox.c | 71 ++
> 1 file changed, 34 insertions(+), 37 deletions(-)
> 
> diff --git a/libavutil/hwcontext_videotoolbox.c b/libavutil/hwcontext_videotoolbox.c
> index 026127d412..e442a95007 100644
> --- a/libavutil/hwcontext_videotoolbox.c
> +++ b/libavutil/hwcontext_videotoolbox.c
> @@ -210,9 +210,36 @@ static int vt_pool_alloc(AVHWFramesContext *ctx)
> return AVERROR_EXTERNAL;
> }
> 
> -static AVBufferRef *vt_dummy_pool_alloc(void *opaque, size_t size)
> +static void videotoolbox_buffer_release(void *opaque, uint8_t *data)
> +{
> +CVPixelBufferRelease((CVPixelBufferRef)data);
> +}
> +
> +static AVBufferRef *vt_pool_alloc_buffer(void *opaque, size_t size)
> {
> -return NULL;
> +CVPixelBufferRef pixbuf;
> +AVBufferRef *buf;
> +CVReturn err;
> +AVHWFramesContext *ctx = opaque;
> +VTFramesContext *fctx = ctx->internal->priv;
> +
> +err = CVPixelBufferPoolCreatePixelBuffer(
> +NULL,
> +fctx->pool,
> +&pixbuf
> +);
> +if (err != kCVReturnSuccess) {
> +av_log(ctx, AV_LOG_ERROR, "Failed to create pixel buffer from pool: %d\n", err);
> +return NULL;
> +}
> +
> +buf = av_buffer_create((uint8_t *)pixbuf, size,
> +   videotoolbox_buffer_release, NULL, 0);
> +if (!buf) {
> +CVPixelBufferRelease(pixbuf);
> +return NULL;
> +}
> +return buf;
> }
> 
> static void vt_frames_uninit(AVHWFramesContext *ctx)
> @@ -238,9 +265,9 @@ static int vt_frames_init(AVHWFramesContext *ctx)
> return AVERROR(ENOSYS);
> }
> 
> -// create a dummy pool so av_hwframe_get_buffer doesn't EINVAL
> if (!ctx->pool) {
> -ctx->internal->pool_internal = av_buffer_pool_init2(0, ctx, vt_dummy_pool_alloc, NULL);
> +ctx->internal->pool_internal = av_buffer_pool_init2(
> +sizeof(CVPixelBufferRef), ctx, vt_pool_alloc_buffer, NULL);
> if (!ctx->internal->pool_internal)
> return AVERROR(ENOMEM);
> }
> @@ -252,41 +279,11 @@ static int vt_frames_init(AVHWFramesContext *ctx)
> return 0;
> }
> 
> -static void videotoolbox_buffer_release(void *opaque, uint8_t *data)
> -{
> -CVPixelBufferRelease((CVPixelBufferRef)data);
> -}
> -
> static int vt_get_buffer(AVHWFramesContext *ctx, AVFrame *frame)
> {
> -VTFramesContext *fctx = ctx->internal->priv;
> -
> -if (ctx->pool && ctx->pool->size != 0) {
> -frame->buf[0] = av_buffer_pool_get(ctx->pool);
> -if (!frame->buf[0])
> -return AVERROR(ENOMEM);
> -} else {
> -CVPixelBufferRef pixbuf;
> -AVBufferRef *buf = NULL;
> -CVReturn err;
> -
> -err = CVPixelBufferPoolCreatePixelBuffer(
> -NULL,
> -fctx->pool,
> -&pixbuf
> -);
> -if (err != kCVReturnSuccess) {
> -av_log(ctx, AV_LOG_ERROR, "Failed to create pixel buffer from pool: %d\n", err);
> -return AVERROR_EXTERNAL;
> -}
> -
> -buf = av_buffer_create((uint8_t *)pixbuf, 1, videotoolbox_buffer_release, NULL, 0);
> -if (!buf) {
> -CVPixelBufferRelease(pixbuf);
> -return AVERROR(ENOMEM);
> -}
> -frame->buf[0] = buf;
> -}
> +frame->buf[0] = av_buffer_pool_get(ctx->pool);
> +if (!frame->buf[0])
> +return AVERROR(ENOMEM);
> 
> frame->data[3] = frame->buf[0]->data;
> frame->format  = AV_PIX_FMT_VIDEOTOOLBOX;
> -- 
> 2.31.1
> 
