Classification is run on every detection bounding box in the frame's side data;
these boxes are the results of object detection (filter dnn_detect).
Please refer to the commit log of dnn_detect for the material for detection,
and see below for classification.
- download material for classification:
wget https://github.com/guoyejun/ffmpeg_dnn/raw/main/models/openvino/2021.1/emotions-recognition-retail-0003.bin
wget https://github.com/guoyejun/ffmpeg_dnn/raw/main/models/openvino/2021.1/emotions-recognition-retail-0003.xml
wget https://github.com/guoyejun/ffmpeg_dnn/raw/main/models/openvino/2021.1/emotions-recognition-retail-0003.label
- run command as:
./ffmpeg -i cici.jpg -vf dnn_detect=dnn_backend=openvino:model=face-detection-adas-0001.xml:input=data:output=detection_out:confidence=0.6:labels=face-detection-adas-0001.label,dnn_classify=dnn_backend=openvino:model=emotions-recognition-retail-0003.xml:input=data:output=prob_emotion:confidence=0.3:labels=emotions-recognition-retail-0003.label:target=face,showinfo -f null -
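The -vf argument above is long and easy to mistype. As a minimal sketch, the same argument can be assembled from per-filter option lists in Python. The helpers filter_spec and build_filtergraph are hypothetical names introduced here for illustration and are not part of FFmpeg; the sketch also assumes that no option value contains ':' or ',', since it does no escaping.

```python
def filter_spec(name, options=None):
    """Format one filter as name=key=value:key=value (hypothetical helper)."""
    if not options:
        return name
    return name + "=" + ":".join(f"{key}={value}" for key, value in options)


def build_filtergraph(filters):
    """Join filter specs with ',' to form a linear filter chain."""
    return ",".join(filter_spec(name, options) for name, options in filters)


# Option values are copied from the command in this commit message.
detect_options = [
    ("dnn_backend", "openvino"),
    ("model", "face-detection-adas-0001.xml"),
    ("input", "data"),
    ("output", "detection_out"),
    ("confidence", "0.6"),
    ("labels", "face-detection-adas-0001.label"),
]
classify_options = [
    ("dnn_backend", "openvino"),
    ("model", "emotions-recognition-retail-0003.xml"),
    ("input", "data"),
    ("output", "prob_emotion"),
    ("confidence", "0.3"),
    ("labels", "emotions-recognition-retail-0003.label"),
    ("target", "face"),
]

vf = build_filtergraph([
    ("dnn_detect", detect_options),
    ("dnn_classify", classify_options),
    ("showinfo", None),
])

# Argument list suitable for subprocess.run(); it is only built here, not executed.
cmd = ["./ffmpeg", "-i", "cici.jpg", "-vf", vf, "-f", "null", "-"]
print(" ".join(cmd))
```

Keeping the options as ordered pairs rather than a dict makes the order of the generated string explicit, so it can be compared directly against the hand-written command.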
Running the ffmpeg command above prints the detection and classification results via showinfo:
[Parsed_showinfo_2 @ 0x55b7d25e77c0] side data - detection bounding boxes:
[Parsed_showinfo_2 @ 0x55b7d25e77c0] source: face-detection-adas-0001.xml, emotions-recognition-retail-0003.xml
[Parsed_showinfo_2 @ 0x55b7d25e77c0] index: 0, region: (1005, 813) -> (1086, 905), label: face, confidence: 1/1.
[Parsed_showinfo_2 @ 0x55b7d25e77c0] classify: label: happy, confidence: 6757/1.
[Parsed_showinfo_2 @ 0x55b7d25e77c0] index: 1, region: (888, 839) -> (967, 926), label: face, confidence: 6917/1.
[Parsed_showinfo_2 @ 0x55b7d25e77c0] classify: label: anger, confidence: 4320/1.
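For anyone who wants to post-process this output, here is a rough Python sketch that pairs each bounding box with the classify results printed after it. It assumes each log message sits on a single line (not wrapped by a mail client) and that the field layout is exactly as in the sample above. The function name parse_showinfo is hypothetical, and confidences are kept as the raw integer pair printed in the log without interpretation.

```python
import re

# Patterns follow the field layout of the sample log in this commit message.
BBOX_RE = re.compile(
    r"index: (\d+), region: \((\d+), (\d+)\) -> \((\d+), (\d+)\), "
    r"label: ([^,]+), confidence: (\d+)/(\d+)"
)
CLASSIFY_RE = re.compile(r"classify:\s*label: ([^,]+), confidence: (\d+)/(\d+)")


def parse_showinfo(lines):
    """Return one dict per bounding box, with its classify results attached."""
    boxes = []
    for line in lines:
        m = BBOX_RE.search(line)
        if m:
            idx, x0, y0, x1, y1 = (int(g) for g in m.group(1, 2, 3, 4, 5))
            boxes.append({
                "index": idx,
                "region": (x0, y0, x1, y1),
                "label": m.group(6),
                "confidence": (int(m.group(7)), int(m.group(8))),
                "classify": [],
            })
            continue
        m = CLASSIFY_RE.search(line)
        if m and boxes:
            # A classify line belongs to the most recently seen bounding box.
            boxes[-1]["classify"].append(
                (m.group(1), (int(m.group(2)), int(m.group(3))))
            )
    return boxes


SAMPLE = """\
[Parsed_showinfo_2 @ 0x55b7d25e77c0] index: 0, region: (1005, 813) -> (1086, 905), label: face, confidence: 1/1.
[Parsed_showinfo_2 @ 0x55b7d25e77c0] classify: label: happy, confidence: 6757/1.
[Parsed_showinfo_2 @ 0x55b7d25e77c0] index: 1, region: (888, 839) -> (967, 926), label: face, confidence: 6917/1.
[Parsed_showinfo_2 @ 0x55b7d25e77c0] classify: label: anger, confidence: 4320/1.
"""

for box in parse_showinfo(SAMPLE.splitlines()):
    print(box["index"], box["region"], box["label"], box["classify"])
```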
Signed-off-by: Guo, Yejun
---
The main change in V2 of this patch set is a rebase onto the latest code,
resolving the conflicts.
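As a reading aid for the labels option documented in the doc/filters.texi hunk of this patch, the described file format can be sketched in Python as follows. This illustrates the documented rules only and is not the filter's C implementation. The names parse_labels and label_name are hypothetical, the sample text is made up for illustration (it is not the content of the real .label file), and two behaviours are assumptions: skipped empty lines do not consume a label id, and an out-of-range id falls back to the id itself.

```python
def parse_labels(text):
    """Map label id -> name: one name per line, first kept line is id 0."""
    labels = []
    for line in text.splitlines():
        name = line.rstrip()  # trailing spaces are skipped
        if not name:          # empty lines are skipped (assumed not to consume an id)
            continue
        labels.append(name)
    return labels


def label_name(labels, label_id):
    """Return the label name, or the id as a string when no name is available."""
    if labels is not None and 0 <= label_id < len(labels):
        return labels[label_id]
    # Without a label file, the label id itself is used as the name.
    return str(label_id)


# Hypothetical label file content, for illustration only.
sample = "neutral\nhappy  \n\nanger\n"
labels = parse_labels(sample)
print(labels)
print(label_name(labels, 1), label_name(None, 1))
```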
configure | 1 +
doc/filters.texi | 39
libavfilter/Makefile | 1 +
libavfilter/allfilters.c | 1 +
libavfilter/vf_dnn_classify.c | 330 ++
5 files changed, 372 insertions(+)
create mode 100644 libavfilter/vf_dnn_classify.c
diff --git a/configure b/configure
index 820f719a32..9f2dfaf2d4 100755
--- a/configure
+++ b/configure
@@ -3550,6 +3550,7 @@ derain_filter_select="dnn"
deshake_filter_select="pixelutils"
deshake_opencl_filter_deps="opencl"
dilation_opencl_filter_deps="opencl"
+dnn_classify_filter_select="dnn"
dnn_detect_filter_select="dnn"
dnn_processing_filter_select="dnn"
drawtext_filter_deps="libfreetype"
diff --git a/doc/filters.texi b/doc/filters.texi
index 36e35a175b..b405cc5dfb 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -10127,6 +10127,45 @@ ffmpeg -i INPUT -f lavfi -i nullsrc=hd720,geq='r=128+80*(sin(sqrt((X-W/2)*(X-W/2
@end example
@end itemize
+@section dnn_classify
+
+Do classification with deep neural networks based on bounding boxes.
+
+The filter accepts the following options:
+
+@table @option
+@item dnn_backend
+Specify which DNN backend to use for model loading and execution. This option accepts
+only openvino now; tensorflow backends will be added.
+
+@item model
+Set path to model file specifying network architecture and its parameters.
+Note that different backends use different file formats.
+
+@item input
+Set the input name of the dnn network.
+
+@item output
+Set the output name of the dnn network.
+
+@item confidence
+Set the confidence threshold (default: 0.5).
+
+@item labels
+Set path to label file specifying the mapping between label id and name.
+Each label name is written on one line; trailing spaces and empty lines are skipped.
+The first line is the name of label id 0,
+and the second line is the name of label id 1, etc.
+The label id is considered as name if the label file is not provided.
+
+@item backend_configs
+Set the configs to be passed into backend.
+
+For tensorflow backend, you can set its configs with @option{sess_config} options;
+please use tools/python/tf_sess_config.py to get the configs for your system.
+
+@end table
+
@section dnn_detect
Do object detection with deep neural networks.
diff --git a/libavfilter/Makefile b/libavfilter/Makefile
index 5a287364b0..6c22d0404e 100644
--- a/libavfilter/Makefile
+++ b/libavfilter/Makefile
@@ -243,6 +243,7 @@ OBJS-$(CONFIG_DILATION_FILTER) += vf_neighbor.o
OBJS-$(CONFIG_DILATION_OPENCL_FILTER)+= vf_neighbor_opencl.o opencl.o \
opencl/neighbor.o
OBJS-$(CONFIG_DISPLACE_FILTER) += vf_displace.o framesync.o
+OBJS-$(CONFIG_DNN_CLASSIFY_FILTER) += vf_dnn_classify.o
OBJS-$(CONFIG_DNN_DETECT_FILTER) += vf_dnn_detect.o
OBJS-$(CONFIG_DNN_PROCESSING_FILTER) += vf_dnn_processing.o
OBJS-$(CONFIG_DOUBLEWEAVE_FILTER)+= vf_weave.o
diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
index 931d7dbb0d..87c3661cf4 100644
--- a/libavfilter/allfilters.c
+++