I have reviewed this, and my main comment is that the testing draft reads as a 
description of the kinds of tests that could be done and the sorts of 
measurements that could be made; it does not specify the precise test 
conditions and metrics that should be used. 

To make progress in the group, I think it would be good to specify common test 
conditions, so that tools and technologies can be compared, together with the 
exact minimal set of results that should be reported. The common test 
conditions should specify codec configurations for the two codecs that have so 
far been proposed, quality/QP settings, test sequences, anchors, BDR metric 
calculations and so on. They should also not be so onerous that a compute farm 
is required to run them.

The test sets mentioned in the draft are quite large, and each group contains a 
lot of very similar material. Running simulations in high-complexity mode for 
Daala or Thor for multiple data points across the whole test set takes hundreds 
of CPU hours. It would therefore be valuable to identify, as part of the common 
test conditions, a smaller core set of sequences and the simulation parameters 
to be reported on. These could be checked periodically against the larger test 
sets for representativeness.

The draft refers to the BDR calculation tool provided as part of 
AreWeCompressedYet (AWCY). It would be useful to specify mathematically what 
this calculation is: whilst there are multiple implementations of the original 
BDR calculation, the only implementation of the spline-interpolated version is 
in these scripts. Having a specification would also allow the scripts 
themselves to be sanity-checked.
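To make the point concrete, here is a rough sketch of the kind of calculation a specification would need to pin down. The function name, the piecewise-linear interpolation, and the trapezoidal averaging are all my simplifications for brevity: the classic Bjontegaard definition fits a cubic polynomial to log-rate over quality, and the AWCY scripts use spline interpolation, so exact figures from those tools will differ.

```python
import math

def bd_rate(anchor, test, samples=100):
    """Sketch of a Bjontegaard-delta-rate (BDR) calculation: the average
    percentage bitrate change of `test` relative to `anchor` at equal
    quality.  Each curve is a list of (bitrate, quality) points, e.g.
    quality in dB PSNR.  Uses piecewise-linear interpolation of
    log10(rate) over quality for brevity (hypothetical simplification;
    the original uses a cubic fit and AWCY a spline)."""
    def prep(curve):
        pts = sorted((q, math.log10(r)) for r, q in curve)
        return [q for q, _ in pts], [lr for _, lr in pts]

    def interp(xs, ys, x):
        # clamp at the endpoints to guard against floating-point drift
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        i = next(j for j in range(len(xs) - 1) if xs[j] <= x <= xs[j + 1])
        t = (x - xs[i]) / (xs[i + 1] - xs[i])
        return ys[i] + t * (ys[i + 1] - ys[i])

    qa, la = prep(anchor)
    qt, lt = prep(test)
    # BDR is defined only over the overlapping *quality* interval
    lo, hi = max(qa[0], qt[0]), min(qa[-1], qt[-1])
    if lo >= hi:
        raise ValueError("quality ranges do not overlap")

    # trapezoidal average of the log-rate difference over [lo, hi]
    total = 0.0
    for i in range(samples + 1):
        q = lo + (hi - lo) * i / samples
        w = 0.5 if i in (0, samples) else 1.0
        total += w * (interp(qt, lt, q) - interp(qa, la, q))
    return (10.0 ** (total / samples) - 1.0) * 100.0
```

For example, a test curve that uses 10% less rate than the anchor at every quality point yields a BDR of -10%. A specification at roughly this level of detail would let any implementation, including the AWCY scripts, be checked against reference inputs.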

I think it also needs to be specified how BDR numbers from tests should be 
aggregated to produce a BDR figure for a category or for a whole test set. AWCY 
does an unusual thing in computing a combined BDR as if all the individual 
sequences had been concatenated. This means that the sequence with the largest 
bit rate tends to dominate, and the correlation between the metrics and visual 
quality is worsened, since that correlation is quite strong for single 
sequences but pretty weak across sequences. This combined figure is potentially 
very misleading. I think averaging the BDR figures for the individual sequences 
would be much better, and that should be mandated in the draft.
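To illustrate the difference between the two aggregations, here is a small sketch. The concatenation model below is a hypothetical simplification of my own (weighting each sequence by its share of the total anchor bitrate); the actual AWCY scripts recompute the combined curve from summed rates, but the dominance effect is the same.

```python
def mean_bd_rate(per_sequence):
    """Per-sequence averaging: every sequence contributes equally to
    the aggregate figure, regardless of its bitrate."""
    return sum(per_sequence.values()) / len(per_sequence)

def concatenated_bd_rate(per_sequence, anchor_rates):
    """Rough model of a concatenation-style combination (hypothetical
    simplification): each sequence is weighted by its share of the
    total anchor bitrate, so high-rate sequences dominate."""
    total = sum(anchor_rates.values())
    return sum(bd * anchor_rates[s] / total
               for s, bd in per_sequence.items())
```

With a high-rate sequence showing -2% and a low-rate sequence showing -8%, the per-sequence average reports -5%, while the bitrate-weighted combination reports a figure close to the high-rate sequence alone, hiding most of the gain on the other content.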

The draft requires that at least 10 samples be used. I think the number needed 
depends on the range to be covered, on whether very low and very high rates 
must be included, and on which metrics are being tested. FASTSSIM is the least 
log-linear metric and does tend to require more points. But it also has the 
most unpredictable relationship with quality, sometimes indicating big losses 
where there are visual improvements, and sometimes vice versa. So I am not sure 
that a large number of points should be mandated; only 4 were deemed necessary 
for AVC and HEVC standardization.

The draft mentions three bit rate ranges, and it implies that BDR should be 
computed for each of them individually. That seemed like a good idea initially, 
but having experimented with it I have found it quite problematic, because BDR 
requires an overlap in *quality* ranges, not bit rate ranges. Two curves being 
compared may therefore overlap only slightly, or not at all. In some cases it 
has also proved impossible to configure a codec to provide data points covering 
the whole of the upper or the whole of the lower range. In the AWCY tool, this 
results in missing data and invalidates all the aggregate BDR values. 
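A common-conditions document could require this overlap to be checked before any per-range BDR is reported. A sketch of such a check (the function name and return convention are my own):

```python
def quality_overlap(anchor_qualities, test_qualities):
    """Return the quality interval over which two RD curves both have
    coverage, or None if they do not overlap.  BDR is only defined on
    this interval: two curves can share a bit rate range yet have no
    usable overlap in quality, which is the failure mode described
    above."""
    lo = max(min(anchor_qualities), min(test_qualities))
    hi = min(max(anchor_qualities), max(test_qualities))
    return (lo, hi) if lo < hi else None
```

Reporting the overlap interval (or its absence) alongside each per-range BDR would at least make the missing-data cases visible instead of silently invalidating the aggregates.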

Best regards

Thomas


-----Original Message-----
From: video-codec [mailto:[email protected]] On Behalf Of 
[email protected]
Sent: Tuesday, March 01, 2016 2:38 AM
To: [email protected]
Cc: [email protected]
Subject: [video-codec] I-D Action: draft-ietf-netvc-testing-01.txt


A New Internet-Draft is available from the on-line Internet-Drafts directories.
This draft is a work item of the Internet Video Codec of the IETF.

        Title           : Video Codec Testing and Quality Measurement
        Authors         : Thomas Daede
                          Andrey Norkin
        Filename        : draft-ietf-netvc-testing-01.txt
        Pages           : 11
        Date            : 2016-02-29

Abstract:
   This document describes guidelines and procedures for evaluating a
   video codec specified at the IETF.  This covers subjective and
   objective tests, test conditions, and materials used for the test.


The IETF datatracker status page for this draft is:
https://datatracker.ietf.org/doc/draft-ietf-netvc-testing/

There's also a htmlized version available at:
https://tools.ietf.org/html/draft-ietf-netvc-testing-01

A diff from the previous version is available at:
https://www.ietf.org/rfcdiff?url2=draft-ietf-netvc-testing-01


Please note that it may take a couple of minutes from the time of submission 
until the htmlized version and diff are available at tools.ietf.org.

Internet-Drafts are also available by anonymous FTP at:
ftp://ftp.ietf.org/internet-drafts/

_______________________________________________
video-codec mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/video-codec
