I have reviewed this, and my main comment is that the testing draft reads as a description of the kinds of tests that could be done and the sorts of measurements that could be made, rather than specifying the precise test conditions and metrics that should be used.
I think that to make progress in the group, it would be good to specify common test conditions so that tools and technologies can be compared, together with the exact minimal set of results that should be reported. The common test conditions should specify codec configurations for the two codecs that have so far been proposed, quality/QP settings, test sequences, anchors, BD-rate (BDR) metric calculations, and so on. They should also not be so onerous that a compute farm is required to run them.

The test sets mentioned in the draft are quite large, and each group contains a lot of very similar material. Running simulations in high-complexity mode for Daala or Thor, for multiple data points across the whole test set, takes hundreds of CPU hours. So it would be valuable to identify a smaller core set of sequences, and the simulation parameters to be reported on, as part of the common test conditions. These could be periodically checked against the larger test sets for representativeness.

The draft refers to the BDR calculation tool provided as part of AreWeCompressedYet (AWCY). It would be useful to specify mathematically what this calculation is: whilst there are multiple implementations of the original BDR calculation, the only implementation of the spline-interpolated version is in these scripts. Having a specification would also allow the scripts themselves to be sanity-checked.

I think it also needs to be specified how BDR numbers from individual tests should be aggregated to produce a BDR figure for a category or a whole test set. AWCY does an unusual thing in computing a combined BDR as if all the individual sequences had been concatenated. This means that the sequence with the largest bit rate tends to dominate, and the correlation between the metrics and visual quality is worsened: that correlation is quite strong for single sequences but pretty weak across sequences. This number is potentially very misleading.
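To make the point concrete, here is a minimal sketch of what such a specification might pin down. This is not the AWCY implementation; the function names, the choice of monotone piecewise-cubic (PCHIP) interpolation of log-rate against quality, and the per-sequence averaging are my assumptions for illustration only.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator


def bd_rate(anchor, test):
    """BD-rate of `test` relative to `anchor`, in percent.

    `anchor` and `test` are lists of (bitrate, quality) points, e.g.
    quality in dB PSNR. Log-rate is interpolated as a function of
    quality and averaged over the *overlapping quality interval* --
    note that the curves must overlap in quality, not bit rate.
    """
    def prep(points):
        pts = sorted(points, key=lambda p: p[1])
        quality = np.array([p[1] for p in pts])
        log_rate = np.log(np.array([p[0] for p in pts]))
        return quality, PchipInterpolator(quality, log_rate)

    qa, fa = prep(anchor)
    qt, ft = prep(test)
    lo = max(qa.min(), qt.min())
    hi = min(qa.max(), qt.max())
    if lo >= hi:
        raise ValueError("quality ranges do not overlap; BD-rate undefined")
    # Mean log-rate over the common quality interval for each curve.
    avg_anchor = fa.integrate(lo, hi) / (hi - lo)
    avg_test = ft.integrate(lo, hi) / (hi - lo)
    # Difference of mean log-rates, converted to a percentage change.
    return (np.exp(avg_test - avg_anchor) - 1.0) * 100.0


def aggregate(per_sequence_bdr):
    """Aggregate per-sequence BD-rates by a simple mean,
    rather than by concatenating sequences."""
    return sum(per_sequence_bdr) / len(per_sequence_bdr)
```

A curve whose rates are uniformly half the anchor's at the same quality points comes out at -50%, independently of which sequence carries the most bits, which is the behaviour the concatenation-style aggregate does not have.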
I think averaging the BDR figures for individual sequences would be much better, and that this should be mandated in the draft.

The draft requires that at least 10 samples be used. I think the right number depends on the range you want to cover, on whether it is necessary to cover very low and very high rates, and also on which metrics are being tested. FASTSSIM is the least log-linear metric and does tend to require more points. But it also has the most unpredictable relation to quality, sometimes indicating big losses when there are visual improvements, and sometimes vice versa. So I am not so sure that a large number of points should be mandated; only 4 were deemed necessary for AVC and HEVC standardization.

The draft mentions three bit rate ranges, and it implies that BDR should be computed for each of them individually. That seemed like a good idea initially, but since experimenting with this I have found it quite problematic, because BDR requires an overlap in *quality* ranges, not bit rate ranges. Two curves being compared may therefore overlap only slightly, or not at all. In some cases it has also proved impossible to configure a codec to provide data points that cover the whole of the upper or the whole of the lower range. On the AWCY tool, this results in missing data and invalidates all the aggregate BDR values.

Best regards,

Thomas

-----Original Message-----
From: video-codec [mailto:[email protected]] On Behalf Of [email protected]
Sent: Tuesday, March 01, 2016 2:38 AM
To: [email protected]
Cc: [email protected]
Subject: [video-codec] I-D Action: draft-ietf-netvc-testing-01.txt

A New Internet-Draft is available from the on-line Internet-Drafts directories. This draft is a work item of the Internet Video Codec working group of the IETF.
Title    : Video Codec Testing and Quality Measurement
Authors  : Thomas Daede
           Andrey Norkin
Filename : draft-ietf-netvc-testing-01.txt
Pages    : 11
Date     : 2016-02-29

Abstract:
   This document describes guidelines and procedures for evaluating a
   video codec specified at the IETF. This covers subjective and
   objective tests, test conditions, and materials used for the test.

The IETF datatracker status page for this draft is:
https://datatracker.ietf.org/doc/draft-ietf-netvc-testing/

There's also a htmlized version available at:
https://tools.ietf.org/html/draft-ietf-netvc-testing-01

A diff from the previous version is available at:
https://www.ietf.org/rfcdiff?url2=draft-ietf-netvc-testing-01

Please note that it may take a couple of minutes from the time of submission until the htmlized version and diff are available at tools.ietf.org.

Internet-Drafts are also available by anonymous FTP at:
ftp://ftp.ietf.org/internet-drafts/

_______________________________________________
video-codec mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/video-codec
