On 03/04/2016 08:27 AM, Thomas Davies (thdavies) wrote:
> I have reviewed this and my main comment is that the testing draft reads as a 
> description of the kind of tests that could be done, and the sort of 
> measurements that could be made, and does not specify the precise test 
> conditions and metrics that should be used. 
> 
> I think that to make progress in the group, it would be good to have common 
> test conditions specified, so that tools and technologies can be compared, 
> along with the exact minimal set of results that should be reported. The 
> common test conditions should specify codec configurations for the two 
> codecs that have so far been proposed, quality/QP settings, test sequences, 
> anchors, BDR metric calculations, and so on. They should also not be so 
> onerous that running them requires a compute farm.
>
> The test sets mentioned in the draft are quite large, and each group contains 
> a lot of very similar material. Running simulations in high complexity mode 
> for Daala or Thor for multiple data points across the whole test set takes 
> hundreds of CPU hours. So it would be valuable to identify a core smaller set 
> of sequences, and simulation parameters to be reported on, as part of the 
> common test conditions. These could be periodically checked against the 
> larger test sets for representativeness.

I'll update the draft with the exact sequences specified for the
categories in "Automation". For feature changes, I think a total number
of frames of content similar to that used in HEVC testing would be
reasonable.

> The draft refers to the BDR calculation tool provided as part of 
> AreWeCompressedYet. It would be useful to specify mathematically what this 
> calculation is: whilst there are multiple implementations of the original 
> BDR calculation, the only implementation of the spline-interpolated version 
> is in these scripts. Having a specification would also mean that these 
> scripts can be sanity-checked.

One of the downsides of the cubic Hermite spline (and similar
alternatives, like PCHIP) is that it is somewhat complex to specify.
Still, I agree it would be better to specify it in the test document
than to rely on a particular implementation.
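For reference, here is a minimal sketch (not the AWCY implementation) of a spline-interpolated Bjøntegaard-style delta rate, using SciPy's monotone PCHIP interpolator; the function name and input curves are hypothetical:

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def bd_rate_percent(qual_a, rate_a, qual_b, rate_b):
    """Average rate change (percent) of curve B vs. curve A at equal
    quality, over the overlapping quality interval of the two curves."""
    qa, ra = np.asarray(qual_a, float), np.asarray(rate_a, float)
    qb, rb = np.asarray(qual_b, float), np.asarray(rate_b, float)
    # Fit log-rate as a monotone (PCHIP) function of quality.
    oa, ob = np.argsort(qa), np.argsort(qb)
    fa = PchipInterpolator(qa[oa], np.log10(ra[oa]))
    fb = PchipInterpolator(qb[ob], np.log10(rb[ob]))
    # Integrate only where the quality ranges of the curves overlap.
    lo = max(qa.min(), qb.min())
    hi = min(qa.max(), qb.max())
    avg_log_diff = (fb.integrate(lo, hi) - fa.integrate(lo, hi)) / (hi - lo)
    return (10.0 ** avg_log_diff - 1.0) * 100.0
```

A written specification would pin down exactly these choices: the interpolation basis, the integration interval, and the log/percent conversion.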

> I think it also needs to be specified how BDR numbers from tests should be 
> aggregated to produce a BDR figure for a category or a whole test set. AWCY 
> does an unusual thing in computing a combined BDR as if all the individual 
> sequences had been concatenated. This means that the sequence with the 
> largest bit rate tends to dominate, and the correlation between the metrics 
> and visual quality is worsened, since that correlation is quite strong for 
> single sequences but pretty weak across sequences. Potentially this number 
> is very misleading. I think averaging the BDR figures for individual 
> sequences would be much better, and that should be mandated in the draft.

Sounds good. I propose the following text for the BD-Rate section:

### Result aggregation

The total BD-rate change for a set of test clips should be calculated as
the average of the rate changes (in percent) calculated for the
individual test clips.
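As a toy illustration of the difference (function names are hypothetical, and the concatenation-style weighting is only a rough model of what AWCY computes, not its actual code):

```python
def mean_bd_rate(bd_rates):
    """Proposed rule: equal-weight average of per-clip BD-rate (percent)."""
    return sum(bd_rates) / len(bd_rates)

def concat_bd_rate(bd_rates, anchor_bits):
    """Rough model of the concatenation-style aggregate: clips with
    larger anchor bit totals dominate the combined figure."""
    total = sum(anchor_bits)
    return sum(r * b for r, b in zip(bd_rates, anchor_bits)) / total
```

For example, with per-clip BD-rates of [-10, -1, -1] percent and anchor bit totals of [1, 100, 100], the equal-weight mean is -4%, while the concatenation-style figure is about -1%, dominated by the two high-rate clips.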

> The draft requires that at least 10 samples should be used. I think this 
> depends on the range that you want to cover, and whether it is necessary to 
> cover very low and very high rates, and also what metrics are being tested. 
> FASTSSIM is the least log-linear metric, and does tend to require more 
> points. But it also has the most unpredictable relation with quality, 
> sometimes indicating big losses when there are visual improvements, sometimes 
> vice-versa. So I am not so sure that a large number of points should be 
> mandated. Only 4 were deemed necessary for AVC and HEVC standardization.

I realized I specified two numbers in this section. I think I'll drop
the 10-sample requirement and rely on the "4 samples within the
measured range" rule. This implies at least 6 samples, unless the range
boundaries coincide exactly with sample points.
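A quick sanity check of that count, with hypothetical quantizer sample points and range bounds:

```python
# Hypothetical: 6 sample points, 4 of which fall inside the measured
# range; the two outside it are still needed to interpolate the rate-
# quality curve at the range boundaries.
samples = [20, 26, 32, 39, 43, 49]   # quantizer settings (hypothetical)
lo, hi = 24, 46                      # measured range (hypothetical)
inside = [s for s in samples if lo <= s <= hi]
assert len(inside) == 4 and len(samples) == 6
```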

> The draft mentions three bit rate ranges, and it implies that BDR should be 
> computed for them individually. That seemed like a good idea initially, but 
> since experimenting with this I've found it pretty problematic, because BDR 
> requires an overlap in *quality* ranges, not bit rate ranges. Therefore two 
> curves being compared may overlap only partially, or not at all. In some 
> cases it has also proved impossible to configure a codec to provide data 
> points that cover the whole of the upper or the whole of the lower range. On 
> the AWCY tool, this results in missing data and invalidates all the 
> aggregate BDR values. 

Bit rates are mapped to qualities based on the first curve. However,
this does not solve the issue that codecs can't hit the required bitrate
(this is most apparent in the Twitch set). Options include:

a) Choosing bit rate ranges per test sequence.
b) Choosing bit rate ranges wide enough to cover all test sequences.
c) Defining ranges based on quantizers of the first curve.
d) Defining ranges based on the intersection of qualities bounded by the
quantizers for both curves.
e) Defining ranges based on quantizers of a third, reference curve.

I think you might have suggested option e) to me before. I'm going to
try an implementation of it on the AWCY site.
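A sketch of what option e) might look like (all names and curve data are hypothetical, not the AWCY implementation): two anchor quantizers on the shared reference curve are mapped to quality bounds, which then define the range applied to every tested curve.

```python
import numpy as np

def range_quality_bounds(ref_quantizers, ref_quality, q_lo, q_hi):
    """Map a pair of anchor quantizers to quality bounds via a shared
    reference curve, interpolating linearly between its points."""
    order = np.argsort(ref_quantizers)
    q = np.asarray(ref_quantizers, float)[order]
    y = np.asarray(ref_quality, float)[order]
    lo_q, hi_q = np.interp([q_lo, q_hi], q, y)
    # Higher quantizers give lower quality, so sort the bounds.
    return min(lo_q, hi_q), max(lo_q, hi_q)
```

Because the bounds come from one reference curve, every tested codec is evaluated over the same quality interval, sidestepping the missing-overlap problem described above.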

> Best regards
> 
> Thomas
> 
> 
> -----Original Message-----
> From: video-codec [mailto:[email protected]] On Behalf Of 
> [email protected]
> Sent: Tuesday, March 01, 2016 2:38 AM
> To: [email protected]
> Cc: [email protected]
> Subject: [video-codec] I-D Action: draft-ietf-netvc-testing-01.txt
> 
> 
> A New Internet-Draft is available from the on-line Internet-Drafts 
> directories.
> This draft is a work item of the Internet Video Codec of the IETF.
> 
>         Title           : Video Codec Testing and Quality Measurement
>         Authors         : Thomas Daede
>                           Andrey Norkin
>         Filename        : draft-ietf-netvc-testing-01.txt
>         Pages           : 11
>         Date            : 2016-02-29
> 
> Abstract:
>    This document describes guidelines and procedures for evaluating a
>    video codec specified at the IETF.  This covers subjective and
>    objective tests, test conditions, and materials used for the test.
> 
> 
> The IETF datatracker status page for this draft is:
> https://datatracker.ietf.org/doc/draft-ietf-netvc-testing/
> 
> There's also a htmlized version available at:
> https://tools.ietf.org/html/draft-ietf-netvc-testing-01
> 
> A diff from the previous version is available at:
> https://www.ietf.org/rfcdiff?url2=draft-ietf-netvc-testing-01
> 
> 
> Please note that it may take a couple of minutes from the time of submission 
> until the htmlized version and diff are available at tools.ietf.org.
> 
> Internet-Drafts are also available by anonymous FTP at:
> ftp://ftp.ietf.org/internet-drafts/
> 
> _______________________________________________
> video-codec mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/video-codec
> 
