To start the discussion, here is a brief overview of the four metrics we currently use in Daala. The reference code is in the tools/dump_*.c files in the Daala repository. Note that all of these metrics are applied to the luma plane only.
## PSNR

PSNR is a traditional signal quality metric, measured in decibels. It is directly derived from the mean squared error (MSE), or its square root (RMSE). The formula used is:

    20 * log10(MAX / RMSE)

or, equivalently:

    10 * log10(MAX^2 / MSE)

which is the method used in the dump_psnr.c reference implementation.

## PSNR-HVS-M

The PSNR-HVS metric performs a DCT transform on 8x8 blocks of the image, weights the coefficients, and then calculates the PSNR of those coefficients. Several different sets of weights have been considered. The weights used by the dump_psnrhvs.c tool have been found to be the best match to real MOS scores.

## SSIM

SSIM (Structural Similarity Image Metric) is a still-image quality metric introduced in 2004. It computes a score for each individual pixel, using a window of neighboring pixels. These scores can then be averaged to produce a global score for the entire image. The original paper produces scores ranging between 0 and 1. To make the metric appear more linear on BD-rate curves, the score is converted into a nonlinear decibel scale:

    -10 * log10(1 - SSIM)

## Fast Multi-Scale SSIM

Multi-Scale SSIM is SSIM extended to multiple window sizes. This is implemented by downscaling the image a number of times, computing SSIM at each scale, and then averaging the SSIM scores together. The final score is converted to decibels in the same manner as SSIM.

On 02/25/2015 01:39 PM, Mo Zanaty (mzanaty) wrote:
> This is perhaps getting into charter bashing, but I think we will need
> some early milestone (close to requirements) for an evaluation criteria
> document that represents the workgroup consensus on comparative testing
> methodology and selection of solution candidates or specific tools. The
> set of test sequences will only be one small part of that. Metrics will be
> a very important part of that.
> While I agree designing new metrics should
> probably be beyond the scope of proposed deliverables, I think we likely
> need a thorough evaluation and discussion of various metrics and reach
> some consensus on how proposed solutions and tools will be measured and
> adopted.
>
> Mo
>
> On 2/25/15, 3:05 PM, Timothy B. Terriberry <[email protected]> wrote:
>
> > Harald Alvestrand wrote:
> >> psnr values of 35 dB where x264 achieves 40 dB - it seems psnr isn't
> >> particularly sensitive to the resulting blurriness).
> >
> > Yes, it's well-known that PSNR loves low-passing. It's not the only
> > metric that's going to have these problems. FastSSIM will probably be
> > similarly blind. Fixable problems, maybe, but I don't want to get in the
> > business of designing my own metrics. I'm not even sure there's good
> > data on human preferences for when one should downsample, but I haven't
> > spent any time looking.

_______________________________________________
video-codec mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/video-codec
