To start the discussion, here is a brief overview of the four metrics we currently use in Daala. The reference code is in the tools/dump_*.c files in the Daala repository. Note that all of these metrics are applied to the luma plane only.
## PSNR

PSNR is a traditional signal quality metric, measured in decibels. It is directly derived from the mean squared error (MSE), or its square root (RMSE). The formula used is:

    20 * log10(MAX / RMSE)

or, equivalently:

    10 * log10(MAX^2 / MSE)

which is the method used in the dump_psnr.c reference implementation.

## PSNR-HVS-M

The PSNR-HVS metric performs a DCT transform on 8x8 blocks of the image, weights the coefficients, and then calculates the PSNR of those coefficients. Several different sets of weights have been considered. The weights used by the dump_psnrhvs.c tool have been found to be the best match to real MOS scores.

## SSIM

SSIM (Structural Similarity Image Metric) is a still-image quality metric introduced in 2004. It computes a score for each individual pixel, using a window of neighboring pixels. These scores can then be averaged to produce a global score for the entire image. The original paper produces scores ranging between 0 and 1. To make the metric appear more linear on BD-rate curves, the score is converted into a nonlinear decibel scale:

    -10 * log10(1 - SSIM)

## Fast Multi-Scale SSIM

Multi-Scale SSIM is SSIM extended to multiple window sizes. This is implemented by downscaling the image a number of times, computing SSIM at each scale, and then averaging the SSIM scores together. The final score is converted to decibels in the same manner as SSIM.

On 02/25/2015 01:39 PM, Mo Zanaty (mzanaty) wrote:
> This is perhaps getting into charter bashing, but I think we will need
> some early milestone (close to requirements) for an evaluation criteria
> document that represents the workgroup consensus on comparative testing
> methodology and selection of solution candidates or specific tools. The
> set of test sequences will only be one small part of that. Metrics will be
> a very important part of that.
> While I agree designing new metrics should
> probably be beyond the scope of proposed deliverables, I think we likely
> need a thorough evaluation and discussion of various metrics and reach
> some consensus on how proposed solutions and tools will be measured and
> adopted.
>
> Mo
>
> On 2/25/15, 3:05 PM, Timothy B. Terriberry <[email protected]> wrote:
>
> > Harald Alvestrand wrote:
> >> psnr values of 35 dB where x264 achieves 40 dB - it seems psnr isn't
> >> particularly sensitive to the resulting blurriness).
> >
> > Yes, it's well-known that PSNR loves low-passing. It's not the only
> > metric that's going to have these problems. FastSSIM will probably be
> > similarly blind. Fixable problems, maybe, but I don't want to get in the
> > business of designing my own metrics. I'm not even sure there's good
> > data on human preferences for when one should downsample, but I haven't
> > spent any time looking.

_______________________________________________
video-codec mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/video-codec
