Dear all,

trying to understand the effect of capping read mapping qualities with
the -C/--adjust-MQ options of samtools mpileup or samtools calmd:

The samtools docs state:

Given a read with a phred-scaled probability q of being generated from
the mapped position, the new mapping quality is about sqrt((INT-q)/INT)*INT.

but when considering this as the actual formula used, it looks really
weird (e.g., what about the common case INT-q < 0, but also it's weird
because this would invert the meaning of the quality score since with
higher original q you would get lower recalibrated scores).

Indeed when I checked the source code of sam_cap_mapq in htslib realn.c,
I found:

    for (i = 0, t = 1; i < mm; ++i)
        t *= (double)len / (i+1);
    t = q - 4.343 * log(t) + clip_q / 5.;
    if (t > thres) return -1;
    if (t < 0) t = 0;
    t = sqrt((thres - t) / thres) * thres;

where mm is the number of mismatches in the read, len the sum of
matches+mismatches, clip_q 13 times the number of hard-clipped bases (if
I read the code correctly) and q the sum of all mismatch qualities.

So that makes a lot more sense, but left me with three new questions:

1) the return -1: would that translate into a MAPQ of 255?

2) what is the scientific/empirical foundation of q - 4.343 * log(t) +
clip_q / 5. ?

3) how could I paraphrase these inner workings of the function so that
the words come closer to what's actually happening than the current docs
(to test my own understanding of things and to talk to other people
about it)?

Thanks in advance for any insights, help, suggestions!

Best,
Wolfgang





_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help

Reply via email to