Thank you Colin,

Yes, that’s the reason for the discrepancy!
It seems realignment and BQ recalibration may not be a good choice in this 
case. The upstream indel that give rise to the discrepancies is supported by 
several reads and corresponds to a known SNP (rs11481705)

It is said that realignment for the computation of base alignment quality (BAQ) 
reduces the number of false positives. If I have used a very stringent 
alignment approach and also realigned with GATK, is this feature of samtools 
still recommended? 

Regards,
Federico

 


> On 17 Jun 2016, at 01:55, Colin Hercus <co...@novocraft.com> wrote:
> 
> Hi Federico,
> 
> I expect this is related to BAQ calculations
> 
> 2. Base alignment quality (BAQ) computation is turned on by default.
> 
> BAQ is a phred-like score representing the probability that a read base is 
> mis-aligned; it lowers the base quality score of mismatches that are near 
> indels. This is to help rule out false positive SNP calls due to alignment 
> artifacts near small indels. There have been recent suggestions, however, 
> that BAQ may be too strict and cause real SNPs to be missed. Several users of 
> the VarScan variant caller  <http://varscan.sourceforge.net/>have reported 
> that its read counts disagree with what is seen in IGV, or somatic mutations 
> were missed 
> <http://sourceforge.net/projects/varscan/forums/forum/1073559/topic/5061782> 
> when mpileup was used instead of pileup. These issues are almost always  due 
> to BAQ’s downgrade of base qualities to 0 or 1. This adjustment can’t be seen 
> in IGV, but it’s below VarScan’s default base quality threshold. You can 
> disable BAQ with the -B parameter, or perform a more sensitive BAQ 
> calculation with -E. I’ve heard that the latter option will be turned on by 
> default in the next version of SAMtools.
> 
> Try option --no-BAQ on the pileup with a reference.
> 
> Colin
> 
> On 17 June 2016 at 06:22, Federico Abascal <f...@sanger.ac.uk 
> <mailto:f...@sanger.ac.uk>> wrote:
> Hi,
> 
> I’ve found something very strange with samtools mpileup, looks like a bug.
> 
> If I provide the reference fasta I get completely different results than if I 
> don’t.
> 
> Look at this example:
> ./samtools-1.3.1/samtools mpileup -r 20:36052981 -f human_g1k_v37.fasta 
> file.bam
> 20    36052981        A       2       ,G      >1
> 20    36052982        G       2       ,.      >1
> 20    36052983        G       2       ,$.     <3
> 20    36052984        A       9       .,..,,.,.       <B7C77779
> 20    36052985        G       9       .,..,,.,.       <B7C77779
> 20    36052986        A       9       .,..$,,.,.      <F7=77779
> 20    36052987        A       8       GgGggGgG        </777779
> 20    36052988        G       8       .,.,,.,.        <F777779
> 20    36052989        G       8       .,.,,.,.        <F777779
> 20    36052990        A       8       .,.,,.,.        FFKKKKKK
> 20    36052991        G       8       .,.,,.,.        FFKKKKKK
> 20    36052992        C       8       .,.,,.,.        FFKKKKKK
> 20    36052993        T       8       .,.,,.,.        FFKKKKKK
> 20    36052994        G       8       .,.,,.,.        FFKKKKKK
> 20    36052995        A       8       .,.,,.,.        BFKKKKKK
> 20    36052996        G       8       .,.$,,.,.       BF>KKKKK
> 
> Now look what we get if we do not provide a reference:
> ./samtools-1.3.1/samtools mpileup -r 20:36052981 file.bam
> 20    36052981        N       11      aaGgGGggGgG     BBFFKKKKKKK
> 20    36052982        N       11      g$gGgGGggGgG    BBFFKKKKKKK
> 20    36052983        N       10      g$GgGGggGgG     <FFKKKKKKK
> 20    36052984        N       9       AaAAaaAaA       FBKKKKKKK
> 20    36052985        N       9       GgGGggGgG       FBKKKKKKK
> 20    36052986        N       9       AaAA$aaAaA      FFKKKKKKK
> 20    36052987        N       8       GgGggGgG        F/KKKKKK
> 20    36052988        N       8       GgGggGgG        FFKKKKKK
> 20    36052989        N       8       GgGggGgG        FFKKKKKK
> 20    36052990        N       8       AaAaaAaA        FFKKKKKK
> 20    36052991        N       8       GgGggGgG        FFKKKKKK
> 20    36052992        N       8       CcCccCcC        FFKKKKKK
> 20    36052993        N       8       TtTttTtT        FFKKKKKK
> 20    36052994        N       8       GgGggGgG        FFKKKKKK
> 20    36052995        N       8       AaAaaAaA        BFKKKKKK
> 20    36052996        N       8       GgG$ggGgG       BFKKKKKK
> 
> For site 36052981, if I provide the reference only two bases are reported, 
> whereas without the reference 11 bases are found.
> Note that quality bases also change.
> 
> I checked with IGV that the second case is the correct one (see snapshot). 
> Interestingly, all but two of the reads that overlap 36052981 have an indel 
> three bases before, so I guess the bug may be related to this. 
> 
> I am attaching a very small bam of the region. If it helps diagnosing the 
> problem I can send more information for other “strange” cases.
> 
> Regards,
> Federico
> 
> 
> 
> 
> 
> 
> 
> 
> ------------------------------------------------------------------------------
> What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic
> patterns at an interface-level. Reveals which users, apps, and protocols are
> consuming the most bandwidth. Provides multi-vendor support for NetFlow,
> J-Flow, sFlow and other flows. Make informed decisions using capacity planning
> reports. http://sdm.link/zohomanageengine <http://sdm.link/zohomanageengine>
> _______________________________________________
> Samtools-help mailing list
> Samtools-help@lists.sourceforge.net 
> <mailto:Samtools-help@lists.sourceforge.net>
> https://lists.sourceforge.net/lists/listinfo/samtools-help 
> <https://lists.sourceforge.net/lists/listinfo/samtools-help>
> 
> 

Attachment: smime.p7s
Description: S/MIME cryptographic signature

------------------------------------------------------------------------------
What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic
patterns at an interface-level. Reveals which users, apps, and protocols are 
consuming the most bandwidth. Provides multi-vendor support for NetFlow, 
J-Flow, sFlow and other flows. Make informed decisions using capacity planning
reports. http://sdm.link/zohomanageengine
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help

Reply via email to