Thank you Colin, Yes, that’s the reason for the discrepancy! It seems realignment and BQ recalibration may not be a good choice in this case. The upstream indel that give rise to the discrepancies is supported by several reads and corresponds to a known SNP (rs11481705)
It is said that realignment for the computation of base alignment quality (BAQ) reduces the number of false positives. If I have used a very stringent alignment approach and also realigned with GATK, is this feature of samtools still recommended? Regards, Federico > On 17 Jun 2016, at 01:55, Colin Hercus <co...@novocraft.com> wrote: > > Hi Federico, > > I expect this is related to BAQ calculations > > 2. Base alignment quality (BAQ) computation is turned on by default. > > BAQ is a phred-like score representing the probability that a read base is > mis-aligned; it lowers the base quality score of mismatches that are near > indels. This is to help rule out false positive SNP calls due to alignment > artifacts near small indels. There have been recent suggestions, however, > that BAQ may be too strict and cause real SNPs to be missed. Several users of > the VarScan variant caller <http://varscan.sourceforge.net/>have reported > that its read counts disagree with what is seen in IGV, or somatic mutations > were missed > <http://sourceforge.net/projects/varscan/forums/forum/1073559/topic/5061782> > when mpileup was used instead of pileup. These issues are almost always due > to BAQ’s downgrade of base qualities to 0 or 1. This adjustment can’t be seen > in IGV, but it’s below VarScan’s default base quality threshold. You can > disable BAQ with the -B parameter, or perform a more sensitive BAQ > calculation with -E. I’ve heard that the latter option will be turned on by > default in the next version of SAMtools. > > Try option --no-BAQ on the pileup with a reference. > > Colin > > On 17 June 2016 at 06:22, Federico Abascal <f...@sanger.ac.uk > <mailto:f...@sanger.ac.uk>> wrote: > Hi, > > I’ve found something very strange with samtools mpileup, looks like a bug. > > If I provide the reference fasta I get completely different results than if I > don’t. > > Look at this example: > ./samtools-1.3.1/samtools mpileup -r 20:36052981 -f human_g1k_v37.fasta > file.bam > 20 36052981 A 2 ,G >1 > 20 36052982 G 2 ,. >1 > 20 36052983 G 2 ,$. <3 > 20 36052984 A 9 .,..,,.,. <B7C77779 > 20 36052985 G 9 .,..,,.,. <B7C77779 > 20 36052986 A 9 .,..$,,.,. <F7=77779 > 20 36052987 A 8 GgGggGgG </777779 > 20 36052988 G 8 .,.,,.,. <F777779 > 20 36052989 G 8 .,.,,.,. <F777779 > 20 36052990 A 8 .,.,,.,. FFKKKKKK > 20 36052991 G 8 .,.,,.,. FFKKKKKK > 20 36052992 C 8 .,.,,.,. FFKKKKKK > 20 36052993 T 8 .,.,,.,. FFKKKKKK > 20 36052994 G 8 .,.,,.,. FFKKKKKK > 20 36052995 A 8 .,.,,.,. BFKKKKKK > 20 36052996 G 8 .,.$,,.,. BF>KKKKK > > Now look what we get if we do not provide a reference: > ./samtools-1.3.1/samtools mpileup -r 20:36052981 file.bam > 20 36052981 N 11 aaGgGGggGgG BBFFKKKKKKK > 20 36052982 N 11 g$gGgGGggGgG BBFFKKKKKKK > 20 36052983 N 10 g$GgGGggGgG <FFKKKKKKK > 20 36052984 N 9 AaAAaaAaA FBKKKKKKK > 20 36052985 N 9 GgGGggGgG FBKKKKKKK > 20 36052986 N 9 AaAA$aaAaA FFKKKKKKK > 20 36052987 N 8 GgGggGgG F/KKKKKK > 20 36052988 N 8 GgGggGgG FFKKKKKK > 20 36052989 N 8 GgGggGgG FFKKKKKK > 20 36052990 N 8 AaAaaAaA FFKKKKKK > 20 36052991 N 8 GgGggGgG FFKKKKKK > 20 36052992 N 8 CcCccCcC FFKKKKKK > 20 36052993 N 8 TtTttTtT FFKKKKKK > 20 36052994 N 8 GgGggGgG FFKKKKKK > 20 36052995 N 8 AaAaaAaA BFKKKKKK > 20 36052996 N 8 GgG$ggGgG BFKKKKKK > > For site 36052981, if I provide the reference only two bases are reported, > whereas without the reference 11 bases are found. > Note that quality bases also change. > > I checked with IGV that the second case is the correct one (see snapshot). > Interestingly, all but two of the reads that overlap 36052981 have an indel > three bases before, so I guess the bug may be related to this. > > I am attaching a very small bam of the region. If it helps diagnosing the > problem I can send more information for other “strange” cases. > > Regards, > Federico > > > > > > > > > ------------------------------------------------------------------------------ > What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic > patterns at an interface-level. Reveals which users, apps, and protocols are > consuming the most bandwidth. Provides multi-vendor support for NetFlow, > J-Flow, sFlow and other flows. Make informed decisions using capacity planning > reports. http://sdm.link/zohomanageengine <http://sdm.link/zohomanageengine> > _______________________________________________ > Samtools-help mailing list > Samtools-help@lists.sourceforge.net > <mailto:Samtools-help@lists.sourceforge.net> > https://lists.sourceforge.net/lists/listinfo/samtools-help > <https://lists.sourceforge.net/lists/listinfo/samtools-help> > >
smime.p7s
Description: S/MIME cryptographic signature
------------------------------------------------------------------------------ What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic patterns at an interface-level. Reveals which users, apps, and protocols are consuming the most bandwidth. Provides multi-vendor support for NetFlow, J-Flow, sFlow and other flows. Make informed decisions using capacity planning reports. http://sdm.link/zohomanageengine
_______________________________________________ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help