Hi Federico,

I expect this is related to BAQ calculations

*2. Base alignment quality (BAQ) computation is turned on by default*. BAQ
is a phred-like score representing the probability that a read base is
mis-aligned; it lowers the base quality score of mismatches that are near
indels. This is to help rule out false positive SNP calls due to alignment
artifacts near small indels. There have been recent suggestions, however,
that BAQ may be too strict and cause real SNPs to be missed. Several users
of the *VarScan variant caller <http://varscan.sourceforge.net>*have
reported that its *read counts disagree with what is seen in IGV*, or somatic
mutations were missed
<http://sourceforge.net/projects/varscan/forums/forum/1073559/topic/5061782>
when mpileup was used instead of pileup. These issues are almost always due
to BAQ’s downgrade of base qualities to 0 or 1. This adjustment can’t be
seen in IGV, but it’s below VarScan’s default base quality threshold. You
can disable BAQ with the *-B* parameter, or perform a more sensitive BAQ
calculation with *-E*. I’ve heard that the latter option will be turned on
by default in the next version of SAMtools.

Try option --no-BAQ on the pileup with a reference.

Colin

On 17 June 2016 at 06:22, Federico Abascal <f...@sanger.ac.uk> wrote:

> Hi,
>
> I’ve found something very strange with samtools mpileup, looks like a bug.
>
> If I provide the reference fasta I get completely different results than
> if I don’t.
>
> Look at this example:
>
> ./samtools-1.3.1/samtools mpileup -r 20:36052981 *-f human_g1k_v37.fasta*
> file.bam
> 20 36052981 A 2 ,G >1
> 20 36052982 G 2 ,. >1
> 20 36052983 G 2 ,$. <3
> 20 36052984 A 9 .,..,,.,. <B7C77779
> 20 36052985 G 9 .,..,,.,. <B7C77779
> 20 36052986 A 9 .,..$,,.,. <F7=77779
> 20 36052987 A 8 GgGggGgG </777779
> 20 36052988 G 8 .,.,,.,. <F777779
> 20 36052989 G 8 .,.,,.,. <F777779
> 20 36052990 A 8 .,.,,.,. FFKKKKKK
> 20 36052991 G 8 .,.,,.,. FFKKKKKK
> 20 36052992 C 8 .,.,,.,. FFKKKKKK
> 20 36052993 T 8 .,.,,.,. FFKKKKKK
> 20 36052994 G 8 .,.,,.,. FFKKKKKK
> 20 36052995 A 8 .,.,,.,. BFKKKKKK
> 20 36052996 G 8 .,.$,,.,. BF>KKKKK
>
>
> Now look what we get if we do not provide a reference:
>
> ./samtools-1.3.1/samtools mpileup -r 20:36052981 file.bam
> 20 36052981 N 11 aaGgGGggGgG BBFFKKKKKKK
> 20 36052982 N 11 g$gGgGGggGgG BBFFKKKKKKK
> 20 36052983 N 10 g$GgGGggGgG <FFKKKKKKK
> 20 36052984 N 9 AaAAaaAaA FBKKKKKKK
> 20 36052985 N 9 GgGGggGgG FBKKKKKKK
> 20 36052986 N 9 AaAA$aaAaA FFKKKKKKK
> 20 36052987 N 8 GgGggGgG F/KKKKKK
> 20 36052988 N 8 GgGggGgG FFKKKKKK
> 20 36052989 N 8 GgGggGgG FFKKKKKK
> 20 36052990 N 8 AaAaaAaA FFKKKKKK
> 20 36052991 N 8 GgGggGgG FFKKKKKK
> 20 36052992 N 8 CcCccCcC FFKKKKKK
> 20 36052993 N 8 TtTttTtT FFKKKKKK
> 20 36052994 N 8 GgGggGgG FFKKKKKK
> 20 36052995 N 8 AaAaaAaA BFKKKKKK
> 20 36052996 N 8 GgG$ggGgG BFKKKKKK
>
>
> For site 36052981, if I provide the reference only two bases are reported,
> whereas without the reference 11 bases are found.
> Note that quality bases also change.
>
> I checked with IGV that the second case is the correct one (see snapshot).
> Interestingly, all but two of the reads that overlap 36052981 have an indel
> three bases before, so I guess the bug may be related to this.
>
> I am attaching a very small bam of the region. If it helps diagnosing the
> problem I can send more information for other “strange” cases.
>
> Regards,
> Federico
>
>
>
>
>
>
>
>
>
> ------------------------------------------------------------------------------
> What NetFlow Analyzer can do for you? Monitors network bandwidth and
> traffic
> patterns at an interface-level. Reveals which users, apps, and protocols
> are
> consuming the most bandwidth. Provides multi-vendor support for NetFlow,
> J-Flow, sFlow and other flows. Make informed decisions using capacity
> planning
> reports. http://sdm.link/zohomanageengine
> _______________________________________________
> Samtools-help mailing list
> Samtools-help@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/samtools-help
>
>
------------------------------------------------------------------------------
What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic
patterns at an interface-level. Reveals which users, apps, and protocols are 
consuming the most bandwidth. Provides multi-vendor support for NetFlow, 
J-Flow, sFlow and other flows. Make informed decisions using capacity planning
reports. http://sdm.link/zohomanageengine
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help

Reply via email to