Hi Federico,
When I use samtools mpileup I usually turn BAQ off but then I only use it
for testing our aligner.
For production use we use Freebayes or multiple variant callers.
Brad Chapman has an interesting blog on Variant Callers where he concludes
Freebayes is best and without recalibration or realignment
https://bcbio.wordpress.com/2013/10/21/updated-comparison-of-variant-detection-methods-ensemble-freebayes-and-minimal-bam-preparation-pipelines/
Cheers, Colin
On 17 June 2016 at 15:04, Federico Abascal <f...@sanger.ac.uk> wrote:
> Thank you Colin,
>
> Yes, that’s the reason for the discrepancy!
> It seems realignment and BQ recalibration may not be a good choice in this
> case. The upstream indel that give rise to the discrepancies is supported
> by several reads and corresponds to a known SNP (rs11481705)
>
> It is said that realignment for the computation of base alignment quality
> (BAQ) reduces the number of false positives. If I have used a very
> stringent alignment approach and also realigned with GATK, is this feature
> of samtools still recommended?
>
> Regards,
> Federico
>
>
>
>
> On 17 Jun 2016, at 01:55, Colin Hercus <co...@novocraft.com> wrote:
>
> Hi Federico,
>
> I expect this is related to BAQ calculations
>
> *2. Base alignment quality (BAQ) computation is turned on by default*.
> BAQ is a phred-like score representing the probability that a read base is
> mis-aligned; it lowers the base quality score of mismatches that are near
> indels. This is to help rule out false positive SNP calls due to alignment
> artifacts near small indels. There have been recent suggestions, however,
> that BAQ may be too strict and cause real SNPs to be missed. Several users
> of the *VarScan variant caller <http://varscan.sourceforge.net/>*have
> reported that its *read counts disagree with what is seen in IGV*, or somatic
> mutations were missed
> <http://sourceforge.net/projects/varscan/forums/forum/1073559/topic/5061782>
> when mpileup was used instead of pileup. These issues are almost always due
> to BAQ’s downgrade of base qualities to 0 or 1. This adjustment can’t be
> seen in IGV, but it’s below VarScan’s default base quality threshold. You
> can disable BAQ with the *-B* parameter, or perform a more sensitive BAQ
> calculation with *-E*. I’ve heard that the latter option will be turned
> on by default in the next version of SAMtools.
>
> Try option --no-BAQ on the pileup with a reference.
>
> Colin
>
> On 17 June 2016 at 06:22, Federico Abascal <f...@sanger.ac.uk> wrote:
>
>> Hi,
>>
>> I’ve found something very strange with samtools mpileup, looks like a bug.
>>
>> If I provide the reference fasta I get completely different results than
>> if I don’t.
>>
>> Look at this example:
>>
>> ./samtools-1.3.1/samtools mpileup -r 20:36052981 *-f human_g1k_v37.fasta*
>> file.bam
>> 20 36052981 A 2 ,G >1
>> 20 36052982 G 2 ,. >1
>> 20 36052983 G 2 ,$. <3
>> 20 36052984 A 9 .,..,,.,. <B7C77779
>> 20 36052985 G 9 .,..,,.,. <B7C77779
>> 20 36052986 A 9 .,..$,,.,. <F7=77779
>> 20 36052987 A 8 GgGggGgG </777779
>> 20 36052988 G 8 .,.,,.,. <F777779
>> 20 36052989 G 8 .,.,,.,. <F777779
>> 20 36052990 A 8 .,.,,.,. FFKKKKKK
>> 20 36052991 G 8 .,.,,.,. FFKKKKKK
>> 20 36052992 C 8 .,.,,.,. FFKKKKKK
>> 20 36052993 T 8 .,.,,.,. FFKKKKKK
>> 20 36052994 G 8 .,.,,.,. FFKKKKKK
>> 20 36052995 A 8 .,.,,.,. BFKKKKKK
>> 20 36052996 G 8 .,.$,,.,. BF>KKKKK
>>
>>
>> Now look what we get if we do not provide a reference:
>>
>> ./samtools-1.3.1/samtools mpileup -r 20:36052981 file.bam
>> 20 36052981 N 11 aaGgGGggGgG BBFFKKKKKKK
>> 20 36052982 N 11 g$gGgGGggGgG BBFFKKKKKKK
>> 20 36052983 N 10 g$GgGGggGgG <FFKKKKKKK
>> 20 36052984 N 9 AaAAaaAaA FBKKKKKKK
>> 20 36052985 N 9 GgGGggGgG FBKKKKKKK
>> 20 36052986 N 9 AaAA$aaAaA FFKKKKKKK
>> 20 36052987 N 8 GgGggGgG F/KKKKKK
>> 20 36052988 N 8 GgGggGgG FFKKKKKK
>> 20 36052989 N 8 GgGggGgG FFKKKKKK
>> 20 36052990 N 8 AaAaaAaA FFKKKKKK
>> 20 36052991 N 8 GgGggGgG FFKKKKKK
>> 20 36052992 N 8 CcCccCcC FFKKKKKK
>> 20 36052993 N 8 TtTttTtT FFKKKKKK
>> 20 36052994 N 8 GgGggGgG FFKKKKKK
>> 20 36052995 N 8 AaAaaAaA BFKKKKKK
>> 20 36052996 N 8 GgG$ggGgG BFKKKKKK
>>
>>
>> For site 36052981, if I provide the reference only two bases are
>> reported, whereas without the reference 11 bases are found.
>> Note that quality bases also change.
>>
>> I checked with IGV that the second case is the correct one (see
>> snapshot). Interestingly, all but two of the reads that overlap 36052981
>> have an indel three bases before, so I guess the bug may be related to
>> this.
>>
>> I am attaching a very small bam of the region. If it helps diagnosing the
>> problem I can send more information for other “strange” cases.
>>
>> Regards,
>> Federico
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>> What NetFlow Analyzer can do for you? Monitors network bandwidth and
>> traffic
>> patterns at an interface-level. Reveals which users, apps, and protocols
>> are
>> consuming the most bandwidth. Provides multi-vendor support for NetFlow,
>> J-Flow, sFlow and other flows. Make informed decisions using capacity
>> planning
>> reports. http://sdm.link/zohomanageengine
>> _______________________________________________
>> Samtools-help mailing list
>> Samtools-help@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/samtools-help
>>
>>
>
>
------------------------------------------------------------------------------
What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic
patterns at an interface-level. Reveals which users, apps, and protocols are
consuming the most bandwidth. Provides multi-vendor support for NetFlow,
J-Flow, sFlow and other flows. Make informed decisions using capacity planning
reports. http://sdm.link/zohomanageengine
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help