Since this is amplicon sequencing, is it possible that the primers for a second amplicon fall right next to the site of interest? If so, widening the region slightly would suddenly double the maximum depth found within the region and push it over the 8000 limit. This is just a guess; it could be something else entirely.
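
One rough way to test this guess is to list the depth at every position in the widened window and see whether any of them sits at the limit. The sketch below is only an illustration: the function name pileup_depths and the output labels are made up here, and it assumes the usual tab-separated mpileup columns (chromosome, position, reference base, depth, bases, qualities).

```shell
# Hypothetical helper: read mpileup output on stdin and print
# chromosome, position, depth, and a label saying whether the depth
# is at or above a cap (first argument, default 8000).
# Assumes tab-separated columns: chrom, pos, ref, depth, bases, quals.
pileup_depths() {
    awk -F'\t' -v cap="${1:-8000}" '
        {
            # Force a numeric comparison on the depth column.
            label = ($4 + 0 >= cap + 0) ? "at_or_above_cap" : "below_cap"
            printf "%s\t%s\t%s\t%s\n", $1, $2, $4, label
        }'
}
```

It could be fed from the same command as in the original post, for example: samtools mpileup -r chr12:25398277-25398285 -B -f $hg19 tmp.bam | pileup_depths 8000. If some position in the wider window sits at the cap while 25398285 is low, rerunning with a larger value for -d (mpileup's per-file depth option) might be the next thing to try.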

                                                -  tom blackwell  -

On Fri, 13 Jun 2014, Scott Newman wrote:

Truncated output from mpileup:


Hi All,

We've got Illumina paired end reads from amplicon sequencing aligned with BWA.
We noticed that Varscan 2 was missing some very important cancer mutations in 
KRAS codon 12.
We initially thought that it was VarScan's fault, but it turned out that the 
problem was with the intermediate data coming from mpileup.

If you run mpileup genome-wide and then look at position chr12:25398285
(where we know there's a mutation), the coverage seems quite sparse, when in
reality we have around 8000x.

samtools mpileup -B -f $hg19 tmp.bam | grep 25398285
chr12	25398285	C	17	A.......,,,,,^],^],^],^],	HHHHHFAFFF0CFF000

I've tried single end, paired ends, A flag, B flag, samtools 0.1.18, samtools 0.1.19,
and patched versions of samtools that allow >8000x coverage, and I get the same
truncated result. The confusing thing is that when you confine mpileup to a single
base pair, you get the full 8000x coverage that you were expecting:

samtools mpileup -r chr12:25398285-25398285 -B -f $hg19 tmp.bam | grep 25398285
chr12	25398285	C	7999	
AAAAAAAA,,,,...,a,,aa,,,a,............A.A..AA... etc etc etc
HHGHHHHHHHHHH0HHHHHHHHFHHHHHHGHFHHHHHFCHHH0HHHHHHHHHGHHHGHFHHHH etc etc etc

But if you widen the region slightly, it truncates again.

samtools mpileup -r chr12:25398277-25398285 -B -f $hg19 tmp.bam | grep 25398285

chr12	25398285	C	17	A.......,,,,,^],^],^],^],	HHHHHFAFFF0CFF000

We have 90 of these samples and the problem occurs in 1/3 of them.

Thanks for your help,

S
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help
