Truncated output from mpileup:
Hi All,
We've got Illumina paired-end reads from amplicon sequencing, aligned with BWA.
We noticed that VarScan 2 was missing some very important cancer mutations in
KRAS codon 12.
We initially thought that it was VarScan's fault, but it turned out that the
problem was with the intermediate data coming from mpileup.
If you run mpileup genome-wide and look at position chr12:25398285 (where we
know there's a mutation), the coverage seems quite sparse, when in reality we
have around 8000x.
>samtools mpileup -B -f $hg19 tmp.bam | grep 25398285
>chr12 25398285 C 17 A.......,,,,,^],^],^],^], HHHHHFAFFF0CFF000
I've tried single-end, paired-end, the -A flag, the -B flag, samtools 0.1.18,
samtools 0.1.19, and patched versions of samtools that allow >8000x coverage,
and I get the same truncated result. The confusing thing is that when you
confine mpileup to a single base pair, you get the full 8000x coverage that you
were expecting:
>samtools mpileup -r chr12:25398285-25398285 -B -f $hg19 tmp.bam | grep 25398285
>chr12 25398285 C 7999
>AAAAAAAA,,,,...,a,,aa,,,a,............A.A..AA... etc etc etc
>HHGHHHHHHHHHH0HHHHHHHHFHHHHHHGHFHHHHHFCHHH0HHHHHHHHHGHHHGHFHHHH etc etc etc
But if you widen the region slightly, it truncates again:
>samtools mpileup -r chr12:25398277-25398285 -B -f $hg19 tmp.bam | grep 25398285
>chr12 25398285 C 17 A.......,,,,,^],^],^],^], HHHHHFAFFF0CFF000
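
For anyone wanting to reproduce the comparison, a small helper along these lines may be more reliable than a plain grep, which would also match the number on other chromosomes or inside a longer coordinate. This is only a sketch: pileup_depth is a hypothetical name, and it assumes the usual mpileup text layout in which the fourth column is the depth.

```shell
# pileup_depth CHROM POS
# Reads mpileup text on stdin and prints the depth column (4th field)
# for lines whose chromosome and position match exactly.
# Hypothetical helper for illustration, not part of samtools.
pileup_depth() {
    awk -v chrom="$1" -v pos="$2" '$1 == chrom && $2 == pos { print $4 }'
}

# Possible usage, assuming samtools, $hg19 and tmp.bam as in the commands above:
#   samtools mpileup -B -f "$hg19" tmp.bam | pileup_depth chr12 25398285
#   samtools mpileup -r chr12:25398285-25398285 -B -f "$hg19" tmp.bam | pileup_depth chr12 25398285
```

Running it on the genome-wide, single-base and widened-region outputs gives one depth value per run, which makes the truncation easy to tabulate across all samples.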
We have 90 of these samples and the problem occurs in 1/3 of them.
Thanks for your help,
S