So far I have called variants on my high-coverage targeted sequencing
data with the following mpileup settings:

samtools mpileup -ug -Q 5 -d 2000 -L 2000 -f hg19.fasta -l target.bed \
    in.bam | ...
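
For reference, this is how I read the options in that command. The
descriptions are paraphrased from the mpileup usage text as I understand
it, and the shell variable names are only illustrative. The sketch
assembles and prints the command rather than running it, so the values
can be checked before starting a long run.

```shell
# Option values from the first run (variable names are illustrative only).
MIN_BASEQ=5            # -Q: skip bases with base quality/BAQ below this value
MAX_DEPTH=2000         # -d: maximum number of reads taken per input file at a position
MAX_INDEL_DEPTH=2000   # -L: maximum per-sample depth for indel calling

# -u: uncompressed output, -g: genotype likelihoods in BCF format,
# -f: reference FASTA, -l: restrict to the positions/regions in the BED file
CMD="samtools mpileup -ug -Q $MIN_BASEQ -d $MAX_DEPTH -L $MAX_INDEL_DEPTH -f hg19.fasta -l target.bed in.bam"

# Print the assembled command instead of executing it.
echo "$CMD"
```

Changing MAX_DEPTH and MAX_INDEL_DEPTH to 1000000 and MIN_BASEQ to 0
gives the second command further down.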

The BAM files are around 300 megabytes, target.bed contains 10 genes,
and the runtime was acceptable at around 1-2 hours. But this setting
has two major drawbacks:

- I missed indels at positions with coverage higher than 2000 because
of the -L parameter
- Even though I think SNPs are called correctly with this -d value, the
DP and DP4 values are higher than 2000 (which I don't completely
understand) but still don't include all the reads shown in IGV

So I tried the following settings:

samtools mpileup -ug -Q 0 -d 1000000 -L 1000000 -f hg19.fasta \
    -l target.bed in.bam | ...

Now "nothing" is missed and the DP values are correct, but the program
runs for more than 12 hours, which I think is too long given my
relatively small input data.

I already tried the --no-BAQ parameter, but it made little difference
to the runtime. So my questions:

Is my runtime comparable to what others see? Which parameters have the
most influence on runtime? And how can I get correct DP values even
when not all reads are used for variant calling?

Best,
Christian

_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help