So far I have called variants on my high-coverage targeted sequencing data with the following mpileup settings:

    samtools mpileup -ug -Q 5 -d 2000 -L 2000 -f hg19.fasta -l target.bed in.bam | ...

The BAM files are around 300 megabytes, target.bed contains 10 genes, and the runtime was acceptable at around 1-2 hours. But this setting has two major drawbacks:

- I missed indels at positions with coverage higher than 2000 because of the -L parameter.
- Although I think SNPs are called correctly with this -d value, the DP and DP4 values are higher than 2000 (which I don't completely understand) but still don't include all the reads shown in IGV.

So I tried the following settings:

    samtools mpileup -ug -Q 0 -d 1000000 -L 1000000 -f hg19.fasta -l target.bed in.bam | ...

Now "nothing" is missed and the DP values are correct, but the program runs for more than 12 hours, which I think is too long given my relatively small input data. I already tried the --no-BAQ parameter, with little success.

So my questions are:

- Is my runtime comparable to what others see?
- Which parameters have the most influence on runtime?
- How can I get correct DP values even when not all reads are used for variant calling?

Best,
Christian

_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help