Hello,

The following routine seems to produce invalid tabix indices (samtools 1.2):

    zgrep '^chr1' some.vcf.gz > chr1.vcf
    zgrep '^chr2' some.vcf.gz > chr2.vcf
    zgrep '^#' some.vcf.gz > header.vcf
    cat header.vcf chr1.vcf > chr1_h.vcf
    bgzip chr1_h.vcf
    bgzip chr2.vcf
    cat chr1_h.vcf.gz chr2.vcf.gz > test.vcf.gz
    tabix test.vcf.gz
    tabix test.vcf.gz chr2   # blank
    tabix test.vcf.gz chr1   # works
    bgzip -d test.vcf.gz
    bgzip test.vcf
    tabix test.vcf.gz
    tabix test.vcf.gz chr2   # works now

I was under the impression that bgzipped files are directly cat'able. Is this a bug?

Thank you,
Stathis

-----Original Message-----
From: Christian Ruckert [mailto:cruck...@uni-muenster.de]
Sent: 13 May 2015 10:19
To: samtools-help@lists.sourceforge.net
Subject: [Samtools-help] mpileup: tradeoff between runtime and accuracy

So far I have called variants on my high-coverage targeted sequencing data with the following mpileup settings:

    samtools mpileup -ug -Q 5 -d 2000 -L 2000 -f hg19.fasta -l target.bed in.bam | ...

The BAM files are around 300 megabytes, target.bed contains 10 genes, and the runtime was acceptable at around 1-2 hours. But this setting has two major drawbacks:

- I missed indels with a coverage higher than 2000 because of the -L parameter
- Even if I think SNPs are called correctly with this -d value, the DP and DP4 values are higher than 2000 (which I don't understand completely) but don't contain all reads as shown in IGV

So I tried the following settings:

    samtools mpileup -ug -Q 0 -d 1000000 -L 1000000 -f hg19.fasta -l target.bed in.bam | ...

Now "nothing" is missed and the DP values are correct, but the program runs for more than 12 hours, which I think is too long given my relatively small input data. I already tried the --no-BAQ parameter with little success.

So my questions: Is my runtime comparable to others', which parameters have the most influence on runtime, and how can I get correct DP values even when not using all reads for variant calling?

Best,
Christian
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help
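
A note on the concatenation question in the first message: BGZF, the format bgzip writes, is a series of gzip members, and gzip members can be concatenated into a valid stream. The sketch below illustrates only that gzip-level property. It uses plain gzip rather than bgzip so that it runs without htslib, and the file names and the two sample records are made up for illustration. It does not reproduce or diagnose the tabix behaviour reported above.

```shell
# Illustration only: plain gzip stands in for bgzip (an assumption made
# so the example has no htslib dependency). File names are hypothetical.
tmp=$(mktemp -d)

# Compress two small tab-separated records into separate gzip files.
printf 'chr1\t100\n' | gzip -c > "$tmp/a.gz"
printf 'chr2\t200\n' | gzip -c > "$tmp/b.gz"

# Concatenate the compressed files byte for byte.
cat "$tmp/a.gz" "$tmp/b.gz" > "$tmp/ab.gz"

# gzip -dc decodes each member in turn, so both records come back.
out=$(gzip -dc "$tmp/ab.gz")
printf '%s\n' "$out"

rm -r "$tmp"
```

bgzip also ends each file with an empty BGZF block as an end-of-file marker, so a file built with cat carries such a marker in the middle. Whether that is what affects the index in samtools 1.2 is an assumption, not something this thread establishes.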