Hello,
The following routine seems to produce invalid tabix indices (samtools 1.2):
zgrep '^chr1' some.vcf.gz > chr1.vcf
zgrep '^chr2' some.vcf.gz > chr2.vcf
zgrep '^#' some.vcf.gz > header.vcf
cat header.vcf chr1.vcf > chr1_h.vcf
bgzip chr1_h.vcf
bgzip chr2.vcf
cat chr1_h.vcf.gz chr2.vcf.gz > test.vcf.gz
tabix test.vcf.gz
tabix test.vcf.gz chr2 # blank
tabix test.vcf.gz chr1 # works

bgzip -d test.vcf.gz
bgzip test.vcf
tabix test.vcf.gz
tabix test.vcf.gz chr2 # works now

I was under the impression that bgzipped files are directly cat'able. Is this a 
bug? 
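
For reference, the behaviour behind that impression can be reproduced with plain gzip. The sketch below is only an illustration: it uses gzip rather than bgzip, and the file names and two-line records are made up. It shows that concatenated gzip members decompress to the concatenation of their contents, which is the property BGZF (a series of gzip-compatible blocks) relies on. It says nothing about whether an index built on the concatenated file is valid.

```shell
#!/bin/sh
# Illustration only: plain gzip stands in for bgzip, and the records are made up.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Compress two small "per-chromosome" pieces separately.
printf 'chr1\t100\n' | gzip -c > "$tmp/part1.gz"
printf 'chr2\t200\n' | gzip -c > "$tmp/part2.gz"

# Concatenate the compressed files byte for byte, as in the routine above.
cat "$tmp/part1.gz" "$tmp/part2.gz" > "$tmp/joined.gz"

# Decompressing the result yields both pieces, in order.
joined=$(gzip -dc "$tmp/joined.gz")
expected=$(printf 'chr1\t100\nchr2\t200\n')

printf '%s\n' "$joined"
```

So the data itself survives concatenation; what differs in my case is only what tabix returns for chr2 after indexing.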

Thank you,
Stathis


-----Original Message-----
From: Christian Ruckert [mailto:cruck...@uni-muenster.de] 
Sent: 13 May 2015 10:19
To: samtools-help@lists.sourceforge.net
Subject: [Samtools-help] mpileup: tradeoff between runtime and accuracy

So far I called variants on my high coverage targeted sequencing data with the 
following mpileup settings:

samtools mpileup -ug -Q 5 -d 2000 -L 2000 -f hg19.fasta -l target.bed in.bam | 
...

bam files are around 300 megabytes, target.bed contains 10 genes and runtime 
was pretty acceptable with around 1-2 hours. But this setting has two major 
drawbacks:

- I missed indels with a coverage higher than 2000 because of the -L parameter
- Even if I think SNPs are called correctly with this -d value, the DP and DP4 
values are higher than 2000 (which I don't understand completely) but don't 
contain all reads as shown in IGV

So I tried the following settings

samtools mpileup -ug -Q 0 -d 1000000 -L 1000000 -f hg19.fasta -l target.bed 
in.bam | ...

Now "nothing" is missed and DP values are correct but the program runs for more 
than 12 hours, which I think is too long given my relatively small input data.

I already tried the --no-BAQ parameter with little success. So my questions:

Is my runtime comparable to others'? Which parameters have the most influence 
on runtime, and how can I get correct DP values even if not all reads are used 
for variant calling?

Best,
Christian

_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help
