On Tue, May 01, 2018 at 10:00:49AM +0100, Jody Phelan wrote: > I'm runinig into some trouble when truing to use the 'samtools mpileup | > bcftools call' combination. > It seems to incorrectly be calling very long indels where it looks like > there is not support. For example: > > 'samtools mpileup -f ref.fa sample.bam -r Chromosome:198940-198940' > produces: > > Chromosome 198940 C 37 .,.-1A.-1A...-1A.-1A.-1A.,-1a. > -1A.,-1a,-1a.,.-1A,,.-1A,,+1a,+1a,.-1A,,-1a.,-1a,-1a,,-1a,-1a.-1A,-1a. > 06?5/<:8>/4<?3691465<3354.08:23:11@3C > > This looks fine... however when I try to call the genotype with bcftools I > get a very long indel. > 'samtools mpileup -ugf ref.fa sample.bam -r Chromosome:198940-198940 -t AD > | bcftools call -mv' produces: > > Chromosome 198940 . > CAAAAGACGTCATCGACGTACGGTCGGTGACCACCGAGATCAACACGTTGTTCCAGACGCTCACCTCGATCGCCGA > C 225 . > INDEL;IDV=32;IMF=0.5;DP=62;VDB=0.145385;SGB=-0.662043;MQSB=1;MQ0F=0;AC=2;AN=2;DP4=0,0,7,2;MQ=60 > GT:PL:AD 1/1:255,27,0:0,9
Firstly we'd recommend using bcftools mpileup now instead of samtools mpileup. The samtools vcf output is deprecated and liable to vanish at some point. Do you still get this problem with bcftools? Secondly, for indel calling you'll probably want to adjust IDV and IMF thresholds. These can be done as hard filters on calling (I think it's -m and -F flags to bcftools mpileup) or later in post-processing of the VCF file via a filter. Eg bcftools view -e "IDV < 3 || IMF < 0.03". (Adding e.g. DP > 60 to filter overdepth - say 1.8x your mean depth - can also be a good way of removing false variants caused by copy-number variations or collapsed repeats.) Finally, don't give bcftools call such a small region as I'm not sure it's tuned well for that. Do you get the same problem with say: bcftools mpileup -f ref.fa sample.bam -r Chromosome:198800-199200 \| bcftools call -vm -r Chromosome:198940-198940 - Ie the input to bcftools call is a larger region, but the caller itself is then restricted back down again? Sadly though I don't know what is causing this apparent 1bp indel to be called as a very long one. We'd need to see more data to debug this. James -- James Bonfield (j...@sanger.ac.uk) The Sanger Institute, Hinxton, Cambs, CB10 1SA -- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. ------------------------------------------------------------------------------ Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot _______________________________________________ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help