On Tue, May 01, 2018 at 10:00:49AM +0100, Jody Phelan wrote:
> I'm runinig into some trouble when truing to use the 'samtools mpileup |
> bcftools call' combination.
> It seems to incorrectly be calling very long indels where it looks like
> there is not support. For example:
> 
> 'samtools mpileup -f ref.fa sample.bam -r Chromosome:198940-198940'
> produces:
> 
> Chromosome    198940    C    37    .,.-1A.-1A...-1A.-1A.-1A.,-1a.
> -1A.,-1a,-1a.,.-1A,,.-1A,,+1a,+1a,.-1A,,-1a.,-1a,-1a,,-1a,-1a.-1A,-1a.
> 06?5/<:8>/4<?3691465<3354.08:23:11@3C
> 
> This looks fine... however when I try to call the genotype with bcftools I
> get a very long indel.
> 'samtools mpileup -ugf ref.fa sample.bam -r Chromosome:198940-198940 -t AD
> | bcftools call -mv' produces:
> 
> Chromosome    198940    .
> CAAAAGACGTCATCGACGTACGGTCGGTGACCACCGAGATCAACACGTTGTTCCAGACGCTCACCTCGATCGCCGA
> C    225    .
> INDEL;IDV=32;IMF=0.5;DP=62;VDB=0.145385;SGB=-0.662043;MQSB=1;MQ0F=0;AC=2;AN=2;DP4=0,0,7,2;MQ=60
> GT:PL:AD    1/1:255,27,0:0,9

Firstly we'd recommend using bcftools mpileup now instead of samtools
mpileup.  The samtools vcf output is deprecated and liable to vanish
at some point.  Do you still get this problem with bcftools?

Secondly, for indel calling you'll probably want to adjust IDV and IMF
thresholds.  These can be done as hard filters on calling (I think
it's -m and -F flags to bcftools mpileup) or later in post-processing
of the VCF file via a filter. Eg bcftools view -e "IDV < 3 || IMF < 0.03". 
(Adding e.g. DP > 60 to filter overdepth - say 1.8x your mean depth -
can also be a good way of removing false variants caused by
copy-number variations or collapsed repeats.)

Finally, don't give bcftools call such a small region as I'm not sure
it's tuned well for that.  Do you get the same problem with say:

bcftools mpileup -f ref.fa sample.bam -r Chromosome:198800-199200 \|
    bcftools call -vm -r Chromosome:198940-198940 -

Ie the input to bcftools call is a larger region, but the caller
itself is then restricted back down again?

Sadly though I don't know what is causing this apparent 1bp indel to
be called as a very long one.  We'd need to see more data to debug
this.

James

-- 
James Bonfield (j...@sanger.ac.uk)
The Sanger Institute, Hinxton, Cambs, CB10 1SA


-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help

Reply via email to