Hi James,

Thanks for your help. I have tried your suggestions however it still
appears to give the same results.
I am in the process of uploading the data to the ENA but while I wait I
have put together another example using very similar data.

The example finds a very large variant at position 3329088 however by
looking at the raw data I find very little support for it.
I have created a gist so you can replicate it if you would like.
https://gist.github.com/jodyphelan/e721112d91dcaaa67cd79217ef2af24f

Thanks,
Jody

On Tue, May 1, 2018 at 2:02 PM, James Bonfield <j...@sanger.ac.uk> wrote:

> On Tue, May 01, 2018 at 10:00:49AM +0100, Jody Phelan wrote:
> > I'm runinig into some trouble when truing to use the 'samtools mpileup |
> > bcftools call' combination.
> > It seems to incorrectly be calling very long indels where it looks like
> > there is not support. For example:
> >
> > 'samtools mpileup -f ref.fa sample.bam -r Chromosome:198940-198940'
> > produces:
> >
> > Chromosome    198940    C    37    .,.-1A.-1A...-1A.-1A.-1A.,-1a.
> > -1A.,-1a,-1a.,.-1A,,.-1A,,+1a,+1a,.-1A,,-1a.,-1a,-1a,,-1a,-1a.-1A,-1a.
> > 06?5/<:8>/4<?3691465<3354.08:23:11@3C
> >
> > This looks fine... however when I try to call the genotype with bcftools
> I
> > get a very long indel.
> > 'samtools mpileup -ugf ref.fa sample.bam -r Chromosome:198940-198940 -t
> AD
> > | bcftools call -mv' produces:
> >
> > Chromosome    198940    .
> > CAAAAGACGTCATCGACGTACGGTCGGTGACCACCGAGATCAACACGTTGTTCCAGACGC
> TCACCTCGATCGCCGA
> > C    225    .
> > INDEL;IDV=32;IMF=0.5;DP=62;VDB=0.145385;SGB=-0.662043;
> MQSB=1;MQ0F=0;AC=2;AN=2;DP4=0,0,7,2;MQ=60
> > GT:PL:AD    1/1:255,27,0:0,9
>
> Firstly we'd recommend using bcftools mpileup now instead of samtools
> mpileup.  The samtools vcf output is deprecated and liable to vanish
> at some point.  Do you still get this problem with bcftools?
>
> Secondly, for indel calling you'll probably want to adjust IDV and IMF
> thresholds.  These can be done as hard filters on calling (I think
> it's -m and -F flags to bcftools mpileup) or later in post-processing
> of the VCF file via a filter. Eg bcftools view -e "IDV < 3 || IMF < 0.03".
> (Adding e.g. DP > 60 to filter overdepth - say 1.8x your mean depth -
> can also be a good way of removing false variants caused by
> copy-number variations or collapsed repeats.)
>
> Finally, don't give bcftools call such a small region as I'm not sure
> it's tuned well for that.  Do you get the same problem with say:
>
> bcftools mpileup -f ref.fa sample.bam -r Chromosome:198800-199200 \|
>     bcftools call -vm -r Chromosome:198940-198940 -
>
> Ie the input to bcftools call is a larger region, but the caller
> itself is then restricted back down again?
>
> Sadly though I don't know what is causing this apparent 1bp indel to
> be called as a very long one.  We'd need to see more data to debug
> this.
>
> James
>
> --
> James Bonfield (j...@sanger.ac.uk)
> The Sanger Institute, Hinxton, Cambs, CB10 1SA
>
>
> --
>  The Wellcome Sanger Institute is operated by Genome Research
>  Limited, a charity registered in England with number 1021457 and a
>  company registered in England with number 2742969, whose registered
>  office is 215 Euston Road, London, NW1 2BE.
>



-- 
Jody Phelan
64 Queen Alexandra Mansions,Hastings St
London, WC1H 9DR, UK
Tel: +447478609078
Email: jodyphe...@gmail.com
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help

Reply via email to