Hi Rasmus,

this should be fixed now in github. Please try it out and let me know in
case of problems!

Cheers,
Petr


On Thu, 2014-11-06 at 13:22 +0100, Rasmus Borup Hansen wrote: 
> Thanks for the pointer. I'll look into it.
> 
> 
> I tried using the consensus caller with my example data and got a line
> like this:
> 
> 
> MyRef   120     .       G       .       8.23579 .
> INDEL;IDV=2;IMF=0.666667;DP=3;SGB=-0.379885;MQ0F=0;AF1=1;AC1=2;DP4=0,0,1,0;MQ=60;FQ=-37.5301
>     GT:PL 0/1:30
> 
> 
> How can GT be "0/1" if the ALT field is just "."?
> 
> 
> This is with the development branch as of yesterday after you fixed
> the bug and using the command "bcftools call -c -O v". The release
> version does the same.
> 
> 
> Best,
> 
> 
> Rasmus
> 
> Intomics is a contract research organization specialized in deriving
> core biological insight from large scale data. We help our clients in
> the pharmaceutical industry develop tomorrow's medicines better,
> faster, and cheaper through optimized use of biomedical data.
> -----------------------------------------------------------------
> Hansen, Rasmus Borup              Intomics - from data to biology
> System Administrator              Diplomvej 377
> Scientific Programmer             DK-2800 Kgs. Lyngby
>                                   Denmark
> E: r...@intomics.com               W: http://www.intomics.com/
> P: +45 5167 7972                  P: +45 8880 7979
> 
> > On 06 Nov 2014, at 11:12, Petr Danecek <p...@sanger.ac.uk> wrote: 
> > 
> > PLs are likelihoods, not probabilities - they are scaled and don't
> > add
> > up to 1, one is always 0.
> > 
> > For details of calculations see math notes here
> > http://samtools.github.io/bcftools/
> > 
> > petr
> > 
> > 
> > On Thu, 2014-11-06 at 10:49 +0100, Rasmus Borup Hansen wrote: 
> > > Thanks for fixing the bug.
> > > 
> > > 
> > > I still don't understand why the candidate INDEL "MyRef 120 . G
> > > GGAAT"
> > > is not called, but is reported with a PL of 0 (ie almost 100%
> > > likelihood) when the "-A" option is present. Is there an
> > > explanation
> > > of what bcftools considers a variant available somewhere?
> > > 
> > > 
> > > Best,
> > > 
> > > 
> > > Rasmus
> > > 
> > > Intomics is a contract research organization specialized in
> > > deriving
> > > core biological insight from large scale data. We help our clients
> > > in
> > > the pharmaceutical industry develop tomorrow's medicines better,
> > > faster, and cheaper through optimized use of biomedical data.
> > > -----------------------------------------------------------------
> > > Hansen, Rasmus Borup              Intomics - from data to biology
> > > System Administrator              Diplomvej 377
> > > Scientific Programmer             DK-2800 Kgs. Lyngby
> > >                                  Denmark
> > > E: r...@intomics.com               W: http://www.intomics.com/
> > > P: +45 5167 7972                  P: +45 8880 7979
> > > 
> > > > On 05 Nov 2014, at 14:49, Petr Danecek <p...@sanger.ac.uk>
> > > > wrote: 
> > > > 
> > > > On Tue, 2014-11-04 at 16:22 +0100, Rasmus Borup Hansen wrote: 
> > > > > Thanks for the reply.
> > > > > 
> > > > > 
> > > > > I tried adding the "-A" option to "bcftools call" in the
> > > > > script.
> > > > > Then,
> > > > > at position 120, I get "GGAAT" in the ALT column, and the
> > > > > genotype
> > > > > field *changes* from "0" to "1:30,0". This genotype looks more
> > > > > like
> > > > > what I'd expect from looking at the alignment using "samtools
> > > > > tview".
> > > > 
> > > > This is correct behavior, with single allele PL field is not
> > > > output.
> > > > 
> > > > > If I use the "-v" option as well, position 120 is included in
> > > > > the
> > > > > VCF
> > > > > when the "-A" option is present, but not when "-A" is absent.
> > > > 
> > > > This is a bug, -A should not influence sites discarded by -v.
> > > > This
> > > > is
> > > > now fixed in github, thanks.
> > > > 
> > > > petr
> > > > 
> > > > 
> > > > > As I understand it, the "-A" option should not change if a
> > > > > variant
> > > > > is
> > > > > called at a position or not, only how the data for the
> > > > > position
> > > > > are
> > > > > output. Please correct me if I'm wrong.
> > > > > 
> > > > > 
> > > > > Best,
> > > > > 
> > > > > 
> > > > > Rasmus
> > > > > 
> > > > > Intomics is a contract research organization specialized in
> > > > > deriving
> > > > > core biological insight from large scale data. We help our
> > > > > clients
> > > > > in
> > > > > the pharmaceutical industry develop tomorrow's medicines
> > > > > better,
> > > > > faster, and cheaper through optimized use of biomedical data.
> > > > > -----------------------------------------------------------------
> > > > > Hansen, Rasmus Borup              Intomics - from data to
> > > > > biology
> > > > > System Administrator              Diplomvej 377
> > > > > Scientific Programmer             DK-2800 Kgs. Lyngby
> > > > >                                 Denmark
> > > > > E: r...@intomics.com               W: http://www.intomics.com/
> > > > > P: +45 5167 7972                  P: +45 8880 7979
> > > > > 
> > > > > > On 04 Nov 2014, at 11:56, Petr Danecek <p...@sanger.ac.uk>
> > > > > > wrote: 
> > > > > > 
> > > > > > Hi Rasmus,
> > > > > > 
> > > > > > mpileup outputs the candidate indel 
> > > > > > 
> > > > > > MyRef 120 . G GGAAT
> > > > > > 
> > > > > > which is then not called by bcftools (*). The -v option is
> > > > > > not
> > > > > > present,
> > > > > > therefore all input records are printed on output. The -A
> > > > > > option
> > > > > > is
> > > > > > not
> > > > > > present, therefore unused ALT alleles are trimmed. The INDEL
> > > > > > tag
> > > > > > is
> > > > > > a
> > > > > > reminiscence of the site being a candidate indel.
> > > > > > 
> > > > > > I hope this helps
> > > > > > 
> > > > > > Petr
> > > > > > 
> > > > > > 
> > > > > > (*) that is, with the default options, it would be called
> > > > > > with
> > > > > > -mP
> > > > > > 0.01 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > On Fri, 2014-10-31 at 15:56 +0100, Rasmus Borup Hansen
> > > > > > wrote: 
> > > > > > > Hi! I'm getting VCF files with some strange INDELS from
> > > > > > > bcftools.
> > > > > > > The
> > > > > > > shell script below contains everything needed to reproduce
> > > > > > > it
> > > > > > > (I'm
> > > > > > > using bwa 0.7.10-r789, samtools 1.1, and bcftools 1.1) and
> > > > > > > it
> > > > > > > outputs
> > > > > > > the following:
> > > > > > > 
> > > > > > > 
> > > > > > > Lines for position 120 when the VCF is generated by
> > > > > > > "samtools
> > > > > > > mpileup":
> > > > > > > 
> > > > > > > 
> > > > > > > MyRef   120     .       G       <X>     0       .
> > > > > > > DP=3;I16=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;QS=0,0;MQ0F=0  PL
> > > > > > >    0,0,0
> > > > > > > MyRef   120     .       G       GGAAT   0       .
> > > > > > > INDEL;IDV=2;IMF=0.666667;DP=3;I16=0,0,1,0,0,0,30,900,0,0,60,3600,0,0,25,625;QS=0,1;SGB=-0.379885;MQ0F=0
> > > > > > >       PL      30,3,0
> > > > > > > 
> > > > > > > 
> > > > > > > Lines for position 120 when the VCF is generated by
> > > > > > > "bcftools
> > > > > > > call":
> > > > > > > 
> > > > > > > 
> > > > > > > MyRef   120     .       G       .       0       .
> > > > > > > DP=3;MQ0F=0;AN=0;DP4=0,0,0,0;MQ=.       GT      .
> > > > > > > MyRef   120     .       G       .       2.1484  .
> > > > > > > INDEL;IDV=2;IMF=0.666667;DP=3;SGB=-0.379885;MQ0F=0;AN=1;DP4=0,0,1,0;MQ=60
> > > > > > >        GT      0
> > > > > > > 
> > > > > > > 
> > > > > > > Output from "samtools mpileup" around position 120:
> > > > > > > 
> > > > > > > 
> > > > > > > MyRef   110     G       2       .,      mD
> > > > > > > MyRef   111     T       2       .,      bD
> > > > > > > MyRef   112     G       2       .,      eD
> > > > > > > MyRef   113     T       2       .,      jD
> > > > > > > MyRef   114     T       2       C,      kD
> > > > > > > MyRef   115     G       1       .       m
> > > > > > > MyRef   116     G       1       .       e
> > > > > > > MyRef   117     G       1       .       j
> > > > > > > MyRef   118     G       1       .       i
> > > > > > > MyRef   119     A       1       .       k
> > > > > > > MyRef   120     G       0
> > > > > > > MyRef   121     A       0
> > > > > > > MyRef   122     C       1       .       I
> > > > > > > MyRef   123     T       1       .       k
> > > > > > > MyRef   124     C       2       .,      gA
> > > > > > > MyRef   125     A       2       Gg      dA
> > > > > > > MyRef   126     G       2       .,      _A
> > > > > > > MyRef   127     T       2       .,      iF
> > > > > > > MyRef   128     G       2       .,      fH
> > > > > > > MyRef   129     C       2       .,      jJ
> > > > > > > MyRef   130     C       2       .,      iI
> > > > > > > 
> > > > > > > 
> > > > > > > It's the output from "bcftools call" that has ALT="."
> > > > > > > while
> > > > > > > the
> > > > > > > "INDEL" flag is present that worries me. Is this a bug, or
> > > > > > > just
> > > > > > > something I don't understand (yet)?
> > > > > > > 
> > > > > > > 
> > > > > > > The shell script to reproduce the above output:
> > > > > > > 
> > > > > > > 
> > > > > > > #!/bin/bash
> > > > > > > 
> > > > > > > 
> > > > > > > # Make directory for data
> > > > > > > mkdir -p tmp-data
> > > > > > > cd tmp-data
> > > > > > > 
> > > > > > > 
> > > > > > > # Redirect stderr.
> > > > > > > exec 2>stderr
> > > > > > > 
> > > > > > > 
> > > > > > > # Use this short refence.
> > > > > > > cat >ref <<EOF
> > > > > > > > MyRef
> > > > > > > GGTGGCGAAGGCGGCTTGCTGGACAAATACTGACGCTGAGGTGCGAAAGCGTGGGGAGCA
> > > > > > > AACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTGCTAGGTGTTGGGGAG
> > > > > > > ACTCAGTGCCGCAGCTAACGCAATAAGCACTCCGCCTGGGGAGTACGACCGCAAGGTTGA
> > > > > > > EOF
> > > > > > > 
> > > > > > > 
> > > > > > > # Use these reads.
> > > > > > > cat >reads.fastq <<EOF
> > > > > > > @Instrument:1:FlowCell:1:1:1:1/1
> > > > > > > ACCTTGCGGTCGTACTCCCCAGGTGGAGTGCTTATTGCGTTTGCTGCGGCACCGACCATCTCTGGCCAACACCTAGCACTCATCGTTTACGGCGTGGA
> > > > > > > +
> > > > > > > CCCFFFFFHFHHGJJJJ3EHGIJ?FHDHFHJJIJJJJJJHIJJJJJJIJHFFDDDDDDDDDDEDCDDDDDDDDDDCCCCDDDDDDBDDDDDDDDDDDD
> > > > > > > @Instrument:1:FlowCell:1:1:1:1/2
> > > > > > > GGTGGCGAAGGCGGCTTACTGGACTGTTACTGACGCTGAGTCACGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACG
> > > > > > > +
> > > > > > > CCBFFFFFHHHHHJJJJJJIJJIIJJJIJJJJJIJJJIJJHHHHHFFFDE9=BDDDDDDDDDDDDDDDDDDDDEDDDDDDADDDEDDDBDDDDDDDD<
> > > > > > > @Instrument:1:FlowCell:1:2:2:2/1
> > > > > > > TCAACCTTGCGGTCGTACTCCCCAGGTGGAGTGCTTATTGCGTTAGCTGCGGCACCGAGGATTCCTCCCCGACACCTAGCACTCATCGTTTACGGCG
> > > > > > > +
> > > > > > > BCCFFFFFHHHGFHIJJGHIIIIHDH?GDFI*BGIHIIJGEGAEEHIHJIGHFFB?@DDBBCDCDCCDBDDDBBDB9CCDDD@?CDDDDDD?@@DBB
> > > > > > > @Instrument:1:FlowCell:1:2:2:2/2
> > > > > > > GGTAGTCCACGCCGTAAACGATGAGTGCTAGGTGTCGGGGAGGAATCCTCGGTGCCGCAGCTAACGCAATAAGCACTCCACCTGGGGAGTACGACCGC
> > > > > > > +
> > > > > > > BBBDFDF?FHHHFHHHIJIJJJIHIHGJIGIJADGHJDGGIGGGFHHHHHF>DACCDBDBDDDDDBDD@DDCDDDDDDDCDDDCBBB@DDCDDDBDBB
> > > > > > > EOF
> > > > > > > 
> > > > > > > 
> > > > > > > # Prepare indices of the reference.
> > > > > > > samtools faidx ref
> > > > > > > bwa index ref
> > > > > > > 
> > > > > > > 
> > > > > > > # Align the reads to the reference.
> > > > > > > bwa mem -p -t 4 -R '@RG\tID:MySample\tSM:MySample' ref
> > > > > > > reads.fastq
> > > > > > > > 
> > > > > > > alignment.sam
> > > > > > > 
> > > > > > > 
> > > > > > > # bwa-mem needs to infer the insert size distribution from
> > > > > > > data.
> > > > > > > You
> > > > > > > # have to mix the read pair with at least tens of other
> > > > > > > pairs.
> > > > > > > For
> > > > > > > # this reason bwa doesn't think the reads are properly
> > > > > > > paired,
> > > > > > > so
> > > > > > > we
> > > > > > > # set the flags manually.
> > > > > > > perl -pe '@_=split /\t/; if ($_[0] !~ /^@/) { @_[1] |= 2;
> > > > > > > $_=join("\t", @_) }' alignment.sam > fixed_alignment.sam
> > > > > > > 
> > > > > > > 
> > > > > > > # Sort the alignment.
> > > > > > > samtools sort -T samtools.sort fixed_alignment.sam -O bam
> > > > > > > >
> > > > > > > sorted_alignment.bam
> > > > > > > 
> > > > > > > 
> > > > > > > # Index the sorted alignment.
> > > > > > > samtools index sorted_alignment.bam
> > > > > > > 
> > > > > > > 
> > > > > > > # Make a file containing the ploidy of the sample (for
> > > > > > > "bcftools
> > > > > > > call
> > > > > > > -m").
> > > > > > > echo -e >sample.tab "MySample\t1"
> > > > > > > 
> > > > > > > 
> > > > > > > # Generate vcf using samtools.
> > > > > > > samtools mpileup -r MyRef:110-130 -vuf ref
> > > > > > > sorted_alignment.bam >
> > > > > > > pos110-130.samtools.vcf
> > > > > > > 
> > > > > > > 
> > > > > > > # Generate vcf using bcftools.
> > > > > > > samtools mpileup -r MyRef:110-130 -uf ref
> > > > > > > sorted_alignment.bam
> > > > > > > |
> > > > > > > bcftools call -m -S sample.tab -O v >
> > > > > > > pos110-130.bcftools.vcf
> > > > > > > 
> > > > > > > 
> > > > > > > # Generate mpileup data.
> > > > > > > samtools mpileup -r MyRef:110-130 -f ref
> > > > > > > sorted_alignment.bam
> > > > > > > > 
> > > > > > > mpileup.tab
> > > > > > > 
> > > > > > > 
> > > > > > > # Output results.
> > > > > > > echo -e "\nLines for position 120 when the VCF is
> > > > > > > generated by
> > > > > > > \"samtools mpileup\":\n"
> > > > > > > grep -P '^MyRef\t120' pos110-130.samtools.vcf
> > > > > > > echo -e "\nLines for position 120 when the VCF is
> > > > > > > generated by
> > > > > > > \"bcftools call\":\n"
> > > > > > > grep -P '^MyRef\t120' pos110-130.bcftools.vcf
> > > > > > > echo -e "\nOutput from \"samtools mpileup\" around
> > > > > > > position
> > > > > > > 120:
> > > > > > > \n"
> > > > > > > cat mpileup.tab
> > > > > > > echo
> > > > > > > 
> > > > > > > 
> > > > > > > ### END OF SCRIPT
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > Best,
> > > > > > > 
> > > > > > > 
> > > > > > > Rasmus Borup Hansen
> > > > > > > 
> > > > > > > Intomics is a contract research organization specialized
> > > > > > > in
> > > > > > > deriving
> > > > > > > core biological insight from large scale data. We help our
> > > > > > > clients
> > > > > > > in
> > > > > > > the pharmaceutical industry develop tomorrow's medicines
> > > > > > > better,
> > > > > > > faster, and cheaper through optimized use of biomedical
> > > > > > > data.
> > > > > > > -----------------------------------------------------------------
> > > > > > > Hansen, Rasmus Borup              Intomics - from data to
> > > > > > > biology
> > > > > > > System Administrator              Diplomvej 377
> > > > > > > Scientific Programmer             DK-2800 Kgs. Lyngby
> > > > > > >                                Denmark
> > > > > > > E: r...@intomics.com               W:
> > > > > > > http://www.intomics.com/
> > > > > > > P: +45 5167 7972                  P: +45 8880 7979
> > > > > > > 
> > > > > > > ------------------------------------------------------------------------------
> > > > > > > _______________________________________________
> > > > > > > Samtools-help mailing list
> > > > > > > Samtools-help@lists.sourceforge.net
> > > > > > > https://lists.sourceforge.net/lists/listinfo/samtools-help
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > -- 
> > > > > > The Wellcome Trust Sanger Institute is operated by Genome
> > > > > > Research 
> > > > > > Limited, a charity registered in England with number 1021457
> > > > > > and
> > > > > > a 
> > > > > > company registered in England with number 2742969, whose
> > > > > > registered 
> > > > > > office is 215 Euston Road, London, NW1 2BE. 
> > > > > > 
> > > > > 
> > > > > ------------------------------------------------------------------------------
> > > > > _______________________________________________
> > > > > Samtools-help mailing list
> > > > > Samtools-help@lists.sourceforge.net
> > > > > https://lists.sourceforge.net/lists/listinfo/samtools-help
> > > > 
> > > > 
> > > > 
> > > > 
> > > > -- 
> > > > The Wellcome Trust Sanger Institute is operated by Genome
> > > > Research 
> > > > Limited, a charity registered in England with number 1021457 and
> > > > a 
> > > > company registered in England with number 2742969, whose
> > > > registered 
> > > > office is 215 Euston Road, London, NW1 2BE. 
> > > > 
> > > 
> > > ------------------------------------------------------------------------------
> > > _______________________________________________
> > > Samtools-help mailing list
> > > Samtools-help@lists.sourceforge.net
> > > https://lists.sourceforge.net/lists/listinfo/samtools-help
> > 
> > 
> > 
> > 
> > -- 
> > The Wellcome Trust Sanger Institute is operated by Genome Research 
> > Limited, a charity registered in England with number 1021457 and a 
> > company registered in England with number 2742969, whose registered 
> > office is 215 Euston Road, London, NW1 2BE. 
> > 
> 
> ------------------------------------------------------------------------------
> _______________________________________________
> Samtools-help mailing list
> Samtools-help@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/samtools-help




-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

------------------------------------------------------------------------------
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help

Reply via email to