On Tue, 2014-11-04 at 16:22 +0100, Rasmus Borup Hansen wrote: 
> Thanks for the reply.
> 
> 
> I tried adding the "-A" option to "bcftools call" in the script. Then,
> at position 120, I get "GGAAT" in the ALT column, and the genotype
> field *changes* from "0" to "1:30,0". This genotype looks more like
> what I'd expect from looking at the alignment using "samtools tview".

This is correct behavior, with single allele PL field is not output.

> If I use the "-v" option as well, position 120 is included in the VCF
> when the "-A" option is present, but not when "-A" is absent.

This is a bug, -A should not influence sites discarded by -v. This is
now fixed in github, thanks.

petr


> As I understand it, the "-A" option should not change if a variant is
> called at a position or not, only how the data for the position are
> output. Please correct me if I'm wrong.
> 
> 
> Best,
> 
> 
> Rasmus
> 
> Intomics is a contract research organization specialized in deriving
> core biological insight from large scale data. We help our clients in
> the pharmaceutical industry develop tomorrow's medicines better,
> faster, and cheaper through optimized use of biomedical data.
> -----------------------------------------------------------------
> Hansen, Rasmus Borup              Intomics - from data to biology
> System Administrator              Diplomvej 377
> Scientific Programmer             DK-2800 Kgs. Lyngby
>                                   Denmark
> E: r...@intomics.com               W: http://www.intomics.com/
> P: +45 5167 7972                  P: +45 8880 7979
> 
> > On 04 Nov 2014, at 11:56, Petr Danecek <p...@sanger.ac.uk> wrote: 
> > 
> > Hi Rasmus,
> > 
> > mpileup outputs the candidate indel 
> > 
> > MyRef 120 . G GGAAT
> > 
> > which is then not called by bcftools (*). The -v option is not
> > present,
> > therefore all input records are printed on output. The -A option is
> > not
> > present, therefore unused ALT alleles are trimmed. The INDEL tag is
> > a
> > reminiscence of the site being a candidate indel.
> > 
> > I hope this helps
> > 
> > Petr
> > 
> > 
> > (*) that is, with the default options, it would be called with -mP
> > 0.01 
> > 
> > 
> > 
> > On Fri, 2014-10-31 at 15:56 +0100, Rasmus Borup Hansen wrote: 
> > > Hi! I'm getting VCF files with some strange INDELS from bcftools.
> > > The
> > > shell script below contains everything needed to reproduce it (I'm
> > > using bwa 0.7.10-r789, samtools 1.1, and bcftools 1.1) and it
> > > outputs
> > > the following:
> > > 
> > > 
> > > Lines for position 120 when the VCF is generated by "samtools
> > > mpileup":
> > > 
> > > 
> > > MyRef   120     .       G       <X>     0       .
> > > DP=3;I16=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;QS=0,0;MQ0F=0  PL
> > >      0,0,0
> > > MyRef   120     .       G       GGAAT   0       .
> > > INDEL;IDV=2;IMF=0.666667;DP=3;I16=0,0,1,0,0,0,30,900,0,0,60,3600,0,0,25,625;QS=0,1;SGB=-0.379885;MQ0F=0
> > >       PL      30,3,0
> > > 
> > > 
> > > Lines for position 120 when the VCF is generated by "bcftools
> > > call":
> > > 
> > > 
> > > MyRef   120     .       G       .       0       .
> > > DP=3;MQ0F=0;AN=0;DP4=0,0,0,0;MQ=.       GT      .
> > > MyRef   120     .       G       .       2.1484  .
> > > INDEL;IDV=2;IMF=0.666667;DP=3;SGB=-0.379885;MQ0F=0;AN=1;DP4=0,0,1,0;MQ=60 
> > >       GT      0
> > > 
> > > 
> > > Output from "samtools mpileup" around position 120:
> > > 
> > > 
> > > MyRef   110     G       2       .,      mD
> > > MyRef   111     T       2       .,      bD
> > > MyRef   112     G       2       .,      eD
> > > MyRef   113     T       2       .,      jD
> > > MyRef   114     T       2       C,      kD
> > > MyRef   115     G       1       .       m
> > > MyRef   116     G       1       .       e
> > > MyRef   117     G       1       .       j
> > > MyRef   118     G       1       .       i
> > > MyRef   119     A       1       .       k
> > > MyRef   120     G       0
> > > MyRef   121     A       0
> > > MyRef   122     C       1       .       I
> > > MyRef   123     T       1       .       k
> > > MyRef   124     C       2       .,      gA
> > > MyRef   125     A       2       Gg      dA
> > > MyRef   126     G       2       .,      _A
> > > MyRef   127     T       2       .,      iF
> > > MyRef   128     G       2       .,      fH
> > > MyRef   129     C       2       .,      jJ
> > > MyRef   130     C       2       .,      iI
> > > 
> > > 
> > > It's the output from "bcftools call" that has ALT="." while the
> > > "INDEL" flag is present that worries me. Is this a bug, or just
> > > something I don't understand (yet)?
> > > 
> > > 
> > > The shell script to reproduce the above output:
> > > 
> > > 
> > > #!/bin/bash
> > > 
> > > 
> > > # Make directory for data
> > > mkdir -p tmp-data
> > > cd tmp-data
> > > 
> > > 
> > > # Redirect stderr.
> > > exec 2>stderr
> > > 
> > > 
> > > # Use this short refence.
> > > cat >ref <<EOF
> > > > MyRef
> > > GGTGGCGAAGGCGGCTTGCTGGACAAATACTGACGCTGAGGTGCGAAAGCGTGGGGAGCA
> > > AACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTGCTAGGTGTTGGGGAG
> > > ACTCAGTGCCGCAGCTAACGCAATAAGCACTCCGCCTGGGGAGTACGACCGCAAGGTTGA
> > > EOF
> > > 
> > > 
> > > # Use these reads.
> > > cat >reads.fastq <<EOF
> > > @Instrument:1:FlowCell:1:1:1:1/1
> > > ACCTTGCGGTCGTACTCCCCAGGTGGAGTGCTTATTGCGTTTGCTGCGGCACCGACCATCTCTGGCCAACACCTAGCACTCATCGTTTACGGCGTGGA
> > > +
> > > CCCFFFFFHFHHGJJJJ3EHGIJ?FHDHFHJJIJJJJJJHIJJJJJJIJHFFDDDDDDDDDDEDCDDDDDDDDDDCCCCDDDDDDBDDDDDDDDDDDD
> > > @Instrument:1:FlowCell:1:1:1:1/2
> > > GGTGGCGAAGGCGGCTTACTGGACTGTTACTGACGCTGAGTCACGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACG
> > > +
> > > CCBFFFFFHHHHHJJJJJJIJJIIJJJIJJJJJIJJJIJJHHHHHFFFDE9=BDDDDDDDDDDDDDDDDDDDDEDDDDDDADDDEDDDBDDDDDDDD<
> > > @Instrument:1:FlowCell:1:2:2:2/1
> > > TCAACCTTGCGGTCGTACTCCCCAGGTGGAGTGCTTATTGCGTTAGCTGCGGCACCGAGGATTCCTCCCCGACACCTAGCACTCATCGTTTACGGCG
> > > +
> > > BCCFFFFFHHHGFHIJJGHIIIIHDH?GDFI*BGIHIIJGEGAEEHIHJIGHFFB?@DDBBCDCDCCDBDDDBBDB9CCDDD@?CDDDDDD?@@DBB
> > > @Instrument:1:FlowCell:1:2:2:2/2
> > > GGTAGTCCACGCCGTAAACGATGAGTGCTAGGTGTCGGGGAGGAATCCTCGGTGCCGCAGCTAACGCAATAAGCACTCCACCTGGGGAGTACGACCGC
> > > +
> > > BBBDFDF?FHHHFHHHIJIJJJIHIHGJIGIJADGHJDGGIGGGFHHHHHF>DACCDBDBDDDDDBDD@DDCDDDDDDDCDDDCBBB@DDCDDDBDBB
> > > EOF
> > > 
> > > 
> > > # Prepare indices of the reference.
> > > samtools faidx ref
> > > bwa index ref
> > > 
> > > 
> > > # Align the reads to the reference.
> > > bwa mem -p -t 4 -R '@RG\tID:MySample\tSM:MySample' ref reads.fastq
> > > >
> > > alignment.sam
> > > 
> > > 
> > > # bwa-mem needs to infer the insert size distribution from data.
> > > You
> > > # have to mix the read pair with at least tens of other pairs. For
> > > # this reason bwa doesn't think the reads are properly paired, so
> > > we
> > > # set the flags manually.
> > > perl -pe '@_=split /\t/; if ($_[0] !~ /^@/) { @_[1] |= 2;
> > > $_=join("\t", @_) }' alignment.sam > fixed_alignment.sam
> > > 
> > > 
> > > # Sort the alignment.
> > > samtools sort -T samtools.sort fixed_alignment.sam -O bam >
> > > sorted_alignment.bam
> > > 
> > > 
> > > # Index the sorted alignment.
> > > samtools index sorted_alignment.bam
> > > 
> > > 
> > > # Make a file containing the ploidy of the sample (for "bcftools
> > > call
> > > -m").
> > > echo -e >sample.tab "MySample\t1"
> > > 
> > > 
> > > # Generate vcf using samtools.
> > > samtools mpileup -r MyRef:110-130 -vuf ref sorted_alignment.bam >
> > > pos110-130.samtools.vcf
> > > 
> > > 
> > > # Generate vcf using bcftools.
> > > samtools mpileup -r MyRef:110-130 -uf ref sorted_alignment.bam |
> > > bcftools call -m -S sample.tab -O v > pos110-130.bcftools.vcf
> > > 
> > > 
> > > # Generate mpileup data.
> > > samtools mpileup -r MyRef:110-130 -f ref sorted_alignment.bam >
> > > mpileup.tab
> > > 
> > > 
> > > # Output results.
> > > echo -e "\nLines for position 120 when the VCF is generated by
> > > \"samtools mpileup\":\n"
> > > grep -P '^MyRef\t120' pos110-130.samtools.vcf
> > > echo -e "\nLines for position 120 when the VCF is generated by
> > > \"bcftools call\":\n"
> > > grep -P '^MyRef\t120' pos110-130.bcftools.vcf
> > > echo -e "\nOutput from \"samtools mpileup\" around position 120:
> > > \n"
> > > cat mpileup.tab
> > > echo
> > > 
> > > 
> > > ### END OF SCRIPT
> > > 
> > > 
> > > 
> > > 
> > > Best,
> > > 
> > > 
> > > Rasmus Borup Hansen
> > > 
> > > Intomics is a contract research organization specialized in
> > > deriving
> > > core biological insight from large scale data. We help our clients
> > > in
> > > the pharmaceutical industry develop tomorrow's medicines better,
> > > faster, and cheaper through optimized use of biomedical data.
> > > -----------------------------------------------------------------
> > > Hansen, Rasmus Borup              Intomics - from data to biology
> > > System Administrator              Diplomvej 377
> > > Scientific Programmer             DK-2800 Kgs. Lyngby
> > >                                  Denmark
> > > E: r...@intomics.com               W: http://www.intomics.com/
> > > P: +45 5167 7972                  P: +45 8880 7979
> > > 
> > > ------------------------------------------------------------------------------
> > > _______________________________________________
> > > Samtools-help mailing list
> > > Samtools-help@lists.sourceforge.net
> > > https://lists.sourceforge.net/lists/listinfo/samtools-help
> > 
> > 
> > 
> > 
> > -- 
> > The Wellcome Trust Sanger Institute is operated by Genome Research 
> > Limited, a charity registered in England with number 1021457 and a 
> > company registered in England with number 2742969, whose registered 
> > office is 215 Euston Road, London, NW1 2BE. 
> > 
> 
> ------------------------------------------------------------------------------
> _______________________________________________
> Samtools-help mailing list
> Samtools-help@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/samtools-help




-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

------------------------------------------------------------------------------
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help

Reply via email to