Thanks for the reply. I tried adding the "-A" option to "bcftools call" in the script. Then, at position 120, I get "GGAAT" in the ALT column, and the genotype field *changes* from "0" to "1:30,0". This genotype looks more like what I'd expect from looking at the alignment using "samtools tview".
If I use the "-v" option as well, position 120 is included in the VCF when the "-A" option is present, but not when "-A" is absent. As I understand it, the "-A" option should not change if a variant is called at a position or not, only how the data for the position are output. Please correct me if I'm wrong. Best, Rasmus Intomics is a contract research organization specialized in deriving core biological insight from large scale data. We help our clients in the pharmaceutical industry develop tomorrow's medicines better, faster, and cheaper through optimized use of biomedical data. ----------------------------------------------------------------- Hansen, Rasmus Borup Intomics - from data to biology System Administrator Diplomvej 377 Scientific Programmer DK-2800 Kgs. Lyngby Denmark E: r...@intomics.com W: http://www.intomics.com/ P: +45 5167 7972 P: +45 8880 7979 > On 04 Nov 2014, at 11:56, Petr Danecek <p...@sanger.ac.uk> wrote: > > Hi Rasmus, > > mpileup outputs the candidate indel > > MyRef 120 . G GGAAT > > which is then not called by bcftools (*). The -v option is not present, > therefore all input records are printed on output. The -A option is not > present, therefore unused ALT alleles are trimmed. The INDEL tag is a > reminiscence of the site being a candidate indel. > > I hope this helps > > Petr > > > (*) that is, with the default options, it would be called with -mP 0.01 > > > > On Fri, 2014-10-31 at 15:56 +0100, Rasmus Borup Hansen wrote: >> Hi! I'm getting VCF files with some strange INDELS from bcftools. The >> shell script below contains everything needed to reproduce it (I'm >> using bwa 0.7.10-r789, samtools 1.1, and bcftools 1.1) and it outputs >> the following: >> >> >> Lines for position 120 when the VCF is generated by "samtools >> mpileup": >> >> >> MyRef 120 . G <X> 0 . >> DP=3;I16=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;QS=0,0;MQ0F=0 PL 0,0,0 >> MyRef 120 . G GGAAT 0 . >> INDEL;IDV=2;IMF=0.666667;DP=3;I16=0,0,1,0,0,0,30,900,0,0,60,3600,0,0,25,625;QS=0,1;SGB=-0.379885;MQ0F=0 >> PL 30,3,0 >> >> >> Lines for position 120 when the VCF is generated by "bcftools call": >> >> >> MyRef 120 . G . 0 . >> DP=3;MQ0F=0;AN=0;DP4=0,0,0,0;MQ=. GT . >> MyRef 120 . G . 2.1484 . >> INDEL;IDV=2;IMF=0.666667;DP=3;SGB=-0.379885;MQ0F=0;AN=1;DP4=0,0,1,0;MQ=60 >> GT 0 >> >> >> Output from "samtools mpileup" around position 120: >> >> >> MyRef 110 G 2 ., mD >> MyRef 111 T 2 ., bD >> MyRef 112 G 2 ., eD >> MyRef 113 T 2 ., jD >> MyRef 114 T 2 C, kD >> MyRef 115 G 1 . m >> MyRef 116 G 1 . e >> MyRef 117 G 1 . j >> MyRef 118 G 1 . i >> MyRef 119 A 1 . k >> MyRef 120 G 0 >> MyRef 121 A 0 >> MyRef 122 C 1 . I >> MyRef 123 T 1 . k >> MyRef 124 C 2 ., gA >> MyRef 125 A 2 Gg dA >> MyRef 126 G 2 ., _A >> MyRef 127 T 2 ., iF >> MyRef 128 G 2 ., fH >> MyRef 129 C 2 ., jJ >> MyRef 130 C 2 ., iI >> >> >> It's the output from "bcftools call" that has ALT="." while the >> "INDEL" flag is present that worries me. Is this a bug, or just >> something I don't understand (yet)? >> >> >> The shell script to reproduce the above output: >> >> >> #!/bin/bash >> >> >> # Make directory for data >> mkdir -p tmp-data >> cd tmp-data >> >> >> # Redirect stderr. >> exec 2>stderr >> >> >> # Use this short refence. >> cat >ref <<EOF >>> MyRef >> GGTGGCGAAGGCGGCTTGCTGGACAAATACTGACGCTGAGGTGCGAAAGCGTGGGGAGCA >> AACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTGCTAGGTGTTGGGGAG >> ACTCAGTGCCGCAGCTAACGCAATAAGCACTCCGCCTGGGGAGTACGACCGCAAGGTTGA >> EOF >> >> >> # Use these reads. >> cat >reads.fastq <<EOF >> @Instrument:1:FlowCell:1:1:1:1/1 >> ACCTTGCGGTCGTACTCCCCAGGTGGAGTGCTTATTGCGTTTGCTGCGGCACCGACCATCTCTGGCCAACACCTAGCACTCATCGTTTACGGCGTGGA >> + >> CCCFFFFFHFHHGJJJJ3EHGIJ?FHDHFHJJIJJJJJJHIJJJJJJIJHFFDDDDDDDDDDEDCDDDDDDDDDDCCCCDDDDDDBDDDDDDDDDDDD >> @Instrument:1:FlowCell:1:1:1:1/2 >> GGTGGCGAAGGCGGCTTACTGGACTGTTACTGACGCTGAGTCACGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACG >> + >> CCBFFFFFHHHHHJJJJJJIJJIIJJJIJJJJJIJJJIJJHHHHHFFFDE9=BDDDDDDDDDDDDDDDDDDDDEDDDDDDADDDEDDDBDDDDDDDD< >> @Instrument:1:FlowCell:1:2:2:2/1 >> TCAACCTTGCGGTCGTACTCCCCAGGTGGAGTGCTTATTGCGTTAGCTGCGGCACCGAGGATTCCTCCCCGACACCTAGCACTCATCGTTTACGGCG >> + >> BCCFFFFFHHHGFHIJJGHIIIIHDH?GDFI*BGIHIIJGEGAEEHIHJIGHFFB?@DDBBCDCDCCDBDDDBBDB9CCDDD@?CDDDDDD?@@DBB >> @Instrument:1:FlowCell:1:2:2:2/2 >> GGTAGTCCACGCCGTAAACGATGAGTGCTAGGTGTCGGGGAGGAATCCTCGGTGCCGCAGCTAACGCAATAAGCACTCCACCTGGGGAGTACGACCGC >> + >> BBBDFDF?FHHHFHHHIJIJJJIHIHGJIGIJADGHJDGGIGGGFHHHHHF>DACCDBDBDDDDDBDD@DDCDDDDDDDCDDDCBBB@DDCDDDBDBB >> EOF >> >> >> # Prepare indices of the reference. >> samtools faidx ref >> bwa index ref >> >> >> # Align the reads to the reference. >> bwa mem -p -t 4 -R '@RG\tID:MySample\tSM:MySample' ref reads.fastq > >> alignment.sam >> >> >> # bwa-mem needs to infer the insert size distribution from data. You >> # have to mix the read pair with at least tens of other pairs. For >> # this reason bwa doesn't think the reads are properly paired, so we >> # set the flags manually. >> perl -pe '@_=split /\t/; if ($_[0] !~ /^@/) { @_[1] |= 2; >> $_=join("\t", @_) }' alignment.sam > fixed_alignment.sam >> >> >> # Sort the alignment. >> samtools sort -T samtools.sort fixed_alignment.sam -O bam > >> sorted_alignment.bam >> >> >> # Index the sorted alignment. >> samtools index sorted_alignment.bam >> >> >> # Make a file containing the ploidy of the sample (for "bcftools call >> -m"). >> echo -e >sample.tab "MySample\t1" >> >> >> # Generate vcf using samtools. >> samtools mpileup -r MyRef:110-130 -vuf ref sorted_alignment.bam > >> pos110-130.samtools.vcf >> >> >> # Generate vcf using bcftools. >> samtools mpileup -r MyRef:110-130 -uf ref sorted_alignment.bam | >> bcftools call -m -S sample.tab -O v > pos110-130.bcftools.vcf >> >> >> # Generate mpileup data. >> samtools mpileup -r MyRef:110-130 -f ref sorted_alignment.bam > >> mpileup.tab >> >> >> # Output results. >> echo -e "\nLines for position 120 when the VCF is generated by >> \"samtools mpileup\":\n" >> grep -P '^MyRef\t120' pos110-130.samtools.vcf >> echo -e "\nLines for position 120 when the VCF is generated by >> \"bcftools call\":\n" >> grep -P '^MyRef\t120' pos110-130.bcftools.vcf >> echo -e "\nOutput from \"samtools mpileup\" around position 120:\n" >> cat mpileup.tab >> echo >> >> >> ### END OF SCRIPT >> >> >> >> >> Best, >> >> >> Rasmus Borup Hansen >> >> Intomics is a contract research organization specialized in deriving >> core biological insight from large scale data. We help our clients in >> the pharmaceutical industry develop tomorrow's medicines better, >> faster, and cheaper through optimized use of biomedical data. >> ----------------------------------------------------------------- >> Hansen, Rasmus Borup Intomics - from data to biology >> System Administrator Diplomvej 377 >> Scientific Programmer DK-2800 Kgs. Lyngby >> Denmark >> E: r...@intomics.com W: http://www.intomics.com/ >> P: +45 5167 7972 P: +45 8880 7979 >> >> ------------------------------------------------------------------------------ >> _______________________________________________ >> Samtools-help mailing list >> Samtools-help@lists.sourceforge.net >> https://lists.sourceforge.net/lists/listinfo/samtools-help > > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE.
------------------------------------------------------------------------------
_______________________________________________ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help