Hi Rasmus, mpileup outputs the candidate indel
MyRef 120 . G GGAAT which is then not called by bcftools (*). The -v option is not present, therefore all input records are printed on output. The -A option is not present, therefore unused ALT alleles are trimmed. The INDEL tag is a reminiscence of the site being a candidate indel. I hope this helps Petr (*) that is, with the default options, it would be called with -mP 0.01 On Fri, 2014-10-31 at 15:56 +0100, Rasmus Borup Hansen wrote: > Hi! I'm getting VCF files with some strange INDELS from bcftools. The > shell script below contains everything needed to reproduce it (I'm > using bwa 0.7.10-r789, samtools 1.1, and bcftools 1.1) and it outputs > the following: > > > Lines for position 120 when the VCF is generated by "samtools > mpileup": > > > MyRef 120 . G <X> 0 . > DP=3;I16=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;QS=0,0;MQ0F=0 PL 0,0,0 > MyRef 120 . G GGAAT 0 . > INDEL;IDV=2;IMF=0.666667;DP=3;I16=0,0,1,0,0,0,30,900,0,0,60,3600,0,0,25,625;QS=0,1;SGB=-0.379885;MQ0F=0 > PL 30,3,0 > > > Lines for position 120 when the VCF is generated by "bcftools call": > > > MyRef 120 . G . 0 . > DP=3;MQ0F=0;AN=0;DP4=0,0,0,0;MQ=. GT . > MyRef 120 . G . 2.1484 . > INDEL;IDV=2;IMF=0.666667;DP=3;SGB=-0.379885;MQ0F=0;AN=1;DP4=0,0,1,0;MQ=60 > GT 0 > > > Output from "samtools mpileup" around position 120: > > > MyRef 110 G 2 ., mD > MyRef 111 T 2 ., bD > MyRef 112 G 2 ., eD > MyRef 113 T 2 ., jD > MyRef 114 T 2 C, kD > MyRef 115 G 1 . m > MyRef 116 G 1 . e > MyRef 117 G 1 . j > MyRef 118 G 1 . i > MyRef 119 A 1 . k > MyRef 120 G 0 > MyRef 121 A 0 > MyRef 122 C 1 . I > MyRef 123 T 1 . k > MyRef 124 C 2 ., gA > MyRef 125 A 2 Gg dA > MyRef 126 G 2 ., _A > MyRef 127 T 2 ., iF > MyRef 128 G 2 ., fH > MyRef 129 C 2 ., jJ > MyRef 130 C 2 ., iI > > > It's the output from "bcftools call" that has ALT="." while the > "INDEL" flag is present that worries me. Is this a bug, or just > something I don't understand (yet)? > > > The shell script to reproduce the above output: > > > #!/bin/bash > > > # Make directory for data > mkdir -p tmp-data > cd tmp-data > > > # Redirect stderr. > exec 2>stderr > > > # Use this short refence. > cat >ref <<EOF > >MyRef > GGTGGCGAAGGCGGCTTGCTGGACAAATACTGACGCTGAGGTGCGAAAGCGTGGGGAGCA > AACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTGCTAGGTGTTGGGGAG > ACTCAGTGCCGCAGCTAACGCAATAAGCACTCCGCCTGGGGAGTACGACCGCAAGGTTGA > EOF > > > # Use these reads. > cat >reads.fastq <<EOF > @Instrument:1:FlowCell:1:1:1:1/1 > ACCTTGCGGTCGTACTCCCCAGGTGGAGTGCTTATTGCGTTTGCTGCGGCACCGACCATCTCTGGCCAACACCTAGCACTCATCGTTTACGGCGTGGA > + > CCCFFFFFHFHHGJJJJ3EHGIJ?FHDHFHJJIJJJJJJHIJJJJJJIJHFFDDDDDDDDDDEDCDDDDDDDDDDCCCCDDDDDDBDDDDDDDDDDDD > @Instrument:1:FlowCell:1:1:1:1/2 > GGTGGCGAAGGCGGCTTACTGGACTGTTACTGACGCTGAGTCACGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACG > + > CCBFFFFFHHHHHJJJJJJIJJIIJJJIJJJJJIJJJIJJHHHHHFFFDE9=BDDDDDDDDDDDDDDDDDDDDEDDDDDDADDDEDDDBDDDDDDDD< > @Instrument:1:FlowCell:1:2:2:2/1 > TCAACCTTGCGGTCGTACTCCCCAGGTGGAGTGCTTATTGCGTTAGCTGCGGCACCGAGGATTCCTCCCCGACACCTAGCACTCATCGTTTACGGCG > + > BCCFFFFFHHHGFHIJJGHIIIIHDH?GDFI*BGIHIIJGEGAEEHIHJIGHFFB?@DDBBCDCDCCDBDDDBBDB9CCDDD@?CDDDDDD?@@DBB > @Instrument:1:FlowCell:1:2:2:2/2 > GGTAGTCCACGCCGTAAACGATGAGTGCTAGGTGTCGGGGAGGAATCCTCGGTGCCGCAGCTAACGCAATAAGCACTCCACCTGGGGAGTACGACCGC > + > BBBDFDF?FHHHFHHHIJIJJJIHIHGJIGIJADGHJDGGIGGGFHHHHHF>DACCDBDBDDDDDBDD@DDCDDDDDDDCDDDCBBB@DDCDDDBDBB > EOF > > > # Prepare indices of the reference. > samtools faidx ref > bwa index ref > > > # Align the reads to the reference. > bwa mem -p -t 4 -R '@RG\tID:MySample\tSM:MySample' ref reads.fastq > > alignment.sam > > > # bwa-mem needs to infer the insert size distribution from data. You > # have to mix the read pair with at least tens of other pairs. For > # this reason bwa doesn't think the reads are properly paired, so we > # set the flags manually. > perl -pe '@_=split /\t/; if ($_[0] !~ /^@/) { @_[1] |= 2; > $_=join("\t", @_) }' alignment.sam > fixed_alignment.sam > > > # Sort the alignment. > samtools sort -T samtools.sort fixed_alignment.sam -O bam > > sorted_alignment.bam > > > # Index the sorted alignment. > samtools index sorted_alignment.bam > > > # Make a file containing the ploidy of the sample (for "bcftools call > -m"). > echo -e >sample.tab "MySample\t1" > > > # Generate vcf using samtools. > samtools mpileup -r MyRef:110-130 -vuf ref sorted_alignment.bam > > pos110-130.samtools.vcf > > > # Generate vcf using bcftools. > samtools mpileup -r MyRef:110-130 -uf ref sorted_alignment.bam | > bcftools call -m -S sample.tab -O v > pos110-130.bcftools.vcf > > > # Generate mpileup data. > samtools mpileup -r MyRef:110-130 -f ref sorted_alignment.bam > > mpileup.tab > > > # Output results. > echo -e "\nLines for position 120 when the VCF is generated by > \"samtools mpileup\":\n" > grep -P '^MyRef\t120' pos110-130.samtools.vcf > echo -e "\nLines for position 120 when the VCF is generated by > \"bcftools call\":\n" > grep -P '^MyRef\t120' pos110-130.bcftools.vcf > echo -e "\nOutput from \"samtools mpileup\" around position 120:\n" > cat mpileup.tab > echo > > > ### END OF SCRIPT > > > > > Best, > > > Rasmus Borup Hansen > > Intomics is a contract research organization specialized in deriving > core biological insight from large scale data. We help our clients in > the pharmaceutical industry develop tomorrow's medicines better, > faster, and cheaper through optimized use of biomedical data. > ----------------------------------------------------------------- > Hansen, Rasmus Borup Intomics - from data to biology > System Administrator Diplomvej 377 > Scientific Programmer DK-2800 Kgs. Lyngby > Denmark > E: r...@intomics.com W: http://www.intomics.com/ > P: +45 5167 7972 P: +45 8880 7979 > > ------------------------------------------------------------------------------ > _______________________________________________ > Samtools-help mailing list > Samtools-help@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/samtools-help -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. ------------------------------------------------------------------------------ _______________________________________________ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help