On 13 Dec 2016, at 23:56, Jacob Ulirsch <julir...@broadinstitute.org> wrote:
> When I run samtools (v1.3.1) mpileup I am running into some very concerning 
> issues. The reference bases for the output pileup file are not in the same 
> position as the references in the input fasta. In fact, they appear to be 
> oddly shifted. For example, here is output for 10 variants from the following 
> samtools pileup command:
> 
> samtools mpileup -f mtDNA.fasta -r chrM -Q 0 -t AD,ADF,ADR --skip-indels 
> input.bam > input.txt

(The -t and --skip-indels options are effective only when producing VCF/BCF 
output, not when producing mpileup output.)

> chrM    1330    C       457     G$G$GGGgggtg...
> chrM    1331    A       461     T$T$T$tttttT...
> chrM    1332    A       455     c$c$c$c$c$CC...
> chrM    1333    G       449     A$A$A$A$A$A$...
> chrM    1334    G       463     A$,$,$AAaaaa...
> chrM    1335    T       487     GGgggggggggg...
> chrM    1336    G       529     .$.$,$,$,$,,...
> chrM    1337    T       524     ,$,$,$,$,$,$...
> chrM    1338    A       503     g$g$g$g$g$g$...
> chrM    1339    G       497     TTcccctttTTT...
> chrM    1340    C       506     A$A$ttttaaaA...
> 
> Here is output from faidx, showing the correct fasta sequence for these 
> positions:
> 
> samtools faidx mtDNA.fasta chrM:1330-1340
> >chrM:1330-1340
> GTCAAGGTGTA

You don't say what species you're working with, but assuming human...  In fact 
the bases at positions 1330-1340 in the human mitochondrial DNA sequence 
(NC_012920) are indeed CAAGGTGTAGC.  So if there is a problem, it is with your 
samtools faidx command.

Are you sure mtDNA.fasta.fai is up to date with your mtDNA.fasta file?  Rename 
mtDNA.fasta.fai and regenerate it with `samtools faidx mtDNA.fasta`.  If you 
compare the two .fai files I think you will find that some of the offset 
numbers have changed slightly, perhaps because the header line in mtDNA.fasta 
has been changed since the original index file was generated.

    John

-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help

Reply via email to