On 21 Feb 2018, at 20:05, Nicholas Hill <nish...@ucsc.edu> wrote:
> Say my pileup line is this:
> chr3    73912   A       21      g,,..G,.gGGGgGGg,.Ggg   JJ<JJJ<sJJssJJsss7JkJ
> 
> So at chr3:73912 the reference was an A. On the forward strand (read1), there 
> are 7 guanine base pairs that aligned to the reference sequence at this 
> position. Additionally, on the reverse strand (read2), there are 6 guanine 
> base pairs that aligned to the reference sequence at this position ( or is it 
> 6 cytosine base pairs, given that it is the reverse?). This is where I am 
> confused.

It may be clearer if you mock up a small SAM file and mpileup it yourself so 
you can see what is going on.  For example,

@SQ     SN:chr3 LN:1000000
foo     0       chr3    73910   20      6M      *       0       0       GGGGGG  
AAAAAA
bar     16      chr3    73912   20      6M      *       0       0       GGGGGG  
BBBBBB

"samtools mpileup foobar.sam" gives  2  Gg  AB  mpileup columns at positions 
73912-73915. The lowercase "g"s correspond to the bases of bar -- as viewed on 
the reference sequence's forward strand, as one always does for mapped data in 
SAM files, as Tom noted. This read has been reverse-complemented (flags=16), so 
it was CCCCCC as viewed when it came off the sequencing machine, and it has 
been mapped to the reverse strand of the reference.

> Also, what if my reference base is lowercase: 
> chr3    73912   a       21      g,,..G,.gGGGgGGg,.Ggg   JJ<JJJ<sJJssJJsss7JkJ
> 
> Does this mean that the reference base is actually a thymine, given that it 
> is from the reference genome? 

No, it's an A. All that mpileup documentation about upper/lower-case and 
dots/commas indicating strands pertains only to the "g,,..G,.gGGGgGGg,.Ggg" 
column.

The reference base in column 3 is simply copied from your -f genome.fa file. So 
your genome.fa reference genome must have a mixture of upper- and lowercase 
reference bases in it. Presumably this is some kind of masking, but what it's 
supposed to indicate would be a question for whoever made that reference genome 
.fa file -- mpileup is not altering it in any way.

    John
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help

Reply via email to