Using the sample SAM file at the end of this message, I see two problems with
samtools mpileup when requesting the base positions (-O) in the pileup.
--- Issue 1: wrong number of base positions ---
For example,
samtools mpileup -s -O test.bam -r chr20:251604-251604
returns
chr20 251604 N 5 GGGGg lGJpG ~~~~~ 44,34,32,28,20,9,4
Notice that there are 5 bases printed, but 7 positions. It's not possible
to know which positions go with which bases/reads.
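
The mismatch can be checked mechanically. The following is a minimal sketch, assuming the whitespace-separated column order shown above for -s -O (chromosome, position, reference base, depth, bases, base qualities, mapping qualities, positions in read); the function name is hypothetical. It counts the base-quality characters rather than parsing the base string, since the base string can carry extra markers in other pileups.

```python
def pileup_field_counts(line):
    """Count per-read entries in each column of one `mpileup -s -O` line.

    Assumes eight whitespace-separated columns in the order seen in this
    report; this is an illustration, not part of samtools.
    """
    chrom, pos, ref, depth, bases, quals, mapqs, positions = line.split()
    return {
        "depth": int(depth),
        "base_quals": len(quals),
        "map_quals": len(mapqs),
        "positions": len(positions.split(",")),
    }


filtered = "chr20 251604 N 5 GGGGg lGJpG ~~~~~ 44,34,32,28,20,9,4"
counts = pileup_field_counts(filtered)
print(counts)
# A consistent line has the same count in every column.
print(len(set(counts.values())) == 1)
```

For the line above, every column reports 5 entries except the positions column, which reports 7.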
I know there are 7 reads in the file, but 2 are being filtered out by
base quality score. If I turn off the filtering (-Q 0), I at least get
consistent results:
samtools mpileup -s -O test.bam -r chr20:251604-251604 -Q 0
returns
chr20 251604 N 7 GGGGggg lGJpG!! ~~~~~~~ 44,34,32,28,20,9,4
The filtering should also drop the positions of the dropped bases.
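
As a sketch of the behaviour being requested (not of how samtools implements filtering), the function below takes the unfiltered -Q 0 line and drops every per-read entry, including the position, whose base quality is below a threshold. It assumes Phred+33 quality characters, a base string with exactly one character per read (true for this example, not in general), and a threshold of 13; the function name and the threshold parameter are hypothetical.

```python
def drop_low_quality(line, min_bq=13):
    """Remove all per-read entries whose base quality is below min_bq.

    Assumes eight whitespace-separated columns and one base character
    per read; illustration only.
    """
    chrom, pos, ref, depth, bases, quals, mapqs, positions = line.split()
    positions = positions.split(",")
    # Indices of reads whose Phred+33 base quality passes the threshold.
    keep = [i for i, q in enumerate(quals) if ord(q) - 33 >= min_bq]

    def pick(seq):
        return [seq[i] for i in keep]

    return " ".join([
        chrom, pos, ref, str(len(keep)),
        "".join(pick(bases)),
        "".join(pick(quals)),
        "".join(pick(mapqs)),
        ",".join(pick(positions)),
    ])


unfiltered = "chr20 251604 N 7 GGGGggg lGJpG!! ~~~~~~~ 44,34,32,28,20,9,4"
expected = drop_low_quality(unfiltered)
print(expected)
```

With these assumptions the result keeps 5 bases and 5 positions (44,34,32,28,20), which is the output I would expect from the filtered run.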
--- Issue 2: different quality scores ---
If I run
samtools mpileup -s -O test.bam -r chr20:251604-251604 -Q 0
using version 1.18, I get
chr20 251604 N 7 GGGGggg GGJIGHF ~~~~~~~ 44,34,32,28,20,9,4
but using 1.2, I get
chr20 251604 N 7 GGGGggg lGJpG!! ~~~~~~~ 44,34,32,28,20,9,4
Notice how the base quality scores differ. If I extract the values
from the BAM file, I get results that agree with version 1.18. I tried
setting -B to disable the BAQ adjustment, but that didn't seem to make a
difference. Is there another flag which affects the reporting of base quality?
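
To make the difference concrete, here is a small sketch that decodes the two quality strings into numeric scores, assuming the usual Phred+33 encoding (score = ASCII code minus 33); the helper name is hypothetical.

```python
def phred33(quals):
    """Decode a Phred+33 quality string into a list of integer scores."""
    return [ord(c) - 33 for c in quals]


q_118 = phred33("GGJIGHF")  # base qualities reported by version 1.18
q_12 = phred33("lGJpG!!")   # base qualities reported by version 1.2

# Print the two versions side by side, one read per row.
for i, (a, b) in enumerate(zip(q_118, q_12), start=1):
    flag = "" if a == b else "  <-- differs"
    print(f"read {i}: {a:3d} {b:3d}{flag}")
```

Under that assumption, reads 1, 4, 6 and 7 differ between the two versions: 1.2 reports 75 and 79 for reads 1 and 4 and 0 for reads 6 and 7, where 1.18 reports 38, 40, 39 and 37. The two zero scores would also explain why exactly two bases are dropped by the quality filter in Issue 1.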
--- Sample SAM file (test.sam) ---
(Convert this to a BAM file and index it to run the commands above.)
@HD VN:1.3
@SQ SN:chr20 LN:36025250
@RG ID:0 PG:GEM PL:ILLUMINA SM:0
HW-ST546:136:D0HWFACXX:6:1203:17278:16013 99 chr20 251561 180 75M = 251601 115 CCCACCTGGCCCAGCAGCACCAACCAGAAAGAAGGGAAGAAGAGAGGAAAAAACCACAGGAAGAAAGAAAGGAGG CCCFFFFFHHHHHJJJJIJIJJJJJJJJJJHEHIJJHIIIJIGGDHIIIJJJJHHHFFFFEEEEEDDDDDDDBDD RG:Z:0 NM:i:0 XT:A:U md:Z:75
HW-ST546:136:D0HWFACXX:6:1106:19363:28977 99 chr20 251571 180 75M = 251651 155 CCAGCAGCACCAACCAGAAAGAAGGGAAGAAGAGAGGAAAAAACCACAGGAAGAAAGAAAGGAGGGAGGGAGGGA CCCFFFFFGHHHHJIJJIJJJIJJJJIGIIJIEGGHHEGIJJIJBHIHIFDGEHHHFFFDDECDCB8?@D@@DD5 RG:Z:0 NM:i:0 XT:A:U md:Z:75
HW-ST546:136:D0HWFACXX:6:2308:7416:167373 163 chr20 251573 180 75M = 251633 -130 AGCAGCACCAACCAGAAAGAAGGGAAGAAGAGAGGAAAAAACCACAGGAAGAAAGAAAGGAGGGAGGGAGGGAGA CCCFFFFFHHHHHJIIJIJIIIJJIJJIIIIJIIIJJJJJJJJJJJJJIJHIIHHHHFFFFFDDBDDDBDDDBD< RG:Z:0 NM:i:10 XT:A:U md:Z:75
HW-ST546:136:D0HWFACXX:6:1304:4908:37632 99 chr20 251577 180 75M = 251596 94 GCACCAACCAGAAAGAAGGGAAGAAGAGAGGAAAAAACCACAGGAAGAAAGAAAGGAGGGAGGGAGGGAGAGGAG CCCFFFFFHHHHHIJJJJJJIJGIJJIIIJIGHJJIGHHIIGHGIGIHIJIJJGFG@DDDACB@;@DD;;;??8? RG:Z:0 NM:i:0 XT:A:U md:Z:75
HW-ST546:136:D0HWFACXX:6:1206:3218:57131 83 chr20 251585 180 75M = 251503 157 CAGAAAGAAGGGAAGAAGAGAGGAAAAAACCACAGGAAGAAAGAAAGGAGGGAGGGAGGGAGAGGAGGAAGGAAG DB?>DEFEFEH@IGGIHGGGHEGIJIIHFHHFDCIHEIEGFGGCIHEGGGHFBIHJIJIGEJHHFHHADBDFCC@ RG:Z:0 NM:i:3 XT:A:U md:Z:75
HW-ST546:136:D0HWFACXX:6:1304:4908:37632 147 chr20 251596 180 75M = 251577 -94 GAAGAAGAGAGGAAAAAACCACAGGAAGAAAGAAAGGAGGGAGGGAGGGAGAGGAGGAAGGAAGGAGGAGGGAAG EFFFFFFHHHHHHJJIHGIGIJIIHJJIJJJJJIIJIIJJJFJJJIJJJJIGIJJJJJJJJJHHHHHFFFFFCCC RG:Z:0 NM:i:0 XT:A:U md:Z:75
HW-ST546:136:D0HWFACXX:6:1203:17278:16013 147 chr20 251601 180 75M = 251561 -115 AGAGAGGAAAAAACCACAGGAAGAAAGAAAGGAGGGAGGGAGGGAGAGGAGGAAGGAAGGAGGAGGGAAGGGAAG EEFFFFFFJJJIIJIIIIJJJIGHHIJJJJJJJJJJIJJJJJJJJJIJJIJJJJJJJJJJJJHHHHHFFFFFC@@ RG:Z:0 NM:i:0 XT:A:U md:Z:75
---------
Thank you.
Matthew Flickinger
PhD Candidate
Center for Statistical Genetics
University of Michigan
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help