Hello,
I have some Ion Torrent PGM sequencing data from mitochondrial primer
capture experiments (with very difficult, low-quantity DNA samples, so the
reads are not great). I noticed many duplicates (by duplicate I mean reads
that start and end at the same position and are identical in sequence).
From this tutorial
https://www.broadinstitute.org/gatk/events/2038/GATKwh0-BP-1-Map_and_Dedup.pdf,
I gather that the picard-tools MarkDuplicates function marks reads that
have identical CIGAR strings.
I am using picard-tools-1.129, GATK v3.3-0 and samtools-0.1.19.
From my BAM files I first remove the unmapped reads (in a bash script):
*samtools-0.1.19/samtools view -b -F 4 $f > $f.mapped.bam*
Then I mark the duplicates (on the sorted BAM):
*java -jar picard-tools-1.129/MarkDuplicates.jar \
INPUT=$f.mapped.bam.sorted.bam \
OUTPUT=$f.mapped.bam.sorted.bam.dedup.bam \
METRICS_FILE=metrics.txt*
Then I build the BAI index file:
*java -jar picard-tools-1.129BU/BuildBamIndex.jar \
INPUT=$f.mapped.bam.sorted.bam.dedup.bam.bam*
Looking at the data, I don't understand why some reads are marked as
duplicates. The example below is copied from the SAM file (there are more
cases like it). The reads before and after these start at other positions.
85MXK:00198:02595 1040 rCRS| 257 67 2S29M1I23M4I66M * 0 0
GTACAGCCGCTTTCCACACAGACATCATAACAAAAAAATTTCCACCAAACCCCCCCCCTCCCCCCCGCTTCTGGCCACAGCACTTAAACACATCTCTGCCAAACCCCAAAAACAAAGAACCCTAA
;<DE<<:DDD>DD@EEEEF>><;<3==3-33)333333,C@
<CC<?6A5&:::::::::):::::::::A@IBEBFDDDEC
??=C?GBBBDDDDDDFD?C6C6)666)DDD>61:::A:1:<=8C XA:Z:map4-1 ZA:i:125 ZB:i:30
ZC:B:i,189,188,1,0 MD:Z:6A46T64 ZF:i:24 PG:Z:MarkDuplicates
RG:Z:85MXK.IonXpress_016 ZG:i:189 NM:i:7 XM:i:118
ZM:B:s,254,2,322,2,2,268,10,256,254,260,224,516,244,2,2,12,256,28,226,226,2,278,252,222,700,2,2,212,2,4,762,4,482,4,232,4,836,4,26,254,1206,10,4,1012,4,4,4,4,802,4,524,4,46,216,54,214,6,212,6,218,10,60,240,4,4,274,18,66,224,14,4,200,260,2,2,278,2,2,2,2,666,2,52,406,154,6,180,86,220,248,58,0,196,0,92,70,246,14,0,262,236,12,0,518,0,478,0,70,236,300,0,486,38,0,292,0,12,282,1478,270,26,2066,646,18,26,0,546,4,264,0,0,578,24,646,0,102,1398,0,0,256,66,0,0,0,646,0,42,208,312,16,244,208,298,212,182,10,240,0,2,0,252,0,8,254,232,8,38,222,236,24,10,450,578,180,214,0,64,24,446,12,38,246,78,74,218,230,204,172,280,204,74,44,222,24,192,90,44,240,436,190,42,10,0,32
ZP:B:f,0.00519992,0.00619387,9.47046e-07 AS:i:90 XS:i:8
85MXK:01378:02427 1040 rCRS| 257 75 2S53M3I65M * 0 0
GTACAGCCGCTTTCCACACAGACATCATAACAAAAAATTTCCACCAAACCCCCCCCCTCCCCCCGCTTCTGGCCACAGCACTTAAACACATCTCTGCCAAACCCCAAAAACAAAGAACCCTAA
DCCD<<:IDD>EE?DEECC=<;BB:<<:3??0?CCDD?MFAB?::0::&CEIFEIE@
:*::::::?AFGKCC@ECCCBDDB=<988;DBBCEDDD>??<LK;EF:):::::8AAA>:1:::6@
XA:Z:map4-1 ZA:i:123 ZB:i:30 ZC:B:i,189,188,1,0 MD:Z:6A46T64 ZF:i:24
PG:Z:MarkDuplicates RG:Z:85MXK.IonXpress_016 ZG:i:189 NM:i:5 XM:i:118
ZM:B:s,242,0,284,0,2,252,14,242,222,272,236,496,280,20,6,0,264,4,254,238,16,274,314,226,678,24,0,242,14,0,772,0,470,0,216,26,730,4,0,290,1144,14,0,938,12,16,4,0,804,14,504,20,58,212,110,234,0,206,0,220,46,20,258,4,0,230,0,64,186,42,0,218,244,0,0,254,0,10,0,0,624,0,96,426,126,12,214,98,196,252,42,8,180,0,96,50,272,0,0,276,210,4,2,486,0,466,0,56,232,298,16,452,32,0,280,20,0,264,1430,278,42,1974,592,20,20,0,536,6,224,10,18,608,52,642,0,128,1258,6,2,252,84,4,6,2,612,2,28,274,278,4,288,222,256,252,198,20,242,6,10,6,264,14,6,278,244,6,30,232,226,26,8,458,578,176,200,8,66,8,440,10,28,274,104,76,212,212,168,200,254,176,126,50,226,44,198,86,38,232,426,198,4,12,2,18
ZP:B:f,0.00478832,0.00666637,3.13317e-07 AS:i:99 XS:i:-2147483647
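For reading the second column of these records, here is a minimal sketch that decodes the SAM FLAG values appearing in this post (1040 above, 1024 and 0 in the next example). The bit values are the ones defined in the SAM specification; the helper name `decode_flag` is only illustrative and covers just the three bits relevant here.

```python
# Subset of SAM FLAG bits (values from the SAM specification).
FLAG_BITS = {
    0x4: "unmapped",
    0x10: "reverse_strand",
    0x400: "duplicate",
}


def decode_flag(flag):
    """Return the names of the bits listed above that are set in a SAM FLAG."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


if __name__ == "__main__":
    # FLAG values taken from the records quoted in this post.
    for flag in (1040, 1024, 0):
        print(flag, decode_flag(flag))
```

Under this decoding, 1040 is 1024 + 16, so both records above are reverse-strand reads that carry the duplicate bit.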
Another example:
85MXK:01331:01601 1024 rCRS| 215 76 79M * 0 0
ATTAATGCTTGTAGGATATAATAATAACAATTGAATGTCTGCACAGCCGCTTTCTACACAGACATCATAACAAAAAATT
II?D?DFDD<??DD>CBEDD@DD@DD<=;?D>BB@CC<===<=<=DC<::::1:@BEDDDDDCCBBB8
;6.-----(/* XA:Z:map4-1 ZA:i:149 ZB:i:30 ZC:B:i,289,289,2,0 MD:Z:16C31A5C24
ZF:i:27 PG:Z:MarkDuplicates RG:Z:85MXK.IonXpress_016 ZG:i:289 NM:i:3 XM:i:79
ZM:B:s,238,0,280,0,6,268,12,250,260,284,246,488,272,0,16,0,242,2,242,220,0,272,248,218,244,0,0,250,0,0,0,0,476,504,0,0,246,0,0,216,26,236,458,190,12,0,12,0,264,2,2,256,2,2,440,268,250,38,6,258,2,2,4,2,234,486,2,2,268,502,2,2,234,4,2,2,520,2,246,418,480,2,178,512,234,18,188,100,248,22,6,44,288,28,2,34,266,20,16,242,18,46,272,0,48,2,0,24,224,-2,286,214,44,50,202,22,6,430,230,14,32,2,-2,0,328,24,18,54,658,42,256,20,198,236,282,10,2,26,30,-4,234,14,266,238,34,14,214,266,6,260,18,256,200,0,-8,0,282,276,18,0,246,456,274,2,-6,1208,24,-8,516,288,-6,-6,258,4,308,410,98,498,-6,112,96,456,48,168
ZP:B:f,0.00484822,0.00638284,7.94061e-06 AS:i:67 XS:i:-2147483647
85MXK:01433:03052 0 rCRS| 215 2 13M103S * 0 0
ATTAATCCTTGTATTGCGCCCGACGCAAAAAACAGCACCACACAGACCACTGGCAGAATCAGCCAGATCTTGGGGACGGCCTTGAGTAACAGCCAGCTCGTCACACCGCTTAGCAC
DDBEBEE?C?CDEEBBBBEF?E=<:?@@@@@*::::::4:?::=CC?=::::3::CC<:<=B?4:=CCCC?CAD8<=>@EAFAEEEDE7<<::3:;;=DFED>==D?=:<=C<=>D
XA:Z:map4-1 ZA:i:116 ZB:i:30 ZC:B:i,229,226,1,0 MD:Z:6G6 ZF:i:27
PG:Z:MarkDuplicates RG:Z:85MXK.IonXpress_016 ZG:i:229 NM:i:1 XM:i:13
ZM:B:s,246,0,294,0,2,264,10,248,232,264,222,494,248,0,20,0,264,0,224,212,0,274,252,226,232,0,0,226,4,0,0,0,470,484,0,0,274,12,466,0,470,34,30,218,32,0,0,0,268,36,0,238,488,22,246,0,22,2,0,0,282,-2,234,694,48,-4,36,236,-6,222,326,232,-8,280,10,14,1352,-2,320,238,-2,16,224,-10,6,300,2,348,-14,4,0,6,410,242,16,286,26,254,326,4,8,240,80,220,-2,14,-4,2,276,42,440,262,10,308,42,50,158,114,352,36,20,72,-16,4,324,236,228,28,30,428,28,68,198,-2,298,-8,0,20,-24,-24,252,274,466,312,44,62,204,234,204,232,72,2,402,860,-12,202,346,34,446,432,434,22,4,262,10,206,54,234,260,26,8,194,470,38,288,222,12,34,250,38,-20,520,6,210,74,248,-14,24,322,20,70,52,230,62,242,224,222,30,244,38,10,104,-10,-12,272,8,300,216,44,482,220,-18,30,320,-8,34,434,4,-32,190,84,24,208,346,18,222,220,32,-26,194,72,-10,248,220,-24,-10,264,26,500,44,4,88
ZP:B:f,0.00478331,0.00679823,1.76384e-07 AS:i:9 XS:i:-2147483647
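One way to check the four records above is to compute, for each read, the strand and the unclipped 5' coordinate from POS, FLAG and CIGAR. The sketch below does that under an assumption that should be verified against the Picard documentation: that for single-end reads MarkDuplicates compares strand and unclipped 5' position rather than sequence or CIGAR. The function name `unclipped_five_prime` is hypothetical; the POS, FLAG and CIGAR values are copied from the records in this post.

```python
import re

# One CIGAR element: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def unclipped_five_prime(pos, cigar, reverse):
    """Return the reference coordinate of the read's 5' end, clipping undone.

    pos is the 1-based leftmost aligned position (SAM POS field).
    For forward reads the 5' end is the left end; for reverse-strand
    reads it is the right end of the alignment.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if not reverse:
        lead = 0
        for n, op in ops:          # leading soft/hard clips
            if op not in "SH":
                break
            lead += n
        return pos - lead
    # Operations that consume reference bases.
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    trail = 0
    for n, op in reversed(ops):    # trailing soft/hard clips
        if op not in "SH":
            break
        trail += n
    return pos + ref_len - 1 + trail


# (name, FLAG, POS, CIGAR) copied from the SAM records in this post.
RECORDS = [
    ("85MXK:00198:02595", 1040, 257, "2S29M1I23M4I66M"),
    ("85MXK:01378:02427", 1040, 257, "2S53M3I65M"),
    ("85MXK:01331:01601", 1024, 215, "79M"),
    ("85MXK:01433:03052", 0, 215, "13M103S"),
]

if __name__ == "__main__":
    for name, flag, pos, cigar in RECORDS:
        reverse = bool(flag & 0x10)
        strand = "-" if reverse else "+"
        print(name, strand, unclipped_five_prime(pos, cigar, reverse))
```

With these inputs the first two reads share the same strand and 5' coordinate, and so do the last two, even though their CIGAR strings and sequences differ.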
I think I am missing something here. Can anybody help?
Thanks,
Mayra
--
Mayra Mayr-Eduardoff
PhD Candidate
Institute of Legal Medicine
Innsbruck Medical University
Müllerstrasse 44
6020 Innsbruck
Austria
Tel:+43512900370624
Mail: mayra.eduard...@i-med.ac.at