Hello, I hope this list is the right place for my question regarding the SAM file format. I have also posted it to
http://seqanswers.com/forums/showthread.php?t=47637 just in case. Please point me to better sites or mailing list if this does not belong here. I'm trying to wrap my head around the optional MD tag in SAM files because a tool in my processing pipeline relies on this tag. In theory it should allow me to call SNPs/indels without looking up the reference sequence for a read. An example MD tag from a file I'm dealing with is MD:Z:2A11G14G7G9^C3 read: GAGGAACCTTACCAAGGCTTGACATGTAGCTGCAAGCGCACGGAAACGTGTG CIGAR: 32M1I5M1I10M1D3M Now while the sum of the CIGAR M/I/S/=/X operations correctly equals the length of the read (52 bases, 53 when also considering the deletion/gap at position 50), I only get to 51 reference bases when I attempt to (manually) reconstruct the reference from the MD tag alone: 2A11G14G7G9^C3 in a "decompressed" form becomes ==A===========G==============G=======G=========-=== becomes the following reference sequence (first line) as compared to the true reference sequence (second line): GAAGAACCTTACCAGGGCTTGACATGTAGGTGCAAGCGCACGGAAACCTGT GAAGAACCTTACCAGGGCTTGACATGTAGGTG-AAGCG-GCGGAAACGTCGTG The difference in length as well as the shift in the sequence both seem arise from the lack of a notation for the two gaps in the reference (positions 33 and 39). Now, am I just misunderstanding the MD tag? Do I always have to consider both the CIGAR string AND the MD tag to infer the reference sequence? Or is there a notation for gaps in the reference that I simply have overlooked in the SAM specification? What I've found so far is the Regex or permitted characters on page 6: [0-9]+(([A-Z]|\^[A-Z]+)[0-9]+)* and the footnote on page 7 claiming that the MD field ought to match the CIGAR string (which it obviously doesn't in my example). Thank you a lot for any insight and clarification! Florian Aldehoff ------------------------------------------------------------------------------ Comprehensive Server Monitoring with Site24x7. Monitor 10 servers for $9/Month. Get alerted through email, SMS, voice calls or mobile push notifications. Take corrective actions from your mobile device. http://p.sf.net/sfu/Zoho _______________________________________________ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help