[Samtools-help] samtools calmd behavior and MD tag

Martin MOKREJŠ Thu, 28 Jul 2016 01:56:33 -0700

Hi,
   I received some bam files prepared elsewhere in the past and I want to 
re-create the VCF files. From the header I know human_g1k_v37.fasta.gz used to 
be used as the reference by bwa aligner. Also seems last time it was processed 
using MarkDuplicates.
   There is 
ftp.ncbi.nlm.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz 
and also ftp://ftp.broadinstitute.org/bundle/2.8/b37/human_g1k_v37.fasta.gz 
file with the same name. I picked the former origin.


   I executed now:

samtools calmd -Arb dedup.realigned.bam 
ftp.ncbi.nlm.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz 
> dedup.realigned.calmd.bam
[bam_fillmd1] different NM for read 
'HWI-xxxxx:xxx:xxxxxxxxx:1:1106:17285:41358': 15 -> 70
[bam_fillmd1] different NM for read 
'HWI-xxxxx:xxx:xxxxxxxxx:1:2312:4142:89740': 0 -> 1
[bam_fillmd1] different NM for read 
'HWI-xxxxx:xxx:xxxxxxxxx:1:2216:4498:85887': 0 -> 2
[bam_fillmd1] different NM for read 
'HWI-xxxxx:xxx:xxxxxxxxx:1:2312:4142:89740': 0 -> 1
[bam_fillmd1] different NM for read 
'HWI-xxxxx:xxx:xxxxxxxxx:1:1213:15077:9565': 0 -> 1
[bam_fillmd1] different NM for read 
'HWI-xxxxx:xxx:xxxxxxxxx:1:2216:4498:85887': 0 -> 1
[bam_fillmd1] different NM for read 
'HWI-xxxxx:xxx:xxxxxxxxx:1:1213:12195:85510': 0 -> 1
[bam_fillmd1] different NM for read 
'HWI-xxxxx:xxx:xxxxxxxxx:1:1213:15077:9565': 0 -> 1
[bam_fillmd1] different NM for read 
'HWI-xxxxx:xxx:xxxxxxxxx:1:1107:10684:14625': 0 -> 1

Below the original and newly created BAM files for the first read:

$ samtools view dedup.realigned.bam | grep 
HWI-xxxxx:xxx:xxxxxxxxx:1:1106:17285:41358
HWI-xxxxx:xxx:xxxxxxxxx:1:1106:17285:41358      145     1       9932    0       
74M     20      59443385        0       
GGGGGGTGGGGGTCAGAGCACCACAGGGGGGGGGGGAAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGAAACC      
########################################################=>=;?<<>=@?>1+)2);      
BD:Z:MMMMSRQPPOOOONNNMMMMMMMMMMMMNNNNNNMMMMMMLLLLKKKKKKKKKKKKJJJJKKKKKKKMKFNNMM 
       PG:Z:MarkDuplicates     RG:Z:some-sample1   
BI:Z:FFFFCA@@AAAABCDEFGHIJJKKLLLLLLLLLMMMMMNNNNMMMMMMMLLLLLLLKKKKKKKKKKKLLHHGFF 
NM:i:15 MQ:i:96 PQ:i:187        UQ:i:38 XQ:i:0

$ samtools view dedup.realigned.calmd.bam | grep 
HWI-xxxxx:xxx:xxxxxxxxx:1:1106:17285:41358
HWI-xxxxx:xxx:xxxxxxxxx:1:1106:17285:41358      145     1       9932    0       
74M     20      59443385        0       
GGGGGGTGGGGGTCAGAGCACCACAGGGGGGGGGGGAAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGAAACC      
########################################################)))))))))))))))*)*      
BD:Z:MMMMSRQPPOOOONNNMMMMMMMMMMMMNNNNNNMMMMMMLLLLKKKKKKKKKKKKJJJJKKKKKKKMKFNNMM 
       PG:Z:MarkDuplicates     RG:Z:some-sample1   
BI:Z:FFFFCA@@AAAABCDEFGHIJJKKLLLLLLLLLMMMMMNNNNMMMMMMMLLLLLLLKKKKKKKKKKKLLHHGFF 
MQ:i:96 PQ:i:187        UQ:i:38 XQ:i:0  
ZQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@TUTRVSSUTWVUHB@H@Q 
NM:i:70 
MD:Z:0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0T4


My questions are:

   Is the message 'different NM for read' because I picked up a wrong reference 
file than the one used initially by bwa aligner?

   In my eyes 
MD:Z:0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0T4
  means that now the reference contains N's while it did not have them in the 
past.

   If the @MD tag is correct it means the read should not be considered as 
mapped to the chromosome 1 at all, and theoretically could have a better match 
elsewhere. So, keeping it badly aligned in this locus is wrong I think. So 
should I try to re-align all reads mentioned on the 'different NM for read' 
line again? How? Remove them from BAM, mapping newly while creating a new BAM 
file, then merging the BAM files? Sound I would loose some of the optional 
columns so the merged BAM file would not have all columns for all reads. Is 
that a problem?


Thank you,
Martin

------------------------------------------------------------------------------
_______________________________________________
Samtools-help mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/samtools-help

[Samtools-help] samtools calmd behavior and MD tag

Reply via email to