Hi, I received some bam files prepared elsewhere in the past and I want to re-create the VCF files. From the header I know human_g1k_v37.fasta.gz used to be used as the reference by bwa aligner. Also seems last time it was processed using MarkDuplicates. There is ftp.ncbi.nlm.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz and also ftp://ftp.broadinstitute.org/bundle/2.8/b37/human_g1k_v37.fasta.gz file with the same name. I picked the former origin.
I executed now: samtools calmd -Arb dedup.realigned.bam ftp.ncbi.nlm.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz > dedup.realigned.calmd.bam [bam_fillmd1] different NM for read 'HWI-xxxxx:xxx:xxxxxxxxx:1:1106:17285:41358': 15 -> 70 [bam_fillmd1] different NM for read 'HWI-xxxxx:xxx:xxxxxxxxx:1:2312:4142:89740': 0 -> 1 [bam_fillmd1] different NM for read 'HWI-xxxxx:xxx:xxxxxxxxx:1:2216:4498:85887': 0 -> 2 [bam_fillmd1] different NM for read 'HWI-xxxxx:xxx:xxxxxxxxx:1:2312:4142:89740': 0 -> 1 [bam_fillmd1] different NM for read 'HWI-xxxxx:xxx:xxxxxxxxx:1:1213:15077:9565': 0 -> 1 [bam_fillmd1] different NM for read 'HWI-xxxxx:xxx:xxxxxxxxx:1:2216:4498:85887': 0 -> 1 [bam_fillmd1] different NM for read 'HWI-xxxxx:xxx:xxxxxxxxx:1:1213:12195:85510': 0 -> 1 [bam_fillmd1] different NM for read 'HWI-xxxxx:xxx:xxxxxxxxx:1:1213:15077:9565': 0 -> 1 [bam_fillmd1] different NM for read 'HWI-xxxxx:xxx:xxxxxxxxx:1:1107:10684:14625': 0 -> 1 Below the original and newly created BAM files for the first read: $ samtools view dedup.realigned.bam | grep HWI-xxxxx:xxx:xxxxxxxxx:1:1106:17285:41358 HWI-xxxxx:xxx:xxxxxxxxx:1:1106:17285:41358 145 1 9932 0 74M 20 59443385 0 GGGGGGTGGGGGTCAGAGCACCACAGGGGGGGGGGGAAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGAAACC ########################################################=>=;?<<>=@?>1+)2); BD:Z:MMMMSRQPPOOOONNNMMMMMMMMMMMMNNNNNNMMMMMMLLLLKKKKKKKKKKKKJJJJKKKKKKKMKFNNMM PG:Z:MarkDuplicates RG:Z:some-sample1 BI:Z:FFFFCA@@AAAABCDEFGHIJJKKLLLLLLLLLMMMMMNNNNMMMMMMMLLLLLLLKKKKKKKKKKKLLHHGFF NM:i:15 MQ:i:96 PQ:i:187 UQ:i:38 XQ:i:0 $ samtools view dedup.realigned.calmd.bam | grep HWI-xxxxx:xxx:xxxxxxxxx:1:1106:17285:41358 HWI-xxxxx:xxx:xxxxxxxxx:1:1106:17285:41358 145 1 9932 0 74M 20 59443385 0 GGGGGGTGGGGGTCAGAGCACCACAGGGGGGGGGGGAAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGAAACC ########################################################)))))))))))))))*)* BD:Z:MMMMSRQPPOOOONNNMMMMMMMMMMMMNNNNNNMMMMMMLLLLKKKKKKKKKKKKJJJJKKKKKKKMKFNNMM PG:Z:MarkDuplicates RG:Z:some-sample1 BI:Z:FFFFCA@@AAAABCDEFGHIJJKKLLLLLLLLLMMMMMNNNNMMMMMMMLLLLLLLKKKKKKKKKKKLLHHGFF MQ:i:96 PQ:i:187 UQ:i:38 XQ:i:0 ZQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@TUTRVSSUTWVUHB@H@Q NM:i:70 MD:Z:0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0T4 My questions are: Is the message 'different NM for read' because I picked up a wrong reference file than the one used initially by bwa aligner? In my eyes MD:Z:0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0T4 means that now the reference contains N's while it did not have them in the past. If the @MD tag is correct it means the read should not be considered as mapped to the chromosome 1 at all, and theoretically could have a better match elsewhere. So, keeping it badly aligned in this locus is wrong I think. So should I try to re-align all reads mentioned on the 'different NM for read' line again? How? Remove them from BAM, mapping newly while creating a new BAM file, then merging the BAM files? Sound I would loose some of the optional columns so the merged BAM file would not have all columns for all reads. Is that a problem? Thank you, Martin ------------------------------------------------------------------------------ _______________________________________________ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help