On Thu, Jul 28, 2016 at 10:53:53AM +0200, Martin MOKREJ? wrote: > $ samtools view dedup.realigned.calmd.bam | grep > HWI-xxxxx:xxx:xxxxxxxxx:1:1106:17285:41358 > HWI-xxxxx:xxx:xxxxxxxxx:1:1106:17285:41358 145 1 9932 0 > 74M 20 59443385 0 > GGGGGGTGGGGGTCAGAGCACCACAGGGGGGGGGGGAAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGAAACC > ########################################################)))))))))))))))*)* > > BD:Z:MMMMSRQPPOOOONNNMMMMMMMMMMMMNNNNNNMMMMMMLLLLKKKKKKKKKKKKJJJJKKKKKKKMKFNNMM > PG:Z:MarkDuplicates RG:Z:some-sample1 > BI:Z:FFFFCA@@AAAABCDEFGHIJJKKLLLLLLLLLMMMMMNNNNMMMMMMMLLLLLLLKKKKKKKKKKKLLHHGFF > MQ:i:96 PQ:i:187 UQ:i:38 XQ:i:0 > ZQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@TUTRVSSUTWVUHB@H@Q > NM:i:70 > MD:Z:0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0T4
This looks correct to me. > My questions are: > > Is the message 'different NM for read' because I picked up a wrong > reference file than the one used initially by bwa aligner? Either that or a bug in the aligner itself. I've seen some very odd calculations of MD/NM from some aligners, eg ones that treat N as A for hash purposes, etc. I don't know how bwa handles N, but it may have also changed over time so you'd need to check the @PG headers to see the release number too. > In my eyes > MD:Z:0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0T4 > means that now the reference contains N's while it did not have them in the > past. Correct. $ samtools faidx hs37d5.fa 1:9932-10005 >1:9932-10005 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNTAACC calmd is claiming all bar the last 4 bases differ and should be N{69}T, with the last AACC matching. > If the @MD tag is correct it means the read should not be > considered as mapped to the chromosome 1 at all, and theoretically > could have a better match elsewhere. So, keeping it badly aligned in > this locus is wrong I think. So should I try to re-align all reads > mentioned on the 'different NM for read' line again? How? Remove > them from BAM, mapping newly while creating a new BAM file, then > merging the BAM files? Sound I would loose some of the optional > columns so the merged BAM file would not have all columns for all > reads. Is that a problem? Again, this depends on the aligner and options used. I'd agree though that it looks like a "courageous" alignment, but possibly it's the best (maybe even correct) alignment that places it the expected distance apart from a uniquely placed mate pair. At least it got a very low mapping quality. Realigning is a little messy. You'd have to come up with a way to filter out the specific read pairs you want to realign (maybe even something as dumb as an egrep if there are just a few), align and sort them, and then merge your filtered and newly aligned files back again. (I don't have time to construct such a command line.) James -- James Bonfield (j...@sanger.ac.uk) | Hora aderat briligi. Nunc et Slythia Tova | Plurima gyrabant gymbolitare vabo; A Staden Package developer: | Et Borogovorum mimzebant undique formae, https://sf.net/projects/staden/ | Momiferique omnes exgrabure Rathi. -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. ------------------------------------------------------------------------------ _______________________________________________ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help