On Thu, Jul 28, 2016 at 10:53:53AM +0200, Martin MOKREJ? wrote:
> $ samtools view dedup.realigned.calmd.bam | grep 
> HWI-xxxxx:xxx:xxxxxxxxx:1:1106:17285:41358
> HWI-xxxxx:xxx:xxxxxxxxx:1:1106:17285:41358      145     1       9932    0     
>   74M     20      59443385        0       
> GGGGGGTGGGGGTCAGAGCACCACAGGGGGGGGGGGAAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGAAACC    
>   ########################################################)))))))))))))))*)*  
>     
> BD:Z:MMMMSRQPPOOOONNNMMMMMMMMMMMMNNNNNNMMMMMMLLLLKKKKKKKKKKKKJJJJKKKKKKKMKFNNMM
>         PG:Z:MarkDuplicates     RG:Z:some-sample1   
> BI:Z:FFFFCA@@AAAABCDEFGHIJJKKLLLLLLLLLMMMMMNNNNMMMMMMMLLLLLLLKKKKKKKKKKKLLHHGFF
>  MQ:i:96 PQ:i:187        UQ:i:38 XQ:i:0  
> ZQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@TUTRVSSUTWVUHB@H@Q
>  NM:i:70 
> MD:Z:0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0T4

This looks correct to me.

> My questions are:
> 
>    Is the message 'different NM for read' because I picked up a wrong 
> reference file than the one used initially by bwa aligner?

Either that or a bug in the aligner itself.  I've seen some very odd
calculations of MD/NM from some aligners, eg ones that treat N as A
for hash purposes, etc.  I don't know how bwa handles N, but it may
have also changed over time so you'd need to check the @PG headers to
see the release number too.

>    In my eyes 
> MD:Z:0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0N0T4
>   means that now the reference contains N's while it did not have them in the 
> past.

Correct.

$ samtools faidx hs37d5.fa 1:9932-10005
>1:9932-10005
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNTAACC

calmd is claiming all bar the last 4 bases differ and should be
N{69}T, with the last AACC matching.

>    If the @MD tag is correct it means the read should not be
> considered as mapped to the chromosome 1 at all, and theoretically
> could have a better match elsewhere. So, keeping it badly aligned in
> this locus is wrong I think. So should I try to re-align all reads
> mentioned on the 'different NM for read' line again? How? Remove
> them from BAM, mapping newly while creating a new BAM file, then
> merging the BAM files? Sound I would loose some of the optional
> columns so the merged BAM file would not have all columns for all
> reads. Is that a problem? 

Again, this depends on the aligner and options used.  I'd agree though
that it looks like a "courageous" alignment, but possibly it's the
best (maybe even correct) alignment that places it the expected
distance apart from a uniquely placed mate pair.

At least it got a very low mapping quality.

Realigning is a little messy.  You'd have to come up with a way to
filter out the specific read pairs you want to realign (maybe even
something as dumb as an egrep if there are just a few), align and sort
them, and then merge your filtered and newly aligned files back again.

(I don't have time to construct such a command line.)

James

-- 
James Bonfield (j...@sanger.ac.uk) | Hora aderat briligi. Nunc et Slythia Tova
                                  | Plurima gyrabant gymbolitare vabo;
  A Staden Package developer:     | Et Borogovorum mimzebant undique formae,
https://sf.net/projects/staden/   | Momiferique omnes exgrabure Rathi. 


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

------------------------------------------------------------------------------
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help

Reply via email to