Hi James, Martin, As James indicates BWA doesn't actually report the the correct MD tags as it changes all ambiguity and N positions to ACGT (by some reproducible method).
I had requested addition of functionality to biobambam(2) to recalculate the MD if the read spanned any ambiguity/N positions: bamsort -h This is biobambam2 version 2.0.50. biobambam2 is distributed under version 3 of the GNU General Public License. ... calmdnm=<[0]> : calculate MD and NM aux fields (for coordinate sorted output only) calmdnmreference=<[]> : reference for calculating MD and NM aux fields (calmdnm=1 only) calmdnmrecompindetonly=<[0]> : only recalculate MD and NM in the presence of indeterminate bases (calmdnm=1 only) calmdnmwarnchange=<[0]> : warn when changing existing MD/NM fields (calmdnm=1 only) This gives you control over the warnings if that's what you specifically care about. Regards, Keiran Raine Principal Bioinformatician Cancer Genome Project Wellcome Trust Sanger Institute k...@sanger.ac.uk Tel:+44 (0)1223 834244 Ext: 7703 Office: H104 > On 2 Aug 2016, at 12:09, James Bonfield <j...@sanger.ac.uk> wrote: > > On Tue, Aug 02, 2016 at 11:57:52AM +0200, Martin MOKREJ? wrote: >> Could samtools calmd apply the following logic for bwa-processed input? >> Get positions of all N's in the read. >> Do not complain about those positions which are based on N's. >> Do report other positions. > > I don't entirely understand what you're wanting. > > Calmd computes the MD:Z tag as described in the SAM specification, > which is difference between the sequence and the reference. It should > not be changed to some other algorithm, no matter what aligners happen > to do. If aligners produce different MD tags then they are buggy. > > Similarly for NM:i. > >> OK, I know "samtools calmd" reports a total sum of the differences before >> and after but quite likely there will be only a "0 -> 0" to be reported. So >> I will get less warning on the screen, which is always good. ;) > > The warnings are there because it is correcting the aligner output. > The proper fix is to fix the aligners to produce the correct MD in the > first place, not to break calmd to be buggy in the same manner. > > James > > -- > James Bonfield (j...@sanger.ac.uk) | Hora aderat briligi. Nunc et Slythia Tova > | Plurima gyrabant gymbolitare vabo; > A Staden Package developer: | Et Borogovorum mimzebant undique formae, > https://sf.net/projects/staden/ | Momiferique omnes exgrabure Rathi. > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > > ------------------------------------------------------------------------------ > _______________________________________________ > Samtools-help mailing list > Samtools-help@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/samtools-help -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
------------------------------------------------------------------------------
_______________________________________________ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help