Hi James, Martin,

As James indicates BWA doesn't actually report the the correct MD tags as it 
changes all ambiguity and N positions to ACGT (by some reproducible method).

I had requested addition of functionality to biobambam(2) to recalculate the MD 
if the read spanned any ambiguity/N positions:

bamsort -h
This is biobambam2 version 2.0.50.
biobambam2 is distributed under version 3 of the GNU General Public License.
...
calmdnm=<[0]>                 : calculate MD and NM aux fields (for coordinate 
sorted output only)
calmdnmreference=<[]>         : reference for calculating MD and NM aux fields 
(calmdnm=1 only)
calmdnmrecompindetonly=<[0]>  : only recalculate MD and NM in the presence of 
indeterminate bases (calmdnm=1 only)
calmdnmwarnchange=<[0]>       : warn when changing existing MD/NM fields 
(calmdnm=1 only)

This gives you control over the warnings if that's what you specifically care 
about.

Regards,

Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute

k...@sanger.ac.uk
Tel:+44 (0)1223 834244 Ext: 7703
Office: H104

> On 2 Aug 2016, at 12:09, James Bonfield <j...@sanger.ac.uk> wrote:
> 
> On Tue, Aug 02, 2016 at 11:57:52AM +0200, Martin MOKREJ? wrote:
>> Could samtools calmd apply the following logic for bwa-processed input?
>> Get positions of all N's in the read.
>> Do not complain about those positions which are based on N's.
>> Do report other positions.
> 
> I don't entirely understand what you're wanting.
> 
> Calmd computes the MD:Z tag as described in the SAM specification,
> which is difference between the sequence and the reference.  It should
> not be changed to some other algorithm, no matter what aligners happen
> to do.  If aligners produce different MD tags then they are buggy.
> 
> Similarly for NM:i.
> 
>> OK, I know "samtools calmd" reports a total sum of the differences before 
>> and after but quite likely there will be only a "0 -> 0" to be reported. So 
>> I will get less warning on the screen, which is always good. ;)
> 
> The warnings are there because it is correcting the aligner output.
> The proper fix is to fix the aligners to produce the correct MD in the
> first place, not to break calmd to be buggy in the same manner.
> 
> James
> 
> -- 
> James Bonfield (j...@sanger.ac.uk) | Hora aderat briligi. Nunc et Slythia Tova
>                                  | Plurima gyrabant gymbolitare vabo;
>  A Staden Package developer:     | Et Borogovorum mimzebant undique formae,
> https://sf.net/projects/staden/   | Momiferique omnes exgrabure Rathi. 
> 
> 
> -- 
> The Wellcome Trust Sanger Institute is operated by Genome Research 
> Limited, a charity registered in England with number 1021457 and a 
> company registered in England with number 2742969, whose registered 
> office is 215 Euston Road, London, NW1 2BE. 
> 
> ------------------------------------------------------------------------------
> _______________________________________________
> Samtools-help mailing list
> Samtools-help@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/samtools-help




-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


------------------------------------------------------------------------------
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help

Reply via email to