On Tue, May 17, 2016 at 09:17:20PM +0000, Lin, Chih-Hsu wrote:
> During the conversion, BAM->CRAM->BAM, using samtools 1.3.1, I found the NM 
> tags were changed. Does anyone have solution to that?

Can you give us a concrete example, please: an alignment record along
with the relevant @SQ line, so we can see what the reference is.

NM and MD change because CRAM doesn't explicitly store these tags
(it could, but that leads to larger files).  Instead, it uses the
reference to compute them on the fly.  However, we have seen a number
of cases where NM/MD in the original BAM file are incorrect due to
bugs in aligners, and these then change after round-tripping through
CRAM.
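As a rough illustration of what computing them on the fly involves,
here is a minimal Python sketch that derives NM and MD from the
reference, the 0-based alignment position, the read sequence and the
CIGAR string, following the SAM specification's definitions of those
tags.  This is not the samtools/htslib code: the function name
compute_nm_md is hypothetical, and it ignores ambiguity codes, '=' in
SEQ and other edge cases a real implementation has to handle.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def compute_nm_md(ref, pos, seq, cigar):
    """Return (NM, MD) for a read aligned at 0-based `pos` in `ref`.

    Hypothetical helper for illustration only; simple edge cases only.
    """
    nm = 0      # edit distance: mismatches + inserted + deleted bases
    md = []     # MD string pieces
    run = 0     # matching bases since the last MD event
    r = pos     # index into the reference
    q = 0       # index into the read sequence
    for length_str, op in CIGAR_RE.findall(cigar):
        n = int(length_str)
        if op in "M=X":
            # Aligned bases: compare read against reference base by base.
            for _ in range(n):
                if ref[r].upper() == seq[q].upper():
                    run += 1
                else:
                    md.append(str(run))
                    md.append(ref[r].upper())  # MD records the reference base
                    run = 0
                    nm += 1
                r += 1
                q += 1
        elif op == "I":
            # Insertions count towards NM but do not appear in MD.
            nm += n
            q += n
        elif op == "D":
            # Deletions appear in MD as ^ followed by the deleted reference bases.
            md.append(str(run))
            md.append("^" + ref[r:r + n].upper())
            run = 0
            nm += n
            r += n
        elif op == "N":
            # Reference skip (e.g. an intron): not an edit.
            r += n
        elif op == "S":
            # Soft clips consume read bases but are excluded from NM/MD.
            q += n
        # H and P consume neither read nor reference.
    md.append(str(run))
    return nm, "".join(md)


print(compute_nm_md("ACGTACGTAC", 0, "ACGAACGTAC", "10M"))
print(compute_nm_md("ACGTACGTAC", 0, "ACGTCGTAC", "3M1I2D5M"))
```

If the NM:i or MD:Z tag stored in a BAM record disagrees with what a
reference-based calculation like this produces, that record's tags are
the kind that will differ after going through CRAM and back.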

That said, are the NM values output by CRAM->BAM the same values
that samtools calmd generates?  If so, then this is fixing your data!
If not, then we possibly have a bug that needs fixing.

Some have asked for a way to store the invalid data in CRAM
regardless.  There is perhaps some (albeit twisted) logic to this, as
it makes validating the data the responsibility of other tools rather
than of the file format itself.  I experimented with this by writing
out all NM/MD tags, but it usually leads to 5-10% growth in file
size.  A better solution would be to check when they differ from the
computed values and only store them then, although that will slow
down CRAM encoding somewhat, so I'm not yet convinced this is a
problem in need of a solution.
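The "store them only when they differ" idea could look something like
the per-record decision below.  This is a hypothetical sketch, not how
any existing CRAM encoder works: decide_tags and TagDecision are
made-up names, and computed_nm/computed_md are assumed to come from
the same reference-based calculation the decoder would perform.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class TagDecision:
    """Which tags a (hypothetical) encoder should store verbatim."""
    store_nm: bool
    store_md: bool


def decide_tags(original: Dict[str, Union[int, str]],
                computed_nm: int,
                computed_md: str) -> TagDecision:
    """Store a tag only if it is present and the decoder could not regenerate it."""
    nm: Optional[Union[int, str]] = original.get("NM")
    md: Optional[Union[int, str]] = original.get("MD")
    return TagDecision(
        store_nm=nm is not None and nm != computed_nm,
        store_md=md is not None and md != computed_md,
    )


# A record whose aligner-supplied NM is wrong but whose MD is correct:
print(decide_tags({"NM": 2, "MD": "3T6"}, computed_nm=1, computed_md="3T6"))
```

Most records would need nothing extra stored, which is what keeps the
size cost low.  The encoding slowdown comes from having to compute
NM/MD for every record just to find the few that disagree.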

James

-- 
James Bonfield (j...@sanger.ac.uk) | Hora aderat briligi. Nunc et Slythia Tova
                                  | Plurima gyrabant gymbolitare vabo;
  A Staden Package developer:     | Et Borogovorum mimzebant undique formae,
https://sf.net/projects/staden/   | Momiferique omnes exgrabure Rathi. 


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help
