On Fri, Oct 19, 2018 at 04:30:33PM -0600, Brent Pedersen wrote:
> I want to store a minimal representation of an alignment (more minimal
> than CRAM).
> I thought I could save the position, cigar, and MD and be able to
> reconstruct the read sequence, but the MD (IIUC) allows reconstructing
> the reference from the read.

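To reconstruct the read sequence from the reference you'll need an
inverse analogue of MD, as you say: MD records the reference bases at
mismatches and deletions, not the read bases.  Nothing like that is
available in htslib at the moment.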

> Is there something exposed in the current htslib that would facilitate
> this? E.g. a tag that's sort of the inverse of MD? I know this must be
> used internally for CRAM.

Internally CRAM has a number of data series to encode differences
between a sequence and a reference.  These differences are called
"features" and loosely correspond to CIGAR operations.

FN (number of features)
FP (position of feature; a delta to the last feature)
FC (feature code; type of the edit)

Then each feature has its own associated data series.  E.g. FC "X" is
a substitution, which stores the edit in the BS (base substitution)
data series; FC "D" (deletion) stores the deletion length (but not the
bases) in DL; and FC "I" (insertion) stores the inserted bases in IN.
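
To make the feature idea concrete, here is a rough Python sketch of how
a read could be rebuilt from the reference plus a feature list.  It is
not htslib or CRAM code.  The function name decode_read and the tuple
layout (delta, code, data) are hypothetical, positions are taken as
1-based read positions delta-coded against the previous feature, and
the substituted base is stored directly rather than as a BS
substitution code.  Only "X", "I" and "D" are handled.

```python
def decode_read(ref, start, read_len, features):
    """Rebuild a read from the reference and a list of edit features.

    ref       reference sequence as a string (assumed fully in memory)
    start     0-based reference position of the first aligned base
    read_len  length of the read to produce
    features  list of (delta, code, data); delta is the 1-based read
              position of the feature minus that of the previous one
    """
    out = []
    ref_pos = start   # next reference base to consume
    read_pos = 0      # number of read bases produced so far
    feat_pos = 0      # 1-based read position of the last feature

    for delta, code, data in features:
        feat_pos += delta
        # Copy the matching reference bases that precede this feature.
        n = feat_pos - 1 - read_pos
        out.append(ref[ref_pos:ref_pos + n])
        ref_pos += n
        read_pos += n

        if code == "X":      # substitution: data is the read base
            out.append(data)
            ref_pos += 1
            read_pos += 1
        elif code == "I":    # insertion: data is the inserted bases
            out.append(data)
            read_pos += len(data)
        elif code == "D":    # deletion: data is the deleted length
            ref_pos += data
        else:
            raise ValueError("unsupported feature code: %r" % code)

    # Whatever remains of the read matches the reference.
    n = read_len - read_pos
    out.append(ref[ref_pos:ref_pos + n])
    return "".join(out)


ref = "ACGTACGTAC"
features = [(3, "X", "T"), (2, "I", "GG"), (3, "D", 2)]
print(decode_read(ref, 0, 9, features))
```

With this toy reference the three features describe a G>T substitution
at read position 3, a 2-base insertion at position 5 and a 2-base
deletion before position 8.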

This is a fairly distributed way of doing things compared with a single
aux tag like MD, so we don't have any code that can easily be reused
for this.
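
If you wanted to roll your own, something along these lines might do as
a starting point.  This is only a sketch in Python, not anything that
exists in htslib: encode_features and its (delta, code, data) tuples
are made-up names, it assumes the reference is available as a string,
and it only understands the M, =, X, I and D CIGAR operations (no
clips, N or P).  The resulting list, plus the position and read length,
is enough to rebuild the read given the reference.

```python
import re


def encode_features(ref, start, cigar, seq):
    """Derive a list of read-vs-reference edits from an alignment.

    ref    reference sequence as a string
    start  0-based reference position of the first aligned base
    cigar  CIGAR string limited to M, =, X, I and D operations
    seq    read sequence
    Returns a list of (delta, code, data) where delta is the 1-based
    read position of the edit minus that of the previous edit.
    """
    features = []
    ref_pos = start
    read_pos = 0   # 0-based index into seq
    last = 0       # 1-based read position of the previous feature

    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op in "M=X":
            # Walk the aligned block and record only the mismatches.
            for _ in range(length):
                if seq[read_pos] != ref[ref_pos]:
                    features.append((read_pos + 1 - last, "X", seq[read_pos]))
                    last = read_pos + 1
                read_pos += 1
                ref_pos += 1
        elif op == "I":
            # Inserted bases are absent from the reference, so keep them.
            features.append((read_pos + 1 - last, "I",
                             seq[read_pos:read_pos + length]))
            last = read_pos + 1
            read_pos += length
        elif op == "D":
            # Deleted bases can be recovered from the reference later,
            # so only the length is stored.
            features.append((read_pos + 1 - last, "D", length))
            last = read_pos + 1
            ref_pos += length
        else:
            raise ValueError("unsupported CIGAR operation: " + op)
    return features


print(encode_features("ACGTACGTAC", 0, "4M2I1M2D2M", "ACTTGGATA"))
```

Flattening the list into one string would give you something close to
a single aux tag, at the cost of inventing your own format.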

James

-- 
James Bonfield (j...@sanger.ac.uk)
The Sanger Institute, Hinxton, Cambs, CB10 1SA

