On Tue, Nov 06, 2018 at 10:47:36AM -0500, Will Stokes wrote:
> Sorry I was unclear. I'm referring to data included following the name,
> cigar, sequence, and quality, aka at the end of the bam_t.data block.
> 
> int extra_len = 0;
> 
> int bam_len = numNameBytes + numCigarBytes + numSeqBytes +
> numQualityBytes + extra_len;

Ah yes, this is used in CRAM's bam_construct_seq function purely for
purposes of memory allocation.

The extra data referred to here is the auxiliary tags, either verbatim
ones stored in CRAM or auto-generated ones such as the RG:Z: tag.
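
As a rough illustration of what one such tag looks like in the data block, here is a minimal sketch of a string-valued (type Z) tag such as RG:Z: laid out as bytes: a two-character tag name, the type character 'Z', then the NUL-terminated value. The helper name append_z_tag and its cap parameter are hypothetical, made up for this example; this is not htslib's or the CRAM decoder's own code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not part of htslib): write one Z-type auxiliary tag
 * into buf. Layout: 2-byte tag name, 1-byte type 'Z', value bytes, NUL.
 * Returns the number of bytes written, or 0 if it does not fit in cap.
 */
static size_t append_z_tag(uint8_t *buf, size_t cap,
                           const char tag[2], const char *value)
{
    assert(buf != NULL && tag != NULL && value != NULL);

    size_t vlen = strlen(value);
    size_t need = 2 + 1 + vlen + 1;   /* name + type + value + NUL */
    if (need > cap)
        return 0;

    buf[0] = (uint8_t)tag[0];
    buf[1] = (uint8_t)tag[1];
    buf[2] = 'Z';
    memcpy(buf + 3, value, vlen + 1); /* copies the terminating NUL too */
    return need;
}
```

For example, append_z_tag(aux, sizeof aux, "RG", "grp1") writes 8 bytes: 'R' 'G' 'Z' 'g' 'r' 'p' '1' and a NUL. Those 8 bytes are what extra_len would need to account for if this were the only tag.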

> Note that when encoding the name I *am* using 1-4 null bytes
> to ensure 32-bit alignment for CIGAR data. Perhaps I should pad the
> entire data buffer such that

I wouldn't do that.  Any extra data left over after quality will be
interpreted as auxiliary tags.  If you have none then the bam record
must end immediately after the quality values.

See the BAM table in the SAM specification (section 4.2).
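
To make the sizes concrete, here is a small sketch of the length calculation under the layout described in this thread. The function names are hypothetical and chosen for this example. The name padding follows the 1-4 NUL scheme from the quoted message and is assumed to apply to the in-memory record only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: stored length of a read name of name_len characters
 * when it is terminated with 1-4 NUL bytes so that the CIGAR data which
 * follows starts on a 4-byte boundary.
 */
static size_t padded_qname_len(size_t name_len)
{
    return (name_len + 1 + 3) & ~(size_t)3; /* at least one NUL, round up to 4 */
}

/*
 * Hypothetical helper: total size of the variable-length data block.
 * No padding is added after the qualities: any byte there would be
 * parsed as the start of an auxiliary tag.
 */
static size_t bam_data_len(size_t l_qname, size_t n_cigar,
                           size_t l_seq, size_t l_aux)
{
    assert(l_qname > 0);       /* the name always has at least its NUL */

    return l_qname             /* name including its NUL byte(s) */
         + 4 * n_cigar         /* one 32-bit word per CIGAR operation */
         + (l_seq + 1) / 2     /* bases packed two per byte */
         + l_seq               /* one quality byte per base */
         + l_aux;              /* auxiliary tags; 0 if there are none */
}
```

For a 5-character name, one CIGAR operation, 5 bases and no tags this gives 8 + 4 + 3 + 5 = 20 bytes, and the record ends right there.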

James

-- 
James Bonfield (j...@sanger.ac.uk)
The Sanger Institute, Hinxton, Cambs, CB10 1SA


_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help
