On Tue, Nov 06, 2018 at 10:47:36AM -0500, Will Stokes wrote:
> Sorry I was unclear. I'm referring to data included following the name,
> cigar, sequence, and quality, aka at the end of the bam_t.data block.
>
> int extra_len = 0;
>
> int bam_len = numNameBytes + numCigarBytes + numSeqBytes +
>               numQualityBytes + extra_len;
Ah yes, this is used in CRAM's bam_construct_seq function purely for memory allocation. The extra data referred to here is the auxiliary tags, either verbatim ones stored in CRAM or auto-generated ones such as the RG:Z: tag.

> Note that when encoding the name I *am* ensuring I use 1-4 null bytes
> to ensure 32-bit alignment for CIGAR data. Perhaps I should pad the
> entire data buffer such that I wouldn't do that.

Any extra data left over after the quality values will be interpreted as auxiliary tags. If you have none, then the BAM record must end immediately after the quality values. See the BAM table in the SAM specification (section 4.2).

James

--
James Bonfield (j...@sanger.ac.uk)
The Sanger Institute, Hinxton, Cambs, CB10 1SA
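
To make the layout discussed above concrete, here is a minimal sketch of the size arithmetic, assuming the name is stored with 1-4 NUL bytes as described in the quoted message. The helper names (padded_name_len, fixed_data_len, aux_len) are hypothetical and are not htslib functions; the per-field sizes follow the BAM record table in the SAM specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; these are not htslib functions. */

/* Name bytes including 1-4 trailing NULs, rounded up to a multiple of 4
 * so that the CIGAR data that follows is 32-bit aligned. */
static size_t padded_name_len(size_t name_len)
{
    return (name_len + 1 + 3) & ~(size_t)3;
}

/* Bytes used by name, CIGAR, sequence and quality. */
static size_t fixed_data_len(size_t name_bytes, uint32_t n_cigar, uint32_t l_seq)
{
    size_t cigar_bytes = (size_t)n_cigar * 4;      /* one 32-bit word per CIGAR op */
    size_t seq_bytes   = ((size_t)l_seq + 1) / 2;  /* 4 bits per base, rounded up */
    size_t qual_bytes  = l_seq;                    /* 1 byte per base */
    return name_bytes + cigar_bytes + seq_bytes + qual_bytes;
}

/* Whatever remains after the quality values is read as auxiliary tags. */
static size_t aux_len(size_t data_len, size_t name_bytes,
                      uint32_t n_cigar, uint32_t l_seq)
{
    size_t fixed = fixed_data_len(name_bytes, n_cigar, l_seq);
    assert(data_len >= fixed);  /* a shorter block would be a truncated record */
    return data_len - fixed;
}
```

If the data length equals fixed_data_len(), the record carries no auxiliary tags. Any padding appended after the quality values would show up in aux_len() and be parsed as tag data, which is why the record should stop right after quality when there are no tags.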