Thanks for your super helpful responses.

> -I'm not writing any "extra bytes" after the read quality, aka
> "extra_len".
>
> What is this region used for?
>
> Do you mean extranul?  This is used to pad out read names to end on a
> multiple of 4 bytes in the in-memory address, so that the 32-bit cigar
> fields are correctly aligned in memory.  (Failure to do this can cause
> some SIMD compiler optimisations to cause crashes.)  We could consider
> this to be a BAM design flaw as swapping the read name and cigar field
> would make this hack unnecessary.
>
> It's purely an in-memory trick though and when written to disk the
> layout is as per the specification.  Provided your in-memory layout
> and extranul usage are internally consistent you should be fine for
> writing.
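For reference, here is roughly how I understand the name padding described in the quoted reply. This is only a minimal sketch: the helper names are hypothetical (they are not part of htslib), and it assumes the data buffer itself starts on a 4-byte-aligned address.

```c
#include <stddef.h>

/* Hypothetical helper (not an htslib function): number of NUL bytes, 1 to 4,
 * written after a read name of name_len characters so that the name block
 * ends on a multiple of 4 bytes. One NUL is the normal string terminator;
 * the remaining 0 to 3 are the extra padding NULs. */
static int qname_nuls_for(size_t name_len)
{
    return 4 - (int)(name_len % 4);
}

/* Hypothetical helper: total in-memory size of the padded name block. */
static size_t padded_qname_len(size_t name_len)
{
    return name_len + (size_t)qname_nuls_for(name_len);
}
```

With these, a 5-character name such as "read1" gets 3 NULs and occupies 8 bytes, while a 4-character name gets 4 NULs and also occupies 8 bytes, so the CIGAR data that follows starts on a 4-byte boundary in both cases.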
Sorry I was unclear.  I'm referring to data that follows the name, CIGAR,
sequence, and quality, i.e. at the end of the bam1_t.data block.

    int extra_len = 0;
    int bam_len = numNameBytes + numCigarBytes + numSeqBytes + numQualityBytes + extra_len;
    bam1_t* b = (bam1_t*) calloc( 1, sizeof(bam1_t) );
    uint8_t *dest = b->data;
    encodeName( name, qname_nuls, dest );
    encodeOperations( operations, dest );
    encodeSequence( sequence, dest );
    encodeQuality( quality, seqLen, dest );

As you can see, I'm encoding the name, operations (CIGAR), sequence, and
quality (if provided), but no bytes following that data.  Note that when
encoding the name I *am* using 1-4 null bytes to ensure 32-bit alignment
for the CIGAR data.

Perhaps I should pad the entire data buffer such that bam_len % 4 == 0?
I.e.

    int extra_len = (4 - bam_len % 4) % 4;  /* bytes needed to reach the next multiple of 4 */
    bam_len += extra_len;

-Will
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help