Thanks for your super helpful responses.

> -I'm not writing any "extra bytes" after the read quality, aka
> "extra_len".
> > What is this region used for?
>
> Do you mean extranul?  This is used to pad out read names to end on a
> multiple of 4 bytes in the in-memory address, so that the 32-bit cigar
> fields are correctly aligned in memory.  (Failure to do this can cause
> some SIMD compiler optimisations to cause crashes.)  We could consider
> this to be a BAM design flaw as swapping the read name and cigar field
> would make this hack unnecessary.
>
> It's purely an in-memory trick though and when written to disk the
> layout is as per the specification.  Provided your in-memory layout
> and extranul usage are internally consistent you should be fine for
> writing.
>

Sorry, I was unclear. I'm referring to data that follows the name,
CIGAR, sequence, and quality, i.e. at the end of the bam1_t.data block.

int extra_len = 0;
int bam_len = numNameBytes + numCigarBytes + numSeqBytes +
              numQualityBytes + extra_len;

bam1_t* b = (bam1_t*) calloc( 1, sizeof(bam1_t) );
b->data   = (uint8_t*) calloc( 1, bam_len );  /* allocate the data block */
b->l_data = bam_len;
b->m_data = bam_len;

uint8_t *dest = b->data;
encodeName( name, qname_nuls,   dest );
encodeOperations( operations,   dest );
encodeSequence( sequence,       dest );
encodeQuality( quality, seqLen, dest );


As you can see I'm encoding the name, operations (CIGAR), sequence,
and quality (if provided), but no bytes following that data.
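
For reference, this is roughly how I'm sizing each region. It is only a
sketch: name_bytes and record_data_bytes are names made up for this mail,
not htslib functions. It assumes the in-memory name carries 1-4 trailing
NULs, the CIGAR takes 4 bytes per operation, the sequence is packed two
bases per byte, and the quality takes one byte per base.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: bytes used by the read name in memory, i.e. the
 * name plus 1-4 NUL bytes so that the region ends on a multiple of 4. */
static size_t name_bytes(const char *name)
{
    size_t len  = strlen(name);
    size_t nuls = 4 - (len % 4);        /* always between 1 and 4 */
    return len + nuls;
}

/* Hypothetical helper: total bytes for name + CIGAR + sequence + quality,
 * with nothing following the quality values. */
static size_t record_data_bytes(const char *name, uint32_t n_cigar,
                                uint32_t seq_len)
{
    size_t cigar = (size_t)n_cigar * 4;        /* one uint32_t per operation */
    size_t seq   = ((size_t)seq_len + 1) / 2;  /* 4 bits per base, rounded up */
    size_t qual  = seq_len;                    /* one byte per base */
    return name_bytes(name) + cigar + seq + qual;
}
```

With a 5-character name, one CIGAR operation, and 5 bases this gives
8 + 4 + 3 + 5 = 20 bytes. The total is not a multiple of 4 in general,
because the sequence and quality regions are not padded.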

Note that when encoding the name I *am* writing 1-4 null bytes to give
the CIGAR data 32-bit alignment. Perhaps I should also pad the
entire data buffer such that


bam_len % 4 == 0?


That is,


int extra_len = (4 - bam_len % 4) % 4;

bam_len += extra_len;
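
To spell out the arithmetic I have in mind, here is a small sketch.
pad_to_4 is a name made up for this mail, not part of htslib, and whether
such trailing padding is wanted at all is exactly my question. The point
is only that the padding is the distance up to the next multiple of 4,
which is not the same as the remainder len % 4.

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes (0-3) that must be appended to
 * round len up to the next multiple of 4. */
static size_t pad_to_4(size_t len)
{
    return (4 - len % 4) % 4;
}
```

For a 9-byte buffer the remainder is 1, but 3 bytes are needed to reach
12. A buffer whose length is already a multiple of 4 gets no padding.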



-Will
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help
