Thanks, I documented this in my code. It looks like I am running into a
problem reading back BAM content I have written (reading SAM content isn't
a problem). Attached are a pair of BAM and SAM files I generated the same
way; the only difference is that I passed "wb" instead of "w" to sam_open
in order to write the BAM file. When I read the BAM file back in, it fails
and I see the following message printed:

[W::bam_hdr_read] EOF marker is absent. The input is probably truncated


To write the SAM/BAM files I make the following calls, in this order:

sam_open
sam_hdr_write
sam_write1
sam_close


This is with the most recent release of htslib. When writing a BAM file, I
find the following immediately after calling sam_open (outputFile is a
samFile*):

outputFile->format.compression = 2
outputFile->format.compression_level = -1

From reading the SAM/BAM specification, it sounds like I'm missing the
28-byte EOF marker at the end of my file. Am I expected to write those
1f 8b .. 00 00 bytes manually myself, was htslib supposed to do that for
me, or is there some htslib function I have failed to call?

Note that reading the header of this file works, but sam_read1 fails, so I
cannot parse the individual reads.
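One way to confirm whether a written file actually ends with the marker is to compare its last 28 bytes with the empty BGZF block defined in the BGZF section of the SAM specification. The sketch below is plain C with no htslib dependency; `has_bgzf_eof` is a hypothetical helper name, not an htslib function. For what it's worth, htslib's BGZF writer appends this block itself in bgzf_close, which sam_close calls for BAM output, so a missing marker more likely means the close was never reached or returned an error than that the caller must write the bytes by hand.

```c
#include <stdio.h>
#include <string.h>

/* The 28-byte empty BGZF block used as the BAM end-of-file marker. */
static const unsigned char BGZF_EOF[28] = {
    0x1f, 0x8b, 0x08, 0x04, 0x00, 0x00, 0x00, 0x00,
    0x00, 0xff, 0x06, 0x00, 0x42, 0x43, 0x02, 0x00,
    0x1b, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00
};

/* Hypothetical helper: returns 1 if the file at `path` ends with the BGZF
   EOF marker, 0 if it does not, and -1 if the file cannot be read. */
int has_bgzf_eof(const char *path) {
    unsigned char tail[sizeof BGZF_EOF];
    long size;
    int result = -1;
    FILE *fp = fopen(path, "rb");
    if (!fp) return -1;

    /* Find the file size; files shorter than the marker cannot contain it. */
    if (fseek(fp, 0, SEEK_END) != 0 || (size = ftell(fp)) < 0) goto done;
    if (size < (long) sizeof BGZF_EOF) { result = 0; goto done; }

    /* Read the final 28 bytes and compare them with the marker. */
    if (fseek(fp, -(long) sizeof BGZF_EOF, SEEK_END) != 0) goto done;
    if (fread(tail, 1, sizeof tail, fp) != sizeof tail) goto done;
    result = memcmp(tail, BGZF_EOF, sizeof BGZF_EOF) == 0;

done:
    fclose(fp);
    return result;
}
```

htslib performs an equivalent check (bgzf_check_EOF) when it opens a BGZF file, which is where the "EOF marker is absent" warning comes from; checking the tail directly just lets you test the writer's output without going through the reader.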

Will


On Wed, Nov 7, 2018 at 6:28 AM James Bonfield <j...@sanger.ac.uk> wrote:

> On Tue, Nov 06, 2018 at 10:47:36AM -0500, Will Stokes wrote:
> > Sorry I was unclear. I'm referring to data included after the name,
> > CIGAR, sequence, and quality, i.e. at the end of the bam1_t.data block.
> >
> > int extra_len = 0;
> >
> > int bam_len = numNameBytes + numCigarBytes + numSeqBytes +
> > numQualityBytes + extra_len;
>
> Ah yes, this is used in CRAM's bam_construct_seq function purely for
> purposes of memory allocation.
>
> The extra data referred to here is the auxiliary tags, either verbatim
> ones stored in CRAM or auto-generated ones such as the RG:Z: tag.
>
> > Note that when encoding the name I *am* using 1-4 null bytes to keep
> > the CIGAR data 32-bit aligned. Perhaps I should pad the entire data
> > buffer such that
>
> I wouldn't do that.  Any extra data left over after quality will be
> interpreted as auxiliary tags.  If you have none then the bam record
> must end immediately after the quality values.
>
> See the BAM table in the SAM specification (section 4.2).
>
> James
>
> --
> James Bonfield (j...@sanger.ac.uk)
> The Sanger Institute, Hinxton, Cambs, CB10 1SA
>
> _______________________________________________
> Samtools-help mailing list
> Samtools-help@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/samtools-help
>
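James's point that the record must end immediately after the quality values (when there are no auxiliary tags) comes down to simple arithmetic over the variable-length fields in the BAM record table. The sketch below computes that size. The name padding (1-4 NUL bytes so the CIGAR starts on a 4-byte boundary) mirrors my understanding of what htslib does internally, recording the extra NULs in bam1_core_t.l_extranul in recent versions; treat that detail, and the helper names `padded_qname_len` and `bam_data_len`, as illustrative assumptions.

```c
#include <stddef.h>

/* Hypothetical helper: bytes used by the NUL-terminated read name once it
   is padded so that the following CIGAR array is 4-byte aligned. */
size_t padded_qname_len(size_t name_len) {
    size_t l_qname = name_len + 1;              /* name plus mandatory NUL */
    size_t extranul = (4 - (l_qname & 3)) & 3;  /* extra NULs up to a multiple of 4 */
    return l_qname + extranul;
}

/* Hypothetical helper: size of the variable-length part of a BAM record. */
size_t bam_data_len(size_t name_len, size_t n_cigar_op,
                    size_t l_seq, size_t l_aux) {
    return padded_qname_len(name_len)
         + 4 * n_cigar_op    /* each CIGAR operation is a uint32_t */
         + (l_seq + 1) / 2   /* two 4-bit encoded bases per byte */
         + l_seq             /* one quality byte per base */
         + l_aux;            /* auxiliary tags; 0 when there are none */
}
```

For a read named "read1" (5 characters) with one CIGAR operation, 4 bases and no tags, this gives 8 + 4 + 2 + 4 = 18 bytes. Any byte written beyond that total would be parsed as the start of an auxiliary tag, which is why padding the whole buffer is a bad idea.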

Attachment: simple.bam
Description: application/dna

Attachment: simple.sam
Description: Binary data
