Hello Will,

On Mon, Oct 29, 2018 at 04:25:38PM -0400, Will Stokes wrote:
> Hello, I have been experimenting with using htslib to read SAM / BAM
> content which has gone well. I'd now like to be able to write SAM / BAM
> content de Novo. I see two options:
> 
> 1) write SAM content myself, then use samtools or htslib to convert from
> SAM to BAM format

That would work, but is likely slow.  Writing SAM efficiently isn't
easy as the general C/C++ functions tend to be rather inefficient at
formatting numbers.

> 2) write BAM content directly using htslib.
> 
> For example, creating the header seems relatively straightforward:
> 
>   header = bam_hdr_init();
> 
>   header->n_targets = count;
> 
>   header->target_len = (uint32_t*)calloc( count, sizeof(uint32_t) );
> 
>   for(int i=0; i < count; i++)
> 
> {
> 
>     header->target_len[i]  = nameLens[i].length();
> 
>     header->target_name[i] = strdup( names[i] );
> 
> }
> 
> 
> I don't think I need to populate header->text and header->l_text,
> correct me if I'm wrong.

I'm not 100% sure about headers.  I think it's permissible to simply
have a BAM file containing the arrays of seq name/length, but this
doesn't leave any room for useful meta data such as read-groups.  It's
also mandatory to have @SQ M5 headers if you're writing CRAM.

Basically I think it'll work without a header, but it's strongly
recommended to create one.


> Writing looks straight forward as well:
> 
> 
> samFile* file = sam_open( tempFile, "w" );
> 
> sam_hdr_write( file, header );
> 
> 
> I'm assuming the filename extension is used by sam_open to determine if it
> should write out using SAM or BAM encoding. Please confirm.

The mode string handles this, so "w" for sam, "wb" for bam and "wc"
for cram.

There's also sam_open_mode() which takes format as a string (sam, bam,
cram), or look for the suffix if no format string was given.  It then
calls sam_open with the appropriate mode string.

> However, in order to write alignments...
> 
> for( auto alignment : alignments )
> 
> {  sam_write1( file, header, alignment ); }
> 
> 
> I'll first need a list of bam1_t objects. I'm seeing guidance for constucting
> 
> bam1_t and the related bam1_core_t structures. I have been poking
> around in the test folders for both samtools and htslib and must have
> overlooked such an example.
> 
> Are there any examples of doing this?

This interface is sadly lacking!  We really need a public facing way
of doing this.

The in-memory bam1_t / bam1_core_t structures are laid out in the
uncompressed BAM format as defined in the SAM spec (barring the odd
bit of memory alignment).  It's tedious to do by hand, but not that
hard.

There are htslib internal functions for this though.  The main
sam_parse1 function constructs a bam1_t structure, but it's optimised
for speed, not readability, so it is quite hard to follow.

The CRAM implementation added a "bam_construct_seq" function which
takes a series of strings and integers to generate a bam1_t struct.  I
think this is an internal function, but you can look at the code at
least as a demonstration for how to do it.  See
htslib/cram/cram_samtools.c.

A note of confusion-avoidance, bam_seq_t is the same as bam1_t.  It's
a typedef simply due to the ancestry of this code - the CRAM code
started off as part of Staden io_lib, which had a different type
name.  I really should go through and rename them all, but it's not
externally visible so has never been high priority.

For example usage of this, check sam_read1() in sam.c and follow the
chain of calling via cram_get_bam_seq -> cram_to_bam ->
bam_construct_seq.

James

-- 
James Bonfield (j...@sanger.ac.uk)
The Sanger Institute, Hinxton, Cambs, CB10 1SA


-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help

Reply via email to