A bit late, but I've just noticed by accident where I use the CRAM 'b' (bases) read feature. In scramble / samtools there is an option to do referenceless CRAM encoding, which just stores the sequence as-is. It can be useful for assemblies where the "reference" is ephemeral and it's pointless lodging a copy with a refget server (although in practice we're better off embedding the reference instead).
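To illustrate the difference between the two modes, here is a simplified, hypothetical sketch. It is not scramble's or htslib's actual code. It assumes an ungapped alignment (CIGAR of all M), ignores quality values, and stores substitution bases literally rather than as CRAM's substitution codes. The names `encode_features` and `decode_read` are made up for illustration. With a reference, only the mismatches are recorded; without one, the whole read becomes a single 'b' feature holding the bases verbatim.

```python
from typing import List, Optional, Tuple

# A read feature here is (code, 1-based position within the read, bases).
Feature = Tuple[str, int, str]


def encode_features(read: str, ref: Optional[str], pos: int) -> List[Feature]:
    """Hypothetical encoder for an ungapped read aligned at 1-based `pos`."""
    if ref is None:
        # Referenceless mode: nothing to diff against, so store the
        # whole sequence verbatim as one 'b' (bases) feature.
        return [("b", 1, read)]
    features: List[Feature] = []
    ref_slice = ref[pos - 1 : pos - 1 + len(read)]
    for i, (ref_base, read_base) in enumerate(zip(ref_slice, read), start=1):
        if ref_base != read_base:
            # Simplification: real CRAM stores a substitution code relative
            # to the reference base; here we keep the read base itself.
            features.append(("X", i, read_base))
    return features


def decode_read(features: List[Feature], ref: Optional[str], pos: int, length: int) -> str:
    """Rebuild the read: start from the reference (if any), then apply features."""
    if ref is not None:
        bases = list(ref[pos - 1 : pos - 1 + length])
    else:
        bases = ["N"] * length  # placeholder, fully overwritten by a 'b' feature
    for code, p, payload in features:
        if code in ("X", "b"):
            bases[p - 1 : p - 1 + len(payload)] = list(payload)
    return "".join(bases)
```

For example, the read `CGAA` aligned at position 2 of `ACGTACGTAC` encodes as a single substitution `("X", 3, "A")` with the reference, but as `("b", 1, "CGAA")` without it. Both decode back to `CGAA`, which is why a decoder can still recover the sequence from a referenceless file.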
Example command line: "scramble -x in.sam out.cram". I don't know how this relates to Mark Duplicates, but it's possible a similar cause triggers it there. Perhaps losing track of the reference *may* cause it to output in referenceless mode? Does htsjdk support that?

James

On Fri, May 31, 2019 at 12:14:11AM -0400, Colin wrote:
> Thanks for the reply, the old mailing list model dies hard :)
>
> Samtools view seems to decode this correctly (with or without the reference
> sequence that they sent to me, an MT chromosome, specified with -T) with
> that exact CIGAR string, and additionally calculates an MD string...
>
> All letters in the reference they sent me are upper cased in the file I
> received, and from the commands they sent it looks like they were
> consistent about using the same reference

-- 
James Bonfield (j...@sanger.ac.uk) The Sanger Institute, Hinxton, Cambs, CB10 1SA

_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help