Hello Colin,

Apologies for the slow reply. I don't normally check the samtools mailing list unless my email client tells me something new has arrived, but for some reason that's broken. Anyway...

On Wed, May 22, 2019 at 01:25:59PM -0400, Colin wrote:
> Before:
>
> [ { code: 'X', data: 0, pos: 21, refPos: 198 },
>   { code: 'X', data: 2, pos: 52, refPos: 229 },
>   { code: 'X', data: 0, pos: 54, refPos: 231 },
>   { code: 'X', data: 2, pos: 70, refPos: 247 },
>   { code: 'X', data: 2, pos: 80, refPos: 257 },
>   { code: 'X', data: 1, pos: 86, refPos: 263 },
>   { code: 'I', data: 'CT', pos: 133, refPos: 310 },
>   { code: 'X', data: 1, pos: 135, refPos: 310 } ]
>
> After:
>
> [ { code: 'b',
>     data:
>      '65,84,84,65,67,65,71,71,67,71,65,65,67,65,84,65,67,84,84,65,65,84,65,65,65,71,84,71,84,71,84,84,65,65,84,84,65,65,84,84,65,65,84,71,67,84,84,71,84,65,71,84,65,65,65,84,65,65,84,65,65,84,65,65,67,65,65,84,84,84,65,65,84,71,84,67,84,71,67,84,67,65,71,67,67,71,67,84,84,84,67,67,65,67,65,67,65,71,65,67,65,84,67,65,84,65,65,67,65,65,65,65,65,65,84,84,84,67,67,65,67,67,65,65,65,67,67,67,67,67,67,67',
>     pos: 1,
>     refPos: 178 },
>   { code: 'I', data: 'CT', pos: 133, refPos: 310 },
>   { code: 'Q', data: 36, pos: 133, refPos: 308 },
>   { code: 'Q', data: 36, pos: 134, refPos: 309 },
>   { code: 'b',
>     data: '67,67,67,67,67,67,71,67,84,84,67,84,71,71,67',
>     pos: 135,
>     refPos: 310 } ]

That's very mysterious. It appears it's replacing everything that isn't the insertion with "b". Maybe it's become misaligned due to some process and, rather than create a whole string of SNPs, it's created "b" features instead? I'm guessing this must be htsjdk, as my code doesn't take that approach, even though it may well be a better strategy.

Does the sequence in question decode correctly? Does it have a sensible-looking CIGAR string? E.g. 132M2I15M?

> The header of the file that contained the 'b' features used MarkDuplicates

I don't see why MarkDuplicates would be changing alignments in any way.
It may, however, be decoding the record and re-encoding it, which could change the decisions about how to encode something (especially if the original file was, say, written by samtools and the subsequent one by picard).

Is the reference in question here uppercase or lowercase? If I recall correctly, there were issues with that at one point.

James

-- 
James Bonfield (j...@sanger.ac.uk)
The Sanger Institute, Hinxton, Cambs, CB10 1SA

_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help
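
As a quick way to eyeball the 'b' features quoted above, here is a rough Python sketch. It assumes the "data" field is a comma-separated string of ASCII codes, exactly as printed in the dump, and that only 'b', 'I' and 'Q' features appear. The helper names decode_b and rough_cigar are made up for illustration and are not part of htsjdk, samtools or any CRAM library. Treating every 'b' run as "M" is also an assumption, so the result is only a sanity check against the real CIGAR.

```python
def decode_b(data: str) -> str:
    """Turn a string like '67,67,71' (ASCII codes) into bases like 'CCG'."""
    return bytes(int(v) for v in data.split(",")).decode("ascii")


def rough_cigar(features) -> str:
    """Build a rough CIGAR from a feature list (hypothetical helper).

    Assumptions: 'b' (bases) is counted as M, 'I' as an insertion of
    len(data) bases, and 'Q' (a single quality value) adds no bases.
    """
    ops = []
    for f in features:
        if f["code"] == "b":
            ops.append((len(f["data"].split(",")), "M"))
        elif f["code"] == "I":
            ops.append((len(f["data"]), "I"))
        # 'Q' only overrides a quality score, so it is skipped here.
    return "".join(f"{n}{op}" for n, op in ops)


# The last four features from the "After" dump in the quoted message.
tail = [
    {"code": "I", "data": "CT", "pos": 133, "refPos": 310},
    {"code": "Q", "data": 36, "pos": 133, "refPos": 308},
    {"code": "Q", "data": 36, "pos": 134, "refPos": 309},
    {"code": "b", "data": "67,67,67,67,67,67,71,67,84,84,67,84,71,71,67",
     "pos": 135, "refPos": 310},
]

print(decode_b(tail[-1]["data"]))  # CCCCCCGCTTCTGGC
print(rough_cigar(tail))           # 2I15M
```

The first 'b' feature in the dump holds 132 values and starts at pos 1, so prepending it to the list above gives 132M2I15M under the same assumptions.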