Hello Colin,

Apologies for the slow reply.  I don't normally check samtools
mailing list unless my email client tells me something new arrived,
but for some reason that's broken.  Anyway...

On Wed, May 22, 2019 at 01:25:59PM -0400, Colin wrote:
> Before:
> 
> [ { code: 'X', data: 0, pos: 21, refPos: 198 },
>   { code: 'X', data: 2, pos: 52, refPos: 229 },
>   { code: 'X', data: 0, pos: 54, refPos: 231 },
>   { code: 'X', data: 2, pos: 70, refPos: 247 },
>   { code: 'X', data: 2, pos: 80, refPos: 257 },
>   { code: 'X', data: 1, pos: 86, refPos: 263 },
>   { code: 'I', data: 'CT', pos: 133, refPos: 310 },
>   { code: 'X', data: 1, pos: 135, refPos: 310 } ]
> 
> After:
> 
> [ { code: 'b',
> 
> data:  
> '65,84,84,65,67,65,71,71,67,71,65,65,67,65,84,65,67,84,84,65,65,84,65,65,65,71,84,71,84,71,84,84,65,65,84,84,65,65,84,84,65,65,84,71,67,84,84,71,84,65,71,84,65,65,65,84,65,65,84,65,65,84,65,65,67,65,65,84,84,84,65,65,84,71,84,67,84,71,67,84,67,65,71,67,67,71,67,84,84,84,67,67,65,67,65,67,65,71,65,67,65,84,67,65,84,65,65,67,65,65,65,65,65,65,84,84,84,67,67,65,67,67,65,65,65,67,67,67,67,67,67,67',
>     pos: 1,
>     refPos: 178 },
>   { code: 'I', data: 'CT', pos: 133, refPos: 310 },
>   { code: 'Q', data: 36, pos: 133, refPos: 308 },
>   { code: 'Q', data: 36, pos: 134, refPos: 309 },
>   { code: 'b',
>     data: '67,67,67,67,67,67,71,67,84,84,67,84,71,71,67',
>     pos: 135,
>     refPos: 310 } ]

That's very mysterious.  It appears it's replacing everything that
isn't the insertion with "b".  Maybe it's become misaligned due to
some process and rather than create a whole string of SNPs it's
created "b" features instead?  I'm guessing this must be htsjdk as my
code doesn't take that approach, even though it may well be a better
strategy.

Does the sequence in question decode correctly?  Does it have a
sensible looking CIGAR string? Eg 132M2I15M?

> The header of the file that contained the 'b' features used MarkDuplicates

I don't see why MarkDuplicates would be changing alignments in any
way.  It may however be decoding the record and re-encoding it, which
could change the decisions as to how to encode something (especially
if the original file was, say, written by samtools and the subsequent
one is by picard).

Is the reference in question here uppercase or lowercase?  If I recall
there were issues with that at one point.

James

-- 
James Bonfield (j...@sanger.ac.uk)
The Sanger Institute, Hinxton, Cambs, CB10 1SA


-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help

Reply via email to