Thanks for the reply, the old mailing list model dies hard :)

Samtools view seems to decode this correctly (with or without the reference
sequence that they sent to me, a MT chromosome, specified with -T) with
that exact CIGAR string, and additionally calculates an MD string...

All letters in the reference they sent me are upper cased in the file I
received, and from their commands they sent it looks like they were
consistent about using the same reference


-Colin

On Thu, May 30, 2019 at 5:08 AM James Bonfield <j...@sanger.ac.uk> wrote:

> Hello Colin,
>
> Apologies for the slow reply.  I don't normally check samtools
> mailing list unless my email client tells me something new arrived,
> but for some reason that's broken.  Anyway...
>
> On Wed, May 22, 2019 at 01:25:59PM -0400, Colin wrote:
> > Before:
> >
> > [ { code: 'X', data: 0, pos: 21, refPos: 198 },
> >   { code: 'X', data: 2, pos: 52, refPos: 229 },
> >   { code: 'X', data: 0, pos: 54, refPos: 231 },
> >   { code: 'X', data: 2, pos: 70, refPos: 247 },
> >   { code: 'X', data: 2, pos: 80, refPos: 257 },
> >   { code: 'X', data: 1, pos: 86, refPos: 263 },
> >   { code: 'I', data: 'CT', pos: 133, refPos: 310 },
> >   { code: 'X', data: 1, pos: 135, refPos: 310 } ]
> >
> > After:
> >
> > [ { code: 'b',
> >
> > data:
> '65,84,84,65,67,65,71,71,67,71,65,65,67,65,84,65,67,84,84,65,65,84,65,65,65,71,84,71,84,71,84,84,65,65,84,84,65,65,84,84,65,65,84,71,67,84,84,71,84,65,71,84,65,65,65,84,65,65,84,65,65,84,65,65,67,65,65,84,84,84,65,65,84,71,84,67,84,71,67,84,67,65,71,67,67,71,67,84,84,84,67,67,65,67,65,67,65,71,65,67,65,84,67,65,84,65,65,67,65,65,65,65,65,65,84,84,84,67,67,65,67,67,65,65,65,67,67,67,67,67,67,67',
> >     pos: 1,
> >     refPos: 178 },
> >   { code: 'I', data: 'CT', pos: 133, refPos: 310 },
> >   { code: 'Q', data: 36, pos: 133, refPos: 308 },
> >   { code: 'Q', data: 36, pos: 134, refPos: 309 },
> >   { code: 'b',
> >     data: '67,67,67,67,67,67,71,67,84,84,67,84,71,71,67',
> >     pos: 135,
> >     refPos: 310 } ]
>
> That's very mysterious.  It appears it's replacing everything that
> isn't the insertion with "b".  Maybe it's become misaligned due to
> some process and rather than create a whole string of SNPs it's
> created "b" features instead?  I'm guessing this must be htsjdk as my
> code doesn't take that approach, even though it may well be a better
> strategy.
>
> Does the sequence in question decode correctly?  Does it have a
> sensible looking CIGAR string? Eg 132M2I15M?
>
> > The header of the file that contained the 'b' features used
> MarkDuplicates
>
> I don't see why MarkDuplicates would be changing alignments in any
> way.  It may however be decoding the record and re-encoding it, which
> could change the decisions as to how to encode something (especially
> if the original file was, say, written by samtools and the subsequent
> one is by picard).
>
> Is the reference in question here uppercase or lowercase?  If I recall
> there were issues with that at one point.
>
> James
>
> --
> James Bonfield (j...@sanger.ac.uk)
> The Sanger Institute, Hinxton, Cambs, CB10 1SA
>
>
> --
>  The Wellcome Sanger Institute is operated by Genome Research
>  Limited, a charity registered in England with number 1021457 and a
>  company registered in England with number 2742969, whose registered
>  office is 215 Euston Road, London, NW1 2BE.
>
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help

Reply via email to