I forgot to ask an actual question in my post. What I meant to ask is
whether I should be calculating SNPs manually by comparing the reference
sequence against the 'b' sequence, and which part of this MarkDuplicates
pipeline caused the 'b' features to be written (though maybe that is an
htsjdk question).
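
To make the "manual" option concrete, here is a rough sketch of what I have
in mind. It is not cram-js code. The function name is hypothetical, and it
assumes that the feature's data is the comma-separated list of ASCII codes
shown in the dump below, that refPos is a 1-based reference coordinate, and
that the caller has already fetched a reference string (refSeq) whose first
base sits at 1-based coordinate refStart.

```javascript
// Hypothetical helper, not part of cram-js: derive mismatches from a 'b'
// read feature by comparing its bases against the reference.
// Assumptions: feature.data is a comma-separated string of ASCII codes,
// feature.refPos is 1-based, and refSeq starts at 1-based coordinate refStart.
function mismatchesFromBasesFeature(feature, refSeq, refStart) {
  // Turn '65,84,...' into ['A', 'T', ...]
  const bases = feature.data
    .split(',')
    .map((code) => String.fromCharCode(Number(code)))

  const mismatches = []
  bases.forEach((base, i) => {
    const refBase = refSeq[feature.refPos + i - refStart]
    if (refBase === undefined) return // outside the fetched reference window
    if (base.toUpperCase() !== refBase.toUpperCase()) {
      mismatches.push({
        pos: feature.pos + i, // position within the read
        refPos: feature.refPos + i, // position on the reference
        ref: refBase,
        base,
      })
    }
  })
  return mismatches
}

// Toy example with made-up reference bases (not taken from the MT contig).
const toyFeature = { code: 'b', data: '65,84,84,65', pos: 1, refPos: 178 }
console.log(mismatchesFromBasesFeature(toyFeature, 'ATGA', 178))
```

In the toy example the read bases ATTA are compared against ATGA, so the
only mismatch reported is the third base. Whether a viewer should do this
itself, or expect the writer to emit 'X' features, is exactly my question.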

-Colin

On Wed, May 22, 2019 at 1:25 PM Colin <colin.di...@gmail.com> wrote:

> Hello all,
> A user sent us a CRAM file that contains 'b' read features, that is, this
> type of element from the CRAM v3 spec:
> ...
> else if feature_code = 'b' then
>    bases ← ReadItem(BB, Byte[])
>
> I can decode these, but we are working in cram-js/JBrowse, and these reads
> were not showing any SNPs because we do not derive mismatches from 'b'
> features. To make this concrete, here is the same read before and after
> some process replaced its 'X' features with 'b' features:
>
>
> Before:
>
> [ { code: 'X', data: 0, pos: 21, refPos: 198 },
>   { code: 'X', data: 2, pos: 52, refPos: 229 },
>   { code: 'X', data: 0, pos: 54, refPos: 231 },
>   { code: 'X', data: 2, pos: 70, refPos: 247 },
>   { code: 'X', data: 2, pos: 80, refPos: 257 },
>   { code: 'X', data: 1, pos: 86, refPos: 263 },
>   { code: 'I', data: 'CT', pos: 133, refPos: 310 },
>   { code: 'X', data: 1, pos: 135, refPos: 310 } ]
>
> After:
>
> [ { code: 'b',
>     data:
> '65,84,84,65,67,65,71,71,67,71,65,65,67,65,84,65,67,84,84,65,65,84,65,65,65,71,84,71,84,71,84,84,65,65,84,84,65,65,84,84,65,65,84,71,67,84,84,71,84,65,71,84,65,65,65,84,65,65,84,65,65,84,65,65,67,65,65,84,84,84,65,65,84,71,84,67,84,71,67,84,67,65,71,67,67,71,67,84,84,84,67,67,65,67,65,67,65,71,65,67,65,84,67,65,84,65,65,67,65,65,65,65,65,65,84,84,84,67,67,65,67,67,65,65,65,67,67,67,67,67,67,67',
>     pos: 1,
>     refPos: 178 },
>   { code: 'I', data: 'CT', pos: 133, refPos: 310 },
>   { code: 'Q', data: 36, pos: 133, refPos: 308 },
>   { code: 'Q', data: 36, pos: 134, refPos: 309 },
>   { code: 'b',
>     data: '67,67,67,67,67,67,71,67,84,84,67,84,71,71,67',
>     pos: 135,
>     refPos: 310 } ]
>
>
> The header of the file that contained the 'b' features shows that it was
> processed with MarkDuplicates:
>
> @HD     VN:1.5  SO:coordinate
> @PG     ID:bwa  PN:bwa  VN:0.7.13-r1126 CL:bwa mem -R
> @RG\tID:DM_19_0434\tSM:DM_19_0434\tPL:ILLUMINA -M MT.fasta -t 10
> fastq/DM_19_0434/reads/CGEN/raw/R1.fastq.gz
> fastq/DM_19_0434/reads/CGEN/raw/R2.fastq.gz -v 1
> @PG     ID:MarkDuplicates       VN:2.9.0-1-gf5b9f50-SNAPSHOT
>  CL:picard.sam.markduplicates.MarkDuplicates
> INPUT=[fastq/DM_19_0434/GRCh37.p13/alignments/bwa/CGEN/raw_sorted.bam]
> OUTPUT=fastq/DM_19_0434/GRCh37.p13/alignments/bwa/CGEN/raw_sorted_duplicates_removed.bam
> METRICS_FILE=fastq/DM_19_0434/GRCh37.p13/alignments/bwa/CGEN/raw_sorted_duplication_metrics.txt
> REMOVE_DUPLICATES=true    MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000
> MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25
> REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag
> ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES
> PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates
> READ_NAME_REGEX=<optimized capture of last three ':' separated fields as
> numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO
> QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5
> MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
> GA4GH_CLIENT_SECRETS=client_secrets.json     PN:MarkDuplicates
> @SQ     SN:MT   LN:16569
> @RG     ID:DM_19_0434   SM:DM_19_0434   PL:ILLUMINA
>
>
> -Colin
>
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help
