Hello all,
A user sent us a CRAM file that contains 'b' readFeatures. By that I mean
this type of element from the CRAM v3 spec:
...
else if feature_code = 'b' then
   bases ← ReadItem(BB, Byte[])

I can decode these, but we are working in cram-js/JBrowse, and these reads
were not showing any SNPs, because we don't check the bases in 'b' features
for mismatches against the reference. To be a little clearer, here is the
same read before and after some process replaced its features with 'b'
features:


Before:

[ { code: 'X', data: 0, pos: 21, refPos: 198 },
  { code: 'X', data: 2, pos: 52, refPos: 229 },
  { code: 'X', data: 0, pos: 54, refPos: 231 },
  { code: 'X', data: 2, pos: 70, refPos: 247 },
  { code: 'X', data: 2, pos: 80, refPos: 257 },
  { code: 'X', data: 1, pos: 86, refPos: 263 },
  { code: 'I', data: 'CT', pos: 133, refPos: 310 },
  { code: 'X', data: 1, pos: 135, refPos: 310 } ]

After:

[ { code: 'b',
    data: '65,84,84,65,67,65,71,71,67,71,65,65,67,65,84,65,67,84,84,65,65,84,65,65,65,71,84,71,84,71,84,84,65,65,84,84,65,65,84,84,65,65,84,71,67,84,84,71,84,65,71,84,65,65,65,84,65,65,84,65,65,84,65,65,67,65,65,84,84,84,65,65,84,71,84,67,84,71,67,84,67,65,71,67,67,71,67,84,84,84,67,67,65,67,65,67,65,71,65,67,65,84,67,65,84,65,65,67,65,65,65,65,65,65,84,84,84,67,67,65,67,67,65,65,65,67,67,67,67,67,67,67',
    pos: 1,
    refPos: 178 },
  { code: 'I', data: 'CT', pos: 133, refPos: 310 },
  { code: 'Q', data: 36, pos: 133, refPos: 308 },
  { code: 'Q', data: 36, pos: 134, refPos: 309 },
  { code: 'b',
    data: '67,67,67,67,67,67,71,67,84,84,67,84,71,71,67',
    pos: 135,
    refPos: 310 } ]
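
For illustration, here is a rough sketch of how mismatches could be recovered
from a 'b' feature by comparing its bases with the reference. This is not
what cram-js currently does. The function names, the refSeq/refStart
arguments, and the reference string in the usage example are all made up for
this sketch, and I am assuming refSeq is given in the same coordinate system
as refPos. It reports the read base rather than an 'X' substitution code.

```javascript
// Decode the comma-separated ASCII codes shown in the dump above
// into a string of bases, e.g. '67,67,71' -> 'CCG'.
function decodeBases(data) {
  return data
    .split(',')
    .map(code => String.fromCharCode(Number(code)))
    .join('')
}

// Compare the bases of one 'b' feature with the reference.
// refSeq:   reference bases supplied by the caller (hypothetical input)
// refStart: coordinate of refSeq[0], assumed to use the same
//           coordinate system as feature.refPos
function mismatchesFromBasesFeature(feature, refSeq, refStart) {
  const bases = decodeBases(feature.data)
  const mismatches = []
  for (let i = 0; i < bases.length; i++) {
    const refBase = refSeq[feature.refPos + i - refStart]
    if (refBase === undefined) {
      continue // no reference available at this position
    }
    const readBase = bases[i]
    if (readBase.toUpperCase() !== refBase.toUpperCase()) {
      mismatches.push({
        pos: feature.pos + i,
        refPos: feature.refPos + i,
        base: readBase,
        ref: refBase,
      })
    }
  }
  return mismatches
}

// Usage with the second 'b' feature from the "After" dump.
// The reference string is invented; it differs only at the first base.
const exampleFeature = {
  code: 'b',
  data: '67,67,67,67,67,67,71,67,84,84,67,84,71,71,67',
  pos: 135,
  refPos: 310,
}
console.log(mismatchesFromBasesFeature(exampleFeature, 'TCCCCCGCTTCTGGC', 310))
```

With that invented reference, the only mismatch reported is at pos 135,
refPos 310, which is where the "Before" dump had its last 'X' feature.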


The header of the file that contained the 'b' features shows that it was
processed with Picard MarkDuplicates:

@HD     VN:1.5  SO:coordinate
@PG     ID:bwa  PN:bwa  VN:0.7.13-r1126 CL:bwa mem -R
@RG\tID:DM_19_0434\tSM:DM_19_0434\tPL:ILLUMINA -M MT.fasta -t 10
fastq/DM_19_0434/reads/CGEN/raw/R1.fastq.gz
fastq/DM_19_0434/reads/CGEN/raw/R2.fastq.gz -v 1
@PG     ID:MarkDuplicates       VN:2.9.0-1-gf5b9f50-SNAPSHOT
 CL:picard.sam.markduplicates.MarkDuplicates
INPUT=[fastq/DM_19_0434/GRCh37.p13/alignments/bwa/CGEN/raw_sorted.bam]
OUTPUT=fastq/DM_19_0434/GRCh37.p13/alignments/bwa/CGEN/raw_sorted_duplicates_removed.bam
METRICS_FILE=fastq/DM_19_0434/GRCh37.p13/alignments/bwa/CGEN/raw_sorted_duplication_metrics.txt
REMOVE_DUPLICATES=true    MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000
MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25
REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag
ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES
PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates
READ_NAME_REGEX=<optimized capture of last three ':' separated fields as
numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO
QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5
MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
GA4GH_CLIENT_SECRETS=client_secrets.json     PN:MarkDuplicates
@SQ     SN:MT   LN:16569
@RG     ID:DM_19_0434   SM:DM_19_0434   PL:ILLUMINA


-Colin