Hello all,

A user sent us a CRAM file that contains 'b' read features. By that I mean this type of element from the CRAM v3 spec:

    ...
    else if feature_code = 'b' then
        bases ← ReadItem(BB, Byte[])

I can decode these, but we are working in cram-js/jbrowse, and these reads were not showing any SNPs because we do not check the bases in 'b' features for mismatches against the reference.

To be a little clearer, here are the read features of the same read before and after some process replaced the 'X' features with 'b' features.

Before:

    [ { code: 'X', data: 0, pos: 21, refPos: 198 },
      { code: 'X', data: 2, pos: 52, refPos: 229 },
      { code: 'X', data: 0, pos: 54, refPos: 231 },
      { code: 'X', data: 2, pos: 70, refPos: 247 },
      { code: 'X', data: 2, pos: 80, refPos: 257 },
      { code: 'X', data: 1, pos: 86, refPos: 263 },
      { code: 'I', data: 'CT', pos: 133, refPos: 310 },
      { code: 'X', data: 1, pos: 135, refPos: 310 } ]

After:

    [ { code: 'b', data: '65,84,84,65,67,65,71,71,67,71,65,65,67,65,84,65,67,84,84,65,65,84,65,65,65,71,84,71,84,71,84,84,65,65,84,84,65,65,84,84,65,65,84,71,67,84,84,71,84,65,71,84,65,65,65,84,65,65,84,65,65,84,65,65,67,65,65,84,84,84,65,65,84,71,84,67,84,71,67,84,67,65,71,67,67,71,67,84,84,84,67,67,65,67,65,67,65,71,65,67,65,84,67,65,84,65,65,67,65,65,65,65,65,65,84,84,84,67,67,65,67,67,65,65,65,67,67,67,67,67,67,67', pos: 1, refPos: 178 },
      { code: 'I', data: 'CT', pos: 133, refPos: 310 },
      { code: 'Q', data: 36, pos: 133, refPos: 308 },
      { code: 'Q', data: 36, pos: 134, refPos: 309 },
      { code: 'b', data: '67,67,67,67,67,67,71,67,84,84,67,84,71,71,67', pos: 135, refPos: 310 } ]

The header of the file that contained the 'b' features shows that MarkDuplicates was used:

    @HD VN:1.5 SO:coordinate
    @PG ID:bwa PN:bwa VN:0.7.13-r1126 CL:bwa mem -R @RG\tID:DM_19_0434\tSM:DM_19_0434\tPL:ILLUMINA -M MT.fasta -t 10 fastq/DM_19_0434/reads/CGEN/raw/R1.fastq.gz fastq/DM_19_0434/reads/CGEN/raw/R2.fastq.gz -v 1
    @PG ID:MarkDuplicates VN:2.9.0-1-gf5b9f50-SNAPSHOT CL:picard.sam.markduplicates.MarkDuplicates INPUT=[fastq/DM_19_0434/GRCh37.p13/alignments/bwa/CGEN/raw_sorted.bam] OUTPUT=fastq/DM_19_0434/GRCh37.p13/alignments/bwa/CGEN/raw_sorted_duplicates_removed.bam METRICS_FILE=fastq/DM_19_0434/GRCh37.p13/alignments/bwa/CGEN/raw_sorted_duplication_metrics.txt REMOVE_DUPLICATES=true MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json PN:MarkDuplicates
    @SQ SN:MT LN:16569
    @RG ID:DM_19_0434 SM:DM_19_0434 PL:ILLUMINA

-Colin
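
For illustration, here is a minimal sketch of how a 'b' feature like the ones in the "After" dump might be expanded into per-base mismatches on the client side. This is not existing cram-js code: the function name `mismatchesFromBasesFeature` and the `refSeq`/`refStart` parameters are hypothetical, and the sketch assumes that `data` is a comma-separated string of ASCII byte values, that `pos` and `refPos` are the 1-based read and reference positions of the first base, and that every base in the stretch advances both positions by one. It reports the read base directly rather than a substitution code like the 'X' features carry.

```javascript
// Hypothetical helper (not part of cram-js): expand one 'b' read feature
// into a list of mismatches against the reference.
//
// Assumptions:
//   feature.data   - comma-separated ASCII byte values, e.g. '65,67,71,84'
//   feature.pos    - 1-based read position of the first base
//   feature.refPos - 1-based reference position of the first base
//   refSeq         - reference bases, where refSeq[0] sits at reference
//                    position refStart (1-based)
function mismatchesFromBasesFeature(feature, refSeq, refStart) {
  // Turn the byte values back into base letters.
  const bases = feature.data
    .split(',')
    .map(code => String.fromCharCode(Number(code)))

  const mismatches = []
  bases.forEach((base, i) => {
    const refBase = refSeq[feature.refPos + i - refStart]
    // Skip positions outside the supplied reference window.
    if (refBase === undefined) {
      return
    }
    if (base.toUpperCase() !== refBase.toUpperCase()) {
      mismatches.push({
        pos: feature.pos + i,
        refPos: feature.refPos + i,
        base,
        refBase,
      })
    }
  })
  return mismatches
}

// Toy usage: read bases ACGT against reference ACCT starting at position 10.
const example = mismatchesFromBasesFeature(
  { code: 'b', data: '65,67,71,84', pos: 1, refPos: 10 },
  'ACCT',
  10,
)
console.log(example)
```

With the toy input, only the third base differs from the reference, so a single mismatch at read position 3 (reference position 12) is reported.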
_______________________________________________ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help