On 14 Mar 2019, at 15:14, Aengus Stewart <aengus.stew...@crick.ac.uk> wrote:
> I completely understand that and I can fix it. However I am just pointing
> out the current default we are getting from the illumina sequencers.
>
> So either
> Illumina needs to conform to the current SAM format in bcl2fastq
> The SAM format needs to be updated :-)
> Everyone who uses the -C option needs to reformat all of their FASTQ files if
> the files are dual index

The FASTQ format is not SAM. What you’re really seeing is the lack of standards and conventions around representing metadata on FASTQ @ lines.

BWA’s -C option has a convention of interpreting the stuff after the read name as SAM tagged fields, which is nicely general purpose and not a bad idea if you want to put arbitrary SAM tagged fields through the aligner. OTOH Illumina has its own conventions around what’s on the @ line:

>>> @M02212:177:000000000-CBJHK:1:1101:11456:1264 1:N:0:AGGCAGAA+CTCTCTAT

What would be handy would be if BWA also had an option to interpret Illumina’s 1:N:0:AGGCAGAA+CTCTCTAT metadata and re-encode it into appropriate SAM flags and tagged fields. It doesn’t, and in the meantime everyone gets to write scripts to do that reformatting.

    John

_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help
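
For illustration, here is a minimal sketch of the kind of reformatting script mentioned above. It is not part of BWA or bcl2fastq. The function name `illumina_comment_to_sam_tags` is hypothetical, and it assumes the Illumina comment has exactly the four colon-separated fields seen in the example header (read number, filter flag, control bits, index sequence). Mapping the index to a `BC:Z:` tag with a hyphen between the two indexes follows my reading of the SAM optional-fields recommendation for multiple barcodes; treat that choice as an assumption and check it against your own pipeline.

```python
def illumina_comment_to_sam_tags(header: str) -> str:
    """Rewrite one FASTQ @ line so its comment is a SAM tagged field.

    Assumes an Illumina-style comment of the form
    <read>:<is_filtered>:<control>:<index>, e.g. 1:N:0:AGGCAGAA+CTCTCTAT.
    Headers that do not match this layout are returned unchanged.
    """
    line = header.rstrip("\n")
    # Everything before the first space is the read name; the rest is the comment.
    name, _, comment = line.partition(" ")
    fields = comment.split(":")
    if len(fields) != 4:
        return line  # not the assumed Illumina layout; leave it alone

    read_number, is_filtered, control_bits, index = fields
    # Assumption: dual indexes joined by '+' become one BC tag joined by '-'.
    barcode = index.replace("+", "-")
    # The filter flag would correspond to SAM FLAG 0x200, which cannot be
    # expressed as a tagged field, so this sketch drops it along with the
    # read number and control bits.
    return f"{name} BC:Z:{barcode}"


if __name__ == "__main__":
    example = "@M02212:177:000000000-CBJHK:1:1101:11456:1264 1:N:0:AGGCAGAA+CTCTCTAT"
    print(illumina_comment_to_sam_tags(example))
```

Applied to every fourth line of a FASTQ file (the @ lines), this leaves a comment in TAG:TYPE:VALUE form, which is the shape the -C convention described above expects. The read number and filter flag are discarded here, so a fuller script would need to carry them some other way.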