On 14 Mar 2019, at 15:14, Aengus Stewart <aengus.stew...@crick.ac.uk> wrote:
> I completely understand that and I can fix it.  However I am just pointing 
> out the current default we are getting from the Illumina sequencers.
> 
> So either
> - Illumina needs to conform to the current SAM format in bcl2fastq
> - The SAM format needs to be updated :-)
> - Everyone who uses the -C option needs to reformat all of their FASTQ files
>   if the files are dual index

The FASTQ format is not SAM. What you’re really seeing is the lack of standards 
and conventions around representing metadata on FASTQ @ lines.

BWA’s -C option has a convention of interpreting the stuff after the read name 
as SAM tagged fields, which is nicely general purpose and not a bad idea if you 
want to put arbitrary SAM tagged fields through the aligner. OTOH Illumina has 
its own conventions around what’s on the @ line:

>>> @M02212:177:000000000-CBJHK:1:1101:11456:1264 1:N:0:AGGCAGAA+CTCTCTAT

It would be handy if BWA also had an option to interpret Illumina’s 
1:N:0:AGGCAGAA+CTCTCTAT metadata and re-encode it into appropriate SAM flags 
and tagged fields. It doesn’t, so in the meantime everyone gets to write 
their own scripts to do that reformatting.
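
For what it's worth, here is a minimal sketch of such a script in Python. It is not anything BWA or samtools ships. The function names are made up for illustration, and it assumes the bcl2fastq-style comment layout <read>:<is filtered>:<control number>:<index> and that you want the index in a BC:Z: tag, with the two halves of a dual index joined by a hyphen as the SAM tags document recommends. The filter field would naturally map to the 0x200 QC-fail flag, but -C can only pass tagged fields through, so the sketch just reports it.

```python
import re

# Assumed layout of the Illumina comment (bcl2fastq / CASAVA 1.8+ style):
#   <read>:<is filtered>:<control number>:<index sequence or sample number>
ILLUMINA_COMMENT = re.compile(r"^([12]):([YN]):(\d+):(\S*)$")
# A single index, or a dual index written as INDEX1+INDEX2.
BARCODE = re.compile(r"^[ACGTN]+(\+[ACGTN]+)?$")


def parse_illumina_comment(comment):
    """Split an Illumina @-line comment into its fields (hypothetical helper)."""
    m = ILLUMINA_COMMENT.match(comment.strip())
    if m is None:
        raise ValueError("not an Illumina-style comment: %r" % comment)
    read, filtered, control, index = m.groups()
    return {
        "read": int(read),
        "qc_fail": filtered == "Y",  # would correspond to SAM flag 0x200
        "control": int(control),
        "index": index,
    }


def reformat_header(header):
    """Rewrite one FASTQ @ line so the comment is a SAM tagged field.

    Only the barcode is carried over; a numeric sample number in the index
    position is dropped rather than guessed at.
    """
    name, _, comment = header.rstrip("\n").partition(" ")
    fields = parse_illumina_comment(comment)
    if BARCODE.match(fields["index"]):
        # '+' becomes '-' to follow the BC tag convention for dual barcodes.
        return name + " BC:Z:" + fields["index"].replace("+", "-")
    return name


if __name__ == "__main__":
    line = "@M02212:177:000000000-CBJHK:1:1101:11456:1264 1:N:0:AGGCAGAA+CTCTCTAT"
    print(reformat_header(line))
    # prints: @M02212:177:000000000-CBJHK:1:1101:11456:1264 BC:Z:AGGCAGAA-CTCTCTAT
```

You would apply reformat_header to every fourth line of the FASTQ file (the @ lines) before running bwa with -C, so that what gets appended to each SAM record is a valid tagged field.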

    John
