I completely understand that and I can fix it. However, I am just pointing out the current default we are getting from the Illumina sequencers.
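For what it is worth, the fix itself is small. Here is a minimal sketch in Python (rather than the sed script suggested further down the thread) that rewrites the Illumina comment as a SAM-style optional field, so that what bwa mem -C copies into the SAM output is something samtools can parse. The function names sam_safe_header and convert are made up for illustration. The choice of the CO tag is an assumption: any tag of type Z should do, and BC may suit index sequences better. It also assumes plain 4-line FASTQ records with no wrapped sequence lines, and the sequence and quality strings in the demo record are placeholders, not real data.

```python
def sam_safe_header(line, tag="CO"):
    """Rewrite one FASTQ header so its comment becomes TAG:Z:<comment>.

    The tag name is an assumption for illustration, not a bwa or
    samtools requirement.
    """
    name, sep, comment = line.rstrip("\n").partition(" ")
    if not sep or not comment:
        return name  # header has no comment, nothing to rewrite
    return f"{name} {tag}:Z:{comment}"


def convert(lines, tag="CO"):
    """Yield FASTQ lines with each record's header rewritten.

    Headers are located by position (every 4th line) rather than by a
    leading '@', because quality lines can also start with '@'.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        yield sam_safe_header(line, tag) if i % 4 == 0 else line


# Demo on a single record; sequence and qualities are placeholders.
record = [
    "@M02212:177:000000000-CBJHK:1:1101:11456:1264 1:N:0:AGGCAGAA+CTCTCTAT",
    "ACGT",
    "+",
    "IIII",
]
for out in convert(record):
    print(out)
```

The whole original comment is kept as the tag value, so the read number and filter flag survive into the BAM alongside the index sequences.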
So either:

- Illumina needs to conform to the current SAM format in bcl2fastq
- The SAM format needs to be updated :-)
- Everyone who uses the -C option needs to reformat all of their FASTQ files if the files are dual index

The reason I am doing this, I have a pathogen sample that will also have human sequences in it and I have been asked to produce FASTQ only files for the pathogen. To recreate read1 and read2 files I need the header info.

Aengus

On 14/03/2019 14:47, Thomas W. Blackwell wrote:
>
> The relevant man pages are pretty clear on this question:
>
> bio-bwa.sourceforge.net/bwa.shtml
> samtools.github.io/hts-specs/SAMv1.pdf
>
> The comment string following the space in each fastq header line needs to
> conform to the SAM spec for user-defined optional fields, and the example
> shown does not. One probably needs to write a sed script that alters the
> header lines into an acceptable format.
>
> - tom blackwell -
>
> On Thu, 14 Mar 2019, Aengus Stewart wrote:
>
>> Hi guys,
>>
>> Our sequencer is now outputting header lines with the following format
>>
>> @M02212:177:000000000-CBJHK:1:1101:11456:1264 1:N:0:AGGCAGAA+CTCTCTAT
>>
>> If I use BWA with the -C flag then this header info gets sent to the .sam
>> output file however in the sam->bam conversion I am getting
>>
>> [M::mem_pestat] skip orientation RR as there are not enough pairs
>> [M::mem_process_seqs] Processed 202164 reads in 14.429 CPU sec, 3.606 real sec
>> [E::sam_parse1] unrecognized type :
>> [W::sam_read1] Parse error at line 11
>> samtools sort: truncated file. Aborting
>> User: 14.22
>> System: 0.62
>> Elapsed: 0:04.21
>>
>> I am running modules for SAMtools/1.9-foss-2018b BWA/0.7.17-foss-2018b
>>
>> my default commandline would look like
>>
>> bwa mem -t 4 -M -B 2 -C -R "@RG\tID:B-Zhejiang-Wuxin-113-2018_QFX\tLB:B-Zhejiang-Wuxin-113-2018_QFX\tSM:B-Zhejiang-Wuxin-113-2018_QFX\tPL:ILLUMINA" \
>>   /camp/stp/babs/working/stewara/projects/asf/laura.cubitt/RN19003/reference/swH1N1 \
>>   B-Zhejiang-Wuxin-113-2018_QFX.R1.trim.fastq \
>>   B-Zhejiang-Wuxin-113-2018_QFX.R2.trim.fastq | \
>>   samtools sort -@4 -O BAM -o B-Zhejiang-Wuxin-113-2018_QFX.swH1N1.test.bam -
>>
>> Cheers
>> Aengus
>>
>> --
>> -----------------------------------------------------------------------
>> Aengus Stewart Tel: +44 (0)20 3796 1702
>> Head of Bioinformatics and BioStatistics
>> Francis Crick Institute
>> 1 Midland Rd
>> Kings Cross, London NW1 1AT, UK
>> -----------------------------------------------------------------------
>> The Francis Crick Institute Limited is a registered charity in England and
>> Wales no. 1140062 and a company registered in England and Wales no.
>> 06885462, with its registered office at 1 Midland Road London NW1 1AT
>>
>> _______________________________________________
>> Samtools-help mailing list
>> Samtools-help@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/samtools-help