Tom, I am not entirely sure. I will try sorting each file prior to merging to see if this fixes the issue. Thanks.
Mark ---------------------- Mark J. Margres, Ph.D. Postdoc, Storfer Lab School of Biological Sciences Washington State University Pullman, WA 99164-4236 ________________________________________ From: Thomas W. Blackwell [tbla...@umich.edu] Sent: Friday, May 12, 2017 10:34 AM To: Margres, Mark J Cc: samtools-help@lists.sourceforge.net Subject: RE: [Samtools-help] samtools merge issue Mark - "Sort order" refers to the ordering of records for individual sequence reads in the body of a .bam file, not to the ordering of the reference sequence contigs in the header. Bwa, by default, preserves the ordering of sequence reads that's found in the input .fastq file(s). It's customary to sort each .bam file separately into chromosome coordinate order before merging. In that case, the default without "-n" is correct. Separately, why do none of your *PE.bam or *SE.bam files begin with an "@HD" line before the first "@SQ" ? I woould expect samtools to supply this. - tom blackwell - On Fri, 12 May 2017, Margres, Mark J wrote: > Hi Tom, > > The "sample + .bam" file was not called until that command, so it should not > be an empty file. As you can see below, the alignments appear to be sorted > by chromosome, not read, so I think ?merge -c? is appropriate. Do I need to > sort prior to merging? > > Input file one for merging: > > [mark.margres@login-p1n01 02-Mapped]$ samtools view -H 165499PE.bam > @SQ SN:chr1_GL834412_random LN:2180112 > @SQ SN:chr1_GL834413_random LN:373546 > @SQ SN:chr1_GL834414_random LN:1075293 > @SQ SN:chr1_GL834415_random LN:174856 > @SQ SN:chr1_GL834416_random LN:817733 > @SQ SN:chr1_GL834417_random LN:3669695 > @SQ SN:chr1_GL834418_random LN:1117890 > @SQ SN:chr1_GL834419_random LN:2046010 > > Input file two for merging: > > [mark.margres@login-p1n01 02-Mapped]$ samtools view -H 165499SE.bam > @SQ SN:chr1_GL834412_random LN:2180112 > @SQ SN:chr1_GL834413_random LN:373546 > @SQ SN:chr1_GL834414_random LN:1075293 > @SQ SN:chr1_GL834415_random LN:174856 > @SQ SN:chr1_GL834416_random LN:817733 > @SQ SN:chr1_GL834417_random LN:3669695 > @SQ SN:chr1_GL834418_random LN:1117890 > @SQ SN:chr1_GL834419_random LN:2046010 > > Merged file: > > [mark.margres@login-p1n01 02-Mapped]$ samtools view -H 165499.bam > @SQ SN:chr1_GL834412_random LN:2180112 > @SQ SN:chr1_GL834413_random LN:373546 > @SQ SN:chr1_GL834414_random LN:1075293 > @SQ SN:chr1_GL834415_random LN:174856 > @SQ SN:chr1_GL834416_random LN:817733 > @SQ SN:chr1_GL834417_random LN:3669695 > @SQ SN:chr1_GL834418_random LN:1117890 > @SQ SN:chr1_GL834419_random LN:2046010 > > Sorted file: > > [mark.margres@login-p1n01 02-Mapped]$ samtools view -H 165499sorted.bam > @HD VN:1.3 SO:coordinate > @SQ SN:chr1_GL834412_random LN:2180112 > @SQ SN:chr1_GL834413_random LN:373546 > @SQ SN:chr1_GL834414_random LN:1075293 > @SQ SN:chr1_GL834415_random LN:174856 > @SQ SN:chr1_GL834416_random LN:817733 > @SQ SN:chr1_GL834417_random LN:3669695 > @SQ SN:chr1_GL834418_random LN:1117890 > @SQ SN:chr1_GL834419_random LN:2046010 > > Thanks. > > Mark > > ---------------------- > Mark J. Margres, Ph.D. > Postdoc, Storfer Lab > School of Biological Sciences > Washington State University > Pullman, WA 99164-4236 > > ________________________________________ > From: Thomas W. Blackwell [tbla...@umich.edu] > Sent: Friday, May 12, 2017 9:50 AM > To: Margres, Mark J > Cc: samtools-help@lists.sourceforge.net > Subject: Re: [Samtools-help] samtools merge issue > > Mark - > > Just guessing: One may need to specify "merge -f" in case an > earlier invocation of the command has created an empty file with > the path generated by "jp(bamFolder, sample + ".bam")". Also may > need to specify "merge -n" if, in fact, the output from bwa is > already properly name-sorted. Again, just guessing. > > - tom blackwell - > > On Fri, 12 May 2017, Margres, Mark J wrote: > >> Hello, >> >> I am having an issue with merging different bam assemblies using samtools. >> In my pipeline, I merge reads using flash2, and then perform reference-based >> assemblies using bwa, one assembly uses just the merged reads and a second >> assembly uses just the unmerged reads. I then attempt to merge the >> assemblies using samtools and get the following output: >> >> >> [W::bam_merge_core2] No @HD tag found. >> >> I am assuming this is due to the unmerged read assembly having a different >> header than the merged reads, but I am not sure why (and also not sure if >> this is a bwa issue or a samtools issue). Is there a way for me to merge >> and then sort the assemblies with maintaining the correct header tags? I >> have the original assemblies to work with, but have assembled abut a dozen >> genomes and would prefer to not re-assemble if possible. Code is below. >> Thanks. >> >> # Run BWA to map PE samples to reference genome >> # -t number of threads -R read group header >> logFile = jp(resultsDir, sample + '_mapping.log') >> cmd = ' '.join(["/home/mark.margres/bwa-0.7.15/bwa mem -t 50 -R >> '@RG\tID:bwa\tSM:" + sample + "\tPL:ILLUMINA'", >> bwaIndex, jp(resultsDir, sample + >> "_cleaned_PE1.fastq.gz"), >> jp(resultsDir, sample + "_cleaned_PE2.fastq.gz"), "| >> samtools view -bS -@ 50 -o", jp(bamFolder, sample + "PE.bam"), >> "2>", logFile]) >> log(cmd, logCommands) >> os.system(cmd) >> # Run BWA to map SE samples to reference genome >> cmd = ' '.join(["/home/mark.margres/bwa-0.7.15/bwa mem -t 50 -R '@RG\tID:bwa\tSM:" + sample + "\tPL:ILLUMINA'", >> bwaIndex, jp(resultsDir, sample + >> "_cleaned_SE.fastq.gz"), "| samtools view -bS -@ 50 -o", jp(bamFolder, >> sample + "SE.bam"), >> "2>>", logFile]) >> log(cmd, logCommands) >> os.system(cmd) >> # merge and sort >> cmd = ' '.join(['samtools merge -c', jp(bamFolder, sample + ".bam"), >> jp(bamFolder, sample + "PE.bam"), jp(bamFolder, sample + "SE.bam")]) >> log(cmd, logCommands) >> os.system(cmd) >> # sort bam file; -@ number of threads >> cmd = ' '.join(['samtools sort -o', jp(bamFolder, sample) + "sorted.bam", >> ' -@ 50', jp(bamFolder, sample + ".bam")]) >> log(cmd, logCommands) >> os.system(cmd) >> # Mark and remove PCR duplicates >> cmd = ' '.join([picard + "MarkDuplicates INPUT=" + jp(bamFolder, sample + >> "sorted.bam"), ' OUTPUT=' + jp(bamFolder, sample + "_markdup.bam"), >> ' METRICS_FILE=' + jp(bamFolder, sample + ".metrics"), ' >> REMOVE_DUPLICATES=true ', >> ' ASSUME_SORTED=true VALIDATION_STRINGENCY=LENIENT', >> '>>', logFile, '2>&1']) >> log(cmd, logCommands) >> os.system(cmd) >> # Index bam file: >> cmd = ' '.join(['samtools index', jp(bamFolder, sample) + "_markdup.bam"]) >> >> ---------------------- >> Mark J. Margres, Ph.D. >> Postdoc, Storfer Lab >> School of Biological Sciences >> Washington State University >> Pullman, WA 99164-4236 >> ------------------------------------------------------------------------------ Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot _______________________________________________ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help