Unless I'm completely mistaken, the question is "How do I correctly merge .bam files which already have @RG tags in them, when their @RG ID values are all the same even though the files are for different individuals?"
If that's the question, then absolutely vanilla 'samtools merge' is what you want, without -h, -r or -c. The paragraph in the samtools manpage which begins "Unless the -c or -p flags are specified ..." says that even though the original ID values are identical, they will be given distinct suffixes in the merged .bam file, and the read records will be updated to match. The resulting ID values may be ugly, but they will successfully distinguish individuals.

If the individual .bam files do not already have @RG tags, then -r is what you want, followed by a 'samtools reheader' step to fill in the LB: and SM: values in the header with the appropriate information.
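To check that a plain merge really did leave every read group distinct, something along these lines may help. It is only a sketch: the file names are placeholders, and dup_rg_ids is a hypothetical helper written for this message, not part of samtools. It assumes the header fields are tab separated, as the SAM format requires.

```shell
# dup_rg_ids: read SAM header text on stdin and print every @RG ID
# that occurs on more than one @RG line. No output means all IDs are unique.
# (Hypothetical helper, not a samtools command.)
dup_rg_ids() {
    awk -F'\t' '
        /^@RG/ {
            # find the ID: field, wherever it sits on the line
            for (i = 2; i <= NF; i++)
                if ($i ~ /^ID:/) count[substr($i, 4)]++
        }
        END {
            for (id in count)
                if (count[id] > 1) print id
        }
    ' | sort
}

# Intended use (placeholder file names, requires samtools):
#   samtools merge merged.bam individual1.bam individual2.bam
#   samtools view -H merged.bam | dup_rg_ids
#
# The same check works on a hand-written header file:
#   dup_rg_ids < RG.txt
```

If the helper prints anything for a hand-written header file, that file reuses an ID across @RG lines, which the SAM format does not allow.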
- tom blackwell -

On Sat, 8 Nov 2014, Tommy Carstensen wrote:
How do I merge bam files with identical @RG IDs? How do I create a merged output with a unique @RG ID for each unique @RG SM?

The manual reads:

-r Attach an RG tag to each alignment. The tag value is inferred from file names.

I have tried adding the -r option:

samtools merge -r -R 1:100000-100200 samtools.merge.bam $bamFiles

The manual reads:

-h FILE Use the lines of FILE as `@' headers to be copied to out.bam, replacing any header lines that would otherwise be copied from in1.bam. (FILE is actually in SAM format, though any alignment records it may contain are ignored.)

I have tried adding -h RG.txt:

samtools merge -r -R 1:100000-100200 -h RG.txt samtools.merge.bam $bamFiles

But I am not sure what the contents of RG.txt should be. The example "Attach the RG tag while merging sorted alignments" is not clear to me. Others seem to have had the same problem:

https://www.biostars.org/p/80150/
http://seqanswers.com/forums/showthread.php?t=33260
http://sourceforge.net/p/samtools/mailman/message/30655641/

Currently my RG.txt file looks like this (tab separated fields):

grep "#1[^0-9]" RG.txt | head | rev | cut -c3- | rev
@RG ID:1#1 PL:ILLUMINA LB:7721122 SM:EGAN000011605
@RG ID:1#1.1 PL:ILLUMINA LB:7721122 SM:EGAN000011605
@RG ID:1#1.2 PL:ILLUMINA LB:7721122 SM:EGAN000011605
@RG ID:1#1 PL:ILLUMINA LB:7672393 SM:EGAN000011612
@RG ID:1#1.1 PL:ILLUMINA LB:7672393 SM:EGAN000011612
@RG ID:1#1.2 PL:ILLUMINA LB:7672393 SM:EGAN000011612
@RG ID:1#1 PL:ILLUMINA LB:7790252 SM:EGAN000011617
@RG ID:1#1.1 PL:ILLUMINA LB:7790252 SM:EGAN000011617
@RG ID:1#1.2 PL:ILLUMINA LB:7790252 SM:EGAN000011617
@RG ID:1#1 PL:ILLUMINA LB:7672199 SM:EGAN000011621

Thanks for any help on this.

Tommy

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
------------------------------------------------------------------------------ _______________________________________________ Samtools-help mailing list Samtools-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/samtools-help