Unless I'm completely mistaken, the question is "How do I correctly merge .bam files which already have @RG tags in them, when their @RG ID values are all the same even though the files are for different individuals?"

If that's the question, then plain samtools merge is what you want, without -h, -r or -c. The paragraph in the samtools manpage which begins "Unless the -c or -p flags are specified ..." says that when the original ID values are identical, they are given distinct suffixes in the merged .bam file, and the read records are updated to match. The resulting ID values may be ugly, but they will successfully distinguish individuals.

If the individual .bam files do not already have @RG tags, then -r is what you want, followed by a 'samtools reheader' step to fill in LB: and SM: values in the header with the appropriate information.
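
As a minimal sketch of the plain merge case, assuming samtools is on the PATH: the helper name merge_and_list_rg and the file names in the usage line are made up for illustration. It merges with no extra flags and then prints the @RG lines of the merged header, so you can check that each individual ended up with its own ID.

```shell
# Sketch only: merge_and_list_rg is a hypothetical helper name, not a samtools command.
# Usage: merge_and_list_rg merged.bam sample1.bam sample2.bam ...
merge_and_list_rg() {
    out=$1
    shift
    # Plain merge: no -h, -r or -c, so colliding @RG IDs are left for samtools to disambiguate.
    samtools merge "$out" "$@" || return 1
    # Print only the @RG lines of the merged header for inspection.
    samtools view -H "$out" | grep '^@RG'
}
```

If two lines in the output still share an ID, the merge was probably run with -c or -h, which is worth checking before going further.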

                                                        -  tom blackwell  -

On Sat, 8 Nov 2014, Tommy Carstensen wrote:

How do I merge bam files with identical @RG IDs? How do I create a merged
output with a unique @RG ID for each unique @RG SM?

The manual reads:
-r Attach an RG tag to each alignment. The tag value is inferred from file
names.

I have tried adding the -r option:
samtools merge -r -R 1:100000-100200 samtools.merge.bam $bamFiles

The manual reads:
-h FILE Use the lines of FILE as '@' headers to be copied to out.bam,
replacing any header lines that would otherwise be copied from in1.bam.
(FILE is actually in SAM format, though any alignment records it may
contain are ignored.)


I have tried adding -h RG.txt:

samtools merge -r -R 1:100000-100200 -h RG.txt samtools.merge.bam $bamFiles

But I am not sure what the contents of RG.txt should be. The example
"Attach the RG tag while merging sorted alignments" is not clear to me.

Others seem to have had the same problem:
https://www.biostars.org/p/80150/

http://seqanswers.com/forums/showthread.php?t=33260

http://sourceforge.net/p/samtools/mailman/message/30655641/


Currently my RG.txt file looks like this (tab separated fields):
grep "#1[^0-9]" RG.txt | head | rev | cut -c3- | rev
@RG     ID:1#1  PL:ILLUMINA     LB:7721122      SM:EGAN000011605
@RG     ID:1#1.1        PL:ILLUMINA     LB:7721122      SM:EGAN000011605
@RG     ID:1#1.2        PL:ILLUMINA     LB:7721122      SM:EGAN000011605
@RG     ID:1#1  PL:ILLUMINA     LB:7672393      SM:EGAN000011612
@RG     ID:1#1.1        PL:ILLUMINA     LB:7672393      SM:EGAN000011612
@RG     ID:1#1.2        PL:ILLUMINA     LB:7672393      SM:EGAN000011612
@RG     ID:1#1  PL:ILLUMINA     LB:7790252      SM:EGAN000011617
@RG     ID:1#1.1        PL:ILLUMINA     LB:7790252      SM:EGAN000011617
@RG     ID:1#1.2        PL:ILLUMINA     LB:7790252      SM:EGAN000011617
@RG     ID:1#1  PL:ILLUMINA     LB:7672199      SM:EGAN000011621
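
The listing above reuses the same ID (for example 1#1) for different SM values, while the SAM format requires every @RG line to have a unique ID. A rough way to list the repeated IDs in a tab-separated header file such as RG.txt is sketched below; the helper name duplicated_rg_ids is made up for illustration.

```shell
# Sketch only: duplicated_rg_ids is a hypothetical helper name.
# Prints each @RG ID field that occurs on more than one line of a
# tab-separated header file. Usage: duplicated_rg_ids RG.txt
duplicated_rg_ids() {
    grep '^@RG' "$1" |   # keep only read-group header lines
        tr '\t' '\n' |   # one field per line
        grep '^ID:' |    # keep only the ID fields
        sort |
        uniq -d          # print values seen more than once
}
```

An empty result means every ID in the file is already unique.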


Thanks for any help on this.

Tommy



--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

------------------------------------------------------------------------------
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help