Is there a way how I can figure out how many reads have been removed by
applying rmdup?
I have a file foobar.bam containing paired-end reads, and I want to remove
PCR duplicates, and also I want to know how many pairs have been removed.
Here is what I am doing at the moment:
# sort foobar.bam
samtools sort -n foobar.bam -o foobar.srtd.bam
# apply fixmate
samtools fixmate foobar.srtd.bam foobar.srtd.matefixed.bam
# do the PCR duplicate removal
samtools rmdup foobar.srtd.matefixed.bam foobar.rmdupped.bam
# make sure that the output is sorted
samtools sort -n foobar.rmdupped.bam -o foobar.rmdupped.srtd.bam
What I am doing then is to run samtools stat on foobar.srtd.matefixed.bam
(file before the rmdup stage) and foobar.rmdupped.bam (file after the rmdup
stage). Then I could write a script that parses the relevant numbers out of
the output.
Is this the way to it? Is there a more straightforward way?
I need to programmatically access for a bam file how many PCR duplicates
have been removed with rmdup. Or, I need this as a parseable output from
the samtools rmdup command.
Thanks,
Anton
## Versions used:
samtools --version samtools 1.3.1
Using htslib 1.3.1
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help