Is there a way how I can figure out how many reads have been removed by
applying rmdup?

I have a file foobar.bam containing paired-end reads, and I want to remove
PCR duplicates, and also I want to know how many pairs have been removed.
Here is what I am doing at the moment:

# sort foobar.bam
samtools sort -n foobar.bam -o foobar.srtd.bam

# apply fixmate
samtools fixmate foobar.srtd.bam foobar.srtd.matefixed.bam

# do the PCR duplicate removal
samtools rmdup foobar.srtd.matefixed.bam foobar.rmdupped.bam

# make sure that the output is sorted
samtools sort -n foobar.rmdupped.bam -o foobar.rmdupped.srtd.bam

What I am doing then is to run samtools stat on foobar.srtd.matefixed.bam
(file before the rmdup stage) and foobar.rmdupped.bam (file after the rmdup
stage). Then I could write a script that parses the relevant numbers out of
the output.

Is this the way to it? Is there a more straightforward way?

I need to programmatically access for a bam file how many PCR duplicates
have been removed with rmdup. Or, I need this as a parseable output from
the samtools rmdup command.

Thanks,
Anton

## Versions used:
samtools --version samtools 1.3.1
Using htslib 1.3.1
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help

Reply via email to