Hi Jon,
> This resource is very helpful:
> https://www.htslib.org/algorithms/duplicate.html,

One day I will write a full description of how duplicate marking works.
I keep telling myself that.

> but does samtools
> markdup handle interchromosomal read-pairs like picard?

It does.

> What are the differences in the algorithms?

samtools markdup run with mode 's' (-m s) should give results identical
to picard's for duplicate marking.
There are two modes and I will just quote the man page here:
"Duplicate decision method for paired reads. Values are 't' or
's'. Mode t measures positions based on template start/end
(default). Mode 's' measures positions based on sequence
start. While the two methods identify mostly the same reads
as duplicates, mode 's' tends to return more results.
Unpaired reads are treated identically by both modes."
The differences only appear when both reads are mapped in the same
direction (FF or RR).
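
If you want a rough idea of how many pairs in a file could be affected,
one option is to look at the strand bits in the SAM FLAG field. The
sketch below is only an illustration: the flag bit values come from the
SAM specification, but the function name pair_orientation is made up
here, it is not part of samtools, and it does not reproduce the markdup
logic itself. It just reports whether a fully mapped pair has both
reads on the same strand (FF or RR) or on opposite strands.

```python
# SAM FLAG bits, as defined in the SAM specification.
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
REVERSE = 0x10
MATE_REVERSE = 0x20


def pair_orientation(flag):
    """Classify a read by the strands of the read and its mate.

    Returns "same" when both reads map in the same direction (FF or RR),
    "opposite" when they map in different directions, and None when the
    read is not part of a pair with both ends mapped.
    Hypothetical helper for illustration only.
    """
    if not (flag & PAIRED) or (flag & (UNMAPPED | MATE_UNMAPPED)):
        return None
    read_reverse = bool(flag & REVERSE)
    mate_reverse = bool(flag & MATE_REVERSE)
    return "same" if read_reverse == mate_reverse else "opposite"


if __name__ == "__main__":
    # 99: read forward, mate reverse; 65: both forward; 113: both reverse.
    for flag in (99, 65, 113):
        print(flag, pair_orientation(flag))
```

Reads that come back as "same" are the only ones where mode t and
mode s can disagree; everything else should be marked the same way by
both modes.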
There are differences in optical duplicate marking for HiSeq X data but
that is down to an integer overflow bug in picard.
Regards,
Andrew
--
The Wellcome Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.