Thanks Nils,
I did some testing and the duplicate rates look much more reasonable
post-sanitization. I'm surprised that any reads were identified as
duplicates while the paired reads had discordant names, but once the
names for read1 and read2 were made to match, the duplicate count went
from 5,000 (which was extremely small) to 500,000 (reasonable).
thanks,
Richard
On 10/17/2014 02:04 PM, Nils Homer wrote:
My suggestion would be to sanitize the read names.
N
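Nils's suggestion can be sketched in Python. This is a minimal sketch, not part of Picard or samtools. The helper names `sanitize_read_name` and `sanitize_fastq` are hypothetical. The sketch assumes unwrapped FASTQ (exactly four lines per record) and assumes the read name is the first space-delimited token of the header. It strips a trailing /1, /2, /1/ or /2/ so that both mates carry the same name. The #CGATGT barcode is left in place because both mates share it.

```python
import re
from typing import Iterable, Iterator, List

# Matches a trailing mate suffix: "/1", "/2", "/1/" or "/2/".
MATE_SUFFIX = re.compile(r"/[12]/?$")


def sanitize_read_name(name: str) -> str:
    """Drop a trailing /1, /2, /1/ or /2/ so both mates share one name."""
    return MATE_SUFFIX.sub("", name)


def _rewrite_header(header: str) -> str:
    """Sanitize the read name (first token after '@'), keeping any comment.

    Assumes the comment, if present, is separated from the name by a space.
    """
    if not header.startswith("@"):
        raise ValueError(f"not a FASTQ header: {header!r}")
    body = header[1:].rstrip("\n")
    name, sep, rest = body.partition(" ")
    return "@" + sanitize_read_name(name) + sep + rest + "\n"


def sanitize_fastq(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ lines with every record's read name sanitized.

    Assumes unwrapped FASTQ: exactly four lines per record.
    """
    record: List[str] = []
    for line in lines:
        record.append(line)
        if len(record) == 4:
            yield _rewrite_header(record[0])  # header line
            yield from record[1:]             # sequence, '+', qualities
            record = []
    if record:
        raise ValueError("truncated FASTQ record at end of input")
```

To rewrite a file as a stream, one could call `sys.stdout.writelines(sanitize_fastq(sys.stdin))` from a small script. Processing record by record keeps memory use constant regardless of file size.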
On Fri, Oct 17, 2014 at 3:04 PM, Richard Corbett <rcorb...@bcgsc.ca> wrote:
Hi all,
Our usual read name format in a bam is something like:
HS13_248:4:2111:2846:54933
But we were just asked to analyse some data where the read names in the
bam have this format:
HWI-ST909_0086:3:1101:19761:56275#CGATGT/2/
I noticed that while running MarkDuplicates (Picard version 1.122), the
progress messages suggest that Picard can't identify the pairs because of
the terminal /1/ and /2/ in the read names:
...
INFO 2014-10-17 11:55:18 MarkDuplicates Read 114,000,000 records. Elapsed time: 00:21:07s. Time for last 1,000,000: 12s. Last read position: GL000212.1:39,902
INFO 2014-10-17 11:55:18 MarkDuplicates Tracking 112566758 as yet unmatched pairs. 2570 records in RAM.
...
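In the log above, nearly every record read so far (112,566,758 of 114,000,000) is still waiting for its mate. That is what one would expect if mates are matched by identical read name. The toy model below assumes exactly that; it is not Picard's code, and `count_unmatched` is a hypothetical name. It makes one pass over read names and keeps a set of names still waiting for a mate. With the /1/ and /2/ suffixes, the two mates never share a name, so both stay unmatched.

```python
from typing import Iterable, Set


def count_unmatched(names: Iterable[str]) -> int:
    """Count reads still waiting for a mate after one pass.

    Simplified model (assumption): two records are mates only if their
    read names are identical.
    """
    waiting: Set[str] = set()
    for name in names:
        if name in waiting:
            waiting.remove(name)  # second record with this name: pair complete
        else:
            waiting.add(name)     # first record with this name: wait for its mate
    return len(waiting)


raw = [
    "HWI-ST909_0086:3:1101:19761:56275#CGATGT/1/",
    "HWI-ST909_0086:3:1101:19761:56275#CGATGT/2/",
]
sanitized = ["HWI-ST909_0086:3:1101:19761:56275#CGATGT"] * 2
```

Under this model, `count_unmatched(raw)` leaves both reads unmatched, while `count_unmatched(sanitized)` pairs them.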
However, at the end of the run, there are still some reads properly
marked as duplicates.
Is it OK to ignore the warnings about unmatched pairs, or should we go
back and edit the read names in the FASTQs to ensure our duplicates are
marked correctly?
thanks for all your help,
Richard
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help