Hi all,

Our usual read name format in a BAM file is something like:
HS13_248:4:2111:2846:54933

But we were just asked to analyse some data where the read names in the 
BAM have this format:
HWI-ST909_0086:3:1101:19761:56275#CGATGT/2/

While running MarkDuplicates in Picard 1.122, I noticed that the progress 
messages suggest Picard can't match up the pairs, which I assume is because 
of the terminal /1/ and /2/ in the read names:
...
INFO    2014-10-17 11:55:18    MarkDuplicates    Read   114,000,000 records.  Elapsed time: 00:21:07s.  Time for last 1,000,000:   12s. Last read position: GL000212.1:39,902
INFO    2014-10-17 11:55:18    MarkDuplicates    Tracking 112566758 as yet unmatched pairs. 2570 records in RAM.
...

However, at the end of the run, there are still some reads properly 
marked as duplicates.

Is it OK to ignore the messages about unmatched pairs, or should we go 
back and edit the read names in the fastqs to make sure our duplicate 
marking is correct?
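
In case it helps the discussion, here is a rough Python sketch of the kind 
of renaming I have in mind. It assumes the trailing /1/ or /2/ is the only 
thing that differs between mates, and the function name strip_mate_suffix 
is just something I made up for illustration (it is not part of Picard or 
samtools):

```python
import re

# Matches a trailing "/1" or "/2", with or without a final "/",
# at the very end of a read name.
MATE_SUFFIX = re.compile(r"/[12]/?$")


def strip_mate_suffix(read_name: str) -> str:
    """Return read_name without a trailing /1, /2, /1/ or /2/ marker.

    Names that carry no such marker are returned unchanged.
    """
    return MATE_SUFFIX.sub("", read_name)


if __name__ == "__main__":
    # Both mates of a pair should end up with the same name.
    for name in (
        "HWI-ST909_0086:3:1101:19761:56275#CGATGT/1/",
        "HWI-ST909_0086:3:1101:19761:56275#CGATGT/2/",
        "HS13_248:4:2111:2846:54933",
    ):
        print(strip_mate_suffix(name))
```

This leaves the #CGATGT index in place and only removes the mate marker, so 
our usual read names would pass through untouched.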

thanks for all your help,
Richard

