Thanks, Nils,
I did some testing, and the duplicate rates look much more reasonable after sanitization. I'm surprised that any reads were identified as duplicates while the paired reads had discordant names. Once the names for read1 and read2 were made to match, the duplicate count went from 5,000 (extremely small) to 500,000 (reasonable).
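A simplified model shows why discordant mate names can leave pairs unmatched. It assumes that mates are paired purely by exact read name, loosely mirroring the "Tracking N as yet unmatched pairs" progress message. It is a sketch under that assumption, not Picard's actual implementation. `match_mates` and the example names are hypothetical.

```python
def match_mates(read_names):
    """Pair records whose names are identical (simplified, hypothetical model).

    Returns (pairs_completed, names_still_waiting_for_a_mate).
    """
    pending = set()  # names seen once whose mate has not been encountered yet
    pairs = 0
    for name in read_names:
        if name in pending:
            pending.remove(name)  # second mate found: the pair is complete
            pairs += 1
        else:
            pending.add(name)     # first mate: wait for its partner
    return pairs, len(pending)


# Hypothetical names: with the /1/ and /2/ suffixes, no two records share a name.
raw = ["a#CGATGT/1/", "b#CGATGT/1/", "a#CGATGT/2/", "b#CGATGT/2/"]
clean = ["a#CGATGT", "b#CGATGT", "a#CGATGT", "b#CGATGT"]

print(match_mates(raw))    # (0, 4)
print(match_mates(clean))  # (2, 0)
```

In this model, no pair is ever completed while the suffixes differ. That would be consistent with pair-based duplicate detection finding far fewer duplicates before the names were sanitized.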

thanks,
Richard


On 10/17/2014 02:04 PM, Nils Homer wrote:
My suggestion would be to sanitize the read names.

N
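One way to sanitize the names is to strip a trailing mate suffix so that read1 and read2 share an identical name. The sketch below relies on several assumptions. The suffix is always one of /1, /2, /1/ or /2/ at the very end of the name, as in the HWI-ST909 example quoted below. FASTQ headers separate the name from any comment with a single space. `sanitize_read_name` and `sanitize_fastq` are hypothetical helpers, not part of Picard or samtools.

```python
import re
from typing import Iterable, Iterator

# Hypothetical pattern: a trailing "/1" or "/2", optionally followed by "/".
_MATE_SUFFIX = re.compile(r"/[12]/?$")


def sanitize_read_name(name: str) -> str:
    """Remove a trailing /1, /2, /1/ or /2/ so both mates share one name."""
    return _MATE_SUFFIX.sub("", name)


def sanitize_fastq(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ lines with the mate suffix stripped from each header.

    Assumes well-formed 4-line records (header, sequence, '+', quality)
    and that any header comment follows the name after a single space.
    """
    for i, line in enumerate(lines):
        if i % 4 == 0 and line.startswith("@"):
            newline = "\n" if line.endswith("\n") else ""
            name, sep, comment = line.rstrip("\n")[1:].partition(" ")
            yield "@" + sanitize_read_name(name) + sep + comment + newline
        else:
            yield line


if __name__ == "__main__":
    r1 = "HWI-ST909_0086:3:1101:19761:56275#CGATGT/1/"
    r2 = "HWI-ST909_0086:3:1101:19761:56275#CGATGT/2/"
    print(sanitize_read_name(r2))  # HWI-ST909_0086:3:1101:19761:56275#CGATGT
    print(sanitize_read_name(r1) == sanitize_read_name(r2))  # True
```

Anchoring the pattern at the end of the name leaves the barcode (`#CGATGT`) and the colon-separated fields untouched. Names without a suffix, such as the HS13_248 format, pass through unchanged. A name that legitimately ends in "/1" or "/2" would also be altered, so the pattern may need tightening for other naming schemes.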

On Fri, Oct 17, 2014 at 3:04 PM, Richard Corbett <rcorb...@bcgsc.ca <mailto:rcorb...@bcgsc.ca>> wrote:

    Hi all,

    Our usual read name format in a BAM is something like:
    HS13_248:4:2111:2846:54933

    But we were just asked to analyse some data where the read names in
    the BAM have this format:
    HWI-ST909_0086:3:1101:19761:56275#CGATGT/2/

    I notice that while running MarkDuplicates in version 1.122, the
    progress messages suggest that Picard can't identify the pairs because
    of the terminal /1/ and /2/ in the read names:
    ...
    INFO    2014-10-17 11:55:18    MarkDuplicates    Read  114,000,000 records. Elapsed time: 00:21:07s. Time for last 1,000,000: 12s. Last read position: GL000212.1:39,902
    INFO    2014-10-17 11:55:18    MarkDuplicates    Tracking 112566758 as yet unmatched pairs. 2570 records in RAM.
    ...

    However, at the end of the run, there are still some reads properly
    marked as duplicates.

    Is it OK to ignore the messages about unmatched pairs, or should we go
    back and edit the read names in the FASTQs to ensure correct duplicate
    marking?

    thanks for all your help,
    Richard


    
------------------------------------------------------------------------------
    Comprehensive Server Monitoring with Site24x7.
    Monitor 10 servers for $9/Month.
    Get alerted through email, SMS, voice calls or mobile push
    notifications.
    Take corrective actions from your mobile device.
    http://p.sf.net/sfu/Zoho
    _______________________________________________
    Samtools-help mailing list
    Samtools-help@lists.sourceforge.net
    <mailto:Samtools-help@lists.sourceforge.net>
    https://lists.sourceforge.net/lists/listinfo/samtools-help


