On Oct 20, 2014, at 13:59, Heng Li <l...@me.com> wrote:

> I have written an optical duplicate remover,

Sorry, I meant “I haven’t written an optical duplicate remover”. BTW, it would 
be good to send the example to the list if it is not too large.

Thank you,

Heng

> but I would like to know your exact rules for identifying two reads as 
> duplicates. Having briefly skimmed OpticalDuplicateFinder.java, I see it 
> relies on a parameter “this.opticalDuplicatePixelDistance”, which is 
> expected. Are you using the same threshold?
> 
>> we do not extract the lane # from the read name, only tile, x-coordinate, 
>> and y-coordinate.
> 
> Nils, why not use lane number?
> 
> Heng
> 
> On Oct 20, 2014, at 13:49, Salzberg, Anna <asalzb...@hmc.psu.edu> wrote:
> 
>> Dear Nils,
>> 
>> I counted BY HAND the number of duplicates that share the same tile in the 
>> A.debug.L1.sam file I had already sent you (note that there’s only a single 
>> lane). The number is 12, which matches my script. However, Picard 
>> MarkDuplicates reports 25 READ_PAIR_OPTICAL_DUPLICATES, that is, 50 reads.
>> 
>> I really don’t want to be a pest; however, we find that the optical 
>> duplicates functionality is AWESOME, and we’d be extremely happy for it to 
>> work.
>> 
>> Thank you again for your help.
>> Anna
>> 
>> 
>> From: Nils Homer [mailto:nho...@broadinstitute.org] 
>> Sent: Thursday, October 16, 2014 8:41 PM
>> To: Salzberg, Anna
>> Cc: samtools-help@lists.sourceforge.net
>> Subject: Re: [Samtools-help] Reporting Bug - Optical Duplicates of Picard 
>> MarkDuplicates
>> 
>> Thanks, Anna, for the example set. I have observed a few things regarding 
>> this issue.
>> 
>> The first is that we do not extract the lane # from the read name, only 
>> tile, x-coordinate, and y-coordinate.  You can see this in the code here if 
>> you are interested: 
>> https://github.com/broadinstitute/picard/blob/master/src/java/picard/sam/markduplicates/util/OpticalDuplicateFinder.java#L84-L104
>> 
>> Secondly, we also do not retrieve either the barcode information or the 
>> library identifier from the read name, since neither is embedded in the 
>> read name. Both the barcode and the library identifier are also important 
>> to condition upon when searching for optical duplicates, or duplicates in 
>> general.
>> 
>> This brings us to the question: where *do* we expect to retrieve this 
>> information? We use the read group header lines to capture lane, barcode, 
>> library, flowcell (for Illumina) and other information for specific sets or 
>> groups of reads. If this information is given, which I recommend as a best 
>> practice, MarkDuplicates will behave as you expect. I believe it is much 
>> more robust to annotate these metadata in the header than to rely wholly on 
>> parsing read names, since read name structures do change, albeit infrequently.
>> 
>> I would recommend adding read groups to your SAM header within your 
>> pipeline.  We use FastqToSam or IlluminaBasecallsToSam to set the read group 
>> appropriately depending on our inputs.  In Picard, we also have tools like 
>> AddOrReplaceReadGroups that can help you add read groups prior to marking 
>> duplicates.
>> 
>> Nils
>> ------------------------------------------------------------------------------
>> Comprehensive Server Monitoring with Site24x7.
>> Monitor 10 servers for $9/Month.
>> Get alerted through email, SMS, voice calls or mobile push notifications.
>> Take corrective actions from your mobile device.
>> http://p.sf.net/sfu/Zoho
>> _______________________________________________
>> Samtools-help mailing list
>> Samtools-help@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/samtools-help
> 

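Regarding Heng's question about the exact rule: a minimal Python sketch of the behaviour Nils describes might look like the following. Tile, x and y are taken from the read name, the lane is ignored, and two reads are treated as optical duplicates when they share a tile and lie within a pixel distance on both axes. This is an illustration, not Picard's code: the name layout (last three colon fields are tile:x:y), the square-window comparison and the default threshold of 100 are assumptions; OpticalDuplicateFinder.java, linked in Nils's message, is the authority.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PhysicalLocation:
    tile: int
    x: int
    y: int


def parse_location(read_name: str) -> Optional[PhysicalLocation]:
    """Return tile, x and y from an Illumina-style read name.

    Assumes the last three colon-separated fields are tile:x:y, which holds
    for both "HWI-ST1234:1:2204:10531:45678#0/1" and Casava 1.8 names such as
    "EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG". The lane field
    (fourth from the end) is deliberately ignored.
    """
    parts = read_name.split()
    if not parts:
        return None
    # Drop any "#barcode" or "/1" suffix before splitting on colons.
    core = re.split(r"[#/]", parts[0], maxsplit=1)[0]
    fields = core.split(":")
    if len(fields) < 5:
        return None
    try:
        tile, x, y = (int(f) for f in fields[-3:])
    except ValueError:
        return None
    return PhysicalLocation(tile, x, y)


def is_optical_duplicate(a: PhysicalLocation, b: PhysicalLocation,
                         pixel_distance: int = 100) -> bool:
    """Same tile and within pixel_distance on both axes (assumed rule)."""
    return (a.tile == b.tile
            and abs(a.x - b.x) <= pixel_distance
            and abs(a.y - b.y) <= pixel_distance)


if __name__ == "__main__":
    a = parse_location("HWI-ST1234:1:2204:10531:45678#0/1")
    b = parse_location("HWI-ST1234:1:2204:10590:45700#0/1")  # lane 1, nearby
    c = parse_location("HWI-ST1234:2:2204:10540:45690#0/1")  # lane 2, same tile
    print(is_optical_duplicate(a, b))  # True
    print(is_optical_duplicate(a, c))  # True, because the lane is not compared
```

The second comparison shows why, without read groups, reads from different lanes that happen to share a tile number can be flagged together. With a single lane, as in Anna's file, the lane field makes no difference to this rule.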

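Nils recommends carrying lane, barcode and library information in @RG header lines rather than in read names. As a rough illustration only (not Picard's implementation), the Python sketch parses @RG lines into their TAG:VALUE fields, following the tab-delimited SAM header layout, and compares candidate optical duplicates only when they share library (LB), platform unit (PU) and tile. Using PU to encode flowcell and lane (e.g. "FC1.1"), the simplified read tuple, the 100-pixel default and the pairwise count are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Simplified, hypothetical read record: (read group ID, tile, x, y).
Read = Tuple[str, int, int, int]


def parse_read_groups(header: str) -> Dict[str, Dict[str, str]]:
    """Map each @RG ID to its TAG:VALUE fields (SAM headers are tab-delimited)."""
    groups: Dict[str, Dict[str, str]] = {}
    for line in header.splitlines():
        if not line.startswith("@RG\t"):
            continue
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        groups[fields["ID"]] = fields
    return groups


def optical_duplicate_pairs(reads: List[Read],
                            read_groups: Dict[str, Dict[str, str]],
                            pixel_distance: int = 100) -> int:
    """Count close pairs, only comparing reads with the same LB, PU and tile."""
    buckets: Dict[Tuple, List[Tuple[int, int]]] = defaultdict(list)
    for rg_id, tile, x, y in reads:
        rg = read_groups[rg_id]
        buckets[(rg.get("LB"), rg.get("PU"), tile)].append((x, y))

    pairs = 0
    for points in buckets.values():
        for i, (x1, y1) in enumerate(points):
            for x2, y2 in points[i + 1:]:
                if abs(x1 - x2) <= pixel_distance and abs(y1 - y2) <= pixel_distance:
                    pairs += 1
    return pairs


if __name__ == "__main__":
    header = "\n".join([
        "@HD\tVN:1.4\tSO:coordinate",
        "@RG\tID:A.L1\tSM:A\tLB:libA\tPU:FC1.1\tPL:ILLUMINA",
        "@RG\tID:A.L2\tSM:A\tLB:libA\tPU:FC1.2\tPL:ILLUMINA",
    ])
    rgs = parse_read_groups(header)
    same_lane = [("A.L1", 2204, 10531, 45678), ("A.L1", 2204, 10540, 45690)]
    two_lanes = [("A.L1", 2204, 10531, 45678), ("A.L2", 2204, 10540, 45690)]
    print(optical_duplicate_pairs(same_lane, rgs))  # 1
    print(optical_duplicate_pairs(two_lanes, rgs))  # 0
```

Once the two lanes carry distinct read groups, the same tile and coordinates no longer produce a match across lanes, which is the behaviour Nils expects MarkDuplicates to show when read groups are present.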