On Oct 20, 2014, at 13:59, Heng Li <l...@me.com> wrote:

> I have written an optical duplicate remover,

Sorry, I meant “I haven’t written an optical duplicate remover”. BTW, it would be good to send the example to the list if it is not too large.

Thank you,
Heng

> but I would like to know your exact rules for identifying two reads as
> duplicates. From a brief skim of OpticalDuplicateFinder.java, it relies on a
> parameter “this.opticalDuplicatePixelDistance”, which is expected. Are you
> using the same threshold?
>
>> we do not extract the lane # from the read name, only tile, x-coordinate,
>> and y-coordinate.
>
> Nils, why not use the lane number?
>
> Heng
>
> On Oct 20, 2014, at 13:49, Salzberg, Anna <asalzb...@hmc.psu.edu> wrote:
>
>> Dear Nils,
>>
>> I counted BY HAND the number of duplicates that have the same tile in the
>> A.debug.L1.sam file I had already sent you (note that there’s only a single
>> lane). The number is 12, which matches my script. However, Picard
>> MarkDuplicates is reporting 25 READ_PAIR_OPTICAL_DUPLICATES, that is, 50 reads.
>>
>> I really don’t want to be a pest; however, we find the optical duplicates
>> functionality AWESOME, and we’d be extremely happy for it to work.
>>
>> Thank you again for your help.
>> Anna
>>
>>
>> From: Nils Homer [mailto:nho...@broadinstitute.org]
>> Sent: Thursday, October 16, 2014 8:41 PM
>> To: Salzberg, Anna
>> Cc: samtools-help@lists.sourceforge.net
>> Subject: Re: [Samtools-help] Reporting Bug - Optical Duplicates of Picard
>> MarkDuplicates
>>
>> Thanks, Anna, for the example set. I have observed a few things regarding
>> this issue.
>>
>> The first is that we do not extract the lane # from the read name, only
>> tile, x-coordinate, and y-coordinate.
>> You can see this in the code here if you are interested:
>> https://github.com/broadinstitute/picard/blob/master/src/java/picard/sam/markduplicates/util/OpticalDuplicateFinder.java#L84-L104
>>
>> Secondly, we also do not retrieve the barcode or the library identifier
>> from the read name, since neither is embedded in the read name. Both the
>> barcode and the library identifier are also important to condition upon
>> when searching for optical duplicates, or duplicates in general.
>>
>> This raises the question: where *do* we expect to retrieve this
>> information? We use the read group header lines to capture lane, barcode,
>> library, flowcell (for Illumina), and other information for specific sets
>> or groups of reads. If this information is given, which I recommend as a
>> best practice, MarkDuplicates will behave as you expect. I believe it is
>> much more robust to annotate these metadata in the header than to rely
>> wholly on parsing read names, since read name structures do change, albeit
>> infrequently.
>>
>> I would recommend adding read groups to your SAM header within your
>> pipeline. We use FastqToSam or IlluminaBasecallsToSam to set the read group
>> appropriately depending on our inputs. In Picard, we also have tools like
>> AddOrReplaceReadGroups that can help you add read groups prior to marking
>> duplicates.
>>
>> Nils
>> ------------------------------------------------------------------------------
>> Comprehensive Server Monitoring with Site24x7.
>> Monitor 10 servers for $9/Month.
>> Get alerted through email, SMS, voice calls or mobile push notifications.
>> Take corrective actions from your mobile device.
>> http://p.sf.net/sfu/Zoho
>> _______________________________________________
>> Samtools-help mailing list
>> Samtools-help@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/samtools-help
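
The rule discussed in this thread can be sketched in Python. This is a simplified illustration under stated assumptions, not Picard's implementation. It assumes Casava 1.8-style read names whose last three colon-separated fields are tile, x and y. It uses the read group as the stand-in for lane, library and barcode, as Nils recommends. It treats two reads in the same duplicate set as optical duplicates when they share a read group and tile and lie within `max_distance` pixels of each other in both x and y. The names `Location`, `parse_location` and `flag_optical_duplicates` are hypothetical, and the default of 100 pixels is only an illustrative value for the threshold Heng asks about (`opticalDuplicatePixelDistance`).

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# ASSUMPTION: Casava 1.8-style read names, e.g.
#   "M00123:45:000000000-ABCDE:1:1101:15589:1331"
# i.e. instrument:run:flowcell:lane:tile:x:y. This pattern is illustrative,
# not Picard's actual read-name regex.
_NAME_RE = re.compile(r"^[^:]+:\d+:[^:]+:(\d+):(\d+):(\d+):(\d+)$")


@dataclass(frozen=True)
class Location:
    """Hypothetical container for the fields compared between two reads."""
    read_group: str  # stands in for lane/library/barcode, per Nils's advice
    tile: int
    x: int
    y: int


def parse_location(read_name: str, read_group: str) -> Optional[Location]:
    """Take tile/x/y from the read name; lane etc. come from the read group."""
    m = _NAME_RE.match(read_name)
    if m is None:
        return None  # name does not follow the assumed format
    _lane, tile, x, y = (int(g) for g in m.groups())  # lane deliberately unused
    return Location(read_group, tile, x, y)


def flag_optical_duplicates(locs: List[Location], max_distance: int = 100) -> List[bool]:
    """Given reads already judged duplicates by position, flag the optical ones.

    A read is flagged if an earlier read in the list shares its read group and
    tile and lies within max_distance pixels in both x and y. The default of
    100 is an illustrative value only.
    """
    flags = [False] * len(locs)
    for i, a in enumerate(locs):
        for j in range(i + 1, len(locs)):
            b = locs[j]
            if (a.read_group == b.read_group and a.tile == b.tile
                    and abs(a.x - b.x) <= max_distance
                    and abs(a.y - b.y) <= max_distance):
                flags[j] = True
    return flags


# Usage: three reads from one duplicate set (hypothetical names and read group).
reads = [
    ("M00123:45:000000000-ABCDE:1:1101:15589:1331", "rg.L1"),
    ("M00123:45:000000000-ABCDE:1:1101:15620:1390", "rg.L1"),  # same tile, close by
    ("M00123:45:000000000-ABCDE:1:2104:15600:1340", "rg.L1"),  # different tile
]
locs = [parse_location(name, rg) for name, rg in reads]
print(flag_optical_duplicates(locs))  # [False, True, False]
```

In this sketch, if every read is given the same read group (as in a file without @RG lines), only tile and coordinates separate clusters, even across lanes. Distinct read groups, for example added with AddOrReplaceReadGroups, are what let lane and library enter the comparison.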