Hey Anna,

could you provide me with the inputs (i.e BAM and FASTA) to the
MarkDuplicates tool so we can reproduce and debug?  Having a reduced test
case would be ideal as I cannot debug easily from the information you
provided.  Thanks!

Nils

On Thu, Oct 16, 2014 at 1:10 PM, Salzberg, Anna <asalzb...@hmc.psu.edu>
wrote:

>  Nils,
>
>
>
> I have identified a bug with optical duplicates that is not fixed in
> picard v 1.122.
>
>
>
> In particular, I ran bwa and then picard MarkDuplicates on lane 1 and lane
> 2 files separately, and I then did the same for the merged fastq files.  I
> got the picard MarkDuplicates results below.  Note that the overall %
> duplication is higher in the merged file than in the individual lane ones
> (to be expected), but that the OPTICAL duplicates is also higher, that is,
> it is more than the sum of the OPTICAL duplicates of lanes 1 and 2 (NOT to
> be expected).
>
>
>
> In other words, it is expected that the overall duplication is higher than
> the sum of the ones in lanes 1 and 2 due to interaction between the reads
> of lanes 1 and 2 (for ex., read1 from lane1 could have no duplicates in
> lane1, and read2 from lane2 could have no duplicate in lane2 but in the
> merged file they could be duplicates of each other).  However, by
> definition, there should be no interaction in the OPTICAL duplicates
> between lanes 1 and 2, and therefore the results below seem wrong.  My
> suspicion is that the MarkDuplicates optical duplication algorithm is not
> checking for the lane.
>
>
>
> Sample                 LIBRARY               UNPAIRED_READS_EXAMINED
> READ_PAIRS_EXAMINED             UNMAPPED_READS
> UNPAIRED_READ_DUPLICATES READ_PAIR_DUPLICATES
> READ_PAIR_OPTICAL_DUPLICATES
> PERCENT_DUPLICATION               ESTIMATED_LIBRARY_SIZE
>
> A_merged          Unknown Library             497938  21072701
> 939669  326848  9821728                3875890                0.46831
>                 18726206
>
> A_L1                      Unknown Library             247592
> 10518632             458565  141958  3109570
> 1074927                0.298856                18640119
>
> A_L2                      Unknown Library             250347
> 10554068             481105  147968  3130326
> 1080576                0.30005                 18605087
>
>
>
>
>
> Thank you for looking into this!
> Anna
>
>
>
> *From:* Salzberg, Anna
> *Sent:* Friday, October 10, 2014 11:33 AM
> *To:* 'Nils Homer'
> *Cc:* 'samtools-help@lists.sourceforge.net'
> *Subject:* RE: [Samtools-help] Reporting Bug - Optical Duplicates of
> Picard MarkDuplicates
>
>
>
> Nils,
>
>
>
> I see I made a mistake in my countTileDups script, in that I report the
> wrong variable in the end!!  The number of dups in the same tile is in fact
> > than you reported optical duplicates.
>
>
>
> So please ignore my request for now.
>
>
>
> Thank you,
>
> Anna
>
>
>
> *From:* Salzberg, Anna
> *Sent:* Friday, October 10, 2014 10:26 AM
> *To:* 'Nils Homer'
> *Cc:* samtools-help@lists.sourceforge.net
> *Subject:* RE: [Samtools-help] Reporting Bug - Optical Duplicates of
> Picard MarkDuplicates
>
>
>
> Dear Nils,
>
>
>
> I installed Picard 1.122, and the number of optical duplicates was reduced
> by over 25% (the estimated library size was also different in the new
> version).
>
>
>
> Unfortunately, I still think that there is a bug with the number of
> optical duplicates, as simply counting the number of duplicates that have
> the same tile results in 3 orders of magnitude less than the MarkDuplicates
> optical duplicates count.
>
>
>
>
>
> I would *greatly* appreciate if you could look into this as this is super
> important to my lab.  I have provided in my previous email 2 scripts; one
> of them is a very simple script (only a few lines) that simply counts
> duplicates with the same tile.
>
>
>
> Thank you very much for your help with this issue.
>
>
>
> Anna
>
>
>
> *From:* Nils Homer [mailto:nho...@broadinstitute.org
> <nho...@broadinstitute.org>]
> *Sent:* Thursday, October 09, 2014 4:34 PM
> *To:* Salzberg, Anna
> *Cc:* samtools-help@lists.sourceforge.net
> *Subject:* Re: [Samtools-help] Reporting Bug - Optical Duplicates of
> Picard MarkDuplicates
>
>
>
> I am replying to the list so others can benefit from our discussion.
>
>
>
> The latest Picard release to support updated Illumina read names is 1.120
> while your install is 1.99.  You will need to update to this version or the
> latest version to get the benefit of this update.
>
>
>
> Nils
>
>
>
> On Thu, Oct 9, 2014 at 4:08 PM, Nils Homer <nho...@broadinstitute.org>
> wrote:
>
> Could you tell us what version of Picard you are using?  There was an
> issue earlier with parsing read names from newer Illumina analysis software.
>
>
>
> Nils
>
>
>
> On Thu, Oct 9, 2014 at 3:00 PM, Salzberg, Anna <asalzb...@hmc.psu.edu>
> wrote:
>
>   Hello,
>
>
>
> I am convinced that the optical duplicates count of the Picard
> MarkDuplicates command is incorrect.  When I wrote a script to  detect
> optical duplicates in my dataset, I got only ~1k optical duplicates as
> opposed to MarkDuplicates ~3 million.  I think the problem with
> MarkDuplicates is tile related because I then wrote a super simple script
> that simply counts how many duplicates share the same tile, and that was <
> 4k, that is, 3 orders of magnitudes less than MarkDuplicates!  The overall
> number of duplicates (opticals or otherwise) matched (~7 million).  I'm
> convinced my script is right, as it's so simple.
>
>
>
> Remove optical duplicates script:
>
> https://gist.github.com/annasa/eef7c30152ac296bb49b
>
>
>
> Count duplicates in same tile:
>
> https://gist.github.com/annasa/f5633eecf012153a3ff2
>
> Both scripts take as input a sam file sorted on chr and startPos. They
> also assume that when the sequence name is parsed by ":" then the tile is
> the 5th field, x the 6th and y the 7th (e.g.
> HWI-ST1318:119:H89A3ADXX:1:2209:1705:6933, where tile is '2209', x is
> '1705' and y is'6933').  Finally, they assume that the file is for a single
> lane, as I was working with such files.
>
> This is VERY important for my lab.  Please advise as soon as you can.
>
> Thank you,
>
> Anna
>
>
>
>
>
>
> ------------------------------------------------------------------------------
> Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
> Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS Reports
> Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White paper
> Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer
>
> http://pubads.g.doubleclick.net/gampad/clk?id=154622311&iu=/4140/ostg.clktrk
> _______________________________________________
> Samtools-help mailing list
> Samtools-help@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/samtools-help
>
>
>
>
>
------------------------------------------------------------------------------
Comprehensive Server Monitoring with Site24x7.
Monitor 10 servers for $9/Month.
Get alerted through email, SMS, voice calls or mobile push notifications.
Take corrective actions from your mobile device.
http://p.sf.net/sfu/Zoho
_______________________________________________
Samtools-help mailing list
Samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help

Reply via email to