Ah, that makes perfect sense.
I should have looked at the source myself; I was just too lazy.

Thanks Nils
Louis

On 14-10-09 12:06 PM, Nils Homer wrote:
> Hey Louis,
> 
> I have copied the documentation from the source code, which I hope answers 
> your questions.  Please let me know if you need more clarification.
> 
> 
>     This tool will also not work with alignments that have large gaps or 
> skips, such as those from RNA-seq data.  This is due to the need to buffer 
> small genomic windows to ensure integrity of the duplicate marking, while 
> large skips (ex. skipping introns) in the alignment records would force 
> making that window very large, thus exhausting memory.
> 
> source: 
> https://github.com/broadinstitute/picard/blob/master/src/java/picard/sam/markduplicates/MarkDuplicatesWithMateCigar.java
> 
> Long story short: coordinate-sorted SAM files are ordered by the leftmost 
> alignment start position, while duplicate marking looks at the position where 
> sequencing started (the unclipped 5' end of the read).  For reverse-strand 
> reads that is the far end of the alignment, so it is significantly affected 
> by soft clipping and skips in the alignment.
> 
> N
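
To make the point above concrete, here is a rough sketch of how the duplicate-marking coordinate can drift away from the sort coordinate. It is not Picard's code: the function name `five_prime_position` and the example reads are made up for illustration, and it assumes that both soft (S) and hard (H) clips count toward the unclipped position.

```python
import re

# One CIGAR element: a length followed by an operator letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operators that advance along the reference
CLIPS = set("SH")             # soft and hard clips (assumed to both count)


def five_prime_position(pos, cigar, reverse):
    """Return the unclipped 5' reference coordinate of a read.

    pos     -- 1-based leftmost alignment start (the coordinate sort key)
    cigar   -- CIGAR string, e.g. "5S95M"
    reverse -- True if the read aligns to the reverse strand
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if not reverse:
        # Forward strand: the 5' end is the start, pushed left by leading clips.
        leading = 0
        for n, op in ops:
            if op not in CLIPS:
                break
            leading += n
        return pos - leading
    # Reverse strand: the 5' end is the alignment end, pushed right by
    # trailing clips. Reference skips (N) are part of the reference span.
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    trailing = 0
    for n, op in reversed(ops):
        if op not in CLIPS:
            break
        trailing += n
    return pos + ref_len - 1 + trailing


# Hypothetical reads that all sort at position 1000.
dna_forward = five_prime_position(1000, "5S95M", reverse=False)
dna_reverse = five_prime_position(1000, "95M5S", reverse=True)
rna_reverse = five_prime_position(1000, "50M10000N50M", reverse=True)

print(dna_forward, dna_reverse, rna_reverse)
```

A single-pass tool has to keep reads buffered until no later record can share their 5' coordinate. In this sketch the DNA reverse read sits about 100 bases past its sort position, while the spliced read with a 10 kb skip sits about 10,000 bases past it, so the buffered window has to grow accordingly.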
> 
> On Thu, Oct 9, 2014 at 11:41 AM, Louis Letourneau 
> <louis.letourn...@mail.mcgill.ca <mailto:louis.letourn...@mail.mcgill.ca>> 
> wrote:
> 
>     I'm curious as to why MarkDuplicatesWithMateCigar has the "This tool 
> cannot be used with alignments that have large gaps or reference skips, which 
> happens frequently in RNA-seq data." limitation?
> 
>     Thanks
>     Louis
> 
>     On 14-10-08 11:25 AM, George Grant wrote:
>     > Picard Release 1.122
>     > 8 October 2014
>     >
>     > - New Command Line Program "GenotypeConcordance"
>     >     -- Calculates the concordance between genotype data for two samples 
> in two different VCFs - one being considered the truth (or reference) the 
> other being considered the call.  The concordance is broken into separate 
> results sections for SNPs and indels.  Summary and detailed statistics are 
> reported.
>     >     Note that for any pair of variants to compare, only the alleles for 
> the samples under interrogation are considered and MNP, Symbolic, and Mixed 
> classes of variants are not included.
>     >
>     > - New Command Line Program "UpdateVcfDictionary"
>     >     -- Updates the sequence dictionary of a VCF from another file (SAM, 
> BAM, VCF, dictionary, interval_list, fasta, etc).
>     >
>     > - New Command Line Program "VcfToIntervalList"
>     >     -- Create an interval list from a VCF
>     >
>     > - New Command Line Program "MarkDuplicatesWithMateCigar"
>     >     -- A new tool with which to mark duplicates:
>     >     This tool can replace MarkDuplicates if the input SAM/BAM has Mate 
> CIGAR (MC) optional tags
>     >     pre-computed (see the tools 
> RevertOriginalBaseQualitiesAndAddMateCigar and
>     >     FixMateInformation).  This allows the new tool to perform a 
> streaming duplicate
>     >     marking routine (i.e. a single-pass).  This tool cannot be used with
>     >     alignments that have large gaps or reference skips, which happens
>     >     frequently in RNA-seq data.
>     >
>     >     There were many refactors of the old MarkDuplicates and
>     >     MarkDuplicatesWithMateCigar, since they share common code.
>     >     EstimateLibraryComplexity was caught up in this too.
>     >
>     >     Many, many, many unit tests were added to prove
>     >     equivalency of MarkDuplicatesWithMateCigar to MarkDuplicates.  This 
> also
>     >     exposed a few one in a million corner cases in MarkDuplicates both 
> in
>     >     duplicate marking as well as optical duplicate detection.  This 
> results
>     >     in MarkDuplicates needing to write slightly larger temporary files 
> when
>     >     running.  SamFileTester was also improved to handle the various test
>     >     cases for duplicate marking testing.
>     >
>     > - Updates to IntervalList:
>     >     -- Added capacity to create a simple interval list from a string 
> (the name of the contig)
>     >     -- Added the capacity to subtract one interval list from another 
> (currently
>     >        it would only work if they were both wrapped inside a container)
>     >
>     > - Updates to SamLocusIterator
>     >     -- Performance optimizations gaining about 35% speed up...
>     >
>     > - Updates to MarkDuplicates:
>     >     -- Removed unnecessary storage of a string in the Read Ends in Mark
>     >     -- Clarified the size of ReadEndsForMarkDuplicates
>     >
>     > - Changed to one the minimum number of times that BAIT_INTERVALS (in 
> CalculateHsMetrics) and TARGET_INTERVALS (in CollectTargetedMetrics) must be 
> specified.
>     >
>     > - Moved CollectHiSeqPfFailMetrics into picard public
>     >
>     > - Updates to documentation generation (internal):
>     >     -- changed link to IntervalList.java documentation
>     >     -- updated how _includes/command-line-usage.html is generated
>     >
>     > - Moved SAMSequenceDictionaryExtractor and tests from picard to htsjdk
>     >
>     > - George
>     >
>     >
>     >
>     > 
> ------------------------------------------------------------------------------
>     > Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
>     > Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS Reports
>     > Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White paper
>     > Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer
>     > 
> http://pubads.g.doubleclick.net/gampad/clk?id=154622311&iu=/4140/ostg.clktrk
>     >
>     >
>     >
>     > _______________________________________________
>     > Samtools-help mailing list
>     > Samtools-help@lists.sourceforge.net 
> <mailto:Samtools-help@lists.sourceforge.net>
>     > https://lists.sourceforge.net/lists/listinfo/samtools-help
>     >
> 
> 
> 
