Ah, that makes perfect sense. I should have looked at the source myself; I was too lazy.
Thanks, Nils

Louis

On 14-10-09 12:06 PM, Nils Homer wrote:
> Hey Louis,
>
> I have copied the documentation from the source code, which I hope
> answers your questions. Please let me know if you need more
> clarification.
>
> This tool will also not work with alignments that have large gaps or
> skips, such as those from RNA-seq data. This is due to the need to
> buffer small genomic windows to ensure integrity of the duplicate
> marking, while large skips (ex. skipping introns) in the alignment
> records would force making that window very large, thus exhausting
> memory.
>
> source:
> https://github.com/broadinstitute/picard/blob/master/src/java/picard/sam/markduplicates/MarkDuplicatesWithMateCigar.java
>
> Long story short is that SAM files are sorted by the 5' alignment
> start position, while for duplicate marking we look at the 5'
> sequencing start position, with the latter significantly affected by
> soft clipping and skips in the alignment.
>
> N
>
> On Thu, Oct 9, 2014 at 11:41 AM, Louis Letourneau
> <louis.letourn...@mail.mcgill.ca> wrote:
>
> > I'm curious as to why MarkDuplicatesWithMateCigar has the "This tool
> > cannot be used with alignments that have large gaps or reference
> > skips, which happens frequently in RNA-seq data." limitation?
> >
> > Thanks
> > Louis
> >
> > On 14-10-08 11:25 AM, George Grant wrote:
> > > Picard Release 1.122
> > > 8 October 2014
> > >
> > > - New Command Line Program "GenotypeConcordance"
> > > -- Calculates the concordance between genotype data for two
> > >    samples in two different VCFs - one being considered the truth
> > >    (or reference) the other being considered the call. The
> > >    concordance is broken into separate results sections for SNPs
> > >    and indels. Summary and detailed statistics are reported.
> > >    Note that for any pair of variants to compare, only the alleles
> > >    for the samples under interrogation are considered and MNP,
> > >    Symbolic, and Mixed classes of variants are not included.
> > >
> > > - New Command Line Program "UpdateVcfDictionary"
> > > -- Updates the sequence dictionary of a VCF from another file (SAM,
> > >    BAM, VCF, dictionary, interval_list, fasta, etc).
> > >
> > > - New Command Line Program "VcfToIntervalList"
> > > -- Create an interval list from a VCF
> > >
> > > - New Command Line Program "MarkDuplicatesWithMateCigar"
> > > -- A new tool with which to mark duplicates:
> > >    This tool can replace MarkDuplicates if the input SAM/BAM has
> > >    Mate CIGAR (MC) optional tags pre-computed (see the tools
> > >    RevertOriginalBaseQualitiesAndAddMateCigar and
> > >    FixMateInformation). This allows the new tool to perform a
> > >    streaming duplicate marking routine (i.e. a single-pass). This
> > >    tool cannot be used with alignments that have large gaps or
> > >    reference skips, which happens frequently in RNA-seq data.
> > >
> > >    There were many refactors of the old MarkDuplicates and
> > >    MarkDuplicatesWithMateCigar, since they share common code.
> > >    EstimateLibraryComplexity was caught up in this too.
> > >
> > >    Many, many, many unit tests were added to prove equivalency of
> > >    MarkDuplicatesWithMateCigar to MarkDuplicates. This also exposed
> > >    a few one in a million corner cases in MarkDuplicates both in
> > >    duplicate marking as well as optical duplicate detection. This
> > >    results in MarkDuplicates needing to write slightly larger
> > >    temporary files when running. SamFileTester was also improved
> > >    to handle the various test cases for duplicate marking testing.
> > >
> > > - Updates to IntervalList:
> > > -- Added capacity to create a simple interval list from a string
> > >    (the name of the contig)
> > > -- Added the capacity to subtract one interval list from another
> > >    (currently it would only work if they were both wrapped inside
> > >    a container)
> > >
> > > - Updates to SamLocusIterator
> > > -- Performance optimizations gaining about 35% speed up...
> > >
> > > - Updates to MarkDuplicates:
> > > -- Removed unnecessary storage of a string in the Read Ends in Mark
> > > -- Clarified the size of ReadEndsForMarkDuplicates
> > >
> > > - Updated the minimum number of times that the BAIT_INTERVALS (in
> > >   CalculateHsMetrics) and TARGET_INTERVALS (in
> > >   CollectTargetedMetrics) must be set to one.
> > >
> > > - Moved CollectHiSeqPfFailMetrics into picard public
> > >
> > > - Updates to documentation generation (internal):
> > > -- changed link to IntervalList.java documentation
> > > -- updated how _includes/command-line-usage.html is generated
> > >
> > > - Moved SAMSequenceDictionaryExtractor and tests from picard to
> > >   htsjdk
> > >
> > > - George
> > >
> > > _______________________________________________
> > > Samtools-help mailing list
> > > Samtools-help@lists.sourceforge.net
> > > https://lists.sourceforge.net/lists/listinfo/samtools-help
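
Nils's point about sort order versus the 5' sequencing start position can be sketched roughly as follows. This is a minimal illustration, assuming the usual SAM conventions (1-based POS, CIGAR operators M, I, D, N, S, H, P, =, X). The function name `five_prime_position` is hypothetical and this is not Picard's actual code; it only shows why a reverse-strand read with a long N skip can have its 5' end far downstream of the coordinate it is sorted by.

```python
import re

# One CIGAR element: a length followed by a single operator character.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operators that advance along the reference
CLIPS = set("SH")             # soft and hard clips


def five_prime_position(pos, cigar, reverse):
    """Return the unclipped 5' reference coordinate (1-based) of a read.

    pos is the leftmost aligned position (the SAM POS field), which is
    what a coordinate-sorted file is ordered by. Hypothetical helper for
    illustration only.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]

    if not reverse:
        # Forward strand: the 5' end is on the left, so step back over
        # any leading clipped bases.
        lead = 0
        for n, op in ops:
            if op not in CLIPS:
                break
            lead += n
        return pos - lead

    # Reverse strand: the 5' end is on the right, so span everything the
    # alignment covers on the reference (including N skips), then step
    # forward over any trailing clipped bases.
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    trail = 0
    for n, op in reversed(ops):
        if op not in CLIPS:
            break
        trail += n
    return pos + ref_len - 1 + trail


if __name__ == "__main__":
    # All three reads share POS 1000, so they sit together in a sorted file.
    for cigar, reverse in [("5S95M", False), ("100M", True), ("50M20000N50M", True)]:
        five_prime = five_prime_position(1000, cigar, reverse)
        print(cigar, "reverse" if reverse else "forward", five_prime, five_prime - 1000)
```

With a 20 kb skip, the reverse-strand read's 5' end lands roughly 20 kb past its sort position, so a single-pass approach would presumably need to buffer at least that much of the genome before it could settle which reads are duplicates. That is consistent with the memory concern quoted from the source documentation.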