Thanks for your ideas, Alec! Unfortunately, it didn't work: I get the same error, only much sooner. When I give it 4 GB of RAM, it gets to 10 million reads and then quits with the same error.

$ java -Xmx2g -jar EstimateLibraryComplexity.jar INPUT=file.bam OUTPUT=file.libraryComplexity2 READ_NAME_REGEX=null

[Tue Jun 10 09:37:06 CEST 2014] picard.sam.EstimateLibraryComplexity INPUT=[file.bam] OUTPUT=file.libraryComplexity READ_NAME_REGEX=null MIN_IDENTICAL_BASES=5 MAX_DIFF_RATE=0.03 MIN_MEAN_QUALITY=20 MAX_GROUP_RATIO=500 OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Tue Jun 10 09:37:06 CEST 2014] Executing as me@work on Linux 3.6.2-1.fc16.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_07-b10; Picard version: 1.114(444810c1de1433d9eca8130be63ccc7fd70a9499_1400593393) JdkDeflater
INFO 2014-06-10 09:37:06 EstimateLibraryComplexity Will store 3098916 read pairs in memory before sorting.
INFO 2014-06-10 09:37:11 EstimateLibraryComplexity Read 1,000,000 records. Elapsed time: 00:00:04s. Time for last 1,000,000: 4s. Last read position: chr10:38,239,480
INFO 2014-06-10 09:37:15 EstimateLibraryComplexity Read 2,000,000 records. Elapsed time: 00:00:09s. Time for last 1,000,000: 4s. Last read position: chr10:73,576,914
INFO 2014-06-10 09:37:21 EstimateLibraryComplexity Read 3,000,000 records. Elapsed time: 00:00:14s. Time for last 1,000,000: 5s. Last read position: chr10:95,461,504
INFO 2014-06-10 09:37:30 EstimateLibraryComplexity Read 4,000,000 records. Elapsed time: 00:00:23s. Time for last 1,000,000: 9s. Last read position: chr10:104,953,620
INFO 2014-06-10 09:37:43 EstimateLibraryComplexity Read 5,000,000 records. Elapsed time: 00:00:36s. Time for last 1,000,000: 12s. Last read position: chr10:134,596,780
[Tue Jun 10 09:46:39 CEST 2014] picard.sam.EstimateLibraryComplexity done. Elapsed time: 9.54 minutes.
Runtime.totalMemory()=1034944512
To get help, see http://picard.sourceforge.net/index.shtml#GettingHelp
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.lang.String.<init>(String.java:315)
    at htsjdk.samtools.util.StringUtil.bytesToString(StringUtil.java:301)
    at htsjdk.samtools.BAMRecord.decodeReadName(BAMRecord.java:331)
    at htsjdk.samtools.BAMRecord.getReadName(BAMRecord.java:220)
    at htsjdk.samtools.SAMUtils.validateCigar(SAMUtils.java:864)
    at htsjdk.samtools.SAMRecord.validateCigar(SAMRecord.java:1381)
    at htsjdk.samtools.BAMRecord.getCigar(BAMRecord.java:247)
    at htsjdk.samtools.SAMRecord.getAlignmentEnd(SAMRecord.java:460)
    at htsjdk.samtools.SAMRecord.computeIndexingBin(SAMRecord.java:1235)
    at htsjdk.samtools.SAMRecord.isValid(SAMRecord.java:1609)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:642)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:628)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:598)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:514)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:488)
    at picard.sam.EstimateLibraryComplexity.doWork(EstimateLibraryComplexity.java:246)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:183)
    at picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:124)
    at picard.sam.EstimateLibraryComplexity.main(EstimateLibraryComplexity.java:217)

On Jun 9, 2014, at 5:39 PM, Alec Wysoker <al...@broadinstitute.org> wrote:

> Hi David,
>
> Unfortunately, the more memory you give to this program, the more it tries to
> hold in RAM. There should be a way to control this but currently there
> isn't. Two things you might try:
>
> • reduce the -Xmx value. E.g. try -Xmx2g. I know this is
> counterintuitive but as I mentioned above this will reduce the amount of RAM
> the program decides to use.
> • Add the command-line argument READ_NAME_REGEX=null. This may cause
> some inflation of the library size estimate if you have optical duplicates,
> because optical duplicate detection will be disabled. But since your stack
> trace below indicates that you are running out of memory while parsing the
> physical location information, disabling this feature might get the program
> to run.
>
> Let me know how it goes.
>
> -Alec
>
> On Jun 5, 2014, at 9:41 AM, David Langenberger <david.langenber...@gmail.com>
> wrote:
>
>> Dear Samtools-help list members,
>>
>> I want to run EstimateLibraryComplexity.jar on a 9.8 GB BAM file, but I
>> always get an OutOfMemoryError. I already tried -Xmx (up to 60 GB) and
>> still get the error. Does anybody have an idea of how to run
>> EstimateLibraryComplexity on bigger BAM files?
>>
>>
>> This is my call and the error message:
>> =============================
>>
>> $ java -Xmx10g -jar EstimateLibraryComplexity.jar INPUT=file.bam OUTPUT=file.libraryComplexity
>>
>> [Wed Jun 04 21:43:08 CEST 2014] picard.sam.EstimateLibraryComplexity INPUT=[file.bam] OUTPUT=file.libraryComplexity MIN_IDENTICAL_BASES=5 MAX_DIFF_RATE=0.03 MIN_MEAN_QUALITY=20 MAX_GROUP_RATIO=500 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
>> [Wed Jun 04 21:43:08 CEST 2014] Executing as me@work on Linux 3.6.2-1.fc16.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_07-b10; Picard version: 1.114(444810c1de1433d9eca8130be63ccc7fd70a9499_1400593393) JdkDeflater
>> INFO 2014-06-04 21:43:08 EstimateLibraryComplexity Will store 15494157 read pairs in memory before sorting.
>> INFO 2014-06-04 21:43:13 EstimateLibraryComplexity Read 1,000,000 records. Elapsed time: 00:00:05s. Time for last 1,000,000: 5s.
>> Last read position: chr10:38,239,480
>>
>> ....
>>
>> INFO 2014-06-04 21:53:21 EstimateLibraryComplexity Read 30,000,000 records. Elapsed time: 00:10:13s. Time for last 1,000,000: 183s. Last read position: chr15:34,522,127
>>
>> [Wed Jun 04 22:54:26 CEST 2014] picard.sam.EstimateLibraryComplexity done. Elapsed time: 71.30 minutes.
>> Runtime.totalMemory()=5801312256
>> To get help, see http://picard.sourceforge.net/index.shtml#GettingHelp
>> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>>     at java.util.Arrays.copyOfRange(Arrays.java:2694)
>>     at java.lang.String.<init>(String.java:203)
>>     at java.lang.String.substring(String.java:1913)
>>     at htsjdk.samtools.util.StringUtil.split(StringUtil.java:89)
>>     at picard.sam.AbstractDuplicateFindingAlgorithm.addLocationInformation(AbstractDuplicateFindingAlgorithm.java:71)
>>     at picard.sam.EstimateLibraryComplexity.doWork(EstimateLibraryComplexity.java:256)
>>     at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:183)
>>     at picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:124)
>>     at picard.sam.EstimateLibraryComplexity.main(EstimateLibraryComplexity.java:217)
>>
>>
>>
>> And that's the java version:
>> =====================
>>
>> $ java -showversion
>> java version "1.7.0_07"
>> Java(TM) SE Runtime Environment (build 1.7.0_07-b10)
>> Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
>>
>>
>> I also tried ValidateSamFile.jar:
>> ========================
>>
>>
>> $ java -jar /scr/k41san/tools/picard/picard-tools-1.114/ValidateSamFile.jar INPUT=file.bam MODE=SUMMARY
>>
>> [Thu Jun 05 12:12:17 CEST 2014] picard.sam.ValidateSamFile INPUT=file.bam MODE=SUMMARY MAX_OUTPUT=100 IGNORE_WARNINGS=false VALIDATE_INDEX=true IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false
>> CREATE_MD5_FILE=false
>> [Thu Jun 05 12:12:17 CEST 2014] Executing as me@work on Linux 3.6.2-1.fc16.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_07-b10; Picard version: 1.114(444810c1de1433d9eca8130be63ccc7fd70a9499_1400593393) JdkDeflater
>> INFO 2014-06-05 12:13:18 SamFileValidator Validated Read 10,000,000 records. Elapsed time: 00:01:00s. Time for last 10,000,000: 60s. Last read position: chr11:67,275,063
>> INFO 2014-06-05 12:14:36 SamFileValidator Validated Read 20,000,000 records. Elapsed time: 00:02:18s. Time for last 10,000,000: 77s. Last read position: chr12:112,229,147
>> INFO 2014-06-05 12:15:45 SamFileValidator Validated Read 30,000,000 records. Elapsed time: 00:03:27s. Time for last 10,000,000: 69s. Last read position: chr15:34,522,127
>> INFO 2014-06-05 12:18:05 SamFileValidator Validated Read 40,000,000 records. Elapsed time: 00:05:47s. Time for last 10,000,000: 140s. Last read position: chr16:56,362,603
>> INFO 2014-06-05 12:20:07 SamFileValidator Validated Read 50,000,000 records. Elapsed time: 00:07:49s. Time for last 10,000,000: 121s. Last read position: chr17:65,979,420
>> INFO 2014-06-05 12:21:11 SamFileValidator Validated Read 60,000,000 records. Elapsed time: 00:08:53s. Time for last 10,000,000: 64s. Last read position: chr19:38,049,399
>> INFO 2014-06-05 12:27:34 SamFileValidator Validated Read 70,000,000 records. Elapsed time: 00:15:16s. Time for last 10,000,000: 383s. Last read position: chr1:43,396,405
>> INFO 2014-06-05 12:48:18 SamFileValidator Validated Read 80,000,000 records. Elapsed time: 00:36:00s. Time for last 10,000,000: 1,243s. Last read position: chr1:246,706,542
>>
>> >>>> Still running 2014-06-05 15:37
>>
>>
>> I also posted the problem at Biostars (https://www.biostars.org/p/102538/)
>> and SEQanswers (http://seqanswers.com/forums/showthread.php?t=43910).
>>
>>
>> Thanks for your help,
>> David Langenberger
>> _______________________________________________
>> Samtools-help mailing list
>> Samtools-help@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/samtools-help
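
The two runs in this thread log "Will store 3098916 read pairs in memory before sorting" under -Xmx2g and "Will store 15494157 read pairs" under -Xmx10g, about five times as many pairs for five times the heap. That fits Alec's remark that the program tries to hold more in RAM the more memory it is given. Below is a minimal Java sketch of a sizing rule of that kind, assuming the buffer is a fixed share of the JVM's maximum heap divided by a per-pair cost. The class name, the one-half share, and the 300-byte cost are illustrative assumptions, not Picard's actual code.

```java
// Hypothetical sketch, not Picard's implementation: shows how a buffer that is
// sized from the maximum heap grows in step with -Xmx.
final class ReadPairBudget {

    // Assumed rough footprint of one buffered read pair, in bytes (illustrative only).
    static final long ASSUMED_BYTES_PER_PAIR = 300L;

    // Number of read pairs to hold in memory if half of the heap is budgeted for them.
    static long pairsToBuffer(long maxHeapBytes, long bytesPerPair) {
        if (maxHeapBytes <= 0 || bytesPerPair <= 0) {
            throw new IllegalArgumentException("heap size and per-pair cost must be positive");
        }
        return maxHeapBytes / bytesPerPair / 2;
    }

    public static void main(String[] args) {
        // Runtime.maxMemory() reports the most memory the JVM will try to use,
        // which is governed by the -Xmx setting.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap (bytes): " + maxHeap);
        System.out.println("Read pairs to buffer: "
                + pairsToBuffer(maxHeap, ASSUMED_BYTES_PER_PAIR));
    }
}
```

Run under -Xmx2g and then -Xmx10g, the second count should come out roughly five times the first. Under a rule like this, a larger heap buys a proportionally larger buffer rather than more headroom, which would explain why raising -Xmx alone did not make the error go away.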