Hi there,

I reported an issue related to the non-sequential parser in the 1.8 code line last year (PDFBOX-1965) and was really happy to see that it was recently fixed. Thanks a lot, Andreas!
I also noticed that the non-sequential parser will become the default parser in 2.0. In my project we're using PDFBox to verify that all pages in a given PDF can be printed by a 3rd party print service (all pages have to be A4, use only standard fonts or embed any others, have certain margins, etc.).

We noticed that the document returned by getDocument() takes up more and more memory as we iterate over all the pages in the PDF, especially if the PDF is large and complex in structure (http://no.mouser.com/catalog/English/103/dload/pdf/mouser.pdf demonstrates the effect well). We free the memory up gradually by doing the following in a subclass of NonSequentialParser / CosParser:

    @Override
    public PDPage getPage(int pageNr) throws IOException {
        // Free up memory regularly: on every 5th page, remove all objects
        // listed in the xref table from the document
        if (pageNr % 5 == 0) {
            Set<COSObjectKey> cosObjectKeys = super.xrefTrailerResolver.getXrefTable().keySet();
            for (COSObjectKey cosObjectKey : cosObjectKeys) {
                super.getDocument().removeObject(cosObjectKey);
            }
        }
        return super.getPage(pageNr);
    }

This feels a bit like a hack. Any chance this kind of functionality could be built into PDFBox?

And, BTW, any clues as to when the 2.0 release will be ready? Are you planning on shipping release candidates too (which would save people from having to rely upon/distribute snapshot versions)?

Thanks
Stefan

--
BEKK Open
http://open.bekk.no

TesTcl - a unit test framework for iRules
http://testcl.com

