Sorry for the delay, I was away for a week.

> a) use a scratch file PDDocument.load(File file, boolean useScratchFiles)

I could not find a load method that takes a boolean parameter to indicate whether to use scratch files. However, if I use PDDocument#load(File file, RandomAccess scratchFile) and specify a scratch file, then an exception is logged for every page I process. The exception itself doesn't seem to cause any problem, as the resulting PDF looks correct, but it is disconcerting. The stack trace looks like this:

[error] Oct 03, 2015 11:10:50 AM org.apache.pdfbox.pdmodel.font.PDFont parseCmap
[error] SEVERE: An error occurs while reading a CMap
[error] java.io.IOException: Error: expected the end of a dictionary.
[error]     at org.apache.fontbox.cmap.CMapParser.parseNextToken(CMapParser.java:432)
[error]     at org.apache.fontbox.cmap.CMapParser.parse(CMapParser.java:119)
[error]     at org.apache.pdfbox.pdmodel.font.PDFont.parseCmap(PDFont.java:626)
[error]     at org.apache.pdfbox.pdmodel.font.PDSimpleFont.extractToUnicodeEncoding(PDSimpleFont.java:457)
[error]     at org.apache.pdfbox.pdmodel.font.PDSimpleFont.determineEncoding(PDSimpleFont.java:411)
[error]     at org.apache.pdfbox.pdmodel.font.PDFont.<init>(PDFont.java:214)
[error]     at org.apache.pdfbox.pdmodel.font.PDSimpleFont.<init>(PDSimpleFont.java:89)
[error]     at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:67)
[error]     at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:108)
[error]     at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:213)
[error]     at org.apache.pdfbox.pdmodel.PDResources.addFont(PDResources.java:586)
[error]     at org.apache.pdfbox.pdmodel.edit.PDPageContentStream.setFont(PDPageContentStream.java:321)

> b) don't use doc.getDocumentCatalog.getAllPages() as this fetches all pages
> from the document but use PDDocumentCatalog.getPages() which only gives you
> the root into the page tree (drawback is that you need to do the iteration
> yourself). That has been enhanced in PDFBox 2.0.0 which also has an improved
> resource handling.

I am just wondering how I do the iteration. Are there any examples? If I use PDDocumentCatalog#getPages() then I get a PDPageNode, but from there it looks like I have to call PDPageNode#getKids(), which just gives me a list of all pages, so I cannot see how this would be any more efficient. Can someone explain?

Also, I see that PDFBox 2.0.0 is not yet released but does have an iterator interface on PDPageTree. Is it already stable/reliable enough to use?

-- 
Adam Retter

skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk
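
For illustration only, here is a minimal sketch of what "doing the iteration yourself" over a page tree could look like. It deliberately does not use PDFBox: PageTreeSketch, Node, pages() and labels() are hypothetical names invented for this example. It assumes a tree whose nodes are either intermediate nodes with children or leaf pages, and it expands an intermediate node only when the traversal reaches it, so the full page list is never built up front. Whether PDPageNode#getKids() in 1.8 behaves this way (direct children only, some of which may be further nodes) is an assumption to check against the Javadoc.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-ins for a PDF page tree; these are NOT PDFBox classes.
class PageTreeSketch {

    /** Either an intermediate node (has kids) or a leaf page (has a label). */
    static final class Node {
        final String label;     // non-null only for leaf pages
        final List<Node> kids;  // empty for leaf pages

        private Node(String label, List<Node> kids) {
            this.label = label;
            this.kids = kids;
        }

        static Node page(String label) {
            return new Node(label, new ArrayList<Node>());
        }

        static Node branch(Node... kids) {
            return new Node(null, Arrays.asList(kids));
        }

        boolean isPage() {
            return label != null;
        }
    }

    /** Depth-first iterator that expands intermediate nodes only when reached. */
    static Iterator<Node> pages(final Node root) {
        final Deque<Node> stack = new ArrayDeque<Node>();
        stack.push(root);
        return new Iterator<Node>() {
            private Node upcoming = advance();

            // Pop nodes until the next leaf page is found, or the stack is empty.
            private Node advance() {
                while (!stack.isEmpty()) {
                    Node n = stack.pop();
                    if (n.isPage()) {
                        return n;
                    }
                    // Push kids in reverse so the first kid is visited first.
                    for (int i = n.kids.size() - 1; i >= 0; i--) {
                        stack.push(n.kids.get(i));
                    }
                }
                return null;
            }

            public boolean hasNext() {
                return upcoming != null;
            }

            public Node next() {
                if (upcoming == null) {
                    throw new NoSuchElementException();
                }
                Node result = upcoming;
                upcoming = advance();
                return result;
            }

            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    /** Convenience helper: collect the page labels in document order. */
    static List<String> labels(Node root) {
        List<String> out = new ArrayList<String>();
        for (Iterator<Node> it = pages(root); it.hasNext(); ) {
            out.add(it.next().label);
        }
        return out;
    }

    public static void main(String[] args) {
        Node root = Node.branch(
                Node.branch(Node.page("p1"), Node.page("p2")),
                Node.page("p3"));
        System.out.println(labels(root)); // prints [p1, p2, p3]
    }
}
```

With an iterator like this, each page can be processed and released before the next one is fetched. If the real tree is flat (every page a direct child of the root), this gives no advantage over fetching the whole list.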

