[ https://issues.apache.org/jira/browse/PDFBOX-361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671774#action_12671774 ]
Takashi Komatsubara commented on PDFBOX-361:
--------------------------------------------

Hello,

I have just got the latest code and can confirm that James's change is working well.
Here is the part which we have to change.

PDFParser.java
------------------
( begin )
            else if( !pdfSource.isEOF() )
            {
                // PDF Spec 1.5 introduced "Cross Reference Streams"
                // There can be multiple "%%EOF" strings in the file
                return parseObject();
                /************************
                //we might really be at the end of the file, there might just be some crap at the
                //end of the file.
                pdfSource.fillBuffer();
                if( pdfSource.available() < 1000 )
                {
                    //We need to determine if we are at the end of the file.
                    byte[] data = new byte[ 1000 ];

                    int amountRead = pdfSource.read( data );
                    if( amountRead != -1 )
                    {
                        pdfSource.unread( data, 0, amountRead );
                    }
                    boolean atEndOfFile = true;//we assume yes unless we find another.
                    for( int i=0; i<amountRead-3 && atEndOfFile; i++ )
                    {
                        atEndOfFile = !(data[i] == 'E' &&
                                        data[i+1] == 'O' &&
                                        data[i+2] == 'F' );
                    }
                    if( atEndOfFile )
                    {
                        while( pdfSource.read( data, 0, data.length ) != -1 )
                        {
                            //read until done.
                        }
                    }
                }
                ***************************/
(End)

Takashi.

> NullPointerException in PDPageNode.getAllKids
> ---------------------------------------------
>
>                 Key: PDFBOX-361
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-361
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>            Reporter: Jukka Zitting
>         Attachments: Long_9.pdf, PDFParser.java
>
>
> [Issue from SourceForge]
> http://sourceforge.net/tracker/index.php?func=detail&aid=2008371&group_id=78314&atid=552832
> The parser cannot seem to find the Pages object in files created with
> Acrobat Pro 9. A sample file is attached.
> public static void main(String[] argv) throws Exception {
>     String name = "./test.pdf";
>     PDDocument doc = PDDocument.load(name);
>     doc.close();
>     PDPageNode root = doc.getDocumentCatalog().getPages();
>     ArrayList<PDPage> pages = new ArrayList<PDPage>();
>     root.getAllKids(pages);
>     System.out.println("pages.size() == "+pages.size());
> }
> Exception in thread "main" java.lang.NullPointerException
>     at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194)
>     at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182)
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&file_id=283367&aid=2008371
> [Comment on SourceForge]
> Date: 2008-07-02 00:57
> Sender: foundart
> Logged In: YES
> user_id=1693709
> Originator: YES
> This happens with the latest code from CVS and also in older versions.
> [Comment on SourceForge]
> Date: 2008-07-14 17:25
> Sender: orthello
> Logged In: YES
> user_id=853566
> Originator: NO
> We are experiencing the same problem. Offending pdf available if any of
> you need it (jwil...@nmcourt.fed.us). Looks like pdfbox does not support
> some new feature introduced in Acrobat 9.
> [Comment on SourceForge]
> Date: 2008-07-14 23:20
> Sender: foundart
> Logged In: YES
> user_id=1693709
> Originator: YES
> In Acrobat 8, the default was to generate PDFs following version 1.4 of
> the PDF specification. In Acrobat 9, the default is to generate PDFs
> following version 1.5 of the PDF specification. PDF 1.5 has objects known
> as cross-reference streams and it turns out that PDFBox does not parse them
> correctly.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
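
For anyone following the thread: the comment above notes that a file can contain more than one "%%EOF" string, which is why the parser should keep calling parseObject() rather than stop at the first marker. A minimal sketch of how one might check a file for this is below. The class EofMarkerScan and its method countEofMarkers are hypothetical names made up for illustration. They are not part of PDFBox, and the sample bytes are not a valid PDF.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox: counts "%%EOF" markers in a buffer.
final class EofMarkerScan {
    private static final byte[] MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    // Returns how many times the byte sequence "%%EOF" occurs in data.
    static int countEofMarkers(byte[] data) {
        int count = 0;
        int i = 0;
        // Stop at the last offset where a full marker could still fit.
        while (i <= data.length - MARKER.length) {
            if (matchesAt(data, i)) {
                count++;
                i += MARKER.length; // skip past the marker just found
            } else {
                i++;
            }
        }
        return count;
    }

    // True if the marker bytes appear in data starting at offset.
    private static boolean matchesAt(byte[] data, int offset) {
        for (int j = 0; j < MARKER.length; j++) {
            if (data[offset + j] != MARKER[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Illustrative bytes only: an original body followed by an appended update,
        // each terminated by its own "%%EOF".
        byte[] sample = ("%PDF-1.5\n...original body...\n%%EOF\n"
                + "...appended update...\n%%EOF\n").getBytes(StandardCharsets.US_ASCII);
        System.out.println("%%EOF markers found: " + countEofMarkers(sample));
    }
}
```

A count greater than one means that treating the first "%%EOF" as the true end of the file would skip whatever objects follow it.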