[ https://issues.apache.org/jira/browse/PDFBOX-186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676299#action_12676299 ]
Justin LeFebvre commented on PDFBOX-186:
----------------------------------------

This file no longer exists so I can't test it. However, there have been a lot of recent changes to the parser which could have taken care of the problem.

> NullPointerException in getAllKids with corrupted pdf
> -----------------------------------------------------
>
>                 Key: PDFBOX-186
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-186
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>            Priority: Minor
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1532246
> Originally submitted by ojaquemet on 2006-08-01 01:15.
>
> java.lang.NullPointerException
>     at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194)
>     at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182)
>     at org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:226)
>     at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
>     at [...]
>
> Tested with PDFBox-0.7.2-log4j.jar and PDFBox-0.7.3-dev-20060731.jar
>
> Because the corrupted PDF is too big (7MB) to be attached here, you'll be able to find it there:
> http://olivier.jaquemet.free.fr/PDF-corrupted.pdf
>
> [comment on SourceForge]
> Originally sent by nobody.
> Logged In: NO
> I get this message too. How do you parse big PDFs?

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.