Hi Rob,

Which version of PDFBox are you using: an older release or the trunk version? If it is an older release, please try the trunk version if possible.
Please create an issue on [1] and attach one of the sample documents to it.

Thanks in advance,
Andreas Lehmkühler

[1] https://issues.apache.org/jira/browse/PDFBOX

Rob Whall wrote:
> Hi folks,
> I am using PDFBox to try to index PDFs for Lucene. So far it has worked
> like a champ, although I've recently started to get errors on a bunch of
> PDFs that I'm trying to index. The specific error is in:
>
> PDPageNode.getAllKids(List result, COSDictionary page, boolean recurse)
> The line is:
> kids = (COSArray)page.getDictionaryObject( COSName.KIDS );
>
> The problem is that the page getting passed in is null. It would appear
> that the page is not being created properly.
>
> This sounds similar to a problem reported here (MarkMail thread) in
> April:
> http://pdfbox-users.markmail.org/search/?q=PDPageNode.getAllKids
>
> I have sample PDFs I can send to someone if it would help the debugging
> effort. I don't know what software created the PDFs, but I'm trying to
> find out. If there is any other information I can provide, or help out
> in any way to resolve this issue, please let me know.
>
> Thanks,
> Rob