Hi folks,

I am using PDFBox to try to index PDFs for Lucene. So far it has worked like a champ, although I've recently started to get errors on a bunch of PDFs that I'm trying to index. The specific error is in:
PDPageNode.getAllKids(List result, COSDictionary page, boolean recurse)

The line is:

kids = (COSArray)page.getDictionaryObject( COSName.KIDS );

The problem is that the page getting passed in is null. It would appear that the page is not being created properly.

This sounds similar to a problem reported here (MarkMail thread) in April:
http://pdfbox-users.markmail.org/search/?q=PDPageNode.getAllKids

I have sample PDFs I can send to someone if it would help the debugging effort. I don't know what software created the PDFs, but I'm trying to find out. If there is any other information I can provide, or help out in any way to resolve this issue, please let me know.

Thanks,
Rob