Hi folks,
I am using PDFBox to index PDFs for Lucene. So far it has worked like a
champ, but I've recently started getting errors on a number of the PDFs
I'm trying to index. The error occurs in:

PDPageNode.getAllKids(List result, COSDictionary page, boolean recurse)
The line is:
kids = (COSArray)page.getDictionaryObject( COSName.KIDS );

The problem is that the page passed in is null, so that line throws a
NullPointerException. It appears that the page is not being created
properly.
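To illustrate the failure mode, here is a minimal, self-contained sketch of
a page-tree walk that tolerates a null kid instead of dereferencing it. It
does NOT use PDFBox and is not PDFBox's actual getAllKids code: the Node
class and the collectPages method are hypothetical stand-ins for a page-tree
dictionary and its /Kids array, assuming an unresolved kid shows up as null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PageTreeWalk {
    // Hypothetical stand-in for a page-tree dictionary; NOT a PDFBox class.
    static final class Node {
        final String type;      // "Pages" (intermediate node) or "Page" (leaf)
        final List<Node> kids;  // may contain nulls if a reference failed to resolve

        Node(String type, Node... kids) {
            this.type = type;
            this.kids = Arrays.asList(kids);
        }
    }

    // Collects leaf pages into result and returns how many null (unresolved)
    // kids were skipped, instead of throwing a NullPointerException on them.
    static int collectPages(Node node, List<Node> result) {
        if (node == null) {
            return 1; // one unresolved entry; skip it
        }
        if ("Page".equals(node.type)) {
            result.add(node);
            return 0;
        }
        int missing = 0;
        for (Node kid : node.kids) {
            missing += collectPages(kid, result);
        }
        return missing;
    }

    public static void main(String[] args) {
        Node leaf1 = new Node("Page");
        Node leaf2 = new Node("Page");
        // Intermediate node whose second kid failed to resolve (null).
        Node inner = new Node("Pages", leaf2, null);
        Node root = new Node("Pages", leaf1, inner);

        List<Node> pages = new ArrayList<>();
        int missing = collectPages(root, pages);
        System.out.println(pages.size() + " pages, " + missing + " unresolved");
    }
}
```

Running main prints "2 pages, 1 unresolved". Skipping nulls like this only
hides the symptom; the underlying cause is still whatever makes the page
reference resolve to null in these particular PDFs.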

This sounds similar to a problem reported on the pdfbox-users list in
April (MarkMail thread):
http://pdfbox-users.markmail.org/search/?q=PDPageNode.getAllKids


I have sample PDFs I can send to anyone if it would help with debugging.
I don't know what software created the PDFs, but I'm trying to find out.
If there is any other information I can provide, or any other way I can
help resolve this issue, please let me know.

Thanks,
Rob