[ https://issues.apache.org/jira/browse/PDFBOX-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler resolved PDFBOX-1273. ---------------------------------------- Resolution: Fixed Fix Version/s: 2.0.0 1.8.8 Assignee: Andreas Lehmkühler I've fixed the issue as proposed. Thanks for the contribution and sorry for the delay! > java.io.IOException: Error: Unknown annotation type null > -------------------------------------------------------- > > Key: PDFBOX-1273 > URL: https://issues.apache.org/jira/browse/PDFBOX-1273 > Project: PDFBox > Issue Type: Bug > Components: PDModel > Affects Versions: 1.7.0, 1.8.7, 2.0.0 > Reporter: William > Assignee: Andreas Lehmkühler > Priority: Minor > Fix For: 1.8.8, 2.0.0 > > Attachments: PDPageQuickFix.patch > > > Hi, > I've come across the following exception on a very small number of documents: > org.apache.tika.exception.TikaException: Unable to extract PDF content > at org.apache.pdfbox.tika.PDF2XHTML.process(PDF2XHTML.java:80) > ~[extractor.jar:na] > at org.apache.pdfbox.tika.PDFParser.parse(PDFParser.java:116) > ~[extractor.jar:na] > at > org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) > ~[extractor.jar:na] > at > org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) > ~[extractor.jar:na] > at > org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) > ~[extractor.jar:na] > Caused by: java.io.IOException: Error: Unknown annotation type null > at > org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation.createAnnotation(PDAnnotation.java:165) > ~[extractor.jar:na] > at org.apache.pdfbox.pdmodel.PDPage.getAnnotations(PDPage.java:785) > ~[extractor.jar:na] > at org.apache.pdfbox.tika.PDF2XHTML.endPage(PDF2XHTML.java:142) > ~[extractor.jar:na] > at > org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:450) > ~[extractor.jar:na] > at > org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:372) > ~[extractor.jar:na] > at > org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:328) > ~[extractor.jar:na] > at org.apache.pdfbox.tika.PDF2XHTML.process(PDF2XHTML.java:63) > ~[extractor.jar:na] > Here are a few examples: > http://www.jdsupra.com/documents/01ece854-a961-4184-8de7-f6d5311d6a48.pdf > http://www.jdsupra.com/documents/0aabecb4-094a-40e4-a507-8b49ecb90a3e.pdf > http://www.jdsupra.com/documents/0d74ccf8-2d57-487d-88c2-98eee26f8236.pdf > Thanks -- This message was sent by Atlassian JIRA (v6.3.4#6332)