[jira] [Commented] (PDFBOX-1273) java.io.IOException: Error: Unknown annotation type null

2014-10-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181620#comment-14181620
 ] 

ASF subversion and git services commented on PDFBOX-1273:
-

Commit 1633897 from [~lehmi] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1633897 ]

PDFBOX-1273: skip null references within an annotation array to avoid 
IOException as proposed by William
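
For readers without the patch at hand, the idea of the fix can be sketched as follows. This is a minimal illustration, not the committed PDFBox code: the class SkipNullAnnotations, the nested Annotation type, and the use of plain strings to stand in for entries of the annotation array are all hypothetical. Only the exception message is taken from the report.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipNullAnnotations {

    /** Hypothetical stand-in for a parsed annotation; not a PDFBox class. */
    static final class Annotation {
        final String subtype;

        Annotation(String subtype) {
            this.subtype = subtype;
        }
    }

    /** Hypothetical factory that rejects null entries, mirroring the reported failure. */
    static Annotation createAnnotation(String entry) throws IOException {
        if (entry == null) {
            throw new IOException("Error: Unknown annotation type null");
        }
        return new Annotation(entry);
    }

    /** Builds the annotation list, skipping null entries instead of failing on them. */
    static List<Annotation> getAnnotations(List<String> rawEntries) throws IOException {
        List<Annotation> result = new ArrayList<>();
        for (String entry : rawEntries) {
            if (entry == null) {
                continue; // null reference in the array: skip it
            }
            result.add(createAnnotation(entry));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample input: two valid entries around one null reference.
        List<String> raw = Arrays.asList("Link", null, "Widget");
        System.out.println(getAnnotations(raw).size());
    }
}
```

Without the null check in the loop, the second entry would reach createAnnotation and the whole page would fail with the IOException shown in the report.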

 java.io.IOException: Error: Unknown annotation type null
 

 Key: PDFBOX-1273
 URL: https://issues.apache.org/jira/browse/PDFBOX-1273
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 1.7.0
Reporter: William
Priority: Minor
 Attachments: PDPageQuickFix.patch


 Hi,
 I've come across the following exception on a very small number of documents:
 org.apache.tika.exception.TikaException: Unable to extract PDF content
     at org.apache.pdfbox.tika.PDF2XHTML.process(PDF2XHTML.java:80) ~[extractor.jar:na]
     at org.apache.pdfbox.tika.PDFParser.parse(PDFParser.java:116) ~[extractor.jar:na]
     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) ~[extractor.jar:na]
     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) ~[extractor.jar:na]
     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) ~[extractor.jar:na]
 Caused by: java.io.IOException: Error: Unknown annotation type null
     at org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation.createAnnotation(PDAnnotation.java:165) ~[extractor.jar:na]
     at org.apache.pdfbox.pdmodel.PDPage.getAnnotations(PDPage.java:785) ~[extractor.jar:na]
     at org.apache.pdfbox.tika.PDF2XHTML.endPage(PDF2XHTML.java:142) ~[extractor.jar:na]
     at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:450) ~[extractor.jar:na]
     at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:372) ~[extractor.jar:na]
     at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:328) ~[extractor.jar:na]
     at org.apache.pdfbox.tika.PDF2XHTML.process(PDF2XHTML.java:63) ~[extractor.jar:na]
 Here are a few examples:
 http://www.jdsupra.com/documents/01ece854-a961-4184-8de7-f6d5311d6a48.pdf
 http://www.jdsupra.com/documents/0aabecb4-094a-40e4-a507-8b49ecb90a3e.pdf
 http://www.jdsupra.com/documents/0d74ccf8-2d57-487d-88c2-98eee26f8236.pdf
 Thanks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PDFBOX-1273) java.io.IOException: Error: Unknown annotation type null

2014-10-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181624#comment-14181624
 ] 

ASF subversion and git services commented on PDFBOX-1273:
-

Commit 1633900 from [~lehmi] in branch 'pdfbox/branches/1.8'
[ https://svn.apache.org/r1633900 ]

PDFBOX-1273: skip null references within an annotation array to avoid 
IOException as proposed by William

 java.io.IOException: Error: Unknown annotation type null
 

 Key: PDFBOX-1273
 URL: https://issues.apache.org/jira/browse/PDFBOX-1273
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 1.7.0, 1.8.7, 2.0.0
Reporter: William
Priority: Minor
 Attachments: PDPageQuickFix.patch







[jira] [Commented] (PDFBOX-1273) java.io.IOException: Error: Unknown annotation type null

2013-03-27 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615797#comment-13615797
 ] 

Michael McCandless commented on PDFBOX-1273:


Looks like this is the same issue as TIKA-1098.

 java.io.IOException: Error: Unknown annotation type null
 

 Key: PDFBOX-1273
 URL: https://issues.apache.org/jira/browse/PDFBOX-1273
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 1.7.0
Reporter: William
Priority: Minor
 Attachments: PDPageQuickFix.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira