[jira] [Commented] (TIKA-2117) NullPointerException on PDF (fixed in PDFBox)
[ https://issues.apache.org/jira/browse/TIKA-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15590291#comment-15590291 ]

Tim Allison commented on TIKA-2117:
-----------------------------------

Thank you for checking. I forgot that PDFBox's ExtractText doesn't exercise extraction of form fields so you wouldn't trigger this issue by following our directions. Sorry.

> NullPointerException on PDF (fixed in PDFBox)
> ---------------------------------------------
>
> Key: TIKA-2117
> URL: https://issues.apache.org/jira/browse/TIKA-2117
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.13
> Environment: Windows 7 x64, JVM 1.8.0_101
> Reporter: Seva Alekseyev
>
> Tika PDF parser emits a NullPointerException on the following PDF file:
> https://dl.dropboxusercontent.com/u/92341073/TEST_THOR.PDF
> The file displays as expected in Acrobat.
> The call stack goes:
> java.lang.NullPointerException
>     at org.apache.pdfbox.pdmodel.interactive.form.PDFieldFactory.findFieldType(PDFieldFactory.java:113)
>     at org.apache.pdfbox.pdmodel.interactive.form.PDFieldFactory.createField(PDFieldFactory.java:48)
>     at org.apache.pdfbox.pdmodel.interactive.form.PDField.fromDictionary(PDField.java:77)
>     at org.apache.pdfbox.pdmodel.interactive.form.PDNonTerminalField.getChildren(PDNonTerminalField.java:136)
>     at org.apache.tika.parser.pdf.PDF2XHTML.processAcroField(PDF2XHTML.java:698)
>     at org.apache.tika.parser.pdf.PDF2XHTML.extractAcroForm(PDF2XHTML.java:680)
>     at org.apache.tika.parser.pdf.PDF2XHTML.endDocument(PDF2XHTML.java:243)
>     at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:267)
>     at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:160)
>     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:144)
>     at gov.nih.niaid.fscanner.Extract.ExtractContents(Extract.java:62)
>     at gov.nih.niaid.temp.Main.main(Main.java:69)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
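The point in the comment above, that a text-only extraction never walks the form-field tree and so cannot hit a failure that lives in that tree, can be illustrated with a toy model. This is only a sketch: `Field`, `Doc`, `extractPageText`, and `walkFields` are hypothetical stand-ins written for this note, not PDFBox or Tika classes, and the `null` child merely simulates a malformed field entry. The actual cause inside `PDFieldFactory.findFieldType` is not stated in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FieldWalkDemo {

    /** Hypothetical stand-in for a form field node (not a PDFBox class). */
    static class Field {
        final String name;
        final List<Field> children = new ArrayList<>();

        Field(String name, Field... kids) {
            this.name = name;
            this.children.addAll(Arrays.asList(kids));
        }
    }

    /** Hypothetical document: page text plus a separate form-field tree. */
    static class Doc {
        final String pageText;
        final Field rootField;

        Doc(String pageText, Field rootField) {
            this.pageText = pageText;
            this.rootField = rootField;
        }
    }

    /** Text-only pass: never looks at the field tree. */
    static String extractPageText(Doc doc) {
        return doc.pageText;
    }

    /** Recursive field walk: dereferences every child, including a bad one. */
    static void walkFields(Field field, StringBuilder out) {
        out.append(field.name).append('\n'); // throws NullPointerException if field is null
        for (Field child : field.children) {
            walkFields(child, out);
        }
    }

    /** Builds a document whose field tree contains one broken (null) child. */
    static Doc sampleDoc() {
        Field broken = null; // simulated malformed entry; an assumption for illustration
        return new Doc("page text", new Field("form", new Field("name"), broken));
    }

    public static void main(String[] args) {
        Doc doc = sampleDoc();
        System.out.println("text-only pass: " + extractPageText(doc));
        try {
            walkFields(doc.rootField, new StringBuilder());
            System.out.println("field walk: completed");
        } catch (NullPointerException e) {
            System.out.println("field walk: NullPointerException");
        }
    }
}
```

In this model the text-only pass succeeds on the same document that makes the field walk fail, which mirrors why the reporter could not reproduce the exception by following the ExtractText-based directions.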
[jira] [Commented] (TIKA-2117) NullPointerException on PDF (fixed in PDFBox)
[ https://issues.apache.org/jira/browse/TIKA-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15589481#comment-15589481 ]

Seva Alekseyev commented on TIKA-2117:
--------------------------------------

Doesn't reproduce in PDFBox trunk.
[jira] [Commented] (TIKA-2117) NullPointerException on PDF
[ https://issues.apache.org/jira/browse/TIKA-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15576683#comment-15576683 ]

Tim Allison commented on TIKA-2117:
-----------------------------------

I confirmed both this and the other issue (TIKA-2121) still exist for Tika trunk. Please confirm that they both exist with PDFBox trunk. If they do, please open issues on PDFBox's JIRA and link to this issue and TIKA-2121.
[jira] [Commented] (TIKA-2117) NullPointerException on PDF
[ https://issues.apache.org/jira/browse/TIKA-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15576634#comment-15576634 ]

Tim Allison commented on TIKA-2117:
-----------------------------------

Thank you for opening this issue and the others and for sharing the triggering docs! For PDFs, would you be willing to try the steps described here: [PDF_Text_Problems|https://wiki.apache.org/tika/Troubleshooting%20Tika#PDF_Text_Problems]? Thank you.