Re: [VOTE] Apache Tika 1.14 Release Candidate #1
[Resending - has anyone else run into this same issue when building from the 1.14-rc1 tag?]

Just for grins, I pulled from git and checked out the 1.14-rc1 tag, then ran “mvn clean package”.

For me it fails with:

Running org.apache.tika.parser.strings.StringsParserTest
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.685 sec <<< FAILURE! - in org.apache.tika.parser.strings.StringsParserTest
testParse(org.apache.tika.parser.strings.StringsParserTest)  Time elapsed: 1.685 sec  <<< FAILURE!
java.lang.AssertionError: null
    at org.junit.Assert.fail(Assert.java:86)
    at org.junit.Assert.assertTrue(Assert.java:41)
    at org.junit.Assert.assertTrue(Assert.java:52)
    at org.apache.tika.parser.strings.StringsParserTest.testParse(StringsParserTest.java:68)
…

Results :

Failed tests:
  StringsParserTest.testParse:68 null

Tests run: 755, Failures: 1, Errors: 0, Skipped: 18

-- Ken

> On Oct 19, 2016, at 11:48am, Chris Mattmann wrote:
>
> Hi Folks,
>
> A first candidate for the Tika 1.14 release is available at:
>
> https://dist.apache.org/repos/dist/dev/tika/
>
> The release candidate is a zip archive of the sources in:
>
> https://git-wip-us.apache.org/repos/asf?p=tika.git;a=tree;hb=687d7706c9778e4f49f2834a07e5a9d99b23042b
>
> The SHA1 checksum of the archive is:
> ad9152392ffe6b620c8102ab538df0579b36c520
>
> In addition, a staged maven repository is available here:
>
> https://repository.apache.org/content/repositories/orgapachetika-1020/
>
> Please vote on releasing this package as Apache Tika 1.14.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.14
> [ ] -1 Do not release this package because..
>
> Cheers,
> Chris
>
> P.S. Of course here is my +1.

--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
[jira] [Commented] (TIKA-2098) Tika.parseToString() with maxLength doesn't work correctly for PDF files
[ https://issues.apache.org/jira/browse/TIKA-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15626453#comment-15626453 ] Hudson commented on TIKA-2098: -- FAILURE: Integrated in Jenkins build tika-2.x #169 (See [https://builds.apache.org/job/tika-2.x/169/]) improve unit test for TIKA-2098 (tallison: rev 6ca74bec6a1d448bbe3340d51dc84ca8ca58507a) * (edit) tika-parser-modules/tika-parser-multimedia-module/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java > Tika.parseToString() with maxLength doesn't work correctly for PDF files > > > Key: TIKA-2098 > URL: https://issues.apache.org/jira/browse/TIKA-2098 > Project: Tika > Issue Type: Bug > Components: parser >Affects Versions: 1.13 >Reporter: Alexander Kazakov >Assignee: Tim Allison > Labels: java, parser, pdf > Fix For: 2.0, 1.14 > > > When parsing PDF file with Tika.parseToString(InputStream stream, Metadata > metadata, int maxLength) and maxLength < content size it throws Exception. > {code:java} > org.apache.tika.exception.TikaException: Unable to extract all PDF content > at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:135) > at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:150) > at > org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) > at > org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) > at > org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) > at org.apache.tika.Tika.parseToString(Tika.java:568) > Caused by: org.apache.commons.io.IOExceptionWithCause: Unable to write a > string: Tika - Content Analysis Toolkit > at org.apache.tika.parser.pdf.PDF2XHTML.writeString(PDF2XHTML.java:302) > at > org.apache.pdfbox.text.PDFTextStripper.writeString(PDFTextStripper.java:779) > at > org.apache.pdfbox.text.PDFTextStripper.writeLine(PDFTextStripper.java:1738) > at > org.apache.pdfbox.text.PDFTextStripper.writePage(PDFTextStripper.java:672) > at > 
org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:392) > at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:143) > at > org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319) > at > org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266) > at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:111) > ... 35 more > Caused by: org.apache.tika.sax.TaggedSAXException: Your document contained > more than 100 characters, and so your requested limit has been reached. To > receive the full text of the document, increase your limit. (Text up to the > limit is however available). > org.apache.tika.sax.TaggedSAXException: Your document contained more than 100 > characters, and so your requested limit has been reached. To receive the full > text of the document, increase your limit. (Text up to the limit is however > available). > org.apache.tika.sax.WriteOutContentHandler$WriteLimitReachedException: Your > document contained more than 100 characters, and so your requested limit has > been reached. To receive the full text of the document, increase your limit. > (Text up to the limit is however available). 
> at > org.apache.tika.sax.TaggedContentHandler.handleException(TaggedContentHandler.java:113) > at > org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:148) > at > org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146) > at > org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46) > at > org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82) > at > org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140) > at > org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287) > at > org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279) > at > org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306) > at org.apache.tika.parser.pdf.PDF2XHTML.writeString(PDF2XHTML.java:300) > ... 43 more > Caused by: org.apache.tika.sax.TaggedSAXException: Your document contained > more than 100 characters, and so your requested limit has been reached. To > receive the full text of the document, increase your limit. (Text up to the > limit is however available). > org.apache.tika.sax.WriteOutContentHandler$WriteLimitReachedException: Your > document contained more than 100 characters, and so your requested limit has > been reached. To receive the full text of the document, increase your limit. > (Text up to the li
tika-2.x - Build # 169 - Still Failing
The Apache Jenkins build system has built tika-2.x (build #169) Status: Still Failing Check console output at https://builds.apache.org/job/tika-2.x/169/ to view the results.
[jira] [Commented] (TIKA-2098) Tika.parseToString() with maxLength doesn't work correctly for PDF files
[ https://issues.apache.org/jira/browse/TIKA-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15626388#comment-15626388 ] Hudson commented on TIKA-2098: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1131 (See [https://builds.apache.org/job/Tika-trunk/1131/]) improve test for TIKA-2098 (tallison: rev 2df68c84b043f3158c0bdfa63d1a0c8d44d7e18a) * (edit) tika-parsers/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java > Tika.parseToString() with maxLength doesn't work correctly for PDF files > > > Key: TIKA-2098 > URL: https://issues.apache.org/jira/browse/TIKA-2098 > Project: Tika > Issue Type: Bug > Components: parser >Affects Versions: 1.13 >Reporter: Alexander Kazakov >Assignee: Tim Allison > Labels: java, parser, pdf > Fix For: 2.0, 1.14 > > > When parsing PDF file with Tika.parseToString(InputStream stream, Metadata > metadata, int maxLength) and maxLength < content size it throws Exception. > {code:java} > org.apache.tika.exception.TikaException: Unable to extract all PDF content > at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:135) > at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:150) > at > org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) > at > org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) > at > org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) > at org.apache.tika.Tika.parseToString(Tika.java:568) > Caused by: org.apache.commons.io.IOExceptionWithCause: Unable to write a > string: Tika - Content Analysis Toolkit > at org.apache.tika.parser.pdf.PDF2XHTML.writeString(PDF2XHTML.java:302) > at > org.apache.pdfbox.text.PDFTextStripper.writeString(PDFTextStripper.java:779) > at > org.apache.pdfbox.text.PDFTextStripper.writeLine(PDFTextStripper.java:1738) > at > org.apache.pdfbox.text.PDFTextStripper.writePage(PDFTextStripper.java:672) > at > 
org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:392) > at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:143) > at > org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319) > at > org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266) > at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:111) > ... 35 more > Caused by: org.apache.tika.sax.TaggedSAXException: Your document contained > more than 100 characters, and so your requested limit has been reached. To > receive the full text of the document, increase your limit. (Text up to the > limit is however available). > org.apache.tika.sax.TaggedSAXException: Your document contained more than 100 > characters, and so your requested limit has been reached. To receive the full > text of the document, increase your limit. (Text up to the limit is however > available). > org.apache.tika.sax.WriteOutContentHandler$WriteLimitReachedException: Your > document contained more than 100 characters, and so your requested limit has > been reached. To receive the full text of the document, increase your limit. > (Text up to the limit is however available). 
> at > org.apache.tika.sax.TaggedContentHandler.handleException(TaggedContentHandler.java:113) > at > org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:148) > at > org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146) > at > org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46) > at > org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82) > at > org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140) > at > org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287) > at > org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279) > at > org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306) > at org.apache.tika.parser.pdf.PDF2XHTML.writeString(PDF2XHTML.java:300) > ... 43 more > Caused by: org.apache.tika.sax.TaggedSAXException: Your document contained > more than 100 characters, and so your requested limit has been reached. To > receive the full text of the document, increase your limit. (Text up to the > limit is however available). > org.apache.tika.sax.WriteOutContentHandler$WriteLimitReachedException: Your > document contained more than 100 characters, and so your requested limit has > been reached. To receive the full text of the document, increase your limit. > (Text up to the limit is however available). > a
[jira] [Commented] (TIKA-2152) NullPointerException on a valid Word file
[ https://issues.apache.org/jira/browse/TIKA-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15626330#comment-15626330 ] Tim Allison commented on TIKA-2152: --- https://bz.apache.org/bugzilla/show_bug.cgi?id=60329 > NullPointerException on a valid Word file > - > > Key: TIKA-2152 > URL: https://issues.apache.org/jira/browse/TIKA-2152 > Project: Tika > Issue Type: Bug > Components: parser >Affects Versions: 1.13 > Environment: Windows 7 x64, JVM 1.8.0_101 >Reporter: Seva Alekseyev > Attachments: A5346.docx > > > On the attached Word document, which opens fine in Word, the Tika parser > throws the following error: > java.lang.NullPointerException > at > org.apache.poi.xwpf.usermodel.XWPFStyles.getStyle(XWPFStyles.java:198) > at > org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.extractParagraph(XWPFWordExtractorDecorator.java:149) > at > org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.extractIBodyText(XWPFWordExtractorDecorator.java:107) > at > org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.extractTable(XWPFWordExtractorDecorator.java:362) > at > org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.extractHeaderText(XWPFWordExtractorDecorator.java:414) > at > org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.extractHeaders(XWPFWordExtractorDecorator.java:404) > at > org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.buildXHTML(XWPFWordExtractorDecorator.java:89) > at > org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:109) > at > org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:112) > at > org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TIKA-2153) TaggedIOException on a valid Powerpoint file
Seva Alekseyev created TIKA-2153:

Summary: TaggedIOException on a valid Powerpoint file
Key: TIKA-2153
URL: https://issues.apache.org/jira/browse/TIKA-2153
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.13
Environment: Windows 7 x64, JVM 1.8.0_101
Reporter: Seva Alekseyev

On the following PowerPoint file, which opens fine with PowerPoint:

https://dl.dropboxusercontent.com/u/92341073/Data%20Club%202%20March%2028.pptx

the Tika parser throws the following error:

org.apache.tika.io.TaggedIOException: invalid stored block lengths
    at org.apache.tika.io.TaggedInputStream.handleIOException(TaggedInputStream.java:133)
    at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:82)
    at org.apache.tika.mime.MimeTypes.readMagicHeader(MimeTypes.java:258)
    at org.apache.tika.mime.MimeTypes.detect(MimeTypes.java:471)
    at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112)
    at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
    at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102)
    at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedFile(AbstractOOXMLExtractor.java:298)
    at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedParts(AbstractOOXMLExtractor.java:199)
    at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:112)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:112)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
Caused by: org.apache.tika.io.TaggedIOException: invalid stored block lengths
    at org.apache.tika.io.TaggedInputStream.handleIOException(TaggedInputStream.java:133)
    at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:103)
    at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    at java.io.FilterInputStream.read(FilterInputStream.java:107)
    at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:78)
    ... 13 more
Caused by: java.util.zip.ZipException: invalid stored block lengths
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:164)
    at org.apache.poi.openxml4j.util.ZipSecureFile$ThresholdInputStream.read(ZipSecureFile.java:213)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
    ... 19 more

Could be similar to #2130.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
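For readers unfamiliar with the innermost "invalid stored block lengths" message: a DEFLATE "stored" block carries its length twice, as LEN and as its one's complement NLEN, and the inflater rejects the block when the two disagree. The sketch below is only an illustration using the JDK's java.util.zip classes. The class StoredBlockSketch and the helper readPartOrNull are hypothetical names, not Tika or POI code, and returning null for a corrupt part is just one possible policy. It says nothing about what is actually wrong with the attached file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;
import java.util.zip.ZipException;

// Hypothetical illustration class; not part of Tika or POI.
class StoredBlockSketch {

    // A raw DEFLATE stored block whose NLEN field is deliberately wrong.
    static byte[] corruptStoredBlock() {
        return new byte[] {
            0x01,                    // BFINAL=1, BTYPE=00 (stored)
            0x05, 0x00,              // LEN  = 5 (little-endian)
            0x00, 0x00,              // NLEN = 0, but should be 0xFFFA (~LEN)
            'h', 'e', 'l', 'l', 'o'  // payload
        };
    }

    // Inflate one raw DEFLATE stream; return null if the stream is corrupt
    // so that a caller could skip this part and carry on with the rest.
    static String readPartOrNull(byte[] rawDeflate) {
        Inflater inflater = new Inflater(true); // true = raw DEFLATE, no zlib header
        try (InputStream in = new InflaterInputStream(
                new ByteArrayInputStream(rawDeflate), inflater)) {
            return new String(in.readAllBytes(), StandardCharsets.US_ASCII);
        } catch (ZipException e) {
            return null; // corrupt compressed data: skip this part
        } catch (IOException e) {
            throw new UncheckedIOException(e); // any other I/O problem is not swallowed
        } finally {
            inflater.end(); // release native resources
        }
    }

    public static void main(String[] args) {
        System.out.println("corrupt part -> " + readPartOrNull(corruptStoredBlock()));
    }
}
```

Whether skipping a damaged embedded part is acceptable depends on the caller. The point here is only that the exception originates in the compressed data of one part, not in the container as a whole.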
[jira] [Created] (TIKA-2152) NullPointerException on a valid Word file
Seva Alekseyev created TIKA-2152: Summary: NullPointerException on a valid Word file Key: TIKA-2152 URL: https://issues.apache.org/jira/browse/TIKA-2152 Project: Tika Issue Type: Bug Components: parser Affects Versions: 1.13 Environment: Windows 7 x64, JVM 1.8.0_101 Reporter: Seva Alekseyev Attachments: A5346.docx On the attached Word document, which opens fine in Word, the Tika parser throws the following error: java.lang.NullPointerException at org.apache.poi.xwpf.usermodel.XWPFStyles.getStyle(XWPFStyles.java:198) at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.extractParagraph(XWPFWordExtractorDecorator.java:149) at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.extractIBodyText(XWPFWordExtractorDecorator.java:107) at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.extractTable(XWPFWordExtractorDecorator.java:362) at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.extractHeaderText(XWPFWordExtractorDecorator.java:414) at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.extractHeaders(XWPFWordExtractorDecorator.java:404) at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.buildXHTML(XWPFWordExtractorDecorator.java:89) at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:109) at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:112) at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TIKA-2152) NullPointerException on a valid Word file
[ https://issues.apache.org/jira/browse/TIKA-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Seva Alekseyev updated TIKA-2152: - Attachment: A5346.docx > NullPointerException on a valid Word file > - > > Key: TIKA-2152 > URL: https://issues.apache.org/jira/browse/TIKA-2152 > Project: Tika > Issue Type: Bug > Components: parser >Affects Versions: 1.13 > Environment: Windows 7 x64, JVM 1.8.0_101 >Reporter: Seva Alekseyev > Attachments: A5346.docx > > > On the attached Word document, which opens fine in Word, the Tika parser > throws the following error: > java.lang.NullPointerException > at > org.apache.poi.xwpf.usermodel.XWPFStyles.getStyle(XWPFStyles.java:198) > at > org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.extractParagraph(XWPFWordExtractorDecorator.java:149) > at > org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.extractIBodyText(XWPFWordExtractorDecorator.java:107) > at > org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.extractTable(XWPFWordExtractorDecorator.java:362) > at > org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.extractHeaderText(XWPFWordExtractorDecorator.java:414) > at > org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.extractHeaders(XWPFWordExtractorDecorator.java:404) > at > org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.buildXHTML(XWPFWordExtractorDecorator.java:89) > at > org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:109) > at > org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:112) > at > org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
tika-2.x - Build # 168 - Failure
The Apache Jenkins build system has built tika-2.x (build #168) Status: Failure Check console output at https://builds.apache.org/job/tika-2.x/168/ to view the results.
[jira] [Commented] (TIKA-2151) Imposed Write Limit Causes Lost Data With Pdfs
[ https://issues.apache.org/jira/browse/TIKA-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15626214#comment-15626214 ] Josh Cummings commented on TIKA-2151: - Agreed. When I did my search, I just searched for unresolved issues. Should have checked Resolved, too. Thanks! > Imposed Write Limit Causes Lost Data With Pdfs > -- > > Key: TIKA-2151 > URL: https://issues.apache.org/jira/browse/TIKA-2151 > Project: Tika > Issue Type: Bug > Components: core >Affects Versions: 1.13 >Reporter: Josh Cummings >Priority: Critical > > When we upgraded to 1.13, we noticed a new exception in our logs: > org.apache.tika.exception.TikaException: Unable to extract all PDF content > at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:184) > at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:144) > at > org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) > at > org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) > at > org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) > at org.apache.tika.Tika.parseToString(Tika.java:527) > at org.apache.tika.Tika.parseToString(Tika.java:602) > at > com.attask.tika.WriteLimitAllCatchTikaTest.testStillNeedOverride(WriteLimitAllCatchTikaTest.java:31) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at org.junit.runner.JUnitCore.run(JUnitCore.java:160) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:78) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:212) > at > com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:68) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140) > Caused by: org.apache.commons.io.IOExceptionWithCause: Unable to write a > string: One will of mine to make thy large will more. 
> at org.apache.tika.parser.pdf.PDF2XHTML.writeString(PDF2XHTML.java:500) > at > org.apache.pdfbox.text.PDFTextStripper.writeString(PDFTextStripper.java:779) > at > org.apache.pdfbox.text.PDFTextStripper.writeLine(PDFTextStripper.java:1738) > at > org.apache.pdfbox.text.PDFTextStripper.writePage(PDFTextStripper.java:672) > at > org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:392) > at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:214) > at > org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319) > at > org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266) > at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:160) > ... 33 more > Caused by: org.apache.tika.sax.TaggedSAXException: Your document contained > more than 10 characters, and so your requested limit has been reached. To > receive the full text of the document, increase your limit. (Text up to the > limit is however available). > org.apache.tika.sax.TaggedSAXException: Your document contained more than
[jira] [Commented] (TIKA-2151) Imposed Write Limit Causes Lost Data With Pdfs
[ https://issues.apache.org/jira/browse/TIKA-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15626194#comment-15626194 ]

Tim Allison commented on TIKA-2151:
---

I think this may be a duplicate of TIKA-2098. The fix will be in Tika 1.14, which should be out towards the end of the week.

I just improved the unit test for TIKA-2098 to be:

{noformat}
@Test
public void testMaxLength() throws Exception {
    InputStream is = getResourceAsStream("/test-documents/testPDF.pdf");
    String content = new Tika().parseToString(is, new Metadata(), 100);
    assertTrue(content.length() == 100);
    assertContains("Tika - Content", content);
}
{noformat}

> Imposed Write Limit Causes Lost Data With Pdfs
> --
>
> Key: TIKA-2151
> URL: https://issues.apache.org/jira/browse/TIKA-2151
> Project: Tika
> Issue Type: Bug
> Components: core
> Affects Versions: 1.13
> Reporter: Josh Cummings
> Priority: Critical
>
> When we upgraded to 1.13, we noticed a new exception in our logs:
> org.apache.tika.exception.TikaException: Unable to extract all PDF content
>     at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:184)
>     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:144)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>     at org.apache.tika.Tika.parseToString(Tika.java:527)
>     at org.apache.tika.Tika.parseToString(Tika.java:602)
>     at com.attask.tika.WriteLimitAllCatchTikaTest.testStillNeedOverride(WriteLimitAllCatchTikaTest.java:31)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>     at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
>     at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:78)
>     at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:212)
>     at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:68)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
> Caused by: org.apache.commons.io.IOExceptionWithCause: Unable to write a string: One will of mine to make thy large will more.
>     at org.apache.tika.parser.pdf.PDF2XHTML.writeString(PDF2XHTML.java:500)
>     at org.apache.pdfbox.text.PDFTextStripper.writeString(PDFTextStripper.java:779)
>     at org.apache.pdfbox.text.PDFTextStripper.writeLine(PDFTextStripper.java:1738)
>     at org.apache.pdfbox.text.PDFTextStripper.writePage(PDFTextStripper.java:672)
>     at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:392)
>     at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:214)
>     at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
>     at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
>     at org.apache.tika.parser.pdf.P
[jira] [Created] (TIKA-2151) Imposed Write Limit Causes Lost Data With Pdfs
Josh Cummings created TIKA-2151: --- Summary: Imposed Write Limit Causes Lost Data With Pdfs Key: TIKA-2151 URL: https://issues.apache.org/jira/browse/TIKA-2151 Project: Tika Issue Type: Bug Components: core Affects Versions: 1.13 Reporter: Josh Cummings Priority: Critical When we upgraded to 1.13, we noticed a new exception in our logs: org.apache.tika.exception.TikaException: Unable to extract all PDF content at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:184) at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:144) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) at org.apache.tika.Tika.parseToString(Tika.java:527) at org.apache.tika.Tika.parseToString(Tika.java:602) at com.attask.tika.WriteLimitAllCatchTikaTest.testStillNeedOverride(WriteLimitAllCatchTikaTest.java:31) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.junit.runner.JUnitCore.run(JUnitCore.java:160) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:78) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:212) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:68) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140) Caused by: org.apache.commons.io.IOExceptionWithCause: Unable to write a string: One will of mine to make thy large will more. at org.apache.tika.parser.pdf.PDF2XHTML.writeString(PDF2XHTML.java:500) at org.apache.pdfbox.text.PDFTextStripper.writeString(PDFTextStripper.java:779) at org.apache.pdfbox.text.PDFTextStripper.writeLine(PDFTextStripper.java:1738) at org.apache.pdfbox.text.PDFTextStripper.writePage(PDFTextStripper.java:672) at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:392) at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:214) at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319) at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266) at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:160) ... 33 more Caused by: org.apache.tika.sax.TaggedSAXException: Your document contained more than 10 characters, and so your requested limit has been reached. To receive the full text of the document, increase your limit. 
(Text up to the limit is however available). org.apache.tika.sax.TaggedSAXException: Your document contained more than 10 characters, and so your requested limit has been reached. To receive the full text of the document, increase your limit. (Text up to the limit is however available). org.apache.tika.sax.WriteOutContentHandler$WriteLimitReachedException: Your document contained more than 10 characters, and so your requested limit has been reached. To receive the full text of the document, increa
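For readers following the trace above: the write-limit exception sits three levels down the cause chain, under a TikaException and an IOExceptionWithCause. A minimal sketch of how a caller might check a wrapped exception for such a nested cause is below. It uses only the JDK; `WriteLimitReached` and `hasCause` are hypothetical names for illustration and are not Tika's classes or its actual handling.

```java
import java.io.IOException;

public class CauseChain {
    // Hypothetical stand-in for the write-limit exception at the bottom of the trace.
    static class WriteLimitReached extends Exception {
        WriteLimitReached(String message) {
            super(message);
        }
    }

    // Walks the cause chain and reports whether any link is an instance of the given type.
    static boolean hasCause(Throwable thrown, Class<? extends Throwable> type) {
        for (Throwable current = thrown; current != null; current = current.getCause()) {
            if (type.isInstance(current)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Rebuild the same nesting shape as the reported trace, using plain JDK exceptions.
        Exception wrapped = new Exception("Unable to extract all PDF content",
                new IOException("Unable to write a string",
                        new WriteLimitReached("requested limit has been reached")));

        // A caller that only inspects the outermost exception type would miss the limit signal.
        System.out.println("outermost is limit: " + (wrapped instanceof WriteLimitReached));
        System.out.println("chain contains limit: " + hasCause(wrapped, WriteLimitReached.class));
    }
}
```

Whether treating the nested limit as a normal end-of-text condition is the right fix for this issue is an open question here; the sketch only shows why the outer exception type alone is not enough to tell.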
[jira] [Commented] (TIKA-2111) Executable Parser adds Content-Type instead of setting
[ https://issues.apache.org/jira/browse/TIKA-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15625783#comment-15625783 ] Hudson commented on TIKA-2111: -- SUCCESS: Integrated in Jenkins build tika-2.x #167 (See [https://builds.apache.org/job/tika-2.x/167/]) TIKA-2111 - ExecutableParser should set rather than add a Content-Type (tallison: rev a6978521fb4c75195180d33734ceb23de8b6bd43) * (edit) tika-parser-modules/tika-parser-code-module/src/main/java/org/apache/tika/parser/executable/ExecutableParser.java * (edit) tika-parser-modules/tika-parser-code-module/src/test/java/org/apache/tika/parser/executable/ExecutableParserTest.java > Executable Parser adds Content-Type instead of setting > -- > > Key: TIKA-2111 > URL: https://issues.apache.org/jira/browse/TIKA-2111 > Project: Tika > Issue Type: Bug >Reporter: Tim Allison >Priority: Trivial > Fix For: 2.0, 1.15 > > > The ExecutableParser {{add}} s {{Content-Type}} instead of setting it. This > can lead to multiple or duplicate {{Content-Type}} s. > Should probably have asked on the user-list first...Is this the desired > behavior? If not, let's convert {{add()}} to {{set()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
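To illustrate the add-versus-set distinction described in the issue, here is a minimal sketch using a hypothetical multi-valued store. `MultiValuedMetadata` is a stand-in written for this example, not Tika's Metadata class, and the media type strings are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ContentTypeAddVsSet {
    // Hypothetical stand-in for a multi-valued metadata store (not Tika's Metadata).
    static class MultiValuedMetadata {
        private final Map<String, List<String>> values = new HashMap<>();

        // add() appends, so a second call leaves two values under the same name.
        void add(String name, String value) {
            values.computeIfAbsent(name, key -> new ArrayList<>()).add(value);
        }

        // set() replaces whatever was stored under the name with a single value.
        void set(String name, String value) {
            List<String> single = new ArrayList<>();
            single.add(value);
            values.put(name, single);
        }

        List<String> getValues(String name) {
            return values.getOrDefault(name, new ArrayList<>());
        }
    }

    public static void main(String[] args) {
        MultiValuedMetadata added = new MultiValuedMetadata();
        added.set("Content-Type", "application/octet-stream"); // value from an earlier step
        added.add("Content-Type", "application/x-executable"); // parser appends a second one
        System.out.println("after add: " + added.getValues("Content-Type"));

        MultiValuedMetadata replaced = new MultiValuedMetadata();
        replaced.set("Content-Type", "application/octet-stream");
        replaced.set("Content-Type", "application/x-executable"); // parser overwrites
        System.out.println("after set: " + replaced.getValues("Content-Type"));
    }
}
```

With add, a consumer reading a single Content-Type has to guess which of the two values is authoritative; with set, only the parser's value remains.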
[jira] [Resolved] (TIKA-2111) Executable Parser adds Content-Type instead of setting
[ https://issues.apache.org/jira/browse/TIKA-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2111. --- Resolution: Fixed Fix Version/s: 1.15 2.0 > Executable Parser adds Content-Type instead of setting > -- > > Key: TIKA-2111 > URL: https://issues.apache.org/jira/browse/TIKA-2111 > Project: Tika > Issue Type: Bug >Reporter: Tim Allison >Priority: Trivial > Fix For: 2.0, 1.15 > > > The ExecutableParser {{add}} s {{Content-Type}} instead of setting it. This > can lead to multiple or duplicate {{Content-Type}} s. > Should probably have asked on the user-list first...Is this the desired > behavior? If not, let's convert {{add()}} to {{set()}}.
[jira] [Commented] (TIKA-2111) Executable Parser adds Content-Type instead of setting
[ https://issues.apache.org/jira/browse/TIKA-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15625640#comment-15625640 ] Hudson commented on TIKA-2111: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1130 (See [https://builds.apache.org/job/Tika-trunk/1130/]) TIKA-2111 - set instead of add "Content-Type" in the ExecutableParser (tallison: rev 15a92302501d5ee6a319442c8109eafe37ec4595) * (edit) tika-parsers/src/test/java/org/apache/tika/parser/executable/ExecutableParserTest.java * (edit) tika-parsers/src/main/java/org/apache/tika/parser/executable/ExecutableParser.java > Executable Parser adds Content-Type instead of setting > -- > > Key: TIKA-2111 > URL: https://issues.apache.org/jira/browse/TIKA-2111 > Project: Tika > Issue Type: Bug >Reporter: Tim Allison >Priority: Trivial > > The ExecutableParser {{add}} s {{Content-Type}} instead of setting it. This > can lead to multiple or duplicate {{Content-Type}} s. > Should probably have asked on the user-list first...Is this the desired > behavior? If not, let's convert {{add()}} to {{set()}}.
[jira] [Commented] (TIKA-2143) POI deprecated method used in TIKA 1.13
[ https://issues.apache.org/jira/browse/TIKA-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15625484#comment-15625484 ] Tim Allison commented on TIKA-2143: --- Hi [~sbathrutheen], any luck finding an older version of POI on your classpath? > POI deprecated method used in TIKA 1.13 > > > Key: TIKA-2143 > URL: https://issues.apache.org/jira/browse/TIKA-2143 > Project: Tika > Issue Type: Bug > Components: parser >Affects Versions: 1.9, 1.13 > Environment: Windows java application >Reporter: sbathrutheen >Priority: Trivial > Fix For: 1.13 > > > We see that TIKA throws a long list of errors when extracting ppt files. We > tested with the standalone Tika application (1.13) and cannot reproduce the issue. > We took a look at the POI source code and observed that the class "HSLFSlideShow" > defines the deprecated method below > * > /** > - * Get the lookup from slide numbers to their offsets inside > - * _ptrData, used when adding or moving slides. > - * > - * @deprecated since POI 3.11, not supported anymore > - */ > - @Deprecated > - public Hashtable getSlideOffsetDataLocationsLookup() { > - throw new > UnsupportedOperationException("PersistPtrHolder.getSlideOffsetDataLocationsLookup() > is not supported since 3.12-Beta1"); > - } > * > We think the Tika library may still be calling this deprecated method, causing this > runtime exception > Caused by: org.apache.tika.exception.TikaException: Unexpected > RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@204c3b78 > at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:283) > at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281) > at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) > at > com.searchtechnologies.aspire.docprocessing.extracttext.ExtractTextStage.process(ExtractTextStage.java:140) > ... 
14 more > Caused by: java.lang.UnsupportedOperationException > at java.util.AbstractMap$SimpleImmutableEntry.setValue(Unknown Source) > at org.apache.poi.hslf.HSLFSlideShow.read(HSLFSlideShow.java:293) > at org.apache.poi.hslf.HSLFSlideShow.buildRecords(HSLFSlideShow.java:273) > at org.apache.poi.hslf.HSLFSlideShow.<init>(HSLFSlideShow.java:188) > at org.apache.tika.parser.microsoft.HSLFExtractor.parse(HSLFExtractor.java:61) > at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:149) > at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117) > at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281) > ... 17 more
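On the question of an older POI on the classpath: one way to check is to ask the JVM where a given class was actually loaded from. The sketch below is a generic JDK-only diagnostic; `WhichJar` and `locationOf` are hypothetical names, and the default class name is just a placeholder. In the reporter's application one would presumably pass `org.apache.poi.hslf.HSLFSlideShow`, the class named in the trace.

```java
import java.security.CodeSource;

public class WhichJar {
    // Returns the jar or directory a class was loaded from, or a marker for JDK/bootstrap classes.
    static String locationOf(Class<?> clazz) {
        CodeSource source = clazz.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name as the first argument; defaults to a JDK class.
        String className = args.length > 0 ? args[0] : "java.lang.String";
        Class<?> clazz = Class.forName(className);
        System.out.println(className + " loaded from " + locationOf(clazz));
    }
}
```

If the reported location is a POI jar older than the one Tika 1.13 expects, that would be consistent with the standalone Tika application working while the embedded one fails.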
[jira] [Commented] (TIKA-2056) Installing exiftool causes ForkParserIntegration test errors
[ https://issues.apache.org/jira/browse/TIKA-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15625287#comment-15625287 ] Konstantin Gribov commented on TIKA-2056: - [~chrismattmann], I set "fix versions" to 1.15 just in case you wouldn't roll new RC. If you would, I'll update it. > Installing exiftool causes ForkParserIntegration test errors > > > Key: TIKA-2056 > URL: https://issues.apache.org/jira/browse/TIKA-2056 > Project: Tika > Issue Type: Bug > Components: parser >Affects Versions: 1.14 >Reporter: Chris A. Mattmann >Assignee: Konstantin Gribov > Fix For: 1.15 > > > [~rgauss] maybe you can help me with this. For some reason when I was trying > your PR, I got all sorts of weird errors that I thought had to do with your > PR, but in fact, had to do with Fork Parser Integration test. [~kkrugler] > I've seen you've contributed to the Fork parser tests so tagging you on this > too. Any reason you guys can think of that exiftool causes the Fork parser > integration tests to fail? > Here's the log msg (that I thought was due to the Sentiment parser, but is in > fact not!): > {noformat} > [INFO] Changes detected - recompiling the module! > [INFO] Compiling 124 source files to > /Users/mattmann/tmp/tika1.14/tika-parsers/target/test-classes > [INFO] > /Users/mattmann/tmp/tika1.14/tika-parsers/src/test/java/org/apache/tika/parser/odf/ODFParserTest.java: > Some input files use or override a deprecated API. > [INFO] > /Users/mattmann/tmp/tika1.14/tika-parsers/src/test/java/org/apache/tika/parser/odf/ODFParserTest.java: > Recompile with -Xlint:deprecation for details. 
> [INFO] > [INFO] --- maven-surefire-plugin:2.18.1:test (default-test) @ tika-parsers --- > [INFO] Surefire report directory: > /Users/mattmann/tmp/tika1.14/tika-parsers/target/surefire-reports > --- > T E S T S > --- > Running org.apache.tika.parser.fork.ForkParserIntegrationTest > Tests run: 5, Failures: 1, Errors: 3, Skipped: 0, Time elapsed: 2.46 sec <<< > FAILURE! - in org.apache.tika.parser.fork.ForkParserIntegrationTest > testForkedTextParsing(org.apache.tika.parser.fork.ForkParserIntegrationTest) > Time elapsed: 0.185 sec <<< ERROR! > org.apache.tika.exception.TikaException: Unable to serialize AutoDetectParser > to pass to the Forked Parser > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) > at java.util.ArrayList.writeObject(ArrayList.java:762) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) > at java.util.ArrayList.writeObject(ArrayList.java:762) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOut
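The repeated ArrayList.writeObject frames in the trace above suggest the failure happens while serializing something nested inside lists. As a general illustration of that failure shape (an assumption about the pattern, not a diagnosis of this issue), the sketch below shows a Serializable container failing because one element in a nested list is not Serializable. All class names here are hypothetical and none are Tika classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class NestedSerialization {
    // Hypothetical helper that does not implement Serializable.
    static class ExternalToolHandle {
        final String command;

        ExternalToolHandle(String command) {
            this.command = command;
        }
    }

    // Hypothetical composite: Serializable itself, but it holds arbitrary children.
    static class Composite implements Serializable {
        private static final long serialVersionUID = 1L;
        final List<Object> children = new ArrayList<>();
    }

    // Returns the name of the class that blocked serialization, or null if it succeeded.
    static String firstNonSerializable(Object root) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(root);
            return null;
        } catch (NotSerializableException e) {
            return e.getMessage(); // the JDK puts the offending class name in the message
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Composite composite = new Composite();
        composite.children.add("plain string child");
        System.out.println("before: " + firstNonSerializable(composite));

        // One non-serializable element anywhere in the graph fails the whole write.
        composite.children.add(new ExternalToolHandle("exiftool"));
        System.out.println("after: " + firstNonSerializable(composite));
    }
}
```

A check like this can help narrow down which object in a parser graph blocks serialization when a newly available external tool changes which parsers get instantiated.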
[jira] [Resolved] (TIKA-2056) Installing exiftool causes ForkParserIntegration test errors
[ https://issues.apache.org/jira/browse/TIKA-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Gribov resolved TIKA-2056. - Resolution: Fixed Fix Version/s: 1.15 > Installing exiftool causes ForkParserIntegration test errors > > > Key: TIKA-2056 > URL: https://issues.apache.org/jira/browse/TIKA-2056 > Project: Tika > Issue Type: Bug > Components: parser >Affects Versions: 1.14 >Reporter: Chris A. Mattmann >Assignee: Konstantin Gribov > Fix For: 1.15 > > > [~rgauss] maybe you can help me with this. For some reason when I was trying > your PR, I got all sorts of weird errors that I thought had to do with your > PR, but in fact, had to do with Fork Parser Integration test. [~kkrugler] > I've seen you've contributed to the Fork parser tests so tagging you on this > too. Any reason you guys can think of that exiftool causes the Fork parser > integration tests to fail? > Here's the log msg (that I thought was due to the Sentiment parser, but is in > fact not!): > {noformat} > [INFO] Changes detected - recompiling the module! > [INFO] Compiling 124 source files to > /Users/mattmann/tmp/tika1.14/tika-parsers/target/test-classes > [INFO] > /Users/mattmann/tmp/tika1.14/tika-parsers/src/test/java/org/apache/tika/parser/odf/ODFParserTest.java: > Some input files use or override a deprecated API. > [INFO] > /Users/mattmann/tmp/tika1.14/tika-parsers/src/test/java/org/apache/tika/parser/odf/ODFParserTest.java: > Recompile with -Xlint:deprecation for details. > [INFO] > [INFO] --- maven-surefire-plugin:2.18.1:test (default-test) @ tika-parsers --- > [INFO] Surefire report directory: > /Users/mattmann/tmp/tika1.14/tika-parsers/target/surefire-reports > --- > T E S T S > --- > Running org.apache.tika.parser.fork.ForkParserIntegrationTest > Tests run: 5, Failures: 1, Errors: 3, Skipped: 0, Time elapsed: 2.46 sec <<< > FAILURE! 
- in org.apache.tika.parser.fork.ForkParserIntegrationTest > testForkedTextParsing(org.apache.tika.parser.fork.ForkParserIntegrationTest) > Time elapsed: 0.185 sec <<< ERROR! > org.apache.tika.exception.TikaException: Unable to serialize AutoDetectParser > to pass to the Forked Parser > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) > at java.util.ArrayList.writeObject(ArrayList.java:762) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) > at java.util.ArrayList.writeObject(ArrayList.java:762) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutp