Re: Branch_1x build broke?
Or it might be that you have the Python image preprocessing libraries installed (and I don’t)... Will fix today.

On Thu, May 24, 2018 at 2:55 PM Tim Allison wrote:
> Y, you're probably running a different version of tesseract than I was
> running and getting different (worse) text out during ocr. I guess we
> could add an or 'dehaystack'?
Re: Branch_1x build broke?
Y, you're probably running a different version of Tesseract than I was and getting different (worse) text out of OCR. I guess we could extend the assertion to also accept 'dehaystack'?

On Thu, May 24, 2018 at 12:09 PM, Chris Mattmann wrote:
> Tim,
>
> Are you seeing this?
>
> Failed tests:
> PDFParserTest.testEmbeddedDocsWithOCROnly:1250->TikaTest.assertContains:103
> pdf_haystack not found in:
>
> dehayslack dehaystack dehayslack dehaystack dehaystack dehaystack pd'
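The "also accept 'dehaystack'" idea could look roughly like the sketch below. This is only an illustration: `OcrTolerantAssert`, `containsAny` and `assertContainsAny` are hypothetical names, not part of TikaTest, and the real test would pass the parsed XHTML rather than a literal string.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for OCR-dependent tests; not part of Tika's TikaTest. */
final class OcrTolerantAssert {

    private OcrTolerantAssert() {
    }

    /** Returns true if the haystack contains at least one of the accepted needles. */
    static boolean containsAny(String haystack, List<String> needles) {
        for (String needle : needles) {
            if (haystack.contains(needle)) {
                return true;
            }
        }
        return false;
    }

    /** Throws AssertionError when none of the accepted needles is found. */
    static void assertContainsAny(String haystack, String... needles) {
        List<String> accepted = Arrays.asList(needles);
        if (!containsAny(haystack, accepted)) {
            throw new AssertionError("none of " + accepted + " found in:\n" + haystack);
        }
    }

    public static void main(String[] args) {
        // OCR text copied from the failing run quoted in this thread.
        String ocr = "dehayslack dehaystack dehayslack dehaystack dehaystack dehaystack pd'";
        // Passes because "dehaystack" is present even though "pdf_haystack" is not.
        assertContainsAny(ocr, "pdf_haystack", "dehaystack");
    }
}
```

The trade-off is that every accepted variant weakens the test a little, so the list of alternatives should stay short.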
Re: Branch_1x build broke?
Thanks Dave, yes, I have Tesseract enabled and this is on my MacBook.

Thanks for looking into it, Dave…

Cheers,
Chris

From: "loo...@gmail.com" <loo...@gmail.com>
Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
Date: Thursday, May 24, 2018 at 11:34 AM
To: "dev@tika.apache.org" <dev@tika.apache.org>
Subject: Re: Branch_1x build broke?

Hey Chris,

This is happening to me with Tesseract enabled but only on my MacBook. Are you running this on OSX? Been trying to get some time to dig into it as it works perfectly on my Windows and Linux setups.

Cheers,
Dave
Re: Branch_1x build broke?
Hey Chris,

This is happening to me with Tesseract enabled but only on my MacBook. Are you running this on OSX? Been trying to get some time to dig into it as it works perfectly on my Windows and Linux setups.

Cheers,
Dave

On Thu, 24 May 2018, 17:09 Chris Mattmann wrote:
> Tim,
>
> Are you seeing this?
>
> Failed tests:
> PDFParserTest.testEmbeddedDocsWithOCROnly:1250->TikaTest.assertContains:103
> pdf_haystack not found in:
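Since the failure seems to follow the machine (MacBook versus Windows and Linux), one way to compare setups is to record what each machine's Tesseract reports about itself. The sketch below is an assumption-laden illustration: `TesseractVersionProbe` and `firstLineOf` are hypothetical names, it relies on `tesseract --version` being on the PATH, and it merges stderr into stdout because some Tesseract releases print the version on stderr.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/** Hypothetical diagnostic; not part of Tika. */
final class TesseractVersionProbe {

    private TesseractVersionProbe() {
    }

    /** Runs the command and returns its first output line, or empty if it cannot be started. */
    static Optional<String> firstLineOf(String... command) {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge stderr into stdout
        try {
            Process process = builder.start();
            String first;
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
                first = reader.readLine();
                // Drain the remaining output so the process can exit cleanly.
                while (reader.readLine() != null) {
                    // nothing to do
                }
            }
            process.waitFor();
            return Optional.ofNullable(first);
        } catch (IOException e) {
            // Typically means the executable was not found on the PATH.
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(firstLineOf("tesseract", "--version")
                .orElse("tesseract not found on PATH"));
    }
}
```

Running this on each machine and posting the output to the thread would show whether the MacBook is on a different Tesseract release than the Windows and Linux boxes.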
Branch_1x build broke?
Tim,

Are you seeing this?

Results :

Failed tests:
  PDFParserTest.testEmbeddedDocsWithOCROnly:1250->TikaTest.assertContains:103 pdf_haystack not found in:
http://www.w3.org/1999/xhtml;>
Outer_haystack
Outer_haystack
Outer_haystack
Outer_haystack
Outer_haystack
attached.pdf
dehayslack dehaystack dehayslack dehaystack dehaystack dehaystack pd'
Haystack
Needle
Haystack

Tests run: 1009, Failures: 1, Errors: 0, Skipped: 30

[INFO]
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Tika parent . SUCCESS [ 1.565 s]
[INFO] Apache Tika core ... SUCCESS [ 32.977 s]
[INFO] Apache Tika parsers FAILURE [05:52 min]
[INFO] Apache Tika XMP SKIPPED
[INFO] Apache Tika serialization .. SKIPPED
[INFO] Apache Tika batch .. SKIPPED
[INFO] Apache Tika language detection . SKIPPED
[INFO] Apache Tika application SKIPPED
[INFO] Apache Tika OSGi bundle SKIPPED
[INFO] Apache Tika translate .. SKIPPED
[INFO] Apache Tika server . SKIPPED
[INFO] Apache Tika examples ... SKIPPED
[INFO] Apache Tika Java-7 Components .. SKIPPED
[INFO] Apache Tika eval ... SKIPPED
[INFO] Apache Tika Deep Learning (powered by DL4J) SKIPPED
[INFO] Apache Tika Natural Language Processing SKIPPED
[INFO] Apache Tika SKIPPED
[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 06:27 min
[INFO] Finished at: 2018-05-24T09:04:59-07:00
[INFO] Final Memory: 72M/1029M
[INFO]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (default-test) on project tika-parsers: There are test failures.
[ERROR]
[ERROR] Please refer to /Users/mattmann/tmp/tika2.0.0/tika-parsers/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn -rf :tika-parsers

Keeps failing for me.

nonas:tika2.0.0 mattmann$ java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
nonas:tika2.0.0 mattmann$

Any ideas?

Cheers,
Chris