Re: Branch_1x build broke?

2018-05-25 Thread Tim Allison
Or it might be that you have the Python image preprocessing libraries
installed (and I don’t)...

Will fix today.



Re: Branch_1x build broke?

2018-05-24 Thread Tim Allison
Yes, you're probably running a different version of Tesseract than I was
and getting different (worse) text out of OCR.  I guess we could have the
test accept either 'pdf_haystack' or 'dehaystack'?
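
A minimal sketch of that "or" idea, not the actual change to PDFParserTest: the class and method names below (OcrTolerantAssert, containsAny, assertContainsAny) are hypothetical helpers written for illustration, and the sample string is the OCR text from the failure report in this thread.

```java
import java.util.Arrays;
import java.util.List;

public class OcrTolerantAssert {

    /** Returns true if the haystack contains at least one of the candidate needles. */
    public static boolean containsAny(String haystack, List<String> needles) {
        for (String needle : needles) {
            if (haystack.contains(needle)) {
                return true;
            }
        }
        return false;
    }

    /** Fails with an AssertionError only if none of the acceptable needles is present. */
    public static void assertContainsAny(String haystack, String... needles) {
        if (!containsAny(haystack, Arrays.asList(needles))) {
            throw new AssertionError(
                    Arrays.toString(needles) + " not found in:\n" + haystack);
        }
    }

    public static void main(String[] args) {
        // OCR text as reported in the failing build (the expected "pdf_haystack" is missing).
        String ocrText =
                "dehayslack dehaystack dehayslack dehaystack dehaystack dehaystack pd'";
        // Accept either the clean OCR result or the degraded variant.
        assertContainsAny(ocrText, "pdf_haystack", "dehaystack");
        System.out.println("assertion passed");
    }
}
```

The trade-off is that every accepted variant loosens the test a little, so the list is best kept to strings that have actually been seen from a real Tesseract run.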



Re: Branch_1x build broke?

2018-05-24 Thread Chris Mattmann
Thanks Dave, yes, I have Tesseract enabled and this is on my MacBook.

Thanks for looking into it, Dave…

Cheers,
Chris



Re: Branch_1x build broke?

2018-05-24 Thread loompa
Hey Chris,

This is happening to me with Tesseract enabled but only on my MacBook.

Are you running this on OSX?

Been trying to get some time to dig into it as it works perfectly on my
Windows and Linux setups.

Cheers,
Dave
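
Since the failure seems to track the platform and the Tesseract install, one way to compare setups is to print the OS name and the installed Tesseract version on each machine. The sketch below is only an illustration: the class name TesseractVersionCheck is hypothetical, it assumes the `tesseract` binary is on the PATH, and it assumes the first line of `tesseract --version` looks like "tesseract 3.05.01".

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TesseractVersionCheck {

    // Assumed shape of the first line of `tesseract --version`, e.g. "tesseract 3.05.01".
    private static final Pattern VERSION_LINE = Pattern.compile("^tesseract\\s+v?(\\S+)");

    /** Extracts the version token from the first line of the version output, if present. */
    public static Optional<String> parseVersion(String firstLine) {
        if (firstLine == null) {
            return Optional.empty();
        }
        Matcher m = VERSION_LINE.matcher(firstLine.trim());
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Runs `tesseract --version`; returns empty if the binary cannot be started. */
    public static Optional<String> installedVersion() {
        try {
            Process p = new ProcessBuilder("tesseract", "--version")
                    .redirectErrorStream(true) // some builds print the version to stderr
                    .start();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
                String first = r.readLine();
                while (r.readLine() != null) {
                    // drain the remaining output so the process can exit
                }
                p.waitFor();
                return parseVersion(first);
            }
        } catch (IOException e) {
            return Optional.empty(); // tesseract is not installed or not on the PATH
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println("os.name   = " + System.getProperty("os.name"));
        System.out.println("tesseract = " + installedVersion().orElse("not found"));
    }
}
```

Running it on the Mac, Windows and Linux setups would show whether the machines that pass and the ones that fail are on the same Tesseract release.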





Branch_1x build broke?

2018-05-24 Thread Chris Mattmann
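Tim,

Are you seeing this?

Results :

Failed tests:
  PDFParserTest.testEmbeddedDocsWithOCROnly:1250->TikaTest.assertContains:103 pdf_haystack not found in:

http://www.w3.org/1999/xhtml;>
[XHTML markup lost in archiving; surviving fragments follow]
content="application/vnd.openxmlformats-officedocument.wordprocessingml.document" />
content="org.apache.tika.parser.microsoft.ooxml.OOXMLParser" />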
Outer_haystack
Outer_haystack
Outer_haystack
Outer_haystack
Outer_haystack
attached.pdf
dehayslack dehaystack dehayslack dehaystack dehaystack dehaystack pd'
Haystack
Needle
Haystack

Tests run: 1009, Failures: 1, Errors: 0, Skipped: 30

[INFO]
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Tika parent . SUCCESS [  1.565 s]
[INFO] Apache Tika core ... SUCCESS [ 32.977 s]
[INFO] Apache Tika parsers  FAILURE [05:52 min]
[INFO] Apache Tika XMP  SKIPPED
[INFO] Apache Tika serialization .. SKIPPED
[INFO] Apache Tika batch .. SKIPPED
[INFO] Apache Tika language detection . SKIPPED
[INFO] Apache Tika application  SKIPPED
[INFO] Apache Tika OSGi bundle  SKIPPED
[INFO] Apache Tika translate .. SKIPPED
[INFO] Apache Tika server . SKIPPED
[INFO] Apache Tika examples ... SKIPPED
[INFO] Apache Tika Java-7 Components .. SKIPPED
[INFO] Apache Tika eval ... SKIPPED
[INFO] Apache Tika Deep Learning (powered by DL4J)  SKIPPED
[INFO] Apache Tika Natural Language Processing  SKIPPED
[INFO] Apache Tika  SKIPPED
[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 06:27 min
[INFO] Finished at: 2018-05-24T09:04:59-07:00
[INFO] Final Memory: 72M/1029M
[INFO]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (default-test) on project tika-parsers: There are test failures.
[ERROR]
[ERROR] Please refer to /Users/mattmann/tmp/tika2.0.0/tika-parsers/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :tika-parsers

Keeps failing for me.

nonas:tika2.0.0 mattmann$ java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
nonas:tika2.0.0 mattmann$

Any ideas?

Cheers,
Chris