Ah okay thanks, so:

java -Dtika.config=/tmp/tika-config.xml -cp \
  /Users/aretter/Downloads/tika-core-1.10.jar:/Users/aretter/Downloads/tika-parsers-1.10.jar:/Users/aretter/Downloads/pdfbox-1.8.10.jar:/Users/aretter/Downloads/fontbox-1.8.10.jar:/Users/aretter/Downloads/jempbox-1.8.10.jar:/Users/aretter/.m2/repository/commons-logging/commons-logging/1.2/commons-logging-1.2.jar \
  ExtractTest

now works.

However,

java -Dtika.config=/tmp/tika-config.xml -cp \
  /Users/aretter/Downloads/tika-core-1.10.jar:/Users/aretter/Downloads/tika-parsers-1.10.jar:/Users/aretter/Downloads/pdfbox-2.0.0-20151014.234027-1764.jar:/Users/aretter/Downloads/fontbox-2.0.0-20151014.233904-1817.jar:/Users/aretter/Downloads/jempbox-2.0.0-20140823.120514-532.jar:/Users/aretter/.m2/repository/commons-logging/commons-logging/1.2/commons-logging-1.2.jar \
  ExtractTest

still returns no text from the PDF. Is it simply the case that Tika 1.10
doesn't work with PDFBox version 2.0.0-SNAPSHOT?

My /tmp/tika-config.xml looks like:

<properties>
  <service-loader loadErrorHandler="THROW"/>
</properties>


I don't get any errors, exceptions or messages :-/
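
To narrow this down, one thing that could be tried is a small classpath
probe run with the same -cp as above. This is only a sketch:
ClasspathProbe is a made-up helper (not part of Tika), and the class names
are my assumption about where things live in PDFBox 1.8.x versus 2.0
(PDFTextStripper appears to have moved from org.apache.pdfbox.util to
org.apache.pdfbox.text), so they should be checked against the actual jars.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic helper, not part of Tika or PDFBox.
public class ClasspathProbe {

    /** Returns true if the named class can be located (without initializing it). */
    public static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class whose own dependencies are missing.
            return false;
        }
    }

    /** Probes each class name in order and records whether it was found. */
    public static Map<String, Boolean> probe(String... classNames) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : classNames) {
            result.put(name, isPresent(name));
        }
        return result;
    }

    public static void main(String[] args) {
        // Class names below are assumptions about the library layouts.
        Map<String, Boolean> found = probe(
            "org.apache.tika.parser.pdf.PDFParser",
            "org.apache.pdfbox.pdmodel.PDDocument",
            "org.apache.pdfbox.util.PDFTextStripper", // assumed PDFBox 1.8.x location
            "org.apache.pdfbox.text.PDFTextStripper", // assumed PDFBox 2.0 location
            "org.apache.fontbox.ttf.TrueTypeFont",
            "org.apache.jempbox.xmp.XMPMetadata");
        for (Map.Entry<String, Boolean> e : found.entrySet()) {
            System.out.println((e.getValue() ? "found    " : "MISSING  ") + e.getKey());
        }
    }
}
```

If the org.apache.pdfbox.util entry shows as MISSING with the 2.0.0-SNAPSHOT
jars, that would at least be consistent with Tika 1.10 having been built
against the 1.8.x API, although it would not explain why nothing is reported.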


On 15 October 2015 at 17:25, Nick Burch <[email protected]> wrote:
> On Thu, 15 Oct 2015, Adam Retter wrote:
>>
>> java -cp
>> /Users/aretter/Downloads/tika-core-1.10.jar:/Users/aretter/Downloads/tika-parsers-1.10.jar:/Users/aretter/Downloads/pdfbox-1.8.10.jar
>> ExtractTest
>
>
> You probably need fontbox and jempbox as well. Ask maven nicely and it'll
> tell you what the dependencies are
>
> Have a look at the troubleshooting page too, lots of good advice there on
> missing parsers and dependencies
> http://wiki.apache.org/tika/Troubleshooting%20Tika
>
>> Also is there any way to get Tika to complain or throw an exception if it
>> doesn't have the dependencies that it needs?
>
>
> Yes, you just need a Tika Config file that has WARN or THROW for load error
> handling, see
> http://tika.apache.org/1.10/configuring.html#Load_Error_Handling
>
> Nick



-- 
Adam Retter

skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk
