I found my issue.  I need to include the JARs from \solr\contrib\extraction\lib\ on my classpath.
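
For anyone hitting the same symptom: as far as I understand it, Tika's default setup quietly skips parsers whose dependency classes cannot be loaded, which would explain empty output for everything except plain text. Below is a minimal sketch (not part of my crawler) that probes the classpath for a few representative classes. The class name TikaClasspathProbe and the helper isPresent are made up for illustration, and the probed classes are only my pick of examples from Tika, PDFBox and POI, not a complete list of what the extraction lib folder provides.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic helper: reports whether a few classes that Tika's
// parsers rely on are visible on the current classpath.
class TikaClasspathProbe {

    /** Returns true if the named class can be located by this class's loader. */
    static boolean isPresent(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers
            Class.forName(className, false, TikaClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Representative classes only (an assumption, not an exhaustive list).
        Map<String, String> probes = new LinkedHashMap<>();
        probes.put("org.apache.tika.parser.AutoDetectParser", "tika-core");
        probes.put("org.apache.tika.parser.pdf.PDFParser", "tika-parsers");
        probes.put("org.apache.pdfbox.pdmodel.PDDocument", "pdfbox");
        probes.put("org.apache.poi.poifs.filesystem.POIFSFileSystem", "poi");

        for (Map.Entry<String, String> e : probes.entrySet()) {
            String status = isPresent(e.getKey()) ? "found" : "MISSING";
            System.out.printf("%-52s %-14s %s%n", e.getKey(), e.getValue(), status);
        }
    }
}
```

Run it with the same classpath as the crawler. If the Tika classes show up as found but the PDFBox or POI ones are MISSING, the parser dependencies are probably what is absent.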

Steve

On Tue, Feb 2, 2016 at 4:24 PM, Steven White <swhite4...@gmail.com> wrote:

> I'm not using tika-app.jar.  I need to stick with Tika JARs that come with
> Solr 5.2 and yet get the full text extraction feature of Tika (all file
> types it supports).
>
> At first, I started to include Tika JARs as needed; I now have all Tika
> related JARs that come with Solr and yet it is not working.  Here is the
> list: tika-core-1.7.jar, tika-java7-1.7.jar, tika-parsers-1.7.jar,
> tika-xmp-1.7.jar, vorbis-java-tika-0.6.jar,
> kite-morphlines-tika-core-0.12.1.jar and
> kite-morphlines-tika-decompress-0.12.1.jar.  As part of my program, I
> also have the SolrJ JARs and their dependencies: solr-solrj-5.2.1.jar,
> solr-core-5.2.1.jar, etc.
>
> You said "Might not have the parsers on your path within your Solr
> framework?".  I'm using Tika outside the Solr framework.  I'm trying to use
> Tika from my own crawler application, which uses SolrJ to send the raw text
> to Solr for indexing.
>
> What is it that I am missing?!
>
> Steve
>
> On Tue, Feb 2, 2016 at 3:03 PM, Allison, Timothy B. <talli...@mitre.org>
> wrote:
>
>> Might not have the parsers on your path within your Solr framework?
>>
>> Which tika jars are on your path?
>>
>> If you want the functionality of all of Tika, use the standalone
>> tika-app.jar, but do not use the app in the same JVM as Solr...without a
>> custom class loader.  The Solr team carefully prunes the dependencies when
>> integrating Tika and makes sure that the main parsers _just work_.
>>
>>
>> -----Original Message-----
>> From: Steven White [mailto:swhite4...@gmail.com]
>> Sent: Tuesday, February 02, 2016 2:53 PM
>> To: solr-user@lucene.apache.org
>> Subject: Using Tika that comes with Solr 5.2
>>
>> Hi,
>>
>> I'm trying to use Tika that comes with Solr 5.2.  The following code is
>> not working:
>>
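>> import java.io.File;
>> import java.io.FileInputStream;
>> import java.io.InputStream;
>> import org.apache.tika.metadata.Metadata;
>> import org.apache.tika.parser.AutoDetectParser;
>> import org.apache.tika.sax.BodyContentHandler;
>>
>> public static void parseWithTika() throws Exception {
>>     File file = new File("C:\\temp\\test.pdf");
>>
>>     AutoDetectParser parser = new AutoDetectParser();
>>     Metadata metadata = new Metadata();
>>     // The file name is a hint for type detection
>>     metadata.add(Metadata.RESOURCE_NAME_KEY, file.getName());
>>     BodyContentHandler contentHandler = new BodyContentHandler();
>>
>>     // try-with-resources closes the stream even if parsing throws
>>     try (InputStream in = new FileInputStream(file)) {
>>         parser.parse(in, contentHandler, metadata);
>>     }
>>
>>     // 'content' is always empty
>>     String content = contentHandler.toString();
>> }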
>>
>> 'content' is always an empty string unless the file I pass to Tika is a
>> text file.  Any idea what the issue is?
>>
>> I have also tried the sample code from
>> https://tika.apache.org/1.8/examples.html
>> with the same result.
>>
>>
>> Thanks !!
>>
>> Steve
>>
>
>
