I found my issue. I need to include the JARs from: \solr\contrib\extraction\lib\
Steve

On Tue, Feb 2, 2016 at 4:24 PM, Steven White <swhite4...@gmail.com> wrote:

> I'm not using solr-app.jar. I need to stick with Tika JARs that come with
> Solr 5.2 and yet get the full text extraction feature of Tika (all file
> types it supports).
>
> At first, I started to include Tika JARs as needed; I now have all Tika
> related JARs that come with Solr and yet it is not working. Here is the
> list: tika-core-1.7.jar, tika-java7-1.7.jar, tika-parsers-1.7.jar,
> tika-xmp-1.7.jar, vorbis-java-tika-0.6.jar,
> kite-morphlines-tika-core-0.12.1.jar and
> kite-morphlines-tika-decompress-0.12.1.jar. As part of my program, I
> also have SolrJ JARs and their dependencies: solr-solrj-5.2.1.jar,
> solr-core-5.2.1.jar, etc.
>
> You said "Might not have the parsers on your path within your Solr
> framework?". I'm using Tika outside the Solr framework. I'm trying to use
> Tika from my own crawler application that uses SolrJ to send the raw text
> to Solr for indexing.
>
> What is it that I am missing?!
>
> Steve
>
> On Tue, Feb 2, 2016 at 3:03 PM, Allison, Timothy B. <talli...@mitre.org>
> wrote:
>
>> Might not have the parsers on your path within your Solr framework?
>>
>> Which tika jars are on your path?
>>
>> If you want the functionality of all of Tika, use the standalone
>> tika-app.jar, but do not use the app in the same JVM as Solr...without a
>> custom class loader. The Solr team carefully prunes the dependencies when
>> integrating Tika and makes sure that the main parsers _just work_.
>>
>>
>> -----Original Message-----
>> From: Steven White [mailto:swhite4...@gmail.com]
>> Sent: Tuesday, February 02, 2016 2:53 PM
>> To: solr-user@lucene.apache.org
>> Subject: Using Tika that comes with Solr 5.2
>>
>> Hi,
>>
>> I'm trying to use Tika that comes with Solr 5.2. The following code is
>> not working:
>>
>> public static void parseWithTika() throws Exception {
>>     File file = new File("C:\\temp\\test.pdf");
>>
>>     FileInputStream in = new FileInputStream(file);
>>     AutoDetectParser parser = new AutoDetectParser();
>>     Metadata metadata = new Metadata();
>>     metadata.add(Metadata.RESOURCE_NAME_KEY, file.getName());
>>     BodyContentHandler contentHandler = new BodyContentHandler();
>>
>>     parser.parse(in, contentHandler, metadata);
>>
>>     String content = contentHandler.toString();  // <=== 'content' is always empty
>>
>>     in.close();
>> }
>>
>> 'content' is always an empty string unless the file I pass to Tika is a
>> text file. Any idea what the issue is?
>>
>> I have also tried the sample code from
>> https://tika.apache.org/1.8/examples.html
>> with the same result.
>>
>>
>> Thanks !!
>>
>> Steve
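
A note for anyone who hits the same empty-content symptom. As far as I understand it, tika-parsers-1.7.jar only holds the parser classes themselves; the libraries they rely on (PDFBox for PDF, POI for MS Office, and so on) are separate JARs, which is why the contents of \solr\contrib\extraction\lib\ are needed. When those are absent, Tika appears to skip the affected parsers quietly instead of throwing, so only plain text comes through. Below is a minimal, stdlib-only sketch for checking what is actually on the classpath. The class name ClasspathCheck and the choice of probe classes are my own illustrative assumptions, not part of Tika or Solr; adjust the list to the file types you care about.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ClasspathCheck {

    /** Returns true if the named class can be located by this class's class loader. */
    public static boolean isPresent(String className) {
        try {
            // initialize=false: only locate and load the class, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError from a missing transitive dependency
            return false;
        }
    }

    public static void main(String[] args) {
        // Probe class -> the JAR family it is expected to come from (illustrative selection)
        Map<String, String> probes = new LinkedHashMap<>();
        probes.put("org.apache.tika.parser.AutoDetectParser", "tika-core");
        probes.put("org.apache.tika.parser.pdf.PDFParser", "tika-parsers");
        probes.put("org.apache.pdfbox.pdmodel.PDDocument", "pdfbox (PDF support)");
        probes.put("org.apache.poi.poifs.filesystem.POIFSFileSystem", "poi (MS Office support)");

        for (Map.Entry<String, String> probe : probes.entrySet()) {
            String status = isPresent(probe.getKey()) ? "found" : "MISSING";
            System.out.printf("%-52s %-26s %s%n", probe.getKey(), probe.getValue(), status);
        }
    }
}
```

Run it with the same classpath as the crawler application. If the Tika classes show as found but the PDFBox or POI ones show as MISSING, that would match the situation described in this thread, where the Tika JARs were present but the extraction libraries were not.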