Hi all,

We are using Tika in a web app to extract text from PDF documents with no problems. Now we want to move some of our code to run in ServiceMix, and Tika no longer works there: it produces empty results. There is no exception or error of any kind, just an empty result. I think it comes down to a classpath issue in ServiceMix that I can't figure out how to fix.

I installed tika-bundle-1.0 and tika-core-1.0 into ServiceMix. I'm invoking Tika from my own bundle, specifically from a Camel route. This is the code I tried:

    Tika tika = new Tika();
    tika.setMaxStringLength(-1);
    Metadata metadata = new Metadata();
    Reader in = tika.parse(new FileInputStream("/myopt/books/ejb-3_0-fr-spec-persistence.pdf"), metadata);
    org.apache.commons.io.IOUtils.copy(in, new FileOutputStream("/tmp/tikatest/tika-text.txt"));

Running the same code fragment outside of ServiceMix produces the 13000 lines of text.

While debugging I found that org.apache.tika.config.ServiceLoader#loadServiceProviders can't find the list of parsers at META-INF/services/org.apache.tika.parser.Parser, although the file is there. As a result, the DefaultParser is initialized with an empty parser list, and when parse is called on the Tika object, getParser(metadata) in org.apache.tika.parser.CompositeParser#parse returns the EmptyParser.

I tried putting the files org.apache.tika.parser.Parser and org.apache.tika.detect.Detector in my own bundle under the same location (META-INF/services), but that didn't help.
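
For anyone who wants to reproduce the lookup problem, here is a minimal sketch of a check that could be run from inside the bundle. It uses only the JDK; the class name ServiceFileCheck and the method findServiceFiles are hypothetical helpers made up for illustration, not Tika API. The assumption is that whichever class loader Tika ends up using in ServiceMix does not see the service file, so comparing the bundle's own loader with the thread context loader might narrow it down:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical diagnostic helper (not part of Tika): lists the service files a class loader can see. */
public class ServiceFileCheck {

    /** Returns every META-INF/services/<serviceName> resource visible to the given loader. */
    public static List<URL> findServiceFiles(ClassLoader loader, String serviceName) {
        if (loader == null) {
            // No loader to ask (for example, no thread context class loader is set).
            return new ArrayList<URL>();
        }
        try {
            return Collections.list(loader.getResources("META-INF/services/" + serviceName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String service = "org.apache.tika.parser.Parser";
        // Loader that loaded this class; inside OSGi this is the calling bundle's class loader.
        ClassLoader own = ServiceFileCheck.class.getClassLoader();
        // Thread context class loader, which may differ from the bundle's loader.
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        System.out.println("own loader sees:     " + findServiceFiles(own, service));
        System.out.println("context loader sees: " + findServiceFiles(context, service));
    }
}
```

If both lists come back empty inside ServiceMix but not outside it, that would at least be consistent with what the debugger shows.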

I also tried to provide Tika with a config file that looks like this:

    <properties>
      <mimeTypeRepository resource="/org/apache/tika/mime/tika-mimetypes.xml" magic="false"/>
      <parsers>
        <parser name="parse-pdf" class="org.apache.tika.parser.pdf.PDFParser">
          <mime>application/pdf</mime>
        </parser>
      </parsers>
    </properties>

and initialized Tika with it:

    tikaConfig = new TikaConfig(configFile);
    tika = new Tika(tikaConfig);

But then I got an exception saying that the PDF parser was not found, even though the class is there:

    org.apache.tika.exception.TikaException: Configured parser class not found: org.apache.tika.parser.pdf.PDFParser
    Caused by: java.lang.ClassNotFoundException: org.apache.tika.parser.pdf.PDFParser not found by org.apache.tika.core [314]
        at org.apache.felix.framework.ModuleImpl.findClassOrResourceByDelegation(ModuleImpl.java:787)
        at org.apache.felix.framework.ModuleImpl.access$400(ModuleImpl.java:71)
        at org.apache.felix.framework.ModuleImpl$ModuleClassLoader.loadClass(ModuleImpl.java:1768)

I tried installing tika-parsers into ServiceMix as well, although I'm sure it's not necessary since tika-bundle contains them all. That didn't help either. I also tried installing the regular Maven Tika jars with all their dependencies into ServiceMix, but I got the same result.

I've been trying all night to make this work and I've run out of ideas.

Thank you for any help.

Shalom