Thank you, Freeman. You helped me identify the problem. In the end, I'm sure that tika-bundle-1.0 is not working. I switched to tika-bundle-0.9 and it works in ServiceMix.
Thank you.

On 03/08/12 08:04, Freeman Fang wrote:
> Hi,
>
> It seems you hit a common issue with using SPI in OSGi; you can take a
> look at [1] and [2] for more background on this issue.
>
> More importantly, in ServiceMix we use OSGiLocator to resolve this kind
> of issue; take a look at Guillaume's blog [3] about it.
>
> The smx OSGiLocator code is in [4]. I think you need to leverage the smx
> OSGiLocator to find the Service Providers.
>
> [1] http://www.osgi.org/blog/2008/09/spi-in-osgi.html
> [2] http://blog.tfd.co.uk/2011/12/13/osgi-and-spi/
> [3] http://gnodet.blogspot.com/2008/05/jee-specs-in-osgi.html
> [4] https://svn.apache.org/repos/asf/servicemix/smx4/specs/trunk/locator/
>
> Freeman
>
> On 2012-3-8, at 1:02 PM, shin938 wrote:
>
>> Hi all,
>>
>> I have also sent this to the Tika mailing list, but I have a feeling it
>> may be a ServiceMix issue.
>> We are using Tika in a web app to extract text from PDF documents with
>> no problem, but now we want to move some of our code to run in ServiceMix
>> and Tika doesn't work anymore. It just produces empty results: there is
>> no exception or anything, just an empty result.
>> I think it all comes down to some classpath issue with ServiceMix that I
>> can't understand how to fix.
>>
>> I installed tika-bundle-1.0 and tika-core-1.0 into ServiceMix.
>> I'm invoking Tika from my bundle, actually a Camel route.
>>
>> This is the code I tried:
>>
>> Tika tika = new Tika();
>> tika.setMaxStringLength(-1);
>> Metadata metadata = new Metadata();
>> Reader in = tika.parse(new FileInputStream("/myopt/books/ejb-3_0-fr-spec-persistence.pdf"), metadata);
>>
>> org.apache.commons.io.IOUtils.copy(in, new FileOutputStream("/tmp/tikatest/tika-text.txt"));
>>
>> Running the same code fragment outside of ServiceMix produces the 13000
>> lines of text.
>>
>> Debugging the code, I found that
>> org.apache.tika.config.ServiceLoader#loadServiceProviders can't find the
>> list of parsers in
>> META-INF/services/org.apache.tika.parser.Parser, although the file is
>> there. As a result, the DefaultParser is initialized with no parsers
>> list, and when parse is called on the Tika object,
>> getParser(metadata) in org.apache.tika.parser.CompositeParser#parse
>> returns the EmptyParser.
>>
>> I tried putting the files org.apache.tika.parser.Parser and
>> org.apache.tika.detect.Detector in my own bundle under the same location,
>> but it didn't help.
>>
>> I tried providing Tika with a config file that looks like this:
>>
>> <properties>
>>   <mimeTypeRepository resource="/org/apache/tika/mime/tika-mimetypes.xml" magic="false"/>
>>   <parsers>
>>     <parser name="parse-pdf" class="org.apache.tika.parser.pdf.PDFParser">
>>       <mime>application/pdf</mime>
>>     </parser>
>>   </parsers>
>> </properties>
>>
>> and initialized Tika with it:
>>
>> tikaConfig = new TikaConfig(configFile);
>> tika = new Tika(tikaConfig);
>>
>> But then I got an exception saying that the PDF parser was not found,
>> although the class is there, of course:
>>
>> org.apache.tika.exception.TikaException: Configured parser class not found: org.apache.tika.parser.pdf.PDFParser
>> Caused by: java.lang.ClassNotFoundException: org.apache.tika.parser.pdf.PDFParser not found by org.apache.tika.core [314]
>>     at org.apache.felix.framework.ModuleImpl.findClassOrResourceByDelegation(ModuleImpl.java:787)
>>     at org.apache.felix.framework.ModuleImpl.access$400(ModuleImpl.java:71)
>>     at org.apache.felix.framework.ModuleImpl$ModuleClassLoader.loadClass(ModuleImpl.java:1768)
>>
>> I also tried installing tika-parsers into ServiceMix, although I'm sure
>> it's not necessary as the tika-bundle contains them all. That didn't help
>> either.
>>
>> I tried installing the regular Maven Tika jars with all their
>> dependencies into ServiceMix, but I got the same result.
>>
>> Thank you for any help.
>>
>> Shalom
>>
>> --
>> View this message in context:
>> http://servicemix.396122.n5.nabble.com/tika-in-servicemix-empty-result-parsers-not-found-tp5546463p5546463.html
>> Sent from the ServiceMix - User mailing list archive at Nabble.com.
>
> ---------------------------------------------
> Freeman Fang
>
> FuseSource
> Email: [email protected]
> Web: fusesource.com
> Twitter: freemanfang
> Blog: http://freemanfang.blogspot.com
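
For anyone reading this thread later: the symptom above (the META-INF/services file exists, yet no providers are found) can be reproduced outside OSGi. The following is only a minimal sketch of the general mechanism, not what Tika or ServiceMix actually do. It uses the JDK's java.util.ServiceLoader instead of Tika's own org.apache.tika.config.ServiceLoader, and a plain URLClassLoader instead of a bundle class loader. The names SpiVisibilityDemo, Parser and PdfParser are hypothetical stand-ins. It shows that an SPI lookup only finds providers whose services file is visible to the class loader doing the lookup.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ServiceLoader;

public class SpiVisibilityDemo {

    /** Hypothetical service interface, standing in for org.apache.tika.parser.Parser. */
    public interface Parser {
        String name();
    }

    /** Hypothetical provider, standing in for a concrete parser such as PDFParser. */
    public static class PdfParser implements Parser {
        public PdfParser() { }

        @Override
        public String name() {
            return "pdf";
        }
    }

    /** Counts the Parser providers that the given class loader can discover. */
    static int countProviders(ClassLoader loader) {
        int count = 0;
        for (Parser ignored : ServiceLoader.load(Parser.class, loader)) {
            count++;
        }
        return count;
    }

    /**
     * Returns two counts: providers found by a loader that cannot see the
     * services file, then providers found by a loader that can see it.
     */
    public static int[] run() {
        try {
            // Write META-INF/services/<interface name> into a temporary directory.
            Path root = Files.createTempDirectory("spi-demo");
            Path services = Files.createDirectories(root.resolve("META-INF").resolve("services"));
            Files.write(services.resolve(Parser.class.getName()),
                    PdfParser.class.getName().getBytes(StandardCharsets.UTF_8));

            // This loader sees the classes but not the temporary directory.
            ClassLoader withoutFile = SpiVisibilityDemo.class.getClassLoader();

            // This loader additionally sees the directory holding the services file.
            try (URLClassLoader withFile =
                         new URLClassLoader(new URL[] { root.toUri().toURL() }, withoutFile)) {
                return new int[] { countProviders(withoutFile), countProviders(withFile) };
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] counts = run();
        System.out.println("providers without services file visible: " + counts[0]);
        System.out.println("providers with services file visible:    " + counts[1]);
    }
}
```

In OSGi each bundle has its own class loader, so a lookup made through the tika-core bundle's loader is in the position of the first loader here unless something (such as the locator approach Freeman points to) bridges the gap.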
