Hi,
It seems you have hit a common issue when using SPI in OSGi; take a
look at [1] and [2] for more background on it.
More importantly, in ServiceMix we use OSGiLocator to resolve this kind
of issue; see Guillaume's blog [3] about it.
The smx OSGiLocator code is in [4]. I think you need to leverage the smx
OSGiLocator to find the Service Providers.
[1] http://www.osgi.org/blog/2008/09/spi-in-osgi.html
[2] http://blog.tfd.co.uk/2011/12/13/osgi-and-spi/
[3] http://gnodet.blogspot.com/2008/05/jee-specs-in-osgi.html
[4] https://svn.apache.org/repos/asf/servicemix/smx4/specs/trunk/locator/
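For background on why SPI lookups behave differently in OSGi: such lookups are always made relative to some class loader, and each bundle has its own. A generic workaround that is sometimes tried (this is not the OSGiLocator approach, and whether Tika 1.0 consults the thread context class loader is an assumption you would need to verify) is to install a loader that can see the provider files for the duration of the call. A minimal sketch, with ContextClassLoaderSwap and callWith as hypothetical names:

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of ServiceMix or Tika.
public final class ContextClassLoaderSwap {

    private ContextClassLoaderSwap() {
    }

    /**
     * Runs the task with the given loader installed as the thread context
     * class loader, then restores the previous loader even if the task throws.
     */
    public static <T> T callWith(ClassLoader loader, Callable<T> task) throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return task.call();
        } finally {
            current.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: in a real bundle, the loader passed in would be one
        // that can see the META-INF/services files of the provider bundle.
        ClassLoader target = ContextClassLoaderSwap.class.getClassLoader();
        ClassLoader seen = callWith(target,
                () -> Thread.currentThread().getContextClassLoader());
        System.out.println("loader seen inside the call: " + seen);
    }
}
```

This only helps libraries that actually read the thread context class loader, which is why a locator-based solution is the more robust route.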
Freeman
On 2012-3-8, at 1:02 PM, shin938 wrote:
Hi all,
I have also sent this to the Tika mailing list, but I have a feeling it
may be a ServiceMix issue.
We are using Tika in a web app to extract text from PDF documents with
no problem. Now we want to move some of our code to run in ServiceMix,
and Tika doesn't work anymore: it just produces empty results. There is
no exception or anything, just an empty result.
I think it all comes down to some classpath issue with ServiceMix that I
can't work out how to fix.
I installed tika-bundle-1.0 and tika-core-1.0 into ServiceMix.
I'm invoking Tika from my bundle, actually from a Camel route.
This is the code I tried:
Tika tika = new Tika();
tika.setMaxStringLength(-1);
Metadata metadata = new Metadata();
Reader in = tika.parse(
    new FileInputStream("/myopt/books/ejb-3_0-fr-spec-persistence.pdf"), metadata);
org.apache.commons.io.IOUtils.copy(in,
    new FileOutputStream("/tmp/tikatest/tika-text.txt"));
Running the same code fragment outside of ServiceMix produces the 13000
lines of text.
Debugging the code, I found that
org.apache.tika.config.ServiceLoader#loadServiceProviders can't find the
list of parsers in META-INF/services/org.apache.tika.parser.Parser,
although the file is there. As a result, the DefaultParser is
initialized with an empty parser list, and when parse is called on the
Tika object, getParser(metadata) in
org.apache.tika.parser.CompositeParser#parse returns the EmptyParser.
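One way to narrow this down is to check which class loaders can actually see the service file. The following is only a diagnostic sketch using the standard ClassLoader.getResources call; ServiceFileProbe and visibleServiceFiles are hypothetical names, and which loader Tika really uses for the lookup is not confirmed here:

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper, not part of Tika or ServiceMix.
public final class ServiceFileProbe {

    private ServiceFileProbe() {
    }

    /**
     * Returns every copy of META-INF/services/<serviceName> that the given
     * loader can see, or an empty list if the loader is null.
     */
    public static List<URL> visibleServiceFiles(ClassLoader loader, String serviceName)
            throws IOException {
        if (loader == null) {
            // The thread context class loader may be unset inside a container.
            return Collections.emptyList();
        }
        String resource = "META-INF/services/" + serviceName;
        return Collections.list(loader.getResources(resource));
    }

    public static void main(String[] args) throws IOException {
        String service = "org.apache.tika.parser.Parser";
        ClassLoader own = ServiceFileProbe.class.getClassLoader();
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        System.out.println("own loader:     " + visibleServiceFiles(own, service));
        System.out.println("context loader: " + visibleServiceFiles(context, service));
    }
}
```

Run from inside the bundle that calls Tika, an empty list for a loader means the file is present in some bundle but not visible through that loader.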
I tried putting the files org.apache.tika.parser.Parser and
org.apache.tika.detect.Detector in my own bundle under the same
location, but it didn't help.
I tried providing Tika with a config file that looks like this:
<properties>
  <mimeTypeRepository resource="/org/apache/tika/mime/tika-mimetypes.xml" magic="false"/>
  <parsers>
    <parser name="parse-pdf" class="org.apache.tika.parser.pdf.PDFParser">
      <mime>application/pdf</mime>
    </parser>
  </parsers>
</properties>
and initialized tika:
tikaConfig = new TikaConfig(configFile);
tika = new Tika(tikaConfig);
But then I got an exception saying that the PDF parser was not found,
although the class is there of course:
org.apache.tika.exception.TikaException: Configured parser class not found: org.apache.tika.parser.pdf.PDFParser
Caused by: java.lang.ClassNotFoundException: org.apache.tika.parser.pdf.PDFParser not found by org.apache.tika.core [314]
    at org.apache.felix.framework.ModuleImpl.findClassOrResourceByDelegation(ModuleImpl.java:787)
    at org.apache.felix.framework.ModuleImpl.access$400(ModuleImpl.java:71)
    at org.apache.felix.framework.ModuleImpl$ModuleClassLoader.loadClass(ModuleImpl.java:1768)
I also tried installing tika-parsers into ServiceMix, although I'm sure
it's not necessary as the tika-bundle contains them all. That didn't
help either.
I tried installing the regular Maven Tika jars with all their
dependencies into ServiceMix, but I got the same result.
Thank you for any help.
Shalom
--
View this message in context:
http://servicemix.396122.n5.nabble.com/tika-in-servicemix-empty-result-parsers-not-found-tp5546463p5546463.html
Sent from the ServiceMix - User mailing list archive at Nabble.com.
---------------------------------------------
Freeman Fang
FuseSource
Email:[email protected]
Web: fusesource.com
Twitter: freemanfang
Blog: http://freemanfang.blogspot.com