Thank you, Freeman.
You helped me identify the problem.
In the end I'm sure that tika-bundle-1.0 is not working. I switched to
tika-bundle-0.9 and it works in ServiceMix.
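
For anyone who finds this thread later: the SPI background Freeman
describes below comes down to which class loader is asked for the
META-INF/services files. Here is a minimal sketch of that idea using only
the plain JDK, with no Tika or OSGi involved. The class and method names
(SpiLookupSketch, providerNames, withContextClassLoader) are made up for
illustration. This is not the ServiceMix OSGiLocator, and it is not a
verified fix for tika-bundle-1.0.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical helper class, for illustration only.
final class SpiLookupSketch {

    /**
     * Lists the provider class names that the given class loader can see
     * for a service, by reading META-INF/services/<serviceName> files.
     * A loader that cannot see the file yields an empty list, which is
     * the symptom described in this thread.
     */
    static List<String> providerNames(ClassLoader loader, String serviceName)
            throws IOException {
        List<String> names = new ArrayList<>();
        Enumeration<URL> files =
                loader.getResources("META-INF/services/" + serviceName);
        while (files.hasMoreElements()) {
            URL file = files.nextElement();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(file.openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Strip comments and whitespace, as provider files allow both.
                    int hash = line.indexOf('#');
                    if (hash >= 0) {
                        line = line.substring(0, hash);
                    }
                    line = line.trim();
                    if (!line.isEmpty()) {
                        names.add(line);
                    }
                }
            }
        }
        return names;
    }

    /**
     * Runs an action with the thread context class loader temporarily
     * replaced, then restores the previous loader. Assumption: the code
     * being called consults the context class loader for its lookups.
     */
    static <T> T withContextClassLoader(ClassLoader loader, Callable<T> action)
            throws Exception {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        thread.setContextClassLoader(loader);
        try {
            return action.call();
        } finally {
            thread.setContextClassLoader(previous);
        }
    }
}
```

Calling providerNames for org.apache.tika.parser.Parser with two different
class loaders would show whether a given loader can see the provider file
at all. Whether swapping the context class loader helps depends on whether
the library actually consults it, which is an assumption here.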

Thank you.


On 03/08/12 08:04, Freeman Fang wrote:
> Hi,
>
> It seems you have hit a common issue when using SPI in OSGi; you can take a
> look at [1] and [2] for more background on this issue.
>
> More importantly, in ServiceMix we use OSGiLocator to resolve this kind
> of issue; take a look at Guillaume's blog [3] about it.
>
> The smx OSGiLocator code is in [4]. I think you need to leverage the smx
> OSGiLocator to find the service providers.
>
>
> [1]http://www.osgi.org/blog/2008/09/spi-in-osgi.html
> [2]http://blog.tfd.co.uk/2011/12/13/osgi-and-spi/
> [3]http://gnodet.blogspot.com/2008/05/jee-specs-in-osgi.html
> [4]https://svn.apache.org/repos/asf/servicemix/smx4/specs/trunk/locator/
>
> Freeman
> On 2012-3-8, at 1:02 PM, shin938 wrote:
>
>> Hi all,
>> I have also sent this to the Tika mailing list, but I have a feeling it
>> may be a ServiceMix issue.
>> We are using Tika in a web app to extract text from PDF documents with
>> no problem.
>> But now we want to move some of our code to run in ServiceMix, and Tika
>> no longer works there: it just produces empty results. There is no
>> exception or anything, just an empty result.
>> I think it all comes down to some classpath issue with ServiceMix that I
>> don't understand how to fix.
>>
>> I installed tika-bundle-1.0 and tika-core-1.0 into ServiceMix.
>> I'm invoking Tika from my bundle, which is actually a Camel route.
>>
>> This is the code I tried:
>>
>>            Tika tika = new Tika();
>>            // -1 means no limit on the length of strings returned by parseToString
>>            tika.setMaxStringLength(-1);
>>            Metadata metadata = new Metadata();
>>            // Parse the PDF and get a Reader over the extracted text
>>            Reader in = tika.parse(new FileInputStream("/myopt/books/ejb-3_0-fr-spec-persistence.pdf"), metadata);
>>
>>            // Copy the extracted text to a file
>>            org.apache.commons.io.IOUtils.copy(in, new FileOutputStream("/tmp/tikatest/tika-text.txt"));
>>
>>
>> Running the same code fragment outside of ServiceMix produces the 13000
>> lines of text.
>>
>>
>> Debugging the code, I found that
>> org.apache.tika.config.ServiceLoader#loadServiceProviders can't find the
>> list of parsers in
>> META-INF/services/org.apache.tika.parser.Parser, although the file is
>> there. As a result, the DefaultParser is initialized with an empty parser
>> list, and when parse is called on the Tika object,
>> getParser(metadata) in org.apache.tika.parser.CompositeParser#parse
>> returns the EmptyParser.
>>
>> I tried putting the files org.apache.tika.parser.Parser and
>> org.apache.tika.detect.Detector in my own bundle under the same location,
>> but that didn't help.
>>
>> I tried to provide Tika with a Tika config file that looks like this:
>> <properties>
>>
>>    <mimeTypeRepository resource="/org/apache/tika/mime/tika-mimetypes.xml" magic="false"/>
>>
>>    <parsers>
>>        <parser name="parse-pdf" class="org.apache.tika.parser.pdf.PDFParser">
>>                <mime>application/pdf</mime>
>>        </parser>
>>    </parsers>
>>
>> </properties>
>>
>> and initialized Tika with it:
>> tikaConfig = new TikaConfig(configFile);
>> tika = new Tika(tikaConfig);
>>
>> But then I got an exception saying that the PDF parser was not found,
>> although the class is there, of course:
>> org.apache.tika.exception.TikaException: Configured parser class not found: org.apache.tika.parser.pdf.PDFParser
>> Caused by: java.lang.ClassNotFoundException: org.apache.tika.parser.pdf.PDFParser not found by org.apache.tika.core [314]
>>        at org.apache.felix.framework.ModuleImpl.findClassOrResourceByDelegation(ModuleImpl.java:787)
>>        at org.apache.felix.framework.ModuleImpl.access$400(ModuleImpl.java:71)
>>        at org.apache.felix.framework.ModuleImpl$ModuleClassLoader.loadClass(ModuleImpl.java:1768)
>>
>>
>> I also tried installing tika-parsers into ServiceMix, although I'm sure
>> it's not necessary as the tika-bundle contains them all. That didn't help
>> either.
>>
>> I tried installing the regular Maven Tika jars with all their
>> dependencies into ServiceMix, but I got the same result.
>>
>>
>> Thank you for any help.
>>
>> Shalom
>>
>
> ---------------------------------------------
> Freeman Fang
>
> FuseSource
> Email:[email protected]
> Web: fusesource.com
> Twitter: freemanfang
> Blog: http://freemanfang.blogspot.com
