Re: jackrabbit, lucene, tika ... and pdfbox

Kevin Jansz Thu, 10 Mar 2011 17:09:51 -0800

> Your view is correct. The idea is to avoid direct parser class references in
> jackrabbit-core and just rely on the service provider loader mechanism in
> Tika to pick up all the available parsers.
>
> We also decided to move the tika-parsers dependency from jackrabbit-core to
> deployment packages like jackrabbit-webapp and jackrabbit-standalone. This
> should make it even easier for people to set up custom deployments with few
> or no parser libraries.


That's brilliant, thanks for clarifying.

Regards,
Kevin

--
Kevin Jansz
Level 7, 10-16 Queen Street, Melbourne 3000 Australia
Tel +61 3 9621 2773 | Fax +61 3 9621 2776
Exari Systems
Boston | London | Melbourne | Munich
www.exari.com

Test drive our software online - www.exari.com/demo-trial.html
Read our blog on document assembly - blog.exari.com




On 10 March 2011 20:27, Jukka Zitting wrote:
> Hi,
>
> On 03/09/2011 04:51 AM, Kevin Jansz wrote:
>>
>> It's not a huge issue, I guess, as it seems that with tika 0.9 (or
>> 0.8.1?) the PDF parser issue will be resolved, in which case I expect
>> the code in org.apache.jackrabbit.core.query.pdf.* to disappear, along
>> with the reference to it from the tika-config.xml.
>
> Yes, that's what we've already done in trunk.
>
>> I'm taking the time to mention it here in case it saves someone time,
>> and also to gauge whether our view of lucene, tika and the parsers is
>> incorrect, i.e. whether future releases of jackrabbit may still include
>> parsers other than DefaultParser and EmptyParser in its
>> tika-config.xml.
>
> Your view is correct. The idea is to avoid direct parser class references in
> jackrabbit-core and just rely on the service provider loader mechanism in
> Tika to pick up all the available parsers.
>
> We also decided to move the tika-parsers dependency from jackrabbit-core to
> deployment packages like jackrabbit-webapp and jackrabbit-standalone. This
> should make it even easier for people to set up custom deployments with few
> or no parser libraries.
>
> --
> Jukka Zitting
>
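
A minimal sketch of the service-provider idea Jukka describes, assuming plain `java.util.ServiceLoader` rather than Tika's own loading code. `SimpleParser`, `PlainTextParser` and `ParserRegistry` are hypothetical names used only for illustration; they are not Tika or Jackrabbit classes. The registry never references a concrete parser class. It only picks up whatever implementations are listed under `META-INF/services` on the classpath, so a deployment with no parser jars ends up with an empty registry that returns empty text.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.ServiceLoader;
import java.util.Set;

// Hypothetical parser SPI, standing in for Tika's real Parser interface.
interface SimpleParser {
    // Media types this parser can handle, e.g. "text/plain".
    Set<String> supportedTypes();

    // Extract indexable text from the raw document bytes.
    String extractText(byte[] content);
}

// Hypothetical example provider. In a real deployment it would be a public
// class in a separate parser jar, named in META-INF/services/SimpleParser.
class PlainTextParser implements SimpleParser {
    @Override
    public Set<String> supportedTypes() {
        return Collections.singleton("text/plain");
    }

    @Override
    public String extractText(byte[] content) {
        return new String(content, StandardCharsets.UTF_8);
    }
}

// "Core" side: it has no compile-time reference to any concrete parser and
// only asks ServiceLoader for the SimpleParser implementations it can find.
class ParserRegistry {
    private final Map<String, SimpleParser> byType = new HashMap<>();

    static ParserRegistry fromClasspath() {
        ParserRegistry registry = new ParserRegistry();
        for (SimpleParser parser : ServiceLoader.load(SimpleParser.class)) {
            registry.register(parser);
        }
        return registry;
    }

    // The first parser registered for a given media type wins.
    void register(SimpleParser parser) {
        for (String type : parser.supportedTypes()) {
            byType.putIfAbsent(type, parser);
        }
    }

    // Returns "" when no parser handles the type, similar in spirit to an
    // EmptyParser fallback.
    String extract(String mediaType, byte[] content) {
        SimpleParser parser = byType.get(mediaType);
        return parser == null ? "" : parser.extractText(content);
    }

    int supportedTypeCount() {
        return byType.size();
    }
}
```

With a jar on the classpath containing a public `PlainTextParser` and a `META-INF/services/SimpleParser` file naming it, `fromClasspath()` would register that parser without any change to the registry code. That separation is roughly what makes it possible to move the parser dependency out of the core module and into the deployment packages.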
