Re: Opinions on ExtractingRequestHandler

2018-02-08 Thread Charlie Hull

On 08/02/2018 11:47, Frederik Van Hoyweghen wrote:

> Hey everyone,
>
> What are your experiences with using Solr's ExtractingRequestHandler in
> production?
>
> I've been reading some mixed remarks so I was wondering what your actual
> experiences with it are.
>
> Personally, I feel like setting up a separate service which is solely
> responsible for parsing file contents (to be indexed by Solr later on in
> the process) using Tika is a safer approach, so we can use whatever Tika
> version we want along with other things we might want to add.


Yes, do this. It's entirely possible to bring down Tika with a nasty 
PDF, or end up consuming lots of resources in the extraction step and 
have these impact your Solr server. Run it separately and you can 
monitor it/kill it if necessary.
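
A minimal sketch of the "run it separately and kill it if necessary" idea, assuming the extractor can be started as a command-line process. The function name `run_extractor` and the timeout value are hypothetical, and the Tika command in the comment assumes a locally downloaded tika-app jar whose file name will differ per version.

```python
import subprocess


def run_extractor(cmd, timeout_s):
    """Run an extraction command in its own process and return its stdout.

    Returns None if the command times out or exits with a non-zero status,
    so a bad document cannot stall or crash the caller.
    """
    try:
        # subprocess.run kills the child and waits for it before raising
        # TimeoutExpired, so no runaway extractor is left behind.
        done = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return None
    if done.returncode != 0:
        return None
    return done.stdout


# Hypothetical usage (jar name and file path are assumptions):
# text = run_extractor(
#     ["java", "-jar", "tika-app.jar", "--text", "report.pdf"], timeout_s=60
# )
```

Because the extraction runs in a child process, a hang or out-of-memory failure there stays outside the Solr JVM, and the caller can simply skip or retry the document.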


You might like my colleague Matt Pearce's DropWizard wrapper for Tika:
https://github.com/mattflax/dropwizard-tika-server


Cheers

Charlie


> Looking forward to your response!
>
> Kind regards,
> Frederik




--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: Opinions on ExtractingRequestHandler

2018-02-08 Thread Sreenivas.T
Frederik,

We have also used a separate service, which uses Tika and then SolrJ to
index the content.
The main reason we went for this approach is the flexibility to
manipulate/transform data over and above what Tika does.

My understanding is that if no other transformation is needed,
ExtractingRequestHandler should be fine in production too.
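
For illustration, the extract, transform, then index flow described above could be sketched as follows. This is not the SolrJ code referred to in the message: it swaps in Solr's JSON update endpoint over plain HTTP so the example stays self-contained. The field names `id` and `content_txt`, the collection name, and the helper names are all assumptions to adapt to your own schema.

```python
import json
import re
import urllib.request


def build_doc(doc_id, extracted_text):
    """Transform extracted text before indexing: collapse whitespace runs."""
    cleaned = re.sub(r"\s+", " ", extracted_text).strip()
    # "id" and "content_txt" are hypothetical field names; match your schema.
    return {"id": doc_id, "content_txt": cleaned}


def build_update_request(solr_url, collection, docs):
    """Build (but do not send) a JSON update request for a Solr collection."""
    url = f"{solr_url}/solr/{collection}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )


def index_docs(solr_url, collection, docs, timeout_s=30):
    """Send the documents to Solr; urlopen raises on HTTP error statuses."""
    request = build_update_request(solr_url, collection, docs)
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return resp.status


# Hypothetical usage against a local Solr instance:
# doc = build_doc("report-1", text_from_tika)
# index_docs("http://localhost:8983", "documents", [doc])
```

Keeping the transformation step (`build_doc` here) in your own service is where the extra flexibility comes from: any cleanup or enrichment happens before Solr ever sees the document.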

Regards,
Sreenivas

On 8 February 2018 at 17:17, Frederik Van Hoyweghen <
frederik.vanhoyweg...@chapoo.com> wrote:

> Hey everyone,
>
> What are your experiences with using Solr's ExtractingRequestHandler in
> production?
>
> I've been reading some mixed remarks so I was wondering what your actual
> experiences with it are.
>
> Personally, I feel like setting up a separate service which is solely
> responsible for parsing file contents (to be indexed by Solr later on in
> the process) using Tika is a safer approach, so we can use whatever Tika
> version we want along with other things we might want to add.
>
> Looking forward to your response!
>
> Kind regards,
> Frederik
>


Opinions on ExtractingRequestHandler

2018-02-08 Thread Frederik Van Hoyweghen
Hey everyone,

What are your experiences with using Solr's ExtractingRequestHandler in
production?

I've been reading some mixed remarks so I was wondering what your actual
experiences with it are.

Personally, I feel like setting up a separate service which is solely
responsible for parsing file contents (to be indexed by Solr later on in
the process) using Tika is a safer approach, so we can use whatever Tika
version we want along with other things we might want to add.

Looking forward to your response!

Kind regards,
Frederik