Re: Opinions on ExtractingRequestHandler
On 08/02/2018 11:47, Frederik Van Hoyweghen wrote:
> Hey everyone,
>
> What are your experiences with using Solr's ExtractingRequestHandler in production?
>
> I've been reading some mixed remarks, so I was wondering what your actual experiences with it are.
>
> Personally, I feel like setting up a separate service which is solely responsible for parsing file contents with Tika (to be indexed by Solr later on in the process) is a safer approach, so we can use whatever Tika version we want along with other things we might want to add.

Yes, do this. It's entirely possible to bring down Tika with a nasty PDF, or to end up consuming lots of resources in the extraction step and have that impact your Solr server. Run it separately and you can monitor it, or kill it if necessary.

You might like my colleague Matt Pearce's DropWizard wrapper for Tika: https://github.com/mattflax/dropwizard-tika-server

Cheers
Charlie

> Looking forward to your response!
>
> Kind regards,
> Frederik

--
Charlie Hull
Flax - Open Source Enterprise Search
tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828
web: www.flax.co.uk
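A minimal sketch of the "run it separately" approach, assuming a stock Apache Tika Server listening on its default port 9998 and its `PUT /tika` endpoint, which returns the extracted plain text. The DropWizard wrapper linked above may expose different endpoints, so the URL is an assumption to check against its README. The function names `build_tika_request` and `extract_text` are hypothetical; the point is that parsing happens in another process, and the indexing side only waits a bounded time for each answer.

```python
import urllib.request

# Assumption: a stock Apache Tika Server on its default port. The DropWizard
# wrapper mentioned in the thread may use different paths.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(data: bytes, url: str = TIKA_URL) -> urllib.request.Request:
    """Build a PUT request asking Tika Server for a document's plain text."""
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={
            # octet-stream leaves file-type detection to Tika itself
            "Content-Type": "application/octet-stream",
            "Accept": "text/plain",
        },
    )


def extract_text(data: bytes, timeout_s: float = 30.0, url: str = TIKA_URL) -> str:
    """Send one file to the separate Tika process and return its text.

    timeout_s bounds each blocking socket operation (connect, read), so a
    parser stuck on a nasty PDF surfaces here as an exception instead of
    tying up the Solr JVM. The caller can log it, skip the file, or restart Tika.
    """
    request = build_tika_request(data, url)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.read().decode("utf-8", errors="replace")
```

A caller would typically wrap `extract_text` in `try`/`except OSError`, which covers both `urllib.error.URLError` and socket timeouts, and move on to the next file when Tika fails or hangs.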
Re: Opinions on ExtractingRequestHandler
Frederik,

We have also used a separate service, which uses Tika and then SolrJ to index the content. The main reason we went for this approach is the flexibility to manipulate/transform data over and above what Tika does.

My understanding is that, if no other transformation is needed, ExtractingRequestHandler should be fine in production too.

Regards,
Sreenivas

On 8 February 2018 at 17:17, Frederik Van Hoyweghen <frederik.vanhoyweg...@chapoo.com> wrote:

> Hey everyone,
>
> What are your experiences with using Solr's ExtractingRequestHandler in production?
>
> I've been reading some mixed remarks, so I was wondering what your actual experiences with it are.
>
> Personally, I feel like setting up a separate service which is solely responsible for parsing file contents with Tika (to be indexed by Solr later on in the process) is a safer approach, so we can use whatever Tika version we want along with other things we might want to add.
>
> Looking forward to your response!
>
> Kind regards,
> Frederik
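The pattern Sreenivas describes (extract with Tika, transform, then index) could look roughly like the sketch below. It swaps SolrJ for Solr's JSON `/update` endpoint over plain HTTP so it needs no dependencies, and assumes the text has already come back from a separate Tika service. The collection name `docs`, the fields `id`/`title`/`content`, and the `to_solr_doc` transformation are all hypothetical.

```python
import json
import urllib.request

# Assumptions: Solr on localhost:8983 with a hypothetical collection "docs"
# whose schema defines "id", "title" and "content" fields.
SOLR_UPDATE_URL = "http://localhost:8983/solr/docs/update"


def to_solr_doc(doc_id: str, filename: str, text: str) -> dict:
    """Hypothetical transformation done outside Solr, after Tika extraction:
    collapse whitespace and derive a title from the file name."""
    content = " ".join(text.split())
    title = filename.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    return {"id": doc_id, "title": title, "content": content}


def build_update_request(docs: list, url: str = SOLR_UPDATE_URL,
                         commit: bool = False) -> urllib.request.Request:
    """Build a POST of a JSON array of documents to Solr's update handler."""
    if commit:
        url += "?commit=true"
    return urllib.request.Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def index_docs(docs: list, timeout_s: float = 30.0) -> int:
    """Send the documents to Solr and return the HTTP status code
    (urlopen raises HTTPError for non-2xx responses)."""
    request = build_update_request(docs, commit=True)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.status
```

Committing on every call is only for illustration; a production indexer would more likely rely on `commitWithin` or the server's `autoCommit` settings.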
Opinions on ExtractingRequestHandler
Hey everyone,

What are your experiences with using Solr's ExtractingRequestHandler in production?

I've been reading some mixed remarks, so I was wondering what your actual experiences with it are.

Personally, I feel like setting up a separate service which is solely responsible for parsing file contents with Tika (to be indexed by Solr later on in the process) is a safer approach, so we can use whatever Tika version we want along with other things we might want to add.

Looking forward to your response!

Kind regards,
Frederik
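For comparison, the in-Solr approach the question asks about sends the raw file to Solr and lets the ExtractingRequestHandler (Solr Cell) run Tika inside the Solr JVM. A minimal sketch, assuming the handler is configured at `/update/extract` (as in Solr's sample techproducts configset) on a hypothetical collection `docs`. `literal.id` supplies the document's unique key and `commit=true` makes it visible straight away; the helper names are hypothetical.

```python
import urllib.parse
import urllib.request

# Assumptions: Solr on localhost:8983, a hypothetical collection "docs", and
# the ExtractingRequestHandler configured at /update/extract.
EXTRACT_URL = "http://localhost:8983/solr/docs/update/extract"


def build_extract_request(data: bytes, doc_id: str, content_type: str,
                          url: str = EXTRACT_URL) -> urllib.request.Request:
    """POST raw file bytes to Solr; Tika then runs inside Solr itself."""
    params = urllib.parse.urlencode({"literal.id": doc_id, "commit": "true"})
    return urllib.request.Request(
        f"{url}?{params}",
        data=data,
        method="POST",
        headers={"Content-Type": content_type},
    )


def extract_and_index(data: bytes, doc_id: str, content_type: str,
                      timeout_s: float = 60.0) -> int:
    """Send one file and return the HTTP status. The timeout only stops this
    client waiting; the parse itself runs, and may keep running, in Solr."""
    request = build_extract_request(data, doc_id, content_type)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.status
```

The client code is shorter than the separate-service version, but extraction cost and failures land on the Solr server, which is the trade-off the replies discuss.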