On 07/03/2018 09:32, lala wrote:
Thanks for your reply Erick,

Actually I am using SolrJ to index files, among other operations with Solr,
but to index a large number of different kinds of files, I'm sending a DIH
request to Solr using the SolrJ API: FileListEntityProcessor with
Why not benefit from this technology if Solr offers it? It simplifies our
work tremendously...

It may simplify your work, but it isn't good practice. Tika has some heavy lifting to do to extract text from some formats and you should consider how this load will affect Solr. We've often put Tika into a different process for this reason.
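A minimal sketch of the out-of-process approach, assuming Java 11+ and a standalone tika-app jar somewhere on disk. The jar location and the class and method names (`TikaOutOfProcess`, `buildCommand`, `extractText`) are illustrative only and are not part of Solr or SolrJ. If a document makes Tika hang or run out of memory, only the child JVM is lost, not Solr. The extracted text would then be sent to Solr as an ordinary field with SolrJ, which is not shown here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Sketch: run Tika in a child JVM (via the standalone tika-app jar) so heavy
// or pathological documents cannot exhaust the memory or CPU of the Solr JVM.
class TikaOutOfProcess {

    // Command line for tika-app: plain-text output, UTF-8 encoded.
    // ASSUMPTION: the caller knows where a tika-app jar lives on disk.
    static List<String> buildCommand(Path tikaAppJar, Path document) {
        return List.of("java", "-jar", tikaAppJar.toString(),
                "--text", "--encoding=UTF-8", document.toString());
    }

    // Extract text from one document, killing the child JVM if it runs too long.
    static String extractText(Path tikaAppJar, Path document, long timeoutSeconds)
            throws IOException, InterruptedException {
        // Send stdout to a temp file so a large output cannot fill the pipe
        // buffer and stall the child process while we wait on it.
        Path out = Files.createTempFile("tika-", ".txt");
        try {
            Process p = new ProcessBuilder(buildCommand(tikaAppJar, document))
                    .redirectOutput(out.toFile())
                    .redirectError(ProcessBuilder.Redirect.DISCARD)
                    .start();
            if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                p.destroyForcibly();
                throw new IOException("Tika timed out on " + document);
            }
            if (p.exitValue() != 0) {
                throw new IOException("Tika exited with code " + p.exitValue() + " on " + document);
            }
            return Files.readString(out, StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(out);
        }
    }
}
```

Starting a new JVM per file costs some start-up time; for large batches a long-running Tika Server process is a common alternative that keeps the same isolation from Solr.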

Isn't there any way to extract inline images from PDF docs?

https://stackoverflow.com/questions/31303735/how-to-extract-images-from-a-file-using-apache-tika has some useful suggestions.


Waiting for your reply, best regards...


Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk
