On 3/25/2016 5:44 AM, Moncif Aidi wrote:
> I'm using Solr 5.4.1 for indexing thousands of documents, and it works
> perfectly. The issue comes when some documents are not well formatted or
> contain special characters, which makes Solr hang or block on those
> particular documents, and it gives these errors when viewing the log:
> I want to detect which files are causing these problems, or at least be
> pointed to some library I'm missing. Thanks in advance.

Tika is known for problems like this, particularly with PDF and
Microsoft Office documents.

This is one of the hazards of running with the Tika application built
into Solr's Extracting Request Handler.  You can't get any good
information out of Solr about what went wrong, and any severe problems
with Tika might actually cause Solr to completely crash.

If you're going to use Tika for production indexing, you should write a
Java program using SolrJ and Tika so that you are in complete control,
and so that a Tika failure can't take Solr down with it.
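
As a rough illustration of what "in complete control" could look like,
here is a minimal sketch that uses only the JDK. It runs an extraction
step on each file in its own thread with a time limit and records which
files failed or hung, which is the information Solr cannot give you.
The names SafeExtractor, Extractor and findBadFiles are hypothetical,
made up for this example. The Extractor hook is where a Tika call (for
instance the Tika facade's parseToString) would go, and the success
branch is where you would build a SolrInputDocument and send it with
SolrJ. Neither library is called here, so treat this as a starting
point under those assumptions, not a finished indexer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class SafeExtractor {

    /**
     * Hypothetical hook for the extraction step. In a real indexer this is
     * where Tika would be called (assumption: not part of this sketch).
     */
    public interface Extractor {
        String extract(String path) throws Exception;
    }

    /**
     * Runs the extractor on every path with a per-file time limit and
     * returns the paths that threw an exception or did not finish in time.
     */
    public static List<String> findBadFiles(List<String> paths,
                                            Extractor extractor,
                                            long timeoutMillis) {
        List<String> bad = new ArrayList<>();
        for (String path : paths) {
            // One daemon worker per file, so a parse that never returns
            // cannot block later files or keep the JVM alive at exit.
            ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, "extract-" + path);
                t.setDaemon(true);
                return t;
            });
            Callable<String> task = () -> extractor.extract(path);
            Future<String> result = pool.submit(task);
            try {
                String text = result.get(timeoutMillis, TimeUnit.MILLISECONDS);
                // Assumption: here you would send the text to Solr via SolrJ.
                System.out.println("OK " + path + " (" + text.length() + " chars)");
            } catch (TimeoutException e) {
                // The parse is taking too long: interrupt it and move on.
                result.cancel(true);
                bad.add(path);
                System.err.println("TIMEOUT " + path);
            } catch (Exception e) {
                // Covers failures thrown by the extractor (wrapped in an
                // ExecutionException) and interruption of this thread.
                bad.add(path);
                System.err.println("FAILED " + path + ": " + e);
            } finally {
                pool.shutdownNow();
            }
        }
        return bad;
    }
}
```

Calling findBadFiles with your file list, an Extractor that wraps your
Tika call, and a timeout of your choosing gives back the list of
documents to inspect by hand. Note that a thread-level timeout only
helps if the parser responds to interruption or eventually returns. A
parse that brings down the whole JVM would need to be run in a separate
process instead.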

Thanks,
Shawn
