On 3/25/2016 5:44 AM, Moncif Aidi wrote:
> I'm using Solr 5.4.1 to index thousands of documents, and it works
> perfectly. The issue comes when some documents are not well formatted or
> contain special characters: Solr hangs or gets blocked on those
> particular documents, and it gives these errors when viewing the log:
> I want to detect which files are causing these problems, or at least be
> pointed to any library I'm missing. Thanks in advance
Tika is known for problems like this, particularly with PDF and Microsoft Office documents. This is one of the hazards of running Tika inside Solr through the Extracting Request Handler: you can't get any useful information out of Solr about what went wrong, and a severe problem in Tika can even crash Solr entirely.

If you're going to use Tika for production indexing, you should write a Java program using SolrJ and Tika so that you are in complete control, and so that a misbehaving document can't destabilize Solr.

Thanks,
Shawn
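A minimal sketch of the "complete control" part of that advice, assuming you move extraction out of Solr and into your own program. It covers only the isolation step: each file is parsed on its own worker thread with a time limit, and any exception or timeout is recorded against the file name instead of stalling the whole run, which is also how you find out which documents are the problem. The `Extractor` interface and the `SafeExtractor` class are hypothetical names; Tika and SolrJ are deliberately left out so the sketch runs on the plain JDK. In a real indexer the extractor could wrap Tika's facade, for example `new Tika().parseToString(file)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SafeExtractor {

    /**
     * Hypothetical hook for the real parser. In an actual indexer this could
     * wrap Tika, e.g. path -> new Tika().parseToString(new File(path)).
     */
    public interface Extractor {
        String extract(String path) throws Exception;
    }

    /** Outcome for one file: the extracted text, or why it was skipped. */
    public static final class Result {
        public final String path;
        public final String text;  // null when extraction failed
        public final String error; // null when extraction succeeded

        Result(String path, String text, String error) {
            this.path = path;
            this.text = text;
            this.error = error;
        }

        public boolean ok() {
            return error == null;
        }
    }

    // Daemon threads: a parser thread that hangs and ignores interruption
    // cannot keep the JVM alive, and the next file gets a fresh thread.
    private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "extractor");
        t.setDaemon(true);
        return t;
    });

    /** Runs one extraction with a time limit and records any failure. */
    public Result extract(Extractor extractor, String path, long timeoutMs) {
        Future<String> future = pool.submit(() -> extractor.extract(path));
        try {
            return new Result(path, future.get(timeoutMs, TimeUnit.MILLISECONDS), null);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; a stuck parser may ignore this
            return new Result(path, null, "timed out after " + timeoutMs + " ms");
        } catch (ExecutionException e) {
            // Wraps anything the parser threw, including Errors such as StackOverflowError.
            return new Result(path, null, String.valueOf(e.getCause()));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return new Result(path, null, "interrupted");
        }
    }

    /** Extracts every file, logging the ones that fail instead of stopping. */
    public List<Result> extractAll(Extractor extractor, List<String> paths, long timeoutMs) {
        List<Result> results = new ArrayList<>();
        for (String path : paths) {
            Result r = extract(extractor, path, timeoutMs);
            if (!r.ok()) {
                System.err.println("SKIPPED " + path + ": " + r.error);
            }
            results.add(r);
        }
        return results;
    }
}
```

The successful results would then be turned into documents and sent to Solr with SolrJ, while the "SKIPPED" lines give you the list of troublesome files. Note that Java cannot forcibly stop a thread that ignores interruption, so if a document sends Tika into a truly endless loop, running extraction in a separate process is the more robust variant of the same idea.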