Hi, we are running Solr 4.4 and ManifoldCF 1.3.
We are indexing a shared Windows network drive, filtering on *.doc*, *.xls*, *.pdf ..., with about 650,000 files to index, which gives a Solr index 35 GB in size. The result is great, except that the ManifoldCF job crashes before the end.

Note that:
- ignoreTikaException is true in solrconfig.xml (otherwise the ManifoldCF job stops very early).
- Tomcat has been given 24 GB of memory (it uses 15 GB).
- There are 8 cores.

The message in http://localhost:8080/mcf-crawler-ui/showjobstatus.jsp is:

Error: Repeated service interruptions - failure processing document: Server at http://localhost:8080/solr/collection1 returned non ok status:500, message:Internal Server Error

Then, instead of indexing the full drive in one job, we defined one job for each subfolder. Almost all "subfolder" jobs end successfully; only 2 or 3 give the same message, and 2 or 3 others give a different message:

Error: Repeated service interruptions - failure processing document: Read timed out

If we go further (defining one job for each subfolder of a subfolder in error), the same thing happens: success for almost all subfolders except 1 or 2.

What is the first step to take to solve this problem?

Thanks.
