OK Karl, thanks for the tip and the quick response. We will do this and come back with the result.

On Wed, Nov 6, 2013 at 9:28 PM, Karl Wright <[email protected]> wrote:

> Hi Ronny,
>
> One minor thing: you should need to set throttling to 2 ONLY for the
> Windows repository connection, not for AD or Solr.
>
> As for how to debug this issue, first off you should be looking in the
> manifoldcf.log file (or the equivalent). You should see WARN messages from
> the shared file connector under most conditions when there's a service
> interruption. You would probably see "Read timed out" warnings if you
> looked there, since that is what aborted the job run, along with a stack
> trace. However, that's not going to add much information to the analysis
> at this point.
>
> What might be valuable is to determine whether the problem is happening on
> the Windows side or on the Solr side. At this point I can't tell. You
> could, however, create a null output connection, and create a similar job
> that sends its output there, and see if it completes. Can you do this and
> get back to me?
>
> Thanks,
> Karl
>
> On Wed, Nov 6, 2013 at 3:17 PM, Ronny Heylen <[email protected]> wrote:
>
>> Hi,
>> We use Manifoldcf 1.3 and Solr 4.4 to index a shared network drive with
>> several hundred thousand documents.
>> Running a single Manifoldcf job to index the whole drive always gave
>> some kind of error, so to better understand where the problem might
>> be, we made one job to index all *.doc*, another one for *.xls*, another
>> one for *.pdf ...
>> Using the help from the list (thanks!) we set the size limit to 100MB and
>> all jobs succeed (great) except the one for *.pptx
>> The message is
>> Error: Repeated service interruptions - failure processing document: Read
>> timed out
>> We don't find any error in the logs we have searched: solr.log, ...
>> Based on some indications found on the Internet, we have set the Throttling
>> max connections setting to 2 (instead of 10) in 3 places:
>> output connection to SOLR
>> authority connection to the Active Directory
>> repository connection to the windows file share
>> But the problem stays the same.
>> We have tried on another machine with SOLR 4.5 and Manifoldcf 1.4, same
>> problem.
>> We can let the job run for all *.PDF, or all *.DOC*, or all *.XLS*
>> without problem, but the same message always comes for *.PPTX.
>> The last time the job stopped with the message, it displayed (not the same
>> numbers for each run, as the windows drive is changing) 56311 documents,
>> with 17466 busy and 38847 processed.
>> As we don't find anything in the log (but probably we are not looking in the
>> correct place), we don't know what to do.
>> Thanks for your help,
>> Ronny and Frédéric
