Hi Madalina,

1200 documents is quite small, and MCF would typically crawl that many in a few minutes at most. My guess is that some of those 1200 documents are very large, and that this may be causing a problem with a network switch somewhere. If you look at the ManifoldCF log file, you may get some insight into what the issue is.
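To put numbers on how slow this is, here is a rough back-of-the-envelope calculation using only the figures from your message. Two assumptions are mine, not yours: "less than 500MB" is treated as exactly 500 MB (an upper bound), and "a few minutes" is represented by a hypothetical 5-minute crawl.

```python
# Rough throughput arithmetic from the figures in the original message.
# Assumptions (illustrative only): total size is taken as its 500 MB upper
# bound, and TARGET_MINUTES = 5 is a stand-in for "a few minutes".

DOCS = 1200
TOTAL_MB = 500          # upper bound on the combined document size
OBSERVED_HOURS = 5      # observed crawl time with 2 max connections
TARGET_MINUTES = 5      # hypothetical "few minutes" crawl, for comparison

observed_seconds = OBSERVED_HOURS * 3600
seconds_per_doc = observed_seconds / DOCS
kb_per_second = TOTAL_MB * 1000 / observed_seconds  # decimal units: 1 MB = 1000 KB
slowdown = observed_seconds / (TARGET_MINUTES * 60)

print(f"{seconds_per_doc:.1f} s per document")
print(f"at most {kb_per_second:.1f} KB/s")
print(f"{slowdown:.0f}x slower than a {TARGET_MINUTES}-minute crawl")
```

That works out to about 15 seconds per document and well under 30 KB/s, which is far too slow to be explained by document volume alone.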
Karl

On Mon, Mar 9, 2015 at 8:28 AM, Madalina R <[email protected]> wrote:
> I need to crawl some SharePoint 2010 site collections that contain 150 GB
> of documents. I will have filters in place for the types of documents that
> need to be crawled (mostly Office documents).
>
> I am now trying to configure the Manifold job, but the only way for it to
> not trigger "Aborted - service interruptions" errors and freeze is to
> have 2 maximum connections on the Repository connection. The Output is
> currently Null. I am running the multiprocess file example process (on
> Jetty, not Tomcat).
> However this is too slow, it takes 5 hours to process a test site
> collection with 1200 docs that together are less than 500MB.
>
> What can I do to improve the speed? Are there some settings that I am
> maybe missing to configure correctly?
>
> Thank you!
