I need to crawl some SharePoint 2010 site collections that contain 150 GB of documents. I will have filters in place for the types of documents that need to be crawled (mostly Office documents).
I am now trying to configure the ManifoldCF job, but the only way to keep it from freezing with "Aborted - service interruptions" errors is to set the maximum connections on the repository connection to 2. The output connection is currently the Null output, and I am running the multiprocess file example (on Jetty, not Tomcat).

However, this is too slow: it takes 5 hours to process a test site collection of 1200 documents that together total less than 500 MB. What can I do to improve the speed? Are there settings that I may have missed or configured incorrectly? Thank you!
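
For reference, here is a rough back-of-the-envelope sketch of what the test-run rate would mean for the full crawl. It uses only the figures above (1200 documents, under 500 MB, 5 hours, 150 GB in total). It assumes that throughput stays constant and scales with data volume, and that 1 GB = 1024 MB; neither assumption has been verified, and the variable names are only illustrative.

```python
# Figures taken from the test run described above.
test_docs = 1200
test_size_mb = 500        # upper bound: the test set is "less than 500 MB"
test_hours = 5
total_size_gb = 150

docs_per_hour = test_docs / test_hours       # documents processed per hour
mb_per_hour = test_size_mb / test_hours      # at most this many MB per hour

# Assumption: throughput stays constant and scales with data volume.
total_hours = total_size_gb * 1024 / mb_per_hour
total_days = total_hours / 24

print(f"{docs_per_hour:.0f} docs/hour, at most {mb_per_hour:.0f} MB/hour")
print(f"Estimated full crawl: at least {total_hours:.0f} hours (~{total_days:.0f} days)")
```

Because the test set is smaller than 500 MB, the real MB/hour figure is lower than the one computed here, so the estimate is a lower bound on the full crawl time.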
