I need to crawl some SharePoint 2010 site collections that contain 150 GB
of documents. I will have filters in place for the types of documents that
need to be crawled (mostly Office documents).

I am now trying to configure the ManifoldCF job, but the only way to keep it
from freezing with "Aborted - service interruptions" errors is to limit the
Repository connection to a maximum of 2 connections. The Output connection
is currently Null. I am running the multiprocess file example (on Jetty, not
Tomcat).
However, this is too slow: it takes 5 hours to process a test site
collection with 1200 docs that together total less than 500 MB.
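
To put those numbers in perspective, here is a rough back-of-the-envelope
estimate. It is only a sketch: it treats 500 MB as an upper bound for the
test set and assumes (probably too simply) that throughput would scale
linearly to the full 150 GB, ignoring whatever the document filters exclude.

```python
# Rough throughput estimate from the test crawl figures quoted above.
docs = 1200
size_mb = 500      # upper bound: the test set is "less than 500 MB"
hours = 5

seconds = hours * 3600
secs_per_doc = seconds / docs            # average time spent per document
kb_per_sec = size_mb * 1024 / seconds    # average transfer rate (at most)

# Assumption: the same rate holds for the whole 150 GB, with no filtering.
full_gb = 150
full_hours = full_gb * 1024 / size_mb * hours
full_days = full_hours / 24

print(f"{secs_per_doc:.1f} s per document")
print(f"{kb_per_sec:.1f} KB/s at most")
print(f"about {full_days:.0f} days for {full_gb} GB at this rate")
```

So at roughly 15 seconds per document, a full crawl at this rate would run
for about two months, which is clearly not workable.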

What can I do to improve the speed? Are there settings that I may have
missed or configured incorrectly?

Thank you!
