Hi Madalina,

1200 documents is quite small, and MCF would typically crawl that in a few
minutes at most.  My guess is that some of those 1200 documents are very
large, and that this is potentially causing a problem with a network switch
somewhere.  If you look at the ManifoldCF log file, you may get some
insight into what the issue is.
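
As a rough sketch of that log check, a small script like the one below could
pull out the lines worth reading first. The log path (logs/manifoldcf.log)
and the keyword list are assumptions, not something confirmed in this thread,
so adjust both to match your installation and the messages you actually see.

```python
from pathlib import Path

# Assumed keywords; adjust to whatever your log actually contains.
KEYWORDS = ("service interruption", "exception", "timeout")


def find_suspect_lines(lines, keywords=KEYWORDS):
    """Return (line_number, text) pairs for lines mentioning any keyword."""
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()  # case-insensitive match
        if any(keyword in lowered for keyword in keywords):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Assumed log location relative to the working directory.
    log_path = Path("logs/manifoldcf.log")
    if log_path.exists():
        with log_path.open(errors="replace") as handle:
            for number, text in find_suspect_lines(handle):
                print(f"{number}: {text}")
    else:
        print(f"No log file at {log_path}; point log_path at your own log.")
```

The matching is a plain substring test, so it will over-report; it is only
meant to narrow down where to start reading.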

Karl


On Mon, Mar 9, 2015 at 8:28 AM, Madalina R <[email protected]> wrote:

> I need to crawl some SharePoint 2010 site collections that contain 150 GB
> of documents. I will have filters in place for the types of documents that
> need to be crawled (mostly Office documents).
>
> I am now trying to configure the Manifold job, but the only way to keep it
> from triggering "Aborted - service interruptions" errors and freezing is to
> set a maximum of 2 connections on the Repository connection. The Output is
> currently Null. I am running the multiprocess file example process (on
> Jetty, not Tomcat).
> However, this is too slow: it takes 5 hours to process a test site
> collection with 1200 docs that together are less than 500MB.
>
> What can I do to improve the speed? Are there settings that I may have
> missed or configured incorrectly?
>
> Thank you!
>
