Hi Baptiste,

ManifoldCF is not limited by the number of agents processes or parallel
connectors.  Overall database performance is the limiting factor.

I would read this:

http://manifoldcf.apache.org/release/trunk/en_US/performance-tuning.html

Also, there's a section in ManifoldCF (I believe Chapter 2) that discusses
this issue.

Some five years ago, I successfully crawled 5 million web documents using
PostgreSQL 8.3.  PostgreSQL 9.x is faster, and with modern SSDs, I expect
that you will do even better.  In general, I'd say it is fine to shoot for
10M - 100M documents on ManifoldCF, provided that you use a good database
and maintain it properly.

Thanks,
Karl





On Wed, Sep 10, 2014 at 10:07 AM, Baptiste Berthier <[email protected]>
wrote:

> Hi
>
> I would like to know the maximum number of documents that you have
> managed to crawl with ManifoldCF, and how many connectors it can run
> in parallel.
>
> Thanks for your answer
>
> Baptiste
>
