Hi Baptiste,

ManifoldCF is not limited by the number of agents processes or parallel connectors. Overall database performance is the limiting factor.
I would read this page:
http://manifoldcf.apache.org/release/trunk/en_US/performance-tuning.html

Also, there's a section in ManifoldCF (I believe Chapter 2) that discusses this issue. About five years ago, I successfully crawled 5 million web documents using PostgreSQL 8.3. PostgreSQL 9.x is faster, and with modern SSDs, I expect that you will do even better. In general, I'd say it is reasonable to aim for 10M to 100M documents with ManifoldCF, provided that you use a good database and maintain it properly.

Thanks,
Karl

On Wed, Sep 10, 2014 at 10:07 AM, Baptiste Berthier wrote:
> Hi
>
> I would like to know what is the maximum number of documents that you
> managed to crawl with ManifoldCF and with how many connectors in parallel
> it could works ?
>
> Thanks for your answer
>
> Baptiste
>
