Got it. Is there any way we can increase the speed of the minimal crawl? Currently we are running one VM for ManifoldCF with 8 cores and 32 GB RAM. Postgres runs on another machine with a similar configuration. We have tuned the Postgres and ManifoldCF parameters per the recommendations, and we run a full vacuum once daily.
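As a side note on the daily vacuum: a sketch of how that maintenance might be scheduled from cron on the Postgres host. The database name `manifoldcf`, the 02:30 run time, and the log path are assumptions for illustration only; adjust them for your setup.

```shell
# Hypothetical crontab entry on the Postgres host: daily maintenance at 02:30.
# 'manifoldcf' is an assumed database name. vacuumdb --analyze performs a
# plain (online) VACUUM ANALYZE, which does not take the exclusive table
# locks that VACUUM FULL does.
30 2 * * * vacuumdb --analyze --dbname manifoldcf >> /var/log/vacuumdb.log 2>&1
```

A plain vacuum run this way keeps tables available to the crawler while it executes, whereas `VACUUM FULL` rewrites tables under an exclusive lock.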
Would switching to a multi-process configuration, with ManifoldCF running on two servers, give a boost?

Thanks,
Gaurav

On Saturday, February 9, 2019, Karl Wright <[email protected]> wrote:

> It does the minimum necessary. That means it can't do it in less. If
> this is a business requirement, then you should be angry with whoever made
> this requirement.
>
> SharePoint doesn't give you the ability to grab all changes or added
> documents up front. You have to crawl to discover them. That is how it
> is built, and MCF cannot change it.
>
> Karl
>
> On Fri, Feb 8, 2019, 2:14 PM Gaurav G <[email protected]> wrote:
>
>> Hi Karl,
>>
>> Thanks for the response. We tried scheduling a minimal crawl for 15
>> minutes. At the end of fifteen minutes it stops with about 3000 docs in
>> the processing state, and takes about 20-25 minutes to stop. Then the
>> question becomes when to schedule the next crawl. Also, in those 15
>> minutes, would it have picked up all the adds and updates first, or could
>> they be part of the 3000 docs still in the processing state, which would
>> get picked up in the next run? The number of docs that actually change in
>> a 30-minute period won't be more than 200.
>>
>> Being able to capture adds and updates within 30 minutes is a key
>> business requirement.
>>
>> Thanks,
>> Gaurav
>>
>> On Friday, February 8, 2019, Karl Wright <[email protected]> wrote:
>>
>>> Hi Gaurav,
>>>
>>> The right way to do this is to schedule "minimal" crawls every 15
>>> minutes (which will process only the minimum needed to deal with adds
>>> and updates), and periodically perform "full" crawls (which will also
>>> include deletions).
>>>
>>> Thanks,
>>> Karl
>>>
>>> On Fri, Feb 8, 2019 at 10:11 AM Gaurav G <[email protected]> wrote:
>>>
>>>> Hi All,
>>>>
>>>> We're trying to crawl a SharePoint repo with about 30000 docs.
>>>> Ideally we would like to synchronize changes with the repo within 30
>>>> minutes, so we are scheduling incremental crawling. Our observation is
>>>> that a full crawl takes about 60-75 minutes. So if we schedule the
>>>> incremental crawl for 30 minutes, in what order would it process the
>>>> changes? Would it first bring in the adds and updates and then process
>>>> the rest of the docs? What kind of logic is there in the incremental
>>>> crawl? We also tried a continuous crawl to achieve this; however, the
>>>> continuous crawl was not picking up new documents.
>>>>
>>>> Thanks,
>>>> Gaurav
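The minimal-crawl schedule Karl describes can also be driven programmatically. A minimal sketch, assuming ManifoldCF's REST API is reachable at the default `mcf-api-service` path and that the `startminimal` command is available for starting a minimal job run; the host, port, and job id below are placeholders that must be replaced with values from your own installation.

```python
import time
import urllib.request

# Assumed values for illustration only: adjust BASE (host/port) and JOB_ID
# to match your ManifoldCF installation and the job you want to run.
BASE = "http://localhost:8345/mcf-api-service/json"
JOB_ID = "42"  # hypothetical job id

def command_url(base: str, command: str, job_id: str) -> str:
    """Build a ManifoldCF API URL such as .../json/startminimal/<job_id>."""
    return f"{base}/{command}/{job_id}"

def start_minimal(job_id: str) -> None:
    """Kick off a 'minimal' crawl (adds/updates only) via HTTP PUT."""
    req = urllib.request.Request(
        command_url(BASE, "startminimal", job_id), method="PUT"
    )
    urllib.request.urlopen(req)

if __name__ == "__main__":
    # Trigger a minimal crawl every 30 minutes. A full crawl (the plain
    # "start" command) still needs to run periodically, e.g. nightly, so
    # that deletions are eventually picked up.
    while True:
        start_minimal(JOB_ID)
        time.sleep(30 * 60)
```

This only complements, not replaces, the in-crawler schedule: a full crawl must still run periodically because, as noted above, minimal crawls do not handle deletions.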
