Hi Shigeki,

What database is ManifoldCF configured to use in this case?  Do you
see any indication of slow queries in the ManifoldCF log?


Karl

On Fri, Jan 18, 2013 at 5:27 AM, Shigeki Kobayashi
<[email protected]> wrote:
> Hello
>
>
> I would like some advice on improving the crawl time for new/updated files
> using the Windows share connection.
>
> I crawl files on a Windows server and index them into Solr.
>
> Currently, the second crawl of two hundred thousand files takes over 5
> hours, even though no files have been updated, created, or deleted.
>
> I assume MCF performs the following steps (let me know if I am wrong):
>
> - obtain the last-modified time of a file
> - compare it with the time MCF obtained during the last crawl
> (probably stored in the DB)
> - if they differ, MCF recognizes that the file is to be indexed.
>
> If the above steps are performed for two thousand files, which step is
> likely to take the most time? Obtaining the last-modified time? Reading data
> from the DB? What do you think could be done to reduce the crawling time?
>
> Please give me some advice.
>
>
> Regards,
>
> Shigeki
>
>
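
For reference, the three steps assumed in the question above can be sketched
roughly as follows. This is only an illustration of that assumption, not
ManifoldCF's actual implementation: the function name find_changed_files and
the in-memory dictionary last_seen (standing in for the database) are
hypothetical.

```python
import os


def find_changed_files(paths, last_seen):
    """Return the paths whose modification time differs from the stored one.

    last_seen is a hypothetical stand-in for the crawler's database:
    a dict mapping each path to the modification time (in nanoseconds)
    recorded during the previous crawl.
    """
    changed = []
    for path in paths:
        # Step 1: obtain the file's current modification time.
        mtime = os.stat(path).st_mtime_ns
        # Step 2: compare it with the value recorded last time.
        if last_seen.get(path) != mtime:
            # Step 3: the file is new or changed, so mark it for indexing
            # and remember the new modification time.
            changed.append(path)
            last_seen[path] = mtime
    return changed
```

Note that in this sketch every file costs one metadata request and one lookup
even when nothing has changed, so either step can dominate the run time when
it is slow.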
