Hello list,

We have been trying to crawl webpages and files at the same time (in a single crawl). Our problem is that, depending on the configuration (which files, which sites, topN), Nutch sometimes does not index the files at all, probably because their scores are lower than those of the webpages.

Can anybody confirm this problem? How would you work around it? One solution would be to run one crawl for the files and another for the webpages and then merge the indexes, but we would prefer to do everything in a single crawl.

Thanks,
Renaud

--
Renaud Richardet
COO America
Wyona Inc.  -   Open Source Content Management   -   Apache Lenya
office +1 857 776-3195                     mobile +1 617 230 9112
renaud.richardet <at> wyona.com              http://www.wyona.com
