Hello list,
We have been trying to crawl webpages and files at the same time, in a
single crawl. Our problem is that, depending on the configuration
(which files, which sites, topN), Nutch sometimes does not index the
files at all. This is probably because their scores are lower than the
scores of the webpages.
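To make the suspicion concrete, here is a toy model of score-based selection with a topN cutoff. It is not Nutch's Generator code; the URLs, the scores, and the `selectTopN` helper are all hypothetical, chosen only to show how low-scoring files could be crowded out by higher-scoring pages.

```java
import java.util.ArrayList;
import java.util.List;

public class TopNSketch {
    // Hypothetical candidate: a URL plus the score the crawl assigned to it.
    record Candidate(String url, double score) {}

    // Toy model (assumption): keep only the topN highest-scoring candidates.
    static List<Candidate> selectTopN(List<Candidate> candidates, int topN) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        // Sort by descending score.
        sorted.sort((a, b) -> Double.compare(b.score(), a.score()));
        return new ArrayList<>(sorted.subList(0, Math.min(topN, sorted.size())));
    }

    // Hypothetical mix of webpages and files; the files get lower scores.
    static List<Candidate> sampleCandidates() {
        return List.of(
            new Candidate("http://example.com/index.html", 1.0),
            new Candidate("http://example.com/about.html", 0.8),
            new Candidate("http://example.com/news.html", 0.6),
            new Candidate("http://example.com/report.pdf", 0.05),
            new Candidate("http://example.com/slides.ppt", 0.04));
    }

    public static void main(String[] args) {
        // With topN = 3, only the three HTML pages are selected; the files never make the cut.
        for (Candidate c : selectTopN(sampleCandidates(), 3)) {
            System.out.println(c.url() + " " + c.score());
        }
    }
}
```

If the real crawl behaves like this, raising topN or boosting file scores would let the files through, while separate crawls sidestep the competition entirely.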
Can anybody confirm this problem? How would you work around it? One
solution would be to run one crawl for the files and another for the
webpages and then merge the indexes, but we would prefer to do it all
in a single crawl.
Thanks,
Renaud
--
Renaud Richardet
COO America
Wyona Inc. - Open Source Content Management - Apache Lenya
office +1 857 776-3195 mobile +1 617 230 9112
renaud.richardet <at> wyona.com http://www.wyona.com