Hello list,

We have been trying to crawl webpages and files at the same time (in a 
single crawl). Our problem is that, depending on the configuration 
(which files, which sites, topN), Nutch sometimes does not index the 
files at all, probably because their scores are lower than the scores 
of the webpages.
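
To illustrate what we suspect is happening, here is a toy sketch in 
Python. It is not Nutch code: the URLs, the scores, and the 
select_top_n helper are all made up, and it only assumes that each 
round keeps the topN highest-scoring URLs.

```python
# Toy model, NOT Nutch code: URLs, scores and select_top_n are hypothetical.
import heapq


def select_top_n(candidates, top_n):
    """Return the top_n (url, score) pairs with the highest scores."""
    return heapq.nlargest(top_n, candidates, key=lambda pair: pair[1])


# Invented candidates: webpages happen to score higher than the files.
candidates = [
    ("http://example.com/", 1.0),
    ("http://example.com/about.html", 0.8),
    ("http://example.com/news.html", 0.7),
    ("http://example.com/report.pdf", 0.2),
    ("http://example.com/manual.doc", 0.1),
]

selected = select_top_n(candidates, top_n=3)
selected_urls = [url for url, _ in selected]
print(selected_urls)
```

With a cutoff of 3, only the three webpages are selected in this toy 
example and the two files are never fetched, which would match what we 
observe.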

Can anybody confirm this problem? How would you work around it? One 
solution would be to run one crawl for the files and another for the 
webpages and then merge the indexes, but we would prefer to handle 
everything in a single crawl.

Thanks,
Renaud

-- 
Renaud Richardet
COO America
Wyona Inc.  -   Open Source Content Management   -   Apache Lenya
office +1 857 776-3195                     mobile +1 617 230 9112
renaud.richardet <at> wyona.com              http://www.wyona.com

