Hello list,

We have been trying to crawl web pages and files at the same time (in a single crawl). Our problem is that, depending on the configuration (which files, which sites, topN), Nutch sometimes does not index the files at all, probably because their scores are lower than the scores of the web pages.
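To illustrate what we suspect is happening, here is a minimal sketch of a score-ordered topN cut-off. It is a simplified model, not Nutch's actual generator code; the function name `select_top_n`, the URLs, and the scores are all made up for illustration.

```python
import heapq


def select_top_n(candidates, top_n):
    """Return the top_n (url, score) pairs with the highest scores."""
    return heapq.nlargest(top_n, candidates, key=lambda pair: pair[1])


# Hypothetical URLs and scores, for illustration only.
candidates = [
    ("http://example.com/", 1.0),
    ("http://example.com/about.html", 0.8),
    ("http://example.com/news.html", 0.7),
    ("file:///data/report.pdf", 0.2),
    ("file:///data/notes.txt", 0.1),
]

# With a small topN, only the highest-scored entries are selected.
selected = select_top_n(candidates, top_n=3)
selected_urls = [url for url, _ in selected]

# Count how many file: URLs made it past the cut-off.
file_urls_selected = [u for u in selected_urls if u.startswith("file:")]
print(selected_urls)
print(len(file_urls_selected))
```

In this toy model the file URLs only get selected once topN is large enough to reach past every higher-scored web page, which would match the behaviour we are seeing.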
Can anybody confirm this problem? How would you work around it? One solution would be to run one crawl for the files and another for the web pages and then merge the indexes, but we would prefer to do everything in a single crawl.

Thanks,
Renaud

--
Renaud Richardet
COO America
Wyona Inc. - Open Source Content Management - Apache Lenya
office +1 857 776-3195
mobile +1 617 230 9112
renaud.richardet <at> wyona.com
http://www.wyona.com

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
