D.Saravanaraj wrote:
Hi, after applying the adaptive refetch patch to Nutch mapred, I ran the crawl command once to initialize the crawldb. For the following runs, I commented out these lines in org.apache.nutch.crawl.Crawl.java:

    if (fs.exists(dir)) { throw new RuntimeException(dir + " already exists."); }

and

    new Injector(job).inject(crawlDb, rootUrlDir);

However, the files are still fetched again even though they were not modified. How do I reuse the same crawldb for further crawls in the mapred version?
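Commenting the lines out means the seed URLs are never injected, even when no crawldb exists yet. An alternative is to inject only when the crawldb is missing. The sketch below is a minimal, self-contained illustration of that guard. It assumes `java.nio.file` in place of Hadoop's `FileSystem`, and `inject()` is a hypothetical stand-in for `new Injector(job).inject(crawlDb, rootUrlDir)`. It is not Nutch's actual `Crawl.java`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CrawlDbGuard {
    // Hypothetical stand-in for Nutch's Injector: it only creates the crawldb
    // directory so the example runs without Hadoop or Nutch on the classpath.
    static void inject(Path crawlDb, Path rootUrlDir) throws IOException {
        Files.createDirectories(crawlDb);
        System.out.println("Injected seeds from " + rootUrlDir + " into " + crawlDb);
    }

    /** Injects seed URLs only if the crawldb does not exist yet; returns true if injection ran. */
    static boolean injectIfMissing(Path crawlDb, Path rootUrlDir) throws IOException {
        if (Files.exists(crawlDb)) {
            // Later runs: keep the existing crawldb and its fetch history.
            System.out.println("Reusing existing crawldb at " + crawlDb);
            return false;
        }
        // First run: initialize the crawldb from the seed list.
        inject(crawlDb, rootUrlDir);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path crawlDb = Files.createTempDirectory("crawl").resolve("crawldb");
        Path urls = Paths.get("urls"); // assumed seed directory name
        boolean first = injectIfMissing(crawlDb, urls);
        boolean second = injectIfMissing(crawlDb, urls);
        System.out.println("first=" + first + " second=" + second);
    }
}
```

This guard only controls whether the crawldb is reused. Whether unmodified pages are skipped on later runs still depends on the refetch logic, not on the injection step.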
Are you using the default settings? Are you sure the files are really fetched in full, and not just their headers? I would need more information...
-- 
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
