Re: help needed - adaptive refetch

Andrzej Bialecki Mon, 06 Mar 2006 12:56:37 -0800

D.Saravanaraj wrote:

Hi,

After applying the adaptive refetch patch to Nutch mapred, I ran the crawl
command once, since I had to initialize the crawldb.
For the next run, I commented out the following lines in
org.apache.nutch.crawl.Crawl.java:

if (fs.exists(dir)) {
  throw new RuntimeException(dir + " already exists.");
}

and

new Injector(job).inject(crawlDb, rootUrlDir);

But I find that the files are fetched again even though they weren't modified.
How can I keep the same crawldb and reuse it for further crawls in the mapred
version?

Are you using default settings? Are you sure the files are really fetched in full, or just their headers are fetched? I would need more information...
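
One way to tell the two cases apart is to send a conditional request yourself and look at the status code: a 304 (Not Modified) means only headers were exchanged, a 200 means the body was sent in full. The following is a minimal sketch of that check, not Nutch code. It uses only JDK classes; the class name ConditionalFetchDemo, its methods, the /page path and the timestamps are all made up for illustration, and the tiny local server merely stands in for the site being crawled.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical demo class, not part of Nutch.
final class ConditionalFetchDemo {

    // Sends a GET with If-Modified-Since set to lastFetchMillis and
    // returns the HTTP status code (304 = unchanged, 200 = full body sent).
    static int statusFor(URL url, long lastFetchMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setIfModifiedSince(lastFetchMillis);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    // Stand-in for the crawled site: serves one page whose modification
    // time is lastModifiedMillis and honours If-Modified-Since.
    static HttpServer startServer(long lastModifiedMillis) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/page", exchange -> {
            String since = exchange.getRequestHeaders().getFirst("If-Modified-Since");
            if (since != null && !modifiedAfter(lastModifiedMillis, since)) {
                // Unchanged: headers only, no body.
                exchange.sendResponseHeaders(HttpURLConnection.HTTP_NOT_MODIFIED, -1);
            } else {
                byte[] body = "full page body".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(HttpURLConnection.HTTP_OK, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            }
            exchange.close();
        });
        server.start();
        return server;
    }

    // True if the page changed after the date given in the request header.
    // An unparsable date is treated as "modified", so the full page is sent.
    static boolean modifiedAfter(long lastModifiedMillis, String httpDate) {
        try {
            long since = ZonedDateTime.parse(httpDate, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant().toEpochMilli();
            return lastModifiedMillis > since;
        } catch (DateTimeParseException e) {
            return true;
        }
    }

    // Fetches the page twice: once claiming a fetch time after the page's
    // last change, once claiming a fetch time before it.
    static String runDemo() throws IOException {
        long lastModified = 1_000_000_000_000L; // arbitrary illustrative timestamp
        HttpServer server = startServer(lastModified);
        try {
            URL url = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/page").toURL();
            int fetchedAfterChange = statusFor(url, lastModified + 60_000L);
            int fetchedBeforeChange = statusFor(url, lastModified - 60_000L);
            return fetchedAfterChange + " " + fetchedBeforeChange;
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(runDemo());
    }
}
```

Pointing statusFor at one of your own URLs, with the time of your previous crawl, should show whether the server answers 304 at all. If it always answers 200, the pages will be downloaded in full no matter what the crawldb says.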


--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
