D.Saravanaraj wrote:
Hi,

After applying the adaptive refetch patch to Nutch mapred, I ran the crawl
command once, since I had to initialize the crawldb.
For the following runs, I commented out these lines in
org.apache.nutch.crawl.Crawl.java:

if (fs.exists(dir)) {
  throw new RuntimeException(dir + " already exists.");
}

and

new Injector(job).inject(crawlDb, rootUrlDir);

But I find that the files are fetched even though they were not modified. How
can I keep the same crawldb and reuse it for further crawls in the mapred
version?

Are you using the default settings? Are you sure the files are really fetched in full, or are only their headers fetched? I would need more information...

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

