Hi,

There's only one URL in the 'webpage' table. I run the command

  bin/nutch crawl -solr http://localhost:8080/solr/collection2 -threads 10 -depth 2 -topN 10000

and then I find that the URL is fetched and parsed twice.

Here's the log:
  2013-02-17 20:45:00,965 INFO  fetcher.FetcherJob - fetching http://www.p5w.net/stock/lzft/gsyj/201209/t4470475.htm
  2013-02-17 20:45:11,021 INFO  parse.ParserJob - Parsing http://www.p5w.net/stock/lzft/gsyj/201209/t4470475.htm
  2013-02-17 20:45:38,922 INFO  fetcher.FetcherJob - fetching http://www.p5w.net/stock/lzft/gsyj/201209/t4470475.htm
  2013-02-17 20:45:46,031 INFO  parse.ParserJob - Parsing http://www.p5w.net/stock/lzft/gsyj/201209/t4470475.htm
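For context, my understanding is that with -depth 2 the crawl command runs the generate/fetch/parse/updatedb loop twice in a row, so the two "fetching" lines above would come from the two iterations. A rough sketch of the equivalent per-iteration commands (the seed directory name, exact flags and batch-id handling here are from memory, so please treat this as an approximation rather than the real crawl script):

  bin/nutch inject urls              # seed the 'webpage' table ('urls' dir name is assumed)
  # iteration 1
  bin/nutch generate -topN 10000     # mark a batch of URLs due for fetching
  bin/nutch fetch -all -threads 10   # fetch the generated batch
  bin/nutch parse -all               # parse the fetched pages
  bin/nutch updatedb                 # write fetch status and outlinks back to 'webpage'
  # iteration 2 repeats generate/fetch/parse/updatedb
  bin/nutch solrindex http://localhost:8080/solr/collection2 -all   # index into Solr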

Do you know how to fix this?
Besides, when I run the command again, the same log lines are written to hadoop.log, i.e. the URL is fetched once more. I don't know why the 'db.fetch.interval.default' setting in nutch-site.xml doesn't take effect.
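In case it matters, an override of that property in nutch-site.xml normally looks like the snippet below (the 2592000-second value is only an example to show the format; it is the 30-day default, not necessarily the value I use):

  <property>
    <name>db.fetch.interval.default</name>
    <value>2592000</value>
    <description>
      Default number of seconds between re-fetches of a page (2592000 = 30 days).
    </description>
  </property>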

Thanks.

Regards,
Rui
