Hi,
Please make sure you have no temp files in the same directory and try again. Please either use the crawl script which is provided with Nutch, or alternatively build your own script.
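If you do roll your own, a typical Nutch 2.x script runs one generate/fetch/parse/updatedb round per level of depth and indexes at the end. A rough sketch is below; the seed directory, Solr URL, DEPTH and TOPN values are placeholders, and exact flags differ between releases, so check the usage output of bin/nutch before relying on it:

#!/bin/bash
# Hand-rolled crawl loop for Nutch 2.x -- a sketch, not a drop-in script.
# SEEDDIR, SOLRURL, DEPTH and TOPN are placeholder values.

SEEDDIR=urls
SOLRURL=http://localhost:8080/solr/collection2
DEPTH=2
TOPN=10000

bin/nutch inject $SEEDDIR          # seed the 'webpage' table

for ((round = 1; round <= DEPTH; round++)); do
  echo "=== crawl round $round of $DEPTH ==="
  bin/nutch generate -topN $TOPN   # mark URLs that are due for fetching
  bin/nutch fetch -all             # fetch the generated batch
  bin/nutch parse -all             # parse what was fetched
  bin/nutch updatedb               # fold fetch/parse results back into 'webpage'
done

bin/nutch solrindex $SOLRURL -all  # push parsed documents to Solr

Doing it round by round like this also makes it easier to see in hadoop.log which round re-fetched a URL.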
On Sunday, February 17, 2013, 高睿 <[email protected]> wrote:
> Hi,
> Additionally, the Nutch version is 2.1, and I have a ParserFilter to purge outlinks from the parse object (by code: parse.setOutlinks(new Outlink[] {});).
>
> When I specify '-depth 1', the url is only crawled once, and if I specify '-depth 3', the url is crawled 3 times.
> Is this expected behavior? Should I use the 'crawl' command to do all the work in one go?
>
> At 2013-02-17 22:11:22, "高睿" <[email protected]> wrote:
>> Hi,
>>
>> There's only one url in the 'webpage' table. I run the command: bin/nutch crawl -solr http://localhost:8080/solr/collection2 -threads 10 -depth 2 -topN 10000, and then I find the url is crawled twice.
>>
>> Here's the log:
>>  55 2013-02-17 20:45:00,965 INFO fetcher.FetcherJob - fetching http://www.p5w.net/stock/lzft/gsyj/201209/t4470475.htm
>>  84 2013-02-17 20:45:11,021 INFO parse.ParserJob - Parsing http://www.p5w.net/stock/lzft/gsyj/201209/t4470475.htm
>> 215 2013-02-17 20:45:38,922 INFO fetcher.FetcherJob - fetching http://www.p5w.net/stock/lzft/gsyj/201209/t4470475.htm
>> 244 2013-02-17 20:45:46,031 INFO parse.ParserJob - Parsing http://www.p5w.net/stock/lzft/gsyj/201209/t4470475.htm
>>
>> Do you know how to fix this?
>> Besides, when I run the command again, the same log is written in hadoop.log. I don't know why the configuration 'db.fetch.interval.default' in nutch-site.xml doesn't take effect.
>>
>> Thanks.
>>
>> Regards,
>> Rui

--
*Lewis*
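On the db.fetch.interval.default question above: the property takes a value in seconds and is overridden in conf/nutch-site.xml inside the usual <configuration> element. A minimal sketch of such an override; the 86400-second (one day) value is only an illustration, not a recommendation:

<property>
  <name>db.fetch.interval.default</name>
  <value>86400</value>
  <description>Seconds between re-fetches of a page (example value;
  the shipped default is considerably longer).</description>
</property>

With this in effect, the generator should skip pages whose next fetch time has not yet passed, so a second run of the same command would not re-fetch the URL.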

