Wherever your url directory is kept.

On Sunday, February 17, 2013, 高睿 <[email protected]> wrote:
> Hi,
>
> What do you mean by 'the same directory'? '/tmp' or '${NUTCH_HOME}'?
>
> At 2013-02-18 00:45:00, "Lewis John Mcgibbney" <[email protected]> wrote:
>> Hi,
>> Please make sure you have no temp files in the same directory and try again.
>> Please either use the crawl script which is provided with Nutch, or
>> alternatively build your own script.
>>
>> On Sunday, February 17, 2013, 高睿 <[email protected]> wrote:
>>> Hi,
>>> Additionally, the Nutch version is 2.1, and I have a ParserFilter to purge
>>> the outlinks of the parse object (by code: parse.setOutlinks(new Outlink[] {});).
>>>
>>> When I specify '-depth 1', the url is only crawled once, and if I specify
>>> '-depth 3', the url is crawled 3 times.
>>> Is this expected behavior? Should I use the 'crawl' command to do all the
>>> work in one go?
>>>
>>> At 2013-02-17 22:11:22, "高睿" <[email protected]> wrote:
>>>> Hi,
>>>>
>>>> There's only 1 url in the 'webpage' table. I run the command: bin/nutch
>>>> crawl -solr http://localhost:8080/solr/collection2 -threads 10 -depth 2
>>>> -topN 10000, and then I find the url is crawled twice.
>>>>
>>>> Here's the log:
>>>> 2013-02-17 20:45:00,965 INFO fetcher.FetcherJob - fetching http://www.p5w.net/stock/lzft/gsyj/201209/t4470475.htm
>>>> 2013-02-17 20:45:11,021 INFO parse.ParserJob - Parsing http://www.p5w.net/stock/lzft/gsyj/201209/t4470475.htm
>>>> 2013-02-17 20:45:38,922 INFO fetcher.FetcherJob - fetching http://www.p5w.net/stock/lzft/gsyj/201209/t4470475.htm
>>>> 2013-02-17 20:45:46,031 INFO parse.ParserJob - Parsing http://www.p5w.net/stock/lzft/gsyj/201209/t4470475.htm
>>>>
>>>> Do you know how to fix this?
>>>> Besides, when I run the command again, the same log is written to
>>>> hadoop.log. I don't know why the configuration 'db.fetch.interval.default'
>>>> in nutch-site.xml doesn't take effect.
>>>>
>>>> Thanks.
>>>>
>>>> Regards,
>>>> Rui
>>
>> --
>> *Lewis*
--
*Lewis*
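
On Rui's ParserFilter: Nutch 2.x exposes this hook as the ParseFilter
extension point. Below is a minimal sketch of a filter that discards every
outlink, along the lines of what Rui describes; the class name is
hypothetical and this assumes the Nutch 2.x ParseFilter interface, so treat
it as an illustration rather than Rui's actual plugin.

    import java.util.Collection;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.nutch.parse.HTMLMetaTags;
    import org.apache.nutch.parse.Outlink;
    import org.apache.nutch.parse.Parse;
    import org.apache.nutch.parse.ParseFilter;
    import org.apache.nutch.storage.WebPage;
    import org.w3c.dom.DocumentFragment;

    /** Hypothetical parse filter that discards every extracted outlink. */
    public class NoOutlinksParseFilter implements ParseFilter {

      private Configuration conf;

      @Override
      public Parse filter(String url, WebPage page, Parse parse,
          HTMLMetaTags metaTags, DocumentFragment doc) {
        // Replace the extracted outlinks with an empty array so the
        // db update step has no new links to add to the webpage table.
        parse.setOutlinks(new Outlink[] {});
        return parse;
      }

      @Override
      public Collection<WebPage.Field> getFields() {
        return null; // no extra storage fields needed
      }

      @Override
      public void setConf(Configuration conf) {
        this.conf = conf;
      }

      @Override
      public Configuration getConf() {
        return conf;
      }
    }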
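
On doing "all the work in one go": instead of the one-shot 'bin/nutch crawl'
command, the same cycle can be driven step by step, which makes it easier to
see which phase re-selects the URL. A sketch for Nutch 2.x follows; exact
flags vary between 2.x releases, so treat it as an outline rather than
copy-paste.

    # seed the webpage table from a directory of seed files, e.g. urls/
    bin/nutch inject urls

    # one crawl round:
    bin/nutch generate -topN 10000   # mark a batch of due URLs for fetching
    bin/nutch fetch -all             # fetch the generated batch
    bin/nutch parse -all             # parse the fetched pages
    bin/nutch updatedb               # write new fetch times/outlinks back

    # index into Solr
    bin/nutch solrindex http://localhost:8080/solr/collection2 -all

Repeating the generate/fetch/parse/updatedb block is what '-depth N' does
internally; a URL whose fetch time was pushed into the future by updatedb
should not be generated again in the next round.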
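
On 'db.fetch.interval.default': for reference, this is how the interval is
normally overridden in conf/nutch-site.xml (2592000 seconds, i.e. 30 days,
is the value shipped in nutch-default.xml):

    <property>
      <name>db.fetch.interval.default</name>
      <value>2592000</value>
      <description>The default number of seconds between re-fetches
      of a page (here 30 days).</description>
    </property>

Note that the interval only takes effect once updatedb has written a new
fetch time for the page; generate then skips any page whose fetch time is
still in the future.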

