Re: Problem with crawl and recrawl

Dennis Kubes Mon, 01 Dec 2008 09:48:08 -0800

When you run the separate generate and fetch commands, are you also running an updatedb command after each fetch, and then repeating the generate and fetch cycle several times? The -depth 3 parameter automates this in the crawl command.
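
A rough sketch of the cycle described above, assuming the Nutch 0.9/1.0-style commands (inject, generate, fetch, updatedb) and a crawl directory laid out as crawl/crawldb and crawl/segments. The variable names NUTCH, CRAWL_DIR, URL_DIR, DEPTH and DRY_RUN, and the helper functions, are made up for this script and are not Nutch settings. By default it only prints the commands it would run:

```shell
#!/bin/sh
# Sketch of the loop that "crawl -depth N" automates (assumed Nutch 0.9/1.0-style layout).
# NUTCH, CRAWL_DIR, URL_DIR, DEPTH and DRY_RUN are illustrative names for this script only.
NUTCH="${NUTCH:-bin/nutch}"
CRAWL_DIR="${CRAWL_DIR:-crawl}"
URL_DIR="${URL_DIR:-urls}"
DEPTH="${DEPTH:-3}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands; 0 = execute them

# Print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Newest segment directory; a placeholder name is used when nothing is executed.
latest_segment() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$CRAWL_DIR/segments/SEGMENT"
    else
        ls -d "$CRAWL_DIR"/segments/* | tail -1
    fi
}

crawl_cycles() {
    # Seed the crawldb once from the URL list.
    run "$NUTCH" inject "$CRAWL_DIR/crawldb" "$URL_DIR" || return 1
    round=1
    while [ "$round" -le "$DEPTH" ]; do
        run "$NUTCH" generate "$CRAWL_DIR/crawldb" "$CRAWL_DIR/segments" || return 1
        segment=$(latest_segment)
        run "$NUTCH" fetch "$segment" || return 1
        # Without this step the links found in this round never reach the crawldb,
        # so the next generate has nothing new to select.
        run "$NUTCH" updatedb "$CRAWL_DIR/crawldb" "$segment" || return 1
        round=$((round + 1))
    done
}

crawl_cycles
```

Running it with DRY_RUN=0 would execute the commands instead of printing them. A single generate and fetch pass with no updatedb only covers the seed URLs, which would be consistent with seeing far fewer documents than a depth 3 crawl.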


Dennis


José Mestre wrote:

Hi,

I'm using nutch to index part of an intranet website.

When I use the "crawl" command the database indexes 3000 documents:
e.g.: nutch crawl urls -dir crawl -threads 200 -depth 3
But when I do the same with the separate "generate, fetch, ..." commands I have only 50 documents in the database:
e.g.: the crawl or recrawl script with adddays=31
http://wiki.apache.org/nutch/Crawl
http://wiki.apache.org/nutch/IntranetRecrawl
I've tried using fetch with the -noAdditions option.

Does anyone know why this happens?

I think 'crawl-urlfilter.txt' and 'regex-urlfilter.txt' are OK.

Regards.

Jo
