Problem with crawl and recrawl

José Mestre Mon, 01 Dec 2008 09:41:46 -0800

Hi,

I'm using Nutch to index part of an intranet website.


When I use the "crawl" command, 3000 documents are indexed:
e.g.: nutch crawl urls -dir crawl -threads 200 -depth 3
But when I do the same with the separate "generate", "fetch", ... commands, I 
end up with only 50 documents in the database:
e.g.: the crawl or recrawl script from the wiki, with adddays=31
http://wiki.apache.org/nutch/Crawl
http://wiki.apache.org/nutch/IntranetRecrawl
I've also tried using fetch with the -noAdditions option.

Does anyone know why this happens?

I think 'crawl-urlfilter.txt' and 'regex-urlfilter.txt' are OK.

Regards.

Jo
