Hi,

I'm using Nutch to index part of an intranet website.

When I use the "crawl" command, the database indexes 3000 documents, e.g.:

nutch crawl urls -dir crawl -threads 200 -depth 3

But when I do the same with the separate "generate, fetch, ..." commands, I only get 50 documents in the database, for example with the crawl or recrawl script with adddays=31:

http://wiki.apache.org/nutch/Crawl
http://wiki.apache.org/nutch/IntranetRecrawl

I've tried using fetch with the -noAdditions option.

Does anyone know why this happens? I think 'crawl-urlfilter.txt' and 'regex-urlfilter.txt' are OK.

Regards,
Jo
