Hi, I use the script and I've already tried running it line by line. Yes, after each fetch I do an updatedb, and then another fetch, ... as many fetch cycles as the depth value. I've tried using updatedb with the -noAdditions option as mentioned in the script description, but with no success.
Regards,
Jo

-----Original Message-----
From: Dennis Kubes [mailto:[EMAIL PROTECTED]]
Sent: Monday, 1 December 2008 18:48
To: [email protected]
Subject: Re: Problem with crawl and recrawl

When you do the generate, fetch commands, are you also doing an updatedb command and then multiple generate and fetch cycles? The depth 3 parameter automates this on the crawl command.

Dennis

José Mestre wrote:
> Hi,
>
> I'm using nutch to index part of an intranet website.
>
> When I use the "crawl" command the database indexes 3000 documents:
> e.g.: nutch crawl urls -dir crawl -threads 200 -depth 3
> But when I do the same with the separate "generate, fetch, ..." commands I
> just have 50 documents in the database:
> e.g.: the crawl or recrawl script with adddays=31
> http://wiki.apache.org/nutch/Crawl
> http://wiki.apache.org/nutch/IntranetRecrawl
> I've tried using fetch with option -noAdditions
>
> Does anyone know why this happens?
>
> I think 'crawl-urlfilter.txt' and 'regex-urlfilter.txt' are ok.
>
> Regards,
>
> Jo
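[Editor's note] For readers following this thread, the loop that `nutch crawl urls -dir crawl -depth 3` automates can be sketched roughly as below. This is an illustrative sketch, not the exact script from the wiki pages above: the `crawl_rounds` function name, directory layout, and segment-selection comment are assumptions, and `NUTCH` defaults to `echo bin/nutch` so the script prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch of one generate/fetch/updatedb round per depth level.
# NUTCH defaults to a dry-run echo; point it at your real bin/nutch to execute.
NUTCH="${NUTCH:-echo bin/nutch}"

crawl_rounds() {
  depth=$1
  dir=$2
  # seed the crawldb from the urls directory
  $NUTCH inject "$dir/crawldb" urls
  i=1
  while [ "$i" -le "$depth" ]; do
    # generate a fetch list from the current crawldb
    $NUTCH generate "$dir/crawldb" "$dir/segments"
    # placeholder: a real run would pick the newest segment, e.g.
    #   segment=$(ls -d "$dir/segments/"* | tail -1)
    segment="$dir/segments/<latest>"
    $NUTCH fetch "$segment"
    # updatedb after EVERY fetch: without it, the newly discovered links
    # never reach the crawldb, so later generate rounds produce tiny
    # fetch lists (one plausible cause of 50 vs 3000 documents)
    $NUTCH updatedb "$dir/crawldb" "$segment"
    i=$((i + 1))
  done
}

crawl_rounds 3 crawl
```

The point Dennis is making: if updatedb is skipped (or run with -noAdditions, which suppresses adding newly discovered URLs), each generate round only re-sees the seed URLs, so the separate commands never reach the pages the one-shot crawl command finds at depth 2 and 3.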
