I don't think you can do that with the nutch crawl command; it is a one-stop-shop command. You probably want to use the individual commands instead. Type nutch generate to see its usage, and you will see the -adddays option. Read this wiki page to get a feel for how to proceed: http://wiki.apache.org/nutch/Crawl
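Roughly, one recrawl round with the individual commands could look like the sketch below. It is only a sketch under assumptions that are not from this thread: the crawl data lives in ./crawl, 31 days is enough to push every page past the default 30-day interval, and the fetcher parses while fetching (the Nutch 1.0 default). The run and recrawl_once helpers and the DRY_RUN switch are hypothetical names added for illustration, so check the usage printed by each nutch command before relying on it.

```shell
#!/bin/sh
# Sketch of one recrawl round using the individual Nutch commands.
# Assumed layout (not from the thread): <crawl_dir>/crawldb and <crawl_dir>/segments.

NUTCH="${NUTCH:-bin/nutch}"

# Hypothetical helper: print the command, and run it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Hypothetical helper: recrawl_once [crawl_dir] [add_days]
recrawl_once() {
  crawl_dir="${1:-crawl}"
  add_days="${2:-31}"

  # -adddays makes generate behave as if the clock were add_days ahead,
  # so pages whose fetch interval has not yet expired are selected too.
  run "$NUTCH" generate "$crawl_dir/crawldb" "$crawl_dir/segments" \
      -adddays "$add_days" || return 1

  # Pick the newest segment (segment directories are named by timestamp).
  if [ "${DRY_RUN:-0}" = "1" ]; then
    segment="$crawl_dir/segments/NEWEST"   # placeholder, nothing was generated
  else
    segment=$(ls -d "$crawl_dir"/segments/2* | tail -1)
  fi

  # Fetch the segment, then fold the results back into the crawldb.
  run "$NUTCH" fetch "$segment" || return 1
  run "$NUTCH" updatedb "$crawl_dir/crawldb" "$segment" || return 1
}
```

To preview the commands without running anything, set DRY_RUN=1 and call recrawl_once crawl 31. Repeat the round once per depth level you want, and rebuild the link database and index afterwards as described on the wiki page.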

2009/12/9 Peters, Vijaya <[email protected]>

> I didn't see a setting to override in crawl-urlfilter. How do I set
> numberDays? I have regular expressions to include/exclude certain extensions
> and certain urls, but that's all I have in there.
>
> Please send me an example and I'll give it a try.
>
> Thanks!
>
> Vijaya Peters
> SRA International, Inc.
> 4350 Fair Lakes Court North
> Room 4004
> Fairfax, VA 22033
> Tel: 703-502-1184
>
> www.sra.com
> Named to FORTUNE's "100 Best Companies to Work For" list for 10 consecutive
> years
> Please consider the environment before printing this e-mail
> This electronic message transmission contains information from SRA
> International, Inc. which may be confidential, privileged or proprietary.
> The information is intended for the use of the individual or entity named
> above. If you are not the intended recipient, be aware that any disclosure,
> copying, distribution, or use of the contents of this information is
> strictly prohibited. If you have received this electronic information in
> error, please notify us immediately by telephone at 866-584-2143.
>
> -----Original Message-----
> From: xiao yang [mailto:[email protected]]
> Sent: Wednesday, December 09, 2009 1:41 PM
> To: [email protected]
> Subject: Re: how to force nutch to do a recrawl
>
> What about the configuration in crawl-urlfilter.txt?
>
> On Thu, Dec 10, 2009 at 2:29 AM, Peters, Vijaya <[email protected]>
> wrote:
> > I tried that too.
> > In nutch-site.xml, I added the properties below, but this had no effect.
> >
> > <property>
> >   <name>db.default.fetch.interval</name>
> >   <value>0</value>
> >   <description>(DEPRECATED) The default number of days between re-fetches
> >   of a page. value was 30
> >   </description>
> > </property>
> >
> > <property>
> >   <name>db.fetch.interval.default</name>
> >   <value>3600</value>
> >   <description>The default number of seconds between re-fetches of a page
> >   (30 days). value was 2592000 (30 days)
> >   </description>
> > </property>
> >
> > <property>
> >   <name>db.fetch.interval.max</name>
> >   <value>3600</value>
> >   <description>The maximum number of seconds between re-fetches of a page
> >   (90 days). After this period every page in the db will be re-tried, no
> >   matter what is its status. value was 7776000
> >   </description>
> > </property>
> >
> > -----Original Message-----
> > From: MilleBii [mailto:[email protected]]
> > Sent: Wednesday, December 09, 2009 1:27 PM
> > To: [email protected]
> > Subject: Re: how to force nutch to do a recrawl
> >
> > Nutch only recrawls every 30 days by default. So you set the numberDays
> > adequately and it will recrawl. Read nutch-default.xml to get the
> > details.
> >
> > 2009/12/9, xiao yang <[email protected]>:
> >> What do you mean by "recrawl"?
> >> Does the following command meet what you need?
> >> bin/nutch crawl urls -dir crawl -depth 3 -topN 50
> >> Change the destination directory to a different one from the last crawl.
> >>
> >> On Thu, Dec 10, 2009 at 1:44 AM, Peters, Vijaya <[email protected]>
> >> wrote:
> >>> I'm running Nutch 1.0 in windows. How do I force Nutch to do a complete
> >>> recrawl?
> >>>
> >>> thanks,
> >>>
> >>> - Vijaya
> >
> > --
> > -MilleBii-

--
-MilleBii-
