What about the configuration in crawl-urlfilter.txt?

On Thu, Dec 10, 2009 at 2:29 AM, Peters, Vijaya <[email protected]> wrote:
> I tried that too.
> In nutch-site.xml, I added the properties below, but this had no effect.
>
> <property>
>   <name>db.default.fetch.interval</name>
>   <value>0</value>
>   <description>(DEPRECATED) The default number of days between re-fetches of a
>   page. value was 30
>   </description>
> </property>
>
> <property>
>   <name>db.fetch.interval.default</name>
>   <value>3600</value>
>   <description>The default number of seconds between re-fetches of a page (30
>   days). value was 2592000 (30 days)
>   </description>
> </property>
>
> <property>
>   <name>db.fetch.interval.max</name>
>   <value>3600</value>
>   <description>The maximum number of seconds between re-fetches of a page
>   (90 days). After this period every page in the db will be re-tried, no
>   matter what is its status. value was 7776000
>   </description>
> </property>
>
> Vijaya Peters
> SRA International, Inc.
> 4350 Fair Lakes Court North
> Room 4004
> Fairfax, VA 22033
> Tel: 703-502-1184
>
> www.sra.com
>
> -----Original Message-----
> From: MilleBii [mailto:[email protected]]
> Sent: Wednesday, December 09, 2009 1:27 PM
> To: [email protected]
> Subject: Re: how to force nutch to do a recrawl
>
> Nutch only recrawls every 30 days by default. So you set the numberDays
> adequately and it will recrawl. Read nutch-default.xml to get the
> details.
>
> 2009/12/9, xiao yang <[email protected]>:
>> What do you mean by "recrawl"?
>> Does the following command meet your needs?
>> bin/nutch crawl urls -dir crawl -depth 3 -topN 50
>> Change the destination directory to a different one from the last crawl.
>>
>> On Thu, Dec 10, 2009 at 1:44 AM, Peters, Vijaya <[email protected]>
>> wrote:
>>> I'm running Nutch 1.0 in windows. How do I force Nutch to do a complete
>>> recrawl?
>>>
>>> thanks,
>>>
>>> - Vijaya
>
> --
> -MilleBii-
