I don't think you can use the nutch crawl command to do that; it is a
one-stop-shop command.
You probably want to use the individual commands instead.
Type nutch generate to see its usage, where you will find the -adddays option,
and read this page on the wiki to get a feel for how to proceed:
http://wiki.apache.org/nutch/Crawl
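
A rough sketch of what one round of the individual commands might look like, assuming a crawl directory laid out as crawl/crawldb and crawl/segments (these paths, the function name recrawl_once and the value 31 are only illustrative, and on Windows this assumes a Cygwin shell). The idea is that -adddays makes generate pick URLs as if that many days had already passed, so pages still inside the default 30-day interval become due:

```shell
#!/bin/sh
# Illustrative defaults; adjust to your own layout.
NUTCH="${NUTCH:-bin/nutch}"
CRAWLDB="${CRAWLDB:-crawl/crawldb}"
SEGMENTS="${SEGMENTS:-crawl/segments}"

# Run one generate/fetch/updatedb round.
# $1 = number of days to add to the current time (defaults to 31).
recrawl_once() {
  # Select URLs as if $1 days had passed, so entries on the
  # 30-day default interval are treated as due for fetching.
  "$NUTCH" generate "$CRAWLDB" "$SEGMENTS" -adddays "${1:-31}" || return 1

  # Segment names are timestamps, so the last one in sort order
  # is assumed to be the segment generate just created.
  segment="$SEGMENTS/$(ls "$SEGMENTS" | sort | tail -n 1)"

  # Fetch the segment, then fold the results back into the crawldb.
  "$NUTCH" fetch "$segment" || return 1
  "$NUTCH" updatedb "$CRAWLDB" "$segment"
}
```

Calling recrawl_once 31 from the Nutch home directory would run a single round; the link inversion and indexing steps described on the wiki page would still need to follow before the refreshed pages show up in search.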

2009/12/9 Peters, Vijaya <[email protected]>

> I didn't see a setting to override in crawl-urlfilter.  How do I set
> numberDays? I have regular expressions to include/exclude certain extensions
> and certain urls, but that's all I have in there.
>
> Please send me an example and I'll give it a try.
>
> Thanks!
>
> Vijaya Peters
> SRA International, Inc.
> 4350 Fair Lakes Court North
> Room 4004
> Fairfax, VA  22033
> Tel:  703-502-1184
>
> www.sra.com
> Named to FORTUNE's "100 Best Companies to Work For" list for 10 consecutive
> years
> P Please consider the environment before printing this e-mail
> This electronic message transmission contains information from SRA
> International, Inc. which may be confidential, privileged or proprietary.
>  The information is intended for the use of the individual or entity named
> above.  If you are not the intended recipient, be aware that any disclosure,
> copying, distribution, or use of the contents of this information is
> strictly prohibited.  If you have received this electronic information in
> error, please notify us immediately by telephone at 866-584-2143.
>
> -----Original Message-----
> From: xiao yang [mailto:[email protected]]
> Sent: Wednesday, December 09, 2009 1:41 PM
> To: [email protected]
> Subject: Re: how to force nutch to do a recrawl
>
> What about the configuration in crawl-urlfilter.txt?
>
> On Thu, Dec 10, 2009 at 2:29 AM, Peters, Vijaya <[email protected]>
> wrote:
> > I tried that too.
> > In nutch-site.xml, I added the properties below, but this had no effect.
> >
> > <property>
> >  <name>db.default.fetch.interval</name>
> >  <value>0</value>
> >  <description>(DEPRECATED) The default number of days between re-fetches
> >  of a page.  value was 30
> >  </description>
> > </property>
> >
> > <property>
> >  <name>db.fetch.interval.default</name>
> >  <value>3600</value>
> >  <description>The default number of seconds between re-fetches of a page
> >  (30 days). value was 2592000 (30 days)
> >  </description>
> > </property>
> >
> > <property>
> >  <name>db.fetch.interval.max</name>
> >  <value>3600</value>
> >  <description>The maximum number of seconds between re-fetches of a page
> >  (90 days). After this period every page in the db will be re-tried, no
> >  matter what is its status.  value was 7776000
> >  </description>
> > </property>
> >
> > Vijaya Peters
> >
> > -----Original Message-----
> > From: MilleBii [mailto:[email protected]]
> > Sent: Wednesday, December 09, 2009 1:27 PM
> > To: [email protected]
> > Subject: Re: how to force nutch to do a recrawl
> >
> > Nutch only recrawls every 30 days by default. So set the numberDays
> > appropriately and it will recrawl. Read nutch-default.xml to get the
> > details.
> >
> > 2009/12/9, xiao yang <[email protected]>:
> >> What do you mean by "recrawl"?
> >> Does the following command meet your needs?
> >> bin/nutch crawl urls -dir crawl -depth 3 -topN 50
> >> Change the destination directory to one different from the last crawl's.
> >>
> >> On Thu, Dec 10, 2009 at 1:44 AM, Peters, Vijaya <[email protected]>
> >> wrote:
> >>> I'm running Nutch 1.0 on Windows.  How do I force Nutch to do a complete
> >>> recrawl?
> >>>
> >>>
> >>>
> >>> thanks,
> >>>
> >>> - Vijaya
> >>>
> >>>
> >>>
> >>> Vijaya Peters
> >>>
> >>>
> >>>
> >>>
> >>
> >
> >
> > --
> > -MilleBii-
> >
>



-- 
-MilleBii-
